The Sofie Treebank

A Parallel Treebank of North European languages

The Sofie Treebank is a parallel treebank that, at completion, will consist of material from nine North European languages: Danish, Dutch, English, Estonian, Finnish, German, Icelandic, Norwegian and Swedish. The text material of the treebank is taken from the first two chapters of Jostein Gaarder's novel Sofies verden, in the Norwegian original and in its translations (acknowledgments).

The treebank is being developed by the participants of the Nordic Treebank Network, in which academic institutions from Denmark, Estonia, Finland, Iceland, Norway, and Sweden take part. Some of the languages in the treebank are represented by more than one set of analyses, reflecting the fact that more than one institution has done work for that language. The analyses reflect different grammars, such as Dependency Grammar (Swedish - Växjö University) and a Phrase Structure Grammar of syntactic and function categories (University of Oslo).

The web interface is being developed by the Text Laboratory, using the TIGER-XML format, TGrep2 and MySQL.
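To give a rough idea of how these tools fit together, the sketch below reads one sentence in a TIGER-XML-like encoding and prints it as a bracketed tree, the kind of input that tree-search tools such as TGrep2 work on. This is only an illustration under stated assumptions: the sample sentence, the node ids, the category and edge labels, and the function name `bracketed` are all invented here, and it does not describe the Text Laboratory's actual pipeline or the annotation used in the treebank.

```python
import xml.etree.ElementTree as ET

# Hypothetical TIGER-XML-style fragment. The sentence, ids, and labels
# are made up for illustration and are not taken from the Sofie Treebank.
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Sofie" pos="N"/>
      <t id="s1_2" word="smiled" pos="V"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SUBJ" idref="s1_1"/>
        <edge label="HEAD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def bracketed(sentence_xml: str) -> str:
    """Render one <s> element as a bracketed phrase-structure string."""
    sentence = ET.fromstring(sentence_xml)
    graph = sentence.find("graph")

    # Index terminal (<t>) and nonterminal (<nt>) nodes by their id.
    nodes = {n.get("id"): n for n in graph.iter() if n.tag in ("t", "nt")}

    def render(node_id: str) -> str:
        node = nodes[node_id]
        if node.tag == "t":
            # Leaf: part-of-speech tag followed by the word form.
            return f"({node.get('pos')} {node.get('word')})"
        # Inner node: category followed by its children, in edge order.
        children = " ".join(render(e.get("idref")) for e in node.findall("edge"))
        return f"({node.get('cat')} {children})"

    return render(graph.get("root"))


print(bracketed(SAMPLE))
```

The sketch ignores edge labels and secondary edges, and it assumes that the order of the edges matches the word order, which need not hold for real annotations.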

Permission to use the corpus can be given to those who sign an agreement that they will use the corpus only for research, development and teaching. A web form will be available soon; in the meantime, contact Lars Nygaard. If you already have permission, click here to use the corpus.

At the moment (May 2004) there are tree representations of 50 sentences in six languages (two analyses for Danish, and one each for Norwegian, Swedish, English, German, and Estonian):

An example of some of the clickable sentences in the corpus:

This is an example of one of the Danish analyses:

Contact us.

Updated 28 May 2004 by KH.