Oslo Multilingual Corpus

We are currently developing the Oslo Multilingual Corpus (OMC), which is an extension of the English-Norwegian Parallel Corpus (ENPC). The ENPC has the following structure:
A bi-directional corpus of this type can be used for studies of different kinds: a cross-linguistic comparison of original texts, a cross-linguistic comparison of original and translated texts, a comparison of original and translated texts in the same language, and a cross-linguistic comparison of translated texts.

The corpus is now being extended, on the German side in particular, to ensure equal representation of texts in English, German, and Norwegian, to the extent that this is possible. Recently, the project has also been extended to French. Eventually, the corpus will contain original texts in four languages (English, German, French, Norwegian) and their translations into as many of the other three languages as possible.

Currently (November 2005), the English-German-Norwegian part of the corpus consists of 32 English, 37 German, and 27 Norwegian original texts with translations into the other two languages, whereas the French-Norwegian part comprises excerpts from 10 Norwegian and 10 French non-fictional texts with their respective translations.

Due to copyright restrictions, the corpus is available only to researchers and graduate students at the universities in Oslo and Bergen. However, some texts from the European Union (EU) and the World Health Organization (WHO) are generally available and offer the opportunity to see how searching in parallel texts works. The search tool is WebTCE, an earlier version of PerlTCE (see above). Lists of the OMC texts that are currently available can be obtained by accessing the corpus.
