The English-Norwegian Parallel Corpus

The English-Norwegian Parallel Corpus (ENPC) consists of original texts and their translations (English to Norwegian and Norwegian to English).

It is intended as a general research tool, available beyond the present project for applied and theoretical linguistic research. It started out as a research project at the Department of British and American Studies, University of Oslo, in 1994, and the corpus was completed in 1997. Between 1997 and 2001 the corpus was extended to include more languages (German, Dutch and Portuguese), and the English and Norwegian original texts were tagged for part of speech. The manual was completed in 1999 and revised in 2002.

The focus has been on novels and fairly general non-fiction books. To include material by a range of authors and translators, the corpus is limited to text extracts (chunks of 10,000-15,000 words). The fiction part of the corpus contains 30 original text extracts in each language, together with their translations, whereas the non-fiction part contains 20 original text extracts in each language, together with their translations.

The project

Documentation

Brief introduction

The ENPC manual

Frequency information

Texts in the corpus

Fiction

Non-fiction

Extensions of the project

More languages

Multiple translations

Part-of-speech tagging

Dialogue marking

Publications (last updated 2001)


People


 Access the corpus (user name/password required)


Apply for access to the corpus (restricted to researchers and students at the University of Oslo and the University of Bergen who need the corpus for their research or for term papers, etc.)

These pages are no longer updated; see the OMC (Oslo Multilingual Corpus) website for further information.

Last updated June 2013, SOE 


Published July 6, 2010 10:39 AM - Last modified June 7, 2013 11:52 AM