The English-Norwegian Parallel Corpus

The English-Norwegian Parallel Corpus (ENPC) consists of original texts and their translations (English to Norwegian and Norwegian to English).

The corpus is intended as a general research tool, available beyond the present project for applied and theoretical linguistic research. It began as a research project at the Department of British and American Studies, University of Oslo, in 1994 and was completed in 1997. Between 1997 and 2001 the corpus was extended to include more languages (German, Dutch, Portuguese), and the English and Norwegian original texts were tagged for part of speech. The manual was completed in 1999 and revised in 2002.

The focus has been on novels and fairly general non-fiction books. To include material by a range of authors and translators, the corpus uses text extracts (chunks of 10,000-15,000 words) rather than complete works. The fiction part of the corpus contains 30 original text extracts in each language and their translations, whereas the non-fiction part contains 20 in each direction.
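
As a rough illustration, the composition figures above can be tallied in a few lines of Python. This is only a sketch: the variable names are illustrative, the pairing of each original with exactly one translation is an assumption, and the word range is a bound implied by the stated extract size, not an official word count for the corpus.

```python
# Figures taken from the description above; names are illustrative only.
FICTION_ORIGINALS_PER_LANGUAGE = 30
NONFICTION_ORIGINALS_PER_LANGUAGE = 20
LANGUAGES = ("English", "Norwegian")
EXTRACT_WORDS_MIN = 10_000
EXTRACT_WORDS_MAX = 15_000

# Original extracts per language, then across both languages.
originals_per_language = (
    FICTION_ORIGINALS_PER_LANGUAGE + NONFICTION_ORIGINALS_PER_LANGUAGE
)
total_originals = originals_per_language * len(LANGUAGES)

# Assumption: each original is paired with exactly one translation.
total_texts = total_originals * 2

# Bounds on the size of the originals implied by the stated extract length.
min_words = total_originals * EXTRACT_WORDS_MIN
max_words = total_originals * EXTRACT_WORDS_MAX

print(f"Originals per language: {originals_per_language}")
print(f"Original extracts in total: {total_originals}")
print(f"Texts including translations: {total_texts}")
print(f"Words in originals (implied range): {min_words:,} to {max_words:,}")
```

The word range covers the original extracts only, since the description does not state the length of the translations.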


The project


Brief introduction

The ENPC manual

Frequency information


Texts in the corpus


Extensions of the project


More languages


Multiple translations


Part-of-speech tagging


Dialogue marking


Publications (last updated 2001)



 Access the corpus (via Feide)

 Access the corpus via PerlTCE (old interface; username/password required)

These pages are no longer updated; see the Oslo Multilingual Corpus (OMC) website for further information.

Last updated May 2019, SOE 




Published July 6, 2010 10:39 AM - Last modified Apr. 15, 2022 12:45 AM