The NORINT Corpus
The NORINT Corpus consists of speech from 51 and written texts from 116 adult learners of Norwegian as a second language, all of whom were taking advanced Norwegian courses (approximately CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.
The NORINT Corpus is divided into three sub-parts:
- NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words in total. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about topics of their choice, such as culture, leisure, travel, or life in Norway. Both the interviews and the conversations are recorded on audio and video. Each L2 learner was recorded twice, yielding 30-40 minutes of speech material per participant.
The recordings were transcribed orthographically, word for word, using the transcription tool ELAN.
You can read the transcription guidelines in Norwegian here.
Search NORINT Speech
- NORINT Recited: 57 L2 learners, 51 of whom also contributed to NORINT Speech, recite a short story and 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
Search NORINT Recited
- NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants partly overlap with those in NORINT Speech and NORINT Recited, but for privacy reasons individual participants cannot be identified in the corpus.
The texts are available in three formats: the original handwritten version in PDF format, a digital transcription of the original, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
Search NORINT Text
Log in to each part of the corpus via Feide or eduGAIN, using your university username and password. If you do not have access via Feide or eduGAIN, contact us at email@example.com.
You can read more about how to use the corpus in the user guide (in Norwegian).
The corpus search interface looks like this: