Frequency lists from NoWaC
The frequency lists from NoWaC contain frequencies of word forms and lemmas.
Homonyms are counted separately according to how they have been tagged by the Oslo-Bergen grammatical tagger. For example, the verb "arbeid" and the noun "arbeid" are counted separately, as are the past tense and the past participle of verbs like "hoppe".
All words are converted to lowercase letters so that e.g. "The" and "the" are counted together. Proper names are an exception: they retain their original form.
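The two counting rules above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the lists. The input format (pairs of word form and tag), the function name count_tagged_forms and the tag labels "subst", "verb" and "prop" are assumptions made for illustration, and the tagger's real output may look different.

```python
from collections import Counter

# Hypothetical label for proper names; the tagger's real tag set may differ.
PROPER_NAME_TAG = "prop"


def count_tagged_forms(tokens):
    """Count (word form, tag) pairs from an iterable of (form, tag) tuples."""
    counts = Counter()
    for form, tag in tokens:
        # Proper names keep their original form; all other words are lowercased.
        key_form = form if tag == PROPER_NAME_TAG else form.lower()
        # Keying on both form and tag keeps homonyms apart.
        counts[(key_form, tag)] += 1
    return counts


# Made-up sample tokens, not taken from the corpus.
sample = [
    ("Arbeid", "subst"),
    ("arbeid", "subst"),
    ("arbeid", "verb"),
    ("Oslo", "prop"),
]
counts = count_tagged_forms(sample)
```

In this sketch the two noun tokens are merged after lowercasing, the verb reading of "arbeid" gets its own count, and "Oslo" keeps its capital letter.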
It should be noted that parts of the corpus contain text in formats that are difficult for the grammatical tagger to recognize (e.g. various newspaper bylines or question-answer formats on chat sites). As a result, many words have been analysed as proper names when they are in fact sentence-initial common nouns, pronouns etc.
The frequency lists are distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Generic license.
- Download the frequency list of analyzed word forms
- Download the raw word form frequency list
- Download the lemma frequency list
- Download the list of inflected forms for the 1000 most frequent lemmas
- Download the frequency list sorted primarily alphabetically and secondarily by frequency within each initial character.
(Usage example: Search for a space followed by the character b to find the most frequent words starting with b.)
- Norwegian noun-noun compounds (compiled from NoWaC and edited by Eli Anne Eiesland)
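The same kind of lookup as in the usage example for the alphabetically sorted list can also be done in a script. The Python sketch below assumes that each line of a frequency list holds a frequency count followed by a word form, separated by whitespace. That layout is an assumption, as is the function name top_words_with_prefix, so the parsing may need to be adapted to the actual files.

```python
def top_words_with_prefix(lines, prefix, limit=10):
    """Return the most frequent words starting with `prefix`.

    Assumes each line looks like "<frequency> <word>"; other lines are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # not in the assumed two-column layout
        freq_text, word = parts
        try:
            freq = int(freq_text)
        except ValueError:
            continue  # first column is not a number
        if word.startswith(prefix):
            entries.append((freq, word))
    # Highest frequency first.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [word for _, word in entries[:limit]]


# Made-up sample lines, not real NoWaC counts.
sample_lines = ["120 bare", "300 bli", "45 bil", "900 og"]
top_b = top_words_with_prefix(sample_lines, "b", limit=2)
```

With the made-up sample lines, top_b is ["bli", "bare"]. For a downloaded list, the lines of the opened file can be passed in place of sample_lines.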