habit

Last modified Aug. 9, 2017 1:44 PM by Kristin Hagen
Last modified Oct. 23, 2023 4:07 PM by lenkeretter@localhost

The Text Laboratory now offers two large web corpora for Norwegian, both completed in 2017.

• HaBiT Norwegian Web Corpus 2015 (Bokmål) with 1.18 billion words (3.4 million documents).

• HaBiT Norwegian Web Corpus 2015 (Nynorsk) with more than 55 million words (214 000 documents).

The Nynorsk corpus is the first web corpus collected for this language.