The two corpora contain many blog posts and other texts that are less normative and closer to speech than the texts found in corpora based solely on edited material, such as newspapers, reports and professionally published fiction.
Both corpora were collected in February 2015 using SpiderLing. The texts are tagged with the Oslo-Bergen Tagger. The work was carried out at Masarykova Univerzita in Brno, Czech Republic, in cooperation with the Text Laboratory, University of Oslo, and NTNU, within the framework of the HaBiT project, financed by the Czech-Norwegian Research Programme (EEA and Norway Grants).
The corpora can be searched in SketchEngine: