
New version of Glossa

The Text Laboratory continues to develop the search and post-processing tool Glossa.

Glossa offers a modern, simple and functional search interface with advanced post-processing options for written, multilingual and speech corpora.

Glossa is easy to install, so institutions and researchers can create their own corpora and host them on their own server or even on a laptop.

Glossa can also search corpora hosted on servers other than the one where Glossa itself is installed. This is made possible by CLARIN Federated Content Search.

In addition, Glossa's interface and metadata menus can easily be themed for individual Glossa installations. Glossa offers several versions of the search interface: a simple (Google-like) interface, a more advanced interface with clickable options for, e.g., lemma or part-of-speech (POS) searches, and an interface that supports regular expressions.

Glossa supports login via Feide, eduGAIN and CLARIN, and includes a system for authorizing access to corpora under different licenses.

Work on Glossa has been funded by CLARINO and the LIA project, and the entire system is open-source software that can be freely downloaded. The system is still under development, now within the infrastructure project CLARINO+, where the main focus will be on developing different ways of displaying search results.


Read more and see available corpora in Glossa

Most corpora have user guides in English or Norwegian. See, for example, the user guide for LIA Sápmi (written corpus) or the Nordic Dialect Corpus (speech corpus).


Download Glossa from GitHub


Glossa was selected as a showcase at the annual CLARIN meeting in Prague in 2013. See the presentation on the CLARIN web page.


Nøklestad, Anders, Kristin Hagen, Janne Bondi Johannessen, Michal Kosek and Joel Priestley. 2017. A modernised version of the Glossa corpus search system. In Jörg Tiedemann (ed.): Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa 2017), 251–254. Read the paper.

Kosek, Michal, Anders Nøklestad, Joel Priestley, Kristin Hagen and Janne Bondi Johannessen. 2015. Visualisation in speech corpora: maps and waves in the Glossa system. In Gintarė Grigonytė, Simon Clematide, Andrius Utka and Martin Volk (eds.): Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11–13, 2015, Vilnius, Lithuania. NEALT Proceedings Series 25, 23–31. Read the paper.

Published Oct. 7, 2016 3:47 PM - Last modified Nov. 6, 2020 5:41 PM