The VESPA corpus

The Varieties of English for Specific Purposes dAtabase (VESPA) learner corpus consists of academic essays written by learners of English from a variety of first language backgrounds. The project is co-ordinated by Dr. Magali Paquot at the Catholic University of Louvain.

The Norwegian subcorpus of VESPA (VESPA-NO) is being compiled by Signe Oksefjell Ebeling and Hilde Hasselgård at the Department of Literature, Area Studies and European Languages at the University of Oslo. The contributors to the corpus may be described as advanced learners of English. So far the corpus comprises texts from the following disciplines:

  • Linguistics
  • Literature
  • Business

VESPA-NO consists of texts written by students whose first language is Norwegian. There is also a separate component of texts written by students with other mother-tongue backgrounds. Texts are typically produced as part of a taught course, i.e. as obligatory assignments or term papers.

The corpus has been enriched with functional annotation using a set of macros and Perl scripts. These are based on the macros first developed for the British Academic Written English Corpus (BAWE) (cf. Ebeling & Heuboeck 2007) and were adjusted for VESPA by Alois Heuboeck (Reading University, UK).

The corpus is suitable for use with WordSmith Tools. It is available to students and researchers at the University of Oslo and to researchers developing other subcorpora of VESPA.

Current status of the corpus: The linguistics component is practically complete at close to 330,000 words. The literature component currently (2018) comprises c. 150,000 words, while the business component is much smaller (at 50,000 words), and texts are still being compiled. We hope to add more disciplines in the future.

We are grateful to the Department of Literature, Area Studies and European Languages for funding at various stages of the development of macros and the compilation and annotation of the corpus.

Published Feb. 17, 2014 4:43 PM - Last modified Dec. 18, 2018 10:46 AM