Corpus Linguistics and Beyond

Wolfgang Teubert

University of Birmingham

Corpus linguistics is often seen as a field of applied linguistics, and it has always kept its distance from language theory. To the extent that its principles have been discussed, the discussion has centred on the issues ‘corpus-based vs. corpus-driven’ and ‘top-down vs. bottom-up.’ Some see corpus linguistics as a bunch of tools, others as a more or less scientific methodology, and very few as a language theory in its own right.

The aim of corpus linguistics has been to come up with reliable empirical data concerning language as a (standardised) system, yielding generalisations useful for language teaching and dictionary compilation. This aim, however, is in conflict with the principle that the meaning of a unit of meaning (whatever that may be) is largely determined by the context in which it is embedded, a context that can hardly be reduced to a common denominator. Meaning is only found in discourse. It is not part of a language system.

Discourse has, of necessity, a diachronic dimension. What is said is often a reaction to what has been said before and cannot be understood without following its intertextual links. The meaning of a unit of meaning (type) is the entirety of what has been said about it. It is constructed and continually (re-)negotiated in discourse. Therefore, the meaning of a unit of meaning (token) is the difference between its current context and the entirety of contexts in which it has previously been used.

There is no ‘scientific’ methodology to extract meaning without making arbitrary decisions. It is up to the interpretive community (i.e. the discourse community) to review and make sense of the data extracted from discourse. A theory of corpus linguistics finds its place not in a science of language, but in discourse studies as part of the humanities.

