The project deals with the relationship between information structure and word order in old Germanic and Romance languages, with reference to the modern languages. Even though these languages belong to two different language groups, they had certain common word order structures in their older stages, and our main hypothesis is that the present-day word order variation is a result of historical changes in the dynamics between information structure and syntax. We believe that by studying word order change from a pragmatic perspective, i.e. focusing on text structuring mechanisms rather than solely on syntactic mechanisms, and from a contrastive perspective, we may arrive at a greater understanding of the general mechanisms of language change. We aim to find out to what extent and in what way word order in the older stages of the languages was governed by information-structural constraints, how the languages changed with respect to the relation between word order and information structure, and how the modern languages differ from their older versions concerning these properties.

Languages can be classified typologically according to the word order of main, declarative clauses. Consequently, word order changes are seen as major events in the history of a language, since a change in word order may lead to a change in the typological classification of a language. For example, English is unique among the Germanic languages in having changed from a verb-second language (Old English), in which the verb follows any initial element, into a verb-medial language, in which the verb follows the subject. Thus, Modern English belongs to the group of languages that encode syntactic functions, such as subject and object, by means of their position in the sentence. Languages such as English and French always have the subject directly before the verb, except in special syntactic environments, e.g. questions. In other languages, these restrictions may not apply, and the constituents of a sentence can be placed in almost any order. In Portuguese, for example, all save one of the logically possible combinations of word orders in a sentence with subject, verb and object are allowed.


Information structure is defined as the relationship between the sentence constituents and the surrounding text. Whenever we speak or write, we (subconsciously) make choices about what parts of the information we convey are important, and what parts have to be relegated to the background. These choices are reflected in the linguistic structure of the language, so that the form of an utterance signals not just the relation between the sentence constituents, but also the relation of the utterance to the surrounding textual context. For example, on the sentence level, information that is already known from the context is usually placed at the beginning of the sentence, and new information is placed toward the end. Cross-linguistic surveys of modern languages reveal that a large number of languages have means of structuring information within the discourse. However, the extent to which and the manner in which syntax and information structure work together vary across languages, synchronically and diachronically.

For the purposes of this project we have selected three languages from each language group: the Germanic languages German, English and Norwegian, and the Romance languages French, Spanish and Portuguese, in their modern and historical versions.

The older Germanic and Romance languages are characterized by a large number of verb-second (V2) sentences, i.e. sentences in which the verb occupies second position regardless of the type of initial element. This feature has often been attributed to a V2 structure similar to the one found in, for instance, Modern Scandinavian and German, although there are considerable differences between the languages with respect to the frequency of V2 sentences and the syntactic and pragmatic features associated with these constructions. (1a)-(6a) are examples of such structures in the older languages, with their modern equivalents given in (1b)-(6b). In (1a)-(6a), a sentence-initial adverb causes the word order V(erb) - S(ubject) - (O(bject)). The present-day languages, on the other hand, show three different patterns of subject placement: German and Norwegian retain the verb-second structure, Spanish and Portuguese allow both VS and SV word order, while English and French, despite belonging to two different language groups, behave much alike and have preverbal subjects even when there is a sentence-initial adverb.

Norwegian and German:

(1a) Old Norse:
þa komu Kvenir til hans og sögþu...
'Then came Kvens [people of Finnish descent] to him and said...'

(1b) Modern Norwegian:
Da kom kvener til ham og sa...

(2a) Old High German:
Tho quad her zi andaremo manthen...
'Then said he to another man...'

(2b) Modern German:
Dann sagte er zu einem anderen Mann...

Portuguese and Spanish:

(3a) Old Portuguese:
E entom sayo do boosco hũa molher nua
'And then came out-from the-woods a woman naked'

(3b) Mod. Portuguese:
E então saiu do bosque uma mulher nua
E então uma mulher nua saiu do bosque

(4a) Old Spanish:
Entonces tomaron los romanos la dicha cibdad
'Then took the Romans the already-mentioned city'

(4b) Modern Spanish:
Entonces tomaron los romanos dicha ciudad
Entonces, los romanos tomaron dicha ciudad

English and French:

(5a) Old English:
þa for he norþryhte be þæm lande
'Then went he northwards near the land'

(5b) Modern English:
Then he went northwards near the land

(6a) Old French:
Lors descendi Placidas de la montaigne
'Then descended Placidas from the mountain'

(6b) Modern French:
Alors Placidas descendit de la montagne

The modern Germanic languages are generally characterized by verb-second word order in declarative sentences, with the exception of English, which has developed into a verb-medial language in which the preverbal position is the subject position. This means that if the sentence starts with a non-subject constituent, e.g. an adverbial, there will be two elements in preverbal position, since the subject must be placed before the verb. Despite these syntactic constraints, Modern English is still sensitive to information structure, but the conditions under which pragmatic features can operate have changed. Norwegian and German have kept their general V2 structure, though other aspects of word order have changed. It is therefore of interest to study the interplay between syntax and information structure in these languages as well, and to compare them with English and the Romance languages.
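The difference between V2 and verb-medial order can be made concrete by counting how many constituents precede the finite verb. The following Python sketch is only an illustration and not part of the project's tooling: it classifies a clause that has already been segmented into constituents. The `Constituent` class, the `classify_order` function and the function labels are hypothetical, and the segmentation of (5a) and (5b) is done by hand.

```python
from dataclasses import dataclass


@dataclass
class Constituent:
    form: str
    function: str  # hypothetical labels: "ADV", "SUBJ", "V" (finite verb), "OBL", ...


def classify_order(clause: list[Constituent]) -> str:
    """Label a declarative main clause by finite-verb position and S/V order."""
    functions = [c.function for c in clause]
    v = functions.index("V")  # 0-based index of the finite verb
    label = f"V{v + 1}"       # "V2" means the verb is the second constituent
    if "SUBJ" in functions:
        label += ", SV" if functions.index("SUBJ") < v else ", VS"
    return label


# (5a) Old English and (5b) Modern English, segmented by hand
old_english = [
    Constituent("þa", "ADV"),
    Constituent("for", "V"),
    Constituent("he", "SUBJ"),
    Constituent("norþryhte", "ADV"),
    Constituent("be þæm lande", "OBL"),
]
modern_english = [
    Constituent("Then", "ADV"),
    Constituent("he", "SUBJ"),
    Constituent("went", "V"),
    Constituent("northwards", "ADV"),
    Constituent("near the land", "OBL"),
]

print(classify_order(old_english))     # V2, VS
print(classify_order(modern_english))  # V3, SV
```

With a sentence-initial adverb, the Old English clause comes out as V2 with inverted subject, while the Modern English clause has two preverbal elements (V3, SV), which is the pattern described above.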

In the modern Romance languages, the unmarked word order for transitive verbs is SVO. However, Spanish and Portuguese allow both pre- and postverbal subjects, and subject placement is conditioned by both information structure and verb class. In Portuguese, only new information occurs after the verb; thus, subjects may occur postverbally provided they contain new information. In Spanish, both new and old subjects may occur postverbally (if the subject is new, it has to be the rightmost element). In (4), the subject is given information, and whether it precedes or follows the verb in (4b) depends on the function of the adverb entonces: whether it is a temporal adverbial signalling narrative progression or a discourse marker, which in turn depends on the context.

The motivation for choosing these six languages is thus as follows. We wanted to include two languages that have kept verb-second word order, i.e. Norwegian and German. We also wanted to compare two languages that have lost verb-second but belong to two different language groups, i.e. English and French. Spanish and Portuguese, finally, are two closely related languages that have both lost the V2 constraint but still maintain VS order in certain contexts. This inversion is triggered by different factors in the two languages, and it is these factors we want to examine more closely.


A considerable amount of research has been carried out on both theoretical syntax and syntactic change in the languages we plan to study. The interaction between syntax and information structure is, however, a relatively new topic, especially in diachronic studies. Although some work has been done on this topic (e.g. Bech 2001, Haugan 2001, Faarlund 2003, Hinterhölzl & Petrova 2005, Petrova 2006, Eide 2006, van Kemenade & Los 2006, Petrova & Solf 2008), much of this project will be pioneering work, both in the separate studies of each language and in the comparative studies that will emerge from them. To our knowledge, no diachronic comparative study of information structure of this kind has been conducted before.

In the analysis of the languages, information structure and its relation to syntax will be the main focus of investigation, but we will supplement the analysis by also looking at prosody. The study will be corpus-based, and for each language we will establish an annotated corpus. Electronic text corpora from relevant periods are available for all six languages, and extracts from these will be used as basic texts for annotation. While some languages also have morpho-syntactically annotated corpora that we can use as a starting point for further annotation, the contrastive approach in this project requires that we establish a common annotation for all the texts/languages. The texts will be annotated for morphosyntactic structure as well as information structure, and will be modelled on the annotated corpus of old Indo-European languages compiled by our local partners at the PROIEL project. For some of the languages, small corpora with information structure annotation already exist, but they will also have to be adapted to a common annotation. The annotation is based on dependency grammar, enriched with secondary dependencies (slashes) reminiscent of the structure-sharing mechanism in Lexical-Functional Grammar; however, we expect that researchers who work within other theoretical frameworks (e.g. generative grammar) will be able to use the annotated corpus for their purposes.
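As a rough illustration of what a dependency annotation with secondary dependencies might look like, the sketch below gives each token one primary head and an optional list of slashes. It assumes a simplified, hypothetical data model (the `Token` class, the `subjects_of` function and the relation labels `sub`, `pred` and `xobj` are illustrative), not the actual PROIEL format, and shows how a slash lets one token serve as the shared subject of two verbs, much like structure sharing in LFG.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Token:
    id: int
    form: str
    head: Optional[int]  # primary dependency head (None for the root)
    relation: str        # primary relation label (hypothetical label set)
    # secondary dependencies ("slashes"): (id of the shared token, relation)
    slashes: list[tuple[int, str]] = field(default_factory=list)


def subjects_of(tokens: list[Token], verb_id: int) -> list[str]:
    """Find a verb's subjects through primary edges or through its own slashes."""
    by_id = {t.id: t for t in tokens}
    found = [t.form for t in tokens if t.head == verb_id and t.relation == "sub"]
    for target, relation in by_id[verb_id].slashes:
        if relation == "sub":
            found.append(by_id[target].form)
    return found


# "He wanted to leave" (infinitive marker omitted for brevity):
# "He" is the primary subject of "wanted"; a slash from "leave"
# records that the same token is also the understood subject of "leave".
sentence = [
    Token(1, "He", 2, "sub"),
    Token(2, "wanted", None, "pred"),
    Token(3, "leave", 2, "xobj", slashes=[(1, "sub")]),
]

print(subjects_of(sentence, 2))  # ['He']
print(subjects_of(sentence, 3))  # ['He']
```

Keeping the tree strictly single-headed and putting shared arguments in a separate slash list means that tools expecting a plain dependency tree can ignore the slashes, while queries that need the shared argument can follow them.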



Topic and focus
The terms topic and focus, or theme and rheme, or topic and comment have been used to define pragmatic relations in sentences. Traditionally, topic is defined as 'the constituent that identifies what the sentence is about', while focus is defined as 'the constituent which adds new information to the sentence/topic'. Distribution patterns for topic and focus tell us how the discourse develops in a particular language, i.e. how the writer relates the new sentence to what has gone before, and what constitutes new information in the discourse. These patterns, which signal how the narration progresses, help the reader structure the information and understand the coherence of the text, hence the term information structure. Since languages have different grammatical systems, the means by which topic and focus are expressed may vary, both between languages and between historical periods.

However, although it is useful to operate with topic and focus in studies of information structure, these notions are not unproblematic, as reflected in the variety of interpretations they have received in the linguistic literature (see e.g. Givón 1979, Lambrecht 1994, Cummings 1995). It is not always easy to determine what constitutes the topic or the focus of a sentence, even if we assume that texts progress coherently and we know the distribution pattern for topic and focus in a language. Therefore, it is also necessary to analyze sentence elements in terms of given and new information.

Given and new information
The categories of topic and focus are closely linked to the structuring of given and new information because there is a correlation between topic and given information on the one hand and focus and new information on the other hand. However, whereas topic and focus are subject to the interpretation of the hearer/reader, given and new information may be established by means of criteria independent of interpretation; elements that have been mentioned in the previous discourse are given, and elements that have not been mentioned are new. In addition, there are subtypes of given information, since given information may be more or less given (‘accessible’), depending on how recently it has been mentioned in the discourse (see e.g. Prince 1981, 1992, Firbas 1992, Chafe 1994).
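The interpretation-independent criterion for given versus new, together with the idea that recency affects degree of givenness, can be sketched as follows. This is a minimal illustration assuming that each mention has already been assigned a referent identifier by hand; the `givenness` function and the toy referents are hypothetical, and using distance in sentences as a stand-in for accessibility is an assumption of the sketch, not a claim about the cited studies.

```python
from typing import Optional


def givenness(sentences: list[list[str]]) -> list[list[tuple[str, str, Optional[int]]]]:
    """Label each referent mention as new, or as given together with its
    distance (in sentences) to the most recent earlier mention."""
    last_seen: dict[str, int] = {}
    result = []
    for i, referents in enumerate(sentences):
        labels = []
        for ref in referents:
            if ref in last_seen:
                labels.append((ref, "given", i - last_seen[ref]))
            else:
                labels.append((ref, "new", None))
            # a repeated mention within the same sentence counts as given, distance 0
            last_seen[ref] = i
        result.append(labels)
    return result


# Toy discourse with hand-assigned referent ids (hypothetical).
discourse = [["man", "woods"], ["woman"], ["man", "woman"]]
for labels in givenness(discourse):
    print(labels)
# [('man', 'new', None), ('woods', 'new', None)]
# [('woman', 'new', None)]
# [('man', 'given', 2), ('woman', 'given', 1)]
```

Because the labels depend only on whether and where a referent was mentioned before, two annotators given the same referent identifiers will always produce the same result, which is the advantage of given/new over topic/focus noted above.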

The annotation method for information structure is based on Nissim et al. (2004), with some simplifications; this is also the system used by the PROIEL project. The information status (IS) of an entity reflects the speaker's assumptions about the hearer's knowledge and beliefs, and is expressed by the given/new distinction. This distinction also indicates how much a discourse entity contributes to changing or updating the discourse model. A discourse entity is thus analyzed as old, accessible, or new. For the first two, Nissim et al. operate with several subtypes: six for old information and nine for accessible information. There are no subtypes for new information. Our annotation method will be simpler. Old information has only one category, but anaphoric chains will be marked for all old elements, so that the antecedent can be identified. Accessible information is divided into three categories: inferrable, general and situation.
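The simplified category inventory and the anaphoric chains described above might be represented roughly as below. The `Status`, `Mention`, `coarse`, `check` and `chain` names are hypothetical and the example mentions are invented; the sketch only shows that requiring an antecedent for every old mention makes each chain traceable back to its first mention.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OLD = "old"
    INFERRABLE = "inferrable"
    GENERAL = "general"
    SITUATION = "situation"
    NEW = "new"


# The three subtypes of accessible information
ACCESSIBLE = {Status.INFERRABLE, Status.GENERAL, Status.SITUATION}


@dataclass
class Mention:
    id: int
    form: str
    status: Status
    antecedent: Optional[int] = None  # previous mention in the anaphoric chain


def coarse(status: Status) -> str:
    """Collapse the accessible subtypes into one coarse category."""
    return "accessible" if status in ACCESSIBLE else status.value


def check(mentions: dict[int, Mention]) -> None:
    """Every old mention must point to an existing antecedent."""
    for m in mentions.values():
        if m.status is Status.OLD and m.antecedent not in mentions:
            raise ValueError(f"old mention {m.id} has no valid antecedent")


def chain(mentions: dict[int, Mention], mention_id: int) -> list[str]:
    """Follow antecedent links back to the first mention (earliest first)."""
    forms = []
    current: Optional[Mention] = mentions[mention_id]
    while current is not None:
        forms.append(current.form)
        current = mentions.get(current.antecedent) if current.antecedent is not None else None
    return forms[::-1]


mentions = {
    1: Mention(1, "a woman", Status.NEW),
    2: Mention(2, "the sun", Status.GENERAL),
    3: Mention(3, "she", Status.OLD, antecedent=1),
    4: Mention(4, "her", Status.OLD, antecedent=3),
}
check(mentions)
print(chain(mentions, 4))          # ['a woman', 'she', 'her']
print(coarse(mentions[2].status))  # accessible
```

Linking each old mention only to its immediate antecedent keeps the annotation local, while the full chain can still be reconstructed when needed.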

Information structure has to do with the relation between the sentence constituents and the surrounding context, but it may also be relevant to look at discourse relations in parts of the corpora. Discourse relations are relations between sentences in the text as a whole. The main assumption is that discourse has a hierarchical structure, and a key feature of this hierarchical structure is the distinction between subordinating and coordinating discourse relations (Asher & Lascarides 2003; Asher & Vieu 2005). The idea is that some parts of the text play a subordinate role relative to other parts, i.e. sentences have different rhetorical functions in the text and do not work on the same level when it comes to advancing the discourse. In general, sentences that express a coordinating relation have a narrative-advancing function, whereas sentences that express a subordinating relation serve to elaborate on or explain the preceding discourse. From a word order perspective it may be relevant to investigate whether discourse relations govern word order in any way.
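To relate discourse relations to word order, one could compute each sentence's level in the discourse hierarchy and cross-tabulate relation type against word order. The sketch below assumes hand-annotated attachments and uses only four SDRT-style relation labels (Narration and Continuation as coordinating, Elaboration and Explanation as subordinating); the function names and the toy data are hypothetical and do not represent any findings.

```python
from collections import Counter
from typing import Optional

COORDINATING = {"Narration", "Continuation"}
SUBORDINATING = {"Elaboration", "Explanation"}


def kind(relation: str) -> str:
    if relation in COORDINATING:
        return "coordinating"
    if relation in SUBORDINATING:
        return "subordinating"
    raise ValueError(f"unclassified relation: {relation}")


def levels(attachments: list[Optional[tuple[int, str]]]) -> list[int]:
    """Depth of each sentence in the discourse hierarchy.
    attachments[i] is None for the first sentence, else (earlier index, relation).
    Coordinating relations stay on the level of their target; subordinating
    relations go one level down."""
    depth: list[int] = []
    for att in attachments:
        if att is None:
            depth.append(0)
        else:
            target, relation = att
            step = 0 if kind(relation) == "coordinating" else 1
            depth.append(depth[target] + step)
    return depth


def order_by_relation(attachments: list[Optional[tuple[int, str]]], orders: list[str]) -> Counter:
    """Cross-tabulate relation type against word order (e.g. 'VS', 'SV')."""
    tally: Counter = Counter()
    for att, order in zip(attachments, orders):
        if att is not None:
            tally[(kind(att[1]), order)] += 1
    return tally


# Toy, hand-made annotation (hypothetical)
attachments = [None, (0, "Narration"), (1, "Elaboration"), (2, "Explanation"), (1, "Narration")]
orders = ["VS", "VS", "SV", "SV", "VS"]
print(levels(attachments))  # [0, 0, 1, 2, 0]
print(order_by_relation(attachments, orders))
```

Sentences at level 0 are the narrative-advancing backbone; the cross-tabulation is the kind of count that could show whether, say, inversion clusters in coordinating contexts.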

While much is known about syntax and information structure, little work had been done on the interface between prosody and information structure until quite recently. In recent years, however, developments in information structure analysis, combined with advances in technical equipment, have enabled advanced prosodic studies of modern languages. Both introspection and modern laboratory research on intonation, topic and focus have revealed regular cross-linguistic correspondences between information structure and prosodic markedness. This knowledge has been used to analyze other aspects of sentence structure. For instance, prosodic features are known to affect the placement of other elements in the clause, in particular pronouns, clitics and unstressed adverbs. Changes that involve topic and focus are thus likely to affect these elements as well. Galves and Sousa (2005) have used such prosodic theories in a diachronic analysis of the placement of Portuguese object pronouns, and Eide (2008) has used them to argue in favor of a prosodically driven language change in Portuguese. We believe the same type of analysis can be applied to the other languages in this project.


Our project consists of three main parts:

(i) Establishing the corpora, corpus annotation and automatic search program.

(ii) Studies of syntax and information structure in the different languages.

(iii) Comparative/contrastive studies of the languages. 


