Patternization! -- How to measure syntactic diversity

Alexander Pfaff

In this presentation, I will summarize aspects of an ongoing research project intended to account for “Constraints on Syntactic Variation” in Old Germanic noun phrases. One component of the project is the development of an annotated database specifically dedicated to noun phrases.

More specifically, I will develop the (theory-neutral) notion of a pattern as a linear string of formal features, and patternization as a method to systematically and exhaustively describe, classify and quantify word order variation in the noun phrase, and beyond. At the outset, patternization provides a purely mathematical perspective, examining (annotation) categories in terms of numbers, combinatorics, distribution, etc. Since it is intended to be a recursive procedure, however, it will successively produce results that are also syntactically significant. In principle, the method can be carried out manually, but in practice it requires more computational power; the presentation will therefore alternate between describing the idea itself and a Python tool (also currently under development) used to perform the various tasks.
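To make the notion of a pattern concrete, here is a minimal sketch in Python. It is not the project's actual tool: the annotation format (a noun phrase as a list of word form and category pairs) and the names `noun_phrases` and `pattern_of` are assumptions made for illustration only. The word forms are taken from example (1) below; the three-phrase list is a toy sample, not corpus data.

```python
from collections import Counter

# Hypothetical annotation format (an assumption, not the project's schema):
# each noun phrase is a list of (word form, category) pairs in surface order.
noun_phrases = [
    [("sína", "Poss"), ("fullkomna", "Adj"), ("vináttu", "N")],
    [("fullkomna", "Adj"), ("vináttu", "N"), ("sína", "Poss")],
    [("sinni", "Poss"), ("vináttu", "N"), ("fullkominni", "Adj")],
]


def pattern_of(noun_phrase):
    """Reduce an annotated noun phrase to its pattern:
    the linear string of category labels, ignoring the word forms."""
    return tuple(category for _form, category in noun_phrase)


# Count how often each pattern occurs in the (toy) sample.
pattern_counts = Counter(pattern_of(phrase) for phrase in noun_phrases)

for pattern, count in pattern_counts.items():
    print(" ".join(pattern), count)
```

Representing a pattern as a tuple of labels keeps it hashable, so patterns can be counted, compared and used as dictionary keys without reference to any particular syntactic theory.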

For a brief illustration of (one aspect of) the idea, consider the following example (found by accident in the MÍM[1] corpus):

 

(1)  a.    Poss          Adj           N             (Old Icelandic)
           sína          fullkomna     vináttu
           his           perfect       friendship

     b.    Adj           N             Poss
           fullkomna     vináttu       sína

     c.    Adj           Poss          N
           fullkominni   sinni         vináttu

     d.    N             Poss          Adj
           vináttu       sinni         fullkominni

     e.    Poss          N             Adj
           sinni         vináttu       fullkominni

     f.    N             Adj           Poss
     *(?)  vináttu       fullkomna     sína

Three elements allow for 3! = 3 × 2 × 1 = 6 permutations; in the above case, five of those six are attested, which allows us to put a number on syntactic diversity. Interesting though this may be, it applies only in a minimal domain. Patternization goes two steps further and examines sets of permutations of any length for any category. Even though many individual results may not be as interesting as the one above, the method as such is exhaustive. As will be shown, the outputs of this operation can serve as input to further procedures, allowing us to successively develop a multifaceted profile of the noun phrase landscape (for a specific language / corpus / text …).
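The arithmetic for example (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool presented in the talk: the function name `diversity` and the tuple encoding of patterns are hypothetical, and the attested set simply lists orders (1a) to (1e).

```python
from itertools import permutations

CATEGORIES = ("Poss", "Adj", "N")

# Orders (1a)-(1e) from the example; (1f) N Adj Poss is the unattested one.
ATTESTED = {
    ("Poss", "Adj", "N"),
    ("Adj", "N", "Poss"),
    ("Adj", "Poss", "N"),
    ("N", "Poss", "Adj"),
    ("Poss", "N", "Adj"),
}


def diversity(categories, attested):
    """Compare attested orders against all n! logically possible orders.

    Returns (number attested, number possible, sorted list of missing orders).
    """
    possible = set(permutations(categories))
    found = possible & set(attested)
    missing = sorted(possible - found)
    return len(found), len(possible), missing


found, total, missing = diversity(CATEGORIES, ATTESTED)
print(f"{found} of {total} permutations attested")
print("missing:", [" ".join(order) for order in missing])
```

Because `diversity` accepts any tuple of category labels, the same check could in principle be run on category sets of other lengths, although the number of possible orders grows factorially with the number of categories.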

 
Published Jan. 28, 2020 4:17 PM - Last modified Jan. 28, 2020 4:35 PM