Exploring new approaches to the corpus-based contrastive study of hedging strategies in spoken language

Stine Hulleberg Johansen

In recent years, corpus linguistics and pragmatics have begun exploring their common ground. However, this has not been altogether straightforward, mainly because “core features of pragmatics studies [...] are harder to catch with corpus methodology than lexical or morpho-syntactic features” (Taavitsainen & Jucker, 2015: 12). One such core feature is hedging. Hedging strategies can take almost any linguistic (or paralinguistic) form, and hedging is not an inherent property of words or phrases (Stenström, 1994). Thus, identifying hedging strategies is challenging without a pragmatically annotated corpus.

Consequently, in the absence of pragmatically annotated corpora, two main ways of studying pragmatic phenomena through corpus linguistic methodologies have emerged. One is the form-to-function approach that starts from pre-defined lexical words or constructions whose potential pragmatic uses are examined (Aijmer & Rühlemann, 2015). The second is the function-to-form approach, which starts from a language function and investigates the forms used to perform that function.

Although these approaches are presented as equally relevant in the literature, researchers show a clear preference for the former. This is not surprising, as a major challenge with the latter approach is that the function itself cannot be retrieved; only the surface forms orbiting it can be used to identify the function in the corpus. This raises the question of whether the function-to-form approach is a realistic methodological alternative. Moreover, there is a need to understand how this approach actually manifests itself and how it can be applied in corpus-based contrastive studies of pragmatic phenomena to capture cross-linguistic variation.

The present study explores one potential application of the function-to-form approach. By searching for certain characteristics of situations where hedging strategies tend to occur, the study aims to retrieve various realisations of hedging strategies. More specifically, by using the conventionalised direct non-performative refusal strategy no (English) and the corresponding nei (Norwegian) (Beebe, Takahashi, & Uliss-Weltz, 1990) as well as the conjunction but (English) and men (Norwegian) signalling contrast or denial of expectation (Blakemore, 1989) as framing devices, the aim is to identify co-occurring hedging strategies in these face-threatening situations. This leads to the following research questions:

RQ1: How can we identify framing devices for extracting pragmatic functions from corpora?
A. conventionalised realisations of speech acts
B. explicit signals of contradiction/contrast
RQ2: Will this application of the function-to-form approach work across languages (Norwegian and English), allowing for a comparison of two or more languages?

The choice of no/nei and but/men as tools for retrieving hedging strategies is rooted in pragmatic research on speech acts and politeness. Politeness is considered a primary motivation for using hedging strategies in conversations (Markkanen & Schröder, 1997). Moreover, refusals have proven to be intrinsically face-threatening across various cultures (Demirkol, 2016). Thus, it is likely that hedging strategies will co-occur with refusals as a way of softening the blow. Similarly, saying something that contradicts or contrasts with what has previously been said can also threaten the hearer’s positive face (Brown & Levinson, 1987). Even contradicting oneself is considered threatening to the speaker’s positive face. Thus, identifying conventionalised realisations of refusals or contradictions can be instrumental in locating hedging strategies within a corpus.

In this study, direct refusals will be retrieved from four spoken corpora: BNC2014, Nordic Dialect Corpus (NDC), Norwegian Speech Corpus (NoTa) and the BigBrother corpus (BB). Only the conversational parts of the NDC and NoTa will be used, to make the data more comparable. There are no bidirectional or directly comparable corpora of spoken Norwegian and English; the corpora in this study are therefore chosen based on their degree of comparability and their availability. This allows for a comparison of the results between the two languages.

In the study, 150 random instances each of nei and no and 150 random instances each of men and but were drawn from the corpora of the respective languages. Table 1 shows the number of nei/no used as refusals and the number of contrastive uses of men/but among the 150 instances in each language.
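
The sampling step can be sketched in Python as follows. This is only an illustrative sketch, not the procedure actually used in the study: it assumes the corpus is available as a plain list of utterance strings, and the function names sample_instances and context, the fixed seed and the toy utterances are all hypothetical.

```python
import random
import re


def sample_instances(utterances, device, n=150, seed=0):
    """Randomly sample up to n occurrences of a framing device.

    The device is matched as a whole word, ignoring case. Returns a sorted
    list of (utterance_index, character_offset) pairs, one per occurrence.
    """
    pattern = re.compile(rf"\b{re.escape(device)}\b", re.IGNORECASE)
    hits = [
        (i, m.start())
        for i, utt in enumerate(utterances)
        for m in pattern.finditer(utt)
    ]
    if len(hits) <= n:
        return hits  # fewer occurrences than requested: keep them all
    # A fixed seed makes the random sample reproducible (an assumption here).
    return sorted(random.Random(seed).sample(hits, n))


def context(utterances, hit, before=1, after=1):
    """Return the utterance containing the hit plus its neighbouring utterances."""
    i, _ = hit
    return utterances[max(0, i - before): i + after + 1]


# Invented toy utterances, for illustration only (not corpus data).
toy = [
    "do you want to come along",
    "no I think I might stay in",
    "butter is on the table",
    "yeah but maybe later",
]
hits = sample_instances(toy, "but")
```

Each sampled instance would still have to be inspected manually, together with its context, to decide whether the device is used as a refusal or a contrast and whether a hedging strategy co-occurs with it.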

Although 38.7 % and 32.9 % of the occurrences of nei and no, respectively, were hedged, only 116 of the 300 instances of nei and no were used as refusals in the data. This indicates that in order to use nei and no as framing devices, one would have to manually process a great deal of data to retrieve a sensible amount and variety of hedging strategies. In contrast, men and but showed more promising numbers, with 284 relevant instances, of which 68.9 % and 56.4 %, respectively, co-occurred with hedging strategies. Example 1 below illustrates a typical example of hedging strategies (italicised) co-occurring with but in the English dataset.

Example 1 from BNC2014

S0598: you 're nearly an adult --ANONnameF
S0596: I am an adult […]
S0596: I am an adult I can vote
S0598: yeah but what I mean is like you can still say that you 're a teenager though cos eighteen

Preliminary results suggest that conventionalised realisations of face-threatening speech acts and the like can be used to identify other language functions in a corpus. However, the framing device must be carefully selected and tested. In this study, both no/nei and but/men co-occurred with hedging strategies, although but and men returned the highest number of hedging strategies and the greatest variation between realisations. However, more data need to be analysed to confirm this. Furthermore, using this approach to study how a particular function, with potentially indefinite realisations, is realised can be fruitful in the absence of pragmatically annotated corpora, particularly to capture linguistic variation across languages.

