Candidate of Philological Sciences, Associate Professor M. Yu. Shingareva; Master's student A. Zhandarbekova

Regional Social-Innovational University

The Potential and Goals of Corpus Stylistic Analyses

Corpus linguistic analyses generate data and evidence for claims that are inherently empirical, quantitative and probabilistic. Qualitative statements result from the interpretation of this data. Interpretation enlarges knowledge both of the data and of the analytic techniques, so that the corpus linguistic approach to an analysis conforms to Lakatos' position that 'empiricalness (or scientific character) and theoretical progress are inseparably connected' [1, p. 123].

Corpus linguistic analyses reveal tendencies and probabilities in language by way of electronically generated quantitative data. These tendencies and probabilities result from generalizing over data extracted from a text or a corpus by electronic means. Absolute statements, such as 'feature x in corpus y occurs in z percent of all cases', are the basis for hypotheses and generalizations on the usage of a specific feature in a particular linguistic context. Such generalizations can only extend to the type of data used in the original analysis. For example, the analysis of a representative corpus of literary texts can lead to conclusions on general literary language as represented by the corpus, but not on general language usage.
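An absolute statement of the kind 'feature x in corpus y occurs in z percent of all cases' can be illustrated with a minimal sketch. It assumes that the feature is a single word form and that 'all cases' means all word tokens in the corpus. The tokenizer, the function names and the two-sentence corpus are hypothetical and serve illustration only; they are not the procedure used in any particular study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple scheme)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def relative_frequency(feature, texts):
    """Return the percentage of all tokens in `texts` that equal `feature`."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    total = sum(counts.values())
    # Guard against an empty corpus to avoid division by zero.
    return 100.0 * counts[feature] / total if total else 0.0


# Hypothetical mini corpus, invented for illustration only.
corpus = ["The letter came in the morning.", "She read the letter twice."]
share = relative_frequency("the", corpus)
print(f"'the' accounts for {share:.1f} percent of all tokens")
```

The resulting figure describes only the texts that were counted. Any wider claim based on it remains a generalization about the variety that the corpus represents.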

Language is an open system. This means that a corpus usually contains only a selection of the possible language of the variety in question. Consequently, statements on language are mostly generalizations of features that occur in the corpus. They show the probability with which the feature occurs in the language variety represented by the corpus.

The only corpora to which this does not apply are those that consist of the entire language of a particular variety. Austen is an example of this kind of corpus, as it includes all novels by the author Jane Austen. Analysing this corpus allows absolute statements to be made on Austen's language in her novels, but not on her language in other writings, such as her letters, since these are not represented by the corpus.

Generalizing about linguistic patterns from a representative corpus is one of the potentials of corpus linguistics and, indeed, one of its explicit goals. This is because insights into the language system cannot be gained from the analysis of one text only. Generalizations based on the analysis of corpora make it possible to determine the significance of linguistic patterns in the language data and to select patterns for further detailed analyses. They permit insights into the language system and the encoding of meaning in language. It is the analysis of large amounts of language data that makes studying actual language use possible.

Since statements on the usage of language are generalizations, they are inherently probabilistic. This is because the absolute frequency of a feature varies between different texts and corpora; a particular feature might occur several times in some, but not at all in other data. Thus, corpus linguistic analyses establish the average frequency with which a feature is used in language.
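The variation between texts and the resulting average can be sketched as follows. The sketch assumes that frequencies are normalized per 1,000 tokens so that texts of different length become comparable. The texts, the chosen feature and the function name are hypothetical.

```python
import re
from statistics import mean


def per_thousand(feature, text):
    """Return the frequency of `feature` per 1,000 word tokens of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(feature) / len(tokens)


# Hypothetical texts, invented for illustration: the feature occurs in one
# text but not at all in the other.
texts = {
    "text_a": "very good very kind indeed",
    "text_b": "a quiet walk in town",
}

# Normalized frequency of the feature in each individual text.
rates = {name: per_thousand("very", text) for name, text in texts.items()}

# Unweighted mean over the texts: every text counts equally, whatever its length.
average = mean(list(rates.values()))

print(rates)
print(average)
```

The mean here is taken over the per-text rates, so every text counts equally. Pooling all tokens before dividing would weight longer texts more heavily; which of the two is appropriate depends on the research question.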

Whether a specific linguistic pattern occurs in the data is influenced by several factors, including sociolinguistic ones, such as the participants in a conversation, its situation and its purpose. Other factors influencing language use are differences between written and spoken language, genre, and general linguistic conventions, for example, on how something is typically phrased. Every utterance is a reaction to previously made utterances [2, p. 4], since conventions for language use are established by continuous and repeated usage in particular contexts. Language does not stand in isolation, but refers to previously used language. This situates an utterance within the language system of production and reception and makes corpus linguistic analyses inherently comparative in nature.

In corpus stylistics, we can use the same methods to analyse both an individual text and a corpus. This allows us

· to develop analytic techniques for investigating various research questions,

· to evaluate the success of different research techniques for different sets of data, and

· to gain new literary and structural insights into the data.

The evaluation of the use of corpus linguistic techniques for the study of literature is based on two criteria:

· Can literary insights into the data that have been published be reproduced by using corpus linguistic techniques in the analysis?

· Is it possible to gain new insights into the data by using new analytic, that is, corpus linguistic, techniques?

The latter question is not only the greater challenge; it is also of greater importance for the evaluation of the analyses. Only its affirmation legitimizes corpus stylistics, as its negation would also negate the usefulness of the analyses. The mere reproduction of knowledge would make using corpus linguistic techniques in the analysis of literature unnecessary.

As the analyses show, however, it is possible to gain new insights into the data by using corpus linguistic techniques for their analysis. One example of a new insight into the data is the contribution of the novel's frequent phrases to characterizing people and places. This shows that identifying linguistic patterns in large quantities of data, such as a novel or a corpus, by using corpus linguistic techniques not only complements traditional techniques and methods in the analysis of literature by reproducing findings, but also expands findings from previous research and offers new insights into the data.
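How frequent phrases might be identified can be sketched as follows. The sketch assumes that a phrase is a contiguous sequence of n word tokens (an n-gram) and that 'frequent' means occurring at least a chosen number of times. The function name, the thresholds and the sample sentence are hypothetical and are not taken from the analyses discussed here.

```python
import re
from collections import Counter


def frequent_phrases(text, n=4, min_count=2):
    """Return (phrase, count) pairs for all n-token sequences occurring at least min_count times."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Slide a window of n tokens over the text to form the n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(phrase, count) for phrase, count in counts.most_common() if count >= min_count]


# Hypothetical sample, invented for illustration only.
sample = "I do not know what to say. I do not know what to think."
for phrase, count in frequent_phrases(sample):
    print(count, phrase)
```

A list of this kind is only a starting point. Whether a recurrent phrase contributes to characterizing a person or a place has to be established by reading its occurrences in context.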

Also, the first question posed earlier can be answered in the affirmative, as previously generated results can be replicated using corpus linguistic techniques for the analysis. Reproducing knowledge about the data increases the significance of new findings, since their plausibility has already been asserted.

New insights into the data can be gained since (1) the data is studied in a detailed and systematic way and (2) a larger number of units of meaning in language is analysed than in literary studies. While literary studies have traditionally looked at a text as a unit of meaning, corpus linguistics looks at more than one unit as carriers of meaning. Words, phrases, text parts and the text itself are units of meaning which all contribute to the literary meanings of the data. They are therefore the objects of the analyses.

Analysing more than one of these units of meaning gives a more comprehensive view of the data than the analysis of a unit of only one kind. This multi-dimensional approach to an analysis in corpus linguistics and corpus stylistics stands in contrast to the traditional approaches to text analysis in literary studies and in other linguistic disciplines, and is made possible only because software is used for the analyses. This multi-dimensionality is one of the major features of corpus linguistics and one of the foundations of the success and the potential of corpus stylistics for analysing literary texts.

References:

1. Lakatos, I. (1970), 'Falsification and the methodology of scientific research programmes', in I. Lakatos and A. Musgrave (eds), Criticism and the Growth of Knowledge. Cambridge: Cambridge University Press, pp. 91-196.

2. Teubert, W. (2005), 'My version of corpus linguistics'. International Journal of Corpus Linguistics, 10 (1), 1-13.