Cand. Sc. (Philology), Associate Professor Shingareva M.Yu., Master's student Zhandarbekova A.
Regional Social-Innovational University
The Potential and Goals of Corpus Stylistic Analyses
Corpus linguistic analyses generate
data and evidence for claims which are inherently empirical, quantitative and
probabilistic. Qualitative statements are results of the interpretation of this
data. This enlarges the knowledge of the data and of the analytic techniques,
so that the corpus linguistic approach to an analysis conforms to Lakatos' position
that 'empiricalness (or scientific character) and theoretical progress are
inseparably connected' [1, p. 123].
Corpus
linguistic analyses reveal tendencies and probabilities in language by way of
electronically generated quantitative data. These tendencies and probabilities
are the results of generalizations of data extracted from a text or a corpus by
electronic means. Absolute statements, such as 'feature x in corpus y occurs in z percent of all cases', are the basis for
hypotheses and generalizations on the usage of a specific feature in a
particular linguistic context. Such generalizations can extend only to the
type of data covered by the original analysis. This means that, for
example, the analysis of a representative corpus of literary texts can lead to
conclusions on general literary language as represented by the corpus, but not
on general language usage.
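A statement of the form 'feature x in corpus y occurs in z percent of all cases' can be illustrated with a minimal sketch. The tokenizer, the function name percent_of_tokens and the two-sentence toy corpus are assumptions made for illustration only; they are not taken from the analyses discussed here, and a real study would use a far larger, representative corpus.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())

def percent_of_tokens(corpus_texts, feature):
    """Return the share of all tokens in the corpus that equal `feature`, in percent."""
    tokens = [tok for text in corpus_texts for tok in tokenize(text)]
    if not tokens:
        return 0.0
    return 100.0 * tokens.count(feature) / len(tokens)

# Hypothetical toy corpus; the figure it yields describes this corpus only.
toy_corpus = [
    "She was very happy and very kind.",
    "He was not very happy.",
]
share = percent_of_tokens(toy_corpus, "very")
print(f"'very' accounts for {share:.1f} percent of all tokens")
```

The resulting percentage is an absolute statement about the toy corpus. Any generalization beyond it would hold only for the kind of language that the corpus represents.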
Language is an open system. This means
that a corpus usually contains only a selection of the possible language of the
variety in question. Consequently, statements on language are mostly
generalizations of features that occur in the corpus. They show the probability
with which the feature occurs in the language variety represented by the
corpus.
The only corpora to which this does not
apply are corpora which consist of the entire language of a particular
variety. Austen is an example of
this kind of corpus as it includes all novels by the author Jane Austen.
Analysing this corpus allows absolute statements to be made on Austen's language
in her novels, but not on her language in other writings, such as her letters,
since these are not represented by the corpus.
Generalizations on linguistic patterns
from a representative corpus are one of the potentials of and, indeed, an
explicit goal in corpus linguistics. This is because insights into the language
system cannot be gained from the analysis of one text only. Generalizations
based on the analysis of corpora allow a determination of the significance of
linguistic patterns in the language data and a selection of patterns for
further detailed analyses. They permit insights into the language system and
the encoding of meaning in language. It is the analysis of large amounts of
language data that makes studying actual language use possible.
Since statements on the usage of
language are generalizations, they are inherently probabilistic. This is
because the absolute frequency of a feature varies between different texts and
corpora; a particular feature might occur several times in some, but not at all
in other data. Thus, corpus linguistic analyses establish the average frequency
with which a feature is used in language.
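The variation between texts described above can be sketched as follows. The example normalizes the frequency of a feature per 1,000 tokens in each text and then averages these rates. The function per_thousand, the three toy texts and the choice of a per-1,000 normalization are assumptions for illustration, not the procedure of a particular study.

```python
import re
from statistics import mean

def per_thousand(text, feature):
    """Frequency of `feature` per 1,000 tokens of `text` (simple word tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000.0 * tokens.count(feature) / len(tokens)

# Hypothetical toy texts; the feature is frequent in one and absent from another.
texts = {
    "a": "Indeed it was so indeed",
    "b": "It was not so at all",
    "c": "She was indeed very glad to see him once more",
}

# Normalized rate per text, then the average across texts.
rates = {name: per_thousand(text, "indeed") for name, text in texts.items()}
average_rate = mean(list(rates.values()))

for name, rate in rates.items():
    print(f"text {name}: {rate:.1f} per 1,000 tokens")
print(f"average: {average_rate:.1f} per 1,000 tokens")
```

The average summarizes a tendency. It does not predict the frequency in any single text, which is why statements based on it remain probabilistic.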
Whether a specific linguistic pattern
occurs in the data is influenced by several factors, including sociolinguistic
ones, such as the participants of a conversation, its situation and its
purpose. Other factors influencing language use are, for example, differences
between written and spoken language, and genre or general linguistic
conventions, for example, on how something is typically phrased. Every
utterance is a reaction to previously made utterances [2, p. 4], since conventions for
language use are established by continuous and repeated usage in particular
contexts. Language does not stand in isolation, but refers to previously used
language. This situates an utterance within the language system of production
and reception and makes corpus linguistic analyses inherently comparative in
nature.
In corpus stylistics, we can use the same methods to analyse
both an individual text and a corpus. This allows us
· to develop analytic techniques for investigating various research questions,
· to evaluate the success of different research techniques for different sets of data, and
· to gain new literary and structural insights into the data.
The evaluation of the use of corpus linguistic techniques
for the study of literature is based on two criteria:
· Can literary insights into the data that have been published be reproduced by using corpus linguistic techniques in the analysis?
· Is it possible to gain new insights into the data by using new analytic, that is, corpus linguistic, techniques?
The latter question is not only the greater challenge but
also of greater importance for the evaluation of the analyses. Only its
affirmation legitimizes corpus stylistics, as its negation would also negate the
usefulness of the analyses. The mere reproduction of knowledge would make using
corpus linguistic techniques in the analysis of literature unnecessary.
As the analyses show, however, it is possible to gain new
insights into the data by using corpus linguistic techniques for their
analysis. One example of a new insight into the data is the contribution of the
novel's frequent phrases to characterizing people and places. This shows that
identifying linguistic patterns in large quantities of data, such as a novel or
a corpus, by using corpus linguistic techniques, not only complements
traditional techniques and methods in the analysis of literature by reproducing
findings, but that the analyses also expand findings from previous research and
offer new insights into the data.
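Identifying frequent phrases of the kind mentioned above can be sketched as counting recurring word sequences (n-grams). The function frequent_phrases, its parameters n and min_count, and the toy text are assumptions for illustration; they do not reproduce the software or settings used in the analyses.

```python
import re
from collections import Counter

def frequent_phrases(text, n=3, min_count=2):
    """Return (phrase, count) pairs for word n-grams occurring at least `min_count` times.

    Note: this simple version ignores punctuation, so n-grams may span
    sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n tokens across the text.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]

# Hypothetical toy text standing in for a novel.
toy_text = "I do not know what to say. I do not know why. She did not know."
phrases = frequent_phrases(toy_text, n=3, min_count=2)
for phrase, count in phrases:
    print(count, phrase)
```

Such a list is only a starting point. Whether a recurring phrase contributes to characterizing a person or a place has to be established by interpreting its occurrences in context.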
The first question posed earlier can also be answered in
the affirmative, as previously generated results can be replicated using corpus
linguistic techniques for the analysis. Reproducing knowledge about the data
increases the significance of new findings, since their plausibility has
already been asserted.
New insights into the data can be gained since (1) the data
is studied in a detailed and systematic way and (2) a larger number of units of
meaning in language is analysed than in literary studies. While literary
studies have traditionally looked at a text as a unit of meaning, corpus
linguistics looks at more than one unit as carriers of meaning. Words, phrases,
text parts and the text itself are units of meaning which all contribute to the
literary meanings of the data. They are therefore the objects of the analyses.
Analysing more than one of these units of meaning gives a
more comprehensive view of the data than the analysis of a unit of only one
kind. This multi-dimensional approach to an analysis in corpus linguistics and
corpus stylistics stands in contrast to the traditional approaches to text
analysis in literary studies and in other linguistic disciplines, and is made possible
only because software is used for the analyses. This multi-dimensionality is
one of the major features of corpus linguistics and one of the foundations of
the success and the potential of corpus stylistics for analysing literary
texts.
Literature:
1. Lakatos, I. (1970), 'Falsification and the methodology of scientific research programmes', in I. Lakatos and A. Musgrave (eds), Criticism and the Growth of Knowledge. Cambridge: Cambridge University Press, pp. 91-196.
2. Teubert, W. (2005), 'My version of corpus linguistics'. International Journal of Corpus Linguistics, 10 (1), 1-13.