Perspectives on corpus linguistics

Perspectives on corpus linguistics. Ed. by Vander Viana, Sonia Zyngier, and Geoff Barnbrook. (Studies in corpus linguistics 48.) Amsterdam: John Benjamins, 2011. Pp. xvi, 256. ISBN 9789027203533. $143 (Hb).

Reviewed by Ksenia Shilikhina, Voronezh State University

This book comprises fourteen interviews with leading corpus linguistics scholars (namely, Guy Aston, Paul Baker, Tony Berber Sardinha, Susan Conrad, Mark Davies, Stefan Th. Gries, Ken Hyland, Stig Johansson, Sara Laviosa, Geoffrey N. Leech, William Ernest Louw, Geoffrey Sampson, Mike Scott, and John M. Swales) who present their theoretical positions and practical experience within the field of corpus research. Seven questions are addressed to all contributors; three questions concern specific areas of corpus linguistics and its applications (e.g. using corpora in translation, historical linguistics, crosslinguistic research, or teaching languages). The contributors discuss the origin of corpus linguistics, its present status, methods of corpus research, and the choice of research questions.

Responses to the question of the status of corpus linguistics reveal the diversity of views that coexist in the field. Some contributors consider corpus linguistics to be both a science and a methodology. However, not everyone supports this view: to say that corpus linguistics is a methodology is too little; to call it a science is too much. As a compromise, Conrad suggests that corpus linguistics can be classified as ‘an approach to studying languages’ (49). It incorporates multiple projects with different research goals and different amounts of effort put into the work.

The contributors discuss the controversial issue of corpus representativeness and suggest ways to improve it, e.g. increasing corpus size or working out criteria for an adequate description of communities of language users. Another thorny question concerns the role of intuition in conducting a corpus study. Intuition cannot be fully excluded from linguistic research; however, because it is unreliable, it should not precede a corpus search.

The use of corpora is not a cure-all for linguistics, which is why each interview includes a question about the strengths and weaknesses of corpus research. Among the strong points, the linguists name the authenticity of data and the availability of statistical methods of data analysis: this combination allows for studying patterns of language use. Representativeness of corpora remains the major weakness of corpus linguistics.

Computer technologies and linguistic corpora transform our view of how language is used. They also change the analytical techniques of linguistic research. Perhaps the most important question, therefore, concerns the technological and linguistic future of corpus linguistics. According to the contributors, corpora can contribute to practical research in multiple natural language processing applications, such as machine translation systems or ontologies.

The final three questions address various topics of corpus research, including types of existing corpora, techniques of data annotation, comparability of corpora, use of statistics in data analysis, public availability of corpora, and the level of technical expertise required for creating and using corpora.

Each chapter comprises an interview and an essay, and the variety of responses shows the diversity of approaches as well as the gains and losses of corpus linguistics. This book is a timely publication. The rapid growth of the field in recent decades, together with the research ambitions and the variety of approaches presented in the book, reflects the need to clarify the core concepts and research approaches in corpus linguistics.