Corpora in language acquisition research

Corpora in language acquisition research: History, methods, perspectives. Ed. by Heike Behrens. (Trends in language acquisition research 6.) Amsterdam: John Benjamins, 2008. Pp. xxx, 234. ISBN 9789027290267. $128 (Hb).

Reviewed by Dimitrios Ntelitheos, United Arab Emirates University

This collection of papers focuses on the contribution of corpus studies to the development of language acquisition research. Despite the long-standing significance of corpus databases in language development research, little work has been done examining the processes involved in data archiving and mining or the validity of statistical generalizations based on language acquisition corpora.

Heike Behrens provides an introductory chapter discussing the history, methodology, and perspectives of corpora in language acquisition research in detail. After a brief historical survey of the field that describes the development of sampling methods from systematic diary studies at the end of the nineteenth century to current computerized longitudinal studies, she discusses methods of data archiving and sharing, information retrieval methods, and quality control.

Caroline Rowland, Sarah Fletcher, and Daniel Freudenthal, in ‘How big is big enough?’, assess the reliability of data from naturalistic samples and the validity of results from longitudinal studies, which in general sample only a very small percentage of the linguistic output of children. To maximize the reliability of child acquisition corpus studies, they propose statistical methods that can assess the amount of data required and different types of sampling regimes.

Dorit Ravid, Wolfgang Dressler, Bracha Nir, Katharina Korecky-Kröll, Agnita Souman, Katja Rehfeldt, Sabine Laaha, Johannes Bertl, Hans Basbøll, and Steven Gillis discuss the form of plurals in the input transmitted by caretakers to children in the first stages of language acquisition. They show that plurals in child-directed speech are much more predictable and regular than in the fully mature adult system, and that children’s plurals are quantitatively very close to the input they receive from their caretakers.

Elena Lieven provides a usage-based account of the development of the English auxiliary system that tracks six children’s productive use of auxiliaries between the ages of 2;0 and 3;0. She shows that the developmental process begins as children establish productive frames around specific auxiliary forms, then later integrate tense and agreement information from these frames into a ‘system-wide’ grammar. Her account provides an alternative to earlier approaches that are based on abstract linguistic representations of the auxiliary system.

Shanley Allen, Barbora Skarabela, and Mary Hughes examine the use of corpora in determining discourse effects in syntax. They claim that naturalistic corpus research is essential for studying the effects of discourse factors on argument omission in children’s speech. Much of the chapter concerns the effect of information flow on argument realization in adult and child speech.

Padraic Monaghan and Morten Christiansen reformulate the poverty of stimulus argument by investigating interacting phonological and semantic cues in the acquisition of syntax. When these cues are considered together, they provide the child with adequate and reliable information about syntactic categories.

Brian MacWhinney discusses how the Child Language Data Exchange System (CHILDES) database can be enriched for advanced morphosyntactic analysis. After surveying improvements to the database, he shows that MOR grammars, which introduce morphological information in corpus transcripts, can be implemented successfully to calculate morphological development indices.

Finally, Katherine Demuth summarizes issues related to corpora for language acquisition research: e.g. corpus size, longitudinal case studies, early production data, the nature of input, and the relevance of discourse content.

This collection is a welcome contribution to the field of language acquisition, as it critically examines one of the most important sources of data in acquisition research. It is essential reading for anyone creating new longitudinal corpora of children acquiring a first language, as well as a valuable source of information about language acquisition and corpus studies.