{"id":1796,"date":"2011-10-01T10:00:38","date_gmt":"2011-10-01T08:00:38","guid":{"rendered":"http:\/\/elanguage.net\/blogs\/booknotices\/?p=1796"},"modified":"2011-09-19T09:30:18","modified_gmt":"2011-09-19T07:30:18","slug":"corpus-linguistics","status":"publish","type":"post","link":"https:\/\/journals.linguisticsociety.org\/booknotices\/?p=1796","title":{"rendered":"Corpus linguistics"},"content":{"rendered":"<div style=\"margin-left: 2em; text-indent: -2em;\"><strong>Corpus linguistics: <\/strong>Readings in a widening discipline. Ed. by <strong>Geoffrey Sampson<\/strong> and <strong>Diana McCarthy<\/strong>. London: Continuum, 2005. Pp. xv, 524. ISBN <a href=\"http:\/\/www.worldcat.org\/title\/corpus-linguistics-readings-in-a-widening-discipline\/oclc\/213223928&amp;referer=brief_results\">9780826488039<\/a>. $60.<\/div>\n<p style=\"text-align: right;\">Reviewed\u00a0by <strong><a href=\"http:\/\/linguistlist.org\/people\/personal\/get-personal-page2.cfm?PersonID=122191\">Carmela Chateau<\/a><\/strong>, <em>Universit\u00e9 de Bourgogne<\/em><\/p>\n<p>Corpus linguists generally start their careers as linguists or computer scientists. Researchers from vastly different backgrounds will find this book of great assistance in learning more about the sources of the discipline, and it will prove invaluable for students just starting out as corpus linguists. This reader contains forty-two key texts in chronological order spanning fifty years. The volume opens with a general introduction, and each paper has its own brief introduction setting it in historical context.<\/p>\n<p>The first article predates electronic corpora: <strong>Charles Carpenter Fries<\/strong> (1952) used recordings of telephone conversations (about 250,000 words) to investigate the structure of English in use. The subcorpus of 72,000 words used by <strong>F. G. A. M. Aarts<\/strong> (1971) contained some spoken texts. 
<strong>Bengt Altenberg<\/strong> (1986) worked on chunking spoken English into natural units as part of a text-to-speech (TTS) program; <strong>Louis C.W. Pols et al.<\/strong> (1998) explored the use of authentic corpora to improve such programs. <strong>Peter C. Collins<\/strong> (1987) examined differences between spoken and written English using the Lancaster-Oslo-Bergen (LOB) and London-Lund one-million-word corpora, constructed along the lines of the Brown corpus presented by <strong>W. Nelson Francis<\/strong> (1965). <strong>Geoffrey Leech<\/strong> and <strong>Roger Fallon<\/strong> (1992) examined the cultural aspects revealed by the Brown (American) and LOB (British) comparable corpora.<\/p>\n<p>Corpus construction was discussed by <strong>John Sinclair<\/strong> (1987), a key figure in the creation of the Collins Birmingham University International Language Database (COBUILD) Bank of English. <strong>Douglas Biber<\/strong> (1992) showed how statistics can be used to confirm the representativeness of a corpus. Statistics were also brought into play by <strong>William Gale<\/strong> and <strong>Kenneth Church<\/strong> (1989) and by <strong>Peter F. Brown et al.<\/strong> (1990), who investigated parallel corpora for machine translation. <strong>Jean Carletta<\/strong> (1996) suggested using the kappa statistic to assess interannotator reliability. <strong>Donald Hindle<\/strong> and <strong>Mats Rooth<\/strong> (1993) investigated parsing, finding that in some cases there could be no single correct answer.<\/p>\n<p>The treebank approach to parsing corpora was presented by <strong>Mitchell P. Marcus et al.<\/strong> (1993). <strong>E.J. Briscoe<\/strong> and <strong>J.A. Carroll<\/strong> (1995) evaluated a probabilistic parser. The topic of treebanks was discussed in greater depth by <strong>Eugene Charniak<\/strong> (1996) and by <strong>Geoffrey Sampson<\/strong> (1999). 
Another approach to treebanks, for Czech, was presented by <strong>Alena B\u00f6hmov\u00e1<\/strong> and <strong>Eva Haji\u010dov\u00e1<\/strong> (1999). A Swedish corpus was discussed by <strong>Staffan Hellberg<\/strong> (1991), and <strong>Anthony McEnery<\/strong> (2001) made the case for corpus research into nonindigenous minority languages (NIMLs). <strong>Estelle Campione<\/strong> and <strong>Jean V\u00e9ronis<\/strong> (2001) presented spoken French corpora, semiautomatically tagged for intonation. <strong>Esther Grabe<\/strong> and <strong>Brechtje Post<\/strong> (2002) looked at Intonational Variation in English (IViE). <strong>Ossi Ihalainen<\/strong> (1991) also investigated a British dialect, while <strong>Jan Tent<\/strong> and <strong>France Mugler<\/strong> (1996) discussed the creation of the Fiji component of the International Corpus of English (ICE).<\/p>\n<p><strong>Douglas Biber<\/strong> and <strong>Edward Finegan<\/strong> (1987) examined English from a historical viewpoint, as did <strong>Matti Rissanen<\/strong> (1991). Various idiosyncratic aspects of spoken English were also investigated: <strong>Ingrid Kristine Hasund<\/strong> and <strong>Anna-Brita Stenstr\u00f6m<\/strong> (1996) looked into female disputes; <strong>Anthony McEnery et al.<\/strong> (1998) focused on swearing; <strong>Christopher C. Werry<\/strong> (1996) examined Internet Relay Chat (IRC); <strong>David McKelvie<\/strong> (1998) studied disfluency; and <strong>Mark G. Core<\/strong> (1998) investigated the use of Dialog Act Markup in Several Layers (DAMSL) utterance tags to explore speech acts.<\/p>\n<p><strong>Gavin Burnage<\/strong> and <strong>Dominic Dunlop<\/strong> (1992) were involved in encoding the British National Corpus (BNC). <strong>Jean Carletta et al.<\/strong> (2000) used XML for linguistic annotation. <strong>L.W.M. Bod<\/strong> and <strong>R.J.H. Scha<\/strong> (1996) provided an overview of data-oriented language processing. 
<strong>Gill Francis<\/strong> (1993) and <strong>William Louw<\/strong> (1993) used the COBUILD corpus to produce a new, corpus-driven grammar of English and to investigate semantic prosody, respectively.<\/p>\n<p>Corpora have also been used to produce dictionaries for language learners. <strong>Philip Resnik<\/strong> and <strong>David Yarowsky<\/strong> (1997) discussed word sense disambiguation. <strong>Patrick Hanks<\/strong> (1986) investigated meaning potentials. <strong>Kenji Kita et al.<\/strong> (1994) used corpora for the automatic extraction of collocations for language learning. <strong>Dieter Mindt<\/strong> (1996) investigated corpus linguistics and foreign-language learning. <strong>Kenneth Hyland<\/strong> and <strong>John Milton<\/strong> (1997) explored differences in native speakers\u2019 and second-language learners\u2019 writing. Finally, <strong>Adam Kilgarriff<\/strong> (2001) explored the twenty-first-century trend, web-as-corpus.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Corpus linguistics: Readings in a widening discipline. Ed. by Geoffrey Sampson and Diana McCarthy. London: Continuum, 2005. Pp. xv, 524. ISBN 9780826488039. $60. Reviewed\u00a0by Carmela Chateau, Universit\u00e9 de Bourgogne Corpus linguists generally start their careers as linguists or computer scientists. 
Researchers from vastly different backgrounds will find this book of great assistance in learning more [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[],"tags":[],"_links":{"self":[{"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/posts\/1796"}],"collection":[{"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1796"}],"version-history":[{"count":1,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/posts\/1796\/revisions"}],"predecessor-version":[{"id":1797,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=\/wp\/v2\/posts\/1796\/revisions\/1797"}],"wp:attachment":[{"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1796"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/journals.linguisticsociety.org\/booknotices\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}