Genres on the web: Computational models and empirical studies

Genres on the web: Computational models and empirical studies. Ed. by Alexander Mehler, Serge Sharoff, and Marina Santini. (Text, speech and language technology 42.) Dordrecht: Springer, 2011. Pp. xiv, 362. ISBN 9789048191772. $189 (Hb).

Reviewed by Daria Dayter, University of Bayreuth

The present collection consists of five parts devoted to exploring the fluid concept of genre in the environment of the Internet. In Ch. 1, the editors discuss the practical potential of the notion of genre for empirical and computational fields. They introduce three open issues that are dealt with in this volume: the nature of web documents, the construction and use of web-based corpora, and the design of computational models.

The second part of the volume opens with a chapter by Jussi Karlgren, who takes a reader-oriented approach to genre and finds that web users perceive two new types of genre: one based on new technology, and another that is significantly more specialized in terms of content. In Ch. 3, Mark A. Rosso and Stephanie W. Haas concentrate on methodological issues in operationalizing genre theory to enhance web search. Ch. 4, by Kevin Crowston, Barbara Kwasnik, and Joseph Rubleske, reports on a field study that takes a bottom-up approach to building a taxonomy of web genres.

The third and fourth parts of this collection adopt an applied perspective, tackling various problems of automatic web genre identification (AGI) and developing structure-based approaches to genre classification. Marina Santini explores cross-testing as a method for evaluating genre models. Yunhyong Kim and Seamus Ross examine the role of word distribution patterns in document classification. Serge Sharoff proposes a function-based typology of web pages and tests it on two Internet corpora. Benno Stein, Sven Meyer zu Eissen, and Nedim Lipka address, among other problems of existing AGI models, their insufficient generalization capability and suggest a method to overcome it. In the final contribution to the third part, Pavel Braslavski reports an experiment that merges genre-related and text-relevance rankings, yielding a moderate improvement in search results.

As Christoph Lindemann and Lars Littig establish in Ch. 10, a twofold approach to the classification of web pages using structural and content features significantly improves the overall accuracy of classification at the super-genre level. An alternative approach is exemplified in Ch. 11 by Matthias Dehmer and Frank Emmert-Streib, who analyze web genre data by applying a graph-based representation model. In Lennart Björneborn’s contribution, an analysis of patterns of interconnectedness between genres in academic web space reveals a dynamic web of genres.

Finally, the fifth part of the book comprises three case studies of emerging web genres. An amateur Flash exchange website is the focus of attention in the chapter by John Paolillo, Jonathan Warren, and Breanne Kunz. The principal dimensions of linguistic variation in blogs are identified by Jack Grieve et al. in Ch. 14, and in Ch. 15, Ian Bruce combines social and cognitive perspectives on genre in analyzing a sample of participatory news texts.

This collection concludes with a call for further research on characterizing web genres, drawing up annotation guidelines, and creating genre benchmarks for different languages.