Speech tested for Zipfian fit using rigorous statistical techniques

Authors

  • Paul De Palma Gonzaga University http://orcid.org/0000-0001-7910-116X
  • Leon Antonio Garcia-Camargo Gonzaga University
  • Jeb Kilfoyle University of New Mexico
  • Mark Vandam Washington State University
  • Joseph Stover Gonzaga University

DOI:

https://doi.org/10.3765/plsa.v6i1.4975

Keywords:

computational linguistics, Zipf, ASD

Abstract

Zipf’s law describes the relationship between the frequencies of words in a corpus and their rank. Its most basic form is a simple series, indicating that the frequency of a word is inversely proportional to its rank:

1/2, 1/3, 1/4,...
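As an illustration of the series above, the following is a minimal Python sketch that builds a rank-frequency table from a token list and compares it with the counts predicted by the basic inverse-rank form. The function names `rank_frequency` and `ideal_zipf_counts` and the toy token list are assumptions made for this example; they are not taken from the paper or from the Zipf Tool Kit.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending count; break ties alphabetically so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def ideal_zipf_counts(top_count, n_ranks):
    """Counts predicted by the basic form: count(rank) = top_count / rank."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]


# Toy, made-up tokens (hypothetical data, not drawn from any corpus in the paper).
tokens = ("the " * 12 + "of " * 6 + "and " * 4 + "to " * 3).split()

table = rank_frequency(tokens)
predicted = ideal_zipf_counts(table[0][2], len(table))

for (rank, word, count), expected in zip(table, predicted):
    print(rank, word, count, expected)
```

The toy counts were chosen so that the observed and predicted columns agree exactly; real corpora deviate from the ideal series, which is why a formal goodness-of-fit test is needed.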

The past two decades have seen the emergence of usage-based and cognitive approaches to language study. A key observation of these approaches, along with the importance of frequency, is that speech differs in substantial and structural ways from writing. Yet, except for a few older analyses performed on very small corpora, most studies of Zipf’s law have been done on written corpora. Further, the judgement of Zipfianness in much of this work is based on loose and informal criteria. In fact, sophisticated statistical techniques for curve fitting have been developed in recent years in the mathematics and physics literature. These include the use of the Kolmogorov-Smirnov statistic, along with maximum likelihood estimation, to generate p-values, and the use of the complementary error function for normal distributions. The latter helps determine whether a corpus that fails a Zipfian fit might be better described by another distribution. In this paper, we will:

  • Show, using rigorous statistical techniques, that three corpora of recorded speech (Buckeye, Santa Barbara, and MiCase) follow a power law distribution.

  • Describe preliminary results showing that the techniques outlined in this paper may be useful in the diagnosis of conditions that can include disordered speech.

  • Explain how to do the analyses described in this paper.

  • Explain how to download and use the R/Python code we have written and packaged as the Zipf Tool Kit.
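The general shape of the analysis named in the abstract (maximum likelihood estimation of the exponent, a Kolmogorov-Smirnov statistic, and a p-value) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the Zipf Tool Kit's implementation: it assumes a Zipf model truncated to the observed number of ranks, obtains the p-value by a parametric bootstrap, re-sorts each synthetic sample into rank order, and omits refinements such as estimating a lower cutoff. All function names are hypothetical.

```python
import math
import random
from itertools import accumulate


def zipf_pmf(s, n_ranks):
    """Truncated Zipf model: P(rank = r) proportional to r**(-s), r = 1..n_ranks."""
    weights = [r ** -s for r in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def fit_exponent(counts, lo=0.01, hi=5.0, tol=1e-6):
    """Maximum likelihood estimate of s via golden-section search."""
    log_r = [math.log(r) for r in range(1, len(counts) + 1)]
    n = sum(counts)
    weighted = sum(c * lr for c, lr in zip(counts, log_r))

    def neg_log_lik(s):
        # -log L(s) = s * sum(c_r * log r) + n * log(sum_r r**(-s)); convex in s.
        log_norm = math.log(sum(math.exp(-s * lr) for lr in log_r))
        return s * weighted + n * log_norm

    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if neg_log_lik(c) < neg_log_lik(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def ks_statistic(counts, s):
    """Largest gap between the empirical and fitted cumulative distributions."""
    n = sum(counts)
    empirical = accumulate(c / n for c in counts)
    model = accumulate(zipf_pmf(s, len(counts)))
    return max(abs(e - m) for e, m in zip(empirical, model))


def bootstrap_p_value(counts, n_boot=200, seed=0):
    """Return (fitted exponent, observed KS, bootstrap p-value)."""
    rng = random.Random(seed)
    n, n_ranks = sum(counts), len(counts)
    s_hat = fit_exponent(counts)
    observed = ks_statistic(counts, s_hat)
    pmf = zipf_pmf(s_hat, n_ranks)
    exceed = 0
    for _ in range(n_boot):
        sample = [0] * n_ranks
        for r in rng.choices(range(n_ranks), weights=pmf, k=n):
            sample[r] += 1
        # Assumption: re-sort so the synthetic data form a rank-frequency list again.
        sample.sort(reverse=True)
        if ks_statistic(sample, fit_exponent(sample)) >= observed:
            exceed += 1
    return s_hat, observed, exceed / n_boot


# Hypothetical counts for ranks 1..4 (made up for illustration).
s_hat, ks, p_value = bootstrap_p_value([12, 6, 4, 3], n_boot=50)
print(s_hat, ks, p_value)
```

Under this scheme a large p-value means the synthetic data sets drawn from the fitted model are typically at least as far from the model as the observed data, so the Zipfian hypothesis is not rejected; a small p-value counts against it.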

Author Biographies

  • Paul De Palma, Gonzaga University
    Professor of Computer Science

  • Leon Antonio Garcia-Camargo, Gonzaga University
    Student, Department of Computer Science

  • Jeb Kilfoyle, University of New Mexico
    Graduate Student, Department of Computer Science

  • Mark Vandam, Washington State University
    Associate Professor of Speech and Hearing Sciences

  • Joseph Stover, Gonzaga University
    Assistant Professor of Mathematics

Published

2021-03-20

How to Cite

De Palma, Paul, Leon Antonio Garcia-Camargo, Jeb Kilfoyle, Mark Vandam, and Joseph Stover. 2021. “Speech Tested for Zipfian Fit Using Rigorous Statistical Techniques”. Proceedings of the Linguistic Society of America 6 (1): 394–402. https://doi.org/10.3765/plsa.v6i1.4975.