Comparing K-means and OPTICS clustering algorithms for identifying vowel categories

Emily Grabowski, Jennifer Kuo


The K-means algorithm is the most commonly used clustering method in phonetic descriptions of vowels, but some of its properties may be sub-optimal for representing phonetic data. This study compares K-means with an alternative algorithm, OPTICS, on English vowels produced in two speech styles (laboratory vs. conversational) to test whether OPTICS is a viable alternative to K-means for characterizing vowel spaces. We find that with noisier data, OPTICS identifies clusters that more accurately represent the underlying data. Our results highlight the importance of choosing an algorithm whose assumptions match the phonetic data under consideration.
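
The abstract does not list the specific K-means properties at issue. One commonly cited difference is that K-means needs the number of clusters k in advance and assigns every token to some cluster, whereas OPTICS orders points by density and lets a flat extraction leave sparse tokens unassigned as noise. The sketch below illustrates only that difference and is not the authors' implementation. The formant values are invented toy data (not the study's), and the helper names and the parameters `eps`, `min_pts` and `eps_cut` are hypothetical choices. The OPTICS part is a minimal pure-Python version (core distances plus a reachability ordering) followed by a DBSCAN-style cut at a single threshold.

```python
import heapq
import math
import random

# Hypothetical toy data: (F1, F2) in Hz, NOT measurements from the study.
# Each "vowel" is a 5x5 grid of tokens spaced 10 Hz apart around a center.
CENTERS = [(300, 2300), (750, 1200), (320, 900)]  # rough /i/, /a/, /u/
OUTLIERS = [(550, 1700), (1000, 2600), (200, 1600)]  # sparse, noisy tokens


def make_tokens():
    pts = [(f1 + d1, f2 + d2) for f1, f2 in CENTERS
           for d1 in range(-20, 21, 10) for d2 in range(-20, 21, 10)]
    return pts + OUTLIERS


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: every token is forced into one of k clusters."""
    centroids = random.Random(seed).sample(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new.append(tuple(sum(v) / len(members) for v in zip(*members))
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def optics(points, eps, min_pts):
    """Minimal OPTICS: returns the ordering, reachability and core distances."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    nbrs = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    # Core distance: distance to the min_pts-th neighbour (the point itself counts).
    core = [sorted(dist[i][j] for j in nbrs[i])[min_pts - 1]
            if len(nbrs[i]) >= min_pts else math.inf for i in range(n)]
    reach, done, order = [math.inf] * n, [False] * n, []

    def update(p, heap):
        # Lower the reachability of p's unprocessed neighbours (p is a core point).
        for q in nbrs[p]:
            r = max(core[p], dist[p][q])
            if not done[q] and r < reach[q]:
                reach[q] = r
                heapq.heappush(heap, (r, q))

    for start in range(n):
        if done[start]:
            continue
        done[start] = True
        order.append(start)
        heap = []
        if core[start] < math.inf:
            update(start, heap)
        while heap:
            r, q = heapq.heappop(heap)
            if done[q] or r > reach[q]:
                continue  # stale heap entry
            done[q] = True
            order.append(q)
            if core[q] < math.inf:
                update(q, heap)
    return order, reach, core


def extract_dbscan(order, reach, core, eps_cut):
    """DBSCAN-style flat clustering from the OPTICS ordering; -1 marks noise."""
    labels = [-1] * len(order)
    cid, next_id = -1, 0
    for i in order:
        if reach[i] > eps_cut:
            if core[i] <= eps_cut:  # start a new cluster
                cid, next_id = next_id, next_id + 1
                labels[i] = cid
            # otherwise the token stays labelled -1 (noise)
        else:
            labels[i] = cid
    return labels


points = make_tokens()
km = kmeans(points, k=3)
order, reach, core = optics(points, eps=200, min_pts=5)
op = extract_dbscan(order, reach, core, eps_cut=50)
print("K-means labels of outliers:", km[-3:])  # always in 0..k-1
print("OPTICS labels of outliers:", op[-3:])   # prints [-1, -1, -1]
```

With these toy settings, K-means necessarily assigns each of the three outlying tokens to one of the k clusters (and so shifts that cluster's mean), while the OPTICS extraction recovers the three grids as clusters 0, 1 and 2 and labels the outliers -1. Real formant data would need tuned parameters, and library implementations offer other extraction methods.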


Keywords: phonetics; vowels; unsupervised clustering; K-means; machine learning; corpus methods




Copyright (c) 2023 Emily Grabowski, Jennifer Kuo

This work is licensed under a Creative Commons Attribution 4.0 International License.


Linguistic Society of America


ISSN (online): 2473-8689

This publication is made available for free to readers and with no charge to authors thanks in part to your continuing LSA membership and your donations to the open access fund.