Comparing K-means and OPTICS clustering algorithms for identifying vowel categories
Abstract
The K-means algorithm is the most commonly used clustering method for phonetic vowel description but has some properties that may be sub-optimal for representing phonetic data. This study compares K-means with an alternative algorithm, OPTICS, in two speech styles (lab vs. conversational) in English to test whether OPTICS is a viable alternative to K-means for characterizing vowel spaces. We find that with noisier data, OPTICS identifies clusters that more accurately represent the underlying data. Our results highlight the importance of choosing an algorithm whose assumptions are in line with the phonetic data being considered.
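The contrast in assumptions described above can be illustrated with a minimal sketch. It assumes scikit-learn's `KMeans` and `OPTICS` implementations and uses synthetic two-dimensional (F1, F2) values; the centres, spreads, noise range, and parameter settings are hypothetical choices for illustration, not the study's data or configuration.

```python
import numpy as np
from sklearn.cluster import KMeans, OPTICS

rng = np.random.default_rng(0)

# Hypothetical (F1, F2) centres in Hz for three vowel categories; illustrative only.
centres = np.array([[300.0, 2300.0], [700.0, 1200.0], [350.0, 800.0]])
tokens = np.vstack(
    [c + rng.normal(scale=[30.0, 80.0], size=(100, 2)) for c in centres]
)

# Scattered tokens standing in for the extra variability of noisier speech (assumption).
noise = rng.uniform(low=[200.0, 600.0], high=[900.0, 2600.0], size=(30, 2))
X = np.vstack([tokens, noise])

# K-means assigns every token to one of exactly k clusters, scattered tokens included.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# OPTICS is density-based: tokens in sparse regions can receive the label -1 (noise).
optics_labels = OPTICS(min_samples=10).fit_predict(X)

print("K-means cluster labels:", sorted(set(kmeans_labels.tolist())))
print("OPTICS cluster labels:", sorted(set(optics_labels.tolist())))
print("Tokens OPTICS leaves unassigned:", int((optics_labels == -1).sum()))
```

In this sketch K-means must be told the number of clusters in advance and has no way to leave a token unassigned, whereas OPTICS infers clusters from local density and may set outlying tokens aside. How many clusters OPTICS reports depends on `min_samples` and on the scaling of the two dimensions, so the settings here should be treated as a starting point only.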
Keywords
phonetics; vowels; unsupervised clustering; K-means; machine learning; corpus methods
DOI: https://doi.org/10.3765/plsa.v8i1.5488
Copyright (c) 2023 Emily Grabowski, Jennifer Kuo

This work is licensed under a Creative Commons Attribution 4.0 International License.

Linguistic Society of America
Advancing the Scientific Study of Language since 1924
ISSN (online): 2473-8689