Lexical Strata and Phonotactic Perplexity Minimization
Keywords: lexical strata, Japanese, neural network language model, gradient phonotactics
Abstract
We present a model of gradient phonotactics that is shown to reduce overall phoneme uncertainty in a language when the phonotactic grammar is modularized in an unsupervised fashion to create more than one sub-grammar. Our model is a recurrent neural network language model (Elman 1990), which, when applied in two separate, randomly initialized modules to a corpus of Japanese words, learns lexical subdivisions that closely correlate with two of the main lexical strata for Japanese (Yamato and Sino-Japanese) proposed by Ito and Mester (1995). We find that the gradient phonotactics learned by the model, which are based on the entire prior context of a phoneme, reveal a continuum of gradient strata membership, similar to the gradient membership proposed by Hayes (2016) for the Native vs. Latinate stratification in English.
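As a rough illustration of the modular setup described above (not the authors' implementation), the sketch below scores phoneme strings with two randomly initialized Elman-style recurrent networks and assigns each word to the module with the lower per-phoneme perplexity. The phoneme inventory, the example words, the hidden size, and all weights are illustrative assumptions; in the actual model the modules would be trained on the corpus so that the assignments come to reflect the lexical strata.

```python
# Toy sketch of modular perplexity-based word assignment.
# Hypothetical: inventory, words, and all hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
PHONEMES = list("aiueoksthnmrwgy")  # illustrative phoneme inventory
V = len(PHONEMES)
IDX = {p: i for i, p in enumerate(PHONEMES)}
H = 16  # hidden layer size (illustrative)

def init_module():
    """One randomly initialized Elman-style RNN module."""
    return {
        "Wxh": rng.normal(0, 0.1, (H, V)),  # input-to-hidden weights
        "Whh": rng.normal(0, 0.1, (H, H)),  # hidden-to-hidden (context) weights
        "Why": rng.normal(0, 0.1, (V, H)),  # hidden-to-output weights
    }

def perplexity(module, word):
    """Per-phoneme perplexity of a word under one module: each phoneme is
    predicted from the entire prior context carried in the hidden state."""
    h = np.zeros(H)
    nll = 0.0
    ids = [IDX[p] for p in word]
    for prev, nxt in zip(ids[:-1], ids[1:]):
        x = np.zeros(V)
        x[prev] = 1.0
        h = np.tanh(module["Wxh"] @ x + module["Whh"] @ h)
        logits = module["Why"] @ h
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()           # softmax over the next phoneme
        nll -= np.log(probs[nxt])
    return np.exp(nll / (len(ids) - 1))

modules = [init_module(), init_module()]
words = ["sakana", "yama", "kansha", "gakusei"]  # illustrative forms
for w in words:
    ppls = [perplexity(m, w) for m in modules]
    print(w, "-> module", int(np.argmin(ppls)))
```

Under the modular training regime, each word would be scored by both modules and used to update only the better-scoring one, so the two sub-grammars gradually specialize; the perplexity gap between modules for a given word then serves as a gradient measure of stratum membership.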
Published by the LSA with permission of the author(s) under a CC BY 3.0 license.