Sociolinguistically-aware computational models of Mandarin-English codeswitching


Irene Yi, Yale University



codeswitching, Mandarin-English bilinguals, sociolinguistics, computational modeling


Current research on computational modeling of codeswitching has focused on the use of syntactic constraints as model predictors (Li & Fung 2014; Li & Vu 2019). However, proposed syntactic constraints (Poplack 1978; Poplack 1980; Myers-Scotton 1993; Belazi et al. 1994) are largely based on Spanish-English codeswitching, and they are violated repeatedly (and potentially systematically) by codeswitching involving other languages. Thus, a computational model trained on these syntactic constraints may not capture the naturalistic patterns of codeswitching between other language pairs. This paper demonstrates the value of sociolinguistic factors as predictors in training a Classification and Regression Tree (CART) model on novel Mandarin-English codeswitching data, which come from 12 bilingual speakers of two different generations from Grand Rapids, Michigan. Participants also answered metalinguistic questions about their own language practices and attitudes and completed a written Language History Questionnaire (LHQ) (Li et al. 2020), which asked for self-evaluations of language habits (proficiency, immersion, and dominance in the two languages). LHQ responses were then quantified into numerical scores that served as sociolinguistic predictors in the CART model. The model, in which age, L2 Dominance, and L1 Immersion were among the top predictors, achieved an accuracy of 0.804 and an area under the ROC curve (AUC) of 0.692. This performance is comparable to, if not better than, that of previous computational studies (e.g., Li & Fung 2014) that trained models using only proposed syntactic constraints as predictors. This paper shows the importance of sociolinguistic factors in computational research previously focused on syntactic constraints; combining these methodologies could improve cross-linguistic and computational understanding of codeswitching patterns.
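The abstract does not state which software, tree settings, or response variable were used, so the following is only a minimal pure-Python sketch of two ingredients it mentions, not the author's pipeline. The first is the Gini impurity criterion commonly used by CART to choose a split threshold on a single numeric predictor (such as a quantified LHQ score). The second is the pair of metrics reported above: accuracy, and ROC AUC computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one. The function names (`gini`, `best_split`, `accuracy`, `roc_auc`) and the toy data are hypothetical illustrations.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Find the threshold t for the rule `x <= t` that minimizes the
    size-weighted Gini impurity of the two child nodes.
    Returns (threshold, weighted impurity)."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Share of cases whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not the study's data.
    print(best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 0.0)
    y_true = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(y_true, y_pred))  # 0.75
    print(roc_auc(y_true, scores))   # 0.75
```

A full CART model applies `best_split` recursively across all candidate predictors and keeps the best split at each node. Accuracy and AUC can diverge: accuracy depends on a single decision threshold and on class balance, whereas AUC summarizes ranking quality over all thresholds, with 0.5 corresponding to chance. This is one reason a model can report an accuracy of 0.804 alongside an AUC of 0.692.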




How to Cite

Yi, Irene. 2022. “Sociolinguistically-Aware Computational Models of Mandarin-English Codeswitching”. Proceedings of the Linguistic Society of America 7 (1): 5247.