"Problematic phonemes" and German /ɛ:/: An acoustic analysis

The decision to include or exclude phonemes in the description of a language is not always straightforward; presentations of the phoneme inventory of Modern Standard German (MSG) often include a discussion of why /ɛ:/ is problematic as a phoneme. This study describes the acoustic realization of /ɛ:/ in comparison to /e:/ in spoken German, specifically South Westphalian. Thirty-nine native German speakers produced /ɛ:/ and /e:/ in hVt non-word frames, and vowel productions were measured for (1) first and second formants from the steady state of the vowel, (2) duration, and (3) fundamental frequency (f0). Measurements were analyzed with a logistic regression model using the glm function in R. The model showed that while the main effects of F2, duration, and pitch were not significant, F1 was; speakers reliably produced /e:/ higher in the vowel space than /ɛ:/, but not fronter. This preliminary investigation into the acoustic realizations of /ɛ:/ and /e:/, framed by the debate on whether these two sounds are phonetically and phonemically contrastive, is a first step toward understanding them within the larger phonemic inventory of MSG. We hope that this study will reopen a discussion on this topic and help answer the question of whether /ɛ:/ really is a problematic phoneme.


Introduction.
The inclusion or exclusion of phonemes in the description of a language is rarely straightforward or truly objective. The parameters linguists choose for the phonemic representation of a language may vary depending on prevailing theories, language change or views about standard language. A study of the phonetics of a language can only describe a subset of speakers who, under ideal conditions, speak the language in the same way. Each speaker has an idiolect and a dialect. A person's speech may conform more or less to an idealized standard, but no one actually speaks the standard language because variation is intrinsic to spoken language (Lippi-Green 1997:25).
According to Milroy and Milroy, standard language is not understood as any specific language, but instead as 'an idea in the mind rather than a reality - a set of abstract norms to which actual usage may conform to a greater or lesser extent' (1991:22-23). The 'standard' is often a prestige variant of the language that is privileged because of social, cultural and political factors rather than features intrinsic to the language. In contrast to the standards of English or French, Modern Standard German cannot be traced back to one regional variety or dialect (Barbour & Stevenson 1990, Clyne 1995, Zsiga 2013). The political fragmentation of Germany between the 16th century and the second half of the 19th century prevented any region from gaining enough prestige or social and political clout for its language to become the standard. The union of grapheme and phoneme that comprises 'Modern Standard German' was primarily driven by grammarians, teachers, writers and publishers as well as the Sprachgesellschaften 'language societies' of the 19th century. In 1898, Theodor Siebs published Deutsche Bühnenaussprache 'German pronunciation for theater' with the intent to normalize German pronunciation to the written language. During the 20th century, Siebs was the reference work for how to pronounce standard German, although recently new attitudes towards the existence of a spoken standard have caused this to change. For the most part, the written standard privileges the southern dialects (Mangold 2005), but Siebs preferred northern German pronunciations and so chose those as his model. In the case of /ɛ:/, northern dialects do use this sound, although southern dialects do not.
This has led to the argument by some scholars that /ɛ:/ is an artificial construct not realized in any spoken dialect and that the sound is only a product of the need to have a sound to correspond to <ä> in a one-to-one grapheme-phoneme system (Moulton, 1962; Reis, 1974; von Polenz, 2000). Thus, the distinction between the pronunciation of <ä> and <ee> is an artificial one.
Presentations of the phonemic vowel inventory of Modern Standard German (MSG) often include a discussion of why /ɛ:/ is a problematic phoneme and justifications for why a particular linguist chose to include or exclude it. One argument for leaving /ɛ:/ out is that some dialects of German have collapsed this sound with /e:/ (Grantham-O'Brien and Fagan, 2016). Therefore, to include /ɛ:/ would unnecessarily complicate the vowel inventory. Another argument against its inclusion is that /ɛ:/ is the result of spelling conventions that drive speakers to differentiate between <ä> and <ee>, and differences in minimal pairs such as Bären and Beeren can be attributed to orthographic hallucination (Moulton, 1962; Reis, 1974; von Polenz, 2000). What is evident from this dichotomy is that dialects that have the sound /ɛ:/ should include it as a phoneme and dialects that do not use /ɛ:/ should not include it. The question is then not whether /ɛ:/ exists as a phoneme in any given dialect, but who will prevail in the argument about whether it belongs in the inventory of phonemes that represent the abstract concept that is standard spoken German.
According to Wiese (1996), phonemic inventories should be flexible in order to account for language change and foreign loan words. In Wiese (1996) and Mangold (2005), /ɛ:/ is included in the vowels of Standard German (SG). The decision of scholars to include or exclude this phoneme is seemingly based on their judgement of what is more common or prestigious in usage. Since SG is intended to be an amalgamation of the most common forms of German cross-dialectally, it is not a true representation of how any one German speaker speaks the language (Barbour and Stevenson, 1990; Fox, 1990; Milroy and Milroy, 1999; Mangold, 2005). Therefore, expert opinion is an acceptable method to determine the phonemic inventory of SG.
The task of deciding whether two vowel qualities are distinct and phonemically contrastive is, at best, a complicated one. One traditional method in phonology is to find minimal pairs in a language, that is, pairs of words that differ in only a single sound. In German, typical minimal pairs for vowel sounds are fühle vs. fülle, Stille vs. Stiele, and Höhle vs. Hölle. Changing just one sound in these words also changes the meaning of the word (Zsiga 2013:204). A number of the hallmarks of a phonological contrast can be motivation to categorize two sounds as distinct phonemes; some of these are present in the contrast in question.
[…] However, the vowel systems of German dialects vary just as much as those of American English do, due to their history of vowel merger, splits, and shifts. In this study, the dialect of South Westphalian was chosen, represented by the dark green in the map below. More specifically, this study investigated the realization of two vowel phonemes, /ɛ:/ and /e:/ (e.g., [ge:be] vs. [gɛ:be]). We looked at three acoustic dimensions: (1) first two formants, (2) duration, and (3) pitch (f0). If these sounds are indeed distinct in South Westphalian, we expect a multi-dimensional acoustic contrast, specifically that /e:/ will be higher and fronter in the vowel space, marked by a lower F1 and higher F2 measurement than /ɛ:/ across speakers.

Methods.
2.1. PARTICIPANTS AND PROCEDURE. A total of 40 speakers (female = 21, mean age = 45, sd = 17) from the Rhein Ruhr area of North-Rhine-Westphalia provided recordings for the study. The native dialect of the speakers is Westphalian. One speaker was excluded from the study due to missing target items; 39 speakers were included in the analysis.
Speakers were recruited through social media and recorded themselves at home following a set of guidelines provided by the researchers; they were instructed to record themselves in a quiet room and to start again from the beginning if they made a mistake or had an exceptionally unnatural production. They then uploaded a recording of themselves reading the provided word list, giving their initials and age as a participant ID.
The word list consisted of hVt non-words containing both /ɛ:/ and /e:/ along with six other distractor vowels. Participants read the words within the frame sentence "Das Wort ist ___" (The word is ____) and repeated each sentence three times. The words as presented to the speakers followed Standard German spelling conventions that were unambiguous to participants, to ensure they would produce the long version of the vowel and not the short. Non-words were used to avoid frequency effects; some work has shown that high-frequency words show a higher degree of vowel reduction and are therefore more centralized in comparison to low-frequency words (Phillips, 1984; Dinkin, 2008).
2.2. ANALYSIS. Target words were first segmented manually in Praat (Boersma and Weenink, 2018). The target vowels were then located manually and extracted; the beginning of the vowel was marked at the onset of periodic voicing and the end at the cessation of periodicity and the beginning of the postvocalic consonant closure. Both the beginning and end were taken at the zero crossing. F1, F2, F3, F0, and duration were tracked in two different ways to ensure accuracy: once with the Parselmouth package in Python using an adapted script from Feinberg (2019) and once with FastTrack (Barreda, 2021). Python and FastTrack outputs were manually compared and verified for all tokens. All vowels were grouped by speaker, and mean values from the three repetitions were generated for all measures for each speaker. Vowels were normalized using the phonTools (Barreda, 2015) package in R.
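The averaging and normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the text does not state which normalization method was applied, so the sketch assumes a log-mean approach (each speaker's mean log formant is subtracted from that speaker's log formants). The function names, the speaker ID, the SAMPA-style labels ("e:" for /e:/, "E:" for /ɛ:/), and all formant values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def speaker_means(tokens):
    """Average the repeated tokens for each (speaker, vowel) pair.

    tokens: list of dicts with keys "speaker", "vowel", "f1", "f2" (Hz).
    """
    grouped = defaultdict(list)
    for t in tokens:
        grouped[(t["speaker"], t["vowel"])].append(t)
    return {
        key: {
            "f1": mean(t["f1"] for t in reps),
            "f2": mean(t["f2"] for t in reps),
        }
        for key, reps in grouped.items()
    }


def log_mean_normalize(means):
    """Assumed method: subtract each speaker's mean log formant value."""
    logs_by_speaker = defaultdict(list)
    for (speaker, _vowel), m in means.items():
        logs_by_speaker[speaker].extend([math.log(m["f1"]), math.log(m["f2"])])
    centre = {s: mean(values) for s, values in logs_by_speaker.items()}
    return {
        (s, v): {
            "f1log": math.log(m["f1"]) - centre[s],
            "f2log": math.log(m["f2"]) - centre[s],
        }
        for (s, v), m in means.items()
    }


# Made-up values for one hypothetical speaker, three repetitions per vowel.
toy_tokens = [
    {"speaker": "S1", "vowel": "e:", "f1": 400.0, "f2": 2200.0},
    {"speaker": "S1", "vowel": "e:", "f1": 410.0, "f2": 2210.0},
    {"speaker": "S1", "vowel": "e:", "f1": 420.0, "f2": 2190.0},
    {"speaker": "S1", "vowel": "E:", "f1": 520.0, "f2": 2000.0},
    {"speaker": "S1", "vowel": "E:", "f1": 530.0, "f2": 2010.0},
    {"speaker": "S1", "vowel": "E:", "f1": 510.0, "f2": 1990.0},
]

means = speaker_means(toy_tokens)
normalized = log_mean_normalize(means)
```

Averaging before normalizing mirrors the order given in the text; a different normalization method would change only the second function.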

Results.
To investigate the association between each independent variable and the dependent variable, a Pearson correlation matrix was generated in R. The Pearson correlation coefficient shows the magnitude of the linear association between two variables as well as the direction of their relationship. A value of ±1 indicates a perfect correlation, and a value between ±0.50 and ±1 is treated here as a high correlation.
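As a rough illustration of the coefficient described above, the following sketch computes Pearson's r between a binary vowel coding and F1. It is not the authors' R code; the function name and all data values are invented for the example, with the coding /ɛ:/ = 0 and /e:/ = 1 assumed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Made-up example: vowel coded 0 or 1, F1 in Hz.
vowel_code = [0, 0, 0, 1, 1, 1]
f1_hz = [520.0, 540.0, 500.0, 410.0, 430.0, 450.0]

r_f1 = pearson_r(vowel_code, f1_hz)
```

With a binary variable on one side, this is the point-biserial case of Pearson's r; a negative value here would mean that the vowel coded 1 tends to have the lower F1.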
The Pearson correlation revealed F1 to have the strongest correlation with vowel identity. F2 still showed a high correlation, but not as high as F1. For the regression model, vowel category was coded as a binary dependent variable (/ɛ:/ = 0 and /e:/ = 1). A logistic regression model was run in R using the glm function with vowel category as the dependent variable and the log-normalized formant frequencies and duration as independent variables. A random term for speaker was added to the model as well. The resulting regression formula was: glm(vowel ~ (1|ID) + f0 + f1log + f2log + duration, data = logs, family = "binomial"). The results revealed only F1 to be significant, which is also reflected in the Pearson correlation. While F2 still had a Pearson correlation of .51, it did not come out as significant in the model. When the data are plotted in the F1/F2 plane, it can be seen that /e:/ is produced with a lower F1 than /ɛ:/ by both male and female speakers, reflecting the findings from the model output. While there is a non-negligible level of overlap between the phonemes in the vowel space, /ɛ:/ still occupies a large section of the vowel space alone.
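To make the logic of the model concrete, here is a deliberately simplified sketch of a logistic regression with a single predictor (normalized log F1), fitted by gradient ascent. It is not the model reported above: it omits f0, F2, duration, and the speaker term, and every name and data value is hypothetical. The coding /ɛ:/ = 0 and /e:/ = 1 follows the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, lr=0.5, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0 = 0.0
    b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = 0.0
        g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Made-up normalized log-F1 values with some overlap between the categories.
f1log = [0.30, 0.20, 0.10, -0.05, -0.30, -0.20, -0.10, 0.15]
vowel = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = long open-mid vowel, 1 = long close-mid vowel

b0, b1 = fit_logistic(f1log, vowel)
```

A negative slope b1 in such a fit would correspond to the pattern in the text: the higher the F1, the less likely the token is to be /e:/.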

Discussion.
The current study was designed to investigate the acoustic characteristics of two sounds in German, /ɛ:/ and /e:/, and to make a case for their status as contrastive phonemes through acoustic analysis of utterances produced by 39 native German speakers, followed by statistical analysis of these measurements. The data indicated that the two vowels were significantly different in their productions in one of the three critical dimensions: spectral quality. This result calls into question the validity of the designation of /ɛ:/ as a ghost phoneme in German and supports the idea that these two vowels are indeed separate, contrastive phonemes.
The first goal of this study was to determine whether the two aforementioned vowels inhabited distinct areas of the vowel space or whether tokens would intermingle in one large space. The measurements taken from participant utterances showed that /e:/ had a lower F1, indicating that it was higher in the vowel space than /ɛ:/. The difference was significant, indicating that the two vowels had distinct articulations and vowel qualities based on their acoustic characteristics. While a single phoneme may have distinct articulations due to factors such as coarticulatory pressures or position in a word, it is worth noting that in addition to these distinct articulations, these two sounds have minimal pairs. Therefore, in addition to there being acoustic cues that mark the contrast between these two vowels, there is no clear or overt environment that triggers the production of one over the other. A claim that they are variants of a single phoneme, when distinct articulations encode lexical differences in words like Beeren and Bären, is difficult to justify and, we believe, should be revisited and re-evaluated.
The second goal of this study was to examine if there were any secondary enhancing cues to this contrast, pitch and duration, but those were not found to be significant. Enhancement Theory (Stevens & Keyser, 1989) indicates that a contrast is typically signaled by multiple acoustic cues, with redundant secondary cues serving to enhance a contrast. While we did not find a multi-dimensional contrast as we had hoped, we still found reliable differences in the spectral characteristics of the sounds: the difference in F1 corroborates established articulatory consequences.
Given the acoustic evidence above, it is important to revisit the context in which we conducted this study: the motivations behind the inclusion or exclusion of phonemes in the phonological inventory of a language. Recall that Wiese (1996) asserts that phonemic inventories should be flexible in order to account for language change and even loan words. Vowels are often the most susceptible to language change given their fluid articulation. Both vowel mergers and vowel divergence are common and well-studied sources of language change. While we cannot make strong claims with the preliminary acoustic analysis we have conducted here, it is tempting to ask whether the emergence of /e:/ and /ɛ:/ as phonemes with distinct articulatory targets could be a product of a vowel divergence in progress. While this contrast is commonly attributed to orthographic hallucination (Moulton, 1962; Reis, 1974; von Polenz, 2000), which could have been the case in the past, our data suggest that it has since developed into a phonemic contrast.
In conclusion, we believe we have provided sufficient evidence that these two sounds are contrastive (i.e., distinct spectral characteristics and contrastive distribution). Therefore, it is now the job of phonologists working to document and describe the German language to create a more inclusive account that is representative of more than just the constructed "standard" dialect. Thus, based upon the data we have presented here, we assert that /e:/ and /ɛ:/ are distinct phonemes and should be described as such in forthcoming accounts of German phonology.

Limitations and future directions.
As with any study, this one is not without limitations. First, the study elicited productions from participants through an orthographic representation of the desired utterance. This cannot completely rule out orthographic hallucination, as productions were still directly tied to the orthography presented to the participant. It was difficult to weigh the risk of orthographic influence against the risk of frequency effects in eliciting the most accurate productions possible. In the future, it would be worthwhile to find a way to elicit productions from speakers without orthography, so that orthographic hallucination can be excluded as a potential confounding factor.
Another limitation of our study is due to the way in which the recordings were collected. Since data was collected in Germany, speakers recorded themselves with their own devices. Although speakers were instructed to record themselves in a quiet room, background noise and the low quality of some recordings can potentially disturb and degrade the acoustic signal. While this degradation was not enough to inhibit the measurements taken in this study, more fine-grained measurements and analysis in future studies may require a more controlled recording environment. Therefore, it is our intention to conduct future data collection in a sound-attenuated booth when possible. Furthermore, speakers could only receive instructions through the written online form and could not get any direct and immediate feedback on questions about the nature of the task. Collecting data in person would solve this issue.