Acoustic properties of word and phrasal prominence in Uzbek

Based on a large-scale corpus of experimental data produced by 8 native speakers of Tashkent Uzbek, we assess the presence of canonical word-final stress in real words spoken in three dialogue types: without focus, with contrastive focus, and with new information focus on the target. The first context provides baseline information on the manifestation of stress in the absence of additional focus properties. By comparing the latter two contexts with the former, we are also able to assess the acoustic manifestation of the two types of focus. The most noteworthy properties of the final syllable are its relatively long duration and sharp falling F0 contour, which potentially serve as cues to lexical stress and are enhanced by both types of focus. Because stress is word-final, however, the patterns we observe could also be consistent with boundary properties, a possibility we consider as well. In addition, we briefly compare the prosodic patterns we observe in Uzbek with data collected in the same way for Turkish. We find that the prominence patterns in Uzbek, while not particularly strong, are nevertheless stronger than those in Turkish, and also differ from them in crucial ways. We also suggest implications for Turkic prosody more generally.

In Section 2, we summarize the previous descriptive accounts of Uzbek stress and phrasal prosody. Section 3 introduces the present investigation, including our hypotheses and methodology. The results of our analysis are presented in Section 4, followed by a discussion of the findings and conclusions in Sections 5 and 6, respectively.

Previous descriptions of Uzbek prosody.
Uzbek is a Turkic language spoken in Uzbekistan and elsewhere in Central Asia. Like Turkish, Uzbek is reported to have final lexical stress, as illustrated in (1). The acute accent mark identifies the stressed vowel here and below.¹
(1) Uzbek Word-Final Stress (from Sjoberg 1963: 24-5)
1 Uzbek has multiple dialects, which differ in their phonology. The transcriptions from Sjoberg 1963 reflect Standard Uzbek, while those from Bokhari & Washington 2015 reflect Saudi Diaspora Uzbek. Our study focuses on Tashkent Uzbek, which has some phonological differences from Standard Uzbek. Unless a different source is cited, we have followed Sjoberg in transcribing the graphemes <i,u,g'> as the phonemes /i,u,ɣ/ and the corresponding phones. A reviewer points out, however, that phonetically these segments may more accurately be transcribed as [ɨ,ʉ,ʁ], as seen in the examples from Bokhari & Washington 2015. The reviewer also points out that Uzbek orthography uses the "modifier letter turned comma" (U+02BB) <ʻ>, but in this paper, as in our stimuli, we follow the common practice, suggested by our native-language informant, of using a simple apostrophe <'>.
2 Stress in Uzbek compounds is also distributed as in Turkish, mostly appearing on the first member of a compound (Sjoberg 1963:27, Bodrogligeti 2003). In addition, there are some reports of secondary stress (Sjoberg 1963:31, Bodrogligeti 2003).
3 The term "personal copula particle" and the morphological segmentation are taken from Bokhari & Washington 2015. However, a reviewer suggests that a better term might be "predicate person agreement morpheme".

In the present investigation, designed to provide basic acoustic information on Uzbek stress, we examine only stimuli that have canonical final stress (see Appendix). Stimuli with prestressing suffixes (personal copula particles) were also recorded, but are not discussed here.

Acoustic Investigation of Uzbek prosody.
Given the absence of acoustic studies of Uzbek prosody, a primary goal of the present study is to verify previous descriptions according to which Uzbek exhibits lexical and phrasal prominence similar to that of Turkish. As noted, we consider here only the canonical stress pattern: stress on the word-final syllable.
First, we investigate the acoustic properties of word stress. According to the previous descriptions, the words we tested are all expected to carry final stress. Thus, we expect that in a multisyllabic word one or more of the typical acoustic properties of prominence (e.g., duration, F0, intensity) will be enhanced on the final (stressed) syllable in comparison with the manifestation of the same properties on the preceding (unstressed) syllables (Hypothesis 1).
(6) Hypothesis 1: Lexical stress in Uzbek is manifested by enhancement of one or more of the acoustic properties typically associated with prominence on the final syllable of a word, when compared with the properties of preceding syllables.
Next, given that focus is commonly found to enhance the properties of the stressed syllable of a word (e.g., Ladd 2008), we expect that the final (stressed) syllable will be additionally enhanced under focus in Uzbek. Although different types of focus may exhibit somewhat different prosodic patterns, we predict enhancement of the stressed syllable for both Contrastive Focus (Hypothesis 2) and New Information Focus (Hypothesis 3).
(7) Hypothesis 2: Contrastive Focus is manifested by enhancement of one or more of the acoustic properties typically associated with prominence on the final syllable of a word, when compared with the properties of this syllable in the absence of focus.
(8) Hypothesis 3: New Information Focus is manifested by enhancement of one or more of the acoustic properties typically associated with prominence on the final syllable of a word, when compared with the properties of this syllable in the absence of focus.
In addition to the general enhancement of the stressed syllable under focus, we consider whether there are differences between the enhancement patterns of the two types of focus (Hypothesis 4).
(9) Hypothesis 4: Contrastive Focus and New Information Focus are manifested by different enhancement patterns of one or more of the acoustic properties typically associated with prominence on the final syllable of a word.
Finally, given the expected similarity between Uzbek and Turkish stress, we briefly compare the prominence patterns of the two languages (Hypothesis 5).
(10) Hypothesis 5: The prominence (stress and focus) patterns of Uzbek and Turkish exhibit the same acoustic properties.
3.1. PARTICIPANTS AND PROCEDURE. The participants were ten native speakers (5 female) of the Tashkent variety of Uzbek, who were recorded in Tashkent by a speaker of the same variety. All speakers were university students (aged 20-24). Although typically fluent in Russian and/or English, they spoke Uzbek as their first language and used it predominantly in their daily lives. The experiment consisted of reading short dialogues, alternating with pictures of objects to be named, presented on PowerPoint slides. The recordings were made in a quiet location with a head-mounted microphone, directly onto the same computer used for the slide presentation. Before the actual experiment began, the participants were given instructions (in Uzbek) and had a practice session with items different from those included in the experiment itself.

3.2. STIMULI. The present study is part of an ongoing cross-linguistic investigation of the acoustic properties of word prosody (stress and tone) and focus.⁴ Crucial to the project is the use of the same methodology for all of the languages, so that meaningful comparisons and typological generalizations can be made. Thus, the structure of the Uzbek stimuli conforms to that of the stimuli used for the other languages.
All of the stimulus words are real three-syllable words of Uzbek, expected to be known by a typical speaker of the language. Each of the target vowels /i, u, a/ appears in ten words in each of the three syllable positions, except that /u/ is absent from the final syllable owing to a gap in the language.⁵ To the extent possible, the words consist of CV syllables, and the target vowels are always in such a syllable. For the present study, we examine only words with canonical (final) stress; the vowels in the first two syllables are thus unstressed. Sample stimuli are shown with simple translations in Table 1. The full list of stimuli, along with their morphological segmentation and glosses, can be found in the Appendix. Each participant saw the full set of stimuli, in all three focus conditions described in Section 3.3, in one of two pseudo-random orders.

3.3. ELICITATION OF PROSODIC PATTERNS: CARRIER DIALOGUES. The Uzbek target words appeared in short dialogues that primed focus either on a word following the target or on the target itself, as in the previous investigations of Turkish and the other languages in the cross-linguistic project. To observe the lexical stress properties without the confounding presence of focus properties, we examine the targets in the Non-Focus (NF) condition, which primes focus on a word following the target. Placing the NF target before the focused word avoids the risk of post-focal compression, a reduction of the F0 range after the focused word (Xu 2011), which could distort the properties of the target. The effects of focus are examined in dialogues that prime focus on the target itself. For Uzbek, two types of focus, Contrastive Focus (CF) and New Information Focus (NIF), were primed by different dialogues.⁶ The dialogue structures are provided in Table 2, where, in the answers, the target word (here tepada 'at the top') is underlined, and the focused word is bolded. Figure 1 shows a sample slide with the NIF condition.

Table 2: Uzbek dialogue structures for the three focus conditions. Q = question; A = answer.

Non-Focus
Q: O'tgan yili "tepada" so'zi Lolaning sevimli so'zi edimi?
   Last year "at the top" word Lola's favorite word was?
   'Last year, was the word "at the top" Lola's favorite word?'
A: Yo'q, o'tgan yili "tepada" so'zi mening sevimli so'zim edi, Lolanikimas.
   No, last year "at the top" word my favorite word was, Lola's-not
   'No, last year, the word "at the top" was my favorite word, not Lola's.'

New Information Focus
Q: Last year which word your favorite word was
   'Last year, which word was your favorite word?'
A: Last year "at the top" word my favorite word was
   'Last year, the word "at the top" was my favorite word.'

As can be seen in Table 2, the targets in the answers (the items used for the acoustic analysis) are always followed by the same word, so'zi 'word', so that the context is constant regardless of the focus condition.⁷ As pointed out above, the dialogue slides were interspersed with slides showing pictures of common items that the participants had to name. These additional slides served to avoid repetitive prosody from one target to the next; the naming responses were not analyzed.
3.4. ANALYSIS. The data were segmented and analyzed using Praat (Boersma & Weenink 2019). The recordings of two participants were excluded due to technical problems, so the analysis is based on the speech of 4 female and 4 male speakers. In total, measurements were made of 1920 vowels (240 per speaker): 10 each of the vowels /i, u, a/ in each of the three syllables, except for the absence of /u/ in Syllable 3 (= 80 vowels), each appearing in all three focus conditions.
7 A reviewer points out that other options may also have been possible for the context following the target. We used so'zi since this was the structure our consultant provided consistently for the different dialogues.
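The token count can be re-derived from the design described above. The short Python sketch below does only that; the constant and function names are our own, chosen purely for illustration.

```python
from itertools import product

VOWELS = ("i", "u", "a")
SYLLABLES = (1, 2, 3)
CONDITIONS = ("NF", "CF", "NIF")
WORDS_PER_CELL = 10  # ten target words per vowel x syllable position
N_SPEAKERS = 8       # speakers retained after excluding two


def design_cells():
    """Vowel x syllable cells, skipping /u/ in Syllable 3 (a gap in Uzbek)."""
    return [(v, s) for v, s in product(VOWELS, SYLLABLES) if not (v == "u" and s == 3)]


per_condition = WORDS_PER_CELL * len(design_cells())  # target vowels per focus condition
per_speaker = per_condition * len(CONDITIONS)          # each target in all three conditions
total = per_speaker * N_SPEAKERS
print(per_condition, per_speaker, total)  # 80 240 1920
```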
The target vowels were measured for duration, F0, intensity, and vowel centralization. For F0, two measurements were considered: mean F0 across the vowel, and F0 change, i.e., the contour from the beginning to the end of the vowel, computed by subtracting the mean F0 of the first quarter of the vowel from that of the last quarter (so that negative values indicate a fall). Vowel centralization was measured as the Euclidean distance from the center of the acoustic vowel space (Winn et al. 2008). To permit the data of the eight speakers to be pooled, each measurement was normalized as z-scores by vowel and speaker.⁸ The normalized z-scores were tested for significance by a MANOVA, with Focus (Non-Focus, New Information Focus, Contrastive Focus) and Syllable (Syll1, Syll2, Syll3) as the independent variables, and Duration, Intensity, mean F0, F0 change, and vowel centralization as the dependent variables. When appropriate, significant MANOVA effects were followed up with ANOVAs and post-hoc tests.
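The three derived measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the Praat procedure used in the study: it assumes F0 samples evenly spaced in time across the vowel, takes the center of the vowel space to be the mean F1/F2 over all of a speaker's tokens (our assumption; Winn et al. 2008 may define the center differently), and z-scores within each speaker × vowel group using the sample standard deviation (the text does not say which). All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def f0_change(f0_track):
    """F0 change: mean F0 of the last quarter of the vowel minus mean F0 of
    the first quarter; negative values indicate a fall.
    ASSUMPTION: f0_track holds F0 samples (Hz) evenly spaced in time."""
    q = max(1, len(f0_track) // 4)  # number of samples in one quarter
    return mean(f0_track[-q:]) - mean(f0_track[:q])


def vowel_space_center(tokens):
    """ASSUMPTION: the center is the mean F1 and mean F2 over all (F1, F2)
    tokens of one speaker."""
    return (mean(f1 for f1, _ in tokens), mean(f2 for _, f2 in tokens))


def centralization(f1, f2, center):
    """Euclidean distance of one (F1, F2) token from the vowel-space center."""
    return math.dist((f1, f2), center)


def zscore_by_group(rows, key, value):
    """z-score row[value] within groups defined by key(row), e.g. (speaker, vowel).
    ASSUMPTION: sample standard deviation; each group needs at least two
    tokens whose values are not all identical."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[value])
    params = {k: (mean(v), stdev(v)) for k, v in groups.items()}
    return [(row[value] - params[key(row)][0]) / params[key(row)][1] for row in rows]


# Usage with made-up numbers (not data from this study):
print(f0_change([220, 218, 214, 210, 200, 190, 180, 170]))  # -44, i.e. a 44 Hz fall
center = vowel_space_center([(300, 2200), (700, 1200), (500, 1700)])
print(center, centralization(300, 2200, center))
```

Grouping by (speaker, vowel) removes both speaker differences and intrinsic vowel differences (e.g., /a/ being inherently longer than /i/), which is what allows the eight speakers' data to be pooled.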

Results.
In the MANOVA, there were significant effects of Focus (Wilks' λ=.812, F(10, 3854)=42.2, p<.005, partial η²=.099) and Syllable (Wilks' λ=.529, F(10, 3854)=144.4, p<.005, partial η²=.273), as well as a significant interaction between Focus and Syllable (Wilks' λ=.910, F(20, 6392)=9.2, p<.005, partial η²=.023). Since Intensity and Vowel Centralization were not significant for the interaction between Syllable and Focus, we do not discuss these measurements further, and consider only the significant results for Duration, mean F0, and F0 change (i.e., contour). We first consider the patterns observed in the Non-Focus condition as a baseline for the stress properties, since they are not combined with, or obscured by, additional focus properties. After examining the Non-Focus results, we discuss, for each measurement separately, the differences between these findings and those of the two focus conditions.
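The reported partial η² values appear consistent with the common conversion from Wilks' λ, η² = 1 − λ^(1/s), where s is the smaller of the number of dependent variables (here 5) and the degrees of freedom of the effect (2 for Focus and for Syllable, 4 for their interaction). The sketch below simply applies that formula to the λ values above; it does not reproduce the MANOVA itself, and the function name is our own.

```python
def partial_eta_sq_from_wilks(wilks_lambda, n_dvs, df_effect):
    """Multivariate partial eta squared from Wilks' lambda:
    eta^2 = 1 - lambda ** (1 / s), where s = min(n_dvs, df_effect)."""
    s = min(n_dvs, df_effect)
    return 1 - wilks_lambda ** (1 / s)


N_DVS = 5  # duration, intensity, mean F0, F0 change, vowel centralization

# (Wilks' lambda, effect df) as reported above; 3 levels -> 2 df, interaction 2 x 2 -> 4 df
effects = {
    "Focus": (0.812, 2),
    "Syllable": (0.529, 2),
    "Focus x Syllable": (0.910, 4),
}

for name, (lam, df) in effects.items():
    print(f"{name}: partial eta^2 = {partial_eta_sq_from_wilks(lam, N_DVS, df):.3f}")
# Focus: partial eta^2 = 0.099
# Syllable: partial eta^2 = 0.273
# Focus x Syllable: partial eta^2 = 0.023
```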
In the Non-Focus condition, we found a significant ANOVA effect of Syllable for each of the measurements considered here (Duration: F(2, 669)=44.7, p<.005; mean F0: F(2, 648)=22.2, p<.005; F0 change: F(2, 641)=23.3, p<.005). As can be seen in Figure 2, duration increases gradually from the first to the third syllable, with all three syllables differing significantly from each other (p<.005). With regard to mean F0, the lowest value is on the final, presumably stressed, syllable (Syll3) (p<.005), while the first (Syll1) and second (Syll2) syllables do not differ statistically (p=.21). With regard to the F0 contours, all three syllables fall to some degree. Syll3 exhibits the largest fall (p<.005), which accounts for the low mean F0 in that position. Syll2 has the smallest fall from the beginning to the end of the syllable, but it is still significant (p<.05).

We may now examine each property separately in order to assess the effects of the two types of focus on the stressed syllable (Syll3). We also consider the effects the two focus types may have on the preceding syllables. In Figures 3-5, the results of the baseline NF condition are repeated, followed by the results for the CF and NIF conditions. Figure 3 shows the mean duration of the target vowels. As can be seen, the duration of Syll1 remains essentially unaffected by focus (F(2, 740)=.100, p=.91), while the durations of Syll2 and Syll3 increase in both focus conditions in comparison with NF: CF (p<.005) and NIF (p<.005). Moreover, as in NF, duration increases gradually from the first to the last syllable in the focus contexts, so Syll3 is the longest in all of the conditions (NF: …).

Turning now to mean F0, we compare the patterns in the two focus conditions with that of NF (Figure 4). Unlike duration, F0 increases with respect to the NF values in all syllables under both types of focus: CF (p<.005) and NIF (p<.005).
Interestingly, it is Syll2, not the word-final syllable (Syll3) presumed to bear lexical stress, that has the highest F0 under focus (CF: p<.05; NIF: p<.005). In fact, a somewhat similar pattern is also observed in the baseline NF condition, where the F0 of Syll3 is significantly lower than that of Syll1 (p<.005); the F0 of Syll2 is not, however, statistically different from that of Syll1 (p=.210). Syll3, by contrast, has the lowest F0 of the three syllables under NIF (p<.005), as it does without focus (p<.005), although, as noted, the F0 values in the NF condition are lower overall than those of both NIF and CF. The greatest increase in F0 on Syll3 is observed in CF, an effect that is significantly greater than that of the other focus condition, NIF (p<.005). Finally, as Figure 5 shows, all three conditions exhibit a considerable falling F0 contour on Syll3, while the other two syllables have relatively flatter contours (p<.005). Both types of focus show sharper F0 drops on Syll3 than NF (p<.005).

In sum, the two properties that appear to lend prominence to the final syllable, expected to bear the canonical primary stress, are duration and F0, specifically the steep falling F0 contour. As noted, though, Syll2 has higher mean F0 than Syll3 and longer duration than Syll1, so a full understanding of the acoustic properties of stress in Uzbek must also take this information into account. With regard to the effects of focus, while both Contrastive and New Information Focus generally enhance (increase) the acoustic properties across the board in the Uzbek stimuli, the increase in mean F0 on Syll3 associated with CF is substantially greater than that associated with NIF.

5. Discussion.
In the following sections, we discuss our results with respect to the hypotheses about Uzbek stress and focus. We also consider how Uzbek prosody compares to that of Turkish, and how it fits into the typology of Turkic languages more generally.

DOES UZBEK HAVE FINAL STRESS? The initial question addressed by our investigation is whether there is acoustic evidence for the canonical word stress pattern previously described for Uzbek (cf. Bidwell 1955, Sjoberg 1963, Bodrogligeti 2003, Bokhari & Washington 2015). To this end, we tested Hypothesis 1, that canonical (word-final) stress in the absence of additional focus properties would be manifested in Uzbek by enhancement of one or more of the acoustic properties typically associated with prominence. Only mean F0, F0 change, and duration showed significant effects of the interaction between syllable position and focus, so intensity and vowel centralization were not examined further. The presence of both the greatest duration and the sharpest falling F0 contour on the final syllable may be viewed as confirmation of Hypothesis 1. The mean F0 values, however, pose a potential challenge to this interpretation: the fact that mean F0 is higher on the penultimate syllable than on the final one, and highest on the penult under focus, suggests that this syllable may instead bear the lexical stress, which would disconfirm Hypothesis 1.
Since focus typically enhances the properties of the stressed syllable of a word, we also sought further insight into the presence of final stress in Uzbek by assessing whether the properties of the final syllable are preferentially enhanced in our focus conditions (Hypotheses 2 and 3). Overall, both types of focus examined, CF and NIF, tended to enhance all of the statistically significant acoustic properties across the syllables, the main exception being the duration of Syll1, which was unaffected by focus. Specifically with regard to the final syllable, we found that both its duration and the extent of its falling F0 contour increased with both types of focus, which we may take as support for Hypotheses 2 and 3.
Our findings also confirm Hypothesis 4, since there were some differences between the enhancement patterns of CF and NIF. As was seen, there is a greater increase in duration with CF than with NIF, and CF also produces a significantly greater increase in mean F0 on Syll3. By contrast, while CF and NIF show almost identical F0 increases on Syll2, NIF exhibits a sharper falling contour on Syll3 than CF.
Thus, overall, our findings may be taken as support for the presence of canonical final stress in Uzbek; however, several observations invite further consideration. As was pointed out, the (mean) F0 pattern at first glance suggests that it may be the penultimate, rather than the final, syllable that exhibits the most additional enhancement under focus. In addition, we note that Syll2 shows a significant increase in duration with respect to Syll1, as well as a significant enhancement of duration under focus. The question is whether these patterns are sufficient to challenge the analysis of Uzbek as a language with (canonical) final stress.
With regard to F0, it is crucial to consider that the final syllable exhibits a steep falling contour, so its mean F0 is not fully informative, particularly in comparison with that of Syll2. In fact, the high F0 on Syll2 may be interpreted as providing the necessary height to permit the sharp falling contour on the final syllable seen in both focus conditions. That is, rather than indicating that the penult bears the lexical stress, the higher F0 throughout Syll2, followed by an even higher starting point in Syll3, serves to amplify the falling contour on the final syllable, which is more modest in the NF condition. With regard to the increased duration on Syll2, what is most noteworthy is, again, its change relative to Syll1, rather than its difference from Syll3. At the same time, although Syll3 is consistently the longest syllable in all conditions, it is possible that its additional duration is, at least in part, attributable to word-final lengthening.
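Why a falling contour depresses the mean can be made concrete with a toy calculation. The sketch below uses invented F0 values (not our measurements) and assumes straight-line contours sampled evenly in time; the helper names are hypothetical. A syllable that starts higher than the penult but falls steeply ends up with a lower mean F0, while its F0 change (last-quarter mean minus first-quarter mean, as defined in Section 3.4) is far more negative.

```python
from statistics import mean


def linear_contour(start_hz, end_hz, n=20):
    """n evenly spaced F0 samples on a straight line from start_hz to end_hz."""
    step = (end_hz - start_hz) / (n - 1)
    return [start_hz + i * step for i in range(n)]


def f0_change(track):
    """Mean F0 of the last quarter minus mean F0 of the first quarter."""
    q = max(1, len(track) // 4)
    return mean(track[-q:]) - mean(track[:q])


# HYPOTHETICAL values chosen for illustration; not measurements from this study.
syll2 = linear_contour(230, 225)  # high, nearly level penult
syll3 = linear_contour(245, 165)  # higher onset, then a steep fall

print(round(mean(syll2), 1), round(mean(syll3), 1))            # 227.5 205.0
print(round(f0_change(syll2), 1), round(f0_change(syll3), 1))  # -3.9 -63.2
```

Although the toy Syll3 begins 15 Hz above the toy Syll2, its mean is over 20 Hz lower, so a lower mean F0 on the final syllable is compatible with it carrying the most prominent F0 movement.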
Taken together, the patterns we observe in Uzbek do not lead us to reject final stress in favor of penultimate stress; however, they do suggest the possibility of a broader prominence pattern in the language. That is, since the focus enhancement appears to begin on the penultimate syllable, its domain may involve a two-syllable sequence, possibly a foot, at the right edge of the word.

COMPARISON WITH TURKISH.
Given that the only other Turkic language for which there is systematic acoustic prosodic data is Turkish, we are also interested in the extent to which the Uzbek manifestations of stress and focus compare to those of Turkish. Both languages are described similarly in terms of their word prosody, with canonical stress falling on the final syllable (e.g., Bokhari & Washington 2015, Kamali 2011); therefore, we may expect that the acoustic properties of the languages will be similar (Hypothesis 5).
Since the acoustic properties of Turkish stress and focus have previously been examined in our larger cross-linguistic prosody project (Vogel et al. 2016) using the same methodology described for Uzbek, we can compare the findings for the two languages. The only differences between our Uzbek and Turkish studies are that in Turkish we did not examine the properties of the first syllable, and we compared only one focus condition, New Information Focus, with the Non-Focus pattern.⁹ In Figure 6, the baseline stress patterns, without focus properties, are shown for Uzbek (repeated from Figure 2) and Turkish.¹⁰ As can be seen, the manifestations of stress in Uzbek and Turkish are not the same. There is no duration difference between Syll2 and Syll3 in Turkish, while in Uzbek, the considerably greater duration of Syll3 may be a manifestation of stress on that syllable. Also, where Uzbek seems to cue final stress with a distinct falling F0 contour on Syll3, Turkish exhibits only a small falling F0 contour. There is, however, a slight rise on Syll2 in Turkish, whereas no rise was observed in Uzbek. With regard to mean F0, although the difference is not very large, Syll3 in Turkish has a higher F0 than Syll2, suggesting that this may be the manifestation of final stress, a pattern that we do not observe in Uzbek.¹¹

Figure 6. Uzbek and Turkish Non-Focus Condition. Z-scores for Duration, mean F0, and F0 contours in each syllable position. The Uzbek data are repeated from Figure 2.

9 Turkish was one of the first languages investigated in the Prosodic Typologies Lab, and since we were focusing our attention on the stressed syllables of a language, we did not consider syllable 1. For a fuller comparison with the languages investigated subsequently, including Uzbek, it will be important to collect additional data.
10 In the Turkish analysis, intensity and vowel centralization showed at best only weakly significant effects, and are thus not considered in the comparison with Uzbek.
11 The F0 and F0 contour differences in Turkish were found to be weakly significant in the Binary Logistic Regression Analyses presented in Vogel et al. (2016).

Figure 7 shows the acoustic patterns associated with NIF in Uzbek and Turkish, to be viewed in comparison with the stress properties without focus shown in Figure 6. As with lexical stress, NIF is not manifested in the same way in the two languages. First, in Uzbek the duration of the final syllable increases under focus, but in Turkish the duration is similar in both conditions. With regard to F0, in Uzbek the means increase on both Syll2 and Syll3 under NIF, but in Turkish Syll2 does not change, and Syll3 actually shows a decrease. The F0 contour in Turkish shows the same slight rise on Syll2 with NIF as in the absence of focus, while in Uzbek we see a similar small fall in both the NIF and NF conditions. The steeper word-final F0 fall under NIF in Uzbek, however, is not seen in Turkish, where there is only a slight fall, as in the NF condition.

The F0 and duration properties observed with the canonical (word-final) stress patterns in Uzbek and Turkish are summarized in Table 3, and a summary of the changes in these properties due to NIF (compared to NF) is provided in Table 4. The properties are also assessed as potential acoustic cues for word-final lexical stress or NIF: "✓" indicates that the property is a likely stress or focus cue; "✗" indicates that it is not such a cue.
Table 3. F0 and duration properties of canonical (word-final) stress in Uzbek and Turkish
                Uzbek                           Turkish
Duration        longest on Syll3 (✓)            no difference from Syll2 (✗)
F0 (mean)       lower on Syll3 (✗)              higher on Syll3 (✓)
F0 contour      fall on Syll3 (✓)               essentially flat (✗)

Uzbek and Turkish belong to the same (Turkic) language family and are described as having essentially the same type of stress system (i.e., canonical final stress with certain types of exceptions); however, it turns out that the acoustic properties of stress in the two languages differ considerably. Turkish, like other languages with predictable stress, tends to have relatively weak stress cues (e.g., Athanasopoulou et al. 2017, Vogel 2020), with only a minimal acoustic distinction between the final stressed syllable and the unstressed penult. That is, it has been found that in languages with predictable stress, since the position of word stress is independently known, there is less need to enhance the stressed syllable than in languages with unpredictable stress such as Spanish and Greek (e.g., Vogel et al. 2016, Ortega-Llebaría & Prieto 2010, Arvaniti 2007). Given that Uzbek stress, like that of Turkish, is described as predictable, we could expect it to exhibit a similarly weak manifestation, as we could for other Turkic languages described as having related stress systems. What we observed in Uzbek, instead, is not only the use of acoustic properties different from those of Turkish, but also a somewhat stronger presence of the properties corresponding to canonical (final) stress, as well as more substantial enhancement under focus. As seen in the previous sections, the main stress cues in Uzbek are increased duration and a falling F0 contour on the final syllable, both of which are enhanced under focus. While these observations support the possibility of stress in that position, it must be noted that the same properties (final lengthening and F0 fall) are also often associated with boundary marking (Ladd 2008).
Indeed, a similar challenge of teasing apart final stress and boundary properties will arise for all Turkic languages purported to have final stress. Nevertheless, at least with regard to Uzbek, we suggest that since the properties in question are observed under all three of our information structure conditions, the boundary effect interpretation is less likely to be accurate. That is, since we may expect different types of boundaries to be present in different information conditions, we might similarly expect somewhat different acoustic properties to be present in the different conditions, something that is not observed.
Finally, as pointed out above, the combination of the higher F0 on the penult and the falling contour on the final syllable, along with the increase in duration on the penult and final syllable, especially under focus, suggests the possibility of a broader, shared distribution of the prominence properties of Uzbek, rather than their restriction to a single word-final stress position. It is possible that such a disyllabic pattern may ultimately lead to an increased role of the penultimate syllable in expressing lexical prominence, or even to a gradual shift towards penultimate stress in Uzbek, beyond the cases of prestressing suffixes and other stress irregularities. Given the similarities across the stress systems of Turkic languages as a group, it is thus crucial that future studies consider the acoustic properties of stress not only on word-final syllables, but also (at least) on the penult, even where canonical stress is expected on the final syllable.