Spanish-influenced rhythm in Miami English

This study found that monolingual English speakers from Miami speak an English variety influenced by Spanish. Speech from Miami English monolinguals, English monolinguals not from Miami, and early and late Spanish-English bilinguals was collected, and rhythm metrics (Ramus et al., 1999) were compared between groups. Surprisingly, results also suggest that Miami English monolinguals with English-speaking parents and from neighborhoods with a lower Hispanic population may be leading this change. These results support Labov’s (2014) claim that children may reject features of their parent language (in this case, English) when the speech community is highly stratified.

The language groups investigated in this study are (1) English monolinguals from Miami,³ (2) English monolinguals from Ithaca,⁴ (3) Spanish-English bilinguals who are from Miami and learned English at an early age, and (4) Spanish-English bilinguals who are not from Miami and learned English at a later age. Henceforth, these groups are referred to as MEMs (Miami English Monolinguals), IEMs (Ithaca English Monolinguals), EBs (Early Spanish-English Bilinguals), and LBs (Late Spanish-English Bilinguals), respectively.
To compare the rhythm of these linguistic groups, read speech was collected and compared using rhythm metrics (Ramus et al., 1999). Results show that MEMs' rhythm differs from that of (non-Miami) English and is comparable to that of Spanish-English bilingual speech. Surprisingly, results further suggest that MEMs with English-speaking parents (MEME) and from neighborhoods with a lower Hispanic population (MEML), who likely have less direct contact with Spanish than MEMs with Spanish-speaking parents (MEMS) or from neighborhoods with a higher Hispanic population (MEMH), may be leading this change.
Consequently, this study provides evidence for Labov's (2014) claim that children may reject features of their parent language (in this case, English) when the speech community is highly stratified. This study argues that frequent contact between English and Spanish speakers in Miami, together with the social, political, and economic prominence of Spanish in Miami (Lynch, 2000), is causing Miami English to acquire Spanish-influenced prosodic properties. Further, it sheds light on how language contact can influence prosody, creating new language varieties in diverse speech communities.
In the remainder of this paper, I briefly discuss relevant literature on rhythm, bilinguals and Miami (Section 2). In Sections 3 and 4, I describe the study and present results. Last, in Section 5, I discuss the results and conclude.

2. Background.
2.1. RHYTHM: PROBLEMS WITH CATEGORIZING AND MEASURING IT. While previous attempts have been made to categorize speech rhythm, there has been much disagreement about how to do so, despite obvious cross-linguistic differences. Still, the notion of rhythm classes has appeared repeatedly throughout the literature. According to Arvaniti (2012), "Despite the lack of evidence to support it, the notion of rhythmic classes has remained popular and has been relied upon in research in phonology (e.g., Nespor & Vogel 1989, Nespor 1990, Coetzee & Wissing 2007) and especially in research in language acquisition and speech processing" (4). Nava (2011) defines rhythm as "the regular occurrence of a beat event, such that there is a perceived patterning of 'heavy' (or strong) and 'light' (or weak) elements, and this perception results from the acoustic correlates, such as duration, pitch, intensity, and spectral quality, associated with stressed versus unstressed syllables" (84). Based on these acoustic correlates, languages have been characterized in the past as having one of two rhythms, syllable-timed or stress-timed, and more recently a third, mora-timed (Pike 1945, Abercrombie 1967, Ramus et al. 1999). Romance languages, like Spanish or Italian, were labeled as syllable-timed because of their 'machine-gun rhythm.' Germanic languages, like English or Dutch, were considered stress-timed because of their 'Morse code rhythm.' Languages like Japanese and Tamil were labeled as mora-timed.
However, Dauer (1987) proposed a "continuous uni-dimensional model of rhythm, with typical stress-timed and syllable-timed languages at either end of the continuum" (Ramus et al. 1999:268-269, Nava 2011). Under this approach, the phonetic and phonological properties of a language, such as syllable structure and the presence of vowel reduction, influence the rhythm of a language. The more a language possesses characteristic properties of a stress-timed or syllable-timed rhythm, the more stress-timed or syllable-timed the language is considered to be along the continuum. English, for example, is considered to be stress-timed because it has vowel reduction and a highly varied syllable structure inventory. By contrast, Spanish is syllable-timed because it does not have vowel reduction and has much less syllable structure variety. Catalan, however, falls somewhere between syllable-timed and stress-timed rhythm on the rhythm continuum. While Catalan has a syllable structure similar to Spanish, it also has vowel reduction similar to English. Therefore, it is not possible to label Catalan as strictly syllable-timed or stress-timed. Similarly, Levelt & van de Vijver (1998) proposed that there are five classes of rhythm, rather than the basic three (Ramus et al. 1999). "Three of these classes seem to correspond to the three rhythmic classes described in the literature. It might very well be that the other two classes, both containing less studied languages, have characteristic rhythms, pointing to the possibility that there are more rhythmic classes rather than a continuum" (Ramus et al. 1999:269). Thus, categorizing rhythm is not a straightforward task, which makes categorizing the rhythm of Miami English somewhat difficult and arbitrary.
Similarly, how to analyze and measure rhythm is still under debate. Various methods, called into question by Tilsen & Johnson (2008), have been utilized across studies, such as Grabe & Low's (2002) PVI, Nava's (2011) comparison of stressed versus unstressed syllable durations, and Wagner & Dellwo's (2004) YARD (Yet Another Rhythm Determination) (Arvaniti 2012). However, the measurement used in this study, and thus the one relevant for discussion here, is Ramus et al.'s (1999) %V and ∆C. Ramus et al. (1999) measure and analyze rhythm by segmenting speech into vocalic and consonantal intervals. The duration of an utterance is equal to the sum of the durations of its vocalic and consonantal intervals; silence durations are excluded. "A vocalic interval is located between the onset and the offset of a vowel, or of a cluster of vowels, . . . [and] a consonantal interval is located between the onset and the offset of a consonant, or of a cluster of consonants" (Ramus et al. 1999:271). An illustration of how 'its path high above' would be segmented according to this method is shown in Figure 1. The total durations of consonantal and vocalic intervals within an utterance are calculated using this segmented speech, along with the standard deviation of consonantal interval durations (∆C) and the proportion of the utterance's duration made up of vocalic intervals (%V). According to Ramus et al. (1999), results show that rhythm can be inferred from %V and ∆C. Specifically, a higher ∆C demonstrates that a language permits heavier syllables because the number of consonants allowed in syllables is more flexible. A lower %V means that there is a greater consonant/vowel ratio. Thus, Ramus et al.'s results show that English has a lower %V and a higher ∆C than Spanish.
These measurements are reliable for Spanish and English due to differences in syllable length. For instance, vowel reduction occurs frequently in English. Consequently, "stressed syllables are 50% longer than unstressed in English, whereas in Spanish that difference is only 10%" (Nava 2011:89). Acquiring this feature can be troublesome for second-language learners of English, as several studies have shown. Wenk (1985), for instance, studied the acquisition of English rhythm by L1 French speakers, looking at pitch changes and vowel duration (Nava 2011). "In describing the L2 acquirer's 'rhythmic interlanguage', he isolates the acquisition of vowel reduction as key in moving from one type of rhythm to the other" (Nava 2011:81). Similarly, Adams & Munro (1978) examined the production of English stress and rhythm by L1 speakers of English versus L2 speakers of English whose L1 was one of "various Asian languages" (Nava 2011:82). In their study, they discovered that the non-native speakers of English consistently produced longer vowels in unstressed syllables than native English speakers. Similarly, Carter (2005) and Gut (2003) found vowel reduction/deletion to be infrequent among L2 learners whose L1 is Spanish or a Romance language, respectively (Nava 2011). Thus, it is expected that L2 learners of English living in Miami will have trouble adopting English stress patterns and the corresponding vowel reduction.
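The two metrics described above can be sketched in a few lines of Python. This is only an illustration, not the study's own script: the interval labels ("V", "C", "S"), the function name, and the durations are hypothetical, and the use of the population standard deviation for ∆C is an assumption, since the text does not say which estimator was used.

```python
from statistics import pstdev

def rhythm_metrics(intervals):
    """Return (%V, delta C) for one utterance.

    `intervals` is a list of (label, duration_in_seconds) pairs.
    The labels are hypothetical: "V" = vocalic, "C" = consonantal,
    "S" = silence.
    """
    vocalic = [dur for label, dur in intervals if label == "V"]
    consonantal = [dur for label, dur in intervals if label == "C"]
    # Utterance duration is the sum of vocalic and consonantal
    # intervals only; silences are excluded.
    total = sum(vocalic) + sum(consonantal)
    percent_v = 100 * sum(vocalic) / total
    # Assumption: population standard deviation of the consonantal
    # interval durations.
    delta_c = pstdev(consonantal)
    return percent_v, delta_c

# Invented durations, for illustration only.
example = [("V", 0.10), ("C", 0.20), ("V", 0.10), ("C", 0.10), ("S", 0.50)]
percent_v, delta_c = rhythm_metrics(example)
```

In this sketch, an utterance with longer or more variable consonant clusters yields a higher ∆C and a lower %V, which is the direction Ramus et al. report for English relative to Spanish.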
As shown, categorizing and measuring rhythm are not straightforward tasks. Fortunately, for the purposes of this study, it is not imperative to resolve the rhythm debate, as the goal of this study is not to assess the cross-linguistic validity of Ramus et al.'s (1999) measurement. Rather, this study aims to compare vowel and consonant durations of several linguistic groups who are all uttering the same passage: a portion of "The Rainbow Passage" in English. Thus, English is being compared to English, not English to Spanish. For this purpose, Ramus et al.'s measurement is sufficient.
2.2. BILINGUALS: THEY ARE DIFFICULT TO CATEGORIZE TOO. There are two groups of bilinguals in this study: EBs and LBs. However, the bilinguals tested in this study do not fit into two sharply distinct groups. Rather, because of their wide variety of language experience, each bilingual is different. It is as Holt wrote: "We do not yet have a complete understanding of how speech categories [or an L2 in this case] are learned in infancy or adulthood. At least part of the reason for this is that it is not feasible to entirely control and manipulate speech experience" (2011:350). Despite this, attempts have been made to account for and categorize the differences in bilingual speech production and perception.
For example, according to Escudero (2011), there are two types of bilinguals: sequential and simultaneous bilinguals. Sequential bilinguals are second-language learners, people who learned a second language after learning their first language. Simultaneous bilinguals are bilinguals who learned two languages at the same time. Studies have shown that these two bilingual groups differ in speech perception and production abilities and that simultaneous bilinguals are able to achieve monolingual-like L2 speech perception (Escudero, 2011).
However, according to Holt (2011), sequential bilinguals can still achieve native-like perception (and, hopefully, production as a consequence) (351). Therefore, depending on the amount of L1 and L2 input, a sequential bilingual, especially one who learned their L2 at an early age, can perform their L2 with native-like accuracy. Based on these assumptions, three major factors will likely influence this study's participants' speech production: whether or not the participant learned his/her L1 and L2 simultaneously,⁵ whether or not the participant learned his/her L2 at an early age, and the amount of L1 versus L2 input the participant receives.
Thus, we can think about bilingual speech production as a continuum, where monolingual speakers of languages X and Y are at each end of the continuum, respectively, and the X-Y bilinguals fall somewhere in between, depending on their language experience. In this study, the amount of Spanish input will likely vary depending on the demographics of a participant's neighborhood. For example, a sequential bilingual who lives in Hialeah (94.7% Hispanic) may have had more Spanish input than a sequential bilingual living in Aventura (35.8% Hispanic). Similarly, an MEM who lives in Hialeah will likely have greater Spanish input than a monolingual living in Aventura, and an English monolingual from any part of Miami will likely have greater Spanish input than an IEM. Thus, a speech continuum for the 4 linguistic groups in this study would likely look like Figure 2, with IEMs on one end (the most like an English monolingual, the least Spanish input) and LBs on the other (the most like a Spanish monolingual, the most Spanish input).
According to Fradd (1996), "Miami has more Spanish-language television channels, radio stations, and newspapers than the cities of Los Angeles and New York combined," and "Miami controls 43% of all U.S. trade with the Caribbean, 28% of all U.S. trade with South America, and almost half of all the trade with Central America" (Lynch 2000:274). Since 1973, almost every Miami mayor has been Hispanic, and every mayor serving his or her first term since 1985 has been Cuban-born (Joyner 2008).
Because Hispanics hold high social, economic, and political positions, there is less stigma, if any, against having a Spanish accent in Miami. According to Lynch (2000), Miami is the only major metropolitan area in the world where Spanish and English compete for social, economic, and political prevalence. A complete shift to English at the expense of bilingualism appears not to be a requirement for achieving the American dream in South Florida; as Gustavo Pérez-Firmat (1994) writes, "Sometimes the American dream is written in Spanglish" (272). Because of this, we can assume that Spanish speakers in Miami use their L1 more frequently than Spanish speakers in most other U.S. cities, so that L1 input for Miami bilinguals is high and English monolinguals frequently come into contact with Spanish.

3. The Study.
3.1. RESEARCH QUESTIONS. This study aims to answer the following questions:
The materials were presented to participants in the following order: a language-background questionnaire and reading passage "The North Wind and the Sun" (of Aesop's Fables). When presented with the reading passage, participants were asked to read the materials once to themselves and then once aloud; the readings were audio recorded. Of the materials, the language-background questionnaire and the first eight lines of "The Rainbow Passage" were used in the present analysis; these eight lines are presented below:
When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. Throughout the centuries people have explained the rainbow in various ways. Some have accepted it as a miracle without physical explanation.
7 Testing within this age range helps to ensure that any results used as evidence for or against the existence of an emerging Miami dialect are characteristic of a younger generation of Miami-English speakers. Additionally, focusing on younger speakers means major shifts in their demographics are less likely.
3.5. ANALYSIS. Each participant's recording was segmented into vocalic and consonantal intervals, silences, and disfluencies in Praat; syllabic consonants were labeled as vowels and disfluency durations were not counted in duration totals.⁸ Then, the resulting text grids were read and analyzed in MATLAB. Figure 1, above, illustrates how 'its path high above' was segmented in this study. The data were analyzed by utterance.⁹ In this analysis, vocalic and consonantal interval durations were added to the total duration of an utterance until there was a silence of 200 ms or greater. According to Thomas & Carter (2006), "Butterworth (1980) notes that 200 ms has become a standard minimum threshold for a pause in studies of pauses" (340). Assuming this standard, the duration of a new utterance began to be calculated after any pause of 200 ms or more, resulting in multiple %V and ∆C values for each participant.
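The by-utterance procedure described above can be sketched as follows. The study's analysis was carried out in Matlab on Praat text grids; this Python version is only a hedged illustration, and the labels ("V", "C", "S" for silence, "D" for disfluency) and the function name are hypothetical.

```python
PAUSE_THRESHOLD = 0.200  # seconds; the 200 ms minimum pause described above

def split_utterances(intervals, threshold=PAUSE_THRESHOLD):
    """Split a labeled interval sequence into utterances.

    `intervals` is a list of (label, duration_in_seconds) pairs with
    hypothetical labels "V", "C", "S" (silence), and "D" (disfluency).
    A silence at or above the threshold closes the current utterance.
    """
    utterances, current = [], []
    for label, dur in intervals:
        if label == "S" and dur >= threshold:
            # A long pause ends the utterance collected so far.
            if current:
                utterances.append(current)
            current = []
        elif label in ("V", "C"):
            current.append((label, dur))
        # Shorter silences and disfluencies are skipped, so they never
        # contribute to an utterance's duration total.
    if current:
        utterances.append(current)
    return utterances

# Invented durations; the 0.476 s silence splits the stretch in two.
recording = [("C", 0.05), ("V", 0.10), ("S", 0.10), ("C", 0.07),
             ("S", 0.476), ("V", 0.09), ("D", 0.30), ("C", 0.06)]
utterances = split_utterances(recording)
```

Each returned utterance contains only vocalic and consonantal intervals, so %V and ∆C can be computed on it directly.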
For example, Figure 3 shows a stretch of speech where a speaker pauses (labeled 'S') for more than 200 ms (476 ms). This silence falls between 'the rainbow is a division of white light into many beautiful colors' and 'these take the shape of a long round arch . . .' The consonantal and vocalic intervals before that silence (and after the preceding silence) are equal to one total utterance duration, and the intervals after that silence (and before the next silence) are equal to another total utterance duration. For each utterance, the proportion of vocalic intervals (%V) and the standard deviation of consonantal intervals (∆C) were calculated.
8 Disfluency durations were not included to ensure that participants' recordings included exactly the same stretch of speech.
9 For by-subject analysis, see Enzinna (2015), linked in Footnote 2.
Table 3. For results in this section, all data more than two standard deviations from the mean were removed from the analysis. As a result, of the 519 total values for %V and for ∆C, 488 %V values and 496 ∆C values were used in the analysis.
IEMs (48.23, 5.80), at the .05 level of significance. An explanation for why LBs' %V is lower than expected is provided in Section 4.4. For those reasons, I remove LBs from subsequent statistical analyses. All other comparisons were not significant. The significant relationships are displayed in Figure 4; the linguistic group in blue differs significantly from the linguistic groups in red. Next, two-sample t-tests were conducted to compare %V for IEM, MEM, and EB groups. The t-tests revealed that IEMs and MEMs differed significantly from each other, t(209) = -3.11, p = .002; but MEMs and EBs did not, t(233) = -1.63, p = .102. IEMs and EBs had a near-significant difference in %V, t(216) = -1.76, p = .079.
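The outlier removal and two-sample t-tests reported above can be illustrated with a short Python sketch. It rests on two assumptions that the text does not confirm: that the two-standard-deviation cutoff is applied to whatever list of values is passed in, and that the t-test uses a pooled variance with n1 + n2 - 2 degrees of freedom. The function names and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def trim_outliers(values, n_sd=2.0):
    """Keep values within n_sd sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (an assumption).

    Returns (t, degrees_of_freedom), where df = len(a) + len(b) - 2.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Invented %V values for two groups, for illustration only.
group_a = trim_outliers([48.1, 47.5, 49.0, 48.4, 47.9, 48.8])
group_b = trim_outliers([50.2, 49.6, 51.0, 50.5, 49.9, 50.7])
t_value, df = pooled_t(group_a, group_b)
```

A p-value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package.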

A one-way analysis of variance was conducted to compare ∆C across all 4 linguistic groups. The ANOVA did not yield a significant difference between the linguistic groups, F(3, 492) = 0.99, MSE = 0.00, p = .397.
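As an illustration of the test reported above, a one-way ANOVA F statistic can be computed as in the following Python sketch. The function name and the sample values are hypothetical; the study's own computation was done in Matlab. Note that with four groups and 496 ∆C values, the degrees of freedom are 4 - 1 = 3 and 496 - 4 = 492, which agrees with the reported F(3, 492).

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of value lists."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Invented values for three groups, for illustration only.
f_value, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

An F value close to 1, as in the reported result, indicates that the group means vary about as much as would be expected from the variation within groups.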
In Figure 5, the mean %V and ∆C for each participant and the mean and standard error of %V and ∆C for each linguistic group are displayed. Results suggest that parent language may be an influencing factor for MEMs' %V, but not for ∆C. Specifically, MEMEs have a greater %V than IEMs and EBs, but MEMSs do not.
The mean %V and ∆C for each linguistic group are shown in Table 4. All data more than two standard deviations from the mean were removed from the analysis. As a result, of the 349 total values for %V and for ∆C, 316 %V values and 341 ∆C values were used in the analysis.
Figure 5; the linguistic group in blue differs significantly from the linguistic groups in red.
Of interest is whether the number of Hispanics in a particular area, and thus Spanish speakers, influences %V and ∆C results. To determine this, MEM and EB participants were divided into two groups: participants from an area with a high Hispanic population (H Area) and participants from an area with a low Hispanic population (L Area). Accordingly, MEMs were split into two groups, MEMHs¹¹ (3 participants) and MEMLs (7 participants), and EBs were split into two groups, EBHs (5 participants) and EBLs (5 participants). A participant was described as living in an H Area if they had lived for at least a year in an area whose population is 50% Hispanic or greater; otherwise, the participant was described as living in an L Area.¹² These 4 groups are compared with IEMs.

Results suggest that neighborhood demographics do influence rhythm: MEMs and EBs from L Areas have significantly greater %V than MEMs and EBs from H Areas and IEMs. Additionally, results show that MEMLs have a significantly lower ∆C than IEMs.
The mean %V and ∆C for each linguistic group are shown in Table 5. All data more than two standard deviations from the mean were removed from the analysis. As a result, of the 349 total values for %V and for ∆C, 316 %V values and 341 ∆C values were used in the analysis.
Table 4. Mean (and standard deviation) %V, ∆C for linguistic and neighborhood groups
Two-sample t-tests compared %V for IEMs, MEMLs, MEMHs, EBLs, and EBHs. The t-tests revealed a near-significant difference between MEMLs and MEMHs, t(112) = -1.88, p = .062, and significant differences between EBLs and EBHs, t(119) = -2.13, p = .035; IEMs and MEMLs, t(164) = 4.10, p < .001; IEMs and EBLs, t(157) = 2.72, p = .007; and MEMLs and EBHs, t(126) = -3.25, p = .001. No significant differences were found between the following groups: IEMs and MEMHs, t(140) = 1.06, p = .290; IEMs and EBHs, t(154) = 0.21, p = .830; MEMLs and EBLs, t(129) = -1.20, p = .230; MEMHs and EBLs, t(105) = 0.93, p = .352; MEMHs and EBHs, t(102) = -0.74, p = .457. The significant relationships for MEML are displayed in Figure 6, and the significant relationships for EBL are displayed in Figure 7; the linguistic group in blue differs significantly from the linguistic groups in red.
First, as reported in Section 4.1, LBs have a significantly lower %V than MEMs. This result is unexpected because Spanish has a higher %V than English (Ramus et al. 1999). To account for this difference, the durations of two consonant clusters ('str' in 'strikes' and 'ndr' in 'raindrops') and two reduced vowels (the first 'a' in 'apparently' and 'i' in 'beautiful')¹³ were measured in the speech of all 40 participants. The results show that LBs have the greatest mean duration of consonant clusters; this suggests that LBs have difficulty producing consonant clusters, increasing %C (the proportion of consonantal intervals) and lowering %V as a result. Second, the results show that LBs have the greatest mean duration of (what is expected to be) reduced vowels.
This finding supports the claim that L2 learners of English have difficulty acquiring and producing English stress patterns (Nava 2011). The mean durations (in seconds) are presented in the
Third, the number of total intervals (including consonants, vowels, and silences) and the number of silences correlate with ∆C for LBs.¹⁴ A Pearson product-moment correlation coefficient was computed to assess the relationship between ∆C and the number of total intervals, showing a positive correlation, r = .7384, p = .014. A Pearson product-moment correlation coefficient was also computed to assess the relationship between ∆C and the number of silences. Again, there was a positive correlation, r = .7290, p = .0168. Thus, a greater number of intervals and silences correlated with a higher ∆C.¹⁵ Therefore, these results suggest that LBs who have difficulty producing L2 speech have difficulty producing consonant clusters, resulting in a higher ∆C and a lower %V than predicted.
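The Pearson product-moment correlation used above can be sketched in Python as follows. The function name and the sample values are hypothetical and serve only to show how the coefficient is obtained from paired observations, such as each participant's ∆C and number of intervals.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the summed squared deviations.
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cross / (sx * sy)

# Invented paired values: number of intervals and delta C per speaker.
n_intervals = [210, 225, 240, 260, 275]
delta_c = [0.050, 0.054, 0.053, 0.061, 0.064]
r = pearson_r(n_intervals, delta_c)
```

A value of r near +1 means that speakers with more intervals also tend to have a higher ∆C, which is the direction of the relationship reported for LBs.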

5. Discussion & Conclusions.
5.1. MIAMI ENGLISH RHYTHM COMPARISON. When examining the influence of two factors, parent language and Miami neighborhood demographics, on Miami English rhythm, a trend emerged: MEMs and EBs with (likely) less Spanish input (MEMEs, MEMLs, EBLs) have a higher %V than IEMs, while their counterparts (MEMSs, MEMHs, EBHs) do not. Moreover, in some cases, these low-input groups have a higher %V than their counterparts. These results differ from Hypothesis 2, which predicts that MEMs and EBs with more Spanish input have a higher %V and lower ∆C than those with less Spanish input.
Regarding parent language, results suggest that MEMEs have a greater %V than IEMs, but MEMSs do not. Regarding neighborhood demographics, MEMLs and EBLs have a significantly greater %V than MEMHs, EBHs, and IEMs. Additionally, MEMLs have a significantly lower ∆C than IEMs. These results suggest that MEMs and EBs with less access to the dominant speech community, with regard to proximity and familial connections, are leading this dialectal change.
Regarding language change in diverse speech communities, Labov (2014) argues that "children may or may not adopt features of parental language, depending on how these features match the features of the speech community. Children may reject the patterns of parental language and conform to the patterns of the surrounding community instead, especially in richly stratified societies whose members belong to different social and dialectal groups" (Celata & Calamai 2014:3). MEMs may be adopting Spanish prosodic characteristics for the same reason. As discussed in Section 2.3, Hispanics in Miami hold high social, economic, and political positions. Thus, speaking Spanish may assist in creating economic or social connections. Additionally, it is inevitable that most MEMs and Miami EBs have Spanish-speaking friends. As a result, MEMs and EBs may (unconsciously) adopt Spanish prosodic features in order to assimilate into the dominant speech community, particularly when parent language and/or neighborhood do not provide them with immediate connections to that community.
14 There is no such correlation for other linguistic groups.
15 Correlation figures can be viewed in Enzinna (2015), linked in Footnote 2.
5.2. UNEXPECTED LATE BILINGUAL RESULTS. As reported in Section 4.1, LBs have a significantly lower %V than MEMs. This result does not support Prediction 1b, that LBs will have the greatest %V of all 4 linguistic groups. This prediction assumed that L1 prosodic features would carry over into the L2. However, several durational comparisons from this study suggest that LBs have difficulty reading and/or producing L2 speech, likely causing these unexpected results.
For example, LBs have the greatest mean duration for all consonant duration comparisons. In this study, LBs have the greatest mean duration of consonantal intervals and the greatest mean duration of consonant clusters. These results suggest that the LBs have difficulty producing L2 consonants/consonant clusters.
Additionally, in this study, LBs have the greatest mean total silence duration and the greatest mean number of silences, as well as the greatest mean duration of all intervals. Further, LBs' total number of intervals and total number of silences correlate with ∆C: a greater ∆C occurred with a higher number of intervals and silences. All of these findings support the notion that LBs' L2 speech is affected by L2 reading and/or production difficulties.
As shown in Stockmal, Markus, & Bond (2005), who examined the rhythm of Latvian when spoken by native speakers and proficient and non-proficient learners, L2 production is affected by slower reading times: "Proficient [L2] learners read somewhat more slowly, while the nonproficient learners were the slowest as one would expect . . . ∆C decreased with increasing speaking rate but there was no tendency for %V to increase" (61). However, unlike in Stockmal, Markus, & Bond's study, %V was affected by production/reading difficulties in this study (Section 4.1).
5.3. CONCLUSION. Results from this study suggest that Miami is developing its own variety of English, one with Spanish-influenced rhythm (a higher %V), and that MEMs with less Spanish input are driving this dialectal change. This finding supports Labov's (2014) claim that children may adopt features of the dominant speech community and reject features of their parental language when the speech community is highly stratified. In this case, MEMs with English-speaking parents are rejecting English %V and adopting Spanish %V. The results in this study further extend this claim to neighborhood demographics: a speaker will adopt features of the dominant speech community when the speaker has less direct access to that community. In this case, English monolinguals living in Miami, a city with a high Hispanic population, adopt features of Spanish when they live in areas of Miami with a lower Hispanic population.