Diphthongs are micro-feet: Prominence and sonority in the nucleus*

Diphthongs are usually described as falling or rising in sonority rather than in prominence. Martínez-Paricio & Torres-Tamarit (2013), however, have recently analyzed the glide in diphthongs with a sonority plateau, i.e., iu and ui in Spanish (iú, uí) and Catalan (íu, úi), as the result of a reranking of prominence alignment constraints: trochaic prominence yields the Catalan pattern and iambic prominence the Spanish one. We push this idea further and claim that diphthongs are generally structured like metrical feet, which we call micro-feet to distinguish them from their larger cousins. We find evidence for left-prominent trochees, right-prominent iambs, and sonority-sensitive micro-feet with prominence on the most sonorous vowel (ái, iá). In sonority-sensitive systems, diphthongs with a sonority plateau can default to iambs (Spanish) or trochees (Catalan). While metrical feet can be quantity-sensitive or quality-sensitive (Kenstowicz 1997, Zec 2003), we propose that micro-feet can only be quality-sensitive (sonority-sensitive), and are never quantity-sensitive. Length only correlates with prominence in diphthongs as a phonetic correlate of stress (as in Cangnang Southern Min Chinese, Hu & Ge 2016). This difference between metrical feet and micro-feet, we argue, stems from the fact that a nucleus can have maximally two moras. With this restriction, length contrasts in diphthongs, as attested in Estonian, Saami or Seri (Marlett 1981, 1988) and many other languages, are surprising at first. To account for this, we follow Broselow (1992) and others in assuming that a length distinction in diphthongs is represented as mora-sharing. In short diphthongs, both vowels share one mora; in long/heavy diphthongs, each vowel has a mora; and in super-heavy diphthongs, the long vowel shares the second mora with the short vowel.
Any constraint that picks a two-mora sequence for stress/prominence assignment over a one-mora unit thus cannot decide between the long and the short vowel in a diphthong, since the long vowel shares one mora with the short vowel. This analysis also explains an interesting prominence reversal in Saami diphthongs. Short diphthongs are quality-sensitive (ié), while long diphthongs are trochaic (íe).
If diphthongs are micro-feet, as we propose, the question arises whether these feet are parallel to the metrical feet in every language, i.e., whether the metrical foot type in a given language is the same as the micro-foot type. Seri, for example, has right-aligned quantity-sensitive trochees, and within its diphthongs prominence falls on the more sonorous vowel or defaults to the left in diphthongs with a sonority plateau; micro-feet thus parallel metrical feet, in that both are trochaic. In Spanish, on the other hand, stress feet are trochaic (e.g., Roca 2007) while diphthongs default to iambs: Spanish diphthongs are generally quality-sensitive (áj, já), but those with sonority plateaus are iambic (jú, wí, *iw, *uj). Thus Seri has trochaic metrical feet and defaults to trochaic micro-feet as well, while Spanish has trochaic metrical feet but defaults to iambic micro-feet.
A question that has been discussed in the literature is whether diphthongs have one segment or two (e.g., Kehrein 1994). As single segments they could be similar to affricates. We will not enter this discussion here. We focus instead on the relations between prominence, sonority and length. In language descriptions, one sometimes reads that all tautosyllabic vowel sequences are sequences of a vowel and a glide or vice versa and thus not true diphthongs. To extend the empirical coverage of our claims, we define any tautosyllabic sequence of vocoids as a diphthong (aj, ai, a͡i, etc.). Gliding is considered to be part of the phonetics of the prominence differential between the two parts of a diphthong.
There is sometimes reason to consider a prevocalic glide as an onset, of course, even if it is postconsonantal. Consider for example the glide in Italian fiore [fjore] 'flower'. It can be analyzed as a rising diphthong (ió) or as a complex onset (fj) followed by a monophthongal nucleus (o). In this particular case the glide developed historically from a lateral that constituted a complex onset with the preceding consonant and is thus more likely to be part of the onset than the nucleus. A further argument against an analysis of prevocalic glides as part of a diphthong is the non-existence of complex onsets before glide-vowel sequences in a language that displays complex onsets elsewhere. Mutatis mutandis, the glide at the end of vowel-glide sequences in Ulwa (Green 1999) can be either voiced or voiceless and is thus clearly a consonant, and we consider it to be in the coda, not in the nucleus. Consequently, Ulwa falling diphthongs are not "proper" diphthongs and do not enter the typology. See §4.2 for more discussion of Ulwa.
The rest of the paper is organized as follows. In §2 we introduce our claim that diphthongs are micro-feet and the expectations we derive from this. §3 exemplifies the two main foot types and §4 discusses the role of quality and quantity in micro-feet. In §5 we propose an OT analysis and §6 concludes.

The micro-foot typology
Diphthongs have a very restricted set of options when it comes to the combinatorics of vowels and the location of prominence. Languages often restrict diphthongs by the backness, height, and rounding of the vowels involved: many only have diphthongs that combine a low and a high vowel, as in ai and au, and realize the low vowel with more prominence, reducing the high vowel in length and loudness (often transcribed ai̯ and au̯ or aj and aw). In this case we see a maximal distance in the sonority hierarchy. Low vowels are of higher sonority than high vowels and mid vowels are intermediate. In addition, prominence aligns with sonority. In this paper we look into how sonority and the location of prominence combine. Apart from sonority alignment, i.e., prominence on the most sonorous vowel, prominence can align with the left or the right edge of the syllable nucleus. We call such left-aligned prominence trochaic micro-footing and the corresponding right-aligned prominence iambic micro-footing. Either foot type can combine with sonority sensitivity. We expect sonority-sensitive diphthong systems to show whether they are iambic or trochaic in diphthongs with a sonority plateau.
The iambic-trochaic law (Hayes 1995 and references therein) connects higher intensity with group-initial prominence, i.e., trochaic feet, and increased length with group-final prominence, i.e., iambic feet. Parker (2002) shows that sonority is directly connected to (intrinsic) intensity. Since sonority in the form of vowel height plays a major role in the choice of vowels that can be combined in a diphthong, one might conclude that diphthongs are preferably trochaic because of the connection between sonority and intensity. A second factor that could warp the typology of diphthongs in that direction is the restriction of moras in syllables. If syllable rhymes are maximally bimoraic, no vowel in a diphthong can be singled out as the prominent one in terms of mora count. Micro-feet thus cannot be quantity-sensitive in the way that iambic feet are claimed to be; they can only be quality-sensitive, i.e., sensitive to the inherent sonority of the vowels they contain. If we combine Parker's connection between intensity and sonority with the iambic-trochaic law, we could hypothesize that diphthongs are not only preferably trochaic, but also that trochaic diphthongs are quality-sensitive while iambic diphthongs are not. The iambic-trochaic law, however, has been challenged repeatedly (Kager 1993, van de Vijver 1998, Crowhurst 2016, Metzler & Driscoll 2018), and if it is not valid in phonology, we do not expect to find any iambic-trochaic law effects in the typology of diphthongs. Given this, our expected typology of diphthongs is summarized in (1).

Foot type
The two foot-type constraints IAMB and TROCHEE can be ranked with either dominating the other, determining whether diphthongs in a language are realized with more prominence (length, loudness, pitch) at the beginning or at the end. We discuss trochaic diphthongs in Portuguese, Hawaiian, Crow, and Hmong in §3.1, and iambic diphthongs in Estonian and Seward Peninsula Inupiaq in §3.2.

Trochaic prominence
We find trochaic patterns aligned with sonority (ái, áu), against sonority (ía, úa), and resolving ties in cases of sonority plateaus (íu, úi). While in the first type the high sonority vowel in the diphthong is also the prominent one, in the second type it is the low sonority vowel that is prominent, because it is the first part of the diphthong. We illustrate each type in turn.
Crow (Siouan, Graczyk 2007 and our fieldwork) has three diphthongs [iə uə eə], all ending in a schwa off-glide. Since schwa is not part of the contrastive vowel inventory of Crow we follow most Siouanists in assuming an underlying low vowel. Either way, these are trochaic diphthongs of rising sonority, as long as high [i, u] are less sonorous than low [a] or central [ə].
White Hmong (Hmong-Mien, Heimbach 1980 and our fieldwork) has five diphthongs of falling and rising sonority (ai, aɨ, au; ia, ua), all with prominence on the first element. As in Crow, some of these phonetically include centralized vowels (iə, uə), but this does not affect the sonority issues here or the analysis. Finnish has 18 diphthongs (Karlsson 1983:83, Pöchtrager 2006) with a trochaic difference in length: the first vowel is on average 110ms and the second only 60ms (Niemelä & Määttä 1998).
We assume that all these diphthongs are tautosyllabic bimoraic micro-feet.
(3) MicroTrochee: (x.) µµ

Iambic prominence
In Estonian, vowels are contrastively short or long, while diphthongs count as long. Within diphthongs the first part is consistently shorter than the second, with the first vowel ranging from 75-105 ms and the second from 100-115 ms in length (Piir 1985). We conclude that this subphonemic length difference is a phonetic cue to prominence and that Estonian diphthongs are right-prominent, i.e., iambic micro-feet. Tautosyllabic bimoraic vowel sequences are micro-iambs (cf. Kager 1993, Metzler & Driscoll 2018).
(3) Micro-iamb: (.x) µµ VV
Estonian prosody is extremely predictable, as in Finnish, with stress on the first syllable of the word and alternating unstressed and stressed syllables thereafter. Estonian has trochaic stress feet (Lehiste 1965, Gordon 1997), showing that there is no necessary connection between stress feet and micro-feet within a language.
Another example of iambic micro-feet is Seward Peninsula Inupiaq (Eskimo-Aleut), which has right-edge prominence in diphthongs and a shift in word-final position to left-edge prominence, as described by Kaplan (1985:193). We assume that his 'vowel clusters' are diphthongs, while what he refers to as diphthongs are only historically or orthographically diphthongs: "Whereas long vowels (aa, ii, and uu) and diphthongs (ai [eː] and au [oː]) are stressed uniformly throughout their entire quantity, vowel clusters (iu, ui, ia, ua) receive stress on their second member, e.g. puítuq 'it is swollen', except in a word-final syllable, e.g. úi 'husband' and niɣ̣iřúaq 'one who is eating'."

Quality and quantity
In this section we show that vowel quality/sonority can determine the placement of prominence in a diphthong; that phonological length does not determine prominence location; and that length serves only as a phonetic cue to prominence in diphthongs, as we saw with Estonian above.

Quality
In the sonority hierarchy (Selkirk 1984, Parker 2002 and references therein) vowels are ranked by openness/height, with lower vowels of higher sonority than higher vowels. Kiparsky (1979) proposes a sonority hierarchy for vowels in which backness/roundness plays a role as well, yielding the hierarchy in (3).
(3) Sonority hierarchy for vowels: a > e > o > i > u
The relative sonority of vowels plays a role in unstressed vowel reduction (Crosswhite 2001, de Lacy 2006 inter alios) and in stress assignment in some languages (Kenstowicz 1997, pace Shih 2016, Shih & de Lacy 2019; see as well Zec 2003). In a number of languages, vowel height/sonority determines the location of prominence in diphthongs. In Italian (Indo-European), for example, high vowels are reduced in a diphthong with a non-high vowel (ai̯, au̯ vs. i̯a, u̯a; Bertinetto & Loporcaro 2005, van der Veer 2006). In Italian diphthongs with a sonority plateau, the prominent vowel can be on either side (wi, uj, ju, though *iw). It is a matter of debate, however, whether the onglides are in the nucleus or part of the onset (see van der Veer 2006, Krämer, in press, for discussion).
In Sixian Hakka (Sino-Tibetan; Hsu 2004), u is reduced in high-high diphthongs (iu̯, u̯i) and i survives u in syllable contraction. Moreover, i has higher intrinsic amplitude than u. Hsu concludes that iu is a falling diphthong and ui a rising diphthong. We surmise that diphthongal prominence is placed on the vowel with higher sonority in Sixian.
The four vowels of Seri (isolate) are transcribed as a, ɛ, o, i by Marlett (1981). The diphthongs show that in the sonority hierarchy they group as two high sonority vowels (a, ɛ) and two low sonority vowels (o, i), since prominence in diphthongs falls on a or ɛ in áo, ái, oá, oɛ́. It falls on the first vowel in the combination of the two high sonority vowels and of the two low sonority vowels, respectively (ɛ́a, ói). We analyze this as a quality-sensitive diphthong system with default to trochee.
An interesting case of micro-variation is reported in Spanish and Catalan by Martínez-Paricio & Torres-Tamarit (2013). In our analysis, Spanish has quality-sensitive diphthongs that default to iambs if both vowels tie in sonority (iú, uí). Catalan has the same diphthong system, except that sonority plateaus receive left-edge trochaic prominence (íu, úi), creating a sort of minimal pair with Spanish.
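The quality-sensitive patterns described in this section can be sketched procedurally. The following is only an illustration under stated assumptions: the numeric sonority values, the function names, and the `default` parameter are ours for expository purposes and are not part of the analysis. `HEIGHT` treats i and u as tied (as we assume for Spanish and Catalan), while `KIPARSKY` encodes the scale a > e > o > i > u, which separates i from u (as in Sixian Hakka).

```python
# Illustrative sonority scales (the numeric values are assumptions).
HEIGHT = {"a": 3, "e": 2, "o": 2, "i": 1, "u": 1}      # i and u tie
KIPARSKY = {"a": 5, "e": 4, "o": 3, "i": 2, "u": 1}    # a > e > o > i > u

ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}


def prominent_position(v1, v2, default, sonority=HEIGHT):
    """Return 0 if prominence falls on the first vowel, 1 if on the second."""
    s1, s2 = sonority[v1], sonority[v2]
    if s1 > s2:
        return 0
    if s2 > s1:
        return 1
    # Sonority plateau: fall back on the default micro-foot type.
    return 0 if default == "trochee" else 1


def mark(v1, v2, default, sonority=HEIGHT):
    """Spell the diphthong with an acute accent on the prominent vowel."""
    if prominent_position(v1, v2, default, sonority) == 0:
        return ACUTE[v1] + v2
    return v1 + ACUTE[v2]


if __name__ == "__main__":
    for v1, v2 in [("a", "i"), ("i", "a"), ("i", "u"), ("u", "i")]:
        print(v1 + v2,
              "iamb default:", mark(v1, v2, "iamb"),
              "| trochee default:", mark(v1, v2, "trochee"))
```

With the height-only scale, the two defaults differ only on the plateaus iu and ui, which is the Spanish/Catalan minimal pair; with the finer scale, no plateau arises for iu and ui and the default never applies to them.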

Quantity
Length in vowels is irrelevant for the placement of prominence in diphthongs, but it can be a phonetic cue to prominence, as we show here. As mentioned in 4.1, Seri has quality-sensitive diphthongs that default to trochees when sonority does not decide the matter. It also has a lexical length distinction in vowels and distinguishes short and long diphthongs. In long diphthongs, either the first or the second vowel can be long:
(5) Seri long diphthongs (Marlett's 1981 "triphthongs"): áːi áːo íːo óːi áiː ɛáː ɛíː óiː oáː oɛː
Prominence, however, is still placed according to sonority and is unaffected by length. Seri's diphthongs are thus quality-sensitive but quantity-insensitive, in terms familiar from metrical feet. This is the case generally, as far as we have been able to determine: we have not found a language in which long vowels in diphthongs attract prominence, as they would in the hypothetical Seri′ in (7).
(7) Unattested Seri′: Quantity-sensitive diphthongs: áːi áːo íːo óːi aíː ɛáː ɛíː oíː oáː oɛː
This gap in the typology is represented by the asterisk in the last row in (1). If the gap is real, it shows that diphthongs are quantity-insensitive, i.e., that their prominence never follows from the length of their members.
Ulwa (Misumalpan; Green 1999) at first glance appears to have a length contrast only in the prominent vowel in its trochaic diphthongs, i.e., ái vs. áːi, úi vs. úːi, but no áiː etc. However, the non-prominent part of these diphthongs, as described by Green (p. 32f), consists of contrastively voiced or voiceless glides: there are also aj, aːj̊ etc. The language has a voicing contrast in consonants, including sonorants, but not vowels.
We thus suspect that these sequences are actually not diphthongs in the strict sense of two vocoids in a nucleus, but that the glide is a consonantal coda. These glides are moraic, however, since Ulwa has iambic quantity-sensitive word stress. A further argument against a diphthong analysis is that the four glides, voiced and voiceless w and j, combine with any preceding vowel except for /u/ and /i/, respectively.
Following Broselow (1992) and others, we treat contrastive length in diphthongs as mora sharing (9). (9a) is a short diphthong with both vowels linked to a single µ; (9b) is the usual case, with each vowel linked to a different µ; and (9c) is the super-heavy case, in which the long vowel shares the second mora with the short vowel. Our claim is that micro-feet are built on the moraic level and cannot 'see' down below it. This models the fact that there are no quantity-sensitive diphthongs. Only inherent sonority and foot type (IAMB or TROCHEE) determine diphthong-internal prominence.
Northern Saami (Finno-Ugric) provides a good illustration of how sonority takes over within diphthongs. Northern Saami has a gradation system with three different grades of quantity (Sammalahti 1998). Diphthongs are bimoraic and trochaic, with prominence on the left vowel (10a). In the µ grade the prominence switches from leftmost to the vowel with highest sonority (10b; see Bals Baal, Odden & Rice 2012 for a detailed analysis of Northern Saami gradation in moraic terms). We leave out falling sonority diphthongs since they do not show a prominence switch. When the second mora is removed in the µ grade, the two vowels share one mora (11) and the micro-foot becomes mono-moraic. There is now no left or right mora to choose from to identify it as trochee or iamb, and sonority is the only criterion left to determine the prominent vowel.
Cangnang Southern Min (Hu & Ge 2016) has a rich vowel system with falling, rising, and plateauing diphthongs, but no length contrast. Hu & Ge measure the lengths of the diphthongs and their component parts.
In diphthongs of two vowels with different sonority, the higher sonority vowel is generally longer than the vowel of lower sonority, regardless of their temporal order. We assume that Cangnang is quality-sensitive and that the length difference is the phonetic encoding of prominence at the micro-foot level.
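The mora-sharing representations in (9) can be made concrete with a small data structure. This is a minimal sketch under our own encoding assumptions: a diphthong is a list of moras, each mora is the set of vowel positions linked to it (0 for the first vowel, 1 for the second), and the names `SHORT`, `PLAIN`, `LONG_FIRST`, `mora_count` and `vowels_in_bimoraic_span` are hypothetical. It shows why a constraint that targets a two-mora span cannot single out the long vowel: the span always dominates both vowels.

```python
# Each mora is the set of vowel positions linked to it (assumed encoding).
SHORT = [{0, 1}]             # (9a): both vowels share the single mora
PLAIN = [{0}, {1}]           # (9b): one mora per vowel
LONG_FIRST = [{0}, {0, 1}]   # (9c): long first vowel shares the second mora


def mora_count(diphthong, vowel):
    """Number of moras linked to the given vowel position."""
    return sum(1 for mora in diphthong if vowel in mora)


def vowels_in_bimoraic_span(diphthong):
    """Vowels dominated by the two-mora span a weight-based constraint
    would select; empty if the nucleus has fewer than two moras."""
    if len(diphthong) < 2:
        return set()
    return set().union(*diphthong[:2])


if __name__ == "__main__":
    for name, d in [("short", SHORT), ("plain", PLAIN), ("long", LONG_FIRST)]:
        print(name,
              "moras per vowel:", (mora_count(d, 0), mora_count(d, 1)),
              "span dominates:", sorted(vowels_in_bimoraic_span(d)))
```

In the long diphthong the first vowel is linked to two moras and the second to one, yet the bimoraic span contains both vowels, so some other criterion (sonority or foot type) has to pick the prominent one. In the short diphthong there is no bimoraic span at all.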

Discussion of OT typology
With the constraints proposed in section 2 we account for the foot type, iamb or trochee, by constraint ranking. Quality sensitivity emerges if the QS constraint outranks the dominant foot type constraint.
In our basic typology we define the following candidate space. Inputs include both rising (ia) and falling (ai) diphthongs, and output candidates are mappings to surface forms with prominence on either the first or the second vowel in the diphthong, as in the master tableau in (12).
(12) Prominence and sonority in diphthongs
A language with trochaic diphthongs selects a.i and b.i as optimal; an iambic language selects a.ii and b.ii, and a sonority-sensitive language selects a.i and b.ii. A combination of a.ii and b.i as the winning candidates is not possible under any ranking, modeling the fact that no language creates diphthongs whose prominent member is always the lower-sonority vowel (no language has both aí and ía, though many have ái and iá).
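The factorial typology over the candidates in (12) can be checked mechanically. The sketch below is an illustration, not our formal definitions: it assumes that TROCHEE is violated when prominence is not on the first vowel, IAMB when it is not on the second, and QS when the prominent vowel is less sonorous than the other one; the numeric sonority values and function names are likewise assumptions. Prominence on the first vowel is coded 0 (candidates a.i, b.i) and on the second vowel 1 (candidates a.ii, b.ii).

```python
from itertools import permutations

# Illustrative sonority values: a is more sonorous than i.
SONORITY = {"a": 2, "i": 1}
CONSTRAINTS = ["TROCHEE", "IAMB", "QS"]


def violations(inp, prom):
    """Violation profile of the candidate with prominence at position prom."""
    other = 1 - prom
    return {
        "TROCHEE": int(prom != 0),  # prominence is not leftmost
        "IAMB": int(prom != 1),     # prominence is not rightmost
        "QS": int(SONORITY[inp[prom]] < SONORITY[inp[other]]),
    }


def winner(inp, ranking):
    """Optimal candidate under a strict ranking (lexicographic comparison)."""
    return min((0, 1), key=lambda p: [violations(inp, p)[c] for c in ranking])


def typology():
    """Map each total ranking to its winners for /ai/ and /ia/."""
    return {r: (winner("ai", r), winner("ia", r))
            for r in permutations(CONSTRAINTS)}


if __name__ == "__main__":
    for ranking, pattern in typology().items():
        print(" >> ".join(ranking), "->", pattern)
```

Under these assumptions the six rankings yield only three patterns: (0, 0) trochaic, (1, 1) iambic, and (0, 1) sonority-sensitive. The pattern (1, 0), i.e., a.ii together with b.i, is never generated, since it is harmonically bounded on every ranking.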
The following Hasse diagrams represent the three language types.

(13) [Hasse diagrams of the three language types (a-c) over the constraints QS, IAMB and TROCHEE]
The relative ranking of TROCHEE and IAMB in (13c) distinguishes sonority-sensitive languages with default to trochee (TROCHEE outranks IAMB) from those with default to iamb (IAMB dominates TROCHEE), as illustrated in the two tableaux in (14). [We do not include here the constraints that determine whether certain vowels, such as i and u, are considered of equal or different sonority in a language (Catalan/Spanish vs. Sixian Hakka). This might be due to their contrastive feature specifications, i.e., whether the difference between i and u is due to a feature that is relevant for sonority (for instance [coronal]) or not (for instance [labial]).]
In a second step we expand the typology by sonority alignment and faithfulness violations. That is, we add a candidate to each input that maps the input to an unfaithful output, changing it either to a monophthong, /ia/ → [eː], or to a heterosyllabic string, /ia/ → [i.a]. Both cases are observed synchronically and diachronically in various languages. The next tableau displays the constraint set with the ranking for a language that does not have any diphthongs (which is most languages). An example is Cairene Arabic, which maps inherited Classical Arabic diphthongs ai and au to mid vowels eː and oː, respectively, and allows diphthongs only across morpheme boundaries (Youssef 2013).
Ranking FAITH above the ALIGN constraints yields free distribution of high and low vowels in diphthongs, as we find in White Hmong (16) and Estonian (17).
(16) White Hmong falling and rising trochaic diphthongs
[…] (18) and Portuguese (19). Height alignment can also be an effect of a conspiracy of a foot type constraint with QS against FAITH. In this scenario sonority and prominence converge. Such languages admit only ái or iá as diphthongs. When ALIGN determines the distribution of high and low vowels, foot type constraints can cause a consistent sonority mismatch, as in Crow (18). ALIGN places the low sonority vowel at the left edge and TROCHEE places prominence at that edge. A second ranking condition for this mismatch is that FAITH dominates all other structural constraints. If that is not the case, the grammar rejects the rising trochaic diphthong and the language will not have any diphthongs.
Length cannot attract stress for the reasons outlined above. We conclude that there is no quantity-sensitive stress for diphthongs, i.e., there is no quantity sensitivity-inducing constraint parallel to Weight-to-Stress (Prince 1990). The diphthongs we discussed here are all bimoraic, which we assume to be the default. Light or monomoraic diphthongs, attested in languages like Old English (Mitchell & Robinson 2012) and Tohono O'odham (Miyashita 2011), violate the markedness constraint *MULTILINKED-µ, because the two vowels share the only mora. Monomoraic diphthongs are excluded if this constraint dominates the faithfulness constraint DEP-µ. The extra-long diphthongs with length on one vowel are linked to two moras, a property they share with long vowels. The constraint against single segments with more than one associated mora has to be ranked below MAX-µ (see footnote 2) for a length contrast to emerge at all. We assume that trimoraic nuclei are universally impossible.
The tableaux below illustrate both cases, an underlyingly long vowel as part of a diphthong (a.i and a.ii) and an input with only one mora for the whole diphthong (b.i and b.ii).
(20) Length and weight in diphthongs (tableau; constraints: MAX-µ, *MULTILINKED-µ, DEP-µ, *MULTI-µ-SEGMENT)
The high vowel might as well be parsed into the onset, or the two vowels might create a hiatus, as is in fact the case in Portuguese, as in [ˈdʒi.ɐ] 'day', violating the phonotactic constraints *HIATUS or ONSET, which are ranked high in languages that prefer diphthongs over hiatus. For reasons of expository clarity, we keep the typology simple here and abstract away from minor details.
2 When violations are assessed in the way done here, MAX-µ is actually MAXLINK-µ, i.e., faithfulness to the association line between a segment and a mora rather than the mora itself. See Morén (2001) for a discussion.

Conclusions
In this paper we have tried to show that diphthongs are structurally like metrical feet. These micro-feet are trochaic or iambic diphthongs and can be quality-sensitive. We did not encounter any language with quantity-sensitive diphthongs and propose this as a linguistic universal. This micro-foot typology is summarized below, repeated from (1).
Implicit in the typology is the non-existence of diphthong systems that invert the sonority hierarchy; these are the non-occurring systems with ía and aí, úa and aú; no ranking of our constraints is able to select such a system. The gap in the typology marked with an asterisk is explained by the structural restriction of syllables to maximally two moras. Quantity sensitivity singles out pairs of moras as the location of stress. Since the long vowel in a diphthong shares one of its two moras with the other vowel, this short vowel is within the location of stress too, and some other constraint has to decide which of the two vowels receives more prominence. Quality sensitivity, on the other hand, categorizes vowels by relative sonority. We assume that sonority is determined by a vowel's height: high vowels are less sonorous than mid vowels, which are less sonorous than low vowels. Quality sensitivity thus has to access the vowels themselves rather than the moras that dominate them.
A final observation on metrical and micro-feet: within a language there does not seem to be any connection between foot types at the two levels. Estonian and Spanish have trochaic metrical feet, while both prefer iambic micro-feet. Even though the constraints we propose for the analysis of micro-feet are mostly identical to those for metrical feet, they have to be distinct from the latter. We are dealing with a different layer of prosodic structure, the micro-foot rather than the metrical foot.