Variable Hiatus in Persian is Affected by Suffix Length

Sequences of vowels are often avoided. In fact, many languages ban hiatus outright (Casali 1998, 2011). Languages differ in how they resolve hiatus. For instance, Yoruba deletes the first vowel, Chichewa merges the vowels, while Malay epenthesizes a consonant between the vowels. But do these well-described patterns represent the full range of cross-linguistic variation? In this paper, we examine hiatus in Persian, which displays three rare properties. First, Persian most commonly avoids hiatus by deleting the second vowel, which is considerably less frequent than deletion of the first vowel across languages (Casali 1997). Second, while Persian generally avoids hiatus, it tolerates it when the suffix consists of a single vowel, so as to maintain paradigmatic contrast. Finally, hiatus in Persian is variable: more often than not, multiple realizations are attested within and across speakers.
Our work contributes to a growing body of research on variable sound patterns, which are subject to restrictions just as much as exceptionless generalizations are. For instance, Ernestus & Baayen (2003) show that Dutch speakers’ productions of nonce words reflect distributional characteristics of the Dutch lexicon, with velars in root-final position eliciting relatively more voicing than labials and coronals. Becker et al. (2011) demonstrate that variable laryngeal alternations in Turkish depend on word length and consonant place. Jurgec & Schertz (2020) show that variable velar palatalization in Slovenian depends on the suffix and is restricted by a consonant co-occurrence restriction on postalveolars. Persian hiatus constitutes another example.
This paper is organized as follows. Section 2 presents the language background and reviews the literature on hiatus in Persian. Section 3 presents the elicitation-based production experiment, which allows us to gauge the extent of variation. Section 4 discusses the perception experiment, in which we focus on three main variants: hiatus, elision of the second vowel, and epenthesis. We show that while elision is the most common variant, hiatus is generally retained with suffixes consisting of a single vowel.

In this study we investigate the main variants of Persian hiatus together and discuss the factors that motivate the variation. To tackle these issues, we conduct two experiments. In the elicitation-based production experiment, we gauge the variation across and within speakers. The results reveal a few core variants and show that one of them depends on suffix length. The perception experiment then allows us to focus on the effect of suffix length. As we will see, the variation in Persian hiatus is systematic and highly dependent on the length of the suffix.

Production
One way to extract information about variable phonological phenomena is to look at corpus data (Zuraw 2006; Kager & Pater 2012; Becker & Jurgec 2020, among many others). To the best of our knowledge, no corpus of Spoken Persian exists. To gain insight into the general properties of hiatus and its resolution in Spoken Persian (such as consistency and variation within and across speakers, the effect of vowel quality, and lexical effects), we therefore conducted a small elicitation-based production experiment.

Stimuli
In this study, we focus on hiatus at the root-suffix boundary. Our stimuli for the elicitation task were polymorphemic words consisting of a vowel-final root and vowel-initial suffix. Table 1 shows the variables and their levels in this task. Our primary variable is Suffix Length, as evidenced in (1). Suffixes are categorized into two groups: polysegmental and monosegmental. With our second variable, Lexical Category, we tackle the question whether the patterns observed in native words are productively extended to loanwords (see Yip 1993; Itô & Mester 1995; Jurgec 2010; Kang 2011) and nonce words (Berko 1958).
Table 1: Elicitation-based production experiment variables and their levels.
Due to lexical gaps and phonological restrictions, the variables are not perfectly balanced. To start with, we used all 17 productive suffixes in Spoken Persian, but there are more polysegmental suffixes than monosegmental ones, as shown in Table 2. These suffixes begin with all of the vowels of the Persian vowel inventory, except for [u]. We included a total of 195 vowel-final roots per participant. The roots ended in every vowel of Spoken Persian, except for [æ], which is not allowed root-finally. This means that not all vowel combinations are possible in Spoken Persian: there were no æ-initial or u-final hiatus combinations.
Each root belonged to one of the three lexical categories. The native roots include nativized Arabic loans (75 roots in total). We treat Arabic loans as part of the native category because these borrowings are completely adapted to Persian phonology, since Persian has a long history of borrowing from Arabic (Morony 1986). Arabic loans also differ from other loanwords (60 roots in the experiment), which were borrowed into Persian more recently from English, Russian, and French (Zomorodian 1994; Navabzadeh Shafi'i 2014; Ariyaee 2019). The nonce words (60 roots) used in the elicitations were consistent with Persian phonotactics. They were either CVCV, CVCCV or CVCVCV, with final stress. All nonce words were checked by a native speaker to make sure they sounded natural and were not attested words.

Procedure
The data elicitation procedure included two parts: the familiarization task and the main task. The familiarization task included four stages. At the first stage, participants were presented with a consonant-final root in Persian orthography. At the second stage, they heard a derived word, consisting of the same consonant-final root with a vowel-initial suffix. Then, the participant was presented with a new consonant-final root in Persian orthography. Finally, the participant was asked to derive the word using the suffix they heard before.
In the main task, the participants were similarly presented with a vowel-final root, and were then asked to derive it with a particular suffix. Once the participants had pronounced the derived word, they were asked whether there was another way to pronounce it. When the pronunciation was unclear, the participants were asked to repeat the word. The stimuli were organized into blocks by suffix to make the task easier for the participants.

Participants
We recruited 6 participants, of whom 3 were female and 3 were male, with a mean age of 30. One of the participants took part twice (with the two sessions more than a month apart) so that we could better gauge within-speaker variation. Thus, this speaker contributed twice as much data as the other participants.

Analysis
The productions were then manually categorized into three categories: V2 elision, hiatus, and epenthesis. This categorization was conducted by a native speaker both impressionistically and using an acoustic analysis. It was challenging to distinguish between hiatus and [ʔ] epenthesis. While there were many cases of a clear [ʔ], there were also 17 instances of glottalization and creaky voice. All these cases were transcribed as having a [ʔ] and included in the results we report below. Another challenging distinction involved glides. Persian allows [w] and [j] intervocalically to resolve hiatus in specific contexts (Sadeghi 1986; Shaghaghi 2000; Windfuhr & Perry 2009; Dehghan & Kord Zafaranlu Kambuziya 2012). Unless there was evidence of a glottal stop, the glides were transcribed in all expected environments, since we could not find a reliable method to distinguish glide epenthesis from hiatus acoustically. Across all speakers, there were 180 such tokens. Beyond these cases, there were also a few instances of allomorph selection for one particular suffix, which we leave out of what follows. There were no other realizations.

Suffix Length
Our primary variable of interest is Suffix Length. Figure 1 shows the share of the three realizations across all participants by Suffix Length. The area represents the relative share among all tokens. We can see that V2 elision (henceforth, simply elision) is frequent with polysegmental suffixes, but rare with monosegmental suffixes. Epenthesis is more common with monosegmental suffixes, but it is the rarest of the three variants. Finally, hiatus is the most frequent realization with monosegmental suffixes. Participants provided variable realizations in 8% of cases, which we turn to next. Figure 2 shows that the second variant is most commonly hiatus when the first variant is elision. Only 8 tokens in total have elision and epenthesis as the second variant.

Lexical Category
There were no substantive differences across lexical categories, which suggests that the generalizations in the native words are productively extended to loanwords and nonce words. In lieu of a graphic representation, we present the inferential statistics below.

Statistical analysis
To test whether hiatus patterns differ significantly across conditions, we fitted a mixed-effects logistic regression model to the data using the glmer function from the lme4 package (Bates et al. 2015) in R (R Core Team 2013). The response variable was binary coded: elision versus other realizations (hiatus or epenthesis). Both predictor variables were simple coded with polysegmental (for Suffix Length) and native (for Lexical Category) as reference levels. We first ran the model with the interactions between these two variables, but the model did not converge; hence, the interactions were not included in the model we report here. Finally, we also included random intercepts for item and participant, and random slopes for each fixed factor allowed to vary by participant. This model is reported in Table 3.
Table 3: Production-data regression model: elision is significantly less common with monosegmental suffixes than hiatus or epenthesis.
We can see that the Lexical Category was not a significant predictor, which means that generalizations observed in native words did not differ from the ones in loanwords and nonce words. Suffix Length, however, was significant: there was less elision with monosegmental suffixes when compared to polysegmental suffixes.
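The simple coding scheme mentioned above can be illustrated with a minimal Python sketch. It is not the R code used for the reported model; the function name `simple_coding` and the level labels are assumptions made for illustration. Simple coding keeps the reference-level comparisons of treatment coding but centers each contrast column by subtracting 1/k, where k is the number of levels.

```python
import numpy as np

def simple_coding(levels, reference):
    """Return (contrast column names, {level: contrast row}) for simple coding.

    Hypothetical helper for illustration only; it is not part of lme4.
    """
    k = len(levels)
    others = [lv for lv in levels if lv != reference]
    rows = {}
    for lv in levels:
        # Treatment (dummy) coding: 1 in the column for this level, 0 elsewhere
        dummy = np.array([1.0 if lv == other else 0.0 for other in others])
        # Subtracting 1/k centers every column across the levels
        rows[lv] = dummy - 1.0 / k
    return others, rows

# Reference levels follow the text: polysegmental and native
_, suffix_length = simple_coding(["polysegmental", "monosegmental"], "polysegmental")
_, lexical_category = simple_coding(["native", "loan", "nonce"], "native")

for level, row in lexical_category.items():
    print(level, np.round(row, 3))
```

Under this scheme, each coefficient still compares one level to the reference level, while the intercept corresponds to the mean across levels rather than to the reference level alone.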

Interim summary
In this section we showed that hiatus is variable in Persian. Among the variants, V2 elision is common, while V1 elision is unattested. We also found that the length of the suffix has an effect on the realization of the variants.
Since the elicitation-based production experiment was exploratory, the variation across different conditions was not tightly controlled for. For instance, while we considered all suffixes and a broad range of roots, the vowels appearing in hiatus contexts were not controlled for. There were lexical gaps and the insertion of the two glides was difficult to determine beyond doubt. Finally, our production data is limited to only six participants. We designed the perception experiment presented in the following section with these concerns in mind.

Perception
In the perception experiment, we further tackle the effect of suffix length on hiatus variation. Unlike the production experiment, the participants will provide acceptability judgments for all three main variants (hiatus, elision, epenthesis), with all the variables tightly controlled for.

Stimuli
We compiled a set of 30 CVC(C)V nonce roots, half of which were ɑ-final and the other half e-final. These vowels were chosen because they are licit word-final vowels in Persian and do not condition glide epenthesis, which proved problematic in the production experiment. Our main factor of interest was suffix length. We chose 3 polysegmental and 3 monosegmental suffixes, shown in Table 4. The combination of the suffixes with the nonce roots resulted in polymorphemic words. Bare roots as well as suffixed words were pronounced by a native speaker. The suffixed words were recorded under three conditions: with elision, hiatus, and [ʔ] epenthesis.

Procedure
The perception experiment was conducted online. The participants were asked to judge the acceptability of paradigms consisting of a bare nonce root and its suffixed form. The participants saw two frame sentences and played the corresponding paradigm (Figure 3). After they had heard the recordings, they were asked to judge the paradigm as either acceptable or unacceptable. In each trial, the suffixed word was pronounced under one of the three conditions (elision, epenthesis or hiatus), which were randomized across trials. The participants could play the recordings as many times as they wanted. To make the task easier, all paradigms with the same suffix were presented in one block. Each root was paired with only one suffix, and each suffix was paired with 5 roots. Each pair of words appeared under the three randomized conditions (elision, epenthesis, hiatus), for a total of 90 items per participant.
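The design arithmetic above (30 roots, 6 suffixes, 3 conditions, 90 items) can be sketched as follows. This is only an illustration under stated assumptions: the root and suffix labels are placeholders, the function `build_trials` is hypothetical, and the random assignment of roots to suffixes is our assumption, since the text does not say how roots were paired with suffixes.

```python
import random

VARIANTS = ("elision", "epenthesis", "hiatus")

def build_trials(roots, suffixes, seed=0):
    """Build a suffix-blocked trial list (hypothetical helper).

    Each root is paired with exactly one suffix, and each root-suffix
    pair occurs once per variant, shuffled within its suffix block.
    """
    rng = random.Random(seed)
    roots = list(roots)
    rng.shuffle(roots)  # assumption: roots are assigned to suffixes at random
    per_suffix = len(roots) // len(suffixes)
    trials = []
    for i, suffix in enumerate(suffixes):
        block_roots = roots[i * per_suffix:(i + 1) * per_suffix]
        block = [(root, suffix, variant)
                 for root in block_roots
                 for variant in VARIANTS]
        rng.shuffle(block)  # randomize conditions within the block
        trials.extend(block)
    return trials

# Placeholder labels, not the actual stimuli
roots = [f"root{i:02d}" for i in range(1, 31)]
suffixes = ["poly1", "poly2", "poly3", "mono1", "mono2", "mono3"]
trials = build_trials(roots, suffixes)
print(len(trials))
```

With 30 roots and 6 suffixes, each block contains 5 roots times 3 variants, that is, 15 trials, and the full list contains 90 trials.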

Participants
In total, the survey was completed by 54 participants (28 female and 26 male, with a mean age of 29). All reported being native speakers of Spoken Persian.

Results
As in the production experiment, our primary variable of interest was Suffix Length. Figure 4 shows the mean acceptability rates by the length of the suffix, separated for the three different conditions. One data point in this graph is the mean acceptability rate across all speakers for one paradigm (bare root and its suffixed form). When comparing acceptability rates across different conditions, we see that the paradigms with elision in polysegmental suffixes have the highest values. Conversely, elision is the least acceptable with monosegmental suffixes. Finally, epenthesis is always less acceptable than hiatus. The results demonstrate that hiatus resolution is sensitive to suffix length, with elision being the most acceptable with polysegmental suffixes but the least acceptable with monosegmental suffixes. Figure 5 presents the results by suffix. While the polysegmental suffixes appear to be very similar, the suffix -e stands out among the monosegmental suffixes, as it has a higher than expected acceptability of elision. This raises the question of how much individual suffixes can vary (beyond the effect of their length), which should be addressed in subsequent work.

Statistical analysis
We fitted another mixed-effects logistic regression model to the acceptability judgments, using the same program and package as in the production experiment. The first set of predictors in our model was the variant, which was Helmert coded as two binary predictors: elision versus other, with values of +2/3 for elision and -1/3 for hiatus or epenthesis, and hiatus versus epenthesis, with values of +1/2 for hiatus, -1/2 for epenthesis, and 0 for elision. We also considered the length of the suffix (polysegmental, monosegmental), with polysegmental as the reference level. We included the interactions between the predictor variables as well as random intercepts for participant and item, with slopes for variants and suffix length allowed to vary by participant. This accounted for variability across participants and items. We report the results of the model in Table 5. The model shows that elision is statistically significantly more acceptable than hiatus and epenthesis combined, and further that hiatus is more acceptable than epenthesis. Suffix length has no overall effect on acceptability. The key results are the interactions. In particular, elision is less acceptable with monosegmental suffixes, which confirms the trend seen in Figure 4. Conversely, hiatus is more acceptable with monosegmental suffixes.
The statistical analysis confirms what we have seen so far: elision is the preferred hiatus resolution strategy in Persian, except with monosegmental suffixes, where hiatus is the most acceptable.
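The Helmert coding described above can be written out explicitly. The following minimal Python sketch uses only the contrast values stated in the text; the names `VARIANT_CONTRASTS`, `column`, `is_centered`, and `are_orthogonal` are ours and purely illustrative. It checks two properties that make this coding convenient: each contrast is centered, and the two contrasts are orthogonal to each other.

```python
from fractions import Fraction as F

# Contrast values as stated in the text:
# (elision vs. other, hiatus vs. epenthesis)
VARIANT_CONTRASTS = {
    "elision":    (F(2, 3),  F(0)),
    "hiatus":     (F(-1, 3), F(1, 2)),
    "epenthesis": (F(-1, 3), F(-1, 2)),
}

def column(j):
    """Collect the j-th contrast across the three variants."""
    return [row[j] for row in VARIANT_CONTRASTS.values()]

def is_centered(col):
    """A contrast is centered when its values sum to zero."""
    return sum(col) == 0

def are_orthogonal(a, b):
    """Two contrasts are orthogonal when their dot product is zero."""
    return sum(x * y for x, y in zip(a, b)) == 0

elision_vs_other = column(0)
hiatus_vs_epenthesis = column(1)
print(is_centered(elision_vs_other),
      is_centered(hiatus_vs_epenthesis),
      are_orthogonal(elision_vs_other, hiatus_vs_epenthesis))
```

Because the two sides of each contrast are exactly one unit apart (+2/3 versus -1/3, and +1/2 versus -1/2), each coefficient can be read directly as a difference in log-odds between the compared groups.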

Constraint-based analysis
Both experiments confirm that hiatus resolution strategies in Spoken Persian are dependent on suffix length. In particular, elision is the default strategy, except with monosegmental suffixes. This makes sense, since deleting the suffix vowel would delete the entire suffix, neutralizing the paradigmatic contrast. In this section, we model the results of the perception experiment in a Maximum Entropy (MaxEnt) grammar.
MaxEnt is a variant of OT with weighted constraints and probabilistic outputs. The results of the perception experiment were fed to a MaxEnt learner (Goldwater & Johnson 2003; Hayes & Wilson 2008). We included two faithfulness constraints, DEP and MAX (McCarthy & Prince 1995), as well as two markedness constraints, *HIATUS and REALIZEMORPHEME. *HIATUS is violated by a sequence of two vowels and as such drives epenthesis or elision (Casali 1998). We use REALIZEMORPHEME (2) to model the asymmetry between polysegmental and monosegmental suffixes; in particular, elision of the entire suffix violates REALIZEMORPHEME (Kurisu 2001).
(2) REALIZEMORPHEME: Morphemes must have output realizations.
REALIZEMORPHEME was assigned the highest weight, followed by DEP and *HIATUS. With polysegmental suffixes, the elision candidate (3-a) is preferred over the other two, while with monosegmental suffixes, hiatus (4-c) is the most common. Regardless of suffix length, the estimated probability of hiatus (c) is 1.8 times higher than that of epenthesis (b). This follows directly from the violation profiles of these two candidates, which are identical in (3) and (4). This prediction closely matches the perception and production data, suggesting that the proposed constraints are adequate.
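To make the logic of the MaxEnt analysis concrete, the following minimal Python sketch computes candidate probabilities from constraint weights and violation counts. The weights are hypothetical and are not the fitted values: they were chosen only to respect the reported ordering (REALIZEMORPHEME highest, followed by DEP and *HIATUS) and to yield the reported hiatus-to-epenthesis ratio of 1.8. The violation profiles are likewise our assumption, following the standard definitions of the constraints: elision violates MAX (and REALIZEMORPHEME when the suffix is a single vowel), epenthesis violates DEP, and hiatus violates *HIATUS.

```python
import math

# Hypothetical weights for illustration only, NOT the fitted values
W_HIATUS = 1.0
WEIGHTS = {
    "REALIZEMORPHEME": 3.0,
    "DEP": W_HIATUS + math.log(1.8),  # chosen so that P(hiatus)/P(epenthesis) = 1.8
    "*HIATUS": W_HIATUS,
    "MAX": 0.5,
}

def violations(suffix_length):
    """Assumed violation profiles for the three candidates."""
    elision = {"MAX": 1}
    if suffix_length == "monosegmental":
        # Deleting the only vowel of the suffix deletes the whole morpheme
        elision["REALIZEMORPHEME"] = 1
    return {
        "elision": elision,
        "epenthesis": {"DEP": 1},
        "hiatus": {"*HIATUS": 1},
    }

def maxent_probs(candidates, weights):
    """P(candidate) is proportional to exp(-sum of weight * violations)."""
    harmony = {c: sum(weights[k] * n for k, n in viols.items())
               for c, viols in candidates.items()}
    z = sum(math.exp(-h) for h in harmony.values())
    return {c: math.exp(-h) / z for c, h in harmony.items()}

for length in ("polysegmental", "monosegmental"):
    probs = maxent_probs(violations(length), WEIGHTS)
    ratio = probs["hiatus"] / probs["epenthesis"]
    print(length, {c: round(p, 3) for c, p in probs.items()}, round(ratio, 2))
```

In this sketch the ratio of hiatus to epenthesis equals exp(w(DEP) - w(*HIATUS)) and therefore does not depend on suffix length, because only the elision candidate changes its violation profile between the two suffix types.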
(3) Tableau for polysegmental suffixes; (4) tableau for monosegmental suffixes.
Interim summary
The perception experiment had greater control over conditions and confirmed the results of the elicitations. The results reveal that hiatus variants are dependent on suffix length. In particular, V2 elision was the most acceptable variant with polysegmental suffixes, but the least acceptable variant with monosegmental suffixes. This asymmetry can be captured in probabilistic constraint-based grammars, such as MaxEnt.

Conclusions
In this paper, we report novel data on variable hiatus in Persian. This variation mainly depends on the length of the suffix: V2 elision is preferred with polysegmental suffixes, but dispreferred with monosegmental suffixes. This asymmetry is grounded in the fact that deleting the entire suffix would neutralize the paradigmatic contrast. We also found that Persian allows hiatus particularly with monosegmental suffixes. To the best of our knowledge, this is the first experimental study of hiatus to date.
The reported data contribute to the existing literature on hiatus. First, Persian shows that the hiatus resolutions typically found across languages can be observed variably in a single language. Second, hiatus is marked in Persian, but can surface in specific contexts, such as with monosegmental suffixes. Third, Persian displays a cross-linguistically rare type of elision, that of the second vowel, which is productively extended from native words to loanwords and nonce words.