Languages put restrictions on large sonority distances

An underlying assumption regarding sonority distances is that consonant clusters with large sonority distances are more common than those with small distances, as reflected in the unmarked status of large sonority distances and formalized in terms of sonority constraints on consonant clusters. However, a cross-linguistic survey of attested sonority distances in 357 languages reveals that large sonority distances are not the most commonly attested. Instead, there is a particular sonority distance at which the largest number of languages is attested. When the sonority distance exceeds this value, the number of languages starts to decrease, regardless of the sonority scale adopted. The finding challenges the unmarked status of large sonority distances and suggests a potential restriction that keeps large sonority distances from surfacing.

The sonority distance constraint hierarchy (2) predicts that if a language has clusters with small distances, it typically also has clusters with large distances. For example, if a language has a consonant cluster with a distance of one, it is also likely to have clusters with larger distances such as two, three, four or five. The cross-linguistic prediction of the sonority distance constraint hierarchy is therefore that larger distances are more common cross-linguistically (when no other constraints interact).
Typological preferences for large sonority distances in tautosyllabic clusters have been investigated for Russian, Spanish, Yakut and Icelandic in Baertsch (1998). According to that study, the realization of /klub/ ('cloud') in Russian as [kuluup] is an effect of the sonority distance constraint hierarchy, specifically of the constraint ranking *Dist 1 >> *Dist 2 >> *Dist 3 >> Dep, which prevents the realization of /kl/ (data from Kharitonov (1982)). An illustrative analysis from Baertsch (1998) for the Russian example is provided in Tableau (3) below.
(3) The realization of [kuluup]

Clusters with large distances are given unmarked status not only in tautosyllabic clusters but also in heterosyllabic clusters (Gouskova 2002; Murray & Vennemann 1983). In heterosyllabic clusters, the steeper the sonority slope across C1.C2, the more unmarked the corresponding cluster is. For example, /r.t/ is a more unmarked cluster than /f.t/. In terms of syllable structure, a sharp rise in sonority in the onset position is considered to yield a less complex, and thus more unmarked, cluster (Clements 1990). A typological survey of cross-linguistically preferred CC sequences conducted by Parker (2016) identified glides, rather than liquids, as the default, unmarked segment in the C2 position, based on a diverse sample of 122 languages. He found that, compared with the competing liquids in the C2 position, CG sequences were the default, unmarked option. However, he left open whether this survey is statistically reliable. In sum, various studies suggest a cross-linguistic preference for large sonority distances, which have accordingly been given unmarked status in theoretical formalizations, at least in the tautosyllabic onset position and in the heterosyllabic position.
Nevertheless, the typological prediction of sonority distance constraints has not yet been tested empirically. To fill this gap, the current study tests the sonority distance constraint on empirical data at a large typological scale. Especially given recent reports of 'irregular patterns' in sonority sequences in under-studied languages, a typological survey of sonority distance variation on a global scale makes an empirical contribution. The main goal of the study is to explore to what extent the sonority distance constraint can predict cross-linguistic variation in sonority distance. To examine this cross-linguistic prediction, we apply the sonority distance constraints shown in (4) and (5) below to 357 geographically and genealogically diverse languages and explore the unmarked status of large sonority distances.

The definition of sonority and of the sonority hierarchy has been extensively debated in the literature, with arguments for both a phonetic and a substance-free approach. For a detailed discussion of this topic, see Parker (2002, 2008) for the phonetically based concept of sonority, Clements (1990, 2005) for the phonologically based notion of sonority, and Hayes and Steriade (2004) and Clements and Hume (1995) for a more general discussion of the phonetic grounding of phonological constructs. In our study, we adopt a phonetically motivated definition of sonority, defined in terms of acoustic intensity as proposed by Parker (2002), and accordingly we use a sonority hierarchy that arranges sonority classes by acoustic intensity, as shown in (6) below. We also include a simpler sonority hierarchy (7) to assess whether the sonority distance variation observed with (6) is due to the specific sonority hierarchy adopted. We therefore present results for both hierarchies.
(6) Glides > rhotics > laterals > flaps > trills > nasals > h > voiced fricatives > voiced stops/affricates > voiceless fricatives > voiceless plosives/affricates

(7) Sonorants > Obstruents

The remaining sections are structured as follows: Section 2 describes the materials and methods used in the study; Section 3 details the empirical results; Section 4 explores possible explanations for the cross-linguistic tendency of sonority distances; and Section 5 concludes the study.
2. Methods. Phoneme sequences from 357 geographically and genealogically diverse languages were collected from two large lexical databases, the Database of Cross-Linguistic Colexifications 2 (CLICS²) (List et al. 2018) and AusPhon-Lexicon (Round 2017). Only languages with standardized phonemic IPA symbols were used in the current study. From the lexical lists of each language, the permissible consonant clusters were obtained, and the permissible sonority distances were then calculated using a sonority hierarchy with eleven consonantal sonority classes based on a phonetically motivated definition of sonority. To investigate the relationship between the number of languages and the attested sonority distances, a sequential polynomial regression was performed. The above process was repeated using the simplest sonority hierarchy, given in (7) above. In this study, we focus only on word-initial and word-final phoneme sequences, since it is difficult to identify syllable boundaries from the lexical lists given in the databases. For example, in a lexical sequence like /#CCVCCCVCC#/, it is not clear where the syllable boundary falls in the medial cluster. Additionally, only onset positions are considered in this study.
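The cluster-extraction step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure actually applied to CLICS² or AusPhon-Lexicon: the space-separated segment format, the tiny vowel inventory `VOWELS`, the restriction to the first two consonants, and the function names `initial_cluster` and `initial_cc_by_language` are all hypothetical.

```python
# Hypothetical sketch: collect word-initial CC sequences per language.
# Assumes each word is a space-separated string of IPA segments and that
# VOWELS is a (deliberately small, illustrative) vowel inventory.
from collections import defaultdict

VOWELS = {"a", "e", "i", "o", "u", "ə", "ɛ", "ɔ"}


def initial_cluster(word):
    """Return the consonants that precede the first vowel, as a tuple."""
    cluster = []
    for segment in word.split():
        if segment in VOWELS:
            break
        cluster.append(segment)
    return tuple(cluster)


def initial_cc_by_language(lexicon):
    """Map each language to its set of attested word-initial CC sequences.

    `lexicon` maps a language name to an iterable of segmented words.
    Longer clusters contribute only their first two consonants, which is
    a simplifying assumption of this sketch.
    """
    attested = defaultdict(set)
    for language, words in lexicon.items():
        for word in words:
            cluster = initial_cluster(word)
            if len(cluster) >= 2:
                attested[language].add(cluster[:2])
    return dict(attested)


# Toy data, invented for illustration only.
toy_lexicon = {
    "LangA": ["k l u b", "t a", "s t r a"],
    "LangB": ["p a t", "m a"],
}
clusters = initial_cc_by_language(toy_lexicon)
```

In this toy run, only LangA contributes clusters (k-l and s-t); LangB has no word-initial cluster and therefore does not appear in the result.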

3. Results.
The results show that large sonority distances are generally preferred cross-linguistically: the larger the sonority distance, the more likely it is to be attested. However, the relationship between the number of languages and sonority distance does not follow a linear model but rather a polynomial trend. The following section details the results.
Firstly, in terms of raw frequencies, the number of languages that contain clusters violating the Sonority Sequencing Principle (SSP) is smaller than the number of languages that contain SSP-following clusters, which is in line with the SSP. Among languages that contain SSP-violating clusters, the number of languages generally increases as the sonority distance value increases, where sonority distance is calculated as the sonority value of the right segment minus the sonority value of the left segment, based on the sonority hierarchy of eleven consonantal classes recapped in (8) below. Illustrative examples of calculating sonority distances are given in (9) below.
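The distance calculation just described (right segment minus left segment) can be sketched in a few lines. The integer values 0 to 10 assigned to the eleven classes are an assumption of this sketch, since the hierarchy itself fixes only the ordering; the lookup table `SEGMENT_CLASS` is likewise hypothetical and covers only a handful of symbols.

```python
# Illustrative sketch: sonority distance = sonority(right) - sonority(left).
# The numeric values 0..10 are an assumed encoding of the hierarchy's order.
ELEVEN_CLASSES = [
    "vcl stop/affricate",  # lowest sonority, value 0
    "vcl fricative",
    "vcd stop/affricate",
    "vcd fricative",
    "h",
    "nasal",
    "trill",
    "flap",
    "lateral",
    "rhotic",
    "glide",               # highest sonority, value 10
]
SONORITY = {name: value for value, name in enumerate(ELEVEN_CLASSES)}

# Hypothetical segment-to-class lookup covering only the symbols used here.
SEGMENT_CLASS = {
    "t": "vcl stop/affricate",
    "ts": "vcl stop/affricate",
    "f": "vcl fricative",
    "s": "vcl fricative",
    "d": "vcd stop/affricate",
    "z": "vcd fricative",
    "n": "nasal",
    "r": "trill",
    "ɾ": "flap",
    "l": "lateral",
    "ɹ": "rhotic",
    "j": "glide",
}


def sonority_distance(left, right):
    """Sonority of the right segment minus sonority of the left segment."""
    return SONORITY[SEGMENT_CLASS[right]] - SONORITY[SEGMENT_CLASS[left]]


rising = sonority_distance("t", "j")    # 10: SSP-conforming onset
falling = sonority_distance("j", "t")   # -10: SSP-violating onset
mid_rise = sonority_distance("s", "l")  # 7
mid_fall = sonority_distance("n", "t")  # -5
```

Under this encoding, positive values correspond to rising (SSP-conforming) onsets and negative values to falling (SSP-violating) ones. Replacing `ELEVEN_CLASSES` with a two-member list (obstruents, sonorants) would yield only the distances -1, 0 and 1.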
On the other hand, for clusters with an SSP-conforming rising sonority profile, the number of languages generally increases as the sonority distance increases; however, when the sonority distance exceeds a certain value, the number starts to decrease. The largest number of languages is attested at a sonority distance of 6 (41.18% of the 357 languages); around 35% of languages are attested at sonority distances of 4 and 5; nearly 30% are attested at sonority distances of 1 and 8; and the lowest numbers are attested at the largest sonority distances of 9 and 10, with only 13.17% and 17.65%, respectively. More detailed results can be found in Table 1 below.

Types of sonority classes attested at each sonority distance are provided in Table 2 below. We find that, in general, there are more types of sonority classes at lower absolute values of sonority distance (i.e., at sonority distances between -4 and 4). As the absolute value of the sonority distance increases, the number of attested sonority class types decreases, which is expected, since more class combinations are possible at small sonority distances. Additionally, we find that if an SSP-violating sonority class is attested at a certain sonority distance, the reversed SSP-following class is also attested at the corresponding positive sonority distance in our language sample, and vice versa. This shows a certain degree of onset-coda symmetry, although the pattern is not entirely symmetrical. It is not clear whether these classes tend to occur in the same language, a question we leave to future study.
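The per-distance language counts and percentages reported above can be tallied along the following lines. This is a sketch only: the input format (a mapping from language to its set of attested distances), the function names, and the toy data are assumptions made for illustration and do not reproduce the study's data.

```python
# Illustrative sketch: how many languages attest each sonority distance,
# and what share of the sample that represents. All data below are invented.
from collections import Counter


def languages_per_distance(distances_by_language):
    """Count, for every distance, the number of languages attesting it."""
    counts = Counter()
    for distances in distances_by_language.values():
        counts.update(set(distances))  # each language counts once per distance
    return counts


def percentages(counts, n_languages):
    """Convert language counts to percentages of the whole sample."""
    return {d: 100 * c / n_languages for d, c in sorted(counts.items())}


toy_sample = {
    "LangA": {6, 4, -5},
    "LangB": {6},
    "LangC": {4, 10},
    "LangD": set(),  # no clusters, but still part of the sample size
}
counts = languages_per_distance(toy_sample)
shares = percentages(counts, len(toy_sample))
```

Languages without any cluster still count toward the denominator, as in the text, where every percentage is taken out of all 357 languages. On that basis, the reported 41.18% would correspond to 147 of the 357 languages.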

Sonority distance    Attested sonority class types
SSP-conforming clusters
10                   ts-j, t-j
9                    s-j, f-j, ts-ɹ, t-ɹ
8                    dz-j, d-j, f-ɹ, ts-l, t-l
7                    z-j, v-j, d-ɹ, s-l, f-l, ts-ɾ, t-ɾ
…
-5                   j-n, l-v, l-z, ɾ-d, r-s, n-ts, n-t
-6                   l-dz, l-d, ɾ-s, r-t
-7                   j-z, j-v, l-f, ɾ-t
-8                   j-d, l-ts, l-t
-9                   j-s, j-f, ɹ-t
-10                  j-ts, j-t

Table 2. Types of sonority classes attested at each sonority distance. Sonority classes are coded as follows: vcl stops (t), vcd stops (d), vcl affricates (ts), vcd affricates (dz), vcl fricatives (f), vcd fricatives (v), vcl sibilants (s), vcd sibilants (z), laterals (l), rhotics (ɹ), flaps (ɾ), trills (r), and glides (j).

To test the correlation between sonority distances and the number of languages attested at each sonority distance, a sequential polynomial regression was conducted. A linear model was fitted first, and quadratic and cubic components were then added sequentially until the best fit was achieved, based on the model diagnostics R², F, AIC and BIC. The results showed that the quadratic model provided the best fit (F(2,18) = 65.93, p < .001, R² = .867). For the results of the quadratic model, as well as of the linear and cubic models that were also tested, see Table 3 below. Based on the parameters of the quadratic model shown in Figure 1 below, the largest number of languages is projected to be attested at a sonority distance of around 5 (shown in (10) below) rather than at the large sonority distances of 7, 8, 9 or 10.
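The sequential fitting procedure and the location of the quadratic peak can be illustrated with a small self-contained sketch. It is not the authors' analysis: the data are synthetic (generated with a true peak at distance 5), the function names are hypothetical, and AIC and BIC are computed with the usual Gaussian least-squares formulas up to an additive constant, which is an assumption about how such diagnostics are typically obtained.

```python
# Illustrative sketch: fit linear, quadratic and cubic models by least
# squares, compare them, and locate the quadratic peak at -b1 / (2 * b2).
# The data are SYNTHETIC and serve only to demonstrate the procedure.
import math
import random


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = b0 + b1*x + ... + bd*x**d.

    Returns (betas, r_squared, aic, bic), where betas[i] multiplies x**i.
    """
    k = degree + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    betas = solve(xtx, xty)
    fitted = [sum(b * x ** i for i, b in enumerate(betas)) for x in xs]
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    n = len(xs)
    # A perfect fit (rss == 0) drives both criteria to minus infinity.
    log_term = math.log(rss / n) if rss > 0 else float("-inf")
    aic = n * log_term + 2 * k
    bic = n * log_term + k * math.log(n)
    return betas, 1.0 - rss / tss, aic, bic


def quadratic_peak(betas):
    """Distance at which a fitted quadratic reaches its extremum."""
    _b0, b1, b2 = betas
    return -b1 / (2 * b2)


random.seed(0)
xs = list(range(-10, 11))  # 21 distances, -10 to 10
ys = [120 + 8 * x - 0.8 * x ** 2 + random.gauss(0, 5) for x in xs]
fits = {degree: fit_polynomial(xs, ys, degree) for degree in (1, 2, 3)}
peak = quadratic_peak(fits[2][0])
```

With 21 data points and three parameters, the quadratic model has 18 residual degrees of freedom, matching the F(2,18) reported above. In practice a statistics package would be the more natural choice; the hand-rolled solver here merely keeps the sketch dependency-free.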
(10) $-\beta_1 / (2\beta_2)$: $-0.\ldots$

Table 3. Model fitting for the correlation between sonority distances and the number of languages attested at each sonority distance in the onset position.

When the sonority distance constraint for the hierarchy of eleven consonantal sonority classes, recapped in (11) below, is applied to the typological data, the polynomial trend in the number of attested languages is clearly visible in Figure 1 below. The figure shows that as sonority distances increase, the number of languages also increases, with the number of languages with SSP-following clusters generally being larger than the number of languages with SSP-violating clusters. However, once the distance exceeds a certain value, the number of languages starts to decrease; the number of languages attested at each distance does not increase monotonically. Note that the figure does not indicate that the largest sonority distance of 10 (i.e., obstruent-glide) is unlikely to occur; rather, it shows that distances exceeding a particular value (i.e., distances 6, 7, 8, 9 and 10) are less likely to occur than distance 5. That is, the largest number of languages is expected at a distance of 5.

Figure 1. The correlation between sonority distances and the number of languages (lgs.) attested at each sonority distance when a sonority hierarchy of eleven consonantal sonority classes is adopted (i.e., glides > rhotics > laterals > flaps > trills > nasals > h > vcd fricatives > vcd stops/affricates > vcl fricatives > vcl stops/affricates).

We repeated the above process using the simplest sonority hierarchy and found that the trend remained the same. We applied the sonority distance constraint (12) to the typological data; the corresponding results are presented in Table 4, Table 5 and Figure 2 below. We used a hierarchy with two consonantal sonority classes as a reference to demonstrate that the polynomial trend is not limited to a specific sonority hierarchy but is a general tendency regardless of the hierarchy. The results show a larger number of languages with SSP-following clusters (62.46%) than with SSP-violating clusters (14.29%), consistent with the Sonority Sequencing Principle (Table 4 below). Statistically, among the linear, quadratic and cubic models, the quadratic model provided the best fit for the data points; in this case the fit is perfect, as is expected with only three data points (distances -1, 0 and 1). Therefore, based on the equation shown in (13) below, the largest number of languages is projected to be found at a sonority distance of 1, the largest sonority distance for this hierarchy. However, note that the trend does not show a monotonic increase, as demonstrated in Figure 2 below.

Figure 2. The correlation between sonority distances and the number of languages (lgs.) attested at each sonority distance for a sonority hierarchy of two consonantal sonority classes (i.e., sonorants > obstruents).

4. Discussion. Large sonority distances are assumed to be favoured cross-linguistically, as formulated in the sonority distance constraint hierarchy (e.g., *Dist 0 >> … >> *Dist 9 >> *Dist 10). This constraint hierarchy predicts that if a language has small distances, it also has larger sonority distances; accordingly, large sonority distances are predicted to be more common cross-linguistically (when there are no interactions with other markedness constraints). A typological investigation of 357 geographically and genealogically diverse languages shows that large sonority distances are generally attested in more languages than small distances. However, the largest number of languages is found at particular sonority distances rather than simply at the largest ones, regardless of the sonority hierarchy adopted. When the sonority distance exceeds a certain value, the number of languages attested at each sonority distance starts to decrease, so the number of languages does not increase monotonically with sonority distance.

Under Optimality Theory (OT), variation in phonological grammar across languages is attributed to different constraint rankings in each individual language (McCarthy 2002, 2008; Prince & Smolensky 1993/2004; Zec 2007). Under this framework, the finding that many languages do not have large sonority distances could be the effect of different constraint rankings in different languages, and each language's idiosyncrasies with respect to the prediction of the sonority distance constraint hierarchy may likewise reflect its particular ranking. In this case, the typological survey raises the question of why so many different rankings in different languages prevent large distances from being realized, and in particular why sonority distances exceeding a particular value would be blocked. Under OT, the results of the current study may also reflect interacting markedness constraints that prevent large distances from being realized, i.e., a constraint that frequently ranks above the constraints on large sonority distances in many languages. In this case, the current study invites further investigation into why some constraints frequently rank above the constraints on large sonority distances and prevent large distances from being realized, especially if large distances are the most unmarked clusters cross-linguistically.

The typological tendency that large distances are not found as commonly as expected also motivates a simpler mechanism to account for the cross-linguistic tendency of sonority distances: a constraint that prevents large distances from being realized, embedded in the constraint hierarchy as a default setting. That said, there might be an interaction between a markedness constraint and the sonority distance constraint hierarchy that prevents large distances of certain values from being realized. Accordingly, the present study motivates additional investigation of the phonological processes affecting consonant clusters in those languages where large distances are expected to be realized but are not attested on the surface.

The empirical findings from the current study are also in line with a more general tendency, found in many studies, for the strength of phonotactic restrictions to decay as the distance between the interacting phonemes increases (Frisch 1996; Hayes & Zsuzsa 2006; Kharitonov 1982; Kimper 2011; Pierrehumbert 1993; Zymet 2014). Various studies using different approaches have found that phonotactic restrictions become less strict as the distance between two phonemes increases. However, it remains to be determined what constitutes the basis of this decrease, as well as its precise nature. The finding that large sonority distances are not favoured by languages is consistent with this more general phonotactic law, which might serve as motivation for restrictions on large sonority distances. Therefore, empirically, both the typological tendencies of sonority distances and the decay of phonotactic strength motivate constraints on large distances between two phonemes in a sequence. If sonority is to be embedded in an OT grammar, the typological survey of sonority distances motivates a constraint on large distances.

5. Conclusion.
Large sonority distances have been regarded as the most unmarked clusters, as formulated in sonority (or sonority distance) constraint hierarchies. A typological survey of sonority distances shows that consonant clusters with large sonority distances are generally more common than those with small ones, but the largest distances are not the most common ones. There is an optimal sonority distance at which the largest number of languages is attested, and above this value the number of languages starts to decrease, regardless of the sonority hierarchy adopted. The finding that large sonority distances are not always favoured cross-linguistically presents a challenge to the unmarked status of large sonority distances.

Table 1. The number and proportion of languages (out of a total of 357) attested at each sonority distance in the onset position for a sonority hierarchy with eleven consonantal sonority classes in which sonority is defined phonetically (i.e., glides > rhotics > laterals > flaps > trills > nasals > h > vcd fricatives > vcd stops/affricates > vcl fricatives > vcl stops/affricates).