Against the Law of Three Consonants in French: Evidence from Judgment Data

Introduction
In French, some morphemes alternate between a form with schwa and a form without schwa. For instance, the noun demande 'request' can be realized with a schwa as [dəmãd] or without schwa as [dmãd]. Determining the factors that condition the distribution of schwa-zero alternations has been a central topic in French phonology for more than a century (see Bürki et al. 2011:3982-3985 for an overview). Among these factors, the consonantal context, and in particular the number of consonants surrounding schwa, received particular attention early on. In his influential treatise on French pronunciation, Grammont (1914:115-116) states that a preconsonantal schwa is obligatory when preceded by two consonants (CC_C), as illustrated in (1a), but excluded when preceded by a single consonant (C_C), as illustrated in (1b). He calls this generalization the 'loi des trois consonnes' (Law of Three Consonants; LTC) and explains it as a strategy to avoid three-consonant clusters. In (1a), the schwa form is preferred because it makes it possible to avoid the three-consonant cluster [tdm]. In (1b), the schwa-less form is preferred in the absence of three-consonant clusters.
(1) Grammont
Subsequent works on French schwa have provided a more nuanced view of the LTC. First, the LTC has been found to hold as a gradient rather than a categorical generalization: schwa is not obligatory in CC_C and excluded in C_C, but more likely overall in CC_C than in C_C (Bürki et al. 2011; Racine & Andreassen 2012; Côté 2012; Hambye & Simon 2012; Hansen 2012). Second, not only the number but also the nature and order of surrounding consonants have been found to be relevant. In C_C, clusters with increasing sonority favor the schwa-less form (Bürki et al. 2011). In CC_C, the likelihood of schwa presence depends on the consonants involved. For instance, schwa is more likely to be pronounced if its absence implies that an obstruent-liquid cluster (OL) is not directly followed by a vowel (Dell 1976, 1985; Côté 2001), as illustrated in (2).

(2) a. Schwa is more likely in OL_C
       O#L_C [kls] chaque leçon [ʃak#l(ə)sɔ̃] 'each lesson'
    b. Schwa is less likely in CO_L (O = obstruent, L = liquid)
       C#O_L [spl] douce pelouse [dus#p(ə)luz] 'sweet lawn'
This nuanced view of the LTC has been argued to be relevant in a range of contexts beyond the word-initial context illustrated in (2): at word boundaries (e.g. act(e) pénible 'painful act'), at clitic boundaries (e.g. Annick l(e) salue 'Annick greets him'), and at morpheme boundaries before the inflectional future/conditional suffix -r (e.g. je gard(e)rai 'I will keep'; Côté 2001:85). However, there is one morphological context where the LTC is still considered to hold as a categorical generalization, in line with Grammont's strict interpretation in (1). Between stems ending in two consonants and consonant-initial derivational suffixes (CC_C derivation), schwa is reported to be categorically pronounced, regardless of the nature and order of surrounding consonants (Dell 1978; Côté 2001:85, 109; Côté 2012). For instance, schwa is reported to be obligatory in both OL_C and CO_L before derivational suffixes, as illustrated in (3a) and (3b), even though OL_C and CO_L are treated differently in general, as illustrated in (2) for word-initial contexts and in (4) before the inflectional future/conditional suffix -r.
The view according to which boundaries between stems and derivational suffixes are special compared to word boundaries and to boundaries between stems and inflectional suffixes implies that the phonological grammar may differ quite substantially across strata, with clearly distinct lexical and postlexical strata. In the postlexical stratum, a set of phonotactic constraints referencing different types of three-consonant clusters (e.g. *OLC, *COL, etc.) would be active, resulting in different patterns of schwa-zero alternations for different types of three-consonant clusters, as illustrated in (2) and (4). In the lexical stratum, only a single phonotactic constraint banning three-consonant clusters would be active (*CCC), resulting in a single pattern of schwa-zero alternations for all CC_C sequences at stem-suffix boundaries in derived words, as illustrated in (3).
The hypothesis of clearly distinct lexical and postlexical phonologies will be referred to as the strong version of the lexical-phonology hypothesis. The predictions of the strong version of the lexical-phonology hypothesis are summarized in (5) for the two relevant consonant sequences (OL_C and LO_L) and the two morphological contexts this paper will focus on (derivation and inflection). The predictions are summarized in (5a) at the level of the grammar (constraint set in each stratum) and in (5b) at the level of the data (probability distribution of schwa-zero alternations).

(5) Strong version of the lexical-phonology hypothesis: predictions
    a. Grammar: phonotactic constraints differ in derivation (lexical stratum) and in inflection (postlexical stratum)
       {*CCC_derivation}   {*OLC_inflection, *LOL_inflection}
    b. Data: schwa-zero alternations are sensitive to cluster type only in inflection (postlexical stratum)
       P(ə | OL_C, derivation) = P(ə | LO_L, derivation)
       P(ə | OL_C, inflection) ≠ P(ə | LO_L, inflection)

The strong version of the lexical-phonology hypothesis, which treats the LTC as categorical in the lexicon, seems to be assumed in the literature, at least in Dell (1985) and Côté (2001). It was tested by Côté (2012) in a corpus study of Laurentian French: in this corpus, she found no exception to the LTC under its categorical version in the lexicon (Côté 2012:258). However, the corpus used in this study is probably too small to draw strong conclusions regarding the categorical nature of the LTC at stem-suffix boundaries. Indeed, the corpus used in this study contains 2,530 contexts for schwa but only three instances of CC_C at the boundary between a stem and the inflectional future/conditional suffix -r (Côté 2012:258).
Alternatively, schwa-zero alternations could be gradient and sensitive to cluster type in both lexical and postlexical strata but closer to categorical in the lexical stratum. According to this view, the same phonotactic constraints against various types of three-consonant clusters would be active in both strata but constraint weights would be more similar (and higher) in the lexical stratum. As a result, deviations from categoricity and differences among different types of three-consonant clusters would be harder to detect in the lexical stratum. This hypothesis will be referred to as the weak version of the lexical-phonology hypothesis. Its predictions are summarized in (6), with (6a) focusing on the predictions at the level of the grammar and (6b) on the predictions at the level of the data.

(6) Weak version of the lexical-phonology hypothesis: predictions
    a. Grammar: phonotactic constraints are the same but their weights differ in derivation (lexical stratum) and in inflection (postlexical stratum)
       {*OLC_derivation, *LOL_derivation}   {*OLC_inflection, *LOL_inflection}
    b. Data: schwa-zero alternations are sensitive to cluster type in both derivation (lexical stratum) and inflection (postlexical stratum)

The main goal of this paper is to tease apart these two versions of the lexical-phonology hypothesis. This question also has theoretical implications beyond French. Some theories assume that phonotactic asymmetries ultimately reflect perceptual and articulatory asymmetries (e.g. Kawasaki-Fukumori 1992; Flemming 2002) or sonority-driven asymmetries among segments (Clements 1990). According to these theories, the same phonotactic asymmetries should be reflected across the grammar's strata if the same perceptual/articulatory or sonority-driven asymmetries among segments hold across these strata. Under the default assumption that segmental properties are largely independent of morphosyntactic context (e.g. word boundaries, stem-suffix boundaries), these theories of phonotactics are more directly compatible with the weak version of the lexical-phonology hypothesis in (6).

Judgment task
The present study uses speakers' metalinguistic judgments as primary data, following a long tradition in linguistics (Schütze & Sprouse 2013; Schütze 2016; Myers 2017) and in the study of French schwa (Dell 1985; Côté 2001; Racine & Grosjean 2002; Racine 2007). The present study's design was inspired by a previous study by Racine (2007, 2008) that used metalinguistic judgments to estimate the likelihood of words' schwa forms and schwa-less forms in French (see also Racine & Grosjean 2002:312-313). More specifically, participants were asked to rate how likely they would be to pronounce schwa variants and schwa-less variants for a set of 115 words. The task was slightly different from that used by Racine. In the present study, the task corresponds to a judgment of relative frequency, whereas participants in Racine's study were asked to rate the absolute frequency of each variant independently. A judgment of relative frequency was used because it directly yields the information most relevant to the research question, namely the estimated relative frequency of the two variants. Following Racine (2007), the judgments were elicited using a seven-point Likert scale, with 1 indicating a categorical preference for the schwa variant (e.g. garderie), 7 indicating a categorical preference for the schwa-less variant (e.g. gard'rie), and 4 indicating no preference for either form. An example is shown in Figure 1. Based on previous research, French speakers from Switzerland are expected to rate schwa-less variants higher than speakers from France (Racine 2007). Participants from both origins were tested to check whether this difference in the baseline rate of schwa production interferes with the LTC. The participants provided their informed consent to participate in the research and agreed to make their data available online.

Experimental items and fillers
Two variables were manipulated to construct the experimental items: Cluster (with three levels: OL_C, LO_L, C_C) and Morphology (with two levels: derivation, inflection). C_C stands for any two-consonant cluster, LO_L for liquid-obstruent-liquid clusters, and OL_C for obstruent-liquid-consonant clusters. Inflected words all featured the future suffix -r because this suffix is, to the author's knowledge, the only consonant-initial inflectional suffix in French. Inflected words were all presented with a subject pronoun preceding them (e.g. je chanterai 'I will sing') to ensure that they were correctly identified as inflected words. Derived words were presented without any additional information (e.g. garderie).
Four of the six experimental conditions included 15 words whereas the two remaining ones included 14 words. There was therefore a total of 88 experimental items in the study. Table 1 illustrates each condition using items that were featured in the stimulus set. In addition, 27 filler items were used. The fillers featured schwa in morpheme-internal position, mostly in the first syllable of words (e.g. chemin 'path'). Experimental items and fillers used in the study can be found in a longer version of this paper.

Table 1
Cluster   Derivational suffix   Inflectional suffix
C_C       biscuiterie (14)      chanterai (15)
LO_L      conciergerie (15)     garderons (15)
OL_C      soufflerie (14)       règlera (15)

For each word, the schwa variant was conveyed using the word's graphic form (e.g. garderie). The graphic form always contains an e corresponding to the schwa phone [ə]. The schwa-less variant was conveyed by replacing the e with an apostrophe (e.g. gard'rie). The order of presentation of the experimental items and fillers was randomized.

Data analyses
While it is common practice to analyze ordinal data such as Likert-scale data as metric variables using linear regression, Liddell & Kruschke (2018) show that this can lead to a number of errors, including false alarms (i.e. detecting an effect that is not real), misses (i.e. failure to detect real effects), and even inversions of effects (i.e. the order of the means according to the metric scale is opposite to the true ordering of the means). In this paper, the judgment data were modeled using the ordinal cumulative model (Bürkner & Vuorre 2019:78-79). The cumulative model assumes that the observed ordinal response variable derives from the categorization of a latent continuous unobserved variable. In the present study, the ordinal variable is the rating of the preference for the schwa or schwa-less variant along the seven-point scale. The latent variable is the participant's underlying opinion about the relative frequency of the two variants. To model this categorization in the case of a seven-point Likert scale, the cumulative model assumes that there are six thresholds which partition the latent variable into seven ordered categories (1, 2, ..., 6, 7). The model provides estimates both for the different conditions' means along the latent continuous variable and for the position of the six thresholds. The reader is referred to Bürkner & Vuorre (2019) for further details.
The analysis of the judgment data was also supplemented with a linguistic analysis using probabilistic constraint-based grammars. This additional analysis is motivated by the fact that we ultimately care about the linguistic system that underlies participants' behavior, and this linguistic system can be characterized as a constraint-based grammar. A constraint-based analysis is used because, as noted by Durand & Laks (2000:32), constraints provide a very intuitive interpretation of the LTC as caused by a general markedness constraint *CCC banning three-consonant clusters. In addition, recent theoretical papers have modeled schwa-zero alternations using probabilistic constraint-based grammars (Bayles et al. 2016; Smith & Pater 2020). Among the different families of probabilistic constraint-based grammars, MaxEnt was chosen (Hayes & Wilson 2008) because it is easy to implement and has been shown to provide a good fit to linguistic data compared to alternative frameworks (e.g. Smith & Pater 2020).
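The categorization step of the cumulative model can be made concrete with a minimal sketch. It assumes a standard normal latent variable (a probit link); the function name, the threshold values, and the condition means below are hypothetical illustrations, not estimates from the study.

```python
from statistics import NormalDist


def category_probabilities(eta, thresholds):
    """Return P(Y = k) for k = 1..K under a cumulative probit model.

    eta: location of a condition on the latent scale.
    thresholds: K - 1 increasing cut points on the latent scale.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be increasing")
    phi = NormalDist().cdf
    # Cumulative probabilities P(Y <= k), padded with 0 and 1 at the ends.
    cumulative = [0.0] + [phi(t - eta) for t in thresholds] + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k + 1] - cumulative[k] for k in range(len(cumulative) - 1)]


# Hypothetical thresholds for a seven-point scale (six cut points).
thresholds = [-1.5, -0.9, -0.3, 0.3, 0.9, 1.5]
probs_low = category_probabilities(-1.0, thresholds)   # condition favoring schwa
probs_high = category_probabilities(1.0, thresholds)   # condition favoring schwa absence
```

A condition with a higher latent mean shifts probability mass toward the upper response categories, which on the scale used here correspond to a preference for the schwa-less variant.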
A Bayesian approach was adopted (rather than a frequentist approach) for inferring the parameters of both the ordinal regression and the probabilistic grammars. This choice was motivated by the fact that Bayesian inference yields outcomes that are intuitive and easy to interpret. In particular, it provides a posterior distribution for all the model's parameters and combinations of parameter values given the data. This makes it very easy to test any hypothesis about the parameter values and about differences between parameter values. Also, Bayesian approaches virtually always converge to accurate values of the parameters (Liddell & Kruschke 2018).

Ordinal regression
Description of analysis
A Bayesian hierarchical ordinal cumulative regression was fit to the seven-point Likert-scale data as a function of dummy-coded factors Morphology (reference level 'derivation'), Cluster (reference level OL_C), and Origin (reference level 'France') and all their interactions, using Stan (Carpenter et al. 2017) and the brms package (Bürkner 2017) in R (R Core Team 2020). The model included the maximal random effect structure justified by the study's design (Barr et al. 2013), allowing the effects and their interactions to vary by participant (Morphology, Cluster) and by word (Origin). The probit link function was used in order to apply a cumulative model assuming the latent variable to be normally distributed (Bürkner & Vuorre 2019:84). The default priors of the brms package were used. Equal variances were assumed for the unobserved variables that underlie the observed ordinal variable.
Four sampling chains were run with 4,000 iterations each, including a warmup period of 2,000 iterations per chain, resulting in a total of 8,000 samples. To avoid initialization at too small or too large values, initial values for the MCMC sampler were set to zero. For all relevant parameters, their mean and 95% credibility interval (CI) according to the model's posterior distribution are reported. In the analysis, the parameters concern the latent unobserved continuous variable corresponding to participants' opinion about the likelihood of schwa absence. Due to the way the Likert scale was set up, greater values correspond to a greater likelihood of schwa deletion (according to the participants). For testing hypotheses about the difference ∆ between two conditions, the recommendations of Franke & Roettger (2019) were followed. The posterior probability that this difference is larger than zero (∆ > 0) is reported. If this probability is close to 1 and zero is furthermore outside the posterior 95% CI for ∆, the data are considered to provide compelling evidence for the hypothesis that posits a difference between the relevant conditions.
Figure 2 shows the posterior distribution (mean and 95% CI) of each response category (1, 2, ..., 6, 7) for all cells in the factorial design. This posterior distribution was calculated using Equation 5 in Bürkner & Vuorre (2019:79). This equation expresses the probability of each response category k as a function of the predictors, their corresponding regression coefficients, and the thresholds τ_k and τ_{k−1} inferred along the latent continuous variable.
Differences between clusters in inflected words. Participants were also found to rate schwa absence
It can be concluded that there is sufficient evidence to support the hypothesis that both the number and nature of surrounding consonants matter in both derived and inflected words, with OL_C being judged as more likely to feature schwa than LO_L and LO_L more likely to feature schwa than C_C in both derived and inflected words.
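The decision rule for a difference ∆ between two conditions can be sketched from posterior draws as follows. This is an illustration only: the function name is hypothetical, the draws are simulated rather than taken from the fitted model, and the cutoff `near_one` for "close to 1" is an assumption, since the text does not fix a numerical value.

```python
import random


def summarize_difference(draws_a, draws_b, near_one=0.95):
    """Summarize delta = a - b from paired posterior draws of two conditions."""
    deltas = sorted(a - b for a, b in zip(draws_a, draws_b))
    n = len(deltas)
    # Approximate equal-tailed 95% interval from the sorted draws.
    lower = deltas[int(0.025 * n)]
    upper = deltas[min(n - 1, int(0.975 * n))]
    p_gt_0 = sum(d > 0 for d in deltas) / n
    return {
        "mean": sum(deltas) / n,
        "ci95": (lower, upper),
        "p_gt_0": p_gt_0,
        # Evidence for a positive difference: P(delta > 0) near 1 and 0 below the CI.
        "compelling": p_gt_0 >= near_one and lower > 0,
    }


# Simulated draws standing in for 8,000 posterior samples per condition.
rng = random.Random(0)
draws_a = [rng.gauss(1.0, 0.2) for _ in range(8000)]
draws_b = [rng.gauss(0.0, 0.2) for _ in range(8000)]
result = summarize_difference(draws_a, draws_b)
```

Working on the draws directly means that any contrast between cells of the design can be summarized in the same way, without refitting the model.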

Probabilistic constraintbased grammars
To supplement the analysis of judgment data with a more linguistically meaningful interpretation, the data were also analyzed using probabilistic constraint-based grammars. In this framework, the likelihood of schwa presence/absence in the different experimental conditions can be directly interpreted in terms of constraint weights. This makes it possible to interpret the judgment data in terms of the relative strengths of phonotactic constraints against consonant clusters. In this section, the judgment data were aggregated across all participants and words. In other words, the grammar that was inferred is the average grammar across participants and words. Although participants from France and Switzerland probably have different grammars because they have different baseline rates for schwa production, they generally show the same asymmetries between consonant clusters, as shown in Figure 2. Therefore, the weights for the corresponding phonotactic constraints should be ordered in the same way for both groups.

Description of analysis
For the constraint-based analysis, the response variable (the seven-point Likert scale) was transformed into a binary variable (schwa presence vs. absence). The reason for this transformation is that constraint-based grammars are designed as models of language production rather than as models of metalinguistic judgment. In language production, a form is produced or not. In judgment data, a form may receive a gradient judgment of acceptability, and this does not directly translate into a binary choice (unless binary judgments are collected). However, constraint-based grammars may be used, and often are used, to model judgment data (e.g. Boersma & Hayes 2001 on dark and light /l/ in English, Smith & Pater 2020 on schwa-zero alternations in French). If the judgment data are not binary, this requires applying a transformation that binarizes the data (e.g. Boersma & Hayes 2001:82). In this paper, the following transformation was applied. Words that received ratings strictly above 4 were treated as categorically favoring the schwa-less variant. Words that received ratings strictly below 4 were treated as categorically favoring the schwa variant. Words that received a rating equal to 4 were randomly assigned to one or the other category.
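The binarization described above can be sketched as follows. The function name is hypothetical, and the equal (0.5) probability used for ratings of 4 is an assumption: the text says only that such ratings were randomly assigned to one category or the other.

```python
import random


def binarize_rating(rating, rng):
    """Map a 1-7 rating to True (schwa-less variant) or False (schwa variant)."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must be between 1 and 7")
    if rating > 4:
        return True   # strictly above the midpoint: schwa-less variant
    if rating < 4:
        return False  # strictly below the midpoint: schwa variant
    # Midpoint: random assignment (equal probabilities assumed here).
    return rng.random() < 0.5


rng = random.Random(1)
ratings = [1, 2, 4, 4, 6, 7]  # made-up ratings for illustration
binary = [binarize_rating(r, rng) for r in ratings]
```

Passing the random number generator explicitly keeps the tie-breaking reproducible across runs.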
Two constraint-based grammars were fit to the transformed data aggregated across participants and words, using MaxEnt as grammatical framework (Hayes & Wilson 2008). Two grammars were constructed to represent the weak and strong lexical-phonology hypotheses, respectively. The first grammar had a different markedness constraint for each of the six cluster-suffix combinations (*OLC_inf, *LOL_inf, *CC_inf, *OLC_der, *LOL_der, *CC_der), allowing for OL_C and LO_L to behave differently in derived words. The second grammar was identical except that it had a single *CCC constraint for derived words, in accordance with the hypothesis that the Law of Three Consonants is categorical in this context (*OLC_inf, *LOL_inf, *CC_inf, *CCC_der, *CC_der; see footnote 8). Table 2 shows how the first grammar assigns different constraint violations for OL_C and LO_L in derived words. Both grammars included a faithfulness constraint protecting against schwa epenthesis (Dep(V)). This analysis assumes that the schwa-less variant is the underlying form and the schwa variant is derived through epenthesis. This is the classic analysis of French schwa at morpheme boundaries (Dell 1985). However, this choice is not crucial to the analysis.
For both analyses, the constraint weights were inferred using a Bayesian binomial regression implemented in rjags (Plummer 2016). To help with model convergence, one of the weights was set to a constant value of 1 (the weight of *CC_inf). Following Goldwater & Johnson (2003), a Gaussian prior with mean equal to zero was chosen for all other constraint weights. Informally, this prior specifies that zero is the default weight for constraints (which means that the constraint has no effect on the output). The variance of the Gaussian prior was set to 1,000. Three MCMC chains were used with 100,000 samples and a thinning interval of 10 (which means that every 10th value in the chain was kept in the final MCMC sample while all other values were discarded). The first 5,000 samples of each chain were used for burn-in (which means they were also discarded).
Footnote 8: In this paper, markedness hierarchies are set up as scale-partition constraint families and not as stringency constraint families (see Smith & Moreton 2012 for a discussion of these two approaches). In the stringency approach, there would be one markedness constraint banning specific clusters (e.g. *OLC) and a general markedness constraint banning all CCC clusters (*CCC) instead of two specific markedness constraints (*OLC, *LOL). Similarly, in the stringency approach, there would be a morphologically indexed markedness constraint (e.g. *OLC_der) and a general markedness constraint that does not depend on morphological domains (*OLC) instead of two morphologically indexed markedness constraints (*OLC_der, *OLC_inf). Specific constraints were chosen in all cases so as not to bias the analysis in one way or the other (e.g. OLC is not a priori assumed to be more marked than LOL, clusters are not a priori assumed to be more marked in derivation than in inflection). Constraint weights only (and not constraint violations) will determine whether one context is more marked than the other.
Convergence of the chains on the posterior distribution was assessed using the Gelman-Rubin statistic: it was very close to 1 for all parameters (see footnote 9), indicating that the samples were representative of the posterior distribution (Kruschke 2015:181). The effective sample size for each constraint weight estimated by the model was greater than 10,000, indicating that the MCMC samples were large enough for stable and accurate numerical estimates of the posterior distributions (Kruschke 2015:184). For model comparison, the deviance information criterion (DIC; Gelman et al. 2013:172-173) was used.
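How a MaxEnt grammar turns constraint weights into probabilities for the two variants can be sketched as follows. The weights are hypothetical values chosen for illustration, not the estimates reported in this paper. The violation profiles assume, in line with the epenthesis analysis above, that the schwa variant violates Dep(V) once and the schwa-less variant violates the relevant markedness constraint once.

```python
import math


def maxent_probabilities(violations, weights):
    """Return the MaxEnt probability of each candidate.

    violations: {candidate: {constraint: number of violations}}
    weights: {constraint: non-negative weight}
    """
    # Harmony is the negative weighted sum of a candidate's violations.
    harmonies = {
        cand: -sum(weights[c] * n for c, n in profile.items())
        for cand, profile in violations.items()
    }
    z = sum(math.exp(h) for h in harmonies.values())
    return {cand: math.exp(h) / z for cand, h in harmonies.items()}


# Hypothetical weights (assumptions, not fitted values).
weights = {"Dep(V)": 2.0, "*OLC_der": 5.0, "*LOL_der": 3.5}

# Candidates for a derived word with an OL_C sequence and one with LO_L.
olc = maxent_probabilities(
    {"schwa": {"Dep(V)": 1}, "schwa-less": {"*OLC_der": 1}}, weights
)
lol = maxent_probabilities(
    {"schwa": {"Dep(V)": 1}, "schwa-less": {"*LOL_der": 1}}, weights
)
```

With two candidates, the probability of the schwa variant reduces to a logistic function of the difference between the markedness weight and the weight of Dep(V), which is why the weights can be inferred with a binomial regression.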
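The Gelman-Rubin statistic compares the variance between chains with the variance within chains for a single parameter. The sketch below implements its basic textbook form on simulated chains. It is not the routine used for the analysis, statistical software may apply additional corrections, and the function name is hypothetical.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter."""
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    within = mean(variance(c) for c in chains)            # W
    between_over_n = variance([mean(c) for c in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n        # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(3)
# Three simulated chains sampling the same distribution: value near 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three simulated chains stuck around different values: value well above 1.
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 3.0, 6.0)]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Because the function takes the draws of a single parameter, it is applied parameter by parameter, which matches the per-parameter use of the statistic in this paper.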

Description of results
The posterior distributions for the constraint weights are shown in Tables 4 and 5 for the grammar that distinguishes three-consonant clusters in derived words and for the grammar that does not, respectively. Note that *CCC and *OLC_inf end up having the same weights in Table 5, but this is not built into the analysis. The more complex grammar in Table 4 was found to have a smaller deviance information criterion than the simpler one in Table 5 (∆ = −19.36), indicating that it is a better model of the data. The predicted frequencies of schwa presence under the two grammars are plotted against the attested frequencies in Figures 3 and 4. The predictions of the more complex grammar in Figure 3 better match the attested frequencies. In other words, the data provide evidence for constraints referencing the nature of consonants in CCC clusters even in lexical phonology.
Table 4: Posterior distribution of the constraint weights (mean and 95% CI) in the grammar distinguishing *OLC_der and *LOL_der
Footnote 9: The Gelman-Rubin statistic was calculated individually for each parameter and not globally for all parameters because one of the parameters (the weight of *CC_inf) was set to a constant value of 1 to help with model convergence, and a global Gelman-Rubin statistic cannot be computed in this case. See the following post by Martyn Plummer for more details: https://sourceforge.net/p/mcmcjags/discussion/610037/thread/28cef6e5/.
As expected under the hypothesis that OLC is more marked than LOL and LOL more marked than CC, *OLC was found to have a greater weight than *LOL and *LOL a greater weight than *CC within each stratum (derivation and inflection), as shown in Table 4.

Conclusion
Grammont's Law of Three Consonants (LTC) states that schwa is obligatorily pronounced in CC_C sequences in French to avoid three-consonant clusters. Although the LTC has been shown to depend on the nature and order of consonants in CC_C in postlexical phonology, Grammont's categorical formulation is still considered accurate for describing schwa-zero alternations in lexical phonology. The judgment data collected in this study support the hypothesis that not only the number but also the nature of surrounding consonants matters for schwa-zero alternations in derived words. This means that Grammont's Law of Three Consonants should be relaxed not only for postlexical but also for lexical phonology. The results suggest that the same phonotactic constraints are relevant in lexical and postlexical phonology, but with potentially different weights in the two strata. Furthermore, the same phonotactic asymmetries were found in both strata. This is compatible with theories of phonotactics that hold that phonotactic asymmetries are not arbitrary but rooted in extragrammatical factors such as perception, articulatory effort or sonority.