Learning a gradient grammar of French liaison∗



Introduction
In French, liaison consonants (which we represent here as L) are weak elements that only appear in inter-word contexts (W1 W2) when W1 is one of a specific set of words (typically written as ending with a specific liaison consonant) and W2 is vowel-initial (Côté, 2011, 2008, 2001). However, liaison consonants do not surface when W2 is an "h-aspiré" word like héros 'hero.' These words appear to be vowel-initial but resist liaison and block ə-deletion when following le/la 'the' or (optionally) une 'a' (fem.). h-aspiré words also block resyllabification of W1-final consonants (e.g., quel hibou [kɛl.i.bu] 'what an owl'). Furthermore, such effects vary across morphemes (Zuraw & Hayes, 2017) and interact with the appearance of feminine and plural morphemes at the W1 W2 juncture.
Learning to account for the full range of interacting phenomena presents a significant challenge. Here, we propose the Error-Driven Gradient Activation Readjustment (EDGAR), an error-driven algorithm that can successfully acquire a grammar that accounts for the full range of phenomena reviewed above. This grammar is stated in the Gradient Symbolic Computation (GSC) framework, a type of Harmonic Grammar incorporating partially-activated representations (in addition to real-valued constraint weights). Smolensky & Goldrick (2016) (henceforth S&G) developed a GSC analysis deriving liaison patterns from the coalescence of partially-activated input consonants L1 and L2 that occur finally in W1 (L1) and initially in W2 (L2). A liaison consonant L surfaces iff its aggregate activation surpasses an epiphenomenal threshold determined by weighted MAX and DEP constraints. The work reported here goes beyond S&G's hand-calculated activation and weight values. EDGAR learns both constraint weights and the activation level of representations, accounting for liaison and its interactions with other processes.
The paper is structured as follows. Following a review of the core empirical patterns in French liaison, we discuss the two primary analysis strategies in the literature (viewing liaison consonants as belonging exclusively to W1 or W2). We then present the basic structure of the GSC analysis, which blends together these two accounts (viewing liaison consonants as associated with both words). We introduce our learning model, EDGAR, and show how the acquired grammar accounts for the empirical data reviewed above.

But in addition to syntactic constraints on the occurrence of liaison, we find that it occurs only in certain phonological contexts, as shown in (2):

(2) Liaison-associated W1 followed by vowel-initial word

Two competing analyses of liaison
In the literature we find two competing analyses of liaison: a final-L analysis (e.g., Tranel (1994)) and an L-initial analysis (e.g., Morin (2005)). In the former, a liaison consonant, which occurs underlyingly at the right edge of W1, surfaces when, and only when, it is necessary to provide an onset for a following vowel-initial word (satisfying a constraint such as ONSET), except before h-aspiré words. In the latter, the liaison consonant belongs to W2, which has consonant-initial allomorphs (e.g., /tami, nami, zami/) selected according to W1.
If we compare the two approaches, we find that each has its own advantages and disadvantages: neither covers all of the data. Here, we briefly review the empirical data; see S&G §3.1-§3.10 for further discussion. (The following phenomena are treated in the full account of which the results presented here are a part.)

3.1 Challenges for syllabification-motivated final-L analyses

In final-L analyses, liaison consonants surface to 'repair' onset-less W2. To account for h-aspiré words' exceptionality, the constraint that demands onsets (e.g., ONSET) must be weakened. But this fails to explain why W1s with fixed final consonants regularly exhibit resyllabification with h-aspiré words (e.g., quel hasard [kɛl.a.zaʁ/kɛ.la.zaʁ] (Tranel, 1995)), suggesting a pressure for onsets to occur even in the initial position of h-aspiré words.
3.2 Accounting for W1 effects in an L-initial analysis

Allomorph selection for W2, e.g., among /tami, nami, zami/, depends on idiosyncratic information in W1. If there is no underlying final /t/ in petit 'small, masc.' (as proposed in the L-initial analysis), it is not clear how it selects allomorph /tami/ as W2. It is also unclear why the W2 allomorph shares a consonant with the related feminine form petite, where a /t/ surfaces word-finally in citation form.

3.3 Liaison is not exclusively driven by properties of a single morpheme

A further challenge for either a final-L or L-initial analysis is that when liaison is variable or optional, there is a frequency effect in which the probability that liaison occurs increases with the conditional probability of W2 given W1. (See §8.4 below for references and further discussion.)

Fundamentals of the Gradient Symbolic Computation Analysis
We show that Gradient Symbolic Computation (Smolensky & Goldrick, 2016) (henceforth GSC) can effectively blend the two approaches to French liaison, L-initial and L-final, to achieve a greater degree of explanation than either of them alone. GSC is a novel category of computation: a cognitive architecture that unifies symbolic and neural-network computation. Representations are symbol structures whose components are associated with continuously-varying activation values. Knowledge is represented through weighted constraints, specified by a Harmonic Grammar. This formalism is part of a larger research program in which computation derives outputs from gradient representations in phonology, syntax and semantics (Cho et al., 2017; Faust & Smolensky, 2017; Faust, 2017; Goldrick et al., 2016; Hsu, 2018; Müller, 2017; Rosen, 2016, 2018a,b, 2019; Smolensky et al., 2014; Smolensky & Goldrick, 2016; van Hell et al., 2016; Zimmermann, 2017a,b, 2018).

Constraints over partially activated representations

GSC, like other versions of Harmonic Grammar, has weighted constraints. For example, the constraint MAX (which provides a Harmony reward for an input that surfaces) could have weight 0.6. Novel to GSC is the use of gradiently activated representations. For example, a consonant /t/ in the underlying representation could have activation 0.48. In addition to consonants, we discuss below gradient morpheme boundaries in the underlying form as well as underlying elements that are pure activation, with no underlying segmental/featural material. (N.b. As the patterns we seek to explain comprise discrete representations, we consider only discrete possible outputs, with activations limited to 1 for present or 0 for absent representational elements.) Candidates are evaluated for their Harmony, measured through the effect of constraints on possible outputs and on input-output correspondence. These constraints either penalize or reward structure by assigning negative or positive Harmony, respectively.
Constraints considered here that penalize candidates via negative Harmony:
• DEP: assesses a Harmony penalty that is proportional to the activation deficit between output material and its corresponding input material; e.g., input activation of 0.48 on an underlying final liaison /t/ of petit results in a Harmony penalty of (1 − 0.48) × w_Dep if the /t/ surfaces.
• UNIFORMITY: penalizes single output segments that correspond to multiple input segments (coalescence). One violation per output segment.
• INTEGRITY: penalizes segments in the input that have multiple correspondents in the output. One violation per input segment.
• ONSET: penalizes onsetless syllables. One violation per syllable.
• NOCODA: penalizes each syllable with a coda. One violation per syllable.

Constraints considered here that reward candidates via positive Harmony:
• MAX: rewards a candidate in which underlying elements have output correspondents. Reward is proportional to the activity of each underlying element that surfaces; e.g., input activation of 0.48 on an underlying word-final liaison /t/ in petit creates a Harmony reward of 0.48 × w_Max when it corresponds to an output segment [t].
• ANCHOR-L/R-C/V-MORPHEME-σ: If the left/right edge of a vowel V_i / consonant C_i is aligned with the left/right edge of a morpheme M_j in the input, then the left/right edges of V_i/C_i and M_j align with the left/right edge of a syllable in the output. The Harmony reward is w_Anchor times the activation of the underlying material (C/V or morpheme boundaries; as discussed in §8, the strength of each morpheme boundary is determined by the gradient activation of the edge segment(s)).
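The MAX and DEP definitions above imply a surfacing threshold on activation. As a minimal sketch, considering faithfulness alone and ignoring every other constraint, a segment with aggregate activation a is worth realizing when a × w_Max − (1 − a) × w_Dep > 0, that is, when a > w_Dep / (w_Max + w_Dep). The function names, the DEP weight, and the second activation value below are hypothetical and chosen only for illustration; the MAX weight 0.6 and the activation 0.48 are the example values used in the text. Summing the activations of coalescing consonants is an assumption based on the description of aggregate activation.

```python
def faithfulness_harmony(activation, w_max, w_dep):
    """Net MAX/DEP Harmony for realizing one output segment.

    `activation` is the aggregate input activation of the segment
    (assumed here to be the sum over coalescing input consonants).
    """
    max_reward = w_max * activation                    # MAX: reward scales with activation
    dep_penalty = w_dep * max(0.0, 1.0 - activation)   # DEP: penalty scales with the deficit
    return max_reward - dep_penalty


def surfacing_threshold(w_max, w_dep):
    """Activation above which MAX outweighs DEP (faithfulness only)."""
    return w_dep / (w_max + w_dep)


W_MAX = 0.6  # example MAX weight mentioned in the text
W_DEP = 0.6  # hypothetical DEP weight, for illustration only

single = faithfulness_harmony(0.48, W_MAX, W_DEP)          # one partially active /t/
blended = faithfulness_harmony(0.48 + 0.30, W_MAX, W_DEP)  # 0.30: hypothetical second consonant
threshold = surfacing_threshold(W_MAX, W_DEP)

print(f"threshold={threshold:.2f} single={single:.3f} blended={blended:.3f}")
```

In the full grammar the other constraints (ONSET, NOCODA, ANCHOR, UNIFORMITY) also contribute, so the effective threshold for a given candidate set will differ from this faithfulness-only value.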

The Error-Driven Gradient Activation Readjustment (EDGAR)
To weight these constraints and acquire the activation level of various components of underlying representations, we developed the Error-Driven Gradient Activation Readjustment (EDGAR). This builds on work such as the Gradual Learning Algorithm of Pater & Boersma (2013), but incorporates learning of representation activations in addition to constraint weights. Our dataset comprised a set of examples relevant to the liaison pattern (see below for further discussion, and the Appendix for a comprehensive list of training examples). For each example, the candidate with the greatest Harmony is calculated from input activations and constraint weights. If the wrong winner is chosen, EDGAR strengthens the weights of constraints and activations on segments that favour the desired winner and weakens those favouring the false winner (see note 7). EDGAR found constraint weights and input activations that derived all the examples correctly. We initialize constraint weights at 0.5 and liaison consonant activations at 0.015. The following pseudocode shows the steps of the algorithm.
(3)
    η = 0.015    (step size for changing activations and weights)
    loop
        errors ⇐ 0
        for t_k ∈ training inputs do
            for t_kl ∈ output candidates for t_k do
                H_{t_kl} = Σ_c H_{t_kl|c}    (sum the Harmonies resulting from each constraint c)
            end for
            if arg max H_{t_kl} has the actual output form then
                continue
            else
                errors ⇐ errors + 1
                Strengthen by step size η all activations and constraint weights that favour the desired winner.
                Weaken by step size η all non-zero activations and constraint weights that favour the false winner.
            end if
        end for
        if errors == 0 then
            break    (learning completed)
        end if
    end loop

7 The sign of a weight change depends on the contribution of constraints to Harmony (e.g., strengthening a negative vs. positive constraint means making its weight more negative or more positive, respectively). Activations are treated just like positive constraints. Note as well that EDGAR prevents constraints that reward candidates from having negative weights and constraints that penalize candidates from having positive values.

6 Accounting for basic liaison phenomena

The case of French liaison is an example in which two competing analyses (the liaison consonant belongs exclusively to W1 vs. W2) succeed and fail on distinct sets of phenomena. When a gradient blend of structures from the two analyses can capture all of the data, we have reason to go beyond classical symbol structures in grammatical theory. As Hankamer (1977) writes: "we must give up the assumption that two or more conflicting analyses cannot be simultaneously correct for a given phenomenon (pp. 583-4) . . . such constructions have both analyses at once (in the conjunctive sense)" (p. 592).
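Returning to the learning loop in (3), the following Python sketch shows one way its steps could be realized. It is an illustration under several assumptions, not the implementation used for the reported results: candidates are encoded as dictionaries (a hypothetical format), weights are stored as non-negative magnitudes with the sign of each constraint carried by its marks, a `max_epochs` guard is added, and the toy training set contains only two inputs evaluated by ONSET, MAX and DEP. The step size and the initial values (0.5 for weights, 0.015 for activations) are taken from the text.

```python
ETA = 0.015  # step size, as in (3)


def contributions(cand, act):
    """Unweighted, signed Harmony contribution of each constraint to a candidate."""
    contrib = dict(cand["marks"])                  # e.g. {"ONSET": -1}
    if cand["surfaced"]:                           # a liaison consonant is pronounced
        a = sum(act[s] for s in cand["surfaced"])  # aggregate input activation
        contrib["MAX"] = a                         # reward proportional to activation
        contrib["DEP"] = -max(0.0, 1.0 - a)        # penalty proportional to the deficit
    return contrib


def harmony(cand, w, act):
    return sum(w[c] * v for c, v in contributions(cand, act).items())


def predict(item, w, act):
    """Candidate with the greatest Harmony (first listed wins exact ties)."""
    return max(item["candidates"], key=lambda c: harmony(c, w, act))


def update(target, false, w, act):
    """Strengthen what favours the desired winner, weaken what favours the false one."""
    ct, cf = contributions(target, act), contributions(false, act)
    for c in set(ct) | set(cf):
        diff = ct.get(c, 0.0) - cf.get(c, 0.0)
        if diff > 0:                               # constraint favours the desired winner
            w[c] += ETA
        elif diff < 0:                             # constraint favours the false winner
            w[c] = max(0.0, w[c] - ETA)            # magnitudes never change sign
    for s in set(target["surfaced"]) - set(false["surfaced"]):
        act[s] = min(1.0, act[s] + ETA)
    for s in set(false["surfaced"]) - set(target["surfaced"]):
        if act[s] > 0:                             # only non-zero activations are weakened
            act[s] = max(0.0, act[s] - ETA)


def train(items, w, act, max_epochs=1000):
    for _ in range(max_epochs):
        errors = 0
        for item in items:
            winner = predict(item, w, act)
            if winner is not item["target"]:
                errors += 1
                update(item["target"], winner, w, act)
        if errors == 0:
            return True                            # learning completed
    return False


# Toy candidates (hypothetical encoding): which input consonants coalesce into a
# surfaced liaison consonant, plus signed marks for the other constraints.
hiatus = {"surfaced": (), "marks": {"ONSET": -1}}
liaison = {"surfaced": ("t1", "L2"), "marks": {}}
t_only = {"surfaced": ("t1",), "marks": {}}

petit_ami = {"candidates": [hiatus, liaison], "target": liaison}
petit_heros = {"candidates": [hiatus, t_only], "target": hiatus}

w = {"ONSET": 0.5, "MAX": 0.5, "DEP": 0.5}
act = {"t1": 0.015, "L2": 0.015}
converged = train([petit_ami, petit_heros], w, act)
print(converged, w, act)
```

Because this toy set omits the ANCHOR, NOCODA and UNIFORMITY constraints and most of the training examples listed in the Appendix, the values it learns should not be read as an analysis of French; the sketch is meant only to make the update rule concrete.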
6.1 Analysis overview

As shown in the following table (drawing from empirical data reviewed by S&G), we propose here a gradient blend of final-L and L-initial analyses, where t represents a partially activated /t/ at the right edge of W1 in the input and L2 represents a blend of partially-activated /t, z, n/ at the left edge of W2 in the input.

(4) Overview of input forms

Word type                    Orthog.   Gloss             Input
As word 1
Adj. w. potential liaison    petit     'small' (masc.)   pətit (t is partially activated)

Liaison consonants surface from two underlying positions: W1-final and W2-initial, with the two partially-activated inputs coalescing in the output. When only one of these underlying positions is present, there is not enough aggregate input activation on that consonant for it to surface. The activation of L2 alone is not enough for a consonant to surface, and neither is the activation of t alone:
• petit héros /pətit/ + /eʁo/ → [pə.ti.eʁo] (No L2 on h-aspiré words.)
• Similarly with petit copain: /pətit/ + /kopɛ̃/ → [pə.ti.ko.pɛ̃]

6.2 Sample optimizations

In (6), two liaison consonants on W1 and W2 coalesce with enough activation to surface. In candidate (g) the combined activations of coalescing L1 and L2 result in optimal net Harmony due to faithfulness of 0.34 (MAX reward) − 0.25 (DEP penalty). In candidate (a), in which the liaison consonant does not surface, the net Harmony benefit of 0.06 from ANCHOR-LEFT-V-MORPHEME-σ, combined with a −0.72 penalty for an ONSET violation, yields lower Harmony than the liaison candidate.

In (7), a single liaison consonant on W2 does not have enough activation to surface. Parsing the liaison consonant (in coda as in candidate (b) or onset as in (c)) incurs negative net Harmony with respect to faithfulness. Because of the partially activated liaison consonant at the left edge of /{t, z, n}2 ami/, there is a gradient morpheme boundary to the left of the /a/ in ami, which results in candidates (a) and (b) receiving a partial reward from ANCHOR-L-V. (See §8.2 for further discussion of gradient morpheme boundaries.)

In (8), a single liaison consonant on W1 does not have enough activation to surface. Parsing it (candidate (b)) would violate NOCODA; this results in a greater Harmony penalty than the reward from ANCHOR-RIGHT-C-MORPHEME-σ, making this less harmonic than the candidate without liaison (a).

In (9), a single liaison consonant on W1 does not have enough activation to surface as either coda (candidate (b)) or onset (candidate (c)). While parsing the liaison consonant as an onset avoids a penalty from ONSET, candidate (a) reaps the full benefit of satisfying ANCHOR-LEFT-V-MORPHEME-σ; the h-aspiré word has no liaison consonant at its left edge. (See §8 below for further elaboration.)

In the following sections, we move beyond these core cases to consider liaison's interactions with other aspects of French (morpho)phonology.
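The comparison described for (6) can be checked with a few lines of arithmetic. This sketch uses only the four Harmony contributions quoted in the text; any further constraints in the tableau (for instance UNIFORMITY on the coalesced consonant) are not included, so the totals are partial. The helper name `net_harmony` is hypothetical.

```python
def net_harmony(rewards, penalties):
    """Sum of Harmony rewards minus the sum of Harmony penalties."""
    return sum(rewards) - sum(penalties)


# Candidate (g): liaison consonant surfaces (MAX reward 0.34, DEP penalty 0.25).
liaison_partial = net_harmony(rewards=[0.34], penalties=[0.25])

# Candidate (a): no liaison (ANCHOR-LEFT-V reward 0.06, ONSET penalty 0.72).
no_liaison_partial = net_harmony(rewards=[0.06], penalties=[0.72])

print(f"(g) {liaison_partial:+.2f}   (a) {no_liaison_partial:+.2f}")
```

On these quoted contributions alone, the liaison candidate is ahead by about 0.75 units of Harmony.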
7 Liaison and other morphemes at the W1 W2 juncture

7.1 Morphemes as pure activation: Masculine/feminine alternations

We propose that graded activation of underlying material can exist not only at the level of segments, but also at the level of morphemes (Faust & Smolensky, 2017) and features (Rosen, 2016), so that a morpheme may contain no melodic material at all and yet carry activation. In particular, we propose that in French, the feminine gender morpheme is represented underlyingly by pure activation, whose value as learned by the algorithm was 0.35. We use this to account for alternations between masculine and feminine forms such as petit/petite, where the liaison /t/ always surfaces in the feminine. The following tableaux show how the learned input activation ensures that the liaison /t/ surfaces in feminine forms even without the support of a following word with an initial L consonant. At the same time, the activation of the feminine morpheme is not sufficient to incorrectly cause an L2 consonant to surface in jolie amie.

Even though the algorithm gave the plural morpheme a lower activation than L1 liaison consonants, it surfaces in preference to an L1 because the plural is rewarded by ANCHOR-LEFT-C-MORPHEME-σ. For candidate (a) in (12) below, z_pl is at the left edge of a morpheme in the input and the left edge of a syllable in the output. This cannot be the case for an L1 consonant (more strongly activated in the input than an L2) in candidates (d), (e), or (f), which reap no such reward. Producing both the plural and liaison consonants (candidate (b)) is not optimal as it requires splitting the liaison consonant /t2, z2, n2/ into two output locations (see note 11).

ANCHOR and graded activation
8.1 h-aspiré and enchaînement

As seen above, EDGAR found a strong weight for ANCHOR-LEFT-V-MORPHEME-σ. By rewarding candidates where the morpheme-initial vowel is also syllable-initial, this constraint accounts for the tendency for h-aspiré words to maintain a syllable boundary at their left edge. Preservation of syllable boundaries accounts for the lack of "enchaînement" (resyllabification) in word pairs like quel hasard 'what chance' (see note 12). The anchoring constraint rewards maintenance of the syllable boundary at the left edge of hasard in spite of the preceding coda consonant.

10 Note that we assume petits amis is a frequent phrase and therefore reduce the weight of UNIFORMITY in this tableau. Additionally, for a feminine form petites amies, we assume the linear order of morphemes to be /pətit1/ + /fem/ + /z_pl/; the activation of /fem/ combines with t1 as in the examples reviewed in the preceding section.

11 An additional candidate that would have /z_pl/ surface on W2 is not a contender. While this would earn a 0.14 Harmony reward from ANCHOR-RIGHT-C-MORPHEME-σ, the net contribution of MAX and DEP would be −0.13, and the violation of NOCODA (not shown in (12)) would decrease Harmony by 0.34.

12 Côté (2008) claims that enchaînement is optionally possible for these words.

8.2 h-aspiré and schwa deletion

When a segment at a morpheme edge is partially active (partially present, so to speak), where is the corresponding morpheme boundary? We propose that the boundary is, in effect, split, partially existing on each side of the segment. If the activation of the right-edge segment is a, then there is in effect a (right) boundary with activity a following it, and a (right) boundary with activity (1 − a) preceding it; mutatis mutandis for the left edge.
When, as in the case of liaison, there are n segments with activation a at the left edge, each of these segments is associated with a (left) boundary of activity a to its left; there is then a (left) boundary with activity (1 − n · a) following the n segments; mutatis mutandis for the right edge (see note 13). Showing the effective boundary activity via superscripts, we have for example:

The Harmony rewards from ANCHOR constraints are proportional to the activity of the relevant boundary.
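The boundary-splitting proposal can be made concrete with a small sketch. It assumes, as stated above, n left-edge segments that each carry activation a, and an ANCHOR reward equal to the constraint weight times the activity of the relevant boundary. The function names, the activation 0.1, and the weight 0.5 are hypothetical values used only for illustration.

```python
def left_edge_boundaries(n_segments, activation):
    """Split a left morpheme boundary around n partially active edge segments.

    Returns (activity of the boundary to the left of each partial segment,
             activity of the residual boundary that follows the n segments).
    """
    residual = 1.0 - n_segments * activation
    if residual < 0.0:
        raise ValueError("total edge activation exceeds 1")
    return activation, residual


def anchor_reward(w_anchor, boundary_activity):
    """ANCHOR reward: constraint weight times the activity of the relevant boundary."""
    return w_anchor * boundary_activity


W_ANCHOR = 0.5  # hypothetical weight

# A word with a three-way liaison blend {t, z, n} at its left edge.
_, vowel_boundary_blend = left_edge_boundaries(3, 0.1)
# An h-aspiré word with no liaison consonants at its left edge.
_, vowel_boundary_h_aspire = left_edge_boundaries(0, 0.0)

print(anchor_reward(W_ANCHOR, vowel_boundary_blend),
      anchor_reward(W_ANCHOR, vowel_boundary_h_aspire))
```

Under these assumptions the word-initial vowel of the h-aspiré word receives the full left-anchoring reward, while the vowel that follows a liaison blend receives only a partial one.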
The following tableaux illustrate how this accounts for optional schwa deletion for h-aspiré forms such as une hache. Similar to the enchaînement cases above, and in contrast to consonant-initial morphemes (see below), ANCHOR-LEFT-V-MORPHEME-σ rewards candidates that avoid resyllabification of the preceding consonant (candidates (b) and (c)). The candidate that maintains the schwa (b) is supported by ANCHOR-RIGHT-V-MORPHEME-σ, modulated by the input activation of the schwa. Alternatively, if the schwa does not surface (c), there is enough reward from ANCHOR-RIGHT-C-MORPHEME-σ (modulated by the gradiently activated morpheme boundary, 1 − a_schwa) to syllabify along the morpheme boundary. The extremely close Harmony values (differing by < 0.03) of these two candidates lead to optional variation between deletion and maintenance of the schwa (see note 14).

In contrast, for other lexical items schwa deletion occurs. When W2 is consonant-initial, the high-ranked ANCHOR-LEFT-V-MORPHEME-σ constraint is inactive. This is illustrated in (16); inactive ANCHOR is omitted from the tableau. Without the *C.V penalty for (15c), the net negative Harmony contributions from faithfulness to the schwa (MAX 0.25 − DEP 0.34) now block forms with schwa, (16b-c), from surfacing.

13 An anonymous reviewer suggests that "the model crucially relies on morphological boundaries to be phonological elements." The reviewer assumes that such boundaries necessarily need to be phonological objects in order to be gradient. Our response is that there is no reason that morphology cannot have gradient properties independent of the phonology. See, for example, Hay & Baayen (2005) for an argument in favour of gradient structure in morphology.

14 In the GSC framework, the probability of a candidate is proportional to the exponential of its Harmony. If two candidates differ only slightly in Harmony, their probabilities will be similar and we predict optionality.
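Note 14 can be illustrated numerically. The sketch below normalizes the exponentials of the candidates' Harmonies into probabilities, following the statement that probability is proportional to the exponential of Harmony. The two Harmony values are hypothetical, chosen only so that they differ by 0.03, and no temperature or other scaling parameter is assumed.

```python
import math


def candidate_probabilities(harmonies):
    """Probabilities proportional to exp(Harmony), normalized over the candidate set."""
    top = max(harmonies.values())                  # subtract the max for numerical stability
    scores = {name: math.exp(h - top) for name, h in harmonies.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


# Hypothetical Harmonies for the two competing candidates, 0.03 apart.
probs = candidate_probabilities({"schwa kept (b)": 0.00, "schwa deleted (c)": -0.03})
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

A Harmony difference of this size leaves the two candidates close to equiprobable, which is the sense in which near-ties predict optionality.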
In their group 3 are words such as hiéroglyphe and hiatus that are intermediate in their behaviour between alignant h-aspiré words and non-alignant words that are orthographically h-initial. Our account can explain this behaviour if these words have some activation of liaison consonants at the left edge but less than the activation of regular vowel-initial words. The following shows how the variable behaviour with le could occur for hiatus. First, consider tableaux for non-alignant l'ami and alignant le héros.

Because of the underlying liaison consonants on amis that we take to be absent in héros, allowing the leftmost vowel of W2 to surface at a left syllable edge does better with respect to (gradiently assessed) left V-anchoring on héros than on amis: the morpheme boundary is gradiently present at the left edge of the vowel in amis and fully present in héros. Intermediate degrees of alignancy can be readily produced by manipulating the degree of activation of liaison consonants. For example, suppose that the blended liaison consonants in hiatus have lower activations than those in amis. As shown in (20), under these conditions, the Harmony of the candidate in which schwa surfaces (a) is nearly equal to the Harmony of a candidate in which it does not (b), predicting greater variation than observed in the preceding cases (see note 16).

In summary, the gradient effects of word 2 can be explained by the degree of input activation on the blend of liaison consonants at the left edge. Words whose behaviour is intermediate between canonical h-aspiré and other vowel-initial words will have a lower degree of activation on that blend of consonants than do regular vowel-initial words, but a higher degree than canonical h-aspiré words.

15 Because these data depend on written corpora, where word 1 varies orthographically depending on whether the word 2 that follows is alignant, we have no way of knowing from their data, or from other web searches, how these words behave with adjectives such as petit.

16 Following note 14.
Note that the learned activation of schwa on le was different from the activation of schwa on une.

Gradient activation and processing-based variability in liaison

Graded underlying activation is not only lexically specified: it can also vary across processing contexts. Kilbourn-Ceron (2017a,b) documents variability in liaison in spontaneous and read speech from French, finding that liaison is more likely to occur when W1 or W2 is frequent, and when the conditional probability of W2 given W1 is high (see also Côté (2011)). She attributes the impact of W2 frequency and conditional probability to the Locality of Production Planning Hypothesis (see Kilbourn-Ceron for further discussion of W1 frequency). This hypothesis claims that variation in liaison (and other external sandhi processes) is due in part to the ease of retrieval and encoding of phonological material participating in liaison. For example, if W2 is hard to retrieve (due to its low frequency or low predictability), speakers may begin planning and speaking W1 before they have retrieved W2's phonological structure. In this case, W1 is planned as if W2 were not present, and W2 is therefore unable to trigger liaison. This situation becomes less and less likely as W2 becomes easier to retrieve.
In GSC, we can accommodate the effects of W2 frequency and conditional probability by assuming that the activation of phonological structure is affected by retrieval processes. Difficult-to-retrieve structure will have activation that is lower than the maximum value specified in its underlying lexical representation. This processing-based variation will therefore have the same effect found in other cases where there is lexical variation in activation, as discussed previously in this section: less active material will be less likely to trigger liaison.

Conclusions
In this work, we have introduced EDGAR, a learning algorithm that can successfully find weights and underlying activations specifying a GSC grammar. This learned grammar successfully accounts for complex interactions between liaison and other processes in French morphophonology. Liaison consonants interact with other gradiently activated morphemes at the W1 W2 juncture. The graded activation of underlying material, coupled with ANCHOR constraints sensitive to this graded activation, accounts for variation in the alignancy of different morphemes, including variation in enchaînement and schwa deletion.
We see liaison as providing a paradigmatic example in support of a hypothesis concerning a much broader phenomenon in linguistic theory: long-standing impasses in adequately explaining a set of related linguistic phenomena can be resolved through a gradient blend of two seemingly conflicting approaches. While French liaison is a complex phenomenon that we have by no means fully accounted for, the success of this analysis arguably adds weight to the proposal that gradient symbol structures play an important role in linguistic cognition.
10 Appendix: range of the examples fed to EDGAR