Gradient Symbolic Computation (Smolensky & Goldrick, 2016) does derive A’ingae stress patterns

This paper refutes the claim made by Dąbkowski (2021) that stress patterns in the A'ingae language that he documents and analyses are only explainable by co-phonologies and not by what he refers to as 'representational' frameworks such as Gradient Symbolic Computation (Smolensky & Goldrick, 2016) (henceforth GSC). This issue is important because it bears upon the question of what kinds of frameworks can or cannot account for observed patterns in phonology.


Introduction
This paper refutes the claim made by Dąbkowski (2021) that stress patterns in the A'ingae language that he documents and analyses are only explainable by co-phonologies and not by what he refers to as 'representational' frameworks such as Gradient Symbolic Computation (Smolensky & Goldrick, 2016) (henceforth GSC). This issue is important because it bears upon the question of what kinds of frameworks can or cannot account for observed patterns in phonology.
On page 637 Dąbkowski (2021) says that the kinds of representations used in GSC "are insufficient to capture A'ingae stress deletion triggered by dominant stressless suffixes," and on page 647 he claims that the behaviour of these suffixes is "process-like and thus cannot be captured in a representational fashion." I argue that the behaviour of A'ingae suffixes appears process-like only when one assumes a serial approach in the first place, and that when one takes a parallel view, as in standard OT, GSC or Max-Ent grammars, this suffix behaviour does not need to be viewed sequentially at all. His claim on page 647 that "dominant stressless suffixes cause the deletion of preceding stress" is only true if one views stress assignment in A'ingae as occurring in sequential steps, which is not the only possible approach. The surface form of verbs that contain these suffixes is in fact not stressless at all: stress occurs either on the penultimate syllable, if this is the only suffix, or else on other loci depending on the presence of other co-occurring suffixes. As I shall show, there exists a learnable, parallel account of all the stress patterns in A'ingae given by Dąbkowski, in which surface stress patterns depend on the complex interaction of suffixes with gradient input activations as mediated by weighted constraints that determine stress.

Interaction of stems and suffixes in A'ingae
Dąbkowski describes two types of stems and four types of verbal suffixes in A'ingae, where the suffixes vary along two dimensions: dominant vs. recessive and stressless vs. prestressing. I shall maintain the labels used by Dąbkowski, even though they may not be descriptively accurate in a parallel view, where, for example, 'stressless' does not mean that an output form with no stress results. So-called stressless stems surface with stress on the penultimate syllable if there are no suffixes, or else in a position that is determined by the combination of suffixes that occur. Stressed stems surface with stress in a position that results from the competition between the effects of the stem and possible effects of suffixes. Dominant suffixes have a stronger effect on the locus of stress than recessive suffixes, where there is competition between stem type and the types of the suffixes that occur. When there is either one suffix or no suffixes, the location of stress is shown in the table below, where stressless stems have no effect on the locus of stress, which is determined by suffix type: penult stress for 'stressless' suffixes and pre-suffix stress for prestressing suffixes. When the stem is 'stressed', word-initial stress occurs unless the suffix is one of the two dominant types, where dominant stressless results in penult stress and dominant prestressing in pre-suffix stress. When there is more than one suffix, the competition is more complex, with effects to be shown below. All examples are from Dąbkowski.
(1) [Table: locus of stress by stem type and suffix type, for words with one suffix or no suffixes]

When there is more than one suffix, Dąbkowski observes that stressless suffixes always precede and never follow prestressing suffixes. If there is more than one recessive prestressing suffix, in conditions where such suffixes can prestress, it is the leftmost such suffix that prestresses. Recessive prestressing suffixes have an effect on stress if (a) the stem is 'stressless' or (b) there is a co-occurring dominant stressless suffix. In this latter situation, Dąbkowski takes a serial approach in which the dominant stressless suffix first deletes stress and then, in a later cycle, the recessive prestressing suffix prestresses. In the parallel approach advanced here, the dominant stressless suffix acts as a catalyst by boosting the prestressing effect of a co-occurring recessive prestressing suffix. If a dominant prestressing suffix co-occurs with any recessive prestressing suffixes, the dominant prestressing suffix is the one that prestresses. Dąbkowski observes that the two dominant prestressing suffixes in the language never occur together.

Gradient Symbolic Computation
Before showing how a GSC analysis can cover all the A'ingae stress patterns presented by Dąbkowski, in this section I introduce the GSC framework. As described in Smolensky et al. (2019), "GSC is a cognitive architecture that unifies symbolic and neural-network computation." Symbol structures are represented as vectors and knowledge is represented through weighted constraints in a Harmonic Grammar. This framework constitutes a research program in which outputs are derived from gradient representations in phonology, syntax and semantics (Cho et al., 2017; Faust & Smolensky, 2017; Faust, 2017; Goldrick et al., 2016; Hsu, 2018; Müller, 2017; Rosen, 2016, 2018a,b, 2019; Smolensky et al., 2014; Smolensky & Goldrick, 2016; van Hell et al., 2016; Zimmermann, 2017a,b, 2018). See also Hsu (2022) and references cited there.
GSC, like other versions of Harmonic Grammar, has weighted constraints. For example, the constraint MAX (which provides Harmonic reward for an input that surfaces) could have weight 0.6. In addition to weighted constraints, GSC allows gradiently activated representations. For example, an input Foot edge (elaborated on below) in the underlying representation could have activation 0.88. Because the output patterns that we are deriving are discrete representations, we consider only outputs whose activations are either 0 or 1. Specific examples of these constraints are given in the analysis below.
Candidates are evaluated for their Harmony, measured through the effect of constraints on possible outputs and on input-output correspondence. These constraints either penalize or reward structure by assigning negative or positive Harmony, respectively.
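To make the interaction of weights and activations concrete, the following is a minimal sketch of how a single gradiently activated input element might contribute to Harmony. It assumes that a MAX reward is the constraint weight scaled by the input activation and that a DEP penalty is the weight scaled by the deficit between the input activation and full activation (1.0). The MAX weight 0.6 and activation 0.88 are the illustrative values from the text; the DEP weight and the function names are hypothetical.

```python
def max_reward(weight, activation):
    """Reward for an input element that surfaces, scaled by its input activation."""
    return weight * activation


def dep_penalty(weight, activation):
    """Penalty for a fully active output element, scaled by the activation
    that the output had to add to the input (the deficit from 1.0)."""
    return -weight * (1.0 - activation)


max_weight = 0.6        # illustrative MAX weight from the text
edge_activation = 0.88  # illustrative input Foot edge activation from the text
dep_weight = 0.5        # hypothetical DEP weight, chosen only for this sketch

# Candidate in which the Foot edge surfaces: MAX reward plus DEP penalty.
harmony_surfacing = (max_reward(max_weight, edge_activation)
                     + dep_penalty(dep_weight, edge_activation))

# Candidate in which the Foot edge does not surface: no reward, no penalty.
harmony_absent = 0.0

print(round(harmony_surfacing, 3), harmony_absent)
```

Under these assumed values the candidate in which the edge surfaces has the higher Harmony; a weaker input activation would shrink the reward and enlarge the penalty at the same time.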

Da ˛bkowski's hypothetical GSC analysis
Dąbkowski presents a hypothetical GSC analysis of A'ingae stress which he claims is incapable of explaining the behaviour of dominant 'stressless' suffixes, which he places lowest on a 'preference hierarchy' of stress-affecting suffixes. His argument is based on an apparent contradiction in the behaviour of these suffixes, which appear strong in that they dominate the effect of stems or of recessive suffixes but weak in that they "do not have any preference for stress assignment" and that "they also have the property of deleting preexisting stress, which is not captured by ranking them with respect to other suffixes or assigning them metrical structure of some intermediate degree of activation" (p. 640). Here, he is assuming a serial view, where dominant stressless suffixes first delete stress and then stress is assigned in a later cycle. In an analysis in parallel GSC, I shall show that this suffix does not destress but activates stress through other means and therefore does not need to be given an especially weak input activation.

How a GSC analysis can work
Like Dąbkowski, I take stress in A'ingae to occur on the left syllable of a head trochaic Foot, with Foot edges as underlying representations that contribute to determining stress (Yates, 2017). I posit that stressed stems have left Foot edges and prestressing suffixes right Foot edges in their underlying representations. Because of their more complex behaviour, I posit that dominant stressless suffixes have left and right Foot edges in their underlying representations.
To account for the way different combinations of stems and affixes affect stress, I posit two kinds of MAX and DEP constraints: first, MAX and DEP Path constraints on Foot edges that care about location and are sensitive to whether or not the input and output forms of a Foot edge occur in the same position with respect to the segmental melody of the word; second, MAX and DEP non-Path constraints on Foot edges that do not care about location. To prevent such constraints from allowing input Foot edges on a suffix to migrate to the stem in the output, or vice versa, I also posit the following two strongly weighted constraints that prevent Foot edge migration across a stem-suffix boundary. One way to formulate the first of these is to say that no material that is not part of the stem may intervene between phonological elements that are part of the stem.
(2) STEMCONTIGUITY: "∀x, y and z, phonological objects in the output such that x ≺ y ≺ z and x and z are members of the stem, y must be a member of the stem."
This constraint is violated if a Foot edge that originates on a suffix surfaces inside a stem. It is also violated if a Foot edge that originates on a stem surfaces to the right of the left edge of the leftmost suffix. In addition, the following constraint prevents the input right edge of an affix from forming the right edge of a Foot in a stem.
(3) CRISPFOOTEDGESTEM: "If the rightmost element of a stem (Foot edge or segmental melody) in the output has an input correspondent, that input correspondent is a member of the stem."
I take the above two constraints to be strongly weighted such that they are not violated.
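The definition of STEMCONTIGUITY in (2) can be read as a simple check over the linear order of output elements. The sketch below is one possible rendering, assuming that each output element (segmental material or Foot edge) is tagged with whether it belongs to the stem; the data representation, labels and function name are hypothetical and are not part of the analysis itself.

```python
def violates_stem_contiguity(output):
    """Return True if a non-stem element lies between two stem elements.

    `output` is a list of (label, in_stem) pairs in left-to-right order,
    where `in_stem` records the morphological affiliation of the element.
    """
    stem_positions = [i for i, (_, in_stem) in enumerate(output) if in_stem]
    if len(stem_positions) < 2:
        return False  # nothing can intervene between fewer than two stem elements
    first, last = stem_positions[0], stem_positions[-1]
    return any(not in_stem for _, in_stem in output[first:last + 1])


# A suffix-affiliated right Foot edge surfacing inside a disyllabic stem.
edge_inside_stem = [("stem-syll-1", True), (")suffix-edge", False),
                    ("stem-syll-2", True), ("suffix-syll", False)]

# The same edge surfacing after the last stem element.
edge_after_stem = [("stem-syll-1", True), ("stem-syll-2", True),
                   (")suffix-edge", False), ("suffix-syll", False)]

print(violates_stem_contiguity(edge_inside_stem))
print(violates_stem_contiguity(edge_after_stem))
```

The second configuration passes this check even though a suffix edge closes a Foot at the end of the stem, which is the situation that CRISPFOOTEDGESTEM in (3) is meant to exclude.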
To account for the fact that among multiple recessive prestressing suffixes, it is the leftmost one that prestresses, I propose a constraint, following Hyde (2012), that localizes Anchor constraints as Hyde does for alignment constraints. ANCHORRTFTEDGE (labeled Φ_R) checks whether a right syllable edge aligned with a right Foot edge in the input is so aligned in the output, but applies only within a Foot in the output. If an input syllable σ_i with right Foot edge )_Φj surfaces in Φ_j but is not anchored to its right edge, Φ_R penalizes by the input edge activation times the constraint weight, but if σ_i surfaces outside a Foot, there is no violation. For ease of measurement, this constraint is evaluated negatively.
(4) ANCHORRTFTEDGE: "If the right edge of syllable σ_i is aligned with the right edge of Foot Φ_j in the input, then if σ′_i occurs in a Foot in the output, the right edge of σ′_i must align with the right edge of Φ′_j in the output, where σ_i corresponds to σ′_i and Φ_j corresponds to Φ′_j."
To account for the cases where stress surfaces in the penultimate syllable, as does Dąbkowski, I posit the following constraint:
(5) ALIGNHEADFTRIGHT: "The right edge of the head Foot of the word is aligned with the right edge of the word."
This constraint receives positive Harmonic reward when it is satisfied.
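The conditional character of ANCHORRTFTEDGE in (4) can be sketched as a small function: a penalty applies only when the syllable surfaces inside a Foot without closing it, and the penalty is the (negative) constraint weight times the input activation of the right Foot edge. The weight −73.1 and the activation 0.12 are the values quoted later in the text for this constraint and for recessive prestressing suffixes; the function name and boolean arguments are hypothetical.

```python
def anchor_rt_ft_edge(weight, activation, in_output_foot, at_foot_right_edge):
    """Harmony contribution of ANCHORRTFTEDGE for one input syllable
    whose right edge is aligned with a right Foot edge in the input.

    weight:             constraint weight (negative, since it penalizes)
    activation:         input activation of the syllable's right Foot edge
    in_output_foot:     does the syllable surface inside a Foot?
    at_foot_right_edge: if so, is its right edge the Foot's right edge?
    """
    if in_output_foot and not at_foot_right_edge:
        return weight * activation
    return 0.0  # outside any Foot, or correctly anchored: no violation


weight = -73.1     # learned weight quoted in the text
activation = 0.12  # learned right-edge activation of a recessive prestressing suffix

print(anchor_rt_ft_edge(weight, activation, True, False))   # inside a Foot, misanchored
print(anchor_rt_ft_edge(weight, activation, True, True))    # anchored to the Foot's right edge
print(anchor_rt_ft_edge(weight, activation, False, False))  # outside any Foot
```

Because the constraint is vacuous outside a Foot, only the suffix that ends up Foot-internal is at risk of the penalty, which is what lets the constraint single out the leftmost recessive prestressing suffix.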
To account for the fact that initial stress can only occur when the stem is underlyingly stressed, I posit an anchoring constraint:
(6) DEPANCHORSTEMLEFT: "If a Foot occurs at the left edge of the stem in the output, there must be a corresponding left Foot edge at the left edge of the stem in the input."

Learning constraint weights and activations
In order to show how A'ingae stress patterns can be explained by a GSC analysis, I ran a learning algorithm on 25 examples that cover all the combinations of stem and suffix types that appear in Dąbkowski (2021). 14 were used in training and 10 were held back to test on with the weights and activations learned by the 14 training examples.
Two different learning methods were tested. The first was the Error-Driven Gradient Activation Readjustment algorithm of Smolensky et al. (2019). For this method, a certain amount of supervised initialization of values was done. For each example in each training epoch, candidates with each possible stress locus were considered and the Harmony of each candidate was calculated according to constraint weights and input activations. If the wrong candidate of a training example received the highest Harmony, weights and activations were each adjusted in a direction that favoured the intended winner. After 14 epochs, weights and activations were found that correctly predicted all training and test examples.
Because this method resulted in a narrow Harmonic margin between the desired winner and its runner-up candidate in a few examples, such as the one in (8) below, a second learning method was also tested. In this case, weights and activations were randomly initialized as PyTorch parameters and, for each example, Harmonies for each candidate were calculated and cross-entropy loss between the Harmonies of the predicted and target candidates was used to adjust weights and activations with the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.03. A sigmoid function was applied to activations to keep them between 0 and 1, and the absolute values of constraint weights were exponentiated to keep both positive and negative constraints on the correct side of 0. With this method, it took an average of 30 epochs to learn parameters that correctly predicted all training and test examples. It was found that by extending the epochs to 100, Harmonic margins improved: when the Harmonies of candidates were converted to probabilities by softmaxing, the probability of the winning target candidate was in most cases 100% and in the worst case 89%, for one example. The tableaux shown below use weights and activations that were learned by this second method. In tableaux (8) and (9) we shall see that the Harmonic margin for the winning candidate is large enough that the predicted probability of the winner is 100%. To achieve maximum generality over the examples, in the tableaux below, stems and affixes are represented by type as A, U, D, N, R and S, as shown below in (7).
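The relationship between Harmonies, softmax probabilities and cross-entropy loss described above can be illustrated with a small dependency-free sketch. This is not the training code used for the analysis: it omits the optimizer entirely, the Harmony values are hypothetical, and the `signed_weight` helper reflects only one possible reading of how exponentiation keeps a weight on the intended side of 0.

```python
import math


def sigmoid(x):
    """Squash an unconstrained parameter into (0, 1), as for an activation."""
    return 1.0 / (1.0 + math.exp(-x))


def signed_weight(raw, sign):
    """One possible reading of the weight parameterization (an assumption):
    exponentiate a free parameter and attach a fixed sign, so that reward
    constraints stay positive and penalty constraints stay negative."""
    return sign * math.exp(raw)


def softmax(harmonies):
    """Convert candidate Harmonies to probabilities."""
    top = max(harmonies)  # subtract the maximum for numerical stability
    exps = [math.exp(h - top) for h in harmonies]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(harmonies, target_index):
    """Negative log probability of the intended winner."""
    return -math.log(softmax(harmonies)[target_index])


# Hypothetical Harmonies for three candidates; candidate 1 is the target.
harmonies = [2.0, 10.0, 1.0]
probabilities = softmax(harmonies)
loss = cross_entropy(harmonies, target_index=1)

print([round(p, 4) for p in probabilities], round(loss, 4))
```

A wide Harmonic margin drives the winner's probability toward 1 and the loss toward 0, which is why continued training beyond the point of correct prediction can still widen the margins. In an actual PyTorch setup, the loss would typically be computed with a built-in such as `torch.nn.functional.cross_entropy` applied to the vector of candidate Harmonies.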
(7) Learned input activations on Foot edges (none on N = recessive stressless suffix or on U = stressless stem)

Morpheme type                        Left edge   Right edge
A (stressed stem)                    0.74        -
R (recessive prestressing suffix)    -           0.12
S (dominant stressless suffix)       0.91        0.28
D (dominant prestressing suffix)     -           0.98

A crucial case that Dąbkowski considered problematic for GSC
After a stressed stem, a dominant stressless suffix allows a recessive prestressing suffix to prestress where it would not without the help of the stressless suffix. For Dąbkowski, the stressless suffix feeds the recessive prestressing suffix. In parallel GSC, the stressless suffix boosts the prestressing effect of the recessive prestressing suffix ((8) below) by increasing the reward of non-path MAX-FOOT-EDGE constraints and reducing the penalty of non-path DEP-FOOT-EDGE constraints.
Subscript labels on a Foot edge of a candidate indicate its input source. Arrows indicate that it migrated from a different position; an underscore, that it has no input correspondent.
MAX-LEFT-FT-EDGE ('Max(') rewards the surfacing of input (_A (candidates (a) and (b)) and input (_S (candidates (c), (d) and (e)). MAX-PATH-LEFT-FT-EDGE rewards the surfacing of input (_A in the same location (candidate (a)) and input (_S in the same location (candidate (c)). The two left-edge DEP constraints penalize Harmony by the deficit between input and full activation. In candidates (b), (d) and (e) the deficit is a full 1.0 for the DEP-PATH constraint, since there is no input left edge at the same locus as where it surfaces. For the MAX and DEP right edge constraints, in candidates (b), (c) and (d) it is most Harmonic for the input right edge on the stressless suffix S to surface rather than the edge on R, because of S's higher activation. Assuming a high weight on CRISPFOOTEDGESTEM, we do not consider a candidate that is like (a) but in which the input source of the right edge of the Foot is the right edge of S. Such a candidate would incur a penalty from CRISPFOOTEDGESTEM because the Foot edge at the right edge of the stem does not have an input correspondent that is part of the stem.
Candidate (c) violates ANCHORRTFTEDGE because S occurs in a Foot in the output and the right edge of S aligns with a Foot in the input but is misaligned with the same Foot edge in the output. Candidate (d) does similarly for R but with less penalty because of the lower input activation on the recessive suffix R.
Candidate (e) is optimal because the input left Foot edge on S is able to migrate to the left of N, gaining Harmonic reward for MAXLEFTFOOTEDGE and reducing Harmonic penalty for DEPLEFTFOOTEDGE.
In tableau (9) below, the lack of a stressless suffix in the input means that there is no available boost to the activation of the Foot edges on candidate (d). Shaded cells in the two tableaux highlight the crucial differences in Harmonies between the two examples.

Some other key derivations
This section presents, with tableaux, derivations of some other key stem-affix combinations that show how a GSC analysis can correctly predict the examples given by Dąbkowski.

Leftmost recessive prestressing suffix wins
As observed by Dąbkowski, dominant prestressing suffixes never occur multiple times in a word, so they cannot compete with each other for stress assignment, but recessive prestressing suffixes can compete and it is always the leftmost one that determines stress. The following example illustrates how the constraint ANCHORRTFTEDGE determines stress in such a case. Even though candidate (c) has a reward for having a right-aligned head Foot, candidate (b), in which the leftmost recessive prestressing suffix prestresses, wins because candidate (c) violates ANCHORRTFTEDGE. The leftmost recessive suffix occurs inside a Foot in the output but its input right edge either does not surface or, if it does, does not do so in the same location as in the input. Its Harmonic penalty is the weight of the constraint (−73.1) times the input activation (0.12) of the right Foot edge on the recessive suffix that fails to surface. In candidate (b), the rightmost recessive suffix does not surface inside a Foot, so it is not susceptible to a violation of this constraint. For ease of exposition, we do not show the constraint DEP-ANCHOR-STEM-LEFT, which is also violated by candidate (a), for which the left edge of the Foot in the output has no input correspondent.

Recessive prestressing suffix with stressed stem
The input activation on a recessive prestressing suffix is too weak to overcome the effect of a stressed stem. Candidate (a) fares better than the other two candidates with respect to Faithfulness to a left Foot edge by a greater margin than candidate (b)'s Harmonic advantage from having Faithfulness to a right Foot edge.

Dominant prestressing beats dominant destressing
Here DD represents a single bisyllabic dominant prestressing suffix. Even if its second syllable had an underlying right Foot edge, as shown below, candidate (e) is sub-optimal.
Candidate (c) violates ANCHORRTFTEDGE because S surfaces within a Foot but its input right Foot edge does not surface. Candidate (e) also violates this constraint because the first syllable of D occurs in a Foot in the output but its input right Foot edge does not surface at the same locus. The stressed stem does not retain its underlying stress partly because of the heavy DEP penalty on the right Foot edge of candidate (a). Even if the stem were monosyllabic, with candidate (a) as (AS)NDD, which would somewhat mitigate the DEPRTFOOTEDGE penalty through the right Foot edge of S surfacing, it would not be enough of a Harmony boost for this candidate. Candidate (b) is also worse off than (d), mainly because of the weaker input left Foot edge activation on S as compared to D and weaker left edge activation on A as compared to S.

Dominant prestressing with recessive prestressing
A dominant prestressing suffix will always prestress in preference to the effect of any recessive prestressing suffix that is also present. We take a bisyllabic dominant prestressing suffix to have a right Foot edge in the input on its first syllable. Candidate (c), in which the recessive prestressing suffix surfaces within a Foot, incurs a Harmonic penalty for having its input right Foot edge not surface. The penalty is the weight of the constraint times the input activation of the recessive prestressing suffix, i.e., −73.1 × 0.12 ≈ −8.8. This candidate still wins over candidate (b) because the right Foot edge activation on the dominant suffix is higher than on the recessive one, which results in a higher reward for MAX-PATH-RT-FOOT-EDGE and a lower penalty for DEP-PATH-RT-FOOT-EDGE.

Discussion
The relative input activations of morphemes affect their ability to affect stress. For Dąbkowski, the A'ingae data appeared to pose a challenge for GSC because, in his view, dominant stressless suffixes seemed to have no preference for stress assignment. As a result, he considered them to be "located on the lowest rung of the preference hierarchy", resulting for him in an apparent contradiction. This illusory contradiction disappears in parallel GSC, where these suffixes enable stress to occur through their underlying representation and the effects of locus-agnostic MAX and DEP constraints on Foot edges. These results are important because they refute the claim that GSC cannot handle these kinds of 'dominance' effects.