Is Sour Grapes Learnable? A Computational and Experimental Approach

In this paper, I present results from simulations using three different maximum entropy phonotactic models (Hayes & Wilson, 2008; Moreton et al., 2017): one that can only represent Sour Grapes, one that can only represent standard, attested harmony, and one that has the expressive power to capture both patterns. I then present results from an experiment designed to test the predictions of these models and find that humans behave most like the model that can capture both generalizations—challenging the idea that Sour Grapes is categorically unlearnable.


Introduction
Vowel harmony is a typologically common phonological pattern that restricts which kinds of vowels can appear together in a surface form, often causing all of a word's vowels to agree in their value for one or more features (Rose & Walker, 2011). Harmony patterns are also commonly used in artificial language learning studies (see, e.g., Pycha et al., 2003; Finley, 2008; Moreton, 2012; Lai, 2015), with most of these studies comparing the relative learnability of attested and unattested generalizations. One such unattested pattern, and the focus of this paper, is Sour Grapes Harmony (Bakovic, 2000; Wilson, 2006b; Finley, 2008; Lin & Myers, 2010).
Attested harmony patterns often involve blocker segments, which "block" harmony in the sense that different values of the relevant feature are allowed on either side of the blocker (Rose & Walker, 2011: §3.3.3). For example, if a backness harmony pattern were blocked by [a], the form [tipitaku] would be grammatical, despite the fact that [u] and [i] have different values for the feature [back]. Crucially, in a standard pattern like this, all of the vowels to one side of a blocker (e.g., the first two [i]'s in this example) still share the same value for the harmonizing feature.
In Sour Grapes, disharmonic sequences of vowels are generally not allowed in surface forms, just like in attested harmony patterns. However, when a blocker segment is present in a word, otherwise ungrammatical sequences can appear either to its left or right, depending on the kind of Sour Grapes harmony being implemented (Bakovic, 2000; Wilson, 2006b). For example, in a left-to-right Sour Grapes pattern, segments to the left of a blocker can disagree in their value for the relevant feature. So, if backness were harmonized in such a pattern, a form like *[tiputu] would be ungrammatical, but a form like [tiputaku] would be allowed, because, in the latter word, the [a] licenses the vowels to its left to be disharmonic. Sour Grapes is predicted by a number of constraint-based (Prince & Smolensky, 1993) theories of long-distance assimilation (Bakovic, 2000; Wilson, 2006b), with different explanations for its absence from phonological typology. These explanations usually take one of two approaches: they either revise the set of constraints used to capture harmony so that Sour Grapes cannot be represented in the phonological grammar (e.g., Wilson, 2006b; McCarthy, 2011) or focus on the formal complexity of Sour Grapes and work toward a theory of phonology that categorically forbids any pattern at that level of complexity (e.g., Heinz, 2018; Smith & O'Hara, 2019; Lamont, 2019). Crucially, both of these approaches seek to categorically limit phonological grammars so that they lack the expressive power to ever capture Sour Grapes harmony. However, past experimental work has struggled to find evidence that Sour Grapes is categorically unlearnable (Finley, 2008; Lin & Myers, 2010).
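The licensing logic just described can be made concrete with a short sketch. The Python functions below are illustrative only; the function names and the three-vowel [i, u, a] inventory are assumptions based on the examples above, not code from this study. Under left-to-right Sour Grapes, a disharmonic [i]/[u] pair on the vowel tier is tolerated only if an [a] follows it somewhere in the word.

```python
def vowel_tier(word):
    """Project the vowel tier, assuming a three-vowel [i, u, a] inventory."""
    return [seg for seg in word if seg in "iua"]

def sour_grapes_ok(word):
    """Left-to-right Sour Grapes: a disharmonic [i]/[u] pair on the vowel
    tier is grammatical only if an [a] appears somewhere to its right."""
    tier = vowel_tier(word)
    return all({tier[j - 1], tier[j]} != {"i", "u"} or "a" in tier[j:]
               for j in range(1, len(tier)))

def attested_harmony_ok(word):
    """Attested harmony with blocker [a]: no adjacent disharmonic
    [i]/[u] pair on the vowel tier at all."""
    tier = vowel_tier(word)
    return all({tier[j - 1], tier[j]} != {"i", "u"}
               for j in range(1, len(tier)))
```

On the paper's own examples, sour_grapes_ok accepts [tiputaku] but rejects *[tiputu], while attested_harmony_ok rejects [tiputaku] as well, since its [i...u] pair is disharmonic regardless of the following [a].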
In this paper, I present results from simulations using three different maximum entropy phonotactic models (Hayes & Wilson, 2008; Moreton et al., 2017): one that can only represent Sour Grapes, one that can only represent standard, attested harmony, and one that has the expressive power to capture both patterns. I then present results from an experiment designed to test the predictions of these models and find that humans behave most like the model that can capture both generalizations, challenging the idea that Sour Grapes is categorically unlearnable.
The paper proceeds as follows: §2 discusses past theoretical and experimental work on Sour Grapes harmony, §3-6 describe an experiment designed to test the predictions of maximum entropy models that vary in their ability to represent harmony patterns and report its results, and §7 concludes.

Background
2.1 Theoretical Work on Sour Grapes
Several optimality-theoretic (Prince & Smolensky, 1993) approaches to harmony, such as AGREE constraints (Bakovic, 2000; Wilson, 2006b), predict Sour Grapes in systems that include a blocker segment. This is because constraints like AGREE penalize words with blocker segments, regardless of whether vowels to one side of the blocker are harmonized. For example, the words [tiputaku] and [tipitaku] would both incur a violation of AGREE(back): the former because of the [i...u] sequence in the first two syllables and the latter because of the [i...a] sequence in the second and third syllables. This means that both forms would be expected in a language that has a constraint enforcing [a]'s role as a blocker (such as *[-high, -back]) ranked or weighted more highly than AGREE, and AGREE ranked or weighted more highly than a constraint enforcing faithfulness to underlying values of [back]. However, Sour Grapes is an unattested phonological pattern, inspiring a range of research attempting to explain this absence. Wilson (2006b) proposed that the reason for Sour Grapes' unattestedness is that attested harmony patterns are myopic and Sour Grapes fails to meet this criterion. That is, to determine whether a word is grammatical according to attested patterns, one only needs to look at its vowels two at a time, with no need to check pairs of non-adjacent vowels or to look at more than two vowels at once. Since blockers in Sour Grapes license strings of disharmonic vowels of arbitrary length, one needs to look at a potentially infinite window of vowels to determine whether a word is grammatical, to ensure a blocker isn't at its end. Wilson (2006b) suggested that a variant of optimality theory (Prince & Smolensky, 1993) that uses targeted constraints (Wilson, 2001) would solve this issue by only predicting myopic harmony patterns. Targeted constraints do this by spreading the harmonizing feature in a word one vowel at a time and never looking ahead to non-adjacent vowels.
McCarthy (2011) proposed a similar solution within Harmonic Serialism, using SHARE constraints that spread the harmonizing feature one step at a time and so likewise derive only myopic patterns. However, these approaches have been critiqued, since a number of non-myopic phonological patterns do seem to exist (Walker, 2010; Jardine, 2016; McCollum & Essegbey, 2018; Stanton, 2018). Since targeted constraints and SHARE constraints both categorically restrict a grammar so that it can only represent myopic harmony, these patterns are evidence that myopia is not the crucial factor causing Sour Grapes to be unattested.
Another approach for explaining Sour Grapes' absence is Formal Language Theory (FLT; Chomsky, 1956). FLT is a way of describing how complex a pattern is in terms of the computational machinery needed to represent it. The framework was originally designed to demonstrate that natural language syntax is more complex than the set of regular patterns (i.e., those that can be represented using finite-state machines). However, Johnson (1972) showed that all known phonological mappings could be considered regular. Recent work has supported this finding, showing that attested phonological patterns can be characterized as subregular (Heinz, 2018), and suggesting that this is due to a categorical restriction on phonological learning (Heinz, 2010; Heinz & Idsardi, 2013; Jardine & Heinz, 2016).
Specifically, it has been argued that all phonological patterns can be characterized as either Strictly Local or Tier-based Strictly Local (TSL; Heinz et al., 2011). The former level of complexity includes any pattern that bans a finite set of substrings from occurring in a word, while the latter includes any pattern that does so over a tier of segments (i.e., certain classes of segments can be ignored by the pattern). An example of a Strictly Local pattern that commonly occurs in natural language is the typologically common restriction banning voiceless sounds after nasals (henceforth *NC̥ ; Pater, 1999). This pattern is Strictly Local since it bans any word containing a member of the finite set of strings that results from combining all nasals with all voiceless sounds (e.g., [nt], [np], [mt], [mp], etc.). TSL patterns are also common in phonology, and most harmony patterns belong to this region of the subregular hierarchy (Heinz et al., 2011; Heinz, 2018). Figure 1 shows the full Subregular Hierarchy and where each of these two types of patterns is located on it.
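The two banned-substring definitions above can be sketched in a few lines. This is a minimal illustration, assuming two-segment banned substrings and a fixed segment inventory; the function names are mine, not from the formal literature.

```python
def strictly_local_ok(word, banned, k=2):
    """Strictly k-Local grammar: the word is well formed iff it contains
    none of a finite set of banned k-length substrings."""
    return not any(word[j:j + k] in banned for j in range(len(word) - k + 1))

# *NC̥ as a Strictly 2-Local pattern: ban every nasal + voiceless-stop pair.
NASALS, VOICELESS = "nm", "ptk"
NC_BANNED = {n + c for n in NASALS for c in VOICELESS}

def tsl_ok(word, tier_segs, banned, k=2):
    """Tier-based Strictly Local: project the tier (ignoring all other
    segments), then apply the same banned-substring check to it."""
    tier = "".join(seg for seg in word if seg in tier_segs)
    return strictly_local_ok(tier, banned, k)

# Backness harmony with blocker [a], stated as TSL over the vowel tier:
HARMONY_BANNED = {"iu", "ui"}
```

For instance, tsl_ok("tipitaku", "iua", HARMONY_BANNED) is true because the projected tier [iiau] contains no banned bigram, while tsl_ok("tiputu", "iua", HARMONY_BANNED) is false because its tier [iuu] contains [iu].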

Figure 1. The Subregular Hierarchy, with examples of Strictly Local and Tier-based Strictly Local patterns given. Dashed grey lines indicate different orders of logic. From lowest to highest, these are: Conjunction of Negative Literals, Propositional, First Order, and Monadic Second Order. Solid black lines indicate subset relationships.
While Sour Grapes is still a subregular pattern, it does not reside in the same region of the subregular hierarchy as standard, attested harmony patterns (Heinz, 2018; Smith & O'Hara, 2019; Lamont, 2019). This is because all TSL patterns can be defined by a set of banned substrings, but Sour Grapes allows almost any sequence of segments on one side of a blocker. For example, a standard backness harmony pattern in a language with three vowels in its inventory can be captured by banning disharmonic pairs on the vowel tier, but no finite set of banned tier substrings can capture Sour Grapes, since any disharmonic sequence is licensed when a blocker follows it. Lamont (2019) showed that Sour Grapes is located in the non-counting region of the subregular hierarchy, which is more complex than TSL. This is illustrated in Figure 2.

Lamont (2019) also shows that several non-myopic harmony patterns that have been compared to Sour Grapes in the past (e.g., tone in Copperbelt Bemba; Jardine, 2016) belong to less complex regions of the subregular hierarchy. This means that FLT provides a way to categorically differentiate between Sour Grapes and attested harmony patterns. Learning algorithms have been developed that are limited to learning TSL languages (see, e.g., Jardine & Heinz, 2016; Jardine & McMullin, 2017), and any theory that used these would correctly predict a lack of Sour Grapes in phonological typology.
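The intuition behind such TSL-restricted learners can be sketched very simply. The version below is a deliberate simplification that assumes the tier is given in advance and a 2-segment window (real TSL learning algorithms also induce the tier and handle larger windows); the function name is mine. It bans every tier bigram it never observes in the positive data, so trained on data ambiguous between Sour Grapes and attested harmony, which contains no disharmonic tier bigrams at all, it would ban [iu] and [ui] outright and thus predict the attested pattern.

```python
def learn_tsl2(positive_data, tier_segs, alphabet="iua"):
    """Learn a 2-local tier-based grammar from positive data, with the
    tier given in advance: ban every tier bigram never observed."""
    seen = set()
    for word in positive_data:
        tier = [seg for seg in word if seg in tier_segs]
        seen.update(a + b for a, b in zip(tier, tier[1:]))
    every_bigram = {a + b for a in alphabet for b in alphabet}
    return every_bigram - seen  # the induced banned-substring set
```
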

2.2 Past Experimental Approaches to Sour Grapes
The learnability of Sour Grapes relative to attested harmony patterns has also been explored in experimental work; however, neither of the two past studies on the topic was able to explain the pattern's typological absence. Artificial language learning studies usually seek to show that unattested patterns are more difficult to learn or generalize in the lab than their attested counterparts (see Moreton & Pater, 2012a, 2012b for a review of such experiments) and use this difficulty as a way to explain their typological absence. The idea behind such studies is that if a difficult pattern arises diachronically, it might be quickly replaced by an easier one or never properly phonologized at all by the community using the language (Moreton, 2008).
Finley (2008) attempted to train participants on phonological alternations that were ambiguous between Sour Grapes and an attested harmony pattern in an artificial language learning design, to see whether participants generalized as if they had learned the latter. This poverty of the stimulus method has been used effectively to demonstrate a learning bias away from an unattested pattern (e.g., Wilson, 2006a; Finley & Badecker, 2009). However, Finley's (2008) participants were unable to learn either Sour Grapes or the attested harmony, possibly because the combination of blockers and alternations introduced too much complexity into the learning task for participants to handle over the course of a short experiment.
Lin and Myers (2010) also attempted to explain Sour Grapes' absence using an artificial language learning experiment. Specifically, they used nasal harmony patterns that were blocked by [s] and [k]. Their participants were native speakers of Taiwan Southern Min, a language with phonemic contrasts for nasality on both consonants and vowels, and thus were able to properly perceive a novel pattern that spread nasality.
Lin and Myers (2010) trained half of these participants on an attested nasal harmony pattern and the other half on Sour Grapes, then compared the groups' accuracies in a separate testing phase. While they did find a marginal difference between the accuracies in the two conditions, the difference suggested that Sour Grapes was easier to learn than its attested counterpart. Lin and Myers (2010) proposed that Sour Grapes' absence could be due to factors other than learnability, such as limitations introduced by the phonetic origins of harmonic phonological patterns.

Design
To test whether humans have a preference against Sour Grapes patterns, I designed an experiment that combined the two previous attempts to find such a bias (Finley, 2008; Lin & Myers, 2010). That is, like Finley's (2008) study, I trained participants on data that is ambiguous between a Sour Grapes and an attested harmony language (a poverty of the stimulus design). However, following Lin and Myers (2010), I presented participants only with the surface forms of the language, meaning that they were not exposed to any phonological alternations. The goal of this hybridization was to see whether the poverty of the stimulus approach might find evidence of a learning bias against Sour Grapes that more easily explains its typological absence, while presenting participants with a simpler learning scenario than Finley (2008), so that they could succeed in learning the patterns in the first place.
Sour Grapes is more permissive than attested harmony patterns with blockers; that is, the set of surface forms attested harmony allows is always a subset of the ones allowed in a minimally different Sour Grapes pattern. This is because Sour Grapes allows all of the surface forms that are grammatical in attested harmony, as well as words with disharmonic sequences that occur to one side of a blocking segment. This is illustrated in the Venn diagram in Figure 3, for left-to-right backness harmony.
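This subset relationship can be verified exhaustively over short vowel tiers. The sketch below assumes the three-vowel [i, u, a] inventory used in this paper's examples; the checker functions are minimal restatements of the two patterns, not the experiment's actual stimulus-generation code.

```python
from itertools import product

def sg_ok(tier):
    """Left-to-right Sour Grapes: a disharmonic [i]/[u] pair is licensed
    only by an [a] somewhere to its right."""
    return all({tier[j - 1], tier[j]} != {"i", "u"} or "a" in tier[j:]
               for j in range(1, len(tier)))

def ah_ok(tier):
    """Attested harmony with blocker [a]: no adjacent disharmonic pair."""
    return all({tier[j - 1], tier[j]} != {"i", "u"}
               for j in range(1, len(tier)))

# All vowel tiers up to length 4 over the assumed [i, u, a] inventory.
tiers = ["".join(t) for n in range(1, 5) for t in product("iua", repeat=n)]

# Every tier grammatical in attested harmony is also grammatical in
# Sour Grapes, and some tiers are grammatical only in Sour Grapes:
assert all(sg_ok(t) for t in tiers if ah_ok(t))
sg_only = [t for t in tiers if sg_ok(t) and not ah_ok(t)]  # e.g. "iua"
assert sg_only  # non-empty, so the subset is proper
```
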
The experiment I describe here consisted of two parts: a training phase and a testing phase. In the training phase, participants were presented with words that were grammatical in both Sour Grapes and attested harmony. Examples of the stimuli are given in Table 1 and Table 2.

Figure 3. Venn diagram demonstrating the SRs that are grammatical in Sour Grapes and Attested Harmony, when [back] is the feature being spread from left to right. Bulleted items describe the words that are grammatical in each language, with descriptions outside of both circles being banned by both patterns. For examples of each of these types from my experiment, see Table 1 and Table 2.
There were four types of trials in the training phase, each of which was grammatical in both Sour Grapes and attested harmony and conveyed different kinds of information about the pattern being learned. The first trial type was Faithful, which consisted of words that had at most one [i].
The testing phase presented participants with novel words that belonged either to the trial types from training or to two additional trial types: Ungrammatical-Both and Ungrammatical-AH trials. These novel trial types were designed to test whether participants learned a Sour Grapes pattern or an attested harmony pattern from the ambiguous data in training. Words in the Ungrammatical-Both trials would be banned by both Sour Grapes and attested harmony, meaning they contained an [iu] or [ui] pair in their vowel tier without an [a] to the right of the disharmonic sequence. Ungrammatical-AH trials consisted of words that were ungrammatical in attested harmony but allowed by Sour Grapes. These words also contained an [iu] or [ui] sequence in their vowel tier, but there was always an [a] somewhere to the right of it, licensing the disharmonic string. Examples of these additional trial types are shown in Table 2.

Methods
The consonantal templates described in §3 were combined with all relevant vowel sequences to create a pool of possible stimuli. Sets of stimuli were randomly sampled for each participant such that no individual saw the same item more than once in training, but across individuals, words could be reused. A broad transcription for each stimulus was fed through the text-to-speech software, Tacotron 2 (Shen et al., 2018) to convert the transcriptions into .wav files. These audio files were the only version of the stimuli that participants ever encountered, with no orthographic stimuli being presented at any point.
The experiment took an average of 29 minutes to complete and participants (N=40) were paid $7.50 for their time (UMass Amherst IRB protocol number 2017-4040). All participants were recruited using Prolific (www.prolific.co) with the following text advertising the experiment on the website: "You will be learning the words of a new, made-up language. At first, you'll participate in a training phase in which you'll hear words, and then be asked to repeat them. Later there will be a testing phase that will ask you to judge whether new words sound like they belong to the language of interest."
After agreeing to participate on Prolific, participants were forwarded to the experiment, which was hosted on Ibex Farm (http://spellout.net/ibexfarm/). The first page of the experiment presented them with the following instructions (which were mostly a repetition of the instructions on Prolific), as well as an informed consent form: "In this experiment you will be learning the words of a new, made-up language. At first, you'll participate in a training phase in which you'll hear words, and then be asked to repeat them. Later there will be a testing phase that will ask you to judge whether new words sound like they belong to the language of interest. Be sure to be using headphones so that you can carefully listen to each recording. Also, please use a computer (as opposed to a smart phone) so that all of the audio plays clearly."
They were then taken to a page that played a non-linguistic audio file, which gave them an opportunity to ensure their computer's audio was working properly. Next, the training phase of the experiment began, which presented 30 stimuli from each of the four training trial types discussed in §3, in a randomized order. Each trial began with a page that said "Please listen to the following word" and played an audio file of the relevant stimulus. A new page would then appear that said "Now repeat", giving participants a chance to repeat the stimulus out loud. After the training phase, a new set of instructions was presented: "Nice work! You've completed the training phase of the experiment. Next you will participate in the testing phase. In this phase, you will be asked to judge whether new words sound like they belong to the language you've been learning. Be sure to let the full word play before selecting an option. Your response times are being recorded."
Each trial in testing presented participants with a stimulus and then asked them "Does this word sound like it could belong to the language?". They were given the options "Yes" and "No" and were not given any feedback about their answers. The stimuli in the testing phase included 5 words for each trial type repeated from training (20 total), 10 novel stimuli from each of the four training trial types (40 total), and 10 novel items from each of the two testing trial types described in §3.

Predictions
To better understand what predictions models might make for this experiment, depending on whether they have the expressive power to represent Sour Grapes, I implemented three maximum entropy phonotactic learners (Hayes & Wilson, 2008; Moreton et al., 2017): one with a constraint that represented Sour Grapes harmony (violated by any word that would appear in the Ungrammatical-Both trials), one with a constraint that represented the attested harmony pattern (violated by any word that would appear in either the Ungrammatical-Both or Ungrammatical-AH trials), and a model that had both of these constraints. Each learner also had a base set of constraints that allowed it to penalize words containing specific strings of vowels that were three or four segments long (e.g., *uiu, *iiuu, *iuui, etc.). This was meant to represent the possibility that participants were simply memorizing the words in the experiment's training phase and favoring words in testing that had similar segments in their vowel tier.
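The core of such a model can be sketched as follows. This is a minimal illustration of maximum entropy phonotactics over a finite word set, not the actual implementation used in the simulations: each word's score is the exponentiated negative sum of its weighted constraint violations, normalized over the set. The three-word test set and the weights are hypothetical.

```python
import math

def maxent_probs(words, violations, weights):
    """Maximum entropy phonotactics over a finite word set: score each
    word by exp(-sum of weighted violations), then normalize."""
    scores = {w: math.exp(-sum(wt * v for wt, v in zip(weights, violations[w])))
              for w in words}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Hypothetical three-word test set, one per stimulus category, with
# violation vectors for the constraints [*SourGrapes, *AttestedHarmony]:
violations = {
    "SG,AH": [0, 0],    # grammatical in both patterns
    "SG,*AH": [0, 1],   # violates only the attested-harmony constraint
    "*SG,*AH": [1, 1],  # violates both constraints
}
probs = maxent_probs(violations, violations, weights=[1.0, 1.0])
```

With both constraints carrying positive weight, words violating neither outscore words violating only the attested-harmony constraint, which in turn outscore words violating both, i.e., a three-way contrast.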
Each of the models was trained for 10 separate runs on a set of data that was randomly produced in the same manner as the stimuli in the human participants' training phase (described in §4). That is, 30 unique words were randomly produced for each of the 4 training trial types and then given to each of the models. All simulations used batch gradient descent with a learning rate of .01 and initial weights of 1.0. The predictions discussed below are the models' probability estimates after being trained for a single epoch (i.e., one full pass through the data and a single weight update), since this corresponds to the same amount of exposure to stimuli that the participants were given in the experiment. However, these results are qualitatively identical to results after 50 epochs, a more standard amount of exposure to give to a maximum entropy learner.
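A single batch update of the kind described here can be sketched as follows. This is an illustrative implementation of one gradient step for a MaxEnt grammar, not the code used in the reported simulations; the clipping of weights at zero is a common convention in MaxEnt phonotactics (weights act as penalties) and is an assumption on my part.

```python
import math

def train_epoch(train_data, candidates, viols, weights, lr=0.01):
    """One batch gradient-descent epoch for a MaxEnt phonotactic grammar.
    The log-likelihood gradient for each constraint is its expected
    violation count under the model minus its observed count in the data."""
    # Model distribution over the candidate set under the current weights.
    scores = [math.exp(-sum(w * v for w, v in zip(weights, viols[c])))
              for c in candidates]
    z = sum(scores)
    probs = [s / z for s in scores]
    expected = [sum(p * viols[c][i] for p, c in zip(probs, candidates))
                for i in range(len(weights))]
    observed = [sum(viols[x][i] for x in train_data) / len(train_data)
                for i in range(len(weights))]
    # Gradient ascent on log-likelihood; clip at zero so that constraints
    # only ever penalize.
    return [max(0.0, w + lr * (expected[i] - observed[i]))
            for i, w in enumerate(weights)]
```

Because the training data violate the harmony constraints rarely or never, their observed counts fall below their expected counts, so those weights grow with each update.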
The test items were also randomly produced in the same manner as the human participants' stimuli (described in §4) and were divided into three categories: words that were ungrammatical in both Sour Grapes and attested harmony (*SG,*AH below, analogous to Ungrammatical-Both trials), words that were grammatical in Sour Grapes but ungrammatical in its attested counterpart (SG,*AH below, analogous to Ungrammatical-AH trials), and words that were grammatical according to both patterns (SG,AH below, analogous to the trial types found in training). The models' average probabilities for these three categories of stimulus after a single epoch of training are shown in Figure 4.

The results from the maximum entropy learners show that the three kinds of models each make unique predictions for the experiment. The model that can only represent Sour Grapes predicts no difference in acceptability between the SG,*AH and SG,AH words, but predicts that *SG,*AH words will be significantly less acceptable. This is because that model's constraint enforcing Sour Grapes is blind to the difference between SG,*AH and SG,AH forms, assigning no violations to either type. The model that can only represent attested harmony predicts that there will be no difference between the *SG,*AH words and the SG,*AH words, but that SG,AH words will be significantly more acceptable than both of the other categories. Similarly, this is because the constraint enforcing attested harmony is blind to the difference between *SG,*AH and SG,*AH words, assigning violations to both equally. And finally, the model that can represent both Sour Grapes and attested harmony predicts a three-way contrast in acceptability, with *SG,*AH words at the bottom, SG,*AH words in the middle, and SG,AH words at the top.

Results
The proportion of "Yes" responses for the human participants, by stimulus type, is shown in Figure 5. On average, participants were least likely to judge *SG,*AH stimuli as belonging to the language they had been trained on. They were more likely to judge SG,*AH as belonging to the language; however, SG,AH stimuli had the highest likelihood of receiving a "Yes" response from participants.
To ensure these differences were unlikely to have arisen from chance, I ran a mixed-effects logistic regression on the data using the lme4 package (Bates et al., 2015) in R (R Core Team, 2016). The stimulus type of each participant's response was given to the model using the coding scheme in Table 4. This allowed for a comparison of participants' likelihood of saying "Yes" to SG,*AH items with their likelihood of saying "Yes" to the other two trial types. The predictor SG,*AH vs. *SG,*AH compared SG,*AH items with *SG,*AH ones, while the predictor SG,*AH vs. SG,AH compared SG,*AH and SG,AH items. The model was also given random effects of participant and item on the intercept and the two slopes, and its results are shown in Table 5. Both predictors in this model were found to be significant (p<.01 for SG,*AH vs. *SG,*AH and p<.05 for SG,*AH vs. SG,AH), meaning that the observations about Figure 5 discussed above were unlikely to have arisen from chance. Specifically, SG,*AH items were significantly more likely to get "Yes" responses from participants than *SG,*AH stimuli, but significantly less likely to get "Yes" responses than SG,AH items. These results are consistent with a model that can represent both Sour Grapes and attested harmony patterns, according to the simulation results presented in §5, and are consistent with the evidence, found by Lin and Myers (2010) using a different methodology, that both patterns are learnable.

Conclusions
Despite being representable in constraint-based theories of harmony, Sour Grapes patterns are absent from phonological typology. Past attempts to explain this have sought to design phonological theories that categorically exclude Sour Grapes from being representable in the phonological grammar, either by restricting constraint sets (Wilson, 2006b; McCarthy, 2011) or by focusing on the computational complexity of the pattern (Heinz, 2018; Smith & O'Hara, 2019; Lamont, 2019). However, past experimental work (Lin & Myers, 2010) has failed to find a bias against learning Sour Grapes in artificial language learning.
Here I showed that a maximum entropy model that had the expressive power to represent both Sour Grapes and an attested harmony pattern could correctly predict human generalization in an artificial language learning experiment with a poverty of the stimulus design. Future work should test such a model to see if it can help explain Sour Grapes' typological absence despite the model's ability to represent the pattern.