Variables Must be Limited to a Single Feature*

Introduction
Since they were first introduced by Halle (1962), algebraic variables (often called alpha notation) have primarily been used by phonologists to describe and represent assimilatory and dissimilatory patterns. Without such variables, assimilation processes must be represented as two separate rules, as shown in (1). In this example, the first rule voices consonants that precede voiced segments, and the second rule devoices consonants preceding voiceless ones. A similar strategy is necessary when using variable-free constraint-based frameworks to express assimilatory patterns, as illustrated in (2).

(2) *[−Voice][+Voice], *[+Voice][−Voice]
These markedness constraints function similarly to the rules above. The first constraint bans voiceless sounds before voiced ones, while the second constraint bans the opposite ordering of segments. Halle's (1962) proposal points out that assimilation and dissimilation can be represented more simply than the analysis in (1) if the phonological grammar has the ability to make use of algebraic variables in its representations. Such an algebraic representation is shown in (3), where α represents either [+] or [−], but must be the same value across feature bundles.
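As a concrete sketch of the variable-free analysis in (2), the two markedness constraints can be written as separate violation checks. This is an illustrative toy encoding (segments reduced to a single ±Voice value), not anything from the paper itself:

```python
# Represent a bigram as a pair of voicing values: +1 for [+Voice], -1 for [-Voice].
def star_voiceless_voiced(bigram):
    """*[-Voice][+Voice]: bans a voiceless segment before a voiced one."""
    return bigram == (-1, +1)

def star_voiced_voiceless(bigram):
    """*[+Voice][-Voice]: bans a voiced segment before a voiceless one."""
    return bigram == (+1, -1)

# Two separate constraints are needed to ban all disagreeing bigrams:
for bigram in [(-1, +1), (+1, -1), (+1, +1), (-1, -1)]:
    print(bigram, star_voiceless_voiced(bigram) or star_voiced_voiceless(bigram))
```

Neither constraint alone rules out both orders of voicing disagreement; the grammar must stipulate both.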
Variables like the α above can also be used in constraints, such as *[αVoice][−αVoice], which bans any segmental bigram whose members do not match in their voicing value. By creating dependencies across the feature bundles that constraints and rules are made of, alpha notation allows for simpler representations when capturing patterns like assimilation and dissimilation. However, Halle (1962) did not limit his use of variables to these kinds of processes; he also used variables to simplify the representation of more arbitrary patterns. In this paper, I show that applying variables in this unconstrained way simplifies the representation of certain phonotactic patterns, and that this simplification prevents an otherwise standard MaxEnt model from being biased in ways that reflect human behavior. As a solution, I propose limiting variables in a manner that makes them represent more modern theories of assimilation and dissimilation, such as autosegmental spreading (Goldsmith, 1976) and surface correspondence constraints (Rose & Walker, 2004).
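Under the same toy ±Voice encoding as above (an illustrative assumption, not the paper's implementation), the single alpha constraint *[αVoice][−αVoice] can be sketched as one check that quantifies over the values of α:

```python
def violates_alpha(bigram):
    """*[aVoice][-aVoice]: violated if some value of a (+1 or -1) makes the
    first segment [aVoice] and the second [-aVoice] -- i.e. whenever the
    two segments disagree in voicing."""
    v1, v2 = bigram
    return any(v1 == a and v2 == -a for a in (+1, -1))

# One alpha constraint covers both orderings of voicing disagreement:
for bigram in [(-1, +1), (+1, -1), (+1, +1), (-1, -1)]:
    print(bigram, violates_alpha(bigram))
```

The dependency between the two bundles (same α on both sides) is what lets one constraint do the work of the two variable-free constraints in (2).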
The paper proceeds as follows: §2 describes Halle's (1962) use of variables outside the context of assimilatory and dissimilatory patterns, §3 describes the phonotactic patterns that will be used here and the previous research that has explored their learnability, §4 demonstrates that a model with unconstrained variables no longer predicts the attested learning bias, and §5 concludes.

Unconstrained variables
Halle's proposal for using variables in phonological representations did not include any kind of restriction on how the variables could be utilized. While assimilation and dissimilation were the focus of his proposal, he also used alpha notation to describe a more complex pattern present in Slavic languages. In this pattern, /o/ raises to [u] and /e/ lowers to [æ] before nasals. He unified these two mappings into a single rule, shown in (4).
When an /o/ occurs before a nasal, the value of α in the rule will be set one way, and when an /e/ occurs it will be set the other way, allowing a single rule to capture both mappings. Halle (1962) is not the only person to discuss using variables in this unconstrained manner; Wang (1967) also used alpha notation to tie together separate features' values when analyzing tonal processes. This kind of approach has been critiqued on theoretical grounds in the past (McCawley, 1971; Schuh, 1978; Odden, 2013), but in §4 I will show that it also makes incorrect empirical predictions.

Representational complexity and phonotactic learning
Shepard et al. (1961) originally explored and defined six different types of patterns. While they originally defined these for the domain of visual category learning, they have been applied in a number of studies to capture the complexity of phonological patterns. Here I will focus on the first two pattern types, labeled Type I and Type II. Type I patterns involve a single, valued feature (for example, banning all voiceless sounds would be a Type I restriction). Type II patterns are more complex, and involve two features and a logical biconditional (for example, banning any sounds from a language that are either [−Voice, +Continuant] or [+Voice, −Continuant]).
Across domains, Type I patterns have been shown to be consistently easier for humans to learn than Type II patterns. This Type I ≫ Type II bias has been demonstrated for phonotactic learning (Moreton et al., 2017) and mirrors trends in phonological typology (Moreton & Pertsova, 2014). Furthermore, Moreton and Pater (2012) show that a number of complexity-based biases documented in the artificial language learning literature can be expressed as a preference for Type I patterns.

Pater and Moreton (2014) showed that the Type I ≫ Type II bias emerges for free from a Maximum Entropy phonotactic learner (henceforth MaxEnt; Hayes & Wilson, 2008) with a conjunctive constraint set. The reason for this is Type I patterns' simpler representation, illustrated in (5). In that example, only a single constraint is needed to represent the Type I pattern, "No voiceless segments," while two constraints are necessary to represent its Type II counterpart, "No voiceless continuants or voiced non-continuants." MaxEnt models learn by moving more weight onto constraints that ban unattested sequences and taking weight away from constraints that do not. Pater and Moreton (2014) found that their model acquired Type I patterns more quickly than Type II patterns because fewer constraints were crucial to capturing the pattern (see Moreton et al., 2017 for a more detailed account of why this is the case).

I replicated this finding with a reimplementation of Pater and Moreton's (2014) model, using data based on the Type I and II patterns in Moreton et al. (2017). The complete set of training data for each language is given in Table 1. To limit the number of total constraints given to the model, velar sounds were replaced by labials and vowels were replaced with continuants. All words occurred with equal frequency. The model was trained on each language separately, using online gradient descent and a learning rate of .01, averaged over 15 repetitions. At each epoch (i.e., each full pass through the data), the model's estimated probabilities for each training datum in the language were averaged, and the average of these results across repetitions is plotted in Figure 1.
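The logic of the replication can be sketched in miniature. The following toy simulation is my own simplified, hypothetical version (single-segment "words" over a ±Voice/±Continuant inventory, a full conjunctive constraint set, and illustrative data and hyperparameters), not the paper's actual model or training set:

```python
import math

# Toy inventory: each "word" is one segment, a (Voice, Continuant) pair.
SEGMENTS = {'z': (+1, +1), 'd': (+1, -1), 's': (-1, +1), 't': (-1, -1)}

# Conjunctive constraint set: one constraint per single-feature value and
# per two-feature combination. A constraint maps feature index -> value.
CONSTRAINTS = [{0: +1}, {0: -1}, {1: +1}, {1: -1}]
CONSTRAINTS += [{0: v, 1: c} for v in (+1, -1) for c in (+1, -1)]

def violations(seg):
    feats = SEGMENTS[seg]
    return [1 if all(feats[i] == val for i, val in con.items()) else 0
            for con in CONSTRAINTS]

def data_probability(data, epochs=10, lr=0.1):
    """Online MaxEnt gradient descent with nonnegative penalty weights;
    returns the average probability assigned to the training data."""
    w = [0.0] * len(CONSTRAINTS)
    for _ in range(epochs):
        for datum in data:
            # Model distribution over all candidate segments
            scores = {s: math.exp(-sum(wi * vi for wi, vi
                                       in zip(w, violations(s))))
                      for s in SEGMENTS}
            Z = sum(scores.values())
            expected = [sum(scores[s] / Z * violations(s)[k] for s in SEGMENTS)
                        for k in range(len(CONSTRAINTS))]
            observed = violations(datum)
            # Raise weights on constraints the model violates more than the data
            w = [max(0.0, wk + lr * (e - o))
                 for wk, e, o in zip(w, expected, observed)]
    scores = {s: math.exp(-sum(wi * vi for wi, vi in zip(w, violations(s))))
              for s in SEGMENTS}
    Z = sum(scores.values())
    return sum(scores[d] / Z for d in data) / len(data)

p1 = data_probability(['z', 'd'])  # Type I: only [+Voice] attested
p2 = data_probability(['z', 't'])  # Type II: Voice and Continuant must agree
print(p1 > p2)  # the Type I pattern is learned faster
```

In this sketch the Type I language's unattested forms are all penalized by one shared constraint (*[−Voice]), whose gradient aggregates over both of them, whereas the Type II language must split that work across two pair constraints, mirroring the asymmetry the replication found.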

Prickett
These results demonstrate that, as found previously with MaxEnt models using conjunctive constraint sets (Pater & Moreton, 2014; Moreton et al., 2017), Type I patterns are more easily acquired by the model than minimally different Type II patterns. This is a desirable feature of a phonotactic learner, since it mirrors the biases observed in human learning.

Simplifying Type II
When unconstrained variables like the ones used by Halle (1962) are added to the representations of the MaxEnt phonotactic learner described in §3, the difference in representational complexity between Type I and Type II patterns disappears. This is illustrated in (6). The learner's average probability across repetitions is shown in Figure 2, demonstrating that the Type I ≫ Type II bias has disappeared. This shows that the simplification of Type II's representation was not without consequence: it also simplified the learning process and interfered with the emergent biases that variable-free MaxEnt models predict.
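To see why the complexity difference collapses, consider a hypothetical sketch in the same toy encoding used above: a variable allowed to link two different features lets a single constraint, *[αVoice, −αCont], ban exactly the segments excluded by the Type II pattern, matching the single-constraint representation of Type I:

```python
# Toy inventory: (Voice, Continuant) pairs, as in the earlier sketches.
SEGMENTS = {'z': (+1, +1), 'd': (+1, -1), 's': (-1, +1), 't': (-1, -1)}

def violates_alpha_type2(seg):
    """*[aVoice, -aCont]: violated if some value of a makes the segment both
    [aVoice] and [-aCont] -- i.e. whenever Voice and Continuant disagree."""
    voice, cont = SEGMENTS[seg]
    return any(voice == a and cont == -a for a in (+1, -1))

# One unconstrained alpha constraint bans exactly the Type II-excluded
# segments [+Voice, -Cont] and [-Voice, +Cont]:
print([s for s in SEGMENTS if violates_alpha_type2(s)])  # ['d', 's']
```

Because the variable ties together values of two distinct features, the biconditional that made Type II representationally costly is compressed into one constraint, and the learner's source of bias disappears.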

Conclusions
Here I've shown that the kind of unconstrained variables that Halle (1962) proposed makes pathological predictions about the relative learnability of Type I and Type II phonotactic patterns. This suggests that if variables are to be included in theories of phonology, they need to be limited in a way that preserves the Type I ≫ Type II bias normally predicted by theories of phonotactic acquisition. Since humans do have a learning bias that prefers assimilatory and dissimilatory processes (Moreton, 2008, 2012; Gallagher, 2013), any limitation on the use of variables should not restrict them from simplifying the representation of those patterns.
By restricting variables so that they can only create dependencies between different values of the same feature, both of these goals are realized. This ensures that Type II patterns like the one tested in §3-4 cannot be simplified using alpha notation, while the complexity of assimilation and dissimilation can still be reduced. While this restriction was absent from Halle's (1962) original proposal for variables in phonology, it resembles a number of more modern theories of assimilation and dissimilation, such as autosegmental spreading (Goldsmith, 1976) and surface correspondence constraints (Rose & Walker, 2004).
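The proposed restriction can be stated as a well-formedness check on constraints. The encoding below is a hypothetical illustration of my own (constraints as lists of feature bundles, with variable-valued features written as a (variable, sign) pair), not a formalism from the literature:

```python
def is_licit(constraint):
    """A constraint is a list of feature bundles; each bundle maps a feature
    name to +1, -1, or a (variable, sign) pair like ('a', -1) for -aVoice.
    Proposed restriction: each variable may only link tokens of the SAME
    feature across bundles."""
    features_per_var = {}
    for bundle in constraint:
        for feature, value in bundle.items():
            if isinstance(value, tuple):
                var, _sign = value
                features_per_var.setdefault(var, set()).add(feature)
    return all(len(feats) == 1 for feats in features_per_var.values())

# *[aVoice][-aVoice] (assimilation) is licit; a constraint tying Voice and
# Continuant together through the same variable is not.
print(is_licit([{'Voice': ('a', +1)}, {'Voice': ('a', -1)}]))     # True
print(is_licit([{'Voice': ('a', +1), 'Cont': ('a', -1)}]))        # False
```

Under this check, the assimilatory constraint from §1 remains expressible in one constraint, while the Type II-collapsing constraint from §4 is ruled out, which is exactly the division of labor the conclusion argues for.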