Leveling or reanalysis? An explanation of Middle High German paradigm merger

Strong verbs in Middle High German (MHG) have two past indicative stems in the verb inflectional paradigm, which merged into one in Modern High German (NHG). Previous studies have mostly treated this change as paradigmatic leveling. However, the NHG past indicative stems are inherited from different cells of the MHG paradigm in different inflectional classes, and some are even innovations that combine parts of both MHG past indicative stems. This paper attempts to identify the base of leveling using a computational model, the Minimal Generalization Learner proposed in Albright (2002b). The results account for the unusual patterns of merger found in German only to some extent, and they raise new problems. As a counter-proposal, I argue that the merger that appears to be paradigmatic leveling might have been triggered by the reanalysis of phonological features as morphological exponents.

past singular stem and zero-grade past plural stem. Classes VI and VII have different origins in PIE, and correspondingly, the past indicative forms in these classes have shared the same stem from the very beginning. For Strong Classes that show vowel alternation between the past 1st/3rd singular and the past plural stems (i.e. Classes I-V), the distinction in stem vowel is lost in NHG, triggered partly by regular sound change in some Strong Classes and partly by morphological change in others. Where morphological change occurred, its direction differs across Strong Classes and even across lexical items within the same Strong Class. The past 1st/3rd singular stem and the past plural stem are originally the same in MHG for Strong Classes VI and VII. Therefore, though regular sound changes occurred in the stem vowel (i.e. the monophthongization of MHG /uo/, /ie/ to NHG /uː/, /iː/, see Paul 1929:24), the morphological relationship between these two stems remains intact.
Class  Stage  past.1.sg.  past.1.pl.  past.part.
IV     MHG    /nam/       /naːmən/    /ɡənomən/    "took"
       NHG    /naːm/      /naːmən/    /ɡənomən/
Table 2. Class IV: Merger of the past 1st/3rd singular stem and the past plural stems caused by regular sound change

Vowel lengthening started in some areas as early as Old High German (OHG), and it had spread through the whole High-German-speaking region by the time of NHG. It mostly occurred under three conditions: (1) in open syllables; (2) before /r/ + dental clusters; (3) in monosyllabic words ending with nasals or liquids (Mettke 1967:69-70; Paul 1929:21-22). MHG Strong Class IV verbs happen to be characterized by having nasals or liquids as stem-final consonants. As a result, the stem vowels in past 1st/3rd singular stems (e.g. /nam/) fulfill the third condition and are therefore lengthened. This, then, is a merger triggered by sound change. However, analogical changes are attested in some exceptional cases. For example, the past indicative stem of /ʃerən/ "to shear" is /ʃor-/, whose stem vowel is the same as the stem vowel of the past participle and is believed to result from the preceding /ʃ/ sound (Mettke 1967:194). But /ʃerən/ has become weak and no longer belongs to Strong Class IV in NHG.5

2 Stem-final voiced stops are devoiced at word-final position in both MHG and NHG, i.e. /ɡruob/, /ɡruːb/ are pronounced as [ɡruop], [ɡruːp]. But the phonologically conditioned final devoicing rule is irrelevant to our study and not represented in the phonemic transcription.
3 Though there are six subclasses in Strong Class VII, the stem vowels in their past stems are the same. They are thus collapsed into one row in the table. For details about the subclasses see Mettke 1967:198-200, Paul 1929:112, Boor & Wisniewski 1984
4 In MHG, the past subjunctive stem is phonologically predictable from the past plural stem through Umlaut (Paul 1929:107-108), and in NHG, it can be phonologically derived from the past stem. Therefore, the past subjunctive stem is mostly omitted in tables where inflectional paradigms of MHG or NHG are shown.
5 Strictly speaking, two individual processes of leveling are involved in the development of /ʃerən/ from MHG to NHG. The MHG past indicative stems are first leveled by the past participle in terms of stem vowel. Then its inflectional paradigm is completely rebuilt from its present stem based on derivational rules of NHG weak verbs, since it has become a weak verb in NHG.

The past indicative stem in NHG Strong Class I is built on the MHG past plural stem, whose stem vowel has undergone lengthening in open syllables, cf. MHG /stiɡən/ "we climbed" > NHG /ʃtiːɡən/. By contrast, the expected descendants of MHG past 1st/3rd singular stems */staɪk/ and */leːh/ were replaced.

Similarly, the lengthening of the stem vowel in the Strong Class V past 1st/3rd singular stem does not result from regular sound change but from analogy, because the stem vowels in forms like [ɡab]

In NHG, the /u/ in past plural stems has been replaced by the /a/ of the past singular forms, so that all past indicative forms share the same stem with /a/ across most lexical items in Strong Class III. The only exception is /vɛrdən/ "to become". The distinction between the past singular form /vard/ "I became" and the past plural form /vurdən/ "we became" is retained, and an innovative past singular /vurdə/ "I would" was even created based on the plural stem.

(termed Reibelaut in German) in syllable-final position (Mettke 1967:34, Paul 1929:52, Boor & Wisniewski 1984. Like the final devoicing rule, the phonologically conditioned change is irrelevant and not represented in the phonemic transcriptions.
7 The MHG /h/ dropped in NHG because of regular sound change but is still represented in NHG orthography (Paul 1929:28).
8 Mettke (1967:195) states that the stem vowel of NHG Strong Class V indicative past stems can be either long or short, but no examples of indicative plural stems with short vowels are provided.

stem. 9 Similar to Strong Class I, the vowel quantity depends on the quantity of the following consonant and the corresponding syllabification: the stem vowel is lengthened when the stem vowel of the indicative past plural and the past participle in MHG is in an open syllable, and the lengthening spread into the past 1st/3rd singular forms (Mettke 1967:189; for a full list of comparison between long /o/ and short /o/ in past and past participle stems, see Blatz 1895:499).
There are a few individual words that originated in MHG Strong Class II but have special stem vocalism in their present stem in NHG, such as /lyːgən/ "to tell lies", /zauɡən/ "to suck", but the stem vowels in their past indicative and past participle forms are the same as in other Strong Class II words.

THE MERGER OF STEM-FINAL CONSONANT ALTERNATION FOR PAST INDICATIVE STEMS FROM MHG TO NHG. Some MHG Strong Classes also show patterns of stem-final consonant alternation in their inflectional paradigms, which are inherited from PG and termed Grammatischer Wechsel (Mettke 1967:105-109, Paul 1929:59-61, Boor & Wisniewski 1984

The NHG past forms in Strong Classes Ib and V are created based on the stem vowel of the MHG past plural stems and the stem-final consonants of the MHG past 1st/3rd singular stem, instead of being directly inherited from either of them.
2. An assumption of paradigmatic leveling. The paradigmatic change described above is traditionally understood as leveling, a morphological change in which morphophonemic alternations within paradigms are completely or partially given up (Hock 1991:168). Under this assumption, the German case raises the question of how the direction of leveling is determined, i.e. which form is the base that replaces the other alternants. Scholars have struggled with this problem for decades. For instance, Tiersma (1982) proposes that the base for leveling might be determined by the markedness of paradigm cells, which is indicated by the relative frequencies of cells. On the other hand, according to Albright (2002a, 2002b), the predictiveness of a single surface form over other forms in the paradigm can play a crucial role in determining the base of leveling, under the assumption that speakers use the most predictive surface form as the base for deriving the rest of the paradigm in language acquisition. The theory is worth discussing in detail because it provides a quantitative method of explaining the multiple directions of leveling in the history of German, one that is computationally implementable and theoretically falsifiable. If MHG data are fed into Albright's model and the prediction is consistent with the attested outcomes of leveling, we could reasonably conclude that the problem of multiple directions of leveling can be solved within the framework of Albright's model.
2.1. BASIC IDEAS UNDERLYING ALBRIGHT'S MODEL ABOUT PARADIGMATIC LEVELING.
There are two underlying hypotheses in Albright's proposal about the organization of the paradigm: (1) the base selected by learners is a single form and a surface form, and the selection holds for all lexical items, which is why it is termed global; (2) the base is selected for its informativeness, which means that the surface form that preserves the most contrasts and permits accurate predictions of most forms of most lexical items is the most likely to be chosen as the base (Albright 2002b:129). In terms of diachronic change, this organization of paradigms implies that the remaining forms in the paradigm might be rebuilt from the global base by analogical change, but the global base itself would not be leveled by other forms in the paradigm.
Apart from the global base, Albright also proposed, as a refinement, the concepts of subparadigm and local base to establish a more sophisticated paradigm structure, in order to better explain cases in which certain cells in a paradigm enjoy high mutual predictiveness but relatively low predictability by the remaining cells in the paradigm. In this structure, cells with high mutual predictiveness constitute a subparadigm and are not directly derived from the global base but rather from the local base in the subparadigm, which is either derived from the global base or lexically stored. In other words, the local base serves as a medium associating the global base and forms in a subparadigm. The structure permits leveling from the global base to the whole subparadigm including the local base and leveling from the local base to the rest of the subparadigm, but excludes the possibility that some forms in a subparadigm might be derived from the global base while others might be derived from the local base (Albright 2002b:118-120). The relationship among global base, local base and non-base cells is schematized in Figure 1.

In his later work, Albright (2008) also acknowledges the role that the token frequency of forms can play in determining the global base and explores the interaction between frequency and form predictiveness. If the difference in frequency is so large that the potential neutralization does not seriously decrease the accuracy of prediction, a less predictive form can be preferred over another form that preserves more contrasts and thus be selected as the base (Albright 2008:28-31).

MINIMAL GENERALIZATION. Albright proposed a model called the Minimal Generalization Learner to quantify how well one form in the paradigm predicts another and thereby to identify the global base (Albright & Hayes 2002, Albright 2002b); executable scripts based on the algorithm are available at http://www.mit.edu/~albright/mgl/. To evaluate the predictiveness of paradigm cells, the model is first fed with training data consisting of a set of lexical items, from whose inflectional paradigm one surface form is extracted as input and another as output. The model compares the input and the output for each lexical item, identifying which segments they have in common and which they do not, and then writes a rule that describes the change from the input to the output for each pair, including the structural change (i.e. the parts the input and the output do not have in common) and the (phonological) contexts that condition the change (i.e. the parts both the input and the output share). As more word-specific rules become available, rules with the same structural change are collapsed by generalizing their phonological contexts in terms of phonological features. The procedure is called minimal generalization, and the result is a set of generalized rules on the phonological contexts of each structural change attested in the training data.

EVALUATING GENERALIZED RULES WITH CONFIDENCE SCORE. The Minimal Generalization Learner adopts the algorithm suggested in Mikheev (1997) to propose the confidence score, a quantitative measure for evaluating generalized rules. For each generalized rule, its estimated proportion of success p̂ is first calculated through the formula:

$$\hat{p} = \frac{x}{n}$$

where x is the number of outcomes the rule correctly predicts, and n is the number of input forms to which the rule is applicable.
While p̂ is a good indicator of the rule's accuracy, it fails to reflect its "ambitiousness": with p̂ remaining equal, the more forms a rule is applicable to, the more preferable it should be. Therefore, p̂ is assumed to be a sample from a t-distribution with n-1 degrees of freedom, so that the problem can be tackled by calculating the lower confidence limit π_L, which indicates the lowest possible value for p̂ given a level of confidence. First, p̂ is adjusted to avoid zeros in positive (p̂) or negative (1-p̂) outcome probabilities:

$$\hat{p}^{*} = \frac{x + 0.5}{n + 1}$$

Hence:

$$\pi_L = \hat{p}^{*} - t_{df}^{(1-\alpha)/2} \sqrt{\frac{\hat{p}^{*}\,(1 - \hat{p}^{*})}{n}}$$

where α is the level of confidence and df (= n-1) is the degrees of freedom. The coefficient of the t-distribution $t_{df}^{(1-\alpha)/2}$ can be found in t-distribution tables or calculated by software using α and df. We use an α of 0.75 in the test.
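As a rough illustration of the two steps described above (writing a word-specific rule and scoring a rule), consider the following minimal Python sketch. It is not Albright's implementation. The function names `extract_rule` and `confidence` are hypothetical, the rule extraction works on plain strings rather than feature matrices and does not perform the generalization step, and the lower limit is assumed to take the form p̂* minus the coefficient times sqrt(p̂*(1-p̂*)/n). When no t coefficient is supplied, the sketch falls back on the normal quantile as a large-sample stand-in, which is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def extract_rule(inp, out):
    """Return (A, B, left, right) for the word-specific rule A -> B / left __ right."""
    # The longest shared prefix becomes the left-hand context.
    i = 0
    while i < min(len(inp), len(out)) and inp[i] == out[i]:
        i += 1
    left, rest_in, rest_out = inp[:i], inp[i:], out[i:]
    # The longest shared suffix of the remainders becomes the right-hand context.
    j = 0
    while j < min(len(rest_in), len(rest_out)) and rest_in[-1 - j] == rest_out[-1 - j]:
        j += 1
    right = rest_in[len(rest_in) - j:]
    # Whatever is left over is the structural change.
    return rest_in[:len(rest_in) - j], rest_out[:len(rest_out) - j], left, right


def confidence(x, n, alpha=0.75, t_coef=None):
    """Lower confidence limit for a rule with x correct outcomes out of n applicable inputs."""
    if t_coef is None:
        # ASSUMPTION: normal quantile used in place of the t coefficient,
        # which should really be looked up for df = n - 1.
        t_coef = NormalDist().inv_cdf(1 - (1 - alpha) / 2)
    p_star = (x + 0.5) / (n + 1)  # adjusted proportion of success
    return p_star - t_coef * sqrt(p_star * (1 - p_star) / n)


# MHG Class IV past 1st/3rd singular stem vs. past plural stem (Table 2).
rule = extract_rule("nam", "naːm")
small, large = confidence(5, 5), confidence(1000, 1000)
```

With the same observed success rate, the rule supported by 1000 forms receives a higher score than the one supported by 5, which is the "ambitiousness" effect the lower confidence limit is meant to capture.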
Since higher values for π_L simultaneously favor rules that generate a larger proportion of correct outcomes and rules that are applicable to a larger portion of the data, π_L is an ideal criterion for rule selection, and it is called the confidence score.

CRITERIA FOR EVALUATING PREDICTIVENESS OF A PARADIGM CELL. In Albright (2002b:43-45), four criteria based on the confidence score are used for measuring the predictiveness of a candidate cell to other cells in the paradigm:
(1) ACCURACY: Accuracy evaluates a candidate's ability to reproduce the training data. For each input form in the training data, the model looks for generalized rules that are applicable to the form, and selects the one with the highest confidence score to check whether it derives the output correctly. Accuracy for each projecting direction is the ratio of the correctly derived outcomes to the number of inputs for which there is at least one applicable generalized rule in this direction.
(2) MEAN CONFIDENCE OF RULES: Mean confidence of rules reflects whether rules constructed around a candidate are generally more preferable. It is the mean of the confidence scores of all rules in a direction, whether they are winners or not.
(3) MEAN CONFIDENCE OF WINNING OUTPUTS: Mean confidence of winning outputs measures how confidently rules around a candidate form produce their outcomes. For each input form in the training data, the model looks for the rule that has the highest confidence score, if any rules are applicable to the input, and marks it as a winning rule. Mean confidence of winning outputs equals the mean of the winning rules' confidence scores.
(4) AVERAGE WINNING MARGIN: Average winning margin indicates how ambiguous the inputs are by calculating the distance between the highest confidence score and the second highest one among all rules applicable to an input. Average winning margin is the mean of the distances between the two confidence scores across all inputs.
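Three of the criteria listed above can be computed from per-input lists of applicable rules, as in the following minimal Python sketch. The function name `evaluate_direction` and the data layout are hypothetical and do not reflect the output format of the MGL scripts. Criterion (2) is left out because it needs the full rule inventory of a direction. One further assumption is labeled in the code: when only a single rule applies to an input, the runner-up score is taken to be 0.

```python
def evaluate_direction(applicable):
    """Summarize one mapping direction.

    `applicable` holds one list per input form; each list contains
    (confidence, is_correct) pairs, one per generalized rule that
    applies to that input.
    """
    # Inputs without any applicable rule are excluded from all three measures.
    covered = [rules for rules in applicable if rules]
    if not covered:
        return None

    correct = 0
    winning_confidences = []
    margins = []
    for rules in covered:
        ranked = sorted(rules, key=lambda rule: rule[0], reverse=True)
        best_conf, best_is_correct = ranked[0]  # the winning rule
        correct += bool(best_is_correct)
        winning_confidences.append(best_conf)
        # ASSUMPTION: with a single applicable rule the runner-up counts as 0.
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        margins.append(best_conf - runner_up)

    n = len(covered)
    return {
        "accuracy": correct / n,
        "mean_confidence_of_winning_outputs": sum(winning_confidences) / n,
        "average_winning_margin": sum(margins) / n,
    }


# Invented toy values, for illustration only.
toy = [[(0.9, True), (0.4, False)], [(0.6, False)], []]
summary = evaluate_direction(toy)
```

In the toy data the third input has no applicable rule and therefore does not enter the denominator of accuracy, in line with the definition in (1).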
2.3. MATERIALS. Corpus data were collected from Referenzkorpus Mittelhochdeutsch (1050-1350) (Klein et al. 2016), from which all verb tokens (including finite verbs, infinitives, participles, etc.) in two dialects (Alemannic and Upper German) were extracted. The lemmata come not only from the Strong Classes but also from Weak Classes and unproductive classes like the Irregular Class.
Dialect        Lemmata  Tokens
Alemannic          995   39735
Upper German       878   28468
Table 8. Numbers of tokens and lemmata in the data for each dialect

For each token, person endings were manually removed to obtain its stem, and to cope with variation, only the forms with the highest token frequency for each lemma and each stem were chosen to constitute the training data. The MHG paradigm consisted of six cells (the present singular stem, the infinitive stem, the past 1st/3rd singular stem, the past plural stem, the past subjunctive stem and the past participle stem), and since a set of training data selects one form in the paradigm as input and another as output, there are 30 sets of training data in total.
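The count of 30 follows from pairing each of the six cells, as input, with each of the five remaining cells, as output. A minimal Python sketch of this enumeration is given below; the variable names are hypothetical and the cell labels are simply those used in the text.

```python
from itertools import permutations

# The six cells of the MHG paradigm named in the text.
CELLS = [
    "present singular",
    "infinitive",
    "past 1st/3rd singular",
    "past plural",
    "past subjunctive",
    "past participle",
]

# Every ordered (input cell, output cell) pair defines one training set.
training_sets = list(permutations(CELLS, 2))
number_of_sets = len(training_sets)  # 6 * 5 ordered pairs
```

Ordered pairs are used because the two mapping directions between a pair of cells are evaluated separately.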
2.4.1 THE ALEMANNIC DATA. Since the scripts did not provide confidence score values for generalized rules other than the winning ones, the mean confidence of rules could not be obtained. The results for the other three criteria based on the Alemannic data are shown in Figure 2, where darker color indicates higher values. On the one hand, the results for all three criteria are similar, in that mapping directions with high values for any one criterion usually have high values for the others. On the other hand, it is not easy to identify a cell in the paradigm that is predictive of the other cells in general, i.e. a single global base. Instead, relatively high mutual predictiveness can be found between several pairs of cells, e.g. the infinitive stem and the present singular stem, or the past plural stem and the past subjunctive stem. Albright's assumption of subparadigm structure might be helpful for determining the global base and the organization of the whole paradigm.
Table 9. Values for accuracy of each mapping direction in the MHG paradigm found in the Alemannic data (values decisive for identifying local or global bases are in bold)
The columns represent the cells from which the remainder of the paradigm is derived (i.e. the input cells), and the rows stand for the target cells derived from other cells (i.e. the output cells). Overall, the past plural stem is the most predictive cell, in that the past participle stem, the past subjunctive stem and the past 1st/3rd singular stem can be derived most accurately from the past plural stem (see the values in bold). Nevertheless, the present singular stem and the infinitive stem cannot be derived from the past plural stem as reliably as the other cells, but they have high mutual predictiveness.
It can therefore be assumed that the infinitive stem and the present singular stem constitute a subparadigm, in which the infinitive serves as the local base and is derived from the past participle stem, the cell outside the subparadigm that has the highest accuracy in predicting cells in the subparadigm. To conclude, the organization of the Alemannic verb inflectional paradigm can be summarized as in Figure 3. The paradigm organization deduced from the Alemannic data indicates that when leveling happens, it is the past 1st/3rd singular stem, the past subjunctive stem and the past participle stem that should be replaced by the past plural stem, while the past plural stem would not be affected by leveling because it is the global base of the paradigm. This contradicts the attested facts in the history of German: e.g., the MHG past plurals /bundən/ "we bound", /ɡultən/ "we applied" were leveled by the past 1st/3rd singular stem in NHG, not to mention cases like NHG past indicative /ʦiː/ "I accused", /laːs/ "I read", which are innovatively built from the stem vowel of the past plural stem and the stem-final consonant of the past 1st/3rd singular stem in MHG.
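The way a candidate global base is read off an accuracy table in this section can be sketched in Python as follows. The function names are hypothetical, and the numbers are invented toy values over three abstract cells A, B and C; they are not the values of Table 9. Following the layout of that table, the outer keys stand for output cells (rows) and the inner keys for input cells (columns).

```python
from collections import Counter


def most_predictive_inputs(accuracy):
    """For each output cell, return the input cell that derives it most accurately."""
    return {out: max(row, key=row.get) for out, row in accuracy.items()}


def candidate_global_base(accuracy):
    """Return the input cell that is the best predictor for the most output cells."""
    winners = Counter(most_predictive_inputs(accuracy).values())
    return winners.most_common(1)[0][0]


# accuracy[output_cell][input_cell]; toy values for illustration only.
toy_accuracy = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"A": 0.6, "C": 0.7},
    "C": {"A": 0.4, "B": 0.8},
}
best = most_predictive_inputs(toy_accuracy)
base = candidate_global_base(toy_accuracy)
```

A simple count of this kind cannot by itself detect subparadigms; cells with high mutual predictiveness still have to be identified by inspecting the pairwise values, as is done in the text.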

2.4.2 THE UPPER GERMAN DATA. The results based on the Upper German data are presented in Figure 4. As with the Alemannic data, it is reasonable to assume a subparadigm that comprises the present singular stem and the infinitive. However, a single global base in the Albrightian sense cannot be identified, because no single cell has the highest predictiveness for most cells in the paradigm: e.g., the past 1st/3rd singular stem is most predictive of the past participle, the past plural is most predictive of the past 1st/3rd singular and the past subjunctive, and the past subjunctive is most predictive of the past plural. If a subparadigm structure were to be established, most cells in the paradigm would have to be local bases, with hardly any cells remaining as non-base ones, and the structure itself would lose most of its explanatory power. Therefore, the best conclusion we can draw from the Upper German data is that the results cannot be compellingly accounted for by the paradigm organization assumed in Albright's model.

2.5. CAN FREQUENCY PLAY A ROLE? Many approaches to identifying the base for leveling argue that token frequency can play a crucial role. Albright (2008) provides evidence of interaction between cell predictiveness and token frequency in determining the base for leveling. Tiersma (1982) also claims that markedness, which is empirically related to frequency, should be the main factor in determining the direction of leveling, in that the more marked (i.e. the less frequent) form is usually replaced by the unmarked one. However, in contrast to Albright's model, Tiersma (1982) argues that lexical items in a given category do not necessarily have the same base of leveling; the base depends on their semantics and on how they are used in practice, which is termed local markedness. For example, if the referent of a noun is a place, the locative case of the noun is likely to be unmarked and serve as a base for deriving the paradigm; if its referent is a tool or an instrument, the unmarked form in its paradigm might be the instrumental case. Frequency effects can be attested in the paradigmatic change from MHG to NHG to a certain extent.

Strong I      172   373   247    62    21   268
Strong II     185   228   201    30    10   189
Strong III    677   825   970   189   132   231
Strong IV     697   631  1427   208   101   402
Strong V      940   935  1923   455   477   490
Strong VI     200   296   318    99    42   201
Strong VII    518   543   577   169    69   270
Weak         3479  3896  1985   595   160  1989
Table 11. Token frequencies of each stem across all verb classes in the Alemannic data

Generally speaking, the past 1st/3rd singular stem is the most frequent among the non-present stems, and the past participle stem is the second most frequent, with few exceptions (e.g. the past participle stem is more frequent than the past 1st/3rd singular stem in Alemannic Strong Class I, and the past plural and the past subjunctive stems are more frequent than the past participle stem in Upper German Strong Class V).
Considering the conclusion drawn in the previous section, that the past plural stem is the most predictive one in the paradigm and could serve as the global base (at least in the Alemannic data), either the past 1st/3rd singular stem or the past plural stem might reasonably be the base of leveling among non-present stems. Nevertheless, new problems arise: the fact that the MHG past stems in Strong Class II were replaced by the past participle stem in NHG remains unexplained, since the past participle stem is neither more frequent than the past 1st/3rd singular stem nor more predictive than the past plural stem. Besides, the explanation fails to account for why it was Class I that selected the more predictive stem as the base and Class III that selected the more frequent one. It would have been equally reasonable for each class to have chosen the other stem as the base.
2.6. PROBLEMS OF THE ASSUMPTION OF PARADIGMATIC LEVELING. The assumption of leveling within the theoretical framework of the Albrightian model does help us understand the mechanism of the paradigmatic change from MHG to NHG. However, some details of this change are not compatible with the fundamental framework of the model. As mentioned above, the elimination of stem vowel alternation and the elimination of stem-final consonant alternation in MHG past indicative forms do not always have the same base, and innovative forms created by combining the stem vowel of one stem and the stem-final consonant of another can be found. This challenges Albright's underlying assumption of one single surface base: instead of treating them as individual changes and identifying their bases separately, the algorithm of the Minimal Generalization Learner should treat the stem vowel and the stem-final consonant together as the single span in which the structural change happens (Albright 2002b:38). Therefore, the generalized rules should predict the outcomes' stem vowels and stem-final consonants at the same time, and the single global base identified by the model should thus be predictive of both the stem vowel and the stem-final consonant.
3. An alternative: Reanalysis of phonological features. Interestingly, comparison between the attested NHG forms and the expected descendants in NHG based on regular sound change shows some regularity in the paradigmatic change. We first classify Strong Classes into four groups based on stem vocalism.13
3.1. CLASSES V, IV, III.
If we focus exclusively on vowel quality, we find that in NHG, present tense and past tense are associated with particular stem vowels in finite forms. On the one hand, present forms mostly carry /i/ or /e/ (both non-low vowels). On the other hand, past finite forms are systematically related to /a/, a low vowel, while the MHG /u/, a non-low vowel, was driven out of the paradigm. Past participles, again, are marked with non-low vocalism (i.e. /e/, /o/, /u/). It can be assumed that the loss of /u/ in MHG past plural stems was not motivated by pure analogy, but by a reinterpretation of phonological features as morphemes, so that the distinctions between the present and the past indicative stems, and between the past indicative and the past participle stems, are simplified to the presence or absence of the phonological feature [low].

The paradigmatic changes attested in Strong Classes I and II can also be explained in terms of phonological features. First of all, all diphthongs (i.e. /ei/, /ou/) as stem vowels were leveled. In NHG, diphthongs can never be the stem vowel of a past or past participle form in any of the Strong Classes. Also, the past indicative and past participle stems in these classes share the same stem vowel and are differentiated from the present stem by vowel height. In Strong Class I, the past indicative and past participle forms have a high vowel /i/ as stem vowel, while the present stem contains a non-high vowel /a/ (as part of the diphthong). The stem that is leveled has /eː/, which is non-high and consistent with the present stem rather than with the other past and past participle stems. In Strong Class II, it is the present stem that carries a high vowel, whereas the past indicative and the past participle stems do not in NHG. The vowel that is replaced in past stems is /u/, which has the [high] feature, as the vowel of the present stem does.
To summarize, the outcome of the paradigmatic change from MHG to NHG in Strong Classes I and II is that diphthongs do not occur as stem vowel in past forms, and the stem vowel of either the present stem or the past indicative and past participle stems has the feature [high].

Weinhold (1967), but it appears there are more similarities between Strong Class VI and VII. On the one hand, no paradigmatic change happened among past forms in Strong Classes VI and VII. On the other hand, the phonological patterns found in both classes are similar: the stem vowels of the past indicative forms are all high, and most of the present forms and the past participle form share the same vocalism. These characteristics distinguish Strong Classes VI and VII from other Strong Classes.
3.4. SUMMARY. The tendency towards phonological homogeneity of stem vowel found in the paradigmatic change of Strong Classes from MHG to NHG can be summarized in

[-high]; when the past has γ and the past participle also has γ, it means the stem vowels are the same across the two cells.

Compared to the traditional description of NHG grammar, the table above gives a much simpler picture of what has changed from MHG to NHG in the Strong Class verb inflection paradigm. For example, it indicates that in group Low, past indicative stems that have non-low vowels were replaced with those having low vowels, and in group Contrastive, past indicative stem vowels with the same value for the [high] feature as the present stem vowels were eliminated as well. This approach regards changes in the vowel alternation patterns of German strong verb classes as independent of the leveling of the stem-final consonant (Grammatischer Wechsel) and of the change in stem vowel quantity (as attested in Strong Classes IV and V), and therefore makes the creation of NHG stems by combining the stem vowel of one MHG stem and the stem-final consonant of another theoretically possible. By assuming phonological features as the decisive factor in paradigmatic change, this explanation does not need to select any cell in the paradigm as a base for leveling. We do not even need to assume that the change was caused by leveling at all. Though not thoroughly elaborated, the explanation based on phonological features seems to be simpler and more persuasive in accounting for the multiple directionality attested in the paradigmatic change from MHG to NHG than predictiveness-based or frequency-based explanations.
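The feature-based generalization for the groups Low and Contrastive can be stated compactly, as in the following minimal Python sketch. The encoding is my own simplification of the description in Sections 3.1 and following: each cell is reduced to a single Boolean feature value, and the dictionary and function names are hypothetical.

```python
# Group Low (Strong Classes III, IV, V): value of [low] for the NHG stem vowel.
LOW_GROUP = {"present": False, "past": True, "past participle": False}

# Group Contrastive (Strong Classes I, II): value of [high] for the NHG stem vowel.
CONTRASTIVE_GROUP = {
    "I": {"present": False, "past": True, "past participle": True},
    "II": {"present": True, "past": False, "past participle": False},
}


def past_marked_by_low(cells):
    """True if only the past indicative stem carries [+low]."""
    return cells["past"] and not cells["present"] and not cells["past participle"]


def past_contrasts_with_present(cells):
    """True if past and past participle agree in [high] and differ from the present."""
    return cells["past"] == cells["past participle"] != cells["present"]


low_ok = past_marked_by_low(LOW_GROUP)
contrastive_ok = all(past_contrasts_with_present(c) for c in CONTRASTIVE_GROUP.values())
```

Neither check refers to a base cell: the pattern is stated over feature values alone, which is the point of the reanalysis account.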

Conclusion.
Though the paradigmatic change from MHG to NHG appears to be a process of leveling, the solution provided by the Minimal Generalization Learner in the framework of Albright (2002b), albeit possible, is problematic. By contrast, upon closer examination of the data, the assumption that the change was triggered by the reanalysis of certain phonological features as morphemes might be more promising and have more explanatory power.