Morphological and phonological origins of Albanian nasals and its parallels with other laws

The Albanian language is traditionally divided between the Gheg dialect to the geographic north and the now Standard Tosk dialect to the geographic south. Recent literature on the historically isolated dialect of Malsia Madhe (Dedvukaj 2022) has revealed a subdialect which has not undergone the specific phonological sound changes seen in both the Standard Tosk and Modern Gheg dialects. The Tosk dialect is distinct from the dialects of Gheg and Malsia Madhe (Malsia) in that it contains homorganic nasal-stop clusters in positions where they did not occur in various Proto-Indo-European (PIE) reconstructed forms. Three historical processes and the distinct ways in which nasal-stop clusters appear in Tosk are discussed: homorganic nasal assimilation that occurred in the 16th-18th centuries, the insertion of epenthetic stops due to sonority constraints and analogical extension, and constraints on the analogy that can be attributed to the Obligatory Contour Principle (OCP), which restricts Tosk Albanian to one nasal-stop cluster within a single morpheme.

(1) The majority of word-initial NT clusters are produced by a morpho-phonological process whereby the Proto-Albanian preposition *en developed into a prefix, attaching to stems with initial stops (e.g. /ɛn/ + /paj/). The unstressed initial vowel deleted through a regular sound change (apheresis), and the nasal consonant assimilated to the place of articulation of the following stop, which subsequently voiced: /ɛn/ + /paj/ = [mbaj] 'hold, carry' (§3).
(2) The second group of NT clusters formed by stop epenthesis can be divided into three subgroups: those phonetically motivated in (a) and (b), and those that arose through analogy in (c). The relevant processes are discussed in more detail in §3-4.
(28) Proto-Albanian *en-busa > Tosk
Once a vowel and a nasal are introduced in the form en-, an unstressed word-initial vowel appears before an NT cluster. The Malsia forms with initial stops appear to preserve the phonemic environment that was present in Proto-Albanian (4th-7th c. CE) and PIE (22-27). Take for example the formation of the Tosk Albanian adverb nga [ŋga] 'where, from where'. Nga is a shortened form of *ën-ka 'where, from where'. The preposition *ën 'in' is added to the old relative adverb ka 'where, from where' (see Orel 1998: 292; Fortson 2010: 455). Only in Malsia has the Proto-Albanian preposition en 'in' been preserved. In the other Albanian dialects it has been reanalyzed as a verbal prefix, thus losing its prepositional quality. Malsia en 'in' and ka 'where, from where' are separate lexical terms preserved from Proto-Albanian, whereas in Tosk Albanian these two terms attached and ultimately led to the formation of nga < *ën-ka. Note that there is progressive voicing assimilation of the stop and regressive place assimilation of the nasal in Modern Tosk.
If Malsia preserved the preposition en 'in' and the word-initial stops from Proto-Albanian, then the question that remains is: during what period of Albanian did Tosk reanalyze the preposition en as a prefix? It is possible to position this change during the period of Old Albanian (16th-18th c. CE), which encompasses both dialects in the literary period of Old Gheg and Old Tosk. The chronological progression can be exemplified with the PIE terms *bʰer- 'carry' and *peh₂- 'protect', as shown in Figure 1.
In Proto-Albanian (4th-7th c. CE), these terms yield *baj 'carry' and *paj 'hold'. The Proto-Albanian preposition en 'in' from PIE *h₁en 'in' developed into a verbal prefix by the time of Old Gheg and Tosk (16th-18th c. CE). The verbal prefix en- attached to the verbs, forming *enbaj and *enpaj. The Old Gheg priest Gjon Buzuku wrote the form <enbaj>, which means both 'to carry' and 'to hold' (B. Demiraj 1997: 86). The progressive voicing assimilation of the stop and regressive place assimilation of the nasal occurred in Modern Tosk, as well as in Old Gheg three centuries earlier, but the orthography of <enbaj> demonstrates the writer Buzuku's knowledge that en- is a prefix and distinct from the verbal root. Old Gheg and Old Tosk erased the distinction between *en-paj 'hold' and *en-baj 'carry', which merged into *em-baj 'carry, hold'. As the distinction between the verbal prefix and the root continued to weaken, the unstressed word-initial vowel deleted through apheresis. The prefix /en/, now /n/, was reanalyzed into the root and incorporated into the underlying form, yielding Modern Tosk mbaj 'carry, hold'. Modern Gheg maj is a reduction from the cluster mb-.
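The chronology just described can be sketched as an ordered rule cascade. The following is a minimal illustration only, assuming a simplified stop inventory and plain-string transcriptions; the function name `derive_en_prefix`, the two lookup tables, and the strict ordering of voicing before place assimilation are assumptions made for the example, not claims from the analysis.

```python
# Illustrative sketch only: a toy rule cascade for the en- reanalysis.
# The segment tables, the function name, and the rule ordering are assumptions.

VOICE = {"p": "b", "t": "d", "k": "g"}   # voiceless stop -> voiced counterpart
PLACE = {"b": "m", "d": "n", "g": "ŋ"}   # voiced stop -> homorganic nasal


def derive_en_prefix(stem: str) -> list[str]:
    """Return the successive stages from en- + stem to the modern Tosk shape."""
    stop = stem[0]
    if stop not in VOICE and stop not in PLACE:
        raise ValueError("stem must begin with an oral stop")
    rest = stem[1:]
    stages = ["en" + stem]                    # 1. prefixation of en-
    voiced = VOICE.get(stop, stop)            # 2. the stop voices after the nasal
    stages.append("en" + voiced + rest)
    nasal = PLACE[voiced]                     # 3. the nasal takes the stop's place
    stages.append("e" + nasal + voiced + rest)
    stages.append(nasal + voiced + rest)      # 4. apheresis of the unstressed vowel
    return stages


print(derive_en_prefix("paj"))
print(derive_en_prefix("baj"))
print(derive_en_prefix("ka"))
```

Under these assumptions both `paj` and `baj` end in `mbaj`, mirroring the merger of 'hold' and 'carry', and `ka` ends in `ŋga`.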

Figure 1. Albanian en reanalysis
The diachronic progression of the prefix en- is most noticeable in the variation of the Old Gheg and Old Tosk writers. The 16th-century priest Buzuku wrote <peshtetete> '(s)he is leaning on', with a word-initial /p/. A century later the term was written by the priest Budi as <mpshtet> with an initial nasal /m/ < /en/. In other examples, Buzuku used terms with or without the prefix en-, while a century later most writers continually used the prefix. This would entail that the Proto-Albanian reconstructions with the prefixal en- (28-30) rather date to the period of Old Albanian (16th-18th c. CE). Malsia with the word-initial stops is the only Modern Albanian dialect that has preserved the phonemic voiced and voiceless stops of Proto-Albanian. Table 3 below outlines a diachronic development of NT clusters from Old Albanian to Modern Tosk. The specific Albanian variant is given under the century in which it surfaces. For the forms written in the 16th-18th centuries CE, the authors are cited in parentheses under the specific term.

Table 3. Diachronic development of en- from Old Albanian to Modern Albanian

Buzuku's usage of the prefix en- varied. Interestingly, he even wrote the same term with and without the prefix (ënginjën ~ ginjën 'satiate'). The same variation is evident with the Gheg author Budi in the 17th century (nginj ~ gīne). The Old Tosk writer Matrënga wrote the form glīrë without an initial nasal, whereas in Modern Tosk the prefix has been fully reanalyzed into the root as a homorganic cluster [ŋg] in [ŋgij].⁸ The forms in the Malsia dialect show evidence of an earlier isolation and preservation of the Proto-Albanian stops (also evidenced in Figure 1).

8 Old Albanian data taken from Schumacher and Matzinger (2013).
9 Orel (1998: 307). Cf. Sanskrit ápa 'away, off', Greek ἀπό 'from', Gothic af 'from'.
10 Witczak (2016), "The earliest Albanian loanwords in Greek".
11 See Fortson (2010: 451).

To summarize, the process of apheresis was productive in Albanian, evident in Proto-Albanian reconstructions and loan terms. The Old Albanian en- reanalysis and historical apheresis account for the vast majority of Tosk word-initial NT clusters not present in IE.

4. Nasal-stop clusters by sonority-driven/motivated epenthesis. Epenthetic stops (also known as excrescence) surface in Tosk Albanian word-initially (37), medially through a mistiming of the articulatory gestures (38), and medially on a morpheme boundary (39). Example (40) is also a medial cluster but appears word-finally in the indefinite form due to a -∅ suffix in masculine nouns. There are a few options for how to explain the appearance of word-initial, medial, and apparent word-final NT clusters. They could be generated by a phonotactic constraint in Albanian, such as metrical or syllabification requirements or a sonority constraint, or they could be an analogical extension of apheresis, which generates word-initial NT clusters.
4.2 MEDIAL CLUSTERS. Analysis reveals that these clusters can be motivated by sonority optimization. Albanian is not phonotactically stringent in terms of syllabification or sonority profiles, allowing up to (C)(C)CC structured onsets (çndryshk 'unrust') with a high degree of freedom in what type of consonants are allowed within that structure (Hysa 2019: 192); nonetheless certain conditions can optimize these. First focusing on word-medial constructions, we can compare Gheg and Tosk equivalents (41), showing that the NT cluster in Tosk corresponds to the nasal in Gheg, requiring stop epenthesis to achieve the Tosk form. The medial cluster in (41) occurs at the syllable boundary between the nominal root and the case suffix. Knowing that we are inserting a stop word-medially between the nominal root (pen-) and the case suffix -ë (singular feminine indefinite), it is possible to motivate epenthesis (42). The Syllable Contact Law in Murray and Vennemann (1983) states that at points of syllable contact, a language will attempt to maximize the sonority drop from the preceding syllable to the following syllable. This is what appears to be occurring between the nominal root and case suffix. The sonority drop from the nucleus of syllable-1 is maximized by the insertion of a stop before rising again with the onset of syllable-2. In Figure 2, the Latin loan pinna 'feather' would presumably surface in Tosk Albanian as *penë, but in order to maximize the sonority drop from syllable-1 pen- to syllable-2 -ë, Tosk inserts a stop that receives laryngeal and place features from the nasal /n/. Also note the masculine definite forms in (44-46), where the NT clusters in Tosk appear in the medial position.

epenthetic stop; *h₂en-m(e)lit- > Gheg âm(e)li, suggesting that the NT cluster in Tosk is the result of an epenthetic stop.¹⁶ The medial cluster does not occur at a syllable boundary marked by a final article as in §4.2. Rather than positing analogy, it is possible to attribute the change to a miscontrol of the articulatory gestures. Hock (2021: 115-16) calls this "wrong timing epenthesis", citing the development of an oral stop between a nasal stop and a [-nasal] consonant. Examples of this type of change include pre-Greek *anros > andros 'of man' and pre-Greek *amrotos > ambrotos 'immortal'. The switch from -n- to -r- in andros is not completed at once. Hock (2021: 115) states, "Instead, the gesture of nasality is discontinued before the stop articulation comes to an end. The result is a stretch of an oral dental stop." In the Tosk Albanian feminine singular e ëmbël, masculine singular i ëmbël, and masculine plural të ëmbël, the cluster -mb- is not adjacent to /-l/. However, in the feminine plural të ëmbla, the epenthetic stop surfaces between a nasal and a liquid consonant. The Gheg form të âmla provides the environment where the nasal -m- and liquid -l- are adjacent and a miscontrol of the articulatory gestures could have occurred (47).
(47) *të amla > të ëmbla
Interestingly, Murray and Vennemann (1983: 520) state that according to the Syllable Contact Law a hypothetical pattern am.la is preferred over a form at.ia, because the consonantal strength values of l and m differ. Analogy for the origin of the word-initial clusters will be posited in the following section, but in this instance, it appears as if the epenthetically generated stop in the feminine plural të ëmbla may have spread to all case forms of the adjective.¹⁷ In the Italo-Albanian dialects (a Tosk variety), there are forms with epenthetic stops distinguishable from the Standard Tosk forms with a lone nasal. These include zëmbra < zemra 'the heart' and dimbri < dimri 'the winter' (see Çabej 2017: 89). The epenthetic stops in these forms occur in definite forms where the nasal [m] and rhotic [ɾ] are adjacent. These are two other cases, just like *të amla > të ëmbla, that can be attributed to a miscontrol of the articulatory gestures.
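The wrong timing cases cited in this section share one environment: a nasal immediately followed by a liquid. A minimal sketch of that single step is given here, assuming one character per segment and writing [ɾ] as r. The function name and segment tables are hypothetical, and the sketch models only the stop insertion, not the accompanying vowel changes (e.g. a > ë).

```python
# Illustrative sketch only: "wrong timing" epenthesis between a nasal and a liquid.
# Segment classes and the function name are simplifying assumptions.

HOMORGANIC_STOP = {"m": "b", "n": "d"}   # nasal -> oral stop at the same place
LIQUIDS = {"l", "r"}                     # [ɾ] is written as r here


def wrong_timing_epenthesis(form: str) -> str:
    """Insert a homorganic oral stop wherever a nasal directly precedes a liquid."""
    out = []
    for i, seg in enumerate(form):
        out.append(seg)
        nxt = form[i + 1] if i + 1 < len(form) else ""
        if seg in HOMORGANIC_STOP and nxt in LIQUIDS:
            out.append(HOMORGANIC_STOP[seg])
    return "".join(out)


for source in ("amla", "dimri", "anros", "amrotos"):
    print(source, ">", wrong_timing_epenthesis(source))
```

Forms in which the nasal and the liquid are separated by a vowel are left unchanged, which is consistent with the observation that the stop must have arisen in the feminine plural before spreading to the other forms of the adjective.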

INITIAL NASAL-STOP CLUSTERS BY ANALOGY TO APHERESIS.
The last group, which cannot be accounted for with syllable contact epenthesis, is the word-initial NT clusters. The comparative evidence shows that these initial clusters are historically from word-initial nasals. In fact, the insertion of a stop as seen in the Tosk forms actually interrupts the optimal sonority profile (see Figure 3 for Gheg ner ~ Tosk nder).

Figure 3. Syllable Sonority profile of Gheg ner and Tosk nder
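The contrast shown in Figure 3 can be approximated numerically. The sketch below assumes a six-step sonority scale (stops lowest, vowels highest) and one character per segment; the scale values and the helper names `profile`, `onset_rises`, and `contact_drop` are assumptions introduced for illustration. It also computes the medial trough for penë and pendë discussed in §4.2.

```python
# Illustrative sketch only: numeric sonority profiles under an assumed scale.
# Scale values and helper names are assumptions, not taken from the source.

VOWEL = 5
SONORITY = {}
for segments, value in (
    ("ptkbdg", 0),        # stops
    ("fvszh", 1),         # fricatives
    ("mn", 2),            # nasals
    ("lr", 3),            # liquids
    ("j", 4),             # glide
    ("aeiouyë", VOWEL),   # vowels
):
    for seg in segments:
        SONORITY[seg] = value


def profile(form: str) -> list[int]:
    """Map each segment of a broad transcription to its sonority value."""
    return [SONORITY[seg] for seg in form]


def onset_rises(form: str) -> bool:
    """True if sonority never falls between the first segment and the first vowel."""
    p = profile(form)
    peak = p.index(VOWEL)
    return all(a <= b for a, b in zip(p[:peak], p[1:peak + 1]))


def contact_drop(form: str) -> int:
    """Fall in sonority from the first vowel to the lowest point before the next vowel."""
    p = profile(form)
    first = p.index(VOWEL)
    second = p.index(VOWEL, first + 1)
    return VOWEL - min(p[first + 1:second])


print(profile("ner"), onset_rises("ner"))
print(profile("nder"), onset_rises("nder"))
print(contact_drop("penë"), contact_drop("pendë"))
```

On this scale the epenthetic stop deepens the medial trough in pendë relative to penë, but it makes the onset of nder fall before it rises, which is why the word-initial clusters need a different explanation from the medial ones.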
There is, however, a model in the productive apheresis group in Albanian (§3). This large group of words has undergone a process of cluster formation. Analogy with the apheretic group would assume that a stop is inserted in order to form the target word-initial cluster. It is also possible that sociolinguistic pressures to differentiate from clusterless Gheg speakers have reinforced this analogy in some forms and resulted in hypercorrection in Tosk. Gheg NT clusters that have recently arisen may also be due to this form of hypercorrection. Yet, in words like (56), the expectation given the findings with sonority repair is that, in order to maximize the sonority drop, there should be a stop insertion.
(56) a. Tosk /mbɾəmə/ 'last night' *[mbɾəmbə]
     b. Malsia/Old Gheg /pɾɔːm/²⁰
The way to account for the failure of stop epenthesis by sonority repair is to invoke the Obligatory Contour Principle (OCP), which allows for only one marked element (the NT cluster) within a domain.²¹ As seen in (57a-c), this domain is the morpheme. Tosk permits multiple instances of an NT cluster if they occur across morpheme boundaries, such as in the compound in (57a). It does not allow for multiple NT clusters when the term consists of one morpheme (57b-c).
(57) a. /#kəmbə+t͡siŋgθ+i#/ leg/peg+belt 'hopping on one foot'
     b. /#meɾimange#/ 'spider'
     c. /#bind#/ 'convince'
Finally, a lone word, mbrenda 'within, inside', appears to violate this OCP constraint. But an analysis of its etymology confirms the use of OCP within the domain of the morpheme (58). The formation of mbrenda 'within, inside' consists of *en 'in' (the Old Gheg prefix from §3.1), two internal units *per 'for' and *en 'in', and a demonstrative *ta (see 58i).²² Next is the word-initial apheresis and voicing agreement of mbren- + -da (ii). Due to these changes, this polymorphemic preposition became morphologically opaque and was reanalyzed as monomorphemic by Tosk speakers (iii). Because of this reanalysis, except in formal writing, mbrenda colloquially surfaces as brenda, obeying the OCP constraint and eliminating all but one nasal-stop cluster within the morpheme (iv).
This usage of OCP has cross-linguistic typological parallels. The Indo-European law known as Grassmann's Law operated in Ancient Greek and Sanskrit (59-60), disallowing two aspirated plosives from co-occurring in a word and eliminating the leftmost aspiration, much in the same way Tosk eliminated the nasal in the leftmost cluster in (58iv).
(59) Greek /tʰrík-s/ θρίξ 'hair'; /tríkʰ-es/ τρίχες 'hairs'
(60) Sanskrit phal 'burst'; phalati (present), paphala (perfect)
Other typological similarities include Lyman's Law in Japanese, which constrains sequential voicing (rendaku). When two words are compounded, the initial voiceless consonant of the second element voices if that element does not already contain a voiced obstruent (examples 61-62 taken from Vance et al. 2021). In this regard, it is similar to Albanian's OCP constraint, which only takes effect within the domain of the morpheme.
(61) /asa/ 'morning' + /ɸuro/ 'bath' → /asa-buro/ *asa-ɸuro 'morning bath'
(62) /umi/ 'sea' + /kame/ 'turtle' → /umi-game/ *umi-kame 'sea turtle'
A similar phenomenon is also seen in Austronesian and Australian languages. Blust (2012) describes dissimilation patterns that affect geminates or nasal clusters, where there is a dispreference for more than one marked element. When there are two marked elements, one is eliminated, which Blust compares to Grassmann's Law. In summary, it is possible to constrain the generation of Tosk Albanian NT clusters with the Obligatory Contour Principle. With this evidence the following law can be adopted.

Rosenthall's Law: In Albanian, only one instance of a nasal-stop cluster is allowed within a morpheme, preserving any clusters belonging to the phonemic word at the expense of clusters generated by subsequent morpho-phonological processes.²³
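The one-cluster-per-morpheme restriction can be stated as a simple well-formedness check. The sketch below is illustrative only: it assumes that morpheme boundaries are marked with "+", that the relevant clusters are mb, nd, and ŋg (with ng accepted as an alternative spelling of the velar cluster), and the function names are hypothetical.

```python
# Illustrative sketch only: checking the one-NT-cluster-per-morpheme restriction.
# The "+" boundary convention, the cluster inventory, and the names are assumptions.

import re

NT_CLUSTER = re.compile(r"mb|nd|ŋg|ng")


def count_nt_clusters(morpheme: str) -> int:
    """Count the nasal-stop clusters inside a single morpheme."""
    return len(NT_CLUSTER.findall(morpheme))


def obeys_ocp(form: str) -> bool:
    """True if no single morpheme of the form contains more than one NT cluster."""
    return all(count_nt_clusters(m) <= 1 for m in form.split("+"))


for form in ("kəmbə+t͡siŋgθ+i", "mbɾəmə", "mbɾəmbə", "mbrenda", "brenda"):
    print(form, obeys_ocp(form))
```

The compound passes because its two clusters sit in different morphemes, whereas the unattested *mbɾəmbə and monomorphemic mbrenda fail; removing the boundary marks from the compound would make it fail as well, which shows that the morpheme is the relevant domain in this sketch.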

Conclusion. The nasal-stop clusters [mb], [nd], and [ŋg] in Standard Tosk Albanian appear in word-initial, medial, and final positions. By comparing the dialectal variants, we are able to show that the Malsia Madhe dialect preserved word-initial stops from Proto-Albanian and Proto-Indo-European. The word-initial nasal-stop clusters in Standard Tosk Albanian are thus explainable by a rule that eliminated unstressed word-initial vowels (16th-18th c. CE). This caused verbs with a prefix en 'in' to reanalyze the nasal into the beginning of the verb (undergoing place assimilation with the following stop). Thus, *en-paj 'to hold' > mbaj 'to hold, carry'. The same rule eliminated word-initial vowels from Proto-Albanian and in several loan words. Most cases of stop epenthesis in the word-medial environment are accounted for with the Syllable Contact Law, maximizing the sonority drop at the syllable boundary (penë > pendë). The medial cluster that could not be attributed to the Syllable Contact Law appears to be a case of wrong timing epenthesis (ëmbël). Instances of word-initial nasal-stop clusters that cannot be explained by either of the previous processes may be attributable to analogy with the apheresis rule, which was highly productive in Albanian. These processes have their nasal-stop clusters limited by the Obligatory Contour Principle. This new constraint, named Rosenthall's Law, accounts for the appearance of only one nasal-stop cluster within the domain of a morpheme in Albanian.

Table 1. Positional distribution of standard Albanian nasals

2. Attested nasals and gaps. The first step is to confirm which nasals are attested in the Indo-European family. This includes checking Tosk Albanian against Proto-Indo-European (PIE) reconstructions and other Albanian dialects.
2.1 ATTESTED NASALS IN INDO-EUROPEAN. The following examples demonstrate the origins of nasals in Tosk Albanian directly derived from PIE. Examples (3-5) demonstrate bilabial nasals in word-initial, medial, and final positions, and (6-8) demonstrate alveolar nasals in the same environments. ("Albanian" will be used if the term is consistent cross-dialectally; otherwise the dialectal variant will be specified.)
UNATTESTED NASALS IN INDO-EUROPEAN. Examples (9-12) demonstrate where NT clusters can appear in Tosk Albanian and where they are not attested in PIE reconstructions or in the relevant IE languages. The bilabial distributions demonstrate that the NT clusters can occur in the same position as the IE nasals, and without attestation in those positions elsewhere in the IE family. (The alveolar nasals have the same distributions as the bilabials.)

5. Constraining Albanian nasal-stop clusters. Importantly, NT cluster formation through apheresis, sonority repair, wrong timing epenthesis, analogical epenthesis/hypercorrection, and any phonemic clusters should not generate beyond what exists in Tosk Albanian. Examples (51-55) model a synchronic phonology in Tosk Albanian at the time of these changes. Note that only one nasal-stop cluster is permitted in the examples.

'honor of the nation, hero', and i pa njer 'man without honor' (literally: without a person). For this analysis, the specific etymology is not important; what matters is that both possibilities posit an original single nasal /n/ (Alb. njer or Latin honōrem), with an epenthetic /d/ in Tosk [ndeɾ].
19 Variation has also arisen in the Gheg dialects with the addition of clusters, albeit minor. One clear example that may be attributable to hypercorrection is found in Gheg Albanian Reader by Linda Mëniku. The standard genitive case gjinore is written as <gjindore> with a medial cluster -nd- (see Mëniku 2008: xix).