Sociolinguistic perception of lexical and syntactic variation among Persian-English bilinguals

This study examines the relationship between sociolinguistic perception and Persian language variation. Prior work has shown that preconceived notions about how speakers use language and what kind of language they produce can affect listeners’ perceptions (

In line with previous work, this investigation focuses on intra-community perceptions of Persian speakers' language patterns in different contexts: native speakers, heritage speakers, and L2 learners. Given previous work showing variation among heritage speakers and learners in relation to the native-speaker baseline (Megerdoomian 2020), this study examines the perceptual side of language contact interactions. It aims to contribute to ongoing research on contact patterns that indicate, in diverse ways, how listeners formulate categorical assumptions about speakers and apply social meaning to them, relying on language use to extract and establish these meanings. Further investigation into these patterns may offer more insight into how listeners, within various sociological contexts, position themselves and their interlocutors and go about their linguistic interactions based on preconceived notions about the kind of language the people around them produce. Such findings may help illuminate how generalizations, composed of distinct categories and formed on the basis of native-speaker ideologies, are established across a wide range of individual speakers.
1.1. NON-PHONOLOGICAL VARIATION. By and large, much of the literature on sociolinguistic perception is derived from examinations of sociophonetic variation (e.g., Campbell-Kibler 2009; Foulkes & Docherty 2006; Hay et al. 2006; Johnson 2006). This body of research has provided abundant evidence that language users incorporate social information in their language practices. While phonetic data have contributed to these insights, there has been little work looking at other levels of grammar and language use such as morphology, syntax, and pragmatics, though some scholars have argued for more work in this realm (Cheshire 1987).
Despite this, recent work has begun to look at patterns beyond the phonetic level. For example, Squires (2013) examined morpho-syntactic variation in English subject-verb agreement, suggesting that this type of variation is subject to some of the same patterns observed with phonetic variables. Squires also notes that perceptual bidirectionality (the idea that linguistic variation and social perception affect one another equally) appears to exhibit a more asymmetric and unidirectional pattern. More specifically, it seems that certain types of linguistic information place more significant constraints on perception than other variables do. The study at hand has implications for this question as well, addressing how listeners use forms of morpho-syntactic and lexical variation to arrive at conclusions about speaker persona. The suggestion that patterns of directionality in perception may favor one side could have implications for the heavy role that language plays in how people position themselves and their interlocutors at the interactive level.
In the same vein, Montrul and Sanchez-Walker (2013) looked at omission rates of Spanish differential object marking (DOM) among monolingual Mexican Spanish speakers and U.S. Spanish bilinguals. Their findings showed that while monolingual speakers never omitted the object marker a for direct animate objects, both heritage speakers and first-generation immigrants did so at varying rates. Follow-up studies have corroborated these findings, showing additionally that listeners perceive these omission patterns as acceptable (Montrul, Bhatt, & Girju 2015).
In terms of variation and change in language contact, recent work demonstrates many contact-induced patterns beyond the phonological level among bilingual speakers in contact situations (Albirini et al. 2013; Montrul et al. 2019; Yager et al. 2015). These investigations, however, note that bilingual acquisition of certain features is not uniform across levels of grammar. Notably, the literature suggests that heritage bilinguals do not exhibit contact-induced change in syntactic constructions (Benmamoun, Montrul, & Polinsky 2006; Montrul & Polinsky 2010; Bolonyai 2002; Paradis & Genesee 1996; Tsimpli et al. 2004) while simultaneously suggesting that lexical acquisition and inflectional morphology are more subject to change and attrition (Montrul 2004; Montrul & Potowski 2007; Tsimpli et al. 2004). While many studies have investigated sociolinguistic perception of phonetic variation, examining differences in the perception of other levels of grammar may offer fuller insight into how language users draw on various parts of language in constructing socio-cultural norms and categories.
1.2. SOCIOLINGUISTIC PERCEPTION AND HERITAGE SPEAKERS. This study examines contact situations and, in particular, the sociolinguistic nature of interaction among Persian speakers in the U.S. Various studies have examined heritage speaker populations with the suggestion that these populations exhibit patterns divergent from native speakers and L2 learners and may be perceptually classified outside of constructed native-speaker norms (Chang & Yao 2016; Godson 2004; Polinsky 2018; Yeni-Komshian et al. 2000). Although the body of work observing linguistic patterns among self-identified heritage speakers is extensive and growing, there has been little work investigating the range and implications of sociolinguistic perception in terms of personae. However, there is evidence that bilingual speakers may construct internalized perceptions of their own language use, and that of other bilinguals, as less proficient, even when this does not match their true competence (Montrul 2006). Although the assessment of attitudes is not relevant for the study at hand, what is important here is that previous work shows that strong social perceptions are at play among speakers in contact contexts. Investigating how these perceptions play out and how social meaning is applied across speaker identities may add to this developing literature.
In addition to the question of how heritage speakers situate themselves in the social terrain of identity, the question of how linguistic variation is perceived is relevant for looking at notions of nativeness and its effects. Here, I use nativeness to refer to the set of ideologies and attitudes that uphold certain forms of language use as the most "correct" or standard way of speaking. Not only does this yield the construct of the native speaker in relation to other speakers, but it also makes way for others to be positioned as non-native speakers and, therefore, as possessing less-than-desirable language skills. The dichotomy of nativeness (i.e., native vs. non-native) plays a significant role in how speakers are positioned by their interlocutors (Brutt-Griffler & Samimy 2001). Additionally, native speaker ideologies do not classify a speaker solely on the basis of their language use patterns; they are also heavily informed by other structures such as race (Bonfiglio 2010). Within this long-established paradigm, which grants linguistic authority and credibility uniquely to self-identified native speakers, the push to recognize non-native speakers as credible authorities over language use began in the early 2000s, specifically in the realm of pedagogy (Braine 2012). Questions remain as to how far such movements have progressed and to what extent native speaker ideologies permeate certain linguistic communities.
1.3. IRANIAN AMERICAN LANGUAGE CONTACT. Compared to many other U.S. immigrant groups, the Iranian American community stands out as relatively new, with large waves of migration occurring both right before and immediately after the 1979 Iranian Revolution (Modarres 1998). Much of the Iranian American population has settled primarily in parts of California, with communities existing in other U.S. states as well. Modarresi (2001) notes that in the years of uncertainty following the revolution, most individuals of Iranian descent decided to settle permanently in the U.S., consolidating their communities and laying the sociocultural basis within which the next generation would grow up. Against this backdrop, Modarresi also points out that with the eventual permanent establishment of Iranian Americans as a diasporic community, various cultural institutions have emerged in which second-generation Iranians could be exposed to Persian language and culture.
Though still relatively sparse, recent work has managed to document and observe some of the ongoing processes of cultural and linguistic maintenance at work in the Iranian American community. Though rapid assimilatory processes of Americanization have been observed, many Iranian Americans may achieve strong language maintenance through a variety of means, such as family home policies, educational spaces, and cultural institutions (Bozorgmehr & Meybodi 2016; Kaveh 2018; Salahshoor 2017). Taken together, this suggests that many Iranian American parents are actively engaged in maintaining Persian linguistic practices well into the second generation. The process of Americanization mentioned above, though, is still a strong sociopolitical force that may manifest itself strongly in linguistic practice (Baker 2006; Morales 2016).
1.4. PERSIAN LANGUAGE VARIATION IN THE U.S. In its current state, the body of sociolinguistic knowledge on Persian heritage language variation is still a growing realm of inquiry with many questions left unanswered. Most research in this area has focused specifically on phonological variation (Sheikhbahaie 2020) and on pedagogy, especially in conjunction with second language learners (Atoofi 2013a, 2013b; Sedighi 2010; Taleghani 2020).
Notwithstanding some of the gaps in the literature, there has been some work addressing other levels of grammar from a native vs. non-native perspective. Megerdoomian (2020) examined linguistic competence among heritage speakers and learners in comparison to native speakers on a wide range of measures, including morphosyntax. Using a variety of distinct tasks, Megerdoomian showed that all three groups differed from each other in several ways, with morphosyntax, syntax, and lexical variation yielding a wide range of proficiency levels among non-native speakers (heritage speakers and learners). Megerdoomian specifically claims that while heritage speakers tend to demonstrate near-native proficiency of certain syntactic features such as negative polarity and outperform learner groups in these areas, elsewhere learners may outperform them. This is especially the case with conjunctions and prepositional choices, where heritage speakers tend to show evidence of English transference.
Given evidence suggesting variability in the language practices of self-identified Persian heritage speakers, it may come as no surprise that many within this population exhibit distinct patterns of acquisition and practice relative to constructed native norms. Given this and the various maintenance efforts that exist within the U.S. Persian-speaking community, the dichotomies of native vs. non-native and heritage vs. non-heritage that enter into the sociocultural model of linguistic interactions between Persian bilinguals are important to examine, since the way speakers are positioned on the basis of these norms could affect their future language choices and how such variation is exhibited overall.
1.5. CURRENT STUDY. Overall, previous work suggests a complex array of patterns in the linguistic practices of bilingual Iranian Americans. Working beyond the scope of Persian as a heritage language, ongoing research in this realm could have implications regarding the nature of language contact that bilingual speakers, and specifically heritage speakers, find themselves in. Given the amount of evidence already showing how much our linguistic practices are contextualized within frameworks of social meaning, it would be noteworthy from a theoretical perspective to see how these social dynamics play out in interactions affected by language contact. This study aims to answer the following questions: (1) in what ways do bilingual Persian speakers classify other Persian speakers, and (2) what grammatical features make it more likely that a speaker will be classified as a native speaker, heritage speaker, L2 learner, or one of the various categories between these specific personae? While the scope of this study only includes the perceptual realm of speaker classification, results could suggest other patterns that Persian bilinguals may rely on when socially categorizing their interlocutors. For example, speakers may assume that someone is a learner and use that information in future interactions with the same interlocutor, which could then affect future language practices with that interlocutor.
The grammatical features examined in this study were selected because prior work demonstrates some level of variability in usage rates between native and non-native speakers, including heritage populations. Differential object marking, while not directly investigated in immigrant groups, has been noted by some scholars to be a feature that does not present salient difficulties for heritage speakers (Sedighi 2018). Although we could frame morpho-syntactic variation on the binary basis of "correct" versus "incorrect" choices, lexical borrowings present another kind of variation that may offer distinct insights into how speakers view the use of words of non-Persian origin.
The remaining sections are organized as follows. Section 2 outlines the methods used to examine social perception among bilinguals. Section 3 presents the overall results, with regression models for each response type along with likelihood ratings. Finally, Sections 4 and 5 discuss implications of these results, go over study limitations, and draw conclusions that may inform future work on sociolinguistic perception and Persian language contact.

2. Methods.
2.1. PARTICIPANTS. Native and heritage bilinguals from the Iranian American community were recruited through online advertisements and personal contacts (n = 16, 10 native speakers). Post-survey questionnaire responses helped index participants' linguistic backgrounds. Because of the vast level of variation present among speakers classified as "heritage" speakers (Polinsky & Kagan 2007), the questionnaire served to obtain rough estimates of each participant's linguistic background, including frequency of speaking and listening, and level of proficiency in speaking, listening, and writing.
2.3. PROCEDURE. The survey was designed in the format of a game in which participants read a set of sentences and had to choose who they believed might have written each one. Three fictional characters with animated pictures representing a native speaker, a heritage speaker, and an L2 learner were used for the response choices. A short description and an image of each character were given (example in Figure 1), with the intention of suggesting these respective speaker identities to the participants. Careful attention was paid to the attire and name choices of each character, so that they accurately followed everyday sociocultural conventions. The purpose of this detail was to provide participants with a model persona indexed for each speaker identity and to see whether any perception of language use was tied to each respective persona.
After the characters were introduced, participants were told they would see twenty-six sentences written in Persian script along with a romanized transliteration. The transliteration was provided to level the stimuli for any speakers who were potentially not familiar with Persian script. Participants were then instructed to read each sentence and guess who they thought might have written it. Response choices could consist of one character, two, or all three; the procedure instructions noted that participants could choose all three if they were unsure of whom to guess. After the survey, participants were directed to a post-survey questionnaire asking about their background and Persian language experience.
The survey task was not forced choice. As such, participants could choose one persona or a combination of several personae: (1) Native Speaker, (2) Heritage Speaker, (3) L2 Learner, (4) Native Speaker and Heritage Speaker, (5) Heritage Speaker and L2 Learner, (6) Native Speaker and L2 Learner, and (7) All Speakers. Of these, (7) and (6) were excluded from the analysis. The former was omitted because it was framed as the choice that participants could use if they were uncertain about what kind of speaker may have produced a certain token. The latter was omitted because it was chosen only a handful of times across all participants.
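The response scheme above can be illustrated with a short sketch. This is not the coding used in the study, which the text does not spell out; it assumes, purely for illustration, that each retained response is turned into one binary indicator per persona (so a combined choice counts toward each persona it names), and that response types (6) and (7) are dropped. The names `RESPONSE_TYPES`, `EXCLUDED`, and `code_response` are hypothetical.

```python
# Illustrative sketch only: one possible way to code the seven response types.
# Assumption (not stated in the source): a combined choice such as (4)
# contributes a 1 to every persona it includes.

PERSONAE = ("native", "heritage", "learner")

# Response type number -> set of personae selected, as listed in Section 2.3.
RESPONSE_TYPES = {
    1: {"native"},
    2: {"heritage"},
    3: {"learner"},
    4: {"native", "heritage"},
    5: {"heritage", "learner"},
    6: {"native", "learner"},
    7: {"native", "heritage", "learner"},
}

# Types (6) and (7) were excluded from the analysis.
EXCLUDED = {6, 7}


def code_response(response_type):
    """Return a 0/1 indicator per persona, or None for an excluded type."""
    if response_type not in RESPONSE_TYPES:
        raise ValueError(f"unknown response type: {response_type}")
    if response_type in EXCLUDED:
        return None
    chosen = RESPONSE_TYPES[response_type]
    return {persona: int(persona in chosen) for persona in PERSONAE}


if __name__ == "__main__":
    for response_type in sorted(RESPONSE_TYPES):
        print(response_type, code_response(response_type))
```

Under this assumed coding, each persona's indicator could serve as the binary outcome of a separate model; other codings, such as treating each of the retained response types as its own category, are equally compatible with the description in the text.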

3. Results. This analysis examines how bilingual Persian readers perceive different morpho-syntactic and lexical forms as corresponding to personae with different linguistic backgrounds: heritage speaker, native speaker, and L2 learner. Although the procedure was designed to visually present three choices, the survey was not a forced-choice task, so participants could select combinations of answers: (1) Native Speaker, (2) Heritage Speaker, (3) L2 Learner, (4) Native Speaker and Heritage Speaker, (5) Heritage Speaker and L2 Learner, (6) Native Speaker and L2 Learner, and (7) All Speakers. Response types (6) and (7) were excluded from the overall analysis (see Section 2.4 for exclusions). Figure 3 shows the number of both standard/correct and non-standard/incorrect tokens that were classified into each respective category, suggesting that most tokens tagged as incorrect (grammatical errors or constructions deviating from standardized norms) were classified into non-native speaker categories. This is not a surprising result: given prior work on variable patterns of non-standard language use among non-native populations (Megerdoomian 2020), many Persian speakers will most likely consider these types of constructions as not coming from a native speaker.
To calculate maximum likelihood effects of grammatical category on the choice of a particular persona and to model variable outcomes, a generalized logistic mixed-effects regression model was designed for each persona response, given the current data. Preliminary results suggest that for the native speaker persona response choice, lexical borrowings (estimate = -3.4333, Pr = 0.00479) and differential object marking errors (estimate = -2.9746, Pr = 0.00859) were significant predictors, making it less likely that participants would rate a sentence as coming from a native speaker. For the heritage response choice, no fixed effects appeared significant. For the learner response choice, differential object marking errors (estimate = 1.6906, Pr = 0.0341) and negative concord errors (estimate = 1.7677, Pr = 0.0291) were significant, suggesting that perceived mistakes in these grammatical classes made it more likely that a person would be perceived as a learner. Interestingly, subject identity also turned out to be significant for both the native speaker persona response (estimate = -2.6846, Pr = 0.000755) and the L2 learner persona response (estimate = 2.6608, Pr = 1.73e-06). This suggests that native speakers were more likely to rate incorrect productions as coming from L2 learners than from native speakers, which may not be surprising if we consider attitudes about L2 learners possessing relatively lower language proficiency.
Figures 4-6 respectively show the likelihood rating values for each persona category, within each grammatical class. One key observation is that, in Figure 4, heritage speaker participants appeared more likely to rate non-standard stimuli as coming from a native speaker. This is visible in most grammatical categories besides lexical variation, in which both types of participants appeared to rate borrowings as not coming from native speakers at similar levels.
For the heritage speaker response likelihood (Figure 5), results visually appear more mixed. In the lexical borrowing category, native speaker participants were much more likely to rate these productions as coming from heritage speakers, though this difference should be treated as negligible given the lack of statistical significance. Future work aims to increase the sample size (n), and with it statistical power, to better analyze how this population perceives grammatical and lexical variation. Overall, heritage speakers appeared slightly more likely to rate non-standard productions as coming from a heritage speaker. In Figure 6, native speaker participants appeared much more likely in almost all grammatical categories, save for lexical borrowings, to rate non-standard stimuli as coming from an L2 learner persona. This is especially the case for prepositions, where one can observe a large difference in the response likelihood values between heritage speaker participants and native speaker participants. Regarding the lexical borrowing category, the low likelihood of tagging these productions as belonging to a learner suggests that native and heritage speakers are aware of the common use of these borrowings in the Persian lexicon. More work is needed to corroborate this.
Overall, across both subject groups, statistical analyses suggest that non-standard productions in differential object marking, lexical borrowings, and negative concord carried more weight in terms of how likely someone might be rated as a learner versus a native speaker. Participants seemed to rely on object marking as a sign that someone was either (1) not a native speaker or (2) an L2 learner. Participants were also more likely to rate non-standard borrowings as not coming from a native speaker, and non-standard productions in negative concord as coming from a learner. Additionally, given the significant subject-level effects, native speaker participants were more likely to rate non-standard productions as not coming from a native speaker.
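The estimates reported in this section are on the log-odds scale, as is standard for logistic models, so exponentiating them gives odds ratios that can be easier to interpret. The sketch below performs only that conversion on the values quoted in the text. The dictionary layout, the predictor labels, and the function name `odds_ratio` are illustrative assumptions; the reference levels behind each estimate are not stated in the text, so each ratio should be read relative to whatever baseline the fitted models used.

```python
import math

# Log-odds estimates as quoted in Section 3, keyed by
# (persona response model, predictor). Labels are illustrative.
REPORTED_ESTIMATES = {
    ("native", "lexical borrowing"): -3.4333,
    ("native", "differential object marking error"): -2.9746,
    ("native", "subject identity"): -2.6846,
    ("learner", "differential object marking error"): 1.6906,
    ("learner", "negative concord error"): 1.7677,
    ("learner", "subject identity"): 2.6608,
}


def odds_ratio(log_odds):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(log_odds)


# Odds ratio for every reported estimate.
ODDS_RATIOS = {key: odds_ratio(value) for key, value in REPORTED_ESTIMATES.items()}

if __name__ == "__main__":
    for (model, predictor), ratio in ODDS_RATIOS.items():
        print(f"{model:8s} {predictor:36s} odds ratio = {ratio:.3f}")
```

An odds ratio below 1 (for example, about 0.03 for lexical borrowings in the native speaker model) corresponds to lower odds of that persona being chosen, while a ratio above 1 corresponds to higher odds. With only sixteen participants, these point values carry considerable uncertainty.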

4. Discussion and limitations.
The purpose of this study is to explore what kinds of grammatical cues are most salient for bilingual Persian listeners/readers in classifying a production as coming from a native speaker, a heritage speaker, or an L2 learner. Exploratory analyses suggest that speakers tend to assign non-standard productions to non-native speaker personae, with significant results suggesting potentially strong perceptions in rating certain productions as either (1) not coming from a native speaker or (2) directly coming from an L2 learner. Productions deviating from standard forms were sometimes assigned to heritage speakers as well, though this seemed negligible, and more data are needed for a fuller analysis.
Based on the likelihood analyses looking at grammatical categories, differential object marking appeared to be the category to which bilinguals gave the most perceptual weight. In other words, perceived errors in this category appeared to make it more likely that sentence productions would be perceived as not belonging to a native speaker.
While they are based on a small amount of data, these preliminary results suggest that bilingual speakers are conscious of, and construct associations with, forms of variation that stem from language contact. In the case of lexical borrowings, Persian speakers appear to regard the heritage speaker persona's repertoire as influenced by English lexical choice patterns. This is interesting given that anecdotal evidence suggests that English borrowings are used extensively by speakers categorized in the native speaker population. In the case of grammatical correctness or deviations from standardized forms, results corroborate expectations about what Persian bilinguals associate with learner speech. While non-standardized lexical variation was assigned primarily to heritage speakers, other forms of syntactic variation that could be classified as "incorrect" were assigned to L2 learners. This is not surprising either if we think about the influence of native speaker ideologies in categorizing speaker productions. In the case of Persian language interactions, non-standardized forms of syntactic and morpho-syntactic variation are perceived to be produced by learners because these forms are simply regarded as "incorrect" and, therefore, not likely to be heard or seen from a native speaker (or even a heritage speaker).
It is important to note that this study is ongoing and remains at a preliminary stage. At this point, only sixteen participants have fully completed the study, which includes the post-survey questionnaire that asks for background information and language experience. As more participants are recruited, results from exploratory data analyses may change and reveal other patterns at play in the sociolinguistic perception of lexical and syntactic variation. Another limitation stems from the survey design. In particular, the questions were not forced choice. As such, while participants were presented with three options (i.e., native speaker, heritage speaker, L2 learner), there were seven total response types, since participants could choose more than one speaker persona. In the analysis of mixed-choice responses, although the seven response categories are treated as unique values, the unique associations that listeners construct for linguistic variables might not show up in the data. However, this design was retained because the study aims to observe how listeners and readers perceive speaker productions without introducing forced choices that may hinder us from observing real-world scenarios of perception.

5. Conclusion.
The data at hand suggest that certain grammatical categories, such as object marking and lexical variation, could predict how someone will classify their interlocutor. As mentioned previously, only a small number of participants have completed the entire survey, and this is part of an ongoing project, so the data analysis presented here is subject to change in the future. In any case, preliminary results point to possible insight into how Persian-English bilingual speakers perceive their interlocutors and what kinds of grammatical features are used in forming these perceptions.
Future studies may expand on these findings to examine not just perceptual patterns and social classification, but also attitudes associated with perceived speaker personae. Examining the kind of judgements associated with categorized speaker backgrounds could be revealing as to the choices that speakers (especially heritage speakers and learners) tend to make due to having conscious awareness of these social attitudes. In addition, since this study exclusively utilizes written data, future work could examine sociolinguistic perception in spoken interactions, which may reveal other linguistic cues that are used by listeners to socially categorize speakers.
Overall, investigations into sociolinguistic perception can reveal a plethora of information about our daily interactions within the diverse array of contact situations that speakers constantly find themselves in. The multilingual Iranian American community is just one community that finds itself in a world of continuous linguistic contact, as the same can be said for many other communities of practice. Within and across all these communities, many of which are formed based on all types of experience, investigations in perception are bound to offer more insight about how such patterns play a large role in our everyday interactions and relationships.