Adjective ordering in Arabic: Post-nominal structure and subjectivity-based preferences

Adults have a collective tendency to choose certain adjective orderings in nominals with multiple adjectives. For example, English-speaking adults prefer the order big blue box over blue big box; they are uncomfortable with the latter ordering, yet they are unable to articulate why. Scontras, Degen & Goodman (2017) showed that subjectivity is a robust predictor of adjective ordering preferences in English. That is, less subjective adjectives are preferred closer to the noun. In the example big blue box, big is more subjective than blue, so it is preferred farther from the noun. This paper investigates adjective ordering preferences in Arabic, a language with post-nominal adjectives (i.e., a language where adjectives occur after the noun they modify). We have found that native speakers of Arabic have adjective ordering preferences, and, like English, these preferences are predicted by subjectivity. In addition to establishing the preference baseline in monolingually-raised Arabic speakers, we also ask what happens to ordering preferences in heritage speakers: bilinguals who shifted their language dominance from Arabic to English early in childhood.

1. Introduction. Adjective ordering preferences have received considerable attention from the linguistics community. The cross-linguistic robustness of these preferences evidences cognitive properties that shape language. These preferences have been observed in languages with prenominal adjectives like English, Hungarian (Uralic), Telugu (Dravidian), Mandarin Chinese, and Dutch, to name a few (Dixon 1982; Hetzron 1978; LaPolla & Huang 2004; Sproat & Shih 1991), and in languages with post-nominal adjectives (e.g., Indonesian; Martin 1969). The findings suggest that the ordering of multi-adjective strings is non-arbitrary. Work by Scontras et al. (2017) demonstrates that subjectivity is a strong predictor of these preferences in English. In light of this finding, one wonders whether subjectivity predicts these preferences in post-nominal languages as well. Rosales Jr. & Scontras (2019) investigated this question in Spanish, a language with post-nominal adjectives. Results revealed that Spanish speakers do not exhibit adjective ordering preferences. However, the multi-adjective strings tested featured conjunctions of adjectives (e.g., blue and big box), which could explain this result. Indonesian, another post-nominal language, was shown to have adjective ordering preferences (Martin 1969). However, the experimental data collected in Martin's study is no longer available for analysis and is not comparable to more recent work on ordering preferences. To see whether post-nominal adjectives are responsible for the lack of stable ordering preferences in Spanish, we investigated these preferences in Arabic. Arabic is a language with post-nominal adjectives like Spanish, but, unlike Spanish, Arabic multi-adjective strings are freely formed without conjunction.
In this paper, we investigate (i) whether Arabic possesses adjective ordering preferences, and, if so, (ii) to what extent adjective subjectivity predicts these preferences. The paper is structured as follows: We provide background on subjectivity-based adjective ordering preferences in English, then the relevant background on the structure of adjectival modification in Arabic. We then present the results of two experiments: the first measuring ordering preferences in native speakers of Arabic, and the second measuring adjective subjectivity in these speakers; comparing the results of the two experiments, we determine the predictive power of subjectivity in Arabic ordering preferences. We further investigate adjective ordering preferences in heritage speakers of Arabic, and conclude by comparing our results for both baseline and heritage speakers.
2. Background. Adjective ordering preferences have long been studied, and most of the research in this field arrives at the same finding: in multi-adjective strings, some adjectives are preferred closer to the modified noun than other adjectives. These findings are not limited to English; they hold cross-linguistically. The robustness of these findings leads to the question of where these ordering regularities come from. Answers to this question have arisen from different approaches.
A null hypothesis would be that adults simply repeat what they have heard before. That is, adults produce the phrase the big blue box because they have heard it this way before. However, this hypothesis is unable to account for preferences in novel phrases previously unheard, as in the tiny green magical mouse-riding gnomes (Bar-Sever et al. 2018). A number of hypotheses have emerged discussing the source of these preferences. A lexical-class approach to this phenomenon presupposes that adjectives come pre-sorted into discrete semantic classes and are grouped together depending on their properties. For example, SIZE adjectives are grouped together and COLOR adjectives are grouped together. Moreover, some semantic classes are preferred closer to the noun than other semantic classes, forming a hierarchy of classes and their distance to the modified noun (see (1); Dixon 1982). That is, COLOR adjectives are closer to the noun than SIZE adjectives, for example.
(1) Quality > Size > Shape > Color > Nationality > Noun
This hierarchy, however, does not explain why these preferences exist: why should the hierarchy be ordered as in (1), and not, say, in the reverse order? Other approaches suggest a relation between preferred distance and the meaning of adjectives. That is, adjectives closer in distance to the noun are closer in meaning to the noun (Sweet 1898), or inherit more characteristics of the noun (Whorf 1945). According to Sweet, in the example the big blue box, blue is closer to the noun in meaning than big, and that is why blue is closer to the noun. Sproat & Shih (1991) argue that adjective ordering is associated with absoluteness: adjectives which hold absolute properties occur closer to the noun, such as shape, color, and nationality adjectives; adjectives that hold relative properties such as size and quality occur farther from the noun.
Recently, Scontras et al. (2017) documented a robust empirical generalization concerning adjective ordering preferences in English. The authors show that the order of adjectives in multi-adjective strings is predicted by subjectivity: less subjective adjectives occur closer to the modified noun. In the big blue box, blue is considered less subjective than big; two people are more likely to agree on whether the box is blue, but whether or not the box is big depends on each person's experience with box sizes. That is, one person might consider a box big, because he or she is used to small-sized boxes, while the other person disagrees and considers it small.
The importance of subjectivity lies in the importance of successful communication: speakers aim to maximize their listeners' understanding of their utterances, and thus they minimize the distance between the objective adjectives and the nouns these adjectives modify. In an attempt to formalize this notion, recent work by Scontras et al. (2019) proposes that adjectives that are linearly closer to the noun are usually structurally closer. That is, adjectives are used to establish reference, and having more objective adjectives closer to the nouns aids in making that referent clear to the listener. Hahn et al. (2018) studied the source of this phenomenon from an information-theoretic perspective. The authors assumed that adjectives are usually used not to establish reference but to share attitudes and perspectives. Therefore, integrated with a memory limitation model, subjectivity preserves the intended goal of communication. In other words, because language processing is incremental, and people tend to forget what they have heard farther in the past, having more subjective adjectives farther from the nouns serves as a better strategy for sharing the speakers' mental state.
Scontras et al. (2017) constructed two experiments to test their hypothesis. In the first experiment, they measured adjective ordering preferences using native English speakers' judgments about multi-adjective strings; they asked participants to indicate their preferences on phrases like the big blue box vs. the blue big box. Then, in their second experiment, they established a behavioral measure to determine subjectivity. They presented participants with adjectives and asked them how "subjective" each was on a scale from "completely subjective" to "completely objective". The authors then validated their subjectivity measure with a faultless disagreement task (Kölbel 2004; Kennedy 2013; Barker 2013; MacFarlane 2014).
In a faultless disagreement task, participants encounter a scenario in which two people argue about whether an object has some property, for example whether a box is big or not. The participant has to determine if both people can be right or if one of them must be wrong. If the participant indicates that both people in the scenario can be right, then the adjective is subjective.
The methodology of Scontras et al. (2017) has been extended to Tagalog (Samonte & Scontras 2019) and Spanish (Rosales Jr. & Scontras 2019). Tagalog, a language with prenominal adjectives like English, does have adjective ordering preferences and these preferences are predicted by subjectivity. Spanish, a language with post-nominal adjectives where speakers prefer to use conjunction to form multi-adjective strings, does not exhibit adjective ordering preferences, at least not in multi-adjective strings formed with conjunction (cf. Scontras et al. 2020). We use the same methodology to test for adjective ordering preferences in Arabic, a language with post-nominal adjectives where speakers do not prefer to use conjunction in multi-adjective strings.
Arabic consists of two varieties: Spoken Arabic and Standard Arabic (also referred to as Modern Standard Arabic, MSA, classical Arabic, or literary Arabic). The two varieties are used in different situations. Spoken Arabic is used in communication in everyday life, while Standard Arabic is used in schools, media, politics, speeches, academic research, and formal documents. In the written form, Standard Arabic remains almost exclusively the only recognized language of literacy across the Arabic-speaking world. In Standard Arabic (and Spoken Arabic), adjectives attach to the nouns post-nominally as the default. While it is the case that pre-nominal adjectives are possible (see example (2); Fehri 1999), these adjectives come with special syntactic properties and have their own interpretations in Arabic grammar. Postnominal adjectives occur much more frequently, and they agree with the noun in definiteness, case, number, and gender. In this paper, we only consider post-nominal adjectives.
(2) labis-tu jamīl-a al-thiyab-i
    wore-I beautiful.ACC the-clothes-GEN
    'I wore the (most) beautiful (of) clothes.'
(3) al-korsey-u al-'azraq-u al-saghīr-u
    the-chair-NOM the-blue-NOM the-small-NOM
    'the small blue chair'
Although Standard Arabic is not spoken on a daily basis by Arab people, there are several reasons why this research tests Standard Arabic. First, all the spoken dialects of Arabic are tied to Standard Arabic. That is, dialects share a lot of vocabulary with Standard Arabic. The amount of shared vocabulary varies from one dialect to another, but speakers are more likely to interpret Standard Arabic easily than to interpret another dialect. Recent work by Abou-Ghazaleh et al. (2018) investigating the neural basis of the diglossic situation in Arabic using fMRI showed that there is no difference in brain activity between naming objects in Standard and Spoken Arabic. Given the diglossic situation in Arabic, and since our experiments present stimuli in written form, we use Standard Arabic.
In terms of adjective ordering, a lot of research has been conducted on the adjectival system and adjective constructions in Arabic (Al-Sharifi & Sadler 2009; Al-Shurafa 2006; Kremers 2003), but very little research has studied the ordering of multiple adjectives in one phrase. Sproat & Shih (1991) argue that Arabic adjectives are freely ordered when combined together, while Shlonsky (2004) suggests that this is not confirmed in the Arabic literature. Fehri (1999) proposes that post-nominal Arabic adjectives follow the mirror image of the order found in pre-nominal languages. Panayidou (2014) studies adjective ordering in Cypriot Maronite Arabic (CMA), which is a dialect of Arabic spoken by the Maronite community of Cyprus, and is highly influenced by Greek. Panayidou argues that adjective ordering in CMA is not flexible. However, none of the work mentioned above has employed experimental approaches to this phenomenon. Also, none of the work that suggested the presence of ordering preferences in Arabic gives an explanation of why this happens, or links this phenomenon to cognition.
In this paper, we use experimental data to show that Arabic in fact exhibits adjective ordering preferences. We replicate the methodology of Scontras et al. (2017), showing that these preferences are predicted by subjectivity. Then, using the same methodology, we explore the robustness of these preferences by studying English-dominant heritage speakers of Arabic.
3. Experiment 1: Ordering preferences in native Arabic. Replicating the methodology of Scontras et al. (2017), we begin by measuring adjective ordering preferences in native speakers of Arabic.
3.1. PARTICIPANTS. 135 participants were recruited via Mechanical Turk, a crowdsourcing service by Amazon.com. 24 participants were identified as native speakers of Arabic. To assess language status, participants were asked a series of demographic questions. For example, participants were asked what their first language is, and what language they use at home. We identified native speakers as those participants who indicated their first language as Arabic and who continue to speak Arabic as their dominant language today, and who lived in an Arabic-speaking country before and after the age of eight for more than five years. To ensure that the participants were not answering randomly, we appended three simple Arabic catch questions which contained phrases that required diagnosing gender agreement, and asked the participants [...] (4)). We only considered participants who answered all three catch questions correctly.
3.2. DESIGN. Participants were asked to indicate their preferences for multi-adjective strings. To do so, participants judged which order of a multi-adjective string sounded more natural. Multi-adjective strings were random combinations of 25 adjectives that came from 7 semantic classes (i.e., quality, age, texture, size, shape, color, and material adjectives; see table 1). The adjectives described nouns that were sampled randomly from a set of 10 nouns (see table 2). For example, participants were presented with two phrases: the Arabic equivalents of the small brown chair and the brown small chair, and they were asked which order they preferred (Figure 1). Where possible, adjectives and nouns were direct translations of the materials from Scontras et al. (2017). Participants answered by adjusting a continuous slider with endpoints labeled with the competing multi-adjective strings. The distance from the slider to each endpoint indexes the degree of preference.
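The construction of a single trial described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the adjective and noun lists are a small hypothetical subset glossed in English (the actual materials are those of tables 1 and 2), the class assignments are illustrative, and the requirement that the two adjectives come from different semantic classes is an assumption rather than something stated in the design.

```python
import random

# Hypothetical English-glossed subset of the materials; class assignments are
# illustrative. The real experiment used 25 adjectives and 10 nouns.
ADJECTIVES = {
    "quality": ["good", "bad"],
    "size": ["small", "long"],
    "color": ["blue", "brown"],
    "material": ["wooden", "metal", "plastic"],
}
NOUNS = ["chair", "table", "ball"]


def make_trial(rng):
    """Build one two-alternative trial with post-nominal adjective order."""
    # Assumption: the two adjectives are drawn from different semantic classes.
    class_a, class_b = rng.sample(sorted(ADJECTIVES), 2)
    adj_a = rng.choice(ADJECTIVES[class_a])
    adj_b = rng.choice(ADJECTIVES[class_b])
    noun = rng.choice(NOUNS)
    return {
        "noun": noun,
        "adjectives": (adj_a, adj_b),
        # Post-nominal order: the noun comes first, then the adjectives.
        "left": f"{noun} {adj_a} {adj_b}",
        "right": f"{noun} {adj_b} {adj_a}",
    }


trial = make_trial(random.Random(0))
print(trial["left"], "|", trial["right"])
```

The two strings in `left` and `right` stand in for the labels on the slider endpoints.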
We used the slider position to compute the naturalness score for each adjective, which corresponds to how far away from the noun the adjective is preferred. If the participants, for example, preferred the brown small chair, small would have a naturalness score close to 0 since it is preferred closer to the noun, and brown would have a naturalness score closer to 1 since it is preferred farther away from the noun.
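One way to sketch this computation is shown below. The data layout is an assumption made for illustration: each trial is recorded as the adjective nearer the noun in the left-endpoint ordering, the adjective farther from the noun in that ordering, and a slider value in [0, 1], where 0 means full preference for the left ordering. The function name `naturalness_scores` and the example trials are hypothetical, not the authors' analysis code or data.

```python
from collections import defaultdict


def naturalness_scores(trials):
    """Average, per adjective, the degree to which it is preferred far from the noun.

    trials: iterable of (near_left, far_left, slider) tuples, where the left
    endpoint ordering places near_left next to the noun and far_left farther
    away, and slider is in [0, 1] (0 = left ordering fully preferred).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for near_left, far_left, slider in trials:
        if not 0.0 <= slider <= 1.0:
            raise ValueError("slider must lie in [0, 1]")
        # Preferring the left ordering keeps far_left far from the noun.
        totals[far_left] += 1.0 - slider
        # Preferring the right ordering moves near_left far from the noun.
        totals[near_left] += slider
        counts[far_left] += 1
        counts[near_left] += 1
    return {adj: totals[adj] / counts[adj] for adj in totals}


# Invented responses, for illustration only.
example = [("brown", "small", 0.1), ("small", "blue", 0.8)]
print(naturalness_scores(example))
```

A score near 1 means the adjective is preferred farther from the noun, and a score near 0 means it is preferred closer, matching the interpretation given above.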
3.3. RESULTS. Figure 2 plots ordering preferences grouped by lexical semantic class. The results show that Arabic does have stable ordering preferences, despite also having post-nominal adjectives. Some classes are preferred closer to the modified noun (shape, color, material), while others are preferred farther away (quality, age, texture, size).
4. Experiment 2: Assessing subjectivity. After obtaining measures for ordering preferences, we next measured whether these preferences are subjectivity-based. To do so, we measured adjective subjectivity using a faultless disagreement task.
4.1. PARTICIPANTS. 135 participants who did not take part in the first experiment were recruited through Amazon's Mechanical Turk. Only 16 participants were identified as native Arabic speakers using the same assessment described for Experiment 1.
4.2. DESIGN. Participants were given a scenario in which two speakers see the same object but they disagree about its description (see Figure 3). Then, participants were asked to evaluate whether both speakers can be right or if one of them must be wrong. Research by Scontras et al. (2017) shows that the faultless disagreement measure is highly correlated with an independent measure of subjectivity. That is, if two speakers can be right while they disagree on a description, then that adjective is subjective. For example, the extent to which two speakers can faultlessly disagree about whether a chair is brown indexes the subjectivity of the adjective brown.
Responses were averaged across participants to obtain a single subjectivity score for each adjective.
4.3. RESULTS. Comparing subjectivity scores with the ordering preferences from Experiment 1, we find that subjectivity is a strong predictor of adjective ordering in Arabic (r² = 0.76, 95% CI [0.57, 0.88]; see Figure 4). That is, more subjective adjectives such as jayid "good" and sayi' "bad" are preferred farther from the noun than less subjective adjectives such as khasabiy "wooden" or azraq "blue". Two people are less likely to faultlessly disagree about whether a table is wooden or not, or whether a ball is blue or red. These types of adjectives are objective. Whether cheese is good or bad, however, seems to be more subjective and dependent on a person's perspective; as in English, these adjectives are preferred farther from the modified noun.
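The comparison between subjectivity scores and ordering preferences can be sketched as below. This is a minimal illustration, not the authors' analysis: it assumes each faultless disagreement response is coded on a 0 to 1 scale (1 = both speakers can be right), averages responses per adjective, and computes the squared Pearson correlation against per-adjective naturalness scores. All numbers in the example are invented for illustration and are not the experimental data.

```python
from statistics import mean


def subjectivity_scores(responses):
    """Average faultless-disagreement responses per adjective.

    responses: dict mapping adjective -> list of ratings in [0, 1]
    (assumed coding: 1 = both speakers can be right).
    """
    return {adj: mean(ratings) for adj, ratings in responses.items()}


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Invented toy values, for illustration only.
toy_responses = {"good": [1, 1, 0, 1], "blue": [0, 0, 1, 0], "wooden": [0, 0, 0, 0]}
toy_naturalness = {"good": 0.8, "blue": 0.4, "wooden": 0.3}

subjectivity = subjectivity_scores(toy_responses)
adjectives = sorted(subjectivity)
print(r_squared([subjectivity[a] for a in adjectives],
                [toy_naturalness[a] for a in adjectives]))
```

Computing the correlation over per-adjective averages, rather than over individual responses, mirrors the description above of a single subjectivity score per adjective.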

5. Experiment 3: Ordering preferences in heritage Arabic. After obtaining results from baseline native speakers, we investigated whether English-dominant heritage speakers of Arabic exhibit similar ordering preferences to the native baseline. Heritage speakers are unbalanced bilinguals who have experienced a shift in language use from their home language to the dominant language of their society. This shift often results in bilingualism unbalanced in favor of the dominant societal language, but some abilities of the heritage language persist (Scontras et al. 2015; Polinsky & Scontras 2020). In some cases, knowledge from the dominant language transfers to the heritage language. In other cases, heritage speakers simply fail to acquire the relevant knowledge, owing to a lack of suitable input, or they acquire the knowledge and then lose it in the absence of proper maintenance (i.e., without sufficient use). By studying the behavior of heritage speakers, we can understand how language functions in the mind of a bilingual. More specifically, we can examine how robust ordering preferences are to reduced input and maintenance. Importantly, with adjective ordering, we have seen that both English and the Arabic baseline order adjectives hierarchically with respect to decreasing subjectivity, so transfer from English would result in the same preferences. However, heritage speakers might transfer the linear order of adjectives from English to Arabic, rather than following hierarchical ordering preferences. If heritage speakers are attending to linear order and transferring that order from their dominant English, we should find heritage preferences that are the mirror image of baseline Arabic.
We measured heritage ordering preferences using a version of Experiment 1 adapted for heritage speakers.
5.1. PARTICIPANTS. We began by recruiting English-dominant heritage speakers of Arabic through Mechanical Turk, but we were unable to obtain a sufficiently large sample. So, we chose to take advantage of the large Arab-American community in Southern California and use an email chain and social media to reach out to different organizations and associations. Most of the younger generation raised in the U.S. context either did not speak Arabic, or did not read Arabic (which is important since our experiment is delivered in written form). We managed to recruit twelve participants who were identified as heritage Arabic speakers. English-dominant heritage speakers were identified as those whose first language was Arabic and whose main language now is English; participants who lived in an Arabic-speaking country after the age of eight were excluded, as were participants who failed to answer all three catch questions correctly. All twelve of our heritage speakers identified their Arabic dialect as Levantine.
Figure 5: Naturalness ratings from Experiment 3 (heritage) grouped by adjective semantic class. Error bars represent bootstrapped 95% confidence intervals drawn from 10,000 samples of the data.
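The bootstrapped 95% confidence intervals mentioned in the Figure 5 caption could be computed along the following lines. This is a sketch under assumptions: it uses a percentile bootstrap of the mean (the caption does not say which bootstrap variant was used), and the function name `bootstrap_ci`, its parameters, and the example ratings are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_resamples)
    hi_idx = int((1.0 - tail) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Invented naturalness ratings for one semantic class, for illustration only.
ratings = [0.2, 0.4, 0.6, 0.8, 0.5, 0.7]
low, high = bootstrap_ci(ratings)
print(low, high)
```

With only twelve heritage participants, intervals of this kind would be expected to be wider than in the native baseline, which is consistent with the greater variance noted for the heritage results.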
5.2. DESIGN. Participants indicated their preferences for multi-adjective strings using the same methodology described for Experiment 1 above, with one modification: we added English translations of the instructions (without translating the multi-adjective strings).
5.3. RESULTS. Figure 5 plots results from heritage speakers grouped by semantic class. Although there is more variance overall compared to the native baseline (likely due both to the smaller number of participants and to the heterogeneity inherent to heritage populations), we continue to see stable preferences: some classes are preferred closer to the noun, and others are preferred farther away. Moreover, we see a similar qualitative pattern of results between the native baseline and the heritage results, suggesting that both exhibit subjectivity-based preferences. The one point of qualitative deviation from the baseline involves material adjectives, which baseline speakers prefer maximally close to the noun.

COMPARING HERITAGE ORDERING PREFERENCES WITH ADJECTIVE SUBJECTIVITY. To evaluate the predictive power of subjectivity in explaining the heritage ordering preferences, we used the subjectivity scores obtained from native speakers in Experiment 2. Figure 6 plots the heritage ordering preferences against these subjectivity scores. While the heritage results are not nearly as robust as the native results, there is a positive correlation between the preferred distance of adjectives from the noun and the subjectivity of these adjectives (r² = 0.26).
6. Discussion. The results from Experiment 1 demonstrate that baseline native Arabic speakers do have stable adjective ordering preferences (pace Sproat & Shih 1991). Moreover, the results of Experiment 2 show that subjectivity is a reliable predictor of these preferences in Arabic: more subjective adjectives are preferred farther away from the noun. In Experiment 3, results from twelve English-dominant heritage speakers of Arabic show that heritage speakers have adjective ordering preferences that, as in the baseline, are predicted by adjective subjectivity. Not only does Arabic have subjectivity-based ordering preferences, but those preferences are robust to the reduced input and language maintenance characteristic of heritage speakers.
While a qualitative pattern of preferences similar to the native baseline is observed in heritage speakers, one group of adjectives stands out in the heritage data: material adjectives. Material adjectives are preferred closest to the noun in the native baseline, but fail to exhibit stable preferences in heritage speakers. Ongoing work investigates the special status of material adjectives in the heritage grammar, looking at frequency, length, and age-of-acquisition, together with possible effects from the dominant English grammar. Also, we note that the number of participants in the heritage experiment is half the number of participants in Experiment 1. Too few participants could lead to increased noise in our data, thereby obscuring the empirical picture.
With regard to heritage subjectivity, there is a clear correlation between the preferred distance from the noun and the subjectivity of the adjectives. However, this correlation is not very strong. The strength of the correlation suffers in part because of two sets of outliers. The first set comprises the adjectives 'hard' and 'long', which are preferred closer to the noun than we would expect given their subjectivity. The second set comprises the material adjectives 'metal', 'wooden', and 'plastic', which are preferred farther from the noun than we would expect given their subjectivity. However, despite these outliers in the heritage results, subjectivity does correlate with heritage preferences. A natural next step is to measure perceived subjectivity in heritage speakers. While unlikely, it is possible that these adjectives would no longer be outliers given heritage subjectivity estimates.