The semi-complementizer shuō and non-referential CPs in Mandarin Chinese

The empirical focus of this paper is the syntactic status of the semi-complementizer shuō in Mandarin Chinese, which is grammaticalized from a verb of saying. Such elements have been shown to exhibit atypical patterns compared to English that, which has triggered debate over whether shuō should be analyzed as a complementizer (Paul, 2014; Huang, 2018). This paper presents novel data on the distribution of shuō and argues that shuō is a C head that introduces a subtype of CP called the non-referential CP, following de Cuba (2017).

(1) a. 張三 先 說
       Zhāngsān xiān shuō
       Zhāngsān first say
       'Zhāngsān says first.'
    b. 我 總是 覺得 說 生活-裡 缺-了 點兒 什麼
       Wǒ zǒngshì juéde shuō shēnghuó-lǐ quē-le diǎnr shénme
       I always think say life-in lack-PERF little something
       'I've always felt that there is something a little lacking in my life.' (Fāng 2006: 109)
    c. 我 打算 說 到 英國 留學
       Wǒ dǎsuàn shuō dào Yīngguó liúxué
       I plan say arrive UK study.abroad
       'I planned to study abroad in the UK.'
On the basis of the distribution of shuō in (1b), there is a tradition of analyzing shuō (and its counterparts in other Sinitic languages) as a complementizer (Yeung 2006; Hsieh & Sybesma 2008). This conclusion is by no means uncontroversial, however. It has been challenged by cases in which shuō cannot occur in clausal subjects, an environment that should be available if shuō were a genuine complementizer (Paul 2014). More recent work further observes that shuō occurs not only in finite clauses (1b) but also in nonfinite clauses (1c), and defends the complementizer status of shuō by analyzing it as a verbal suffix at PF (N. Huang 2018), which arguably accounts for some of the patterns that set shuō apart from genuine complementizers.
In this paper, I claim that shuō is a complementizer (C) and, crucially, that it introduces a specific type of CP, the non-referential CP in the sense of de Cuba (2017). Looking at the distribution of shuō in more detail, I show that there are environments in which shuō is consistently ruled out, which suggests that shuō introduces only a subtype of clauses (both finite and non-finite), namely those that are non-referential. The proposal implies that shuō is not entirely optional (cf. Chappell 2008) and that the two variants of shuō, occurring in finite and nonfinite clauses respectively, can be unified (cf. Huang 2018).
The paper is organized as follows. Section 2 reviews recent proposals about the syntactic status of shuō. Section 3 presents the data patterns in more detail, with a special focus on the environments where shuō cannot occur, which have not been discussed in previous analyses. Section 4 discusses the notion of (non-)referential CPs in the sense of de Cuba (2017) and their attestation in Mandarin Chinese. Section 5 discusses the optionality of shuō. Section 6 proposes a syntactic position for shuō and discusses its implications for the atypical patterns of shuō as a complementizer. Section 7 discusses the predictions that this proposal makes for the counterparts of shuō in other Sinitic languages. Section 8 concludes.

2. Is shuō a complementizer?
There is a tradition of analyzing shuō as a complementizer, supported by syntactic tests such as aspect marking (2a) and constituency (2b), which borrow Yeung's (2006) diagnostics for the Cantonese semi-complementizer waa6.
(2) a. 張三 以為 說-(*過) 李四 沒 來
       Zhāngsān yǐwéi shuō-(*guo) Lǐsì méi lái
       Zhāngsān believe say-ASP Lǐsì NEG come
    b. 張三 想-著 說 今天 不-去 並且 說 明天 也 不-去
       Zhāngsān xiǎng-zhe shuō jīntiān bù-qù bìngqiě shuō míngtiān yě bú-qù
       Zhāngsān think-DUR say today NEG-go and say tomorrow also NEG-go
       'Zhāngsān is thinking that he is not going today and that he is not going tomorrow.'
First, the fact that shuō following the verb yǐwéi 'believe' cannot take aspect marking in (2a) suggests that shuō has lost its verbal properties and acts as a complementizer. Second, while the constituency test in (2b) does not directly diagnose the complementizer status of shuō, the fact that shuō forms a constituent with the following clause under coordination falls out naturally if shuō is a complementizer that introduces clausal complements.
Paul (2014), on the other hand, provides counterexamples that pose serious challenges to the complementizer approach (this part of the review is adapted from N. Huang 2018). First, shuō cannot appear in fronted embedded clauses in afterthought constructions (3a). Second, shuō cannot head a sentential subject (3b). Third, shuō cannot be stranded (3c). Finally, there is often a pause after shuō, but crucially not between shuō and the preceding verb (cf. (2b)). N. Huang further observes that shuō cannot appear in the clausal complement of a noun (3d).
(3) a.
On the surface, this set of examples undermines the complementizer approach, since a genuine complementizer like English that does not exhibit these restrictions, as the translations in (3a), (3b), and (3d) show. N. Huang (2018) proposes an analysis that defends shuō as a complementizer while aiming to explain the problematic cases in (3a-3d). First, he proposes that there are two variants of shuō, one occurring in finite contexts (e.g., complements of verbs such as yǐwéi 'believe') and one in non-finite contexts (e.g., complements of verbs such as dǎsuàn 'plan'), as shown in (1b) and (1c) respectively, each with its own set of syntactic properties. Second, he analyzes shuō as a verbal suffix at PF, following Bošković & Lasnik's (2003) analysis of null C in English, which is similarly disallowed in clausal subjects and in clausal complements of nouns. As a verbal suffix, shuō must immediately follow a verb when syntactic structures are linearized at PF. This proposal accounts for the ungrammaticality of (3a) and (3b), where shuō is sentence-initial,² because in these two cases there is no preceding verb for shuō to attach to. It also accounts for the prosodic pattern in which shuō forms a unit with the preceding verb. For (3d), Huang argues that functional elements in the embedded left periphery prevent shuō from suffixing onto the preceding verb.
Two further implications of Huang's analysis are that (i) shuō can in principle co-occur with any verb as a suffix at PF, so long as no material intervenes between the verb and shuō, and (ii) the two variants of shuō in finite and non-finite clauses are syntactically distinct elements. This paper proposes an alternative approach based on a more comprehensive picture of the data. It will be shown that shuō consistently fails to occur in a set of environments that existing analyses leave unexplained. This renewed look at the distribution of shuō in turn points to a unified analysis of the two instances of shuō in finite and nonfinite contexts, which then explains the atypical patterns of shuō as a complementizer shown in (3).
3. Data Patterns in more detail.
Previous literature offers a large corpus documenting the environments where shuō can occur, which are conditioned by its degree of grammaticalization. Chappell (2008, 2017) offers an implicational hierarchy of verb classes that can co-occur with semi-complementizers in Sinitic languages (including shuō in Mandarin), which is divided into five stages (4).
(4) Stage I (Quotative constructions) → Stage II (Speech act verbs, e.g. 'ask', 'tell') → Stage III (Cognition verbs, e.g., 'think') → Stage IV (Perception, emotion, and stative verbs, e.g. 'be worried') → Stage V (Modal verbs, e.g. 'be necessary').
Chappell (2008, 2017) argues that shuō in Beijing Mandarin has reached Stage IV: it can occur not only in quotative constructions (the least grammaticalized stage) and speech act contexts (5a), but crucially also in complements of cognition verbs such as juéde 'feel' (1b), repeated in (5b), and of perception, emotion, or stative verbs such as dānxīn 'be worried' (5c), where the lexical use of shuō 'say' is highly unlikely. Chappell also notes that shuō in Taiwanese Mandarin exhibits an even higher degree of grammaticalization, co-occurring even with modal verbs, which signals that it has reached Stage V, but no examples are provided.³
(5) a. 他 就 告訴 說 他 姑姑 來 了
       Tā jiù gàosù shuō tā gūgu lái le
       he then tell say he aunt come ASP
       'He then told that his aunt had come.'
    b. 我 總是 覺得 說 生活-裡 缺-了 點兒 什麼
       Wǒ zǒngshì juéde shuō shēnghuó-lǐ quē-le diǎnr shénme
       I always feel say life-in lack-PERF little something
       'I've always felt that there is something a bit lacking in my life.'
    c. 他 就 會 擔心 說 這-個 孩子 以後 會 怎麼樣
       Tā jiù huì dānxīn shuō zhè-ge háizi yǐhòu huì zěnmeyang
       he then will be.worried say this-CL child afterwards will how
       'He will then worry about how this child will turn out afterwards.'
       (Beijing Mandarin, from Fāng 2006)
To summarize, shuō in Mandarin (including both Beijing and Taiwanese Mandarin) can co-occur with a wide range of verbs. These distributional patterns form the basis for the existing syntactic analyses (see Section 2). However, no special attention has been paid to the environments where shuō is consistently disallowed. The only relevant discussion is a note by Chappell, who claims that when the grammaticalization of semi-complementizers reaches Stage V (e.g., modal complements), 'expansion of verb classes taking the complementizer to other kinds of factive verbs may take place, with the potential of broadening the scope of further verb classes in an unrestricted manner' (Chappell 2008: 62). In other words, factive verbs seem to mark a cut-off point along the scale of grammaticalization, beyond which the grammaticalized complementizer can finally behave like a genuine complementizer. Interestingly, the existing literature likewise lacks detailed documentation of shuō in factive complements. This paper tries to fill the gap by taking a closer look at the environments where shuō cannot occur. My data from Beijing Mandarin indicate that shuō in complements of factive verbs, such as the emotive factives gāoxìng 'be happy'/shāngxīn 'be sad' and hòuhuǐ 'regret'/fǎngǎn 'resent',⁴ is generally degraded compared to all the attested contexts (Stages I-V), which are all non-factive, as shown in (6) and (7).
In other words, the occurrence of shuō is not random. In particular, shuō is incompatible with factive contexts, and all the attested examples involve shuō in non-factive contexts. It is unclear how existing approaches can capture this distributional asymmetry. In particular, N. Huang's approach, which analyzes shuō as a C and a verbal suffix, predicts that shuō should be allowed whenever there is a verb that it can attach to at PF. In Section 4, I will argue that the very fact that shuō is ruled out in factive contexts suggests that shuō heads a specific type of CP, which in turn explains the problematic cases that Paul (2014) reports in (3) and the two variants of shuō in finite and non-finite contexts (1b) and (1c).
3 In Chappell's documentation of the counterparts of shuō in non-Mandarin Sinitic languages, Hakka koŋ³¹ has reached Stage II, Cantonese waa6 has reached Stage III, and Taiwanese Southern Min koŋ⁵¹ has reached Stage V, making it the most highly grammaticalized complementizer of all the Sinitic languages, as shown by the well-attested co-occurrence of koŋ⁵¹ with modal verbs (i).
(i) I thâu-náu hó bô-it-tēng koŋ⁵¹ I toh gâu chò seng-lí.
    3SG brains good NEG.necessary say 3SG then clever do business
    'Having good brains does not necessarily mean that you are good at business.' (Chappell 2017)

4. Referentiality of CPs.
Recall from Section 2 and Section 3 that the attested instances of complementizer shuō mainly occur in non-factive contexts, whereas shuō in factive contexts is generally degraded. de Cuba & Ürögdi (2009)
One immediate question raised by adopting this proposal is whether there is independent evidence for a referentiality distinction in the clausal domain in Chinese. de Cuba & MacDonald (2013) present two diagnostics for distinguishing referential CPs from non-referential CPs. One is whether the two clause types differ in requiring a discourse context: referential CPs require a discourse context and cannot be used out of the blue, whereas non-referential CPs can be used in out-of-the-blue contexts. This contrast is attested in Mandarin Chinese, as shown in (9), where the context is that a child stole a book from the library and a teacher is meeting with the child's parents.
(9) Context: 你-的 孩子 今天 偷-了 一-本 書
             Nǐ-de háizi jīntiān tōu-le yì-běn shū
             your child today steal-PERF one-CL book
             'Your child stole a book today.' (uttered by a teacher)
    a. 我 很 抱歉/遺憾 我-的 孩子 今天 偷-了 一-本 書
       Wǒ hěn bàoqiàn/yíhàn wǒ-de háizi jīntiān tōu-le yì-běn shū
       I very sorry/regret my child today steal-PERF one-CL book
       'I am sorry/regret that my child stole a book today.'
    b. 我 覺得/以為 我-的 孩子 今天 偷-了 一-本 書
       Wǒ juéde/yǐwéi wǒ-de háizi jīntiān tōu-le yì-běn shū
       I feel/believe my child today steal-PERF one-CL book
       'I feel/believe that my child stole a book today.'
       (uttered by the parents)
In the context provided in (9), only (9a), not (9b), is a possible response to the teacher's utterance. (9a) instantiates a referential CP, while (9b) exemplifies a non-referential CP. Since 'your child stole a book' is already accepted in the existing discourse, complements of verbs such as juéde 'feel' or yǐwéi 'believe', which come with a speech act introducing a proposition not yet accepted in the discourse, are inappropriate: there is no need to introduce a proposition that is already established in the context. This constitutes the first piece of evidence for a referentiality distinction in Mandarin Chinese.
Finally, the preposition duì 'on' in Mandarin provides language-internal evidence for the referentiality distinction in clausal complements. Huang, Li & Li (2009) treat duì 'on' as a diagnostic for distinguishing adjectives from verbs: many adjectives in Mandarin require a prepositional phrase headed by duì 'on', followed by the preposed object, when they are used transitively (12).

5. Non-referential CP with or without shuō.
It has been claimed in the literature that shuō is optional (Chappell 2008): clausal complements with shuō and those without it are argued to be interchangeable. Closer scrutiny of the data suggests that shuō is not optional, at least for complements of non-speech-act verbs (such as dānxīn 'be worried' and dǎsuàn 'plan', with which shuō is very unlikely to be interpreted in its quotative use).
de Cuba & MacDonald (2013) report that the apparently optional complementizer que with preguntar 'ask' in Spanish is in fact not optional at all. The presence of que signals the presence of a speech-act operator that introduces a proposition into the common ground (a proposition not yet shared by the speakers). The subtle yet significant difference between the clause with que and the clause without it is that the latter is an initial attempt on the part of the speaker to introduce the proposition into the common ground, whereas the former is a non-initial attempt to do so. Put differently, there are two subtypes of non-referential CPs, both of which introduce unsettled propositions into the common ground but in slightly different ways: non-referential CPs without que are used when the speaker introduces the proposition into the common ground for the first time; by contrast, non-referential CPs with que can be uttered when, from the speaker's point of view, the proposition should already be shared and accepted in the context but has somehow not yet been fully accepted into the common ground. Given a well-defined context, the two can be distinguished. (14) shows a similar case with shuō in Mandarin Chinese, using the verb dǎsuàn 'plan' as an illustration.
(14) Context: Lǐsì is sharing a travel plan with his friends, who thought Lǐsì would have to stay in town because he needs to prepare for an important exam and because there will still be many travel restrictions due to the pandemic.
In the context of (14), only the variant without shuō is appropriate. This is expected if the variant with shuō in (14a) reflects a 'non-initial' attempt on the part of the speaker to (re)introduce the following proposition (which, from the speaker's point of view, should already be accepted in the common ground). Such an attempt is incompatible with the context in (14), where the proposition 'going on vacation in Europe' has never been part of the common ground (in fact, it is the complete opposite of what is accepted in the common ground, namely that Lǐsì is staying in town). This suggests that the presence and absence of shuō convey two different interpretations, contra previous analyses that treat shuō as a completely optional element in introducing clausal complements.

6. Syntactic positions of shuō.
Having established that shuō heads a subtype of non-referential CPs, which is not entirely interchangeable with the clausal complement without shuō, this section provides an analysis of the syntactic positions of shuō.
There are at least two major approaches to referential CPs (which subsume factive CPs in the traditional sense). The first is de Cuba's own truncation approach, in which referential CPs have an impoverished left periphery (15b) while non-referential CPs have a full left periphery (15a) (see also Haegeman 2006). Haegeman & Ürögdi (2010) instead argue that referential clauses result from operator movement in the left periphery (16), which derives referentiality in the clausal domain, whereas non-referential clauses lack this movement.
(16) [CP OPi C … [FP ti [TP …]]] (Haegeman & Ürögdi 2010: 115)
Given the unsettled status of referential clauses, I remain non-committal about their final implementation.⁵ Regardless, both lines of approach agree that there is a referentiality distinction in the CP domain, which I argue is the key to understanding some of the unexpected patterns of shuō in Mandarin as a complementizer. Specifically, following Melvold's (1991) proposal that referential clauses are encoded with a [+definite] feature whereas non-referential CPs carry a [-definite] feature,⁶ I argue that shuō heads a specific type of non-referential CP with a [-definite] feature (17a), whereas referential CPs are headed by an obligatorily null C with a [+definite] feature (17b), hence the unacceptability of shuō in referential clauses.
With shuō heading a [-def] clausal complement and referential CPs headed by an obligatorily null C with a [+def] feature, some of the patterns that make shuō atypical as a complementizer fall out naturally. Recall that Paul (2014) shows that (i) embedded clauses headed by shuō cannot be fronted in afterthought constructions (3a) and (ii) shuō cannot appear in clausal subjects (3b), unlike typical complementizers such as English that. More examples are given in (18a) and (18b). The impossibility of shuō in (18a) and (18b) is accounted for in the current proposal by the [-def] feature in sentence-initial position, because Mandarin generally disallows indefinite subjects in clause-initial position, a well-discussed fact in the syntax of Chinese (Huang, Li & Li 2009), as shown in (19).
(19) *三-個 學生 吃-了 蛋糕
     Sān-ge xuéshēng chī-le dàngāo
     three-CL student eat-PERF cake
     Intended: 'Three students ate the cake.' (Huang, Li & Li 2009: 288)
It is predicted that (18a) and (18b) without sentence-initial shuō are acceptable with a [+def] reading, which cannot be used without a discourse context. This is borne out in (20), whose context is one in which the participant of the embedded clause has not been established in the discourse. Only the clausal complement in postverbal position (20b) can be used in this context, since the embedded proposition needs to be introduced into the common ground for the first time by the speaker. (20a), with the fronted clause, requires a context in which Lǐsì is already part of the common ground, which is incompatible with the out-of-the-blue context in (20).
(20) Context: Assume that Lǐsì has never been brought up in the discourse (no one knows who Lǐsì is).
     a. #李四 會 來， 我 以為
        Lǐsì huì lái, wǒ yǐwéi
        Lǐsì will come I believe
        'That Lǐsì will come, I believe.'
     b. 我 以為 李四 會 來
        Wǒ yǐwéi Lǐsì huì lái
        I believe Lǐsì will come
        'I believe Lǐsì will come.'
This proposal then unifies the two variants of shuō that occur in finite and nonfinite clauses, as shown in (1b) and (1c), repeated in (21), because both introduce non-referential CPs (introduced by non-factive verbs). Recent discussions of control (e.g. Landau, 2015) propose that C also hosts a left periphery representing the context of speech for attitude complements (e.g. dǎsuàn 'plan'), or a 'logophoric center' in the sense of Bianchi (2003), which is available in both finite and attitude nonfinite contexts (an external logophoric center for finite clauses and an internal logophoric center for nonfinite clauses). I argue that shuō realizes this C head in both (21a) and (21b).
5 The fact that referential clauses introduced by factive verbs such as hòuhuǐ 'regret' permit Main Clause Phenomena (e.g. inner topics) in Mandarin (i) favors the operator movement approach over the truncation approach. It is unclear how a truncation approach could account for the acceptability of inner topics within factive clauses if the left periphery is truncated.
(i) 我 很 遺憾 那-本 書 你 沒 讀-過
    Wǒ hěn yíhàn nà-běn shū nǐ méi dú-guo
    I very regret that-CL book you NEG read-ASP
    'I regret that you haven't read the book.' (Liao & Kao 2017)
6 Melvold's original approach uses the traditional factive/non-factive distinction. Here I replace the factivity distinction with a referentiality distinction, following de Cuba (2017).
7 Recall that in Section 5, adopting de Cuba & MacDonald's analysis of preguntar 'ask' with que in Spanish, I presented evidence that the absence or presence of shuō is similarly not optional. While both variants introduce new and unsettled propositions (and are therefore non-referential), they introduce them in slightly different ways, namely as a first attempt or as a non-initial attempt from the perspective of the speaker, respectively. The difference between the two can be captured along the lines of a cartographic approach in which shuō, like que in Spanish, is analyzed as the head of an extended projection in the CP licensed by a speech-act operator (de Cuba & MacDonald 2013: 137), which is absent in non-referential CPs without shuō.

7. Predictions for other Sinitic languages.
Recall that the current analysis attributes the inability of shuō to occur in sentence-initial position to the fact that shuō heads a non-referential CP, together with an independent restriction in the syntax of Chinese that disallows indefinite subjects in sentence-initial position. Given that complementizer shuō is the result of grammaticalization, different varieties (those with a similar grammaticalization path from verbs of saying to complementizers) are expected to exhibit different degrees of compatibility with different subtypes of verbs. Crucially, it is predicted that if there is a variety whose grammaticalized complementizer can occur in sentence-initial position (a stage that Mandarin shuō has not yet reached), this C should be able to introduce not only non-referential clauses but also referential clauses (including factive contexts). This prediction is borne out by the counterpart of shuō in Taiwanese Southern Min, koŋ⁵¹, which is documented to be the most grammaticalized complementizer (compared to other Sinitic languages, see footnote 1).⁸ Sentence-initial koŋ⁵¹ that introduces clausal subjects or main clauses is attested in Taiwanese Southern Min (22a) and (22b).⁹,¹⁰ Interestingly, this koŋ⁵¹ is reported to be compatible with referential clauses, as exemplified by the emotive factive phai2-sek3 (歹勢) 'feel embarrassed' in (22c).
(22) c. … koŋ⁵¹
        PT 3SG also feel.embarrassed say
        安呢 共 趕 伊 走 安-呢 啦
        an2-ni1 ka7 kuann2 i1 tsau2 an2-ni1 lah8
        so OBL expel 3SG go so SFP
        'He is also embarrassed that he expels him.' (Tseng 2008: 99)
8 Another candidate from Sinitic languages could be da (呾) in the Jieyang dialect, another Southern Min language, which has been argued to be even more grammaticalized than koŋ⁵¹ in Taiwanese Southern Min (Huáng 2016: 689) in that it has become obligatory for introducing complement clauses for a wide range of verbs. It has also been reported that da (呾) is compatible with factive verbs such as 'realize'. If da (呾) is more grammaticalized than koŋ⁵¹, it is predicted that da (呾) can occur in sentence-initial position. More data are needed to confirm this prediction. (See Xu and Matthews (2007) for discussion of da (呾) in the Chaozhou dialect, yet another variety of Southern Min.)
9 The sentence-initial use of koŋ⁵¹ has been reported in various sources (e.g. Cheng 1997; Hsieh & Sybesma 2008; Tseng 2008; Lien 2020, among many others) and has received different analyses. Hsieh & Sybesma (2008) and Tseng (2008) suggest that koŋ⁵¹ is a complementizer that introduces main clauses and clausal subjects, while Cheng (1997) analyzes it as an adverb with a hearsay interpretation, and Lien (2020) categorizes koŋ⁵¹ as a topic marker. While more investigation is certainly needed, the interpretation given in (22a), which arguably has no 'hearsay' reading, supports the complementizer view, on which koŋ⁵¹ can introduce a clausal subject (contra shuō in Beijing Mandarin), as shown in (22a), or even a main clause, as shown in (22b).
10 Note that koŋ⁵¹ can also occur in sentence-final position, which makes it similar to many of the sentence-final particles in Chinese languages. It expresses subjectivity on the part of the speaker. It can also be used in imperatives (Lien 2020). (See Simpson & Wu (2002) for a formal analysis of sentence-final koŋ⁵¹ as a head-initial complementizer.)
8. Conclusions.
In this paper I have claimed that shuō is a semi-complementizer that introduces non-referential CPs, based on a re-examination of the distribution of shuō across different clausal complements. The atypical patterns observed in the previous literature can be explained by the non-referential/indefinite feature of the clauses headed by shuō.
This paper also provides predictions for the analysis of other semi-complementizers grammaticalized from verbs of saying in other Sinitic languages. It remains to be seen how such an account can be extended to the rich cross-linguistic patterns of (semi-)complementizers of similar kinds. This issue, however, lies outside the scope of this paper and must be left for further investigation.