What is in the neighborhood of a tonal syllable? Evidence from auditory lexical decision in Mandarin Chinese

Phonological neighborhood effects have been found in spoken word recognition, word production, and phonetic variation (Gahl, Yao, & Johnson, 2012; Luce & Pisoni, 1998; Vitevitch, 2002). Overall, words from dense neighborhoods are harder to recognize but easier to produce. However, most previous studies have focused on English, and there is evidence that these effects may not generalize cross-linguistically because of language-specific configurations of the lexicon (Vitevitch & Stamer, 2006, 2009). In the current study, we investigate the effects of phonological neighborhoods in Mandarin Chinese, whose lexicon is structured very differently from that of English. Results from an auditory lexical decision experiment showed that phonological neighborhood density and neighbor frequency (defined by the one-phoneme/tone difference rule) are predictive of the speed and accuracy of lexical decision. Homophone density also has a facilitative effect on the accuracy of lexical decision. The implications of these findings are discussed in the framework of the lexicon model proposed by Zhou & Marslen-Wilson (1994, 2009).

The widely cited work by Luce & Pisoni (1998) consisted of a series of perceptual experiments (perceptual identification, auditory lexical decision, and word naming) using an exhaustive set of 918 three-phoneme monosyllabic English words. The results showed that neighborhood density (i.e., the number of phonological neighbors a word has) strongly predicted recognition speed and accuracy. High-density words (i.e., words with many neighbors) had longer reaction times and lower accuracy rates in the perceptual tasks than low-density words (i.e., words with few neighbors). These results indicated that phonological neighbors act as competitors of the target word during recognition: the more neighbors (competitors) a target word has in the lexicon, the stronger the inhibition of its recognition. Similar effects were found for the usage frequency of phonological neighbors ("neighbor frequency" hereafter). When neighborhood density was controlled for, words with higher-frequency neighbors also elicited slower and less accurate responses than words with lower-frequency neighbors, suggesting that high-frequency neighbors induce stronger competition than low-frequency neighbors. The neighbor frequency effects were more pronounced among low-density target words. Similar findings have been reported in a few other studies (Cluff & Luce, 1990; Goldinger, Luce, & Pisoni, 1989; Luce, Goldinger, Auer, & Vitevitch, 2000; Vitevitch & Luce, 1999).
By contrast, phonological neighbors play an overall facilitative role in word production tasks. Vitevitch (2002) investigated the effect of phonological neighborhoods on speech production using two speech error elicitation techniques and a series of picture naming tasks. Compared with words from sparse neighborhoods, words from dense neighborhoods elicited fewer errors and were named faster, providing evidence that phonological neighbors facilitate the production process, probably by increasing the activation of the target word. Similar findings were reported in subsequent studies (Peramunage, Blumstein, Myers, Goldrick, & Baese-Berk, 2011; Vitevitch, Armbrüster, & Chu, 2004; Vitevitch & Sommers, 2003).
Most of the research reviewed above has focused on the processing of short, monosyllabic English words. When the scope of investigation is extended to other languages, these patterns do not always generalize. As in English, high-density words in French are hard to perceive (Dufour & Frauenfelder, 2010; Ziegler, Muneaux, & Grainger, 2003), but they also seem to be hard to produce (Sadat, Martin, Costa, & Alario, 2014), and they tend to be hyperarticulated in conversational speech (Yao & Meunier, 2014). Research on Spanish presents an even greater contrast with the findings from English: high-density words in Spanish are easy to perceive but hard to produce (Sadat et al., 2014; Vitevitch & Rodríguez, 2005; Vitevitch & Stamer, 2006, 2009), both of which run opposite to the corresponding effects in English. Neighborhood effects have also been studied in Japanese, but the findings are mixed (Amano & Kondo, 2000; Yoneyama, 2002). Some cross-language discrepancies have been attributed to language-specific properties of neighborhood structure, word structure, and morphology. For instance, Vitevitch & Stamer (2006) point out that Spanish words carry more word-final inflections (indicating word class, gender, and tense) than English words. Phonological neighbors in Spanish therefore tend to be morphologically related and to share the onset, which makes them strong competitors in a production task (O'Seaghdha & Marin, 2000) and thus inhibits the production process.
In the current study, we focus on Mandarin Chinese, a language that is typologically distant from English. An immediate challenge for research on Chinese phonological neighborhoods is the definition of a phonological neighbor. The widely cited one-phoneme difference rule does not take tone into account, even though 70% of the world's languages are tonal (Yip, 2002). Malins & Joanisse (2010, 2012) studied the influence of tonal and segmental information in spoken word recognition using eye-tracking and event-related potentials. They concluded that tones and segments are processed simultaneously in Chinese word recognition, although previous studies showed that segments were processed earlier than lexical tones (H.-C. Chen & Cutler, 1997; Ye & Connine, 1999). Whatever their relative timing, it is widely agreed that tones play an important role in spoken word recognition in Mandarin Chinese. A major goal of the current study is to explore the structure of phonological neighborhoods in Mandarin Chinese, in particular the role of tone in the notion of a phonological neighbor.
The second challenge concerns homophones. Under the one-phoneme difference rule, homophones, i.e., words with identical pronunciations (e.g., time and thyme), are not phonological neighbors, although intuitively homophones are phonologically more similar than neighbors that differ by one phoneme (see Gahl, 2008, for a discussion of the phonetic realization of homophones). The issue of homophones is negligible for the English phonological neighborhood model because the English lexicon contains few homophones. The Chinese lexicon, by contrast, has a vast number of homophones: its building blocks are monosyllabic tonal morphemes that can be combined into longer compound words, and because the set of legal monosyllables is relatively small (~1200 attested tonal syllables), homophonic morphemes are very dense. According to Su & Lin (2006), a tonal monosyllable in Mandarin Chinese is associated with 8.31 morphemes (characters) on average. The second goal of the current study is to investigate the effects of homophones in spoken word recognition and to compare them with the effects of phonological neighbors.
A few previous studies have examined neighborhood and homophone effects in the processing of monosyllabic items in Mandarin Chinese. Tsai (2007) found an inhibitory effect of neighborhood density in an auditory naming task: high-density words were responded to more slowly than low-density words. Note, however, that Tsai (2007) defined phonological neighbors with the one-phoneme difference rule, which ignores tone. More recently, Neergaard & Huang (2016) attempted to replicate Tsai's auditory word naming study and tested a number of ways of defining Mandarin phonological neighborhoods for their effects on spoken word recognition. Neergaard and Huang claimed that the best-fitting neighborhood density measures were those that incorporated both tone and segments, but the validity of this claim is seriously challenged by several issues in their experimental design and data analysis (for example, it is not clear what criteria were used to select their stimulus set of 210 monosyllabic Mandarin words).
As for homophone effects, Wang, Li, Ning, & Zhang (2012) found an inhibitory effect of homophone density (i.e., the number of homophones of a target word) in an auditory lexical decision task with monosyllabic Mandarin words: words with more homophones elicited longer reaction times. Similar results are reported in Zeng & Mattys (2011).
In this paper, we report an auditory lexical decision experiment in Mandarin Chinese using an exhaustive set of more than 1200 monosyllabic morphemes. As in Neergaard & Huang (2016), we tested multiple neighborhood density measures for their power to predict the speed and accuracy of lexical processing. To preview the results, we found an inhibitory effect of neighborhood density and a facilitative effect of neighbor frequency on processing speed, as well as a facilitative effect of homophone density on processing accuracy. The best-fitting neighborhood metrics were defined by the one-phoneme/tone difference rule, under which any two syllables that differ in only one phoneme or in tone count as phonological neighbors. The findings are discussed in the framework of the lexicon model proposed by Zhou & Marslen-Wilson (1994, 2009).

Method.
2.1. PARTICIPANTS. Seventy-eight right-handed native Mandarin speakers (49 F, 29 M; mean age = 23.4 years, SD = 4.26), born and raised in Mainland China, participated in the experiment. None of the participants reported any speech or hearing problems.
2.2. STIMULI. The stimuli consisted of 1258 real monosyllables (critical items) and 761 pseudo-syllables (fillers). The real syllables formed an exhaustive set of all attested monosyllables in Mandarin Chinese, i.e., all syllables that are associated with at least one morpheme (Chinese character). The filler pseudo-syllables comprised two types of unattested but phonotactically possible monosyllables: syllables whose segmental composition is attested in a different tone ("tonal gaps", N = 353, e.g., [pan2] ban2) and syllables whose segmental composition is not attested in any tone ("segmental gaps", N = 408, e.g., [lun1] lun1).
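The distinction between the two filler types can be made concrete with a small lookup. The sketch below is only an illustration, not the procedure used to build the stimulus set: it assumes that a syllable is represented as a (segments, tone) pair and that candidate pseudo-syllables have already been checked for phonotactic legality, a step the sketch does not implement. The toy lexicon and the function name `classify_syllable` are hypothetical.

```python
def classify_syllable(candidate, attested):
    """Classify a (segments, tone) candidate against a set of attested tonal syllables.

    Returns "real" if the exact tonal syllable is attested, "tonal gap" if its
    segments are attested only in other tones, and "segmental gap" otherwise.
    Phonotactic legality is assumed to have been checked beforehand.
    """
    if candidate in attested:
        return "real"
    segments, _tone = candidate
    attested_segments = {seg for seg, _ in attested}
    if segments in attested_segments:
        return "tonal gap"
    return "segmental gap"


# Hypothetical toy lexicon: "ban" is attested in tones 1, 3, and 4 only.
ATTESTED = {("ban", 1), ("ban", 3), ("ban", 4), ("ma", 1), ("ma", 3)}

print(classify_syllable(("ban", 2), ATTESTED))  # tonal gap
print(classify_syllable(("ma", 1), ATTESTED))   # real
print(classify_syllable(("bou", 1), ATTESTED))  # segmental gap
```

With the full set of 1258 attested tonal syllables as `attested`, every unattested candidate falls into exactly one of the two gap types.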
All stimuli were recorded by a female native Mandarin speaker in an acoustically treated room using a unidirectional microphone routed to Digidesign. The stimuli had a mean duration of 626 ms (SD = 100) and were normalized to an intensity of 70 dB using Praat (Boersma & Weenink, 2010). To keep experimental sessions to a reasonable duration, the real-syllable stimuli were evenly divided into six blocks; each block also contained an equal number of randomly drawn pseudo-syllables (half tonal gaps and half segmental gaps). Each participant completed only one block, and each block was presented to 13 participants. To evaluate cross-block consistency, 12 items were shared among all blocks.
2.3. PROCEDURE. The auditory lexical decision experiment was run with E-Prime 2.0 (Schneider, Eschman, & Zuccolotto, 2007) on a Lenovo laptop connected to headphones and a Chronos response box. On each trial, a fixation cross first appeared at the center of the screen for 500 ms, after which an auditory stimulus was presented. Participants were instructed to judge, as quickly and accurately as possible, whether the spoken stimulus could be associated with a legal Chinese character, and to indicate their judgment by pressing the appropriate key on the Chronos response box. If no response was recorded within 4000 ms of stimulus onset, the session proceeded to the next trial.
All experimental sessions were conducted in a sound-attenuated room. Each session began with 30 practice items (with feedback), followed by one block of test items presented in random order (without feedback). Response time (RT) was measured from the onset of the stimulus to the participant's button press. Most sessions lasted less than 30 minutes.
2.4. ANALYSIS. Lexical measures of the stimuli (syllable frequency, neighborhood density, neighbor frequency, and homophone density) were obtained from the SUBTLEX-CH corpus (Cai & Brysbaert, 2010) and from the neighborhood metrics database based on SUBTLEX-CH (Neergaard, Xu, & Huang, 2016).¹ Neergaard et al. calculated neighborhood metrics under a number of different schemas. All schemas defined phonological neighbors with a one-difference rule but differed in how syllables were segmented and in whether tone was counted as a difference. In the current study, we chose four sets of neighborhood metrics from Neergaard et al.'s database, contrasted by whether syllables were segmented into phonemes or into larger components (onset + rime) and by whether tone was included when counting differences. Each set consisted of a neighborhood density measure (i.e., the number of neighbors) and a neighbor frequency measure (i.e., the sum of all neighbors' usage frequencies). Table 1 lists the names of the neighborhood measures. Table 2 summarizes the distribution of all lexical measures in the current set of syllable stimuli.
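To make the neighbor definitions concrete, the following sketch computes a neighborhood density (number of neighbors) and a neighbor frequency (summed frequency of neighbors) under a one-segment/tone difference rule. It is a minimal illustration, not Neergaard et al.'s implementation, and it rests on assumptions: each syllable is already segmented into a tuple of phonemes plus a tone; a single phoneme substitution, insertion, or deletion counts as one difference (as in Luce & Pisoni's one-phoneme rule); and a tone change between syllables with identical segments also counts as one difference. The lexicon, its frequencies, and the function names are hypothetical.

```python
def differs_by_one_phoneme(a, b):
    """True if phoneme tuples a and b differ by exactly one substitution, insertion, or deletion."""
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Deleting exactly one phoneme from the longer tuple must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def is_seg_tone_neighbor(s1, s2):
    """One-segment/tone rule (assumed): one phoneme difference with the same tone,
    or a tone difference with identical segments."""
    (p1, t1), (p2, t2) = s1, s2
    if t1 == t2:
        return differs_by_one_phoneme(p1, p2)
    return p1 == p2


def neighborhood_measures(lexicon):
    """Map each syllable to (ND, NF): neighbor count and summed neighbor frequency."""
    measures = {}
    for syl in lexicon:
        neighbors = [o for o in lexicon if o != syl and is_seg_tone_neighbor(syl, o)]
        measures[syl] = (len(neighbors), sum(lexicon[n] for n in neighbors))
    return measures


# Hypothetical toy lexicon: (phoneme tuple, tone) -> frequency per million.
LEXICON = {
    (("m", "a"), 1): 100,
    (("m", "a"), 3): 50,
    (("b", "a"), 1): 20,
    (("m", "a", "n"), 1): 10,
    (("a",), 1): 5,
    (("m", "i"), 2): 7,
}

measures = neighborhood_measures(LEXICON)
print(measures[(("m", "a"), 1)])  # (4, 85): ma3 by tone; ba1, man1, a1 by one phoneme
print(measures[(("m", "i"), 2)])  # (0, 0): differs from ma1 in both a phoneme and tone
```

Because a syllable with identical segments and tone is a single entry in this lexicon, homophones never count as each other's neighbors, in line with the one-difference rule discussed in the introduction. The models reported here used log-transformed neighbor frequency, so the raw sums above would still need to be logged. The pairwise comparison is quadratic in lexicon size, which is unproblematic for 1258 tonal syllables.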

Table 1. Names of the neighborhood measures.
Neighborhood definition | Neighborhood density | Neighbor frequency
Syllable segmented into phonemes. One-segment/tone difference rule. | ND_SegT | NF_SegT
Syllable segmented into components (onset + rime). One-component difference rule. | ND_CompT | NF_CompT

Figure 1 shows the mean neighborhood density under each schema for each syllable length among the real syllables. As in English, shorter syllables tend to have more neighbors than longer syllables under all four schemas. To control for the correlation between syllable length and neighborhood density, the current analysis focuses on three-phoneme, consonant-initial (CXX) syllables, which comprise about half of the items (625 real syllables, 378 pseudo-syllables).
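The length breakdown behind Figure 1 and the restriction to CXX syllables amount to a grouping and a filter. The sketch below assumes that neighborhood densities have already been computed and stored per (phoneme tuple, tone) key; the vowel set used to decide whether a syllable is consonant-initial is a simplified, hypothetical inventory rather than the authors' segmentation, and the example densities are made up.

```python
from collections import defaultdict
from statistics import mean

# Simplified, hypothetical vowel inventory used only to test the first phoneme.
VOWELS = {"a", "o", "e", "i", "u", "ü"}


def mean_density_by_length(densities):
    """Average neighborhood density per syllable length (in phonemes)."""
    by_length = defaultdict(list)
    for (phonemes, _tone), nd in densities.items():
        by_length[len(phonemes)].append(nd)
    return {length: mean(values) for length, values in sorted(by_length.items())}


def is_cxx(phonemes):
    """True for three-phoneme syllables whose first phoneme is a consonant."""
    return len(phonemes) == 3 and phonemes[0] not in VOWELS


# Hypothetical densities keyed by (phoneme tuple, tone).
DENSITIES = {
    (("a",), 1): 10,
    (("m", "a"), 1): 8,
    (("m", "a"), 3): 6,
    (("m", "a", "n"), 1): 3,
    (("a", "n", "g"), 2): 2,
}

print(mean_density_by_length(DENSITIES))  # {1: 10, 2: 7, 3: 2.5}
cxx_items = [syl for syl in DENSITIES if is_cxx(syl[0])]
print(cxx_items)  # [(('m', 'a', 'n'), 1)]
```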
Figure 1. Mean neighborhood density by neighborhood definition and syllable length.
Two types of mixed-effects models were built: linear regression models for (log) response times and logistic regression models for accuracy. In the models of critical real-syllable trials, the fixed effects were neighborhood density, (log) neighbor frequency, (log) homophone density (HomoD), (log) syllable frequency (SyllFreq), (log) syllable duration (SyllDur), and lexical tone (Tone). The models of filler pseudo-syllable trials excluded homophone density and syllable frequency, neither of which is defined for pseudo-syllables, and added pseudo-syllable type (ItemType) as a fixed-effect predictor. Each model was built in four versions, each using only one of the four sets of neighborhood measures, and the effectiveness of the neighborhood measures was evaluated by comparing the different versions of the same model. All models had random intercepts for participant and item. Numerical variables were log-transformed (where necessary to approximate a normal distribution) and centered before being entered into the models. All analyses were conducted with the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) in R (R Core Team, 2014).
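For readers less familiar with lme4, the real-syllable models described above can be written out as follows. This is our reading of the description, not a reproduction of the authors' code. It assumes natural logarithms; a tilde marks a centered predictor; $i$ indexes participants and $j$ items; and Tone enters through a vector of contrast-coded indicators $\mathbf{T}_j$ whose coding scheme is not specified in the text. Ignoring centering, the RT model corresponds roughly to an lme4 formula such as `log(RT) ~ ND + log(NF) + log(HomoD) + log(SyllFreq) + log(SyllDur) + Tone + (1 | participant) + (1 | item)` fitted with `lmer()`, and the accuracy model to the same right-hand side fitted with `glmer(..., family = binomial)`.

```latex
\begin{aligned}
\log \mathrm{RT}_{ij} &= \beta_0 + \beta_1 \widetilde{\mathrm{ND}}_j + \beta_2 \widetilde{\log \mathrm{NF}}_j
  + \beta_3 \widetilde{\log \mathrm{HomoD}}_j + \beta_4 \widetilde{\log \mathrm{SyllFreq}}_j
  + \beta_5 \widetilde{\log \mathrm{SyllDur}}_j + \boldsymbol{\gamma}^{\top}\mathbf{T}_j
  + u_i + w_j + \varepsilon_{ij},\\
\operatorname{logit} \Pr(\mathrm{correct}_{ij} = 1) &= \beta_0' + \beta_1' \widetilde{\mathrm{ND}}_j + \beta_2' \widetilde{\log \mathrm{NF}}_j
  + \beta_3' \widetilde{\log \mathrm{HomoD}}_j + \beta_4' \widetilde{\log \mathrm{SyllFreq}}_j
  + \beta_5' \widetilde{\log \mathrm{SyllDur}}_j + \boldsymbol{\gamma}'^{\top}\mathbf{T}_j + u_i' + w_j',\\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \quad w_j \sim \mathcal{N}(0, \sigma_w^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In the pseudo-syllable models, the HomoD and SyllFreq terms drop out and an ItemType indicator (tonal gap vs. segmental gap) is added. Each model is fitted four times, swapping in one (ND, NF) pair at a time, so the four versions differ only in $\beta_1$ and $\beta_2$.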

Results.
As shown in Table 3, overall performance in the lexical decision task was slower (mean RT = 1054 ms) and less accurate (mean accuracy = 84.8%) than is typical of lexical decision in other languages (e.g., English). This is probably due to two factors: (1) the large numbers of phonological neighbors and homophones in Chinese may heighten competition in lexical processing, and (2) Chinese participants may be less familiar with processing monosyllabic morphemes, as the majority of Chinese words are di- or trisyllabic compounds.
Two types of outliers were removed from the analysis: (1) items with extremely low accuracy (<20%) and (2) trials with extremely short or long RTs (more than 2 SD from the mean RT). Altogether, about 5% of the trials were excluded. RT analyses were conducted only on correctly answered trials, whereas accuracy analyses included all valid trials.
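The exclusion criteria can be expressed as two filters applied in sequence. The sketch below is one possible reading, with several assumptions that the text does not settle: item accuracy is computed over all of an item's trials, the 2-SD cutoff uses the grand mean and sample SD of all remaining trials (rather than per-participant statistics), items are filtered before trials, and trials are plain dictionaries with hypothetical keys `item`, `rt`, and `correct`.

```python
from collections import defaultdict
from statistics import mean, stdev


def exclude_outliers(trials, min_item_accuracy=0.20, sd_cutoff=2.0):
    """Apply the two exclusion criteria and return (valid_trials, rt_trials).

    valid_trials feed the accuracy analysis; rt_trials keep only the correct
    responses among the valid trials, for the RT analysis.
    """
    # 1. Drop items whose accuracy falls below the threshold.
    outcomes = defaultdict(list)
    for t in trials:
        outcomes[t["item"]].append(t["correct"])
    kept_items = {item for item, c in outcomes.items() if sum(c) / len(c) >= min_item_accuracy}
    kept = [t for t in trials if t["item"] in kept_items]

    # 2. Drop trials more than sd_cutoff SDs from the mean RT (grand mean assumed).
    rts = [t["rt"] for t in kept]
    m, s = mean(rts), stdev(rts)
    valid = [t for t in kept if abs(t["rt"] - m) <= sd_cutoff * s]

    # RT analysis uses correct trials only; accuracy analysis uses all valid trials.
    rt_trials = [t for t in valid if t["correct"]]
    return valid, rt_trials


# Toy data: item "A" has one extreme RT; item "B" is never answered correctly.
trials = [{"item": "A", "rt": 1000, "correct": True} for _ in range(8)]
trials.append({"item": "A", "rt": 1000, "correct": False})
trials.append({"item": "A", "rt": 3000, "correct": True})
trials += [{"item": "B", "rt": 900, "correct": False} for _ in range(5)]

valid, rt_trials = exclude_outliers(trials)
print(len(valid), len(rt_trials))  # 9 8
```

The order of the two filters matters, because removing items changes the mean and SD used for the RT cutoff.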
3.1. RT IN REAL-SYLLABLE TRIALS. The models of RT in real-syllable trials included 12868 trials of 554 syllable types. When neighborhood measures based on the one-segment/tone difference rule (ND_SegT, NF_SegT) were used, the model showed a positive effect of neighborhood density (β = 0.0023, t = 2.5, 95% CI = [0.0005, 0.004]; see Table 4 for the model summary) and a negative effect of neighbor frequency (β = -0.012, t = -3.0, 95% CI = [-0.02, -0.004]). In other words, all else being equal, real syllables with more neighbors were responded to more slowly, but real syllables with more frequent neighbors were responded to faster. There was no effect of homophone density (β = 0.0013, t = 0.3, 95% CI = [-0.009, 0.01]) in this model. When alternative neighborhood measures were used, neighbor frequency had a similar, significant negative effect (|t| > 2.5) in two of the three alternative models, but neighborhood density was not significant in any of the alternative models (|t| < 2; see Table 5). Homophone density was not significant in any model (all |t| < 1). The control effects were as predicted: syllables with higher usage frequency were responded to faster; syllables with longer durations were responded to more slowly; and T3 syllables were responded to faster than average, probably because of an earlier recognition point.
In the accuracy models for real-syllable trials, a significant positive effect of homophone density was observed in three of the four models (p < .005), in addition to a significant effect of syllable frequency in all four models (all p < .001). That is, the more homophones a syllable is associated with, and the more frequent these homophones are, the more likely the syllable is to be correctly classified as "real". As an example, the fixed-effects terms of the model with ND_SegT and NF_SegT are summarized in Table 6; critical fixed effects in the alternative models are summarized in Table 7. Taken together, for the processing of real syllables in the lexical decision task, neighborhood measures mainly affect the speed of classification, whereas homophone density affects its accuracy. Among the four sets of neighborhood metrics, ND_SegT and NF_SegT is the only set that shows an inhibitory effect of neighborhood density; none of the other three sets showed a significant relationship between neighborhood density and response time.
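Because the RT models predict log response time, the coefficients reported in Section 3.1 for the model with ND_SegT and NF_SegT are easiest to read as multiplicative effects. The worked example below assumes natural logarithms (the default of R's `log()`), uses the overall mean RT of 1054 ms only as a rough reference point, and treats each effect as a partial effect with all other predictors held constant. The last two lines check that the reported t values and confidence intervals are mutually consistent.

```latex
\begin{aligned}
\text{ND, one extra neighbor:}\quad & e^{0.0023} \approx 1.0023 \;\Rightarrow\; +0.23\%,\ \ 1054 \times 0.0023 \approx 2.4\ \text{ms}\\
\text{ND, ten extra neighbors:}\quad & e^{10 \times 0.0023} = e^{0.023} \approx 1.0233 \;\Rightarrow\; \approx +24.5\ \text{ms}\\
\text{NF, tenfold increase:}\quad & e^{-0.012 \ln 10} = e^{-0.0276} \approx 0.973 \;\Rightarrow\; \approx -2.7\%,\ \approx -28.7\ \text{ms}\\
\text{ND, CI check:}\quad & \mathrm{SE} \approx 0.0023 / 2.5 \approx 0.00092,\ \ 0.0023 \pm 1.96 \times 0.00092 \approx [0.0005,\ 0.0041]\\
\text{NF, CI check:}\quad & \mathrm{SE} \approx 0.012 / 3.0 = 0.004,\ \ -0.012 \pm 1.96 \times 0.004 \approx [-0.020,\ -0.004]
\end{aligned}
```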

RT AND ACCURACY IN PSEUDO-SYLLABLE TRIALS.
Three of the four models of RT in pseudo-syllable trials (11269 trials, 314 items) revealed a strong positive effect of neighborhood density on response time (all |t| > 3). In other words, pseudo-syllables with many real-syllable neighbors took longer to respond to, as more effort was needed to suppress the co-activated real syllables that interfered with the correct "non-word" response. Neighbor frequency was not a significant predictor in any model (all |t| < 1.5). There was also a strong effect of item type: tonal gaps elicited significantly longer response times than segmental gaps (all |t| > 4), which is consistent with interference from real-syllable neighbors that differ only in tone. Table 8 summarizes a sample model with ND_SegT and NF_SegT; Table 9 summarizes the critical effects in the models with alternative neighborhood measures.
The inhibitory effect of neighborhood density for pseudo-syllable trials was also found in the accuracy models (12840 trials, 314 syllable types). In all four models, neighborhood density had a significant negative effect on response accuracy (all p < .001). No neighbor frequency effect was found in any model (all p > .01). Consistent with the RT models, tonal gaps elicited more errors (i.e., lower accuracy) than segmental gaps. Table 10 summarizes the model with ND_SegT and NF_SegT; Table 11 lists the critical effects in all the alternative models.

In sum, our results showed that neighborhood density played an overall inhibitory role in auditory lexical decision. Real monosyllables from dense neighborhoods had longer response times than those from sparse neighborhoods; pseudo-monosyllables with more real-syllable neighbors had both longer response times and lower accuracy than those with fewer neighbors. Neighbor frequency, on the other hand, had a curious facilitative effect on response times for real monosyllables but was not significant in any other model. Among the four sets of neighborhood metrics tested in the current study, ND_SegT and NF_SegT, defined by the one-phoneme/tone difference rule, turned out to be the most sensitive to lexical decision performance. Homophone density was tested only in the models for real syllables. Unlike previous studies (e.g., Wang et al., 2012), we found a facilitative effect of homophone density on response accuracy: real monosyllables associated with more homophonic morphemes (characters) were recognized more accurately than those associated with fewer.

Discussion
The current study aimed to investigate the effects of phonological neighborhood measures (neighborhood density and neighbor frequency) and homophone density on spoken word recognition in Mandarin Chinese. Our results showed that all three variables were, to some extent, predictive of Mandarin-speaking participants' performance in auditory lexical decision. Specifically, we found both inhibitory and facilitative effects of the neighborhood measures: high neighborhood density lengthened response times (for both real and pseudo-syllables), whereas high neighbor frequency predicted faster responses (for real syllables only). The coexistence of inhibitory and facilitative neighborhood effects on response time can be explained by the nature of the lexical decision task. On each trial, the participant only needs to make a binary decision about whether the syllable is "real", without having to identify which syllable they heard. Thus, when the target response is "real", the participant may arrive at a correct "real" response via two routes: (a) the target syllable is highly activated and correctly identified, or (b) a non-target but real syllable is highly activated. Phonological neighbors may impede route (a) by introducing more competition during recognition and thus slow down the response; at the same time, they may facilitate route (b) and thus speed up the response, because a highly activated neighbor is itself a real syllable. This also explains why the facilitative effect of neighbors on response time was found only for real syllables and not for pseudo-syllables: when the target response is "not real", route (b) results in a wrong response, which is in turn excluded from the response time analysis.
When compared with previous reports of Mandarin auditory lexical decision such as Wang et al. (2012), the current results present two apparent discrepancies. First, the current participants' performance was much slower (mean RT = 1054 ms) and less accurate (mean accuracy = 84.8%) than in Wang et al.'s study (mean RT around 800 ms; mean accuracy over 94%). Second, Wang et al. found an inhibitory effect of homophone density on both response time and accuracy, whereas the current study found a null effect on RT and a facilitative effect on accuracy. The first discrepancy can be explained by the fact that Wang et al.'s spoken stimuli were much shorter (mean duration = 430 ms) than those in the current study (mean duration = 626 ms), which would result in an earlier recognition point and thus shorter response times. Furthermore, Wang et al. used only high-frequency syllables, which are known to elicit faster and more accurate responses. As for the second discrepancy, we can speculate about a few potential sources, although we cannot offer a definitive answer for now. For example, the inclusion of low-frequency syllables may have made the current task much more difficult than Wang et al.'s, encouraging the current participants to adopt a task-specific strategy that trades speed for accuracy. With accuracy prioritized over speed, syllables associated with more homophonic morphemes (characters) are more likely to be correctly recognized than those with fewer homophones. After all, an inhibitory effect of homophone density on recognition accuracy (i.e., more homophones leading to more incorrect "non-word" responses) is somewhat counterintuitive, unless we assume that participants set a fast pace through the experiment, as they probably did in Wang et al.'s task, and could not wait for the initial competition among homophone mates to be resolved.
Among the four sets of neighborhood metrics we tested, ND_SegT and NF_SegT turned out to be the most predictive. The inhibitory effect of neighborhood density on response time for real syllables (observed only with ND_SegT) is consistent with the more widely observed inhibition from neighborhood density for pseudo-syllables. Thus, the current results provide evidence that the one-phoneme/tone difference rule defines Chinese phonological neighborhoods more accurately than the other three alternatives.
The general findings on neighborhood effects can be interpreted in Zhou and Marslen-Wilson's (1994, 2009) model of the Chinese lexicon. During a real-syllable trial in the auditory lexical decision task, the acoustic input simultaneously activates a number of monosyllables at the phonological form level, including both the target syllable and its phonological neighbors, which are one phoneme or tone away from the target. Each activated syllable in turn passes activation to the morphemes (and characters) it is associated with. When the activation of one morpheme (character) exceeds a certain threshold, a "real syllable" decision is reached. During recognition, the co-activated phonological neighbors act as competitors of the target syllable, suppressing the activation of the target syllable (and its associated morphemes) and delaying the response. Meanwhile, if there are very strong competitors (e.g., high-frequency neighbors), they may quickly accumulate enough activation to trigger a coincidentally correct "real syllable" response.
To conclude, the current study found complex effects of phonological neighborhoods (both inhibitory and facilitative) and a facilitative effect of homophone density in spoken word recognition in Mandarin Chinese. It should be noted that the observed effects may be specific to the current task (auditory lexical decision) and experimental design. Our future research will further explore the effects of phonological neighbors and homophones in spoken word recognition in Mandarin Chinese, using different perceptual tasks and focusing more on the possible differences between strong and weak neighbors (Chen & Mirman, 2012). In doing so, we hope to achieve a more comprehensive understanding of the roles of similar-sounding words in lexical processing.

Table 2. Summary of lexical measures for all real-syllable items. All frequency-related measures (syllable frequency and all NF_ measures) are expressed as occurrences per million words.

Table 4. Summary of fixed effects in the RT model for real-syllable trials, with ND_SegT and NF_SegT as neighborhood metrics.

Table 5. Summary of neighborhood effects in the RT models for real-syllable trials, with alternative neighborhood metrics.

3.2. ACCURACY IN REAL-SYLLABLE TRIALS. Overall, the models of accuracy in real-syllable trials (14495 trials, 554 items) revealed no significant effects of either neighborhood density or neighbor frequency, regardless of which neighborhood metrics were used (all p > .01).

Table 6. Summary of fixed effects in the accuracy model for real-syllable trials, with ND_SegT and NF_SegT as neighborhood metrics.

Table 7. Summary of neighborhood effects in the accuracy models for real-syllable trials, with alternative neighborhood metrics.

Table 8. Summary of fixed effects in the RT model for pseudo-syllable trials, with ND_SegT and NF_SegT as neighborhood metrics.

Table 9. Summary of neighborhood effects in the RT models for pseudo-syllable trials, with alternative neighborhood metrics.

Table 10. Summary of fixed effects in the accuracy model for pseudo-syllable trials, with ND_SegT and NF_SegT as neighborhood metrics.

Table 11. Summary of neighborhood effects in the accuracy models for pseudo-syllable trials, with alternative neighborhood metrics.