Lexical ambiguity and acoustic distance in discrimination

This work presents a perceptual study on how acoustic details and knowledge of the lexicon influence discrimination decisions. English-speaking listeners were less likely to identify phonologically matching items as the same when they differed in vowel duration, but differences in mean F0 did not have an effect. Although both are components of English contrasts, the results only provide evidence for attention to vowel duration as a potentially contrastive cue. Lexical ambiguity was a predictor of response time. Pairs with matching duration were identified more quickly than pairs with distinct duration, but only among lexically ambiguous items, indicating that lexical ambiguity mediates attention to acoustic detail. Lexical ambiguity also interacted with neighborhood density: Among lexically unambiguous words, the proportion of ‘same’ responses decreased with neighborhood density, but there was no effect among lexically ambiguous words. This interaction suggests that evaluating phonological similarity depends more on lexical information when the items are lexically unambiguous.

1. Introduction. In perceptual tasks, how does acoustic distance in different characteristics influence discrimination, and how might that interact with lexical ambiguity? Listeners can be sensitive to acoustic distance within phonological categories (Liberman et al. 1957; Pisoni & Tash 1974), though much of the work on acoustic distance in discrimination focuses on just a few characteristics, particularly VOT. It is unclear whether acoustic distance would have similar effects across different characteristics.
Lexical ambiguity can influence processing of words (e.g. Kellas et al. 1988;Borowsky & Masson 1996), but most studies on ambiguity effects use orthographic stimuli, leaving open questions about how lexical ambiguity influences perception of acoustic input. Sanker (2019) demonstrates differences in responses to acoustically presented pairs of lexically ambiguous words and lexically unambiguous words; some differences can be attributed to acoustic differences between homophone mates in production (cf. Guion 1995;Gahl 2008), though other differences suggest an effect of ambiguity itself in how an acoustic stimulus is evaluated.
This work presents a perceptual study on how sub-phonemic details and knowledge of the lexicon influence decisions in a discrimination task. English-speaking listeners were less likely to identify phonologically matching paired items as the same when they differed in vowel duration, but differences in mean F0 did not have an effect. Distance in duration also had an effect on response time, but only among lexically ambiguous items, suggesting that lexical ambiguity mediates attention to acoustic detail.
1.1 PRODUCTION OF ACOUSTIC DETAIL. Despite their matching phonological identity, homophone mates can exhibit relatively consistent phonetic differences in production due to factors such as lexical frequency (e.g. Guion 1995;Gahl 2008) and part of speech (e.g. Sorensen et al. 1978;Conwell 2017). However, differences are found most reliably in natural speech (e.g. Gahl 2008;Lohman 2017), and can disappear when words are produced outside of their natural contexts. Differences correlated with lexical frequency can largely be eliminated by producing words in isolation or in frame sentences rather than in meaningful sentences (Guion 1995). Similarly, effects of part of speech can largely be explained as the result of prosodic structure, because they can be eliminated by producing all words in the same position in the sentence or the phrase (Sorensen et al. 1978;Conwell 2017). Productions are also influenced by predictability in context; controlling for predictability can eliminate other factors as predictors of phonetic characteristics in natural speech (Jurafsky et al. 2002).
Even if phonetic differences across phonologically identical words are driven by predictability and similar factors, it is plausible that listeners might learn these differences. The existence of these differences in production does not necessarily indicate that homophone mates have different phonetic details in their representations, but some theories predict that listeners do associate phonetic details with particular items (e.g. Johnson 1997;Pierrehumbert 2002). In particular, Exemplar Theory posits that each token of a word enters the cloud of that word's representation; differences in phonetic convergence based on lexical frequency have been taken as evidence for these lexically-specific representations (e.g. Goldinger 1998;Babel 2010). However, other evidence from convergence indicates that phonological shifts are consistent across the same segment in different words (Nielsen 2011;Pardo et al. 2012), which suggests that phonetic details in the representation are at the phonological level rather than the lexical level.
1.2 PERCEPTION OF ACOUSTIC DETAIL. Listeners are sensitive to acoustic detail in perception. Response times are faster for discriminating between paired items that are more acoustically distinct (Pisoni & Tash 1974). Identification decisions about sounds that are closer to a category boundary are slower (Pisoni & Tash 1974) and less accurate (Liberman et al. 1957) than decisions about sounds that are more central within a category; greater activation with greater prototypicality is also reflected in stronger priming effects produced by stimuli that are phonetically prototypical within their categories (Andruski et al. 1994;Ju & Luce 2006). Eye tracking similarly reflects greater uncertainty about categorization with items closer to a category boundary (McMurray et al. 2002). However, much of the work on acoustic distance in discrimination uses VOT contrasts; it is unclear whether effects would be consistent across different characteristics.
Listeners can be sensitive to acoustic detail and homophone mates can exhibit acoustic differences, so can listeners distinguish between homophone mates? Sanker (2019) found that pairs of homophone mates (e.g. sun-son) are more likely to be identified as phonologically distinct than pairs of the same word (e.g. sun-sun) when the stimuli were extracted from meaningful sentences. This result likely reflects the greater acoustic distance between homophone mates than between pairs of different speakers' productions of the same word; this difference was largely apparent in vowel formants, though also found to a smaller degree in several other characteristics. When stimuli were made from productions in isolation, the two pair types did not differ from each other in acoustic distance between paired items or in response patterns, which further indicates that the discrimination results cannot be interpreted as evidence for distinct acoustic details in the lexical representations of homophone mates.
Even clearer evidence against homophone mates having distinct phonetic details in their representations is provided by lexical identification results; perceptual identification tasks requiring listeners to decide between homophone mates produce accuracy that is at chance or only marginally above chance (Bond 1973; Sanker 2019). Lexically ambiguous acoustic stimuli activate all of the associated homophone mates, which is demonstrated in priming of orthographic lexical decisions after exposure to ambiguous acoustic stimuli (Onifer & Swinney 1981; Grainger et al. 2001).
1.3 EFFECTS OF LEXICAL AMBIGUITY. Even if homophone mates cannot be distinguished from each other, lexical ambiguity seems to have an effect on lexical access and phonological processing. Lexical ambiguity in a discrimination task with acoustic stimuli produces slower responses and fewer identifications of phonologically matching items as being the same (Sanker 2019), which could indicate that listeners' awareness of ambiguity in the lexicon impedes their decisions about such items; with lexically unambiguous stimuli, they can approach the task lexically, while they cannot determine that lexically ambiguous stimuli match based on narrowing both down to being the same word. The pattern of responses to lexically ambiguous items could also indicate that this lexical uncertainty causes listeners to attend more closely to acoustic detail, because they are expecting differences.
Responses to lexically ambiguous words and lexically unambiguous words differ in a range of tasks. Some studies have found that responses are faster for homophones than nonhomophones in orthographic lexical decisions (e.g. Rubenstein et al. 1970;Jastrzembski 1981;Kellas et al. 1988;Borowsky & Masson 1996) and naming tasks (e.g. Hino et al. 2002), though other studies have found the opposite effect (e.g. Pexman et al. 2001). However, the apparent effects of lexical ambiguity might be indirect, due to characteristics which correlate with lexical ambiguity. Gernsbacher (1984) suggested that faster responses for lexically ambiguous words were the result of familiarity, as words with multiple meanings are often more familiar. However, experiments controlling for factors such as familiarity and frequency still found effects of lexical ambiguity in lexical decision tasks (e.g. Kellas et al. 1988;Borowsky & Masson 1996). Faster responses for lexically ambiguous words have been interpreted as the result of multiple lexical entries with the same phonological form all contributing to the activation of that phonological form (Jastrzembski 1981;Kellas et al. 1988).
The effects of lexical ambiguity in processing depend on whether the task motivates semantic processing or not, and whether the phonological advantage offsets the effects of semantic competition (Joordens & Besner 1994). The predicted results thus differ by task, with facilitation only when the response is consistent for all homophone mates, e.g. lexical decision and naming. In semantic decision tasks, homophones have a consistent disadvantage, at least when the meanings are in disagreement (Hino et al. 2002;Siakaluk et al. 2007); the disadvantage can be intensified by priming their homophone mates (Pylkkänen et al. 2006).
1.4 LEXICAL FREQUENCY AND NEIGHBORHOOD DENSITY. Words with homophones have higher frequency and greater neighborhood density on average than other words. While studies often control for frequency, they less often control for neighborhood density, resulting in different neighborhood densities for the lexically ambiguous and lexically unambiguous words (e.g. Rubenstein et al. 1970; Hino & Lupker 1996). As a result of this relationship, some of the effects of frequency and neighborhood density might appear to be effects of lexical ambiguity. It is additionally unclear whether effects of frequency and neighborhood density would be the same for lexically ambiguous words and for lexically unambiguous words; most studies do not look for an interaction.
A range of experiments demonstrate faster processing for high frequency words, usually with orthographic stimuli, e.g. for lexical decision (Stanners et al. 1975;Murray & Forster 2004) and semantic categorization (Monsell et al. 1989;Lewellen et al. 1993). The same effect is also found in picture naming (Oldfield & Wingfield 1965;Carroll & White 1973). Higher frequency words also have a processing advantage with acoustic stimuli; in lexically ambiguous items, the higher frequency meaning will be more strongly activated (Simpson & Burgess 1985;Binder & Rayner 1998). When stimuli are partially obscured by noise or contain segments manipulated to be acoustically ambiguous, listeners are often more likely to interpret them as higher frequency words (Howes 1957;Connine et al. 1993), though Samuel (1981) did not find an effect of word frequency in phoneme restoration.
In discrimination tasks, response times increase with higher neighborhood density (Vitevitch & Luce 1999;Luce & Large 2001); because there are more phonologically similar competitors, listeners take longer to determine whether paired forms from a dense neighborhood are distinct words or not. In contrast, high neighborhood density may facilitate reading tasks (Borowsky & Masson 1996;Mulatti et al. 2006), though this apparent effect might be better described as an effect of phonotactic probability, which is highly correlated with neighborhood density. In lexical decision tasks, neighborhood density does not seem to be a predictor of response latency (Borowsky & Masson 1996;Vitevitch & Luce 1999); lexical neighbors do not serve as competitors in such decisions.
2. Methods. 48 native speakers of American English (6 male; mean age 20.7) participated in the task and were paid for participation. Stimuli were pairs of monosyllabic English words, produced in isolation by two female native speakers of English. The juxtaposed items in each pair differed in speaker, to encourage phonological decisions rather than decisions about phonetic identity. The full list of stimulus words is given in Tables 3-4 at the end of the paper.
Among the phonologically matching pairs, there were two conditions of lexical ambiguity. All participants heard all of the words: (a) 48 ambiguous (e.g. made-made, cf. maid), and (b) 80 unambiguous (e.g. mud-mud). The lexically ambiguous and unambiguous groups were matched to have comparable mean neighborhood density. Due to lack of agreement in the literature about how to measure lexical frequency of homophones, frequency matching between the two categories was a compromise between having the mean frequency of the lexically unambiguous words match the mean combined frequency of the homophones (e.g. the frequency of /sʌn/ as the frequency of sun + the frequency of son), having it match the mean individual frequency of the homophones (the frequency of /sʌn/ as the mean of sun's frequency and son's frequency), or having it match the frequency of the higher frequency homophone mate (e.g. the frequency of /sʌn/ as the frequency of son). In analysis, the frequency of lexically ambiguous words was treated as being the frequency of the higher frequency homophone mate, based on evidence that acoustic stimuli elicit faster retrieval of a higher frequency homophone mate than its lower frequency mates (Simpson & Burgess 1985; Binder & Rayner 1998).
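The three candidate frequency measures described above can be sketched as follows. This is only an illustration: the function name, the dictionary keys, and the counts for sun and son are hypothetical and are not taken from the study or from any corpus.

```python
from statistics import mean


def homophone_frequency_measures(mate_frequencies):
    """Return three candidate frequency measures for one phonological form.

    mate_frequencies maps each homophone mate (by spelling) to its
    individual lexical frequency.
    """
    values = list(mate_frequencies.values())
    if not values:
        raise ValueError("at least one homophone mate is required")
    return {
        # Sum over all mates sharing the phonological form.
        "combined": sum(values),
        # Average of the individual frequencies of the mates.
        "mean_individual": mean(values),
        # Frequency of the most frequent mate (the measure used in analysis).
        "higher_mate": max(values),
    }


# Hypothetical counts, for illustration only.
example = {"sun": 120, "son": 200}
measures = homophone_frequency_measures(example)
print(measures)
```

For a lexically unambiguous word the three measures coincide, which is why the choice only matters for the lexically ambiguous group.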
There were two conditions of acoustic manipulations in the phonologically matching pairs. Each participant was assigned to a single acoustic manipulation condition: (a) F0: half of pairs were manipulated to have equal mean F0 and half differed by 70 Hz, and (b) vowel duration: half of paired items had equal vowel duration and half differed by 100 ms.
There were an equal number of phonologically matching pairs and phonologically distinct filler pairs, which had a single segmental contrast. For half of the participants, the difference was in onsets (e.g. pile-file), and for the other half of the participants, the difference was in codas (e.g. leaf-leave). None of the words that appeared in these filler pairs also appeared in a phonologically matching pair. Results are reported only for the matching pairs.
Participants heard the word pairs presented over headphones, separated by 200 ms of silence. Instructions on a computer screen asked listeners to identify each pair as either being the same or different. Responses were given with the left and right arrow keys on the keyboard; which side corresponded to 'same' and 'different' was balanced across listeners. The experiment was self-timed; the next trial began 500 ms after a response was given. Response times were measured from the beginning of the second word. Trials with response times greater than 5 s or less than 250 ms were excluded from analysis (1.9% of the data).
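The exclusion criteria and the log transformation used for the response time analysis could be implemented along the following lines. This is a minimal sketch: the function names are hypothetical, response times are assumed to be stored in seconds, the natural logarithm is assumed (the base is not stated), and the example values are invented.

```python
import math

MIN_RT_S = 0.250  # trials faster than 250 ms are excluded
MAX_RT_S = 5.0    # trials slower than 5 s are excluded


def keep_trial(rt_seconds):
    """True if a response time falls inside the analysis window."""
    return MIN_RT_S <= rt_seconds <= MAX_RT_S


def prepare_response_times(rts_seconds):
    """Drop out-of-window trials and log-transform the remainder.

    Returns the log response times and the share of excluded trials.
    """
    if not rts_seconds:
        raise ValueError("no trials supplied")
    kept = [rt for rt in rts_seconds if keep_trial(rt)]
    excluded_share = 1 - len(kept) / len(rts_seconds)
    log_rts = [math.log(rt) for rt in kept]  # natural log (assumption)
    return log_rts, excluded_share


# Invented response times in seconds, for illustration only.
log_rts, excluded_share = prepare_response_times([0.18, 0.62, 0.95, 5.4])
print(len(log_rts), excluded_share)
```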
The experiment was run in PsychoPy (Peirce 2007). All statistical results are from mixed effects models calculated with the lme4 package in R (Bates et al. 2015); p-values were calculated with the lmerTest package (Kuznetsova et al. 2015).
3. Results. Two aspects of responses are analyzed: The proportion of 'same' responses and the log response time. The former reflects perception of the two forms as phonologically the same or distinct, while the latter can capture processing factors that might not be reflected in listeners' ultimate decisions about the stimuli.
Table 1 presents a logistic mixed effects model for 'same' responses to the phonologically matching stimuli. The random effects were participant and word pair. The fixed effects were manipulation type (duration, F0), manipulation distance (close, further), lexical ambiguity (ambiguous, unambiguous), neighborhood density, the interaction between lexical ambiguity and neighborhood density, the interaction between lexical ambiguity and manipulation distance, and the interaction between manipulation type and manipulation distance.
There was a significant effect of manipulation type on responses. Participants in the duration manipulation condition were more likely to identify phonologically matching pairs as being the same word. There was also a significant interaction between manipulation type and manipulation distance, which was the primary source of the main effect of manipulation type. Listeners were more likely to identify the phonologically matching pairs as the same when they matched in vowel duration than when they had distinct vowel durations, but having matching or distinct F0 mean did not influence responses. The effects of manipulation type and manipulation distance are illustrated in Figure 1. Adding a three-way interaction between manipulation type, manipulation distance, and lexical ambiguity did not significantly improve the model (χ² = 4.27, df = 2, p = 0.118).
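Model comparisons of this kind are typically likelihood ratio tests, in which twice the difference in log-likelihood between nested models is referred to a chi-square distribution. The sketch below assumes that this is how the reported statistics were obtained; the function names are hypothetical, and only the closed forms for df = 1 and df = 2 (the two cases reported in this section) are handled.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2 * (loglik_full - loglik_reduced)


def chi2_upper_tail(statistic, df):
    """Upper-tail chi-square probability, closed forms for df 1 and 2 only."""
    if statistic < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        # Chi-square with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("only df = 1 and df = 2 are handled in this sketch")


# Statistic reported for the three-way interaction in the 'same' model.
p_three_way = chi2_upper_tail(4.27, 2)
print(round(p_three_way, 3))
```

Applied to the statistic reported above (χ² = 4.27, df = 2), this gives a p-value that rounds to the reported 0.118.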
There was a marginal effect of lexical ambiguity on responses when the interaction with neighborhood density was included, because the effect of neighborhood density differed between lexically unambiguous and lexically ambiguous words. Within lexically unambiguous words, listeners were less likely to identify words with higher neighborhood density as being the same. Among lexically ambiguous words, the effect was absent, as indicated by the opposite and nearly equal coefficient for the interaction between lexical ambiguity and neighborhood density. The effects of neighborhood density and lexical ambiguity are illustrated in Figure 2. Figure 3 demonstrates the small overall effect of lexical ambiguity.
Figure 2: 'Same' responses, by lexical ambiguity and neighborhood density
Including the position of the contrast (onsets or codas) that appeared in the phonologically distinct filler pairs did not improve the model (χ² = 0.0176, df = 1, p = 0.894). That is, the position of the phonological contrasts that listeners heard in phonologically distinct pairs did not influence their responses to phonologically matching pairs; listeners did not seem to develop expectations about the position where contrasts would appear, or at least did not develop expectations that influenced their decisions.
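The neighborhood density pattern described above can be made explicit with a simplified version of the fixed-effects part of the logistic model. The notation is an assumption introduced here for illustration: A is an indicator for lexical ambiguity (coded 1 for ambiguous and 0 for unambiguous, which assumes unambiguous words are the reference level), N is neighborhood density, and the remaining predictors and the random effects are omitted.

```latex
\begin{align*}
\operatorname{logit} P(\text{same}) &= \beta_0 + \beta_A A + \beta_N N + \beta_{AN} A N + \dots \\
\text{unambiguous } (A = 0):\quad
  \frac{\partial \operatorname{logit} P(\text{same})}{\partial N} &= \beta_N < 0 \\
\text{ambiguous } (A = 1):\quad
  \frac{\partial \operatorname{logit} P(\text{same})}{\partial N} &= \beta_N + \beta_{AN} \approx 0
  \quad \text{when } \beta_{AN} \approx -\beta_N
\end{align*}
```

Under this coding, an interaction coefficient that is opposite in sign and nearly equal in size to the neighborhood density coefficient corresponds to a slope of approximately zero for lexically ambiguous words.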
The model was not improved by adding lexical frequency (χ² = 0.557, df = 1, p = 0.455) or lexical frequency and an interaction between lexical frequency and lexical ambiguity (χ² = 2.82, df = 2, p = 0.244), even when neighborhood density was excluded from the model.
Table 2 presents a linear mixed effects model for log response times of responses to the phonologically matching stimuli. The random effects were participant and word pair. The fixed effects were manipulation type (duration, F0), manipulation distance (close, further), lexical ambiguity (ambiguous, unambiguous), neighborhood density, log lexical frequency, the interaction between lexical ambiguity and neighborhood density, the interaction between lexical ambiguity and lexical frequency, the interaction between lexical ambiguity and manipulation type, the interaction between lexical ambiguity and manipulation distance, the interaction between manipulation type and manipulation distance, and the three-way interaction between lexical ambiguity, manipulation type, and manipulation distance.
There was no main effect of manipulation type, manipulation distance, or lexical ambiguity on response time. However, there were interactions between them. The three-way interaction between manipulation type, manipulation distance, and lexical ambiguity was included because it significantly improved the fit, as compared to a model that parallels the one in Table 1 (χ² = 12.9, df = 2, p = 0.00159). Among the lexically ambiguous items, response times were faster with vowel duration contrasts than with F0 contrasts, and pairs with matching duration were identified more quickly than items with distinct durations.
Figure: Log response times, by lexical ambiguity, manipulation type, and manipulation distance
The fast responses to lexically ambiguous words with matching vowel duration produced both the significant interaction between manipulation type and lexical ambiguity and the significant interaction between manipulation type, manipulation distance, and lexical ambiguity. No other pair type exhibited a significant effect of manipulation distance on response time, as illustrated in Figure 4. There was no effect of neighborhood density on response time. It is included to parallel the model for 'same' responses presented in Table 1. Notably, the effect was similarly absent if lexical frequency was excluded from the model, so the lack of effect here cannot be attributed to a correlation between neighborhood density and lexical frequency.
Log lexical frequency was a significant predictor of response time. Responses were faster for higher frequency words, at least among lexically unambiguous words. For lexically ambiguous words, the effect was weaker, though the difference between the two was only marginally significant. The effect of lexical frequency on response time is presented in Figure 5. The result may indicate that lexical frequency is not a major influence on processing time for lexically ambiguous items. However, it is also possible that frequency was a weak predictor of response time for these items simply because the relevant measure of their frequency is something different from the measure used here. As described above, lexical frequency for lexically ambiguous items was treated as the frequency of the higher frequency homophone mate.
Including the position of the contrast (onsets or codas) that appeared in the phonologically distinct filler pairs did not improve the model (χ² = 0.813, df = 1, p = 0.367). The lack of effect of contrast position suggests that listeners did not develop expectations about the positions in which contrasts would appear; expecting contrasts earlier in the word would produce shorter response times for listeners in the onset-contrasts condition. It would be reasonable to expect the position of the contrast to interact with the manipulation type, as vowel duration is a cue to coda voicing contrasts; however, including an interaction between contrast position and manipulation type also did not improve the model (χ² = 0.337, df = 2, p = 0.845).
4. Discussion. The results demonstrate that effects of acoustic distance between phonologically matching stimuli are particular to the characteristic that is manipulated. They additionally suggest that lexical ambiguity influences processing in discrimination tasks, producing effects of neighborhood density and acoustic distance that are distinct from the effects with lexically unambiguous items.
Listeners were less likely to identify the phonologically matching pairs as the same when they had larger differences in vowel duration, but having matching or distinct F0 mean did not influence responses. This suggests that at least in the context set up by this experiment, English speakers were attending to vowel duration as a potential cue contributing to phonological contrasts, but did not similarly perceive F0 mean as a potential element of contrast. This result might be due to F0 functioning as a phonological cue in English only as part of particular F0 contours, or depending on co-occurrence with other cues such as differences in VOT.
Lexical ambiguity did not have an overall effect on response time or the proportion of 'same' responses. However, lexical ambiguity interacted with manipulation type and manipulation distance as predictors of response time. Among the lexically ambiguous items, response times were faster with duration contrasts than with F0 contrasts, and pairs with matching duration were identified more quickly than items with more distinct duration. Among lexically unambiguous items and items with an F0 contrast, the manipulation distance did not have a significant effect on response time. This result suggests that lexical ambiguity influences listeners' attention to detail; consistent with the interaction between manipulation type and manipulation distance as predictors of 'same' responses, increased attention to acoustic detail only impacted vowel duration. Greater attention to detail could result from the expectation that these items might be phonologically distinct based on having two competing lexical entries. With lexically ambiguous items, listeners are likely to activate two possible lexical entries that are consistent with the stimuli, and require more careful processing to evaluate whether or not they are phonologically distinct. However, the lack of effect on 'same' responses indicates that this additional processing does not change listeners' ultimate decisions based on the phonetic forms.
Neighborhood density interacted with lexical ambiguity in predicting 'same' responses. Among lexically ambiguous words, there was no evidence for an effect of neighborhood density on 'same' responses. In contrast, the proportion of 'same' responses with lexically unambiguous words decreased with neighborhood density. The result is consistent with lexical ambiguity forcing listeners to use a different process of evaluation. With lexically unambiguous items, listeners can identify paired items as matching based on narrowing down the lexical identity of both items to the same entry. On the other hand, with lexically ambiguous items, the lexical identity cannot be narrowed down to a single item, so listeners must evaluate phonological status without relying on how it aligns with the lexicon. This phonological evaluation strategy makes the existence of lexical neighbors less relevant. Given that there is actually a higher proportion of 'same' responses for lexically ambiguous items than for lexically unambiguous items at high neighborhood density, lexical information might not just aid decisions in sparse neighborhoods but also impede decisions in dense neighborhoods. However, there were not enough items in very high density neighborhoods to conclusively demonstrate the strength of this pattern at this end of the continuum.
It is notable that the interaction between neighborhood density and lexical ambiguity could create apparent effects of lexical ambiguity based on the neighborhood density of the words used in an experiment, even if there are no effects of lexical ambiguity itself. If a word set primarily contains words of high neighborhood density, the proportion of 'same' responses would be higher for lexically ambiguous words. If a word set primarily contains words of low neighborhood density, the proportion of 'same' responses for lexically ambiguous words would be lower, particularly if the neighborhood density in each group is not controlled, because lexically unambiguous words are likely to have lower mean neighborhood density than lexically ambiguous words.
However, neighborhood density was not a significant predictor of response time, in contrast to many previous studies that have found that response times increase with neighborhood density in discrimination tasks (e.g. Vitevitch & Luce 1999;Luce & Large 2001). It is possible that the set of stimuli in this experiment did not include a wide enough distribution to capture effects of neighborhood density on response time; studies on neighborhood density often select stimuli of very high neighborhood density and very low neighborhood density and test it as a binary effect rather than a continuous effect.
Listeners gave faster responses for higher frequency words. The effect of lexical frequency was primarily apparent among lexically unambiguous words, and much weaker among lexically ambiguous words. It is possible that frequency was a weak predictor of response time due to how frequency was measured. In this study, lexical frequency for lexically ambiguous items was treated as the frequency of the higher frequency homophone mate, based on previous work demonstrating that a higher frequency homophone mate is more rapidly retrieved than a lower frequency mate when listeners hear ambiguous acoustic stimuli (Simpson & Burgess 1985; Binder & Rayner 1998). However, the best way of measuring the frequency of homophones is debated in the literature. When orthographic stimuli or semantic context make it possible to distinguish between the frequency of each homophone mate, many studies support measurement of individual frequency, based on individual frequencies predicting response latency in tasks such as lexical decision (e.g. Grainger et al. 2001), gaze duration in reading (e.g. Binder & Rayner 1998), and picture naming (e.g. Caramazza et al. 2001). However, there is also evidence for combined frequency or the frequency of the higher-frequency homophone mate as the relevant measure, based on low-frequency homophones exhibiting patterns like their high-frequency homophone mates in speed of translation and picture naming (e.g. Jescheniak & Levelt 1994; Antón-Méndez et al. 2012) and low susceptibility to production errors (e.g. Dell 1990). It is possible that none of these measures of lexical frequency would alone be sufficient to explain the frequency-related behavior of homophones.

5. Conclusions. This study demonstrates that listeners' sensitivity to within-category acoustic distance is particular to the characteristic being manipulated. Only vowel duration influenced responses in this task, though both vowel duration and F0 are components of English contrasts. The effect of vowel duration on discrimination suggests that listeners attend to it as a potentially contrastive cue, while mean F0 might function as a cue only as part of differences in F0 contour or in combination with other cues.
Pairs with matching vowel duration were identified more quickly than pairs with distinct duration among lexically ambiguous items, but not among lexically unambiguous items, suggesting that lexical ambiguity mediates attention to acoustic detail. However, lexical ambiguity was not a significant predictor of responses, either as a main effect or in interaction with manipulation distance; while listeners spend longer considering lexically ambiguous items with larger duration differences, this deliberation does not change their ultimate decisions.
Lexical ambiguity also interacted with neighborhood density: Among lexically unambiguous words, the proportion of 'same' responses decreased with neighborhood density, but there was no effect among lexically ambiguous words. This interaction suggests that knowledge of lexical competitors influences phonological decisions in lexically unambiguous words, while processing lexically ambiguous words is less shaped by lexical competition. Given differences in the mean neighborhood density of lexically ambiguous and unambiguous words, some of the main effects of lexical ambiguity found in previous work could be due to neighborhood density rather than ambiguity itself.