Listeners integrate pitch and durational cues to prosodic structure in word categorization

In this study we investigate how listeners' perception of vowel duration as a cue to voicing is influenced by changes in pitch height, using a 2AFC task in which they categorized a target word from a vowel duration continuum as "coat" or "code". We consider this issue in light of (1) psychoacoustic perceptual interactions between pitch and duration and (2) compensatory effects for prosodically driven patterning of pitch and duration in the accentual (prominence-marking) system of English. In two experiments we found that whether listeners interpret pitch as a psychoacoustic or a prosodic event depends on the step size and range of the continuum. In Experiment 1 listeners showed the expected psychoacoustic pattern in categorization. In Experiment 2, we altered the duration continuum in an attempt to highlight pitch as a language-specific prosodic property, and found that listeners do indeed compensate for prosodically driven patterning of pitch and duration. The results thus highlight flexibility in listeners' interpretation of these acoustic dimensions. We argue that, in the right circumstances, prosodic patterns influence listeners' interpretation of pitch and their expectations about vowel duration in the perception of isolated words. Results are discussed in terms of their more general implications for listeners' perception of prosodic and segmental cues, and possibilities for cross-linguistic extension.


1. Introduction.
It is well established that the phonetic properties of speech segments, both acoustic and articulatory, are systematically modulated by prosodic factors (e.g. Cho 2015, Georgeton et al. 2016, Keating et al. 2003, Onaka 2003). This can be conceptualized, in a general sense, as the phonetic encoding of prosodic structure (e.g. Keating 2006). However, the extent to which listeners are sensitive to prosodically driven variation in perception remains an open question (cf. Kim & Cho 2013, Mitterer et al. 2016). Previous studies investigating perceptual compensation for prosodic patterns (Kim & Cho 2013, Mitterer et al. 2016) have tested boundary phenomena, i.e. initial strengthening. In the present study we address this question in a new empirical domain by investigating listeners' perception of acoustic correlates of prominence marking in English. We further extend this line of research by testing how listeners' perception of isolated words may be mediated by prosodically driven variability in pitch and duration, whereas previous studies have used carrier phrases to provide prosodic context. Specifically, we ask whether listeners' perception of duration and pitch is influenced both by psychoacoustic factors and by the patterning of pitch and duration as correlates of accentedness (i.e. prominence marking) in the prosodic system of English. We test whether listeners incorporate their experience with durational variation due to accent (signaled by pitch) in their perception of durational cues, in addition to domain-general auditory processes.*

* Many thanks are due to Adam Royer for recording speech for the stimuli and Yang Wang for help with data collection. We are further grateful to attendees at the UCLA Phonetics seminar, and audience members at the 93rd annual meeting of the Linguistic Society of America, for feedback on this project. Authors: Jeremy Steffman, UCLA (jsteffman@ucla.edu) & Sun-Ah Jun, UCLA (jun@humnet.ucla.edu).
1.1. THE PRESENT STUDY. In the present study listeners categorized a "coat" ~ "code" continuum varying only in vowel duration. In English, among other languages, vowels before voiced obstruents are longer than those before voiceless obstruents (e.g. Chen 1970, Peterson & Lehiste 1960, van Santen 1992), and this is a robust cue to voicing for listeners (e.g. Raphael 1972). Pitch height on the vowel in the target word was manipulated to have one of two levels, HIGH and LOW (the creation of which is described in section 2.1). Using this continuum, we tested how listeners' perception of durational cues is influenced by changes in pitch height. Two different predictions for this manipulation are considered below in turn: (1) psychoacoustic predictions informed by documented perceptual interactions between duration and pitch, and (2) linguistic/prosodic predictions informed by the patterning of duration and pitch as correlates of accentedness in English prosody.
Various psychoacoustic interactions between pitch and duration have been documented in the literature (e.g. Gruenenfelder & Pisoni 1980, Lehiste 1976, Shigeno 1986), suggesting that, to some extent, pitch and duration are interactive or integrated dimensions (e.g. Ellis & Jones 2009, Prince 2011).¹ Importantly, the extent to which listeners integrate these cues appears to be flexible, varying with stimulus and task factors (Prince 2011). In the present study we consider just one interactive aspect: the influence of pitch height on the perception of duration. Higher pitch increases perceived duration for both non-speech and speech stimuli (e.g. Brigner 1988, Gussenhoven & Zhou 2013, Šimko et al. 2016, Yu 2010, Yu et al. 2014). This effect is argued to have a domain-general auditory basis, since it is observed with non-speech stimuli and patterns similarly for speakers of different languages (Brigner 1988, Šimko et al. 2016). Previous studies find evidence of this in listeners' numerical ratings of duration, as well as in explicit comparisons of two stimuli.² In the present study, if listeners perceive increased pitch as increased vowel duration, they are predicted to shift categorization of the target word such that a vowel with HIGH pitch is perceived as longer, and is thus more likely to be categorized as "code". In other words, listeners' perceptual integration of duration and pitch may influence their perception of vowel duration as a cue to voicing, increasing "code" responses when pitch is HIGH.
These psychoacoustic predictions can be contrasted with predicted compensatory effects, guided by listeners' interpretation of pitch as a correlate of prosodic structure. We first consider some structural properties of English prosody related to accentedness. Most accented syllables in English are marked with high (H*) pitch accents (Dainora 2006), while unaccented syllables tend not to have tonal targets (e.g. Beckman & Pierrehumbert 1986, Pierrehumbert 1980). It is also well established that, in general, accented syllables and vowels undergo systematic lengthening (e.g. Turk & Sawusch 1997, Turk & Shattuck-Hufnagel 2007), and that unaccented and unstressed vowels undergo systematic shortening. Further, focused words have an expanded pitch range (Xu & Xu 2005) and are lengthened (e.g. Cooper et al. 1985, De Jong 2004, Eady et al. 1986), while post-focus words are shortened and reduced in pitch (De Jong 2004, Xu & Xu 2005).

¹ Pitch and duration are not considered strictly integral dimensions in the sense discussed in e.g. Garner 1974, as compared to, for example, pitch and loudness, which are argued to be processed holistically by listeners (Grau & Nelson 1988, Nelson 1993).

² These previous studies use explicit judgments of duration, whereas in the present study listeners categorize a continuum that cues a phonemic contrast. Segmental categorization can be seen as an implicit test of perceived duration. Since implicit and explicit judgments of this sort do not necessarily align (Reinisch 2016), it is possible that results obtained with explicit tasks will not be replicated in the implicit task used in the present study.
These different structural properties of the prosodic/intonational system of English have a very general acoustic consequence: accented syllables have increased duration and pitch relative to unaccented syllables (e.g. Greenberg et al. 2003, Kochanski et al. 2005). In this broad sense, increased pitch and duration can be considered acoustic correlates of accentedness in English. In line with this view, both increased pitch and increased duration have been shown to contribute to listeners' perception of prominence in speech (Bishop et al. resubmitted, Ladd et al. 1994, Ladd & Moreton 1997, Mo 2011). We can consider this general acoustic correlation in light of prosodically driven compensatory effects (e.g. Kim & Cho 2013). Given that increased pitch correlates with accentedness and contributes to listeners' perception of prominence, we predict that if listeners interpret pitch along these lines, the HIGH pitch condition may give a percept of prominence, or accentedness. Further, because of accentual lengthening, listeners may expect longer vowel durations when pitch is HIGH. In other words, increased pitch as a correlate of prominence marking in English might mediate listeners' expectations about vowel duration, such that they expect longer vowels to co-occur with increased pitch. Following this logic, listeners may compensate by adjusting their categorization of the vowel duration continuum such that they require longer vowel durations for a "code" response in the HIGH pitch condition, decreasing "code" responses when pitch is HIGH. Such an effect would reflect listeners' interpretation of pitch as a correlate of accentedness and perceptual compensation for prosodically driven patterning of pitch and duration.³ Crucially, the direction of this effect is the opposite of that predicted on psychoacoustic grounds, as outlined above.
The present study therefore tests whether psychoacoustic or prosodic factors will influence perception of vowel duration as a cue to voicing.

2. Experiment 1.
To test these predictions, we implemented a 2AFC task in which listeners categorized a stimulus from a vowel duration continuum as one of two English words, "coat" or "code". These two words were chosen because they are fairly closely matched in lexical frequency in the SUBTLEX-US corpus (Brysbaert & New 2009), which minimizes frequency biases in categorization.⁴

2.1. MATERIALS. The stimuli were created by resynthesizing the speech of a ToBI-trained male speaker of English. The speaker was recorded at 44.1 kHz (32 bit) using a Shure SM10A headset microphone in a sound-attenuated room in the UCLA Phonetics Lab. Manipulation was carried out using PSOLA resynthesis (Moulines & Charpentier 1990) in Praat (Boersma & Weenink 2019). The utterances that served as the starting point for the creation of the stimuli are given below with ToBI transcription (e.g. Beckman & Ayers-Elam 1997).
(1) I'll say code now
         H*  H*   L-L%

(2) I'll say code now
         L+H*     L-L%

Sentence (1) is produced with neutral focus, while (2) is produced with narrow focus on "say". Therefore the word "code" in (1), which bears the nuclear pitch accent, has higher pitch and intensity than the post-focus production of "code" in (2). All stimuli were created from the token of "code" in sentence (1), with its pitch manipulated. This token was excised, and audible voicing after stop closure was removed to make the voicing of the coda stop ambiguous. The intensity of the token was then set to the average of the productions in (1) and (2). Controlling intensity across conditions in this way is essential, given that loudness and duration interact perceptually (e.g. Turk & Sawusch 1996); uncontrolled intensity differences would confound listeners' perception of duration as a function of pitch.
The f0 values from the nuclear pitch-accented target word "code" in (1) are referred to as the HIGH pitch condition in the present study (onset = 135 Hz, offset = 129 Hz). The f0 values from the same target word in (2) are referred to as the LOW pitch condition (onset = 112 Hz, offset = 103 Hz). Two vowel duration continua were resynthesized from these HIGH and LOW pitch words. The continua ranged from 60 ms to 150 ms in 15 ms steps. These manipulations created 14 unique stimuli (seven continuum steps in each pitch condition). An example stimulus for each pitch condition is shown in Figure 1. The right y axis shows the f0 range in Hz, and the left y axis shows the frequency range of the spectrogram. A transcription is given below the spectrogram.
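The stimulus design just described is a simple crossing of continuum steps with pitch conditions. The sketch below enumerates it using only the values given in the text; the names `continuum`, `stimulus_set` and `PITCH_F0_HZ` are hypothetical, and the sketch does not perform any PSOLA resynthesis, it only lists which (pitch, duration) combinations exist.

```python
from itertools import product

# Reported (onset, offset) f0 values in Hz for each pitch condition.
PITCH_F0_HZ = {
    "HIGH": (135, 129),
    "LOW": (112, 103),
}

def continuum(min_ms: int, max_ms: int, step_ms: int) -> list[int]:
    """Evenly spaced vowel durations from min_ms to max_ms, inclusive."""
    return list(range(min_ms, max_ms + 1, step_ms))

def stimulus_set(durations: list[int]) -> list[tuple[str, int]]:
    """Cross every continuum step with every pitch condition."""
    # Iterating over the dict yields its keys, i.e. the condition names.
    return list(product(PITCH_F0_HZ, durations))

exp1_steps = continuum(60, 150, 15)
exp1_stimuli = stimulus_set(exp1_steps)
print(exp1_steps)         # [60, 75, 90, 105, 120, 135, 150]
print(len(exp1_stimuli))  # 14
```

Seven steps times two pitch conditions gives the 14 unique stimuli reported above.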
By using pitch values from the prosodic contexts outlined above, we ensure that the two conditions represent fairly natural low and high pitch values for the speaker's range, and that they instantiate the prosodic contexts under investigation, giving representative pitch for an accented (HIGH pitch) and an unaccented (LOW pitch) syllable.⁵

2.2. PARTICIPANTS. 30 participants were recruited for Experiment 1 (15 identified as female and 15 as male). Participants were self-reported native English-speaking adults with normal hearing. All participants were students at UCLA and received course credit for participation. All provided informed consent to participate. No participant responses were excluded from analysis.

⁵ A related consideration is the role of pitch as a cue to voicing. Lowered pitch is one of the acoustic features associated with voiced obstruents (e.g. Lisker 1986, Ohde 1984). This is a salient cue for listeners (e.g. Kohler 1985, Winn et al. 2013), so one may wonder whether listeners might interpret the LOW pitch condition as a cue to voicing, which would predict increased "code" responses in the LOW pitch condition. There are two arguments against this. First, voicing is variably realized in word-final stops (e.g. Guy 1980, Ratner & Luberoff 1984), and pitch does not reliably fall as a correlate of voicing in word-final position (Gruenenfelder & Pisoni 1980). In light of this, pitch may be a less salient cue to obstruent voicing word-finally; previous studies have largely investigated voicing in initial position, where f0 modulations are more consistent (Ohde 1984). Second, local f0 modulations near the coda consonant have been shown to be a more reliable cue to voicing than f0 across the vowel (e.g. Gruenenfelder & Pisoni 1980, Kohler 1985, Kohler & Van Dommelen 1986). This suggests that our manipulations, which alter pitch across the whole vowel, may not be interpreted by listeners as a cue to voicing.

2.3. PROCEDURE. Testing was carried out in a sound-attenuated room in the UCLA Phonetics Lab, with participants seated in front of a desktop computer. Stimuli were presented binaurally over a 3M Peltor listen-only headset, adjusted to a comfortable listening level. Before testing began, participants were told they would hear a native English speaker saying one of two English words, "coat" or "code", and that their task was to select which word they had heard. On each trial, participants heard a stimulus and were shown "coat" and "code" on opposite sides of the screen. Participants indicated their choice with a key press: 'f' selected the word on the left and 'j' the word on the right. The side of the screen on which "coat" and "code" appeared was counterbalanced. The inter-trial interval was 250 ms. Before testing, participants completed eight training trials to familiarize themselves with the procedure, in which they heard the endpoints of the continuum in both pitch conditions. Training trials were grouped into two endpoint blocks and randomized by pitch within each block, such that participants heard two instances of each pitch condition for each endpoint (four trials per endpoint block). The order of the two endpoint blocks was random. In the subsequent test trials the stimuli were fully randomized (by pitch and vowel duration). Participants categorized 16 instances of each unique stimulus, for a total of 224 (16 × 14) test trials, and were prompted to take a short self-paced break halfway through. The experimental procedure took approximately 10-15 minutes.
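The trial structure can be sketched as list construction plus shuffling. This is a minimal illustration, not the authors' experiment script: the helper names (`build_test_trials`, `build_training_trials`, `insert_break`) are hypothetical, and screen-side counterbalancing and response collection are not modeled.

```python
import random
from itertools import product

PITCHES = ("HIGH", "LOW")

def build_test_trials(durations, reps=16, seed=None):
    """Every (pitch, duration) stimulus repeated `reps` times, fully shuffled."""
    rng = random.Random(seed)
    trials = [stim for stim in product(PITCHES, durations) for _ in range(reps)]
    rng.shuffle(trials)
    return trials

def build_training_trials(durations, per_pitch=2, seed=None):
    """Two endpoint blocks in random order; pitch order shuffled within each block."""
    rng = random.Random(seed)
    endpoints = [min(durations), max(durations)]
    rng.shuffle(endpoints)  # which endpoint block comes first is random
    trials = []
    for endpoint in endpoints:
        block = [(pitch, endpoint) for pitch in PITCHES for _ in range(per_pitch)]
        rng.shuffle(block)
        trials.extend(block)
    return trials

def insert_break(trials):
    """Split the test list into two halves, with a self-paced break between them."""
    half = len(trials) // 2
    return trials[:half], trials[half:]

steps = list(range(60, 151, 15))  # Experiment 1 continuum (ms)
training = build_training_trials(steps, seed=1)
first_half, second_half = insert_break(build_test_trials(steps, seed=1))
print(len(training), len(first_half) + len(second_half))  # 8 224
```

Seeding a local `random.Random` instance keeps each participant's order reproducible without touching global random state.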
2.4. RESULTS AND DISCUSSION. Results were assessed using a mixed-effects logistic regression model (a generalized linear mixed-effects model with a logit link function). Fixed effects in the model were vowel duration (treated as continuous and centered at zero), pitch (two levels, LOW and HIGH), and their interaction. Pitch was contrast-coded (HIGH mapped to -1 and LOW to 1). The random effects structure consisted of by-subject random intercepts, with maximally specified random slopes.

First, as would be expected for any such vowel duration continuum, increasing vowel duration significantly increased "code" responses (β = 1.34, z = 11.85, p < 0.001). Pitch, the predictor of interest, also showed a significant effect (β = -0.35, z = -2.73, p < 0.01): overall, LOW pitch significantly decreased "code" responses. As shown in Figure 2, listeners more readily categorized the target word as "code" when it bore HIGH pitch. As outlined above, this effect is expected if HIGH pitch increases perceived vowel duration (as a cue to voicing). In this sense, the main effect of pitch observed in Experiment 1 is consistent with the psychoacoustic integration predictions outlined above and concurs with the results of the explicit rating studies mentioned previously.⁶ A robust interaction between duration and pitch was also observed (β = 0.33, z = 6.87, p < 0.001). Post-hoc comparisons using emmeans show that pitch has no significant effect at the three shortest steps of the continuum, and that at longer steps the effect increases in magnitude as vowel duration increases (see Table 3 in the Appendix).
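Because pitch is contrast-coded as -1/+1, a pitch coefficient corresponds to half the log-odds difference between the two conditions. A minimal sketch, assuming the linear predictor has the standard form b0 + b_dur·d + b_pitch·c + b_int·d·c with c = -1 for HIGH and +1 for LOW (the full model formula is not given in the text). Since the text does not say whether duration is scaled in ms or in continuum steps, the example evaluates only the centered duration (d = 0), where the interaction term drops out. The function name is hypothetical.

```python
import math

# Contrast coding as described in the text.
PITCH_CODE = {"HIGH": -1, "LOW": 1}

def pitch_log_odds_difference(beta_pitch: float, beta_interaction: float = 0.0,
                              centered_duration: float = 0.0) -> float:
    """LOW minus HIGH log-odds of a "code" response at a given centered duration.

    Assumes the linear predictor b0 + b_dur*d + b_pitch*c + b_int*d*c,
    so the pitch difference is (b_pitch + b_int*d) * (c_LOW - c_HIGH).
    """
    slope = beta_pitch + beta_interaction * centered_duration
    return slope * (PITCH_CODE["LOW"] - PITCH_CODE["HIGH"])

# Experiment 1 pitch effect at the centre of the continuum (d = 0).
diff = pitch_log_odds_difference(-0.35)
print(round(diff, 2))            # -0.7
print(round(math.exp(diff), 2))  # 0.5 (odds ratio, LOW vs HIGH)
```

Applied to Experiment 2's coefficient (β = 0.51), the same function returns a positive difference of about 1.02 log-odds, i.e. LOW pitch favors "code", which is the reversal reported in section 3.3.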
Overall, these results suggest that listeners' interpretation of duration as a cue to obstruent voicing can be directly influenced by pitch: increased pitch increases perceived duration, resulting in more "code" responses. The interaction in the model shows that this effect is contingent on vowel duration itself and is only observed at longer vowel durations (above 90 ms on the continuum).
It should also be noted that the previous studies which found this effect (with explicit listener ratings of duration) all used stimuli substantially longer than our continuum, in terms of both minimum and maximum values. The lowest minimum in these studies is 100 ms (compared to our 60 ms minimum). Table 1 in the appendix gives a full summary of the duration ranges of the stimuli used in these studies. Two observations thus point to the possibility that the influence of pitch varies with vowel duration: the previous studies that consistently found this effect used longer durations than ours, and in Experiment 1 pitch only exerted an influence at longer vowel durations.
In light of this potential issue, we return to the question of how pitch and duration pattern as correlates of accentuation in English. Following the logic that compensatory processes related to prosodic structure are learned from patterns in the language (in the sense discussed in e.g. Holt et al. 2001, Wade & Holt 2005), we need to consider the durations of vowels observed in spoken corpora of English, and how these compare to the stimuli used in Experiment 1. Previous studies that have systematically investigated this topic show that vowels analyzed as unstressed (Greenberg et al. 2003, SWITCHBOARD corpus) and vowels perceived by naïve listeners as lacking prominence (Mo 2011, Buckeye corpus) are both under 100 ms in duration on average, and can be much shorter. Notably, Greenberg et al. also report that the vowel quality with the longest stressed duration averages 200 ms; all other vowel qualities are on average shorter than 200 ms when stressed. However, the previous studies with explicit listener ratings of duration all used durational maxima above 200 ms, which puts their stimuli outside the typical range of both unaccented and accented vowels in natural English speech. When listeners are presented with these longer durations, they may not draw on their expectations about accentual prominence at all. The maximum duration of our stimuli in Experiment 1 (150 ms) aligns fairly well with the average durations of accented vowels. However, the longer steps of our continuum are well above the range for unaccented vowels, which raises the possibility that listeners did not interpret pitch differences in the stimuli as correlating with accentedness, because the longer continuum steps are too long to be unaccented.
Following this logic, we predict that listeners' sensitivity to LOW pitch as a correlate of unaccentedness may be enhanced when durations are relatively short, such that vowels are more plausibly interpretable as lacking prominence. In other words, listeners' interpretation of acoustic cues as conveying prosodic information may depend on their expectations about how those cues (here duration and pitch) typically pattern as a function of prosody. When durations are longer than typical accented vowels (as in previous studies), or longer than typical unaccented vowels (as in Experiment 1), listeners may simply not generate expectations about accentual lengthening and pitch. Using shorter vowel durations, i.e. durations that span a range of plausibly accented and unaccented vowels (and reducing the presence of durations that are too long to be unaccented), may highlight pitch as a property related to prominence marking for listeners.
We therefore predict that if listeners compensate perceptually for pitch as a correlate of prominence marking, a continuum of shorter vowels (compared to those used in Experiment 1), in which durations are more plausible as unaccented, will be more likely to elicit this effect. In a second experiment we address this point by investigating how changing aspects of the stimuli might influence listeners' interpretation of pitch along these lines.

3. Experiment 2.
In Experiment 2, listeners categorized a modified continuum with the same HIGH and LOW pitch conditions. Modifications are outlined below.
3.1. MATERIALS. Two changes were made to the continuum used in Experiment 1. Firstly, the maximum value was reduced from 150 ms to 120 ms, meaning the new range was 60-120 ms. Secondly, the step size was reduced from 15 ms to 10 ms. This new continuum therefore had seven steps total, as in Experiment 1.
These changes were made with the intent of encouraging a linguistic/prosodic interpretation of pitch. Reducing the range of the continuum exposes listeners to less extreme variability in duration, rendering durational differences less pronounced. Further, in the absence of longer vowel durations, the shorter steps of the continuum are likely to be more salient to listeners, and it is at these steps that we predict an effect of pitch as a correlate of accentuation should be greatest. Importantly, accented vowel durations also fall roughly within this continuum range (Greenberg et al. 2003), though they can be longer. Accordingly, we predict that listeners will hear continuum steps whose durations can plausibly be interpreted as either accented or unaccented, while, unlike in Experiment 1, durations that are plausible only as accented are less frequent.
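The two changes to the continuum can be compared numerically. The sketch below is hypothetical in two respects: it treats 100 ms as a hard cutoff for "plausibly unaccented" (the corpus studies cited in section 2.4 report averages under 100 ms, not a categorical boundary), and it expresses each step as a Weber-style ratio of the increment to the duration it starts from, without assuming any particular JND value.

```python
# Assumed cutoff, based on the corpus averages for unstressed / non-prominent
# vowels; a simplification, not a boundary proposed by those studies.
UNACCENTED_MAX_MS = 100

def continuum(min_ms: int, max_ms: int, step_ms: int) -> list[int]:
    """Evenly spaced vowel durations from min_ms to max_ms, inclusive."""
    return list(range(min_ms, max_ms + 1, step_ms))

def share_below(durations: list[int], cutoff_ms: int = UNACCENTED_MAX_MS) -> float:
    """Fraction of continuum steps shorter than cutoff_ms."""
    return sum(d < cutoff_ms for d in durations) / len(durations)

def relative_step_sizes(durations: list[int]) -> list[float]:
    """Each increment as a fraction of the duration it starts from."""
    return [round((b - a) / a, 3) for a, b in zip(durations, durations[1:])]

exp1 = continuum(60, 150, 15)  # Experiment 1: 60-150 ms, 15 ms steps
exp2 = continuum(60, 120, 10)  # Experiment 2: 60-120 ms, 10 ms steps

for name, steps in (("Experiment 1", exp1), ("Experiment 2", exp2)):
    print(name, f"share < {UNACCENTED_MAX_MS} ms: {share_below(steps):.2f}",
          "relative steps:", relative_step_sizes(steps))
```

Under these assumptions, 3 of the 7 Experiment 1 steps fall below 100 ms versus 4 of 7 in Experiment 2, and every Experiment 2 increment is proportionally smaller than the corresponding Experiment 1 increment (for example 0.091 versus 0.111 for the final step).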
The reduced step size further makes changes in duration less perceptible (Healy & Repp 1982, Repp 1984). Importantly, a 10 ms step size is quite small for a vowel duration continuum and approaches the JND for durations of 100 ms and longer (Klatt 1976, Klatt & Cooper 1975). In this sense, vowel duration should become a less reliable cue to voicing for listeners, including, potentially, duration as perceived as a function of pitch. Listeners may therefore be pushed to interpret pitch as a prosodic property, compensating for differences in pitch height as originally predicted. If listeners do indeed adjust categorization along these lines, we predict that LOW pitch should significantly increase "code" responses, as outlined in section 1.1. Such a result would be a reversal of the effect observed in Experiment 1.

3.2. PARTICIPANTS AND PROCEDURE. 30 (different) participants were recruited for Experiment 2 (22 identified as female and 8 as male). No participant responses were excluded from analysis. The procedure was identical to Experiment 1.
3.3. RESULTS AND DISCUSSION. The statistical assessment and model fitting procedure were the same as in Experiment 1. Results from Experiment 2 are visualized in Figure 3.

Figure 3: Categorization along the continuum split by pitch condition for Experiment 2 (left) and the proportion of "code" responses in each pitch condition (right).
As in Experiment 1, increasing vowel duration significantly increased "code" responses (β = 0.76, z = 10.64, p < 0.001). This effect is smaller than in Experiment 1, suggesting that, as expected, vowel duration has become a less reliable cue to voicing. Figure 3 shows that the categorization functions are quite shallow and that the endpoints of the continuum are not anchored, indicating that the stimuli are overall fairly ambiguous to listeners. Crucially, in this context of ambiguity, pitch also showed a significant main effect (β = 0.51, z = 3.28, p < 0.01): LOW pitch significantly increased "code" responses. The effect of pitch is therefore the opposite of that in Experiment 1 (cf. Figure 2). A significant interaction was also observed (β = -0.07, z = -2.54, p < 0.05): the effect of pitch is largest at the shortest steps of the continuum and decreases systematically as vowel duration increases, though pitch has a significant effect at each continuum step (see Table 3 in the appendix). This interaction suggests that listeners are more sensitive to pitch differences at shorter vowel durations, in line with the hypothesis that shorter durations are more plausible as unaccented, or non-prominent, vowels. In other words, the shorter end of the continuum may provide listeners with a range of stimuli where LOW pitch more reliably co-occurs with prosodically conditioned duration, enhancing listeners' interpretation of pitch as prosodic at these continuum steps.
4. General discussion.
The present study offers a nuanced view of how prosodic structure mediates listeners' perception of durational cues and interfaces with domain-general perceptual processes. The results of Experiment 1 suggest that listeners' perception of vowel duration as a cue to obstruent voicing can be influenced by pitch height, reflecting psychoacoustic perceptual integration of pitch and duration. This aligns with previous literature that used explicit judgments to test listeners' perception of duration as a function of pitch height (e.g. Brigner 1988, Yu et al. 2014), and supports the view that these dimensions can be integrated perceptually by listeners (e.g. Prince 2011). Importantly, in Experiment 1 the effect of pitch was contingent on vowel duration itself, with no effect at shorter continuum steps.
In Experiment 2, which highlighted shorter durations and reduced the perceptibility of durational differences, the effect of pitch was reversed entirely. As outlined above, the pattern seen in Experiment 2 suggests that listeners are interpreting pitch as a correlate of prosodic structure. This interpretation crucially mediates their expectations about vowel duration and engenders robust compensatory shifts in categorization. We propose that the results in Experiment 2 can be interpreted as reflecting prosodic integration, that is, the integration of expectations about duration with listeners' interpretation of pitch as a linguistic/prosodic property.
Taken together, these results suggest that compensation for prosodically driven variation in pitch and duration can indeed be observed under the right circumstances, crucially, when patterns in the stimuli map more closely onto prosodic patterns in the language. These results therefore align with the view that perceptual integration of pitch and duration is flexible and varies with stimulus factors (e.g. Ellis & Jones 2009, Prince 2011). They further support the claim that listeners are sensitive to prosodically conditioned variation in phonetic detail (e.g. Kim & Cho 2013), and that perceptual compensation of this sort may be based on patterns learned from language input (following e.g. Holt et al. 2001). This is consistent with the idea that "structured experience shapes perception" (Holt et al. 2001, p. 772; see also Holt & Lotto 2006, 2010). The present results complement this view by showing that acoustic patterns driven by English prosody can apparently structure perceptual experience in a way that mediates listeners' perception of speech segments, overriding psychoacoustic effects under the right circumstances.

The present results extend our understanding of how prosody influences speech perception in two ways. First, extending the previous finding that cues to a prosodic boundary influence listeners' perception of duration (e.g. Kim & Cho 2013, Mitterer et al. 2016, Steffman 2018), the present study shows that patterns associated with prominence marking can also influence how listeners interpret durational cues. These results thus highlight that multiple facets of prosodic organization influence the way listeners interpret segmental cues in the speech signal. Second, the present results suggest that these prosodically driven compensatory effects for accentual prominence can occur even in isolated words, whereas previous research on prosody and segmental perception placed words within carrier phrases that varied prosodic context.
The fact that listeners adjust categorization of isolated words on the basis of prosodic factors suggests that they may be able to interpret and perceptually access prosodic information for words in isolation. This general notion aligns with the view, couched in exemplar theories of speech perception, that listeners retain phonetically rich representations of sounds in memory (e.g. Johnson 2006, Pierrehumbert 2001). The present results may support this view, in the sense that prosodic-structural factors introduce patterned acoustic variability (in duration and pitch) which is encoded and retained by listeners and which influences the categorization of words, even words dissociated from an explicit prosodic context. A variety of previous studies have suggested that listeners retain phonetically rich representations of prosodic information (e.g. Braun et al. 2006, D'Imperio et al. 2014, Kimball et al. 2015, Schweitzer et al. 2015), and the present study may offer evidence of this from a categorization task. Additionally, it has been argued that compensatory (or normalization) effects in categorization may arise as a natural consequence of exemplar storage (e.g. Johnson 1997). This offers a useful lens for considering the compensatory effects seen in Experiment 2, which may occur when acoustic properties of the stimuli map more closely onto stored exemplars. Further exploration of this idea may prove useful for investigating the perceptual mechanisms underlying these effects.
Additionally, the present results make several concrete predictions for cross-linguistic extension, which may provide a valuable test of the way that learned language patterns mediate listeners' perception of durational cues. For example, vowels with lexical low tones tend to be longer than vowels with lexical high tones in both Thai (Gandour 1977) and Beijing Mandarin (Ho 1976). This correlation between duration and pitch is the opposite of that introduced by English prosody. It predicts that compensatory perceptual adjustments by speakers of these languages would go in the opposite direction to that observed in Experiment 2, for example in a task where listeners categorize phonemic vowel length contrasts in Thai (e.g. Abramson & Ren 1990). If these listeners expect shorter vowels with high pitch, a given duration should be heard as relatively long when pitch is high, which is also the direction predicted by the psychoacoustic effect. For speakers of these languages, psychoacoustic and experience-based prosodic factors are therefore predicted to converge on the same direction. Testing whether this convergence is observed with stimuli comparable to those in the experiments reported here would provide a useful exploration of the role of language experience in this domain.
Further extending these results along these lines will improve our understanding of how listeners' interpretation of prosodic aspects of the speech signal mediates their perception of speech segments, and how language experience with prosody constrains listeners' interpretation of segmental contrasts. Testing how these prosodically driven effects interface with domain-general perceptual processes will further help us explore how the perceptual system integrates acoustic dimensions on the basis of both psychoacoustic and language-specific factors.

Table 1: Summary of duration ranges used in previous studies which found that increased pitch increases perceived duration. The type of stimuli (speech versus non-speech) is given at left.
Type of stimuli | Stimuli duration range (ms) | Study