The role of gesture in the English ish-construction

The English ish-construction, in which ish follows an utterance to indicate hedging, is multimodal. It has a prosodic component (a pause between the utterance and ish), and, as observed in this paper, it is often accompanied by a co-speech gesture such as a shrug. Data from a perception study suggests that unlike prosody, gesture is not a grammatical component of the ish-construction. However, gesture does play a significant role in conveying affect to listeners. I suggest that this use of gesture is a not-at-issue contribution to the utterance, and call for further work uniting the semantics/pragmatics and sociolinguistics of gesture.


1. Introduction.
Recent work in multiple subfields of linguistics has investigated the role of gesture in spoken languages. For example, semanticists have been asking what meaning CO-SPEECH GESTURES contribute to an utterance (Esipova 2019, Tieu et al. 2018, Zlogar and Davidson 2018, inter alia). At the same time, variationist sociolinguists have been exploring the role of EMBODIMENT in structuring linguistic variation and what social meaning is conveyed by embodiment (Podesva et al. 2015, Voight et al. 2016). The ultimate claim of this paper is that these two research domains overlap to a considerable degree, at least insofar as embodiment includes physical movement accompanying spoken language.
I make use of the following observation: English utterances which involve the ish-construction (1) typically involve a gestural component such as a shrug (Danny Erker p.c.).
(1) 'I sort of finished my homework.'
The ish-construction thus lends itself to a perception task using naturalistic examples to test (a) whether and how gesture affects the naturalness of the construction, and (b) whether and how gesture affects the social evaluation of a speaker using the construction. I show that there is no difference in ratings of the construction's naturalness with or without gesture. The presence or absence of gesture does, however, significantly influence the evaluation of the speaker. I suggest that these results indicate that gesture is not part of the grammar of the ish-construction, but does play a role in conveying affect. I argue that such a contribution can be seen as a not-at-issue contribution to the utterance. Because work in semantics has suggested that co-speech gestures can provide not-at-issue content (Esipova 2019), the role of gesture in conveying affect provides a link between this field and sociolinguistics for future research.

2. The English ish-construction.
The ish-construction involves the clause-final use of ish as what Bochnak and Csipak (2014) describe as a metalinguistic degree morpheme that hedges the degree of commitment to the preceding proposition (1, repeated below). In this use, ish takes on a meaning of approximately 'somewhat/sort of/kind of.' It appears to make a not-at-issue contribution to the utterance, as it is incompatible with negation (Duncan 2016, see test in Ebert and Ebert 2014).
(1) 'I sort of finished my homework.'
The use of ish to hedge a proposition is generally assumed to be related to the use of -ish as a derivational morpheme that serves a similar function. This makes the ish-construction an especially clear instance of degrammaticalization (Norde 2009) through syntactic change. Duncan (2015) reports data from a survey of white American English speakers in a Manhattan park that suggest this change is fairly recent and rapid, progressing in apparent time from little acceptance among speakers older than 50 at the time of data collection to widespread acceptance among speakers younger than 25.
The ish-construction is particularly notable for being multimodal. In addition to the linear order in which the words appear, the construction has a prosodic and a gestural component. The prosodic component has been described as a pause between the proposition and ish (Duncan 2016); however, this is an oversimplification. More accurately, the syllable preceding ish is lengthened. This lengthened syllable may or may not be followed by a pause. The prosodic component appears to be obligatory, and Duncan (2016) in fact builds it into his proposal of the construction's syntactic structure.
The gestural component of the construction is less documented. Speakers using the construction have been observed to accompany the utterance with an iconic co-speech gesture like a shrug (Danny Erker p.c.). Informants to whom I have mentioned this observation (typically white Americans age 30 or younger) have all recognized it as something that they do, which suggests that use of a co-speech gesture with the ish-construction is relatively conventionalized. As such, it is of interest whether the gestural component is obligatory in the same sense that the prosodic component apparently is.

3. Approaches to the body in spoken language.
There is a growing body of literature on the contribution of gesture to spoken language across fields typically seen as disparate. My goal below is to briefly review these contributions, with particular focus on their research aims, claims, and methods. I keep the focus here on research into spoken language, and set aside work on signed languages.
3.1. SEMANTICS OF CO-SPEECH GESTURES. Broadly speaking, the question for semanticists has been what content co-speech gestures contribute to an utterance. This work tends to focus on iconic co-speech gestures, setting aside other types of gesture like beats and metaphoric gesture (Ebert and Ebert 2014). There is wide agreement that co-speech gesture conveys not-at-issue content (Ebert and Ebert 2014, Tieu et al. 2018). That is, co-speech gestures tend to project within embedding contexts (Esipova 2019), and cannot be directly denied (Ebert and Ebert 2014). It is less clear whether co-speech gestures can contribute at-issue content, although the emphasis on their not-at-issueness implies not. Esipova (2019) created a perception task designed to force an at-issue reading through contrastive focus and found that such readings were degraded. However, she notes considerable individual variation in ratings, such that some participants appear to accept at-issue readings while others do not, and ultimately suggests that while there may indeed be a strong bias against at-issue co-speech gestures, they are not necessarily impossible. It is unclear, however, what factors guide a speaker to accept at-issue co-speech gestures.
Wide variability in acceptance of co-speech gestures appears to be a common finding. In a perception task, Zlogar and Davidson (2018) test whether co-speech gestures can serve as hard presupposition triggers and whether utterance-internal content influences acceptability. They find that co-speech gestures are not hard presupposition triggers, and that utterance acceptability is influenced by whether the co-speech gesture reinforces the content in the utterance. They too note a large degree of variability in acceptability ratings. One interesting note is that across the board, Zlogar and Davidson find that utterances with co-speech gestures are rated as less acceptable than utterances without them.
One potential reason for the variability in acceptability judgements, and perhaps the lower acceptability for utterances with co-speech gestures, is that the above studies use a quite artificial setting. For example, Zlogar and Davidson (2018) and Esipova (2019) use video of a single speaker standing against a plain background using constructed iconic gestures. While this methodology does an excellent job of controlling for other variables, perhaps participants vary in their willingness to accept the artificial construct of the task. The ish-construction provides us with potential for a more natural-seeming task.
It is worth considering what acceptability judgements regarding co-speech gesture and the ish-construction will tell us. As not-at-issue content, we expect the ish-construction to be similarly acceptable with or without gesture. If the lower acceptability ratings for utterances with co-speech gestures found by Zlogar and Davidson (2018) were a task effect, we may in fact expect to find no effect at all if we are successful in creating more naturalistic examples. Given this, acceptability judgements may tell us little regarding the semantics of co-speech gestures; however, they will be able to show whether co-speech gesture is part of the syntactic structure of the ish-construction (i.e., is the construction grammatical with/without gesture?). This in itself is a useful contribution, as knowing how co-speech gestures interact with syntactic structure may help to shed light on debate regarding whether co-speech gestures act as supplements, cosuppositions, or a hybrid (see Esipova 2019 and references therein for discussion).

3.2. EMBODIMENT AND SOCIOLINGUISTIC VARIATION.
Recent work in sociolinguistics has considered whether body movement structures linguistic variation, and what social meaning is conveyed by the variants involved. This work casts a wider net when considering body movement than that of semanticists: facial expressions, beats, and metaphoric gestures are included in addition to iconic gestures. For example, in their analysis of co-speech gesture during a town hall meeting in Arizona, Mendoza-Denton and Jannedy (2011) find that it takes two roles: gestural apices co-occur with intonational pitch accents, and speakers use gesture to create a metaphorical spatiality. The former result suggests that gesture helps to structure variation, while the latter shows a use for gesture in constructing a discourse context, and therefore social meaning, through indexing contextual relationships between constituents and their representative. Neither of these roles, however, would be included in a study of the semantics of co-speech gestures like those outlined above.
Much of the recent sociolinguistic work draws on research in psychology that suggests that co-speech gestures and body movement more generally are related to affect (see Podesva 2016 and references therein). Eckert (2010) suggests that whether a speaker is conveying positive or negative affect can be indexed by vowel quality, and that affect and its connection to stancetaking may thus play an important role in sociolinguistic variation. Because body movements like smiling may index positive affect as well (Podesva 2016), the key claim is that the correlation of body movement with linguistic variation is a correlation between affect and variation (Podesva et al. 2015, Voight et al. 2016). That is, part of the social meaning of the variants in question is that the speaker is conveying positive/negative affect. This has been argued both with respect to voice quality and vowel quality. With respect to voice quality, head cant is correlated with higher pitch (Voight et al. 2016), and smiling and greater movement amplitude correlate with higher pitch and use of creaky voice (Podesva et al. 2015). In the case of vowel quality, smiling is found to correlate with fronting of the GOAT vowel (Podesva et al. 2015). Podesva (2016) suggests that this may also be true of the front vowels associated with the California Vowel Shift (KIT, DRESS, and TRAP), and that smiling and greater movement amplitude also correlate with vowel lowering. The claims for vowel quality, then, are that positive affect is linked to a fronted and lowered vowel, while negative affect is linked to a backed and raised vowel.
Note that the above studies are correlative; they find that body movement and linguistic variation pattern together through analysis of speech and gesture production. As such, it is unclear what part of the utterance the ascribed social meaning is attached to. That is, while we can suggest that affect is embodied in sociolinguistic variation, we cannot determine from a production study whether that meaning is carried by the variant, the gesture, or both. A key question for this line of research, then, is whether listeners perceive gesture as contributing social meaning to an utterance. For our particular focus, the question is thus whether gesture influences social evaluation of a speaker using the ish-construction.

4. Methods.
A perception task was built in Qualtrics and distributed via Amazon Mechanical Turk. Native speakers of American English were recruited; respondents with an IP address located outside of the United States were excluded (n=387 participants in total). The task involved watching paired YouTube videos of a teacher-student interaction and then providing feedback on the student's speech. The 'teacher' and 'student' roles were played by two white women, both native speakers of North American English (one Canadian, one Vermonter).
In each instance of the interaction, the 'teacher' asked about the status of an assignment (e.g., Did you finish your project?; see Figure 1).

Figure 1. Example of 'teacher' video in experimental setup
In test frames, the 'student' replied using the ish-construction (2), while in filler frames, the 'student' replied affirmatively or negatively in a clearly grammatical or ungrammatical sentence (3). It should be noted, however, that conditions were obtained through recording separate takes of each test frame. It is possible that unaccounted-for variables such as the speaker's facial expression or the pitch contour of the utterance varied between takes. Any variability would introduce a confound: which differences between takes were participants responding to? Takes used in the study were actively selected to minimize these differences, but I acknowledge that there may yet have been some variability. In any case, the presence or absence of the shrug appears to be the most obvious difference between items in the gesture condition. Furthermore, even if participants responded to facial expression rather than the exaggerated shrug, they would still have responded to an iconic gestural component. I believe, therefore, that the gesture condition does in fact test participants' reaction to the inclusion or exclusion of a gestural component.
The two conditions result in four possible combinations of presence/absence of gesture/pause. To account for possible frame effects, a between-subjects design was adopted in which participants were randomly divided into four groups. For each group, each possible combination was assigned to one test frame. The combinations were rotated across groups such that each group saw a given frame in a different combination of pause/gesture condition. Each participant thus viewed eight teacher-student interactions, one for each of the test and filler frames in (2-3). As such, each participant saw one test item for each possible combination of pause/gesture condition. A grammatical and ungrammatical filler were viewed first, followed by a randomized order of the test frames and remaining fillers.
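The rotation of combinations across groups can be illustrated with a small sketch. The frame labels and the particular rotation rule below are assumptions made for illustration; the paper does not state which combination was assigned to which frame in which group.

```python
from itertools import product

# Shorthand labels for the four test frames in (2); the labels are illustrative.
frames = ["project", "book", "paper", "homework"]

# The four combinations of the two binary conditions, as (pause, gesture).
combos = list(product([True, False], repeat=2))

def rotate_design(frames, combos):
    """Give group g, frame f the combination at index (f + g) mod n.

    This is one simple Latin-square rotation; it is an assumed scheme,
    not necessarily the assignment used in the study.
    """
    n = len(combos)
    return {
        g: {frame: combos[(f + g) % n] for f, frame in enumerate(frames)}
        for g in range(n)
    }

design = rotate_design(frames, combos)

for group, assignment in design.items():
    print(group, assignment)
```

Under this scheme every group sees each pause/gesture combination exactly once, and every frame appears in each combination across the four groups, which is the property the design description requires.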
Upon viewing the teacher-student interaction, participants were asked to rate how natural they found the student's reply on a 1-100 sliding scale. Use of this rating follows the methods of Esipova (2019) and Zlogar and Davidson (2018), and is intended to at least in part be a grammaticality judgment. In addition to NATURALNESS, participants also rated the student for six social attributes using the 1-100 sliding scale: HONEST, INTELLIGENT, CONFIDENT, HARDWORKING, FRIENDLY, and ORGANIZED. These attributes cluster into three categories of evaluation (see Bauman 2013 for discussion): Attractiveness (Honest, Friendly), Status (Organized, Intelligent), and Dynamism (Hardworking, Confident). After the last item, participants completed a short demographic questionnaire in which they self-reported their AGE, GENDER, and ETHNICITY.

5. Results.
Ratings were normalized to z-scores for each participant, and then rescaled using the overall mean rating and overall standard deviation of ratings. This was complicated because some participants gave the same rating for a given social attribute across all student responses. While participants who gave the same Naturalness rating to each student response could be discarded on the grounds that they did not actually participate in the study (regardless of their response to test items, they should have responded differently to grammatical and ungrammatical fillers), this was not the case for social attributes. It is of course possible that a participant could in good faith rate the student as equally confident across the study, especially because the same woman portrayed the student in each item. It is not possible to obtain a z-score in this scenario, as it leads to a divide-by-zero error. To account for this, responses were subjected to the jitter() function in R (R Core Team 2017) to introduce a slight degree of variability and allow for normalization.
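A minimal sketch of this normalization step follows, written in Python rather than R. Several details are assumptions: the noise magnitude, the choice to jitter only participants whose ratings are constant, and the use of the pre-jitter ratings for the overall mean and standard deviation. The function name normalize_ratings is hypothetical.

```python
import random
import statistics

def normalize_ratings(ratings, jitter_amount=0.01, seed=0):
    """Z-score each participant's ratings, then rescale to the overall scale.

    ratings: list of per-participant lists of ratings for one attribute.
    jitter_amount: half-width of the uniform noise added to constant rows
        (an assumed value; R's jitter() chooses its own default).
    """
    rng = random.Random(seed)
    pooled = [r for row in ratings for r in row]
    overall_mean = statistics.mean(pooled)
    overall_sd = statistics.stdev(pooled)

    normalized = []
    for row in ratings:
        row = [float(r) for r in row]
        # A constant row has zero standard deviation, so add slight noise
        # to avoid dividing by zero when computing z-scores.
        if max(row) == min(row):
            row = [r + rng.uniform(-jitter_amount, jitter_amount) for r in row]
        row_mean = statistics.mean(row)
        row_sd = statistics.stdev(row)
        normalized.append(
            [(r - row_mean) / row_sd * overall_sd + overall_mean for r in row]
        )
    return normalized

# Toy data: the second participant gives the same rating to every item.
toy = [[10, 50, 90, 30], [70, 70, 70, 70]]
print(normalize_ratings(toy))
```

After rescaling, every participant's ratings share the overall mean and standard deviation, so differences between items reflect each participant's relative judgments rather than their individual use of the slider.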
Data was analyzed using linear mixed effects regression (Bates et al. 2014) using the general formula in (4). The test conditions and their interaction, along with the collected demographic factors, were fixed effects. The age factor was scaled to z-scores. The test frame and participant were random intercepts, while the order in which the items were presented was included as a random slope. The models used white women responding to the +pause, +gesture conditions as a baseline. Because we are interested in the outcome of seven regression models, I use the Bonferroni correction to take a p-value of 0.0071 as indicating the 0.05 threshold for significance.
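The corrected threshold is simply the familywise level divided by the number of models. The short sketch below works through that arithmetic and applies it to two p-values reported in the results; the helper name is_significant is hypothetical.

```python
# Bonferroni correction: divide the familywise alpha by the number of models.
ALPHA = 0.05
N_MODELS = 7  # Naturalness plus the six social attributes

threshold = ALPHA / N_MODELS

def is_significant(p_value, threshold=threshold):
    """Return True if p_value falls below the corrected threshold."""
    return p_value < threshold

print(round(threshold, 4))
print(is_significant(0.0059))  # effect of pause on Confident
print(is_significant(0.0098))  # main effect of pause on Honest
```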
(4) DV ~ Gesture * Pause + Age + Gender + Ethnicity + (1 | Frame) + (1 + PresentationOrder | Participant)
When naturalness is the dependent variable, we are able to simply use the normalized rating in the place of DV. This is not the case for the social attributes; as one might expect, the ratings for social attributes are positively correlated with the Naturalness rating (p << 0.0001 for all). The r² values are quite low for each attribute, ranging from 0.0497 for Hardworking to 0.1482 for Intelligent. This means that while the Naturalness rating is correlated with attribute ratings, it does not explain much of the variation in these ratings. However, we still want to make sure that any effects for an attribute are not in fact effects on Naturalness. As such, I residualize the social attribute ratings by regressing them against the Naturalness rating, calculating the residuals, and using the residuals as the dependent variable for the analysis outlined in (4). Because residuals capture the amount of variability in a data set not explained by a model, doing this allows us to test our fixed and random effects on the data not explained by Naturalness (see MacKenzie 2012, Becker et al. 2017, for similar uses). Any significant effects thus represent an effect on the social attribute, which is our target.
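The residualization step can be sketched as a simple least-squares regression of an attribute rating on the Naturalness rating. This is an illustration under the assumption of a single-predictor linear fit; the function name residualize and the toy numbers are hypothetical and are not data from the study.

```python
import statistics

def residualize(attribute, naturalness):
    """Return residuals of attribute ratings after regressing on Naturalness.

    Fits attribute = intercept + slope * naturalness by ordinary least
    squares and returns the part of each attribute rating the fit leaves
    unexplained.
    """
    mean_x = statistics.mean(naturalness)
    mean_y = statistics.mean(attribute)
    sxx = sum((x - mean_x) ** 2 for x in naturalness)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(naturalness, attribute))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(naturalness, attribute)]

# Toy ratings, invented purely to show the mechanics.
naturalness = [20.0, 40.0, 60.0, 80.0]
friendly = [30.0, 35.0, 55.0, 60.0]
print(residualize(friendly, naturalness))
```

By construction the residuals are uncorrelated with the Naturalness rating, so an effect found on them cannot be an effect on Naturalness that has merely carried over to the attribute.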
Of the test conditions, only the prosodic component affected the Naturalness rating (Figure 3). Omission of the pause significantly lowered the rating (β = -8.9807, p << 0.0001). As Figure 3 shows, this effect places the +pause and -pause conditions on opposite sides of the scale midpoint; the effect thus means the omission of pause takes the rating from overall favorability to overall disfavorability.

Figure 3. Naturalness ratings by gesture and pause conditions
There was no effect of gesture on Naturalness. In addition, two social factors affected Naturalness ratings: age and ethnicity. Older participants rated the student response as less Natural than younger participants (β = -1.5826, p = 0.0057), corroborating the survey evidence reported in Duncan (2015) that acceptance of the ish-construction is a relatively recent and rapid change in apparent time. African American participants rated the responses as more Natural overall than white participants (β = 6.3182, p = 0.0002).
Pause was also the only test condition to affect the rating for Confident; omission of the pause was rated as more Confident (β = 3.7826, p = 0.0059). Participants identifying as male or Native American also rated the speaker as more Confident than the baseline participants (male: β = 2.7218, p = 0.0064; Native American: β = 7.6804, p = 0.0017).
Although gesture did not affect Naturalness or Confidence ratings, it did significantly influence ratings for three social attributes: Hardworking, Friendly, and Honest. As seen in Figure 4, the student was rated as more Hardworking when her utterance was not accompanied by a gesture (β = 4.0537, p = 0.0015). Similar to our finding for Naturalness, the gesture effect indicates a shift from overall Not-Hardworking to overall Hardworking when gesture is omitted. There was no effect of pause or social factors that met the p < 0.0071 threshold for significance. Ratings for the Friendly and Honest attributes followed the pattern exemplified in Figure 5.
Regardless of the test condition, the student was generally rated as Friendly and Honest by participants. However, there are significant differences in rating based on condition. As seen, the student is rated most Friendly when both gesture and pause are present. Removing either or both lowers the rating. This manifests in the model as a significant main effect for gesture (Friendly: β = -6.9709, p << 0.0001; Honest: β = -3.6557, p = 0.0026) and pause (Friendly: β = -5.6624, p << 0.0001; Honest: β = -3.1351, p = 0.0098), and a significant interaction term between pause and gesture (Friendly: β = 5.9580, p = 0.0005; Honest: β = 4.8487, p = 0.0047). Note, however, that the main effect of pause on Honesty, while well under 0.05, does not reach our corrected 0.0071 threshold. It is nevertheless necessary to retain it, because properly interpreting the significant interaction term requires interpreting the main effect as well. The interaction term essentially serves to prevent a compounded effect: omission of gesture or pause makes one appear less Friendly or Honest, but omitting both is not doubly bad. While no social factors influence the Friendly rating, ethnicity does have an effect on the Honest rating. Respondents who identified as Native American rated the speaker as significantly more Honest than white participants (β = 8.1417, p = 0.0003).
6. Discussion. Our aim in this study was to test whether gesture contributes to grammaticality and social evaluation of the ish-construction. Let us first consider the former question by looking to the results regarding Naturalness ratings. The goal is for this use of Naturalness to be a proxy for grammaticality. That is, speakers for whom the ish-construction is ungrammatical would rate the examples as less Natural than speakers for whom the construction is grammatical. A significant effect of gesture on Naturalness would thus indicate that the absence of gesture improved/worsened grammaticality of the construction. If this is indeed how participants used this rating, gesture appears to not influence grammaticality at all. Note that this is not necessarily a surprise; as the construction appears to contribute not-at-issue content, we do not expect ratings to be degraded due to the bias against at-issue co-speech gestures (Esipova 2019). This finding contrasts with the effect that we find for pause: absence of pause lowers Naturalness ratings, and therefore presumably worsens grammaticality. This is the expected effect, indicating that participants may indeed have used Naturalness as a proxy for grammaticality. If so, this is evidence in support of the claim in Duncan (2016) that the prosodic component of the ish-construction is part of its syntactic structure. The same, apparently, cannot be said of the gestural component. This represents a key difference between the two modalities.
Somewhat similarly, we might have thought that even if gesture did not affect grammaticality per se, it would still play a significant role in interpretation of the construction's meaning by influencing the perceived degree of commitment to the proposition. This too does not appear to be the case. Had it been, we would have thought that the presence or absence of gesture would perhaps influence the rating of the speaker's Confidence. However, we did not find this effect for gesture. Again, there was a significant effect of pause on this attribute. Perhaps the prosodic component of the construction does play this role in interpretation. Further research could explore this by utilizing a continuum of pause length similar to the approach of Holliday and Villareal (2018) in their study of the perception of intonational variables, rather than a binary pause/no pause condition.
In contrast to grammaticality, we do find that gesture influences the social evaluation of a speaker using the ish-construction. Participants found the student to be more Hardworking and less Honest and Friendly when the gesture component was omitted. This suggests that the lack of gesture is perceived negatively, as even being seen as more Hardworking is not necessarily a positive thing when the person is not seen as Honest or Friendly. It is interesting to note that Honest and Friendly were the two attributes corresponding to the Attractiveness category of evaluation. Gesture thus appears to be clearly tied to a specific domain of evaluation.
To the extent that the gestural component of the ish-construction appears to be obligatory, our results suggest that the obligation is social rather than grammatical. That is, the component is included because its omission leaves a negative impression on the listener and exposes the speaker to negative repercussions, not because its inclusion is strictly necessary to properly convey the utterance. We can see how this works in the situation created for the perception task. The task involved a teacher-student interaction in which the student had not quite completed an assignment. If a non-gesturing student is perceived negatively in such an interaction, there are likely to be negative repercussions such as detention or a failing grade for them. While gesturing does not change the content, by coming across more positively, the student may ameliorate some of the repercussions.
Following this logic, I suggest that what gesture contributes to the ish-construction is positive affect. Our results thus show that co-speech gestures do not simply correlate with patterns of variation in a manner that suggests that speakers use them to display affect, but that listeners also perceive affect through use of co-speech gestures. This means, then, that co-speech gesture is potentially a resource available to be conventionalized for such a contribution. As such, affect appears to represent a type of not-at-issue content that may be conveyed by gesture. To adopt this view requires affect to be conveyable content that is not directly deniable (Ebert and Ebert 2014, Esipova 2019). This condition appears to be easily met here; a dialogue such as (5) is impossible to imagine.
(5) b. # No, you're not oriented positively toward me.
A key question for future studies is whether this use of gesture is specific to the construction at hand or more general (i.e., gesturers are evaluated more positively for traits related to Attractiveness regardless of environment). Such future work should involve perspectives from semanticists and sociolinguists alike; note that by treating affect as not-at-issue content, we have essentially unified the research into the meaning of co-speech gestures with the research into the role of embodiment in structuring variation.
It is noteworthy that we also find an effect of pause on the Honest and Friendly attributes, as well as an interaction between gesture and pause. The interaction is puzzling: if the absence of gesture and pause in isolation are both evaluated as less attractive, why does this effect not compound when both are absent? This may potentially be explained by contextual non-attention (Levon 2014). In isolation, the absence of pause or gesture negatively influences the evaluation of speaker Attractiveness. When neither is present, however, only one absence is noticed, leading participants to evaluate the speaker along that dimension alone. If this were the case, it is unclear from our results which absence is noticed and which is ignored. Regardless, our results indicate that gesture plays a role in social evaluation. That is, gesture still appears to convey positive affect even if the absence of pause were to take primacy when both pause and gesture are omitted.
The effects due to social factors, while not clearly tied to the gesture results, nevertheless deserve some comment. Regarding age, we find that Naturalness ratings increase in apparent time. This result mirrors the survey of grammaticality judgments reported by Duncan (2015). As such, it is perhaps additional evidence that we can interpret a sliding scale of Naturalness as a proxy for grammaticality. If so, this experiment shows additional evidence of the ish-construction being a recently begun change in progress. Given this, it is rather surprising that there was no effect of age on the social attributes. One possibility is that any such effect would have disappeared as a result of the residualization process.
Effects of ethnicity and gender are equally surprising, as we did not consider interactions between the social factor and gesture/pause. The findings that males viewed the student as more Confident than female participants did and that Native Americans viewed the student as more Confident and more Honest than white participants did are main effects, occurring regardless of the experimental condition. It is unclear how to interpret these results, and whether they are meaningful. It is particularly unclear whether the latter results are meaningful, as few participants identified themselves as Native American (n=16). One possibility is that gesture is evaluated differently across groups, but we had insufficient data to include the interaction in our models. Given the finding that gesture conveys affect, these results should thus be investigated further to determine whether different social groups evaluate gesture differently. Such investigations will necessitate a sufficiently large sample from non-white populations to obtain robust results. The finding that African Americans give a higher naturalness rating than white participants do is likewise surprising, although it can perhaps be explained if we hypothesize that the ish-construction began as a feature of African American English. There is no evidence of this in the literature, however, and therefore such a hypothesis would need quite a bit of testing.

7. Conclusion.
This paper identifies gesture as an additional modality of the English ish-construction, and seeks to understand the role that gesture plays in the construction. We find that unlike prosody, gesture does not influence the grammaticality of the construction. This suggests that it is not part of the syntactic structure of the construction. Gesture does, however, influence social evaluation of a speaker using the construction. When gesture is absent from the utterance, the speaker is rated lower for both Honesty and Friendliness, two attributes that are tied to the domain of Attractiveness, and is rated as more Hardworking. I suggest that these results indicate that speakers use gesture in the ish-construction to convey positive affect. Because affect is a type of not-at-issue content, this finding is in line with recent research in both semantics and sociolinguistics. As such, I suggest that there is room for researchers from both fields to collaborate on further exploring the role of co-speech gesture in spoken language.
(2) a. I finished my project ish.
    b. I read the book ish.
    c. I wrote my paper ish.
    d. I started my homework ish.
(3) a. Yeah, I finished it last night.
    b. No, I didn't get a chance to start.
    c. * Yes, finished it night last I did.
    d. * No, yet start it I not have.
The test frames involved two conditions: presence or absence of the prosodic component (PAUSE), and presence or absence of the gestural component (GESTURE). When present, the gesture condition used an exaggerated shrug in which both hands were visibly raised (Figure 2).

Figure 4. Hardworking ratings by gesture and pause conditions

Figure 5. Friendly ratings by gesture and pause conditions