Visual boundaries in sign motion: processing with and without lip-reading cues

Sign languages demonstrate a higher degree of iconicity than spoken languages. Studies on a number of unrelated sign languages show that the event structure of verb signs is reflected in the phonological form of the signs (Wilbur (2008), Malaia & Wilbur (2012), Krebs et al. (2021)). Previous research showed that hearing non-signers (with no prior exposure to sign language) can use the iconicity inherent in the visual dynamics of a verb sign to correctly identify its event structure (telic vs. atelic). In two EEG experiments, hearing non-signers were presented with telic and atelic verb signs unfamiliar to them, which they had to classify in a two-choice lexical decision task in their native language. The first experiment assessed the timeline of neural processing mechanisms in non-signers processing telic/atelic signs without access to lip-reading cues in their native language, to understand the pathways for incorporation of physical perceptual motion features into linguistic processing. The second experiment further probed the impact of visual information provided by lip-reading (speech decoding based on visual information from the face of the speaker, most importantly, the lips) on the processing of telic/atelic signs in non-signers.

1. Introduction. In the course of human evolution, the ability to identify and interpret discrete events in the fluidly changing environment was one of the most critical functions of cognition. As humans developed the ability to communicate using language, information about actions (their structure, temporal parameters, and participants) took a central role in linguistic communication in the form of verbs and their linguistic features. Verbs and their arguments are central to any communicative message, and consistencies in their relationships provide the basis of linguistic patterns across languages (Evans & Levinson (2009), Greenberg et al. (1963)). Every sentence in linguistic communication is centered on transmitting information about an action or an event, that is, predication. The verb and its arguments, which provide the basis of every sentence, can describe the event in two ways: as having an inherent boundary or an endpoint (telic events), or not inherently bounded or limited (atelic events).
Understanding how action processing feeds into language processing can be groundbreaking in terms of modeling language disorders, identifying them early, and developing therapies. The hypothesis that language builds on general, non-linguistic abilities, such as the ability to identify, parse, and interpret actions, has not been conclusively tested in spoken languages, as they differ from action in modality (auditory vs. visual). Sign languages allow investigation of the processes of action comprehension and language understanding within a single modality, testing the relationship between the two at various processing stages, from sensory perception to higher cognition.
In sign language verbs, event structure is often perceptually reflected in the form of the signs, i.e. the hand articulator motion dynamics. For example, Wilbur (2003) observed that in American Sign Language (ASL) lexical verbs can be analyzed as telic and atelic based on their phonological form, with telics having a more rapid deceleration to the place of articulation at the end of the sign, reflecting the semantic end-state of affected arguments. The observation that semantic verb classes are characterized by certain movement profiles was formulated as the Event Visibility Hypothesis (EVH; Wilbur (2008)). Empirical evidence for the EVH came from motion capture research, which indicated systematic kinematic distinctions between telic and atelic verbs, whereby the endpoint of the event in telics is marked by a higher peak velocity and significantly faster deceleration at the end in contrast to atelics (Malaia et al. (2008, 2013a), Krebs et al. (2021), Malaia & Wilbur (2012)).
Prior research suggests that from the standpoint of neural computations, language and action processing overlap substantially. Humans rely on dynamic features of visual motion for perceptual segmentation of the visual and linguistic signal. Multiple studies have shown that reality is segmented into events at multiple scales simultaneously (Zacks et al. (2001a,b)). Such event segmentation studies typically ask participants to watch a video with a dynamic scene and indicate time-points at which the participants think an action is completed; participants can do so at fine-grained and coarse-grained boundaries. Across cohorts, participants show remarkable agreement in identifying the timing boundaries of both coarse and fine-grained events, either in realistic scenarios (e.g. how one folds laundry), or in abstract moving-dot experiments (Kurby & Zacks (2008), Speer et al. (2007), Zacks et al. (2001a)).
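The two kinematic measures named above (peak velocity and end-of-sign deceleration) can be illustrated with a minimal sketch. It assumes a hypothetical list of 3D marker positions sampled at a fixed frame rate; the function names, the toy trajectories, and the frame rate are illustrative assumptions and do not reproduce the motion capture pipeline of the cited studies.

```python
import math

def speed_profile(positions, fps):
    """Speed (distance units per second) between consecutive marker positions."""
    return [math.dist(p0, p1) * fps for p0, p1 in zip(positions, positions[1:])]

def peak_velocity_and_deceleration(positions, fps):
    """Return (peak speed, largest drop in speed per second) for one sign."""
    if len(positions) < 3:
        raise ValueError("need at least three positions")
    speeds = speed_profile(positions, fps)
    # Change in speed between consecutive frames, scaled to units per second squared.
    accel = [(b - a) * fps for a, b in zip(speeds, speeds[1:])]
    peak_velocity = max(speeds)
    # Deceleration is reported as a non-negative magnitude.
    peak_deceleration = max(0.0, -min(accel))
    return peak_velocity, peak_deceleration

# Hypothetical toy trajectories (assumed data, not measurements):
# a "telic-like" movement that speeds up and stops abruptly,
# and an "atelic-like" movement with constant speed.
telic_like = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (6, 0, 0), (6, 0, 0)]
atelic_like = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (4, 0, 0)]

print(peak_velocity_and_deceleration(telic_like, fps=10))   # (30.0, 300.0)
print(peak_velocity_and_deceleration(atelic_like, fps=10))  # (10.0, 0.0)
```

In this toy case the abruptly stopping trajectory shows both the higher peak velocity and the larger deceleration, which is the direction of the contrast described for telic versus atelic signs.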
The ability to identify, hierarchically structure, and remember segmented portions of the signal appears to be transferable between action and linguistic domains. Strickland et al. (2015) provided an example of action-to-language processing transfer, showing that non-signers are capable of identifying telic/atelic semantics of sign language verbs in the absence of any prior exposure to a sign language. Non-signers, who were shown videos of sign language verbs differing in event structure and resulting motion signatures, were asked to select the likely meaning of the observed sign from two English verbs. Participants accurately inferred lexical aspectual meaning ('Aktionsart') from visual stimuli, distinguishing between atelic and telic signs with unknown meaning. The fact that non-signers were able to make sense of the visual signal suggests that the presence/absence of a dynamic visual boundary was sufficient for action segmentation. Due to the linguistic nature of the task, inference about the event structure of the verb would have been carried out on the basis of action segmentation (telic vs. atelic). Strickland et al. (2015) concluded that linguistic notions of telicity and mapping biases between telicity and visual form were universally accessible, shared between signers and non-signers.
A reverse phenomenon, language-to-action transfer of skill in segmentation of the visual signal, has been demonstrated in a series of experiments in which signers and non-signers were asked to reproduce dynamic point-light drawings (Klima et al. (1999)). Signers, but not speakers, made a crucial distinction between strokes and transitions in the point-light display: signers did not draw the lines which represented transitional motion between "strokes" of the drawings. The stimuli were not linguistically informative for any of the participants; however, the signing participants were able to extrapolate their linguistic experience in visual segmentation of a signal (e.g. ignoring transitional movements between meaningful signs) to a non-linguistic task that focused on action segmentation and structuring. While non-signers and signers appear capable of relying on similar motion cues for segmentation of the visual signal and assignment of meaning, signers are capable of more nuanced structuring of the visual signal.
From multiple perceptual features experimentally tested as potentially relevant for visual action comprehension (e.g. distance between pairs of moving objects, relative location, speed, acceleration, etc.), changes in speed of individual objects emerged as the feature most highly correlated with event boundary identification. Action start and end times, as identified by participants, are highly correlated with increases and decreases of speed (acceleration and deceleration) (Zacks et al. (2006)). Rate of deceleration is also one of the motion features used for differentiating telic from atelic verbs in sign language production. At the neural level, these changes in speed of individual objects were associated with increased activity in the area of the brain termed MT+, and a nearby region in the superior temporal sulcus -both associated with processing of biological motion (Zacks et al. (2006)). Very similar neural activations were reported in sign-naïve participants observing signed sentences in ASL involving telic and atelic verbs (Malaia et al. (2012a)); yet, signers observing the same stimuli show focused activation in the left inferior frontal gyrus, an area related specifically to language processing. This indicates that while both signers and non-signers operate on the same perceptual information (i.e. both visually process the perceptual-kinematic difference between telic and atelic ASL signs), only familiarity with the language allows low-level perception of motion differences in the signal to be processed as information at the linguistic levels.
The findings described so far show that a perceptual-kinematic velocity feature used for nonlinguistic event segmentation is incorporated into the language system to be processed as an abstract linguistic feature by Deaf signers (Malaia et al. (2012a)). Although the cross-linguistic nature of motion-based interpretation of lexical aspect in sign languages is widely attested (Wilbur (2008)), along with the consistency of both signers and non-signers in interpreting signs with such features (Strickland et al. (2015), Kuhn et al. (2021)), the neural bases of this universal mapping from motion features to linguistic features are not well-described. To investigate the neural timeline of mapping between visual motion and linguistic event structure, we recorded event-related potential (ERP) data during processing of telic and atelic signs in hearing non-signers. Participants were asked to label the viewed signs using a two-alternative forced-choice task in their native language, and, additionally, to indicate how certain they were of their decision. In Experiment 1, the sign language stimuli, which represented unknown input for the participants, consisted of signed telic and atelic verbs from Turkish Sign Language (TID), Italian Sign Language (LIS), Sign Language of the Netherlands (NGT) (from Strickland et al. (2015)), and Croatian Sign Language (HZJ). In Experiment 2, the sign language stimuli consisted of telic and atelic signs from Austrian Sign Language (ÖGS) which were accompanied by mouthing (mouth movement forming (part of) a German word) that potentially provided additional information to the participants, who were native German speakers.
Based on previous research (Strickland et al. (2015), Kuhn et al. (2021)), we hypothesized that non-signers would be able to accurately classify telic/atelic verbs. In line with Ji & Papafragou (2020), we expected to see higher classification accuracy for bounded events. Previous neurolinguistic studies also showed that hearing non-signers relied on sensory/occipital cortices (including MT+ region) when processing telic vs. atelic signs (Malaia et al. (2012b)). We thus expected that the sensory-perceptual difference between verb types would be reflected on the neurophysiological level in ERPs within early time windows (before 300 msec post-stimulus onset). Our research question centered on the processing mechanisms involved in the form-to-meaning mapping/integration process (past 300 msec post-stimulus onset). Due to the linguistic nature of the task, linguistic processing indicators could be expected in both conditions. However, based on prior behavioral research, it could be expected that the timeline for the process of linguistic mapping/integration might differ between telic and atelic signs within each experiment, as well as between the experiments with and without lip-reading cues.

2. Methods.
2.1. PARTICIPANTS. Twenty-seven participants (21 female) were included in the final analysis, with a mean age of 22.96 years (SD = 3.98; range = 16-31 years). All of them were hearing students without competence in any sign language and all of them were right-handed (tested by an adapted German version of the Edinburgh Handedness Inventory; Oldfield (1971)). At the time of the study none showed any neurological or psychological disorders. All had normal or corrected vision and were not influenced by medication or other substances which may impact cognitive ability. The participants received either 20 € or credits for their study program.

2.2. MATERIALS AND DESIGN. Experiment 1:
A 1 x 2 design with the two-level factor Telicity involving telic and atelic signs was used. 36 verbs were presented in each condition (72 critical verbs), with 72 fillers, resulting in a total of 144 items. The stimuli consisted of signs that were used in the study of Strickland et al. (2015), that is, atelic and telic verbs from TID, LIS and NGT, supplemented by signs from HZJ to achieve the appropriate number of stimuli. For each of the sign languages, 9 atelic and 9 telic verbs were presented.
Experiment 2: Equivalent to Experiment 1, but the stimuli consisted of 36 atelic and 36 telic signs of ÖGS.
2.3. PROCEDURE. The material was presented in 6 blocks (24 verbs in each block). Every trial started with a stimulus video shown in the middle of the screen at a size of 820 x 540 (25 fps). The video was followed by a two-choice decision task, similar to the labeling task used by Strickland et al. (2015). Participants were asked to guess the meaning of the presented sign by forced-choice selection from two answer choices in written German. To ensure that the participants could not determine the meaning of the signs by iconically relating the meaning to the sign in Experiment 1, neither answer choice expressed the meaning of the presented sign, but one of them matched the stimulus with respect to telicity. In Experiment 2, one of the answers matched both the semantics and event structure (telicity) of the stimuli, while the other had the opposite event structure. One telic and one atelic verb were presented as answer choices. After the labeling task the participants rated how certain they were of their decision on a 7-point Likert scale (one stands for "very unsure", four means "about 50% sure" and seven indicates "very sure"). Prior to the experiment, a training block was presented to familiarize participants with task requirements and permit them to ask questions. The duration of breaks after each block was determined by the participants themselves. Participants were instructed to avoid eye movements and other motions during the presentation of the video material. The participants filled out a written questionnaire containing demographic questions and questions relevant for EEG data recording. Informed consent was obtained in written form.
The EEG was recorded from twenty-six electrodes (Fz, Cz, Pz, Oz, F3/4, F7/8, FC1/2, FC5/6, C3/4, CP1/2, CP5/6, P3/7, P4/8, O1/2, PO9/10) fixed on the participant's scalp by means of an elastic cap (Easy Cap, Herrsching-Breitbrunn, Germany). Horizontal eye movements (HEOG) were registered by electrodes at the lateral ocular muscles (left and right) and vertical eye movements (VEOG) were recorded by electrodes fixed above and below the left eye. All electrodes were referenced against the electrode on the left mastoid bone and offline re-referenced against the averaged electrodes at the left and right mastoid. The AFz electrode functioned as the ground electrode. The EEG signal was recorded with a sampling rate of 500 Hz. For amplifying the EEG signal we used a Brain Products amplifier (high pass: 0.01 Hz). In addition, a notch filter of 50 Hz was used. The electrode impedances were kept below 5 kΩ. Offline, the signal was filtered with a bandpass filter (Butterworth Zero Phase Filters; high pass: 0.1 Hz, 48 dB/Oct; low pass: 20 Hz, 48 dB/Oct).
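The offline re-referencing step can be written out as a short worked example. Because every channel is recorded against the left mastoid, the average-mastoid reference is obtained by subtracting half of the recorded right-mastoid signal from each channel. The sketch below is an illustration of that arithmetic only; the function name and the toy potentials are assumptions, and the actual processing software used in the study is not specified here.

```python
def rereference_to_average_mastoids(channel, right_mastoid):
    """Re-reference one channel from the left mastoid to the mastoid average.

    Both inputs are lists of samples recorded against the left mastoid (M1):
        channel[i]       = V_ch[i] - V_M1[i]
        right_mastoid[i] = V_M2[i] - V_M1[i]
    The result is V_ch[i] - (V_M1[i] + V_M2[i]) / 2.
    """
    if len(channel) != len(right_mastoid):
        raise ValueError("signals must have the same number of samples")
    return [c - 0.5 * m for c, m in zip(channel, right_mastoid)]

# Hypothetical "true" potentials in microvolts (assumed values for the check).
v_ch = [10.0, 12.0]
v_m1 = [2.0, 4.0]
v_m2 = [4.0, 2.0]

# What the amplifier would record with the left mastoid as reference.
recorded_ch = [c - m1 for c, m1 in zip(v_ch, v_m1)]
recorded_m2 = [m2 - m1 for m2, m1 in zip(v_m2, v_m1)]

rereferenced = rereference_to_average_mastoids(recorded_ch, recorded_m2)
expected = [c - (m1 + m2) / 2 for c, m1, m2 in zip(v_ch, v_m1, v_m2)]
print(rereferenced, expected)  # [7.0, 9.0] [7.0, 9.0]
```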

3. Data analysis.
3.1. BEHAVIORAL DATA. Experiment 1: The effects of Telicity and Language were examined for the participants' accuracy regarding the two-choice decision task. Behavioral data per participant and per item were assessed using repeated-measures analysis of variance (ANOVA). The fixed factors Telicity (telic vs. atelic) and Language (LIS, TID, HZJ, NGT) and the random factors SUBJECTS (F_Subj) and ITEMS (F_Item) were included. The statistical analysis was carried out hierarchically; only significant interactions (p ≤ .05) were resolved using a step-down approach. To correct for violations of sphericity, the Greenhouse & Geisser (1959) correction was applied to repeated measures with greater than one degree of freedom. Only significant effects (p ≤ .05) are reported. Experiment 2: Analysis was the same as in Experiment 1, except that only the fixed factor Telicity (telic vs. atelic) was included in the analysis.
3.2. ERP DATA. Data analysis was the same for the two experiments. To determine the onset and offset of the effects, we computed a 50 msec time window analysis. Statistical evaluation of the ERP data was carried out by comparison of the mean amplitude of the ERPs within the time window, per condition and per subject in two regions of interest (ROIs). The factor ROI involved the levels anterior = F3, F4, F7, F8, FC1, FC2, FC5, FC6, Fz, Cz, and posterior = P3, P4, P7, P8, PO9, PO10, O1, O2, Pz, Oz. The signal was corrected for ocular artifacts by the Gratton and Coles method (Gratton et al. (1983)) and screened for artifacts (minimal/maximal amplitude at -75/+75 µV). Data were baseline-corrected to the interval from -300 to 0 msec. Statistical analysis was carried out in a hierarchical manner, that is, only significant interactions (p ≤ .05) were included in a step-down analysis. For statistical analysis of the ERP data an ANOVA was computed including the factors Telicity (atelic vs. telic) and ROI. Only significant effects (p ≤ .05) are reported. ERPs were measured with respect to the time point when the target handshape reaches the target location where the movement of the verb sign starts.
Significant processing differences for telics compared to atelics were revealed at the neurophysiological level. Beginning from sign onset (i.e. target handshape positioned in target location), statistically significant neural differences in processing appeared across several time ranges anteriorly (0-200 msec, 500-550 msec, 650-800 msec, 850-1300 msec, and 1400-1500 msec), posteriorly (600-700 msec, 750-1050 msec, and 1250-1300 msec), and in a broadly distributed manner (200-250 msec and 300-400 msec) (see Figure 1).
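The windowed mean-amplitude measure described in 3.2 can be sketched as follows. The sampling rate, baseline interval, window width, amplitude limits, and ROI channel lists are taken from the text; the function names, the order of the steps (baseline correction before screening), and the toy epochs are assumptions made for illustration, not a description of the software actually used.

```python
SAMPLING_RATE_HZ = 500     # sampling rate from the recording description
BASELINE_MS = (-300, 0)    # baseline interval relative to sign onset
WINDOW_MS = 50             # width of each analysis window
ARTIFACT_LIMIT_UV = 75.0   # samples beyond +/-75 microvolts mark an artifact

ROIS = {
    "anterior": ["F3", "F4", "F7", "F8", "FC1", "FC2", "FC5", "FC6", "Fz", "Cz"],
    "posterior": ["P3", "P4", "P7", "P8", "PO9", "PO10", "O1", "O2", "Pz", "Oz"],
}

def ms_to_index(t_ms, epoch_start_ms):
    """Convert a latency in msec to a sample index within the epoch."""
    return round((t_ms - epoch_start_ms) * SAMPLING_RATE_HZ / 1000)

def baseline_correct(epoch, epoch_start_ms):
    """Subtract the mean of the baseline interval from every sample."""
    a = ms_to_index(BASELINE_MS[0], epoch_start_ms)
    b = ms_to_index(BASELINE_MS[1], epoch_start_ms)
    baseline = sum(epoch[a:b]) / (b - a)
    return [v - baseline for v in epoch]

def is_clean(epoch):
    """True if no sample exceeds the amplitude limits."""
    return all(abs(v) <= ARTIFACT_LIMIT_UV for v in epoch)

def roi_average(channel_epochs, roi):
    """Average sample by sample over the available channels of one ROI."""
    traces = [channel_epochs[ch] for ch in ROIS[roi] if ch in channel_epochs]
    return [sum(samples) / len(traces) for samples in zip(*traces)]

def window_means(epoch, epoch_start_ms, first_ms, last_ms):
    """Mean amplitude in consecutive 50 msec windows from first_ms to last_ms."""
    means = {}
    for start in range(first_ms, last_ms, WINDOW_MS):
        a = ms_to_index(start, epoch_start_ms)
        b = ms_to_index(start + WINDOW_MS, epoch_start_ms)
        means[(start, start + WINDOW_MS)] = sum(epoch[a:b]) / (b - a)
    return means

def toy_epoch(offset):
    """Hypothetical epoch from -300 to 100 msec (200 samples at 500 Hz)."""
    return [2.0 + offset] * 150 + [5.0 + offset] * 25 + [7.0 + offset] * 25

channels = {"Fz": toy_epoch(0.0), "Cz": toy_epoch(2.0)}
corrected = {ch: baseline_correct(ep, -300) for ch, ep in channels.items()}
clean = all(is_clean(ep) for ep in corrected.values())
result = window_means(roi_average(corrected, "anterior"), -300, 0, 100)
print(clean, result)  # True {(0, 50): 3.0, (50, 100): 5.0}
```

Per-window means of this kind, computed per condition and per subject, would then enter the Telicity x ROI ANOVA; runs of adjacent significant windows correspond to the longer time ranges reported in the results.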

4.2. EXPERIMENT 2: LIP-READING CUES.
4.2.1. BEHAVIORAL DATA. Participants gave correct responses above chance level regarding telic and atelic signs. They were more accurate in the telic condition than in the atelic condition (telic, 94.75% accuracy; atelic, 89.81% accuracy). The analysis of variance of participants' accuracy revealed a significant main effect of Telicity [F_Subj(1, 26) = 22.49, p < .001, ηp² = .46].
4.2.2. ERP DATA. With ERP onset time-locked to the point when the target handshape of the sign reached target location, data analysis revealed a more posteriorly distributed positive effect for telic compared to atelic signs in the 250 to 500 msec time window, a broadly distributed positive effect in the 500 to 600 msec time window, and a posteriorly distributed positive effect in the 600 to 1650 msec time window. Furthermore, an anteriorly distributed negative effect for telic compared to atelic signs was identified in the 1800 to 1850 msec window (see Figure 2).
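The effect size reported in 4.2.1 can be cross-checked from the F statistic and its degrees of freedom, using the standard identity between partial eta squared and F. The function name below is an illustrative assumption; the input values are the ones reported in the text.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of Telicity in Experiment 2: F(1, 26) = 22.49
eta_p2 = partial_eta_squared(22.49, 1, 26)
print(round(eta_p2, 2))  # 0.46
```

The recomputed value agrees with the reported effect size of .46.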

5. Discussion.
Replicating previous results, the behavioral data analysis indicates that non-signers classify signs as telic or atelic with high accuracy (Strickland et al. (2015), Kuhn et al. (2021)). Across sign languages, telic signs were classified more accurately than atelic signs. The finding that participants classified telics more accurately than atelics is in line with Ji & Papafragou (2020), who report that the category of bounded events was identified with greater ease compared to that of unbounded events. Ji & Papafragou (2020) suggest that the involvement of an internal structure that culminates in defined endpoints makes bounded events easier to individuate, track, and generalize over, as compared to unbounded events. The present data extends this observation to sign language stimuli and the use of a linguistic task.
Figure 1: Telic (red)/atelic (blue) sign processing without non-manual cues; difference wave in black
Differences between processing of telic and atelic signs were also found at the neurophysiological level, since different ERP patterns were observed for Experiment 1 and Experiment 2. In Experiment 1, ERP analysis identified differences in processing timeline between telic vs. atelic signs in both early (prior to 300 msec past stimulus onset), and later time windows. The effects in early time windows (starting at sign onset) likely reflect the difference in sensory-perceptual processing, i.e. the processing of the difference in movement dynamics between verb types. Anterior and posterior ERP effects for telic compared to atelic stimuli appearing in later time windows likely reflect different mapping/integration processes for telic signs. Experiment 2 ERP data indicated later onset of processing differences between telic vs. atelic signs, and almost exclusively posterior (temporo-parietal-occipital) distribution of sustained (to 1600 msec) differences in processing. In debriefing, the participants indicated their reliance on mouthing information for this experiment, which suggests likelihood of attempts at integrating visual speech (lip-reading), manual, and spoken lexical information prior to the decision-making task.
Figure 2: Telic (red)/atelic (blue) sign processing with non-manual cues in native language; difference wave in black
Proceedings of ELM 2: 164-175, 2023. Julia Krebs, Evie A. Malaia, Ronnie B. Wilbur and Dietmar Roehm: Visual boundaries in sign motion.
In contrast to Experiment 1, Experiment 2 revealed ERP effects for telic compared to atelic signs that started in later time windows, extended into later time windows, and showed a primarily posterior distribution. Thus, instead of the early perceptual processing based on sign kinematics, observed in Experiment 1, the participants seemed to rely on mouthing information, as described in the lip-reading literature. For example, research on lip-reading of speech in silence shows that observation of lip-motion leads to generation of auditory speech representation in temporal auditory cortices (Bourguignon et al. (2020)). The sustained parietal effects in Experiment 2, then, might be due to multimodal (visual and auditory) stream integration, as well as, possibly, lexical access resulting from successful integration; however, more specific research is necessary to ascertain the timeline of audiovisual integration for sign language mouthing in non-signers.
The differences in the morphology of ERP effects elicited by telic vs. atelic stimuli likely reflect differences in cognitive/linguistic processing between verb types and different mapping and/or integration processes for telics compared to atelics. Previous work showed that telicity in verbs may facilitate online language processing, for example, in resolution of garden path sentences. Malaia et al. (2009, 2012b, 2013b) investigated the effects of verbal telicity on syntactic reanalysis of reduced relative clauses in written English, whereby the verb in the relative clause was either telic or atelic. Sentences with atelic verbs imposed higher processing costs at the disambiguation point, as compared to sentences with telic verbs. Reduced relative clauses required re-assignment of thematic roles, which appeared to proceed more rapidly in sentences with telic verbs, potentially because bounded verbs triggered extraction of an event template along with the thematic roles inherent in it, thereby facilitating thematic role re-assignment. Atelic verbs, which did not provide the conceptual boundary for event segmentation, did not appear to trigger the same processing mechanism. In our experiments, participants viewed unfamiliar signs, followed by an offline classification task. ERP effects thus might reflect the segmentation operation the participants carried out for visually bounded telic signs. Although the segmentation operation might require more effort (e.g. attentional allocation and memory reference) at the point of being carried out, it is likely to facilitate the participants' performance in the offline classification task later (Malaia et al. (2009), Ji & Papafragou (2020)).
Therefore, the online ERP effects for telic signs, as compared to atelic signs, might potentially stem from two different sources: recruitment of additional processing resources for telics in the segment toward the end of each sign, or release of cognitive resources past sign offset. Crucially, the difference is observable at the neurophysiological level, suggesting that action-to-language mapping processes differ between visually bounded and unbounded events.
The differences in behavioral and ERP results between the two experiments suggest that participants used different strategies. Whereas in Experiment 1 non-signers seemed to segment the visual signal on the basis of the signs' motion profiles, Experiment 2 suggests that, if available, non-signers use lip movement information (visual cues which they are familiar with from their L1) for classifying unknown signs. Thus, in Experiment 2 non-signers paid more attention to lip-reading (as self-reported after the experiment), as opposed to tracking visual motion profiles in the stimuli. Because linguistic information provided by lip movement is part of audio-visual spoken language processing, it was easier for non-signers to classify the signs in Experiment 2 compared to Experiment 1.
These findings might reflect the potential evolutionary pathway of how physical-perceptual motion features were co-opted into the linguistic structure of sign languages. Cross-linguistic similarities in the visual representation of event structure have been described for a number of unrelated sign languages (Malaia & Milković (2021), Krebs et al. (2021)). Sign languages, however, differ in linguistic representation of event structure, such that realization of end-state marking might take on various forms. Comparative analysis of motion capture data also points to a variety of strategies for the mapping between physical parameters of articulator motion and linguistic features that incorporate boundedness. Thus, although sign languages mark event structure in an iconic way, they show language-specific characteristics with respect to how event structure is represented and expressed. However, despite these differences, non-signers can classify these iconically motivated forms accurately, because articulator motion profiles overall are similar to motion profiles of observed events. This finding provides further neurophysiological evidence for the event segmentation theory in perception (Zacks & Tversky (2001), Zacks & Swallow (2007)) and the EVH for sign languages (Wilbur (2003, 2008, 2010)).