Here, we present a protocol that manipulates interlocutor visibility to examine its impact on gesture production in interpersonal communication. The protocol is flexible with respect to the tasks implemented, the gestures examined, and the communication modality. It is ideal for populations with communication challenges, such as second language learners and individuals with autism spectrum disorder.
Understanding why speakers modify their co-speech hand gestures when speaking to interlocutors provides valuable insight into how these gestures contribute to interpersonal communication in face-to-face and virtual contexts. The current protocols manipulate the visibility of speakers and their interlocutors in tandem in a face-to-face context to examine the impact of visibility on gesture production when communication is challenging. In these protocols, speakers complete tasks such as teaching words from an unfamiliar second language or recounting the events of cartoon vignettes to an interlocutor, who is either another participant or a confederate. When performing these tasks, speakers are visible or non-visible to their interlocutors, and interlocutors are visible or non-visible to speakers. In the word learning task, speakers and interlocutors visible to one another produce more representational gestures, which convey meaning via handshape and motion, and more deictic (pointing) gestures than speakers and interlocutors who are not visible to one another. In the narrative retelling protocol, adolescents with autism spectrum disorder (ASD) produce more gestures when speaking to visible interlocutors than to non-visible interlocutors. A major strength of the current protocol is its flexibility in terms of the tasks, populations, and gestures examined, and it can be implemented in videoconferencing as well as face-to-face contexts. Thus, the current protocol has the potential to advance the understanding of gesture production by elucidating its role in interpersonal communication in populations with communication challenges.
Co-speech gestures (henceforth, gestures) – meaningful hand movements produced concurrently with speech – contribute to interpersonal communication by conveying information that complements verbal content1. According to the most widely used taxonomy2,3, gestures can be divided into three categories: representational gestures, which convey referents via their form and motion (e.g., flapping the hands back and forth together to convey flying); beat gestures, which convey emphasis via simple punctate movements (e.g., moving the dominant hand downward slightly in conjunction with each word in the phrase "right now"); and deictic gestures, which draw attention to the presence or absence of an entity via pointing (e.g., swinging the thumb backward to indicate something behind oneself). Representational gestures can be further divided into two subcategories: iconic gestures, which convey concrete referents (e.g., a bird), and metaphorical gestures, which convey abstract referents (e.g., ecstasy). Because gesture and speech arise from the same conceptual content, they are closely related in meaning4,5. By serving as an iconic visual medium for communication, gestures can help compensate when challenges in spoken language comprehension arise6, whether due to individual factors such as limited proficiency in the language spoken7,8 or environmental factors such as difficulty hearing speech9,10. Thus, gestures are integral to understanding interpersonal communication in face-to-face and virtual contexts, providing insight into how speakers and listeners convey and comprehend information multimodally.
Several structured interactive tasks have been developed to measure how gesture affects interpersonal communication. These tasks include interviews, in which participants respond to questions by describing their personal experiences11,12; image description, in which participants describe a static image13,14; puzzle solving, in which participants describe how to solve a puzzle by spatially orienting its components correctly15,16; direction provision, in which participants give directions to a location unfamiliar to listeners17,18,19; and narrative retelling, in which participants view a cartoon depicting a sequence of events and subsequently recount them20,21,22. Many of these tasks incorporate spatial content and action, which gesture conveys with particular efficacy15,23,24,25. In these tasks, participants typically produce gestures in conjunction with language, and the information conveyed via gesture, as well as its relationship to concurrently produced language, is assessed. Alternatively, participants may view a recording of someone gesturing (or not gesturing) while producing language to complete one of these tasks, after which participants' comprehension is assessed. All of these tasks can be conducted virtually as well as face-to-face, enabling data collection from a wide range of participants and comparison across modalities.
The impact of gestures on interpersonal communication has been examined with a wide range of participants. Of particular interest have been populations with communication challenges, including young children26,27,28,29,30, second language (L2) users8,31,32,33, individuals with autism spectrum disorder (ASD)20,21,22,34, individuals with specific language impairment35,36,37, individuals with aphasia12,38, individuals with brain injury39, and individuals who stutter40,41. This work has revealed that, while many of these populations can use gestures to facilitate communication, some, such as individuals with ASD, may have difficulty leveraging gestures to communicate effectively. These findings suggest that such difficulty may reflect the extent to which these populations take audience design and environmental cues into consideration, highlighting the practical implications of these factors for the impact of gestures on communication.
A key manipulation providing insight into the impact of gestures on interpersonal communication is interlocutor visibility. A question of great importance in the field of gesture research is the extent to which speakers gesture for their own benefit, indicating the offloading of cognitive operations onto the body, vs. for the benefit of their interlocutors, indicating the use of gesture to communicate. This question has been investigated by examining the impact of interlocutor (non-)visibility on gesture production in face-to-face contexts via the use of an opaque partition26,42, as well as in telephone contexts13 and virtual contexts43. Overall, the results of this work indicate that, although speakers gesture for their own benefit, they produce more representational gestures when communicating with visible than non-visible interlocutors, whereas beat gesture production is similar regardless of interlocutor visibility. Thus, these results indicate that representational gesture facilitates communication across a variety of contexts, suggesting that speakers take the perspective of their interlocutors into account and modify their gesture production accordingly. Although previous research examining the effect of interlocutor visibility has been instrumental in providing insight into the contributions of gesture to interpersonal communication, it has focused on English as a first language (L1) in typically developing populations, so it is unclear whether the findings extend to populations with communication challenges. Two such populations are L2 learners, who may struggle to communicate via speech in the target language, and children with ASD, whose verbal and nonverbal communication are atypical. Furthermore, little research has examined the impact of interlocutor visibility on gesture production in virtual contexts, which allow the effect on the participant of the interlocutor's visibility to be disentangled from the effect of the participant's visibility to the interlocutor, so the replicability of findings in these contexts is currently unclear. Finally, some research has focused only on the impact of interlocutor visibility on the production of specific types of gestures, so it is unclear whether the production of other types of gestures is similarly affected.
The protocols described below manipulate interlocutor visibility to examine gesture production under challenging circumstances: L2 word learning and narrative retelling by individuals with ASD. The L2 word learning protocol bridges research examining the impact of observing gestures on L2 word learning with research examining the contribution of gestures to communication via an interactive word learning paradigm. In this paradigm, participants unfamiliar with the target language learn words in it and then teach these words to other participants unfamiliar with the target language, permitting the impact of interlocutor visibility on gesture production to be examined in the earliest stages of L2 acquisition in a conversational context. The cartoon retelling paradigm employs a widely used narrative retelling task, in which differences in gesture production have been observed when interlocutor visibility is manipulated, with a new population: adolescents with ASD. This population is of interest because language development, including gesture production, is mature by adolescence, and because ASD entails difficulty with verbal and nonverbal communication, including gestures, as well as reduced sensitivity to the communicative needs of interlocutors. Together, these protocols provide insight into the extent to which gesture production is dependent on – and, conversely, can compensate for – speech when interpersonal communication is challenging.
Based on findings demonstrating how listener visibility affects gesture production by L1 English speakers13,26,42,43, it was hypothesized that participants would produce more gestures overall, and more representational gestures in particular, when discussing L2 words with a visible than a non-visible interlocutor. Based on findings demonstrating abnormalities in gesture production in ASD20,44, it was hypothesized that adolescents with ASD would produce fewer gestures overall, as well as fewer representational and deictic gestures, than typically developing (TD) adolescents. Finally, based on findings showing that ASD entails difficulty with perspective taking45, an interaction between diagnosis and visibility was predicted, such that gesture production would differ between visible and non-visible interlocutors for TD adolescents but not for adolescents with ASD.
All participants provided written consent, and all protocols were approved by the Institutional Review Boards at the host institution. The L2 word learning and cartoon retelling protocols were implemented in the studies on which the representative results are based21,33. Although these protocols have been conducted only in in-person contexts to date, a related protocol, not described here, that manipulates the visibility of the interlocutor and the participant independently is currently being conducted in a videoconferencing context to provide insight into replicability in that context, with plans to extend it to populations with communication challenges, such as individuals with ASD.
1. Participants
NOTE: While respecting necessary constraints on participant characteristics for each study, samples should be representative of the larger population insofar as possible.
2. Overview of visibility manipulation
3. L2 word learning protocol
4. Cartoon retelling protocol
5. Transcription and coding
NOTE: To minimize bias, when possible, transcribe and code data blind, i.e., without knowledge of which participants and trials were assigned to which conditions. Ideally, transcribers and coders should not be involved in data collection.
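To illustrate how coded data can be prepared for analysis, below is a minimal Python sketch that computes a participant's gesture rate per 100 words by gesture type, the unit used in the representative results. It assumes annotations have been exported to a tab-delimited text file (as ELAN supports) with hypothetical column names 'tier' and 'value'; actual export settings and column names will vary by project.

```python
import csv
from collections import Counter

def gesture_rates(path):
    """Compute gestures per 100 words by gesture type from one
    participant's coded transcript.

    Assumes a tab-delimited export (e.g., from ELAN) with one row per
    annotation and hypothetical columns: 'tier' ('speech' or 'gesture')
    and 'value' (the transcribed words or the gesture category).
    """
    word_count = 0
    gesture_counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["tier"] == "speech":
                word_count += len(row["value"].split())
            elif row["tier"] == "gesture":
                gesture_counts[row["value"]] += 1  # e.g., iconic, beat, deictic
    if word_count == 0:
        return {}
    # Normalizing to a per-100-words rate makes gesture production
    # comparable across participants who speak different amounts
    return {g: 100 * n / word_count for g, n in gesture_counts.items()}
```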
L2 word learning results
Implementation: The results reported below are based on data collected from 52 healthy participants (21 males, 31 females; age: M = 20.15, SD = 1.73, range = 18-28) according to the protocol outlined above. All speech and gestures were coded by a primary coder, who could not be blind to visibility condition because data from both the speaker and interlocutor were recorded using a single camera, as described in step 3.4. To establish inter-rater reliability, a secondary coder independently coded all gestures produced by 6 randomly selected pairs (k = 389 gestures, 23.08% of the data). Agreement between the primary and secondary raters was 92% for identifying gestures and 90% for categorizing them.
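As a concrete illustration of the reliability computation, the following is a minimal Python sketch of percent agreement between two coders, assuming their category labels have been aligned by gesture token; the labels shown are hypothetical. A chance-corrected statistic such as Cohen's kappa (e.g., sklearn.metrics.cohen_kappa_score) can be substituted when correction for chance agreement is desired.

```python
def percent_agreement(labels_a, labels_b):
    """Percent of gesture tokens to which two coders assigned the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)

# Hypothetical category labels for the same five gestures
primary = ["iconic", "beat", "deictic", "iconic", "metaphorical"]
secondary = ["iconic", "beat", "deictic", "beat", "metaphorical"]
print(f"Categorization agreement: {percent_agreement(primary, secondary):.0f}%")  # 80%
```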
Overview of analyses: In the following analyses, both speakers' and interlocutors' overall gesture production and production of different types of gestures were compared by visibility condition to gauge its effect on the communication of L2 words. Because the data were skewed and violated assumptions of normality, comparisons based on visibility were made using non-parametric Mann-Whitney U tests, which do not assume normality. More specifically, Mann-Whitney U tests were used to determine whether the probability of gesture production being greater in the visible condition than in the non-visible condition was equal to the probability of gesture production being greater in the non-visible condition than in the visible condition.
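The following is a minimal sketch of this comparison using SciPy; the per-speaker gesture rates shown are hypothetical, as the real analysis would use rates computed from the coded transcripts.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-speaker gesture rates (gestures per 100 words)
visible = [4.2, 6.1, 3.8, 5.5, 7.0, 4.9]
non_visible = [2.1, 3.0, 1.8, 2.6, 3.4, 2.2]

# Two-sided test of whether values in one condition tend to exceed
# values in the other (no normality assumption)
u_stat, p_value = mannwhitneyu(visible, non_visible, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")
```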
Findings: Mann-Whitney U tests revealed that primary speakers whose interlocutors were visible produced more gestures than primary speakers whose interlocutors were not visible, U = 53, p = 0.02. Specifically, primary speakers with visible interlocutors produced more representational, U = 30, p = 0.001, and deictic, U = 52, p = 0.02, gestures than primary speakers with non-visible interlocutors (see Figure 1). Primary speakers' beat gesture production did not differ significantly between visibility conditions, U = 82, n.s., though they produced more of these gestures in the presence of visible than non-visible interlocutors. Although interlocutors visible to primary speakers also produced more gestures than interlocutors not visible to primary speakers, this difference did not reach significance, U = 89, n.s. None of the comparisons between gesture types as a function of visibility reached significance for interlocutors, even though interlocutors produced more of every type of gesture in the presence of visible than non-visible primary speakers. These findings provide evidence that primary speakers increase their production of representational and deictic gestures in the presence of visible interlocutors, whereas their interlocutors may not.
Cartoon retelling results
Implementation: The results reported below are based on data collected from 41 adolescent participants, 18 of whom were diagnosed with ASD (15 males, 3 females; age: M = 15.17, SD = 2.75, range = 10.44-19.49) and 23 of whom were typically developing (15 males, 6 females; age: M = 15.81, SD = 2.42, range = 11.52-19.36). Participants in the two groups were matched on chronological age, gender, and verbal IQ. All speech and gestures were coded by two coders blind to visibility condition and diagnosis. To establish inter-rater reliability, a secondary coder independently coded all gestures produced by 10 randomly selected participants (24.3% of the data). Agreement between the primary and secondary raters was 98% for identifying gestures and 95.4% for categorizing them.
Overview of analyses: In the following analyses, overall gesture production and production of different types of gestures were compared based on diagnosis and visibility to gauge their effects on communication. Specifically, two-way analyses of variance were employed to examine both independent and interactive effects of diagnosis (ASD vs. TD) and visibility (visible vs. non-visible).
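As an illustration, below is a minimal sketch of a two-way ANOVA in Python using statsmodels, with hypothetical gesture rates and both factors treated as between-subjects for simplicity; the published analysis may have treated visibility as a within-subjects factor, which would require a repeated-measures model instead (e.g., statsmodels' AnovaRM).

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical per-participant gesture rates (gestures per 100 words)
df = pd.DataFrame({
    "rate": [1.2, 0.9, 1.5, 0.4, 0.3, 0.5, 1.1, 1.4, 1.0, 0.2, 0.5, 0.4],
    "diagnosis": ["ASD"] * 6 + ["TD"] * 6,
    "visibility": (["visible"] * 3 + ["non_visible"] * 3) * 2,
})

# Fit a linear model with main effects of diagnosis and visibility plus
# their interaction, then produce the ANOVA table (Type II sums of squares)
model = ols("rate ~ C(diagnosis) * C(visibility)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```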
Findings: For overall gesture production, there was no main effect of diagnosis (F < 1). However, there was an interaction of gesture type by diagnosis (F(3,117) = 0.74, p = 0.05, ηp2 = 0.07). Specifically, adolescents with ASD produced fewer metaphorical gestures (M = 0.48; SD = 0.63; t(39) = 1.99, p = 0.05, d = 0.49) and beat gestures (M = 0.26; SD = 0.60; t(39) = 2.13, p = 0.04, d = 0.67) than TD adolescents (metaphorical: M = 1.08; SD = 1.62; beat: M = 1.14; SD = 1.76). No differences between ASD and TD adolescents were found for the production of iconic (ASD: M = 0.79; SD = 0.25; TD: M = 0.77; SD = 0.25; t < 1) or deictic (ASD: M = 0.36; SD = 0.16; TD: M = 0.25; SD = 0.10; t < 1) gestures. Both ASD and TD adolescents produced more gestures when speaking to a visible listener (M = 0.97 gestures per 100 words; SD = 1.45) than to a non-visible listener (M = 0.36 gestures per 100 words; SD = 0.83; F(1,39) = 18.18, p < 0.001, ηp2 = 0.32; see Figure 2). However, overall gesture production did not show a diagnosis by interlocutor visibility interaction (F < 1). Taken together, these results indicate that adolescents with ASD produce fewer metaphorical and beat gestures than their TD peers but, like their TD peers, increase their production of all types of gestures when speaking to visible interlocutors.
Figure 1: Gesture production of primary speakers and interlocutors in the L2 word learning task. Gesture production of primary speakers and interlocutors per hundred words by visibility condition (error bars represent standard error).
Figure 2: Gesture production of ASD and TD adolescents by visibility in the cartoon retelling task. Production of different gesture types per hundred words by ASD and TD adolescents in the (A) visible listener condition and (B) non-visible listener condition (error bars represent standard error). This figure has been modified with permission from Morett et al.21.
The current protocol manipulates the visibility of the speaker and interlocutor to one another, providing insight into its impact on gesture production under challenging circumstances: L2 word learning and narrative retelling by adolescents with ASD. This protocol can be implemented either in person or virtually, permitting participant and interlocutor visibility to be manipulated in tandem or independently. It can accommodate a wide variety of experimental tasks, gestures, and populations, providing the flexibility to be implemented in various circumstances in which communication is challenging and to reveal the numerous ways in which gesture production is altered under these circumstances. Thus, by manipulating interlocutor visibility, the current protocol provides insight into how gesture affects interpersonal communication in populations with communication challenges.
As in similar protocols used with healthy adult L1 speakers13,26,42,43, the most critical aspect of the current protocol is the manipulation of the visibility of speakers and interlocutors to one another. Because the current protocol was conducted in person in both studies reported here21,33, visibility was manipulated in tandem, such that speakers and interlocutors were either mutually visible or mutually non-visible. The current protocol can easily be adapted for implementation in a videoconferencing context so that the visibility of the speaker and the interlocutor can be manipulated independently, permitting the effects of visibility on gesture production by the speaker and the interlocutor to be disentangled43. Another critical aspect of the current protocol is its dialogic nature, which reflects naturalistic conversational contexts through the inclusion of both a speaker and an interlocutor, consistent with previous structured interactive tasks used to examine gesture production11,12,17,18,19,20,22. Notably, the interlocutor can be either a second participant or a research assistant, depending on whether their gesture production is of interest. A final critical aspect of the current protocol is not informing participants of its focus on gesture production and ensuring that they are unaware that they are being video recorded until after the experimental task is complete26,42. Although participants have been aware of video recording in some studies using similar protocols12,20,43, gesture production in protocols such as this one is more likely to approximate gesture production in naturalistic conversational settings when participants are unaware that their gestures are being recorded.
A major strength of the current protocol is its flexibility in terms of tasks, participants, and gestures. Although narrative retelling tasks are often used given their effectiveness in eliciting differences in representational gesture production in face-to-face contexts26,42, the manipulation can incorporate other tasks to elicit gesture production, such as the L2 word learning paradigm described here33,48, as well as interviews11,12, image description13,14, puzzle solving15,16, and direction provision17,19,27. Due to its participant friendliness, the interlocutor visibility manipulation can be used with populations with communication challenges, such as L2 learners and children with ASD, as described here, as well as children and adults with specific language impairment, stuttering, brain injury, and aphasia. The interlocutor visibility manipulation permits the examination of representational, beat, and deictic gestures, allowing differences in their production by interlocutor visibility to be quantified across populations. Finally, although the interlocutor visibility manipulation was originally developed for use in a face-to-face context, it can be implemented in a videoconferencing context by turning the webcam on and off, permitting direct comparison of gesture production when visibility is manipulated in videoconferencing and face-to-face contexts49, as well as independent examination of the effects of visibility of the primary speaker and interlocutor on gesture production43.
Although the interlocutor visibility manipulation has the potential to advance the understanding of gesture production in populations and circumstances with communication challenges due to its participant friendliness and flexibility, it has some limitations. In its current instantiation, it is implemented in person, which may limit recruitment to local convenience samples. This limitation has been overcome via implementation in a videoconferencing context, but to date, the manipulation has not been implemented with populations with communication challenges in such a context, so the extent to which findings concerning gesture production in these populations replicate there is currently unclear. To address this limitation, we are currently implementing a version of the manipulation with the cartoon retelling task in a videoconferencing context with healthy L1 English speakers, which we plan to extend to individuals with ASD subsequently. Despite the accessibility of the visibility manipulation to a wide range of potential participants with communication challenges, participants must be able to complete the tasks successfully, making the tasks inaccessible to participants who lack sufficient command of the language in which they are conducted. Although this limitation may be addressable to some extent via task simplification, reducing speech production demands may also reduce gesture production, making differences in gesture production due to interlocutor visibility more difficult to detect.
The current protocol represents the first implementation of the interlocutor visibility manipulation with populations with communication challenges. Although interlocutor visibility has previously been manipulated to examine its effect on gesture production in healthy young children26, who may in some cases struggle to communicate in their L1 (English), such children do not experience the degree and persistence of communication challenges that beginning L2 learners or adolescents with ASD do. Findings from the previously published studies discussed in the representative results section21,33 serve as a proof of concept that the current protocol can reveal similarities and differences in how interlocutor visibility affects gesture production in these populations, demonstrating that it can be implemented with individuals with greater and more persistent communication challenges than healthy L1-English-speaking young children. In the future, it will be fruitful to implement the current protocol with populations with even greater and more persistent communication challenges, such as individuals with aphasia12,38, specific language impairment35,36,37, or stuttering40,41, to examine the impact of manipulating interlocutor visibility on their gesture production.
Overall, the current protocols provide a means to examine the influence of interlocutor visibility on gesture production in the presence of communication challenges during L2 word learning and narrative retelling in individuals with ASD. In doing so, they provide insight into the extent to which the gesture production of individuals with communication challenges is responsive to interlocutor needs, revealing the extent to which these individuals engage in perspective taking during communication. Manipulation of interlocutor visibility provides the flexibility to examine this and other important theoretical questions concerning gesture production with various tasks, populations, and gesture types in virtual as well as face-to-face contexts, revealing the extent to which the results are generalizable and replicable and providing insight into situations in which individual differences in gesture production may be observed. Thus, manipulation of interlocutor visibility has the potential to advance the understanding of gesture production under challenging communicative circumstances in face-to-face and virtual contexts.
The authors have nothing to disclose.
Development and validation of the L2 word learning protocol was supported by a National Defense Science and Engineering Graduate (NDSEG) Fellowship (32 CFR 168a) issued by the US Department of Defense Air Force Office of Scientific Research. Development and validation of the cartoon retelling protocol with adolescents with ASD was supported by a Ruth L. Kirschstein Institutional National Research Service Award (T32) from the National Institute of Mental Health. The author thanks Rachel Fader, Theo Haugen, Andrew Lynn, Ashlie Caputo, and Marco Pilotta for assistance with data collection and coding.
Name | Company | Catalog Number | Comments
Computer | Apple | Z131 | 24" iMac, 8-core CPU & GPU, M3 chip
Conference USB microphone | Tonor | B07GVGMW59 |
ELAN | The Language Archive | | Software application used to transcribe speech and gesture
Video recorder | Vjianger | B07YBCMXJJ | FHD 2.7K 30 FPS 24 MP 16X digital zoom 3" touch screen video recorder with remote control and tripod
Wechsler Abbreviated Scale of Intelligence | Pearson | 158981561 | Used to verify full-scale IQ ≥ 80 in Morett et al. (2016)