This is a qualitative comparative case study of eye-tracking data on the first moments of social video scenes as viewed by three participants: one with autism spectrum disorder, one with autism spectrum disorder and comorbid attention deficit-hyperactive disorder, and one neurotypical control.
Children with autism spectrum disorders (ASD) are known to have sensory-perceptual processing deficits that weaken their ability to attend to and perceive social stimuli in daily living contexts. Since daily social episodes consist of subtle dynamic changes in social information, any failure to attend to or process subtle human nonverbal cues, such as facial expressions, postures, and gestures, might lead to inappropriate social interaction. Traditional behavioral rating scales and assessment tools based on static social scenes are limited in their capacity to capture the moment-to-moment changes in social scenarios. An eye-tracking assessment, which can be administered in a video-based mode, is therefore preferred as a way to augment clinical observation. In this study, using a single-case comparison design, the eye-tracking data of three participants (a child with ASD, a child with ASD and comorbid attention deficit-hyperactive disorder [ADHD], and a neurotypical control) are captured while they view videos of social scenarios. The eye-tracking experiment helps answer the research question: How does social attention differ between the three participants? By predefining areas of interest (AOIs), the experiment captures four aspects of each participant's viewing for comparison and analysis: visual attention to relevant or irrelevant social stimuli, how quickly the participant attends to the first social stimuli appearing in the videos, how long the participant continues to attend to those stimuli within the AOIs, and the gaze shifts between multiple social stimuli appearing concurrently in the same social scene.
Persons with ASD are known to be characterized by behavioral deficits in social communication, based on conventional behavioral evidence from structured observational assessments and parent interviews. In addition, sensory processing abnormalities have been recently incorporated into the DSM-5 diagnostic criteria of ASD1. Social information processing involves the lower level sensory-perceptual processing and higher level social cognitive processing of social information. Sensory-perceptual processing refers to the ability to attend to social stimuli and encode them in a short-term memory bank for instant retrieval and response-planning, while social cognitive processing refers to the interpretation of social information by social reasoning and problem-solving2,3. As such, social information-processing deficits often lead to other psychobehavioral characteristics, such as social anxiety and inattentiveness. This can be illustrated by the high comorbid prevalence rate of ASD with attention deficit-hyperactive disorder (ADHD). The range of comorbidity for ADHD in ASD has been estimated at 30% to 80%, whereas the presence of comorbid ASD in ADHD has been estimated at 20% to 50%4.
Two major hypotheses have been put forward to account for the deficits in social information processing, namely enhanced perceptual functioning (EPF) and weak central coherence (WCC). EPF refers to the overattentiveness to or preoccupation with specific parts by individuals with ASD, whereas WCC refers to their weakness in deriving the essence of wholes by pulling together the interelement relationships of the parts5. Both theoretical frameworks attest to the failure of individuals with ASD to globally configure or process the multiple stimuli concurrently presented in a confined social context6,7. In an earlier face emotion recognition study using static photos of facial expressions8, it was found that the ASD group tended to show localized processing of facial features (such as the shape of the mouth), as described by EPF, but seemed to be weaker in configural processing, which demands pulling together more abstract perceptual concepts, as postulated by WCC, such as the spatial relationships between multiple facial components (e.g., the distance between the eyebrows and the intensity of the eye gaze)9,10.
Since daily social episodes consist of dynamic, subtle, moment-to-moment changes in social information, any failure to attend to or engage in the sensory-perceptual processing of subtle human nonverbal cues, such as facial expressions, postures, and gestures, and to make sense of the relationships between the different social stimuli might lead to inappropriate social cognitive processing. Eye-tracking experiments have been increasingly used to supplement clinical observation in social information processing studies. Eye-tracking data, in the form of scanpath patterns, visual fixation counts, and visual fixation durations, have been major biomarkers for investigating social information processing in ASD11,12,13,14,15.
In this study, we illustrate the use of the eye-tracking technique to investigate whether the two participants with ASD and with ASD-ADHD process the first moments of social video scenes differently from the neurotypical child. The eye tracker captures four major indices during viewing: the number of visual fixations, the first fixation duration, the total fixation duration, and the scanpath patterns in the form of the spatial arrangement and sequence of fixation points. In this way, it is possible to capture how quickly each participant attends to the audio-visual stimuli predefined by AOIs as they first appear in the social scenes, how long the participant continues to look at those AOIs, and the gaze shifts between multiple AOIs appearing concurrently in the same social scene. Any delay in fixating on AOIs during the first moments (i.e., 500 ms) and the trajectory of the scanpaths provide important evidence for data analysis. Representative findings from the qualitative analysis of this single-case comparative study using this paradigm are reported.
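The fixation indices described above can be illustrated with a minimal sketch. It assumes fixations are exported as simple records with an onset time, a duration, and screen coordinates, and that each AOI is an ellipse with an onset time and a 500 ms window. The names `Fixation`, `EllipseAOI`, and `aoi_metrics`, the `latency_ms` field, and the sample values are all hypothetical; this is not the computation performed by the eye-tracking software used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Fixation:
    start_ms: float      # fixation onset, relative to the start of the video
    duration_ms: float   # how long the fixation lasted
    x: float             # horizontal gaze position (pixels)
    y: float             # vertical gaze position (pixels)


@dataclass
class EllipseAOI:
    cx: float                 # ellipse center, x
    cy: float                 # ellipse center, y
    rx: float                 # horizontal radius
    ry: float                 # vertical radius
    onset_ms: float           # when the stimulus first appears in the scene
    window_ms: float = 500.0  # length of the "first moments" window (assumed)

    def contains(self, x: float, y: float) -> bool:
        # Standard point-in-ellipse test.
        return ((x - self.cx) / self.rx) ** 2 + ((y - self.cy) / self.ry) ** 2 <= 1.0


def aoi_metrics(fixations: List[Fixation], aoi: EllipseAOI) -> Dict[str, Optional[float]]:
    """Summarize fixations that start inside the AOI during its time window."""
    window_end = aoi.onset_ms + aoi.window_ms
    hits = [
        f for f in sorted(fixations, key=lambda f: f.start_ms)
        if aoi.onset_ms <= f.start_ms < window_end and aoi.contains(f.x, f.y)
    ]
    if not hits:
        return {
            "fixation_count": 0,
            "first_fixation_duration_ms": None,
            "total_fixation_duration_ms": 0.0,
            "latency_ms": None,
        }
    return {
        "fixation_count": len(hits),
        "first_fixation_duration_ms": hits[0].duration_ms,
        "total_fixation_duration_ms": sum(f.duration_ms for f in hits),
        "latency_ms": hits[0].start_ms - aoi.onset_ms,  # delay before the first hit
    }


# Made-up example: an AOI appearing 1000 ms into the video.
face_aoi = EllipseAOI(cx=100, cy=100, rx=50, ry=30, onset_ms=1000)
sample_fixations = [
    Fixation(start_ms=1050, duration_ms=110, x=110, y=105),  # inside AOI, in window
    Fixation(start_ms=1200, duration_ms=60, x=300, y=300),   # outside AOI
    Fixation(start_ms=1300, duration_ms=70, x=90, y=95),     # inside AOI, in window
    Fixation(start_ms=1600, duration_ms=200, x=100, y=100),  # inside AOI, after window
]
metrics = aoi_metrics(sample_fixations, face_aoi)
```

In this sketch a fixation is attributed to the window by its onset time only; clipping durations at the window boundary would be an equally reasonable choice.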
Parental and participant consent was obtained during the recruitment process in a primary school and a children service center for ASD in Hong Kong, and the study was approved by the ethical review committee of the Education University of Hong Kong.
1. Use of a Video-based Assessment
2. Recruitment of the Participants
3. Eye-tracking Experiment
4. Data Analysis
The eye-tracking data of the three Cantonese-speaking children (with ASD, with ASD-ADHD, and a control), aged between 7 and 9, viewing three social videos using the aforementioned paradigm are presented here (Table 1).
The first fixation duration (per 500 ms target AOI) was longer for the neurotypical child (150 ms) than for the ASD and ASD-ADHD children (both 110 ms). The total fixation duration (per 500 ms target AOI) was shorter for the ASD-ADHD child (120 ms) than for both the neurotypical child (170 ms) and the ASD child (180 ms). The fixation count (per 500 ms target AOI) was the largest for the ASD child (4.62), followed by the neurotypical child (4.09), and the smallest for the ASD-ADHD child (3.19).
A scanpath plot captures the visual scanning of multiple AOIs in a social scene. An example of the scanpaths of the three children for one 10 s episode in the first video is shown in Figure 4 and Videos 1 – 3.
Video 1: Scanpath of the control.
Video 2: Scanpath of the child with ASD.
Video 3: Scanpath of the child with ASD-ADHD.
Figure 1: An example of essential social scenes in Video 1. In the first scene, the boy is waiting to get his meal from the cafeteria staff. In the second scene, he is looking for a seat near the lady who is talking on the phone. In the third scene, he asks the lady whether he can sit on the empty chair next to her. In the last scene, the lady does not notice his request and puts a bag on the unoccupied chair. The boy is disappointed because he could not find a place to sit.
Figure 2: Eye-tracking experimental set-up. A research investigator gave instructions to the child about viewing the videos in front of the monitor on one side of the eye-tracking experiment room. The display of the videos was controlled by another investigator using another computer on the other side of the same room separated by a partition.
Figure 3: An example of the target AOIs in Video 1. The colored ovals are the AOIs (i.e., face, eyes, mouth, hands, mobile phone, and the bag of the lady) that show the first moments in one of the scenes in Video 1.
Figure 4: Scanpaths of the control (top), the child with ASD (middle), and the child with ASD-ADHD (bottom). Taking a social scene in Video 1 as an example, the blue dots trace the scanpath for the neurotypical control child, the green dots for the ASD child, and the red dots for the ASD-ADHD child. The dots in the figure indicate the locations of the visual fixations. The bigger the dot, the longer the child attended to that particular spot on the visual stimulus. The numbers in the dots represent the sequence of visual fixations within 500 ms of the video scene.
Participant groups | Raven Score | Grade | First fixation duration (ms) | Total fixation duration (ms) | Fixation counts |
Control | 120 | 3 | 150 | 170 | 4.09 |
ASD | 129 | 1 | 110 | 180 | 4.62 |
ASD-ADHD | 115 | 3 | 110 | 120 | 3.19 |
Table 1: Descriptive statistics of the eye-tracker measurements of the three children.
The first fixation duration was shorter for the ASD-ADHD and ASD children than for the neurotypical child. The total fixation duration was shorter for the ASD-ADHD child than for the neurotypical child, indicating a general reduction in visual attention to social stimuli. Together, these results suggest that the ASD-ADHD child was delayed in attending to social stimuli as they entered a social scene. Such a delay might cause the child to miss important momentary social information, which may lead to misinterpretation of that information and affect subsequent social cognitive processing.
The fixation count was lower for the ASD-ADHD child than for the neurotypical child, while the fixation count within localized AOIs was the highest for the ASD child. This seems to support past ASD findings under the framework of enhanced perceptual functioning (EPF), which suggests that children with ASD employ featural processing; hence, they visually attend to more details of the AOIs than neurotypical controls do.
When the results of the three children are compared, the ASD child performed the fewest scans across multiple AOIs of social stimuli. This might be explained by the difficulty the ASD child experienced in pulling together the relationships between relevant social stimuli. It is consistent with the weak central coherence (WCC) theory, which holds that individuals with ASD show deficits in sensory-perceptual processing that demands simultaneously attending to and scanning between multiple AOIs.
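One way to quantify scans across multiple AOIs is to count gaze shifts between different AOIs in the ordered sequence of fixations. The sketch below assumes each fixation has already been labeled with the AOI it fell in (or `None` when it fell outside every AOI). The function name `count_aoi_transitions` and the sample sequence are hypothetical, and the study does not state that scans were counted this way.

```python
from collections import Counter
from typing import Iterable, Optional


def count_aoi_transitions(labels: Iterable[Optional[str]]) -> Counter:
    """Count gaze shifts between different AOIs in a fixation sequence.

    Fixations outside every AOI (None) are dropped first, so a detour
    through the background does not break an AOI-to-AOI shift (an assumption).
    """
    on_aoi = [label for label in labels if label is not None]
    transitions: Counter = Counter()
    for previous, current in zip(on_aoi, on_aoi[1:]):
        if previous != current:  # repeated fixations on one AOI are not shifts
            transitions[(previous, current)] += 1
    return transitions


# Made-up fixation sequence for one scene.
sequence = ["face", "face", None, "phone", "face", "bag"]
shifts = count_aoi_transitions(sequence)
total_shifts = sum(shifts.values())
```

A participant who dwells on a single AOI would produce a low `total_shifts`, whereas one who scans between the face, the phone, and the bag would produce a higher value.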
For scanpath analysis, several limitations are noted. Although a single scanpath picture is used, it covers different scenes within a temporal period (in this study, predefined as a video length of 10 s). Therefore, gaze spots on the scanpath plot may contain spatial errors and do not necessarily represent the actual locations the participant was focusing on. Investigators need to be cautious of these potential eyeballing errors during data analysis and interpretation.
Since the AOIs have to be marked manually on the eye tracker, there might be a latency of visual fixation from the markers themselves. Since the AOIs were manually plotted against the moving social stimuli, there might be slight errors in the duration of how long each AOI lasts across all AOIs. For example, for a predefined 500 ms, an AOI may have been marked for 498 ms or 510 ms. This may make the comparison of performances across different videos, in contrast to that in the same video, difficult as the performance baselines differ from one video to another. Nonetheless, this artifact will have the same impact on all three participants, and therefore, this may not create a bias for a particular type of participant.
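The effect of small differences in marked AOI duration can be illustrated with a simple linear rescaling to the nominal 500 ms window. This is an assumption made for illustration only; the study does not describe how the "per 500 ms" values were derived, and the function name `normalize_to_window` is hypothetical.

```python
def normalize_to_window(value: float, marked_ms: float, nominal_ms: float = 500.0) -> float:
    """Linearly rescale a measurement to the nominal AOI window length."""
    if marked_ms <= 0:
        raise ValueError("marked_ms must be positive")
    return value * nominal_ms / marked_ms


# The same raw total fixation duration under two slightly different AOI markings.
scaled_short = normalize_to_window(170.0, marked_ms=498.0)  # AOI marked slightly short
scaled_long = normalize_to_window(170.0, marked_ms=510.0)   # AOI marked slightly long
```

For deviations of this size, the rescaled values differ from the raw value by only a few milliseconds, and the same marking is applied to every participant.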
The authors have nothing to disclose.
The authors acknowledge that the wider study from which this paper is generated is financially supported by the General Research Fund under the University Grants Council of Hong Kong Special Administrative Region, China (grant number: GRF 844813), and by the Research Support Scheme 2017/18 of the Department of Special Education and Counselling at the Education University of Hong Kong.
Tobii Pro TX300 | Tobii | N/A | Screen-based eye tracker (300 Hz refreshing rate) |
Tobii Pro Studio | Tobii | N/A | Software for analyzing eye-tracking data |