Comparing Eye-tracking Data of Children with High-functioning ASD, Comorbid ADHD, and of a Control Watching Social Videos




This is a qualitative comparative case study analysis of eye-tracking data on the first moments of social video scenes as viewed by three participants: one with autism spectrum disorder, one with autism spectrum disorder and comorbid attention deficit hyperactivity disorder, and one neurotypical control.

Cite this Article


Tsang, V., Chu, P. C. Comparing Eye-tracking Data of Children with High-functioning ASD, Comorbid ADHD, and of a Control Watching Social Videos. J. Vis. Exp. (142), e58694, doi:10.3791/58694 (2018).


Children with autism spectrum disorder (ASD) are known to have sensory-perceptual processing deficits that weaken their ability to attend to and perceive social stimuli in daily living contexts. Since daily social episodes consist of subtle dynamic changes in social information, any failure to attend to or process subtle human nonverbal cues, such as facial expressions, postures, and gestures, might lead to inappropriate social interaction. Traditional behavioral rating scales or assessment tools based on static social scenes have limitations in capturing the moment-to-moment changes in social scenarios. An eye-tracking assessment, which can be administered in a video-based mode, is therefore preferred to augment clinical observation. In this study, using a single-case comparison design, the eye-tracking data of three participants, a child with ASD, another with ASD and comorbid attention deficit hyperactivity disorder (ADHD), and a neurotypical control, are captured while they view videos of social scenarios. The eye-tracking experiment helps answer the research question: How does social attention differ between the three participants? With predefined areas of interest (AOIs), the experiment captures four aspects of each participant's viewing for comparison and analysis: visual attention to relevant or irrelevant social stimuli, how fast the participant attends to the first social stimuli appearing in the videos, for how long the participant continues to attend to those stimuli within the AOIs, and the gaze shifts between multiple social stimuli appearing concurrently in the same social scene.


Persons with ASD are known to be characterized by behavioral deficits in social communication, based on conventional behavioral evidence from structured observational assessments and parent interviews. In addition, sensory processing abnormalities have been recently incorporated into the DSM-5 diagnostic criteria of ASD1. Social information processing involves the lower-level sensory-perceptual processing and higher-level social cognitive processing of social information. Sensory-perceptual processing refers to the ability to attend to social stimuli and encode them in a short-term memory bank for instant retrieval and response-planning, while social cognitive processing refers to the interpretation of social information by social reasoning and problem-solving2,3. As such, social information-processing deficits often lead to other psychobehavioral characteristics, such as social anxiety and inattentiveness. This can be illustrated by the high rate of comorbidity between ASD and attention deficit hyperactivity disorder (ADHD). The prevalence of comorbid ADHD in ASD has been estimated at 30% to 80%, whereas the prevalence of comorbid ASD in ADHD has been estimated at 20% to 50%4.

Two major hypotheses have been put forward to account for the deficits in social information processing, namely, enhanced perceptual functioning (EPF) and weak central coherence (WCC). EPF refers to the overattentiveness to or preoccupation with specific parts by individuals with ASD, whereas WCC refers to their weakness in deriving the essence of wholes by pulling together the interelement relationships of the parts5. Both theoretical frameworks attest to the failure of individuals with ASD to globally configure or process the multiple stimuli concurrently presented in a confined social context6,7. In an earlier face emotion recognition study using static facial expression photos8, it was found that the ASD group tended to show localized processing of facial features (such as the shape of the mouth), as described by EPF, but seemed to be weaker in configural processing, which, as postulated by WCC, demands pulling together more abstract perceptual concepts, such as the spatial relationships between multiple facial components (e.g., the distance between the eyebrows and the intensity of the eye gaze)9,10.

Since daily social episodes consist of dynamic moment-to-moment subtle changes in social information, any failure to attend to or engage in the sensory-perceptual processing of subtle human nonverbal cues, such as facial expressions, postures, and gestures, and to make sense of the relationships between the different social stimuli might lead to inappropriate social cognitive processing. Eye-tracking experiments have been increasingly used to supplement clinical observation in social information processing studies. Eye-tracking data, in the form of scanpath patterns, visual fixation counts, and visual duration, have been major biomarkers for investigating social information processing in ASD11,12,13,14,15.

In this study, we illustrate the use of the eye-tracking technique to investigate whether the two participants with ASD and with ASD-ADHD process the first moments of social video scenes differently from the neurotypical child. The eye tracker captures four major indices during viewing: the number of visual fixations, the first fixation duration, the total fixation duration, and the scanpath patterns in the form of the spatial arrangement and sequence of fixation points. In this way, the paradigm captures how fast each participant attends to the audio-visual stimuli predefined by AOIs as they first appear in the social scenes, for how long the participant continues to look at those AOIs, and the gaze shifts between multiple AOIs appearing concurrently in the same social scene. Any delay in fixating on AOIs during the first moments (i.e., 500 ms) and the trajectory of the scanpaths provide important evidence for data analysis. Representative findings from the qualitative analysis of this single-case comparative study using this paradigm are reported.



Parental and participant consent was obtained during the recruitment process in a primary school and a children's service center for ASD in Hong Kong, and the study was approved by the university ethical review committee of the Education University of Hong Kong.

1. Use of a Video-based Assessment

  1. Produce several social videos, about one minute long, that consist of daily life scenarios involving several people in a social context (Figure 1).
    1. For the three children in our case study, each child watched the same three videos. In the first video, a student in a crowded cafeteria spots an unoccupied seat next to a lady who is talking on the phone; unaware of his request to sit there, she places her bag on the seat (Figure 1). In the second video, students are playing a chess game while an unfamiliar student comes too close to watch them play. In the third video, a boy’s painting is ruined when his friend accidentally spills water from a cup on the table.
  2. Conduct expert reviews of all the videos. Select the social scenarios that the experts most agree convey the actors’ intentions, emotions, and thoughts through their expressions and gestures.

2. Recruitment of the Participants

  1. From the pool of participants who satisfied the research inclusion criteria, select and match participants with ASD, with ASD-ADHD, and neurotypical controls using their medical diagnostic reports and the percentile scores of Raven’s Standard Progressive Matrices16.
  2. Convert their Raven percentile scores to five percentile ranks. Select participants who perform at ranks II or III (average) and exclude those who scored at rank I (above average) or at rank IV (below average).
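
The rank conversion in step 2.2 can be sketched in a few lines of Python. This is only an illustration: the protocol does not state the percentile cut-offs, so the thresholds below (95th, 75th, 25th, and 5th percentiles) are an assumption based on the grade boundaries commonly quoted for Raven's Standard Progressive Matrices, and the function names `raven_grade` and `is_eligible` are hypothetical.

```python
def raven_grade(percentile: float) -> str:
    """Map a Raven's SPM percentile score to one of five ranks (I to V).

    ASSUMPTION: the cut-offs (95, 75, 25, 5) are commonly quoted grade
    boundaries; they are not given in the protocol itself.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 95:
        return "I"
    if percentile >= 75:
        return "II"
    if percentile > 25:
        return "III"
    if percentile > 5:
        return "IV"
    return "V"


def is_eligible(percentile: float) -> bool:
    """Apply the selection rule of step 2.2: keep ranks II and III only."""
    return raven_grade(percentile) in {"II", "III"}


if __name__ == "__main__":
    # Hypothetical percentile scores, for illustration only.
    for p in (97, 80, 50, 20, 3):
        print(p, raven_grade(p), is_eligible(p))
```

If the study used different cut-offs, only the four thresholds in `raven_grade` would need to change.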

3. Eye-tracking Experiment

  1. Experimental set-up
    1. On one side of the eye-tracking room, display the videos on a 23-inch color LCD monitor with a screen resolution of 1920 x 1080 pixels, using an eye tracker at a distance of approximately 60 cm from the participant. Have a research investigator operate the eye tracker from the other side of the eye-tracking room (Figure 2).
    2. Have another research investigator sit next to the participant and instruct the participant to look at the screen of the monitor. Place the monitor in front of the child on the other side of the partition and connect it to the eye tracker. The choice of eye-tracking equipment, the testing environment, and the set-up procedures have been discussed previously17.
  2. Calibration process
    1. Instruct the participants to watch the calibration dots that set the viewing boundaries across the screen by capturing the eye movements using infrared corneal reflectance technology. The calibration is properly done if all the green dots or lines fall within the grey circle dots.
    2. Repeat the calibration if some of the green dots or lines do not fall within the grey circle dots.
  3. Viewing of the videos
    1. Instruct the participant to view the social videos one after another, and capture their eye movement data during viewing using the eye tracker.
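
The set-up in step 3.1 fixes the monitor size, the resolution, and the viewing distance, which together determine how many screen pixels correspond to one degree of visual angle. That figure can help when judging how large an AOI or a gaze offset is in visual terms. The following is a minimal sketch of the arithmetic, assuming a flat screen with square pixels viewed head-on from its center; the function name `pixels_per_degree` is hypothetical and is not part of the eye tracker software.

```python
import math


def pixels_per_degree(diagonal_in: float, res_x: int, res_y: int,
                      distance_cm: float) -> float:
    """Approximate number of horizontal pixels spanned by 1 degree of
    visual angle at the screen center.

    ASSUMPTION: square pixels and a viewer centered in front of the screen.
    """
    aspect = res_x / res_y
    # Physical screen width derived from the diagonal and the aspect ratio.
    width_cm = diagonal_in * 2.54 * aspect / math.hypot(aspect, 1.0)
    pixel_cm = width_cm / res_x
    # Width on the screen subtended by 1 degree at the viewing distance.
    one_degree_cm = 2 * distance_cm * math.tan(math.radians(0.5))
    return one_degree_cm / pixel_cm


if __name__ == "__main__":
    # Values from step 3.1.1: 23-inch monitor, 1920 x 1080 pixels, 60 cm.
    ppd = pixels_per_degree(23, 1920, 1080, 60)
    print(f"{ppd:.1f} pixels per degree")
```

With the values from step 3.1.1, this works out to roughly 39 to 40 pixels per degree, so an AOI 100 pixels wide covers about 2.5 degrees of visual angle.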

4. Data Analysis

  1. Define and set up the first-moment fixation within AOIs.
    1. Choose context-relevant targets (face, hands, targeted objects, etc.) in their initial 500 ms of appearance in each scene of the video as AOIs (Figure 3) and label the AOIs in the information box on the left panel.
    2. Upon the completion of the addition and selection of the AOIs in the current frame, move the cursors in the timeline bar at the bottom panel to the next frame.
    3. Adjust the location and boundary of the AOIs in each frame of the video in the presentation video software of the eye tracker manually as the target areas change in each time frame of the video due to the movement of the people or objects as the story of the social video develops.
    4. Click the Select button on the top panel and add new AOIs to the new scene if necessary. If some existing AOIs have already been present for 500 ms in the current scene (the timestamp of the video can be checked in the bottom left panel) or if they are no longer relevant in the new frame of the video, right-click on these AOIs to deactivate them in the new frame.
  2. Run a statistical analysis of the eye-tracking indices. Follow the steps of statistical data processing on the eye tracker as described below.
    1. Choose the recordings of the children.
    2. Select the Media file for analysis.
    3. Select from the available videos.
    4. Click Analyze selected media.
    5. Choose the Descriptive statistics (e.g., Sum).
    6. Choose the dependent measures in Metrics (e.g., First fixation duration, visit count).
    7. Choose Recordings in Rows.
    8. Select AOI Media Summary in Columns.
    9. Click Update to analyze the eye-tracking patterns. The results of the eye-tracking pattern metrics are shown on the screen.
  3. Create the scanpath of a scene from the eye-tracking data.
    1. Choose Visualization and GazePlot in the software.
    2. Select the Media and Recordings in the left panel for visualization.
    3. In the bottom timeline, move the lower cursor to the beginning of the target scene and move the upper cursor to the end of the target scene.
    4. Make sure Accumulate is chosen for the Data field to show the accumulative scanpath.
    5. Click Export and Visualization image to save the scanpath as a separate image file.
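
The indices produced in step 4.2 can be illustrated with a small Python sketch that works on an exported list of fixations. It is not the analysis software's own algorithm: it assumes a static elliptical AOI (whereas the protocol adjusts AOIs frame by frame), counts a fixation when it starts inside the 500 ms window and lands inside the ellipse, and uses hypothetical names (`Fixation`, `EllipseAOI`, `aoi_metrics`) and made-up sample values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    start_ms: float      # fixation onset, relative to video start
    duration_ms: float
    x: float             # screen coordinates in pixels
    y: float


@dataclass
class EllipseAOI:
    name: str
    cx: float            # ellipse center
    cy: float
    rx: float            # ellipse radii
    ry: float
    onset_ms: float      # time the target first appears
    window_ms: float = 500.0  # first-moment window used in the protocol

    def contains(self, x: float, y: float) -> bool:
        return ((x - self.cx) / self.rx) ** 2 + ((y - self.cy) / self.ry) ** 2 <= 1.0


def aoi_metrics(fixations: List[Fixation], aoi: EllipseAOI) -> dict:
    """Summarize fixations that start within the AOI window and fall inside it."""
    end_ms = aoi.onset_ms + aoi.window_ms
    hits = [f for f in sorted(fixations, key=lambda f: f.start_ms)
            if aoi.onset_ms <= f.start_ms < end_ms and aoi.contains(f.x, f.y)]
    first: Optional[Fixation] = hits[0] if hits else None
    return {
        "fixation_count": len(hits),
        "first_fixation_duration_ms": first.duration_ms if first else None,
        "total_fixation_duration_ms": sum(f.duration_ms for f in hits),
        "time_to_first_fixation_ms": first.start_ms - aoi.onset_ms if first else None,
    }


if __name__ == "__main__":
    # Made-up fixations for illustration; not data from the study.
    sample = [Fixation(100, 110, 500, 300),
              Fixation(250, 60, 505, 310),
              Fixation(400, 200, 900, 900)]
    face = EllipseAOI("face", cx=500, cy=300, rx=50, ry=60, onset_ms=0)
    print(aoi_metrics(sample, face))
```

The time to first fixation is included because the protocol treats any delay in fixating on an AOI as evidence; a value of `None` means the participant never fixated on the AOI during the window.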


Representative Results

The eye-tracking data of the three Cantonese-speaking children (with ASD, with ASD-ADHD, and a control), aged between 7 and 9, viewing three social videos using the aforementioned paradigm are presented here (Table 1).

The first fixation duration (per 500 ms target AOI) was longer for the neurotypical child (150 ms) than for the ASD and ASD-ADHD children (both 110 ms). The total fixation duration (per 500 ms target AOI) was shorter for the ASD-ADHD child (120 ms) than for both the neurotypical child (170 ms) and the ASD child (180 ms). The fixation count (per 500 ms target AOI) was the largest for the ASD child (4.62), followed by the neurotypical child (4.09), and the smallest for the ASD-ADHD child (3.19).

A scanpath plot captures the visual scanning of multiple AOIs in a social scene. An example of the scanpaths of the three children for one 10 s episode in the first video is shown in Figure 4 and Videos 1 - 3.

Video 1: Scanpaths of the control.

Video 2: Scanpath of the child with ASD.

Video 3: Scanpath of the child with ASD-ADHD.

Figure 1: An example of essential social scenes in Video 1. In the first scene, the boy is waiting to get his meal from the cafeteria staff. In the second scene, he is looking for a seat near the lady who is talking on the phone. In the third scene, he asks the lady whether he can sit on the empty chair next to her. In the last scene, the lady does not notice his request and puts a bag on the unoccupied chair. The boy is disappointed because he could not find a place to sit.

Figure 2: Eye-tracking experimental set-up. A research investigator gave instructions to the child about viewing the videos in front of the monitor on one side of the eye-tracking experiment room. The display of the videos was controlled by another investigator using another computer on the other side of the same room separated by a partition.

Figure 3: An example of the target AOIs in Video 1. The colored ovals are the AOIs (i.e., face, eyes, mouth, hands, mobile phone, and the bag of the lady) that show the first moments in one of the scenes in Video 1.

Figure 4: Scanpaths of the control (top), the child with ASD (middle), and the child with ASD-ADHD (bottom). Taking a social scene in Video 1 as an example, the blue dots trace the scanpaths for the neurotypical control child, the green dots for the ASD child, and the red dots for the ASD-ADHD child. The dots in the figure indicate the locations of the visual fixations. The bigger the dot, the longer the child attended to that particular spot on the visual stimulus. The numbers in the dots represent the sequence of visual fixations within 500 ms of the video scene.

Participant group | Raven score | Grade | First fixation duration (ms) | Total fixation duration (ms) | Fixation count
Control | 120 | 3 | 150 | 170 | 4.09
ASD | 129 | 1 | 110 | 180 | 4.62
ASD-ADHD | 115 | 3 | 110 | 120 | 3.19

Table 1: Descriptive statistics of the eye-tracker measurements of the three children.



The first fixation duration was shorter for the ASD-ADHD and ASD children than for the neurotypical child. The total fixation duration was shorter for the ASD-ADHD child than for the neurotypical child, suggesting a general reduction in visual attention to social stimuli. Together, these results suggest that the ASD-ADHD child was delayed in attending to social stimuli as they entered a social scene. This delay might cause the child to miss important momentary social information, which may lead to the misinterpretation of social information in subsequent social cognitive processing.

The fixation count was lower for the ASD-ADHD child than for the neurotypical child, while the fixation count within localized AOIs was the highest for the ASD child. This seems to support past ASD findings under the framework of enhanced perceptual functioning (EPF), which suggests that children with ASD employ featural processing; hence, they visually attend to more details of the AOIs than neurotypical controls do.

When the results of the three children are compared, they show that the ASD child performed the fewest scans across multiple AOIs of social stimuli. This might be explained by the difficulty experienced by the ASD child in pulling together the relationship between relevant social stimuli. This can be accounted for by the weak central coherence (WCC) theory, which states that individuals with ASD show deficits in sensory-perceptual processing that demands simultaneously attending to and scanning between multiple AOIs.
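
One way to put a number on "scans across multiple AOIs" is to count how often consecutive fixations land in different AOIs. The study itself compared scanpaths qualitatively, so the sketch below is an illustrative alternative rather than the authors' method; the function name `count_aoi_transitions` and the sample sequence are hypothetical.

```python
from typing import List, Optional


def count_aoi_transitions(aoi_sequence: List[Optional[str]]) -> int:
    """Count gaze shifts between different AOIs in a fixation sequence.

    Each element is the name of the AOI hit by one fixation, in temporal
    order, or None when the fixation fell outside every AOI.
    ASSUMPTION: fixations outside all AOIs are skipped, so a glance at the
    background between two fixations on the same AOI is not a transition.
    """
    visited = [name for name in aoi_sequence if name is not None]
    return sum(1 for prev, cur in zip(visited, visited[1:]) if prev != cur)


if __name__ == "__main__":
    # Made-up sequence for illustration; not data from the study.
    sequence = ["face", "face", None, "phone", "bag", "phone"]
    print(count_aoi_transitions(sequence))
```

A participant who dwells on a single AOI yields a count of zero however many fixations are made, which separates a high fixation count within one AOI from scanning between several.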

For scanpath analysis, several limitations are noted. Even though a single scanpath picture is used, it actually spans different scenes within a temporal period (in this study, predefined as a video length of 10 seconds). Therefore, gaze spots on the scanpath plot may contain spatial errors and do not necessarily represent the actual locations on which the participant was focusing. Investigators need to be cautious of these potential eyeballing errors during data analysis and interpretation.

Since the AOIs have to be marked manually on the eye tracker, there might be a latency of visual fixation from the markers themselves. Since the AOIs were manually plotted against the moving social stimuli, there might be slight errors in the duration of how long each AOI lasts across all AOIs. For example, for a predefined 500 ms, an AOI may have been marked for 498 ms or 510 ms. This may make the comparison of performances across different videos, in contrast to that in the same video, difficult as the performance baselines differ from one video to another. Nonetheless, this artifact will have the same impact on all three participants, and therefore, this may not create a bias for a particular type of participant.
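
The effect of a slightly mis-timed AOI window can be made concrete by expressing the total fixation duration as a proportion of the window actually marked. This normalization is not part of the reported analysis; it is a minimal sketch of one way to keep AOIs of 498 ms and 510 ms comparable, and the function name `proportion_of_window` is hypothetical.

```python
def proportion_of_window(total_fixation_ms: float, marked_window_ms: float) -> float:
    """Total fixation duration as a fraction of the AOI window as marked."""
    if marked_window_ms <= 0:
        raise ValueError("marked_window_ms must be positive")
    return total_fixation_ms / marked_window_ms


if __name__ == "__main__":
    # 170 ms is the control child's total fixation duration in Table 1;
    # 498 ms and 510 ms are the example window lengths mentioned above.
    for window in (498, 500, 510):
        print(window, round(proportion_of_window(170, window), 4))
```

For the same 170 ms of fixation, the proportion shifts by less than one percentage point between a 498 ms and a 510 ms window, which is consistent with the view that the marking error is small relative to the differences between the participants.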



The authors have nothing to disclose.


The authors acknowledge that the wider study from which this paper is generated is financially supported by the General Research Fund under the University Grants Council of Hong Kong Special Administrative Region, China (grant number: GRF 844813), and by the Research Support Scheme 2017/18 of the Department of Special Education and Counselling at the Education University of Hong Kong.


Name | Company | Catalog Number | Comments
Tobii Pro TX300 | Tobii | N/A | Screen-based eye tracker (300 Hz refreshing rate)
Tobii Pro Studio | Tobii | N/A | Software for analyzing eye-tracking data



  1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders - 5th Edition (DSM-5). American Psychiatric Association. Washington, DC. (2013).
  2. Williams, D. L., Goldstein, G., Minshew, N. J. Neuropsychologic functioning in children with autism: further evidence of disordered complex information processing. Child Neuropsychology. 12, 279-298 (2006).
  3. Bons, D., et al. Motor, emotional, and cognitive empathy in children and adolescents with autism spectrum disorder and conduct disorder. Journal of Abnormal Child Psychology. 41, 425-443 (2013).
  4. Van der Meer, J. M., et al. Are autism spectrum disorder and attention-deficit/hyperactivity disorder different manifestations of one overarching disorder? Cognitive and symptom evidence from a clinical and population-based sample. Journal of the American Academy of Child and Adolescent Psychiatry. 51, (11), 1160-1172 (2012).
  5. Brosnan, M. J., Scott, F. J., Fox, S., Pye, J. Gestalt processing in autism: Failure to process perceptual relationships and the implications for contextual understanding. Journal of Child Psychology and Psychiatry. 45, (3), 459-469 (2004).
  6. Behrmann, M., Thomas, C., Humphreys, K. Seeing it differently: Visual processing in autism. Trends in Cognitive Sciences. 10, (6), 258-264 (2006).
  7. Happé, F. G. E., Frith, U. The weak coherence account: Detail-focused cognitive style in autism spectrum disorders. Journal of Autism and Developmental Disorders. 36, 5-25 (2006).
  8. Tsang, V. Eye-tracking study on facial emotion recognition tasks in individuals with high-functioning autism spectrum disorders. Autism. 22, (2), 161-170 (2018).
  9. Renzi, C., et al. Featural and Configural processing of faces are dissociated in the dorsolateral prefrontal cortex: A TMS study. NeuroImage. 74, 45-51 (2013).
  10. Samson, F., Mottron, L., Soulieres, I., Zeffiro, T. A. Enhanced visual functioning in autism: An ALE meta-analysis. Human Brain Mapping. 33, (7), 1553-1581 (2012).
  11. Sasson, N. J., Turner-Brown, L. M., Holtzclaw, T. N., Lam, K. S. L., Bodfish, J. W. Children with autism demonstrate circumscribed attention during passive viewing of complex social and nonsocial picture arrays. Autism Research. 1, 31-42 (2008).
  12. Klin, A., Jones, W., Schultz, R., Volkmar, F., Cohen, D. Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry. 59, 809-816 (2002).
  13. Rutherford, M. D., Towns, A. M. Scan path differences and similarities during emotion perception in those with and without autism spectrum disorders. Journal of Autism and Developmental Disorders. 38, (7), 1371-1381 (2008).
  14. Wallace, S., Coleman, M., Bailey, A. An investigation of basic facial expression recognition in autism spectrum disorders. Cognition and Emotion. 22, (7), 1353-1380 (2008).
  15. Byrge, L., Dubois, J., Tyszka, J. M., Adolphs, R., Kennedy, D. P. Idiosyncratic Brain Activation Patterns Are Associated with Poor Social Comprehension in Autism. The Journal of Neuroscience. 35, 5837-5850 (2015).
  16. Raven, J. Manual for Raven's Progressive Matrices and Vocabulary Scales. Research Supplement No.1: The 1979 British Standardisation of the Standard Progressive Matrices and Mill Hill Vocabulary Scales. Harcourt Assessment. San Antonio, Texas. (1981).
  17. Sasson, N. J., Elison, J. T. Eye tracking young children with autism. Journal of Visualized Experiments. (61), e3675 (2012).


