Young children do not passively observe the world, but rather actively explore and engage with their environment. This protocol provides guiding principles and practical recommendations for using head-mounted eye trackers to record infants’ and toddlers’ dynamic visual environments and visual attention in the context of natural behavior.
Young children’s visual environments are dynamic, changing moment by moment as children physically and visually explore spaces and objects and interact with people around them. Head-mounted eye tracking offers a unique opportunity to capture children’s dynamic egocentric views and how they allocate visual attention within those views. This protocol provides guiding principles and practical recommendations for researchers using head-mounted eye trackers in both laboratory and more naturalistic settings. Head-mounted eye tracking complements other experimental methods by enhancing opportunities for data collection in more ecologically valid contexts, through increased portability and freedom of head and body movements compared to screen-based eye tracking. This protocol can also be integrated with other technologies, such as motion tracking and heart-rate monitoring, to provide a higher-density multimodal dataset for examining natural behavior, learning, and development than was previously possible. This paper illustrates the types of data generated from head-mounted eye tracking in a study designed to investigate visual attention in one natural context for toddlers: free-flowing toy play with a parent. Successful use of this protocol will allow researchers to collect data that can be used to answer questions not only about visual attention, but also about a broad range of other perceptual, cognitive, and social skills and their development.
The last several decades have seen growing interest in studying the development of infant and toddler visual attention. This interest stemmed in large part from the use of looking-time measurements as a primary means to assess other cognitive functions in infancy and has since evolved into the study of infant visual attention in its own right. Contemporary investigations of infant and toddler visual attention primarily measure eye movements during screen-based eye-tracking tasks: infants sit in a chair or on a parent's lap in front of a screen while their eye movements are monitored during the presentation of static images or events. Such tasks, however, fail to capture the dynamic nature of natural visual attention and the means by which children's natural visual environments are generated – active exploration.
Infants and toddlers are active creatures, moving their hands, heads, eyes, and bodies to explore the objects, people, and spaces around them. Each new development in body morphology, motor skill, and behavior – crawling, walking, picking up objects, engaging with social partners – is accompanied by concomitant changes in the early visual environment. Because what infants do determines what they see, and what they see in turn guides what they do in visually guided action, studying the natural development of visual attention is best carried out in the context of natural behavior1.
Head-mounted eye trackers (ETs) have been available and used with adults for decades2,3. Only recently have technological advances made head-mounted eye tracking suitable for infants and toddlers. Participants are outfitted with two lightweight cameras on the head: a scene camera facing outward that captures the first-person perspective of the participant, and an eye camera facing inward that captures the eye image. A calibration procedure provides training data to an algorithm that maps, as accurately as possible, the changing positions of the pupil and corneal reflection (CR) in the eye image to the corresponding pixels in the scene image that were being visually attended. The goal of this method is to capture both the natural visual environments of infants and infants' active visual exploration of those environments as they move freely. Such data can help to answer questions not only about visual attention, but also about a broad range of perceptual, cognitive, and social developments4,5,6,7,8. The use of these techniques has transformed understandings of joint attention7,8,9, sustained attention10, changing visual experiences with age and motor development4,6,11, and the role of visual experiences in word learning12. The present paper provides guiding principles and practical recommendations for carrying out head-mounted eye-tracking experiments with infants and toddlers and illustrates the types of data that can be generated in one natural context for toddlers: free-flowing toy play with a parent.
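To make the calibration mapping concrete, the following is a minimal sketch of one common approach: regressing the pupil-CR difference vector in the eye image onto known gaze targets in the scene image with a second-order polynomial. The function names, feature set, and data layout are illustrative assumptions, not the method of any particular commercial system.

```python
import numpy as np

def poly_features(dx, dy):
    # Second-order polynomial expansion of the pupil-CR difference vector
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])

def fit_calibration(pupil_xy, cr_xy, target_xy):
    """Fit a mapping from eye-image features to scene-image gaze targets.

    pupil_xy, cr_xy : (N, 2) eye-camera coordinates at N calibration moments
    target_xy       : (N, 2) scene-image pixels attended at those moments
    Requires at least six well-spread calibration points for six features.
    """
    d = pupil_xy - cr_xy  # pupil-CR vector is robust to small camera shifts
    X = poly_features(d[:, 0], d[:, 1])
    coeffs, *_ = np.linalg.lstsq(X, target_xy, rcond=None)  # least squares
    return coeffs

def apply_calibration(coeffs, pupil_xy, cr_xy):
    # Map new eye-camera measurements to estimated points of gaze (POG)
    d = pupil_xy - cr_xy
    return poly_features(d[:, 0], d[:, 1]) @ coeffs
```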
This tutorial is based on a procedure for collecting head-mounted eye-tracking data with toddlers approved by the Institutional Review Board at Indiana University. Informed parental consent was obtained prior to toddlers' participation in the experiment.
1. Preparation for the Study
2. Collect the Eye-Tracking Data
3. After the Study, Calibrate the ET Data Using Calibration Software
NOTE: A variety of calibration software packages are commercially available.
4. Code Regions of Interest (ROIs)
NOTE: ROI coding is the evaluation of point of gaze (POG) data to determine which region a child is visually attending to at a particular moment in time. ROIs may be coded with high accuracy and high temporal resolution from the frame-by-frame POG data. The output of this coding is a stream of data points – one point per video frame – that indicates the ROI of the POG over time (see Figure 5A).
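As an illustration, the following sketch collapses such a per-frame ROI stream into discrete looks with onsets, offsets, and durations. The ROI labels and 30 fps frame rate are illustrative assumptions.

```python
from itertools import groupby

FPS = 30  # video frame rate assumed for this example

def extract_looks(roi_stream):
    """Collapse per-frame ROI labels into (roi, onset_s, offset_s, duration_s).

    roi_stream: one label per video frame; None marks frames with no coded
    ROI (the white space in Figure 5A).
    """
    looks, frame = [], 0
    for roi, run in groupby(roi_stream):
        n = sum(1 for _ in run)  # length of this run of identical labels
        if roi is not None:
            onset, offset = frame / FPS, (frame + n) / FPS
            looks.append((roi, onset, offset, offset - onset))
        frame += n
    return looks

# Example: 6 frames on a toy, 2 uncoded frames, 3 frames on the parent's face
extract_looks(["toy"] * 6 + [None] * 2 + ["face"] * 3)
# -> one 0.2 s look at "toy", then one 0.1 s look at "face"
```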
The method discussed here was applied to a free-flowing toy play context between toddlers and their parents. The study was designed to investigate natural visual attention in a cluttered environment. Dyads were instructed to play freely with a set of 24 toys for 6 minutes. Toddlers' visual attention was measured by coding the onset and offset of looks to specific regions of interest (ROIs) – each of the 24 toys and the parent's face – and by analyzing the duration and proportion of looking time to each ROI. The results are visualized in Figure 5.
Figure 5A shows sample ROI streams for two 18-month-old children. Each colored block in the streams represents continuous frames in which the child looked at a particular ROI. The eye-gaze data obtained demonstrate a number of interesting properties of natural visual attention.
First, the children show individual differences in their selectivity for different subsets of toys. Figure 5B shows the proportion of the 6-minute interaction that each child spent looking at each of 10 selected toy ROIs. Though the total proportions of time Child 1 and Child 2 spent looking at toys (including all 24 toy ROIs) were somewhat similar (0.76 and 0.87, respectively), the proportions of time spent on individual toys varied greatly, both within and between subjects.
How these proportions of looking time were achieved also differed across children. Figure 5C shows each child's mean duration of looks to each of 10 selected toy ROIs. The mean duration of looks to all 24 toy ROIs for Child 2 (M = 2.38 s, SD = 2.20 s) was almost twice as long as that of Child 1 (M = 1.20 s, SD = 0.78 s). Comparing the looking patterns to the red ladybug rattle (purple bars) in Figure 5B,C illustrates why computing multiple looking measures, such as proportions and durations of looking, is important for a complete understanding of the data; the same proportion of looking to this toy was achieved for these children through different numbers of looks of different durations.
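For concreteness, the following sketch computes both measures, proportion of session time (as in Figure 5B) and mean look duration (as in Figure 5C), from a list of looks in the format produced by the earlier sketch. The 6-minute (360 s) session length follows the study described above; the function and key names are illustrative.

```python
from collections import defaultdict

def looking_stats(looks, session_s=360.0):
    # Group look durations by ROI (looks as returned by extract_looks above)
    durations = defaultdict(list)
    for roi, _onset, _offset, dur in looks:
        durations[roi].append(dur)
    return {
        roi: {
            "proportion": sum(durs) / session_s,     # measure in Figure 5B
            "mean_duration": sum(durs) / len(durs),  # measure in Figure 5C
        }
        for roi, durs in durations.items()
    }
```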
Another property demonstrated by these data is that both children rarely looked to their parent's face: the proportions of face looking for Child 1 and Child 2 were .015 and .003, respectively. Furthermore, the durations of these children's looks to their parent's face were short, averaging 0.79 s (SD = 0.39 s) for Child 1 and 0.40 s (SD = 0.04 s) for Child 2.
Figure 1. Head-mounted eye tracking employed in three different contexts: (A) tabletop toy play, (B) toy play on the floor, and (C) reading a picture book. Please click here to view a larger version of this figure.
Figure 2. Setting up the head-mounted eye-tracking system. (A) A researcher positioning an eye tracker on an infant. (B) A well-positioned eye tracker on an infant. (C) Good eye image with large centered pupil and clear corneal reflection (CR). (D, E, F) Examples of bad eye images. Please click here to view a larger version of this figure.
Figure 3. Three different ways of obtaining calibration points. Two views of each moment are shown; top: third-person view, bottom: child's first-person view. Arrows in the third-person view illustrate the direction of a laser beam. Inset boxes in the upper right of the child's view show good eye images at each moment used for calibration and pink crosshairs indicate point of gaze based on the completed calibration. (A) Calibration point generated by an experimenter using a finger and laser pointer to direct attention to an object on the floor. (B) Calibration point generated by an experimenter using a laser pointer to direct attention to dots on a surface. (C) Calibration point during toy play with a parent in which the child's attention is directed to a held object. Please click here to view a larger version of this figure.
Figure 4. Example plots used to assess calibration quality. Individual dots represent per-frame x-y point of gaze (POG) coordinates in the scene camera image, as determined by the calibration algorithm. (A) Good calibration quality for a child toy-play experiment, indicated by a roughly circular density of POG that is centered and low (child POG is typically directed slightly downward when looking at toys the child is holding), with the remaining POG roughly evenly distributed across the scene camera image. (B) Poor calibration quality, indicated by an elongated and tilted density of POG that is off-center, and poorly distributed POG in the remaining scene camera image. (C) Poor calibration quality and/or poor initial positioning of the scene camera, indicated by off-centered POG. Please click here to view a larger version of this figure.
Figure 5. Two children's eye-gaze data and statistics. (A) Sample ROI streams for Child 1 and Child 2 during 60 s of the interaction. Each colored block in the streams represents continuous frames in which the child looked at an ROI for either a specific toy or the parent's face. White space represents frames in which the child did not look at any of the ROIs. (B) Proportion of time looking at the parent's face and 10 toy ROIs, for both children. Proportion was computed by summing the durations of all looks to each ROI and dividing the summed durations by the total session time of 6 minutes. (C) Mean duration of looks to the parent's face and 10 toy ROIs, for both children. Mean duration was computed by averaging the durations of individual looks to each ROI during the 6-minute interaction. Please click here to view a larger version of this figure.
This protocol provides guiding principles and practical recommendations for implementing head-mounted eye tracking with infants and young children. It was based on the study of natural toddler behaviors in the context of parent-toddler free play with toys in a laboratory setting, using in-house eye-tracking equipment and software for calibration and data coding. Nevertheless, this protocol is intended to be generally applicable to researchers using a variety of head-mounted eye-tracking systems to study a variety of topics in infant and child development. Though optimal use of this protocol will involve study-specific tailoring, the adoption of these general practices has led to successful use of the protocol in a variety of contexts (see Figure 1), including the simultaneous head-mounted eye tracking of parents and toddlers7,8,9,10, and head-mounted eye tracking of clinical populations including children with cochlear implants15 and children diagnosed with autism spectrum disorders16,17.
Head-mounted eye tracking provides numerous advantages for investigating the development of a variety of natural competencies and behaviors. The freedom of head and body movement that head-mounted ETs allow gives researchers the opportunity to capture both participants' self-generated visual environments and their active exploration of those environments. The portability of head-mounted ETs enhances researchers' ability to collect data in more ecologically valid contexts. Due to these advantages, this method provides an alternative to screen-based looking-time and eye-tracking methods for studying development across domains such as visual attention, social attention, and perceptual-motor integration, and it complements, and occasionally challenges, the inferences researchers can draw using more traditional experimental methods. For instance, the protocol described here increases the opportunity for participants to exhibit individual differences in looking behavior, because participants have control not only over where and for how long they focus their visual attention in a scene, as in screen-based eye tracking, but also over the composition of those scenes, through their eye, head, and body movements and their physical manipulation of elements in the environment. The two participants' data presented here demonstrate individual differences in how long toddlers look and which objects they sample when they are able to actively create and explore their visual environment. Additionally, these data, as well as other research employing this protocol, suggest that in naturalistic toy play with their parents, toddlers look to their parent's face much less often than previous research would suggest4,5,7,8,9,10.
Despite these benefits, head-mounted eye tracking with infants and toddlers poses a number of methodological challenges. The most critical challenge is obtaining a good calibration. Because the scene image is only a 2D representation of the 3D world that was actually viewed, a perfect mapping between eye position and gazed scene location is impossible. By following the guidelines provided in this protocol, the mapping can become reliably close to the "ground truth"; however, special attention should be paid to several issues. First, the freedom of head and body movement allowed by head-mounted eye tracking also means that young participants will often bump the eye-tracking system. This is a problem because any change in the physical position of the eye relative to the eye or scene cameras will change the mapping between the pupil/CR and the corresponding pixels attended in the scene image. A bump thus divides the session into portions with different eye-to-camera geometries, and conducting separate calibrations for each portion is critical: an algorithm calibrated with points from only one portion will track the child's gaze accurately only for that portion. Second, accurate detection of the child's pupil and CR is critical. If a calibration point in the scene image is plotted while the pupil is incorrectly detected, the algorithm learns to associate that calibration x-y coordinate in the scene image with an incorrect pupil x-y coordinate; if the pupil is not detected at all, the algorithm is fed blank data. Thus, if good detection is not achieved for a segment of the study, calibration quality for those frames will be poor and should not be trusted for coding POG. Third, because children's heads and eyes are typically aligned, visual attention is most often directed toward the center of the scene image. Nevertheless, extreme x-y calibration points in the scene image are also necessary for establishing an accurate gaze track across the entire scene image. Thus, although calibration points should typically be chosen at moments when the eye is stable on an object, this may not be possible for calibration points in the far corners of the scene image. Finally, keep in mind that even when a good eye image is obtained and the system calibrates, this does not ensure that the data is of sufficient quality for the intended analyses. Individual factors such as eye physiology, environmental factors such as lighting, and differences in eye-tracking hardware and software can all influence data quality and have the potential to create offsets or inaccuracies in the data. References 18 and 19 provide more information and possible solutions for such issues (see also Franchak20).
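As one way to operationalize these checks, the sketch below flags frames lacking a usable pupil/CR detection and plots the per-frame POG distribution over the scene image, in the spirit of the plots in Figure 4. The array layout, the NaN convention for missing detections, and the scene-image size are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_pog_density(pog_xy, scene_w=640, scene_h=480):
    """Scatter per-frame POG over the scene-image extent (cf. Figure 4)."""
    valid = ~np.isnan(pog_xy).any(axis=1)  # NaN rows = no usable detection
    print(f"{(~valid).mean():.1%} of frames lack a usable pupil/CR detection")
    plt.scatter(pog_xy[valid, 0], pog_xy[valid, 1], s=2, alpha=0.2)
    plt.xlim(0, scene_w)
    plt.ylim(scene_h, 0)  # image coordinates: y increases downward
    plt.title("Per-frame POG in the scene image")
    plt.show()
```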
Working with infants and toddlers also involves the challenge of ensuring tolerance of the head-mounted ET throughout the session. Employing the recommendations included in this protocol, designed for use with infants from approximately 9-24 months of age, a laboratory can obtain high-quality head-mounted eye-tracking data from approximately 70% of participants20. The remaining 30% of participants may either not begin the study due to intolerance of the eye tracker or fuss out of the study before sufficient data with a good eye track (e.g., >3-5 minutes of play) can be obtained. For the successful 70% of infant and toddler participants, sessions typically last upwards of 10 minutes; however, much longer sessions may be infeasible with current technologies, depending on the age of the participant and the nature of the task in which the participant is engaged. When designing the research task and environment, researchers should keep in mind the developmental status of the participants, as motor ability, cognitive ability, and social development, including a sense of security around strangers, can all influence participants' attention span and ability to perform the intended task. Employing this protocol with infants much younger than 9 months will also involve additional practical challenges, such as propping up infants who cannot yet sit on their own, as well as consideration of eye morphology and physiology, such as binocular disparity, which differ from those of older children and adults19,21. Moreover, this protocol is most successful when carried out by experienced, trained experimenters, which can constrain the range of environments in which data may be collected. The more practice experimenters have, the more smoothly they will be able to conduct the experiment and the more likely they are to collect high-quality eye-tracking data.
Head-mounted eye tracking also poses the additional challenge of relatively more time-consuming data coding. This is because, for the purpose of finding ROIs, head-mounted eye-tracking data is better coded frame by frame than by "fixations" of visual attention. Fixations are typically identified when the rate of change in the frame-by-frame x-y POG coordinates is low, taken as an indication that the eyes are stable on a point. However, because the scene view from a head-mounted eye tracker moves with the participant's head and body movements, the eye's position can only be accurately mapped to the physical location being foveated by considering how the eyes are moving relative to the head and body. For instance, if a participant moves their head and eyes together, rather than their eyes only, the x-y POG coordinates within the scene can remain unchanged even while the participant scans a room or tracks a moving object. Thus, "fixations" of visual attention cannot be easily and accurately determined from the POG data alone; for further information on issues associated with identifying fixations in head-mounted eye-tracking data, please consult other work15,22. Manually coding data frame by frame for ROIs can require extra time compared to coding fixations: as a reference, it took highly trained coders between 5 and 10 minutes to code each minute of the data presented here, which was collected at 30 frames per second. The time required for coding is highly variable and depends on the quality of the eye-tracking data; the size, number, and visual discriminability of ROI targets; the experience of the coder; and the annotation tool used.
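The following sketch makes this limitation concrete: a naive velocity threshold on scene-image POG coordinates flags low-velocity frames, but such frames occur both when the eyes are truly stable and when head and eyes move together, so the flag cannot be read as a fixation on a location in the world. The threshold and data layout are illustrative assumptions.

```python
import numpy as np

def low_velocity_frames(pog_xy, fps=30, thresh_px_per_s=50.0):
    """Flag frames whose frame-to-frame POG speed is below a threshold."""
    speeds = np.linalg.norm(np.diff(pog_xy, axis=0), axis=1) * fps
    # Low speed occurs when the eyes are stable on a point, but ALSO when
    # head and eyes move together (e.g., while tracking a moving toy), so
    # this flag alone does not identify fixations in head-mounted data.
    return speeds < thresh_px_per_s
```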
Despite these challenges, this protocol can be flexibly adapted to a range of controlled and naturalistic environments. It can also be integrated with other technologies, such as motion tracking and heart-rate monitoring, to provide a higher-density multimodal dataset for examining natural behavior, learning, and development than was previously possible. Continued advances in head-mounted eye-tracking technology will undoubtedly alleviate many of the current challenges and further expand the types of research questions that can be addressed using this method.
The authors have nothing to disclose.
This research was funded by the National Institutes of Health grants R01HD074601 (C.Y.), T32HD007475-22 (J.I.B., D.H.A.), and F32HD093280 (L.K.S.); National Science Foundation grant BCS1523982 (L.B.S., C.Y.); and by Indiana University through the Emerging Area Research Initiative – Learning: Brains, Machines, and Children (L.B.S.). The authors thank the child and parent volunteers who participated in this research and who agreed to appear in the figures and filming of this protocol. We also appreciate the members of the Computational Cognition and Learning Laboratory, especially Sven Bambach, Anting Chen, Steven Elmlinger, Seth Foster, Grace Lisandrelli, and Charlene Tay, for their assistance in developing and honing this protocol.