Here, we present a protocol for a familiarization-test paradigm that provides a direct test of infant categorization and helps to define the role of language in early category learning.
Assessing infant category learning is a challenging but vital aspect of studying infant cognition. By employing a familiarization-test paradigm, we straightforwardly measure infants’ success in learning a novel category while relying only on their looking behavior. Moreover, the paradigm can directly measure the impact of different auditory signals on infant categorization across a range of ages. For instance, we assessed how 2-year-olds learn categories in a variety of labeling environments: in our task, 2-year-olds successfully learned categories when all exemplars were labeled or when only the first two exemplars were labeled, but they failed to categorize when no exemplars were labeled or when only the final two exemplars were labeled. To determine infants’ success in such tasks, researchers can examine both the overall preference displayed by infants in each condition and infants’ pattern of looking over the course of the test phase, using an eye-tracker to provide fine-grained time-course data. Thus, we present a powerful paradigm for identifying the role of language, or any auditory signal, in infants’ object category learning.
Categorization is a fundamental building block of human cognition: infants’ categorization abilities emerge early in infancy and become increasingly sophisticated with age.1,2,3 Research has also revealed a powerful role for language in infant categorization: from 3 months of age, infants learn categories more successfully when category exemplars are paired with language.4,5,6 Moreover, by the end of the first year, infants are attuned to the role of count noun labels in categorization. Pairing category exemplars with a consistent labeling phrase (“This is a vep!”) facilitates infants’ category learning relative to providing either a distinct label for each exemplar (“This is a vep,” “This is a dax,” etc.) or a non-labeling phrase (“Look at this.”).7,8,9
In infants’ everyday experiences, however, the vast majority of objects they encounter will likely remain unlabeled. No caregiver could label every object an infant sees, much less provide all of the labels that apply to each object (e.g., “malamute,” “dog,” “pet,” “animal”). This presents a paradox: how can we reconcile the power of labels in infant categorization with their relative scarcity in infants’ daily lives?
To answer this question, we developed a protocol to assess how infants learn categories in a variety of different learning environments, including when they receive a mixture of labeled and unlabeled exemplars. Specifically, we propose that receiving even a few labeled exemplars at the beginning of learning can facilitate categorization by enhancing infants’ ability to learn from subsequent, unlabeled exemplars as well. This strategy of using a small number of labeled exemplars as a foundation for learning from a larger number of unlabeled exemplars has been widely implemented in the field of machine learning, spawning a family of semi-supervised learning (SSL) algorithms10,11,12. Of course, the learning strategies implemented are not identical across different kinds of learners: in machine learning, algorithms typically are exposed to many more exemplars, make explicit guesses about each exemplar, and learn multiple categories simultaneously. Nevertheless, both machine and infant learners may benefit from successfully integrating labeled and unlabeled exemplars to learn new categories in sparse labeling environments.
Our design focuses on whether 2-year-old children, in the process of acquiring words for numerous new categories, are capable of this kind of semi-supervised learning. We employ a standard infant categorization measure: a familiarization-test task. In this paradigm, 2-year-olds were exposed to a series of exemplars from a novel category during a familiarization phase. Each exemplar was paired with a different auditory stimulus, depending on the condition (i.e., either a labeling or a non-labeling phrase). Then, at the test, all 2-year-olds saw two new objects presented in silence: one object from the now-familiar category and one from a novel category.
If the 2-year-olds successfully form the category during the familiarization phase, then they should distinguish between the two exemplars presented at test. Importantly, because a systematic preference for either the novel or familiar test image reflects an ability to distinguish between them, both familiarity and novelty preferences are interpreted as evidence of successful categorization. Note that on a given task, the nature of this preference is a function of infants’ processing efficiency for the stimulus materials, with familiarity preferences associated with less efficient stimulus processing4,13,14,15,16,17. Presenting the test phase in silence makes it possible to directly assess infants’ success in object categorization and how this success varies according to the information that accompanied the exemplars during familiarization. Thus, this paradigm provides a compelling test of how different types of linguistic environments affect category learning. If labeling enhances category learning in both semi-supervised and fully supervised environments, then 2-year-olds in these conditions should show stronger test preferences than 2-year-olds in the other environments.
All methods described here have been approved by the Northwestern University Institutional Review Board.
1. Stimuli Creation
NOTE: The visual stimuli (see Figure 1) used in the representative design reported below were originally developed in Havy and Waxman (2016)18 and are available for download at https://osf.io/n6uy8/.
2. Apparatus
3. Task Design
[Place Figure 1 here]
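The four labeling environments can be written down as a simple trial schedule. The sketch below is illustrative only: it assumes six familiarization trials (per the Figure 1 legend) and two labeled exemplars in the mixed conditions (per the abstract). The function names are hypothetical, and the carrier phrases are borrowed from the examples in the introduction rather than from the actual audio stimuli.

```python
from typing import Dict, List

N_FAMILIARIZATION_TRIALS = 6  # per the Figure 1 legend
N_LABELED_IN_MIXED = 2        # "first two" / "final two" labeled exemplars

# Assumption: example phrases taken from the introduction; the recorded
# stimuli used in the study may differ.
LABELING_PHRASE = "This is a vep!"
NON_LABELING_PHRASE = "Look at this."


def familiarization_schedule(condition: str) -> List[bool]:
    """Return one flag per familiarization trial: True if that exemplar is labeled."""
    n, k = N_FAMILIARIZATION_TRIALS, N_LABELED_IN_MIXED
    schedules: Dict[str, List[bool]] = {
        "fully_supervised": [True] * n,
        "unsupervised": [False] * n,
        "semi_supervised": [True] * k + [False] * (n - k),
        "reversed_semi_supervised": [False] * (n - k) + [True] * k,
    }
    return schedules[condition]


def phrases_for(condition: str) -> List[str]:
    """Map each trial's labeled/unlabeled flag to the phrase played on that trial."""
    return [LABELING_PHRASE if labeled else NON_LABELING_PHRASE
            for labeled in familiarization_schedule(condition)]


if __name__ == "__main__":
    for name in ("fully_supervised", "unsupervised",
                 "semi_supervised", "reversed_semi_supervised"):
        print(name, familiarization_schedule(name))
```

In every condition the test phase is identical and silent, so any difference at test can only reflect what accompanied the exemplars during familiarization.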
4. Study Procedure
5. Data Analysis
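The preference scores reported in the results are proportions with chance at .5. A minimal sketch of one way to compute such a score from eye-tracker samples follows. It assumes the score is the proportion of on-object looking directed to the novel test object, and that each gaze sample has already been coded as "novel", "familiar", or missing; the function name and the coding scheme are hypothetical and are not taken from the published analysis code.

```python
from typing import Optional, Sequence


def novelty_preference(samples: Sequence[Optional[str]]) -> Optional[float]:
    """Proportion of on-object gaze samples that fall on the novel object.

    Each sample is "novel", "familiar", or None (off-screen or track loss).
    Returns None if the infant never looked at either object.
    """
    novel = sum(1 for s in samples if s == "novel")
    familiar = sum(1 for s in samples if s == "familiar")
    total = novel + familiar
    if total == 0:
        return None  # no usable looking; the trial cannot be scored
    return novel / total


if __name__ == "__main__":
    trial = ["novel", "novel", "familiar", None, "novel", "familiar"]
    print(novelty_preference(trial))
```

Samples with no usable gaze are excluded from the denominator, so a score of .5 means equal looking to the two objects regardless of how much data was lost.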
Using the protocol above, we ran two experiments22. Analyses were conducted with the eyetrackingR package23, and the data and code are available at https://github.com/sandylat/ssl-in-infancy. In the first experiment, we contrasted a fully supervised condition (n = 24, Mage = 26.8 mo), featuring only labeled exemplars, with an unsupervised condition (n = 24, Mage = 26.9 mo), featuring only unlabeled exemplars.
Fully Supervised vs. Unsupervised Environments
Infants in the Fully Supervised (M = 13.86 s, SD = 3.00) and Unsupervised (M = 14.94 s, SD = 1.91) conditions showed no difference in their attention to the exemplars during familiarization, t(46) = 1.48, p = .14, d = .43.
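As a check on the familiarization comparison, the test statistic can be approximately recovered from the reported summary statistics alone. The sketch below assumes a Student's independent-samples t-test with pooled variance (consistent with the reported 46 degrees of freedom) and Cohen's d based on the pooled standard deviation; the function name is hypothetical. Because the inputs are rounded means and standard deviations, the result should match the reported values only to within rounding.

```python
import math
from typing import Tuple


def pooled_t_from_summary(m1: float, sd1: float, n1: int,
                          m2: float, sd2: float, n2: int) -> Tuple[float, int, float]:
    """Student's independent-samples t, its df, and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    se = pooled_sd * math.sqrt(1 / n1 + 1 / n2)  # standard error of the mean difference
    t = (m1 - m2) / se
    d = (m1 - m2) / pooled_sd
    return t, df, d


if __name__ == "__main__":
    # Unsupervised vs. Fully Supervised looking time (s) during familiarization.
    t, df, d = pooled_t_from_summary(14.94, 1.91, 24, 13.86, 3.00, 24)
    print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```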
At test, 2-year-olds in the Fully Supervised condition (M = .59, SD = .15) displayed a significant preference for the novel category exemplar, t(23) = 3.05, p = .006, d = .62, indicating they had successfully formed the category. In contrast, 2-year-olds in the Unsupervised condition (M = .49, SD = .18) looked roughly equally between the objects at test, t(23) = .39, p = .70, d = .08. Performance differed significantly between these conditions, t(46) = 2.27, p = .028, d = .66 (see Figure 2). Finally, a cluster-based permutation analysis of the time-course of looking patterns at test revealed a significant divergence between the two conditions, p = .038, from 3,450 ms to 3,850 ms (see Figure 3).
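The comparison of each condition against chance can be sketched the same way. The example below assumes a one-sample t-test of the mean preference score against a chance value of .5, with Cohen's d computed as the mean difference divided by the sample standard deviation; the function name is hypothetical. Since the published means and standard deviations are rounded to two decimals, the output only approximates the reported statistics.

```python
import math
from typing import Tuple

CHANCE = 0.5  # equal looking to the novel and familiar test objects


def one_sample_t_from_summary(mean: float, sd: float, n: int,
                              mu: float = CHANCE) -> Tuple[float, int, float]:
    """One-sample t against mu, its df, and Cohen's d from summary statistics."""
    se = sd / math.sqrt(n)  # standard error of the mean
    t = (mean - mu) / se
    d = (mean - mu) / sd
    return t, n - 1, d


if __name__ == "__main__":
    # Fully Supervised condition, using the rounded values reported above.
    t, df, d = one_sample_t_from_summary(0.59, 0.15, 24)
    print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```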
Semi-supervised vs. Reversed Semi-supervised Environments
Next, we examined whether 2-year-olds could learn categories in semi-supervised environments by integrating labeled and unlabeled exemplars. We predicted that receiving labeled exemplars at the beginning of familiarization in a Semi-supervised condition (n = 24, Mage = 27.3 mo, 12 female), where the labeled exemplars can provide a foundation for learning from the unlabeled exemplars, would facilitate category learning, whereas receiving labeled exemplars at the end of familiarization in a Reversed Semi-supervised condition (n = 24, Mage = 27.2 mo, 13 female) would not. That is, receiving labeled exemplars first should enable 2-year-olds to learn more from the unlabeled exemplars than receiving those labeled exemplars after seeing the unlabeled exemplars.
Infants in the Semi-supervised (M = 13.23 s, SD = 3.35) and Reversed Semi-supervised (M = 12.58 s, SD = 2.78) conditions showed similar levels of attention to the exemplars during familiarization, t(46) = .73, p = .47, d = .21.
At test, however, infants in the Semi-supervised condition (M = .59, SD = .14) displayed a significant novelty preference, t(23) = 3.11, p = .005, d = .63, whereas infants in the Reversed Semi-supervised condition (M = .52, SD = .13) performed at chance levels, t(23) = .76, p = .45, d = .16. Infants’ preferences were marginally different between the two conditions, t(46) = 1.80, p = .08, d = .52 (see Figure 2). Moreover, a cluster-based permutation analysis of infants’ looking behavior at test revealed that the Semi-supervised condition showed a stronger novelty preference than the Reversed Semi-supervised condition between 3,450 ms and 3,850 ms, p = .047 (see Figure 3). This is exactly the same period of time during which the Fully Supervised condition diverged from the Unsupervised condition, suggesting infants were just as successful at learning the category in the Semi-supervised condition as in the Fully Supervised condition.
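The logic of a cluster-based permutation analysis can be illustrated with a simplified, self-contained sketch. This is not the eyetrackingR implementation used for the reported analyses. It assumes each participant contributes one preference value per time bin, uses a pooled-variance t statistic in each bin, and adopts a hypothetical threshold of |t| > 2.0. Adjacent above-threshold bins with the same sign form a cluster, the cluster's summed t is its test statistic, and the largest observed cluster is compared with the largest clusters obtained after shuffling condition membership.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Series = Sequence[float]        # one participant's preference score per time bin
Cluster = Tuple[int, int, float]  # (first bin, last bin, summed t)


def t_per_bin(a: Sequence[Series], b: Sequence[Series]) -> List[float]:
    """Student's independent-samples t statistic at each time bin."""
    ts = []
    for i in range(len(a[0])):
        xa = [s[i] for s in a]
        xb = [s[i] for s in b]
        ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
        ss = sum((x - ma) ** 2 for x in xa) + sum((x - mb) ** 2 for x in xb)
        pooled_var = ss / (len(xa) + len(xb) - 2)
        se = math.sqrt(pooled_var * (1 / len(xa) + 1 / len(xb)))
        ts.append((ma - mb) / se if se > 0 else 0.0)
    return ts


def clusters(ts: Sequence[float], threshold: float) -> List[Cluster]:
    """Runs of adjacent same-sign bins with |t| > threshold."""
    found: List[Cluster] = []
    start: Optional[int] = None
    mass = 0.0
    for i, t in enumerate(ts):
        over = abs(t) > threshold
        if over and start is not None and (t > 0) == (mass > 0):
            mass += t  # extend the current cluster
            continue
        if start is not None:
            found.append((start, i - 1, mass))  # close the current cluster
            start, mass = None, 0.0
        if over:
            start, mass = i, t  # open a new cluster
    if start is not None:
        found.append((start, len(ts) - 1, mass))
    return found


def cluster_permutation_test(a: Sequence[Series], b: Sequence[Series],
                             threshold: float = 2.0, n_perm: int = 1000,
                             seed: int = 0) -> Tuple[Optional[Cluster], float]:
    """Return the largest observed cluster and its permutation p-value."""
    def largest(x: Sequence[Series], y: Sequence[Series]) -> Optional[Cluster]:
        return max(clusters(t_per_bin(x, y), threshold),
                   key=lambda c: abs(c[2]), default=None)

    observed = largest(a, b)
    if observed is None:
        return None, 1.0
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign participants to conditions at random
        perm = largest(pooled[:len(a)], pooled[len(a):])
        if perm is not None and abs(perm[2]) >= abs(observed[2]):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the null distribution is built from the largest cluster in each shuffled dataset, the p-value accounts for testing many time bins at once. The returned bin indices would still need to be mapped back to milliseconds using the bin width chosen for the analysis.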
Figure 1: Sample task design. The familiarization phase consists of 6 trials, each presenting one category member paired with either a labeling or a non-labeling phrase. The test phase simultaneously presents infants with one exemplar from the now-familiar category and one from a novel category. The four conditions shown are those reported in the representative results. This figure has been modified from LaTourrette, A., Waxman, S.R. A little labeling goes a long way: Semi-supervised learning in infancy. Dev. Sci. e12736 (2018).
Figure 2: Mean preference scores across conditions. Infants in the Fully Supervised and Semi-supervised conditions displayed novelty preferences significantly above chance, p < .05. Infants in the Unsupervised and Reversed Semi-supervised conditions performed at chance levels. Error bars represent standard errors of the mean. This figure has been modified from22.
Figure 3: Infants’ looking patterns at test. In the Fully Supervised and Unsupervised conditions (at left) and in the Semi-supervised and Reversed Semi-supervised conditions (at right), infants’ patterns of looking to the exemplars diverged between 3,450 ms and 3,850 ms. The grey shaded bar in each graph denotes this divergent period. The colored shaded regions around each condition indicate the standard error of the mean. This figure has been modified from22.
Here, we present a procedure for evaluating the role of labeling in categorization. By presenting 2-year-olds with a realistic mix of labeled and unlabeled exemplars, we demonstrate that very young children are capable of learning in semi-supervised environments, extending work with adults and older children24,25. Thus, this method offers a resolution to the paradox posed above: if even a few labeled exemplars can spark category learning, then labels can be both rare and powerful.
Critical aspects of this paradigm include the use of novel artificial stimuli and short trials, both of which make the task appropriately challenging and engaging for 2-year-olds. In addition, using an eye-tracker, rather than hand-coding infant looking behavior, provides richer and more precise data on participants’ eye gaze; this richness and precision enable the implementation of time-course measures such as the cluster-based permutation analysis.
The central advantages of the familiarization-test paradigm are its straightforward assessment of category learning and its simplicity as a passive looking task. That is, the task directly tests category learning, rather than relying on more complex measures like naming behavior or inductive inferences3,26,27. Moreover, because familiarization-test tasks can be administered across a broad developmental range (e.g., from 3 months to 3 years), they offer an opportunity to identify developmental continuity and change.
Indeed, the familiarization-test paradigm presented here was designed for 2-year-olds, but similar designs have been widely used with infants in their first year of life4,6,7,9,28. For these younger infants, of course, the task must be simplified: longer exposure to the familiarization exemplars, more exemplars, simpler categories, and a longer window of looking at test may all improve the task’s sensitivity for younger infants. More broadly, the familiarization-test paradigm employed here can be easily extended to evaluate the effect of any auditory signal on infant cognition, including silence, sine-wave tones, nonhuman primate vocalizations, and other non-linguistic sounds5,13,29,30.
Limitations of this task stem primarily from its use of a single outcome variable: infants’ preference at the test. This makes the task unsuitable for questions about, for instance, how each familiarization exemplar changes infants’ category learning or the particular features infants use to learn the category. Time-course analyses, such as the cluster-based permutation analysis, can substantially enrich the insight offered by this paradigm. However, while these analyses enable us to draw stronger conclusions about when two conditions differ in performance, they also raise important questions about what factors drive infants’ attentional patterns throughout the test phase, a promising area for future work.
The authors have nothing to disclose.
The research reported here was supported by the National Institute of Child Health and Human Development of the National Institutes of Health under award number R01HD083310 and a National Science Foundation Graduate Research Fellowship under grant no. DGE‐1324585. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
Name | Company | Catalog Number | Comments |
Final Cut Pro X | Apple | N/A | Video editing, composition software |
MorphX | Norrkross | N/A | Image-morphing software |
Photoshop | Adobe | N/A | Image-editing software |
R | R Core Team | N/A | Statistical analysis software |
T60XL Eyetracker | Tobii Pro | Discontinued | Large, arm-mounted eyetracker suitable for work with infants and children |
Tobii Pro Studio | Tobii Pro | N/A | Software directing eyetracker display, data collection |