This method uses a dynamic visual display to index costs of distraction during visual search, including both "contingent attentional capture" and "set-specific capture," which is a cost of distraction that occurs when the participants maintain multiple search goals simultaneously. This method has revealed basic mechanisms and limitations of visual attention.
This method uses a rapid serial visual presentation (RSVP) paradigm to measure the cost of distraction when participants maintain multiple search goals. The protocol identifies two types of distraction within a single task (contingent attentional capture and set-specific capture) that represent different limitations of cognitive processing. Participants search for letters in two or more "target" ink colors (e.g., green and orange) within a continuous RSVP stream of heterogeneously colored letters, while ignoring two peripheral RSVP streams of letters. Upon detecting a target, participants identify the letter. On some trials, target-colored distractors appear in the periphery just prior to the presentation of a target, causing a drop in target identification performance. Contingent attentional capture is observed by examining performance on trials in which the peripheral distractor is the same color as the target on that trial (e.g., both orange). Set-specific capture is represented by performance on trials in which the peripheral distractor is target-colored (e.g., orange), but not the same color as the target on that trial (e.g., green). By varying the amount of time (i.e., the number of stimuli appearing) between the presentation of the distractor and the target, researchers can observe how participants recover from these distraction costs over time. Compared to the static displays that are often used to measure contingent attentional capture, the dynamic display produces much larger effects, allowing the researcher to identify subtle effects of smaller manipulations. An unusual aspect of our design is that it employs a continuous display; "filler" stimuli connect one trial to the next seamlessly, and participants respond during this interval whenever they detect a target. The continuous display reduces chance performance to near-zero levels (rather than 50%) and provides researchers with a more sensitive measure of performance differences across trial types.
Contingent attentional capture refers to a performance cost (slower reaction times and lower accuracy) that occurs when a participant erroneously directs attention to a distractor similar to their search goal. Indexing top-down orienting of attention, contingent attentional capture only occurs when a goal-relevant distractor is present (e.g., a green digit when searching for green letters), but not when a goal-irrelevant stimulus is present (e.g., a blue digit). Studies of contingent attentional capture have been integral to the understanding of top-down orienting and the limitations of information processing, namely, that once a stimulus captures attention, it is processed in a serial and effortful manner1,2,3. Contingent attentional capture is most often measured using static displays that mimic a common visual search, such as searching for a red pepper in the produce section of a grocery store3,4. In this example, an item sharing features with the target, such as a red apple, might capture attention, slowing down the search. Contingent attentional capture can be observed for color3,5,6,7, shape8, motion9, time10, and semantic relevance11,12. In addition to static displays, contingent attentional capture has been measured using dynamic displays that mimic situations such as searching for a landmark while driving along a road, or looking for a person in a quickly moving crowd13,14.
More recently, researchers have investigated the consequences of attending to distractors when more than one search goal is active (such as searching for a red pepper and garlic at the same time)7,8,15,16,17,18,19,20,21,22,23. In such situations, distraction costs can be especially devastating. While evidence is mixed as to whether multi-goal searches impair performance when distraction is not present, attentional capture from goal-related distractors can cause very large deficits in performance. In particular, we identified a new form of attentional capture called "set-specific capture," which occurs when multiple goals are concurrently maintained. In the case of set-specific capture, performance costs are especially large when a distractor resembling one target goal (e.g., an apple) grabs attention from the target item matching the other goal (e.g., the garlic)7,20,21,22. See Figure 1 for an explanation of a typical finding, using this grocery example.
As in the case with contingent attentional capture, set-specific capture reveals that information is processed in a serial and effortful manner: when a distractor captures attention, attentional resources are drawn away from the target. In addition, set-specific capture shows that directing attention to the distractor's features leads to enhancement of the related goal within working memory. Thus, when more than one goal is concurrently maintained, this goal enhancement comes at the expense of any other current goals7,21,22. Set-specific capture is a consequence of multitasking, akin to switch costs and mixing costs found in task-switching studies, but also distinct from these measures24. It is important that future studies investigate this multitasking cost, both in order to understand the magnitude and nature of the impairment for practical reasons (e.g., safety-related situations involving dual-tasking), as well as to refine our understanding of the mechanics of visual search and how goals are maintained. For example, set-specific capture provides support for the idea that a single goal can be focused upon while a target or target-resembling distractor is attended, but that more goals are maintained in an accessory state during visual search25,26,27.
The present method provides a robust way of measuring both contingent attentional capture and set-specific capture within a single paradigm. It uses a dynamic display, inspired by previous work on the attentional blink and contingent attentional capture with rapid serial visual presentations (RSVPs) of stimuli13,14,28,29,30. This type of display yields much larger effects than do static display tasks, which usually rely on reaction time as a dependent measure, rather than accuracy3,31,32. These larger effects allow researchers to use this paradigm to measure more sensitive manipulations of set-specific capture, such as the effect of practice20.
In this task, participants search a heterogeneously colored, centrally-located RSVP for letters appearing in either of two "target" ink colors (e.g., green and orange; see Figure 2 for example stimulus colors). Any time a participant detects a target-colored letter appearing in the central display, they indicate whether the letter was from the first half of the alphabet ("press the 'J' key") or the second half of the alphabet ("press the 'K' key"). Meanwhile, participants ignore two RSVP displays consisting of mostly grey letters that appear on either side of the central display. Thus, at any given time, there are three letters on the screen at once – one centrally located and two peripheral. The letters change identity and color every 116 ms.
An experiment may consist of the following trial types: Target Alone, Distractor Alone, Non-Target Colored Distractor (NTC), Same Target Colored Distractor (STC), and Different Target Colored Distractor (DTC). In the Target Alone trial type, a target letter (e.g., a green C) appears in the central RSVP, without any color changes occurring in the peripheral RSVPs preceding it. In the Distractor Alone trial type, a target-colored item appears in one of the peripheral RSVP displays without a target item appearing afterward. The purpose of this trial type is to prevent participants from using a peripheral color change to predict an upcoming target, by including some trials in which a distractor did not predict a target. In the NTC, STC, and DTC trial types, a colored letter distractor appears in one of the peripheral displays before the target appears centrally, with a "lag" of 1 – 4 display frames (116 – 464 ms) between the appearance of the distractor and the target. For NTC trials, the distractor is not target-colored (e.g., a purple 'V'). In STC trials, the distractor (e.g., an orange 'B') is the same color as the following target (e.g., an orange 'T'). In DTC trials, the distractor (e.g., an orange 'C') is target-colored, but not the same color as the upcoming target (e.g., a green 'V'). See Figure 3 for a schematic of the task, including examples of each trial type. See Video 1 (video) for an example of the task. Viewed on loop, the example includes two targets. Video 2 (video) is the same video at a reduced speed for clarity.
Contingent attentional capture is indicated by the difference between NTC and STC performance, as a colored distractor captures attention only when it resembles one of the current goals (i.e., not on NTC trials, which usually yield the same accuracy level as Target Alone trials). Set-specific capture is indicated by the difference between STC and DTC performance. We have published several versions of this task, with slightly different configurations of trial types (e.g., with or without NTC and Distractor Alone trials, with just lags 1 and 3, with a variety of target colors, or with three targets)7,20,21,22.
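The two difference scores described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the published work: the function name `capture_effects` and the accuracy values are hypothetical, and the only elements taken from the text are the two subtractions (NTC minus STC, and STC minus DTC).

```python
def capture_effects(accuracy):
    """Compute capture effects from mean accuracy (proportion correct) per trial type.

    `accuracy` maps trial-type labels ("NTC", "STC", "DTC") to proportions.
    An effect is only reported when both of its trial types are present.
    """
    effects = {}
    if "NTC" in accuracy and "STC" in accuracy:
        # Contingent attentional capture: cost of a distractor matching the target's color
        effects["contingent_capture"] = accuracy["NTC"] - accuracy["STC"]
    if "STC" in accuracy and "DTC" in accuracy:
        # Set-specific capture: extra cost when the distractor matches the *other* target color
        effects["set_specific_capture"] = accuracy["STC"] - accuracy["DTC"]
    return effects


# Hypothetical accuracies, for illustration only (not data from this protocol)
example = {"NTC": 0.80, "STC": 0.75, "DTC": 0.60}
print(capture_effects(example))
```

In a design without NTC trials, the same function simply omits the contingent capture entry, which mirrors the point that some published configurations of the task do not include that trial type.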
One notable feature of this method is that it uses a continuous display. Each trial includes the minimum components needed to represent that trial type (e.g., a peripheral distractor, a target, and any letters that appeared between the distractor and target). "Filler" stimuli connect one trial to the next seamlessly, and participants respond during this intertrial interval whenever they detect a target. The interval lasts from 15 – 21 frames (1740 – 2436 ms), which is sufficient time to respond; most responses occur within 700 ms. An advantage of this method is that chance performance is near 0%; participants are not explicitly aware that a trial has ended if they miss a target item. This allows for three types of outcomes: 1) an identified letter, which will lead to a correct response, 2) a detected but not identified item (e.g., "I saw something green"), which will lead to a 50% chance of a correct response, and 3) an undetected / missed item, which leads to no response (coded as inaccurate). These three outcomes provide more information about the degree of stimulus processing than do tasks with a two-alternative forced-choice response, which cannot differentiate between detection-without-identification (i.e., a response error) and an outright miss (i.e., an omission error).
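The three outcomes above imply a simple expected-accuracy calculation, sketched here in Python. The function and its parameter names are hypothetical. The `forced_choice` branch rests on an assumption that is not stated in the text: that in a two-alternative forced-choice task every trial receives a response, so missed targets are also guessed at 50%.

```python
def expected_accuracy(p_identified, p_detected_only, forced_choice=False):
    """Expected proportion correct given the probability of each outcome.

    p_identified    : target letter identified (always correct)
    p_detected_only : target detected but not identified (correct 50% of the time)
    The remaining probability is an outright miss.
    """
    if p_identified < 0 or p_detected_only < 0 or p_identified + p_detected_only > 1 + 1e-9:
        raise ValueError("outcome probabilities must be non-negative and sum to at most 1")
    p_missed = max(0.0, 1.0 - p_identified - p_detected_only)
    accuracy = p_identified + 0.5 * p_detected_only
    if forced_choice:
        # Assumption: a forced-choice task turns every miss into a 50/50 guess
        accuracy += 0.5 * p_missed
    # In the continuous display, a miss produces no response and is scored as incorrect
    return accuracy


# A participant who never detects a target scores 0 in the continuous display,
# but would score 0.5 under the forced-choice assumption.
print(expected_accuracy(0.0, 0.0), expected_accuracy(0.0, 0.0, forced_choice=True))
```

The difference between the two calls illustrates why the continuous display places chance performance near zero instead of 50%.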
We describe the method here as we have used it in published work, in which participants search for colored letters. However, it can be modified for use with images33 and potentially other stimuli, such as words34. Moreover, distractors can appear as other colored items in the central display rather than just as colored letters appearing in the periphery (e.g., a target-colored digit in the central display)21. It is also likely that set-specific capture can be identified in static displays. The further development of the extensions of this method will allow researchers to investigate topics such as the effect of reward and motivation on distraction35, or whether distraction costs are modulated by the number of concurrently maintained goals33. Other applications could include measuring distraction costs in real-world contexts such as when completing a demanding visual search task (e.g., airport baggage screening or radiology screening)36,37,38.
All of the methods described here were approved by the Arcadia University Institutional Review Board.
1. Design and Prepare the Experiment for Data Collection
NOTE: See the introduction for general information about design and trial types. See the discussion for more information about specific choices that can be made in each of these sub-steps. See Video 1 for a dynamic view of the task, and Video 2 for a slowed down version of the task.
2. Set up the Apparatus
3. Recruit Participants for the Experiment
4. Test the Participants
We report several examples of representative data. In the first example, there were two lags (1 and 3), two distractor trial types (STC and DTC), and 57 participants. There were also Target Alone and Distractor Alone trial types. In a repeated measures ANOVA with the factors trial type and lag, there was a main effect of each factor as well as an interaction between the two. Performance was better at lag 3 (mean (M) = 0.655, standard error (SE) = 0.018) than at lag 1 (M = 0.484, SE = 0.018), F(1, 57) = 107.6, p < 0.001, η2 = 0.654, demonstrating that distraction costs were strongest when participants had the least time to recover. Performance was better in STC (M = 0.640, SE = 0.020) than DTC (M = 0.499, SE = 0.016) trials, F(1, 57) = 74.61, p < 0.001, η2 = 0.567, supporting set-specific capture. The interaction between the two was also significant, indicating that recovery from distraction was faster in STC than DTC trials, F(1, 57) = 7.10, p = 0.01, η2 = 0.111. These effects are all quite strong, and results are typically significant with much smaller n, such as 10 participants. Subtracting Distractor Alone responses (false alarms) from Target Alone correct responses (hits), we obtain an estimate of guessing-corrected accuracy in the absence of peripheral distraction, which in this case was M = 0.678 (SE = 0.014). This score was significantly better than STC performance at lag 1 (M = 0.569, SE = 0.017, t(57) = 5.38, p < 0.001), revealing a finding of contingent attentional capture. See Figure 4 for these example data.
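Two of the calculations in this first example can be sketched in Python. The effect sizes are labeled only as η2 in the text; the sketch assumes they are partial eta squared, computed from F and its degrees of freedom with the standard formula, and under that assumption the reported values are reproduced. The guessing correction is the subtraction described above (hits minus false alarms). Function names are hypothetical, and the hit and false-alarm rates in the last line are invented for illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def guessing_corrected_accuracy(hit_rate, false_alarm_rate):
    """Target Alone hits minus Distractor Alone false alarms."""
    return hit_rate - false_alarm_rate


# F values and degrees of freedom as reported for the first example
for label, f_value in [("lag", 107.6), ("trial type", 74.61), ("interaction", 7.10)]:
    print(label, round(partial_eta_squared(f_value, 1, 57), 3))

# Hypothetical hit and false-alarm rates, for illustration only
print(guessing_corrected_accuracy(0.75, 0.10))
```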
The second example of representative data includes the NTC trial type, but no Distractor Alone trial type, and lags 1 and 4. There were 71 participants. To measure contingent attentional capture, we performed a repeated measures ANOVA with the factors trial type (NTC, STC) and lag (1, 4). We found performance was better at lag 4 (M = 0.791, SE = 0.013) than at lag 1 (M = 0.708, SE = 0.015), F(1, 70) = 7.69, p = 0.007. Participants performed better on NTC trials (M = 0.816, SE = 0.013) than STC trials (M = 0.789, SE = 0.013), F(1, 70) = 6.05, p = 0.016. There was also an interaction between trial type and lag, F(1, 70) = 19.72, p < 0.001, indicating similar performance at both lags in NTC trials, but better STC performance as lag increased. To measure set-specific capture, we performed a repeated measures ANOVA with the factors trial type (STC, DTC) and lag (1, 4). Performance was better at lag 4 (M = 0.790, SE = 0.014) than at lag 1 (M = 0.643, SE = 0.015), F(1, 70) = 60.65, p < 0.001. Performance was better in STC trials than in DTC trials (M = 0.644, SE = 0.019), F(1, 70) = 96.9, p < 0.001. Notably, the contingent attentional capture effects (comparing NTC and STC) are smaller than set-specific capture effects (comparing STC and DTC). See Figure 5 for these data.
All of the data mentioned here collapse across target-distractor response congruency, which refers to whether the target and distractor letters came from the same half of the alphabet. It is useful to note that response congruency does not typically have an impact on performance. Performance is plotted in Figure 6 for representative "incongruent" and "congruent" response mapping conditions, in an experiment7 that used lags 1, 2, 3, and 4.
Figure 1: A conceptual example of contingent attentional capture and set-specific capture. When looking for a red pepper43 and garlic44 (targets), the presence of a red apple45 may capture attention (distractor). Contingent attentional capture refers to decreased performance in finding a target (red pepper) in the face of a goal-related distractor (red apple). Set-specific capture refers to decreased performance in finding a target (garlic) in the face of a distractor related to another concurrently maintained goal (red apple), as attention is drawn not only to the distractor item but also to the goal state.
Figure 2: An example color wheel for letter stimuli. In this example, target colors could be any combination of orange, green, and lavender (colors 1, 3, and 5, respectively). In one study, we used two of these colors as target colors, cycling through the different color pairs across participants7. The third color was used as a peripheral distractor color in the NTC trial type. Other letters appearing in the central RSVP display were tan, turquoise, and magenta (colors 2, 4, and 6, respectively); these letters are called "filler." Color wheel designs vary depending on the experiment, but critically, any target colors must be linearly separable46. This means that on a color wheel, there must be at least one color that falls between the two target colors on the dimension of hue, and this color must appear in the central RSVP display as an item the participant is supposed to ignore. In this color wheel, any two of colors 1, 3, and 5 could form the two targets, with the third as the NTC distractor item, as described here. Alternatively, colors 1, 3, and 5 could all be targets in a three-target search20.
Figure 3: Example trial types. Participants searched for targets appearing in either of two colors in a central RSVP while ignoring peripheral distractors. In this example, the target colors were green and orange. Each box frame shows three letters displayed simultaneously. Frames lasted for 116 ms before moving to the next display. In Target Alone trials, a target appeared centrally without any color changes occurring in the peripheral letters preceding it. In Distractor Alone trials, an item in the periphery changed to a target color, but no target appeared subsequently. In the Non-Target Colored trial type, a colored peripheral distractor appeared 1 – 4 frames prior to a target, and the distractor was not target colored (e.g., lavender). In the Same Target Colored trial type, the colored peripheral distractor was the same color as the subsequent target. In the Different Target Colored trial type, the colored peripheral distractor was the color of one of the targets, but not the same color as the subsequent target.
Figure 4: Example data #1. Trial types (STC and DTC) are represented as separate lines. Lags (1 and 3) are on the x-axis. Target Alone is plotted separately. Distractor Alone trials are typically analyzed as false alarms, but to fit the rest of the data here (i.e., "proportion correct,") correct rejections are plotted instead – these are trials in which participants correctly withheld a response when a peripheral distractor was not followed by a target. Error bars represent standard error of the mean.
Figure 5: Example data #2. Trial types (NTC, STC, and DTC) are represented as separate lines. Lags (1 and 4) are on the x-axis. Target Alone is plotted separately. Error bars represent standard error of the mean.
Figure 6: Example data #3. Trial types are plotted (NTC, STC, and DTC) as separate lines and lags (1 – 4) are on the x-axes in two graphs representing (A) response-congruent trials (the target and colored distractor are from the same half of the alphabet) and (B) response-incongruent trials (the target and colored distractor are from different halves of the alphabet). Error bars represent standard error of the mean.
Video 1: Video figure of two example trials. In this example, participants are searching for orange and green letters. This video is best viewed on loop to simulate the continuous display. There is an orange 'U' target and a green 'X' target. Prior to the targets' appearance, orange peripheral distractors appear. When the orange distractor appears prior to the orange target, this is an STC trial. When the orange distractor appears prior to the green target, this is a DTC trial. Only about 10 – 12 frames separate the targets in this demonstration, but in reality, targets were separated by at least 15 frames (1740 ms), with the timing jittered unpredictably from 1740 – 2436 ms (15 – 21 frames), so that participants did not know when to expect the next target item.
Video 2: Video figure of two example trials, slowed down. This example is the same one from Video 1, but presented at 300 ms per frame, so the targets are easier to find.
There are several considerations in using this method. The most important step to take is to ensure that the design requires participants to search for two or more targets at a time, and that there are "STC" and "DTC" distractor trial types, as these will provide the researcher with a measure of set-specific capture (STC – DTC). It is also helpful to have an "NTC" trial type to properly measure contingent attentional capture (NTC – STC), though one can estimate NTC performance with Target Alone performance, if necessary. To achieve the strongest effects, it is important to include lag 1 trials, with the caveat that lag-1 sparing is likely in versions of this task that use central distraction rather than peripheral distraction47,48. In lag-1 sparing, performance is better when the target appears immediately after a distractor than if they are separated by one or more frames; it is thought that both items are processed in the same attentional window49. Thus, if lag-1 sparing occurs, including lag 2 trials is recommended in order to achieve maximal distraction effects. Other lags are optional, depending on the researcher's desire to measure recovery from capture. Including several lags also keeps the timing from distractor to target unpredictable, which is useful because learning this timing can cause an improvement in performance (and reduction of observed effects)20. The dynamic RSVP display is also critical for this task. An advantage of the dynamic display over a static display is that the effects are large. However, it would be interesting to develop a measure of set-specific capture using a static display, as this mimics many everyday visual searches.
The choice of stimuli is another consideration. In terms of target, distractor, and filler colors for the letters, it is best to include colors that have equal luminance and saturation, as these features determine salience and can lead to bottom-up capture of attention50. Depending on the specifics of the experimental design, it is possible to design the color wheel with five rather than six colors. If the NTC trial type is not required and if just two target colors are searched rather than three, it is possible to use five colors in the color wheel20. It is not recommended to devise a color wheel with eight or more colors. It is too difficult to distinguish target colors from distractors in the RSVP display using more than six or seven total colors, because the colors are perceptually too similar to each other. As for the letters themselves, target letters should come from the beginning and end of the alphabet (no letters toward the middle, such as from H-S), as the goal is to keep the first half / second half of the alphabet decision simple for the participant.
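The color wheel constraint can be checked programmatically when planning target colors. The sketch below is a simplification under stated assumptions: it treats the wheel as a circular list in hue order (using the six colors named for Figure 2) and approximates linear separability by requiring that no two target colors sit next to each other, so that at least one non-target color lies between them. It does not verify luminance or saturation, and the function name `targets_are_separated` is hypothetical.

```python
def targets_are_separated(wheel, targets):
    """Return True if no two target colors are adjacent on a circular color wheel.

    wheel   : list of color names in hue order; the last entry neighbors the first
    targets : collection of color names used as target colors
    """
    targets = set(targets)
    missing = targets - set(wheel)
    if missing:
        raise ValueError(f"targets not on the wheel: {sorted(missing)}")
    n = len(wheel)
    for i in range(n):
        # Compare each color with its clockwise neighbor, wrapping around at the end
        if wheel[i] in targets and wheel[(i + 1) % n] in targets:
            return False
    return True


# Colors 1-6 as named in the Figure 2 legend
wheel = ["orange", "tan", "green", "turquoise", "lavender", "magenta"]
print(targets_are_separated(wheel, ["orange", "green"]))              # two-target search
print(targets_are_separated(wheel, ["orange", "green", "lavender"]))  # three-target search
print(targets_are_separated(wheel, ["orange", "tan"]))                # adjacent hues
```

A fuller check would also confirm that each intervening color actually appears as a filler item in the central stream, as the Figure 2 legend requires.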
Another design issue is determining how many trials to have in each trial type, as well as how many participants to run in the experiment. We make the following suggestion for trial distribution – at least 15% and up to about 50% of trials should be Target Alone trials, and there should be at least 20 Target Alone trials per target color. The Same Target Colored and Different Target Colored trial types should include at least 24 trials per target color and should have the same number of trials as each other, unless the purpose of the design is to manipulate practice in these trial types20. If the Non-Target Colored trial type is present, there should be about as many NTC trials as STC or DTC trials. Distractor Alone trials are also an option. In this trial type, ideal performance is 0% response rate / false alarm rate. Distractor Alone trials protect against participants' adopting a strategy of using distractors as warning signals of upcoming targets. Responses to Distractor Alone trials are considered inaccurate. These trials may serve as an effective deterrent to the "warning signal" strategy if they appear on about 10% of all trials. In determining sample size, it is important to note that set-specific capture effects are more reliable and larger than contingent attentional capture effects. A power calculation is recommended to determine the sample size appropriate for the particular experiment's goals41.
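The trial-distribution suggestions above can be turned into a quick sanity check on a planned design. This is a hedged sketch, not part of the published protocol: the function name, the trial-type keys, and the example counts are hypothetical, and the check assumes trials of each type are divided evenly among the target colors. The 10% guideline for Distractor Alone trials is approximate in the text, so it is not enforced here.

```python
def check_trial_counts(counts, n_target_colors=2):
    """Return a list of problems with a planned design (an empty list means none found).

    counts : dict mapping trial-type names to total trial counts, using the keys
             "TargetAlone", "STC", "DTC", and optionally "NTC" and "DistractorAlone"
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("design contains no trials")
    problems = []

    target_alone = counts.get("TargetAlone", 0)
    share = target_alone / total
    if not 0.15 <= share <= 0.50:
        problems.append(f"Target Alone share is {share:.0%}; suggested range is 15% to 50%")
    if target_alone / n_target_colors < 20:
        problems.append("fewer than 20 Target Alone trials per target color")

    stc, dtc = counts.get("STC", 0), counts.get("DTC", 0)
    if stc != dtc:
        problems.append("STC and DTC trial counts differ")
    if min(stc, dtc) / n_target_colors < 24:
        problems.append("fewer than 24 STC or DTC trials per target color")
    return problems


# Hypothetical two-target design, for illustration only
design = {"TargetAlone": 48, "STC": 48, "DTC": 48, "NTC": 48, "DistractorAlone": 24}
print(check_trial_counts(design))
```

Designs that deliberately manipulate practice by unbalancing STC and DTC trials would trigger the third message by construction, so the output is best read as a prompt to double-check rather than a verdict.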
Specific equipment is mentioned in the materials and the protocol, but some flexibility is possible. The experiment can be designed, programmed, and presented using any software program that is flexible and provides millisecond precision of timing. The stimulus presentation rate mentioned throughout this protocol is compatible with a monitor with a 60 Hz refresh rate. A faster refresh rate is acceptable, but note that the stimulus timing will differ slightly (e.g., a 75 Hz refresh rate may yield a frame duration of 106 ms or 120 ms, but not 116 ms).
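The dependence of stimulus timing on refresh rate follows from simple arithmetic: each stimulus frame must last a whole number of monitor refreshes. The Python sketch below illustrates this; it is not the presentation code used in the protocol, and the function names are hypothetical. At 60 Hz, seven refreshes last about 116.7 ms, which corresponds to the 116 ms frame used here, while at 75 Hz the nearest options are eight or nine refreshes.

```python
import math


def frame_options(refresh_hz, desired_ms):
    """List the (refresh_count, duration_ms) pairs that bracket a desired frame duration."""
    refresh_ms = 1000.0 / refresh_hz
    lower = max(1, math.floor(desired_ms / refresh_ms))
    upper = max(1, math.ceil(desired_ms / refresh_ms))
    # A set removes the duplicate when the desired duration is an exact multiple
    return [(n, n * refresh_ms) for n in sorted({lower, upper})]


def closest_option(refresh_hz, desired_ms):
    """Pick the achievable duration nearest to the desired one."""
    return min(frame_options(refresh_hz, desired_ms), key=lambda opt: abs(opt[1] - desired_ms))


for hz in (60, 75):
    options = ", ".join(f"{n} refreshes = {ms:.1f} ms" for n, ms in frame_options(hz, 116))
    print(f"{hz} Hz: {options}")
```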
A limitation of the protocol as reported here is that it is not possible to require participants to search for more than three colors simultaneously. There are insufficient colors available in an isoluminant color wheel for participants to distinguish targets from distractors when participants maintain more than three search goals. This is because filler letters in the central RSVP display must be colors that occur between target colors in terms of hue on the color wheel (to ensure linear separation of attentional sets) and the rapid presentation allows little time for fine color discrimination30. One way to work around this limitation is to use images as targets. We have collected data on a version of this task doing just that. In this study, participants search for distinct images (e.g., a particular camera), and similar ones (e.g., the wrong camera) appear as central distractors. The effects are in line with both contingent attentional capture and set-specific capture. It is possible for participants to search for many images at a time, and we are able to measure how set-specific capture and attentional capture are modulated depending on the number of concurrent search goals33. However, it is important to note that when using images in an RSVP display, there is unlikely to be time for multiple saccades, so each image should be small enough to be processed in a single fixation.
Future directions with this paradigm could also include searching for other visual features (e.g., orientation) or concepts. Such investigations may reveal more about mechanisms of attention and their relationship with memory and perception (e.g., how attentional sets, or goal states, are stored in memory).
The authors have nothing to disclose.
This research was made possible with startup funds from Arcadia University and Elmhurst College awarded to K.S.M., a student-faculty collaborative grant from Elmhurst College to E.A.W. and K.S.M., and an Arcadia University faculty development grant to K.S.M. We would like to thank Daniel H. Weissman, a collaborator on prior publications using versions of this protocol. We also wish to thank the additional students who collected data on previous versions of this protocol, including Marshall O’Moore, Patricia Chen, Amanda Lai, Elise Darling, Erika Pinsker, Somin Lee, Celine Santos, Greg Ramos, and Kathleen Trencheny.
Name | Company | Catalog Number | Comments |
MATLAB | MathWorks | R2014b | General computing platform |
Psychtoolbox | Psychtoolbox | PTB-3 | Toolbox of routines for use with MATLAB |
G*Power | Universität Düsseldorf | G*Power 3.1.9.2 for Windows | Software to assist with performing power calculations |
24” HDMI Gaming Monitor | ASUS | VG248QE | High quality LCD monitor with excellent timing |