To replicate laboratory settings, online data collection methods for visual tasks require tight control over stimulus presentation. We outline methods for the use of a web application to collect performance data on two tests of visual attention.
Online data collection methods have particular appeal to behavioral scientists because they offer the promise of much larger and much more representative data samples than can typically be collected on college campuses. However, before such methods can be widely adopted, a number of technological challenges must be overcome – in particular in experiments where tight control over stimulus properties is necessary. Here we present methods for collecting performance data on two tests of visual attention. Both tests require control over the visual angle of the stimuli (which in turn requires knowledge of the viewing distance, monitor size, screen resolution, etc.) and the timing of the stimuli (as the tests involve either briefly flashed stimuli or stimuli that move at specific rates). Data collected on these tests from over 1,700 online participants were consistent with data collected in laboratory-based versions of the exact same tests. These results suggest that with proper care, timing/stimulus size dependent tasks can be deployed in web-based settings.
Over the past five years there has been a surge of interest in the use of online behavioral data collection methods. While the vast majority of publications in the domain of psychology have utilized potentially non-representative subject populations1 (i.e., primarily college undergraduates) and often reasonably small sample sizes as well (i.e., typically in the range of tens of subjects), online methods offer the promise of far more diverse and larger samples. For instance, Amazon’s Mechanical Turk service has been the subject of a number of recent studies, both describing the characteristics of the “worker” population and the use of this population in behavioral research2-6.
However, one significant concern related to such methods is the relative lack of control over critical stimulus variables. For example, in most visual psychophysics tasks, stimuli are described in terms of visual angle. The calculation of visual angles requires precise measurements of viewing distance, screen size, and screen resolution. While these parameters are trivial to measure and control in a lab setting (where there is a known monitor and participants view stimuli while in a chin rest placed a known distance from the monitor), the same is not true of online data collection. In an online environment, not only will participants inevitably use a wide variety of monitors of different sizes with different software settings, they also may not have easy access to rulers/tape measures that would allow them to determine their monitor size or have the knowledge necessary to determine their software and hardware settings (e.g., refresh rate, resolution).
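The geometry behind these calculations can be made concrete with a short sketch. The helpers below are illustrative only (the function names and example parameters are ours, not part of the described application); they convert between a stimulus's physical size, its visual angle, and the number of pixels needed to render it, assuming square pixels.

```python
import math

def visual_angle_deg(stimulus_cm: float, viewing_distance_cm: float) -> float:
    """Visual angle subtended by a stimulus of a given physical size.

    Standard formula: theta = 2 * atan(size / (2 * distance)).
    """
    return math.degrees(2 * math.atan(stimulus_cm / (2 * viewing_distance_cm)))

def stimulus_pixels(angle_deg: float, viewing_distance_cm: float,
                    screen_width_cm: float, screen_width_px: int) -> float:
    """Pixels needed to render a stimulus at a desired visual angle.

    px_per_cm would come from the screen calibration step; here the
    screen dimensions are hypothetical example values.
    """
    px_per_cm = screen_width_px / screen_width_cm
    size_cm = 2 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2)
    return size_cm * px_per_cm
```

For example, at a viewing distance of 63 cm (roughly arm's length), a stimulus spanning 1° of visual angle is about 1.1 cm wide on the screen.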
Here we describe a set of methods to collect data on two well-known tests of visual attention – the Useful Field of View (UFOV) paradigm7 and the multiple object tracking (MOT) task8 – while avoiding as much as possible the sources of variability that are inherent in online measurements. These tasks can be run by any participant with an internet connection and an HTML5 compatible browser. Participants who do not know their screen size are walked through a measurement process utilizing commonly available items of standard size (i.e., credit card/CD – see Figure 1).
Data on these two tasks were collected from over 1,700 participants in a Massive Online Open Course. Average performance of this online sample was highly consistent with results obtained in tightly controlled laboratory-based measures of the exact same tasks9,10. Our results are thus consistent with the growing body of literature demonstrating the efficacy of online data collection methods, even in tasks that require specific control over viewing conditions.
The protocol was approved by the institutional review board at the University of Wisconsin-Madison. The following steps have been written as a guide for programmers to replicate the automated process of the web application described.
1. Login Participant
2. Screen Calibration
NOTE: The web application guides the participant through the three steps outlined in the calibration page at: http://brainandlearning.org/jove/Calibration.
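The arithmetic underlying this calibration can be sketched as follows. The reference sizes are the standard ISO/IEC 7810 credit-card width (85.60 mm) and compact disc diameter (120 mm); the function names and the example pixel values are illustrative, not taken from the application itself.

```python
CREDIT_CARD_WIDTH_CM = 8.56   # ISO/IEC 7810 ID-1 standard card width
CD_DIAMETER_CM = 12.0         # standard compact disc diameter

def pixels_per_cm(matched_width_px: float, reference_width_cm: float) -> float:
    """Pixel density inferred once the participant has resized an
    on-screen shape to match a physical reference object held
    against the screen."""
    return matched_width_px / reference_width_cm

def screen_width_cm(screen_width_px: int, px_per_cm: float) -> float:
    """Physical screen width recovered from the calibrated density."""
    return screen_width_px / px_per_cm
```

If, say, the on-screen rectangle matched the credit card at 478 px wide, the density is about 55.8 px/cm, and a 1,920 px-wide display is therefore about 34.4 cm wide.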
3. Multiple Object Tracking Task (MOT) – Figure 2
4. Moving from One Task to Another (Optional Step)
5. Useful Field of View Task (UFOV) – Figure 3
Outlier Removal
A total of 1,779 participants completed the UFOV task. Of those, 32 participants had UFOV thresholds that were greater than 3 standard deviations from the mean, suggesting that they were unable to perform the task as instructed. As such, the UFOV data from these participants were removed from the final analysis, leaving a total of 1,747 participants.
Data were obtained from 1,746 participants for the MOT task. Two participants had mean accuracy scores that were more than 3 standard deviations below the mean, thus the data from these participants were removed from the final MOT analysis, leaving a total of 1,744 participants.
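The 3-standard-deviation exclusion criterion described above can be sketched as a simple filter (illustrative only; the study's actual analysis pipeline is not shown here, and note that for the MOT data only the lower tail was flagged):

```python
import statistics

def remove_outliers(scores, n_sd=3.0):
    """Drop scores more than n_sd standard deviations from the sample
    mean (two-sided; a one-sided variant would test only one tail)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [s for s in scores if abs(s - mean) <= n_sd * sd]
```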
UFOV
For the UFOV task, performance was calculated by averaging the presentation time over the final 5 trials in order to obtain a detection threshold. The presentation time reflected the measured stimulus presentation duration on each participant’s screen: the time from the start of the first stimulus frame until the end of the last stimulus frame was recorded in milliseconds using the participant’s system clock. The detection threshold reflects the minimum presentation duration at which participants can detect the peripheral target with approximately 79% accuracy, given our use of a 3-down, 1-up staircase procedure. The mean UFOV threshold was 64.7 msec (SD = 53.5, 95% CI [62.17, 67.19]) and scores ranged from 17 msec to 315 msec with a median threshold of 45 msec (see Figure 4). The threshold distribution was positively skewed, with skewness of 1.92 (SE = 0.06) and kurtosis of 3.93 (SE = 0.12).
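A 3-down, 1-up staircase of the kind described converges on roughly 79.4% accuracy (0.5 raised to the power 1/3). A minimal sketch follows; the starting duration, step size, and bounds are illustrative placeholders, not the values used in the study.

```python
class ThreeDownOneUp:
    """Adaptive staircase: the presentation duration decreases after
    three consecutive correct responses and increases after any
    incorrect response, converging on ~79.4% accuracy."""

    def __init__(self, start_ms=200, step_ms=10, floor_ms=10, ceiling_ms=500):
        self.duration_ms = start_ms
        self.step_ms = step_ms
        self.floor_ms = floor_ms
        self.ceiling_ms = ceiling_ms
        self.correct_streak = 0

    def update(self, correct: bool) -> int:
        if correct:
            self.correct_streak += 1
            if self.correct_streak == 3:   # three in a row: make it harder
                self.correct_streak = 0
                self.duration_ms = max(self.floor_ms,
                                       self.duration_ms - self.step_ms)
        else:                              # any miss: make it easier
            self.correct_streak = 0
            self.duration_ms = min(self.ceiling_ms,
                                   self.duration_ms + self.step_ms)
        return self.duration_ms
```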
MOT
MOT performance was measured by calculating the mean accuracy (percent correct) for each set size (1 – 5). Accuracy ranged from 0.4 – 1.0 for set size 1 to 0.1 – 1.0 for set size 5, and mean accuracy ranged from 0.99 (SD = 0.06, 95% CI [0.983, 0.989]) for set size 1 to 0.71 (SD = 0.17, 95% CI [0.700, 0.716]) for set size 5. Median accuracy ranged from 1.0 for set size 1 to 0.70 for set size 5 (see Figure 5).
A repeated-measures ANOVA was conducted to examine whether accuracy differed as a function of set size. There was a significant main effect of set size (F(4, 6968) = 1574.70, p < 0.001, ηp² = 0.475) such that accuracy decreased as set size increased, demonstrating a typical MOT effect.
Figure 1. Screen measurement. Because not all online participants know their screen size – or have easy access to a ruler/tape measure to assess their screen size – the calibration process asked subjects to utilize commonly available items of standard size (credit card – above; CD – below).
Figure 2. MOT Task. Participants viewed a set of randomly moving dots. At trial onset, a subset of these dots was blue (targets), while the remainder were yellow (distractors). After 2 sec the blue target dots changed to yellow, making them visually indistinguishable from the distractors. Participants had to mentally track the formerly blue target dots for 4 sec until a response screen appeared. On this screen one of the dots was white and the subject made a “yes (this was one of the original targets)” or “no (this was not one of the original targets)” decision (with a key press).
Figure 3. UFOV Task. The main screen consisted of a central stimulus (a yellow smiley that could have either short or long hair), a peripheral stimulus (a filled white star inside a circle) and peripheral distractors (white outlined squares). This screen was briefly flashed (with the timing determined adaptively based on participant performance). When the response screen appeared the participant had to make two responses: they had to indicate (with a key press) whether the smiley had long or short hair and they had to indicate (by clicking) on which of the 8 radial spokes the target stimulus appeared. They then received feedback about both responses (here they chose the correct answer for the central task, but the incorrect answer for the peripheral task).
Figure 4. UFOV Results. As is clear from the histogram of subject performance, not only could the vast majority of the participants perform the task as instructed (~1% removed for poor/outlier performance), but mean performance was also squarely in the range expected from lab-based measures of the exact same task9.
Figure 5. MOT Results. Consistent with previous work10, MOT accuracy fell off smoothly with increasing set size.
Online data collection has a number of advantages over standard laboratory-based data collection. These include the potential to sample far more representative populations than the typical college undergraduate pool utilized in the field, and the ability to obtain far greater sample sizes in less time than it takes to obtain sample sizes that are an order of magnitude smaller in the lab1-6 (e.g., the data points collected from 1,700+ participants in the current paper were obtained in less than one week).
The described online methods were able to replicate results obtained from previously conducted lab-based studies: calculated means and ranges for UFOV thresholds and MOT accuracy in the online tasks were comparable to results reported by Dye and Bavelier9 for the UFOV task and Green and Bavelier10 for the MOT task. However, the large participant sample did have an impact on the distribution of the results, particularly in the UFOV task. The online UFOV threshold distribution was more right-skewed than previous laboratory-based results9. This difference in skew may be attributed to the greater diversity of participants recruited online, particularly with regard to the wider variation in age: the online sample ranged from 18 – 70 years, while the laboratory-based sample ranged from 18 – 22 years9.
Furthermore, collecting data via online methods does require solving several technical challenges – particularly when close stimulus control is necessary for the validity of the measures. The two tasks employed here required control over both the visual angle of the stimuli that were presented and the timing of the stimuli. Visual angle in particular can be difficult to control in online settings as its calculation requires knowing viewing distance, monitor size, and screen resolution. This is particularly problematic given that many online participants may not know their monitor size or have easy access to a tape measure with which to measure it.
We devised a series of steps to overcome some of these issues. While the calibration procedure allows us to resolve monitor size accurately, we still cannot precisely control the actual viewing distance. We instructed participants to sit at arm’s length from the monitor, although this distance may vary among participants. Arm length was chosen because U.S. anthropometric data indicate that the difference in forward arm reach (the position participants would use to judge their distance from the screen) between male and female adults is small: the median male reach is 63.8 cm, while the median female reach is 62.5 cm11. Although the experiment setup procedure attempts to avoid introducing sex biases by using this measurement, there may be potential height biases; future studies that collect participants’ height information would need to be conducted to assess this possibility.
As for stimulus timing, we took into account the discrepancies between expected duration and recorded duration of stimulus presentation when calculating threshold values. Rather than relying on the expected presentation duration, we measured the duration of the stimulus frames using the participant’s system clock with millisecond precision. However, inherent disparities between monitor displays were still present and cannot be controlled for without physical in situ measurements. It is well known that Liquid Crystal Displays (LCDs) – the most likely monitors our participants have access to – have long response times that typically vary depending on the start and end values of the pixel luminance changes. The latter issue is not a concern in our study because we always switched from the same background level to the same stimulus level. A greater concern is that variability in displays across participants could cause a large portion of the measured variance. We believe that this is not an issue, as pixel response times are typically shorter than a single frame (i.e., ~17 msec at a 60 Hz refresh rate)12,13, which is acceptable in comparison to the large inter-individual variability in UFOV thresholds.
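One reason requested and measured durations diverge is that presentation times are quantized to whole refresh frames. A minimal sketch of this arithmetic, assuming a 60 Hz display (the helper names are ours, not the application's):

```python
def frames_for_duration(requested_ms: float, refresh_hz: float = 60.0) -> int:
    """Whole number of refresh frames closest to the requested duration
    (at least one frame is always shown)."""
    frame_ms = 1000.0 / refresh_hz
    return max(1, round(requested_ms / frame_ms))

def actual_duration_ms(requested_ms: float, refresh_hz: float = 60.0) -> float:
    """Duration actually displayed once quantized to whole frames."""
    return frames_for_duration(requested_ms, refresh_hz) * (1000.0 / refresh_hz)
```

For instance, a requested 45 msec presentation at 60 Hz occupies 3 frames and therefore lasts about 50 msec on screen, which is why the measured duration, not the requested one, must enter the threshold calculation.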
The methods employed here overcome the aforementioned challenges and thus allowed us to measure performance on two tasks – the UFOV and the MOT – that both require control over visual angle and screen timing properties. The results obtained by these methods were consistent with those obtained in standard laboratory settings, thus demonstrating their validity. Additionally, because these tasks require only an internet connection and an HTML5-compatible browser, they can be employed not only to easily gather a large sample from a generally representative population, but also to reach specific sub-types of individuals who may be geographically dispersed and thus difficult to bring to a common lab setting (e.g., patients with a certain type of disease or individuals with a certain ethnic background). Furthermore, with the rising use of iPads and other tablets, the design of the web application could easily be adapted for better compatibility with touchscreen technology in order to reach an even greater number of participants. While the web application can currently run on tablets via an HTML5 browser, future iterations could remove the requirement of a keyboard and replace response keys with interface buttons or gestures.
The authors have nothing to disclose.
Name of Reagent/Equipment | Company | Catalog Number | Comments/Description
Computer/tablet | N/A | N/A | Must have an internet connection and an HTML5-compatible browser
CD or credit card | N/A | N/A | Not needed if the participant already knows the monitor size