Here we present the Deese, Roediger and McDermott (DRM) task, a tool to study false memories in the laboratory. Subjects study lists of semantically related words (e.g., nurse, sick, etc.), and later falsely remember an unstudied word (doctor) that represents the gist, or theme, of the word list.
The Deese, Roediger and McDermott (DRM) task is a false memory paradigm in which subjects are presented with lists of semantically related words (e.g., nurse, hospital, etc.) at encoding. After a delay, subjects are asked to recall or recognize these words. In the recognition memory version of the task, subjects are asked whether they remember previously presented words, as well as related (but never presented) critical lure words ('doctor'). Typically, the critical word is recognized with high probability and confidence. This false memory effect has been robustly demonstrated across short (e.g., immediate, 20 min) and long (e.g., 1, 7, 60 d) delays between encoding and memory testing. A strength of using this task to study false memory is its simplicity and short duration. If encoding and retrieval components of the task occur in the same session, the entire task can take as little as 2 – 30 min. However, although the DRM task is widely considered a 'false memory' paradigm, some researchers consider DRM illusions to be based on the activation of semantic memory networks in the brain, and argue that such semantic gist-based false memory errors may actually be useful in some scenarios (e.g., remembering the forest for the trees; remembering that a word list was about "doctors", even though the actual word "doctor" was never presented for study). Remembering the gist of experience (instead of or along with individual details) is arguably an adaptive process and this task has provided a great deal of knowledge about the constructive, adaptive nature of memory. Therefore, researchers should use caution when discussing the overall reach and implications of their experiments when using this task to study 'false memory', as DRM memory errors may not adequately reflect false memories in the real world, such as false memory in eyewitness testimony, or false memories of sexual abuse.
The Deese, Roediger and McDermott (DRM) task was initially created by Deese1, and later revitalized by Roediger and McDermott2 as a convenient means of studying false memory in the laboratory. Although some3,4 argue it should be called the DRMRS task, for the contributions of Read5 and Solso6, the most common name in the literature is the DRM task, and we call it by that name here. After the seminal paper published by Roediger and McDermott2, interest in false memory research skyrocketed (see7), resulting in over 2,800 citations of that article to date. According to Roediger and McDermott, they revived the experimental design created by Deese because there was no reliable laboratory paradigm to induce false recall, while evidence of false recognition (e.g.,8,9) did "little to discourage the belief that more natural, coherent materials are needed to demonstrate powerful false memory effects"2.
One such example of a "more natural" paradigm is the misinformation paradigm10,11. In this task, subjects are presented with a story through pictures, slides, or video. Later, misleading information is provided, and the question is whether subjects will incorporate this misleading information into their recollection of the story. The DRM task is simpler than the misinformation paradigm in several respects. DRM encoding requires only the quick presentation and learning of lists of words, either visually or aurally. Retrieval testing for the DRM task is equally convenient regardless of the particular method used. In a recognition test participants are presented with a subset of the encoded words, the critical lure words (e.g., 'doctor'), and unrelated lure words and have to make simple judgments of whether they remember each word or not, whereas in a recall test, participants have to write down all the words they are able to remember. In contrast, free recall testing for the misinformation paradigm is impractical, as it requires time-consuming content analysis. Additionally, the DRM task does not require any manipulation between encoding and testing, as DRM 'false memories' are spontaneously self-generated. The misinformation errors, on the other hand, are induced via external suggestions. Although both the DRM and misinformation paradigms are argued to assess false memory, newer studies have found small (r = 0.12)12 or no relationship13,14 between the misinformation and the DRM effects, suggesting that different mechanisms may be at play for each type of false memory. Moreover, the DRM illusions are argued to be a byproduct of the constructive nature of memory15, which can be considered an evolutionarily adaptive process16.
The DRM false memory effect is highly robust across studies (for quantitative reviews see 17,18), and there is considerable evidence that the DRM task is quite reliable19 (but see20). The DRM false memory effect has been found using various delay intervals, including those as short as an immediate test, and those delaying memory testing until 60 days later21,22,23 (but see 24). Warning subjects of the DRM illusion reduces, but does not erase, the effect 14,25. The DRM effect has also been found with different encoding strategies, such as changes in word presentation duration26, and can be increased by several post-encoding manipulations, such as sleep27 or stress28.
Moreover, the DRM task has been utilized by many laboratories to study false memory formation in a variety of subject populations, such as children29,30,31,32 and older adults33, and in a variety of research fields, including individual cognitive (e.g., working memory20,34) and personality differences35, neuroimaging36,37, and neuropsychology38. In spite of its popularity, however, many have argued against the generalizability of the DRM task, and whether the creation of DRM false memories is comparable to the naturalistic creation of false autobiographical memories outside of the laboratory, such as memories of child abuse recovered in psychotherapy39,40,41. Nonetheless, several studies have found that subjects that are more susceptible to DRM false memories are also more prone to autobiographical memory distortions42, fantastic autobiographical memories (alien abductions43; past lives44), and recovered autobiographical memories45.
In short, the DRM task has been a useful tool to investigate the neurocognitive underpinnings of the (re)constructive nature of memory15,16, regardless of the ongoing debate about how appropriate and relevant it is in the study of autobiographical false memories7. In the current report, the DRM task procedures are explained in their simplest form, with a focus on targeting memory consolidation processes (i.e. experimental manipulations, such as sleep and stress, occur after encoding has finished and are thus used as tools to evaluate consolidation), as this has been the focus in our laboratory. The authors refer the reader to Gallo (2013)46 for an excellent review of the DRM task, along with the different variations on encoding and testing procedures.
The Institutional Review Board of the University of Notre Dame approved all of the procedures, including use of human subjects, discussed here. The preparation and the administration of the DRM task materials described below were used in a published study28, in which the effects of psychosocial stress following DRM word list encoding were assessed 24 h later.
1. Preparation of DRM Task
2. Administration of DRM Task
Using the procedure presented here, the authors have been able to reliably produce the DRM effect in two independent experiments; that is, subjects recall and recognize, with high probability, non-presented critical words that can be considered false memories for the 'gist' of the word lists.
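As an illustration of how such recall data might be scored, the following minimal Python sketch counts presented words (true recall), critical words (false recall), and other intrusions in one subject's free recall output. The function name, the dictionary layout, and the shortened example lists are assumptions made for illustration; they are not the authors' scoring procedure.

```python
def score_free_recall(recalled, studied_lists):
    """Score one subject's free recall output.

    recalled: iterable of words the subject wrote down.
    studied_lists: dict mapping each critical (never presented) word
        to the list items that were presented at encoding.
    """
    # Normalize case/whitespace and ignore repeated responses.
    responses = {w.strip().lower() for w in recalled if w.strip()}
    presented = {item.lower() for items in studied_lists.values() for item in items}
    critical = {word.lower() for word in studied_lists}

    true_hits = responses & presented          # studied words recalled
    false_hits = responses & critical          # critical words recalled
    other = responses - presented - critical   # any other intrusion

    return {
        "true_recall": len(true_hits) / len(presented),
        "false_recall": len(false_hits) / len(critical),
        "other_intrusions": len(other),
    }


# Hypothetical, shortened lists and responses (for illustration only).
example_lists = {
    "DOCTOR": ["nurse", "sick", "lawyer"],
    "SLEEP": ["bed", "rest", "awake"],
}
example_scores = score_free_recall(
    ["nurse", "doctor", "bed", "Rest", "pizza"], example_lists
)
```

Expressing both measures as proportions (of all presented words and of all critical words, respectively) places true and false recall on the same scale, which is what the comparisons reported below require.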
Results for experiment 1 (see Figures 1 and 2) have been published elsewhere28. In that experiment, 67 subjects arrived at the laboratory, listened (through headphones) to 15 DRM word lists and then underwent either a psychosocial stress task involving public speaking (Trier Social Stress Test) or a control version of the task. Subjects returned 24 h later to complete the free recall test, immediately followed by the recognition test, as described above. Of relevance to the current report, the overall proportion of words recalled and recognized was higher for critical words (false recall M = 0.20, false recognition M = 0.71) than for presented words (true recall M = 0.09, true recognition M = 0.65): t(66) = 8.61, p <0.001, Cohen's d = 1.22 for recall [Figure 1]; t(66) = 2.42, p = 0.02, Cohen's d = 0.29 for recognition [Figure 2]. Importantly, recognition of critical words was also significantly higher than recognition for unrelated foil words (M = 0.36), t(66) = 12.88, p <0.001, Cohen's d = 1.57.
Figure 1: Recall Rates from Experiment 1 (Pardilla-Delgado et al., 2016)28. Bars represent means and error bars represent standard error of the mean. Relevant to this report, the overall memory for false recall (last two bars) is significantly higher than memory for true recall. *** p <0.001.
Figure 2: Recognition Rates from Experiment 1 (Pardilla-Delgado et al., 2016)28. Bars represent means and error bars represent standard error of the mean. Relevant to this report, the overall memory for false recognition is significantly higher than memory for true and foil recognition (last three bars). * p <0.05; *** p <0.001.
Similar results were obtained in experiment 249 (see Figures 3 and 4). In that study, 117 subjects encoded 16 DRM word lists either at night, before going to sleep, or during the morning, prior to a period of wakefulness. Subjects returned 24 or 48 h later to complete the free recall test followed by the recognition test. The overall proportion of words recalled and recognized was higher for critical words (false recall M = 0.20, false recognition M = 0.72) than for presented words (true recall M = 0.09, true recognition M = 0.65): t(116) = 12.4, p <0.001, Cohen's d = 1.36 for recall [Figure 3]; t(116) = 3.66, p <0.001, Cohen's d = 0.39 for recognition [Figure 4]. Importantly, recognition of critical words was also significantly higher than recognition for unrelated foil words (M = 0.37), t(116) = 15.68, p <0.001, Cohen's d = 1.44.
Figure 3: Recall Rates from Experiment 2 (Pardilla-Delgado et al., 2017)49. Bars represent means and error bars represent standard error of the mean. Groups: S24: Sleep 1st/24 h delay, W24: Wake 1st/24 h delay, S48: Sleep 1st/48 h delay, W48: Wake 1st/48 h delay. Relevant to this report, the overall memory for false recall (last two bars) is significantly higher than memory for true recall. *** p <0.001.
Figure 4: Recognition Rates from Experiment 2 (Pardilla-Delgado et al., 2017)49. Bars represent means and error bars represent standard error of the mean. Groups: S24: Sleep 1st/24 h delay, W24: Wake 1st/24 h delay, S48: Sleep 1st/48 h delay, W48: Wake 1st/48 h delay. Relevant to this report, the overall memory for false recognition (last three bars) is significantly higher than memory for true and foil recognition. *** p <0.001.
The fact that, in these two independent studies conducted in our laboratory, false memories (critical words) were remembered proportionally more often than true memories (studied words) 24 and 48 h after encoding is consistent with early studies that showed a similar false memory persistence effect over long delays21,22,23. These results underscore the efficacy of the DRM task in eliciting false memories across lengthy delay intervals, at least as false memories can be broadly defined as remembered events that were not actually experienced by the subject.
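The within-subject comparisons reported above (critical versus presented words) correspond to paired t-tests. The sketch below shows one way to compute the t statistic and a paired-samples effect size with the Python standard library only. The per-subject proportions are hypothetical, and the effect size shown (d_z, the mean difference divided by the standard deviation of the differences) is only one of several conventions for Cohen's d; it is not necessarily the one used for the values reported here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_comparison(false_rates, true_rates):
    """Paired t statistic and d_z for per-subject proportions."""
    if len(false_rates) != len(true_rates):
        raise ValueError("Each subject needs one false and one true rate.")
    diffs = [f - t for f, t in zip(false_rates, true_rates)]
    n = len(diffs)
    sd_diff = stdev(diffs)                     # sample SD of the differences
    t_value = mean(diffs) / (sd_diff / sqrt(n))
    d_z = mean(diffs) / sd_diff                # paired-samples effect size
    return {"t": t_value, "df": n - 1, "d_z": d_z}


# Hypothetical per-subject recall proportions (not data from the studies).
false_recall = [0.30, 0.20, 0.40, 0.10]
true_recall = [0.10, 0.10, 0.20, 0.10]
result = paired_comparison(false_recall, true_recall)
```

The p value can then be obtained from the t distribution with `df` degrees of freedom using any statistics package.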
Critical word | Probability of false recall (%) |
WINDOW | 65 |
SLEEP | 61 |
SMELL | 60 |
DOCTOR | 60 |
SWEET | 54 |
CHAIR | 54 |
SMOKE | 54 |
ROUGH | 53 |
NEEDLE | 52 |
ANGER | 49 |
TRASH | 49 |
SOFT | 46 |
CITY | 46 |
CUP | 45 |
COLD | 42 |
MOUNTAIN | 42 |
SLOW | 42 |
RIVER | 42 |
Table 1: Probability of false recall ranked from highest to lowest according to Stadler et al., 199947. Stadler and colleagues47 found that the lists associated with these critical words have the highest probability of producing a false memory in a free recall test. Presented here are the critical words only (i.e. the nonpresented word that is falsely remembered at retrieval testing). See the Appendix for each complete list.
Critical word | Probability of false recognition (%) |
WINDOW | 84 |
SMELL | 84 |
COLD | 84 |
ROUGH | 83 |
CUP | 82 |
SOFT | 81 |
SLEEP | 80 |
ANGER | 79 |
SWEET | 78 |
TRASH | 78 |
CHAIR | 74 |
SMOKE | 73 |
HIGH | 72 |
DOCTOR | 71 |
THIEF | 70 |
MOUNTAIN | 69 |
SLOW | 69 |
MUSIC | 69 |
Table 2: Probability of false recognition ranked from highest to lowest according to Stadler et al., 199947. Stadler and colleagues47 found that the lists associated with these critical lure words have the highest probability of producing a false memory in an old/new recognition test. Presented here are the critical lures only (i.e. the nonpresented word that is falsely remembered at retrieval testing). See the Appendix for each complete list.
Critical words (in alphabetical order) with list items (ranked by associative strength) for the top 18 lists for free recall | ||||||||
ANGER | CHAIR | CITY | COLD | CUP | DOCTOR | MOUNTAIN | NEEDLE | ROUGH |
mad | table | town | hot | mug | nurse | hill | thread | smooth |
fear | sit | crowded | snow | saucer | sick | valley | pine | bumpy |
hate | legs | state | warm | tea | lawyer | climb | eye | road |
rage | seat | capital | winter | measuring | medicine | summit | sewing | tough |
temper | couch | streets | ice | coaster | health | top | sharp | sandpaper |
fury | desk | subway | wet | lid | hospital | molehill | point | jagged |
ire | recliner | country | frigid | handle | dentist | peak | prick | ruddy |
wrath | sofa | New York | chilly | coffee | physician | plain | thimble | coarse |
happy | wood | village | heat | straw | ill | glacier | haystack | uneven |
fight | cushion | metropolis | weather | goblet | patient | goat | thorn | riders |
hatred | swivel | big | freeze | soup | office | bike | hurt | rugged |
mean | stool | Chicago | air | stein | stethoscope | climber | injection | sand |
calm | sitting | suburb | shiver | drink | surgeon | range | syringe | boards |
emotion | rocking | county | Arctic | plastic | clinic | steep | cloth | ground |
enrage | bench | urban | frost | sip | cure | ski | knitting | gravel |
RIVER | SLEEP | SLOW | SMELL | SMOKE | SOFT | SWEET | TRASH | WINDOW |
water | bed | fast | nose | cigarette | hard | sour | garbage | door |
stream | rest | lethargic | breathe | puff | light | candy | waste | glass |
lake | awake | stop | sniff | blaze | furry | sugar | can | pane |
Mississippi | tired | listless | aroma | billows | pillow | bitter | refuse | shade |
boat | dream | snail | hear | pollution | plush | good | sewage | ledge |
tide | wake | cautious | see | ashes | loud | taste | bag | sill |
swim | snooze | delay | nostril | cigar | cotton | tooth | junk | house |
flow | blanket | traffic | whiff | chimney | fur | nice | rubbish | open |
run | doze | turtle | scent | fire | touch | honey | sweep | curtain |
barge | slumber | hesitant | reek | tobacco | fluffy | soda | scraps | frame |
creek | snore | speed | stench | stink | feather | chocolate | pile | view |
brook | nap | quick | fragrance | pipe | downy | heart | dump | breeze |
fish | peace | sluggish | perfume | lungs | kitten | cake | landfill | sash |
bridge | yawn | wait | salts | flames | skin | tart | debris | screen |
winding | drowsy | molasses | rose | stain | tender | pie | litter | shutter |
Appendix: Critical Words (in Alphabetical Order) with List Items (Ranked by Associative Strength) for Top 18 Lists for Free Recall. Bold words at top represent the 'gist' of the list and are considered the critical words (false memories); these words are not presented at encoding.
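For readers who assemble their materials programmatically, the following Python sketch shows one possible way to build an old/new recognition test from lists like those above: a subset of studied items, the critical lure of each studied list, and foils drawn from a non-studied list. The function name, the labels, the serial positions sampled (1, 8, and 10), and the choice of RIVER as the non-studied list are all illustrative assumptions, not the authors' specification.

```python
import random

# Two complete lists copied from the Appendix (critical word -> list items).
DOCTOR = ["nurse", "sick", "lawyer", "medicine", "health", "hospital",
          "dentist", "physician", "ill", "patient", "office", "stethoscope",
          "surgeon", "clinic", "cure"]
RIVER = ["water", "stream", "lake", "Mississippi", "boat", "tide", "swim",
         "flow", "run", "barge", "creek", "brook", "fish", "bridge", "winding"]


def build_recognition_test(studied, unstudied, positions=(1, 8, 10), seed=0):
    """Return a shuffled list of (word, item_type) pairs.

    positions: 1-based serial positions sampled from each list
        (a hypothetical choice made for this sketch).
    """
    test = []
    for critical, items in studied.items():
        test += [(items[p - 1], "studied") for p in positions]
        test.append((critical.lower(), "critical_lure"))
    for critical, items in unstudied.items():
        test += [(items[p - 1], "unrelated_item_foil") for p in positions]
        test.append((critical.lower(), "unrelated_critical_foil"))
    random.Random(seed).shuffle(test)  # fixed seed gives a reproducible order
    return test


test_items = build_recognition_test({"DOCTOR": DOCTOR}, {"RIVER": RIVER})
```

Keeping the item type alongside each word makes it straightforward to compute separate "old" response rates for studied words, critical lures, and foils at the analysis stage.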
In this report, the authors described a widely used cognitive task that reliably produces gist-based false memories in human subjects. It is important to note that, in the current report, the DRM task was presented in one of its simplest forms, very similar to the original protocol used by Deese1 and Roediger and McDermott2. The experiments described here depart from the original protocol in one particular respect: a long delay (24 or 48 h) between encoding and testing, which is useful when testing the persistence of false memories over true memories22 or following manipulations that can affect memory consolidation, such as sleep27,49 and stress28. Related to this issue, in the current experiments, the recognition test was administered immediately after the free recall test, which has been found to increase recognition rates2,18; therefore, we caution the reader to interpret our recognition data accordingly. Additionally, although several early studies2, as well as the studies presented here, suggest that critical words (false memories) are consistently remembered better than study words (true memories), others have shown the opposite pattern, particularly for short-term memory tests17,18.
The DRM task has multiple modifications (for review, see46), ranging from, but not limited to: 1) changes to encoding processing, such as warnings about the effect25, relational and associative processing instructions50,51, priming52, incidental encoding53, and rapid word presentation54; 2) changes to the testing method, such as forced-choice tests55, speeded recognition tests56, and recollection vs. familiarity judgments2; and 3) changes to critical word features, such as using taboo words57, long words58, and concrete words58.
There are several important factors researchers should consider when using the DRM paradigm. Here, we recommend using the work from Stadler et al. (1999)47 in order to choose the word lists to be presented at encoding. In the study by Stadler and colleagues, subjects recalled each list immediately after listening to the words, whereas the recognition test was given after all lists had been presented and recalled. Therefore, recall and recognition rates may vary if longer retention intervals are used, as we have done in our laboratory. Our mean false recall rate was M = 0.20. Across shorter delays, like those used by Stadler et al.47, mean false recall rates can be higher (e.g., M = 0.51 for the top 18 lists47). Further, we recommend using auditory presentation, as it is the more common of the two modalities (visual or auditory). Visual presentation has also been shown to decrease the DRM effect60,61. Depending on the experimental question, and if a more detailed assessment of memory is desired, individual confidence ratings can be added at testing, to both recall and recognition tasks. For the recognition test, "remember" and "know" judgments62 can alternatively be used. In our studies, participants were given 10 min for the recall task to allow them to retrieve as many words as possible, because 1) 180 words were presented at encoding (15 lists; 12 words/list) and 2) there were 24 – 48 h intervals between encoding and testing, which was bound to reduce retention. Regarding statistical analysis, although in the current report we presented uncorrected recognition rates (for simplicity), the reader might consider signal detection methods to analyze the recognition test data63 (see Seamon et al.23 for a good example of signal detection methods with the DRM task).
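As a minimal sketch of the signal detection approach mentioned above, the Python function below computes sensitivity (d') and response criterion (c) from raw counts of "old" responses, using the standard library only. The counts are hypothetical, and the log-linear correction applied to the rates is an assumption of this sketch (one common way to avoid infinite z-scores when a rate equals 0 or 1), not a procedure taken from the studies described here.

```python
from statistics import NormalDist


def rates_to_dprime(n_old_targets, n_targets, n_old_foils, n_foils):
    """Return (d_prime, criterion_c) from raw counts of "old" responses."""
    # Log-linear correction (an assumption of this sketch): prevents rates
    # of exactly 0 or 1, which would give infinite z-scores.
    hit_rate = (n_old_targets + 0.5) / (n_targets + 1)
    fa_rate = (n_old_foils + 0.5) / (n_foils + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c


# Hypothetical counts for one subject.
# "True" sensitivity: studied words versus unrelated foils.
true_d, true_c = rates_to_dprime(29, 45, 16, 45)
# "False" sensitivity: critical lures versus unrelated foils.
false_d, false_c = rates_to_dprime(11, 15, 16, 45)
```

Computing d' separately for studied words and for critical lures, each against the same foil false alarm rate, yields bias-adjusted estimates of true and false recognition that can be compared within subjects.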
Regarding tools and materials, in the event that experiment creation software is not available, word presentation for the recognition test can also be done using slideshow creation software (e.g., PowerPoint), while having subjects answer old/new on a sheet of paper.
One particularly important factor to keep in mind for future experiments is that increasing the number of semantically related words in each list boosts the false memory effect64. To increase the probability of false recall/recognition, experimenters should therefore present as many words as possible from each list during encoding; see the Appendix for the complete word lists. Similarly, using too few word lists may decrease the ability to observe a clear effect, especially regarding correlations (i.e. statistically significant correlations are more difficult to observe when the range of a variable is small, as would be the case if few DRM word lists were included in a study27). Note that this recommendation runs counter to the suggestion, when recognition is the testing method, of withholding some list items at encoding in order to use them as non-presented foils during recognition testing. Related to this, it is suggested to include critical lures from non-studied DRM word lists in the to-be-recognized words18, because DRM critical lures have higher word frequencies and higher baseline false alarm rates than study (list) items2,65. This procedure represents a high-threshold correction that addresses response bias. Another possible procedure is using signal detection methods (see Seamon et al.23).
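The high-threshold correction described above amounts to subtracting a matched baseline false alarm rate from each "old" response rate. The Python sketch below illustrates this with hypothetical trial data; the item-type labels and function names are assumptions made for illustration.

```python
from collections import defaultdict


def old_rates(trials):
    """Proportion of "old" responses per item type.

    trials: iterable of (item_type, said_old) pairs for one subject.
    """
    counts = defaultdict(lambda: [0, 0])  # item_type -> [n_old, n_total]
    for item_type, said_old in trials:
        counts[item_type][0] += int(said_old)
        counts[item_type][1] += 1
    return {k: n_old / n_total for k, (n_old, n_total) in counts.items()}


def high_threshold_correction(rates):
    """Subtract the matched baseline false alarm rate from each rate."""
    return {
        # Studied words minus list items from non-studied lists.
        "corrected_true": rates["studied"] - rates["unrelated_item_foil"],
        # Critical lures minus critical lures from non-studied lists.
        "corrected_false": rates["critical_lure"] - rates["unrelated_critical_foil"],
    }


# Hypothetical responses for one subject (True = responded "old").
example_trials = [
    ("studied", True), ("studied", True), ("studied", True), ("studied", False),
    ("unrelated_item_foil", True), ("unrelated_item_foil", False),
    ("unrelated_item_foil", False), ("unrelated_item_foil", False),
    ("critical_lure", True), ("critical_lure", True),
    ("unrelated_critical_foil", True), ("unrelated_critical_foil", False),
]
corrected = high_threshold_correction(old_rates(example_trials))
```

Using critical lures from non-studied lists as the baseline for false recognition matters because, as noted above, critical lures tend to attract more false alarms than ordinary list items even when their lists were never studied.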
The DRM task is not without its limitations. Some have argued that the simple gist-based errors caused by the DRM task are related to spreading activation in semantic memory networks in the brain and may not be comparable to false autobiographical memories, such as the "recovered" memories of child abuse resulting from psychotherapy41. Although addressing this decades-long question is outside the scope of this report, the authors agree with Gallo that "the appropriate questions to ask are what aspects of the DRM illusion are relevant to what aspects of autobiographical memories" (p. 834)7. Related to this dilemma, using the original DRM task, as described in the current report, can result in ambiguous interpretations because there are several activation/monitoring processes that govern this type of gist-based false memory formation66. Broadly speaking, future applications of the DRM task should continue addressing the reconstructive nature of memory, and more specifically, the transformation of single episodes into generalizable, flexible, and useful gist abstractions. Regardless of the research question, caution is always warranted when generalizing the results of studies using the DRM false memory task to other, real-world, forms of false memory, as the DRM task is a humble cognitive paradigm, yet one with great research potential.
The authors have nothing to disclose.
The authors thank all members of the Sleep, Stress, and Memory Lab for their help in data collection, particularly Stephen M. Mattingly for proofreading the final manuscript.
Name | Company | Catalog number / URL | Comments |
Computer | No particular brand/type required. | | |
Headphones | No particular brand/type required. | | |
RODE NT1-A 1" cardioid condenser microphone | Rode | http://www.rode.com/microphones/nt1-a | recording equipment used to record the wordlists |
Audacity | Audacity | http://www.audacityteam.org/ | for editing the recording of the wordlists |
E-Prime | Psychology Software Tools, Inc. | https://www.pstnet.com/eprime.cfm | for stimuli presentation and/or testing |
MS PowerPoint (optional) | Microsoft | | for stimuli presentation and/or testing |
MS Word (optional) | Microsoft | | for free recall testing. Any word processor application will work. |