A cost-benefit analysis is a weighing-scale approach that the brain performs during the course of decision making. Here, we propose a protocol to train rats on an operant-based decision-making paradigm where rats choose higher rewards at the expense of waiting for 15 s to receive them.
Reinforcement-guided decision making is the ability to choose between competing courses of action based on the relative value of their benefits and consequences. This process is integral to normal human behavior and has been shown to be disrupted by neurological and psychiatric disorders such as addiction, schizophrenia, and depression. Rodents have long been used to uncover the neurobiology of human cognition. To this end, several behavioral tasks have been developed; however, most are non-automated and labor-intensive. The recent development of open-source microcontrollers has enabled researchers to automate operant-based tasks for assessing a variety of cognitive functions, standardizing stimulus presentation, improving data recording, and consequently improving research output. Here, we describe an automated delay-based reinforcement-guided decision-making task, using an operant T-maze controlled by custom-written software programs. Using this task, we show changes in local field potential (LFP) activity in the anterior cingulate cortex of a rat while it performs a delay-based cost-benefit decision-making task.
Decision making is the process of recognizing and selecting choices based on the values and preferences of the decision maker and the consequences of the selected action1. Although decision making has been extensively studied in different fields (e.g., economics, psychology, and neuroscience), the neural mechanisms underlying this cognitive ability are not yet fully understood. Two subcategories of decision making are perceptual decision making and reinforcement-guided decision making. Although they share considerable overlapping elements and concepts, perceptual decision making relies on the available sensory information1,2, whereas reinforcement-guided decision making deals with the relative value of actions gained over a specific timescale3. One important aspect of reinforced decision making is the cost-benefit analysis, which the brain performs intuitively by computing the benefits of the given choices and subtracting the costs associated with each alternative1.
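The subtractive cost-benefit computation described above can be illustrated with a toy model. This is a minimal sketch, not the authors' model. It assumes that each option's net value is its benefit minus a cost that grows linearly with delay. The reward values and the per-second delay cost (`cost_per_s`) are hypothetical numbers chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    benefit: float  # reward value (hypothetical units, e.g., pellets)
    delay_s: float  # waiting time before the reward becomes available


def net_value(option: Option, cost_per_s: float) -> float:
    """Benefit minus a delay cost assumed (hypothetically) to grow linearly with waiting time."""
    return option.benefit - cost_per_s * option.delay_s


def choose(options: list, cost_per_s: float) -> Option:
    """Return the option with the largest net value (the first one listed wins a tie)."""
    return max(options, key=lambda o: net_value(o, cost_per_s))


if __name__ == "__main__":
    hra = Option("HRA", benefit=4.0, delay_s=15.0)  # hypothetical: larger reward after 15 s
    lra = Option("LRA", benefit=2.0, delay_s=0.0)   # hypothetical: smaller reward, no delay
    for cost in (0.05, 0.2):
        print(cost, choose([hra, lra], cost).name)
```

With these made-up numbers, a low delay cost (0.05 per second) favors the HRA, with a net value of 3.25 versus 2.0. A higher delay cost (0.2 per second) tips the choice to the LRA, at 1.0 versus 2.0. So a change in either the benefit term or the cost term can flip the same choice.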
The T-maze (or its variant, the Y-maze) is one of the most widely used mazes in cognitive experiments with rodents. Animals are placed in the start arm (the base of the T) and allowed to choose a goal arm (one of the side arms). Tasks such as forced alternation and left-right discrimination are mainly used with rodents in the T-maze to test reference and working memory4. T-mazes are also widely used in decision-making experiments5,6,7. In the simplest design, the reward is placed in only one goal arm. The choice is then predictable: animals will reliably prefer the reward over nothing, regardless of the reward value. Another option is to place rewards in both goal arms and let the animals choose which path to take depending on several parameters (e.g., the natural preference of the animal, the difference in the value of the rewards, and the costs to be paid). The value-based design makes the task more complex by giving it weighing-scale properties. The animal chooses between two differently valued rewards and, at the same time, between the costs of the actions [i.e., the amount of waiting (delay-based) or the amount of effort (effort-based) needed to receive the rewards], each of which contributes to the decision that is made5,6.
In traditional delay-based T-maze decision making, animals are trained to select the high reward arm (HRA) and avoid the opposite low reward arm (LRA). The sides of the HRA and the LRA remain unchanged throughout the experiment. Although this task has been well documented in the literature, it suffers from several procedural drawbacks. Firstly, because the goal arm is fixed, the animal knows which arm to choose from the beginning of each trial. In this scenario, animals may select the goal arm based on memory rather than on decision making. Hence, in a delay-based decision-making paradigm, if an animal selects the low reward after the study intervention, it is not clear whether this reflects a loss of memory or an effect of the intervention on decision making. A memory control group could be added to separate the observed behavior from memory problems, but this burdens researchers and animals alike because of the additional work7. A second concern is identifying the moment of decision making. When animals reach the decision zone (the junction of all three arms), they usually look left and right, weigh the costs and benefits of each arm, and then make their decision. However, after a few trials, they perform this computation before arriving at the decision zone and simply run directly to the reward arm. These two drawbacks (a pre-bias toward one arm and the difficulty of identifying the moment of decision making) seriously complicate the interpretation of electrophysiological and neuroimaging data.
In the method described in this paper, the preferred arm (HRA) is indicated by an auditory cue and can change from trial to trial. Animals initiate each trial by entering the test zone (Figure 1) and triggering the auditory cue by "nose-poking" an infrared gate placed at the junction of the three arms. The audio signal (20 dB, between 500 and 1,000 ms) is played from a speaker at the end of the goal arm.
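Because the HRA side can change from trial to trial, it has to be scheduled in some way. The protocol does not say how the schedule is generated, so the rule below is an assumption for illustration only. It requires equal numbers of left and right trials and no more than `max_run` consecutive trials cued on the same side. The function name and the default of 3 are hypothetical. This is a minimal sketch of one way to generate such a schedule, not the scheduling used in the authors' Arduino code.

```python
import random
from typing import List, Optional


def hra_sequence(n_trials: int, max_run: int = 3, seed: Optional[int] = None) -> List[str]:
    """Return a pseudorandom list of cued HRA sides ('L' or 'R').

    Hypothetical scheduling rule (not specified in the protocol): equal numbers
    of left and right trials and no more than `max_run` consecutive trials cued
    on the same side.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even for a balanced sequence")
    if max_run < 2:
        raise ValueError("max_run must be at least 2 for rejection sampling to be practical")
    rng = random.Random(seed)
    seq = ["L"] * (n_trials // 2) + ["R"] * (n_trials // 2)
    while True:
        # Rejection sampling: reshuffle until no run is longer than max_run.
        rng.shuffle(seq)
        run, ok = 1, True
        for prev, cur in zip(seq, seq[1:]):
            run = run + 1 if cur == prev else 1
            if run > max_run:
                ok = False
                break
        if ok:
            return list(seq)


if __name__ == "__main__":
    print("".join(hra_sequence(20, seed=1)))
```

Rejection sampling keeps the code short and gives every valid sequence the same chance. For session-sized trial counts it usually succeeds within a handful of shuffles.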
All procedures described here were approved by the Florey Institute Animal Ethics Committee or the Neuroscience Research Center and were carried out in accordance with the Guide for the Care and Use of Laboratory Animals.
1. Housing, Handling, and Food Restriction
2. Experimental Set-up
3. Habituation to the Maze
4. Discrimination Training
5. Delay Training
6. Electrophysiology (Electrode Fabrication)
7. Anesthesia
8. Surgical Procedure
9. Post-procedure Training
The data presented here are LFPs recorded from the left orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC) of six male Wistar rats using bipolar electrodes made of PFA-coated stainless-steel wire. Table 1 shows the duration of behavioral acquisition for each training stage. The coordinates for the target locations were determined from a rat brain atlas9 and are as follows:
- ACC: 1.2 mm anterior to bregma, 0.8 mm lateral to the midline, and 2 mm ventral to the skull.
- OFC: 3.5 mm anterior to bregma, 2.3 mm lateral to the midline, and 5.4 mm ventral to the skull.
The recordings were bandpass-filtered (0.01 – 250 Hz) to extract LFPs and then sampled at 1,000 Hz. Spectral analysis was performed on the LFPs using the multitaper method10. Five Slepian tapers and a time-bandwidth product of three were used to achieve optimal spectral concentration. Time-frequency spectrograms were estimated using a 300 ms sliding window that was shifted over the data in 5 ms steps. All spectrograms were baseline-normalized and converted to decibels. This improves observation of the task-dependent modulation of spectral power and attenuates the 1/f power-scaling problem. The conversion used $\mathrm{dB}_{t,f} = 10\log_{10}\left(S_{t,f}/mS_f\right)$, where $S_{t,f}$ is the spectrum at time $t$ and frequency $f$, and $mS_f$ is the mean spectrum over all baseline time points at frequency $f$11. The spectral powers were computed for four time windows:
- baseline: 300 ms before nose-poking
- stimulus: 100 ms
- pre-chamber: 300 ms before entering the chamber
- chamber: 600 ms

Statistical analysis was performed using a non-parametric permutation-based t-test.
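The analysis parameters above can be cross-checked with simple arithmetic, and the decibel normalization can be written out explicitly. The sketch below is plain Python for illustration only, not the Chronux code used for the analysis. The list-of-lists spectrogram layout and the function names are hypothetical.

Three checks follow from the parameters:
- For a multitaper estimate with time-bandwidth product TW = 3, the conventional upper limit on well-concentrated Slepian tapers is K = 2TW - 1 = 5, which matches the five tapers used.
- For a window of length T = 0.3 s, the half-bandwidth is W = TW/T = 10 Hz.
- At 1,000 Hz, the 300 ms window and 5 ms step correspond to 300 and 5 samples.

```python
import math
from typing import List, Sequence, Tuple

FS_HZ = 1000     # sampling rate (Hz), from the text
WIN_S = 0.300    # sliding-window length (s)
STEP_S = 0.005   # window step (s)
TW = 3           # time-bandwidth product


def taper_params(tw: float, win_s: float) -> Tuple[int, float]:
    """Return (K, W): the usual maximum taper count K = 2TW - 1 and half-bandwidth W = TW / T (Hz)."""
    return int(2 * tw - 1), tw / win_s


def window_starts(n_samples: int, fs: int = FS_HZ, win_s: float = WIN_S,
                  step_s: float = STEP_S) -> List[int]:
    """Sample index at which each complete sliding window starts."""
    win = round(win_s * fs)
    step = round(step_s * fs)
    return list(range(0, n_samples - win + 1, step))


def baseline_db(spec: Sequence[Sequence[float]], baseline_bins: Sequence[int]) -> List[List[float]]:
    """Convert spec[t][f] (power, > 0) to dB relative to the baseline mean at each frequency.

    dB[t][f] = 10 * log10(spec[t][f] / mS[f]), where mS[f] is the mean of
    spec[t][f] over the baseline time bins.
    """
    n_freqs = len(spec[0])
    m_s = [sum(spec[t][f] for t in baseline_bins) / len(baseline_bins) for f in range(n_freqs)]
    return [[10 * math.log10(row[f] / m_s[f]) for f in range(n_freqs)] for row in spec]


if __name__ == "__main__":
    k, w = taper_params(TW, WIN_S)
    print(k, w)                      # 5 tapers, ~10 Hz half-bandwidth
    print(len(window_starts(1000)))  # 141 windows in a 1 s epoch
```

Because the normalization divides by the baseline mean before taking the logarithm, a value of 0 dB means "no change from baseline". Positive and negative values are power increases and decreases, respectively.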
As shown in the top row of Figure 2A, there was a decrease in both low-frequency (4 – 12 Hz) and high-frequency (45 – 85 Hz) power in the ACC from the onset to the end of the stimulus. When the pre-chamber and chamber time windows were compared, the spectral analysis showed no changes in ACC oscillatory activity (Figure 2B, top row).
Low-frequency oscillations in the OFC also showed decreased spectral power as the animals approached the IR gate. However, this decrease began earlier (-80 ms) and lasted longer than the low-frequency decrease in the ACC (50 – 420 ms). Mid/high-frequency band activity (23 – 100 Hz) in the OFC increased following stimulus onset (Figure 2A, bottom row). No significant changes were observed when the pre-chamber and chamber time windows were compared (Figure 2B, bottom row). These results are in line with previous findings suggesting that the OFC and ACC are both involved in value-based decision making2,12,13.
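The statistics above rely on a non-parametric permutation-based t-test. The sketch below shows one common form of that idea for paired data, such as band power in the baseline versus the stimulus window on each trial. The sign of each paired difference is flipped at random, and the observed t statistic is compared with the resulting null distribution.

This is an illustration under those assumptions only. The text does not specify the exact permutation scheme, the pairing unit, or any correction for multiple comparisons across time-frequency bins used for Figure 2. The function names and default permutation count are hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Tuple


def t_stat(diffs: Sequence[float]) -> float:
    """One-sample t statistic of paired differences against zero."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)
    if var == 0.0:
        # Identical differences: infinite t unless they are all zero.
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / math.sqrt(var / n)


def sign_flip_test(a: Sequence[float], b: Sequence[float], n_perm: int = 5000,
                   seed: Optional[int] = None) -> Tuple[float, float]:
    """Two-sided paired permutation (sign-flip) t-test of mean(a - b) != 0.

    Under the null hypothesis the two conditions within each pair are
    exchangeable, so each paired difference may have its sign flipped.
    Returns (observed t, p-value); the observed labelling is counted as one
    permutation, so p is never exactly zero.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("a and b must be paired samples of equal length >= 2")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    t_obs = t_stat(diffs)
    extreme = 1  # the observed labelling itself
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(t_stat(flipped)) >= abs(t_obs):
            extreme += 1
    return t_obs, extreme / (n_perm + 1)
```

A typical use would pass per-trial band power in the stimulus window as `a` and per-trial baseline power as `b`.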
Figure 1: Schematic of a choice trial in a delay-based decision-making task. The maze measures 60 cm x 10 cm x 40 cm. The start box is connected to the start arm through a retractable door. Two further retractable doors (door-A and door-B) are placed in each goal arm; together they form a chamber that delays the animal's access to the reward. Door-A is placed 12.5 cm from the entry point of each arm, and door-B is placed just before the food well, 5 cm from the end of the arm. A raised metal food well, 3 cm in diameter, is placed at the far end of each goal arm, 2 cm above the maze floor.
The animal is placed in the start box and is allowed to approach and nose-poke the infrared gate (IRB-1) to trigger the auditory stimulus that cues the HRA (in this example, the right arm). The IRB-2L and IRB-2R beams timestamp the animal's choice. If the animal turns right, door-A is opened to let the animal enter the arm (chamber) and is closed immediately after the animal enters. After 15 s, door-B is opened to give the animal access to the reward. If the animal turns left, door-A on the left side is opened, and door-B is opened immediately after the animal enters the left chamber. The IRB-3L and IRB-3R beams timestamp the animal's entry into the chamber.
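The door logic in this caption can be restated as a small event replay. The maze itself is controlled by the Arduino code listed in the materials table, and this Python sketch is not derived from that code. It assumes that door-A opens when the animal crosses the IRB-2 beam on the chosen side and that beam-break times are given in seconds. The function name and event labels are hypothetical.

```python
from typing import List, Tuple

HRA_DELAY_S = 15.0  # delay before door-B opens in the high-reward arm (from the text)


def trial_events(cued_side: str, choice_side: str, poke_t: float,
                 choice_t: float, enter_t: float) -> List[Tuple[float, str]]:
    """Replay one trial's door logic from beam-break timestamps (seconds).

    poke_t:   IRB-1 nose-poke, which triggers the auditory cue for the HRA side
    choice_t: IRB-2L/IRB-2R crossing, which timestamps the choice
    enter_t:  IRB-3L/IRB-3R crossing, which timestamps chamber entry
    """
    if cued_side not in ("L", "R") or choice_side not in ("L", "R"):
        raise ValueError("sides must be 'L' or 'R'")
    if not poke_t <= choice_t <= enter_t:
        raise ValueError("expected poke_t <= choice_t <= enter_t")
    s = choice_side
    events = [
        (poke_t, f"IRB-1 nose-poke: play cue for HRA on {cued_side}"),
        (choice_t, f"IRB-2{s}: choice recorded, open door-A {s}"),  # assumed trigger for door-A
        (enter_t, f"IRB-3{s}: chamber entry recorded"),
    ]
    if choice_side == cued_side:
        # High-reward arm: close door-A behind the animal, then impose the delay.
        events.append((enter_t, f"close door-A {s}"))
        events.append((enter_t + HRA_DELAY_S, f"open door-B {s}"))
    else:
        # Low-reward arm: door-B opens as soon as the animal is in the chamber.
        events.append((enter_t, f"open door-B {s}"))
    return events
```

Every event carries a timestamp, so quantities such as choice latency (`choice_t - poke_t`) fall out of the same log used to align behavior with the LFP recording.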
Figure 2: Temporal and spectral dynamics of ACC and OFC neural activity. (A) Time-frequency plots of ACC (top row) and OFC (bottom row) neural activity during a successful high-reward discrimination. The spectral powers are normalized to the baseline time window: the post-poking spectrum is divided by the mean baseline spectrum, which is equivalent to a subtraction on the decibel scale. The value 0 on the abscissa denotes the onset of the auditory stimulus. (B) Time-frequency plots of ACC (top row) and OFC (bottom row) neural activity when the animal enters the chamber. The chamber time window is normalized by the pre-chamber time window. The value 0 on the abscissa denotes the time of opening door-A. The color bars depict the extent of the spectral changes on a decibel scale. The black rectangles mark significant deviations from chance level (p < 0.05, two-sided permutation test).
| Rat | Habituation | Discrimination training | Delay training (5 s) | Delay training (10 s) | Delay training (15 s) | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Rat 1 | 3 days | 15 days | 8 days | 6 days | 5 days | 37 days |
| Rat 2 | 3 days | 18 days | 9 days | 6 days | 5 days | 41 days |
| Rat 3 | 3 days | 13 days | 7 days | 5 days | 6 days | 34 days |
| Rat 4 | 3 days | 15 days | 9 days | 6 days | 6 days | 39 days |
| Rat 5 | 3 days | 17 days | 8 days | 7 days | 5 days | 40 days |
| Rat 6 | 3 days | 16 days | 7 days | 6 days | 6 days | 38 days |
Table 1: Behavioral variability and the time course of learning for 6 rats.
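The row totals in Table 1 can be checked, and the spread of each training stage summarized, with a few lines of code. The per-stage values below are transcribed directly from the table. Using the mean and the sample standard deviation as summary statistics is our choice for illustration.

```python
from statistics import mean, stdev

# Days per training stage for each rat, transcribed from Table 1.
STAGES = ["Habituation", "Discrimination", "Delay 5 s", "Delay 10 s", "Delay 15 s"]
DAYS = {
    "Rat 1": [3, 15, 8, 6, 5],
    "Rat 2": [3, 18, 9, 6, 5],
    "Rat 3": [3, 13, 7, 5, 6],
    "Rat 4": [3, 15, 9, 6, 6],
    "Rat 5": [3, 17, 8, 7, 5],
    "Rat 6": [3, 16, 7, 6, 6],
}
REPORTED_TOTALS = {"Rat 1": 37, "Rat 2": 41, "Rat 3": 34, "Rat 4": 39, "Rat 5": 40, "Rat 6": 38}


def totals_match() -> bool:
    """True if every rat's stage durations sum to the reported total."""
    return all(sum(days) == REPORTED_TOTALS[rat] for rat, days in DAYS.items())


def stage_summary() -> dict:
    """Mean and sample standard deviation of days per stage across rats."""
    columns = zip(*DAYS.values())
    return {stage: (mean(col), stdev(col)) for stage, col in zip(STAGES, columns)}


if __name__ == "__main__":
    print("totals match:", totals_match())
    for stage, (m, sd) in stage_summary().items():
        print(f"{stage}: {m:.1f} +/- {sd:.1f} days")
    print(f"Total: {mean(REPORTED_TOTALS.values()):.1f} days")
```

Most of the between-animal variability sits in the discrimination stage (13 to 18 days). Habituation is identical for all six rats.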
Rodents have long been used in neuroscientific studies on diverse topics, from cognitive abilities such as learning and memory2,14 and reinforced behavior7,15,16 to the central control of organs17,18 and neuropharmacology19,20. The protocol proposed here describes a complex behavioral task suitable for experiments involving electrophysiology and neuroimaging. We have described the delay-based reinforcement-guided task for rats, but it can be adapted for mice, since rats and mice perform similarly on dry-land tasks.
We used nose-poking as the response that triggers the auditory stimulus. However, lever-pressing can also be used, and other stimulus modalities such as visual or olfactory stimuli can be used individually or concurrently. The proposed operant task has a number of advantages over existing non-operant methods. The most compelling is the automatic, precise timestamping of each stage of the animal's decision, which is otherwise very difficult to obtain. This makes the method especially well suited to electrophysiology and neuroimaging studies. Another advantage is the removal of the spatial component of the task, which would otherwise require spatial memory control groups. Because the task is highly demanding, it is quite likely that not all rats will perform well on it. Replace an animal if it stays idle in the start arm, delays entering the decision zone for more than 5 min, or produces higher error rates than other animals in the group.
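The replacement criteria above can be written as an explicit screening rule so that they are applied consistently across animals. In this sketch, the 5 min limit on reaching the decision zone comes from the text. The text does not define what counts as a "higher error rate", so three things are hypothetical:
- the leave-one-out threshold (the mean plus `k` standard deviations of the other animals' error rates, with `k = 2`)
- the per-trial latency lists
- the function name

```python
from statistics import mean, stdev
from typing import Dict, List

MAX_LATENCY_S = 5 * 60  # from the protocol: more than 5 min to reach the decision zone


def flag_animals(latencies_s: Dict[str, List[float]], error_rates: Dict[str, float],
                 k: float = 2.0) -> Dict[str, List[str]]:
    """Return {animal: [reasons]} for animals meeting a (partly hypothetical) replacement rule.

    latencies_s: per-trial latencies (s) to enter the decision zone, per animal
    error_rates: fraction of error trials per animal (must cover every animal)
    """
    flags = {}
    for animal, lat in latencies_s.items():
        reasons = []
        if any(x > MAX_LATENCY_S for x in lat):
            reasons.append("latency > 5 min")
        # Hypothetical rule: compare against the other animals only (leave-one-out).
        others = [e for a, e in error_rates.items() if a != animal]
        if len(others) >= 2:
            threshold = mean(others) + k * stdev(others)
            if error_rates[animal] > threshold:
                reasons.append("error rate above group threshold")
        if reasons:
            flags[animal] = reasons
    return flags
```

The leave-one-out comparison keeps a single poor performer from inflating the group threshold it is judged against.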
At any decision moment, the costs and values of each choice are assumed to be evaluated simultaneously. Therefore, choosing either the HRA or the LRA in this task can result from changes in the encoding of the costs, in the encoding of the benefits, or in the cost-benefit computation itself. One caveat of the proposed method is that it cannot discriminate between these processes.
There are a number of steps that can be taken to maximize success in training the animals and recording their electrophysiological signals. Firstly, handling the animals before training is crucial. Because each recording session starts with connecting the recording wires to the animal's head-stage, acclimate the animals to having their heads held. This is very important, as infrequently handled animals become anxious during this procedure and may damage the head-stage or the recording cable. Generally, well-handled animals are less stressed, easier to work with, and tend to produce less variable data.
Secondly, rodents leave behind a variety of odorant cues in the maze (e.g., pheromone-containing urine and feces, pheromones secreted from the whisker region, and fluids from the foot pads). Therefore, the maze needs to be wiped after each animal and at the conclusion of an experiment to minimize the impact of these residual odorant molecules on the test results. Ethanol (70%) is a common disinfectant used to clean testing equipment. However, like many disinfectants, alcohol has its own odor that can influence rodent behavior, so make sure that it has fully evaporated before placing an animal in the maze.
Thirdly, although LFPs are less sensitive to noise than spikes, using solid connectors and a well-secured cable reduces movement noise. Lightly spraying water on the maze floor may reduce the static electricity generated by friction between the animal's fur and the floor surface.
In conclusion, the protocol described in this article may help to design delay-based reinforced decision-making experiments and record electrophysiological signals while the animal is performing the task.
The authors have nothing to disclose.
This research was supported by RMH Neuroscience Foundation, Australia; the Australian Brain Foundation; the RACP Thyne Reid Fellowship, Australia; and by a project grant from the Cognitive Sciences and Technologies Council, Iran to Abbas Haghparast.
| Name of Material/Equipment | Company | Catalog Number | Comments/Description |
| --- | --- | --- | --- |
| T-maze | Self made | | |
| Dustless Precision Sugar Pellets | TSE Systems Intl. Group | F0023 | 45 mg, sucrose |
| Ketamine Hydrochloride Injection, USP | Sigma-Aldrich | 6740-87-0 | |
| Xylazine | Sigma-Aldrich | 7361-61-7 | |
| Stereotaxic device | Stoelting | | |
| Isoflurane | Santa Cruz Biotechnology | sc-363629Rx | |
| PFA-coated stainless-steel wires | A-M Systems | | |
| Acrylic cement | Vertex, MA, USA | | |
| (wooden or PVC (polyvinyl chloride)-made) | Local suppliers | | |
| Mini-Fit Power Connector | Molex | 15243048 | |
| Ethanol 70% | Local suppliers | | |
| Buprenorphine | Diamondback Drugs | | |
| Arduino UNO | Arduino | | https://www.arduino.cc/ |
| Infrared emitting diode | Sharp | GL480E00000F | http://www.sharp-world.com/ |
| Chronux Toolbox | Chronux.org | | |
| Arduino codes | | | https://github.com/dechuans/arduino-maze |