The neural correlates of listening to consonant and dissonant intervals have been widely studied, but the neural mechanisms associated with production of consonant and dissonant intervals are less well known. In this article, behavioral tests and fMRI are combined with interval identification and singing tasks to describe these mechanisms.
The neural correlates of consonance and dissonance perception have been widely studied, but not the neural correlates of consonance and dissonance production. The most straightforward manner of musical production is singing, but, from an imaging perspective, it still presents more challenges than listening because it involves motor activity. The accurate singing of musical intervals requires integration between auditory feedback processing and vocal motor control in order to correctly produce each note. This protocol presents a method that permits the monitoring of neural activations associated with the vocal production of consonant and dissonant intervals. Four musical intervals, two consonant and two dissonant, are used as stimuli, both for an auditory discrimination test and a task that involves first listening to and then reproducing given intervals. Participants, all female vocal students at the conservatory level, were studied using functional Magnetic Resonance Imaging (fMRI) during the performance of the singing task, with the listening task serving as a control condition. In this manner, the activity of both the motor and auditory systems was observed, and a measure of vocal accuracy during the singing task was also obtained. Thus, the protocol can also be used to track activations associated with singing different types of intervals or with singing the required notes more accurately. The results indicate that singing dissonant intervals requires greater participation of the neural mechanisms responsible for the integration of external feedback from the auditory and sensorimotor systems than does singing consonant intervals.
Certain combinations of musical pitches are generally acknowledged to be consonant, and they are typically associated with a pleasant sensation. Other combinations are generally referred to as dissonant and are associated with an unpleasant or unresolved feeling1. Although it seems sensible to assume that enculturation and training play some part in the perception of consonance2, it has been recently shown that the differences in perception of consonant and dissonant intervals and chords probably depend less on musical culture than was previously thought3 and may even derive from simple biological bases4,5,6. In order to prevent an ambiguous understanding of the term consonance, Terhardt7 introduced the notion of sensory consonance, as opposed to consonance in a musical context, where harmony, for example, may well influence the response to a given chord or interval. In the present protocol, only isolated, two-note intervals were used precisely to single out activations solely related to sensory consonance, without interference from context-dependent processing8.
Attempts to characterize consonance through purely physical means began with Helmholtz9, who attributed the perceived roughness associated with dissonant chords to the beating between adjacent frequency components. More recently, however, it has been shown that sensory consonance is not only associated with the absence of roughness, but also with harmonicity, which is to say the alignment of the partials of a given tone or chord with those of an unheard tone of a lower frequency10,11. Behavioral studies confirm that subjective consonance is indeed affected by purely physical parameters, such as frequency distance12,13, but a wider range of studies have conclusively demonstrated that physical phenomena cannot solely account for the differences between perceived consonance and dissonance14,15,16,17. All of these studies, however, report these differences when listening to a variety of intervals or chords. A variety of studies using Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI) have revealed significant differences in the cortical regions that become active when listening to either consonant or dissonant intervals and chords8,18,19,20. The purpose of the present study is to explore the differences in brain activity when producing, rather than listening to, consonant and dissonant intervals.
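The roughness account introduced by Helmholtz can be illustrated with a small numerical sketch. The following snippet is not part of the original analysis: it uses a simplified linear critical-bandwidth approximation and an exponential roughness curve in the spirit of Plomp and Levelt, summing pairwise roughness between the partials of two complex tones. Under these assumptions, a minor second scores higher than a perfect fifth:

```python
import math

def partials(f0, n=6):
    """First n harmonic partials of a complex tone with fundamental f0 (Hz)."""
    return [f0 * k for k in range(1, n + 1)]

def pair_roughness(f1, f2):
    """Simplified Plomp-Levelt roughness for one pair of partials (0 = smooth)."""
    fmin, fmax = min(f1, f2), max(f1, f2)
    cbw = 0.24 * fmin + 25.0  # rough linear critical-bandwidth approximation (assumption)
    x = (fmax - fmin) / cbw
    # Roughness peaks when the spacing is a fraction of the critical bandwidth
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)

def interval_roughness(f0, ratio):
    """Total pairwise roughness between the partials of two tones."""
    a, b = partials(f0), partials(f0 * ratio)
    return sum(pair_roughness(p, q) for p in a for q in b)

# A perfect fifth (3:2) should come out smoother than a minor second (16:15)
fifth = interval_roughness(220.0, 3 / 2)
second = interval_roughness(220.0, 16 / 15)
```

A model of this kind captures only the beating-based component of dissonance; as the studies cited above indicate, it does not account for harmonicity or for the full perceptual difference.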
The study of sensory-motor control during musical production typically involves the use of musical instruments, and very often it then requires the fabrication of instruments modified specifically for their use during neuroimaging21. Singing, however, would seem to provide from the start an appropriate mechanism for the analysis of sensory-motor processes during music production, as the instrument is the human voice itself, and the vocal apparatus does not require any modification in order to be suitable during imaging22. Although the neural mechanisms associated with aspects of singing, such as pitch control23, vocal imitation24, training-induced adaptive changes25, and the integration of external feedback25,26,27,28,29, have been the subject of a number of studies over the past two decades, the neural correlates of singing consonant and dissonant intervals were only recently described30. For this purpose, the current paper describes a behavioral test designed to establish the adequate recognition of consonant and dissonant intervals by participants. This is followed by an fMRI study of participants singing a variety of consonant and dissonant intervals. The fMRI protocol is relatively straightforward, but, as with all MRI research, great care must be taken to correctly set up the experiments. In this case, it is particularly important to minimize head, mouth, and lip movement during singing tasks, making the identification of effects not directly related to the physical act of singing more straightforward. This methodology may be used to investigate the neural mechanisms associated with a variety of activities involving musical production by singing.
This protocol has been approved by the Research, Ethics, and Safety Committee of the Hospital Infantil de México "Federico Gómez".
1. Behavioral Pretest
2. fMRI Experiment
Figure 1: Sparse-sampling Design. (A) Timeline of events within a trial involving only listening to a two-tone interval (2 s), without subsequent overt reproduction. (B) Timeline of events within a trial involving listening and singing tasks.
3. Data Analysis
All 11 participants in our experiment were female vocal students at the conservatory level, and they performed well enough on the interval recognition tasks to be selected for scanning. The success rate for naming the intervals was 65.72 ± 21.67%, which is, as expected, lower than the success rate for classifying intervals as either consonant or dissonant, which was 74.82 ± 14.15%.
In order to validate the basic design of the study, we hoped to identify neural activity during singing in the regions known to constitute the "singing network," as defined in a number of previous studies25,26,27,28,29,30,31,32,33,34,35,36,37. The effect of singing is observed by means of the first-level linear contrast of interest, which corresponds to singing as opposed to listening. One-sample t-tests were used with clusters determined by Z > 3 and a cluster significance threshold of p = 0.05, Family-Wise Error (FWE) corrected38. For anatomical labeling, the SPM Anatomy Toolbox33 and the Harvard-Oxford cortical and subcortical structural atlases were used39. Significant activation was observed in the primary somatosensory cortex (S1), the secondary somatosensory cortex (S2), the primary motor cortex (M1), the supplementary motor area (SMA), the premotor cortex (PM), Brodmann area 44 (BA 44), the primary auditory cortex (PAC), the superior temporal gyrus (STG), the temporal pole, the anterior insula, the putamen, the thalamus, and the cerebellum. These activations match those reported in the studies cited above regarding the "singing network," and they are illustrated in Figure 2. Note that in both Figures 2 and 3, the x-coordinate is perpendicular to the sagittal plane, the y-coordinate is perpendicular to the coronal plane, and the z-coordinate is perpendicular to the transverse, or horizontal, plane.
Once the basic design had been validated, two further first-level linear contrasts were calculated for each participant, corresponding to singing dissonant as opposed to consonant intervals and to singing consonant as opposed to dissonant intervals. These linear contrasts were then taken to a second-level random-effects analysis involving a set of two-way repeated-measures analyses of variance (ANOVA), with consonance and dissonance as factors. In this manner, the activated or deactivated areas were examined for possible interactions, with activations of interest determined according to a voxel significance threshold of p < 0.001, uncorrected for multiple comparisons28,29. For the contrast of singing dissonant as opposed to consonant intervals, increased activations were observed in the right S1, right PAC, left midbrain, right posterior insula, left amygdala, and left putamen. These activations are shown in Figure 3. For the complementary contrast, no significant changes in activation were detected during the singing of consonant intervals.
Figure 2: Activation in Regions that Constitute the "Singing Network." Activation maps are presented with a cluster significance threshold of p = 0.05, family-wise error (FWE) corrected. BOLD responses are reported in arbitrary units.
Figure 3: Contrast between the Singing of Dissonant and Consonant Intervals. Activation maps are presented, uncorrected for multiple comparisons, with a cluster significance threshold of p = 0.001. BOLD responses are reported in arbitrary units.
| Interval | Number of semitones | Ratio of fundamentals |
| --- | --- | --- |
| **Unison** | 0 | 1:1 |
| *Minor second* | 1 | 16:15 |
| *Major second* | 2 | 9:8 |
| **Minor third** | 3 | 6:5 |
| **Major third** | 4 | 5:4 |
| **Perfect fourth** | 5 | 4:3 |
| *Tritone* | 6 | 45:32 |
| **Perfect fifth** | 7 | 3:2 |
| **Minor sixth** | 8 | 8:5 |
| **Major sixth** | 9 | 5:3 |
| *Minor seventh* | 10 | 16:9 |
| *Major seventh* | 11 | 15:8 |
| **Octave** | 12 | 2:1 |
Table 1: Consonant and Dissonant Musical Intervals. Consonant intervals appear in boldface, while dissonant intervals appear in italics. Observe that the more consonant an interval, the smaller the integers that appear in the frequency ratio used to represent it. For an in-depth discussion of consonance and dissonance as a function of frequency ratios, see Bidelman & Krishnan40.
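As a quick numerical companion to Table 1, the deviation of each just-intonation ratio from its equal-tempered counterpart can be expressed in cents. This sketch simply transcribes the table's ratios; the helper names are ours:

```python
import math

# Just-intonation ratios from Table 1 (interval name -> frequency ratio)
JUST_RATIOS = {
    "unison": (1, 1), "minor second": (16, 15), "major second": (9, 8),
    "minor third": (6, 5), "major third": (5, 4), "perfect fourth": (4, 3),
    "tritone": (45, 32), "perfect fifth": (3, 2), "minor sixth": (8, 5),
    "major sixth": (5, 3), "minor seventh": (16, 9), "major seventh": (15, 8),
    "octave": (2, 1),
}

def cents(ratio):
    """Interval size in cents (100 cents = one equal-tempered semitone)."""
    return 1200 * math.log2(ratio)

def deviation_from_equal(name, semitones):
    """Difference (cents) between the just ratio and equal temperament."""
    p, q = JUST_RATIOS[name]
    return cents(p / q) - 100 * semitones

# The just perfect fifth is about 2 cents wider than the tempered fifth
dev = deviation_from_equal("perfect fifth", 7)
```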
This work describes a protocol in which singing is used as a means of studying brain activity during the production of consonant and dissonant intervals. Even though singing provides what is possibly the simplest method for the production of musical intervals22, it does not allow for the production of chords. However, although most physical characterizations of the notion of consonance rely, to some degree, on the superposition of simultaneous notes, a number of studies have shown that melodic intervals, in which the notes of consonant or dissonant chords are presented successively rather than simultaneously, are still perceived as consonant or dissonant, respectively4,6,15,41,42.
The behavioral interval perception task is used to establish, before participants proceed to the scanning session, whether they can adequately distinguish the intervals, so that they can be expected to perform well once inside the MR scanner. Any participants unable to meet a predetermined threshold on these identification tasks should not proceed to the fMRI experiment. The main purpose of this selection process is to ensure that differences in performance between the participants are not due to deficient perceptual abilities. Chosen participants should have similar degrees of vocal and musical training and also, if possible, similar tessituras. If their vocal ranges vary significantly, the range of intervals presented during the singing tasks must be personalized.
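When interval ranges must be personalized to a participant's tessitura, the transposition can be scripted. The sketch below uses hypothetical helper names and MIDI note numbers for convenience; it lists every base note that keeps a given interval inside a singer's comfortable range:

```python
def fits_range(base_midi, semitones, low_midi, high_midi):
    """True if the base note and the note `semitones` above both lie in range."""
    return low_midi <= base_midi and base_midi + semitones <= high_midi

def candidate_bases(semitones, low_midi, high_midi):
    """All base notes (MIDI numbers) that keep the interval inside the range."""
    return [b for b in range(low_midi, high_midi + 1)
            if fits_range(b, semitones, low_midi, high_midi)]

# Example: a range of roughly C4 (MIDI 60) to C6 (MIDI 84); a tritone (6 semitones)
bases = candidate_bases(6, 60, 84)
```

Drawing stimuli only from such a list guarantees that no participant is asked to sing outside her range, which keeps vocal accuracy comparable across singers.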
The microphone setup is critical for the acquisition to be reliable and artifact-free. The type of microphone itself is very important, and although it is possible to use optical28 or specially designed, MR-compatible29 microphones, it has been shown that the sensitivity of condenser microphones is not affected by the presence of the intense magnetic fields in the imaging environment43. Indeed, a small Lavalier condenser microphone may be used in this context, provided a shielded twisted-triplet cable is used to connect the microphone to the preamplifier, which must be placed outside the room where the MR scanner is housed. This arrangement will prevent the appearance of imaging artifacts44, but researchers should also ensure that the scanner does not interfere with the performance of the microphone. To this end, a test tone can be sent through the MR-compatible headphones to the microphone placed inside the MR scanner, and the signal obtained in this manner can then be compared to that obtained by sending the same tone to the microphone now placed outside the scanner. The sound pressure levels inside the MR scanner can be extremely high45, so the microphone must be placed as close as possible to the source. By asking participants to hum rather than openly sing notes, movement in and around the mouth area can be minimized. By placing the microphone just below the larynx, secured with tape, it is possible to obtain a faithful recording of the singer's voice. The recording will naturally be very noisy; this cannot be avoided. However, if researchers are mainly interested in pitch, and not in the articulation or enunciation of words, a variety of software packages can be used to clean the signal enough for the detection of the fundamental frequency of each sung note.
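The inside-versus-outside comparison of the test tone can be quantified by probing each recording at the tone's frequency. The snippet below is a minimal, pure-Python sketch on synthetic signals (in practice, the two recorded waveforms would be loaded from file); it estimates the tone's level via a single-bin DFT probe:

```python
import math

def tone_level_db(samples, rate, freq):
    """Level (dB re full scale) of one frequency in a recording, via a DFT bin probe."""
    n = len(samples)
    # Correlate the signal with a complex exponential at the target frequency
    re = sum(s * math.cos(2 * math.pi * freq * i / rate) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / rate) for i, s in enumerate(samples))
    amp = 2 * math.hypot(re, im) / n
    return 20 * math.log10(max(amp, 1e-12))

# Synthetic check: the same 1 kHz tone recorded at half amplitude should
# measure about 6 dB lower, mimicking attenuation inside the scanner bore
rate = 8000
inside = [0.5 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]
outside = [1.0 * math.sin(2 * math.pi * 1000 * i / rate) for i in range(rate)]
diff = tone_level_db(outside, rate, 1000) - tone_level_db(inside, rate, 1000)
```

A large level or spectral difference between the two conditions would indicate that the scanner is interfering with the microphone and that the setup needs to be revised.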
A standard method would be to use audio-editing software to filter the time signals through a Hamming window and then to use the autocorrelation algorithms built into certain speech and phonetics software packages to identify the sung fundamentals. Vocal accuracy can then be calculated for each participant. Potential applications of the data obtained from the recordings include correlating pitch or rhythmic accuracy with either degree of training or interval distances.
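The windowing-plus-autocorrelation approach just described can be sketched as follows. This is a simplified, pure-Python illustration of what dedicated phonetics software implements far more robustly; the frame length, sampling rate, and search range are assumptions:

```python
import math

def detect_f0(samples, rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental (Hz) of one frame by windowed autocorrelation."""
    n = len(samples)
    # Hamming window, as in the standard approach described above
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    lo, hi = int(rate / fmax), int(rate / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, min(hi, n - 1) + 1):
        r = sum(windowed[i] * windowed[i + lag] for i in range(n - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return rate / best_lag

def cents_error(sung_hz, target_hz):
    """Vocal accuracy: deviation of the sung note from the target, in cents."""
    return 1200 * math.log2(sung_hz / target_hz)

# Synthetic frame: a 220 Hz tone sampled at 8 kHz
rate = 8000
frame = [math.sin(2 * math.pi * 220 * i / rate) for i in range(1024)]
f0 = detect_f0(frame, rate)
```

Averaging `cents_error` over all sung notes yields the per-participant vocal-accuracy measure, which can then be correlated with training or interval distance as suggested above.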
Functional images are acquired using a sparse-sampling design so as to minimize both auditory masking and contamination of the BOLD signal by scanner noise25,28,29,46,47. Every subject undergoes 3 experimental runs, each lasting 10 min. During each run, subjects are first asked to lie in silence for 10 silent baseline trials, then to listen passively to a block of 10 intervals, and finally to listen to and sing back another block of 40 intervals. One purpose of keeping individual runs as short as possible is to avoid participant fatigue. Nonetheless, it has since been concluded that it might be better in the future to include the same number of listen-only and singing trials, presented in alternating blocks, which would increase statistical power. As an example, a run could consist of 2 blocks of 5 silent baseline trials, 4 blocks of 5 listen-only trials, and 4 blocks of 5 singing trials. The blocks would then be presented to participants in alternation, with a total duration of 500 s per run.
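The alternating design proposed above can be expressed as a short schedule builder. In this sketch the 10 s trial length is an inference from the stated 500 s total for 50 trials, and the block ordering (baselines first, then alternating listen and sing blocks) is one plausible arrangement:

```python
TRIAL_S = 10  # assumed trial length, consistent with 50 trials in 500 s

def alternating_run():
    """Blocks of 5 trials each: 2 baseline, then 4 listen-only and 4 singing,
    interleaved. Returns (block kind, block duration in seconds) pairs."""
    blocks = ["baseline"] * 2 + ["listen", "sing"] * 4
    return [(kind, 5 * TRIAL_S) for kind in blocks]

run = alternating_run()
total = sum(duration for _, duration in run)  # should come to 500 s
```

Generating the schedule programmatically makes it easy to verify the run length and to randomize block order across participants if desired.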
The main reason for having participants listen passively inside the scanner is to have a means of subtracting auditory activity from motor activity. Thus, a favorable comparison of singing activations against the "singing network"25,27,28,29,36,37 is indispensable for the proper validation of the study. Note that "singing network" activations are very robust and well established and are usually detected by means of one-sample t-tests and a corrected cluster significance threshold of p = 0.05. Activations corresponding to the contrasts between singing dissonant versus consonant intervals, and vice versa, are typically identified by means of two-way repeated-measures analyses of variance (ANOVA), according to a voxel significance threshold of p < 0.001, uncorrected for multiple comparisons28,29. It is expected that participants will find singing dissonant intervals more challenging than singing consonant intervals48,49; thus, different activations for each of the two contrasts described above are anticipated. Results indicate that singing dissonant intervals involves a reprogramming of the neural mechanisms recruited for the production of consonant intervals. During singing, the produced sound is compared to the intended sound, and any necessary adjustment is then achieved through the integration of external and internal feedback from auditory and somatosensory pathways. A detailed discussion of these results and the conclusions drawn from them is included in the article by González-García, González, and Rendón30.
This protocol provides a reasonably straightforward method for the study of the neural correlates of musical production and for monitoring the activity of both the motor and auditory systems. It can be used to track differences in brain activation between binary conditions, such as singing consonant or dissonant intervals, and singing narrow or wide intervals30. It is also well-suited to study the effect of training on a variety of tasks associated with singing specific frequencies. On the other hand, because of the very large amount of noise contained in recordings of the sung voice obtained during the scan, it would be difficult to employ this protocol to analyze tasks concerned with quality of tone or timbre, especially because these are qualities that cannot be gauged correctly while humming.
The authors have nothing to disclose.
The authors acknowledge financial support for this research from Secretaría de Salud de México (HIM/2011/058 SSA. 1009), CONACYT (SALUD-2012-01-182160), and DGAPA UNAM (PAPIIT IN109214).
| Name | Company | Catalog Number / Version | Comments |
| --- | --- | --- | --- |
| Achieva 1.5-T magnetic resonance scanner | Philips | Release 6.4 | |
| Audacity | Open source | 2.0.5 | |
| Audio interface | Tascam | US-144MKII | |
| Audiometer | Brüel & Kjaer | Type 1800 | |
| E-Prime Professional | Psychology Software Tools, Inc. | 2.0.0.74 | |
| Matlab | Mathworks | R2014A | |
| MRI-Compatible Insert Earphones | Sensimetrics | S14 | |
| Praat | Open source | 5.4.12 | |
| Pro audio condenser microphone | Shure | SM93 | |
| SPSS Statistics | IBM | 20 | |
| Statistical Parametric Mapping | Wellcome Trust Centre for Neuroimaging | 8 | |