Ultrasound imaging can be used to display the shape and movements of the tongue in real time during speech. The images can be used to determine the nature of speech sound errors. Visual feedback of the tongue can be used to facilitate improvements in speech sound production in clinical populations.
Diagnostic ultrasound imaging has been a common tool in medical practice for several decades. It provides a safe and effective method for imaging structures internal to the body. There has been a recent increase in the use of ultrasound technology to visualize the shape and movements of the tongue during speech, both in typical speakers and in clinical populations. Ultrasound imaging of speech has greatly expanded our understanding of how sounds articulated with the tongue (lingual sounds) are produced. Such information can be particularly valuable for speech-language pathologists. Among other advantages, ultrasound images can be used during speech therapy to provide (1) illustrative models of typical (i.e. "correct") tongue configurations for speech sounds, and (2) a source of insight into the articulatory nature of deviant productions. The images can also be used as an additional source of feedback for clinical populations learning to distinguish their better productions from their incorrect productions, en route to establishing more effective articulatory habits.
Ultrasound feedback is increasingly used by scientists and clinicians as the expertise of users increases and the expense of the equipment declines. In this tutorial, procedures are presented for collecting ultrasound images of the tongue in a clinical context. We illustrate these procedures in an extended example featuring one common error sound, American English /r/. Images of correct and distorted /r/ are used to demonstrate (1) how to interpret ultrasound images, (2) how to assess tongue shape during production of speech sounds, (3) how to categorize tongue shape errors, and (4) how to provide visual feedback to elicit a more appropriate and functional tongue shape. We present a sample protocol for using real-time ultrasound images of the tongue for visual feedback to remediate speech sound errors. Additionally, example data are shown to illustrate outcomes with the procedure.
Both clinical and research settings have seen an increase in the use of ultrasound imaging as a visual biofeedback tool during intervention for individuals with speech disorders. With the guidance of a speech-language pathologist, learners can observe real-time video of the shape and movements of their tongue and discuss how these images may differ from the tongue movements needed to properly articulate a speech sound. To conduct such interventions, it is important for users to be competent in the interpretation of ultrasound images as the tongue moves in real time. Knowledge of the range of correct articulatory patterns used by typical speakers is foundational to recognizing erroneous tongue shapes.
The methods described herein address (a) collecting ultrasound images of the tongue, (b) interpreting ultrasound images associated with both correct and incorrect productions of speech sounds, and (c) using real-time ultrasound imaging as a source of visual biofeedback to facilitate speech production changes in individuals with speech sound errors. Although ultrasound can be used to visualize a variety of lingual phonemes, examples here will focus on ultrasound images of the tongue for the /r/ sound (as in "red car"), which is described as the most common residual error among children acquiring American English 1. It is also the sound that has been most extensively studied in clinical applications of ultrasound to date 2-14.
One important goal in speech (re)habilitation is to facilitate more intelligible speech by teaching articulatory routines that result in perceptually appropriate productions of a target sound or sequence. Therefore, it is critical to understand tongue actions during normal speech and during production of speech errors. Real-time visualization of the tongue can play a highly beneficial role in encouraging a speaker to modify articulatory movements, as it provides the clinician and client with a shared representation of what is actually happening during speech. Without real-time visualization of the tongue, only static pictures or verbal descriptions of target tongue configurations are available to facilitate understanding of the desired articulatory behaviors. In schema-based models of motor learning, visual information about the movements of the tongue during speech is considered a form of "knowledge of performance" feedback (i.e. it provides specific qualitative information about the movement that occurred)15. Previous research has indicated that detailed knowledge of performance feedback can facilitate acquisition of a novel motor routine16.
Ultrasound has several advantages over other technologies used to visualize speech. With ultrasound, the entire contour of the tongue can be visualized quickly from tip to root. Preparation for ultrasound imaging generally takes less than a minute.
In contrast, electropalatography (EPG) requires a dental impression and the creation of a customized pseudopalate (which may take weeks), and it can take time to adapt to speaking with the pseudopalate 17. EPG also enables visualization of tongue-palate contact only in the region covered by the pseudopalate and cannot display the tongue root or the overall shape of the tongue. This limits the aspects of articulation that can be effectively targeted with EPG.
Another alternative is electromagnetic articulography (EMA), which can provide general information about tongue shape and movement 18. However, EMA requires sensors to be glued to the tongue and other structures; the set-up for this type of tongue imaging can therefore take 20 – 30 min and may not be viable for frequent use. Ultrasound may thus be viewed as more practical.
In the specific context of clinical research on the assessment and treatment of /r/ errors, the use of ultrasound has been reported in several studies for individuals with idiopathic speech sound disorders 2,10,11,13,19, hearing impairment 20, childhood apraxia of speech 12,21, and acquired apraxia of speech following a cerebrovascular accident 22. Studies have also reported the use of ultrasound to treat errors on other lingual phonemes such as /s k g l ʃ ʧ/ 23,24. Additional populations that may be candidates include individuals with speech disorders related to cleft palate, or individuals learning pronunciation of sounds in a non-native language 25.
Ultrasound imaging may also be useful diagnostically, e.g., to characterize errors in lingual shapes 26,27, or to identify sub-perceptible or covert contrasts in disordered speech 28,29. If precise articulatory measurements are being obtained and compared, it is essential that the ultrasound probe be stabilized so that the coordinate space for measurement remains reasonably constant. However, it is generally agreed that an unstabilized probe yields information of sufficient quality for clinical diagnosis and treatment applications, which is the focus of the present paper.
Ethics Statement. When used in research, informed consent and/or assent from children is always required before collecting ultrasound images. When used clinically, clients should be informed of the purpose of the ultrasound imaging. Although diagnostic ultrasound imaging is considered "minimal risk" 30, users should always follow the ALARA principle when using ultrasound, meaning exposure to ultrasound should be "As Low As Reasonably Achievable" 31. This involves limiting acoustic power during imaging and also limiting exposure time. For example, if ultrasound is being used for visual feedback but the participant is not attending to the visual feedback, it would be prudent to discontinue imaging.
1. Collecting Ultrasound Images of the Tongue
NOTE: Technical Considerations. Diagnostic ultrasound probes are used to image the tongue. A frequency range between approximately 3 – 8 MHz with a frame rate of about 30 frames per second is recommended for clinical imaging of the tongue 32.
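For orientation, the recommended settings can be converted into more intuitive quantities. The short sketch below is an illustration rather than part of the protocol: it assumes the conventional average speed of sound in soft tissue (about 1,540 m/s), and the function names are hypothetical. It computes the acoustic wavelength at each end of the recommended frequency range and the time between successive frames at 30 frames per second.

```python
# Assumption: conventional average speed of sound in soft tissue (m/s).
SPEED_OF_SOUND_M_PER_S = 1540.0


def wavelength_mm(frequency_mhz, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Return the acoustic wavelength in millimeters for a frequency in MHz."""
    frequency_hz = frequency_mhz * 1e6
    return speed_m_per_s / frequency_hz * 1000.0


def frame_interval_ms(frames_per_second):
    """Return the time between successive frames in milliseconds."""
    return 1000.0 / frames_per_second


low_end = wavelength_mm(3.0)        # roughly 0.51 mm at 3 MHz
high_end = wavelength_mm(8.0)       # roughly 0.19 mm at 8 MHz
interval = frame_interval_ms(30.0)  # roughly 33 ms between frames

print(f"Wavelength at 3 MHz: {low_end:.2f} mm")
print(f"Wavelength at 8 MHz: {high_end:.2f} mm")
print(f"Frame interval at 30 fps: {interval:.1f} ms")
```

Under these assumptions, each displayed frame is separated from the next by about 33 ms, and the wavelength falls from about 0.5 mm to about 0.2 mm across the recommended frequency range.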
NOTE: The instructions below apply to the diagnostic Ultrasound System (see Materials Table) with a C6-2 transducer, which was selected based on visual comparison of ultrasound images collected from several transducers available for this system. These instructions are adapted from the diagnostic ultrasound system reference manual for this device and are intended to be an illustrative example for one ultrasound. Many other ultrasound systems are in use, and users should consult the operating manuals of their specific device.
2. Interpreting Ultrasound Images of the Tongue
3. Using Real-time Ultrasound Images for Feedback to Remediate Speech Sound Errors
Figure 1 presents sample sagittal images of correct /r/ in a 9-year-old female. The ultrasound images are paired with magnetic resonance images from the same speaker to demonstrate the similar tongue shape that can be viewed with both technologies.
Figure 1: Sagittal View of a Magnetic Resonance Image during a Correctly Produced American English /r/ with Ultrasound Image of the Tongue (bottom right) from the Same Participant. In all images, the right side of the image represents anterior and the left represents posterior. Notice the elevation of the anterior tongue (right arrow) and the lowering of the dorsum (left arrow).
In Figure 2, the same 9-year-old is shown 3 months earlier (before ultrasound visual feedback therapy). Note that the distorted /r/ involves a high posterior tongue position, low tongue tip/blade, and lack of a pharyngeal constriction, yielding a sound perceptually similar to [ʊ]. Correct /r/ productions feature elevation of the anterior tongue, a lowered tongue dorsum, and a posterior narrowing reflecting retraction of the tongue root. Note that a range of tongue shapes are possible for correct /r/.
Figure 2: Sagittal View of a Magnetic Resonance Image during a Distorted Production of American English /r/ with Ultrasound Image of the Tongue (bottom right) from the Same Participant. In all images, the right side of the image represents anterior and the left represents posterior. Notice the low tongue tip/blade (right arrow) and the raised tongue dorsum (left arrow).
Figure 3 shows sample correct and incorrect /r/ productions in coronal view. Note the elevation of the sides of the tongue, along with midline grooving, in the correct productions and a relatively flat tongue shape for distorted /r/.
Figure 3: Sample Coronal Ultrasound Tongue Images of Correct (top) and Distorted (bottom) Productions of American English /r/. In these coronal views, the probe is positioned vertically to image the posterior tongue dorsum. Notice the elevation of the lateral margins of the tongue for the correct /r/, along with a groove in the middle. Notice the flat tongue shape for the distorted /r/. These images are from an EchoBlaster 128 ultrasound.
To date, studies on ultrasound visual feedback for speech sound errors have involved case series or single subject designs 2,5,9-13,21-23. Widely varying patterns of individual response to treatment have been reported. For many individuals, improvement in sound accuracy can be observed with just a few hours of experimental treatment on /r/. Individuals who do not show immediate gains may still achieve improved production over the course of ultrasound practice. Gains made in the treatment setting almost always require some time to generalize to untreated words or contexts.
Figure 4 shows the average accuracy on words containing /r/ across 11 American English-speaking participants ages 10 – 20 years who were treated for /r/ distortions. The data are from multiple-baseline across-subjects single case designs 13,34. Some of the participants were treated on other sounds as well, although the figure is restricted to accuracy of /r/ in one word position per participant. The vertical axis represents the percent of untreated /r/ words judged as correct. The horizontal axis represents separate sessions (spaced approximately 3 – 4 days apart) in which data were collected. Accuracy of /r/ production at the word level was monitored before, during, and after the 7 treatment sessions. Multiple listeners rated recorded productions of words as either "correct /r/" or "incorrect /r/" based on perceived phonetic accuracy. The box reflects the 7 sessions in which ultrasound biofeedback therapy was provided. Improved /r/ accuracy corresponds with the onset of treatment. Moreover, after 7 sessions, when treatment was withdrawn, the upward trend in accuracy continues, suggesting that retention and generalization continued to occur.
Figure 4: Mean Accuracy of /r/ in Single Words for 11 Participants Ages 10 – 20 Years Treated for /r/ Distortions. The box represents the sessions in which ultrasound visual feedback treatment occurred. Error bars represent standard deviations.
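The kind of summary plotted in Figure 4 (percent of words judged correct per session, then a mean and standard deviation across participants) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the cited studies: the text does not specify how ratings from multiple listeners were combined, so a strict majority vote is assumed here, and all function names and data values are hypothetical toy examples rather than study data.

```python
from statistics import mean, stdev


def percent_correct(word_ratings):
    """Percent of words judged correct in one session.

    word_ratings: one list per word, holding each listener's judgment
    (True = "correct /r/", False = "incorrect /r/").
    Assumption: a word counts as correct when a strict majority of
    listeners judged it correct.
    """
    judged_correct = [sum(r) * 2 > len(r) for r in word_ratings]
    return 100.0 * sum(judged_correct) / len(judged_correct)


def session_summary(scores_by_participant):
    """Mean and sample standard deviation across participants per session.

    scores_by_participant: dict mapping a participant label to a list of
    percent-correct scores, one per session (all lists the same length).
    """
    per_session = zip(*scores_by_participant.values())
    return [(mean(scores), stdev(scores)) for scores in per_session]


# Hypothetical ratings: 4 words, each rated by 3 listeners.
toy_ratings = [
    [True, True, False],
    [False, False, True],
    [True, True, True],
    [False, True, False],
]
score = percent_correct(toy_ratings)  # 2 of 4 words correct

# Hypothetical percent-correct scores for 3 participants over 3 sessions.
toy_scores = {
    "P1": [0.0, 20.0, 60.0],
    "P2": [10.0, 40.0, 80.0],
    "P3": [5.0, 30.0, 70.0],
}
summary = session_summary(toy_scores)

print(score)
for session_number, (avg, sd) in enumerate(summary, start=1):
    print(f"Session {session_number}: mean = {avg:.1f}%, SD = {sd:.1f}")
```

A different rule for combining listeners (for example, requiring unanimous agreement) would change only the comparison inside `percent_correct`.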
Critical Steps within the Protocol
It is essential to obtain clear, interpretable images as described in steps 1.3 and 1.6. Poor image quality renders the procedures meaningless. Additionally, participants must be fully aware of what they are seeing on the screen. Therefore, orienting the participant to the image as described in 3.2 is a step that should be emphasized prior to providing visual feedback training. Additionally, step 3.10, which involves clearly describing differences in tongue shape between the participant's perceptually accurate and inaccurate tongue shapes, is a critical step to increase awareness of the target tongue shape for a specific speaker.
Modifications and Troubleshooting
Image quality is essential. When image quality is waning, it may be necessary to re-apply gel and/or to check that the probe is making stable contact with the skin.
Additionally, it is important to recognize when the images are not representing what the user intends. For example, when collecting sagittal images, if the probe is positioned in the midsagittal plane (i.e., down the middle of the head), the image will show the groove running down the center line of the tongue. If the probe is positioned to the side, the image will show more of the lateral edge of the tongue. The gross shape of the ultrasound "bright white line" will be similar if the image shows more of the groove or more of the tongue side, but they will not be exactly the same. The user should therefore regularly check the position of the probe to determine whether the images reflect mid-sagittal images, and reposition the probe if necessary.
Limitations of the Technique
Although ultrasound has significant advantages over other approaches to visualizing speech production, it is not without limitations. One primary limitation of ultrasound imaging is that only the tongue is imaged. That is, other structures such as the hard or soft palate or the pharyngeal walls are not visible; thus, the relation of the tongue to other structures is not apparent. Additionally, it can be difficult to determine where exactly along the tongue contour the images are collected. For example, when interpreting sagittal images of the tongue, the position of the probe is important to consider, as images may not necessarily be mid-sagittal (i.e., midline) if the probe is off-center or has been rotated. Finally, not all participants/clients tolerate the use of ultrasound gel beneath the chin. The mindful user of ultrasound should be aware of both the advantages and the limitations of the technology.
Significance of the Technique with Respect to Existing/Alternative Methods
Ultrasound imaging of the tongue using diagnostic mode can be a fast, safe, and effective technology for visualizing tongue movements in real time 30,32. This information can be used to contrast correct and incorrect productions of speech sounds as a way to understand speech errors and teach desired movements for a variety of speech sounds. Traditional speech therapy methods for assessing and remediating speech sound errors such as /r/ distortions rely on auditory perception. Thus, the speech-language clinician is unaware of the exact nature of the speaker's tongue movements. Cues are often provided instructing speakers to modify their tongue position without any visual reference to the actual movement. Real-time imaging of the tongue therefore offers an immediate visualization for shared discussion of speech, which traditionally has been abstract or transient. With respect to current theories on speech motor learning (e.g., schema-based motor learning), ultrasound visual feedback offers a form of knowledge of performance feedback 13,15. This feedback may facilitate the acquisition of new speech motor plans for individuals who have previously had difficulty understanding the target movements.
Ultrasound imaging can be particularly useful for evaluating 26,27 and remediating 2,10,11,12,13,20 speech sound errors that involve the oral and pharyngeal constrictions associated with /r/. Sagittal views can identify if the participant is lacking an anterior constriction or tongue root retraction. Coronal views provide the ability to examine whether there is midline grooving and elevation of the lateral margins of the tongue during /r/ production. Once the elements in error have been properly identified, this information can be used to systematically train new tongue movements, ideally while viewing real-time feedback of the tongue 2,10,11,12,13,20. Methods such as electropalatography or electromagnetic articulography do not allow sufficient visualization of all aspects of the tongue, such as the tongue root, whereas ultrasound can overcome this limitation.
Future Applications or Directions after Mastering This Technique
The protocol outlined here is intended to be broad enough to allow others to follow the procedures regardless of the ultrasound technology available. The procedures are also intended to be flexible enough to meet a variety of clinical research or clinical practice needs. Although the focus throughout this discussion was on the specific context of treatment for /r/, these procedures can readily be adapted when training other speech sounds or when working with a variety of populations. Ultrasound feedback of the tongue can be useful for remediation of lingual sounds other than /r/, including vowels, velar and alveolar stops and nasals, and lingual fricatives and affricates 21,23.
Variations in procedures exist; for example, some researchers have used head stabilization techniques to prevent movement of the vocal tract relative to the ultrasound probe. Such procedures are useful if one intends to measure the contour of the tongue 23,35,36, and stabilization can also overcome problems such as drift in the position of the probe over time. However, head stabilization during ultrasound imaging of the tongue can introduce practical limitations (e.g., uncomfortable head-mounted devices), and thus the ultrasound user must weigh the trade-offs of such procedures. Studies are underway exploring specific modifications to the procedures (e.g., the amount of practice with ultrasound that is ideal, the role of cueing only oral constrictions vs. oral and pharyngeal constrictions) to determine the methods that are optimally effective. In sum, evidence continues to accumulate that procedures incorporating ultrasound feedback of the tongue can yield improved speech clarity in individuals with speech sound disorders.
The authors have nothing to disclose.
The work was supported by NIH grants R01DC013668 (D. Whalen, PI) and R03DC013152 (J. Preston, PI).
Name | Company | Catalog Number | Comments
ACUSON X300 ultrasound with C6-2 probe | Siemens | Acuson X300 |
Transeptic Spray | Parker Labs | PLI 09-25 |
Aquasonic 100 ultrasound gel | Parker Labs | 01-08 |