Objective assessments of the physiological mechanisms that support speech are needed to monitor disease onset and progression in persons with ALS and to quantify treatment effects in clinical trials. In this video, we present a comprehensive, instrumentation-based protocol for quantifying speech motor performance in clinical populations.
Improved methods for assessing bulbar impairment are necessary for expediting diagnosis of bulbar dysfunction in ALS, for predicting disease progression across speech subsystems, and for addressing the critical need for sensitive outcome measures for ongoing experimental treatment trials. To address this need, we are obtaining longitudinal profiles of bulbar impairment in 100 individuals based on a comprehensive instrumentation-based assessment that yields objective measures. Using instrumental approaches to quantify speech-related behaviors is particularly important in a field that has primarily relied on subjective, auditory-perceptual forms of speech assessment1. Our assessment protocol measures performance across all of the speech subsystems: respiratory, phonatory (laryngeal), resonatory (velopharyngeal), and articulatory. The articulatory subsystem is divided into the facial components (jaw and lip) and the tongue. Prior research has suggested that each speech subsystem responds differently to neurological diseases such as ALS. The current protocol is designed to test the performance of each speech subsystem as independently from the other subsystems as possible. The speech subsystems are evaluated in the context of more global changes to speech performance. These system-level variables include speaking rate and speech intelligibility.
The protocol requires specialized instrumentation, and commercial and custom software. The respiratory, phonatory, and resonatory subsystems are evaluated using pressure-flow (aerodynamic) and acoustic methods. The articulatory subsystem is assessed using 3D motion tracking techniques. The objective measures that are used to quantify bulbar impairment have been well established in the speech literature and show sensitivity to changes in bulbar function with disease progression. The result of the assessment is a comprehensive, across-subsystem performance profile for each participant. The profile, when compared to the same measures obtained from healthy controls, is used for diagnostic purposes. Currently, we are testing the sensitivity and specificity of these measures for diagnosis of ALS and for predicting the rate of disease progression. In the long term, the more refined endophenotype of bulbar ALS derived from this work is expected to strengthen future efforts to identify the genetic loci of ALS and improve diagnostic and treatment specificity of the disease as a whole. The objective assessment that is demonstrated in this video may be used to assess a broad range of speech motor impairments, including those related to stroke, traumatic brain injury, multiple sclerosis, and Parkinson disease.
I. Subsystem Analyses
1. Respiratory subsystem/ Breathing for speech
The respiratory subsystem is evaluated using the Phonatory Aerodynamic System (PAS). The system allows for simultaneous recordings of oral pressure, airflow, and speech acoustics (see Table 1 for the list of equipment and manufacturers). A disposable face mask and a disposable pressure-sensing tube are necessary for recordings. Prior to recording, the flow and pressure channels are calibrated according to the manufacturer's specifications.
- Vital Capacity (VC) is the maximum volume of air that is exhaled following maximum inhalation. VC is evaluated using a disposable face mask that is attached to the pneumotachograph.
- The PAS "Vital Capacity" protocol is selected for the recording.
- The participant is instructed to inhale as deeply as possible and then exhale maximally into the mask; the task is repeated three times.
- Maximum expiratory volume is derived using PAS software.
- Subglottal pressure (Ps) is the air pressure available in the lungs for production of "pressure" consonants. Ps is evaluated indirectly by measuring peak pressure in the mouth during the production of a syllable train2,3.
- The PAS "Voicing Efficiency" protocol is selected for the recording.
- To record the oral pressure during /pa/, the pressure-sensing tube is positioned inside the mouth on the tongue surface.
- Nasal passages are occluded with a nose clip to eliminate potential nasal air flow escape.
- The participant is instructed to inhale approximately twice their normal amount and say /pa/ into the face mask. The syllable /pa/ is repeated seven times on one exhalation, while maintaining consistent pitch and loudness. The rate is maintained at 1.5 syllables per second.
- Peak oral pressure is measured for five (middle) repetitions of /pa/. An average of these five productions is obtained to represent Ps during speech.
- Because Ps covaries with sound pressure level (SPL)4,5, the SPL is also collected for each syllable. It is used subsequently as a covariate during analyses.
- Speech breathing is evaluated during connected speech while participants read a standard 60-word paragraph (Appendix 1) developed specifically for accurate, automatic pause-boundary detection6.
- The PAS "Maximum Phonation" protocol is selected for the recording.
- The airflow signal is collected using a disposable mask that is fit around the face.
- The participant is instructed to read the paragraph at their normal comfortable speaking rate and loudness.
- Air flow traces are exported into a custom-made Speech-Pause Analysis (SPA)7 software program in Matlab. In this program, the pauses in connected speech are identified. The software calculates, among other measures, percent pause time, which is a measure of time spent pausing during the reading of a passage.
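Two of the derived respiratory measures described above reduce to simple arithmetic once the peaks and pause boundaries have been identified. The following is a minimal Python sketch of that arithmetic only; it is not the PAS or SPA implementation. The function names, the pressure unit (cm H2O), and the representation of pauses as (start, end) times in seconds are assumptions made for illustration.

```python
from statistics import mean


def mean_subglottal_pressure(peak_pressures):
    """Estimate Ps as the mean of the middle five of seven /pa/ peaks.

    peak_pressures: peak oral pressures, one per syllable, in production
    order (unit assumed here to be cm H2O).
    """
    if len(peak_pressures) != 7:
        raise ValueError("expected seven /pa/ repetitions")
    # Drop the first and last repetitions, keep the middle five.
    return mean(peak_pressures[1:6])


def percent_pause_time(pauses, total_duration_s):
    """Percentage of the reading time spent pausing.

    pauses: list of (start_s, end_s) pause boundaries (hypothetical format).
    total_duration_s: duration of the whole reading in seconds.
    """
    if total_duration_s <= 0:
        raise ValueError("total duration must be positive")
    pause_time = sum(end - start for start, end in pauses)
    return 100.0 * pause_time / total_duration_s
```

For example, two pauses of 0.5 s and 1.0 s in a 30 s reading correspond to 5% pause time.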
2. Phonatory subsystem
The phonatory subsystem is evaluated via voice recordings using high-quality acoustic recording equipment (Table 1).
- The microphone is placed approximately 15 cm away from the mouth.
- A nasal clip is used to eliminate the potential effect of velopharyngeal inadequacy on the quality of phonation.
- The participant is asked to produce "Maximum Phonation". He or she is instructed to inhale the maximum amount of air and then to phonate /a/ at a normal pitch and loudness for as long as possible. This task is practiced at least once prior to recording. The importance of putting forth maximum effort is emphasized.
- Maximum phonation duration is measured in seconds using the acoustic waveform.
- The digitized acoustic waveform is loaded into the Multidimensional Voice Profile (MDVP) software for analysis. Measures of central tendency and variability of fundamental frequency (F0), noise-to-harmonic ratio (NHR) and percent jitter, among others, are obtained for the middle five seconds of the phonation interval.
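As an illustration of the kind of measures obtained at this step, the sketch below selects the middle five seconds of a phonation and computes F0 statistics and percent jitter from a list of glottal period durations. It uses the common "local jitter" definition (mean absolute difference between consecutive periods, divided by the mean period); the exact MDVP algorithms are not reproduced here, and the function names and input format are assumptions.

```python
from statistics import mean, stdev


def middle_window(duration_s, window_s=5.0):
    """Return (start, end) of a window centered in the phonation."""
    if duration_s <= window_s:
        return 0.0, duration_s
    start = (duration_s - window_s) / 2.0
    return start, start + window_s


def f0_summary(periods_s):
    """Mean and standard deviation of F0 (Hz) from period durations (s)."""
    f0 = [1.0 / p for p in periods_s]
    return mean(f0), stdev(f0)


def percent_jitter(periods_s):
    """Cycle-to-cycle period variability as a percentage of the mean period."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return 100.0 * mean(diffs) / mean(periods_s)
```

A perfectly periodic signal yields 0% jitter; periods alternating between 10 ms and 11 ms yield roughly 9.5%.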
3. Resonatory subsystem
The resonatory subsystem is evaluated using a Nasometer. This device consists of a headset with a baffle plate, which is positioned under the nose and separates the oral and nasal cavities. Two microphones that detect the oral and nasal acoustic signals are attached to opposite sides of the plate.
- The device is calibrated prior to each recording.
- The headset is placed on the head with the baffle plate resting above the upper lip and positioned parallel to the ground.
- The participant is asked to repeat one "nasal" (e.g., Mama made some lemon jam) and one "non-nasal" (e.g., Buy Bobby a puppy) sentence three times at a habitual speaking rate and loudness.
- The measured intensities of the voiced portion of the oral and nasal acoustic signals are converted into a nasalance score, which is defined as the ratio nasal / (nasal + oral) of acoustic energy and is expressed as a percentage. The nasalance score reflects the proportion of nasal acoustic energy relative to the total (nasal plus oral) acoustic energy in a speech stream8.
- The Nasometer software calculates numerous descriptive statistics from the nasalance waveform.
- Nasalance distance, which is derived by subtracting the mean nasalance calculated across oral sentences (BBP) from the mean nasalance for the nasal sentences (MMJ)9, can also be used as an index of velopharyngeal impairment.
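The two definitions above can be written out directly. The sketch below is a minimal Python illustration of the nasalance score and nasalance distance as defined in the text; it does not reproduce the Nasometer software, and the function names and the use of per-repetition scores as inputs are assumptions.

```python
from statistics import mean


def nasalance_score(nasal_energy, oral_energy):
    """Nasalance (%) = nasal / (nasal + oral) * 100."""
    total = nasal_energy + oral_energy
    if total <= 0:
        raise ValueError("total acoustic energy must be positive")
    return 100.0 * nasal_energy / total


def nasalance_distance(nasal_sentence_scores, oral_sentence_scores):
    """Mean nasalance of the nasal sentence (MMJ) minus that of the
    oral sentence (BBP), each averaged over its repetitions."""
    return mean(nasal_sentence_scores) - mean(oral_sentence_scores)
```

For instance, mean nasalance values of 62% for the nasal sentence and 12% for the oral sentence give a nasalance distance of 50 percentage points.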
4. Articulatory subsystem: Face
Facial (lip and jaw) movements are registered in 3D using a high resolution, optical motion capture system10. The infrared digital video cameras capture the positions of 15 reflective markers that are attached to each participant's head and face at specific anatomical landmarks. An acoustic speech signal is recorded simultaneously with speech kinematics.
- The system is calibrated prior to recordings according to the manufacturer's specifications.
- Four markers are attached to the forehead of the participant using a head band. Markers are also attached to the left and right eyebrow, the bridge and tip of the nose, the vermilion border of the upper and lower lip, the left and right corners of the mouth, and to three different locations on the chin. This is the typical marker array used in this protocol, but an unlimited number of markers can be used with this system.
- The participant is asked to read sentences and phrases (see Table 2) at their habitual speaking rate and loudness.
- A "rest" file recording is obtained and used in post-processing to normalize for differences in marker placement between sessions and for re-expression of the data relative to the consistent anatomically-based coordinate system as needed.
- During post-processing, movements of the facial markers are checked for tracking errors and head-corrected based on the subtraction of both the translational and rotational components of head movement.
- The data are loaded into SMASH, a Matlab-based software program developed in our lab. Within SMASH, the data are filtered and parsed. Peak movement speed is derived from each trace and used as the primary indicator of articulatory function for the jaw and lips. 3D speed is computed as the first-order derivative of each articulator's Euclidean distance time history in SMASH.
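The speed computation can be illustrated with a short Python sketch. It is not the SMASH implementation: it assumes head-corrected marker positions given as (x, y, z) tuples in millimeters, and it interprets the derivative of the Euclidean distance time history as the frame-to-frame 3D displacement multiplied by the sampling rate (120 Hz for the facial markers, per Table 1). The function names are hypothetical.

```python
import math


def speed_trace(positions, sampling_rate_hz=120.0):
    """3D speed (units per second) between consecutive frames.

    positions: sequence of (x, y, z) marker coordinates, one per frame.
    """
    if len(positions) < 2:
        raise ValueError("need at least two frames")
    # Displacement per frame times frames per second gives speed.
    return [
        math.dist(p0, p1) * sampling_rate_hz
        for p0, p1 in zip(positions, positions[1:])
    ]


def peak_speed(positions, sampling_rate_hz=120.0):
    """Peak movement speed over the parsed utterance."""
    return max(speed_trace(positions, sampling_rate_hz))
```

A marker that moves 5 mm between two consecutive frames at 120 Hz has an instantaneous speed of 600 mm/s under this finite-difference approximation.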
5. Articulatory subsystem: Tongue
Tongue tracking is accomplished using an electromagnetic tracking device (WAVE), which records the position and rotation of sensors that are attached to the tongue. Unlike the optical motion tracking that is used to record external, facial structures, the electromagnetic technology provides a way to accurately track tongue movements during speech11. The system uses a combination of 5- and 6-degree-of-freedom (5DOF and 6DOF) sensors to record articulatory motions in a calibrated volume (30 x 30 x 30 cm). Movement data and acoustic data are acquired simultaneously.
- Two sensors are attached to the articulators using dental glue (PeriAcryl Periodontal Adhesive). One reference is attached to the bridge of the nose to record head movements. One small 5DOF sensor (3D location and 2D angular measurements) is attached to the tongue at midline, approximately 2 cm posterior to the tongue tip.
- To obtain tongue movements that are independent from the underlying jaw, each participant is fitted with a pre-made 5 mm bite block. The bite block is made of non-toxic condensation putty (Henry Schein).
- The bite block is placed between molars on the side of the mouth. A string attached to the bite block is secured to the participant's face to prevent swallowing of the bite block.
- The participant is asked to read sentences and phrases (see Table 2).
- Tongue movements are recorded relative to head position.
- Post-acquisition, the data are transferred into SMASH, where they are low-pass filtered, parsed based on the vertical movement trace, and used to calculate 3D speed. The average and maximum speeds of movement during each utterance are reported as an index of disease-related change in this articulator.
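The post-processing chain for the tongue sensor (smooth, compute 3D speed, summarize) can be sketched as follows. This is an illustration under stated assumptions rather than the SMASH implementation: a centered moving average is substituted for the low-pass filter, positions are assumed to be already expressed relative to the head reference sensor, and the 100 Hz sampling rate is taken from Table 1. All function names are hypothetical.

```python
import math
from statistics import mean


def moving_average(values, window=5):
    """Centered moving average (a simple stand-in for a low-pass filter).

    The window shrinks at the edges so the output has the same length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    return [
        mean(values[max(0, i - half):min(len(values), i + half + 1)])
        for i in range(len(values))
    ]


def speed_summary(positions, sampling_rate_hz=100.0, window=5):
    """Return (average speed, maximum speed) for one parsed utterance."""
    xs, ys, zs = zip(*positions)
    smoothed = list(zip(
        moving_average(xs, window),
        moving_average(ys, window),
        moving_average(zs, window),
    ))
    speeds = [
        math.dist(a, b) * sampling_rate_hz
        for a, b in zip(smoothed, smoothed[1:])
    ]
    return mean(speeds), max(speeds)
```

In practice, a zero-phase digital low-pass filter with the cutoff listed in Table 1 would replace the moving average; the summary step is unchanged.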
II. System-level Assessment
In addition to the subsystem-level variables, speech intelligibility and speaking rate are measured. These measures are essential because they are the current clinical "gold standards" for characterizing bulbar speech performance. They provide an indication of the functional status of the speech production system as a whole and quantify the severity of speech impairment. These measures are obtained using the Sentence Intelligibility Test (SIT)12.
- Prior to recording, a random list of 10 sentences of increasing length (from 5 to 15 words) is generated by the SIT software.
- A microphone is placed on the head, approximately 15 cm from the mouth.
- The participant is asked to read the list at their habitual speaking rate and loudness. The sentences are digitally recorded at a sampling rate of 44.1 kHz with 16-bit resolution.
- Several trained judges who are unfamiliar to the participant transcribe the sentences orthographically and measure sentence durations.
- The SIT software automatically calculates speech intelligibility, which is reported as percent of words correctly transcribed out of the total number of words produced. Speaking rate is also reported as the number of words read per minute.
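The two system-level measures are simple ratios. The sketch below shows the arithmetic in Python; it is not the SIT software, and the function names, the averaging across judges, and the use of summed sentence durations are assumptions made for illustration.

```python
from statistics import mean


def intelligibility_percent(words_correct, words_total):
    """Percent of words correctly transcribed by one judge."""
    if words_total <= 0:
        raise ValueError("word count must be positive")
    return 100.0 * words_correct / words_total


def mean_intelligibility(words_correct_by_judge, words_total):
    """Average intelligibility (%) across judges."""
    return mean(
        intelligibility_percent(c, words_total)
        for c in words_correct_by_judge
    )


def speaking_rate_wpm(words_total, total_duration_s):
    """Speaking rate in words per minute from summed sentence durations."""
    if total_duration_s <= 0:
        raise ValueError("duration must be positive")
    return 60.0 * words_total / total_duration_s
```

For example, 99 correctly transcribed words out of 110 gives 90% intelligibility, and 110 words read in 44 s gives a speaking rate of 150 words per minute.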
|Subsystem||Equipment / Software||Signal||Acquisition Settings|
|Respiratory||Phonatory Aerodynamic System (PAS), KayPENTAX, Lincoln Park, NJ, USA||Acoustic, pressure, and flow||Sampling rate = 200 Hz, low-pass filtered at 30 Hz|
|Phonatory||Compact flash recorder (e.g., PMD660); professional quality microphone; SPL meter, Extech Instruments; software: MDVP, KayPENTAX||Acoustic||Sampling rate = 44.1 kHz, 16-bit linear PCM|
|Resonatory||Nasometer, Model 6400, KayPENTAX||Acoustic||Sampling rate = 11025 Hz|
|Articulatory: Face||Eagle Digital System, Motion Analysis Corp.||Kinematic and acoustic||Sampling rate = 120 Hz, low-pass filtered at 10 Hz|
|Articulatory: Tongue||WAVE, Northern Digital Inc, Canada||Kinematic and acoustic||Sampling rate = 100 Hz, low-pass filtered at 20 Hz|
Table 1: Instrumentation and acquisition settings for subsystem data collection
|Level||Task||Measurements||References & Norms|
|Respiratory||VC||Maximum expiratory lung volume||13|
|Respiratory||/pa/ x 7||Subglottal pressure||2, 3|
|Respiratory||Bamboo passage||% Pause time||6, 7, 14|
|Phonatory||Maximum phonation /a/||Maximum phonation duration, mean F0, jitter, SNR||15, 16, 17, 3|
|Resonatory||Mama made some lemon jam; Buy Bobby a puppy||Nasalance||18, 19|
|Articulatory: Face||Buy Bobby a puppy; Say _ again (bat, tide, keep, tool)||Movement speed||20, 21|
|Articulatory: Tongue||/ta/ x 5; Say doily again||Movement speed||20, 21|
|System-level||SIT, Sentences||Speech intelligibility and speaking rate||12|
Table 2: Measurements obtained for each subsystem and task
Appendix 1: Bamboo passage
Bamboo walls are getting to be very popular. They are strong, easy to use, and good looking. They provide a good background and create the mood in Japanese gardens. Bamboo is a grass, and is one of the most rapidly growing grasses in the world. Many varieties of bamboo are grown in Asia, although it is also grown in America. Last year we bought a new home and have been working on the flower gardens. In a few more days, we will be done with the bamboo wall in one of our gardens. We have really enjoyed the project.
Here we demonstrated a comprehensive protocol for the assessment of bulbar (speech) dysfunction in ALS. The data obtained from this protocol are used to gain a deeper understanding of how ALS affects speech production. These data are also used to identify the most sensitive measures of disease progression. Although this protocol is currently being employed for research, the findings from this research will be utilized to develop more cost-efficient and clinically feasible approaches to quantify bulbar involvement.
No conflicts of interest declared.
This work has been supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, Grant R01DC009890-02, Canada Foundation for Innovation (CFI-LOF #15704), and Connaught Foundation, University of Toronto. The authors would like to thank Cynthia Didion, Mili Kuruvilla, Krista Rudy, and Lori Synhorst for assistance with data collection and analysis, and Cara Ullman for creating video clips.
Animations were made by Blue Tree Publishing (http://www.bluetreepublishing.com/)
The SPA and SMASH software programs are Matlab-based and can be obtained by contacting Jordan Green at email@example.com.
Visit our labs:
Bulbar Function Laboratory (Sunnybrook Health Sciences Centre in Toronto, Canada):
Speech Production Laboratory (University of Nebraska-Lincoln):
|Phonatory Aerodynamic System (PAS)||KayPENTAX|
|Compact flash recorder||PMD660|
|Professional quality microphone|
|SPL meter||Extech Instruments|
|Eagle Digital System||Motion Analysis Corp.|
|WAVE||Northern Digital Inc, Canada|
- Ball, L. J., Willis, A., Beukelman, D. R., Pattee, G. L. A protocol for identification of early bulbar signs in amyotrophic lateral sclerosis. J. Neurol. Sci. 191, 43-53 (2001).
- Smitheran, J. R., Hixon, T. J. A clinical method for estimating laryngeal airway resistance during vowel production. J. Speech Hear. Disord. 46, 138-146 (1981).
- Baken, R. J., Orlikoff, R. F. Clinical Measurement of Speech and Voice. Singular Publishing Group. San Diego. (2000).
- Stathopoulos, E. T. Relationship between intraoral air pressure and vocal intensity in children and adults. J. Speech Hear. Res. 29, 71-74 (1986).
- Gauster, A., Yunusova, Y., Zajac, D. Effect of speaking rate on measures of velopharyngeal function in healthy speakers. Clin. Linguist. Phon. 24, 576-588 (2010).
- Green, J. R., Beukelman, D. R., Ball, L. J. Algorithmic estimation of pauses in extended speech samples of dysarthric and typical speech. J. Med. Speech Lang. Pathol. 12, 149-154 (2004).
- Wang, Y., Green, J. R., Nip, I. S. B., Kent, R. D., Kent, J. F., Ullman, C. Accuracy of perceptually-based and acoustically-based inspiratory loci in reading. Behavior Research Methods. Forthcoming.
- Fletcher, S. G. "Nasalance" vs. listener judgments of nasality. Cleft Palate J. 13, 31-44 (1976).
- Bressmann, T. Nasalance distance and ratio: Two new measures. Cleft Palate Craniofac. J. 37, 248-256 (2000).
- Green, J. R., Wilson, E. M. Spontaneous facial motility in infancy: A 3D kinematic analysis. Dev. Psychobiol. 48, 16-28 (2006).
- Yunusova, Y., Green, J., Mefferd, A. Accuracy assessment for AG500, electromagnetic articulograph. J. Speech Lang. Hear. Res. 52, 556-570 (2009).
- Beukelman, D., Yorkston, K., Hakel, M., Dorsey, M. Speech Intelligibility Test. Madonna Rehabilitation Hospital. Lincoln. (2007).
- Lyall, R. A., Donaldson, N., Polkey, M. I., Leigh, P. N., Moxham, J. Respiratory muscle strength and ventilatory failure in amyotrophic lateral sclerosis. Brain. 124, 2000-2013 (2001).
- Sapienza, C. M., Stathopoulos, E. T., Brown, S. Speech breathing during reading in women with vocal nodules. J. Voice. 11, 195-201 (1997).
- Hakkesteegt, M. M., Brocaar, M. P., Wieringa, M. H., Feenstra, L. Influence of age and gender on the dysphonia severity index. A study of normative values. Folia Phoniatr. Logop. 58, 264-273 (2006).
- Hakkesteegt, M. M., Brocaar, M. P., Wieringa, M. H., Feenstra, L. The relationship between perceptual evaluation and objective multiparametric evaluation of dysphonia severity. J. Voice. 4, 529-542 (2007).
- Robert, D., Pouget, J., Giovanni, A., Azulay, J. P., Triglia, J. M. Quantitative voice analysis in the assessment of bulbar involvement in amyotrophic lateral sclerosis. Acta Otolaryngol. 119, 724-731 (1999).
- Hardin, M. A., Van Demark, D. R., Morris, H. L., Payne, M. M. Correspondence between nasalance scores and listener judgments of hypernasality and hyponasality. Cleft Palate Craniofac. J. 29, 346-351 (1992).
- Delorey, R., Leeper, H. A., Hudson, A. J. Measures of velopharyngeal functioning in subgroups of individuals with amyotrophic lateral sclerosis. J. Med. Speech Lang. Pathol. 7, 19-31 (1999).
- Tasko, S. M., Westbury, J. R. Speed-curvature relations for speech-related articulatory movement. J. Phon. 32, 65-80 (2004).
- Yunusova, Y., Green, J. R., Lindstrom, M. J., Ball, L. J., Pattee, G. L., Zinman, L. Kinematics of disease progression in bulbar ALS. J. Commun. Disord. 43, 6-20 (2010).