A Protocol for Comprehensive Assessment of Bulbar Dysfunction in Amyotrophic Lateral Sclerosis (ALS)

Yana Yunusova; Jordan R. Green; Jun Wang; Gary Pattee; Lorne Zinman

doi:10.3791/2422

Medicine

Published: February 21, 2011 doi: 10.3791/2422

Yana Yunusova¹,², Jordan R. Green³, Jun Wang³, Gary Pattee⁴, Lorne Zinman²,⁵

¹Department of Speech-Language Pathology, University of Toronto, ²ALS/MN Clinic, Sunnybrook Health Sciences Centre, ³Department of Special Education and Communication Disorders, University of Nebraska-Lincoln, ⁴Department of Neurology, Munroe-Meyer Institute, University of Nebraska Medical Center, ⁵Department of Neurology, University of Toronto

Summary

Objective assessments of the physiological mechanisms that support speech are needed to monitor disease onset and progression in persons with ALS and to quantify treatment effects in clinical trials. In this video, we present a comprehensive, instrumentation-based protocol for quantifying speech motor performance in clinical populations.

Abstract

Improved methods for assessing bulbar impairment are necessary for expediting diagnosis of bulbar dysfunction in ALS, for predicting disease progression across speech subsystems, and for addressing the critical need for sensitive outcome measures for ongoing experimental treatment trials. To address this need, we are obtaining longitudinal profiles of bulbar impairment in 100 individuals based on a comprehensive instrumentation-based assessment that yields objective measures. Using instrumental approaches to quantify speech-related behaviors is particularly important in a field that has primarily relied on subjective, auditory-perceptual forms of speech assessment¹. Our assessment protocol measures performance across all of the speech subsystems: respiratory, phonatory (laryngeal), resonatory (velopharyngeal), and articulatory. The articulatory subsystem is divided into the facial components (jaw and lip) and the tongue. Prior research has suggested that each speech subsystem responds differently to neurological diseases such as ALS. The current protocol is designed to test the performance of each speech subsystem as independently of the other subsystems as possible. The speech subsystems are evaluated in the context of more global changes in speech performance. These system-level variables include speaking rate and speech intelligibility.

The protocol requires specialized instrumentation, and commercial and custom software. The respiratory, phonatory, and resonatory subsystems are evaluated using pressure-flow (aerodynamic) and acoustic methods. The articulatory subsystem is assessed using 3D motion tracking techniques. The objective measures that are used to quantify bulbar impairment have been well established in the speech literature and show sensitivity to changes in bulbar function with disease progression. The result of the assessment is a comprehensive, across-subsystem performance profile for each participant. The profile, when compared to the same measures obtained from healthy controls, is used for diagnostic purposes. Currently, we are testing the sensitivity and specificity of these measures for diagnosis of ALS and for predicting the rate of disease progression. In the long term, the more refined endophenotype of bulbar ALS derived from this work is expected to strengthen future efforts to identify the genetic loci of ALS and improve diagnostic and treatment specificity of the disease as a whole. The objective assessment that is demonstrated in this video may be used to assess a broad range of speech motor impairments, including those related to stroke, traumatic brain injury, multiple sclerosis, and Parkinson disease.

Protocol

I. Subsystem Analyses

1. Respiratory subsystem / Breathing for speech

The respiratory subsystem is evaluated using the Phonatory Aerodynamic System (PAS). The system allows for simultaneous recordings of oral pressure, airflow, and speech acoustics (see Table 1 for the list of equipment and manufacturers). A disposable face mask and a disposable pressure-sensing tube are necessary for recordings. Prior to recording, the flow and pressure channels are calibrated according to the manufacturer's specifications.

Vital Capacity (VC) is the maximum volume of air that is exhaled following maximum inhalation. VC is evaluated using a disposable face mask that is attached to the pneumotachograph.
1. The PAS "Vital Capacity" protocol is selected for the recording.
2. The participant is instructed to inhale as deeply as possible and then exhale maximally into the mask; the task is repeated three times.
3. Maximum expiratory volume is derived using PAS software.
Subglottal pressure (Ps) is the air pressure available in the lungs for production of "pressure" consonants. Ps is evaluated indirectly by measuring peak pressure in the mouth during the production of a syllable train²,³.
1. The PAS "Voicing Efficiency" protocol is selected for the recording.
2. To record the oral pressure during /pa/, the pressure-sensing tube is positioned inside the mouth on the tongue surface.
3. Nasal passages are occluded with a nose clip to eliminate potential nasal air flow escape.
4. The participant is instructed to inhale approximately twice their normal amount and say /pa/ into the face mask. The syllable /pa/ is repeated seven times on one exhalation, while maintaining consistent pitch and loudness. The rate is maintained at 1.5 syllables per second.
5. Peak oral pressure is measured for five (middle) repetitions of /pa/. An average of these five productions is obtained to represent Ps during speech.
6. Because Ps covaries with sound pressure level (SPL)⁴,⁵, the SPL is also collected for each syllable. It is used subsequently as a covariate during analyses.
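
The averaging in step 5 can be sketched as follows. This is a minimal illustration, not the PAS software's own computation: the function name `estimate_ps` and the example pressure values are hypothetical, and the units are whatever the pressure channel exports.

```python
from statistics import mean


def estimate_ps(peak_pressures):
    """Average the middle five of seven peak oral pressures.

    peak_pressures: one peak value per /pa/ repetition, in production order.
    """
    if len(peak_pressures) != 7:
        raise ValueError("expected peak pressures for seven /pa/ repetitions")
    middle_five = peak_pressures[1:6]  # drop the first and the last syllable
    return mean(middle_five)


# Hypothetical peak oral pressures, one per /pa/ repetition
peaks = [7.9, 6.8, 7.0, 7.2, 6.9, 7.1, 5.5]
ps_estimate = estimate_ps(peaks)
```

Dropping the first and last repetitions keeps the estimate away from syllables produced at the start and end of the exhalation.
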
Speech breathing is evaluated during connected speech while participants read a standard 60-word paragraph (Appendix 1) developed specifically for accurate, automatic pause-boundary detection⁶.
1. The PAS "Maximum Phonation" protocol is selected for the recording.
2. The airflow signal is collected using a disposable mask that is fit around the face.
3. The participant is instructed to read the paragraph at their normal comfortable speaking rate and loudness.
4. Airflow traces are exported into Speech-Pause Analysis (SPA)⁷, a custom-made Matlab program in which the pauses in connected speech are identified. Among other measures, the software calculates percent pause time, the percentage of the total passage reading time that is spent pausing.
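
Percent pause time can be illustrated with a simple threshold rule. The sketch below is not the SPA algorithm; it assumes the exported trace has already been reduced to a non-negative amplitude envelope in which pauses appear as low-amplitude runs, and the threshold and minimum pause duration are hypothetical parameters chosen for the example. The 200 Hz sampling rate is taken from Table 1.

```python
def percent_pause_time(envelope, sample_rate_hz, threshold, min_pause_s):
    """Return the percentage of samples that fall inside detected pauses.

    A pause is a run of samples below `threshold` lasting at least
    `min_pause_s` seconds; shorter low-amplitude runs are ignored.
    """
    if not envelope:
        raise ValueError("envelope is empty")
    min_samples = int(round(min_pause_s * sample_rate_hz))
    pause_samples = 0
    run_length = 0
    for value in envelope:
        if value < threshold:
            run_length += 1
        else:
            if run_length >= min_samples:
                pause_samples += run_length
            run_length = 0
    if run_length >= min_samples:  # a pause that runs to the end of the trace
        pause_samples += run_length
    return 100.0 * pause_samples / len(envelope)


# Synthetic envelope at 200 Hz: 2 s of speech, a 0.5 s pause, 1.5 s of speech
fs = 200
trace = [1.0] * 400 + [0.0] * 100 + [1.0] * 300
pct = percent_pause_time(trace, fs, threshold=0.1, min_pause_s=0.15)
```

The minimum-duration rule prevents brief low-amplitude events, such as stop closures, from being counted as pauses.
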

2. Phonatory subsystem

The phonatory subsystem is evaluated via voice recordings using high-quality acoustic recording equipment (Table 1).

The microphone is placed approximately 15 cm away from the mouth.
A nasal clip is used to eliminate the potential effect of velopharyngeal inadequacy on the quality of phonation.
The participant is asked to produce "Maximum Phonation". He or she is instructed to inhale the maximum amount of air and then to phonate /a/ at a normal pitch and loudness for as long as possible. This task is practiced at least once prior to recording. The importance of putting forth maximum effort is emphasized.
Maximum phonation duration is measured in seconds using the acoustic waveform.
The digitized acoustic waveform is loaded into the Multidimensional Voice Profile (MDVP) software for analysis. Measures of central tendency and variability of fundamental frequency (F0), noise-to-harmonic ratio (NHR) and percent jitter, among others, are obtained for the middle five seconds of the phonation interval.
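
The measures above can be illustrated with a short sketch. It is not the MDVP implementation: it assumes that glottal cycle periods have already been extracted from the waveform, uses the common period-to-period definition of percent jitter, and all function names and example values are hypothetical.

```python
from statistics import mean


def middle_window(duration_s, window_s=5.0):
    """Return (start, end) of a window centred in the phonation interval."""
    if duration_s < window_s:
        raise ValueError("phonation is shorter than the analysis window")
    start = (duration_s - window_s) / 2.0
    return start, start + window_s


def mean_f0(periods_s):
    """Mean fundamental frequency (Hz) from a list of cycle periods (s)."""
    return mean([1.0 / t for t in periods_s])


def percent_jitter(periods_s):
    """Mean absolute difference between consecutive periods, relative to
    the mean period, expressed as a percentage."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return 100.0 * mean(diffs) / mean(periods_s)


# Hypothetical values: a 12 s phonation and five consecutive cycle periods
start_s, end_s = middle_window(12.0)
periods = [0.0100, 0.0102, 0.0100, 0.0098, 0.0100]
jitter = percent_jitter(periods)
f0 = mean_f0(periods)
```

Centring the window excludes voice onset and offset, where F0 is typically least stable.
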

3. Resonatory subsystem

The resonatory subsystem is evaluated using the Nasometer. This device consists of a headset with a baffle plate, which is positioned under the nose and separates the acoustic output of the oral and nasal cavities. Two microphones that detect the oral and nasal acoustic signals are attached to opposite sides of the plate.

The device is calibrated prior to each recording.
The headset is placed on the head with the baffle plate resting above the upper lip and positioned parallel to the ground.
The participant is asked to repeat one "nasal" (e.g., Mama made some lemon jam) and one "non-nasal" (e.g., Buy Bobby a puppy) sentence three times at a habitual speaking rate and loudness.
The measured intensities of the voiced portion of the oral and nasal acoustic signals are converted into a nasalance score, which is defined as the ratio of nasal acoustic energy to total (nasal + oral) acoustic energy and is expressed as a percentage. The nasalance score reflects the relative proportion of nasal-to-oral acoustic energy in a speech stream⁸.
The Nasometer software calculates numerous descriptive statistics from the nasalance waveform.
Nasalance distance, which is derived by subtracting the mean nasalance calculated across oral sentences (BBP) from the mean nasalance for the nasal sentences (MMJ)⁹, can also be used as an index of velopharyngeal impairment.
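
The two definitions above reduce to simple arithmetic, sketched below. The function names and the example scores are hypothetical; in practice the Nasometer software reports these values directly.

```python
from statistics import mean


def nasalance_percent(nasal_energy, oral_energy):
    """Nasalance = nasal / (nasal + oral) acoustic energy, in percent."""
    total = nasal_energy + oral_energy
    if total <= 0:
        raise ValueError("total acoustic energy must be positive")
    return 100.0 * nasal_energy / total


def nasalance_distance(nasal_sentence_scores, oral_sentence_scores):
    """Mean nasalance of the nasal sentence minus that of the oral sentence."""
    return mean(nasal_sentence_scores) - mean(oral_sentence_scores)


# Hypothetical energies and per-repetition nasalance scores (%)
score = nasalance_percent(nasal_energy=3.0, oral_energy=1.0)
mmj_scores = [60.0, 62.0, 61.0]  # "Mama made some lemon jam", three repetitions
bbp_scores = [12.0, 14.0, 13.0]  # "Buy Bobby a puppy", three repetitions
distance = nasalance_distance(mmj_scores, bbp_scores)
```
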

4. Articulatory subsystem: Face

Facial (lip and jaw) movements are registered in 3D using a high resolution, optical motion capture system¹⁰. The infrared digital video cameras capture the positions of 15 reflective markers that are attached to each participant's head and face at specific anatomical landmarks. An acoustic speech signal is recorded simultaneously with speech kinematics.

The system is calibrated prior to recordings according to the manufacturer's specifications.
Four markers are attached to the forehead of the participant using a head band. Markers are also attached to the left and right eyebrow, the bridge and tip of the nose, the vermilion border of the upper and lower lip, the left and right corners of the mouth, and to three different locations on the chin. This is the typical marker array used in this protocol, but an unlimited number of markers can be used with this system.
The participant is asked to read sentences and phrases (see Table 2) at their habitual speaking rate and loudness.
A "rest" file recording is obtained and used in post-processing to normalize for differences in marker placement between sessions and for re-expression of the data relative to the consistent anatomically-based coordinate system as needed.
During post-processing, movements of the facial markers are checked for tracking errors and head-corrected based on the subtraction of both the translational and rotational components of head movement.
The data are loaded into SMASH, a Matlab-based software program developed in our lab. Within SMASH, the data are filtered and parsed. Peak movement speed is derived from each trace and used as the primary indicator of articulatory function for the jaw and lips. 3D speed is computed as the first-order derivative of each articulator's Euclidean distance time history in SMASH.
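
The speed computation can be sketched as follows. This is an illustration rather than the SMASH code: it interprets the derivative of the Euclidean distance time history as the frame-to-frame 3D displacement divided by the sample interval, and it assumes the marker positions have already been head-corrected and filtered. The function names and marker coordinates are hypothetical; the 120 Hz sampling rate is taken from Table 1.

```python
import math


def speed_3d(positions, sample_rate_hz):
    """Return 3D speed between successive samples.

    positions: list of (x, y, z) tuples for one marker, in sample order.
    Each speed is the Euclidean displacement times the sampling rate.
    """
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    return [
        math.dist(p, q) * sample_rate_hz
        for p, q in zip(positions, positions[1:])
    ]


def peak_speed(positions, sample_rate_hz):
    """Peak 3D speed over one movement trace."""
    return max(speed_3d(positions, sample_rate_hz))


# Hypothetical lower-lip marker positions (mm) sampled at 120 Hz
marker = [(0.0, 0.0, 0.0), (0.3, 0.4, 0.0), (0.3, 0.4, 0.0), (0.3, 0.4, 1.2)]
speeds = speed_3d(marker, 120)
peak = peak_speed(marker, 120)
```

With positions in millimetres, the resulting speeds are in millimetres per second.
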

5. Articulatory subsystem: Tongue

Tongue tracking is accomplished using an electromagnetic tracking device (WAVE), which records the position and rotation of sensors that are attached to the tongue. Unlike the optical motion tracking that is used to record external, facial structures, the electromagnetic technology provides a way to accurately track tongue movements during speech¹¹. The system uses a combination of 5- and 6-degree-of-freedom (5DOF and 6DOF) sensors to record articulatory motions in a calibrated volume (30 x 30 x 30 cm). Movement data and acoustic data are acquired simultaneously.

Two sensors are attached to the articulators using dental glue (PeriAcryl Periodontal Adhesive). One reference sensor is attached to the bridge of the nose to record head movements. One small 5DOF sensor (3D location and 2D angular measurements) is attached to the tongue at midline, approximately 2 cm posterior to the tongue tip.
To obtain tongue movements that are independent from the underlying jaw, each participant is fitted with a pre-made 5 mm bite block. The bite block is made of non-toxic condensation putty (Henry Schein).
The bite block is placed between molars on the side of the mouth. A string attached to the bite block is secured to the participant's face to prevent swallowing of the bite block.
The participant is asked to read sentences and phrases (see Table 2).
Tongue movements are recorded relative to head position.
Post-acquisition, the data are transferred into SMASH, where they are low-pass filtered, parsed based on the vertical movement trace, and used to calculate 3D speed. The average and maximum speed of movement during each utterance are reported as indices of disease-related change in this articulator.
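
The filtering and summary steps can be sketched as follows. The source does not state the filter design used in SMASH, so a first-order low-pass filter run forward and then backward (to cancel phase lag) stands in here as an assumption. The function names are hypothetical; the 100 Hz sampling rate and 20 Hz cutoff are taken from Table 1.

```python
import math


def lowpass_zero_phase(samples, sample_rate_hz, cutoff_hz):
    """First-order low-pass filter applied forward, then backward."""
    if not samples:
        raise ValueError("samples is empty")
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor of the discrete RC filter

    def one_pass(values):
        out = [values[0]]
        for v in values[1:]:
            out.append(out[-1] + alpha * (v - out[-1]))
        return out

    forward = one_pass(samples)
    backward = one_pass(forward[::-1])
    return backward[::-1]


def speed_summary(x, y, z, sample_rate_hz):
    """Return (mean, max) 3D speed from per-axis position traces."""
    points = list(zip(x, y, z))
    if len(points) < 2:
        raise ValueError("need at least two samples")
    speeds = [
        math.dist(p, q) * sample_rate_hz for p, q in zip(points, points[1:])
    ]
    return sum(speeds) / len(speeds), max(speeds)


# Hypothetical tongue-sensor trace (mm) at 100 Hz, moving along x only
fs, cutoff = 100, 20
x = lowpass_zero_phase([0.0, 1.0, 2.0, 3.0, 4.0], fs, cutoff)
y = [0.0] * len(x)
z = [0.0] * len(x)
mean_speed, max_speed = speed_summary(x, y, z, fs)
```

Filtering each axis before differentiation matters because differentiation amplifies high-frequency measurement noise.
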

II. System-level Assessment

In addition to the subsystem-level variables, speech intelligibility and speaking rate are measured. These measures are essential because they are the current clinical "gold standards" for characterizing bulbar speech performance. They provide an indication of the functional status of the speech production system as a whole and quantify the severity of speech impairment. These measures are obtained using the Sentence Intelligibility Test (SIT)¹².

Prior to recording, a random list of 10 sentences of increasing length (from 5 to 15 words) is generated by the SIT software.
A microphone is placed on the head, approximately 15 cm from the mouth.
The participant is asked to read the list at their habitual speaking rate and loudness. The sentences are digitally recorded at 44.1 kHz with 16-bit resolution.
Several trained judges who are unfamiliar with the participant transcribe the sentences orthographically and measure sentence durations.
The SIT software automatically calculates speech intelligibility, which is reported as percent of words correctly transcribed out of the total number of words produced. Speaking rate is also reported as the number of words read per minute.
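
Both system-level measures follow directly from the transcription counts and sentence durations. The sketch below is illustrative only and is not the SIT software; the function names and example numbers are hypothetical, and it shows the computation for a single judge.

```python
def intelligibility_percent(words_correct, words_total):
    """Percent of words correctly transcribed out of all words produced."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_correct / words_total


def speaking_rate_wpm(words_total, sentence_durations_s):
    """Words per minute from the summed sentence durations (seconds)."""
    total_s = sum(sentence_durations_s)
    if total_s <= 0:
        raise ValueError("total duration must be positive")
    return words_total * 60.0 / total_s


# Hypothetical session: 100 words produced, 90 transcribed correctly,
# with sentence durations summing to 40 s
durations = [2.0, 3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0]
intelligibility = intelligibility_percent(90, 100)
rate = speaking_rate_wpm(100, durations)
```
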

Subsystem	Equipment / Software	Signal	Acquisition Settings
Respiratory	Phonatory Aerodynamic System (PAS), KayPENTAX, Lincoln Park, NJ, USA	Acoustic, pressure, and flow	Sampling rate = 200 Hz; low-pass filter = 30 Hz
Phonatory	Compact flash recorder (e.g., PMD660); professional-quality microphone; SPL meter, Extech Instruments; software: MDVP, KayPENTAX	Acoustic	Sampling rate = 44.1 kHz; 16-bit linear PCM
Resonatory	Nasometer, Model 6400, KayPENTAX	Acoustic	Sampling rate = 11025 Hz
Articulatory: Face	Eagle Digital System, Motion Analysis Corp.	Kinematic and acoustic	Sampling rate = 120 Hz; low-pass filter = 10 Hz
Articulatory: Tongue	WAVE, Northern Digital Inc, Canada	Kinematic and acoustic	Sampling rate = 100 Hz; low-pass filter = 20 Hz

Table 1: Instrumentation and acquisition settings for subsystem data collection

Level	Task	Measurements	References & Norms
Respiratory	VC	Maximum expiratory lung volume	13
	/pa/ x 7	Subglottal pressure	2, 3
	Bamboo passage	% Pause time	6, 7, 14
Phonatory	Maximum phonation /a/	Maximum phonation duration, mean F0, jitter, NHR	15, 16, 17, 3
Resonatory	Mama made some lemon jam; Buy Bobby a puppy	Nasalance	18, 19
Articulatory: Face	Buy Bobby a puppy; Say _ again (bat, tide, keep, tool)	Movement speed	20, 21
Articulatory: Tongue	/ta/ x 5, Say doily again
System-level	SIT, Sentences	Speech intelligibility and speaking rate	12

Table 2: Measurements obtained for each subsystem and task

Appendix 1: Bamboo passage

Bamboo walls are getting to be very popular. They are strong, easy to use, and good looking. They provide a good background and create the mood in Japanese gardens. Bamboo is a grass, and is one of the most rapidly growing grasses in the world. Many varieties of bamboo are grown in Asia, although it is also grown in America. Last year we bought a new home and have been working on the flower gardens. In a few more days, we will be done with the bamboo wall in one of our gardens. We have really enjoyed the project.

Discussion

Here we demonstrated a comprehensive protocol for the assessment of bulbar (speech) dysfunction in ALS. The data obtained from this protocol are used to gain a deeper understanding of how ALS affects speech production. These data are also used to identify the most sensitive measures of disease progression. Although this protocol is currently being employed for research, the findings from this research will be utilized to develop more cost-efficient and clinically feasible approaches to quantify bulbar involvement.

Disclosures

No conflicts of interest declared.

Acknowledgments

This work has been supported by the National Institutes of Health, National Institute on Deafness and Other Communication Disorders, Grant R01DC009890-02, Canada Foundation for Innovation (CFI-LOF #15704), and Connaught Foundation, University of Toronto. The authors would like to thank Cynthia Didion, Mili Kuruvilla, Krista Rudy, and Lori Synhorst for assistance with data collection and analysis, and Cara Ullman for creating video clips.

Animations were made by Blue Tree Publishing (http://www.bluetreepublishing.com/)

The SPA and SMASH software is Matlab-based and can be obtained by contacting Jordan Green at jgreen4@unl.edu.

Visit our labs:

Bulbar Function Laboratory (Sunnybrook Health Sciences Centre in Toronto, Canada):
http://www.sunnybrook.ca/research/?page=sri_groups_bulb_home

Speech Production Laboratory (University of Nebraska-Lincoln):
http://spl.unl.edu

Materials

Name	Company	Catalog Number	Comments
Phonatory Aerodynamic System (PAS)	KayPENTAX
Compact flash recorder		PMD660
Professional quality microphone
SPL meter	Extech Instruments
MDVP	KayPENTAX
Nasometer	KayPENTAX	Model 6400
Eagle Digital System	Motion Analysis Corp.
WAVE	Northern Digital Inc, Canada