According to the Perceptual Assimilation Model (PAM), articulatory similarity/dissimilarity between sounds of the second language (L2) and the native language (L1) governs L2 learnability in adulthood and predicts L2 sound perception by naïve listeners. We performed behavioral and neurophysiological experiments on two groups of university students in the first and fifth years of the English language curriculum and on a group of naïve listeners. Categorization and discrimination tests, as well as the mismatch negativity (MMN) brain response to L2 sound changes, showed that the discriminatory capabilities of the students did not significantly differ from those of the naïve subjects. In line with PAM, we extend the findings of previous behavioral studies by showing that, at the neural level, classroom instruction in adulthood relies on the assimilation of L2 vowels to L1 phoneme categories and does not trigger improvement in L2 phonetic discrimination. Implications for L2 classroom teaching practices are discussed.
Identifying children at risk for reading problems or dyslexia at kindergarten age could improve support for beginning readers. Brain event-related potentials (ERPs) were measured for temporally complex pseudowords and corresponding non-speech stimuli in 6.5-year-old children who participated in behavioral literacy tests again at 9 years of age in the second grade. Children who had reading problems at school age had larger N250 responses to speech and non-speech stimuli, particularly over the left hemisphere. The brain responses also correlated with reading skills. The results suggest that atypical auditory and speech processing constitute a neural-level risk factor for future reading problems. [Supplementary material is available for this article. Go to the publisher's online edition of Developmental Neuropsychology for the following free supplemental resources: sound files used in the experiments; three speech sounds and corresponding non-speech sounds with short, intermediate, and long gaps.]
All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. To address this problem, several all-pole modeling methods robust with respect to the fundamental frequency have been proposed. This study compares five such previously known methods and introduces a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP), in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract when optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract, leading to less biased formant estimates. In experiments with synthetic vowels created with a physical modeling approach, WLP-AME yielded improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., the relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions at different pitches than those computed by conventional LP.
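The core of temporally weighted LP can be sketched as follows. This is a generic illustration under the assumption of an arbitrary per-sample weight; the AME weighting function itself, which attenuates the samples around the main glottal excitation, is only summarized in the abstract and is not reproduced here, and the function name is ours.

```python
import numpy as np

def weighted_lp(x, order, w):
    """Temporally weighted linear prediction (covariance-style).

    Minimizes sum_n w[n] * e[n]**2, where e[n] is the prediction error
    x[n] - sum_k a_k * x[n-k].  With uniform weights this reduces to
    ordinary covariance LP; an AME-style weight would instead dip near
    the main (glottal) excitation instants.

    x : signal frame, order : predictor order,
    w : per-sample weights, same length as x.
    Returns the predictor coefficients a_1..a_order.
    """
    N = len(x)
    R = np.zeros((order, order))  # weighted covariance matrix
    r = np.zeros(order)           # weighted correlation vector
    for n in range(order, N):
        past = x[n - order:n][::-1]          # x[n-1], ..., x[n-order]
        R += w[n] * np.outer(past, past)
        r += w[n] * x[n] * past
    return np.linalg.solve(R, r)
```

For example, a purely geometric signal x[n] = 0.5**n is perfectly predicted by a first-order model, and `weighted_lp(x, 1, np.ones(len(x)))` recovers the coefficient 0.5 exactly.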
This study evaluated whether a linguistic multi-feature paradigm with five types of speech-sound changes and novel sounds is a feasible neurophysiologic measure of central auditory processing in toddlers. Participants were 18 typically developing 2-year-old children. Syllable stimuli elicited significant obligatory responses, and syllable changes elicited significant mismatch negativities (MMNs), which suggests that toddlers can discriminate auditory features from an alternating speech-sound stream. The MMNs were lateralized similarly to those found earlier in adults. Furthermore, novel sounds elicited a significant novelty P3 response. Thus, the linguistic multi-feature paradigm with novel sounds is feasible for the concurrent investigation of the different stages of central auditory processing in 2-year-old children, ranging from pre-attentive encoding and discrimination of stimuli to attentional mechanisms, using speech-like stimulus sequences. In conclusion, this time-efficient paradigm can be applied to investigating central auditory development and impairments in toddlers, in whom developmental changes in speech-related cortical functions and language are rapid.
High vocal effort has characteristic acoustic effects on speech. This study focuses on the utilization of this information by human listeners and by a machine-based detection system in the task of detecting shouted speech in the presence of noise. Both female and male speakers read Finnish sentences using normal and shouted voice in controlled conditions, with the sound pressure level recorded. The speech material was artificially corrupted by noise and supplemented with pure noise. The human performance level was statistically evaluated by a listening test in which the subjects labeled noisy samples according to whether shouting was heard or not. A Bayesian detection system was constructed and statistically evaluated. Its performance was compared against that of the human listeners while substituting different spectrum analysis methods in the feature extraction stage. Using features capable of taking into account the spectral fine structure (i.e., the fundamental frequency and its harmonics), the machine reached the detection level of humans even in the noisiest conditions. In the listening test, male listeners detected shouted speech significantly better than female listeners, especially for speakers who made a smaller vocal-effort increase when shouting.
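In rough outline, a Bayesian detector of this kind can be built from class-conditional likelihood models over acoustic features. The sketch below is a minimal illustration assuming diagonal Gaussian likelihoods and equal priors; it is not the feature set or model structure actually used in the study, and the class names are ours.

```python
import numpy as np

class GaussianDetector:
    """Two-class Bayesian detector with diagonal Gaussian likelihoods."""

    def fit(self, X_normal, X_shout):
        # Estimate per-class mean and variance from feature matrices
        # (rows = samples, columns = features).
        self.stats = [(X.mean(0), X.var(0) + 1e-8)
                      for X in (X_normal, X_shout)]
        return self

    def log_lik(self, x, k):
        # Log-likelihood of feature vector x under class k.
        mu, var = self.stats[k]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    def predict(self, x):
        # With equal priors, decide "shout" (1) when the
        # log-likelihood ratio is positive.
        return int(self.log_lik(x, 1) > self.log_lik(x, 0))
```

In practice the features would be derived from a spectrum analysis front end, which is where the study's comparison of spectrum estimation methods would enter.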
The aim of this study was to investigate whether objective quantities extracted from the speech pressure waveform underlie inaudible changes in the symptoms of the vocal organ. This was done by analyzing 180 voice samples obtained from nine subjects (five females and four males) before and after exposure to a placebo substance (lactose) and to an organic dust substance. Acoustical analysis of the voice samples was performed using glottal inverse filtering. Results showed that the values of the primary open quotient and the primary speed quotient changed significantly (P < 0.05), as did the amplitude quotient (P < 0.01). Exposure to lactose resulted in significant changes in the secondary open quotient (P < 0.05), but these were opposite to the effects found for exposure to organic dust. Modeling of the vocal tract as cross-sectional planes revealed that the plane immediately above the vocal folds correlates inversely with the feeling that the voice is tense or that an extra effort is needed when speaking, as well as with a feeling of shortness of breath or the need to gasp for air. Such results may point to acoustically detectable subclinical changes in the vocal organ that the subjects themselves feel while they remain perceptually undetected by others.
Most speech sounds are periodic due to the vibration of the vocal folds. Non-invasive studies of the human brain have revealed a periodicity-sensitive population in the auditory cortex which might contribute to the encoding of speech periodicity. Since the periodicity of natural speech varies from (almost) periodic to aperiodic, one may argue that speech aperiodicity could similarly be represented by a dedicated neuron population. In the current magnetoencephalography study, cortical sensitivity to periodicity was probed with natural periodic vowels and their aperiodic counterparts in a stimulus-specific adaptation paradigm. The effects of intervening adaptor stimuli on the N1m elicited by the probe stimuli (the actual effective stimuli) were studied under interstimulus intervals (ISIs) of 800 and 200 ms. The results indicated a periodicity-dependent release from adaptation which was observed for aperiodic probes alternating with periodic adaptors under both ISIs. Such release from adaptation can be attributed to the activation of a distinct neural population responsive to aperiodic (probe) but not to periodic (adaptor) stimuli. Thus, the current results suggest that the aperiodicity of speech sounds may be represented not only by decreased activation of the periodicity-sensitive population but, additionally, by the activation of a distinct cortical population responsive to speech aperiodicity.
Cortical sensitivity to the periodicity of speech sounds has been evidenced by larger, more anterior responses to periodic than to aperiodic vowels in several non-invasive studies of the human brain. The current study investigated the temporal integration underlying the cortical sensitivity to speech periodicity by studying the increase in periodicity-specific cortical activation with growing stimulus duration. Periodicity-specific activation was estimated from magnetoencephalography as the differences between the N1m responses elicited by periodic and aperiodic vowel stimuli. The duration of the vowel stimuli with a fundamental frequency (F0=106 Hz) representative of typical male speech was varied in units corresponding to the vowel fundamental period (9.4 ms) and ranged from one to ten units. Cortical sensitivity to speech periodicity, as reflected by larger and more anterior responses to periodic than to aperiodic stimuli, was observed when stimulus duration was 3 cycles or more. Further, for stimulus durations of 5 cycles and above, response latency was shorter for the periodic than for the aperiodic stimuli. Together the current results define a temporal window of integration for the periodicity of speech sounds in the F0 range of typical male speech. The length of this window is 3-5 cycles, or 30-50 ms.
The cortical mechanisms underlying human speech perception in acoustically adverse conditions remain largely unknown. Besides distortions from external sources, degradation of the acoustic structure of the sound itself poses further demands on perceptual mechanisms. We conducted a magnetoencephalography (MEG) study to reveal whether the perceptual differences between these distortions are reflected in cortically generated auditory evoked fields (AEFs). To mimic the degradation of the internal structure of sound and external distortion, we degraded speech sounds by reducing the amplitude resolution of the signal waveform and by using additive noise, respectively. Since both distortion types increase the relative strength of high frequencies in the signal spectrum, we also used versions of the stimuli which were low-pass filtered to match the tilted spectral envelope of the undistorted speech sound. This enabled us to examine whether the changes in the overall spectral shape of the stimuli affect the AEFs. We found that the auditory N1m response was substantially enhanced as the amplitude resolution was reduced. In contrast, the N1m was insensitive to distorted speech with additive noise. Changing the spectral envelope had no effect on the N1m. We propose that the observed amplitude enhancements are due to an increase in noisy spectral harmonics produced by the reduction of the amplitude resolution, which activates the periodicity-sensitive neuronal populations participating in pitch extraction processes. The current findings suggest that the auditory cortex processes speech sounds in a differential manner when the internal structure of sound is degraded compared with the speech distorted by external noise.
Early auditory experiences are a prerequisite for speech and language acquisition. In healthy children, phoneme discrimination abilities improve for native and degrade for unfamiliar, socially irrelevant phoneme contrasts between 6 and 12 months of age, as the brain tunes itself to, and specializes in, the native spoken language. This process is known as perceptual narrowing and has been found to predict normal native language acquisition. Prematurely born infants are known to be at an elevated risk for later language problems, but it remains unclear whether these problems relate to early perceptual narrowing. To address this question, we investigated early neurophysiological phoneme discrimination abilities and later language skills in prematurely born infants and in healthy, full-term infants.
To investigate the effects of cortical ischemic stroke and aphasic symptoms on auditory processing abilities in humans as indicated by the transient brain response, a recently documented cortical deflection which has been shown to accurately predict behavioral sound detection.
Recent studies have shown that the human right-hemispheric auditory cortex is particularly sensitive to reductions in sound quality, with an increase in distortion resulting in an amplification of the auditory N1m response measured with magnetoencephalography (MEG). Here, we examined whether this sensitivity is specific to the processing of the acoustic properties of speech or whether it can also be observed in the processing of sounds with a simple spectral structure. We degraded speech stimuli (the vowel /a/), complex non-speech stimuli (a composite of five sinusoids), and sinusoidal tones by decreasing the amplitude resolution of the signal waveform. The amplitude resolution was impoverished by reducing the number of bits used to represent the signal samples. Auditory evoked magnetic fields (AEFs) were measured in the left and right hemispheres of sixteen healthy subjects.
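Reducing the amplitude resolution in this way amounts to requantizing each sample with fewer bits. A minimal sketch with a uniform quantizer follows; the exact quantizer used in the study is not specified in the abstract, so this is an illustrative assumption.

```python
import numpy as np

def reduce_bits(x, n_bits):
    """Requantize a signal in [-1, 1] to n_bits of amplitude resolution.

    Uniform mid-tread quantizer: each sample is rounded to the nearest
    of 2**n_bits levels, which adds quantization distortion whose
    relative strength grows as n_bits decreases.
    """
    q = 2 ** (n_bits - 1)
    return np.round(np.asarray(x, dtype=float) * q) / q
```

At 8 bits the quantization error per sample is bounded by half a step (1/256 of full scale); at very low bit depths the distortion becomes clearly audible.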
The aim of the present study was to determine differences in the cortical processing of consonant-vowel syllables and acoustically matched non-speech sounds, as well as of novel human and nonhuman sounds. Event-related potentials (ERPs) were recorded to vowel, vowel-duration, consonant, syllable-intensity, and frequency changes, as well as to the corresponding changes in their non-speech counterparts, with the multi-feature mismatch negativity (MMN) paradigm. Enhanced responses to linguistically relevant deviants were expected. Indeed, the vowel and frequency deviants elicited significantly larger MMNs in the speech than in the non-speech condition. A minimum-norm source localization algorithm was applied to determine hemispheric asymmetry in the responses. Language-relevant deviants (vowel, duration, and, to a lesser degree, frequency) showed higher activation in the left than in the right hemisphere in the speech condition. Novel sounds elicited novelty P3 waves, whose amplitude for nonhuman sounds was larger in the speech than in the non-speech condition. The current MMN results imply enhanced processing of linguistically relevant information at the pre-attentive stage and thereby support the domain-specific model of speech perception.
The development of native-like memory traces for foreign phonemes can be measured using the mismatch negativity (MMN), a component of the auditory event-related potential. Previous studies have shown that the MMN is sensitive to changes in neural organization depending on language experience. Here we measured the MMN response in 5- to 6-year-old monolingual German and bilingual Turkish-German kindergarten children growing up in Germany. The MMN was investigated for a German vowel contrast and for a vowel contrast that exists in both Turkish and German. The results show that, compared to a German control group, the MMN response to the German vowel contrast is less robust in the Turkish-German children. The response to the contrast that exists in both languages does not differ between groups. Overall, the results suggest that the Turkish-German children have not yet fully acquired the German phonetic inventory despite living in Germany since birth and being immersed in a German-speaking environment.
The aim of the study was to investigate the effects of aging on human cortical auditory processing of rising-intensity sinusoids and speech sounds. We also aimed to evaluate the suitability of a recently discovered transient brain response for applied research.
In this study, we addressed whether a new, fast multi-feature mismatch negativity (MMN) paradigm can be used for determining central auditory discrimination accuracy for several acoustic and phonetic changes in speech sounds. We recorded the MMNs in the multi-feature paradigm to changes in syllable intensity, frequency, and vowel length, as well as to consonant and vowel changes, and compared these MMNs to those obtained with the traditional oddball paradigm. In addition, we examined the reliability of the multi-feature paradigm by repeating the recordings with the same subjects 1-7 days after the first recordings. The MMNs recorded with the multi-feature paradigm were similar to those obtained with the oddball paradigm. Furthermore, only minor differences were observed in the MMN amplitudes across the two recording sessions. Thus, the new multi-feature paradigm with speech stimuli provides results similar to those of the oddball paradigm, and the MMNs recorded with it are reproducible.
Recent single-neuron recordings in monkeys and magnetoencephalography (MEG) data on humans suggest that auditory space is represented in cortex as a population rate code whereby spatial receptive fields are wide and centered at locations to the far left or right of the subject. To explore the details of this code in the human brain, we conducted an MEG study utilizing realistic spatial sound stimuli presented in a stimulus-specific adaptation paradigm. In this paradigm, the spatial selectivity of cortical neurons is measured as the effect the location of a preceding adaptor has on the response to a subsequent probe sound. Two types of stimuli were used: a wideband noise sound and a speech sound. The cortical hemispheres differed in the effects the adaptors had on the response to a probe sound presented in front of the subject. The right-hemispheric responses were attenuated more by an adaptor to the left than by an adaptor to the right of the subject. In contrast, the left-hemispheric responses were similarly affected by adaptors in these two locations. When interpreted in terms of single-neuron spatial receptive fields, these results support a population rate code model where neurons in the right hemisphere are more often tuned to the left than to the right of the perceiver while in the left hemisphere these two neuronal populations are of equal size.
Closed phase (CP) covariance analysis is a widely used glottal inverse filtering method based on estimating the vocal tract during the glottal CP. Since the length of the CP is typically short, the vocal tract computation with linear prediction (LP) is vulnerable to the position of the covariance frame. The present study proposes a modification of the CP algorithm addressing two issues. First, and most importantly, the computation of the vocal tract model is changed from conventional LP to a form in which a constraint is imposed on the dc gain of the inverse filter during filter optimization. With this constraint, LP analysis is more likely to yield vocal tract models that are justified by the source-filter theory; that is, they show complex conjugate roots in the formant regions rather than unrealistic resonances at low frequencies. Second, the new CP method utilizes a minimum-phase inverse filter. The method was evaluated using synthetic vowels produced by physical modeling, as well as natural speech. The results show that the algorithm improves the performance of CP-type inverse filtering and its robustness with respect to the position of the covariance frame.
Aperiodicity of speech alters voice quality. The current study investigated the relationship between vowel aperiodicity and human auditory cortical N1m and sustained field (SF) responses with magnetoencephalography. Behavioral estimates of vocal roughness perception were also collected. Stimulus aperiodicity was experimentally varied by increasing vocal jitter with techniques that model the mechanisms of natural speech production. N1m and SF responses for vowels with high vocal jitter were reduced in amplitude as compared to those elicited by vowels of normal vocal periodicity. Behavioral results indicated that the ratings of vocal roughness increased up to the highest jitter values. Based on these findings, the representation of vocal jitter in the auditory cortex is suggested to be formed on the basis of reduced activity in periodicity-sensitive neural populations.
Closed-phase (CP) covariance analysis is a glottal inverse filtering method based on the estimation of the vocal tract with linear prediction (LP) during the closed phase of the vocal fold vibration cycle. Since the closed phase is typically short, the analysis is vulnerable with respect to the extraction of the covariance frame position. The present study proposes a modified CP algorithm based on imposing certain predefined values on the gains of the vocal tract inverse filter at angular frequencies of 0 and pi in optimizing filter coefficients. With these constraints, vocal tract models are less prone to show false low-frequency roots. Experiments show that the algorithm improves the robustness of the CP analysis on the covariance frame position.
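The gain constraints described above can be sketched as an equality-constrained least-squares problem solved via its KKT system. The target gain values and the function name below are illustrative assumptions, since the abstract does not state the predefined values actually imposed.

```python
import numpy as np

def constrained_cp_lp(x, order, dc_gain=1.0, nyq_gain=1.0):
    """Covariance-style LP with gain constraints on the inverse filter
    A(z) = 1 + a_1 z**-1 + ... + a_p z**-p at omega = 0 and omega = pi.

    Constraints: A(1) = dc_gain and A(-1) = nyq_gain, which suppress
    false low-frequency roots in the resulting vocal tract model.
    """
    N, p = len(x), order
    # Least-squares system for e[n] = x[n] + sum_k a_k x[n-k]:
    # columns of X hold the delayed samples x[n-k], k = 1..p.
    X = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])
    y = -x[p:N]
    # Linear equality constraints C a = d:
    C = np.vstack([np.ones(p),                      # sum a_k = dc_gain - 1
                   (-1.0) ** np.arange(1, p + 1)])  # alternating sum at pi
    d = np.array([dc_gain - 1.0, nyq_gain - 1.0])
    # KKT system for equality-constrained least squares.
    A = X.T @ X
    K = np.block([[A, C.T], [C, np.zeros((2, 2))]])
    rhs = np.concatenate([X.T @ y, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:p]                                  # coefficients a_1..a_p
```

With the default unit gains, the returned coefficients satisfy sum(a_k) = 0 and the alternating sum equals 0, i.e., the inverse filter passes dc and the Nyquist frequency unchanged.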
The aim of this pilot study was to investigate acute voice and throat symptoms related to organic dust exposure among nine subjects with suspected occupational rhinitis or asthma. Subjective voice and throat symptoms were recorded before and after an occupational exposure test. In addition, the study included perceptual assessment of the subjects' voice samples recorded before and after the exposure tests. The results showed a number of statistically significant changes in voice and throat symptoms based on the subjects' own assessments. These symptoms included a hoarse, husky, or tense voice, the need for extra effort when speaking, and difficulty in starting phonation (P < 0.05). Other significant symptoms included a feeling of shortness of breath or the need to gasp for air and a feeling that the voice is weak or does not resonate (P < 0.01). Such changes were not, however, detected by voice clinicians in the listening test of the subjects' voice samples recorded before and after the exposure. These results suggest that the larynx reacts to organic dust with symptoms that are felt by the patient rather than heard by the voice clinician. The voice disorder in such cases is a diagnosis based on symptoms reported by the subjects.
We examined 10- to 12-year-old elementary school children's ability to preattentively process sound durations in music and speech stimuli. In total, 40 children had either advanced foreign-language production skills and higher musical aptitude, or less advanced results in both musicality and linguistic tests. Event-related potential (ERP) recordings of the mismatch negativity (MMN) show that duration changes in musical sounds are processed more prominently and accurately than changes in speech sounds. Moreover, children with advanced pronunciation and musicality skills displayed enhanced MMNs to duration changes in both speech and musical sounds. Thus, our study provides further evidence for the claim that musical aptitude and linguistic skills are interconnected, and that the musical features of the stimuli could have a preponderant role in preattentive duration processing.
Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.
A linguistic multi-feature mismatch negativity (MMN) paradigm with five types of changes (vowel, vowel-duration, consonant, frequency (F0), and intensity) in Finnish syllables was used to determine speech-sound discrimination in 17 normally-developing 6-year-old children. The MMNs for vowel and vowel-duration were also recorded in an oddball condition in order to compare the two paradigms. Similar MMNs in the two paradigms would suggest that they tap the same processes. This would promote the usefulness of the more time-efficient multi-feature paradigm for future studies in children.
Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
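In outline, such a post-filter can be realized as a noise-level-dependent pre-emphasis stage. The sketch below uses a first-order high-pass emphasis and linear interpolation between two SNR thresholds as illustrative simplifications; the thresholds, filter shape, and function name are assumptions, not the method's actual parameters.

```python
import numpy as np

def adaptive_postfilter(x, snr_db, snr_flat=20.0, snr_full=0.0, beta_max=0.8):
    """SNR-adaptive high-emphasis post-filter (simplified sketch).

    Above snr_flat dB the response is flat (beta = 0, speech passes
    unchanged); as the SNR drops toward snr_full dB, the emphasis
    coefficient grows linearly to beta_max, shifting energy from low
    frequencies (around the first formant) toward higher frequencies.
    """
    t = (snr_flat - snr_db) / (snr_flat - snr_full)
    beta = beta_max * float(np.clip(t, 0.0, 1.0))
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - beta * x[:-1]     # first-order high-pass emphasis
    return y
```

The key property is graceful degradation: in clean conditions the filter is transparent, and processing strength increases only as the ambient noise level rises.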
Artificial bandwidth extension methods have been developed to improve the quality and intelligibility of narrowband telephone speech and to reduce its difference from wideband speech. Such methods have commonly been evaluated with objective measures or subjective listening-only tests, but conversational evaluations have been rare. This article presents a conversational evaluation of two methods for the artificial bandwidth extension of telephone speech. Bandwidth-extended narrowband speech is compared with narrowband and wideband speech in a test setting that includes a simulated telephone connection, realistic conversation tasks, and various background noise conditions. The responses of the subjects indicate that speech processed with one of the methods is preferred to narrowband speech in noise, but wideband speech is superior to both narrowband and bandwidth-extended speech. Bandwidth extension was found to be beneficial for telephone conversation in noisy listening conditions.
Human speech perception is highly resilient to acoustic distortions. In addition to distortions from external sound sources, degradation of the acoustic structure of the sound itself can substantially reduce the intelligibility of speech. The degradation of the internal structure of speech happens, for example, when the digital representation of the signal is impoverished by reducing its amplitude resolution. Further, the perception of speech is also influenced by whether the distortion is transient, coinciding with speech, or is heard continuously in the background. However, the complex effects of the acoustic structure and continuity of the distortion on the cortical processing of degraded speech are unclear. In the present magnetoencephalography study, we investigated how the cortical processing of degraded speech sounds as measured through the auditory N1m response is affected by variation of both the distortion type (internal, external) and the continuity of distortion (transient, continuous). We found that when the distortion was continuous, the N1m was significantly delayed, regardless of the type of distortion. The N1m amplitude, in turn, was affected only when speech sounds were degraded with transient internal distortion, which resulted in larger response amplitudes. The results suggest that external and internal distortions of speech result in divergent patterns of activity in the auditory cortex, and that the effects are modulated by the temporal continuity of the distortion.
Journal of Visualized Experiments