The McGurk Effect - JoVE Wordpress Development

JoVE Science Education
Sensation and Perception

The McGurk Effect

Overview

Source: Laboratory of Jonathan Flombaum, Johns Hopkins University

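Spoken language, a singular human achievement, relies heavily on specialized perceptual mechanisms. One important feature of language perception mechanisms is that they rely simultaneously on auditory and visual information. This makes sense because, until modern times, a person could expect most language to be heard in face-to-face interactions. And because producing specific speech sounds requires precise articulation, the mouth can supply good visual information about what someone is saying. In fact, with an up-close and unobstructed view of someone's face, the mouth can often supply a better visual signal than speech supplies an auditory signal. As a result, the human brain favors visual input and uses it to resolve the ambiguity inherent in spoken language.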

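This reliance on visual input to interpret sound was described by Harry McGurk and John MacDonald in their 1976 paper, Hearing lips and seeing voices.¹ In that paper, they described an illusion that arises through a mismatch between a sound recording and a video recording. That illusion has become known as the McGurk effect. This video shows how to produce and interpret the McGurk effect.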

Procedure

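1. Stimuli: To make McGurk effect stimuli you will need a video camera; the kind on a smartphone is fine. You will also need a computer to control the presentation of the videos to a naive subject. Point the camera directly at yourself so that your head fills the display. Make four recordings. Each should be 10 s long. In each of the four recordings, repeat a word 10 times at a rate of about 1/s…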

Results

Remember, the sounds played to your observer are either the words bane or pan. But in the accompanying videos, the words being articulated are gain and can respectively. So which words will people actually hear? The answer is most often none of those four. Instead, the typical result is that observers in the bane/gain condition will hear the word Dane. And observers in the pan/can condition will hear the word tan.
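One way to tabulate what observers report is sketched below in Python. It is only an illustration: the condition names, the function names, and the category labels ("fused", "auditory", "visual", "other") are assumptions introduced here, not part of the original procedure.

```python
from collections import Counter

# Conditions as described in the text: the word played aloud (auditory),
# the word mouthed on screen (visual), and the typically reported fusion.
# The dictionary layout and key names are hypothetical.
CONDITIONS = {
    "bane/gain": {"auditory": "bane", "visual": "gain", "fused": "dane"},
    "pan/can": {"auditory": "pan", "visual": "can", "fused": "tan"},
}


def classify_response(condition: str, response: str) -> str:
    """Label a reported word as fused, auditory, visual, or other."""
    words = CONDITIONS[condition]
    heard = response.strip().lower()
    for label in ("fused", "auditory", "visual"):
        if heard == words[label]:
            return label
    return "other"


def tally(condition: str, responses: list[str]) -> Counter:
    """Count how many responses fall into each category."""
    return Counter(classify_response(condition, r) for r in responses)
```

For example, `tally("bane/gain", ["Dane", "bane", "Dane"])` counts two fused reports and one auditory report. These responses are made up for illustration and are not data from any study.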

To understand why, we need to know a little about how phonemes are produced. A phoneme is a minimal unit of speech sound. The words bane and gain have the same phonemes in all positions but the first. In the word bane the first phoneme is a b sound, denoted /b/. In the word gain it is the sound /g/. The remaining sounds are the same, which is why the words rhyme. Figure 1 breaks down the McGurk effect in terms of the initial phonemes in these examples. When /g/ is shown and /b/ is played, people hear /d/. In other words, the word Dane also rhymes with bane and gain, with a one-phoneme difference right at the beginning.

Figure 1: The McGurk effect happens when there is a mismatch between a phoneme that is articulated in a visual presentation and a different phoneme that is played simultaneously through speakers. With phonemes that share certain articulation properties, the result heard may not match either of the mismatching stimuli. Instead, the mismatch causes a third sound to be heard. Specifically, a visual /g/ with an auditory /b/ causes the phoneme /d/ to be heard. This is why a visual gain with an auditory bane results in Dane being heard. Similarly, a visual /k/ with an auditory /p/ leads the sound /t/ to be heard. That's why can/pan produces tan in the McGurk effect.

Why do conflicting /b/ and /g/ produce a /d/ specifically? Well, /b/, /g/, and /d/ are really not that different from one another, especially in terms of how they are produced. All three basically involve moving the same amount of air from a person's larynx through their mouth, with just a difference in where the speaker places a small obstruction. When someone makes a /b/ sound, they use their lips to obstruct the air; this is known as a labial point of articulation. For a /g/ sound, the point of articulation is palatal; it is far in the back of the mouth. And for a /d/ sound, the point of articulation is known as dental because people obstruct airflow through the mouth by touching their tongues to the top teeth. Figure 2 shows the relative points of articulation for the six phonemes in the McGurk effect.

Figure 2: Humans produce sounds by moving air through their throats and mouths. This involves vibrations in the larynx. A given set of vibrations produced in the larynx can yield multiple different phonemes, depending on how the flow of air is obstructed. The place where an obstruction is placed to create a specific sound is called the point of articulation. Three important points of articulation are known as labial, referring to the lips; dental, referring to the teeth; and palatal, referring to the palate, or the back roof of the mouth. The figure shows how the phonemes produced and heard in the McGurk effect differ in terms of their points of articulation.

Now that you know a bit about how these sounds are produced, the logic of the McGurk effect should be more apparent. It works like this: Your brain knows that some phonemes are actually pretty similar to one another. In the McGurk effect the word bane is played to the observer, led off by a /b/ sound. But the speaker in the video moves their mouth as they would to make a /g/ sound and say the word gain. The brain therefore receives conflicting inputs from the eyes and ears. To resolve the conflict, the brain concludes that the truth is probably someplace in between. Since /d/ is the sound between /b/ and /g/ in terms of production, that's what people hear. The same explanation applies for turning the conflict between pan and can into tan. /p/ is a labial sound, and /k/ is a palatal sound. The dental one in between is /t/.
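The "in between" reasoning can be written out as a small Python sketch. This is a toy encoding of the explanation given here, not a model of perception. The place labels are the ones used in this article, the voicing flag is an added assumption used only to keep /b d g/ apart from /p t k/, and the function name is hypothetical.

```python
# Points of articulation ordered from the front of the mouth to the back,
# using this article's labels.
PLACES = ["labial", "dental", "palatal"]

# The six phonemes discussed, as (voiced, place). Voicing is an added
# assumption that separates the /b d g/ series from the /p t k/ series.
PHONEMES = {
    "b": (True, "labial"),
    "d": (True, "dental"),
    "g": (True, "palatal"),
    "p": (False, "labial"),
    "t": (False, "dental"),
    "k": (False, "palatal"),
}

# Reverse lookup: (voiced, place) -> phoneme symbol.
BY_FEATURES = {features: symbol for symbol, features in PHONEMES.items()}


def predict_percept(auditory: str, visual: str) -> str:
    """Return the phoneme whose place lies midway between the two inputs.

    Only meaningful for the labial/palatal conflicts described in the text.
    """
    a_voiced, a_place = PHONEMES[auditory]
    v_voiced, v_place = PHONEMES[visual]
    if a_voiced != v_voiced:
        raise ValueError("sketch only covers pairs with matching voicing")
    midpoint = (PLACES.index(a_place) + PLACES.index(v_place)) // 2
    return BY_FEATURES[(a_voiced, PLACES[midpoint])]
```

With an auditory /b/ and a visual /g/, the midpoint between labial and palatal is dental, so `predict_percept("b", "g")` gives `"d"`; likewise `predict_percept("p", "k")` gives `"t"`.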

Applications and Summary

One place that the McGurk effect has been important is in understanding how very young infants learn spoken language. A 1997 study showed that even 5-month-old infants perceive the McGurk effect.² This is important because it suggests that infants may use visual information to solve a major challenge in learning language: parsing a continuous audio stream into its units. Think about how a foreign language spoken at its normal speed can seem like such a jumble that you might not even know where the word boundaries are. Well, if all languages are foreign to infants, then how do they figure out where the words are? The McGurk effect suggests that they can rely on facial articulation patterns.

References

1. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
2. Rosenblum, L. D., Schmuckler, M. A., & Johnson, J. A. (1997). The McGurk effect in infants. Perception & Psychophysics, 59(3), 347-357.

Transcript

Perception of spoken language benefits from face-to-face interactions, as the mouth supplies good visual information about how specific sounds are articulated.

For instance, in an up-close and unobstructed situation, an individual can watch their friend mention going to the beach. In this case, they use visual input—observing the movement around the lips and tongue—to clearly comprehend what was said.

However, if the friend continues to talk out of sight in another room, they might be tempted to watch the muted television and therefore must solely rely on the obstructed voice to make sense of the message.

In this case, what was actually said at the tail end, pick, interfered with the silent kick and was misinterpreted as tick. This is an example of the McGurk Effect—a perceptual illusion that arises through a mismatch between sound and visual cues.

This video demonstrates how to construct the audiovisual stimuli to test the phenomenon originally discovered by McGurk and Macdonald. It also investigates how vision interacts with sound production to understand how individuals learn language at a very young age.

In this experiment, participants are asked to watch muted videos, in which a word like gain is mouthed, while a sound such as bane is played simultaneously in the background. Afterwards, they are asked to share what they heard.

To understand the outcome, how the illusion is produced, let’s first discuss how phonemes—the minimal units of speech sounds—are articulated.

For example, bane and gain share the same elements in all positions except for the first, which are the sounds /b/ and /g/.

Although words with these initial phonemes may sound similar, when /g/ is shown and /b/ is played, individuals are expected to hear a completely different third sound—/d/—instead.

The reason /d/ is heard is that all three are produced in basically the same manner, with only a small difference in where the speaker places an obstruction in airflow, called the point of articulation, or POA.

For instance, when a /b/ sound is made, lips provide the obstruction, resulting in a labial POA, whereas for /g/, it’s referred to as palatal—in the back of the mouth. As for /d/, the POA is dental, a consequence of the tongue touching the upper teeth.

When the brain integrates the conflicting visual /g/ and auditory /b/, it concludes that the final sound must lie somewhere between the two POAs, thus hearing /d/ and reporting the word Dane.

In preparation for the demonstration, obtain a computer to present videos on and a smartphone with a video camera.

First position the camera so that your head fills the display. Now, record four 10-s clips, each one containing different words that should be repeated 10 times at a rate of 1 word/s. Make sure to transfer the gain and can videos to the computer for visual playback.

To conduct the experiment, sit a participant in front of the computer. Open up the video file for the word gain and turn off the audio.

On the phone, open up the video for bane. Place it behind the computer so that its screen is hidden and only the sound can be heard clearly.

Instruct the participant to watch the computer monitor and listen. Then, play both videos simultaneously.

When the clips end, ask the participant what they heard. [Participant says: “Dane”]. Repeat the procedure by playing the video of the word can on the computer and presenting the audio for pan on the phone. Once again, question the participant as to what they heard. [Participant says: “tan”].

Here, the words bane and pan were played aloud as the participant watched gain and can being mouthed. Typically, when a term with the /g/ phoneme is shown visually and paired with the sound /b/, individuals will hear /d/.

Likewise, when a word starting with /k/ is paired with the sound /p/, individuals will hear /t/.

Such auditory perception follows from the way that sounds are produced. The brain tries to resolve conflicting information: the eyes see palatal articulations (/g/ and /k/) while the ears hear labial sounds (/b/ and /p/). As a result, it concludes that the sounds must lie in the middle, resulting in the perception of dental phonemes (/d/ and /t/).

Now that you are familiar with how to produce the McGurk effect, let’s look at some other ways that researchers use this perceptual phenomenon to investigate language development and cases in which the effect is altered.

Infants can be tested on the McGurk effect as early as five months of age, when they are pre-linguistic, using a habituation-of-looking-time paradigm.

In this procedure, Rosenblum and colleagues repeatedly presented infants with a particular syllable, like va, in both the audio and visual domains before introducing mismatched phonemes in a testing phase.

Infants showed signs of habituation to va—reduced looking times—and dishabituation, noted as increased looking, when something other than va was perceived. Thus, even before infants can talk, they display similar results as adults, in which they rely on the use of visual information for language discrimination.
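The habituation logic can be illustrated with a short Python sketch. The scoring rule used here (test looking time minus the mean of the last few habituation trials) and the function name are assumptions made for illustration; they are not the criterion reported by Rosenblum and colleagues.

```python
from statistics import mean


def dishabituation_score(habituation_times, test_time, baseline_trials=3):
    """Return the increase in looking time at test over the habituated baseline.

    habituation_times: looking times in seconds across repeated presentations.
    test_time: looking time in seconds on the test trial.
    baseline_trials: number of final habituation trials averaged (assumed).
    """
    if len(habituation_times) < baseline_trials:
        raise ValueError("not enough habituation trials for the baseline")
    baseline = mean(habituation_times[-baseline_trials:])
    return test_time - baseline
```

With hypothetical looking times of 12, 9, 6, 5, and 4 s during habituation and 8 s at test, the baseline is 5 s and the score is 3 s. A positive score would be read as dishabituation, meaning the infant treated the test stimulus as different. These numbers are invented for illustration only.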

However, children with autism exhibit the McGurk effect less readily than controls, owing to their impaired ability to understand and attend to the visual facial components. This indicates fundamental differences in processing audiovisual speech, which may contribute to their difficulty with language and communication.

Lastly, patients with lesions in their left hemisphere—the side typically predominant for understanding and learning language—often use visual facial features to help during speech therapy. Interestingly, when tested on the McGurk effect, they more often reported hearing dental sounds compared to controls. Such perceptions are likely due to their higher focus on visual information.

You’ve just watched JoVE’s video on the McGurk Effect. Now you should know how to produce this audiovisual illusion and relate phonemes to sound production. In addition, you should have a better understanding of the interactions between vision and hearing, and how they can be affected during development and adulthood.

Thanks for watching!