Hearing voices: MIT researchers explore how auditory feedback helps us speak

Senior research scientist Joseph S. Perkell (center) attaches electrodes to electrical engineering and computer science graduate student Adrienne Prahler to gather data about speech motor control. Engineer Majid Zandipour (right foreground) operates the computer in an adjoining room.
Photo / Donna Coveney
Dr. Joseph Perkell affixes sensors to the tongue of Adrienne Prahler to learn about motor strategies for producing speech.
Photo / Donna Coveney

CAMBRIDGE, Mass.--Although it sounds like an April Fool's joke, MIT researchers are conducting serious studies in which people mouth odd sounds in response to a machine or have their tongues wired up with little coils. These studies may shed light on how hearing affects our ability to talk.

In the long run, this knowledge could help diagnose and treat individuals with communicative disorders, including people who stutter, stroke victims and those with dyslexia. It might also be used to help people gain or lose an accent.

By studying speech movements and the resulting sounds in people with normal hearing, in people whose auditory perception of speech sounds is modified experimentally and in hearing-impaired people who gain some hearing artificially, MIT researchers are learning about the "internal model" that tells us how we want to sound.

"The main thing I have to offer is a new tool for understanding the role of auditory feedback in speech production," said John F. Houde, a postdoctoral fellow in otolaryngology at the University of California at San Francisco, who recently completed his Ph.D. in brain and cognitive sciences at MIT and published a paper on this subject in the Feb. 20 issue of the journal Science.

Joseph S. Perkell, a senior research scientist in the Speech Communication Group of the Research Laboratory of Electronics, also studies the role hearing plays in speaking.

By examining the physical components of speech production and exploring changes in speech that result from changes in hearing, he is helping to unravel the enormously complex system involving brain and body that allows us to communicate through spoken language.

Perkell, also affiliated with the MIT Department of Brain and Cognitive Sciences, says that the goals of speech movements--aside from the primary goal of converting linguistic messages into intelligible signals--vary depending on the situation.

"Variations are made possible partly because listeners can understand less-than-perfect speech," he said. The almost-unconscious decisions we make about the clarity, volume and rate at which we speak depend on the noise level around us, the listener's familiarity with the language and whether we can see the listener's face, among other things.

Perkell is investigating the idea that, whenever possible, we opt for the laziest way to speak by choosing movements that will get our point across with the least effort. Speaking clearly may require more effort than letting words run together and dropping parts of words.


Our ability to control individual muscles in our tongues, coupled with our uniquely right-angled vocal tract (part of the airway between the larynx and lips), may account for the wide range of speech sounds that humans alone can make.

Although talking may seem easy, it is anything but simple. Before we even open our mouths, decisions about how loudly or long we intend to speak determine how much air we use. Then we expand our rib cage and begin forcing air from the lungs through the trachea and larynx. In the larynx, the air flow can cause the vocal folds to vibrate, making the voicing sounds of vowels. The air then passes into the vocal tract, where the tongue, jaw, soft palate and lips are moved around to create different speech sounds.

Sometimes the tongue or lips completely close off the vocal tract or create narrow constrictions for the production of silent intervals and certain kinds of noises that correspond to consonants like "t" and "s."

Dozens of muscles that move several very different physiological structures--the lungs, larynx and vocal tract--are controlled by the brain with lightning speed to produce a single "Hey!"

"These systems that evolved to serve different functions--breathing, swallowing, chewing--are used in an elegant way to make these sounds come out right," Perkell said. "Having engineering knowledge and skills to apply to understanding this process has been extremely helpful."

Approaching the problem with expertise in experimental psychology as well as engineering, Perkell studies aspects of our motor control system that make speech possible. This task is so complex that only a handful of researchers in the country are tackling it. "Speech may be the most complicated motor act any creature performs," said Perkell, who admits he doesn't expect to see an explanation in his lifetime of how billions of neurons work together to accomplish this enormously complex task.

To help understand speech motor control, Perkell and his colleagues gather data with an Electro-Magnetic Midsagittal Articulometer (EMMA) system. The EMMA system, developed at MIT, uses small encased transducer coils that are glued to a subject's articulators--the parts of the tongue, teeth and lips that help produce speech.

The movement traces produced by the EMMA system when a subject is talking are analyzed on a computer and plotted on graphs to help researchers learn about the motor strategies that people use to produce intelligible sequences of speech sounds. About 20 such systems are used in laboratories around the world. The data from these systems have made it clear that people can use different strategies to produce the same utterance. Even when the same person repeats the same thing a number of times, it is produced somewhat differently each time.

By studying such patterns of variation, researchers can begin to understand which parts of utterances are less variable and which parts are more variable. The parts that are less variable are likely to be most useful for transmitting the message reliably to the listener. The research done by Perkell and his colleagues with the EMMA system shows that the resulting sounds may be somewhat less variable than the movements that produce them. Such findings point to the importance of the acoustic signal and the need to understand the role of hearing in controlling speech production.


Scientists agree that with the help of our ears, we establish and refine from infancy through puberty an "internal model" of how our speech should sound. In effect, through learning, we become "hard-wired" for speech. "Speech is really stable in the absence of feedback," Houde said. Houde and Perkell pointed out that adults who become deaf will retain intelligible speech for a couple of decades, but an individual who is born deaf has great difficulty learning to speak.

While the brain's parameters for controlling speech are very stable, they can be tinkered with, as Houde found in his research at MIT.

Houde and his thesis adviser, Professor of Psychology Michael I. Jordan, wanted to find out how people adapt their speech to what they hear.

Houde built a sensorimotor adaptation (SA) apparatus that takes a sample of speech, figures out in real time its spectrum and formant pattern (the peaks in the frequency spectrum of human speech), alters the formant pattern of a vowel, resynthesizes the speech and sends it back to the subject with a virtually imperceptible delay.

If you try to say a word with one vowel sound, "pep," for instance, the SA apparatus makes it sound as though you just said a different vowel sound, like "peep."

"The easiest way to think about it is that you've suddenly been transported to a planet with a weird atmosphere that changes how your speech sounds," Houde said. This is all done in a whisper, because it's hard to block out a person's hearing of his own voice.

It turns out that "we end up doing whatever we have to do to make the correct sound come out of our mouths. When we hear our own words with an altered vowel sound, we automatically begin to 'correct' the way we say the vowel in the first place," he said.

In one version of the experiment, when subjects whispered "pep," the machine made them hear this as "peep." In response, the subjects altered their whispering until what they heard from the machine sounded once again like "pep," even though what they were actually whispering--if they could hear it without the machine--sounded more like "pop." If they heard altered feedback for long enough, they continued to say "pop" for "pep" even after the alteration was eliminated, as long as they couldn't hear themselves. Once they could hear themselves normally, they returned to normal speech.
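
The three phases of that experiment can be illustrated with a toy simulation. This is only a sketch and not Houde's apparatus or analysis: it assumes each vowel can be reduced to its first two formants, uses approximate textbook formant values for an adult male voice, and invents a simple error-correcting update rule with a made-up learning rate (RATE). It also assumes subjects compensate completely, which real speakers may not do.

```python
# Toy model of sensorimotor adaptation. Every number and rule here is an
# illustrative assumption, not a measurement from the study.
import math

# Approximate textbook (F1, F2) formant values in Hz for an adult male voice.
VOWELS = {
    "pep": (530.0, 1840.0),
    "peep": (270.0, 2290.0),
    "pop": (730.0, 1090.0),
}

TARGET = VOWELS["pep"]  # the vowel the speaker intends to hear
# Formant shift that makes a produced "pep" come back sounding like "peep".
SHIFT = (VOWELS["peep"][0] - TARGET[0], VOWELS["peep"][1] - TARGET[1])
NO_SHIFT = (0.0, 0.0)
RATE = 0.2  # hypothetical fraction of the heard error corrected per trial


def heard(produced, shift):
    """Formants the speaker hears after the apparatus alters them."""
    return (produced[0] + shift[0], produced[1] + shift[1])


def step(produced, shift, can_hear=True):
    """One trial: nudge production so the heard vowel moves toward TARGET."""
    if not can_hear:
        return produced  # no feedback, so nothing drives a change
    h = heard(produced, shift)
    return tuple(p - RATE * (hv - t) for p, hv, t in zip(produced, h, TARGET))


def run(produced, shift, trials, can_hear=True):
    """Repeat the trial a fixed number of times and return final production."""
    for _ in range(trials):
        produced = step(produced, shift, can_hear)
    return produced


def nearest_vowel(formants):
    """Label formants with the closest of the three reference words."""
    return min(VOWELS, key=lambda word: math.dist(formants, VOWELS[word]))


if __name__ == "__main__":
    adapted = run(TARGET, SHIFT, trials=50)                       # altered feedback
    masked = run(adapted, NO_SHIFT, trials=50, can_hear=False)    # no feedback
    recovered = run(adapted, NO_SHIFT, trials=50)                 # normal feedback
    print("produced under alteration:", nearest_vowel(adapted))
    print("heard under alteration:   ", nearest_vowel(heard(adapted, SHIFT)))
    print("produced with no feedback:", nearest_vowel(masked))
    print("produced after recovery:  ", nearest_vowel(recovered))
```

In this sketch the simulated speaker drifts toward "pop" while hearing "pep," keeps producing "pop" when feedback is blocked, and returns to "pep" once unaltered feedback is restored, mirroring the sequence described above.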


It was once thought that "hearing yourself has only an indirect influence on your speech--a minor one about how to set some parameter about how to speak," Houde said. "It appears that the connection between hearing your own and others' speech and producing sounds is not that simple."

Perkell says that although the role of hearing in controlling speech may be complicated, "it is very unlikely that the motor control system uses auditory feedback moment-to-moment" to monitor speech production. "The neural processing times would probably be too long, and a significant portion of the movements for many vowels occurs during preceding consonant strings, when relatively little or no sound is being generated."

Perkell and his colleagues have been studying the relation between speech and hearing in people who have been fitted with an auditory prosthesis called a cochlear implant, which uses sophisticated electronics to provide a form of artificial hearing. These patients learned to speak while they could hear and later lost their hearing. While their speech remains intelligible, it does not sound completely normal.

Perkell's group collaborates with surgeon Joe Nadol and research scientist Don Eddington at the Massachusetts Eye and Ear Infirmary in studying changes that take place in the speech of cochlear implant patients. The changes indicate that in adults, hearing seems to have two roles: it monitors the acoustic environment, so that we speak louder in a noisy room; and it maintains our internal model to assure that we end up sounding the way we think we should.

Houde's work is supported by the National Institutes of Health (NIH). Perkell's work is funded by the NIH's National Institute on Deafness and Other Communication Disorders.
