Shri Narayanan, left, supervises Ph.D. student Eric Bresch, who is preparing a subject for MRI imaging.
How does your tongue move exactly when you utter a simple “hello”? Does the tip move faster than the base or vice versa? Does the movement vary if you’re not a native English speaker? If you’re angry, sad or ebullient? Would it move differently if you had suffered a stroke or had any other form of cerebral damage?
These are some of the questions that Shrikanth Narayanan, director of the USC Viterbi School’s Signal Analysis and Interpretation Laboratory, is trying to answer.
In search of answers, Narayanan and a multidisciplinary team of researchers have spent the past several years, with funding from the National Institutes of Health, refining the use of real-time magnetic resonance imaging, which allows them to go where no researchers have gone before.
“It’s very hard to look inside the body to see speech articulation; it’s a very hostile environment,” said linguistics professor Dani Byrd, a member of the research team looking into the mysteries of speech production. “It’s dark, it’s wet, it’s somewhat salty, things move very, very fast, they whack into other things, so a fast-moving tongue can hit up against the palate with quite a lot of force or our two lips can come together with quite a lot of force, and almost none of these events are externally visible.
“With this technology we can see what were the events that took place in the human body which shaped the vocal fold vibration, in a way that created the speech output,” Byrd said. “We can see the events that cause the speech waveform to have the properties that we see now.”
To peek into the human vocal tract, Narayanan and his team have developed software and hardware that allow them to record body movements when subjects are inside an MRI machine reading a series of prepared sentences aloud or while talking to someone outside the scanner.
During a recent scanning session at the USC Imaging Science Center, a German speaker read out loud a series of statements, first in his native language and then in English. Outside, Narayanan and Ph.D. student Yoon-Chul Kim monitored the movement of the subject’s vocal tract, which could be seen in real time on a computer monitor.
On a tiny square of the monitor, the movement of the tongue, the lips and the velum could be seen just as the volunteer inside the MRI machine spoke. Ph.D. student Eric Bresch tracked the sound, making sure the loud thumping of the scanner did not clutter the sentences being uttered by the subject.
“It’s a major accomplishment technically,” Bresch said, referring to the contraption he had put together to capture the vocal utterances and diminish the acoustic noise of the MRI.
The USC team is the first in the nation to use this technology for linguistic research, and even though it has its drawbacks, its use has proven far superior to other tools.
“One of the powerful aspects of our approach is that MRI provides a full picture of the position of soft tissue while speech sounds are being produced,” said Krishna Nayak, assistant professor in the Ming Hsieh Department of Electrical Engineering and a member of the research team. “Compare this with one of the pre-existing modalities to study real-time speech, ultrasound, which only lets you see the tongue. To really understand the shaping of the vocal tract, you need to see both sides. In fact, you need to see all three dimensions.”
Linguists also have used electro-magnetometry, a method that yields high temporal resolution by tracking a few sensors placed on key speech articulators. But because the sensors are placed on the tongue, the method can produce distorted results, and it provides only a partial view of the front part of the vocal tract.
“That’s why real-time MRI is such a powerful technique,” Nayak said.
Real-time MRI in its present form does have limitations. First, the volunteers have to lie supine while the pictures are being taken, an unnatural position for day-to-day communication. Second, the imaging still doesn’t have the spatial and temporal resolution that the researchers would like.
“We’d like to image even faster than we do right now because there are certain sounds that require very rapid motion of the tongue tip and of the lips. The human vocal tract is very amazing,” Nayak said.