
Neurosurgeon Develops AI Device to Restore Speech in Patients with Paralysis

Dr. Edward Chang

Dr. Edward Chang has pioneered what was once considered impossible: building a device that restores the ability to speak for people with severe paralysis. 

“Speech allows us to communicate 150 words per minute,” said Chang, the chief neurosurgeon at the University of California, San Francisco, speaking at a recent NIH Director’s Lecture held in Lipsett Amphitheater. “It’s such a special human behavior. It allows me to transmit an idea from my mind to yours and back quickly.”

The act of speaking is about “shaping the breath.” It starts with an exhalation of air from the lungs. The air then moves through the voice box, also known as the larynx, and vibrates the vocal folds to create “the voice energy of our words,” Chang explained.

“The voice energy then goes through the upper part of the vocal tract—the lips, jaw and tongue. The movements create a filter on that sound. That’s what gives rise to the consonants and vowels we speak.”
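To make this source-filter idea concrete for technically inclined readers, the short sketch below generates a vowel-like sound in that spirit: a pulse train stands in for the “voice energy” from the larynx, and a few resonant filters stand in for the shaping done by the lips, jaw and tongue. It is a toy illustration only; the pitch, formant frequencies and bandwidths are assumed values, and none of this is code from Chang’s lab.

    import numpy as np
    from scipy.signal import lfilter

    # Toy source-filter synthesis of a vowel-like sound (illustrative only).
    fs = 16000                          # sample rate in Hz (assumed)
    f0 = 120                            # glottal pitch in Hz (assumed)
    n = fs // 2                         # half a second of audio

    # "Voice energy" from the larynx: a periodic pulse train.
    source = np.zeros(n)
    source[::fs // f0] = 1.0

    # Vocal-tract "filter": second-order resonators at rough formant
    # frequencies for an "ah"-like vowel (assumed values).
    def resonator(freq_hz, bandwidth_hz):
        r = np.exp(-np.pi * bandwidth_hz / fs)
        theta = 2 * np.pi * freq_hz / fs
        return [1.0 - r], [1.0, -2.0 * r * np.cos(theta), r * r]

    signal = source
    for freq, bw in [(730, 90), (1090, 110), (2440, 170)]:
        b, a = resonator(freq, bw)
        signal = lfilter(b, a, signal)

    signal /= np.abs(signal).max()      # normalize; save as WAV to listen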

An area in the brain’s sensorimotor cortex plays a key role in speech production. Chang’s lab uses a high-resolution technique called electrocorticography to study speech production’s “neural code,” the pattern of electrical activity recorded directly from the surface of the brain.

When Chang started his lab 15 years ago, “I recognized we didn’t understand very much about the organization or the basic principles by which the human brain processes words,” he said.

Since then, Chang’s lab has been studying how the neural code corresponds with different aspects of speech control. Their research has revealed a complex map in our brains where specific neurons are tuned to different speech sounds in the English language.

His lab’s earliest experiments focused on brain mapping in patients with severe epilepsy. These patients had electrode arrays implanted on the surface of their brains to measure electrical activity. Each array had 256 electrodes arranged in a 16-by-16 grid. The electrodes help doctors pinpoint exactly which tissue is responsible for the seizures so it can be surgically removed.
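For readers who want a concrete picture of what such recordings look like to the software, the sketch below lays out a 256-channel recording as a simple array. Only the 16-by-16 electrode count comes from the article; the sampling rate and the simulated voltages are assumptions for illustration.

    import numpy as np

    # Illustrative layout of a 256-channel ECoG recording (simulated data).
    n_rows, n_cols = 16, 16             # grid shape from the article
    n_channels = n_rows * n_cols        # 256 electrodes
    fs = 1000                           # assumed sampling rate in Hz
    seconds = 2

    # One voltage trace per electrode: shape (channels, time samples).
    ecog = np.random.randn(n_channels, fs * seconds)

    # The same data viewed as the physical grid at a single instant.
    snapshot = ecog[:, 0].reshape(n_rows, n_cols)
    print(ecog.shape, snapshot.shape)   # (256, 2000) (16, 16)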

In one early experiment, Chang asked the patients to read a list of syllables, such as “ba,” “da” and “ga.” Saying each sound out loud requires a different set of movements from the muscles in the vocal tract.

“When you make that ‘ba’ sound, your lips come to closure. It’s the release of your lips that creates the sound. The tongue tip releasing from behind your upper teeth creates the ‘da’ sound. The ‘ga’ sound is the back of the tongue going up,” he explained. “It was fun to learn all of this. I had no idea how exactly I was speaking.”

Chang speaking to NIH

When people spoke those sounds, Chang and his colleagues discovered distinct patterns of brain activity in the cortical areas that control the vocal tract to produce consonants and vowels. Further studies revealed the area of the brain responsible for vocal tone: the ability to adjust the intonational pitch, or prosody, of one’s voice.

“Interestingly, when we played back people’s speech through a speaker, we found these areas were also activated by what they heard,” he said. “This means there’s auditory processing in this part of the brain as well. It’s encoding both the pitch of what we’ve heard and what we are saying.”

Chang has applied these fundamental discoveries about the neural basis of speech production to build a device that restores communication for people with paralysis. He termed this technology a “speech neuroprosthesis,” which uses artificial intelligence (AI) algorithms to decode brain activity patterns into words.

Previously, most attempts using neural interfaces focused on spelling-based approaches: to communicate, users typed out letters one by one. A drawback of this approach is that typing is much slower than speaking.

Early on, Chang’s colleagues remarked that what he was doing was impossible; speech, they said, was too complex and abstract. He believed it was possible nonetheless, because of advances in his lab’s understanding of the neural code and rapid progress in AI.

His lab conducted proof-of-principle demonstrations with patients with epilepsy, who were not paralyzed but needed brain mapping for seizure surgery. Chang asked them to read sentences. As they read aloud, he recorded their brain activity. His team built a computer model that translated brain activity into vocal-tract movements and then translated those movements into sounds. While imperfect, the speech synthesized from neural activity was intelligible.
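As a rough, hypothetical sketch of the two-stage idea just described (decoding neural activity into articulator movements, then turning those movements into sound features), the toy pipeline below chains two small regression models on made-up data. The shapes, model choices and feature names are assumptions for illustration; this is not the model Chang’s team published.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Toy two-stage decoder: neural features -> articulator movements -> acoustics.
    # All data are random stand-ins; shapes and models are assumptions.
    rng = np.random.default_rng(0)
    n_samples, n_electrodes, n_articulators, n_acoustic = 500, 256, 12, 32

    neural = rng.standard_normal((n_samples, n_electrodes))       # brain activity
    movements = rng.standard_normal((n_samples, n_articulators))  # lip/jaw/tongue traces
    acoustics = rng.standard_normal((n_samples, n_acoustic))      # spectral features

    # Stage 1: decode brain activity into vocal-tract movements.
    stage1 = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300).fit(neural, movements)

    # Stage 2: turn decoded movements into acoustic features for a synthesizer.
    stage2 = MLPRegressor(hidden_layer_sizes=(64,), max_iter=300).fit(movements, acoustics)

    decoded = stage2.predict(stage1.predict(neural[:5]))
    print(decoded.shape)                # (5, 32)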

A volunteer uses technology that gives him a voice by decoding his speech.

The results were promising enough that Chang believed he could now try this approach with someone who was paralyzed. He partnered with a colleague to launch the BRAVO (Brain-Computer Interface Restoration of Arm and Voice) study. The trial’s first participant—a man named Pancho—had a severe brainstem stroke that damaged the connection between his brain and his vocal tract and limbs.

Chang surgically implanted an electrode array over the patient’s speech motor cortex more than six years ago, and it is still transmitting strong signals. Every week since, Chang and his team have worked to “translate the electrical neural code into words and more.”

Pancho initially worked with researchers to create a 50-word vocabulary. They asked him to say each word several times. As he tried, an AI algorithm learned to distinguish the brain activity patterns associated with each word and predict which one he was attempting to say.

The system basically worked. Chang found the neuroprosthesis did a “decent, but not perfect job” of predicting which word Pancho wanted to say. It most often confused similar words, such as “is” and “it,” and did better at differentiating words that sound distinct. An “auto-correct” function further increased accuracy. Recently, Chang and colleagues showed the device could decode whether Pancho, who is bilingual, was speaking his native language, Spanish, or his second language, English.
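One way to picture this step is a classifier over a fixed vocabulary whose guesses are then rescored by word-sequence probabilities, which plays the “auto-correct” role. The sketch below is a simplified, hypothetical illustration of that idea with a tiny vocabulary, random stand-in data and an assumed bigram table; it is not the BRAVO study’s actual decoder.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical word decoder: classify neural features into a small vocabulary,
    # then rescore with word-pair probabilities ("auto-correct").
    vocab = ["i", "am", "is", "it", "thirsty", "water"]   # stand-in for the 50-word set

    rng = np.random.default_rng(1)
    X = rng.standard_normal((300, 256))                   # stand-in neural features
    y = rng.integers(len(vocab), size=300)                # stand-in word labels
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Tiny bigram "language model": P(word | previous word), assumed values.
    bigram = {("i", "am"): 0.5, ("am", "thirsty"): 0.4, ("it", "is"): 0.3}

    def decode_next_word(features, prev_word):
        """Combine classifier confidence with a bigram prior for the next word."""
        probs = clf.predict_proba(features.reshape(1, -1))[0]
        scores = [p * bigram.get((prev_word, w), 0.01) for p, w in zip(probs, vocab)]
        return vocab[int(np.argmax(scores))]

    print(decode_next_word(X[0], prev_word="i"))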

A second patient, Ann, who also experienced a brainstem stroke and was paralyzed, reached out to Chang after reading about Pancho’s experience. Chang implanted an electrode array with twice as many sensors.

The new device was accurate and fast. Continued efforts significantly reduced the time it takes to translate her brain signals into synthesized words, which were personalized using recordings of her pre-injury voice. Additionally, the team built an “embodied” speech neuroprosthesis that translated her brain signals into verbal and non-verbal facial movements rendered by a speaking avatar. Several research groups across the country have now replicated these findings in other patients.

“It’s no longer about whether this is possible,” Chang concluded. “It’s now a question of how good we can make it.”
