NIH Logo
March 25, 2016
Linguistic Analysis Yields Reliable Diagnoses, Cecchi Shows

Dr. Guillermo Cecchi

Anything that generates a pattern can be analyzed mathematically. Speech and language generate patterns, and the mathematical savvy of scientists such as Dr. Guillermo Cecchi of IBM’s Thomas J. Watson Research Center holds the promise of enabling accurate diagnosis—from speech alone—of ailments ranging from psychosis, Parkinson’s disease and post-traumatic stress disorder to chronic pain, Alzheimer’s disease and depression.

In short, the way you talk and write can yield not only a medical diagnosis but also an estimate of how likely you are to develop a condition over time.

Speaking at an NIMH lecture series in the Neuroscience Center, Cecchi, a native of Argentina whose first language is Spanish, demonstrated that the title of his talk—“A privileged window into the mind: language as a tool for diagnosis of mental disease”—applies not just in English, but also in other tongues.

Trained in both physics and psychiatry, Cecchi might most accurately be considered an opportunist. He realized that there was both an Everest of data (63.3 million patient visits for mental disorders, according to the CDC, in which an interview is the main source of information) and a powerful new analytic arsenal buttressed by “an explosion of artificial intelligence/machine learning tools, computational power and algorithms.”

It dawned on Cecchi that the kind of analysis done routinely on everyone’s emails by vendors seeking to sate our consumer appetites “has not penetrated at all the mental health community.” He and his colleagues knew that computer science provided the mathematical tools to study language as it applies to psychiatry. Their challenge was to quantify psychiatric dysfunction by some new measure.

For a series of pilot studies, they used tools that have been around since the 1950s, including Noam Chomsky’s technique for determining the “logical scaffold of language,” graph theory and complex networks, and “semantic embedding,” which measures the relative meaning of words by frequency of occurrence and appropriateness of related terms. Quipped Cecchi, “The big [IT] companies use it to scan your emails.”
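The core idea behind semantic embedding, that words used in similar contexts carry similar meanings, can be illustrated with a toy count-based model. The sketch below is only an illustration under stated assumptions: it uses a simple co-occurrence window over a tiny made-up corpus, the names `cooccurrence_vectors` and `similarity` are hypothetical, and it is not the embedding Cecchi's team used, which the talk did not detail.

```python
import math
import re
from collections import Counter, defaultdict


def cooccurrence_vectors(corpus: str, window: int = 2) -> dict[str, Counter]:
    """Map each word to counts of the words appearing within `window` positions of it."""
    words = re.findall(r"[a-z']+", corpus.lower())
    vectors: dict[str, Counter] = defaultdict(Counter)
    for i, word in enumerate(words):
        start, stop = max(0, i - window), min(len(words), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                vectors[word][words[j]] += 1
    return vectors


def similarity(vectors: dict[str, Counter], a: str, b: str) -> float:
    """Cosine similarity of two words' context counts (0.0 if either word is unseen)."""
    u, v = vectors.get(a, Counter()), vectors.get(b, Counter())
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus for illustration only.
corpus = "the cat drinks milk. the dog drinks milk. the car needs fuel."
vecs = cooccurrence_vectors(corpus, window=1)
print(similarity(vecs, "cat", "dog"), similarity(vecs, "cat", "car"))
```

Because "cat" and "dog" appear in identical contexts here, they score higher than "cat" and "car"; real embeddings apply the same principle to far larger bodies of text.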

One study assembled regular users of the recreational drug ecstasy. The challenge was to determine, by analysis of language alone, which of four groups each subject belonged to: high-dose ecstasy, low-dose ecstasy, methamphetamine or placebo. The subjects were asked to talk for 20 minutes about someone close to them, with little intervention by the interviewer.

Scientists could distinguish the high-dose ecstasy group from the placebo group with almost 90 percent accuracy. Perhaps unsurprisingly, verbosity was “a strong signal” for those high on speed (methamphetamine).

Interestingly, such common conversational flotsam as “like,” “you know,” and “just” seemed to vanish in those using ecstasy. “There was a very strong effect—they became more fluent,” Cecchi said.
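Measuring how often such fillers occur is, at its simplest, a counting exercise. The sketch below is a hypothetical illustration: it assumes the filler list is just the three examples quoted above and reports a rate per 100 words. The name `filler_rate` is invented for this example, and because it counts every use of "like," including legitimate ones, it is only a crude proxy for fluency.

```python
import re

# Hypothetical filler inventory: only the three examples quoted in the article.
FILLERS = ("you know", "like", "just")


def filler_rate(transcript: str) -> float:
    """Return filler occurrences per 100 words of a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    text = " ".join(words)
    # \b word boundaries keep "like" from matching inside "likely".
    count = sum(len(re.findall(rf"\b{re.escape(f)}\b", text)) for f in FILLERS)
    return 100.0 * count / len(words)


print(filler_rate("I just, like, you know, went home."))
```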

Cecchi’s work has been highlighted in the popular press, including the New York Times and Forbes magazine. “As they say, a picture in Forbes is worth a thousand citations,” he quipped.

Another pilot study used graph theory to probe disorders of thought. This time, the study population included normal controls, schizophrenics and manic patients. They were asked to relate a recent dream in 100 or fewer words. Each short speech sample was then represented mathematically as a graph.

The resulting “graphical signatures” were unequivocal. “It jumps to the naked eye that there is a very clear difference,” Cecchi said. Manic patients produced graphs full of loops (they were literally loopy) that wandered away from a topic and then returned to it. Schizophrenic patients had fewer loops, even fewer than normal controls.
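One common way to turn a speech sample into a graph is to treat each distinct word as a node and draw a directed edge from each word to the word that follows it; when speech circles back to earlier words, loops appear. The sketch below assumes that representation and counts loops of one, two and three edges. It is an illustration, not the exact pipeline used in the study, and the names `speech_graph` and `count_loops` are hypothetical.

```python
import re


def speech_graph(text: str) -> set[tuple[str, str]]:
    """Directed edges linking each word to the word that follows it."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))


def count_loops(edges: set[tuple[str, str]]) -> dict[str, int]:
    """Count distinct directed loops of length 1, 2 and 3 in the graph."""
    succ: dict[str, set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())
    # L1: a word immediately repeated ("very very").
    l1 = sum(1 for a, b in edges if a == b)
    # L2: a -> b -> a; requiring a < b counts each pair once.
    l2 = sum(1 for a, b in edges if a < b and a in succ[b])
    # L3: a -> b -> c -> a over three distinct words; a must be the
    # alphabetically smallest so each cycle is counted once, not three times.
    l3 = 0
    for a in succ:
        for b in succ[a]:
            for c in succ[b]:
                if a < b and a < c and b != c and a in succ[c]:
                    l3 += 1
    return {"L1": l1, "L2": l2, "L3": l3}


print(count_loops(speech_graph("I dreamed I was home and home was I")))
```

In practice, loop counts would likely be normalized for sample length before groups are compared; the 100-word limit in the study serves a similar purpose.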

A third study sought to predict the onset of psychosis. It followed a group of people for more than 2 years after each sat for a 40-minute, open-ended session with a therapist to discuss their problems. Graph theory and semantic embedding were of little use here, Cecchi said, but when scientists added a new feature, semantic coherence, accuracy improved dramatically.

“It turns out that ‘flight of ideas’ is very important for defining psychosis,” he said. The math here simply computed the topical differences between consecutive sentences. “It yielded a very good signal.”
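A coherence measure of this kind can be sketched as a similarity score between each pair of consecutive sentences. The version below is a simplified illustration: it assumes raw word counts as a stand-in for the learned semantic embeddings such studies rely on, and it splits sentences on terminal punctuation. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence: str) -> Counter:
    """Stand-in for a semantic embedding: raw word counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def coherence_profile(text: str) -> list[float]:
    """Similarity between each sentence and the one that follows it."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    vectors = [sentence_vector(s) for s in sentences]
    return [cosine(a, b) for a, b in zip(vectors, vectors[1:])]


profile = coherence_profile("The dog ran home. The dog slept at home. Stocks fell sharply.")
# A sharp dip (here, to 0.0) marks an abrupt change of topic.
print(profile, min(profile))
```

One might summarize the profile by its minimum or its variance; a low minimum flags the kind of abrupt topic jump associated with “flight of ideas.”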

Investigators learned that the logical complexity of speech is markedly greater in those who convert to psychosis.

As further proof of their deductive skills, the mathematicians submitted to analysis writing samples by New York Post reporter Susannah Cahalan, who experienced psychosis and chronicled it in her book Brain on Fire. They were able to identify “very dramatic drops in coherence” that closely tracked Cahalan’s journey through manic and schizophrenic phases, simply from her written texts (some of which, as you might have guessed, were not published).

Cecchi and his team are now applying their tools to Parkinson’s disease (PD), “where we don’t expect purely linguistic components” to be predictive. Yet a 1-minute speech sample in which participants describe a typical day can identify those who have PD with 76 percent accuracy, in both English- and Spanish-speaking populations. Stuttering stands out as a marker in this group.

In a follow-up to the ecstasy study, which used more subjects and fewer words, the drug’s effect could be detected with 81 percent accuracy once new semantic features were added.

And a follow-up to the prodromal study of psychosis, a “story game” in which patients read a short story and then attempt to retell it, showed 85 percent accuracy in predicting who would convert to psychosis. Again, faults in logical coherence were the giveaway. For reasons yet unclear, those who convert also use markedly fewer personal pronouns, Cecchi noted.

Companies are now building cloud-based tools accessible by smartphone: you speak, and the tool analyzes what you said, Cecchi said. Not only verbal content but also tone and pitch (pure sound) may yield useful information.

“What we really need is much larger studies,” concluded Cecchi. “There have only been a small number of small pilots so far, but I think that will change very soon.”

During a brief but lively Q&A session, in which participants wondered what the great American gusher of speech might really reveal about all of us, an audience member asked whether one day, perhaps 10 years from now, our smartphones might beep and notify us that we’re going nuts. Cecchi answered, “That will happen 3 years down the road, not 10.”
