The Therapeutic Potential of Voice Technology
John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform, wrote this article.
The human voice is capable of extraordinary feats of genius and everyday acts of kindness. It can recite Shakespearean sonnets, teach our children moral values, stir audiences with a dramatic performance, and much more. But few of us ever imagined it capable of assisting in the diagnosis of disease. That’s about to change, as evidenced by several innovative projects in voice technology.
With the right digital tools, it is now possible to analyze a patient’s speech patterns to detect clues to underlying pathological issues. Elad Maor, M.D., Ph.D., with the Mayo Clinic Department of Cardiovascular Medicine, and his colleagues have looked at voice samples from about 100 patients who underwent coronary angiograms, asking them to read text excerpts and respond to questions about positive and negative emotional experiences. Analysis of the recordings revealed subtle differences in vocal pitch and intensity between patients who were ultimately diagnosed with heart disease and normal controls. [1] Dr. Maor and his associates concluded: “One possible explanation for our interesting finding is the documented association between mental stress, the adrenergic system, and voice. . . . Emotional stress conditions change the human voice, including an increase in fundamental frequency. . . . [O]ne possible hypothesis to interpret our findings is that the association between voice and atherosclerosis is mediated by hypersensitivity of the adrenergic system to stress. The association between stress, the adrenergic system, and atherosclerosis is well established on the basis of robust data.”
There is also reason to believe that voice technology may help detect pulmonary hypertension (PH). Jaskanwal Deep Singh Sara, M.B., Ch.B. (also with the Mayo Clinic), in collaboration with scientists from Vocalis Health (an Israeli vendor), has analyzed voice recordings from patients who had invasive cardiac hemodynamic testing, the standard approach to diagnosing PH. The recordings were analyzed to measure pitch, loudness, jitter, and other metrics. Dr. Sara and his colleagues found a significant association between the vocal biomarkers and an invasively derived hemodynamic index used to measure PH. Patients with pulmonary arterial pressure at or above 35 mmHg had higher mean vocal biomarker readings than those with pressure readings below 35 mmHg. Because the invasive testing is performed during cardiac catheterization, the non-invasive collection of a patient's voice patterns holds promise as a safer alternative. If controlled clinical trials confirm the findings, voice analysis will likely reduce the cost and risk associated with PH diagnosis. [2]
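To make metrics such as pitch, loudness, and jitter more concrete, here is a minimal Python sketch of how they are commonly defined in acoustic analysis. It is not the method used in the study described above, whose vocal biomarker is not specified here. The function names are illustrative, the input is assumed to be a list of already extracted glottal cycle lengths (pitch periods) in seconds, and the RMS level in decibels is only a rough stand-in for perceived loudness.

```python
import math
from statistics import mean


def mean_pitch_hz(periods_s):
    """Average fundamental frequency (Hz) from pitch periods given in seconds."""
    return 1.0 / mean(periods_s)


def local_jitter(periods_s):
    """Cycle-to-cycle variability of the pitch period.

    Mean absolute difference between consecutive periods, divided by the
    mean period (a common textbook definition of "local" jitter).
    """
    if len(periods_s) < 2:
        raise ValueError("need at least two pitch periods")
    diffs = [abs(a - b) for a, b in zip(periods_s, periods_s[1:])]
    return mean(diffs) / mean(periods_s)


def rms_level_db(samples):
    """RMS level in dB relative to full scale (1.0), a crude loudness proxy."""
    rms = math.sqrt(mean([s * s for s in samples]))
    if rms == 0.0:
        raise ValueError("signal is silent; level is undefined")
    return 20.0 * math.log10(rms)


if __name__ == "__main__":
    # Hypothetical pitch periods alternating between 10 ms and 11 ms.
    periods = [0.010, 0.011, 0.010, 0.011]
    print(f"mean pitch: {mean_pitch_hz(periods):.1f} Hz")
    print(f"local jitter: {local_jitter(periods) * 100:.2f} %")
    print(f"level: {rms_level_db([0.5, -0.5, 0.5, -0.5]):.2f} dB")
```

In practice the pitch periods would come from a pitch-tracking step applied to the recording, which this sketch leaves out.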
Investigators in Belgium also have had success using vocal characteristics as part of a suite of tools to help detect Parkinson's disease in its early stages. [3] Some studies suggest that 60-90% of patients have subtle changes in their voice and speech patterns when initially diagnosed. When the diagnosis is questionable, however, a neurologist might administer a loading dose of levodopa, one of the standard drugs used to treat the disease, and watch for improvements in the patient’s speech and voice. After administering the drug, the Belgian researchers monitored the strength of a patient's facial or mouth muscles and evaluated vocal quality, frequency, breathiness, phonation time (how long a person can sustain a vocal sound on one deep breath), and several other parameters, folding them into a metric called a Voice Handicap Index. They found that these markers helped distinguish patients with idiopathic Parkinson's disease from healthy individuals.
Mayo Clinic is also exploring the value of voice technology, coupled with artificial intelligence, to address the needs of patients with neurological disorders and related motor speech disorders. The project's goals include creating digital tools for voice-based disease detection in the office, over the phone, and in a patient's home; earlier, more accurate, and more holistic diagnoses; and the provision of individualized, in-home markers of disease progression and treatment response. To accomplish those goals, the Neurology AI Program and other Mayo researchers and clinicians are developing a fully automated digital speech diagnostics platform that can take a patient's speech sample and provide probabilistic diagnoses. Those taking the lead in this initiative believe that the primary goal should not be to map a speech sample directly onto a clinical diagnostic label but instead to use AI to extract clinically meaningful information from it.
Mayo Clinic-based speech pathologists Darley, Aronson, and Brown [4] proposed a similar idea in 1969 in their seminal work on the dysarthrias, speech disorders caused by muscle weakness. While this theoretical construct was previously based on conventional speech labels, advances in AI have made it feasible to build such a feature space using latent patterns in the data, which can then be labeled and linked to dense, multivariate medical data. This Deeply Annotated Speech Latent space can then be used to characterize new speech samples. When connected to non-speech medical and demographic data, a broader neurologic or systemic diagnosis can be rendered. The model will likely have the greatest impact if deployed on smartphones, smart speakers, or wearable technology.
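One generic way to picture how an annotated latent space could characterize a new speech sample is a nearest-neighbor lookup: find the labeled samples closest to the new one and report the share of each label among them. The Python sketch below is only an illustration of that general idea, not a description of the Mayo platform. The function name, the two-dimensional vectors, and the labels "cluster_a" and "cluster_b" are all hypothetical, and a real system would first need a trained model to map audio into the latent space.

```python
import math
from collections import Counter


def label_probabilities(query, annotated, k=3):
    """Estimate label probabilities for a new point in a latent space.

    query:     coordinates of the new sample (sequence of floats)
    annotated: list of (coordinates, label) pairs already in the space
    k:         number of nearest annotated samples to consult
    Returns a dict mapping each label to its share among the k neighbors.
    """
    if k < 1 or k > len(annotated):
        raise ValueError("k must be between 1 and the number of annotated samples")
    # Sort annotated samples by Euclidean distance to the query, keep the k closest.
    nearest = sorted(annotated, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical 2-D latent coordinates with placeholder labels.
    annotated = [
        ((0.0, 0.0), "cluster_a"),
        ((0.0, 1.0), "cluster_a"),
        ((5.0, 5.0), "cluster_b"),
        ((6.0, 5.0), "cluster_b"),
    ]
    print(label_probabilities((0.2, 0.1), annotated, k=3))
```

The output is a set of proportions instead of a single label, which fits the stated aim of giving probabilistic information that a clinician can weigh alongside other data.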
Voice technology has yet to surpass Shakespeare’s genius, replace a gifted actor's heart-wrenching performance, or serve as a moral compass for the next generation. Still, the evidence suggests it is ushering in a new generation of intelligent digital tools that will likely transform patient care.
References
1. Maor E, Sara JD, Orbelo DM, et al. Voice signal characteristics are independently associated with coronary artery disease. Mayo Clinic Proceedings. 2018;93:840–847.
2. Sara JDS, Maor E, Borlaug B, et al. Non-invasive vocal biomarker is associated with pulmonary hypertension. PLOS ONE. 2020;15(4):e0231441. doi.org/10.1371/journal.pone.0231441
3. Lechien JR, Delsaut B, Abderrakib A, et al. Orofacial strength and voice quality as outcome of levodopa challenge test in Parkinson disease. Laryngoscope. 2020;130:E896–E903.
4. Darley FL, Aronson AE, Brown JR. Clusters of deviant speech dimensions in the dysarthrias. Journal of Speech and Hearing Research. 1969;12(3):462–496.