The Transformative Power of Conversational Technologies – Part I

By John Halamka • December 14, 2022

Enabled by natural language processing, these digital tools are slowly finding their way into clinical practice. Despite the potential to make it easier to navigate an EHR system, they come with problems of their own.

By Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform and John Halamka, M.D., president, Mayo Clinic Platform

Natural language processing (NLP) has had a major impact on digital health in the last several years. While the technology can seem almost magical to the uninformed, once we pull back the curtain, it becomes clear that NLP is more math than magic. The practical tools generated by the technology are helping to solve some of health care’s most challenging problems.

NLP is the foundation upon which many conversational technologies are built. These new digital tools are affecting cardiology, neurology, and a host of other specialties, as we have pointed out in our blogs and books. At the most basic level, voice technology requires computers not only to recognize human speech but also to understand the meaning of words and their relationship to one another in a conversation, which is no easy task. The existence of so many idioms, colloquialisms, metaphors, and similar expressions makes human language very complex, and more than a little challenging for software programs to decipher. For instance, a simple expression from a worried parent, “My baby is burning up!”, can be interpreted in many ways by a system that looks at the individual words and attempts to understand their relationships in context. The challenge of understanding the meaning of such statements becomes even more daunting when local dialects, accents, and speech impediments are taken into consideration.

To develop a functional NLP system that can interact with patients and clinicians, developers first need to create a lexicon that defines medical terms and the numerous lay expressions that are often used in place of the more technical words. A program capable of performing lexical analysis is also necessary to help interpret the various phrases and combinations of words used during a medical conversation. Finally, the NLP system must be capable of grasping sentence structure, understanding the grammatical rules of each human language it is analyzing, and applying semantic modifiers such as negation and disambiguation.
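To make the lexicon step concrete, here is a minimal sketch in Python. The entries in `LAY_TO_CLINICAL` and the function name `normalize_terms` are hypothetical, chosen only for illustration. A real clinical lexicon would be far larger and would usually draw on a curated vocabulary rather than a hand-written dictionary.

```python
import re

# Hypothetical mini-lexicon mapping lay expressions to clinical terms.
# These entries are illustrative assumptions, not a validated vocabulary.
LAY_TO_CLINICAL = {
    "burning up": "fever",
    "throwing up": "vomiting",
    "short of breath": "dyspnea",
    "chest pain": "chest pain",
}


def normalize_terms(utterance):
    """Return the clinical terms whose lay expressions appear in the utterance."""
    text = utterance.lower()
    found = []
    for phrase, term in LAY_TO_CLINICAL.items():
        # Word boundaries keep "burning up" from matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append(term)
    return found


print(normalize_terms("My baby is burning up!"))  # ['fever']
```

A phrase lookup like this handles only expressions that someone has already listed. It does nothing for idioms outside the lexicon, which is part of why the later analysis steps are needed.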

Detecting negation and understanding the relationship between subject, predicate, and object continue to challenge NLP systems. A negation is a statement that cancels out or refutes another action or statement. An NLP algorithm may read a sentence like “The patient has chest pain, but no dizziness” and wrongly conclude that the patient has two medical problems: chest pain and dizziness. Similarly, NLP tools may suffer from lexical ambiguity, in which they have difficulty telling the difference between the subject of a sentence and its verb.
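The chest pain example can be handled, at least in simple cases, by a rule-based approach that looks for a negation word ahead of a finding within the same clause. The sketch below is a simplified illustration loosely inspired by trigger-word methods such as NegEx. It is not the method used by any particular product, and the word lists and the function name `classify_findings` are assumptions made for this example.

```python
import re

# Illustrative word lists; a production system would need far richer ones.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
SCOPE_BREAKERS = ("but", "however", "although")
FINDINGS = ("chest pain", "dizziness", "nausea")

# Clauses end at punctuation or at a conjunction that resets the negation scope.
CLAUSE_SPLIT = re.compile(r"[,;.]|\b(?:" + "|".join(SCOPE_BREAKERS) + r")\b")


def classify_findings(sentence):
    """Label each known finding in the sentence as 'present' or 'absent'."""
    result = {}
    for clause in CLAUSE_SPLIT.split(sentence.lower()):
        for finding in FINDINGS:
            position = clause.find(finding)
            if position == -1:
                continue
            # A finding is negated if a trigger word precedes it in the clause.
            preceding = clause[:position].split()
            negated = any(token in NEGATION_TRIGGERS for token in preceding)
            result[finding] = "absent" if negated else "present"
    return result


print(classify_findings("The patient has chest pain, but no dizziness"))
# {'chest pain': 'present', 'dizziness': 'absent'}
```

Rules this simple break down quickly, for example on double negatives or on negation that follows the finding, which helps explain why negation remains a challenge.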

Although NLP programs remain primitive when compared to the language-processing skills of a 5-year-old child, they are capable of carrying on life-like conversations that enable us to order merchandise, listen to our favorite song, and estimate the risk of COVID-19 infection. Voice technology is also proving useful to clinicians trying to improve their interactions with electronic health record (EHR) systems. In fact, basic math suggests that voice communication should be more efficient than other forms of communication. On average, we speak 110–150 words per minute (wpm), whereas we type only 40 wpm and write only 13 wpm.
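The arithmetic behind that comparison can be worked through in a few lines of Python. The rates come from the figures above. The 300-word note length is an assumption chosen only for illustration, and the calculation ignores time spent correcting recognition errors.

```python
# Rates quoted in the text, in words per minute (wpm).
RATES_WPM = {
    "speaking (low)": 110,
    "speaking (high)": 150,
    "typing": 40,
    "handwriting": 13,
}

NOTE_WORDS = 300  # assumed length of a clinical note, for illustration only


def minutes_needed(words, wpm):
    """Time in minutes to produce the given number of words at a given rate."""
    return words / wpm


for label, wpm in RATES_WPM.items():
    print(f"{label}: {minutes_needed(NOTE_WORDS, wpm):.1f} min")
```

Under these assumptions, dictating the note takes roughly 2 to 2.7 minutes, compared with 7.5 minutes of typing and about 23 minutes of handwriting.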

A systematic review by informaticists at Vanderbilt University has found several studies that document the ability of computerized speech technology to outperform human transcription, saving on costs and speeding up documentation. Despite such positive reports, these software systems come with their share of problems. Kumah-Crystal et al. sum up the challenges: “The accuracy of modern voice recognition technology has been described as high as 99%. . . . Some reports state that SR [speech recognition] is approaching human recognition . . . However, an important caveat is that human-like understanding of the context (e.g., ‘arm’ can refer to a weapon or a limb. Humans can easily determine word meaning from the context.) is critical to reducing errors in the final transcription.” Inaccuracies in transcription can also occur when speakers hesitate too long, cope with interruptions, or speak with unusual cadences.

Clinicians are also using these voice assistants to speak commands to an EHR system, such as asking for a patient’s latest hemoglobin A1c readings or requesting an e-prescription. Nursing staff have used them to call up patient allergies as well.

In the next part, we’ll focus on several of the cutting-edge technology companies that are using conversational technology to improve communication between patients and clinicians and lighten the burden on practitioners who find they spend too much time filling in boxes in the EHR system.
