Demystifying Natural Language Processing

NLP enables humans and computers to communicate in ways never imagined a few short years ago. The results have practical implications for anyone working in healthcare.

By John Halamka, M.D., M.S., Diercks President, Mayo Clinic Platform, and Paul Cerrato, M.A., senior research analyst and communications specialist, Mayo Clinic Platform

Natural language processing (NLP) may sound like a strange term if you’re not working in digital health. The simplest way to understand the expression is to remember that humans and computers do not speak the same language. We speak English, Spanish, or French, while computers speak in binary code. When you hit the “b” key on a keyboard, for instance, the keyboard doesn’t send a “b” to the computer; it sends 01100010, the binary code for that letter. With this disconnect in mind, NLP is a way to convert our letters, words, and sentences into a language that computers not only understand but can act on in several practical ways. Among the useful applications that rely on NLP are translation services, summarization of lengthy electronic health records, sentiment analysis, and search engine optimization.
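To see the disconnect concretely, here is a minimal Python sketch using only the language’s built-in functions to display the binary patterns behind individual letters:

```python
# Python's built-in ord() returns a character's numeric code, and format()
# renders that number as the 8-bit binary pattern the machine stores.
for ch in ("b", "B"):
    code = ord(ch)                         # 'b' -> 98, 'B' -> 66 in ASCII
    print(ch, code, format(code, "08b"))   # b 98 01100010 / B 66 01000010
```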

One reliable source explains: “NLP enables computers and digital devices to recognize, understand and generate text and speech by combining computational linguistics—the rule-based modeling of human language—together with statistical modeling, machine learning and deep learning.” Typically, the process involves two broad steps: preprocessing and algorithm development. Preprocessing usually begins with cleaning up the raw data, then uses tokenization to break human communication down into its component parts. A simple sentence is split into words, sub-words, or characters, which in turn makes it easier for the computer to recognize patterns in the text being analyzed.
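As an illustration, here is a bare-bones tokenization sketch in Python, using only the standard library’s regular-expression module; real pipelines such as NLTK, spaCy, or SentencePiece add sub-word splitting and other normalization steps on top of this:

```python
import re

# Split a sentence into word-level tokens plus punctuation, a simplified
# version of the tokenization step described above.
text = "The patient's blood pressure improved after treatment."
tokens = re.findall(r"[A-Za-z]+(?:'[a-z]+)?|\d+|[^\w\s]", text)
print(tokens)
# ['The', "patient's", 'blood', 'pressure', 'improved', 'after', 'treatment', '.']
```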

Computational linguistics also includes analyzing the data’s syntax and semantics. If you’ve been speaking English all your life, you probably don’t think much about the structure of the sentences you use or the order of the parts of speech. But listening to the Jedi Master Yoda gets the point across: “Named must be your fear before banish it you can” or “The greatest teacher, failure is.” NLP preprocessing works out the meaning of these strange remarks by learning how verbs, nouns, and adjectives are placed in a sentence. Tools that analyze semantics are equally important, since so many of our expressions can’t be interpreted literally. “Dig a little deeper,” for instance, needs to be examined so that its figurative meaning can be understood. Similarly, NLP uses word sense disambiguation to tell the difference between a mouse that runs across your floor and one that moves a pointer across a computer screen.
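For readers who want to see syntactic analysis in action, the sketch below uses the open-source spaCy library (assuming it and its small English model, en_core_web_sm, are installed) to label the parts of speech in one of Yoda’s scrambled sentences:

```python
import spacy

# Tag each token's part of speech and grammatical role; this is the kind
# of structural information a pipeline uses to untangle unusual word order.
nlp = spacy.load("en_core_web_sm")
doc = nlp("The greatest teacher, failure is.")
for token in doc:
    print(f"{token.text:10} {token.pos_:6} {token.dep_}")
```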

Vectorization and word embedding are two additional steps in NLP. In simple terms, they substitute long strings of numbers for the words they are trying to understand and communicate. One source explains: “Vector embeddings are a way to convert words and sentences and other data into numbers that capture their meaning and relationships. They represent different data types as points in a multidimensional space, where similar data points are clustered closer together. These numerical representations help machines understand and process this data more effectively.”  
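A toy example makes the idea tangible. The three-dimensional vectors below are invented purely for illustration (real embeddings have hundreds of dimensions and are learned from data), but they show how cosine similarity places related words close together in the vector space:

```python
import numpy as np

# Hand-made "embeddings": related words get similar coordinates.
vectors = {
    "doctor":    np.array([0.90, 0.80, 0.10]),
    "physician": np.array([0.88, 0.82, 0.12]),
    "banana":    np.array([0.10, 0.05, 0.95]),
}

def cosine_similarity(a, b):
    # 1.0 means the vectors point the same way; values near 0 mean unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["doctor"], vectors["physician"]))  # close to 1.0
print(cosine_similarity(vectors["doctor"], vectors["banana"]))     # much lower
```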

Once all these early steps are complete, NLP algorithms have to be created and trained. They can include logistic regression, random forest modeling, neural networks, and transformers, along with fine-tuning and related modeling techniques. Transformers are probably the most revolutionary item on the list. They are at the heart of the latest chatbots that have affected virtually every industry, and they are the “secret sauce” behind ChatGPT and similar large language models (LLMs). LLMs rely on transformers, a type of neural network, and on a technological innovation called self-attention. One of the best ways to understand it is with an example of how LLMs perform language translation. The original article that put transformers in the spotlight was written by Google and University of Toronto researchers. Entitled “Attention Is All You Need,” it used two test sentences to show how accurately such a model could translate English to German and English to French. The latter task looked like this:

English: "The economic situation in Europe remains challenging, but there are signs of recovery."

French: "La situation économique en Europe reste difficile, mais il y a des signes de reprise."
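Anyone curious can reproduce this kind of translation in a few lines of Python, assuming the open-source Hugging Face transformers library is installed; the Helsinki-NLP/opus-mt-en-fr model used here is one publicly available English-to-French option, downloaded automatically on first use:

```python
from transformers import pipeline

# Load a pretrained English-to-French translation model and run the
# example sentence through it.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("The economic situation in Europe remains challenging, "
                    "but there are signs of recovery.")
print(result[0]["translation_text"])
```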

If you were to translate each English word into French and line the results up in the same order as the English sentence, it wouldn’t make any sense, because French doesn’t position its parts of speech the way English does. There are neural networks that can handle this problem by reordering words to find the sequence that makes sense in the target language. Recurrent neural networks (RNNs) can do that to a limited extent. But Dale Markowitz at Google has pointed out that RNNs can’t handle really large sequences of text, such as essays, and they are slow to train because they can’t be “parallelized.” In plain English, that means each step of an RNN depends on the one before it, so the work can’t be spread across the many graphics processing units, the GPUs that are stacked up side by side in modern computers.
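The sequential bottleneck is easy to see in code. In the bare-bones RNN sketch below (toy sizes and random numbers, purely for illustration), each hidden state depends on the previous one, so the loop has to run word by word, in order:

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(8, 4))   # input-to-hidden weights
W_h = rng.normal(size=(8, 8))   # hidden-to-hidden weights
h = np.zeros(8)                 # hidden state, updated at every step

sentence = [rng.normal(size=4) for _ in range(5)]  # five toy word vectors
for x in sentence:
    # Step t cannot start until step t-1 has finished, which is why
    # this computation resists being spread across many GPUs at once.
    h = np.tanh(W_x @ x + W_h @ h)
print(h)
```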

LLMs that use transformer technology can be parallelized and thus can be trained on massive amounts of data, often many terabytes. The technology relies on positional encoding and the self-attention mechanism. RNNs look at words sequentially, but transformers assign a number to each word in the text being analyzed. In the case of the English sentence above, “The” would be tagged 1, “economic” 2, “situation” 3, and so on. This is especially useful when the text is an entire essay with hundreds of words. The neural network then learns how to interpret this encoding while it is being trained on the millions of sentences in its data set. Transformers also learn to pay attention to certain words by analyzing thousands or millions of English/French sentence pairs, which teaches them the rules of grammar, usage, gender, and so on that are specific to each language.
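The single-integer tags described above are a simplification; the actual scheme in “Attention Is All You Need” gives each position a unique pattern of sine and cosine values that the network can learn to associate with word order. Here is a short sketch of that sinusoidal encoding:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # One row per word position, one column per model dimension.
    positions = np.arange(seq_len)[:, None]
    dims = np.arange(d_model)[None, :]
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

# Twelve word positions, each encoded as a 16-dimensional pattern.
print(positional_encoding(seq_len=12, d_model=16).shape)  # (12, 16)
```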

Using the self-attention mechanism, these models gain a deeper understanding of how language works. They can pick up on synonyms, grasp verb tense, and so on. The process involves having the transformer turn its attention inward, toward the input text it is being asked to interpret. As Markowitz points out, self-attention allows the model to “understand a word by looking at the context of the words around it.”
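At its core, self-attention is a weighted comparison of every word with every other word. The sketch below implements scaled dot-product attention, the form used in transformers, with toy, randomly generated vectors just to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_k = 4, 8                      # four "words", 8-dimensional vectors
Q = rng.normal(size=(seq_len, d_k))      # queries: what each word is looking for
K = rng.normal(size=(seq_len, d_k))      # keys: what each word offers as context
V = rng.normal(size=(seq_len, d_k))      # values: the information to be blended

scores = Q @ K.T / np.sqrt(d_k)          # relevance of each word to every other word
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
output = weights @ V                     # each word becomes a context-aware mixture
print(weights.round(2))                  # one row of attention weights per word
```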

Spoken and written communication remains one of our most valuable tools for solving complex problems, expressing our feelings, and bridging the gaps between us. When used responsibly, NLP can complement these efforts.

