Generative AI: Pulling Back the Curtain

Large language models like ChatGPT have the potential to profoundly benefit and seriously harm patients. With that in mind, it makes sense to understand the underlying technology.

By John Halamka, M.D., President, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform.

As the name implies, a large language model (LLM) is trained on massive amounts of data; in the case of text documents, it learns all sorts of linguistic patterns and associations, creating a deep knowledge base to draw on. That in turn enables the model to make plausible predictions about which words, sentences, and paragraphs should follow one another. Nigam Shah, M.D., a Professor of Medicine and Biomedical Data Science at Stanford University, has provided a lucid explanation of how these digital tools work in a recent video.

If we start with a simple sentence like “Where are we going”, leave out the word “going”, and then ask a computer to predict that word, the probability looks like this:

P(S), the probability of the completed sentence, equals P(where) × P(are, given “where”) × P(we, given “where are”) × P(going, given “where are we”). If the training data also included a different sentence, “Where are we at”, the same chain of probabilities would apply to it. And if the data set consisted of only these two sentences, Dr. Shah points out, the probability of correctly predicting the missing word, “going” or “at”, would be 0.5 for each. An LLM learns such probabilities on a massive scale. While a predictive model used to classify a skin lesion as melanoma or a normal mole typically draws on a data set of hundreds or thousands of samples, LLMs are trained on billions of data points. Some speculate that GPT-4, from OpenAI, used a trillion.
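Dr. Shah's two-sentence example can be sketched in a few lines of Python. This toy next-word predictor simply counts, in a corpus of exactly those two sentences, which words follow the context “where are we”; it is an illustration of the counting idea only, not of how a real LLM is implemented.

```python
from collections import Counter

# Toy corpus: the two sentences from the example above.
corpus = [
    ["where", "are", "we", "going"],
    ["where", "are", "we", "at"],
]

# Count how often each word follows the context "where are we".
context = ("where", "are", "we")
next_words = Counter()
for sentence in corpus:
    for i in range(len(sentence) - len(context)):
        if tuple(sentence[i:i + len(context)]) == context:
            next_words[sentence[i + len(context)]] += 1

# Convert counts to probabilities.
total = sum(next_words.values())
probs = {word: count / total for word, count in next_words.items()}
print(probs)  # {'going': 0.5, 'at': 0.5}
```

With only these two sentences, “going” and “at” each occur once after the context, so each gets probability 0.5, matching the figure in the text. An actual LLM estimates such conditional probabilities over billions of contexts with a neural network rather than a lookup table.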

To understand how these chatbots work, it helps to deconstruct the term. The “chat” part is pretty obvious: these models talk to users in words, images, or even computer code. GPT stands for generative pre-trained transformer. These tools are generative in the sense that they generate, or create, new information, which can be accurate, inaccurate, or a complete fabrication. They are pre-trained because some of the words in the corpus of training data are deliberately masked. The partially masked data is then fed into a transformer, a neural network architecture developed at Google that uses various encoders and decoders, and the chatbot is told to fill in, or predict, the missing words, i.e., the data that had been masked.
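The masking step described above can be illustrated with a short sketch. This is a simplified, hypothetical example of masked-word pre-training data, not the actual procedure used by any particular model; the function name, mask rate, and sentence are all made up for illustration.

```python
import random

def mask_tokens(tokens, mask_rate=0.3, mask_token="[MASK]", seed=0):
    """Hide a fraction of tokens, returning the masked sequence and the
    hidden targets the model would be trained to predict."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # the model is scored on recovering this word
        else:
            masked.append(tok)
    return masked, targets

sentence = "where are we going after the clinic visit".split()
masked, targets = mask_tokens(sentence)
print(masked)   # original sentence with some words replaced by [MASK]
print(targets)  # the hidden words, keyed by position
```

During pre-training, the model sees only the masked sequence and is rewarded for predicting the hidden words; repeating this over a huge corpus is what builds up the word-probability knowledge described earlier.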

Of course, all of these technical details do not touch on the ethical and practical implications of using these chatbots in the real world. As we have pointed out in previous articles, generative AI tools can be used to create fake videos depicting prominent public figures saying or doing things they never did. They have been used by students to answer test questions and have spread all sorts of conspiracy theories.

On the plus side, the technology is also being used to take on routine business tasks, enabling employees to focus on more complex work. Some clinicians are also using it to answer patients’ questions or to summarize patient histories by extracting key elements from EHRs. The list goes on. But regardless of the benefits and risks of these LLMs, in order to have informed conversations with others it’s best to understand the basic concepts behind the technology, to pull back the curtain on this Wizard of Oz.