The evidence is telling us we must be proactive, add appropriate safeguards, and appreciate the value of clinical experience.

By John Halamka, M.D., M.S., Dwight and Dian Diercks President, Mayo Clinic Platform, and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform
Samuel Nelson died of a drug overdose after taking bad advice from ChatGPT, according to a recent lawsuit. In fact, several lawsuits have been filed against OpenAI, the company that develops ChatGPT, for allegedly contributing to users’ self-harm, paranoid delusions, and suicide. While these reports highlight the most extreme examples of how AI models may harm the public, there’s also reason for concern among healthcare providers who are rushing to deploy the latest algorithms to help manage clinical and administrative tasks.
One of my relatives sought my (Dr. Halamka's) advice on the risks and benefits of a vaginal birth after cesarean section for a baby in a breech presentation. Using non-AI analytic tools, I created a case-matched cohort that demonstrated a fivefold risk of morbidity and mortality for the baby. The relative then sought the advice of an LLM, which concluded: "the issue is that US doctors are not trained in breech delivery, so if you find the right doctor, the risk is markedly reduced." She found a clinician well trained in breech delivery and ended up having a uterine rupture. Mom and baby were fine, but mom set the record for the amount of blood transfused at the hospital that month.
In contrast, Stanford Medicine has taken several precautions while developing an AI tool called ChatEHR. It enables clinicians to query individual patients’ medical records and to summarize their medical history and charts. For instance, a clinician can ask whether Mr. Smith has any allergies, whether he has had a colonoscopy, and much more. Stanford physicians say the pilot project that includes ChatEHR speeds up their workflow and even lets them ask follow-up questions, while keeping the data secure and private. Nigam Shah, MBBS, PhD, chief data science officer at Stanford Health Care, says that his team continues to evaluate the chatbot’s accuracy using MedHELM, “an open-source, flexible and cost-effective framework for real-world LLM evaluation in medicine.” They have also published an in-depth evaluation of ChatEHR on arXiv, a preprint repository used by scientists before they submit to peer-reviewed journals. Tracking the interactions of 1,075 users across 23,000 sessions over 1.5 years, the team found only 0.73 hallucinations and 1.6 inaccuracies per summary generation.
While Stanford is offering ChatEHR to its clinicians as an adjunct, some healthcare providers are starting to pressure their staff to follow the recommendations of AI-generated alerts. At St. Rose Dominican Hospital in Henderson, Nev., Adam Hart, an ER nurse, was told to administer IV fluids to a patient who, according to an AI alert, was developing sepsis. When the nurse examined the woman, he found that “she had a dialysis catheter below her collarbone. Her kidneys weren't keeping up. A routine flood of IV fluids, he warned, could overwhelm her system and end up in her lungs. The charge nurse told him to do it anyway because of the sepsis alert generated by the hospital's artificial-intelligence system.” The nurse refused. “A physician overheard the escalating conversation and stepped in. Instead of fluids, the doctor ordered dopamine to raise the patient's blood pressure without adding volume—averting what Hart believed could have led to a life-threatening complication.” This experience underscores a reality that many AI developers and enthusiasts still don’t appreciate: even the most sophisticated AI models are no match for years of clinical experience and the ability to sense problems. "Sometimes you can see a patient and, just looking at them, [know they're] not doing well. It doesn't show in the labs, and it doesn't show on the monitor," Hart says. "We have five senses, and computers only get input."
Mayo Clinic has established an executive-led oversight process, rooted in trusted clinical experience, that recognizes the need for both speed and guardrails. Every clinical AI application is reviewed and approved before being used by Mayo Clinic staff. This process has reviewed over 100 clinical AI applications this year alone, taking into account such factors as AI complexity, clinical setting, performance and patient safety, user training, workflow integration, privacy and security, and life-cycle management. AI tools that are complex and could affect decision-making in time-urgent or critical clinical scenarios may not be used at Mayo Clinic without proactive human review and decision-making by qualified clinical users. The early success of this governance model demonstrates that, done right, we can rapidly advance AI to improve patient outcomes without sacrificing patient safety, privacy, or security.
