AI for Sale: Buyer Beware

Many hospitals are purchasing the latest AI algorithms to streamline operations and improve clinical care. But few of these models are being fully vetted for accuracy, bias, and transparency.

By Paul Cerrato, MA, senior research analyst and communications specialist, and John Halamka, M.D., Diercks President, Mayo Clinic Platform

When consumers think about buying a new TV or appliance, they are often advised: Buyer beware. They’re encouraged to think twice about exaggerated claims from overly enthusiastic vendors. Having worked in the digital health space for several decades, we have both learned to urge healthcare providers to heed the same warning when they consider purchasing the latest AI tools. Companies may claim their service will fully automate your revenue cycle, require almost no effort to integrate into the hospital’s IT network, replace unnecessary administrative staff, perform part of the clinical diagnostic process, and much more.

A recent analysis of over 2,400 hospitals that looked at their IT capabilities, applications, and operations found that 65% were integrating AI models into an EHR system. But more than half of these hospitals (56%) did not evaluate the algorithms for bias, and more than one third said they did not do any kind of local evaluation to determine their accuracy. Overall, the survey respondents were using AI models to predict risks for inpatients, identify high-risk outpatients, monitor health, recommend treatments, help schedule patient visits, and automate billing. Most of the hospitals that did evaluate models for accuracy used data from their own health system.

On the other hand, many other AI models have received FDA approval for use nationwide. Unfortunately, the generalizability of these AI-enabled devices sometimes falls short. For a model to be truly generalizable, clinical performance studies need to be done; but an analysis of over 900 FDA-approved, AI-enabled devices found that only about half reported such performance studies at the time of approval. Worse yet, for one in four, no such evaluation was ever conducted. Among the devices that did undergo clinical performance testing, most of the studies were retrospective in design, and only 2% involved randomized clinical trials.

Why is this happening? Part of the problem may lie in the way software as a medical device (SaMD) is approved by regulators. Many of these devices receive FDA clearance through the agency’s 510(k) process, which only requires “manufacturers to demonstrate that a new device shares technological characteristics and indications with legally authorized devices, thereby establishing substantial equivalence without the compulsory need for clinical testing.” But equivalence in technological features and indications is not the same as equivalence in clinical performance.

Technological and indication equivalence may mean that software-based devices share very similar modeling techniques, such as a neural network or a random forest. They may also address the same clinical problem or use the same input data. But that doesn’t ensure the SaMD will perform the same way in a real-world clinical scenario. For that to happen, the device needs to demonstrate clinical performance, with metrics like specificity, sensitivity, area under the curve, and positive predictive value, as well as measurable patient outcomes. It also has to demonstrate a lack of bias against patients who have traditionally been underrepresented in large data sets.
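To make the idea of local validation concrete, here is a minimal sketch of what such a check might look like in practice. It assumes a binary risk model that outputs probabilities and a held-out cohort from the hospital’s own patients; the column names, toy data, and 0.5 decision threshold are illustrative, not drawn from the survey or from any particular product.

```python
# Minimal sketch of a local validation check for a binary risk model.
# The column names, the toy data, and the 0.5 threshold are illustrative;
# in practice the DataFrame would hold the hospital's own held-out cohort.
import pandas as pd
from sklearn.metrics import roc_auc_score, confusion_matrix

def performance_summary(df, threshold=0.5):
    """Sensitivity, specificity, PPV, and AUC for one cohort."""
    y_true = df["y_true"]
    y_pred = (df["y_prob"] >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "n": len(df),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "auc": roc_auc_score(y_true, df["y_prob"]),
    }

# Toy stand-in for a locally collected validation set.
local = pd.DataFrame({
    "y_true": [0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "y_prob": [0.1, 0.4, 0.8, 0.6, 0.2, 0.9, 0.3, 0.7, 0.4, 0.5],
    "group":  ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
})

# Overall performance on the local cohort.
print("overall:", performance_summary(local))

# A rough bias check: the same metrics, broken out by patient subgroup.
for name, subgroup in local.groupby("group"):
    print(name, performance_summary(subgroup))
```

The same pattern extends to whatever demographic or clinical subgroups matter for a given hospital’s population; large gaps in sensitivity or PPV between subgroups are one early signal that a purchased model may not serve all patients equally well.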

FDA approval also doesn’t guarantee a software system’s transparency. An approved system may give users no clear view of how it arrives at its conclusions. As we have discussed in previous articles, this black box problem remains an issue for many clinicians, who don’t want to blindly accept a diagnostic or treatment recommendation they cannot interrogate; physicians are far more likely to trust an AI algorithm that offers an understandable explanation for its decisions. Developers who design their models to provide that kind of credible explanation are much more likely to get buy-in when their product is deployed.
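We do not prescribe any particular explainability technique here, but to illustrate the kind of output a developer might surface to clinicians, the sketch below uses permutation importance, one common model-agnostic way to rank which inputs most influence a model’s predictions. The synthetic data and logistic regression model are stand-ins, not anything described above.

```python
# Purely illustrative: rank which features most influence a model's output
# by measuring how much AUC degrades when each feature is shuffled.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a clinical dataset and a vendor-supplied model.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in validation AUC.
result = permutation_importance(
    model, X_val, y_val, scoring="roc_auc", n_repeats=10, random_state=0
)

# Present the most influential inputs, the kind of summary a clinician
# could sanity-check against domain knowledge.
ranked = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```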

The AI Gold Rush is upon us, but as Shakespeare pointed out: All that glitters is not gold. The most successful healthcare providers will make sure the digital solutions they invest in are fully validated, equitable, and explainable.

