Finding the Best AI Algorithms: Is FDA Approval Enough?
Guidelines and guardrails for the safe use of AI require more than regulation. The Coalition for Health AI has created a framework to reduce the risks these models pose and to improve their safety and effectiveness.

By John Halamka, M.D., Diercks President, Mayo Clinic Platform, and Paul Cerrato, M.A., senior research analyst and communications specialist, Mayo Clinic Platform
A growing number of healthcare providers and clinicians are looking to AI-enabled algorithms to help perform a variety of tasks. Many are exploring their value in answering patient emails, summarizing long EHR documents, interpreting radiology images, and assisting in diagnosis and treatment. In their quest to find the safest, most effective digital tools, they typically review the FDA's database of approved Software as a Medical Device (SaMD). But is FDA approval or clearance enough to justify adoption of these devices?
The agency defines SaMD as "software intended to be used for one or more medical purposes that perform these purposes without being part of a hardware medical device." The 21st Century Cures Act updated the way software is categorized and regulated by the FDA. If software is not intended for use in the diagnosis, cure, mitigation, treatment, or prevention of a disease or condition, it is not regulated by the agency. Similarly, administrative software, electronic health record software, general wellness software, and certain types of clinical decision support (CDS) software are not considered medical devices and do not require FDA approval.
This last category requires further clarification. To qualify as non-device CDS, software must meet all four of the following criteria from the 21st Century Cures Act:
- It cannot be intended to acquire, process, or analyze a medical image, a signal from an in vitro diagnostic device, or a pattern or signal from a signal acquisition system;
- It must be intended for the purpose of displaying, analyzing, or printing medical information about a patient or other medical information;
- It must be intended for the purpose of supporting or providing recommendations to a healthcare professional about prevention, diagnosis, or treatment of a disease or condition;
- It must be intended to enable a healthcare professional to independently review the basis for the recommendations, so that the professional does not rely primarily on those recommendations to make a clinical diagnosis or treatment decision for an individual patient.
If, on the other hand, CDS software does not meet all four criteria, it will likely be regulated as a medical device. Although the FDA's mission includes establishing criteria to ensure that approved SaMD is safe and effective, its involvement in the clinical validation of these devices has been limited. Several published reports attest to these limitations.
Rajpurkar et al. analyzed apps that interpret medical images and concluded: "The models underlying specific AI applications are often not tested outside the setting in which they were trained, and even AI systems that receive FDA approval are rarely tested prospectively or in multiple clinical settings. Very few randomized, controlled trials have shown the safety and effectiveness of existing AI algorithms in radiology, and the lack of real-world evaluation of AI systems can pose a substantial risk to patients and clinicians."
The investigators cite the example of algorithms designed to segment brain tumors and interpret chest X-rays. Performance worsened when the models were validated on external data, compared with their performance at the hospitals where they were originally trained. Similarly, "… a retrospective study showed that the performance of a commercial AI model in detecting cervical spine fractures was worse in real-world practice than the performance initially reported to the FDA."
The Coalition for Health AI (CHAI), an industry-led, non-profit public-private partnership, recently developed an AI Action Plan to address many of these shortcomings. The organization is calling for a National AI Innovation and Solution infrastructure that would prioritize standardized AI performance benchmarking. CHAI points out that "AI applications vary widely in risk. High-risk applications, such as models used in diagnosis or direct patient care, should be subject to stronger oversight, while lower-risk AI, such as administrative tools that indirectly impact patient safety and care, should require fewer reporting requirements." Classification and benchmarking would rely on a nuanced, risk-based framework that enables models to be de-risked through mitigations such as human oversight, increasing adoption without unnecessary regulatory burden. The group also emphasizes the need for rigorous post-market monitoring of AI models. To accomplish these goals, CHAI is calling for a nationwide federated quality assurance network. The network would leverage public-private partnerships to provide testing frameworks, benchmark data sets, and performance metrics, and to disclose AI model performance, including training data, limitations, and risks, where relevant to safety.
CHAI’s AI Action Plan rests on a set of foundational principles that focus on responsible, trustworthy AI. These core principles include usefulness, usability, and efficacy; fairness; safety and reliability; transparency, intelligibility, and accountability; and security and privacy, all of which are explained in more detail in the group’s Responsible AI Guide and Checklists. On a more practical level, the principles serve as the basis for CHAI’s Applied Model Card. The model card, sometimes compared to the food label on a can of soup, spells out all the “ingredients” in a specific healthcare algorithm, including the developer’s name, the model type, bias mitigation efforts, the source of the data set used to generate the model, the training details, and so on. A sample card for one of Aidoc’s products, BriefCase-Triage for Intracranial Hemorrhage (ICH), is available on the CHAI website.
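For readers who want a concrete picture of what a model card captures, here is a minimal sketch in Python. The field names and the example values are illustrative assumptions based on the description above, not CHAI’s official Applied Model Card schema.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: fields inferred from the article's description
# of a model card, not from CHAI's official Applied Model Card schema.
@dataclass
class ModelCard:
    developer: str                  # organization that built the model
    model_name: str
    model_type: str                 # e.g., "convolutional neural network"
    intended_use: str
    data_source: str                # provenance of the training data
    training_details: str
    bias_mitigation: str            # steps taken to detect and reduce bias
    known_limitations: list[str] = field(default_factory=list)

# Hypothetical example values, for illustration only.
card = ModelCard(
    developer="Example Health AI, Inc.",
    model_name="ExampleTriage-CT",
    model_type="convolutional neural network",
    intended_use="Flag suspected findings on head CT for radiologist review",
    data_source="De-identified head CT studies from three partner hospitals",
    training_details="Trained 2023; internally validated on a held-out test set",
    bias_mitigation="Performance audited across age, sex, and scanner vendor",
    known_limitations=["Not validated on pediatric patients"],
)
print(card)
```

A standardized structure like this is what makes cards comparable across vendors, much as a uniform nutrition label makes foods comparable at a glance.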
As AI-enhanced algorithms take center stage in patient care, clinicians and other decision makers in medicine will have to stay current on the standards required to make these digital tools trustworthy. Our patients deserve nothing less.