Paving the Road to AI Excellence
Developing and implementing AI-driven algorithms in health care has proven far more complicated than we first imagined. The new Coalition for Health AI (CHAI) guidelines provide the much-needed blueprint.
By John Halamka, M.D., President, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform.
When biomedical journals first began publishing research on AI-enabled algorithms, many observers were enthusiastic about their potential value in improving patient outcomes and easing clinicians’ burdens. Much of that enthusiasm has taken a backseat as we gradually realized that replacing clinicians’ diagnostic skills—or even augmenting them—was not as simple as we once thought. As stakeholders took a closer look “behind the curtain” to examine the composition of the data sets and the modeling techniques to create the algorithms, they realized that, in many cases, they were looking at smoke and mirrors, not sound, unbiased scientific evidence.
Against this background, several organizations began developing guidelines to ensure the development and implementation of trustworthy digital tools. For instance, SPIRIT-AI and CONSORT-AI consensus statements were issued to set standards on how AI-related research should be conducted and published. Similarly, several bias checklists, risk prediction models, and clinical decision support oversight guidelines became available to point vendors, investors, and health care providers in the right direction.
This week, the Coalition for Health AI (CHAI) is lending its voice to the conversation by publishing Blueprint for Trustworthy AI Implementation Guidance and Assurance for Healthcare. No doubt, some will ask: Do we really need another set of guidelines? The new CHAI blueprint addresses this question by pointing out that “… there are few guides that offer a holistic approach to assessments of AI-based clinical systems for health systems, consumers, and end users.” With this goal in mind, the Blueprint takes a structured approach to the issue of trustworthy AI, following the same risk management framework used by the National Institute of Standards and Technology (NIST). We summarized NIST’s recently published framework, the Artificial Intelligence Risk Management Framework (AI RMF 1.0), in a recent blog. Like the NIST document, CHAI uses four broad categories: Map, Measure, Manage, and Govern, to explain the key elements involved in developing trustworthy AI.
The Blueprint addresses one of the most critical issues stakeholders must contend with when navigating the large collection of commercially available AI algorithms: validation. Many of these digital tools have not been adequately vetted for accuracy and equity. The Blueprint defines validation as “confirmation, through the provision of objective evidence, that the requirements for a specific intended use or application have been fulfilled.”
Equally important, the Blueprint goes into detail on the need for algorithm reliability, reproducibility, testability, usability, post-implementation monitoring, benefits, safety, accountability, transparency, explainability, interpretability, and bias evaluation. In previous blogs, we have discussed the importance of addressing algorithmic bias. The CHAI document takes a much deeper dive into this critically important issue, outlining the need to examine several types of bias: human/cognitive, computational, systemic, and statistical.
Of course, a document that catalogs problems without offering solutions would fall far short. The Blueprint also issues a call to action, including recommendations to set up AI assurance labs, an advisory service infrastructure, registries for AI tools similar to the ClinicalTrials.gov clinical trials registry, and sandboxes that would enable developers and users to “play.” As the document points out: “An evaluation and monitoring sandbox platform that includes a data standards-based federated repository can help ensure long-term reliability of new AI algorithms as well by enabling evaluation and ongoing monitoring to identify bias, detect performance degradation due to data shift, and assess the usefulness of algorithms.”
Finally, we encourage stakeholders to get more directly involved in this far-reaching endeavor by joining the Coalition and taking an active role in creating these entities.
The road ahead will undoubtedly have many bumps, but the rewards are almost unimaginable, including tangible benefits at the bedside and a less burdensome workload for clinicians and administrators. We invite you to join the journey.