Is Your AI Algorithm Ready for Prime Time?
With so many developers launching AI algorithms, end users worry that they may be investing in useless—or harmful—technology. Academic medical centers, technology companies, and federal agencies have joined forces to address the challenge.
By John Halamka, M.D., President, Mayo Clinic Platform and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform.
If you have been following recent developments in the digital health space, you no doubt recall the executive order from President Joe Biden that spelled out a set of guidelines for safe, secure, and trustworthy AI algorithms. As we pointed out in an earlier blog, the order requires developers of the most powerful AI systems to share their safety test results and other critical information with the U.S. government, and calls for standards, tools, and tests to help ensure that AI systems meet those high standards.
With these mandates in mind, several leading healthcare systems are now working with government agencies and technology companies to establish a series of AI assurance labs to make this a practical reality. The stakeholders in this national initiative include Google, Microsoft, FDA, the Office of the National Coordinator for Health IT, and the Coalition for Health AI (CHAI). The latter group includes Mayo Clinic, Duke Health, Stanford Medicine, Johns Hopkins University, and many others. The AI assurance laboratories aim to bring structure, discipline, and an evidence-based approach to what has been called a Wild West environment by many critics.
In a sense, the effort to “tame” this Wild West has been like herding cats. A literature search reveals more than 200 suggestions on how to report the performance of AI algorithms, including all sorts of model cards and data cards, randomized clinical trials, observational studies, and so on. CHAI has managed to bring order to this confusing landscape, and the newly formed non-profit group is leading the way with a game plan for the AI assurance labs.
As Shah et al. point out, we need a shared resource for development and validation: “A network of assurance labs could comprise both private and public entities, rather than one national organization, given the number and diversity of emerging models, the need for localized testing, and the increasing recognition of the need for ongoing monitoring as well as reporting. Such a network could fill a critical gap in an ecosystem dominated by well-meaning but often overexuberant and inexperienced developers who lack the depth of understanding of healthcare delivery.”
This network will pursue at least five goals:
- Technical evaluation of a model’s performance and potential for bias
- Assessment of an algorithm’s performance in patient subgroups
- Evaluation of a model’s usability in prospective studies
- Evaluation of how well a model works once deployed, by means of a “pre-deployment simulation of the consequences of using the model’s output in light of specific policies and work capacity constraints.”
- Ongoing monitoring of an algorithm once it has been put in place
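As an illustration only, and not a reflection of CHAI's actual tooling, the second goal above, assessing performance in patient subgroups, can be sketched as a per-group comparison. The data, subgroup names, and metric below are invented for this example; a real assurance lab would use validated metrics and real cohorts.

```python
# Hypothetical sketch: compare a model's accuracy within patient
# subgroups to surface gaps that an aggregate score would hide.

def subgroup_accuracy(records):
    """Given (subgroup, prediction, label) tuples, return a dict
    mapping each subgroup to its accuracy."""
    correct, total = {}, {}
    for subgroup, prediction, label in records:
        total[subgroup] = total.get(subgroup, 0) + 1
        if prediction == label:
            correct[subgroup] = correct.get(subgroup, 0) + 1
    return {g: correct.get(g, 0) / total[g] for g in total}

# Synthetic predictions for two invented age bands.
records = [
    ("under_65", 1, 1), ("under_65", 0, 0), ("under_65", 1, 0),
    ("over_65", 1, 1), ("over_65", 0, 1), ("over_65", 0, 1),
]

rates = subgroup_accuracy(records)
print(rates)  # accuracy differs sharply between the two groups
```

A lab might flag a model whose accuracy in any subgroup falls below a pre-agreed floor, even when its overall accuracy looks acceptable.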
There are a variety of ways that this national network might be configured, each with its own strengths and weaknesses. If it ignores the AI needs of local medical providers and their unique workflow setups, the approved algorithms will likely fall short. One alternative is to have each local provider develop its own assurance lab; however, that would put smaller hospital systems at a distinct disadvantage, since they lack the resources of the large academic medical centers. A second approach would be to create commercial assurance labs run by large AI developers, which is not an ideal arrangement for ethical reasons.
Over the next one to two quarters, CHAI will refine the AI assurance network model and the nationwide registry of model performance.