NIST Provides Much-Needed AI Guardrails
The National Institute of Standards and Technology recently published a framework that will help health care providers assess AI trustworthiness, explainability, and bias.
By John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform.
Like most technologies, AI-enabled algorithms are a two-edged sword, capable of generating measurable benefits, but also capable of causing significant harm—to both clinicians and patients. The newly released federal guidelines, Artificial Intelligence Risk Management Framework (AI RMF 1.0), discuss the latter possibility by stating: “Without proper controls, AI systems can amplify, perpetuate, or exacerbate inequitable or undesirable outcomes for people and communities. With proper controls, AI systems can mitigate and manage inequitable outcomes.”
The list of potential dangers is long. Unreliable algorithms can infringe on a person’s civil liberties, cause physical and psychological harm, and deprive them of economic opportunities. Untrustworthy algorithms can also disrupt a provider’s business operations, open the door to security breaches, and result in monetary loss and a damaged reputation. On a broader scale, they can potentially disrupt the global financial system, supply chain, and a variety of interrelated systems. With these risks in mind, NIST has produced a practical set of guidelines to help organizations manage these problems.
The Framework is divided into two sections. Part one explains how to analyze AI systems’ trustworthiness. It points out that these systems need to be “valid and reliable, safe, secure, and resilient, accountable and transparent, explainable and interpretable and fair with their harmful biases managed.” Part two of the document describes four specific functions to help organizations address the risks of AI systems in practice. These functions – GOVERN, MAP, MEASURE, and MANAGE – are broken down further into categories and subcategories. While GOVERN applies to all stages of organizations’ AI risk management processes and procedures, the MAP, MEASURE, and MANAGE functions can be applied in AI system-specific contexts and at specific stages of the AI lifecycle.
While it may seem obvious to say that an AI risk management approach requires proper governance, the everyday responsibilities of health care leaders can sometimes crowd out the need to give this issue adequate attention. To manage AI risk, senior decision-makers must create a culture that treats risk management as one of its top priorities. Once this is accomplished, leaders need to outline processes, documents, and organizational schemes that anticipate, identify, and manage the risks that the AI system poses. Equally important, accountability must be established. The Framework states: “Roles and responsibilities and lines of communication related to mapping, measuring, and managing AI risks [need to be] documented and … clear to people and teams throughout the organization.”
AI risk mapping requires all AI actors to be fully informed of the context in which the AI systems will be used. Because any AI system has numerous moving parts, each often assigned to a different team or rolled out at a different stage of the system’s lifecycle, everyone needs to understand the context in which they are working. For example, early decisions in identifying the purposes and objectives of an AI system can alter its behavior and capabilities, and the dynamics of deployment settings (such as end users or impacted individuals) can shape the impacts of AI system decisions. As a result, the best intentions within one dimension of the AI lifecycle can be undermined via interactions with decisions and conditions in other, later activities. Categorization is another part of the mapping process: the specific tasks that the AI system will support, and the methods used to implement them (e.g., classifiers, generative models, and recommenders), need to be defined.
Most informaticists instinctively see the importance of measuring the value and risks of AI algorithms. That includes rigorous software testing, performance assessment, validation, and bias evaluation. At Mayo Clinic Platform, we have devoted considerable resources to these metrics through our Gather, Discover, Validate, and Deliver products. Similarly, the Coalition for Health AI has concentrated its efforts on developing many of the guardrails outlined in the NIST Framework. NIST succinctly summarizes the need for such safeguards, stating, “The AI system to be deployed [needs to be] demonstrated to be valid and reliable. Limitations of the generalizability beyond the conditions under which the technology was developed [must be] documented.”
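To make the idea of bias evaluation concrete, here is a minimal sketch of one common check: comparing a classifier's sensitivity (true-positive rate) across patient subgroups. It is an illustration only, not a method prescribed by NIST, Mayo Clinic Platform, or the Coalition for Health AI. The function names, the group labels "A" and "B", and the toy records are all hypothetical.

```python
from collections import defaultdict


def subgroup_sensitivity(records):
    """Return the true-positive rate (sensitivity) for each subgroup.

    Each record is a (group, y_true, y_pred) tuple with binary 0/1 labels.
    Groups with no actual positives are left out of the result.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            actual_pos[group] += 1
            if y_pred == 1:
                true_pos[group] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def sensitivity_gap(rates):
    """Largest difference in sensitivity between any two subgroups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: (subgroup, actual outcome, model prediction).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 1),
]

rates = subgroup_sensitivity(records)
gap = sensitivity_gap(rates)
print(rates, round(gap, 3))
```

In this toy data the model detects two of three true cases in group A but only one of three in group B. A real evaluation would use far larger samples, confidence intervals, and more than one metric, but a gap of this kind is the sort of limitation the Framework asks organizations to document.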
Finally, NIST emphasizes the need to allocate risk resources to mapped and measured risks on a regular basis. Implementing an AI system makes little sense if it’s neglected once it’s in place. Managers must determine whether the AI system achieves its intended purposes and stated objectives and whether its development or deployment should proceed. Equally important, they must be on the lookout for emerging threats.
Although we have focused most of this discussion on what can go wrong with an AI system, let’s not forget the other side of the two-edged sword. As NIST points out: “With proper controls, AI systems can mitigate and manage inequitable outcomes.”