A Call for Quality and Trust: Four Pillars of High-Quality AI
By Sonya Makhni, M.D., M.B.A., M.S., Medical Director and Clinical Informaticist at Mayo Clinic Platform
- The gap between AI promise and adoption in healthcare: We examine why the potential of AI in healthcare has not translated into widespread adoption, despite the growth of AI solutions and awareness.
- The challenges of AI quality, safety, and trust: We argue that AI solutions need to be of high quality, which involves following principles of trustworthy AI, validating solutions, reporting performance and outcomes, and monitoring and improving solutions continuously.
- The implications of AI for patients and clinicians: We warn that AI solutions can have significant downstream effects on patient outcomes, such as misdiagnosis, bias, or harm, and that clinicians need to be informed and empowered to use AI tools responsibly and effectively.
- The call for action and accountability: We urge solution developers and healthcare organizations to work together to ensure that AI solutions are of the highest quality possible, and to safeguard the health and wellbeing of the patients they serve.
As messaging about the great promise and potential of AI in healthcare continues, we must take a step back and evaluate more closely why this “promise and potential” has not yet converted into “realized and actual.” Developments in AI, particularly in generative AI, have provided a counterbalance in momentum to the ongoing technology recession. We are seeing steady reductions in funding; U.S. digital health funding peaked at $9.7B in 2021 and dropped to a six-year low of $2.2B in 2023. These drops come against a backdrop of global reductions in deal volumes across all industries. Despite this, AI is still growing, and general awareness of AI has exploded following the release of ChatGPT. In fact, 85% of healthcare providers have an AI strategy.
Interestingly, only about 50% have adopted at least one AI solution. This is notable given that there are over 1,500 healthcare AI vendors, half of which have been in business for more than seven years. Readiness to adopt AI is certainly on the rise, and there appears to be a wealth of AI solutions in the market. So where are they in practice? We may stand at the cusp of massive healthcare transformation, but our stay at that cusp has been prolonged given how many solutions are already available.
Quality Challenges of AI Solutions
There are, of course, many reasons for the slow adoption of AI solutions into clinical practice, spanning technology, policy, and societal factors. We grapple with integration issues that prevent seamless workflow experiences for end-users. We struggle with cultural issues, such as a lack of awareness and buy-in from clinicians. And we face burdensome regulatory requirements that slow solution developers' progress.
But what about quality? We know some solutions are simply not effective or are too risky for physician comfort. Many are marketed or deployed prematurely, without proof that they are of high quality. Further, AI solutions require a different evaluation framework; unlike traditional interventions, such as medications or procedures, they are not directly ingested or physically felt. They are digital, so they influence care indirectly, through the clinician. Yet those influences can have downstream implications of equal or greater magnitude than traditional interventions. Consider a hypothetical AI solution that triages patients in the emergency room based on predicted clinical severity. If the model is biased against a certain subgroup, and an individual within that subgroup presents with a heart attack the algorithm misses, that patient may be de-prioritized and could even die. This is a real possible outcome and consequence. A digital intervention intended only as an administrative tool to support patient volume and flow would have had significant clinical impact.
An AI solution can and likely will contribute to a negative patient outcome. We know this, so let’s be better prepared. Currently, there are a host of solutions that do not fall under FDA regulation but can still impact patients. A biased solution can prevent a human being from receiving beneficial treatment or valuable preventative care resources. These consequences might seem small in comparison to injury or death, but they still impact a human being’s life.
Four Pillars of High-Quality AI
We are at a pivotal moment where we must choose the path of Responsible AI so that innovations are delivered responsibly with quality and trust at the forefront. Here we describe four pillars of quality that innovators and healthcare providers should jointly strive to achieve for the AI solutions they create and use in practice: trustworthy AI, validation, performance and outcomes reporting, and monitoring and continuous improvement.
Trustworthy AI
Organizations such as the Coalition for Health AI (CHAI) highlight key principles for Trustworthy AI (https://www.coalitionforhealthai.org/papers/blueprint-for-trustworthy-ai_V1.0.pdf): solutions should be useful, safe, accountable, transparent, explainable, interpretable, and fair. These principles are well characterized in CHAI's Blueprint for Trustworthy AI, and it is now the responsibility of solution developers and healthcare organizations to work together to ensure they are adhered to during development. Executing on this might involve checklists or supporting documentation that confirm these goals are being addressed. Healthcare providers should proactively seek solutions that can demonstrate adherence to the standards of Trustworthy AI, and solution developers should willingly participate in this process, as doing so will likely improve the chances of success post-deployment.
Validation
Solutions should be validated not only internally but externally as well, meaning they should be tested on data sets independent of those on which they were created, ideally by a separate entity. Further, all solutions should be assessed for bias. It is highly unlikely for a solution to have no bias at all; it is realistic, however, to expect every AI solution to be assessed for bias, as this information can significantly affect how clinicians use the tool in practice. A solution may perform well in one demographic and poorly in another; clinicians who know this can comfortably apply the solution to the first cohort while avoiding it in the second.
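To make the subgroup comparison concrete, here is a minimal sketch of a per-subgroup sensitivity check for a binary classifier. The cohort names and labels are hypothetical illustrations, not data from any real deployment, and a production bias assessment would examine many more metrics than sensitivity alone.

```python
# Hypothetical sketch: compare a classifier's sensitivity across subgroups.

def sensitivity(y_true, y_pred):
    """True-positive rate: of the actual positives, how many were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else float("nan")

def subgroup_report(records):
    """records: iterable of (group, y_true, y_pred) tuples."""
    groups = {}
    for group, t, p in records:
        truths, preds = groups.setdefault(group, ([], []))
        truths.append(t)
        preds.append(p)
    return {g: sensitivity(t, p) for g, (t, p) in groups.items()}

# Illustrative data: the model catches 3/3 positives in cohort_a
# but only 1/3 in cohort_b.
records = [
    ("cohort_a", 1, 1), ("cohort_a", 1, 1), ("cohort_a", 0, 0), ("cohort_a", 1, 1),
    ("cohort_b", 1, 0), ("cohort_b", 1, 1), ("cohort_b", 0, 0), ("cohort_b", 1, 0),
]
report = subgroup_report(records)
```

A gap like this one, surfaced before deployment, is exactly the kind of information a clinician would need to decide where the tool can be trusted.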
Performance & Outcome Reporting
Results of internal and external validation, as well as bias analyses, should be clearly communicated in transparent, explainable, and standardized language so that end-users, such as clinicians, can better understand the solution. Solution developers often worry that highlighting sub-optimal performance metrics will discourage use of their solutions. This assumption is problematic for several reasons. First, concealing suboptimal metrics invites indiscriminate use of, and unrealistic expectations for, the solution. Clinicians will be unable to apply it to the appropriate clinical scenarios, so it may underperform, and they will quickly lose trust in the intervention, and possibly in AI in general. Second, clinicians are trained to think in terms of indications, contraindications, benefits, risks, and side effects; knowing the facts, all the facts, is central to clinical decision-making. Arming clinicians with information will help demystify the solution in ways that promote adoption. Third, many assume that clinicians will be overwhelmed by technical information such as training data, performance metrics, and inclusion/exclusion criteria. Yet a survey of approximately 300 physicians showed that clinicians are more likely to trust AI algorithms, and to assign value to them, when the algorithms are explainable and interpretable (Liu et al., 2022).
Monitoring & Continuous Improvement
Once a solution is created, the work truly begins. It must be deployed into real clinical workflows, and it must be monitored and assessed so that it can ultimately be refined. If a deployed solution is not monitored for algorithmic drift, performance, clinical outcomes, and bias, we will not know whether it is effective or perhaps harmful. The heterogeneity of solutions makes monitoring complex; a solution that predicts staffing needs and one that predicts 7-day mortality must be assessed on vastly different metrics. Solutions need to be held accountable for performance, so we must know which metrics to monitor (quantitative and clinical) before deployment. Further, AI solutions are trained on a snapshot of patient data, and that data will change over time as patients age, clinical practices evolve, and populations shift. Performance can therefore deteriorate if solutions do not reflect these changes. It is critical that we not only monitor clinical and quantitative performance but continuously improve solutions based on those results. We must build the tools that enable auditability, surveillance, transparency, and outcomes monitoring, and we must commit to deploying them alongside the AI solutions we develop and promote. This will optimize the chances that these solutions remain safe and clinically useful.
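As one concrete illustration of drift monitoring, the sketch below computes a population stability index (PSI) comparing a model's score distribution at training time against live scores. The data are synthetic, and the thresholds in the comments are a common rule of thumb rather than a universal standard; real monitoring pipelines would track many signals beyond a single drift statistic.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample.
    Rule of thumb (an assumption, not a standard): <0.1 stable,
    0.1-0.25 moderate drift, >0.25 significant drift worth investigating."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")  # catch out-of-range live values

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(sample)
        # small floor so empty bins do not blow up the log term
        return [max(c / n, 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]                    # training-time scores
stable   = [i / 100 for i in range(100)]                    # live scores, unchanged
shifted  = [min(0.99, i / 100 + 0.3) for i in range(100)]   # live scores drifted up
```

Here `psi(baseline, stable)` stays near zero while `psi(baseline, shifted)` comes out large, the kind of signal that should trigger a deeper review of the model and its inputs.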
A Call to Action
As we continue our journey to bring innovation to patients, we all must hold ourselves accountable for ensuring that what we deliver is of the highest quality possible. Abiding by the four pillars consisting of Trustworthy AI, validation, outcomes and performance reporting, and continuous monitoring and improvement will help us ensure success of AI solutions. Incentives of solution developers and healthcare providers are aligned: better care for the patients we serve. We must remember that these patients have livelihoods, loved ones, emotions, and at times significant hardships. As creators and consumers of innovation, we are all responsible for safeguarding the health and wellbeing of those we aim to help. If we do so, we can look forward to long-term, high-value, and sustainable change that positively impacts every member of our healthcare ecosystem.
References
Liu, Chung-Feng, et al. “Does AI explainability affect physicians' intention to use AI?” International Journal of Medical Informatics, vol. 168, 2022, 104884. doi:10.1016/j.ijmedinf.2022.104884
Yin, Jiamin, et al. “Role of Artificial Intelligence Applications in Real-Life Clinical Practice: Systematic Review.” Journal of Medical Internet Research, vol. 23, no. 4, 2021, e25759. doi:10.2196/25759