Changing the Future of Health Care with the Right AI Validating Tools

Concerns about algorithmic bias and poor performance of the models have made many stakeholders more cautious about using these digital tools in patient care. That’s about to change.

By John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform

Initial reports about the potential of machine learning to transform health care were met with mixed reactions by clinicians. Many experienced clinicians expressed doubts about its application “in the trenches,” while early adopters were eager to put AI-fueled algorithms to immediate use in patient care. In the years since, critics and enthusiasts have both taken a more nuanced position. As reports of poor accuracy and possible bias surfaced, it became apparent that we need to take a more critical look at these digital tools. Our recent critique in BMJ Health & Care Informatics, for instance, discusses how some algorithms have built-in biases against Black patients, women and patients of a lower socioeconomic background.

Similarly, Johns Hopkins University investigators have found that health care inequalities are a significant problem in the algorithms used to predict 30-day hospital readmissions. The research team led by Suchi Saria examined four models designed to help clinicians identify patients most likely to be readmitted: LACE, HOSPITAL, Johns Hopkins ACG, and HATRIX. As they point out, these tools are currently being used “to direct care to high-readmission-risk patients, standardize readmissions-based quality metrics across hospitals, and forecast all-cause and condition-specific readmissions.” Unfortunately, their analysis revealed that LACE and HOSPITAL have the greatest potential to introduce bias and the Johns Hopkins model generated the most uncertainty; HATRIX, on the other hand, had the fewest problems. To arrive at these conclusions, the researchers developed an 11-question evaluation checklist that included questions such as:

  • Were the parameters being evaluated by the readmission model realistic proxies for patients’ health care outcomes or needs? 
  • Were any important features left out of each model?
  • Were validation studies conducted to evaluate differences among various subpopulations?

For example, suppose one of the metrics used by an algorithm predicts a patient’s likelihood of being readmitted to a hospital based on their health care utilization. The reasoning here seems sensible: if insurance statistics show a patient has used an inordinate amount of medical resources in the past, chances are they will find their way back into the hospital sooner rather than later. That logic is faulty, however, when applied to patients of color or those in lower socioeconomic groups because they are far less likely to have access to health care resources. Less access translates into less utilization, which can lead the algorithm to underestimate their risk.
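The proxy problem described above can be made concrete with a toy calculation. The sketch below is purely illustrative: the scoring function, the access rates and all numbers are invented for this example and do not come from LACE, HOSPITAL or any other real model. It assumes two patients with the same underlying need for care, one of whom can reach only half of the care they need.

```python
# Toy illustration only: every number and function here is hypothetical.

def observed_visits(needed_visits, access_rate):
    """Visits that appear in claims data, given the share of needed care a patient can reach."""
    return round(needed_visits * access_rate)

def utilization_risk_score(prior_visits, max_visits=10):
    """Hypothetical score that treats past utilization as a proxy for readmission risk."""
    return min(prior_visits, max_visits) / max_visits

# Two patients with identical underlying need (eight visits' worth of care).
needed = 8
high_access_visits = observed_visits(needed, access_rate=1.0)
low_access_visits = observed_visits(needed, access_rate=0.5)

score_high = utilization_risk_score(high_access_visits)
score_low = utilization_risk_score(low_access_visits)

print(score_high, score_low)  # 0.8 0.4
```

Although both patients are equally sick in this toy setup, the one with less access to care receives half the risk score, simply because less of their need shows up as utilization.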

With such concerns about algorithm validation and bias in mind, we developed Mayo Clinic Platform_Validate, a digital product that helps measure model sensitivity, specificity, area under the curve (AUC) and bias, which in turn enables the system to provide a breakdown of racial, gender and socioeconomic disparities in the delivery of care. Using the tool lends credibility to models, accelerates adoption into clinical practice and enables developers to meet regulatory requirements for approval more readily. It provides users with a series of descriptive statistics on model performance, along with data demonstrating that the model was run against data for each demographic.
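To give a sense of what a per-demographic breakdown of sensitivity, specificity and AUC can look like, here is a minimal sketch in plain Python. It is not how Validate is implemented; the function names, the 0.5 decision threshold, the group labels "A" and "B" and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_by_group(records, threshold=0.5):
    """records: iterable of (group, true_label, model_score) tuples."""
    grouped = defaultdict(list)
    for group, label, score in records:
        grouped[group].append((label, score))
    report = {}
    for group, rows in grouped.items():
        y_true = [label for label, _ in rows]
        scores = [score for _, score in rows]
        y_pred = [1 if s >= threshold else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        report[group] = {
            "n": len(rows),
            "sensitivity": sens,
            "specificity": spec,
            "auc": auc(y_true, scores),
        }
    return report

# Invented toy data: (hypothetical demographic group, true label, model score).
records = [
    ("A", 1, 0.9), ("A", 1, 0.7), ("A", 0, 0.2), ("A", 0, 0.4),
    ("B", 1, 0.6), ("B", 1, 0.3), ("B", 0, 0.1), ("B", 0, 0.5),
]
report = metrics_by_group(records)
for group, stats in sorted(report.items()):
    print(group, stats)
```

On this toy data the model separates cases perfectly in group A but reaches only 0.5 sensitivity, 0.5 specificity and 0.75 AUC in group B. A gap of that kind between subgroups is what a bias evaluation is meant to surface.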

To illustrate Mayo Clinic Platform_Validate’s performance, imagine that a developer wants to bring a clinical solution to market that uses binary classification to predict whether a pancreatic mass will be malignant or benign before surgery is performed. Inputs fed into the algorithm might include all historical patient data, including demographics, prior diagnoses, a history of previous cancers and a family history of cancer, along with any other risk factors that have been associated with pancreatic cancer. Validate would provide the sorely needed testability that has been missing from so many commercially available products. In the case of the pancreatic mass algorithm, for example, bias and performance metrics will help clinicians determine if an algorithm’s application in clinical practice is appropriate. Validate enables health care stakeholders to test an AI model against an extensive data set and evaluate the reasonableness and usefulness of the result. In addition to its ability to evaluate and certify the quality and accuracy of an AI model, Validate protects the intellectual property of the model and its data, using state-of-the-art privacy protection protocols. It analyzes performance metrics and can also perform a bias evaluation.

Combining the resources available from the Mayo Clinic Platform_Validate tool with the insights generated by the Coalition for Health AI, which convenes a group of experts from academia, industry and government to develop standards to address equity and fairness, will be a game changer for the health care industry.
