How Do You Construct a Safe, Effective Algorithm?

It’s not an easy question to answer, but with a well thought out roadmap, it’s doable.

By John Halamka, MD, President, Mayo Clinic Platform; Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform; and Sonya Makhni, MD, MS, MBA, Medical Director, Mayo Clinic Platform.

By July 30, 2023, the US Food and Drug Administration had cleared almost 700 AI-enabled medical algorithms. Within a year, that number had grown to 950, and that does not include the many algorithms that do not require FDA clearance or approval. Unfortunately, the evidence strongly suggests that many of these digital tools, both approved and not, have a credibility problem because they lack transparency, measures of reliability, and demonstrated consistency.

An analysis of over 13,000 medical apps, for instance, found 79 that could be classified as digital therapeutics. But in this subgroup, only 52 had at least one study to support their efficacy or effectiveness (66%), which leaves about a third falling short. These and similar studies raise the question: How does one develop a safe, effective AI-powered model that healthcare administrators and bedside practitioners would feel comfortable using?

One logical place to begin the journey is to identify a specific problem that needs to be solved and the functional requirements of the system to be developed. The diagram below includes this as the first step.

Sonya and Paul have been teaching a course on data mining and machine learning as part of the Mayo Clinic/Northeastern University MS program on the digital transformation of health, where we take a similar approach. The course includes a project on how to develop an algorithm, in which students are first asked to come up with a specific problem to solve.

One of our students wanted to create a model to answer the question: How can the risk of stroke be reduced for patients with high blood pressure? Another wanted to predict congestive heart failure in patients using home sleep study data. A third focused on the problem of patient no-shows in a healthcare setting, which results in wasted resources, including clinician time and clinic space, and can delay necessary care for patients.

Once you identify a specific problem and understand how the relevant model needs to function, the next step is to locate a data set upon which the algorithm can be built, depicted in the diagram as Step 2: data access and anonymization. Developers use a variety of sources to create their data sets. At Mayo Clinic Platform, we’ve created a massive data set called Mayo Clinic Platform_Connect. It’s a distributed data network program that partners with health systems, payers, medical device companies, and academic medical centers to enable better ways to diagnose, treat, and even prevent disease. In this program, clinical data are connected in a federated, secure architecture to drive innovation in healthcare.

Of course, many developers will look elsewhere for their data source. To create their hypothetical algorithms, the students in our Mayo Clinic/Northeastern University course used content from clinical trials, patient-reported surveys, date and time data from an EHR system, asset management data from a cloud-based software platform like Nuvolo, and data from the Alzheimer’s Disease Neuroimaging Initiative, to name a few. Once you choose a data source, you face the challenge of keeping it secure and private. As we have mentioned in other blogs, Mayo Clinic has a sophisticated system to de-identify patient data, which we refer to as Data Behind Glass*.
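To make the idea of de-identification concrete, here is a simplified sketch of the kinds of transformations such a pipeline performs: dropping direct identifiers, hashing record numbers, shifting dates, and generalizing extreme ages. It is purely illustrative, with invented field names, and is not the Data Behind Glass implementation.

```python
# Simplified, hypothetical sketch of record-level de-identification.
# Field names are invented for illustration; this is NOT the Data Behind
# Glass implementation.
import hashlib
import random
from datetime import datetime, timedelta

DIRECT_IDENTIFIERS = {"name", "address", "phone", "email", "ssn"}

def deidentify(record: dict, salt: str = "project-specific-salt") -> dict:
    # Drop direct identifiers outright.
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Replace the medical record number with a salted one-way hash.
    mrn = clean.pop("mrn", None)
    if mrn is not None:
        clean["patient_key"] = hashlib.sha256(f"{salt}:{mrn}".encode()).hexdigest()[:16]

    # Shift all dates by a consistent, per-patient random offset (+/- 30 days).
    # Assumes dates are stored as ISO-formatted strings.
    offset = timedelta(days=random.Random(clean.get("patient_key")).randint(-30, 30))
    for field in ("admit_date", "discharge_date"):
        if field in clean:
            clean[field] = (datetime.fromisoformat(clean[field]) + offset).date().isoformat()

    # Generalize ages over 89, in line with common Safe Harbor practice.
    if clean.get("age", 0) > 89:
        clean["age"] = "90+"
    return clean
```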

On acquiring the data needed to populate a model, Dr. Makhni pointed out in our course: “If you are working with identified patient data, you will usually need to submit a formal request for your data set through the institution with which you are affiliated. Typically, this process involves outlining the data elements you are interested in, briefly describing the question you are trying to answer, declaring whether your work relates to research or quality improvement, and answering some other questions. Then, after a period of review, you will receive access to your requested data fields by some secure method. If you are planning to work with a publicly available data set, you simply need to request access as indicated. In some cases, you may be able to query the data yourself; this task sometimes requires programming skills in a language like SQL. Regardless, you will ultimately extract your desired data set.” In addition to this process, developers may need to annotate their data, labeling it appropriately, often with the help of experts in the clinical or administrative domain.
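To make the extraction step concrete, here is a simplified sketch of the kind of query Dr. Makhni describes, using the patient no-show example from earlier. The table and column names are hypothetical and would differ in any real data warehouse.

```python
# Hypothetical cohort extraction; table and column names are invented for
# illustration and will differ in any real data warehouse.
import sqlite3

NO_SHOW_QUERY = """
SELECT a.patient_key,
       a.appointment_time,
       a.department,
       a.was_no_show,
       p.age,
       p.distance_to_clinic_km
FROM appointments AS a
JOIN patients     AS p ON p.patient_key = a.patient_key
WHERE a.appointment_time BETWEEN '2022-01-01' AND '2023-12-31';
"""

with sqlite3.connect("deidentified_warehouse.db") as conn:
    rows = conn.execute(NO_SHOW_QUERY).fetchall()
print(f"Extracted {len(rows)} appointment records")
```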

Then come the challenges involved in training the model, using a modeling technique that best fits the data and the result you are seeking. Options include a variety of statistical and machine learning techniques, including convolutional neural networks, logistic regression, random forest modeling, gradient boosting, and many others (Step 4 in our diagram).
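As an illustration of Step 4, the short sketch below compares three of those techniques on a synthetic, tabular, binary-outcome data set using scikit-learn and cross-validated AUC. It is a teaching sketch, not a recipe for any particular clinical model; in practice the feature matrix and labels would come from the extracted data set.

```python
# Sketch: comparing candidate modeling techniques with cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in tabular data; in practice X and y come from the extracted cohort.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8], random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in candidates.items():
    # Five-fold cross-validated AUC gives a first, like-for-like comparison.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.3f} (+/- {scores.std():.3f})")
```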

At Mayo Clinic Platform, once an algorithm has been trained, we put it through a comprehensive qualification process that encompasses regulatory compliance as well as technological, clinical, and algorithmic performance. This ensures clear and transparent communication of each solution's maturity, efficacy, safety, and bias mitigation to potential end users.

Step 5, which we call algorithmic audit, requires experts to test and improve the model’s performance, followed by internal and external validation. At Mayo Clinic Platform, we have a digital tool called Mayo Clinic Platform_Validate to handle statistical validation. A Validate report addresses both performance and possible bias: it includes sensitivity, specificity, positive and negative predictive values, and area under the curve (AUC) metrics. Validate enables developers to test any algorithm against extensive data sets from more than 10 million patients. It can also assess the model's fit for purpose against multisite data from urban and rural communities across Mayo Clinic sites in Minnesota, Florida, and Arizona, among others. The bias report examines age, ethnicity, and a variety of other demographic features of the data set.
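For readers who want to see what those statistics look like in code, here is a simplified sketch that computes the same performance metrics overall and within each demographic subgroup. It illustrates the underlying statistics only; it is not the Validate tool itself.

```python
# Sketch of the statistics a validation/bias report typically contains.
# Illustrative only; this is not Mayo Clinic Platform_Validate.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def performance_summary(y_true, y_prob, threshold=0.5):
    """Headline metrics: sensitivity, specificity, PPV, NPV, and AUC."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "auc": roc_auc_score(y_true, y_prob),
    }

def bias_report(y_true, y_prob, groups):
    """Recompute the same metrics within each demographic subgroup
    (assumes every subgroup contains both outcome classes)."""
    y_true, y_prob, groups = map(np.asarray, (y_true, y_prob, groups))
    return {
        g: performance_summary(y_true[groups == g], y_prob[groups == g])
        for g in np.unique(groups)
    }
```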

Most model developers realize the need for multi-site validation as well. They may initially test their model against patient data within their hospital system, for instance, and then find an independent facility to test it against. The key is to demonstrate that the model is generalizable. 

The Coalition for Health AI (CHAI) is also involved in ensuring that healthcare models are properly validated by issuing model cards, comparable to an ingredient list or nutrition label, that can help standardize the output of the assurance labs that test these algorithms. The model cards include an algorithm’s model type, key performance metrics, security credentials, and much more.
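As a rough illustration of the idea, the sketch below defines a bare-bones model card with the kinds of fields the article mentions. The structure and values are hypothetical; CHAI defines the actual format its model cards use.

```python
# Hypothetical model card structure; fields and values are placeholders for
# illustration and do not reproduce the CHAI format.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    model_type: str                       # e.g., "gradient boosting classifier"
    intended_use: str
    training_data_summary: str
    performance_metrics: dict = field(default_factory=dict)   # AUC, sensitivity, ...
    subgroup_performance: dict = field(default_factory=dict)  # metrics by age, ethnicity, ...
    security_credentials: list = field(default_factory=list)  # e.g., attestations held
    known_limitations: str = ""

# Placeholder values for illustration only.
card = ModelCard(
    name="no_show_risk_v1",
    model_type="gradient boosting classifier",
    intended_use="Flag appointments at high risk of no-show for outreach",
    training_data_summary="De-identified outpatient appointments, single health system",
    performance_metrics={"auc": 0.81, "sensitivity": 0.74, "specificity": 0.77},
)
```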

Once a model has met appropriate validation criteria, the next difficult question to address is: Does it require FDA clearance or approval as software as a medical device (SaMD), or can it be commercialized without it? That decision will require the expertise of specialists in healthcare law.

Once this hurdle is behind you, clinical integration is next (Step 8). As we have discussed in an earlier article, “deployment is king.” Even the most accurate algorithm is useless if it can’t be seamlessly implemented into a hospital’s workflow. At Mayo Clinic Platform, we have developed a digital tool, appropriately named Mayo Clinic Platform_Deploy, to address this issue. It bridges the gap between solution developers and healthcare providers by evaluating the model for intended use, proposed value, and clinical and algorithmic performance, and by speeding up a model’s integration into the provider’s workflow with pre-built hooks into their EHR.
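To give a flavor of what an integration point can look like, here is a minimal sketch of a scoring service that an EHR hook might call, continuing the no-show example. The endpoint, field names, model file, and threshold are invented for illustration and do not describe how Deploy itself works.

```python
# Minimal, hypothetical scoring service an EHR integration might call.
# Endpoint, fields, and threshold are illustrative only.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("no_show_model.joblib")   # the classifier trained in Step 4

@app.post("/predict/no-show-risk")
def predict_no_show_risk():
    features = request.get_json()             # e.g. {"age": 54, "distance_to_clinic_km": 12}
    score = float(model.predict_proba(
        [[features["age"], features["distance_to_clinic_km"]]]
    )[0][1])
    return jsonify({
        "risk_score": round(score, 3),
        "suggested_action": "offer transport or telehealth" if score > 0.6 else "none",
    })

if __name__ == "__main__":
    app.run(port=8080)
```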

The last two steps in the “Wheel of AI” are end-user engagement and ongoing monitoring. The most impressive algorithm will remain an idle button on the EHR dashboard if clinicians don’t believe it will improve patient care and lighten their workload. And any algorithm that has been successfully deployed and gained traction has to be monitored over time to make sure it really is doing what its developers said it would do.
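As a simple illustration of what ongoing monitoring can involve, the sketch below recomputes a model’s AUC on each month’s cases and raises an alert when performance drifts from the validation baseline. The baseline and alert threshold are illustrative only.

```python
# Sketch: post-deployment monitoring that recomputes AUC on each month's
# cases and flags degradation; the numbers here are illustrative.
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.81          # AUC measured during validation (placeholder)
ALERT_DROP = 0.05            # alert if performance falls more than this

def monthly_performance_check(y_true, y_prob, month):
    """Compare this month's AUC against the validation baseline."""
    auc = roc_auc_score(y_true, y_prob)
    if auc < BASELINE_AUC - ALERT_DROP:
        print(f"[{month}] ALERT: AUC {auc:.3f} has drifted from baseline {BASELINE_AUC:.3f}")
    else:
        print(f"[{month}] OK: AUC {auc:.3f}")
    return auc
```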

The road to launching a safe, effective, unbiased healthcare algorithm may seem daunting, but anyone who has successfully made the journey can testify that it was worth the effort.

*trademark pending

