Weighing the Value of an AI Algorithm

Data scientists use a variety of coding languages to create AI-driven models, but the real “secret sauce” that helps them identify the best algorithms is the set of weights the code generates.

By John Halamka, M.D., President, Mayo Clinic Platform and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform.

Predictive algorithms are having a major impact on healthcare, helping clinicians and administrators make key decisions in patient care. There are now algorithms to help identify the people most likely to develop atrial fibrillation or prediabetes. There are also models to help determine the 10-year risk of developing colorectal cancer among those without a history of the disease, inflammatory bowel disease, or precancerous polyps. Many of these algorithms rely on convolutional neural networks, gradient boosting, random forests, and a variety of other modeling techniques.

For these models to yield safe, effective results, they must start with a large data set that’s representative of the patient population they hope to serve. Then they need to be trained, using the appropriate statistical and clinical standards we’ve covered in previous columns. During this training period, a collection of risk factors is evaluated to determine whether each one does in fact increase the threat of the disease under discussion. For example, if you wanted to predict the likelihood of a person developing Type 2 diabetes, the risk factors would include body weight, family history of the disease, waist-to-hip ratio, and a long list of other features not typically measured in old-school risk assessment systems. Each of these features or risk factors is given a score based on how much it contributes to the final model outcome, namely overt diabetes. In data science parlance, that score is the feature’s weight.
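To make the idea of a weight concrete, here is a minimal Python sketch of how a model might combine weighted risk factors into a single risk estimate. It assumes a simple logistic model, and the feature names, weights, and baseline value are all hypothetical, invented purely for illustration; they are not taken from any validated diabetes model.

```python
import math

# Hypothetical weights for illustration only; not derived from any study.
WEIGHTS = {
    "bmi_over_30": 0.9,
    "family_history": 0.7,
    "high_waist_to_hip_ratio": 0.5,
}
BIAS = -2.0  # hypothetical baseline log-odds for a person with no risk factors


def diabetes_risk(features):
    """Turn 0/1 risk-factor flags into a probability between 0 and 1."""
    # Weighted sum: each risk factor that is present adds its weight to the score.
    score = BIAS + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    # The logistic function squeezes the score into the 0-to-1 range.
    return 1.0 / (1.0 + math.exp(-score))


low = diabetes_risk({})
high = diabetes_risk(
    {"bmi_over_30": 1, "family_history": 1, "high_waist_to_hip_ratio": 1}
)
print(f"no risk factors: {low:.2f}, all three present: {high:.2f}")
```

The larger a feature’s weight, the more its presence moves the final estimate. In a real model, training determines those weights from patient data rather than a person choosing them by hand.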

In the graphic below, the neural network was designed to help clinicians screen for melanoma. The input for the model might include tens of thousands of photos of skin lesions. The algorithm analyzes the photos, looking for features that differentiate skin cancer from a normal mole. These features would likely include the irregular shape of a melanoma, color variations, and bleeding. Based on its association with melanoma, each of these features would be given a numerical value, its weight, to indicate how strongly it is linked with the cancer. Positive weights are assigned to features associated with the cancer, while negative weights are assigned to features that are linked to a normal mole.

(Source: Cerrato, P, Halamka J. Redefining the Boundaries of Medicine. 2023, Mayo Clinic Press.)
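How do positive and negative weights emerge during training? The Python sketch below offers one simplified answer. It is not the convolutional network described above: it assumes each photo has already been reduced to two hand-labeled yes/no features, and it trains a single artificial neuron (in effect, logistic regression) by gradient descent. The 16 data points, the feature names, and the training settings are invented for illustration.

```python
import math

# Toy data, invented for illustration.
# Each tuple is (irregular_border, uniform_color, is_melanoma), all 0 or 1.
DATA = (
    [(1, 0, 1)] * 3 + [(1, 0, 0)] * 1    # irregular border: mostly melanoma
    + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 3  # uniform color: mostly benign
    + [(1, 1, 1)] * 2 + [(1, 1, 0)] * 2  # both features: evenly split
    + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 2  # neither feature: evenly split
)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, learning_rate=0.5, epochs=2000):
    """Fit one neuron with batch gradient descent; return its weights and bias."""
    w_border, w_color, bias = 0.0, 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        g_border = g_color = g_bias = 0.0
        for border, color, label in data:
            # Prediction error for this example under the current weights.
            error = sigmoid(bias + w_border * border + w_color * color) - label
            g_border += error * border
            g_color += error * color
            g_bias += error
        # Nudge each weight in the direction that reduces the average error.
        w_border -= learning_rate * g_border / n
        w_color -= learning_rate * g_color / n
        bias -= learning_rate * g_bias / n
    return w_border, w_color, bias


w_border, w_color, bias = train(DATA)
print(f"irregular border weight: {w_border:+.2f}")
print(f"uniform color weight:    {w_color:+.2f}")
```

On this toy data, the feature that appears more often with melanoma ends up with a positive weight, and the feature that appears more often with benign moles ends up with a negative one. A production network learns its features directly from pixels and has many more weights, but the sign convention works the same way.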

Clinicians can get access to many other predictive algorithms to help them advise patients on the best treatment options. For instance, at Mayo Clinic, we have calculators to help evaluate various types of cardiovascular disease. One calculator helps clinicians differentiate wide QRS complex tachycardias (WCTs) in adults into two categories: ventricular tachycardia (VT) or supraventricular wide complex tachycardia (SWCT). It does this by calculating a VT probability estimate, which can then be integrated with other clinical information to reach an accurate VT or SWCT diagnosis. Another calculator predicts the likelihood of patients with cirrhosis dying after surgery. Each algorithm requires the clinician to enter certain risk factors, that is, variables that have been derived from clinical trials. These variables serve as weighted features, similar to the features or “signposts” for melanoma.

Similarly, Cleveland Clinic has a library of risk calculators that includes one that predicts the likelihood of a patient with pancreatic cancer surviving for at least a year after undergoing surgery for an adenocarcinoma. The calculator uses a statistical model called a predictive nomogram, which was derived from a study of more than 500 pancreatic cancer surgeries. Using this digital tool, physicians can predict with some degree of confidence that a patient who had a well-differentiated tumor in the pancreatic head and no margins or malignant nodes has a 50% chance of surviving at least three years.

A closer look at the data from that study reveals a list of risk factors and their weights. The nomogram assigned a weight of -0.20496646 to a well-differentiated tumor, because well-differentiated tumors generally have a better prognosis than poorly differentiated ones. On the other hand, splenectomy was weighted at +0.90746165. That’s because a patient who has had a splenectomy will usually fare poorly: the procedure is often performed as part of a more extensive operation, indicating more advanced disease, which is associated with poorer outcomes. When the weights for all 25 features measured during the study are summed, they generate a total score that can help patients and physicians make informed choices going forward.
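The summing step can be sketched in a few lines of Python. The example uses only the two weights quoted above and omits the other 23 features, so its totals are partial scores for illustration only. The feature names and the `total_score` helper are hypothetical, and the conversion from a total score to a survival probability is defined by the nomogram itself and is not reproduced here.

```python
# The two weights quoted in the text; the remaining 23 features are omitted.
WEIGHTS = {
    "well_differentiated_tumor": -0.20496646,
    "splenectomy": 0.90746165,
}


def total_score(patient):
    """Sum the weights of every listed feature the patient has."""
    return sum(weight for name, weight in WEIGHTS.items() if patient.get(name))


favorable = total_score({"well_differentiated_tumor": True})
unfavorable = total_score({"splenectomy": True})
mixed = total_score({"well_differentiated_tumor": True, "splenectomy": True})

print(f"well-differentiated tumor only: {favorable:+.8f}")
print(f"splenectomy only:               {unfavorable:+.8f}")
print(f"both features:                  {mixed:+.8f}")
```

With these two weights, a higher total points to a poorer prognosis. The negative weight for a well-differentiated tumor pulls the score down, while the positive weight for splenectomy pushes it up.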

Deciding whether to use an AI-enabled algorithm in patient care will never be an exact science. But statistical weighting, combined with the expertise of an experienced clinician, can improve patients’ odds of surviving their ordeal.

