Exploring the AI/Machine Learning Toolkit–Part 2
In the second installment of the series, we explain how gradient boosting can help make patient care more precise and cost-effective.
By Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., president, Mayo Clinic Platform
As algorithm developers look for more accurate ways to create digital tools that can predict disease severity and mortality, many are choosing gradient boosting. As we explain in The Digital Reconstruction of Healthcare, boosting is the operative word in this machine learning (ML) approach because it starts with a weak predictive model and continues to improve its ability to make more accurate predictions by boosting its strength. If, for example, an algorithm is designed to detect skin cancer by analyzing the pixels in a high-resolution image, the goal of gradient boosting is to refine its predictive ability through a series of iterations. If the model only used one feature of a melanoma to determine if a skin lesion is a melanoma — its irregular shape, for instance — the model would be a weak classifier. Boosting can improve its ability to classify the image as a skin cancer by taking into account many other features — e.g., its coloration, bleeding, and so on.
This approach is a type of ensemble learning that uses sequential boosting, in which each iteration looks for errors in the algorithm’s ability to detect the cancer and distinguish it from a normal mole. To better understand gradient boosting, it helps to contrast it with forms of ensemble modeling that do not take a sequential approach, such as random forest modeling (RFM). RFM uses a “bagging” (bootstrap aggregating) approach: it draws data points at random from the original dataset to generate separate samples in parallel. Each sample (for example, a collection of skin lesion images) is analyzed separately and used to train its own model. Then all the models are combined, using majority voting or averaging, to arrive at a more reliable prediction. (This video provides an example of how RFM was used to help detect subgroups of patients with diabetes most likely to benefit from a lifestyle modification program.)
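The bagging idea can be sketched in a few lines of Python. This is a minimal illustration, not a random forest: it uses one-feature threshold rules ("stumps") instead of full decision trees, and the "irregularity scores" and labels are invented purely for the example. The function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Return (threshold, label_if_above) for the rule with the fewest errors."""
    best = None
    for t in sorted(set(xs)):
        for above in (0, 1):
            errors = sum((above if x >= t else 1 - above) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    return best[1], best[2]

def stump_predict(stump, x):
    t, above = stump
    return above if x >= t else 1 - above

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def vote(models, x):
    """Combine the independent models by simple majority vote."""
    return Counter(stump_predict(m, x) for m in models).most_common(1)[0][0]

# Hypothetical data: a lesion "irregularity score" and a label (1 = melanoma).
xs = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70, 0.80, 0.90]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

models = bagged_stumps(xs, ys)
low_score_vote = vote(models, 0.05)
high_score_vote = vote(models, 0.95)
```

Each model here is trained without any knowledge of the others, which is the key difference from the sequential approach described next.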
A sequential approach such as boosting, on the other hand, starts by extracting data from the original dataset and using the resulting random sample to generate an initial model. Typically, this model will contain numerous errors. Those errors are used to generate a second set of weighted samples to create a second model that has learned from the observations derived from the first model. The second iteration can then be used to create a third set of weighted samples that is then fed into the next model. Finally, all the models are used to do weighted voting. This weighted voting process results in a more accurate predictive tool.
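One well-known instance of this reweight-and-vote loop is adaptive boosting (AdaBoost). The sketch below assumes the standard AdaBoost weight-update formula and again uses simple threshold rules as the weak models. The data are a toy example chosen so that no single threshold rule can classify more than 7 of the 10 points correctly; the function names are hypothetical.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Pick the threshold rule with the lowest weighted error (labels are -1/+1)."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                      # start with equal sample weights
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this model's voting weight
        ensemble.append((alpha, t, sign))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * y * (sign if x >= t else -sign))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote across all models in the sequence."""
    score = sum(alpha * (sign if x >= t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
```

On this toy set, three weak rules combined by weighted voting classify every training point correctly, even though each rule on its own makes several mistakes.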
Traditional boosting techniques, such as adaptive boosting (AdaBoost), use the errors of the current model to assign weights to the training samples for the next iteration, thus putting more emphasis on the cases the model got wrong. Gradient boosting, on the other hand, draws inspiration from gradient descent optimization: each new model is trained to fit the negative gradient of the loss function (the measure of prediction error) with respect to the current predictions. Hence, the different weak models form a path in the space of models, each step in an optimized direction toward a more accurate model.
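To make the gradient-descent analogy concrete, here is a minimal sketch that assumes a squared-error loss, for which the negative gradient is simply the residual (observed value minus current prediction). Each round fits a one-split regression rule to those residuals and adds a small fraction of it to the running prediction. The data, the learning rate, and the function names are illustrative assumptions, not taken from any study cited here.

```python
def fit_regression_stump(xs, targets):
    """Find the single split on x that minimizes squared error on the targets."""
    best = None
    for t in sorted(set(xs))[1:]:          # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, targets) if x < t]
        right = [r for x, r in zip(xs, targets) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)               # initial weak model: predict the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - p)**2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_regression_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x < t else rm)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def gb_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x < t else rm for t, lm, rm in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical one-feature regression data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]

model = gradient_boost(xs, ys)
base_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
final_mse = mse(ys, [gb_predict(model, x) for x in xs])
```

With other loss functions (for example, log loss for classification) the residual is replaced by the corresponding negative gradient, but the loop has the same shape.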
A growing number of data scientists and software developers are favoring gradient boosting (in particular, a state-of-the-art version called XGBoost) over other ML techniques because it tends to produce more accurate predictions. Gradient boosting (GB) is also favored for its ability to handle redundant features without suffering from significant overfitting, and for its handling of missing data, even without imputation. Zhang et al. have compared gradient boosting to artificial neural networks, logistic regression, RFM, and other modeling techniques to estimate the risk of developing type 2 diabetes among more than 14,000 men and 22,000 women in rural China. They found that all the modeling techniques had strong predictive performance, but GB performed the best with an area under the curve (AUC) of 0.872 when lab data was included in the analysis and 0.817 when it was excluded. The top 10 variables that suggested the likelihood of developing the disease were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension for all the techniques evaluated. However, while these 10 predictors were captured by all the models, the relative strengths of each variable differed depending on the technique. For gradient boosting, the “winner” in this analysis, the ranking of important features positioned sweet flavor, age, urine glucose, heart rate, and creatinine levels as the strongest predictors. There were also differences in AUC, sensitivity, specificity, and positive predictive values among the various modeling approaches.
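For readers less familiar with the AUC figures quoted above: the area under the ROC curve can be read as the probability that a randomly chosen patient who developed the disease received a higher risk score than a randomly chosen patient who did not. The sketch below computes it directly from that definition; the labels and scores are invented for illustration and have no connection to the studies discussed.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = developed the disease) and model risk scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]

example_auc = auc(labels, scores)
```

Here 6 of the 9 positive-negative pairs are ranked correctly, giving an AUC of about 0.667. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.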
Investigators at Oklahoma State University have also tested the relative merits of various modeling techniques in health care and likewise found gradient boosting offered an advantage. Extracting patient records from Cerner’s electronic health record (EHR) database, they studied three prediction models to determine their ability to predict the severity of inflammation among patients with Crohn’s disease. Gradient boosting was very accurate (AUC = 0.928). Regularized regression had an AUC of 0.827 and logistic regression of 0.8274.
Gradient boosting has also proven useful in sifting through data on patients with prediabetes to estimate which ones will most likely develop full-blown diabetes. It is difficult to convince many prediabetics to take their condition seriously and make the necessary lifestyle changes to prevent diabetes for two reasons: First, they are usually asymptomatic so have no strong incentive to take action. Second, only a small percentage of prediabetics actually convert: 5 percent to 10 percent each year. In practical terms, that means convincing apparently healthy persons to make significant changes in their diet and physical activity levels when as many as 9 out of 10 will not develop the disease in any given year. As we discussed elsewhere in the aforementioned book, using an AI-enabled assessment tool that takes advantage of gradient boosting can factor in many risk factors not assessed in traditional risk scoring systems by gleaning data from a patient’s EHR. More importantly, this type of assessment system can accurately predict which patients will progress to full-blown diabetes with an AUC of 0.865. The results of this kind of ML-fueled assessment will hopefully carry more weight among prediabetics who are resistant to preventive measures.
The figure below provides a graphic to help explain how gradient boosting can be used in clinical medicine.