The Algorithmically Underserved Need Our Attention
Three patient populations are underrepresented in the AI models currently being developed. There are several practical solutions to the problem.
By John Halamka, M.D., president, Mayo Clinic Platform; Nigam Shah, M.B.B.S., Chief Data Scientist, Stanford Healthcare; Suchi Saria, Ph.D., Johns Hopkins University School of Medicine; and Paul Cerrato, M.A., senior research analyst and communications specialist, Mayo Clinic Platform.
There are many underserved populations in the health care ecosystem, groups who for one reason or another have been ignored or marginalized by providers: persons of color come to mind, as do women, LGBTQIA+ patients, and those in lower socioeconomic groups. One group that is rarely included in the list, however, is the algorithmically underserved. These are patients who fall between the cracks when developers and data scientists construct the digital models intended to provide accurate, unbiased medical care. They fall into three primary “buckets.”
First, there are those whose data are simply not available in electronic form. Obviously, we can’t build an AI-enhanced algorithm for patients whose health care data have never been placed in an electronic health record (EHR) system. They may include patients who do not have a primary care provider and are not seeing any clinician who uses an EHR; isolated populations whose providers still rely on paper files to conduct business; and immigrants who don’t want their identity flagged by federal authorities.
The second underserved population includes those whose data are electronic but whose sample sizes are too small to create well-performing algorithms. Certain Native American populations come to mind, as well as groups on isolated Pacific Islands. In situations like these, there aren’t enough records to develop a representative model. We have the inputs, but the data set is too small for the math to work.
Finally, there are patients whose data are electronic and for whom a good model can be learned, but the resulting algorithms are not being used because the model does not perform equally well for other categories of patients. For example, a study published in JAMA found that an algorithm performed poorly for overweight patients, significantly overestimating the risk of atherosclerotic cardiovascular disease in individuals with higher body mass index.
In our quest to develop fair and equitable algorithms, we delay the benefits for a large majority, creating iatrogenic under-servedness. Doing so would be like stopping the use of the Pooled Cohort Equation for 260 million people because it does not work well for 40 million Americans. We don’t do that. Instead, we create guardrails that notify the user about the likely errors in certain populations. The current mindset is a classic case of sacrificing the good in pursuit of the perfect. It also illustrates the resistance to change seen among many stakeholders in the health care community.
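To make the guardrail idea concrete, here is a minimal sketch in Python of what "notify the user" could look like: the risk score is still returned for everyone, but a caution is attached when the patient belongs to a group where the model is known to err. The function name, the `RiskResult` type, and the BMI cutoff of 30 are all illustrative assumptions, not part of any published tool or guideline.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff chosen for illustration only; a real guardrail would
# take its threshold from validation evidence for the specific model.
BMI_CAUTION_THRESHOLD = 30.0


@dataclass
class RiskResult:
    risk: float              # the model's risk estimate, passed through unchanged
    caution: Optional[str]   # a note for the clinician, or None if no flag applies


def with_guardrail(risk: float, bmi: float) -> RiskResult:
    """Attach a caution instead of withholding the score (illustrative sketch)."""
    if bmi >= BMI_CAUTION_THRESHOLD:
        return RiskResult(
            risk,
            "Risk may be overestimated for patients with higher BMI; "
            "interpret with care.",
        )
    return RiskResult(risk, None)
```

The design choice mirrors the argument above: the majority keeps the benefit of the score, and the subgroup at risk of error gets an explicit warning rather than no answer at all.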
Finding solutions for these three categories will not be easy, but it is achievable. For example, we can pull data from non-traditional sources, including information about the social determinants of health from consumer sources such as smartphones, fitness devices, and social media sites. To make this doable, the health care ecosystem will require more robust interoperability, and providers will have to make more of an effort to adhere to the information blocking rules that were recently put into effect by the federal government.
At the most basic level, interoperability requires providers, insurers, and other related entities to create the connections needed for one system or one app to talk to and share data with other entities. Since October 2022, providers, developers, and health information exchanges have been required to offer all the electronic health data in whatever form it exists, as long as it is in a computable or machine-readable format. By the end of 2023, these organizations must also provide a publicly available export format so that patients can make sense of the information.
Another way to undo the algorithmic bias we speak of is to take advantage of distributed data networks (DDNs), which will enable us to achieve more breadth, i.e., more patients, and better spread, namely different kinds of patients. This is accomplished through a DDN’s unique architecture. Typically, it begins with a massive collection of patient data, which is then de-identified with trustworthy algorithms. Once the de-identified data set is available, it can be shared with trusted partners and used to develop AI-enabled applications that meet the needs of underserved patient populations.
Finally, to address the problem of algorithmic under-servedness, developers will have to start testing their models’ performance in better ways and set up guardrails to keep them on track. As we have mentioned in other publications, the Coalition for Health AI (CHAI) has been launched with this purpose in mind. CHAI’s aims for 2023 include:
- Development of Assurance Standards for health AI tools
- Development of a Technical Implementation Guide for health AI tools
- Creation of a sandbox platform and the blueprint for an AI Assurance, Discovery & Evaluation Lab
- Creation of a strawman for a portal displaying community-driven transparency information
- Development of a first version of a maturity model that assesses health systems developing and using AI/ML tools
- Development of a business plan for an AI Assurance, Discovery & Evaluation Lab
Each of these initiatives will benefit the algorithmically underserved.
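As one illustration of what testing a model's performance "in better ways" might involve, the sketch below compares the average predicted risk with the observed event rate separately for each patient subgroup. A large gap in one subgroup, such as the overestimation described earlier for patients with higher body mass index, would show up here. This is a simplified example under our own assumptions (the function name and the record layout are hypothetical), not a description of CHAI's methods.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Summarize predicted vs. observed risk per subgroup.

    `records` is an iterable of (group, predicted_risk, outcome) tuples,
    where outcome is 1 if the event occurred and 0 otherwise.
    """
    # For each group, accumulate: sum of predictions, event count, sample count.
    totals = defaultdict(lambda: [0.0, 0, 0])
    for group, predicted, outcome in records:
        entry = totals[group]
        entry[0] += predicted
        entry[1] += outcome
        entry[2] += 1

    return {
        group: {
            "n": n,
            "mean_predicted": pred_sum / n,
            "observed_rate": events / n,
        }
        for group, (pred_sum, events, n) in totals.items()
    }
```

Reporting `n` alongside the two rates matters for the second bucket described above: when a subgroup's sample is very small, the comparison itself is unreliable and should be read with caution.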
(Dr. Halamka will address some of those challenges and the path forward at the HLTH conference on Wednesday, November 16, 2022.)