Discovering Hidden Treasure Within EHR Data
One of the privileges we enjoy working in digital health is the opportunity to use state-of-the-art AI tools to unearth actionable insights from millions of patient records, without compromising patient privacy.
By John Halamka, M.D., president, Mayo Clinic Platform, and Paul Cerrato, senior research analyst and communications specialist, Mayo Clinic Platform
Discovering hidden patterns is one of the things we humans are good at. You experience several months of unexplained GI discomfort, decide to keep a food diary, and find that the symptoms come on shortly after you eat dairy products: it looks like lactose intolerance. Physicians notice an unusual cluster of signs and symptoms among gay men in Los Angeles (an outbreak of Pneumocystis pneumonia), investigate further, and discover the beginning of the AIDS epidemic in the United States. Our ability to see such patterns, which lies at the heart of epidemiology, has saved countless lives. But despite our analytical skills, the human brain has its limitations and cannot detect many of the subtle relationships among risk factors, behaviors, and disease, which is why data science and AI are having such a profound impact on health care. And while we have called attention to the shortcomings of AI-enabled algorithms in recent blog posts, that doesn’t mean these tools aren’t moving us forward.
Examples abound: the EAGLE study used machine learning to help improve the diagnosis of ventricular systolic dysfunction; the diagnosis of brain cancer in the operating room has benefited from deep neural networks; and Medial EarlySign’s research relied on AI-enabled algorithms to help identify patients at high risk of colorectal cancer. But none of these digital tools would have generated clinically useful insights without access to reliable, representative, de-identified data sets. With this reality in mind, Mayo Clinic Platform has developed several products to help developers and providers move into the future. The Mayo Clinic Platform_Discover product, for instance, is a portal that offers access to decades of de-identified Mayo Clinic patient data, which can be mined with sophisticated tools to advance and improve patient care.
"Our data includes the longitudinal records of nearly 10 million patients," says Emily Wampfler, senior director at Mayo Clinic Platform and product manager for Discover. "For example, Discover includes 1.8 million echocardiograms, the largest collection in the world; 520 million clinical notes collected over 40 years; and 1.1 billion lab test results."
This makes it one of the richest clinical data sets in the world. The data are de-identified and curated in partnership with the data analytics firm nference, using its automated de-identification and augmented curation technologies.
Using Discover, researchers will be able to explore data from a trusted source to:
- Build new or better artificial intelligence (AI) algorithms and increase the speed of algorithm development
- Create analytics tools to accelerate innovative scientific and clinical research to improve patient care
- Decrease the cost of refining algorithms through a single access point to a variety of data types
Mayo Clinic Platform will provide access to data, infrastructure, tools and services to support development of statistical or machine learning models to achieve research and business objectives. Each customer's work environment is monitored and audited to ensure there is no misuse of the de-identified, aggregated data.
"Protecting patient data is an essential component of Discover. We have established a multi-layer, re-identification defense strategy to guard patient privacy," says Wampfler.
Potential partners and customers include medical device companies and other research institutions developing solutions aimed at improving diagnostics, therapeutics, and cures for patients.
The human capability to see hidden patterns in patient data has enabled clinicians and researchers to discover new pathological pathways and develop new diagnostic tools. With the help of sophisticated AI-based tools, there’s no limit to what we can accomplish.