A Deeper Dive into Knowledge Graphs
These digital tools have the potential to transform patient care by tapping data resources rarely used in routine medical practice.

By John Halamka, M.D., Diercks President, Mayo Clinic Platform and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform
Information overload continues to vex clinicians. The sheer quantity of data available in a patient’s EHR alone makes it almost impossible to obtain a comprehensive picture of their condition. And while AI-enabled algorithms are helping to summarize these records, that is not enough. One solution is to develop a visualization system that quickly puts all the patient’s most important information at a clinician’s fingertips. That information needs to include not just content from the EHR but also details on the patient’s genetic makeup, environmental exposure to toxins, input from wearables, published systematic reviews and meta-analyses, and much more. That’s exactly what knowledge graphs (KGs) are designed to do. The latest research demonstrates that these tools are accomplishing that feat.
As we explained in an earlier column, a KG is “a network of real-world entities—i.e. objects, events, situations, or concepts—and illustrates the relationship between them. This information is usually stored in a graph database and visualized as a graph structure, prompting the term knowledge graph.” These visualization tools have a long history in healthcare, dating back to the famous graphic created by the English physician John Snow in the 1800s. As figure 1 illustrates, he was able to link cholera outbreaks to water pumping stations in London. That connection became crystal clear when one looked at his map. The areas in the city circled in red represented the greatest number of cholera cases, most of which clustered around the Broad Street pump, circled in green.
Figure 1

The cause/effect relationship between microbe-saturated water and cholera may be obvious to 21st century clinicians, but to Dr Snow’s colleagues, it was a revelation. Similarly, clinicians and researchers who are deploying knowledge graphs are seeing hidden insights that are having an impact on patient care.
To date, there’s evidence that KGs are playing an important role in repurposing drugs so that they can be used to treat conditions for which they were not originally approved. When linked to EHR data, KGs can also improve clinical decision support by detecting hidden patterns, which in turn improves diagnostic predictions and treatment options. There’s also reason to believe KGs can support precision medicine and contribute to clinical research by generating better hypotheses and improving the reasoning process.
These impressive accomplishments take advantage of a KG’s basic structure, which includes nodes, edges, and labels—the so-called triple. As figure 2 illustrates, nodes can include various types of data, including disease phenotypes, exposure to specific environmental factors, drugs, and diet; edges represent the relationships between these nodes, and labels can be the text used to explain the relationships. They can be as simple as a caption for a table or be more complex, referring to variables plotted on an X and Y axis.
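To make the node, edge, and label structure concrete, here is a minimal Python sketch. It assumes a common representation in which each fact is stored as a (subject, relationship label, object) triple. The entities, the label names such as TREATS, and the helper functions are hypothetical illustrations. They are not taken from SPOKE or any other production KG.

```python
from collections import defaultdict

# Hypothetical triples: (subject node, edge label, object node).
# Illustrative only; these are not drawn from SPOKE or any real KG.
TRIPLES = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("type 2 diabetes", "PRESENTS_WITH", "hyperglycemia"),
    ("high-sugar diet", "ASSOCIATED_WITH", "type 2 diabetes"),
    ("air pollution", "ASSOCIATED_WITH", "asthma"),
]


def build_graph(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    graph = defaultdict(list)
    for subject, label, obj in triples:
        graph[subject].append((label, obj))
    return graph


def neighbors(graph, node, label=None):
    """Return nodes reachable from `node`, optionally filtered by edge label."""
    return [
        obj
        for edge_label, obj in graph.get(node, [])
        if label is None or edge_label == label
    ]


def linked_to(triples, target):
    """Return sorted (subject, label) pairs for every edge pointing at `target`."""
    return sorted((s, l) for s, l, o in triples if o == target)


graph = build_graph(TRIPLES)
treated_by_metformin = neighbors(graph, "metformin", "TREATS")
diabetes_links = linked_to(TRIPLES, "type 2 diabetes")
```

In this toy graph, asking what points at "type 2 diabetes" surfaces both a drug and a dietary exposure in one query. A real KG performs the same kind of connecting at far greater scale, typically with a graph database rather than an in-memory list.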
Figure 2 comes from a group of investigators who created a KG called SPOKE, an acronym for scalable precision medicine open knowledge engine. Their system draws on 41 specialized databases and contains 21 types of nodes and 55 types of edges. These data sources include content from molecular and cell biology, pharmacology, and clinical practice. Morris et al. explain: “SPOKE has been used for a variety of biomedical applications including drug repurposing…, disease prediction and interpretation of transcriptomic data…, among others. More recently, we developed an algorithm to embed electronic health records onto SPOKE, which, when combined with machine learning techniques, enables a wide range of applications relevant to precision medicine.”
Figure 2

(Source: Morris et al. The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information. Bioinformatics, Volume 39, Issue 2, February 2023, btad080, https://doi.org/10.1093/bioinformatics/btad080)
Ziad Obermeyer, a professor at the University of California, Berkeley, once said: “The complexity of medicine now exceeds the capacity of the human mind.” With the flood of new data resources now available, that complexity has grown exponentially. Well-constructed KGs are “connecting the dots,” helping clinicians manage this information overload. As they find their way into routine medical practice, there’s reason to believe they will improve patient outcomes.