Reexamining the Evidence Behind Evidence-Based Medicine, Part 1
Evidence-based medicine has benefited patients in countless ways, but there’s a disconnect between EBM and the clinical care that is typically delivered at the bedside. Pragmatic, real-world trials and AI-based algorithms may help solve the problem.

By Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform, and John Halamka, M.D., Diercks President, Mayo Clinic Platform
Since its introduction in the 1990s, evidence-based medicine (EBM) has transformed patient care and removed much of the guesswork and unsubstantiated opinion from the practice of medicine. As originally defined, EBM “de-emphasizes intuition, unsystematic clinical experience, and pathophysiological rationale as sufficient grounds for clinical decision making and stresses the examination of evidence from clinical research.”1 Since that initial statement, the field has evolved to include other aspects of patient care. The National Cancer Institute defines EBM as “a systematic approach to medicine in which doctors and other healthcare professionals use the best available scientific evidence from clinical research to help make decisions about the care of individual patients. A physician’s clinical experience and the patient’s values and preferences are also important in the process of using the evidence to make decisions. The use of evidence-based medicine may help plan the best treatment and improve quality of care and patient outcomes.”2
Since its inception, EBM has prioritized randomized controlled trials (RCTs), systematic reviews, and meta-analyses, treating them as the gold standard for clinical decision making. However, critics continue to point out the shortcomings of RCTs, not least the fact that their findings often conflict with the way medicine is actually practiced in most healthcare delivery facilities. In a series of special communications published in JAMA, entitled Integrating Clinical Trials and Practice, Angus et al. state: “…the clinical trials and healthcare delivery enterprises largely ignore each other: RCTs frequently fail to generate knowledge relevant to practice, while practice patterns are frequently unsupported by, or fail to change with, RCT evidence.”3 They go on to highlight the weaknesses of RCTs: “…many trials are duplicative, badly designed (providing no useful information), or never reach completion. Consequently, massive areas of clinical uncertainty are uninformed by any RCT. Treatment guidelines rely heavily on expert opinion, observational studies, or tangential extrapolation from partially relevant RCTs.”
These criticisms are only part of the problem. Others have pointed out that an RCT often generates results too narrow to apply to patients in community practice. For example, RCTs studying therapeutic options for patients with asthma, chronic obstructive pulmonary disease, and allergic rhinitis have enrolled only 5% to 10% of the patients seen in routine care.4 The problem stems from inclusion and exclusion criteria so strict that the enrolled cohort no longer represents the general population, as the sketch below illustrates. In addition, RCTs often confirm a treatment protocol that benefits the average patient, but as most experienced clinicians know, many of their patients are outliers, far from average.
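To see how quickly strict eligibility rules shrink a cohort, here is a minimal Python sketch that applies a stack of exclusion criteria to a synthetic population. The criteria and prevalence figures are invented for illustration and are not drawn from the cited studies.

```python
import random

random.seed(0)

# Synthetic cohort of routine-care patients (all values and
# criteria below are hypothetical, for illustration only).
cohort = [
    {"age": random.randint(12, 90),
     "smoker": random.random() < 0.25,
     "comorbidity": random.random() < 0.40,
     "on_other_meds": random.random() < 0.50}
    for _ in range(10_000)
]

# Strict RCT-style eligibility: each rule excludes patients
# a community clinic would still have to treat.
eligible = [p for p in cohort
            if 18 <= p["age"] <= 65
            and not p["smoker"]
            and not p["comorbidity"]
            and not p["on_other_meds"]]

print(f"Eligible: {len(eligible)} of {len(cohort)} "
      f"({100 * len(eligible) / len(cohort):.1f}%)")
```

Even these few hypothetical rules leave only a small fraction of the original population eligible, which is the core of the generalizability complaint.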
Another weakness worth noting: RCTs are difficult and expensive to conduct, requiring an exhaustive search for eligible subjects, institutional review board approval, navigation of regulatory barriers, extensive data collection and analysis, and more.
One critic summarized many of the aforementioned shortcomings succinctly: “All RCTs do is show that what you’re dealing with is not snake oil… They don’t tell you the critical information you need, which is which patients are going to benefit from the treatment.” These trials provide “central tendencies” of a very large number of people — a measure that’s “not going to be representative of much of anybody if you look at them as individuals.”5
Some of the criticisms leveled at traditional RCTs can be remedied by conducting pragmatic trials. As stated earlier, RCTs frequently fail to generate knowledge relevant to practice. Pragmatic trials rely on data from electronic health records (EHRs), disease registries, and medical claims, and typically include a more diverse cohort of patients. They also employ simpler ways of collecting data and less rigid inclusion and exclusion criteria. Of course, these data sources have their weaknesses as well. One way to address them is to embed randomization into the real-world data.6 Randomizing patients drawn from an EHR system or a disease registry, for example, can elevate pragmatic trials on the evidence “ladder.” Jones et al. demonstrated the value of such randomization in a pragmatic trial that compared two aspirin doses in patients with preexisting cardiovascular disease.7 They extracted EHR data for 15,076 patients from 20 centers and one health plan, using patient portals to randomly assign patients to either 81 mg or 325 mg of aspirin daily, in a 1:1 ratio. They found no significant differences between the two doses in the measured outcome events, including death and hospitalization for myocardial infarction or stroke.
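To make that embedded randomization concrete, here is a minimal Python sketch of 1:1 block randomization over patients pulled from an EHR extract. The field names, cohort, and block size are our own illustrative assumptions, not details of the Jones et al. trial.

```python
import random

# Hypothetical EHR extract: one record per eligible patient with
# preexisting cardiovascular disease (field names are illustrative).
patients = [
    {"patient_id": "P0001", "age": 67, "prior_mi": True},
    {"patient_id": "P0002", "age": 72, "prior_mi": False},
    {"patient_id": "P0003", "age": 59, "prior_mi": True},
    {"patient_id": "P0004", "age": 64, "prior_mi": True},
]

ARMS = ["aspirin_81mg", "aspirin_325mg"]

def randomize_1_to_1(cohort, seed=42):
    """Assign each patient to one of two arms in a 1:1 ratio using
    block randomization (block size = 2), so the arms stay balanced
    as enrollment proceeds."""
    rng = random.Random(seed)
    assignments = {}
    block = []
    for patient in cohort:
        if not block:          # refill and reshuffle the next block
            block = ARMS[:]
            rng.shuffle(block)
        assignments[patient["patient_id"]] = block.pop()
    return assignments

if __name__ == "__main__":
    for pid, arm in randomize_1_to_1(patients).items():
        print(f"{pid} -> {arm}")
```

The design choice worth noting is that randomization, not the data source, is what moves the study up the evidence ladder; the EHR merely supplies the cohort and the follow-up data.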
As mentioned above, disease registries are also a potential source of patient data for pragmatic trials, but to be taken seriously in this context, they must contain reliable, high-quality content. Fortunately, many of these registries meet that bar, having collected standardized data in diverse clinical settings, which has prompted several professional societies and government agencies to invest in these resources.8
A nuanced approach to evaluating evidence
Since EBM first gained prominence in the 1990s, there has been a shift away from an overemphasis on RCTs. In fact, the latest Grading of Recommendations Assessment, Development and Evaluation (GRADE) system for evaluating medical evidence addresses several of the criticisms outlined above. The latest criteria published by the GRADE Working Group call for taking into account the risk of bias, imprecision, inconsistency, indirectness, publication bias, large effects, dose-response gradients, and residual plausible opposing bias, with an emphasis on systematic reviews and other types of high-quality data.9 In evaluating the strength of observational studies, which have been viewed unfavorably in the past, the GRADE system “establishes three criteria that can raise the level of evidence: large magnitude of effect, residual effect of confounding variables and dose-response gradient.”
More specifically, while RCTs are generally considered of higher quality than observational studies, the latter can be upgraded when they demonstrate a very large treatment effect and a clear-cut dose-response gradient.10 Similarly, international clinical guidelines for managing kidney disease highlight the value of observational studies when they show strong evidence of an association between intervention and outcomes.11
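As a rough illustration of how these criteria combine, the sketch below encodes the basic GRADE logic in Python: evidence from RCTs starts at “high” and observational evidence at “low,” then each serious concern moves the rating down one level and each upgrading criterion moves it up one level. The one-level-per-factor scoring is our simplification, not an official GRADE implementation.

```python
# Simplified sketch of GRADE-style evidence rating. The four output
# levels and the up/downgrading factors come from the GRADE framework;
# the one-point-per-factor scoring is our own simplification.
LEVELS = ["very low", "low", "moderate", "high"]

DOWNGRADE_FACTORS = {"risk_of_bias", "imprecision", "inconsistency",
                     "indirectness", "publication_bias"}
UPGRADE_FACTORS = {"large_effect", "dose_response_gradient",
                   "residual_plausible_opposing_bias"}

def grade_quality(study_design: str, factors: set) -> str:
    """Rate a body of evidence: RCTs start at 'high', observational
    studies at 'low'; each serious concern lowers the rating one
    level and each upgrading criterion raises it one level."""
    score = 3 if study_design == "rct" else 1
    score -= len(factors & DOWNGRADE_FACTORS)
    score += len(factors & UPGRADE_FACTORS)
    return LEVELS[max(0, min(score, 3))]

# An observational study with a large effect and a dose-response
# gradient can be upgraded from "low" all the way to "high".
print(grade_quality("observational",
                    {"large_effect", "dose_response_gradient"}))
# An RCT with serious risk of bias drops to "moderate".
print(grade_quality("rct", {"risk_of_bias"}))
```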
Although there have been many refinements in how medical evidence is evaluated and graded, there is still room for improvement. AI-based algorithms that utilize artificial neural networks, random forest modeling, gradient boosting, and large language models are among the digital tools that hold promise in this domain. In Part 2, we’ll focus on these potential solutions.
References
1. Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268:2420-2425.
2. National Cancer Institute. Evidence-based medicine. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/evidence-based-medicine. Accessed December 24, 2024.
3. Angus DC, et al. The Integration of Clinical Trials With the Practice of Medicine: Repairing a House Divided. JAMA. 2024;332:153-162.
4. Wong GWK, et al. Respiratory Guidelines—Which Real World? Ann Am Thorac Soc. 2014;11(Suppl 2):S85-S91.
5. Clay R. More than one way to measure. American Psychological Association. September 10, 2010. https://www.apa.org/monitor/2010/09/trials. Accessed December 27, 2024.
6. Pencina MJ, et al. Deriving real-world insights from real-world data: biostatistics to the rescue. Ann Intern Med. 2018;169:401-402.
7. Jones WS, et al. Comparative effectiveness of aspirin dosing in cardiovascular disease. N Engl J Med. 2021;384:1981-1990.
8. Lauer MS, et al. The Randomized Registry Trial — The Next Disruptive Technology in Clinical Research? N Engl J Med. 2013;369:1569-1581.
9. GRADE Working Group. Criteria for applying or using GRADE. https://www.gradeworkinggroup.org. Accessed December 31, 2024.
10. Guyatt GH, et al. What is “quality of evidence” and why is it important to clinicians? BMJ. 2008;336:995-998. https://www.bmj.com/content/336/7651/995
11. KDIGO 2022 Clinical Practice Guideline for the Prevention, Diagnosis, Evaluation, and Treatment of Hepatitis C in Chronic Kidney Disease. Kidney Int. 2022;102(6S):S129-S205. doi:10.1016/j.kint.2022.07.013.