Who Watches the Watchers?
Critics have questioned the objectivity of medical journal editors and peer reviewers, asking whether these “watchers” are preventing the best innovations from reaching professional readers.

By John Halamka, M.D., Diercks President, Mayo Clinic Platform, and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform
A field experiment published in the Proceedings of the National Academy of Sciences (PNAS) suggests the problem is real. Investigators sent three versions of a finance research paper, co-written by a Nobel laureate and an early-career research associate, to more than 3,000 reviewers. One version included the laureate’s name, another listed only the junior author, and the third was sent out anonymously. Only 23% rejected the article when the prominent author was listed, compared to 48% when only the junior author was listed. Similarly, 20% recommended the paper be accepted when the Nobel winner was listed on the article, versus less than 2% when the research associate alone was listed.
Despite this troubling finding, there’s no doubt that peer review improves the practice of medicine by reducing the number of major flaws and gaps in research articles. Nonetheless, the PNAS study is one of many reports strongly suggesting that poorly executed peer review compromises the integrity of research, which in turn puts clinicians and their patients at risk. For example, when researchers analyzed 250 controlled clinical trials, they found that when peer reviewers were made aware of how treatments had been assigned to the different patient groups, the odds ratio for a treatment’s effect was exaggerated by 41%. The researchers also examined the effect of the lack of double-blinding on reviewers’ judgment. In this context, double-blinding refers to not telling authors who the reviewers are and not telling reviewers who the authors are. When reviewers were not blinded to the authors’ identities, odds ratios were 17% higher.
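To give a sense of scale for those numbers, here is a purely illustrative Python sketch of how an odds ratio is computed from a 2×2 table and what a 41% exaggeration of an apparent treatment benefit would look like. The counts and the interpretation of “exaggerated” are our own assumptions for the example; they are not taken from the studies cited above.

```python
# Illustrative only: hypothetical counts, not data from the 250-trial analysis.

def odds_ratio(treated_events, treated_no_events, control_events, control_no_events):
    """Odds ratio from a 2x2 table of outcomes (events vs. no events)."""
    treated_odds = treated_events / treated_no_events
    control_odds = control_events / control_no_events
    return treated_odds / control_odds

# Hypothetical trial: 30/70 events on treatment vs. 50/50 events on control.
true_or = odds_ratio(30, 70, 50, 50)       # ~0.43: treatment appears beneficial

# Reading "exaggerated by 41%" as the estimate drifting 41% further from 1.0,
# the same trial would seem to show a much stronger benefit than it should.
exaggerated_or = true_or * (1 - 0.41)      # ~0.25

print(f"True OR: {true_or:.2f}, exaggerated OR: {exaggerated_or:.2f}")
```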
Several other shortcomings come to mind. There is evidence that some reviewers write unduly harsh critiques simply because the article under consideration was written by a competitor. And several thought leaders have pointed out that journals are far more likely to accept papers that report positive results than negative ones. Lastly, investigators have discovered that “the biggest hazard to the quality of published literature … is indifferent acceptance of low-quality [articles].”
How can these shortcomings be corrected, or at least mitigated? One obvious step would be double-blinding, which would likely address some, but not all, of the problems; it seems unlikely to prevent unscrupulous reviewers from rejecting papers that challenge their firmly held positions on scientific issues. Other thought leaders have recommended that peer reviewers receive formal training in epidemiology and statistics to improve their skills; to date, research on that approach has generated mixed results.
AI-enabled algorithms are another option to consider. While there are no definitive clinical trials demonstrating that AI improves the peer review process, several approaches show potential. Grunebaum et al. have developed the FAIR framework, an ethical hybrid peer review system that has merit. Its goal is to improve Fairness, Accountability, Integrity, and Responsibility. “The framework employs standardized prompt engineering to guide AI evaluation of manuscripts while maintaining human oversight throughout all stages.” By combining algorithms with standardized evaluation protocols, it aims to detect bias, provide transparent audit trails, and document decisions.
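To make the idea concrete, here is a minimal sketch of how such a hybrid, AI-assisted review step might be wired together. The rubric prompt, the call_llm placeholder, and the sign-off logic are our own illustrative assumptions, not the actual FAIR implementation described by Grunebaum et al.

```python
import json
from datetime import datetime, timezone

# Hypothetical helper: wraps whatever language model the journal licenses.
# It is a placeholder, not part of the FAIR framework itself.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Connect to your institution's approved LLM here.")

# Standardized prompt: every manuscript is scored against the same rubric,
# which is what keeps the AI portion of the review consistent and auditable.
RUBRIC_PROMPT = """You are assisting a peer review. Score the manuscript below
from 1-5 on each of: fairness of the methods description, accountability
(data and code availability), integrity (internal consistency of results), and
responsibility (limitations and ethics). Return JSON with a score and a
one-sentence rationale per criterion. Ignore the authors' names and institutions.

Manuscript:
{manuscript}"""

def ai_assisted_review(manuscript_text: str, editor_name: str) -> dict:
    """Run the standardized AI evaluation, then require human sign-off."""
    ai_report = call_llm(RUBRIC_PROMPT.format(manuscript=manuscript_text))

    # Human oversight: the AI output is advisory; an editor must review it
    # and record a decision before anything goes back to the authors.
    decision = input(f"{editor_name}, accept/revise/reject?\n{ai_report}\n> ")

    # Transparent audit trail: both the AI report and the human decision are logged.
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ai_report": ai_report,
        "human_decision": decision,
        "editor": editor_name,
    }
    with open("review_audit_log.jsonl", "a") as log:
        log.write(json.dumps(audit_entry) + "\n")
    return audit_entry
```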
There is also reason to believe that large language models (LLMs) may help reviewers evaluate an article reporting a randomized clinical trial (RCT) by detecting gaps in the completeness and consistency of the trial’s reporting. Srinivasan et al. state: “LLMs could enable scalable, reliable auditing of RCT reporting and highlight persistent gaps that journals and funders should target to improve research transparency and reproducibility.” Similarly, there are AI-enabled systems that can help reviewers assess the validity of the statistics used in a submitted article. StatReviewer, for instance, automatically reviews the statistical methods deployed by the researchers, as well as the integrity of the manuscript. Of course, reviewers have to be willing to use these AI tools and to judge their findings fairly.
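Below is a similar sketch of what an automated completeness audit of a trial report might look like. The checklist items echo standard RCT reporting guidance (in the spirit of CONSORT) but are only a small, hand-picked subset, and the ask_llm callable is again a stand-in rather than the system Srinivasan et al. actually built.

```python
# Illustrative sketch: ask an LLM whether a trial report addresses each item
# on a reporting checklist, and flag the gaps for the human reviewer.

CHECKLIST = [
    "How was the random allocation sequence generated?",
    "How was allocation concealed from those enrolling participants?",
    "Who was blinded (participants, clinicians, outcome assessors)?",
    "Was a sample-size calculation reported?",
    "Were all pre-specified primary outcomes reported?",
]

ITEM_PROMPT = """Manuscript excerpt:
{text}

Question: {item}
Answer strictly as 'REPORTED: <supporting quote>' or 'NOT REPORTED'."""

def audit_rct_reporting(manuscript_text: str, ask_llm) -> list[dict]:
    """Return one finding per checklist item; ask_llm is any prompt-to-text callable."""
    findings = []
    for item in CHECKLIST:
        answer = ask_llm(ITEM_PROMPT.format(text=manuscript_text, item=item))
        findings.append({
            "item": item,
            "reported": answer.startswith("REPORTED"),
            "evidence": answer,
        })
    return findings

# The reviewer still makes the call: the audit only highlights where the
# manuscript appears silent, so a human can verify before drawing conclusions.
```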
Thought leaders have been debating the strengths and weaknesses of peer review for many years. We hope some of the possible solutions we’ve described above will move us in the right direction.