The Growing Power of AI Agents

Since ChatGPT launched in 2022, the world has taken notice of AI’s capabilities. AI agents now use large language models of this kind to reshape business, finance, and healthcare. Are we ready for this shift?

By John Halamka, M.D., Diercks President, Mayo Clinic Platform and Paul Cerrato, MA, senior research analyst and communications specialist, Mayo Clinic Platform

One popular source says, “Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks without human intervention. The independent systems automatically respond to conditions, to produce process results.” According to Jensen Huang, CEO of Nvidia, it’s the next frontier in artificial intelligence. He believes these digital tools will enhance productivity and operations across industries. The question on the minds of many healthcare professionals is: Will it benefit patients and do no harm? A closer look at some of the emerging AI agents can help address this question.

Vivek Natarajan, with Google DeepMind, and his associates recently introduced AI co-scientist, a multi-agent system built on the Gemini 2.0 LLM. Its purpose is to foster the development of new research hypotheses and proposals and, its developers hope, to reveal insights and knowledge not previously imagined. Gottweis et al. explain that the tool includes: “(1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypotheses generation.” That’s quite a lofty ambition, but given DeepMind’s success to date, it might be achievable. Its AlphaFold system has had a profound effect on the scientific world by predicting the three-dimensional structure of proteins based on their amino acid sequences. That in turn is having a major impact on drug discovery, disease research, and much more.

Unfortunately, DeepMind’s explanation of AI co-scientist is rather confusing to anyone not deeply involved in the specialty. In plain English, tournament evolution refers to a type of competition: candidates, in this case research hypotheses, are compared and evaluated against one another to see which are best, and the strongest are refined further. Flexible compute scaling is the ability of a system to adjust the computing resources, such as CPU and RAM, that it devotes to a task so that it can handle the work required.

AI co-scientist, illustrated in the figure below, accomplishes these “magical” feats by using a set of agents: Generation, Reflection, Ranking, Evolution, Proximity (which evaluates relatedness), and Meta-review. Together they continuously generate, debate, and evolve research hypotheses within a tournament framework. Feedback from the tournament enables iterative improvement, creating a self-improving loop toward novel and high-quality outputs.
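For readers who think more easily in code, the tournament idea can be sketched in a few lines of Python. This is a toy illustration, not DeepMind’s implementation: it assumes an Elo-style rating (the scheme used to rank chess players), and the names run_tournament, toy_judge, and toy_quality are hypothetical. In a real system, the judging step would be a comparison or debate carried out by a language model rather than a lookup in a table of made-up scores.

```python
import itertools


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(hypotheses, judge, rounds=1, k=32.0, start=1200.0):
    """Rank hypotheses by pairwise comparison.

    judge(a, b) must return whichever of a or b it prefers.
    k and start are assumed values, chosen only for illustration.
    """
    ratings = {h: start for h in hypotheses}
    for _ in range(rounds):
        for a, b in itertools.combinations(hypotheses, 2):
            winner = judge(a, b)
            loser = b if winner == a else a
            # The winner gains more when the win was unexpected.
            gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
            ratings[winner] += gain
            ratings[loser] -= gain
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    return ranked, ratings


# Hypothetical stand-in for the judging step: made-up quality scores.
toy_quality = {"hypothesis A": 0.9, "hypothesis B": 0.5, "hypothesis C": 0.1}


def toy_judge(a, b):
    return a if toy_quality[a] >= toy_quality[b] else b


ranked, ratings = run_tournament(list(toy_quality), toy_judge)
print(ranked)
```

The top-ranked hypotheses would then be handed back for further refinement, which is what makes the loop self-improving.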

In a sense, you might think of AI agents as large language models with built-in prompt engineering. Traditional LLMs will generate answers to a user’s query, or prompt, but the user must then submit additional queries to refine their search and weed out inaccuracies or vague generalities. The best AI agents are capable of doing this winnowing process on their own.

Figure 1: How AI co-scientist Works

(Source: Gottweis J et al. Towards an AI co-scientist. https://arxiv.org/abs/2502.18864)

Biomni is another AI agent that is gaining the attention of healthcare providers. Developed by computer scientists at Stanford and Princeton Universities, and UCSF, its stated purpose is to “autonomously execute a wide spectrum of research tasks across diverse biomedical subfields.” By taking advantage of protocols and databases from tens of thousands of publications and 25 biomedical domains, it “integrates large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, enabling it to dynamically compose and carry out complex biomedical workflows – entirely without relying on predefined templates or rigid task flows.” While the inner workings of the agent would take too long to explain in a short column, Biomni’s practical application is noteworthy, according to the researchers who developed it. They report that it has been able to gain novel insights from wearable sensor data, quickly perform bioinformatics analyses of large raw data sets, and develop lab protocols to help wet-lab researchers.

A popular song once spoke about the space “between the world of men and make believe.” In the real world, clinicians and healthcare executives have to make difficult decisions about the use of resources and treatment options. The Internet world of “make believe,” on the other hand, is filled with gems and junk. AI agents occupy the space between, offering potential benefits but risking misdirection as well. Based on the evidence to date, we are cautiously optimistic about the future of these powerful digital tools.

