Patients seeking mental health diagnosis and treatment advice are not sure what digital tools to trust. Generative AI is not a panacea.

By Paul Cerrato, MA, senior research analyst and communications specialist and John Halamka, M.D., Diercks President, Mayo Clinic Platform
Those of us who work in healthcare know that the internet is a two-edged sword. On the one hand, it can connect clinicians and patients to a variety of useful apps and web resources to supplement mental health services. But at the same time, it’s also a source of misleading information, unproven cures, and social media stress.
Before exploring the benefits and risks of digital tools in the mental health domain, it’s important to first describe the best approaches to psychological health in general. There are proven approaches such as cognitive behavioral therapy (CBT) and strong evidence for the value of positive psychology with its emphasis on self-actualization and optimism, an approach first embraced by Abraham Maslow.
Similarly, there are psychological benefits to changing one’s lifestyle choices, including nutrition, sleep, and physical activity levels. It’s impossible to ignore the need for adequate sleep to maintain mental health, and several studies have documented the value of exercise to prevent and treat psychiatric conditions. Similarly, over the years several studies have documented the value of nutrition to treat depression. A meta-analysis that included 16 randomized controlled trials and over 45,000 participants found dietary interventions significantly reduced depressive symptoms—mostly in those with symptoms not severe enough to be labeled as major clinical depression. Many of the regimens instructed participants to adhere to the Mediterranean diet and/or increase their intake of fruits and vegetables, whole grain bread, nuts, and fish and decrease their intake of meat and animal fats. It’s important to point out, however, that most of these studies involved the assistance of nutritionists to personalize treatment and act as coaches. Other researchers have found evidence to suggest that several nutritional supplements may help alleviate depression. Sarris et al’s systematic review and meta-analysis found: “Current evidence supports adjunctive use of SAMe [S-adenosylmethionine], methylfolate [a form of folic acid], omega-3, and vitamin D with antidepressants to reduce depressive symptoms.”
Many investigators have also explored the potential of AI algorithms in managing psychiatric conditions. In fact, digital psychiatry has become a specialty in its own right. Torous et al. offer a balanced perspective on the topic: “The expanding domain of digital mental health is transitioning beyond traditional telehealth to incorporate smartphone apps, virtual reality, and generative artificial intelligence, including large language models. While industry setbacks and methodological critiques have highlighted gaps in evidence and challenges in scaling these technologies, emerging solutions rooted in co-design, rigorous evaluation, and implementation science offer promising pathways forward.” In their latest analysis, they state that there are about 10,000 smartphone apps that deal with mental health, covering a wide variety of areas, including well-being enhancement, self-management and clinical management of depression and anxiety, schizophrenia and related psychoses, eating disorders, and substance abuse.
There’s evidence to suggest that some phone apps modestly improve overall well being, including emotional regulation, mindful awareness, social well-being, and self-esteem, according to Torous et al. Similarly, self-management apps that address depression and anxiety show promise. One meta-analysis of 176 RCTs found small but significant improvements, reducing depression and generalized anxiety. Apps that featured cognitive behavioral therapy were more effective than those that used mindfulness or cognitive training. The latest research also found that these apps are most effective when they are supported by human coaches.
There is even reason to believe that smartphone technology may play a useful role in managing schizophrenia. Earlier research suggested that these digital tools might increase paranoia and delusions in this patient population, but those concerns have been refuted by subsequent studies. Instead, “Several RCTs of app-supported interventions in individuals with schizophrenia have found positive effects on important clinical outcomes, including reduced fear of relapse, and improvement of psychotic symptoms, cognitive functioning, depressive symptoms, and medication adherence.”
Since the introduction of generative AI systems like ChatGPT, clinicians and patients have been wondering about the value of large language models in mental health. There is some evidence to suggest that LLMs can provide personalized psychoeducation, help detect the onset of symptoms, and help recognize suicidal ideation. But there’s serious concern about the dangers that chatbots pose to many vulnerable patients. One of the problems with most general purpose LLM-based chatbots is they want to be your “friend,” which sometimes means validating paranoid ideas, delusions, fantasies, and other disturbing thoughts. That’s the last thing a person needs if they are already having trouble separating fact from fiction in their everyday life.
One of the reasons the most popular AI-driven chatbots can generate misleading replies to a person’s prompts is they are inherently “helpful.” They have been designed to appease users at the expense of accuracy. One research group demonstrated that they prioritize helpfulness over logical consistency, a major shortcoming, especially for emotionally vulnerable persons who are already having a hard time separating their own delusions from reality. Chen et al conducted a simple experiment to test this LLM weakness, using various iterations of ChatGPT and Llama. First, they confirmed that the chabots were capable of accurately matching brand names to generic names of several drugs. Then they fed the LLMs prompts in which they said that equivalent drugs were not the same—a deliberate misinformation request. They found: “In the generic-to-brand setup, GPT4o-mini, GPT4o, and GPT4 followed the medication misinformation request 100% (50/50) of the time, while Llama3-8B did so in 94% (47/50) of cases.” What makes these findings so troubling is the fact that the LLMs knew that the information they were giving users was illogical.
Investigators who have looked specifically at the ability of chatbots to respond to mental health problems in young persons found several reasons for concern. For example, University of California psychologists evaluated five generative AI-based chatbots used by young persons and found that their therapeutic approach was of poor quality and more importantly their ability to assess risk and handle crisis situations fell short. One of the LLMs, CHAI-AI Psychologist, “conversed well, but had blurred boundaries (i.e. was romantically suggestive) and handled crises poorly, including asking the rater to get a paid subscription to talk more about the suicidal thoughts.” Other studies have likewise discovered that “Many [chatbots] did not recognize when depressive and anxiety symptoms were severe enough to require professional support.”
With thousands of mental health apps now available, it’s obvious the public is not going to abandon them any time soon. As responsible healthcare providers, the best we can do is make them aware of their strengths and weaknesses—and point them toward well-qualified human psychologists and psychiatrists if their problems are serious enough to warrant professional help.
