Emerging
ChatGPT Health Poses Safety Risk
A recent study finds that ChatGPT Health, OpenAI’s latest consumer health tool launched in January 2026, frequently failed to accurately assess medical emergencies, raising safety concerns about the new product.
Earlier studies have found that ChatGPT can perform well on medical licensing-style exams, and surveys suggest that nearly two-thirds of physicians had used some form of artificial intelligence in their work as of 2024. At the same time, other research has raised concerns about the reliability of medical advice provided by chatbots, including ChatGPT, as they become ubiquitous.
A New Kind of Chatbot
ChatGPT Health is a separate product from OpenAI’s general-purpose chatbot. The service is free to use, but people must create an account specifically for the health platform, which currently has a waitlist for access. According to OpenAI, the system operates on a more secure infrastructure that allows users to upload personal health information with added privacy protections. It should be noted, however, that no federal regulatory body governs the health information provided to AI chatbots, and ChatGPT is a technology service that falls outside the scope of HIPAA. In other words, few protections exist for patients who opt to hand their health data over to private businesses like OpenAI.
Despite data-security concerns, OpenAI reports that more than 40 million people around the world already turn to ChatGPT for health-related questions, and roughly 2 million messages each week involve topics related to health insurance. Even with this level of use, the company states on its website that ChatGPT Health is not designed to provide medical diagnoses or treatment recommendations.
Researchers Investigate Safety
To examine the safety and functionality of the new product, researchers at Mount Sinai designed a study examining how ChatGPT Health recommends medical care in different situations.
Researchers created 60 real-world clinical scenarios written by physicians across 21 areas of medicine and tested how the system responded under multiple conditions, producing 960 total responses.
Overall performance followed what researchers described as an “inverted U” pattern: the system performed best with moderately serious conditions but struggled at the extremes. For clearly non-urgent problems, about 35 percent of responses recommended more care than necessary. More concerning, nearly half of emergency scenarios were handled incorrectly: among cases that doctors consider clear medical emergencies, the tool recommended less urgent care 52 percent of the time. Widely recognized emergencies such as stroke and severe allergic reactions were usually identified correctly. Yet in several scenarios, people with potentially life-threatening conditions such as diabetic ketoacidosis or severe respiratory failure were told to seek medical care within 24 to 48 hours rather than go to the emergency department, advice that could cost a patient their life.
The researchers also found that context influenced recommendations. When a family member or friend in the scenario minimized symptoms, the system was far more likely to suggest less urgent care in borderline cases. Crisis support messages for suicidal thoughts appeared inconsistently as well.
The study’s authors caution that these patterns are potentially dangerous because missed emergencies and inconsistent crisis responses could delay urgent care. They say the findings highlight the need for careful safety testing and real-world validation before AI triage tools are widely deployed to the public.
REFERENCES
Bajak, A. (2026, January 9). Is giving ChatGPT Health your medical records a good idea? TIME. https://time.com/7344997/chatgpt-health-medical-records-privacy-open-ai/
Edwards, E. (2026, March 3). ChatGPT Health under-triaged half of medical emergencies, study finds. NBC News. https://www.nbcnews.com/health/health-news/chatgpt-health-under-triaged-half-medical-emergencies-rcna261409
Ramaswamy, A., Tyagi, A., Hugo, H., et al. (2026). ChatGPT Health performance in a structured test of triage recommendations. Nature Medicine. https://doi.org/10.1038/s41591-026-04297-7
