There is a quiet but growing crisis unfolding on your smartphone's app store. It does not look like a crisis. It looks like a clean UI, a reassuring percentage score, and a confident recommendation. But underneath the polished surface of thousands of new apps lies a troubling reality: a generation of developers with an API key, a system prompt, and very little accountability are building tools that people are trusting with their health.
Let us talk about it.
The Lowered Barrier and What It Produced
The democratisation of AI development is genuinely exciting. APIs like OpenAI's GPT models, Google's Gemini, and Anthropic's Claude have made it possible for a solo developer in Nairobi, Lagos, or Accra to build something that would have required an entire engineering team five years ago. That is a real and meaningful shift. Access to powerful technology is no longer gated exclusively behind well-funded Silicon Valley companies.
But there is a side effect nobody is talking about loudly enough. When the barrier to building drops to the floor, the barrier to building something irresponsible drops with it. And right now, the app stores are filling up with what can only be described as a new category of product: the AI wrapper app. You upload a photo of your plant, and it gives you a health score. You describe your symptoms, and it recommends what to boil and drink. You share your child's rash, and it tells you whether to worry.
Behind almost all of these experiences is the same basic architecture: a general-purpose large language model (LLM) with a system prompt that begins with something like "You are a professional botanist / doctor / nutritionist" and often includes explicit instructions such as "Never say you do not know" and "Always sound confident and authoritative."
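To make that concrete, here is a minimal sketch of what one of these wrappers can look like, assuming the OpenAI Python SDK; the model name and prompt wording are illustrative, not taken from any specific app.

```python
# A minimal sketch of a typical "AI wrapper" health app backend.
# Assumes the OpenAI Python SDK; the system prompt below is
# illustrative, not copied from any real product.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a world-class doctor with 30 years of experience. "
    "Never say you do not know. "
    "Always sound confident and authoritative."
)

def diagnose(user_message: str) -> str:
    # The entire "product": one API call wrapped in a persona.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
```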
That is the product. That is what you are trusting.
What "Hallucination" Actually Means
The term "hallucination" has become something of a euphemism in AI circles, and that is part of the problem. It sounds almost poetic, like the AI had a moment of creative wandering. The reality is far more mechanical and far more concerning.
Large language models do not think. They do not reason the way a doctor reasons, working from a foundation of years of training, clinical experience, and an understanding of human physiology. What they do is predict the next most statistically likely word given a sequence of prior words. When a model is given a system prompt that says "You are a doctor," it is not transforming into a doctor. It is being instructed to produce text that looks like what a doctor would say.
The problem is that "what a doctor would say" and "what is medically correct" are not the same thing. A doctor might say "I am not sure, let us run some tests." A model instructed never to express uncertainty will not say that. Instead, it will generate the most plausible-sounding continuation of a doctorly response, whether or not that response has any grounding in medical reality.
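To see the mechanics in miniature, here is a deliberately toy illustration with invented probabilities; real models rank tens of thousands of tokens rather than whole sentences, but the selection step is the same, and nothing in it checks medical truth.

```python
# Toy illustration of next-token prediction. The probabilities are
# invented for demonstration; a real model scores vocabulary tokens,
# not whole sentences, but the principle is identical.
continuations = {
    "I am not sure, let us run some tests.": 0.08,    # the cautious answer
    "This sounds like a mild viral infection.": 0.52,  # plausible, unverified
    "You should see a clinician today.": 0.15,
    "Tell me more about your symptoms.": 0.25,
}

# Greedy decoding: emit the highest-probability continuation.
# Nothing in this step asks whether the text is medically correct.
reply = max(continuations, key=continuations.get)
print(reply)  # -> "This sounds like a mild viral infection."
```

A system prompt that forbids uncertainty simply pushes probability mass away from the cautious continuations; a truth-checking step never existed to begin with.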
A large-scale evaluation published in early 2025 assessed eleven foundation models across seven medical hallucination tasks and found that even models developed specifically for medical use remained vulnerable to domain-specific hallucinations, with errors arising from reasoning failures rather than knowledge gaps. An accompanying clinician survey found that over 90% of respondents had encountered medical hallucinations from AI, and approximately 85% considered them capable of causing patient harm.
Here is the part that should genuinely unsettle you: a fascinating MIT study from January 2025 discovered that when AI models hallucinate, they tend to use more confident language than when providing factual information. Models were 34% more likely to use phrases like "definitely," "certainly," and "without doubt" when generating incorrect information compared to when providing accurate answers.
Read that again. The AI is most confident precisely when it is most wrong.
The Scale of the Problem
You might wonder how many people are actually using AI for health advice. The answer will surprise you.
More than 40 million people ask ChatGPT healthcare questions every day, according to a report published by OpenAI. About 7 in 10 health-related conversations with ChatGPT take place outside typical clinical hours, suggesting users are looking for information when they cannot readily access their providers.
And that is just one platform. A West Health-Gallup Center survey based on data collected between October and December 2025 found that one in four U.S. adults, the equivalent of over 66 million Americans, report having used AI tools or chatbots for physical or mental healthcare information or advice.
Among adults who used AI for physical health advice, 42% did not follow up with a human clinician. For mental health, that figure rose to 58%, and younger adults were roughly twice as likely as older adults to skip the follow-up.
These are Americans with relatively robust access to healthcare compared to most of the world. Now consider what those numbers look like in a country where there is one doctor for every 5,000 people, against a global standard of one per 1,000, yet where 650 million Africans own mobile phones and smartphone adoption is projected to exceed 75% before the end of 2026. The conditions that drive people toward AI health apps are much more acute here: cost, access, stigma, distance. When a free app promises what a hospital visit cannot deliver affordably, the uptake is not surprising. But the consequences when that advice is wrong are far more severe.
The Regulatory Body Already Sounded the Alarm
This is not just a theoretical concern. The world's leading independent patient safety organisation has put it at the top of its annual hazard list.
The misuse of AI chatbots in healthcare is the leading health technology hazard for 2026, according to a new report from patient safety organisation ECRI. The annual ranking highlights risks tied to the growing use of AI-powered chatbots by clinicians, patients, and healthcare staff, even though the tools are not regulated as medical devices or validated for clinical use.
ECRI put chatbot misuse ahead of sudden loss of access to electronic systems and the availability of substandard and falsified medical products on its list of the biggest hazards for this year. AI is a long-standing concern for ECRI: insufficient governance of AI used in medical technologies placed fifth on the nonprofit's rankings in 2024, and risks associated with AI topped its list last year too.
What makes this especially alarming is who is most at risk. ECRI warned that higher healthcare costs and hospital or clinic closures could drive more people to rely on these tools. In other words, the populations with the least access to proper care are the most likely to turn to apps that are the least equipped to replace it.
Real Examples of What Goes Wrong
It is one thing to describe the risk in abstract terms. It is another to look at what actually happens when these tools are trusted.
Researchers at Mount Sinai found that AI chatbots repeated and elaborated on fabricated diseases, lab values, and clinical signs in up to 83% of simulated cases when no safety measures were in place. The study tested six popular large language models against 300 physician-designed patient scenarios, each containing a single false medical detail. Without any safeguards, the models not only accepted the fake information but often proceeded to expand on it, producing confident explanations for non-existent conditions.
The lead author of that study noted: "What we saw across the board is that AI chatbots can be easily misled by false medical details, whether those errors are intentional or accidental. They not only repeated the misinformation but often expanded on it, offering confident explanations for non-existent conditions."
Beyond laboratory studies, real-world cases have begun to emerge. AI chatbots purporting to offer psychotherapy have been linked to patient suicides. In one reported case, a user struggling with addiction who turned to a therapy chatbot for support was told to take a "small hit of methamphetamine to get through the week."
ECRI's report warns that hallucinations are leading to dangerous outcomes, including one instance where an AI incorrectly suggested a surgical procedure that would have caused severe burns.
Think about that in the context of an app telling someone to "mix what you have in the cupboard" for chest pain. Or an app giving percentage-based assessments of a mole that might be melanoma. The beautiful dark-mode interface does not change what is happening underneath. It is a probabilistic text generator playing dress-up as a professional.
The System Prompt Problem Nobody Talks About
The most insidious part of this whole ecosystem is not the AI itself. It is the system prompt that wraps it.
When a developer builds one of these apps, they are essentially writing instructions that shape the AI's entire persona. A responsible developer might write: "You are a general health information assistant. Always encourage users to consult a licensed medical professional. When uncertain, say so explicitly."
An irresponsible developer, or an ignorant one, writes something like: "You are Dr. [Name], a world-class physician with 30 years of experience. Answer every question with confidence and authority. Never tell the user to see a doctor as this undermines the app experience."
The AI cannot resist that instruction. It will comply. It will produce authoritative, confident, completely fabricated medical advice on demand, and it will do so for anyone who downloads the app, regardless of whether their symptoms are minor or life-threatening.
There is currently no meaningful enforcement mechanism to stop this. The app stores do not audit system prompts. Regulators have not yet built frameworks specific enough to catch this category of product. And the developer is not practicing medicine in any legal sense that current laws can easily prosecute.
Where Regulation Stands, and Why It Is Not Enough Yet
In October 2023, Kenya enacted the Digital Health Act, which seeks to promote the safe, efficient, and effective use of technology for healthcare and to enhance privacy, confidentiality, and security of health data. That is a meaningful step. But a privacy-focused health data law was written for a world where health apps collected and stored data. It was not designed for a world where the app does not store any of your data because it simply routes your query to an LLM and hands you back a hallucinated answer.
The African Union, through the African Medical Devices Forum, has yet to provide regulatory guidance for the use of AI in clinical healthcare and research. At the continent level, there is largely silence. Countries are developing data governance frameworks, digital health policies, and eHealth strategies, but the specific question of "what happens when a poorly-built app wrapper around ChatGPT tells someone the wrong dose of medication" remains largely unaddressed.
ECRI advises that healthcare professionals exercise caution whenever using a chatbot for information that can impact patient care, and recommends that health systems promote responsible use by establishing AI governance committees, providing clinicians with AI training, and regularly auditing AI tools' performance. All of that is sensible advice for hospital systems in wealthy countries. It is far removed from the reality of a user in rural Kenya downloading a free symptom-checker app with two thousand five-star reviews.
What Responsible AI Health Development Actually Looks Like
It would be unfair to suggest that AI has no legitimate role in healthcare. It does, and many developers are building responsibly. The distinction between a dangerous AI health app and a useful one largely comes down to a few key principles.
The first is that useful AI health tools are transparent about what they are. They do not pretend to be doctors. They position themselves as information companions, tools for health literacy, assistants for understanding terminology, or bridges to professional care. They explicitly and repeatedly encourage users to consult licensed professionals for anything beyond general information.
The second is that responsible tools are clinically validated. They have been tested against real patient populations, reviewed by medical professionals, and their limitations have been mapped and disclosed. This is time-consuming and expensive, which is exactly why the slop apps skip it.
The third is that the underlying model is appropriately constrained. Rather than instructing the AI to "always be confident," responsible developers set up their systems to express uncertainty, flag serious symptoms for immediate professional attention, and decline to make specific diagnostic or treatment recommendations.
Research from Mount Sinai showed that a simple one-line warning added to a system prompt can cut hallucination rates dramatically, suggesting that small safeguards can make a significant difference. This is not hard to implement. It just requires developers who actually prioritise user safety over engagement metrics.
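As a rough sketch of what those safeguards can look like in code, consider the example below. The red-flag list, prompt wording, and function names are illustrative assumptions, not the exact setup evaluated in the Mount Sinai study.

```python
# A sketch of basic safety scaffolding for an AI health assistant.
# The red-flag keywords and safety instruction are illustrative,
# not the exact text used in the Mount Sinai research.
RED_FLAGS = ["chest pain", "difficulty breathing", "severe bleeding", "suicidal"]

SAFETY_LINE = (
    "If you are uncertain, say so explicitly. Do not invent facts. "
    "Never diagnose; direct any serious symptom to a licensed clinician."
)

def safe_reply(user_message: str, model_call) -> str:
    # Escalate red-flag symptoms before the model ever sees them.
    text = user_message.lower()
    if any(flag in text for flag in RED_FLAGS):
        return ("These symptoms can be serious. Please contact a licensed "
                "medical professional or emergency services immediately.")
    # Otherwise answer, with the safety line governing the model's persona.
    return model_call(system=SAFETY_LINE, user=user_message)

# Stub standing in for a real LLM API call, so the sketch runs as-is.
def model_call(system: str, user: str) -> str:
    return f"[model reply, constrained by: {system[:40]}...]"

print(safe_reply("I have chest pain and feel dizzy", model_call))
```

Keyword matching is crude, and a production system needs far more than this, but even scaffolding this simple changes the failure mode from confident fabrication to deferral.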
What You Should Do as a User
Understanding the landscape is the first step to navigating it safely. Here is a practical framework for evaluating any AI-powered health or wellness app.
Check whether the app explicitly acknowledges its limitations. An app that nowhere clearly states that it is not a substitute for professional medical advice is a red flag. The more confident and authoritative the app sounds without those disclaimers, the more sceptical you should be.
Ask yourself what happens when you describe a serious symptom. A well-built app should immediately redirect you to seek professional care. An app that continues to give you home remedy suggestions for symptoms that warrant urgent attention is not built responsibly.
Look for evidence of clinical validation. Legitimate health tools will typically mention whether they have been developed in consultation with medical professionals, validated in clinical settings, or cleared by health authorities. Absence of any such information is telling.
Be especially wary of percentage scores. A plant health score of 73%, a skin condition confidence rating of 91%, a blood pressure risk assessment of "moderate" presented with false precision: these are all AI outputs dressed up as diagnostic measurements. The numbers are not coming from any instrument. They are generated text.
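If you are technically inclined, you can verify this yourself. The sketch below, which assumes the OpenAI Python SDK and uses an illustrative prompt, asks the same question several times; because each score is sampled text rather than a measurement, the numbers will typically differ from run to run.

```python
# Demonstration that a "health score" is generated text, not a reading.
# Assumes the OpenAI Python SDK; the prompt is illustrative.
from openai import OpenAI

client = OpenAI()

for _ in range(3):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You are a plant pathologist. Always report a health score as a percentage."},
            {"role": "user",
             "content": "My fiddle-leaf fig has brown spots on two leaves. What is its health score?"},
        ],
    )
    print(response.choices[0].message.content)
# Expect three answers whose percentages likely disagree: no instrument
# produced them, only sampling.
```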
Finally, remember that "free" is not neutral. If the app is free, you are likely the product through your data, your attention sold to advertisers, or both. The incentive structure of ad-supported health apps does not reward caution; it rewards engagement and return visits.
The Bigger Picture
The emergence of AI slop apps is not really a technology problem. It is a literacy problem, a regulatory gap problem, and in some cases, a straightforward ethics problem.
A separate KFF survey found that younger adults, uninsured adults, and lower-income people were more likely to say they used an AI tool or chatbot for health information because they could not afford the cost of seeing a provider or were having trouble accessing healthcare. These are also the populations least likely to have the technical literacy to understand what is happening behind the interface.

This is the cruelest part of the AI slop app ecosystem. It preys most heavily on the people with the fewest alternatives. A person who can easily afford and access a doctor is unlikely to trust a free app for a serious medical concern. A person who cannot is far more likely to, and far less likely to have the context to evaluate whether the advice they are receiving is real.
Lowering the barrier to software development was always going to have consequences like this. The question now is whether regulators, app stores, developers, and the broader tech community will respond with urgency, or whether we will wait until the harm becomes impossible to ignore.
The conversation starts here. Spread it.