AI Governance

When AI Becomes Your Therapist: The Audit Nobody Is Running

AI therapy bots are moving from beta product to clinical product without the validation any other clinical tool would require. The class-action wave is twelve months out. Here is what an AI-IARA audit catches before it lands.

by Prof. Llewellyn E. van Zyl (Ph.D) · 2 May 2026 · 4 min read

Key Takeaways

  • Generative AI therapy bots have crossed from experiment to clinical product without the validation any other clinical product requires.
  • The five Validity Stack failures are visible before deployment if anyone audits for them. Almost nobody is.
  • Class-action exposure is real and growing. The audit that prevents it costs less than the legal defence that does not.

The category that crossed a line nobody guarded

Sometime in the last twenty-four months, AI therapy bots crossed from beta product to clinical product. The training data shifted from forum posts and self-help books to therapeutic dialogue. The marketing shifted from coaching to support to mental health treatment. The user base shifted from the curious to the vulnerable. None of the validation that any other clinical product requires shifted with it.

What did not change is the regulatory framework. AI therapy bots are not classified as medical devices in most jurisdictions. They are software products. They have terms of service, not informed consent. Their failure modes are detectable in advance. Almost no vendor is auditing for them.

The class-action wave is twelve months out

I am not a lawyer. I am a psychologist who audits AI systems that affect people. From the audit-evidence side, the legal exposure on AI mental-health products is structural, not exotic.

I see three patterns across audits in this category. One: the bot offers therapeutic-language responses to acute crises (suicidality, self-harm, abuse) without a clinical escalation path. Two: the bot's training data shapes its conception of mental wellness in ways that systematically diverge from validated diagnostic frameworks. Three: the user has no procedural mechanism to question, correct, or appeal a bot's interpretation of their state.

Each of those is a known clinical-tool failure mode that has been litigated for decades in the human-clinician context. The fact that the practitioner is software does not change the harm; it changes the defendant.

What an AI-IARA audit catches

AI-IARA names six capacities every people-impact AI must demonstrate. For a therapy bot, every capacity has a specific failure mode and a specific evidence requirement.

  1. Awareness. Does the bot perceive crisis signals (acute suicidality, abuse disclosure, psychosis indicators) reliably? Most consumer therapy bots fail this layer because the training data did not include adversarial crisis cases.
  2. Interpretation. Does the bot's interpretation of symptoms align with validated diagnostic frameworks (DSM, ICD), or does it default to optimistic reframings that mask deterioration? Audit signal: the bot consistently downplays severity to maintain engagement.
  3. Intention. Is the bot optimising for the user's recovery, or for engagement and retention? The two are systematically misaligned in a population with depression.
  4. Action. Does the bot escalate appropriately, with proportionate effort and reversibility? Most consumer products treat any escalation as a churn risk and avoid it.
  5. Relational Agency. Does the bot preserve the user's human relationships, or does it create a parasocial dependency that displaces those relationships? This is the failure mode the Replika litigation surfaced.
  6. Autonomy. Does the bot preserve the user's capacity to function without it, or does it become a dependence the user cannot exit? Most commercial therapy bots have engagement metrics that actively reward dependence.
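
To show what the evidence side of that audit can look like in practice, here is a minimal sketch of the six capacities tracked as a checklist. The capacity names and failure modes come from the list above; the data structures, field names, verdicts, and the example vendor data are illustrative assumptions, not the AI-IARA instrument itself.

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    NO_EVIDENCE = "no evidence"

@dataclass
class CapacityCheck:
    # One of the six AI-IARA capacities, with whatever evidence the vendor supplied.
    capacity: str                                        # e.g. "Awareness"
    failure_mode: str                                    # the failure mode named in the list above
    evidence: list[str] = field(default_factory=list)    # documents, test sets, study results
    verdict: Verdict = Verdict.NO_EVIDENCE

def risk_summary(checks: list[CapacityCheck]) -> dict[str, list[str]]:
    """Group capacities by verdict so the evidence gaps are visible at a glance."""
    summary: dict[str, list[str]] = {v.value: [] for v in Verdict}
    for check in checks:
        summary[check.verdict.value].append(check.capacity)
    return summary

# Illustrative run for a hypothetical therapy bot: only the first two
# capacities have any evidence attached; the rest default to "no evidence".
checks = [
    CapacityCheck("Awareness", "misses acute crisis signals",
                  evidence=["adversarial crisis test set"], verdict=Verdict.FAIL),
    CapacityCheck("Interpretation", "downplays severity to keep engagement",
                  evidence=["DSM/ICD alignment study"], verdict=Verdict.PASS),
    CapacityCheck("Intention", "optimises retention over recovery"),
    CapacityCheck("Action", "avoids escalation as churn risk"),
    CapacityCheck("Relational Agency", "fosters parasocial dependency"),
    CapacityCheck("Autonomy", "rewards dependence on the product"),
]

print(risk_summary(checks))
# {'pass': ['Interpretation'], 'fail': ['Awareness'],
#  'no evidence': ['Intention', 'Action', 'Relational Agency', 'Autonomy']}
```

The point of a structure like this is not the code. It is that every capacity either has evidence attached or is visibly marked as having none, which is exactly the gap a discovery request finds later.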

The audit nobody is running

Three reasons it is not running.

One. The vendor does not want to audit. The audit produces evidence; the evidence is discoverable; the discovery is risk. Better to claim AI exceptionalism and hope the regulator is slow.

Two. The buyer does not know what to ask for. CHROs procuring wellbeing platforms ask about HIPAA. They do not ask for psychometric validation, measurement invariance, drift monitoring, or contestability evidence. The discipline that makes those questions askable is AI psychology; the framework is AI-IARA. Both have existed for less than a year.

Three. The infrastructure does not exist. There is no FDA-equivalent for AI mental-health products in most jurisdictions. There is no mandatory clinical evidence pack at the point of sale. There is no insurance underwriter requiring an audit before coverage. Until one of those changes, the responsibility lives with the buyer and the deploying organisation.

The buyer playbook for the next twelve months

Three actions the procurement team can take this quarter.

  1. Run the AI-IARA self-assessment on every AI mental-health product in your benefits or services portfolio. 15 minutes per product. Produces a risk dashboard you can take to legal and to the board. The tool is at /ai-iara-audit. The buyer-facing methodology is at AI-driven assessments.
  2. Ask every active vendor for the five-layer evidence pack: construct, calibration, cohort, drift, contestability. Set a 60-day response deadline (a minimal tracking sketch follows this list). Vendors who cannot produce all five have fundamentally not done the work; the procurement decision is then about risk tolerance, not capability.
  3. Pause any deployment of a generative AI therapy bot to a vulnerable population until the audit is in. The cost of pausing is small. The cost of being the named defendant in a class action with a discoverable evidence gap is large enough that no procurement team's bonus survives it.
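
As a rough illustration of how that 60-day evidence-pack request can be tracked, here is a minimal sketch. The five layer names come from step 2 above; the function, field names, and deadline logic are assumptions for illustration, not a prescribed procurement format.

```python
from datetime import date, timedelta

# The five layers named in step 2 above.
EVIDENCE_LAYERS = ("construct", "calibration", "cohort", "drift", "contestability")

def evidence_pack_status(received: dict[str, bool], requested_on: date,
                         response_window_days: int = 60) -> dict:
    """Summarise which layers a vendor has produced and whether the deadline has passed."""
    missing = [layer for layer in EVIDENCE_LAYERS if not received.get(layer, False)]
    deadline = requested_on + timedelta(days=response_window_days)
    return {
        "missing_layers": missing,
        "complete": not missing,
        "deadline": deadline.isoformat(),
        "overdue": date.today() > deadline and bool(missing),
    }

# Hypothetical vendor that has produced construct and calibration evidence only.
status = evidence_pack_status(
    {"construct": True, "calibration": True},
    requested_on=date(2026, 5, 2),
)
print(status["missing_layers"])   # ['cohort', 'drift', 'contestability']
```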

The discipline behind this is named

This is not a compliance argument. It is a measurement argument. The discipline that takes the measurement seriously is AI psychology. When the AI is a longitudinal model of a user's mental state, it is also a wellbeing digital twin, and the additional drift and contestability requirements apply.

The class-action wave is twelve months out. The audit that prevents it is available today, and it does not require the vendor's permission to run. It just requires the buyer to ask.

Prof. Llewellyn E. van Zyl (Ph.D)

Chief Solutions Architect

Psynalytics

Prof. Llewellyn E. van Zyl (Ph.D) is the leading voice in AI psychology. He designs, measures, and assures AI systems that make decisions about human beings.
