The silent failure
Construct drift is the most expensive failure mode in deployed people-impact AI. Expensive because it accumulates silently for years. Silent because the model weights have not changed; the dashboards still report the same numbers; the team running the system has no reason to suspect anything is wrong.
What has changed is the relationship between the construct the system claims to measure and what the system is actually measuring. The score still says wellbeing. What it now measures is engagement with the wellbeing tool. Same number, different meaning, and the gap keeps growing.
How drift works mechanically
Three steps, repeating in a loop:
- The system produces inferences about people. The inferences shape interventions, recommendations, decisions.
- The interventions change the people's behaviour. Subtly, often imperceptibly, but cumulatively across the population.
- The new behaviour feeds back into the system as data. The system updates its operational priors. The construct it is now modelling is not the construct it was validated for.
Frozen model weights do not save you here. The drift is in the data and in the operational interpretation, not in the model. A model that has not been retrained for two years can be measuring something materially different from what it measured at launch.
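The loop can be made concrete with a toy simulation. The sketch below is illustrative only: the synthetic population, the `tool_share` variable, the rule that the lowest-scoring fifth receive an intervention, and the `nudge_step` size are all assumptions invented for the example. The scoring weight is fixed throughout, yet the correlation between score and the validated construct falls while the correlation with tool engagement rises.

```python
import random

FROZEN_WEIGHT = 1.0  # the "model": never retrained during the simulation


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_drift(rounds=8, n=2000, nudge_step=0.5, seed=0):
    """Return one (corr with wellbeing, corr with engagement) pair per round."""
    rng = random.Random(seed)
    wellbeing = [rng.gauss(0, 1) for _ in range(n)]   # the validated construct
    engagement = [rng.gauss(0, 1) for _ in range(n)]  # tool usage, independent of it
    tool_share = [0.0] * n  # hypothetical: share of a person's input driven by tool usage
    history = []
    for _ in range(rounds):
        # Step 1: the system produces inferences from whatever data arrives.
        inputs = [
            (1 - s) * w + s * e + rng.gauss(0, 0.3)
            for w, e, s in zip(wellbeing, engagement, tool_share)
        ]
        scores = [FROZEN_WEIGHT * x for x in inputs]
        history.append((pearson(scores, wellbeing), pearson(scores, engagement)))
        # Step 2: the lowest-scoring fifth receive an intervention (assumed rule).
        lowest = sorted(range(n), key=lambda i: scores[i])[: n // 5]
        # Step 3: their later inputs reflect tool usage more than wellbeing.
        for i in lowest:
            tool_share[i] = min(1.0, tool_share[i] + nudge_step)
    return history


if __name__ == "__main__":
    for r, (wb, eng) in enumerate(simulate_drift()):
        print(f"round {r}: corr(score, wellbeing)={wb:.2f}  corr(score, engagement)={eng:.2f}")
```

Nothing in the simulation touches `FROZEN_WEIGHT`. All of the change comes from the data the interventions produce, which is the point of the paragraph above.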
Why annual re-validation is not drift monitoring
Most generic AI governance frameworks ask for annual re-validation. The AI assessment vendor produces a report, procurement checks a box, and the system runs for another year. The cadence assumes drift is slow and detectable retrospectively.
Construct drift can move faster than an annual cycle and is detectable only with the right signals. By the time an annual report shows it, the system has been making materially different decisions about people for months. The audit-defensible cadence is continuous: signals watched at the daily, weekly, and monthly horizons, with thresholds that trigger pause or rollback.
The full methodology lives in the AI-IARA framework; the buyer-facing walkthrough is at AI-driven assessments.
The signals that actually catch drift
Five signals, on a tiered watch cadence:
- Inter-rater divergence. When the AI score diverges systematically from human-rated samples on the same construct. Watch weekly with a sampling protocol.
- Response-pattern shift. When the distribution of inputs to the system shifts in ways the validation set did not anticipate. Watch daily.
- KPI decoupling. When the outcome the score is supposed to predict and the score itself stop tracking each other. Watch monthly.
- Subgroup-specific divergence. When drift is happening only in certain populations, indicating differential drift on top of any baseline drift. Watch monthly with subgroup stratification.
- Contestability rate. When the rate at which people contest scores rises sharply, even before any other signal moves. Watch continuously; spikes are an early-warning indicator.
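Each of these signals reduces to a small computation over logged data. The sketch below shows one possible metric per signal; the metric choices (population stability index for response-pattern shift, Pearson correlation for KPI decoupling, mean signed gap for inter-rater divergence, a rate ratio for contestability) and all function names are assumptions made for illustration, not the metrics any particular framework prescribes.

```python
import math


def population_stability_index(baseline, current, bins=10, floor=1e-6):
    """Response-pattern shift: PSI over equal-width bins of the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins so the logarithm below stays defined.
        return [max(c / len(sample), floor) for c in counts]

    base, cur = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


def pearson(xs, ys):
    """KPI decoupling: correlation between the score and the outcome it should predict."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def mean_signed_gap(ai_scores, human_scores):
    """Inter-rater divergence: average (AI - human) on a paired sample."""
    return sum(a - h for a, h in zip(ai_scores, human_scores)) / len(ai_scores)


def contest_rate_ratio(contested, scored, baseline_rate):
    """Contestability: current contest rate relative to the rate at launch."""
    return (contested / scored) / baseline_rate


if __name__ == "__main__":
    launch_inputs = [i / 100 for i in range(100)]
    this_week = [x + 0.3 for x in launch_inputs]  # synthetic shifted inputs
    print("PSI:", population_stability_index(launch_inputs, this_week))
    print("gap:", mean_signed_gap([3.1, 4.0, 2.8], [3.0, 3.5, 2.9]))
```

Subgroup-specific divergence needs no separate function in this sketch: the same metrics are computed once per subgroup and compared against the population-level value.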
Named owners and rollback authority
Drift signals without owners catch nothing. The defensible deployment names a single person accountable for the watch, with the authority to pause the system, retrain the model, or roll back to the previous version. The owner has explicit thresholds (defined in advance, not negotiated under pressure) at which each action fires.
Procurement contracts that do not name the human owner of drift monitoring are a red flag. Without a named owner, the watch is theatre.
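One way to keep thresholds from being negotiated under pressure is to write them down as data before launch. The sketch below is a minimal illustration: the `DriftPolicy` structure, the signal names, the owner string, and every threshold value are hypothetical placeholders, and the retrain action is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CONTINUE = 0
    PAUSE = 1
    ROLLBACK = 2


@dataclass(frozen=True)
class Threshold:
    pause_at: float     # reading at or above this pauses the system
    rollback_at: float  # reading at or above this rolls back to the previous version


@dataclass(frozen=True)
class DriftPolicy:
    owner: str                        # the single named person with authority to act
    thresholds: dict[str, Threshold]  # agreed in advance, one entry per signal

    def decide(self, readings: dict[str, float]) -> Action:
        """Return the most severe action triggered by any reading."""
        worst = Action.CONTINUE
        for signal, value in readings.items():
            # A KeyError here is deliberate: a signal with no agreed
            # threshold is a gap in the policy, not something to skip.
            limit = self.thresholds[signal]
            if value >= limit.rollback_at:
                action = Action.ROLLBACK
            elif value >= limit.pause_at:
                action = Action.PAUSE
            else:
                action = Action.CONTINUE
            if action.value > worst.value:
                worst = action
        return worst


# Hypothetical values for illustration only.
policy = DriftPolicy(
    owner="named-owner@example.org",
    thresholds={
        "input_psi": Threshold(pause_at=0.10, rollback_at=0.25),
        "contest_rate_ratio": Threshold(pause_at=2.0, rollback_at=4.0),
    },
)

if __name__ == "__main__":
    print(policy.decide({"input_psi": 0.12, "contest_rate_ratio": 1.1}))
```

Because the policy is frozen data with a named owner, the decision at review time is a lookup rather than a debate.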
If you are deploying an AI assessment today
Three questions, in this order. One: which of the five signals are we watching, on what cadence? Two: what are the thresholds at which we pause or roll back, and who has the authority to call them? Three: when did we last validate that the construct we are measuring is still the construct we validated at launch?
If you cannot answer all three, the system is operating outside any defensible AI assurance regime. The fix is not complicated; it is just rarely prioritised. The AI-IARA self-assessment produces the gap analysis in 15 minutes. The discipline behind the framework is AI psychology.












