Where governance frameworks stop
Generic AI governance frameworks treat people-impact AI like every other AI. NIST AI RMF, ISO 42001, and even most of the EU AI Act's high-risk operative provisions read as if a hiring tool, a credit-scoring tool, and an industrial maintenance optimiser were the same thing under the hood. They are not. The audit object is different. The failure modes are different. The evidence the regulator and the litigator will ask for is different.
When governance teams adopt a generic framework and apply it uniformly across people-impact AI, the result is a documentation set that satisfies the auditor's checklist and misses the actual risk. The system passes the procedural review and harms people quietly at scale.
What's different about people-impact AI
The difference is rooted in measurement. Psychometric measurement constraints do not apply to a robot arm. Construct validity, measurement invariance, differential prediction, and contestability are required for systems that make decisions about people, and optional or absent for systems that make decisions about machines.
A predictive maintenance algorithm that misjudges a bearing wears the bearing. A predictive hiring algorithm that misjudges a candidate ends a career. The mathematics may be similar; the audit object is not.
Three failure modes of generic frameworks
First failure: adverse impact alone is not fairness.
Most procurement teams accept an adverse impact ratio at or above 0.8, the four-fifths rule, as evidence of fairness. It is not. Adverse impact can sit inside the threshold while measurement invariance fails across demographic groups, which means the same score signals different latent levels of the construct. The score can also over- or under-predict performance for a given group, which is differential prediction: the score is biased even when the selection ratio is balanced. Generic governance asks for the ratio. People-impact governance asks for invariance and differential prediction evidence as well.
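A minimal sketch of the gap, in Python with NumPy only and entirely synthetic data (the function names and numbers are illustrative, not part of any framework): the adverse impact ratio can sit comfortably above 0.8 while a Cleary-style moderated regression shows the score predicting outcomes differently across groups. A real validation study would add significance tests and latent-variable invariance testing; this only shows why the ratio alone is not enough.

```python
import numpy as np

def adverse_impact_ratio(selected, group):
    """Selection rate of the focal group (group == 1) divided by the selection
    rate of the reference group (group == 0); the four-fifths rule compares
    this ratio to 0.80."""
    selected = np.asarray(selected, dtype=float)
    group = np.asarray(group)
    return selected[group == 1].mean() / selected[group == 0].mean()

def differential_prediction_terms(score, outcome, group):
    """Cleary-style moderated regression: outcome ~ score + group + score*group.
    A non-trivial group coefficient (intercept gap) or interaction coefficient
    (slope gap) means the same score predicts different outcomes by group."""
    score, outcome, group = (np.asarray(a, dtype=float) for a in (score, outcome, group))
    X = np.column_stack([np.ones_like(score), score, group, score * group])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return {"intercept_gap": beta[2], "slope_gap": beta[3]}

# Synthetic illustration: selection depends on the score alone, so the adverse
# impact ratio is near 1.0, yet the score-outcome relationship differs by group.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 500)
score = rng.normal(0.0, 1.0, 500)
outcome = 0.5 * score + 0.4 * group * score + rng.normal(0.0, 1.0, 500)
selected = (score > 0.5).astype(int)

print(adverse_impact_ratio(selected, group))                  # sits above the 0.8 threshold
print(differential_prediction_terms(score, outcome, group))   # slope_gap well away from 0
```

The second print is the evidence a generic checklist never asks for.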
Second failure: re-validation is not monitoring.
Generic frameworks ask for periodic re-validation, usually annually. Annual re-validation catches almost nothing in a system with feedback loops. Construct drift, proxy collapse, and feedback contamination operate on a weekly or monthly timescale. The drift layer requires named owners, named signals, and rollback authority. None of this lives in the NIST AI RMF or ISO 42001 in any operationalised form for people-impact AI.
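As a sketch of what a drift-layer signal looks like at that timescale, here is a weekly check using the population stability index on score distributions. The threshold, the cadence, and the rollback callable are assumptions for the example; a real drift layer tracks several signals (construct drift, proxy collapse, feedback contamination) and wires the rollback to someone with actual authority to pull the model out of the decision path.

```python
import numpy as np

PSI_ALERT = 0.2  # common rule-of-thumb threshold; calibrate per system, not a standard

def population_stability_index(baseline, current, bins=10):
    """PSI between the validation-time score distribution and this week's scores.
    Values above ~0.2 are conventionally treated as meaningful drift."""
    edges = np.quantile(np.asarray(baseline, dtype=float), np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    expected = np.histogram(baseline, edges)[0] / len(baseline)
    actual = np.histogram(current, edges)[0] / len(current)
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

def weekly_drift_check(baseline_scores, weekly_scores, owner, rollback):
    """The named owner is alerted on every breach; `rollback` is a callable with
    authority to take the model out of the decision path, not just open a ticket."""
    psi = population_stability_index(baseline_scores, weekly_scores)
    if psi > PSI_ALERT:
        rollback(reason=f"score drift, PSI={psi:.3f}", owner=owner)
    return psi
```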
Third failure: human-in-the-loop without contestability is theatre.
Most generic frameworks satisfy the human-oversight requirement with a single human reviewer at the top of the decision pipeline. The reviewer rubber-stamps the AI score in 95% of cases. The reviewer's role is not contestability; it is approval. Contestability is the affected person's procedural right to see, question, and appeal the AI's decision. Without it, the human-in-the-loop is a documentation artefact, not an audit defence.
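One way to see the difference between sign-off and contestability is what the record has to contain. The sketch below is an illustrative schema, not a standard or any library's API; the field names are assumptions. The check makes the point: a decision is only contestable if the affected person was given reasons, has a named appeal route, and the human handling the appeal can actually overturn the score.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ContestabilityRecord:
    """Minimum evidence that an affected person could see, question, and appeal
    an AI-assisted decision -- a different artefact from a reviewer sign-off log."""
    decision_id: str
    explanation_shared: str        # the reasons disclosed to the person, not an internal score dump
    disclosed_at: datetime
    appeal_channel: str            # the named route for contesting, not a generic inbox
    reviewer_can_overturn: bool    # the human has authority to change the outcome, not just approve it
    appeal_filed: bool = False
    appeal_outcome: Optional[str] = None

def is_contestable(record: ContestabilityRecord) -> bool:
    """A rubber-stamp approval fails this check; a real appeal path passes it."""
    return bool(record.explanation_shared) and bool(record.appeal_channel) and record.reviewer_can_overturn
```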
What people-impact governance requires
The five-layer Validity Stack is the specialisation. Construct, Calibration, Cohort, Drift, Contestability. Every people-impact AI system in deployment should produce evidence at each layer, on a continuous cadence, with a named owner per layer.
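A sketch of what that looks like as an evidence register. The owners, cadences, and artefacts below are placeholders for the example, not values prescribed by the methodology; the point is that each layer carries a named owner and a cadence the audit file can be checked against.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LayerEvidence:
    layer: str         # one of the five Validity Stack layers
    owner: str         # a named individual, not a team alias
    cadence_days: int  # how often fresh evidence must land
    artefact: str      # what the audit file actually contains

# Illustrative register for one hiring system; every value here is a placeholder.
VALIDITY_STACK = [
    LayerEvidence("Construct",      "head_of_assessment",    180, "construct validity study against the job analysis"),
    LayerEvidence("Calibration",    "ml_lead",                30, "score-to-outcome calibration report"),
    LayerEvidence("Cohort",         "people_analytics_lead",  30, "invariance and differential prediction by group"),
    LayerEvidence("Drift",          "mlops_owner",             7, "drift signals, alerts, and rollback log"),
    LayerEvidence("Contestability", "hr_compliance_owner",    30, "appeals raised, outcomes, overturn rate"),
]

def overdue_layers(stack, days_since_evidence):
    """Flag layers whose most recent evidence is older than the layer's cadence."""
    return [l.layer for l in stack if days_since_evidence.get(l.layer, float("inf")) > l.cadence_days]
```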
The methodology is documented on the AI psychology hub. The buyer-facing version, with a worked example, is on the AI-driven assessments page. The interactive self-assessment is the AI-IARA audit tool.
Stop using bare AI governance
The phrase AI governance, used unqualified, is too generic to be defensible. It needs to tell procurement that the team has more than a checklist and tell the regulator that the team understands its category; on its own, it does neither. Call the work what it actually is. AI governance for people-impact systems. Or AI governance for human-decision AI. Or psychometric AI assurance. The qualifier is the difference between governance theatre and audit-defensible evidence.
If you are running a generic AI governance programme and your portfolio includes hiring, performance, wellbeing, or assessment AI, you have the wrong tool for the job. The same governance team can do the work; the audit object has to change.
What to do next
If you are scoping a programme, the engagement that produces the audit-ready evidence pack is AI assurance for people-impact systems. It runs four to eight weeks per system in scope.
If you are evaluating one specific system, the AI-IARA self-assessment will tell you in 15 minutes whether the gap is small enough to fix in-house or large enough to need a formal audit.