A Foundational Scope of Practice

AI System Psychology

AI System Psychology is the discipline that measures an AI system as a behavioural subject and protects human agency across the full life of a deployment. With Prof. Leon De Beer, I am defining its scope of practice.

Why Now

The empirical signature is already in the data

Before I argue for a new discipline, look at four findings from the last two years. Each one is a measurable loss of a human capacity that AI was supposed to support, and not one of them was read by a psychologist tasked with reading it.

Education

Skill retention after the AI was removed

[Chart: scores with AI tutor vs after the tool was withdrawn]

High-school students who practised mathematics with an unguarded AI tutor scored 17 percentage points below their peers once the tool was taken away. The capacity was borrowed, and the loan came due.

Bastani et al., 2025, PNAS

Clinical

Unaided adenoma detection rate

[Chart: unaided detection rate, pre-AI exposure vs post-AI exposure]

Endoscopists' unaided detection rate fell from 28.4 percent to 22.4 percent after routine exposure to AI-assisted colonoscopy, an absolute drop of 6.0 percentage points. That is the kind of drop that would trigger an urgent safety review for any other clinical adjunct.

Budzyń et al., 2025, Lancet Gastroenterology and Hepatology

Diagnostic

Accuracy turns on trust calibration

[Chart: diagnostic accuracy gap, physicians who judged the AI correctly vs those who did not]

Physicians who correctly judged when to trust the AI reached substantially higher diagnostic accuracy than those who did not, on the same task with the same AI output. The controllable variable lives on the human side.

Sakamoto et al., 2024, JMIR Formative Research

Social

Mental-health chatbot safety screening

0 of 29 passed

Across 29 mental-health chatbots evaluated against the Columbia Suicide Severity Rating Scale, none met the adequate-response threshold. Emotional dependence on commercial chatbots is now a documented harm.

Pichowicz et al., 2025; Laestadius et al., 2024

The Hybrid Framing

Psychology of AI systems, and psychology for AI systems

In plain terms, I do two jobs at once: I study the AI as a subject whose behaviour can be measured, and I protect the people on the receiving end of it. AI System Psychology works both sides of the human-AI relation at the same time. Each side on its own is useful research. Neither side on its own finishes the job the discipline is being asked to do.

The boundary

I claim that these systems show stable, measurable behavioural regularities. I do not claim they have minds, beliefs, or experience. The position is methodological, and its boundary is explicit.

The Disciplinary Niche

Adjacent fields each cover part of the work. None covers all of it.

I am not inventing this from nothing. Several adjacent fields each cover a slice of what AI System Psychology must do. I show what each one contributes and where it stops, because the gap between them is the discipline.

  • Machine Psychology. Inherits psychometric methods. I take its warrant that AI behaviour is a proper object of psychological assessment, then add the practitioner identity and the accountability structure it was never designed to carry. The relationship is the one experimental cognitive psychology has to clinical neuropsychology.
  • HCI and Human-AI Interaction. Inherits user-side design. I adopt its trust-calibration and human-AI design methods at the Deploying stage, then go beyond them by treating the AI as a psychometric subject in its own right.
  • Cyberpsychology. Inherits the specialty-recognition lesson. I address, by construction, the reasons it stalled: a dual unit of analysis, an HCI overlap bounded by the lifecycle, and an applied arm specified through the six stages, AI-IARA, and Agency Debt.
  • AI Ethics. Inherits the normative anchor. Ethics sets the normative position, and I supply the assessment instrument and the intervention that the principles literature does not produce. The two are complementary, not redundant.
  • AI Safety and Alignment. Inherits frontier-risk vigilance. I bring the measurement theory the eval enterprise lacks, and the human-side intervention practice the safety enterprise does not produce. RLHF and Constitutional AI are behavioural interventions, and I evaluate them as such.
  • Industrial and Organisational Psychology. Inherits the scope-of-practice template. I follow the same move that built the clinical, counselling, and I-O specialties: apply psychology's apparatus to a new target population with its own problems, competencies, and regulatory anchor. Here the target is dual.

The integration

AI System Psychology

Unit of analysis: AI behaviour and the humans interacting with it, across the lifecycle

Six-stage applied scope of practice
The Scope of Practice

Six lifecycle stages, from Architecting to Retiring

The work of an AI System Psychologist is organised around six lifecycle stages, each with its own inputs, activities, outputs, and a success criterion you can hold me to.

Stage 1

Architecting

design phase

The conceptual design phase, before any code is written.

Inputs
  • Product brief
  • Target user population
  • Use-case definition
  • Regulatory and ethical constraints
  • Organisational context
Activities
  • Behavioural target specification
  • Multi-agent decomposition with psychological logic
  • Digital-twin construct architecture
  • Persona and character design
  • AI-IARA capacity-impact pre-assessment
Outputs
  • System Psychological Specification
  • Multi-agent orchestration map
  • Digital-twin construct dictionary
  • Persona and interaction-protocol brief
  • Capacity-impact pre-assessment
Success criterion

Behavioural targets are measurable and falsifiable, and agency-preservation choices are explicit in the architecture rather than retrofitted.

What I bring here that the rest of the team cannot

I translate complex psychological processes into computational sub-tasks before engineering decisions foreclose what the system can become. After build, those decisions are hard to recover.

Stage 2

Building

training phase

Stage 3

Evaluating

validation phase

Stage 4

Deploying

rollout phase

Stage 5

Monitoring

operations phase

Stage 6

Retiring

sunset phase

Four cross-cutting dimensions surface in every stage

Model

The AI as a behavioural entity with measurable traits and dispositions.

System

The composed deployment artefact: orchestration, digital twins, multi-agent structure.

Governance

Ethics, regulation, and AI-IARA capacity protection.

Organisational

Workforce, deployment context, and change management.

The six stages are sequential, but in production they are not strictly linear. They overlap, iterate, and feed back into one another as systems are revised, retrained, and redeployed.

1. Architecting
  • Scope: The conceptual design phase, before any code is written.
  • Success criterion: Behavioural targets are measurable and falsifiable, and agency-preservation choices are explicit in the architecture rather than retrofitted.
  • Distinctive contribution: I translate complex psychological processes into computational sub-tasks before engineering decisions foreclose what the system can become. After build, those decisions are hard to recover.

2. Building
  • Scope: System development and model training, where reinforcement learning from human feedback and Constitutional AI do the real psychological work.
  • Success criterion: The trained model meets its behavioural targets within tolerance, and AI-IARA-relevant behaviours stay within bounds across training stages.
  • Distinctive contribution: I treat RLHF and Constitutional AI as behavioural interventions, structurally close to operant shaping and to cognitive behavioural therapy. They need the evaluative discipline those analogies carry, not engineering judgement alone.

3. Evaluating
  • Scope: Pre-deployment psychometric characterisation. The stage where this discipline differs most from current AI safety practice.
  • Success criterion: Profiles are reproducible across independent investigators, benchmarks pass construct-validity standards, and reports carry calibrated uncertainty rather than headline accuracy alone.
  • Distinctive contribution: The AI community has built an enormous eval infrastructure without measurement theory. I am the practitioner who supplies it. A benchmark labelled reasoning has to defend the inference from item performance to the construct, exactly as a depression inventory must.

4. Deploying
  • Scope: Integration into the organisational context, where the psychological work shifts from the system to the people around it.
  • Success criterion: The workforce meets a literacy floor, trust calibration is verifiably present rather than assumed, and baseline AI-IARA capacities are measured before exposure so drift can be detected later.
  • Distinctive contribution: Whether AI augments or undermines expert performance is decided here, not at training. Two teams using the same model can produce very different outcomes, and the variance lives at deployment.

5. Monitoring
  • Scope: Longitudinal surveillance during deployment. The stage where Agency Debt accumulates or is repaid.
  • Success criterion: Drift is detected within a specified latency, capability stays within bounds across all six AI-IARA strata, and tail prevalence stays below the clinical threshold.
  • Distinctive contribution: I treat routine deployment as an ongoing measurement problem, not a settled engineering outcome. The signal is in the longitudinal data, and reading it has not until now been anyone's job.

6. Retiring
  • Scope: Decommissioning and successor specification, the stage current AI governance most neglects.
  • Success criterion: The protocol respects user dependency, agency restoration is documented in measurable outcomes, and the archive enables successor continuity.
  • Distinctive contribution: Removing an AI system is a psychological transition, not a technical end-of-life. Withdrawal without scaffolding produces measurable harm, so a badly managed retirement is its own source of Agency Debt.

The Backbone

AI-IARA: six capacities of human agency

Everything in the lifecycle is held together by one framework. AI-IARA names human agency as six capacities. Each one is trainable. Each one is erodable through routine AI use. And each one is a construct my psychometric toolkit can measure, at the level of an individual and a population.

Awareness
Interpretation
Intention
Action
Relational Agency
Autonomy

The theoretical lineage

These capacities are not philosophical primitives. They are constructs anchored in established psychology.

The capabilities approach

Sen, Nussbaum

Wellbeing is the freedom to do and be what one has reason to value, and that freedom needs specific functional capacities, not abstract entitlements.

Human agency

Bandura

Agency is the exercise of intentional influence over one's own functioning and life circumstances. This is what separates agentic capacity from passive disposition.

Self-determination theory

Ryan and Deci

Autonomy, competence, and relatedness are basic psychological needs whose support or thwarting is consequential for wellbeing.

The Accountability Metric

Agency Debt: what I hold my own practice to

Agency Debt is my one-number answer to a plain question: how much of a person's own capability has quietly eroded while they leaned on the AI, and for whom. If you measure one thing about a deployed AI system, measure this. Formally, it is the cumulative shortfall in the six capacities against a pre-exposure baseline, measured at fixed points in time and floored at zero so a recovery in one place never hides a loss in another. I report it three ways on purpose, because any single number conceals part of the picture a governance body needs.

Illustrative scenario (three time points: Baseline, Wave 1, Wave 2), shown here at Baseline

Per-capacity vs baseline

Current capacity on a 0 to 100 scale across the six AI-IARA strata: Awareness, Interpretation, Intention, Action, Relational, Autonomy.

Compound Agency Debt index

0.00 at baseline: low debt. Pre-exposure baseline. Nothing has been borrowed yet.

Tail prevalence

The share of users with at least one capacity below the clinical threshold. An average can stay green while this tail accumulates.

Quantity 1

Per-capacity population-mean shortfall

For each capacity c, the average gap between where a user started (baseline B) and where they are now (current C), with any gain above baseline floored at zero. This is the Foster-Greer-Thorbecke (FGT) poverty-gap measure (the same maths economists use to size how far the poor fall below a poverty line), moved from income to capacity space.

It preserves which capacity is degrading, which is the information a clinician needs to target an intervention.
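
As a minimal sketch of this quantity, assuming each user has a baseline and a current score per capacity on the 0 to 100 scale, the shortfall for capacity c can be computed as the mean over users of max(0, B - C). The function name, the capacity keys, and the data are hypothetical, and the shortfall is left in raw score points rather than normalised, which is an assumption of this sketch and not a confirmed detail of the method.

```python
# Hypothetical capacity keys; the names follow the six AI-IARA capacities.
CAPACITIES = ("awareness", "interpretation", "intention",
              "action", "relational", "autonomy")


def per_capacity_shortfall(baseline, current):
    """Return the population-mean floored shortfall for each capacity.

    baseline and current are equal-length lists of per-user dicts that map
    each capacity name to a score on the 0 to 100 scale.
    """
    if not baseline or len(baseline) != len(current):
        raise ValueError("need matching, non-empty baseline and current lists")
    n_users = len(baseline)
    return {
        c: sum(max(0.0, b[c] - now[c]) for b, now in zip(baseline, current)) / n_users
        for c in CAPACITIES
    }


# Illustrative data only: two users who both start at 70 on every capacity.
baseline = [{c: 70.0 for c in CAPACITIES} for _ in range(2)]
current = [dict(user) for user in baseline]
current[0]["awareness"] = 60.0   # user 0 lost 10 points of awareness
current[0]["autonomy"] = 80.0    # user 0 gained 10 points; the gap is floored to 0
current[1]["awareness"] = 50.0   # user 1 lost 20 points of awareness

shortfall = per_capacity_shortfall(baseline, current)
print(shortfall["awareness"], shortfall["autonomy"])  # 15.0 0.0
```

The gain in autonomy contributes nothing, so it cannot offset the loss in awareness, and the result stays a six-element vector that shows which capacity is degrading.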


Quantity 2

Capacity-weighted compound index

One weighted summary figure for governance reporting and cross-deployment comparison. The weights are derived by expert consensus or by regressing functional outcomes on capacity scores, not assumed equal.

It deliberately loses information for the sake of comparability, so the six-vector must stay available beside it.
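
A minimal sketch of the compound index, assuming it is a weighted sum of the six per-capacity shortfalls with non-negative weights that sum to 1. The function name, the normalisation, and the weight values are placeholders for illustration; they are not the result of any expert consensus or regression.

```python
import math


def compound_index(shortfall, weights):
    """Weighted sum of per-capacity shortfalls.

    shortfall and weights are dicts keyed by capacity name. Weights are
    assumed non-negative and summing to 1 (an assumption of this sketch).
    """
    if set(shortfall) != set(weights):
        raise ValueError("shortfall and weights must cover the same capacities")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[c] * shortfall[c] for c in shortfall)


# Placeholder values, not derived from any data or expert panel.
shortfall = {"awareness": 15.0, "interpretation": 4.0, "intention": 0.0,
             "action": 0.0, "relational": 0.0, "autonomy": 0.0}
weights = {"awareness": 0.25, "interpretation": 0.25, "intention": 0.125,
           "action": 0.125, "relational": 0.125, "autonomy": 0.125}

index = compound_index(shortfall, weights)
print(index)  # 4.75
```

The single figure of 4.75 no longer says that awareness carries most of the loss, which is why the per-capacity vector has to be reported next to it.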


Quantity 3

Tail-prevalence statistic

The proportion of users with at least one capacity below a clinical threshold. This is the union-deprivation headcount (the share of people who fall short on at least one dimension) from multidimensional poverty measurement.

The compound index is an average, so it can stay green while a clinically significant tail accumulates. The same logic drives pharmacovigilance to track adverse events, not means alone.
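
A minimal sketch of the tail-prevalence statistic, assuming current scores on the 0 to 100 scale and a single threshold applied to every capacity. The function name, the threshold of 40, and the data are hypothetical; a real clinical threshold would need to be validated per capacity.

```python
def tail_prevalence(current, threshold):
    """Share of users with at least one capacity score below threshold."""
    if not current:
        raise ValueError("need at least one user")
    flagged = sum(
        1 for user in current
        if any(score < threshold for score in user.values())
    )
    return flagged / len(current)


# Illustrative data only; 40 is a placeholder, not a validated clinical cut-off.
CLINICAL_THRESHOLD = 40.0
names = ("awareness", "interpretation", "intention",
         "action", "relational", "autonomy")
typical = {c: 70.0 for c in names}

# Three users sit at 70 everywhere; one user has dropped to 35 on autonomy.
current = [dict(typical), dict(typical), dict(typical),
           dict(typical, autonomy=35.0)]

prevalence = tail_prevalence(current, CLINICAL_THRESHOLD)
mean_autonomy = sum(user["autonomy"] for user in current) / len(current)
print(prevalence, mean_autonomy)  # 0.25 61.25
```

In this toy population the mean autonomy score of 61.25 sits well above the placeholder threshold, while a quarter of users fall below it on at least one capacity.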

The Practitioner

A T-shape: broad technical literacy, three deep pillars

A scope of practice is empty without a competency model. The AI System Psychologist is a T-shape: broad working literacy across the AI technical stack, crossed with three deep specialty pillars, each mapped to the lifecycle stages where it does the most work.

Breadth

Technical literacy

Enough depth to specify requirements, evaluate outputs, and intervene credibly at the design table. Not equivalence with a machine-learning engineer.

  • Contemporary AI architectures
  • RLHF and Constitutional AI
  • Mechanistic interpretability
  • Eval methodology and benchmark design
  • Prompt and prompt-flow architecture
  • The training-to-deployment toolchain

The value sits in the intersection of breadth and depth, not in either alone. Deep psychometrics without technical breadth produces a researcher who cannot sit at the design table. Technical breadth without psychometric depth produces another safety engineer.

Will It Hold Up

Five objections, answered directly

I would rather anticipate the strongest critiques than wait for them. Here are five, including the one I find most serious, and the terms on which I answer each.

We are the last generation of psychologists who will have studied humans whose minds were not yet substantially formed under conditions of routine AI exposure. The difference is measurable. The gap is intervenable. And the responsibility is psychology's, whether the field accepts it or not.

Prof. Llewellyn E. van Zyl (Ph.D)

Chief Solutions Architect, Psynalytics

Read the work, run the audit, or talk to me

AI System Psychology is a foundational proposal, co-authored with Prof. Leon De Beer. Read the AI-IARA paper for the full argument, run the AI-IARA audit on your own system in about fifteen minutes, or contact me to discuss a deployment across the lifecycle.