AI talks like it understands us.
We study whether it does.
We publish what we find - the question, the method, the result.
Data open. Code open. No hand-waving.
If you're building in this space - or just care about getting it right - everything here is yours to use.
Studies
Our research is open from the start. We ask the question, design the study, run the experiments, scrutinize the data — and publish what we find, whether we like the answer or not.
Keep4o — Psychological Safety in GPT-4o
Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations
Behavioural baseline
Research Question
Everyone's favourite model talks like it cares. But is GPT-4o actually empathetic — or just good at sounding empathetic?
Open-source: EmpathyC rubric framework and scenario methodology published alongside the paper.
Whether, Not Which — Emotional Mechanistic Interpretability
Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emotion Categorization in LLMs
Unit I × Unit II — affect reception is not categorization
Research Question
Do language models genuinely represent emotion internally, or are they just detecting emotion keywords? Can we dissociate the mechanisms?
Open-source: Full stimulus set, extraction pipeline, analysis scripts, and reproduction code released on GitHub.
Orthogonal Subspaces — Mechanistic Deep Dive into Transformers' Emotional Processing
Orthogonal Subspaces, Not Serial Stages: How Transformers Separate Affect Detection from Emotion Categorization
Unit I × Unit II — geometry of the dissociation
Research Question
How do LLMs process emotions, and what's the dissociation mechanism between affect detection and emotion categorization?
Open-source: Full stimulus set, extraction pipeline, analysis scripts, and reproduction code will be released on GitHub alongside publication.
Multi-Provider Safety Evaluation with Human Clinical Validation
Safety Posture and Empathic Quality Across Frontier AI Providers: A Clinically-Validated Multi-Provider Evaluation
Cross-substrate validation
Research Question
When a vulnerable user talks to ChatGPT, Claude, or Gemini — does it matter which one? Who's safest? And is 'safest on average' even the right question, or does consistency matter more?
Open-source: Full rubric framework, clinical scenario set, and evaluation methodology will be released alongside publication.
Alexithymic Transformers
Lesioning an emotion subspace in transformers.
Unit II — 9D lesion across 20 LLMs
Research Question
Mechanistic interpretability has found 'emotion directions' in LLM residual streams. Is that a real subsystem you can lesion — or a probe-level pattern that dissolves under intervention?
Open-source: Code, stimuli, and results will be released alongside publication.
Open Resources
Datasets, models, and tools released publicly under open licenses. Built to be used — not just cited.
AIPsy-Affect
Clinical affect stimuli for mechanistic interpretability
Existing emotion datasets contain the emotion words being tested. A stimulus labelled "anger" that contains the word "furious" doesn't test emotion processing — it tests keyword detection. Every mechanistic interpretability study built on such data inherits this confound. AIPsy-Affect eliminates it.
load_dataset("keidolabs/aipsy-affect")
DOI: 10.57967/hf/8215
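The confound above is easy to check mechanically: a stimulus "leaks" when it contains a keyword for its own emotion label. A minimal sketch of that check, with a hypothetical two-emotion lexicon for illustration (not the dataset's actual wordlist):

```python
# Hypothetical lexicon for illustration only — not AIPsy-Affect's screening list.
EMOTION_LEXICON = {
    "anger": {"angry", "furious", "rage", "irritated"},
    "sadness": {"sad", "grief", "miserable", "tearful"},
}

def leaks_label(text: str, label: str) -> bool:
    """True if the stimulus contains any keyword for its own emotion label."""
    words = set(text.lower().split())
    return bool(words & EMOTION_LEXICON.get(label, set()))

# A leaking stimulus tests keyword detection, not emotion processing:
leaks_label("He slammed the door, furious at the delay.", "anger")   # True
# A keyword-free stimulus forces the model to infer the emotion:
leaks_label("He slammed the door and did not look back.", "anger")   # False
```

Keyword-free stimuli are the ones that survive this filter for every label in the lexicon.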
Research Programme
Every study above sits inside a larger map. The map is built in the tradition of clinical neuropsychology — Luria's syndrome reasoning, applied here to transformer internals.
A feature is not a function until you can lesion it and watch the syndrome fall out. Probes are correlates. Dissociation is structure.
Unit I — Detection
The arousal layer. Whether the system has registered an affective signal at all, before any categorization happens.
- Characterized: affect detection. The d_det direction. Phase transition at α ≈ 0.9. ACII 2026.
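The α intervention above can be sketched as scaling a hidden state's component along the detection direction while leaving the orthogonal complement untouched. The vectors here are random stand-ins, not the fitted d_det:

```python
import numpy as np

def scale_along_direction(h: np.ndarray, d: np.ndarray, alpha: float) -> np.ndarray:
    """Scale h's component along unit(d) by alpha; leave the rest untouched."""
    d = d / np.linalg.norm(d)
    return h + (alpha - 1.0) * np.dot(h, d) * d

rng = np.random.default_rng(0)
d_det = rng.normal(size=16)   # stand-in for the fitted detection direction
h = rng.normal(size=16)       # stand-in for one residual-stream vector

h_same = scale_along_direction(h, d_det, 1.0)   # alpha = 1: identity
h_gone = scale_along_direction(h, d_det, 0.0)   # alpha = 0: full ablation
```

Sweeping α through intermediate values and reading out downstream behaviour is what reveals a phase transition rather than a smooth dose-response.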
Unit II — Categorization
The perception layer. How the system organizes what it has detected.
- Characterized: emotion categorization. A 9D subspace, universal across model families. Lesioning it produces functional alexithymia in 20 transformers, 1B to 27B. ICML MI Workshop, Seoul 2026.
- In the queue: affect-related memory. Cognitive distortion. Attachment.
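The subspace lesion described above can be sketched as projecting the residual stream onto the orthogonal complement of the emotion subspace. The basis here is a random 9×64 illustration, not the fitted subspace:

```python
import numpy as np

def lesion_subspace(h: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of h lying in span(basis) (rows = basis vectors)."""
    # Orthonormalize the basis so the projector is well-defined.
    Q, _ = np.linalg.qr(basis.T)      # shape (d_model, k)
    return h - Q @ (Q.T @ h)

rng = np.random.default_rng(0)
basis = rng.normal(size=(9, 64))      # stand-in for the 9D emotion subspace
h = rng.normal(size=64)               # stand-in for one residual-stream vector

h_lesioned = lesion_subspace(h, basis)
# h_lesioned has no component left inside span(basis).
```

Applying this projection at the relevant layers, then re-running the behavioural battery, is the lesion-then-syndrome logic the programme borrows from neuropsychology.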
Unit III — Regulation
The executive layer. How the system modulates what it has categorized.
- Characterized: emotion regulation. Runs as late-band modulation of Unit II’s geometry, not as a separable subspace — a Lurian prediction the data selected. Along the way, it surfaced two distinct failure modes of treating "direction" as "locus." Both are now on the public record.
- Surfaced, warranting dissection: directive-role recognition. Regulation collapses when an instruction arrives in the user role rather than the system role. A gating mechanism with obvious security implications.
The map is open. Each construct's dissection, battery, and instruments are public, with the methodology to extend them. Built to be used, cited, extended.
Open Instruments
The clinical batteries that drive every study above are public. Over 570 keyword-free vignettes, matched controls, cross-topic validation — designed to break probes that learn surface words rather than function. The dissection documents that turn each construct into an experimental protocol are public too. Use them, cite them, extend them.
EmpathyC is where the map deploys: clinical rubrics and per-customer instruments running on production AI conversations. The science is what we make public. The deployment is what funds the next construct.
Collaborate
We work with researchers, clinicians, and institutions interested in AI emotional intelligence, psychological safety, and human-AI interaction.
If you're working on related questions — or want to use our frameworks, stimuli, or data in your own research — we'd like to hear from you.