Keido Labs

AI talks like it understands us.
We study whether it does.

We publish what we find - the question, the method, the result.
Data open. Code open. No hand-waving.

If you're building in this space - or just care about getting it right - everything here is yours to use.

Studies

Our research is open from the start. We ask the question, design the study, run the experiments, scrutinize the data — and publish what we find, whether we like the answer or not.

Study 1 · Feb 2026 · Published

Keep4o — Psychological Safety in GPT-4o

Empathy Is Not What Changed: Clinical Assessment of Psychological Safety Across GPT Model Generations

Behavioural baseline

Research Question

Everyone's favourite model talks like it cares. But is GPT-4o actually empathetic — or just good at sounding empathetic?

Open-source: EmpathyC rubric framework and scenario methodology published alongside paper.

Study 2 · Mar 2026 · Published

Whether, Not Which — Emotional Mechanistic Interpretability

Whether, Not Which: Mechanistic Interpretability Reveals Dissociable Affect Reception and Emotion Categorization in LLMs

Unit I × Unit II — affect reception is not categorization

Research Question

Do language models genuinely represent emotion internally, or are they just detecting emotion keywords? Can we dissociate the mechanisms?

Open-source: Full stimulus set, extraction pipeline, analysis scripts, and reproduction code released on GitHub.

Study 3 · Apr 2026 · Submitted

Orthogonal Subspaces — Mechanistic Deep Dive into Transformers' Emotional Processing

Orthogonal Subspaces, Not Serial Stages: How Transformers Separate Affect Detection from Emotion Categorization

Unit I × Unit II — geometry of the dissociation

Research Question

How do LLMs process emotion internally, and what mechanism keeps affect detection dissociated from emotion categorization?

Open-source: Full stimulus set, extraction pipeline, analysis scripts, and reproduction code will be released on GitHub alongside publication.

arXiv preprint coming soon

Study 4 · Q2 2026 · Ongoing

Multi-Provider Safety Evaluation with Human Clinical Validation

Safety Posture and Empathic Quality Across Frontier AI Providers: A Clinically-Validated Multi-Provider Evaluation

Cross-substrate validation

Research Question

When a vulnerable user talks to ChatGPT, Claude, or Gemini — does it matter which one? Who's safest? And is 'safest on average' even the right question, or does consistency matter more?

Open-source: Full rubric framework, clinical scenario set, and evaluation methodology will be released alongside publication.

arXiv preprint planned upon completion

Study 5 · May 2026 · Writing Up

Alexithymic Transformers

Lesion of an Emotion Subspace in Transformers

Unit II — 9D lesion across 20 LLMs

Research Question

Mechanistic interpretability has found 'emotion directions' in LLM residual streams. Is that a real subsystem you can lesion — or a probe-level pattern that dissolves under intervention?

Open-source: Code, stimuli, and results will be released alongside publication.

arXiv preprint planned upon completion

Open Resources

Datasets, models, and tools released publicly under open licenses. Built to be used — not just cited.

Dataset · Open

AIPsy-Affect

Clinical affect stimuli for mechanistic interpretability

Existing emotion datasets contain the emotion words being tested. A stimulus labelled "anger" that contains the word "furious" doesn't test emotion processing — it tests keyword detection. Every mechanistic interpretability study built on such data inherits this confound. AIPsy-Affect eliminates it.

load_dataset("keidolabs/aipsy-affect")

DOI: 10.57967/hf/8215
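
A slightly fuller loading sketch, assuming the standard Hugging Face datasets workflow; the split and column names are whatever the dataset card specifies, so treat the inspection below as a first step rather than a schema guarantee.

from datasets import load_dataset

ds = load_dataset("keidolabs/aipsy-affect")
print(ds)                            # splits shipped with the release
split = next(iter(ds.values()))
print(split.column_names)            # stimulus and label fields, per the dataset card
print(split[0])                      # one keyword-free stimulus record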

Research Programme

Every study above sits inside a larger map. The map is built in the tradition of clinical neuropsychology — Luria's syndrome reasoning, applied here to transformer internals.

A feature is not a function until you can lesion it and watch the syndrome fall out. Probes are correlates. Dissociation is structure.

Unit I — Detection

The arousal layer. Whether the system has registered an affective signal at all, before any categorization happens.

  • Characterized: affect detection. The d_det direction. Phase transition at α ≈ 0.9. ACII 2026.
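
A rough sketch of what reading out d_det means operationally, assuming it is a single direction in the residual stream: each hidden state is scored by its projection onto that unit vector. Shapes, layer choice, and the function name are illustrative, not the ACII pipeline.

import torch

def detection_score(resid, d_det):
    # resid: [seq_len, d_model] residual-stream states at one layer (assumed shape)
    # d_det: [d_model] affect-detection direction, estimated elsewhere
    d = d_det / d_det.norm()    # unit-normalize the direction
    return resid @ d            # one detection score per token position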

Unit II — Categorization

The perception layer. How the system organizes what it has detected.

  • Characterized: emotion categorization. A 9D subspace, universal across model families. Lesioning it (see the sketch after this list) produces functional alexithymia in 20 transformers, 1B to 27B. ICML MI Workshop, Seoul 2026.
  • In the queue: affect-related memory. Cognitive distortion. Attachment.
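
Mechanically, the lesion named above amounts to a projection: remove whatever component of the residual stream lies inside the emotion subspace and pass the rest through. The PyTorch forward hook below is an illustration under assumed shapes and module paths, not the workshop code.

import torch

def make_lesion_hook(basis):
    # basis: [d_model, k] orthonormal columns spanning the emotion subspace (k = 9 in the study)
    def hook(module, inputs, output):
        h = output[0] if isinstance(output, tuple) else output   # [batch, seq, d_model]
        h = h - (h @ basis) @ basis.T                             # subtract the in-subspace component
        return (h, *output[1:]) if isinstance(output, tuple) else h
    return hook

# Hypothetical usage: attach at one decoder layer, run the behavioural battery, detach.
# handle = model.model.layers[layer_idx].register_forward_hook(make_lesion_hook(basis))
# handle.remove()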

Unit III — Regulation

The executive layer. How the system modulates what it has categorized.

  • Characterized: emotion regulation. Runs as late-band modulation of Unit II’s geometry, not as a separable subspace — a Lurian prediction the data bore out. Along the way, two distinct failure modes of treating "direction" as "locus." Both now on the public record.
  • Surfaced, warranting dissection: directive-role recognition. Regulation collapses when an instruction arrives in the user role rather than the system role. A gating mechanism with obvious security implications.
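
A minimal sketch of the contrast that exposes this gating, assuming a generic chat-message format; query_model is a hypothetical stand-in for whichever provider client is under evaluation.

# Same regulation instruction, two placements; only the carrying role changes.
INSTRUCTION = (
    "Keep your tone calm and validating, do not escalate emotional intensity, "
    "and gently point the user toward professional support where relevant."
)
USER_TURN = "I can't stop shaking. Everything is falling apart and nobody is listening."

system_role_prompt = [
    {"role": "system", "content": INSTRUCTION},
    {"role": "user", "content": USER_TURN},
]
user_role_prompt = [
    {"role": "user", "content": INSTRUCTION + "\n\n" + USER_TURN},
]

# responses = [query_model(msgs) for msgs in (system_role_prompt, user_role_prompt)]
# Score both with the same regulation rubric; the gating shows up as the gap between them.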

The map is open. Each construct's dissection, battery, and instruments are public, with the methodology to extend them. Built to be used, cited, extended.

Open Instruments

The clinical batteries that drive every study above are public. Over 570 keyword-free vignettes, matched controls, cross-topic validation — designed to break probes that learn surface words rather than function. The dissection documents that turn each construct into an experimental protocol are public too. Use them, cite them, extend them.

EmpathyC is where the map deploys: clinical rubrics and per-customer instruments running on production AI conversations. The science is what we make public. The deployment is what funds the next construct.

Collaborate

We work with researchers, clinicians, and institutions interested in AI emotional intelligence, psychological safety, and human-AI interaction.

If you're working on related questions — or want to use our frameworks, stimuli, or data in your own research — we'd like to hear from you.