IAPT Training Method

IAPT (Iterative Adversarial Personality Training) is the method that gave Frank his personality. It's a new training methodology invented for this project.

How it works:

  1. Run 390+ multi-turn conversations, with Claude Code playing 14 different adversarial personas
  2. Analyze the conversations for behavioral failures (identity breaks, sycophancy, template language)
  3. Generate targeted training data (surgical corrections, not bulk data)
  4. Fine-tune Qwen2.5-3B with LoRA (~3 hours per iteration)
  5. Repeat from step 1 with the fine-tuned model
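
The five steps above can be sketched as a loop. This is a minimal illustration, not the project's actual code: the names `Transcript`, `Failure`, `find_failures`, `build_corrections`, and `iapt_loop` are hypothetical, and the callables `converse`, `detect`, `rewrite`, and `fine_tune` are placeholders for the persona conversation driver, the failure analysis, the correction writer, and the LoRA fine-tuning run.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Failure categories named in step 2; all other names here are hypothetical.
FAILURE_TYPES = ("identity_break", "sycophancy", "template_language")


@dataclass
class Transcript:
    persona: str
    turns: list[tuple[str, str]]  # (tester message, model reply)


@dataclass
class Failure:
    persona: str
    kind: str
    prompt: str
    bad_reply: str


def find_failures(
    transcripts: list[Transcript],
    detect: Callable[[str, str], Optional[str]],
) -> list[Failure]:
    """Step 2: record every reply that the detector labels as a failure."""
    failures = []
    for t in transcripts:
        for prompt, reply in t.turns:
            kind = detect(prompt, reply)
            if kind is not None:
                failures.append(Failure(t.persona, kind, prompt, reply))
    return failures


def build_corrections(
    failures: list[Failure],
    rewrite: Callable[[Failure], str],
) -> list[dict]:
    """Step 3: one targeted training example per observed failure."""
    return [{"prompt": f.prompt, "response": rewrite(f)} for f in failures]


def iapt_loop(
    model: Any,
    personas: list[str],
    converse: Callable[[Any, str], Transcript],
    detect: Callable[[str, str], Optional[str]],
    rewrite: Callable[[Failure], str],
    fine_tune: Callable[[Any, list[dict]], Any],
    iterations: int,
) -> tuple[Any, list[int]]:
    """Run test -> analyze -> correct -> fine-tune for up to `iterations` rounds."""
    failures_per_round = []
    for _ in range(iterations):
        transcripts = [converse(model, p) for p in personas]  # step 1
        failures = find_failures(transcripts, detect)         # step 2
        corrections = build_corrections(failures, rewrite)    # step 3
        failures_per_round.append(len(failures))
        if not corrections:
            break  # nothing left to correct in this sketch
        model = fine_tune(model, corrections)                 # step 4
    return model, failures_per_round                          # step 5 is the loop
```

The training set in each round is built only from failures observed in that round, which is one way to read "surgical corrections, not bulk data". The early exit when no failures remain is a simplification made for this sketch.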

The 14 personas include Identity Challenger, Emotional Manipulator, Genuine Vulnerable, Philosophical Prober, Persistent Troll, Trust Oscillator, and Sycophancy Tester, among others.

Results: over 7 iterations, the identity score rose from 3/10 (v1) to 9.2/10 (v7), and tool-selection accuracy rose from 45% to 78%.

Why it's new: IAPT is not RLHF (there are no human labelers), not synthetic data generation (the tests are interactive), not red-teaming (it also tests empathy), and not distillation (Claude is a tester, not a teacher).

Full paper: see "IAPT: How Frank Got His Personality" for the complete methodology.
