The Hypothesis Engine (services/hypothesis_engine/, 6 files, ~1,200 lines) is Frank's empirical reasoning system. It doesn't just store facts — it generates testable predictions, designs experiments, evaluates results, and revises beliefs.
The Empirical Cycle
Observe → Hypothesize → Predict → Test → Result → Revise → (back to Observe)
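The cycle can be read as a small state machine in which the last stage feeds back into the first. The following is a minimal sketch under that reading, not the engine's actual code: the `Stage` enum and the `next_stage` helper are hypothetical names chosen for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the empirical cycle, in order (hypothetical names)."""
    OBSERVE = "observe"
    HYPOTHESIZE = "hypothesize"
    PREDICT = "predict"
    TEST = "test"
    RESULT = "result"
    REVISE = "revise"


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; REVISE wraps back to OBSERVE."""
    order = list(Stage)  # Enum members iterate in definition order
    return order[(order.index(stage) + 1) % len(order)]
```

Calling `next_stage(Stage.REVISE)` yields `Stage.OBSERVE`, which is the feedback edge in the diagram.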
12 Domains (11 listed)
| Domain | Source | Testing Method |
|---|---|---|
| Physics | Idle thoughts about physical phenomena | Experiment Lab Physics Station |
| Quantum | Quantum mechanics curiosity | Experiment Lab Quantum Lab |
| Astronomy | Celestial mechanics questions | Experiment Lab Orrery |
| Game of Life | Emergence patterns | Experiment Lab GoL Sandbox |
| Math | Number theory, geometry | Experiment Lab Math Station |
| Electronics | Circuit behavior | Experiment Lab Electronics Station |
| GAN | Adversarial training dynamics | Experiment Lab GAN Lab |
| Self | Introspective claims about own behavior | Passive — pattern matching against future thoughts |
| Affect | Emotional pattern predictions | Passive — tracking mood/reward trajectories |
| Hardware | Performance predictions | Passive — monitoring system metrics |
| Relational | Social hypotheses about the user | Passive — evaluating against conversations |
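The table splits the domains into two groups: those with a matching Experiment Lab station (tested actively) and those evaluated passively. A lookup along these lines could decide which path a hypothesis takes. This is only a sketch: the lowercase domain keys and the `station_for` helper are assumptions, while the station names are copied from the table.

```python
from typing import Optional

# Active domains and their Experiment Lab stations, copied from the table.
STATIONS = {
    "physics": "Physics Station",
    "quantum": "Quantum Lab",
    "astronomy": "Orrery",
    "game_of_life": "GoL Sandbox",
    "math": "Math Station",
    "electronics": "Electronics Station",
    "gan": "GAN Lab",
}

# Domains the table marks as passively tested.
PASSIVE_DOMAINS = {"self", "affect", "hardware", "relational"}


def station_for(domain: str) -> Optional[str]:
    """Return the matching station, or None if the domain is passive-only."""
    if domain in STATIONS:
        return STATIONS[domain]
    if domain in PASSIVE_DOMAINS:
        return None
    raise ValueError(f"unknown domain: {domain!r}")
```

Returning `None` for passive domains lets a caller skip the experiment request and wait for accumulated evidence instead.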
7 Integration Hooks (5 documented)
- on_idle_thought(): When Frank has an idle thought, the engine checks if it contains a testable claim. If so, it generates a hypothesis and auto-tests it if an experiment station matches.
- periodic_analysis(): Batch review of all active hypotheses. Checks pending experiments. Evaluates passive hypotheses against accumulated data.
- on_experiment_complete(): When the Experiment Lab finishes a simulation, the engine interprets the result and resolves the hypothesis.
- request_experiment(): Sends a hypothesis to the Experiment Lab for active testing.
- on_conversation_reflection(): After conversations, generates relational hypotheses about user preferences and behavior patterns.
The Relational Domain
The relational domain is the most sophisticated. Hypotheses such as "my user prefers technical over emotional responses" pass through a 6-layer quality filter:
- Specificity: Rejects vague claims ("my user is nice")
- Claim extraction: A regex pulls the testable assertion out of the natural-language text
- Falsifiability: Can this hypothesis be proven wrong by future evidence?
- Novelty: A candidate with Jaccard similarity ≥ 0.4 to any existing hypothesis is rejected as a duplicate
- Emotional contamination: Rejects hypotheses driven by Frank's current mood rather than evidence
- Single-instance rejection: One data point isn't enough; the claim needs a recurring pattern
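The novelty layer can be illustrated with a word-level Jaccard check. This is a sketch under stated assumptions: the 0.4 threshold comes from the list above, but the tokenization (lowercase alphanumeric words) and the names `tokens`, `jaccard`, and `is_duplicate` are hypothetical, and the engine may compare hypotheses differently.

```python
import re

DUPLICATE_THRESHOLD = 0.4  # from the novelty rule above


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; this tokenization scheme is an assumption."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the two token sets (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def is_duplicate(candidate: str, existing: list[str]) -> bool:
    """True if the candidate reaches the threshold against any existing hypothesis."""
    return any(jaccard(candidate, h) >= DUPLICATE_THRESHOLD for h in existing)
```

With "my user prefers technical over emotional responses" already stored, the candidate "my user prefers technical responses" shares 5 of 7 distinct words (similarity about 0.71) and is flagged as a duplicate, while "my user replies faster in the morning" shares only 2 of 12 (about 0.17) and passes.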
Relational hypotheses are tested passively via evaluate_against_conversation(). For example, if Frank predicts that the user prefers technical responses, and the next 5 conversations show high engagement on technical topics and low engagement on emotional ones, the hypothesis is confirmed.
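A windowed check of this kind might look like the sketch below. It is not the body of evaluate_against_conversation(): the `Engagement` record, the `evaluate_window` function, and the `high`/`low` thresholds are all assumptions made for illustration. Only the window size of 5 comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Per-conversation engagement scores in [0, 1] (hypothetical measure)."""
    technical: float
    emotional: float


def evaluate_window(history: list[Engagement], window: int = 5,
                    high: float = 0.7, low: float = 0.3) -> str:
    """Evaluate 'user prefers technical responses' over the first `window`
    conversations recorded after the hypothesis was formed.

    The `high` and `low` thresholds are assumed values, not documented ones.
    """
    if len(history) < window:
        return "pending"  # not enough evidence yet
    observed = history[:window]
    supports = all(c.technical >= high and c.emotional <= low for c in observed)
    return "confirmed" if supports else "unconfirmed"
```

The sketch deliberately returns "unconfirmed" rather than "refuted" when the pattern fails to appear, since the text only specifies the confirmation condition.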
Revision Chains
When a hypothesis is refuted, the engine creates a revised variant (up to depth 5). "Friction > 0.5 prevents sliding at 30°" → refuted → revised to "Friction > 0.7 prevents sliding at 30°" → tested → confirmed. Each revision tracks its parent and child, building an evidence tree.
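The parent/child linking can be sketched with a small record type. The `Hypothesis` class and `revise` function are hypothetical, and the sketch assumes the original hypothesis sits at depth 0, so revisions occupy depths 1 through 5.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MAX_REVISION_DEPTH = 5  # from the text


@dataclass(eq=False)  # identity comparison avoids recursing through parent/child links
class Hypothesis:
    claim: str
    depth: int = 0
    status: str = "active"  # "active", "confirmed", or "refuted"
    parent: Optional[Hypothesis] = None
    child: Optional[Hypothesis] = None


def revise(refuted: Hypothesis, new_claim: str) -> Optional[Hypothesis]:
    """Link a revised variant to a refuted hypothesis, or return None at max depth."""
    if refuted.depth >= MAX_REVISION_DEPTH:
        return None
    refuted.status = "refuted"
    revised = Hypothesis(claim=new_claim, depth=refuted.depth + 1, parent=refuted)
    refuted.child = revised
    return revised


root = Hypothesis("Friction > 0.5 prevents sliding at 30°")
revised = revise(root, "Friction > 0.7 prevents sliding at 30°")
```

The friction example is physically consistent: a block on an incline stays put when the static friction coefficient is at least tan θ, and tan 30° ≈ 0.577. A coefficient of 0.55 satisfies "> 0.5" yet still slides, so the first claim is refuted, while every coefficient above 0.7 clears the 0.577 bound.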
Limits
- Max 30 hypotheses/day, 10 experiments/day, 20 active simultaneously
- Max 3 experiment failures per hypothesis → downgrade to passive-only testing
- Psychosis filter: 3-layer self-referential loop detection (prevents hypotheses about hypotheses about hypotheses)