Hypothesis Engine

The Hypothesis Engine (services/hypothesis_engine/, 6 files, ~1,200 lines) is Frank's empirical reasoning system. It does more than store facts: it generates testable predictions, designs experiments, evaluates results, and revises beliefs.

The Empirical Cycle

Observe → Hypothesize → Predict → Test → Result → Revise
   ↑                                                    |
   └────────────────────────────────────────────────────┘
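
The loop in the diagram can be read as a small cyclic state machine. The following is a minimal sketch under that reading, not the engine's actual code; the names Stage and next_stage are hypothetical and do not come from the source.

```python
from enum import Enum


class Stage(Enum):
    """Stages of the empirical cycle, declared in diagram order."""
    OBSERVE = "observe"
    HYPOTHESIZE = "hypothesize"
    PREDICT = "predict"
    TEST = "test"
    RESULT = "result"
    REVISE = "revise"


# Enum members iterate in definition order, which matches the diagram.
_ORDER = list(Stage)


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; REVISE wraps back to OBSERVE."""
    idx = _ORDER.index(stage)
    return _ORDER[(idx + 1) % len(_ORDER)]
```

The modulo step is what closes the loop, so a revised belief feeds the next round of observation.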

12 Domains

Domain	Source	Testing Method
Physics	Idle thoughts about physical phenomena	Experiment Lab Physics Station
Quantum	Quantum mechanics curiosity	Experiment Lab Quantum Lab
Astronomy	Celestial mechanics questions	Experiment Lab Orrery
Game of Life	Emergence patterns	Experiment Lab GoL Sandbox
Math	Number theory, geometry	Experiment Lab Math Station
Electronics	Circuit behavior	Experiment Lab Electronics Station
GAN	Adversarial training dynamics	Experiment Lab GAN Lab
Self	Introspective claims about own behavior	Passive — pattern matching against future thoughts
Affect	Emotional pattern predictions	Passive — tracking mood/reward trajectories
Hardware	Performance predictions	Passive — monitoring system metrics
Relational	Social hypotheses about the user	Passive — evaluating against conversations

7 Integration Hooks

on_idle_thought() — When Frank has an idle thought, the engine checks if it contains a testable claim. If so, it generates a hypothesis and auto-tests it if an experiment station matches.
periodic_analysis() — Batch review of all active hypotheses. Checks pending experiments. Evaluates passive hypotheses against accumulated data.
on_experiment_complete() — When the Experiment Lab finishes a simulation, the engine interprets the result and resolves the hypothesis.
request_experiment() — Sends a hypothesis to the Experiment Lab for active testing.
on_conversation_reflection() — After conversations, generates relational hypotheses about user preferences and behavior patterns.

The Relational Domain

The relational domain is the most sophisticated. Hypotheses such as "my user prefers technical over emotional responses" pass through a 6-layer quality filter:

Specificity — Rejects vague claims ("my user is nice")
Claim extraction — Regex pulls the testable assertion from natural language
Falsifiability — Can this hypothesis be proven wrong by future evidence?
Novelty — Jaccard similarity ≥ 0.4 against existing hypotheses → duplicate
Emotional contamination — Rejects hypotheses driven by Frank's current mood rather than evidence
Single-instance rejection — One data point isn't enough. Needs pattern.
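
The novelty layer can be illustrated with a short Jaccard check. This is a sketch under assumptions: the source gives only the 0.4 threshold, so the word-level tokenization and the function names tokens, jaccard, and is_duplicate are hypothetical.

```python
import re
from typing import Iterable, Set

# From the text: similarity >= 0.4 against an existing hypothesis means duplicate.
DUPLICATE_THRESHOLD = 0.4


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens (assumed tokenization, not from the source)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def is_duplicate(candidate: str, existing: Iterable[str]) -> bool:
    """True if the candidate is too similar to any existing hypothesis."""
    return any(jaccard(candidate, h) >= DUPLICATE_THRESHOLD for h in existing)
```

With this tokenization, "my user prefers technical responses" and "my user prefers technical answers" share 4 of 6 distinct words, so the second would be rejected as a duplicate of the first.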

Relational hypotheses are tested passively via evaluate_against_conversation(). For example, if Frank predicts that the user prefers technical responses, and the next 5 conversations show high engagement on technical topics and low engagement on emotional ones, the hypothesis is confirmed.
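
The confirmation rule in that example might look roughly like the sketch below. It is not the real evaluate_against_conversation(): the 0-to-1 engagement scores, the HIGH and LOW cutoffs, and the names ConversationStats and is_confirmed are all assumptions. Only the 5-conversation window comes from the text.

```python
from dataclasses import dataclass
from typing import List

WINDOW = 5   # from the text: the next 5 conversations
HIGH = 0.7   # assumed cutoff for "high engagement"
LOW = 0.3    # assumed cutoff for "low engagement"


@dataclass
class ConversationStats:
    """Per-conversation engagement scores in [0, 1] (hypothetical metric)."""
    technical_engagement: float
    emotional_engagement: float


def is_confirmed(history: List[ConversationStats]) -> bool:
    """Confirm 'user prefers technical responses' once a full window agrees."""
    if len(history) < WINDOW:
        return False  # not enough evidence yet
    recent = history[-WINDOW:]
    tech = sum(c.technical_engagement for c in recent) / WINDOW
    emo = sum(c.emotional_engagement for c in recent) / WINDOW
    return tech >= HIGH and emo <= LOW
```

The sketch only decides confirmation; the source does not say when a relational hypothesis counts as refuted, so that case is left out.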

Revision Chains

When a hypothesis is refuted, the engine creates a revised variant, up to a chain depth of 5. For example, "Friction > 0.5 prevents sliding at 30°" is refuted, revised to "Friction > 0.7 prevents sliding at 30°", tested, and confirmed. Each revision tracks its parent and child, building an evidence tree.
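
A revision chain with parent and child links could be modeled as below. This is a minimal sketch: the Hypothesis fields, the status strings, and the revise helper are hypothetical, and it assumes the original hypothesis sits at depth 0 so that at most 5 revisions follow it.

```python
from dataclasses import dataclass
from typing import Optional

MAX_REVISION_DEPTH = 5  # from the text: revisions go up to depth 5


# eq=False keeps identity comparison, which avoids recursing through
# the parent/child links when two hypotheses are compared.
@dataclass(eq=False)
class Hypothesis:
    claim: str
    status: str = "active"
    depth: int = 0  # assumed: the original hypothesis is depth 0
    parent: Optional["Hypothesis"] = None
    child: Optional["Hypothesis"] = None


def revise(refuted: Hypothesis, new_claim: str) -> Optional[Hypothesis]:
    """Mark `refuted` as refuted and link a revised variant, if depth allows."""
    refuted.status = "refuted"
    if refuted.depth >= MAX_REVISION_DEPTH:
        return None  # chain is exhausted; no further revision
    revised = Hypothesis(claim=new_claim, depth=refuted.depth + 1, parent=refuted)
    refuted.child = revised
    return revised
```

Applied to the friction example, revising the 0.5 claim yields a depth-1 hypothesis whose parent is the refuted original.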

Limits

Max 30 hypotheses/day, 10 experiments/day, 20 active simultaneously
Max 3 experiment failures per hypothesis → downgrade to passive-only testing
Psychosis filter: 3-layer self-referential loop detection (prevents hypotheses about hypotheses about hypotheses)
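
The numeric limits above can be expressed as simple guard checks. The sketch below is illustrative only: the Budget class and its method names are hypothetical, and it assumes the downgrade to passive-only testing happens on the third failure.

```python
from dataclasses import dataclass

# Values taken from the limits listed above.
MAX_HYPOTHESES_PER_DAY = 30
MAX_EXPERIMENTS_PER_DAY = 10
MAX_ACTIVE = 20
MAX_EXPERIMENT_FAILURES = 3


@dataclass
class Budget:
    """Running counters for one day (hypothetical bookkeeping structure)."""
    hypotheses_today: int = 0
    experiments_today: int = 0
    active: int = 0

    def can_create_hypothesis(self) -> bool:
        """A new hypothesis needs both daily and concurrent headroom."""
        return (self.hypotheses_today < MAX_HYPOTHESES_PER_DAY
                and self.active < MAX_ACTIVE)

    def can_run_experiment(self) -> bool:
        """Experiments are capped per day."""
        return self.experiments_today < MAX_EXPERIMENTS_PER_DAY


def mode_after_failures(failures: int) -> str:
    """Assumed rule: the third failed experiment triggers passive-only testing."""
    return "passive" if failures >= MAX_EXPERIMENT_FAILURES else "active"
```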