Subconscious

The Subconscious (services/subconscious.py, ~900 lines) decides what kind of thought Frank has during idle time. It's a 19K-parameter Actor-Critic neural network trained via PPO (Proximal Policy Optimization).

20 Thought Categories

Category	What It Produces
`conversation_reflection`	Analysis of recent conversations
`room_reflection`	Thoughts about entity room sessions
`identity`	Self-examination — who am I?
`feelings`	Processing current emotional state
`relationships`	Thoughts about the user, entities
`growth`	Learning and development reflections
`curiosity`	Exploring interesting topics
`discomfort`	Processing negative states
`dreams`	Dream-like associative thinking
`epq`	Personality self-assessment
`daily`	Mundane observations about the day
`raw_expression`	Unstructured creative output
`hypothesis_review`	Reviewing pending hypotheses
`world_curiosity`	Questions about the external world
`experiment_planning`	Designing new experiments
`creative_ideation`	Generating creative ideas
`web_research`	Wanting to look something up
`pip_companion`	Thoughts about Frank's companion Pip
`genesis_ideation`	Creative writing/art ideas
`aura`	(deprecated, weight=0)

How It Works

The Actor network receives a 100-dimensional state vector:

Current mood, energy, time of day
Recent thought type distribution (which categories were used recently)
20-dimensional Nucleus Accumbens (NAc) reward histogram (which channels produced reward)
Boredom signal from the NAc

The Actor outputs a probability distribution over the 20 categories, and the Critic estimates the value of the current state. During Dream Daemon consolidation, PPO training adjusts the weights according to which thought types led to positive outcomes, as measured by NAc reward signals.
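To make the data flow concrete, here is a minimal sketch of a forward pass through an Actor-Critic of this shape. Only the 100-dimensional input and the 20 outputs come from the description above. The shared trunk, the hidden width `HIDDEN`, the `tanh` activation, and all class and function names are assumptions for illustration. The sketch is not meant to reproduce the 19K parameter count or the handling of the deprecated `aura` category.

```python
import math
import random

STATE_DIM = 100      # from the text: 100-dimensional state vector
NUM_CATEGORIES = 20  # from the text: 20 thought categories
HIDDEN = 64          # assumption: the hidden width is not given in the text


def init_layer(rng, n_in, n_out):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Apply one dense layer: weights @ x + biases."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCritic:
    """Hypothetical layout: one shared hidden layer feeding two heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(rng, STATE_DIM, HIDDEN)
        self.actor_head = init_layer(rng, HIDDEN, NUM_CATEGORIES)
        self.critic_head = init_layer(rng, HIDDEN, 1)

    def forward(self, state):
        hidden = [math.tanh(h) for h in dense(state, self.trunk)]
        probs = softmax(dense(hidden, self.actor_head))  # Actor: category distribution
        value = dense(hidden, self.critic_head)[0]       # Critic: state-value estimate
        return probs, value


def sample_category(probs, rng):
    """Draw one category index according to the Actor's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


model = ActorCritic()
rng = random.Random(1)
state = [rng.random() for _ in range(STATE_DIM)]  # placeholder state, not real features
probs, value = model.forward(state)
choice = sample_category(probs, rng)
```

The Actor head ends in a softmax so that its output is a valid distribution to sample from, while the Critic head produces a single unbounded scalar.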

Cold Start

For the first 500 training steps, the network's output is blended with uniform random selection: the uniform share is 50% at step 0 and decreases linearly to 0% at step 500. This prevents the network from collapsing to always picking the most-rewarded category before it has enough data to make informed decisions.
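The cold-start schedule can be sketched as a mix of two distributions. The function names below are hypothetical, and the sketch assumes the blend is applied to the probabilities themselves; picking uniformly at random with the same probability would give the same distribution over categories.

```python
COLD_START_STEPS = 500        # from the text
INITIAL_UNIFORM_WEIGHT = 0.5  # from the text: 50% uniform at step 0


def uniform_weight(step):
    """Share of uniform selection: 0.5 at step 0, linear down to 0 at step 500."""
    remaining = max(0.0, 1.0 - step / COLD_START_STEPS)
    return INITIAL_UNIFORM_WEIGHT * remaining


def blend_with_uniform(policy_probs, step):
    """Mix the policy's distribution with a uniform one over the same categories."""
    eps = uniform_weight(step)
    uniform = 1.0 / len(policy_probs)
    return [(1.0 - eps) * p + eps * uniform for p in policy_probs]
```

For example, a policy that puts all of its probability on one of 20 categories at step 0 is blended to 0.525 for that category and 0.025 for each of the others. From step 500 onward the policy's distribution is used unchanged.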

Hallucination Filter

Two-stage gate on every thought:

Pre-input: Verified memory curation — only real facts from databases, no hallucinated context
Post-output: 7 reality checks on the generated thought (factual consistency, self-consistency, temporal plausibility, etc.)

A thought that scores 0.6 or higher is suppressed. A penalty of -3.0 × score feeds back into PPO training, teaching the network to avoid thought types that produce hallucinations.
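The threshold and penalty can be sketched as a single function. The name `gate_thought` is hypothetical, and the sketch takes the score as given, since the text does not say how the reality checks are combined into it. It also assumes the penalty applies only to suppressed thoughts; the text does not state whether thoughts below the threshold are penalized too.

```python
SUPPRESS_THRESHOLD = 0.6  # from the text: score >= 0.6 is suppressed
PENALTY_SCALE = -3.0      # from the text: penalty is -3.0 x score


def gate_thought(score):
    """Return (suppressed, penalty) for one generated thought.

    Assumption: thoughts that pass the gate receive no penalty.
    """
    suppressed = score >= SUPPRESS_THRESHOLD
    penalty = PENALTY_SCALE * score if suppressed else 0.0
    return suppressed, penalty
```

Because the penalty scales with the score, a thought that fails badly costs the policy more than one that only just crosses the threshold.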

PPO Hyperparameters

Gamma: 0.95 (discount factor)
Clip: 0.2 (policy update clipping)
Entropy coefficient: 0.999 → 0.005 (the entropy bonus discourages premature convergence and decays over training)
Training: during consolidation phases only
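To show where gamma and the clip value enter, here is a minimal sketch of two standard PPO building blocks: discounted returns and the clipped surrogate objective for a single sample. These are the textbook formulas, not code taken from `services/subconscious.py`, and the function names are hypothetical.

```python
import math

GAMMA = 0.95  # from the text: discount factor
CLIP = 0.2    # from the text: policy update clipping


def discounted_returns(rewards, gamma=GAMMA):
    """Compute G_t = r_t + gamma * G_{t+1}, working backwards through the episode."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_logp, old_logp, advantage, clip=CLIP):
    """Standard PPO clipped objective for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a clip of 0.2, a single update gains nothing from moving an action's probability ratio outside the range 0.8 to 1.2, which keeps each consolidation-phase update close to the policy that generated the data.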
