Subconscious

The Subconscious (services/subconscious.py, ~900 lines) decides what kind of thought Frank has during idle time. It's a 19K-parameter Actor-Critic neural network trained via PPO (Proximal Policy Optimization).

20 Thought Categories

Category                 What It Produces
conversation_reflection  Analysis of recent conversations
room_reflection          Thoughts about entity room sessions
identity                 Self-examination: who am I?
feelings                 Processing the current emotional state
relationships            Thoughts about the user and entities
growth                   Learning and development reflections
curiosity                Exploring interesting topics
discomfort               Processing negative states
dreams                   Dream-like associative thinking
epq                      Personality self-assessment
daily                    Mundane observations about the day
raw_expression           Unstructured creative output
hypothesis_review        Reviewing pending hypotheses
world_curiosity          Questions about the external world
experiment_planning      Designing new experiments
creative_ideation        Generating creative ideas
web_research             Wanting to look something up
pip_companion            Thoughts about Frank's companion Pip
genesis_ideation         Creative writing/art ideas
aura                     (deprecated, weight=0)
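A minimal Python sketch of how the category list and the zero weight on aura could be represented. The names CATEGORIES, CATEGORY_WEIGHTS, apply_category_weights and sample_category are hypothetical, and the assumption that weights multiply the policy's probabilities before renormalising is ours; the article only states that aura has weight 0.

```python
import random

# Hypothetical constant: the 20 categories in the order listed in the table.
CATEGORIES = (
    "conversation_reflection", "room_reflection", "identity", "feelings",
    "relationships", "growth", "curiosity", "discomfort", "dreams", "epq",
    "daily", "raw_expression", "hypothesis_review", "world_curiosity",
    "experiment_planning", "creative_ideation", "web_research",
    "pip_companion", "genesis_ideation", "aura",
)

# Assumption: every category has weight 1.0 except the deprecated "aura".
CATEGORY_WEIGHTS = {name: 1.0 for name in CATEGORIES}
CATEGORY_WEIGHTS["aura"] = 0.0


def apply_category_weights(probs):
    """Scale policy probabilities by category weight and renormalise to sum to 1."""
    weighted = [p * CATEGORY_WEIGHTS[name] for p, name in zip(probs, CATEGORIES)]
    total = sum(weighted)
    return [w / total for w in weighted]


def sample_category(probs, rng=random):
    """Draw one category name; a zero-weight category can never be drawn."""
    return rng.choices(CATEGORIES, weights=apply_category_weights(probs), k=1)[0]
```

Renormalising after masking keeps a valid distribution, so a deprecated category can stay in the 20-slot output without ever being selected.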

How It Works

The Actor network receives a 100-dimensional state vector that includes:

  • Current mood, energy, time of day
  • Recent thought type distribution (which categories were used recently)
  • 20-dimensional Nucleus Accumbens (NAc) reward histogram (which channels produced reward)
  • Boredom signal from the NAc

The Actor outputs a probability distribution over the 20 categories, and the Critic estimates the value of the current state. During Dream Daemon consolidation, PPO training adjusts the weights according to which thought types led to positive outcomes, as measured by NAc reward signals.
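The forward pass described above can be sketched in plain Python. The layer layout (one shared tanh hidden layer of 160 units feeding a 20-way softmax actor head and a scalar critic head) is an assumption, chosen only so the parameter count lands near the stated 19K; the article does not document the actual layers, and the class and helper names are hypothetical.

```python
import math
import random

STATE_DIM = 100      # from the article: 100-dimensional state vector
N_CATEGORIES = 20    # from the article: 20 thought categories
HIDDEN = 160         # assumption: sized so the total is roughly 19K parameters


def init_layer(n_in, n_out, rng):
    """Create a dense layer as (weights[n_out][n_in], biases[n_out])."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Compute W @ x + b for one dense layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCritic:
    """Shared tanh trunk with a softmax actor head and a scalar critic head (assumed layout)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.trunk = init_layer(STATE_DIM, HIDDEN, rng)
        self.actor_head = init_layer(HIDDEN, N_CATEGORIES, rng)
        self.critic_head = init_layer(HIDDEN, 1, rng)

    def param_count(self):
        layers = (self.trunk, self.actor_head, self.critic_head)
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

    def forward(self, state):
        """Return (category probabilities, state value) for one state vector."""
        if len(state) != STATE_DIM:
            raise ValueError(f"expected {STATE_DIM} features, got {len(state)}")
        hidden = [math.tanh(h) for h in linear(state, self.trunk)]
        probs = softmax(linear(hidden, self.actor_head))
        value = linear(hidden, self.critic_head)[0]
        return probs, value
```

Usage: `probs, value = ActorCritic(seed=0).forward(state)` with a 100-element list. `probs` is the distribution a category sampler would draw from, and `value` is the Critic's estimate that PPO uses when computing advantages. With this layout, `param_count()` returns 19541.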

Cold Start

For the first 500 training steps, the network's output is blended with uniform random selection: the uniform share is 50% at step 0 and decreases linearly to 0% at step 500. This keeps the network from collapsing onto the most-rewarded category before it has enough data to make informed decisions.
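A sketch of the warm-up schedule, assuming the blend is a convex mix of the Actor's probabilities with a uniform distribution over the categories. The function and parameter names are hypothetical.

```python
def cold_start_blend(policy_probs, step, warmup_steps=500, initial_uniform_share=0.5):
    """Mix the policy output with a uniform distribution during warm-up.

    The uniform share falls linearly from initial_uniform_share at step 0
    to 0 at warmup_steps; from then on the policy is used unmodified.
    """
    share = initial_uniform_share * max(0.0, 1.0 - step / warmup_steps)
    uniform = 1.0 / len(policy_probs)
    return [(1.0 - share) * p + share * uniform for p in policy_probs]
```

At step 250 the uniform share is 25%, so a policy that puts all of its mass on one of four categories yields 0.8125 for that category instead of 1.0. Because both inputs are distributions, the blend always sums to 1.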

Hallucination Filter

Every thought passes a two-stage gate:

  1. Pre-input: verified memory curation, so the thought is built only from real facts stored in the databases rather than hallucinated context
  2. Post-output: 7 reality checks on the generated thought (factual consistency, self-consistency, temporal plausibility, etc.)

A thought scoring 0.6 or higher is suppressed. A penalty of -3.0 × score feeds back into PPO training, teaching the network to avoid thought types that produce hallucinations.
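The suppression rule and penalty can be sketched as follows. Two points are assumptions not stated in the article: that each of the seven checks yields a value in [0, 1] (higher meaning more likely hallucinated) and the overall score is their mean, and that the -3.0 × score penalty is applied only to suppressed thoughts. The names are hypothetical.

```python
SUPPRESS_THRESHOLD = 0.6  # from the article: score >= 0.6 is suppressed
PENALTY_SCALE = -3.0      # from the article: penalty = -3.0 x score
N_CHECKS = 7              # from the article: seven post-output reality checks


def gate_thought(check_scores):
    """Return (suppressed, score, reward_adjustment) for one generated thought.

    Assumption: the overall score is the mean of the seven check scores.
    """
    if len(check_scores) != N_CHECKS:
        raise ValueError(f"expected {N_CHECKS} check scores, got {len(check_scores)}")
    score = sum(check_scores) / N_CHECKS
    suppressed = score >= SUPPRESS_THRESHOLD
    # Assumption: only suppressed thoughts feed a penalty back into PPO training.
    reward_adjustment = PENALTY_SCALE * score if suppressed else 0.0
    return suppressed, score, reward_adjustment
```

For example, four failed checks out of seven give a score of about 0.57 and the thought passes, while five failures give about 0.71, suppress the thought and add roughly -2.14 to the reward for that category.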

PPO Hyperparameters

  • Gamma: 0.95 (discount factor)
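  • Clip: 0.2 (clipping range for the ratio between new and old policy probabilities)
  • Entropy coefficient: 0.999 → 0.005 (starts high to keep category choice exploratory and prevent premature convergence, then decays so the policy can settle)
  • Training: only during Dream Daemon consolidation phases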
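These hyperparameters map onto standard PPO ingredients: gamma discounts future rewards, the clip bounds the probability ratio between the new and old policy, and the entropy coefficient weights an exploration bonus. The sketch below shows each piece for a single sample; it is not the Subconscious training loop. The linear shape of the entropy decay and all function names are assumptions.

```python
import math

GAMMA = 0.95      # discount factor
CLIP_EPS = 0.2    # clipping range for the probability ratio
ENTROPY_START, ENTROPY_END = 0.999, 0.005


def discounted_returns(rewards, gamma=GAMMA):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_prob, old_prob, advantage, clip_eps=CLIP_EPS):
    """PPO objective for one action: min(ratio * A, clip(ratio) * A)."""
    ratio = new_prob / old_prob
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def entropy(probs):
    """Shannon entropy (nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_coef(step, total_steps, start=ENTROPY_START, end=ENTROPY_END):
    """Assumption: the coefficient decays linearly from start to end."""
    frac = min(step / total_steps, 1.0)
    return start + (end - start) * frac


def ppo_sample_loss(new_prob, old_prob, advantage, probs, step, total_steps):
    """Loss to minimise: the negated clipped surrogate plus entropy bonus."""
    objective = clipped_surrogate(new_prob, old_prob, advantage)
    objective += entropy_coef(step, total_steps) * entropy(probs)
    return -objective
```

Taking the minimum of the clipped and unclipped terms makes each update pessimistic: with a ratio of 1.8 and a clip of 0.2, a positive advantage is credited at most 1.2×, while a negative advantage keeps the full 1.8×, so a harmful policy change is never understated.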