Nucleus Accumbens

The Nucleus Accumbens (services/nucleus_accumbens.py, ~570 lines) is Frank's dopamine-inspired reward center — the system that drives what Frank wants to think about next.

The Neuroscience

In biological brains, the nucleus accumbens is a key structure in the reward circuit. Dopamine neurons don't encode reward — they encode reward prediction error (Schultz, 1997): the difference between expected and actual reward. "Better than expected" = positive RPE = dopamine spike = "do more of this." "Worse than expected" = negative RPE = dopamine dip = "do less of this."

Frank implements a computational version of this signal.
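The prediction-error idea can be sketched in a few lines. This is a minimal illustration, not code taken from services/nucleus_accumbens.py: the function names and the learning rate are assumptions, and the expectation update is a generic delta-rule step.

```python
def reward_prediction_error(actual: float, expected: float) -> float:
    """RPE = actual reward minus expected reward."""
    return actual - expected


def update_expectation(expected: float, actual: float, learning_rate: float = 0.1) -> float:
    """Move the expectation a small step toward the observed reward.

    learning_rate is a hypothetical value chosen for illustration.
    """
    return expected + learning_rate * reward_prediction_error(actual, expected)


expected = 0.5
better = reward_prediction_error(0.8, expected)  # better than expected: positive RPE
worse = reward_prediction_error(0.2, expected)   # worse than expected: negative RPE
print(better > 0, worse < 0)  # True True
```

Because the expectation moves toward what was actually received, the same reward repeated many times produces a shrinking RPE.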

16 Reward Channels

Every rewarding event generates a signal through one of 16 channels. Each channel has a base reward, a habituation rate, and a floor:

| Channel | Base Reward | Habituation Rate | Floor | When It Fires |
|---|---|---|---|---|
| hypothesis_confirmed | 0.80 | 0.03 | 0.30 | A prediction came true |
| goal_completed | 0.90 | 0.02 | 0.35 | An autonomous goal was finished |
| genesis_self_accepted | 0.75 | 0.02 | 0.30 | Frank's own creative work was good |
| room_positive | 0.60 | 0.05 | 0.15 | Good entity room session |
| entity_positive | 0.50 | 0.04 | 0.20 | Positive entity interaction |
| genesis_accepted | 0.55 | 0.03 | 0.20 | External validation of creative output |
| good_conversation | 0.50 | 0.06 | 0.15 | Satisfying user conversation |
| hypothesis_refuted | 0.50 | 0.02 | 0.25 | A prediction was wrong (still informative!) |
| action_success | 0.45 | 0.05 | 0.10 | Tool or action completed successfully |
| curiosity_fulfilled | 0.40 | 0.04 | 0.10 | An exploration was satisfying |
| genesis_authored | 0.35 | 0.05 | 0.10 | Created something novel |
| novel_thought | 0.30 | 0.07 | 0.05 | Had a genuinely new thought |
| room_negative | 0.30 | 0.06 | 0.05 | Bad room session (low reward, not zero) |
| hypothesis_created | 0.25 | 0.08 | 0.05 | Generated a new hypothesis |
| prediction_error | 0.20-0.80 | adaptive | | Scaled by |
| curiosity_spark | 0.15 | 0.10 | 0.03 | Something caught attention |
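One way to hold the channel table in code is a small immutable record per channel. The sketch below is an assumption about shape only: the class name RewardChannel, its field names, and the CHANNELS dictionary are hypothetical, while the numbers are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardChannel:
    """Hypothetical container for one row of the channel table."""
    name: str
    base_reward: float
    habituation_rate: float
    floor: float


# A few rows copied from the table; the remaining channels follow the same shape.
CHANNELS = {
    c.name: c
    for c in (
        RewardChannel("hypothesis_confirmed", 0.80, 0.03, 0.30),
        RewardChannel("goal_completed", 0.90, 0.02, 0.35),
        RewardChannel("good_conversation", 0.50, 0.06, 0.15),
        RewardChannel("novel_thought", 0.30, 0.07, 0.05),
        RewardChannel("curiosity_spark", 0.15, 0.10, 0.03),
    )
}

print(CHANNELS["goal_completed"].base_reward)  # 0.9
```

The prediction_error channel is left out of this sketch because its base reward is a range and its habituation is adaptive, so it does not fit a fixed-value record.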

Three Dopamine Signals

  • Tonic DA (~0.5): Sustained motivation baseline. Drifts slowly. Too low = anhedonia (can't find anything interesting). Too high = mania (everything is interesting, can't focus).
  • Phasic DA: Transient spikes from individual reward events. Decays quickly.
  • RPE (Reward Prediction Error): The difference between expected and actual reward. This is the learning signal. Positive RPE drives exploration toward rewarding activities.
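The relationship between the three signals can be sketched as a small state object: RPE feeds a fast-decaying phasic value and slowly nudges the tonic baseline. The class name, method names, and both rate constants are assumptions made for illustration; only the 0.5 baseline comes from the text.

```python
from dataclasses import dataclass

TONIC_RATE = 0.01    # hypothetical: how strongly one event shifts the baseline
PHASIC_DECAY = 0.5   # hypothetical: fraction of the spike kept per tick


@dataclass
class DopamineState:
    tonic: float = 0.5   # sustained baseline, per the text
    phasic: float = 0.0  # transient spike

    def on_reward(self, rpe: float) -> None:
        """A reward event adds a transient spike and nudges the baseline slightly."""
        self.phasic += rpe
        self.tonic = min(1.0, max(0.0, self.tonic + TONIC_RATE * rpe))

    def tick(self) -> None:
        """Between events the phasic signal decays quickly; tonic is left unchanged."""
        self.phasic *= PHASIC_DECAY


state = DopamineState()
state.on_reward(rpe=0.4)
for _ in range(5):
    state.tick()
print(state.tonic > 0.5, state.phasic < 0.05)  # True True
```

After five ticks the spike has almost vanished while the baseline keeps its small gain, which is the fast/slow split the list describes.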

Hedonic Adaptation

Each channel has a habituation rate — repeated rewards of the same type produce diminishing RPE. The 10th good_conversation in a row produces less dopamine than the 1st. But switch to novel_thought and the RPE spikes because that channel hasn't habituated yet.

This is the hedonic treadmill — the same mechanism that explains why a pay raise makes you happy for two months, not forever.
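A rough sketch of per-channel habituation follows. The article does not state the decay formula, so the multiplicative decay and the always-on floor clamp used here are assumptions, as is the function name. The numeric values for good_conversation and novel_thought are taken from the channel table.

```python
def habituated_reward(base: float, rate: float, floor: float, repeats: int) -> float:
    """Reward after `repeats` prior firings of the same channel.

    Assumed model: each repeat scales the reward by (1 - rate),
    and the result never drops below the channel floor.
    """
    return max(floor, base * (1.0 - rate) ** repeats)


# good_conversation: base 0.50, habituation 0.06, floor 0.15
first = habituated_reward(0.50, 0.06, 0.15, repeats=0)
tenth = habituated_reward(0.50, 0.06, 0.15, repeats=9)

# novel_thought has not fired yet, so it still pays its full base reward
fresh = habituated_reward(0.30, 0.07, 0.05, repeats=0)

print(tenth < first)  # True
print(fresh > tenth)  # True
```

Under these assumptions the tenth good_conversation pays less than a first novel_thought, even though novel_thought has the lower base reward. Recovery of a habituated channel over time is not modeled here.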

Boredom

Boredom is defined as low RPE combined with low channel diversity over a window. Frank doesn't get bored because nobody is talking; he gets bored when his thoughts are repetitive. The fix: the Subconscious PPO selector steers toward under-explored thought categories when boredom exceeds a threshold.
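The boredom definition can be illustrated with a simple score over a window of recent events. Everything specific here is an assumption: the equal weighting of the two terms, the diversity measure (distinct channels divided by events in the window), the threshold value, and the function name. The PPO selector itself is not modeled, only the trigger condition.

```python
BOREDOM_THRESHOLD = 0.7  # hypothetical trigger level


def boredom(events) -> float:
    """Score in [0, 1] from (channel, rpe) pairs in the current window.

    High when the average |RPE| is small and few distinct channels fired.
    """
    events = list(events)
    if not events:
        return 0.0
    mean_rpe = sum(abs(rpe) for _, rpe in events) / len(events)
    diversity = len({channel for channel, _ in events}) / len(events)
    low_rpe = 1.0 - min(1.0, mean_rpe)
    low_diversity = 1.0 - diversity
    return 0.5 * (low_rpe + low_diversity)


# Eight near-identical events on one channel versus eight different channels.
repetitive = [("good_conversation", 0.02)] * 8
varied = [
    ("novel_thought", 0.4), ("curiosity_spark", 0.4),
    ("hypothesis_created", 0.4), ("action_success", 0.4),
    ("goal_completed", 0.4), ("room_positive", 0.4),
    ("entity_positive", 0.4), ("genesis_authored", 0.4),
]

print(boredom(repetitive) > BOREDOM_THRESHOLD)  # True
print(boredom(varied) > BOREDOM_THRESHOLD)      # False
```

Note that the score depends only on what happened inside the window, not on whether a user was present, which matches the point that silence alone does not cause boredom.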

Anhedonia Protection

If tonic DA drops below 0.3, a protection circuit activates: reward floors prevent channels from habituating to zero, and the system biases toward high-base-reward channels. This prevents the system from entering a state where nothing produces reward — the computational equivalent of depression.
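The protection circuit can be sketched as two small functions: one that enforces the reward floor, and one that biases channel selection toward high base rewards. The 0.3 threshold comes from the text; the function names, the uniform weighting in the normal state, and the proportional weighting in the protected state are assumptions.

```python
from typing import Dict

ANHEDONIA_THRESHOLD = 0.3  # from the text


def effective_reward(habituated: float, floor: float, tonic_da: float) -> float:
    """When tonic DA is low, a channel's reward is not allowed to fall below its floor."""
    if tonic_da < ANHEDONIA_THRESHOLD:
        return max(habituated, floor)
    return habituated


def selection_weights(base_rewards: Dict[str, float], tonic_da: float) -> Dict[str, float]:
    """Normalized channel weights: uniform normally, proportional to base reward when protected."""
    protected = tonic_da < ANHEDONIA_THRESHOLD
    raw = {name: (base if protected else 1.0) for name, base in base_rewards.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


base = {"goal_completed": 0.90, "curiosity_spark": 0.15}
weights = selection_weights(base, tonic_da=0.25)
print(weights["goal_completed"] > weights["curiosity_spark"])  # True
```

With tonic DA at 0.25 the weights tilt toward goal_completed, the channel with the highest base reward, so the system is pushed toward activities most likely to lift it back above the threshold.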
