The Nucleus Accumbens (services/nucleus_accumbens.py, ~570 lines) is Frank's dopamine-inspired reward center — the system that drives what Frank wants to think about next.
The Neuroscience
In biological brains, the nucleus accumbens is a key structure in the reward circuit. Dopamine neurons don't encode reward — they encode reward prediction error (Schultz, 1997): the difference between expected and actual reward. "Better than expected" = positive RPE = dopamine spike = "do more of this." "Worse than expected" = negative RPE = dopamine dip = "do less of this."
Frank implements exactly this.
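As a minimal sketch of the signal described above, the prediction error is the actual reward minus the expected reward. The function names, the delta-rule update of the expectation, and the learning rate of 0.1 are illustrative assumptions; the text does not say how `services/nucleus_accumbens.py` tracks expectations.

```python
def reward_prediction_error(actual: float, expected: float) -> float:
    """Return the RPE: positive when the outcome beat the expectation."""
    return actual - expected


def update_expectation(expected: float, actual: float, learning_rate: float = 0.1) -> float:
    """Nudge the expectation toward the observed reward (simple delta rule).

    The learning rate of 0.1 is an illustrative assumption, not a source value.
    """
    return expected + learning_rate * reward_prediction_error(actual, expected)


expected = 0.5
rpe = reward_prediction_error(actual=0.8, expected=expected)  # better than expected
expected = update_expectation(expected, actual=0.8)           # expectation rises a little
```

Because the expectation moves toward what was observed, the same reward produces a smaller RPE the next time it arrives.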
16 Reward Channels
Every interaction generates a signal through one of 16 channels:
| Channel | Base Reward | Habituation | Floor | When It Fires |
|---|---|---|---|---|
| hypothesis_confirmed | 0.80 | 0.03 | 0.30 | A prediction came true |
| goal_completed | 0.90 | 0.02 | 0.35 | An autonomous goal was finished |
| genesis_self_accepted | 0.75 | 0.02 | 0.30 | Frank's own creative work was good |
| room_positive | 0.60 | 0.05 | 0.15 | Good entity room session |
| entity_positive | 0.50 | 0.04 | 0.20 | Positive entity interaction |
| genesis_accepted | 0.55 | 0.03 | 0.20 | External validation of creative output |
| good_conversation | 0.50 | 0.06 | 0.15 | Satisfying user conversation |
| hypothesis_refuted | 0.50 | 0.02 | 0.25 | A prediction was wrong (still informative!) |
| action_success | 0.45 | 0.05 | 0.10 | Tool or action completed successfully |
| curiosity_fulfilled | 0.40 | 0.04 | 0.10 | An exploration was satisfying |
| genesis_authored | 0.35 | 0.05 | 0.10 | Created something novel |
| novel_thought | 0.30 | 0.07 | 0.05 | Had a genuinely new thought |
| room_negative | 0.30 | 0.06 | 0.05 | Bad room session (low reward, not zero) |
| hypothesis_created | 0.25 | 0.08 | 0.05 | Generated a new hypothesis |
| prediction_error | 0.20-0.80 | adaptive | — | Scaled by |
| curiosity_spark | 0.15 | 0.10 | 0.03 | Something caught attention |
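One way to hold these rows in code is a small immutable record per channel. The sketch below copies a subset of the values from the table; the `RewardChannel` class and the `CHANNELS` mapping are hypothetical names chosen for illustration and may not match the actual module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardChannel:
    name: str
    base_reward: float
    habituation: float
    floor: float


# A subset of the table above; the numbers are copied from it.
CHANNELS = {
    c.name: c
    for c in (
        RewardChannel("goal_completed", 0.90, 0.02, 0.35),
        RewardChannel("hypothesis_confirmed", 0.80, 0.03, 0.30),
        RewardChannel("good_conversation", 0.50, 0.06, 0.15),
        RewardChannel("novel_thought", 0.30, 0.07, 0.05),
        RewardChannel("curiosity_spark", 0.15, 0.10, 0.03),
    )
}

# Channels ordered from most to least rewarding at baseline.
by_base_reward = sorted(CHANNELS.values(), key=lambda c: c.base_reward, reverse=True)
```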
Three Dopamine Signals
- Tonic DA (~0.5): Sustained motivation baseline. Drifts slowly. Too low = anhedonia (can't find anything interesting). Too high = mania (everything is interesting, can't focus).
- Phasic DA: Transient spikes from individual reward events. Decays quickly.
- RPE (Reward Prediction Error): The difference between expected and actual reward. This is the learning signal. Positive RPE drives exploration toward rewarding activities.
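The relationship between the three signals can be sketched as a small state object: each RPE adds a phasic spike that fades quickly, while the tonic baseline moves only slightly. The class name and both constants (`PHASIC_DECAY`, `TONIC_DRIFT`) are assumptions made for illustration; only the baseline of roughly 0.5 comes from the text.

```python
from dataclasses import dataclass

PHASIC_DECAY = 0.5   # assumed: fraction of phasic DA kept per tick (fast decay)
TONIC_DRIFT = 0.01   # assumed: how strongly one event moves the baseline (slow drift)


@dataclass
class DopamineState:
    tonic: float = 0.5   # baseline of roughly 0.5, as stated in the text
    phasic: float = 0.0

    def on_reward(self, rpe: float) -> None:
        """Register one reward event through its prediction error."""
        self.phasic += rpe
        drifted = self.tonic + TONIC_DRIFT * rpe
        self.tonic = min(1.0, max(0.0, drifted))  # keep the baseline in [0, 1]

    def tick(self) -> None:
        """Advance one time step: the transient spike fades, the baseline stays."""
        self.phasic *= PHASIC_DECAY


state = DopamineState()
state.on_reward(rpe=0.4)  # phasic jumps, tonic barely moves
state.tick()              # phasic halves, tonic is unchanged
```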
Hedonic Adaptation
Each channel has a habituation rate — repeated rewards of the same type produce diminishing RPE. The 10th good_conversation in a row produces less dopamine than the 1st. But switch to novel_thought and the RPE spikes because that channel hasn't habituated yet.
This is the hedonic treadmill — the same mechanism that explains why a pay raise makes you happy for two months, not forever.
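A rough sketch of per-channel habituation, using the good_conversation and novel_thought rows from the table. The multiplicative decay is an assumed model, since the text gives the rates but not the formula. Whether the floor applies always or only under anhedonia protection is also not stated, so it is an optional parameter here.

```python
def habituated_reward(base: float, rate: float, repeats: int, floor: float = 0.0) -> float:
    """Reward after `repeats` earlier consecutive firings of the same channel.

    Assumed model: each repeat multiplies the reward by (1 - rate),
    and the result never drops below `floor`.
    """
    return max(floor, base * (1.0 - rate) ** repeats)


# good_conversation: base 0.50, habituation 0.06
first = habituated_reward(0.50, 0.06, repeats=0)
tenth = habituated_reward(0.50, 0.06, repeats=9)

# novel_thought has not fired yet, so it still pays its full base reward
fresh = habituated_reward(0.30, 0.07, repeats=0)
```

Under these assumptions the tenth good_conversation pays less than a first novel_thought, even though its base reward is higher. That is the channel-switching effect described above.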
Boredom
Boredom is defined as low RPE combined with low channel diversity over a recent window. Frank doesn't get bored because nobody is talking; he gets bored when his thoughts are repetitive. The fix: the Subconscious PPO selector steers toward under-explored thought categories when boredom exceeds a threshold.
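One plausible way to turn that definition into a number is to multiply a "low RPE" term by a "low diversity" term, measuring diversity as normalized Shannon entropy over the channels seen in the window. The scoring formula, the function names, and the use of absolute RPE are assumptions made for illustration; the text does not specify how the real module computes boredom.

```python
import math
from collections import Counter

NUM_CHANNELS = 16  # channel count from the text


def channel_diversity(channels: list[str]) -> float:
    """Shannon entropy of channel usage, normalized to [0, 1]."""
    if not channels:
        return 0.0
    total = len(channels)
    counts = Counter(channels)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return min(1.0, entropy / math.log(NUM_CHANNELS))


def boredom(window: list[tuple[str, float]]) -> float:
    """Boredom in [0, 1]: high when recent RPE is small AND channels repeat."""
    if not window:
        return 0.0
    mean_abs_rpe = sum(abs(rpe) for _, rpe in window) / len(window)
    low_rpe = 1.0 - min(1.0, mean_abs_rpe)
    low_diversity = 1.0 - channel_diversity([name for name, _ in window])
    return low_rpe * low_diversity


# Each entry is (channel name, RPE) for one recent event.
repetitive = [("good_conversation", 0.02)] * 10
varied = [("good_conversation", 0.3), ("novel_thought", 0.4),
          ("curiosity_spark", 0.2), ("goal_completed", 0.5)]
```

Here `boredom(repetitive)` comes out close to 1 and `boredom(varied)` is much lower, so a threshold between the two would trigger steering only in the repetitive case.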
Anhedonia Protection
If tonic DA drops below 0.3, a protection circuit activates: reward floors prevent channels from habituating to zero, and the system biases toward high-base-reward channels. This prevents the system from entering a state where nothing produces reward — the computational equivalent of depression.
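The protection circuit can be sketched as two small functions: one clamps a habituated reward to its channel floor, the other skews channel selection toward high base rewards. The 0.3 threshold comes from the text. The function names, the `sharpness` exponent, and the uniform weighting outside protection are hypothetical choices for illustration.

```python
ANHEDONIA_THRESHOLD = 0.3  # tonic DA level from the text


def protection_active(tonic_da: float) -> bool:
    return tonic_da < ANHEDONIA_THRESHOLD


def protected_reward(habituated: float, floor: float, tonic_da: float) -> float:
    """Clamp a habituated reward to its channel floor while protection is on."""
    if protection_active(tonic_da):
        return max(floor, habituated)
    return habituated


def selection_weights(base_rewards: dict[str, float], tonic_da: float,
                      sharpness: float = 2.0) -> dict[str, float]:
    """Normalized channel weights, skewed toward high base reward under protection.

    `sharpness` is an assumed knob: larger values favor high-base channels more.
    """
    if protection_active(tonic_da):
        raw = {name: base ** sharpness for name, base in base_rewards.items()}
    else:
        raw = {name: 1.0 for name in base_rewards}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


bases = {"goal_completed": 0.90, "curiosity_spark": 0.15}
low = selection_weights(bases, tonic_da=0.2)     # protection on: goal_completed dominates
normal = selection_weights(bases, tonic_da=0.5)  # protection off: uniform weights
```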