Neural Immune System

The Neural Immune System (services/neural_immune/, 9 files, ~1,800 lines) is a self-learning service supervisor: three micro neural networks (~18.8K parameters total) that learn what "normal" looks like for each service and detect deviations from it.

What It Monitors

The immune system tracks every Frank service across six metrics: restart frequency, RSS memory, CPU usage, response time, port availability, and error rate. Unlike a traditional watchdog, which only checks "is it running?", the immune system checks "is it behaving normally?"

3 Neural Networks

Network	Parameters	Function
Anomaly Detector	~7K	Classifies service behavior as normal/anomalous based on learned patterns
Pattern Classifier	~6K	Categorizes anomaly types: memory leak, restart loop, port conflict, latency spike
Severity Estimator	~5.8K	Scores how serious an anomaly is on a 0-1 scale; the score drives response priority
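
To give a sense of how small these networks are, the sketch below counts the parameters of a fully connected network. The architecture is not stated here, so the layer sizes are purely hypothetical; they were picked only because they land near the ~7K figure quoted for the Anomaly Detector.

```python
from typing import Sequence


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Count weights plus biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# Hypothetical layer sizes (NOT the real architecture): 24 inputs, 1 output.
hypothetical_layers = [24, 64, 64, 16, 1]
print(mlp_param_count(hypothetical_layers))
```

A network of this size fits in a few tens of kilobytes as 32-bit floats, which is consistent with a supervisor that has to run under a tight resource budget.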

Service Registry

Each service has a known-good profile: expected port, typical RSS memory range, normal restart frequency, acceptable CPU%. The immune system compares real-time metrics against these baselines and flags deviations.
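
A minimal sketch of such a baseline comparison follows. The class names, field names, thresholds, and the example service are all assumptions made for illustration; the real registry may store and compare profiles differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceProfile:
    """Known-good baseline for one service (field names are illustrative)."""
    name: str
    port: int
    rss_mb_range: Tuple[float, float]  # typical (min, max) RSS in MB
    max_restarts_per_hour: int
    max_cpu_percent: float


@dataclass(frozen=True)
class ServiceSample:
    """One real-time measurement of a service."""
    port_open: bool
    rss_mb: float
    restarts_last_hour: int
    cpu_percent: float


def find_deviations(profile: ServiceProfile, sample: ServiceSample) -> List[str]:
    """Return one human-readable flag per metric outside the baseline."""
    flags = []
    if not sample.port_open:
        flags.append(f"port {profile.port} unavailable")
    low, high = profile.rss_mb_range
    if not low <= sample.rss_mb <= high:
        flags.append(f"RSS {sample.rss_mb:.0f} MB outside {low:.0f}-{high:.0f} MB")
    if sample.restarts_last_hour > profile.max_restarts_per_hour:
        flags.append(f"{sample.restarts_last_hour} restarts in the last hour")
    if sample.cpu_percent > profile.max_cpu_percent:
        flags.append(f"CPU {sample.cpu_percent:.0f}% above {profile.max_cpu_percent:.0f}%")
    return flags


# Hypothetical values, not taken from the real registry.
profile = ServiceProfile("example-service", 8101, (200.0, 900.0), 2, 60.0)
sample = ServiceSample(port_open=True, rss_mb=1400.0, restarts_last_hour=5, cpu_percent=35.0)
for flag in find_deviations(profile, sample):
    print(flag)
```

Fixed thresholds like these only cover the static part of the comparison; learned notions of "normal" would come from the networks described above.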

Real-World Catches

In evaluation sessions, the immune system detected:

llama3-gpu: 303 restarts in 51 minutes (scored 1.0 severity — maximum)
aura-headless: 421 restart attempts (zombie port conflict with Swarm)
Memory leaks: Gradual RSS growth in services using importlib.reload()
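
To put the llama3-gpu figure in perspective, the short calculation below converts "303 restarts in 51 minutes" into a rate. The helper name is hypothetical; only the two input numbers come from the list above.

```python
def restart_stats(restarts: int, window_minutes: float):
    """Return (restarts per minute, mean seconds between restarts)."""
    per_minute = restarts / window_minutes
    seconds_between = window_minutes * 60 / restarts
    return per_minute, seconds_between


per_minute, gap = restart_stats(303, 51)
print(f"{per_minute:.2f} restarts/min, one every {gap:.1f} s")
```

That is roughly six restarts per minute, or one about every ten seconds, sustained for almost an hour.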

Resource Constraints

The immune system itself runs under strict limits:

CPUQuota=5%: the supervisor must stay lightweight
MemoryMax=300M: hard memory cap
Type=notify with sd_notify: the service reports its own readiness to systemd
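
With Type=notify, systemd waits for the service to send READY=1 over the socket named in the NOTIFY_SOCKET environment variable. Whether the immune system uses a library or its own helper for this is not stated here; the sketch below is a minimal pure-Python version of the protocol, and the function name and status text are illustrative.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset (not running under systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message.encode())
    return True


# Typical calls for a Type=notify unit; both are no-ops outside systemd.
sd_notify("READY=1")                      # startup finished
sd_notify("STATUS=monitoring services")   # free-form status line
```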

sd_notify Fix

All subprocess calls use _CLEAN_ENV, a copy of the environment with the NOTIFY_SOCKET variable stripped, so that child processes cannot inherit the notification socket and steal the parent's PID notification. Without this fix, systemd thought the service had died whenever a child process ran.
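
A minimal sketch of the pattern follows. Only the name _CLEAN_ENV comes from the text above; the helper functions and the example child command are assumptions for illustration.

```python
import os
import subprocess
import sys


def build_clean_env() -> dict:
    """Copy the current environment without NOTIFY_SOCKET."""
    return {k: v for k, v in os.environ.items() if k != "NOTIFY_SOCKET"}


# Built once at import time and reused for every child process.
_CLEAN_ENV = build_clean_env()


def run_child(cmd: list) -> subprocess.CompletedProcess:
    """Run a child that cannot see the parent's notification socket."""
    return subprocess.run(cmd, env=_CLEAN_ENV, capture_output=True, text=True, check=False)


# The child reports whether NOTIFY_SOCKET is visible in its environment.
result = run_child([sys.executable, "-c", "import os; print('NOTIFY_SOCKET' in os.environ)"])
print(result.stdout.strip())
```

Passing env= replaces the child's whole environment, so the copy keeps every other variable (PATH, HOME, and so on) and drops only the one that matters to systemd.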