The Neural Immune System (services/neural_immune/, 9 files, ~1,800 lines) is a self-learning service supervisor — 3 micro neural networks (~18.8K parameters total) that learn what "normal" looks like and detect deviations.
What It Monitors
Every Frank service is tracked: restart frequency, RSS memory, CPU usage, response times, port availability, error rates. Unlike a traditional watchdog (which checks "is it running?"), the immune system checks "is it behaving normally?"
3 Neural Networks
| Network | Parameters | Function |
|---|---|---|
| Anomaly Detector | ~7K | Classifies service behavior as normal/anomalous based on learned patterns |
| Pattern Classifier | ~6K | Categorizes anomaly types: memory leak, restart loop, port conflict, latency spike |
| Severity Estimator | ~5.8K | Scores 0-1 how serious an anomaly is — drives response priority |
Service Registry
Each service has a known-good profile: expected port, typical RSS memory range, normal restart frequency, acceptable CPU%. The immune system compares real-time metrics against these baselines and flags deviations.
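The baseline comparison described above can be sketched as a plain threshold check. This is a minimal illustration only: the class names, field names, and example values are hypothetical and not taken from services/neural_immune/, and the real system layers learned models on top of checks like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProfile:
    """Known-good baseline for one service (all field names are hypothetical)."""
    name: str
    port: int
    rss_mb_range: tuple[float, float]  # typical resident memory, in MiB
    max_restarts_per_hour: int
    max_cpu_percent: float


@dataclass(frozen=True)
class ServiceSample:
    """One real-time measurement of a service."""
    port_open: bool
    rss_mb: float
    restarts_last_hour: int
    cpu_percent: float


def find_deviations(profile: ServiceProfile, sample: ServiceSample) -> list[str]:
    """Return a human-readable reason for every metric outside its baseline."""
    reasons = []
    if not sample.port_open:
        reasons.append(f"port {profile.port} is not accepting connections")
    low, high = profile.rss_mb_range
    if not low <= sample.rss_mb <= high:
        reasons.append(f"RSS {sample.rss_mb:.0f} MiB outside {low:.0f}-{high:.0f} MiB")
    if sample.restarts_last_hour > profile.max_restarts_per_hour:
        reasons.append(f"{sample.restarts_last_hour} restarts in the last hour")
    if sample.cpu_percent > profile.max_cpu_percent:
        reasons.append(f"CPU {sample.cpu_percent:.0f}% above {profile.max_cpu_percent:.0f}%")
    return reasons


# Example with made-up numbers: a healthy sample yields no deviations.
profile = ServiceProfile("example-service", 8080, (100.0, 500.0), 3, 50.0)
healthy = find_deviations(profile, ServiceSample(True, 200.0, 0, 10.0))
```

An empty list means the sample matches the profile. Each returned string names one deviation, which a supervisor could log or pass on for scoring.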
Real-World Catches
In evaluation sessions, the immune system detected:
- llama3-gpu: 303 restarts in 51 minutes (scored 1.0 severity — maximum)
- aura-headless: 421 restart attempts (zombie port conflict with Swarm)
- Memory leaks: Gradual RSS growth in services using importlib.reload()
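To put the llama3-gpu figure in perspective, 303 restarts in 51 minutes averages about 5.9 restarts per minute. The sketch below shows one way a restart loop like that could be counted with a sliding time window. The RestartWindow class is hypothetical, and the evenly spaced replay is an assumption made for illustration, not recorded data.

```python
from collections import deque


class RestartWindow:
    """Counts restarts seen within a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, now: float) -> int:
        """Record a restart at time `now` (seconds); return the count in the window."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop restarts that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Replay 303 restarts over 51 minutes, assuming they were evenly spaced.
interval = 51 * 60 / 303  # about 10.1 seconds between restarts
window = RestartWindow(window_seconds=60)
counts = [window.record(i * interval) for i in range(303)]
```

Under that even-spacing assumption, the 60-second window settles at 6 restarts, far above what a healthy service would show.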
Resource Constraints
The immune system itself runs under strict limits:
- CPUQuota=5%: must be lightweight
- MemoryMax=300M: hard memory cap
- Type=notify with sd_notify: proper systemd integration
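For readers unfamiliar with Type=notify, the service tells systemd it is ready by writing a short datagram to the Unix socket named in the NOTIFY_SOCKET environment variable. The following standard-library sketch shows that protocol. It is an assumption about how such a call could look, not the project's actual implementation, which may use a library binding instead.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset, for example when the process
    is not running under a Type=notify unit.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True
```

A supervisor would typically call sd_notify("READY=1") once startup has finished. Returning False instead of raising keeps the same code usable when it is run by hand outside systemd.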
sd_notify Fix
All subprocess calls use _CLEAN_ENV, an environment with the NOTIFY_SOCKET variable stripped, so that child processes do not inherit the supervisor's systemd notification socket and interfere with its notifications. Without this fix, systemd treated the service as dead whenever a child process ran.