Safety & Constraints

Neural Immune System

The Neural Immune System (services/neural_immune/, 9 files, ~1,800 lines) is a self-learning service supervisor built from three micro neural networks (~18.8K parameters in total) that learn what "normal" looks like for each service and detect deviations from it.

What It Monitors

Every Frank service is tracked: restart frequency, RSS memory, CPU usage, response times, port availability, error rates. Unlike a traditional watchdog (which checks "is it running?"), the immune system checks "is it behaving normally?"

3 Neural Networks

Network | Parameters | Function
Anomaly Detector | ~7K | Classifies service behavior as normal or anomalous based on learned patterns
Pattern Classifier | ~6K | Categorizes anomaly types: memory leak, restart loop, port conflict, latency spike
Severity Estimator | ~5.8K | Scores how serious an anomaly is on a 0-1 scale, which drives response priority

Service Registry

Each service has a known-good profile: expected port, typical RSS memory range, normal restart frequency, acceptable CPU%. The immune system compares real-time metrics against these baselines and flags deviations.
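A minimal sketch of the profile-versus-sample comparison, assuming one simple threshold check per metric. The class and field names (ServiceProfile, Metrics, deviations), the port 8080 and the numbers in the example are all hypothetical; this shows only the shape of the comparison against a known-good profile, not the learned networks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceProfile:
    """Known-good baseline for one service (hypothetical field names)."""
    port: int
    rss_mb: tuple[float, float]  # typical RSS range (min, max) in MiB
    max_restarts_per_hour: float
    max_cpu_percent: float


@dataclass(frozen=True)
class Metrics:
    """One real-time sample for the same service."""
    listening_port: Optional[int]  # None if nothing is listening
    rss_mb: float
    restarts_last_hour: int
    cpu_percent: float


def deviations(profile: ServiceProfile, m: Metrics) -> list[str]:
    """Return one human-readable flag per metric that falls outside its baseline."""
    flags = []
    if m.listening_port != profile.port:
        flags.append(f"port: expected {profile.port}, saw {m.listening_port}")
    lo, hi = profile.rss_mb
    if not lo <= m.rss_mb <= hi:
        flags.append(f"rss: {m.rss_mb:.0f} MiB outside {lo:.0f}-{hi:.0f} MiB")
    if m.restarts_last_hour > profile.max_restarts_per_hour:
        flags.append(f"restarts: {m.restarts_last_hour}/h exceeds {profile.max_restarts_per_hour}/h")
    if m.cpu_percent > profile.max_cpu_percent:
        flags.append(f"cpu: {m.cpu_percent:.1f}% exceeds {profile.max_cpu_percent:.1f}%")
    return flags


if __name__ == "__main__":
    profile = ServiceProfile(port=8080, rss_mb=(100, 400), max_restarts_per_hour=2, max_cpu_percent=50)
    sample = Metrics(listening_port=8080, rss_mb=950, restarts_last_hour=0, cpu_percent=12)
    print(deviations(profile, sample))  # ['rss: 950 MiB outside 100-400 MiB']
```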

Real-World Catches

In evaluation sessions, the immune system detected:

  • llama3-gpu: 303 restarts in 51 minutes (severity score 1.0, the maximum)
  • aura-headless: 421 restart attempts (zombie port conflict with Swarm)
  • Memory leaks: gradual RSS growth in services that use importlib.reload()
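The article does not say how raw restart events become the "normal restart frequency" that profiles track. One simple, swapped-in technique is a sliding-window counter; the RestartWindow class and its one-hour default window below are hypothetical. Fed restarts spread evenly over 51 minutes, as in the llama3-gpu case, it counts all 303 inside a single hour.

```python
from collections import deque


class RestartWindow:
    """Counts restarts inside a sliding time window (hypothetical helper).

    Timestamps are seconds and must be recorded in non-decreasing order.
    """

    def __init__(self, window_s: float = 3600.0) -> None:
        self.window_s = window_s
        self._times = deque()

    def record(self, t: float) -> int:
        """Log a restart at time t; return the count in the window (t - window_s, t]."""
        self._times.append(t)
        # Evict restarts that have aged out of the window
        while self._times and self._times[0] <= t - self.window_s:
            self._times.popleft()
        return len(self._times)


if __name__ == "__main__":
    window = RestartWindow()
    # 303 restarts spread evenly over 51 minutes (0 s .. 3060 s)
    counts = [window.record(i * 3060 / 302) for i in range(303)]
    print(counts[-1])  # 303: every restart lands inside one hour
```

A count like this is only an input signal; per the table above, labeling it a restart loop and scoring it belong to the Pattern Classifier and Severity Estimator.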

Resource Constraints

The immune system itself runs under strict systemd limits:

  • CPUQuota=5%: it must stay lightweight
  • MemoryMax=300M: a hard memory cap
  • Type=notify with sd_notify: proper systemd readiness integration
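CPUQuota=5% and MemoryMax=300M need no application code: systemd enforces them through cgroups, throttling the process past 5% of one CPU and invoking the OOM killer if it exceeds 300M. Type=notify does need code, because systemd waits for the service to report readiness over the datagram socket named in the NOTIFY_SOCKET environment variable. The article does not say whether the service uses the python-systemd package or its own helper, so the sd_notify function below is an illustrative standard-library stand-in.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send a state string such as "READY=1" to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset, e.g. when not started by systemd.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Abstract-namespace socket (Linux): the leading '@' stands for a NUL byte
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


if __name__ == "__main__":
    # Call once startup work is done; until then systemd keeps the unit "activating"
    sd_notify("READY=1")
    sd_notify("STATUS=monitoring services")
```

Any child process that inherits NOTIFY_SOCKET can reach this same socket, which is the problem the sd_notify fix addresses.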

sd_notify Fix

All subprocess calls use _CLEAN_ENV, a copy of the environment with the NOTIFY_SOCKET variable stripped, so child processes cannot send notifications that systemd attributes to the parent. Without this fix, systemd concluded the service had died whenever a child process ran.
