feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events

Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.
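
A minimal sketch of the ordered-stop vs crash classification described
above, using only the standard library. The names `ExitKind` and
`classify_exit` are hypothetical and not taken from the crate. The rule
that any unrequested exit counts as a crash, even with code 0, is also
an assumption.

```rust
/// Hypothetical outcome type; the real crate's state enum may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExitKind {
    /// The supervisor requested the stop, so the exit is expected.
    Stopped,
    /// The process exited without a stop request. `code` is `None`
    /// when the OS reports no exit code (e.g. killed by a signal on Unix).
    Crashed { code: Option<i32> },
}

/// Classify an exit observed by the monitor task.
///
/// Assumption: `stop_requested` is a flag the supervisor sets before
/// it sends SIGTERM, so the monitor can tell the two cases apart.
pub fn classify_exit(stop_requested: bool, code: Option<i32>) -> ExitKind {
    if stop_requested {
        ExitKind::Stopped
    } else {
        ExitKind::Crashed { code }
    }
}
```

In a monitor task, `code` would typically come from
`ExitStatus::code()` on the status returned by waiting on the child.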

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet. After an agent
restart, a still-running game process is orphaned and reported as
'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Vantz Stockwell
2026-06-11 10:44:24 -04:00
parent f706c3c47e
commit 068a476f39
13 changed files with 669 additions and 44 deletions

@@ -26,6 +26,9 @@ anyhow = "1"
clap = { version = "4.5", features = ["derive"] }
rand = "0.8"
[target.'cfg(unix)'.dependencies]
libc = "0.2"
# Size-optimized release: single static binary living next to RAM-heavy game
# servers. Panic stays 'unwind' so a panicking task surfaces through its
# JoinHandle instead of killing the whole agent.