Files
corrosion-admin-panel/corrosion-host-agent/README.md
Vantz Stockwell 068a476f39 feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:44:24 -04:00

2.0 KiB

Corrosion Host Agent

Rust rewrite of the Go companion agent (companion-agent/, retained as the behavior reference until parity). One agent per machine supervises every game instance on that host — Rust, Conan Exiles, Soulmask, Dune: Awakening.

Status — Phase 0

  • Multi-instance TOML config + env overrides (CORROSION_LICENSE_ID, CORROSION_NATS_URL, CORROSION_NATS_TOKEN)
  • NATS connection (infinite reconnect, capped backoff, 30s ping, offline send-buffering, tls:// support)
  • Host heartbeat with real telemetry (sysinfo: CPU, memory, disks) — no fabricated values
  • Connectivity prober (outbound TCP, periodic + on-demand)
  • Host command channel (ping, probe, sysinfo)
  • Graceful shutdown (cancellation token, going-offline beacon, NATS flush)
  • Phase 1a: process supervision — per-instance start/stop/restart/status over {instance}.cmd request-reply, push state events on {instance}.status, crash detection with exit codes, live state in heartbeats (integration-tested with real processes + live-NATS contract test)
  • Phase 1b: RCON trait (WebRCON rust / TCP conan+soulmask), SteamCMD, jailed file manager
  • Phase 2: Dune Docker adapter (compose lifecycle, RabbitMQ bus, Postgres admin)
  • Phase 3: signed self-update (enforced ed25519 — release gate), service install, supervisor split

Build

cargo build --release                                    # native
cargo build --release --target x86_64-unknown-linux-gnu  # linux deploy target
cargo build --release --target x86_64-pc-windows-msvc    # windows (cargo-xwin on non-Windows)

Run

corrosion-host-agent --config ./agent.toml         # foreground
corrosion-host-agent --config ./agent.toml check   # validate config only
corrosion-host-agent version                       # semver + git hash + build ts