7 Commits

Author SHA1 Message Date
Vantz Stockwell
440474290b feat: wire the panel command surface to the live Rust agent + wipe handler
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 1m35s
Build Host Agent (Rust) / build (push) Successful in 1m48s
CI / integration (push) Successful in 23s
The legacy Go agent was never deployed, so the entire backend command surface
published to a dead cmd.server/cmd.wipe/files.cmd void. Route it all to the
Rust agent's instance-scoped subjects.

Agent (corrosion-host-agent, alpha.10):
- New src/wipe.rs + 'wipe' func on {instance}.cmd: stop -> delete game files by
  type (map/blueprint/full, with optional backup) -> restart. Jailed to the
  instance root, symlink-safe (lstat, no cross-boundary follow — Lesson 26).
  8 tests incl. jail-escape + symlink-skip proofs. Agent suite 64 tests green.

Backend (NestJS):
- InstancesService is now @Global with license-scoped convenience wrappers
  (lifecycleForLicense/rconForLicense/writeFileForLicense/readFileForLicense/
  deleteFileForLicense/wipeForLicense) + resolveDefaultInstance (license ->
  primary instance).
- Routed to the agent: servers start/stop/restart/command; players kick/banid/
  unban via RCON; schedules restart/announce/command/plugin-reload; wipes ->
  wipeForLicense (real wipe now); plugins reload/unload/upload via rcon+file
  ops; all 9 plugin-config module applies -> writeFileForLicense + oxide.reload
  rcon, imports -> readFileForLicense (server:// prefix stripped).
- Honestly gated (need agent funcs not yet built): server deploy-from-panel,
  Oxide install, one-click uMod install -> 503 coming-soon instead of dead
  publishes.

Backend tsc green; agent cargo test green (64).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 22:30:18 -04:00
Vantz Stockwell
d13f2cb8b1 feat(host-agent): Phase 2 — Dune docker-compose adapter via Supervisor trait
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Failing after 35s
CI / integration (push) Has been skipped
Build Host Agent (Rust) / build (push) Successful in 1m45s
Introduce a Supervisor trait (async-trait) so the agent manages games with
different models behind one wire contract. ProcessSupervisor (spawned process:
rust/conan/soulmask) and the new DockerComposeSupervisor (dune) both impl it;
Agent.supervisors is now HashMap<String, Arc<dyn Supervisor>> and instancecmd
dispatch is game-agnostic — start/stop/restart/status identical across games,
selected by a per-game factory in main. InstanceState moved to the shared
supervisor module.

DockerComposeSupervisor drives docker-compose up-d / stop / restart against
the instance's compose project, with -f/-p/single-service support and a
configurable compose binary. New [instance.docker_compose] config block.
First cut = lifecycle + cached state; container crash-detection + restart
adoption deferred to Phase 3b (reconcilable with a compose ps probe).

Trait choice (dyn over enum) per Commander: scales to future planes (kubectl,
AMP/podman, SSH) as new struct+impl, no central match.

56 tests green (6 new docker-compose mock-binary tests + 5 refactored process
tests), zero warnings. Live verification pending a real Dune stack.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:33:00 -04:00
Vantz Stockwell
6b3e805ac2 feat(host-agent): Phase 3a signed self-update (minisign) + CI signing gate
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m27s
CI / integration (push) Successful in 21s
Build Host Agent (Rust) / build (push) Failing after 1m33s
Agent only ever runs a binary whose minisign signature verifies against
the EMBEDDED public key. NATS host.cmd func 'update' {url}: download
binary + .minisig from the CDN -> verify against embedded pubkey ->
atomic swap (.old rollback) -> relaunch. URL allowlist (https + cdn.
corrosionmgmt.com only, rejects userinfo-bypass), 100MiB cap. Closes the
supply-chain hole: even a malicious CDN upload can't run unsigned.

CI: build-host-agent.yml signs every artifact with MINISIGN_SECRET_KEY
(Gitea secret) and publishes .minisig alongside; the step FAILS the
build if the secret is absent (refuses to ship unsigned). Bumped to
alpha.6.

6 deterministic tests (accept valid / reject tampered+garbage+empty sig,
URL allowlist incl userinfo-bypass, atomic swap+rollback). Fixtures
signed with the real release key so tests need no key at runtime. Full
suite 50/50 green; musl + native build clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:00:36 -04:00
Vantz Stockwell
700dc2254d fix(host-agent): SECURITY — file manager copy/list no longer follow symlinks out of the jail
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m21s
Build Host Agent (Rust) / build (push) Successful in 1m34s
CI / integration (push) Has been cancelled
Automated security review (HIGH) caught a jail-escape my own review
missed: copy_recursive used fs::metadata (follows symlinks). A symlink
inside the jail pointing to e.g. /etc, then a 'copy' of its parent dir,
would dereference it and pull external content INTO the jail where it
could be read — a read-escape exfiltration. jail() validates only the
top-level src/dest; the recursive walk reintroduced the escape.

Fix: copy_recursive uses symlink_metadata and refuses any symlink
('symlinks are not followed across the jail boundary'). list() likewise
switched to symlink_metadata so it reports the link, never the
dereferenced target's size/type (info leak). Two regression tests added:
copy-symlink-exfil (asserts no external content lands inside) and
list-no-deref. 44/44 tests green. Rolled forward to alpha.4 (vulnerable
alpha.3 superseded).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:57:08 -04:00
Vantz Stockwell
18f978dde1 feat(host-agent): Phase 1c — SteamCMD update + jailed file manager
steam_update func runs SteamCMD per game (rust/conan/soulmask app-ids;
dune rejected), streaming stdout to {instance}.steam_status. Jailed
file manager on {instance}.files.cmd: list/read/write/delete/rename/
mkdir/mkfile/move/copy, all confined to instance root via two-stage
lexical-normalize + canonicalize (defeats ../ traversal AND symlink
escape — incl chained symlinks). Replaces the Go agent's UNJAILED
legacy files API (retired, not ported). 5MiB read cap.

42/42 tests green: 24 filemanager incl 7 jail-escape attempts
(dotdot, deep dotdot, absolute, symlink-inside, direct symlink,
chained symlink), 5 steamcmd app-id (cfg-gated win/linux soulmask).
Jail logic reviewed line-by-line: Path::starts_with is component-wise
(no sibling-prefix bypass), non-existent suffix components can't be
symlinks, leading .. normalizes to / and fails the prefix check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:51:46 -04:00
Vantz Stockwell
fde0926d52 feat(host-agent): Phase 1b RCON — WebRCON (rust) + Source RCON (conan/soulmask)
rcon func on the instance command channel: WebSocket JSON WebRCON with
Identifier correlation (skips chat/log noise frames) and full Valve
Source RCON over TCP (auth, exec, multi-packet reassembly via empty
probe, 1MiB cap). Protocol inferred from game, explicit kind override
in [instance.rcon]. Always 127.0.0.1 — agent is co-located.

Hardening from review: WebRCON password never interpolated into error
contexts/logs (redacted URL); probe-tolerant termination — a quiet
period after received data ends the response for servers that don't
echo the probe (Soulmask conformance unverified), so data is never
discarded on probe timeout.

13/13 tests green incl. mock Source-RCON server (auth/multi-packet/
errors) and mock WebRCON server (noise-frame skipping).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:53:52 -04:00
Vantz Stockwell
068a476f39 feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:44:24 -04:00