Commit Graph

11 Commits

Author SHA1 Message Date
Vantz Stockwell
f18b45e3f2 fix(ci): base64-decode minisign secret key (CI mangles multi-line); bump alpha.8
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m30s
CI / integration (push) Failing after 13s
Build Host Agent (Rust) / build (push) Successful in 1m45s
The 'Sign artifacts' step failed on alpha.7 with 'Error while loading the
secret key file' (exit 2): minisign downloaded and ran, but the reconstructed
key file was unparseable. A minisign secret key is two lines (comment + base64
blob); Gitea/act_runner secret storage mangles the embedded newline, collapsing
it to one line. Decode the secret as base64 (single-line, mangling-proof) with
auto-detect fallback to a raw two-line key. Fails loudly with the fix command
if the secret is neither form.

Requires re-storing MINISIGN_SECRET_KEY as: base64 < secret.key | tr -d '\n'

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:31:48 -04:00
Vantz Stockwell
702de24e28 fix(ci): fetch minisign static binary (not in bullseye apt); bump alpha.7
Some checks failed
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 43s
Build Host Agent (Rust) / build (push) Failing after 1m33s
CI / integration (push) Successful in 22s
alpha.6 signing failed: 'E: Unable to locate package minisign' —
minisign isn't packaged for node:20-bullseye. Download the official
static linux binary instead. Forward to alpha.7 (alpha.6 published
nothing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:18:08 -04:00
Vantz Stockwell
6b3e805ac2 feat(host-agent): Phase 3a signed self-update (minisign) + CI signing gate
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m27s
CI / integration (push) Successful in 21s
Build Host Agent (Rust) / build (push) Failing after 1m33s
Agent only ever runs a binary whose minisign signature verifies against
the EMBEDDED public key. NATS host.cmd func 'update' {url}: download
binary + .minisig from the CDN -> verify against embedded pubkey ->
atomic swap (.old rollback) -> relaunch. URL allowlist (https + cdn.
corrosionmgmt.com only, rejects userinfo-bypass), 100MiB cap. Closes the
supply-chain hole: even a malicious CDN upload can't run unsigned.

CI: build-host-agent.yml signs every artifact with MINISIGN_SECRET_KEY
(Gitea secret) and publishes .minisig alongside; the step FAILS the
build if the secret is absent (refuses to ship unsigned). Bumped to
alpha.6.

6 deterministic tests (accept valid / reject tampered+garbage+empty sig,
URL allowlist incl userinfo-bypass, atomic swap+rollback). Fixtures
signed with the real release key so tests need no key at runtime. Full
suite 50/50 green; musl + native build clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:00:36 -04:00
Vantz Stockwell
00cff51ce5 feat(nats): per-license auth mechanism — agent user/password, scoped broker, generator (non-breaking)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m23s
Build Host Agent (Rust) / build (push) Successful in 1m38s
CI / integration (push) Successful in 23s
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.

- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
  auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
  licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
  (verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
  host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf

Entirely non-breaking until secrets+config deployed; staged cutover next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:33:27 -04:00
Vantz Stockwell
700dc2254d fix(host-agent): SECURITY — file manager copy/list no longer follow symlinks out of the jail
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m21s
Build Host Agent (Rust) / build (push) Successful in 1m34s
CI / integration (push) Has been cancelled
Automated security review (HIGH) caught a jail-escape my own review
missed: copy_recursive used fs::metadata (follows symlinks). A symlink
inside the jail pointing to e.g. /etc, then a 'copy' of its parent dir,
would dereference it and pull external content INTO the jail where it
could be read — a read-escape exfiltration. jail() validates only the
top-level src/dest; the recursive walk reintroduced the escape.

Fix: copy_recursive uses symlink_metadata and refuses any symlink
('symlinks are not followed across the jail boundary'). list() likewise
switched to symlink_metadata so it reports the link, never the
dereferenced target's size/type (info leak). Two regression tests added:
copy-symlink-exfil (asserts no external content lands inside) and
list-no-deref. 44/44 tests green. Rolled forward to alpha.4 (vulnerable
alpha.3 superseded).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:57:08 -04:00
Vantz Stockwell
7fdca2cd4f chore(host-agent): bump to 2.0.0-alpha.3 (RCON + supervision + SteamCMD + file manager)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m26s
Build Host Agent (Rust) / build (push) Successful in 1m35s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:52:05 -04:00
Vantz Stockwell
18f978dde1 feat(host-agent): Phase 1c — SteamCMD update + jailed file manager
steam_update func runs SteamCMD per game (rust/conan/soulmask app-ids;
dune rejected), streaming stdout to {instance}.steam_status. Jailed
file manager on {instance}.files.cmd: list/read/write/delete/rename/
mkdir/mkfile/move/copy, all confined to instance root via two-stage
lexical-normalize + canonicalize (defeats ../ traversal AND symlink
escape — incl chained symlinks). Replaces the Go agent's UNJAILED
legacy files API (retired, not ported). 5MiB read cap.

42/42 tests green: 24 filemanager incl 7 jail-escape attempts
(dotdot, deep dotdot, absolute, symlink-inside, direct symlink,
chained symlink), 5 steamcmd app-id (cfg-gated win/linux soulmask).
Jail logic reviewed line-by-line: Path::starts_with is component-wise
(no sibling-prefix bypass), non-existent suffix components can't be
symlinks, leading .. normalizes to / and fails the prefix check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:51:46 -04:00
Vantz Stockwell
fde0926d52 feat(host-agent): Phase 1b RCON — WebRCON (rust) + Source RCON (conan/soulmask)
rcon func on the instance command channel: WebSocket JSON WebRCON with
Identifier correlation (skips chat/log noise frames) and full Valve
Source RCON over TCP (auth, exec, multi-packet reassembly via empty
probe, 1MiB cap). Protocol inferred from game, explicit kind override
in [instance.rcon]. Always 127.0.0.1 — agent is co-located.

Hardening from review: WebRCON password never interpolated into error
contexts/logs (redacted URL); probe-tolerant termination — a quiet
period after received data ends the response for servers that don't
echo the probe (Soulmask conformance unverified), so data is never
discarded on probe timeout.

13/13 tests green incl. mock Source-RCON server (auth/multi-packet/
errors) and mock WebRCON server (noise-frame skipping).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:53:52 -04:00
Vantz Stockwell
068a476f39 feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:44:24 -04:00
Vantz Stockwell
b455bf9f14 ci(host-agent): bootstrap Rust in the runner container; roll to alpha.2
All checks were successful
Build Host Agent (Rust) / build (push) Successful in 1m29s
Test Asgard Runner / test (push) Successful in 3s
Asgard runner executes jobs in bare node:20-bullseye (no Rust, no sudo)
- install rustup + musl/mingw cross toolchains per-run, same pattern as
setup-go in the Go pipeline. agent-v2.0.0-alpha.1 predates this fix;
forward-only doctrine: version rolls to alpha.2 rather than re-pushing
the tag.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:15:36 -04:00
Vantz Stockwell
cea3d66cdd feat(host-agent): Rust rewrite Phase 0 — multi-instance foundation, v2 wire protocol, real telemetry
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
New corrosion-host-agent/ crate (Go companion-agent stays as behavior
reference until parity). Wire protocol v2 per COA-B: instance-scoped
subjects corrosion.{license}.{instance}.* + host-level .host.* — spec
in PROTOCOL.md, designed for the license->host->instance fleet model.

- Multi-instance TOML config in the foundation, not retrofitted
- NATS layer on the Vigilance production profile (infinite reconnect,
  capped backoff, 30s ping, 8192-msg offline buffer)
- Heartbeat with real sysinfo telemetry — Go agent shipped hardcoded
  disk/cpu placeholders; this is the panel's first true Resources data
- Connectivity prober (outbound TCP, periodic + on-demand)
- Host cmd channel (ping/probe/sysinfo), going-offline beacon,
  CancellationToken shutdown
- Live-fire verified against production NATS; artifacts: 3.7MB static
  linux-musl, 3.8MB windows .exe (static CRT)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:02:46 -04:00