Commit Graph

216 Commits

Author SHA1 Message Date
Vantz Stockwell
215355d1cb fix(security): prevent RCON command injection in player kick/ban/unban (HIGH)
Some checks failed
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Failing after 29s
CI / integration (push) Has been skipped
Player id and ban reason flowed unsanitized into the single-line RCON command,
so a control char (newline/CR) in 'reason' could break the framing and inject a
second console command — an RBAC-escalation vector (a Moderator-role user could
run arbitrary RCON via the ban reason field).

- validate player id against a safe token charset /^[A-Za-z0-9_.:-]{1,64}$/ and
  reject otherwise (multi-game safe — not a Rust-only SteamID64 regex, so
  Conan/Funcom and Dune ids still pass)
- strip C0 control chars from reason, collapse whitespace, cap at 200 chars
- coerce ban duration to a non-negative integer

Flagged by automated commit security review. Backend tsc green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 22:36:44 -04:00
Vantz Stockwell
440474290b feat: wire the panel command surface to the live Rust agent + wipe handler
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 1m35s
Build Host Agent (Rust) / build (push) Successful in 1m48s
CI / integration (push) Successful in 23s
The legacy Go agent was never deployed, so the entire backend command surface
published to a dead cmd.server/cmd.wipe/files.cmd void. Route it all to the
Rust agent's instance-scoped subjects.

Agent (corrosion-host-agent, alpha.10):
- New src/wipe.rs + 'wipe' func on {instance}.cmd: stop -> delete game files by
  type (map/blueprint/full, with optional backup) -> restart. Jailed to the
  instance root, symlink-safe (lstat, no cross-boundary follow — Lesson 26).
  8 tests incl. jail-escape + symlink-skip proofs. Agent suite 64 tests green.

Backend (NestJS):
- InstancesService is now @Global with license-scoped convenience wrappers
  (lifecycleForLicense/rconForLicense/writeFileForLicense/readFileForLicense/
  deleteFileForLicense/wipeForLicense) + resolveDefaultInstance (license ->
  primary instance).
- Routed to the agent: servers start/stop/restart/command; players kick/banid/
  unban via RCON; schedules restart/announce/command/plugin-reload; wipes ->
  wipeForLicense (real wipe now); plugins reload/unload/upload via rcon+file
  ops; all 9 plugin-config module applies -> writeFileForLicense + oxide.reload
  rcon, imports -> readFileForLicense (server:// prefix stripped).
- Honestly gated (need agent funcs not yet built): server deploy-from-panel,
  Oxide install, one-click uMod install -> 503 coming-soon instead of dead
  publishes.

Backend tsc green; agent cargo test green (64).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agent-v2.0.0-alpha.10
2026-06-11 22:30:18 -04:00
Vantz Stockwell
6f783bfac8 feat(panel): Beta sweep — multi-game coherence, honesty, UX fixes
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 45s
CI / integration (push) Successful in 22s
Multi-game rebrand (no more Rust-only leftovers): game-neutral setup wizard +
deploy/store defaults; player-id labels driven by game profile (Steam ID only
for Rust); blueprint wipe type + verify-plugins gated to uMod games; oxide
command examples + Rust-only plugin pages (AutoDoors/FurnaceSplitter/BetterChat)
guarded behind mods==='umod' with empty-states for other games.

Honesty: webstore checkout shows coming-soon (backend now 503s); 'integrated
webstore' marketed as coming-soon; Discord references neutralized to
community/webhook; migration FAQ marked in-development; analytics dev phase
labels removed; Network pricing tier set to Custom/Contact (was a confusing
duplicate of Operator); docs/PRICING.md rewritten to match live subscriptions.

UX/bugs: fixed ServerView oxide-status operator-precedence bug; dead 'Deploy
server' button wired; non-functional topbar search removed; alert()/confirm()
replaced with toasts across schedules/alerts/migration/public store+server;
analytics chart arrays null-guarded; production console.logs gated to DEV.

Frontend build (vue-tsc + vite) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 22:06:10 -04:00
Vantz Stockwell
f2ea415840 fix(api): Beta hardening — real 500 fix, encryption guard, honest payments
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 1m36s
CI / integration (push) Successful in 23s
- analytics: getMapAnalytics queried map.name but the map_library column is
  display_name (no name column) — every map-analytics call 500'd. Fixed select
  + groupBy to map.display_name.
- setup: guard ENCRYPTION_KEY length before AES-256-GCM createCipheriv — an
  unset key crashed bare-metal setup with an opaque 'Invalid key length' 500;
  now returns a clear 503. Also stop falsely marking bare-metal connected on
  completeSetup; leave offline until the agent's first heartbeat.
- webstore: public checkout returned a FAKE PayPal order token + sandbox URL
  that resolves to nowhere. Refuse honestly with 503 (payments coming soon)
  instead of faking a transaction.
- store: module purchase wrote a fake txn_<ts> implying a charge; record it
  honestly as a free Beta grant (transaction_id=beta-free-grant, amount 0).

Backend tsc green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:53:22 -04:00
Vantz Stockwell
d13f2cb8b1 feat(host-agent): Phase 2 — Dune docker-compose adapter via Supervisor trait
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Failing after 35s
CI / integration (push) Has been skipped
Build Host Agent (Rust) / build (push) Successful in 1m45s
Introduce a Supervisor trait (async-trait) so the agent manages games with
different models behind one wire contract. ProcessSupervisor (spawned process:
rust/conan/soulmask) and the new DockerComposeSupervisor (dune) both impl it;
Agent.supervisors is now HashMap<String, Arc<dyn Supervisor>> and instancecmd
dispatch is game-agnostic — start/stop/restart/status identical across games,
selected by a per-game factory in main. InstanceState moved to the shared
supervisor module.

DockerComposeSupervisor drives docker-compose up-d / stop / restart against
the instance's compose project, with -f/-p/single-service support and a
configurable compose binary. New [instance.docker_compose] config block.
First cut = lifecycle + cached state; container crash-detection + restart
adoption deferred to Phase 3b (reconcilable with a compose ps probe).

Trait choice (dyn over enum) per Commander: scales to future planes (kubectl,
AMP/podman, SSH) as new struct+impl, no central match.

56 tests green (6 new docker-compose mock-binary tests + 5 refactored process
tests), zero warnings. Live verification pending a real Dune stack.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agent-v2.0.0-alpha.9
2026-06-11 21:33:00 -04:00
Vantz Stockwell
651a35d4be docs(reference): import Dune: Awakening server-manager references
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 22s
Phase 2 references for the host-agent Dune adapter, moved out of volatile /tmp
into docs/reference-repos/ (per Commander). Three upstream projects, .git +
node_modules + compiled binaries stripped (16MB source). Nested AI-instruction
files (.claude/, CLAUDE.md) removed so they don't pollute Corrosion sessions.

- icehunter/    dune-admin (Go+React) — 4 control planes; SETUP_DOCKER.md is the
                closest analog to our agent's Dune docker control plane (compose
                lifecycle, docker logs, RabbitMQ-via-exec, dune Postgres schema)
- adainrivers/  Rust/Tauri desktop — SSH+k8s BattleGroup control, maintenance
                daemon, in-game admin console (Rust idiom reference)
- the4rchangel/ Node web UI replacing battlegroup.bat — matches the Commander's
                Hyper-V self-host path + game-config schema

See docs/reference-repos/README.md for the full index + how we use each.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:08:05 -04:00
Vantz Stockwell
0715492ddf chore(panel): fleet-aware shell footer + drop dead vuefinder dep
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 14s
CI / agent-tests (push) Successful in 49s
CI / integration (push) Successful in 22s
COA-B cleanup:
- Sidebar agent-health footer now reads the fleet store (host count / online
  count / per-host status + last heartbeat) instead of the single legacy
  server.connection row, which disagreed with the multi-host fleet. Removed the
  legacy useServerStore dependency from the shell.
- Removed the unused 'vuefinder' dependency (replaced by the native file
  manager): dep + main.ts plugin/CSS registration. Main JS chunk 588kB -> 165kB.

Recon reclassified the 'dead cmd.server v1' item: it is the LIVE license-level
command path (module config applies, plugin install, schedules, legacy
start/stop) served only by the Go agent — a Rust-agent parity gap, not dead
code. Left intact.

Build-green (vue-tsc) + boots clean in-browser (0 console errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:04:09 -04:00
Vantz Stockwell
4ef5db5b0d feat(panel): drive active game from deployed fleet instances
All checks were successful
CI / backend-types (push) Successful in 8s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 40s
CI / integration (push) Successful in 23s
The shell skin / sidebar nav / dashboard terminology now follow the games
actually deployed (game_instances.game, agent-reported) instead of a
localStorage-only toggle. syncActiveGameFromFleet() derives: one game ->
auto-skin to it; zero/multiple -> 'all' neutral. A manual GameSwitcher pick
persists and overrides the heuristic. Wired into DashboardLayout via a watch
on the fleet store.

No schema change: a license's games are the distinct games of its instances
(the normalized source of truth) — deliberately not duplicating into a
licenses.game column that would drift (Lesson 20).

Build-green (vue-tsc) + boots clean in-browser (0 console errors, theming
initializes). Authenticated auto-derive confirms live on next instance deploy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:51:36 -04:00
Vantz Stockwell
bb71763714 docs: Lesson 28 — base64-encode multi-line CI secrets (minisign signing key)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:38:56 -04:00
Vantz Stockwell
f18b45e3f2 fix(ci): base64-decode minisign secret key (CI mangles multi-line); bump alpha.8
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m30s
CI / integration (push) Failing after 13s
Build Host Agent (Rust) / build (push) Successful in 1m45s
The 'Sign artifacts' step failed on alpha.7 with 'Error while loading the
secret key file' (exit 2): minisign downloaded and ran, but the reconstructed
key file was unparseable. A minisign secret key is two lines (comment + base64
blob); Gitea/act_runner secret storage mangles the embedded newline, collapsing
it to one line. Decode the secret as base64 (single-line, mangling-proof) with
auto-detect fallback to a raw two-line key. Fails loudly with the fix command
if the secret is neither form.

Requires re-storing MINISIGN_SECRET_KEY as: base64 < secret.key | tr -d '\n'

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agent-v2.0.0-alpha.8
2026-06-11 20:31:48 -04:00
Vantz Stockwell
702de24e28 fix(ci): fetch minisign static binary (not in bullseye apt); bump alpha.7
Some checks failed
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 43s
Build Host Agent (Rust) / build (push) Failing after 1m33s
CI / integration (push) Successful in 22s
alpha.6 signing failed: 'E: Unable to locate package minisign' —
minisign isn't packaged for node:20-bullseye. Download the official
static linux binary instead. Forward to alpha.7 (alpha.6 published
nothing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.7
2026-06-11 20:18:08 -04:00
Vantz Stockwell
6b3e805ac2 feat(host-agent): Phase 3a signed self-update (minisign) + CI signing gate
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m27s
CI / integration (push) Successful in 21s
Build Host Agent (Rust) / build (push) Failing after 1m33s
Agent only ever runs a binary whose minisign signature verifies against
the EMBEDDED public key. NATS host.cmd func 'update' {url}: download
binary + .minisig from the CDN -> verify against embedded pubkey ->
atomic swap (.old rollback) -> relaunch. URL allowlist (https + cdn.
corrosionmgmt.com only, rejects userinfo-bypass), 100MiB cap. Closes the
supply-chain hole: even a malicious CDN upload can't run unsigned.

CI: build-host-agent.yml signs every artifact with MINISIGN_SECRET_KEY
(Gitea secret) and publishes .minisig alongside; the step FAILS the
build if the secret is absent (refuses to ship unsigned). Bumped to
alpha.6.

6 deterministic tests (accept valid / reject tampered+garbage+empty sig,
URL allowlist incl userinfo-bypass, atomic swap+rollback). Fixtures
signed with the real release key so tests need no key at runtime. Full
suite 50/50 green; musl + native build clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:00:36 -04:00
Vantz Stockwell
7c84912ff5 chore(frontend): bump version 1.0.0 -> 1.0.1
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 50s
CI / integration (push) Successful in 28s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:38:52 -04:00
Vantz Stockwell
355a53f6e3 feat(files): native instance-scoped file browser (replaces broken VueFinder)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
FileManagerView rewritten as a native DS browser on the per-instance
file bridge: instance selector, breadcrumb nav, dir-first listing
(name/size/modified), folder drill-down, inline file editor (read/save),
toolbar (new folder/file/refresh), per-row rename + delete-confirm.
New files store wraps the /instances/:id/files* endpoints. VueFinder
import + RemoteDriver fully removed — no more retired-protocol /api/files.
Honest empty (no instance -> Server page) + error (retry) states, never
the global error boundary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:31:01 -04:00
Vantz Stockwell
589516a021 feat(api): complete per-instance file op-set (delete/rename/mkdir/mkfile/move/copy)
All checks were successful
CI / backend-types (push) Successful in 8s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 54s
CI / integration (push) Successful in 25s
Rounds out the per-instance file bridge to the agent's full jailed file
manager so a real file browser can be built on it: POST :id/files/
{delete,rename,mkdir,mkfile,move,copy}, all via requestScoped (license-
scoped reply) on the new agent {op,path} protocol. files.manage. The
broken legacy VueFinder /api/files (retired Go fm_* protocol, wrong
subject, default _INBOX) is superseded by this — frontend rewrite next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:24:31 -04:00
Vantz Stockwell
f60e6abd33 feat(server): config file editor — read/edit/save a host config file per instance
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
The Server page's config-honesty note now leads somewhere real: a
Configuration file panel that loads a config file from the instance
(prefilled with the game's primaryConfigFile hint — server.cfg,
ServerSettings.ini, GameXishu.json), edits it in a mono textarea, and
saves it straight to the host through the jailed agent file bridge.
Not-found is handled gracefully (empty editor to create). Works across
games; gameProfiles gains primaryConfigFile per game.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:07:59 -04:00
Vantz Stockwell
877fadcb6c feat(api): per-instance file bridge — list/read/write via the new agent file manager
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
GET /api/instances/:id/files (list) + /file (read), PUT /file (write) —
tenant-guarded, routed through requestScoped to the per-instance
corrosion.{license}.{instance}.files.cmd using the new agent's {op,path}
protocol (jailed to the instance root, symlink-safe). files.view /
files.manage perms. Foundation for the per-game config editor and for
fixing the legacy VueFinder File Manager (which still speaks the retired
Go fm_* protocol on the wrong subject and is broken under per-license
auth — separate reconciliation).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:00:28 -04:00
Vantz Stockwell
e897a4802f fix(server): apply lifecycle reply state optimistically (heartbeat lag)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 21s
The agent reply is authoritative for the action just taken; the fleet
DB only updates on the next heartbeat (~10s), so the immediate refetch
read a stale state and reverted the UI (Start -> still Stopped). Now
apply the reply's state/uptime directly to the instance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:41:19 -04:00
Vantz Stockwell
c0b20f2f78 feat(server): instance-centric controls — real per-instance state + lifecycle
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 55s
CI / integration (push) Successful in 22s
The Server page now manages the selected GAME INSTANCE, not the legacy
host connection. New instances store flattens the fleet and drives the
command bridge. New 'Game instance' panel: real state badge
(running/stopped/crashed/configured), uptime, host, and an instance
selector when >1. Start/Stop/Restart/Refresh wired to POST
/api/instances/:id/lifecycle — gated on the actual instance state (not
host connectivity), with telemetry-only instances flagged. Works across
all four games (state + lifecycle are game-agnostic).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:37:53 -04:00
Vantz Stockwell
06e832fca1 feat(fleet): remove host — DELETE /api/fleet/hosts/:id + Fleet card action
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 36s
CI / integration (push) Successful in 21s
Self-service host removal. DELETE /api/fleet/hosts/:id (server.manage,
tenant-guarded): refuses while the host is 'connected' (409 — a live
agent re-registers on its next heartbeat, stop it first), deletes the
host's game_instances explicitly (FK is SET NULL, would otherwise
orphan them; instance_stats cascade), and clears the legacy
server_connections row if it was the license's last host. Fleet view:
offline host cards get a Remove button with inline confirm + toast.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:21:04 -04:00
Vantz Stockwell
009ceb86ad feat(server): real agent credentials + agent.toml setup; per-game config honesty
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 45s
CI / integration (push) Successful in 22s
Server page Host-agent panel now fetches GET /api/servers/agent-
credentials and renders the real agent.toml (license UUID, nats_user,
nats_password) instead of the broken LICENSE_ID=license_key env
commands that would never connect. Password masked by default with a
reveal toggle; copy-to-clipboard uses the real value. Setup commands
point at --config /etc/corrosion/agent.toml.

Configuration panel: World size / Current seed (Rust-only Facepunch
concepts) gated behind isRust; Conan/Soulmask/Dune get an honest note
pointing to the File Manager for their real config files instead of
fake Rust fields.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 13:23:47 -04:00
Vantz Stockwell
6f31c41dc3 feat(api): instance command bridge + agent credentials endpoint
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 21s
Backend layer wiring the panel to the host agent's per-instance command
channel (the unblocker for the Server-page rework):
- NatsService.requestScoped(): request-reply with a LICENSE-SCOPED reply
  subject (corrosion.{license}.reply.<id>) so per-license-scoped agents
  (no _INBOX permission) can actually reply — the design from the NATS
  auth work, now exercised.
- InstancesModule: POST /api/instances/:id/lifecycle {action} (start/
  stop/restart/status/steam_update, server.manage) and POST :id/rcon
  {command} (server.console). Tenant-guarded via game_instances.
- GET /api/servers/agent-credentials: derives the agent's NATS user/
  password (HMAC) so a customer can configure their agent — closes the
  post-auth setup gap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 13:05:22 -04:00
Vantz Stockwell
99433a09d1 docs(claude): Lesson 27 — lint infra config before deploy; compose up -d recreates changed deps
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:53:06 -04:00
Vantz Stockwell
b442ef4102 fix(api): consumer rejects malformed heartbeats with no host block (no phantom hosts)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 41s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:49:53 -04:00
Vantz Stockwell
856106174a fix(nats): no_auth_user is top-level, not inside authorization{} — broke broker startup
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 22s
Caught during the live cutover: nats-server rejects 'unknown field
no_auth_user' when it is nested in the authorization block, taking the
whole broker down. Both the generator (open stage) and the committed
bootstrap default emitted it nested. Moved to top level. Enforce-stage
output was unaffected (no no_auth_user), which is what the live broker
now runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:47:14 -04:00
Vantz Stockwell
463908b18e fix(nats): security review — secure-by-default + per-tenant inbox isolation
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 23s
Two HIGH findings from automated review on the generator, both fixed:
1. Cross-tenant inbox access: per-license users were granted _INBOX.>,
   letting license A subscribe to license B's request-reply responses.
   Now scoped to corrosion.{license}.> ONLY; replies must ride the
   license namespace (corrosion.{license}.reply.<id>) — documented in
   PROTOCOL.md. Agent unchanged (responds to msg.reply); constraint is
   on the requester (internal user has full >).
2. Default-open auth bypass: generator defaulted to stage=open with a
   full-access anonymous user — a stale regen left the broker wide open.
   Now defaults to enforce (secure by default); the explicit 'open'
   migration stage maps anonymous to a harmless corrosion.unclaimed.>
   namespace, never real tenant subjects. Committed bootstrap default
   hardened the same way.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:39:31 -04:00
Vantz Stockwell
00cff51ce5 feat(nats): per-license auth mechanism — agent user/password, scoped broker, generator (non-breaking)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m23s
Build Host Agent (Rust) / build (push) Successful in 1m38s
CI / integration (push) Successful in 23s
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.

- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
  auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
  licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
  (verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
  host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf

Entirely non-breaking until secrets+config deployed; staged cutover next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.5
2026-06-11 12:33:27 -04:00
Vantz Stockwell
7a07d600e7 feat(fleet): Phase B — fleet overview UI + GET /api/fleet read endpoint
Tenant-scoped fleet read: GET /api/fleet returns agent_hosts (host
metrics) each with their game_instances, plus a summary
(host/instance/online counts). FleetView lists host cards (status, CPU/
mem/disk/uptime/last-heartbeat) with their instances (game, state badge,
uptime); honest empty state -> Server page when no hosts. New 'Fleet'
sidebar nav item across all four game profiles, /fleet route. Store
follows the no-throw-on-fetch pattern (error state, never bricks). The
marketing hero made real from the live fleet tables.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:32:55 -04:00
Vantz Stockwell
4a4ae7a5d4 docs(claude): Lesson 26 — jail-at-entry doesn't jail the recursive walk (security review caught what my review missed)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 41s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:04:23 -04:00
Vantz Stockwell
930f655bf5 feat(api): fleet data model Phase A — License -> Host -> Instance
All checks were successful
CI / backend-types (push) Successful in 14s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
Migration 022 adds agent_hosts / game_instances / instance_clusters /
instance_stats (named agent_hosts to avoid the existing B2B hosts
table). HostAgentConsumerService now parses the full v2 heartbeat and
upserts an agent_hosts row (host metrics: cpu/mem/disk/agent version,
keyed by license_id+hostname until enrollment) plus one game_instances
row per heartbeat instance entry (state + uptime, the billing unit).
Legacy server_connections write retained so the current panel keeps
working — additive migration, nothing breaks. Staleness sweep + offline
beacon now flip agent_hosts too. cluster_id FK reserved for Soulmask/
Dune. Migration applied to live DB; tsc green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:00:52 -04:00
Vantz Stockwell
700dc2254d fix(host-agent): SECURITY — file manager copy/list no longer follow symlinks out of the jail
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m21s
Build Host Agent (Rust) / build (push) Successful in 1m34s
CI / integration (push) Has been cancelled
Automated security review (HIGH) caught a jail-escape my own review
missed: copy_recursive used fs::metadata (follows symlinks). A symlink
inside the jail pointing to e.g. /etc, then a 'copy' of its parent dir,
would dereference it and pull external content INTO the jail where it
could be read — a read-escape exfiltration. jail() validates only the
top-level src/dest; the recursive walk reintroduced the escape.

Fix: copy_recursive uses symlink_metadata and refuses any symlink
('symlinks are not followed across the jail boundary'). list() likewise
switched to symlink_metadata so it reports the link, never the
dereferenced target's size/type (info leak). Two regression tests added:
copy-symlink-exfil (asserts no external content lands inside) and
list-no-deref. 44/44 tests green. Rolled forward to alpha.4 (vulnerable
alpha.3 superseded).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.4
2026-06-11 11:57:08 -04:00
Vantz Stockwell
7fdca2cd4f chore(host-agent): bump to 2.0.0-alpha.3 (RCON + supervision + SteamCMD + file manager)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m26s
Build Host Agent (Rust) / build (push) Successful in 1m35s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.3
2026-06-11 11:52:05 -04:00
Vantz Stockwell
18f978dde1 feat(host-agent): Phase 1c — SteamCMD update + jailed file manager
steam_update func runs SteamCMD per game (rust/conan/soulmask app-ids;
dune rejected), streaming stdout to {instance}.steam_status. Jailed
file manager on {instance}.files.cmd: list/read/write/delete/rename/
mkdir/mkfile/move/copy, all confined to instance root via two-stage
lexical-normalize + canonicalize (defeats ../ traversal AND symlink
escape — incl chained symlinks). Replaces the Go agent's UNJAILED
legacy files API (retired, not ported). 5MiB read cap.

42/42 tests green: 24 filemanager incl 7 jail-escape attempts
(dotdot, deep dotdot, absolute, symlink-inside, direct symlink,
chained symlink), 5 steamcmd app-id (cfg-gated win/linux soulmask).
Jail logic reviewed line-by-line: Path::starts_with is component-wise
(no sibling-prefix bypass), non-existent suffix components can't be
symlinks, leading .. normalizes to / and fails the prefix check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:51:46 -04:00
Vantz Stockwell
9e5e828c8d fix(docker): nginx healthcheck uses 127.0.0.1 not localhost — IPv4-only listener
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
corrosion-nginx reported (unhealthy) despite serving the panel fine:
nginx listens 0.0.0.0:80 (IPv4 only, no listen [::]:80), but
'localhost' resolves to ::1 first inside the container, so the probe
got connection-refused. Verified: 127.0.0.1:80 serves the SPA. Probe
now targets IPv4 explicitly. No nginx config change — the panel was
never broken, only the healthcheck's hostname resolution.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:43:01 -04:00
Vantz Stockwell
fccd5c61c5 docs(claude): Lessons 24-25 — onModuleInit-before-connect dead subscriptions + resurrected-path crash
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 22s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:17:02 -04:00
Vantz Stockwell
c72a280361 fix(api): WS gateways crashed on first forwarded event — WebSocket.OPEN undefined at runtime
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 40s
CI / integration (push) Successful in 20s
Lesson 10 in the flesh: the onApplicationBootstrap fix made the NATS->
WS bridge actually deliver events for the first time, which instantly
crashed the API. esModuleInterop is off, so 'import WebSocket from ws'
compiles to ws_1.default = undefined; WebSocket.OPEN threw
'Cannot read properties of undefined' and killed the process on the
first heartbeat forward. All three WS guard sites (nats-bridge x2,
console gateway) switched to the import-agnostic instance constant
client.OPEN. Latent in every build — never hit because the bridge was
dead-on-boot until today.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:11:29 -04:00
Vantz Stockwell
a3b4b5cc7d fix(api): NATS subscriptions moved to onApplicationBootstrap — they silently no-oped before connect
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 47s
CI / integration (push) Successful in 22s
Production bug caught live: provider onModuleInit order put bridge/
consumer subscription hooks BEFORE NatsService finished connecting, so
every subscribe() hit the [OFFLINE] no-op path — the WS bridge has been
dead-on-boot in every production build, and the new v2 consumer never
saw a heartbeat (server_connections stayed empty under a live agent).
onApplicationBootstrap is guaranteed to run after all module inits,
including the awaited NATS connect.

The new CI contract suite fails on exactly this class of bug.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:02:52 -04:00
Vantz Stockwell
4e184ca571 ci: full test gate — types, frontend build, agent tests, agent<->backend contract suite
Some checks failed
CI / backend-types (push) Successful in 11s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m48s
CI / integration (push) Has been cancelled
ci.yml runs on every push to main: backend tsc, frontend vue-tsc+vite,
cargo test (cached), then an integration job with postgres:16 + nats
service containers — real migrations applied to a fresh DB, real
backend booted (admin seed provides the license), real agent binary
spawned. contract-tests/agent-backend.contract.mjs proves the entire
v2 pipeline: heartbeat shape + measured telemetry, auto-registered
server_connections row flipping connected, instance start/stop/status
round-trips with push events, and the offline beacon flipping the row
back. This is the test that could not run before a production rebuild
until now — it now runs before every push lands.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:59:44 -04:00
Vantz Stockwell
fde0926d52 feat(host-agent): Phase 1b RCON — WebRCON (rust) + Source RCON (conan/soulmask)
rcon func on the instance command channel: WebSocket JSON WebRCON with
Identifier correlation (skips chat/log noise frames) and full Valve
Source RCON over TCP (auth, exec, multi-packet reassembly via empty
probe, 1MiB cap). Protocol inferred from game, explicit kind override
in [instance.rcon]. Always 127.0.0.1 — agent is co-located.

Hardening from review: WebRCON password never interpolated into error
contexts/logs (redacted URL); probe-tolerant termination — a quiet
period after received data ends the response for servers that don't
echo the probe (Soulmask conformance unverified), so data is never
discarded on probe timeout.

13/13 tests green incl. mock Source-RCON server (auth/multi-packet/
errors) and mock WebRCON server (noise-frame skipping).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:53:52 -04:00
Vantz Stockwell
4d99c9d99d feat(frontend): validate persisted session on app boot
A stale or revoked token previously rendered the full panel chrome and
only collapsed on the first API call. App boot now calls /auth/me
through useApi (401 -> refresh -> logout already handled there); user
profile refreshes on success, and non-auth failures (network, 5xx)
never log the user out.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:49:21 -04:00
Vantz Stockwell
b8f0ccba3c fix(frontend): env-driven marketing host detection
Exact-match on 'corrosionmgmt.com' meant www. or any staging host
silently served the panel instead of the marketing site. Hosts now come
from VITE_MARKETING_HOSTS (comma-separated, defaults cover bare + www).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:47:15 -04:00
Vantz Stockwell
068a476f39 feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:44:24 -04:00
Vantz Stockwell
f706c3c47e docs(claude): host-agent reality — active Rust crate, tag scheme, runner container truth, command corrections
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:37:37 -04:00
Vantz Stockwell
4c9c322c29 feat(seo): per-route titles + meta descriptions; ci: honest runner test
Every page previously titled 'Corrosion Management' with zero meta -
marketing invisible to search and link previews. Router afterEach now
sets title/description/og per route (no new deps); marketing pages get
real content-backed descriptions, panel views mechanical titles.
index.html carries defaults for pre-JS crawlers. Verified in-browser
per page via Playwright.

test-runner.yml: per-tool presence checks instead of green-lighting
missing toolchains; workflow_dispatch instead of every push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:35:58 -04:00
Vantz Stockwell
47fa72763c feat(api): host-agent protocol v2 consumer — heartbeat persistence, auto-register, staleness sweep
Nothing persisted agent heartbeats before: companion_last_seen was
written once at setup and connection_status stayed 'connected' forever.
HostAgentConsumerService now consumes corrosion.*.host.heartbeat
(updates last_seen + status, auto-creates the bare_metal connection row
on first contact), host.going_offline (graceful offline), and sweeps
connections offline after 180s of heartbeat silence. License-existence
tenant validation with caching per NATS-consumer doctrine. WS bridge
forwards host_heartbeat/host_going_offline to the panel.

Contract-verified against production NATS with the backend's own nats
lib: v2 subjects, schema 2, real telemetry, offline beacon.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:35:58 -04:00
Vantz Stockwell
b455bf9f14 ci(host-agent): bootstrap Rust in the runner container; roll to alpha.2
All checks were successful
Build Host Agent (Rust) / build (push) Successful in 1m29s
Test Asgard Runner / test (push) Successful in 3s
Asgard runner executes jobs in bare node:20-bullseye (no Rust, no sudo)
- install rustup + musl/mingw cross toolchains per-run, same pattern as
setup-go in the Go pipeline. agent-v2.0.0-alpha.1 predates this fix;
forward-only doctrine: version rolls to alpha.2 rather than re-pushing
the tag.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.2
2026-06-11 10:15:36 -04:00
Vantz Stockwell
4abf0ab889 ci(host-agent): Rust agent build pipeline on agent-v* tags -> CDN alpha channel
Some checks failed
Build Host Agent (Rust) / build (push) Failing after 3s
Test Asgard Runner / test (push) Successful in 3s
Separate tag namespace from the Go pipeline (v*.*.*) per the
blast-radius doctrine; artifacts publish to /host-agent/alpha/ and a
versioned dir, leaving /host-agent/latest/ on the Go build until
cutover. Linux = static musl, Windows = mingw (msvc/cargo-xwin stays
the local release path). Tag-vs-Cargo.toml version gate included.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-v2.0.0-alpha.1
2026-06-11 10:09:43 -04:00
Vantz Stockwell
cea3d66cdd feat(host-agent): Rust rewrite Phase 0 — multi-instance foundation, v2 wire protocol, real telemetry
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
New corrosion-host-agent/ crate (Go companion-agent stays as behavior
reference until parity). Wire protocol v2 per COA-B: instance-scoped
subjects corrosion.{license}.{instance}.* + host-level .host.* — spec
in PROTOCOL.md, designed for the license->host->instance fleet model.

- Multi-instance TOML config in the foundation, not retrofitted
- NATS layer on the Vigilance production profile (infinite reconnect,
  capped backoff, 30s ping, 8192-msg offline buffer)
- Heartbeat with real sysinfo telemetry — Go agent shipped hardcoded
  disk/cpu placeholders; this is the panel's first true Resources data
- Connectivity prober (outbound TCP, periodic + on-demand)
- Host cmd channel (ping/probe/sysinfo), going-offline beacon,
  CancellationToken shutdown
- Live-fire verified against production NATS; artifacts: 3.7MB static
  linux-musl, 3.8MB windows .exe (static CRT)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:02:46 -04:00
Vantz Stockwell
1abe57ca40 fix(marketing): strip dead Discord invite from footer; docs: Scout tier -> sonnet[1m]
All checks were successful
Test Asgard Runner / test (push) Successful in 2s
discord.gg/corrosion is an Unknown Invite (verified via Discord API) -
no dead community promises on the marketing site. Re-add when a real
server exists. Discord *webhook* feature copy stays; that's shipped.

AGENTS.md Scout tier haiku -> sonnet[1m] confirmed by Commander:
marginal price difference, 1m context window pays for itself on recon.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 09:28:06 -04:00
Vantz Stockwell
a8722a7a07 fix(audit): kill fake install cmds + dead demo CTA; production fonts; scoped error boundary; admin bootstrap seed
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
Full-site fake-data audit findings:
- SetupWizard showed a curl|sh installer (get.corrosionmgmt.com) and a
  'corrosion-agent' binary that don't exist -> real host-agent commands
- 'View live demo' CTA on 5 marketing pages linked to a login, not a
  demo -> honest 'Sign in'
- Google Fonts @import was silently dropped from the production CSS
  bundle (mid-bundle @import) -> <link> tags in index.html; prod was
  shipping system fallback fonts
- App-root ErrorBoundary bricked the entire SPA (incl. marketing) on a
  single failed fetch until manual reload -> resets on route change +
  content-scoped boundary inside DashboardLayout so nav chrome survives
- Status page KPIs showed fake zeros while the fetch failed -> em dash
- Login lacked the forgot-password link (flow already existed end-to-end)
- AdminSeedService: fresh DB had schema but no login possible; seeds
  super-admin + license from ADMIN_* env when users table is empty

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 09:23:44 -04:00