The agent reply is authoritative for the action just taken; the fleet
DB only updates on the next heartbeat (~10s), so the immediate refetch
read a stale state and reverted the UI (Start -> still Stopped). Now
apply the reply's state/uptime directly to the instance.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Server page now manages the selected GAME INSTANCE, not the legacy
host connection. New instances store flattens the fleet and drives the
command bridge. New 'Game instance' panel: real state badge
(running/stopped/crashed/configured), uptime, host, and an instance
selector when >1. Start/Stop/Restart/Refresh wired to POST
/api/instances/:id/lifecycle — gated on the actual instance state (not
host connectivity), with telemetry-only instances flagged. Works across
all four games (state + lifecycle are game-agnostic).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Self-service host removal. DELETE /api/fleet/hosts/:id (server.manage,
tenant-guarded): refuses while the host is 'connected' (409 — a live
agent re-registers on its next heartbeat, stop it first), deletes the
host's game_instances explicitly (FK is SET NULL, would otherwise
orphan them; instance_stats cascade), and clears the legacy
server_connections row if it was the license's last host. Fleet view:
offline host cards get a Remove button with inline confirm + toast.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Server page Host-agent panel now fetches GET /api/servers/agent-
credentials and renders the real agent.toml (license UUID, nats_user,
nats_password) instead of the broken LICENSE_ID=license_key env
commands that would never connect. Password masked by default with a
reveal toggle; copy-to-clipboard uses the real value. Setup commands
point at --config /etc/corrosion/agent.toml.
Configuration panel: World size / Current seed (Rust-only Facepunch
concepts) gated behind isRust; Conan/Soulmask/Dune get an honest note
pointing to the File Manager for their real config files instead of
fake Rust fields.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Backend layer wiring the panel to the host agent's per-instance command
channel (the unblocker for the Server-page rework):
- NatsService.requestScoped(): request-reply with a LICENSE-SCOPED reply
subject (corrosion.{license}.reply.<id>) so per-license-scoped agents
(no _INBOX permission) can actually reply — the design from the NATS
auth work, now exercised.
- InstancesModule: POST /api/instances/:id/lifecycle {action} (start/
stop/restart/status/steam_update, server.manage) and POST :id/rcon
{command} (server.console). Tenant-guarded via game_instances.
- GET /api/servers/agent-credentials: derives the agent's NATS user/
password (HMAC) so a customer can configure their agent — closes the
post-auth setup gap.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Caught during the live cutover: nats-server rejects 'unknown field
no_auth_user' when it is nested in the authorization block, taking the
whole broker down. Both the generator (open stage) and the committed
bootstrap default emitted it nested. Moved to top level. Enforce-stage
output was unaffected (no no_auth_user), which is what the live broker
now runs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two HIGH findings from automated review on the generator, both fixed:
1. Cross-tenant inbox access: per-license users were granted _INBOX.>,
letting license A subscribe to license B's request-reply responses.
Now scoped to corrosion.{license}.> ONLY; replies must ride the
license namespace (corrosion.{license}.reply.<id>) — documented in
PROTOCOL.md. Agent unchanged (responds to msg.reply); constraint is
on the requester (internal user has full >).
2. Default-open auth bypass: generator defaulted to stage=open with a
full-access anonymous user — a stale regen left the broker wide open.
Now defaults to enforce (secure by default); the explicit 'open'
migration stage maps anonymous to a harmless corrosion.unclaimed.>
namespace, never real tenant subjects. Committed bootstrap default
hardened the same way.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.
- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
(verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf
Entirely non-breaking until secrets+config deployed; staged cutover next.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Tenant-scoped fleet read: GET /api/fleet returns agent_hosts (host
metrics) each with their game_instances, plus a summary
(host/instance/online counts). FleetView lists host cards (status, CPU/
mem/disk/uptime/last-heartbeat) with their instances (game, state badge,
uptime); honest empty state -> Server page when no hosts. New 'Fleet'
sidebar nav item across all four game profiles, /fleet route. Store
follows the no-throw-on-fetch pattern (error state, never bricks). The
marketing hero made real from the live fleet tables.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Migration 022 adds agent_hosts / game_instances / instance_clusters /
instance_stats (named agent_hosts to avoid the existing B2B hosts
table). HostAgentConsumerService now parses the full v2 heartbeat and
upserts an agent_hosts row (host metrics: cpu/mem/disk/agent version,
keyed by license_id+hostname until enrollment) plus one game_instances
row per heartbeat instance entry (state + uptime, the billing unit).
Legacy server_connections write retained so the current panel keeps
working — additive migration, nothing breaks. Staleness sweep + offline
beacon now flip agent_hosts too. cluster_id FK reserved for Soulmask/
Dune. Migration applied to live DB; tsc green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Automated security review (HIGH) caught a jail-escape my own review
missed: copy_recursive used fs::metadata (follows symlinks). A symlink
inside the jail pointing to e.g. /etc, then a 'copy' of its parent dir,
would dereference it and pull external content INTO the jail where it
could be read — a read-escape exfiltration. jail() validates only the
top-level src/dest; the recursive walk reintroduced the escape.
Fix: copy_recursive uses symlink_metadata and refuses any symlink
('symlinks are not followed across the jail boundary'). list() likewise
switched to symlink_metadata so it reports the link, never the
dereferenced target's size/type (info leak). Two regression tests added:
copy-symlink-exfil (asserts no external content lands inside) and
list-no-deref. 44/44 tests green. Rolled forward to alpha.4 (vulnerable
alpha.3 superseded).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
steam_update func runs SteamCMD per game (rust/conan/soulmask app-ids;
dune rejected), streaming stdout to {instance}.steam_status. Jailed
file manager on {instance}.files.cmd: list/read/write/delete/rename/
mkdir/mkfile/move/copy, all confined to instance root via two-stage
lexical-normalize + canonicalize (defeats ../ traversal AND symlink
escape — incl chained symlinks). Replaces the Go agent's UNJAILED
legacy files API (retired, not ported). 5MiB read cap.
42/42 tests green: 24 filemanager incl 7 jail-escape attempts
(dotdot, deep dotdot, absolute, symlink-inside, direct symlink,
chained symlink), 5 steamcmd app-id (cfg-gated win/linux soulmask).
Jail logic reviewed line-by-line: Path::starts_with is component-wise
(no sibling-prefix bypass), non-existent suffix components can't be
symlinks, leading .. normalizes to / and fails the prefix check.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
corrosion-nginx reported (unhealthy) despite serving the panel fine:
nginx listens 0.0.0.0:80 (IPv4 only, no listen [::]:80), but
'localhost' resolves to ::1 first inside the container, so the probe
got connection-refused. Verified: 127.0.0.1:80 serves the SPA. Probe
now targets IPv4 explicitly. No nginx config change — the panel was
never broken, only the healthcheck's hostname resolution.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Lesson 10 in the flesh: the onApplicationBootstrap fix made the NATS->
WS bridge actually deliver events for the first time, which instantly
crashed the API. esModuleInterop is off, so 'import WebSocket from ws'
compiles to ws_1.default = undefined; WebSocket.OPEN threw
'Cannot read properties of undefined' and killed the process on the
first heartbeat forward. All three WS guard sites (nats-bridge x2,
console gateway) switched to the import-agnostic instance constant
client.OPEN. Latent in every build — never hit because the bridge was
dead-on-boot until today.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Production bug caught live: provider onModuleInit order put bridge/
consumer subscription hooks BEFORE NatsService finished connecting, so
every subscribe() hit the [OFFLINE] no-op path — the WS bridge has been
dead-on-boot in every production build, and the new v2 consumer never
saw a heartbeat (server_connections stayed empty under a live agent).
onApplicationBootstrap is guaranteed to run after all module inits,
including the awaited NATS connect.
The new CI contract suite fails on exactly this class of bug.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ci.yml runs on every push to main: backend tsc, frontend vue-tsc+vite,
cargo test (cached), then an integration job with postgres:16 + nats
service containers — real migrations applied to a fresh DB, real
backend booted (admin seed provides the license), real agent binary
spawned. contract-tests/agent-backend.contract.mjs proves the entire
v2 pipeline: heartbeat shape + measured telemetry, auto-registered
server_connections row flipping connected, instance start/stop/status
round-trips with push events, and the offline beacon flipping the row
back. This is the test that could not run before a production rebuild
until now — it now runs before every push lands.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
rcon func on the instance command channel: WebSocket JSON WebRCON with
Identifier correlation (skips chat/log noise frames) and full Valve
Source RCON over TCP (auth, exec, multi-packet reassembly via empty
probe, 1MiB cap). Protocol inferred from game, explicit kind override
in [instance.rcon]. Always 127.0.0.1 — agent is co-located.
Hardening from review: WebRCON password never interpolated into error
contexts/logs (redacted URL); probe-tolerant termination — a quiet
period after received data ends the response for servers that don't
echo the probe (Soulmask conformance unverified), so data is never
discarded on probe timeout.
13/13 tests green incl. mock Source-RCON server (auth/multi-packet/
errors) and mock WebRCON server (noise-frame skipping).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A stale or revoked token previously rendered the full panel chrome and
only collapsed on the first API call. App boot now calls /auth/me
through useApi (401 -> refresh -> logout already handled there); user
profile refreshes on success, and non-auth failures (network, 5xx)
never log the user out.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Exact-match on 'corrosionmgmt.com' meant www. or any staging host
silently served the panel instead of the marketing site. Hosts now come
from VITE_MARKETING_HOSTS (comma-separated, defaults cover bare + www).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.
Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.
Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every page previously titled 'Corrosion Management' with zero meta -
marketing invisible to search and link previews. Router afterEach now
sets title/description/og per route (no new deps); marketing pages get
real content-backed descriptions, panel views mechanical titles.
index.html carries defaults for pre-JS crawlers. Verified in-browser
per page via Playwright.
test-runner.yml: per-tool presence checks instead of green-lighting
missing toolchains; workflow_dispatch instead of every push.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Nothing persisted agent heartbeats before: companion_last_seen was
written once at setup and connection_status stayed 'connected' forever.
HostAgentConsumerService now consumes corrosion.*.host.heartbeat
(updates last_seen + status, auto-creates the bare_metal connection row
on first contact), host.going_offline (graceful offline), and sweeps
connections offline after 180s of heartbeat silence. License-existence
tenant validation with caching per NATS-consumer doctrine. WS bridge
forwards host_heartbeat/host_going_offline to the panel.
Contract-verified against production NATS with the backend's own nats
lib: v2 subjects, schema 2, real telemetry, offline beacon.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Asgard runner executes jobs in bare node:20-bullseye (no Rust, no sudo)
- install rustup + musl/mingw cross toolchains per-run, same pattern as
setup-go in the Go pipeline. agent-v2.0.0-alpha.1 predates this fix;
forward-only doctrine: version rolls to alpha.2 rather than re-pushing
the tag.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Separate tag namespace from the Go pipeline (v*.*.*) per the
blast-radius doctrine; artifacts publish to /host-agent/alpha/ and a
versioned dir, leaving /host-agent/latest/ on the Go build until
cutover. Linux = static musl, Windows = mingw (msvc/cargo-xwin stays
the local release path). Tag-vs-Cargo.toml version gate included.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
New corrosion-host-agent/ crate (Go companion-agent stays as behavior
reference until parity). Wire protocol v2 per COA-B: instance-scoped
subjects corrosion.{license}.{instance}.* + host-level .host.* — spec
in PROTOCOL.md, designed for the license->host->instance fleet model.
- Multi-instance TOML config in the foundation, not retrofitted
- NATS layer on the Vigilance production profile (infinite reconnect,
capped backoff, 30s ping, 8192-msg offline buffer)
- Heartbeat with real sysinfo telemetry — Go agent shipped hardcoded
disk/cpu placeholders; this is the panel's first true Resources data
- Connectivity prober (outbound TCP, periodic + on-demand)
- Host cmd channel (ping/probe/sysinfo), going-offline beacon,
CancellationToken shutdown
- Live-fire verified against production NATS; artifacts: 3.7MB static
linux-musl, 3.8MB windows .exe (static CRT)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
discord.gg/corrosion is an Unknown Invite (verified via Discord API) -
no dead community promises on the marketing site. Re-add when a real
server exists. Discord *webhook* feature copy stays; that's shipped.
AGENTS.md Scout tier haiku -> sonnet[1m] confirmed by Commander:
marginal price difference, 1m context window pays for itself on recon.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Full-site fake-data audit findings:
- SetupWizard showed a curl|sh installer (get.corrosionmgmt.com) and a
'corrosion-agent' binary that don't exist -> real host-agent commands
- 'View live demo' CTA on 5 marketing pages linked to a login, not a
demo -> honest 'Sign in'
- Google Fonts @import was silently dropped from the production CSS
bundle (mid-bundle @import) -> <link> tags in index.html; prod was
shipping system fallback fonts
- App-root ErrorBoundary bricked the entire SPA (incl. marketing) on a
single failed fetch until manual reload -> resets on route change +
content-scoped boundary inside DashboardLayout so nav chrome survives
- Status page KPIs showed fake zeros while the fetch failed -> em dash
- Login lacked the forgot-password link (flow already existed end-to-end)
- AdminSeedService: fresh DB had schema but no login possible; seeds
super-admin + license from ADMIN_* env when users table is empty
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Version badge: was hardcoded '1.0.8' — now single-sourced from frontend/package.json (1.0.0) via Vite define __APP_VERSION__, so it auto-updates on release. Sidebar agent footer: removed the FABRICATED 'asgard-01' host name and the fake 'Agent v1.0.8' line — now shows real server.connection data, or an honest 'No host agent connected' empty state when nothing is deployed (the operator's actual state). Renamed 'Companion agent' -> 'Corrosion host agent' across the UI (ServerView/SetupWizard/Dashboard/Plugins), the binary names (corrosion-host-agent-<os>-<arch>) + CDN path (/host-agent/), the Go Makefile build output, and the Gitea CI workflow — frontend download links and CI output now match. Marketing hero mock host names neutralized (asgard-01 -> rust-host/dune-host/conan-host). DB column names (companion_last_seen) left intact. Build green; zero 'asgard'/'1.0.8' remain in frontend/src.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drives the panel off the active game (GameSwitcher selection) + the GameProfile registry, so each game visibly differs (not just accent color). Sidebar nav: Rust = full (uMod plugins + plugin configs); Conan/Soulmask/Dune drop uMod + plugin-configs and relabel reset (Wipe World / World Reset / Deep Desert), Dune relabels Console->Broadcast (no RCON) and is Docker-managed. ServerView: management-model badge + game-appropriate panels (Rust deploy + Oxide; Dune Docker/BattleGroup-Sietches; Conan clans/thralls/avatars/purge; Soulmask main-client cluster) with HONEST EmptyStates where no backend data exists yet. Dashboard: per-game reset terminology + stat labels. No invented routes (all map to existing router entries); no fabricated data. Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of 'data lost on every rebuild': nothing created the Postgres schema. TypeORM is synchronize:false, the API container runs no migration step, and there was no init mount — so a fresh pg_data volume came up with ZERO tables (empty/broken DB; the schema had only ever been loaded manually). Mount backend/migrations/*.sql into /docker-entrypoint-initdb.d so Postgres auto-applies the full schema (001..021, plain SQL) ON FIRST INIT ONLY. Existing volumes are untouched (initdb scripts run only on an empty data dir); a fresh volume now self-heals the schema. NOTE: actual row DATA still persists only while the pg_data named volume persists — 'docker compose down' keeps it across 'build --no-cache'; 'down -v' / volume prune is the only thing that wipes it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Real @Public() NestJS endpoint persisting to the existing early_access_signups table (email + server_count), matching the schema exactly (no migration). Duplicate-email safe (pre-check + unique-constraint catch -> friendly success). Wired into app.module. Makes the marketing early-access form functional end-to-end on next API deploy. tsc/nest build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Five marketing sub-pages built to match the landing's design language, all real content: Pricing (4 real tiers + Fleet Block + commercial-use definition + feature-comparison table + self-service support model), How it works (one agent -> N game instances, BYOS, no-SSH), FAQ (real support/product/games/billing Q&A reflecting the self-service model), Roadmap (honest Shipped/In-progress/Planned, no fake dates), Early access (real signup form). 3 icons added (circle/send/help-circle). Visually verified via Playwright; 0 console errors. Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
MarketingLayout + LandingView rebuilt from the delivered design as a multi-game platform site (was Rust-only stub): hero with per-game re-skin + panel mockup, 8-pain problem grid, agent-model shift, 4 self-themed game blueprints (Rust/Dune/Conan/Soulmask), core capabilities, wipe orchestration, built-like-infrastructure, public sites/storefront, pricing, serious-admins, final CTA, footer. REAL pricing (Hobby $9.99 / Community $19.99 / Operator $99.99 / Network $99.99 + $49.99 fleet block) + commercial-use definition + self-service support model ($125/hr prepaid blocks, 'a tool, not a managed service'). marketing.css ported (token-based). 6 icons added to the registry. No fabricated metrics/testimonials. Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
DashboardView now renders the REAL server from useServerStore (connection/config + live WebSocket stats) + real 24h history from /analytics/timeseries, with honest EmptyStates ('install the companion agent') when there is no data. DELETED _dashboardMock.ts (the fake 8-server fleet/feed/wipes). PlayersChart hardened: removed the DEFAULT_SERIES fallback, renders an 'awaiting telemetry' empty state instead of a fabricated curve. New gameProfiles.ts: real per-game capability/terminology/stat registry (rust/conan/soulmask/dune; dune managementModel=docker-compose), ready to wire when the backend gains a per-license game field. No fake data. Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tokens + theming contract + 23 DS components + game-aware shell + Fleet/Solo dashboard, and every panel view re-skinned onto the design system (auth, account, server ops, operations, store, analytics, 9 plugin-config editors + Loot Builder, platform-admin, public pages). Marketing views deferred to their dedicated redesign. Includes the token-loading fix (f440fd7) verified live. Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Forged during the design-system port: the whole panel rendered unstyled despite green builds because a nested @import token barrel after `@import "tailwindcss"` was dropped by Tailwind v4. Caught only by a Playwright screenshot.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The nested ./styles/corrosion.css barrel (8 @import token files) was placed after `@import "tailwindcss"`. Once Tailwind v4 expands in place, those nested @imports no longer precede all statements, so PostCSS DROPPED them (the 8 'should be written before any other statement' warnings). Result: every design token (--surface-*, --accent, --text-*, --font-brand, --space-*) was empty and the entire re-skin rendered unstyled (white bg, no surfaces/accent) despite a green build. Fix: import the 8 token files directly + contiguous in style.css. Verified live via Playwright — tokens resolve (--accent #f26622, canvas #0a0b0e) and the login renders correctly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Auth (Login/Register/ForgotPassword/SetupWizard) + account cluster (Settings/Team/Notifications) re-skinned onto design-system components + tokens. JPEG login banner replaced with the C-gauge mark + Oxanium wordmark. All logic/store/router/handlers preserved (TOTP flow, validators, save handlers, API endpoints). Build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tokens ported 1:1 from the Claude Design bundle (colors/game-themes/type/spacing/elevation/motion/fonts) with the data-theme/data-game theming contract via useThemeGame (+ cc-skin-swap repaint guard). 23 design-system components reimplemented as Vue SFCs (core/forms/data/navigation/feedback/brand). DashboardLayout rebuilt as the game-aware shell (GameSwitcher, grouped nav with permission gating preserved, agent-health footer, topbar). DashboardView: Fleet + Solo with per-game GAME_FIELDS rows and the themed ECharts PlayersChart; Solo wired to the real server store, Fleet on representative data pending the multi-instance backend. All four game skins (Rust/Dune/Conan/Soulmask). vue-tsc + vite build green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>