12 Commits

Author SHA1 Message Date
Vantz Stockwell
9e5e828c8d fix(docker): nginx healthcheck uses 127.0.0.1 not localhost — IPv4-only listener
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
corrosion-nginx reported (unhealthy) despite serving the panel fine:
nginx listens 0.0.0.0:80 (IPv4 only, no listen [::]:80), but
'localhost' resolves to ::1 first inside the container, so the probe
got connection-refused. Verified: 127.0.0.1:80 serves the SPA. Probe
now targets IPv4 explicitly. No nginx config change — the panel was
never broken, only the healthcheck's hostname resolution.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:43:01 -04:00
Vantz Stockwell
fccd5c61c5 docs(claude): Lessons 24-25 — onModuleInit-before-connect dead subscriptions + resurrected-path crash
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 22s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:17:02 -04:00
Vantz Stockwell
c72a280361 fix(api): WS gateways crashed on first forwarded event — WebSocket.OPEN undefined at runtime
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 40s
CI / integration (push) Successful in 20s
Lesson 10 in the flesh: the onApplicationBootstrap fix made the NATS->
WS bridge actually deliver events for the first time, which instantly
crashed the API. esModuleInterop is off, so 'import WebSocket from ws'
compiles to ws_1.default = undefined; WebSocket.OPEN threw
'Cannot read properties of undefined' and killed the process on the
first heartbeat forward. All three WS guard sites (nats-bridge x2,
console gateway) switched to the import-agnostic instance constant
client.OPEN. Latent in every build — never hit because the bridge was
dead-on-boot until today.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:11:29 -04:00
Vantz Stockwell
a3b4b5cc7d fix(api): NATS subscriptions moved to onApplicationBootstrap — they silently no-oped before connect
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 47s
CI / integration (push) Successful in 22s
Production bug caught live: provider onModuleInit order put bridge/
consumer subscription hooks BEFORE NatsService finished connecting, so
every subscribe() hit the [OFFLINE] no-op path — the WS bridge has been
dead-on-boot in every production build, and the new v2 consumer never
saw a heartbeat (server_connections stayed empty under a live agent).
onApplicationBootstrap is guaranteed to run after all module inits,
including the awaited NATS connect.

The new CI contract suite fails on exactly this class of bug.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:02:52 -04:00
Vantz Stockwell
4e184ca571 ci: full test gate — types, frontend build, agent tests, agent<->backend contract suite
Some checks failed
CI / backend-types (push) Successful in 11s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m48s
CI / integration (push) Has been cancelled
ci.yml runs on every push to main: backend tsc, frontend vue-tsc+vite,
cargo test (cached), then an integration job with postgres:16 + nats
service containers — real migrations applied to a fresh DB, real
backend booted (admin seed provides the license), real agent binary
spawned. contract-tests/agent-backend.contract.mjs proves the entire
v2 pipeline: heartbeat shape + measured telemetry, auto-registered
server_connections row flipping connected, instance start/stop/status
round-trips with push events, and the offline beacon flipping the row
back. This is the test that could not run before a production rebuild
until now — it now runs before every push lands.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:59:44 -04:00
Vantz Stockwell
fde0926d52 feat(host-agent): Phase 1b RCON — WebRCON (rust) + Source RCON (conan/soulmask)
rcon func on the instance command channel: WebSocket JSON WebRCON with
Identifier correlation (skips chat/log noise frames) and full Valve
Source RCON over TCP (auth, exec, multi-packet reassembly via empty
probe, 1MiB cap). Protocol inferred from game, explicit kind override
in [instance.rcon]. Always 127.0.0.1 — agent is co-located.

Hardening from review: WebRCON password never interpolated into error
contexts/logs (redacted URL); probe-tolerant termination — a quiet
period after received data ends the response for servers that don't
echo the probe (Soulmask conformance unverified), so data is never
discarded on probe timeout.

13/13 tests green incl. mock Source-RCON server (auth/multi-packet/
errors) and mock WebRCON server (noise-frame skipping).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:53:52 -04:00
Vantz Stockwell
4d99c9d99d feat(frontend): validate persisted session on app boot
A stale or revoked token previously rendered the full panel chrome and
only collapsed on the first API call. App boot now calls /auth/me
through useApi (401 -> refresh -> logout already handled there); user
profile refreshes on success, and non-auth failures (network, 5xx)
never log the user out.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:49:21 -04:00
Vantz Stockwell
b8f0ccba3c fix(frontend): env-driven marketing host detection
Exact-match on 'corrosionmgmt.com' meant www. or any staging host
silently served the panel instead of the marketing site. Hosts now come
from VITE_MARKETING_HOSTS (comma-separated, defaults cover bare + www).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:47:15 -04:00
Vantz Stockwell
068a476f39 feat(host-agent): Phase 1a process supervision — instance start/stop/restart/status + push state events
Per-instance ProcessSupervisor: tokio child spawn with proper arg list
(fixes Go's naive space-splitting), graceful SIGTERM with 30s budget
then force kill, monitor task classifying ordered-stop vs crash (exit
code captured), watch-channel state observable everywhere. Instance cmd
channel live on corrosion.{license}.{instance}.cmd (start/stop/restart/
status) with state events pushed on {instance}.status (keep-latest
semantics, documented). Heartbeats now carry live process state +
uptime per instance. Crate restructured lib+bin for integration tests.

Verified: 5 integration tests with real OS processes (lifecycle, crash
exit-code, restart recovery, unmanaged rejection, clean spawn failure)
+ live-NATS contract test (request-reply roundtrips, double-start
rejection, push events, heartbeat state) — all green.

Known limitation (documented): no PID adoption yet — agent restart
orphans a running game process to 'stopped' until panel restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:44:24 -04:00
Vantz Stockwell
f706c3c47e docs(claude): host-agent reality — active Rust crate, tag scheme, runner container truth, command corrections
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:37:37 -04:00
Vantz Stockwell
4c9c322c29 feat(seo): per-route titles + meta descriptions; ci: honest runner test
Every page previously titled 'Corrosion Management' with zero meta -
marketing invisible to search and link previews. Router afterEach now
sets title/description/og per route (no new deps); marketing pages get
real content-backed descriptions, panel views mechanical titles.
index.html carries defaults for pre-JS crawlers. Verified in-browser
per page via Playwright.

test-runner.yml: per-tool presence checks instead of green-lighting
missing toolchains; workflow_dispatch instead of every push.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:35:58 -04:00
Vantz Stockwell
47fa72763c feat(api): host-agent protocol v2 consumer — heartbeat persistence, auto-register, staleness sweep
Nothing persisted agent heartbeats before: companion_last_seen was
written once at setup and connection_status stayed 'connected' forever.
HostAgentConsumerService now consumes corrosion.*.host.heartbeat
(updates last_seen + status, auto-creates the bare_metal connection row
on first contact), host.going_offline (graceful offline), and sweeps
connections offline after 180s of heartbeat silence. License-existence
tenant validation with caching per NATS-consumer doctrine. WS bridge
forwards host_heartbeat/host_going_offline to the panel.

Contract-verified against production NATS with the backend's own nats
lib: v2 subjects, schema 2, real telemetry, offline beacon.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 10:35:58 -04:00
33 changed files with 2166 additions and 73 deletions

View File

@@ -42,3 +42,6 @@ FRONTEND_URL=http://localhost:5174
# Frontend (Vite — must be prefixed with VITE_)
VITE_PANEL_URL=https://panel.corrosionmgmt.com
# Hostnames that serve the marketing site (comma-separated); all other hosts get the panel
VITE_MARKETING_HOSTS=corrosionmgmt.com,www.corrosionmgmt.com

122
.gitea/workflows/ci.yml Normal file
View File

@@ -0,0 +1,122 @@
name: CI
# Test gate for every push to main. The deploy story: main must be green here
# before the stack is rebuilt (deploy workflow enforces it once SSH transport
# secrets land). Jobs run in the runner's bare node:20-bullseye container —
# toolchains bootstrap per-run.
on:
push:
branches: [main]
pull_request:
jobs:
backend-types:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Type-check NestJS backend
run: |
cd backend-nest
npm ci --no-audit --no-fund 2>&1 | tail -2
npx tsc --noEmit
frontend-build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Build frontend (vue-tsc gate + vite)
run: |
cd frontend
npm ci --no-audit --no-fund 2>&1 | tail -2
npm run build
agent-tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Cache cargo
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
~/.cargo/git
corrosion-host-agent/target
key: cargo-${{ hashFiles('corrosion-host-agent/Cargo.lock') }}
- name: Install Rust
run: |
apt-get update -qq && apt-get install -y -qq build-essential curl
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
echo "$HOME/.cargo/bin" >> $GITHUB_PATH
- name: Test agent
run: |
cd corrosion-host-agent
cargo test
- name: Upload agent binary for integration
uses: actions/upload-artifact@v3
with:
name: agent-debug
path: corrosion-host-agent/target/debug/corrosion-host-agent
integration:
runs-on: ubuntu-latest
needs: agent-tests
services:
postgres:
image: postgres:16
env:
POSTGRES_USER: corrosion
POSTGRES_PASSWORD: citest
POSTGRES_DB: corrosion
nats:
image: nats:2.10-alpine
steps:
- uses: actions/checkout@v4
- name: Download agent binary
uses: actions/download-artifact@v3
with:
name: agent-debug
path: agent-bin
- name: Apply migrations to fresh DB
run: |
apt-get update -qq && apt-get install -y -qq postgresql-client
until PGPASSWORD=citest psql -h postgres -U corrosion -d corrosion -c 'SELECT 1' >/dev/null 2>&1; do sleep 1; done
for f in $(ls backend/migrations/*.sql | sort); do
echo "applying $f"
PGPASSWORD=citest psql -h postgres -U corrosion -d corrosion -v ON_ERROR_STOP=1 -q -f "$f"
done
- name: Build + boot backend
run: |
cd backend-nest
npm ci --no-audit --no-fund 2>&1 | tail -2
npm run build
DATABASE_URL=postgres://corrosion:citest@postgres:5432/corrosion \
NATS_URL=nats://nats:4222 \
JWT_SECRET=ci-secret ENCRYPTION_KEY=ci-encryption-key \
ADMIN_EMAIL=ci@corrosion.test ADMIN_PASSWORD=ci-password-123 ADMIN_USERNAME=CI \
nohup node dist/main.js > /tmp/backend.log 2>&1 &
for i in $(seq 1 30); do
code=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:3000/api/auth/login -X POST -H 'Content-Type: application/json' -d '{}' || true)
[ "$code" = "400" ] && echo "backend up" && exit 0
sleep 2
done
echo "backend failed to come up"; cat /tmp/backend.log; exit 1
- name: Run agent↔backend contract suite
run: |
chmod +x agent-bin/corrosion-host-agent
LICENSE_ID=$(PGPASSWORD=citest psql -h postgres -U corrosion -d corrosion -t -A -c 'SELECT id FROM licenses LIMIT 1')
echo "license under test: $LICENSE_ID"
[ -n "$LICENSE_ID" ] || { echo "admin seed did not create a license"; cat /tmp/backend.log; exit 1; }
LICENSE_ID="$LICENSE_ID" \
DATABASE_URL=postgres://corrosion:citest@postgres:5432/corrosion \
NATS_URL=nats://nats:4222 \
AGENT_BIN=$PWD/agent-bin/corrosion-host-agent \
node contract-tests/agent-backend.contract.mjs
- name: Backend log on failure
if: failure()
run: cat /tmp/backend.log || true

View File

@@ -1,5 +1,6 @@
name: Test Asgard Runner
on: [push]
# On-demand only — no reason to spin a container on every push.
on: [workflow_dispatch]
jobs:
test:
@@ -17,8 +18,15 @@ jobs:
echo "Memory: $(free -h | grep Mem | awk '{print $2}')"
echo "Disk: $(df -h / | tail -1 | awk '{print $4}')"
echo "==========================================="
echo "Go: $(go version)"
echo "Rust: $(rustc --version)"
echo "Docker: $(docker --version)"
# Jobs run in a bare node:20-bullseye container: toolchains are NOT
# preinstalled — workflows must bootstrap them (setup-go, rustup).
# Report presence honestly instead of green-lighting a missing tool.
for tool in go rustc docker node; do
if command -v "$tool" >/dev/null 2>&1; then
echo "$tool: $($tool --version 2>&1 | head -1)"
else
echo "$tool: NOT PRESENT (workflows must install per-run)"
fi
done
echo "==========================================="
echo "✅ Asgard runner is OPERATIONAL"
echo "✅ Asgard runner reachable — container is node:20-bullseye, bootstrap toolchains per-run"

View File

@@ -4,6 +4,19 @@ All notable changes to this project will be documented in this file.
## [Unreleased]
### Added (Host-Agent v2 Consumer + SEO Meta — 2026-06-11)
**Backend (NestJS):**
- `HostAgentConsumerService` (new) — consumes wire protocol v2: `corrosion.*.host.heartbeat` updates `companion_last_seen` + `connection_status='connected'` (auto-registers the connection row on first contact); `host.going_offline` flips offline; a 60s staleness sweep marks hosts offline after 180s of silence. Previously NOTHING persisted heartbeats — `connection_status` was set once at setup and never changed again. Tenant-validated (UUID + license existence, cached) per NATS-consumer doctrine
- `NatsBridgeService` — bridges `host_heartbeat` / `host_going_offline` events to the panel WebSocket
- Verified by contract test: real agent → production NATS → captured with the backend's own `nats` lib under the real license; subjects, schema 2, real telemetry, offline beacon all confirmed
**Frontend:**
- Per-route document titles + meta descriptions (router `afterEach`, no new deps): six marketing pages get real titles/descriptions/OG tags (previously every page was "Corrosion Management" with zero meta — invisible to search and link previews); panel views get mechanical "{View} — Corrosion" titles
**CI:**
- `test-runner.yml` — honest per-tool presence checks (was printing "OPERATIONAL" while every toolchain probe failed); on-demand trigger instead of every push
### Added (Corrosion Host Agent — Rust rewrite Phase 0 — 2026-06-11)
**New: `corrosion-host-agent/`** — Rust rewrite of the Go companion agent (which stays in-tree as the behavior reference until parity). Wire protocol v2 (COA-B, Commander-approved): instance-scoped subjects `corrosion.{license}.{instance}.*` with host-level `corrosion.{license}.host.*` — full spec in `corrosion-host-agent/PROTOCOL.md`.

View File

@@ -55,7 +55,12 @@ frontend/ # Vue 3 + TypeScript
package.json
vite.config.ts # Proxies /api to :3000
companion-agent/ # Go binary for bare metal servers
corrosion-host-agent/ # Rust host agent (ACTIVE) — multi-game ops runtime
src/ # main, config, bus (NATS), telemetry, prober, hostcmd
PROTOCOL.md # Wire protocol v2 spec (instance-scoped subjects)
agent.example.toml # Multi-instance config reference
companion-agent/ # Go binary (LEGACY — behavior reference until Rust parity)
cmd/agent/ # main.go entry point
internal/ # Core agent logic (nats, commands, process)
Makefile # Build for Linux/Windows
@@ -91,14 +96,16 @@ cd backend-nest && npx tsc --noEmit # Type-check without building
# Frontend
cd frontend && npm run dev # Vite dev server (port 5174)
cd frontend && npm run build # Production build → dist/
cd frontend && npm run lint # ESLint
cd frontend && npm run type-check # TypeScript checking (vue-tsc)
cd frontend && npm run build # vue-tsc -b && vite build (type-check included; no separate lint/type-check scripts exist)
# Companion Agent (Go)
# Host Agent (Rust — ACTIVE)
cd corrosion-host-agent && cargo check # Fast validation
cd corrosion-host-agent && cargo build --release --target x86_64-unknown-linux-musl # Static Linux binary
cd corrosion-host-agent && cargo xwin build --release --target x86_64-pc-windows-msvc # Windows (local)
# CI: push tag agent-vX.Y.Z (must match Cargo.toml version) → Asgard builds → CDN /host-agent/alpha/
# Companion Agent (Go — LEGACY, behavior reference until Rust parity)
cd companion-agent && make build # Build for current platform
cd companion-agent && make linux # Cross-compile for Linux
cd companion-agent && make windows # Cross-compile for Windows
# Docker (from docker/ directory — Commander ALWAYS builds with --no-cache)
docker compose build --no-cache && docker compose up -d # Full rebuild + start
@@ -374,7 +381,8 @@ Default to Sonnet. Escalate to Opus when the problem demands it, not as a comfor
- Treat every change as production deployment (`corrosionmgmt.com`)
- Document why, not just what, in commits and CHANGELOG
- **Always commit and push when done touching code — never ask, never wait for permission**
- **Tag companion agent builds when Go code in `companion-agent/` is modified** — increment from latest tag (currently v1.0.3), push tag to trigger CI build + CDN upload
- **Tag agent builds when agent code is modified** — Rust agent: `agent-vX.Y.Z` (must match `corrosion-host-agent/Cargo.toml`; CI publishes to CDN `/host-agent/alpha/`, while `/latest/` stays on the Go build until cutover). Legacy Go agent: `vX.Y.Z`. Tags roll FORWARD only — never reuse or re-push a tag; cut the next version
- **The Asgard CI runner executes jobs in a bare `node:20-bullseye` container** — no Rust/Go/Docker/sudo preinstalled; workflows must bootstrap toolchains per-run (setup-go, rustup via curl)
## Development Notes
@@ -435,3 +443,7 @@ Things I discovered about myself building a sister platform across multiple sess
22. **Build-green is not render-correct — visually verify UI work before calling it done.** The entire design-system re-skin (50+ files, six green commits) rendered almost completely unstyled in the browser — white background, no surfaces, no accent — because the design tokens never loaded. `vue-tsc -b` + `vite build` passed clean the whole time; CSS that *compiles* can still apply *zero* styles. One Playwright screenshot of the login exposed it in seconds. When the deliverable is visual, a green build is necessary but not sufficient: load it in a real browser (Playwright on the dev server at :5174), screenshot it, and assert on `getComputedStyle` — don't trust compilation alone. This is Lesson 17 with teeth.
23. **Tailwind v4 silently drops a nested `@import` barrel placed after `@import "tailwindcss"`.** `style.css` did `@import "tailwindcss"; @import "./styles/corrosion.css";` where corrosion.css was a barrel of eight `@import` token files. Once Tailwind v4 expands the tailwindcss import in place, the barrel's inner @imports no longer precede all statements, so PostCSS drops them — emitting only an easily-ignored "@import must precede all other statements" warning. Result: every design token resolved empty and the whole panel rendered unstyled. Import token/design CSS files **directly and contiguously** in the entry stylesheet; never via a nested barrel after the Tailwind import. The build warning you wave off as "pre-existing" may be the entire feature silently failing.
24. **`onModuleInit` runs before async `onModuleInit` of dependencies completes — register NATS/external subscriptions in `onApplicationBootstrap`.** `NatsService.onModuleInit` connects to NATS (async); `NatsBridgeService`/`HostAgentConsumerService` registered their subscriptions in their own `onModuleInit`, which fired while the connection was still null — so every `subscribe()` hit the `[OFFLINE]` no-op path and the WS bridge was dead-on-boot in *every* production build, silently. Nest guarantees `onApplicationBootstrap` runs only after all module init (including the awaited connect) finishes. Anything that depends on another provider's async startup belongs in bootstrap, not init. The tell: a subscription that "should be there" but the handler never fires and there's no error — trace the *startup ordering*, not the handler.
25. **Fixing a dead code path detonates the live code behind it — budget for the second bug.** The moment Lesson 24's fix made the NATS→WS bridge actually deliver events, the API crashed on the first forwarded heartbeat: `WebSocket.OPEN` was `undefined` at runtime because `esModuleInterop` is off, so `import WebSocket from 'ws'` compiled to `ws_1.default` (undefined). That crash had sat behind the dead bridge since the gateway was written — never hit because no event ever reached it. When you resurrect a path that was silently no-op, everything downstream of it is effectively *untested code running for the first time in production*. Verify the whole chain end-to-end (I watched the DB row appear, then flip offline), don't stop at "the subscription fires now." This is Lesson 10 with a fuse on it. Import-runtime gotcha worth remembering: when `esModuleInterop` is off, prefer instance constants (`client.OPEN`) over class statics (`WebSocket.OPEN`) for `ws`.

View File

@@ -49,6 +49,9 @@ import { EarlyAccessModule } from './modules/early-access/early-access.module';
// Shared Services
import { NatsService } from './services/nats.service';
import { NatsBridgeService } from './services/nats-bridge.service';
import { HostAgentConsumerService } from './services/host-agent-consumer.service';
import { ServerConnection } from './entities/server-connection.entity';
import { License } from './entities/license.entity';
import { SteamService } from './services/steam.service';
// Gateway
@@ -91,6 +94,9 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
// Scheduler
ScheduleModule.forRoot(),
// Repositories for app-level shared services (host-agent consumer)
TypeOrmModule.forFeature([ServerConnection, License]),
// Feature Modules
AuthModule,
UsersModule,
@@ -134,6 +140,7 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
// Shared services
NatsService,
NatsBridgeService,
HostAgentConsumerService,
SteamService,
// WebSocket gateway

View File

@@ -71,7 +71,10 @@ export class NatsBridgeGateway implements OnGatewayConnection, OnGatewayDisconne
// Subscribe to NATS events for this license
const listener = (event: string, data: unknown) => {
if (client.readyState === WebSocket.OPEN) {
// client.OPEN (instance constant) — NOT WebSocket.OPEN: with
// esModuleInterop off, the default `ws` import is undefined at
// runtime, so the static crashes. The instance constant is safe.
if (client.readyState === client.OPEN) {
client.send(JSON.stringify({
type: 'event',
license_id: payload.license_id,

View File

@@ -108,7 +108,9 @@ export class ConsoleGateway implements OnGatewayConnection, OnGatewayDisconnect
const message = JSON.stringify({ event, data });
for (const client of clients) {
if (client.readyState === WebSocket.OPEN) {
// client.OPEN, not WebSocket.OPEN — esModuleInterop is off so the
// default `ws` import is undefined at runtime (would crash on forward).
if (client.readyState === client.OPEN) {
client.send(message);
}
}

View File

@@ -0,0 +1,154 @@
import { Injectable, Logger, OnApplicationBootstrap } from '@nestjs/common';
import { Interval } from '@nestjs/schedule';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { NatsService } from './nats.service';
import { ServerConnection } from '../entities/server-connection.entity';
import { License } from '../entities/license.entity';
/**
* Consumes Corrosion wire protocol v2 host-agent subjects
* (corrosion-host-agent/PROTOCOL.md) and keeps server_connections truthful.
*
* Before this service existed, NOTHING persisted agent heartbeats:
* companion_last_seen was written once at setup and connection_status stayed
* 'connected' forever. Now: heartbeat -> last_seen + connected (row
* auto-created on first contact), going_offline beacon -> offline, and a
* staleness sweep marks hosts offline when heartbeats stop arriving.
*/
@Injectable()
export class HostAgentConsumerService implements OnApplicationBootstrap {
private readonly logger = new Logger(HostAgentConsumerService.name);
/** licenseId -> cache expiry epoch-ms. Positive = exists, absent = unknown. */
private knownLicenses = new Map<string, number>();
/** Unknown/garbage license ids we already warned about (anti log-spam). */
private warnedUnknown = new Set<string>();
private static readonly UUID_RE =
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
private static readonly LICENSE_CACHE_TTL_MS = 5 * 60_000;
/** 3x the agent's default 60s heartbeat (which jitters to max 72s). */
private static readonly OFFLINE_AFTER_MS = 180_000;
constructor(
private readonly nats: NatsService,
@InjectRepository(ServerConnection)
private readonly connectionRepository: Repository<ServerConnection>,
@InjectRepository(License)
private readonly licenseRepository: Repository<License>,
) {}
// Bootstrap, not module-init: subscriptions registered before NatsService
// finished connecting silently no-op (see NatsBridgeService note).
onApplicationBootstrap() {
this.nats.subscribe('corrosion.*.host.heartbeat', (data, subject) => {
const licenseId = subject.split('.')[1];
void this.onHeartbeat(licenseId).catch((err) =>
this.logger.error(`heartbeat handling failed for ${licenseId}: ${err.message}`, err.stack),
);
void data; // payload telemetry is bridged to the browser; persistence here is liveness only
});
this.nats.subscribe('corrosion.*.host.going_offline', (_data, subject) => {
const licenseId = subject.split('.')[1];
void this.onGoingOffline(licenseId).catch((err) =>
this.logger.error(`going_offline handling failed for ${licenseId}: ${err.message}`, err.stack),
);
});
this.logger.log('Host agent (protocol v2) consumer subscriptions initialized');
}
private async onHeartbeat(licenseId: string): Promise<void> {
if (!(await this.isValidTenant(licenseId))) return;
const now = new Date();
const existing = await this.connectionRepository.findOne({
where: { license_id: licenseId },
});
if (existing) {
await this.connectionRepository.update(
{ id: existing.id },
{ companion_last_seen: now, connection_status: 'connected', updated_at: now },
);
if (existing.connection_status !== 'connected') {
this.logger.log(`host agent for license ${licenseId} is back online`);
}
} else {
// First contact from a host agent: auto-register the connection so the
// panel lights up without a manual setup step.
await this.connectionRepository.save(
this.connectionRepository.create({
license_id: licenseId,
connection_type: 'bare_metal',
connection_status: 'connected',
companion_last_seen: now,
}),
);
this.logger.log(`host agent registered for license ${licenseId} (first heartbeat)`);
}
}
private async onGoingOffline(licenseId: string): Promise<void> {
if (!(await this.isValidTenant(licenseId))) return;
await this.connectionRepository.update(
{ license_id: licenseId },
{ connection_status: 'offline', updated_at: new Date() },
);
this.logger.log(`host agent for license ${licenseId} went offline (graceful beacon)`);
}
/**
* Heartbeats stopping must flip the panel to offline — an agent that
* crashes or loses network never sends the goodbye beacon.
*/
@Interval(60_000)
async sweepStaleConnections(): Promise<void> {
const threshold = new Date(Date.now() - HostAgentConsumerService.OFFLINE_AFTER_MS);
const result = await this.connectionRepository
.createQueryBuilder()
.update(ServerConnection)
.set({ connection_status: 'offline', updated_at: () => 'NOW()' })
.where('connection_status = :connected', { connected: 'connected' })
.andWhere('companion_last_seen IS NOT NULL')
.andWhere('companion_last_seen < :threshold', { threshold })
.execute();
if (result.affected) {
this.logger.warn(`marked ${result.affected} stale host connection(s) offline`);
}
}
/**
* Tenant validation: the subject segment must be a real license UUID.
* NATS consumers must never write rows for subjects an arbitrary publisher
* invented. Existence is cached to avoid a query per heartbeat.
*/
private async isValidTenant(licenseId: string): Promise<boolean> {
if (!HostAgentConsumerService.UUID_RE.test(licenseId)) {
this.warnUnknownOnce(licenseId, 'not a UUID');
return false;
}
const cachedUntil = this.knownLicenses.get(licenseId);
if (cachedUntil && cachedUntil > Date.now()) return true;
const exists = await this.licenseRepository.exist({ where: { id: licenseId } });
if (!exists) {
this.warnUnknownOnce(licenseId, 'no such license');
return false;
}
this.knownLicenses.set(licenseId, Date.now() + HostAgentConsumerService.LICENSE_CACHE_TTL_MS);
return true;
}
private warnUnknownOnce(licenseId: string, reason: string): void {
if (this.warnedUnknown.has(licenseId)) return;
this.warnedUnknown.add(licenseId);
this.logger.warn(`ignoring host-agent traffic for invalid license '${licenseId}' (${reason})`);
}
}

View File

@@ -1,3 +1,4 @@
export { NatsService } from './nats.service';
export { NatsBridgeService } from './nats-bridge.service';
export { HostAgentConsumerService } from './host-agent-consumer.service';
export { SteamService } from './steam.service';

View File

@@ -1,14 +1,19 @@
import { Injectable, OnModuleInit, Logger } from '@nestjs/common';
import { Injectable, OnApplicationBootstrap, Logger } from '@nestjs/common';
import { NatsService } from './nats.service';
@Injectable()
export class NatsBridgeService implements OnModuleInit {
export class NatsBridgeService implements OnApplicationBootstrap {
private readonly logger = new Logger(NatsBridgeService.name);
private listeners: Map<string, Set<(event: string, data: unknown) => void>> = new Map();
constructor(private nats: NatsService) {}
onModuleInit() {
// Subscriptions MUST happen in onApplicationBootstrap, not onModuleInit:
// provider onModuleInit order is not guaranteed, and these hooks once ran
// before NatsService connected — every subscribe() silently no-oped and the
// WS bridge was dead from boot. Bootstrap runs after ALL module inits
// (including the awaited NATS connect) complete.
onApplicationBootstrap() {
this.nats.subscribe('corrosion.*.companion.heartbeat', (data, subject) => {
const licenseId = subject.split('.')[1];
this.emit(licenseId, 'heartbeat', data);
@@ -44,6 +49,17 @@ export class NatsBridgeService implements OnModuleInit {
this.emit(licenseId, 'oxide_status', data);
});
// Wire protocol v2 (corrosion-host-agent) — host-level telemetry
this.nats.subscribe('corrosion.*.host.heartbeat', (data, subject) => {
const licenseId = subject.split('.')[1];
this.emit(licenseId, 'host_heartbeat', data);
});
this.nats.subscribe('corrosion.*.host.going_offline', (data, subject) => {
const licenseId = subject.split('.')[1];
this.emit(licenseId, 'host_going_offline', data);
});
this.logger.log('NATS bridge subscriptions initialized');
}

View File

@@ -0,0 +1,152 @@
// Full-pipeline contract test: Rust host agent → NATS → NestJS consumer → Postgres.
//
// Proves the wire protocol v2 chain end to end against a REAL backend and DB:
// 1. agent heartbeat arrives with schema 2 + measured telemetry
// 2. backend auto-registers the server_connections row and marks it connected
// 3. instance command channel round-trips (start/status/stop) with push events
// 4. graceful agent shutdown publishes the offline beacon and the row flips offline
//
// Required env:
// LICENSE_ID — existing license uuid (CI: from the admin seed)
// DATABASE_URL — postgres connection string for assertions
// NATS_URL — broker both agent and backend use (default nats://localhost:4222)
// AGENT_BIN — path to the corrosion-host-agent binary
//
// Uses the backend's own node_modules (nats, pg) so the client libs under test
// are exactly what production runs.
import { createRequire } from 'node:module';
import { spawn } from 'node:child_process';
import { writeFileSync, mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join, dirname } from 'node:path';
import { fileURLToPath } from 'node:url';
const repoRoot = join(dirname(fileURLToPath(import.meta.url)), '..');
const require = createRequire(join(repoRoot, 'backend-nest', 'node_modules', 'x.js'));
const { connect, StringCodec } = require('nats');
const { Client: PgClient } = require('pg');
const LICENSE = process.env.LICENSE_ID;
const NATS_URL = process.env.NATS_URL ?? 'nats://localhost:4222';
const DATABASE_URL = process.env.DATABASE_URL;
const AGENT_BIN = process.env.AGENT_BIN ?? join(repoRoot, 'corrosion-host-agent', 'target', 'debug', 'corrosion-host-agent');
if (!LICENSE || !DATABASE_URL) {
console.error('LICENSE_ID and DATABASE_URL are required');
process.exit(2);
}
const sc = StringCodec();
const errs = [];
const check = (cond, msg) => { if (!cond) errs.push(msg); };
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
async function pollDb(pg, predicate, label, timeoutMs = 30_000) {
const deadline = Date.now() + timeoutMs;
for (;;) {
const { rows } = await pg.query(
'SELECT connection_type, connection_status, companion_last_seen FROM server_connections WHERE license_id = $1',
[LICENSE],
);
if (predicate(rows)) return rows;
if (Date.now() > deadline) {
errs.push(`${label}: timeout after ${timeoutMs}ms — rows: ${JSON.stringify(rows)}`);
return rows;
}
await sleep(1000);
}
}
const main = async () => {
const pg = new PgClient({ connectionString: DATABASE_URL });
await pg.connect();
const nc = await connect({ servers: NATS_URL });
const heartbeats = [];
const statusEvents = [];
(async () => { for await (const m of nc.subscribe(`corrosion.${LICENSE}.host.heartbeat`)) heartbeats.push(JSON.parse(sc.decode(m.data))); })();
(async () => { for await (const m of nc.subscribe(`corrosion.${LICENSE}.ci-instance.status`)) statusEvents.push(JSON.parse(sc.decode(m.data))); })();
// --- spawn the real agent ---
const dir = mkdtempSync(join(tmpdir(), 'cha-contract-'));
const cfgPath = join(dir, 'agent.toml');
writeFileSync(cfgPath, `
[agent]
license_id = "${LICENSE}"
nats_url = "${NATS_URL}"
heartbeat_seconds = 10
log_level = "info"
[[instance]]
id = "ci-instance"
game = "rust"
root = "/tmp"
label = "Contract CI"
executable = "/bin/sleep"
args = ["300"]
`);
const agent = spawn(AGENT_BIN, ['--config', cfgPath], { stdio: ['ignore', 'inherit', 'inherit'] });
const agentExited = new Promise((r) => agent.on('exit', r));
// --- 1. heartbeat shape + real telemetry ---
const hbDeadline = Date.now() + 20_000;
while (heartbeats.length === 0 && Date.now() < hbDeadline) await sleep(500);
check(heartbeats.length > 0, 'no heartbeat within 20s');
if (heartbeats.length) {
const hb = heartbeats[0];
check(hb.schema === 2, `schema != 2: ${hb.schema}`);
check(typeof hb.host?.cpu_percent === 'number', 'missing host.cpu_percent');
check(hb.host?.mem_total_mb > 0, 'mem_total_mb not measured');
check(Array.isArray(hb.host?.disks) && hb.host.disks.length > 0, 'no disks reported');
check(hb.instances?.[0]?.id === 'ci-instance', 'instance missing from heartbeat');
check(!!hb.agent?.version && !!hb.agent?.commit, 'agent version/commit missing');
}
// --- 2. backend auto-registers + connects ---
const rows = await pollDb(pg, (r) => r.length === 1 && r[0].connection_status === 'connected', 'auto-register connected');
if (rows.length === 1) {
check(rows[0].connection_type === 'bare_metal', `connection_type: ${rows[0].connection_type}`);
check(rows[0].companion_last_seen !== null, 'companion_last_seen not set');
}
// --- 3. instance command channel ---
const cmd = async (payload) =>
JSON.parse(sc.decode((await nc.request(`corrosion.${LICENSE}.ci-instance.cmd`, sc.encode(JSON.stringify(payload)), { timeout: 8000 })).data));
const st0 = await cmd({ func: 'status' });
check(st0.state?.state === 'stopped', `initial state: ${JSON.stringify(st0.state)}`);
const start = await cmd({ func: 'start' });
check(start.status === 'success', `start: ${JSON.stringify(start)}`);
await sleep(1000);
const st1 = await cmd({ func: 'status' });
check(st1.state?.state === 'running', `post-start state: ${JSON.stringify(st1.state)}`);
check((await cmd({ func: 'start' })).status === 'error', 'double start must error');
check((await cmd({ func: 'bogus' })).status === 'error', 'unknown func must error');
const stop = await cmd({ func: 'stop' });
check(stop.status === 'success', `stop: ${JSON.stringify(stop)}`);
await sleep(1000);
const seq = statusEvents.map((e) => e.event?.state);
check(seq.includes('running') && seq.includes('stopped'), `status events incomplete: ${seq.join(',')}`);
// --- 4. graceful shutdown → offline beacon → DB flips offline ---
agent.kill('SIGTERM');
await Promise.race([agentExited, sleep(8000)]);
await pollDb(pg, (r) => r.length === 1 && r[0].connection_status === 'offline', 'beacon offline', 20_000);
await nc.close();
await pg.end();
if (errs.length) {
console.error('\nCONTRACT FAIL:');
errs.forEach((e) => console.error(' -', e));
process.exit(1);
}
console.log('\nCONTRACT PASS: heartbeat shape, auto-register, connected/offline lifecycle, instance command channel, push events');
process.exit(0);
};
main().catch((e) => {
console.error('contract test crashed:', e);
process.exit(1);
});

View File

@@ -149,6 +149,12 @@ version = "3.20.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72f5acc6cb2ba439de613abc23857ec3d78374d8ed5ac84e9d11336e87da8649"
[[package]]
name = "byteorder"
version = "1.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b"
[[package]]
name = "bytes"
version = "1.11.1"
@@ -265,11 +271,13 @@ dependencies = [
"chrono",
"clap",
"futures",
"libc",
"rand",
"serde",
"serde_json",
"sysinfo",
"tokio",
"tokio-tungstenite",
"tokio-util",
"toml",
"tracing",
@@ -580,6 +588,22 @@ version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea"
[[package]]
name = "http"
version = "1.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6970f50e31d6fc17d3fa27329444bfa74e196cf62e95052a3f6fee181dba6425"
dependencies = [
"bytes",
"itoa",
]
[[package]]
name = "httparse"
version = "1.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6dbf3de79e51f3d586ab4cb9d5c3e2c14aa28ed23d180cf89b4df0454a69cc87"
[[package]]
name = "iana-time-zone"
version = "0.1.65"
@@ -1276,6 +1300,17 @@ dependencies = [
"serde",
]
[[package]]
name = "sha1"
version = "0.10.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e3bf829a2d51ab4a5ddf1352d8470c140cadc8301b2ae1789db023f01cedd6ba"
dependencies = [
"cfg-if",
"cpufeatures",
"digest",
]
[[package]]
name = "sha2"
version = "0.10.9"
@@ -1528,6 +1563,18 @@ dependencies = [
"tokio",
]
[[package]]
name = "tokio-tungstenite"
version = "0.24.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "edc5f74e248dc973e0dbb7b74c7e0d6fcc301c694ff50049504004ef4d0cdcd9"
dependencies = [
"futures-util",
"log",
"tokio",
"tungstenite",
]
[[package]]
name = "tokio-util"
version = "0.7.18"
@@ -1654,6 +1701,24 @@ dependencies = [
"tokio",
]
[[package]]
name = "tungstenite"
version = "0.24.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "18e5b8366ee7a95b16d32197d0b2604b43a0be89dc5fac9f8e96ccafbaedda8a"
dependencies = [
"byteorder",
"bytes",
"data-encoding",
"http",
"httparse",
"log",
"rand",
"sha1",
"thiserror",
"utf-8",
]
[[package]]
name = "typenum"
version = "1.20.1"
@@ -1684,6 +1749,12 @@ dependencies = [
"serde",
]
[[package]]
name = "utf-8"
version = "0.7.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09cc8ee72d2a9becf2f2febe0205bbed8fc6615b7cb429ad062dc7b7ddd036a9"
[[package]]
name = "utf8_iter"
version = "1.0.4"

View File

@@ -25,6 +25,10 @@ tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
anyhow = "1"
clap = { version = "4.5", features = ["derive"] }
rand = "0.8"
tokio-tungstenite = "0.24"
[target.'cfg(unix)'.dependencies]
libc = "0.2"
# Size-optimized release: single static binary living next to RAM-heavy game
# servers. Panic stays 'unwind' so a panicking task surfaces through its

View File

@@ -1,8 +1,9 @@
# Corrosion Wire Protocol v2
Status: **Phase 0 implemented** (host heartbeat, host commands, going-offline
beacon). Per-instance command/status subjects are reserved and specified here
for Phase 1.
Status: **Phase 0 + Phase 1 process control implemented** (host heartbeat,
host commands, going-offline beacon, per-instance start/stop/restart/status
with push state events). RCON, SteamCMD, file ops, and game adapters are
specified but not yet implemented.
## Design
@@ -70,9 +71,10 @@ All telemetry is measured, never fabricated. Fields the agent cannot measure
are omitted (`probe` before the first probe completes, `hostname` if
unavailable).
Phase 0 instance `state` values: `configured` (root path exists),
`missing_root`. Phase 1 adds live process states: `running`, `stopped`,
`crashed`, `starting`, `updating`.
Instance `state` values — process-managed (an `executable` is configured):
`running`, `stopped`, `starting`, `stopping`, `crashed`; unmanaged
(telemetry-only): `configured` (root exists), `missing_root`. Each instance
also reports `uptime_seconds` (0 unless running).
### `corrosion.{license_id}.host.cmd` (backend → agent, request-reply)
@@ -92,19 +94,40 @@ Best-effort beacon (500ms budget) on graceful shutdown so the panel can flip
the host to offline immediately instead of waiting out heartbeat staleness.
Payload: `{}`.
## Instance-level subjects (Phase 1 — reserved, not yet implemented)
## Instance-level subjects
### `corrosion.{license_id}.{instance_id}.cmd` (backend → agent, request-reply)
### `corrosion.{license_id}.{instance_id}.cmd` (backend → agent, request-reply) — LIVE
Lifecycle and control for one game instance. Planned funcs: `start`, `stop`,
`restart`, `status`, `rcon` (process-class games), `steam_update`,
`oxide_install` (rust), plus game-adapter-specific commands (Dune: docker
lifecycle, RabbitMQ bus commands, Coriolis reset).
Lifecycle and control for one game instance.
### `corrosion.{license_id}.{instance_id}.status` (agent → backend, publish)
Implemented funcs: `start`, `stop` (graceful with 30s budget, then force
kill), `restart`, `status` (returns `state` + `uptime_seconds`), and
`rcon``{ "func": "rcon", "command": "<console command>" }` returns
`{ "status": "success", "output": <server response> }`. Protocol per game:
WebRCON (WebSocket JSON) for rust, Source RCON (Valve TCP) for
conan/soulmask; explicit `kind` override available in the instance's
`[instance.rcon]` config. Always targets 127.0.0.1 (agent is co-located).
Errors reply `{ "status": "error", "message": ... }` — including start on an
unmanaged instance, double start, missing rcon config, and unknown funcs.
State-change events (started/stopped/crashed) so the panel does not wait for
the next heartbeat.
Planned funcs: `steam_update`, `oxide_install` (rust), plus
game-adapter-specific commands (Dune: docker lifecycle, RabbitMQ bus
commands, Coriolis reset).
### `corrosion.{license_id}.{instance_id}.status` (agent → backend, publish) — LIVE
State-change events so the panel does not wait for the next heartbeat.
Payload: `{ "timestamp", "instance_id", "event": { "state": ..., "exit_code"? } }`.
Semantics: **keep-latest state sync**, not a lossless transition ledger —
near-instant transient states (e.g. `starting` when spawn succeeds
immediately) may coalesce into the following state. Consumers should treat
each event as "current state is now X".
Known Phase 1 limitation: the supervisor does not yet persist/adopt PIDs — if
the agent itself restarts while a game server is running, the game process
survives but reports `stopped` until restarted through the panel. PID
adoption is queued with the service-install work.
### `corrosion.{license_id}.{instance_id}.console` (agent → backend, publish)

View File

@@ -15,7 +15,11 @@ instance on that host — Rust, Conan Exiles, Soulmask, Dune: Awakening.
- [x] Connectivity prober (outbound TCP, periodic + on-demand)
- [x] Host command channel (`ping`, `probe`, `sysinfo`)
- [x] Graceful shutdown (cancellation token, going-offline beacon, NATS flush)
- [ ] Phase 1: process-class game adapter (spawn/RCON/SteamCMD/files) — Rust, Conan, Soulmask
- [x] Phase 1a: process supervision — per-instance start/stop/restart/status over
`{instance}.cmd` request-reply, push state events on `{instance}.status`,
crash detection with exit codes, live state in heartbeats
(integration-tested with real processes + live-NATS contract test)
- [ ] Phase 1b: RCON trait (WebRCON rust / TCP conan+soulmask), SteamCMD, jailed file manager
- [ ] Phase 2: Dune Docker adapter (compose lifecycle, RabbitMQ bus, Postgres admin)
- [ ] Phase 3: signed self-update (enforced ed25519 — release gate), service install, supervisor split

View File

@@ -23,11 +23,28 @@ game = "rust" # rust | conan | soulmask | dune
root = "/opt/rustserver"
label = "Main 2x Vanilla"
# RCON lets the panel send console commands to the running server.
# For rust the protocol is WebRCON (WebSocket JSON); for conan/soulmask it is
# Source RCON (Valve TCP binary). `kind` is optional — it is inferred from
# the game name when absent.
#
# The [instance.rcon] sub-table MUST immediately follow the [[instance]] entry
# it belongs to (standard TOML array-of-tables scoping rule).
[instance.rcon]
port = 28016
password = "changeme"
# kind = "webrcon" # explicit override; omit to infer from game
# [[instance]]
# id = "soulmask-main"
# game = "soulmask"
# root = "/opt/soulmask/main"
# label = "Cloud Mist Forest (cluster main)"
#
# [instance.rcon]
# port = 19000
# password = "changeme"
# # kind = "source" # inferred automatically for soulmask
[prober]
interval_seconds = 300

View File

@@ -1,10 +1,13 @@
//! Shared agent handle: every subsystem task holds an `Arc<Agent>`.
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;
use tokio::sync::RwLock;
use tokio_util::sync::CancellationToken;
use crate::config::Settings;
use crate::process::ProcessSupervisor;
use crate::prober::ProbeReport;
pub struct Agent {
@@ -12,5 +15,8 @@ pub struct Agent {
pub nats: async_nats::Client,
pub started: Instant,
pub last_probe: RwLock<Option<ProbeReport>>,
/// One supervisor per instance (unmanaged instances included — they
/// report `unmanaged` state and reject process commands).
pub supervisors: HashMap<String, Arc<ProcessSupervisor>>,
pub shutdown: CancellationToken,
}

View File

@@ -10,6 +10,8 @@ use serde::Deserialize;
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use crate::rcon::RconConfig;
/// Instance ids share the NATS subject namespace with host-level segments.
const RESERVED_INSTANCE_IDS: &[&str] = &["host", "cmd", "files", "update", "agent"];
@@ -49,6 +51,33 @@ pub struct InstanceConfig {
/// Optional human label shown in the panel.
#[serde(default)]
pub label: Option<String>,
/// Game server executable. Relative paths resolve against `root`.
/// Absent = unmanaged instance (telemetry only, no process control).
#[serde(default)]
pub executable: Option<PathBuf>,
/// Arguments as a proper list — no shell splitting, quoted values survive.
#[serde(default)]
pub args: Vec<String>,
/// Working directory for the process. Defaults to the executable's directory.
#[serde(default)]
pub working_dir: Option<PathBuf>,
/// RCON connection settings for this instance. Absent = rcon unavailable.
/// Protocol defaults to WebRcon for rust, Source for conan/soulmask.
#[serde(default)]
pub rcon: Option<RconConfig>,
}
impl InstanceConfig {
/// Absolute executable path, if this instance is process-managed.
pub fn resolved_executable(&self) -> Option<PathBuf> {
self.executable.as_ref().map(|exe| {
if exe.is_absolute() {
exe.clone()
} else {
self.root.join(exe)
}
})
}
}
#[derive(Debug, Clone, Default, Deserialize)]

View File

@@ -0,0 +1,201 @@
//! Per-instance command channel + state-change events.
//!
//! Each process-managed instance gets a request-reply subscriber on
//! `corrosion.{license}.{instance_id}.cmd` (funcs: start/stop/restart/status/rcon)
//! and a publisher task that pushes every supervisor state change to
//! `corrosion.{license}.{instance_id}.status` — the panel sees crashes when
//! they happen, not when the next heartbeat ambles in.
use chrono::{SecondsFormat, Utc};
use futures::StreamExt;
use serde::Deserialize;
use serde_json::json;
use std::sync::Arc;
use crate::agent::Agent;
use crate::process::ProcessSupervisor;
use crate::subjects;
#[derive(Debug, Deserialize)]
struct InstanceCommand {
func: String,
/// Payload for funcs that carry a text argument (e.g. rcon).
#[serde(default)]
command: Option<String>,
}
/// Forward every supervisor state change as a status event.
pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) {
let subject = subjects::instance_status(&agent.cfg.license_id, &sup.instance_id);
let mut rx = sup.watch_state();
let cancel = agent.shutdown.clone();
loop {
tokio::select! {
changed = rx.changed() => {
if changed.is_err() {
break;
}
let state = rx.borrow().clone();
let event = json!({
"timestamp": Utc::now().to_rfc3339_opts(SecondsFormat::Secs, true),
"instance_id": sup.instance_id,
"event": state,
});
match serde_json::to_vec(&event) {
Ok(bytes) => {
if let Err(e) = agent.nats.publish(subject.clone(), bytes.into()).await {
tracing::warn!("status publish failed for '{}': {e}", sup.instance_id);
}
}
Err(e) => tracing::error!("status serialize failed: {e}"),
}
}
_ = cancel.cancelled() => break,
}
}
}
/// Request-reply command handler for one instance.
pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Result<()> {
let subject = subjects::instance_cmd(&agent.cfg.license_id, &sup.instance_id);
let mut sub = agent.nats.subscribe(subject.clone()).await?;
tracing::info!("instance command handler listening on {subject}");
let cancel = agent.shutdown.clone();
loop {
tokio::select! {
msg = sub.next() => {
match msg {
Some(msg) => {
let agent = agent.clone();
let sup = sup.clone();
tokio::spawn(async move { handle(agent, sup, msg).await });
}
None => {
tracing::warn!("instance command subscription ended for '{}'", sup.instance_id);
break;
}
}
}
_ = cancel.cancelled() => {
tracing::info!("instance command handler stopping for '{}'", sup.instance_id);
break;
}
}
}
Ok(())
}
async fn handle(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>, msg: async_nats::Message) {
let Some(reply) = msg.reply.clone() else {
tracing::warn!("instance command without reply subject ignored");
return;
};
let response = match serde_json::from_slice::<InstanceCommand>(&msg.payload) {
Ok(cmd) => dispatch(&agent, &sup, &cmd).await,
Err(e) => json!({ "status": "error", "message": format!("invalid command payload: {e}") }),
};
let bytes = match serde_json::to_vec(&response) {
Ok(b) => b,
Err(e) => {
tracing::error!("response serialize failed: {e}");
return;
}
};
if let Err(e) = agent.nats.publish(reply, bytes.into()).await {
tracing::warn!("response publish failed: {e}");
}
}
async fn dispatch(
agent: &Arc<Agent>,
sup: &Arc<ProcessSupervisor>,
cmd: &InstanceCommand,
) -> serde_json::Value {
let func = cmd.func.as_str();
let outcome = match func {
"start" => sup.start().await.map(|_| "starting"),
"stop" => sup.stop().await.map(|_| "stopped"),
"restart" => sup.restart().await.map(|_| "restarted"),
"status" => {
return json!({
"status": "success",
"func": "status",
"instance_id": sup.instance_id,
"state": sup.state(),
"uptime_seconds": sup.uptime_seconds().await,
});
}
"rcon" => {
// Look up the InstanceConfig for this supervisor so we can access
// rcon settings and the game name without changing the supervisor's
// data model.
let inst_cfg = agent
.cfg
.instances
.iter()
.find(|i| i.id == sup.instance_id);
let rcon_cfg = inst_cfg.and_then(|i| i.rcon.as_ref());
let Some(rcon_cfg) = rcon_cfg else {
return json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"message": format!("instance '{}' has no rcon configured", sup.instance_id),
});
};
let Some(command) = cmd.command.as_deref() else {
return json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"message": "rcon func requires a 'command' field",
});
};
let game = inst_cfg.map(|i| i.game.as_str()).unwrap_or("rust");
return match crate::rcon::send_command(rcon_cfg, game, command).await {
Ok(output) => json!({
"status": "success",
"func": "rcon",
"instance_id": sup.instance_id,
"output": output,
}),
Err(e) => json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"message": format!("{e:#}"),
}),
};
}
other => {
return json!({
"status": "error",
"message": format!("unknown func '{other}' (supported: start, stop, restart, status, rcon)"),
});
}
};
match outcome {
Ok(result) => json!({
"status": "success",
"func": func,
"instance_id": sup.instance_id,
"result": result,
"state": sup.state(),
}),
Err(e) => json!({
"status": "error",
"func": func,
"instance_id": sup.instance_id,
"message": format!("{e:#}"),
}),
}
}

View File

@@ -0,0 +1,14 @@
//! Corrosion Host Agent library surface — modules are public so integration
//! tests can drive subsystems (notably the process supervisor) directly.
pub mod agent;
pub mod bus;
pub mod config;
pub mod hostcmd;
pub mod instancecmd;
pub mod prober;
pub mod process;
pub mod rcon;
pub mod subjects;
pub mod telemetry;
pub mod version;

View File

@@ -4,14 +4,9 @@
//! connectivity prober, host command channel. Process control, file ops, and
//! game adapters arrive in Phase 1+ (see PROTOCOL.md).
mod agent;
mod bus;
mod config;
mod hostcmd;
mod prober;
mod subjects;
mod telemetry;
mod version;
use corrosion_host_agent::{
agent, bus, config, hostcmd, instancecmd, prober, process, subjects, telemetry, version,
};
use anyhow::{Context, Result};
use clap::{Parser, Subcommand};
@@ -96,11 +91,18 @@ async fn run(settings: config::Settings) -> Result<()> {
let nats = bus::connect(&settings).await?;
let supervisors = settings
.instances
.iter()
.map(|inst| (inst.id.clone(), process::ProcessSupervisor::new(inst)))
.collect();
let agent = Arc::new(Agent {
cfg: settings,
nats,
started: Instant::now(),
last_probe: RwLock::new(None),
supervisors,
shutdown: CancellationToken::new(),
});
@@ -115,6 +117,21 @@ async fn run(settings: config::Settings) -> Result<()> {
}
}));
}
for sup in agent.supervisors.values() {
{
let agent = agent.clone();
let sup = sup.clone();
handles.push(tokio::spawn(async move {
if let Err(e) = instancecmd::run(agent, sup).await {
tracing::error!("instance command handler failed: {e:#}");
}
}));
}
handles.push(tokio::spawn(instancecmd::publish_state_changes(
agent.clone(),
sup.clone(),
)));
}
wait_for_shutdown_signal().await;
tracing::info!("shutdown signal received");

View File

@@ -0,0 +1,278 @@
//! Per-instance game-server process supervision.
//!
//! One `ProcessSupervisor` per process-managed instance. Lifecycle mirrors the
//! proven Go agent behavior — graceful SIGTERM with a 30s budget before force
//! kill, a monitor task that reaps the child and records crash-vs-stop — with
//! two fixes the Go version needed: args are a proper list (no naive space
//! splitting), and every state change is observable through a watch channel
//! so the panel gets push events instead of waiting for the next heartbeat.
use anyhow::{bail, Context, Result};
use serde::Serialize;
use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::process::{Child, Command};
use tokio::sync::{watch, Mutex};
use crate::config::InstanceConfig;
const GRACEFUL_STOP_BUDGET: Duration = Duration::from_secs(30);
const RESTART_PAUSE: Duration = Duration::from_secs(2);
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(rename_all = "snake_case", tag = "state")]
pub enum InstanceState {
/// Not process-managed (no executable configured).
Unmanaged,
Stopped,
Starting,
Running,
Stopping,
/// Process exited without a stop request.
Crashed {
#[serde(skip_serializing_if = "Option::is_none")]
exit_code: Option<i32>,
},
}
impl InstanceState {
pub fn as_label(&self) -> &'static str {
match self {
InstanceState::Unmanaged => "unmanaged",
InstanceState::Stopped => "stopped",
InstanceState::Starting => "starting",
InstanceState::Running => "running",
InstanceState::Stopping => "stopping",
InstanceState::Crashed { .. } => "crashed",
}
}
}
struct Inner {
child: Option<Child>,
started_at: Option<Instant>,
/// True while a stop was requested — the monitor uses it to distinguish
/// an ordered shutdown from a crash.
stop_requested: bool,
}
pub struct ProcessSupervisor {
pub instance_id: String,
executable: Option<PathBuf>,
args: Vec<String>,
working_dir: Option<PathBuf>,
inner: Mutex<Inner>,
state_tx: watch::Sender<InstanceState>,
}
impl ProcessSupervisor {
pub fn new(cfg: &InstanceConfig) -> Arc<Self> {
let executable = cfg.resolved_executable();
let initial = if executable.is_some() {
InstanceState::Stopped
} else {
InstanceState::Unmanaged
};
let (state_tx, _) = watch::channel(initial);
Arc::new(Self {
instance_id: cfg.id.clone(),
executable,
args: cfg.args.clone(),
working_dir: cfg.working_dir.clone(),
inner: Mutex::new(Inner {
child: None,
started_at: None,
stop_requested: false,
}),
state_tx,
})
}
pub fn state(&self) -> InstanceState {
self.state_tx.borrow().clone()
}
pub fn watch_state(&self) -> watch::Receiver<InstanceState> {
self.state_tx.subscribe()
}
pub async fn uptime_seconds(&self) -> u64 {
let inner = self.inner.lock().await;
match (&*self.state_tx.borrow(), inner.started_at) {
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
_ => 0,
}
}
pub async fn start(self: &Arc<Self>) -> Result<()> {
let Some(exe) = self.executable.clone() else {
bail!("instance '{}' has no executable configured", self.instance_id);
};
if !exe.exists() {
bail!("executable not found: {}", exe.display());
}
let mut inner = self.inner.lock().await;
if matches!(*self.state_tx.borrow(), InstanceState::Running | InstanceState::Starting) {
bail!("instance '{}' is already running", self.instance_id);
}
self.set_state(InstanceState::Starting);
let workdir = self
.working_dir
.clone()
.or_else(|| exe.parent().map(|p| p.to_path_buf()))
.unwrap_or_else(|| PathBuf::from("."));
let child = Command::new(&exe)
.args(&self.args)
.current_dir(&workdir)
.stdin(Stdio::null())
.stdout(Stdio::inherit())
.stderr(Stdio::inherit())
.spawn()
.with_context(|| format!("spawning {}", exe.display()))?;
let pid = child.id();
inner.child = Some(child);
inner.started_at = Some(Instant::now());
inner.stop_requested = false;
drop(inner);
self.set_state(InstanceState::Running);
tracing::info!(
"instance '{}' started: {} (pid {:?})",
self.instance_id,
exe.display(),
pid
);
// Monitor: reap the child and classify the exit.
let sup = Arc::clone(self);
tokio::spawn(async move { sup.monitor().await });
Ok(())
}
async fn monitor(self: Arc<Self>) {
// Take a waiter without holding the lock across the whole child
// lifetime: Child::wait needs &mut, so the child stays in inner and
// we poll it.
loop {
let status = {
let mut inner = self.inner.lock().await;
let Some(child) = inner.child.as_mut() else { return };
match child.try_wait() {
Ok(Some(status)) => Some(status),
Ok(None) => None,
Err(e) => {
tracing::error!("instance '{}' wait failed: {e}", self.instance_id);
return;
}
}
};
match status {
Some(status) => {
let mut inner = self.inner.lock().await;
inner.child = None;
inner.started_at = None;
let ordered = inner.stop_requested;
inner.stop_requested = false;
drop(inner);
if ordered {
self.set_state(InstanceState::Stopped);
tracing::info!("instance '{}' stopped ({status})", self.instance_id);
} else {
let exit_code = status.code();
self.set_state(InstanceState::Crashed { exit_code });
tracing::warn!(
"instance '{}' exited unexpectedly ({status}) — marked crashed",
self.instance_id
);
}
return;
}
None => tokio::time::sleep(Duration::from_millis(500)).await,
}
}
}
pub async fn stop(self: &Arc<Self>) -> Result<()> {
let mut inner = self.inner.lock().await;
if inner.child.is_none() {
bail!("instance '{}' is not running", self.instance_id);
}
inner.stop_requested = true;
self.set_state(InstanceState::Stopping);
let child = inner.child.as_mut().expect("checked above");
// Graceful first: SIGTERM on unix; Windows has no SIGTERM equivalent
// for console processes, so it goes straight to kill there.
#[cfg(unix)]
if let Some(pid) = child.id() {
unsafe {
libc::kill(pid as i32, libc::SIGTERM);
}
}
#[cfg(not(unix))]
{
let _ = child.start_kill();
}
drop(inner);
// Wait for the monitor to observe the exit; force kill on budget.
let mut rx = self.watch_state();
let deadline = tokio::time::timeout(GRACEFUL_STOP_BUDGET, async {
loop {
if matches!(*rx.borrow(), InstanceState::Stopped) {
return;
}
if rx.changed().await.is_err() {
return;
}
}
})
.await;
if deadline.is_err() {
tracing::warn!(
"instance '{}' ignored SIGTERM for {}s — force killing",
self.instance_id,
GRACEFUL_STOP_BUDGET.as_secs()
);
let mut inner = self.inner.lock().await;
if let Some(child) = inner.child.as_mut() {
let _ = child.start_kill();
}
drop(inner);
let mut rx = self.watch_state();
let _ = tokio::time::timeout(Duration::from_secs(5), async {
while !matches!(*rx.borrow(), InstanceState::Stopped) {
if rx.changed().await.is_err() {
break;
}
}
})
.await;
}
Ok(())
}
pub async fn restart(self: &Arc<Self>) -> Result<()> {
if !matches!(*self.state_tx.borrow(), InstanceState::Stopped | InstanceState::Crashed { .. } | InstanceState::Unmanaged) {
self.stop().await?;
}
tokio::time::sleep(RESTART_PAUSE).await;
self.start().await
}
fn set_state(&self, state: InstanceState) {
// send_replace never fails even with zero receivers.
let _ = self.state_tx.send_replace(state);
}
}

View File

@@ -0,0 +1,320 @@
//! RCON client: game-server remote-console over WebRCON (Rust) or Source RCON (Conan/Soulmask).
//!
//! The agent runs co-located with the game server, so every connection targets
//! 127.0.0.1 — no TLS is needed and latency is sub-millisecond. Two protocols
//! are supported because the Rust game ships its own WebSocket-based WebRCON
//! while Conan Exiles and Soulmask use the Valve Source RCON wire format over
//! plain TCP.
//!
//! The protocol selection is explicit in the config (`kind`) but can be inferred
//! from the game name when absent — callers supply the `game` field they already
//! have in `InstanceConfig`.
use anyhow::{bail, Context, Result};
use futures::{SinkExt, StreamExt};
use rand::Rng;
use serde::Deserialize;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;
use tokio::time::{timeout, Duration};
/// WebRCON is the Facepunch WebSocket protocol (Rust game).
/// Source RCON is the Valve wire protocol used by Conan Exiles and Soulmask.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum RconKind {
WebRcon,
Source,
}
#[derive(Debug, Clone, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct RconConfig {
/// Protocol override. When absent the kind is resolved from `game`.
#[serde(default)]
pub kind: Option<RconKind>,
pub port: u16,
pub password: String,
}
impl RconConfig {
/// Resolve the concrete protocol, falling back to a per-game default when
/// `kind` is not set. rust → WebRcon; conan + soulmask → Source.
pub fn resolved_kind(&self, game: &str) -> RconKind {
if let Some(k) = self.kind {
return k;
}
match game {
"conan" | "soulmask" => RconKind::Source,
// rust is the primary game; anything unknown defaults to WebRcon
// — operators can always override with an explicit `kind`.
_ => RconKind::WebRcon,
}
}
}
const CONNECT_TIMEOUT: Duration = Duration::from_secs(5);
const RESPONSE_TIMEOUT: Duration = Duration::from_secs(10);
/// Send `command` to the game server and return its text response.
///
/// The agent runs on the same host as the game server, so the target address
/// is always 127.0.0.1:{port}. Connection and response deadlines are fixed at
/// 5 s and 10 s respectively — enough headroom for a loaded server while still
/// catching hung connections quickly.
pub async fn send_command(cfg: &RconConfig, game: &str, command: &str) -> Result<String> {
match cfg.resolved_kind(game) {
RconKind::WebRcon => webrcon_exec(cfg, command).await,
RconKind::Source => source_rcon_exec(cfg, command).await,
}
}
// ---------------------------------------------------------------------------
// WebRCON (Rust game) — WebSocket JSON protocol
// ---------------------------------------------------------------------------
/// WebRCON request/response envelope. The server also emits chat/log frames
/// on this socket with Identifier == 0; those are skipped.
#[derive(serde::Serialize)]
struct WebRconRequest<'a> {
#[serde(rename = "Identifier")]
identifier: i32,
#[serde(rename = "Message")]
message: &'a str,
#[serde(rename = "Name")]
name: &'static str,
}
#[derive(serde::Deserialize)]
struct WebRconResponse {
#[serde(rename = "Identifier")]
identifier: i32,
#[serde(rename = "Message")]
message: String,
}
async fn webrcon_exec(cfg: &RconConfig, command: &str) -> Result<String> {
use tokio_tungstenite::connect_async;
use tokio_tungstenite::tungstenite::Message as WsMsg;
// The Rust game server embeds the password in the WebSocket URL path —
// never interpolate the real URL into errors or logs.
let url = format!("ws://127.0.0.1:{}/{}", cfg.port, cfg.password);
let redacted = format!("ws://127.0.0.1:{}/<redacted>", cfg.port);
// Wrap the entire connection + exchange in the connect timeout — we want
// the timeout to cover TCP handshake + WS upgrade, not just the send.
let (mut ws, _) = timeout(CONNECT_TIMEOUT, connect_async(&url))
.await
.context("connect timeout")?
.with_context(|| format!("WebRCON connect to {redacted}"))?;
// Use a random positive i32 so correlation is unambiguous even when
// multiple callers share a port (future concurrency).
let id: i32 = rand::thread_rng().gen_range(1..=i32::MAX);
let req = WebRconRequest { identifier: id, message: command, name: "Corrosion" };
let payload = serde_json::to_string(&req).context("serialize WebRCON request")?;
ws.send(WsMsg::Text(payload))
.await
.context("send WebRCON command")?;
tracing::debug!("WebRCON sent id={id} command={command:?}");
// Read frames until we see our Identifier — skip chat/log noise (id 0 or
// any other value that isn't ours).
let result = timeout(RESPONSE_TIMEOUT, async {
loop {
match ws.next().await {
Some(Ok(WsMsg::Text(text))) => {
match serde_json::from_str::<WebRconResponse>(&text) {
Ok(resp) if resp.identifier == id => return Ok(resp.message),
Ok(_) => {
// Not our response (chat, log, another caller's frame).
tracing::trace!("WebRCON skipping frame with different Identifier");
continue;
}
Err(e) => {
tracing::trace!("WebRCON non-JSON frame ignored: {e}");
continue;
}
}
}
Some(Ok(WsMsg::Close(_))) => bail!("WebRCON server closed connection"),
Some(Ok(_)) => continue, // binary/ping/pong — skip
Some(Err(e)) => return Err(anyhow::anyhow!(e).context("WebRCON read error")),
None => bail!("WebRCON stream ended without response"),
}
}
})
.await
.context("WebRCON response timeout")??;
// Close cleanly; a send error here is cosmetic — we already have our data.
let _ = ws.close(None).await;
Ok(result)
}
// ---------------------------------------------------------------------------
// Source RCON (Conan Exiles, Soulmask) — Valve TCP binary protocol
//
// Packet layout (all fields little-endian):
// i32 size — byte count of the remaining packet (id + type + body + 2 nulls)
// i32 id — caller-chosen correlation id; auth failure returns -1
// i32 type — 0=RESPONSE_VALUE, 2=EXECCOMMAND/AUTH_RESPONSE, 3=AUTH
// [u8] body — UTF-8 command or response text
// u8 0x00 — body null terminator
// u8 0x00 — padding null terminator
//
// Multi-packet handling: after sending the command we also send an empty
// RESPONSE_VALUE probe with a distinct id. We collect all RESPONSE_VALUE
// packets belonging to the command id and stop when we receive the probe's
// response. This is the standard technique specified in the Valve wiki.
// ---------------------------------------------------------------------------
const RCON_TYPE_AUTH: i32 = 3;
const RCON_TYPE_AUTH_RESPONSE: i32 = 2;
const RCON_TYPE_EXECCOMMAND: i32 = 2;
const RCON_TYPE_RESPONSE_VALUE: i32 = 0;
/// Maximum accumulated response body (guards against misbehaving servers).
const MAX_RESPONSE_BYTES: usize = 1024 * 1024; // 1 MiB
async fn source_rcon_exec(cfg: &RconConfig, command: &str) -> Result<String> {
let addr = format!("127.0.0.1:{}", cfg.port);
let stream = timeout(CONNECT_TIMEOUT, TcpStream::connect(&addr))
.await
.context("connect timeout")?
.with_context(|| format!("Source RCON connect to {addr}"))?;
let mut stream = stream;
// --- Auth ---
let auth_id: i32 = rand::thread_rng().gen_range(1..=i32::MAX);
send_packet(&mut stream, auth_id, RCON_TYPE_AUTH, cfg.password.as_bytes()).await?;
// The server sends two responses to AUTH: first an empty RESPONSE_VALUE,
// then an AUTH_RESPONSE. We skip the first and read until AUTH_RESPONSE.
timeout(RESPONSE_TIMEOUT, async {
loop {
let (id, ptype, _body) = recv_packet(&mut stream).await?;
if ptype == RCON_TYPE_AUTH_RESPONSE {
if id == -1 {
bail!("Source RCON auth failed: wrong password");
}
tracing::debug!("Source RCON authenticated (id={id})");
return Ok(());
}
// Skip the empty RESPONSE_VALUE that precedes AUTH_RESPONSE.
}
#[allow(unreachable_code)]
Ok::<(), anyhow::Error>(())
})
.await
.context("Source RCON auth timeout")??;
// --- Command ---
let cmd_id: i32 = rand::thread_rng().gen_range(1..=i32::MAX);
// Probe id must differ from cmd_id.
let probe_id: i32 = loop {
let id: i32 = rand::thread_rng().gen_range(1..=i32::MAX);
if id != cmd_id {
break id;
}
};
send_packet(&mut stream, cmd_id, RCON_TYPE_EXECCOMMAND, command.as_bytes()).await?;
// Empty RESPONSE_VALUE probe — the server echoes it after processing the
// preceding command, signalling end-of-response.
send_packet(&mut stream, probe_id, RCON_TYPE_RESPONSE_VALUE, b"").await?;
// Not every server is probe-conformant (Soulmask unverified): once we hold
// response data, a short per-read quiet period also terminates — never
// discard a response we already received just because the probe echo
// didn't come back.
const QUIET_PERIOD: Duration = Duration::from_millis(1500);
let response = timeout(RESPONSE_TIMEOUT, async {
let mut body_accum: Vec<u8> = Vec::new();
loop {
let next = if body_accum.is_empty() {
recv_packet(&mut stream).await.map(Some)
} else {
match timeout(QUIET_PERIOD, recv_packet(&mut stream)).await {
Ok(res) => res.map(Some),
Err(_elapsed) => Ok(None), // quiet after data — done
}
};
let Some((id, ptype, body)) = next? else {
break;
};
if ptype != RCON_TYPE_RESPONSE_VALUE {
continue; // unexpected packet type — skip
}
if id == probe_id {
// Probe echoed back — all command response packets have arrived.
break;
}
if id == cmd_id {
if body_accum.len() + body.len() > MAX_RESPONSE_BYTES {
bail!("Source RCON response exceeded {MAX_RESPONSE_BYTES} bytes");
}
body_accum.extend_from_slice(&body);
}
// Skip packets with other ids (shouldn't happen but be defensive).
}
Ok::<Vec<u8>, anyhow::Error>(body_accum)
})
.await
.context("Source RCON response timeout")??;
String::from_utf8(response).context("Source RCON response is not valid UTF-8")
}
/// Write a Source RCON packet to the stream.
async fn send_packet(stream: &mut TcpStream, id: i32, ptype: i32, body: &[u8]) -> Result<()> {
// size = id(4) + type(4) + body(n) + 2 null terminators
let size = (4 + 4 + body.len() + 2) as i32;
let mut buf: Vec<u8> = Vec::with_capacity(4 + size as usize);
buf.extend_from_slice(&size.to_le_bytes());
buf.extend_from_slice(&id.to_le_bytes());
buf.extend_from_slice(&ptype.to_le_bytes());
buf.extend_from_slice(body);
buf.push(0x00);
buf.push(0x00);
stream.write_all(&buf).await.context("Source RCON write")?;
Ok(())
}
/// Read one Source RCON packet; returns (id, type, body).
async fn recv_packet(stream: &mut TcpStream) -> Result<(i32, i32, Vec<u8>)> {
let mut size_buf = [0u8; 4];
stream
.read_exact(&mut size_buf)
.await
.context("Source RCON read size")?;
let size = i32::from_le_bytes(size_buf) as usize;
// Minimum packet: id(4) + type(4) + 2 null terminators = 10 bytes.
if size < 10 {
bail!("Source RCON: malformed packet (size={size})");
}
if size > MAX_RESPONSE_BYTES + 16 {
bail!("Source RCON: packet too large ({size} bytes)");
}
let mut payload = vec![0u8; size];
stream
.read_exact(&mut payload)
.await
.context("Source RCON read payload")?;
let id = i32::from_le_bytes(payload[0..4].try_into().unwrap());
let ptype = i32::from_le_bytes(payload[4..8].try_into().unwrap());
// Body is everything between the two fields and the two trailing nulls.
let body_end = size.saturating_sub(2); // strip 2 null terminators
let body = payload[8..body_end].to_vec();
Ok((id, ptype, body))
}

View File

@@ -17,14 +17,12 @@ pub fn host_going_offline(license: &str) -> String {
format!("corrosion.{license}.host.going_offline")
}
/// Phase 1: per-instance command channel (start/stop/restart/rcon/...).
#[allow(dead_code)]
/// Per-instance command channel (start/stop/restart/status; rcon et al. to come).
pub fn instance_cmd(license: &str, instance: &str) -> String {
format!("corrosion.{license}.{instance}.cmd")
}
/// Phase 1: per-instance state-change events.
#[allow(dead_code)]
/// Per-instance state-change events.
pub fn instance_status(license: &str, instance: &str) -> String {
format!("corrosion.{license}.{instance}.status")
}

View File

@@ -65,9 +65,10 @@ pub struct InstanceInfo {
pub game: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub label: Option<String>,
/// Phase 0 states: `configured` (root exists) or `missing_root`.
/// Phase 1 adds live process states (running/stopped/crashed).
/// Process-managed: running/stopped/starting/stopping/crashed.
/// Unmanaged (no executable configured): configured/missing_root.
pub state: String,
pub uptime_seconds: u64,
#[serde(skip_serializing_if = "Option::is_none")]
pub root_disk_free_mb: Option<u64>,
}
@@ -125,21 +126,30 @@ pub async fn collect(agent: &Agent, sys: &mut System) -> HeartbeatPayload {
})
.collect();
let instances = agent
.cfg
.instances
.iter()
.map(|inst| {
let exists = inst.root.exists();
InstanceInfo {
id: inst.id.clone(),
game: inst.game.clone(),
label: inst.label.clone(),
state: if exists { "configured" } else { "missing_root" }.to_string(),
root_disk_free_mb: disk_free_for_path(&disks, &inst.root),
let mut instances = Vec::with_capacity(agent.cfg.instances.len());
for inst in &agent.cfg.instances {
let (state, uptime_seconds) = match agent.supervisors.get(&inst.id) {
Some(sup) if !matches!(sup.state(), crate::process::InstanceState::Unmanaged) => {
(sup.state().as_label().to_string(), sup.uptime_seconds().await)
}
})
.collect();
_ => {
let exists = inst.root.exists();
(
if exists { "configured" } else { "missing_root" }.to_string(),
0,
)
}
};
instances.push(InstanceInfo {
id: inst.id.clone(),
game: inst.game.clone(),
label: inst.label.clone(),
state,
uptime_seconds,
root_disk_free_mb: disk_free_for_path(&disks, &inst.root),
});
}
let instances = instances;
HeartbeatPayload {
schema: 2,

View File

@@ -0,0 +1,353 @@
//! RCON integration tests using in-process mock servers.
//!
//! Real OS sockets on ephemeral ports — no mocking framework. Each test
//! binds a listener, spawns a task that speaks the expected protocol, then
//! exercises `rcon::send_command` and asserts on the result. Tests are
//! unix-only because the musl cross-compile target and the CI runner are both
//! Linux; the production use case is also Linux-only (game servers don't run
//! on macOS or Windows in production).
//!
//! We use `#[cfg(unix)]` to keep parity with the supervisor integration tests.
#![cfg(unix)]
use corrosion_host_agent::rcon::{RconConfig, RconKind};
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};
// ---------------------------------------------------------------------------
// Source RCON helpers — duplicate the wire-format encode/decode locally so
// the tests own the mock server without depending on the production code path.
// ---------------------------------------------------------------------------
/// Build a Source RCON packet: [size(4LE) | id(4LE) | type(4LE) | body | 0x00 0x00]
fn encode_packet(id: i32, ptype: i32, body: &[u8]) -> Vec<u8> {
let size = (4 + 4 + body.len() + 2) as i32;
let mut out = Vec::with_capacity(4 + size as usize);
out.extend_from_slice(&size.to_le_bytes());
out.extend_from_slice(&id.to_le_bytes());
out.extend_from_slice(&ptype.to_le_bytes());
out.extend_from_slice(body);
out.push(0x00);
out.push(0x00);
out
}
/// Read one Source RCON packet from a TcpStream.
async fn read_packet(stream: &mut TcpStream) -> (i32, i32, Vec<u8>) {
let mut size_buf = [0u8; 4];
stream.read_exact(&mut size_buf).await.unwrap();
let size = i32::from_le_bytes(size_buf) as usize;
let mut payload = vec![0u8; size];
stream.read_exact(&mut payload).await.unwrap();
let id = i32::from_le_bytes(payload[0..4].try_into().unwrap());
let ptype = i32::from_le_bytes(payload[4..8].try_into().unwrap());
let body_end = size.saturating_sub(2);
let body = payload[8..body_end].to_vec();
(id, ptype, body)
}
const SOURCE_TYPE_AUTH: i32 = 3;
const SOURCE_TYPE_AUTH_RESPONSE: i32 = 2;
const SOURCE_TYPE_EXECCOMMAND: i32 = 2;
const SOURCE_TYPE_RESPONSE_VALUE: i32 = 0;
// ---------------------------------------------------------------------------
// Mock Source RCON server
// ---------------------------------------------------------------------------
/// Run a Source RCON server that accepts password "goodpw", rejects others,
/// and responds to the first EXECCOMMAND with `response_body`.
///
/// If `split_at` is Some(n) the body is split: the first `n` bytes arrive in
/// one RESPONSE_VALUE packet and the remainder in a second — testing multi-
/// packet reassembly.
async fn run_source_mock(
mut stream: TcpStream,
accept_password: &str,
command_response: &[u8],
split_at: Option<usize>,
) {
// --- Auth phase ---
let (auth_id, ptype, body) = read_packet(&mut stream).await;
assert_eq!(ptype, SOURCE_TYPE_AUTH, "expected AUTH packet");
let password = String::from_utf8_lossy(&body);
if password != accept_password {
// Send empty RESPONSE_VALUE then AUTH_RESPONSE with id = -1 (failure).
let empty = encode_packet(auth_id, SOURCE_TYPE_RESPONSE_VALUE, b"");
stream.write_all(&empty).await.unwrap();
let fail = encode_packet(-1, SOURCE_TYPE_AUTH_RESPONSE, b"");
stream.write_all(&fail).await.unwrap();
return;
}
// Success: empty RESPONSE_VALUE then AUTH_RESPONSE with the auth id.
let empty = encode_packet(auth_id, SOURCE_TYPE_RESPONSE_VALUE, b"");
stream.write_all(&empty).await.unwrap();
let ok = encode_packet(auth_id, SOURCE_TYPE_AUTH_RESPONSE, b"");
stream.write_all(&ok).await.unwrap();
// --- Command phase ---
let (cmd_id, cmd_ptype, _cmd_body) = read_packet(&mut stream).await;
assert_eq!(cmd_ptype, SOURCE_TYPE_EXECCOMMAND, "expected EXECCOMMAND");
// Read the probe packet (empty RESPONSE_VALUE with a different id).
let (probe_id, probe_ptype, _) = read_packet(&mut stream).await;
assert_eq!(probe_ptype, SOURCE_TYPE_RESPONSE_VALUE, "expected probe packet");
// Send the command response, optionally split across two packets.
if let Some(n) = split_at {
let (part1, part2) = command_response.split_at(n.min(command_response.len()));
let p1 = encode_packet(cmd_id, SOURCE_TYPE_RESPONSE_VALUE, part1);
stream.write_all(&p1).await.unwrap();
let p2 = encode_packet(cmd_id, SOURCE_TYPE_RESPONSE_VALUE, part2);
stream.write_all(&p2).await.unwrap();
} else {
let p = encode_packet(cmd_id, SOURCE_TYPE_RESPONSE_VALUE, command_response);
stream.write_all(&p).await.unwrap();
}
// Echo the probe to signal end-of-response.
let probe_echo = encode_packet(probe_id, SOURCE_TYPE_RESPONSE_VALUE, b"");
stream.write_all(&probe_echo).await.unwrap();
}
// ---------------------------------------------------------------------------
// Source RCON tests
// ---------------------------------------------------------------------------
#[tokio::test]
async fn source_rcon_auth_and_exec_returns_response() {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let port = listener.local_addr().unwrap().port();
tokio::spawn(async move {
let (stream, _) = listener.accept().await.unwrap();
run_source_mock(stream, "goodpw", b"Hello from server", None).await;
});
let cfg = RconConfig { kind: Some(RconKind::Source), port, password: "goodpw".to_string() };
let result = corrosion_host_agent::rcon::send_command(&cfg, "conan", "status")
.await
.expect("command should succeed");
assert_eq!(result, "Hello from server");
}
#[tokio::test]
async fn source_rcon_wrong_password_returns_auth_error() {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let port = listener.local_addr().unwrap().port();
tokio::spawn(async move {
let (stream, _) = listener.accept().await.unwrap();
run_source_mock(stream, "goodpw", b"should not see this", None).await;
});
let cfg = RconConfig { kind: Some(RconKind::Source), port, password: "wrongpw".to_string() };
let err = corrosion_host_agent::rcon::send_command(&cfg, "conan", "status")
.await
.expect_err("wrong password should fail");
assert!(
err.to_string().to_lowercase().contains("auth"),
"error should mention auth failure, got: {err}"
);
}
#[tokio::test]
async fn source_rcon_multi_packet_response_concatenated() {
// Build a body large enough to split meaningfully across two packets.
// Use repeating ASCII so the result is valid UTF-8 and easy to verify.
// 200 'A's then 200 'B's = 400 bytes, split at 200.
let body: Vec<u8> = std::iter::repeat_n(b'A', 200)
.chain(std::iter::repeat_n(b'B', 200))
.collect();
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let port = listener.local_addr().unwrap().port();
let body_clone = body.clone();
tokio::spawn(async move {
let (stream, _) = listener.accept().await.unwrap();
run_source_mock(stream, "goodpw", &body_clone, Some(200)).await;
});
let cfg = RconConfig { kind: Some(RconKind::Source), port, password: "goodpw".to_string() };
let result = corrosion_host_agent::rcon::send_command(&cfg, "soulmask", "showplayers")
.await
.expect("multi-packet command should succeed");
let expected = String::from_utf8(body).unwrap();
assert_eq!(result, expected, "full body should be concatenated across both packets");
}
#[tokio::test]
async fn source_rcon_connect_timeout_to_unreachable_port() {
// Bind a listener but never accept — the connection will time out during
// the RCON auth phase because nothing is reading from the socket.
// We use a port that is bound (so TCP connect itself succeeds) but then
// the mock simply drops the stream, forcing a read error, which should
// surface as an error (not a panic or hang).
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let port = listener.local_addr().unwrap().port();
// Accept the TCP connection but immediately drop it — simulates a port
// that accepts but never speaks RCON.
tokio::spawn(async move {
let (_stream, _) = listener.accept().await.unwrap();
// _stream dropped here — EOF on the client's read
});
let cfg =
RconConfig { kind: Some(RconKind::Source), port, password: "goodpw".to_string() };
let err = corrosion_host_agent::rcon::send_command(&cfg, "conan", "status")
.await
.expect_err("closed connection should fail");
// We just need it to fail and not hang; error message varies by OS.
let _ = err;
}
// ---------------------------------------------------------------------------
// WebRCON mock server
// ---------------------------------------------------------------------------
/// Run a WebRCON mock: send one noise frame (Identifier 0), then respond to
/// the first real request with the given output.
async fn run_webrcon_mock(stream: tokio::net::TcpStream, output: &str) {
use futures::{SinkExt, StreamExt};
use tokio_tungstenite::accept_async;
use tokio_tungstenite::tungstenite::Message as WsMsg;
let mut ws = accept_async(stream).await.expect("WS handshake failed");
// Send noise (chat frame, Identifier 0) before the real request arrives.
let noise = serde_json::json!({
"Identifier": 0,
"Message": "Player X joined",
"Name": "Server",
"Type": "Chat"
});
ws.send(WsMsg::Text(noise.to_string()))
.await
.unwrap();
// Read the command request.
let msg = ws.next().await.unwrap().unwrap();
let text = match msg {
WsMsg::Text(t) => t,
other => panic!("expected Text frame, got {other:?}"),
};
let req: serde_json::Value = serde_json::from_str(&text).unwrap();
let req_id = req["Identifier"].as_i64().unwrap() as i32;
// Reply with the same Identifier so the client correlates correctly.
let reply = serde_json::json!({
"Identifier": req_id,
"Message": output,
"Type": "Generic",
});
ws.send(WsMsg::Text(reply.to_string())).await.unwrap();
}
// ---------------------------------------------------------------------------
// WebRCON tests
// ---------------------------------------------------------------------------
#[tokio::test]
async fn webrcon_skips_noise_and_returns_correct_message() {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let port = listener.local_addr().unwrap().port();
tokio::spawn(async move {
let (stream, _) = listener.accept().await.unwrap();
run_webrcon_mock(stream, "Players: 42/100").await;
});
// Password is embedded in the URL path — any non-empty string works with
// our mock.
let cfg = RconConfig {
kind: Some(RconKind::WebRcon),
port,
password: "testpw".to_string(),
};
let result = corrosion_host_agent::rcon::send_command(&cfg, "rust", "playercount")
.await
.expect("WebRCON command should succeed");
assert_eq!(result, "Players: 42/100");
}
// ---------------------------------------------------------------------------
// TOML parsing test — pins [[instance]] + [instance.rcon] sub-table syntax
// ---------------------------------------------------------------------------
#[test]
fn toml_instance_with_rcon_parses_correctly() {
let toml = r#"
[agent]
license_id = "test-license"
nats_url = "nats://localhost:4222"
[[instance]]
id = "rust-main"
game = "rust"
root = "/opt/rustserver"
[instance.rcon]
port = 28016
password = "secretpassword"
kind = "webrcon"
"#;
let cfg: corrosion_host_agent::config::ConfigFile =
toml::from_str(toml).expect("TOML should parse");
assert_eq!(cfg.instances.len(), 1);
let inst = &cfg.instances[0];
assert_eq!(inst.id, "rust-main");
let rcon = inst.rcon.as_ref().expect("rcon should be present");
assert_eq!(rcon.port, 28016);
assert_eq!(rcon.password, "secretpassword");
assert_eq!(rcon.kind, Some(corrosion_host_agent::rcon::RconKind::WebRcon));
}
#[test]
fn toml_instance_without_rcon_defaults_to_none() {
let toml = r#"
[agent]
license_id = "test-license"
nats_url = "nats://localhost:4222"
[[instance]]
id = "conan-main"
game = "conan"
root = "/opt/conan"
"#;
let cfg: corrosion_host_agent::config::ConfigFile =
toml::from_str(toml).expect("TOML should parse");
assert!(cfg.instances[0].rcon.is_none(), "absent rcon should be None");
}
#[test]
fn resolved_kind_infers_from_game_name() {
use corrosion_host_agent::rcon::{RconConfig, RconKind};
let cfg_no_kind = RconConfig { kind: None, port: 28016, password: "x".to_string() };
assert_eq!(cfg_no_kind.resolved_kind("rust"), RconKind::WebRcon);
assert_eq!(cfg_no_kind.resolved_kind("conan"), RconKind::Source);
assert_eq!(cfg_no_kind.resolved_kind("soulmask"), RconKind::Source);
assert_eq!(cfg_no_kind.resolved_kind("dune"), RconKind::WebRcon); // fallback
// Explicit kind always wins.
let cfg_source = RconConfig { kind: Some(RconKind::Source), ..cfg_no_kind.clone() };
assert_eq!(cfg_source.resolved_kind("rust"), RconKind::Source);
let cfg_webrcon = RconConfig { kind: Some(RconKind::WebRcon), ..cfg_no_kind };
assert_eq!(cfg_webrcon.resolved_kind("conan"), RconKind::WebRcon);
}

View File

@@ -0,0 +1,108 @@
//! Process supervisor integration tests using real OS processes.
//! Unix-only test doubles (/bin/sleep, /bin/sh) — the supervisor logic under
//! test is platform-shared; Windows-specific stop semantics get covered when
//! the Windows service work lands.
#![cfg(unix)]
use std::path::PathBuf;
use std::time::Duration;
use corrosion_host_agent::config::InstanceConfig;
use corrosion_host_agent::process::{InstanceState, ProcessSupervisor};
fn managed_instance(executable: &str, args: &[&str]) -> InstanceConfig {
InstanceConfig {
id: "test-instance".to_string(),
game: "rust".to_string(),
root: PathBuf::from("/tmp"),
label: None,
executable: Some(PathBuf::from(executable)),
args: args.iter().map(|s| s.to_string()).collect(),
working_dir: None,
rcon: None,
}
}
async fn wait_for_state(
sup: &std::sync::Arc<ProcessSupervisor>,
want: fn(&InstanceState) -> bool,
budget: Duration,
) -> InstanceState {
let deadline = tokio::time::Instant::now() + budget;
loop {
let state = sup.state();
if want(&state) {
return state;
}
if tokio::time::Instant::now() > deadline {
panic!("timed out waiting for state; last = {state:?}");
}
tokio::time::sleep(Duration::from_millis(100)).await;
}
}
#[tokio::test]
async fn start_status_stop_lifecycle() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
assert_eq!(sup.state(), InstanceState::Stopped);
sup.start().await.expect("start should succeed");
assert_eq!(sup.state(), InstanceState::Running);
tokio::time::sleep(Duration::from_millis(1100)).await;
assert!(sup.uptime_seconds().await >= 1, "uptime should advance");
// Double-start must be rejected while running.
assert!(sup.start().await.is_err(), "double start must fail");
sup.stop().await.expect("stop should succeed");
let state = wait_for_state(&sup, |s| matches!(s, InstanceState::Stopped), Duration::from_secs(5)).await;
assert_eq!(state, InstanceState::Stopped);
assert_eq!(sup.uptime_seconds().await, 0);
}
#[tokio::test]
async fn unexpected_exit_is_crashed_with_code() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "sleep 0.2; exit 7"]));
sup.start().await.expect("start should succeed");
let state = wait_for_state(
&sup,
|s| matches!(s, InstanceState::Crashed { .. }),
Duration::from_secs(5),
)
.await;
assert_eq!(state, InstanceState::Crashed { exit_code: Some(7) });
}
#[tokio::test]
async fn restart_from_crashed_recovers() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "exit 1"]));
sup.start().await.expect("start should succeed");
wait_for_state(&sup, |s| matches!(s, InstanceState::Crashed { .. }), Duration::from_secs(5)).await;
// Restart from crashed must work (panel "Restart" after a crash).
// Use a long-lived command this time by replacing the supervisor — the
// command is fixed per supervisor, so emulate via a fresh one.
let sup2 = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
sup2.restart().await.expect("restart from stopped should start");
assert_eq!(sup2.state(), InstanceState::Running);
sup2.stop().await.expect("cleanup stop");
}
#[tokio::test]
async fn unmanaged_instance_rejects_process_commands() {
let mut cfg = managed_instance("/bin/sleep", &["300"]);
cfg.executable = None;
let sup = ProcessSupervisor::new(&cfg);
assert_eq!(sup.state(), InstanceState::Unmanaged);
assert!(sup.start().await.is_err(), "unmanaged start must fail");
assert!(sup.stop().await.is_err(), "unmanaged stop must fail");
}
#[tokio::test]
async fn missing_executable_fails_cleanly() {
let sup = ProcessSupervisor::new(&managed_instance("/nonexistent/bin/gameserver", &[]));
let err = sup.start().await.expect_err("must fail");
assert!(err.to_string().contains("not found"), "error should say not found: {err}");
assert_eq!(sup.state(), InstanceState::Stopped, "failed start must not leave Starting state");
}

View File

@@ -87,7 +87,10 @@ services:
api:
condition: service_started
healthcheck:
test: ["CMD-SHELL", "wget -q --spider http://localhost:80/ || exit 1"]
# 127.0.0.1, not localhost: nginx listens IPv4-only (0.0.0.0:80) but
# `localhost` resolves to ::1 first inside the container → the probe hit
# nothing and reported unhealthy while the panel served fine on IPv4.
test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:80/ || exit 1"]
interval: 10s
timeout: 5s
retries: 3

View File

@@ -9,6 +9,9 @@
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="theme-color" content="#0a0a0a" />
<title>Corrosion Management</title>
<meta name="description" content="Management panel for self-hosted survival game servers — Rust, Dune: Awakening, Conan Exiles, Soulmask. Wipe automation, plugins, monitoring. Bring your own server." />
<meta property="og:title" content="Corrosion — Game Server Operations for Self-Hosted Communities" />
<meta property="og:description" content="Management panel for self-hosted survival game servers — Rust, Dune: Awakening, Conan Exiles, Soulmask. Wipe automation, plugins, monitoring. Bring your own server." />
<!-- Fonts via <link>, NOT a CSS @import — the bundler drops @import rules
that land mid-file after concatenation, silently shipping system fonts -->
<link rel="preconnect" href="https://fonts.googleapis.com" />

View File

@@ -1,7 +1,14 @@
<script setup lang="ts">
import { onMounted } from 'vue'
import { RouterView } from 'vue-router'
import ToastNotification from '@/components/ToastNotification.vue'
import ErrorBoundary from '@/components/ErrorBoundary.vue'
import { useAuthStore } from '@/stores/auth'
// Validate any persisted session against the API on boot — a stale token
// should bounce to login immediately, not after the first failed call.
const auth = useAuthStore()
onMounted(() => { void auth.validateSession() })
</script>
<template>

View File

@@ -1,11 +1,28 @@
import { createRouter, createWebHistory, type RouteRecordRaw } from 'vue-router'
import { useAuthStore } from '@/stores/auth'
// Extend vue-router's RouteMeta so title/description are typed throughout
declare module 'vue-router' {
interface RouteMeta {
title?: string
description?: string
requiresAuth?: boolean
guest?: boolean
superAdmin?: boolean
}
}
// ---------------------------------------------------------------------------
// Domain detection — runs once at module load
// Env-driven so www./staging hosts route correctly; an exact-match literal
// here once meant any non-canonical marketing host silently got the panel.
// ---------------------------------------------------------------------------
const hostname = typeof window !== 'undefined' ? window.location.hostname : ''
const isMarketingDomain = hostname === 'corrosionmgmt.com'
const marketingHosts = (import.meta.env.VITE_MARKETING_HOSTS ?? 'corrosionmgmt.com,www.corrosionmgmt.com')
.split(',')
.map((h: string) => h.trim().toLowerCase())
.filter(Boolean)
const isMarketingDomain = marketingHosts.includes(hostname.toLowerCase())
// ---------------------------------------------------------------------------
// Marketing page children — shared between both domain route sets
@@ -15,31 +32,55 @@ const marketingChildren: RouteRecordRaw[] = [
path: '',
name: 'landing',
component: () => import('@/views/marketing/LandingView.vue'),
meta: {
title: 'Corrosion — Game Server Operations for Self-Hosted Communities',
description: 'Management panel for self-hosted survival game servers — Rust, Dune: Awakening, Conan Exiles, Soulmask. Wipe automation, plugins, monitoring. Bring your own server.',
},
},
{
path: 'pricing',
name: 'pricing',
component: () => import('@/views/marketing/PricingView.vue'),
meta: {
title: 'Pricing — Corrosion',
description: 'Plans from $9.99/mo (Hobby, 15 servers) to Network ($99.99+/mo, 50+ servers). Non-commercial and commercial tiers. No hosting fees — bring your own server.',
},
},
{
path: 'how-it-works',
name: 'how-it-works',
component: () => import('@/views/marketing/HowItWorksView.vue'),
meta: {
title: 'How It Works — Corrosion',
description: 'Install one host agent on Windows or Linux. It connects outbound-only to Corrosion — no inbound ports, no SSH. Manage every game instance from the browser.',
},
},
{
path: 'faq',
name: 'faq',
component: () => import('@/views/marketing/FaqView.vue'),
meta: {
title: 'FAQ — Corrosion',
description: 'Honest answers: Corrosion is self-service (BYOS, no hosting). Support is docs + community; 1:1 at $125/hr. Supports Rust, Dune, Conan Exiles, Soulmask.',
},
},
{
path: 'roadmap',
name: 'roadmap',
component: () => import('@/views/marketing/RoadmapView.vue'),
meta: {
title: 'Roadmap — Corrosion',
description: 'Phase 1 shipped: core control plane, auto-wiper, plugin management. In progress: Dune, Conan, Soulmask multi-game blueprints. Planned: API access, integrations.',
},
},
{
path: 'early-access',
name: 'early-access',
component: () => import('@/views/marketing/EarlyAccessView.vue'),
meta: {
title: 'Early Access — Corrosion',
description: 'Join the early access list. Get full control plane access — wipe automation, plugin management, real-time console — and lock in launch pricing.',
},
},
]
@@ -53,25 +94,25 @@ const panelRoutes: RouteRecordRaw[] = [
path: '/login',
name: 'login',
component: () => import('@/views/auth/LoginView.vue'),
meta: { guest: true },
meta: { guest: true, title: 'Sign in — Corrosion' },
},
{
path: '/register',
name: 'register',
component: () => import('@/views/auth/RegisterView.vue'),
meta: { guest: true },
meta: { guest: true, title: 'Create account — Corrosion' },
},
{
path: '/forgot-password',
name: 'forgot-password',
component: () => import('@/views/auth/ForgotPasswordView.vue'),
meta: { guest: true },
meta: { guest: true, title: 'Reset password — Corrosion' },
},
{
path: '/setup',
name: 'setup-wizard',
component: () => import('@/views/auth/SetupWizardView.vue'),
meta: { requiresAuth: true },
meta: { requiresAuth: true, title: 'Setup — Corrosion' },
},
// Admin dashboard routes (with sidebar layout)
@@ -84,217 +125,254 @@ const panelRoutes: RouteRecordRaw[] = [
path: '',
name: 'dashboard',
component: () => import('@/views/admin/DashboardView.vue'),
meta: { title: 'Dashboard — Corrosion' },
},
{
path: 'server',
name: 'server',
component: () => import('@/views/admin/ServerView.vue'),
meta: { title: 'Server — Corrosion' },
},
{
path: 'console',
name: 'console',
component: () => import('@/views/admin/ConsoleView.vue'),
meta: { title: 'Console — Corrosion' },
},
{
path: 'players',
name: 'players',
component: () => import('@/views/admin/PlayersView.vue'),
meta: { title: 'Players — Corrosion' },
},
{
path: 'plugins',
name: 'plugins',
component: () => import('@/views/admin/PluginsView.vue'),
meta: { title: 'Plugins — Corrosion' },
},
{
path: 'files',
name: 'files',
component: () => import('@/views/admin/FileManagerView.vue'),
meta: { title: 'Files — Corrosion' },
},
{
path: 'plugin-configs',
name: 'plugin-configs',
component: () => import('@/views/admin/PluginConfigsView.vue'),
meta: { title: 'Plugin Configs — Corrosion' },
},
{
path: 'loot-builder',
name: 'loot-builder',
component: () => import('@/views/admin/LootBuilderView.vue'),
meta: { title: 'Loot Builder — Corrosion' },
},
{
path: 'teleport-config',
name: 'teleport-config',
component: () => import('@/views/admin/TeleportConfigView.vue'),
meta: { title: 'Teleport Config — Corrosion' },
},
{
path: 'gather-manager',
name: 'gather-manager',
component: () => import('@/views/admin/GatherManagerView.vue'),
meta: { title: 'Gather Manager — Corrosion' },
},
{
path: 'autodoors',
name: 'autodoors',
component: () => import('@/views/admin/AutoDoorsView.vue'),
meta: { title: 'Auto Doors — Corrosion' },
},
{
path: 'kits',
name: 'kits-config',
component: () => import('@/views/admin/KitsView.vue'),
meta: { title: 'Kits — Corrosion' },
},
{
path: 'furnace-splitter',
name: 'furnace-splitter',
component: () => import('@/views/admin/FurnaceSplitterView.vue'),
meta: { title: 'Furnace Splitter — Corrosion' },
},
{
path: 'better-chat',
name: 'better-chat',
component: () => import('@/views/admin/BetterChatView.vue'),
meta: { title: 'Better Chat — Corrosion' },
},
{
path: 'timed-execute',
name: 'timed-execute',
component: () => import('@/views/admin/TimedExecuteView.vue'),
meta: { title: 'Timed Execute — Corrosion' },
},
{
path: 'raidable-bases',
name: 'raidable-bases',
component: () => import('@/views/admin/RaidableBasesView.vue'),
meta: { title: 'Raidable Bases — Corrosion' },
},
{
path: 'wipes',
name: 'wipes',
component: () => import('@/views/admin/WipesView.vue'),
meta: { title: 'Wipes — Corrosion' },
},
{
path: 'wipes/profiles',
name: 'wipe-profiles',
component: () => import('@/views/admin/WipeProfilesView.vue'),
meta: { title: 'Wipe Profiles — Corrosion' },
},
{
path: 'wipes/calendar',
name: 'wipe-calendar',
component: () => import('@/views/admin/WipeCalendarView.vue'),
meta: { title: 'Wipe Calendar — Corrosion' },
},
{
path: 'wipes/history',
name: 'wipe-history',
component: () => import('@/views/admin/WipeHistoryView.vue'),
meta: { title: 'Wipe History — Corrosion' },
},
{
path: 'wipes/analytics',
name: 'wipe-analytics',
component: () => import('@/views/admin/WipeAnalyticsView.vue'),
meta: { title: 'Wipe Analytics — Corrosion' },
},
{
path: 'maps',
name: 'maps',
component: () => import('@/views/admin/MapsView.vue'),
meta: { title: 'Maps — Corrosion' },
},
{
path: 'maps/analytics',
name: 'map-analytics',
component: () => import('@/views/admin/MapAnalyticsView.vue'),
meta: { title: 'Map Analytics — Corrosion' },
},
{
path: 'chat',
name: 'chat',
component: () => import('@/views/admin/ChatLogView.vue'),
meta: { title: 'Chat Log — Corrosion' },
},
{
path: 'analytics',
name: 'analytics',
component: () => import('@/views/admin/AnalyticsView.vue'),
meta: { title: 'Analytics — Corrosion' },
},
{
path: 'retention',
name: 'retention',
component: () => import('@/views/admin/PlayerRetentionView.vue'),
meta: { title: 'Player Retention — Corrosion' },
},
{
path: 'notifications',
name: 'notifications',
component: () => import('@/views/admin/NotificationsView.vue'),
meta: { title: 'Notifications — Corrosion' },
},
{
path: 'team',
name: 'team',
component: () => import('@/views/admin/TeamView.vue'),
meta: { title: 'Team — Corrosion' },
},
{
path: 'store/config',
name: 'store-config',
component: () => import('@/views/admin/StoreConfigView.vue'),
meta: { title: 'Store Config — Corrosion' },
},
{
path: 'store/items',
name: 'store-items',
component: () => import('@/views/admin/StoreItemsView.vue'),
meta: { title: 'Store Items — Corrosion' },
},
{
path: 'store/revenue',
name: 'store-revenue',
component: () => import('@/views/admin/StoreRevenueView.vue'),
meta: { title: 'Store Revenue — Corrosion' },
},
{
path: 'modules',
name: 'modules',
component: () => import('@/views/admin/ModuleStoreView.vue'),
meta: { title: 'Modules — Corrosion' },
},
{
path: 'settings',
name: 'settings',
component: () => import('@/views/admin/SettingsView.vue'),
meta: { title: 'Settings — Corrosion' },
},
{
path: 'schedules',
name: 'schedules',
component: () => import('@/views/admin/SchedulesView.vue'),
meta: { title: 'Schedules — Corrosion' },
},
{
path: 'migration',
name: 'migration',
component: () => import('@/views/admin/MigrationView.vue'),
meta: { title: 'Migration — Corrosion' },
},
{
path: 'changelog',
name: 'changelog',
component: () => import('@/views/admin/ChangelogView.vue'),
meta: { title: 'Changelog — Corrosion' },
},
{
path: 'alerts',
name: 'alerts',
component: () => import('@/views/admin/AlertsView.vue'),
meta: { title: 'Alerts — Corrosion' },
},
// Platform Admin views (super-admin only)
{
path: 'admin',
name: 'platform-admin',
component: () => import('@/views/platform-admin/AdminDashboard.vue'),
meta: { superAdmin: true },
meta: { superAdmin: true, title: 'Admin — Corrosion' },
},
{
path: 'admin/licenses',
name: 'platform-licenses',
component: () => import('@/views/platform-admin/AdminLicenses.vue'),
meta: { superAdmin: true },
meta: { superAdmin: true, title: 'Admin: Licenses — Corrosion' },
},
{
path: 'admin/subscriptions',
name: 'platform-subscriptions',
component: () => import('@/views/platform-admin/AdminSubscriptions.vue'),
meta: { superAdmin: true },
meta: { superAdmin: true, title: 'Admin: Subscriptions — Corrosion' },
},
{
path: 'admin/users',
name: 'platform-users',
component: () => import('@/views/platform-admin/AdminUsers.vue'),
meta: { superAdmin: true },
meta: { superAdmin: true, title: 'Admin: Users — Corrosion' },
},
{
path: 'admin/servers',
name: 'platform-servers',
component: () => import('@/views/platform-admin/AdminServers.vue'),
meta: { superAdmin: true },
meta: { superAdmin: true, title: 'Admin: Servers — Corrosion' },
},
],
},
@@ -329,6 +407,7 @@ const panelRoutes: RouteRecordRaw[] = [
path: '/status',
name: 'status',
component: () => import('@/views/public/StatusPageView.vue'),
meta: { title: 'Status — Corrosion' },
},
// Catch-all
@@ -366,6 +445,7 @@ const marketingRoutes: RouteRecordRaw[] = [
path: '/status',
name: 'status',
component: () => import('@/views/public/StatusPageView.vue'),
meta: { title: 'Status — Corrosion' },
},
// Catch-all: unknown routes → landing page
@@ -383,6 +463,38 @@ const router = createRouter({
routes: isMarketingDomain ? marketingRoutes : panelRoutes,
})
// ---------------------------------------------------------------------------
// Document title + meta description/OG update on every navigation
// ---------------------------------------------------------------------------
function setOrClearMeta(selector: string, attr: string, value: string): void {
let el = document.querySelector<HTMLMetaElement>(selector)
if (!el) {
el = document.createElement('meta')
// Parse the selector to set the right attribute (name="..." or property="...")
const nameMatch = selector.match(/\[name="([^"]+)"\]/)
const propMatch = selector.match(/\[property="([^"]+)"\]/)
if (nameMatch?.[1]) el.setAttribute('name', nameMatch[1])
if (propMatch?.[1]) el.setAttribute('property', propMatch[1])
document.head.appendChild(el)
}
el.setAttribute(attr, value)
}
router.afterEach((to) => {
// Title
document.title = to.meta.title ?? 'Corrosion Management'
// Description
const desc = to.meta.description ?? ''
setOrClearMeta('meta[name="description"]', 'content', desc)
// OG title
setOrClearMeta('meta[property="og:title"]', 'content', to.meta.title ?? 'Corrosion Management')
// OG description
setOrClearMeta('meta[property="og:description"]', 'content', desc)
})
// Auth guard — only meaningful on panel domain (marketing has no requiresAuth routes)
router.beforeEach((to, _from, next) => {
const auth = useAuthStore()

View File

@@ -58,6 +58,27 @@ export const useAuthStore = defineStore('auth', () => {
permissions.value = {}
}
/**
* Validate the persisted session against the API on app boot. Without this,
* a stale/revoked token renders the full panel chrome and only collapses on
* the first real API call. useApi's 401 path (refresh → retry → logout)
* does the heavy lifting; any non-auth failure (network, 5xx) keeps the
* session — never log users out because the API blipped.
* Dynamic import avoids a static auth-store ↔ useApi module cycle.
*/
async function validateSession(): Promise<void> {
if (!accessToken.value) return
try {
const { useApi } = await import('@/composables/useApi')
const me = await useApi().get<Partial<User>>('/auth/me')
if (user.value && me && typeof me === 'object') {
user.value = { ...user.value, ...me }
}
} catch {
// 401 → refresh → logout/redirect already handled inside useApi.
}
}
function hasModule(moduleSlug: string): boolean {
return license.value?.modules_enabled?.includes(moduleSlug) ?? false
}
@@ -92,6 +113,7 @@ export const useAuthStore = defineStore('auth', () => {
setAuth,
setLicense,
logout,
validateSession,
hasModule,
hasPermission,
}