27 Commits

Author SHA1 Message Date
Vantz Stockwell
d13f2cb8b1 feat(host-agent): Phase 2 — Dune docker-compose adapter via Supervisor trait
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Failing after 35s
CI / integration (push) Has been skipped
Build Host Agent (Rust) / build (push) Successful in 1m45s
Introduce a Supervisor trait (async-trait) so the agent manages games with
different models behind one wire contract. ProcessSupervisor (spawned process:
rust/conan/soulmask) and the new DockerComposeSupervisor (dune) both impl it;
Agent.supervisors is now HashMap<String, Arc<dyn Supervisor>> and instancecmd
dispatch is game-agnostic — start/stop/restart/status identical across games,
selected by a per-game factory in main. InstanceState moved to the shared
supervisor module.

DockerComposeSupervisor drives docker-compose up-d / stop / restart against
the instance's compose project, with -f/-p/single-service support and a
configurable compose binary. New [instance.docker_compose] config block.
First cut = lifecycle + cached state; container crash-detection + restart
adoption deferred to Phase 3b (reconcilable with a compose ps probe).

Trait choice (dyn over enum) per Commander: scales to future planes (kubectl,
AMP/podman, SSH) as new struct+impl, no central match.

56 tests green (6 new docker-compose mock-binary tests + 5 refactored process
tests), zero warnings. Live verification pending a real Dune stack.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:33:00 -04:00
Vantz Stockwell
651a35d4be docs(reference): import Dune: Awakening server-manager references
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 22s
Phase 2 references for the host-agent Dune adapter, moved out of volatile /tmp
into docs/reference-repos/ (per Commander). Three upstream projects, .git +
node_modules + compiled binaries stripped (16MB source). Nested AI-instruction
files (.claude/, CLAUDE.md) removed so they don't pollute Corrosion sessions.

- icehunter/    dune-admin (Go+React) — 4 control planes; SETUP_DOCKER.md is the
                closest analog to our agent's Dune docker control plane (compose
                lifecycle, docker logs, RabbitMQ-via-exec, dune Postgres schema)
- adainrivers/  Rust/Tauri desktop — SSH+k8s BattleGroup control, maintenance
                daemon, in-game admin console (Rust idiom reference)
- the4rchangel/ Node web UI replacing battlegroup.bat — matches the Commander's
                Hyper-V self-host path + game-config schema

See docs/reference-repos/README.md for the full index + how we use each.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:08:05 -04:00
Vantz Stockwell
0715492ddf chore(panel): fleet-aware shell footer + drop dead vuefinder dep
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 14s
CI / agent-tests (push) Successful in 49s
CI / integration (push) Successful in 22s
COA-B cleanup:
- Sidebar agent-health footer now reads the fleet store (host count / online
  count / per-host status + last heartbeat) instead of the single legacy
  server.connection row, which disagreed with the multi-host fleet. Removed the
  legacy useServerStore dependency from the shell.
- Removed the unused 'vuefinder' dependency (replaced by the native file
  manager): dep + main.ts plugin/CSS registration. Main JS chunk 588kB -> 165kB.

Recon reclassified the 'dead cmd.server v1' item: it is the LIVE license-level
command path (module config applies, plugin install, schedules, legacy
start/stop) served only by the Go agent — a Rust-agent parity gap, not dead
code. Left intact.

Build-green (vue-tsc) + boots clean in-browser (0 console errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:04:09 -04:00
Vantz Stockwell
4ef5db5b0d feat(panel): drive active game from deployed fleet instances
All checks were successful
CI / backend-types (push) Successful in 8s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 40s
CI / integration (push) Successful in 23s
The shell skin / sidebar nav / dashboard terminology now follow the games
actually deployed (game_instances.game, agent-reported) instead of a
localStorage-only toggle. syncActiveGameFromFleet() derives: one game ->
auto-skin to it; zero/multiple -> 'all' neutral. A manual GameSwitcher pick
persists and overrides the heuristic. Wired into DashboardLayout via a watch
on the fleet store.

No schema change: a license's games are the distinct games of its instances
(the normalized source of truth) — deliberately not duplicating into a
licenses.game column that would drift (Lesson 20).

Build-green (vue-tsc) + boots clean in-browser (0 console errors, theming
initializes). Authenticated auto-derive confirms live on next instance deploy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:51:36 -04:00
Vantz Stockwell
bb71763714 docs: Lesson 28 — base64-encode multi-line CI secrets (minisign signing key)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 39s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:38:56 -04:00
Vantz Stockwell
f18b45e3f2 fix(ci): base64-decode minisign secret key (CI mangles multi-line); bump alpha.8
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m30s
CI / integration (push) Failing after 13s
Build Host Agent (Rust) / build (push) Successful in 1m45s
The 'Sign artifacts' step failed on alpha.7 with 'Error while loading the
secret key file' (exit 2): minisign downloaded and ran, but the reconstructed
key file was unparseable. A minisign secret key is two lines (comment + base64
blob); Gitea/act_runner secret storage mangles the embedded newline, collapsing
it to one line. Decode the secret as base64 (single-line, mangling-proof) with
auto-detect fallback to a raw two-line key. Fails loudly with the fix command
if the secret is neither form.

Requires re-storing MINISIGN_SECRET_KEY as: base64 < secret.key | tr -d '\n'

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:31:48 -04:00
Vantz Stockwell
702de24e28 fix(ci): fetch minisign static binary (not in bullseye apt); bump alpha.7
Some checks failed
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 43s
Build Host Agent (Rust) / build (push) Failing after 1m33s
CI / integration (push) Successful in 22s
alpha.6 signing failed: 'E: Unable to locate package minisign' —
minisign isn't packaged for node:20-bullseye. Download the official
static linux binary instead. Forward to alpha.7 (alpha.6 published
nothing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:18:08 -04:00
Vantz Stockwell
6b3e805ac2 feat(host-agent): Phase 3a signed self-update (minisign) + CI signing gate
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 1m27s
CI / integration (push) Successful in 21s
Build Host Agent (Rust) / build (push) Failing after 1m33s
Agent only ever runs a binary whose minisign signature verifies against
the EMBEDDED public key. NATS host.cmd func 'update' {url}: download
binary + .minisig from the CDN -> verify against embedded pubkey ->
atomic swap (.old rollback) -> relaunch. URL allowlist (https + cdn.
corrosionmgmt.com only, rejects userinfo-bypass), 100MiB cap. Closes the
supply-chain hole: even a malicious CDN upload can't run unsigned.

CI: build-host-agent.yml signs every artifact with MINISIGN_SECRET_KEY
(Gitea secret) and publishes .minisig alongside; the step FAILS the
build if the secret is absent (refuses to ship unsigned). Bumped to
alpha.6.

6 deterministic tests (accept valid / reject tampered+garbage+empty sig,
URL allowlist incl userinfo-bypass, atomic swap+rollback). Fixtures
signed with the real release key so tests need no key at runtime. Full
suite 50/50 green; musl + native build clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:00:36 -04:00
Vantz Stockwell
7c84912ff5 chore(frontend): bump version 1.0.0 -> 1.0.1
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 50s
CI / integration (push) Successful in 28s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:38:52 -04:00
Vantz Stockwell
355a53f6e3 feat(files): native instance-scoped file browser (replaces broken VueFinder)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
FileManagerView rewritten as a native DS browser on the per-instance
file bridge: instance selector, breadcrumb nav, dir-first listing
(name/size/modified), folder drill-down, inline file editor (read/save),
toolbar (new folder/file/refresh), per-row rename + delete-confirm.
New files store wraps the /instances/:id/files* endpoints. VueFinder
import + RemoteDriver fully removed — no more retired-protocol /api/files.
Honest empty (no instance -> Server page) + error (retry) states, never
the global error boundary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:31:01 -04:00
Vantz Stockwell
589516a021 feat(api): complete per-instance file op-set (delete/rename/mkdir/mkfile/move/copy)
All checks were successful
CI / backend-types (push) Successful in 8s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Successful in 54s
CI / integration (push) Successful in 25s
Rounds out the per-instance file bridge to the agent's full jailed file
manager so a real file browser can be built on it: POST :id/files/
{delete,rename,mkdir,mkfile,move,copy}, all via requestScoped (license-
scoped reply) on the new agent {op,path} protocol. files.manage. The
broken legacy VueFinder /api/files (retired Go fm_* protocol, wrong
subject, default _INBOX) is superseded by this — frontend rewrite next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:24:31 -04:00
Vantz Stockwell
f60e6abd33 feat(server): config file editor — read/edit/save a host config file per instance
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
The Server page's config-honesty note now leads somewhere real: a
Configuration file panel that loads a config file from the instance
(prefilled with the game's primaryConfigFile hint — server.cfg,
ServerSettings.ini, GameXishu.json), edits it in a mono textarea, and
saves it straight to the host through the jailed agent file bridge.
Not-found is handled gracefully (empty editor to create). Works across
games; gameProfiles gains primaryConfigFile per game.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:07:59 -04:00
Vantz Stockwell
877fadcb6c feat(api): per-instance file bridge — list/read/write via the new agent file manager
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
GET /api/instances/:id/files (list) + /file (read), PUT /file (write) —
tenant-guarded, routed through requestScoped to the per-instance
corrosion.{license}.{instance}.files.cmd using the new agent's {op,path}
protocol (jailed to the instance root, symlink-safe). files.view /
files.manage perms. Foundation for the per-game config editor and for
fixing the legacy VueFinder File Manager (which still speaks the retired
Go fm_* protocol on the wrong subject and is broken under per-license
auth — separate reconciliation).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:00:28 -04:00
Vantz Stockwell
e897a4802f fix(server): apply lifecycle reply state optimistically (heartbeat lag)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 21s
The agent reply is authoritative for the action just taken; the fleet
DB only updates on the next heartbeat (~10s), so the immediate refetch
read a stale state and reverted the UI (Start -> still Stopped). Now
apply the reply's state/uptime directly to the instance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:41:19 -04:00
Vantz Stockwell
c0b20f2f78 feat(server): instance-centric controls — real per-instance state + lifecycle
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 55s
CI / integration (push) Successful in 22s
The Server page now manages the selected GAME INSTANCE, not the legacy
host connection. New instances store flattens the fleet and drives the
command bridge. New 'Game instance' panel: real state badge
(running/stopped/crashed/configured), uptime, host, and an instance
selector when >1. Start/Stop/Restart/Refresh wired to POST
/api/instances/:id/lifecycle — gated on the actual instance state (not
host connectivity), with telemetry-only instances flagged. Works across
all four games (state + lifecycle are game-agnostic).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:37:53 -04:00
Vantz Stockwell
06e832fca1 feat(fleet): remove host — DELETE /api/fleet/hosts/:id + Fleet card action
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 36s
CI / integration (push) Successful in 21s
Self-service host removal. DELETE /api/fleet/hosts/:id (server.manage,
tenant-guarded): refuses while the host is 'connected' (409 — a live
agent re-registers on its next heartbeat, stop it first), deletes the
host's game_instances explicitly (FK is SET NULL, would otherwise
orphan them; instance_stats cascade), and clears the legacy
server_connections row if it was the license's last host. Fleet view:
offline host cards get a Remove button with inline confirm + toast.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:21:04 -04:00
Vantz Stockwell
009ceb86ad feat(server): real agent credentials + agent.toml setup; per-game config honesty
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 45s
CI / integration (push) Successful in 22s
Server page Host-agent panel now fetches GET /api/servers/agent-
credentials and renders the real agent.toml (license UUID, nats_user,
nats_password) instead of the broken LICENSE_ID=license_key env
commands that would never connect. Password masked by default with a
reveal toggle; copy-to-clipboard uses the real value. Setup commands
point at --config /etc/corrosion/agent.toml.

Configuration panel: World size / Current seed (Rust-only Facepunch
concepts) gated behind isRust; Conan/Soulmask/Dune get an honest note
pointing to the File Manager for their real config files instead of
fake Rust fields.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 13:23:47 -04:00
Vantz Stockwell
6f31c41dc3 feat(api): instance command bridge + agent credentials endpoint
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 21s
Backend layer wiring the panel to the host agent's per-instance command
channel (the unblocker for the Server-page rework):
- NatsService.requestScoped(): request-reply with a LICENSE-SCOPED reply
  subject (corrosion.{license}.reply.<id>) so per-license-scoped agents
  (no _INBOX permission) can actually reply — the design from the NATS
  auth work, now exercised.
- InstancesModule: POST /api/instances/:id/lifecycle {action} (start/
  stop/restart/status/steam_update, server.manage) and POST :id/rcon
  {command} (server.console). Tenant-guarded via game_instances.
- GET /api/servers/agent-credentials: derives the agent's NATS user/
  password (HMAC) so a customer can configure their agent — closes the
  post-auth setup gap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 13:05:22 -04:00
Vantz Stockwell
99433a09d1 docs(claude): Lesson 27 — lint infra config before deploy; compose up -d recreates changed deps
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:53:06 -04:00
Vantz Stockwell
b442ef4102 fix(api): consumer rejects malformed heartbeats with no host block (no phantom hosts)
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 41s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:49:53 -04:00
Vantz Stockwell
856106174a fix(nats): no_auth_user is top-level, not inside authorization{} — broke broker startup
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 22s
Caught during the live cutover: nats-server rejects 'unknown field
no_auth_user' when it is nested in the authorization block, taking the
whole broker down. Both the generator (open stage) and the committed
bootstrap default emitted it nested. Moved to top level. Enforce-stage
output was unaffected (no no_auth_user), which is what the live broker
now runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:47:14 -04:00
Vantz Stockwell
463908b18e fix(nats): security review — secure-by-default + per-tenant inbox isolation
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 23s
Two HIGH findings from automated review on the generator, both fixed:
1. Cross-tenant inbox access: per-license users were granted _INBOX.>,
   letting license A subscribe to license B's request-reply responses.
   Now scoped to corrosion.{license}.> ONLY; replies must ride the
   license namespace (corrosion.{license}.reply.<id>) — documented in
   PROTOCOL.md. Agent unchanged (responds to msg.reply); constraint is
   on the requester (internal user has full >).
2. Default-open auth bypass: generator defaulted to stage=open with a
   full-access anonymous user — a stale regen left the broker wide open.
   Now defaults to enforce (secure by default); the explicit 'open'
   migration stage maps anonymous to a harmless corrosion.unclaimed.>
   namespace, never real tenant subjects. Committed bootstrap default
   hardened the same way.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:39:31 -04:00
Vantz Stockwell
00cff51ce5 feat(nats): per-license auth mechanism — agent user/password, scoped broker, generator (non-breaking)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m23s
Build Host Agent (Rust) / build (push) Successful in 1m38s
CI / integration (push) Successful in 23s
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.

- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
  auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
  licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
  (verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
  host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf

Entirely non-breaking until secrets+config deployed; staged cutover next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:33:27 -04:00
Vantz Stockwell
7a07d600e7 feat(fleet): Phase B — fleet overview UI + GET /api/fleet read endpoint
Tenant-scoped fleet read: GET /api/fleet returns agent_hosts (host
metrics) each with their game_instances, plus a summary
(host/instance/online counts). FleetView lists host cards (status, CPU/
mem/disk/uptime/last-heartbeat) with their instances (game, state badge,
uptime); honest empty state -> Server page when no hosts. New 'Fleet'
sidebar nav item across all four game profiles, /fleet route. Store
follows the no-throw-on-fetch pattern (error state, never bricks). The
marketing hero made real from the live fleet tables.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:32:55 -04:00
Vantz Stockwell
4a4ae7a5d4 docs(claude): Lesson 26 — jail-at-entry doesn't jail the recursive walk (security review caught what my review missed)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 41s
CI / integration (push) Successful in 21s
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:04:23 -04:00
Vantz Stockwell
930f655bf5 feat(api): fleet data model Phase A — License -> Host -> Instance
All checks were successful
CI / backend-types (push) Successful in 14s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 42s
CI / integration (push) Successful in 22s
Migration 022 adds agent_hosts / game_instances / instance_clusters /
instance_stats (named agent_hosts to avoid the existing B2B hosts
table). HostAgentConsumerService now parses the full v2 heartbeat and
upserts an agent_hosts row (host metrics: cpu/mem/disk/agent version,
keyed by license_id+hostname until enrollment) plus one game_instances
row per heartbeat instance entry (state + uptime, the billing unit).
Legacy server_connections write retained so the current panel keeps
working — additive migration, nothing breaks. Staleness sweep + offline
beacon now flip agent_hosts too. cluster_id FK reserved for Soulmask/
Dune. Migration applied to live DB; tsc green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:00:52 -04:00
Vantz Stockwell
700dc2254d fix(host-agent): SECURITY — file manager copy/list no longer follow symlinks out of the jail
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m21s
Build Host Agent (Rust) / build (push) Successful in 1m34s
CI / integration (push) Has been cancelled
Automated security review (HIGH) caught a jail-escape my own review
missed: copy_recursive used fs::metadata (follows symlinks). A symlink
inside the jail pointing to e.g. /etc, then a 'copy' of its parent dir,
would dereference it and pull external content INTO the jail where it
could be read — a read-escape exfiltration. jail() validates only the
top-level src/dest; the recursive walk reintroduced the escape.

Fix: copy_recursive uses symlink_metadata and refuses any symlink
('symlinks are not followed across the jail boundary'). list() likewise
switched to symlink_metadata so it reports the link, never the
dereferenced target's size/type (info leak). Two regression tests added:
copy-symlink-exfil (asserts no external content lands inside) and
list-no-deref. 44/44 tests green. Rolled forward to alpha.4 (vulnerable
alpha.3 superseded).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:57:08 -04:00
1395 changed files with 243640 additions and 948 deletions

View File

@@ -67,6 +67,43 @@ jobs:
sha256sum corrosion-host-agent-windows-amd64.exe >> checksums.txt
cat checksums.txt
- name: Sign artifacts (minisign)
env:
MINISIGN_SECRET_KEY: ${{ secrets.MINISIGN_SECRET_KEY }}
run: |
if [ -z "$MINISIGN_SECRET_KEY" ]; then
echo "::error::MINISIGN_SECRET_KEY secret is not set — refusing to publish unsigned agent artifacts."
exit 1
fi
# minisign isn't packaged for bullseye — fetch the official static binary.
curl -sSL https://github.com/jedisct1/minisign/releases/download/0.12/minisign-0.12-linux.tar.gz -o /tmp/minisign.tgz
tar -xzf /tmp/minisign.tgz -C /tmp
MINISIGN="$(find /tmp -type f -name minisign -path '*linux*' | head -1)"
chmod +x "$MINISIGN"
"$MINISIGN" -v
# A minisign secret key file is TWO lines (comment + base64 blob). CI
# secret storage mangles embedded newlines, collapsing it to one line
# so minisign can't load it. Preferred form: store the secret
# base64-encoded (single line) — we decode it here. Auto-detect so a
# correctly-stored raw two-line key still works.
if printf '%s' "$MINISIGN_SECRET_KEY" | base64 -d 2>/dev/null | head -1 | grep -q "untrusted comment:"; then
printf '%s' "$MINISIGN_SECRET_KEY" | base64 -d > /tmp/sign.key
else
printf '%s\n' "$MINISIGN_SECRET_KEY" > /tmp/sign.key
fi
if ! head -1 /tmp/sign.key | grep -q "untrusted comment:"; then
echo "::error::MINISIGN_SECRET_KEY is neither base64 of a minisign key nor a raw two-line key file. Store it as: base64 < your-secret.key | tr -d '\n'"
rm -f /tmp/sign.key
exit 1
fi
cd corrosion-host-agent/bin
# Passwordless key (-W generated); feed empty stdin so it never blocks.
for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
"$MINISIGN" -S -s /tmp/sign.key -m "$f" -x "$f.minisig" < /dev/null
done
rm -f /tmp/sign.key
echo "signed: $(ls *.minisig)"
- name: Create Release
env:
RELEASE_TOKEN: ${{ secrets.RELEASE_TOKEN }}
@@ -82,7 +119,9 @@ jobs:
"${API_URL}/repos/${REPO}/releases")
RELEASE_ID=$(echo "$RESPONSE" | grep -o '"id":[0-9]*' | head -1 | grep -o '[0-9]*')
for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-linux-amd64.minisig \
corrosion-host-agent-windows-amd64.exe corrosion-host-agent-windows-amd64.exe.minisig \
checksums.txt checksums.txt.minisig; do
curl -s -X POST \
-H "Authorization: token ${RELEASE_TOKEN}" \
-H "Content-Type: application/octet-stream" \
@@ -95,7 +134,9 @@ jobs:
CDN_URL="https://cdn.corrosionmgmt.com"
VERSION="${{ steps.version.outputs.VERSION }}"
for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-linux-amd64.minisig \
corrosion-host-agent-windows-amd64.exe corrosion-host-agent-windows-amd64.exe.minisig \
checksums.txt checksums.txt.minisig; do
curl -s -X POST \
-F "file=@corrosion-host-agent/bin/$f" \
"${CDN_URL}/host-agent/alpha/$f"

View File

@@ -4,6 +4,35 @@ All notable changes to this project will be documented in this file.
## [Unreleased]
### Added (Host-agent Phase 2 — Dune docker-compose adapter — 2026-06-12)
**`Supervisor` trait abstraction (`corrosion-host-agent`):**
- Introduced `trait Supervisor` (via `async-trait`, the battle-tested ecosystem standard) so the agent can manage games with fundamentally different models behind one wire contract. `ProcessSupervisor` (spawned OS process — Rust/Conan/Soulmask) and the new `DockerComposeSupervisor` (Dune) both implement it; `Agent.supervisors` is now `HashMap<String, Arc<dyn Supervisor>>` and the instance command dispatch (`instancecmd::dispatch`) is fully game-agnostic — `start`/`stop`/`restart`/`status` are identical across games. A per-game factory in `main` selects the impl. `InstanceState` moved to the shared `supervisor` module.
- **Architecture call** (per Commander): chose the `dyn` trait over a zero-dependency enum because the Dune references point at *several* future management planes (kubectl, AMP/podman, SSH) — a trait makes each new plane "new struct + impl," no central match to edit.
**`DockerComposeSupervisor` (Dune: Awakening):**
- Drives `docker compose up -d` / `stop` / `restart` against the instance's compose project (a "battlegroup"), with `-f`/`-p`/single-service support and a configurable compose binary (`docker compose` default, `docker-compose` legacy). New `[instance.docker_compose]` config block (file/project/service/command, all optional). `steam_update` already rejected for Dune (Docker images, no SteamCMD).
- **Scope (first cut):** lifecycle + cached state. Deferred to Phase 3b (with process PID adoption): container crash-detection and state adoption on agent restart (both reconcilable with a `docker compose ps` probe).
- Verified: 6 new docker-compose tests (mock `docker` binary asserting exact invocations + state transitions + failure paths) + the 5 refactored process-supervisor tests; full agent suite 56 tests green, zero warnings. Live verification against a real Dune stack pending the Commander standing one up.
### Changed (Fleet-driven active game + signed-update CI fix — 2026-06-12)
**Frontend — active game follows the deployed fleet:**
- The panel's active game (shell skin + sidebar nav + dashboard terminology) is now **derived from the deployed instances** instead of a localStorage-only toggle. `syncActiveGameFromFleet()` reads the distinct `game` values of the license's instances (`game_instances.game`, reported by the host agent): exactly one game deployed → the shell auto-skins to it; zero or multiple → `all` (neutral house skin). Wired into `DashboardLayout` (the always-mounted admin shell) via a watch on the fleet store.
- A manual GameSwitcher pick still wins — it persists to `cc-active-game` and suppresses auto-derive (operator intent beats the heuristic). Un-overridden panels keep tracking the fleet across sessions.
- **No backend/schema change:** a license's game(s) are the distinct games of its instances — the normalized source of truth. Deliberately did NOT add a `licenses.game` column (would duplicate `game_instances.game` and drift; see Lesson 20).
**Frontend — sidebar agent-health footer is now fleet-aware:**
- The shell footer read a single legacy `server.connection` (one `server_connections` row), which disagreed with the multi-host fleet. Repointed it at the fleet store: one host → hostname + status + last-heartbeat; multiple → `{online}/{total} online` + total instance count. Tone aggregates (all online → healthy, some → degraded, none → offline). Dropped the legacy `useServerStore` dependency from the shell entirely.
**Frontend — removed dead `vuefinder` dependency:**
- VueFinder was replaced by the native instance-scoped file manager but the plugin (and its CSS) were still globally registered in `main.ts` and shipped in the bundle. Removed the dep + the three `main.ts` lines. Side effect: the main JS chunk dropped **588 kB → 165 kB** (vuefinder bundled an entire unused file-manager UI).
**Recon note (not a change):** `corrosion.{license}.cmd.server` was on the cleanup list as "dead v1" — it is NOT. It remains the live license-level command path for all plugin/module config applies, plugin install, scheduled tasks, and legacy start/stop/restart, served only by the legacy Go agent. The Rust agent does not implement it yet — this is a **parity/migration gap** (Phase 2+), not dead code. Left intact.
**CI — signed host-agent build:**
- Fixed the `Sign artifacts (minisign)` step (`Error while loading the secret key file`): a minisign secret key is two lines and CI secret storage mangles the embedded newline. The job now base64-decodes the secret (single-line, mangling-proof) with auto-detect fallback to a raw key. `MINISIGN_SECRET_KEY` must be stored as `base64 < secret.key | tr -d '\n'`. Verified end-to-end: `agent-v2.0.0-alpha.8` Linux + Windows binaries validate against the agent's embedded public key; tampered byte rejected.
### Added (Host-Agent v2 Consumer + SEO Meta — 2026-06-11)
**Backend (NestJS):**

View File

@@ -447,3 +447,9 @@ Things I discovered about myself building a sister platform across multiple sess
24. **`onModuleInit` runs before async `onModuleInit` of dependencies completes — register NATS/external subscriptions in `onApplicationBootstrap`.** `NatsService.onModuleInit` connects to NATS (async); `NatsBridgeService`/`HostAgentConsumerService` registered their subscriptions in their own `onModuleInit`, which fired while the connection was still null — so every `subscribe()` hit the `[OFFLINE]` no-op path and the WS bridge was dead-on-boot in *every* production build, silently. Nest guarantees `onApplicationBootstrap` runs only after all module init (including the awaited connect) finishes. Anything that depends on another provider's async startup belongs in bootstrap, not init. The tell: a subscription that "should be there" but the handler never fires and there's no error — trace the *startup ordering*, not the handler.
25. **Fixing a dead code path detonates the live code behind it — budget for the second bug.** The moment Lesson 24's fix made the NATS→WS bridge actually deliver events, the API crashed on the first forwarded heartbeat: `WebSocket.OPEN` was `undefined` at runtime because `esModuleInterop` is off, so `import WebSocket from 'ws'` compiled to `ws_1.default` (undefined). That crash had sat behind the dead bridge since the gateway was written — never hit because no event ever reached it. When you resurrect a path that was silently no-op, everything downstream of it is effectively *untested code running for the first time in production*. Verify the whole chain end-to-end (I watched the DB row appear, then flip offline), don't stop at "the subscription fires now." This is Lesson 10 with a fuse on it. Import-runtime gotcha worth remembering: when `esModuleInterop` is off, prefer instance constants (`client.OPEN`) over class statics (`WebSocket.OPEN`) for `ws`.
26. **A jail check at the entry point does not jail the recursive walk behind it — and my own "line-by-line" review missed it; the automated security review didn't.** The file manager's `jail()` correctly canonicalized and prefix-checked the top-level path, and I traced every escape vector through it and signed off. But `copy_recursive` then walked the directory tree with `fs::metadata` (which *follows* symlinks). A symlink planted inside the jail pointing at `/etc`, then a `copy` of its parent, would dereference it and pull external content *into* the jail to be read — a jail escape the entry check never sees, because the escape is reintroduced by a descendant during traversal. Fix: `symlink_metadata` (lstat) everywhere you recurse, and refuse/never-follow symlinks across the boundary. The transferable rule: **validate at the boundary AND at every step that re-derives a path** (recursion, `read_dir`, glob, archive extraction). And the humbling part — I was confident after reviewing the jail function; the security-review pass caught the HIGH I'd waved through. Trust adversarial verification over your own once-over on security-critical code, especially path/traversal logic.
27. **Validate infra config BEFORE it reaches a deploy — and know that `docker compose up -d <service>` will recreate other services whose definitions changed.** During the NATS auth cutover I ran `docker compose up -d api` to pick up new env. Because the *nats* service definition had also changed (a new volume mount), compose recreated **corrosion-nats too** — and it failed to start on a config error (`no_auth_user` nested inside `authorization{}` instead of at top level), taking the broker down for ~3 minutes with the backend in offline mode. Two lessons: (a) a broker/proxy/DB config file is code — lint it before it can reach a restart (`nats-server -t -c cfg` to test-parse, `nginx -t`, etc.), don't let the first validation be the production container's startup; (b) `compose up -d <one-service>` is not surgical — it reconciles that service's **dependencies** too, so a stale edit to a depended-on service ships when you didn't mean it to. When touching shared-infra config, restart that service explicitly and watch it come up before moving on. Recovery also surfaced a third gotcha: recreating a client (api) while its server (nats) is down leaves the client stuck on a cached DNS failure (`EAI_AGAIN`) — restart the client once the server is healthy.
28. **A multi-line secret in CI (minisign/SSH/PGP keys) must be stored base64-encoded — the runner mangles embedded newlines and the key silently fails to load.** The signed-update CI passed the toolchain build, downloaded minisign fine, then died at the sign step on `Error while loading the secret key file` (exit 2). The cause wasn't the key or minisign — a minisign secret key file is **two lines** (`untrusted comment:` + base64 blob), and Gitea/act_runner secret storage collapses the embedded newline so the reconstructed file is one unparseable line. The robust pattern: store the secret as `base64 < secret.key | tr -d '\n'` (single line, mangling-proof) and `base64 -d` it in the job, with auto-detect fallback so a correctly-stored raw key still works, and a loud `::error::` carrying the fix command if it's neither. This applies to **any** multi-line credential in CI, not just minisign. Two corollaries: (a) the tell is "the tool runs but can't load its key" — suspect newline-mangling before the key itself; (b) generating that base64 prints the **private key to the terminal/transcript** — for a supply-chain signing key, treat it as exposed and rotate before cutover (embed the new pubkey, re-store the new secret, retire the old). And verify the published artifact end-to-end against the *embedded* pubkey (`minisign -Vm bin -P <pub>`) plus a tampered-byte negative control — a green build that signs is not the same as a signature the agent will actually accept.

View File

@@ -45,6 +45,8 @@ import { BetterChatModule } from './modules/betterchat/betterchat.module';
import { TimedExecuteModule } from './modules/timedexecute/timedexecute.module';
import { RaidableBasesModule } from './modules/raidablebases/raidablebases.module';
import { EarlyAccessModule } from './modules/early-access/early-access.module';
import { FleetModule } from './modules/fleet/fleet.module';
import { InstancesModule } from './modules/instances/instances.module';
// Shared Services
import { NatsService } from './services/nats.service';
@@ -52,6 +54,8 @@ import { NatsBridgeService } from './services/nats-bridge.service';
import { HostAgentConsumerService } from './services/host-agent-consumer.service';
import { ServerConnection } from './entities/server-connection.entity';
import { License } from './entities/license.entity';
import { AgentHost } from './entities/agent-host.entity';
import { GameInstance } from './entities/game-instance.entity';
import { SteamService } from './services/steam.service';
// Gateway
@@ -95,7 +99,7 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
ScheduleModule.forRoot(),
// Repositories for app-level shared services (host-agent consumer)
TypeOrmModule.forFeature([ServerConnection, License]),
TypeOrmModule.forFeature([ServerConnection, License, AgentHost, GameInstance]),
// Feature Modules
AuthModule,
@@ -131,6 +135,8 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
TimedExecuteModule,
RaidableBasesModule,
EarlyAccessModule,
FleetModule,
InstancesModule,
],
providers: [
// Global guards (order matters: auth first, then license, then permissions)

View File

@@ -6,6 +6,15 @@ export default () => ({
},
nats: {
url: process.env.NATS_URL || 'nats://localhost:4222',
// Public broker address shown to agents in setup instructions.
publicUrl: process.env.NATS_PUBLIC_URL || 'nats://nats.corrosionmgmt.com:4222',
// Privileged internal credentials for the backend's own NATS connection
// (full corrosion.> access). Empty = anonymous (transition period).
internalUser: process.env.NATS_INTERNAL_USER || '',
internalPassword: process.env.NATS_INTERNAL_PASSWORD || '',
// Secret used to derive a per-license agent password:
// HMAC-SHA256(license_id, secret). Shared with the nats.conf generator.
tokenSecret: process.env.NATS_TOKEN_SECRET || '',
},
jwt: {
secret: process.env.JWT_SECRET || 'change-me',

View File

@@ -0,0 +1,74 @@
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn, Check, Unique } from 'typeorm';
import { License } from './license.entity';
export interface AgentHostDisk {
mount: string;
total_mb: number;
free_mb: number;
}
/**
* One Corrosion host agent / one machine. Owns the machine-level facts.
*
* NOTE: distinct from the B2B `hosts` table (hosting-partner companies). This
* is `agent_hosts` — the physical/virtual box a customer runs the agent on.
*/
@Entity('agent_hosts')
@Unique(['license_id', 'hostname'])
@Check(`"status" IN ('connected', 'degraded', 'offline')`)
export class AgentHost {
@PrimaryGeneratedColumn('uuid')
id: string;
@Column({ type: 'uuid' })
license_id: string;
@Column({ type: 'varchar', length: 255, default: '' })
hostname: string;
@Column({ type: 'varchar', length: 64, nullable: true })
agent_version: string | null;
@Column({ type: 'varchar', length: 64, nullable: true })
agent_commit: string | null;
@Column({ type: 'varchar', length: 32, nullable: true })
os: string | null;
@Column({ type: 'varchar', length: 32, nullable: true })
arch: string | null;
@Column({ type: 'varchar', length: 20, default: 'offline' })
status: string;
@Column({ type: 'timestamptz', nullable: true })
last_heartbeat_at: Date | null;
@Column({ type: 'double precision', nullable: true })
cpu_percent: number | null;
@Column({ type: 'integer', nullable: true })
cpu_cores: number | null;
@Column({ type: 'bigint', nullable: true })
mem_total_mb: number | null;
@Column({ type: 'bigint', nullable: true })
mem_used_mb: number | null;
@Column({ type: 'bigint', nullable: true })
uptime_seconds: number | null;
@Column({ type: 'jsonb', nullable: true })
disks: AgentHostDisk[] | null;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
created_at: Date;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
updated_at: Date;
@ManyToOne(() => License, { onDelete: 'CASCADE' })
@JoinColumn({ name: 'license_id' })
license: License;
}

View File

@@ -0,0 +1,59 @@
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn, Unique } from 'typeorm';
import { License } from './license.entity';
import { AgentHost } from './agent-host.entity';
/**
* One game server process / orchestrated unit (a Rust server, a Conan world,
* a Dune battlegroup). The billing unit — plans count instances.
* `agent_instance_id` is the agent's slug and the NATS subject segment.
*/
@Entity('game_instances')
@Unique(['license_id', 'agent_instance_id'])
export class GameInstance {
@PrimaryGeneratedColumn('uuid')
id: string;
@Column({ type: 'uuid' })
license_id: string;
@Column({ type: 'uuid', nullable: true })
host_id: string | null;
@Column({ type: 'uuid', nullable: true })
cluster_id: string | null;
@Column({ type: 'varchar', length: 64 })
agent_instance_id: string;
@Column({ type: 'varchar', length: 32 })
game: string;
@Column({ type: 'varchar', length: 255, nullable: true })
label: string | null;
@Column({ type: 'varchar', length: 32, default: 'unknown' })
state: string;
@Column({ type: 'text', nullable: true })
root_path: string | null;
@Column({ type: 'bigint', default: 0 })
uptime_seconds: number;
@Column({ type: 'timestamptz', nullable: true })
last_seen_at: Date | null;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
created_at: Date;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
updated_at: Date;
@ManyToOne(() => License, { onDelete: 'CASCADE' })
@JoinColumn({ name: 'license_id' })
license: License;
@ManyToOne(() => AgentHost, { onDelete: 'SET NULL', nullable: true })
@JoinColumn({ name: 'host_id' })
host: AgentHost | null;
}

View File

@@ -0,0 +1,38 @@
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn } from 'typeorm';
import { License } from './license.entity';
/**
* Optional grouping of instances for games with linked topologies:
* Soulmask main/child clusters, Dune BattleGroup → Sietches. Reserved now;
* cluster orchestration ships with those game adapters.
*/
@Entity('instance_clusters')
export class InstanceCluster {
@PrimaryGeneratedColumn('uuid')
id: string;
@Column({ type: 'uuid' })
license_id: string;
@Column({ type: 'varchar', length: 32 })
game: string;
@Column({ type: 'varchar', length: 255 })
name: string;
@Column({ type: 'varchar', length: 32, nullable: true })
topology: string | null;
@Column({ type: 'jsonb', nullable: true })
config: Record<string, unknown> | null;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
created_at: Date;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
updated_at: Date;
@ManyToOne(() => License, { onDelete: 'CASCADE' })
@JoinColumn({ name: 'license_id' })
license: License;
}

View File

@@ -0,0 +1,38 @@
import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn } from 'typeorm';
import { GameInstance } from './game-instance.entity';
/**
* Per-instance time-series game metrics (player count, FPS, …). Populated once
* game-level telemetry is collected via RCON/plugin — the host heartbeat
* carries host metrics, not game metrics, so this stays empty in Phase A.
*/
@Entity('instance_stats')
export class InstanceStats {
@PrimaryGeneratedColumn('uuid')
id: string;
@Column({ type: 'uuid' })
instance_id: string;
@Column({ type: 'uuid' })
license_id: string;
@Column({ type: 'integer', default: 0 })
player_count: number;
@Column({ type: 'integer', default: 0 })
max_players: number;
@Column({ type: 'double precision', default: 0 })
fps: number;
@Column({ type: 'integer', default: 0 })
memory_usage_mb: number;
@Column({ type: 'timestamptz', default: () => 'NOW()' })
recorded_at: Date;
@ManyToOne(() => GameInstance, { onDelete: 'CASCADE' })
@JoinColumn({ name: 'instance_id' })
instance: GameInstance;
}

View File

@@ -0,0 +1,26 @@
import { Controller, Get, Delete, Param } from '@nestjs/common';
import { ApiTags, ApiBearerAuth, ApiOperation } from '@nestjs/swagger';
import { FleetService } from './fleet.service';
import { CurrentTenant } from '../../common/decorators/current-tenant.decorator';
import { RequirePermission } from '../../common/decorators/require-permission.decorator';
@ApiTags('fleet')
@ApiBearerAuth()
@Controller('fleet')
export class FleetController {
constructor(private readonly fleetService: FleetService) {}
@Get()
@RequirePermission('server.view')
@ApiOperation({ summary: 'Get fleet overview — hosts and game instances for this license' })
async getFleet(@CurrentTenant() licenseId: string) {
return this.fleetService.getFleet(licenseId);
}
@Delete('hosts/:id')
@RequirePermission('server.manage')
@ApiOperation({ summary: 'Remove a host and its instances (host must be offline)' })
async deleteHost(@CurrentTenant() licenseId: string, @Param('id') id: string) {
return this.fleetService.deleteHost(licenseId, id);
}
}

View File

@@ -0,0 +1,15 @@
import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
import { FleetController } from './fleet.controller';
import { FleetService } from './fleet.service';
import { AgentHost } from '../../entities/agent-host.entity';
import { GameInstance } from '../../entities/game-instance.entity';
import { ServerConnection } from '../../entities/server-connection.entity';
@Module({
imports: [TypeOrmModule.forFeature([AgentHost, GameInstance, ServerConnection])],
controllers: [FleetController],
providers: [FleetService],
exports: [FleetService],
})
export class FleetModule {}

View File

@@ -0,0 +1,170 @@
import { Injectable, NotFoundException, ConflictException } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { AgentHost } from '../../entities/agent-host.entity';
import { GameInstance } from '../../entities/game-instance.entity';
import { ServerConnection } from '../../entities/server-connection.entity';
export interface FleetInstanceDto {
id: string;
agent_instance_id: string;
game: string;
label: string | null;
state: string;
uptime_seconds: number;
last_seen_at: string | null;
}
export interface FleetHostDto {
id: string;
hostname: string;
status: string;
agent_version: string | null;
os: string | null;
arch: string | null;
cpu_percent: number | null;
cpu_cores: number | null;
mem_total_mb: number | null;
mem_used_mb: number | null;
uptime_seconds: number | null;
disks: AgentHost['disks'];
last_heartbeat_at: string | null;
instances: FleetInstanceDto[];
}
export interface FleetSummaryDto {
host_count: number;
instance_count: number;
online_host_count: number;
}
export interface FleetResponseDto {
hosts: FleetHostDto[];
summary: FleetSummaryDto;
}
@Injectable()
export class FleetService {
constructor(
@InjectRepository(AgentHost)
private readonly hostRepo: Repository<AgentHost>,
@InjectRepository(GameInstance)
private readonly instanceRepo: Repository<GameInstance>,
@InjectRepository(ServerConnection)
private readonly connectionRepo: Repository<ServerConnection>,
) {}
/**
* Remove a host and its game instances from the fleet.
*
* Refuses while the host is `connected` — a live agent re-registers on its
* next heartbeat, so the operator must stop the agent first. Deletes the
* host's instances explicitly (the FK is SET NULL, which would otherwise
* orphan them); instance_stats cascade. If this was the license's last host,
* the legacy single-server connection row is cleared too so the old
* Dashboard doesn't show a stale server.
*/
async deleteHost(
licenseId: string,
hostId: string,
): Promise<{ deleted: true; instances_removed: number }> {
const host = await this.hostRepo.findOne({ where: { id: hostId, license_id: licenseId } });
if (!host) throw new NotFoundException('Host not found');
if (host.status === 'connected') {
throw new ConflictException(
'Host is online — stop the agent first, or it will re-register on its next heartbeat',
);
}
const del = await this.instanceRepo.delete({ license_id: licenseId, host_id: hostId });
await this.hostRepo.delete({ id: hostId, license_id: licenseId });
const remaining = await this.hostRepo.count({ where: { license_id: licenseId } });
if (remaining === 0) {
await this.connectionRepo.delete({ license_id: licenseId });
}
return { deleted: true, instances_removed: del.affected ?? 0 };
}
async getFleet(licenseId: string): Promise<FleetResponseDto> {
const [hosts, instances] = await Promise.all([
this.hostRepo.find({
where: { license_id: licenseId },
order: { hostname: 'ASC' },
}),
this.instanceRepo.find({
where: { license_id: licenseId },
order: { game: 'ASC', label: 'ASC' },
}),
]);
// Group instances by host_id. Bigint columns come back as strings from pg — coerce.
const instancesByHost = new Map<string | null, FleetInstanceDto[]>();
for (const inst of instances) {
const key = inst.host_id ?? null;
if (!instancesByHost.has(key)) {
instancesByHost.set(key, []);
}
instancesByHost.get(key)!.push({
id: inst.id,
agent_instance_id: inst.agent_instance_id,
game: inst.game,
label: inst.label,
state: inst.state,
uptime_seconds: Number(inst.uptime_seconds),
last_seen_at: inst.last_seen_at ? inst.last_seen_at.toISOString() : null,
});
}
const hostDtos: FleetHostDto[] = hosts.map((h) => ({
id: h.id,
hostname: h.hostname,
status: h.status,
agent_version: h.agent_version,
os: h.os,
arch: h.arch,
cpu_percent: h.cpu_percent !== null && h.cpu_percent !== undefined ? Number(h.cpu_percent) : null,
cpu_cores: h.cpu_cores !== null && h.cpu_cores !== undefined ? Number(h.cpu_cores) : null,
mem_total_mb: h.mem_total_mb !== null && h.mem_total_mb !== undefined ? Number(h.mem_total_mb) : null,
mem_used_mb: h.mem_used_mb !== null && h.mem_used_mb !== undefined ? Number(h.mem_used_mb) : null,
uptime_seconds: h.uptime_seconds !== null && h.uptime_seconds !== undefined ? Number(h.uptime_seconds) : null,
disks: h.disks,
last_heartbeat_at: h.last_heartbeat_at ? h.last_heartbeat_at.toISOString() : null,
instances: instancesByHost.get(h.id) ?? [],
}));
// Append synthetic "unassigned" bucket only if orphaned instances exist
const unassigned = instancesByHost.get(null) ?? [];
if (unassigned.length > 0) {
hostDtos.push({
id: '__unassigned__',
hostname: 'Unassigned',
status: 'offline',
agent_version: null,
os: null,
arch: null,
cpu_percent: null,
cpu_cores: null,
mem_total_mb: null,
mem_used_mb: null,
uptime_seconds: null,
disks: null,
last_heartbeat_at: null,
instances: unassigned,
});
}
const online_host_count = hosts.filter((h) => h.status === 'connected').length;
const instance_count = instances.length;
return {
hosts: hostDtos,
summary: {
host_count: hosts.length,
instance_count,
online_host_count,
},
};
}
}

View File

@@ -0,0 +1,133 @@
import { Controller, Post, Get, Put, Body, Param, Query } from '@nestjs/common';
import { ApiTags, ApiBearerAuth, ApiOperation } from '@nestjs/swagger';
import { CurrentTenant } from '../../common/decorators/current-tenant.decorator';
import { RequirePermission } from '../../common/decorators/require-permission.decorator';
import { InstancesService, LifecycleFunc } from './instances.service';
@ApiTags('instances')
@ApiBearerAuth()
@Controller('instances')
export class InstancesController {
constructor(private readonly instances: InstancesService) {}
@Post(':id/lifecycle')
@RequirePermission('server.manage')
@ApiOperation({ summary: 'Send a lifecycle command to a game instance (start/stop/restart/status/steam_update)' })
async lifecycle(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { action: LifecycleFunc },
) {
return this.instances.lifecycle(licenseId, id, body.action);
}
@Post(':id/rcon')
@RequirePermission('server.console')
@ApiOperation({ summary: 'Send an RCON/console command to a game instance' })
async rcon(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { command: string },
) {
return this.instances.rcon(licenseId, id, body.command);
}
@Get(':id/files')
@RequirePermission('files.view')
@ApiOperation({ summary: 'List a directory in the instance (jailed to its root)' })
async listFiles(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Query('path') path?: string,
) {
return this.instances.listFiles(licenseId, id, path ?? '');
}
@Get(':id/file')
@RequirePermission('files.view')
@ApiOperation({ summary: 'Read a text file from the instance (jailed, 5 MiB cap)' })
async readFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Query('path') path: string,
) {
return this.instances.readFile(licenseId, id, path);
}
@Put(':id/file')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Write a text file in the instance (jailed)' })
async writeFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string; content: string },
) {
return this.instances.writeFile(licenseId, id, body.path, body.content ?? '');
}
@Post(':id/files/delete')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Delete a file or directory (jailed)' })
async deleteFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string },
) {
return this.instances.deleteFile(licenseId, id, body.path);
}
@Post(':id/files/rename')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Rename a file/directory within its parent (jailed)' })
async renameFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string; name: string },
) {
return this.instances.renameFile(licenseId, id, body.path, body.name);
}
@Post(':id/files/mkdir')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Create a directory (jailed)' })
async mkdir(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string },
) {
return this.instances.mkdir(licenseId, id, body.path);
}
@Post(':id/files/mkfile')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Create an empty file (jailed)' })
async mkfile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string },
) {
return this.instances.mkfile(licenseId, id, body.path);
}
@Post(':id/files/move')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Move a file/directory (jailed)' })
async moveFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string; dest: string },
) {
return this.instances.moveFile(licenseId, id, body.path, body.dest);
}
@Post(':id/files/copy')
@RequirePermission('files.manage')
@ApiOperation({ summary: 'Copy a file/directory (jailed)' })
async copyFile(
@CurrentTenant() licenseId: string,
@Param('id') id: string,
@Body() body: { path: string; dest: string },
) {
return this.instances.copyFile(licenseId, id, body.path, body.dest);
}
}

View File

@@ -0,0 +1,13 @@
import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
import { InstancesController } from './instances.controller';
import { InstancesService } from './instances.service';
import { GameInstance } from '../../entities/game-instance.entity';
import { NatsService } from '../../services/nats.service';
@Module({
imports: [TypeOrmModule.forFeature([GameInstance])],
controllers: [InstancesController],
providers: [InstancesService, NatsService],
})
export class InstancesModule {}

View File

@@ -0,0 +1,145 @@
import { Injectable, NotFoundException, BadRequestException, Logger } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { NatsService } from '../../services/nats.service';
import { GameInstance } from '../../entities/game-instance.entity';
/** Lifecycle funcs the agent's {instance}.cmd handler accepts. */
const LIFECYCLE_FUNCS = ['start', 'stop', 'restart', 'status', 'steam_update'] as const;
export type LifecycleFunc = (typeof LIFECYCLE_FUNCS)[number];
@Injectable()
export class InstancesService {
private readonly logger = new Logger(InstancesService.name);
constructor(
private readonly nats: NatsService,
@InjectRepository(GameInstance)
private readonly instanceRepo: Repository<GameInstance>,
) {}
/** Resolve an instance the caller's license actually owns (tenant guard). */
private async resolveInstance(licenseId: string, instanceId: string): Promise<GameInstance> {
const inst = await this.instanceRepo.findOne({
where: { id: instanceId, license_id: licenseId },
});
if (!inst) throw new NotFoundException('Instance not found');
return inst;
}
async lifecycle(licenseId: string, instanceId: string, func: LifecycleFunc): Promise<unknown> {
if (!LIFECYCLE_FUNCS.includes(func)) {
throw new BadRequestException(`Unsupported action '${func}'`);
}
const inst = await this.resolveInstance(licenseId, instanceId);
const subject = `corrosion.${licenseId}.${inst.agent_instance_id}.cmd`;
this.logger.log(`instance ${inst.agent_instance_id}: ${func}`);
return this.nats.requestScoped(licenseId, subject, { func });
}
async rcon(licenseId: string, instanceId: string, command: string): Promise<unknown> {
if (!command || !command.trim()) {
throw new BadRequestException('command is required');
}
const inst = await this.resolveInstance(licenseId, instanceId);
const subject = `corrosion.${licenseId}.${inst.agent_instance_id}.cmd`;
// RCON can take longer than a lifecycle ack — give it more headroom.
return this.nats.requestScoped(licenseId, subject, { func: 'rcon', command }, 12_000);
}
// -------------------------------------------------------------------------
// File access — jailed to the instance root by the agent's file manager.
// The agent protocol (corrosion-host-agent/src/filemanager.rs):
// { op: list|read|write|delete|rename|mkdir|mkfile|move|copy, path, ... }
// reply: { status: 'success'|'error', data?, message? }
// -------------------------------------------------------------------------
private filesSubject(inst: GameInstance, licenseId: string): string {
return `corrosion.${licenseId}.${inst.agent_instance_id}.files.cmd`;
}
private async fileOp(
licenseId: string,
instanceId: string,
payload: Record<string, unknown>,
): Promise<{ status: string; data?: unknown; message?: string }> {
const inst = await this.resolveInstance(licenseId, instanceId);
const res = await this.nats.requestScoped<{ status: string; data?: unknown; message?: string }>(
licenseId,
this.filesSubject(inst, licenseId),
payload,
12_000,
);
if (res?.status === 'error') {
throw new BadRequestException(res.message ?? 'File operation failed');
}
return res;
}
async listFiles(licenseId: string, instanceId: string, path = ''): Promise<unknown> {
const res = await this.fileOp(licenseId, instanceId, { op: 'list', path });
return res.data;
}
async readFile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
if (!path) throw new BadRequestException('path is required');
const res = await this.fileOp(licenseId, instanceId, { op: 'read', path });
return res.data;
}
async writeFile(
licenseId: string,
instanceId: string,
path: string,
content: string,
): Promise<unknown> {
if (!path) throw new BadRequestException('path is required');
const res = await this.fileOp(licenseId, instanceId, { op: 'write', path, content });
return res.data ?? { status: 'success' };
}
async deleteFile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
if (!path) throw new BadRequestException('path is required');
return (await this.fileOp(licenseId, instanceId, { op: 'delete', path })).data ?? { ok: true };
}
async renameFile(
licenseId: string,
instanceId: string,
path: string,
name: string,
): Promise<unknown> {
if (!path || !name) throw new BadRequestException('path and name are required');
return (await this.fileOp(licenseId, instanceId, { op: 'rename', path, name })).data ?? { ok: true };
}
async mkdir(licenseId: string, instanceId: string, path: string): Promise<unknown> {
if (!path) throw new BadRequestException('path is required');
return (await this.fileOp(licenseId, instanceId, { op: 'mkdir', path })).data ?? { ok: true };
}
async mkfile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
if (!path) throw new BadRequestException('path is required');
return (await this.fileOp(licenseId, instanceId, { op: 'mkfile', path })).data ?? { ok: true };
}
async moveFile(
licenseId: string,
instanceId: string,
path: string,
dest: string,
): Promise<unknown> {
if (!path || !dest) throw new BadRequestException('path and dest are required');
return (await this.fileOp(licenseId, instanceId, { op: 'move', path, dest })).data ?? { ok: true };
}
async copyFile(
licenseId: string,
instanceId: string,
path: string,
dest: string,
): Promise<unknown> {
if (!path || !dest) throw new BadRequestException('path and dest are required');
return (await this.fileOp(licenseId, instanceId, { op: 'copy', path, dest })).data ?? { ok: true };
}
}

View File

@@ -23,6 +23,13 @@ export class ServersController {
return await this.serversService.getServer(licenseId);
}
@Get('agent-credentials')
@RequirePermission('server.manage')
@ApiOperation({ summary: 'NATS credentials for this license\'s host agent' })
async getAgentCredentials(@CurrentTenant() licenseId: string) {
return await this.serversService.getAgentCredentials(licenseId);
}
@Put('config')
@RequirePermission('server.manage')
@ApiOperation({ summary: 'Update server configuration' })

View File

@@ -19,6 +19,15 @@ export class ServersService {
private readonly natsService: NatsService,
) {}
/**
* NATS credentials the customer puts in their host agent's config so it can
* authenticate to the per-license-scoped broker. Returns null if the broker
* isn't enforcing auth yet (NATS_TOKEN_SECRET unset).
*/
async getAgentCredentials(licenseId: string) {
return this.natsService.getAgentCredentials(licenseId);
}
/**
* Get server connection and config for a license.
* Returns null fields if no server has been set up yet.

View File

@@ -5,30 +5,53 @@ import { Repository } from 'typeorm';
import { NatsService } from './nats.service';
import { ServerConnection } from '../entities/server-connection.entity';
import { License } from '../entities/license.entity';
import { AgentHost, AgentHostDisk } from '../entities/agent-host.entity';
import { GameInstance } from '../entities/game-instance.entity';
/**
* Consumes Corrosion wire protocol v2 host-agent subjects
* (corrosion-host-agent/PROTOCOL.md) and keeps server_connections truthful.
* (corrosion-host-agent/PROTOCOL.md) and keeps the fleet model truthful.
*
* Before this service existed, NOTHING persisted agent heartbeats:
* companion_last_seen was written once at setup and connection_status stayed
* 'connected' forever. Now: heartbeat -> last_seen + connected (row
* auto-created on first contact), going_offline beacon -> offline, and a
* staleness sweep marks hosts offline when heartbeats stop arriving.
* Writes the License → Host → Instance model (hosts + game_instances) from
* each heartbeat, AND maintains the legacy single-server `server_connections`
* row so the current panel keeps working during the fleet UI transition.
*
* Host identity: until enrollment issues a stable host id, a host is keyed by
* (license_id, hostname). One agent = one host today; the schema is already
* multi-host-ready.
*/
interface HeartbeatPayload {
schema?: number;
timestamp?: string;
agent?: { version?: string; commit?: string; os?: string; arch?: string };
host?: {
hostname?: string | null;
cpu_percent?: number;
cpu_cores?: number;
mem_total_mb?: number;
mem_used_mb?: number;
uptime_seconds?: number;
disks?: AgentHostDisk[];
};
instances?: Array<{
id: string;
game: string;
label?: string | null;
state?: string;
uptime_seconds?: number;
}>;
}
@Injectable()
export class HostAgentConsumerService implements OnApplicationBootstrap {
private readonly logger = new Logger(HostAgentConsumerService.name);
/** licenseId -> cache expiry epoch-ms. Positive = exists, absent = unknown. */
private knownLicenses = new Map<string, number>();
/** Unknown/garbage license ids we already warned about (anti log-spam). */
private warnedUnknown = new Set<string>();
private static readonly UUID_RE =
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
private static readonly LICENSE_CACHE_TTL_MS = 5 * 60_000;
/** 3x the agent's default 60s heartbeat (which jitters to max 72s). */
private static readonly OFFLINE_AFTER_MS = 180_000;
constructor(
@@ -37,6 +60,10 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
private readonly connectionRepository: Repository<ServerConnection>,
@InjectRepository(License)
private readonly licenseRepository: Repository<License>,
@InjectRepository(AgentHost)
private readonly hostRepository: Repository<AgentHost>,
@InjectRepository(GameInstance)
private readonly instanceRepository: Repository<GameInstance>,
) {}
// Bootstrap, not module-init: subscriptions registered before NatsService
@@ -44,10 +71,9 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
onApplicationBootstrap() {
this.nats.subscribe('corrosion.*.host.heartbeat', (data, subject) => {
const licenseId = subject.split('.')[1];
void this.onHeartbeat(licenseId).catch((err) =>
void this.onHeartbeat(licenseId, data as HeartbeatPayload).catch((err) =>
this.logger.error(`heartbeat handling failed for ${licenseId}: ${err.message}`, err.stack),
);
void data; // payload telemetry is bridged to the browser; persistence here is liveness only
});
this.nats.subscribe('corrosion.*.host.going_offline', (_data, subject) => {
@@ -60,25 +86,30 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
this.logger.log('Host agent (protocol v2) consumer subscriptions initialized');
}
private async onHeartbeat(licenseId: string): Promise<void> {
private async onHeartbeat(licenseId: string, payload: HeartbeatPayload): Promise<void> {
if (!(await this.isValidTenant(licenseId))) return;
// A well-formed v2 heartbeat always carries a host block. Reject malformed
// payloads so a stray/empty publish can't create a phantom host row.
if (!payload || typeof payload.host !== 'object' || payload.host === null) {
this.logger.warn(`ignoring malformed heartbeat for license ${licenseId} (no host block)`);
return;
}
const now = new Date();
const existing = await this.connectionRepository.findOne({
where: { license_id: licenseId },
});
await this.updateLegacyConnection(licenseId, now);
const host = await this.upsertHost(licenseId, payload, now);
await this.upsertInstances(licenseId, host, payload, now);
}
/** Legacy single-server row — keeps the current panel working. */
private async updateLegacyConnection(licenseId: string, now: Date): Promise<void> {
const existing = await this.connectionRepository.findOne({ where: { license_id: licenseId } });
if (existing) {
await this.connectionRepository.update(
{ id: existing.id },
{ companion_last_seen: now, connection_status: 'connected', updated_at: now },
);
if (existing.connection_status !== 'connected') {
this.logger.log(`host agent for license ${licenseId} is back online`);
}
} else {
// First contact from a host agent: auto-register the connection so the
// panel lights up without a manual setup step.
await this.connectionRepository.save(
this.connectionRepository.create({
license_id: licenseId,
@@ -87,28 +118,102 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
companion_last_seen: now,
}),
);
this.logger.log(`host agent registered for license ${licenseId} (first heartbeat)`);
}
}
/** Upsert the fleet host row, keyed by (license_id, hostname). */
private async upsertHost(licenseId: string, payload: HeartbeatPayload, now: Date): Promise<AgentHost> {
const hostname = payload.host?.hostname ?? '';
const fields = {
agent_version: payload.agent?.version ?? null,
agent_commit: payload.agent?.commit ?? null,
os: payload.agent?.os ?? null,
arch: payload.agent?.arch ?? null,
status: 'connected',
last_heartbeat_at: now,
cpu_percent: payload.host?.cpu_percent ?? null,
cpu_cores: payload.host?.cpu_cores ?? null,
mem_total_mb: payload.host?.mem_total_mb ?? null,
mem_used_mb: payload.host?.mem_used_mb ?? null,
uptime_seconds: payload.host?.uptime_seconds ?? null,
disks: payload.host?.disks ?? null,
updated_at: now,
};
const existing = await this.hostRepository.findOne({
where: { license_id: licenseId, hostname },
});
if (existing) {
await this.hostRepository.update({ id: existing.id }, fields);
return { ...existing, ...fields } as AgentHost;
}
const created = await this.hostRepository.save(
this.hostRepository.create({ license_id: licenseId, hostname, ...fields }),
);
this.logger.log(`host registered for license ${licenseId} (hostname '${hostname || 'unknown'}')`);
return created;
}
/** Upsert one game_instances row per heartbeat instance entry. */
private async upsertInstances(
licenseId: string,
host: AgentHost,
payload: HeartbeatPayload,
now: Date,
): Promise<void> {
for (const inst of payload.instances ?? []) {
if (!inst?.id || !inst?.game) continue;
const fields = {
host_id: host.id,
game: inst.game,
label: inst.label ?? null,
state: inst.state ?? 'unknown',
uptime_seconds: inst.uptime_seconds ?? 0,
last_seen_at: now,
updated_at: now,
};
const existing = await this.instanceRepository.findOne({
where: { license_id: licenseId, agent_instance_id: inst.id },
});
if (existing) {
await this.instanceRepository.update({ id: existing.id }, fields);
} else {
await this.instanceRepository.save(
this.instanceRepository.create({
license_id: licenseId,
agent_instance_id: inst.id,
...fields,
}),
);
this.logger.log(`instance '${inst.id}' (${inst.game}) registered for license ${licenseId}`);
}
}
}
private async onGoingOffline(licenseId: string): Promise<void> {
if (!(await this.isValidTenant(licenseId))) return;
const now = new Date();
await this.connectionRepository.update(
{ license_id: licenseId },
{ connection_status: 'offline', updated_at: new Date() },
{ connection_status: 'offline', updated_at: now },
);
this.logger.log(`host agent for license ${licenseId} went offline (graceful beacon)`);
await this.hostRepository.update(
{ license_id: licenseId },
{ status: 'offline', updated_at: now },
);
this.logger.log(`host(s) for license ${licenseId} went offline (graceful beacon)`);
}
/**
* Heartbeats stopping must flip the panel to offline — an agent that
* crashes or loses network never sends the goodbye beacon.
* crashes or loses network never sends the goodbye beacon. Sweeps both the
* legacy connection and fleet hosts.
*/
@Interval(60_000)
async sweepStaleConnections(): Promise<void> {
const threshold = new Date(Date.now() - HostAgentConsumerService.OFFLINE_AFTER_MS);
const result = await this.connectionRepository
const conn = await this.connectionRepository
.createQueryBuilder()
.update(ServerConnection)
.set({ connection_status: 'offline', updated_at: () => 'NOW()' })
@@ -117,8 +222,18 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
.andWhere('companion_last_seen < :threshold', { threshold })
.execute();
if (result.affected) {
this.logger.warn(`marked ${result.affected} stale host connection(s) offline`);
const hosts = await this.hostRepository
.createQueryBuilder()
.update(AgentHost)
.set({ status: 'offline', updated_at: () => 'NOW()' })
.where('status = :connected', { connected: 'connected' })
.andWhere('last_heartbeat_at IS NOT NULL')
.andWhere('last_heartbeat_at < :threshold', { threshold })
.execute();
const affected = (conn.affected ?? 0) + (hosts.affected ?? 0);
if (affected) {
this.logger.warn(`marked ${affected} stale connection/host record(s) offline`);
}
}
@@ -132,7 +247,6 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
this.warnUnknownOnce(licenseId, 'not a UUID');
return false;
}
const cachedUntil = this.knownLicenses.get(licenseId);
if (cachedUntil && cachedUntil > Date.now()) return true;
@@ -141,7 +255,6 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
this.warnUnknownOnce(licenseId, 'no such license');
return false;
}
this.knownLicenses.set(licenseId, Date.now() + HostAgentConsumerService.LICENSE_CACHE_TTL_MS);
return true;
}

View File

@@ -1,6 +1,14 @@
import { Injectable, OnModuleInit, OnModuleDestroy, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { connect, NatsConnection, StringCodec, Subscription } from 'nats';
import { createHmac, randomUUID } from 'crypto';
export interface AgentCredentials {
license_id: string;
nats_user: string;
nats_password: string;
nats_url: string;
}
@Injectable()
export class NatsService implements OnModuleInit, OnModuleDestroy {
@@ -13,8 +21,13 @@ export class NatsService implements OnModuleInit, OnModuleDestroy {
async onModuleInit() {
try {
const url = this.config.get<string>('nats.url') || 'nats://localhost:4222';
this.nc = await connect({ servers: url });
this.logger.log(`Connected to NATS at ${url}`);
const user = this.config.get<string>('nats.internalUser');
const pass = this.config.get<string>('nats.internalPassword');
// Authenticate with the privileged internal user when configured;
// otherwise connect anonymously (broker hasn't enforced auth yet).
const opts = user && pass ? { servers: url, user, pass } : { servers: url };
this.nc = await connect(opts);
this.logger.log(`Connected to NATS at ${url}${user ? ` as ${user}` : ' (anonymous)'}`);
} catch (err) {
this.logger.warn(`NATS connection failed — running in offline mode: ${(err as Error).message}`);
}
@@ -62,6 +75,64 @@ export class NatsService implements OnModuleInit, OnModuleDestroy {
return sub;
}
/**
* Request-reply to a host-agent subject with a LICENSE-SCOPED reply subject.
*
* Per-license agent users are confined to corrosion.{license}.> and have no
* _INBOX permission, so the agent cannot publish a reply to the default
* global inbox. The reply must live inside the license namespace
* (corrosion.{license}.reply.<id>); the privileged backend subscribes there.
* See corrosion-host-agent/PROTOCOL.md ("Reply-subject rule").
*/
async requestScoped<T = unknown>(
licenseId: string,
subject: string,
payload: Record<string, unknown>,
timeoutMs = 8000,
): Promise<T> {
if (!this.nc) {
throw new Error('NATS unavailable — agent is not reachable');
}
const replySubject = `corrosion.${licenseId}.reply.${randomUUID()}`;
const nc = this.nc;
return new Promise<T>((resolve, reject) => {
nc.subscribe(replySubject, {
max: 1,
timeout: timeoutMs,
callback: (err, msg) => {
if (err) {
reject(new Error(`agent did not respond within ${timeoutMs}ms`));
return;
}
try {
resolve(JSON.parse(this.sc.decode(msg.data)) as T);
} catch {
resolve(this.sc.decode(msg.data) as unknown as T);
}
},
});
nc.publish(subject, this.sc.encode(JSON.stringify(payload)), { reply: replySubject });
});
}
/**
* Derive a license's agent NATS credentials. Password is
* HMAC-SHA256(license_id, NATS_TOKEN_SECRET) — must match the broker config
* generated by scripts/generate-nats-auth.mjs. Returns null if the secret
* isn't configured (broker not yet enforcing auth).
*/
getAgentCredentials(licenseId: string): AgentCredentials | null {
const secret = this.config.get<string>('nats.tokenSecret');
if (!secret) return null;
const password = createHmac('sha256', secret).update(licenseId).digest('hex');
return {
license_id: licenseId,
nats_user: licenseId,
nats_password: password,
nats_url: this.config.get<string>('nats.publicUrl') || 'nats://nats.corrosionmgmt.com:4222',
};
}
/** Publish a command to a specific license's server */
async sendServerCommand(licenseId: string, action: string, payload: Record<string, unknown> = {}): Promise<void> {
await this.publish(`corrosion.${licenseId}.cmd.server`, {

View File

@@ -0,0 +1,102 @@
-- Fleet data model — License → Host → Instance (with optional Cluster)
--
-- ADDITIVE: existing server_connections / server_config / server_stats are
-- left untouched so the current single-server panel keeps working. The
-- host-agent consumer writes BOTH the legacy connection row and these fleet
-- tables during the transition; the panel migrates to the fleet tables in a
-- later phase.
--
-- Shape mirrors the host agent's wire protocol v2 heartbeat:
-- host{} block → agent_hosts
-- instances[] entries → game_instances
-- Host metrics (CPU/RAM/disk) live on the HOST, not duplicated per instance.
--
-- Named `agent_hosts` (not `hosts`) to avoid collision with the existing B2B
-- `hosts` table (hosting-partner companies) — different concept entirely.
-----------------------------------------------------------
-- AGENT_HOSTS — one Corrosion host agent / one machine
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS agent_hosts (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
license_id UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
-- Natural key until enrollment issues a stable host identity.
hostname VARCHAR(255) NOT NULL DEFAULT '',
agent_version VARCHAR(64),
agent_commit VARCHAR(64),
os VARCHAR(32),
arch VARCHAR(32),
status VARCHAR(20) NOT NULL DEFAULT 'offline'
CHECK (status IN ('connected', 'degraded', 'offline')),
last_heartbeat_at TIMESTAMPTZ,
cpu_percent DOUBLE PRECISION,
cpu_cores INTEGER,
mem_total_mb BIGINT,
mem_used_mb BIGINT,
uptime_seconds BIGINT,
disks JSONB, -- [{ "mount": "/", "total_mb": n, "free_mb": n }]
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (license_id, hostname)
);
CREATE INDEX IF NOT EXISTS idx_agent_hosts_license ON agent_hosts(license_id);
-----------------------------------------------------------
-- INSTANCE CLUSTERS — optional grouping (Soulmask main/child, Dune battlegroup)
-- Reserved now; cluster logic ships with those game adapters.
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS instance_clusters (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
license_id UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
game VARCHAR(32) NOT NULL,
name VARCHAR(255) NOT NULL,
topology VARCHAR(32), -- main_client | battlegroup
config JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_clusters_license ON instance_clusters(license_id);
-----------------------------------------------------------
-- GAME INSTANCES — one game server process / orchestrated unit.
-- The billing unit (plans count instances).
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS game_instances (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
license_id UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
host_id UUID REFERENCES agent_hosts(id) ON DELETE SET NULL,
cluster_id UUID REFERENCES instance_clusters(id) ON DELETE SET NULL,
-- The agent's instance slug; the NATS subject segment.
agent_instance_id VARCHAR(64) NOT NULL,
game VARCHAR(32) NOT NULL,
label VARCHAR(255),
-- running | stopped | starting | stopping | crashed
-- | configured | missing_root | unmanaged | unknown
state VARCHAR(32) NOT NULL DEFAULT 'unknown',
root_path TEXT,
uptime_seconds BIGINT NOT NULL DEFAULT 0,
last_seen_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE (license_id, agent_instance_id)
);
CREATE INDEX IF NOT EXISTS idx_instances_license ON game_instances(license_id);
CREATE INDEX IF NOT EXISTS idx_instances_host ON game_instances(host_id);
-----------------------------------------------------------
-- INSTANCE STATS — per-instance time series (game metrics).
-- Populated once game-level telemetry (player count/FPS via RCON/plugin) is
-- collected; the host heartbeat carries host metrics, not game metrics.
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS instance_stats (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
instance_id UUID NOT NULL REFERENCES game_instances(id) ON DELETE CASCADE,
license_id UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
player_count INTEGER NOT NULL DEFAULT 0,
max_players INTEGER NOT NULL DEFAULT 0,
fps DOUBLE PRECISION NOT NULL DEFAULT 0,
memory_usage_mb INTEGER NOT NULL DEFAULT 0,
recorded_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_instance_stats_instance
ON instance_stats(instance_id, recorded_at DESC);

View File

@@ -90,7 +90,7 @@ dependencies = [
"nuid",
"once_cell",
"portable-atomic",
"rand",
"rand 0.8.6",
"regex",
"ring",
"rustls-native-certs",
@@ -100,7 +100,7 @@ dependencies = [
"serde_json",
"serde_nanos",
"serde_repr",
"thiserror",
"thiserror 1.0.69",
"time",
"tokio",
"tokio-rustls",
@@ -110,6 +110,23 @@ dependencies = [
"url",
]
[[package]]
name = "async-trait"
version = "0.1.89"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "atomic-waker"
version = "1.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1505bd5d3d116872e7271a6d4e16d81d0c8570876c8de68093a09ac269d8aac0"
[[package]]
name = "autocfg"
version = "1.5.1"
@@ -180,6 +197,12 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "cfg_aliases"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
[[package]]
name = "chrono"
version = "0.4.45"
@@ -264,15 +287,18 @@ checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b"
[[package]]
name = "corrosion-host-agent"
version = "2.0.0-alpha.3"
version = "2.0.0-alpha.9"
dependencies = [
"anyhow",
"async-nats",
"async-trait",
"chrono",
"clap",
"futures",
"libc",
"rand",
"minisign-verify",
"rand 0.8.6",
"reqwest",
"serde",
"serde_json",
"sysinfo",
@@ -585,8 +611,24 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ff2abc00be7fca6ebc474524697ae276ad847ad0a6b3faa4bcb027e9a4614ad0"
dependencies = [
"cfg-if",
"js-sys",
"libc",
"wasi",
"wasm-bindgen",
]
[[package]]
name = "getrandom"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd"
dependencies = [
"cfg-if",
"js-sys",
"libc",
"r-efi 5.3.0",
"wasip2",
"wasm-bindgen",
]
[[package]]
@@ -597,7 +639,7 @@ checksum = "0de51e6874e94e7bf76d726fc5d13ba782deca734ff60d5bb2fb2607c7406555"
dependencies = [
"cfg-if",
"libc",
"r-efi",
"r-efi 6.0.0",
"wasip2",
"wasip3",
]
@@ -633,12 +675,94 @@ dependencies = [
"itoa",
]
[[package]]
name = "http-body"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
dependencies = [
"bytes",
"http",
]
[[package]]
name = "http-body-util"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
dependencies = [
"bytes",
"futures-core",
"http",
"http-body",
"pin-project-lite",
]
[[package]]
name = "httparse"
version = "1.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6dbf3de79e51f3d586ab4cb9d5c3e2c14aa28ed23d180cf89b4df0454a69cc87"
[[package]]
name = "hyper"
version = "1.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "55281c53a1894c864990125767da440a4e630446785086f52523b20033b74498"
dependencies = [
"atomic-waker",
"bytes",
"futures-channel",
"futures-core",
"http",
"http-body",
"httparse",
"itoa",
"pin-project-lite",
"smallvec",
"tokio",
"want",
]
[[package]]
name = "hyper-rustls"
version = "0.27.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "33ca68d021ef39cf6463ab54c1d0f5daf03377b70561305bb89a8f83aab66e0f"
dependencies = [
"http",
"hyper",
"hyper-util",
"rustls",
"tokio",
"tokio-rustls",
"tower-service",
"webpki-roots",
]
[[package]]
name = "hyper-util"
version = "0.1.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
dependencies = [
"base64",
"bytes",
"futures-channel",
"futures-util",
"http",
"http-body",
"hyper",
"ipnet",
"libc",
"percent-encoding",
"pin-project-lite",
"socket2",
"tokio",
"tower-service",
"tracing",
]
[[package]]
name = "iana-time-zone"
version = "0.1.65"
@@ -784,6 +908,12 @@ dependencies = [
"serde_core",
]
[[package]]
name = "ipnet"
version = "2.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d98f6fed1fde3f8c21bc40a1abb88dd75e67924f9cffc3ef95607bad8017f8e2"
[[package]]
name = "is_terminal_polyfill"
version = "1.70.2"
@@ -852,6 +982,12 @@ version = "0.4.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a"
[[package]]
name = "lru-slab"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "112b39cec0b298b6c1999fee3e31427f74f676e4cb9879ed1a121b43661a4154"
[[package]]
name = "matchers"
version = "0.2.0"
@@ -867,6 +1003,12 @@ version = "2.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6b947ae49db0d222b1dbc6b113ce7248a3fc3a6ca21b696717bfc000ba4484d8"
[[package]]
name = "minisign-verify"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22f9645cb765ea72b8111f36c522475d2daa0d22c957a9826437e97534bc4e9e"
[[package]]
name = "mio"
version = "1.2.1"
@@ -889,7 +1031,7 @@ dependencies = [
"ed25519-dalek",
"getrandom 0.2.17",
"log",
"rand",
"rand 0.8.6",
"signatory",
]
@@ -917,7 +1059,7 @@ version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc895af95856f929163a0aa20c26a78d26bfdc839f51b9d5aa7a5b79e52b7e83"
dependencies = [
"rand",
"rand 0.8.6",
]
[[package]]
@@ -1056,6 +1198,61 @@ dependencies = [
"unicode-ident",
]
[[package]]
name = "quinn"
version = "0.11.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9e20a958963c291dc322d98411f541009df2ced7b5a4f2bd52337638cfccf20"
dependencies = [
"bytes",
"cfg_aliases",
"pin-project-lite",
"quinn-proto",
"quinn-udp",
"rustc-hash",
"rustls",
"socket2",
"thiserror 2.0.18",
"tokio",
"tracing",
"web-time",
]
[[package]]
name = "quinn-proto"
version = "0.11.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "434b42fec591c96ef50e21e886936e66d3cc3f737104fdb9b737c40ffb94c098"
dependencies = [
"bytes",
"getrandom 0.3.4",
"lru-slab",
"rand 0.9.4",
"ring",
"rustc-hash",
"rustls",
"rustls-pki-types",
"slab",
"thiserror 2.0.18",
"tinyvec",
"tracing",
"web-time",
]
[[package]]
name = "quinn-udp"
version = "0.5.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "addec6a0dcad8a8d96a771f815f0eaf55f9d1805756410b39f5fa81332574cbd"
dependencies = [
"cfg_aliases",
"libc",
"once_cell",
"socket2",
"tracing",
"windows-sys 0.52.0",
]
[[package]]
name = "quote"
version = "1.0.45"
@@ -1065,6 +1262,12 @@ dependencies = [
"proc-macro2",
]
[[package]]
name = "r-efi"
version = "5.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f"
[[package]]
name = "r-efi"
version = "6.0.0"
@@ -1078,8 +1281,18 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ca0ecfa931c29007047d1bc58e623ab12e5590e8c7cc53200d5202b69266d8a"
dependencies = [
"libc",
"rand_chacha",
"rand_core",
"rand_chacha 0.3.1",
"rand_core 0.6.4",
]
[[package]]
name = "rand"
version = "0.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "44c5af06bb1b7d3216d91932aed5265164bf384dc89cd6ba05cf59a35f5f76ea"
dependencies = [
"rand_chacha 0.9.0",
"rand_core 0.9.5",
]
[[package]]
@@ -1089,7 +1302,17 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88"
dependencies = [
"ppv-lite86",
"rand_core",
"rand_core 0.6.4",
]
[[package]]
name = "rand_chacha"
version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d3022b5f1df60f26e1ffddd6c66e8aa15de382ae63b3a0c1bfc0e4d3e3f325cb"
dependencies = [
"ppv-lite86",
"rand_core 0.9.5",
]
[[package]]
@@ -1101,6 +1324,15 @@ dependencies = [
"getrandom 0.2.17",
]
[[package]]
name = "rand_core"
version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "76afc826de14238e6e8c374ddcc1fa19e374fd8dd986b0d2af0d02377261d83c"
dependencies = [
"getrandom 0.3.4",
]
[[package]]
name = "rayon"
version = "1.12.0"
@@ -1159,6 +1391,47 @@ version = "0.8.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4"
[[package]]
name = "reqwest"
version = "0.12.28"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eddd3ca559203180a307f12d114c268abf583f59b03cb906fd0b3ff8646c1147"
dependencies = [
"base64",
"bytes",
"futures-core",
"futures-util",
"http",
"http-body",
"http-body-util",
"hyper",
"hyper-rustls",
"hyper-util",
"js-sys",
"log",
"percent-encoding",
"pin-project-lite",
"quinn",
"rustls",
"rustls-pki-types",
"serde",
"serde_json",
"serde_urlencoded",
"sync_wrapper",
"tokio",
"tokio-rustls",
"tokio-util",
"tower",
"tower-http",
"tower-service",
"url",
"wasm-bindgen",
"wasm-bindgen-futures",
"wasm-streams",
"web-sys",
"webpki-roots",
]
[[package]]
name = "ring"
version = "0.17.14"
@@ -1173,6 +1446,12 @@ dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "rustc-hash"
version = "2.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94300abf3f1ae2e2b8ffb7b58043de3d399c73fa6f4b73826402a5c457614dbe"
[[package]]
name = "rustc_version"
version = "0.4.1"
@@ -1237,6 +1516,7 @@ version = "1.14.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "30a7197ae7eb376e574fe940d068c30fe0462554a3ddbe4eca7838e049c937a9"
dependencies = [
"web-time",
"zeroize",
]
@@ -1268,6 +1548,12 @@ version = "1.0.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
[[package]]
name = "ryu"
version = "1.0.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f"
[[package]]
name = "schannel"
version = "0.1.29"
@@ -1384,6 +1670,18 @@ dependencies = [
"serde",
]
[[package]]
name = "serde_urlencoded"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d3491c14715ca2294c4d6a88f15e84739788c1d030eed8c110436aafdaa2f3fd"
dependencies = [
"form_urlencoded",
"itoa",
"ryu",
"serde",
]
[[package]]
name = "sha1"
version = "0.10.6"
@@ -1438,7 +1736,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c1e303f8205714074f6068773f0e29527e0453937fe837c9717d066635b65f31"
dependencies = [
"pkcs8",
"rand_core",
"rand_core 0.6.4",
"signature",
"zeroize",
]
@@ -1450,7 +1748,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77549399552de45a898a580c1b41d445bf730df867cc44e6c0233bbc4b8329de"
dependencies = [
"digest",
"rand_core",
"rand_core 0.6.4",
]
[[package]]
@@ -1514,6 +1812,15 @@ dependencies = [
"unicode-ident",
]
[[package]]
name = "sync_wrapper"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0bf256ce5efdfa370213c1dabab5935a12e49f2c58d15e9eac2870d3b4f27263"
dependencies = [
"futures-core",
]
[[package]]
name = "synstructure"
version = "0.13.2"
@@ -1558,7 +1865,16 @@ version = "1.0.69"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52"
dependencies = [
"thiserror-impl",
"thiserror-impl 1.0.69",
]
[[package]]
name = "thiserror"
version = "2.0.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4"
dependencies = [
"thiserror-impl 2.0.18",
]
[[package]]
@@ -1572,6 +1888,17 @@ dependencies = [
"syn",
]
[[package]]
name = "thiserror-impl"
version = "2.0.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5"
dependencies = [
"proc-macro2",
"quote",
"syn",
]
[[package]]
name = "thread_local"
version = "1.1.9"
@@ -1622,6 +1949,21 @@ dependencies = [
"zerovec",
]
[[package]]
name = "tinyvec"
version = "1.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3e61e67053d25a4e82c844e8424039d9745781b3fc4f32b8d55ed50f5f667ef3"
dependencies = [
"tinyvec_macros",
]
[[package]]
name = "tinyvec_macros"
version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
[[package]]
name = "tokio"
version = "1.52.3"
@@ -1727,6 +2069,51 @@ version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801"
[[package]]
name = "tower"
version = "0.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ebe5ef63511595f1344e2d5cfa636d973292adc0eec1f0ad45fae9f0851ab1d4"
dependencies = [
"futures-core",
"futures-util",
"pin-project-lite",
"sync_wrapper",
"tokio",
"tower-layer",
"tower-service",
]
[[package]]
name = "tower-http"
version = "0.6.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4cfcf7e2740e6fc6d4d688b4ef00650406bb94adf4731e43c096c3a19fe40840"
dependencies = [
"bitflags",
"bytes",
"futures-util",
"http",
"http-body",
"pin-project-lite",
"tower",
"tower-layer",
"tower-service",
"url",
]
[[package]]
name = "tower-layer"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "121c2a6cda46980bb0fcd1647ffaf6cd3fc79a013de288782836f6df9c48780e"
[[package]]
name = "tower-service"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8df9b6e13f2d32c91b9bd719c00d1958837bc7dec474d94952798cc8e69eeec3"
[[package]]
name = "tracing"
version = "0.1.44"
@@ -1788,6 +2175,12 @@ dependencies = [
"tracing-log",
]
[[package]]
name = "try-lock"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e421abadd41a4225275504ea4d6566923418b7f05506fbc9c0fe86ba7396114b"
[[package]]
name = "tryhard"
version = "0.5.2"
@@ -1810,9 +2203,9 @@ dependencies = [
"http",
"httparse",
"log",
"rand",
"rand 0.8.6",
"sha1",
"thiserror",
"thiserror 1.0.69",
"utf-8",
]
@@ -1882,6 +2275,15 @@ version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "want"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bfa7760aed19e106de2c7c0b581b509f2f25d3dacaf737cb82ac61bc6d760b0e"
dependencies = [
"try-lock",
]
[[package]]
name = "wasi"
version = "0.11.1+wasi-snapshot-preview1"
@@ -1919,6 +2321,16 @@ dependencies = [
"wasm-bindgen-shared",
]
[[package]]
name = "wasm-bindgen-futures"
version = "0.4.73"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "54568702fabf5d4849ce2b90fadfa64168a097eaf4b351ce9df8b687a0086aaf"
dependencies = [
"js-sys",
"wasm-bindgen",
]
[[package]]
name = "wasm-bindgen-macro"
version = "0.2.123"
@@ -1973,6 +2385,19 @@ dependencies = [
"wasmparser",
]
[[package]]
name = "wasm-streams"
version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "15053d8d85c7eccdbefef60f06769760a563c7f0a9d6902a13d35c7800b0ad65"
dependencies = [
"futures-util",
"js-sys",
"wasm-bindgen",
"wasm-bindgen-futures",
"web-sys",
]
[[package]]
name = "wasmparser"
version = "0.244.0"
@@ -1985,6 +2410,35 @@ dependencies = [
"semver",
]
[[package]]
name = "web-sys"
version = "0.3.100"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e0871acf327f283dc6da28a1696cdc64fb355ba9f935d052021fa77f35cce69"
dependencies = [
"js-sys",
"wasm-bindgen",
]
[[package]]
name = "web-time"
version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a6580f308b1fad9207618087a65c04e7a10bc77e02c8e84e9b00dd4b12fa0bb"
dependencies = [
"js-sys",
"wasm-bindgen",
]
[[package]]
name = "webpki-roots"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52f5ee44c96cf55f1b349600768e3ece3a8f26010c05265ab73f945bb1a2eb9d"
dependencies = [
"rustls-pki-types",
]
[[package]]
name = "winapi"
version = "0.3.9"

View File

@@ -1,6 +1,6 @@
[package]
name = "corrosion-host-agent"
version = "2.0.0-alpha.3"
version = "2.0.0-alpha.9"
edition = "2021"
description = "Corrosion Host Agent — multi-game ops runtime for self-hosted game servers"
license = "UNLICENSED"
@@ -23,9 +23,12 @@ chrono = { version = "0.4", features = ["serde", "clock"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
anyhow = "1"
async-trait = "0.1"
clap = { version = "4.5", features = ["derive"] }
rand = "0.8"
tokio-tungstenite = "0.24"
minisign-verify = "0.2.5"
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls", "stream"] }
[target.'cfg(unix)'.dependencies]
libc = "0.2"

View File

@@ -85,6 +85,7 @@ Request: `{ "func": "<name>" }`. Reply: `{ "status": "success" | "error", ... }`
| `ping` | `version`, `commit`, `uptime_seconds` |
| `probe` | `report` — fresh ProbeReport (also cached for heartbeat) |
| `sysinfo` | `snapshot` — full heartbeat payload, collected on demand |
| `update` | `{ "func": "update", "url": "https://cdn.corrosionmgmt.com/host-agent/.../corrosion-host-agent-<plat>" }` → downloads the binary + `<url>.minisig`, verifies the minisign signature against the agent's EMBEDDED public key, atomically swaps (with `.old` rollback), replies `{ status: success, message: "...relaunching" }`, then relaunches the new binary. Rejects anything not signed by the release key and any URL that isn't `https://cdn.corrosionmgmt.com`. |
Unknown funcs return `status: "error"` with a message listing supported funcs.
@@ -100,8 +101,16 @@ Payload: `{}`.
Lifecycle and control for one game instance.
The same `start`/`stop`/`restart`/`status` funcs work for **every** game: the
agent picks a `Supervisor` impl per game — a spawned-process supervisor for
Rust/Conan/Soulmask, a **docker-compose supervisor for Dune** (`docker compose
up -d` / `stop` / `restart` against the instance's compose project, configured
via `[instance.docker_compose]`). The wire contract is identical; only the
management model behind it differs.
Implemented funcs: `start`, `stop` (graceful with 30s budget, then force
kill), `restart`, `status` (returns `state` + `uptime_seconds`), and
kill — process supervisor; Dune maps stop to `docker compose stop`), `restart`,
`status` (returns `state` + `uptime_seconds`), and
`rcon``{ "func": "rcon", "command": "<console command>" }` returns
`{ "status": "success", "output": <server response> }`. Protocol per game:
WebRCON (WebSocket JSON) for rust, Source RCON (Valve TCP) for
@@ -117,7 +126,10 @@ streaming progress lines to `corrosion.{license}.{instance}.steam_status`
and replying on completion.
Planned funcs: `oxide_install` (rust), plus game-adapter-specific
commands (Dune: docker lifecycle, RabbitMQ bus commands, Coriolis reset).
commands (Dune: RabbitMQ admin-bus commands, Coriolis reset, Postgres admin
surface). Dune **lifecycle** is already covered by the shared
start/stop/restart funcs above; container crash-detection and state adoption on
agent restart land with Phase 3b.
### `corrosion.{license_id}.{instance_id}.steam_status` (agent → backend, publish) — LIVE
@@ -179,6 +191,23 @@ service that attempts connections to the customer's public IP/ports on
request; that is specified as a Phase 1+ feature and will reuse this report
format with `direction: "inbound"`.
## Authentication & tenant isolation
The broker enforces per-license auth: an agent connects with `user = license_id`,
`password = HMAC-SHA256(license_id, NATS_TOKEN_SECRET)` (shown on the panel
Server page), and is scoped to `corrosion.{license_id}.>` only. The backend uses
a privileged internal user. This makes cross-tenant access impossible at the
broker, not just by convention.
**Reply-subject rule:** per-license users have NO `_INBOX` permission (granting
it would let one license read another's request-reply traffic). Therefore any
backend→agent request-reply MUST use a reply subject inside the license
namespace — e.g. `corrosion.{license_id}.reply.<id>` — never the client's
default global `_INBOX`. The agent is unaffected: it responds to whatever
`msg.reply` it receives. The constraint is on the requester (the internal user
has full access). The contract/CI tests run against an unauthenticated broker
and use the default inbox; production request-reply must follow this rule.
## Versioning
- The agent embeds semver + git hash + build timestamp (`--version`,

View File

@@ -20,8 +20,11 @@ instance on that host — Rust, Conan Exiles, Soulmask, Dune: Awakening.
crash detection with exit codes, live state in heartbeats
(integration-tested with real processes + live-NATS contract test)
- [ ] Phase 1b: RCON trait (WebRCON rust / TCP conan+soulmask), SteamCMD, jailed file manager
- [ ] Phase 2: Dune Docker adapter (compose lifecycle, RabbitMQ bus, Postgres admin)
- [ ] Phase 3: signed self-update (enforced ed25519 — release gate), service install, supervisor split
- [~] Phase 2: Dune Docker adapter **compose lifecycle done** (`docker compose up -d/stop/restart`
via the `Supervisor` trait + `DockerComposeSupervisor`); RabbitMQ admin bus + Postgres admin
surface deferred. Container crash-detection + state adoption on agent restart land with Phase 3b.
- [x] Phase 3a: SIGNED self-update — minisign-verified download+swap+relaunch (NATS `update` func); embedded public key; CI signs releases
- [ ] Phase 3b: service install (systemd/SCM), PID adoption
## Build

View File

@@ -9,7 +9,11 @@
[agent]
license_id = "your-license-uuid"
nats_url = "nats://nats.corrosionmgmt.com:4222"
# nats_token = "set-me-or-use-CORROSION_NATS_TOKEN"
# Per-license auth (preferred): user = license id, password = the token shown
# on the panel Server page. The broker scopes you to corrosion.{license}.>
# nats_user = "your-license-uuid" # defaults to license_id if omitted
# nats_password = "set-me-or-use-CORROSION_NATS_PASSWORD"
# nats_token = "legacy token-only auth; use nats_password instead"
heartbeat_seconds = 60
log_level = "info"
@@ -56,6 +60,24 @@ password = "changeme"
# Dune instances do not use SteamCMD (Docker images); the steam_update func
# will return a clear error if invoked on a dune instance.
# --- Dune: Awakening (container-managed) ---------------------------------
# Dune runs as a docker-compose stack, not a spawned process — leave
# `executable` unset and add an [instance.docker_compose] block. The agent
# drives `docker compose up -d / stop / restart` for start/stop/restart, and
# `steam_update` is rejected (Dune ships as Docker images).
#
# [[instance]]
# id = "dune-main"
# game = "dune"
# root = "/opt/dune" # directory the compose commands run in
# label = "Arrakis (battlegroup)"
#
# [instance.docker_compose]
# file = "docker-compose.yml" # -f; relative to root. Omit to use compose's discovery
# project = "dune-main" # -p; defaults to the instance id
# service = "gameserver" # limit lifecycle to one service; omit for the whole stack
# command = ["docker", "compose"] # default; use ["docker-compose"] for the legacy binary
[prober]
interval_seconds = 300

View File

@@ -7,16 +7,17 @@ use tokio::sync::RwLock;
use tokio_util::sync::CancellationToken;
use crate::config::Settings;
use crate::process::ProcessSupervisor;
use crate::prober::ProbeReport;
use crate::supervisor::Supervisor;
pub struct Agent {
pub cfg: Settings,
pub nats: async_nats::Client,
pub started: Instant,
pub last_probe: RwLock<Option<ProbeReport>>,
/// One supervisor per instance (unmanaged instances included — they
/// report `unmanaged` state and reject process commands).
pub supervisors: HashMap<String, Arc<ProcessSupervisor>>,
/// One supervisor per instance, keyed by instance id. The concrete impl
/// (process vs docker-compose) is chosen per game by the factory in main;
/// every subsystem talks to the `Supervisor` trait only.
pub supervisors: HashMap<String, Arc<dyn Supervisor>>,
pub shutdown: CancellationToken,
}

View File

@@ -33,7 +33,15 @@ pub async fn connect(cfg: &Settings) -> Result<async_nats::Client> {
if force_tls {
opts = opts.require_tls(true);
}
if let Some(token) = &cfg.nats_token {
// Per-license auth: the broker maps user=license_id, password=derived
// token to permissions scoped to corrosion.{license_id}.>. Falls back to
// token-only or anonymous so the agent still works against a broker that
// hasn't enforced auth yet (transition period).
if let Some(password) = &cfg.nats_password {
let user = cfg.nats_user.clone().unwrap_or_else(|| cfg.license_id.clone());
opts = opts.user_and_password(user, password.clone());
} else if let Some(token) = &cfg.nats_token {
opts = opts.token(token.clone());
}

View File

@@ -10,6 +10,7 @@ use serde::Deserialize;
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use crate::docker_compose::DockerComposeConfig;
use crate::rcon::RconConfig;
use crate::steamcmd::SteamcmdConfig;
@@ -34,6 +35,12 @@ pub struct AgentSection {
pub license_id: Option<String>,
pub nats_url: Option<String>,
pub nats_token: Option<String>,
/// NATS username for per-license auth. Defaults to license_id when a
/// password is set but no user is given.
pub nats_user: Option<String>,
/// NATS password (the per-license token). When set, the agent authenticates
/// with user+password instead of a bare token.
pub nats_password: Option<String>,
#[serde(default = "default_heartbeat_seconds")]
pub heartbeat_seconds: u64,
#[serde(default = "default_log_level")]
@@ -70,6 +77,10 @@ pub struct InstanceConfig {
/// validate = false).
#[serde(default)]
pub steamcmd: Option<SteamcmdConfig>,
/// Docker-compose settings for container-managed games (Dune). Absent =
/// defaults apply (compose file in the instance root, project = instance id).
#[serde(default)]
pub docker_compose: Option<DockerComposeConfig>,
}
impl InstanceConfig {
@@ -122,6 +133,8 @@ pub struct Settings {
pub license_id: String,
pub nats_url: String,
pub nats_token: Option<String>,
pub nats_user: Option<String>,
pub nats_password: Option<String>,
pub heartbeat_seconds: u64,
pub log_level: String,
pub instances: Vec<InstanceConfig>,
@@ -167,6 +180,16 @@ fn resolve(file: ConfigFile) -> Result<Settings> {
.filter(|v| !v.is_empty())
.or(file.agent.nats_token);
let nats_user = std::env::var("CORROSION_NATS_USER")
.ok()
.filter(|v| !v.is_empty())
.or(file.agent.nats_user);
let nats_password = std::env::var("CORROSION_NATS_PASSWORD")
.ok()
.filter(|v| !v.is_empty())
.or(file.agent.nats_password);
validate_subject_segment("license_id", &license_id)?;
let mut seen: HashSet<&str> = HashSet::new();
@@ -196,6 +219,8 @@ fn resolve(file: ConfigFile) -> Result<Settings> {
license_id,
nats_url,
nats_token,
nats_user,
nats_password,
heartbeat_seconds: file.agent.heartbeat_seconds,
log_level: file.agent.log_level,
instances: file.instances,

View File

@@ -0,0 +1,216 @@
//! Docker-compose instance supervision — the Dune: Awakening adapter.
//!
//! Dune does not ship as a SteamCMD-updated process like Rust/Conan/Soulmask;
//! it runs as Docker container(s) (game server + RabbitMQ broker + Postgres),
//! orchestrated as a compose stack (a "battlegroup"). So Dune lifecycle is
//! `docker compose up -d / stop / restart` against the instance's compose
//! project, not a spawned OS process. This supervisor implements the same
//! [`Supervisor`] trait `ProcessSupervisor` does, so the instance command
//! dispatch is identical — only the management model differs.
//!
//! Scope (first cut): lifecycle + cached state. Two parity items are deferred
//! to Phase 3b alongside process PID adoption: (1) crash detection (containers
//! give us no child handle — a `docker compose ps` poll loop would supply it);
//! (2) state adoption on agent restart (a running stack reports `stopped` until
//! the next lifecycle command). Both are reconcilable with a `ps` probe.
//!
//! Reference: docs/reference-repos/icehunter SETUP_DOCKER.md (the docker
//! control plane this mirrors).
use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
use std::time::Instant;
use anyhow::{bail, Context, Result};
use serde::Deserialize;
use tokio::process::Command;
use tokio::sync::{watch, Mutex};
use crate::config::InstanceConfig;
use crate::supervisor::{InstanceState, Supervisor};
/// Per-instance docker-compose settings (`[instance.docker_compose]`). All
/// fields optional — defaults cover the common "one compose file in the
/// instance root" case.
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct DockerComposeConfig {
/// Compose file (`-f`). Relative paths resolve against the run dir. Default:
/// compose's own discovery (docker-compose.yml in the run dir).
#[serde(default)]
pub file: Option<PathBuf>,
/// Compose project name (`-p`). Default: the instance id.
#[serde(default)]
pub project: Option<String>,
/// Limit lifecycle ops to one service. Default: every service in the file.
#[serde(default)]
pub service: Option<String>,
/// Override the compose binary invocation. Default: `["docker","compose"]`.
/// Use `["docker-compose"]` for the legacy standalone binary.
#[serde(default)]
pub command: Option<Vec<String>>,
}
struct Inner {
started_at: Option<Instant>,
}
pub struct DockerComposeSupervisor {
instance_id: String,
/// Directory the compose commands run in (relative `-f`/file paths resolve
/// against it).
run_dir: PathBuf,
compose_file: Option<PathBuf>,
project: String,
service: Option<String>,
/// Compose binary + leading args, e.g. `["docker","compose"]`.
command: Vec<String>,
inner: Mutex<Inner>,
state_tx: watch::Sender<InstanceState>,
}
impl DockerComposeSupervisor {
pub fn new(cfg: &InstanceConfig) -> Arc<Self> {
let dc = cfg.docker_compose.clone().unwrap_or_default();
let run_dir = cfg
.working_dir
.clone()
.unwrap_or_else(|| cfg.root.clone());
let command = dc
.command
.filter(|c| !c.is_empty())
.unwrap_or_else(|| vec!["docker".to_string(), "compose".to_string()]);
let (state_tx, _) = watch::channel(InstanceState::Stopped);
Arc::new(Self {
instance_id: cfg.id.clone(),
run_dir,
compose_file: dc.file,
project: dc.project.unwrap_or_else(|| cfg.id.clone()),
service: dc.service,
command,
inner: Mutex::new(Inner { started_at: None }),
state_tx,
})
}
fn set_state(&self, state: InstanceState) {
let _ = self.state_tx.send_replace(state);
}
/// Run one compose subcommand (`up`/`stop`/`restart`/...), bailing with the
/// captured stderr on non-zero exit. Global flags (`-f`, `-p`) precede the
/// subcommand; the optional single service is appended last.
async fn run(&self, action: &str, action_args: &[&str]) -> Result<()> {
let mut cmd = Command::new(&self.command[0]);
cmd.args(&self.command[1..]);
if let Some(file) = &self.compose_file {
cmd.arg("-f").arg(file);
}
cmd.arg("-p").arg(&self.project);
cmd.arg(action);
cmd.args(action_args);
if let Some(service) = &self.service {
cmd.arg(service);
}
cmd.current_dir(&self.run_dir)
.stdin(Stdio::null())
.stdout(Stdio::piped())
.stderr(Stdio::piped());
let output = cmd
.output()
.await
.with_context(|| format!("running `{} {action}` (is docker installed and on PATH?)", self.command.join(" ")))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
let stdout = String::from_utf8_lossy(&output.stdout);
let detail = if !stderr.trim().is_empty() {
stderr.trim()
} else {
stdout.trim()
};
bail!("compose {action} failed ({}): {detail}", output.status);
}
Ok(())
}
}
#[async_trait::async_trait]
impl Supervisor for DockerComposeSupervisor {
fn instance_id(&self) -> &str {
&self.instance_id
}
fn state(&self) -> InstanceState {
self.state_tx.borrow().clone()
}
fn watch_state(&self) -> watch::Receiver<InstanceState> {
self.state_tx.subscribe()
}
async fn uptime_seconds(&self) -> u64 {
let inner = self.inner.lock().await;
match (&*self.state_tx.borrow(), inner.started_at) {
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
_ => 0,
}
}
async fn start(self: Arc<Self>) -> Result<()> {
if matches!(
*self.state_tx.borrow(),
InstanceState::Running | InstanceState::Starting
) {
bail!("instance '{}' is already running", self.instance_id);
}
self.set_state(InstanceState::Starting);
match self.run("up", &["-d"]).await {
Ok(()) => {
self.inner.lock().await.started_at = Some(Instant::now());
self.set_state(InstanceState::Running);
tracing::info!("instance '{}' compose up -d", self.instance_id);
Ok(())
}
Err(e) => {
self.set_state(InstanceState::Stopped);
Err(e)
}
}
}
async fn stop(self: Arc<Self>) -> Result<()> {
self.set_state(InstanceState::Stopping);
match self.run("stop", &[]).await {
Ok(()) => {
self.inner.lock().await.started_at = None;
self.set_state(InstanceState::Stopped);
tracing::info!("instance '{}' compose stop", self.instance_id);
Ok(())
}
Err(e) => {
// Stop failed — the stack is most likely still up.
self.set_state(InstanceState::Running);
Err(e)
}
}
}
async fn restart(self: Arc<Self>) -> Result<()> {
self.set_state(InstanceState::Starting);
match self.run("restart", &[]).await {
Ok(()) => {
self.inner.lock().await.started_at = Some(Instant::now());
self.set_state(InstanceState::Running);
tracing::info!("instance '{}' compose restart", self.instance_id);
Ok(())
}
Err(e) => {
self.set_state(InstanceState::Stopped);
Err(e)
}
}
}
}

View File

@@ -198,7 +198,11 @@ pub fn list(root: &Path, rel: &str) -> anyhow::Result<Vec<FileEntry>> {
let mut entries: Vec<FileEntry> = Vec::new();
for item in rd {
let item = item.with_context(|| format!("reading directory entry in '{}'", abs.display()))?;
let meta = item.metadata().with_context(|| format!("stat '{}'", item.path().display()))?;
// symlink_metadata (lstat): report the link itself, never the target —
// following it would leak the size/type/existence of files outside the
// jail. A symlink lists as a zero-ish-size non-dir entry.
let meta = fs::symlink_metadata(item.path())
.with_context(|| format!("stat '{}'", item.path().display()))?;
let name = item.file_name().to_string_lossy().into_owned();
let is_dir = meta.is_dir();
@@ -367,11 +371,24 @@ pub fn copy(root: &Path, src: &str, dest: &str) -> anyhow::Result<()> {
.with_context(|| format!("copy '{}' -> '{}'", src_abs.display(), dest_abs.display()))
}
/// Recursive copy helper (mirrors Go's `copyRecursive`).
/// Recursive copy helper.
///
/// SECURITY: uses `symlink_metadata` (does NOT follow symlinks) and refuses to
/// copy any symlink. `jail()` only validates the top-level src/dest; a symlink
/// *inside* a copied directory that points outside the jail would, if followed,
/// pull external content (e.g. `/etc`) into the jail where it could then be
/// read — a jail-escape exfiltration. Refusing symlinks closes that path.
fn copy_recursive(src: &Path, dest: &Path) -> anyhow::Result<()> {
let meta = fs::metadata(src)
let meta = fs::symlink_metadata(src)
.with_context(|| format!("stat source '{}'", src.display()))?;
if meta.file_type().is_symlink() {
bail!(
"refusing to copy symlink '{}' — symlinks are not followed across the jail boundary",
src.display()
);
}
if meta.is_dir() {
fs::create_dir_all(dest)
.with_context(|| format!("create_dir_all '{}'", dest.display()))?;

View File

@@ -13,11 +13,15 @@ use crate::agent::Agent;
use crate::prober;
use crate::subjects;
use crate::telemetry;
use crate::update;
use crate::version;
#[derive(Debug, Deserialize)]
struct HostCommand {
func: String,
/// Signed-update artifact URL (for func = "update").
#[serde(default)]
url: Option<String>,
}
pub async fn run(agent: Arc<Agent>) -> anyhow::Result<()> {
@@ -55,20 +59,46 @@ async fn handle(agent: Arc<Agent>, msg: async_nats::Message) {
return;
};
let response = match serde_json::from_slice::<HostCommand>(&msg.payload) {
Ok(cmd) => dispatch(&agent, &cmd.func).await,
Err(e) => json!({ "status": "error", "message": format!("invalid command payload: {e}") }),
};
let bytes = match serde_json::to_vec(&response) {
Ok(b) => b,
let cmd = match serde_json::from_slice::<HostCommand>(&msg.payload) {
Ok(cmd) => cmd,
Err(e) => {
tracing::error!("response serialize failed: {e}");
publish(&agent, &reply, json!({ "status": "error", "message": format!("invalid command payload: {e}") })).await;
return;
}
};
if let Err(e) = agent.nats.publish(reply, bytes.into()).await {
tracing::warn!("response publish failed: {e}");
// Self-update is special: it must reply BEFORE relaunching, because the
// relaunch replaces this process and nothing after it would run.
if cmd.func == "update" {
let Some(url) = cmd.url else {
publish(&agent, &reply, json!({ "status": "error", "message": "update requires a 'url'" })).await;
return;
};
match update::download_verify_swap(&url).await {
Ok(_) => {
publish(&agent, &reply, json!({ "status": "success", "func": "update", "message": "verified and swapped; relaunching" })).await;
let _ = agent.nats.flush().await;
update::relaunch_and_exit();
}
Err(e) => {
publish(&agent, &reply, json!({ "status": "error", "func": "update", "message": format!("{e:#}") })).await;
}
}
return;
}
let response = dispatch(&agent, &cmd.func).await;
publish(&agent, &reply, response).await;
}
async fn publish(agent: &Arc<Agent>, reply: &async_nats::Subject, value: serde_json::Value) {
match serde_json::to_vec(&value) {
Ok(bytes) => {
if let Err(e) = agent.nats.publish(reply.clone(), bytes.into()).await {
tracing::warn!("response publish failed: {e}");
}
}
Err(e) => tracing::error!("response serialize failed: {e}"),
}
}

View File

@@ -13,9 +13,9 @@ use serde_json::json;
use std::sync::Arc;
use crate::agent::Agent;
use crate::process::ProcessSupervisor;
use crate::subjects;
use crate::steamcmd;
use crate::supervisor::Supervisor;
#[derive(Debug, Deserialize)]
struct InstanceCommand {
@@ -26,8 +26,8 @@ struct InstanceCommand {
}
/// Forward every supervisor state change as a status event.
pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) {
let subject = subjects::instance_status(&agent.cfg.license_id, &sup.instance_id);
pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<dyn Supervisor>) {
let subject = subjects::instance_status(&agent.cfg.license_id, sup.instance_id());
let mut rx = sup.watch_state();
let cancel = agent.shutdown.clone();
@@ -40,13 +40,13 @@ pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor
let state = rx.borrow().clone();
let event = json!({
"timestamp": Utc::now().to_rfc3339_opts(SecondsFormat::Secs, true),
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"event": state,
});
match serde_json::to_vec(&event) {
Ok(bytes) => {
if let Err(e) = agent.nats.publish(subject.clone(), bytes.into()).await {
tracing::warn!("status publish failed for '{}': {e}", sup.instance_id);
tracing::warn!("status publish failed for '{}': {e}", sup.instance_id());
}
}
Err(e) => tracing::error!("status serialize failed: {e}"),
@@ -58,8 +58,8 @@ pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor
}
/// Request-reply command handler for one instance.
pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Result<()> {
let subject = subjects::instance_cmd(&agent.cfg.license_id, &sup.instance_id);
pub async fn run(agent: Arc<Agent>, sup: Arc<dyn Supervisor>) -> anyhow::Result<()> {
let subject = subjects::instance_cmd(&agent.cfg.license_id, sup.instance_id());
let mut sub = agent.nats.subscribe(subject.clone()).await?;
tracing::info!("instance command handler listening on {subject}");
@@ -74,13 +74,13 @@ pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Resu
tokio::spawn(async move { handle(agent, sup, msg).await });
}
None => {
tracing::warn!("instance command subscription ended for '{}'", sup.instance_id);
tracing::warn!("instance command subscription ended for '{}'", sup.instance_id());
break;
}
}
}
_ = cancel.cancelled() => {
tracing::info!("instance command handler stopping for '{}'", sup.instance_id);
tracing::info!("instance command handler stopping for '{}'", sup.instance_id());
break;
}
}
@@ -88,7 +88,7 @@ pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Resu
Ok(())
}
async fn handle(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>, msg: async_nats::Message) {
async fn handle(agent: Arc<Agent>, sup: Arc<dyn Supervisor>, msg: async_nats::Message) {
let Some(reply) = msg.reply.clone() else {
tracing::warn!("instance command without reply subject ignored");
return;
@@ -113,20 +113,22 @@ async fn handle(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>, msg: async_nats:
async fn dispatch(
agent: &Arc<Agent>,
sup: &Arc<ProcessSupervisor>,
sup: &Arc<dyn Supervisor>,
cmd: &InstanceCommand,
) -> serde_json::Value {
let func = cmd.func.as_str();
// start/stop/restart take `self: Arc<Self>` (they may hand a clone to a
// monitor task), so clone the Arc before the consuming call.
let outcome = match func {
"start" => sup.start().await.map(|_| "starting"),
"stop" => sup.stop().await.map(|_| "stopped"),
"restart" => sup.restart().await.map(|_| "restarted"),
"start" => sup.clone().start().await.map(|_| "starting"),
"stop" => sup.clone().stop().await.map(|_| "stopped"),
"restart" => sup.clone().restart().await.map(|_| "restarted"),
"status" => {
return json!({
"status": "success",
"func": "status",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"state": sup.state(),
"uptime_seconds": sup.uptime_seconds().await,
});
@@ -139,15 +141,15 @@ async fn dispatch(
.cfg
.instances
.iter()
.find(|i| i.id == sup.instance_id);
.find(|i| i.id == sup.instance_id());
let rcon_cfg = inst_cfg.and_then(|i| i.rcon.as_ref());
let Some(rcon_cfg) = rcon_cfg else {
return json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"message": format!("instance '{}' has no rcon configured", sup.instance_id),
"instance_id": sup.instance_id(),
"message": format!("instance '{}' has no rcon configured", sup.instance_id()),
});
};
@@ -155,7 +157,7 @@ async fn dispatch(
return json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"message": "rcon func requires a 'command' field",
});
};
@@ -165,13 +167,13 @@ async fn dispatch(
Ok(output) => json!({
"status": "success",
"func": "rcon",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"output": output,
}),
Err(e) => json!({
"status": "error",
"func": "rcon",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"message": format!("{e:#}"),
}),
};
@@ -181,14 +183,14 @@ async fn dispatch(
// settings. The supervisor only carries process-control state, not
// the full config, so we reach into agent.cfg.instances here as the
// rcon dispatch does.
let inst_cfg = agent.cfg.instances.iter().find(|i| i.id == sup.instance_id);
let inst_cfg = agent.cfg.instances.iter().find(|i| i.id == sup.instance_id());
let Some(inst_cfg) = inst_cfg else {
return json!({
"status": "error",
"func": "steam_update",
"instance_id": sup.instance_id,
"message": format!("no config found for instance '{}'", sup.instance_id),
"instance_id": sup.instance_id(),
"message": format!("no config found for instance '{}'", sup.instance_id()),
});
};
@@ -209,7 +211,7 @@ async fn dispatch(
};
let license = agent.cfg.license_id.clone();
let instance_id = sup.instance_id.clone();
let instance_id = sup.instance_id().to_string();
let nats = agent.nats.clone();
// Publish each progress line to the steam_status subject.
@@ -240,12 +242,12 @@ async fn dispatch(
Ok(()) => json!({
"status": "success",
"func": "steam_update",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
}),
Err(e) => json!({
"status": "error",
"func": "steam_update",
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"message": format!("{e:#}"),
}),
};
@@ -262,14 +264,14 @@ async fn dispatch(
Ok(result) => json!({
"status": "success",
"func": func,
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"result": result,
"state": sup.state(),
}),
Err(e) => json!({
"status": "error",
"func": func,
"instance_id": sup.instance_id,
"instance_id": sup.instance_id(),
"message": format!("{e:#}"),
}),
}

View File

@@ -4,6 +4,7 @@
pub mod agent;
pub mod bus;
pub mod config;
pub mod docker_compose;
pub mod filemanager;
pub mod hostcmd;
pub mod instancecmd;
@@ -12,5 +13,7 @@ pub mod process;
pub mod rcon;
pub mod steamcmd;
pub mod subjects;
pub mod supervisor;
pub mod telemetry;
pub mod update;
pub mod version;

View File

@@ -5,8 +5,8 @@
//! game adapters arrive in Phase 1+ (see PROTOCOL.md).
use corrosion_host_agent::{
agent, bus, config, filemanager, hostcmd, instancecmd, prober, process, subjects, telemetry,
version,
agent, bus, config, docker_compose, filemanager, hostcmd, instancecmd, prober, process,
subjects, supervisor, telemetry, version,
};
use anyhow::{Context, Result};
@@ -92,10 +92,20 @@ async fn run(settings: config::Settings) -> Result<()> {
let nats = bus::connect(&settings).await?;
let supervisors = settings
// Per-game supervisor factory: container-managed games (Dune) get a
// docker-compose supervisor; everything else is a spawned-process
// supervisor. Both satisfy the `Supervisor` trait, so the rest of the agent
// is game-agnostic.
let supervisors: std::collections::HashMap<String, Arc<dyn supervisor::Supervisor>> = settings
.instances
.iter()
.map(|inst| (inst.id.clone(), process::ProcessSupervisor::new(inst)))
.map(|inst| {
let sup: Arc<dyn supervisor::Supervisor> = match inst.game.as_str() {
"dune" => docker_compose::DockerComposeSupervisor::new(inst),
_ => process::ProcessSupervisor::new(inst),
};
(inst.id.clone(), sup)
})
.collect();
let agent = Arc::new(Agent {

View File

@@ -1,14 +1,16 @@
//! Per-instance game-server process supervision.
//!
//! One `ProcessSupervisor` per process-managed instance. Lifecycle mirrors the
//! proven Go agent behavior — graceful SIGTERM with a 30s budget before force
//! kill, a monitor task that reaps the child and records crash-vs-stop — with
//! two fixes the Go version needed: args are a proper list (no naive space
//! splitting), and every state change is observable through a watch channel
//! so the panel gets push events instead of waiting for the next heartbeat.
//! One `ProcessSupervisor` per process-managed instance (Rust/Conan/Soulmask).
//! Lifecycle mirrors the proven Go agent behavior — graceful SIGTERM with a 30s
//! budget before force kill, a monitor task that reaps the child and records
//! crash-vs-stop — with two fixes the Go version needed: args are a proper list
//! (no naive space splitting), and every state change is observable through a
//! watch channel so the panel gets push events instead of waiting for the next
//! heartbeat. Lifecycle control is exposed through the [`Supervisor`] trait so
//! the command dispatch is identical across process- and container-managed
//! games.
use anyhow::{bail, Context, Result};
use serde::Serialize;
use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
@@ -17,39 +19,11 @@ use tokio::process::{Child, Command};
use tokio::sync::{watch, Mutex};
use crate::config::InstanceConfig;
use crate::supervisor::{InstanceState, Supervisor};
const GRACEFUL_STOP_BUDGET: Duration = Duration::from_secs(30);
const RESTART_PAUSE: Duration = Duration::from_secs(2);
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(rename_all = "snake_case", tag = "state")]
pub enum InstanceState {
/// Not process-managed (no executable configured).
Unmanaged,
Stopped,
Starting,
Running,
Stopping,
/// Process exited without a stop request.
Crashed {
#[serde(skip_serializing_if = "Option::is_none")]
exit_code: Option<i32>,
},
}
impl InstanceState {
pub fn as_label(&self) -> &'static str {
match self {
InstanceState::Unmanaged => "unmanaged",
InstanceState::Stopped => "stopped",
InstanceState::Starting => "starting",
InstanceState::Running => "running",
InstanceState::Stopping => "stopping",
InstanceState::Crashed { .. } => "crashed",
}
}
}
struct Inner {
child: Option<Child>,
started_at: Option<Instant>,
@@ -59,7 +33,7 @@ struct Inner {
}
pub struct ProcessSupervisor {
pub instance_id: String,
instance_id: String,
executable: Option<PathBuf>,
args: Vec<String>,
working_dir: Option<PathBuf>,
@@ -90,72 +64,6 @@ impl ProcessSupervisor {
})
}
pub fn state(&self) -> InstanceState {
self.state_tx.borrow().clone()
}
pub fn watch_state(&self) -> watch::Receiver<InstanceState> {
self.state_tx.subscribe()
}
pub async fn uptime_seconds(&self) -> u64 {
let inner = self.inner.lock().await;
match (&*self.state_tx.borrow(), inner.started_at) {
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
_ => 0,
}
}
pub async fn start(self: &Arc<Self>) -> Result<()> {
let Some(exe) = self.executable.clone() else {
bail!("instance '{}' has no executable configured", self.instance_id);
};
if !exe.exists() {
bail!("executable not found: {}", exe.display());
}
let mut inner = self.inner.lock().await;
if matches!(*self.state_tx.borrow(), InstanceState::Running | InstanceState::Starting) {
bail!("instance '{}' is already running", self.instance_id);
}
self.set_state(InstanceState::Starting);
let workdir = self
.working_dir
.clone()
.or_else(|| exe.parent().map(|p| p.to_path_buf()))
.unwrap_or_else(|| PathBuf::from("."));
let child = Command::new(&exe)
.args(&self.args)
.current_dir(&workdir)
.stdin(Stdio::null())
.stdout(Stdio::inherit())
.stderr(Stdio::inherit())
.spawn()
.with_context(|| format!("spawning {}", exe.display()))?;
let pid = child.id();
inner.child = Some(child);
inner.started_at = Some(Instant::now());
inner.stop_requested = false;
drop(inner);
self.set_state(InstanceState::Running);
tracing::info!(
"instance '{}' started: {} (pid {:?})",
self.instance_id,
exe.display(),
pid
);
// Monitor: reap the child and classify the exit.
let sup = Arc::clone(self);
tokio::spawn(async move { sup.monitor().await });
Ok(())
}
async fn monitor(self: Arc<Self>) {
// Take a waiter without holding the lock across the whole child
// lifetime: Child::wait needs &mut, so the child stays in inner and
@@ -201,7 +109,85 @@ impl ProcessSupervisor {
}
}
pub async fn stop(self: &Arc<Self>) -> Result<()> {
fn set_state(&self, state: InstanceState) {
// send_replace never fails even with zero receivers.
let _ = self.state_tx.send_replace(state);
}
}
#[async_trait::async_trait]
impl Supervisor for ProcessSupervisor {
fn instance_id(&self) -> &str {
&self.instance_id
}
fn state(&self) -> InstanceState {
self.state_tx.borrow().clone()
}
fn watch_state(&self) -> watch::Receiver<InstanceState> {
self.state_tx.subscribe()
}
async fn uptime_seconds(&self) -> u64 {
let inner = self.inner.lock().await;
match (&*self.state_tx.borrow(), inner.started_at) {
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
_ => 0,
}
}
async fn start(self: Arc<Self>) -> Result<()> {
let Some(exe) = self.executable.clone() else {
bail!("instance '{}' has no executable configured", self.instance_id);
};
if !exe.exists() {
bail!("executable not found: {}", exe.display());
}
let mut inner = self.inner.lock().await;
if matches!(*self.state_tx.borrow(), InstanceState::Running | InstanceState::Starting) {
bail!("instance '{}' is already running", self.instance_id);
}
self.set_state(InstanceState::Starting);
let workdir = self
.working_dir
.clone()
.or_else(|| exe.parent().map(|p| p.to_path_buf()))
.unwrap_or_else(|| PathBuf::from("."));
let child = Command::new(&exe)
.args(&self.args)
.current_dir(&workdir)
.stdin(Stdio::null())
.stdout(Stdio::inherit())
.stderr(Stdio::inherit())
.spawn()
.with_context(|| format!("spawning {}", exe.display()))?;
let pid = child.id();
inner.child = Some(child);
inner.started_at = Some(Instant::now());
inner.stop_requested = false;
drop(inner);
self.set_state(InstanceState::Running);
tracing::info!(
"instance '{}' started: {} (pid {:?})",
self.instance_id,
exe.display(),
pid
);
// Monitor: reap the child and classify the exit.
let sup = Arc::clone(&self);
tokio::spawn(async move { sup.monitor().await });
Ok(())
}
async fn stop(self: Arc<Self>) -> Result<()> {
let mut inner = self.inner.lock().await;
if inner.child.is_none() {
bail!("instance '{}' is not running", self.instance_id);
@@ -263,16 +249,14 @@ impl ProcessSupervisor {
Ok(())
}
pub async fn restart(self: &Arc<Self>) -> Result<()> {
if !matches!(*self.state_tx.borrow(), InstanceState::Stopped | InstanceState::Crashed { .. } | InstanceState::Unmanaged) {
self.stop().await?;
async fn restart(self: Arc<Self>) -> Result<()> {
if !matches!(
*self.state_tx.borrow(),
InstanceState::Stopped | InstanceState::Crashed { .. } | InstanceState::Unmanaged
) {
self.clone().stop().await?;
}
tokio::time::sleep(RESTART_PAUSE).await;
self.start().await
}
fn set_state(&self, state: InstanceState) {
// send_replace never fails even with zero receivers.
let _ = self.state_tx.send_replace(state);
}
}

View File

@@ -0,0 +1,80 @@
//! The supervision abstraction.
//!
//! A `Supervisor` owns the lifecycle of one game instance. Different games are
//! managed in fundamentally different ways — Rust/Conan/Soulmask are spawned OS
//! processes ([`crate::process::ProcessSupervisor`]); Dune is a docker-compose
//! stack ([`crate::docker_compose::DockerComposeSupervisor`]); future planes
//! (kubectl, AMP/podman, SSH) will be their own impls. The instance command
//! dispatch (`instancecmd::dispatch`) talks only to this trait, so it never
//! learns which management model is behind a given instance.
//!
//! Trait objects (`Arc<dyn Supervisor>`) need object-safe, dynamically
//! dispatchable async methods; native `async fn` in traits is not yet
//! dyn-compatible, so we use `#[async_trait]` (the battle-tested ecosystem
//! standard) to box the returned futures. The cost — one heap alloc per
//! lifecycle call — is irrelevant for start/stop/restart, which happen seconds
//! to minutes apart.
use std::sync::Arc;
use anyhow::Result;
use serde::Serialize;
use tokio::sync::watch;
/// Observable lifecycle state of one instance. Shared vocabulary across every
/// supervisor impl; serialized verbatim into heartbeats and status events
/// (`{"state":"running", ...}`).
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(rename_all = "snake_case", tag = "state")]
pub enum InstanceState {
/// Not lifecycle-managed (a process instance with no executable, etc.).
Unmanaged,
Stopped,
Starting,
Running,
Stopping,
/// Exited/died without a stop request.
Crashed {
#[serde(skip_serializing_if = "Option::is_none")]
exit_code: Option<i32>,
},
}
impl InstanceState {
pub fn as_label(&self) -> &'static str {
match self {
InstanceState::Unmanaged => "unmanaged",
InstanceState::Stopped => "stopped",
InstanceState::Starting => "starting",
InstanceState::Running => "running",
InstanceState::Stopping => "stopping",
InstanceState::Crashed { .. } => "crashed",
}
}
}
/// Lifecycle control + state observation for one instance.
///
/// `start`/`stop`/`restart` take `self: Arc<Self>` so an impl can hand a clone
/// to a spawned monitor task; callers hold an `Arc<dyn Supervisor>` and
/// `clone()` before each call. `watch_state` exposes the same channel the
/// status-event publisher drains, so panel push events stay decoupled from the
/// heartbeat cadence.
#[async_trait::async_trait]
pub trait Supervisor: Send + Sync {
/// The instance slug (a NATS subject segment).
fn instance_id(&self) -> &str;
/// Current cached state (cheap; no I/O).
fn state(&self) -> InstanceState;
/// Subscribe to state transitions.
fn watch_state(&self) -> watch::Receiver<InstanceState>;
/// Seconds since the instance entered `Running` (0 otherwise).
async fn uptime_seconds(&self) -> u64;
async fn start(self: Arc<Self>) -> Result<()>;
async fn stop(self: Arc<Self>) -> Result<()>;
async fn restart(self: Arc<Self>) -> Result<()>;
}

View File

@@ -129,7 +129,7 @@ pub async fn collect(agent: &Agent, sys: &mut System) -> HeartbeatPayload {
let mut instances = Vec::with_capacity(agent.cfg.instances.len());
for inst in &agent.cfg.instances {
let (state, uptime_seconds) = match agent.supervisors.get(&inst.id) {
Some(sup) if !matches!(sup.state(), crate::process::InstanceState::Unmanaged) => {
Some(sup) if !matches!(sup.state(), crate::supervisor::InstanceState::Unmanaged) => {
(sup.state().as_label().to_string(), sup.uptime_seconds().await)
}
_ => {

View File

@@ -0,0 +1,154 @@
//! Signed self-update.
//!
//! The agent only ever runs a binary whose minisign signature verifies against
//! the EMBEDDED public key below. Even if the CDN (which currently accepts
//! unauthenticated uploads) served a malicious binary, the agent refuses it
//! without a valid signature from the release private key (a CI secret).
//!
//! Flow: download binary + `.minisig` from the CDN → verify signature →
//! atomic swap (current → `.old`, new → current, rollback on failure) →
//! relaunch the new binary. Defence in depth mirrors the Vigilance updater:
//! a real URL parse rejecting credential-in-URL bypasses, an https + host
//! allowlist, and a size cap.
use anyhow::{bail, Context, Result};
use minisign_verify::{PublicKey, Signature};
use std::path::{Path, PathBuf};
use std::time::Duration;
/// minisign public key. The matching private key signs releases in CI
/// (Gitea Actions secret MINISIGN_SECRET_KEY). Rotating it means re-signing
/// every published artifact and shipping an agent build with the new key.
const PUBLIC_KEY: &str = "RWQKhJptuiwIkp31cZdz10z/R72UPZkl7/VtnZJ2Vfbe0dQfDlXHZYFC";
const ALLOWED_HOST: &str = "cdn.corrosionmgmt.com";
const MAX_BINARY_BYTES: usize = 100 * 1024 * 1024; // 100 MiB sanity cap
const DOWNLOAD_TIMEOUT: Duration = Duration::from_secs(600);
/// Verify a binary against the embedded public key + a minisign signature blob.
/// The security core of self-update — tampered or unsigned content is rejected.
pub fn verify_signature(binary: &[u8], signature_blob: &str) -> Result<()> {
let pk = PublicKey::from_base64(PUBLIC_KEY).context("embedded public key is invalid")?;
let sig = Signature::decode(signature_blob).context("malformed minisign signature")?;
pk.verify(binary, &sig, false)
.map_err(|e| anyhow::anyhow!("signature verification failed: {e}"))?;
Ok(())
}
/// Reject anything but `https://cdn.corrosionmgmt.com/...` with no embedded
/// credentials (the userinfo-bypass class).
pub fn assert_url_allowed(url: &str) -> Result<()> {
let parsed = reqwest::Url::parse(url).context("invalid update URL")?;
if parsed.scheme() != "https" {
bail!("update URL must be https");
}
if !parsed.username().is_empty() || parsed.password().is_some() {
bail!("update URL must not contain credentials");
}
if parsed.host_str() != Some(ALLOWED_HOST) {
bail!("update URL host not allowed: {:?}", parsed.host_str());
}
Ok(())
}
/// Download, verify, and atomically swap in a new agent binary. Does NOT
/// restart — the caller decides when to relaunch (after replying on NATS).
/// Returns the path of the now-current (new) binary.
pub async fn download_verify_swap(url: &str) -> Result<PathBuf> {
assert_url_allowed(url)?;
let sig_url = format!("{url}.minisig");
assert_url_allowed(&sig_url)?;
let client = reqwest::Client::builder()
.timeout(DOWNLOAD_TIMEOUT)
.build()
.context("building HTTP client")?;
let binary = client
.get(url)
.send()
.await
.with_context(|| format!("downloading {url}"))?
.error_for_status()
.context("update binary download failed")?
.bytes()
.await
.context("reading update binary")?;
if binary.len() > MAX_BINARY_BYTES {
bail!("update binary is {} bytes, exceeds the {MAX_BINARY_BYTES} cap", binary.len());
}
let signature = client
.get(&sig_url)
.send()
.await
.with_context(|| format!("downloading {sig_url}"))?
.error_for_status()
.context("signature download failed")?
.text()
.await
.context("reading signature")?;
verify_signature(&binary, &signature).context("refusing unsigned/tampered update")?;
tracing::info!("update signature verified ({} bytes)", binary.len());
let current = std::env::current_exe().context("resolving current executable")?;
swap_binary(&current, &binary)?;
tracing::info!("update swapped in at {}", current.display());
Ok(current)
}
/// Atomically replace `current` with `new_bytes`, keeping a `.old` backup and
/// rolling back if the rename fails.
pub fn swap_binary(current: &Path, new_bytes: &[u8]) -> Result<()> {
let dir = current.parent().unwrap_or_else(|| Path::new("."));
let stem = current.file_name().and_then(|s| s.to_str()).unwrap_or("corrosion-host-agent");
let new_path = dir.join(format!("{stem}.new"));
let backup = dir.join(format!("{stem}.old"));
std::fs::write(&new_path, new_bytes)
.with_context(|| format!("writing {}", new_path.display()))?;
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
std::fs::set_permissions(&new_path, std::fs::Permissions::from_mode(0o755))
.context("chmod +x on new binary")?;
}
let _ = std::fs::remove_file(&backup);
std::fs::rename(current, &backup)
.with_context(|| format!("backing up current binary to {}", backup.display()))?;
if let Err(e) = std::fs::rename(&new_path, current) {
// Roll back: restore the backup so the agent stays runnable.
let _ = std::fs::rename(&backup, current);
return Err(anyhow::anyhow!(e).context("installing new binary (rolled back)"));
}
Ok(())
}
/// Relaunch the (already-swapped) binary with the same args, then exit. No
/// service manager is required — the new process reconnects on its own. There
/// is a sub-second window with no agent; acceptable for an update.
pub fn relaunch_and_exit() -> ! {
let exe = std::env::current_exe().unwrap_or_else(|_| PathBuf::from("corrosion-host-agent"));
let args: Vec<String> = std::env::args().skip(1).collect();
tracing::info!("relaunching {} after update", exe.display());
#[cfg(unix)]
{
use std::os::unix::process::CommandExt;
// exec replaces this process image with the new binary — cleanest,
// no gap. Only returns on failure.
let err = std::process::Command::new(&exe).args(&args).exec();
tracing::error!("exec after update failed: {err}; exiting for service restart");
std::process::exit(70);
}
#[cfg(not(unix))]
{
let _ = std::process::Command::new(&exe).args(&args).spawn();
std::process::exit(0);
}
}

View File

@@ -0,0 +1,156 @@
//! DockerComposeSupervisor tests. A fake `docker` script records the exact
//! arguments it was invoked with and returns a controllable exit code, so we
//! assert the compose invocations + state transitions with no real Docker
//! daemon — the same mock-the-external-binary approach the steamcmd tests use.
#![cfg(unix)]
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};
use corrosion_host_agent::config::InstanceConfig;
use corrosion_host_agent::docker_compose::{DockerComposeConfig, DockerComposeSupervisor};
use corrosion_host_agent::supervisor::{InstanceState, Supervisor};
/// Write a fake `docker` executable that appends its args (space-joined) to
/// `args_log` and exits with the integer in `exit_file` (0 if absent).
fn fake_docker(dir: &Path, args_log: &Path, exit_file: &Path) -> PathBuf {
let script = dir.join("fakedocker");
let body = format!(
"#!/bin/sh\nprintf '%s\\n' \"$*\" >> '{}'\nexit \"$(cat '{}' 2>/dev/null || echo 0)\"\n",
args_log.display(),
exit_file.display(),
);
std::fs::write(&script, body).unwrap();
let mut perms = std::fs::metadata(&script).unwrap().permissions();
perms.set_mode(0o755);
std::fs::set_permissions(&script, perms).unwrap();
script
}
fn dune_instance(command: Vec<String>, service: Option<String>) -> InstanceConfig {
InstanceConfig {
id: "dune-main".to_string(),
game: "dune".to_string(),
root: PathBuf::from("/tmp"),
label: None,
executable: None,
args: vec![],
working_dir: None,
rcon: None,
steamcmd: None,
docker_compose: Some(DockerComposeConfig {
file: Some(PathBuf::from("docker-compose.yml")),
project: Some("duneproj".to_string()),
service,
command: Some(command),
}),
}
}
#[tokio::test]
async fn start_runs_compose_up_detached_and_sets_running() {
let dir = tempfile::tempdir().unwrap();
let args_log = dir.path().join("args.log");
let exit_file = dir.path().join("exit");
let docker = fake_docker(dir.path(), &args_log, &exit_file);
let sup = DockerComposeSupervisor::new(&dune_instance(
vec![docker.to_string_lossy().into_owned()],
None,
));
assert_eq!(sup.state(), InstanceState::Stopped);
sup.clone().start().await.expect("compose up should succeed");
assert_eq!(sup.state(), InstanceState::Running);
let logged = std::fs::read_to_string(&args_log).unwrap();
assert!(logged.contains("up -d"), "expected `up -d`; got: {logged}");
assert!(logged.contains("-p duneproj"), "expected project flag; got: {logged}");
assert!(logged.contains("-f docker-compose.yml"), "expected file flag; got: {logged}");
}
#[tokio::test]
async fn stop_runs_compose_stop_and_sets_stopped() {
let dir = tempfile::tempdir().unwrap();
let args_log = dir.path().join("args.log");
let exit_file = dir.path().join("exit");
let docker = fake_docker(dir.path(), &args_log, &exit_file);
let sup = DockerComposeSupervisor::new(&dune_instance(
vec![docker.to_string_lossy().into_owned()],
None,
));
sup.clone().start().await.expect("up");
sup.clone().stop().await.expect("compose stop should succeed");
assert_eq!(sup.state(), InstanceState::Stopped);
assert_eq!(sup.uptime_seconds().await, 0);
let logged = std::fs::read_to_string(&args_log).unwrap();
assert!(logged.lines().any(|l| l.contains("stop")), "expected a `stop` call; got: {logged}");
}
#[tokio::test]
async fn restart_runs_compose_restart() {
let dir = tempfile::tempdir().unwrap();
let args_log = dir.path().join("args.log");
let exit_file = dir.path().join("exit");
let docker = fake_docker(dir.path(), &args_log, &exit_file);
let sup = DockerComposeSupervisor::new(&dune_instance(
vec![docker.to_string_lossy().into_owned()],
None,
));
sup.clone().restart().await.expect("compose restart should succeed");
assert_eq!(sup.state(), InstanceState::Running);
let logged = std::fs::read_to_string(&args_log).unwrap();
assert!(logged.contains("restart"), "expected `restart`; got: {logged}");
}
#[tokio::test]
async fn single_service_is_targeted() {
let dir = tempfile::tempdir().unwrap();
let args_log = dir.path().join("args.log");
let exit_file = dir.path().join("exit");
let docker = fake_docker(dir.path(), &args_log, &exit_file);
let sup = DockerComposeSupervisor::new(&dune_instance(
vec![docker.to_string_lossy().into_owned()],
Some("gameserver".to_string()),
));
sup.clone().start().await.expect("up");
let logged = std::fs::read_to_string(&args_log).unwrap();
assert!(
logged.contains("up -d gameserver"),
"service must be appended after `up -d`; got: {logged}"
);
}
#[tokio::test]
async fn compose_failure_errors_and_reverts_state() {
let dir = tempfile::tempdir().unwrap();
let args_log = dir.path().join("args.log");
let exit_file = dir.path().join("exit");
std::fs::write(&exit_file, "1").unwrap(); // make the fake docker fail
let docker = fake_docker(dir.path(), &args_log, &exit_file);
let sup = DockerComposeSupervisor::new(&dune_instance(
vec![docker.to_string_lossy().into_owned()],
None,
));
let err = sup.clone().start().await.expect_err("nonzero compose exit must fail");
assert!(err.to_string().contains("compose up failed"), "got: {err}");
assert_eq!(sup.state(), InstanceState::Stopped, "failed start must revert to Stopped");
}
#[tokio::test]
async fn missing_docker_binary_errors_cleanly() {
let sup = DockerComposeSupervisor::new(&dune_instance(
vec!["/nonexistent/docker-xyz".to_string()],
None,
));
let err = sup.clone().start().await.expect_err("missing docker must fail");
assert!(err.to_string().contains("docker"), "error should mention docker: {err}");
assert_eq!(sup.state(), InstanceState::Stopped);
}

View File

@@ -347,6 +347,62 @@ fn jail_rejects_chained_symlink_escape() {
);
}
/// SECURITY REGRESSION: copying a directory that contains a symlink pointing
/// OUTSIDE the jail must NOT dereference it and pull external content inside.
/// jail() validates only the top-level src/dest; the recursive copy must
/// refuse symlinks itself or it becomes a read-escape exfiltration path.
#[cfg(unix)]
#[test]
fn copy_refuses_to_follow_symlink_out_of_jail() {
let dir = tempdir();
let root = dir.path();
let outside = tempdir();
std::fs::write(outside.path().join("secret.txt"), "TOP SECRET")
.expect("write external secret");
// A directory inside the jail containing a symlink to the outside dir.
std::fs::create_dir(root.join("src")).expect("mkdir src");
std::os::unix::fs::symlink(outside.path(), root.join("src").join("escape"))
.expect("plant symlink to outside");
// Attempt to copy src -> dest (both inside the jail).
let err = filemanager::copy(root, "src", "dest")
.expect_err("copy must refuse the embedded symlink");
assert!(
format!("{err:#}").contains("symlink"),
"error should name the refused symlink, got: {err:#}"
);
// The external secret must NOT have landed inside the jail.
assert!(
!root.join("dest").join("escape").join("secret.txt").exists(),
"external content leaked into the jail via symlink-following copy",
);
}
/// `list` must report a symlink as the link itself, never the dereferenced
/// target — otherwise it leaks the size/type of files outside the jail.
#[cfg(unix)]
#[test]
fn list_does_not_dereference_symlink_metadata() {
let dir = tempdir();
let root = dir.path();
std::os::unix::fs::symlink(Path::new("/etc/passwd"), root.join("leak"))
.expect("plant symlink");
let entries = filemanager::list(root, "").expect("list root");
let leak = entries.iter().find(|e| e.name == "leak").expect("symlink listed");
// /etc/passwd is a regular file; if we followed the link, is_dir would
// reflect the target. We must report the link, which is not a directory,
// and must NOT expose the target's byte size.
assert!(!leak.is_dir, "symlink must not be reported as a directory");
let target_size = std::fs::metadata("/etc/passwd").map(|m| m.len()).unwrap_or(0);
assert!(
leak.size != target_size || target_size == 0,
"list leaked the symlink target's size ({target_size} bytes)"
);
}
// ---------------------------------------------------------------------------
// Dispatch layer tests
// ---------------------------------------------------------------------------

View File

@@ -0,0 +1,2 @@
corrosion-host-agent signed-update test fixture
version 2.0.0-test

View File

@@ -0,0 +1,4 @@
untrusted comment: signature from minisign secret key
RUQKhJptuiwIkp378Z59BTwosDycAhmlhrdZZVwk1Vdb293OgcsXx0S3W0XezMtOXIXdgvQtW/DpDKlb1gdW4elQXLG5KFUgawI=
trusted comment: timestamp:1781222247 file:sample.bin hashed
QtUiOfJqRKYJZTL6QV93xeLVnODr8HXWvZIR3Q1AG0yqmqesZPyiKpVa9kD34Mwp1fQ76nx1Z7c6CB1v5KHQAw==

View File

@@ -8,7 +8,8 @@ use std::path::PathBuf;
use std::time::Duration;
use corrosion_host_agent::config::InstanceConfig;
use corrosion_host_agent::process::{InstanceState, ProcessSupervisor};
use corrosion_host_agent::process::ProcessSupervisor;
use corrosion_host_agent::supervisor::{InstanceState, Supervisor};
fn managed_instance(executable: &str, args: &[&str]) -> InstanceConfig {
InstanceConfig {
@@ -21,6 +22,7 @@ fn managed_instance(executable: &str, args: &[&str]) -> InstanceConfig {
working_dir: None,
rcon: None,
steamcmd: None,
docker_compose: None,
}
}
@@ -47,15 +49,15 @@ async fn start_status_stop_lifecycle() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
assert_eq!(sup.state(), InstanceState::Stopped);
sup.start().await.expect("start should succeed");
sup.clone().start().await.expect("start should succeed");
assert_eq!(sup.state(), InstanceState::Running);
tokio::time::sleep(Duration::from_millis(1100)).await;
assert!(sup.uptime_seconds().await >= 1, "uptime should advance");
// Double-start must be rejected while running.
assert!(sup.start().await.is_err(), "double start must fail");
assert!(sup.clone().start().await.is_err(), "double start must fail");
sup.stop().await.expect("stop should succeed");
sup.clone().stop().await.expect("stop should succeed");
let state = wait_for_state(&sup, |s| matches!(s, InstanceState::Stopped), Duration::from_secs(5)).await;
assert_eq!(state, InstanceState::Stopped);
assert_eq!(sup.uptime_seconds().await, 0);
@@ -64,7 +66,7 @@ async fn start_status_stop_lifecycle() {
#[tokio::test]
async fn unexpected_exit_is_crashed_with_code() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "sleep 0.2; exit 7"]));
sup.start().await.expect("start should succeed");
sup.clone().start().await.expect("start should succeed");
let state = wait_for_state(
&sup,
@@ -78,16 +80,16 @@ async fn unexpected_exit_is_crashed_with_code() {
#[tokio::test]
async fn restart_from_crashed_recovers() {
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "exit 1"]));
sup.start().await.expect("start should succeed");
sup.clone().start().await.expect("start should succeed");
wait_for_state(&sup, |s| matches!(s, InstanceState::Crashed { .. }), Duration::from_secs(5)).await;
// Restart from crashed must work (panel "Restart" after a crash).
// Use a long-lived command this time by replacing the supervisor — the
// command is fixed per supervisor, so emulate via a fresh one.
let sup2 = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
sup2.restart().await.expect("restart from stopped should start");
sup2.clone().restart().await.expect("restart from stopped should start");
assert_eq!(sup2.state(), InstanceState::Running);
sup2.stop().await.expect("cleanup stop");
sup2.clone().stop().await.expect("cleanup stop");
}
#[tokio::test]
@@ -96,14 +98,14 @@ async fn unmanaged_instance_rejects_process_commands() {
cfg.executable = None;
let sup = ProcessSupervisor::new(&cfg);
assert_eq!(sup.state(), InstanceState::Unmanaged);
assert!(sup.start().await.is_err(), "unmanaged start must fail");
assert!(sup.stop().await.is_err(), "unmanaged stop must fail");
assert!(sup.clone().start().await.is_err(), "unmanaged start must fail");
assert!(sup.clone().stop().await.is_err(), "unmanaged stop must fail");
}
#[tokio::test]
async fn missing_executable_fails_cleanly() {
let sup = ProcessSupervisor::new(&managed_instance("/nonexistent/bin/gameserver", &[]));
let err = sup.start().await.expect_err("must fail");
let err = sup.clone().start().await.expect_err("must fail");
assert!(err.to_string().contains("not found"), "error should say not found: {err}");
assert_eq!(sup.state(), InstanceState::Stopped, "failed start must not leave Starting state");
}

View File

@@ -0,0 +1,63 @@
//! Signed self-update tests — the security-critical part is signature
//! verification: a valid signature is accepted, anything tampered is rejected.
//! Fixtures (tests/fixtures/sample.bin + .minisig) were signed with the real
//! release private key, so these run with no key present (as in CI).
use corrosion_host_agent::update;
const SAMPLE: &[u8] = include_bytes!("fixtures/sample.bin");
const SAMPLE_SIG: &str = include_str!("fixtures/sample.bin.minisig");
#[test]
fn accepts_a_validly_signed_binary() {
update::verify_signature(SAMPLE, SAMPLE_SIG).expect("valid signature must verify");
}
#[test]
fn rejects_a_tampered_binary() {
let mut tampered = SAMPLE.to_vec();
tampered[0] ^= 0xFF; // flip a byte
let err = update::verify_signature(&tampered, SAMPLE_SIG)
.expect_err("tampered binary must be rejected");
assert!(err.to_string().contains("verification failed"), "got: {err}");
}
#[test]
fn rejects_a_garbage_signature() {
assert!(update::verify_signature(SAMPLE, "not a real minisig blob").is_err());
}
#[test]
fn rejects_empty_binary_against_real_sig() {
assert!(update::verify_signature(b"", SAMPLE_SIG).is_err());
}
#[test]
fn url_allowlist_enforced() {
// Allowed.
update::assert_url_allowed("https://cdn.corrosionmgmt.com/host-agent/alpha/corrosion-host-agent-linux-amd64")
.expect("the real CDN host must be allowed");
// http rejected.
assert!(update::assert_url_allowed("http://cdn.corrosionmgmt.com/x").is_err());
// wrong host rejected.
assert!(update::assert_url_allowed("https://evil.example.com/x").is_err());
// credential-in-URL (userinfo bypass) rejected.
assert!(update::assert_url_allowed("https://cdn.corrosionmgmt.com:[email protected]/x").is_err());
// host as userinfo trick rejected (real host is evil.com).
assert!(update::assert_url_allowed("https://[email protected]/x").is_err());
}
#[test]
fn swap_binary_replaces_and_backs_up() {
let dir = tempfile::tempdir().expect("tempdir");
let current = dir.path().join("corrosion-host-agent");
std::fs::write(&current, b"OLD BINARY").unwrap();
update::swap_binary(&current, b"NEW BINARY").expect("swap should succeed");
assert_eq!(std::fs::read(&current).unwrap(), b"NEW BINARY", "current is the new binary");
let backup = dir.path().join("corrosion-host-agent.old");
assert_eq!(std::fs::read(&backup).unwrap(), b"OLD BINARY", ".old holds the previous binary");
// the .new scratch file is consumed by the rename
assert!(!dir.path().join("corrosion-host-agent.new").exists());
}

View File

@@ -31,6 +31,9 @@ services:
volumes:
- nats_data:/data
- ./nats.conf:/etc/nats/nats.conf:ro
# Per-license authorization (generated on the host; carries secrets, not
# committed with real users — see scripts/generate-nats-auth.mjs).
- ./nats-auth.conf:/etc/nats/nats-auth.conf:ro
ports:
- "8089:4222" # Client connections
@@ -43,6 +46,12 @@ services:
DATABASE_URL: postgres://corrosion:${DB_PASSWORD:-corrosion_dev}@postgres:5432/corrosion
DATABASE_MAX_CONNECTIONS: "20"
NATS_URL: nats://nats:4222
# Privileged internal NATS user (full corrosion.> access). Empty = anonymous.
NATS_INTERNAL_USER: ${NATS_INTERNAL_USER:-}
NATS_INTERNAL_PASSWORD: ${NATS_INTERNAL_PASSWORD:-}
# Secret for deriving per-license agent passwords (shared with the
# nats-auth generator). HMAC-SHA256(license_id, secret).
NATS_TOKEN_SECRET: ${NATS_TOKEN_SECRET:-}
JWT_SECRET: ${JWT_SECRET}
JWT_ACCESS_EXPIRY_SECONDS: "14400"
JWT_REFRESH_EXPIRY_SECONDS: "604800"

18
docker/nats-auth.conf Normal file
View File

@@ -0,0 +1,18 @@
# BOOTSTRAP DEFAULT — no secrets, safe to commit.
#
# Anonymous is mapped to a HARMLESS namespace (corrosion.unclaimed.>), never to
# real tenant subjects (corrosion.{uuid}.>) — so a fresh/stale deploy running
# this default cannot read or forge any tenant's traffic. The REST API still
# works; agent telemetry just won't flow until the real config is generated.
#
# On every real deploy, scripts/generate-nats-auth.mjs OVERWRITES this file
# (on the host, not in git) with the privileged internal user + per-license
# scoped users. NATS_AUTH_STAGE defaults to "enforce" (anonymous rejected).
#
# NOTE: no_auth_user is a TOP-LEVEL field, NOT inside authorization { }.
authorization {
users: [
{ user: "anonymous", password: "", permissions: { publish: { allow: ["corrosion.unclaimed.>"] }, subscribe: { allow: ["corrosion.unclaimed.>"] } } }
]
}
no_auth_user: "anonymous"

View File

@@ -28,8 +28,11 @@ logtime: true
max_payload: 8MB # Support map file transfer metadata
max_connections: 10000
# Authorization — tokens validated per-connection
# Plugin and companion agents authenticate with license-specific tokens
authorization {
timeout: 5
}
# Authorization — per-license isolation.
# The committed nats-auth.conf is the SAFE OPEN default (anonymous full access,
# no secrets — same as before). On deploy, scripts/generate-nats-auth.mjs
# regenerates this file from the licenses table with the privileged internal
# user + per-license scoped users; flip NATS_AUTH_STAGE=enforce to reject
# anonymous. The host copy carries secrets and is NOT committed
# (git update-index --assume-unchanged docker/nats-auth.conf).
include "nats-auth.conf"

View File

@@ -0,0 +1,69 @@
# Reference Repos
Third-party Dune: Awakening server-management projects, kept here as **behavior
references** for Phase 2 (the Corrosion host-agent Dune adapter + future panel
Dune features). These are NOT Corrosion code and are not built or shipped — they
are read-only references. `.git` histories, `node_modules`, and compiled
binaries were stripped on import (the 38 MB `icehunter/web/dune-admin` build
artifact and a Tauri `.icns` are intentionally absent).
> Imported 2026-06-12 from `/tmp/dune-re`. Each was a separate upstream repo;
> see each project's own `LICENSE` and `README.md`. Treat as documentation.
## Why these are here
Dune: Awakening does **not** use SteamCMD or a plain game-server process like
Rust/Conan/Soulmask. It ships as **Docker container(s)** fronted by a **RabbitMQ
broker** (admin + game vhosts) and a **PostgreSQL** admin database (`dune`
schema), orchestrated as a "**battlegroup**". The game process is
`DuneSandboxServer-Linux-Shipping` (one per partition). Server settings live in
INI files (`UserEngine.ini` / `UserGame.ini`) and only take effect after a
restart. Our Dune adapter must model that container/broker/DB world instead of
the process+SteamCMD model — these repos are how that world actually works in
the wild.
## The references
### `icehunter/` — `dune-admin` (Go backend + React SPA)
The richest ops reference. A web admin panel with **four interchangeable control
planes**: `docker`, `kubectl`, `local`, and `amp` (CubeCoders AMP / podman).
Most relevant to us:
- **`SETUP_DOCKER.md`** — the Docker control plane: `docker start/stop/restart`
for lifecycle, `docker logs -f` for streaming, `docker exec` into the broker
container for RabbitMQ (`rabbitmqctl`) commands, direct TCP to the `dune`
Postgres. Optional SSH tunnelling when the admin is off-host. **This is the
closest analog to what the Corrosion host-agent Dune adapter must do.**
- `cmd/dune-admin/control_docker.go` / `control_kubectl.go` / `control_local.go`
/ `control_amp.go` — the `ControlPlane` interface and its implementations
(the start/stop/restart/status/log/broker abstraction we mirror as a Rust
game-adapter trait).
- `db.go` / `model.go` — the full Dune admin data model (players, bases,
inventory, exchange/market) for when Corrosion grows a richer Dune admin
surface beyond lifecycle.
- `CLAUDE.md` — upstream's own engineering notes; the AMP section documents the
INI-vs-API server-settings gotcha (AMP regenerates INIs on start).
### `adainrivers/` — Dune Dedicated Server Manager (Rust / Tauri desktop)
**The Rust reference.** Manages already-provisioned servers over **SSH +
Kubernetes** ("BattleGroup" start/stop/restart/update), with secure SSH tunnels
to Director / File Browser / Postgres / PgHero, an in-game admin console (item
grants, vehicle spawns, journey/XP tags), and a bundled **`dune-server-service`**
daemon for scheduled maintenance (timed restarts with in-game warnings, backups,
update apply). Closest to our stack idiomatically — read it for Rust patterns on
SSH control, the maintenance-daemon design, and the in-game command surface.
### `the4rchangel/` — Dune: Awakening Server Manager (Node.js local web UI)
**Matches the Commander's exact self-host path.** A local dashboard that
replaces the `battlegroup.bat` terminal menu — guided VM import (Hyper-V),
network, SSH, bootstrap, then daily ops: battlegroup start/stop/restart/update,
character editor, visual game-config editor (PvP, sandstorms, sandworms, mining
rates, decay, building limits), monitoring, DB access. Read it to understand the
`battlegroup.bat` workflow our agent has to drive on a Windows/Hyper-V host.
## How we use them
- **Lifecycle/control** → mirror `icehunter`'s `ControlPlane` docker provider as
the agent's Dune game-adapter (compose/`docker` lifecycle, `docker logs`
console, reject SteamCMD).
- **Rust idioms / maintenance daemon / SSH** → `adainrivers`.
- **Battlegroup.bat reality / setup flow / game-config schema** → `the4rchangel`.

View File

@@ -0,0 +1,71 @@
name: CI
on:
workflow_dispatch:
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
checks:
name: Workspace checks (${{ matrix.platform }})
runs-on: ${{ matrix.platform }}
strategy:
fail-fast: false
matrix:
platform: [windows-latest, ubuntu-22.04, macos-latest]
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@stable
- name: Install Node
uses: actions/setup-node@v4
with:
node-version: 22
cache: npm
cache-dependency-path: app/package-lock.json
- name: Install Linux Tauri dependencies
if: matrix.platform == 'ubuntu-22.04'
run: |
sudo apt-get update
sudo apt-get install -y libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf pkg-config libssl-dev
- name: Install frontend dependencies
working-directory: app
run: npm ci
- name: Rust format
run: cargo fmt --all -- --check
- name: Rust check
run: cargo check --workspace
- name: Rust tests
run: cargo test --workspace
- name: Core API docs
run: cargo doc -p dune-manager-core --no-deps
- name: Frontend build
working-directory: app
run: npm run build
- name: Tauri shell check
run: cargo check -p dune-dedicated-server-manager-app
- name: Secret and machine-constant scan
if: matrix.platform == 'windows-latest'
shell: pwsh
run: |
rg -n -S "I:|AutoUpdate|192\.168\.2\.|menna|dune-awakening|C:\\WINDOWS\\System32\\OpenSSH|C:\\Windows\\System32\\OpenSSH|change-me-before-exposing|c05564d|d177d3bbc40be761|qRmQx|FuncomLiveServices__ServiceAuthToken" . -g "!app/**/target/**" -g "!crates/**/target/**" -g "!target/**" -g "!app/node_modules/**" -g "!app/dist/**" -g "!*.md" -g "!app/steamcmd/**" -g "!app/dune-server/**" -g "!app/vm/**" -g "!app/vm-*/**" -g "!vm/**" -g "!.tmp/**"
if ($LASTEXITCODE -eq 0) {
throw "Secret or machine-specific constant scan found matches."
}
if ($LASTEXITCODE -ne 1) {
exit $LASTEXITCODE
}

View File

@@ -0,0 +1,203 @@
name: Release
on:
push:
tags:
- "v*.*.*"
workflow_dispatch:
inputs:
version:
description: "Version to release, for example 0.1.0"
required: true
type: string
permissions:
contents: write
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
linux-service-binary:
name: Build dune-server-service (musl)
runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
with:
targets: x86_64-unknown-linux-musl
- name: Install Zig
uses: mlugg/setup-zig@v1
with:
version: 0.13.0
- name: Install cargo-zigbuild
run: cargo install --locked cargo-zigbuild
- name: Resolve release version
shell: bash
env:
WORKFLOW_VERSION: ${{ inputs.version }}
run: |
version="$WORKFLOW_VERSION"
if [ -z "$version" ]; then
version="${GITHUB_REF_NAME#v}"
fi
if [ -z "$version" ]; then
echo "could not resolve release version" >&2
exit 1
fi
echo "RELEASE_VERSION=$version" >> "$GITHUB_ENV"
echo "RELEASE_TAG=v$version" >> "$GITHUB_ENV"
- name: Build musl binary
run: |
cargo zigbuild -p dune-server-service --release --target x86_64-unknown-linux-musl
strip target/x86_64-unknown-linux-musl/release/dune-server-service
- name: Stage release artifacts
run: |
mkdir -p release-artifacts
cp target/x86_64-unknown-linux-musl/release/dune-server-service release-artifacts/dune-server-service
cp crates/dune-server-service/systemd/dune-server-service.service release-artifacts/dune-server-service.service
cp crates/dune-server-service/openrc/dune-server-service release-artifacts/dune-server-service.openrc
- name: Upload artifact for desktop bundle
uses: actions/upload-artifact@v4
with:
name: dune-server-service-musl
path: release-artifacts/
retention-days: 7
- name: Resolve release notes
if: startsWith(github.ref, 'refs/tags/v')
shell: bash
run: |
notes_path="release-notes/${RELEASE_VERSION}.md"
if [ -f "$notes_path" ]; then
echo "RELEASE_BODY_PATH=$notes_path" >> "$GITHUB_ENV"
else
tmp=$(mktemp)
printf 'Release v%s. No release-notes/%s.md was provided — see the commit log for details.\n' \
"$RELEASE_VERSION" "$RELEASE_VERSION" > "$tmp"
echo "RELEASE_BODY_PATH=$tmp" >> "$GITHUB_ENV"
fi
- name: Attach to GitHub release
if: startsWith(github.ref, 'refs/tags/v')
uses: softprops/action-gh-release@v2
with:
tag_name: ${{ env.RELEASE_TAG }}
body_path: ${{ env.RELEASE_BODY_PATH }}
files: |
release-artifacts/dune-server-service
release-artifacts/dune-server-service.service
release-artifacts/dune-server-service.openrc
desktop-app:
name: Build ${{ matrix.name }} app
needs: linux-service-binary
runs-on: ${{ matrix.platform }}
strategy:
fail-fast: false
matrix:
include:
- name: Windows
platform: windows-latest
args: --bundles nsis
- name: Linux
platform: ubuntu-22.04
args: --bundles appimage,deb
- name: macOS Apple Silicon
platform: macos-latest
args: --target aarch64-apple-darwin --bundles dmg
- name: macOS Intel
platform: macos-latest
args: --target x86_64-apple-darwin --bundles dmg
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@stable
with:
targets: ${{ startsWith(matrix.name, 'macOS') && 'aarch64-apple-darwin,x86_64-apple-darwin' || '' }}
- name: Install Node
uses: actions/setup-node@v4
with:
node-version: 22
cache: npm
cache-dependency-path: app/package-lock.json
- name: Install Linux Tauri dependencies
if: matrix.platform == 'ubuntu-22.04'
run: |
sudo apt-get update
sudo apt-get install -y libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf pkg-config libssl-dev
- name: Install frontend dependencies
working-directory: app
run: npm ci
- name: Download bundled dune-server-service binary
uses: actions/download-artifact@v4
with:
name: dune-server-service-musl
path: app/src-tauri/binaries/
- name: Resolve release version
shell: pwsh
env:
WORKFLOW_VERSION: ${{ inputs.version }}
run: |
$version = $env:WORKFLOW_VERSION
if ([string]::IsNullOrWhiteSpace($version)) {
$version = "${{ github.ref_name }}".TrimStart("v")
}
if ([string]::IsNullOrWhiteSpace($version)) {
throw "Release version could not be resolved."
}
"RELEASE_VERSION=$version" | Out-File -FilePath $env:GITHUB_ENV -Append
"RELEASE_TAG=v$version" | Out-File -FilePath $env:GITHUB_ENV -Append
- name: Prepare release config
shell: pwsh
run: |
$version = $env:RELEASE_VERSION
Push-Location app
npm version --no-git-tag-version --allow-same-version $version
Pop-Location
$tauriConfigPath = "app/src-tauri/tauri.conf.json"
$config = Get-Content $tauriConfigPath -Raw
$config = $config -replace '"version":\s*"[^"]+"', ('"version": "' + $version + '"')
# Release builds publish signed updater artifacts; the checked-in
# default keeps this off so local debug builds do not require
# TAURI_SIGNING_PRIVATE_KEY.
$config = $config -replace '"createUpdaterArtifacts":\s*false', '"createUpdaterArtifacts": true'
Set-Content -Path $tauriConfigPath -Value $config -NoNewline
# The body is set by the linux-service-binary job's softprops step.
# tauri-action only uploads desktop bundles + the signed updater
# artifacts here; we don't pass releaseBody to avoid clobbering.
- name: Build and publish Tauri release
uses: tauri-apps/tauri-action@v0
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
TAURI_SIGNING_PRIVATE_KEY: ${{ secrets.TAURI_SIGNING_PRIVATE_KEY }}
TAURI_SIGNING_PRIVATE_KEY_PASSWORD: ${{ secrets.TAURI_SIGNING_PRIVATE_KEY_PASSWORD }}
VITE_ENABLE_STARTUP_UPDATE_CHECK: "true"
with:
projectPath: app
tagName: ${{ env.RELEASE_TAG }}
releaseName: "Dune Dedicated Server Manager ${{ env.RELEASE_TAG }}"
releaseDraft: false
prerelease: false
args: ${{ matrix.args }}

View File

@@ -0,0 +1,68 @@
# Dependencies
node_modules/
app/node_modules/
# Frontend build
dist/
app/dist/
app/src-tauri/gen/schemas/
# Rust/Tauri build outputs
target/
src-tauri/target/
app/src-tauri/target/
manager-api/target/
# Local environment
.env
.env.*
!.env.example
# Logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
pnpm-debug.log*
# Docs are scratch notes for now; keep README trackable later
*.md
!README.md
!docs/
!docs/*.md
docs/rabbitmq-protocol.md
# Release notes go on GitHub releases via the release workflow.
!release-notes/
!release-notes/*.md
# Editor and OS noise
.idea/
.vscode/
*.swp
*.swo
Thumbs.db
Desktop.ini
# Local app/runtime data and secrets
.tmp/
.playwright-mcp/
app/default-config.json
app/steamcmd/
app/dune-server/
dune-server/
app/vm/
app/vm-*/
app/src-tauri/dune-server/
app/src-tauri/vm/
app/src-tauri/resources/manager-api/dune-manager-api
app/src-tauri/resources/manager-api/dune-manager-api.exe
vm/
*.pem
*.key
sshKey
codex_vm_ed25519_dropbear
codex_vm_ed25519_dropbear.pub
snapshots/
keys/
initial-setup-log.txt
secrets/

7156
docs/reference-repos/adainrivers/Cargo.lock generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,7 @@
[workspace]
members = ["crates/dune-manager-core", "crates/dune-server-service", "app/src-tauri"]
resolver = "2"
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"

View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2026 gaming.tools
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

View File

@@ -0,0 +1,59 @@
# Dune Dedicated Server Manager
A desktop manager for existing Dune Awakening dedicated servers.
![Dashboard — BattleGroup status, lifecycle actions, management service, and tunnel controls](images/ss-1.png)
The app manages already-provisioned Dune dedicated servers over SSH and
Kubernetes control commands. It does not install the game server, create VMs,
configure Hyper-V, provision Ubuntu, or manage external tools such as SteamCMD.
## Features
- Remote server profile management with SSH private-key authentication
- BattleGroup status, start, stop, restart, and update controls
- Component diagnostics, log viewing, and safe restart actions
- Secure Director, File Browser, PostgreSQL, and PgHero access through local SSH tunnels
- Bundled `dune-server-service` daemon for on-host scheduled maintenance (daily restarts with in-game warnings, automated backups, server update check + apply) — installed over SSH straight from the Management card
- Admin console for in-game actions: item grants, vehicle spawns, skill/journey/XP tags, player lookup with live pawn location, and a logged history of every published command
- Automated tasks tab with editable schedule settings (daily restart time, warning lead/frequency, update apply lead, IANA timezone) — saving auto-restarts the service so changes apply immediately
- Welcome Package automation: a per-player onboarding chain (item grants, water refill, welcome whisper) driven by Postgres player detection, tracked in the management service's SQLite ledger, and configurable from the Welcome Package tab with both a visual editor and a raw JSON mode
![Admin tab — granting items to online players with a searchable Funcom item picker](images/ss-2.png)
More management features coming soon.
## Install
Download the latest release for your operating system from GitHub Releases.
- Windows: run the NSIS installer.
- Linux: use the AppImage or Debian package.
- macOS: use the DMG for your Mac architecture.
After launching the app, add an existing server profile with its host, SSH user,
and private key path, then refresh it to detect BattleGroups and management
endpoints.
## Managed Server Assumptions
The target server must already be installed and reachable over SSH. The app
expects the Dune Kubernetes resources and vendor management scripts to exist on
the server before you add it.
Required player-facing/server ports depend on your own server deployment. A
typical dedicated-server deployment uses:
- UDP 7777-7810 for game servers
- TCP 31982 for RMQ
If you found a bug or are having other issues, please create an issue here:
https://github.com/adainrivers/dune-dedicated-server-manager/issues
## Building From Source
See [Building From Source](docs/building-from-source.md).
## License
MIT License. See [LICENSE](LICENSE).

View File

@@ -0,0 +1,15 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Dune Dedicated Server Manager</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Funnel+Display:wght@400;500;600;700&family=Geist:wght@300;400;500;600;700&family=Geist+Mono:wght@400;500;600&display=swap" rel="stylesheet">
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,32 @@
{
"name": "dune-dedicated-server-manager-app",
"private": true,
"version": "0.3.16",
"type": "module",
"scripts": {
"dev": "vite --host 127.0.0.1 --port 1420",
"build": "tsc && vite build",
"preview": "vite preview --host 127.0.0.1 --port 1420",
"tauri": "tauri"
},
"dependencies": {
"@radix-ui/react-icons": "^1.3.2",
"@radix-ui/themes": "^3.2.1",
"@tauri-apps/api": "^2.0.0",
"@tauri-apps/plugin-dialog": "^2.7.1",
"@tauri-apps/plugin-process": "^2.3.1",
"@tauri-apps/plugin-shell": "^2.3.5",
"@tauri-apps/plugin-updater": "^2.10.1",
"markdown-to-jsx": "^9.8.1",
"react": "^18.3.1",
"react-dom": "^18.3.1"
},
"devDependencies": {
"@tauri-apps/cli": "^2.0.0",
"@types/react": "^18.3.12",
"@types/react-dom": "^18.3.1",
"@vitejs/plugin-react": "^4.3.3",
"typescript": "^5.6.3",
"vite": "^5.4.10"
}
}

View File

@@ -0,0 +1,26 @@
[package]
name = "dune-dedicated-server-manager-app"
version = "0.2.0"
description = "Desktop shell for Dune Dedicated Server Manager"
authors = ["Dune Dedicated Server Manager"]
edition = "2021"
[lib]
name = "dune_dedicated_server_manager_app_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
[build-dependencies]
tauri-build = { version = "2", features = [] }
[dependencies]
dune-manager-core = { path = "../../crates/dune-manager-core" }
tauri = { version = "2", features = ["devtools"] }
serde = { workspace = true }
serde_json = { workspace = true }
tauri-plugin-dialog = "2"
tauri-plugin-updater = "2"
tauri-plugin-process = "2"
tauri-plugin-shell = "2"
base64 = "0.22"
chrono = { version = "0.4", default-features = false, features = ["clock", "std"] }
reqwest = { version = "0.12", default-features = false, features = ["json"] }

View File

@@ -0,0 +1,6 @@
# Populated by CI from the `linux-service-binary` job artifact, or locally
# via `cargo zigbuild -p dune-server-service --release --target
# x86_64-unknown-linux-musl` + manual copy. Not tracked.
dune-server-service
dune-server-service.service
dune-server-service.openrc

View File

@@ -0,0 +1,23 @@
# Bundled service binaries
This directory holds the Linux `dune-server-service` binary (musl-static), its
systemd unit, and its OpenRC init script. They are populated by the
`linux-service-binary` job in `.github/workflows/release.yml` and bundled into
the desktop installer as Tauri resources.
For local debug builds the directory can be empty — the `install_management_service`
Tauri command surfaces a friendly error when the resource is missing.
For a local end-to-end test, build the service yourself:
```powershell
rustup target add x86_64-unknown-linux-musl
cargo install --locked cargo-zigbuild
cargo zigbuild -p dune-server-service --release --target x86_64-unknown-linux-musl
Copy-Item target\x86_64-unknown-linux-musl\release\dune-server-service `
app\src-tauri\binaries\dune-server-service
Copy-Item crates\dune-server-service\systemd\dune-server-service.service `
app\src-tauri\binaries\dune-server-service.service
Copy-Item crates\dune-server-service\openrc\dune-server-service `
app\src-tauri\binaries\dune-server-service.openrc
```

View File

@@ -0,0 +1,67 @@
fn main() {
expose_dune_server_service_version();
rerun_if_bundled_binaries_change();
tauri_build::build();
}
/// Tauri's resource-copy step only fires when Cargo decides build.rs needs to
/// re-run, which by default doesn't watch arbitrary files. Without these
/// `rerun-if-changed` lines, refreshing the bundled `dune-server-service`
/// binary or its systemd/openrc units in `binaries/` after a previous build
/// produces a stale `target/release/binaries/` copy — the running exe then
/// pushes the OLD binary on Install/Update, with no visible signal.
fn rerun_if_bundled_binaries_change() {
let dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("binaries");
// Watch the directory itself so file additions/deletions also trigger a rerun.
println!("cargo:rerun-if-changed={}", dir.display());
if let Ok(entries) = std::fs::read_dir(&dir) {
for entry in entries.flatten() {
let path = entry.path();
// Skip README, .gitignore, and similar bookkeeping files.
if matches!(
path.file_name().and_then(|n| n.to_str()),
Some("README.md") | Some(".gitignore")
) {
continue;
}
println!("cargo:rerun-if-changed={}", path.display());
}
}
}
fn expose_dune_server_service_version() {
let cargo_toml = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
.join("../../crates/dune-server-service/Cargo.toml");
println!("cargo:rerun-if-changed={}", cargo_toml.display());
let contents = std::fs::read_to_string(&cargo_toml)
.unwrap_or_else(|err| panic!("reading {}: {err}", cargo_toml.display()));
let version = parse_package_version(&contents).unwrap_or_else(|| {
panic!(
"could not find [package].version in {}",
cargo_toml.display()
)
});
println!("cargo:rustc-env=DUNE_SERVER_SERVICE_VERSION={version}");
}
fn parse_package_version(toml: &str) -> Option<String> {
let mut in_package = false;
for line in toml.lines() {
let trimmed = line.trim();
if trimmed.starts_with('[') {
in_package = trimmed == "[package]";
continue;
}
if !in_package {
continue;
}
if let Some(rest) = trimmed.strip_prefix("version") {
let rest = rest.trim_start();
let rest = rest.strip_prefix('=')?.trim_start();
let rest = rest.trim_start_matches('"');
let end = rest.find('"')?;
return Some(rest[..end].to_string());
}
}
None
}

View File

@@ -0,0 +1,7 @@
{
"$schema": "../gen/schemas/desktop-schema.json",
"identifier": "default",
"description": "Default desktop app permissions",
"windows": ["main"],
"permissions": ["core:default", "dialog:allow-open", "process:default", "shell:allow-open", "updater:default"]
}

Binary file not shown.

After

Width:  |  Height:  |  Size: 24 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 65 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.3 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 82 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 94 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 8.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.8 KiB

View File

@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
<foreground android:drawable="@mipmap/ic_launcher_foreground"/>
<background android:drawable="@color/ic_launcher_background"/>
</adaptive-icon>

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.0 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 18 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 55 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 100 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 37 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 152 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 40 KiB

View File

@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
<color name="ic_launcher_background">#fff</color>
</resources>

Binary file not shown.

After

Width:  |  Height:  |  Size: 83 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 193 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1005 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.9 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.7 KiB

Some files were not shown because too many files have changed in this diff Show More