Compare commits
27 Commits
agent-v2.0
...
agent-v2.0
| Author | SHA1 | Date | |
|---|---|---|---|
| | d13f2cb8b1 | | |
| | 651a35d4be | | |
| | 0715492ddf | | |
| | 4ef5db5b0d | | |
| | bb71763714 | | |
| | f18b45e3f2 | | |
| | 702de24e28 | | |
| | 6b3e805ac2 | | |
| | 7c84912ff5 | | |
| | 355a53f6e3 | | |
| | 589516a021 | | |
| | f60e6abd33 | | |
| | 877fadcb6c | | |
| | e897a4802f | | |
| | c0b20f2f78 | | |
| | 06e832fca1 | | |
| | 009ceb86ad | | |
| | 6f31c41dc3 | | |
| | 99433a09d1 | | |
| | b442ef4102 | | |
| | 856106174a | | |
| | 463908b18e | | |
| | 00cff51ce5 | | |
| | 7a07d600e7 | | |
| | 4a4ae7a5d4 | | |
| | 930f655bf5 | | |
| | 700dc2254d | | |
@@ -67,6 +67,43 @@ jobs:
           sha256sum corrosion-host-agent-windows-amd64.exe >> checksums.txt
           cat checksums.txt
 
+      - name: Sign artifacts (minisign)
+        env:
+          MINISIGN_SECRET_KEY: ${{ secrets.MINISIGN_SECRET_KEY }}
+        run: |
+          if [ -z "$MINISIGN_SECRET_KEY" ]; then
+            echo "::error::MINISIGN_SECRET_KEY secret is not set — refusing to publish unsigned agent artifacts."
+            exit 1
+          fi
+          # minisign isn't packaged for bullseye — fetch the official static binary.
+          curl -sSL https://github.com/jedisct1/minisign/releases/download/0.12/minisign-0.12-linux.tar.gz -o /tmp/minisign.tgz
+          tar -xzf /tmp/minisign.tgz -C /tmp
+          MINISIGN="$(find /tmp -type f -name minisign -path '*linux*' | head -1)"
+          chmod +x "$MINISIGN"
+          "$MINISIGN" -v
+          # A minisign secret key file is TWO lines (comment + base64 blob). CI
+          # secret storage mangles embedded newlines, collapsing it to one line
+          # so minisign can't load it. Preferred form: store the secret
+          # base64-encoded (single line) — we decode it here. Auto-detect so a
+          # correctly-stored raw two-line key still works.
+          if printf '%s' "$MINISIGN_SECRET_KEY" | base64 -d 2>/dev/null | head -1 | grep -q "untrusted comment:"; then
+            printf '%s' "$MINISIGN_SECRET_KEY" | base64 -d > /tmp/sign.key
+          else
+            printf '%s\n' "$MINISIGN_SECRET_KEY" > /tmp/sign.key
+          fi
+          if ! head -1 /tmp/sign.key | grep -q "untrusted comment:"; then
+            echo "::error::MINISIGN_SECRET_KEY is neither base64 of a minisign key nor a raw two-line key file. Store it as: base64 < your-secret.key | tr -d '\n'"
+            rm -f /tmp/sign.key
+            exit 1
+          fi
+          cd corrosion-host-agent/bin
+          # Passwordless key (-W generated); feed empty stdin so it never blocks.
+          for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
+            "$MINISIGN" -S -s /tmp/sign.key -m "$f" -x "$f.minisig" < /dev/null
+          done
+          rm -f /tmp/sign.key
+          echo "signed: $(ls *.minisig)"
+
       - name: Create Release
         env:
           RELEASE_TOKEN: ${{ secrets.RELEASE_TOKEN }}
@@ -82,7 +119,9 @@ jobs:
             "${API_URL}/repos/${REPO}/releases")
           RELEASE_ID=$(echo "$RESPONSE" | grep -o '"id":[0-9]*' | head -1 | grep -o '[0-9]*')
 
-          for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
+          for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-linux-amd64.minisig \
+                   corrosion-host-agent-windows-amd64.exe corrosion-host-agent-windows-amd64.exe.minisig \
+                   checksums.txt checksums.txt.minisig; do
             curl -s -X POST \
               -H "Authorization: token ${RELEASE_TOKEN}" \
               -H "Content-Type: application/octet-stream" \
@@ -95,7 +134,9 @@ jobs:
           CDN_URL="https://cdn.corrosionmgmt.com"
           VERSION="${{ steps.version.outputs.VERSION }}"
 
-          for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-windows-amd64.exe checksums.txt; do
+          for f in corrosion-host-agent-linux-amd64 corrosion-host-agent-linux-amd64.minisig \
+                   corrosion-host-agent-windows-amd64.exe corrosion-host-agent-windows-amd64.exe.minisig \
+                   checksums.txt checksums.txt.minisig; do
             curl -s -X POST \
               -F "file=@corrosion-host-agent/bin/$f" \
               "${CDN_URL}/host-agent/alpha/$f"
29
CHANGELOG.md
@@ -4,6 +4,35 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added (Host-agent Phase 2 — Dune docker-compose adapter — 2026-06-12)
+
+**`Supervisor` trait abstraction (`corrosion-host-agent`):**
+- Introduced `trait Supervisor` (via `async-trait`, the battle-tested ecosystem standard) so the agent can manage games with fundamentally different models behind one wire contract. `ProcessSupervisor` (spawned OS process — Rust/Conan/Soulmask) and the new `DockerComposeSupervisor` (Dune) both implement it; `Agent.supervisors` is now `HashMap<String, Arc<dyn Supervisor>>` and the instance command dispatch (`instancecmd::dispatch`) is fully game-agnostic — `start`/`stop`/`restart`/`status` are identical across games. A per-game factory in `main` selects the impl. `InstanceState` moved to the shared `supervisor` module.
+- **Architecture call** (per Commander): chose the `dyn` trait over a zero-dependency enum because the Dune references point at *several* future management planes (kubectl, AMP/podman, SSH) — a trait makes each new plane "new struct + impl," no central match to edit.
+
+**`DockerComposeSupervisor` (Dune: Awakening):**
+- Drives `docker compose up -d` / `stop` / `restart` against the instance's compose project (a "battlegroup"), with `-f`/`-p`/single-service support and a configurable compose binary (`docker compose` default, `docker-compose` legacy). New `[instance.docker_compose]` config block (file/project/service/command, all optional). `steam_update` already rejected for Dune (Docker images, no SteamCMD).
+- **Scope (first cut):** lifecycle + cached state. Deferred to Phase 3b (with process PID adoption): container crash-detection and state adoption on agent restart (both reconcilable with a `docker compose ps` probe).
+- Verified: 6 new docker-compose tests (mock `docker` binary asserting exact invocations + state transitions + failure paths) + the 5 refactored process-supervisor tests; full agent suite 56 tests green, zero warnings. Live verification against a real Dune stack pending the Commander standing one up.
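The invocation shape described in the `DockerComposeSupervisor` entries can be illustrated with a small sketch. It is written in TypeScript purely for illustration (the agent itself is Rust), and `ComposeConfig`, `ComposeAction`, and `composeArgv` are hypothetical names chosen to mirror the `file`/`project`/`service`/`command` options listed above; the real agent may assemble its arguments differently.

```typescript
// Hypothetical mirror of the optional [instance.docker_compose] settings.
interface ComposeConfig {
  file?: string; // passed as -f
  project?: string; // passed as -p
  service?: string; // restricts the action to a single service
  command?: string; // "docker compose" by default, or legacy "docker-compose"
}

type ComposeAction = "start" | "stop" | "restart";

// Build the full argv for one lifecycle action.
function composeArgv(cfg: ComposeConfig, action: ComposeAction): string[] {
  // Split the configured command so "docker compose" becomes ["docker", "compose"].
  const argv = (cfg.command || "docker compose").trim().split(/\s+/);
  if (cfg.file) argv.push("-f", cfg.file);
  if (cfg.project) argv.push("-p", cfg.project);
  // "start" maps to `up -d`; stop and restart use the same-named subcommands.
  if (action === "start") {
    argv.push("up", "-d");
  } else {
    argv.push(action);
  }
  if (cfg.service) argv.push(cfg.service);
  return argv;
}
```

Keeping argv construction in a pure function is also what makes a mock-binary test practical: the expected invocation can be compared as a plain list of strings.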
+
+### Changed (Fleet-driven active game + signed-update CI fix — 2026-06-12)
+
+**Frontend — active game follows the deployed fleet:**
+- The panel's active game (shell skin + sidebar nav + dashboard terminology) is now **derived from the deployed instances** instead of a localStorage-only toggle. `syncActiveGameFromFleet()` reads the distinct `game` values of the license's instances (`game_instances.game`, reported by the host agent): exactly one game deployed → the shell auto-skins to it; zero or multiple → `all` (neutral house skin). Wired into `DashboardLayout` (the always-mounted admin shell) via a watch on the fleet store.
+- A manual GameSwitcher pick still wins — it persists to `cc-active-game` and suppresses auto-derive (operator intent beats the heuristic). Un-overridden panels keep tracking the fleet across sessions.
+- **No backend/schema change:** a license's game(s) are the distinct games of its instances — the normalized source of truth. Deliberately did NOT add a `licenses.game` column (would duplicate `game_instances.game` and drift; see Lesson 20).
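The derivation rule in these entries (one distinct game skins the shell, zero or several fall back to `all`, and a manual pick wins) is small enough to sketch. The function below is an assumption-laden illustration, not the project's `syncActiveGameFromFleet()`: `FleetInstance`, `deriveActiveGame`, and the `manualOverride` parameter are hypothetical names.

```typescript
// Minimal shape of a fleet instance; only `game` matters for this rule.
interface FleetInstance {
  game: string;
}

// Returns the game to skin the shell with, or "all" for the neutral skin.
// `manualOverride` stands in for a persisted operator pick (hypothetical).
function deriveActiveGame(
  instances: FleetInstance[],
  manualOverride: string | null = null,
): string {
  // Operator intent beats the heuristic.
  if (manualOverride) return manualOverride;

  // Collect the distinct games across all instances.
  const games: string[] = [];
  for (const inst of instances) {
    if (games.indexOf(inst.game) === -1) games.push(inst.game);
  }

  // Exactly one distinct game: use it. Zero or several: neutral skin.
  return games.length === 1 ? games[0] : "all";
}
```

Because the result is computed from the instance list on every call, there is no second copy of the license's game that could drift from the instances.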
+
+**Frontend — sidebar agent-health footer is now fleet-aware:**
+- The shell footer read a single legacy `server.connection` (one `server_connections` row), which disagreed with the multi-host fleet. Repointed it at the fleet store: one host → hostname + status + last-heartbeat; multiple → `{online}/{total} online` + total instance count. Tone aggregates (all online → healthy, some → degraded, none → offline). Dropped the legacy `useServerStore` dependency from the shell entirely.
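A rough sketch of the footer aggregation described above, under two assumptions: a host counts as online only when its status is `connected`, and the summary carries a label, a tone, and a total instance count. `FleetHost`, `FooterSummary`, and `summarizeFleet` are hypothetical names, not the frontend's actual store API.

```typescript
type HostStatus = "connected" | "degraded" | "offline";
type Tone = "healthy" | "degraded" | "offline";

interface FleetHost {
  hostname: string;
  status: HostStatus;
  instanceCount: number;
}

interface FooterSummary {
  tone: Tone;
  label: string;
  instances: number;
}

function summarizeFleet(hosts: FleetHost[]): FooterSummary {
  // Assumption: only "connected" hosts count as online.
  const online = hosts.filter((h) => h.status === "connected").length;
  const instances = hosts.reduce((sum, h) => sum + h.instanceCount, 0);

  // All online: healthy. Some online: degraded. None online: offline.
  let tone: Tone;
  if (hosts.length > 0 && online === hosts.length) tone = "healthy";
  else if (online > 0) tone = "degraded";
  else tone = "offline";

  // A single host shows its own name and status; several show a ratio.
  const label =
    hosts.length === 1
      ? `${hosts[0].hostname} (${hosts[0].status})`
      : `${online}/${hosts.length} online`;

  return { tone, label, instances };
}
```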
+
+**Frontend — removed dead `vuefinder` dependency:**
+- VueFinder was replaced by the native instance-scoped file manager but the plugin (and its CSS) were still globally registered in `main.ts` and shipped in the bundle. Removed the dep + the three `main.ts` lines. Side effect: the main JS chunk dropped **588 kB → 165 kB** (vuefinder bundled an entire unused file-manager UI).
+
+**Recon note (not a change):** `corrosion.{license}.cmd.server` was on the cleanup list as "dead v1" — it is NOT. It remains the live license-level command path for all plugin/module config applies, plugin install, scheduled tasks, and legacy start/stop/restart, served only by the legacy Go agent. The Rust agent does not implement it yet — this is a **parity/migration gap** (Phase 2+), not dead code. Left intact.
+
+**CI — signed host-agent build:**
+- Fixed the `Sign artifacts (minisign)` step (`Error while loading the secret key file`): a minisign secret key is two lines and CI secret storage mangles the embedded newline. The job now base64-decodes the secret (single-line, mangling-proof) with auto-detect fallback to a raw key. `MINISIGN_SECRET_KEY` must be stored as `base64 < secret.key | tr -d '\n'`. Verified end-to-end: `agent-v2.0.0-alpha.8` Linux + Windows binaries validate against the agent's embedded public key; tampered byte rejected.
+
 ### Added (Host-Agent v2 Consumer + SEO Meta — 2026-06-11)
 
 **Backend (NestJS):**
@@ -447,3 +447,9 @@ Things I discovered about myself building a sister platform across multiple sess
 24. **`onModuleInit` runs before async `onModuleInit` of dependencies completes — register NATS/external subscriptions in `onApplicationBootstrap`.** `NatsService.onModuleInit` connects to NATS (async); `NatsBridgeService`/`HostAgentConsumerService` registered their subscriptions in their own `onModuleInit`, which fired while the connection was still null — so every `subscribe()` hit the `[OFFLINE]` no-op path and the WS bridge was dead-on-boot in *every* production build, silently. Nest guarantees `onApplicationBootstrap` runs only after all module init (including the awaited connect) finishes. Anything that depends on another provider's async startup belongs in bootstrap, not init. The tell: a subscription that "should be there" but the handler never fires and there's no error — trace the *startup ordering*, not the handler.
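The ordering hazard in Lesson 24 can be reproduced with a dependency-free toy model. This is not Nest's lifecycle code: `FakeBroker`, `Subscriber`, and `boot` are hypothetical, and the model simply assumes that init hooks are started together and that the bootstrap hook runs only after all of them have settled.

```typescript
// Stand-in for a service whose init hook connects asynchronously.
class FakeBroker {
  connected = false;

  async onModuleInit(): Promise<void> {
    // Simulate a connect that completes on a later tick.
    await new Promise<void>((resolve) => setTimeout(resolve, 10));
    this.connected = true;
  }
}

// Stand-in for a service that wants to subscribe once the broker is up.
class Subscriber {
  private readonly broker: FakeBroker;
  initSawConnection = false;
  bootstrapSawConnection = false;

  constructor(broker: FakeBroker) {
    this.broker = broker;
  }

  onModuleInit(): void {
    // Runs while the broker's connect is still pending.
    this.initSawConnection = this.broker.connected;
  }

  onApplicationBootstrap(): void {
    // Runs after every init hook has finished.
    this.bootstrapSawConnection = this.broker.connected;
  }
}

async function boot(): Promise<Subscriber> {
  const broker = new FakeBroker();
  const sub = new Subscriber(broker);
  // Start all init hooks together, then wait for them to settle.
  await Promise.all([broker.onModuleInit(), Promise.resolve(sub.onModuleInit())]);
  sub.onApplicationBootstrap();
  return sub;
}
```

In this model the subscriber's init hook observes a disconnected broker while its bootstrap hook observes a connected one, which matches the symptom described in the lesson: no error, just a subscription that never took effect.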
 
 25. **Fixing a dead code path detonates the live code behind it — budget for the second bug.** The moment Lesson 24's fix made the NATS→WS bridge actually deliver events, the API crashed on the first forwarded heartbeat: `WebSocket.OPEN` was `undefined` at runtime because `esModuleInterop` is off, so `import WebSocket from 'ws'` compiled to `ws_1.default` (undefined). That crash had sat behind the dead bridge since the gateway was written — never hit because no event ever reached it. When you resurrect a path that was silently no-op, everything downstream of it is effectively *untested code running for the first time in production*. Verify the whole chain end-to-end (I watched the DB row appear, then flip offline), don't stop at "the subscription fires now." This is Lesson 10 with a fuse on it. Import-runtime gotcha worth remembering: when `esModuleInterop` is off, prefer instance constants (`client.OPEN`) over class statics (`WebSocket.OPEN`) for `ws`.
+
+26. **A jail check at the entry point does not jail the recursive walk behind it — and my own "line-by-line" review missed it; the automated security review didn't.** The file manager's `jail()` correctly canonicalized and prefix-checked the top-level path, and I traced every escape vector through it and signed off. But `copy_recursive` then walked the directory tree with `fs::metadata` (which *follows* symlinks). A symlink planted inside the jail pointing at `/etc`, then a `copy` of its parent, would dereference it and pull external content *into* the jail to be read — a jail escape the entry check never sees, because the escape is reintroduced by a descendant during traversal. Fix: `symlink_metadata` (lstat) everywhere you recurse, and refuse/never-follow symlinks across the boundary. The transferable rule: **validate at the boundary AND at every step that re-derives a path** (recursion, `read_dir`, glob, archive extraction). And the humbling part — I was confident after reviewing the jail function; the security-review pass caught the HIGH I'd waved through. Trust adversarial verification over your own once-over on security-critical code, especially path/traversal logic.
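The "lstat everywhere you recurse" rule from Lesson 26 looks roughly like this when written against Node's `fs` module. This is a TypeScript analogue for illustration only, not the agent's Rust `copy_recursive`; `copyTreeNoFollow` is a hypothetical name, and skipping symlinks outright is just one possible policy.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively copy `src` to `dst` without ever following a symlink.
// Returns the paths of the symlinks that were skipped.
function copyTreeNoFollow(src: string, dst: string, skipped: string[] = []): string[] {
  // lstat describes the link itself; stat would describe whatever it points at.
  const info = fs.lstatSync(src);

  if (info.isSymbolicLink()) {
    // Refuse to dereference: the target may live outside the jail.
    skipped.push(src);
    return skipped;
  }

  if (info.isDirectory()) {
    fs.mkdirSync(dst, { recursive: true });
    for (const name of fs.readdirSync(src)) {
      // Every child is re-checked with lstat on its own recursive call.
      copyTreeNoFollow(path.join(src, name), path.join(dst, name), skipped);
    }
    return skipped;
  }

  if (info.isFile()) {
    fs.copyFileSync(src, dst);
  }
  return skipped;
}
```

The check sits inside the recursive function rather than in front of it, so a link planted anywhere in the tree is examined at the step where its path is derived.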
+
+27. **Validate infra config BEFORE it reaches a deploy — and know that `docker compose up -d <service>` will recreate other services whose definitions changed.** During the NATS auth cutover I ran `docker compose up -d api` to pick up new env. Because the *nats* service definition had also changed (a new volume mount), compose recreated **corrosion-nats too** — and it failed to start on a config error (`no_auth_user` nested inside `authorization{}` instead of at top level), taking the broker down for ~3 minutes with the backend in offline mode. Two lessons: (a) a broker/proxy/DB config file is code — lint it before it can reach a restart (`nats-server -t -c cfg` to test-parse, `nginx -t`, etc.), don't let the first validation be the production container's startup; (b) `compose up -d <one-service>` is not surgical — it reconciles that service's **dependencies** too, so a stale edit to a depended-on service ships when you didn't mean it to. When touching shared-infra config, restart that service explicitly and watch it come up before moving on. Recovery also surfaced a third gotcha: recreating a client (api) while its server (nats) is down leaves the client stuck on a cached DNS failure (`EAI_AGAIN`) — restart the client once the server is healthy.
+
+28. **A multi-line secret in CI (minisign/SSH/PGP keys) must be stored base64-encoded — the runner mangles embedded newlines and the key silently fails to load.** The signed-update CI passed the toolchain build, downloaded minisign fine, then died at the sign step on `Error while loading the secret key file` (exit 2). The cause wasn't the key or minisign — a minisign secret key file is **two lines** (`untrusted comment:` + base64 blob), and Gitea/act_runner secret storage collapses the embedded newline so the reconstructed file is one unparseable line. The robust pattern: store the secret as `base64 < secret.key | tr -d '\n'` (single line, mangling-proof) and `base64 -d` it in the job, with auto-detect fallback so a correctly-stored raw key still works, and a loud `::error::` carrying the fix command if it's neither. This applies to **any** multi-line credential in CI, not just minisign. Two corollaries: (a) the tell is "the tool runs but can't load its key" — suspect newline-mangling before the key itself; (b) generating that base64 prints the **private key to the terminal/transcript** — for a supply-chain signing key, treat it as exposed and rotate before cutover (embed the new pubkey, re-store the new secret, retire the old). And verify the published artifact end-to-end against the *embedded* pubkey (`minisign -Vm bin -P <pub>`) plus a tampered-byte negative control — a green build that signs is not the same as a signature the agent will actually accept.
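The auto-detect pattern from Lesson 28 (try base64 first, fall back to a raw key, fail loudly otherwise) can be sketched outside the shell as well. The function below is illustrative only: `decodeSecretKey` is a hypothetical name, and the check relies on nothing more than the `untrusted comment:` first line that the lesson describes.

```typescript
const KEY_HEADER = "untrusted comment:";

// Accepts either the base64 of a key file or the raw two-line key file.
// Returns the key file text, or throws if the input is neither.
function decodeSecretKey(stored: string): string {
  // Preferred form: the whole key file, base64-encoded on a single line.
  const decoded = Buffer.from(stored, "base64").toString("utf8");
  if (decoded.indexOf(KEY_HEADER) === 0) return decoded;

  // Fallback: a raw key whose newline survived secret storage.
  if (stored.indexOf(KEY_HEADER) === 0) {
    return stored.slice(-1) === "\n" ? stored : stored + "\n";
  }

  throw new Error(
    "secret is neither base64 of a key file nor a raw two-line key file; " +
      "store it as: base64 < secret.key | tr -d '\\n'",
  );
}
```

Putting the fix command in the error message mirrors the `::error::` line in the workflow: whoever hits the failure gets the remedy along with the diagnosis.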
@@ -45,6 +45,8 @@ import { BetterChatModule } from './modules/betterchat/betterchat.module';
 import { TimedExecuteModule } from './modules/timedexecute/timedexecute.module';
 import { RaidableBasesModule } from './modules/raidablebases/raidablebases.module';
 import { EarlyAccessModule } from './modules/early-access/early-access.module';
+import { FleetModule } from './modules/fleet/fleet.module';
+import { InstancesModule } from './modules/instances/instances.module';
 
 // Shared Services
 import { NatsService } from './services/nats.service';
@@ -52,6 +54,8 @@ import { NatsBridgeService } from './services/nats-bridge.service';
 import { HostAgentConsumerService } from './services/host-agent-consumer.service';
 import { ServerConnection } from './entities/server-connection.entity';
 import { License } from './entities/license.entity';
+import { AgentHost } from './entities/agent-host.entity';
+import { GameInstance } from './entities/game-instance.entity';
 import { SteamService } from './services/steam.service';
 
 // Gateway
@@ -95,7 +99,7 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
     ScheduleModule.forRoot(),
 
     // Repositories for app-level shared services (host-agent consumer)
-    TypeOrmModule.forFeature([ServerConnection, License]),
+    TypeOrmModule.forFeature([ServerConnection, License, AgentHost, GameInstance]),
 
     // Feature Modules
     AuthModule,
@@ -131,6 +135,8 @@ import { NatsBridgeGateway } from './gateways/nats-bridge.gateway';
     TimedExecuteModule,
     RaidableBasesModule,
     EarlyAccessModule,
+    FleetModule,
+    InstancesModule,
   ],
   providers: [
     // Global guards (order matters: auth first, then license, then permissions)
@@ -6,6 +6,15 @@ export default () => ({
   },
   nats: {
     url: process.env.NATS_URL || 'nats://localhost:4222',
+    // Public broker address shown to agents in setup instructions.
+    publicUrl: process.env.NATS_PUBLIC_URL || 'nats://nats.corrosionmgmt.com:4222',
+    // Privileged internal credentials for the backend's own NATS connection
+    // (full corrosion.> access). Empty = anonymous (transition period).
+    internalUser: process.env.NATS_INTERNAL_USER || '',
+    internalPassword: process.env.NATS_INTERNAL_PASSWORD || '',
+    // Secret used to derive a per-license agent password:
+    // HMAC-SHA256(license_id, secret). Shared with the nats.conf generator.
+    tokenSecret: process.env.NATS_TOKEN_SECRET || '',
   },
   jwt: {
     secret: process.env.JWT_SECRET || 'change-me',
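The `tokenSecret` comment in the config hunk above mentions deriving a per-license agent password as `HMAC-SHA256(license_id, secret)`. A minimal sketch of such a derivation follows, with loud assumptions: the shared secret is used as the HMAC key, the license id as the message, and the digest is hex-encoded. The diff does not show the real derivation, so the actual argument order and output encoding may differ; `deriveAgentPassword` is a hypothetical name.

```typescript
import { createHmac } from "node:crypto";

// Assumption: secret = HMAC key, license id = message, hex output.
// The source comment only says "HMAC-SHA256(license_id, secret)".
function deriveAgentPassword(licenseId: string, tokenSecret: string): string {
  if (!tokenSecret) {
    // An empty secret would make every derived password guessable.
    throw new Error("tokenSecret is empty; refusing to derive a password");
  }
  return createHmac("sha256", tokenSecret).update(licenseId).digest("hex");
}
```

Because the derivation is deterministic, anything that holds the same secret (such as a broker config generator) can recompute each license's password without storing it.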
74
backend-nest/src/entities/agent-host.entity.ts
Normal file
@@ -0,0 +1,74 @@
+import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn, Check, Unique } from 'typeorm';
+import { License } from './license.entity';
+
+export interface AgentHostDisk {
+  mount: string;
+  total_mb: number;
+  free_mb: number;
+}
+
+/**
+ * One Corrosion host agent / one machine. Owns the machine-level facts.
+ *
+ * NOTE: distinct from the B2B `hosts` table (hosting-partner companies). This
+ * is `agent_hosts` — the physical/virtual box a customer runs the agent on.
+ */
+@Entity('agent_hosts')
+@Unique(['license_id', 'hostname'])
+@Check(`"status" IN ('connected', 'degraded', 'offline')`)
+export class AgentHost {
+  @PrimaryGeneratedColumn('uuid')
+  id: string;
+
+  @Column({ type: 'uuid' })
+  license_id: string;
+
+  @Column({ type: 'varchar', length: 255, default: '' })
+  hostname: string;
+
+  @Column({ type: 'varchar', length: 64, nullable: true })
+  agent_version: string | null;
+
+  @Column({ type: 'varchar', length: 64, nullable: true })
+  agent_commit: string | null;
+
+  @Column({ type: 'varchar', length: 32, nullable: true })
+  os: string | null;
+
+  @Column({ type: 'varchar', length: 32, nullable: true })
+  arch: string | null;
+
+  @Column({ type: 'varchar', length: 20, default: 'offline' })
+  status: string;
+
+  @Column({ type: 'timestamptz', nullable: true })
+  last_heartbeat_at: Date | null;
+
+  @Column({ type: 'double precision', nullable: true })
+  cpu_percent: number | null;
+
+  @Column({ type: 'integer', nullable: true })
+  cpu_cores: number | null;
+
+  @Column({ type: 'bigint', nullable: true })
+  mem_total_mb: number | null;
+
+  @Column({ type: 'bigint', nullable: true })
+  mem_used_mb: number | null;
+
+  @Column({ type: 'bigint', nullable: true })
+  uptime_seconds: number | null;
+
+  @Column({ type: 'jsonb', nullable: true })
+  disks: AgentHostDisk[] | null;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  created_at: Date;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  updated_at: Date;
+
+  @ManyToOne(() => License, { onDelete: 'CASCADE' })
+  @JoinColumn({ name: 'license_id' })
+  license: License;
+}
59
backend-nest/src/entities/game-instance.entity.ts
Normal file
@@ -0,0 +1,59 @@
+import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn, Unique } from 'typeorm';
+import { License } from './license.entity';
+import { AgentHost } from './agent-host.entity';
+
+/**
+ * One game server process / orchestrated unit (a Rust server, a Conan world,
+ * a Dune battlegroup). The billing unit — plans count instances.
+ * `agent_instance_id` is the agent's slug and the NATS subject segment.
+ */
+@Entity('game_instances')
+@Unique(['license_id', 'agent_instance_id'])
+export class GameInstance {
+  @PrimaryGeneratedColumn('uuid')
+  id: string;
+
+  @Column({ type: 'uuid' })
+  license_id: string;
+
+  @Column({ type: 'uuid', nullable: true })
+  host_id: string | null;
+
+  @Column({ type: 'uuid', nullable: true })
+  cluster_id: string | null;
+
+  @Column({ type: 'varchar', length: 64 })
+  agent_instance_id: string;
+
+  @Column({ type: 'varchar', length: 32 })
+  game: string;
+
+  @Column({ type: 'varchar', length: 255, nullable: true })
+  label: string | null;
+
+  @Column({ type: 'varchar', length: 32, default: 'unknown' })
+  state: string;
+
+  @Column({ type: 'text', nullable: true })
+  root_path: string | null;
+
+  @Column({ type: 'bigint', default: 0 })
+  uptime_seconds: number;
+
+  @Column({ type: 'timestamptz', nullable: true })
+  last_seen_at: Date | null;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  created_at: Date;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  updated_at: Date;
+
+  @ManyToOne(() => License, { onDelete: 'CASCADE' })
+  @JoinColumn({ name: 'license_id' })
+  license: License;
+
+  @ManyToOne(() => AgentHost, { onDelete: 'SET NULL', nullable: true })
+  @JoinColumn({ name: 'host_id' })
+  host: AgentHost | null;
+}
38
backend-nest/src/entities/instance-cluster.entity.ts
Normal file
@@ -0,0 +1,38 @@
+import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn } from 'typeorm';
+import { License } from './license.entity';
+
+/**
+ * Optional grouping of instances for games with linked topologies:
+ * Soulmask main/child clusters, Dune BattleGroup → Sietches. Reserved now;
+ * cluster orchestration ships with those game adapters.
+ */
+@Entity('instance_clusters')
+export class InstanceCluster {
+  @PrimaryGeneratedColumn('uuid')
+  id: string;
+
+  @Column({ type: 'uuid' })
+  license_id: string;
+
+  @Column({ type: 'varchar', length: 32 })
+  game: string;
+
+  @Column({ type: 'varchar', length: 255 })
+  name: string;
+
+  @Column({ type: 'varchar', length: 32, nullable: true })
+  topology: string | null;
+
+  @Column({ type: 'jsonb', nullable: true })
+  config: Record<string, unknown> | null;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  created_at: Date;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  updated_at: Date;
+
+  @ManyToOne(() => License, { onDelete: 'CASCADE' })
+  @JoinColumn({ name: 'license_id' })
+  license: License;
+}
38
backend-nest/src/entities/instance-stats.entity.ts
Normal file
@@ -0,0 +1,38 @@
+import { Entity, PrimaryGeneratedColumn, Column, ManyToOne, JoinColumn } from 'typeorm';
+import { GameInstance } from './game-instance.entity';
+
+/**
+ * Per-instance time-series game metrics (player count, FPS, …). Populated once
+ * game-level telemetry is collected via RCON/plugin — the host heartbeat
+ * carries host metrics, not game metrics, so this stays empty in Phase A.
+ */
+@Entity('instance_stats')
+export class InstanceStats {
+  @PrimaryGeneratedColumn('uuid')
+  id: string;
+
+  @Column({ type: 'uuid' })
+  instance_id: string;
+
+  @Column({ type: 'uuid' })
+  license_id: string;
+
+  @Column({ type: 'integer', default: 0 })
+  player_count: number;
+
+  @Column({ type: 'integer', default: 0 })
+  max_players: number;
+
+  @Column({ type: 'double precision', default: 0 })
+  fps: number;
+
+  @Column({ type: 'integer', default: 0 })
+  memory_usage_mb: number;
+
+  @Column({ type: 'timestamptz', default: () => 'NOW()' })
+  recorded_at: Date;
+
+  @ManyToOne(() => GameInstance, { onDelete: 'CASCADE' })
+  @JoinColumn({ name: 'instance_id' })
+  instance: GameInstance;
+}
26
backend-nest/src/modules/fleet/fleet.controller.ts
Normal file
@@ -0,0 +1,26 @@
+import { Controller, Get, Delete, Param } from '@nestjs/common';
+import { ApiTags, ApiBearerAuth, ApiOperation } from '@nestjs/swagger';
+import { FleetService } from './fleet.service';
+import { CurrentTenant } from '../../common/decorators/current-tenant.decorator';
+import { RequirePermission } from '../../common/decorators/require-permission.decorator';
+
+@ApiTags('fleet')
+@ApiBearerAuth()
+@Controller('fleet')
+export class FleetController {
+  constructor(private readonly fleetService: FleetService) {}
+
+  @Get()
+  @RequirePermission('server.view')
+  @ApiOperation({ summary: 'Get fleet overview — hosts and game instances for this license' })
+  async getFleet(@CurrentTenant() licenseId: string) {
+    return this.fleetService.getFleet(licenseId);
+  }
+
+  @Delete('hosts/:id')
+  @RequirePermission('server.manage')
+  @ApiOperation({ summary: 'Remove a host and its instances (host must be offline)' })
+  async deleteHost(@CurrentTenant() licenseId: string, @Param('id') id: string) {
+    return this.fleetService.deleteHost(licenseId, id);
+  }
+}
15
backend-nest/src/modules/fleet/fleet.module.ts
Normal file
@@ -0,0 +1,15 @@
+import { Module } from '@nestjs/common';
+import { TypeOrmModule } from '@nestjs/typeorm';
+import { FleetController } from './fleet.controller';
+import { FleetService } from './fleet.service';
+import { AgentHost } from '../../entities/agent-host.entity';
+import { GameInstance } from '../../entities/game-instance.entity';
+import { ServerConnection } from '../../entities/server-connection.entity';
+
+@Module({
+  imports: [TypeOrmModule.forFeature([AgentHost, GameInstance, ServerConnection])],
+  controllers: [FleetController],
+  providers: [FleetService],
+  exports: [FleetService],
+})
+export class FleetModule {}
170
backend-nest/src/modules/fleet/fleet.service.ts
Normal file
@@ -0,0 +1,170 @@
|
||||
import { Injectable, NotFoundException, ConflictException } from '@nestjs/common';
|
||||
import { InjectRepository } from '@nestjs/typeorm';
|
||||
import { Repository } from 'typeorm';
|
||||
import { AgentHost } from '../../entities/agent-host.entity';
|
||||
import { GameInstance } from '../../entities/game-instance.entity';
|
||||
import { ServerConnection } from '../../entities/server-connection.entity';
|
||||
|
||||
export interface FleetInstanceDto {
|
||||
id: string;
|
||||
agent_instance_id: string;
|
||||
game: string;
|
||||
label: string | null;
|
||||
state: string;
|
||||
uptime_seconds: number;
|
||||
last_seen_at: string | null;
|
||||
}
|
||||
|
||||
export interface FleetHostDto {
|
||||
id: string;
|
||||
hostname: string;
|
||||
status: string;
|
||||
agent_version: string | null;
|
||||
os: string | null;
|
||||
arch: string | null;
|
||||
cpu_percent: number | null;
|
||||
cpu_cores: number | null;
|
||||
mem_total_mb: number | null;
|
||||
mem_used_mb: number | null;
|
||||
uptime_seconds: number | null;
|
||||
disks: AgentHost['disks'];
|
||||
last_heartbeat_at: string | null;
|
||||
instances: FleetInstanceDto[];
|
||||
}
|
||||
|
||||
export interface FleetSummaryDto {
|
||||
host_count: number;
|
||||
instance_count: number;
|
||||
online_host_count: number;
|
||||
}
|
||||
|
||||
export interface FleetResponseDto {
|
||||
hosts: FleetHostDto[];
|
||||
summary: FleetSummaryDto;
|
||||
}
|
||||
|
||||
@Injectable()
|
||||
export class FleetService {
|
||||
constructor(
|
||||
@InjectRepository(AgentHost)
|
||||
private readonly hostRepo: Repository<AgentHost>,
|
||||
@InjectRepository(GameInstance)
|
||||
private readonly instanceRepo: Repository<GameInstance>,
|
||||
@InjectRepository(ServerConnection)
|
||||
private readonly connectionRepo: Repository<ServerConnection>,
|
||||
) {}
|
||||
|
||||
/**
|
||||
* Remove a host and its game instances from the fleet.
|
||||
*
|
||||
* Refuses while the host is `connected` — a live agent re-registers on its
|
||||
* next heartbeat, so the operator must stop the agent first. Deletes the
|
||||
* host's instances explicitly (the FK is SET NULL, which would otherwise
|
||||
* orphan them); instance_stats cascade. If this was the license's last host,
|
||||
   * the legacy single-server connection row is cleared too so the old
   * Dashboard doesn't show a stale server.
   */
  async deleteHost(
    licenseId: string,
    hostId: string,
  ): Promise<{ deleted: true; instances_removed: number }> {
    const host = await this.hostRepo.findOne({ where: { id: hostId, license_id: licenseId } });
    if (!host) throw new NotFoundException('Host not found');
    if (host.status === 'connected') {
      throw new ConflictException(
        'Host is online — stop the agent first, or it will re-register on its next heartbeat',
      );
    }

    const del = await this.instanceRepo.delete({ license_id: licenseId, host_id: hostId });
    await this.hostRepo.delete({ id: hostId, license_id: licenseId });

    const remaining = await this.hostRepo.count({ where: { license_id: licenseId } });
    if (remaining === 0) {
      await this.connectionRepo.delete({ license_id: licenseId });
    }

    return { deleted: true, instances_removed: del.affected ?? 0 };
  }

  async getFleet(licenseId: string): Promise<FleetResponseDto> {
    const [hosts, instances] = await Promise.all([
      this.hostRepo.find({
        where: { license_id: licenseId },
        order: { hostname: 'ASC' },
      }),
      this.instanceRepo.find({
        where: { license_id: licenseId },
        order: { game: 'ASC', label: 'ASC' },
      }),
    ]);

    // Group instances by host_id. Bigint columns come back as strings from pg — coerce.
    const instancesByHost = new Map<string | null, FleetInstanceDto[]>();
    for (const inst of instances) {
      const key = inst.host_id ?? null;
      if (!instancesByHost.has(key)) {
        instancesByHost.set(key, []);
      }
      instancesByHost.get(key)!.push({
        id: inst.id,
        agent_instance_id: inst.agent_instance_id,
        game: inst.game,
        label: inst.label,
        state: inst.state,
        uptime_seconds: Number(inst.uptime_seconds),
        last_seen_at: inst.last_seen_at ? inst.last_seen_at.toISOString() : null,
      });
    }

    const hostDtos: FleetHostDto[] = hosts.map((h) => ({
      id: h.id,
      hostname: h.hostname,
      status: h.status,
      agent_version: h.agent_version,
      os: h.os,
      arch: h.arch,
      cpu_percent: h.cpu_percent !== null && h.cpu_percent !== undefined ? Number(h.cpu_percent) : null,
      cpu_cores: h.cpu_cores !== null && h.cpu_cores !== undefined ? Number(h.cpu_cores) : null,
      mem_total_mb: h.mem_total_mb !== null && h.mem_total_mb !== undefined ? Number(h.mem_total_mb) : null,
      mem_used_mb: h.mem_used_mb !== null && h.mem_used_mb !== undefined ? Number(h.mem_used_mb) : null,
      uptime_seconds: h.uptime_seconds !== null && h.uptime_seconds !== undefined ? Number(h.uptime_seconds) : null,
      disks: h.disks,
      last_heartbeat_at: h.last_heartbeat_at ? h.last_heartbeat_at.toISOString() : null,
      instances: instancesByHost.get(h.id) ?? [],
    }));

    // Append synthetic "unassigned" bucket only if orphaned instances exist
    const unassigned = instancesByHost.get(null) ?? [];
    if (unassigned.length > 0) {
      hostDtos.push({
        id: '__unassigned__',
        hostname: 'Unassigned',
        status: 'offline',
        agent_version: null,
        os: null,
        arch: null,
        cpu_percent: null,
        cpu_cores: null,
        mem_total_mb: null,
        mem_used_mb: null,
        uptime_seconds: null,
        disks: null,
        last_heartbeat_at: null,
        instances: unassigned,
      });
    }

    const online_host_count = hosts.filter((h) => h.status === 'connected').length;
    const instance_count = instances.length;

    return {
      hosts: hostDtos,
      summary: {
        host_count: hosts.length,
        instance_count,
        online_host_count,
      },
    };
  }
}
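The grouping step in `getFleet` can be read in isolation. The sketch below is a minimal illustration, not the service's code: `InstanceRow`, `InstanceDto` and `groupByHost` are hypothetical names, and the row shape is trimmed to the columns that matter. It shows the two ideas the method relies on: rows with no `host_id` land under a `null` key (the later "unassigned" bucket), and BIGINT values that arrive from pg as strings are coerced with `Number`.

```typescript
// Hypothetical, trimmed-down row shape; the real entity has more columns.
interface InstanceRow {
  id: string;
  host_id: string | null;
  uptime_seconds: string | number; // pg hands BIGINT back as a string
}

// Hypothetical DTO carrying only the fields used in this sketch.
interface InstanceDto {
  id: string;
  uptime_seconds: number;
}

/** Bucket instances by host_id; rows without a host land under the null key. */
function groupByHost(rows: InstanceRow[]): Map<string | null, InstanceDto[]> {
  const byHost = new Map<string | null, InstanceDto[]>();
  for (const row of rows) {
    const key = row.host_id ?? null;
    const bucket = byHost.get(key) ?? [];
    bucket.push({ id: row.id, uptime_seconds: Number(row.uptime_seconds) });
    byHost.set(key, bucket);
  }
  return byHost;
}

const grouped = groupByHost([
  { id: 'a', host_id: 'h1', uptime_seconds: '120' },
  { id: 'b', host_id: 'h1', uptime_seconds: 30 },
  { id: 'c', host_id: null, uptime_seconds: '0' },
]);
```

Using `null` as a map key keeps orphaned instances out of every real host's list while still making them retrievable with a single lookup.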
133
backend-nest/src/modules/instances/instances.controller.ts
Normal file
@@ -0,0 +1,133 @@
import { Controller, Post, Get, Put, Body, Param, Query } from '@nestjs/common';
import { ApiTags, ApiBearerAuth, ApiOperation } from '@nestjs/swagger';
import { CurrentTenant } from '../../common/decorators/current-tenant.decorator';
import { RequirePermission } from '../../common/decorators/require-permission.decorator';
import { InstancesService, LifecycleFunc } from './instances.service';

@ApiTags('instances')
@ApiBearerAuth()
@Controller('instances')
export class InstancesController {
  constructor(private readonly instances: InstancesService) {}

  @Post(':id/lifecycle')
  @RequirePermission('server.manage')
  @ApiOperation({ summary: 'Send a lifecycle command to a game instance (start/stop/restart/status/steam_update)' })
  async lifecycle(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { action: LifecycleFunc },
  ) {
    return this.instances.lifecycle(licenseId, id, body.action);
  }

  @Post(':id/rcon')
  @RequirePermission('server.console')
  @ApiOperation({ summary: 'Send an RCON/console command to a game instance' })
  async rcon(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { command: string },
  ) {
    return this.instances.rcon(licenseId, id, body.command);
  }

  @Get(':id/files')
  @RequirePermission('files.view')
  @ApiOperation({ summary: 'List a directory in the instance (jailed to its root)' })
  async listFiles(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Query('path') path?: string,
  ) {
    return this.instances.listFiles(licenseId, id, path ?? '');
  }

  @Get(':id/file')
  @RequirePermission('files.view')
  @ApiOperation({ summary: 'Read a text file from the instance (jailed, 5 MiB cap)' })
  async readFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Query('path') path: string,
  ) {
    return this.instances.readFile(licenseId, id, path);
  }

  @Put(':id/file')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Write a text file in the instance (jailed)' })
  async writeFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string; content: string },
  ) {
    return this.instances.writeFile(licenseId, id, body.path, body.content ?? '');
  }

  @Post(':id/files/delete')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Delete a file or directory (jailed)' })
  async deleteFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string },
  ) {
    return this.instances.deleteFile(licenseId, id, body.path);
  }

  @Post(':id/files/rename')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Rename a file/directory within its parent (jailed)' })
  async renameFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string; name: string },
  ) {
    return this.instances.renameFile(licenseId, id, body.path, body.name);
  }

  @Post(':id/files/mkdir')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Create a directory (jailed)' })
  async mkdir(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string },
  ) {
    return this.instances.mkdir(licenseId, id, body.path);
  }

  @Post(':id/files/mkfile')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Create an empty file (jailed)' })
  async mkfile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string },
  ) {
    return this.instances.mkfile(licenseId, id, body.path);
  }

  @Post(':id/files/move')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Move a file/directory (jailed)' })
  async moveFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string; dest: string },
  ) {
    return this.instances.moveFile(licenseId, id, body.path, body.dest);
  }

  @Post(':id/files/copy')
  @RequirePermission('files.manage')
  @ApiOperation({ summary: 'Copy a file/directory (jailed)' })
  async copyFile(
    @CurrentTenant() licenseId: string,
    @Param('id') id: string,
    @Body() body: { path: string; dest: string },
  ) {
    return this.instances.copyFile(licenseId, id, body.path, body.dest);
  }
}
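The lifecycle route types its body as `{ action: LifecycleFunc }`, but that type is erased at runtime, so the allow-list check still has to happen on the actual value. The sketch below shows one way to express that check as a type guard. It repeats the `LIFECYCLE_FUNCS` list from `instances.service.ts`; `isLifecycleFunc` and `parseLifecycleBody` are hypothetical helpers that do not appear in the diff, and a plain `Error` stands in for the framework's `BadRequestException`.

```typescript
// Same allow-list as instances.service.ts, repeated so the sketch is self-contained.
const LIFECYCLE_FUNCS = ['start', 'stop', 'restart', 'status', 'steam_update'] as const;
type LifecycleFunc = (typeof LIFECYCLE_FUNCS)[number];

/** Hypothetical type guard: narrows an untrusted value to LifecycleFunc. */
function isLifecycleFunc(value: unknown): value is LifecycleFunc {
  return typeof value === 'string' && (LIFECYCLE_FUNCS as readonly string[]).includes(value);
}

/** Hypothetical helper: extract and validate `action` from an untrusted request body. */
function parseLifecycleBody(body: unknown): LifecycleFunc {
  const action = (body as { action?: unknown } | null | undefined)?.action;
  if (!isLifecycleFunc(action)) {
    // The service in the diff throws BadRequestException here; Error keeps the sketch dependency-free.
    throw new Error(`Unsupported action '${String(action)}'`);
  }
  return action;
}
```

A guard like this also covers a missing or empty body, which would otherwise fail on the `body.action` property access before the service's own check runs.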
13
backend-nest/src/modules/instances/instances.module.ts
Normal file
@@ -0,0 +1,13 @@
import { Module } from '@nestjs/common';
import { TypeOrmModule } from '@nestjs/typeorm';
import { InstancesController } from './instances.controller';
import { InstancesService } from './instances.service';
import { GameInstance } from '../../entities/game-instance.entity';
import { NatsService } from '../../services/nats.service';

@Module({
  imports: [TypeOrmModule.forFeature([GameInstance])],
  controllers: [InstancesController],
  providers: [InstancesService, NatsService],
})
export class InstancesModule {}
145
backend-nest/src/modules/instances/instances.service.ts
Normal file
@@ -0,0 +1,145 @@
import { Injectable, NotFoundException, BadRequestException, Logger } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import { NatsService } from '../../services/nats.service';
import { GameInstance } from '../../entities/game-instance.entity';

/** Lifecycle funcs the agent's {instance}.cmd handler accepts. */
const LIFECYCLE_FUNCS = ['start', 'stop', 'restart', 'status', 'steam_update'] as const;
export type LifecycleFunc = (typeof LIFECYCLE_FUNCS)[number];

@Injectable()
export class InstancesService {
  private readonly logger = new Logger(InstancesService.name);

  constructor(
    private readonly nats: NatsService,
    @InjectRepository(GameInstance)
    private readonly instanceRepo: Repository<GameInstance>,
  ) {}

  /** Resolve an instance the caller's license actually owns (tenant guard). */
  private async resolveInstance(licenseId: string, instanceId: string): Promise<GameInstance> {
    const inst = await this.instanceRepo.findOne({
      where: { id: instanceId, license_id: licenseId },
    });
    if (!inst) throw new NotFoundException('Instance not found');
    return inst;
  }

  async lifecycle(licenseId: string, instanceId: string, func: LifecycleFunc): Promise<unknown> {
    if (!LIFECYCLE_FUNCS.includes(func)) {
      throw new BadRequestException(`Unsupported action '${func}'`);
    }
    const inst = await this.resolveInstance(licenseId, instanceId);
    const subject = `corrosion.${licenseId}.${inst.agent_instance_id}.cmd`;
    this.logger.log(`instance ${inst.agent_instance_id}: ${func}`);
    return this.nats.requestScoped(licenseId, subject, { func });
  }

  async rcon(licenseId: string, instanceId: string, command: string): Promise<unknown> {
    if (!command || !command.trim()) {
      throw new BadRequestException('command is required');
    }
    const inst = await this.resolveInstance(licenseId, instanceId);
    const subject = `corrosion.${licenseId}.${inst.agent_instance_id}.cmd`;
    // RCON can take longer than a lifecycle ack — give it more headroom.
    return this.nats.requestScoped(licenseId, subject, { func: 'rcon', command }, 12_000);
  }

  // -------------------------------------------------------------------------
  // File access — jailed to the instance root by the agent's file manager.
  // The agent protocol (corrosion-host-agent/src/filemanager.rs):
  //   { op: list|read|write|delete|rename|mkdir|mkfile|move|copy, path, ... }
  //   reply: { status: 'success'|'error', data?, message? }
  // -------------------------------------------------------------------------

  private filesSubject(inst: GameInstance, licenseId: string): string {
    return `corrosion.${licenseId}.${inst.agent_instance_id}.files.cmd`;
  }

  private async fileOp(
    licenseId: string,
    instanceId: string,
    payload: Record<string, unknown>,
  ): Promise<{ status: string; data?: unknown; message?: string }> {
    const inst = await this.resolveInstance(licenseId, instanceId);
    const res = await this.nats.requestScoped<{ status: string; data?: unknown; message?: string }>(
      licenseId,
      this.filesSubject(inst, licenseId),
      payload,
      12_000,
    );
    if (res?.status === 'error') {
      throw new BadRequestException(res.message ?? 'File operation failed');
    }
    return res;
  }

  async listFiles(licenseId: string, instanceId: string, path = ''): Promise<unknown> {
    const res = await this.fileOp(licenseId, instanceId, { op: 'list', path });
    return res.data;
  }

  async readFile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
    if (!path) throw new BadRequestException('path is required');
    const res = await this.fileOp(licenseId, instanceId, { op: 'read', path });
    return res.data;
  }

  async writeFile(
    licenseId: string,
    instanceId: string,
    path: string,
    content: string,
  ): Promise<unknown> {
    if (!path) throw new BadRequestException('path is required');
    const res = await this.fileOp(licenseId, instanceId, { op: 'write', path, content });
    return res.data ?? { status: 'success' };
  }

  async deleteFile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
    if (!path) throw new BadRequestException('path is required');
    return (await this.fileOp(licenseId, instanceId, { op: 'delete', path })).data ?? { ok: true };
  }

  async renameFile(
    licenseId: string,
    instanceId: string,
    path: string,
    name: string,
  ): Promise<unknown> {
    if (!path || !name) throw new BadRequestException('path and name are required');
    return (await this.fileOp(licenseId, instanceId, { op: 'rename', path, name })).data ?? { ok: true };
  }

  async mkdir(licenseId: string, instanceId: string, path: string): Promise<unknown> {
    if (!path) throw new BadRequestException('path is required');
    return (await this.fileOp(licenseId, instanceId, { op: 'mkdir', path })).data ?? { ok: true };
  }

  async mkfile(licenseId: string, instanceId: string, path: string): Promise<unknown> {
    if (!path) throw new BadRequestException('path is required');
    return (await this.fileOp(licenseId, instanceId, { op: 'mkfile', path })).data ?? { ok: true };
  }

  async moveFile(
    licenseId: string,
    instanceId: string,
    path: string,
    dest: string,
  ): Promise<unknown> {
    if (!path || !dest) throw new BadRequestException('path and dest are required');
    return (await this.fileOp(licenseId, instanceId, { op: 'move', path, dest })).data ?? { ok: true };
  }

  async copyFile(
    licenseId: string,
    instanceId: string,
    path: string,
    dest: string,
  ): Promise<unknown> {
    if (!path || !dest) throw new BadRequestException('path and dest are required');
    return (await this.fileOp(licenseId, instanceId, { op: 'copy', path, dest })).data ?? { ok: true };
  }
}
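The file-access comment in `instances.service.ts` describes the agent's reply envelope as `{ status: 'success'|'error', data?, message? }`. The sketch below isolates how such an envelope could be unwrapped into either a value or a thrown error, which is what `fileOp` and its callers do between them. `AgentReply` and `unwrapReply` are hypothetical names introduced for illustration, and a plain `Error` replaces `BadRequestException` so the block has no framework dependency.

```typescript
// Hypothetical typing of the reply envelope described in the service's comment.
interface AgentReply<T = unknown> {
  status: 'success' | 'error';
  data?: T;
  message?: string;
}

/**
 * Hypothetical helper: return the payload of a successful reply, or the given
 * fallback when the agent sent no data; throw when the agent reported an error.
 */
function unwrapReply<T>(reply: AgentReply<T>, fallback: T): T {
  if (reply.status === 'error') {
    throw new Error(reply.message ?? 'File operation failed');
  }
  return reply.data ?? fallback;
}

// A listing reply carries data; a mkdir-style reply may carry none.
const listing = unwrapReply<string[]>({ status: 'success', data: ['server.cfg'] }, []);
const ack = unwrapReply<{ ok: boolean }>({ status: 'success' }, { ok: true });
```

Keeping the fallback explicit at the call site mirrors the `?? { ok: true }` pattern the service uses for operations whose reply has no `data`.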
@@ -23,6 +23,13 @@ export class ServersController {
    return await this.serversService.getServer(licenseId);
  }

  @Get('agent-credentials')
  @RequirePermission('server.manage')
  @ApiOperation({ summary: 'NATS credentials for this license\'s host agent' })
  async getAgentCredentials(@CurrentTenant() licenseId: string) {
    return await this.serversService.getAgentCredentials(licenseId);
  }

  @Put('config')
  @RequirePermission('server.manage')
  @ApiOperation({ summary: 'Update server configuration' })
@@ -19,6 +19,15 @@ export class ServersService {
    private readonly natsService: NatsService,
  ) {}

  /**
   * NATS credentials the customer puts in their host agent's config so it can
   * authenticate to the per-license-scoped broker. Returns null if the broker
   * isn't enforcing auth yet (NATS_TOKEN_SECRET unset).
   */
  async getAgentCredentials(licenseId: string) {
    return this.natsService.getAgentCredentials(licenseId);
  }

  /**
   * Get server connection and config for a license.
   * Returns null fields if no server has been set up yet.
@@ -5,30 +5,53 @@ import { Repository } from 'typeorm';
import { NatsService } from './nats.service';
import { ServerConnection } from '../entities/server-connection.entity';
import { License } from '../entities/license.entity';
import { AgentHost, AgentHostDisk } from '../entities/agent-host.entity';
import { GameInstance } from '../entities/game-instance.entity';

/**
 * Consumes Corrosion wire protocol v2 host-agent subjects
 * (corrosion-host-agent/PROTOCOL.md) and keeps server_connections truthful.
 * (corrosion-host-agent/PROTOCOL.md) and keeps the fleet model truthful.
 *
 * Before this service existed, NOTHING persisted agent heartbeats:
 * companion_last_seen was written once at setup and connection_status stayed
 * 'connected' forever. Now: heartbeat -> last_seen + connected (row
 * auto-created on first contact), going_offline beacon -> offline, and a
 * staleness sweep marks hosts offline when heartbeats stop arriving.
 * Writes the License → Host → Instance model (hosts + game_instances) from
 * each heartbeat, AND maintains the legacy single-server `server_connections`
 * row so the current panel keeps working during the fleet UI transition.
 *
 * Host identity: until enrollment issues a stable host id, a host is keyed by
 * (license_id, hostname). One agent = one host today; the schema is already
 * multi-host-ready.
 */
interface HeartbeatPayload {
  schema?: number;
  timestamp?: string;
  agent?: { version?: string; commit?: string; os?: string; arch?: string };
  host?: {
    hostname?: string | null;
    cpu_percent?: number;
    cpu_cores?: number;
    mem_total_mb?: number;
    mem_used_mb?: number;
    uptime_seconds?: number;
    disks?: AgentHostDisk[];
  };
  instances?: Array<{
    id: string;
    game: string;
    label?: string | null;
    state?: string;
    uptime_seconds?: number;
  }>;
}

@Injectable()
export class HostAgentConsumerService implements OnApplicationBootstrap {
  private readonly logger = new Logger(HostAgentConsumerService.name);

  /** licenseId -> cache expiry epoch-ms. Positive = exists, absent = unknown. */
  private knownLicenses = new Map<string, number>();
  /** Unknown/garbage license ids we already warned about (anti log-spam). */
  private warnedUnknown = new Set<string>();

  private static readonly UUID_RE =
    /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  private static readonly LICENSE_CACHE_TTL_MS = 5 * 60_000;
  /** 3x the agent's default 60s heartbeat (which jitters to max 72s). */
  private static readonly OFFLINE_AFTER_MS = 180_000;

  constructor(
@@ -37,6 +60,10 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
    private readonly connectionRepository: Repository<ServerConnection>,
    @InjectRepository(License)
    private readonly licenseRepository: Repository<License>,
    @InjectRepository(AgentHost)
    private readonly hostRepository: Repository<AgentHost>,
    @InjectRepository(GameInstance)
    private readonly instanceRepository: Repository<GameInstance>,
  ) {}

  // Bootstrap, not module-init: subscriptions registered before NatsService
@@ -44,10 +71,9 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
  onApplicationBootstrap() {
    this.nats.subscribe('corrosion.*.host.heartbeat', (data, subject) => {
      const licenseId = subject.split('.')[1];
      void this.onHeartbeat(licenseId).catch((err) =>
      void this.onHeartbeat(licenseId, data as HeartbeatPayload).catch((err) =>
        this.logger.error(`heartbeat handling failed for ${licenseId}: ${err.message}`, err.stack),
      );
      void data; // payload telemetry is bridged to the browser; persistence here is liveness only
    });

    this.nats.subscribe('corrosion.*.host.going_offline', (_data, subject) => {
@@ -60,25 +86,30 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
    this.logger.log('Host agent (protocol v2) consumer subscriptions initialized');
  }

  private async onHeartbeat(licenseId: string): Promise<void> {
  private async onHeartbeat(licenseId: string, payload: HeartbeatPayload): Promise<void> {
    if (!(await this.isValidTenant(licenseId))) return;

    // A well-formed v2 heartbeat always carries a host block. Reject malformed
    // payloads so a stray/empty publish can't create a phantom host row.
    if (!payload || typeof payload.host !== 'object' || payload.host === null) {
      this.logger.warn(`ignoring malformed heartbeat for license ${licenseId} (no host block)`);
      return;
    }
    const now = new Date();
    const existing = await this.connectionRepository.findOne({
      where: { license_id: licenseId },
    });

    await this.updateLegacyConnection(licenseId, now);
    const host = await this.upsertHost(licenseId, payload, now);
    await this.upsertInstances(licenseId, host, payload, now);
  }

  /** Legacy single-server row — keeps the current panel working. */
  private async updateLegacyConnection(licenseId: string, now: Date): Promise<void> {
    const existing = await this.connectionRepository.findOne({ where: { license_id: licenseId } });
    if (existing) {
      await this.connectionRepository.update(
        { id: existing.id },
        { companion_last_seen: now, connection_status: 'connected', updated_at: now },
      );
      if (existing.connection_status !== 'connected') {
        this.logger.log(`host agent for license ${licenseId} is back online`);
      }
    } else {
      // First contact from a host agent: auto-register the connection so the
      // panel lights up without a manual setup step.
      await this.connectionRepository.save(
        this.connectionRepository.create({
          license_id: licenseId,
@@ -87,28 +118,102 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
          companion_last_seen: now,
        }),
      );
      this.logger.log(`host agent registered for license ${licenseId} (first heartbeat)`);
    }
  }

  /** Upsert the fleet host row, keyed by (license_id, hostname). */
  private async upsertHost(licenseId: string, payload: HeartbeatPayload, now: Date): Promise<AgentHost> {
    const hostname = payload.host?.hostname ?? '';
    const fields = {
      agent_version: payload.agent?.version ?? null,
      agent_commit: payload.agent?.commit ?? null,
      os: payload.agent?.os ?? null,
      arch: payload.agent?.arch ?? null,
      status: 'connected',
      last_heartbeat_at: now,
      cpu_percent: payload.host?.cpu_percent ?? null,
      cpu_cores: payload.host?.cpu_cores ?? null,
      mem_total_mb: payload.host?.mem_total_mb ?? null,
      mem_used_mb: payload.host?.mem_used_mb ?? null,
      uptime_seconds: payload.host?.uptime_seconds ?? null,
      disks: payload.host?.disks ?? null,
      updated_at: now,
    };

    const existing = await this.hostRepository.findOne({
      where: { license_id: licenseId, hostname },
    });
    if (existing) {
      await this.hostRepository.update({ id: existing.id }, fields);
      return { ...existing, ...fields } as AgentHost;
    }
    const created = await this.hostRepository.save(
      this.hostRepository.create({ license_id: licenseId, hostname, ...fields }),
    );
    this.logger.log(`host registered for license ${licenseId} (hostname '${hostname || 'unknown'}')`);
    return created;
  }

  /** Upsert one game_instances row per heartbeat instance entry. */
  private async upsertInstances(
    licenseId: string,
    host: AgentHost,
    payload: HeartbeatPayload,
    now: Date,
  ): Promise<void> {
    for (const inst of payload.instances ?? []) {
      if (!inst?.id || !inst?.game) continue;
      const fields = {
        host_id: host.id,
        game: inst.game,
        label: inst.label ?? null,
        state: inst.state ?? 'unknown',
        uptime_seconds: inst.uptime_seconds ?? 0,
        last_seen_at: now,
        updated_at: now,
      };
      const existing = await this.instanceRepository.findOne({
        where: { license_id: licenseId, agent_instance_id: inst.id },
      });
      if (existing) {
        await this.instanceRepository.update({ id: existing.id }, fields);
      } else {
        await this.instanceRepository.save(
          this.instanceRepository.create({
            license_id: licenseId,
            agent_instance_id: inst.id,
            ...fields,
          }),
        );
        this.logger.log(`instance '${inst.id}' (${inst.game}) registered for license ${licenseId}`);
      }
    }
  }

  private async onGoingOffline(licenseId: string): Promise<void> {
    if (!(await this.isValidTenant(licenseId))) return;

    const now = new Date();
    await this.connectionRepository.update(
      { license_id: licenseId },
      { connection_status: 'offline', updated_at: new Date() },
      { connection_status: 'offline', updated_at: now },
    );
    this.logger.log(`host agent for license ${licenseId} went offline (graceful beacon)`);
    await this.hostRepository.update(
      { license_id: licenseId },
      { status: 'offline', updated_at: now },
    );
    this.logger.log(`host(s) for license ${licenseId} went offline (graceful beacon)`);
  }

  /**
   * Heartbeats stopping must flip the panel to offline — an agent that
   * crashes or loses network never sends the goodbye beacon.
   * crashes or loses network never sends the goodbye beacon. Sweeps both the
   * legacy connection and fleet hosts.
   */
  @Interval(60_000)
  async sweepStaleConnections(): Promise<void> {
    const threshold = new Date(Date.now() - HostAgentConsumerService.OFFLINE_AFTER_MS);
    const result = await this.connectionRepository

    const conn = await this.connectionRepository
      .createQueryBuilder()
      .update(ServerConnection)
      .set({ connection_status: 'offline', updated_at: () => 'NOW()' })
@@ -117,8 +222,18 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
      .andWhere('companion_last_seen < :threshold', { threshold })
      .execute();

    if (result.affected) {
      this.logger.warn(`marked ${result.affected} stale host connection(s) offline`);
    const hosts = await this.hostRepository
      .createQueryBuilder()
      .update(AgentHost)
      .set({ status: 'offline', updated_at: () => 'NOW()' })
      .where('status = :connected', { connected: 'connected' })
      .andWhere('last_heartbeat_at IS NOT NULL')
      .andWhere('last_heartbeat_at < :threshold', { threshold })
      .execute();

    const affected = (conn.affected ?? 0) + (hosts.affected ?? 0);
    if (affected) {
      this.logger.warn(`marked ${affected} stale connection/host record(s) offline`);
    }
  }
@@ -132,7 +247,6 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
      this.warnUnknownOnce(licenseId, 'not a UUID');
      return false;
    }

    const cachedUntil = this.knownLicenses.get(licenseId);
    if (cachedUntil && cachedUntil > Date.now()) return true;
@@ -141,7 +255,6 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
      this.warnUnknownOnce(licenseId, 'no such license');
      return false;
    }

    this.knownLicenses.set(licenseId, Date.now() + HostAgentConsumerService.LICENSE_CACHE_TTL_MS);
    return true;
  }
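Two small rules drive the consumer above: the license id is taken from the second token of the NATS subject and must look like a UUID, and a host counts as stale once its last heartbeat is older than the offline window. The sketch below restates both as pure functions so they can be checked without a broker or database. `licenseIdFromSubject` and `isStale` are hypothetical names, and the four-token subject check is an assumption based on the `corrosion.*.host.heartbeat` pattern in the diff; the regex and the 180 000 ms window are copied from the service.

```typescript
// Copied from the service's static fields.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const OFFLINE_AFTER_MS = 180_000; // 3x the 60s default heartbeat

/**
 * Hypothetical helper: pull the license id out of a host subject such as
 * corrosion.<licenseId>.host.heartbeat, or return null if it is not a UUID.
 */
function licenseIdFromSubject(subject: string): string | null {
  const parts = subject.split('.');
  if (parts.length !== 4 || parts[0] !== 'corrosion') return null;
  return UUID_RE.test(parts[1]) ? parts[1] : null;
}

/**
 * Hypothetical helper mirroring the sweep's WHERE clause: a host is stale when
 * it has a heartbeat on record and that heartbeat is older than the window.
 */
function isStale(lastHeartbeat: Date | null, now: Date, offlineAfterMs = OFFLINE_AFTER_MS): boolean {
  if (lastHeartbeat === null) return false; // matches the IS NOT NULL guard
  return now.getTime() - lastHeartbeat.getTime() > offlineAfterMs;
}
```

With a 60-second sweep interval and a 180-second window, a host that stops sending heartbeats is marked offline somewhere between three and four minutes after its last one.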
@@ -1,6 +1,14 @@
import { Injectable, OnModuleInit, OnModuleDestroy, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { connect, NatsConnection, StringCodec, Subscription } from 'nats';
import { createHmac, randomUUID } from 'crypto';

export interface AgentCredentials {
  license_id: string;
  nats_user: string;
  nats_password: string;
  nats_url: string;
}

@Injectable()
export class NatsService implements OnModuleInit, OnModuleDestroy {
@@ -13,8 +21,13 @@ export class NatsService implements OnModuleInit, OnModuleDestroy {
  async onModuleInit() {
    try {
      const url = this.config.get<string>('nats.url') || 'nats://localhost:4222';
      this.nc = await connect({ servers: url });
      this.logger.log(`Connected to NATS at ${url}`);
      const user = this.config.get<string>('nats.internalUser');
      const pass = this.config.get<string>('nats.internalPassword');
      // Authenticate with the privileged internal user when configured;
      // otherwise connect anonymously (broker hasn't enforced auth yet).
      const opts = user && pass ? { servers: url, user, pass } : { servers: url };
      this.nc = await connect(opts);
      this.logger.log(`Connected to NATS at ${url}${user ? ` as ${user}` : ' (anonymous)'}`);
    } catch (err) {
      this.logger.warn(`NATS connection failed — running in offline mode: ${(err as Error).message}`);
    }
@@ -62,6 +75,64 @@ export class NatsService implements OnModuleInit, OnModuleDestroy {
    return sub;
  }

  /**
   * Request-reply to a host-agent subject with a LICENSE-SCOPED reply subject.
   *
   * Per-license agent users are confined to corrosion.{license}.> and have no
   * _INBOX permission, so the agent cannot publish a reply to the default
   * global inbox. The reply must live inside the license namespace
   * (corrosion.{license}.reply.<id>); the privileged backend subscribes there.
   * See corrosion-host-agent/PROTOCOL.md ("Reply-subject rule").
   */
  async requestScoped<T = unknown>(
    licenseId: string,
    subject: string,
    payload: Record<string, unknown>,
    timeoutMs = 8000,
  ): Promise<T> {
    if (!this.nc) {
      throw new Error('NATS unavailable — agent is not reachable');
    }
    const replySubject = `corrosion.${licenseId}.reply.${randomUUID()}`;
    const nc = this.nc;
    return new Promise<T>((resolve, reject) => {
      nc.subscribe(replySubject, {
        max: 1,
        timeout: timeoutMs,
        callback: (err, msg) => {
          if (err) {
            reject(new Error(`agent did not respond within ${timeoutMs}ms`));
            return;
          }
          try {
            resolve(JSON.parse(this.sc.decode(msg.data)) as T);
          } catch {
            resolve(this.sc.decode(msg.data) as unknown as T);
          }
        },
      });
      nc.publish(subject, this.sc.encode(JSON.stringify(payload)), { reply: replySubject });
    });
  }

  /**
   * Derive a license's agent NATS credentials. Password is
   * HMAC-SHA256(license_id, NATS_TOKEN_SECRET) — must match the broker config
   * generated by scripts/generate-nats-auth.mjs. Returns null if the secret
   * isn't configured (broker not yet enforcing auth).
   */
  getAgentCredentials(licenseId: string): AgentCredentials | null {
    const secret = this.config.get<string>('nats.tokenSecret');
    if (!secret) return null;
    const password = createHmac('sha256', secret).update(licenseId).digest('hex');
    return {
      license_id: licenseId,
      nats_user: licenseId,
      nats_password: password,
      nats_url: this.config.get<string>('nats.publicUrl') || 'nats://nats.corrosionmgmt.com:4222',
    };
  }

  /** Publish a command to a specific license's server */
  async sendServerCommand(licenseId: string, action: string, payload: Record<string, unknown> = {}): Promise<void> {
    await this.publish(`corrosion.${licenseId}.cmd.server`, {
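The two naming rules in `nats.service.ts` are easy to check on their own: the agent password is an HMAC-SHA256 keyed with the token secret over the license id, and every reply subject sits inside the license's own `corrosion.{license}.reply.` namespace. The sketch below reproduces both with Node's built-in `crypto` module. `deriveAgentPassword` and `scopedReplySubject` are hypothetical function names, and the secret and license id used in the usage lines are made-up sample values.

```typescript
import { createHmac, randomUUID } from 'crypto';

/**
 * Hypothetical helper: the password is the hex digest of an HMAC-SHA256 that
 * uses the token secret as the key and the license id as the message.
 */
function deriveAgentPassword(licenseId: string, tokenSecret: string): string {
  return createHmac('sha256', tokenSecret).update(licenseId).digest('hex');
}

/** Hypothetical helper: a one-off reply subject inside the license namespace. */
function scopedReplySubject(licenseId: string): string {
  return `corrosion.${licenseId}.reply.${randomUUID()}`;
}

// Sample values, invented for this sketch only.
const sampleLicense = '123e4567-e89b-12d3-a456-426614174000';
const sampleSecret = 'example-secret';

const password = deriveAgentPassword(sampleLicense, sampleSecret);
const replyTo = scopedReplySubject(sampleLicense);
```

Because the password is derived rather than stored, the backend and the broker configuration only agree as long as both are generated from the same secret; rotating the secret changes every license's password at once.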
102
backend/migrations/022_fleet_model.sql
Normal file
@@ -0,0 +1,102 @@
-- Fleet data model — License → Host → Instance (with optional Cluster)
--
-- ADDITIVE: existing server_connections / server_config / server_stats are
-- left untouched so the current single-server panel keeps working. The
-- host-agent consumer writes BOTH the legacy connection row and these fleet
-- tables during the transition; the panel migrates to the fleet tables in a
-- later phase.
--
-- Shape mirrors the host agent's wire protocol v2 heartbeat:
--   host{} block        → agent_hosts
--   instances[] entries → game_instances
-- Host metrics (CPU/RAM/disk) live on the HOST, not duplicated per instance.
--
-- Named `agent_hosts` (not `hosts`) to avoid collision with the existing B2B
-- `hosts` table (hosting-partner companies) — different concept entirely.

-----------------------------------------------------------
-- AGENT_HOSTS — one Corrosion host agent / one machine
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS agent_hosts (
    id                UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    license_id        UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
    -- Natural key until enrollment issues a stable host identity.
    hostname          VARCHAR(255) NOT NULL DEFAULT '',
    agent_version     VARCHAR(64),
    agent_commit      VARCHAR(64),
    os                VARCHAR(32),
    arch              VARCHAR(32),
    status            VARCHAR(20) NOT NULL DEFAULT 'offline'
                      CHECK (status IN ('connected', 'degraded', 'offline')),
    last_heartbeat_at TIMESTAMPTZ,
    cpu_percent       DOUBLE PRECISION,
    cpu_cores         INTEGER,
    mem_total_mb      BIGINT,
    mem_used_mb       BIGINT,
    uptime_seconds    BIGINT,
    disks             JSONB, -- [{ "mount": "/", "total_mb": n, "free_mb": n }]
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (license_id, hostname)
);
CREATE INDEX IF NOT EXISTS idx_agent_hosts_license ON agent_hosts(license_id);

-----------------------------------------------------------
-- INSTANCE CLUSTERS — optional grouping (Soulmask main/child, Dune battlegroup)
-- Reserved now; cluster logic ships with those game adapters.
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS instance_clusters (
    id         UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    license_id UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
    game       VARCHAR(32) NOT NULL,
    name       VARCHAR(255) NOT NULL,
    topology   VARCHAR(32), -- main_client | battlegroup
    config     JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_clusters_license ON instance_clusters(license_id);

-----------------------------------------------------------
-- GAME INSTANCES — one game server process / orchestrated unit.
-- The billing unit (plans count instances).
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS game_instances (
    id                UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    license_id        UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
    host_id           UUID REFERENCES agent_hosts(id) ON DELETE SET NULL,
    cluster_id        UUID REFERENCES instance_clusters(id) ON DELETE SET NULL,
    -- The agent's instance slug; the NATS subject segment.
    agent_instance_id VARCHAR(64) NOT NULL,
    game              VARCHAR(32) NOT NULL,
    label             VARCHAR(255),
    -- running | stopped | starting | stopping | crashed
    -- | configured | missing_root | unmanaged | unknown
    state             VARCHAR(32) NOT NULL DEFAULT 'unknown',
    root_path         TEXT,
    uptime_seconds    BIGINT NOT NULL DEFAULT 0,
    last_seen_at      TIMESTAMPTZ,
    created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    UNIQUE (license_id, agent_instance_id)
);
CREATE INDEX IF NOT EXISTS idx_instances_license ON game_instances(license_id);
CREATE INDEX IF NOT EXISTS idx_instances_host ON game_instances(host_id);

-----------------------------------------------------------
-- INSTANCE STATS — per-instance time series (game metrics).
-- Populated once game-level telemetry (player count/FPS via RCON/plugin) is
-- collected; the host heartbeat carries host metrics, not game metrics.
-----------------------------------------------------------
CREATE TABLE IF NOT EXISTS instance_stats (
    id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    instance_id     UUID NOT NULL REFERENCES game_instances(id) ON DELETE CASCADE,
    license_id      UUID NOT NULL REFERENCES licenses(id) ON DELETE CASCADE,
    player_count    INTEGER NOT NULL DEFAULT 0,
    max_players     INTEGER NOT NULL DEFAULT 0,
    fps             DOUBLE PRECISION NOT NULL DEFAULT 0,
    memory_usage_mb INTEGER NOT NULL DEFAULT 0,
    recorded_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_instance_stats_instance
    ON instance_stats(instance_id, recorded_at DESC);
484
corrosion-host-agent/Cargo.lock
generated
@@ -90,7 +90,7 @@ dependencies = [
|
||||
"nuid",
|
||||
"once_cell",
|
||||
"portable-atomic",
|
||||
"rand",
|
||||
"rand 0.8.6",
|
||||
"regex",
|
||||
"ring",
|
||||
"rustls-native-certs",
|
||||
@@ -100,7 +100,7 @@ dependencies = [
|
||||
"serde_json",
|
||||
"serde_nanos",
|
||||
"serde_repr",
|
||||
"thiserror",
|
||||
"thiserror 1.0.69",
|
||||
"time",
|
||||
"tokio",
|
||||
"tokio-rustls",
|
||||
@@ -110,6 +110,23 @@ dependencies = [
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "async-trait"
|
||||
version = "0.1.89"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9035ad2d096bed7955a320ee7e2230574d28fd3c3a0f186cbea1ff3c7eed5dbb"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "atomic-waker"
|
||||
version = "1.1.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1505bd5d3d116872e7271a6d4e16d81d0c8570876c8de68093a09ac269d8aac0"
|
||||
|
||||
[[package]]
|
||||
name = "autocfg"
|
||||
version = "1.5.1"
|
||||
@@ -180,6 +197,12 @@ version = "1.0.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
|
||||
|
||||
[[package]]
|
||||
name = "cfg_aliases"
|
||||
version = "0.2.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "613afe47fcd5fac7ccf1db93babcb082c5994d996f20b8b159f2ad1658eb5724"
|
||||
|
||||
[[package]]
|
||||
name = "chrono"
|
||||
version = "0.4.45"
|
||||
@@ -264,15 +287,18 @@ checksum = "773648b94d0e5d620f64f280777445740e61fe701025087ec8b57f45c791888b"
|
||||
|
||||
[[package]]
|
||||
name = "corrosion-host-agent"
|
||||
version = "2.0.0-alpha.3"
|
||||
version = "2.0.0-alpha.9"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"async-nats",
|
||||
"async-trait",
|
||||
"chrono",
|
||||
"clap",
|
||||
"futures",
|
||||
"libc",
|
||||
"rand",
|
||||
"minisign-verify",
|
||||
"rand 0.8.6",
|
||||
"reqwest",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"sysinfo",
|
||||
@@ -585,8 +611,24 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ff2abc00be7fca6ebc474524697ae276ad847ad0a6b3faa4bcb027e9a4614ad0"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"js-sys",
|
||||
"libc",
|
||||
"wasi",
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "getrandom"
|
||||
version = "0.3.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"js-sys",
|
||||
"libc",
|
||||
"r-efi 5.3.0",
|
||||
"wasip2",
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -597,7 +639,7 @@ checksum = "0de51e6874e94e7bf76d726fc5d13ba782deca734ff60d5bb2fb2607c7406555"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"libc",
|
||||
"r-efi",
|
||||
"r-efi 6.0.0",
|
||||
"wasip2",
|
||||
"wasip3",
|
||||
]
|
||||
@@ -633,12 +675,94 @@ dependencies = [
|
||||
"itoa",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "http-body"
|
||||
version = "1.0.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"http",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "http-body-util"
|
||||
version = "0.1.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"futures-core",
|
||||
"http",
|
||||
"http-body",
|
||||
"pin-project-lite",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "httparse"
|
||||
version = "1.10.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "6dbf3de79e51f3d586ab4cb9d5c3e2c14aa28ed23d180cf89b4df0454a69cc87"
|
||||
|
||||
[[package]]
|
||||
name = "hyper"
|
||||
version = "1.10.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "55281c53a1894c864990125767da440a4e630446785086f52523b20033b74498"
|
||||
dependencies = [
|
||||
"atomic-waker",
|
||||
"bytes",
|
||||
"futures-channel",
|
||||
"futures-core",
|
||||
"http",
|
||||
"http-body",
|
||||
"httparse",
|
||||
"itoa",
|
||||
"pin-project-lite",
|
||||
"smallvec",
|
||||
"tokio",
|
||||
"want",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "hyper-rustls"
|
||||
version = "0.27.9"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "33ca68d021ef39cf6463ab54c1d0f5daf03377b70561305bb89a8f83aab66e0f"
|
||||
dependencies = [
|
||||
"http",
|
||||
"hyper",
|
||||
"hyper-util",
|
||||
"rustls",
|
||||
"tokio",
|
||||
"tokio-rustls",
|
||||
"tower-service",
|
||||
"webpki-roots",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "hyper-util"
|
||||
version = "0.1.20"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
|
||||
dependencies = [
|
||||
"base64",
|
||||
"bytes",
|
||||
"futures-channel",
|
||||
"futures-util",
|
||||
"http",
|
||||
"http-body",
|
||||
"hyper",
|
||||
"ipnet",
|
||||
"libc",
|
||||
"percent-encoding",
|
||||
"pin-project-lite",
|
||||
"socket2",
|
||||
"tokio",
|
||||
"tower-service",
|
||||
"tracing",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "iana-time-zone"
|
||||
version = "0.1.65"
|
||||
@@ -784,6 +908,12 @@ dependencies = [
|
||||
"serde_core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ipnet"
|
||||
version = "2.12.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d98f6fed1fde3f8c21bc40a1abb88dd75e67924f9cffc3ef95607bad8017f8e2"
|
||||
|
||||
[[package]]
|
||||
name = "is_terminal_polyfill"
|
||||
version = "1.70.2"
|
||||
@@ -852,6 +982,12 @@ version = "0.4.32"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a"
|
||||
|
||||
[[package]]
|
||||
name = "lru-slab"
|
||||
version = "0.1.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "112b39cec0b298b6c1999fee3e31427f74f676e4cb9879ed1a121b43661a4154"
|
||||
|
||||
[[package]]
|
||||
name = "matchers"
|
||||
version = "0.2.0"
|
||||
@@ -867,6 +1003,12 @@ version = "2.8.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "6b947ae49db0d222b1dbc6b113ce7248a3fc3a6ca21b696717bfc000ba4484d8"
|
||||
|
||||
[[package]]
|
||||
name = "minisign-verify"
|
||||
version = "0.2.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "22f9645cb765ea72b8111f36c522475d2daa0d22c957a9826437e97534bc4e9e"
|
||||
|
||||
[[package]]
|
||||
name = "mio"
|
||||
version = "1.2.1"
|
||||
@@ -889,7 +1031,7 @@ dependencies = [
|
||||
"ed25519-dalek",
|
||||
"getrandom 0.2.17",
|
||||
"log",
|
||||
"rand",
|
||||
"rand 0.8.6",
|
||||
"signatory",
|
||||
]
|
||||
|
||||
@@ -917,7 +1059,7 @@ version = "0.5.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "fc895af95856f929163a0aa20c26a78d26bfdc839f51b9d5aa7a5b79e52b7e83"
|
||||
dependencies = [
|
||||
"rand",
|
||||
"rand 0.8.6",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -1056,6 +1198,61 @@ dependencies = [
|
||||
"unicode-ident",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "quinn"
|
||||
version = "0.11.9"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b9e20a958963c291dc322d98411f541009df2ced7b5a4f2bd52337638cfccf20"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"cfg_aliases",
|
||||
"pin-project-lite",
|
||||
"quinn-proto",
|
||||
"quinn-udp",
|
||||
"rustc-hash",
|
||||
"rustls",
|
||||
"socket2",
|
||||
"thiserror 2.0.18",
|
||||
"tokio",
|
||||
"tracing",
|
||||
"web-time",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "quinn-proto"
|
||||
version = "0.11.14"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "434b42fec591c96ef50e21e886936e66d3cc3f737104fdb9b737c40ffb94c098"
|
||||
dependencies = [
|
||||
"bytes",
|
||||
"getrandom 0.3.4",
|
||||
"lru-slab",
|
||||
"rand 0.9.4",
|
||||
"ring",
|
||||
"rustc-hash",
|
||||
"rustls",
|
||||
"rustls-pki-types",
|
||||
"slab",
|
||||
"thiserror 2.0.18",
|
||||
"tinyvec",
|
||||
"tracing",
|
||||
"web-time",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "quinn-udp"
|
||||
version = "0.5.14"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "addec6a0dcad8a8d96a771f815f0eaf55f9d1805756410b39f5fa81332574cbd"
|
||||
dependencies = [
|
||||
"cfg_aliases",
|
||||
"libc",
|
||||
"once_cell",
|
||||
"socket2",
|
||||
"tracing",
|
||||
"windows-sys 0.52.0",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "quote"
|
||||
version = "1.0.45"
|
||||
@@ -1065,6 +1262,12 @@ dependencies = [
|
||||
"proc-macro2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "r-efi"
|
||||
version = "5.3.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f"
|
||||
|
||||
[[package]]
|
||||
name = "r-efi"
|
||||
version = "6.0.0"
|
||||
@@ -1078,8 +1281,18 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5ca0ecfa931c29007047d1bc58e623ab12e5590e8c7cc53200d5202b69266d8a"
|
||||
dependencies = [
|
||||
"libc",
|
||||
"rand_chacha",
|
||||
"rand_core",
|
||||
"rand_chacha 0.3.1",
|
||||
"rand_core 0.6.4",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rand"
|
||||
version = "0.9.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "44c5af06bb1b7d3216d91932aed5265164bf384dc89cd6ba05cf59a35f5f76ea"
|
||||
dependencies = [
|
||||
"rand_chacha 0.9.0",
|
||||
"rand_core 0.9.5",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -1089,7 +1302,17 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88"
|
||||
dependencies = [
|
||||
"ppv-lite86",
|
||||
"rand_core",
|
||||
"rand_core 0.6.4",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rand_chacha"
|
||||
version = "0.9.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d3022b5f1df60f26e1ffddd6c66e8aa15de382ae63b3a0c1bfc0e4d3e3f325cb"
|
||||
dependencies = [
|
||||
"ppv-lite86",
|
||||
"rand_core 0.9.5",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -1101,6 +1324,15 @@ dependencies = [
|
||||
"getrandom 0.2.17",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rand_core"
|
||||
version = "0.9.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "76afc826de14238e6e8c374ddcc1fa19e374fd8dd986b0d2af0d02377261d83c"
|
||||
dependencies = [
|
||||
"getrandom 0.3.4",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rayon"
|
||||
version = "1.12.0"
|
||||
@@ -1159,6 +1391,47 @@ version = "0.8.11"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4"
|
||||
|
||||
[[package]]
|
||||
name = "reqwest"
|
||||
version = "0.12.28"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "eddd3ca559203180a307f12d114c268abf583f59b03cb906fd0b3ff8646c1147"
|
||||
dependencies = [
|
||||
"base64",
|
||||
"bytes",
|
||||
"futures-core",
|
||||
"futures-util",
|
||||
"http",
|
||||
"http-body",
|
||||
"http-body-util",
|
||||
"hyper",
|
||||
"hyper-rustls",
|
||||
"hyper-util",
|
||||
"js-sys",
|
||||
"log",
|
||||
"percent-encoding",
|
||||
"pin-project-lite",
|
||||
"quinn",
|
||||
"rustls",
|
||||
"rustls-pki-types",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"serde_urlencoded",
|
||||
"sync_wrapper",
|
||||
"tokio",
|
||||
"tokio-rustls",
|
||||
"tokio-util",
|
||||
"tower",
|
||||
"tower-http",
|
||||
"tower-service",
|
||||
"url",
|
||||
"wasm-bindgen",
|
||||
"wasm-bindgen-futures",
|
||||
"wasm-streams",
|
||||
"web-sys",
|
||||
"webpki-roots",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "ring"
|
||||
version = "0.17.14"
|
||||
@@ -1173,6 +1446,12 @@ dependencies = [
|
||||
"windows-sys 0.52.0",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "rustc-hash"
|
||||
version = "2.1.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "94300abf3f1ae2e2b8ffb7b58043de3d399c73fa6f4b73826402a5c457614dbe"
|
||||
|
||||
[[package]]
|
||||
name = "rustc_version"
|
||||
version = "0.4.1"
|
||||
@@ -1237,6 +1516,7 @@ version = "1.14.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "30a7197ae7eb376e574fe940d068c30fe0462554a3ddbe4eca7838e049c937a9"
|
||||
dependencies = [
|
||||
"web-time",
|
||||
"zeroize",
|
||||
]
|
||||
|
||||
@@ -1268,6 +1548,12 @@ version = "1.0.22"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
|
||||
|
||||
[[package]]
|
||||
name = "ryu"
|
||||
version = "1.0.23"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9774ba4a74de5f7b1c1451ed6cd5285a32eddb5cccb8cc655a4e50009e06477f"
|
||||
|
||||
[[package]]
|
||||
name = "schannel"
|
||||
version = "0.1.29"
|
||||
@@ -1384,6 +1670,18 @@ dependencies = [
|
||||
"serde",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "serde_urlencoded"
|
||||
version = "0.7.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d3491c14715ca2294c4d6a88f15e84739788c1d030eed8c110436aafdaa2f3fd"
|
||||
dependencies = [
|
||||
"form_urlencoded",
|
||||
"itoa",
|
||||
"ryu",
|
||||
"serde",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sha1"
|
||||
version = "0.10.6"
|
||||
@@ -1438,7 +1736,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "c1e303f8205714074f6068773f0e29527e0453937fe837c9717d066635b65f31"
|
||||
dependencies = [
|
||||
"pkcs8",
|
||||
"rand_core",
|
||||
"rand_core 0.6.4",
|
||||
"signature",
|
||||
"zeroize",
|
||||
]
|
||||
@@ -1450,7 +1748,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "77549399552de45a898a580c1b41d445bf730df867cc44e6c0233bbc4b8329de"
|
||||
dependencies = [
|
||||
"digest",
|
||||
"rand_core",
|
||||
"rand_core 0.6.4",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -1514,6 +1812,15 @@ dependencies = [
|
||||
"unicode-ident",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "sync_wrapper"
|
||||
version = "1.0.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "0bf256ce5efdfa370213c1dabab5935a12e49f2c58d15e9eac2870d3b4f27263"
|
||||
dependencies = [
|
||||
"futures-core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "synstructure"
|
||||
version = "0.13.2"
|
||||
@@ -1558,7 +1865,16 @@ version = "1.0.69"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b6aaf5339b578ea85b50e080feb250a3e8ae8cfcdff9a461c9ec2904bc923f52"
|
||||
dependencies = [
|
||||
"thiserror-impl",
|
||||
"thiserror-impl 1.0.69",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "thiserror"
|
||||
version = "2.0.18"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "4288b5bcbc7920c07a1149a35cf9590a2aa808e0bc1eafaade0b80947865fbc4"
|
||||
dependencies = [
|
||||
"thiserror-impl 2.0.18",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -1572,6 +1888,17 @@ dependencies = [
|
||||
"syn",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "thiserror-impl"
|
||||
version = "2.0.18"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ebc4ee7f67670e9b64d05fa4253e753e016c6c95ff35b89b7941d6b856dec1d5"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
"syn",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "thread_local"
|
||||
version = "1.1.9"
|
||||
@@ -1622,6 +1949,21 @@ dependencies = [
|
||||
"zerovec",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tinyvec"
|
||||
version = "1.11.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "3e61e67053d25a4e82c844e8424039d9745781b3fc4f32b8d55ed50f5f667ef3"
|
||||
dependencies = [
|
||||
"tinyvec_macros",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tinyvec_macros"
|
||||
version = "0.1.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
|
||||
|
||||
[[package]]
|
||||
name = "tokio"
|
||||
version = "1.52.3"
|
||||
@@ -1727,6 +2069,51 @@ version = "0.1.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801"
|
||||
|
||||
[[package]]
|
||||
name = "tower"
|
||||
version = "0.5.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ebe5ef63511595f1344e2d5cfa636d973292adc0eec1f0ad45fae9f0851ab1d4"
|
||||
dependencies = [
|
||||
"futures-core",
|
||||
"futures-util",
|
||||
"pin-project-lite",
|
||||
"sync_wrapper",
|
||||
"tokio",
|
||||
"tower-layer",
|
||||
"tower-service",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tower-http"
|
||||
version = "0.6.11"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "4cfcf7e2740e6fc6d4d688b4ef00650406bb94adf4731e43c096c3a19fe40840"
|
||||
dependencies = [
|
||||
"bitflags",
|
||||
"bytes",
|
||||
"futures-util",
|
||||
"http",
|
||||
"http-body",
|
||||
"pin-project-lite",
|
||||
"tower",
|
||||
"tower-layer",
|
||||
"tower-service",
|
||||
"url",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tower-layer"
|
||||
version = "0.3.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "121c2a6cda46980bb0fcd1647ffaf6cd3fc79a013de288782836f6df9c48780e"
|
||||
|
||||
[[package]]
|
||||
name = "tower-service"
|
||||
version = "0.3.3"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "8df9b6e13f2d32c91b9bd719c00d1958837bc7dec474d94952798cc8e69eeec3"
|
||||
|
||||
[[package]]
|
||||
name = "tracing"
|
||||
version = "0.1.44"
|
||||
@@ -1788,6 +2175,12 @@ dependencies = [
|
||||
"tracing-log",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "try-lock"
|
||||
version = "0.2.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e421abadd41a4225275504ea4d6566923418b7f05506fbc9c0fe86ba7396114b"
|
||||
|
||||
[[package]]
|
||||
name = "tryhard"
|
||||
version = "0.5.2"
|
||||
@@ -1810,9 +2203,9 @@ dependencies = [
|
||||
"http",
|
||||
"httparse",
|
||||
"log",
|
||||
"rand",
|
||||
"rand 0.8.6",
|
||||
"sha1",
|
||||
"thiserror",
|
||||
"thiserror 1.0.69",
|
||||
"utf-8",
|
||||
]
|
||||
|
||||
@@ -1882,6 +2275,15 @@ version = "0.9.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
|
||||
|
||||
[[package]]
|
||||
name = "want"
|
||||
version = "0.3.1"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "bfa7760aed19e106de2c7c0b581b509f2f25d3dacaf737cb82ac61bc6d760b0e"
|
||||
dependencies = [
|
||||
"try-lock",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wasi"
|
||||
version = "0.11.1+wasi-snapshot-preview1"
|
||||
@@ -1919,6 +2321,16 @@ dependencies = [
|
||||
"wasm-bindgen-shared",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wasm-bindgen-futures"
|
||||
version = "0.4.73"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "54568702fabf5d4849ce2b90fadfa64168a097eaf4b351ce9df8b687a0086aaf"
|
||||
dependencies = [
|
||||
"js-sys",
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wasm-bindgen-macro"
|
||||
version = "0.2.123"
|
||||
@@ -1973,6 +2385,19 @@ dependencies = [
|
||||
"wasmparser",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wasm-streams"
|
||||
version = "0.4.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "15053d8d85c7eccdbefef60f06769760a563c7f0a9d6902a13d35c7800b0ad65"
|
||||
dependencies = [
|
||||
"futures-util",
|
||||
"js-sys",
|
||||
"wasm-bindgen",
|
||||
"wasm-bindgen-futures",
|
||||
"web-sys",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "wasmparser"
|
||||
version = "0.244.0"
|
||||
@@ -1985,6 +2410,35 @@ dependencies = [
|
||||
"semver",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "web-sys"
|
||||
version = "0.3.100"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "6e0871acf327f283dc6da28a1696cdc64fb355ba9f935d052021fa77f35cce69"
|
||||
dependencies = [
|
||||
"js-sys",
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "web-time"
|
||||
version = "1.1.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5a6580f308b1fad9207618087a65c04e7a10bc77e02c8e84e9b00dd4b12fa0bb"
|
||||
dependencies = [
|
||||
"js-sys",
|
||||
"wasm-bindgen",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "webpki-roots"
|
||||
version = "1.0.7"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "52f5ee44c96cf55f1b349600768e3ece3a8f26010c05265ab73f945bb1a2eb9d"
|
||||
dependencies = [
|
||||
"rustls-pki-types",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "winapi"
|
||||
version = "0.3.9"
|
||||
|
||||
@@ -1,6 +1,6 @@
 [package]
 name = "corrosion-host-agent"
-version = "2.0.0-alpha.3"
+version = "2.0.0-alpha.9"
 edition = "2021"
 description = "Corrosion Host Agent — multi-game ops runtime for self-hosted game servers"
 license = "UNLICENSED"
@@ -23,9 +23,12 @@ chrono = { version = "0.4", features = ["serde", "clock"] }
 tracing = "0.1"
 tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
 anyhow = "1"
+async-trait = "0.1"
 clap = { version = "4.5", features = ["derive"] }
 rand = "0.8"
 tokio-tungstenite = "0.24"
+minisign-verify = "0.2.5"
+reqwest = { version = "0.12", default-features = false, features = ["rustls-tls", "stream"] }
 
 [target.'cfg(unix)'.dependencies]
 libc = "0.2"
@@ -85,6 +85,7 @@ Request: `{ "func": "<name>" }`. Reply: `{ "status": "success" | "error", ... }`
 | `ping` | `version`, `commit`, `uptime_seconds` |
 | `probe` | `report` — fresh ProbeReport (also cached for heartbeat) |
 | `sysinfo` | `snapshot` — full heartbeat payload, collected on demand |
+| `update` | `{ "func": "update", "url": "https://cdn.corrosionmgmt.com/host-agent/.../corrosion-host-agent-<plat>" }` → downloads the binary + `<url>.minisig`, verifies the minisign signature against the agent's EMBEDDED public key, atomically swaps (with `.old` rollback), replies `{ status: success, message: "...relaunching" }`, then relaunches the new binary. Rejects anything not signed by the release key and any URL that isn't `https://cdn.corrosionmgmt.com`. |
 
 Unknown funcs return `status: "error"` with a message listing supported funcs.
 
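The `update` row in the hunk above fixes both the request shape and the allowed download origin. A minimal TypeScript sketch of how a requester might build that payload and pre-check the URL could look like the following. `buildUpdateRequest`, `signatureUrl`, and `ALLOWED_UPDATE_ORIGIN` are hypothetical names introduced for illustration; the authoritative check remains the one inside the agent.

```typescript
// Assumption: the only accepted origin, taken from the `update` row above.
const ALLOWED_UPDATE_ORIGIN = "https://cdn.corrosionmgmt.com";

// Request shape as documented: { "func": "update", "url": "<https url>" }.
interface UpdateRequest {
  func: "update";
  url: string;
}

// Hypothetical requester-side guard. It rejects malformed URLs (the URL
// constructor throws) and any origin other than the release CDN.
function buildUpdateRequest(url: string): UpdateRequest {
  const parsed = new URL(url);
  if (parsed.origin !== ALLOWED_UPDATE_ORIGIN) {
    throw new Error(`update URL must be served from ${ALLOWED_UPDATE_ORIGIN}`);
  }
  return { func: "update", url };
}

// The detached signature is fetched from "<url>.minisig", per the row above.
function signatureUrl(url: string): string {
  return `${url}.minisig`;
}
```

Comparing `origin` rather than a string prefix means a look-alike host such as `cdn.corrosionmgmt.com.evil.example` is rejected as well as plain `http://`.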
@@ -100,8 +101,16 @@ Payload: `{}`.
 
 Lifecycle and control for one game instance.
 
+The same `start`/`stop`/`restart`/`status` funcs work for **every** game: the
+agent picks a `Supervisor` impl per game — a spawned-process supervisor for
+Rust/Conan/Soulmask, a **docker-compose supervisor for Dune** (`docker compose
+up -d` / `stop` / `restart` against the instance's compose project, configured
+via `[instance.docker_compose]`). The wire contract is identical; only the
+management model behind it differs.
+
 Implemented funcs: `start`, `stop` (graceful with 30s budget, then force
-kill), `restart`, `status` (returns `state` + `uptime_seconds`), and
+kill — process supervisor; Dune maps stop to `docker compose stop`), `restart`,
+`status` (returns `state` + `uptime_seconds`), and
 `rcon` — `{ "func": "rcon", "command": "<console command>" }` returns
 `{ "status": "success", "output": <server response> }`. Protocol per game:
 WebRCON (WebSocket JSON) for rust, Source RCON (Valve TCP) for
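The per-game supervisor selection described in the hunk above can be pictured with a small TypeScript sketch. This is an illustration only: the agent itself is written in Rust, and the game slugs, `supervisorKindFor`, and `composeArgsFor` are assumed names rather than the agent's actual API.

```typescript
// Assumed game slugs; the hunk names Rust, Conan, Soulmask and Dune.
type Game = "rust" | "conan" | "soulmask" | "dune";
type SupervisorKind = "process" | "docker_compose";
type LifecycleFunc = "start" | "stop" | "restart";

// Dune is managed through its compose project; every other game runs as a
// spawned process. The wire funcs are the same for both kinds.
function supervisorKindFor(game: Game): SupervisorKind {
  return game === "dune" ? "docker_compose" : "process";
}

// Maps a lifecycle func to the `docker` arguments quoted in the hunk:
// `docker compose up -d` / `stop` / `restart`.
function composeArgsFor(func: LifecycleFunc): string[] {
  switch (func) {
    case "start":
      return ["compose", "up", "-d"];
    case "stop":
      return ["compose", "stop"];
    case "restart":
      return ["compose", "restart"];
  }
}
```

Keeping the mapping behind one function is what lets the request side stay unaware of which management model a given instance uses.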
@@ -117,7 +126,10 @@ streaming progress lines to `corrosion.{license}.{instance}.steam_status`
|
||||
and replying on completion.
|
||||
|
||||
Planned funcs: `oxide_install` (rust), plus game-adapter-specific
|
||||
commands (Dune: docker lifecycle, RabbitMQ bus commands, Coriolis reset).
|
||||
commands (Dune: RabbitMQ admin-bus commands, Coriolis reset, Postgres admin
|
||||
surface). Dune **lifecycle** is already covered by the shared
|
||||
start/stop/restart funcs above; container crash-detection and state adoption on
|
||||
agent restart land with Phase 3b.
|
||||
|
||||
### `corrosion.{license_id}.{instance_id}.steam_status` (agent → backend, publish) — LIVE
|
||||
|
||||
@@ -179,6 +191,23 @@ service that attempts connections to the customer's public IP/ports on
|
||||
request; that is specified as a Phase 1+ feature and will reuse this report
|
||||
format with `direction: "inbound"`.
|
||||
|
||||
## Authentication & tenant isolation
|
||||
|
||||
The broker enforces per-license auth: an agent connects with `user = license_id`,
|
||||
`password = HMAC-SHA256(license_id, NATS_TOKEN_SECRET)` (shown on the panel
|
||||
Server page), and is scoped to `corrosion.{license_id}.>` only. The backend uses
|
||||
a privileged internal user. This makes cross-tenant access impossible at the
|
||||
broker, not just by convention.
|
||||
|
||||
**Reply-subject rule:** per-license users have NO `_INBOX` permission (granting
it would let one license read another's request-reply traffic). Therefore any
backend→agent request-reply MUST use a reply subject inside the license
namespace (e.g. `corrosion.{license_id}.reply.<id>`), never the client's
default global `_INBOX`. The agent is unaffected: it responds to whatever
`msg.reply` it receives. The constraint is on the requester (the internal user
has full access). The contract/CI tests run against an unauthenticated broker
and use the default inbox; production request-reply must follow this rule.
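
A requester following the reply-subject rule might build its reply subject roughly as below. This is a sketch under assumptions: the helper names and the process-local counter are hypothetical, and a real backend would more likely use a random, unguessable id. With `async_nats` the requester would subscribe to this subject first and then send the command with an explicit reply subject (for example through `publish_with_reply`) rather than using the default inbox-based request.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for a unique request id; an assumption made for this sketch only.
static NEXT_REPLY_ID: AtomicU64 = AtomicU64::new(1);

/// Hypothetical helper: build a reply subject inside the license namespace,
/// following the `corrosion.{license_id}.reply.<id>` shape.
fn license_reply_subject(license_id: &str) -> String {
    let id = NEXT_REPLY_ID.fetch_add(1, Ordering::Relaxed);
    format!("corrosion.{license_id}.reply.{id}")
}

/// True when `reply` stays inside the namespace of `license_id`, so a
/// per-license user is permitted to publish its response there.
fn reply_is_license_scoped(license_id: &str, reply: &str) -> bool {
    let prefix = format!("corrosion.{license_id}.");
    reply
        .strip_prefix(prefix.as_str())
        .map_or(false, |rest| !rest.is_empty())
}

fn main() {
    let reply = license_reply_subject("lic-a");
    println!("reply subject: {reply}");
    println!("scoped: {}", reply_is_license_scoped("lic-a", &reply));
    // The client's default inbox is outside every license namespace.
    println!("inbox scoped: {}", reply_is_license_scoped("lic-a", "_INBOX.abc123"));
}
```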
|
||||
|
||||
## Versioning
|
||||
|
||||
- The agent embeds semver + git hash + build timestamp (`--version`,
|
||||
|
||||
@@ -20,8 +20,11 @@ instance on that host — Rust, Conan Exiles, Soulmask, Dune: Awakening.
|
||||
crash detection with exit codes, live state in heartbeats
|
||||
(integration-tested with real processes + live-NATS contract test)
|
||||
- [ ] Phase 1b: RCON trait (WebRCON rust / TCP conan+soulmask), SteamCMD, jailed file manager
|
||||
- [ ] Phase 2: Dune Docker adapter (compose lifecycle, RabbitMQ bus, Postgres admin)
|
||||
- [ ] Phase 3: signed self-update (enforced ed25519 — release gate), service install, supervisor split
|
||||
- [~] Phase 2: Dune Docker adapter: **compose lifecycle done** (`docker compose up -d/stop/restart`
via the `Supervisor` trait + `DockerComposeSupervisor`); RabbitMQ admin bus + Postgres admin
surface deferred. Container crash-detection + state adoption on agent restart land with Phase 3b.
- [x] Phase 3a: SIGNED self-update: minisign-verified download+swap+relaunch (NATS `update` func); embedded public key; CI signs releases
- [ ] Phase 3b: service install (systemd/SCM), PID adoption
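
As a rough illustration of the compose lifecycle item, the sketch below shows how an invocation such as `docker compose up -d` can be assembled: global flags (`-f`, `-p`) come before the subcommand, and an optional single service goes last. `ComposeInvocation` and `argv` are hypothetical names for this example; they mirror the fields described for `DockerComposeSupervisor` but are not its actual API.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the per-instance compose settings.
struct ComposeInvocation {
    /// Compose binary plus leading args, e.g. ["docker", "compose"].
    command: Vec<String>,
    compose_file: Option<PathBuf>,
    project: String,
    service: Option<String>,
}

impl ComposeInvocation {
    /// Build the full argument vector for one lifecycle action.
    fn argv(&self, action: &str, action_args: &[&str]) -> Vec<String> {
        let mut argv = self.command.clone();
        // Global flags must precede the subcommand.
        if let Some(file) = &self.compose_file {
            argv.push("-f".to_string());
            argv.push(file.display().to_string());
        }
        argv.push("-p".to_string());
        argv.push(self.project.clone());
        argv.push(action.to_string());
        argv.extend(action_args.iter().map(|a| a.to_string()));
        // Limiting the action to one service: the service name goes last.
        if let Some(service) = &self.service {
            argv.push(service.clone());
        }
        argv
    }
}

fn main() {
    let inv = ComposeInvocation {
        command: vec!["docker".to_string(), "compose".to_string()],
        compose_file: Some(PathBuf::from("docker-compose.yml")),
        project: "dune-main".to_string(),
        service: Some("gameserver".to_string()),
    };
    // prints "docker compose -f docker-compose.yml -p dune-main up -d gameserver"
    println!("{}", inv.argv("up", &["-d"]).join(" "));
}
```

Keeping the argument assembly separate from process spawning makes the flag ordering easy to unit-test without Docker installed.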
|
||||
|
||||
## Build
|
||||
|
||||
|
||||
@@ -9,7 +9,11 @@
|
||||
[agent]
|
||||
license_id = "your-license-uuid"
|
||||
nats_url = "nats://nats.corrosionmgmt.com:4222"
|
||||
# nats_token = "set-me-or-use-CORROSION_NATS_TOKEN"
|
||||
# Per-license auth (preferred): user = license id, password = the token shown
|
||||
# on the panel Server page. The broker scopes you to corrosion.{license}.>
|
||||
# nats_user = "your-license-uuid" # defaults to license_id if omitted
|
||||
# nats_password = "set-me-or-use-CORROSION_NATS_PASSWORD"
|
||||
# nats_token = "legacy token-only auth; use nats_password instead"
|
||||
heartbeat_seconds = 60
|
||||
log_level = "info"
|
||||
|
||||
@@ -56,6 +60,24 @@ password = "changeme"
|
||||
# Dune instances do not use SteamCMD (Docker images); the steam_update func
|
||||
# will return a clear error if invoked on a dune instance.
|
||||
|
||||
# --- Dune: Awakening (container-managed) ---------------------------------
# Dune runs as a docker-compose stack, not a spawned process: leave
# `executable` unset and add an [instance.docker_compose] block. The agent
# drives `docker compose up -d / stop / restart` for start/stop/restart, and
# `steam_update` is rejected (Dune ships as Docker images).
#
# [[instance]]
# id = "dune-main"
# game = "dune"
# root = "/opt/dune"               # directory the compose commands run in
# label = "Arrakis (battlegroup)"
#
# [instance.docker_compose]
# file = "docker-compose.yml"      # -f; relative to root. Omit to use compose's discovery
# project = "dune-main"            # -p; defaults to the instance id
# service = "gameserver"           # limit lifecycle to one service; omit for the whole stack
# command = ["docker", "compose"]  # default; use ["docker-compose"] for the legacy binary
|
||||
|
||||
[prober]
|
||||
interval_seconds = 300
|
||||
|
||||
|
||||
@@ -7,16 +7,17 @@ use tokio::sync::RwLock;
|
||||
use tokio_util::sync::CancellationToken;
|
||||
|
||||
use crate::config::Settings;
|
||||
use crate::process::ProcessSupervisor;
|
||||
use crate::prober::ProbeReport;
|
||||
use crate::supervisor::Supervisor;
|
||||
|
||||
pub struct Agent {
|
||||
pub cfg: Settings,
|
||||
pub nats: async_nats::Client,
|
||||
pub started: Instant,
|
||||
pub last_probe: RwLock<Option<ProbeReport>>,
|
||||
/// One supervisor per instance (unmanaged instances included — they
|
||||
/// report `unmanaged` state and reject process commands).
|
||||
pub supervisors: HashMap<String, Arc<ProcessSupervisor>>,
|
||||
/// One supervisor per instance, keyed by instance id. The concrete impl
|
||||
/// (process vs docker-compose) is chosen per game by the factory in main;
|
||||
/// every subsystem talks to the `Supervisor` trait only.
|
||||
pub supervisors: HashMap<String, Arc<dyn Supervisor>>,
|
||||
pub shutdown: CancellationToken,
|
||||
}
|
||||
|
||||
@@ -33,7 +33,15 @@ pub async fn connect(cfg: &Settings) -> Result<async_nats::Client> {
|
||||
if force_tls {
|
||||
opts = opts.require_tls(true);
|
||||
}
|
||||
if let Some(token) = &cfg.nats_token {
|
||||
|
||||
// Per-license auth: the broker maps user=license_id, password=derived
|
||||
// token to permissions scoped to corrosion.{license_id}.>. Falls back to
|
||||
// token-only or anonymous so the agent still works against a broker that
|
||||
// hasn't enforced auth yet (transition period).
|
||||
if let Some(password) = &cfg.nats_password {
|
||||
let user = cfg.nats_user.clone().unwrap_or_else(|| cfg.license_id.clone());
|
||||
opts = opts.user_and_password(user, password.clone());
|
||||
} else if let Some(token) = &cfg.nats_token {
|
||||
opts = opts.token(token.clone());
|
||||
}
|
||||
|
||||
|
||||
@@ -10,6 +10,7 @@ use serde::Deserialize;
|
||||
use std::collections::HashSet;
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use crate::docker_compose::DockerComposeConfig;
|
||||
use crate::rcon::RconConfig;
|
||||
use crate::steamcmd::SteamcmdConfig;
|
||||
|
||||
@@ -34,6 +35,12 @@ pub struct AgentSection {
|
||||
pub license_id: Option<String>,
|
||||
pub nats_url: Option<String>,
|
||||
pub nats_token: Option<String>,
|
||||
/// NATS username for per-license auth. Defaults to license_id when a
|
||||
/// password is set but no user is given.
|
||||
pub nats_user: Option<String>,
|
||||
/// NATS password (the per-license token). When set, the agent authenticates
|
||||
/// with user+password instead of a bare token.
|
||||
pub nats_password: Option<String>,
|
||||
#[serde(default = "default_heartbeat_seconds")]
|
||||
pub heartbeat_seconds: u64,
|
||||
#[serde(default = "default_log_level")]
|
||||
@@ -70,6 +77,10 @@ pub struct InstanceConfig {
|
||||
/// validate = false).
|
||||
#[serde(default)]
|
||||
pub steamcmd: Option<SteamcmdConfig>,
|
||||
/// Docker-compose settings for container-managed games (Dune). Absent =
|
||||
/// defaults apply (compose file in the instance root, project = instance id).
|
||||
#[serde(default)]
|
||||
pub docker_compose: Option<DockerComposeConfig>,
|
||||
}
|
||||
|
||||
impl InstanceConfig {
|
||||
@@ -122,6 +133,8 @@ pub struct Settings {
|
||||
pub license_id: String,
|
||||
pub nats_url: String,
|
||||
pub nats_token: Option<String>,
|
||||
pub nats_user: Option<String>,
|
||||
pub nats_password: Option<String>,
|
||||
pub heartbeat_seconds: u64,
|
||||
pub log_level: String,
|
||||
pub instances: Vec<InstanceConfig>,
|
||||
@@ -167,6 +180,16 @@ fn resolve(file: ConfigFile) -> Result<Settings> {
|
||||
.filter(|v| !v.is_empty())
|
||||
.or(file.agent.nats_token);
|
||||
|
||||
let nats_user = std::env::var("CORROSION_NATS_USER")
|
||||
.ok()
|
||||
.filter(|v| !v.is_empty())
|
||||
.or(file.agent.nats_user);
|
||||
|
||||
let nats_password = std::env::var("CORROSION_NATS_PASSWORD")
|
||||
.ok()
|
||||
.filter(|v| !v.is_empty())
|
||||
.or(file.agent.nats_password);
|
||||
|
||||
validate_subject_segment("license_id", &license_id)?;
|
||||
|
||||
let mut seen: HashSet<&str> = HashSet::new();
|
||||
@@ -196,6 +219,8 @@ fn resolve(file: ConfigFile) -> Result<Settings> {
|
||||
license_id,
|
||||
nats_url,
|
||||
nats_token,
|
||||
nats_user,
|
||||
nats_password,
|
||||
heartbeat_seconds: file.agent.heartbeat_seconds,
|
||||
log_level: file.agent.log_level,
|
||||
instances: file.instances,
|
||||
|
||||
corrosion-host-agent/src/docker_compose.rs (new file, 216 lines)
@@ -0,0 +1,216 @@
|
||||
//! Docker-compose instance supervision (the Dune: Awakening adapter).
//!
//! Dune does not ship as a SteamCMD-updated process like Rust/Conan/Soulmask;
//! it runs as Docker container(s) (game server + RabbitMQ broker + Postgres),
//! orchestrated as a compose stack (a "battlegroup"). So Dune lifecycle is
//! `docker compose up -d / stop / restart` against the instance's compose
//! project, not a spawned OS process. This supervisor implements the same
//! [`Supervisor`] trait `ProcessSupervisor` does, so the instance command
//! dispatch is identical; only the management model differs.
//!
//! Scope (first cut): lifecycle + cached state. Two parity items are deferred
//! to Phase 3b alongside process PID adoption: (1) crash detection (containers
//! give us no child handle; a `docker compose ps` poll loop would supply it);
//! (2) state adoption on agent restart (a running stack reports `stopped` until
//! the next lifecycle command). Both are reconcilable with a `ps` probe.
//!
//! Reference: docs/reference-repos/icehunter SETUP_DOCKER.md (the docker
//! control plane this mirrors).

use std::path::PathBuf;
use std::process::Stdio;
use std::sync::Arc;
use std::time::Instant;

use anyhow::{bail, Context, Result};
use serde::Deserialize;
use tokio::process::Command;
use tokio::sync::{watch, Mutex};

use crate::config::InstanceConfig;
use crate::supervisor::{InstanceState, Supervisor};

/// Per-instance docker-compose settings (`[instance.docker_compose]`). All
/// fields optional; defaults cover the common "one compose file in the
/// instance root" case.
#[derive(Debug, Clone, Default, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct DockerComposeConfig {
    /// Compose file (`-f`). Relative paths resolve against the run dir. Default:
    /// compose's own discovery (docker-compose.yml in the run dir).
    #[serde(default)]
    pub file: Option<PathBuf>,
    /// Compose project name (`-p`). Default: the instance id.
    #[serde(default)]
    pub project: Option<String>,
    /// Limit lifecycle ops to one service. Default: every service in the file.
    #[serde(default)]
    pub service: Option<String>,
    /// Override the compose binary invocation. Default: `["docker","compose"]`.
    /// Use `["docker-compose"]` for the legacy standalone binary.
    #[serde(default)]
    pub command: Option<Vec<String>>,
}

struct Inner {
    started_at: Option<Instant>,
}

pub struct DockerComposeSupervisor {
    instance_id: String,
    /// Directory the compose commands run in (relative `-f`/file paths resolve
    /// against it).
    run_dir: PathBuf,
    compose_file: Option<PathBuf>,
    project: String,
    service: Option<String>,
    /// Compose binary + leading args, e.g. `["docker","compose"]`.
    command: Vec<String>,
    inner: Mutex<Inner>,
    state_tx: watch::Sender<InstanceState>,
}

impl DockerComposeSupervisor {
    pub fn new(cfg: &InstanceConfig) -> Arc<Self> {
        let dc = cfg.docker_compose.clone().unwrap_or_default();
        let run_dir = cfg
            .working_dir
            .clone()
            .unwrap_or_else(|| cfg.root.clone());
        let command = dc
            .command
            .filter(|c| !c.is_empty())
            .unwrap_or_else(|| vec!["docker".to_string(), "compose".to_string()]);
        let (state_tx, _) = watch::channel(InstanceState::Stopped);
        Arc::new(Self {
            instance_id: cfg.id.clone(),
            run_dir,
            compose_file: dc.file,
            project: dc.project.unwrap_or_else(|| cfg.id.clone()),
            service: dc.service,
            command,
            inner: Mutex::new(Inner { started_at: None }),
            state_tx,
        })
    }

    fn set_state(&self, state: InstanceState) {
        let _ = self.state_tx.send_replace(state);
    }

    /// Run one compose subcommand (`up`/`stop`/`restart`/...), bailing with the
    /// captured stderr on non-zero exit. Global flags (`-f`, `-p`) precede the
    /// subcommand; the optional single service is appended last.
    async fn run(&self, action: &str, action_args: &[&str]) -> Result<()> {
        let mut cmd = Command::new(&self.command[0]);
        cmd.args(&self.command[1..]);
        if let Some(file) = &self.compose_file {
            cmd.arg("-f").arg(file);
        }
        cmd.arg("-p").arg(&self.project);
        cmd.arg(action);
        cmd.args(action_args);
        if let Some(service) = &self.service {
            cmd.arg(service);
        }
        cmd.current_dir(&self.run_dir)
            .stdin(Stdio::null())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped());

        let output = cmd
            .output()
            .await
            .with_context(|| format!("running `{} {action}` (is docker installed and on PATH?)", self.command.join(" ")))?;

        if !output.status.success() {
            let stderr = String::from_utf8_lossy(&output.stderr);
            let stdout = String::from_utf8_lossy(&output.stdout);
            let detail = if !stderr.trim().is_empty() {
                stderr.trim()
            } else {
                stdout.trim()
            };
            bail!("compose {action} failed ({}): {detail}", output.status);
        }
        Ok(())
    }
}

#[async_trait::async_trait]
impl Supervisor for DockerComposeSupervisor {
    fn instance_id(&self) -> &str {
        &self.instance_id
    }

    fn state(&self) -> InstanceState {
        self.state_tx.borrow().clone()
    }

    fn watch_state(&self) -> watch::Receiver<InstanceState> {
        self.state_tx.subscribe()
    }

    async fn uptime_seconds(&self) -> u64 {
        let inner = self.inner.lock().await;
        match (&*self.state_tx.borrow(), inner.started_at) {
            (InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
            _ => 0,
        }
    }

    async fn start(self: Arc<Self>) -> Result<()> {
        if matches!(
            *self.state_tx.borrow(),
            InstanceState::Running | InstanceState::Starting
        ) {
            bail!("instance '{}' is already running", self.instance_id);
        }
        self.set_state(InstanceState::Starting);
        match self.run("up", &["-d"]).await {
            Ok(()) => {
                self.inner.lock().await.started_at = Some(Instant::now());
                self.set_state(InstanceState::Running);
                tracing::info!("instance '{}' compose up -d", self.instance_id);
                Ok(())
            }
            Err(e) => {
                self.set_state(InstanceState::Stopped);
                Err(e)
            }
        }
    }

    async fn stop(self: Arc<Self>) -> Result<()> {
        self.set_state(InstanceState::Stopping);
        match self.run("stop", &[]).await {
            Ok(()) => {
                self.inner.lock().await.started_at = None;
                self.set_state(InstanceState::Stopped);
                tracing::info!("instance '{}' compose stop", self.instance_id);
                Ok(())
            }
            Err(e) => {
                // Stop failed; the stack is most likely still up.
                self.set_state(InstanceState::Running);
                Err(e)
            }
        }
    }

    async fn restart(self: Arc<Self>) -> Result<()> {
        self.set_state(InstanceState::Starting);
        match self.run("restart", &[]).await {
            Ok(()) => {
                self.inner.lock().await.started_at = Some(Instant::now());
                self.set_state(InstanceState::Running);
                tracing::info!("instance '{}' compose restart", self.instance_id);
                Ok(())
            }
            Err(e) => {
                self.set_state(InstanceState::Stopped);
                Err(e)
            }
        }
    }
}
|
||||
@@ -198,7 +198,11 @@ pub fn list(root: &Path, rel: &str) -> anyhow::Result<Vec<FileEntry>> {
|
||||
let mut entries: Vec<FileEntry> = Vec::new();
|
||||
for item in rd {
|
||||
let item = item.with_context(|| format!("reading directory entry in '{}'", abs.display()))?;
|
||||
let meta = item.metadata().with_context(|| format!("stat '{}'", item.path().display()))?;
|
||||
// symlink_metadata (lstat): report the link itself, never the target —
|
||||
// following it would leak the size/type/existence of files outside the
|
||||
// jail. A symlink therefore lists as a non-directory entry sized as the link itself.
|
||||
let meta = fs::symlink_metadata(item.path())
|
||||
.with_context(|| format!("stat '{}'", item.path().display()))?;
|
||||
|
||||
let name = item.file_name().to_string_lossy().into_owned();
|
||||
let is_dir = meta.is_dir();
|
||||
@@ -367,11 +371,24 @@ pub fn copy(root: &Path, src: &str, dest: &str) -> anyhow::Result<()> {
|
||||
.with_context(|| format!("copy '{}' -> '{}'", src_abs.display(), dest_abs.display()))
|
||||
}
|
||||
|
||||
/// Recursive copy helper (mirrors Go's `copyRecursive`).
|
||||
/// Recursive copy helper.
|
||||
///
|
||||
/// SECURITY: uses `symlink_metadata` (does NOT follow symlinks) and refuses to
|
||||
/// copy any symlink. `jail()` only validates the top-level src/dest; a symlink
|
||||
/// *inside* a copied directory that points outside the jail would, if followed,
|
||||
/// pull external content (e.g. `/etc`) into the jail where it could then be
|
||||
/// read — a jail-escape exfiltration. Refusing symlinks closes that path.
|
||||
fn copy_recursive(src: &Path, dest: &Path) -> anyhow::Result<()> {
|
||||
let meta = fs::metadata(src)
|
||||
let meta = fs::symlink_metadata(src)
|
||||
.with_context(|| format!("stat source '{}'", src.display()))?;
|
||||
|
||||
if meta.file_type().is_symlink() {
|
||||
bail!(
|
||||
"refusing to copy symlink '{}' — symlinks are not followed across the jail boundary",
|
||||
src.display()
|
||||
);
|
||||
}
|
||||
|
||||
if meta.is_dir() {
|
||||
fs::create_dir_all(dest)
|
||||
.with_context(|| format!("create_dir_all '{}'", dest.display()))?;
|
||||
|
||||
@@ -13,11 +13,15 @@ use crate::agent::Agent;
|
||||
use crate::prober;
|
||||
use crate::subjects;
|
||||
use crate::telemetry;
|
||||
use crate::update;
|
||||
use crate::version;
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
struct HostCommand {
|
||||
func: String,
|
||||
/// Signed-update artifact URL (for func = "update").
|
||||
#[serde(default)]
|
||||
url: Option<String>,
|
||||
}
|
||||
|
||||
pub async fn run(agent: Arc<Agent>) -> anyhow::Result<()> {
|
||||
@@ -55,20 +59,46 @@ async fn handle(agent: Arc<Agent>, msg: async_nats::Message) {
|
||||
return;
|
||||
};
|
||||
|
||||
let response = match serde_json::from_slice::<HostCommand>(&msg.payload) {
|
||||
Ok(cmd) => dispatch(&agent, &cmd.func).await,
|
||||
Err(e) => json!({ "status": "error", "message": format!("invalid command payload: {e}") }),
|
||||
};
|
||||
|
||||
let bytes = match serde_json::to_vec(&response) {
|
||||
Ok(b) => b,
|
||||
let cmd = match serde_json::from_slice::<HostCommand>(&msg.payload) {
|
||||
Ok(cmd) => cmd,
|
||||
Err(e) => {
|
||||
tracing::error!("response serialize failed: {e}");
|
||||
publish(&agent, &reply, json!({ "status": "error", "message": format!("invalid command payload: {e}") })).await;
|
||||
return;
|
||||
}
|
||||
};
|
||||
if let Err(e) = agent.nats.publish(reply, bytes.into()).await {
|
||||
tracing::warn!("response publish failed: {e}");
|
||||
|
||||
// Self-update is special: it must reply BEFORE relaunching, because the
|
||||
// relaunch replaces this process and nothing after it would run.
|
||||
if cmd.func == "update" {
|
||||
let Some(url) = cmd.url else {
|
||||
publish(&agent, &reply, json!({ "status": "error", "message": "update requires a 'url'" })).await;
|
||||
return;
|
||||
};
|
||||
match update::download_verify_swap(&url).await {
|
||||
Ok(_) => {
|
||||
publish(&agent, &reply, json!({ "status": "success", "func": "update", "message": "verified and swapped; relaunching" })).await;
|
||||
let _ = agent.nats.flush().await;
|
||||
update::relaunch_and_exit();
|
||||
}
|
||||
Err(e) => {
|
||||
publish(&agent, &reply, json!({ "status": "error", "func": "update", "message": format!("{e:#}") })).await;
|
||||
}
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
let response = dispatch(&agent, &cmd.func).await;
|
||||
publish(&agent, &reply, response).await;
|
||||
}
|
||||
|
||||
async fn publish(agent: &Arc<Agent>, reply: &async_nats::Subject, value: serde_json::Value) {
|
||||
match serde_json::to_vec(&value) {
|
||||
Ok(bytes) => {
|
||||
if let Err(e) = agent.nats.publish(reply.clone(), bytes.into()).await {
|
||||
tracing::warn!("response publish failed: {e}");
|
||||
}
|
||||
}
|
||||
Err(e) => tracing::error!("response serialize failed: {e}"),
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -13,9 +13,9 @@ use serde_json::json;
|
||||
use std::sync::Arc;
|
||||
|
||||
use crate::agent::Agent;
|
||||
use crate::process::ProcessSupervisor;
|
||||
use crate::subjects;
|
||||
use crate::steamcmd;
|
||||
use crate::supervisor::Supervisor;
|
||||
|
||||
#[derive(Debug, Deserialize)]
|
||||
struct InstanceCommand {
|
||||
@@ -26,8 +26,8 @@ struct InstanceCommand {
|
||||
}
|
||||
|
||||
/// Forward every supervisor state change as a status event.
|
||||
pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) {
|
||||
let subject = subjects::instance_status(&agent.cfg.license_id, &sup.instance_id);
|
||||
pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<dyn Supervisor>) {
|
||||
let subject = subjects::instance_status(&agent.cfg.license_id, sup.instance_id());
|
||||
let mut rx = sup.watch_state();
|
||||
let cancel = agent.shutdown.clone();
|
||||
|
||||
@@ -40,13 +40,13 @@ pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor
|
||||
let state = rx.borrow().clone();
|
||||
let event = json!({
|
||||
"timestamp": Utc::now().to_rfc3339_opts(SecondsFormat::Secs, true),
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"event": state,
|
||||
});
|
||||
match serde_json::to_vec(&event) {
|
||||
Ok(bytes) => {
|
||||
if let Err(e) = agent.nats.publish(subject.clone(), bytes.into()).await {
|
||||
tracing::warn!("status publish failed for '{}': {e}", sup.instance_id);
|
||||
tracing::warn!("status publish failed for '{}': {e}", sup.instance_id());
|
||||
}
|
||||
}
|
||||
Err(e) => tracing::error!("status serialize failed: {e}"),
|
||||
@@ -58,8 +58,8 @@ pub async fn publish_state_changes(agent: Arc<Agent>, sup: Arc<ProcessSupervisor
|
||||
}
|
||||
|
||||
/// Request-reply command handler for one instance.
|
||||
pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Result<()> {
|
||||
let subject = subjects::instance_cmd(&agent.cfg.license_id, &sup.instance_id);
|
||||
pub async fn run(agent: Arc<Agent>, sup: Arc<dyn Supervisor>) -> anyhow::Result<()> {
|
||||
let subject = subjects::instance_cmd(&agent.cfg.license_id, sup.instance_id());
|
||||
let mut sub = agent.nats.subscribe(subject.clone()).await?;
|
||||
tracing::info!("instance command handler listening on {subject}");
|
||||
|
||||
@@ -74,13 +74,13 @@ pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Resu
|
||||
tokio::spawn(async move { handle(agent, sup, msg).await });
|
||||
}
|
||||
None => {
|
||||
tracing::warn!("instance command subscription ended for '{}'", sup.instance_id);
|
||||
tracing::warn!("instance command subscription ended for '{}'", sup.instance_id());
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
_ = cancel.cancelled() => {
|
||||
tracing::info!("instance command handler stopping for '{}'", sup.instance_id);
|
||||
tracing::info!("instance command handler stopping for '{}'", sup.instance_id());
|
||||
break;
|
||||
}
|
||||
}
|
||||
@@ -88,7 +88,7 @@ pub async fn run(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>) -> anyhow::Resu
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn handle(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>, msg: async_nats::Message) {
|
||||
async fn handle(agent: Arc<Agent>, sup: Arc<dyn Supervisor>, msg: async_nats::Message) {
|
||||
let Some(reply) = msg.reply.clone() else {
|
||||
tracing::warn!("instance command without reply subject ignored");
|
||||
return;
|
||||
@@ -113,20 +113,22 @@ async fn handle(agent: Arc<Agent>, sup: Arc<ProcessSupervisor>, msg: async_nats:
|
||||
|
||||
async fn dispatch(
|
||||
agent: &Arc<Agent>,
|
||||
sup: &Arc<ProcessSupervisor>,
|
||||
sup: &Arc<dyn Supervisor>,
|
||||
cmd: &InstanceCommand,
|
||||
) -> serde_json::Value {
|
||||
let func = cmd.func.as_str();
|
||||
|
||||
// start/stop/restart take `self: Arc<Self>` (they may hand a clone to a
|
||||
// monitor task), so clone the Arc before the consuming call.
|
||||
let outcome = match func {
|
||||
"start" => sup.start().await.map(|_| "starting"),
|
||||
"stop" => sup.stop().await.map(|_| "stopped"),
|
||||
"restart" => sup.restart().await.map(|_| "restarted"),
|
||||
"start" => sup.clone().start().await.map(|_| "starting"),
|
||||
"stop" => sup.clone().stop().await.map(|_| "stopped"),
|
||||
"restart" => sup.clone().restart().await.map(|_| "restarted"),
|
||||
"status" => {
|
||||
return json!({
|
||||
"status": "success",
|
||||
"func": "status",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"state": sup.state(),
|
||||
"uptime_seconds": sup.uptime_seconds().await,
|
||||
});
|
||||
@@ -139,15 +141,15 @@ async fn dispatch(
|
||||
.cfg
|
||||
.instances
|
||||
.iter()
|
||||
.find(|i| i.id == sup.instance_id);
|
||||
.find(|i| i.id == sup.instance_id());
|
||||
|
||||
let rcon_cfg = inst_cfg.and_then(|i| i.rcon.as_ref());
|
||||
let Some(rcon_cfg) = rcon_cfg else {
|
||||
return json!({
|
||||
"status": "error",
|
||||
"func": "rcon",
|
||||
"instance_id": sup.instance_id,
|
||||
"message": format!("instance '{}' has no rcon configured", sup.instance_id),
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": format!("instance '{}' has no rcon configured", sup.instance_id()),
|
||||
});
|
||||
};
|
||||
|
||||
@@ -155,7 +157,7 @@ async fn dispatch(
|
||||
return json!({
|
||||
"status": "error",
|
||||
"func": "rcon",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": "rcon func requires a 'command' field",
|
||||
});
|
||||
};
|
||||
@@ -165,13 +167,13 @@ async fn dispatch(
|
||||
Ok(output) => json!({
|
||||
"status": "success",
|
||||
"func": "rcon",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"output": output,
|
||||
}),
|
||||
Err(e) => json!({
|
||||
"status": "error",
|
||||
"func": "rcon",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": format!("{e:#}"),
|
||||
}),
|
||||
};
|
||||
@@ -181,14 +183,14 @@ async fn dispatch(
|
||||
// settings. The supervisor only carries process-control state, not
|
||||
// the full config, so we reach into agent.cfg.instances here as the
|
||||
// rcon dispatch does.
|
||||
let inst_cfg = agent.cfg.instances.iter().find(|i| i.id == sup.instance_id);
|
||||
let inst_cfg = agent.cfg.instances.iter().find(|i| i.id == sup.instance_id());
|
||||
|
||||
let Some(inst_cfg) = inst_cfg else {
|
||||
return json!({
|
||||
"status": "error",
|
||||
"func": "steam_update",
|
||||
"instance_id": sup.instance_id,
|
||||
"message": format!("no config found for instance '{}'", sup.instance_id),
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": format!("no config found for instance '{}'", sup.instance_id()),
|
||||
});
|
||||
};
|
||||
|
||||
@@ -209,7 +211,7 @@ async fn dispatch(
|
||||
};
|
||||
|
||||
let license = agent.cfg.license_id.clone();
|
||||
let instance_id = sup.instance_id.clone();
|
||||
let instance_id = sup.instance_id().to_string();
|
||||
let nats = agent.nats.clone();
|
||||
|
||||
// Publish each progress line to the steam_status subject.
|
||||
@@ -240,12 +242,12 @@ async fn dispatch(
|
||||
Ok(()) => json!({
|
||||
"status": "success",
|
||||
"func": "steam_update",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
}),
|
||||
Err(e) => json!({
|
||||
"status": "error",
|
||||
"func": "steam_update",
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": format!("{e:#}"),
|
||||
}),
|
||||
};
|
||||
@@ -262,14 +264,14 @@ async fn dispatch(
|
||||
Ok(result) => json!({
|
||||
"status": "success",
|
||||
"func": func,
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"result": result,
|
||||
"state": sup.state(),
|
||||
}),
|
||||
Err(e) => json!({
|
||||
"status": "error",
|
||||
"func": func,
|
||||
"instance_id": sup.instance_id,
|
||||
"instance_id": sup.instance_id(),
|
||||
"message": format!("{e:#}"),
|
||||
}),
|
||||
}
|
||||
|
||||
@@ -4,6 +4,7 @@
|
||||
pub mod agent;
|
||||
pub mod bus;
|
||||
pub mod config;
|
||||
pub mod docker_compose;
|
||||
pub mod filemanager;
|
||||
pub mod hostcmd;
|
||||
pub mod instancecmd;
|
||||
@@ -12,5 +13,7 @@ pub mod process;
|
||||
pub mod rcon;
|
||||
pub mod steamcmd;
|
||||
pub mod subjects;
|
||||
pub mod supervisor;
|
||||
pub mod telemetry;
|
||||
pub mod update;
|
||||
pub mod version;
|
||||
|
||||
@@ -5,8 +5,8 @@
|
||||
//! game adapters arrive in Phase 1+ (see PROTOCOL.md).
|
||||
|
||||
use corrosion_host_agent::{
|
||||
agent, bus, config, filemanager, hostcmd, instancecmd, prober, process, subjects, telemetry,
|
||||
version,
|
||||
agent, bus, config, docker_compose, filemanager, hostcmd, instancecmd, prober, process,
|
||||
subjects, supervisor, telemetry, version,
|
||||
};
|
||||
|
||||
use anyhow::{Context, Result};
|
||||
@@ -92,10 +92,20 @@ async fn run(settings: config::Settings) -> Result<()> {
|
||||
|
||||
let nats = bus::connect(&settings).await?;
|
||||
|
||||
let supervisors = settings
|
||||
// Per-game supervisor factory: container-managed games (Dune) get a
|
||||
// docker-compose supervisor; everything else is a spawned-process
|
||||
// supervisor. Both satisfy the `Supervisor` trait, so the rest of the agent
|
||||
// is game-agnostic.
|
||||
let supervisors: std::collections::HashMap<String, Arc<dyn supervisor::Supervisor>> = settings
|
||||
.instances
|
||||
.iter()
|
||||
.map(|inst| (inst.id.clone(), process::ProcessSupervisor::new(inst)))
|
||||
.map(|inst| {
|
||||
let sup: Arc<dyn supervisor::Supervisor> = match inst.game.as_str() {
|
||||
"dune" => docker_compose::DockerComposeSupervisor::new(inst),
|
||||
_ => process::ProcessSupervisor::new(inst),
|
||||
};
|
||||
(inst.id.clone(), sup)
|
||||
})
|
||||
.collect();
|
||||
|
||||
let agent = Arc::new(Agent {
|
||||
|
||||
@@ -1,14 +1,16 @@
|
||||
//! Per-instance game-server process supervision.
|
||||
//!
|
||||
//! One `ProcessSupervisor` per process-managed instance. Lifecycle mirrors the
|
||||
//! proven Go agent behavior — graceful SIGTERM with a 30s budget before force
|
||||
//! kill, a monitor task that reaps the child and records crash-vs-stop — with
|
||||
//! two fixes the Go version needed: args are a proper list (no naive space
|
||||
//! splitting), and every state change is observable through a watch channel
|
||||
//! so the panel gets push events instead of waiting for the next heartbeat.
|
||||
//! One `ProcessSupervisor` per process-managed instance (Rust/Conan/Soulmask).
|
||||
//! Lifecycle mirrors the proven Go agent behavior — graceful SIGTERM with a 30s
|
||||
//! budget before force kill, a monitor task that reaps the child and records
|
||||
//! crash-vs-stop — with two fixes the Go version needed: args are a proper list
|
||||
//! (no naive space splitting), and every state change is observable through a
|
||||
//! watch channel so the panel gets push events instead of waiting for the next
|
||||
//! heartbeat. Lifecycle control is exposed through the [`Supervisor`] trait so
|
||||
//! the command dispatch is identical across process- and container-managed
|
||||
//! games.
|
||||
|
||||
use anyhow::{bail, Context, Result};
|
||||
use serde::Serialize;
|
||||
use std::path::PathBuf;
|
||||
use std::process::Stdio;
|
||||
use std::sync::Arc;
|
||||
@@ -17,39 +19,11 @@ use tokio::process::{Child, Command};
|
||||
use tokio::sync::{watch, Mutex};
|
||||
|
||||
use crate::config::InstanceConfig;
|
||||
use crate::supervisor::{InstanceState, Supervisor};
|
||||
|
||||
const GRACEFUL_STOP_BUDGET: Duration = Duration::from_secs(30);
|
||||
const RESTART_PAUSE: Duration = Duration::from_secs(2);
|
||||
|
||||
#[derive(Debug, Clone, PartialEq, Serialize)]
|
||||
#[serde(rename_all = "snake_case", tag = "state")]
|
||||
pub enum InstanceState {
|
||||
/// Not process-managed (no executable configured).
|
||||
Unmanaged,
|
||||
Stopped,
|
||||
Starting,
|
||||
Running,
|
||||
Stopping,
|
||||
/// Process exited without a stop request.
|
||||
Crashed {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
exit_code: Option<i32>,
|
||||
},
|
||||
}
|
||||
|
||||
impl InstanceState {
|
||||
pub fn as_label(&self) -> &'static str {
|
||||
match self {
|
||||
InstanceState::Unmanaged => "unmanaged",
|
||||
InstanceState::Stopped => "stopped",
|
||||
InstanceState::Starting => "starting",
|
||||
InstanceState::Running => "running",
|
||||
InstanceState::Stopping => "stopping",
|
||||
InstanceState::Crashed { .. } => "crashed",
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
struct Inner {
|
||||
child: Option<Child>,
|
||||
started_at: Option<Instant>,
|
||||
@@ -59,7 +33,7 @@ struct Inner {
|
||||
}
|
||||
|
||||
pub struct ProcessSupervisor {
|
||||
pub instance_id: String,
|
||||
instance_id: String,
|
||||
executable: Option<PathBuf>,
|
||||
args: Vec<String>,
|
||||
working_dir: Option<PathBuf>,
|
||||
@@ -90,72 +64,6 @@ impl ProcessSupervisor {
|
||||
})
|
||||
}
|
||||
|
||||
pub fn state(&self) -> InstanceState {
|
||||
self.state_tx.borrow().clone()
|
||||
}
|
||||
|
||||
pub fn watch_state(&self) -> watch::Receiver<InstanceState> {
|
||||
self.state_tx.subscribe()
|
||||
}
|
||||
|
||||
pub async fn uptime_seconds(&self) -> u64 {
|
||||
let inner = self.inner.lock().await;
|
||||
match (&*self.state_tx.borrow(), inner.started_at) {
|
||||
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
|
||||
_ => 0,
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn start(self: &Arc<Self>) -> Result<()> {
|
||||
let Some(exe) = self.executable.clone() else {
|
||||
bail!("instance '{}' has no executable configured", self.instance_id);
|
||||
};
|
||||
if !exe.exists() {
|
||||
bail!("executable not found: {}", exe.display());
|
||||
}
|
||||
|
||||
let mut inner = self.inner.lock().await;
|
||||
if matches!(*self.state_tx.borrow(), InstanceState::Running | InstanceState::Starting) {
|
||||
bail!("instance '{}' is already running", self.instance_id);
|
||||
}
|
||||
|
||||
self.set_state(InstanceState::Starting);
|
||||
|
||||
let workdir = self
|
||||
.working_dir
|
||||
.clone()
|
||||
.or_else(|| exe.parent().map(|p| p.to_path_buf()))
|
||||
.unwrap_or_else(|| PathBuf::from("."));
|
||||
|
||||
let child = Command::new(&exe)
|
||||
.args(&self.args)
|
||||
.current_dir(&workdir)
|
||||
.stdin(Stdio::null())
|
||||
.stdout(Stdio::inherit())
|
||||
.stderr(Stdio::inherit())
|
||||
.spawn()
|
||||
.with_context(|| format!("spawning {}", exe.display()))?;
|
||||
|
||||
let pid = child.id();
|
||||
inner.child = Some(child);
|
||||
inner.started_at = Some(Instant::now());
|
||||
inner.stop_requested = false;
|
||||
drop(inner);
|
||||
|
||||
self.set_state(InstanceState::Running);
|
||||
tracing::info!(
|
||||
"instance '{}' started: {} (pid {:?})",
|
||||
self.instance_id,
|
||||
exe.display(),
|
||||
pid
|
||||
);
|
||||
|
||||
// Monitor: reap the child and classify the exit.
|
||||
let sup = Arc::clone(self);
|
||||
tokio::spawn(async move { sup.monitor().await });
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn monitor(self: Arc<Self>) {
|
||||
// Take a waiter without holding the lock across the whole child
|
||||
// lifetime: Child::wait needs &mut, so the child stays in inner and
|
||||
@@ -201,7 +109,85 @@ impl ProcessSupervisor {
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn stop(self: &Arc<Self>) -> Result<()> {
|
||||
fn set_state(&self, state: InstanceState) {
|
||||
// send_replace never fails even with zero receivers.
|
||||
let _ = self.state_tx.send_replace(state);
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait::async_trait]
|
||||
impl Supervisor for ProcessSupervisor {
|
||||
fn instance_id(&self) -> &str {
|
||||
&self.instance_id
|
||||
}
|
||||
|
||||
fn state(&self) -> InstanceState {
|
||||
self.state_tx.borrow().clone()
|
||||
}
|
||||
|
||||
fn watch_state(&self) -> watch::Receiver<InstanceState> {
|
||||
self.state_tx.subscribe()
|
||||
}
|
||||
|
||||
async fn uptime_seconds(&self) -> u64 {
|
||||
let inner = self.inner.lock().await;
|
||||
match (&*self.state_tx.borrow(), inner.started_at) {
|
||||
(InstanceState::Running, Some(t)) => t.elapsed().as_secs(),
|
||||
_ => 0,
|
||||
}
|
||||
}
|
||||
|
||||
async fn start(self: Arc<Self>) -> Result<()> {
|
||||
let Some(exe) = self.executable.clone() else {
|
||||
bail!("instance '{}' has no executable configured", self.instance_id);
|
||||
};
|
||||
if !exe.exists() {
|
||||
bail!("executable not found: {}", exe.display());
|
||||
}
|
||||
|
||||
let mut inner = self.inner.lock().await;
|
||||
if matches!(*self.state_tx.borrow(), InstanceState::Running | InstanceState::Starting) {
|
||||
bail!("instance '{}' is already running", self.instance_id);
|
||||
}
|
||||
|
||||
self.set_state(InstanceState::Starting);
|
||||
|
||||
let workdir = self
|
||||
.working_dir
|
||||
.clone()
|
||||
.or_else(|| exe.parent().map(|p| p.to_path_buf()))
|
||||
.unwrap_or_else(|| PathBuf::from("."));
|
||||
|
||||
let child = Command::new(&exe)
|
||||
.args(&self.args)
|
||||
.current_dir(&workdir)
|
||||
.stdin(Stdio::null())
|
||||
.stdout(Stdio::inherit())
|
||||
.stderr(Stdio::inherit())
|
||||
.spawn()
|
||||
.with_context(|| format!("spawning {}", exe.display()))?;
|
||||
|
||||
let pid = child.id();
|
||||
inner.child = Some(child);
|
||||
inner.started_at = Some(Instant::now());
|
||||
inner.stop_requested = false;
|
||||
drop(inner);
|
||||
|
||||
self.set_state(InstanceState::Running);
|
||||
tracing::info!(
|
||||
"instance '{}' started: {} (pid {:?})",
|
||||
self.instance_id,
|
||||
exe.display(),
|
||||
pid
|
||||
);
|
||||
|
||||
// Monitor: reap the child and classify the exit.
|
||||
let sup = Arc::clone(&self);
|
||||
tokio::spawn(async move { sup.monitor().await });
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn stop(self: Arc<Self>) -> Result<()> {
|
||||
let mut inner = self.inner.lock().await;
|
||||
if inner.child.is_none() {
|
||||
bail!("instance '{}' is not running", self.instance_id);
|
||||
@@ -263,16 +249,14 @@ impl ProcessSupervisor {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub async fn restart(self: &Arc<Self>) -> Result<()> {
|
||||
if !matches!(*self.state_tx.borrow(), InstanceState::Stopped | InstanceState::Crashed { .. } | InstanceState::Unmanaged) {
|
||||
self.stop().await?;
|
||||
async fn restart(self: Arc<Self>) -> Result<()> {
|
||||
if !matches!(
|
||||
*self.state_tx.borrow(),
|
||||
InstanceState::Stopped | InstanceState::Crashed { .. } | InstanceState::Unmanaged
|
||||
) {
|
||||
self.clone().stop().await?;
|
||||
}
|
||||
tokio::time::sleep(RESTART_PAUSE).await;
|
||||
self.start().await
|
||||
}
|
||||
|
||||
fn set_state(&self, state: InstanceState) {
|
||||
// send_replace never fails even with zero receivers.
|
||||
let _ = self.state_tx.send_replace(state);
|
||||
}
|
||||
}
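The module docs above say the monitor task "records crash-vs-stop", but the body of `monitor` is elided in this hunk. A minimal sketch of that classification, assuming the decision rests only on the `stop_requested` flag and the reaped exit code; `ExitOutcome` and `classify_exit` are hypothetical names for illustration, not the agent's own:

```rust
/// Hypothetical, reduced stand-in for the diff's `InstanceState`: only the
/// two outcomes a monitor has to choose between once the child is reaped.
#[derive(Debug, Clone, PartialEq)]
pub enum ExitOutcome {
    Stopped,
    Crashed { exit_code: Option<i32> },
}

/// Classify a reaped child.
///
/// `stop_requested` is assumed to be the flag that `stop()` sets before it
/// signals the process; `exit_code` is `None` when no code is available
/// (for example, the process was killed by a signal).
pub fn classify_exit(stop_requested: bool, exit_code: Option<i32>) -> ExitOutcome {
    if stop_requested {
        // An exit that follows a stop request is a normal stop, whatever the code.
        ExitOutcome::Stopped
    } else {
        // Any exit nobody asked for counts as a crash, including exit code 0.
        ExitOutcome::Crashed { exit_code }
    }
}
```

Treating an unrequested exit with code 0 as a crash is an assumption; it follows the enum's doc comment ("exited without a stop request") and may not match the real monitor.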
|
||||
|
||||
80
corrosion-host-agent/src/supervisor.rs
Normal file
@@ -0,0 +1,80 @@
|
||||
//! The supervision abstraction.
|
||||
//!
|
||||
//! A `Supervisor` owns the lifecycle of one game instance. Different games are
|
||||
//! managed in fundamentally different ways — Rust/Conan/Soulmask are spawned OS
|
||||
//! processes ([`crate::process::ProcessSupervisor`]); Dune is a docker-compose
|
||||
//! stack ([`crate::docker_compose::DockerComposeSupervisor`]); future planes
|
||||
//! (kubectl, AMP/podman, SSH) will be their own impls. The instance command
|
||||
//! dispatch (`instancecmd::dispatch`) talks only to this trait, so it never
|
||||
//! learns which management model is behind a given instance.
|
||||
//!
|
||||
//! Trait objects (`Arc<dyn Supervisor>`) need object-safe, dynamically
|
||||
//! dispatchable async methods; native `async fn` in traits is not yet
|
||||
//! dyn-compatible, so we use `#[async_trait]` (the battle-tested ecosystem
|
||||
//! standard) to box the returned futures. The cost — one heap alloc per
|
||||
//! lifecycle call — is irrelevant for start/stop/restart, which happen seconds
|
||||
//! to minutes apart.
|
||||
|
||||
use std::sync::Arc;
|
||||
|
||||
use anyhow::Result;
|
||||
use serde::Serialize;
|
||||
use tokio::sync::watch;
|
||||
|
||||
/// Observable lifecycle state of one instance. Shared vocabulary across every
|
||||
/// supervisor impl; serialized verbatim into heartbeats and status events
|
||||
/// (`{"state":"running", ...}`).
|
||||
#[derive(Debug, Clone, PartialEq, Serialize)]
|
||||
#[serde(rename_all = "snake_case", tag = "state")]
|
||||
pub enum InstanceState {
|
||||
/// Not lifecycle-managed (a process instance with no executable, etc.).
|
||||
Unmanaged,
|
||||
Stopped,
|
||||
Starting,
|
||||
Running,
|
||||
Stopping,
|
||||
/// Exited/died without a stop request.
|
||||
Crashed {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
exit_code: Option<i32>,
|
||||
},
|
||||
}
|
||||
|
||||
impl InstanceState {
|
||||
pub fn as_label(&self) -> &'static str {
|
||||
match self {
|
||||
InstanceState::Unmanaged => "unmanaged",
|
||||
InstanceState::Stopped => "stopped",
|
||||
InstanceState::Starting => "starting",
|
||||
InstanceState::Running => "running",
|
||||
InstanceState::Stopping => "stopping",
|
||||
InstanceState::Crashed { .. } => "crashed",
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Lifecycle control + state observation for one instance.
///
/// `start`/`stop`/`restart` take `self: Arc<Self>` so an impl can hand a clone
/// to a spawned monitor task; callers hold an `Arc<dyn Supervisor>` and
/// `clone()` before each call. `watch_state` exposes the same channel the
/// status-event publisher drains, so panel push events stay decoupled from the
/// heartbeat cadence.
#[async_trait::async_trait]
pub trait Supervisor: Send + Sync {
    /// The instance slug (a NATS subject segment).
    fn instance_id(&self) -> &str;

    /// Current cached state (cheap; no I/O).
    fn state(&self) -> InstanceState;

    /// Subscribe to state transitions.
    fn watch_state(&self) -> watch::Receiver<InstanceState>;

    /// Seconds since the instance entered `Running` (0 otherwise).
    async fn uptime_seconds(&self) -> u64;

    async fn start(self: Arc<Self>) -> Result<()>;
    async fn stop(self: Arc<Self>) -> Result<()>;
    async fn restart(self: Arc<Self>) -> Result<()>;
}
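The `self: Arc<Self>` receiver is why every call site in this diff changes from `sup.start()` to `sup.clone().start()`. A synchronous, std-only sketch of the same calling convention through a trait object; `Lifecycle`, `State`, and `FakeSupervisor` are hypothetical stand-ins, and the real trait is async via `#[async_trait]`:

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Stopped,
    Running,
}

/// Synchronous stand-in for the async `Supervisor` trait in the diff.
/// `self: Arc<Self>` is a dyn-compatible receiver, so `Arc<dyn Lifecycle>` works.
pub trait Lifecycle: Send + Sync {
    fn state(&self) -> State;
    fn start(self: Arc<Self>) -> Result<(), String>;
    fn stop(self: Arc<Self>) -> Result<(), String>;
}

pub struct FakeSupervisor {
    state: Mutex<State>,
}

impl FakeSupervisor {
    /// Build the supervisor already erased to a trait object, the way a
    /// registry keyed by instance id would store it.
    pub fn shared() -> Arc<dyn Lifecycle> {
        Arc::new(FakeSupervisor { state: Mutex::new(State::Stopped) })
    }
}

impl Lifecycle for FakeSupervisor {
    fn state(&self) -> State {
        self.state.lock().unwrap().clone()
    }

    fn start(self: Arc<Self>) -> Result<(), String> {
        // The method owns one Arc handle; a real impl could move it into a
        // spawned monitor task at this point.
        let mut st = self.state.lock().unwrap();
        if *st == State::Running {
            return Err("already running".to_string());
        }
        *st = State::Running;
        Ok(())
    }

    fn stop(self: Arc<Self>) -> Result<(), String> {
        let mut st = self.state.lock().unwrap();
        if *st != State::Running {
            return Err("not running".to_string());
        }
        *st = State::Stopped;
        Ok(())
    }
}
```

A caller keeps its own handle and passes a clone into each lifecycle call, for example `let sup = FakeSupervisor::shared(); sup.clone().start()?;`, after which `sup.state()` is still usable.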
|
||||
@@ -129,7 +129,7 @@ pub async fn collect(agent: &Agent, sys: &mut System) -> HeartbeatPayload {
|
||||
let mut instances = Vec::with_capacity(agent.cfg.instances.len());
|
||||
for inst in &agent.cfg.instances {
|
||||
let (state, uptime_seconds) = match agent.supervisors.get(&inst.id) {
|
||||
Some(sup) if !matches!(sup.state(), crate::process::InstanceState::Unmanaged) => {
|
||||
Some(sup) if !matches!(sup.state(), crate::supervisor::InstanceState::Unmanaged) => {
|
||||
(sup.state().as_label().to_string(), sup.uptime_seconds().await)
|
||||
}
|
||||
_ => {
|
||||
|
||||
154
corrosion-host-agent/src/update.rs
Normal file
@@ -0,0 +1,154 @@
|
||||
//! Signed self-update.
|
||||
//!
|
||||
//! The agent only ever runs a binary whose minisign signature verifies against
|
||||
//! the EMBEDDED public key below. Even if the CDN (which currently accepts
|
||||
//! unauthenticated uploads) served a malicious binary, the agent refuses it
|
||||
//! without a valid signature from the release private key (a CI secret).
|
||||
//!
|
||||
//! Flow: download binary + `.minisig` from the CDN → verify signature →
|
||||
//! atomic swap (current → `.old`, new → current, rollback on failure) →
|
||||
//! relaunch the new binary. Defence in depth mirrors the Vigilance updater:
|
||||
//! a real URL parse rejecting credential-in-URL bypasses, an https + host
|
||||
//! allowlist, and a size cap.
|
||||
|
||||
use anyhow::{bail, Context, Result};
|
||||
use minisign_verify::{PublicKey, Signature};
|
||||
use std::path::{Path, PathBuf};
|
||||
use std::time::Duration;
|
||||
|
||||
/// minisign public key. The matching private key signs releases in CI
|
||||
/// (Gitea Actions secret MINISIGN_SECRET_KEY). Rotating it means re-signing
|
||||
/// every published artifact and shipping an agent build with the new key.
|
||||
const PUBLIC_KEY: &str = "RWQKhJptuiwIkp31cZdz10z/R72UPZkl7/VtnZJ2Vfbe0dQfDlXHZYFC";
|
||||
|
||||
const ALLOWED_HOST: &str = "cdn.corrosionmgmt.com";
|
||||
const MAX_BINARY_BYTES: usize = 100 * 1024 * 1024; // 100 MiB sanity cap
|
||||
const DOWNLOAD_TIMEOUT: Duration = Duration::from_secs(600);
|
||||
|
||||
/// Verify a binary against the embedded public key + a minisign signature blob.
|
||||
/// The security core of self-update — tampered or unsigned content is rejected.
|
||||
pub fn verify_signature(binary: &[u8], signature_blob: &str) -> Result<()> {
|
||||
let pk = PublicKey::from_base64(PUBLIC_KEY).context("embedded public key is invalid")?;
|
||||
let sig = Signature::decode(signature_blob).context("malformed minisign signature")?;
|
||||
pk.verify(binary, &sig, false)
|
||||
.map_err(|e| anyhow::anyhow!("signature verification failed: {e}"))?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Reject anything but `https://cdn.corrosionmgmt.com/...` with no embedded
/// credentials (the userinfo-bypass class).
pub fn assert_url_allowed(url: &str) -> Result<()> {
    let parsed = reqwest::Url::parse(url).context("invalid update URL")?;
    if parsed.scheme() != "https" {
        bail!("update URL must be https");
    }
    if !parsed.username().is_empty() || parsed.password().is_some() {
        bail!("update URL must not contain credentials");
    }
    if parsed.host_str() != Some(ALLOWED_HOST) {
        bail!("update URL host not allowed: {:?}", parsed.host_str());
    }
    Ok(())
}
|
||||
|
||||
/// Download, verify, and atomically swap in a new agent binary. Does NOT
|
||||
/// restart — the caller decides when to relaunch (after replying on NATS).
|
||||
/// Returns the path of the now-current (new) binary.
|
||||
pub async fn download_verify_swap(url: &str) -> Result<PathBuf> {
|
||||
assert_url_allowed(url)?;
|
||||
let sig_url = format!("{url}.minisig");
|
||||
assert_url_allowed(&sig_url)?;
|
||||
|
||||
let client = reqwest::Client::builder()
|
||||
.timeout(DOWNLOAD_TIMEOUT)
|
||||
.build()
|
||||
.context("building HTTP client")?;
|
||||
|
||||
let binary = client
|
||||
.get(url)
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("downloading {url}"))?
|
||||
.error_for_status()
|
||||
.context("update binary download failed")?
|
||||
.bytes()
|
||||
.await
|
||||
.context("reading update binary")?;
|
||||
|
||||
if binary.len() > MAX_BINARY_BYTES {
|
||||
bail!("update binary is {} bytes, exceeds the {MAX_BINARY_BYTES} cap", binary.len());
|
||||
}
|
||||
|
||||
let signature = client
|
||||
.get(&sig_url)
|
||||
.send()
|
||||
.await
|
||||
.with_context(|| format!("downloading {sig_url}"))?
|
||||
.error_for_status()
|
||||
.context("signature download failed")?
|
||||
.text()
|
||||
.await
|
||||
.context("reading signature")?;
|
||||
|
||||
verify_signature(&binary, &signature).context("refusing unsigned/tampered update")?;
|
||||
tracing::info!("update signature verified ({} bytes)", binary.len());
|
||||
|
||||
let current = std::env::current_exe().context("resolving current executable")?;
|
||||
swap_binary(&current, &binary)?;
|
||||
tracing::info!("update swapped in at {}", current.display());
|
||||
Ok(current)
|
||||
}
|
||||
|
||||
/// Atomically replace `current` with `new_bytes`, keeping a `.old` backup and
/// rolling back if the rename fails.
pub fn swap_binary(current: &Path, new_bytes: &[u8]) -> Result<()> {
    let dir = current.parent().unwrap_or_else(|| Path::new("."));
    let stem = current.file_name().and_then(|s| s.to_str()).unwrap_or("corrosion-host-agent");
    let new_path = dir.join(format!("{stem}.new"));
    let backup = dir.join(format!("{stem}.old"));

    std::fs::write(&new_path, new_bytes)
        .with_context(|| format!("writing {}", new_path.display()))?;

    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        std::fs::set_permissions(&new_path, std::fs::Permissions::from_mode(0o755))
            .context("chmod +x on new binary")?;
    }

    let _ = std::fs::remove_file(&backup);
    std::fs::rename(current, &backup)
        .with_context(|| format!("backing up current binary to {}", backup.display()))?;

    if let Err(e) = std::fs::rename(&new_path, current) {
        // Roll back: restore the backup so the agent stays runnable.
        let _ = std::fs::rename(&backup, current);
        return Err(anyhow::anyhow!(e).context("installing new binary (rolled back)"));
    }
    Ok(())
}
|
||||
|
||||
/// Relaunch the (already-swapped) binary with the same args, then exit. No
|
||||
/// service manager is required — the new process reconnects on its own. There
|
||||
/// is a sub-second window with no agent; acceptable for an update.
|
||||
pub fn relaunch_and_exit() -> ! {
|
||||
let exe = std::env::current_exe().unwrap_or_else(|_| PathBuf::from("corrosion-host-agent"));
|
||||
let args: Vec<String> = std::env::args().skip(1).collect();
|
||||
tracing::info!("relaunching {} after update", exe.display());
|
||||
|
||||
#[cfg(unix)]
|
||||
{
|
||||
use std::os::unix::process::CommandExt;
|
||||
// exec replaces this process image with the new binary — cleanest,
|
||||
// no gap. Only returns on failure.
|
||||
let err = std::process::Command::new(&exe).args(&args).exec();
|
||||
tracing::error!("exec after update failed: {err}; exiting for service restart");
|
||||
std::process::exit(70);
|
||||
}
|
||||
#[cfg(not(unix))]
|
||||
{
|
||||
let _ = std::process::Command::new(&exe).args(&args).spawn();
|
||||
std::process::exit(0);
|
||||
}
|
||||
}
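In `download_verify_swap`, the `MAX_BINARY_BYTES` check runs after `.bytes()` has already buffered the whole response, so the cap limits what gets verified and written but not what is held in memory during the download. One possible hardening, which is not part of this diff, is to enforce the cap while reading. A std-only sketch over any `Read` source; `read_capped` is a hypothetical helper, and an async client would need the equivalent over its own streaming API:

```rust
use std::io::{self, Read};

/// Read at most `cap` bytes from `reader`, failing if the source holds more.
pub fn read_capped<R: Read>(reader: R, cap: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the cap so "exactly at the cap" and "over the cap"
    // can be told apart without reading the rest of an oversized payload.
    let limit = (cap as u64).saturating_add(1);
    reader.take(limit).read_to_end(&mut buf)?;
    if buf.len() > cap {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("payload exceeds the {cap} byte cap"),
        ));
    }
    Ok(buf)
}
```

With this shape an oversized payload costs at most `cap + 1` bytes of buffer before it is rejected, and signature verification still runs on the complete, capped buffer.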
|
||||
156
corrosion-host-agent/tests/docker_compose.rs
Normal file
@@ -0,0 +1,156 @@
|
||||
//! DockerComposeSupervisor tests. A fake `docker` script records the exact
|
||||
//! arguments it was invoked with and returns a controllable exit code, so we
|
||||
//! assert the compose invocations + state transitions with no real Docker
|
||||
//! daemon — the same mock-the-external-binary approach the steamcmd tests use.
|
||||
#![cfg(unix)]
|
||||
|
||||
use std::os::unix::fs::PermissionsExt;
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use corrosion_host_agent::config::InstanceConfig;
|
||||
use corrosion_host_agent::docker_compose::{DockerComposeConfig, DockerComposeSupervisor};
|
||||
use corrosion_host_agent::supervisor::{InstanceState, Supervisor};
|
||||
|
||||
/// Write a fake `docker` executable that appends its args (space-joined) to
|
||||
/// `args_log` and exits with the integer in `exit_file` (0 if absent).
|
||||
fn fake_docker(dir: &Path, args_log: &Path, exit_file: &Path) -> PathBuf {
|
||||
let script = dir.join("fakedocker");
|
||||
let body = format!(
|
||||
"#!/bin/sh\nprintf '%s\\n' \"$*\" >> '{}'\nexit \"$(cat '{}' 2>/dev/null || echo 0)\"\n",
|
||||
args_log.display(),
|
||||
exit_file.display(),
|
||||
);
|
||||
std::fs::write(&script, body).unwrap();
|
||||
let mut perms = std::fs::metadata(&script).unwrap().permissions();
|
||||
perms.set_mode(0o755);
|
||||
std::fs::set_permissions(&script, perms).unwrap();
|
||||
script
|
||||
}
|
||||
|
||||
fn dune_instance(command: Vec<String>, service: Option<String>) -> InstanceConfig {
|
||||
InstanceConfig {
|
||||
id: "dune-main".to_string(),
|
||||
game: "dune".to_string(),
|
||||
root: PathBuf::from("/tmp"),
|
||||
label: None,
|
||||
executable: None,
|
||||
args: vec![],
|
||||
working_dir: None,
|
||||
rcon: None,
|
||||
steamcmd: None,
|
||||
docker_compose: Some(DockerComposeConfig {
|
||||
file: Some(PathBuf::from("docker-compose.yml")),
|
||||
project: Some("duneproj".to_string()),
|
||||
service,
|
||||
command: Some(command),
|
||||
}),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn start_runs_compose_up_detached_and_sets_running() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let args_log = dir.path().join("args.log");
|
||||
let exit_file = dir.path().join("exit");
|
||||
let docker = fake_docker(dir.path(), &args_log, &exit_file);
|
||||
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec![docker.to_string_lossy().into_owned()],
|
||||
None,
|
||||
));
|
||||
assert_eq!(sup.state(), InstanceState::Stopped);
|
||||
|
||||
sup.clone().start().await.expect("compose up should succeed");
|
||||
assert_eq!(sup.state(), InstanceState::Running);
|
||||
|
||||
let logged = std::fs::read_to_string(&args_log).unwrap();
|
||||
assert!(logged.contains("up -d"), "expected `up -d`; got: {logged}");
|
||||
assert!(logged.contains("-p duneproj"), "expected project flag; got: {logged}");
|
||||
assert!(logged.contains("-f docker-compose.yml"), "expected file flag; got: {logged}");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn stop_runs_compose_stop_and_sets_stopped() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let args_log = dir.path().join("args.log");
|
||||
let exit_file = dir.path().join("exit");
|
||||
let docker = fake_docker(dir.path(), &args_log, &exit_file);
|
||||
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec![docker.to_string_lossy().into_owned()],
|
||||
None,
|
||||
));
|
||||
sup.clone().start().await.expect("up");
|
||||
sup.clone().stop().await.expect("compose stop should succeed");
|
||||
assert_eq!(sup.state(), InstanceState::Stopped);
|
||||
assert_eq!(sup.uptime_seconds().await, 0);
|
||||
|
||||
let logged = std::fs::read_to_string(&args_log).unwrap();
|
||||
assert!(logged.lines().any(|l| l.contains("stop")), "expected a `stop` call; got: {logged}");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn restart_runs_compose_restart() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let args_log = dir.path().join("args.log");
|
||||
let exit_file = dir.path().join("exit");
|
||||
let docker = fake_docker(dir.path(), &args_log, &exit_file);
|
||||
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec![docker.to_string_lossy().into_owned()],
|
||||
None,
|
||||
));
|
||||
sup.clone().restart().await.expect("compose restart should succeed");
|
||||
assert_eq!(sup.state(), InstanceState::Running);
|
||||
|
||||
let logged = std::fs::read_to_string(&args_log).unwrap();
|
||||
assert!(logged.contains("restart"), "expected `restart`; got: {logged}");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn single_service_is_targeted() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let args_log = dir.path().join("args.log");
|
||||
let exit_file = dir.path().join("exit");
|
||||
let docker = fake_docker(dir.path(), &args_log, &exit_file);
|
||||
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec![docker.to_string_lossy().into_owned()],
|
||||
Some("gameserver".to_string()),
|
||||
));
|
||||
sup.clone().start().await.expect("up");
|
||||
|
||||
let logged = std::fs::read_to_string(&args_log).unwrap();
|
||||
assert!(
|
||||
logged.contains("up -d gameserver"),
|
||||
"service must be appended after `up -d`; got: {logged}"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn compose_failure_errors_and_reverts_state() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let args_log = dir.path().join("args.log");
|
||||
let exit_file = dir.path().join("exit");
|
||||
std::fs::write(&exit_file, "1").unwrap(); // make the fake docker fail
|
||||
let docker = fake_docker(dir.path(), &args_log, &exit_file);
|
||||
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec![docker.to_string_lossy().into_owned()],
|
||||
None,
|
||||
));
|
||||
let err = sup.clone().start().await.expect_err("nonzero compose exit must fail");
|
||||
assert!(err.to_string().contains("compose up failed"), "got: {err}");
|
||||
assert_eq!(sup.state(), InstanceState::Stopped, "failed start must revert to Stopped");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn missing_docker_binary_errors_cleanly() {
|
||||
let sup = DockerComposeSupervisor::new(&dune_instance(
|
||||
vec!["/nonexistent/docker-xyz".to_string()],
|
||||
None,
|
||||
));
|
||||
let err = sup.clone().start().await.expect_err("missing docker must fail");
|
||||
assert!(err.to_string().contains("docker"), "error should mention docker: {err}");
|
||||
assert_eq!(sup.state(), InstanceState::Stopped);
|
||||
}
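The tests above pin down the shape of the compose invocation (`-p duneproj`, `-f docker-compose.yml`, `up -d gameserver`) without showing how `DockerComposeSupervisor` assembles it. A sketch of one argument builder that would satisfy those assertions; `ComposeTarget` and `compose_args` are hypothetical names, and the relative order of `-p` and `-f` is an assumption the tests do not constrain:

```rust
/// Hypothetical subset of the diff's `DockerComposeConfig`, using plain
/// strings so the sketch stays std-only.
pub struct ComposeTarget {
    pub file: Option<String>,
    pub project: Option<String>,
    pub service: Option<String>,
}

/// Build the arguments that follow the configured docker command for one
/// action, e.g. `["up", "-d"]`, `["stop"]`, or `["restart"]`.
pub fn compose_args(target: &ComposeTarget, action: &[&str]) -> Vec<String> {
    let mut args = Vec::new();
    // Global compose flags come before the subcommand.
    if let Some(project) = &target.project {
        args.push("-p".to_string());
        args.push(project.clone());
    }
    if let Some(file) = &target.file {
        args.push("-f".to_string());
        args.push(file.clone());
    }
    args.extend(action.iter().map(|s| s.to_string()));
    // An optional single service is appended after the subcommand and its flags.
    if let Some(service) = &target.service {
        args.push(service.clone());
    }
    args
}
```

Keeping the arguments as a `Vec<String>` rather than one joined string matches the note in the process supervisor docs about avoiding naive space splitting.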
|
||||
@@ -347,6 +347,62 @@ fn jail_rejects_chained_symlink_escape() {
|
||||
);
|
||||
}
|
||||
|
||||
/// SECURITY REGRESSION: copying a directory that contains a symlink pointing
|
||||
/// OUTSIDE the jail must NOT dereference it and pull external content inside.
|
||||
/// jail() validates only the top-level src/dest; the recursive copy must
|
||||
/// refuse symlinks itself or it becomes a read-escape exfiltration path.
|
||||
#[cfg(unix)]
|
||||
#[test]
|
||||
fn copy_refuses_to_follow_symlink_out_of_jail() {
|
||||
let dir = tempdir();
|
||||
let root = dir.path();
|
||||
let outside = tempdir();
|
||||
std::fs::write(outside.path().join("secret.txt"), "TOP SECRET")
|
||||
.expect("write external secret");
|
||||
|
||||
// A directory inside the jail containing a symlink to the outside dir.
|
||||
std::fs::create_dir(root.join("src")).expect("mkdir src");
|
||||
std::os::unix::fs::symlink(outside.path(), root.join("src").join("escape"))
|
||||
.expect("plant symlink to outside");
|
||||
|
||||
// Attempt to copy src -> dest (both inside the jail).
|
||||
let err = filemanager::copy(root, "src", "dest")
|
||||
.expect_err("copy must refuse the embedded symlink");
|
||||
assert!(
|
||||
format!("{err:#}").contains("symlink"),
|
||||
"error should name the refused symlink, got: {err:#}"
|
||||
);
|
||||
|
||||
// The external secret must NOT have landed inside the jail.
|
||||
assert!(
|
||||
!root.join("dest").join("escape").join("secret.txt").exists(),
|
||||
"external content leaked into the jail via symlink-following copy",
|
||||
);
|
||||
}
|
||||
|
||||
/// `list` must report a symlink as the link itself, never the dereferenced
|
||||
/// target — otherwise it leaks the size/type of files outside the jail.
|
||||
#[cfg(unix)]
|
||||
#[test]
|
||||
fn list_does_not_dereference_symlink_metadata() {
|
||||
let dir = tempdir();
|
||||
let root = dir.path();
|
||||
std::os::unix::fs::symlink(Path::new("/etc/passwd"), root.join("leak"))
|
||||
.expect("plant symlink");
|
||||
|
||||
let entries = filemanager::list(root, "").expect("list root");
|
||||
let leak = entries.iter().find(|e| e.name == "leak").expect("symlink listed");
|
||||
// /etc/passwd is a regular file; if we followed the link, is_dir would
|
||||
// reflect the target. We must report the link, which is not a directory,
|
||||
// and must NOT expose the target's byte size.
|
||||
assert!(!leak.is_dir, "symlink must not be reported as a directory");
|
||||
let target_size = std::fs::metadata("/etc/passwd").map(|m| m.len()).unwrap_or(0);
|
||||
assert!(
|
||||
leak.size != target_size || target_size == 0,
|
||||
"list leaked the symlink target's size ({target_size} bytes)"
|
||||
);
|
||||
}
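Both regression tests above depend on inspecting the link itself instead of its target. The `filemanager::copy` implementation is not shown in this diff; the following is a std-only sketch of a recursive copy that refuses symlinks, under the assumption that refusing (rather than skipping or recreating the link) is the intended policy. `copy_no_symlinks` is a hypothetical name:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy `src` to `dest`, refusing any symlink it encounters.
pub fn copy_no_symlinks(src: &Path, dest: &Path) -> io::Result<()> {
    // symlink_metadata describes the link itself; fs::metadata would follow it.
    let file_type = fs::symlink_metadata(src)?.file_type();
    if file_type.is_symlink() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("refusing to copy symlink: {}", src.display()),
        ));
    }
    if file_type.is_dir() {
        fs::create_dir_all(dest)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            // Every child is re-checked, so a link nested at any depth is caught.
            copy_no_symlinks(&entry.path(), &dest.join(entry.file_name()))?;
        }
    } else {
        fs::copy(src, dest)?;
    }
    Ok(())
}
```

The sketch stops at the first symlink and leaves any files copied before that point in place; whether a real implementation should clean up the partial destination is a separate design decision. It also makes no attempt to handle special files such as FIFOs or sockets.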
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Dispatch layer tests
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
2
corrosion-host-agent/tests/fixtures/sample.bin
vendored
Normal file
@@ -0,0 +1,2 @@
|
||||
corrosion-host-agent signed-update test fixture
|
||||
version 2.0.0-test
|
||||
4
corrosion-host-agent/tests/fixtures/sample.bin.minisig
vendored
Normal file
@@ -0,0 +1,4 @@
|
||||
untrusted comment: signature from minisign secret key
|
||||
RUQKhJptuiwIkp378Z59BTwosDycAhmlhrdZZVwk1Vdb293OgcsXx0S3W0XezMtOXIXdgvQtW/DpDKlb1gdW4elQXLG5KFUgawI=
|
||||
trusted comment: timestamp:1781222247 file:sample.bin hashed
|
||||
QtUiOfJqRKYJZTL6QV93xeLVnODr8HXWvZIR3Q1AG0yqmqesZPyiKpVa9kD34Mwp1fQ76nx1Z7c6CB1v5KHQAw==
|
||||
@@ -8,7 +8,8 @@ use std::path::PathBuf;
|
||||
use std::time::Duration;
|
||||
|
||||
use corrosion_host_agent::config::InstanceConfig;
|
||||
use corrosion_host_agent::process::{InstanceState, ProcessSupervisor};
|
||||
use corrosion_host_agent::process::ProcessSupervisor;
|
||||
use corrosion_host_agent::supervisor::{InstanceState, Supervisor};
|
||||
|
||||
fn managed_instance(executable: &str, args: &[&str]) -> InstanceConfig {
|
||||
InstanceConfig {
|
||||
@@ -21,6 +22,7 @@ fn managed_instance(executable: &str, args: &[&str]) -> InstanceConfig {
|
||||
working_dir: None,
|
||||
rcon: None,
|
||||
steamcmd: None,
|
||||
docker_compose: None,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -47,15 +49,15 @@ async fn start_status_stop_lifecycle() {
|
||||
let sup = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
|
||||
assert_eq!(sup.state(), InstanceState::Stopped);
|
||||
|
||||
sup.start().await.expect("start should succeed");
|
||||
sup.clone().start().await.expect("start should succeed");
|
||||
assert_eq!(sup.state(), InstanceState::Running);
|
||||
tokio::time::sleep(Duration::from_millis(1100)).await;
|
||||
assert!(sup.uptime_seconds().await >= 1, "uptime should advance");
|
||||
|
||||
// Double-start must be rejected while running.
|
||||
assert!(sup.start().await.is_err(), "double start must fail");
|
||||
assert!(sup.clone().start().await.is_err(), "double start must fail");
|
||||
|
||||
sup.stop().await.expect("stop should succeed");
|
||||
sup.clone().stop().await.expect("stop should succeed");
|
||||
let state = wait_for_state(&sup, |s| matches!(s, InstanceState::Stopped), Duration::from_secs(5)).await;
|
||||
assert_eq!(state, InstanceState::Stopped);
|
||||
assert_eq!(sup.uptime_seconds().await, 0);
|
||||
@@ -64,7 +66,7 @@ async fn start_status_stop_lifecycle() {
|
||||
#[tokio::test]
|
||||
async fn unexpected_exit_is_crashed_with_code() {
|
||||
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "sleep 0.2; exit 7"]));
|
||||
sup.start().await.expect("start should succeed");
|
||||
sup.clone().start().await.expect("start should succeed");
|
||||
|
||||
let state = wait_for_state(
|
||||
&sup,
|
||||
@@ -78,16 +80,16 @@ async fn unexpected_exit_is_crashed_with_code() {
|
||||
#[tokio::test]
|
||||
async fn restart_from_crashed_recovers() {
|
||||
let sup = ProcessSupervisor::new(&managed_instance("/bin/sh", &["-c", "exit 1"]));
|
||||
sup.start().await.expect("start should succeed");
|
||||
sup.clone().start().await.expect("start should succeed");
|
||||
wait_for_state(&sup, |s| matches!(s, InstanceState::Crashed { .. }), Duration::from_secs(5)).await;
|
||||
|
||||
// Restart from crashed must work (panel "Restart" after a crash).
|
||||
// Use a long-lived command this time by replacing the supervisor — the
|
||||
// command is fixed per supervisor, so emulate via a fresh one.
|
||||
let sup2 = ProcessSupervisor::new(&managed_instance("/bin/sleep", &["300"]));
|
||||
sup2.restart().await.expect("restart from stopped should start");
|
||||
sup2.clone().restart().await.expect("restart from stopped should start");
|
||||
assert_eq!(sup2.state(), InstanceState::Running);
|
||||
sup2.stop().await.expect("cleanup stop");
|
||||
sup2.clone().stop().await.expect("cleanup stop");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
@@ -96,14 +98,14 @@ async fn unmanaged_instance_rejects_process_commands() {
|
||||
cfg.executable = None;
|
||||
let sup = ProcessSupervisor::new(&cfg);
|
||||
assert_eq!(sup.state(), InstanceState::Unmanaged);
|
||||
assert!(sup.start().await.is_err(), "unmanaged start must fail");
|
||||
assert!(sup.stop().await.is_err(), "unmanaged stop must fail");
|
||||
assert!(sup.clone().start().await.is_err(), "unmanaged start must fail");
|
||||
assert!(sup.clone().stop().await.is_err(), "unmanaged stop must fail");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn missing_executable_fails_cleanly() {
|
||||
let sup = ProcessSupervisor::new(&managed_instance("/nonexistent/bin/gameserver", &[]));
|
||||
let err = sup.start().await.expect_err("must fail");
|
||||
let err = sup.clone().start().await.expect_err("must fail");
|
||||
assert!(err.to_string().contains("not found"), "error should say not found: {err}");
|
||||
assert_eq!(sup.state(), InstanceState::Stopped, "failed start must not leave Starting state");
|
||||
}
|
||||
|
||||
63
corrosion-host-agent/tests/update.rs
Normal file
@@ -0,0 +1,63 @@
|
||||
//! Signed self-update tests — the security-critical part is signature
|
||||
//! verification: a valid signature is accepted, anything tampered is rejected.
|
||||
//! Fixtures (tests/fixtures/sample.bin + .minisig) were signed with the real
|
||||
//! release private key, so these run with no key present (as in CI).
|
||||
|
||||
use corrosion_host_agent::update;
|
||||
|
||||
const SAMPLE: &[u8] = include_bytes!("fixtures/sample.bin");
|
||||
const SAMPLE_SIG: &str = include_str!("fixtures/sample.bin.minisig");
|
||||
|
||||
#[test]
|
||||
fn accepts_a_validly_signed_binary() {
|
||||
update::verify_signature(SAMPLE, SAMPLE_SIG).expect("valid signature must verify");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_a_tampered_binary() {
|
||||
let mut tampered = SAMPLE.to_vec();
|
||||
tampered[0] ^= 0xFF; // flip a byte
|
||||
let err = update::verify_signature(&tampered, SAMPLE_SIG)
|
||||
.expect_err("tampered binary must be rejected");
|
||||
assert!(err.to_string().contains("verification failed"), "got: {err}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_a_garbage_signature() {
|
||||
assert!(update::verify_signature(SAMPLE, "not a real minisig blob").is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn rejects_empty_binary_against_real_sig() {
|
||||
assert!(update::verify_signature(b"", SAMPLE_SIG).is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn url_allowlist_enforced() {
|
||||
// Allowed.
|
||||
update::assert_url_allowed("https://cdn.corrosionmgmt.com/host-agent/alpha/corrosion-host-agent-linux-amd64")
|
||||
.expect("the real CDN host must be allowed");
|
||||
// http rejected.
|
||||
assert!(update::assert_url_allowed("http://cdn.corrosionmgmt.com/x").is_err());
|
||||
// wrong host rejected.
|
||||
assert!(update::assert_url_allowed("https://evil.example.com/x").is_err());
|
||||
// credential-in-URL (userinfo bypass) rejected.
|
||||
assert!(update::assert_url_allowed("https://cdn.corrosionmgmt.com:[email protected]/x").is_err());
|
||||
// host as userinfo trick rejected (real host is evil.com).
|
||||
assert!(update::assert_url_allowed("https://[email protected]/x").is_err());
|
||||
}
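The allowlist cases exercised by `url_allowlist_enforced` (https only, exact host, no userinfo) can be sketched with the standard library alone. This is an illustrative sketch, not the agent's `update::assert_url_allowed`: the function name `url_allowed_sketch`, its `allowed_host` parameter, and the error strings are assumptions. Production code would more likely rely on a real URL parser than on string splitting.

```rust
/// Hypothetical allowlist check: https only, exact host match, no userinfo.
pub fn url_allowed_sketch(url: &str, allowed_host: &str) -> Result<(), String> {
    // Require the https scheme; plain http is rejected outright.
    let rest = url
        .strip_prefix("https://")
        .ok_or_else(|| "only https:// URLs are allowed".to_string())?;

    // The authority component ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];

    // Any '@' means userinfo is present, so the real host is whatever follows
    // it. Rejecting it closes the "allowed-host@evil-host" bypass.
    if authority.contains('@') {
        return Err("userinfo in URL is not allowed".to_string());
    }

    // Drop an optional ":port" before comparing the host.
    let host = authority.split(':').next().unwrap_or("");
    if host.eq_ignore_ascii_case(allowed_host) {
        Ok(())
    } else {
        Err(format!("host {:?} is not on the allowlist", host))
    }
}
```

Rejecting every authority that contains `@` is deliberately stricter than parsing the userinfo out: a download URL for a signed agent binary has no legitimate reason to carry credentials.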
|
||||
|
||||
#[test]
|
||||
fn swap_binary_replaces_and_backs_up() {
|
||||
let dir = tempfile::tempdir().expect("tempdir");
|
||||
let current = dir.path().join("corrosion-host-agent");
|
||||
std::fs::write(&current, b"OLD BINARY").unwrap();
|
||||
|
||||
update::swap_binary(&current, b"NEW BINARY").expect("swap should succeed");
|
||||
|
||||
assert_eq!(std::fs::read(&current).unwrap(), b"NEW BINARY", "current is the new binary");
|
||||
let backup = dir.path().join("corrosion-host-agent.old");
|
||||
assert_eq!(std::fs::read(&backup).unwrap(), b"OLD BINARY", ".old holds the previous binary");
|
||||
// the .new scratch file is consumed by the rename
|
||||
assert!(!dir.path().join("corrosion-host-agent.new").exists());
|
||||
}
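The behaviour that `swap_binary_replaces_and_backs_up` asserts (new bytes in place, previous binary kept as `.old`, `.new` scratch file consumed) is consistent with a write-then-rename sequence. The sketch below shows that sequence under stated assumptions: `swap_binary_sketch` and `with_suffix` are hypothetical names, the real `update::swap_binary` may differ, and setting the executable bit on Unix is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Appends `suffix` to the full file name, e.g. `agent` -> `agent.old`.
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut name = path.as_os_str().to_owned();
    name.push(suffix);
    PathBuf::from(name)
}

/// Hypothetical swap: stage the new bytes next to the current binary, move
/// the current binary aside as a backup, then move the staged file into place.
pub fn swap_binary_sketch(current: &Path, new_bytes: &[u8]) -> io::Result<()> {
    let scratch = with_suffix(current, ".new");
    let backup = with_suffix(current, ".old");

    // 1. Write the (already verified) bytes to the scratch file first, so a
    //    failed write never touches the running binary.
    fs::write(&scratch, new_bytes)?;

    // 2. Keep the previous binary as `.old` for rollback.
    fs::rename(current, &backup)?;

    // 3. Move the scratch file into place; on failure, restore the backup.
    if let Err(err) = fs::rename(&scratch, current) {
        let _ = fs::rename(&backup, current);
        return Err(err);
    }
    Ok(())
}
```

Staging in the same directory keeps both renames on one filesystem, which `std::fs::rename` requires.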
|
||||
@@ -31,6 +31,9 @@ services:
|
||||
volumes:
|
||||
- nats_data:/data
|
||||
- ./nats.conf:/etc/nats/nats.conf:ro
|
||||
# Per-license authorization (generated on the host; carries secrets, not
|
||||
# committed with real users — see scripts/generate-nats-auth.mjs).
|
||||
- ./nats-auth.conf:/etc/nats/nats-auth.conf:ro
|
||||
ports:
|
||||
- "8089:4222" # Client connections
|
||||
|
||||
@@ -43,6 +46,12 @@ services:
|
||||
DATABASE_URL: postgres://corrosion:${DB_PASSWORD:-corrosion_dev}@postgres:5432/corrosion
|
||||
DATABASE_MAX_CONNECTIONS: "20"
|
||||
NATS_URL: nats://nats:4222
|
||||
# Privileged internal NATS user (full corrosion.> access). Empty = anonymous.
|
||||
NATS_INTERNAL_USER: ${NATS_INTERNAL_USER:-}
|
||||
NATS_INTERNAL_PASSWORD: ${NATS_INTERNAL_PASSWORD:-}
|
||||
# Secret for deriving per-license agent passwords (shared with the
|
||||
# nats-auth generator). HMAC-SHA256(license_id, secret).
|
||||
NATS_TOKEN_SECRET: ${NATS_TOKEN_SECRET:-}
|
||||
JWT_SECRET: ${JWT_SECRET}
|
||||
JWT_ACCESS_EXPIRY_SECONDS: "14400"
|
||||
JWT_REFRESH_EXPIRY_SECONDS: "604800"
|
||||
|
||||
18
docker/nats-auth.conf
Normal file
@@ -0,0 +1,18 @@
|
||||
# BOOTSTRAP DEFAULT — no secrets, safe to commit.
|
||||
#
|
||||
# Anonymous is mapped to a HARMLESS namespace (corrosion.unclaimed.>), never to
|
||||
# real tenant subjects (corrosion.{uuid}.>) — so a fresh/stale deploy running
|
||||
# this default cannot read or forge any tenant's traffic. The REST API still
|
||||
# works; agent telemetry just won't flow until the real config is generated.
|
||||
#
|
||||
# On every real deploy, scripts/generate-nats-auth.mjs OVERWRITES this file
|
||||
# (on the host, not in git) with the privileged internal user + per-license
|
||||
# scoped users. NATS_AUTH_STAGE defaults to "enforce" (anonymous rejected).
|
||||
#
|
||||
# NOTE: no_auth_user is a TOP-LEVEL field, NOT inside authorization { }.
|
||||
authorization {
|
||||
users: [
|
||||
{ user: "anonymous", password: "", permissions: { publish: { allow: ["corrosion.unclaimed.>"] }, subscribe: { allow: ["corrosion.unclaimed.>"] } } }
|
||||
]
|
||||
}
|
||||
no_auth_user: "anonymous"
|
||||
@@ -28,8 +28,11 @@ logtime: true
|
||||
max_payload: 8MB # Support map file transfer metadata
|
||||
max_connections: 10000
|
||||
|
||||
# Authorization — tokens validated per-connection
|
||||
# Plugin and companion agents authenticate with license-specific tokens
|
||||
authorization {
|
||||
timeout: 5
|
||||
}
|
||||
# Authorization: per-license isolation.
# The committed nats-auth.conf is the safe bootstrap default: no secrets, and
# anonymous is confined to corrosion.unclaimed.> (never tenant subjects). On
# deploy, scripts/generate-nats-auth.mjs regenerates that file from the
# licenses table with the privileged internal user + per-license scoped users;
# with NATS_AUTH_STAGE=enforce anonymous is rejected. The host copy carries
# secrets and is NOT committed
# (git update-index --assume-unchanged docker/nats-auth.conf).
|
||||
include "nats-auth.conf"
|
||||
|
||||
69
docs/reference-repos/README.md
Normal file
@@ -0,0 +1,69 @@
# Reference Repos

Third-party Dune: Awakening server-management projects, kept here as **behavior
references** for Phase 2 (the Corrosion host-agent Dune adapter + future panel
Dune features). These are NOT Corrosion code and are not built or shipped; they
are read-only references. `.git` histories, `node_modules`, and compiled
binaries were stripped on import (the 38 MB `icehunter/web/dune-admin` build
artifact and a Tauri `.icns` are intentionally absent).

> Imported 2026-06-12 from `/tmp/dune-re`. Each was a separate upstream repo;
> see each project's own `LICENSE` and `README.md`. Treat as documentation.

## Why these are here

Dune: Awakening does **not** use SteamCMD or a plain game-server process like
Rust/Conan/Soulmask. It ships as **Docker container(s)** fronted by a **RabbitMQ
broker** (admin + game vhosts) and a **PostgreSQL** admin database (`dune`
schema), orchestrated as a "**battlegroup**". The game process is
`DuneSandboxServer-Linux-Shipping` (one per partition). Server settings live in
INI files (`UserEngine.ini` / `UserGame.ini`) and only take effect after a
restart. Our Dune adapter must model that container/broker/DB world instead of
the process+SteamCMD model; these repos are how that world actually works in
the wild.

## The references

### `icehunter/`: `dune-admin` (Go backend + React SPA)
The richest ops reference. A web admin panel with **four interchangeable control
planes**: `docker`, `kubectl`, `local`, and `amp` (CubeCoders AMP / podman).
Most relevant to us:
- **`SETUP_DOCKER.md`**: the Docker control plane: `docker start/stop/restart`
  for lifecycle, `docker logs -f` for streaming, `docker exec` into the broker
  container for RabbitMQ (`rabbitmqctl`) commands, direct TCP to the `dune`
  Postgres. Optional SSH tunnelling when the admin is off-host. **This is the
  closest analog to what the Corrosion host-agent Dune adapter must do.**
- `cmd/dune-admin/control_docker.go` / `control_kubectl.go` / `control_local.go`
  / `control_amp.go`: the `ControlPlane` interface and its implementations
  (the start/stop/restart/status/log/broker abstraction we mirror as a Rust
  game-adapter trait).
- `db.go` / `model.go`: the full Dune admin data model (players, bases,
  inventory, exchange/market) for when Corrosion grows a richer Dune admin
  surface beyond lifecycle.
- `CLAUDE.md`: upstream's own engineering notes; the AMP section documents the
  INI-vs-API server-settings gotcha (AMP regenerates INIs on start).

### `adainrivers/`: Dune Dedicated Server Manager (Rust / Tauri desktop)
**The Rust reference.** Manages already-provisioned servers over **SSH +
Kubernetes** ("BattleGroup" start/stop/restart/update), with secure SSH tunnels
to Director / File Browser / Postgres / PgHero, an in-game admin console (item
grants, vehicle spawns, journey/XP tags), and a bundled **`dune-server-service`**
daemon for scheduled maintenance (timed restarts with in-game warnings, backups,
update apply). Closest to our stack idiomatically: read it for Rust patterns on
SSH control, the maintenance-daemon design, and the in-game command surface.

### `the4rchangel/`: Dune: Awakening Server Manager (Node.js local web UI)
**Matches the Commander's exact self-host path.** A local dashboard that
replaces the `battlegroup.bat` terminal menu: guided VM import (Hyper-V),
network, SSH, bootstrap, then daily ops: battlegroup start/stop/restart/update,
character editor, visual game-config editor (PvP, sandstorms, sandworms, mining
rates, decay, building limits), monitoring, DB access. Read it to understand the
`battlegroup.bat` workflow our agent has to drive on a Windows/Hyper-V host.

## How we use them

- **Lifecycle/control** → mirror `icehunter`'s `ControlPlane` docker provider as
  the agent's Dune game-adapter (compose/`docker` lifecycle, `docker logs`
  console, reject SteamCMD).
- **Rust idioms / maintenance daemon / SSH** → `adainrivers`.
- **Battlegroup.bat reality / setup flow / game-config schema** → `the4rchangel`.
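
The lifecycle mapping in "How we use them" can be sketched as a small Rust trait. This is a minimal illustration only: `GameAdapter`, `Action`, and `DockerDuneAdapter` are hypothetical names (not taken from `icehunter` or from Corrosion), the container name is an assumption, and the sketch only builds `docker` argument vectors rather than executing them.

```rust
/// Lifecycle actions the adapter accepts (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Start,
    Stop,
    Restart,
    /// SteamCMD-style install/update; the Dune adapter rejects it.
    SteamCmdUpdate,
}

/// Hypothetical game-adapter trait: maps an action to the command to run.
pub trait GameAdapter {
    /// Returns program + arguments for `action`, or an error message when
    /// the action does not apply to this game.
    fn command_for(&self, action: Action) -> Result<Vec<String>, String>;

    /// Returns the command that streams the game console.
    fn console_command(&self) -> Vec<String>;
}

/// Docker-backed adapter for one container (the name is an assumption).
pub struct DockerDuneAdapter {
    pub container: String,
}

impl GameAdapter for DockerDuneAdapter {
    fn command_for(&self, action: Action) -> Result<Vec<String>, String> {
        let verb = match action {
            Action::Start => "start",
            Action::Stop => "stop",
            Action::Restart => "restart",
            Action::SteamCmdUpdate => {
                return Err("SteamCMD is not supported for Dune instances".to_string());
            }
        };
        Ok(vec!["docker".to_string(), verb.to_string(), self.container.clone()])
    }

    fn console_command(&self) -> Vec<String> {
        // `docker logs -f` follows the container's output as a live console.
        vec![
            "docker".to_string(),
            "logs".to_string(),
            "-f".to_string(),
            self.container.clone(),
        ]
    }
}
```

Returning the argument vector instead of spawning the process keeps the mapping unit-testable without a Docker daemon; a caller would pass the result to `std::process::Command`.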
71
docs/reference-repos/adainrivers/.github/workflows/ci.yml
vendored
Normal file
@@ -0,0 +1,71 @@
|
||||
name: CI
|
||||
|
||||
on:
|
||||
workflow_dispatch:
|
||||
|
||||
env:
|
||||
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
|
||||
|
||||
jobs:
|
||||
checks:
|
||||
name: Workspace checks (${{ matrix.platform }})
|
||||
runs-on: ${{ matrix.platform }}
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
platform: [windows-latest, ubuntu-22.04, macos-latest]
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install Rust
|
||||
uses: dtolnay/rust-toolchain@stable
|
||||
|
||||
- name: Install Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: 22
|
||||
cache: npm
|
||||
cache-dependency-path: app/package-lock.json
|
||||
|
||||
- name: Install Linux Tauri dependencies
|
||||
if: matrix.platform == 'ubuntu-22.04'
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf pkg-config libssl-dev
|
||||
|
||||
- name: Install frontend dependencies
|
||||
working-directory: app
|
||||
run: npm ci
|
||||
|
||||
- name: Rust format
|
||||
run: cargo fmt --all -- --check
|
||||
|
||||
- name: Rust check
|
||||
run: cargo check --workspace
|
||||
|
||||
- name: Rust tests
|
||||
run: cargo test --workspace
|
||||
|
||||
- name: Core API docs
|
||||
run: cargo doc -p dune-manager-core --no-deps
|
||||
|
||||
- name: Frontend build
|
||||
working-directory: app
|
||||
run: npm run build
|
||||
|
||||
- name: Tauri shell check
|
||||
run: cargo check -p dune-dedicated-server-manager-app
|
||||
|
||||
- name: Secret and machine-constant scan
|
||||
if: matrix.platform == 'windows-latest'
|
||||
shell: pwsh
|
||||
run: |
|
||||
rg -n -S "I:|AutoUpdate|192\.168\.2\.|menna|dune-awakening|C:\\WINDOWS\\System32\\OpenSSH|C:\\Windows\\System32\\OpenSSH|change-me-before-exposing|c05564d|d177d3bbc40be761|qRmQx|FuncomLiveServices__ServiceAuthToken" . -g "!app/**/target/**" -g "!crates/**/target/**" -g "!target/**" -g "!app/node_modules/**" -g "!app/dist/**" -g "!*.md" -g "!app/steamcmd/**" -g "!app/dune-server/**" -g "!app/vm/**" -g "!app/vm-*/**" -g "!vm/**" -g "!.tmp/**"
|
||||
if ($LASTEXITCODE -eq 0) {
|
||||
throw "Secret or machine-specific constant scan found matches."
|
||||
}
|
||||
if ($LASTEXITCODE -ne 1) {
|
||||
exit $LASTEXITCODE
|
||||
}
|
||||
203
docs/reference-repos/adainrivers/.github/workflows/release.yml
vendored
Normal file
@@ -0,0 +1,203 @@
|
||||
name: Release
|
||||
|
||||
on:
|
||||
push:
|
||||
tags:
|
||||
- "v*.*.*"
|
||||
workflow_dispatch:
|
||||
inputs:
|
||||
version:
|
||||
description: "Version to release, for example 0.1.0"
|
||||
required: true
|
||||
type: string
|
||||
|
||||
permissions:
|
||||
contents: write
|
||||
|
||||
env:
|
||||
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
|
||||
|
||||
jobs:
|
||||
linux-service-binary:
|
||||
name: Build dune-server-service (musl)
|
||||
runs-on: ubuntu-22.04
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install Rust toolchain
|
||||
uses: dtolnay/rust-toolchain@stable
|
||||
with:
|
||||
targets: x86_64-unknown-linux-musl
|
||||
|
||||
- name: Install Zig
|
||||
uses: mlugg/setup-zig@v1
|
||||
with:
|
||||
version: 0.13.0
|
||||
|
||||
- name: Install cargo-zigbuild
|
||||
run: cargo install --locked cargo-zigbuild
|
||||
|
||||
- name: Resolve release version
|
||||
shell: bash
|
||||
env:
|
||||
WORKFLOW_VERSION: ${{ inputs.version }}
|
||||
run: |
|
||||
version="$WORKFLOW_VERSION"
|
||||
if [ -z "$version" ]; then
|
||||
version="${GITHUB_REF_NAME#v}"
|
||||
fi
|
||||
if [ -z "$version" ]; then
|
||||
echo "could not resolve release version" >&2
|
||||
exit 1
|
||||
fi
|
||||
echo "RELEASE_VERSION=$version" >> "$GITHUB_ENV"
|
||||
echo "RELEASE_TAG=v$version" >> "$GITHUB_ENV"
|
||||
|
||||
- name: Build musl binary
|
||||
run: |
|
||||
cargo zigbuild -p dune-server-service --release --target x86_64-unknown-linux-musl
|
||||
strip target/x86_64-unknown-linux-musl/release/dune-server-service
|
||||
|
||||
- name: Stage release artifacts
|
||||
run: |
|
||||
mkdir -p release-artifacts
|
||||
cp target/x86_64-unknown-linux-musl/release/dune-server-service release-artifacts/dune-server-service
|
||||
cp crates/dune-server-service/systemd/dune-server-service.service release-artifacts/dune-server-service.service
|
||||
cp crates/dune-server-service/openrc/dune-server-service release-artifacts/dune-server-service.openrc
|
||||
|
||||
- name: Upload artifact for desktop bundle
|
||||
uses: actions/upload-artifact@v4
|
||||
with:
|
||||
name: dune-server-service-musl
|
||||
path: release-artifacts/
|
||||
retention-days: 7
|
||||
|
||||
- name: Resolve release notes
|
||||
if: startsWith(github.ref, 'refs/tags/v')
|
||||
shell: bash
|
||||
run: |
|
||||
notes_path="release-notes/${RELEASE_VERSION}.md"
|
||||
if [ -f "$notes_path" ]; then
|
||||
echo "RELEASE_BODY_PATH=$notes_path" >> "$GITHUB_ENV"
|
||||
else
|
||||
tmp=$(mktemp)
|
||||
printf 'Release v%s. No release-notes/%s.md was provided — see the commit log for details.\n' \
|
||||
"$RELEASE_VERSION" "$RELEASE_VERSION" > "$tmp"
|
||||
echo "RELEASE_BODY_PATH=$tmp" >> "$GITHUB_ENV"
|
||||
fi
|
||||
|
||||
- name: Attach to GitHub release
|
||||
if: startsWith(github.ref, 'refs/tags/v')
|
||||
uses: softprops/action-gh-release@v2
|
||||
with:
|
||||
tag_name: ${{ env.RELEASE_TAG }}
|
||||
body_path: ${{ env.RELEASE_BODY_PATH }}
|
||||
files: |
|
||||
release-artifacts/dune-server-service
|
||||
release-artifacts/dune-server-service.service
|
||||
release-artifacts/dune-server-service.openrc
|
||||
|
||||
desktop-app:
|
||||
name: Build ${{ matrix.name }} app
|
||||
needs: linux-service-binary
|
||||
runs-on: ${{ matrix.platform }}
|
||||
strategy:
|
||||
fail-fast: false
|
||||
matrix:
|
||||
include:
|
||||
- name: Windows
|
||||
platform: windows-latest
|
||||
args: --bundles nsis
|
||||
- name: Linux
|
||||
platform: ubuntu-22.04
|
||||
args: --bundles appimage,deb
|
||||
- name: macOS Apple Silicon
|
||||
platform: macos-latest
|
||||
args: --target aarch64-apple-darwin --bundles dmg
|
||||
- name: macOS Intel
|
||||
platform: macos-latest
|
||||
args: --target x86_64-apple-darwin --bundles dmg
|
||||
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
|
||||
- name: Install Rust
|
||||
uses: dtolnay/rust-toolchain@stable
|
||||
with:
|
||||
targets: ${{ startsWith(matrix.name, 'macOS') && 'aarch64-apple-darwin,x86_64-apple-darwin' || '' }}
|
||||
|
||||
- name: Install Node
|
||||
uses: actions/setup-node@v4
|
||||
with:
|
||||
node-version: 22
|
||||
cache: npm
|
||||
cache-dependency-path: app/package-lock.json
|
||||
|
||||
- name: Install Linux Tauri dependencies
|
||||
if: matrix.platform == 'ubuntu-22.04'
|
||||
run: |
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf pkg-config libssl-dev
|
||||
|
||||
- name: Install frontend dependencies
|
||||
working-directory: app
|
||||
run: npm ci
|
||||
|
||||
- name: Download bundled dune-server-service binary
|
||||
uses: actions/download-artifact@v4
|
||||
with:
|
||||
name: dune-server-service-musl
|
||||
path: app/src-tauri/binaries/
|
||||
|
||||
- name: Resolve release version
|
||||
shell: pwsh
|
||||
env:
|
||||
WORKFLOW_VERSION: ${{ inputs.version }}
|
||||
run: |
|
||||
$version = $env:WORKFLOW_VERSION
|
||||
if ([string]::IsNullOrWhiteSpace($version)) {
|
||||
$version = "${{ github.ref_name }}".TrimStart("v")
|
||||
}
|
||||
if ([string]::IsNullOrWhiteSpace($version)) {
|
||||
throw "Release version could not be resolved."
|
||||
}
|
||||
"RELEASE_VERSION=$version" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
"RELEASE_TAG=v$version" | Out-File -FilePath $env:GITHUB_ENV -Append
|
||||
|
||||
- name: Prepare release config
|
||||
shell: pwsh
|
||||
run: |
|
||||
$version = $env:RELEASE_VERSION
|
||||
|
||||
Push-Location app
|
||||
npm version --no-git-tag-version --allow-same-version $version
|
||||
Pop-Location
|
||||
|
||||
$tauriConfigPath = "app/src-tauri/tauri.conf.json"
|
||||
$config = Get-Content $tauriConfigPath -Raw
|
||||
$config = $config -replace '"version":\s*"[^"]+"', ('"version": "' + $version + '"')
|
||||
# Release builds publish signed updater artifacts; the checked-in
|
||||
# default keeps this off so local debug builds do not require
|
||||
# TAURI_SIGNING_PRIVATE_KEY.
|
||||
$config = $config -replace '"createUpdaterArtifacts":\s*false', '"createUpdaterArtifacts": true'
|
||||
Set-Content -Path $tauriConfigPath -Value $config -NoNewline
|
||||
|
||||
# The body is set by the linux-service-binary job's softprops step.
|
||||
# tauri-action only uploads desktop bundles + the signed updater
|
||||
# artifacts here; we don't pass releaseBody to avoid clobbering.
|
||||
- name: Build and publish Tauri release
|
||||
uses: tauri-apps/tauri-action@v0
|
||||
env:
|
||||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
|
||||
TAURI_SIGNING_PRIVATE_KEY: ${{ secrets.TAURI_SIGNING_PRIVATE_KEY }}
|
||||
TAURI_SIGNING_PRIVATE_KEY_PASSWORD: ${{ secrets.TAURI_SIGNING_PRIVATE_KEY_PASSWORD }}
|
||||
VITE_ENABLE_STARTUP_UPDATE_CHECK: "true"
|
||||
with:
|
||||
projectPath: app
|
||||
tagName: ${{ env.RELEASE_TAG }}
|
||||
releaseName: "Dune Dedicated Server Manager ${{ env.RELEASE_TAG }}"
|
||||
releaseDraft: false
|
||||
prerelease: false
|
||||
args: ${{ matrix.args }}
|
||||
68
docs/reference-repos/adainrivers/.gitignore
vendored
Normal file
@@ -0,0 +1,68 @@
|
||||
# Dependencies
|
||||
node_modules/
|
||||
app/node_modules/
|
||||
|
||||
# Frontend build
|
||||
dist/
|
||||
app/dist/
|
||||
app/src-tauri/gen/schemas/
|
||||
|
||||
# Rust/Tauri build outputs
|
||||
target/
|
||||
src-tauri/target/
|
||||
app/src-tauri/target/
|
||||
manager-api/target/
|
||||
|
||||
# Local environment
|
||||
.env
|
||||
.env.*
|
||||
!.env.example
|
||||
|
||||
# Logs
|
||||
*.log
|
||||
npm-debug.log*
|
||||
yarn-debug.log*
|
||||
yarn-error.log*
|
||||
pnpm-debug.log*
|
||||
|
||||
# Docs are scratch notes for now; keep README trackable later
|
||||
*.md
|
||||
!README.md
|
||||
!docs/
|
||||
!docs/*.md
|
||||
docs/rabbitmq-protocol.md
|
||||
# Release notes go on GitHub releases via the release workflow.
|
||||
!release-notes/
|
||||
!release-notes/*.md
|
||||
|
||||
# Editor and OS noise
|
||||
.idea/
|
||||
.vscode/
|
||||
*.swp
|
||||
*.swo
|
||||
Thumbs.db
|
||||
Desktop.ini
|
||||
|
||||
# Local app/runtime data and secrets
|
||||
.tmp/
|
||||
.playwright-mcp/
|
||||
app/default-config.json
|
||||
app/steamcmd/
|
||||
app/dune-server/
|
||||
dune-server/
|
||||
app/vm/
|
||||
app/vm-*/
|
||||
app/src-tauri/dune-server/
|
||||
app/src-tauri/vm/
|
||||
app/src-tauri/resources/manager-api/dune-manager-api
|
||||
app/src-tauri/resources/manager-api/dune-manager-api.exe
|
||||
vm/
|
||||
*.pem
|
||||
*.key
|
||||
sshKey
|
||||
codex_vm_ed25519_dropbear
|
||||
codex_vm_ed25519_dropbear.pub
|
||||
snapshots/
|
||||
keys/
|
||||
initial-setup-log.txt
|
||||
secrets/
|
||||
7156
docs/reference-repos/adainrivers/Cargo.lock
generated
Normal file
7
docs/reference-repos/adainrivers/Cargo.toml
Normal file
@@ -0,0 +1,7 @@
|
||||
[workspace]
|
||||
members = ["crates/dune-manager-core", "crates/dune-server-service", "app/src-tauri"]
|
||||
resolver = "2"
|
||||
|
||||
[workspace.dependencies]
|
||||
serde = { version = "1", features = ["derive"] }
|
||||
serde_json = "1"
|
||||
21
docs/reference-repos/adainrivers/LICENSE
Normal file
@@ -0,0 +1,21 @@
|
||||
MIT License
|
||||
|
||||
Copyright (c) 2026 gaming.tools
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
of this software and associated documentation files (the "Software"), to deal
|
||||
in the Software without restriction, including without limitation the rights
|
||||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
copies of the Software, and to permit persons to whom the Software is
|
||||
furnished to do so, subject to the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be included in all
|
||||
copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
SOFTWARE.
|
||||
59
docs/reference-repos/adainrivers/README.md
Normal file
@@ -0,0 +1,59 @@
# Dune Dedicated Server Manager

A desktop manager for existing Dune Awakening dedicated servers.



The app manages already-provisioned Dune dedicated servers over SSH and
Kubernetes control commands. It does not install the game server, create VMs,
configure Hyper-V, provision Ubuntu, or manage external tools such as SteamCMD.

## Features

- Remote server profile management with SSH private-key authentication
- BattleGroup status, start, stop, restart, and update controls
- Component diagnostics, log viewing, and safe restart actions
- Secure Director, File Browser, PostgreSQL, and PgHero access through local SSH tunnels
- Bundled `dune-server-service` daemon for on-host scheduled maintenance (daily restarts with in-game warnings, automated backups, server update check + apply), installed over SSH straight from the Management card
- Admin console for in-game actions: item grants, vehicle spawns, skill/journey/XP tags, player lookup with live pawn location, and a logged history of every published command
- Automated tasks tab with editable schedule settings (daily restart time, warning lead/frequency, update apply lead, IANA timezone); saving auto-restarts the service so changes apply immediately
- Welcome Package automation: a per-player onboarding chain (item grants, water refill, welcome whisper) driven by Postgres player detection, tracked in the management service's SQLite ledger, and configurable from the Welcome Package tab with both a visual editor and a raw JSON mode



More management features coming soon.

## Install

Download the latest release for your operating system from GitHub Releases.

- Windows: run the NSIS installer.
- Linux: use the AppImage or Debian package.
- macOS: use the DMG for your Mac architecture.

After launching the app, add an existing server profile with its host, SSH user,
and private key path, then refresh it to detect BattleGroups and management
endpoints.

## Managed Server Assumptions

The target server must already be installed and reachable over SSH. The app
expects the Dune Kubernetes resources and vendor management scripts to exist on
the server before you add it.

Required player-facing/server ports depend on your own server deployment. A
typical dedicated-server deployment uses:

- UDP 7777-7810 for game servers
- TCP 31982 for RMQ

If you found a bug or are having other issues, please create an issue here:
https://github.com/adainrivers/dune-dedicated-server-manager/issues

## Building From Source

See [Building From Source](docs/building-from-source.md).

## License

MIT License. See [LICENSE](LICENSE).
15
docs/reference-repos/adainrivers/app/index.html
Normal file
@@ -0,0 +1,15 @@
|
||||
<!doctype html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8" />
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
|
||||
<title>Dune Dedicated Server Manager</title>
|
||||
<link rel="preconnect" href="https://fonts.googleapis.com">
|
||||
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
|
||||
<link href="https://fonts.googleapis.com/css2?family=Funnel+Display:wght@400;500;600;700&family=Geist:wght@300;400;500;600;700&family=Geist+Mono:wght@400;500;600&display=swap" rel="stylesheet">
|
||||
</head>
|
||||
<body>
|
||||
<div id="root"></div>
|
||||
<script type="module" src="/src/main.tsx"></script>
|
||||
</body>
|
||||
</html>
|
||||
3897
docs/reference-repos/adainrivers/app/package-lock.json
generated
Normal file
32
docs/reference-repos/adainrivers/app/package.json
Normal file
@@ -0,0 +1,32 @@
|
||||
{
|
||||
"name": "dune-dedicated-server-manager-app",
|
||||
"private": true,
|
||||
"version": "0.3.16",
|
||||
"type": "module",
|
||||
"scripts": {
|
||||
"dev": "vite --host 127.0.0.1 --port 1420",
|
||||
"build": "tsc && vite build",
|
||||
"preview": "vite preview --host 127.0.0.1 --port 1420",
|
||||
"tauri": "tauri"
|
||||
},
|
||||
"dependencies": {
|
||||
"@radix-ui/react-icons": "^1.3.2",
|
||||
"@radix-ui/themes": "^3.2.1",
|
||||
"@tauri-apps/api": "^2.0.0",
|
||||
"@tauri-apps/plugin-dialog": "^2.7.1",
|
||||
"@tauri-apps/plugin-process": "^2.3.1",
|
||||
"@tauri-apps/plugin-shell": "^2.3.5",
|
||||
"@tauri-apps/plugin-updater": "^2.10.1",
|
||||
"markdown-to-jsx": "^9.8.1",
|
||||
"react": "^18.3.1",
|
||||
"react-dom": "^18.3.1"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@tauri-apps/cli": "^2.0.0",
|
||||
"@types/react": "^18.3.12",
|
||||
"@types/react-dom": "^18.3.1",
|
||||
"@vitejs/plugin-react": "^4.3.3",
|
||||
"typescript": "^5.6.3",
|
||||
"vite": "^5.4.10"
|
||||
}
|
||||
}
|
||||
26
docs/reference-repos/adainrivers/app/src-tauri/Cargo.toml
Normal file
@@ -0,0 +1,26 @@
|
||||
[package]
|
||||
name = "dune-dedicated-server-manager-app"
|
||||
version = "0.2.0"
|
||||
description = "Desktop shell for Dune Dedicated Server Manager"
|
||||
authors = ["Dune Dedicated Server Manager"]
|
||||
edition = "2021"
|
||||
|
||||
[lib]
|
||||
name = "dune_dedicated_server_manager_app_lib"
|
||||
crate-type = ["staticlib", "cdylib", "rlib"]
|
||||
|
||||
[build-dependencies]
|
||||
tauri-build = { version = "2", features = [] }
|
||||
|
||||
[dependencies]
|
||||
dune-manager-core = { path = "../../crates/dune-manager-core" }
|
||||
tauri = { version = "2", features = ["devtools"] }
|
||||
serde = { workspace = true }
|
||||
serde_json = { workspace = true }
|
||||
tauri-plugin-dialog = "2"
|
||||
tauri-plugin-updater = "2"
|
||||
tauri-plugin-process = "2"
|
||||
tauri-plugin-shell = "2"
|
||||
base64 = "0.22"
|
||||
chrono = { version = "0.4", default-features = false, features = ["clock", "std"] }
|
||||
reqwest = { version = "0.12", default-features = false, features = ["json"] }
|
||||
6
docs/reference-repos/adainrivers/app/src-tauri/binaries/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
# Populated by CI from the `linux-service-binary` job artifact, or locally
|
||||
# via `cargo zigbuild -p dune-server-service --release --target
|
||||
# x86_64-unknown-linux-musl` + manual copy. Not tracked.
|
||||
dune-server-service
|
||||
dune-server-service.service
|
||||
dune-server-service.openrc
|
||||
@@ -0,0 +1,23 @@
# Bundled service binaries

This directory holds the Linux `dune-server-service` binary (musl-static), its
systemd unit, and its OpenRC init script. They are populated by the
`linux-service-binary` job in `.github/workflows/release.yml` and bundled into
the desktop installer as Tauri resources.

For local debug builds the directory can be empty; the `install_management_service`
Tauri command surfaces a friendly error when the resource is missing.

For a local end-to-end test, build the service yourself:

```powershell
rustup target add x86_64-unknown-linux-musl
cargo install --locked cargo-zigbuild
cargo zigbuild -p dune-server-service --release --target x86_64-unknown-linux-musl
Copy-Item target\x86_64-unknown-linux-musl\release\dune-server-service `
    app\src-tauri\binaries\dune-server-service
Copy-Item crates\dune-server-service\systemd\dune-server-service.service `
    app\src-tauri\binaries\dune-server-service.service
Copy-Item crates\dune-server-service\openrc\dune-server-service `
    app\src-tauri\binaries\dune-server-service.openrc
```
67
docs/reference-repos/adainrivers/app/src-tauri/build.rs
Normal file
@@ -0,0 +1,67 @@
fn main() {
    expose_dune_server_service_version();
    rerun_if_bundled_binaries_change();
    tauri_build::build();
}

/// Tauri's resource-copy step only fires when Cargo decides build.rs needs to
/// re-run, which by default doesn't watch arbitrary files. Without these
/// `rerun-if-changed` lines, refreshing the bundled `dune-server-service`
/// binary or its systemd/openrc units in `binaries/` after a previous build
/// produces a stale `target/release/binaries/` copy — the running exe then
/// pushes the OLD binary on Install/Update, with no visible signal.
fn rerun_if_bundled_binaries_change() {
    let dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")).join("binaries");
    // Watch the directory itself so file additions/deletions also trigger a rerun.
    println!("cargo:rerun-if-changed={}", dir.display());
    if let Ok(entries) = std::fs::read_dir(&dir) {
        for entry in entries.flatten() {
            let path = entry.path();
            // Skip README, .gitignore, and similar bookkeeping files.
            if matches!(
                path.file_name().and_then(|n| n.to_str()),
                Some("README.md") | Some(".gitignore")
            ) {
                continue;
            }
            println!("cargo:rerun-if-changed={}", path.display());
        }
    }
}

fn expose_dune_server_service_version() {
    let cargo_toml = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
        .join("../../crates/dune-server-service/Cargo.toml");
    println!("cargo:rerun-if-changed={}", cargo_toml.display());
    let contents = std::fs::read_to_string(&cargo_toml)
        .unwrap_or_else(|err| panic!("reading {}: {err}", cargo_toml.display()));
    let version = parse_package_version(&contents).unwrap_or_else(|| {
        panic!(
            "could not find [package].version in {}",
            cargo_toml.display()
        )
    });
    println!("cargo:rustc-env=DUNE_SERVER_SERVICE_VERSION={version}");
}

fn parse_package_version(toml: &str) -> Option<String> {
    let mut in_package = false;
    for line in toml.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('[') {
            in_package = trimmed == "[package]";
            continue;
        }
        if !in_package {
            continue;
        }
        if let Some(rest) = trimmed.strip_prefix("version") {
            let rest = rest.trim_start();
            let rest = rest.strip_prefix('=')?.trim_start();
            let rest = rest.trim_start_matches('"');
            let end = rest.find('"')?;
            return Some(rest[..end].to_string());
        }
    }
    None
}
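The `parse_package_version` helper in the `build.rs` hunk above matches any `[package]` key that starts with `version`, and its `?` operators return `None` for the whole function as soon as such a line does not parse, for example `version.workspace = true` or a `versioned = ...` key that appears before the real `version` line. A minimal sketch of a stricter variant follows, assuming only literal `version = "..."` entries should count. The name `parse_package_version_strict` is hypothetical and is not part of the reference repo.

```rust
/// Hypothetical stricter variant of `parse_package_version` (an illustration,
/// not code from the reference repo). Returns the value of a literal
/// `version = "..."` line inside the `[package]` table, or `None` if the
/// table has no such line.
fn parse_package_version_strict(toml: &str) -> Option<String> {
    let mut in_package = false;
    for line in toml.lines() {
        let trimmed = line.trim();
        // A new table header starts or ends the `[package]` section.
        if trimmed.starts_with('[') {
            in_package = trimmed == "[package]";
            continue;
        }
        if !in_package {
            continue;
        }
        // Split on the first '=' so the key can be compared exactly.
        let Some((key, value)) = trimmed.split_once('=') else {
            continue;
        };
        if key.trim() != "version" {
            // Skips `version.workspace`, `versioned`, and unrelated keys.
            continue;
        }
        // Accept only a double-quoted string; skip inline tables and the like.
        let Some(value) = value.trim().strip_prefix('"') else {
            continue;
        };
        if let Some(end) = value.find('"') {
            return Some(value[..end].to_string());
        }
    }
    None
}
```

Under these assumptions the function could stand in for the call inside `expose_dune_server_service_version`: lines that do not parse are skipped instead of ending the search, and the caller's existing `panic!` still fires when no literal version is found.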
@@ -0,0 +1,7 @@
{
  "$schema": "../gen/schemas/desktop-schema.json",
  "identifier": "default",
  "description": "Default desktop app permissions",
  "windows": ["main"],
  "permissions": ["core:default", "dialog:allow-open", "process:default", "shell:allow-open", "updater:default"]
}
BIN
docs/reference-repos/adainrivers/app/src-tauri/icons/128x128.png
Normal file
Size: 24 KiB
Size: 65 KiB
BIN
docs/reference-repos/adainrivers/app/src-tauri/icons/32x32.png
Normal file
Size: 2.4 KiB
BIN
docs/reference-repos/adainrivers/app/src-tauri/icons/64x64.png
Normal file
Size: 7.3 KiB
Size: 18 KiB
Size: 28 KiB
Size: 31 KiB
Size: 82 KiB
Size: 2.1 KiB
Size: 94 KiB
Size: 3.9 KiB
Size: 8.7 KiB
Size: 13 KiB
Size: 4.8 KiB
@@ -0,0 +1,5 @@
<?xml version="1.0" encoding="utf-8"?>
<adaptive-icon xmlns:android="http://schemas.android.com/apk/res/android">
  <foreground android:drawable="@mipmap/ic_launcher_foreground"/>
  <background android:drawable="@color/ic_launcher_background"/>
</adaptive-icon>
Size: 4.0 KiB
Size: 35 KiB
Size: 4.0 KiB
Size: 3.9 KiB
Size: 18 KiB
Size: 3.8 KiB
Size: 12 KiB
Size: 55 KiB
Size: 12 KiB
Size: 23 KiB
Size: 100 KiB
Size: 25 KiB
Size: 37 KiB
Size: 152 KiB
Size: 40 KiB
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<resources>
  <color name="ic_launcher_background">#fff</color>
</resources>
BIN
docs/reference-repos/adainrivers/app/src-tauri/icons/icon.ico
Normal file
Size: 83 KiB
BIN
docs/reference-repos/adainrivers/app/src-tauri/icons/icon.png
Normal file
Size: 193 KiB
Size: 1005 B
Size: 2.9 KiB
Size: 2.9 KiB
Size: 5.7 KiB