Files
corrosion-admin-panel/corrosion-host-agent/src/supervisor.rs
Vantz Stockwell d13f2cb8b1
Some checks failed
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Failing after 35s
CI / integration (push) Has been skipped
Build Host Agent (Rust) / build (push) Successful in 1m45s
feat(host-agent): Phase 2 — Dune docker-compose adapter via Supervisor trait
Introduce a Supervisor trait (async-trait) so the agent manages games with
different models behind one wire contract. ProcessSupervisor (spawned process:
rust/conan/soulmask) and the new DockerComposeSupervisor (dune) both impl it;
Agent.supervisors is now HashMap<String, Arc<dyn Supervisor>> and instancecmd
dispatch is game-agnostic — start/stop/restart/status identical across games,
selected by a per-game factory in main. InstanceState moved to the shared
supervisor module.

DockerComposeSupervisor drives docker-compose up-d / stop / restart against
the instance's compose project, with -f/-p/single-service support and a
configurable compose binary. New [instance.docker_compose] config block.
First cut = lifecycle + cached state; container crash-detection + restart
adoption deferred to Phase 3b (reconcilable with a compose ps probe).

Trait choice (dyn over enum) per Commander: scales to future planes (kubectl,
AMP/podman, SSH) as new struct+impl, no central match.

56 tests green (6 new docker-compose mock-binary tests + 5 refactored process
tests), zero warnings. Live verification pending a real Dune stack.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 21:33:00 -04:00

81 lines
3.0 KiB
Rust

//! The supervision abstraction.
//!
//! A `Supervisor` owns the lifecycle of one game instance. Different games are
//! managed in fundamentally different ways — Rust/Conan/Soulmask are spawned OS
//! processes ([`crate::process::ProcessSupervisor`]); Dune is a docker-compose
//! stack ([`crate::docker_compose::DockerComposeSupervisor`]); future planes
//! (kubectl, AMP/podman, SSH) will be their own impls. The instance command
//! dispatch (`instancecmd::dispatch`) talks only to this trait, so it never
//! learns which management model is behind a given instance.
//!
//! Trait objects (`Arc<dyn Supervisor>`) need object-safe, dynamically
//! dispatchable async methods; native `async fn` in traits is not yet
//! dyn-compatible, so we use `#[async_trait]` (the battle-tested ecosystem
//! standard) to box the returned futures. The cost — one heap alloc per
//! lifecycle call — is irrelevant for start/stop/restart, which happen seconds
//! to minutes apart.
use std::sync::Arc;
use anyhow::Result;
use serde::Serialize;
use tokio::sync::watch;
/// Observable lifecycle state of one instance. Shared vocabulary across every
/// supervisor impl; serialized verbatim into heartbeats and status events
/// (`{"state":"running", ...}`).
#[derive(Debug, Clone, PartialEq, Serialize)]
#[serde(rename_all = "snake_case", tag = "state")]
pub enum InstanceState {
/// Not lifecycle-managed (a process instance with no executable, etc.).
Unmanaged,
Stopped,
Starting,
Running,
Stopping,
/// Exited/died without a stop request.
Crashed {
#[serde(skip_serializing_if = "Option::is_none")]
exit_code: Option<i32>,
},
}
impl InstanceState {
pub fn as_label(&self) -> &'static str {
match self {
InstanceState::Unmanaged => "unmanaged",
InstanceState::Stopped => "stopped",
InstanceState::Starting => "starting",
InstanceState::Running => "running",
InstanceState::Stopping => "stopping",
InstanceState::Crashed { .. } => "crashed",
}
}
}
/// Lifecycle control + state observation for one instance.
///
/// `start`/`stop`/`restart` take `self: Arc<Self>` so an impl can hand a clone
/// to a spawned monitor task; callers hold an `Arc<dyn Supervisor>` and
/// `clone()` before each call. `watch_state` exposes the same channel the
/// status-event publisher drains, so panel push events stay decoupled from the
/// heartbeat cadence.
#[async_trait::async_trait]
pub trait Supervisor: Send + Sync {
/// The instance slug (a NATS subject segment).
fn instance_id(&self) -> &str;
/// Current cached state (cheap; no I/O).
fn state(&self) -> InstanceState;
/// Subscribe to state transitions.
fn watch_state(&self) -> watch::Receiver<InstanceState>;
/// Seconds since the instance entered `Running` (0 otherwise).
async fn uptime_seconds(&self) -> u64;
async fn start(self: Arc<Self>) -> Result<()>;
async fn stop(self: Arc<Self>) -> Result<()>;
async fn restart(self: Arc<Self>) -> Result<()>;
}