feat(api): outbound webhooks — server_down + player_banned events
Some checks failed
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 15s
CI / agent-tests (push) Failing after 30s
CI / integration (push) Has been skipped

Roadmap 'Webhook events': per-license outbound webhooks with HMAC-SHA256
signatures (X-Corrosion-Signature), 5s timeout, fire-and-forget (a webhook
failure never breaks the triggering action), last_delivery_at/last_status
tracked.
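
A minimal sketch of how the signature scheme above could work on both ends. The helper names `signPayload` and `verifySignature` are hypothetical, and the hex encoding with no prefix is an assumption; this commit's WebhooksService is not shown in the diff and may encode the X-Corrosion-Signature header differently.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: hex-encoded HMAC-SHA256 over the raw request body.
// (Hex output is an assumption; the real service may use another encoding.)
function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body, 'utf8').digest('hex');
}

// Hypothetical receiver-side check of the X-Corrosion-Signature header value.
function verifySignature(secret: string, body: string, header: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), 'utf8');
  const received = Buffer.from(header, 'utf8');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Receivers should verify against the raw body bytes as delivered, not a re-serialized JSON object, since key order and whitespace change the digest.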

- migration 024_webhooks; Webhook entity (events as simple-array);
  WebhooksModule (@Global, exports WebhooksService) wired into app.module;
  CRUD controller (license-scoped, webhooks.view/manage).
- Hooked events: players.performAction ban -> 'player_banned';
  host-agent-consumer going-offline + staleness sweep -> 'server_down'.
- 'wipe_completed' event lands next (needs wipe status from the agent reply).
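
The fire-and-forget contract described above (5s timeout, failures never reach the triggering action, last status recorded) could be sketched roughly as follows. Everything here is illustrative: `WebhookTarget`, `Send`, `subscribedTo`, and `dispatchSketch` are hypothetical names, the transport is injected rather than tied to a real HTTP client, and the actual WebhooksService.dispatch may be structured differently.

```typescript
// Hypothetical shape for illustration; the real Webhook entity is not shown in this diff.
interface WebhookTarget {
  url: string;
  events: string[];
  lastStatus: string | null;
  lastDeliveryAt: Date | null;
}

// Injected transport: resolves with an HTTP status code, rejects on failure.
type Send = (url: string, body: string, signal: AbortSignal) => Promise<number>;

// Only targets subscribed to the event receive it.
function subscribedTo(targets: WebhookTarget[], event: string): WebhookTarget[] {
  return targets.filter((t) => t.events.includes(event));
}

// Resolves even when every delivery fails, so callers are never broken by a webhook.
async function dispatchSketch(
  targets: WebhookTarget[],
  event: string,
  payload: Record<string, unknown>,
  send: Send,
  timeoutMs = 5_000,
): Promise<void> {
  const body = JSON.stringify({ event, payload });
  await Promise.all(
    subscribedTo(targets, event).map(async (target) => {
      // The timeout only takes effect if the transport honors the abort signal.
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      try {
        target.lastStatus = String(await send(target.url, body, controller.signal));
      } catch {
        // Swallow: record the failure instead of propagating it.
        target.lastStatus = 'error';
      } finally {
        clearTimeout(timer);
        target.lastDeliveryAt = new Date();
      }
    }),
  );
}
```

Because the sketch never rejects, a caller can invoke it with `void dispatchSketch(...)` and continue without awaiting the deliveries.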

Backend tsc green. Migration applies on a fresh DB (Saturday).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vantz Stockwell
2026-06-12 02:13:13 -04:00
parent 55c9893131
commit 0effaaf86c
10 changed files with 507 additions and 0 deletions


@@ -7,6 +7,7 @@ import { ServerConnection } from '../entities/server-connection.entity';
import { License } from '../entities/license.entity';
import { AgentHost, AgentHostDisk } from '../entities/agent-host.entity';
import { GameInstance } from '../entities/game-instance.entity';
import { WebhooksService } from '../modules/webhooks/webhooks.service';
/**
* Consumes Corrosion wire protocol v2 host-agent subjects
@@ -64,6 +65,7 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
private readonly hostRepository: Repository<AgentHost>,
@InjectRepository(GameInstance)
private readonly instanceRepository: Repository<GameInstance>,
private readonly webhooksService: WebhooksService,
) {}
// Bootstrap, not module-init: subscriptions registered before NatsService
@@ -197,22 +199,52 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
{ license_id: licenseId },
{ connection_status: 'offline', updated_at: now },
);
// Capture hostname(s) before flipping status so the webhook payload is useful.
const hosts = await this.hostRepository.find({ where: { license_id: licenseId } });
await this.hostRepository.update(
{ license_id: licenseId },
{ status: 'offline', updated_at: now },
);
this.logger.log(`host(s) for license ${licenseId} went offline (graceful beacon)`);
// Dispatch server_down for every host on this license. Fire-and-forget.
// Note: the find() above does not filter on status, so hosts that were already offline are included.
for (const host of hosts) {
void this.webhooksService
.dispatch(licenseId, 'server_down', {
host_id: host.id,
hostname: host.hostname ?? null,
reason: 'graceful_shutdown',
})
.catch(() => {
// dispatch() logs internally; swallow here to keep the handler clean.
});
}
}
/**
* Heartbeats stopping must flip the panel to offline — an agent that
* crashes or loses network never sends the goodbye beacon. Sweeps both the
* legacy connection and fleet hosts.
*
* Hosts that transition to offline here also fire the server_down webhook.
* We identify them BEFORE the bulk update so we can carry their identity
* into the webhook payload.
*/
@Interval(60_000)
async sweepStaleConnections(): Promise<void> {
const threshold = new Date(Date.now() - HostAgentConsumerService.OFFLINE_AFTER_MS);
// Identify stale hosts BEFORE bulk-updating so we can dispatch webhooks
// with meaningful host_id / hostname data.
const staleHosts = await this.hostRepository
.createQueryBuilder('host')
.where('host.status = :connected', { connected: 'connected' })
.andWhere('host.last_heartbeat_at IS NOT NULL')
.andWhere('host.last_heartbeat_at < :threshold', { threshold })
.getMany();
const conn = await this.connectionRepository
.createQueryBuilder()
.update(ServerConnection)
@@ -235,6 +267,20 @@ export class HostAgentConsumerService implements OnApplicationBootstrap {
if (affected) {
this.logger.warn(`marked ${affected} stale connection/host record(s) offline`);
}
// Dispatch server_down webhook for each host that just timed out.
// Fire-and-forget — webhook failures must never break the sweep.
for (const host of staleHosts) {
void this.webhooksService
.dispatch(host.license_id, 'server_down', {
host_id: host.id,
hostname: host.hostname ?? null,
reason: 'heartbeat_timeout',
})
.catch(() => {
// dispatch() logs internally; swallow here to keep the sweep clean.
});
}
}
/**