23 Commits

Author SHA1 Message Date
Vantz Stockwell
856106174a fix(nats): no_auth_user is top-level, not inside authorization{} — broke broker startup
All checks were successful
CI / backend-types (push) Successful in 9s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 22s
Caught during the live cutover: nats-server rejects 'unknown field
no_auth_user' when it is nested in the authorization block, taking the
whole broker down. Both the generator (open stage) and the committed
bootstrap default emitted it nested. Moved to top level. Enforce-stage
output was unaffected (no no_auth_user), which is what the live broker
now runs.
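
The placement fix can be sketched as follows. This is a minimal illustration, not the actual scripts/generate-nats-auth.mjs: the `Stage` and `NatsUser` types, the `renderAuthConf` name, and the `anonymous`/`unused` placeholder credentials are assumptions. Only the top-level position of no_auth_user and the corrosion.unclaimed.> namespace come from the commit messages.

```typescript
// Hypothetical types and names, for illustration only.
type Stage = "open" | "enforce";

interface NatsUser {
  user: string;
  password: string;
  allow: string[]; // subjects granted for both publish and subscribe
}

function renderAuthConf(users: NatsUser[], stage: Stage): string {
  const all = [...users];
  const lines: string[] = [];
  if (stage === "open") {
    // Placeholder anonymous user, confined to a namespace no tenant uses.
    all.push({ user: "anonymous", password: "unused", allow: ["corrosion.unclaimed.>"] });
    // Top level: nats-server rejects this key inside authorization {}.
    lines.push("no_auth_user: anonymous");
  }
  lines.push("authorization {", "  users = [");
  for (const u of all) {
    const subjects = u.allow.map((s) => JSON.stringify(s)).join(", ");
    lines.push(
      `    { user: ${JSON.stringify(u.user)}, password: ${JSON.stringify(u.password)}, ` +
        `permissions: { publish: [${subjects}], subscribe: [${subjects}] } }`,
    );
  }
  lines.push("  ]", "}");
  return lines.join("\n");
}
```

With stage "enforce" the sketch emits no no_auth_user line at all, which matches the note above that enforce-stage output was unaffected.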

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:47:14 -04:00
Vantz Stockwell
463908b18e fix(nats): security review — secure-by-default + per-tenant inbox isolation
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 43s
CI / integration (push) Successful in 23s
Two HIGH findings from automated review on the generator, both fixed:
1. Cross-tenant inbox access: per-license users were granted _INBOX.>,
   letting license A subscribe to license B's request-reply responses.
   Now scoped to corrosion.{license}.> ONLY; replies must ride the
   license namespace (corrosion.{license}.reply.<id>) — documented in
   PROTOCOL.md. Agent unchanged (responds to msg.reply); constraint is
   on the requester (internal user has full >).
2. Default-open auth bypass: generator defaulted to stage=open with a
   full-access anonymous user — a stale regen left the broker wide open.
   Now defaults to enforce (secure by default); the explicit 'open'
   migration stage maps anonymous to a harmless corrosion.unclaimed.>
   namespace, never real tenant subjects. Committed bootstrap default
   hardened the same way.
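
The effect of finding 1 can be checked with a small subject matcher. This is a sketch under stated assumptions: `subjectMatches` and `replySubject` are hypothetical helpers, not code from the repo. The matcher follows standard NATS wildcard semantics (`*` matches exactly one token, a trailing `>` matches one or more tokens), and the reply subject shape is the one named above.

```typescript
// Hypothetical helper: NATS-style subject matching.
function subjectMatches(pattern: string, subject: string): boolean {
  const p = pattern.split(".");
  const s = subject.split(".");
  for (let i = 0; i < p.length; i++) {
    // '>' matches one or more remaining tokens.
    if (p[i] === ">") return s.length > i;
    if (i >= s.length) return false;
    // '*' matches exactly one token; anything else must match literally.
    if (p[i] !== "*" && p[i] !== s[i]) return false;
  }
  return p.length === s.length;
}

// Reply subjects ride the license namespace instead of _INBOX.
function replySubject(license: string, id: string): string {
  return `corrosion.${license}.reply.${id}`;
}
```

Under the old grant, every license user matched `_INBOX.>`, so any tenant could subscribe to any reply. With the scoped grant, `corrosion.licA.>` matches `replySubject("licA", ...)` but not `replySubject("licB", ...)`.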

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:39:31 -04:00
Vantz Stockwell
00cff51ce5 feat(nats): per-license auth mechanism — agent user/password, scoped broker, generator (non-breaking)
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 17s
CI / agent-tests (push) Successful in 1m23s
Build Host Agent (Rust) / build (push) Successful in 1m38s
CI / integration (push) Successful in 23s
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.
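
The password derivation above could look roughly like this. It is a sketch only: the `natsPasswordFor` name is hypothetical, and both the argument roles (NATS_TOKEN_SECRET as the HMAC key, license_id as the message) and the hex output encoding are assumptions that the commit message does not spell out.

```typescript
import { createHmac } from "node:crypto";

// Assumption: the secret is the HMAC key and the license id is the message.
// Assumption: the digest is hex-encoded (64 characters for SHA-256).
function natsPasswordFor(licenseId: string, tokenSecret: string): string {
  return createHmac("sha256", tokenSecret).update(licenseId).digest("hex");
}
```

Because the password is derived rather than stored, the agent and the generator can each compute it independently from the license id and the shared secret, and rotating NATS_TOKEN_SECRET changes every license password at once.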

- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
  auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
  licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
  (verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
  host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf

Entirely non-breaking until secrets+config deployed; staged cutover next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 12:33:27 -04:00
Vantz Stockwell
9e5e828c8d fix(docker): nginx healthcheck uses 127.0.0.1 not localhost — IPv4-only listener
All checks were successful
CI / backend-types (push) Successful in 10s
CI / frontend-build (push) Successful in 16s
CI / agent-tests (push) Successful in 44s
CI / integration (push) Successful in 21s
corrosion-nginx reported (unhealthy) despite serving the panel fine:
nginx listens 0.0.0.0:80 (IPv4 only, no listen [::]:80), but
'localhost' resolves to ::1 first inside the container, so the probe
got connection-refused. Verified: 127.0.0.1:80 serves the SPA. Probe
now targets IPv4 explicitly. No nginx config change — the panel was
never broken, only the healthcheck's hostname resolution.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 11:43:01 -04:00
Vantz Stockwell
8b84bba165 fix(docker): auto-build schema on a fresh DB via docker-entrypoint-initdb.d
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
Root cause of 'data lost on every rebuild': nothing created the Postgres schema. TypeORM is synchronize:false, the API container runs no migration step, and there was no init mount, so a fresh pg_data volume came up with ZERO tables (empty/broken DB; the schema had only ever been loaded manually).

Mount backend/migrations/*.sql into /docker-entrypoint-initdb.d so Postgres auto-applies the full schema (001..021, plain SQL) ON FIRST INIT ONLY. Existing volumes are untouched (initdb scripts run only on an empty data dir); a fresh volume now self-heals the schema.

NOTE: actual row DATA still persists only while the pg_data named volume persists. 'docker compose down' keeps it across 'build --no-cache'; 'down -v' / volume prune is the only thing that wipes it.
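
The official Postgres image applies init scripts in sorted name order, so the zero-padded 001..021 prefixes matter. The sketch below approximates that ordering with a plain string sort; `initdbOrder` and the file names in the usage note are hypothetical. For ASCII, digit-prefixed names a plain sort should agree with the image's ordering, though that is an assumption rather than something the commit states.

```typescript
// Hypothetical helper: the order in which mounted .sql files would be applied,
// approximated by a plain (code-unit) string sort.
function initdbOrder(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".sql")).sort();
}
```

For example, `initdbOrder(["010_wipes.sql", "002_users.sql", "001_init.sql"])` yields the files in 001, 002, 010 order, while unpadded names such as `2_b.sql` and `10_a.sql` would sort with `10_a.sql` first, applying migrations out of sequence.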

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 08:34:18 -04:00
Vantz Stockwell
14b099b075 fix: Replace socket.io with native WS adapter — fixes WebSocket 1006
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
Frontend uses native WebSocket API, backend was using socket.io which
speaks an incompatible protocol. Switched to @nestjs/platform-ws so
both sides speak native WebSocket. Also fixed JWT TTL override in
docker-compose.yml (was hardcoded to 900s, now 14400s/4h).
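
One way to see the incompatibility is to compare what each side puts on the wire. This is an illustrative sketch, not code from the repo: the function names are hypothetical, the Socket.IO framing shown is the default-namespace event packet of Socket.IO v4 over Engine.IO (`4` = message, `2` = event), and the `{ event, data }` envelope is the message shape the NestJS ws adapter expects.

```typescript
// Socket.IO v4 event frame over Engine.IO: "4" (message) + "2" (event) + JSON array.
function encodeSocketIoEvent(event: string, data: unknown): string {
  return `42${JSON.stringify([event, data])}`;
}

// Native WebSocket message in the { event, data } envelope used by the ws adapter.
function encodeNativeWsEvent(event: string, data: unknown): string {
  return JSON.stringify({ event, data });
}
```

A native client sending the second form to a socket.io server (or the reverse) has no shared framing or handshake, which is consistent with the connection dropping with code 1006. As a separate check on the TTL change: 14400 s / 3600 = 4 h, versus 900 s = 15 min before.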

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 15:21:36 -05:00
Vantz Stockwell
1579a47cad chore: Harden Docker and Nginx configuration
All checks were successful
Test Asgard Runner / test (push) Successful in 4s
- Pin NATS image to nats:2.10-alpine for reproducible builds
- Add nginx healthcheck using wget (curl not present in alpine)
- Upgrade nginx depends_on to use condition: service_started
- Add proxy buffer directives to http block (prevents JWT/large-header truncation)
- Add X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, and
  Referrer-Policy security headers to all SPA location blocks across
  all five server blocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 13:35:25 -05:00
Vantz Stockwell
bd570ee199 chore: Remap Postgres external port to 8101 for MCP access
All checks were successful
Test Asgard Runner / test (push) Successful in 2s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 23:36:09 -05:00
Vantz Stockwell
d20493d533 feat: Complete NestJS backend scaffold — 22 modules, 39 entities, WebSocket gateway
All checks were successful
Test Asgard Runner / test (push) Successful in 3s
Full backend rewrite from Rust/Axum to NestJS/TypeScript.
- 22 feature modules (auth, servers, wipes, maps, plugins, players, console,
  chat, team, notifications, settings, schedules, analytics, alerts, status,
  store, webstore, admin, setup, migration, users, licenses)
- 39 TypeORM entities matching PostgreSQL schema (12 migrations)
- Common infrastructure: JWT/RBAC guards, decorators, exception filter
- NATS service with pub/sub/request-reply
- Socket.IO WebSocket gateway with NATS bridge
- Docker: NestJS Dockerfile + updated docker-compose.yml
- Zero compile errors (npx tsc --noEmit clean)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 21:29:25 -05:00
Vantz Stockwell
d7dddca106 fix: Simplify Dockerfile to use sqlx offline mode
All checks were successful
Test Asgard Runner / test (push) Successful in 2s
- Remove complex build caching
- Set SQLX_OFFLINE=true to skip compile-time query verification
- Queries still validated at runtime
- Eliminates need for DATABASE_URL during Docker build
2026-02-15 17:28:56 -05:00
Vantz Stockwell
77155d30be feat: Domain-based routing — marketing site at bare domain, panel at subdomain
corrosionmgmt.com now serves LandingView as the default page with marketing
routes at root level. panel.corrosionmgmt.com continues serving the admin
panel unchanged. /site/* backward compat via redirects on marketing domain.

- nginx: Add bare domain server block (only proxies /api/early-access/)
- router: Detect hostname at module load, generate domain-specific routes
- MarketingLayout: Named routes for nav, external <a> tags for auth links
- LandingView: CTAs point to panel domain via VITE_PANEL_URL
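
The hostname-based route generation can be sketched like this. It is a minimal illustration under stated assumptions: `RouteDef`, `routesForHost`, and all route names and paths other than `/` and `/site/*` are hypothetical, and the real router presumably builds vue-router records rather than plain objects.

```typescript
// Hypothetical, simplified route record.
interface RouteDef {
  path: string;
  name?: string;
  redirect?: string;
}

const MARKETING_HOST = "corrosionmgmt.com";

function routesForHost(hostname: string): RouteDef[] {
  if (hostname === MARKETING_HOST) {
    return [
      { path: "/", name: "landing" }, // LandingView is the default page
      { path: "/pricing", name: "pricing" }, // marketing routes at root level
      { path: "/site/pricing", redirect: "/pricing" }, // /site/* backward compat
    ];
  }
  // Any other host (e.g. panel.corrosionmgmt.com) keeps the admin panel routes.
  return [
    { path: "/", name: "dashboard" },
    { path: "/login", name: "login" },
  ];
}

// In the browser this would run once at module load, e.g.:
// const routes = routesForHost(window.location.hostname);
```

Deciding at module load keeps a single build artifact for both domains; the trade-off is that the route table is fixed for the lifetime of the page.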

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 10:21:11 -05:00
Vantz Stockwell
0360fcf2e2 fix: Pass admin bootstrap env vars to API container
ADMIN_EMAIL and ADMIN_PASSWORD were in the .env file but not
forwarded to the API container — bootstrap_admin() couldn't
read them, so no initial user was created. Login returned 400
on every attempt because no user existed in the database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:45:20 -05:00
Vantz Stockwell
c4fd4df513 infra: Add multi-stage frontend build to Docker
Nginx container now builds the Vue frontend in a Node stage
instead of mounting local dist/ files. This means:
- No need to commit dist/ or build locally before deploying
- docker compose up --build handles everything end-to-end
- Removed obsolete compose version key

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:39:30 -05:00
Vantz Stockwell
b8ef374a31 fix: Remove NATS healthcheck — use service_started instead
NATS minimal image has no shell tools for health probes. The API
already handles NATS unavailability gracefully, so service_started
is sufficient.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:32:25 -05:00
Vantz Stockwell
d4222a650c fix: Use TCP port check for NATS healthcheck
NATS image has no wget/curl. Use bash TCP check on port 4222 instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:26:41 -05:00
Vantz Stockwell
4df4a0a2cd fix: Replace NATS healthcheck — ldm signal was triggering shutdown
The old healthcheck used nats-server --signal ldm which puts NATS into
lame duck (shutdown) mode. Use the /healthz HTTP endpoint instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:20:33 -05:00
Vantz Stockwell
68f399659b fix: Remove duplicate NATS store_dir and jetstream flags from compose
Config file already sets jetstream and store_dir. Duplicate CLI flags
cause NATS to exit with error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:13:49 -05:00
Vantz Stockwell
271c9f43fa fix: Remap ports to avoid conflicts — 8087/8088/8089
Frontend nginx: 8087, API: 8088, NATS: 8089. Removed NATS
monitoring and WebSocket host ports (not needed externally).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 01:10:01 -05:00
Vantz Stockwell
ef5ee0f844 fix: Switch Dockerfile to Debian-based Rust image for 1.88+ support
Dependencies require Rust 1.88. Alpine images lag behind. Switched
to rust:latest (Debian) for build and debian:bookworm-slim for runtime.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 00:51:27 -05:00
Vantz Stockwell
c20ed2d384 fix: Change nginx port to 8087 — port 80 taken by NPM
NPM handles SSL termination and proxies to 8087.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 00:50:30 -05:00
Vantz Stockwell
fd509eea96 fix: Bump Rust image to 1.85 for edition2024 support
Dependency moxcms requires edition2024 feature, stabilized in Rust 1.85.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 00:48:54 -05:00
Vantz Stockwell
2c3688c914 fix: Make Cargo.lock optional in Docker build
Cargo.lock may not exist before first build. Use wildcard copy
so Docker doesn't fail if lockfile is missing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 00:45:04 -05:00
Vantz Stockwell
175d6f0a7b scaffold: Docker infrastructure — Compose, Nginx, NATS, Dockerfile
4-service stack (PostgreSQL 16, NATS JetStream, Rust API, Nginx),
multi-stage Rust build with dependency caching, wildcard subdomain
routing for public sites, WebSocket support, rate limiting zones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 21:42:15 -05:00