Caught during the live cutover: nats-server rejects 'unknown field
no_auth_user' when it is nested in the authorization block, taking the
whole broker down. Both the generator (open stage) and the committed
bootstrap default emitted it nested. Moved to top level. Enforce-stage
output was unaffected (no no_auth_user), which is what the live broker
now runs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two HIGH findings from automated review on the generator, both fixed:
1. Cross-tenant inbox access: per-license users were granted _INBOX.>,
letting license A subscribe to license B's request-reply responses.
Now scoped to corrosion.{license}.> ONLY; replies must ride the
license namespace (corrosion.{license}.reply.<id>) — documented in
PROTOCOL.md. Agent unchanged (responds to msg.reply); constraint is
on the requester (internal user has full >).
2. Default-open auth bypass: generator defaulted to stage=open with a
full-access anonymous user — a stale regen left the broker wide open.
Now defaults to enforce (secure by default); the explicit 'open'
migration stage maps anonymous to a harmless corrosion.unclaimed.>
namespace, never real tenant subjects. Committed bootstrap default
hardened the same way.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Closes the open broker (anonymous publish to any tenant's corrosion.*).
Per-license isolation via NATS user/password + subject permissions:
each license -> user=license_id, password=HMAC-SHA256(license_id,
NATS_TOKEN_SECRET), scoped to corrosion.{license_id}.> + _INBOX. Backend
uses a privileged internal user.
- Agent (alpha.5): nats_user/nats_password config + env, user_and_password
auth; falls back to token/anonymous (transition-safe)
- Backend: connects with NATS_INTERNAL_USER/PASSWORD when set, else anon
- scripts/generate-nats-auth.mjs: regenerates nats-auth.conf from the
licenses table; NATS_AUTH_STAGE=open keeps a no_auth_user fallback
(verify creds first), =enforce rejects anonymous
- committed nats-auth.conf is the SAFE OPEN default (no secrets); the
host copy carries real users and is not committed
- compose: NATS_INTERNAL_USER/PASSWORD/NATS_TOKEN_SECRET, mount nats-auth.conf
Entirely non-breaking until secrets+config deployed; staged cutover next.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
corrosion-nginx reported (unhealthy) despite serving the panel fine:
nginx listens 0.0.0.0:80 (IPv4 only, no listen [::]:80), but
'localhost' resolves to ::1 first inside the container, so the probe
got connection-refused. Verified: 127.0.0.1:80 serves the SPA. Probe
now targets IPv4 explicitly. No nginx config change — the panel was
never broken, only the healthcheck's hostname resolution.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root cause of 'data lost on every rebuild': nothing created the Postgres schema. TypeORM is synchronize:false, the API container runs no migration step, and there was no init mount — so a fresh pg_data volume came up with ZERO tables (empty/broken DB; the schema had only ever been loaded manually). Mount backend/migrations/*.sql into /docker-entrypoint-initdb.d so Postgres auto-applies the full schema (001..021, plain SQL) ON FIRST INIT ONLY. Existing volumes are untouched (initdb scripts run only on an empty data dir); a fresh volume now self-heals the schema. NOTE: actual row DATA still persists only while the pg_data named volume persists — 'docker compose down' keeps it across 'build --no-cache'; 'down -v' / volume prune is the only thing that wipes it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Frontend uses native WebSocket API, backend was using socket.io which
speaks an incompatible protocol. Switched to @nestjs/platform-ws so
both sides speak native WebSocket. Also fixed JWT TTL override in
docker-compose.yml (was hardcoded to 900s, now 14400s/4h).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pin NATS image to nats:2.10-alpine for reproducible builds
- Add nginx healthcheck using wget (curl not present in alpine)
- Upgrade nginx depends_on to use condition: service_started
- Add proxy buffer directives to http block (prevents JWT/large-header truncation)
- Add X-Content-Type-Options, X-Frame-Options, X-XSS-Protection, and
Referrer-Policy security headers to all SPA location blocks across
all five server blocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove complex build caching
- Set SQLX_OFFLINE=true to skip compile-time query verification
- Queries still validated at runtime
- Eliminates need for DATABASE_URL during Docker build
corrosionmgmt.com now serves LandingView as the default page with marketing
routes at root level. panel.corrosionmgmt.com continues serving the admin
panel unchanged. /site/* backward compat via redirects on marketing domain.
- nginx: Add bare domain server block (only proxies /api/early-access/)
- router: Detect hostname at module load, generate domain-specific routes
- MarketingLayout: Named routes for nav, external <a> tags for auth links
- LandingView: CTAs point to panel domain via VITE_PANEL_URL
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ADMIN_EMAIL and ADMIN_PASSWORD were in the .env file but not
forwarded to the API container — bootstrap_admin() couldn't
read them, so no initial user was created. Login returned 400
on every attempt because no user existed in the database.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nginx container now builds the Vue frontend in a Node stage
instead of mounting local dist/ files. This means:
- No need to commit dist/ or build locally before deploying
- docker compose up --build handles everything end-to-end
- Removed obsolete compose version key
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NATS minimal image has no shell tools for health probes. The API
already handles NATS unavailability gracefully, so service_started
is sufficient.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old healthcheck used nats-server --signal ldm which puts NATS into
lame duck (shutdown) mode. Use the /healthz HTTP endpoint instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Config file already sets jetstream and store_dir. Duplicate CLI flags
cause NATS to exit with error.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependencies require Rust 1.88. Alpine images lag behind. Switched
to rust:latest (Debian) for build and debian:bookworm-slim for runtime.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cargo.lock may not exist before first build. Use wildcard copy
so Docker doesn't fail if lockfile is missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>