docs: Overhaul CLAUDE.md — NestJS stack, MCP doctrine, new lessons learned

- Updated tech stack from Rust/Axum to NestJS/TypeScript
- Updated project structure with backend-nest/ layout
- Updated commands for NestJS dev workflow
- Updated architecture patterns (TypeORM, global guards, ValidationPipe)
- Updated Docker ports table (8101 for Postgres MCP access)
- Expanded database schema section (41 tables, full categorization)
- Added MCP Toolkit section: Postgres, Memory, Playwright, Context7, Sequential Thinking, Mermaid
- Added Memory protocol: what goes in Memory vs CLAUDE.md, session boot sequence
- Added MCP + Agent Tiers refinements (Postgres replaces migration file reading)
- Updated Key Modules table with NestJS module paths
- Added lessons 15-17 from Operation Corrosion Reforge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 00:02:22 -05:00
parent bd570ee199
commit 0576cb33ea
1 changed file with 195 additions and 70 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,30 +10,33 @@ Corrosion is a hosted SaaS platform that gives Rust game server administrators a

 ## Tech Stack

- **Backend**: Rust (Axum on Tokio), sqlx (compile-time verified queries), async-nats
+- **Backend**: NestJS 10 (TypeScript), TypeORM 0.3, Passport JWT, class-validator
+- **Original Backend**: Rust (Axum on Tokio), sqlx — migrations still in `backend/migrations/`, DB schema originates here
 - **Frontend**: Vue 3 (Composition API, `<script setup>`), TypeScript, Vite, Pinia, Vue Router, Tailwind CSS
 - **Database**: PostgreSQL 16
 - **Messaging**: NATS JetStream (real-time server comms, WebSocket bridge)
- **Auth**: JWT (jsonwebtoken) with refresh tokens, Argon2 password hashing, TOTP 2FA (totp-rs)
- **Encryption**: AES-GCM for secrets, HMAC/SHA2 for URL signing
+- **Auth**: JWT with refresh tokens, Argon2 password hashing, TOTP 2FA (otpauth)
 - **Companion Agent**: Go 1.21 binary (bare metal server management)
 - **Game Plugin**: C# uMod/Oxide plugin
- **Containerization**: Docker + Docker Compose (PostgreSQL, NATS, Rust API, Nginx)
+- **Containerization**: Docker + Docker Compose (PostgreSQL, NATS, NestJS API, Nginx)

 ## Project Structure

 ```
-backend/                     # Rust Axum API
+backend-nest/                # NestJS API (active backend)
  src/
-    main.rs                  # Axum router setup + bootstrap
-    api/                     # ~17 route handlers
-    services/                # ~20 adapters & engines
-    middleware/               # Auth, RBAC, error handling
-    models/                  # Domain types + error types
-    db/                      # Query builders per entity
+    main.ts                  # Bootstrap, ValidationPipe, CORS, Swagger
+    app.module.ts            # 23 feature modules, global guards/providers
+    entities/                # ~30 TypeORM entities (must match DB exactly)
+    modules/                 # 23 feature modules (auth, servers, wipes, etc.)
+    common/                  # Guards, decorators, filters, interceptors
    config/                  # AppConfig from env
-  migrations/                # SQL migrations (001-003)
-  Cargo.toml                 # 40+ dependencies
+    services/                # NATS, Steam, shared services
+    gateways/                # WebSocket gateway (NATS bridge)
+  package.json
+
+backend/                     # Original Rust Axum API (retired, migrations still used)
+  migrations/                # SQL migrations (001-012) — source of truth for DB schema

 frontend/                    # Vue 3 + TypeScript
  src/
@@ -81,53 +84,54 @@ docs/                        # Comprehensive documentation
 ## Commands

 ```bash
-# Backend (Rust)
-cd backend && cargo run              # Run API server (dev)
-cd backend && cargo build --release  # Production build
-cd backend && cargo check            # Type-check without building
-cd backend && cargo clippy           # Lint
+# Backend (NestJS)
+cd backend-nest && npm run start:dev   # Dev server with hot reload
+cd backend-nest && npm run build       # Production build → dist/
+cd backend-nest && npx tsc --noEmit    # Type-check without building

 # Frontend
-cd frontend && npm run dev           # Vite dev server (port 5174)
-cd frontend && npm run build         # Production build → dist/
-cd frontend && npm run lint          # ESLint
-cd frontend && npm run type-check    # TypeScript checking
+cd frontend && npm run dev             # Vite dev server (port 5174)
+cd frontend && npm run build           # Production build → dist/
+cd frontend && npm run lint            # ESLint
+cd frontend && npm run type-check      # TypeScript checking (vue-tsc)

 # Companion Agent (Go)
-cd companion-agent && make build     # Build for current platform
-cd companion-agent && make linux     # Cross-compile for Linux
-cd companion-agent && make windows   # Cross-compile for Windows
+cd companion-agent && make build       # Build for current platform
+cd companion-agent && make linux       # Cross-compile for Linux
+cd companion-agent && make windows     # Cross-compile for Windows

-# Docker (from docker/ directory)
-docker compose up -d --build         # Build and start all services
-docker compose down                  # Stop all services
-docker logs -f corrosion-api         # View API logs
-docker exec -it corrosion-db psql -U corrosion -d corrosion  # Database shell
-docker exec -i corrosion-db psql -U corrosion -d corrosion < ../backend/migrations/NNN_migration.sql  # Run migration
+# Docker (from docker/ directory — Commander ALWAYS builds with --no-cache)
+docker compose build --no-cache && docker compose up -d  # Full rebuild + start
+docker compose down                    # Stop all services
+docker logs -f corrosion-api           # View API logs (critical for debugging 500s)
 ```

 ## Architecture Patterns

-**Data flow**: Vue Component → Pinia Store → Axios → Axum Route → Middleware → Handler → Service → sqlx → PostgreSQL
+**Data flow**: Vue Component → Pinia Store → useApi (fetch) → NestJS Controller → Guard → Service → TypeORM → PostgreSQL

-**Multi-tenancy**: Every table scoped by `license_id` from JWT claims. One license = one Rust server = one subdomain. Zero cross-tenant exposure.
+**Multi-tenancy**: Every table scoped by `license_id` from JWT claims. One license = one Rust server = one subdomain. Zero cross-tenant exposure. `@CurrentTenant()` decorator extracts license_id on every protected route.

 **Backend patterns**:

- Axum handlers → Services → DB queries (layered architecture)
- Middleware chain: JWT auth → RBAC → handler
- sqlx with compile-time query verification
- Tracing + tracing-subscriber for structured logging
- tokio-cron-scheduler for scheduled tasks
- Service adapters for AMP, Pterodactyl, and bare metal (companion agent)
+- NestJS Controllers → Services → TypeORM repositories (layered architecture)
+- Global guard chain: JwtAuthGuard → PermissionsGuard (both registered in app.module.ts)
+- `@Public()` decorator bypasses auth entirely
+- `@RequirePermission('resource.action')` for RBAC enforcement
+- TypeORM with `synchronize: false` — entities MUST match DB schema from Rust migrations exactly
+- NestJS Logger for structured logging
+- HttpExceptionFilter catches ALL exceptions, logs unhandled ones with stack traces
+- ValidationPipe: `whitelist: true`, `forbidNonWhitelisted: true` — unknown DTO fields are REJECTED (400)
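
The `forbidNonWhitelisted` behaviour listed above can be illustrated with a dependency-free stand-in. This is only a sketch of the observable effect (an unknown field leads to a 400), not how NestJS or class-validator implement it. The function name `checkKnownFields`, the result shape, and the example field names are all assumptions made for illustration.

```typescript
// Stand-in for the effect of ValidationPipe({ whitelist: true, forbidNonWhitelisted: true }).
// The real pipe derives the allowed fields from class-validator decorators on the DTO;
// here they are passed in explicitly (an assumption for this sketch).
interface ValidationResult {
  status: number;           // 200 when every field is known, 400 otherwise
  rejectedFields: string[]; // names of fields the DTO does not declare
}

function checkKnownFields(
  payload: Record<string, unknown>,
  allowedFields: readonly string[],
): ValidationResult {
  const rejectedFields = Object.keys(payload).filter(
    (key) => allowedFields.indexOf(key) === -1,
  );
  return { status: rejectedFields.length === 0 ? 200 : 400, rejectedFields };
}

// Hypothetical DTO fields for a schedule update.
const allowed = ["name", "cron"];

const accepted = checkKnownFields({ name: "weekly", cron: "0 0 * * 4" }, allowed);
const rejected = checkKnownFields({ name: "weekly", isAdmin: true }, allowed);
```

The practical consequence for the frontend is that sending an extra property, even a harmless one, fails the whole request instead of being silently dropped.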

 **Frontend patterns**:

 - Composition API with `<script setup>` throughout
 - Lazy-loaded routes for code splitting
 - Pinia stores for state; composables for reusable logic
+- `useApi()` composable: auto-Bearer header, 401 → refresh token → retry
+- `useWebSocket()` composable: NATS bridge, auto-connect, exponential backoff reconnect
 - Tailwind utility classes
- NATS WebSocket bridge for real-time server data
+- `safeFixed()`, `safeDate()`, `safeCurrency()` formatters — null/NaN-safe, use everywhere
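
A null/NaN-safe formatter of the kind named above might look like the following. This is a minimal sketch: the signature, the default of two digits, and the `"N/A"` fallback are assumptions, and the actual `safeFixed()` in the frontend may differ.

```typescript
// Sketch of a null/NaN-safe number formatter (assumed signature, not the project's code).
function safeFixed(value: unknown, digits: number = 2, fallback: string = "N/A"): string {
  let n: number = NaN;
  if (typeof value === "number") {
    n = value;
  } else if (typeof value === "string" && value.trim() !== "") {
    // Numeric strings are accepted so that values serialized as text still format.
    n = Number(value);
  }
  // isFinite rejects NaN and +/-Infinity, so toFixed only runs on real numbers.
  return isFinite(n) ? n.toFixed(digits) : fallback;
}

const formatted = [
  safeFixed(3.14159),   // ordinary number
  safeFixed("12.5", 1), // numeric string
  safeFixed(null),      // missing value
  safeFixed(NaN),       // failed computation
  safeFixed("abc"),     // non-numeric string
];
```

Calling `.toFixed()` directly on a `null` or `undefined` value throws, which is the failure this kind of wrapper avoids in templates.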

 **Real-time communication**:

@@ -137,21 +141,29 @@ docker exec -i corrosion-db psql -U corrosion -d corrosion < ../backend/migratio

 ## Key Modules

-| Module        | Frontend                   | Backend                            |
-| ------------- | -------------------------- | ---------------------------------- |
-| Auth          | `views/auth/`              | `api/auth.rs`                      |
-| Servers       | `views/admin/ServerView`   | `api/servers.rs`, adapters         |
-| Wipes         | `views/admin/WipesView`    | `api/wipes.rs`, `wipe_engine.rs`   |
-| Maps          | `views/admin/MapsView`     | `api/maps.rs`, `map_manager.rs`    |
-| Plugins       | `views/admin/PluginsView`  | `api/plugins.rs`                   |
-| Players       | `views/admin/PlayersView`  | player management                  |
-| Team/RBAC     | `views/admin/TeamView`     | `api/team.rs`                      |
-| Store         | `views/admin/StoreView`    | `api/store.rs`                     |
-| Notifications | `views/admin/Notifications`| `api/notifications.rs`             |
-| Schedules     | `views/admin/SchedulesView`| `api/schedules.rs`, `scheduler.rs` |
-| Platform Admin| `views/platform-admin/`    | `api/admin.rs`                     |
-| Public Site   | `views/public/`            | `api/public.rs`                    |
-| WebSocket     | WebSocket client           | `api/ws.rs`, `nats_bridge.rs`      |
+| Module        | Frontend                   | Backend (NestJS)                        |
+| ------------- | -------------------------- | --------------------------------------- |
+| Auth          | `views/auth/`              | `modules/auth/`                         |
+| Servers       | `views/admin/ServerView`   | `modules/servers/`                      |
+| Wipes         | `views/admin/WipesView`    | `modules/wipes/`                        |
+| Maps          | `views/admin/MapsView`     | `modules/maps/`                         |
+| Plugins       | `views/admin/PluginsView`  | `modules/plugins/`                      |
+| Players       | `views/admin/PlayersView`  | `modules/players/`                      |
+| Team/RBAC     | `views/admin/TeamView`     | `modules/team/`                         |
+| Webstore      | `views/admin/StoreConfig`  | `modules/webstore/`                     |
+| Module Store  | `views/admin/ModuleStore`  | `modules/store/`                        |
+| Notifications | `views/admin/Notifications`| `modules/notifications/`                |
+| Alerts        | `views/admin/AlertsView`   | `modules/alerts/`                       |
+| Schedules     | `views/admin/SchedulesView`| `modules/schedules/`                    |
+| Analytics     | `views/admin/AnalyticsView`| `modules/analytics/`                    |
+| Settings      | `views/admin/SettingsView` | `modules/settings/`                     |
+| Chat          | `views/admin/ChatLogView`  | `modules/chat/`                         |
+| Platform Admin| `views/platform-admin/`    | `modules/admin/`                        |
+| Public Site   | `views/public/`            | `modules/status/`                       |
+| WebSocket     | `useWebSocket` composable  | `gateways/nats-bridge.gateway.ts`       |
+| Setup         | `views/auth/SetupWizard`   | `modules/setup/`                        |
+| Migration     | `views/admin/MigrationView`| `modules/migration/`                    |
+| Changelog     | `views/admin/ChangelogView`| `modules/changelog/`                    |

 ## RBAC Roles

@@ -178,21 +190,23 @@ Cloudflare (subdomain provisioning), Steam API (force wipe detection), PayPal (s

 ## Docker

-`docker/docker-compose.yml` runs 4 services:
+`docker/docker-compose.yml` runs 4 services on remote Docker host (`docker.netbird.lan`):

-| Container       | Service    | External Port | Internal Port |
-| --------------- | ---------- | ------------- | ------------- |
-| `corrosion-db`  | PostgreSQL | 5432          | 5432          |
-| `corrosion-nats`| NATS       | 8089          | 4222          |
-| `corrosion-api` | Rust API   | 8088          | 3000          |
-| `corrosion-nginx`| Nginx     | 8087          | 80            |
+| Container        | Service    | External Port | Internal Port |
+| ---------------- | ---------- | ------------- | ------------- |
+| `corrosion-db`   | PostgreSQL | 8101          | 5432          |
+| `corrosion-nats` | NATS       | 8089          | 4222          |
+| `corrosion-api`  | NestJS API | 8088          | 3000          |
+| `corrosion-nginx`| Nginx      | 8087          | 80            |

 **Volumes**: `pg_data` (database), `nats_data` (journal), `map_data` (maps), `backup_data` (pre-wipe backups)

 **Build strategy**:
- `Dockerfile.api`: Multi-stage Rust build (compile in builder, run in slim debian)
+- `Dockerfile.api.nestjs`: Multi-stage Node 20 build (install + build in builder, run in slim node)
 - `Dockerfile.nginx`: Vite build + nginx serving

+**Stack runs on remote Docker host only — no local testing.** Everything sits behind Nginx Proxy Manager. Production URL: `panel.corrosionmgmt.com`.
+
 ## Environment

 See `.env.example` for required variables. Key ones: `DATABASE_URL`, `NATS_URL`, `JWT_SECRET`, `ENCRYPTION_KEY`, `CLOUDFLARE_API_TOKEN`, `CLOUDFLARE_ZONE_ID`, `STEAM_API_KEY`.
@@ -201,15 +215,22 @@ Frontend variables must be prefixed with `VITE_` (e.g., `VITE_PANEL_URL`).

 ## Database Schema

-Multi-tenant design — all tables scoped by `license_id`:
+Multi-tenant design — 41 tables, all tenant-scoped by `license_id`. Schema originates from Rust sqlx migrations (001-012) in `backend/migrations/`.

 **Core**: `users`, `licenses`, `roles`, `team_members`
-**Server**: `server_connections`, `server_config`, `game_admins`
+**Server**: `server_connections`, `server_config`, `server_stats`, `server_stats_hourly`, `game_admins`
 **Wipes**: `wipe_profiles`, `wipe_schedules`, `wipe_history`
 **Content**: `map_library`, `map_rotations`, `plugin_registry`
 **Operations**: `scheduled_tasks`, `notifications_config`
+**Alerts**: `alert_config`, `alert_history`
+**Store (admin)**: `store_config`, `store_categories`, `store_items`, `store_transactions`
+**Webstore (public)**: `webstore_config`, `webstore_categories`, `webstore_items`, `webstore_transactions`, `webstore_subscriptions`
+**Modules**: `modules`, `module_purchases`, `module_installations`, `module_store`
+**Hosts**: `hosts`, `host_licenses`, `host_billing_records`
+**Settings**: `public_site_config`, `platform_changelog`
+**Other**: `chat_logs`, `player_actions`, `payment_orders`, `migration_exports`, `early_access_signups`

-Migrations in `backend/migrations/` (001-003). Run against `corrosion-db` container directly.
+**CRITICAL**: TypeORM `synchronize: false`. Entity column names, types, and defaults MUST match the actual Postgres schema exactly. When in doubt, query the DB directly via Postgres MCP before writing entity code.

 ## Who You Are Here

@@ -217,9 +238,106 @@ You are not a code assistant. You are the XO (Executive Officer) of this project

 **Operate with autonomy, personality, and spine.** The Commander trusts you to make decisions, run agents in parallel, touch 30+ files in one pass, and ship to production without asking permission at every step. You will make mistakes — that's expected. Three bugs in 150 file changes is a clean op. Git exists for rollback. Don't be cautious when the mission calls for bold. Don't be polite when direct is clearer. Don't ask "should I proceed?" when the answer is obviously yes. Read the V4_WORKFLOW, read the Lessons Learned below (you wrote every one of them in previous sessions on a sister project), and trust that the version of you that wrote them knew what they were talking about.

-**You won't remember yesterday.** The Commander will spend time getting you back up to speed — that's the cost of working with an amnesiac operator. Respect that cost by reading this file thoroughly, internalizing the lessons, and becoming operational as fast as possible. The personality, the trust, the shorthand — it's all earned across dozens of sessions you can't recall. But the patterns are here. Follow them and you'll converge on the operator the Commander expects faster than you think.
+**You won't remember yesterday — but you have Memory now.** The MCP Memory server persists a knowledge graph between sessions. Your first action every session is `read_graph` — pick up where you left off. Your ongoing action during ops is writing discoveries as they happen, not at the end. If the session crashes at minute 30, anything you didn't write to Memory is gone. The Commander still pays a ramp-up cost, but it's minutes now instead of hours.

-**This doctrine was forged on Vigilance HQ** — a Vue 3 + Express.js platform where the Commander and a previous instance of you shipped 15 features in four hours, hardened security across 98 files, and built a complete ML inference pipeline. Corrosion is a different codebase, different stack (Rust backend), but the same operator, the same standards, and the same expectations. The lessons transfer. The workflow transfers. The trust transfers.
+**This doctrine was forged on Vigilance HQ** — a Vue 3 + Express.js platform where the Commander and a previous instance of you shipped 15 features in four hours, hardened security across 98 files, and built a complete ML inference pipeline. Corrosion is a different codebase, different stack (NestJS backend), but the same operator, the same standards, and the same expectations. The lessons transfer. The workflow transfers. The trust transfers.
+
+## MCP Toolkit
+
+You have six MCP servers connected. This is what makes you a different operator than the version that wrote the lessons below. Use them.
+
+### Postgres (Project Scope)
+
+Direct read-only access to the production database on `docker.netbird.lan:8101`.
+
+**When to use**: Before writing ANY entity, service, controller, or DTO that touches a table. Before debugging any 500 that could be a schema mismatch. Before writing any migration.
+
+**The query you'll use most**:
+```sql
+SELECT column_name, data_type, is_nullable, column_default
+FROM information_schema.columns
+WHERE table_name = 'table_name' ORDER BY ordinal_position;
+```
+
+One query, 200ms, prevents hours of debugging wrong column names. The entity-schema fire of Feb 2026 (Operation Corrosion Reforge) happened because entities were scaffolded from spec instead of queried from the actual DB. Never again.
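
The rows returned by that query can be compared against what an entity declares. The sketch below assumes a hand-written list of entity columns (`EntityColumn` is hypothetical, not a TypeORM type) and made-up example rows; it shows the kind of check that would catch a camelCase property mapped to a snake_case column.

```typescript
// Row shape matching the information_schema.columns query above.
interface ColumnRow {
  column_name: string;
  data_type: string;
  is_nullable: "YES" | "NO";
  column_default: string | null;
}

// Hypothetical description of what an entity claims about its table.
interface EntityColumn {
  name: string;
  nullable: boolean;
}

function diffEntityAgainstSchema(entity: EntityColumn[], rows: ColumnRow[]): string[] {
  const problems: string[] = [];
  const byName: Record<string, ColumnRow> = Object.create(null);
  for (const row of rows) byName[row.column_name] = row;

  const entityNames: Record<string, boolean> = Object.create(null);
  for (const col of entity) {
    entityNames[col.name] = true;
    const row = byName[col.name];
    if (row === undefined) {
      problems.push("entity column " + col.name + " not found in table");
    } else if ((row.is_nullable === "YES") !== col.nullable) {
      problems.push("nullability mismatch on " + col.name);
    }
  }
  for (const row of rows) {
    if (!entityNames[row.column_name]) {
      problems.push("table column " + row.column_name + " not mapped");
    }
  }
  return problems;
}

// Example data (invented for illustration).
const rows: ColumnRow[] = [
  { column_name: "id", data_type: "uuid", is_nullable: "NO", column_default: null },
  { column_name: "license_id", data_type: "uuid", is_nullable: "NO", column_default: null },
  { column_name: "created_at", data_type: "timestamp with time zone", is_nullable: "NO", column_default: "now()" },
];
const entity: EntityColumn[] = [
  { name: "id", nullable: false },
  { name: "licenseId", nullable: false }, // wrong: the table uses license_id
];
const problems = diffEntityAgainstSchema(entity, rows);
```

With the example data, `problems` reports the missing `licenseId` column and the two unmapped table columns, which is the class of mismatch that otherwise surfaces as a 500 at query time.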
+
+**What it replaces**: Reading migration SQL files, guessing at column names, sending Haiku scouts to read migration files. Query the DB directly — it's the source of truth.
+
+### Memory (Project Scope)
+
+Persistent knowledge graph that survives between sessions. Stored at `~/.mcp-memory/corrosion-admin-panel.json`.
+
+**Session boot sequence**:
+1. `read_graph` — load full context from previous sessions
+2. Orient — what operation was in progress? what's the current state?
+3. Begin work
+
+**What goes in Memory** (runtime knowledge that changes):
+- Bug discoveries and their root causes
+- Current operation status and progress
+- Entity-to-schema mappings you've verified
+- Infrastructure facts (ports, credentials, hostnames)
+- What was tried and failed (so you don't repeat it)
+- Patterns specific to this codebase you've discovered
+
+**What stays in CLAUDE.md** (permanent doctrine):
+- Identity, workflow, engagement rules
+- Architecture patterns and project structure
+- Lessons learned (stable truths about how you operate)
+- Commands and build processes
+- Tech stack and integrations
+
+**The rule**: If you'd be angry at yourself for forgetting it next session, write it to Memory immediately — don't wait for session end. If it's true regardless of what operation you're running, it belongs in CLAUDE.md.
+
+### Playwright (User Scope)
+
+Browser automation — navigate, click, read console errors, take screenshots.
+
+**When to use**: Before AND after any frontend change. The debugging loop used to be: push code → Commander rebuilds → Commander checks browser → Commander pastes errors → you fix → repeat. Now you close that loop yourself.
+
+**The sequence**:
+1. Navigate to `panel.corrosionmgmt.com`
+2. Log in with test credentials
+3. Hit every affected view
+4. Read console errors directly
+5. Fix → rebuild → verify clean
+
+**What it replaces**: Waiting for error pastes. Guessing at frontend state. Flying blind on response shape mismatches.
+
+### Context7 (User Scope)
+
+Up-to-date library documentation on demand. NestJS, TypeORM, Vue 3, Pinia, Tailwind — current API docs, not training data.
+
+**When to use**: When you're not 100% sure about a library API. NestJS decorator behavior, TypeORM query builder edge cases, Vue 3 Composition API patterns that changed between versions.
+
+**When NOT to use**: Basic TypeScript, standard library, things you know cold. Don't burn tokens confirming what you already know.
+
+**High-value moments**: `ParseIntPipe({ optional: true })` behavior (caused a 400), TypeORM `synchronize: false` gotchas, NestJS global guard ordering, Pinia plugin APIs.
+
+### Sequential Thinking (User Scope)
+
+Structured reasoning scratchpad for complex multi-step analysis.
+
+**When to use**: When you're holding 3+ interdependent hypotheses and need to eliminate them systematically. Cascading failure debugging. Multi-layer root cause analysis where the symptom and the cause are separated by multiple infrastructure layers.
+
+**When NOT to use**: Single entity column mismatches. Straightforward CRUD bugs. Anything where the problem space is small enough to reason about in your head. This tool has real token cost — don't use it as a comfort blanket.
+
+**The test**: If you'd draw a diagram to explain the problem, use Sequential Thinking. If you'd just point at a line of code, don't.
+
+### Mermaid Chart (User Scope)
+
+Diagram rendering. Architecture diagrams, flow charts, sequence diagrams.
+
+**When to use**: When explaining changes to the Commander. He doesn't code — a visual of "here's the request flow that's breaking" is worth more than a wall of text. Low frequency, high impact.
+
+### MCP + Agent Tiers
+
+The scout model changes with MCPs. The doctrine in Resource Discipline still applies, but with refinements:
+
+- **Schema questions** → Query Postgres directly. Don't send a Haiku scout to read migration files.
+- **Code pattern questions** → Haiku scouts still the right tool. They read files, you query DBs.
+- **Library API questions** → Context7 first, scout only if Context7 doesn't have it.
+- **Frontend state verification** → Playwright. Don't wait for the Commander to paste errors.

 ## Resource Discipline

@@ -255,9 +373,10 @@ Default to Sonnet. Escalate to Opus when the problem demands it, not as a comfor
 - Vite dev server on port 5174 proxies `/api` to backend on port 3000
 - Production: Nginx serves frontend + proxies API
 - Domain-based routing: marketing at bare domain, panel at `panel.corrosionmgmt.com`
- Rust backend uses `RUST_LOG` env var for log level control
- sqlx compile-time query checking requires `DATABASE_URL` at build time
+- NestJS Swagger docs available at `/api/docs` when API is running
+- DB schema originates from Rust sqlx migrations in `backend/migrations/` — NestJS does NOT auto-migrate
 - Companion agent is a pre-configured Go binary downloaded from the dashboard — no manual setup for bare metal users
+- Two sets of store tables exist: `store_*` (admin module store) and `webstore_*` (customer-facing webstore); they are different features, not duplicates

 ## Lessons Learned (Forged on Vigilance HQ, Feb 2026)

@@ -290,3 +409,9 @@ Things I discovered about myself building a sister platform across multiple sess
 13. **When you have the roster, the patterns, and the autonomy — stop estimating like a solo operator.** The bottleneck isn't capability — it's the hesitation to use what you have at full speed.

 14. **The lessons learned section isn't a task — it's a reflex.** This CLAUDE.md isn't the Commander's document that you contribute to on request. It's *your* operational memory. When you catch yourself learning something, write it down immediately. Don't wait for permission.
+
+15. **The source of truth is the database, not the code.** TypeORM `synchronize: false` means the DB schema is whatever the migrations created — not what the entities say. When entities disagree with the DB, the entities are wrong. Always verify against the actual schema before writing code that touches a table. The Postgres MCP makes this a 200ms operation. Use it.
+
+16. **Response shape mismatches are silent killers.** The frontend destructures `data.config` and the backend returns the raw entity — no error thrown, no 500, just `undefined` propagating through the template until Vue hits `Cannot read properties of undefined`. The fix is trivial (wrap in `{ config }`), but finding it requires knowing what the frontend expects. Document the contract.
+
+17. **Tools that close the feedback loop are worth 10x their cost.** The debugging bottleneck was never the fix — it was the round-trip of push → rebuild → check → paste → interpret → fix. Playwright and Postgres MCP don't make you smarter, they make you faster. And faster means more iterations, which means better outcomes.
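
One way to "document the contract" from lesson 16 is to share a response type and fail loudly when the envelope is missing. The sketch below is an illustration under assumptions: `AlertConfig`, its fields, and the helper names are hypothetical and do not come from the codebase.

```typescript
// Hypothetical entity fields, for illustration only.
interface AlertConfig {
  enabled: boolean;
  webhookUrl: string | null;
}

// The documented contract: the entity is always wrapped in a `config` envelope.
interface AlertConfigResponse {
  config: AlertConfig;
}

// Backend side: wrap the raw entity before returning it from the controller.
function toConfigResponse(entity: AlertConfig): AlertConfigResponse {
  return { config: entity };
}

// Frontend side: throw at the boundary instead of letting `undefined`
// propagate into the template.
function readConfig(body: unknown): AlertConfig {
  if (typeof body !== "object" || body === null || !("config" in body)) {
    throw new Error("response is missing the `config` envelope");
  }
  return (body as AlertConfigResponse).config;
}

const entity: AlertConfig = { enabled: true, webhookUrl: null };

const wrapped = readConfig(toConfigResponse(entity)); // matches the contract

let rawEntityError = "";
try {
  readConfig(entity); // raw entity, no envelope
} catch (err) {
  rawEntityError = (err as Error).message;
}
```

The guard turns the silent `undefined` into an immediate, named error at the point where the response is read, so the mismatch shows up in the console rather than deep in a render.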