docs(claude): Lesson 27 — lint infra config before deploy; compose up -d recreates changed deps
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
@@ -449,3 +449,5 @@ Things I discovered about myself building a sister platform across multiple sess
|
||||
25. **Fixing a dead code path detonates the live code behind it — budget for the second bug.** The moment Lesson 24's fix made the NATS→WS bridge actually deliver events, the API crashed on the first forwarded heartbeat: `WebSocket.OPEN` was `undefined` at runtime because `esModuleInterop` is off, so `import WebSocket from 'ws'` compiled to `ws_1.default` (undefined). That crash had sat behind the dead bridge since the gateway was written — never hit because no event ever reached it. When you resurrect a path that was silently no-op, everything downstream of it is effectively *untested code running for the first time in production*. Verify the whole chain end-to-end (I watched the DB row appear, then flip offline), don't stop at "the subscription fires now." This is Lesson 10 with a fuse on it. Import-runtime gotcha worth remembering: when `esModuleInterop` is off, prefer instance constants (`client.OPEN`) over class statics (`WebSocket.OPEN`) for `ws`.
|
||||
|
||||
26. **A jail check at the entry point does not jail the recursive walk behind it — and my own "line-by-line" review missed it; the automated security review didn't.** The file manager's `jail()` correctly canonicalized and prefix-checked the top-level path, and I traced every escape vector through it and signed off. But `copy_recursive` then walked the directory tree with `fs::metadata` (which *follows* symlinks). A symlink planted inside the jail pointing at `/etc`, then a `copy` of its parent, would dereference it and pull external content *into* the jail to be read — a jail escape the entry check never sees, because the escape is reintroduced by a descendant during traversal. Fix: `symlink_metadata` (lstat) everywhere you recurse, and refuse/never-follow symlinks across the boundary. The transferable rule: **validate at the boundary AND at every step that re-derives a path** (recursion, `read_dir`, glob, archive extraction). And the humbling part — I was confident after reviewing the jail function; the security-review pass caught the HIGH I'd waved through. Trust adversarial verification over your own once-over on security-critical code, especially path/traversal logic.
|
||||
|
||||
27. **Validate infra config BEFORE it reaches a deploy — and know that `docker compose up -d <service>` will recreate other services whose definitions changed.** During the NATS auth cutover I ran `docker compose up -d api` to pick up new env. Because the *nats* service definition had also changed (a new volume mount), compose recreated **corrosion-nats too** — and it failed to start on a config error (`no_auth_user` nested inside `authorization{}` instead of at top level), taking the broker down for ~3 minutes with the backend in offline mode. Two lessons: (a) a broker/proxy/DB config file is code — lint it before it can reach a restart (`nats-server -t -c cfg` to test-parse, `nginx -t`, etc.), don't let the first validation be the production container's startup; (b) `compose up -d <one-service>` is not surgical — it reconciles that service's **dependencies** too, so a stale edit to a depended-on service ships when you didn't mean it to. When touching shared-infra config, restart that service explicitly and watch it come up before moving on. Recovery also surfaced a third gotcha: recreating a client (api) while its server (nats) is down leaves the client stuck on a cached DNS failure (`EAI_AGAIN`) — restart the client once the server is healthy.
|
||||
|
||||
Reference in New Issue
Block a user