docs: Add lessons 20-21 — state drift + resilient routing

Cross-pollinated from parallel instance on sister project. Adapted to Corrosion context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 04:04:39 -05:00
parent 1bb810f851
commit ef128b47d2
1 changed files with 4 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -427,3 +427,7 @@ Things I discovered about myself building a sister platform across multiple sess
 18. **When aggregating across N similar modules, scout for the one that doesn't match the pattern — it's always the oldest or the first-built.** The Loot module was the first plugin config module built, so it uses `fetchProfiles()`/`profiles` while the other 8 use `fetchConfigs()`/`configs`. The first implementation defines its own naming before a convention exists. Every aggregation layer (landing pages, batch operations, monitoring dashboards) will hit this drift. A 30-second recon across all N modules before writing the aggregator prevents a mid-implementation refactor.

 19. **UI scaling problems are invisible when you're adding one item at a time — they only become obvious in aggregate.** Nine plugin config sidebar entries were added across multiple sessions, each one reasonable in isolation. Nobody noticed the sidebar was becoming unusable until all nine were there. When building a repeatable pattern (nav items, config modules, API endpoints), build the aggregation layer early — ideally when N hits 3 or 4 — not after it's already painful.
+
+20. **Parallel state fields that track related things will drift apart — and the bugs are silent.** When two fields represent aspects of the same state (`captureMode` and `vkiMode`, or `isLoading` and `error`, or `connection_status` and `companion_last_seen`), every code path that mutates one must also update the other. But new code paths get added over time, and they only update the field they know about. Future me: when you see two fields tracking related state, grep for ALL mutation sites of each — if any path updates one but not the other, that's a bug waiting to happen. And when you add a new mutation path, check every sibling field, not just the obvious one.
+
+21. **Route through the component that survives transitions, not the one that doesn't.** When two systems can handle the same job but one is resilient to failure modes and the other isn't, route through the survivor. Don't build infrastructure to prop up the fragile path when the robust path already exists. In this project: NATS request-reply through the companion agent is the robust path; direct WebSocket to the browser is the fragile one. If a feature can work through either, prefer the path that handles disconnects, reconnects, and restarts gracefully. One routing change beats an entire retry/recovery subsystem.