corrosion-admin-panel/CHANGELOG.md
Vantz Stockwell 3e8b29f2ee
feat: Implement Phase 2 alerting system with anomaly detection
Proactive monitoring infrastructure for server health:

**Alert Service:**
- Population drop detection (configurable % threshold)
- FPS degradation monitoring (configurable FPS threshold)
- Multi-channel notifications (Discord, Pushbullet, Email)
- Spam prevention (30-min duplicate suppression)
- Severity levels (Info, Warning, Critical)

**Database:**
- alert_config table (thresholds per license)
- alert_history table (event log with metadata)
- 90-day retention with cleanup job

**Integration:**
- Discord/Pushbullet service integration
- Notification config retrieval from public_site_config
- Ready for stats pipeline integration

Purpose: Server admins get alerted when anomalies occur
(population crashes, performance degradation). Configurable
thresholds enable proactive server management.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15 14:28:51 -05:00


CHANGELOG — Corrosion Admin Panel

All notable changes to this project will be documented in this file.

[Unreleased]

Added (Phase 2 — Alerting System)

Backend:

  • Migration 008: Alert configuration and history tables
    • alert_config table with threshold settings per license (population drop %, FPS threshold)
    • alert_history table logging all triggered alerts with metadata
    • Default alert config created for all existing licenses
  • Alert service (services/alerting.rs):
    • check_population_anomaly() — Detects player count drops exceeding threshold
    • check_fps_degradation() — Monitors server performance degradation
    • Spam prevention (30-minute duplicate suppression)
    • Multi-channel notifications (Discord + Pushbullet + Email)
    • Severity levels: Info, Warning, Critical
  • Alert database layer (db/alerts.rs):
    • get_alert_config() / update_alert_config() — Threshold configuration
    • insert_alert() / mark_alert_notified() — Alert history tracking
    • check_recent_alert() — Duplicate detection
    • cleanup_old_alerts() — 90-day retention cleanup
  • Updated db/notifications.rs — Notification config retrieval with webhook/API key support
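The 30-minute duplicate suppression behind check_recent_alert() can be pictured as a pure check over recent alert history. This is a minimal sketch, assuming each alert_history row carries a license ID, an alert type string, and a Unix timestamp. The `AlertRecord` type and `should_suppress` function are hypothetical. The real implementation presumably performs this filter as a database query rather than in memory.

```rust
/// Suppression window from the changelog: 30 minutes, in seconds.
const SUPPRESSION_WINDOW_SECS: u64 = 30 * 60;

/// Hypothetical in-memory stand-in for an alert_history row.
#[derive(Debug, Clone)]
pub struct AlertRecord {
    pub license_id: u32,
    pub alert_type: String,
    pub triggered_at: u64, // Unix seconds
}

/// Returns true when an alert of the same type for the same license fired
/// less than 30 minutes before `now`, meaning the new alert is a duplicate
/// and should not be sent.
pub fn should_suppress(history: &[AlertRecord], license_id: u32, alert_type: &str, now: u64) -> bool {
    history.iter().any(|r| {
        r.license_id == license_id
            && r.alert_type == alert_type
            && r.triggered_at <= now // guards the subtraction below
            && now - r.triggered_at < SUPPRESSION_WINDOW_SECS
    })
}
```

Suppression is keyed on the (license, alert type) pair, so a population-drop alert does not silence a concurrent FPS alert for the same server.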

Alert Types:

  • Population Drop — Triggers when the player count drops by more than the configured percentage threshold within 1 hour
  • FPS Degradation — Triggers when FPS falls below configurable threshold
  • Server Crash — Critical alert for auto-recovery failures
  • Wipe Failed — Alert when wipe execution fails
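The two threshold-driven alert types reduce to simple comparisons. The sketch below makes several assumptions:

- The population check compares the current player count with the count one hour earlier.
- Severity escalates to Critical at twice the configured drop threshold.
- For FPS, severity escalates to Critical below half the configured minimum.

The escalation rules and all names here are hypothetical, not taken from services/alerting.rs.

```rust
/// Severity levels named in the changelog. `Info` is not produced by the two
/// checks in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Critical,
}

/// Percentage drop from `hour_ago` to `now`; None when there is no baseline.
pub fn population_drop_pct(hour_ago: u32, now: u32) -> Option<f64> {
    if hour_ago == 0 {
        return None;
    }
    let lost = hour_ago.saturating_sub(now); // growth counts as a 0% drop
    Some(f64::from(lost) / f64::from(hour_ago) * 100.0)
}

/// Hypothetical escalation: Warning above the threshold, Critical above 2x it.
pub fn check_population_drop(hour_ago: u32, now: u32, threshold_pct: f64) -> Option<Severity> {
    let drop = population_drop_pct(hour_ago, now)?;
    if drop > threshold_pct * 2.0 {
        Some(Severity::Critical)
    } else if drop > threshold_pct {
        Some(Severity::Warning)
    } else {
        None
    }
}

/// Hypothetical escalation: Warning below the minimum FPS, Critical below half of it.
pub fn check_fps(avg_fps: f64, min_fps: f64) -> Option<Severity> {
    if avg_fps < min_fps / 2.0 {
        Some(Severity::Critical)
    } else if avg_fps < min_fps {
        Some(Severity::Warning)
    } else {
        None
    }
}
```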

Purpose: Proactive monitoring for server health issues. Alerts server admins via Discord/Pushbullet when anomalies are detected (population crashes, performance degradation). Configurable thresholds per license.

Added (Phase 2 — Wipe Performance Analytics)

Backend:

  • backend/src/db/wipes.rs — Comprehensive wipe analytics query layer:
    • get_wipe_success_rate() — Success vs failure rate over time range
    • get_average_wipe_duration() — Average execution time for successful wipes
    • get_wipe_to_peak_population() — Hours from wipe completion to peak player count (24h window)
    • get_population_curve_by_cycle() — Day 1 vs Day 2 vs Day 3 average player counts post-wipe
    • get_optimal_wipe_timing() — Recommends best day of week + hour based on historical peak populations
    • get_wipe_analytics_entries() — Detailed per-wipe records for charting (duration, peak pop, success)
    • All queries use hourly aggregates (server_stats_hourly) with 90-day retention
  • backend/src/api/analytics.rs — Wipe performance endpoint:
    • GET /api/analytics/wipes/performance?range=90d — Returns full wipe performance metrics
    • Supports range params: 6d, 12d, 90d, all (converted to wipe count estimates)
    • Response includes: success rate, avg duration, population curve, optimal timing, individual wipe entries
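A best-day-and-hour recommendation like the one get_optimal_wipe_timing() returns can be derived in three steps: bucket past wipes by weekday and hour, average their post-wipe peaks, and pick the highest slot. This is a minimal sketch, assuming one record per wipe with three fields:

- a UTC weekday, using a hypothetical 0 = Monday convention
- the start hour
- the observed peak player count

The `WipePeak` type and `optimal_wipe_slot` function are illustrative names only.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-wipe record: when the wipe ran and the peak player
/// count observed afterwards.
pub struct WipePeak {
    pub weekday: u8, // 0 = Monday ... 6 = Sunday (assumed convention)
    pub hour: u8,    // 0..=23, UTC
    pub peak_players: u32,
}

/// Returns the (weekday, hour) slot with the highest average post-wipe peak,
/// or None when there is no history.
pub fn optimal_wipe_slot(wipes: &[WipePeak]) -> Option<(u8, u8)> {
    // slot -> (sum of peaks, number of wipes)
    let mut slots: BTreeMap<(u8, u8), (u64, u64)> = BTreeMap::new();
    for w in wipes {
        let entry = slots.entry((w.weekday, w.hour)).or_insert((0, 0));
        entry.0 += u64::from(w.peak_players);
        entry.1 += 1;
    }
    slots
        .into_iter()
        .map(|(slot, (sum, count))| (slot, sum as f64 / count as f64))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(slot, _)| slot)
}
```

`BTreeMap` keeps slot iteration deterministic. With only a handful of wipes per slot, a production version would probably also require a minimum sample count before recommending a slot.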

Frontend:

  • WipeAnalyticsView.vue — Complete wipe performance dashboard:
    • ECharts Visualizations:
      • Wipe success timeline (scatter plot: green = success, red = failed)
      • Population curve bar chart (Day 1/Day 2/Day 3 average players post-wipe)
      • Wipe duration trend (line chart showing execution time evolution)
    • Insight Cards:
      • Success rate percentage with total wipe count
      • Average wipe duration (formatted as minutes:seconds)
      • Peak population day identifier
      • Optimal wipe timing recommendation (day + hour)
    • Actionable Recommendations Banner:
      • Optimal wipe day/hour based on post-wipe player peaks
      • Weekly vs bi-weekly wipe suggestion (if Day 1 >> Day 2 population)
      • Duration optimization alerts (if avg > 10 minutes)
      • Rollback protection warnings (if failures detected)
    • Time range selector: Last 6 wipes / Last 12 wipes / All time
    • CSV export functionality
  • Added route /wipes/analytics to router
  • TypeScript interfaces: WipePerformanceMetrics, WipeAnalyticsEntry, PopulationCurve
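The insight-card formatting and the banner rules can be expressed as small pure functions. The sketch is shown in Rust, mirroring the backend, although the cards and banner are rendered by WipeAnalyticsView.vue. It makes two assumptions:

- "Day 1 >> Day 2" means Day 1 averages at least twice Day 2's population; the changelog gives no ratio.
- The duration rule fires above the stated 10-minute limit.

The `BannerInputs` and `Recommendation` names are hypothetical.

```rust
/// Formats seconds as "m:ss", like the Average wipe duration card.
pub fn format_duration(total_secs: u64) -> String {
    format!("{}:{:02}", total_secs / 60, total_secs % 60)
}

/// Hypothetical inputs for the recommendations banner.
pub struct BannerInputs {
    pub day1_avg_players: f64,
    pub day2_avg_players: f64,
    pub avg_duration_secs: u64,
    pub failed_wipes: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Recommendation {
    ReviewWipeFrequency,     // Day 1 population far above Day 2
    OptimizeWipeDuration,    // average wipe longer than 10 minutes
    CheckRollbackProtection, // at least one failed wipe
}

/// Assumed reading of "Day 1 >> Day 2": Day 1 is at least twice Day 2.
const DAY1_DOMINANCE_RATIO: f64 = 2.0;
const MAX_DURATION_SECS: u64 = 10 * 60;

pub fn recommendations(input: &BannerInputs) -> Vec<Recommendation> {
    let mut out = Vec::new();
    if input.day2_avg_players > 0.0
        && input.day1_avg_players >= DAY1_DOMINANCE_RATIO * input.day2_avg_players
    {
        out.push(Recommendation::ReviewWipeFrequency);
    }
    if input.avg_duration_secs > MAX_DURATION_SECS {
        out.push(Recommendation::OptimizeWipeDuration);
    }
    if input.failed_wipes > 0 {
        out.push(Recommendation::CheckRollbackProtection);
    }
    out
}
```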

Purpose: Answers critical questions: "How long do wipes take? When do players peak post-wipe? What's my success rate? When should I schedule wipes for max population?" Enables data-driven wipe timing optimization and operational insights.

Added (Phase 3 — Public Status Page)

Backend:

  • Migration 007: Added status_page_description TEXT column to public_site_config
  • Public API models (models/public.rs):
    • PublicServerStatus — Server status with live stats for public display
    • PlatformHealth — Platform-wide health metrics (total servers, online count, total players, uptime)
    • StatusPageResponse — Complete status page data structure
    • PublicSiteConfig — Full public site configuration model
  • Public database queries (db/public.rs):
    • get_public_servers() — Retrieves all opted-in servers with current stats, uptime percentages (24h/7d/30d), wipe schedules
    • get_platform_health() — Calculates platform-wide aggregate metrics
    • calculate_uptime_percentage() — Uptime calculation from hourly stats
    • format_cron_expression() — Human-readable wipe schedule formatting
    • get_public_site_config() / create_public_site_config() / update_public_site_config() — Config management
  • Public API endpoint (api/public.rs):
    • GET /api/public/status — Public status page data (no auth required)
  • Settings API (api/settings.rs):
    • GET /api/settings/public-site — Fetch public site config (auth required)
    • PUT /api/settings/public-site — Update status page opt-in and description (auth required)
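One plausible reading of calculate_uptime_percentage() is the number of hours with at least one stats sample, divided by the number of hours in the window. The sketch below assumes each hourly row carries an hour-aligned start timestamp and a sample count. The `HourlyRow` type and `uptime_pct` function are hypothetical, and the real query may weight partial hours differently.

```rust
/// Hypothetical hourly rollup row: the hour bucket (Unix seconds, aligned to
/// the hour) and how many raw samples landed in it.
pub struct HourlyRow {
    pub hour_start: u64,
    pub sample_count: u32,
}

/// Uptime over the `window_hours` ending at `now`, counting an hour as "up"
/// when it has at least one stats sample. Returns a value in 0.0..=100.0.
pub fn uptime_pct(rows: &[HourlyRow], now: u64, window_hours: u64) -> f64 {
    if window_hours == 0 {
        return 0.0;
    }
    let window_start = now.saturating_sub(window_hours * 3600);
    let up_hours = rows
        .iter()
        .filter(|r| r.hour_start >= window_start && r.hour_start < now && r.sample_count > 0)
        .count() as u64;
    // Cap at the window size in case of duplicate rows for the same hour.
    up_hours.min(window_hours) as f64 / window_hours as f64 * 100.0
}
```

Under this reading, the 24h/7d/30d badges would correspond to `window_hours` values of 24, 168, and 720.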

Frontend:

  • StatusPageView.vue — Complete public status page with:
    • Platform health header (total servers, online now, total players, platform uptime)
    • Server grid with status indicators (green/yellow/red), player counts, uptime badges (24h/7d/30d)
    • Wipe schedule display with countdown timers
    • Server search/filter functionality
    • Auto-refresh every 10 seconds via polling
    • Mobile-responsive grid layout
    • "Powered by Corrosion" footer with panel link
  • Settings dashboard integration (SettingsView.vue):
    • New "Public Status" tab with toggle for show_on_status_page
    • Text area for status_page_description
    • Save endpoint integration

Infrastructure:

  • nginx already configured for status.corrosionmgmt.com routing
  • Router already configured with /status route on both panel and marketing domains

Purpose: Public-facing marketing page showcasing all opted-in Corrosion servers. Drives platform visibility and attracts new customers ("I want this for my server too").

Added (Phase 2.2 — Player Retention Analytics)

Backend:

  • Migration 004 (004_player_sessions.sql): Player session tracking table with indexes for retention queries
  • backend/src/db/player_sessions.rs — Complete player session tracking and retention analysis:
    • track_player_join() / track_player_leave() — Record individual player sessions
    • calculate_retention_after_wipe() — Calculate 24h/48h/72h return rates per wipe
    • get_unique_player_count() / get_avg_session_duration() — Session metrics
    • get_new_vs_returning_ratio() — New vs returning player analysis
    • get_recent_wipe_retention_metrics() — Multi-wipe retention trends
    • cleanup_old_player_sessions() — 90-day retention cleanup
  • backend/src/api/plugin.rs — Plugin event endpoints:
    • POST /api/plugin/player-event — Track player join/leave events
    • POST /api/plugin/checkin — Plugin registration on server start
  • Extended backend/src/api/analytics.rs with retention endpoints:
    • GET /api/analytics/retention?wipe_count=6 — Multi-wipe retention metrics
    • GET /api/analytics/retention/export — CSV export of retention data
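The 24h/48h/72h return rates depend on a definition of who counts as a returning player. This sketch makes two assumptions:

- The cohort is the set of players seen before the wipe.
- A player has "returned" once they join within the given window after the wipe.

The `Join` type, the `steam_id` field, and `retention_after_wipe` are hypothetical, not the actual db/player_sessions.rs code.

```rust
use std::collections::HashSet;

/// Hypothetical session row from player_sessions: who joined and when.
pub struct Join {
    pub steam_id: u64,
    pub joined_at: u64, // Unix seconds
}

/// Percentage (0.0..=100.0) of the pre-wipe cohort that joined again within
/// 24h, 48h, and 72h after the wipe, in that order.
pub fn retention_after_wipe(pre_wipe_players: &HashSet<u64>, joins: &[Join], wipe_at: u64) -> [f64; 3] {
    let windows = [24u64, 48, 72];
    let mut result = [0.0; 3];
    if pre_wipe_players.is_empty() {
        return result;
    }
    for (i, hours) in windows.iter().enumerate() {
        let end = wipe_at + hours * 3600;
        // A set, so a player who reconnects several times is counted once.
        let returned: HashSet<u64> = joins
            .iter()
            .filter(|j| j.joined_at >= wipe_at && j.joined_at < end)
            .filter(|j| pre_wipe_players.contains(&j.steam_id))
            .map(|j| j.steam_id)
            .collect();
        result[i] = returned.len() as f64 / pre_wipe_players.len() as f64 * 100.0;
    }
    result
}
```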

Frontend:

  • PlayerRetentionView.vue — Complete retention analytics dashboard:
    • ECharts retention curve (24h/48h/72h lines across multiple wipes)
    • Summary cards: unique players, avg session duration, new vs returning ratio
    • Wipe selector (last 3/6/10/20 wipes)
    • Detailed wipe table with retention percentages
    • CSV export functionality
  • Added route /retention to router
  • TypeScript interfaces: WipeRetentionMetric, SessionSummary, RetentionResponse

Plugin:

  • Updated CorrosionCompanion.cs to track player events via /api/plugin/player-event
  • Modified OnPlayerConnected / OnPlayerDisconnected hooks with license_key authentication

Purpose: Answers the critical question: "What percentage of players return 24h/48h/72h after a wipe?" Enables data-driven wipe timing optimization and player retention analysis.

Added (Phase 2.2 — Map Analytics System)

Backend:

  • Migration 005: Added map_id FK to server_stats and wipe_history for map effectiveness tracking
  • Stats consumer now captures current_map_id from server_config when persisting stats
  • Map analytics database queries (db/maps.rs):
    • get_map_analytics() — Returns performance metrics per map (avg/peak players, times used, effectiveness score)
    • get_map_population_trends() — Player count trends per map over wipe cycles
    • Effectiveness scoring algorithm: (avg_players / peak_players) * 100
  • Analytics API endpoint (api/analytics.rs):
    • GET /api/analytics/maps?range=90d — Map performance summary with rotation effectiveness
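The effectiveness score is simple enough to state directly, together with the green/yellow/red bands the dashboard table applies. Only the formula and the 80%/60% cut-offs come from this changelog; the function and enum names are hypothetical.

```rust
/// Color band used by the map performance table.
#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    Green,
    Yellow,
    Red,
}

/// (avg_players / peak_players) * 100, as stated in the changelog.
/// Returns None when the peak is zero, since the ratio is undefined.
pub fn effectiveness_score(avg_players: f64, peak_players: f64) -> Option<f64> {
    if peak_players <= 0.0 {
        return None;
    }
    Some(avg_players / peak_players * 100.0)
}

/// Bands from the table: green >= 80%, yellow >= 60%, red < 60%.
pub fn tier(score: f64) -> Tier {
    if score >= 80.0 {
        Tier::Green
    } else if score >= 60.0 {
        Tier::Yellow
    } else {
        Tier::Red
    }
}
```

A score near 100 means a map's average population stays close to its peak. A low score indicates a short spike that fades quickly.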

Frontend:

  • MapAnalyticsView.vue — Complete map effectiveness dashboard with:
    • Summary cards: Best performing map, rotation effectiveness %, total maps tracked
    • ECharts bar chart comparing avg vs peak players per map
    • Sortable performance table with effectiveness color coding (green ≥80%, yellow ≥60%, red <60%)
    • Actionable insights section recommending rotation improvements
    • CSV export functionality
    • Time range selector (30d/90d/all)
  • TypeScript types: MapPerformanceMetrics, MapAnalyticsSummary
  • Router: Added /maps/analytics route under admin dashboard

Purpose: Answers "Which maps drive the most players? Is my rotation working?" Enables data-driven map selection for wipe day.

Added (Phase 2 — Data Aggregation Pipeline)

Backend:

  • Stats ingestion consumer service (stats_consumer.rs) subscribing to the corrosion.*.stats NATS subject
  • Complete stats database queries (db/stats.rs) with support for:
    • Raw stats insertion and retrieval
    • Hourly aggregation queries
    • Analytics summary calculations (peak/avg players, uptime)
    • Data retention cleanup (7 days raw, 90 days hourly)
  • Hourly stats aggregation scheduler job (runs at :05 past every hour)
  • Daily cleanup scheduler job (runs at 03:00 UTC)
  • Analytics API endpoints (api/analytics.rs):
    • GET /api/analytics/summary — Peak/avg players, uptime percentage
    • GET /api/analytics/timeseries — Time-series data for charting (hourly/raw granularity)
    • GET /api/analytics/export — CSV export of server stats
  • Background service initialization in main.rs (stats consumer + scheduler)
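The hourly rollup can be pictured as bucketing raw samples by hour and computing averages and peaks. This is a minimal in-memory sketch, assuming raw rows hold a Unix timestamp, a player count, and FPS. The `RawSample` and `HourlyAggregate` types are hypothetical, and the real job presumably does this in SQL against server_stats. The retention helper returns the raw and hourly cutoff timestamps implied by the 7-day and 90-day policy.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw sample from server_stats (published every 60 s).
pub struct RawSample {
    pub ts: u64, // Unix seconds
    pub players: u32,
    pub fps: f64,
}

#[derive(Debug, PartialEq)]
pub struct HourlyAggregate {
    pub hour_start: u64,
    pub avg_players: f64,
    pub peak_players: u32,
    pub avg_fps: f64,
    pub samples: u32,
}

/// Groups raw samples into hour buckets (timestamp rounded down to the hour).
pub fn aggregate_hourly(samples: &[RawSample]) -> Vec<HourlyAggregate> {
    let mut buckets: BTreeMap<u64, Vec<&RawSample>> = BTreeMap::new();
    for s in samples {
        buckets.entry(s.ts - s.ts % 3600).or_default().push(s);
    }
    buckets
        .into_iter()
        .map(|(hour_start, group)| {
            let n = group.len() as f64;
            HourlyAggregate {
                hour_start,
                avg_players: group.iter().map(|s| f64::from(s.players)).sum::<f64>() / n,
                peak_players: group.iter().map(|s| s.players).max().unwrap_or(0),
                avg_fps: group.iter().map(|s| s.fps).sum::<f64>() / n,
                samples: group.len() as u32,
            }
        })
        .collect()
}

/// Delete-before cutoffs: (raw, 7 days) and (hourly, 90 days).
pub fn retention_cutoffs(now: u64) -> (u64, u64) {
    const DAY: u64 = 86_400;
    (now.saturating_sub(7 * DAY), now.saturating_sub(90 * DAY))
}
```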

Frontend:

  • Analytics TypeScript types (AnalyticsSummary, TimeseriesData, HourlyStats)
  • Complete AnalyticsView.vue implementation with:
    • Real-time data fetching from analytics API
    • Apache ECharts integration for Player Count and Server Performance charts
    • Time range selector (24h/7d/30d)
    • CSV export functionality
    • Loading states and responsive layout

Infrastructure:

  • Made NatsBridge.jetstream public for service consumer access

Added (Sovereign Infrastructure Stack)

Services Deployed:

  • Gitea (git.corrosionmgmt.com) — Self-hosted Git with Actions support
    • Container: corrosion-gitea on port 8090 (HTTP) and 8095 (SSH)
    • SQLite database (self-contained, persistent)
    • Replaces GitHub dependency for source control
    • Gitea Actions enabled for CI/CD
  • SeaweedFS (cdn.corrosionmgmt.com) — S3-compatible object storage and CDN
    • Container: corrosion-cdn with integrated Master/Volume/Filer/S3
    • Filer UI at port 8091 (cdn.corrosionmgmt.com)
    • Master UI at port 8093 (admin.cdn.corrosionmgmt.com)
    • S3 API at port 8092 (internal access)
    • Purpose: Map hosting, plugin packages, companion binaries, backups
  • Gitea Act Runner (asgard build server) — CI/CD execution environment
    • Runs on Ryzen 9 7945HX (16C/32T, 64GB DDR5)
    • Docker-based job execution
    • Go 1.21+ and Rust toolchains available
    • Connects to public Gitea instance remotely

CI/CD Workflows:

  • test-runner.yml — Runner capability validation (hostname, resources, toolchains)
  • build-companion.yml — Production companion agent build pipeline:
    • Triggers on version tags (v*.*.*)
    • Cross-compiles for Linux AMD64 and Windows AMD64
    • Generates SHA256 checksums
    • Creates Gitea release with auto-generated installation instructions
    • Uploads binaries and checksums as release assets

Documentation:

  • infra/docker-compose.yml — Infrastructure stack definition
  • infra/README.md — Deployment guide and architecture overview
  • infra/NPM-CONFIG.md — Nginx Proxy Manager configuration
  • infra/ASGARD-RUNNER.md — Act runner setup guide

Repository Migration:

  • Migrated from GitHub to self-hosted Gitea
  • Remote updated to git@git.corrosionmgmt.com:vantzs/corrosion-admin-panel.git
  • All future development on sovereign infrastructure

Technical Details

Data Flow:

Plugin/Agent publishes stats (60s interval)
  → NATS JetStream (corrosion.*.stats)
  → StatsConsumerService persists to server_stats table
  → Hourly aggregation job rolls up to server_stats_hourly
  → Analytics API queries aggregated data
  → Frontend renders charts via ECharts
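The corrosion.*.stats subscription relies on NATS subject wildcards, where `*` matches exactly one dot-separated token. The sketch below reproduces that rule for `*` only; the `>` wildcard and subject validation are omitted. It also extracts the middle token. Treating that token as a server or license identifier is an assumption, and both function names are hypothetical.

```rust
/// Matches a subject against a pattern in which `*` stands for exactly one
/// dot-separated token. The `>` wildcard is not handled in this sketch.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let pat: Vec<&str> = pattern.split('.').collect();
    let sub: Vec<&str> = subject.split('.').collect();
    pat.len() == sub.len() && pat.iter().zip(&sub).all(|(p, s)| *p == "*" || p == s)
}

/// Returns the middle token of `corrosion.<id>.stats`, or None for any other
/// shape. What the token identifies is an assumption.
pub fn stats_subject_id(subject: &str) -> Option<&str> {
    let mut parts = subject.split('.');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("corrosion"), Some(id), Some("stats"), None) if !id.is_empty() => Some(id),
        _ => None,
    }
}
```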

Database Schema:

  • server_stats table (raw stats, 7-day retention)
  • server_stats_hourly table (aggregated hourly data, 90-day retention)

Scheduler Jobs:

  • Hourly aggregation: 0 5 * * * * (at :05 past every hour)
  • Daily cleanup: 0 0 3 * * * (at 03:00 UTC)
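Both expressions use a six-field, seconds-first layout (sec min hour day-of-month month day-of-week). `0 5 * * * *` fires at second 0 of minute 5 every hour, and `0 0 3 * * *` fires at 03:00:00 daily. The matcher below is a simplified sketch that accepts only `*` or a single number per field. The scheduler library in use is not named here, and real cron parsers also support ranges, lists, and steps. `CronTime` and `cron_matches` are hypothetical names.

```rust
/// Wall-clock instant broken into cron fields (UTC assumed).
pub struct CronTime {
    pub sec: u32,
    pub min: u32,
    pub hour: u32,
    pub dom: u32,
    pub month: u32,
    pub dow: u32,
}

/// Checks a six-field "sec min hour dom month dow" expression.
/// Returns None for a wrong field count or unsupported syntax
/// (anything other than `*` or a plain number).
pub fn cron_matches(expr: &str, t: &CronTime) -> Option<bool> {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    if fields.len() != 6 {
        return None;
    }
    let values = [t.sec, t.min, t.hour, t.dom, t.month, t.dow];
    let mut all = true;
    for (field, value) in fields.iter().zip(values) {
        if *field == "*" {
            continue;
        }
        let n: u32 = field.parse().ok()?;
        if n != value {
            all = false;
        }
    }
    Some(all)
}
```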

Installation Notes

Frontend:

cd frontend && npm install echarts

Backend: No additional dependencies beyond existing Cargo.toml.

Deferred to Phase 2.2

  • Player retention tracking (new vs returning players, session duration)
  • Wipe-correlated analytics
  • Player activity heatmaps (time-of-day patterns)
  • Anomaly alerting system

[2025-02-15] — Phase 1 Complete

Added (Phase 1 — Foundation)

Backend Services:

  • Core control plane (Axum + Tokio)
  • Auto-wiper with rollback (wipe_engine.rs)
  • Plugin management system
  • WebSocket/NATS bridge for real-time data
  • Companion agent adapter (bare metal server management)
  • Panel adapters (AMP + Pterodactyl)

Frontend:

  • Vue 3 dashboard with 19 admin sub-views
  • Wipe management UI with real-time progress
  • Toast notification system
  • Plugin management interface
  • Public server site

Infrastructure:

  • PostgreSQL schema (migrations 001-003)
  • NATS JetStream (6 streams configured)
  • Docker Compose deployment (4 services)
  • JWT auth with refresh tokens, TOTP 2FA

Companion Agent:

  • Go binary for bare metal server management
  • NATS-based command execution
  • Process lifecycle control
  • File operations support

uMod Plugin:

  • C# plugin for Rust game server integration
  • Stats publishing every 60 seconds
  • Server lifecycle event reporting

Commits

  • c5d0571 — feat: Complete Phase 1 frontend — WebSocket + Wipe feature end-to-end
  • 590765f — feat: Complete Phase 1 backend services and WebSocket/NATS bridge
  • 8320591 — docs: Update companion agent language choice to Go
  • 3c39345 — docs: Add CLAUDE.md and Claude Code settings
  • 81eeb3b — docs: Add AGENTS.md roster and resource discipline

Commit message format: type: Short description

Types: feat, fix, docs, refactor, test, chore, perf, ci