Admin Operations Center

Complete

Overview

The Admin Operations Center is the master control plane for every autonomous system running inside ThreatOps. As the platform evolves toward a self-operating SOC, individual services run as long-lived background loops: integrity scans, the auto-deploy pipeline, autonomous SOC cycles, self-healing checks, ML retraining, advisory refreshes, and the frontend factory engine. Without a unified control surface, operators cannot see whether these systems are healthy and have no way to intervene when one stalls.

This module aggregates the real-time status of all seven autonomous engines into a single dashboard, surfaces live process metrics (CPU, memory, WebSocket connections, database pool), and provides per-system action controls — trigger, stop, and restart — as well as platform-wide kill-switch and restart-all operations. A time-ordered activity feed consolidates events from every engine into one scrollable log. A sub-page at /admin-ops/factory extends this with a full lifecycle manager for the Frontend Factory Engine, covering requirement submission, TSX code generation, preview, and deployment.

What Was Proposed

A centralized operator console for monitoring all autonomous and scheduled platform services in one view
Per-system health cards showing running state, last execution timestamp, error count, and success rate
Manual trigger capability for every system without requiring direct shell access to any pod
An emergency stop mechanism able to halt any individual system or all systems simultaneously
Restart controls to bring stopped systems back online without a full platform redeploy
A unified, time-sorted activity log aggregating events from integrity scans, deploys, SOC decisions, healing actions, and ML retrains
System-wide runtime metrics: uptime, memory, CPU, active WebSocket connections, DB pool stats
Frontend Factory sub-page for managing dynamically generated Next.js page requirements through a submit-generate-preview-deploy workflow

What's Built

Aggregated system dashboard (7 autonomous engines)	✓ Complete
Per-system health cards (status, schedule, last run, errors, success rate)	✓ Complete
Manual trigger for every registered system	✓ Complete
Emergency stop — individual and kill-switch-all	✓ Complete
Restart — individual and restart-all	✓ Complete
Combined activity log across all systems (time-sorted, configurable limit)	✓ Complete
System-wide metrics endpoint (uptime, memory, CPU, WS connections, DB pool)	✓ Complete
Frontend main dashboard with 8-second auto-refresh	✓ Complete
Emergency mode banner (shown when any system is stopped)	✓ Complete
Toast notification system for action results (4-second auto-dismiss)	✓ Complete
Frontend Factory sub-page at /admin-ops/factory	✓ Complete
Factory: requirement submission, generate, preview, deploy workflow	✓ Complete
Factory: template browser with 6 page types	✓ Complete
Factory: component library listing	✓ Complete

Architecture

Admin Router

File: platform/api/app/routers/admin_ops.py — Prefix: /api/v1/admin

The router uses lazy import functions for all seven engine singletons to avoid circular-import issues at startup. Each engine is accessed through a private getter (e.g., _get_integrity_monitor()). The SYSTEM_REGISTRY dict maps slug keys to their display label, getter function reference, and schedule description string. A private _background_tasks dict tracks the asyncio.Task handle for each engine, enabling the router to cancel and recreate tasks during stop/restart operations without requiring the services module to expose task management.

The core helper _system_status(name: str) branches on the system name, introspects engine-specific state attributes, and returns a normalized dict containing running, last_run, error_count, success_rate, and any engine-specific extras (e.g., pending_deploy, buffer_size, total_deploys). Success rate is computed as (total - errors) / total * 100.
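As a minimal sketch of the success-rate formula, the helper below applies (total - errors) / total * 100. The zero-total guard and the rounding to one decimal place are assumptions; the source does not say how the router handles an engine that has not run yet.

```python
def success_rate(total: int, errors: int) -> float:
    """Return the percentage of runs that did not end in an error."""
    if total <= 0:
        # Assumption: an engine with no recorded runs is reported as 100%.
        return 100.0
    return round((total - errors) / total * 100, 1)


def normalized_status(running: bool, last_run: str, total: int, errors: int) -> dict:
    """Build the common fields every system status dict carries."""
    return {
        "running": running,
        "last_run": last_run,
        "error_count": errors,
        "success_rate": success_rate(total, errors),
    }
```

For example, `success_rate(20, 3)` evaluates to `85.0`, and `success_rate(0, 0)` falls back to `100.0` under the assumption noted in the comment.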

Frontend Factory Router

File: platform/api/app/routers/frontend_factory.py — Prefix: /api/v1/factory

Manages the full lifecycle of page requirements from submission through code generation, preview, and file deployment. The FrontendFactoryEngine singleton (app/services/frontend_factory.py) compiles Next.js TSX from template generators, validates requirements, writes output files to the frontend source tree, and signals the auto-deploy pipeline. A COMPONENT_REGISTRY dict (app/services/component_registry.py) catalogues available UI building blocks accessible through the component library endpoint.

Autonomous Systems Registry

Seven systems are registered in SYSTEM_REGISTRY at module load time. Each entry carries a slug key (used verbatim in API paths), a human-readable label, a lazy getter returning the engine singleton, and a schedule description. The _system_status helper branches on the slug to extract engine-specific fields.

Platform Integrity Monitor

integrity-scan | Every 30 min

Scans registered API routes and frontend pages, detects missing endpoints, and flags the auto-deploy pipeline when new pages need to be generated or deployed. Extras: total_scans, pages_generated, pending_deploy.

Auto-Deploy Pipeline

deploy | On demand (watches integrity monitor)

Triggers container image builds and Kubernetes rolling updates when the integrity monitor signals new content. Tracks a deploy_history list and an is_deploying boolean. The error count is the number of failed entries in deploy_history.

Autonomous SOC Engine

soc-cycle | 4 concurrent loops: 30min / 10s / 1hr / 5min

Four asyncio loops covering alert triage, escalation, SOAR playbook execution, and scheduled threat hunting. Error count is escalated_to_human. Trigger runs get_stats() and returns current cycle statistics.

Self-Healing Engine

heal-check | 4 concurrent loops: 30s / 60s / 120s / 1hr

Monitors five subsystems (database, Redis, ML models, advisory engine, WebSocket). Executes automated remediation when health checks fail. Tracks circuit breaker failure counts and a healing_log. Trigger runs all five health checks via asyncio.gather.
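Running the five health checks concurrently with asyncio.gather could be sketched as follows. The check functions are placeholders, the subsystem keys are hypothetical, and the use of `return_exceptions=True` is an assumption chosen so that one failing check does not hide the results of the others.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def _ok() -> bool:
    return True


async def _fail() -> bool:
    raise ConnectionError("redis unreachable")


# Hypothetical check table: one coroutine function per monitored subsystem.
CHECKS: Dict[str, Callable[[], Awaitable[bool]]] = {
    "database": _ok,
    "redis": _fail,
    "ml_models": _ok,
    "advisory_engine": _ok,
    "websocket": _ok,
}


async def run_all_checks(checks: Dict[str, Callable[[], Awaitable[bool]]]) -> Dict[str, str]:
    """Run every check concurrently and map each subsystem to a result string."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name]() for name in names),
        return_exceptions=True,  # collect failures instead of aborting the batch
    )
    report: Dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            report[name] = f"error: {result}"
        else:
            report[name] = "healthy" if result else "unhealthy"
    return report
```

Calling `asyncio.run(run_all_checks(CHECKS))` yields one entry per subsystem, with the failing Redis check reported as an error string rather than raising.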

ML Training Pipeline

ml-retrain | Auto-retrain at 100 feedback samples

Maintains three sklearn models: AlertClassifier (RandomForest), AnomalyDetector (IsolationForest), ThreatScorer (GradientBoosting). Always reported as running. Exposes training_buffer, retrain_threshold, total_retrains. Trigger calls trigger_retrain().
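The threshold-based auto-retrain can be pictured with the small sketch below. `FeedbackBuffer` and `add_feedback` are hypothetical names, and clearing the buffer after a retrain is an assumption; only `training_buffer`, `retrain_threshold`, and `total_retrains` come from the description above.

```python
from typing import Any, Callable, List


class FeedbackBuffer:
    """Collects feedback samples and fires a retrain callback at a threshold."""

    def __init__(self, retrain: Callable[[List[Any]], None], retrain_threshold: int = 100) -> None:
        self.training_buffer: List[Any] = []
        self.retrain_threshold = retrain_threshold
        self.total_retrains = 0
        self._retrain = retrain

    def add_feedback(self, sample: Any) -> bool:
        """Store one sample; return True if this sample triggered a retrain."""
        self.training_buffer.append(sample)
        if len(self.training_buffer) < self.retrain_threshold:
            return False
        # Hand the callback a copy so clearing the buffer does not affect it.
        self._retrain(list(self.training_buffer))
        self.training_buffer.clear()  # assumption: buffer resets after each retrain
        self.total_retrains += 1
        return True
```

With a threshold of 3, the third and sixth samples would each trigger a retrain, and a seventh sample would remain in the buffer.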

Threat Advisory Engine

advisory-refresh | Every 30 min

Pulls threat intelligence feeds and synthesizes actionable advisories. Managed as a background task in main.py, so always considered running. Exposes total_advisories and feed_count. Trigger calls refresh_advisories().

Frontend Factory Engine

frontend-factory | Continuous 10-second queue check

Processes the page requirement queue and generates Next.js TSX files from templates. Exposes total_requirements, total_generated, total_deployed, success_rate_pct via get_stats(). Has its own sub-page at /admin-ops/factory.

API Endpoints

Admin Router — `/api/v1/admin`

Method	Path	Description
GET	/api/v1/admin/dashboard	Aggregated status of all 7 autonomous systems; sets `emergency_mode: true` if any system is stopped
POST	/api/v1/admin/trigger/{system_name}	Manual trigger for a system: runs one cycle or returns current stats immediately; returns system-specific result payload
GET	/api/v1/admin/activity-log	Combined time-sorted event log from all engines; optional `?limit=100` query param
GET	/api/v1/admin/system/{name}/detail	Full engine status for one system; includes deep fields like `recent_decisions` (SOC) or `performance_log` (ML)
POST	/api/v1/admin/emergency-stop/{name}	Stop a specific system: calls `engine.stop()` if available, else sets `_running=False`, cancels asyncio task
POST	/api/v1/admin/emergency-stop-all	Kill switch: iterates all 7 systems and stops each; returns per-system result map
POST	/api/v1/admin/restart/{name}	Restart a system: stops first if running, then creates new asyncio.Task via `engine.start()`
POST	/api/v1/admin/restart-all	Restart all systems; returns per-system result map
GET	/api/v1/admin/metrics	System-wide metrics: uptime, memory (MB + %), CPU %, threads, WS connections, DB pool, systems running count

Frontend Factory Router — `/api/v1/factory`

Method	Path	Description
POST	/api/v1/factory/requirements	Submit new page requirement to queue; validates before enqueuing; returns RequirementResponse
GET	/api/v1/factory/requirements	List all requirements with current status
GET	/api/v1/factory/requirements/{req_id}	Full requirement detail including api_endpoints, data_fields, features, has_code flag
POST	/api/v1/factory/requirements/{req_id}/generate	Trigger TSX generation; only valid when status is `pending` or `failed`
GET	/api/v1/factory/requirements/{req_id}/preview	Returns full tsx_code, api_wiring object, sidebar_entry config, and code_lines count
POST	/api/v1/factory/requirements/{req_id}/deploy	Write generated file to disk and signal auto-deploy pipeline
DELETE	/api/v1/factory/requirements/{req_id}	Cancel and remove a requirement from the queue
GET	/api/v1/factory/templates	List all page templates with label, description, config_keys, and example_config
GET	/api/v1/factory/templates/{type}/schema	Full configuration schema for a specific template type
POST	/api/v1/factory/generate-preview	One-shot generate without saving to queue; returns tsx_code + complexity estimate + validation result
GET	/api/v1/factory/stats	Factory stats: total_requirements, total_generated, total_deployed, total_failed, avg_generation_time_ms, success_rate
GET	/api/v1/factory/component-library	List registered UI components with name, description, and props

Frontend Routes

Route	File	Description
/admin-ops	`src/app/admin-ops/page.tsx`	Main operations dashboard — system status grid, metrics panel, activity feed; auto-refreshes every 8s
/admin-ops/factory	`src/app/admin-ops/factory/page.tsx`	Frontend Factory sub-page — requirement management, template browser, generate/preview/deploy workflow

Both pages are Next.js App Router client components ("use client"). The main dashboard calls GET /api/v1/admin/dashboard and GET /api/v1/admin/metrics in parallel, followed by a separate call to GET /api/v1/admin/activity-log that fails gracefully if unavailable. API calls are made via the shared api client from @/lib/api-client.

Frontend Factory Sub-page

The factory sub-page at /admin-ops/factory provides a full lifecycle UI for the Frontend Factory Engine. It can be navigated to directly or linked from the frontend-factory system card on the main dashboard.

Page Type Templates

Type Key	Label	Description
`data_table`	Data Table	Sortable, filterable table with search and pagination
`dashboard`	Dashboard	Metric cards, charts, and summary widgets
`form`	Form Page	Create/edit forms with field validation
`detail`	Detail View	Single entity detail page with related data panels
`analytics`	Analytics Page	Charts, graphs, and statistical analysis panels
`settings`	Settings Page	Configuration and preferences management form

Requirement Lifecycle States

Requirements move through: pending → generating → generated → deployed. Failed attempts land in the failed state and can be re-triggered through the generate endpoint.

Submit — POST to /api/v1/factory/requirements; validation checks route format, required fields, and template compatibility
Generate — POST to /api/v1/factory/requirements/{id}/generate; factory selects the matching template generator and compiles TSX; records generation_time_ms
Preview — GET /api/v1/factory/requirements/{id}/preview; returns full TSX code, API wiring object, and sidebar entry config without side effects
Deploy — POST to /api/v1/factory/requirements/{id}/deploy; writes the .tsx file to the frontend source tree and signals the auto-deploy pipeline to rebuild the container
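The lifecycle above can be read as a small state machine. The sketch below encodes it under stated assumptions: `ReqStatus`, `ALLOWED_TRANSITIONS`, and the helper names are hypothetical, and only a failed generation is modelled as leading to `failed`, since the source does not say what happens when a deploy fails.

```python
from enum import Enum
from typing import Dict, Set


class ReqStatus(str, Enum):
    PENDING = "pending"
    GENERATING = "generating"
    GENERATED = "generated"
    DEPLOYED = "deployed"
    FAILED = "failed"


# Allowed next states for each state (deploy-failure handling is not modelled).
ALLOWED_TRANSITIONS: Dict[ReqStatus, Set[ReqStatus]] = {
    ReqStatus.PENDING: {ReqStatus.GENERATING},
    ReqStatus.GENERATING: {ReqStatus.GENERATED, ReqStatus.FAILED},
    ReqStatus.GENERATED: {ReqStatus.DEPLOYED},
    ReqStatus.FAILED: {ReqStatus.GENERATING},  # re-trigger after a failure
    ReqStatus.DEPLOYED: set(),
}


def can_transition(current: ReqStatus, target: ReqStatus) -> bool:
    """True if moving from current to target follows the documented flow."""
    return target in ALLOWED_TRANSITIONS[current]


def can_generate(status: ReqStatus) -> bool:
    """Generation is only valid from pending or failed."""
    return status in (ReqStatus.PENDING, ReqStatus.FAILED)
```

Keeping the transition table as data makes it straightforward to reject an out-of-order request, such as deploying a requirement that has not been generated yet.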

Data Models

The admin ops router does not use SQLAlchemy database models. It introspects in-memory state of engine singletons at request time. The structures below document the API response shapes.

Dashboard Response

{
  "timestamp": "2026-03-01T12:00:00",
  "emergency_mode": false,                 // true if any system.running == false
  "systems": {
    "integrity-scan": {
      "name": "integrity-scan",
      "label": "Platform Integrity Monitor",
      "running": true,
      "schedule": "Every 30 min",
      "last_run": "2026-03-01T11:30:00",
      "error_count": 0,
      "success_rate": 100.0,
      "total_scans": 12,
      "pages_generated": 5,
      "pending_deploy": false
    },
    "soc-cycle": {
      "name": "soc-cycle",
      "label": "Autonomous SOC Engine",
      "running": true,
      "schedule": "4 concurrent loops (30min / 10s / 1hr / 5min)",
      "last_run": "2026-03-01T11:59:50",
      "error_count": 0,                    // escalated_to_human count
      "success_rate": 100.0,
      "alerts_processed": 847,
      "auto_resolved": 720
    }
    // ... 5 more systems
  }
}
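The `emergency_mode` flag in the response above follows directly from the per-system `running` values. A minimal sketch, assuming the helper name `summarize_dashboard` and the extra `stopped_count` and `stopped_labels` keys, which are hypothetical and included only to show what the emergency banner needs:

```python
from typing import Any, Dict, List


def summarize_dashboard(systems: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Derive emergency_mode from the running flag of every system."""
    stopped: List[str] = [
        status.get("label", slug)
        for slug, status in systems.items()
        if not status.get("running", False)  # a missing flag counts as stopped
    ]
    return {
        "emergency_mode": len(stopped) > 0,
        "stopped_count": len(stopped),   # hypothetical convenience field
        "stopped_labels": stopped,       # hypothetical convenience field
        "systems": systems,
    }
```

With every system running the flag stays false; a single stopped system is enough to flip it to true.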

Metrics Response

{
  "timestamp": "2026-03-01T12:00:00",
  "uptime_seconds": 3621.4,
  "memory_mb": 184.2,
  "memory_percent": 2.3,
  "cpu_percent": 1.8,
  "threads": 14,
  "active_ws_connections": 3,
  "db_pool": {
    "size": 5,
    "checked_in": 4,
    "checked_out": 1,
    "overflow": 0
  },
  "autonomous_systems_total": 7,
  "autonomous_systems_running": 7,
  "note": "psutil not installed — install for detailed metrics"  // present only when psutil is absent; memory and CPU fields are then null
}
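The optional psutil dependency could be handled along the lines of the sketch below. It covers only the process-level fields; `collect_process_metrics` and `_START` are hypothetical names, and measuring uptime from module import time is an assumption.

```python
import os
import threading
import time
from typing import Any, Dict

try:
    import psutil  # optional dependency
except ImportError:
    psutil = None

_START = time.monotonic()  # assumption: uptime is measured from module import


def collect_process_metrics() -> Dict[str, Any]:
    """Return process metrics, with null memory/CPU fields when psutil is absent."""
    metrics: Dict[str, Any] = {
        "uptime_seconds": round(time.monotonic() - _START, 1),
        "memory_mb": None,
        "memory_percent": None,
        "cpu_percent": None,
        "threads": threading.active_count(),
    }
    if psutil is None:
        metrics["note"] = "psutil not installed — install for detailed metrics"
        return metrics

    proc = psutil.Process(os.getpid())
    metrics["memory_mb"] = round(proc.memory_info().rss / (1024 * 1024), 1)
    metrics["memory_percent"] = round(proc.memory_percent(), 1)
    # interval=None is non-blocking: it compares against the previous call.
    metrics["cpu_percent"] = proc.cpu_percent(interval=None)
    return metrics
```

Guarding the import once at module level keeps the request handler free of repeated import attempts and makes the degraded response shape explicit.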

Activity Log Event

{
  "timestamp": "2026-03-01T11:45:00",
  "system": "integrity-scan",
  "action": "integrity_scan",
  "detail": "Scanned 47 routes, 32 pages, generated 2 pages",
  "result": "success"                      // success | failed | auto_resolved | rule_generated
}
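Combining per-engine event lists into one time-sorted feed might be done as in the sketch below. The function name `combine_activity` is hypothetical, and newest-first ordering is an assumption; the source only says the log is time-sorted with a configurable limit.

```python
from datetime import datetime
from itertools import chain
from typing import Any, Dict, List


def combine_activity(
    per_engine: Dict[str, List[Dict[str, Any]]], limit: int = 100
) -> List[Dict[str, Any]]:
    """Merge every engine's events, sort by timestamp, and truncate to limit."""
    merged = chain.from_iterable(per_engine.values())
    ordered = sorted(
        merged,
        key=lambda event: datetime.fromisoformat(event["timestamp"]),
        reverse=True,  # assumption: newest events first
    )
    return ordered[:limit]
```

Parsing the timestamps rather than comparing raw strings keeps the ordering correct even if engines emit slightly different ISO 8601 precision.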

Factory RequirementSubmission (POST body)

{
  "title": "Vulnerability Risk Matrix",
  "description": "Table of open CVEs grouped by CVSS score and affected asset",
  "page_type": "data_table",
  "route": "/vulnerabilities/risk",
  "api_endpoints": [
    { "method": "GET", "path": "/api/v1/vulnerabilities", "description": "List CVEs" }
  ],
  "data_fields": [
    { "name": "cve_id", "label": "CVE ID", "type": "string", "required": true },
    { "name": "cvss_score", "label": "CVSS", "type": "number" },
    { "name": "status", "label": "Status", "type": "enum",
      "values": ["open", "mitigated", "accepted"] }
  ],
  "features": ["search", "filter", "export", "pagination"],
  "refresh_interval": 30,
  "sidebar_section": "main",
  "sidebar_icon": "ShieldAlert",
  "priority": "high",
  "requested_by": "admin"
}
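Submission validation is described as checking route format, required fields, and template compatibility. A rough sketch of such a validator follows; the required-field list, the route pattern, and the function name are all assumptions, and only the six page type keys are taken from the template table.

```python
import re
from typing import Any, Dict, List

# Page type keys from the template table.
PAGE_TYPES = {"data_table", "dashboard", "form", "detail", "analytics", "settings"}

# Assumptions: which fields are mandatory and what a valid route looks like.
REQUIRED_FIELDS = ("title", "page_type", "route")
ROUTE_RE = re.compile(r"^(/[a-z0-9][a-z0-9_-]*)+$")


def validate_requirement(body: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the body passed."""
    errors: List[str] = []
    for field in REQUIRED_FIELDS:
        if not body.get(field):
            errors.append(f"missing required field: {field}")

    route = body.get("route")
    if route and not ROUTE_RE.match(route):
        errors.append(f"invalid route format: {route}")

    page_type = body.get("page_type")
    if page_type and page_type not in PAGE_TYPES:
        errors.append(f"unknown page_type: {page_type}")
    return errors
```

Returning all errors at once, instead of stopping at the first, lets the form on the factory sub-page highlight every problem in a single round trip.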

Prerequisites

Platform Integrity Monitor — app/services/platform_integrity.py — singleton integrity_monitor
Auto-Deploy Pipeline — app/services/auto_deploy.py — singleton auto_deploy
Autonomous SOC Engine — app/services/autonomous_soc.py — singleton autonomous_soc
Self-Healing Engine — app/services/self_healing.py — singleton self_healing_engine
ML Training Pipeline — app/ml/training_pipeline.py — singleton ml_pipeline
Threat Advisory Engine — app/services/threat_advisory_service.py — singleton advisory_engine
Frontend Factory Engine — app/services/frontend_factory.py — singleton frontend_factory; requires TEMPLATE_SCHEMAS dict and GENERATORS map
Component Registry — app/services/component_registry.py — dict COMPONENT_REGISTRY used by the component library endpoint
WebSocket Manager — app/routers/websocket.py — manager.active_connections for WS connection count in metrics
Database Engine — app/core/database.py — async_engine.pool for DB pool stats
psutil (optional) — Python package; if absent, memory and CPU fields return null and a note is included in the metrics response
FastAPI & asyncio — All trigger, stop, and restart handlers are async def; restart creates asyncio.create_task handles

UI Layout

Main Dashboard — `/admin-ops`

Emergency Mode Banner — Conditionally rendered (red border, bg-red-50) when any system is stopped. Shows count of stopped systems, their labels, and a shortcut "Restart All" button. Disappears once all systems are running again.
Header Row — Title "Admin Operations Center" with Settings2 icon (orange accent). Right-aligned action bar: Refresh button (slate border), Restart All (blue), Emergency Stop All (red). Emergency Stop All requires a browser confirm() dialog before firing.
Toast Notifications — Ephemeral success/error banners rendered below the header. Auto-dismiss after 4 seconds via setTimeout. Success banners are emerald, error banners are red.
System Status Grid — Responsive grid (1 col on mobile, 2 on md, 3 on xl). Each system card has: colored border (emerald when running, red when stopped), engine icon, label, schedule text, animated pulse dot with status label, a 3-column stats row (Last Run, Errors, Success%), optional error message box, and per-system Trigger/Stop or Trigger/Restart buttons.
Metrics Panel — Left 1/3 column card. Rows for: Uptime (formatted as Xd Xh Xm), Memory Usage (progress bar, thresholds: green <250 MB, amber 250–400 MB, red >400 MB), CPU %, WebSocket Connections, Systems Running (X/7). DB pool section (2x2 grid): Pool Size, In Use, Available, Overflow.
Activity Feed — Right 2/3 column card, max height 520px with overflow-y scroll. Each event row: system icon, orange system slug tag, action name badge, result badge (emerald/red/slate), detail text, and clock timestamp. Empty state shows a CheckCircle icon with initialization message.

Factory Sub-page — `/admin-ops/factory`

Stats Bar — 4 metric cards: Total Requirements, Generated, Deployed, Success Rate (percentage).
Requirement Table — Lists all requirements with status badges color-coded by lifecycle stage, page type icon, creation time, generation time, and per-row action buttons (Generate, Preview, Deploy, Delete).
New Requirement Form — Expandable panel with fields: title, description, page type selector with icon grid, route, API endpoints list (add/remove), data fields list (add/remove with type selector), features multi-select, refresh interval, sidebar section, sidebar icon, priority, and requested_by.
Code Preview Modal — Displays generated TSX in a dark monospace pre block with a copy-to-clipboard button and line count indicator.