Admin Operations Center
Overview
The Admin Operations Center is the master control plane for every autonomous system running inside ThreatOps. As the platform evolves toward a self-operating SOC, individual services run as long-lived background loops — integrity scans, ML retraining, autonomous SOC cycles, self-healing checks, advisory refreshes, and the frontend factory engine. Without a unified control surface, operators have no visibility into whether these systems are healthy, and no mechanism to intervene when something stalls.
This module aggregates the real-time status of all seven autonomous engines into a single dashboard, surfaces live process metrics (CPU, memory, WebSocket connections, database pool), and provides per-system action controls — trigger, stop, and restart — as well as platform-wide kill-switch and restart-all operations. A time-ordered activity feed consolidates events from every engine into one scrollable log. A sub-page at /admin-ops/factory extends this with a full lifecycle manager for the Frontend Factory Engine, covering requirement submission, TSX code generation, preview, and deployment.
What Was Proposed
- A centralized operator console for monitoring all autonomous and scheduled platform services in one view
- Per-system health cards showing running state, last execution timestamp, error count, and success rate
- Manual trigger capability for every system without requiring direct shell access to any pod
- An emergency stop mechanism able to halt any individual system or all systems simultaneously
- Restart controls to bring stopped systems back online without a full platform redeploy
- A unified, time-sorted activity log aggregating events from integrity scans, deploys, SOC decisions, healing actions, and ML retrains
- System-wide runtime metrics: uptime, memory, CPU, active WebSocket connections, DB pool stats
- Frontend Factory sub-page for managing dynamically generated Next.js page requirements through a submit-generate-preview-deploy workflow
What's Built
| Feature | Status |
|---|---|
| Aggregated system dashboard (7 autonomous engines) | ✓ Complete |
| Per-system health cards (status, schedule, last run, errors, success rate) | ✓ Complete |
| Manual trigger for every registered system | ✓ Complete |
| Emergency stop — individual and kill-switch-all | ✓ Complete |
| Restart — individual and restart-all | ✓ Complete |
| Combined activity log across all systems (time-sorted, configurable limit) | ✓ Complete |
| System-wide metrics endpoint (uptime, memory, CPU, WS connections, DB pool) | ✓ Complete |
| Frontend main dashboard with 8-second auto-refresh | ✓ Complete |
| Emergency mode banner (shown when any system is stopped) | ✓ Complete |
| Toast notification system for action results (4-second auto-dismiss) | ✓ Complete |
| Frontend Factory sub-page at /admin-ops/factory | ✓ Complete |
| Factory: requirement submission, generate, preview, deploy workflow | ✓ Complete |
| Factory: template browser with 6 page types | ✓ Complete |
| Factory: component library listing | ✓ Complete |
Architecture
Admin Router
File: platform/api/app/routers/admin_ops.py — Prefix: /api/v1/admin
The router uses lazy import functions for all seven engine singletons to avoid circular-import issues at startup. Each engine is accessed through a private getter (e.g., _get_integrity_monitor()). The SYSTEM_REGISTRY dict maps slug keys to their display label, getter function reference, and schedule description string. A private _background_tasks dict tracks the asyncio.Task handle for each engine, enabling the router to cancel and recreate tasks during stop/restart operations without requiring the services module to expose task management.
The core helper _system_status(name: str) branches on the system name, introspects engine-specific state attributes, and returns a normalized dict containing running, last_run, error_count, success_rate, and any engine-specific extras (e.g., pending_deploy, buffer_size, total_deploys). Success rate is computed as (total - errors) / total * 100.
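A minimal sketch of the success-rate formula. The zero-run guard and the rounding to one decimal are assumptions; the source states only the formula itself.

```python
def success_rate(total: int, errors: int) -> float:
    """Return (total - errors) / total * 100 as a percentage."""
    if total <= 0:
        # Assumption: a system with no recorded runs is shown as 100%.
        return 100.0
    return round((total - errors) / total * 100, 1)
```

For example, 12 scans with no errors gives 100.0, and 10 runs with 1 error gives 90.0.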
Frontend Factory Router
File: platform/api/app/routers/frontend_factory.py — Prefix: /api/v1/factory
Manages the full lifecycle of page requirements from submission through code generation, preview, and file deployment. The FrontendFactoryEngine singleton (app/services/frontend_factory.py) compiles Next.js TSX from template generators, validates requirements, writes output files to the frontend source tree, and signals the auto-deploy pipeline. A COMPONENT_REGISTRY dict (app/services/component_registry.py) catalogues available UI building blocks accessible through the component library endpoint.
Autonomous Systems Registry
Seven systems are registered in SYSTEM_REGISTRY at module load time. Each entry carries a slug key (used verbatim in API paths), a human-readable label, a lazy getter returning the engine singleton, and a schedule description. The _system_status helper branches on the slug to extract engine-specific fields.
Platform Integrity Monitor
integrity-scan | Every 30 min
Scans registered API routes and frontend pages, detects missing endpoints, and flags the auto-deploy pipeline when new pages need to be generated or deployed. Extras: total_scans, pages_generated, pending_deploy.
Auto-Deploy Pipeline
deploy | On demand (watches integrity monitor)
Triggers container image builds and Kubernetes rolling updates when the integrity monitor signals new content. Tracks a deploy_history list and an is_deploying boolean. Errors are counted as failed deploy entries.
Autonomous SOC Engine
soc-cycle | 4 concurrent loops: 30min / 10s / 1hr / 5min
Four asyncio loops covering alert triage, escalation, SOAR playbook execution, and scheduled threat hunting. The error count is taken from escalated_to_human. A manual trigger runs get_stats() and returns the current cycle statistics.
Self-Healing Engine
heal-check | 4 concurrent loops: 30s / 60s / 120s / 1hr
Monitors five subsystems (database, Redis, ML models, advisory engine, WebSocket). Executes automated remediation when health checks fail. Tracks circuit breaker failure counts and a healing_log. A manual trigger runs all five health checks via asyncio.gather.
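The manual trigger's concurrent health check can be sketched roughly as below. The check functions and `run_all_checks` are hypothetical, and `return_exceptions=True` is an assumption made so that one failing check does not hide the other four results.

```python
import asyncio
from typing import Awaitable, Callable, Dict

HealthCheck = Callable[[], Awaitable[bool]]


async def _healthy() -> bool:
    # Hypothetical check that succeeds.
    await asyncio.sleep(0)
    return True


async def _unreachable() -> bool:
    # Hypothetical check that raises, e.g. a refused connection.
    raise ConnectionError("connection refused")


DEMO_CHECKS: Dict[str, HealthCheck] = {
    "database": _healthy,
    "redis": _unreachable,
    "ml_models": _healthy,
    "advisory_engine": _healthy,
    "websocket": _healthy,
}


async def run_all_checks(checks: Dict[str, HealthCheck]) -> Dict[str, str]:
    """Run every check concurrently and summarize each outcome."""
    names = list(checks)
    results = await asyncio.gather(
        *(checks[name]() for name in names), return_exceptions=True
    )
    report: Dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            report[name] = f"error: {result}"
        else:
            report[name] = "healthy" if result else "unhealthy"
    return report
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping against `names` is safe.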
ML Training Pipeline
ml-retrain | Auto-retrain at 100 feedback samples
Maintains three sklearn models: AlertClassifier (RandomForest), AnomalyDetector (IsolationForest), ThreatScorer (GradientBoosting). Always reported as running. Exposes training_buffer, retrain_threshold, total_retrains. Trigger calls trigger_retrain().
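The threshold-based retrain can be illustrated with a small buffer class. `FeedbackBuffer` and `add_feedback` are hypothetical names, model fitting is omitted, and clearing the buffer after a retrain is an assumption; only `training_buffer`, `retrain_threshold`, `total_retrains`, and `trigger_retrain()` come from the description above.

```python
from typing import Any, Dict, List


class FeedbackBuffer:
    """Hypothetical sketch of the auto-retrain trigger logic."""

    def __init__(self, retrain_threshold: int = 100) -> None:
        self.retrain_threshold = retrain_threshold
        self.training_buffer: List[Dict[str, Any]] = []
        self.total_retrains = 0

    def add_feedback(self, sample: Dict[str, Any]) -> bool:
        """Buffer one sample; return True if it triggered a retrain."""
        self.training_buffer.append(sample)
        if len(self.training_buffer) >= self.retrain_threshold:
            self.trigger_retrain()
            return True
        return False

    def trigger_retrain(self) -> None:
        # Fitting the three models would happen here (omitted).
        # Assumption: the buffer is emptied once it has been consumed.
        self.training_buffer.clear()
        self.total_retrains += 1
```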
Threat Advisory Engine
advisory-refresh | Every 30 min
Pulls threat intelligence feeds and synthesizes actionable advisories. Managed as a background task in main.py, so always considered running. Exposes total_advisories and feed_count. Trigger calls refresh_advisories().
Frontend Factory Engine
frontend-factory | Continuous 10-second queue check
Processes the page requirement queue and generates Next.js TSX files from templates. Exposes total_requirements, total_generated, total_deployed, success_rate_pct via get_stats(). Has its own sub-page at /admin-ops/factory.
API Endpoints
Admin Router — /api/v1/admin
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/admin/dashboard | Aggregated status of all 7 autonomous systems; sets emergency_mode: true if any system is stopped |
| POST | /api/v1/admin/trigger/{system_name} | Manual trigger for a system — runs one cycle or returns current stats immediately; returns system-specific result payload |
| GET | /api/v1/admin/activity-log | Combined time-sorted event log from all engines; optional ?limit=100 query param |
| GET | /api/v1/admin/system/{name}/detail | Full engine status for one system; includes deep fields like recent_decisions (SOC) or performance_log (ML) |
| POST | /api/v1/admin/emergency-stop/{name} | Stop a specific system — calls engine.stop() if available, else sets _running=False, cancels asyncio task |
| POST | /api/v1/admin/emergency-stop-all | Kill switch — iterates all 7 systems and stops each; returns per-system result map |
| POST | /api/v1/admin/restart/{name} | Restart a system — stops first if running, then creates new asyncio.Task via engine.start() |
| POST | /api/v1/admin/restart-all | Restart all systems; returns per-system result map |
| GET | /api/v1/admin/metrics | System-wide metrics: uptime, memory (MB + %), CPU %, threads, WS connections, DB pool, systems running count |
Frontend Factory Router — /api/v1/factory
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/factory/requirements | Submit new page requirement to queue; validates before enqueuing; returns RequirementResponse |
| GET | /api/v1/factory/requirements | List all requirements with current status |
| GET | /api/v1/factory/requirements/{req_id} | Full requirement detail including api_endpoints, data_fields, features, has_code flag |
| POST | /api/v1/factory/requirements/{req_id}/generate | Trigger TSX generation; only valid when status is pending or failed |
| GET | /api/v1/factory/requirements/{req_id}/preview | Returns full tsx_code, api_wiring object, sidebar_entry config, and code_lines count |
| POST | /api/v1/factory/requirements/{req_id}/deploy | Write generated file to disk and signal auto-deploy pipeline |
| DELETE | /api/v1/factory/requirements/{req_id} | Cancel and remove a requirement from the queue |
| GET | /api/v1/factory/templates | List all page templates with label, description, config_keys, and example_config |
| GET | /api/v1/factory/templates/{type}/schema | Full configuration schema for a specific template type |
| POST | /api/v1/factory/generate-preview | One-shot generate without saving to queue; returns tsx_code + complexity estimate + validation result |
| GET | /api/v1/factory/stats | Factory stats: total_requirements, total_generated, total_deployed, total_failed, avg_generation_time_ms, success_rate |
| GET | /api/v1/factory/component-library | List registered UI components with name, description, and props |
Frontend Routes
| Route | File | Description |
|---|---|---|
| /admin-ops | src/app/admin-ops/page.tsx | Main operations dashboard — system status grid, metrics panel, activity feed; auto-refreshes every 8s |
| /admin-ops/factory | src/app/admin-ops/factory/page.tsx | Frontend Factory sub-page — requirement management, template browser, generate/preview/deploy workflow |
Both pages are Next.js App Router client components ("use client"). The main dashboard calls GET /api/v1/admin/dashboard and GET /api/v1/admin/metrics in parallel, followed by a separate call to GET /api/v1/admin/activity-log that fails gracefully if unavailable. API calls are made via the shared api client from @/lib/api-client.
Frontend Factory Sub-page
The factory sub-page at /admin-ops/factory provides a full lifecycle UI for the Frontend Factory Engine. It can be navigated to directly or linked from the frontend-factory system card on the main dashboard.
Page Type Templates
| Type Key | Label | Description |
|---|---|---|
| data_table | Data Table | Sortable, filterable table with search and pagination |
| dashboard | Dashboard | Metric cards, charts, and summary widgets |
| form | Form Page | Create/edit forms with field validation |
| detail | Detail View | Single entity detail page with related data panels |
| analytics | Analytics Page | Charts, graphs, and statistical analysis panels |
| settings | Settings Page | Configuration and preferences management form |
Requirement Lifecycle States
Requirements move through: pending → generating → generated → deployed. A failed attempt lands in the failed state, from which generation can be re-triggered.
- Submit — POST to /api/v1/factory/requirements; validation checks route format, required fields, and template compatibility
- Generate — POST to /api/v1/factory/requirements/{id}/generate; factory selects the matching template generator and compiles TSX; records generation_time_ms
- Preview — GET /api/v1/factory/requirements/{id}/preview; returns full TSX code, API wiring object, and sidebar entry config without side effects
- Deploy — POST to /api/v1/factory/requirements/{id}/deploy; writes the .tsx file to the frontend source tree and signals the auto-deploy pipeline to rebuild the container
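The lifecycle above can be modeled as a small state machine. The transition table is inferred from the text (generation is valid from pending or failed; failures are assumed to arise during generation), and `Status`, `ALLOWED`, and `advance` are hypothetical names rather than the engine's actual types.

```python
from enum import Enum
from typing import Dict, Set


class Status(str, Enum):
    PENDING = "pending"
    GENERATING = "generating"
    GENERATED = "generated"
    DEPLOYED = "deployed"
    FAILED = "failed"


# Assumed transition table, inferred from the documented lifecycle.
ALLOWED: Dict[Status, Set[Status]] = {
    Status.PENDING: {Status.GENERATING},
    Status.GENERATING: {Status.GENERATED, Status.FAILED},
    Status.GENERATED: {Status.DEPLOYED},
    Status.DEPLOYED: set(),
    Status.FAILED: {Status.GENERATING},
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Rejecting invalid transitions in one place mirrors the rule that the generate endpoint is only valid from pending or failed.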
Data Models
The admin ops router does not use SQLAlchemy database models. It introspects in-memory state of engine singletons at request time. The structures below document the API response shapes.
Dashboard Response
{
"timestamp": "2026-03-01T12:00:00",
"emergency_mode": false, // true if any system.running == false
"systems": {
"integrity-scan": {
"name": "integrity-scan",
"label": "Platform Integrity Monitor",
"running": true,
"schedule": "Every 30 min",
"last_run": "2026-03-01T11:30:00",
"error_count": 0,
"success_rate": 100.0,
"total_scans": 12,
"pages_generated": 5,
"pending_deploy": false
},
"soc-cycle": {
"name": "soc-cycle",
"label": "Autonomous SOC Engine",
"running": true,
"schedule": "4 concurrent loops (30min / 10s / 1hr / 5min)",
"last_run": "2026-03-01T11:59:50",
"error_count": 0, // escalated_to_human count
"success_rate": 100.0,
"alerts_processed": 847,
"auto_resolved": 720
}
// ... 5 more systems
}
}
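How emergency_mode could be derived from the per-system statuses, as a sketch. `build_dashboard` is a hypothetical helper, and the timezone-aware timestamp is an assumption (the sample response shows a naive ISO string).

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_dashboard(systems: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap per-system statuses and flag emergency mode if any is stopped."""
    stopped = [
        name for name, status in systems.items()
        if not status.get("running", False)
    ]
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "emergency_mode": bool(stopped),
        "systems": systems,
    }
```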
Metrics Response
{
"timestamp": "2026-03-01T12:00:00",
"uptime_seconds": 3621.4,
"memory_mb": 184.2,
"memory_percent": 2.3,
"cpu_percent": 1.8,
"threads": 14,
"active_ws_connections": 3,
"db_pool": {
"size": 5,
"checked_in": 4,
"checked_out": 1,
"overflow": 0
},
"autonomous_systems_total": 7,
"autonomous_systems_running": 7,
"note": "psutil not installed — install for detailed metrics" // when psutil absent
}
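The optional psutil dependency can be handled with a guarded import, sketched below. `process_metrics` is a hypothetical helper that covers only the uptime, memory, and CPU fields, and the note text is abbreviated from the sample response.

```python
import time
from typing import Any, Dict

try:
    import psutil  # optional dependency
except ImportError:
    psutil = None

_START = time.monotonic()


def process_metrics() -> Dict[str, Any]:
    """Collect process metrics, degrading to nulls when psutil is absent."""
    metrics: Dict[str, Any] = {
        "uptime_seconds": round(time.monotonic() - _START, 1),
        "memory_mb": None,
        "memory_percent": None,
        "cpu_percent": None,
    }
    if psutil is None:
        # Abbreviated; the documented response carries a longer message.
        metrics["note"] = "psutil not installed"
        return metrics
    proc = psutil.Process()
    metrics["memory_mb"] = round(proc.memory_info().rss / (1024 * 1024), 1)
    metrics["memory_percent"] = round(proc.memory_percent(), 1)
    # interval=None is non-blocking: it compares against the previous call.
    metrics["cpu_percent"] = psutil.cpu_percent(interval=None)
    return metrics
```

The response keeps the same keys in both cases, so a client can render the panel without checking which fields exist.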
Activity Log Event
{
"timestamp": "2026-03-01T11:45:00",
"system": "integrity-scan",
"action": "integrity_scan",
"detail": "Scanned 47 routes, 32 pages, generated 2 pages",
"result": "success" // success | failed | auto_resolved | rule_generated
}
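Combining per-engine events into one feed can be sketched as a merge, sort, and slice. `combine_activity` is a hypothetical helper; newest-first ordering is an assumption, and sorting by the raw string relies on every timestamp using the same ISO-8601 format.

```python
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]


def combine_activity(sources: Iterable[List[Event]], limit: int = 100) -> List[Event]:
    """Merge event lists from all engines, newest first, capped at limit."""
    events = [event for source in sources for event in source]
    # Same-format ISO-8601 strings sort chronologically as plain text.
    events.sort(key=lambda event: event["timestamp"], reverse=True)
    return events[:limit]
```

For example, merging one integrity-scan event at 11:45 with SOC events at 11:59 and 11:30 yields the 11:59 event first.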
Factory RequirementSubmission (POST body)
{
"title": "Vulnerability Risk Matrix",
"description": "Table of open CVEs grouped by CVSS score and affected asset",
"page_type": "data_table",
"route": "/vulnerabilities/risk",
"api_endpoints": [
{ "method": "GET", "path": "/api/v1/vulnerabilities", "description": "List CVEs" }
],
"data_fields": [
{ "name": "cve_id", "label": "CVE ID", "type": "string", "required": true },
{ "name": "cvss_score", "label": "CVSS", "type": "number" },
{ "name": "status", "label": "Status", "type": "enum",
"values": ["open", "mitigated", "accepted"] }
],
"features": ["search", "filter", "export", "pagination"],
"refresh_interval": 30,
"sidebar_section": "main",
"sidebar_icon": "ShieldAlert",
"priority": "high",
"requested_by": "admin"
}
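A rough sketch of the submission checks (route format, required fields, template compatibility). The exact rules are not documented, so the route pattern, the required-field list, and `validate_requirement` itself are all assumptions; only the six page type keys come from the template table.

```python
import re
from typing import Any, Dict, List

PAGE_TYPES = {"data_table", "dashboard", "form", "detail", "analytics", "settings"}
REQUIRED_FIELDS = ("title", "page_type", "route")  # assumed minimum set
# Assumed format: leading slash, lowercase segments of letters, digits, hyphens.
ROUTE_PATTERN = re.compile(r"^/[a-z0-9-]+(/[a-z0-9-]+)*$")


def validate_requirement(payload: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors: List[str] = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            errors.append(f"missing required field: {field}")
    route = payload.get("route")
    if route and not ROUTE_PATTERN.match(route):
        errors.append(f"invalid route format: {route}")
    page_type = payload.get("page_type")
    if page_type and page_type not in PAGE_TYPES:
        errors.append(f"unknown page_type: {page_type}")
    return errors
```

Returning every error at once, rather than failing on the first, lets a form highlight all invalid fields in a single round trip.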
Prerequisites
- Platform Integrity Monitor — app/services/platform_integrity.py — singleton integrity_monitor
- Auto-Deploy Pipeline — app/services/auto_deploy.py — singleton auto_deploy
- Autonomous SOC Engine — app/services/autonomous_soc.py — singleton autonomous_soc
- Self-Healing Engine — app/services/self_healing.py — singleton self_healing_engine
- ML Training Pipeline — app/ml/training_pipeline.py — singleton ml_pipeline
- Threat Advisory Engine — app/services/threat_advisory_service.py — singleton advisory_engine
- Frontend Factory Engine — app/services/frontend_factory.py — singleton frontend_factory; requires TEMPLATE_SCHEMAS dict and GENERATORS map
- Component Registry — app/services/component_registry.py — dict COMPONENT_REGISTRY used by the component library endpoint
- WebSocket Manager — app/routers/websocket.py — manager.active_connections for WS connection count in metrics
- Database Engine — app/core/database.py — async_engine.pool for DB pool stats
- psutil (optional) — Python package; if absent, memory and CPU fields return null and a note is included in the metrics response
- FastAPI & asyncio — All trigger, stop, and restart handlers are async def; restart creates asyncio.create_task handles
UI Layout
Main Dashboard — /admin-ops
- Emergency Mode Banner — Conditionally rendered (red border, bg-red-50) when any system is stopped. Shows count of stopped systems, their labels, and a shortcut "Restart All" button. Disappears once all systems are running again.
- Header Row — Title "Admin Operations Center" with Settings2 icon (orange accent). Right-aligned action bar: Refresh button (slate border), Restart All (blue), Emergency Stop All (red). Emergency Stop All requires a browser confirm() dialog before firing.
- Toast Notifications — Ephemeral success/error banners rendered below the header. Auto-dismiss after 4 seconds via setTimeout. Success banners are emerald, error banners are red.
- System Status Grid — Responsive grid (1 col on mobile, 2 on md, 3 on xl). Each system card has: colored border (emerald when running, red when stopped), engine icon, label, schedule text, animated pulse dot with status label, a 3-column stats row (Last Run, Errors, Success%), optional error message box, and per-system Trigger/Stop or Trigger/Restart buttons.
- Metrics Panel — Left 1/3 column card. Rows for: Uptime (formatted as Xd Xh Xm), Memory Usage (progress bar, thresholds: green <250 MB, amber 250–400 MB, red >400 MB), CPU %, WebSocket Connections, Systems Running (X/7). DB pool section (2x2 grid): Pool Size, In Use, Available, Overflow.
- Activity Feed — Right 2/3 column card, max height 520px with overflow-y scroll. Each event row: system icon, orange system slug tag, action name badge, result badge (emerald/red/slate), detail text, and clock timestamp. Empty state shows a CheckCircle icon with initialization message.
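The Metrics Panel's uptime format and memory thresholds can be expressed as two small functions. The dashboard itself is a TypeScript client component; Python is used here only to illustrate the arithmetic. The function names are hypothetical, and whether zero-valued units are displayed is an assumption.

```python
def format_uptime(seconds: float) -> str:
    """Format a duration in seconds as 'Xd Xh Xm'."""
    total_minutes = int(seconds) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}d {hours}h {minutes}m"


def memory_color(memory_mb: float) -> str:
    """Map memory use to the bar color: <250 green, 250-400 amber, >400 red."""
    if memory_mb < 250:
        return "green"
    if memory_mb <= 400:
        return "amber"
    return "red"
```

With the sample metrics response, `format_uptime(3621.4)` gives `0d 1h 0m` and `memory_color(184.2)` gives `green`.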
Factory Sub-page — /admin-ops/factory
- Stats Bar — 4 metric cards: Total Requirements, Generated, Deployed, Success Rate (percentage).
- Requirement Table — Lists all requirements with status badges color-coded by lifecycle stage, page type icon, creation time, generation time, and per-row action buttons (Generate, Preview, Deploy, Delete).
- New Requirement Form — Expandable panel with fields: title, description, page type selector with icon grid, route, API endpoints list (add/remove), data fields list (add/remove with type selector), features multi-select, refresh interval, sidebar section, sidebar icon, priority, and requested_by.
- Code Preview Modal — Displays generated TSX in a dark monospace pre block with a copy-to-clipboard button and line count indicator.