Admin Operations Center

Status: Complete

Overview

The Admin Operations Center is the master control plane for every autonomous system running inside ThreatOps. As the platform evolves toward a self-operating SOC, individual services run as long-lived background loops: integrity scans, auto-deploy, ML retraining, autonomous SOC cycles, self-healing checks, advisory refreshes, and the frontend factory engine. Without a unified control surface, operators have no visibility into whether these systems are healthy and no mechanism to intervene when something stalls.

This module aggregates the real-time status of all seven autonomous engines into a single dashboard, surfaces live process metrics (CPU, memory, WebSocket connections, database pool), and provides per-system action controls — trigger, stop, and restart — as well as platform-wide kill-switch and restart-all operations. A time-ordered activity feed consolidates events from every engine into one scrollable log. A sub-page at /admin-ops/factory extends this with a full lifecycle manager for the Frontend Factory Engine, covering requirement submission, TSX code generation, preview, and deployment.

What Was Proposed  |  What's Built
Aggregated system dashboard (7 autonomous engines)  |  ✓ Complete
Per-system health cards (status, schedule, last run, errors, success rate)  |  ✓ Complete
Manual trigger for every registered system  |  ✓ Complete
Emergency stop (individual and kill-switch-all)  |  ✓ Complete
Restart (individual and restart-all)  |  ✓ Complete
Combined activity log across all systems (time-sorted, configurable limit)  |  ✓ Complete
System-wide metrics endpoint (uptime, memory, CPU, WS connections, DB pool)  |  ✓ Complete
Frontend main dashboard with 8-second auto-refresh  |  ✓ Complete
Emergency mode banner (shown when any system is stopped)  |  ✓ Complete
Toast notification system for action results (4-second auto-dismiss)  |  ✓ Complete
Frontend Factory sub-page at /admin-ops/factory  |  ✓ Complete
Factory: requirement submission, generate, preview, deploy workflow  |  ✓ Complete
Factory: template browser with 6 page types  |  ✓ Complete
Factory: component library listing  |  ✓ Complete

Architecture

Admin Router

File: platform/api/app/routers/admin_ops.py — Prefix: /api/v1/admin

The router uses lazy import functions for all seven engine singletons to avoid circular-import issues at startup. Each engine is accessed through a private getter (e.g., _get_integrity_monitor()). The SYSTEM_REGISTRY dict maps slug keys to their display label, getter function reference, and schedule description string. A private _background_tasks dict tracks the asyncio.Task handle for each engine, enabling the router to cancel and recreate tasks during stop/restart operations without requiring the services module to expose task management.

The core helper _system_status(name: str) branches on the system name, introspects engine-specific state attributes, and returns a normalized dict containing running, last_run, error_count, success_rate, and any engine-specific extras (e.g., pending_deploy, buffer_size, total_deploys). Success rate is computed as (total - errors) / total * 100.

Frontend Factory Router

File: platform/api/app/routers/frontend_factory.py — Prefix: /api/v1/factory

Manages the full lifecycle of page requirements from submission through code generation, preview, and file deployment. The FrontendFactoryEngine singleton (app/services/frontend_factory.py) compiles Next.js TSX from template generators, validates requirements, writes output files to the frontend source tree, and signals the auto-deploy pipeline. A COMPONENT_REGISTRY dict (app/services/component_registry.py) catalogues available UI building blocks accessible through the component library endpoint.
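One way to picture template-driven generation is a dispatch table from `page_type` to a generator function. This sketch assumes each generator takes a title and a list of field names and returns TSX source as a string; `TEMPLATE_GENERATORS`, `compile_tsx`, and the emitted markup are illustrative rather than the engine's actual templates. Only two of the six page types are wired up, and user-supplied text is not escaped.

```python
from typing import Callable, Dict, List

TemplateFn = Callable[[str, List[str]], str]


def _wrap(title: str, body: List[str]) -> str:
    """Wrap body lines in a minimal Next.js client component."""
    return "\n".join([
        '"use client";',
        "",
        "export default function Page() {",
        "  return (",
        "    <div>",
        f"      <h1>{title}</h1>",
        *body,
        "    </div>",
        "  );",
        "}",
        "",
    ])


def _gen_data_table(title: str, fields: List[str]) -> str:
    headers = "".join(f"<th>{name}</th>" for name in fields)
    return _wrap(title, [f"      <table><thead><tr>{headers}</tr></thead></table>"])


def _gen_dashboard(title: str, fields: List[str]) -> str:
    cards = [f'      <div className="metric-card">{name}</div>' for name in fields]
    return _wrap(title, cards)


TEMPLATE_GENERATORS: Dict[str, TemplateFn] = {
    "data_table": _gen_data_table,
    "dashboard": _gen_dashboard,
}


def compile_tsx(page_type: str, title: str, fields: List[str]) -> str:
    """Pick the generator for page_type and compile the page source."""
    try:
        generator = TEMPLATE_GENERATORS[page_type]
    except KeyError:
        raise ValueError(f"unsupported page_type: {page_type!r}") from None
    return generator(title, fields)
```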

Autonomous Systems Registry

Seven systems are registered in SYSTEM_REGISTRY at module load time. Each entry carries a slug key (used verbatim in API paths), a human-readable label, a lazy getter returning the engine singleton, and a schedule description. The _system_status helper branches on the slug to extract engine-specific fields.

Platform Integrity Monitor

integrity-scan  |  Every 30 min

Scans registered API routes and frontend pages, detects missing endpoints, and flags the auto-deploy pipeline when new pages need to be generated or deployed. Extras: total_scans, pages_generated, pending_deploy.

Auto-Deploy Pipeline

deploy  |  On demand (watches integrity monitor)

Triggers container image builds and Kubernetes rolling updates when the integrity monitor signals new content. Tracks a deploy_history list and an is_deploying boolean; the error count is the number of failed entries in deploy_history.

Autonomous SOC Engine

soc-cycle  |  4 concurrent loops: 30min / 10s / 1hr / 5min

Four asyncio loops covering alert triage, escalation, SOAR playbook execution, and scheduled threat hunting. The reported error count is the escalated_to_human counter. Triggering calls get_stats() and returns the current cycle statistics.
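An engine that runs several independent loops (as the SOC and self-healing engines do) can be sketched as one coroutine per loop, combined with `asyncio.gather`. `LoopEngine`, the loop names, and the intervals in the test are hypothetical; production intervals would be on the order of those listed above, such as 10 s or 30 min.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict

logger = logging.getLogger(__name__)


async def _periodic(name: str, interval_s: float,
                    job: Callable[[], Awaitable[None]],
                    is_running: Callable[[], bool]) -> None:
    """Run `job` every `interval_s` seconds until `is_running()` returns False."""
    while is_running():
        try:
            await job()
        except Exception:
            # A failed cycle is logged, not fatal. CancelledError derives from
            # BaseException (Python 3.8+), so task cancellation still propagates.
            logger.exception("loop %s failed", name)
        await asyncio.sleep(interval_s)


class LoopEngine:
    """Hypothetical engine owning several independent periodic loops."""

    def __init__(self, intervals: Dict[str, float]) -> None:
        self._running = False
        self.intervals = intervals          # loop name -> seconds between cycles
        self.cycles = {name: 0 for name in intervals}

    def _job(self, name: str) -> Callable[[], Awaitable[None]]:
        async def run_once() -> None:
            self.cycles[name] += 1          # real work (triage, hunting, ...) goes here
        return run_once

    async def start(self) -> None:
        self._running = True
        await asyncio.gather(*(
            _periodic(name, secs, self._job(name), lambda: self._running)
            for name, secs in self.intervals.items()
        ))

    def stop(self) -> None:
        self._running = False               # each loop exits after its current sleep
```

Because each loop only checks the flag between sleeps, `stop()` takes effect within one interval; cancelling the task that runs `start()` ends all loops immediately.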

Self-Healing Engine

heal-check  |  4 concurrent loops: 30s / 60s / 120s / 1hr

Monitors five subsystems (database, Redis, ML models, advisory engine, WebSocket). Executes automated remediation when health checks fail. Tracks circuit breaker failure counts and a healing_log. Trigger runs all five health checks via asyncio.gather.
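Running the health checks concurrently can look like the sketch below; `return_exceptions=True` keeps one crashing probe from hiding the other results. The check callables, the report strings, and the `failure_counts` dict standing in for circuit-breaker counters are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Dict

HealthCheck = Callable[[], Awaitable[bool]]


async def run_health_checks(checks: Dict[str, HealthCheck],
                            failure_counts: Dict[str, int]) -> Dict[str, str]:
    """Run all checks at once and update per-subsystem consecutive-failure counts."""
    names = list(checks)
    results = await asyncio.gather(*(checks[n]() for n in names),
                                   return_exceptions=True)
    report: Dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            report[name] = f"error: {result!r}"
        else:
            report[name] = "healthy" if result else "unhealthy"
        healthy = report[name] == "healthy"
        # Reset on success, increment on failure (circuit-breaker style counter).
        failure_counts[name] = 0 if healthy else failure_counts.get(name, 0) + 1
    return report
```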

ML Training Pipeline

ml-retrain  |  Auto-retrain at 100 feedback samples

Maintains three sklearn models: AlertClassifier (RandomForest), AnomalyDetector (IsolationForest), ThreatScorer (GradientBoosting). Always reported as running. Exposes training_buffer, retrain_threshold, total_retrains. Trigger calls trigger_retrain().
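Threshold-triggered retraining reduces to a buffer and a counter. This sketch reuses the names listed above (`training_buffer`, `retrain_threshold`, `total_retrains`, `trigger_retrain()`), but the class itself is hypothetical, clearing the buffer after each retrain is an assumption, and the actual model fitting is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FeedbackBuffer:
    """Hypothetical sketch of retraining once enough feedback has accumulated."""
    retrain_threshold: int = 100
    training_buffer: List[Dict[str, Any]] = field(default_factory=list)
    total_retrains: int = 0

    def add_feedback(self, sample: Dict[str, Any]) -> bool:
        """Buffer one feedback sample; return True if it triggered a retrain."""
        self.training_buffer.append(sample)
        if len(self.training_buffer) >= self.retrain_threshold:
            self.trigger_retrain()
            return True
        return False

    def trigger_retrain(self) -> None:
        # The real pipeline would refit AlertClassifier, AnomalyDetector and
        # ThreatScorer here; this sketch only counts the retrain and clears the buffer.
        self.total_retrains += 1
        self.training_buffer.clear()
```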

Threat Advisory Engine

advisory-refresh  |  Every 30 min

Pulls threat intelligence feeds and synthesizes actionable advisories. Managed as a background task in main.py, so always considered running. Exposes total_advisories and feed_count. Trigger calls refresh_advisories().

Frontend Factory Engine

frontend-factory  |  Continuous 10-second queue check

Processes the page requirement queue and generates Next.js TSX files from templates. Exposes total_requirements, total_generated, total_deployed, success_rate_pct via get_stats(). Has its own sub-page at /admin-ops/factory.

API Endpoints

Admin Router — /api/v1/admin

Method  |  Path  |  Description
GET  |  /api/v1/admin/dashboard  |  Aggregated status of all 7 autonomous systems; sets emergency_mode: true if any system is stopped
POST  |  /api/v1/admin/trigger/{system_name}  |  Manual trigger for a system: runs one cycle or returns current stats immediately, and returns a system-specific result payload
GET  |  /api/v1/admin/activity-log  |  Combined time-sorted event log from all engines; optional ?limit=100 query param
GET  |  /api/v1/admin/system/{name}/detail  |  Full engine status for one system, including deep fields such as recent_decisions (SOC) or performance_log (ML)
POST  |  /api/v1/admin/emergency-stop/{name}  |  Stop a specific system: calls engine.stop() if available, otherwise sets _running=False, and cancels its asyncio task
POST  |  /api/v1/admin/emergency-stop-all  |  Kill switch: iterates over all 7 systems and stops each; returns a per-system result map
POST  |  /api/v1/admin/restart/{name}  |  Restart a system: stops it first if running, then creates a new asyncio.Task via engine.start()
POST  |  /api/v1/admin/restart-all  |  Restart all systems; returns a per-system result map
GET  |  /api/v1/admin/metrics  |  System-wide metrics: uptime, memory (MB and %), CPU %, threads, WS connections, DB pool, and running-systems count

Frontend Factory Router — /api/v1/factory

Method  |  Path  |  Description
POST  |  /api/v1/factory/requirements  |  Submit a new page requirement to the queue; validates before enqueuing; returns RequirementResponse
GET  |  /api/v1/factory/requirements  |  List all requirements with current status
GET  |  /api/v1/factory/requirements/{req_id}  |  Full requirement detail, including api_endpoints, data_fields, features, and the has_code flag
POST  |  /api/v1/factory/requirements/{req_id}/generate  |  Trigger TSX generation; only valid when status is pending or failed
GET  |  /api/v1/factory/requirements/{req_id}/preview  |  Returns the full tsx_code, api_wiring object, sidebar_entry config, and code_lines count
POST  |  /api/v1/factory/requirements/{req_id}/deploy  |  Write the generated file to disk and signal the auto-deploy pipeline
DELETE  |  /api/v1/factory/requirements/{req_id}  |  Cancel and remove a requirement from the queue
GET  |  /api/v1/factory/templates  |  List all page templates with label, description, config_keys, and example_config
GET  |  /api/v1/factory/templates/{type}/schema  |  Full configuration schema for a specific template type
POST  |  /api/v1/factory/generate-preview  |  One-shot generation without saving to the queue; returns tsx_code, a complexity estimate, and the validation result
GET  |  /api/v1/factory/stats  |  Factory stats: total_requirements, total_generated, total_deployed, total_failed, avg_generation_time_ms, success_rate
GET  |  /api/v1/factory/component-library  |  List registered UI components with name, description, and props

Frontend Routes

Route  |  File  |  Description
/admin-ops  |  src/app/admin-ops/page.tsx  |  Main operations dashboard: system status grid, metrics panel, activity feed; auto-refreshes every 8s
/admin-ops/factory  |  src/app/admin-ops/factory/page.tsx  |  Frontend Factory sub-page: requirement management, template browser, generate/preview/deploy workflow

Both pages are Next.js App Router client components ("use client"). The main dashboard calls GET /api/v1/admin/dashboard and GET /api/v1/admin/metrics in parallel, followed by a separate call to GET /api/v1/admin/activity-log that fails gracefully if unavailable. API calls are made via the shared api client from @/lib/api-client.

Frontend Factory Sub-page

The factory sub-page at /admin-ops/factory provides a full lifecycle UI for the Frontend Factory Engine. It can be navigated to directly or linked from the frontend-factory system card on the main dashboard.

Page Type Templates

Type Key  |  Label  |  Description
data_table  |  Data Table  |  Sortable, filterable table with search and pagination
dashboard  |  Dashboard  |  Metric cards, charts, and summary widgets
form  |  Form Page  |  Create/edit forms with field validation
detail  |  Detail View  |  Single-entity detail page with related data panels
analytics  |  Analytics Page  |  Charts, graphs, and statistical analysis panels
settings  |  Settings Page  |  Configuration and preferences management form

Requirement Lifecycle States

Requirements move through: pending → generating → generated → deployed. Failed attempts land in the failed state and can be re-triggered.

  1. Submit — POST to /api/v1/factory/requirements; validation checks route format, required fields, and template compatibility
  2. Generate — POST to /api/v1/factory/requirements/{id}/generate; factory selects the matching template generator and compiles TSX; records generation_time_ms
  3. Preview — GET /api/v1/factory/requirements/{id}/preview; returns full TSX code, API wiring object, and sidebar entry config without side effects
  4. Deploy — POST to /api/v1/factory/requirements/{id}/deploy; writes the .tsx file to the frontend source tree and signals the auto-deploy pipeline to rebuild the container
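The lifecycle can be enforced with an explicit transition table. The table here is an assumption reconstructed from the states listed above and from the rule that generation is only valid from pending or failed; `ReqStatus` and `advance` are hypothetical names, and allowing generated to fall back to failed (for example when the deploy write fails) is also assumed.

```python
from enum import Enum
from typing import Dict, FrozenSet


class ReqStatus(str, Enum):
    PENDING = "pending"
    GENERATING = "generating"
    GENERATED = "generated"
    DEPLOYED = "deployed"
    FAILED = "failed"


# Allowed moves out of each state (assumed; see lead-in).
TRANSITIONS: Dict[ReqStatus, FrozenSet[ReqStatus]] = {
    ReqStatus.PENDING: frozenset({ReqStatus.GENERATING}),
    ReqStatus.FAILED: frozenset({ReqStatus.GENERATING}),   # re-trigger after failure
    ReqStatus.GENERATING: frozenset({ReqStatus.GENERATED, ReqStatus.FAILED}),
    ReqStatus.GENERATED: frozenset({ReqStatus.DEPLOYED, ReqStatus.FAILED}),
    ReqStatus.DEPLOYED: frozenset(),                        # terminal
}


def advance(current: ReqStatus, target: ReqStatus) -> ReqStatus:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move requirement from {current.value} to {target.value}")
    return target
```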

Data Models

The admin ops router does not use SQLAlchemy database models. It introspects in-memory state of engine singletons at request time. The structures below document the API response shapes.

Dashboard Response

{
  "timestamp": "2026-03-01T12:00:00",
  "emergency_mode": false,                 // true if any system.running == false
  "systems": {
    "integrity-scan": {
      "name": "integrity-scan",
      "label": "Platform Integrity Monitor",
      "running": true,
      "schedule": "Every 30 min",
      "last_run": "2026-03-01T11:30:00",
      "error_count": 0,
      "success_rate": 100.0,
      "total_scans": 12,
      "pages_generated": 5,
      "pending_deploy": false
    },
    "soc-cycle": {
      "name": "soc-cycle",
      "label": "Autonomous SOC Engine",
      "running": true,
      "schedule": "4 concurrent loops (30min / 10s / 1hr / 5min)",
      "last_run": "2026-03-01T11:59:50",
      "error_count": 0,                    // escalated_to_human count
      "success_rate": 100.0,
      "alerts_processed": 847,
      "auto_resolved": 720
    }
    // ... 5 more systems
  }
}
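The emergency_mode flag is a single reduction over the per-system statuses. A minimal sketch, assuming `systems` already holds normalized dicts like those in the dashboard response and that the timestamp is a naive local ISO string as in the example (the real endpoint may use UTC); `build_dashboard` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Dict


def build_dashboard(systems: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap normalized per-system statuses in the dashboard envelope."""
    return {
        # Second precision matches the example; local vs UTC time is an assumption.
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        # Any stopped system puts the whole platform into emergency mode.
        "emergency_mode": any(not s["running"] for s in systems.values()),
        "systems": systems,
    }
```

With every system running the flag stays false; stopping any one of them, individually or through the kill switch, flips it to true, which is the condition behind the frontend's emergency banner.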

Metrics Response

{
  "timestamp": "2026-03-01T12:00:00",
  "uptime_seconds": 3621.4,
  "memory_mb": 184.2,
  "memory_percent": 2.3,
  "cpu_percent": 1.8,
  "threads": 14,
  "active_ws_connections": 3,
  "db_pool": {
    "size": 5,
    "checked_in": 4,
    "checked_out": 1,
    "overflow": 0
  },
  "autonomous_systems_total": 7,
  "autonomous_systems_running": 7,
  "note": "psutil not installed — install for detailed metrics"  // when psutil absent
}
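A sketch of metrics collection with the psutil fallback described above. The psutil calls used (`Process`, `memory_info().rss`, `memory_percent()`, `cpu_percent(interval=None)`, `num_threads()`) are real psutil APIs; `collect_metrics`, `_START`, the cached `_proc`, and passing the WebSocket count in as an argument are assumptions. The `Process` object is cached because `cpu_percent(interval=None)` measures usage since the previous call on the same object and returns 0.0 the first time.

```python
import os
import threading
import time
from typing import Any, Dict, Optional

_START = time.monotonic()        # assumed to be captured at app startup
_proc: Optional[Any] = None      # cached so cpu_percent() has a previous sample


def collect_metrics(ws_connections: int = 0) -> Dict[str, Any]:
    global _proc
    metrics: Dict[str, Any] = {
        "uptime_seconds": round(time.monotonic() - _START, 1),
        "threads": threading.active_count(),     # Python threads only
        "active_ws_connections": ws_connections,
    }
    try:
        import psutil
    except ImportError:
        metrics["note"] = "psutil not installed - install for detailed metrics"
        return metrics
    if _proc is None:
        _proc = psutil.Process(os.getpid())
    metrics.update(
        memory_mb=round(_proc.memory_info().rss / (1024 * 1024), 1),
        memory_percent=round(_proc.memory_percent(), 1),
        cpu_percent=_proc.cpu_percent(interval=None),
        threads=_proc.num_threads(),              # OS-level thread count
    )
    return metrics
```

For the db_pool block, SQLAlchemy's `QueuePool` exposes `size()`, `checkedin()`, `checkedout()`, and `overflow()`, which line up with the fields in the response.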

Activity Log Event

{
  "timestamp": "2026-03-01T11:45:00",
  "system": "integrity-scan",
  "action": "integrity_scan",
  "detail": "Scanned 47 routes, 32 pages, generated 2 pages",
  "result": "success"                      // success | failed | auto_resolved | rule_generated
}
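Merging per-engine logs into the combined feed can be sketched with `heapq.nlargest`, which returns the newest `limit` events in descending order without sorting the full list. Newest-first ordering, the per-system input shape, the hypothetical `combined_activity_log` name, and a default of 100 (mirroring the `?limit=100` example) are assumptions; comparing timestamps as strings only works because they share one ISO-8601 format.

```python
import heapq
from typing import Any, Dict, Iterable, List


def combined_activity_log(per_system: Dict[str, Iterable[Dict[str, Any]]],
                          limit: int = 100) -> List[Dict[str, Any]]:
    """Tag each event with its system and return the newest `limit` events."""
    events = (
        {**event, "system": system}
        for system, log in per_system.items()
        for event in log
    )
    # Same-format ISO-8601 strings sort chronologically as plain strings.
    return heapq.nlargest(limit, events, key=lambda e: e["timestamp"])
```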

Factory RequirementSubmission (POST body)

{
  "title": "Vulnerability Risk Matrix",
  "description": "Table of open CVEs grouped by CVSS score and affected asset",
  "page_type": "data_table",
  "route": "/vulnerabilities/risk",
  "api_endpoints": [
    { "method": "GET", "path": "/api/v1/vulnerabilities", "description": "List CVEs" }
  ],
  "data_fields": [
    { "name": "cve_id", "label": "CVE ID", "type": "string", "required": true },
    { "name": "cvss_score", "label": "CVSS", "type": "number" },
    { "name": "status", "label": "Status", "type": "enum",
      "values": ["open", "mitigated", "accepted"] }
  ],
  "features": ["search", "filter", "export", "pagination"],
  "refresh_interval": 30,
  "sidebar_section": "main",
  "sidebar_icon": "ShieldAlert",
  "priority": "high",
  "requested_by": "admin"
}
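A stdlib-only sketch of the submit-time checks (route format, required fields, template compatibility). `RequirementSubmission` here is a plain dataclass stand-in for the endpoint's request body, and the route regex, the data_table rule, and the enum rule are illustrative assumptions; only the page_type keys and field names come from this page.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List

PAGE_TYPES = {"data_table", "dashboard", "form", "detail", "analytics", "settings"}
ROUTE_RE = re.compile(r"^/[a-z0-9-]+(/[a-z0-9-]+)*$")   # assumed route format


@dataclass
class RequirementSubmission:
    title: str
    description: str
    page_type: str
    route: str
    api_endpoints: List[Dict[str, str]] = field(default_factory=list)
    data_fields: List[Dict[str, Any]] = field(default_factory=list)
    features: List[str] = field(default_factory=list)


def validate(req: RequirementSubmission) -> List[str]:
    """Return a list of problems; an empty list means the requirement can be queued."""
    errors: List[str] = []
    if not req.title.strip():
        errors.append("title is required")
    if not ROUTE_RE.match(req.route):
        errors.append(f"route {req.route!r} must look like /section/page")
    if req.page_type not in PAGE_TYPES:
        errors.append(f"unknown page_type {req.page_type!r}")
    # Hypothetical template-compatibility rules:
    if req.page_type == "data_table" and not req.data_fields:
        errors.append("data_table pages need at least one data field")
    for fld in req.data_fields:
        if fld.get("type") == "enum" and not fld.get("values"):
            errors.append(f"enum field {fld.get('name')!r} needs values")
    return errors
```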

Prerequisites

UI Layout

Main Dashboard — /admin-ops

Factory Sub-page — /admin-ops/factory