Security Analytics
Overview
The Security Analytics module exposes the internals of the ThreatOps classical machine learning pipeline to operators and analysts. Rather than treating AI triage as a black box, this module provides a live window into model health, training progress, version history, feature importance rankings, and the ensemble scoring strategy that drives every alert disposition decision on the platform.
The frontend page at /analytics polls the ML backend every 30 seconds and renders the state of three sklearn models — an Alert Classifier, an Anomaly Detector, and a Threat Scorer — alongside a training buffer progress bar, the 70/30 ensemble blend visualization, a ranked bar chart of the top 17 RandomForest feature importances, and a summary of the four possible disposition outcomes and their confidence thresholds. The backend is served entirely through the /api/v1/ml router, which also handles real-time predictions, analyst feedback recording, forced retraining, and model version history retrieval for rollback workflows.
What Was Proposed
- A classical ML pipeline with three specialized models covering alert classification, anomaly detection, and threat risk scoring
- Ensemble scoring that blends ML confidence with rule-based scores at a 70/30 ratio to balance data-driven decisions with deterministic expert rules
- Automatic model retraining triggered by accumulated analyst feedback, without requiring manual intervention or scheduled jobs
- Analyst feedback loop: every resolve, escalate, or false-positive action feeds labeled training data back to the models
- Feature importance transparency so analysts can audit which alert attributes drive high-risk scores
- Model versioning with persistent storage (Azure Blob) so retrained models can be rolled back if performance regresses
- Real-time prediction endpoint for ad-hoc analysis and integration testing
- A frontend analytics page showing live model stats, training buffer fill, ensemble weights, and feature importance rankings
What's Built
| Component | Status |
|---|---|
| Alert Classifier (RandomForest, sklearn) | ✓ Complete |
| Anomaly Detector (IsolationForest, sklearn) | ✓ Complete |
| Threat Scorer (GradientBoosting, sklearn) | ✓ Complete |
| 70/30 ensemble blend (ML + rule-based) | ✓ Complete |
| Auto-retrain at 100 analyst feedback samples | ✓ Complete |
| Analyst feedback recording endpoint with disposition validation | ✓ Complete |
| Feature importance extraction (RandomForest) | ✓ Complete |
| Model version history (Azure Blob persistent storage) | ✓ Complete |
| Real-time prediction endpoint with sample test | ✓ Complete |
| ML health check with latency percentiles | ✓ Complete |
| Performance log with historical retrain records | ✓ Complete |
| Frontend analytics page (model cards, training buffer, ensemble viz, feature chart) | ✓ Complete |
| Four disposition outcomes with threshold documentation | ✓ Complete |
Architecture
ML Router
File: platform/api/app/routers/ml.py — Prefix: /api/v1/ml
A thin FastAPI router that delegates all logic to the ml_pipeline singleton imported from app/ml/training_pipeline.py. The router is stateless — it does not hold any model state itself. The pipeline singleton manages model loading from Azure Blob storage at startup, maintains the training feedback buffer in memory, and schedules retraining when the buffer crosses the configured threshold (default: 100 samples).
ML Training Pipeline (app/ml/training_pipeline.py)
The singleton ml_pipeline encapsulates three sklearn model wrappers:
- AlertClassifier — wraps a RandomForestClassifier; provides predict(alert), returning a classification label and confidence score, and get_feature_importances(), returning a ranked dict of feature names to importance weights
- AnomalyDetector — wraps an IsolationForest with contamination=0.10 (10% expected anomaly rate); unsupervised, so it does not use labeled training data
- ThreatScorer — wraps a GradientBoostingClassifier with 100 estimators; outputs a continuous risk score in the 0–100 range
The pipeline exposes predict_ensemble(alert: dict), which runs all three models and blends their results with the 70/30 formula: confidence = 0.7 * ml_score + 0.3 * rule_score. It also provides record_feedback(), which appends analyst-labeled samples to the training buffer and calls trigger_retrain() when the buffer reaches the threshold.
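The buffer-and-retrain behavior can be sketched in pure Python. This is an illustration only: the real pipeline retrains the three sklearn models inside trigger_retrain(), and the buffer layout shown here is an assumption.

```python
class TrainingPipeline:
    """Sketch of record_feedback() / trigger_retrain() buffer mechanics.
    Internals are assumptions; only the threshold behavior is documented."""

    RETRAIN_THRESHOLD = 100  # default auto-retrain threshold

    def __init__(self):
        self.buffer = []         # analyst-labeled samples since last retrain
        self.total_retrains = 0

    def record_feedback(self, alert: dict, disposition: str, risk_score=None):
        # Append one labeled sample; fire retrain when the buffer fills.
        self.buffer.append({"alert": alert, "disposition": disposition,
                            "risk_score": risk_score})
        if len(self.buffer) >= self.RETRAIN_THRESHOLD:
            self.trigger_retrain()

    def trigger_retrain(self):
        # The real pipeline refits all three sklearn models here; stubbed out.
        self.total_retrains += 1
        self.buffer.clear()

pipeline = TrainingPipeline()
for i in range(100):
    pipeline.record_feedback({"title": f"alert {i}"}, "benign")
print(pipeline.total_retrains, len(pipeline.buffer))  # → 1 0
```

The 100th feedback sample trips the threshold, so the retrain counter increments and the buffer resets to zero — matching the "X / Y" fill bar shown on the analytics page.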
Model Store (app/ml/model_store.py)
Handles persistence of trained model artifacts to and from Azure Blob storage (container: ml-models on storage account stroconmlmodels). Each model version is saved with a timestamp and version number. The store provides get_all_model_versions() and get_version_history(model_name) for the rollback UI.
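The versioned layout can be illustrated with a local-disk stand-in for the store. The Azure Blob calls are omitted; the method names save_model() and the history-file format are assumptions, while get_version_history() and the blob_path naming mirror the documented response shape.

```python
import json, pickle, time
from pathlib import Path

class ModelStore:
    """Local-disk stand-in for the Azure Blob model store (sketch only;
    the real store writes to container ml-models on stroconmlmodels)."""

    def __init__(self, root: str):
        self.root = Path(root)

    def save_model(self, model_name: str, model, version: str, metrics: dict):
        # Persist the pickled artifact plus a version record with timestamp.
        d = self.root / model_name
        d.mkdir(parents=True, exist_ok=True)
        (d / f"v{version}.pkl").write_bytes(pickle.dumps(model))
        record = {"version": version,
                  "saved_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
                  **metrics,
                  "blob_path": f"ml-models/{model_name}/v{version}.pkl"}
        history = self._history(model_name)
        history.append(record)
        (d / "history.json").write_text(json.dumps(history))

    def _history(self, model_name: str):
        f = self.root / model_name / "history.json"
        return json.loads(f.read_text()) if f.exists() else []

    def get_version_history(self, model_name: str) -> dict:
        # Shape matches GET /api/v1/ml/versions/{model_name}.
        history = self._history(model_name)
        return {"model_name": model_name,
                "latest": history[-1] if history else None,
                "history": history,
                "total_versions": len(history)}
```

Rollback then amounts to loading an earlier blob_path from the history list instead of the latest record.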
ML Models
Alert Classifier
Supervised multi-class classifier that categorizes incoming alerts by threat category. Trained on labeled analyst feedback (true_positive, false_positive, suspicious, benign). Provides accuracy metric and feature importances. Version tracked as stats.classifier.version.
Anomaly Detector
Unsupervised anomaly detection with contamination=0.10, meaning 10% of training samples are expected to be anomalous. Does not require labeled data. Displayed as a static 10% contamination rate in the UI since it has no accuracy metric in the traditional sense. Version tracked as stats.anomaly_detector.version.
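What contamination=0.10 means in practice is just a score cutoff chosen so that roughly 10% of training points land on the anomalous side. A stdlib-only illustration of that semantics (not sklearn's actual isolation-tree algorithm):

```python
def fit_threshold(scores, contamination=0.10):
    """Pick the cutoff so the lowest `contamination` fraction of anomaly
    scores is flagged — mirroring what contamination=0.10 means for
    IsolationForest, where lower scores are more anomalous."""
    ranked = sorted(scores)
    k = max(1, int(len(ranked) * contamination))
    return ranked[k - 1]  # scores at or below this cutoff are "anomalous"

scores = [round(0.1 * i, 1) for i in range(1, 21)]  # 20 synthetic scores
cutoff = fit_threshold(scores)
flagged = [s for s in scores if s <= cutoff]
print(cutoff, flagged)  # → 0.2 [0.1, 0.2]
```

With 20 samples and contamination=0.10, exactly 2 samples fall below the fitted cutoff — which is why the UI shows a static 10% rate rather than an accuracy figure.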
Threat Scorer
Produces a continuous risk score from 0 to 100 for each alert. Built with 100 estimators. The score feeds directly into the ensemble blending formula. Higher scores push alerts toward critical_immediate disposition. Version tracked as stats.threat_scorer.version.
API Endpoints
ML Router — /api/v1/ml
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/ml/stats | Current model versions, accuracy, training buffer size, last retrain timestamp, per-class distribution, and data drift indicators |
| GET | /api/v1/ml/health | ML pipeline health: model load status per model, prediction latency percentiles (p50, p95, min, max, mean), buffer size, last prediction timestamp, scheduler status, storage backend info |
| POST | /api/v1/ml/predict | Run full ensemble prediction on a provided alert dict; returns classification, anomaly flag, risk score, blended confidence, and disposition recommendation |
| GET | /api/v1/ml/predict/sample | Run prediction on a hardcoded sample alert (PowerShell execution on DESKTOP-ADMIN01) for integration testing; returns result + sample_alert |
| POST | /api/v1/ml/feedback | Record analyst feedback for training; accepts alert dict, disposition (true_positive / false_positive / suspicious / benign), and optional risk_score override; triggers retrain if buffer threshold reached |
| POST | /api/v1/ml/retrain | Force immediate retrain of all models with accumulated buffer data (admin use; bypasses threshold check) |
| GET | /api/v1/ml/performance | Historical model performance log plus current stats snapshot; shows accuracy trends across retrain events |
| GET | /api/v1/ml/feature-importances | RandomForest feature importance dict sorted descending; used by the analytics page for the ranked bar chart visualization |
| GET | /api/v1/ml/versions | Version history for all three models from the Azure Blob model_store |
| GET | /api/v1/ml/versions/{model_name} | Version history for a specific model (alert_classifier / anomaly_detector / threat_scorer); returns latest version and full history list |
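The feedback endpoint performs disposition validation before a sample reaches the training buffer. A minimal sketch of that check against the documented FeedbackRequest shape (the function name and error wording are assumptions, not the router's actual code):

```python
VALID_DISPOSITIONS = {"true_positive", "false_positive", "suspicious", "benign"}

def validate_feedback(payload: dict) -> dict:
    """Validate a POST /api/v1/ml/feedback body. The four allowed
    disposition values come from the endpoint docs; the error messages
    here are illustrative."""
    disposition = payload.get("disposition")
    if disposition not in VALID_DISPOSITIONS:
        raise ValueError(
            f"disposition must be one of {sorted(VALID_DISPOSITIONS)}, "
            f"got {disposition!r}")
    if not isinstance(payload.get("alert"), dict):
        raise ValueError("alert must be an object with feature fields")
    return payload
```

A rejected disposition never enters the buffer, so a typo in an integration script cannot poison the next retrain cycle.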
Frontend Route
| Route | File | Description |
|---|---|---|
| /analytics | src/app/analytics/page.tsx | Security Analytics page — model cards, training buffer, ensemble strategy, feature importances bar chart, disposition breakdown |
The page is a Next.js App Router client component ("use client"). On mount it fires two requests in parallel via Promise.allSettled: GET /api/v1/ml/stats and GET /api/v1/ml/feature-importances. Either can fail independently without breaking the page — the component renders whatever data arrives. Auto-refreshes every 30 seconds via setInterval. API calls use the shared api client from @/lib/api-client.
Ensemble Strategy
The ensemble combines outputs from all three models into a single blended confidence score that drives the final alert disposition. The formula is applied in ml_pipeline.predict_ensemble():
blended_confidence = (0.7 × ml_score) + (0.3 × rule_score)
The 70% ML weight reflects confidence in the trained models when sufficient labeled data is available. The 30% rule-based weight preserves expert-authored detection logic as a safety net, particularly for novel attack patterns that the models have not yet seen in training data. The split is visualized in the frontend as a two-segment progress bar (cyan for ML, emerald for rule-based).
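Plugging in the numbers from the sample PredictResponse later on this page shows the blend at work:

```python
def blend(ml_score: float, rule_score: float) -> float:
    # 70/30 ensemble blend applied in ml_pipeline.predict_ensemble()
    return 0.7 * ml_score + 0.3 * rule_score

# ml_confidence 0.68 and rule_score 0.72, as in the sample response
print(round(blend(0.68, 0.72), 3))  # → 0.692
```

Because the rule score (0.72) exceeds the ML confidence (0.68) here, the blend pulls the final confidence slightly above the pure-ML value — the "safety net" effect described above.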
Score Sources
| Component | Weight | Source |
|---|---|---|
| ML Score | 70% | Blended output of AlertClassifier + AnomalyDetector + ThreatScorer predictions |
| Rule Score | 30% | Rule-based confidence from RiskScoringModel in the 6-stage triage pipeline |
The 6-stage triage pipeline (entity extraction, threat intel enrichment, UEBA, correlation, rule scoring, ML ensemble) feeds both the rule_score and the raw alert features into the ML models. The full pipeline is documented in the Alerts module.
Disposition Logic
The final blended confidence score maps to one of four disposition labels. These labels drive what the Autonomous SOC Engine does with the alert — auto-close it, notify an analyst, open a SOAR playbook, or page the incident response team.
- Auto-Resolve — Confidence > 85% that the alert is benign. Automatically closed without analyst review. Feeds a "benign" label into the ML training buffer for future retraining.
- Investigate — Confidence between 50% and 85%. Insufficient certainty for automated closure; the alert is queued for analyst review. The ML models receive the analyst's final verdict as feedback.
- Escalate — Confidence below 50%, or a high anomaly score. The alert is escalated to the incidents queue, and a SOAR playbook may be triggered depending on severity. An analyst must investigate and close it.
- Critical Immediate — High risk score (>80 on the 0–100 scale) and high severity. Immediate escalation: an incident is created automatically, the on-call analyst is paged, and a SOAR playbook is triggered without waiting for analyst action.
The thresholds are: >85% blended confidence → auto-resolve; 50–85% → investigate; <50% → escalate; high threat score + high severity → critical immediate. These thresholds are implemented in TriageService in app/services/triage_service.py.
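The threshold logic can be sketched as a single mapping function. The thresholds and labels are from this page; the evaluation order, boundary handling, and the severity check are assumptions about TriageService internals.

```python
def disposition_for(blended_confidence: float, risk_score: float,
                    severity: str) -> str:
    """Map ensemble outputs to one of the four disposition labels.
    Thresholds are documented; ordering and tie-breaking are assumptions."""
    if risk_score > 80 and severity in {"high", "critical"}:
        return "critical_immediate"      # incident + page + SOAR playbook
    if blended_confidence > 0.85:
        return "auto_resolve"            # closed without analyst review
    if blended_confidence >= 0.50:
        return "requires_investigation"  # queued for analyst review
    return "escalate"                    # incidents queue, possible playbook
```

Applied to the sample PredictResponse (blended_confidence 0.692, risk_score 74.2, severity high), this yields requires_investigation — consistent with the disposition shown in that example.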
Data Models
The analytics module does not define its own SQLAlchemy database models. All data is maintained in-memory by the ml_pipeline singleton, with model artifacts persisted to Azure Blob via model_store. The structures below document the API response shapes consumed by the frontend.
MLStats Response (GET /api/v1/ml/stats)
{
"classifier": {
"version": "1.3.0",
"accuracy": 0.923,
"samples_trained": 1450
},
"anomaly_detector": {
"version": "1.1.0",
"accuracy": null, // unsupervised — no accuracy metric
"samples_trained": null
},
"threat_scorer": {
"version": "1.2.0",
"accuracy": 0.891,
"samples_trained": 1450
},
"training_buffer": 34, // samples accumulated since last retrain
"retrain_threshold": 100, // samples needed to trigger auto-retrain
"total_retrains": 14,
"last_retrain": "2026-02-28T18:42:00",
"feature_importances": {
"severity_encoded": 0.187,
"rule_confidence": 0.142,
"correlation_count": 0.118,
// ... up to 17 features
}
}
FeedbackRequest (POST /api/v1/ml/feedback)
{
"alert": {
"title": "Lateral movement detected on DC01",
"severity": "high",
"source_siem": "sentinel",
"mitre_tactic": "lateral_movement",
"rule_confidence": 0.72,
"correlation_count": 5
},
"disposition": "true_positive", // true_positive | false_positive | suspicious | benign
"risk_score": 87.5 // optional analyst-provided override
}
PredictResponse (POST /api/v1/ml/predict)
{
"classification": "suspicious",
"anomaly_score": -0.31, // IsolationForest raw score (lower = more anomalous)
"risk_score": 74.2, // ThreatScorer output (0–100)
"ml_confidence": 0.68,
"rule_score": 0.72,
"blended_confidence": 0.692, // 0.7 * ml_confidence + 0.3 * rule_score
"disposition": "requires_investigation",
"reasoning": "High rule confidence, moderate ML confidence. Correlation count of 5 suggests related activity."
}
FeatureImportances Response (GET /api/v1/ml/feature-importances)
{
"feature_importances": {
"severity_encoded": 0.187,
"rule_confidence": 0.142,
"correlation_count": 0.118,
"hour_of_day": 0.094,
"source_siem_encoded": 0.082,
"mitre_tactic_encoded": 0.079,
"ioc_count": 0.071,
"asset_count": 0.063,
// ... additional features
}
}
ModelVersionHistory (GET /api/v1/ml/versions/{model_name})
{
"model_name": "alert_classifier",
"latest": {
"version": "1.3.0",
"saved_at": "2026-02-28T18:42:00",
"accuracy": 0.923,
"samples": 1450,
"blob_path": "ml-models/alert_classifier/v1.3.0.pkl"
},
"history": [ /* array of version records */ ],
"total_versions": 14
}
Prerequisites
- ML Training Pipeline — app/ml/training_pipeline.py singleton ml_pipeline; must be initialized at startup; models loaded from Azure Blob or initialized fresh if no stored versions exist
- AlertClassifier — sklearn RandomForestClassifier wrapper with predict(), get_feature_importances(), and versioned state
- AnomalyDetector — sklearn IsolationForest wrapper (contamination=0.10)
- ThreatScorer — sklearn GradientBoostingClassifier wrapper (100 estimators) with 0–100 score output
- Model Store — app/ml/model_store.py; requires Azure Blob Storage credentials (storage account stroconmlmodels, container ml-models); falls back to local /tmp/ storage if unavailable (ephemeral)
- scikit-learn — Python package; required for all three model types; must be present in requirements.txt
- numpy / pandas — Required for feature engineering in the training pipeline
- Triage Service — app/services/triage_service.py; calls ml_pipeline.predict_ensemble() during the 6-stage alert triage pipeline; provides the rule_score that feeds the 30% blend weight
- Alerts Router — app/routers/alerts.py; calls ml_pipeline.record_feedback() during bulk actions so analyst dispositions reach the training buffer
- Admin Ops Router — app/routers/admin_ops.py; uses ml_pipeline.get_model_stats(), trigger_retrain(), and get_performance_log() for the admin dashboard and activity log
UI Layout
Analytics Page — /analytics
- Header Row — "Security Analytics" title with Brain icon (blue, #3b82f6). Subtitle: "Classical ML pipeline performance and feature analysis". Right-aligned Refresh button (slate background).
- Error Banner — Conditionally rendered red banner showing the error message if either API call fails. Does not prevent the rest of the page from rendering with available data.
- Model Cards Row — 3-column grid (stacks to 1 on mobile). Each card has:
- Alert Classifier: Target icon (blue), "RandomForest" badge (blue), large accuracy percentage as headline stat, version and samples_trained as subtext
- Anomaly Detector: Activity icon (emerald), "IsolationForest" badge (emerald), "10%" contamination rate as headline stat (static), unsupervised note in subtext
- Threat Scorer: TrendingUp icon (orange), "GradientBoosting" badge (orange), "0–100" as headline stat for risk score range, "100 estimators" note in subtext
- Training Buffer Card — Shows current buffer fill as "X / Y" and a gradient progress bar (cyan to emerald). Subtext shows auto-retrain threshold and total retrains completed. Buffer percentage capped at 100% even if buffer overflows threshold.
- Ensemble Strategy Card — Side-by-side with training buffer. Shows two rows: "ML Weight: 70%" (blue) and "Rule-Based Weight: 30%" (emerald), with a two-segment horizontal bar visually splitting the weights. Formula shown as subtext: "Blended confidence = 0.7 * ML score + 0.3 * rule score".
- Feature Importances Chart — Full-width card with BarChart3 icon. Ranked list of up to 17 features (top-N from RandomForest). Each row: rank number, feature name in monospace font (truncated at w-44), horizontal bar (gradient cyan to emerald), and importance percentage value. Bars are relative to the top feature's importance (not absolute).
- Alert Disposition Actions — Full-width card with 4-column grid (stacks on mobile). One card per disposition: Auto-Resolve (emerald), Escalate (orange), Investigate (yellow), Critical Alert (red). Each shows a colored label, description, and threshold band. Subtext below the grid summarizes the threshold logic: >85% auto-resolve, <50% escalate, 50–85% investigate.
The page uses a white/slate design with colored badges and gradient bars. No chart library dependency — all visualizations are built with plain CSS div elements and inline width styles driven by the data values.
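The relative bar scaling used in the feature importances chart (each bar sized against the top-ranked feature, not as an absolute percentage) amounts to the following; the function name is illustrative, not the component's actual code:

```python
def bar_widths(importances: dict) -> dict:
    """CSS width percentages for the feature chart: each bar is scaled
    against the top feature's importance, so the #1 bar is always 100%."""
    top = max(importances.values())
    return {name: round(100 * value / top, 1)
            for name, value in sorted(importances.items(),
                                      key=lambda kv: -kv[1])}

# Top three features from the sample feature-importances response
print(bar_widths({"severity_encoded": 0.187,
                  "rule_confidence": 0.142,
                  "correlation_count": 0.118}))
```

Scaling to the top feature keeps the chart readable even when all raw importances are small (a 17-feature RandomForest rarely gives any single feature more than ~20%).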