# ML Models & Training Pipeline
Three-model sklearn ensemble for alert classification, anomaly detection, and threat scoring with continuous learning from analyst feedback.
## Overview
The ML Pipeline provides the AI scoring backbone for ThreatOps. It uses three classical ML models from scikit-learn, bootstrapped with synthetic training data and continuously improved through analyst feedback. The ensemble combines a Random Forest classifier (alert disposition), an Isolation Forest anomaly detector (anomaly flagging), and a Gradient Boosting threat scorer (0-100 risk score). Models are persisted via Azure Blob Storage with local fallback. A background scheduler checks hourly for auto-retraining when 50+ new samples are available.
## What Was Proposed
- Three-model ML ensemble: AlertClassifier, AnomalyDetector, ThreatScorer
- Feature engineering from raw alert payloads
- Continuous learning from analyst feedback (true_positive, false_positive, suspicious, benign)
- Auto-retrain when threshold samples reached
- Model versioning with rollback capability
- Azure Blob Storage persistence surviving pod restarts
- Health monitoring and latency tracking
- Score blending: 70% ML + 30% rule-based
## What's Built
| Feature | Status | Details |
|---|---|---|
| AlertClassifier | Complete | Random Forest, 4 classes (true_positive, false_positive, suspicious, benign), bootstrapped with 2000 synthetic samples |
| AnomalyDetector | Complete | Isolation Forest, contamination-based anomaly flagging |
| ThreatScorer | Complete | Gradient Boosting Regressor, outputs 0-100 risk score |
| Feature Engineering | Complete | AlertFeatureExtractor: severity mapping, SIEM source encoding, MITRE tactic mapping, temporal features, IOC/asset counts |
| Training Pipeline | Complete | Continuous learning: collect feedback, accumulate samples, auto-retrain at 100 (feedback) or 50 (scheduled) |
| Model Store | Complete | Azure Blob Storage (stroconmlmodels, container: ml-models) with local /tmp fallback. Version tracking per model |
| Ensemble Prediction | Complete | Combined prediction: disposition + confidence + risk_score + is_anomaly + anomaly_score |
| Health Monitoring | Complete | Model load status, prediction latency percentiles (p50, p95), training buffer size, scheduler status |
| Feature Importances | Complete | Random Forest feature importances exposed for interpretability |
| Version History | Complete | Per-model version history from ModelStore for rollback UI |
| Background Scheduler | Complete | Hourly check, auto-retrain at 50+ samples since last retrain |
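The feature-engineering row above can be illustrated with a minimal sketch. The specific encodings, feature order, and field names here are illustrative assumptions, not the actual `AlertFeatureExtractor` implementation:

```python
import numpy as np
from datetime import datetime

# Hypothetical encodings -- the real mappings live in feature_engineering.py.
SEVERITY_MAP = {"low": 1, "medium": 2, "high": 3, "critical": 4}
SIEM_SOURCES = ["splunk", "sentinel", "qradar", "elastic"]

def extract_features(alert: dict) -> np.ndarray:
    """Turn a raw alert payload into a fixed-length numeric vector."""
    ts = datetime.fromisoformat(alert["timestamp"])
    source_onehot = [1.0 if alert.get("source") == s else 0.0 for s in SIEM_SOURCES]
    return np.array([
        SEVERITY_MAP.get(alert.get("severity", "low"), 1),  # ordinal severity
        ts.hour,                                             # temporal feature
        float(ts.weekday() >= 5),                            # weekend flag
        len(alert.get("iocs", [])),                          # IOC count
        len(alert.get("assets", [])),                        # asset count
        *source_onehot,                                      # SIEM source encoding
    ], dtype=float)

vec = extract_features({
    "timestamp": "2024-05-01T03:15:00",
    "severity": "high",
    "source": "sentinel",
    "iocs": ["1.2.3.4"],
    "assets": ["host-1", "host-2"],
})
```

The key property is that every alert, regardless of source schema, maps to the same fixed-length vector the three models share.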
## Architecture
```
platform/api/app/ml/
├── __init__.py
├── feature_engineering.py   # AlertFeatureExtractor (raw alert -> feature vector)
├── models.py                # AlertClassifier (RF), AnomalyDetector (IF), ThreatScorer (GB)
├── training_pipeline.py     # TrainingPipeline (feedback collection, retrain, scheduler)
└── model_store.py           # ModelStore (Azure Blob Storage + local fallback)
```
**Prediction Flow:**

```
Raw Alert -> AlertFeatureExtractor.extract() -> feature_vector (numpy array)
    |
    +-> AlertClassifier.predict()  -> disposition, confidence
    +-> AnomalyDetector.predict()  -> is_anomaly, anomaly_score
    +-> ThreatScorer.predict()     -> risk_score (0-100)
    |
    v
Ensemble Result: { disposition, confidence, risk_score, is_anomaly, anomaly_score }
```
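A compact sketch of this ensemble prediction, using the three sklearn estimators named above. The training data here is random stand-in data (the real models bootstrap from 2000 synthetic samples), and the score aggregation details are assumptions:

```python
import numpy as np
from sklearn.ensemble import (
    RandomForestClassifier, IsolationForest, GradientBoostingRegressor,
)

# Stand-in training data; the real pipeline uses synthetic alert samples.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y_cls = rng.choice(["true_positive", "false_positive", "suspicious", "benign"], 200)
y_score = rng.random(200) * 100  # 0-100 risk labels

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_cls)
det = IsolationForest(contamination=0.1, random_state=0).fit(X)
scorer = GradientBoostingRegressor(random_state=0).fit(X, y_score)

def ensemble_predict(x: np.ndarray) -> dict:
    """Run one feature vector through all three models and merge the outputs."""
    x = x.reshape(1, -1)
    proba = clf.predict_proba(x)[0]
    return {
        "disposition": clf.classes_[int(np.argmax(proba))],
        "confidence": float(np.max(proba)),
        "risk_score": float(np.clip(scorer.predict(x)[0], 0, 100)),
        "is_anomaly": bool(det.predict(x)[0] == -1),   # sklearn flags anomalies as -1
        "anomaly_score": float(-det.score_samples(x)[0]),
    }

result = ensemble_predict(X[0])
```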
**Training Flow:**

```
Analyst Feedback -> record_feedback() -> training_buffer (JSON, persisted)
    |
    +-- buffer >= 100 samples -> trigger_retrain()
    +-- scheduler (hourly)    -> 50+ new samples -> retrain
    |
    v
retrain all 3 models -> persist via ModelStore -> log performance
```
**Score Blending:** `Final Score = 0.70 * ML_ensemble_score + 0.30 * rule_based_score`
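The blend is a plain weighted sum; for example, an ML ensemble score of 80 and a rule-based score of 50 blend to 71:

```python
ML_WEIGHT, RULE_WEIGHT = 0.70, 0.30

def blended_score(ml_score: float, rule_score: float) -> float:
    """Blend the ML ensemble score with the rule-based score (both on 0-100)."""
    return ML_WEIGHT * ml_score + RULE_WEIGHT * rule_score
```

Keeping a 30% rule-based floor means a confidently wrong model cannot fully suppress a rule that fired.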
**6-Stage Triage Pipeline:**
1. Entity Extraction
2. Threat Intelligence Lookup
3. UEBA (User and Entity Behavior Analytics)
4. Alert Correlation
5. Rule-Based Scoring (372 detection rules in 20+ modules)
6. ML Ensemble (AlertClassifier + AnomalyDetector + ThreatScorer)
## API Routing
Router prefix: `/api/v1/ml` (tag: `ml-models`)
## Prerequisites
- scikit-learn -- RandomForestClassifier, IsolationForest, GradientBoostingRegressor
- numpy -- feature vector computation
- Azure Blob Storage -- account `stroconmlmodels`, container `ml-models` (optional; falls back to `/tmp/ml_models/`)
- Models bootstrap with synthetic data on first start (no external training data required)
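The Blob-Storage-with-local-fallback behavior can be sketched like this. The connection handling and blob naming are assumptions; the real `ModelStore` also tracks a version history per model:

```python
import os
import pickle
from pathlib import Path

LOCAL_DIR = Path("/tmp/ml_models")  # documented local fallback location
CONNECTION_STRING = os.environ.get("AZURE_STORAGE_CONNECTION_STRING", "")

def save_model(name: str, version: str, model: object) -> str:
    """Persist a pickled model, preferring Azure Blob Storage over local disk."""
    blob_name = f"{name}/{version}.pkl"
    payload = pickle.dumps(model)
    try:
        from azure.storage.blob import BlobServiceClient  # optional dependency
        service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
        container = service.get_container_client("ml-models")
        container.upload_blob(blob_name, payload, overwrite=True)
        return f"azure://ml-models/{blob_name}"
    except Exception:
        # No Azure SDK installed or no valid connection string: fall back to /tmp.
        path = LOCAL_DIR / blob_name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
        return str(path)
```

Because the fallback shares the `name/version.pkl` layout, a pod that comes up without Blob Storage credentials still finds locally cached models at the same relative paths.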
## Data Model
ML models use in-memory state with persistence via ModelStore. No SQLAlchemy models required.
### AlertClassifier
| Attribute | Type | Description |
|---|---|---|
| model | RandomForestClassifier | sklearn RF with 100 estimators |
| classes | list[str] | ["true_positive", "false_positive", "suspicious", "benign"] |
| version | str | Semantic version (e.g., "1.0.0") |
| trained_samples | int | Number of training samples |
| accuracy | float | Test set accuracy |
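A minimal sketch of a wrapper holding these attributes, assuming a standard train/test split is used to compute the reported accuracy (the actual class in `models.py` may differ):

```python
import numpy as np
from dataclasses import dataclass, field
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

CLASSES = ["true_positive", "false_positive", "suspicious", "benign"]

@dataclass
class AlertClassifier:
    model: RandomForestClassifier = field(
        default_factory=lambda: RandomForestClassifier(n_estimators=100, random_state=0))
    version: str = "1.0.0"
    trained_samples: int = 0
    accuracy: float = 0.0

    def train(self, X: np.ndarray, y: np.ndarray) -> None:
        """Fit on 80% of the data; record held-out accuracy on the other 20%."""
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        self.model.fit(X_tr, y_tr)
        self.trained_samples = len(X_tr)
        self.accuracy = float(self.model.score(X_te, y_te))

# Stand-in random data; the real bootstrap uses 2000 synthetic samples.
rng = np.random.default_rng(1)
clf = AlertClassifier()
clf.train(rng.random((500, 6)), rng.choice(CLASSES, 500))
```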
### AnomalyDetector
| Attribute | Type | Description |
|---|---|---|
| model | IsolationForest | sklearn IF with contamination parameter |
| version | str | Semantic version |
### ThreatScorer
| Attribute | Type | Description |
|---|---|---|
| model | GradientBoostingRegressor | sklearn GBR, outputs 0-100 |
| version | str | Semantic version |
| r2_score | float | R-squared on test set |
## UI Description
File: `platform/frontend/src/app/ml-models/page.tsx`
The ML Models dashboard provides full visibility into the AI pipeline:
- Model Cards -- Per-model cards showing version, accuracy/R2, trained samples, last updated timestamp
- Feature Importance Chart -- Bar chart of RF feature importances for interpretability
- Training Status -- Buffer size vs threshold indicator, retrain button, auto-retrain schedule
- Live Prediction Test -- Submit a sample alert and see ensemble results: disposition, confidence, risk score, anomaly flag
- Feedback Interface -- Submit analyst feedback (disposition + optional risk score) for model training
- Performance Log -- Historical retrain log with accuracy/version changes per model
- Version History -- Per-model version timeline with rollback indicators
- Health Panel -- Pipeline health: model load status, prediction latency percentiles, storage backend info