Service Level Agreements are the contractual backbone of a managed SOC. Enterprise customers buy specific response guarantees. For example, a critical alert must be acknowledged within 15 minutes and fully resolved within 4 hours, tracked as MTTA (mean time to acknowledge) and MTTR (mean time to resolve). Without automated tracking, SOC teams rely on manual ticket timestamps and spreadsheets, which are error-prone and cannot warn anyone in real time before a breach occurs.
The SLA Dashboard module provides real-time and historical SLA compliance tracking for all tenant tiers. Its features include per-severity scorecards graded A through F, live breach detection with WebSocket alerts, 30-day MTTA/MTTR trend charts, configurable SLA target definitions, multi-level escalation chain management, multi-channel breach notifications, daily and monthly compliance reports, and penalty tracking.
What Was Proposed
Per-tenant SLA definition with MTTA and MTTR targets by severity (critical, high, medium, low)
Real-time alert lifecycle tracking from creation through acknowledgement to resolution
Background breach detection loop running on a 60-second interval
Warning notifications at a configurable threshold (default: 80% of the SLA window elapsed)
Metric snapshot persistence at 5-minute intervals for historical trend analysis
Historical compliance reports and MTTA/MTTR trend data (up to 365 days)
SLA Configuration CRUD with four predefined templates (standard, premium, enterprise, government)
Multi-level escalation chains with per-severity routing to L1/L2/L3/Manager/CISO
Multi-channel breach notifications: WebSocket, email, and Slack
Daily and monthly SLA reports, period-over-period comparison, and penalty tracking
Feature | Status
Period comparison endpoint (delta analysis between two date ranges) | ✓ Complete
Penalty tracking and monthly penalty summary | ✓ Complete
Frontend: scorecard grid with A-F grades and MTTA/MTTR progress bars | ✓ Complete
Frontend: live metrics panel (open alerts, breaches, warnings, compliance %) | ✓ Complete
Frontend: 30-day CSS bar chart trend visualization for MTTA and MTTR | ✓ Complete
Frontend: active breaches table with elapsed vs target time | ✓ Complete
Frontend: collapsible SLA configuration form with per-severity target inputs | ✓ Complete
Architecture
Backend Service: SLAManager
File: app/services/sla_manager.py. The SLAManager is a singleton (sla_manager) instantiated at module import time. It combines its own core tracking logic with four sub-services and exposes them to the router as a single object:
SLAManager core: Holds a dict of SLADefinition dataclasses keyed by tenant and a dict of SLATracker dataclasses keyed by alert. Runs in the background on asyncio, checking for breaches every 60 seconds and persisting metric snapshots every 5 minutes. Sends WebSocket push notifications when an alert reaches the warning threshold and when a breach is confirmed.
SLAConfigManager: Full CRUD for SLAConfig records (an in-memory dict per tenant). Provides an apply_template shortcut for the four named tiers. Each config carries its own field-level overrides on top of the template baseline.
EscalationEngine: Manages per-tenant, per-severity escalation chains. Each chain is an ordered list of EscalationStep objects (level, role, timeout, notification method). The engine tracks the current escalation level per alert and emits timestamped EscalationEvent records, giving a full audit history.
BreachNotifier: Configures multi-channel notification recipients per tenant (WebSocket, email, Slack) and keeps a notification history log. Provides notify_breach() and notify_warning(), which are called when the breach detection loop detects a breach or a warning-threshold crossing.
SLAReportEngine: Generates daily and monthly compliance reports from the stored metric snapshots. Supports period-over-period comparison and calculates penalties at the configured per-hour rate.
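The per-alert check that the background loop performs every 60 seconds can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in app/services/sla_manager.py: SLAStatus, classify() and run_breach_loop() are hypothetical names, a warning is assumed to fire once elapsed time reaches notification_threshold_pct of the target, and a breach is assumed once elapsed time strictly exceeds the target.

```python
import asyncio
from enum import Enum
from typing import Callable


class SLAStatus(str, Enum):
    OK = "ok"
    WARNING = "warning"
    BREACHED = "breached"


def classify(elapsed_s: float, target_s: float, threshold_pct: int = 80) -> SLAStatus:
    """Classify one SLA clock (acknowledge or resolve) of an open alert.

    Assumption: breach means strictly over target; warning starts once
    threshold_pct percent of the SLA window has elapsed.
    """
    if elapsed_s > target_s:
        return SLAStatus.BREACHED
    if elapsed_s >= target_s * threshold_pct / 100:
        return SLAStatus.WARNING
    return SLAStatus.OK


async def run_breach_loop(check_all: Callable[[], None],
                          stop: asyncio.Event,
                          interval_s: float = 60.0) -> None:
    """Call check_all() every interval_s seconds until stop is set.

    check_all is a hypothetical callback that would iterate the trackers,
    call classify() and hand warnings/breaches to the notifier.
    """
    while not stop.is_set():
        check_all()
        try:
            # Sleep for one interval, but wake immediately if stop is set.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass  # normal case: interval elapsed, run the next check
```

Waiting on an asyncio.Event through wait_for lets the loop shut down promptly instead of finishing a full 60-second sleep. The 5-minute snapshot persistence could share the same loop by acting on every fifth tick.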
SLA Templates
Template | Critical MTTA | Critical MTTR | Penalty/Hour | Notes
standard | 30 min | 8 hr | $50.00 | Default for most tenants
premium | 15 min | 4 hr | $100.00 | Faster response commitments
enterprise | 5 min | 1 hr | $250.00 | Aggressive targets, highest penalties
government | 10 min | 2 hr | $200.00 | FedRAMP compliance flags, data sovereignty, audit logging
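Applying a template with overrides amounts to copying the template baseline and layering the override fields on top. In the sketch below only the critical-severity targets and penalty rates from the table are filled in, because the templates' other per-severity targets are not listed here. SLA_TEMPLATES, build_config() and the penalty_per_hour key are hypothetical, as is the choice to reject unknown override fields.

```python
from typing import Any, Dict, Optional

# Hypothetical template store: critical targets and penalty rates from the table.
SLA_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "standard":   {"critical_mtta_minutes": 30, "critical_mttr_hours": 8, "penalty_per_hour": 50.00},
    "premium":    {"critical_mtta_minutes": 15, "critical_mttr_hours": 4, "penalty_per_hour": 100.00},
    "enterprise": {"critical_mtta_minutes": 5,  "critical_mttr_hours": 1, "penalty_per_hour": 250.00},
    "government": {"critical_mtta_minutes": 10, "critical_mttr_hours": 2, "penalty_per_hour": 200.00},
}


def build_config(template: str, overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return the template baseline with field-level overrides applied on top."""
    if template not in SLA_TEMPLATES:
        raise ValueError(f"unknown template: {template!r}")
    base = dict(SLA_TEMPLATES[template])  # copy so the shared template is never mutated
    overrides = overrides or {}
    # Assumption: overrides may only touch fields the template defines.
    unknown = set(overrides) - set(base)
    if unknown:
        raise ValueError(f"unknown override fields: {sorted(unknown)}")
    return {**base, **overrides}
```

For example, build_config("premium", {"critical_mtta_minutes": 10}) keeps the premium 4-hour MTTR and $100.00 rate but tightens critical MTTA to 10 minutes.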
All endpoints are defined in app/routers/sla.py under the prefix /api/v1/sla.
Core Dashboard Metrics
GET /api/v1/sla/current?tenant_id=...
# Live metrics: open_alerts, active_breaches, warnings, overall_compliance_pct, overall_health
GET /api/v1/sla/scorecard?tenant_id=...
# Per-severity scorecard: compliance_pct, mtta_avg_min, mtta_target_min, mttr_avg_min, mttr_target_min, grade
GET /api/v1/sla/compliance?tenant_id=...&start_date=...&end_date=...
# Historical compliance report with optional ISO 8601 date range
GET /api/v1/sla/trends/mtta?tenant_id=...&days=30
# MTTA trend data points (days: 1-365)
GET /api/v1/sla/trends/mttr?tenant_id=...&days=30
# MTTR trend data points (days: 1-365)
GET /api/v1/sla/breaches?tenant_id=...
# Active and recent SLA breaches
GET /api/v1/sla/alerts/{alert_id}/tracking
# Full SLA tracking record for a specific alert (404 if not tracked)
SLA Target Definition
GET /api/v1/sla/definition?tenant_id=...
# Returns current SLADefinition for the tenant
PUT /api/v1/sla/definition?tenant_id=...
Body: SLADefinitionRequest {
critical_mtta_minutes: int (1-120), critical_mttr_hours: int (1-48),
high_mtta_minutes: int (1-240), high_mttr_hours: int (1-72),
medium_mtta_hours: int (1-24), medium_mttr_hours: int (1-168),
low_mtta_hours: int (1-72), low_mttr_hours: int (1-720),
notification_threshold_pct: int (50-99)
}
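In a FastAPI router a body like SLADefinitionRequest would typically be a Pydantic model, with the ranges above expressed as Field(ge=..., le=...) constraints. To keep the example dependency-free, this sketch enforces the same inclusive ranges with a standard-library dataclass. The defaults are borrowed from the SLADefinition table in the Data Model section and are an assumption about the request model; BOUNDS is a hypothetical name.

```python
from dataclasses import dataclass, fields

# Inclusive (min, max) bounds copied from the SLADefinitionRequest body.
BOUNDS = {
    "critical_mtta_minutes": (1, 120), "critical_mttr_hours": (1, 48),
    "high_mtta_minutes": (1, 240), "high_mttr_hours": (1, 72),
    "medium_mtta_hours": (1, 24), "medium_mttr_hours": (1, 168),
    "low_mtta_hours": (1, 72), "low_mttr_hours": (1, 720),
    "notification_threshold_pct": (50, 99),
}


@dataclass
class SLADefinitionRequest:
    # Defaults assumed to match the SLADefinition defaults.
    critical_mtta_minutes: int = 15
    critical_mttr_hours: int = 4
    high_mtta_minutes: int = 30
    high_mttr_hours: int = 8
    medium_mtta_hours: int = 4
    medium_mttr_hours: int = 24
    low_mtta_hours: int = 8
    low_mttr_hours: int = 72
    notification_threshold_pct: int = 80

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            lo, hi = BOUNDS[f.name]
            # bool is a subclass of int, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                raise TypeError(f"{f.name} must be an int")
            if not lo <= value <= hi:
                raise ValueError(f"{f.name}={value} is outside {lo}-{hi}")
```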
SLA Config CRUD
GET /api/v1/sla/templates
POST /api/v1/sla/configs/{tenant_id}
Body: { name, template: "standard|premium|enterprise|government", overrides: {} }
GET /api/v1/sla/configs/{tenant_id}
GET /api/v1/sla/configs/{tenant_id}/{config_id}
PUT /api/v1/sla/configs/{tenant_id}/{config_id} # Partial update (all fields optional)
DELETE /api/v1/sla/configs/{tenant_id}/{config_id}
POST /api/v1/sla/configs/{tenant_id}/apply-template
Body: { template_name: "standard|premium|enterprise|government" }
Escalation Chains
GET /api/v1/sla/escalations/{tenant_id}
POST /api/v1/sla/escalations/{tenant_id}
Body: { severity: "critical|high|medium|low", chain: [EscalationStepRequest, ...] }
POST /api/v1/sla/escalate/{alert_id}?tenant_id=...&severity=high
# Trigger escalation to next level; 400 if chain exhausted or undefined
GET /api/v1/sla/escalation-status/{alert_id}
GET /api/v1/sla/escalation-events/{tenant_id}?limit=50
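The escalate endpoint's behavior (advance to the next level; 400 when the chain is exhausted or undefined) can be modeled with a small in-memory engine. All names here are hypothetical. The sketch assumes an alert starts with no level assigned, so the first call moves it to the chain's first step, and that alert IDs are unique across tenants.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class EscalationStep:
    level: str               # "L1" / "L2" / "L3" / "Manager" / "CISO"
    role: str
    timeout_minutes: int
    notification_method: str


@dataclass
class EscalationEvent:
    alert_id: str
    level: str
    timestamp: str           # ISO 8601, UTC


class EscalationError(Exception):
    """No chain or no further step; a router could map this to HTTP 400."""


@dataclass
class EscalationEngine:
    chains: Dict[Tuple[str, str], List[EscalationStep]] = field(default_factory=dict)
    current: Dict[str, int] = field(default_factory=dict)  # alert_id -> index of step reached
    events: List[EscalationEvent] = field(default_factory=list)

    def set_chain(self, tenant_id: str, severity: str, chain: List[EscalationStep]) -> None:
        self.chains[(tenant_id, severity)] = list(chain)

    def escalate(self, tenant_id: str, severity: str, alert_id: str) -> EscalationStep:
        chain = self.chains.get((tenant_id, severity))
        if not chain:
            raise EscalationError(f"no chain defined for {tenant_id}/{severity}")
        next_index = self.current.get(alert_id, -1) + 1
        if next_index >= len(chain):
            raise EscalationError(f"alert {alert_id} is already at the last level")
        step = chain[next_index]
        self.current[alert_id] = next_index
        # Append a timestamped audit record for every escalation.
        self.events.append(EscalationEvent(
            alert_id, step.level, datetime.now(timezone.utc).isoformat()))
        return step
```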
Notifications
GET /api/v1/sla/notifications/{tenant_id}?limit=50
POST /api/v1/sla/notifications/{tenant_id}/configure
Body: { channels: ["websocket","email","slack"],
recipients: { "email": ["soc@example.com"] },
thresholds: [80, 90, 100] }
GET /api/v1/sla/notifications/{tenant_id}/config
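The thresholds list in the configure body ([80, 90, 100]) implies notifications at several points in the SLA window. Because the breach loop re-checks every alert every 60 seconds, the notifier needs some record of which thresholds have already fired. A minimal sketch of that deduplication, using a hypothetical thresholds_to_fire() helper and a per-alert set of thresholds already sent:

```python
from typing import List, Set


def thresholds_to_fire(elapsed_s: float, target_s: float,
                       thresholds: List[int], already_sent: Set[int]) -> List[int]:
    """Return thresholds (in % of the SLA window) newly crossed since the last check.

    already_sent is mutated so each threshold fires at most once per alert clock.
    """
    pct_elapsed = 100 * elapsed_s / target_s
    crossed = [t for t in sorted(thresholds)
               if pct_elapsed >= t and t not in already_sent]
    already_sent.update(crossed)
    return crossed
```

Keeping the set per alert (and per clock, MTTA or MTTR) means each threshold fires exactly once, even when a single loop tick skips past several thresholds at once.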
Reports and Penalties
GET /api/v1/sla/reports/{tenant_id}/daily?date=YYYY-MM-DD
GET /api/v1/sla/reports/{tenant_id}/monthly?month=YYYY-MM
GET /api/v1/sla/reports/{tenant_id}/trends?metric=mtta|mttr|compliance&days=90
POST /api/v1/sla/reports/{tenant_id}/compare
Body: { period1_start, period1_end, period2_start, period2_end } # ISO 8601
GET /api/v1/sla/penalties/{tenant_id}
GET /api/v1/sla/penalties/{tenant_id}/summary?month=YYYY-MM
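The penalty formula behind the per-hour rate is not spelled out above. One plausible model, shown here as an assumption rather than the SLAReportEngine's actual formula, charges the rate only for time beyond the SLA target, prorated to the second and rounded to cents. Decimal avoids floating-point drift in currency sums; breach_penalty() and monthly_total() are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable


def breach_penalty(elapsed_s: float, target_s: float, rate_per_hour: Decimal) -> Decimal:
    """Penalty for one breach, charged only for time beyond the SLA target (assumed model)."""
    overage_s = max(0.0, elapsed_s - target_s)
    hours = Decimal(str(overage_s)) / Decimal(3600)
    return (hours * rate_per_hour).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def monthly_total(penalties: Iterable[Decimal]) -> Decimal:
    """Sum individual breach penalties for a monthly summary."""
    return sum(penalties, Decimal("0.00"))
```

Under this model, an enterprise-tier critical alert (1 hr MTTR target, $250.00/hour) resolved after 2.5 hours is 1.5 hours over target and incurs $375.00.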
Routing
Layer | Path | Description
Frontend route (Next.js App Router) | /sla-dashboard | Main SLA Dashboard page
API prefix (FastAPI router) | /api/v1/sla | All SLA backend endpoints
Data Model
The SLA module stores its state in in-memory Python dataclasses rather than SQLAlchemy ORM tables. This lets the demo system run without database migrations while still providing realistic data, but all SLA state is lost when the API restarts.
SLADefinition
Field | Type | Default | Description
tenant_id | str | "default" | Tenant scoping key
critical_mtta_minutes | int | 15 | Max acknowledgement time for critical alerts
critical_mttr_hours | int | 4 | Max resolution time for critical alerts
high_mtta_minutes | int | 30 | Max acknowledgement time for high alerts
high_mttr_hours | int | 8 | Max resolution time for high alerts
medium_mtta_hours | int | 4 | Max acknowledgement time for medium alerts
medium_mttr_hours | int | 24 | Max resolution time for medium alerts
low_mtta_hours | int | 8 | Max acknowledgement time for low alerts
low_mttr_hours | int | 72 | Max resolution time for low alerts
notification_threshold_pct | int | 80 | % of the SLA window elapsed before the warning notification fires
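Because SLADefinition mixes minutes (critical and high MTTA) and hours (all other targets), comparing a target against tracked elapsed times, or against SLAMetric values stored in seconds, is simplest after normalizing the target to seconds. The dataclass below mirrors the table above; target_seconds() and its lookup table are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

MINUTE_S = 60
HOUR_S = 3600


@dataclass
class SLADefinition:
    tenant_id: str = "default"
    critical_mtta_minutes: int = 15
    critical_mttr_hours: int = 4
    high_mtta_minutes: int = 30
    high_mttr_hours: int = 8
    medium_mtta_hours: int = 4
    medium_mttr_hours: int = 24
    low_mtta_hours: int = 8
    low_mttr_hours: int = 72
    notification_threshold_pct: int = 80


# (severity, metric) -> (field name, seconds per unit of that field)
_TARGET_FIELDS: Dict[Tuple[str, str], Tuple[str, int]] = {
    ("critical", "mtta"): ("critical_mtta_minutes", MINUTE_S),
    ("critical", "mttr"): ("critical_mttr_hours", HOUR_S),
    ("high", "mtta"): ("high_mtta_minutes", MINUTE_S),
    ("high", "mttr"): ("high_mttr_hours", HOUR_S),
    ("medium", "mtta"): ("medium_mtta_hours", HOUR_S),
    ("medium", "mttr"): ("medium_mttr_hours", HOUR_S),
    ("low", "mtta"): ("low_mtta_hours", HOUR_S),
    ("low", "mttr"): ("low_mttr_hours", HOUR_S),
}


def target_seconds(defn: SLADefinition, severity: str, metric: str) -> int:
    """Return the MTTA or MTTR target for a severity, normalized to seconds."""
    try:
        field_name, unit_s = _TARGET_FIELDS[(severity, metric)]
    except KeyError:
        raise ValueError(f"unsupported severity/metric: {severity}/{metric}") from None
    return getattr(defn, field_name) * unit_s
```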
SLAMetric (snapshot for trend analysis)
Field | Type | Description
timestamp | str (ISO 8601) | When this snapshot was captured (every 5 minutes)
tenant_id | str | Tenant this metric belongs to
metric_type | MetricType enum | mtta / mttr / volume / auto_resolve_rate
value | float | Metric value in seconds (MTTA/MTTR) or count (volume)
severity | str | critical / high / medium / low
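The 30-day trend charts need one value per day, while snapshots arrive every 5 minutes with MTTA/MTTR stored in seconds. Below is a sketch of one way to build the daily series, assuming snapshots are dicts shaped like the table above with timezone-aware ISO 8601 timestamps. daily_averages_minutes() is hypothetical, and averaging snapshots weights each 5-minute sample equally, which is a simplification if alert volume varies during the day.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def daily_averages_minutes(snapshots: Iterable[dict], metric_type: str,
                           severity: str) -> List[Tuple[str, float]]:
    """Average snapshot values per calendar day, converted from seconds to minutes.

    Days follow each timestamp's own UTC offset. A trailing "Z" suffix is only
    accepted by datetime.fromisoformat from Python 3.11 onward.
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for snap in snapshots:
        if snap["metric_type"] != metric_type or snap["severity"] != severity:
            continue
        day = datetime.fromisoformat(snap["timestamp"]).date().isoformat()
        buckets[day].append(snap["value"])
    # Sorted by ISO date string, which is also chronological order.
    return [(day, sum(vals) / len(vals) / 60) for day, vals in sorted(buckets.items())]
```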
EscalationStep
Field | Type | Description
level | str (EscalationLevel) | L1 / L2 / L3 / Manager / CISO
role | str | Role or team to notify (e.g. "SOC Analyst", "CISO")
timeout_minutes | int (1-1440) | Minutes to wait before auto-escalating to the next step
notification_method | str | email / slack / pager / phone
Prerequisites
SLA Manager singleton — app/services/sla_manager.py must be imported (which happens via the router registration in app/main.py). The singleton starts background asyncio tasks on startup.
Notification Service — app/services/notification_service.py is called internally by the breach notifier to push WebSocket messages. The WebSocket router must also be registered.
Alerts Router integration — app/routers/alerts.py calls sla_manager.track_alert() on alert creation and updates lifecycle state on acknowledgement and resolution so the breach loop has live data.
Frontend API Client — src/lib/api-client.ts provides api.get() and api.put(). All five dashboard data fetches use Promise.all with individual .catch(() => null) guards so any single endpoint failure falls back to mock data without breaking the entire page.
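The lifecycle updates that app/routers/alerts.py drives (creation, acknowledgement, resolution) reduce to recording three timestamps per alert. In this sketch only track_alert() is named in the source. Its signature, the acknowledge() and resolve() methods, TrackerRegistry, and the rule that resolving an unacknowledged alert also stops its MTTA clock are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class SLATracker:
    alert_id: str
    tenant_id: str
    severity: str
    created_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None

    def time_to_ack_s(self) -> Optional[float]:
        """Seconds from creation to acknowledgement, or None while unacknowledged."""
        if self.acknowledged_at is None:
            return None
        return (self.acknowledged_at - self.created_at).total_seconds()

    def time_to_resolve_s(self) -> Optional[float]:
        """Seconds from creation to resolution, or None while unresolved."""
        if self.resolved_at is None:
            return None
        return (self.resolved_at - self.created_at).total_seconds()


@dataclass
class TrackerRegistry:
    trackers: Dict[str, SLATracker] = field(default_factory=dict)

    def track_alert(self, alert_id: str, tenant_id: str, severity: str,
                    at: Optional[datetime] = None) -> SLATracker:
        tracker = SLATracker(alert_id, tenant_id, severity, at or datetime.now(timezone.utc))
        self.trackers[alert_id] = tracker
        return tracker

    def acknowledge(self, alert_id: str, at: Optional[datetime] = None) -> None:
        tracker = self.trackers[alert_id]
        if tracker.acknowledged_at is None:  # the first acknowledgement wins
            tracker.acknowledged_at = at or datetime.now(timezone.utc)

    def resolve(self, alert_id: str, at: Optional[datetime] = None) -> None:
        tracker = self.trackers[alert_id]
        when = at or datetime.now(timezone.utc)
        # Assumption: resolving an unacknowledged alert also stops its MTTA clock.
        if tracker.acknowledged_at is None:
            tracker.acknowledged_at = when
        tracker.resolved_at = when
```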
UI Layout
Page Sections (top to bottom)
Header Row — Orange Timer icon, "SLA Dashboard" h1, subtitle text, and a Refresh button (RefreshCw icon spins while loading === true). Refresh calls loadData() which re-fetches all 5 API endpoints.
SLA Scorecard — 4-column grid (Critical, High, Medium, Low). Each card shows: colored severity dot, compliance percentage (green ≥95%, yellow ≥85%, red <85%), A-F grade badge (green/blue/yellow/orange/red), MTTA average vs target with a progress bar (green if within target, red if over), MTTR average vs target with same color logic. Card border/background color matches the compliance level.
Live Metrics — 5-column row of metric tiles: Open Alerts (gray Activity icon), Active Breaches (red XCircle), Warnings (yellow AlertTriangle), Overall Compliance % (dynamically colored based on thresholds), Overall Health (CheckCircle green / AlertTriangle yellow / XCircle red with status label).
Trend Charts — Side-by-side CSS bar charts (no external charting library). Each bar is green when the daily value is at or below the SLA target and red when over. An orange dashed horizontal line marks the target. Hover tooltip shows date and value in minutes. Date range labels at chart bottom. Legend: Under SLA (green), Over SLA (red), Target (orange dash).
Active Breaches Table — White rounded card with columns: Alert ID (monospace), Severity (colored dot + label), Created At (locale-formatted), Elapsed Time (red font when over target), SLA Target, Status ("Breached" red pill or "Warning" yellow pill). Empty state row: "No active breaches -- all SLAs within targets".
SLA Configuration (collapsible) — Section toggle header with ChevronUp/Down. When expanded: 4-column form grid with one column per severity level. Each column has MTTA Target (min) and MTTR Target (min) numeric inputs. Orange "Save Configuration" button submits to PUT /api/v1/sla/definition. Inline success (CheckCircle green) or failure (XCircle red) message appears next to the button after save.
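The coloring rules described above live in the TypeScript frontend; they are restated here in Python only to keep the examples in one language. compliance_color() and within_target_color() encode the stated thresholds, while bar_heights() is a hypothetical scaling rule for the CSS bar chart in which the tallest bar or the target line reaches full chart height.

```python
from typing import List, Tuple


def compliance_color(pct: float) -> str:
    """Scorecard compliance coloring: green >= 95%, yellow >= 85%, red below 85%."""
    if pct >= 95:
        return "green"
    if pct >= 85:
        return "yellow"
    return "red"


def within_target_color(value: float, target: float) -> str:
    """Progress bars and trend bars: green at or below target, red when over."""
    return "green" if value <= target else "red"


def bar_heights(values: List[float], target: float) -> Tuple[List[float], float]:
    """Scale bars to % of chart height; also return the target line position (hypothetical)."""
    top = max(values + [target])
    if top <= 0:
        return [0.0] * len(values), 0.0
    return [100 * v / top for v in values], 100 * target / top
```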