Observability Setup
Compliance mapping: ISO 27001 A.8.16 · SOC 2 CC7.2 · NIST DE.CM · FedRAMP AU-6/SI-4
Overview
Opencomplai emits OpenTelemetry (OTel) traces and Prometheus metrics from every service. The Docker Compose stack bundles a full observability pipeline:
| Component | Role |
|---|---|
| `otel-collector` | Receives OTLP from all services; exports metrics to Prometheus |
| `prometheus` | Stores time-series metrics; queried by Grafana as a data source |
| `grafana` | Visualises metrics; hosts the Opencomplai compliance health dashboard |
Quick Start
- Copy the env template (OTel export is already enabled in it):
cp infra/compose/.env.example infra/compose/.env
# OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME are enabled by default
- Start the stack:
- Open Grafana at http://localhost:3001, or at the port set in `GRAFANA_HOST_PORT` (default access: anonymous, with the Viewer role).
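To confirm the stack came up, a small script can probe Prometheus's readiness endpoint (`/-/ready`) and Grafana's health API (`/api/health`). This is a minimal sketch, not part of Opencomplai: `health_urls` and `is_up` are hypothetical helpers, and the script assumes both UIs are published on `localhost` at the ports given by `PROMETHEUS_HOST_PORT` and `GRAFANA_HOST_PORT`, falling back to the documented defaults.

```python
import http.client
import os
import urllib.request
from typing import Dict, Mapping


def health_urls(env: Mapping[str, str]) -> Dict[str, str]:
    """Build health-check URLs from the documented host-port variables.

    Assumption: the stack is published on localhost; ports fall back to the
    documented defaults (9090 for Prometheus, 3001 for Grafana).
    """
    prom_port = env.get("PROMETHEUS_HOST_PORT", "9090")
    grafana_port = env.get("GRAFANA_HOST_PORT", "3001")
    return {
        "prometheus": f"http://localhost:{prom_port}/-/ready",     # Prometheus readiness endpoint
        "grafana": f"http://localhost:{grafana_port}/api/health",  # Grafana health API
    }


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the URL answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection errors, non-2xx responses (HTTPError) and protocol errors count as down.
        return False


if __name__ == "__main__":
    for name, url in health_urls(os.environ).items():
        print(f"{name}: {'up' if is_up(url) else 'DOWN'} ({url})")
```

Reading the ports from the environment keeps the check in step with whatever `.env` overrides you applied.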
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://otel-collector:4317` | gRPC OTLP endpoint for trace/metric export |
| `OTEL_SERVICE_NAME` | `opencomplai` | Service name tag on all telemetry |
| `PROMETHEUS_HOST_PORT` | `9090` | Host port for the Prometheus UI |
| `GRAFANA_HOST_PORT` | `3001` | Host port for the Grafana UI |
To disable trace export, leave `OTEL_EXPORTER_OTLP_ENDPOINT` unset (for example, by removing it from `infra/compose/.env`). Prometheus metrics remain exposed on each service's `/metrics` endpoint either way.
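The rule above (an unset endpoint means no trace export, while `/metrics` stays on) might look like this inside a service. This is a sketch under assumptions: `TelemetryConfig` and `resolve_telemetry_config` are hypothetical names, treating a blank value like an unset one is an assumption, and the fallback service name reuses the documented default. The decision is made in application code here because the OpenTelemetry SDK's OTLP exporters fall back to a localhost endpoint when the variable is unset rather than switching themselves off.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetryConfig:
    otlp_endpoint: Optional[str]   # None means OTLP trace export is disabled
    service_name: str
    metrics_path: str = "/metrics"  # per the docs, exposed regardless of OTLP settings

    @property
    def traces_enabled(self) -> bool:
        return self.otlp_endpoint is not None


def resolve_telemetry_config(env: Mapping[str, str]) -> TelemetryConfig:
    """Hypothetical helper: derive telemetry settings from environment variables."""
    # Assumption: a blank value is treated the same as an unset one.
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").strip() or None
    service = env.get("OTEL_SERVICE_NAME", "").strip() or "opencomplai"
    return TelemetryConfig(otlp_endpoint=endpoint, service_name=service)
```

For example, `resolve_telemetry_config(os.environ).traces_enabled` would be `False` on an air-gapped host where the endpoint was removed from `.env`.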
Instrumented Events
All services emit the following canonical events (PRD Section 11.1):
| Event | Metric Counter | Meaning |
|---|---|---|
| `compliance_check_started` | `opencomplai_compliance_check_started_total` | A compliance scan was initiated |
| `compliance_check_completed` | `opencomplai_compliance_check_completed_total` | A compliance scan finished (with status label) |
| `trap_detected` | `opencomplai_trap_detected_total` | Substantial modification trap fired |
| `override_submitted` | `opencomplai_override_submitted_total` | Break-glass / HITL override submitted |
| `verification_failed` | `opencomplai_verification_failed_total` | Claim verification or auth failure |
| `dossier_generated` | `opencomplai_dossier_generated_total` | Annex IV dossier produced |
| `egress_blocked` | `opencomplai_egress_blocked_total` | Outbound request blocked by egress-proxy |
| `badge_issued` | `opencomplai_badge_issued_total` | Compliance badge issued |
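The table follows a fixed naming rule: each event's counter is `opencomplai_<event>_total`. The sketch below illustrates that rule together with a tiny in-process counter that renders the Prometheus text exposition format. It is illustrative only: `counter_name` and `EventCounters` are hypothetical, a real service would more likely use an established client library (for example `prometheus_client` or the OpenTelemetry metrics SDK), the `status="pass"` label value is an assumption, and label values are not escaped.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

CANONICAL_EVENTS = (
    "compliance_check_started",
    "compliance_check_completed",
    "trap_detected",
    "override_submitted",
    "verification_failed",
    "dossier_generated",
    "egress_blocked",
    "badge_issued",
)

# A series is identified by its counter name plus its sorted label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def counter_name(event: str) -> str:
    """Map a canonical event to its counter name (opencomplai_<event>_total)."""
    if event not in CANONICAL_EVENTS:
        raise ValueError(f"unknown event: {event}")
    return f"opencomplai_{event}_total"


class EventCounters:
    """Hypothetical minimal counter store; not thread-safe."""

    def __init__(self) -> None:
        self._values: DefaultDict[SeriesKey, float] = defaultdict(float)

    def emit(self, event: str, **labels: str) -> None:
        key = (counter_name(event), tuple(sorted(labels.items())))
        self._values[key] += 1

    def render(self) -> str:
        """Render as Prometheus text format (TYPE lines only, no HELP, no escaping)."""
        lines: List[str] = []
        typed: Set[str] = set()
        for (name, labels), value in sorted(self._values.items()):
            if name not in typed:  # one TYPE line per counter, before its first series
                lines.append(f"# TYPE {name} counter")
                typed.add(name)
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value:g}")
        return "\n".join(lines) + "\n"
```

After two `emit("compliance_check_completed", status="pass")` calls, `render()` contains the line `opencomplai_compliance_check_completed_total{status="pass"} 2`.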
Grafana Dashboard Panels
The provisioned Opencomplai — Compliance Health dashboard includes:
- Time to first scan (P95 ms) — latency gauge
- Control pass rate — percentage of scans that pass all controls
- Trap detection frequency — rate of `trap_detected` events by system
- Override rate — rate of `override_submitted` events
- Egress blocked events — total `EGRESS_BLOCKED` count (red alert threshold: ≥10)
- BREAK_GLASS_ACTIVATED count — total override activations (red alert threshold: ≥1)
- Audit Events Rate — rate of audit events entering the ledger
- Auth Failure Rate — rate of verification/auth failures (brute-force indicator)
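Panels such as Control pass rate are derived from counter increases over a time window rather than from raw totals. The sketch below shows that arithmetic for two scrapes of `opencomplai_compliance_check_completed_total`, assuming the counter carries a `status` label whose passing value is `pass` (the real label values are not documented here). A drop between scrapes is treated as a counter reset, as Prometheus does, but without the extrapolation that PromQL's `increase()` applies. `counter_increase` and `control_pass_rate` are hypothetical helpers.

```python
from typing import Mapping, Optional


def counter_increase(prev: float, curr: float) -> float:
    """Increase between two samples of a monotonic counter.

    If the value dropped, assume the counter was reset (e.g. a service
    restart) and count the current value as the increase since the reset.
    """
    return curr - prev if curr >= prev else curr


def control_pass_rate(
    prev: Mapping[str, float],
    curr: Mapping[str, float],
    passing_status: str = "pass",  # assumption: label value for a fully passing scan
) -> Optional[float]:
    """Percentage of scans completed in the window that passed all controls.

    prev/curr map each status label value to its counter value at the
    start and end of the window. Returns None if no scans completed.
    """
    increases = {s: counter_increase(prev.get(s, 0.0), v) for s, v in curr.items()}
    total = sum(increases.values())
    if total == 0:
        return None
    return 100.0 * increases.get(passing_status, 0.0) / total
```

For example, going from `{"pass": 10, "fail": 5}` to `{"pass": 28, "fail": 7}` gives 18 passes out of 20 completions, a 90% pass rate for the window.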
Alert Routing
For production deployments, configure alert routing in Grafana or your SIEM:
| Alert | Severity | Response |
|---|---|---|
| EGRESS_BLOCKED ≥ 10 in 5 min | High | Investigate potential exfiltration |
| BREAK_GLASS_ACTIVATED ≥ 1 | Critical | Verify HITL approval exists |
| Auth failures > 5/min | High | Potential brute-force — review source IP |
| Compliance check error rate > 5% | Medium | Service degradation — check service health |
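The routing table reduces to a handful of threshold comparisons. The sketch below encodes them so the boundaries are explicit: `>=` for the EGRESS_BLOCKED and BREAK_GLASS_ACTIVATED rows, strict `>` for the other two. `WindowStats` and `evaluate_alerts` are hypothetical; in practice these rules would live in Grafana alerting or your SIEM, and the error rate is assumed to be errored checks divided by completed checks in the window.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WindowStats:
    """Hypothetical snapshot of counts for one evaluation window."""
    egress_blocked_5m: int        # EGRESS_BLOCKED events in the last 5 minutes
    break_glass_activations: int  # BREAK_GLASS_ACTIVATED events in the window
    auth_failures_per_min: float
    checks_completed: int
    checks_errored: int           # assumption: completed checks whose status is an error


def evaluate_alerts(s: WindowStats) -> List[Tuple[str, str, str]]:
    """Return (alert, severity, response) for every threshold that is crossed."""
    fired: List[Tuple[str, str, str]] = []
    if s.egress_blocked_5m >= 10:
        fired.append(("EGRESS_BLOCKED", "High", "Investigate potential exfiltration"))
    if s.break_glass_activations >= 1:
        fired.append(("BREAK_GLASS_ACTIVATED", "Critical", "Verify HITL approval exists"))
    if s.auth_failures_per_min > 5:
        fired.append(("Auth failures", "High", "Potential brute-force; review source IP"))
    # Guard against division by zero when no checks completed in the window.
    if s.checks_completed and s.checks_errored / s.checks_completed > 0.05:
        fired.append(("Compliance check error rate", "Medium", "Service degradation; check service health"))
    return fired
```

With exactly 10 blocked egress requests, 5 auth failures per minute and a 5% error rate, only the EGRESS_BLOCKED alert fires, because the other two thresholds are strict.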
Air-Gapped Deployments
In air-gapped environments where OTel export is not possible:
- Leave `OTEL_EXPORTER_OTLP_ENDPOINT` unset.
- Prometheus still scrapes each service's `/metrics` endpoint directly.
- Traces are emitted locally but not forwarded to the collector.
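Without the collector, counters can still be read straight from a service's `/metrics` output. The sketch below parses the common subset of the Prometheus text format (`name{labels} value [timestamp]`) and sums one counter across its label sets. It is a simplified, hypothetical helper (`parse_samples`, `counter_total`): it skips comment lines and does not handle escaped quotes or `}` inside label values, so a proper parser is preferable for anything beyond quick checks.

```python
import re
from typing import Dict, List, Tuple

# `name{labels} value` or `name value`, with an optional trailing timestamp.
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)(?:\s+\S+)?$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse sample lines into (metric name, labels, value) tuples."""
    samples: List[Tuple[str, Dict[str, str], float]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = _SAMPLE.match(line)
        if m is None:
            continue  # simplified parser: ignore lines it does not understand
        labels = dict(_LABEL.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def counter_total(text: str, name: str) -> float:
    """Sum every labelled series of one counter, e.g. opencomplai_egress_blocked_total."""
    return sum(value for metric, _, value in parse_samples(text) if metric == name)
```

The text itself can be fetched with `urllib.request.urlopen(url).read().decode()` against whichever `/metrics` URL the service exposes in your deployment.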
See Air-Gap Deployment for full configuration.