
Customer workflow — what to expect from Opencomplai

This guide walks a customer through using Opencomplai end-to-end: what to feed it, what comes out, and what obligations remain on the customer's side under the EU AI Act.

Opencomplai is a compliance toolkit, not a certification service. It produces structured, machine-readable evidence (a ScanStatusArtifact, an Annex IV dossier, and a tamper-evident ledger) that a customer can hand to an internal auditor, an external reviewer, or a notified body.

What the product actually does

  1. Classifies an AI system under EU AI Act risk levels: Unacceptable, High, Limited, Minimal.
  2. Runs deterministic compliance rules (Articles 5, 6, 25, Annex III) and produces pass/fail with a written rationale for each.
  3. Generates an Annex IV technical-documentation dossier (Article 11) per release candidate.
  4. Records every check as a Merkle-linked ledger event so the evidence chain can be independently verified.

It is not a policy/training/ticketing platform, and it does not auto-discover AI systems. Nothing happens in the running Docker stack until a customer pushes a manifest through it.

How the customer feeds it data

The only input the customer provides is a system manifest describing one AI system. Free-text intended_purpose is what the rule engine pattern-matches against Annex III categories.
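To make the idea of pattern-matching free text concrete, here is a minimal sketch. The category labels and keyword patterns are hypothetical and chosen only for illustration; the real rule engine's patterns are not documented in this guide and may work differently.

```python
import re

# Hypothetical hint patterns; NOT the rule engine's actual rule set.
ANNEX_III_HINTS = {
    "employment (Annex III, point 4)": re.compile(
        r"\b(?:hiring|recruit\w*|candidate\w*)\b", re.IGNORECASE
    ),
    "credit scoring (Annex III, point 5)": re.compile(
        r"\bcredit\s+scor\w*\b|\bcreditworthiness\b", re.IGNORECASE
    ),
}


def match_annex_iii(intended_purpose: str) -> list[str]:
    """Return the hint categories whose pattern occurs in the free text."""
    return [
        name
        for name, pattern in ANNEX_III_HINTS.items()
        if pattern.search(intended_purpose)
    ]


if __name__ == "__main__":
    print(match_annex_iii("automated credit scoring for retail lending"))
```

The sketch shows why an accurate intended_purpose matters: a vague or misleading description simply would not match, and the system would be classified at a lower tier than it should be.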

Step 1 — install the CLI

pip install opencomplai

Step 2 — initialise a manifest per AI system

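A MINIMAL-risk system needs only two flags. The example reuses the credit-scoring system used throughout this guide, which in practice is HIGH-risk and needs the extra fields described further down: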

opencomplai init \
  --system-id "loan-decision-model" \
  --intended-purpose "automated credit scoring for retail lending"

This creates system-manifest.json:

{
  "system_id": "loan-decision-model",
  "intended_purpose": "automated credit scoring for retail lending",
  "compliance_target": "EU_AI_ACT",
  "high_risk_presumption": false,
  "commit_ref": "HEAD"
}

For HIGH-risk systems (anything that maps to Annex III, including the credit-scoring example above), the manifest must carry the real Annex IV Section 2 and 3 content. Either pass them inline:

opencomplai init \
  --system-id "loan-decision-model" \
  --intended-purpose "automated credit scoring for retail lending" \
  --high-risk-presumption \
  --training-data-description "5M anonymised loan applications 2018-2025, EU-only, GDPR-cleared lineage in s3://opencomplai-evidence/training-data-manifest.json" \
  --model-architecture "Gradient-boosted decision trees (XGBoost 2.0), 1200 trees, depth 8, calibrated isotonic probabilities" \
  --monitoring-approach "Hourly PSI drift checks per protected attribute; weekly KS test against the training distribution" \
  --incident-response-procedure "runbooks/credit-scoring-incident.md (15-min p0 SLA, on-call rotation in PagerDuty)"

…or supply the long-form structured fields from a JSON file:

opencomplai init \
  --system-id "loan-decision-model" \
  --intended-purpose "automated credit scoring for retail lending" \
  --high-risk-presumption \
  --section-extras-file ./manifest-extras.json

Where manifest-extras.json looks like:

{
  "training_data_description": "5M anonymised loan applications…",
  "model_architecture": "Gradient-boosted decision trees…",
  "performance_metrics": { "auc": 0.83, "calibration_error": 0.04 },
  "known_limitations": [
    "Degrades on applicants under 6 months of credit history",
    "Not validated for non-EU residency"
  ],
  "human_oversight_measures": [
    "All declines reviewed by a human underwriter within 24h",
    "Adverse-action notices generated by humans, not the model"
  ],
  "monitoring_approach": "Hourly PSI drift checks per protected attribute…",
  "incident_response_procedure": "runbooks/credit-scoring-incident.md"
}

If --high-risk-presumption is set without training_data_description or model_architecture, init emits a warning — the resulting dossier would otherwise misrepresent the system to an auditor.
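A pre-commit check in the same spirit as that warning could look like the sketch below. It assumes the Section 2 fields end up as top-level keys in system-manifest.json under the names used in manifest-extras.json; that layout is an assumption, and missing_high_risk_fields is a hypothetical helper, not part of the CLI.

```python
import json
from pathlib import Path

# Fields the init warning is described as checking for HIGH-risk manifests.
REQUIRED_WHEN_HIGH_RISK = ("training_data_description", "model_architecture")


def missing_high_risk_fields(manifest: dict) -> list[str]:
    """Return required Section 2 fields that are absent or blank.

    Returns an empty list when high_risk_presumption is not set.
    """
    if not manifest.get("high_risk_presumption", False):
        return []
    return [
        field
        for field in REQUIRED_WHEN_HIGH_RISK
        if not str(manifest.get(field) or "").strip()
    ]


if __name__ == "__main__":
    path = Path("system-manifest.json")
    if path.exists():
        gaps = missing_high_risk_fields(json.loads(path.read_text()))
        print("missing fields:", gaps or "none")
```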

Step 2b — (optional) corroborate declaration against code

The code corroboration scanner cross-checks your intended_purpose against AI capability signals in the repository (dependencies, imports, endpoints, model artifacts). It runs offline and never auto-edits the manifest or risk classification.

opencomplai scan --manifest system-manifest.json --repo-root .
# or opt-in during init/check:
opencomplai init ... --scan
opencomplai check --manifest system-manifest.json --scan

Honesty rules:

  • Declaration remains authoritative; the scanner surfaces gaps for human reconciliation.
  • "No local AI signals detected" is not a compliance verdict.
  • Use --fail-on new-major in CI only when you are ready to gate on new discrepancies.

See scan command reference and examples/sample-system/under-declared-* fixtures.

Step 3 — run a check against the running stack

OPENCOMPLAI_API_URL=http://localhost:8080 opencomplai check

The 10-step service-backed workflow runs. In outline: validate manifest → classify → run control checks → generate Annex IV dossier → append events to the evidence ledger. This is the step that actually populates the evidence-vault database and Grafana metrics.

Step 4 — generate the Annex IV dossier (per release)

OPENCOMPLAI_API_URL=http://localhost:8080 opencomplai docs generate \
  --manifest system-manifest.json \
  --system-id "loan-decision-model" \
  --commit-ref "$(git rev-parse HEAD)" \
  --intended-purpose "automated credit scoring for retail lending" \
  --provider-name "ACME Financial AI"

Passing --manifest ensures the Section 2/3 inputs from opencomplai init reach the dossier generator. Without it, those sections fall back to placeholder stubs — acceptable only for MINIMAL-risk systems.

Step 5 — verify the evidence chain

python3 tools/verify-ledger/verify_ledger.py --gateway-url http://localhost:8080
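For intuition about what "verifying the chain" means, the sketch below implements a simple linear hash chain: each entry's hash covers the previous hash plus the entry's payload, so editing any past event breaks every later link. This is an illustration only. The field names (prev_hash, payload, hash) and the canonical-JSON encoding are assumptions, and the actual ledger format checked by verify_ledger.py may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def event_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"prev_hash": prev, "payload": payload, "hash": event_hash(prev, payload)}
    )


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edited payload or hash makes this False."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != event_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because verification only needs the entries themselves, a reviewer can run it without trusting the service that wrote them.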

What the customer receives

Opencomplai produces exactly four artifacts. There is no scheduled email, no compliance score, no opinion. Everything is on-demand and machine-readable.

| # | Artifact | Where | Purpose |
|---|----------|-------|---------|
| 1 | Human assessment table | opencomplai check stdout | Quick read: risk level + per-rule pass/fail + rationale. |
| 2 | compliance-artifact.json (ScanStatusArtifact) | working directory | Machine-readable CI gate result. Fields: result, failed_controls, evidence_hashes, rationale_hash, signature. |
| 3 | dossier_<id>.json (Annex IV technical documentation) | --output-dir or evidence-vault CAS | The deliverable a regulator or notified body asks for under Article 11. Five sections: (1) system description, (2) development process + training data, (3) human oversight & monitoring, (4) logging, (5) risk management. SHA-256 bundle_checksum. |
| 4 | Merkle-linked ledger events | evidence-vault PostgreSQL | Tamper-evident audit trail (compliance_check_started, compliance_check_completed, …). Independently verifiable. |
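A recipient of a dossier can re-derive a SHA-256 checksum and compare it with the stored one. The sketch below assumes the checksum covers a canonical JSON encoding of every field except bundle_checksum itself; that scope and encoding are assumptions, so check the dossier schema before relying on it.

```python
import hashlib
import json


def compute_checksum(dossier: dict) -> str:
    """SHA-256 of the dossier's canonical JSON, excluding bundle_checksum."""
    body = {k: v for k, v in dossier.items() if k != "bundle_checksum"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def checksum_matches(dossier: dict) -> bool:
    """True when the stored checksum equals the recomputed one."""
    return dossier.get("bundle_checksum") == compute_checksum(dossier)
```

Sorting keys and fixing the separators makes the encoding deterministic, so the same content always yields the same digest regardless of key order in the file.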

CI exit codes double as the report for automated gates:

| Code | Meaning |
|------|---------|
| 0 | pass |
| 1 | control failure (one or more rules failed) |
| 2 | invalid manifest |
| 3 | policy block (prohibited practice, Article 5) |
| 4 | trap detected (substantial modification, Article 25) |
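A CI wrapper can translate these codes into a gate decision. The sketch below takes the code-to-meaning mapping from the table above and treats codes 1, 3 and 4 as merge-blocking, as the operating checklist in this guide recommends. The helper names are hypothetical, and most CI runners will fail the job on any non-zero exit regardless.

```python
# Exit codes as listed in the table above.
MEANINGS = {
    0: "pass",
    1: "control failure (one or more rules failed)",
    2: "invalid manifest",
    3: "policy block (prohibited practice, Article 5)",
    4: "trap detected (substantial modification, Article 25)",
}

# Codes the operating checklist names as merge-blocking.
BLOCKING = frozenset({1, 3, 4})


def describe(returncode: int) -> str:
    """Human-readable meaning of an opencomplai check exit code."""
    return MEANINGS.get(returncode, f"unknown exit code {returncode}")


def blocks_merge(returncode: int) -> bool:
    """True when the checklist says this exit code should block a merge."""
    return returncode in BLOCKING
```

In a pipeline, the return code would typically come from subprocess.run(["opencomplai", "check"]).returncode.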

The customer's operating checklist

Opencomplai gives you evidence and rule outputs — it does not make you compliant on its own. A realistic operating checklist:

  1. One manifest per AI system, committed alongside that system's code. intended_purpose must be accurate — it drives Annex III classification.
  2. opencomplai check runs in CI on every PR, with exit codes 1/3/4 blocking merges.
  3. opencomplai docs generate runs on every release tag, producing the Annex IV dossier for that version. Store dossier_<id>.json + bundle_checksum as a release artifact.
  4. For HIGH-risk systems (Annex III match), the team owes the real substance behind the dossier. These fields are NOT auto-filled by the engine:
       • Section 2: training-data description, model architecture, performance metrics, known limitations. Supply via opencomplai init --training-data-description ... --model-architecture ... or --section-extras-file.
       • Section 3: human-oversight measures, monitoring approach, incident-response procedure. Same input path.
       • Section 5: rationale + failed-rule remediation (carried by the rule outputs).
     When the manifest does not provide these, the generator falls back to stubs ("Not specified in this release.") and the dossier's signature_status makes the trust level explicit so an auditor cannot mistake the artifact for a fully populated one.
  5. Run verify-ledger periodically (weekly, and before any audit). If the chain breaks, evidence is no longer trustworthy.
  6. Retain logs for the EU-AI-Act-required period. The default LOG_RETENTION_DAYS=2555 (7 years) is already set in Section 4 of the dossier.
  7. If EU_AIA_ART25_MODIFICATION_TRAP fires (substantial modification declared), do not redeploy until a fresh conformity assessment is signed off. The system enforces this with exit code 4.
  8. Air-gap mode: set EGRESS_ALLOWED_DESTINATIONS= (empty) in infra/compose/.env if the customer needs to prove no data leaves their network during assessments.

Gateway authentication and dossier signing modes

Gateway auth (OPENCOMPLAI_API_KEY) — the gateway refuses to start unless one of the following is set in infra/compose/.env:

  • OPENCOMPLAI_API_KEY=<strong-shared-secret> — every non-/health request must carry x-api-key: <secret>. This is the only acceptable production setting.
  • OPENCOMPLAI_AUTH_DISABLED=1 — explicit dev/CI escape hatch, logs a warning on every boot. Never use in production.

Generate a secret with:

python -c "import secrets; print(secrets.token_urlsafe(32))"
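A client then sends that secret in the x-api-key header on every non-/health request. The sketch below only builds the request object with the standard library. The /v1/example path is a hypothetical placeholder and not a documented gateway route, and build_gateway_request is an illustrative helper.

```python
import urllib.request


def build_gateway_request(
    base_url: str, path: str, api_key: str
) -> urllib.request.Request:
    """Build a GET request that carries the shared secret in x-api-key."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"x-api-key": api_key})


if __name__ == "__main__":
    # "/v1/example" is a placeholder path used for illustration only.
    req = build_gateway_request("http://localhost:8080", "/v1/example", "change-me")
    print(req.full_url, req.header_items())
    # To actually send it: urllib.request.urlopen(req)
```

In practice the key would be read from the environment (for example os.environ["OPENCOMPLAI_API_KEY"]) and never hard-coded.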

Dossier signing modes — the dossier's own signature_status field tells an auditor exactly what trust level the artifact carries:

| signature_status | What it means | When |
|------------------|---------------|------|
| unsigned | No signature applied. Bundle checksum is still present for tamper detection. | OSS default. |
| hmac-local | HMAC-SHA256 with a local symmetric key (LOCAL_SIGNING_KEY_PATH). Verifiable only by holders of the same key. | OSS with a configured local key; adequate for in-org integrity, not for third-party audit. |
| ed25519 | Asymmetric Ed25519 signature (DOSSIER_SIGNING_KEY_PATH). Verifiable by anyone holding the corresponding published public key. | Pro/Enterprise, or any deployment that publishes a verification key. |

When both signing keys are set, Ed25519 always wins — the system never silently downgrades to a weaker mode.
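The precedence rule and the hmac-local mode can be sketched with the standard library alone. The environment-variable names come from the signing-mode descriptions above; the function names, and the choice of which dossier bytes get signed, are assumptions for illustration and may not match the product's implementation.

```python
import hashlib
import hmac
from collections.abc import Mapping


def select_signing_mode(env: Mapping[str, str]) -> str:
    """Pick the strongest configured mode; Ed25519 wins when both are set."""
    if env.get("DOSSIER_SIGNING_KEY_PATH"):
        return "ed25519"
    if env.get("LOCAL_SIGNING_KEY_PATH"):
        return "hmac-local"
    return "unsigned"


def hmac_sign(key: bytes, payload: bytes) -> str:
    """HMAC-SHA256 signature as a hex string."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def hmac_verify(key: bytes, payload: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and supplied signatures."""
    return hmac.compare_digest(hmac_sign(key, payload), signature)
```

The sketch also shows why hmac-local is not suited to third-party audit: verifying requires the same secret key that signing does.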

What Opencomplai will NOT do for you

  • It will not auto-fill training-data lineage, performance metrics, or oversight procedures. Those are human inputs into the dossier.
  • It will not classify by reading model weights or code — only by the manifest's free-text intended_purpose and explicit answers (profiling_detected, substantial_modification, high_risk_presumption).
  • It is not certification. The dossier is a structured input for an internal conformity assessment or a notified-body review — not a regulator-issued stamp.
  • The OSS edition produces an unsigned dossier by default, an hmac-local dossier when a symmetric key is configured, or an ed25519 dossier when an Ed25519 PEM private key is configured. HSM/KMS key management, key rotation, and a hosted multi-tenant verification view are the Pro/Enterprise (SaaS) tier.

Not sure if the EU AI Act applies to your system?

Use the EU AI Act Checker — an interactive wizard that runs entirely in the browser and walks through provider/deployer scope, high-risk classification, GPAI, and obligations. No account or backend required.

TL;DR

Run opencomplai init → opencomplai check → opencomplai docs generate against the Docker stack, once per AI system per release. You receive compliance-artifact.json (CI pass/fail), dossier_<id>.json (Annex IV documentation), and a tamper-evident ledger entry. That bundle is what you hand to your auditor. The Docker stack stays empty until that CLI runs.


Demo data — pre-seeded reference scenarios

The running stack is pre-loaded with five representative AI systems covering every risk tier and a range of real-world compliance narratives. These are safe to explore, reset, and re-seed at any time — everything is prefixed demo- so it cannot touch production data.

The five demo systems

| System ID | Name | Risk class | EU AI Act category | Narrative |
|-----------|------|------------|--------------------|-----------|
| demo-credit-scoring-v1 | Credit Risk Scorer v1.3.0 | HIGH | Art. 6 + Annex III §5b | Mostly passing, with a mid-period failure window on controls CTRL-002 and CTRL-005, plus bias alerts. Recovers cleanly. |
| demo-hr-hiring-v2 | HR Candidate Ranker v2.0.1 | HIGH | Annex III §4a | Passes, then 5 consecutive failures on CTRL-004, then a HITL halt (3-week scan gap), then full remediation and resumption. |
| demo-medical-triage-v1 | Medical Triage Assistant v1.1.0 | HIGH | Annex III §1a | All 30 scans pass, but the policy bundle is frozen at v1.0.0, which triggers a policy-drift alert even with a clean scan record. |
| demo-customer-chat-v1 | Customer Service Bot v3.2.0 | LIMITED | Art. 50 transparency | Continuously green. 30 passing scans, no failures, very low pending verifications. Shows what a well-maintained limited-risk system looks like. |
| demo-inventory-opt-v1 | Inventory Optimizer v1.0.4 | MINIMAL | Not listed (MINIMAL) | 30 passing scans at ~98% control pass rate. Minimal documentation required. Baseline for the simplest possible compliance posture. |

What is seeded

Across the five demo systems, the seeder injects:

  • 5 risk-classification ledger events (one per system, timestamped 91 days ago — before any scan)
  • 147 scan-status artifacts across a 90-day rolling window (30 per system except HR Hiring which has 27 due to the 3-week HITL halt gap)
  • 4 Annex IV dossiers for the three HIGH-risk systems (HR Hiring gets two — pre-halt v2.0.1 and post-remediation v2.0.1-remediated)
  • 5 compliance badges (one per system, result=pass, pending_verifications_count=0)
  • 3 HITL ledger events for demo-hr-hiring-v2: hitl_halt (day −65), hitl_review_started (day −65), hitl_resume (day −50)
  • 8 bias alerts: 4 for Credit Scoring (HIGH → HIGH → MEDIUM → LOW, showing a severity trend that resolves), 4 for HR Hiring (HIGH × 2 pre-halt, MEDIUM and LOW post-remediation)

Grafana dashboard

Open http://localhost:3001 after docker compose up. The Opencomplai — Compliance Health dashboard shows live values from the seeded data:

| Panel | Seeded value |
|-------|--------------|
| Control pass rate | ~94.6% (139 pass / 147 total) |
| First scans completed (total) | 124 |
| Dossiers generated (total) | 4 |
| Badges issued (total) | 5+ |
| Egress blocked (total) | 0 |

Panels update in real time — every opencomplai check you run increments the counters.
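The seeded pass rate follows directly from the scan counts given for the demo data: four systems with 30 scans each plus HR Hiring with 27. A short check of the arithmetic, using only numbers stated in this guide:

```python
# Scan counts per demo system, as stated in the seeding summary.
SCANS_PER_SYSTEM = {
    "demo-credit-scoring-v1": 30,
    "demo-hr-hiring-v2": 27,  # three scans missing during the HITL halt
    "demo-medical-triage-v1": 30,
    "demo-customer-chat-v1": 30,
    "demo-inventory-opt-v1": 30,
}

PASSING_SCANS = 139  # from the Control pass rate panel


def pass_rate(passing: int, total: int) -> float:
    """Fraction of scans that passed."""
    return passing / total


total_scans = sum(SCANS_PER_SYSTEM.values())
rate = pass_rate(PASSING_SCANS, total_scans)
print(f"{PASSING_SCANS} / {total_scans} = {rate:.1%}")
```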

Using the demo CLI

The seeder runs automatically on docker compose up. To run it manually or explore the reset workflow:

Re-seed (idempotent, safe to run at any time):

docker exec compose-evidence-vault-1 python3 /app/scripts/seed_demo.py \
  --gateway http://gateway-api:8080 \
  --vault http://localhost:8002

Dry-run — print all payloads without writing anything:

docker exec compose-evidence-vault-1 python3 /app/scripts/seed_demo.py \
  --gateway http://gateway-api:8080 \
  --vault http://localhost:8002 \
  --dry-run

Wipe all demo- data:

docker exec compose-evidence-vault-1 python3 /app/scripts/reset_demo.py \
  --vault http://localhost:8002

Wipe and immediately re-seed:

docker exec compose-evidence-vault-1 python3 /app/scripts/reset_demo.py \
  --vault http://localhost:8002 \
  --reseed

From the host (if Python is installed locally):

python scripts/seed_demo.py \
  --gateway http://localhost:8080 \
  --vault http://localhost:8002

Tracing a complete narrative end-to-end

HR Hiring — HITL halt and remediation is the richest scenario to follow:

  1. Query the ledger for the halt event:
    curl "http://localhost:8002/v1/evidence/events?system_id=demo-hr-hiring-v2"
    
  2. Notice the 3-week scan gap in the timeline (indices 15–17 missing).
  3. Check the two dossiers — one generated before the halt (v2.0.1-demo), one after remediation (v2.0.1-remediated-demo).
  4. Inspect bias alerts: severity drops from HIGH to LOW across the 4 alerts (HIGH × 2 pre-halt, then MEDIUM and LOW post-remediation).
  5. Run verify-ledger to confirm the Merkle chain is intact across the halt:
    python3 tools/verify-ledger/verify_ledger.py --gateway-url http://localhost:8080
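Step 2 above can be automated once the scan indices have been extracted from the events. The sketch below assumes each scan carries an integer index from 0 to 29; that field and its numbering are assumptions about the seeded data, not a documented schema, and missing_indices is a hypothetical helper.

```python
from collections.abc import Iterable


def missing_indices(present: Iterable[int], expected_count: int) -> list[int]:
    """Return the indices in range(expected_count) that never appear."""
    seen = set(present)
    return [i for i in range(expected_count) if i not in seen]


if __name__ == "__main__":
    # Simulated HR Hiring timeline: 30 slots, with 15 to 17 absent.
    observed = [i for i in range(30) if i not in (15, 16, 17)]
    print(missing_indices(observed, 30))
```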
    

Credit Scoring — mid-period failure recovery shows what CTRL-002/CTRL-005 failures look like in the scan history and how bias alert severity tracks alongside (HIGH bias alerts coincide with the scan failures, then both resolve).

Medical Triage — policy drift without scan failure is the edge case where everything looks green on the surface (100% pass rate) but the policy bundle has not been updated for 90 days — demonstrating that a clean scan record alone is not sufficient evidence of compliance.