[Tier 6] Configuration baselines (CIS) + drift detection + scorecards #60

@dcoln25-writer

Problem

The detection engine today is event-driven — rules fire when something happens. There's no notion of point-in-time configuration state and no answer to "how does this org score against CIS Google Workspace Benchmark v1.5?"

Buyers routinely ask:

  • "What's our current CIS benchmark score?"
  • "Show me drift from our chosen baseline over the last 30 days."
  • "Did anything that should be enabled get turned off?"

Event-driven detection misses these because:

  • The misconfiguration may have existed since before Aperio was installed (no event).
  • The fix may have been silently reverted (the reversion fires an event, but operators need the current state, not the event log).

Goals

  1. Configuration snapshotter — per-connector, periodic capture of the entire posture-relevant config surface into a typed snapshot.
  2. Baseline catalog — ship CIS Google Workspace, CIS M365, CIS GitHub, CIS Okta benchmarks as YAML; let customers author org-specific baselines.
  3. Scoring engine — score each connector snapshot against each enabled baseline; persist time-series of scores.
  4. Drift detection — diff today's snapshot against yesterday's; alert on regressions away from the baseline.
  5. UI: per-org baseline scorecards, drift timeline, per-control drill-down with remediation guidance.

Non-goals

Proposed design

Snapshot model

model ConfigurationSnapshot {
  id              String   @id @default(cuid())
  organizationId  String   @map("organization_id")
  integrationId   String   @map("integration_id")
  provider        SaaSProvider
  snapshotData    Json     @map("snapshot_data")        // typed per provider
  snapshotHash    String   @map("snapshot_hash") @db.VarChar(64) // sha256 for change detection
  capturedAt      DateTime @default(now()) @map("captured_at")
  organization    Organization @relation(...)
  integration     IntegrationConnection @relation(fields: [integrationId], references: [id], onDelete: Cascade)
  baselineScores  BaselineScore[]       // back-relation for BaselineScore.snapshot
  @@index([organizationId, integrationId, capturedAt])
  @@index([organizationId, provider, capturedAt])
  @@map("configuration_snapshots")
}
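A minimal sketch of how snapshotHash could be derived, assuming the snapshot is held as a generic JSON object before it is persisted. The SnapshotHash function name is hypothetical. It relies on encoding/json writing map keys in sorted order, so two captures with identical settings produce the same 64-character hex digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// SnapshotHash (hypothetical helper) returns the hex-encoded SHA-256 of the
// snapshot's JSON encoding. encoding/json sorts map keys when marshalling,
// so key insertion order does not affect the digest.
func SnapshotHash(data map[string]any) (string, error) {
	raw, err := json.Marshal(data)
	if err != nil {
		return "", fmt.Errorf("marshal snapshot: %w", err)
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Same settings, different insertion order.
	a := map[string]any{"admin": map[string]any{"two_step": "enforced_for_all", "min_length": 12}}
	b := map[string]any{"admin": map[string]any{"min_length": 12, "two_step": "enforced_for_all"}}

	ha, _ := SnapshotHash(a)
	hb, _ := SnapshotHash(b)
	fmt.Println(len(ha), ha == hb) // prints "64 true"
}
```

Comparing the new hash with the previous snapshot's hash would let the capture job skip re-scoring when nothing changed, if that optimization is wanted.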

Per-connector snapshot capture (runs on a daily schedule, separate from event-driven sync):

| Provider | Captured surface |
| --- | --- |
| Google Workspace | Admin console settings: external sharing default, 2SV enforcement, DWD allow-list, password policy, alert center subscriptions |
| Okta | Sign-on policies, password policies, network zones, MFA enrollment policies, admin role assignments count |
| GitHub | Org settings (2FA enforcement, base permission, default branch protection rules, secret-scanning), repo count by visibility |
| Slack | Workspace 2FA enforcement, app install policy, retention settings, guest access defaults |
| M365 | Tenant CA policies, legacy auth state, sensitivity labels enabled, mailbox auto-forward policy |
| Atlassian | Org-level visibility defaults, Marketplace app install policy |

Baseline format (YAML)

id: cis_google_workspace_v1_5
name: CIS Google Workspace Benchmark
version: 1.5.0
provider: GOOGLE_WORKSPACE

controls:
  - id: "1.1"
    title: "Ensure 2-Step Verification is enforced for all users"
    severity: HIGH
    check:
      path: "admin.security.two_step_verification.enforcement"
      expected: "enforced_for_all"
    remediation_steps:
      - "Admin console → Security → 2-Step Verification → Enforcement → New users + All users"

  - id: "1.2"
    title: "Ensure password length policy is set to >= 12"
    severity: MEDIUM
    check:
      path: "admin.security.password_policy.min_length"
      expected_min: 12

  - id: "5.3"
    title: "Ensure external sharing of Drive content is restricted"
    severity: HIGH
    check:
      path: "drive.sharing.external_sharing"
      expected_one_of: ["disabled", "allowlist_domains_only"]

Built-in baselines live in baselines/:

baselines/
├── cis_google_workspace_v1_5.yaml
├── cis_microsoft_365_v3_0.yaml
├── cis_github_v1_0.yaml
├── cis_okta_v1_0.yaml
└── cis_atlassian_v1_0.yaml

Scoring engine

internal/baseline/ Go package:

  1. Load enabled baselines per org.
  2. For each ConfigurationSnapshot, evaluate each baseline control's check expression against the snapshot JSON.
  3. Persist a BaselineScore row per (org × baseline × control × snapshot).
  4. Aggregate scores into a current scorecard + a per-day time series.

enum BaselineControlStatus {
  PASS
  FAIL
  NOT_APPLICABLE
  ERROR
}

model BaselineScore {
  id              String   @id @default(cuid())
  organizationId  String   @map("organization_id")
  snapshotId      String   @map("snapshot_id")
  baselineId      String   @map("baseline_id") @db.VarChar(120)
  baselineVersion String   @map("baseline_version") @db.VarChar(20)
  controlId       String   @map("control_id") @db.VarChar(60)
  status          BaselineControlStatus
  observedValue   Json?    @map("observed_value")
  evaluatedAt     DateTime @default(now()) @map("evaluated_at")
  organization    Organization @relation(...)
  snapshot        ConfigurationSnapshot @relation(fields: [snapshotId], references: [id], onDelete: Cascade)
  @@unique([snapshotId, baselineId, controlId])
  @@index([organizationId, baselineId, evaluatedAt])
  @@map("baseline_scores")
}
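One way the aggregation step might roll per-control rows up into a scorecard is sketched below. The ControlResult and Scorecard types and the Summarize function are hypothetical, and the pass-rate definition (PASS divided by PASS plus FAIL, with NOT_APPLICABLE and ERROR excluded from the denominator) is an assumption the issue leaves open.

```go
package main

import "fmt"

type Status string

const (
	Pass          Status = "PASS"
	Fail          Status = "FAIL"
	NotApplicable Status = "NOT_APPLICABLE"
	Error         Status = "ERROR"
)

// ControlResult is a trimmed-down, hypothetical view of a BaselineScore row.
type ControlResult struct {
	ControlID string
	Status    Status
}

// Scorecard summarizes one baseline evaluated against one snapshot.
type Scorecard struct {
	Passed   int
	Failed   int
	Skipped  int     // NOT_APPLICABLE and ERROR rows
	PassRate float64 // percentage of scored controls that passed
}

// Summarize counts statuses and computes the pass rate.
// Assumption: only PASS and FAIL rows count toward the denominator.
func Summarize(results []ControlResult) Scorecard {
	var sc Scorecard
	for _, r := range results {
		switch r.Status {
		case Pass:
			sc.Passed++
		case Fail:
			sc.Failed++
		default:
			sc.Skipped++
		}
	}
	if scored := sc.Passed + sc.Failed; scored > 0 {
		sc.PassRate = 100 * float64(sc.Passed) / float64(scored)
	}
	return sc
}

func main() {
	sc := Summarize([]ControlResult{
		{ControlID: "1.1", Status: Pass},
		{ControlID: "1.2", Status: Fail},
		{ControlID: "5.3", Status: Pass},
		{ControlID: "9.9", Status: NotApplicable},
	})
	fmt.Printf("%.0f%% (%d/%d controls passing)\n", sc.PassRate, sc.Passed, sc.Passed+sc.Failed)
	// prints "67% (2/3 controls passing)"
}
```

Storing one Scorecard per (baseline, snapshot) pair per day would give the time series that the trend sparkline and dashboard tile need.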

Drift detection

When a new snapshot lands, compare its control results with those of the previous snapshot for the same integration. Any control that flipped PASS → FAIL opens a SecurityFinding with ruleKey = "baseline.regression.<baseline>.<controlId>". The finding routes through the existing finding pipeline and workflow #50.
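The PASS → FAIL comparison could look roughly like this. It is a sketch, assuming each snapshot's results are available as a map from control ID to status; the Regressions function name is hypothetical. Only the ruleKey format is taken from the description above.

```go
package main

import (
	"fmt"
	"sort"
)

type Status string

const (
	Pass Status = "PASS"
	Fail Status = "FAIL"
)

// Regressions (hypothetical helper) returns one rule key per control that
// was PASS in the previous snapshot and is FAIL in the current one.
// Controls with no previous result are not treated as regressions.
func Regressions(baselineID string, prev, curr map[string]Status) []string {
	var keys []string
	for controlID, now := range curr {
		if prev[controlID] == Pass && now == Fail {
			keys = append(keys, fmt.Sprintf("baseline.regression.%s.%s", baselineID, controlID))
		}
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	return keys
}

func main() {
	prev := map[string]Status{"1.1": Pass, "1.2": Fail, "5.3": Pass}
	curr := map[string]Status{"1.1": Fail, "1.2": Fail, "5.3": Pass}

	for _, key := range Regressions("cis_google_workspace_v1_5", prev, curr) {
		fmt.Println(key) // prints "baseline.regression.cis_google_workspace_v1_5.1.1"
	}
}
```

Control 1.2 was already failing, so it produces no new finding; only the flip on 1.1 does.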

UI surface

  • /baselines — list of enabled baselines, current pass rate, trend sparkline.
  • /baselines/<id> — per-control table: status, last evaluated, observed value, remediation. Filter by failing.
  • Per-control drift timeline — green/red bar per day for the last 90 days.
  • Dashboard tile — "CIS Google Workspace score: 82% (↓ 3% this week)".

Phasing

| Phase | Scope |
| --- | --- |
| P1 | Schema; snapshot capture for Google Workspace; CIS Google Workspace baseline YAML; scoring engine; /baselines index page |
| P2 | Snapshot capture for Okta + GitHub + Slack; respective CIS baselines; drift detection rule emitter |
| P3 | M365 + Atlassian snapshots and baselines; per-control drift timeline; per-tenant custom baselines |
| P4 | Baseline overrides per tenant ("we don't enforce control 5.3 by design"); auto-export to compliance evidence pack (#5) |

Open questions

  • Snapshot retention: the table will grow quickly. Downsample after N days, or keep raw snapshots forever for audit?
  • Baseline version pinning: operators want to lock to a baseline version for audit windows, so the upgrade flow needs to be explicit.
  • Conflict between event-driven rules (#47, "Detection-as-code: declarative YAML rules + community rule packs") and drift findings: dedupe so the same misconfig doesn't open two findings.
