bmendonca3/k8s-auto-fix

k8s-auto-fix

k8s-auto-fix is a closed-loop pipeline that detects Kubernetes misconfigurations, proposes JSON patches, verifies them against guardrails, and schedules accepted fixes. It supports deterministic rules as well as Grok and OpenAI-compatible LLM modes, and underpins the accompanying research paper.

Key features

  • End-to-end detector -> proposer -> verifier -> risk -> scheduler -> queue workflow with reproducible CLI entry points.
  • Switchable proposer backends (rules, Grok, vendor, vLLM) with semantic regression checks, targeted policy guidance, and optional response caching for remote model runs.
  • Verifier integrates kube-linter, Kyverno, kubectl apply --dry-run=server, and bespoke safety gates before a patch is accepted.
  • Metrics bundles, benchmarks, and reproducibility scripts that back the paper's evaluation.

Getting started

pip install -r requirements.txt    # dependencies (see make setup)
make doctor                        # check local prerequisites and optional tools
make tiny-regression               # CI-safe detector/proposer/verifier/scheduler/queue smoke
make kind-up                       # optional: bring up a Kind verification cluster
make fixtures                      # optional: seed RBAC/NetworkPolicy fixtures after a cluster is active
make e2e                           # optional: scanner-backed end-to-end run

Workflow at a glance

| Stage | Command | Output |
| --- | --- | --- |
| Detect misconfigurations | python -m src.detector.cli --in data/manifests --out data/detections.json --policies-dir data/policies/kyverno --jobs 4 | data/detections.json |
| Generate patches | python -m src.proposer.cli --detections data/detections.json --out data/patches.json --config configs/run.yaml --jobs 4 | data/patches.json |
| Verify patches | python -m src.verifier.cli --patches data/patches.json --detections data/detections.json --out data/verified.json --include-errors --require-kubectl --enable-rescan --policies-dir data/policies/kyverno --jobs 4 | data/verified.json |
| Compute risk | make cti && python -m src.risk.cli --detections data/detections.json --out data/risk.json --epss-csv data/epss.csv --kev-json data/kev.json | data/risk.json |
| Schedule fixes | python -m src.scheduler.cli --verified data/verified.json --detections data/detections.json --risk data/risk.json --out data/schedule.json | data/schedule.json |
| Summarize rollout batches | python -m src.scheduler.cli --verified data/verified.json --detections data/detections.json --risk data/risk.json --out data/schedule.json --batch-group-by policy --batches-out tmp/schedule-batches.json | tmp/schedule-batches.json |
| Queue accepted patches | python -m src.scheduler.queue_cli enqueue --db data/queue.db --verified data/verified.json --detections data/detections.json --risk data/risk.json | data/queue.db |

Benchmark helpers (make benchmark-grok200, make benchmark-full, make benchmark-scheduler) and aggregation commands (python -m src.eval.metrics, make summarize-failures) mirror the evaluation in the paper. Use make pipeline-plan to print the default lightweight detector -> proposer -> verifier -> risk -> scheduler command plan without running it.
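
The stage commands in the workflow table chain through files: each stage reads what the previous one wrote. The sketch below is one possible way to drive that chain from Python. It is not scripts/run_pipeline.py; the names build_plan and run_plan are hypothetical, and the commands are copied from the table as-is. The table's risk stage runs make cti first; that prerequisite is deliberately left out here.

```python
import shlex
import subprocess

# Stage commands copied from the workflow table. Assumption: they are run from
# the repository root, and `make cti` has already refreshed the EPSS/KEV feeds.
STAGES = [
    ("detect",
     "python -m src.detector.cli --in data/manifests --out data/detections.json "
     "--policies-dir data/policies/kyverno --jobs 4"),
    ("propose",
     "python -m src.proposer.cli --detections data/detections.json "
     "--out data/patches.json --config configs/run.yaml --jobs 4"),
    ("verify",
     "python -m src.verifier.cli --patches data/patches.json "
     "--detections data/detections.json --out data/verified.json "
     "--include-errors --require-kubectl --enable-rescan "
     "--policies-dir data/policies/kyverno --jobs 4"),
    ("risk",
     "python -m src.risk.cli --detections data/detections.json "
     "--out data/risk.json --epss-csv data/epss.csv --kev-json data/kev.json"),
    ("schedule",
     "python -m src.scheduler.cli --verified data/verified.json "
     "--detections data/detections.json --risk data/risk.json "
     "--out data/schedule.json"),
]


def build_plan(stages=STAGES):
    """Split each command string into an argv list, keeping stage order."""
    return [(name, shlex.split(command)) for name, command in stages]


def run_plan(plan, execute=False):
    """Print every stage; run it only when execute=True.

    With execute=True, subprocess.run(check=True) raises on the first
    non-zero exit code, so later stages never see stale inputs.
    """
    for name, argv in plan:
        print(f"[{name}] {shlex.join(argv)}")
        if execute:
            subprocess.run(argv, check=True)
```

Calling run_plan(build_plan()) only prints the plan; passing execute=True runs the stages in order and stops at the first failure.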

Components

  • Detector (src/detector) wraps kube-linter and Kyverno, applies extra guards (hostPath, hostPort, CronJob traversal), and emits rigid detections.
  • Proposer (src/proposer) merges rule-based fixes with LLM output, validates JSON Patch structure, blocks destructive edits (container or volume removal, service-account regressions), and can cache validated model responses by input/config hash.
  • Verifier (src/verifier) rechecks policy conformance, performs kubectl dry-runs, enforces custom safety assertions, and optionally rescans the targeted policy.
  • Scheduler (src/scheduler) ranks accepted patches using acceptance probability, expected runtime, exploration, aging, and KEV signals; supports queue management and opt-in batch summaries.
  • Scheduler batches and rollout helpers (src/scheduler/batches.py, src/scheduler/rollout.py) group prioritized fixes by policy, namespace, owner/team, or root cause, then annotate batches with change-window and blast-radius metadata for operator-friendly rollout planning.
  • Risk enrichment (src/risk) fuses EPSS/KEV feeds and optional image scans for downstream prioritisation.
  • Automation (Makefile, scripts/) provides repeatable entry points for experiments, telemetry refresh, and reproducibility bundles.
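
To make the proposer's "blocks destructive edits" guard concrete, here is a minimal sketch of a JSON Patch check that rejects removal of whole container or volume entries. It is an illustration under assumptions, not the code in src/proposer: the function names, the PROTECTED_SEGMENTS tuple, and the error wording are hypothetical, and the service-account regression check is not modelled.

```python
# Operation names defined by RFC 6902 (JSON Patch).
VALID_OPS = {"add", "remove", "replace", "move", "copy", "test"}

# Hypothetical: list fields whose entries must never be removed by a patch.
PROTECTED_SEGMENTS = ("containers", "volumes")


def is_destructive(op):
    """True when a `remove` targets a protected list or one of its entries."""
    if op.get("op") != "remove":
        return False
    parts = [p for p in op["path"].split("/") if p]
    if not parts:
        return False
    # Removing the whole list, e.g. /spec/template/spec/volumes
    if parts[-1] in PROTECTED_SEGMENTS:
        return True
    # Removing one entry, e.g. /spec/containers/0
    return (
        len(parts) >= 2
        and parts[-2] in PROTECTED_SEGMENTS
        and parts[-1].isdigit()
    )


def validate_patch(patch):
    """Return a list of error strings; an empty list means the checks passed."""
    errors = []
    for index, op in enumerate(patch):
        if not isinstance(op, dict) or not isinstance(op.get("path"), str):
            errors.append(f"op {index}: expected an object with a string 'path'")
        elif op.get("op") not in VALID_OPS:
            errors.append(f"op {index}: unknown operation {op.get('op')!r}")
        elif is_destructive(op):
            errors.append(f"op {index}: destructive remove at {op['path']}")
    return errors
```

A patch that only adds or tightens fields inside a container passes, while a remove aimed at /spec/containers/1 is reported as destructive.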

Repository layout

  • archives/ – historical exports and large bundles kept out of the active workspace.
  • configs/ – pipeline presets (run.yaml, run_grok.yaml, run_rules.yaml).
  • data/ – retains the canonical folders (data/manifests, data/batch_runs, etc.) and now exposes curated views via data/corpora/ (inputs) and data/outputs/ (generated artefacts). See data/README.md for details.
  • data/samples/tiny_regression/ – small CI-safe manifests and expected outcomes for detector, proposer, verifier, scheduler, and queue behavior.
  • docs/ – research notes, policy guidance, reproducibility appendices, future work plans.
  • infra/fixtures/ – RBAC, NetworkPolicies, and manifest samples (CronJob scanner, Bitnami PostgreSQL) for reproducing edge cases.
  • logs/ – proposer/verifier transcripts, Grok sweep summaries, and root-level logs (e.g. logs/access.log).
  • notes/ – working notes and backlog items formerly at the repository root.
  • paper/ – IEEE Access manuscript sources; appendices live in paper/appendices.tex (no zip bundle checked in), and Overleaf-ready sources sit under paper/overleaf/.
  • scripts/ – maintenance and evaluation helpers; see scripts/README.md for an index by pipeline stage.
  • src/ – core packages (common, detector, proposer, risk, scheduler, verifier).
  • tests/ – pytest suite validating detectors, proposer guardrails, verifier gates, scheduler scoring, CLI tooling.
  • tmp/ – scratch workspace (ignored by git). Historic large exports remain under archives/ if needed.

Documentation and helper scripts

  • Architecture maps the detector -> proposer -> verifier -> risk -> scheduler -> queue flow and the operator review path.
  • Contributing covers local setup, test tiers, artifact hygiene, and secret-handling expectations.
  • Troubleshooting, Security Model, and Artifact Policy explain failure triage, verifier trust boundaries, and tracked-artifact retention.
  • scripts/doctor.py (make doctor) checks Python packages, optional Kubernetes tools, and key repository paths.
  • scripts/validate_configs.py (make validate-configs) validates checked-in YAML config structure.
  • scripts/check_docs_links.py (make docs-link-check) checks local Markdown links and heading anchors in the docs set.
  • scripts/check_metrics_consistency.py (make metrics-consistency) checks paper-facing metric text against canonical JSON artifacts without modifying files.
  • scripts/clean_generated.py (make clean-generated) lists ignored generated outputs that are safe to remove with the script's explicit --delete flag.
  • scripts/run_pipeline.py (make pipeline-plan, make pipeline-manifest-smoke, make pipeline-status-smoke) prints a rules-mode pipeline plan, optionally writing reproducibility and per-stage status JSON with declared input/output paths, file hashes, and remediation hints, or runs it when invoked directly with --run and optional --resume.
  • scripts/run_tiny_regression.py (make tiny-regression) validates the tiny fixture pack without kube-linter, Kyverno, kubectl, a cluster, or API keys.
  • scripts/build_review_packet.py (make review-packet-smoke, make review-packet-concise-smoke, make review-packet-rollout-smoke) combines verifier summaries, selected patch diffs, schedule explanation, optional rollout batches, queue health, and artifact traceability into a bounded operator review packet; --markdown-mode concise emits a PR/release-friendly summary without diff blocks.
  • scripts/render_patch_diff.py (make patch-diff-smoke) renders unified before/after YAML diffs for patch review.
  • scripts/verifier_report.py (make verifier-report) groups verifier rejects by gate, policy, and error with suggested next actions.
  • scripts/artifact_index.py (make artifact-index-smoke) inventories tracked artifact-like files; write full indexes to ignored tmp/ paths when needed.
  • scripts/artifact_traceability.py (make artifact-traceability-smoke) emits size, SHA-256, producer, and category records for selected artifacts.
  • scripts/build_evidence_manifest.py (make evidence-manifest-smoke, make evidence-manifest-pipeline-smoke, make evidence-manifest-claims-smoke, make evidence-manifest-claims-enforce) composes selected artifact traceability records with producer commands, claim labels, artifact hashes, optional pipeline manifest/status stage metadata, optional paper/research claim-table coverage, and claim-coverage summaries into JSON or Markdown. Add --fail-on-uncovered-claims with --claims-table when uncovered expected claims should fail the command.
  • scripts/gitops_writeback.py (make gitops-plan-smoke) builds a dry-run writeback plan for accepted patches, including skipped entries and reasons, without changing files, branches, commits, or PRs.
  • scripts/scheduler_explain.py (make scheduler-explain-smoke) explains scheduler score inputs, components, and final priority order.
  • scripts/queue_report.py (make queue-report-smoke) reports scheduler queue health from SQLite in read-only mode.

Paper and appendices

  • Main manuscript: paper/access.tex (title: “Closed-Loop Threat-Guided Auto-Fixing of Kubernetes Container Security Misconfigurations”).
  • Supplemental appendices: paper/appendices.tex (plain-English reading guide, risk worked example, glossary, artifact index). Legacy appendix zip bundles have been removed from the repo.
  • To push to Overleaf, use the contents of paper/ (or the mirror under paper/overleaf/); no zip archives are tracked here.

Configuration

configs/run.yaml centralises proposer configuration:

seed: 1337
max_attempts: 3
proposer:
  mode: grok          # rules | grok | vendor | vllm
  retries: 2
  timeout_seconds: 60
  cache_dir: tmp/proposer-cache  # optional; caches validated non-rules responses
grok:
  endpoint: "https://api.x.ai/v1/chat/completions"
  model: "grok-4.3"
  api_key_env: "XAI_API_KEY"
retry_budgets:
  default: 3
  no_latest_tag: 2
vendor:
  endpoint: "https://api.openai.com/v1/chat/completions"
  model: "gpt-4o-mini"
  api_key_env: "OPENAI_API_KEY"
vllm:
  endpoint: "https://<RUNPOD_ENDPOINT>/v1/chat/completions"
  model: "meta-llama/Meta-Llama-3-8B-Instruct"
  api_key_env: "RUNPOD_API_KEY"
rules:
  enabled: true

Export the appropriate API key (XAI_API_KEY, OPENAI_API_KEY, RUNPOD_API_KEY) before invoking remote modes.
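
The relationship between proposer.mode, the per-backend section, and api_key_env can be sketched as follows. This is an assumption about how such a config might be consumed, not the loader in src/proposer: resolve_backend is a hypothetical helper, and the config is written as a plain dict (a trimmed copy of the example above) so the sketch needs no YAML parser.

```python
import os

# Trimmed copy of the configs/run.yaml example, expressed as a dict.
CONFIG = {
    "proposer": {"mode": "grok", "retries": 2, "timeout_seconds": 60},
    "grok": {
        "endpoint": "https://api.x.ai/v1/chat/completions",
        "model": "grok-4.3",
        "api_key_env": "XAI_API_KEY",
    },
    "vendor": {
        "endpoint": "https://api.openai.com/v1/chat/completions",
        "model": "gpt-4o-mini",
        "api_key_env": "OPENAI_API_KEY",
    },
    "rules": {"enabled": True},
}

REMOTE_MODES = ("grok", "vendor", "vllm")


def resolve_backend(config, env=None):
    """Pick the backend section named by proposer.mode and read its API key.

    The key itself never lives in the config; only the name of the
    environment variable does (api_key_env).
    """
    env = os.environ if env is None else env
    mode = config["proposer"]["mode"]
    if mode == "rules":
        return {"mode": "rules"}  # deterministic rules need no endpoint or key
    if mode not in REMOTE_MODES:
        raise ValueError(f"unknown proposer mode: {mode!r}")
    section = config[mode]
    key_name = section["api_key_env"]
    api_key = env.get(key_name)
    if not api_key:
        raise RuntimeError(f"export {key_name} before using proposer mode {mode!r}")
    return {
        "mode": mode,
        "endpoint": section["endpoint"],
        "model": section["model"],
        "api_key": api_key,
    }
```

Passing env explicitly, for example resolve_backend(CONFIG, env={"XAI_API_KEY": "..."}), keeps the lookup testable without touching the real environment.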

Testing and QA

  • make doctor - check Python, required packages, optional Kubernetes tools, and key repo paths.
  • make validate-configs - validate checked-in YAML config structure.
  • make docs-link-check - check local Markdown links and heading anchors in the docs set.
  • make metrics-consistency - fail if paper-facing metrics drift from canonical JSON artifacts.
  • make secret-scan - scan tracked and unignored repo text files for common secret/token patterns, skipping artifact-heavy sample/generated paths by default.
  • make pipeline-plan - preview the lightweight pipeline commands without writing outputs.
  • make pipeline-manifest-smoke - write a dry-run reproducibility manifest to ignored tmp/.
  • make pipeline-status-smoke - write dry-run per-stage pipeline status to ignored tmp/.
  • make test - run the default pytest suite without requiring a Kubernetes cluster, API keys, or generated evaluation artifacts.
  • make tiny-regression - run the CI-safe fixture pack through builtin detector checks, rule-based proposer/verifier, scheduler, and a temp queue.
  • make artifact-test - opt into generated-artifact checks such as patch minimality/idempotence for data/patches.json.
  • make artifact-index-smoke - run a bounded tracked-artifact inventory using scripts/artifact_index.py --limit 10.
  • make artifact-traceability-smoke - compute a deterministic traceability record for data/patches.json.
  • make evidence-manifest-smoke - build a small evidence manifest with artifact hashes, producer command, and claim labels.
  • make evidence-manifest-pipeline-smoke - build an evidence manifest that includes pipeline stage inputs, outputs, hashes, and statuses.
  • make evidence-manifest-claims-smoke - build an evidence manifest with expected-claim coverage, including a deliberately uncovered smoke claim for reporting.
  • make evidence-manifest-claims-enforce - build an evidence manifest and fail if the expected claims table has uncovered claims.
  • make gitops-plan-smoke - build a dry-run GitOps writeback plan into ignored tmp/ without mutating files or git state.
  • make patch-diff-smoke - render a bounded patch review diff for one smoke detection.
  • make verifier-report - summarize rejected verifier records from data/verified.json.
  • make scheduler-explain-smoke - explain a sample scheduler decision from stdin.
  • make scheduler-batches-smoke - emit grouped schedule batch summaries into ignored tmp/ files.
  • make queue-report-smoke - render a read-only queue health JSON report from data/queue.db.
  • make review-packet-smoke - build a bounded operator review packet for one smoke detection.
  • make review-packet-concise-smoke - build a PR/release-friendly review packet summary without diff blocks.
  • make review-packet-rollout-smoke - build a concise review packet with scheduler batch rollout annotations.
  • make clean-generated - list ignored generated outputs that scripts/clean_generated.py --delete may remove.
  • make e2e - exercise the full pipeline on bundled manifests.
  • make summarize-failures - aggregate verifier rejects by policy/manifest.
  • make reproducible-report - rebuild the research appendix with current artifacts.
  • scripts/parallel_runner.py - parallelise proposer/verifier workloads; scripts/probe_grok_rate.py sizes safe LLM concurrency.

Metrics aligned to the paper (traceable in-repo)

  • Full rules + guardrails replay – 13,338 / 13,373 patched items accepted (99.74%; auto-fix rate 0.8486 over 15,718 detections; median patch ops 9) from data/metrics_rules_full.json (patches_rules_full.json.gz, verified_rules_full.json.gz).
  • Rules on the 5k extended corpus – 4,677 / 5,000 accepted (93.54%; median ops 6) from data/metrics_rules_5000.json (patches_rules_5000.json, verified_rules_5000.json).
  • Grok/xAI 5k proposer – 4,426 / 5,000 accepted (88.52%; median ops 9) from data/outputs/batch_runs/grok_5k/metrics_grok5k.json.
  • Supported corpus (rules) – 1,264 / 1,264 accepted (median ops 8) captured in data/outputs/batch_runs/secondary_supported/summary.json and metrics_rules.json.
  • Live-cluster replay – 1,000 / 1,000 dry-run and live-apply success on the stratified slice (data/live_cluster/summary_1k.csv).
  • Scheduler fairness – data/metrics_schedule_compare.json shows top-50 high-risk items at median rank 25.5 (P95 48) for the bandit vs median 422.5 (P95 620) under FIFO; wait-time sweeps live in data/metrics_schedule_sweep.json.
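
The acceptance percentages and the auto-fix rate quoted in this list follow directly from the raw counts. The short check below recomputes them; the counts are taken from the bullets above, while the dictionary keys and helper names are only illustrative labels.

```python
# (accepted, total) pairs as quoted in the metrics list.
RUNS = {
    "rules_full": (13_338, 13_373),
    "rules_5k": (4_677, 5_000),
    "grok_5k": (4_426, 5_000),
}

# Detections in the full replay, used as the auto-fix denominator.
FULL_DETECTIONS = 15_718


def acceptance_pct(accepted, total):
    """Accepted patches as a percentage of patched items, two decimals."""
    return f"{100 * accepted / total:.2f}"


def auto_fix_rate(accepted, detections):
    """Accepted patches as a fraction of all detections, four decimals."""
    return f"{accepted / detections:.4f}"


for name, (accepted, total) in RUNS.items():
    print(name, acceptance_pct(accepted, total))
print("auto_fix_rate", auto_fix_rate(RUNS["rules_full"][0], FULL_DETECTIONS))
```

Note the two different denominators: acceptance divides by patched items, whereas the auto-fix rate divides by all detections, which is why the full replay reports both 99.74% and 0.8486.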

Policy-level success probabilities and runtimes regenerate via scripts/compute_policy_metrics.py into data/policy_metrics.json. Scheduler sweeps and fairness telemetry are viewable at data/outputs/scheduler/metrics_schedule_sweep.json.
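
To illustrate how the scheduler signals named under Components (acceptance probability, expected runtime, exploration, aging, KEV) might combine into one priority, here is a minimal sketch. It is not the formula used in src/scheduler: the additive form, the weights, and every field name are assumptions chosen for readability.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One verified patch awaiting scheduling (hypothetical fields)."""
    id: str
    p_accept: float            # policy-level acceptance probability, 0..1
    expected_runtime_s: float  # expected time to apply the fix
    risk: float                # risk score from the enrichment stage
    wait_s: float              # time already spent in the queue
    attempts: int              # how often this policy has been tried
    kev: bool                  # linked to a Known Exploited Vulnerability


def score(c, total_attempts, explore_weight=0.1, aging_weight=0.001, kev_boost=1.0):
    """Higher is scheduled earlier. All weights are illustrative defaults."""
    # Expected risk reduction per second of work.
    value_rate = c.risk * c.p_accept / max(c.expected_runtime_s, 1e-9)
    # UCB-style bonus: rarely tried policies get a nudge.
    exploration = explore_weight * math.sqrt(
        math.log(total_attempts + 1) / (c.attempts + 1)
    )
    # Aging keeps low-value items from starving.
    aging = aging_weight * c.wait_s
    return value_rate + exploration + aging + (kev_boost if c.kev else 0.0)


def rank(candidates):
    """Return candidates ordered from highest to lowest score."""
    total_attempts = sum(c.attempts for c in candidates)
    return sorted(candidates, key=lambda c: score(c, total_attempts), reverse=True)
```

Under these assumed weights, two otherwise identical items are separated by the KEV flag first and by queue age second, which mirrors the "aging + KEV boost" description without claiming to reproduce the paper's numbers.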

Large corpus artefacts now live under data/outputs/ and are stored as compressed .json.gz files to keep the repository lean. Gunzip the patches/verified/metrics files there before using tooling that expects plain .json inputs.
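
If the gunzip command-line tool is not available, the same decompression can be done with Python's standard library. The helper below is a sketch; gunzip_json is a hypothetical name, and unlike a plain gunzip call it leaves the original .json.gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_json(src, dest_dir=None):
    """Decompress `name.json.gz` to `name.json` and return the new path.

    The compressed file is kept. The output lands next to the source
    unless dest_dir is given.
    """
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    target_dir = Path(dest_dir) if dest_dir is not None else src.parent
    dest = target_dir / src.stem  # strips only the trailing .gz
    # Stream in chunks so large corpus files are never fully held in memory.
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest
```

For example, gunzip_json("patches_rules_full.json.gz") would write patches_rules_full.json alongside it; prefix the filename with wherever the archive sits in your checkout.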

Related work

| System | Scope in paper | Evidence / guardrails | Scheduling |
| --- | --- | --- | --- |
| k8s-auto-fix (this work) | Closed-loop detect → propose → verify → schedule | JSON Patch rules + optional LLMs behind policy/schema/kubectl --dry-run gates; secret sanitisation; CRD/fixture seeding | Risk-aware bandit with aging + KEV boost (data/metrics_schedule_compare.json) |
| GenKubeSec (2024) | LLM-based detection/localization/remediation; authors report precision 0.990, recall 0.999 on a ~277k KCF corpus with 30-sample expert validation | Human review; no automated guardrails | None (FIFO human review) |
| Kyverno (mutation engine) | Admission-time mutation/validation; depends on cluster fixtures | Policy-driven mutate/validate; CLI baseline scripted in scripts/run_kyverno_baseline.py with results in data/baselines/kyverno_baseline.csv | FIFO admission queue |
| Borg/SRE playbooks | Production auto-remediation for infra fleets | Health checks, rollbacks, throttling; no public acceptance % | Priority queues / toil budgets |
| LLMSecConfig (2025) | LLM remediation prompts with scanner checks | Scanner re-checks; no server-side dry-run | None |

Baselines and Reproducibility

  • Kyverno mutate baseline (simulate or real): scripts/run_kyverno_baseline.py
  • Polaris mutate/CLI fix baseline (simulate or real): scripts/run_polaris_baseline.py
  • MutatingAdmissionPolicy baseline (simulate or YAML generation): scripts/run_mutatingadmission_baseline.py
  • LLMSecConfig-style slice: scripts/run_llmsecconfig_slice.py (requires OPENAI_API_KEY)
  • Risk throughput (KEV-weighted): scripts/eval_risk_throughput.py
  • Unified baseline comparison: scripts/compare_baselines.py (writes CSV/MD/TeX)

Quick start to regenerate bundles and baselines (simulation mode):

scripts/reproduce_all.sh

See ARTIFACTS.md for artifact map, docs/VERIFIER.md for guardrails, docs/BASELINES.md to run baselines, docs/RISK_EVAL.md for prioritization metrics, and docs/LIVE_EVAL.md for live-cluster methodology.
