Lab 8: golden-signals monitoring + error-rate alert & runbook #1323

Open
rikire wants to merge 5 commits into inno-devops-labs:main from rikire:feature/lab8

rikire commented Jul 3, 2026

Goal

Add a Prometheus + Grafana monitoring stack for QuickNotes: a 4-panel golden-signals dashboard and one sustained-breach error-rate alert with a runbook.

Changes

  • compose.yaml: adds pinned prometheus (v3.5.0) and grafana (12.0.0) images; Prometheus starts only once quicknotes is healthy (depends_on with condition: service_healthy).
  • monitoring/prometheus: scrape config (15s) for quicknotes:8080 + HighErrorRate alert rule (>5% for 5m, severity=page, runbook_url).
  • monitoring/grafana: provisioned Prometheus datasource + 4-panel dashboard (Traffic, Errors, Latency-proxy, Saturation).
  • docs/runbook/high-error-rate.md: blameless runbook (meaning, triage, mitigations, post-incident).
  • submissions/lab8.md: config, verification, dashboard + firing-alert screenshots, design answers a–g.
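
For reviewers who want the alert semantics at a glance, here is a minimal sketch of what a rule matching the description above (>5% for 5m, severity=page, runbook_url) could look like. It is not copied from the PR: the metric name http_requests_total, its status label, the job="quicknotes" selector, the group name, and the runbook URL are all assumptions for illustration.

```yaml
# Sketch of a Prometheus alerting rule file; names marked "assumed" are hypothetical.
groups:
  - name: quicknotes-alerts            # assumed group name
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses over a 5-minute window.
        # http_requests_total, the "status" label, and job="quicknotes" are assumed.
        expr: |
          sum(rate(http_requests_total{job="quicknotes", status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total{job="quicknotes"}[5m]))
            > 0.05
        # The condition must hold continuously for 5m before the alert fires;
        # until then the alert stays in the Pending state.
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "QuickNotes 5xx ratio above 5% for 5 minutes"
          # Placeholder URL; the real link should point at docs/runbook/high-error-rate.md.
          runbook_url: "https://example.com/docs/runbook/high-error-rate.md"
```

Dividing two rate() sums gives a unitless ratio, so the 0.05 threshold corresponds to the 5% in the description, and the for: clause is what turns a single bad scrape into a sustained-breach condition.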

Testing

  • Prometheus /targets: quicknotes UP; Grafana auto-loads the 4-panel dashboard.
  • Drove a sustained error rate above 5%; the alert went Normal -> Pending -> Firing, turning Firing exactly at the 5m gate, with severity=page and value 0.82.
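
The manual Normal -> Pending -> Firing check could also be pinned down with a promtool rule unit test. The sketch below is an assumption-heavy illustration, not part of the PR: it presumes a rules file named alerts.yml that defines HighErrorRate as a 5xx ratio over an http_requests_total counter with job="quicknotes", for: 5m, a single severity=page label, and exactly the two annotations listed (the runbook URL is a placeholder).

```yaml
# Sketch of a promtool unit test; file name, metric name, and annotations are assumed.
rule_files:
  - alerts.yml                          # hypothetical rules file
evaluation_interval: 1m

tests:
  - interval: 1m                        # spacing of the synthetic samples below
    input_series:
      # Counters sampled every minute: 9 errors and 1 success per step,
      # which keeps the 5xx ratio far above the 5% threshold.
      - series: 'http_requests_total{job="quicknotes", status="500"}'
        values: '0+9x15'
      - series: 'http_requests_total{job="quicknotes", status="200"}'
        values: '0+1x15'
    alert_rule_test:
      # At 3m the condition is true but the 5m "for" gate has not elapsed,
      # so the alert is only Pending and no firing alert is expected.
      - eval_time: 3m
        alertname: HighErrorRate
      # At 10m the breach has lasted well past 5m, so the alert should be firing.
      - eval_time: 10m
        alertname: HighErrorRate
        exp_alerts:
          - exp_labels:
              severity: page
            exp_annotations:
              summary: "QuickNotes 5xx ratio above 5% for 5 minutes"
              runbook_url: "https://example.com/docs/runbook/high-error-rate.md"
```

Such a file would be run with promtool test rules followed by the test file path. The expected labels and annotations must match the real rule exactly, so they would need adjusting to whatever the rule in monitoring/prometheus actually declares.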

Checklist

  • Title is a clear sentence (≤ 70 chars)
  • Commits are signed
  • submissions/lab8.md updated

rikire added 5 commits June 9, 2026 20:39
Signed-off-by: rikire <rizireY@yandex.ru>
Signed-off-by: rikire <rizireY@yandex.ru>
Multi-stage Dockerfile: golang:1.24 builder -> distroless/static:nonroot,
static stripped binary, nonroot, 8.56 MB (<=25). compose.yaml with named
volume, self-healthcheck (binary dual-mode for shell-less distroless), env,
restart policy, and the 6 security defaults (read-only, cap_drop ALL,
no-new-privileges, tmpfs). main.go gains a 'healthcheck' subcommand.

Signed-off-by: rikire <rizireY@yandex.ru>
…rt + runbook

compose adds pinned prometheus/grafana; provisioned datasource + 4-panel golden
signals dashboard; HighErrorRate alert (>5% for 5m, severity=page, runbook link)
observed Normal->Pending->Firing; blameless runbook in docs/runbook.

Signed-off-by: rikire <rizireY@yandex.ru>
Signed-off-by: rikire <rizireY@yandex.ru>