
lab8: Prometheus + Grafana golden signals dashboard and HighErrorRate alert #1336

Open: 1r444444 wants to merge 8 commits into inno-devops-labs:main from 1r444444:feature/lab8



1r444444 commented Jul 5, 2026


Summary

  • compose.yaml extended with prom/prometheus:v3.4.1 and grafana/grafana:12.0.2; prometheus depends_on quicknotes healthcheck
  • monitoring/prometheus/prometheus.yml: scrapes quicknotes:8080 every 15 s
  • monitoring/prometheus/rules/alerts.yml: HighErrorRate fires when error ratio > 5% sustained for 5 min, severity: page, links runbook
  • monitoring/grafana/: file-provisioned Prometheus datasource + 4-panel golden signals dashboard (Latency, Traffic, Errors, Saturation)
  • docs/runbook/high-error-rate.md: triage steps, mitigations, post-incident checklist
  • submissions/lab8.md: config files, targets API output, alert lifecycle evidence, 7 design questions answered
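
For reviewers who want a picture of the rule without opening the diff, here is a minimal sketch of what a HighErrorRate rule matching the summary could look like. It is not copied from monitoring/prometheus/rules/alerts.yml: the group name, the metric `http_requests_total`, its `status` label, and the `runbook_url` annotation key are all assumptions made for illustration. Only the 5% threshold, the 5-minute hold, `severity: page`, and the runbook path come from this PR.

```yaml
groups:
  - name: quicknotes-alerts          # hypothetical group name
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses among all responses over the last 5 minutes.
        # http_requests_total and its "status" label are assumed names;
        # the real QuickNotes metrics may differ.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) > 0.05
        # The condition must hold for 5 minutes before pending becomes firing.
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error ratio above 5% for 5 minutes"
          # Assumed annotation key; the path is the runbook added in this PR.
          runbook_url: "docs/runbook/high-error-rate.md"
```

The `for: 5m` clause is what produces the pending state before firing: the expression has to stay true across consecutive evaluations for the whole hold period.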

Test plan

  • docker compose up -d — all 3 services healthy
  • curl localhost:9090/api/v1/targets reports health "up" for the quicknotes target
  • Grafana dashboard auto-provisioned at localhost:3000 with 4 panels
  • ~217 mixed requests generated, non-trivial traffic visible in panels
  • HighErrorRate alert: pending at 10:28:36 UTC → firing at 10:33:41 UTC (5 min 5 s, error ratio ~29%)
  • All 7 design questions (a–g) answered in submissions/lab8.md
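
The "all 3 services healthy" step depends on the startup ordering described in the summary. A rough sketch of that wiring follows, under stated assumptions: the `build` path, the `/health` endpoint, the `wget` probe, and the volume mount paths are hypothetical and may not match the compose.yaml in this PR. The image tags, the published ports, and the `depends_on` relationship are taken from the description.

```yaml
services:
  quicknotes:
    build: ./app                     # assumed build context from earlier labs
    healthcheck:
      # Hypothetical probe: assumes wget is in the image and /health is served.
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 5

  prometheus:
    image: prom/prometheus:v3.4.1
    volumes:
      # Assumed mount; exposes prometheus.yml and rules/ under /etc/prometheus.
      - ./monitoring/prometheus:/etc/prometheus:ro
    ports:
      - "9090:9090"
    depends_on:
      quicknotes:
        # Start Prometheus only after the quicknotes healthcheck passes.
        condition: service_healthy

  grafana:
    image: grafana/grafana:12.0.2
    volumes:
      # Assumed mount for file-provisioned datasource and dashboard.
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
```

Gating Prometheus on `service_healthy` rather than on plain container start means the first scrape is less likely to record the target as down while the app is still booting.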

1r444444 added 8 commits June 9, 2026 09:25
Signed-off-by: Irina <irina.bychkova06@mail.ru>
Signed-off-by: Irina <irina.bychkova06@mail.ru>
Ubuntu 22.04 (jammy64), Go 1.24.5 via shell provisioner, port forward
127.0.0.1:18080 -> guest:8080, VirtualBox shared folder for ./app,
2 vCPU / 1024 MB RAM.
Task 1 and Task 2 design questions answered; terminal output placeholders
left for VM execution; bonus comparison table scaffold included.
Switch provider to qemu (perk/ubuntu-2204-arm64) for Apple Silicon host.
Fill submission with actual vagrant up, curl, snapshot lifecycle outputs.
- ansible/playbook.yaml: idempotent deploy — system user, data dir,
  binary, seed.json, systemd unit via Jinja2 template, handler-driven restart
- ansible/inventory.ini: targets Lab 5 Vagrant VM (127.0.0.1:50022)
- ansible/templates/quicknotes.service.j2: unit template with env vars
- ansible/files/: static arm64 binary (CGO_ENABLED=0) + seed.json
- submissions/lab7.md: PLAY RECAPs, curl proof, all 7 design questions
… alert

- compose.yaml: adds prometheus:v3.4.1 and grafana:12.0.2 services on top
  of Lab 6 QuickNotes; prometheus depends_on quicknotes healthcheck
- monitoring/prometheus/prometheus.yml: scrapes quicknotes:8080 every 15s
- monitoring/prometheus/rules/alerts.yml: HighErrorRate fires when error
  ratio >5% sustained for 5min, severity:page, links runbook
- monitoring/grafana/: provisioned datasource (Prometheus) and 4-panel
  golden signals dashboard (Latency, Traffic, Errors, Saturation)
- docs/runbook/high-error-rate.md: triage steps, mitigations, post-incident
- submissions/lab8.md: all config, targets API output, alert firing evidence,
  7 design questions answered (a-g)