lab8: Prometheus + Grafana golden signals dashboard and HighErrorRate alert #1336
Open
1r444444 wants to merge 8 commits into
Signed-off-by: Irina <irina.bychkova06@mail.ru>
Ubuntu 22.04 (jammy64), Go 1.24.5 via shell provisioner, port forward 127.0.0.1:18080 -> guest:8080, VirtualBox shared folder for ./app, 2 vCPU / 1024 MB RAM.
Task 1 and Task 2 design questions answered; terminal output placeholders left for VM execution; bonus comparison table scaffold included.
Switch provider to qemu (perk/ubuntu-2204-arm64) for Apple Silicon host. Fill submission with actual vagrant up, curl, snapshot lifecycle outputs.
- ansible/playbook.yaml: idempotent deploy (system user, data dir, binary, seed.json, systemd unit via Jinja2 template, handler-driven restart)
- ansible/inventory.ini: targets Lab 5 Vagrant VM (127.0.0.1:50022)
- ansible/templates/quicknotes.service.j2: unit template with env vars
- ansible/files/: static arm64 binary (CGO_ENABLED=0) + seed.json
- submissions/lab7.md: PLAY RECAPs, curl proof, all 7 design questions
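For readers who have not opened the playbook, the handler-driven restart pattern named above might look roughly like the sketch below. It is not the playbook from this PR: the user name, install paths, binary file name, and handler name are all assumptions chosen for illustration. Only the template file name comes from the commit message.

```yaml
# Illustrative sketch only. User name, paths, and the binary file name
# are assumptions, not taken from ansible/playbook.yaml in this PR.
- name: Deploy QuickNotes (sketch)
  hosts: all
  become: true
  tasks:
    - name: Create system user
      ansible.builtin.user:
        name: quicknotes            # assumed user name
        system: true
        create_home: false
        shell: /usr/sbin/nologin

    - name: Create data directory
      ansible.builtin.file:
        path: /var/lib/quicknotes   # assumed data dir
        state: directory
        owner: quicknotes
        mode: "0750"

    - name: Install binary
      ansible.builtin.copy:
        src: quicknotes             # assumed file name under ansible/files/
        dest: /usr/local/bin/quicknotes
        mode: "0755"
      notify: Restart quicknotes    # fires only when the file changed

    - name: Render systemd unit
      ansible.builtin.template:
        src: quicknotes.service.j2
        dest: /etc/systemd/system/quicknotes.service
        mode: "0644"
      notify: Restart quicknotes

    - name: Enable and start service
      ansible.builtin.systemd:
        name: quicknotes
        enabled: true
        state: started
        daemon_reload: true

  handlers:
    - name: Restart quicknotes
      ansible.builtin.systemd:
        name: quicknotes
        state: restarted
        daemon_reload: true
```

Because the restart lives in a handler, a second run with no changed files reports no changes and leaves the service untouched, which is what makes the deploy idempotent.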
… alert
- compose.yaml: adds prometheus:v3.4.1 and grafana:12.0.2 services on top of Lab 6 QuickNotes; prometheus depends_on quicknotes healthcheck
- monitoring/prometheus/prometheus.yml: scrapes quicknotes:8080 every 15s
- monitoring/prometheus/rules/alerts.yml: HighErrorRate fires when error ratio >5% sustained for 5min, severity:page, links runbook
- monitoring/grafana/: provisioned datasource (Prometheus) and 4-panel golden signals dashboard (Latency, Traffic, Errors, Saturation)
- docs/runbook/high-error-rate.md: triage steps, mitigations, post-incident
- submissions/lab8.md: all config, targets API output, alert firing evidence, 7 design questions answered (a-g)
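As a rough illustration of the HighErrorRate rule described above, a Prometheus rule file could be shaped like the following. The threshold, `for` duration, and severity label come from the commit message. The metric name `http_requests_total`, its `status` label, the group name, and the annotation keys are assumptions, since the actual expression in alerts.yml is not shown here.

```yaml
# Sketch under assumptions: the metric name, the "status" label, the group
# name, and the annotation keys are hypothetical, not taken from this PR.
groups:
  - name: quicknotes-alerts
    rules:
      - alert: HighErrorRate
        # Share of 5xx responses among all responses over the last 5 minutes.
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m]))
            > 0.05
        # The alert stays "pending" until the condition has held for 5 minutes.
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "QuickNotes error ratio above 5% for 5 minutes"
          runbook: "docs/runbook/high-error-rate.md"
```

Summing both sides before dividing yields one service-wide ratio rather than one series per label combination, so a single alert fires instead of one per handler or status code.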
Summary
- compose.yaml extended with prom/prometheus:v3.4.1 and grafana/grafana:12.0.2; prometheus depends_on quicknotes healthcheck
- monitoring/prometheus/prometheus.yml: scrapes quicknotes:8080 every 15 s
- monitoring/prometheus/rules/alerts.yml: HighErrorRate fires when error ratio > 5% sustained for 5 min, severity: page, links runbook
- monitoring/grafana/: file-provisioned Prometheus datasource + 4-panel golden signals dashboard (Latency, Traffic, Errors, Saturation)
- docs/runbook/high-error-rate.md: triage steps, mitigations, post-incident checklist
- submissions/lab8.md: config files, targets API output, alert lifecycle evidence, 7 design questions answered

Test plan
- docker compose up -d: all 3 services healthy
- curl localhost:9090/api/v1/targets → "up" for quicknotes
- Grafana dashboard at localhost:3000 with 4 panels
- HighErrorRate alert: pending at 10:28:36 UTC → firing at 10:33:41 UTC (5 min 5 s, error ratio ~29%)
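The first test-plan item relies on Prometheus waiting for a healthy QuickNotes container. A minimal sketch of that wiring is below. It is not the compose.yaml from this PR: the build context, the `/health` endpoint, the use of `wget` inside the image, and the Grafana provisioning subdirectory are assumptions. The image tags, ports, and monitoring/ paths follow the summary and test plan.

```yaml
# Sketch only: build context, health endpoint, healthcheck tool, and the
# Grafana provisioning subdirectory are assumptions for illustration.
services:
  quicknotes:
    build: ./app                     # assumption; reuse the Lab 6 definition
    healthcheck:
      # Assumed endpoint; the image must actually ship wget for this to work.
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/health"]
      interval: 10s
      timeout: 3s
      retries: 5

  prometheus:
    image: prom/prometheus:v3.4.1
    depends_on:
      quicknotes:
        condition: service_healthy   # start only after the healthcheck passes
    volumes:
      - ./monitoring/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./monitoring/prometheus/rules:/etc/prometheus/rules:ro
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:12.0.2
    volumes:
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
```

With `condition: service_healthy`, `docker compose up -d` delays the Prometheus container until the QuickNotes healthcheck succeeds, so the quicknotes target should already be reachable at the first scrape.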