feat(alerts): per-org config with hierarchical merge #99
Draft
edospadoni wants to merge 4 commits into
🚨 Breaking My API change detected
Structural change details: Added (1), Modified (10)
Powered by Bump.sh
update deploy
Builds the operational alerts surface on top of Mimir Alertmanager: a
single paginated list endpoint plus per-system silence management,
resolved-alert history, and aggregations the UI uses to render the
overview page.
Endpoints:
- GET /alerts (cross-hierarchy / single-tenant / sub-tree scoping,
  multi-value label filters, sorting on starts_at/severity/alertname,
  pagination with stable fingerprint tiebreaker)
- GET /alerts/history (paginated alert_history rows with date range)
- GET /alerts/totals / /trend / /stats (severity buckets, time-series
  deltas, top-N alertname/system_key, MTTR/MTBF)
- GET /alerts/{fingerprint}/activity (silence/unsilence audit timeline,
  populated transparently by the silence endpoints)
- GET /systems/{id}/alerts and friends scoped to a single system
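A minimal Go sketch of what "pagination with stable fingerprint tiebreaker" can look like. The Alert type, the newest-first direction and the helper names are assumptions made for illustration; they are not taken from the branch.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Alert is a hypothetical, trimmed-down alert: only the fields used for ordering.
type Alert struct {
	Fingerprint string
	StartsAt    time.Time
}

// sortByStartsAt orders alerts newest-first (assumed direction) and breaks
// ties on the fingerprint, so alerts sharing a timestamp keep the same
// relative order on every request.
func sortByStartsAt(alerts []Alert) {
	sort.Slice(alerts, func(i, j int) bool {
		if !alerts[i].StartsAt.Equal(alerts[j].StartsAt) {
			return alerts[i].StartsAt.After(alerts[j].StartsAt)
		}
		return alerts[i].Fingerprint < alerts[j].Fingerprint
	})
}

// page returns the 1-based page pageNum of size pageSize, or nil when the
// request falls outside the slice.
func page(alerts []Alert, pageNum, pageSize int) []Alert {
	if pageNum < 1 || pageSize < 1 {
		return nil
	}
	start := (pageNum - 1) * pageSize
	if start >= len(alerts) {
		return nil
	}
	end := start + pageSize
	if end > len(alerts) {
		end = len(alerts)
	}
	return alerts[start:end]
}

func main() {
	t0 := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	alerts := []Alert{
		{Fingerprint: "c3", StartsAt: t0},
		{Fingerprint: "a1", StartsAt: t0},
		{Fingerprint: "b2", StartsAt: t0.Add(time.Minute)},
	}
	sortByStartsAt(alerts)
	for _, a := range page(alerts, 1, 2) {
		fmt.Println(a.Fingerprint, a.StartsAt.Format(time.RFC3339))
	}
}
```

Without the tiebreaker, two alerts with the same starts_at could swap places between requests and show up twice (or not at all) across page boundaries.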
Each alert in the list is enriched with a local-DB system object
(id/name/type) so the frontend doesn't need a per-row round-trip.
Per-tenant fan-out failures are surfaced as warnings rather than
failing the whole request.
Gated on the existing read:systems / manage:systems permissions:
read for the list endpoints, manage for silence create/update/delete.
Adds POST/GET/DELETE /alerts/config — every organization saves its own
layer; the effective Mimir YAML for any tenant is the server-side merge
of all layers walking up the hierarchy from the tenant to the Owner.
The merge stays internal: /alerts/config exposes only the caller's own
row, never an inherited or merged view (no leakage of upstream
recipients or secrets to descendants).
Model is flat and recipient-centric:
enabled: {email, webhook, telegram} tri-state per layer
email_recipients: [{address, severities[], language, format}]
webhook_recipients: [{name, url, severities[]}]
telegram_recipients: [{bot_token, chat_id, severities[]}]
Per-recipient severities=[] means "all severities". Email recipients
additionally carry language (en|it) and format (html|plain) which the
template renderer turns into per-email_configs overrides:
- format=html emits our html template + our text fallback (multipart)
- format=plain emits our text template plus html: '' (the empty html
  is mandatory: Alertmanager otherwise falls back to its built-in
  HTML body and overrides ours with the generic "Sent by Alertmanager")
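The flat shape above could map onto Go types roughly as follows. This is a sketch only: the type names, the use of *bool for the tri-state and the describe helper are assumptions, not the contents of models/alerting.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Enabled models the tri-state per channel: nil = no opinion at this layer,
// true = enabled, false = disabled.
type Enabled struct {
	Email    *bool `json:"email"`
	Webhook  *bool `json:"webhook"`
	Telegram *bool `json:"telegram"`
}

type EmailRecipient struct {
	Address    string   `json:"address"`
	Severities []string `json:"severities"` // empty means all severities
	Language   string   `json:"language"`   // en | it
	Format     string   `json:"format"`     // html | plain
}

type WebhookRecipient struct {
	Name       string   `json:"name"`
	URL        string   `json:"url"`
	Severities []string `json:"severities"`
}

type TelegramRecipient struct {
	BotToken   string   `json:"bot_token"`
	ChatID     int64    `json:"chat_id"`
	Severities []string `json:"severities"`
}

// Layer is one organization's own configuration layer (hypothetical name).
type Layer struct {
	Enabled            Enabled             `json:"enabled"`
	EmailRecipients    []EmailRecipient    `json:"email_recipients"`
	WebhookRecipients  []WebhookRecipient  `json:"webhook_recipients"`
	TelegramRecipients []TelegramRecipient `json:"telegram_recipients"`
}

// parseLayer decodes a JSON request body into a Layer.
func parseLayer(raw []byte) (Layer, error) {
	var l Layer
	err := json.Unmarshal(raw, &l)
	return l, err
}

// describe renders a tri-state flag for display.
func describe(b *bool) string {
	switch {
	case b == nil:
		return "inherit"
	case *b:
		return "enabled"
	default:
		return "disabled"
	}
}

func main() {
	raw := []byte(`{"enabled":{"email":true,"webhook":null},
	  "email_recipients":[{"address":"noc@org.example","severities":[],"language":"it","format":"plain"}]}`)
	l, err := parseLayer(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("email:", describe(l.Enabled.Email))
	fmt.Println("webhook:", describe(l.Enabled.Webhook))
	fmt.Println("telegram:", describe(l.Enabled.Telegram))
}
```

A pointer keeps "absent" and JSON null distinct from an explicit false, which is what the tri-state needs; a plain bool would collapse all three into false.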
Rendering fans out a receiver per severity (critical/warning/info);
recipients with severities=[] land on every per-severity receiver. The
builtin alert-history webhook is always attached at the top of the
routes (continue: true) so history persists regardless of config.
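The per-severity fan-out can be sketched in Go as below. The Recipient type and the fanOut helper are hypothetical; the real renderer emits Alertmanager receivers, while this sketch only groups addresses per severity to show how severities=[] behaves.

```go
package main

import "fmt"

// Recipient is a hypothetical, reduced recipient: an address plus its scope.
type Recipient struct {
	Address    string
	Severities []string // empty means all severities
}

// severities lists the receivers that rendering fans out to.
var severities = []string{"critical", "warning", "info"}

// contains reports whether list holds s.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// fanOut returns, for each severity, the addresses that belong on that
// severity's receiver. A recipient with no severities lands on all of them.
func fanOut(recipients []Recipient) map[string][]string {
	out := make(map[string][]string, len(severities))
	for _, sev := range severities {
		for _, r := range recipients {
			if len(r.Severities) == 0 || contains(r.Severities, sev) {
				out[sev] = append(out[sev], r.Address)
			}
		}
	}
	return out
}

func main() {
	out := fanOut([]Recipient{
		{Address: "all@org.example"},
		{Address: "oncall@org.example", Severities: []string{"critical"}},
	})
	for _, sev := range severities {
		fmt.Println(sev, out[sev])
	}
}
```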
Additive-only contract: descendants can ADD recipients but cannot
disable channels enabled by ancestors. The server normalises any
explicit false in enabled.{email,webhook,telegram} from non-Owner
layers to null on storage. Save+propagate is serialised per-org via an
in-process mutex; per-tenant push failures land in warnings[] without
failing the save. Body capped at 1 MiB; oversized requests get 413.
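The normalisation of explicit false on non-Owner layers could look like this in Go. This is a sketch under assumptions: the Enabled type, the isOwner flag and the function names are illustrative, not the branch's actual code.

```go
package main

import "fmt"

// Enabled is a hypothetical tri-state per channel: nil = no opinion.
type Enabled struct {
	Email    *bool
	Webhook  *bool
	Telegram *bool
}

// dropFalse turns an explicit false into nil and leaves nil/true untouched.
func dropFalse(b *bool) *bool {
	if b != nil && !*b {
		return nil
	}
	return b
}

// normalise enforces the additive-only contract before storage: only the
// Owner layer may keep an explicit false.
func normalise(e Enabled, isOwner bool) Enabled {
	if isOwner {
		return e
	}
	return Enabled{
		Email:    dropFalse(e.Email),
		Webhook:  dropFalse(e.Webhook),
		Telegram: dropFalse(e.Telegram),
	}
}

func main() {
	off, on := false, true
	stored := normalise(Enabled{Email: &off, Webhook: &on}, false)
	fmt.Println("email is nil:", stored.Email == nil)
	fmt.Println("webhook kept:", stored.Webhook != nil && *stored.Webhook)
}
```

Normalising at save time means the stored layer never carries a value the merge would have to ignore later.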
Gated on the dedicated alerts resource (read:alerts for GET,
manage:alerts for POST/DELETE) — admin/super only by default.
Includes:
- models/alerting.go: flat shape + Validate
- services/alerting/{merge,template,embed,effective,provision,redaction}.go
- migration 024_add_alert_config_layers
- entities/local_alert_config_layers.go (repo)
- middleware/body_limit.go
- logger/helpers.go: LogBusinessOperationDetails for audit snapshots
- methods/alerting.go: ConfigureAlerts/GetAlertingConfig/DisableAlerts
- methods/{customers,distributors,resellers}.go: provision sig change
- openapi.yaml: schemas + endpoints + 6 request examples + response examples
- templates: per-language dispatchers (alert_<lang>.html|txt|subject)
plus telegram_<lang>.message
The user-facing alerting docs and the AGENTS reference were stuck on
the previous shape (global mail_addresses/webhook_receivers + per-
severity + per-system overrides + per-tenant email_template_lang).
Rewrite the 'Alerting Configuration' section in both en and it locales
to describe the new layer model:
- flat shape: enabled tri-state + email_recipients/webhook_recipients/
  telegram_recipients with per-recipient severities[]
- email recipients additionally carry language (en|it) and format
  (html|plain)
- merge across the org hierarchy stays server-side; /alerts/config
  returns only the caller's own layer (no inherited / merged view)
- additive-only contract; non-Owner explicit false on enabled.X is
  normalised to null at save time
- RBAC: the Alerting Configuration tab is gated on read:alerts /
  manage:alerts (admin/super only); the alerts list stays on
  read:systems / manage:systems
Refresh the Telegram step-3 example to use the new shape and update
the email-notifications section to reflect per-recipient language and
format. Realign AGENTS.md §3.5 with the same wording.
…GET /alerts
Stamp system_type at ingest (collect) alongside the other system_* labels and drop the per-request DB lookup that enriched each alert with a separate system object. Saves a SELECT on every GET /alerts and removes a redundant field the frontend never read.
Test instance
Summary
End-to-end refactor of the per-organization alerting configuration. The previous shape (global lists + per-severity overrides + per-system overrides + per-tenant email_template_lang) was hard to consume from the UI and hard to extend; this PR replaces it with a flat, recipient-centric model where each recipient carries its own scope (severities) and rendering hints (language, format for email).
The merge across the org hierarchy stays server-side only: /alerts/config exposes the caller's own layer and nothing else. No inherited view, no merged-effective preview ever leaves the backend, so secrets and routing intent of an upstream org never leak to descendants.
The branch also includes the alerts list / history / silences / activity timeline / aggregations rebuild (commit 1 of 2), kept in the same PR because both pieces were untested-on-QA, share the openapi surface, and the new list code consumes types defined by the config commit.
Frontend hand-off
The frontend AlertingView and its types will be rewritten by the frontend developer directly on this branch; the existing frontend code does not match this API and is intentionally left out of this PR.
The new shape mirrors what POST /alerts/config accepts and what GET /alerts/config returns (data is the layer itself plus updated_by_name/updated_at):
    {
      "enabled": { "email": true, "webhook": null, "telegram": null },
      "email_recipients": [
        { "address": "noc@org.example", "severities": ["critical","warning"], "language": "it", "format": "html" }
      ],
      "webhook_recipients": [
        { "name": "ops-slack", "url": "https://hooks.slack.com/...", "severities": ["critical"] }
      ],
      "telegram_recipients": [
        { "bot_token": "123:ABC", "chat_id": -1001234567890, "severities": [] }
      ]
    }
- severities=[] = applies to all severities.
- enabled.X = null = no opinion at this layer (inherit).
- Only the Owner can set enabled.X = false; a non-Owner explicit false is normalised to null on save.
Refer to backend/openapi.yaml for the full schema, the 6 request examples on POST /alerts/config, and response examples on the alert/silence endpoints (added in the openapi sections of both commits).
What the backend does internally
- services/alerting/merge.go: walks the chain Owner → … → tenant; unions recipients per channel with dedup keys (email→address, webhook→url, telegram→(bot_token,chat_id)); on a dedup hit, severities are unioned and "[] widens to all".
- services/alerting/template.go: fans out one Alertmanager receiver per severity (critical/warning/info); each email recipient emits its own email_configs entry referencing per-language dispatcher templates (alert_<lang>.html|txt|subject). format=plain emits html: '' explicitly so Alertmanager's default HTML body does not override ours.
- services/alerting/templates/: both en and it are always shipped to every tenant; the dispatcher routes firing/resolved to the right language fragment.
- services/alerting/provision.go: on new org creation, pushes the effective merged config to Mimir so any ancestor layers take effect immediately.
- services/alerting/redaction.go: minimal helper used only for audit-log snapshots (webhook URL paths and Telegram tokens are scrubbed before the layer goes to LogBusinessOperationDetails). API responses never use this.
- /alerts/config* is gated on the dedicated alerts resource (read:alerts for GET, manage:alerts for POST/DELETE), admin/super only. The list endpoints stay gated on the existing read:systems / manage:systems.
Migration
Migration 024_add_alert_config_layers.sql creates the table for this layer. No data carry-over from any previous shape: the table starts empty and operators reconfigure /alerts/config after deploy. Migration 023_add_alert_activity.sql creates the alert_activity table backing the per-alert audit timeline.
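For reviewers, the merge rule described under "What the backend does internally" (dedup by key, union of severities, "[] widens to all") can be sketched for the email channel as follows. This is an illustrative Go sketch, not the contents of services/alerting/merge.go: the type and function names are assumptions, and it deliberately ignores language/format, since the PR text does not say how those are reconciled on a dedup hit.

```go
package main

import "fmt"

// EmailRecipient is reduced to the fields the merge rule talks about.
type EmailRecipient struct {
	Address    string
	Severities []string // empty means all severities
}

// unionSeverities merges two scopes. An empty scope on either side means
// "all severities", so the result widens to empty as well.
func unionSeverities(a, b []string) []string {
	if len(a) == 0 || len(b) == 0 {
		return nil
	}
	out := append([]string(nil), a...)
	for _, s := range b {
		found := false
		for _, o := range out {
			if o == s {
				found = true
				break
			}
		}
		if !found {
			out = append(out, s)
		}
	}
	return out
}

// mergeEmail unions recipients across layers ordered Owner first, tenant
// last, deduplicating on the address.
func mergeEmail(layers [][]EmailRecipient) []EmailRecipient {
	index := map[string]int{} // address -> position in merged
	var merged []EmailRecipient
	for _, layer := range layers {
		for _, r := range layer {
			i, seen := index[r.Address]
			if !seen {
				index[r.Address] = len(merged)
				merged = append(merged, EmailRecipient{
					Address:    r.Address,
					Severities: append([]string(nil), r.Severities...),
				})
				continue
			}
			merged[i].Severities = unionSeverities(merged[i].Severities, r.Severities)
		}
	}
	return merged
}

func main() {
	owner := []EmailRecipient{{Address: "noc@org.example", Severities: []string{"critical"}}}
	tenant := []EmailRecipient{
		{Address: "noc@org.example", Severities: []string{"warning"}},
		{Address: "dev@org.example"},
	}
	for _, r := range mergeEmail([][]EmailRecipient{owner, tenant}) {
		fmt.Println(r.Address, r.Severities)
	}
}
```

The webhook and telegram channels would follow the same pattern with url and (bot_token, chat_id) as the dedup keys.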