[Incident] Sev0 - ApplicationDown alert fired on db-timetable-mcp-server (2026-06-08 21:05 UTC) #15

@abeckDev

Incident Report

Alert: ApplicationDown (Severity 0 — Critical)
Resource: db-timetable-mcp-server (Container App, Sweden Central)
Resource ID: /subscriptions/b814d650-0963-4701-9dc9-feffc970ad6a/resourceGroups/DB-Timetable-MCP/providers/Microsoft.App/containerApps/db-timetable-mcp-server
Signal: Metric — Replicas < 1 (Total, 5-min window, evaluated every 1 min)
Alert Fired: 2026-06-08T21:05:24Z
Alert Resolved: 2026-06-08T21:08:19Z (auto-mitigated)
Duration: ~3 minutes (time the alert was active; per the timeline, the app was not restarted until 21:10:11Z)
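
To make the signal definition above concrete, here is a simplified model of a "Total over a 5-minute window, evaluated every minute" condition. It is only a sketch and not Azure Monitor's actual evaluation engine: the function name `evaluate_alert`, the one-sample-per-minute assumption, and the sample data are all hypothetical.

```python
from collections import deque

def evaluate_alert(samples, window=5, threshold=1):
    """Return (minute_index, fired) for each evaluation once the window is full.

    samples: hypothetical per-minute replica counts, oldest first.
    The condition fires when the windowed total is below the threshold.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    results = []
    for minute, replicas in enumerate(samples):
        recent.append(replicas)
        if len(recent) == window:
            total = sum(recent)  # "Total" aggregation over the window
            results.append((minute, total < threshold))
    return results

# Hypothetical data: the app runs, drops to 0 replicas for 5 minutes, then recovers.
samples = [1, 1, 0, 0, 0, 0, 0, 1, 1]
results = evaluate_alert(samples)
fired_minutes = [minute for minute, fired in results if fired]
```

Under this model the condition becomes true only once every sample in the window is zero, and it clears as soon as one non-zero sample enters the window. Real alert timing also depends on metric ingestion latency, which the sketch ignores.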


Timeline of Events (UTC)

Time     | Event                                             | Actor
21:03:09 | Start Container App (Accepted)                    | admin@MngEnvMCAP857392.onmicrosoft.com
21:03:18 | Create or Update Container App (Failed)           | admin@MngEnvMCAP857392.onmicrosoft.com
21:03:25 | Start Container App (Succeeded)                   | admin@MngEnvMCAP857392.onmicrosoft.com
21:05:24 | Alert Fired (Replicas < 1)                        | Azure Monitor
21:05:47 | Stop Container App (Started)                      | admin@MngEnvMCAP857392.onmicrosoft.com
21:06:04 | Stop Container App (Succeeded)                    | admin@MngEnvMCAP857392.onmicrosoft.com
21:08:19 | Alert Resolved (auto-mitigate)                    | Azure Monitor
21:10:11 | Start Container App (Started)                     | admin@MngEnvMCAP857392.onmicrosoft.com
21:10:28 | Start Container App (Succeeded)                   | admin@MngEnvMCAP857392.onmicrosoft.com
21:10:39 | Create or update action group (IncidentHandling)  | admin@MngEnvMCAP857392.onmicrosoft.com
21:11:23 | Create or update metric alert (ApplicationDown)   | admin@MngEnvMCAP857392.onmicrosoft.com
21:13:15 | Create or update action group (IncidentHandling)  | admin@MngEnvMCAP857392.onmicrosoft.com
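
The intervals between the key timeline entries can be checked with a short script. This is a minimal sketch that uses only timestamps from the timeline above; the helper names `t` and `seconds_between` and the dictionary keys are illustrative, not part of any Azure tooling.

```python
from datetime import datetime

def t(hms):
    """Parse a timeline entry (UTC, incident date 2026-06-08)."""
    return datetime.strptime(f"2026-06-08 {hms}", "%Y-%m-%d %H:%M:%S")

# Timestamps copied from the timeline.
events = {
    "alert_fired": t("21:05:24"),
    "stop_started": t("21:05:47"),
    "stop_succeeded": t("21:06:04"),
    "alert_resolved": t("21:08:19"),
    "start_started": t("21:10:11"),
    "start_succeeded": t("21:10:28"),
}

def seconds_between(first, second):
    """Whole seconds from event `first` to event `second`."""
    return int((events[second] - events[first]).total_seconds())

alert_active = seconds_between("alert_fired", "alert_resolved")
stop_after_alert = seconds_between("alert_fired", "stop_started")
stopped_for = seconds_between("stop_succeeded", "start_succeeded")
resolved_before_restart = seconds_between("alert_resolved", "start_started")

print(f"Alert active: {alert_active} s")
print(f"Stop logged after alert fired: {stop_after_alert} s")
print(f"Stopped until restart succeeded: {stopped_for} s")
print(f"Alert resolved before restart began: {resolved_before_restart} s")
```

The alert was active for 175 seconds (just under 3 minutes). The logged stop began 23 seconds after the alert fired, and the alert resolved 112 seconds before the restart began, so the alert timestamps do not line up exactly with the stop/start cycle.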

Root Cause

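Manual administrative action. admin@MngEnvMCAP857392.onmicrosoft.com explicitly stopped the container app at 21:05:47Z and restarted it at 21:10:11Z; while stopped, the app ran 0 replicas. Note that the ApplicationDown alert fired at 21:05:24Z, 23 seconds before this stop was logged, so the stop cannot by itself account for the trigger. The Start operation at 21:03:09Z suggests the app was already stopped or restarting before the alert, and the 5-minute evaluation window may have captured that earlier zero-replica period; the timeline does not show an earlier stop, so this remains unconfirmed. A failed Create/Update operation at 21:03:18Z also preceded the stop, which may indicate a configuration change attempt that failed and prompted the stop/start cycle.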

This was not an infrastructure failure, scaling issue, or application crash. The outage was caused by intentional operator intervention.

Current State

  • Status: Running (Healthy)
  • Revision: db-timetable-mcp-server--0000006 (active, 1 replica, 100% traffic)
  • Provisioning: Succeeded
  • Alert: Resolved

Recommendations

  1. Change management: Implement a maintenance window / change notification process before stopping production container apps to avoid unnecessary Sev0 alerts.
  2. Alert suppression during maintenance: Consider using Azure Monitor alert processing rules (formerly called action rules) to suppress notifications during planned maintenance windows.
  3. Investigate failed update: The failed Create/Update at 21:03:18Z should be investigated to understand what configuration change was attempted and why it failed.
  4. Keep min replicas > 0: The app currently has minReplicas: 1, which is correct. Make sure future configuration or scaling changes do not lower it to 0.
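
The maintenance-window idea in recommendations 1 and 2 can be sketched as a simple check that decides whether an alert should page anyone. This is a local illustration, not an Azure Monitor API: the `MaintenanceWindow` class, the `should_notify` function, and the 21:00 to 21:30 window are all hypothetical, and no such window was declared for this incident.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class MaintenanceWindow:
    """A planned change window for one resource (hypothetical structure)."""
    resource: str
    start: datetime
    end: datetime

def should_notify(alert_time, resource, windows):
    """Return False if the alert falls inside a declared window for the resource."""
    for w in windows:
        if w.resource == resource and w.start <= alert_time < w.end:
            return False
    return True

# Hypothetical window announced before the stop/start work.
windows = [
    MaintenanceWindow(
        resource="db-timetable-mcp-server",
        start=datetime(2026, 6, 8, 21, 0, tzinfo=timezone.utc),
        end=datetime(2026, 6, 8, 21, 30, tzinfo=timezone.utc),
    )
]

# The alert time from this incident.
alert_time = datetime(2026, 6, 8, 21, 5, 24, tzinfo=timezone.utc)
suppressed = not should_notify(alert_time, "db-timetable-mcp-server", windows)

# An alert outside the window would still notify.
later = datetime(2026, 6, 8, 22, 0, 0, tzinfo=timezone.utc)
notifies_later = should_notify(later, "db-timetable-mcp-server", windows)
```

Had a window like this been declared, the 21:05:24Z alert would have been suppressed while alerts outside the window would still notify. Scoping the window to a single resource keeps unrelated alerts unaffected.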

This incident was automatically created by Azure SRE Agent.
Thread: https://sre.azure.com/agents/subscriptions/b814d650-0963-4701-9dc9-feffc970ad6a/resourceGroups/SRE-Demo/providers/Microsoft.App/agents/sre-agent/views/thread/019a4bac-42da-4ea5-bb5b-dc2a66b170af

This issue was created by sre-agent--205413df
Tracked by the SRE agent here
