Goal
Create operational DR runbooks for the home lab.
Why this needs user input
DR pages are step-by-step failover/failback procedures. They need:
- RTO/RPO targets per service tier: what is the recovery time objective for Splunk vs. Cribl vs. the homelab dashboard? What is the recovery point objective (how much data loss is acceptable)?
- Paging path: who or what gets notified when DR triggers? Personal pager? Push notification? Email? None?
- Failback validation steps: what proves DR has succeeded? Specific synthetic queries? Health checks? Smoke tests?
The current docs (about/homelab.mdx) describe DR conceptually but don't operationalize it.
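To make the RTO/RPO question concrete, here is a minimal sketch of how per-service targets could be recorded and checked once they are decided. The `RecoveryTarget` class, the `meets_targets` helper, the tier assignments, and every number are hypothetical placeholders, not agreed values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    """RTO/RPO targets for one service. All values below are placeholders."""
    service: str
    tier: int
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of data loss


# Hypothetical numbers for illustration only; the real targets are the open question.
TARGETS = [
    RecoveryTarget("splunk", tier=1, rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryTarget("cribl", tier=1, rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryTarget("homelab-dashboard", tier=3, rto=timedelta(days=2), rpo=timedelta(days=1)),
]


def meets_targets(target, outage, data_loss):
    """Return True if a measured outage and data-loss window fit within the targets."""
    return outage <= target.rto and data_loss <= target.rpo
```

A DR drill could then record the measured outage and data-loss window for each service and pass them to `meets_targets` to decide whether the drill passed.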
Proposed page
- Path: infrastructure/dr-runbooks.mdx
- Sidebar group: Infrastructure
- Tier: 2
- Key sources: terraform-aws (AWS DR infra), observability/tf-splunk-aws.mdx (existing AWS Splunk doc), ansible-proxmox (host-level recovery)
Done definition
- One runbook per critical service (Splunk failover, Proxmox restore from snapshot, network failover).
- Each runbook has: trigger conditions, prerequisites, step-by-step actions, verification commands, rollback plan.
- Linked from infrastructure/overview.mdx.
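The "each runbook has" criterion could be checked mechanically. The sketch below assumes each runbook is written with Markdown-style `#` headings named exactly after the five required sections; that heading convention and the `missing_sections` helper are assumptions for illustration, not an existing tool.

```python
REQUIRED_SECTIONS = (
    "trigger conditions",
    "prerequisites",
    "step-by-step actions",
    "verification commands",
    "rollback plan",
)


def missing_sections(runbook_text):
    """Return the required sections that have no matching heading line in the runbook."""
    # Collect heading titles: strip leading '#' markers, whitespace, and case.
    headings = {
        line.lstrip("#").strip().lower()
        for line in runbook_text.splitlines()
        if line.startswith("#")
    }
    return [s for s in REQUIRED_SECTIONS if s not in headings]
```

Run against a draft runbook, an empty result would mean all five sections are present; anything else lists what is still missing.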