device-health-oracle: detect link impairment from monitoring data and update link.health #2652

## Context

We have a steady stream of link health issues on the network, but the device-health-oracle currently has **zero link criteria configured** — links auto-advance from Pending to ReadyForService without checks, and ReadyForService is treated as a terminal state with no demotion path. Meanwhile, mainnet has activated links with ISIS down and 100% packet loss that are not reflected in `link.health`.

## Goal

Enable the device-health-oracle to:
1. **Detect impaired links** from monitoring data and write `LinkHealth = Impaired` onchain
2. **Detect recovered links** and restore `LinkHealth = ReadyForService`
3. Support **bidirectional transitions**: `ReadyForService ↔ Impaired`

## Approach

### Data source: `link_rollup_5m`

Query the existing `link_rollup_5m` ClickHouse table, which already aggregates per-link health into 5-minute buckets:
- `link_pk` — link public key (no joins needed)
- `isis_down` (bool) — whether ISIS adjacency is down
- `a_loss_pct` / `z_loss_pct` (float) — packet loss percentage per direction

This table is in the lake ClickHouse instance the DHO already connects to.
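
For illustration, one bucket of the table could be modelled in Go roughly as below. This is only a sketch: the issue names `link_pk`, `isis_down`, `a_loss_pct` and `z_loss_pct`, while the bucket timestamp column (`bucket_ts` here), the Go type `LinkRollup` and the helper `recentBucketsQuery` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// LinkRollup mirrors one 5-minute bucket of link_rollup_5m.
// BucketStart and its column name (bucket_ts) are assumptions; the issue
// only names link_pk, isis_down, a_loss_pct and z_loss_pct.
type LinkRollup struct {
	LinkPK      string
	BucketStart time.Time
	ISISDown    bool
	ALossPct    float64
	ZLossPct    float64
}

// recentBucketsQuery builds a query for the newest `limit` buckets of one
// link. The link key stays a `?` placeholder to be bound by the driver.
func recentBucketsQuery(limit int) string {
	return fmt.Sprintf(
		"SELECT link_pk, bucket_ts, isis_down, a_loss_pct, z_loss_pct "+
			"FROM link_rollup_5m WHERE link_pk = ? "+
			"ORDER BY bucket_ts DESC LIMIT %d", limit)
}
```

Because the table is keyed by `link_pk`, the query needs no joins; `recentBucketsQuery(1)` would fetch only the most recent bucket.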

### Impairment criteria

A link is **impaired** if the most recent `link_rollup_5m` bucket shows:
- `isis_down = true`, OR
- `a_loss_pct > threshold` OR `z_loss_pct > threshold` (default threshold: 5%, configurable via the `--link-loss-threshold` flag)
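
The criteria above reduce to a single predicate over the latest bucket. A minimal sketch, assuming a hypothetical `Bucket` type and `IsImpaired` function (neither name comes from the codebase):

```go
package main

// Bucket holds the link_rollup_5m fields used by the check.
// The type name is hypothetical.
type Bucket struct {
	ISISDown bool
	ALossPct float64
	ZLossPct float64
}

// DefaultLossThresholdPct is the 5% default named in the issue.
const DefaultLossThresholdPct = 5.0

// IsImpaired reports whether the bucket meets any impairment criterion:
// ISIS adjacency down, or loss above the threshold in either direction.
// Loss exactly equal to the threshold is not treated as impaired.
func IsImpaired(b Bucket, thresholdPct float64) bool {
	return b.ISISDown || b.ALossPct > thresholdPct || b.ZLossPct > thresholdPct
}
```

The comparison is strict, so a bucket with exactly 5% loss stays healthy under the default threshold, which matches the `≤ threshold` wording used for recovery.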

### Recovery criteria

A link has **recovered** if ALL `link_rollup_5m` buckets within the recovery window are clean (ISIS up AND loss ≤ threshold in both directions). The recovery window is derived from **ledger slots**: the existing `DrainedSlotCount` parameter is resolved to wall-clock time via `GetBlockTime`, consistent with how device burn-in windows work. This asymmetry (fast impairment detection, slow recovery) prevents flapping.
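
A sketch of the recovery check, assuming the caller has already selected the buckets that fall inside the recovery window. `Bucket`, `isClean` and `HasRecovered` are hypothetical names, and treating an empty window as "not recovered" is an assumption the issue does not state.

```go
package main

// Bucket holds the link_rollup_5m fields used by the check.
// The type name is hypothetical.
type Bucket struct {
	ISISDown bool
	ALossPct float64
	ZLossPct float64
}

// isClean reports whether ISIS is up and loss is at or below the
// threshold in both directions.
func isClean(b Bucket, thresholdPct float64) bool {
	return !b.ISISDown && b.ALossPct <= thresholdPct && b.ZLossPct <= thresholdPct
}

// HasRecovered reports whether every bucket in the recovery window is clean.
// Assumption: a window with no buckets is not evidence of recovery.
func HasRecovered(window []Bucket, thresholdPct float64) bool {
	if len(window) == 0 {
		return false
	}
	for _, b := range window {
		if !isClean(b, thresholdPct) {
			return false
		}
	}
	return true
}
```

A single bad bucket anywhere in the window keeps the link `Impaired`, which is what makes recovery slower than detection.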

### Evaluator changes

Extend `LinkHealthEvaluator.Evaluate()` to support three paths:
- **ReadyForService**: check impairment criteria against the most recent bucket → demote to `Impaired` if any criterion is met
- **Impaired**: check recovery criteria over the recovery window → promote to `ReadyForService` if every bucket is clean
- **Pending/Unknown**: check promotion criteria → advance to `ReadyForService` (existing behavior)
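
The three paths can be sketched as a small state function. This is not the existing `LinkHealthEvaluator.Evaluate()` signature; `nextHealth`, `Bucket`, the `LinkHealth` constants and the `promotionOK` argument are hypothetical stand-ins, and leaving the state unchanged when no bucket is available is an assumption.

```go
package main

// LinkHealth is a hypothetical stand-in for the onchain health enum.
type LinkHealth int

const (
	Unknown LinkHealth = iota
	Pending
	ReadyForService
	Impaired
)

// Bucket holds the link_rollup_5m fields used by the checks.
type Bucket struct {
	ISISDown bool
	ALossPct float64
	ZLossPct float64
}

func clean(b Bucket, thresholdPct float64) bool {
	return !b.ISISDown && b.ALossPct <= thresholdPct && b.ZLossPct <= thresholdPct
}

// nextHealth returns the health to write for a link. window holds the
// buckets inside the recovery window, oldest first, so the last element
// is the most recent bucket.
func nextHealth(current LinkHealth, window []Bucket, promotionOK bool, thresholdPct float64) LinkHealth {
	switch current {
	case ReadyForService:
		// Fast demotion: only the most recent bucket is consulted.
		if n := len(window); n > 0 && !clean(window[n-1], thresholdPct) {
			return Impaired
		}
	case Impaired:
		// Slow promotion: every bucket in the window must be clean.
		if len(window) == 0 {
			return Impaired
		}
		for _, b := range window {
			if !clean(b, thresholdPct) {
				return Impaired
			}
		}
		return ReadyForService
	default: // Pending or Unknown: existing promotion behavior.
		if promotionOK {
			return ReadyForService
		}
	}
	return current
}
```

Note the asymmetry: a window of `[bad, good]` leaves a `ReadyForService` link untouched because the latest bucket is clean, but it keeps an `Impaired` link impaired until the bad bucket ages out of the window.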

Add a `LinkBurnIn` helper (parallel to `DeviceBurnIn`) for slot-based window resolution.
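
As a rough sketch of the slot-based window resolution, the helper could subtract the slot count from the current slot and resolve the result to a timestamp. The real `GetBlockTime` and `DeviceBurnIn` signatures are not shown in this issue, so `BlockTimeFunc` and `recoveryWindowStart` below are hypothetical, as is the choice to clamp the start slot at zero.

```go
package main

import (
	"fmt"
	"time"
)

// BlockTimeFunc returns the Unix timestamp (seconds) of a ledger slot.
// It stands in for the oracle's GetBlockTime call; the shape is assumed.
type BlockTimeFunc func(slot uint64) (int64, error)

// recoveryWindowStart resolves the start of the recovery window to
// wall-clock time: the block time of (currentSlot - drainedSlotCount).
func recoveryWindowStart(blockTime BlockTimeFunc, currentSlot, drainedSlotCount uint64) (time.Time, error) {
	var startSlot uint64 // stays 0 if the window reaches past the first slot
	if currentSlot > drainedSlotCount {
		startSlot = currentSlot - drainedSlotCount
	}
	ts, err := blockTime(startSlot)
	if err != nil {
		return time.Time{}, fmt.Errorf("resolve block time for slot %d: %w", startSlot, err)
	}
	return time.Unix(ts, 0).UTC(), nil
}
```

The returned time would then bound the `link_rollup_5m` query, so that only buckets at or after it are considered for recovery.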

### Onchain effect

Writing `LinkHealth = Impaired` updates the health field but does **not** automatically change `link.status` — the serviceability program's `check_status_transition()` is gated behind a "waiting for health oracle" comment. This is intentional: the health field serves as a signal to operators and dashboards. Automatic status transitions can be enabled later.

## Implementation

See [implementation plan](docs/superpowers/plans/2026-05-05-dho-link-health-impairment.md) for detailed tasks and code.

### Files to change

| Action | File | What |
|--------|------|------|
| Modify | `internal/worker/criteria.go` | Bidirectional `LinkHealthEvaluator`, `LinkBurnIn` helper |
| Modify | `internal/worker/criteria_test.go` | Tests for impairment/recovery transitions |
| Create | `internal/worker/link_health.go` | `LinkHealthCriterion` — queries `link_rollup_5m` |
| Create | `internal/worker/link_health_test.go` | Unit tests |
| Modify | `internal/worker/clickhouse.go` | `LinkHealthChecker` interface, `LinkHealthRecent` query |
| Modify | `cmd/device-health-oracle/main.go` | Wire up criterion, add `--link-loss-threshold` flag |
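
A possible shape for the new flag, assuming the standard library `flag` package (the oracle may use a different flag library) and a hypothetical `parseLossThreshold` helper. The 5% default comes from the impairment criteria; the range check is an added assumption.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseLossThreshold parses --link-loss-threshold from args and returns
// the packet loss percentage above which a link counts as impaired.
func parseLossThreshold(args []string) (float64, error) {
	fs := flag.NewFlagSet("device-health-oracle", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	threshold := fs.Float64("link-loss-threshold", 5.0,
		"packet loss percentage above which a link is impaired")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumption: the value is a percentage, so reject anything outside 0..100.
	if *threshold < 0 || *threshold > 100 {
		return 0, fmt.Errorf("--link-loss-threshold must be within [0, 100], got %v", *threshold)
	}
	return *threshold, nil
}
```

In `main.go` this would typically be called with `os.Args[1:]`, and the result passed to the link health criterion.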
