## Problem
When running gNMIC in a Kubernetes StatefulSet cluster (15–25 replicas), the `leader-wait-timer` creates an unavoidable trade-off between cold-start safety and rolling-restart speed.

**Cold start scenario:** All pods start simultaneously (`podManagementPolicy: Parallel`). If the leader dispatches targets before the other pods have registered with Consul, a small number of early pods receive all targets and OOM. A long `leader-wait-timer` (e.g. 300s) prevents this.

**Rolling restart scenario:** Pods restart one at a time. When the leader pod restarts and a new leader is elected, 14–24 other pods are already running and registered. The long `leader-wait-timer` still fires, causing an unnecessary 5-minute metrics collection gap despite there being no OOM risk.

There is no way to configure gNMIC to distinguish between these two scenarios: the timer is a fixed delay regardless of cluster state.
## Current behavior
In `pkg/app/clustering.go`, after a pod wins the leader lock, it unconditionally sleeps for `LeaderWaitTimer` before starting the loader and dispatching targets:
```go
go func() {
	go a.watchMembers(ctx)
	a.Logger.Printf("leader waiting %s before dispatching targets",
		a.Config.Clustering.LeaderWaitTimer)
	time.Sleep(a.Config.Clustering.LeaderWaitTimer) // fixed delay
	a.Logger.Printf("leader done waiting, starting loader and dispatching targets")
	go a.startLoader(ctx)
	go a.dispatchTargets(ctx)
}()
```
Meanwhile, `watchMembers()` is already running concurrently, populating `a.apiServices` with healthy registered instances (via Consul TTL health checks). The leader already knows how many cluster members are ready; it just doesn't use that information.
## Proposed solution
Add a new clustering config field, `min-ready-instances`, that lets the leader dispatch targets as soon as a sufficient number of cluster members have registered, while keeping `leader-wait-timer` as a maximum timeout.
### Config example
```yaml
clustering:
  leader-wait-timer: 300s    # maximum wait (safety net / timeout)
  min-ready-instances: 12    # dispatch as soon as 12 members have registered
```
### Behavior
- If `min-ready-instances` is set, the leader polls `len(a.apiServices)` during the wait period.
- As soon as `len(a.apiServices) >= min-ready-instances`, dispatch begins immediately.
- If the threshold isn't reached within `leader-wait-timer`, dispatch proceeds anyway (current behavior, prevents infinite blocking).
- If `min-ready-instances` is not set (default `0`), behavior is unchanged: a pure timer-based wait.
## Implementation sketch
The change is localized to `startCluster()` in `pkg/app/clustering.go` and the config struct in `pkg/config/clustering.go`:
```go
// In pkg/config/clustering.go, add to the clustering struct:
MinReadyInstances int `mapstructure:"min-ready-instances,omitempty" ...`
```

```go
// In pkg/app/clustering.go, replace the time.Sleep call with:
deadline := time.After(a.Config.Clustering.LeaderWaitTimer)
ticker := time.NewTicker(2 * time.Second)
defer ticker.Stop()
for {
	select {
	case <-deadline:
		a.configLock.RLock()
		n := len(a.apiServices)
		a.configLock.RUnlock()
		a.Logger.Printf("leader-wait-timer expired, dispatching with %d instances", n)
		goto DISPATCH
	case <-ticker.C:
		a.configLock.RLock()
		n := len(a.apiServices)
		a.configLock.RUnlock()
		if a.Config.Clustering.MinReadyInstances > 0 && n >= a.Config.Clustering.MinReadyInstances {
			a.Logger.Printf("min-ready-instances threshold met (%d/%d), dispatching",
				n, a.Config.Clustering.MinReadyInstances)
			goto DISPATCH
		}
	case <-ctx.Done():
		return
	}
}

DISPATCH:
go a.startLoader(ctx)
go a.dispatchTargets(ctx)
```
## Impact
| Scenario | Current (300s timer) | With `min-ready-instances` |
|---|---|---|
| Cold start (15 pods) | 5 min delay | ~30–60s (pods register quickly with `Parallel` policy) |
| Rolling restart | 5 min delay | ~2–4s (14 pods already registered) |
| Partial failure | 5 min delay | Waits until threshold OR timeout |
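The estimates above follow from the polling design: dispatch latency is the time for `min-ready-instances` members to register plus at most one 2s poll tick for the leader to notice. A back-of-envelope model (the registration times are assumptions, not measurements):

```go
package main

import "fmt"

// dispatchLatency models dispatch delay under the proposal: the time for
// min-ready-instances members to register, plus up to one poll tick for
// the leader's ticker to observe the threshold.
func dispatchLatency(registrationSecs, tickSecs float64) (lo, hi float64) {
	return registrationSecs, registrationSecs + tickSecs
}

func main() {
	// Rolling restart: members are already registered, so registration time ~0s;
	// total observed delay also includes leader-election time.
	lo, hi := dispatchLatency(0, 2)
	fmt.Printf("rolling restart: %.0f-%.0fs after leader election\n", lo, hi)

	// Cold start: assume ~30s for all pods to register under the Parallel policy.
	lo, hi = dispatchLatency(30, 2)
	fmt.Printf("cold start: %.0f-%.0fs\n", lo, hi)
}
```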
## Our deployment context
We run gNMIC v0.43.0 in production across multiple Kubernetes clusters:
- 15–25 replicas per cluster
- 200+ Arista/Junos targets per cluster
- Consul-based clustering with TTL health checks
- `podManagementPolicy: Parallel` StatefulSets
- The 5-minute gap during rolling restarts is our primary pain point
We're happy to contribute a PR if the maintainers are open to this approach.