Track the work to find the per-device user-GRE-tunnel ceiling on DoubleZero's physical Arista switches and pick a new MaxUserTunnelSlots default. Full design: Notion doc.
Scope at a glance
- Control-plane stress only on isolated test switches (`dzd8` 7130LBR, `dzd10` 7280CR3A). No data plane.
- Scale onchain user records against a DUT and measure the provisioning pipeline (ledger → stress controller → agent pull → eAPI commit).
- Production controller and `dzd1`–`dzd4` untouched; a parallel stress controller serves the DUTs.
- Restore DUTs to pre-study state via EOS `configure replace checkpoint:` at the end.
Exit criteria
- Raw orchestrator + observer outputs archived for each DUT.
- Breaking-point step and trigger identified for each DUT.
- New `MaxUserTunnelSlots` default merged with a headroom-based justification.
- Both DUTs restored to standalone EOS; pre/post `show running-config` diff is empty (or fully explained).
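Two of the exit criteria lend themselves to small mechanical checks. The sketch below is one possible shape, not the study's method: `proposeDefault` and `onlyIn` are hypothetical helpers, the 25% headroom and the breaking-point numbers are placeholders rather than results, and the line-set comparison is a simplification that ignores ordering and repeated lines.

```go
package main

import (
	"fmt"
	"strings"
)

// proposeDefault takes the lowest breaking point across DUTs and keeps
// headroomPct percent of it in reserve. It returns 0 for an empty map.
func proposeDefault(breakingPoints map[string]int, headroomPct int) int {
	lowest := -1
	for _, bp := range breakingPoints {
		if lowest < 0 || bp < lowest {
			lowest = bp
		}
	}
	if lowest < 0 {
		return 0
	}
	return lowest * (100 - headroomPct) / 100
}

// onlyIn returns the non-empty lines of a that do not appear anywhere in b.
// Trailing spaces and carriage returns are ignored.
func onlyIn(a, b string) []string {
	inB := map[string]bool{}
	for _, l := range strings.Split(b, "\n") {
		inB[strings.TrimRight(l, " \r")] = true
	}
	var out []string
	for _, l := range strings.Split(a, "\n") {
		l = strings.TrimRight(l, " \r")
		if l != "" && !inB[l] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	// Placeholder breaking points, not measurements.
	fmt.Println("proposed default:", proposeDefault(map[string]int{"dzd8": 400, "dzd10": 1000}, 25))

	// Placeholder captures of `show running-config` before and after the study.
	pre := "hostname dzd8\ninterface Ethernet1\n"
	post := "hostname dzd8\ninterface Ethernet1\ninterface Tunnel500\n"
	fmt.Println("missing after restore:", onlyIn(pre, post))
	fmt.Println("left over after restore:", onlyIn(post, pre))
}
```

Taking the minimum across DUTs assumes a single default must be safe on both platforms; any line reported by `onlyIn` is what the "fully explained" clause would need to account for.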
Child issues (this batch)
- `MaxUserTunnelSlots` configurable via flag/config
- `tools/stress/device-orchestrator`
- `tools/stress/device-observer`
Deferred (not yet filed)
- Phase 1 execution run: 7130LBR (`dzd8`)
- Phase 2 execution run: 7280CR3A (`dzd10`)
- Adopt new `MaxUserTunnelSlots` default
Related but separate work on this milestone
- `device_stress_physical_test.go` (data-plane / iperf stress).
Out of scope (deferred; see design doc § "Future work")
`tools/stress/device-report` post-run analyzer, multicast boundary list scale sweep, agent poll-interval sweep, full-fabric scale sweep, regression test added to CI, redesign of the agent's full-config-every-5s pull, control-plane-to-client and data-plane-to-client testing.