Skip to content

Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)#66

Merged
TheAngryRaven merged 2 commits into
BETAfrom
claude/hotpath-and-isr-hardening-mhs8yd
Jun 12, 2026
Merged

Fix hot-path track scan, tach ring overflow, WDT-unsafe GPS recovery (H-3/H-4/H-5, partial H-8)#66
TheAngryRaven merged 2 commits into
BETAfrom
claude/hotpath-and-isr-hardening-mhs8yd

Conversation

@TheAngryRaven

Copy link
Copy Markdown
Owner

Batch of mechanical fixes from the review's HIGH list. Note H-1/H-2 were already fixed and merged in #61 (lap_format pure unit), so this PR covers H-3/H-4/H-5 plus the tractable slice of H-8. The big-decision items are deferred — list at the bottom.

H-3 — Track-detection scan ran at ~250 Hz, not per GPS fix

gpsData.fix stays true between PVT updates, so the O(N) software-double haversine manifest scan (several ms at the 200-entry ceiling on the single-precision-FPU M4F) ran every main-loop iteration while hunting for a track. Now throttled to 1 Hz — still effectively instant at driving pace, and the loop rate stays at ~250 Hz. CLAUDE.md's track-detection flow updated to match reality.

H-4 — Tach ring ISR can no longer lap the consumer

The pulse ISR advanced the ring head unconditionally; an SD GC stall (the documented 100 ms–2 s — the same failure mode the GPS serial ring exists for) at racing RPM overruns 16 entries, breaks the SPSC invariant, and TACH_LOOP() computes a confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM auto-race trigger. The ISR now does the same SPSC full check its GPS sibling already had (one slot sacrificed so head==tail is unambiguously empty), drops the pulse and sets tachRingOverflow; TACH_LOOP() then discards the carried prev-timestamp so the one period spanning the gap is never computed — the Kalman estimate coasts briefly instead of going silently wrong. Chose drop-and-flag over "size for the stall" because covering a 2 s stall at 20k RPM needs a ~4 KB ring for data that's stale the moment the stall ends. tachRingTail becomes volatile (the ISR reads it now); the sleep wake trigger still fires on dropped pulses (a dropped timestamp is still a real pulse).

H-5 — GPS baud recovery vs. the 4 s watchdog

GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed WDT but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms of delays — a genuinely hung module out-waits the watchdog, turning the one path built to revive a sick GPS into a reset → re-setup → recovery → reset boot loop. It now calls wdtPet() before each blocking probe, the exact treatment fwStageToFlash() already received for the same reason.

H-8 (partial) — Tach Kalman filter extracted to a tested pure unit

New tach_filter.{h,cpp}: predict/update math, the Q/R/uncertainty-floor tuning constants, and period→RPM conversion, host-tested (tests/tach_filter_test.cpp: convergence, monotonic no-overshoot approach, R scaling with period count, uncertainty floor, reset semantics, conversion edge cases). tachometer.ino keeps only the ISR/ring plumbing. Wired into unit-tests, coverage, and clang-tidy CI.

118 test cases / 1595 assertions pass locally; clang-tidy clean.

Deferred for discussion (big-decision items, per instructions)

  • H-6 (god-file decomposition of BirdsEye.ino) and H-7 (layering) — a real re-architecture; needs agreement on module boundaries and migration order before touching ~1300 lines.
  • H-8 remainder: extracting the OTA protocol state machine (727 lines, app-region-erase consequences) and the BLE command dispatcher to tested pure units — both are sizeable refactors of safety-critical paths that deserve their own reviewed PRs rather than riding in a batch.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL


Generated by Claude Code

…ery (H-3/H-4/H-5, partial H-8)

H-3: trackDetectionLoop() gated only on gpsData.fix, which stays true
between PVT updates, so the O(N) software-double haversine manifest scan
(several ms at the 200-entry ceiling on the single-precision-FPU M4F)
ran every ~250 Hz loop iteration, collapsing the loop rate. Throttled to
1 Hz — still instant at driving pace.

H-4: the tach pulse ISR advanced the ring head unconditionally. An SD GC
stall (documented 100 ms–2 s — the same failure the GPS serial ring
exists for) at racing RPM overruns the 16-entry buffer; lapping the
consumer breaks the SPSC invariant and TACH_LOOP computes a
confident-but-wrong RPM that lands in the CSV and feeds the >500 RPM
auto-race trigger. The ISR now does the SPSC full check (one slot
sacrificed, matching the GPS serial ISR), drops the pulse, and sets
tachRingOverflow; TACH_LOOP discards the carried prev-timestamp so the
one period spanning the gap is never computed and the Kalman estimate
coasts. tachRingTail becomes volatile (the ISR reads it now); the wake
trigger still fires on dropped pulses.

H-5: GPS_BAUD_RECOVERY() runs from GPS_LOOP() under the armed ~4 s WDT
but can block for up to three ~1.1 s myGNSS.begin() probes plus ~500 ms
of delays — a genuinely hung module out-waits the watchdog and the one
path built to recover a sick GPS becomes a reset/recovery boot loop. It
now pets the WDT before each blocking probe (same treatment
fwStageToFlash() already had).

H-8 (partial): the tach Kalman filter math (predict/update, Q/R/floor
tuning constants, period→RPM conversion) is extracted to the host-tested
tach_filter pure unit per the project convention, with tests covering
convergence, monotonic approach, R-scaling with period count, the
uncertainty floor, and reset semantics. The OTA protocol state machine
and BLE command dispatcher extractions are deliberately deferred — they
are large refactors that deserve their own PRs.

https://claude.ai/code/session_01Voi2VL5zjikyhEqgTvzmAL
@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown

Coverage — host-testable units

📂 Overall coverage

Metric Coverage
Lines 🟢 255/256 (99.6%)
Functions 🟢 26/26 (100.0%)
Branches 🟡 233/267 (87.3%)

📄 File coverage

File Lines Functions Branches
BirdsEye/crc32.cpp 🟢 30/30 (100.0%) 🟢 4/4 (100.0%) 🟢 24/24 (100.0%)
BirdsEye/dovex_header.cpp 🟢 96/97 (99.0%) 🟢 6/6 (100.0%) 🔴 57/88 (64.8%)
BirdsEye/filename_validator.cpp 🟢 14/14 (100.0%) 🟢 1/1 (100.0%) 🟢 30/30 (100.0%)
BirdsEye/gps_time.cpp 🟢 45/45 (100.0%) 🟢 6/6 (100.0%) 🟢 30/32 (93.8%)
BirdsEye/gps_validation.cpp 🟢 24/24 (100.0%) 🟢 2/2 (100.0%) 🟢 66/66 (100.0%)
BirdsEye/haversine.cpp 🟢 8/8 (100.0%) 🟢 1/1 (100.0%) ⚫ 0/0 (0.0%)
BirdsEye/lap_format.cpp 🟢 18/18 (100.0%) 🟢 1/1 (100.0%) 🟢 9/9 (100.0%)
BirdsEye/sd_access_policy.cpp 🟢 5/5 (100.0%) 🟢 2/2 (100.0%) 🟢 10/10 (100.0%)
BirdsEye/tach_filter.cpp 🟢 15/15 (100.0%) 🟢 3/3 (100.0%) 🟡 7/8 (87.5%)

@TheAngryRaven TheAngryRaven merged commit 72df990 into BETA Jun 12, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants