multicast: report devices with no mroute telemetry as unknown, not unhealthy #662
Open
bgm-malbeclabs wants to merge 2 commits into
…healthy A multicast publisher/subscriber on a device that exports no mroute telemetry previously rendered as a confirmed 'unhealthy' fault with a reason implying its tunnel was missing as the RPF interface. The real cause is missing telemetry, not a forwarding problem. The health_multicast_user view now resolves the device-reports-nothing case to health_status='unknown' (an existing status, already wired through the rate view, API counts, and web badges) with the reason 'no mroute telemetry observed from <device>'. A genuine RPF mismatch on a reporting device still resolves to 'unhealthy'.
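The branch ordering described in this commit can be sketched in Go. This is only an illustration of the precedence (telemetry gap first, RPF mismatch second), not the view's SQL. The struct, its field names, the `healthy` fallback, and the wording of the RPF reason are assumptions; only `unknown`, `unhealthy`, and the "no mroute telemetry observed from <device>" reason come from the change itself.

```go
package main

import "fmt"

// userObservation is a hypothetical stand-in for the inputs the view combines.
type userObservation struct {
	Device               string
	DeviceReportsMroutes bool // any mroute telemetry observed from the device
	TunnelIsRPFInterface bool // user's tunnel seen as an RPF interface
}

// resolveHealth checks the telemetry gap before the RPF mismatch, so a silent
// device is never reported as a confirmed fault.
func resolveHealth(o userObservation) (status, reason string) {
	switch {
	case !o.DeviceReportsMroutes:
		return "unknown", "no mroute telemetry observed from " + o.Device
	case !o.TunnelIsRPFInterface:
		// Reason wording is illustrative, not the view's exact text.
		return "unhealthy", "tunnel not seen as RPF interface on " + o.Device
	default:
		// "healthy" is an assumed name for the non-fault status.
		return "healthy", ""
	}
}

func main() {
	cases := []userObservation{
		{Device: "dev-a"}, // device exports nothing
		{Device: "dev-b", DeviceReportsMroutes: true}, // reporting, tunnel missing
	}
	for _, c := range cases {
		status, reason := resolveHealth(c)
		fmt.Printf("%s: %s (%s)\n", c.Device, status, reason)
	}
}
```

Checking the telemetry gap first is what keeps a genuine RPF mismatch on a reporting device at `unhealthy` while a non-reporting device falls through to `unknown`.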
…SERT VALUES The testcontainer image was clickhouse/clickhouse-server:latest, which drifted to 26.3 whose Values parser rejects '--' comments embedded between INSERT ... VALUES tuples, breaking TestHealthMulticastUserRate. Pin the test image to 25.12 (matching docker-compose.yml and k8s/base/clickhouse.yaml) so tests track the version we actually run, and move the embedded SQL comments in the rate test out to Go comments.
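A minimal sketch of the two test-side changes, assuming a fixture shaped like the one below. The `buildInsert` helper, the table name, and the tuple contents are hypothetical; the image tag is the one named in this commit. The point is that per-row notes sit in Go comments, so no `--` comment ever reaches the `VALUES` parser.

```go
package main

import (
	"fmt"
	"strings"
)

// clickhouseImage pins the test image instead of tracking :latest.
const clickhouseImage = "clickhouse/clickhouse-server:25.12"

// buildInsert joins VALUES tuples into a single INSERT statement.
// Hypothetical helper: the real test may build its SQL differently.
func buildInsert(table string, tuples []string) string {
	return fmt.Sprintf("INSERT INTO %s VALUES\n%s", table, strings.Join(tuples, ",\n"))
}

func main() {
	stmt := buildInsert("example_rates", []string{
		// publisher on a reporting device
		"('pub-1', 'dev-a', 100)",
		// subscriber on a non-reporting device
		"('sub-1', 'dev-b', 0)",
	})
	fmt.Println("image:", clickhouseImage)
	fmt.Println(stmt)
}
```

Because the annotations are Go comments on slice elements, the generated SQL is identical across ClickHouse versions regardless of how each parser treats comments between tuples.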
🔗 Preview: https://pr-662.data.malbeclabs.com
Summary of Changes
- A multicast user on a device that exports no mroute telemetry now resolves to `health_status='unknown'` ("no mroute telemetry observed from `<device>`") instead of a false `unhealthy`. A genuine RPF mismatch on a reporting device still resolves to `unhealthy`.
- The change reuses the existing `unknown` status, which is already wired through the rate view (which defaults to it), API counts, and web badges/sort, so no API or schema changes were needed.
- In the web health tab, the `unknown` definition now covers the no-telemetry case, and the under-development banner explains that non-reporting devices show `unknown` (not `unhealthy`).
- The testcontainer ClickHouse image is pinned to `25.12` (matching `docker-compose.yml` / `k8s/base/clickhouse.yaml`) instead of `:latest`, which had drifted to 26.3 and broke a test; the change also moves SQL `--` comments out of `INSERT ... VALUES` blocks, which the newer Values parser rejects.

Diff Breakdown
The line count is dominated by the migration re-declaring the full `health_multicast_user` view in its Up and Down blocks; the actual semantic change is two added `multiIf` branches (`health_status` and `mismatch_reason`).

Key files
- `indexer/db/clickhouse/migrations/20260616000001_health_multicast_user_no_telemetry_reason.sql`: adds a `devices_with_mroutes` flag to `health_multicast_user`; a device with zero observed mroutes resolves `health_status` to `unknown` with a "no mroute telemetry observed" reason.
- `indexer/pkg/dz/mroute/health_multicast_user_test.go`: adds a publisher on a non-reporting device (asserts `unknown` + telemetry-gap reason) and a contrasting reporting-device fault case (asserts `unhealthy` + "RPF interface").
- `web/src/components/multicast-group-health-tab.tsx`: broadens the `unknown` status definition and rewrites the under-development banner to reflect the new behavior.
- `indexer/pkg/clickhouse/testing/db.go`, `api/testing/clickhouse.go`: pin the testcontainer ClickHouse image to `25.12`.
- `indexer/pkg/dz/mroute/health_multicast_user_rate_test.go`: relocates embedded `--` comments out of `INSERT ... VALUES`.

Testing Verification
- `health_multicast_user_test.go` covers both new paths: a publisher on a zero-mroute device resolves to `unknown` with "no mroute telemetry observed from …" (and not "RPF interface"), while a publisher whose reporting device lacks its tunnel still resolves to `unhealthy`.
- The `TestHealthMulticastUserRate` failure came from the drifted 26.3 image (its `Values` parser rejects `--` comments between `VALUES` tuples); the `mroute` package and API `Multicast` tests pass on the pinned `25.12` image.