Tiredsolid scoreboard (internal) by armcconnell · Pull Request #669 · malbeclabs/lake

armcconnell · 2026-06-29T17:10:42Z

Internal-only scoreboard for a new market-data venue (sibling to the Edge scoreboard). Gated to allowed-domain Google users.

github-actions · 2026-06-29T21:59:10Z

🔗 Preview: https://pr-669.data.malbeclabs.com

…ty window

…sence

…efault to 1h

… revisions

…auth

…ymbol recent races

… live prices

The page-cache refresh aggregated the feeds summary table with uniqExact over the 6-column race key and quantileExact over lead times. Both buffer per-group state proportional to the window's race count (~7.5M matchup rows/hour, ~768 MiB peak). On the preview/dev ClickHouse the feeds tables are remoteSecure() proxies, so the GROUP BY runs on the small local instance; under the 7.2 GiB server limit with the parallel page-cache refreshes, the OvercommitTracker killed the query (code 241), spamming ERR cache-refresh-failed logs. Switch to bounded-memory estimators (uniqCombined, quantileTDigest): peak state drops ~43x (768 -> 18 MiB) on real data with identical results and runtime, at a sub-1% approximation invisible on the scoreboard. Does not change rows read (the event_ts filter still does not prune through the proxy; deferred separately).

The 24h composite-latency query grouped every feed source by a source-keyed tuple over the raw observations table in a first CTE, then re-grouped. On the memory-constrained proxy ClickHouse (feeds is a remoteSecure() proxy, so the GROUP BY runs locally) that hash table tripped the OvercommitTracker (code 241), logging a WARN every refresh. Collapse to a single GROUP BY over (metro, symbol, block): min(recv) is associative so the per-source pre-aggregation was redundant, uniqExact(source)>=2 enforces the '>=2 DZ feeds' rule directly, and filtering to tob_ feeds in WHERE drops competitor rows before the hash table. Add max_bytes_before_external_group_by as a disk-spill safety net. Verified equivalent to the old form over 1h; the 24h query now peaks ~13 MiB (was OOMing at 7.2 GiB) in ~7s.

HypeRPC (hyperpc_shared_bbo) delivers quotes a median ~100s stale during recurring multi-hour backlog episodes (vs ~300ms for the other feeds), which made DoubleZero appear to win by seconds-to-minutes. That is a HypeRPC-side feed fault, not a real result, so withhold it from every measurement (per-competitor, per-node, headline totals, recent races) via a documented hyperliquidExcludedFeeds list until the feed is fixed.

Add a per-symbol 'updated Xs ago' line derived from the newest race event_ts, ticking with the existing 5s clock, so a stale or gap-filled symbol is visible at a glance.

The scoreboard ran two full-table scans over the proxied summary table (compQ for per-competitor, nodeQ for per-node). On the memory-constrained preview ClickHouse the refresh needed both scans to survive the OvercommitTracker in the same attempt while other page-cache queries had the 7.2 GiB server near its ceiling, so it was killed as a collateral victim nearly every cycle and its cached blob went stale (never picking up composite_latency). A race key includes measurement_node_id, so per-competitor totals are the per-node cells rolled up over the node dimension. Replace the two queries with one 'GROUP BY competitor, measurement_node_id WITH ROLLUP' scan: the detail rows give the per-vantage breakdown and the rolled-up rows give per-competitor totals. Halves the ClickHouse work and, more importantly, makes the refresh one atomic query that succeeds-or-retries as a unit. Verified equivalent to the old form against live data (within uniqCombined's sub-1% approximation).

The headline feed-latency stat took the earliest receive per (metro, symbol, block), so slower metros inflated it (p50 ~219ms). Scope it to the Tokyo vantage point — DZ's fastest metro, nearest Hyperliquid — matching Grafana's 'tob_* (first-arrival, all DZ)' panel: min receive across the tob_ DZ feeds at Tokyo per block, still requiring >=2 feeds delivered (which keeps p99 realistic). Now p50/p90/p99 read ~162/187/243 over 24h (~161/180/220 over 3h, matching Grafana's 160/176/208). Relabel the UI 'Composite feed latency' -> 'Tokyo feed latency' with a blocktime->receive / first-arrival-across-DZ-Tokyo-feeds description.

nikw9944

Automated correctness + ClickHouse performance review. Findings are labeled by severity (🔴 high / 🟠 medium / 🟢 low). Nice feature overall — SQL is safely parameterized and it degrades gracefully when the proxy table is absent.

nikw9944 · 2026-07-01T14:43:48Z

+		SELECT competitor, measurement_node_id, any(location_code) AS location_code,
+			uniqCombinedIf(rk, dz_won = 1) AS dz_wins,
+			uniqCombinedIf(rk, dz_won = 0) AS dz_losses,
+			toFloat64(quantileTDigestIf(0.5)(lead_ms, dz_won = 1)) AS lead_p50,


🔴 High — NaN quantile breaks the entire scoreboard response

These percentiles are computed only over DZ-won rows (dz_won = 1). For any (competitor, node) cell — or the per-competitor rollup — where DZ won zero races in the window, the If predicate matches no rows and ClickHouse's quantileTDigest over an empty set returns NaN. That NaN is scanned into LeadP50Ms/LeadP95Ms (float64), and writeJSON → json.NewEncoder().Encode() errors on NaN (json: unsupported value: NaN), producing a broken/empty 200 body.

Failure scenario: at a single vantage node far from Tokyo (or in a thin 1h window) DZ loses all its head-to-head races vs one competitor → that one cell is NaN → the entire scoreboard fails to encode for all users. It also breaks the page-cache worker: FetchHyperliquidScoreboardData → WritePageCache marshaling fails the same way, so the cache never populates and every request falls through to the (also-failing) live query. No test covers the 0-DZ-wins case.

Fix: coalesce NaN in SQL (e.g. wrap in ifNull/replace isNaN→0) or sanitize the float64s in Go before encoding.

nikw9944 · 2026-07-01T14:43:48Z

+			FROM %[2]s.hyperliquid_bbo_feed_race_summary
+			WHERE feed != loser_feed AND loser_feed != ''
+			  AND (startsWith(feed,'tob_') != startsWith(loser_feed,'tob_')) %[3]s
+			  AND event_ts >= now() - INTERVAL %[4]s


🟠 Medium — 24h/7d (and symbol-filtered) requests do full multi-day scans on the request path

This filters on event_ts >= now() - INTERVAL <window> and symbol IN (…), but the table is ORDER BY (measurement_node_id, symbol, …) with PARTITION BY toDate(event_ts). With no measurement_node_id predicate, neither filter uses the primary index — the time filter only prunes at day granularity and symbol (2nd sort column) can't skip granules without the leading key.

hyperliquidScoreboardCacheKey returns "" for any window other than 1h and for any symbol request, so those bypass the page cache and run live under the 60s request timeout. A 7d window therefore scans ~7–8 full day-partitions (order of ~10⁹ matchup rows at the ~7.5M rows/hour noted in the code) computing uniqCombined/tdigest per group, on the memory-constrained proxy — likely very slow or a timeout per click.

Consider caching 24h/7d too (with a longer refresh interval), or bounding/rejecting them.

nikw9944 · 2026-07-01T14:43:48Z

+
+// hyperliquidScoreboardCacheKey returns the cache key for the default request shape, or "".
+func hyperliquidScoreboardCacheKey(r *http.Request) string {
+	if r.URL.Query().Get("since_ts") != "" || r.URL.Query().Get("symbol") != "" {


🟢 Low — since_ts param bypasses cache but is never applied

This skips the cache when since_ts is present (implying support), but GetHyperliquidScoreboard never parses since_ts and FetchHyperliquidScoreboardData hardcodes fetchHyperliquidRecentRaces(ctx, time.Time{}, 10) (line 370). A ?since_ts=… request therefore pays for an uncached live query yet still returns the default last-15-minutes races. Either wire it through or drop the since_ts check here. (Frontend doesn't send it today, so impact is latent.)

nikw9944 · 2026-07-01T14:43:48Z

+	symbol := r.URL.Query().Get("symbol")
+
+	// Degrade gracefully if the proxy table isn't available (e.g. local dev).
+	if !a.hyperliquidFeedsTableExists(ctx) {


🟢 Low — redundant EXISTS TABLE round-trip per uncached request

On the cache-MISS path this calls hyperliquidFeedsTableExists, then immediately calls FetchHyperliquidScoreboardData, which calls it again (line 208) — two EXISTS TABLE queries to ClickHouse per request. Cheap individually, but a pointless extra round-trip. Drop this handler-level check (Fetch already degrades gracefully) or pass the result through.

nikw9944 · 2026-07-01T14:43:48Z

+		resp.DZWinSharePct = 100.0 * float64(globalWins) / float64(globalRaces)
+	}
+
+	recent, err := a.fetchHyperliquidRecentRaces(ctx, time.Time{}, 10)


🟢 Low — three independent scans run sequentially on the request path

The main aggregation, fetchHyperliquidRecentRaces (15-min scan), and fetchHyperliquidPrices (1-min observations scan) run sequentially. They're independent and could run concurrently (errgroup). Harmless for the cached 1h default (runs on the worker), but on the uncached 24h/7d live path their latencies stack under the same 60s budget.

nikw9944 · 2026-07-01T14:43:48Z

-  const isEdgeRoute = isShredsRoute
+  const isHyperliquidScoreboardRoute = location.pathname === '/dz/hyperliquid/scoreboard'
+  const isHyperliquidRoute = location.pathname.startsWith('/dz/hyperliquid')
+  const isEdgeRoute = isShredsRoute || isHyperliquidRoute


🟢 Low / cosmetic — sidebar highlight inconsistency for the Hyperliquid route

isEdgeRoute = isShredsRoute || isHyperliquidRoute makes the collapsed "Edge" icon highlight on /dz/hyperliquid (line 286), but the expanded Edge parent link (line 559) keys off isShredsRoute only, so it won't reflect the hyperliquid route. Purely visual; worth confirming the intended grouping (Hyperliquid also renders as its own expanded section).

- Guard NaN lead-time percentiles: a (competitor, node) cell where DoubleZero won zero races made quantileTDigestIf aggregate over an empty set, returning NaN and failing JSON encoding of the entire response (and poisoning the page cache). Coalesce to 0 via ifNotFinite; add a regression test. - Cache the 24h/7d scoreboards on the slow background refresher so those windows are served from cache instead of running a multi-day scan on the request path. - Run the three independent request-path scans (main ROLLUP, recent races, live prices) concurrently via errgroup instead of sequentially. - Drop the dead since_ts cache-key check (never parsed or applied) and the redundant per-request EXISTS TABLE round-trip (Fetch already degrades).

…ver detail

armcconnell · 2026-07-01T17:54:15Z

Thanks for the thorough review — all findings addressed in 2792d91, with two small follow-ups after. Mapping each:

🔴 NaN percentile → broken response — Fixed. Both lead-time percentiles are now wrapped in ifNotFinite(…, 0), so a (competitor, node) cell where DoubleZero won zero races serializes as 0 instead of NaN. Added a regression test (TestHyperliquidScoreboard_ZeroDZWins_Encodable) that reproduces the exact json: unsupported value: NaN failure and now passes.

🟠 24h/7d full scans on the request path — Fixed. The 24h/7d windows are now precomputed on the slow background refresher (StartHyperliquidBackgroundRefresher) and served from the page cache under per-window keys (hyperliquidScoreboardWindowKey), so the request path no longer runs a multi-day scan per click. 1h stays on the page-cache worker; a cold-cache request falls back to the same bounded live path as before.

🟢 since_ts bypasses cache but never applied — Fixed. Dropped the dead since_ts branch from hyperliquidScoreboardCacheKey (it was never parsed).

🟢 Redundant EXISTS TABLE per request — Fixed. Removed the handler-level existence check; FetchHyperliquidScoreboardData already degrades to an empty-but-valid response, so it's a single round-trip now (still covered by TestGetHyperliquidScoreboard_MissingTable).

🟢 Three sequential scans — Fixed. The main ROLLUP scan, recent-races scan, and price lookup now run concurrently via errgroup (verified race-clean).

🟢 Sidebar highlight — No change; I think this is a false positive. Line 559 is the Shreds item, not an "Edge parent" — Edge is a section-header <span>, and Shreds/<the new venue> are siblings that each highlight on their own route. Keying the Shreds link off the other route would incorrectly highlight Shreds while viewing that page; the collapsed group icon intentionally highlights for either. Happy to adjust if I've misread the intended grouping.

Two follow-ups also in the branch since the review: added Dwellir as a competitor, and reordered the per-vantage table (TYO/CHI/NYC) with datacenter facility names + hover detail.

Verified: go build, go vet, gofmt, all scoreboard unit tests pass under -race; tsc clean.

armcconnell added the preview label Jun 29, 2026

armcconnell added 27 commits June 30, 2026 21:25

docs: hyperliquid bbo scoreboard design spec

21ccdc1

docs: hyperliquid bbo scoreboard implementation plan

f08a2e2

api: add feeds database config and FeedsDB wiring

ef2d81a

admin: proxy feeds.hyperliquid_bbo_feed_race_summary remote table

ce06054

api: hyperliquid scoreboard headline and per-competitor query

10d7acc

api: use quantileExact for hyperliquid lead percentiles and guard emp…

7f799f1

…ty window

api: hyperliquid scoreboard per-node breakdown

7562313

api: hyperliquid scoreboard recent races strip

3175ff7

api: hyperliquid scoreboard HTTP handler, caching, and route

2575cb3

api: register hyperliquid scoreboard page cache entries

eef967c

web: hyperliquid scoreboard api client and types

538c7b1

web: hyperliquid scoreboard page component

9534f6a

web: hyperliquid scoreboard route and sidebar nav

c81f2d0

scripts: seed local hyperliquid feed race summary

abfb8f0

scripts: fail fast on curl errors and validate seed args

46fd038

api: drop unread hyperliquid latest cache and gate fetch on table pre…

24d1335

…sence

hyperliquid: redesign scoreboard to match shreds, optimize queries, d…

94b0472

…efault to 1h

hyperliquid: add local feeds proxy script and document implementation…

8739900

… revisions

hyperliquid: gate scoreboard endpoint and nav behind internal-domain …

367def9

…auth

api: hyperliquid scoreboard composite latency, live prices, and per-s…

1313483

…ymbol recent races

admin: proxy feeds.hyperliquid_bbo_observations for composite latency

f2e225d

web: hyperliquid scoreboard composite latency, recent-races grid, and…

2a7b8d4

… live prices

docs: correct HypeRPC branding spelling

59261b9

web: show last-race timestamp on hyperliquid coin cards

1472274

Add a per-symbol 'updated Xs ago' line derived from the newest race event_ts, ticking with the existing 5s clock, so a stale or gap-filled symbol is visible at a glance.

armcconnell force-pushed the tiredsolid-scoreboard branch from c9de871 to 1472274 Compare July 1, 2026 02:26

armcconnell added 2 commits June 30, 2026 21:48

armcconnell force-pushed the tiredsolid-scoreboard branch from 47eae3f to 440e04c Compare July 1, 2026 13:22

api: gofmt hyperliquid_scoreboard (drop trailing newline)

2f38528

nikw9944 reviewed Jul 1, 2026

View reviewed changes

nikw9944 requested changes Jul 1, 2026

View reviewed changes

armcconnell added 3 commits July 1, 2026 11:53

api: add Dwellir as a hyperliquid scoreboard competitor

a51724e

web: order hyperliquid vantages TYO/CHI/NYC and show facility with ho…

2131925

…ver detail

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Tiredsolid scoreboard (internal)#669

Tiredsolid scoreboard (internal)#669
armcconnell wants to merge 33 commits into
mainfrom
tiredsolid-scoreboard

armcconnell commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

nikw9944 left a comment

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

nikw9944 Jul 1, 2026

Uh oh!

armcconnell commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

armcconnell commented Jun 29, 2026

Uh oh!

github-actions Bot commented Jun 29, 2026

Uh oh!

nikw9944 left a comment

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

nikw9944 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

armcconnell commented Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants