Skip to content

DLPX-96312 Drop name=total from estat_nfs and estat_iscsi#122

Closed
dbshah12 wants to merge 13 commits into
developfrom
DLPX-96312-drop-estat-total
Closed

DLPX-96312 Drop name=total from estat_nfs and estat_iscsi#122
dbshah12 wants to merge 13 commits into
developfrom
DLPX-96312-drop-estat-total

Conversation

@dbshah12

Copy link
Copy Markdown

Summary

  • Extends the existing name=total drop filter (previously estat_backend-io only) to also cover estat_nfs and estat_iscsi
  • The total row is read+write summed at collection time and can be derived in Grafana, making it redundant storage (~33% reduction per measurement)
  • hist_estat_* clones are unaffected — name=total rows carry no microseconds field and are already discarded by the histogram Starlark processor

Test plan

  • Deployed to dhruv3.dlpxdc.co and verified via InfluxDB query
  • estat_nfs contains only v4/read and v4/write name tags — no total
  • estat_iscsi contains only read/write name tags — no total
  • hist_estat_nfs and hist_estat_iscsi histogram data unaffected
  • All other measurements (cpu, disk, diskio, net, zfs, tcp_stats, hist_estat_backend-io) verified flowing normally

🤖 Generated with Claude Code

dbshah12 and others added 13 commits June 15, 2026 10:43
…range

metaslab_alloc_dva() is no longer in the normal write path on ZFS 2.4.99+
(Delphix 2026.3); it now only appears in vdev_removal.c. The write path is:

  metaslab_alloc() -> metaslab_alloc_range() -> metaslab_alloc_dva_range()
    -> metaslab_group_alloc()

Replace the outer kprobe/kretprobe pair from metaslab_alloc_dva to
metaslab_alloc_dva_range, which occupies the same structural role.
metaslab_alloc_dva_range() receives spa_t * as its first argument, making
pool filtering a direct equal_to_pool(spa->spa_name) call with no struct
chain traversal needed.

This also removes the dual-path dva_owned complexity introduced to work
around the missing outer probe — with metaslab_alloc_dva_range as the
anchor, metaslab_group_alloc_entry always finds an existing data_map entry
and only needs to fill in vdev name and asize.

On older ZFS without metaslab_alloc_dva_range, estat(8) prints a WARNING
and skips those probes gracefully.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…range

metaslab_alloc_dva() is no longer in the normal write path on ZFS 2.4.99+
(Delphix 2026.3); it now only appears in vdev_removal.c. The write path is:

  metaslab_alloc() -> metaslab_alloc_range() -> metaslab_alloc_dva_range()
    -> metaslab_group_alloc()

Replace the outer kprobe/kretprobe pair from metaslab_alloc_dva to
metaslab_alloc_dva_range, which occupies the same structural role.
metaslab_alloc_dva_range() receives spa_t * as its first argument, making
pool filtering a direct equal_to_pool(spa->spa_name) call with no struct
chain traversal needed.

This also removes the dual-path dva_owned complexity introduced to work
around the missing outer probe — with metaslab_alloc_dva_range as the
anchor, metaslab_group_alloc_entry always finds an existing data_map entry
and only needs to fill in vdev name and asize.

On older ZFS without metaslab_alloc_dva_range, estat(8) prints a WARNING
and skips those probes gracefully.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The Starlark processor was returning [metric] (pass-through) for
hist_estat_* rows whose microseconds field is absent — specifically
name=total summary rows, which carry only iops and throughput.
Those clones are pure duplicates of the corresponding estat_* row
and should not exist in hist_estat_*.

Fix: return [] to drop the metric when microseconds is None.
Only rows with actual histogram data produce le=<bucket> output.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
When microseconds is present but parses to no valid bucket pairs
(e.g. "{ }" from a zero-activity interval), the Starlark function
was falling through to `return result if result else [metric]` and
passing the metric through unchanged — a duplicate of the estat_*
summary row for that series.

Fix: return [] instead of [metric] when parsing yields no results.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
When microseconds is present but parses to no valid bucket pairs
(e.g. "{ }" from a zero-activity interval), the Starlark function
was falling through to `return result if result else [metric]` and
passing the metric through unchanged — a duplicate of the estat_*
summary row for that series.

Fix: return [] instead of [metric] when parsing yields no results.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
mawk 1.3.4 does not flush its stdout buffer via fflush() when writing to a
Telegraf execd pipe, causing tcp_stats data to never reach InfluxDB on engines
where mawk is the default awk. Wrapping connstat in a while loop with -c 2
forces awk to exit naturally after each 10-second interval, triggering the C
runtime exit flush (fclose) that reliably delivers data to Telegraf. The END
block captures the partial second interval on each awk exit so no data is lost.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Extend the existing name=total starlark drop filter (previously
estat_backend-io only) to also cover estat_nfs and estat_iscsi.
The total row is read+write summed at collection time and can be
derived in Grafana, so storing it wastes ~33% of each measurement's
space. The hist_estat_* clones are unaffected since name=total rows
carry no microseconds field and are already dropped by the histogram
processor.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@dbshah12 dbshah12 changed the base branch from engine-perf to develop June 19, 2026 17:43
@dbshah12 dbshah12 closed this Jun 19, 2026
@dbshah12 dbshah12 reopened this Jun 19, 2026
@dbshah12 dbshah12 closed this Jun 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant