Use robust centrality dispersion measures by oleksandr-pavlyk · Pull Request #348 · NVIDIA/nvbench

oleksandr-pavlyk · 2026-05-04T21:37:23Z

This PR closes #342

Introduces std::array<ValueType, N> nvbench::detail::statistics::compute_percentiles(Iter begin, Iter end, std::array<int, N> percentiles) which finds requested percentile levels using "zero-based nearest endpoint-scaled rank".
Report the 25-th, 50-th, and 75-th percentiles (first quartile $Q_1$, median $Q_2$, third quartile $Q_3$) as well as the absolute interquartile range $Q_3 - Q_1$ and the relative interquartile range $(Q_3 - Q_1) / Q_2$ for GPU and CPU times in measure_cold.
Make "nv/cold/time/gpu/mean" and "nv/cold/time/gpu/stdev/relative" hidden, and replace them with "nv/cold/time/gpu/median" and "nv/cold/time/gpu/ir/relative", respectively.
Make a similar change for "nv/cold/time/cpu*".
Change stdrel_criterion to track "ir/relative" noise, i.e. $(Q_3 - Q_1) / Q_2$. This noise value is much less susceptible to outliers.
Modify nvbench_compare to reflect the changes above:
a. It now queries the median as the time value, and the relative interquartile range as the noise.
b. The comparison logic now takes the reference noise as max(ref_noise, cmp_noise) instead of min(ref_noise, cmp_noise). The relative difference between durations should be permitted to be wider if one measurement is significantly noisier than the other.
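The quartile-based summaries listed above can be illustrated with a short Python sketch. The PR implements this in C++ as compute_percentiles; the rank rule used here (scale the level onto the index range 0..n-1 of the sorted samples and take the nearest index) is only one reading of "zero-based nearest endpoint-scaled rank" and is an assumption, as are the helper names `nearest_rank_percentiles` and `robust_summary`.

```python
import math


def nearest_rank_percentiles(samples, percentiles):
    """Return one value per requested integer percentile level (0..100).

    Assumed rank rule: the sorted samples are indexed 0..n-1, the level p is
    scaled onto that range, and the nearest index is taken (ties round up).
    """
    data = sorted(samples)
    n = len(data)
    if n == 0:
        # An empty dataset has no percentiles; report NaN.
        return [math.nan for _ in percentiles]
    # Integer arithmetic: (p * (n - 1) + 50) // 100 rounds half up.
    return [data[(p * (n - 1) + 50) // 100] for p in percentiles]


def robust_summary(samples):
    """Median plus absolute and relative interquartile range."""
    q1, q2, q3 = nearest_rank_percentiles(samples, [25, 50, 75])
    ir = q3 - q1
    return {"q1": q1, "median": q2, "q3": q3, "ir": ir, "ir_relative": ir / q2}


# One large outlier leaves the quartiles untouched.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
noisy = clean[:-1] + [1000.0]
print(robust_summary(clean))
print(robust_summary(noisy))
```

Both calls give the same quartiles, whereas the mean of `noisy` is dominated by the single outlier. That is the property the PR relies on when it replaces mean and relative standard deviation with median and relative interquartile range.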

As a consequence of the change to stdrel_criterion, the number of samples collected by default has become significantly smaller, especially when a sufficient number of warm-up iterations is used.
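A hedged sketch of why fewer samples are needed: a convergence check based on $(Q_3 - Q_1) / Q_2$ is not thrown off by an occasional spike, while one based on relative standard deviation is. The function names, the `max_noise` threshold, and the simplified stopping rule are assumptions for illustration; the actual stdrel_criterion applies further conditions not shown here. The minimum of 5 samples follows the commit "Require at least 5 samples to begin estimating noise level".

```python
import math
import statistics


def relative_ir(samples):
    """Relative interquartile range (Q3 - Q1) / Q2 using a nearest-rank rule."""
    data = sorted(samples)
    n = len(data)
    if n == 0:
        return math.nan
    q1, q2, q3 = (data[(p * (n - 1) + 50) // 100] for p in (25, 50, 75))
    return (q3 - q1) / q2


def noise_converged(samples, max_noise=0.005, min_samples=5):
    """Hypothetical stopping check: enough samples and noise below threshold."""
    if len(samples) < min_samples:
        # Too few samples to estimate quartiles meaningfully.
        return False
    return relative_ir(samples) <= max_noise


# Times in seconds; the 5 ms entry imitates a one-off outlier.
times = [1.776e-3, 1.774e-3, 1.773e-3, 1.774e-3,
         5.0e-3, 1.773e-3, 1.775e-3, 1.774e-3]

print(noise_converged(times[:4]))  # fewer than 5 samples
print(noise_converged(times))      # relative IR check on all samples
print(statistics.stdev(times) / statistics.mean(times))  # relative stdev
```

With these samples the relative interquartile range is already below the threshold, while the relative standard deviation is far above it, so a stdev-based check would keep sampling.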

CLI option --warmup-runs is implemented and documented. The warm-up count is enforced to always be positive. This is necessary to ensure that JIT-ting has occurred, and that use of the blocking kernel would not result in time-outs. A test for the option parser is added.

Because warm-up runs are executed without use of the blocking kernel, the blocking kernel was not jitted until actual measurements were collected. The module loading cost incurred during the first run shows up as an elevated CPU time noise value for the first measurement, as noted in NVIDIA#339. This PR adds `this->block_stream(); this->unblock_stream();` prior to executing the warm-up loop, which runs with the blocking kernel disabled. This ensures that the blocking kernel is instantiated during warm-up, but no other kernel is launched between its launch and the stream sync, thus avoiding deadlock.

--stopping-criterion sample-count --target-samples 100 would stop once max(--min-samples, --target-samples) samples are collected
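The rule stated above amounts to a one-line check. The sketch below is illustrative only: the function name is hypothetical, and the parameter names simply mirror the --min-samples and --target-samples options.

```python
def sample_count_reached(num_samples, min_samples, target_samples):
    """True once max(min_samples, target_samples) samples have been collected."""
    return num_samples >= max(min_samples, target_samples)


# With --target-samples 100 and a smaller minimum, collection stops at 100.
print(sample_count_reached(99, 10, 100))
print(sample_count_reached(100, 10, 100))
# A larger minimum takes precedence over the target.
print(sample_count_reached(100, 200, 100))
```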

Percentiles on empty dataset are NaN, not infinity
Add Robust statistics of CPU times to summary
Fixed name for nv/cold/time/gpu/q3, corrected value reported for nv/cold/time/gpu/ir/relative
Use median and IR to compute location and noise in measure_cold
Also in stdrel_criterion, compute noise as IR / median.

1. For JSON files that contain repeated measurements of the same run-time axis values, make sure that the script compares corresponding reference entries.

If cmp had two states with the same name and ref had two, we would compare measurements for each state in cmp against the first state in ref. The change here introduces counters tracking how many times each particular axis value has been seen, and uses them to retrieve the corresponding entry in ref.

Previously, I had

```
| BlockSize   | NumBlocks   | Ref Time   | Ref Noise   | Cmp Time   | Cmp Noise   | Diff      | %Diff   | Status   |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.777 ms   | 0.40%       | 1.024 us  | 0.06%   | SAME     |
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.774 ms   | 0.52%       | -2.048 us | -0.12%  | SAME     |
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.773 ms   | 0.52%       | -3.072 us | -0.17%  | SAME     |
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.774 ms   | 0.58%       | -2.048 us | -0.12%  | SAME     |
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.773 ms   | 0.58%       | -3.072 us | -0.17%  | SAME     |
```

and now it becomes

```
| BlockSize   | NumBlocks   | Ref Time   | Ref Noise   | Cmp Time   | Cmp Noise   | Diff      | %Diff   | Status   |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
| 2^8         | 64          | 1.776 ms   | 0.46%       | 1.777 ms   | 0.40%       | 1.024 us  | 0.06%   | SAME     |
| 2^8         | 64          | 1.773 ms   | 0.64%       | 1.774 ms   | 0.52%       | 1.024 us  | 0.06%   | SAME     |
| 2^8         | 64          | 1.774 ms   | 0.46%       | 1.773 ms   | 0.52%       | -1.024 us | -0.06%  | SAME     |
| 2^8         | 64          | 1.773 ms   | 0.46%       | 1.774 ms   | 0.58%       | 1.024 us  | 0.06%   | SAME     |
| 2^8         | 64          | 1.774 ms   | 0.52%       | 1.773 ms   | 0.58%       | -1.024 us | -0.06%  | SAME     |
```

which is expected given the following raw data:

```
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' base.json
"0.0017756160497665405"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' test.json
"0.0017766400575637818"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
```

2. nvbench_compare changes from using min_noise = min(ref_noise, cmp_noise) to using max_noise = max(ref_noise, cmp_noise).

Using the larger of the ref and cmp noise levels as the reference against which to gauge the timing difference ratio makes more sense.
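The two comparison changes described above can be sketched together as follows. This is not the nvbench_compare source: the state dictionaries are a simplified stand-in for the real JSON schema, the helper names `pair_states` and `classify` are hypothetical, and the FAST/SLOW labels are assumptions (only SAME appears in the tables above). The times and noise values are taken from the first two rows of the updated table.

```python
from collections import defaultdict


def pair_states(ref_states, cmp_states):
    """Pair the k-th occurrence of a state name in cmp with the k-th in ref."""
    ref_by_name = defaultdict(list)
    for state in ref_states:
        ref_by_name[state["name"]].append(state)

    seen = defaultdict(int)  # how many times each name has appeared in cmp
    pairs = []
    for state in cmp_states:
        name = state["name"]
        k = seen[name]
        seen[name] += 1
        candidates = ref_by_name.get(name, [])
        if k < len(candidates):
            pairs.append((candidates[k], state))
    return pairs


def classify(ref_time, ref_noise, cmp_time, cmp_noise):
    """Gauge the relative time difference against the larger noise level."""
    frac_diff = (cmp_time - ref_time) / ref_time
    max_noise = max(ref_noise, cmp_noise)
    if abs(frac_diff) <= max_noise:
        return "SAME"
    return "FAST" if frac_diff < 0 else "SLOW"


name = "BlockSize=2^8 NumBlocks=64"  # hypothetical state name
ref_states = [
    {"name": name, "time": 1.776e-3, "noise": 0.0046},
    {"name": name, "time": 1.773e-3, "noise": 0.0064},
]
cmp_states = [
    {"name": name, "time": 1.777e-3, "noise": 0.0040},
    {"name": name, "time": 1.774e-3, "noise": 0.0052},
]

for r, c in pair_states(ref_states, cmp_states):
    print(r["time"], c["time"],
          classify(r["time"], r["noise"], c["time"], c["noise"]))
```

The second cmp state is paired with the second ref state rather than the first, and a difference is flagged only when it exceeds the noisier of the two measurements.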


oleksandr-pavlyk added 7 commits May 4, 2026 08:52

Implement sample-count stopping criterion with parameter target-samples

e9daaba

--stopping-criterion sample-count --target-samples 100 would stop once max(--min-samples, --target-samples) samples are collected

Use median and IR/relative as cmp_time/ref_time and cmp_noise/ref_noise

e53a1a2

These measures are less sensitive to outliers

Require at least 5 samples to begin estimating noise level

8d1b316

oleksandr-pavlyk self-assigned this May 5, 2026

oleksandr-pavlyk added this to CCCL May 5, 2026

github-project-automation Bot moved this to Todo in CCCL May 5, 2026

oleksandr-pavlyk added the release: breaking change Include in "Breaking Changes" section of release notes. label May 5, 2026

oleksandr-pavlyk wants to merge 7 commits into NVIDIA:main from oleksandr-pavlyk:use-robust-centrality-dispersion-measures
