Add early-exit option for RDKit benchmarks that operates on a specific time budget by scal444 · Pull Request #180 · NVIDIA-BioNeMo/nvMolKit

scal444 · 2026-05-26T19:46:07Z

Calculates throughput after running for up to X amount of time.

Refactor of substruct rdkit code to reduce churn, but it's still unfortunately verbose due to indentation issues.

- Add --rdkit_max_seconds to etkdg/ff_optimize/substruct benches. When set, the RDKit comparison stops once wall-clock elapsed exceeds the cap; the reported timing is over molecules/pairs actually processed. - Replace 'speedup = baseline_ms / avg_ms' with throughput-based ratios (mols * confs / elapsed) so methods that processed different amounts of work are comparable. - CSV gains mols_processed/pairs_processed, rdkit_max_seconds, confs_per_second/pairs_per_second, vs_rdkit_throughput_ratio columns. - substruct: --rdkit_match_mode and --rdkit_threads become nargs='+'; main runs the cartesian product and picks the best-throughput variant as the validation/throughput baseline.

greptile-apps · 2026-05-26T19:54:21Z

Greptile Summary

This PR adds a --rdkit_max_seconds early-exit option to all three RDKit benchmarks (ETKDG, FF-optimize, substruct), backed by a new Deadline class and time_it_bounded helper in bench_utils/timing.py. It also refactors substruct_bench to support sweeping multiple --rdkit_match_mode and --rdkit_threads values in a single run, and adds unit tests for the new timing primitives.

time_it_bounded drives early exit in substruct_bench; etkdg_bench and ff_optimize_bench instead embed a per-run Deadline inside their run() closure and still call the existing time_it, so total benchmark time with max_seconds set is runs × max_seconds rather than approximately max_seconds.
CSV output now includes mols_processed, rdkit_max_seconds, throughput columns, and a vs_rdkit_throughput_ratio; rdkit_max_seconds is correctly gated to rdkit-only rows in all three scripts.

Confidence Score: 4/5

Safe to merge with one bug fix: time_it_bounded can return an inconsistent (avg_ms, last_progress) pair when complete runs are followed by a partial run.

time_it_bounded returns last_progress from a partial run paired with avg_ms computed from the prior complete runs. Any throughput calculation that divides last_progress by avg_ms will underestimate actual throughput, and the RDKit hit max_seconds budget diagnostic will fire spuriously even though complete runs occurred. The fix is a one-line change on the return statement.

benchmarks/bench_utils/timing.py — the time_it_bounded mixed complete+partial return path

Important Files Changed

Filename	Overview
benchmarks/bench_utils/timing.py	Adds Deadline, throughput_per_s, time_it_bounded, and add_rdkit_max_seconds_arg. time_it_bounded has a bug in the mixed complete+partial run path: it returns last_progress from the partial run paired with avg_ms from complete runs, causing underestimated throughput at the call site.
benchmarks/etkdg_bench.py	Adds per-run Deadline inside bench_rdkit and exposes --rdkit_max_seconds; uses time_it (not time_it_bounded), so all runs execute even when max_seconds is set, making total time runs×max_seconds rather than ~max_seconds.
benchmarks/ff_optimize_bench.py	Mirrors etkdg_bench changes: adds per-run Deadline, rdkit_max_seconds gating, and throughput columns. Same time_it-vs-time_it_bounded inconsistency as etkdg_bench.
benchmarks/substruct_bench.py	Significant refactor: supports multi-value --rdkit_match_mode/--rdkit_threads sweeps, extracts _validate_matches, uses time_it_bounded for early exit, and correctly gates rdkit_max_seconds to rdkit-only CSV rows.
benchmarks/tests/test_timing.py	New unit tests for Deadline, throughput_per_s, and time_it_bounded. Test comment for the budget-exhaustion test now correctly describes the run-2-completes / deadline-before-run-3 behaviour.
benchmarks/tests/conftest.py	New conftest that inserts the benchmarks/ directory into sys.path so bench_utils is importable from the tests/ subdirectory.
benchmarks/bench_utils/init.py	Re-exports the five new timing symbols (Deadline, add_rdkit_max_seconds_arg, throughput_per_s, time_it_bounded) and adds them to all.

_{Reviews (2): Last reviewed commit: "Address greptile comments" | Re-trigger Greptile}

evasnow1992

Changes look good to me. Thanks!

scal444 added 3 commits May 26, 2026 15:20

Consolidate testing

b4f07c8

Refactor substruct to reduce churn

5553ea1

greptile-apps Bot reviewed May 26, 2026

View reviewed changes

Comment thread benchmarks/substruct_bench.py

Comment thread benchmarks/tests/test_timing.py Outdated

Address greptile comments

5d243aa

scal444 requested a review from evasnow1992 May 27, 2026 12:59

evasnow1992 reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/substruct_bench.py Outdated

evasnow1992 reviewed May 27, 2026

View reviewed changes

Comment thread benchmarks/substruct_bench.py Outdated

Address review feedback

dd75bfe

evasnow1992 approved these changes May 29, 2026

View reviewed changes

scal444 added 2 commits May 29, 2026 12:52

Merge branch 'main' into split/throughput

4cde457

Merge branch 'main' into split/throughput

91e741b

scal444 merged commit b54d2dd into NVIDIA-BioNeMo:main May 29, 2026
7 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add early-exit option for RDKit benchmarks that operates on a specific time budget#180

Add early-exit option for RDKit benchmarks that operates on a specific time budget#180
scal444 merged 7 commits into
NVIDIA-BioNeMo:mainfrom
scal444:split/throughput

scal444 commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026 •

edited

Loading

Greptile Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

evasnow1992 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

scal444 commented May 26, 2026

Uh oh!

greptile-apps Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

evasnow1992 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 26, 2026 •

edited

Loading