ci: split otap-dataflow tests into build-once + 3 partitions by jmacd · Pull Request #2615 · open-telemetry/otel-arrow

jmacd · 2026-04-08T22:23:25Z

The monolithic cargo nextest run --all-features --workspace step is timing out in the merge queue at around 55min on ARM, 35min on Windows.

This restructures the CI to:

build_required / build_extra: compile test binaries once per OS and upload a nextest archive artifact.
test_and_coverage_required / test_extra: download the archive and run tests with --partition count:N/3, splitting into 3 parallel jobs per OS.
Coverage: Linux build uses cargo llvm-cov show-env to instrument binaries at build time. Test partitions source the same env vars, run their slice of tests, then cargo llvm-cov report generates per-partition lcov files uploaded to codecov.
The non-required test_and_coverage job now only covers experimental folders. otap-dataflow on ARM and macOS moves to the new build_extra/test_extra jobs.
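
A minimal dry-run sketch of the build-once and partition flow listed above. It only prints the commands rather than running them. The archive file name and the PARTITIONS variable are illustrative assumptions, not values taken from this PR's workflow.

```shell
#!/bin/sh
# Dry-run sketch: print the commands a build job and its test partitions
# would run. ARCHIVE and PARTITIONS are hypothetical names for illustration.
ARCHIVE="nextest-archive.tar.zst"
PARTITIONS=3

build_cmd() {
  # Compile all test binaries once and pack them into a nextest archive.
  echo "cargo nextest archive --all-features --workspace --archive-file $ARCHIVE"
}

test_cmd() {
  # $1 is the 1-based partition index; no recompilation happens here.
  echo "cargo nextest run --archive-file $ARCHIVE --partition count:$1/$PARTITIONS"
}

build_cmd
i=1
while [ "$i" -le "$PARTITIONS" ]; do
  test_cmd "$i"
  i=$((i + 1))
done
```

Each printed `cargo nextest run` line would correspond to one parallel job per OS. If the archive is extracted under a different checkout path than the one it was built in, nextest's `--workspace-remap` option may also be needed.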

Expected improvement: each test partition runs ~1/3 of tests without paying the full compile cost.

Fixes #2534

codecov · 2026-04-08T22:26:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.05%. Comparing base (fdc0cfc) to head (106b63c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2615      +/-   ##
==========================================
- Coverage   88.06%   88.05%   -0.02%     
==========================================
  Files         630      630              
  Lines      232988   232986       -2     
==========================================
- Hits       205181   205147      -34     
- Misses      27283    27315      +32     
  Partials      524      524

Components	Coverage Δ
otap-dataflow	`89.75% <ø> (-0.02%)`	⬇️
query_abstraction	`80.61% <ø> (ø)`
query_engine	`90.74% <ø> (ø)`
otel-arrow-go	`52.45% <ø> (ø)`
quiver	`92.27% <ø> (ø)`

The monolithic 'cargo nextest run --all-features --workspace' step was timing out in the merge queue (56min on ARM, 32min on Windows, 13min on Linux).

This restructures the CI to:

1. build_required / build_extra: compile test binaries once per OS and upload a nextest archive artifact.
2. test_and_coverage_required / test_extra: download the archive and run tests with '--partition count:N/3', splitting into 3 parallel jobs per OS. No recompilation needed.
3. Coverage: Linux build uses 'cargo llvm-cov show-env' to instrument binaries at build time. Test partitions source the same env vars, run their slice of tests, then 'cargo llvm-cov report' generates per-partition lcov files uploaded to codecov with partition flags.
4. The non-required test_and_coverage job now only covers experimental folders (small, no partitioning needed). otap-dataflow on ARM and macOS moves to the new build_extra/test_extra jobs.

Expected improvement: each test partition runs ~1/3 of tests without paying the full compile cost, bringing wall-clock time well under the timeout threshold.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
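
The coverage step described in this commit message could look roughly like the dry-run sketch below, which prints commands instead of running them. The archive name and the lcov file naming scheme are assumptions for illustration, and the actual workflow may load the `show-env` output differently.

```shell
#!/bin/sh
# Dry-run sketch of the coverage flow: prints commands, does not run them.
# ARCHIVE and the lcov-partition-N.info naming are hypothetical.
ARCHIVE="nextest-archive.tar.zst"

coverage_build_cmds() {
  # Export the instrumentation environment, then build the archive under it.
  echo 'eval "$(cargo llvm-cov show-env --export-prefix)"'
  echo "cargo nextest archive --all-features --workspace --archive-file $ARCHIVE"
}

coverage_test_cmds() {
  # $1 = 1-based partition index, $2 = partition count
  # Load the same environment so profile data lands where the report expects it.
  echo 'eval "$(cargo llvm-cov show-env --export-prefix)"'
  echo "cargo nextest run --archive-file $ARCHIVE --partition count:$1/$2"
  echo "cargo llvm-cov report --lcov --output-path lcov-partition-$1.info"
}

coverage_build_cmds
coverage_test_cmds 1 3
```

Each partition would then upload its own lcov file. The upload step is left out here because the commit message does not show how the partition flags are passed to codecov.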

The nextest archive + Go test collector fill the disk on ubuntu-latest runners. Add the same disk cleanup step used by the build jobs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ARM runners are slow: compiling --all-features (3 crypto backends + TLS) takes 50+ minutes. The full feature matrix is already tested on x86 Linux and Windows. ARM/macOS only need to catch architecture-specific issues, which default features cover. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Standardize all disk cleanup steps to the same compact 3-line form. Previously there were 3 variants: verbose (14 lines with df -h), compact (3 lines), and missing entirely. Now every job that needs disk space gets the same pattern. Also adds cleanup to test_extra which was missing it. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…d/split_tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When a test fails, nextest cancels all remaining tests by default. With --max-fail=10 via a ci profile, we continue running and report up to 10 failures before stopping, giving better signal on flaky vs. systematic failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
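
As a sketch of what such a ci profile might contain, the script below writes a minimal nextest config into a scratch directory. The scratch path is only for illustration (in a workspace the file sits at .config/nextest.toml), and the `fail-fast = { max-fail = 10 }` form is an assumption about the nextest version in use; the PR's actual file is not shown here.

```shell
#!/bin/sh
# Write a minimal nextest config with a "ci" profile into a scratch directory.
# WORKDIR is a throwaway location used only for this sketch.
set -eu
WORKDIR="$(mktemp -d)"
mkdir -p "$WORKDIR/.config"

cat > "$WORKDIR/.config/nextest.toml" <<'EOF'
[profile.ci]
# Keep running after a failure, but stop once 10 tests have failed.
fail-fast = { max-fail = 10 }
EOF

# Show the generated file.
cat "$WORKDIR/.config/nextest.toml"
```

A run would then select the profile with `cargo nextest run --profile ci`.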

The experimental workspaces don't have .config/nextest.toml with the ci profile, so --profile ci fails there. Remove it from the test_and_coverage job which only runs experimental folders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- test_and_coverage -> experimental_tests
- build_extra -> build_nonrequired
- test_extra -> test_nonrequired

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This job no longer does coverage -- that moved to a separate job. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…d/split_tests

When running from an archive, nextest may not find the repository config at .config/nextest.toml. Pass it explicitly so the ci profile (with max-fail=10) is actually used. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
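
A dry-run sketch of a partition command that names the config file explicitly, as this commit describes. The helper name and archive path are hypothetical; `--config-file` and `--profile` are existing nextest options.

```shell
#!/bin/sh
# Dry-run sketch: build the `cargo nextest run` command for one partition,
# pointing nextest at the repository config instead of relying on discovery.
partition_cmd() {
  # $1 = archive path, $2 = 1-based partition index, $3 = partition count
  echo "cargo nextest run --archive-file $1 --config-file .config/nextest.toml --profile ci --partition count:$2/$3"
}

partition_cmd nextest-archive.tar.zst 1 3
```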

lquerel · 2026-04-12T15:08:57Z

A few weeks ago, in the context of the new cargo xtask check, I compared the speed of nextest with regular cargo test, and for an unknown reason cargo test was significantly faster. We should go back to cargo test or find a way to make nextest faster (I tried for several hours without success).

github-project-automation bot added this to OTel-Arrow Apr 8, 2026

github-actions bot added the ci-repo Repository maintenance, build, GH workflows, repo cleanup, or other chores label Apr 8, 2026

jmacd force-pushed the jmacd/split_tests branch 2 times, most recently from d381634 to 9ab666a Compare April 8, 2026 23:20

jmacd force-pushed the jmacd/split_tests branch from eb712dd to 6e55019 Compare April 9, 2026 15:40

jmacd mentioned this pull request Apr 9, 2026

Build cache needed #2619

Open

1 task

jmacd and others added 3 commits April 9, 2026 09:27

ci: add disk cleanup to test partition jobs

6fd1b3a

jmacd force-pushed the jmacd/split_tests branch from 5ecf9c6 to fddf97f Compare April 10, 2026 20:36

jmacd and others added 2 commits April 10, 2026 13:36

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmac…

7339dc5

…d/split_tests

ci: fix non-ASCII em-dash in comment

0c3b729

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jmacd marked this pull request as ready for review April 10, 2026 21:22

jmacd requested a review from a team as a code owner April 10, 2026 21:22

jmacd marked this pull request as draft April 10, 2026 21:25

github-actions bot added the rust Pull requests that update Rust code label Apr 10, 2026

jmacd and others added 5 commits April 10, 2026 14:41

ci: rename jobs for clarity

1f3d3f3

ci: rename test_and_coverage_required to test_required

b78b2c5

Merge branch 'main' of github.com:open-telemetry/otel-arrow into jmac…

6d95732

…d/split_tests

jmacd marked this pull request as ready for review April 11, 2026 01:07

jmacd wants to merge 12 commits into open-telemetry:main from jmacd:jmacd/split_tests

2 participants
