Optimize 1D direct NFFT by carrying the phase by recurrence. #212

Draft
jenskeiner wants to merge 2 commits into develop from feature/nfft-trafo-direct-recurrence

Conversation

@jenskeiner commented Jun 16, 2026

Contributor

This PR optimizes the 1D branch of the direct NFFT by carrying the phase by recurrence, which should improve performance.

The change was produced by a coding agent running inside a harness that guides the optimization process and ensures correctness. I'll leave it here for a while before merging.
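
For readers unfamiliar with the idea, here is a minimal sketch of what carrying the phase by recurrence can look like for a single evaluation node. It is not the code from this PR. It assumes the convention f(x) = Σ_{k=-N/2}^{N/2-1} f̂_k e^{-2πikx}, and the function names `ndft_direct_naive` and `ndft_direct_recurrence` are hypothetical. The naive version evaluates one complex exponential per frequency, while the recurrence version evaluates two exponentials per node and then advances the phase with one complex multiplication per frequency.

```python
import cmath
import math


def ndft_direct_naive(f_hat, x):
    """Sum f_hat[idx] * exp(-2*pi*i*k*x) with k = idx - N//2, one exp per term."""
    n = len(f_hat)
    total = 0j
    for idx in range(n):
        k = idx - n // 2
        total += f_hat[idx] * cmath.exp(-2j * math.pi * k * x)
    return total


def ndft_direct_recurrence(f_hat, x):
    """Same sum, but the phase is carried forward by repeated multiplication."""
    n = len(f_hat)
    # Phase increment when k increases by one (assumed convention, see lead-in).
    step = cmath.exp(-2j * math.pi * x)
    # Starting phase at the lowest frequency k = -(N//2).
    phase = cmath.exp(-2j * math.pi * (-(n // 2)) * x)
    total = 0j
    for coeff in f_hat:
        total += coeff * phase
        phase *= step  # advance to the phase of the next frequency
    return total


if __name__ == "__main__":
    coeffs = [complex(math.sin(i), math.cos(2 * i)) for i in range(64)]
    node = 0.3141
    diff = abs(ndft_direct_naive(coeffs, node) - ndft_direct_recurrence(coeffs, node))
    print(f"absolute difference between the two variants: {diff:.3e}")
```

Repeated multiplication accumulates rounding error as the number of frequencies grows, so an implementation may need to re-seed the phase from a fresh exponential at intervals. Whether and how this PR does that is not stated here.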

Unfortunately, the benchmarks are noisy because they run on different GitHub runners, so the results are not really comparable with develop; see https://codspeed.io/blog/why-glibc-faster-github-actions.

To fix this, we should switch to CodSpeed's Macro Runners. The drawback is that we could easily hit the quota of 500 minutes per month on the free tier. According to their website, however, they may be willing to increase the quota for open-source projects.

It would be best to merge this PR only after we have removed the noise from our CI.

codspeed-hq Bot commented Jun 16, 2026


Merging this PR will improve performance by ×4.4

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 30 improved benchmarks
✅ 102 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d[512/1600] | 41.3 ms | 4 ms | ×10 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d_omp[512/1600] | 3,003.1 µs | 291.1 µs | ×10 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d[256/800] | 10.4 ms | 1 ms | ×10 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d[128/400] | 2,633.1 µs | 281.7 µs | ×9.3 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d_omp[256/800] | 769.8 µs | 84.6 µs | ×9.1 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d[64/200] | 652 µs | 80.9 µs | ×8.1 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d_omp[128/400] | 206.7 µs | 28.8 µs | ×7.2 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d[32/100] | 159.8 µs | 24.9 µs | ×6.4 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d_omp[512/1600] | 1,567.5 µs | 285.2 µs | ×5.5 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d[512/1600] | 21.2 ms | 3.9 ms | ×5.4 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d[256/800] | 4,875.7 µs | 993.5 µs | ×4.9 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d_omp[256/800] | 373.7 µs | 80.9 µs | ×4.6 |
| WallTime | codspeed-macro_gcc_kaiserbessel_double/nfft_forward_direct_1d_omp[64/200] | 57.7 µs | 13.5 µs | ×4.3 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d[128/400] | 1,100.1 µs | 259 µs | ×4.2 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d[64/200] | 262.5 µs | 68.6 µs | ×3.8 |
| WallTime | codspeed-macro_gcc_kaiserbessel_long-double/nfft_forward_direct_1d[512/1600] | 2,019.3 ms | 574.2 ms | ×3.5 |
| WallTime | codspeed-macro_gcc_kaiserbessel_long-double/nfft_forward_direct_1d_omp[512/1600] | 144.8 ms | 41.4 ms | ×3.5 |
| WallTime | codspeed-macro_gcc_kaiserbessel_float/nfft_forward_direct_1d_omp[128/400] | 93.1 µs | 26.9 µs | ×3.5 |
| WallTime | codspeed-macro_gcc_kaiserbessel_long-double/nfft_forward_direct_1d[256/800] | 503.1 ms | 145.8 ms | ×3.5 |
| WallTime | codspeed-macro_gcc_kaiserbessel_long-double/nfft_forward_direct_1d_omp[256/800] | 36.5 ms | 10.6 ms | ×3.4 |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing feature/nfft-trafo-direct-recurrence (56f7684) with develop (783cb56)

@jenskeiner changed the title from "perf(nfft): carry the direct-NFFT phase by recurrence in the 1d branch." to "Optimize 1D direct NFFT by carrying the phase by recurrence." Jun 16, 2026
@jenskeiner force-pushed the feature/nfft-trafo-direct-recurrence branch from 82ce491 to 93b2bd3 June 16, 2026 09:31