Add freq04.cpp: SIMD + hash-table optimized version of freq03.cpp #29

Open
metacircu1ar wants to merge 1 commit into shodanium:master from metacircu1ar:optimizations

@metacircu1ar

Optimizations:

  • SIMD tokenization (NEON on ARM, SSE2 on x86)
  • Split short/long hash-table acquire paths
  • Inline short-key storage (words up to 8 chars in entry::shortkey)
  • Direct-address counter table for words of length 1..5
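
As an illustration of the SIMD tokenization bullet, here is a minimal sketch, not the code from freq04.cpp. It assumes words are maximal runs of ASCII letters with case folded, which the PR does not state. The names `letter_mask16`, `letter_mask16_scalar`, `count_words`, and the `SKETCH_*` macros are hypothetical. Each 16-byte block is classified into a bitmask, and word starts are the letter bits whose preceding byte is not a letter.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_SSE2 1
#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_NEON 1
#endif

// Scalar reference: bit i is set when p[i] is an ASCII letter.
inline std::uint32_t letter_mask16_scalar(const unsigned char* p) {
    std::uint32_t m = 0;
    for (int i = 0; i < 16; ++i) {
        const unsigned char c = static_cast<unsigned char>(p[i] | 0x20); // fold A-Z onto a-z
        if (c >= 'a' && c <= 'z') m |= 1u << i;
    }
    return m;
}

// Same mask, computed 16 bytes at a time where SSE2 or NEON is available.
inline std::uint32_t letter_mask16(const unsigned char* p) {
#if defined(SKETCH_SSE2)
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    const __m128i folded = _mm_or_si128(v, _mm_set1_epi8(0x20));
    // Signed compares are safe here: bytes >= 0x80 are negative and fail ge_a.
    const __m128i ge_a = _mm_cmpgt_epi8(folded, _mm_set1_epi8('a' - 1));
    const __m128i le_z = _mm_cmplt_epi8(folded, _mm_set1_epi8('z' + 1));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(_mm_and_si128(ge_a, le_z)));
#elif defined(SKETCH_NEON)
    static const std::uint8_t kBits[16] = {1, 2, 4, 8, 16, 32, 64, 128,
                                           1, 2, 4, 8, 16, 32, 64, 128};
    const uint8x16_t v = vld1q_u8(p);
    const uint8x16_t folded = vorrq_u8(v, vdupq_n_u8(0x20));
    const uint8x16_t is_letter = vandq_u8(vcgeq_u8(folded, vdupq_n_u8('a')),
                                          vcleq_u8(folded, vdupq_n_u8('z')));
    // NEON has no movemask: keep one distinct bit per lane and sum each half.
    const uint8x16_t bits = vandq_u8(is_letter, vld1q_u8(kBits));
    const std::uint32_t lo = vaddv_u8(vget_low_u8(bits));
    const std::uint32_t hi = vaddv_u8(vget_high_u8(bits));
    return lo | (hi << 8);
#else
    return letter_mask16_scalar(p);
#endif
}

// Counts maximal runs of letters in buf[0..n).
inline std::size_t count_words(const unsigned char* buf, std::size_t n) {
    std::size_t words = 0;
    std::uint32_t prev_letter = 0; // 1 if the byte before the cursor was a letter
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        const std::uint32_t m = letter_mask16(buf + i);
        // A word starts at a letter whose predecessor is not a letter.
        const std::uint32_t starts = m & ~((m << 1) | prev_letter) & 0xFFFFu;
        words += std::bitset<16>(starts).count();
        prev_letter = (m >> 15) & 1u;
    }
    for (; i < n; ++i) { // scalar tail for the last n % 16 bytes
        const unsigned char c = static_cast<unsigned char>(buf[i] | 0x20);
        const std::uint32_t is_letter = (c >= 'a' && c <= 'z') ? 1u : 0u;
        words += is_letter & ~prev_letter & 1u;
        prev_letter = is_letter;
    }
    return words;
}
```

A real tokenizer would extract word boundaries from the mask rather than only count starts; counting keeps the sketch short while still exercising the block-boundary carry.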
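
The direct-address bullet can be sketched as follows, under assumptions the PR does not state: words are already lowercased to 'a'..'z', counters are 32-bit, and the slot index is a base-27 encoding of the word. `DirectCounters`, `direct_index`, `kMaxDirectLen`, and `kSlots` are hypothetical names, not identifiers from freq04.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: one counter slot per possible word of length 1..5.
constexpr std::size_t kMaxDirectLen = 5;
constexpr std::size_t kSlots = 27u * 27 * 27 * 27 * 27; // 14,348,907 slots

// Base-27 encoding with digits 1..26, so words of different lengths never collide.
// Assumes 1 <= w.size() <= 5 and every char is in 'a'..'z'.
inline std::uint32_t direct_index(const std::string& w) {
    assert(!w.empty() && w.size() <= kMaxDirectLen);
    std::uint32_t idx = 0;
    for (char c : w) {
        assert(c >= 'a' && c <= 'z');
        idx = idx * 27 + static_cast<std::uint32_t>(c - 'a' + 1);
    }
    return idx;
}

struct DirectCounters {
    std::vector<std::uint32_t> counts;

    DirectCounters() : counts(kSlots, 0) {}

    // Returns false when the word is too long (or empty), so the caller
    // can fall back to the hash table.
    bool try_count(const std::string& w) {
        if (w.empty() || w.size() > kMaxDirectLen) return false;
        ++counts[direct_index(w)];
        return true;
    }

    std::uint32_t get(const std::string& w) const { return counts[direct_index(w)]; }
};
```

With 32-bit counters this table takes roughly 57 MB, trading memory for a lookup that needs no hashing or key comparison. Whether freq04.cpp uses the same encoding or counter width is not stated in the PR.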

Benchmark on Apple M-series, Makefile default -O3, 10 warm runs:

freq03  0.686..0.715s  490.4 MB/s  (baseline)
freq04  0.417..0.454s  805.5 MB/s  (optimized)

Speedup: 1.64x (wall-clock time reduced by ~39%)
Throughput: +315.1 MB/s (+64.3%)
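
The summary figures follow from the two throughput numbers, assuming both runs process the same input so that time is inversely proportional to throughput. A small sketch of that arithmetic (the `Comparison` struct and `compare_throughput` are hypothetical helpers, not part of the PR):

```cpp
#include <cassert>

struct Comparison {
    double speedup;            // optimized / baseline throughput
    double time_reduction_pct; // wall-clock time saved on the same input
    double gain_mbps;          // absolute throughput gain
    double gain_pct;           // relative throughput gain
};

inline Comparison compare_throughput(double base_mbps, double opt_mbps) {
    assert(base_mbps > 0.0 && opt_mbps > 0.0);
    Comparison c;
    c.speedup = opt_mbps / base_mbps;
    // Time scales as 1 / throughput, so the saved fraction is 1 - base / opt.
    c.time_reduction_pct = (1.0 - base_mbps / opt_mbps) * 100.0;
    c.gain_mbps = opt_mbps - base_mbps;
    c.gain_pct = (c.speedup - 1.0) * 100.0;
    return c;
}
```

Calling `compare_throughput(490.4, 805.5)` gives a speedup of about 1.64, a time reduction of about 39.1%, and a gain of about 315.1 MB/s (about 64.3%), matching the numbers reported above.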

Output verified byte-equal to freq03 via cmp.
