Add freq04.cpp: SIMD + hash-table optimized version of freq03.cpp #29
Open
metacircu1ar wants to merge 1 commit into
Optimizations:

- SIMD tokenization (NEON on ARM, SSE2 on x86)
- Split short/long hash-table acquire paths
- Inline short-key storage (words up to 8 chars in `entry::shortkey`)
- Direct-address counter table for words of length 1..5

Benchmark on Apple M-series, Makefile default `-O3`, 10 warm runs:

| Binary | Wall-clock time (range) | Throughput |
|---|---|---|
| freq03 (baseline) | 0.686..0.715 s | 490.4 MB/s |
| freq04 (optimized) | 0.417..0.454 s | 805.5 MB/s |

- Speedup: 1.64x (wall-clock time reduced by ~39%)
- Throughput: +315.1 MB/s (+64.3%)

Output verified byte-equal to freq03 via `cmp`.
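The routing idea behind the last optimization (short words bypass the hash table entirely) can be sketched roughly as below. This is a hypothetical illustration, not the code in freq04.cpp: it assumes a word is a run of ASCII letters folded to lowercase, swaps the NEON/SSE2 tokenizer for a plain scalar loop, and uses `std::unordered_map` where freq04 has its own hash table with inline short keys. The names `WordCounter` and `for_each_word` and the base-27 key encoding are invented for this sketch.

```cpp
#include <cassert>      // only for the usage check in the trailing comment
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: words of length 1..5 made of 'a'..'z' are counted in a
// direct-address array; every other word goes to a general hash map.
class WordCounter {
public:
    static constexpr std::size_t kMaxDirectLen = 5;
    static constexpr std::uint32_t kBase = 27;  // digit 0 unused, 'a'..'z' -> 1..26
    static constexpr std::size_t kDirectSize = 27u * 27u * 27u * 27u * 27u;  // 27^5 slots

    WordCounter() : direct_(kDirectSize, 0) {}

    void add(std::string_view w) {
        if (fits_direct(w)) {
            ++direct_[encode(w)];      // no hashing, no key comparison
        } else if (!w.empty()) {
            ++long_[std::string(w)];   // fallback path for longer words
        }
    }

    std::uint64_t count(std::string_view w) const {
        if (fits_direct(w)) return direct_[encode(w)];
        auto it = long_.find(std::string(w));
        return it == long_.end() ? 0 : it->second;
    }

private:
    static bool fits_direct(std::string_view w) {
        if (w.empty() || w.size() > kMaxDirectLen) return false;
        for (char c : w)
            if (c < 'a' || c > 'z') return false;
        return true;
    }

    // Base-27 number whose digits are all nonzero, so "a" (1) and "aa" (28)
    // land in distinct slots; "zzzzz" maps to 27^5 - 1, the last slot.
    static std::size_t encode(std::string_view w) {
        std::size_t idx = 0;
        for (char c : w) idx = idx * kBase + static_cast<std::size_t>(c - 'a' + 1);
        return idx;
    }

    std::vector<std::uint32_t> direct_;
    std::unordered_map<std::string, std::uint64_t> long_;
};

// Scalar stand-in for the SIMD tokenizer: emits each run of letters,
// lowercased, and treats every other byte as a separator.
template <class F>
void for_each_word(std::string_view text, F&& f) {
    std::string word;
    for (char c : text) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) {
            word.push_back(static_cast<char>(std::tolower(u)));
        } else if (!word.empty()) {
            f(std::string_view(word));
            word.clear();
        }
    }
    if (!word.empty()) f(std::string_view(word));
}

// Usage:
//   WordCounter wc;
//   for_each_word("The cat and the Hat", [&](std::string_view w) { wc.add(w); });
//   assert(wc.count("the") == 2);
```

With 27^5 slots of 4 bytes each, the direct table in this sketch costs about 57 MB. The trade is memory for skipping hashing and key comparison on the shortest words, which in English text tend to be the most frequent ones.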
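Assuming both binaries process the same input, wall-clock time scales with the inverse of throughput, so the derived figures follow from the two throughput numbers:

```math
\begin{aligned}
\text{speedup} &= \tfrac{805.5}{490.4} \approx 1.64 \\
\Delta\,\text{throughput} &= 805.5 - 490.4 = 315.1\ \text{MB/s}, \qquad \tfrac{315.1}{490.4} \approx 0.643 \\
\text{time reduction} &= 1 - \tfrac{490.4}{805.5} \approx 0.391 \approx 39\%
\end{aligned}
```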