perf: use ASCII bitsets for strip chars#789
Draft
He-Pin wants to merge 2 commits intodatabricks:masterfrom
Draft
perf: use ASCII bitsets for strip chars#789He-Pin wants to merge 2 commits intodatabricks:masterfrom
He-Pin wants to merge 2 commits intodatabricks:masterfrom
Conversation
9004941 to
7f64a1d
Compare
Contributor
Author
|
Closing as low priority after rebase: current master already has the main strip-char fast path, and this PR shows no stable standout delta on the go_suite strip docs cases or Native hyperfine. |
Contributor
Author
|
Reopened. This is not negative; keep it as draft as a small GC-friendly strip delimiter cleanup. Current docs strip benchmarks do not show standout gains, so it should not be moved to ready yet. |
f569b54 to
7f64a1d
Compare
Port the useful part of He-Pin/sjsonnet jit commit dd90b11 onto current master. The new path avoids Set[Int] allocation for ASCII-only stripChars, lstripChars, and rstripChars calls, preserves the existing codepoint-aware fallback for non-ASCII strip sets, and uses platform-specific masks selected from local measurements. Upstream-source: dd90b11
7f64a1d to
4addea1
Compare
Contributor
Author
|
Rebased to latest |
Motivation: The ASCII bitset optimization made stripChars, lstripChars, and rstripChars validate chars before str, contradicting the builtin call path and changing observable error ordering. Modification: Restore str-before-chars extraction for all three strip helpers while keeping the optimized strip implementation. Update the regression test to assert the existing first-argument error behavior. Result: The optimization keeps existing builtin semantics for type and error ordering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation:
std.stripChars,std.lstripChars, andstd.rstripCharsalready avoid boxedSet[Int]for single/BMP delimiter sets on currentmaster. This draft keeps that Unicode-aware path and narrows the ASCII-only multi-character delimiter case by replacing the per-calljava.util.BitSetallocation with primitive masks.Key Design Decision:
Intmasks; Scala Native uses twoLongmasks.BitSetfast path.charsis forced beforestr, preserving existing error evaluation order.Modification:
stripChars,lstripChars, andrstripCharsthrough the ASCII mask path whencharscontains only ASCII delimiters.Benchmark Results:
Rebased onto
upstream/masterat0ae7b78a93c4e643f9bcfb6dd1d99d9fe7a522a9.JMH command:
lstripChars.jsonnetrstripChars.jsonnetstripChars.jsonnetbench.09.jsonnetScala Native hyperfine, original go_suite strip files, 50 runs, compared against source-built jrsonnet
0.5.0-pre98:lstripChars.jsonnetrstripChars.jsonnetstripChars.jsonnetScaled temporary Native strip stress cases, 30 runs,
--shell=none, outputs checked equal across master/PR/jrsonnet:lstrip-scaled.jsonnetrstrip-scaled.jsonnetstrip-scaled.jsonnetAnalysis:
Current
masteralready has the main BMPBitSetstrip fast path, so this PR no longer closes a visible go_suite strip gap on JVM JMH. The remaining value is narrower: a GC-friendly ASCII delimiter cleanup that can be slightly positive in scaled Native strip stress cases, while original go_suite cases are too small/noisy to justify moving this out of draft.References:
Result:
Keep as draft. This is safe after rebase and tests, but it should not be marked ready unless a future benchmark shows a stable user-visible improvement or we explicitly decide the GC/allocation cleanup is worth merging on its own.