perf: scan lf text-block lines with indexOf #841
Open
He-Pin wants to merge 1 commit into databricks:master
Motivation: Large text blocks spend visible parser time finding line boundaries with a Scala character loop. Modification: Use String.indexOf for single-character LF separators in the bulk text-block scanner while preserving the existing CRLF and multi-character separator path. Result: LF text-block parsing uses the platform string search fast path without changing text-block semantics.
Motivation:
Large text blocks spend visible parser time finding line boundaries with a Scala character loop.
Key Design Decision:
Only single-character LF separators use String.indexOf('\n'); CRLF and other multi-character separator paths keep the previous scanner to preserve behavior.
Modification:
In the bulk ||| text-block body scanner, use String.indexOf to find LF line endings after indentation.
Benchmark Results:
bench/resources/cpp_suite/large_string_template.jsonnet: 10.855 +/- 0.773 ms, 10.300 +/- 0.638 ms, -5.12%; 11.191 ms, 10.373 ms, -7.3%; 10.552 +/- 0.656 ms; jrsonnet 5.611 +/- 0.826 ms, 1.88x
Analysis:
This is a narrow parser optimization with explicit CRLF fallback. Other larger text-block scanner rewrites were tested and rejected, so this PR keeps only the stable LF indexOf win.
References:
Source exploration commit: He-Pin/sjsonnet 8f8ed59a.
Result:
Local ./mill --no-server -j 1 __.reformat and ./mill --no-server -j 1 __.test passed on this split branch (2066/2066).
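For readers who want to see the shape of the change described under Modification, here is a minimal sketch of the idea, not the code in this PR. The object LineScan and its methods are hypothetical names invented for illustration; the actual sjsonnet scanner also handles indentation and text-block termination, which this sketch leaves out. It only shows the dispatch: a single-character LF separator goes through String.indexOf, and any other separator (such as CRLF) stays on a character loop.

```scala
// Hypothetical illustration only; these names do not come from sjsonnet.
object LineScan {

  // Fast path: delegate the LF search to the platform's String.indexOf.
  // Returns the index of the next '\n' at or after `from`,
  // or input.length when no further '\n' exists.
  def lineEndLf(input: String, from: Int): Int = {
    val i = input.indexOf('\n', from)
    if (i < 0) input.length else i
  }

  // Fallback path: a plain character loop that checks for the full
  // (possibly multi-character) separator at each position.
  def lineEndLoop(input: String, sep: String, from: Int): Int = {
    var i = from
    while (i < input.length && !input.startsWith(sep, i)) i += 1
    i
  }

  // Splits `input` into lines, choosing the fast path only when the
  // separator is exactly "\n". A trailing separator does not produce
  // an extra empty line.
  def splitLines(input: String, sep: String): Vector[String] = {
    val out = Vector.newBuilder[String]
    var pos = 0
    while (pos < input.length) {
      val end =
        if (sep == "\n") lineEndLf(input, pos)
        else lineEndLoop(input, sep, pos)
      out += input.substring(pos, end)
      // Skip the separator unless we stopped at the end of the input.
      pos = if (end == input.length) end else end + sep.length
    }
    out.result()
  }
}
```

With this sketch, LineScan.splitLines("a\nbb\n", "\n") and LineScan.splitLines("a\r\nbb", "\r\n") both yield the lines a and bb, and a lone '\r' inside a CRLF-separated input stays part of its line. Keeping the two paths separate mirrors the design decision above: only the case where indexOf is a drop-in replacement changes.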