perf: comprehensive Scala Native render pipeline optimization #776
He-Pin wants to merge 14 commits into databricks:master
I think the string join can be improved with an AST rewrite, but I want to do that after this is merged.
Benchmark results summary

Environment: Apple Silicon, macOS | Tool: hyperfine --warmup 5 --min-runs 20 -N
sjsonnet: Scala Native (current branch, including the PR #776 optimizations) | jrsonnet: 0.5.0-pre98 (built from source)

Reliable benchmarks (>20 ms runtime, startup overhead does not dominate)

| Benchmark | sjsonnet (ms) | jrsonnet (ms) | Ratio | Winner |
| --- | ---: | ---: | --- | --- |
| comparsion_for_primitives | 37.6 | 214.5 | sjsonnet 5.71x faster | sjsonnet |
| inheritance_recursion | 60.7 | 120.2 | sjsonnet 1.98x faster | sjsonnet |
| simple_recursive_call | 28.8 | 52.6 | sjsonnet 1.83x faster | sjsonnet |
| realistic_2 | 89.4 | 101.7 | sjsonnet 1.14x faster | sjsonnet |
| std_reverse | 21.6 | 23.5 | tie (1.09x) | tie |

Medium scale (10-20 ms)

| Benchmark | sjsonnet (ms) | jrsonnet (ms) | Ratio | Winner |
| --- | ---: | ---: | --- | --- |
| std_base64_byte_array | 9.8 | 18.2 | sjsonnet 1.86x faster | sjsonnet |
| std_base64decodebytes | 14.1 | 20.5 | sjsonnet 1.45x faster | sjsonnet |
| big_object | 10.5 | 11.6 | sjsonnet 1.10x faster | sjsonnet |
| realistic_1 | 9.3 | 11.9 | sjsonnet 1.27x faster | sjsonnet |

Small scale (<10 ms, startup overhead dominates)

| Benchmark | sjsonnet (ms) | jrsonnet (ms) | Ratio | Winner |
| --- | ---: | ---: | --- | --- |
| comparsion_for_array | 6.3 | 12.8 | sjsonnet 2.02x faster | sjsonnet |
| foldl_string_concat | 5.4 | 8.6 | sjsonnet 1.59x faster | sjsonnet |
| std_foldl | 6.2 | 7.4 | sjsonnet 1.19x faster | sjsonnet |
| large_string_join | 6.8 | 5.4 | jrsonnet 1.26x faster | jrsonnet |
| array_sorts | 8.2 | 5.5 | jrsonnet 1.49x faster | jrsonnet |
| std_base64 | 7.8 | 4.2 | jrsonnet 1.86x faster | jrsonnet |
| std_base64decode | 7.3 | 5.3 | jrsonnet 1.36x faster | jrsonnet |
| std_manifestjsonex | 6.4 | 4.1 | jrsonnet 1.54x faster | jrsonnet |
| std_manifesttomlex | 6.5 | 3.6 | jrsonnet 1.82x faster | jrsonnet |
| std_parseint | 6.1 | 3.6 | jrsonnet 1.70x faster | jrsonnet |
| std_substr | 6.2 | 4.2 | jrsonnet 1.45x faster | jrsonnet |
| string_strips | 5.7 | 3.9 | jrsonnet 1.48x faster | jrsonnet |
| tail_call | 5.9 | 3.7 | jrsonnet 1.57x faster | jrsonnet |
| inheritance_function_recursion | 5.0 | 2.9 | jrsonnet 1.74x faster | jrsonnet |
Motivation:
Combined review of PR databricks#776 + databricks#778 identified ~130 lines of duplicated SWAR string rendering and long-to-char conversion code, plus two missing overflow checks in StringModule.

Modification:
- Extract renderQuotedStringSWAR as protected method in BaseCharRenderer, delegate from MaterializeJsonRenderer (removes ~60 lines duplication)
- Make escapeCharInline protected, remove duplicate in Renderer
- Consolidate Renderer.visitFloat64 onto inherited writeLongDirect, remove standalone RenderUtils.appendLong (~40 lines)
- Add totalLen > Int.MaxValue guard in Join pre-sized allocation
- Add Long overflow detection in parseDigits
- Leverage _asciiSafe flag in Substr/Join to skip redundant scans

Result:
Net -132 lines. All tests pass across JVM/JS/Native/WASM.
|
Reviewed; keeping this as a follow-up rather than part of this PR. #776 is scoped to the render/materialization pipeline and Scala Native-friendly SWAR/direct rendering paths. An AST rewrite for string join would be a separate optimization because it moves the optimization boundary earlier in the pipeline and should get its own focused benchmark/compatibility review.
Motivation:
PR databricks#776 already propagates _asciiSafe through parser literals, base64, joins, and substrings, but MaterializeJsonRenderer still sent those known-safe strings through the chunked char renderer, allocating a temporary char array and scanning for escapes. The hand-written parseInt path also rejected Long.MinValue, which the previous Long.parseLong-based implementation accepted.

Modification:
Add a char-renderer fast path for known ASCII-safe strings and use it in fused MaterializeJsonRenderer. Let std.length trust _asciiSafe before scanning, and switch parseDigits to negative accumulation so Long.MinValue is accepted while positive overflow remains rejected.

Result:
Known ASCII-safe strings skip allocation and escape scanning in char materialization and std.length. parseInt keeps the overflow guard without regressing the Long.MinValue boundary.
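To illustrate the negative-accumulation idea mentioned above, here is a minimal sketch, not the PR's actual parseDigits. The object name `ParseDigits`, the method name `parseLong`, and the `Option` return type are assumptions made for this example; the approach mirrors the well-known technique of accumulating in the negative range, which has one more representable value than the positive range.

```scala
object ParseDigits {
  // Hypothetical helper: returns None on empty input, a non-digit, or overflow.
  def parseLong(s: String, radix: Int = 10): Option[Long] = {
    val len = s.length
    if (len == 0) return None
    val negative = s.charAt(0) == '-'
    var i = if (negative) 1 else 0
    if (i == len) return None
    // Accumulate as a negative number so that Long.MinValue is representable.
    val limit = if (negative) Long.MinValue else -Long.MaxValue
    val multMin = limit / radix
    var acc = 0L
    while (i < len) {
      val d = Character.digit(s.charAt(i), radix)
      if (d < 0) return None
      if (acc < multMin) return None   // acc * radix would overflow
      acc *= radix
      if (acc < limit + d) return None // acc - d would overflow
      acc -= d
      i += 1
    }
    Some(if (negative) acc else -acc)
  }
}
```

With this shape, `"-9223372036854775808"` parses to `Long.MinValue`, while `"9223372036854775808"` is rejected as positive overflow.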
|
Small follow-up pushed in
Validation run locally:
|
Motivation:
Split the JMH-positive, JDK17/JIT/GC-friendly long-string rendering piece out of #776. Keep this PR focused on byte rendering for long strings that contain JSON escapes; this does not include the broader format, stdlib, compareStrings, or Scala Native experiments from #776.

Modification:
- Add `CharSWAR.findFirstEscapeChar(byte[], from, to)` on JVM, Scala.js, and Scala Native.
- In `BaseByteRenderer`, keep the existing UTF-8 byte array for long strings, locate escape bytes, bulk-copy clean chunks with `System.arraycopy`, and escape only matching bytes inline.
- Precompute the exact escaped output length, reserve `ByteBuilder` once, then write directly to the backing byte array. This removes repeated `ensureLength`/`appendUnsafeC` calls from the dirty long-string loop.
- Use a static byte hex table for `\u00XX` control escapes.

JIT / GC shape:
- Hot code stays in simple `while` loops, `System.arraycopy`, and small private helpers.
- No reflection, no internal JDK APIs, no closures/iterators in the rendering loop.
- No per-chunk or per-escape objects are allocated by this follow-up; the existing per-long-string UTF-8 byte array remains the only temporary for this path.
- I tested a no-allocation ASCII scalar path, but rejected it because it regressed `large_string_template` and `large_string_join` JMH.

Notable results only:

JMH target run, same machine, same command shape on `upstream/master` and this branch:

`./mill -i bench.runRegressions bench/resources/cpp_suite/large_string_template.jsonnet bench/resources/cpp_suite/large_string_join.jsonnet`

| Benchmark | upstream/master | PR | Delta |
| --- | ---: | ---: | ---: |
| `large_string_template` | 1.552 ms/op | 1.154 ms/op | -25.6% / 1.34x faster |

Scala Native hyperfine, release-full native binary, 20 runs:

| Benchmark | upstream/master | PR | Delta |
| --- | ---: | ---: | ---: |
| `large_string_template` | 10.5 +/- 0.2 ms | 9.6 +/- 0.3 ms | -8.6% / 1.09x faster |

`large_string_join` was rechecked as a guardrail and stayed neutral, so it is intentionally omitted from the result tables.

Verification:
- `./mill -i 'sjsonnet.jvm[3.3.7].compile'`
- `./mill -i 'sjsonnet.jvm[3.3.7].test'`
- `./mill -i 'sjsonnet.js[3.3.7].compile' 'sjsonnet.native[3.3.7].compile'`
- `./mill -i 'sjsonnet.native[3.3.7].nativeLink'`
- `./mill -i __.checkFormat`
- `git diff --check`
- Focused JMH and Native hyperfine commands above

References:
- Split from #776
- Base: `b4c667d55d82d7c50c2103db967c33bebb0c2c98`
- Head: `ff70b63e`
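The two-pass shape described under Modification can be sketched roughly as follows. This is an illustration under stated assumptions, not the `BaseByteRenderer` code: the object `ChunkedEscape` and its helpers are hypothetical names, a plain scalar check stands in for the SWAR scan, and a fresh `Array[Byte]` stands in for the reserved `ByteBuilder` region.

```scala
object ChunkedEscape {
  // Static hex table for control escapes of the form backslash-u00XX.
  private val Hex: Array[Byte] = "0123456789abcdef".getBytes("US-ASCII")

  // Bytes are read as unsigned; UTF-8 bytes >= 0x80 never need escaping.
  private def needsEscape(b: Byte): Boolean = {
    val u = b & 0xff
    u < 0x20 || u == 0x22 || u == 0x5c
  }

  // Second byte of a two-byte escape, or 0 if the six-byte form is needed.
  private def shortEscape(u: Int): Byte = u match {
    case 0x22 => '"'.toByte
    case 0x5c => '\\'.toByte
    case 0x08 => 'b'.toByte
    case 0x0c => 'f'.toByte
    case 0x0a => 'n'.toByte
    case 0x0d => 'r'.toByte
    case 0x09 => 't'.toByte
    case _    => 0
  }

  def escapeJson(src: Array[Byte]): Array[Byte] = {
    // Pass 1: exact output length, including the two surrounding quotes.
    var outLen = 2
    var i = 0
    while (i < src.length) {
      val b = src(i)
      outLen += (if (!needsEscape(b)) 1 else if (shortEscape(b & 0xff) != 0) 2 else 6)
      i += 1
    }
    val out = new Array[Byte](outLen)
    var pos = 0
    out(pos) = '"'.toByte; pos += 1
    // Pass 2: bulk-copy clean chunks, escape only the matching bytes inline.
    var start = 0
    i = 0
    while (i < src.length) {
      if (needsEscape(src(i))) {
        System.arraycopy(src, start, out, pos, i - start)
        pos += i - start
        val u = src(i) & 0xff
        val s = shortEscape(u)
        out(pos) = '\\'.toByte
        if (s != 0) { out(pos + 1) = s; pos += 2 }
        else {
          out(pos + 1) = 'u'.toByte; out(pos + 2) = '0'.toByte; out(pos + 3) = '0'.toByte
          out(pos + 4) = Hex(u >>> 4); out(pos + 5) = Hex(u & 0xf)
          pos += 6
        }
        start = i + 1
      }
      i += 1
    }
    System.arraycopy(src, start, out, pos, src.length - start)
    pos += src.length - start
    out(pos) = '"'.toByte
    out
  }
}
```

Because the output length is known before the second pass, the write loop needs no capacity checks, which is the property the PR description attributes to reserving `ByteBuilder` once.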
Closing obsolete broad draft. The useful render work has been or should be split into smaller focused PRs with current docs-aligned data; this branch is now conflicting and too broad to carry forward as-is.

Reopened. This broad branch still conflicts heavily with current renderer/SWAR code and overlaps later split PRs. Keep as draft/source material for extracting smaller PRs rather than closing it as negative.

Rebase retry against current upstream/master still conflicts at the first renderer/SWAR commit: sjsonnet/src-js/sjsonnet/CharSWAR.scala, sjsonnet/src-native/sjsonnet/CharSWAR.scala, and sjsonnet/src/sjsonnet/BaseByteRenderer.scala. Keeping this as draft/source material; not closing because this is not a negative benchmark result.
Motivation:
String comparison (compareStringsByCodepoint) and long string rendering are hot paths in sort-heavy and render-heavy Jsonnet workloads. The comparison used per-char charAt() virtual dispatch preventing JIT vectorization. Long string rendering used a binary scan (clean→bulk copy, dirty→full reprocess from position 0).

Modification:
1. compareStrings: bulk getChars() + tight array loop enabling JIT auto-vectorization (AVX2/SSE). Surrogate check deferred to mismatch point only (O(1) vs O(n)). ThreadLocal buffers on JVM, local alloc on Native, scalar fallback on JS.
2. findFirstEscapeChar: SWAR scan returning position (not boolean).
3. visitLongString: chunked rendering: find escape position, arraycopy clean prefix, escape inline, repeat. Avoids re-processing entire string when only a few chars need escaping.

Result:
All tests pass across JVM (Scala 3.3.7, 2.13.18) and JS. All benchmark regressions pass. Endian-safe (SWAR operates on independent byte lanes).
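A rough sketch of the compareStrings idea from item 1, assuming well-formed UTF-16 input. The object name `CodepointCompare` is hypothetical, and the buffers are allocated per call here for simplicity rather than reused through a ThreadLocal as the commit describes.

```scala
object CodepointCompare {
  // Orders strings by Unicode code point rather than by UTF-16 code unit.
  def compare(a: String, b: String): Int = {
    // Bulk-copy into arrays so the hot loop is a plain array comparison.
    val ca = new Array[Char](a.length); a.getChars(0, a.length, ca, 0)
    val cb = new Array[Char](b.length); b.getChars(0, b.length, cb, 0)
    val n = math.min(ca.length, cb.length)
    var i = 0
    while (i < n && ca(i) == cb(i)) i += 1
    if (i == n) return Integer.compare(ca.length, cb.length)
    val x = ca(i)
    val y = cb(i)
    // Surrogates are inspected only at the first mismatch, not on every char.
    if (Character.isSurrogate(x) || Character.isSurrogate(y))
      Integer.compare(
        Character.codePointAt(ca, i, ca.length),
        Character.codePointAt(cb, i, cb.length))
    else Character.compare(x, y)
  }
}
```

The surrogate branch matters for cases such as U+FFFF versus U+1F600: by code unit the first sorts higher, by code point it sorts lower.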
Replace per-call `new Array[Char](n)` allocation with module-level pre-allocated buffers in Scala Native's compareStrings. Safe because Scala Native is single-threaded (mirrors the JVM ThreadLocal approach).
Motivation:
manifestJsonEx/manifestTomlEx used the generic Visitor interface for char-based rendering, missing the fused direct-walk optimization that ByteRenderer already had. Additionally, char-based string rendering (BaseCharRenderer, MaterializeJsonRenderer) did binary hasEscapeChar check → char-by-char RenderUtils.escapeChar fallback, while ByteRenderer had proper chunked SWAR scanning → bulk arraycopy → inline escape.

Modification:
- Add materializeDirect(Val) to MaterializeJsonRenderer, mirroring ByteRenderer's fused materializer with valTag-based switch dispatch
- Replace visitNonNullString in BaseCharRenderer with chunked rendering: findFirstEscapeCharChar → bulk arraycopy clean segments → escapeCharInline
- Add renderQuotedString to MaterializeJsonRenderer with same chunked pattern
- Add findFirstEscapeCharChar(char[]) to all 3 CharSWAR platform impls
- Wire ManifestModule to use renderer.materializeDirect instead of Materializer.apply0 + Visitor interface

Result:
manifestJsonEx gap reduced from 2.15x to ~1.4x slower vs jrsonnet. realistic_2 flipped from 1.62x slower to 1.12x faster.
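For readers unfamiliar with SWAR (SIMD within a register), the following is a minimal sketch of a position-returning escape scan over a `char[]`, using four 16-bit lanes per `Long`. It is an assumption-laden illustration rather than the CharSWAR implementation: the object name `SwarEscapeScan` and all helper names are hypothetical, and it treats only control characters, the double quote, and the backslash as escapes.

```scala
object SwarEscapeScan {
  private final val Ones  = 0x0001000100010001L // 1 in each 16-bit lane
  private final val Highs = 0x8000800080008000L // top bit of each lane

  // Sets the top bit of every lane that is zero. Lanes above the first
  // flagged lane may be false positives, so only the lowest flag is used.
  private def zeroLanes(v: Long): Long = (v - Ones) & ~v & Highs

  private def escapeLanes(v: Long): Long = {
    val control   = (v - Ones * 0x20) & ~v & Highs // lane < 0x20
    val quote     = zeroLanes(v ^ (Ones * 0x22))   // lane == '"'
    val backslash = zeroLanes(v ^ (Ones * 0x5c))   // lane == '\'
    control | quote | backslash
  }

  // Packs four chars with a(i) in the lowest lane.
  private def load4(a: Array[Char], i: Int): Long =
    a(i).toLong | (a(i + 1).toLong << 16) | (a(i + 2).toLong << 32) | (a(i + 3).toLong << 48)

  /** Index of the first char in [from, to) that needs escaping, or -1. */
  def findFirstEscape(a: Array[Char], from: Int, to: Int): Int = {
    var i = from
    while (i + 4 <= to) {
      val m = escapeLanes(load4(a, i))
      if (m != 0L) return i + (java.lang.Long.numberOfTrailingZeros(m) >>> 4)
      i += 4
    }
    // Scalar tail for the last 0 to 3 chars.
    while (i < to) {
      val c = a(i)
      if (c < 0x20 || c == '"' || c == '\\') return i
      i += 1
    }
    -1
  }
}
```

A chunked renderer would call `findFirstEscape` repeatedly, copying the clean range before each hit with `System.arraycopy` and escaping the single offending char inline.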
…afe propagation

Motivation:
String-heavy stdlib operations (substr, length, join, parseInt) had unnecessary overhead on Scala Native: codePointCount/offsetByCodePoints O(n) scans for ASCII strings, StringBuilder resize churn for join, exception-based parseInt via Long.parseLong.

Modification:
- Add ASCII fast path to Length and Substr using CharSWAR.isAllAscii: skip codePointCount/offsetByCodePoints for ASCII-only strings (99% case)
- Pre-sized char[] assembly for std.join: two-pass approach calculates exact output length, then copies with getChars, so there is zero resize overhead
- Hand-written parseDigits loop for parseInt/parseOctal/parseHex: no exception setup, no intermediate allocation, single pass
- Propagate _asciiSafe flag: parser sets it on ASCII string literals, Val.Str.concat preserves it when both children are ASCII-safe, join propagates it through all elements

Result:
substr gap reduced from 2.03x to ~1.07x. parseint from 1.80x to ~1.0x. large_string_join from 1.81x to ~1.27x. realistic_2 benefits from combined improvements.
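The two-pass join described above might look roughly like this. It is a sketch under assumptions: `PresizedJoin.join` is a hypothetical stand-in that works on plain strings, whereas the real std.join operates on Jsonnet values, and the `Int.MaxValue` guard is included because a later commit in this conversation mentions adding one.

```scala
object PresizedJoin {
  def join(sep: String, parts: Array[String]): String = {
    if (parts.length == 0) return ""
    // Pass 1: exact output length, computed in Long to detect overflow.
    var total = sep.length.toLong * (parts.length - 1)
    var i = 0
    while (i < parts.length) { total += parts(i).length; i += 1 }
    if (total > Int.MaxValue)
      throw new IllegalArgumentException("joined string would exceed Int.MaxValue chars")
    // Pass 2: copy every piece straight into its final position.
    val out = new Array[Char](total.toInt)
    var pos = 0
    i = 0
    while (i < parts.length) {
      if (i > 0) { sep.getChars(0, sep.length, out, pos); pos += sep.length }
      val p = parts(i)
      p.getChars(0, p.length, out, pos)
      pos += p.length
      i += 1
    }
    new String(out)
  }
}
```

Compared with appending to a StringBuilder, the buffer is allocated once at its final size, so no intermediate copies occur as the output grows.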
Motivation:
Format.format() used StringBuilder which starts small and resizes multiple times for large output. The large_string_template benchmark (591KB template, 256 interpolations) showed 2.78x gap vs jrsonnet.

Modification:
- Three-pass approach: compute formatted values into String array, calculate exact total output length, allocate char[] and copy with getChars, which eliminates StringBuilder resize/copy overhead
- Add direct Val dispatch in format loop: skip Materializer for common types (Str, Num, Bool, Null) to avoid ujson.Value roundtrip

Result:
large_string_template gap reduced from 2.78x to ~1.88x. Remaining gap is dominated by Scala Native startup overhead (~7ms vs Rust ~1ms); pure computation time is within ~1ms of jrsonnet.
Motivation:
CI fails on two issues: (1) unused `alwaysinline` import in Native CharSWAR.scala, (2) `\uXXXX` sequences in comments are parsed as unicode escapes in Scala 2.12, causing compilation errors.

Modification:
- Remove unused `scala.scalanative.annotation.alwaysinline` import
- Escape backslash-u sequences in comments across BaseByteRenderer and Renderer

Result:
Full test suite passes across all platforms and Scala versions
Motivation:
The full PR databricks#776 rebase introduced a duplicate JVM escape scan overload, missed the char renderer hex table, and allowed no-spec format strings to return null through the offset-based RuntimeFormat path.

Modification:
Remove the duplicate JVM byte-array escape scan, expose HEX_CHARS for BaseCharRenderer, and return the original source string for RuntimeFormat entries with no format specs.

Result:
The rebased branch compiles and the full cross-platform Mill test matrix passes locally.

References:
Upstream PR: databricks#776
Motivation:
The rebased format optimization improved large templates but initially regressed the short repeat_format regression case because single-placeholder formats paid for an unnecessary final char-array assembly step.

Modification:
Return the already formatted value directly when a format string has exactly one spec and no static literal characters.

Result:
repeat_format improves from 0.188 ms/op on upstream master to 0.148 ms/op on this branch in the local JMH gate, while the full Mill test matrix remains green.

References:
Upstream PR: databricks#776
|
Complete rebase pushed to |
Motivation:
std.substr on long ASCII strings repeatedly pays codepoint-offset scans even when parser-time analysis can prove the literal is printable ASCII and JSON-render safe.

Modification:
Mark long ASCII JSON-safe literals with the existing _asciiSafe flag using a single platform CharSWAR scan, propagate the flag through string concatenation, and let std.length/std.substr use direct UTF-16 length/substring only for proven-safe values. Add UnicodeHandlingTests coverage for long ASCII length/substr boundaries and concat propagation.

Result:
Focused JVM JMH improves go_suite/substr from 0.056 ms/op to 0.046-0.047 ms/op with split_resolve unchanged and realistic2 in the same noise range. Scala Native hyperfine is neutral against master on the same case.

References:
Extracted from ideas in databricks#776, especially commit a190a80 (ASCII fast paths and asciiSafe propagation), narrowed to avoid the broader join/parseInt changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
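A minimal sketch of the flag propagation described above. Everything here is illustrative: `AsciiSafeStr`, `Str`, `literal`, and `isAsciiJsonSafe` are hypothetical names rather than sjsonnet's `Val.Str` API, the scan is scalar instead of SWAR, it is applied to every literal rather than only long ones, and `substr` assumes non-negative arguments.

```scala
object AsciiSafeStr {
  // True only for printable ASCII with no characters that JSON must escape.
  def isAsciiJsonSafe(s: String): Boolean = {
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      if (c < 0x20 || c > 0x7e || c == '"' || c == '\\') return false
      i += 1
    }
    true
  }

  final class Str(val value: String, val asciiSafe: Boolean) {
    // The flag survives concatenation only if both sides are proven safe.
    def concat(other: Str): Str =
      new Str(value + other.value, asciiSafe && other.asciiSafe)

    // Proven-safe strings have one UTF-16 unit per code point.
    def length: Int =
      if (asciiSafe) value.length else value.codePointCount(0, value.length)

    def substr(from: Int, len: Int): String =
      if (asciiSafe) {
        val start = math.min(from, value.length)
        val end = math.min(start.toLong + len, value.length.toLong).toInt
        value.substring(start, end)
      } else {
        val cps = value.codePointCount(0, value.length)
        val start = math.min(from, cps)
        val end = math.min(start.toLong + len, cps.toLong).toInt
        value.substring(value.offsetByCodePoints(0, start), value.offsetByCodePoints(0, end))
      }
  }

  // Stand-in for the parser marking a literal once, at parse time.
  def literal(s: String): Str = new Str(s, isAsciiJsonSafe(s))
}
```

The flag is conservative: a `false` value only means "not proven safe", so the slow code-point path remains correct for every string.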
Motivation:
PR #776 showed that format-heavy workloads benefit when the format path avoids unnecessary intermediate assembly. This split keeps only the smallest safe idea: short format strings with exactly one specifier and no static literal text (for example `%08d`, `%010x`, `%-20s`, `%20s`) do not need a `StringBuilder` after the formatted value has already been computed.

Key Design Decision:
Keep the existing generic formatting implementation and all validation/arity checks. The optimization only bypasses appending to `StringBuilder` after the single formatted value is known, so format semantics and error behavior stay unchanged.

Modification:
- Detect `specBits.length == 1 && parsed.staticChars == 0` in `Format.format`.
- Avoid allocating/appending a `StringBuilder` for that case.
- Return the computed formatted value directly after the existing too-many/too-few value checks.

Benchmark Results:

JMH (`./mill -j 1 bench.runRegressions ...`, ms/op lower is better; ops/ms higher is better):

| Case | master ms/op | PR ms/op | master ops/ms | PR ops/ms | Delta |
|---|---:|---:|---:|---:|---:|
| `repeat_format` | 0.190 | 0.133 | 5.263 | 7.519 | +42.9% |
| `large_string_template` guard | 1.155 | 1.160 | 0.866 | 0.862 | -0.4% noisy/neutral |

Scala Native hyperfine (`hyperfine --warmup 10 --min-runs 50 -N`, ms lower is better):

| Case | master native | PR native | jrsonnet | Result |
|---|---:|---:|---:|---|
| `repeat_format` | 6.4 ± 0.9 ms | 6.4 ± 0.7 ms | 5.6 ± 1.0 ms | Native neutral; JMH target positive |
| `large_string_template` guard | 12.6 ± 7.4 ms | 11.4 ± 0.8 ms | 5.4 ± 0.9 ms | No Native regression observed |

Analysis:
The target case is dominated by many short format expressions. Returning the already computed formatted string removes a redundant builder allocation/append path on the JVM. The guard case does not use this single-spec/no-static-literal path and remains effectively unchanged within benchmark noise.

References:
- Source idea: #776
- Split branch commit: He-Pin/sjsonnet@1f58504d

Result:
- `./mill -j 1 __.reformat && ./mill -j 1 __.test` passed locally.
- Draft PR split from the broader #776 optimization branch to keep the change reviewable and avoid carrying unrelated Native template work.
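The single-spec shortcut can be pictured with a toy formatter. This is only a sketch under assumptions: `SingleSpecFormat` and its `Parsed` case class are hypothetical, and `java.lang.String.format` stands in for sjsonnet's own per-spec formatting, which the description does not show.

```scala
import java.util.Locale

object SingleSpecFormat {
  // Hypothetical pre-parsed format: literals(i) precedes specs(i), and
  // literals has exactly one more entry than specs.
  final case class Parsed(literals: Array[String], specs: Array[String]) {
    val staticChars: Int = literals.map(_.length).sum
  }

  private def one(spec: String, v: Any): String =
    String.format(Locale.ROOT, spec, v.asInstanceOf[AnyRef])

  def format(p: Parsed, values: Array[Any]): String = {
    // Arity validation runs before the fast path, so errors are unchanged.
    if (values.length != p.specs.length)
      throw new IllegalArgumentException(
        s"expected ${p.specs.length} values, got ${values.length}")
    // Fast path: one spec, no literal text, so no builder is needed.
    if (p.specs.length == 1 && p.staticChars == 0)
      return one(p.specs(0), values(0))
    val sb = new StringBuilder
    var i = 0
    while (i < p.specs.length) {
      sb.append(p.literals(i)).append(one(p.specs(i), values(i)))
      i += 1
    }
    sb.append(p.literals(i)).toString
  }
}
```

Keeping the validation ahead of the shortcut reflects the design decision stated above: only the final assembly step is skipped, not any of the checks.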
Motivation:
The fused renderer fallback entered the generic materializer with a fresh context, losing active object cycle tracking once the recursive depth limit was reached.

Modification:
Expose the stackless materializer fallback inside sjsonnet and route char/byte direct renderers through it with the existing MaterializeContext. Add regression coverage for manifestJson and ByteRenderer with a low recursive depth limit.

Result:
Deep direct rendering preserves recursion detection while retaining the stackless fallback path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation:
This PR is the complete rebased render/string optimization branch for preserving the full #776 experiment on top of current master. It is intentionally broad and should remain draft/source material before we split the useful pieces into focused PRs.

Key Design Decision:

Modification:
- … `upstream/master` at `0ae7b78a93c4e643f9bcfb6dd1d99d9fe7a522a9`.
- … `std.format` no-spec/single-spec path so offset-based scanned formats do not create invalid null strings.

Benchmark Results:

JMH command:

- `large_string_template.jsonnet`
- `large_string_join.jsonnet`
- `bench.09.jsonnet`
- `repeat_format.jsonnet`
- `manifestTomlEx.jsonnet`
- `manifestJsonEx.jsonnet`
- `substr.jsonnet`
- `parseInt.jsonnet`

Scala Native hyperfine, 30 runs, `--shell=none`, compared against source-built jrsonnet `0.5.0-pre98`:

- `large_string_template.jsonnet`
- `large_string_join.jsonnet`
- `repeat_format.jsonnet`
- `manifestTomlEx.jsonnet`
- `substr.jsonnet`
- `parseInt.jsonnet`

Analysis:
The full branch has real positive signals, especially `large_string_join`, `repeat_format`, and some JVM string/format paths. It also carries broad code movement and measurable regressions/risks (`bench.09`, `manifestTomlEx`, and noisy Native short-run cases). Therefore this should stay draft and be used as the source for focused splits rather than merged wholesale.

References:

Result:
Complete rebase is pushed. Keep this PR draft/source material; next step is to split the positive pieces from this rebased head into smaller PRs with isolated tests and benchmark gates.