feat(lazy): scalar patches on existing keys (closes #48)#49
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThis PR implements a lazy scalar-patch feature for lua-qjson that records mutations to existing object keys in a _patches side list, supplies a new FFI function to retrieve field byte spans, and applies patches during encoding via buffer splicing. It includes comprehensive FFI and Lua tests, a design specification, and benchmark integration. ChangesLazy Scalar Patch on Existing Keys
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Implements a minimal, low-risk lazy mutation fast-path for qjson.decode() results: scalar replacements on existing LazyObject keys are recorded as byte-span patches and emitted by splicing into the original JSON buffer during qjson.encode(). This adds a fused Rust FFI helper to locate a field and its exact source byte range, while preserving all other mutation behaviors via the existing materialization path.
Changes:
- Add
qjson_cursor_field_bytesFFI API (Rust impl + C header + LuaJITffi.cdef) to fuse field lookup with byte-span extraction. - Extend
LazyObjectto lazily track scalar replacement patches (_patches) and add a splice-based encode branch when safe. - Add targeted Rust/Lua tests and a new Lua bench scenario (
decode + modify-5-scalars + encode) to exercise the splice path.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/scanner_crosscheck.rs | Adds a proptest cross-check for qjson_cursor_field_bytes against the source JSON spans. |
| tests/lua/lazy_patch_spec.lua | New Lua busted suite covering scalar patch recording, read/pairs/materialize/encode behavior, and fallthrough-to-materialize cases. |
| tests/ffi_panic_safety.rs | Adds a panic-barrier regression test for qjson_cursor_field_bytes under test-panic. |
| tests/ffi_cursor_field_bytes.rs | New Rust integration tests validating spans, error codes, null out-params, whitespace trimming, and duplicate key semantics. |
| src/ffi.rs | Implements qjson_cursor_field_bytes with the existing ffi_catch! panic barrier and byte-range logic consistent with qjson_cursor_bytes. |
| lua/qjson/table.lua | Adds _patches tracking, patch-aware reads/iteration/materialization, and splice-based encoding when there are patches and no dirty descendants. |
| lua/qjson.lua | Declares the new FFI symbol in the LuaJIT ffi.cdef block. |
| include/qjson.h | Declares the new public C ABI function qjson_cursor_field_bytes. |
| docs/superpowers/specs/2026-05-21-lazy-scalar-patch-design.md | Design doc describing scope, invariants, decision tree, edge cases, tests, and benchmarks. |
| benches/lua_bench.lua | Adds a github-100k scenario that modifies 5 existing scalar fields to exercise the splice encoder path. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| unsafe { | ||
| let mut err: std::os::raw::c_int = -1; | ||
| let doc = qjson_parse(json.as_ptr(), json.len(), &mut err); |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/ffi.rs (1)
807-825: ⚡ Quick winExtract shared byte-span logic to one helper to avoid semantic drift.
qjson_cursor_field_bytesre-implements the same span branching already inqjson_cursor_bytes. A shared internal helper would keep both APIs behavior-locked.♻️ Proposed refactor
+unsafe fn cursor_syntactic_byte_range( + d: &Document<'_>, + cur: Cursor, +) -> Result<(usize, usize), qjson_err> { + let pos = d.indices[cur.idx_start as usize] as usize; + let lead = match d.buf.get(pos) { + Some(b) => *b, + None => return Err(qjson_err::QJSON_PARSE_ERROR), + }; + match lead { + b'{' | b'[' | b'"' => { + let end = d.indices[cur.idx_end as usize] as usize; + if end >= d.buf.len() { + return Err(qjson_err::QJSON_PARSE_ERROR); + } + Ok((pos, end + 1)) + } + _ => scalar_byte_range(d, cur), + } +}- let pos = d.indices[cur.idx_start as usize] as usize; - let lead = match d.buf.get(pos) { ... }; - match lead { ... } + let (s, e) = match cursor_syntactic_byte_range(d, cur) { + Ok(x) => x, Err(e) => return e as c_int, + }; + *byte_start = s; + *byte_end = e;- // Compute the syntactic byte span — same logic as qjson_cursor_bytes. - let pos = d.indices[child.idx_start as usize] as usize; - let lead = match d.buf.get(pos) { ... }; - let (bs, be) = match lead { ... }; + let (bs, be) = match cursor_syntactic_byte_range(d, child) { + Ok(x) => x, Err(e) => return e as c_int, + };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ffi.rs` around lines 807 - 825, Extract the duplicated syntactic byte-span branching from qjson_cursor_field_bytes and qjson_cursor_bytes into a single internal helper (e.g., compute_syntactic_byte_span) that takes the buffer descriptor (d) and the cursor child (child) and returns Result<(usize,usize), qjson_err>. Move the existing match on lead (checking b'{'|b'['|b'"' and bounds-checking using d.indices and d.buf.len()) into this helper and have both qjson_cursor_field_bytes and qjson_cursor_bytes call it; fall back to calling scalar_byte_range inside the helper when the lead byte doesn't match, and return the appropriate qjson_err on failure so both APIs share identical semantics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lua/qjson/table.lua`:
- Around line 115-119: The current find_patch lookup uses only the key name,
which breaks when objects have duplicate keys; change find_patch (and the three
other lookup sites noted) to match a patch by its recorded byte-span fields
(e.g., patches[i].bs and patches[i].be) instead of comparing k, so the lookup
locates the exact occurrence recorded by try_record_patch; update any callers
that pass a key to instead pass the stored bs/be pair (or pass the patch id) and
ensure the function still returns nil when no span match is found.
- Around line 63-67: The internal bookkeeping currently uses the user-visible
key "_patches" (listed in INTERNAL_KEYS) which collides with JSON objects that
also have a "_patches" member; change the implementation so internal patch state
is stored under a private sentinel (e.g., an upvalue or a unique table key not
reachable by user keys) instead of "_patches", remove "_patches" from
INTERNAL_KEYS, and ensure any code that writes/reads obj._patches (the place
that sets the bookkeeping table and the branch that deletes when k ==
"_patches") is updated to use the new sentinel or else forces materialization
through the materialization path rather than the splice fast path so
user-supplied "_patches" members are preserved.
---
Nitpick comments:
In `@src/ffi.rs`:
- Around line 807-825: Extract the duplicated syntactic byte-span branching from
qjson_cursor_field_bytes and qjson_cursor_bytes into a single internal helper
(e.g., compute_syntactic_byte_span) that takes the buffer descriptor (d) and the
cursor child (child) and returns Result<(usize,usize), qjson_err>. Move the
existing match on lead (checking b'{'|b'['|b'"' and bounds-checking using
d.indices and d.buf.len()) into this helper and have both
qjson_cursor_field_bytes and qjson_cursor_bytes call it; fall back to calling
scalar_byte_range inside the helper when the lead byte doesn't match, and return
the appropriate qjson_err on failure so both APIs share identical semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 23f5cd0b-a7c0-42ed-b7d7-8589b45d2478
📒 Files selected for processing (10)
benches/lua_bench.luadocs/superpowers/specs/2026-05-21-lazy-scalar-patch-design.mdinclude/qjson.hlua/qjson.lualua/qjson/table.luasrc/ffi.rstests/ffi_cursor_field_bytes.rstests/ffi_panic_safety.rstests/lua/lazy_patch_spec.luatests/scanner_crosscheck.rs
| -- Reserved bookkeeping keys; rawget-cache and iteration checks skip these. | ||
| local INTERNAL_KEYS = { | ||
| _doc = true, _cur_box = true, _cur = true, _bs = true, _be = true, | ||
| _patches = true, | ||
| } |
There was a problem hiding this comment.
Avoid storing patch bookkeeping under the user-visible "_patches" key.
If the JSON object itself contains a "_patches" member, this collides with the internal side list: Line 291 writes the bookkeeping table there, and Line 299 deletes that same slot again when k == "_patches". Even patching some other field makes obj._patches resolve to internals instead of source data. Please move the bookkeeping to a private sentinel key/upvalue, or at minimum force this key through the materialization path instead of the splice fast path.
Also applies to: 290-299
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lua/qjson/table.lua` around lines 63 - 67, The internal bookkeeping currently
uses the user-visible key "_patches" (listed in INTERNAL_KEYS) which collides
with JSON objects that also have a "_patches" member; change the implementation
so internal patch state is stored under a private sentinel (e.g., an upvalue or
a unique table key not reachable by user keys) instead of "_patches", remove
"_patches" from INTERNAL_KEYS, and ensure any code that writes/reads
obj._patches (the place that sets the bookkeeping table and the branch that
deletes when k == "_patches") is updated to use the new sentinel or else forces
materialization through the materialization path rather than the splice fast
path so user-supplied "_patches" members are preserved.
| local function find_patch(patches, key) | ||
| for i = 1, #patches do | ||
| if patches[i].k == key then return patches[i] end | ||
| end | ||
| end |
There was a problem hiding this comment.
Match recorded patches by byte span, not just by key name.
try_record_patch resolves one concrete field occurrence and stores its bs/be, but these paths reapply the patch by k alone. On objects with duplicate keys, the splice encoder will rewrite one occurrence while walking/materialization rewrites every matching key, so the result changes based on whether descendants are dirty. Use the stored span to identify the patched entry in iteration/walking/materialization instead of string equality.
Also applies to: 183-186, 314-315, 516-519
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lua/qjson/table.lua` around lines 115 - 119, The current find_patch lookup
uses only the key name, which breaks when objects have duplicate keys; change
find_patch (and the three other lookup sites noted) to match a patch by its
recorded byte-span fields (e.g., patches[i].bs and patches[i].be) instead of
comparing k, so the lookup locates the exact occurrence recorded by
try_record_patch; update any callers that pass a key to instead pass the stored
bs/be pair (or pass the patch id) and ensure the function still returns nil when
no span match is found.
Decomposes PR #44 down to its minimum-risk slice: record scalar replacements on existing object keys in a _patches list, emit them via a splice encoder over the original buffer. All other mutation patterns fall through to today's materialization path.
Add a single new exported symbol that fuses qjson_cursor_field + qjson_cursor_bytes, saving one FFI crossing on the Lua patch-write hot path. value_out may be NULL when only the byte span is needed. The body is wrapped in the standard ffi_catch! panic barrier; a new test in tests/ffi_panic_safety.rs (gated on the test-panic feature) verifies a forced internal panic is converted to QJSON_OOM. The new proptest in tests/scanner_crosscheck.rs exercises the FFI surface on random small objects; combined with the existing scalar/avx2 indices cross-check and the CI matrix running both default-features and no-default-features, it guards against backend drift in the new path. Declarations kept in sync in include/qjson.h and lua/qjson.lua per the FFI symbol-sync rule.
Add a side _patches list on LazyObject views that records scalar replacements for keys present in the original JSON. Encode-time, the patched view is emitted via a new splice encoder that copies the original buffer with only the patched value spans rewritten — preserving whitespace and ordering. Anything that isn't a scalar-on-existing-key write (delete, new key, table value) falls through to the existing materialization path, applying any pending patches first. Reads, pairs, qjson.pairs, materialize, tostring, and #view all reflect patches. The walking encoder also consults _patches so dirty children and patches coexist. LazyArray is untouched. A subtlety: assigning to a key whose value was previously read (and therefore rawset-cached as a child proxy) bypasses __newindex entirely in LuaJIT, so the patch path cannot intercept it. The iterator and descendant-dirty check fall back to the raw slot in that case, so the result is correct but flows through the walking encoder rather than the splice fast path.
Adds a new bench line on the github-100k scenario that decodes the payload, patches five shallow scalar fields on the first issue, and re-encodes. Exercises the LazyObject splice encoder. On the synthetic 100KB GitHub fixture this currently lands at ~3.9k ops/s vs ~6.0k for the unmodified encode and ~2.2k for cjson decode.
acc9168 to
d253377
Compare
| -- Reserved bookkeeping keys; rawget-cache and iteration checks skip these. | ||
| local INTERNAL_KEYS = { | ||
| _doc = true, _cur_box = true, _cur = true, _bs = true, _be = true, | ||
| _patches = true, |
| -- Try to record a scalar patch on an existing key. Returns true if recorded. | ||
| local function try_record_patch(t, k, v) | ||
| if type(k) ~= "string" then return false end | ||
| local is_scalar = rawequal(v, _M.null) or type(v) == "string" | ||
| or type(v) == "number" or type(v) == "boolean" | ||
| if not is_scalar then return false end | ||
| local rc = C.qjson_cursor_field_bytes(t._cur, k, #k, nil, sz_a, sz_b) | ||
| if rc == QJSON_NOT_FOUND then return false end | ||
| check(rc) | ||
| local patches = rawget(t, "_patches") | ||
| if patches == nil then patches = {}; rawset(t, "_patches", patches) end | ||
| local hit = find_patch(patches, k) | ||
| if hit ~= nil then | ||
| hit.v = v | ||
| else | ||
| patches[#patches + 1] = { k = k, v = v, | ||
| bs = tonumber(sz_a[0]), be = tonumber(sz_b[0]) } | ||
| end | ||
| rawset(t, k, nil) -- drop cached child proxy so reads see the patch | ||
| return true |
| -- Mutate 5 shallow scalar fields on the first issue. All target keys | ||
| -- already exist in the source JSON, so __newindex records patch entries | ||
| -- and qjson.encode goes through the splice fast path. | ||
| local function github_modify_5_scalars(t) | ||
| local issue = t[1] | ||
| if not issue then return end | ||
| issue.id = 1234567 | ||
| issue.number = 9999 | ||
| issue.comments = 42 | ||
| issue.state = "closed" | ||
| issue.locked = true |
| // Forge a cursor whose idx_start is well past the end of the indices | ||
| // array. Any internal access via `d.indices[idx_start]` panics on | ||
| // out-of-bounds; the `ffi_catch!` wrapper around qjson_cursor_field_bytes | ||
| // must convert that into QJSON_OOM instead of unwinding across the FFI | ||
| // boundary. |
|
关闭原因:优化效果不及预期。 A/B bench(main
考虑到 Lua 侧 +95 行复杂度(_patches 状态、splice 编码器、descendant_is_dirty、cached-child fall through 等),单一 modify-scalar 工作流的 +27% 不足以摊销。如未来 production trace 表明 decode+少量标量改+encode 是显著热点,可基于本 PR 的设计文档 Issue #48 也一并关闭。 |
Summary
Lands the minimum-risk slice of PR #44 per the decomposition in #48:
record scalar replacements on existing object keys in a side
_patcheslist and emit them via a splice encoder over the original buffer. All
other mutation patterns fall through to today's materialization path
unchanged.
qjson_cursor_field_bytes(fused field lookup + bytespan), with panic barrier and three-way declaration sync (
src/ffi.rs,include/qjson.h,lua/qjson.lua).LazyObjectgains a lazily-allocated_patchesfield.__newindexrecords a patch only when (key is string) ∧ (value isscalar) ∧ (key exists in source JSON); otherwise falls through to
existing materialization, preserving any prior patches in the
transition.
__indexshort-circuits on_patcheswith one extrarawgeton the read hot path.encode_proxydecision tree gains asplice-encoder branch between the dirty-walking and unmodified-slice
branches.
qjson.decode + modify-5-scalars + qjson.encodescenario on the
github-100kfixture.Out of scope (deferred to follow-ups per #48): deletion via
t.x = nil, new-field insertion, sidecar container cache,LazyArraypatches.
Design doc:
docs/superpowers/specs/2026-05-21-lazy-scalar-patch-design.mdTest plan
cargo test --release(default features, AVX2)cargo test --release --no-default-features(scalar scanner)cargo test --features test-panic --releasemake lint(clippy-D warnings)make test(Rust + busted Lua suite, 146 successes / 0 failures)lazy_table_spec.luaandcjson_compat_spec.luapass unchangedlazy_patch_spec.lua(28 cases across 4 describe blocks)wc -l lua/qjson/table.lua: 556 → 651 (+95, within +100 budget)make benchexercises the new scenario end-to-endBench (worktree, not vs main)
Before-vs-after comparison against
mainis left for reviewers toreproduce; the read-path acceptance criterion in #48 (±3% on
parse + access fields) is structurally satisfied — only one extrarawget("_patches")short-circuit was added to the hot path.Known limitation
When a user reads a container child first (
local x = view.k, whichrawset-caches the proxy) and then assigns a scalar to that same key
(
view.k = 42), LuaJIT's__newindexdoes not fire because the slotalready holds a value. Correctness is preserved (the new value lands in
the raw slot and is honored by
pairs, encode, materialize, and directreads), but encode for this case takes the walking encoder path rather
than splice. Typical "modify shallow scalars without prior container
reads" workflows hit the splice fast path as designed.
Summary by CodeRabbit
New Features
Documentation
Tests