perf: combine FFI calls and skip redundant EAGER-mode re-validation#53
perf: combine FFI calls and skip redundant EAGER-mode re-validation#53membphis wants to merge 3 commits into
Conversation
|
Warning Review limit reached
Your plan currently allows 1 review/hour. Refill in 44 minutes and 17 seconds. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more review capacity refills, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR adds eager-mode parsing (skipping validation when JSON is already known valid) and a new unified cursor value API. The Rust core introduces an eager flag through decode functions, optimizes FFI entry points with fast paths, and exports a new ChangesEager Mode and Unified Cursor API
Benchmark Data Updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
P0: add qjson_cursor_get_value — resolves type + decodes value + returns byte span in a single FFI call. decode_cursor in table.lua now does 1 FFI round-trip instead of 2-3; wrap_child inherits pre-computed byte span and skips the separate cursor_bytes call. P0: EAGER get_str fast path — when the document is EAGER-mode and the string contains no backslash, return the buffer slice directly without borrowing scratch or calling validate_string_span (already done at parse time). P1: EAGER parse_i64 / parse_f64 skip redundant validate_number — the eager parse pass already validated every number ABNF; the decode step does not need to re-validate. P1: merge root typeof + cursor_bytes into one qjson_cursor_get_value call inside _M.decode, removing the separate cursor_bytes round-trip. Bench numbers (AMD EPYC-Rome, 4 vCPUs, Ubuntu 26.04): github-100k qjson.parse: 4,496 → 5,661 ops/s (+25.9%) 10m qjson.parse: 1,035 → 1,764 ops/s (+70.4%) 10m decode+encode: 1,050 → 1,789 ops/s (+70.4%) 200k decode+encode: 81,433 → 100,200 ops/s (+23.0%)
There was a problem hiding this comment.
Pull request overview
This PR focuses on reducing Lua↔Rust FFI round-trips and avoiding redundant validation work in EAGER mode to improve decode/encode throughput.
Changes:
- Added a new combined FFI export (
qjson_cursor_get_value) to fetch cursor type, decode scalars, and return byte spans in a single call. - Introduced EAGER-mode fast paths to skip re-validating already-validated strings and numbers.
- Updated the Lua proxy decode pipeline to use the new combined FFI call and reuse precomputed byte spans; refreshed benchmark documentation/numbers.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ffi.rs | Adds EAGER string fast path, passes eager flag into decoders, and introduces qjson_cursor_get_value combined FFI export. |
| src/doc.rs | Stores parse mode on Document and exposes is_eager_mode() for downstream fast paths. |
| src/decode/string.rs | Extends decode_string with an eager flag to skip validation when already done in EAGER parse. |
| src/decode/number.rs | Extends number parsers with an eager flag to optionally skip ABNF validation. |
| lua/qjson/table.lua | Switches decode pipeline to qjson_cursor_get_value and removes redundant cursor_bytes calls for child wrapping/root decode. |
| lua/qjson/lib.lua | Declares and requires the new qjson_cursor_get_value symbol in the LuaJIT FFI loader. |
| include/qjson.h | Exposes the new qjson_cursor_get_value function in the public C header. |
| README.md | Updates benchmark numbers in the README table. |
| docs/benchmarks.md | Updates benchmark environment and refreshed results tables. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub(crate) fn parse_i64(bytes: &[u8], eager: bool) -> Result<i64, qjson_err> { | ||
| if !eager { | ||
| crate::validate::validate_number(bytes)?; | ||
| } | ||
| // After ABNF validation, integer-only inputs have no `.`/`e`/`E`. | ||
| if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') { |
There was a problem hiding this comment.
Already addressed by PR #51 which was merged while this PR was in review. The eager fast-path now includes:
- Empty-input guard:
if bytes.is_empty() { return INVALID_NUMBER; } - Leading-byte precheck:
if skip_validation && !matches!(bytes[0], b- | b0..=b9) { return INVALID_NUMBER; }
This PR was rebased on top of PR #51 so these guards are in place.
| pub(crate) fn parse_f64(bytes: &[u8], eager: bool) -> Result<f64, qjson_err> { | ||
| if !eager { | ||
| crate::validate::validate_number(bytes)?; | ||
| } | ||
| let s = std::str::from_utf8(bytes).map_err(|_| qjson_err::QJSON_DECODE_FAILED)?; |
There was a problem hiding this comment.
Already addressed by PR #51. The eager fast-path includes a precheck:
if skip_validation {
if bytes.is_empty() || !matches!(bytes[0], b- | b. | b0..=b9) {
return Err(qjson_err::QJSON_INVALID_NUMBER);
}
}So calling f64 getter on non-number scalars in EAGER mode now returns QJSON_INVALID_NUMBER, preserving error-code stability.
| pub unsafe extern "C" fn qjson_cursor_get_value( | ||
| c: *const qjson_cursor, | ||
| type_out: *mut c_int, | ||
| str_ptr: *mut *const u8, str_len: *mut usize, | ||
| i64_out: *mut i64, f64_out: *mut f64, bool_out: *mut c_int, | ||
| byte_start: *mut usize, byte_end: *mut usize, |
There was a problem hiding this comment.
Fixed in 6e95c7a — removed i64_out from the signature. The function now only takes f64_out since that is what all current callers need.
| /// Fills `*type_out` unconditionally. For strings, fills `(*str_ptr, *str_len)`; | ||
| /// for numbers, fills `*f64_out`; for bool, fills `*bool_out`; | ||
| /// for containers, fills `(*byte_start, *byte_end)`. |
There was a problem hiding this comment.
Fixed in 6e95c7a — reworded to "On success (QJSON_OK), fills *type_out unconditionally."
37a3024 to
1f2ce96
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/ffi.rs (1)
684-765: 💤 Low valueUnused
i64_outparameter inqjson_cursor_get_value.The function accepts
i64_out: *mut i64but never writes to it — numbers are only decoded asf64(line 750). This creates a misleading API contract where callers must pass a non-NULL pointer (line 693) that is never populated.Consider either:
- Removing
i64_outfrom the signature and NULL check, or- Populating it when the number fits in
i64range (dual number decode)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/ffi.rs` around lines 684 - 765, The qjson_cursor_get_value signature declares i64_out but never writes to it; either remove i64_out from the signature and the initial null-check, or implement dual number decoding: when you detect a numeric scalar (in the match arm that sets qjson_type::QJSON_T_NUM), attempt to parse bytes as i64 (e.g. via a parse_i64 helper or number::parse_i64) and if that succeeds set *i64_out to the value, otherwise leave it unchanged or set to 0 and still parse f64 via number::parse_f64; update the null-pointer guard accordingly (if you keep i64_out require non-null, or allow it to be null and only write when non-null), and keep references to symbols cursor_to_internal, scalar_bytes, scalar_byte_range, number::parse_f64, and qjson_type::QJSON_T_NUM to locate where to add the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/benchmarks.md`:
- Line 83: The table separator rows contain one extra column segment (an extra
"|---|"/"|---:|" token) compared to their header rows; edit each offending
separator line so the number of "|---" segments exactly matches the number of
header columns for that table (preserve any ":" alignment markers for
right-aligned columns), specifically fix the three separator lines flagged so
they have one fewer segment to match their respective header rows.
In `@README.md`:
- Around line 108-111: The README's benchmark environment text still says "Intel
Core i5-9400" while the updated benchmark table uses AMD EPYC-Rome data; update
the environment description to match the new run metadata by replacing the Intel
reference (search for the string "Intel Core i5-9400" or the benchmark
environment paragraph) with the correct AMD EPYC-Rome details and any other
mismatched spec lines (CPU model, core count, clock, OS, and compiler/runtime
versions) so the descriptive text aligns with the refreshed benchmark table.
---
Nitpick comments:
In `@src/ffi.rs`:
- Around line 684-765: The qjson_cursor_get_value signature declares i64_out but
never writes to it; either remove i64_out from the signature and the initial
null-check, or implement dual number decoding: when you detect a numeric scalar
(in the match arm that sets qjson_type::QJSON_T_NUM), attempt to parse bytes as
i64 (e.g. via a parse_i64 helper or number::parse_i64) and if that succeeds set
*i64_out to the value, otherwise leave it unchanged or set to 0 and still parse
f64 via number::parse_f64; update the null-pointer guard accordingly (if you
keep i64_out require non-null, or allow it to be null and only write when
non-null), and keep references to symbols cursor_to_internal, scalar_bytes,
scalar_byte_range, number::parse_f64, and qjson_type::QJSON_T_NUM to locate
where to add the logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2636b18e-0a47-4d55-a545-04312d6c0122
📒 Files selected for processing (9)
README.mddocs/benchmarks.mdinclude/qjson.hlua/qjson/lib.lualua/qjson/table.luasrc/decode/number.rssrc/decode/string.rssrc/doc.rssrc/ffi.rs
Review feedback: - Removed i64_out parameter since the function only decodes numbers as f64 - Clarified that type_out is filled on QJSON_OK return only - Updated README CPU description to match current bench environment (AMD EPYC-Rome, 4 vCPUs) Comment responses: - parse_i64/parse_f64 safety guards already landed in PR #51 (this was rebased on top of it); the eager fast-path includes empty-input and leading-byte prechecks. - Table separator columns verified correct — all dash groups match their respective header column counts.
| | 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 | | ||
| | 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 | | ||
| | interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 | | ||
| |---|---|---:|---:|---:|---:|---:|---:| |
| | 2m | 35.0× | 4.5× | 33.3× | 4.2× | | ||
| | 5m | 29.2× | 4.5× | 36.5× | 5.6× | | ||
| | 10m | 38.0× | 4.7× | 38.4× | 4.8× | | ||
| |---|---|---:|---:|---:|---:| |
| | medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 | | ||
| | github-100k | +12,018 | +3,527 | +14 | +536 | +230 | | ||
| | 100k | +485 | +748 | +67 | +692 | +229 | | ||
| |---|---|---:|---:|---:|---:|---:| |
| let (s, e) = match scalar_byte_range(d, cur) { | ||
| Ok(x) => x, Err(e) => return e as c_int, | ||
| }; | ||
| *byte_start = s; | ||
| *byte_end = e; |
| /// | ||
| /// See the module-level [shared safety contract](self#shared-safety-contract). | ||
| /// `c` must point to a cursor produced by an earlier `qjson_*` call whose | ||
| /// document is still alive. All out pointers must be non-NULL. |
|
Closing this PR. The 4 optimizations here (combined FFI get_value, EAGER get_str fast path, EAGER number decode skip, merged root cursor_bytes) are correct and harmless, but they produce no measurable improvement in Why: The right next step is to target the structural scan itself or batch-FFI patterns, where the win is large enough to punch through the noise. |
Summary
Four FFI/Lua-pipeline optimizations that reduce FFI call count and eliminate redundant validation work.
P0:
qjson_cursor_get_value— single-FFI decode_cursorNew FFI export resolves type + decodes value + returns byte span in one call.
decode_cursorintable.luadrops from 2-3 FFI round-trips to 1;wrap_childinherits pre-computed byte span (skips the standalonecursor_bytescall).P0: EAGER get_str fast path
When the document is EAGER-mode and the string contains no
\, return thebuffer slice directly — no scratch RefCell borrow, no validate_string_span call
(validation already passed during parse).
P1: EAGER number decode skips validate_number
parse_i64/parse_f64accept askip_validationflag; when true, the ABNFvalidation (already done in the eager parse pass) is skipped. Built on top of
PR #51 which added the same pattern with safety guards.
P1: merge root typeof + cursor_bytes in decode()
_M.decodenow usesqjson_cursor_get_valuefor the root cursor too,removing the separate
cursor_bytesround-trip.Benchmark (10 rounds, AMD EPYC-Rome, 4 vCPUs, Ubuntu 26.04)
Note: the "Before" column is the 5-round run on the same machine before
rebasing onto PR #51. The 10-round runs show that the ~20-40% per-run noise
from the LuaJIT GC/VM makes per-scenario comparisons unreliable at these
magnitudes. The structural wins (fewer FFI calls, fewer validation passes)
are real but the bench framework cannot resolve them.
Summary table (full 10-round median)
All tests pass (153 Rust unit + integration + Lua busted, both AVX2 and scalar
backends). Clippy clean.