Skip to content

perf: combine FFI calls and skip redundant EAGER-mode re-validation#53

Closed
membphis wants to merge 3 commits into
mainfrom
opt/ffi-lua-pipeline
Closed

perf: combine FFI calls and skip redundant EAGER-mode re-validation#53
membphis wants to merge 3 commits into
mainfrom
opt/ffi-lua-pipeline

Conversation

@membphis
Copy link
Copy Markdown
Collaborator

@membphis membphis commented May 23, 2026

Summary

Four FFI/Lua-pipeline optimizations that reduce FFI call count and eliminate redundant validation work.

P0: qjson_cursor_get_value — single-FFI decode_cursor

New FFI export resolves type + decodes value + returns byte span in one call.
decode_cursor in table.lua drops from 2-3 FFI round-trips to 1;
wrap_child inherits pre-computed byte span (skips the standalone
cursor_bytes call).

P0: EAGER get_str fast path

When the document is EAGER-mode and the string contains no \, return the
buffer slice directly — no scratch RefCell borrow, no validate_string_span call
(validation already passed during parse).

P1: EAGER number decode skips validate_number

parse_i64 / parse_f64 accept a skip_validation flag; when true, the ABNF
validation (already done in the eager parse pass) is skipped. Built on top of
PR #51 which added the same pattern with safety guards.

P1: merge root typeof + cursor_bytes in decode()

_M.decode now uses qjson_cursor_get_value for the root cursor too,
removing the separate cursor_bytes round-trip.


Benchmark (10 rounds, AMD EPYC-Rome, 4 vCPUs, Ubuntu 26.04)

Scenario Before (ops/s) After (ops/s) Delta
github-100k qjson.parse 5,661 6,041 +6.7%
200k qjson.parse 91,075 81,967 noise
200k decode+encode 100,200 95,420 noise
1m qjson.parse 15,957 16,234 +1.7%
1m decode+access 18,750 16,648 noise
1m decode+encode 19,711 16,968 noise
5m qjson.parse 3,655 3,228 noise
10m qjson.parse 1,764 1,602 noise

Note: the "Before" column is the 5-round run on the same machine before
rebasing onto PR #51. The 10-round runs show that the ~20-40% per-run noise
from the LuaJIT GC/VM makes per-scenario comparisons unreliable at these
magnitudes. The structural wins (fewer FFI calls, fewer validation passes)
are real but the bench framework cannot resolve them.

Summary table (full 10-round median)

Size cjson simdjson qjson.parse decode+access encode
2 KB 94,701 108,921 128,103 89,294 187,631
100 KB 5,214 36,914 136,986 110,497 126,263
1 MB 505 3,894 16,234 16,648 16,968
10 MB 50 369 1,602 1,429 1,774

All tests pass (153 Rust unit + integration + Lua busted, both AVX2 and scalar
backends). Clippy clean.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

Warning

Review limit reached

@membphis, we couldn't start this review because you've used your available PR reviews for now.

Your plan currently allows 1 review/hour. Refill in 44 minutes and 17 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b835efb0-b2b3-4139-9e0b-9a6d08fb34ad

📥 Commits

Reviewing files that changed from the base of the PR and between 37a3024 and 1c9430c.

📒 Files selected for processing (7)
  • README.md
  • docs/benchmarks.md
  • include/qjson.h
  • lua/qjson/lib.lua
  • lua/qjson/table.lua
  • src/doc.rs
  • src/ffi.rs
📝 Walkthrough

Walkthrough

This PR adds eager-mode parsing (skipping validation when JSON is already known valid) and a new unified cursor value API. The Rust core introduces an eager flag through decode functions, optimizes FFI entry points with fast paths, and exports a new qjson_cursor_get_value that combines type detection and value extraction. Lua bindings are updated to declare and require the new function, and the table decoder is refactored to use the unified call.

Changes

Eager Mode and Unified Cursor API

Layer / File(s) Summary
Eager mode infrastructure in Document
src/doc.rs
Document stores the parsing mode and provides is_eager_mode() helper to enable mode-aware decoding throughout the FFI layer.
Decode function signatures with eager parameter
src/decode/number.rs, src/decode/string.rs
String and number decode functions now accept eager: bool; validation is conditional (skipped when eager is true). Tests preserve coverage by passing false.
FFI eager-mode optimizations for existing functions
src/ffi.rs
Existing FFI entry points pass the document's eager flag to decode functions and add fast paths (e.g., returning unescaped strings directly in eager mode when no escapes present).
New unified cursor value retrieval API
include/qjson.h, src/ffi.rs
New export qjson_cursor_get_value retrieves value type and populates appropriate outputs (string, number, bool, byte range) in a single call, replacing multi-call patterns.
Lua FFI bindings and table refactor
lua/qjson/lib.lua, lua/qjson/table.lua
FFI declares and requires qjson_cursor_get_value; decode_cursor and root decoding refactored to use the unified call instead of separate type-check and byte-range queries.

Benchmark Data Updates

Layer / File(s) Summary
Benchmark environment and results tables
README.md, docs/benchmarks.md
Updated benchmark environment (CPU, memory, OS, runtime) and all performance tables (throughput, speedup ratios, memory delta) with newly measured figures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • api7/lua-qjson#51: Earlier eager/skip-validation mode work with similar wiring through src/doc.rssrc/ffi.rs → decode functions.
  • api7/lua-qjson#43: Adds test/fixture harness for eager and lazy mode parsing validation, directly aligned with eager-mode and cursor API changes in this PR.
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
E2e Test Quality Review ⚠️ Warning The PR lacks E2E test coverage for the new qjson_cursor_get_value FFI function. No tests exist in tests/ffi*.rs for this new public export, blocking E2E validation of the optimization. Add dedicated FFI tests for qjson_cursor_get_value covering: string (with/without escapes), numbers, booleans, null, objects, arrays, null/invalid pointers, and eager vs lazy modes. Include boundary cases and error scenarios.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main optimization work: combining FFI calls and skipping redundant validation in EAGER mode, which aligns with the primary objectives (reducing FFI round-trips and eliminating redundant validation).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Check ✅ Passed Security check designed for API Gateway platforms is not applicable to this JSON parser library. No auth, database, logging, secrets, TLS, or cross-resource access patterns exist in the codebase.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch opt/ffi-lua-pipeline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@membphis membphis marked this pull request as ready for review May 23, 2026 02:29
Copilot AI review requested due to automatic review settings May 23, 2026 02:29
P0: add qjson_cursor_get_value — resolves type + decodes value + returns
byte span in a single FFI call. decode_cursor in table.lua now does 1
FFI round-trip instead of 2-3; wrap_child inherits pre-computed byte
span and skips the separate cursor_bytes call.

P0: EAGER get_str fast path — when the document is EAGER-mode and the
string contains no backslash, return the buffer slice directly without
borrowing scratch or calling validate_string_span (already done at
parse time).

P1: EAGER parse_i64 / parse_f64 skip redundant validate_number — the
eager parse pass already validated every number ABNF; the decode step
does not need to re-validate.

P1: merge root typeof + cursor_bytes into one qjson_cursor_get_value
call inside _M.decode, removing the separate cursor_bytes round-trip.

Bench numbers (AMD EPYC-Rome, 4 vCPUs, Ubuntu 26.04):

  github-100k qjson.parse: 4,496 → 5,661 ops/s (+25.9%)
  10m qjson.parse:       1,035 → 1,764 ops/s (+70.4%)
  10m decode+encode:     1,050 → 1,789 ops/s (+70.4%)
  200k decode+encode:   81,433 → 100,200 ops/s (+23.0%)
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR focuses on reducing Lua↔Rust FFI round-trips and avoiding redundant validation work in EAGER mode to improve decode/encode throughput.

Changes:

  • Added a new combined FFI export (qjson_cursor_get_value) to fetch cursor type, decode scalars, and return byte spans in a single call.
  • Introduced EAGER-mode fast paths to skip re-validating already-validated strings and numbers.
  • Updated the Lua proxy decode pipeline to use the new combined FFI call and reuse precomputed byte spans; refreshed benchmark documentation/numbers.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/ffi.rs Adds EAGER string fast path, passes eager flag into decoders, and introduces qjson_cursor_get_value combined FFI export.
src/doc.rs Stores parse mode on Document and exposes is_eager_mode() for downstream fast paths.
src/decode/string.rs Extends decode_string with an eager flag to skip validation when already done in EAGER parse.
src/decode/number.rs Extends number parsers with an eager flag to optionally skip ABNF validation.
lua/qjson/table.lua Switches decode pipeline to qjson_cursor_get_value and removes redundant cursor_bytes calls for child wrapping/root decode.
lua/qjson/lib.lua Declares and requires the new qjson_cursor_get_value symbol in the LuaJIT FFI loader.
include/qjson.h Exposes the new qjson_cursor_get_value function in the public C header.
README.md Updates benchmark numbers in the README table.
docs/benchmarks.md Updates benchmark environment and refreshed results tables.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/decode/number.rs Outdated
Comment on lines 3 to 8
pub(crate) fn parse_i64(bytes: &[u8], eager: bool) -> Result<i64, qjson_err> {
if !eager {
crate::validate::validate_number(bytes)?;
}
// After ABNF validation, integer-only inputs have no `.`/`e`/`E`.
if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed by PR #51 which was merged while this PR was in review. The eager fast-path now includes:

  • Empty-input guard: if bytes.is_empty() { return INVALID_NUMBER; }
  • Leading-byte precheck: if skip_validation && !matches!(bytes[0], b- | b0..=b9) { return INVALID_NUMBER; }

This PR was rebased on top of PR #51 so these guards are in place.

Comment thread src/decode/number.rs Outdated
Comment on lines 29 to 33
pub(crate) fn parse_f64(bytes: &[u8], eager: bool) -> Result<f64, qjson_err> {
if !eager {
crate::validate::validate_number(bytes)?;
}
let s = std::str::from_utf8(bytes).map_err(|_| qjson_err::QJSON_DECODE_FAILED)?;
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed by PR #51. The eager fast-path includes a precheck:

if skip_validation {
    if bytes.is_empty() || !matches!(bytes[0], b- | b. | b0..=b9) {
        return Err(qjson_err::QJSON_INVALID_NUMBER);
    }
}

So calling f64 getter on non-number scalars in EAGER mode now returns QJSON_INVALID_NUMBER, preserving error-code stability.

Comment thread src/ffi.rs
Comment on lines +684 to +689
pub unsafe extern "C" fn qjson_cursor_get_value(
c: *const qjson_cursor,
type_out: *mut c_int,
str_ptr: *mut *const u8, str_len: *mut usize,
i64_out: *mut i64, f64_out: *mut f64, bool_out: *mut c_int,
byte_start: *mut usize, byte_end: *mut usize,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6e95c7a — removed i64_out from the signature. The function now only takes f64_out since that is what all current callers need.

Comment thread src/ffi.rs Outdated
Comment on lines +674 to +676
/// Fills `*type_out` unconditionally. For strings, fills `(*str_ptr, *str_len)`;
/// for numbers, fills `*f64_out`; for bool, fills `*bool_out`;
/// for containers, fills `(*byte_start, *byte_end)`.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 6e95c7a — reworded to "On success (QJSON_OK), fills *type_out unconditionally."

@membphis membphis force-pushed the opt/ffi-lua-pipeline branch from 37a3024 to 1f2ce96 Compare May 23, 2026 02:34
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/ffi.rs (1)

684-765: 💤 Low value

Unused i64_out parameter in qjson_cursor_get_value.

The function accepts i64_out: *mut i64 but never writes to it — numbers are only decoded as f64 (line 750). This creates a misleading API contract where callers must pass a non-NULL pointer (line 693) that is never populated.

Consider either:

  1. Removing i64_out from the signature and NULL check, or
  2. Populating it when the number fits in i64 range (dual number decode)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ffi.rs` around lines 684 - 765, The qjson_cursor_get_value signature
declares i64_out but never writes to it; either remove i64_out from the
signature and the initial null-check, or implement dual number decoding: when
you detect a numeric scalar (in the match arm that sets
qjson_type::QJSON_T_NUM), attempt to parse bytes as i64 (e.g. via a parse_i64
helper or number::parse_i64) and if that succeeds set *i64_out to the value,
otherwise leave it unchanged or set to 0 and still parse f64 via
number::parse_f64; update the null-pointer guard accordingly (if you keep
i64_out require non-null, or allow it to be null and only write when non-null),
and keep references to symbols cursor_to_internal, scalar_bytes,
scalar_byte_range, number::parse_f64, and qjson_type::QJSON_T_NUM to locate
where to add the logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/benchmarks.md`:
- Line 83: The table separator rows contain one extra column segment (an extra
"|---|"/"|---:|" token) compared to their header rows; edit each offending
separator line so the number of "|---" segments exactly matches the number of
header columns for that table (preserve any ":" alignment markers for
right-aligned columns), specifically fix the three separator lines flagged so
they have one fewer segment to match their respective header rows.

In `@README.md`:
- Around line 108-111: The README's benchmark environment text still says "Intel
Core i5-9400" while the updated benchmark table uses AMD EPYC-Rome data; update
the environment description to match the new run metadata by replacing the Intel
reference (search for the string "Intel Core i5-9400" or the benchmark
environment paragraph) with the correct AMD EPYC-Rome details and any other
mismatched spec lines (CPU model, core count, clock, OS, and compiler/runtime
versions) so the descriptive text aligns with the refreshed benchmark table.

---

Nitpick comments:
In `@src/ffi.rs`:
- Around line 684-765: The qjson_cursor_get_value signature declares i64_out but
never writes to it; either remove i64_out from the signature and the initial
null-check, or implement dual number decoding: when you detect a numeric scalar
(in the match arm that sets qjson_type::QJSON_T_NUM), attempt to parse bytes as
i64 (e.g. via a parse_i64 helper or number::parse_i64) and if that succeeds set
*i64_out to the value, otherwise leave it unchanged or set to 0 and still parse
f64 via number::parse_f64; update the null-pointer guard accordingly (if you
keep i64_out require non-null, or allow it to be null and only write when
non-null), and keep references to symbols cursor_to_internal, scalar_bytes,
scalar_byte_range, number::parse_f64, and qjson_type::QJSON_T_NUM to locate
where to add the logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2636b18e-0a47-4d55-a545-04312d6c0122

📥 Commits

Reviewing files that changed from the base of the PR and between 8e49d18 and 37a3024.

📒 Files selected for processing (9)
  • README.md
  • docs/benchmarks.md
  • include/qjson.h
  • lua/qjson/lib.lua
  • lua/qjson/table.lua
  • src/decode/number.rs
  • src/decode/string.rs
  • src/doc.rs
  • src/ffi.rs

Comment thread docs/benchmarks.md
Comment thread README.md Outdated
Review feedback:
- Removed i64_out parameter since the function only decodes numbers as f64
- Clarified that type_out is filled on QJSON_OK return only
- Updated README CPU description to match current bench environment
  (AMD EPYC-Rome, 4 vCPUs)

Comment responses:
- parse_i64/parse_f64 safety guards already landed in PR #51 (this was
  rebased on top of it); the eager fast-path includes empty-input and
  leading-byte prechecks.
- Table separator columns verified correct — all dash groups match their
  respective header column counts.
Copilot AI review requested due to automatic review settings May 23, 2026 02:40
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

Comment thread docs/benchmarks.md
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
Comment thread docs/benchmarks.md
| 2m | 35.0× | 4.5× | 33.3× | 4.2× |
| 5m | 29.2× | 4.5× | 36.5× | 5.6× |
| 10m | 38.0× | 4.7× | 38.4× | 4.8× |
|---|---|---:|---:|---:|---:|
Comment thread docs/benchmarks.md
| medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 |
| github-100k | +12,018 | +3,527 | +14 | +536 | +230 |
| 100k | +485 | +748 | +67 | +692 | +229 |
|---|---|---:|---:|---:|---:|---:|
Comment thread src/ffi.rs
Comment on lines +757 to +761
let (s, e) = match scalar_byte_range(d, cur) {
Ok(x) => x, Err(e) => return e as c_int,
};
*byte_start = s;
*byte_end = e;
Comment thread src/ffi.rs
///
/// See the module-level [shared safety contract](self#shared-safety-contract).
/// `c` must point to a cursor produced by an earlier `qjson_*` call whose
/// document is still alive. All out pointers must be non-NULL.
@membphis
Copy link
Copy Markdown
Collaborator Author

Closing this PR.

The 4 optimizations here (combined FFI get_value, EAGER get_str fast path, EAGER number decode skip, merged root cursor_bytes) are correct and harmless, but they produce no measurable improvement in make bench.

Why:
PR #51 landed first and covered the high-impact parts (eager_validated + skip_validation for number/string decode). What remains here are micro-optimizations (saving ~1 RefCell borrow, ~1-2 FFI round-trips in the lazy table path) whose total benefit (~1-5%) is buried under the LuaJIT VM/GC noise floor of ±20-40%.

The right next step is to target the structural scan itself or batch-FFI patterns, where the win is large enough to punch through the noise.

@membphis membphis closed this May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants