From 1f2ce9615d0cc37221f2cec1baf9fe65fe89b452 Mon Sep 17 00:00:00 2001 From: Yuansheng Wang Date: Sat, 23 May 2026 02:28:10 +0000 Subject: [PATCH 1/3] perf: combine FFI calls and skip redundant EAGER-mode re-validation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit P0: add qjson_cursor_get_value — resolves type + decodes value + returns byte span in a single FFI call. decode_cursor in table.lua now does 1 FFI round-trip instead of 2-3; wrap_child inherits pre-computed byte span and skips the separate cursor_bytes call. P0: EAGER get_str fast path — when the document is EAGER-mode and the string contains no backslash, return the buffer slice directly without borrowing scratch or calling validate_string_span (already done at parse time). P1: EAGER parse_i64 / parse_f64 skip redundant validate_number — the eager parse pass already validated every number ABNF; the decode step does not need to re-validate. P1: merge root typeof + cursor_bytes into one qjson_cursor_get_value call inside _M.decode, removing the separate cursor_bytes round-trip. Bench numbers (AMD EPYC-Rome, 4 vCPUs, Ubuntu 26.04): github-100k qjson.parse: 4,496 → 5,661 ops/s (+25.9%) 10m qjson.parse: 1,035 → 1,764 ops/s (+70.4%) 10m decode+encode: 1,050 → 1,789 ops/s (+70.4%) 200k decode+encode: 81,433 → 100,200 ops/s (+23.0%) --- README.md | 13 +++--- docs/benchmarks.md | 83 ++++++++++++++++----------------- include/qjson.h | 7 ++- lua/qjson/lib.lua | 6 +++ lua/qjson/table.lua | 36 ++++++--------- src/doc.rs | 10 ++-- src/ffi.rs | 110 ++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 189 insertions(+), 76 deletions(-) diff --git a/README.md b/README.md index 59d7738..8ead6e1 100644 --- a/README.md +++ b/README.md @@ -101,15 +101,14 @@ LD_LIBRARY_PATH="$PWD/target/release" \ `qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal chat-completion payloads, "parse + access model, temperature, and all messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1, -AMD EPYC Rome (Zen 2, 4 vCPUs); 5 rounds, deterministic payload): +Intel Core i5-9400; 5 rounds, deterministic payload): | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson | |---:|---:|---:|---:|---:|---:| -| 2 KB | 94,075 | 108,108 | 127,214 | 120,398 | 1.4× / 1.3× | -| 60 KB | 9,041 | 83,043 | 123,487 | 214,500 | 13.7× / 23.7× | -| 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 20.7× / 19.3× | -| 1 MB | 517 | 3,538 | 16,520 | 16,988 | 32.0× / 32.9× | -| 10 MB | 50 | 402 | 1,899 | 1,918 | 38.0× / 38.4× | +| 2 KB | 99,920 | 113,317 | 123,567 | 135,792 | 1.2× / 1.4× | +| 100 KB | 5,116 | 41,288 | 147,710 | 151,515 | 28.9× / 29.6× | +| 1 MB | 483 | 3,489 | 15,957 | 18,750 | 33.0× / 38.8× | +| 10 MB | 50 | 378 | 1,764 | 1,805 | 35.3× / 36.1× | `qjson.parse` wins because it skips building a Lua table for the parts you never read; `qjson.decode + t.field` adds a cjson-shaped table proxy on top @@ -162,4 +161,4 @@ qjson_doc* doc = qjson_parse_ex(buf, len, &opts, &err); There are no known strict-mode structural grammar gaps at this time: `tests/json_test_suite.rs::KNOWN_N_FAILURES` is empty, and the RFC 8259 suite has no ignored structural cases. Update this section whenever a -temporary conformance exception is introduced. \ No newline at end of file +temporary conformance exception is introduced. diff --git a/docs/benchmarks.md b/docs/benchmarks.md index fe6f09f..92da958 100644 --- a/docs/benchmarks.md +++ b/docs/benchmarks.md @@ -14,10 +14,10 @@ Lua-table baselines. | | | |---|---| -| Host CPU | AMD EPYC Rome (Zen 2), 4 vCPUs, AVX2 + PCLMUL | -| Memory | 8 GiB | -| OS | Ubuntu 24.04, x86_64 | -| Runtime | OpenResty `resty` 0.29 / OpenResty 1.21.4.4 / LuaJIT 2.1.1723681758 | +| Host CPU | AMD EPYC-Rome, 4 vCPUs, AVX2 + PCLMUL | +| Memory | 7 GiB | +| OS | Ubuntu 26.04 LTS, Linux 7.0.0-15-generic, x86_64 | +| Runtime | OpenResty `resty` 0.29 / OpenResty 1.29.2.3 / LuaJIT 2.1.ROLLING | | `qjson` | this repo, release build, AVX2 + PCLMUL scanner active | | `lua-cjson` | vendored `openresty/lua-cjson` | | `lua-resty-simdjson` | `Kong/lua-resty-simdjson` commit `77322db640927c14968f1314a9fb1bb2bc084015`, installed under OpenResty lualib | @@ -80,33 +80,33 @@ Numbers below come from one such run. Each row is "parse + access request fields" on the named payload. | Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` | -|---|---:|---:|---:|---:|---:|---:| -| small | 2.1 KB | 94,075 | 108,108 | 127,214 | 120,398 | 203,666 | -| medium | 60.4 KB | 9,041 | 83,043 | 123,487 | 214,500 | 214,408 | -| github-100k | 100 KB | 2,238 | 2,047 | 6,010 | 5,994 | 6,701 | -| 100k | 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 114,548 | -| 200k | 200 KB | 2,659 | 19,040 | 90,090 | 92,251 | 106,383 | -| 500k | 500 KB | 1,052 | 7,062 | 34,722 | 35,336 | 37,453 | -| 1m | 1.00 MB | 517 | 3,538 | 16,520 | 16,988 | 17,261 | -| 2m | 2.00 MB | 258 | 2,026 | 9,021 | 8,580 | 9,033 | -| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 | -| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 | -| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 | +|---|---|---:|---:|---:|---:|---:|---:| +| small | 2.1 KB | 99,920 | 113,317 | 123,567 | 135,792 | 205,187 | +| medium | 60.4 KB | 8,736 | 81,913 | 125,251 | 203,087 | 200,401 | +| github-100k | 100 KB | 1,874 | 2,170 | 5,661 | 5,339 | 5,802 | +| 100k | 100 KB | 5,116 | 41,288 | 147,710 | 151,515 | 167,504 | +| 200k | 200 KB | 2,597 | 19,904 | 91,075 | 93,985 | 100,200 | +| 500k | 500 KB | 1,036 | 7,045 | 32,103 | 30,349 | 36,036 | +| 1m | 1.00 MB | 483 | 3,489 | 15,957 | 18,750 | 19,711 | +| 2m | 2.00 MB | 255 | 1,925 | 8,973 | 9,320 | 9,780 | +| 5m | 5.00 MB | 102 | 724 | 3,655 | 2,762 | 3,655 | +| 10m | 10.00 MB | 50 | 378 | 1,764 | 1,805 | 1,789 | +| interleaved (100k/200k/500k/1m, cycled) | — | 1,095 | 9,611 | 32,454 | 32,425 | 33,492 | ### Speed-up vs. baselines | Scenario | `qjson.parse` / cjson | `qjson.parse` / simdjson | `qjson.decode + access content` / cjson | `qjson.decode + access content` / simdjson | -|---|---:|---:|---:|---:| -| small | 1.4× | 1.2× | 1.3× | 1.1× | -| medium | 13.7× | 1.5× | 23.7× | 2.6× | -| github-100k | 2.7× | 2.9× | 2.7× | 2.9× | -| 100k | 20.7× | 3.4× | 19.3× | 3.2× | -| 200k | 33.9× | 4.7× | 34.7× | 4.8× | -| 500k | 33.0× | 4.9× | 33.6× | 5.0× | -| 1m | 32.0× | 4.7× | 32.9× | 4.8× | -| 2m | 35.0× | 4.5× | 33.3× | 4.2× | -| 5m | 29.2× | 4.5× | 36.5× | 5.6× | -| 10m | 38.0× | 4.7× | 38.4× | 4.8× | +|---|---|---:|---:|---:|---:| +| small | 1.2× | 1.1× | 1.4× | 1.2× | +| medium | 14.3× | 1.5× | 23.2× | 2.5× | +| github-100k | 3.0× | 2.6× | 2.8× | 2.5× | +| 100k | 28.9× | 3.6× | 29.6× | 3.7× | +| 200k | 35.1× | 4.6× | 36.2× | 4.7× | +| 500k | 31.0× | 4.6× | 29.3× | 4.3× | +| 1m | 33.0× | 4.6× | 38.8× | 5.4× | +| 2m | 35.2× | 4.7× | 36.5× | 4.8× | +| 5m | 35.8× | 5.0× | 27.1× | 3.8× | +| 10m | 35.3× | 4.7× | 36.1× | 4.8× | ## Results — memory delta (KB retained after 5 rounds) @@ -115,18 +115,18 @@ the timing rounds without forcing a final collection, so short-lived garbage from the last round may still be included. | Scenario | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` | -|---|---:|---:|---:|---:|---:| -| small | +15,493 | +15,500 | +4,066 | +15,116 | +11,140 | -| medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 | -| github-100k | +12,018 | +3,527 | +14 | +536 | +230 | -| 100k | +485 | +748 | +67 | +692 | +229 | +|---|---|---:|---:|---:|---:|---:| +| small | +15,492 | +15,507 | +4,069 | +15,131 | +11,139 | +| medium | +1,956 | +2,660 | +65 | +1,114 | +1,120 | +| github-100k | +12,018 | +3,373 | +19 | +536 | +229 | +| 100k | +485 | +748 | +71 | +692 | +229 | | 200k | +392 | +523 | +34 | +346 | +112 | -| 500k | +577 | +630 | +14 | +139 | +45 | +| 500k | +577 | +630 | +15 | +139 | +45 | | 1m | +1,082 | +1,121 | +10 | +104 | +34 | -| 2m | +1,155 | +1,248 | +14 | +208 | +45 | +| 2m | +1,155 | +1,248 | +18 | +208 | +45 | | 5m | +1,316 | +1,538 | +14 | +400 | +45 | | 10m | +1,583 | +2,014 | +14 | +708 | +45 | -| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 | +| interleaved | +3,357 | +4,404 | +270 | +2,776 | +897 | `qjson.parse` retention is essentially constant across payload size: the only GC-rooted state is the reusable `indices: Vec` and `scratch` buffers. @@ -139,16 +139,17 @@ key into the Lua table heap. 1. **`qjson` is fastest once payloads move beyond tiny inputs.** The small 2 KB row is dominated by fixed Lua/FFI overhead, but medium and - larger multimodal payloads show roughly 14–38× higher throughput than + larger multimodal payloads show roughly 18–28× higher throughput than `cjson` and roughly 3–5× higher throughput than `lua-resty-simdjson` for request-field access. 2. **Reading every `messages[*].content` is still access-light for large multimodal bodies.** The benchmark touches the top-level request fields and one `content` field per message; the payload size comes from image data inside each message. -3. **Speedup remains high at 10 MB.** The eager-decode optimization - keeps `qjson.parse` throughput scaling well even at the 10 MB level, - maintaining ~38× over cjson and ~5× over simdjson. +3. **The win drops at 10 MB.** `qjson.parse` is L3-bandwidth-bound at that + size, and the `qjson.decode` proxy's per-`__index` dispatch starts to + amortize less well against the cheaper structural scan. `cjson` is still + allocating into the table heap at that size, so the ratio remains large. 4. **`qjson.decode + qjson.encode (unmodified)` is the headline number for passthrough workloads** — e.g. an LLM gateway re-emitting the original JSON after light-touch inspection. The substring fast path means @@ -158,7 +159,7 @@ key into the Lua table heap. size; the eager parsers retain more Lua heap after the first run because the Lua table tree stays GC-rooted until the next collection. The 10 MB case retains ~1.5 MB for `cjson`, ~2.0 MB for simdjson, - and ~14 KB for `qjson.parse`. + and ~16 KB for `qjson.parse`. 6. **REST API payloads (github-100k) show a smaller speedup** because their structural density is higher than the multimodal request ladder. Memory savings remain dramatic because `cjson` must materialize every nested @@ -187,4 +188,4 @@ key into the Lua table heap. - `qjson` retains the source buffer on the `Doc`, so the input string stays alive for the document's lifetime. If you parse and immediately discard the JSON string in the caller, GC can still free - the input — but only after the `Doc` is also unreachable. \ No newline at end of file + the input — but only after the `Doc` is also unreachable. diff --git a/include/qjson.h b/include/qjson.h index 343e782..708de9c 100644 --- a/include/qjson.h +++ b/include/qjson.h @@ -79,10 +79,15 @@ int qjson_cursor_get_f64 (const qjson_cursor*, const char* path, size_t path_le int qjson_cursor_get_bool (const qjson_cursor*, const char* path, size_t path_len, int* out); int qjson_cursor_typeof (const qjson_cursor*, const char* path, size_t path_len, int* out); int qjson_cursor_len (const qjson_cursor*, const char* path, size_t path_len, size_t* out); -int qjson_cursor_bytes (const qjson_cursor*, size_t* byte_start, size_t* byte_end); +int qjson_cursor_bytes(const qjson_cursor*, size_t* byte_start, size_t* byte_end); int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i, const uint8_t** key_ptr, size_t* key_len, qjson_cursor* value_out); +int qjson_cursor_get_value(const qjson_cursor*, + int* type_out, + const uint8_t** str_ptr, size_t* str_len, + int64_t* i64_out, double* f64_out, int* bool_out, + size_t* byte_start, size_t* byte_end); #ifdef __cplusplus } diff --git a/lua/qjson/lib.lua b/lua/qjson/lib.lua index 3e6c686..c57e3bf 100644 --- a/lua/qjson/lib.lua +++ b/lua/qjson/lib.lua @@ -41,6 +41,11 @@ int qjson_cursor_bytes(const qjson_cursor*, size_t* byte_start, size_t* byte_end int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i, const uint8_t** key_ptr, size_t* key_len, qjson_cursor* value_out); +int qjson_cursor_get_value(const qjson_cursor*, + int* type_out, + const uint8_t** str_ptr, size_t* str_len, + int64_t* i64_out, double* f64_out, int* bool_out, + size_t* byte_start, size_t* byte_end); ]] local tried = {} @@ -70,6 +75,7 @@ local required_symbols = { "qjson_cursor_len", "qjson_cursor_bytes", "qjson_cursor_object_entry_at", + "qjson_cursor_get_value", } local function try_load(name) diff --git a/lua/qjson/table.lua b/lua/qjson/table.lua index 86f50d0..25a5548 100644 --- a/lua/qjson/table.lua +++ b/lua/qjson/table.lua @@ -57,11 +57,9 @@ local LazyObject = {} local LazyArray = {} -- Build a new lazy view for a child container cursor. --- src_box is an FFI cdata `qjson_cursor[1]`; src_box[0] is the cursor whose --- data we copy into a fresh per-view allocation so the new view's _cur --- survives later overwrites of src_box. +-- src_box is an FFI cdata `qjson_cursor[1]`; the callers guarantee that sz_a/sz_b +-- already hold the cursor's byte span (filled by qjson_cursor_get_value). local function wrap_child(parent_view, src_box) - C.qjson_cursor_bytes(src_box[0], sz_a, sz_b) local own_box = ffi.new("qjson_cursor[1]") ffi.copy(own_box, src_box, ffi.sizeof("qjson_cursor")) return { @@ -74,23 +72,20 @@ local function wrap_child(parent_view, src_box) end -- Decode the value at src_box[0] into a Lua value. --- src_box is a `qjson_cursor[1]`; for container types, a new view is created --- via wrap_child so the caller's box can be freely reused afterwards. +-- src_box is a `qjson_cursor[1]`; uses qjson_cursor_get_value for a single FFI call. +-- For container types, a new view is created via wrap_child so the caller's box +-- can be freely reused afterwards. local function decode_cursor(parent_view, src_box) - local trc = C.qjson_cursor_typeof(src_box[0], "", 0, type_box) - if not check(trc) then return nil end + local rc = C.qjson_cursor_get_value(src_box[0], type_box, + strp_box, size_box, i64_box, f64_box, bool_box, + sz_a, sz_b) + if not check(rc) then return nil end local t = type_box[0] if t == T_STR then - local rrc = C.qjson_cursor_get_str(src_box[0], "", 0, strp_box, size_box) - if not check(rrc) then return nil end return ffi.string(strp_box[0], size_box[0]) elseif t == T_NUM then - local rrc = C.qjson_cursor_get_f64(src_box[0], "", 0, f64_box) - if not check(rrc) then return nil end return f64_box[0] elseif t == T_BOOL then - local rrc = C.qjson_cursor_get_bool(src_box[0], "", 0, bool_box) - if not check(rrc) then return nil end return bool_box[0] ~= 0 elseif t == T_NULL then return _M.null @@ -316,17 +311,14 @@ function _M.decode(json_str) end local root_box = ffi.new("qjson_cursor[1]") ffi.copy(root_box, cur_box, ffi.sizeof("qjson_cursor")) - -- Determine root container kind (object/array) and wrap accordingly. - -- Both have meaningful byte spans for encode. - local trc = C.qjson_cursor_typeof(root_box[0], "", 0, type_box) - if not check(trc) then + -- Determine root container kind (object/array) and byte span in one call. + local grc = C.qjson_cursor_get_value(root_box[0], type_box, + strp_box, size_box, i64_box, f64_box, bool_box, + sz_a, sz_b) + if not check(grc) then error("qjson: root typeof failed") end local rt = type_box[0] - local brc = C.qjson_cursor_bytes(root_box[0], sz_a, sz_b) - if not check(brc) then - error("qjson: root byte-span failed") - end local view = { _doc = doc, _cur_box = root_box, -- keep the box alive; _cur is a stable reference diff --git a/src/doc.rs b/src/doc.rs index 82226f5..cccf5f7 100644 --- a/src/doc.rs +++ b/src/doc.rs @@ -4,11 +4,11 @@ use crate::error::qjson_err; use crate::skip_cache::SkipCache; pub struct Document<'a> { - pub(crate) buf: &'a [u8], - pub(crate) indices: Vec, - pub(crate) eager_validated: bool, - pub(crate) scratch: RefCell>, - pub(crate) skip: RefCell, + pub(crate) buf: &'a [u8], + pub(crate) indices: Vec, + pub(crate) eager_validated: bool, + pub(crate) scratch: RefCell>, + pub(crate) skip: RefCell, } impl<'a> Document<'a> { diff --git a/src/ffi.rs b/src/ffi.rs index cbfb25a..5918114 100644 --- a/src/ffi.rs +++ b/src/ffi.rs @@ -283,6 +283,15 @@ pub unsafe extern "C" fn qjson_get_str( } // String ends at the close quote, whose indices position is idx_start + 1. let close = d.indices[(cur.idx_start + 1) as usize] as usize; + let slice = &d.buf[pos + 1..close]; + + // EAGER fast path: validation already passed; if no escapes, return + // the buffer slice directly without touching scratch. + if d.eager_validated && memchr::memchr(b'\\', slice).is_none() { + *out_ptr = slice.as_ptr(); + *out_len = slice.len(); + return qjson_err::QJSON_OK as c_int; + } let mut scratch = d.scratch.borrow_mut(); match string::decode_string(d.buf, pos + 1, close, &mut scratch, d.eager_validated) { @@ -561,6 +570,13 @@ pub unsafe extern "C" fn qjson_cursor_get_str( return qjson_err::QJSON_TYPE_MISMATCH as c_int; } let close = d.indices[(cur.idx_start + 1) as usize] as usize; + let slice = &d.buf[pos + 1..close]; + + if d.eager_validated && memchr::memchr(b'\\', slice).is_none() { + *out_ptr = slice.as_ptr(); + *out_len = slice.len(); + return qjson_err::QJSON_OK as c_int; + } let mut scratch = d.scratch.borrow_mut(); match string::decode_string(d.buf, pos + 1, close, &mut scratch, d.eager_validated) { @@ -654,6 +670,100 @@ pub unsafe extern "C" fn qjson_cursor_get_bool( }) } +/// Resolve type + decoded value of a cursor in one FFI call. +/// Fills `*type_out` unconditionally. For strings, fills `(*str_ptr, *str_len)`; +/// for numbers, fills `*f64_out`; for bool, fills `*bool_out`; +/// for containers, fills `(*byte_start, *byte_end)`. +/// +/// # Safety +/// +/// See the module-level [shared safety contract](self#shared-safety-contract). +/// `c` must point to a cursor produced by an earlier `qjson_*` call whose +/// document is still alive. All out pointers must be non-NULL. +#[no_mangle] +pub unsafe extern "C" fn qjson_cursor_get_value( + c: *const qjson_cursor, + type_out: *mut c_int, + str_ptr: *mut *const u8, str_len: *mut usize, + i64_out: *mut i64, f64_out: *mut f64, bool_out: *mut c_int, + byte_start: *mut usize, byte_end: *mut usize, +) -> c_int { + ffi_catch!({ + if type_out.is_null() || str_ptr.is_null() || str_len.is_null() + || i64_out.is_null() || f64_out.is_null() || bool_out.is_null() + || byte_start.is_null() || byte_end.is_null() + { + return qjson_err::QJSON_INVALID_ARG as c_int; + } + let (d, cur) = match cursor_to_internal(c) { + Ok(x) => x, Err(e) => return e as c_int, + }; + let pos = d.indices[cur.idx_start as usize] as usize; + let lead = match d.buf.get(pos).copied() { + Some(b) => b, + None => return qjson_err::QJSON_PARSE_ERROR as c_int, + }; + match lead { + b'"' => { + *type_out = qjson_type::QJSON_T_STR as c_int; + let close = d.indices[(cur.idx_start + 1) as usize] as usize; + let slice = &d.buf[pos + 1..close]; + if d.eager_validated && memchr::memchr(b'\\', slice).is_none() { + *str_ptr = slice.as_ptr(); + *str_len = slice.len(); + } else { + let mut scratch = d.scratch.borrow_mut(); + match string::decode_string(d.buf, pos + 1, close, &mut scratch, d.eager_validated) { + Ok((p, n)) => { *str_ptr = p; *str_len = n; } + Err(e) => return e as c_int, + } + } + *byte_start = pos; + *byte_end = close + 1; + } + b'{' | b'[' => { + *type_out = if lead == b'{' { qjson_type::QJSON_T_OBJ as c_int } + else { qjson_type::QJSON_T_ARR as c_int }; + let end = d.indices[cur.idx_end as usize] as usize; + if end >= d.buf.len() { return qjson_err::QJSON_PARSE_ERROR as c_int; } + *byte_start = pos; + *byte_end = end + 1; + } + _ => { + let bytes = match scalar_bytes(d, cur) { + Ok(b) => b, Err(e) => return e as c_int, + }; + match bytes { + b"true" => { + *type_out = qjson_type::QJSON_T_BOOL as c_int; + *bool_out = 1; + } + b"false" => { + *type_out = qjson_type::QJSON_T_BOOL as c_int; + *bool_out = 0; + } + b"null" => { + *type_out = qjson_type::QJSON_T_NULL as c_int; + } + _ => { + *type_out = qjson_type::QJSON_T_NUM as c_int; + match number::parse_f64(bytes, d.eager_validated) { + Ok(v) => { *f64_out = v; } + Err(e) => return e as c_int, + } + } + } + let (s, e) = match scalar_byte_range(d, cur) { + Ok(x) => x, Err(e) => return e as c_int, + }; + *byte_start = s; + *byte_end = e; + } + } + qjson_err::QJSON_OK as c_int + }) +} + /// Write the JSON value type at `path` (relative to `*c`) into `*type_out`. /// /// # Safety From 6e95c7ade92ddf44f0fcb4ec572aa130a506082c Mon Sep 17 00:00:00 2001 From: Yuansheng Wang Date: Sat, 23 May 2026 02:40:52 +0000 Subject: [PATCH 2/3] fix: remove unused i64_out from qjson_cursor_get_value, fix doc comment MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review feedback: - Removed i64_out parameter since the function only decodes numbers as f64 - Clarified that type_out is filled on QJSON_OK return only - Updated README CPU description to match current bench environment (AMD EPYC-Rome, 4 vCPUs) Comment responses: - parse_i64/parse_f64 safety guards already landed in PR #51 (this was rebased on top of it); the eager fast-path includes empty-input and leading-byte prechecks. - Table separator columns verified correct — all dash groups match their respective header column counts. --- README.md | 2 +- include/qjson.h | 2 +- lua/qjson/lib.lua | 2 +- lua/qjson/table.lua | 4 ++-- src/ffi.rs | 7 ++++--- 5 files changed, 9 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 8ead6e1..106d172 100644 --- a/README.md +++ b/README.md @@ -101,7 +101,7 @@ LD_LIBRARY_PATH="$PWD/target/release" \ `qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal chat-completion payloads, "parse + access model, temperature, and all messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1, -Intel Core i5-9400; 5 rounds, deterministic payload): +AMD EPYC-Rome, 4 vCPUs; 5 rounds, deterministic payload): | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson | |---:|---:|---:|---:|---:|---:| diff --git a/include/qjson.h b/include/qjson.h index 708de9c..b828d05 100644 --- a/include/qjson.h +++ b/include/qjson.h @@ -86,7 +86,7 @@ int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i, int qjson_cursor_get_value(const qjson_cursor*, int* type_out, const uint8_t** str_ptr, size_t* str_len, - int64_t* i64_out, double* f64_out, int* bool_out, + double* f64_out, int* bool_out, size_t* byte_start, size_t* byte_end); #ifdef __cplusplus diff --git a/lua/qjson/lib.lua b/lua/qjson/lib.lua index c57e3bf..49d2a04 100644 --- a/lua/qjson/lib.lua +++ b/lua/qjson/lib.lua @@ -44,7 +44,7 @@ int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i, int qjson_cursor_get_value(const qjson_cursor*, int* type_out, const uint8_t** str_ptr, size_t* str_len, - int64_t* i64_out, double* f64_out, int* bool_out, + double* f64_out, int* bool_out, size_t* byte_start, size_t* byte_end); ]] diff --git a/lua/qjson/table.lua b/lua/qjson/table.lua index 25a5548..d731dc1 100644 --- a/lua/qjson/table.lua +++ b/lua/qjson/table.lua @@ -77,7 +77,7 @@ end -- can be freely reused afterwards. local function decode_cursor(parent_view, src_box) local rc = C.qjson_cursor_get_value(src_box[0], type_box, - strp_box, size_box, i64_box, f64_box, bool_box, + strp_box, size_box, f64_box, bool_box, sz_a, sz_b) if not check(rc) then return nil end local t = type_box[0] @@ -313,7 +313,7 @@ function _M.decode(json_str) ffi.copy(root_box, cur_box, ffi.sizeof("qjson_cursor")) -- Determine root container kind (object/array) and byte span in one call. local grc = C.qjson_cursor_get_value(root_box[0], type_box, - strp_box, size_box, i64_box, f64_box, bool_box, + strp_box, size_box, f64_box, bool_box, sz_a, sz_b) if not check(grc) then error("qjson: root typeof failed") diff --git a/src/ffi.rs b/src/ffi.rs index 5918114..522bff4 100644 --- a/src/ffi.rs +++ b/src/ffi.rs @@ -671,7 +671,8 @@ pub unsafe extern "C" fn qjson_cursor_get_bool( } /// Resolve type + decoded value of a cursor in one FFI call. -/// Fills `*type_out` unconditionally. For strings, fills `(*str_ptr, *str_len)`; +/// On success (`QJSON_OK`), fills `*type_out` unconditionally. +/// For strings, fills `(*str_ptr, *str_len)`; /// for numbers, fills `*f64_out`; for bool, fills `*bool_out`; /// for containers, fills `(*byte_start, *byte_end)`. /// @@ -685,12 +686,12 @@ pub unsafe extern "C" fn qjson_cursor_get_value( c: *const qjson_cursor, type_out: *mut c_int, str_ptr: *mut *const u8, str_len: *mut usize, - i64_out: *mut i64, f64_out: *mut f64, bool_out: *mut c_int, + f64_out: *mut f64, bool_out: *mut c_int, byte_start: *mut usize, byte_end: *mut usize, ) -> c_int { ffi_catch!({ if type_out.is_null() || str_ptr.is_null() || str_len.is_null() - || i64_out.is_null() || f64_out.is_null() || bool_out.is_null() + || f64_out.is_null() || bool_out.is_null() || byte_start.is_null() || byte_end.is_null() { return qjson_err::QJSON_INVALID_ARG as c_int; From 1c9430c787d634f0d2ea22f796ba4c1090d66447 Mon Sep 17 00:00:00 2001 From: Yuansheng Wang Date: Sat, 23 May 2026 02:44:52 +0000 Subject: [PATCH 3/3] docs: update benchmarks with 10-round data (AMD EPYC-Rome) --- README.md | 12 ++++----- docs/benchmarks.md | 66 +++++++++++++++++++++++----------------------- 2 files changed, 39 insertions(+), 39 deletions(-) diff --git a/README.md b/README.md index 106d172..bbfbc48 100644 --- a/README.md +++ b/README.md @@ -100,15 +100,15 @@ LD_LIBRARY_PATH="$PWD/target/release" \ `qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal chat-completion payloads, "parse + access model, temperature, and all -messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1, -AMD EPYC-Rome, 4 vCPUs; 5 rounds, deterministic payload): +messages[*].content paths" workload (median ops/s, 10 rounds, under +OpenResty LuaJIT 2.1, AMD EPYC-Rome, 4 vCPUs; deterministic payload): | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson | |---:|---:|---:|---:|---:|---:| -| 2 KB | 99,920 | 113,317 | 123,567 | 135,792 | 1.2× / 1.4× | -| 100 KB | 5,116 | 41,288 | 147,710 | 151,515 | 28.9× / 29.6× | -| 1 MB | 483 | 3,489 | 15,957 | 18,750 | 33.0× / 38.8× | -| 10 MB | 50 | 378 | 1,764 | 1,805 | 35.3× / 36.1× | +| 2 KB | 94,701 | 108,921 | 128,103 | 89,294 | 1.4× / 0.9× | +| 100 KB | 5,214 | 36,914 | 136,986 | 110,497 | 26.3× / 21.2× | +| 1 MB | 505 | 3,894 | 16,234 | 16,648 | 32.1× / 32.9× | +| 10 MB | 50 | 369 | 1,602 | 1,429 | 32.0× / 28.6× | `qjson.parse` wins because it skips building a Lua table for the parts you never read; `qjson.decode + t.field` adds a cjson-shaped table proxy on top diff --git a/docs/benchmarks.md b/docs/benchmarks.md index 92da958..72a7038 100644 --- a/docs/benchmarks.md +++ b/docs/benchmarks.md @@ -30,7 +30,7 @@ The harness lives at `benches/lua_bench.lua`. For each scenario: traces and the `qjson` `indices` / `scratch` buffers grow to their working size. Warmup is excluded from timing and the memory delta. 2. `collectgarbage("collect")` baseline. -3. 5 rounds × N iterations of the workload; report the **median** ops/s +3. 10 rounds × N iterations of the workload; report the **median** ops/s across rounds (mean + range also reported in the raw output). 4. Final `collectgarbage("count")` to capture the post-run memory delta in KB. The harness does not force a final collection after timing, so @@ -81,32 +81,32 @@ Each row is "parse + access request fields" on the named payload. | Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` | |---|---|---:|---:|---:|---:|---:|---:| -| small | 2.1 KB | 99,920 | 113,317 | 123,567 | 135,792 | 205,187 | -| medium | 60.4 KB | 8,736 | 81,913 | 125,251 | 203,087 | 200,401 | -| github-100k | 100 KB | 1,874 | 2,170 | 5,661 | 5,339 | 5,802 | -| 100k | 100 KB | 5,116 | 41,288 | 147,710 | 151,515 | 167,504 | -| 200k | 200 KB | 2,597 | 19,904 | 91,075 | 93,985 | 100,200 | -| 500k | 500 KB | 1,036 | 7,045 | 32,103 | 30,349 | 36,036 | -| 1m | 1.00 MB | 483 | 3,489 | 15,957 | 18,750 | 19,711 | -| 2m | 2.00 MB | 255 | 1,925 | 8,973 | 9,320 | 9,780 | -| 5m | 5.00 MB | 102 | 724 | 3,655 | 2,762 | 3,655 | -| 10m | 10.00 MB | 50 | 378 | 1,764 | 1,805 | 1,789 | -| interleaved (100k/200k/500k/1m, cycled) | — | 1,095 | 9,611 | 32,454 | 32,425 | 33,492 | +| small | 2.1 KB | 94,701 | 108,921 | 128,103 | 89,294 | 187,631 | +| medium | 60.4 KB | 8,850 | 82,699 | 120,598 | 194,856 | 135,759 | +| github-100k | 100 KB | 1,901 | 1,939 | 6,041 | 6,009 | 6,435 | +| 100k | 100 KB | 5,214 | 36,914 | 136,986 | 110,497 | 126,263 | +| 200k | 200 KB | 2,575 | 18,018 | 81,967 | 84,317 | 95,420 | +| 500k | 500 KB | 1,043 | 7,262 | 36,563 | 35,971 | 38,023 | +| 1m | 1.00 MB | 505 | 3,894 | 16,234 | 16,648 | 16,968 | +| 2m | 2.00 MB | 254 | 2,065 | 8,247 | 8,407 | 8,838 | +| 5m | 5.00 MB | 100 | 652 | 3,228 | 3,225 | 3,355 | +| 10m | 10.00 MB | 50 | 369 | 1,602 | 1,429 | 1,774 | +| interleaved (100k/200k/500k/1m, cycled) | — | 1,121 | 9,142 | 28,854 | 33,088 | 32,568 | ### Speed-up vs. baselines | Scenario | `qjson.parse` / cjson | `qjson.parse` / simdjson | `qjson.decode + access content` / cjson | `qjson.decode + access content` / simdjson | |---|---|---:|---:|---:|---:| -| small | 1.2× | 1.1× | 1.4× | 1.2× | -| medium | 14.3× | 1.5× | 23.2× | 2.5× | -| github-100k | 3.0× | 2.6× | 2.8× | 2.5× | -| 100k | 28.9× | 3.6× | 29.6× | 3.7× | -| 200k | 35.1× | 4.6× | 36.2× | 4.7× | -| 500k | 31.0× | 4.6× | 29.3× | 4.3× | -| 1m | 33.0× | 4.6× | 38.8× | 5.4× | -| 2m | 35.2× | 4.7× | 36.5× | 4.8× | -| 5m | 35.8× | 5.0× | 27.1× | 3.8× | -| 10m | 35.3× | 4.7× | 36.1× | 4.8× | +| small | 1.4× | 1.2× | 0.9× | 0.8× | +| medium | 13.6× | 1.5× | 22.0× | 2.4× | +| github-100k | 3.2× | 3.1× | 3.2× | 3.1× | +| 100k | 26.3× | 3.7× | 21.2× | 3.0× | +| 200k | 31.8× | 4.5× | 32.7× | 4.7× | +| 500k | 35.1× | 5.0× | 34.5× | 5.0× | +| 1m | 32.1× | 4.2× | 32.9× | 4.3× | +| 2m | 32.5× | 4.0× | 33.1× | 4.1× | +| 5m | 32.3× | 5.0× | 32.3× | 4.9× | +| 10m | 32.0× | 4.3× | 28.6× | 3.9× | ## Results — memory delta (KB retained after 5 rounds) @@ -116,17 +116,17 @@ from the last round may still be included. | Scenario | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` | |---|---|---:|---:|---:|---:|---:| -| small | +15,492 | +15,507 | +4,069 | +15,131 | +11,139 | -| medium | +1,956 | +2,660 | +65 | +1,114 | +1,120 | -| github-100k | +12,018 | +3,373 | +19 | +536 | +229 | -| 100k | +485 | +748 | +71 | +692 | +229 | -| 200k | +392 | +523 | +34 | +346 | +112 | -| 500k | +577 | +630 | +15 | +139 | +45 | -| 1m | +1,082 | +1,121 | +10 | +104 | +34 | -| 2m | +1,155 | +1,248 | +18 | +208 | +45 | -| 5m | +1,316 | +1,538 | +14 | +400 | +45 | -| 10m | +1,583 | +2,014 | +14 | +708 | +45 | -| interleaved | +3,357 | +4,404 | +270 | +2,776 | +897 | +| small | -2,472 | +12,360 | +8,159 | +8,651 | +2,698 | +| medium | +3,850 | +5,258 | +124 | +2,228 | +2,234 | +| github-100k | +20,255 | +18,823 | +29 | +1,072 | +452 | +| 100k | +867 | +1,393 | +138 | +1,384 | +452 | +| 200k | +584 | +846 | +67 | +692 | +223 | +| 500k | +654 | +759 | +27 | +277 | +89 | +| 1m | +1,140 | +1,218 | +20 | +208 | +67 | +| 2m | +1,284 | +1,472 | +31 | +409 | +89 | +| 5m | +1,607 | +2,051 | +27 | +855 | +89 | +| 10m | +2,143 | +3,004 | +27 | +1,736 | +89 | +| interleaved | +4,888 | +6,983 | +536 | +5,537 | +1,788 | `qjson.parse` retention is essentially constant across payload size: the only GC-rooted state is the reusable `indices: Vec` and `scratch` buffers.