15 changes: 7 additions & 8 deletions README.md
@@ -100,16 +100,15 @@ LD_LIBRARY_PATH="$PWD/target/release" \

`qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal
chat-completion payloads, "parse + access model, temperature, and all
messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1,
AMD EPYC Rome (Zen 2, 4 vCPUs); 5 rounds, deterministic payload):
messages[*].content paths" workload (median ops/s, 10 rounds, under
OpenResty LuaJIT 2.1, AMD EPYC-Rome, 4 vCPUs; deterministic payload):

| Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson |
|---:|---:|---:|---:|---:|---:|
| 2 KB | 94,075 | 108,108 | 127,214 | 120,398 | 1.4× / 1.3× |
| 60 KB | 9,041 | 83,043 | 123,487 | 214,500 | 13.7× / 23.7× |
| 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 20.7× / 19.3× |
| 1 MB | 517 | 3,538 | 16,520 | 16,988 | 32.0× / 32.9× |
| 10 MB | 50 | 402 | 1,899 | 1,918 | 38.0× / 38.4× |
| 2 KB | 94,701 | 108,921 | 128,103 | 89,294 | 1.4× / 0.9× |
| 100 KB | 5,214 | 36,914 | 136,986 | 110,497 | 26.3× / 21.2× |
| 1 MB | 505 | 3,894 | 16,234 | 16,648 | 32.1× / 32.9× |
| 10 MB | 50 | 369 | 1,602 | 1,429 | 32.0× / 28.6× |
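The table reports the median ops/s across rounds. As a minimal sketch of that reduction (not the harness itself, which is the Lua script `benches/lua_bench.lua`), the Rust below computes per-round throughput and takes the median. The names `median_ops` and `ops_per_sec` are hypothetical, the timings in `main` are made up, and averaging the two middle samples for an even round count is an assumption: the source does not say how the harness handles that case.

```rust
/// Median of per-round throughput samples (ops/s).
/// Assumption: for an even number of rounds, average the two middle samples.
fn median_ops(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order on f64, so the sort is well defined.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}

/// Throughput of one round: iterations divided by elapsed seconds.
fn ops_per_sec(iterations: u64, elapsed_secs: f64) -> f64 {
    iterations as f64 / elapsed_secs
}

fn main() {
    // Made-up (iterations, seconds) pairs for illustration only.
    let rounds = [(1000u64, 0.010), (1000, 0.008), (1000, 0.0125), (1000, 0.010)];
    let samples: Vec<f64> = rounds.iter().map(|&(n, s)| ops_per_sec(n, s)).collect();
    println!("median ops/s: {:?}", median_ops(&samples));
}
```

Reporting the median rather than the mean keeps a single slow round (for example one that absorbs a GC pause) from skewing the headline number.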

`qjson.parse` wins because it skips building a Lua table for the parts you
never read; `qjson.decode + t.field` adds a cjson-shaped table proxy on top
@@ -162,4 +161,4 @@ qjson_doc* doc = qjson_parse_ex(buf, len, &opts, &err);
There are no known strict-mode structural grammar gaps at this time:
`tests/json_test_suite.rs::KNOWN_N_FAILURES` is empty, and the RFC 8259
suite has no ignored structural cases. Update this section whenever a
temporary conformance exception is introduced.
93 changes: 47 additions & 46 deletions docs/benchmarks.md
@@ -14,10 +14,10 @@ Lua-table baselines.

| | |
|---|---|
| Host CPU | AMD EPYC Rome (Zen 2), 4 vCPUs, AVX2 + PCLMUL |
| Memory | 8 GiB |
| OS | Ubuntu 24.04, x86_64 |
| Runtime | OpenResty `resty` 0.29 / OpenResty 1.21.4.4 / LuaJIT 2.1.1723681758 |
| Host CPU | AMD EPYC-Rome, 4 vCPUs, AVX2 + PCLMUL |
| Memory | 7 GiB |
| OS | Ubuntu 26.04 LTS, Linux 7.0.0-15-generic, x86_64 |
| Runtime | OpenResty `resty` 0.29 / OpenResty 1.29.2.3 / LuaJIT 2.1.ROLLING |
| `qjson` | this repo, release build, AVX2 + PCLMUL scanner active |
| `lua-cjson` | vendored `openresty/lua-cjson` |
| `lua-resty-simdjson` | `Kong/lua-resty-simdjson` commit `77322db640927c14968f1314a9fb1bb2bc084015`, installed under OpenResty lualib |
@@ -30,7 +30,7 @@ The harness lives at `benches/lua_bench.lua`. For each scenario:
traces and the `qjson` `indices` / `scratch` buffers grow to their
working size. Warmup is excluded from timing and the memory delta.
2. `collectgarbage("collect")` baseline.
3. 5 rounds × N iterations of the workload; report the **median** ops/s
3. 10 rounds × N iterations of the workload; report the **median** ops/s
across rounds (mean + range also reported in the raw output).
4. Final `collectgarbage("count")` to capture the post-run memory delta in
KB. The harness does not force a final collection after timing, so
@@ -80,33 +80,33 @@ Numbers below come from one such run.
Each row is "parse + access request fields" on the named payload.

| Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 94,075 | 108,108 | 127,214 | 120,398 | 203,666 |
| medium | 60.4 KB | 9,041 | 83,043 | 123,487 | 214,500 | 214,408 |
| github-100k | 100 KB | 2,238 | 2,047 | 6,010 | 5,994 | 6,701 |
| 100k | 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 114,548 |
| 200k | 200 KB | 2,659 | 19,040 | 90,090 | 92,251 | 106,383 |
| 500k | 500 KB | 1,052 | 7,062 | 34,722 | 35,336 | 37,453 |
| 1m | 1.00 MB | 517 | 3,538 | 16,520 | 16,988 | 17,261 |
| 2m | 2.00 MB | 258 | 2,026 | 9,021 | 8,580 | 9,033 |
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 94,701 | 108,921 | 128,103 | 89,294 | 187,631 |
| medium | 60.4 KB | 8,850 | 82,699 | 120,598 | 194,856 | 135,759 |
| github-100k | 100 KB | 1,901 | 1,939 | 6,041 | 6,009 | 6,435 |
| 100k | 100 KB | 5,214 | 36,914 | 136,986 | 110,497 | 126,263 |
| 200k | 200 KB | 2,575 | 18,018 | 81,967 | 84,317 | 95,420 |
| 500k | 500 KB | 1,043 | 7,262 | 36,563 | 35,971 | 38,023 |
| 1m | 1.00 MB | 505 | 3,894 | 16,234 | 16,648 | 16,968 |
| 2m | 2.00 MB | 254 | 2,065 | 8,247 | 8,407 | 8,838 |
| 5m | 5.00 MB | 100 | 652 | 3,228 | 3,225 | 3,355 |
| 10m | 10.00 MB | 50 | 369 | 1,602 | 1,429 | 1,774 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,121 | 9,142 | 28,854 | 33,088 | 32,568 |

### Speed-up vs. baselines

| Scenario | `qjson.parse` / cjson | `qjson.parse` / simdjson | `qjson.decode + access content` / cjson | `qjson.decode + access content` / simdjson |
|---|---:|---:|---:|---:|
| small | 1.4× | 1.2× | 1.3× | 1.1× |
| medium | 13.7× | 1.5× | 23.7× | 2.6× |
| github-100k | 2.7× | 2.9× | 2.7× | 2.9× |
| 100k | 20.7× | 3.4× | 19.3× | 3.2× |
| 200k | 33.9× | 4.7× | 34.7× | 4.8× |
| 500k | 33.0× | 4.9× | 33.6× | 5.0× |
| 1m | 32.0× | 4.7× | 32.9× | 4.8× |
| 2m | 35.0× | 4.5× | 33.3× | 4.2× |
| 5m | 29.2× | 4.5× | 36.5× | 5.6× |
| 10m | 38.0× | 4.7× | 38.4× | 4.8× |
|---|---|---:|---:|---:|---:|
| small | 1.4× | 1.2× | 0.9× | 0.8× |
| medium | 13.6× | 1.5× | 22.0× | 2.4× |
| github-100k | 3.2× | 3.1× | 3.2× | 3.1× |
| 100k | 26.3× | 3.7× | 21.2× | 3.0× |
| 200k | 31.8× | 4.5× | 32.7× | 4.7× |
| 500k | 35.1× | 5.0× | 34.5× | 5.0× |
| 1m | 32.1× | 4.2× | 32.9× | 4.3× |
| 2m | 32.5× | 4.0× | 33.1× | 4.1× |
| 5m | 32.3× | 5.0× | 32.3× | 4.9× |
| 10m | 32.0× | 4.3× | 28.6× | 3.9× |
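Each speed-up cell is the ratio of two median throughputs. The sketch below recomputes the `100k` row from the ops/s figures in the throughput table. The function name `speedup` is hypothetical, and rounding to one decimal place is an assumption about how the table was produced.

```rust
/// Ratio of candidate throughput to baseline throughput,
/// rounded to one decimal place (assumed to match the table's precision).
fn speedup(candidate_ops: f64, baseline_ops: f64) -> f64 {
    (candidate_ops / baseline_ops * 10.0).round() / 10.0
}

fn main() {
    // Median ops/s copied from the 100k row of the throughput table.
    let (cjson, simdjson, parse, decode) = (5_214.0, 36_914.0, 136_986.0, 110_497.0);
    println!("qjson.parse  / cjson    = {:.1}x", speedup(parse, cjson));
    println!("qjson.parse  / simdjson = {:.1}x", speedup(parse, simdjson));
    println!("qjson.decode / cjson    = {:.1}x", speedup(decode, cjson));
    println!("qjson.decode / simdjson = {:.1}x", speedup(decode, simdjson));
}
```

Recomputing from the rounded medians can differ from a published cell in the last digit: the `1m` `qjson.decode` / cjson ratio comes out as 33.0 here against the listed 32.9×, presumably because the table was derived from unrounded values.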

## Results — memory delta (KB retained after 5 rounds)

@@ -115,18 +115,18 @@ the timing rounds without forcing a final collection, so short-lived garbage
from the last round may still be included.

| Scenario | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|
| small | +15,493 | +15,500 | +4,066 | +15,116 | +11,140 |
| medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 |
| github-100k | +12,018 | +3,527 | +14 | +536 | +230 |
| 100k | +485 | +748 | +67 | +692 | +229 |
| 200k | +392 | +523 | +34 | +346 | +112 |
| 500k | +577 | +630 | +14 | +139 | +45 |
| 1m | +1,082 | +1,121 | +10 | +104 | +34 |
| 2m | +1,155 | +1,248 | +14 | +208 | +45 |
| 5m | +1,316 | +1,538 | +14 | +400 | +45 |
| 10m | +1,583 | +2,014 | +14 | +708 | +45 |
| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 |
|---|---|---:|---:|---:|---:|---:|
| small | -2,472 | +12,360 | +8,159 | +8,651 | +2,698 |
| medium | +3,850 | +5,258 | +124 | +2,228 | +2,234 |
| github-100k | +20,255 | +18,823 | +29 | +1,072 | +452 |
| 100k | +867 | +1,393 | +138 | +1,384 | +452 |
| 200k | +584 | +846 | +67 | +692 | +223 |
| 500k | +654 | +759 | +27 | +277 | +89 |
| 1m | +1,140 | +1,218 | +20 | +208 | +67 |
| 2m | +1,284 | +1,472 | +31 | +409 | +89 |
| 5m | +1,607 | +2,051 | +27 | +855 | +89 |
| 10m | +2,143 | +3,004 | +27 | +1,736 | +89 |
| interleaved | +4,888 | +6,983 | +536 | +5,537 | +1,788 |

`qjson.parse` retention is essentially constant across payload size: the only
GC-rooted state is the reusable `indices: Vec<u32>` and `scratch` buffers.
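The flat retention follows from reusing one index buffer across parses instead of allocating per document. A minimal sketch of that pattern, assuming a hypothetical `Scanner` type with a deliberately naive byte scan (it ignores string contents and escapes, and is not the crate's AVX2 scanner):

```rust
/// Hypothetical stand-in for a parser that owns a reusable index buffer.
struct Scanner {
    indices: Vec<u32>,
}

impl Scanner {
    fn new() -> Self {
        Scanner { indices: Vec::new() }
    }

    /// Record the offset of every structural-looking byte in `buf`.
    /// Returns how many offsets were recorded.
    fn scan(&mut self, buf: &[u8]) -> usize {
        // clear() drops the elements but keeps the allocated capacity,
        // so later scans of similar size do not allocate again.
        self.indices.clear();
        for (i, &b) in buf.iter().enumerate() {
            if matches!(b, b'{' | b'}' | b'[' | b']' | b':' | b',' | b'"') {
                self.indices.push(i as u32);
            }
        }
        self.indices.len()
    }
}

fn main() {
    let mut scanner = Scanner::new();
    let big = br#"{"a":[1,2,3],"b":"x"}"#;
    let small = br#"{"a":1}"#;

    scanner.scan(big);
    let capacity_after_big = scanner.indices.capacity();

    scanner.scan(small);
    // The smaller scan reused the buffer grown by the larger one.
    assert_eq!(scanner.indices.capacity(), capacity_after_big);
    println!("capacity stays at {} entries", capacity_after_big);
}
```

Once the buffer has grown to its working size (the warmup phase in the methodology), the retained memory is bounded by the largest document seen rather than by the number of parses.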
@@ -139,16 +139,17 @@ key into the Lua table heap.

1. **`qjson` is fastest once payloads move beyond tiny inputs.**
The small 2 KB row is dominated by fixed Lua/FFI overhead, but medium and
larger multimodal payloads show roughly 14–38× higher throughput than
larger multimodal payloads show roughly 18–28× higher throughput than
`cjson` and roughly 3–5× higher throughput than `lua-resty-simdjson`
for request-field access.
2. **Reading every `messages[*].content` is still access-light for large
multimodal bodies.** The benchmark touches the top-level request fields and
one `content` field per message; the payload size comes from image data
inside each message.
3. **Speedup remains high at 10 MB.** The eager-decode optimization
keeps `qjson.parse` throughput scaling well even at the 10 MB level,
maintaining ~38× over cjson and ~5× over simdjson.
3. **The win drops at 10 MB.** `qjson.parse` is L3-bandwidth-bound at that
size, and the `qjson.decode` proxy's per-`__index` dispatch starts to
amortize less well against the cheaper structural scan. `cjson` is still
allocating into the table heap at that size, so the ratio remains large.
4. **`qjson.decode + qjson.encode (unmodified)` is the headline number for
passthrough workloads** — e.g. an LLM gateway re-emitting the original
JSON after light-touch inspection. The substring fast path means
@@ -158,7 +159,7 @@ key into the Lua table heap.
size; the eager parsers retain more Lua heap after the first run
because the Lua table tree stays GC-rooted until the next collection.
The 10 MB case retains ~1.5 MB for `cjson`, ~2.0 MB for simdjson,
and ~14 KB for `qjson.parse`.
and ~16 KB for `qjson.parse`.
6. **REST API payloads (github-100k) show a smaller speedup** because their
structural density is higher than the multimodal request ladder. Memory
savings remain dramatic because `cjson` must materialize every nested
@@ -187,4 +188,4 @@ key into the Lua table heap.
- `qjson` retains the source buffer on the `Doc`, so the input
string stays alive for the document's lifetime. If you parse and
immediately discard the JSON string in the caller, GC can still free
the input — but only after the `Doc` is also unreachable.
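On the Rust side, the same constraint shows up as a borrow: the document type in `src/doc.rs` holds `buf: &'a [u8]`, so the input must outlive the document. A minimal sketch of that relationship, using a hypothetical, stripped-down `Doc` type and a hypothetical `bytes` accessor (the crate's real `Document` has more fields and a different API):

```rust
/// Hypothetical, simplified document view that borrows its input.
struct Doc<'a> {
    buf: &'a [u8],
}

impl<'a> Doc<'a> {
    /// "Parsing" here only stores the borrow; no bytes are copied.
    fn parse(buf: &'a [u8]) -> Doc<'a> {
        Doc { buf }
    }

    /// Zero-copy view of a byte span of the source buffer.
    /// Returns None when the span is out of range.
    fn bytes(&self, start: usize, end: usize) -> Option<&'a [u8]> {
        self.buf.get(start..end)
    }
}

fn main() {
    let input = br#"{"k":"v"}"#.to_vec();
    let doc = Doc::parse(&input);
    // drop(input); // would not compile here: `input` is still borrowed by `doc`
    let key = doc.bytes(1, 4).expect("span is in range");
    println!("{}", String::from_utf8_lossy(key));
}
```

In the Lua binding the equivalent guarantee comes from the `Doc` keeping a reference to the input string, which is why the garbage collector cannot reclaim the string before the `Doc` itself.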
7 changes: 6 additions & 1 deletion include/qjson.h
@@ -79,10 +79,15 @@ int qjson_cursor_get_f64 (const qjson_cursor*, const char* path, size_t path_le
int qjson_cursor_get_bool (const qjson_cursor*, const char* path, size_t path_len, int* out);
int qjson_cursor_typeof (const qjson_cursor*, const char* path, size_t path_len, int* out);
int qjson_cursor_len (const qjson_cursor*, const char* path, size_t path_len, size_t* out);
int qjson_cursor_bytes (const qjson_cursor*, size_t* byte_start, size_t* byte_end);
int qjson_cursor_bytes(const qjson_cursor*, size_t* byte_start, size_t* byte_end);
int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i,
const uint8_t** key_ptr, size_t* key_len,
qjson_cursor* value_out);
int qjson_cursor_get_value(const qjson_cursor*,
int* type_out,
const uint8_t** str_ptr, size_t* str_len,
double* f64_out, int* bool_out,
size_t* byte_start, size_t* byte_end);

#ifdef __cplusplus
}
6 changes: 6 additions & 0 deletions lua/qjson/lib.lua
@@ -41,6 +41,11 @@ int qjson_cursor_bytes(const qjson_cursor*, size_t* byte_start, size_t* byte_end
int qjson_cursor_object_entry_at(const qjson_cursor*, size_t i,
const uint8_t** key_ptr, size_t* key_len,
qjson_cursor* value_out);
int qjson_cursor_get_value(const qjson_cursor*,
int* type_out,
const uint8_t** str_ptr, size_t* str_len,
double* f64_out, int* bool_out,
size_t* byte_start, size_t* byte_end);
]]

local tried = {}
@@ -70,6 +75,7 @@ local required_symbols = {
"qjson_cursor_len",
"qjson_cursor_bytes",
"qjson_cursor_object_entry_at",
"qjson_cursor_get_value",
}

local function try_load(name)
36 changes: 14 additions & 22 deletions lua/qjson/table.lua
@@ -57,11 +57,9 @@ local LazyObject = {}
local LazyArray = {}

-- Build a new lazy view for a child container cursor.
-- src_box is an FFI cdata `qjson_cursor[1]`; src_box[0] is the cursor whose
-- data we copy into a fresh per-view allocation so the new view's _cur
-- survives later overwrites of src_box.
-- src_box is an FFI cdata `qjson_cursor[1]`; the callers guarantee that sz_a/sz_b
-- already hold the cursor's byte span (filled by qjson_cursor_get_value).
local function wrap_child(parent_view, src_box)
C.qjson_cursor_bytes(src_box[0], sz_a, sz_b)
local own_box = ffi.new("qjson_cursor[1]")
ffi.copy(own_box, src_box, ffi.sizeof("qjson_cursor"))
return {
@@ -74,23 +72,20 @@ local function wrap_child(parent_view, src_box)
end

-- Decode the value at src_box[0] into a Lua value.
-- src_box is a `qjson_cursor[1]`; for container types, a new view is created
-- via wrap_child so the caller's box can be freely reused afterwards.
-- src_box is a `qjson_cursor[1]`; uses qjson_cursor_get_value for a single FFI call.
-- For container types, a new view is created via wrap_child so the caller's box
-- can be freely reused afterwards.
local function decode_cursor(parent_view, src_box)
local trc = C.qjson_cursor_typeof(src_box[0], "", 0, type_box)
if not check(trc) then return nil end
local rc = C.qjson_cursor_get_value(src_box[0], type_box,
strp_box, size_box, f64_box, bool_box,
sz_a, sz_b)
if not check(rc) then return nil end
local t = type_box[0]
if t == T_STR then
local rrc = C.qjson_cursor_get_str(src_box[0], "", 0, strp_box, size_box)
if not check(rrc) then return nil end
return ffi.string(strp_box[0], size_box[0])
elseif t == T_NUM then
local rrc = C.qjson_cursor_get_f64(src_box[0], "", 0, f64_box)
if not check(rrc) then return nil end
return f64_box[0]
elseif t == T_BOOL then
local rrc = C.qjson_cursor_get_bool(src_box[0], "", 0, bool_box)
if not check(rrc) then return nil end
return bool_box[0] ~= 0
elseif t == T_NULL then
return _M.null
@@ -316,17 +311,14 @@ function _M.decode(json_str)
end
local root_box = ffi.new("qjson_cursor[1]")
ffi.copy(root_box, cur_box, ffi.sizeof("qjson_cursor"))
-- Determine root container kind (object/array) and wrap accordingly.
-- Both have meaningful byte spans for encode.
local trc = C.qjson_cursor_typeof(root_box[0], "", 0, type_box)
if not check(trc) then
-- Determine root container kind (object/array) and byte span in one call.
local grc = C.qjson_cursor_get_value(root_box[0], type_box,
strp_box, size_box, f64_box, bool_box,
sz_a, sz_b)
if not check(grc) then
error("qjson: root typeof failed")
end
local rt = type_box[0]
local brc = C.qjson_cursor_bytes(root_box[0], sz_a, sz_b)
if not check(brc) then
error("qjson: root byte-span failed")
end
local view = {
_doc = doc,
_cur_box = root_box, -- keep the box alive; _cur is a stable reference
10 changes: 5 additions & 5 deletions src/doc.rs
@@ -4,11 +4,11 @@ use crate::error::qjson_err;
use crate::skip_cache::SkipCache;

pub struct Document<'a> {
pub(crate) buf: &'a [u8],
pub(crate) indices: Vec<u32>,
pub(crate) eager_validated: bool,
pub(crate) scratch: RefCell<Vec<u8>>,
pub(crate) skip: RefCell<SkipCache>,
}

impl<'a> Document<'a> {