11 changes: 8 additions & 3 deletions CLAUDE.md
@@ -45,10 +45,14 @@ cargo test --features test-panic --release

### Two-phase parse

**Phase 1** (`src/scan/`, called from `Document::parse_with_options`): a structural scanner walks the input once and writes the byte offset of every non-string-interior `{ } [ ] : , "` into `doc.indices`. Then `validate_depth` is run unconditionally; in EAGER mode, `validate_trailing` and `validate_eager_values` (number ABNF + string content + UTF-8) follow. In LAZY mode, value-level checks are skipped and rely on the lazy decode path at field-access time. A `u32::MAX` sentinel is appended. The scanner is selected at first use via `OnceCell` in `src/scan/mod.rs`:
**Phase 1** (`src/scan/`, called from `Document::parse_with_options`): a structural scanner walks the input once and writes the byte offset of every non-string-interior `{ } [ ] : , "` into `doc.indices`. In LAZY mode, only `validate_depth` runs. In EAGER mode, `validate_eager_fused` runs: a single O(indices) pass that combines depth checking, trailing-content detection, and grammar/value validation (number ABNF + string content + UTF-8). String validation uses a PSHUFB nibble-LUT byte classifier (`src/validate/classify.rs`) that computes per-byte class bitmasks in ~3 SIMD ops per 32-byte chunk. A `u32::MAX` sentinel is appended. The scanner and string validator are selected at first use via `OnceCell`:

- `Avx2Scanner` (gated by the `avx2` cargo feature, default-on) when both `avx2` and `pclmulqdq` are detected at runtime.
- `ScalarScanner` otherwise.
- **Scanner** (`src/scan/mod.rs`):
  - `Avx2Scanner` (gated by the `avx2` cargo feature, default-on) when both `avx2` and `pclmulqdq` are detected at runtime.
  - `ScalarScanner` otherwise.
- **String validator** (`src/validate/strings/mod.rs`):
  - AVX2 PSHUFB classifier when `avx2` is detected.
  - Scalar state machine otherwise.
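
The first-use selection described in the list above can be sketched roughly as follows. This is an illustration only, not the code in `src/scan/mod.rs`: `ScannerKind`, `detect_scanner`, and `scanner_kind` are hypothetical names, the standard library's `OnceLock` stands in for the `OnceCell` named in the text, and the `avx2` cargo-feature gate is left out.

```rust
use std::sync::OnceLock;

/// Hypothetical tag for the two scanners named in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ScannerKind {
    Avx2,
    Scalar,
}

/// Probe the CPU. The AVX2 scanner is chosen only when both `avx2` and
/// `pclmulqdq` are reported at runtime; everything else gets the scalar one.
fn detect_scanner() -> ScannerKind {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2")
            && std::arch::is_x86_feature_detected!("pclmulqdq")
        {
            return ScannerKind::Avx2;
        }
    }
    ScannerKind::Scalar
}

/// Cached choice: the probe runs on the first call, later calls read the cell.
fn scanner_kind() -> ScannerKind {
    static KIND: OnceLock<ScannerKind> = OnceLock::new();
    *KIND.get_or_init(detect_scanner)
}
```

A caller would branch on `scanner_kind()` once per parse; because the result is cached, the feature probe is not repeated on the hot path.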

Validation level depends on `qjson_options.mode`. **EAGER** (default): a post-scan pass walks `indices` and validates RFC 8259 number ABNF, string content (no unescaped control chars), and UTF-8 — parse fails on any value-level violation. **LAZY** (opt-in): bracket/quote balance + max-depth only; value-level errors surface when the offending field is accessed (lua-cjson-equivalent behavior). Trailing-content rejection and value-level validation are eager-only; max-depth (default 1024, configurable up to 4096) is enforced in both modes.
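
As a concrete reading of the "RFC 8259 number ABNF" check mentioned above, here is a minimal sketch of a validator for a single number token. The function name `is_rfc8259_number` is hypothetical, and the project's fused validator may well be structured differently; the sketch only shows what the grammar accepts and rejects.

```rust
/// Returns true when `s` is exactly one RFC 8259 number:
///   number = [ "-" ] int [ frac ] [ exp ]
fn is_rfc8259_number(s: &[u8]) -> bool {
    // Count the run of ASCII digits starting at `from`.
    let digits = |from: usize| s[from..].iter().take_while(|b| b.is_ascii_digit()).count();
    let mut i = 0;

    // Optional leading minus (a leading plus is not allowed).
    if s.first() == Some(&b'-') {
        i += 1;
    }
    // int = "0" / ( digit1-9 *DIGIT ): no leading zeros.
    match s.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => i += digits(i),
        _ => return false,
    }
    // frac = "." 1*DIGIT
    if s.get(i) == Some(&b'.') {
        let n = digits(i + 1);
        if n == 0 {
            return false;
        }
        i += 1 + n;
    }
    // exp = ( "e" / "E" ) [ "-" / "+" ] 1*DIGIT
    if matches!(s.get(i), Some(b'e' | b'E')) {
        i += 1;
        if matches!(s.get(i), Some(b'+' | b'-')) {
            i += 1;
        }
        let n = digits(i);
        if n == 0 {
            return false;
        }
        i += n;
    }
    // Leftover bytes mean the slice was not a single number.
    i == s.len()
}
```

Under this grammar `-0`, `1.5e-3`, and `1.0E5` are accepted, while `01`, `1.`, `.5`, and `+1` are rejected. In EAGER mode such inputs would fail at parse time; in LAZY mode they would only surface when the field is accessed.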

@@ -72,6 +76,7 @@ src/
cursor.rs Cursor + path resolution + skip-cache walk
path.rs zero-alloc path-string iterator
decode/ lazy string / number decode
validate/ post-scan validators: validate_eager_fused, depth, strings, numbers
scan/ ScalarScanner, Avx2Scanner, runtime dispatch
skip_cache.rs Phase 2 sibling-skip cache
error.rs qjson_err + qjson_type enums (must stay in sync with include/qjson.h and lua/qjson.lua)
22 changes: 14 additions & 8 deletions README.md
@@ -4,7 +4,7 @@ Rust-implemented fast JSON decoder exposed to LuaJIT via FFI. Optimized for the

## Status

Initial implementation complete: scalar + AVX2/PCLMUL + ARM64 NEON/PMULL structural scanner (runtime-dispatched), root-path and cursor APIs, escape-decoded strings, integer/float/bool/typeof/len, FFI panic barrier, and a LuaJIT wrapper. Rust unit/integration tests and Lua busted tests run in CI. The benchmark harness compares against lua-cjson and lua-resty-simdjson.
Scalar + AVX2/PCLMUL + ARM64 NEON/PMULL structural scanner (runtime-dispatched), root-path and cursor APIs, escape-decoded strings, integer/float/bool/typeof/len, FFI panic barrier, and a LuaJIT wrapper. Eager validation uses a fused single-pass grammar state machine with a PSHUFB nibble-LUT byte classifier for string validation. Rust unit/integration tests and Lua busted tests run in CI. The benchmark harness compares against lua-cjson and lua-resty-simdjson.

## Building

@@ -100,23 +100,29 @@ LD_LIBRARY_PATH="$PWD/target/release" \

`qjson` vs. `lua-cjson` and `lua-resty-simdjson` on multimodal
chat-completion payloads, "parse + access model, temperature, and all
messages[*].content paths" workload (median ops/s under OpenResty LuaJIT 2.1,
AMD EPYC Rome (Zen 2, 4 vCPUs); 5 rounds, deterministic payload):
messages[*].content paths" workload (mean ops/s under OpenResty LuaJIT 2.1,
AMD EPYC Rome (Zen 2, 4 vCPUs); 10 rounds, deterministic payload):

| Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | speedup vs. cjson |
|---:|---:|---:|---:|---:|---:|
| 2 KB | 94,075 | 108,108 | 127,214 | 120,398 | 1.4× / 1.3× |
| 60 KB | 9,041 | 83,043 | 123,487 | 214,500 | 13.7× / 23.7× |
| 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 20.7× / 19.3× |
| 1 MB | 517 | 3,538 | 16,520 | 16,988 | 32.0× / 32.9× |
| 10 MB | 50 | 402 | 1,899 | 1,918 | 38.0× / 38.4× |
| 2 KB | 100,127 | 109,588 | 130,867 | 105,038 | 1.3× / 1.0× |
| 60 KB | 8,701 | 77,936 | 135,700 | 177,650 | 15.6× / 20.4× |
| 100 KB (CJK) | 2,203 | 2,367 | 4,965 | 5,363 | 2.3× / 2.4× |
| 100 KB | 4,985 | 32,232 | 130,621 | 125,348 | 26.2× / 25.1× |
| 1 MB | 498 | 3,697 | 15,831 | 15,784 | 31.8× / 31.7× |
| 10 MB | 50 | 383 | 1,473 | 1,548 | 29.5× / 31.0× |

`qjson.parse` wins because it skips building a Lua table for the parts you
never read; `qjson.decode + t.field` adds a cjson-shaped table proxy on top
with similar throughput. Memory retention for `qjson` is essentially
flat in payload size (a few KB for the reusable buffers), while `cjson`
and `simdjson` retain more Lua heap because they materialize the table tree.

The eager validation path (fused single-pass grammar + PSHUFB string
classifier) yields a **13–15% throughput improvement** on 1 MB payloads,
measured at the Rust level. See [`docs/benchmarks.md`](docs/benchmarks.md)
for the micro-benchmark data.
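
To make the nibble-LUT idea concrete, the sketch below emulates it one byte at a time in scalar Rust. It is an assumption-laden illustration, not the contents of `src/validate/classify.rs`: the class bits (`CTRL`, `QUOTE`, `BSLASH`, `HIGH`) and the helper names are invented here. In an AVX2 version the two table lookups would typically be `PSHUFB` instructions (`_mm256_shuffle_epi8`) applied to 32 bytes at once, followed by a vector AND.

```rust
// Class bits (hypothetical layout chosen for this sketch).
const CTRL: u8 = 1 << 0; // 0x00..=0x1F, not allowed unescaped inside strings
const QUOTE: u8 = 1 << 1; // '"'
const BSLASH: u8 = 1 << 2; // '\\'
const HIGH: u8 = 1 << 3; // >= 0x80, needs UTF-8 validation

/// Build the two 16-entry tables, indexed by high and low nibble.
/// A class bit survives the AND only when both entries carry it.
fn build_luts() -> ([u8; 16], [u8; 16]) {
    let mut hi = [0u8; 16];
    let mut lo = [0u8; 16];
    // CTRL: high nibble 0 or 1, any low nibble.
    hi[0x0] |= CTRL;
    hi[0x1] |= CTRL;
    // HIGH: high nibble 8..=15, any low nibble.
    for entry in &mut hi[0x8..] {
        *entry |= HIGH;
    }
    // Classes that ignore the low nibble are enabled in every low entry.
    for entry in lo.iter_mut() {
        *entry |= CTRL | HIGH;
    }
    // QUOTE is 0x22 and BSLASH is 0x5C: one (high, low) pair each.
    hi[0x2] |= QUOTE;
    lo[0x2] |= QUOTE;
    hi[0x5] |= BSLASH;
    lo[0xC] |= BSLASH;
    (hi, lo)
}

/// Scalar stand-in for the two shuffles plus AND.
fn classify(b: u8, hi: &[u8; 16], lo: &[u8; 16]) -> u8 {
    hi[(b >> 4) as usize] & lo[(b & 0x0F) as usize]
}
```

Each class gets its own bit, so every byte's class mask comes from two 16-entry lookups and one AND, with no per-byte branching.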

See [`docs/benchmarks.md`](docs/benchmarks.md) for the full size ladder,
memory numbers, an "encode round-trip" row (passthrough emit via
`memcpy`), exact environment, and the reproduction command. `make bench`
107 changes: 104 additions & 3 deletions benches/lua_bench.lua
@@ -140,7 +140,7 @@ local function make_payload(target_bytes)
.. table.concat(messages, ",") .. ']}'
end

local ROUNDS = 5
local ROUNDS = 10

local function bench(name, iters, fn)
-- Warmup pass: lets JIT compile hot traces and any one-time pools fill
@@ -220,6 +220,105 @@ local function default_table_access(t)
end
end

-- Safe UTF-8 truncation: longest prefix of at most len bytes that does not
-- end inside a multi-byte sequence (assumes s is valid UTF-8).
local function safe_sub(s, len)
  if #s <= len then return s end
  local pos = len
  -- Back up to the last byte that is not a continuation byte.
  while pos > 0 and s:byte(pos) >= 0x80 and s:byte(pos) < 0xC0 do pos = pos - 1 end
  if pos == 0 then return "" end
  local lead = s:byte(pos)
  local need = 0 -- continuation bytes required by the lead byte
  if lead >= 0xF0 then need = 3
  elseif lead >= 0xE0 then need = 2
  elseif lead >= 0xC2 then need = 1
  end
  -- Drop the final sequence if it is cut short; otherwise keep it whole.
  if len - pos < need then return s:sub(1, pos - 1) end
  return s:sub(1, pos + need)
end

-- CJK GitHub-issues payload: same 20-field structure as github-100k but
-- with Chinese text and emoji in body/title/labels. Directly comparable
-- to github-100k — isolates the UTF-8 / high-bit byte impact.
local function make_cjk_payload(target_bytes)
  local issues = {}
  local current = 2
  local n = 1
  local cjk_body = "这是一段用于模拟GitHub Issues中文描述的测试文本包含常见的开发术语问题报告功能请求以及Bug修复记录"
    .. "😀🎉💡✨🚀🌟🔥🎊💯👍❤️🌍📱🎵🏆🍕🎮📚💻🔑🎁"
  local cjk_title = "修复用户登录页面在移动端的显示问题并优化响应式布局"
  while current < target_bytes do
    local labels = {}
    local label_count = (n % 4)
    local label_names = { "缺陷bug", "功能增强", "文档优化", "性能改进" }
    for i = 1, label_count do
      labels[#labels + 1] = string.format(
        [[{"id":%d,"name":"%s","color":"%06x","description":"标签分类描述"}]],
        10000 + n * 10 + i, label_names[i], (n * 12345 + i) % 0xFFFFFF)
    end
    -- Use whole multiples of cjk_body to avoid UTF-8 truncation
    local reps = 1 + (n % 3)
    local body = string.rep(cjk_body, reps)
    local issue = string.format([[{
"id":%d,
"number":%d,
"title":"%s #%d",
"body":"%s",
"state":"%s",
"locked":%s,
"comments":%d,
"user":{"login":"用户%d","id":%d,"avatar_url":"https://avatars.githubusercontent.com/u/%d?v=4","type":"用户","site_admin":false},
"labels":[%s],
"assignees":[],
"milestone":null,
"created_at":"2024-%02d-%02dT%02d:%02d:%02dZ",
"updated_at":"2024-%02d-%02dT%02d:%02d:%02dZ",
"closed_at":null,
"author_association":"贡献者",
"html_url":"https://github.com/example/中文仓库/issues/%d",
"url":"https://api.github.com/repos/example/中文仓库/issues/%d",
"repository_url":"https://api.github.com/repos/example/中文仓库",
"labels_url":"https://api.github.com/repos/example/中文仓库/issues/%d/labels{/名称}",
"comments_url":"https://api.github.com/repos/example/中文仓库/issues/%d/评论",
"events_url":"https://api.github.com/repos/example/中文仓库/issues/%d/事件"
}]],
      1000000 + n, n, cjk_title, n, body,
      n % 3 == 0 and "已关闭" or "进行中",
      n % 7 == 0 and "true" or "false",
      n % 50, n % 100, 100000 + n, 100000 + n,
      table.concat(labels, ","),
      (n % 12) + 1, (n % 28) + 1, n % 24, n % 60, n % 60,
      (n % 12) + 1, (n % 28) + 1, (n + 1) % 24, (n + 5) % 60, (n + 10) % 60,
      n, n, n, n, n)
    issue = issue:gsub("\n", "")
    if current + #issue + 3 > target_bytes then break end
    issues[#issues + 1] = issue
    current = current + #issue + 1
    n = n + 1
  end
  return "[" .. table.concat(issues, ",") .. "]"
end

local function cjk_qjson_access(d)
  if not d then return end
  local _ = d:get_i64("[0].id")
  local _ = d:get_str("[0].title")
  local _ = d:get_str("[0].user.login")
end

local function cjk_table_access(t)
  local _ = t[1] and t[1].id
  local _ = t[1] and t[1].title
  local _ = t[1] and t[1].user and t[1].user.login
end

local function cjk_cjson_access(obj)
  local _ = obj[1] and obj[1].id
  local _ = obj[1] and obj[1].title
  local _ = obj[1] and obj[1].user and obj[1].user.login
end

-- GitHub issues accessors: array of issues, access first issue's fields
local function github_cjson_access(obj)
local _ = obj[1] and obj[1].id
@@ -242,8 +341,10 @@ end
local scenarios = {
{name = "small", iters = 5000, payload = read_file("benches/fixtures/small_api.json")},
{name = "medium", iters = 500, payload = read_file("benches/fixtures/medium_resp.json")},
{name = "github-100k", iters = 100, payload = make_github_issues_payload(100 * 1024),
{name = "github-100k", iters = 100, payload = make_github_issues_payload(100 * 1024),
cjson_access = github_cjson_access, qjson_access = github_qjson_access, table_access = github_table_access},
{name = "cjk-100k", iters = 100, payload = make_cjk_payload(100 * 1024),
cjson_access = cjk_cjson_access, qjson_access = cjk_qjson_access, table_access = cjk_table_access},
{name = "100k", iters = 100, payload = make_payload(100 * 1024)},
{name = "200k", iters = 50, payload = make_payload(200 * 1024)},
{name = "500k", iters = 20, payload = make_payload(500 * 1024)},
@@ -275,7 +376,7 @@ for _, s in ipairs(scenarios) do
cjson_access(obj)
end)

if simdjson then
if simdjson and not s.no_simdjson then
bench("simdjson.decode + access fields", s.iters, function()
local obj = simdjson:decode(s.payload)
cjson_access(obj)