perf: fuse eager validation passes + PSHUFB string classifier #52

Open
membphis wants to merge 17 commits into main from fuse-eager-passes

Conversation

@membphis
Collaborator

Summary

Fuse 3 post-scan eager validation passes (validate_depth, validate_trailing, validate_eager_values) into a single O(indices) traversal, and replace the old AVX2 string validator with a PSHUFB nibble-LUT byte classifier.

Changes

  • Pass fusion (validate/mod.rs, doc.rs): merge depth checking and trailing-content detection into the grammar state machine. Eliminates 2 redundant indices traversals.
  • PSHUFB classifier (classify.rs): new shared byte-classification module using _mm256_shuffle_epi8 nibble-LUT. Classifies all 32 bytes in a chunk simultaneously instead of doing 3 separate SIMD compares.
  • AVX2 string validator rewrite (strings/avx2.rs): uses classifier instead of "find-first-interesting-then-scalar" approach. Escape sequences and UTF-8 triggers are processed in-batch.
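The nibble-LUT idea behind the classifier can be modelled without SIMD. The sketch below is a scalar illustration, not the code in classify.rs: the class-bit layout (`CTRL`, `BS`, `HIGH`) and both table contents are assumptions chosen to match the three "interesting" byte kinds named in this PR. Each byte is split into two nibbles, each nibble indexes a 16-entry table, and the AND of the two lookups yields the class bits. A PSHUFB-based version performs the same two lookups for 32 bytes at once.

```rust
// Class bits (assumed layout for illustration; the PR's tables may differ).
const CTRL: u8 = 1 << 0; // control characters, bytes below 0x20
const BS: u8 = 1 << 1; // backslash, 0x5C
const HIGH: u8 = 1 << 2; // high-bit bytes, 0x80 and above

// CTRL and HIGH are decided by the high nibble alone, so every low-nibble
// entry must let them through.
const ANY: u8 = CTRL | HIGH;

// Indexed by the low nibble. Only 0xC can complete a backslash (0x5C).
const LO_TABLE: [u8; 16] = [
    ANY, ANY, ANY, ANY, ANY, ANY, ANY, ANY,
    ANY, ANY, ANY, ANY, ANY | BS, ANY, ANY, ANY,
];

// Indexed by the high nibble: 0x0 and 0x1 are controls, 0x5 may be a
// backslash, 0x8..=0xF are high-bit bytes.
const HI_TABLE: [u8; 16] = [
    CTRL, CTRL, 0, 0, 0, BS, 0, 0,
    HIGH, HIGH, HIGH, HIGH, HIGH, HIGH, HIGH, HIGH,
];

/// Scalar model of what one byte lane computes: two table lookups and an AND.
fn classify_byte(b: u8) -> u8 {
    LO_TABLE[(b & 0x0F) as usize] & HI_TABLE[(b >> 4) as usize]
}

/// Builds a per-chunk attention mask: bit `i` is set when `chunk[i]`
/// carries any class bit and therefore needs further validation.
fn attention_mask(chunk: &[u8; 32]) -> u32 {
    let mut mask = 0u32;
    for (i, &b) in chunk.iter().enumerate() {
        if classify_byte(b) != 0 {
            mask |= 1u32 << i;
        }
    }
    mask
}
```

A chunk of plain printable ASCII produces a zero mask under these tables, which is the case where a validator can advance a full 32 bytes without further work.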

Performance (1MB JSON, 10-run avg ± stddev)

| Payload | Before | After | Improvement |
|---|---:|---:|---:|
| GitHub-style REST API (pure ASCII) | 1688 ± 97 us | 1462 ± 39 us | 13.4% |
| Escape-heavy (`\n` `\t` `\\` `\uXXXX`) | 912 ± 77 us | 776 ± 30 us | 14.9% |

Testing

  • All 312 existing tests pass (default features + scalar-only)
  • Clippy clean (-D warnings)
  • Scanner crosscheck (scalar vs AVX2) passes
  • JSONTestSuite conformance preserved
  • Third-party fixture tests (cJSON, simdjson) pass

Copilot AI review requested due to automatic review settings May 22, 2026 17:32
@coderabbitai

coderabbitai Bot commented May 22, 2026

Warning

Rate limit exceeded

@membphis has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 58 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8d5ed18c-383d-4305-bcce-5169c1d3b528

📥 Commits

Reviewing files that changed from the base of the PR and between 8e49d18 and 3f6822a.

📒 Files selected for processing (9)
  • CLAUDE.md
  • README.md
  • benches/lua_bench.lua
  • docs/benchmarks.md
  • src/decode/number.rs
  • src/doc.rs
  • src/validate/classify.rs
  • src/validate/mod.rs
  • src/validate/strings/avx2.rs

membphis added 6 commits May 22, 2026 17:34
Merge validate_depth, validate_trailing, and validate_eager_values into a
single fused pass. Replace AVX2 string validation with PSHUFB nibble-LUT
byte classifier. Add AVX-512 dual path. Add SIMD number validation.
8 tasks: PSHUFB classifier, AVX2 string rewrite, AVX-512 path,
SIMD number validation, pass fusion, doc.rs wiring, full test
verification, CLAUDE.md update.
Add #[allow(dead_code)] to classify.rs module and to validate_trailing /
validate_eager_values (kept for tests and planned future use). Fix
overindented doc list item.

Copilot AI left a comment


Pull request overview

This PR optimizes qjson’s eager validation path by (1) fusing multiple post-scan validation passes into a single traversal over the scanner’s indices, and (2) replacing the AVX2 string fast-path with a PSHUFB (nibble-LUT) byte classifier shared across validation routines.

Changes:

  • Add validate_eager_fused to merge depth checking, trailing-content detection, and grammar/value validation into one O(indices) walk, and wire it into Document::parse_with_options for eager mode.
  • Introduce a new PSHUFB nibble-LUT classifier module and rewrite AVX2 string validation to use its attention mask.
  • Add design/plan documentation for the SIMD + pass-fusion work; update .gitignore.
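To illustrate what fusing the passes means, here is a minimal sketch that checks nesting depth, bracket balance, and trailing content in one walk over the structural positions. It is not the PR's `validate_eager_fused`: the `ValidateError` enum, the `validate_fused` name, and the `max_depth` parameter are hypothetical, and `indices` is assumed to hold valid, ascending offsets of structural bytes in `buf`. String, number, and full grammar checks are omitted.

```rust
#[derive(Debug, PartialEq)]
enum ValidateError {
    DepthExceeded,
    Unbalanced,
    TrailingContent,
}

fn is_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Single walk over `indices`: depth limit, bracket matching, and
/// trailing-content detection share one loop instead of three.
fn validate_fused(
    buf: &[u8],
    indices: &[usize],
    max_depth: usize,
) -> Result<(), ValidateError> {
    let mut stack: Vec<u8> = Vec::new();
    let mut root_end: Option<usize> = None;

    for &pos in indices {
        // Any structural byte after the root value has closed is trailing.
        if root_end.is_some() {
            return Err(ValidateError::TrailingContent);
        }
        match buf[pos] {
            open @ (b'{' | b'[') => {
                if stack.len() == max_depth {
                    return Err(ValidateError::DepthExceeded);
                }
                stack.push(open);
            }
            close @ (b'}' | b']') => {
                let want = if close == b'}' { b'{' } else { b'[' };
                if stack.pop() != Some(want) {
                    return Err(ValidateError::Unbalanced);
                }
                if stack.is_empty() {
                    root_end = Some(pos + 1);
                }
            }
            // Commas, colons, and quotes: a full validator would run its
            // grammar and value checks here.
            _ => {}
        }
    }

    if !stack.is_empty() {
        return Err(ValidateError::Unbalanced);
    }
    // Non-structural bytes after the root must be whitespace only.
    if let Some(end) = root_end {
        if buf[end..].iter().any(|&b| !is_ws(b)) {
            return Err(ValidateError::TrailingContent);
        }
    }
    Ok(())
}
```

For example, `validate_fused(b"[] x", &[0, 1], 8)` reports trailing content, and `validate_fused(b"[[[]]]", &[0, 1, 2, 3, 4, 5], 2)` reports a depth violation, both without a second traversal.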

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| src/validate/strings/avx2.rs | Reworks AVX2 string-span validation to iterate “interesting” bytes using the classifier-derived mask. |
| src/validate/mod.rs | Adds validate_eager_fused and test coverage for fused behavior; keeps older passes for lazy mode/tests. |
| src/validate/classify.rs | New shared PSHUFB nibble-LUT classifier + exhaustive LUT correctness tests. |
| src/doc.rs | Switches eager parsing to the fused validator; retains lazy depth-only validation. |
| docs/superpowers/specs/2026-05-22-fuse-eager-simd-design.md | Design spec describing pass fusion + SIMD classifier approach. |
| docs/superpowers/plans/2026-05-22-fuse-eager-simd-plan.md | Implementation plan/checklist for the overall optimization effort. |
| .gitignore | Adds an ignore rule for docs/superpowers/specs/. |


Comment thread src/validate/mod.rs
Comment on lines +238 to +248
strings::validate_string_span(&buf[pos + 1 .. close])?;

let cur = stack.last_mut().ok_or(qjson_err::QJSON_PARSE_ERROR)?;
match *cur {
CtxKind::ObjAfterOpen | CtxKind::ObjAfterComma => {
*cur = CtxKind::ObjAfterKey;
}
CtxKind::Top
| CtxKind::ArrAfterOpen
| CtxKind::ArrAfterComma
| CtxKind::ObjAfterColon => {
Comment thread src/validate/mod.rs
Comment on lines +269 to +283
// token, validate it, then check for trailing content beyond it.
if matches!(*stack.last().unwrap(), CtxKind::Top) && depth == 0 {
let mut scan = prev_end;
while scan < buf.len() && is_ws(buf[scan]) { scan += 1; }
if scan < buf.len() {
let mut end = scan;
while end < buf.len() && !is_ws(buf[end]) { end += 1; }
validate_scalar(&buf[scan..end])?;
*stack.last_mut().unwrap() = CtxKind::TopDone;

let mut p = end;
while p < buf.len() && is_ws(buf[p]) { p += 1; }
if p < buf.len() {
return Err(qjson_err::QJSON_TRAILING_CONTENT);
}
Comment thread src/validate/classify.rs
Comment on lines +175 to +178
pub(crate) unsafe fn classify_str_mask(chunk: __m256i) -> u32 {
let lo_lut = make_lut(&STR_LO_TABLE);
let hi_lut = make_lut(&STR_HI_TABLE);
let classes = classify_chunk(chunk, lo_lut, hi_lut);
Comment on lines +8 to +14
The eager decode path (`Document::parse_with_options` in `src/doc.rs`) runs **3 independent passes** over the `indices` array after structural scanning:

1. `validate_depth` — depth counting
2. `validate_trailing` — reject trailing non-whitespace
3. `validate_eager_values` — grammar state machine + string validation + number validation

Each pass is a scalar O(indices) walk. Additionally, string validation SIMD (`strings/avx2.rs`) is conservative: it hands off to scalar on the *first* interesting byte found (backslash, control, or high-bit), leaving most of the SIMD register width unused on mixed content. Number validation has no SIMD path at all.
Comment on lines +54 to +64
| 3 | Printable ASCII (0x20..0x7E, excluding backslash) |

**Algorithm per 32-byte chunk:**
1. Split each byte into high-nibble and low-nibble via shift + mask.
2. `_mm256_shuffle_epi8(lo_lut, lo_nibble)` and `_mm256_shuffle_epi8(hi_lut, hi_nibble)` (table first, nibble indices second).
3. AND low and high LUT results → per-byte class bitmask.
4. If any bit 0 set → `QJSON_INVALID_STRING` (control char).
5. If bits 1 and 2 are zero → pure printable ASCII, advance 32 bytes.
6. Otherwise: scan class bitmask for backslash positions, validate escape sequences; for high-bit bytes, run SIMD-enhanced UTF-8 validation.

Key improvement: the classifier tells us **exactly which bytes need what kind of attention**, rather than a binary "there's a problem here". Multiple backslashes in one chunk are all located without re-scanning. High-bit bytes are identified by position, enabling batch UTF-8 validation.
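Locating every interesting byte "without re-scanning" typically comes down to walking the set bits of the 32-bit mask. The helper below is a hedged sketch of that idiom, not code from the PR; `set_bit_positions` is a hypothetical name. It reads the lowest set bit with `trailing_zeros` and clears it with `mask & (mask - 1)`, so the loop runs once per flagged byte rather than once per byte in the chunk.

```rust
/// Returns the byte offsets (0..32) whose bit is set in `mask`, lowest first.
fn set_bit_positions(mut mask: u32) -> Vec<u32> {
    let mut out = Vec::new();
    while mask != 0 {
        // Index of the lowest set bit = offset of the next flagged byte.
        out.push(mask.trailing_zeros());
        // Clear the lowest set bit; safe because mask is non-zero here.
        mask &= mask - 1;
    }
    out
}
```

A validator would dispatch on the byte at each returned offset (escape handling for a backslash, UTF-8 checks for a high-bit byte) instead of collecting the offsets into a `Vec`; the vector is used here only to keep the example easy to test.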
membphis force-pushed the fuse-eager-passes branch from ddb53eb to c62b28d on May 22, 2026 17:38
Copilot AI review requested due to automatic review settings May 22, 2026 17:40
membphis added 2 commits May 22, 2026 17:41
- README: update Status section with fused validation + PSHUFB description
- benchmarks.md: add eager validation micro-benchmark section (13-15% improvement), update observation #3
- CLAUDE.md: update Phase 1 architecture and Layout sections
Note 13-15% eager validation improvement from fused pass + PSHUFB
classifier. Clarify that Lua bench numbers already include this.

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment thread src/validate/mod.rs
} else {
CtxKind::ArrAfterOpen
});
}
Comment thread src/validate/classify.rs
Comment on lines +175 to +182
pub(crate) unsafe fn classify_str_mask(chunk: __m256i) -> u32 {
let lo_lut = make_lut(&STR_LO_TABLE);
let hi_lut = make_lut(&STR_HI_TABLE);
let classes = classify_chunk(chunk, lo_lut, hi_lut);
let zero = _mm256_cmpeq_epi8(classes, _mm256_setzero_si256());
let zero_mask = _mm256_movemask_epi8(zero) as u32;
zero_mask ^ 0xFFFF_FFFF // invert: 1 = interesting
}
membphis added 2 commits May 22, 2026 17:46
Update README and benchmarks.md tables with actual throughput,
speedup, and memory delta numbers from make bench on current
branch (includes fused validation + decode deduplication).
- Increase ROUNDS from 5 to 10 for noise reduction
- Switch from median to mean ops/s across rounds
- Update all 3 tables (throughput, speedup, memory) with fresh
  make bench data on current branch
Copilot AI review requested due to automatic review settings May 22, 2026 17:54

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread src/validate/classify.rs
Comment on lines +175 to +178
pub(crate) unsafe fn classify_str_mask(chunk: __m256i) -> u32 {
let lo_lut = make_lut(&STR_LO_TABLE);
let hi_lut = make_lut(&STR_HI_TABLE);
let classes = classify_chunk(chunk, lo_lut, hi_lut);
Comment thread docs/benchmarks.md
Comment on lines 82 to +84
| Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 94,075 | 108,108 | 127,214 | 120,398 | 203,666 |
| medium | 60.4 KB | 9,041 | 83,043 | 123,487 | 214,500 | 214,408 |
| github-100k | 100 KB | 2,238 | 2,047 | 6,010 | 5,994 | 6,701 |
| 100k | 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 114,548 |
| 200k | 200 KB | 2,659 | 19,040 | 90,090 | 92,251 | 106,383 |
| 500k | 500 KB | 1,052 | 7,062 | 34,722 | 35,336 | 37,453 |
| 1m | 1.00 MB | 517 | 3,538 | 16,520 | 16,988 | 17,261 |
| 2m | 2.00 MB | 258 | 2,026 | 9,021 | 8,580 | 9,033 |
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 100,127 | 109,588 | 130,867 | 105,038 | 210,886 |
Comment thread docs/benchmarks.md Outdated
| 2m | 35.0× | 4.5× | 33.3× | 4.2× |
| 5m | 29.2× | 4.5× | 36.5× | 5.6× |
| 10m | 38.0× | 4.7× | 38.4× | 4.8× |
|---|---|---:|---:|---:|---:|
Comment thread docs/benchmarks.md
Comment on lines 117 to +119
| Scenario | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|
| small | +15,493 | +15,500 | +4,066 | +15,116 | +11,140 |
| medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 |
| github-100k | +12,018 | +3,527 | +14 | +536 | +230 |
| 100k | +485 | +748 | +67 | +692 | +229 |
| 200k | +392 | +523 | +34 | +346 | +112 |
| 500k | +577 | +630 | +14 | +139 | +45 |
| 1m | +1,082 | +1,121 | +10 | +104 | +34 |
| 2m | +1,155 | +1,248 | +14 | +208 | +45 |
| 5m | +1,316 | +1,538 | +14 | +400 | +45 |
| 10m | +1,583 | +2,014 | +14 | +708 | +45 |
| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 |
|---|---|---:|---:|---:|---:|---:|
| small | -2,359 | +8,055 | +8,159 | +8,643 | +2,701 |
membphis added 2 commits May 22, 2026 18:04
~100 KB array of objects with Chinese text, emoji, and mixed
ASCII/CJK field names. Stresses PSHUFB byte classifier (high-bit bytes)
and UTF-8 validation path.
Pre: 4,720 ops/s → Post: 5,064 ops/s (+7.3%)
100 KB array of objects with Chinese text and emoji. Stresses PSHUFB
UTF-8/high-bit classification. Pre-opt: 4,720 ops/s → Post-opt: 4,605
ops/s. qjson memory delta: +26 KB (cjson: +17 MB).
Copilot AI review requested due to automatic review settings May 22, 2026 18:06

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Comment thread src/validate/classify.rs
Comment on lines +175 to +181
pub(crate) unsafe fn classify_str_mask(chunk: __m256i) -> u32 {
let lo_lut = make_lut(&STR_LO_TABLE);
let hi_lut = make_lut(&STR_HI_TABLE);
let classes = classify_chunk(chunk, lo_lut, hi_lut);
let zero = _mm256_cmpeq_epi8(classes, _mm256_setzero_si256());
let zero_mask = _mm256_movemask_epi8(zero) as u32;
zero_mask ^ 0xFFFF_FFFF // invert: 1 = interesting
Comment thread docs/benchmarks.md Outdated
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
Comment thread docs/benchmarks.md Outdated
| 5m | 31.4× | 4.9× | 31.5× | 4.9× |
| 10m | 29.5× | 3.8× | 31.0× | 4.0× |

## Results — memory delta (KB retained after 5 rounds)
Comment thread docs/benchmarks.md Outdated
| 5m | +1,316 | +1,538 | +14 | +400 | +45 |
| 10m | +1,583 | +2,014 | +14 | +708 | +45 |
| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 |
|---|---|---:|---:|---:|---:|---:|
Comment thread docs/benchmarks.md
@@ -80,33 +80,35 @@ Numbers below come from one such run.
Each row is "parse + access request fields" on the named payload.
membphis added 2 commits May 22, 2026 18:23
- Replace safe_sub-based truncation with integer multiples of cjk_body
  to avoid splitting multi-byte sequences
- Skip simdjson for cjk scenario (no_simdjson flag)
- Add safe_sub utility (unused now, kept for potential future use)
- Update cjk-100k data: qjson.parse mean 5,018 ops/s (2.3x vs cjson)
Remove no_simdjson flag. simdjson mean 2,367 ops/s on cjk-100k.
qjson.parse/cjson = 2.3×, qjson.parse/simdjson = 2.1×.
Copilot AI review requested due to automatic review settings May 22, 2026 18:27

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Comment thread src/validate/mod.rs
Comment on lines +269 to +283
// token, validate it, then check for trailing content beyond it.
if matches!(*stack.last().unwrap(), CtxKind::Top) && depth == 0 {
let mut scan = prev_end;
while scan < buf.len() && is_ws(buf[scan]) { scan += 1; }
if scan < buf.len() {
let mut end = scan;
while end < buf.len() && !is_ws(buf[end]) { end += 1; }
validate_scalar(&buf[scan..end])?;
*stack.last_mut().unwrap() = CtxKind::TopDone;

let mut p = end;
while p < buf.len() && is_ws(buf[p]) { p += 1; }
if p < buf.len() {
return Err(qjson_err::QJSON_TRAILING_CONTENT);
}
Comment thread src/validate/classify.rs
Comment on lines +131 to +182
pub(crate) unsafe fn classify_chunk(chunk: __m256i, lo_lut: __m256i, hi_lut: __m256i) -> __m256i {
let nib_mask = _mm256_set1_epi8(0x0Fu8 as i8);

let lo_nibs = _mm256_and_si256(chunk, nib_mask);
let hi_shift = _mm256_srli_epi32::<4>(chunk);
let hi_nibs = _mm256_and_si256(hi_shift, nib_mask);

let lo_class = _mm256_shuffle_epi8(lo_lut, lo_nibs);
let hi_class = _mm256_shuffle_epi8(hi_lut, hi_nibs);

_mm256_and_si256(lo_class, hi_class)
}

/// Build a 32-byte `__m256i` from a 16-entry nibble LUT by duplicating
/// the table into both 128-bit lanes.
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
unsafe fn make_lut(table: &[u8; 16]) -> __m256i {
let t = table;
_mm256_setr_epi8(
t[0] as i8, t[1] as i8, t[2] as i8, t[3] as i8,
t[4] as i8, t[5] as i8, t[6] as i8, t[7] as i8,
t[8] as i8, t[9] as i8, t[10] as i8, t[11] as i8,
t[12] as i8, t[13] as i8, t[14] as i8, t[15] as i8,
t[0] as i8, t[1] as i8, t[2] as i8, t[3] as i8,
t[4] as i8, t[5] as i8, t[6] as i8, t[7] as i8,
t[8] as i8, t[9] as i8, t[10] as i8, t[11] as i8,
t[12] as i8, t[13] as i8, t[14] as i8, t[15] as i8,
)
}

/// Classify a 32-byte chunk for string validation.
///
/// Returns a bitmask (one bit per byte) where set bits indicate bytes
/// that have any interesting class bit (CTRL | BS | HIGH). Zero means
/// the entire chunk is pure printable ASCII without escapes or UTF-8.
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
#[target_feature(enable = "avx2")]
pub(crate) unsafe fn classify_str_chunk(chunk: __m256i) -> u32 {
classify_str_mask(chunk)
}

/// Returns a bitmask of bytes that match CTRL | BS | HIGH.
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
#[target_feature(enable = "avx2")]
pub(crate) unsafe fn classify_str_mask(chunk: __m256i) -> u32 {
let lo_lut = make_lut(&STR_LO_TABLE);
let hi_lut = make_lut(&STR_HI_TABLE);
let classes = classify_chunk(chunk, lo_lut, hi_lut);
let zero = _mm256_cmpeq_epi8(classes, _mm256_setzero_si256());
let zero_mask = _mm256_movemask_epi8(zero) as u32;
zero_mask ^ 0xFFFF_FFFF // invert: 1 = interesting
}
Comment thread docs/benchmarks.md
Comment on lines 76 to +84
Numbers below come from one such run.

## Results — throughput (median ops/s)

Each row is "parse + access request fields" on the named payload.

| Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 94,075 | 108,108 | 127,214 | 120,398 | 203,666 |
| medium | 60.4 KB | 9,041 | 83,043 | 123,487 | 214,500 | 214,408 |
| github-100k | 100 KB | 2,238 | 2,047 | 6,010 | 5,994 | 6,701 |
| 100k | 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 114,548 |
| 200k | 200 KB | 2,659 | 19,040 | 90,090 | 92,251 | 106,383 |
| 500k | 500 KB | 1,052 | 7,062 | 34,722 | 35,336 | 37,453 |
| 1m | 1.00 MB | 517 | 3,538 | 16,520 | 16,988 | 17,261 |
| 2m | 2.00 MB | 258 | 2,026 | 9,021 | 8,580 | 9,033 |
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 100,127 | 109,588 | 130,867 | 105,038 | 210,886 |
Comment thread docs/benchmarks.md
Comment on lines 82 to +88
| Scenario | Size | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 94,075 | 108,108 | 127,214 | 120,398 | 203,666 |
| medium | 60.4 KB | 9,041 | 83,043 | 123,487 | 214,500 | 214,408 |
| github-100k | 100 KB | 2,238 | 2,047 | 6,010 | 5,994 | 6,701 |
| 100k | 100 KB | 5,302 | 32,248 | 109,649 | 102,564 | 114,548 |
| 200k | 200 KB | 2,659 | 19,040 | 90,090 | 92,251 | 106,383 |
| 500k | 500 KB | 1,052 | 7,062 | 34,722 | 35,336 | 37,453 |
| 1m | 1.00 MB | 517 | 3,538 | 16,520 | 16,988 | 17,261 |
| 2m | 2.00 MB | 258 | 2,026 | 9,021 | 8,580 | 9,033 |
| 5m | 5.00 MB | 102 | 663 | 2,982 | 3,728 | 3,829 |
| 10m | 10.00 MB | 50 | 402 | 1,899 | 1,918 | 1,925 |
| interleaved (100k/200k/500k/1m, cycled) | — | 1,141 | 9,544 | 34,043 | 33,611 | 32,752 |
|---|---|---:|---:|---:|---:|---:|---:|
| small | 2.1 KB | 100,127 | 109,588 | 130,867 | 105,038 | 210,886 |
| medium | 60.4 KB | 8,701 | 77,936 | 135,700 | 177,650 | 164,142 |
| github-100k | 100 KB | 2,106 | 2,247 | 5,964 | 5,900 | 6,321 |
| cjk-100k | 99 KB | 2,203 | 2,367 | 4,965 | 5,363 | 6,063 |
| 100k | 100 KB | 4,985 | 32,232 | 130,621 | 125,348 | 145,613 |
Comment thread docs/benchmarks.md
Comment on lines 111 to +121
@@ -115,18 +117,19 @@
from the last round may still be included.

| Scenario | cjson | simdjson | `qjson.parse` | `qjson.decode + access content` | `qjson.decode + qjson.encode` |
|---|---:|---:|---:|---:|---:|
| small | +15,493 | +15,500 | +4,066 | +15,116 | +11,140 |
| medium | +1,955 | +2,660 | +333 | +1,114 | +1,120 |
| github-100k | +12,018 | +3,527 | +14 | +536 | +230 |
| 100k | +485 | +748 | +67 | +692 | +229 |
| 200k | +392 | +523 | +34 | +346 | +112 |
| 500k | +577 | +630 | +14 | +139 | +45 |
| 1m | +1,082 | +1,121 | +10 | +104 | +34 |
| 2m | +1,155 | +1,248 | +14 | +208 | +45 |
| 5m | +1,316 | +1,538 | +14 | +400 | +45 |
| 10m | +1,583 | +2,014 | +14 | +708 | +45 |
| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 |
|---|---|---:|---:|---:|---:|---:|
| small | -2,359 | +8,055 | +8,159 | +8,643 | +2,701 |
Comment thread docs/benchmarks.md Outdated
| 5m | +1,316 | +1,538 | +14 | +400 | +45 |
| 10m | +1,583 | +2,014 | +14 | +708 | +45 |
| interleaved | +3,356 | +4,404 | +268 | +2,771 | +897 |
|---|---|---:|---:|---:|---:|---:|
Comment thread docs/benchmarks.md
Comment on lines 97 to +105
### Speed-up vs. baselines

| Scenario | `qjson.parse` / cjson | `qjson.parse` / simdjson | `qjson.decode + access content` / cjson | `qjson.decode + access content` / simdjson |
|---|---:|---:|---:|---:|
| small | 1.4× | 1.2× | 1.3× | 1.1× |
| medium | 13.7× | 1.5× | 23.7× | 2.6× |
| github-100k | 2.7× | 2.9× | 2.7× | 2.9× |
| 100k | 20.7× | 3.4× | 19.3× | 3.2× |
| 200k | 33.9× | 4.7× | 34.7× | 4.8× |
| 500k | 33.0× | 4.9× | 33.6× | 5.0× |
| 1m | 32.0× | 4.7× | 32.9× | 4.8× |
| 2m | 35.0× | 4.5× | 33.3× | 4.2× |
| 5m | 29.2× | 4.5× | 36.5× | 5.6× |
| 10m | 38.0× | 4.7× | 38.4× | 4.8× |
|---|---|---:|---:|---:|---:|
| small | 1.3× | 1.2× | 1.0× | 1.0× |
| medium | 15.6× | 1.7× | 20.4× | 2.3× |
| github-100k | 2.8× | 2.7× | 2.8× | 2.6× |
| cjk-100k | 2.3× | 2.1× | 2.4× | 2.3× |
| 100k | 26.2× | 4.1× | 25.1× | 3.9× |
- Fused validator: check trailing content before string/scalar validation
  to preserve old validate_trailing error-code precedence
  (e.g. '\"\\q" x' → QJSON_TRAILING_CONTENT, not QJSON_INVALID_STRING)
- Fused validator: detect TopDone+structural as QJSON_TRAILING_CONTENT
  (e.g. '42 {}' → trailing, not PARSE_ERROR)
- classify_str_mask: precompute LUT vectors as 32-byte aligned statics,
  load with _mm256_load_si256 instead of rebuilding per call
- benchmarks.md: fix table separator column counts, update
  '5 rounds'→'10 rounds', 'median'→'mean' in section titles
- Add 3 regression tests for error-code precedence
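The "32-byte aligned statics" change mentioned in the commit message above can be sketched as follows. This is an assumption-laden illustration rather than the PR's code: `Aligned32`, `dup_lanes`, `HI_LUT`, and the table values are hypothetical. The table is duplicated into both 128-bit lanes at compile time, stored in a static with 32-byte alignment, and read back with the aligned load `_mm256_load_si256`, so nothing is rebuilt per call.

```rust
/// Wrapper that forces 32-byte alignment, as required by aligned 256-bit loads.
#[repr(C, align(32))]
struct Aligned32([u8; 32]);

/// Duplicates a 16-entry nibble table into both 128-bit lanes at compile time.
const fn dup_lanes(t: [u8; 16]) -> Aligned32 {
    let mut out = [0u8; 32];
    let mut i = 0;
    while i < 16 {
        out[i] = t[i];
        out[i + 16] = t[i];
        i += 1;
    }
    Aligned32(out)
}

// Hypothetical high-nibble table; the values in classify.rs may differ.
static HI_LUT: Aligned32 =
    dup_lanes([1, 1, 0, 0, 0, 2, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4]);

/// Loads the static with an aligned load and stores it back out, to show
/// that the precomputed vector matches the table bytes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn roundtrip_hi_lut() -> [u8; 32] {
    use std::arch::x86_64::{__m256i, _mm256_load_si256, _mm256_storeu_si256};
    let mut out = [0u8; 32];
    unsafe {
        // Aligned load: valid because `Aligned32` guarantees 32-byte alignment.
        let v = _mm256_load_si256(HI_LUT.0.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    }
    out
}

/// Safe entry point: returns `None` when AVX2 is not available at runtime.
#[cfg(target_arch = "x86_64")]
fn lut_via_simd() -> Option<[u8; 32]> {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: AVX2 support was just verified at runtime.
        Some(unsafe { roundtrip_hi_lut() })
    } else {
        None
    }
}

/// Fallback for other architectures so callers compile everywhere.
#[cfg(not(target_arch = "x86_64"))]
fn lut_via_simd() -> Option<[u8; 32]> {
    None
}
```

The design point is that `make_lut`-style construction moves from every call into a compile-time constant, leaving a single aligned load on the hot path.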
2 participants