proposal: streaming evaluation#15
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request proposes a design for streaming evaluation to optimize policy enforcement by avoiding full body buffering when only specific fields are required. The feedback focuses on architectural improvements for efficiency and binary size, specifically suggesting the use of a trie for path matching, refining the streaming event model to minimize allocations, and utilizing a shared lexer to reduce code duplication in the JSON parser.
| ```zig | ||
| const BodyDeps = struct { | ||
| refs_body: bool = false, | ||
| refs_paths: std.ArrayList([]const []const u8), // prefix tree |
There was a problem hiding this comment.
The refs_paths field is described as a "prefix tree" but typed as a flat ArrayList of paths. For efficient matching during streaming, especially if a policy has many references, a true trie or prefix tree structure would be more appropriate to allow
| pub const StreamEvent = union(enum) { | ||
| enter_object: []const u8, // path so far | ||
| leave_object: void, | ||
| field: struct { path: [][]const u8, value: Value }, | ||
| done: void, | ||
| }; |
There was a problem hiding this comment.
The StreamEvent design appears to mix hierarchical and flat approaches. It provides enter_object but also includes a full path in each field event, which is redundant and expensive to allocate during streaming. Furthermore, using the recursive Value type for fields implies that nested objects are still fully buffered before being emitted. A more efficient streaming API would emit separate events for container boundaries and only use Value for scalar leaves.
| - The streaming parser nearly doubles the size of `src/json.zig`. | ||
| Worth running a sizing experiment before committing: does | ||
| `--release=small` keep zopa.wasm under 80 KB with both paths | ||
| present? |
There was a problem hiding this comment.
Configure-time analysis pass that categorises a compiled module by
how it references the request body:
- no_body_refs: policy never touches input.body
- prefix_only: policy reads specific body sub-paths
- full_tree: policy reads input.body whole or iterates over a
body-rooted ref (full body must be buffered)
Conservative classifier: when uncertain, returns full_tree.
Tests cover header-only, body-leaf, bare body, body iteration, and
the bare 'body' shorthand (no input prefix). prefix_count returns
the number of distinct body sub-paths so callers can size buffers
accordingly.
Streaming evaluation (per docs/proposals/streaming-evaluation.md)
needs the body-aware callback path landing first. This analyser is
the configure-time piece; it ships independently and the proxy-wasm
shim can call it in a follow-up to skip proxy_on_request_body when
the policy doesn't need the body.
ci.yml gains a test-unit job (parity with feat/string-builtins,
feat/composite-ref-iteration, etc.).
918ed0b to
2198054
Compare
Design doc only. Tracks the streaming-evaluation item from ROADMAP longer term.
See
docs/proposals/streaming-evaluation.md.Summary:
src/json.zigthat emits(path, value)events; evaluator decides as soon as every referenced prefix is resolved.proxy_on_request_bodyentirely. Prefix-only policies short-circuit before full buffering.Status: design only, no implementation. Depends on body-aware-policies (#6) landing first.