refactor(codec)!: location/read/redact handler API#148

Merged: martsokha merged 8 commits into main from refactor/codec-redactions, May 19, 2026

@martsokha martsokha commented May 18, 2026

Summary

Ground-up redesign of the codec around locations as the cross-boundary identifier. Handlers no longer yield content tuples; they expose cheap identity (locations()), on-demand typed reads (read(&L)), and direct batch redaction (redact(Redactions<L, R>)).

Earlier commits on this branch landed Redactions<S, R> as a grouped, overlap-aware collection. This PR builds on that to remove the rest of the Span-based reading/editing pipeline.

Shape

Types

  • Located<L> { source: ContentSource, location: L } — handlers tag every location they emit with their ContentSource, so provenance travels alongside identity without polluting the location's PartialEq.
  • LocationStream<'a, L> — replaces SpanStream<'a, Id, Data>. Cheap to enumerate; no payload allocation.
  • Span<L, D> { source, location, data } — same name and shape as before, but not on handler traits. Codec utility for callers that want enumerate-with-content.
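The split between identity and provenance can be sketched as follows. This is a minimal illustration, not the crate's code: `ContentSource` as a numeric newtype, the fields of `TextLocation`, and the helpers `line_locations` and `same_identity` are all assumptions made up for the example.

```rust
/// Hypothetical stand-in for the codec's `ContentSource` (real definition not shown in this PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ContentSource(pub u64);

/// Hypothetical text location: a line index plus a byte range within that line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextLocation {
    pub line: usize,
    pub start: usize,
    pub end: usize,
}

/// Identity plus provenance. `source` lives outside `L`, so `L`'s own
/// `PartialEq` never takes provenance into account.
#[derive(Debug, Clone)]
pub struct Located<L> {
    pub source: ContentSource,
    pub location: L,
}

/// Enumerates identities only: no text is copied while iterating.
/// (Assumed helper; the real `LocationStream` type is not reproduced here.)
pub fn line_locations<'a>(
    source: ContentSource,
    lines: &'a [String],
) -> impl Iterator<Item = Located<TextLocation>> + 'a {
    lines.iter().enumerate().map(move |(line, text)| Located {
        source,
        location: TextLocation { line, start: 0, end: text.len() },
    })
}

/// Compares two located values by identity alone, ignoring which source emitted them.
pub fn same_identity<L: PartialEq>(a: &Located<L>, b: &Located<L>) -> bool {
    a.location == b.location
}
```

Keeping `source` on the wrapper rather than inside the location means two handlers can emit equal locations while a caller can still tell where each one came from.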

Handler capability traits (TextHandler / ImageHandler / AudioHandler)

trait TextHandler: Handler {
    fn locations(&self) -> LocationStream<'_, TextLocation>;
    async fn read(&self, loc: &TextLocation) -> Option<TextData>;
    async fn redact(&mut self, redactions: Redactions<TextLocation, TextRedaction>) -> Result<()>;
}

The *Transform blanket-impl traits and edit_* methods are gone. Byte-level replacement logic lives in pub(crate) helpers under transform/text/apply.rs and transform/image/apply.rs; handlers walk the Redactions collection and call the helper on the affected slice of their internal model.
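As a rough sketch of the three-method shape, here is a synchronous, in-memory line handler. It is a simplification under stated assumptions: the real trait is async and takes `Redactions<TextLocation, TextRedaction>`, whereas this version uses a plain `Vec` of grouped items, a `String` error, and the invented names `TextHandlerSketch`, `LineLocation`, and `PlainTextHandler`.

```rust
/// Hypothetical location: one location per line.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LineLocation(pub usize);

/// Byte range within a line plus the replacement text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextRedaction {
    pub start: usize,
    pub end: usize,
    pub output: String,
}

/// Synchronous stand-in for the capability trait (the real one is async).
pub trait TextHandlerSketch {
    fn locations(&self) -> Box<dyn Iterator<Item = LineLocation> + '_>;
    fn read(&self, loc: &LineLocation) -> Option<String>;
    fn redact(&mut self, redactions: Vec<(LineLocation, Vec<TextRedaction>)>) -> Result<(), String>;
}

pub struct PlainTextHandler {
    lines: Vec<String>,
}

impl PlainTextHandler {
    pub fn new(text: &str) -> Self {
        Self { lines: text.lines().map(str::to_owned).collect() }
    }
}

impl TextHandlerSketch for PlainTextHandler {
    // Identity only: nothing is cloned until `read` is called.
    fn locations(&self) -> Box<dyn Iterator<Item = LineLocation> + '_> {
        Box::new((0..self.lines.len()).map(LineLocation))
    }

    fn read(&self, loc: &LineLocation) -> Option<String> {
        self.lines.get(loc.0).cloned()
    }

    fn redact(&mut self, redactions: Vec<(LineLocation, Vec<TextRedaction>)>) -> Result<(), String> {
        for (loc, mut items) in redactions {
            let line = self
                .lines
                .get_mut(loc.0)
                .ok_or_else(|| format!("unknown location {}", loc.0))?;
            // Apply right to left so earlier byte offsets stay valid.
            // Assumes the items do not overlap (the collection enforces that upstream).
            items.sort_by(|a, b| b.start.cmp(&a.start));
            for r in items {
                let valid = r.start <= r.end
                    && line.is_char_boundary(r.start)
                    && line.is_char_boundary(r.end);
                if !valid {
                    return Err(format!("invalid range {}..{}", r.start, r.end));
                }
                line.replace_range(r.start..r.end, &r.output);
            }
        }
        Ok(())
    }
}
```

A caller would enumerate `locations()`, `read` only the ones it cares about, and hand back one grouped batch to `redact`.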

ContentHandle surface

  • text_locations() / image_locations() / audio_locations() — typed streams per modality.
  • read_text(&TextLocation) / read_image(&ImageLocation) / read_audio(&AudioLocation) — typed per-modality fetches. Replaces the unsound value_at(&Location) -> Option<String> that returned String even for image/audio locations.
  • apply_text_redactions / apply_image_redactions / apply_audio_redactions — unchanged from the caller's perspective.
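The point of the typed fetches can be illustrated with a small sketch: each modality gets its own location type and its own return type, so asking for an image can never come back as a `String`. Everything here is assumed for illustration (`HandleSketch`, the tuple-struct locations, the fields of `ImageData`, and `demo_handle`); only the method names `read_text` / `read_image` and `into_inner` come from the description.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct TextLocation(pub usize);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ImageLocation(pub usize);

/// Hypothetical typed payloads; the real `TextData` / `ImageData` are richer.
#[derive(Debug, Clone, PartialEq)]
pub struct TextData(pub String);

impl TextData {
    pub fn into_inner(self) -> String {
        self.0
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ImageData {
    pub width: u32,
    pub height: u32,
    pub pixels: Vec<u8>,
}

/// Stand-in for a content handle with one typed accessor per modality.
#[derive(Default)]
pub struct HandleSketch {
    text: HashMap<TextLocation, TextData>,
    images: HashMap<ImageLocation, ImageData>,
}

impl HandleSketch {
    /// A text location can only ever produce text data.
    pub fn read_text(&self, loc: &TextLocation) -> Option<TextData> {
        self.text.get(loc).cloned()
    }

    /// An image location can only ever produce image data.
    pub fn read_image(&self, loc: &ImageLocation) -> Option<ImageData> {
        self.images.get(loc).cloned()
    }
}

/// Builds a tiny handle with one text item and one 2x1 image.
pub fn demo_handle() -> HandleSketch {
    let mut h = HandleSketch::default();
    h.text.insert(TextLocation(0), TextData("Jane Doe".to_string()));
    h.images.insert(
        ImageLocation(0),
        ImageData { width: 2, height: 1, pixels: vec![0, 255] },
    );
    h
}
```

Passing an `ImageLocation` to `read_text` is a compile error in this shape, which is the property the old `value_at(&Location) -> Option<String>` could not offer.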

Layout changes

  • handler/tabular/ folded into handler/text/: CsvHandler and XlsxHandler implement TextHandler; there is no TabularHandler capability trait. Tabular redaction is a transform-layer concern, not a handler modality.
  • document/span.rs deleted, then reintroduced — but only as a codec utility (Span<L, D>), never on trait signatures.
  • transform/{text,image,audio,tabular}/transform.rs (the blanket-impl files) deleted. Replaced by apply.rs helpers.

Engine

  • Document gains collect_text/image/audio_locations + read_text/image/audio + a narrower value_at(&Location) -> Option<String> (text and audio only).
  • EntityRecognitionOp, PatternRecognitionOp, VisualExtractionOp, and ValidationOp walk locations() and build Vec<Span<L, D>> work lists via Span::from_located + read_*.
  • RedactionApplicator switches from value_at(&Location) to read_text(loc).into_inner() — typed, no enum dispatch.
  • Document::apply_tabular_redactions dropped; nothing in the engine drove it after TabularTransform's removal.
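The work-list pattern described for the engine ops might look roughly like this. It is a synchronous sketch with assumed details: `DocumentSketch`, its `Option<String>` storage, and the exact signature of `Span::from_located` are invented here, and the real `read_*` calls are async.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ContentSource(pub u64);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextLocation(pub usize);

#[derive(Debug, Clone)]
pub struct Located<L> {
    pub source: ContentSource,
    pub location: L,
}

/// Location plus payload, used only by callers that want both together.
#[derive(Debug, Clone, PartialEq)]
pub struct Span<L, D> {
    pub source: ContentSource,
    pub location: L,
    pub data: D,
}

impl<L, D> Span<L, D> {
    /// Assumed signature: attach freshly read data to a located identity.
    pub fn from_located(located: Located<L>, data: D) -> Self {
        Span { source: located.source, location: located.location, data }
    }
}

/// Hypothetical document; `None` models a location whose content cannot be read.
pub struct DocumentSketch {
    pub source: ContentSource,
    pub lines: Vec<Option<String>>,
}

impl DocumentSketch {
    pub fn collect_text_locations(&self) -> Vec<Located<TextLocation>> {
        (0..self.lines.len())
            .map(|i| Located { source: self.source, location: TextLocation(i) })
            .collect()
    }

    pub fn read_text(&self, loc: &TextLocation) -> Option<String> {
        self.lines.get(loc.0).cloned().flatten()
    }
}

/// Walks the locations, reads each one, and keeps only those that yielded data.
pub fn build_work_list(doc: &DocumentSketch) -> Vec<Span<TextLocation, String>> {
    doc.collect_text_locations()
        .into_iter()
        .filter_map(|located| {
            let data = doc.read_text(&located.location)?;
            Some(Span::from_located(located, data))
        })
        .collect()
}
```

The named struct replaces a `(Location, Data)` tuple, so downstream code reads `span.location` and `span.data` instead of destructuring positions.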

Style sweeps (workspace-wide)

Two parallel mechanical sweeps in the last commit:

  1. Rustdoc reference links: 83 inline [`Foo`](crate::path) forms across 51 files moved to bottom-of-docblock `[`Foo`]: crate::path` reference style.
  2. Inline imports: use statements inside function bodies, impl scopes, and cfg-gated inner blocks hoisted to top-of-file (with cfg-attr wrapping where needed). 14+ production sites; test-module-top and macro-body uses left alone.
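For readers unfamiliar with the two conventions, a small sketch of what the resulting style could look like. The items `ContentId`, `describe`, and `os_len` are invented for the example and do not exist in the workspace.

```rust
// Hoisted from a function body: the cfg that guarded the inline `use`
// now wraps the top-of-file import instead.
#[cfg(unix)]
use std::os::unix::ffi::OsStrExt;
use std::ffi::OsStr;

/// Opaque identifier used only in this example.
pub struct ContentId(pub u32);

/// Formats a [`ContentId`] through [`ToString`].
///
/// [`ContentId`]: crate::ContentId
/// [`ToString`]: std::string::ToString
pub fn describe(id: &ContentId) -> String {
    id.0.to_string()
}

/// Byte length of an OS string, using the Unix-only extension trait.
#[cfg(unix)]
pub fn os_len(s: &OsStr) -> usize {
    s.as_bytes().len()
}

/// Fallback for other platforms, via a lossy UTF-8 conversion.
#[cfg(not(unix))]
pub fn os_len(s: &OsStr) -> usize {
    s.to_string_lossy().len()
}
```

The doc comment keeps its prose free of paths, and the link targets sit together after a blank `///` line.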

Commit walkthrough

  • `refactor(codec)!: reshape handler API around locations and read/redact` — new types, traits, ContentHandle surface. Crate intentionally doesn't compile (concrete handlers not yet migrated).
  • `refactor(codec): migrate concrete handlers to location/read/redact API` — all 11 concrete handlers + transform helpers. Codec builds, 87/87 codec tests pass.
  • `refactor(engine): switch to location/read/redact codec API` — engine call sites switched over. Workspace builds.
  • `refactor(codec, engine): reintroduce Span<L, D> for engine work lists` — replaces `(Location, Data)` tuples in engine ops with the named struct.
  • `style(workspace): rustdoc refs at bottom, no inline imports` — mechanical sweeps.

Test plan

  • `cargo check --workspace --all-features` clean
  • `cargo clippy --workspace --all-features -- -D warnings` clean
  • `cargo test --workspace --all-features` — all green
  • `cargo doc --workspace --no-deps --all-features` — no new warnings (only pre-existing nvisy_server lib/bin output collision, unrelated)

🤖 Generated with Claude Code

martsokha and others added 2 commits May 18, 2026 23:12
Reworks the codec *Transform layer to take a generic Redactions
collection keyed by span identity instead of a flat slice.

New types in nvisy-codec/transform/:
- `Redactions<S, R>` (redactions.rs): groups payloads by span, with
  overlap detection on insert. Consumed via IntoIterator; no raw
  map access.
- `Mergeable` (mergeable.rs): trait for the redaction payload `R`.
  `overlaps()` for detection, `try_merge()` for merging with
  honest failure semantics (returns `None` when outputs differ).
- `ConflictPolicy` (policy.rs): Reject / Merge / Replace.
  Merge falls back to `InsertError::NotMergeable` when `try_merge`
  returns `None`, rather than picking a magic default.
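A minimal sketch of how these three pieces could fit together, assuming a `HashMap`-backed collection. The names `Redactions`, `Mergeable`, `ConflictPolicy`, `try_insert`, and `InsertError::NotMergeable` come from the commit message; the `Rejected` variant, the field layout, and the choice to look only at the first overlapping entry are assumptions of this example.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Payload contract: detect overlap, and merge only when that is well defined.
pub trait Mergeable: Sized {
    fn overlaps(&self, other: &Self) -> bool;
    fn try_merge(&self, other: &Self) -> Option<Self>;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConflictPolicy {
    Reject,
    Merge,
    Replace,
}

/// `Rejected` is an assumed variant name.
#[derive(Debug, PartialEq, Eq)]
pub enum InsertError {
    Rejected,
    NotMergeable,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TextRedaction {
    pub start: usize,
    pub end: usize,
    pub output: String,
}

impl Mergeable for TextRedaction {
    fn overlaps(&self, other: &Self) -> bool {
        self.start < other.end && other.start < self.end
    }

    fn try_merge(&self, other: &Self) -> Option<Self> {
        // Differing outputs have no honest merged value, so refuse.
        if self.output != other.output {
            return None;
        }
        Some(TextRedaction {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
            output: self.output.clone(),
        })
    }
}

pub struct Redactions<S, R> {
    policy: ConflictPolicy,
    groups: HashMap<S, Vec<R>>,
}

impl<S: Eq + Hash, R: Mergeable> Redactions<S, R> {
    pub fn new(policy: ConflictPolicy) -> Self {
        Self { policy, groups: HashMap::new() }
    }

    /// Inserts `item` under `span`, resolving an overlap according to the policy.
    pub fn try_insert(&mut self, span: S, item: R) -> Result<(), InsertError> {
        let policy = self.policy;
        let group = self.groups.entry(span).or_default();
        // Simplification: only the first overlapping entry is considered.
        let Some(i) = group.iter().position(|r| r.overlaps(&item)) else {
            group.push(item);
            return Ok(());
        };
        match policy {
            ConflictPolicy::Reject => Err(InsertError::Rejected),
            ConflictPolicy::Replace => {
                group[i] = item;
                Ok(())
            }
            ConflictPolicy::Merge => {
                let merged = group[i].try_merge(&item).ok_or(InsertError::NotMergeable)?;
                group[i] = merged;
                Ok(())
            }
        }
    }
}

/// Consumption is by value only; the map itself is never exposed.
impl<S, R> IntoIterator for Redactions<S, R> {
    type Item = (S, Vec<R>);
    type IntoIter = std::collections::hash_map::IntoIter<S, Vec<R>>;

    fn into_iter(self) -> Self::IntoIter {
        self.groups.into_iter()
    }
}
```

Under `Merge`, two overlapping redactions with the same output collapse into their union, while differing outputs surface as `NotMergeable` instead of silently choosing one.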

*Redaction structs lose their span_id field:
- TextRedaction { start, end, output }
- ImageRedaction { bounding_box, output }
- AudioRedaction { time_span, output }
- TabularRedaction { start, end, output }
- Each gets a `::new()` constructor.
- Mergeable impls reuse ontology primitives' overlaps()/union().

Transform traits now take `Redactions<Location, Payload>` by value:
- TextTransform::redact_text
- ImageTransform::redact_images
- AudioTransform::redact_audio
- TabularTransform::redact_tabular

Transforms iterate `for (loc, mut items) in redactions` instead of
re-grouping a flat slice. Overlap checking is no longer duplicated
per handler — the collection enforces it on insert.

Engine apply.rs builds redactions via `try_insert`; insertion
failures surface as validation errors with the rejected/unmergeable
reason. Tests use `*Redaction::new()` and `TabularLocationBuilder`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweeps the workspace for inline-qualified `nvisy_core::Result<...>` and
`nvisy_core::Error::...` uses and adds proper `use nvisy_core::{...};`
imports following the existing convention used across other engine
files.

Affected:
- nvisy-engine/src/operation/redaction/apply.rs
- nvisy-engine/src/operation/mod.rs
- nvisy-engine/src/utility/encryption/provider.rs
- nvisy-provider/src/http/mod.rs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@martsokha martsokha self-assigned this May 18, 2026
@martsokha martsokha added the feat (request for or implementation of a new feature) and codec (nvisy-codec: loaders, transforms, handlers) labels May 18, 2026
martsokha and others added 6 commits May 18, 2026 23:25
The four redaction payload structs (TextRedaction, ImageRedaction,
AudioRedaction, TabularRedaction) are constructed via ::new() and
their fields are only read inside nvisy-codec (by transforms and by
Mergeable impls). Tightens the surface to pub(crate) — external
crates already use ::new() exclusively.
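The constructor-only surface could be sketched like this. The module name `codec_sketch` and the helper `length_delta` are invented for the example, and a single file cannot show the cross-crate effect: `pub(crate)` fields stay readable anywhere in the same crate and become inaccessible only to other crates.

```rust
pub mod codec_sketch {
    /// Fields are crate-visible; other crates must go through `new`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct TextRedaction {
        pub(crate) start: usize,
        pub(crate) end: usize,
        pub(crate) output: String,
    }

    impl TextRedaction {
        pub fn new(start: usize, end: usize, output: impl Into<String>) -> Self {
            Self { start, end, output: output.into() }
        }
    }

    /// In-crate code (a transform, say) can still read the fields directly.
    /// Returns how many bytes the text grows (positive) or shrinks (negative).
    pub fn length_delta(r: &TextRedaction) -> isize {
        r.output.len() as isize - r.end.saturating_sub(r.start) as isize
    }
}
```

Narrowing the fields this way leaves room to change the internal representation later without breaking external callers that already construct values through `new`.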

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces Span<Id, Data> + SpanStream with Located<L> + LocationStream.
Handler capability traits now expose locations() (cheap identity-only
streams), read(&L) -> Option<*Data> (typed per-modality fetch), and
redact(Redactions<L, R>) -> Result<()> (direct batch application).
ContentHandle gains typed read_text/read_image/read_audio in place of
the modality-erased value_at(&Location) -> Option<String>.

Tabular handlers (CSV, XLSX) move into handler/text since they
implement TextHandler. The *Transform blanket-impl traits are removed;
helpers will live alongside the per-modality instruction types.

Concrete handlers still implement the old API and do not compile after
this commit — follow-up commits migrate them and the engine callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every text, image, audio, and rich handler now implements the new
capability traits directly: locations() yields cheap Located<L>
identities, read(&loc) fetches typed *Data on demand, and
redact(Redactions<L, R>) applies a batch in place.

Byte-level replacement logic lives in pub(crate) helpers under
transform/text/apply.rs and transform/image/apply.rs; handlers walk
the Redactions collection and call the helper on the affected slice
of their internal model (lines, cells, pages, image buffer).
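For the image side, a helper of this kind might look roughly as follows. This is a sketch under assumptions, not the contents of transform/image/apply.rs: the 8-bit grayscale buffer, the `BoundingBox` fields, the solid fill value, and the name `fill_region` are all invented here.

```rust
/// Hypothetical pixel-space rectangle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BoundingBox {
    pub x: usize,
    pub y: usize,
    pub width: usize,
    pub height: usize,
}

/// Hypothetical row-major 8-bit grayscale buffer; `pixels.len() == width * height`.
pub struct GrayImage {
    pub width: usize,
    pub height: usize,
    pub pixels: Vec<u8>,
}

/// Overwrites the part of `bbox` that lies inside the image with `fill`
/// and returns how many pixels were written. Out-of-bounds parts are clipped.
pub fn fill_region(img: &mut GrayImage, bbox: BoundingBox, fill: u8) -> usize {
    debug_assert_eq!(img.pixels.len(), img.width * img.height);
    let x_end = bbox.x.saturating_add(bbox.width).min(img.width);
    let y_end = bbox.y.saturating_add(bbox.height).min(img.height);
    let x_start = bbox.x.min(x_end);
    let y_start = bbox.y.min(y_end);
    let mut written = 0;
    for y in y_start..y_end {
        let row = y * img.width;
        for x in x_start..x_end {
            img.pixels[row + x] = fill;
            written += 1;
        }
    }
    written
}
```

A handler would call a helper like this once per redaction in its batch, passing only the buffer it owns, so the clipping and indexing logic lives in one place.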

Per-handler tests are rewritten against the new API. The codec crate
compiles and 85/85 codec tests pass. The engine still references the
old API and does not compile — that's the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine operations now consume the new codec surface:

- Document gains collect_text/image/audio_locations + read_text/image/audio
  + value_at(&Location); old text_spans/image_spans/audio_spans and
  the *_at typed accessors are gone.
- EntityRecognitionOp and PatternRecognitionOp build their
  (TextLocation, TextData) work lists by walking locations() and
  calling read_text per item.
- VisualExtractionOp builds (ContentSource, ImageData) pairs the same
  way for OCR batches and verification.
- ValidationOp concatenates current text by reading each location.
- RedactionApplicator reads each entity's text value via read_text
  instead of the old enum-typed value_at(&Location).
- Document::apply_tabular_redactions is dropped — nothing in the
  engine drove it, and the (row, col) → byte-offset bridge it relied
  on (TabularTransform) is gone with the rest of the *Transform
  blanket impls.

cargo check + clippy + tests all clean across the workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Engine operations that enumerate a document's locations together
with their content (NER detector, pattern scanner, OCR extractor)
previously walked locations() + read_*() and pushed (Location, Data)
tuples into a Vec. Tuples obscure intent and force destructuring at
every call site.

Span<L, D> { source, location, data } lives in codec — same shape as
the type we deleted at the start of the refactor, but intentionally
*not* used on handler trait signatures. Handlers still expose only
cheap identity via locations() plus on-demand read(); engine callers
that want enumerate-with-content build Span::from_located in their
read loops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two parallel sweeps:

1. Inline rustdoc reference-style links — [`Foo`](crate::path::Foo) —
   moved to bottom-of-docblock references: [`Foo`] + `[`Foo`]:
   crate::path::Foo` after a blank /// separator. 83 conversions
   across 51 files.

2. Inline `use` statements inside function bodies, impl scopes, and
   non-test inner blocks hoisted to top of file. Cfg-gated inline
   uses preserved their cfg with a wrapping #[cfg(...)] on the
   hoisted form. Macro-body uses and test-module-top uses left as-is.

Tests, clippy, and rustdoc all clean across the workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@martsokha martsokha changed the title refactor(codec, engine): Redactions<S, R> collection with overlap policy refactor(codec)!: location/read/redact handler API May 19, 2026
@martsokha martsokha merged commit 6eaa3bc into main May 19, 2026
5 checks passed
@martsokha martsokha deleted the refactor/codec-redactions branch May 19, 2026 01:01
Labels

  • codec (nvisy-codec: loaders, transforms, handlers)
  • feat (request for or implementation of a new feature)
