refactor(ontology): primitives, LanguageTag, Transcription, derive_more #142
Merged
… Transcription, derive_more cleanup

- Rename `math` module to `primitive` across workspace
- Add `oxilangtag` dependency, create typed `LanguageTag` newtype for BCP-47 tags
- Use `LanguageTag` in `Entity::language` and `Transcription::language`
- Rework `Transcription`: remove `text` field, add `TranscriptSegment` with `time_span`, `speaker_id`, `confidence` for diarization support
- Add `Transcription::text()` method to join segments
- Replace manual impls with derive_more across ontology types:
  - `Annotations`: Deref, DerefMut, From, IntoIterator
  - `ContentSource`: Display
  - `ContentArtifacts`: From
  - `Contexts`: Deref, DerefMut, From
  - `ContextEntryData`: From
  - `Policies`: Deref, DerefMut
  - `RedactionMap`: Deref, DerefMut
  - `GraphNodeKind`: Display, From

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
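The reworked `Transcription` shape can be sketched in plain Rust. This is a minimal sketch under assumptions: the per-segment `text` field, the `TimeSpan` type with millisecond bounds, the `Option` wrappers, the single-space separator used by `text()`, and the `segments_by` helper are all hypothetical, and `language` is a `String` stand-in for `LanguageTag` so the example needs no dependencies.

```rust
/// Hypothetical stand-in for the ontology's time-span type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeSpan {
    pub start_ms: u64,
    pub end_ms: u64,
}

/// One diarized chunk of speech (field set assumed from the commit message).
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptSegment {
    pub text: String,
    pub time_span: Option<TimeSpan>,
    pub speaker_id: Option<String>,
    pub confidence: Option<f32>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct Transcription {
    pub segments: Vec<TranscriptSegment>,
    /// Stand-in for `Option<LanguageTag>`.
    pub language: Option<String>,
}

impl Transcription {
    /// Joins segment texts with a single space (separator is an assumption).
    pub fn text(&self) -> String {
        self.segments
            .iter()
            .map(|s| s.text.as_str())
            .collect::<Vec<_>>()
            .join(" ")
    }

    /// Hypothetical helper: segments attributed to one speaker.
    pub fn segments_by<'a>(
        &'a self,
        speaker: &'a str,
    ) -> impl Iterator<Item = &'a TranscriptSegment> + 'a {
        self.segments
            .iter()
            .filter(move |s| s.speaker_id.as_deref() == Some(speaker))
    }
}
```

Because the flat text is derived on demand, it cannot drift out of sync with the segments.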
- TextArtifacts: add language and char_count fields
- TabularArtifacts: add row_count, column_count, sparse headers (ColumnHeader)
- RichArtifacts: add tabular field alongside text and image
- ContentArtifacts: add as_text/as_text_mut and as_tabular/as_tabular_mut accessors
- Vision extraction: store OCR results in ImageArtifacts::ocr_pages instead of discarding

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
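A dependency-free sketch of the accessor pattern on `ContentArtifacts`, plus a lookup over sparse `ColumnHeader`s. Field names beyond those listed in the commit (`text`, `index`, `name`), the `header` helper, the chars-not-bytes meaning of `char_count`, and the reduction to two enum variants (the real enum also carries image/rich artifacts) are assumptions.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TextArtifacts {
    pub text: String,
    /// Stand-in for `Option<LanguageTag>`.
    pub language: Option<String>,
    /// Assumed to count Unicode scalar values, not bytes.
    pub char_count: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ColumnHeader {
    pub index: usize,
    pub name: String,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct TabularArtifacts {
    pub row_count: usize,
    pub column_count: usize,
    /// Sparse: only columns that actually have a header are listed.
    pub headers: Vec<ColumnHeader>,
}

impl TabularArtifacts {
    /// Hypothetical helper: header for a column, if one was recorded.
    pub fn header(&self, index: usize) -> Option<&str> {
        self.headers
            .iter()
            .find(|h| h.index == index)
            .map(|h| h.name.as_str())
    }
}

/// Reduced to two variants for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentArtifacts {
    Text(TextArtifacts),
    Tabular(TabularArtifacts),
}

impl ContentArtifacts {
    pub fn as_text(&self) -> Option<&TextArtifacts> {
        match self {
            Self::Text(t) => Some(t),
            _ => None,
        }
    }

    pub fn as_text_mut(&mut self) -> Option<&mut TextArtifacts> {
        match self {
            Self::Text(t) => Some(t),
            _ => None,
        }
    }

    pub fn as_tabular(&self) -> Option<&TabularArtifacts> {
        match self {
            Self::Tabular(t) => Some(t),
            _ => None,
        }
    }

    pub fn as_tabular_mut(&mut self) -> Option<&mut TabularArtifacts> {
        match self {
            Self::Tabular(t) => Some(t),
            _ => None,
        }
    }
}
```

Returning `Option` lets callers branch on the artifact kind without matching on every variant at each call site.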
- Fix UUID version: ContextEntry now uses now_v7 (was new_v4)
- EmbeddingData: Vec<f64> → Vec<f32>, remove redundant dimensions field, rename model → algorithm for consistency with FaceData/VoiceData
- PatternExpression: rename serde tag "kind" → "syntax" to avoid collision with AnalyticVariant's "kind" tag when flattened
- SignatureData: add missing algorithm field
- AddressData: rename region → state to avoid GeospatialVariant confusion
- GeoShape::Circle: centre → center (American English consistency)
- GeoShape::Polygon: polygon → boundary (avoid redundant naming)
- ReferenceVariant::Object → Image (match wrapped ImageData type)
- CredentialData: skip_serializing on value to prevent plaintext leaks, add CredentialKind enum replacing untyped credential_type string
- TextData: add language field (Option<LanguageTag>)
- TemporalVariant: add TimeSpan variant with TimeSpanData

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- PatternExpression: extract RegexPattern and GlobPattern as dedicated types
- ImageData: remove untyped format field
- TemporalVariant: add TimeOfDay(TimeOfDayData) and DateTime(DateTimeData) variants using jiff::civil::Time and jiff::civil::DateTime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ontology context cleanup:
- Remove format hints from biometric/document/temporal types (ContentSource
already carries file format via extension)
- Rename temporal::TimeOfDayData → TimeData (file: time_of_day.rs → time.rs)
- Rename existing time.rs → timespan.rs (contains TimeSpanData)
- TemporalVariant: Time(TimeData), TimeSpan(TimeSpanData) layout
- PatternExpression: extract RegexPattern and GlobPattern as dedicated structs
- CredentialData: add #[serde(default)] on value so round-trip yields ""
- TextData: derive Default, add new() and with_language() constructors;
TextEntry: add new() constructor
Tests:
- Add 19 serde round-trip tests covering PatternExpression (incl. tag-
collision regression), CredentialData (secret redaction + roundtrip),
TextData, TimeData, DateTimeData, TimeSpanData, TabularArtifacts
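The `TextData` constructors can be sketched without dependencies. The field name `value`, the `String` stand-in for `LanguageTag`, and the two-argument form of `with_language` are assumptions; the real `with_language` may instead be a builder method on an existing value.

```rust
/// Hypothetical field names; `language` stands in for `Option<LanguageTag>`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TextData {
    pub value: String,
    pub language: Option<String>,
}

impl TextData {
    /// Text with no language tag.
    pub fn new(value: impl Into<String>) -> Self {
        Self {
            value: value.into(),
            language: None,
        }
    }

    /// Text tagged with a language (assumed two-argument constructor).
    pub fn with_language(value: impl Into<String>, language: impl Into<String>) -> Self {
        Self {
            value: value.into(),
            language: Some(language.into()),
        }
    }
}
```

With `Default` derived, `TextData::default()` and `TextData::new("")` produce the same value.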
rig 0.33 → 0.37:
- Workspace dep: rig (umbrella crate, version 0.37) — re-exports rig-core
- Adapt to API changes:
- Completion::completion now generic over Into<Message>; use typed
Vec::<Message>::new() for empty chat history
- StructuredOutputError::PromptError now wraps Box<PromptError>; deref
before From conversion
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
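The two rig 0.37 adaptations can be reproduced with local stand-in types. Everything below (`Message`, `completion`, `PromptError`, `StructuredOutputError`, `AppError`) is hypothetical and only mirrors the shapes described above, not rig's actual signatures: a parameter generic over `Into<Message>` cannot infer its element type from an empty `vec![]`, and a `Box<PromptError>` payload has to be moved out with `*` before an existing `From<PromptError>` impl applies.

```rust
/// Stand-in for a chat message type (NOT rig's `Message`).
#[derive(Debug, Clone, PartialEq)]
pub struct Message(pub String);

impl From<&str> for Message {
    fn from(s: &str) -> Self {
        Message(s.to_string())
    }
}

/// Mimics a method that is generic over the history element type.
/// `completion("hi", vec![])` fails with "type annotations needed",
/// because nothing pins down `M`; `Vec::<Message>::new()` fixes that.
pub fn completion<M: Into<Message>>(prompt: &str, history: Vec<M>) -> Vec<Message> {
    let mut msgs: Vec<Message> = history.into_iter().map(Into::into).collect();
    msgs.push(Message::from(prompt));
    msgs
}

#[derive(Debug, PartialEq)]
pub struct PromptError(pub String);

/// Stand-in for an error enum whose variant now boxes its payload.
#[derive(Debug)]
pub enum StructuredOutputError {
    PromptError(Box<PromptError>),
}

/// Hypothetical application error with a pre-existing `From<PromptError>`.
#[derive(Debug, PartialEq)]
pub enum AppError {
    Prompt(PromptError),
}

impl From<PromptError> for AppError {
    fn from(e: PromptError) -> Self {
        AppError::Prompt(e)
    }
}

impl From<StructuredOutputError> for AppError {
    fn from(e: StructuredOutputError) -> Self {
        match e {
            // Move the error out of the Box, then reuse the existing impl.
            StructuredOutputError::PromptError(boxed) => AppError::from(*boxed),
        }
    }
}
```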
- Pin Rust toolchain to 1.95.0 in rust-toolchain.toml and Cargo.toml
- Pin CI toolchain via RUST_TOOLCHAIN env var (defined once per workflow)
using dtolnay/rust-toolchain@master with explicit toolchain input
- Fix clippy errors:
- sort_by with reverse → sort_by_key + Reverse (codec text/tabular)
- needless_borrows on AsRef bounds (content_data sha256 test)
- unused imports across engine tests after Entity::test_builder refactor
- unused TabularLocation import in span_size tests
- unused value parameter in annotation test_entity (prefix with _)
- Add #![allow(dead_code)] to engine tests/fixtures/mod.rs (shared helpers
appear unused from individual test files)
- Replace 80+ inline std:: paths with top-of-file imports across 47 files:
fmt::{Display,Debug,Formatter}, cmp::{Ordering,Reverse}, time::Duration,
fs, mem, path, io, str, env, future, ops::Deref, slice, sync::Arc, etc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
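The `sort_by_key` + `Reverse` rewrite in miniature, on hypothetical `(name, score)` pairs. The closure form that clippy objects to (likely the `unnecessary_sort_by` lint) is shown in a comment. `sort_by_key` is a stable sort, so entries with equal scores keep their original relative order.

```rust
use std::cmp::Reverse;

/// Sorts hypothetical `(name, score)` pairs by descending score.
pub fn sort_desc_by_score(items: &mut [(String, u32)]) {
    // Before: items.sort_by(|a, b| b.1.cmp(&a.1));
    // After: wrap the key in `Reverse` to flip the ordering.
    items.sort_by_key(|(_, score)| Reverse(*score));
}
```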
Rustdoc fixes (18 files, ~30 broken links resolved):

- Correct stale paths: crate::graph::* → crate::workflow::* (ontology workflow files, 13 references)
- Method rename: PatternEngine::scan_text → scan_entities (pattern engine helpers + doc example)
- Error variant moves: Error::Request / Error::Validation → ErrorKind::* (provider STT, TTS, OCR backend)
- Self-method links: [`acquire_resources`] → Self::acquire_resources, [`maybe_compact`] → Self::maybe_compact
- External crate URLs: [`tempfile`], [`lopdf`], [`scraper`] → docs.rs URLs
- Cross-crate unreachables (private items / circular dep targets) rewritten as backticks or prose: extraction/detection/deduplication/redaction submodules, Pipeline, ExecutionPlan, RawMatch, CompositeKey, Operation, KeyProvider
- Header bullet that rustdoc parsed as a ref def: `[`Engine`]: …` → em-dash
- Macro files: add ref defs for [`Handler`], [`AudioHandler`], [`ImageHandler`], [`Span`], [`DocumentType`]

All ref defs are at the bottom of their docblocks.

Verified:

- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --all-features --no-deps` passes (only unrelated nvisy_server filename collision warning)
- `cargo clippy --workspace --all-features --all-targets -- -D warnings` clean
- `cargo +nightly fmt --all` applied across the workspace

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
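A small hypothetical type showing the link conventions adopted above: `Self::` paths for sibling methods, and an external crate link resolved through a reference definition placed at the bottom of its docblock. `Session` and its methods are illustrative only and do not correspond to the workspace's actual items.

```rust
/// In-memory message history with a size cap (hypothetical example).
///
/// Compaction uses [`Vec::drain`]; a disk-backed variant could build on
/// [`tempfile`].
///
/// [`tempfile`]: https://docs.rs/tempfile
pub struct Session {
    history: Vec<String>,
    limit: usize,
    ready: bool,
}

impl Session {
    /// Creates an empty session that keeps at most `limit` messages.
    pub fn new(limit: usize) -> Self {
        Self { history: Vec::new(), limit, ready: false }
    }

    /// Marks the session ready; [`Self::push`] rejects input until then.
    pub fn acquire_resources(&mut self) {
        self.ready = true;
    }

    /// Appends a message and then runs [`Self::maybe_compact`].
    ///
    /// Returns `false` without storing anything if
    /// [`Self::acquire_resources`] has not been called.
    pub fn push(&mut self, msg: impl Into<String>) -> bool {
        if !self.ready {
            return false;
        }
        self.history.push(msg.into());
        self.maybe_compact();
        true
    }

    /// Drops the oldest messages once the history exceeds its limit.
    pub fn maybe_compact(&mut self) {
        if self.history.len() > self.limit {
            let excess = self.history.len() - self.limit;
            self.history.drain(..excess);
        }
    }

    /// Number of messages currently retained.
    pub fn len(&self) -> usize {
        self.history.len()
    }
}
```

`Self::` links keep resolving if the type is renamed, which is one reason to prefer them over bare method names.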
Summary
- Rename `math` module to `primitive`, a better name now that it contains non-math types like `LanguageTag`
- Add `oxilangtag` dependency and typed `LanguageTag` newtype for BCP-47 language tags, used in `Entity::language` and `Transcription::language`
- Rework `Transcription` to support diarization: replace flat `text: String` with `Vec<TranscriptSegment>` containing `time_span`, `speaker_id`, and `confidence` per segment
- Replace manual `Display`, `From`, `Deref`, `DerefMut` impls with `derive_more` across 9 ontology types
- Extend `TextArtifacts` (language, char_count), `TabularArtifacts` (row/col counts, sparse headers), `RichArtifacts` (add tabular)
- Store OCR results in `ImageArtifacts::ocr_pages` during vision extraction
- Add `as_text`/`as_tabular` accessors on `ContentArtifacts`
- Extract `RegexPattern` and `GlobPattern` as dedicated types
- Add `TimeOfDay` and `DateTime` variants to `TemporalVariant`
- Add `CredentialKind` enum

Test plan
- `cargo check --workspace --all-features`
- `cargo test --workspace --all-features` (469 tests pass)

🤖 Generated with Claude Code