feat(search): dense semantic search with v5 embedding model by kdroidFilter · Pull Request #85 · kdroidFilter/SeforimLibrary

kdroidFilter · 2026-06-25T17:44:31Z

Summary

Adds hybrid lexical + dense semantic search powered by the v5 Hebrew/Aramaic embedding model.

- SeforimEmbedder: ONNX (v5 int8) query embedder; HebrewV5Normalizer (strips nikud/teamim and folds final letters) to match training.
- HybridSearchEngine: fuses BM25 + dense results (RRF); VectorSearcher over a fused Lucene index (KnnFloatVectorField, cosine).
- Generator: BuildVectorIndex + fused text/vector indexing.
- Packaging: bundle the v5 model next to the DB (PackageArtifacts); DownloadEmbedModel fetches the v5-int8 release.
- CI: free disk space on the runner before the build.
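
To illustrate the normalization step named above, here is a minimal Python sketch of stripping nikud/teamim and folding final letters. It is not the PR's HebrewV5Normalizer: the function name `normalize_hebrew`, the choice to drop only Hebrew-block combining marks (U+0591 to U+05C7, category Mn), and the folding direction (final form to base form) are all assumptions made for illustration.

```python
import unicodedata

# Assumed folding direction: final forms are mapped to their base letters.
FINAL_TO_BASE = str.maketrans({
    "\u05da": "\u05db",  # final kaf   -> kaf
    "\u05dd": "\u05de",  # final mem   -> mem
    "\u05df": "\u05e0",  # final nun   -> nun
    "\u05e3": "\u05e4",  # final pe    -> pe
    "\u05e5": "\u05e6",  # final tsadi -> tsadi
})


def normalize_hebrew(text: str) -> str:
    """Hypothetical normalizer: drop nikud/teamim, then fold final letters."""
    # Nikud and teamim are nonspacing marks (category Mn) in U+0591..U+05C7.
    # Punctuation in the same range (e.g. maqaf, U+05BE) is not Mn and is kept.
    stripped = "".join(
        ch for ch in text
        if not ("\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn")
    )
    return stripped.translate(FINAL_TO_BASE)
```

Whatever the exact rules are, the same function has to run at index time and at query time, otherwise query and passage text reach the model in different forms.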

Model

v5b (full-corpus, final-folded). Fair link eval on identical 2000 pairs: R@1 0.461 / R@10 0.870 / MRR 0.606 vs v4 0.418. int8 quantization ≈ 0 quality loss (cosine 0.999).
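
For readers unfamiliar with the metrics, the sketch below shows the standard definitions of recall@k and MRR. The PR does not say how its evaluation was implemented, so the function names and the input format (one 1-based rank of the correct passage per query, `None` when it was not retrieved) are assumptions.

```python
from typing import Optional, Sequence


def recall_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose correct passage appears in the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all queries; a missed query contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical ranks for four queries (not data from the PR).
example_ranks = [1, 3, None, 2]
print(recall_at_k(example_ranks, 1))        # 1 of 4 queries ranked first
print(recall_at_k(example_ranks, 10))       # 3 of 4 queries in the top 10
print(mean_reciprocal_rank(example_ranks))  # (1 + 1/3 + 0 + 1/2) / 4
```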

Notes

Query and passage vectors share the same v5 weights + normalization (verified: embedder self-cosine ~1, related ≫ unrelated).
v4 model/normalizer fully removed.
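
The self-cosine check mentioned in the notes can be sketched as follows. This is only an illustration of cosine similarity on toy vectors; the real check would compare embeddings produced by the model, and the function name `cosine` and the vectors below are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (not model output).
query_vec = [0.2, 0.7, 0.1]
related_vec = [0.25, 0.65, 0.05]
unrelated_vec = [0.9, -0.1, 0.4]

# Embedding the same text twice should give a cosine of about 1,
# and a related passage should score well above an unrelated one.
assert abs(cosine(query_vec, query_vec) - 1.0) < 1e-9
assert cosine(query_vec, related_vec) > cosine(query_vec, unrelated_vec)
```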

- Add SeforimEmbedder (ONNX v5 int8) + HebrewV5Normalizer (final-letter folding)
- HybridSearchEngine (BM25 + dense, RRF) + VectorSearcher over a fused Lucene index
- BuildVectorIndex / fused KnnFloatVectorField indexing in the generator
- Bundle + fetch the v5 model (PackageArtifacts, DownloadEmbedModel -> v5-int8)
- CI: free disk space on the runner before the build
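
The BM25 + dense fusion listed above uses reciprocal rank fusion (RRF). A minimal Python sketch of RRF follows; it is not the PR's HybridSearchEngine. The function name `rrf_fuse`, the constant `k=60` (a common default, not stated in the PR), and the document IDs are assumptions.

```python
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60, top_n: int = 10) -> List[str]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional default.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical result lists from the lexical and dense searchers.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
print(rrf_fuse([bm25_hits, dense_hits]))  # ['b', 'a', 'd', 'c']
```

RRF works on ranks only, so BM25 scores and cosine similarities never need to be rescaled onto a common range before fusing.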

Y-PLONI · 2026-06-26T11:54:21Z

Is this planned for public release?

kdroidFilter · 2026-06-26T12:26:34Z

I don't know, maybe. But how would that help you?
As far as I know, Tantivy doesn't support vector search.

Y-PLONI · 2026-06-26T13:06:53Z

That's our problem to deal with 😉

kdroidFilter force-pushed the feat/v5-dense-search branch 8 times, most recently from 2cd5766 to b9e938b Compare June 26, 2026 05:58

kdroidFilter force-pushed the feat/v5-dense-search branch from b9e938b to 8f8da4e Compare June 26, 2026 06:07

kdroidFilter merged commit a72cc8d into master Jun 26, 2026
1 check passed

kdroidFilter deleted the feat/v5-dense-search branch June 26, 2026 06:11

kdroidFilter merged 1 commit into master from feat/v5-dense-search
