test(benchmark): add latency bench for envector-msa-1.2.2 #133
Draft
heeyeon01 wants to merge 7 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port latency_bench.py and latency dataclasses from benchmark/envector-latency-v1.4.0. Remove secure= param (not supported in pyenvector 1.2.2; TLS active via access_token).

- Add BENCHMARK_DIR to sys.path so the runners.common import resolves correctly
- Update report title from envector-msa-1.4.0 to 1.2.2

Captures full end-to-end latency across capture/recall/batch_capture/vault_status with pyenvector 1.2.2 and the new cloud envector endpoint. Key findings vs 2026-05-05 (pyenvector 1.4.0):
- capture ~5–8x faster (insert 490 ms vs 5018 ms; EvalKey overhead absent in 1.2.2)
- recall ~1.4x slower (score+vault_topk: endpoint migration to domain-based URL)
- vault_health ~9.5x slower (Vault moved from localhost to remote 193.122.124.173)

- Header: add eval_mode=rmp/index_type=flat for 1.2.2 vs mm for 1.4.0
- Comparison note: list three confounders (version, eval_mode, infra change)
- Recall section: break down score/vault_topk causes explicitly; note that the numbers cannot be read as a pure version comparison

…tion
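The BENCHMARK_DIR commit puts the benchmark root on sys.path so that `runners.common` resolves when the script is run directly. A minimal sketch of that pattern follows; the helper name `add_benchmark_dir` and the directory layout (script one level below BENCHMARK_DIR, which contains the `runners/` package) are assumptions for illustration, not the PR's actual code.

```python
import sys
from pathlib import Path


def add_benchmark_dir(script_path: str) -> Path:
    """Put the benchmark root on sys.path so `runners.common` can be imported.

    Hypothetical layout: the script sits one directory below BENCHMARK_DIR
    (e.g. <BENCHMARK_DIR>/<subdir>/latency_bench.py) and BENCHMARK_DIR holds
    the `runners/` package.
    """
    benchmark_dir = Path(script_path).resolve().parent.parent
    if str(benchmark_dir) not in sys.path:  # avoid duplicate entries on re-import
        sys.path.insert(0, str(benchmark_dir))
    return benchmark_dir


# At the top of the script, before importing runners.common:
#     BENCHMARK_DIR = add_benchmark_dir(__file__)
```

Prepending (rather than appending) makes the local `runners` package win over any same-named installed module.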
❌ Some tests failed
Full Test Results: 365 passed, 1 failed, 2 skipped (368 total). See CI logs.
Rune × envector-msa-1.2.2 Latency Report
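Environment changes from the v1.4.0 plan:

| Item | v1.4.0 plan | v1.2.2 (this run) |
|---|---|---|
| Transport | secure=false (plaintext) | TLS active via access_token |
| enVector endpoint | 146.56.178.130:50050 (direct IP) | 0511-1401-0001-4r6cwxfc908b.clusters.envector.io (domain) |
| Vault endpoint | tcp://localhost:50051 (local) | tcp://193.122.124.173:50051 (remote) |
| secure= parameter | cfg.envector.secure passed | removed (not supported in pyenvector 1.2.2) |

Experiment plan and how to read the tables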
Scenario definitions / measurement methodology
For the scenarios (T1–T9), phase breakdown, statistical criteria, and environment conditions, see the plan document:
https://github.com/CryptoLabInc/rune/blob/benchmark/envector-latency-v1.2.2/benchmark/plans/latency_bench_plan_envector_v1.2.2.md
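The plan document defines the phase breakdown and statistical criteria; they are not reproduced here. As a rough illustration only, a per-phase timer plus a p50/p95/p99/mean summary could look like the sketch below. `PhaseTimer`, `percentile`, and `summarize` are hypothetical names rather than the benchmark's code, and the nearest-rank percentile rule is an assumption (the plan may interpolate instead).

```python
import math
import statistics
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List, Sequence


class PhaseTimer:
    """Collect wall-clock latency samples (ms) per named phase.

    Hypothetical helper; phase names follow the report
    (embed, score, vault_topk, insert, remind, total).
    """

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()  # monotonic, high-resolution clock
        try:
            yield
        finally:
            self.samples[name].append((time.perf_counter() - start) * 1000.0)


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
        "mean": statistics.fmean(samples),
    }


# Usage: a few fake iterations; sleep() stands in for real work.
timer = PhaseTimer()
for _ in range(5):
    with timer.phase("total"):
        with timer.phase("embed"):
            time.sleep(0.001)
        with timer.phase("insert"):
            time.sleep(0.002)

for name, values in timer.samples.items():
    print(name, {k: round(v, 1) for k, v in summarize(values).items()})
```

Nesting the phase timers inside a `total` timer means `total` always covers at least the sum of its child phases, which is what makes per-phase shares meaningful.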
Statistics abbreviations
Phase names
embed, score, vault_topk, insert, remind, total

Results summary
capture time breakdown:
insert (490–550 ms) accounts for ~74% of the total. Its share is lower than in 1.4.0 (92%), but it still dominates.

recall time breakdown:
The FHE path, score (255–273 ms) + vault_topk (131–176 ms), accounts for ~84% of the total; embedding takes ~34 ms.

batch per-item: 294–309 ms (within ±15 ms across batch sizes 1→20). batch_capture measures only embed+score, without insert.
recall is topk-independent: in T7, p50 is 472–509 ms for topk = 1, 3, 5, and 10 alike, a spread of under 8%.
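The percentages in this summary are simple ratios of p50 values. The sketch below reproduces two of them from figures quoted in this report; pairing the lower end of the insert range (490 ms) with the T1 capture total (664 ms) is an assumption about how the ~74% was derived, and `share` / `relative_spread` are hypothetical helper names.

```python
from typing import Sequence


def share(part_ms: float, total_ms: float) -> float:
    """Fraction of a total latency attributable to one phase."""
    if total_ms <= 0:
        raise ValueError("total_ms must be positive")
    return part_ms / total_ms


def relative_spread(values: Sequence[float]) -> float:
    """(max - min) / min, i.e. how far apart a set of p50 values are."""
    return (max(values) - min(values)) / min(values)


# 1.2.2 p50 figures (ms) quoted in this report.
print(f"insert share of T1 capture: {share(490.0, 664.0):.0%}")
print(f"T7 spread over topk=1,3,5,10: {relative_spread([472.0, 509.0]):.1%}")
```

Measuring spread relative to the fastest value gives the conservative (larger) percentage, so "under 8%" holds under either convention.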
1.4.0 vs 1.2.2 p50 comparison
── capture (p50 ms, scale: 5269 ms = 40 cells) ──────────────────────────────
T1 English
1.4.0 ████████████████████████████████████████ 5269ms (×7.9)
1.2.2 █████ 664ms
T2 long English
1.4.0 ████████████████████████████████████ 4707ms (×5.8)
1.2.2 ██████ 810ms
T3 Korean
1.4.0 ███████████████████████████████████ 4595ms (×5.3)
1.2.2 ███████ 871ms
T4 duplicate
1.4.0 █████████████████████████████████████ 4886ms (×5.1)
1.2.2 ████████ 965ms
── recall (p50 ms, scale: 529 ms = 30 cells) ────────────────────────────────
T5 exact
1.4.0 █████████████████████ 368ms (÷1.4)
1.2.2 ██████████████████████████████ 529ms
T6 cross
1.4.0 ██████████████████████ 384ms (÷1.3)
1.2.2 █████████████████████████████ 516ms
── vault health (p50 ms, scale: 7.6 ms = 20 cells) ──────────────────────────
T9 vault
1.4.0 ██ 0.8ms (÷9.5)
1.2.2 ████████████████████ 7.6ms
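Each section of the chart scales bars so that its largest value fills a fixed number of cells, and the 1.4.0 row carries a ratio label (×N when 1.4.0 took N times longer than 1.2.2, ÷N when it took 1/N as long). A sketch of how such rows could be generated, assuming nearest-cell rounding; the helper names are hypothetical and this is not the script that drew the chart, so individual bars may differ by a cell.

```python
def bar(value_ms: float, scale_ms: float, scale_cells: int) -> str:
    """Proportional text bar where `scale_ms` fills `scale_cells` blocks.

    Rounds to the nearest cell (Python's round() is round-half-to-even) and
    keeps at least one block for any positive value.
    """
    cells = round(value_ms / scale_ms * scale_cells)
    if value_ms > 0:
        cells = max(cells, 1)
    return "█" * cells


def ratio_label(old_ms: float, new_ms: float) -> str:
    """×N if the old (1.4.0) value is N times larger, else ÷N."""
    if old_ms >= new_ms:
        return f"(×{old_ms / new_ms:.1f})"
    return f"(÷{new_ms / old_ms:.1f})"


def render(label: str, old_ms: float, new_ms: float, scale_ms: float, cells: int) -> str:
    return "\n".join([
        label,
        f"  1.4.0 {bar(old_ms, scale_ms, cells)} {old_ms:g}ms {ratio_label(old_ms, new_ms)}",
        f"  1.2.2 {bar(new_ms, scale_ms, cells)} {new_ms:g}ms",
    ])


print(render("T1 English", 5269, 664, scale_ms=5269, cells=40))
print(render("T5 exact", 368, 529, scale_ms=529, cells=30))
```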
Detailed per-phase tables (p50, p95, p99, mean) are in
benchmark/reports/latency_results_v1.2.2_2026-05-11.md.

Notable findings
capture is ~5–8x faster than 1.4.0: the insert phase improvement is the key factor
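In capture, score and vault_topk are nearly unchanged, while insert drops from 5018 ms to 490 ms (÷10). The likely cause is that 1.4.0 paid a heavy cost to process the EvalKey (~1.18 GB) on every insert; 1.2.2 does not carry that EvalKey burden.

recall and vault are slower than in 1.4.0: infrastructure change is the main cause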
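The recall and T9 slowdowns are attributed mainly to infrastructure changes rather than to the pyenvector version difference. The suspected overhead is DNS lookup and TLS handshake cost from the move to a domain-based cloud endpoint, plus Vault becoming remote:
- vault_topk +77 ms / T9 ×9.5: Vault moved from localhost to a remote host (193.122.124.173). The 7.6 ms is essentially pure network RTT.
- score +90 ms: a combination of the endpoint change (direct IP → domain-based), TLS enablement, and the eval_mode difference (mm → rmp). The individual contributions cannot be separated.

These numbers should not be read as a pure algorithmic performance comparison of pyenvector 1.2.2 vs 1.4.0.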
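A back-of-the-envelope consistency check, assuming the quoted +77 ms (vault_topk) and +90 ms (score) apply to the T5 scenario: together they roughly account for the whole T5 recall gap, which suggests the remaining phases changed little. The check cannot say how much of each delta comes from each confounder.

```python
# T5 recall p50 (ms) and the per-phase increases quoted above.
T5_P50 = {"1.4.0": 368.0, "1.2.2": 529.0}
PHASE_DELTAS = {"vault_topk": 77.0, "score": 90.0}

observed = T5_P50["1.2.2"] - T5_P50["1.4.0"]
attributed = sum(PHASE_DELTAS.values())
print(f"observed T5 gap: {observed:.0f} ms; score + vault_topk deltas: {attributed:.0f} ms")
# -> observed T5 gap: 161 ms; score + vault_topk deltas: 167 ms
```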
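T4 duplicate-input score is ~2.7x that of the other capture scenarios
T1 score is 102 ms; T4 score is 279 ms. 1.4.0 shows a similar pattern (T1: 100 ms → T4: 211 ms). The likely cause is that the near-duplicate detection path takes a heavier branch in vault_topk decryption. The pattern holds regardless of version.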
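T3 Korean capture spends extra time in score/vault_topk compared with English
The trend matches 1.4.0, but the score/vault_topk multiplier is larger. Because the FHE vector length should in theory be the same, the difference is presumably due to payload serialization overhead.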
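One plausible contributor to a serialization difference, offered as an assumption rather than a finding: in UTF-8, ASCII characters take 1 byte while Hangul syllables take 3 bytes, so a Korean payload of similar character length is larger on the wire. The sketch only measures that size difference; the sample strings are made up, and whether this explains the score/vault_topk gap is untested.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character of a payload string."""
    if not text:
        raise ValueError("empty text")
    return len(text.encode("utf-8")) / len(text)


english = "capture this decision"
korean = "이 결정을 기록해 주세요"  # hypothetical Korean capture text
print(f"English: {utf8_bytes_per_char(english):.2f} bytes/char")
print(f"Korean:  {utf8_bytes_per_char(korean):.2f} bytes/char")
```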
batch_capture per-item is ~44% of a single capture
Per-item 300 ms vs 664 ms for a single capture, because batch_capture measures only embed+score, without insert. The structural reason is the same as in 1.4.0 (per-item 240 ms vs 5000 ms).