test(benchmark): add latency bench for envector-msa-1.2.2 by heeyeon01 · Pull Request #133 · CryptoLabInc/rune

heeyeon01 · 2026-05-11T08:24:55Z

Rune × envector-msa-1.2.2 Latency Report

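[Measurement environment] Uses pyenvector 1.2.2. The secure parameter is not supported; TLS is enabled automatically when access_token is set. eval_mode=rmp, index_type=flat.
The Vault runs remotely (tcp://193.122.124.173:50051) over TLS.
The envector endpoint has moved to a cloud domain (0511-...envector.io), so the network path differs from the previous measurement (2026-05-05, direct IP connection to 146.56.178.130:50050).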

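[Comparison note] Differences from the 2026-05-05 report (pyenvector 1.4.0) combine three factors: the pyenvector version, eval_mode (rmp vs mm), and the endpoint/infrastructure change. They must not be read as the effect of the version alone.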

Environment changes relative to the v1.4.0 plan:

| Item | v1.4.0 | v1.2.2 |
|---|---|---|
| pyenvector | 1.4.0 | 1.2.2 |
| eval_mode | mm | rmp |
| index_type | flat | flat |
| envector TLS | `secure=false` (plaintext) | TLS enabled automatically (access_token) |
| envector endpoint | `146.56.178.130:50050` (direct IP) | `0511-1401-0001-4r6cwxfc908b.clusters.envector.io` (domain) |
| Vault | `tcp://localhost:50051` (local) | `tcp://193.122.124.173:50051` (remote) |
| runner `secure=` parameter | passes `cfg.envector.secure` | removed (TLS is enabled automatically via access_token) |

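[Comparison caution] Because the three differences above (version, eval_mode, infrastructure) changed at the same time, the gap from the v1.4.0 numbers must not be attributed to the pyenvector version alone.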
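
The runner change listed in the table above (dropping `secure=` for 1.2.2) can be pictured with a small sketch. This is only an illustration: `build_connection_kwargs`, the `address` key, and the `client_accepts_secure` flag are hypothetical names, not pyenvector's API, and the assumption that the 1.4.0 run used no access token is mine.

```python
from typing import Any, Dict, Optional


def build_connection_kwargs(
    address: str,
    access_token: Optional[str],
    secure: Optional[bool],
    client_accepts_secure: bool,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a client constructor.

    All key names here are placeholders for illustration only.
    """
    kwargs: Dict[str, Any] = {"address": address}
    if access_token:
        # Per the report, 1.2.2 turns TLS on automatically when a token is set.
        kwargs["access_token"] = access_token
    if client_accepts_secure and secure is not None:
        # Only forward `secure` to a client version that accepts it.
        kwargs["secure"] = secure
    return kwargs


# 1.4.0-style run: plaintext, direct IP (no token is an assumption).
v140 = build_connection_kwargs(
    "146.56.178.130:50050", None, False, client_accepts_secure=True
)
# 1.2.2-style run: domain endpoint, token set, `secure` never forwarded.
v122 = build_connection_kwargs(
    "0511-1401-0001-4r6cwxfc908b.clusters.envector.io",
    "<access-token>",
    False,
    client_accepts_secure=False,
)
print(v140)
print(v122)
```

Gating on a capability flag rather than deleting the argument outright keeps one runner usable against both versions, if that is ever needed.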

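Experiment plan and how to read the tables

Scenario definitions / measurement methodology

For the scenarios (T1–T9), phase breakdown, statistical criteria, and environment conditions, see the plan document below: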
https://github.com/CryptoLabInc/rune/blob/benchmark/envector-latency-v1.2.2/benchmark/plans/latency_bench_plan_envector_v1.2.2.md

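Statistical abbreviations

n: number of iterations actually used for measurement, excluding warm-up (the first 2 runs).
p50: the 50th percentile of the sorted measurements. This is the median, i.e. the "typical case".
p95: the 95th percentile. This is the time with the slowest 5% excluded, i.e. the "unlucky case".
p50/p95 are used instead of the mean because a single outlier can pull the mean up and distort the "typical speed".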
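
To make the definitions above concrete, here is a minimal sketch of the statistics. The sample values are made up, and the linear-interpolation percentile is an assumption; the plan document may specify a different percentile method.

```python
import math
import statistics

WARMUP_RUNS = 2  # the report drops the first 2 runs as warm-up


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


# Hypothetical latencies in ms: two slow warm-up runs, then 8 measured runs,
# one of which is an outlier (500 ms).
samples_ms = [900.0, 850.0, 100.0, 102.0, 98.0, 101.0, 99.0, 103.0, 97.0, 500.0]
measured = samples_ms[WARMUP_RUNS:]

n = len(measured)
p50 = percentile(measured, 50)
p95 = percentile(measured, 95)
mean = statistics.mean(measured)

# The single outlier pulls the mean to 150.0 ms while p50 stays at 100.5 ms.
print(f"n={n} p50={p50} mean={mean}")
```

With only 8 samples, p95 sits between the two slowest runs, so it is sensitive to a single slow iteration; that is worth keeping in mind when reading the p95 column.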

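Phase names

| Phase | What it does | Where |
|---|---|---|
| `embed` | Text → vector conversion (Qwen3-Embedding-0.6B) | Local CPU |
| `score` | Requests similarity computation on encrypted vectors | envector cloud (FHE) |
| `vault_topk` | Decrypts the encrypted scores returned by envector | Rune-Vault (remote gRPC) |
| `insert` | Encrypts new data and stores it in envector | Local encrypt + envector cloud |
| `remind` | Metadata lookup during recall | Local JSON |
| `total` | Sum of the phases above (= end-to-end time perceived by the user) | - |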
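
A per-phase breakdown like the one in the table can be collected with a small timing helper. The sketch below is an assumption about how such timing might be structured; `PhaseTimings` is a hypothetical name and is not taken from latency_bench.py, and the `time.sleep` calls stand in for the real work.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PhaseTimings:
    """Accumulates wall-clock time per named phase, in milliseconds."""

    phases_ms: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def phase(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.phases_ms[name] = self.phases_ms.get(name, 0.0) + elapsed_ms

    @property
    def total_ms(self) -> float:
        # total is defined as the sum of the recorded phases
        return sum(self.phases_ms.values())


timings = PhaseTimings()
with timings.phase("embed"):
    time.sleep(0.002)  # stand-in for local embedding
with timings.phase("score"):
    time.sleep(0.003)  # stand-in for the remote scoring call
print(timings.phases_ms, timings.total_ms)
```

Note that the sum relationship holds per run; a p50 of totals need not equal the sum of per-phase p50 values.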

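Results summary

| Scenario | feature | p50 total | p95 total | n | Δ (vs 2026-05-05) |
|---|---|---|---|---|---|
| T1 Short English (~35 tokens) | capture | 664.0 ms | 792.0 ms | 8 | ÷7.9 |
| T2 Long English (~155 tokens) | capture | 809.8 ms | 955.1 ms | 8 | ÷5.8 |
| T3 Korean | capture | 870.5 ms | 951.7 ms | 8 | ÷5.3 |
| T4 Duplicate input (near-duplicate path) | capture | 964.7 ms | 1013.1 ms | 8 | ÷5.1 |
| T5 Recall: exact match | recall | 528.6 ms | 554.2 ms | 8 | ×1.4 |
| T6 Recall: Korean→English cross-language | recall | 515.6 ms | 571.2 ms | 8 | ×1.3 |
| T7 Recall: topk 1/3/5/10 | recall | 472–509 ms | 498–527 ms | 5 each | ×1.3 |
| T8 Batch per-item (size 1/5/10/20) | batch_capture | 294–309 ms | 300–383 ms | 3 each | ×1.3 |
| T9 Vault health check | vault_status | 7.6 ms | 9.5 ms | 8 | ×9.5 |

In the Δ column, ÷N means the 1.2.2 run takes 1/N of the time of the 2026-05-05 (1.4.0) run, and ×N means it takes N times as long.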

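capture breakdown: insert (490–550 ms) is ~74% of the total (T1: 490.0 of 664.0 ms). Its share is lower than in 1.4.0 (92%), but it is still the dominant phase.
recall breakdown: the FHE path, score (255–273 ms) + vault_topk (131–176 ms), is ~84% of the total; embedding is ~34 ms.
batch per-item: 294–309 ms (varying by at most 15 ms across batch sizes 1→20). batch_capture measures only embed+score, without insert.
recall is independent of topk: in T7, p50 is 472–509 ms for topk = 1, 3, 5, 10. The difference is < 8%.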
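
The "< 8%" and 15 ms figures above follow directly from the reported ranges. A minimal check, using only the range endpoints given in the summary (per-topk and per-batch-size values are not listed here, so only the extremes are used):

```python
def relative_spread(values):
    """(max - min) / min for a collection of latencies."""
    lo, hi = min(values), max(values)
    return (hi - lo) / lo


t7_p50_range_ms = (472.0, 509.0)       # T7 recall p50 across topk 1/3/5/10
t8_per_item_range_ms = (294.0, 309.0)  # T8 per-item p50 across batch sizes

t7_spread = relative_spread(t7_p50_range_ms)
t8_abs_spread_ms = max(t8_per_item_range_ms) - min(t8_per_item_range_ms)

print(f"T7 topk spread: {t7_spread:.1%}")
print(f"T8 per-item spread: {t8_abs_spread_ms:.0f} ms")
```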

1.4.0 vs 1.2.2 p50 comparison

Annotations sit on the 1.4.0 bar: (×N) = 1.4.0 is N times slower than 1.2.2. (÷N) = 1.4.0 is N times faster than 1.2.2.

── capture (p50 ms, scale: 5269ms = 40 cells) ──────────────────────────

T1 English
1.4.0 ████████████████████████████████████████ 5269ms (×7.9)
1.2.2 █████ 664ms

T2 long English
1.4.0 ████████████████████████████████████ 4707ms (×5.8)
1.2.2 ██████ 810ms

T3 Korean
1.4.0 ███████████████████████████████████ 4595ms (×5.3)
1.2.2 ███████ 871ms

T4 duplicate
1.4.0 █████████████████████████████████████ 4886ms (×5.1)
1.2.2 ████████ 965ms

── recall (p50 ms, scale: 529ms = 30 cells) ────────────────────────────

T5 exact
1.4.0 █████████████████████ 368ms (÷1.4)
1.2.2 ██████████████████████████████ 529ms

T6 cross
1.4.0 ██████████████████████ 384ms (÷1.3)
1.2.2 █████████████████████████████ 516ms

── vault health (p50 ms, scale: 7.6ms = 20 cells) ──────────────────────

T9 vault
1.4.0 ██ 0.8ms (÷9.5)
1.2.2 ████████████████████ 7.6ms
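
The ×N / ÷N annotations in the chart above can be reproduced from the two p50 values. A small sketch, where `format_ratio` is a hypothetical helper and the inputs are the rounded p50 values shown on the bars:

```python
def format_ratio(v140_ms: float, v122_ms: float) -> str:
    """Annotation for the 1.4.0 bar: × if 1.4.0 is slower, ÷ if it is faster."""
    if v140_ms >= v122_ms:
        return f"×{v140_ms / v122_ms:.1f}"
    return f"÷{v122_ms / v140_ms:.1f}"


print("T1", format_ratio(5269, 664))  # capture, 1.4.0 slower
print("T5", format_ratio(368, 529))   # recall, 1.4.0 faster
print("T9", format_ratio(0.8, 7.6))   # vault health, 1.4.0 faster
```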

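For the detailed per-phase tables (p50, p95, p99, mean), see benchmark/reports/latency_results_v1.2.2_2026-05-11.md.

Notable findings

capture is ~5–8x faster than 1.4.0: the insert phase improvement is the key

| p50 (ms) | embed | score | vault_topk | insert | total |
|---|---|---|---|---|---|
| 1.4.0 T1 | 92.6 | 100.4 | 31.9 | 5018.1 | 5269.5 |
| 1.2.2 T1 | 38.8 | 102.0 | 34.6 | 490.0 | 664.0 |

score and vault_topk are nearly identical, whereas insert drops from 5018 ms to 490 ms (÷10). The likely cause is that 1.4.0 paid a large cost for processing the EvalKey (~1.18 GB) on every insert. 1.2.2 carries no such EvalKey burden.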
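
The per-phase comparison above can be tabulated with a few lines of Python. This sketch only re-derives ratios from the T1 p50 values in the table; the dictionary layout is my own.

```python
# T1 p50 values in ms, copied from the table above.
t1_p50_ms = {
    "1.4.0": {"embed": 92.6, "score": 100.4, "vault_topk": 31.9,
              "insert": 5018.1, "total": 5269.5},
    "1.2.2": {"embed": 38.8, "score": 102.0, "vault_topk": 34.6,
              "insert": 490.0, "total": 664.0},
}


def phase_ratios(old: dict, new: dict) -> dict:
    """old / new per phase; > 1 means the old run was slower in that phase."""
    return {phase: old[phase] / new[phase] for phase in old}


ratios = phase_ratios(t1_p50_ms["1.4.0"], t1_p50_ms["1.2.2"])
for phase, ratio in ratios.items():
    print(f"{phase:<11} {ratio:5.2f}x")
```

The score and vault_topk ratios land close to 1, while insert is roughly 10, which is the pattern the paragraph describes.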

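recall and vault are slower than 1.4.0: infrastructure change is the main cause

The slowdown in recall and T9 is attributed to changes in the infrastructure environment rather than to the pyenvector version. The likely source is DNS lookup + TLS handshake overhead introduced by the move to a domain-based cloud endpoint:

- vault_topk +77ms / T9 ×9.5: the Vault moved from localhost to remote (193.122.124.173). The 7.6 ms is pure network RTT.
- score +90ms: a combination of the endpoint change (direct IP → domain-based), TLS activation, and the eval_mode difference (mm→rmp). Individual contributions cannot be separated.

These numbers must not be read as a pure algorithmic performance comparison between pyenvector 1.2.2 and 1.4.0.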
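
One way to probe the DNS + TLS hypothesis is to time connection setup separately from the request itself. The following is a rough sketch using only the Python standard library; it measures a fresh TCP+TLS connection and is not how the benchmark or pyenvector connects. Whether this cost is paid on every call depends on whether the client reuses its connection, which this report does not state.

```python
import socket
import ssl
import time


def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def connection_setup_breakdown(host: str, port: int, timeout: float = 5.0) -> dict:
    """Time DNS resolution, TCP connect, and TLS handshake for one new connection."""
    infos, dns_ms = timed_ms(socket.getaddrinfo, host, port, type=socket.SOCK_STREAM)
    family, socktype, proto, _, sockaddr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        _, tcp_ms = timed_ms(sock.connect, sockaddr)
        context = ssl.create_default_context()
        # wrap_socket performs the handshake on an already-connected socket.
        tls_sock, tls_ms = timed_ms(context.wrap_socket, sock, server_hostname=host)
        tls_sock.close()
    finally:
        sock.close()
    return {"dns_ms": dns_ms, "tcp_ms": tcp_ms, "tls_ms": tls_ms}


# Self-check of the timing helper without touching the network.
result, elapsed_ms = timed_ms(sum, [1, 2, 3])
print(result, elapsed_ms >= 0.0)
```

Calling `connection_setup_breakdown(host, port)` against the actual endpoint (the port is not given in this report) would show how much of the added latency is setup cost, although it still cannot separate the eval_mode effect.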

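T4 duplicate input has a score time about 2.7x that of T1

T1 score is 102 ms, T4 score is 279 ms. 1.4.0 shows the same pattern (T1: 100 ms → T4: 211 ms). The likely cause is that, on the near-duplicate detection path, the vault_topk decrypt branch executes a heavier path. The pattern is consistent regardless of version.

T3 Korean capture takes extra time in score/vault_topk compared with English

| p50 (ms) | embed | score | vault_topk | insert |
|---|---|---|---|---|
| T1 English | 38.8 | 102.0 | 34.6 | 490.0 |
| T3 Korean | 49.4 (+27%) | 210.2 (+106%) | 101.1 (+192%) | 516.5 (+5%) |

The trend matches 1.4.0, but the score/vault_topk multipliers are larger. Since the FHE vector length should in theory be identical, the difference is presumed to come from payload serialization overhead.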
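
The percentage deltas in the T1 vs T3 table can be re-derived as follows. This is just arithmetic on the values shown above; `pct_increase` is a hypothetical helper name.

```python
def pct_increase(base_ms: float, other_ms: float) -> float:
    """Percentage by which other_ms exceeds base_ms."""
    return (other_ms / base_ms - 1.0) * 100.0


t1_english = {"embed": 38.8, "score": 102.0, "vault_topk": 34.6, "insert": 490.0}
t3_korean = {"embed": 49.4, "score": 210.2, "vault_topk": 101.1, "insert": 516.5}

deltas = {
    phase: round(pct_increase(t1_english[phase], t3_korean[phase]))
    for phase in t1_english
}
print(deltas)
```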

batch_capture per-item is ~44% of a single capture

per-item is 300 ms, single capture is 664 ms. This is because batch_capture measures only embed+score, without insert. Structurally it is the same reason as in 1.4.0 (per-item 240 ms vs 5000 ms).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Port latency_bench.py and latency dataclasses from benchmark/envector-latency-v1.4.0. Remove secure= param (not supported in pyenvector 1.2.2; TLS active via access_token).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add BENCHMARK_DIR to sys.path so runners.common import resolves correctly
- Update report title from envector-msa-1.4.0 to 1.2.2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Captures full end-to-end latency across capture/recall/batch_capture/vault_status with pyenvector 1.2.2 and the new cloud envector endpoint. Key findings vs 2026-05-05 (pyenvector 1.4.0):
- capture ~5–8x faster (insert 490ms vs 5018ms; EvalKey overhead absent in 1.2.2)
- recall ~1.4x slower (score+vault_topk: endpoint migration to domain-based URL)
- vault_health ~9.5x slower (Vault moved from localhost to remote 193.122.124.173)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Header: add eval_mode=rmp/index_type=flat for 1.2.2 vs mm for 1.4.0
- Comparison note: list three confounders (version, eval_mode, infra change)
- Recall section: break down score/vault_topk causes explicitly; note that numbers cannot be read as pure version comparison
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

heeyeon01 · 2026-05-11T08:26:11Z

❌ Some tests failed

Ran on commit 0fb0076

Full Test Results

365 passed, 1 failed, 2 skipped (368 total) — See CI logs

| Result | Test |
|---|---|
| ❌ FAILED | `mcp/tests/test_server.py::test_diagnostics_envector_timeout` |
| ⏭️ SKIPPED | `agents/tests/test_detector.py::TestPatternMatching::test_real_pattern_matching` |
| ⏭️ SKIPPED (collection) | `mcp/tests/test_vault_direct.py:33` |

heeyeon and others added 7 commits May 11, 2026 15:47

deps: pin pyenvector to ==1.2.2

679d9fc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix: increase envector diagnosis timeout to 15s

35f5868

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(benchmark): fix sys.path and report title for pyenvector 1.2.2

c020637

- Add BENCHMARK_DIR to sys.path so runners.common import resolves correctly
- Update report title from envector-msa-1.4.0 to 1.2.2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs(benchmark): append DNS+TLS overhead note to recall infra explana…

0fb0076

…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

heeyeon01 changed the title ~~test(benchmark): add latency bench for envector-msa-1.4.0~~ test(benchmark): add latency bench for envector-msa-1.2.2 May 11, 2026

couragehong added the DO NOT MERGE label May 12, 2026

heeyeon01 wants to merge 7 commits into main from benchmark/envector-latency-v1.2.2
