
test(benchmark): add latency bench for envector-msa-1.4.0 #99

Draft
heeyeon01 wants to merge 16 commits into main from benchmark/envector-latency-v1.4.0

Conversation

Contributor

@heeyeon01 heeyeon01 commented May 4, 2026

Rune × envector-msa-1.4.0 Latency Report

[Measurement environment] envector connections were measured with secure=false (plaintext gRPC); the Vault (localhost:50051) had TLS enabled.
Why plaintext was judged acceptable: gRPC reuses the HTTP/2 connection, so the TLS handshake cost does not appear in post-warmup measurements, and most of the processing time is spent in FHE computation, so TLS on/off was judged to have little effect on the results.
A follow-up benchmark with envector TLS enabled (secure=true) is planned for the same scenarios.

[2026-05-05 re-measurement note] The suspicion from the first run (2026-05-04) that vault_topk = 0 ms meant the FHE decryption branch was disabled has been resolved. With the eval key loaded correctly, these results cover the full end-to-end FHE path (novelty check + topk decrypt). Versus the first run, capture is ~50x and recall ~10x higher; the difference comes from the FHE decryption cost that the first run did not include.

Experiment plan and how to read the tables

Scenario definitions / measurement methodology

For the full scenarios (T1–T9), phase breakdown, statistical criteria, environment conditions, and validation method, see:
https://github.com/CryptoLabInc/rune/blob/benchmark/envector-latency-v1.4.0/benchmark/plans/latency_bench_plan_envector_v1.4.0.md

Statistics abbreviations

  • n: number of iterations actually measured, excluding warmup (the first 2 runs).
  • p50: 50th percentile of the sorted measurements. Median = the "typical case".
  • p95: 95th percentile. Time with the slowest 5% excluded = the "unlucky case".
  • We report p50/p95 instead of the mean because a single outlier can pull the mean up and distort the "typical speed".
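The convention above can be sketched as a small helper. This is hypothetical illustration code, not the actual benchmark runner; the sample latencies are made up to show the outlier effect:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the ceil(p/100 * n)-th smallest value."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]

def summarize(latencies_ms, warmup=2):
    """Drop the first `warmup` runs, then report n / p50 / p95 as in the tables."""
    measured = latencies_ms[warmup:]
    return {
        "n": len(measured),
        "p50": percentile(measured, 50),
        "p95": percentile(measured, 95),
    }

# A single 5000 ms outlier barely moves p50 but would dominate the mean,
# which is why the report prefers p50/p95.
stats = summarize([900.0, 880.0, 100.0, 101.0, 102.0, 103.0, 104.0, 5000.0, 105.0, 106.0])
```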

Phase names

| Phase | What it does | Where |
| --- | --- | --- |
| embed | text → vector conversion (Qwen3-Embedding-0.6B) | local CPU |
| score | similarity-scoring request over encrypted vectors | envector cloud (FHE) |
| vault_topk | decrypt the encrypted scores returned by envector | Rune-Vault (local gRPC) |
| insert | encrypt new data and store it in envector | local encrypt + envector cloud |
| remind | metadata lookup during recall | local JSON |
| total | sum of the above (= user-perceived end-to-end time) | |

What vault_topk = 0 ms means: envector returned an empty encrypted-score list, so the Vault decryption branch was skipped. It did not truly take 0 seconds; the phase simply never ran.

Results summary

| Scenario | feature | p50 total | p95 total | n |
| --- | --- | --- | --- | --- |
| T1 Short English (~35 tokens) | capture | 5269.5 ms | 5449.8 ms | 8 |
| T2 Long English (~155 tokens) | capture | 4707.0 ms | 4931.9 ms | 8 |
| T3 Korean | capture | 4595.2 ms | 4789.1 ms | 8 |
| T4 Duplicate input (near-duplicate path) | capture | 4886.1 ms | 5020.2 ms | 8 |
| T5 Recall — exact match | recall | 368.2 ms | 385.3 ms | 8 |
| T6 Recall — Korean→English cross-language | recall | 384.1 ms | 464.3 ms | 8 |
| T7 Recall — topk 1/3/5/10 | recall | 367–376 ms | 374–402 ms | 5 each |
| T8 Batch per-item (size 1/5/10/20) | batch_capture | 238–245 ms | 242–246 ms | 3 each |
| T9 Vault health check | vault_status | 0.8 ms | 0.9 ms | 8 |

capture time breakdown: insert (FHE encrypt + remote insert, 4200–5000 ms) accounts for 91–95% of the total; embed/score/vault_topk together account for 5–9%.
recall time breakdown: score (180–200 ms) + vault_topk (100–115 ms) + remind (45–47 ms) together make up ~80% of the total; embedding takes 28–47 ms.
batch per-item: 238–245 ms (within ±3 ms across batch sizes 1, 5, 10, 20). batch_capture performs no insert and measures only embed + score (novelty).
recall is topk-independent: in T7, p50 is 367–376 ms for topk = 1, 3, 5, 10; the difference is < 3%.

For the detailed per-phase tables (p50, p95, p99, mean), see benchmark/reports/latency_results_v1.4.0_2026-05-05.md.


Notable findings

vault_topk / remind now measured correctly: the 0 ms issue from the first run is resolved

In the first report, vault_topk was 0.0 ms in every scenario; this time it measures 31–100 ms for capture and 100–115 ms for recall. With the eval key loading fixed, the FHE encrypted_blobs are no longer empty and the Vault gRPC decryption branch executes.

Incidental infrastructure fix: raised MAX_MESSAGE_LENGTH in mcp/adapter/vault_client.py from 256 MB to ~1.95 GB. The EvalKey response is ~1.18 GB and was failing with RESOURCE_EXHAUSTED under the 256 MB cap (separate fix commit d08e1bb).
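A minimal sketch of the size-limit change, assuming the standard gRPC channel-option names; the actual vault_client.py code may differ:

```python
# gRPC caps message sizes via channel options; INT32_MAX is its hard ceiling.
INT32_MAX = 2**31 - 1                    # 2,147,483,647 bytes
MAX_MESSAGE_LENGTH = 2000 * 1024 * 1024  # ~1.95 GiB, a round MB value under it

assert MAX_MESSAGE_LENGTH < INT32_MAX

# These option tuples would be passed as `options=` to grpc.insecure_channel /
# grpc.secure_channel. Both directions are raised, since the ~1.18 GB EvalKey
# response exceeded the 256 MB default cap.
GRPC_OPTIONS = [
    ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
    ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH),
]
```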

Most of the capture time is spent in insert (~4.5 s, 92% of the total)

The insert phase accounts for 91–95% of total capture time. Internally it combines FHE encryption + the remote envector insert, and its absolute value is 4.2–5.0 s depending on the scenario. In the next measurement round, separately instrumenting:

  1. standalone FHE encrypt time (local CPU)
  2. remote envector insert_data gRPC time (network + server-side storage)

will make it much clearer where the time is spent.
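The split could be done with a small timing wrapper like this sketch; `client.encrypt` and `client.insert_data` are hypothetical stand-ins for the real SDK calls:

```python
import time

def timed_insert(client, vectors):
    """Time the two sub-phases of insert separately. `client.encrypt` and
    `client.insert_data` are assumed names, not the actual SDK API."""
    t0 = time.perf_counter()
    blobs = client.encrypt(vectors)   # 1. local CPU: FHE encryption
    t1 = time.perf_counter()
    client.insert_data(blobs)         # 2. network + server-side storage
    t2 = time.perf_counter()
    return {
        "encrypt_ms": (t1 - t0) * 1000.0,
        "insert_rpc_ms": (t2 - t1) * 1000.0,
    }
```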

Note: the PPMM cache optimization in envector-msa-1.4.0 (docs/design/ppmm-cache-optimization-analysis.md) applies only to the Search path and is unrelated to insert. It also has no effect in the single-query (m=1) scenario.

The recall score phase takes about 2x the capture novelty score

It is the same score call, but 100 ms during capture vs 180 ms during recall. Presumably because capture only performs the novelty check and receives a simple distance metric, while recall uses the result to drive the subsequent vault_topk + remind branches and the ScoreEntry payload differs. The code path needs verification.

T3 Korean capture spends extra time in score/vault_topk versus English

| | embed | score | vault_topk | insert |
| --- | --- | --- | --- | --- |
| T1 English | 92.6 | 100.4 | 31.9 | 5018.1 |
| T3 Korean | 126.9 (+37%) | 167.5 (+67%) | 75.6 (+137%) | 4209.0 (-16%) |

embed/score/vault_topk take longer for Korean than for English, while insert is shorter. The embed difference can be explained by tokenizer subword differences, but it is unclear why score/vault_topk would be affected by token count (in theory the FHE vector length should be identical). A difference in tokenizer output vector dim, or in payload length showing up as serialization overhead, is a possibility.

T4 duplicate-input score is about 2x the other capture scenarios

T1 score 100 ms, T4 score 211 ms. T4 captures the same text twice, so presumably the novelty result comes back high-similarity and the vault_topk decrypt branch executes a different path. The p95 spike in the first run may have the same cause.

batch_capture per-item is about 1/20 of a single capture

Per-item 240 ms vs single capture 5000 ms. Most of the difference is because batch_capture performs no insert and measures only embed + score (novelty). If batch-insert-inclusive timing is needed, the runner needs a separate scenario.

heeyeon01 and others added 4 commits May 4, 2026 16:42
…ector 1.4

- Add optional `secure` field to EnVectorConfig, EnVectorClient, SDK adapter,
  and MCP server; reads from config.json and ENVECTOR_SECURE env var
- Normalize query_encryption bool → string ("plain"/"cipher") to match
  pyenvector 1.4 API

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switches Claude Code MCP registration from relying on plugin.json
mcpServers auto-discovery to explicit `claude mcp add --scope user`
invocation, which is more reliable and avoids path-conflict issues
with the plugin system.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When ENVECTOR_ENDPOINT or ENVECTOR_API_KEY are not set via environment
variables, read them from the envector section of ~/.rune/config.json,
consistent with how other config values (secure, vault) are loaded.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
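The fallback order this commit describes might look like the following sketch. The path (~/.rune/config.json) and env-var convention come from the commit message; the function name and error handling are assumptions:

```python
import json
import os
from pathlib import Path

def load_envector_setting(name, env_var,
                          config_path=Path.home() / ".rune" / "config.json"):
    """Env var wins; otherwise fall back to the `envector` section of
    ~/.rune/config.json, consistent with how secure/vault are loaded."""
    value = os.environ.get(env_var)
    if value is not None:
        return value
    try:
        config = json.loads(config_path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    return config.get("envector", {}).get(name)

# e.g. load_envector_setting("endpoint", "ENVECTOR_ENDPOINT")
```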
Measures per-phase latency (embed → FHE score → vault_topk → insert/recall)
for capture, recall, batch_capture, and vault_status against live envector-msa-1.4.0.
Includes plan doc, v1.4.0 results report, and README updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@heeyeon01 changed the title from "test(benchmark): add latency bench for envector-msa-1.4.0" to "진행중 (in progress) - test(benchmark): add latency bench for envector-msa-1.4.0" on May 4, 2026
heeyeon01 and others added 4 commits May 5, 2026 20:32
- agents/common/envector_client.py: read ENVECTOR_EVAL_MODE env var
  instead of hardcoding "rmp"
- mcp/server/server.py: switch default eval_mode from "rmp" to "mm"
  in both runtime adapter init and CLI flag

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… EvalKey

EvalKey in pyenvector >= 1.4.0 reaches ~1.2 GB, exceeding the previous
256 MB cap. Set the limit to 2000 MB — the largest round value that stays
under INT32_MAX (gRPC's hard ceiling).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-measured on the end-to-end FHE path (after eval key load, with the
vault_topk branch active). The ~x50 capture / ~x10 recall increase versus
yesterday reflects the FHE decryption cost finally being included, which
had gone unmeasured until now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual version in the install/measurement environment is stable 1.4.0
(not pre-release a5). Corrected identically in both the plan and the 2026-05-05 report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heeyeon01 changed the title from "진행중 (in progress) - test(benchmark): add latency bench for envector-msa-1.4.0" to "test(benchmark): add latency bench for envector-msa-1.4.0" on May 5, 2026
heeyeon01 and others added 2 commits May 5, 2026 21:02
Document the exact source and scope of two changes the plan described only abstractly:
- `secure` parameter: SDK Configuration documentation.
- PPMM cache optimization: 1.20–1.40x (V2) / 1.35–1.59x (V3) at query batch m=4–16;
  no effect for single queries (m=1). Search path only.

Removed the PPMM citation from the report's capture insert analysis. PPMM is a
Search-path optimization, confirmed against the envector design doc to be unrelated
to the insert/encrypt path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t section

Keep the data (~92%) as stated fact, but adjust the heading wording to be neutral.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@heeyeon01 heeyeon01 requested a review from sunchuljung May 5, 2026 12:16
@heeyeon01 heeyeon01 self-assigned this May 6, 2026
@heeyeon01 heeyeon01 requested a review from a team May 6, 2026 00:28
heeyeon01 and others added 6 commits May 6, 2026 13:59
GetPublicKey streams ~1.1 GiB EvalKey for pyenvector 1.4 and exceeds
the shared 30s default deadline (measured ~38s on a 30 MiB/s link).
Companion to d08e1bb, which raised the gRPC message size limit but
left the deadline untouched. Standard RPC timeout stays at 30s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
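In gRPC's Python API a per-call deadline is passed as `timeout=`, which is one way the pattern this commit describes could look. The stub shape below is an assumption; the 90 s value is the GetPublicKey deadline a later commit (654806e) refers to:

```python
EVAL_KEY_FETCH_TIMEOUT_S = 90  # large EvalKey stream: ~38 s measured on a 30 MiB/s link
DEFAULT_RPC_TIMEOUT_S = 30     # every other RPC keeps the shared default

def fetch_eval_key(stub, request):
    # A per-call `timeout=` overrides the shared default deadline just for
    # this one streaming call; standard RPCs are left at 30 s.
    return stub.GetPublicKey(request, timeout=EVAL_KEY_FETCH_TIMEOUT_S)
```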
Apply Prettier-style cleanup to all .md files modified vs main on this
branch: pad table cells to equal visual width (CJK-aware), insert blank
lines before fenced code blocks and tables, and drop trailing two-space
hard breaks where the next line is already a block boundary. Intentional
hard breaks in paragraph-fragment lists are preserved. No content changes.
Plumb the new envector_secure field through fetch_keys_from_vault
(now an 8-tuple) and apply it to EnVectorConfig.secure in both the
reload (_init_pipelines) and startup-init paths. When Vault provides
the field, the value is persisted to ~/.rune/config.json so the
EnVectorClient/Adapter is constructed with the right TLS mode on
subsequent runs.

Backward-compatible: older Vault servers that omit the field cause
vault_ev_secure to come back as None, in which case the client leaves
the existing secure setting (env var → config → SDK default) untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
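The backward-compatibility rule reduces to a None check; a sketch with hypothetical names, where `vault_ev_secure` is the value (or absence) reported by Vault:

```python
def resolve_secure(vault_ev_secure, existing_secure):
    """Older Vault servers omit the field, so it arrives as None: keep the
    existing env var -> config -> SDK-default resolution untouched.
    Otherwise honor the value Vault provided (it is then persisted to
    ~/.rune/config.json for subsequent runs)."""
    if vault_ev_secure is None:
        return existing_secure
    return vault_ev_secure
```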
…lict

Claude Code auto-registers enVector MCP from the plugin manifest
(plugin:rune:envector). The duplicate user-scope `claude mcp add`
entry competed with the plugin-scope registration during stdio
handshake and caused both to fail. Keep the legacy-entry purge so
older installs are cleaned up; let the plugin manifest own
registration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synchronous fetch_keys_from_vault() call in main() blocked MCP
transport startup on the GetPublicKey gRPC stream. With pyenvector
1.4.0's larger EvalKey the stream now regularly exceeds Claude Code's
30s MCP startup limit, leaving the server unreachable on cold starts.
The existing _init_pipelines background thread already re-fetches and
rebuilds the adapter, so the sync block was redundant — remove it and
let pipeline init populate key_id / agent_id / agent_dek / index_name
once Vault responds. Tools that need these already gate via
_ensure_pipelines() so they wait gracefully.

Mirror the envector_not_provisioned dormant transition into
_init_pipelines so the missing-credentials signal is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The thread wrapper around fetch_keys_from_vault used a 30s timeout while
the underlying GetPublicKey gRPC call has a 90s deadline (654806e). On
cold start the gRPC call could take 40s+ due to TLS handshake racing
sentence-transformers model loading on MainThread, so the wrapper
declared failure and switched the server to dormant even though the
worker thread eventually succeeded and wrote keys to disk.

Bump the wrapper timeout to 120s (gRPC deadline + buffer for thread/
asyncio/TLS overhead) with a 30s grace re-poll so an in-flight call
returns its real result. The `with ThreadPoolExecutor` block already
waits on shutdown, so a longer timeout costs nothing on the success
path.
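The timeout-plus-grace-re-poll wrapper described above might be sketched as follows (function name and structure are assumptions, not the actual server code):

```python
import concurrent.futures

WRAPPER_TIMEOUT_S = 120  # gRPC deadline (90 s) + buffer for thread/asyncio/TLS overhead
GRACE_REPOLL_S = 30      # one more poll so an in-flight call can return its real result

def fetch_keys_with_timeout(fetch_fn):
    """Run fetch_fn in a worker thread; on timeout, re-poll once instead of
    immediately declaring failure, so a slow-but-successful fetch that has
    already written keys to disk is not discarded."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_fn)
        try:
            return future.result(timeout=WRAPPER_TIMEOUT_S)
        except concurrent.futures.TimeoutError:
            try:
                return future.result(timeout=GRACE_REPOLL_S)
            except concurrent.futures.TimeoutError:
                return None  # caller transitions the server to dormant
```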

Also rework the ad-hoc DEBUG file logger added during diagnosis into
an opt-in RotatingFileHandler controlled by RUNE_MCP_DEBUG_LOG (=1 for
INFO, =debug for verbose). Off by default so normal runs don't write
to disk; preserved as a tool for future background-thread debugging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
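An opt-in rotating debug logger along the lines described could look like this sketch; only the RUNE_MCP_DEBUG_LOG convention (=1 for INFO, =debug for verbose, off by default) comes from the commit message, the rest (function name, file path, rotation sizes) is assumed:

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def maybe_attach_debug_log(logger, path="rune_mcp_debug.log"):
    """Off by default so normal runs write nothing to disk.
    RUNE_MCP_DEBUG_LOG=1 -> INFO, RUNE_MCP_DEBUG_LOG=debug -> DEBUG."""
    mode = os.environ.get("RUNE_MCP_DEBUG_LOG", "")
    if not mode:
        return None  # opt-in only
    # delay=True defers opening the file until the first record is emitted
    handler = RotatingFileHandler(path, maxBytes=5 * 1024 * 1024,
                                  backupCount=3, delay=True)
    handler.setLevel(logging.DEBUG if mode.lower() == "debug" else logging.INFO)
    logger.addHandler(handler)
    return handler
```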
Contributor

@esifea esifea left a comment


In my opinion, we don't need to update the Python implementation, which will be deprecated after v0.4.0.

Could you please just keep the benchmark scripts that will also be reusable with the Go implementation?

@heeyeon01
Contributor Author

heeyeon01 commented May 7, 2026

> In my opinion, we don't need to update the Python implementation, which will be deprecated after v0.4.0.
>
> Could you please just keep the benchmark scripts that will also be reusable with the Go implementation?

I see your point. I kept the changes because this branch also records which source code Rune was based on during the benchmark measurements. The modified code is intended for integration with enVector 1.4.0 (mm mode).
I'll separate and clean up the branch later. Since this is still in the testing phase, I'd prefer to keep it as is for now; that's why it's still a draft.

@heeyeon01 heeyeon01 requested a review from esifea May 7, 2026 09:34
@couragehong couragehong added the DO NOT MERGE label May 12, 2026
