test(benchmark): add latency bench for envector-msa-1.4.0 #99
heeyeon01 wants to merge 16 commits into
Conversation
…ector 1.4
- Add optional `secure` field to EnVectorConfig, EnVectorClient, SDK adapter,
and MCP server; reads from config.json and ENVECTOR_SECURE env var
- Normalize query_encryption bool → string ("plain"/"cipher") to match
pyenvector 1.4 API
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
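The bool → string normalization described above can be sketched as follows (the helper name `normalize_query_encryption` is hypothetical; only the "plain"/"cipher" strings come from the pyenvector 1.4 API as described in the commit):

```python
def normalize_query_encryption(value):
    """Map legacy bool values to the string enum expected by pyenvector >= 1.4.

    True  -> "cipher" (query sent FHE-encrypted)
    False -> "plain"
    Strings already in the new format pass through unchanged.
    """
    if isinstance(value, bool):
        return "cipher" if value else "plain"
    if value in ("plain", "cipher"):
        return value
    raise ValueError(f"unsupported query_encryption value: {value!r}")
```

Accepting both forms keeps older config files working while emitting only the new API's values.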
Switches Claude Code MCP registration from relying on plugin.json mcpServers auto-discovery to explicit `claude mcp add --scope user` invocation, which is more reliable and avoids path-conflict issues with the plugin system. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When ENVECTOR_ENDPOINT or ENVECTOR_API_KEY are not set via environment variables, read them from the envector section of ~/.rune/config.json, consistent with how other config values (secure, vault) are loaded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
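A minimal sketch of the env-var-first, config-file-fallback lookup described above (the helper name is hypothetical; the `~/.rune/config.json` path and `envector` section come from the commit message):

```python
import json
import os
from pathlib import Path

CONFIG_PATH = Path.home() / ".rune" / "config.json"

def load_envector_setting(env_var: str, config_key: str):
    """Prefer the environment variable; fall back to the `envector`
    section of ~/.rune/config.json; return None if neither is set."""
    value = os.environ.get(env_var)
    if value:
        return value
    try:
        config = json.loads(CONFIG_PATH.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return config.get("envector", {}).get(config_key)

# endpoint = load_envector_setting("ENVECTOR_ENDPOINT", "endpoint")
# api_key  = load_envector_setting("ENVECTOR_API_KEY", "api_key")
```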
Measures per-phase latency (embed → FHE score → vault_topk → insert/recall) for capture, recall, batch_capture, and vault_status against live envector-msa-1.4.0. Includes plan doc, v1.4.0 results report, and README updates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- agents/common/envector_client.py: read ENVECTOR_EVAL_MODE env var instead of hardcoding "rmp" - mcp/server/server.py: switch default eval_mode from "rmp" to "mm" in both runtime adapter init and CLI flag Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… EvalKey in pyenvector >= 1.4.0 reaches ~1.2 GB, exceeding the previous 256 MB cap. Set the limit to 2000 MB, the largest round value that stays under INT32_MAX (gRPC's hard ceiling). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
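The arithmetic behind the 2000 MB cap can be checked directly; the sketch below shows the channel-option names gRPC uses for these limits (the channel construction itself is illustrative and commented out):

```python
# 2000 MiB in bytes: the largest round value below gRPC's INT32_MAX ceiling.
MAX_MESSAGE_LENGTH = 2000 * 1024 * 1024   # 2,097,152,000 bytes
INT32_MAX = 2**31 - 1                     # 2,147,483,647 -- gRPC's hard limit
assert MAX_MESSAGE_LENGTH < INT32_MAX

# Standard gRPC channel options for raising both directions of the limit:
GRPC_OPTIONS = [
    ("grpc.max_send_message_length", MAX_MESSAGE_LENGTH),
    ("grpc.max_receive_message_length", MAX_MESSAGE_LENGTH),
]
# channel = grpc.insecure_channel(endpoint, options=GRPC_OPTIONS)
```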
Re-measured on the end-to-end FHE path (eval key loaded, vault_topk branch active). The ~×50 increase in capture and ~×10 in recall versus yesterday reflects the FHE decryption cost, previously unmeasured, finally being included. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The version actually installed and measured is stable 1.4.0 (not pre-release a5). Corrected identically in both the plan and the 2026-05-05 report. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record the exact source and scope of two changes that the plan described only abstractly: - `secure` parameter: SDK Configuration doc. - PPMM cache optimization: 1.20–1.40x (V2) / 1.35–1.59x (V3) for query batches m=4–16; no effect for single queries (m=1). Search path only. Removed the PPMM citation from the report's capture insert analysis; the envector design doc confirms PPMM is a Search-path optimization unrelated to the insert/encrypt path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t section Keep the data (~92%) as stated fact, but reword the heading neutrally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GetPublicKey streams ~1.1 GiB EvalKey for pyenvector 1.4 and exceeds the shared 30s default deadline (measured ~38s on a 30 MiB/s link). Companion to d08e1bb, which raised the gRPC message size limit but left the deadline untouched. Standard RPC timeout stays at 30s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
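The ~38 s measurement above is consistent with simple transfer arithmetic, and the per-RPC split (long deadline only for GetPublicKey, default elsewhere) can be sketched like this; the stub calls are illustrative, not the project's actual code:

```python
# Sanity-check the measured number: ~1.1 GiB over a ~30 MiB/s link.
evalkey_gib = 1.1
link_mib_s = 30
transfer_s = evalkey_gib * 1024 / link_mib_s  # ~37.5 s, matching the ~38 s measurement

# Per-RPC deadlines: only the large EvalKey stream gets the longer one.
DEFAULT_RPC_TIMEOUT_S = 30
GET_PUBLIC_KEY_TIMEOUT_S = 90  # headroom over the ~38 s transfer

# With a generated gRPC stub, the timeout is passed per call, e.g.:
# response = stub.GetPublicKey(request, timeout=GET_PUBLIC_KEY_TIMEOUT_S)
# other    = stub.SomeOtherRpc(request, timeout=DEFAULT_RPC_TIMEOUT_S)
```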
Apply Prettier-style cleanup to all .md files modified vs main on this branch: pad table cells to equal visual width (CJK-aware), insert blank lines before fenced code blocks and tables, and drop trailing two-space hard breaks where the next line is already a block boundary. Intentional hard breaks in paragraph-fragment lists are preserved. No content changes.
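The CJK-aware width padding mentioned above can be sketched with the standard library's `unicodedata.east_asian_width` (the helper names here are hypothetical, not the actual cleanup script):

```python
import unicodedata

def visual_width(s: str) -> int:
    """Approximate display width: East Asian Wide / Fullwidth chars count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in s)

def pad_cell(s: str, target: int) -> str:
    """Pad a table cell with spaces so cells align to the same visual width."""
    return s + " " * max(0, target - visual_width(s))
```

Naive `str.ljust` would misalign columns containing Korean text, since each Hangul syllable occupies two display columns but counts as one character.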
Plumb the new envector_secure field through fetch_keys_from_vault (now an 8-tuple) and apply it to EnVectorConfig.secure in both the reload (_init_pipelines) and startup-init paths. When Vault provides the field, the value is persisted to ~/.rune/config.json so the EnVectorClient/Adapter is constructed with the right TLS mode on subsequent runs. Backward-compatible: older Vault servers that omit the field cause vault_ev_secure to come back as None, in which case the client leaves the existing secure setting (env var → config → SDK default) untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lict Claude Code auto-registers enVector MCP from the plugin manifest (plugin:rune:envector). The duplicate user-scope `claude mcp add` entry competed with the plugin-scope registration during stdio handshake and caused both to fail. Keep the legacy-entry purge so older installs are cleaned up; let the plugin manifest own registration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The synchronous fetch_keys_from_vault() call in main() blocked MCP transport startup on the GetPublicKey gRPC stream. With pyenvector 1.4.0's larger EvalKey the stream now regularly exceeds Claude Code's 30s MCP startup limit, leaving the server unreachable on cold starts. The existing _init_pipelines background thread already re-fetches and rebuilds the adapter, so the sync block was redundant — remove it and let pipeline init populate key_id / agent_id / agent_dek / index_name once Vault responds. Tools that need these already gate via _ensure_pipelines() so they wait gracefully. Mirror the envector_not_provisioned dormant transition into _init_pipelines so the missing-credentials signal is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The thread wrapper around fetch_keys_from_vault used a 30s timeout while the underlying GetPublicKey gRPC call has a 90s deadline (654806e). On cold start the gRPC call could take 40s+ due to TLS handshake racing sentence-transformers model loading on MainThread, so the wrapper declared failure and switched the server to dormant even though the worker thread eventually succeeded and wrote keys to disk. Bump the wrapper timeout to 120s (gRPC deadline + buffer for thread/ asyncio/TLS overhead) with a 30s grace re-poll so an in-flight call returns its real result. The `with ThreadPoolExecutor` block already waits on shutdown, so a longer timeout costs nothing on the success path. Also rework the ad-hoc DEBUG file logger added during diagnosis into an opt-in RotatingFileHandler controlled by RUNE_MCP_DEBUG_LOG (=1 for INFO, =debug for verbose). Off by default so normal runs don't write to disk; preserved as a tool for future background-thread debugging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
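The timeout-plus-grace-re-poll pattern described above can be sketched as follows (a minimal sketch; `fetch_with_grace` and the constants mirror the commit's description but are not the actual server code):

```python
import concurrent.futures

FETCH_TIMEOUT_S = 120  # gRPC deadline (90 s) + thread/asyncio/TLS overhead buffer
GRACE_REPOLL_S = 30    # one extra wait so an in-flight call returns its real result

def fetch_with_grace(fetch_fn):
    """Run fetch_fn in a worker thread; on timeout, re-poll once before
    declaring failure. Returns None only after both waits expire."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_fn)
        try:
            return future.result(timeout=FETCH_TIMEOUT_S)
        except concurrent.futures.TimeoutError:
            try:
                # Grace re-poll: the worker may still be mid-call.
                return future.result(timeout=GRACE_REPOLL_S)
            except concurrent.futures.TimeoutError:
                return None  # caller interprets None as "go dormant"
```

As the commit notes, the `with` block waits for the worker on shutdown anyway, so the longer timeout adds no cost when the fetch succeeds quickly.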
esifea
left a comment
In my opinion, we don't need to update the Python implementation, which will be deprecated after v0.4.0.
Could you please just leave the benchmark scripts, which remain reusable with the Go implementation?
I see your point. I kept the changes because this branch also records which source code Rune was based on during the benchmark measurements. The modified code is intended for integration with enVector 1.4.0 (mm mode).
Rune × envector-msa-1.4.0 Latency Report
Experiment Plan and How to Read the Tables
Scenario Definitions / Measurement Methodology
For the full scenario list (T1–T9), phase decomposition, statistical criteria, environment conditions, and validation method, see:
https://github.com/CryptoLabInc/rune/blob/benchmark/envector-latency-v1.4.0/benchmark/plans/latency_bench_plan_envector_v1.4.0.md
Statistics abbreviations
Phase names
`embed`, `score`, `vault_topk`, `insert`, `remind`, `total`

What `vault_topk` = 0 ms means: envector returned an empty encrypted score list, so the Vault decryption branch was skipped. It is not a genuine 0-second measurement; the phase simply never ran.

Execution Results Summary
capture time breakdown:
`insert` (FHE encrypt + remote insert, 4200–5000 ms) is 91–95% of the total; embed/score/vault_topk combined are 5–9%.
recall time breakdown:
`score` (180–200 ms) + `vault_topk` (100–115 ms) + `remind` (45–47 ms) together account for ~80% of the total; embedding is 28–47 ms.
batch per-item: 238–245 ms (within ±3 ms across batch sizes 1, 5, 10, 20). batch_capture performs no insert; it measures only embed + score(novelty).
recall is topk-independent: in T7, topk = 1, 3, 5, 10 all show p50 367–376 ms; difference < 3%.
For the detailed per-phase tables (p50, p95, p99, mean), see benchmark/reports/latency_results_v1.4.0_2026-05-05.md.

Notable Observations
vault_topk / remind now measured correctly: the 0 ms issue from the first run is resolved
In the first report, vault_topk was 0.0 ms in every scenario; this run measures 31–100 ms for capture and 100–115 ms for recall. With the eval key load state normalized, the FHE encrypted_blobs are no longer empty, so the Vault gRPC decryption branch executes.

Incidental infrastructure fix: raised `MAX_MESSAGE_LENGTH` in `mcp/adapter/vault_client.py` from 256 MB to ~1.95 GB. The EvalKey response is ~1.18 GB, so it failed with `RESOURCE_EXHAUSTED` under the 256 MB limit (separate fix commit d08e1bb).

Most of capture time is spent in insert (~4.5 s, 92% of total)
The `insert` phase is 91–95% of total capture time. Internally it combines FHE encryption with the remote envector insert; absolute values are 4.2–5.0 s per scenario. In the next measurement round, separately instrumenting the `insert_data` gRPC time (network + server-side storage) would make it clearer where the time goes.
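The proposed split instrumentation could be as simple as a phase-timer context manager around the encryption and gRPC steps (a sketch with hypothetical names; the `time.sleep` calls stand in for real work):

```python
import time
from contextlib import contextmanager

phase_times_ms = {}

@contextmanager
def phase(name):
    """Record the wall-clock duration of a named phase in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_times_ms[name] = (time.perf_counter() - start) * 1000.0

# Usage sketch: split the opaque insert phase into its two suspected parts.
with phase("fhe_encrypt"):
    time.sleep(0.01)  # stand-in for client-side FHE encryption
with phase("insert_data_grpc"):
    time.sleep(0.01)  # stand-in for the network + server-side store call
```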
The recall score phase takes about 2x as long as the capture novelty score
The same `score` call takes 100 ms during capture but 180 ms during recall. Likely because capture only runs the novelty check and receives a simple distance metric, whereas recall follows up with the vault_topk + remind branches and carries a different ScoreEntry payload. Code-path verification needed.

T3 Korean capture adds time in score/vault_topk compared with English
embed/score/vault_topk take longer for Korean than for English, while insert takes less. The embed gap is explainable by tokenizer subword differences, but it is unclear why score/vault_topk should depend on token count (in theory the FHE vector length is identical). A tokenizer output vector-dim difference, or a payload-length difference surfacing as serialization overhead, is one possibility.
T4 duplicate-input score is about 2x the other capture scenarios
T1 score is 100 ms, T4 score 211 ms. T4 captures the same text twice, so the novelty result comes back high-similarity and the vault_topk decrypt branch likely takes a different path. The p95 spike in the first measurement may share the same cause.
batch_capture per-item is about 1/20 of a single capture
Per-item 240 ms vs. 5000 ms for a single capture. Most of the gap is because batch_capture performs no insert and measures only embed + score(novelty). If batch-insert-inclusive timing is needed, a separate scenario must be added to the runner.