chore: v04 hardening #128

Merged
couragehong merged 4 commits into feat/go-migration from couragehong/chore/v04-hardening
May 8, 2026

@couragehong
Contributor

a. Vault endpoint normalization (internal/adapters/vault/endpoint.go)

Problem: NormalizeEndpoint was incomplete. Two edge cases were silently broken:

  • tcp://host (no port) → returned as-is, without the default :50051 appended
  • tcp://[::1]:50051 (IPv6 literal) → no handling path

Changes:

  • All three schemes (tcp/http/https) now go through url.Parse to extract the host
  • A single defaulting step appends :50051 when no port is present
  • For IPv6 ([::1]), url.Parse already supplies the brackets, so a branch keeps net.JoinHostPort from double-bracketing
  • Empty input / scheme-only input (tcp://) → typed VAULT_BAD_ENDPOINT error
  • Added tests for 12 happy-path cases and 3 error cases

Impact: more robust Vault endpoint input. An endpoint written without a port, such as tcp://vault.cryptolab.co.kr, now works.

b. Stale TODO comment cleanup (internal/service/lifecycle.go:489)
Change: replaced the TODO block with a description of the actual contract.

c. Recall external-IO timeouts (internal/service/recall.go)

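Problem: the recall pipeline makes 8 external RPC calls (Embedder 3, Envector 2, Vault 3) but relied on gRPC keepalive alone. Keepalive only detects a dropped connection; if the server ACKs, holds the stream, and never responds, the call hangs indefinitely. This is a real risk on envector clusters under heavy FHE load or on slow Vault hosts.

Changes:

  • Added 5 timeout constants and wrapped each external call in context.WithTimeout
  • embedder 10s, envector Score 30s, envector Metadata 15s, vault DecryptScores 30s, vault DecryptMetadata 30s
  • cancel() is called right after each call to release the timer resource immediately
  • Removed the TODO comment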

couragehong and others added 3 commits May 8, 2026 10:14
… tests

NormalizeEndpoint had a TODO marker about full url.Parse coverage, plus
two real edge-case bugs: tcp:// without an explicit port silently dropped
the default 50051, and IPv6 literals (tcp://[::1]) were never exercised.

Reshape so all three schemes (tcp/http/https) flow through url.Parse
into a single host string, then a single defaulting step appends :50051
when no port is already present. IPv6 needs a special-case skip of
net.JoinHostPort because url.Parse already brackets `[::1]` and JoinHostPort
would double-bracket a host whose name itself contains `:`.

Also reject empty / scheme-only inputs with a typed VAULT_BAD_ENDPOINT
error instead of returning a useless empty string downstream.

Adds endpoint_test.go covering 12 happy-path cases (tcp/http/https with
and without ports, schemeless host:port, schemeless host, IPv4 loopback,
IPv6 literal with and without port, whitespace trimming) and 3 error
cases (empty, whitespace-only, scheme-only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The TODO block above LifecycleService.ReloadPipelines documented the
function as "currently a no-op for state recovery" and instructed the
reader that /rune:activate cannot recover from dormant — both incorrect.
Task #28 (PR #117) wired Manager.SetReloadFunc / Manager.Retrigger so
ReloadPipelines re-spawns RunBootLoop from a terminal Dormant state,
and the smoke test in this branch history confirmed end-to-end recovery
without a process restart.

The stale comment is load-bearing — a recent parity audit took the TODO
at face value and reported reload_pipelines as incomplete in Go, flagging
it as a blocker for deleting the Python tree. Replacing it with the
actual contract avoids the same trap on future audits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The recall pipeline issues 8 external-IO calls (3 to embedder, 2 to
envector, 3 to vault) and was relying on gRPC keepalive alone to fail
hung dependencies. Keepalive only catches connection-level death; a
server that ACKs the request, holds the stream, and never replies will
stall the whole MCP request indefinitely — a real risk with FHE
operations on stressed envector clusters or AES decrypts batching too
many entries on a slow Vault host.

Wraps each external call in a context.WithTimeout derived from the
caller's context. Five constants (one per hop type) with explicit
rationale in the doc block:

  embedderCallTimeout         10s   runed cold-start tolerance
  envectorScoreTimeout        30s   FHE inner-product (heaviest hop)
  envectorMetadataTimeout     15s   shard/row cipher lookup
  vaultDecryptScoresTimeout   30s   FHE decrypt (matches Score)
  vaultDecryptMetadataTimeout 30s   AES-only, batch tolerant

The cancel() is called immediately after each RPC returns so the
timer resource releases as soon as the call resolves; this matches
the pattern Go's context docs recommend for non-deferred per-call
contexts. A hung dependency now surfaces as a typed
DeadlineExceeded error in seconds instead of stalling the request.

Drops the TODO at L41 that flagged this gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@couragehong couragehong changed the title Couragehong/chore/v04 hardening chore: v04 hardening May 8, 2026
@couragehong couragehong self-assigned this May 8, 2026
…ong/chore/v04-hardening

# Conflicts:
#	internal/service/lifecycle.go
@couragehong couragehong merged commit 35f59ff into feat/go-migration May 8, 2026
1 check passed
@couragehong couragehong deleted the couragehong/chore/v04-hardening branch May 8, 2026 02:00