v0.4.0 — ship vLLM plugin for inference-time probe scoring #9

@caiovicentino

Listed in the Q3 2026 roadmap. Target overhead: ~5 ms. The plugin should:

  1. Hook into the postprocess step of vLLM's LLM.generate()
  2. Forward the residual stream at a named layer to the loaded probe(s)
  3. Return the probe score alongside the generated text in the response

Reference: vLLM's extension API + agent-probe-guard sklearn probe interface.
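
A rough sketch of steps 2 and 3, to make the expected shape of the scoring path concrete. This is not the plugin: how the residual stream gets captured from vLLM is deliberately left out, and every name here (`LinearProbe`, `ScoredOutput`, `score_output`, the `"layers.12"` key) is hypothetical. The only thing assumed about a probe is an sklearn-style `predict_proba`, so a fitted sklearn classifier should fit the same slot as the stand-in below.

```python
import math
import time
from dataclasses import dataclass
from typing import Dict, List


class LinearProbe:
    """Stand-in for a fitted logistic-regression probe (weights are made up).

    Exposes an sklearn-style predict_proba: rows in, [P(class 0), P(class 1)] out.
    """

    def __init__(self, weights: List[float], bias: float = 0.0):
        self.weights = list(weights)
        self.bias = bias

    def predict_proba(self, X):
        out = []
        for x in X:
            z = sum(w * v for w, v in zip(self.weights, x)) + self.bias
            p = 1.0 / (1.0 + math.exp(-z))
            out.append([1.0 - p, p])
        return out


@dataclass
class ScoredOutput:
    text: str                 # generated text, passed through unchanged
    scores: Dict[str, float]  # layer name -> positive-class probability
    probe_ms: float           # wall-clock time spent in the probes


def score_output(text, residuals, probes):
    """Score one generation.

    residuals: layer name -> residual-stream vector (capture is out of scope here)
    probes:    layer name -> object with predict_proba
    """
    start = time.perf_counter()
    scores = {}
    for layer, probe in probes.items():
        if layer not in residuals:
            raise KeyError(f"no residual captured for layer {layer!r}")
        # One row in, take the positive-class column of the single result row.
        scores[layer] = float(probe.predict_proba([residuals[layer]])[0][1])
    probe_ms = (time.perf_counter() - start) * 1000.0
    return ScoredOutput(text=text, scores=scores, probe_ms=probe_ms)


result = score_output(
    "hello",
    residuals={"layers.12": [0.0, 0.0]},
    probes={"layers.12": LinearProbe([1.0, -1.0])},
)
```

`probe_ms` only times the probe call itself, so it could be one input when checking against the ~5 ms target, but it would not include whatever the residual capture costs.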

Metadata

Assignees

No one assigned

    Labels

    agent-guard: agent-probe-guard SDK (capability + thinking probes)
    fabricationguard: FabricationGuard hallucination probe

    Type

    No type

    Projects

    Status
    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
