
v0.4.0 — Swap agent-probe-guard placeholder for Phase 6 N=99 SWE-bench data #6

@caiovicentino

Description


v0.3.0 shipped agent-probe-guard with placeholder capability data (N=42 from Phase 7). Phase 11/11b validated 4/4 pushdown levers at N=99, so swap the artifact over to the larger dataset.

Acceptance criteria:

- AUROC stays in the expected range.
- New sklearn p95 measurement.
- Refit helper works in a clean environment.
- CHANGELOG entry added.
- Version bumped to v0.4.0.
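The first two acceptance checks can be scripted. The sketch below is a minimal, stdlib-only illustration, not the repository's actual test harness. It assumes the probe's held-out labels and scores are available as plain lists, treats "p95" as per-call inference latency, and takes a hypothetical `expected_range` argument because the issue does not state the numeric AUROC bounds. The `auroc` helper computes the same quantity as `sklearn.metrics.roc_auc_score` for binary labels (ties count one half), so that call can be swapped in where sklearn is installed.

```python
import math
import time
from typing import Callable, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison; fine for N=99
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def p95_latency_ms(fn: Callable[[], object], runs: int = 200) -> float:
    """Nearest-rank 95th percentile of wall-clock latency of fn(), in ms.

    Assumption: "p95" in the issue refers to per-call inference latency.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    rank = max(1, math.ceil(0.95 * len(samples)))
    return samples[rank - 1]


def check_acceptance(
    labels: Sequence[int],
    scores: Sequence[float],
    expected_range: Tuple[float, float],  # hypothetical bounds, not from the issue
) -> float:
    """Fail loudly if AUROC leaves the expected range; otherwise return it."""
    low, high = expected_range
    value = auroc(labels, scores)
    if not low <= value <= high:
        raise AssertionError(f"AUROC {value:.3f} outside [{low}, {high}]")
    return value
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, and `p95_latency_ms(lambda: model.predict_proba(x))` would time a fitted sklearn probe, where `model` and `x` stand in for whatever the refit helper produces.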

Metadata


Assignees

No one assigned

    Labels

    - agent-guard: agent-probe-guard SDK (capability + thinking probes)
    - paper: Research paper writing / submission / review


    Projects

    Status
    Todo

    Milestone

    No milestone


    Development

    No branches or pull requests
