I build agentic engineering tools: repo readiness checks, review handoffs, verification gates, run ledgers, and local-first product prototypes.
My focus is the practical layer around coding agents: the prompts, scripts, workflows, and small products that keep AI-generated work scoped, inspectable, and easier to trust.
If you are building with Codex or evaluating how AI changes software work, find me on X at @manuelsampedrop.
- Agent reliability: repo setup checks, context contracts, review packets, and repeatable handoffs.
- Verification discipline: change-aware test plans, honest closeout notes, and evidence before claims.
- Agent auditability: run ledgers, decisions, file changes, command history, and blockers a reviewer can inspect.
- Agent safety: permission gates, MCP tool controls, and receipt-based authorization for sensitive actions.
- Product judgment: small local-first prototypes that test the workflow before adding backend weight.
If you have five minutes, start with the workflow rather than the full repo list:
- Read repo-flightcheck, codex-review-packet, verify-by-change, and agent-run-ledger as the core reliability loop: readiness, review context, verification, and durable audit evidence.
- Check agent-context-sentinel, agent-secret-sentinel, and mcp-guard for safety judgment around untrusted context, secrets, and tool permissions.
- Use the Profile Evidence Map and the cross-repo examples (Agent Release Readiness Chain and Agent Review Packet to Ledger Chain) to see how the claims and tools compose.
| Repo | What it proves | Why it matters |
|---|---|---|
| agent-request-brief | Raw request clarification | Audits raw coding-agent requests for objective, scope, acceptance criteria, constraints, context, verification, risks, and next actions before broad edits start. |
| agent-task-contract | Task scope readiness | Validates Markdown task briefs before coding-agent work starts, including stable acceptance-criteria IDs for traceable review packets, closeouts, ledgers, and profile proof. |
| agent-repo-map | Repository context mapping | Generates compact pre-run maps of docs, languages, entrypoints, commands, verification signals, CI, Git state, and risk paths before a coding-agent handoff. |
| agent-handoff-brief | Pre-run agent handoffs | Turns task contracts and repo context into compact coding-agent briefs with required reading, commands, verification, risk paths, gaps, and a ready-to-use prompt. |
| agent-continuation-brief | Long-run continuation handoffs | Audits continuation notes for original objective, current state, completed work, blockers, changed files, commands, next actions, risks, and next-agent instructions before another coding agent resumes. |
| agent-handoff-drift | Handoff state drift | Checks handoff notes against the live repo for missing files, stale HEAD or branch claims, false clean-tree claims, and weak command-success evidence before another agent continues. |
| agent-start-gate | Pre-run start gate | Gates coding-agent starts from a Markdown packet with objective, scope, inputs, traceable evidence pointers, worktree state, context screening, verification commands, and stop conditions before the first edit. |
| agent-context-sentinel | Context injection preflight | Audits untrusted context for prompt-injection language, hidden authority claims, secret exfiltration requests, dangerous commands, unattended sensitive actions, local paths, and missing source metadata before agent handoff. |
| agent-context-budget | Context budget planning | Estimates token pressure, detects oversized or duplicate context, and produces keep/summarize/drop plans before coding-agent handoffs. |
| agent-tool-schema-lint | Tool schema quality | Lints JSON tool definitions for strong descriptions, object schemas, required fields, extra-property control, parameter guidance, enum clarity, schema-aligned input examples, and safety language before agents can call them. |
| agent-tool-call-replay | Tool-call schema replay | Replays captured agent tool calls against current schemas to catch unknown tools, invalid JSON arguments, missing required fields, type mismatches, enum drift, extra arguments, and missing or duplicate stable call IDs before reruns or proof packets reuse them. |
| agent-output-contract | Structured output evidence | Validates coding-agent JSON outputs for schema version, outcome, score, issue lists, blocker consistency, structured check evidence, and obvious secret or local-path leakage before CI, ledgers, or review packets trust them. |
| agent-evidence-chain | Cross-artifact proof consistency | Validates that multiple JSON evidence artifacts from one coding-agent run share task, run, repo, and commit identity before review packets, ledgers, or closeouts treat them as one proof chain. |
| agent-source-grounding | Source-grounded agent claims | Audits agent-written Markdown and JSON artifacts for explicit sources, concrete evidence pointers, claim grounding, placeholder citations, and optional HTTP link checks before public proof or decisions reuse them. |
| agent-acceptance-trace | Acceptance criteria traceability | Turns task acceptance criteria, diffs, and closeout evidence into a criterion-by-criterion matrix with covered, partial, or missing status before a final answer is accepted. |
| agent-scope-guard | Scope boundary enforcement | Fails coding-agent diffs when changed paths fall outside an explicit file or glob allowlist, with text and JSON output, tests, CI, repo readiness proof, and validated proof-packet evidence. |
| agent-worktree-guard | Dirty worktree protection | Snapshots pre-existing user edits before a coding-agent run, emits reusable snapshot hashes, and blocks tampered baselines, protected-file drift, or unexpected dirty paths outside the task allowlist. |
| agent-instruction-audit | Agent instruction readiness | Audits AGENTS.md, CLAUDE.md, GEMINI.md, CURSOR.md, and .cursorrules for actionable scope, constraints, verification, safety, closeout guidance, and risky commands. |
| agent-decision-guard | Decision documentation gate | Blocks decision-worthy diffs when CI, automation, config, security, product scope, or agent-instruction changes lack DECISIONS.md, TODO.md, or an explicit no-follow-up waiver. |
| agent-diff-budget | Diff size and risk budget | Fails broad coding-agent diffs when changed files, line volume, or high-risk files exceed explicit review budgets, with path-level risk tags, reviewer questions, and validated proof-packet evidence. |
| agent-diff-splitter | Oversized diff split planning | Turns broad coding-agent diffs into ordered review slices by security, data, release, automation, agent instructions, tests, application, and product/docs with files, line counts, rationale, reviewer questions, and validated proof-packet evidence. |
| agent-review-map | Review lane routing | Maps mixed coding-agent diffs into security, data, release, automation, agent-instruction, product/docs, tests, and application lanes with owners, handoff order, reviewer questions, and validated proof-packet evidence. |
| agent-review-finding-check | Review finding quality | Audits coding-agent review findings for severity, concrete file lines, impact, actionable fixes, vague language, diff membership, and validated proof-packet evidence. |
| agent-run-ledger | Agent audit trails | Records AI agent runs as JSONL; imports review packets with embedded task contracts, CI, published-HEAD proof, readiness, sensitive-change blockers, rendered verification-envelope metadata, task-contract metadata from JSON envelopes and readiness reports, repo readiness reports and contracts, Markdown or JSON-envelope verification plans, and GitHub Actions run evidence; gates unresolved evidence with strict doctor mode; and renders static review reports. |
| agent-tool-call-audit | Tool-call history review | Audits coding-agent tool-call logs for destructive commands, sensitive tool actions, repeated failures, secret markers, missing working directories, skipped safety hooks, and missing approval evidence. |
| agent-memory-audit | Agent memory hygiene | Audits local agent memory files for stale current-state claims, missing or weak concrete source evidence, public-action risk, local-path exposure, weak memory policy, and secret-material markers before context is reused. |
| agent-pr-brief | PR description quality | Audits pull-request descriptions against real diffs for required sections, unmentioned changed files, risky paths, weak verification evidence, vague language, and large-diff scope notes. |
| agent-plan-trace | Plan execution traceability | Audits coding-agent plans against diffs, command logs, and closeouts so completed steps stay tied to evidence and pending work cannot hide behind a confident final answer. |
| repo-flightcheck | Pre-agent readiness | Audits whether a repository is ready for Codex, Claude Code, and human reviewers, including agent-readiness contract output, structured task-contract metadata, optional task-contract validation, CI/local verification coverage, local tool availability, remote publication failure classification, published-HEAD readiness, Python unittest detection, Python and Node CLI entrypoint readiness, GitHub Action repos, and stale documented commands. |
| codex-review-packet | Review context quality | Packages diffs, repo rules, local context, task contracts, review lanes, sensitive-change checks, repo readiness reports or contracts, generated Markdown or JSON-envelope verification plans with task-contract summaries, published-HEAD proof, and GitHub Actions CI evidence into a sharper handoff for Codex or Claude Code. |
| verify-by-change | Evidence-based closeout | Suggests honest checks from committed diffs, working-tree changes, and generated review packets, with JSON envelope metadata, packet readiness context, task-contract metadata, CI-local command parity, GitHub Action/workflow guidance, secret-material and security-sensitive path checks, Python and Node CLI context, and a repo readiness contract. |
| diff-to-eval | Agent learning loops | Turns real unified diffs into reusable JSON evaluation cases with changed files, risk tags, suggested checks, expected outcomes, tests, CI, and repo readiness proof. |
| agent-eval-runner | Eval case scoring | Scores proof packets, closeouts, or review notes against saved diff-to-eval cases for changed files, suggested checks, risk tags, expected outcomes, and pass/fail thresholds. |
| agent-bug-repro | Reproducible bug handoffs | Audits bug reports for summary, repro steps, expected and actual behavior, environment, evidence, regression context, vague language, and error-log signals before a debugging agent starts guessing. |
| agent-retry-guard | Retry-loop control | Detects repeated failed commands, unchanged error signatures, consecutive blind retries, missing strategy shifts, and verified failed command receipts before another run wastes context. |
| agent-ci-failure-packet | Focused CI retries | Turns noisy CI logs or verified failed command receipts into Markdown or JSON packets with failing commands, error signals, referenced files, suggested checks, and a scoped next-agent prompt. |
| agent-release-note-check | Release note accuracy | Audits release notes against real diffs so maintainers do not miss breaking, security, dependency, CI, test, or code changes, or publish unsupported verification claims, before a release goes out. |
| agent-rollback-plan | Operational rollback review | Turns risky agent diffs into rollback packets with changed files, risk tags, rollback steps, post-rollback checks, and reviewer questions before changes ship. |
| agent-closeout-check | Evidence-backed final answers | Lints coding-agent closeouts for summaries, changed-file evidence, exact verification commands, residual risks, vague claims, tests, CI, and repo readiness proof. |
| agent-claim-check | Closeout claim verification | Checks coding-agent closeout claims against changed files, exact claimed commands, explicit command evidence, risky paths, and no-risk language before PR comments or proof packets reuse the final answer. |
| agent-command-receipt | Command evidence receipts | Creates and verifies hashed command-outcome receipts with evidence file hashes before closeouts, proof packets, or ledgers reuse test and verification claims. |
| agent-test-impact | Test-impact evidence | Maps coding-agent diffs to direct, partial, or missing test evidence for each changed source file before a broad test pass is treated as enough. |
| agent-change-risk | Review-gate routing | Classifies coding-agent diffs into risk tags and required gates for scope, secrets, runbook drift, CI failure packets, rollback, eval cases, closeout checks, and change-aware verification. |
| agent-dependency-guard | Dependency-surface review | Classifies dependency manifest, lockfile, package spec, and install-script changes in coding-agent diffs before tests or closeouts treat the change as safe. |
| agent-merge-readiness | Merge verdict gate | Turns diff risk, explicit check results, and closeout evidence into strict ready, needs-review, or blocked verdicts with non-ready exit codes for automation. |
| agent-proof-packet | Review proof packets | Packages coding-agent diffs, explicit checks, evidence files, risks, decisions, open questions, and missing evidence into Markdown or JSON proof packets. |
| agent-publish-queue | Publication queue audit | Audits local proof repos for branch, HEAD, dirty state, GitHub remote, optional public HTTP status, blockers, and next actions before profile promotion. |
| profile-proof-audit | Profile claim audit | Audits a GitHub profile README for required proof sections, Selected Work shape, Latest Proof items, relative links, optional public HTTP status, and unsupported claim language. |
| runbook-drift-check | Operational doc drift | Checks README, AGENTS.md, and runbooks for missing local links, stale path references, script command drift, optional bash syntax checks, tests, CI, and repo readiness proof. |
| briefboard-local | Product scoping taste | Turns messy kickoff notes into a structured build brief, flags missing essentials and weak handoff scope, and generates a Codex-ready prompt with no backend, importable examples, and CI-local checks. |
These are small on purpose. I prefer tools a reviewer can clone, inspect, run, and challenge over larger demos with less operational signal.
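
As an illustration of how small such a check can be, the path-allowlist idea behind agent-scope-guard can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tool's actual code: the function names, the fnmatch-style pattern syntax, and the exit-code convention are all hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths, allowlist):
    """Return the changed paths that match no allowlist pattern.

    Patterns use fnmatch-style globs (an assumption for this sketch).
    Note that in fnmatch, `*` also matches `/`, so `src/*.py` covers
    nested files such as `src/pkg/mod.py`.
    """
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowlist)
    ]


def scope_exit_code(changed_paths, allowlist):
    """Hypothetical convention: 0 when every path is in scope, 1 otherwise."""
    return 1 if out_of_scope(changed_paths, allowlist) else 0


if __name__ == "__main__":
    changed = ["src/app.py", "README.md", ".github/workflows/ci.yml"]
    allowed = ["src/*.py", "README.md"]
    for path in out_of_scope(changed, allowed):
        print(f"out of scope: {path}")
```

A non-zero exit code is what would let CI or a wrapper script fail the diff instead of relying on the agent to stay inside scope.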
| Repo | What it proves | Why it matters |
|---|---|---|
| agent-secret-sentinel | Secret leak preflight | Scans agent-generated diffs for likely private keys, provider tokens, suspicious assignments, and high-entropy values before commits, pull requests, or public examples. |
| agent-artifact-redactor | Artifact redaction before public proof | Redacts logs, transcripts, proof packets, and command artifacts for sensitive markers, then writes hash manifests that tie redacted copies to source artifacts before publication. |
| deploy-gate | Human authorization for AI-driven deploys | Blocks sensitive PRs until a named human approves the exact action with a signed receipt. |
| mcp-guard | Tool-call control for MCP agents | Enforces allow, block, or approval rules before dangerous MCP tool calls execute. |
| pp-cli | Local receipt verification | Verifies Permission Protocol receipts with local Ed25519 signature checks. |
| python-sdk | Approval workflow integration | Lets Python workflows request and verify authority receipts around sensitive actions. |
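
The allow, block, or approval decision described for mcp-guard can be pictured as a first-match rule lookup. The sketch below is illustrative only: the rule format, the tool names, and the block-by-default fallback are assumptions made for this example, not mcp-guard's actual configuration or API.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (tool-name pattern, action). First match wins.
RULES = [
    ("fs.read*", "allow"),
    ("fs.delete*", "block"),
    ("deploy.*", "approval"),
]


def decide(tool_name, rules=RULES, default="block"):
    """Return "allow", "block", or "approval" for a tool call.

    Unknown tools fall through to `default`, which this sketch sets to
    "block" so that unlisted tools are denied rather than silently run.
    """
    for pattern, action in rules:
        if fnmatchcase(tool_name, pattern):
            return action
    return default


if __name__ == "__main__":
    for name in ("fs.read_file", "deploy.production", "shell.exec"):
        print(name, "->", decide(name))
```

Denying by default is a design choice in this sketch: a tool that no rule mentions is treated as dangerous until someone writes a rule for it.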
- Start with a raw request audit, a real brief, explicit acceptance criteria, and the smallest useful scope. Preserve that state across long-running agent sessions, then check handoff notes against the live repo instead of asking the next run to infer what happened. See agent-request-brief, agent-task-contract, agent-handoff-brief, agent-continuation-brief, agent-handoff-drift, agent-acceptance-trace, briefboard-local, Agent Request Brief, Agent Task Contract Preflight, Stable Acceptance Criteria IDs, Agent Handoff Brief, Agent Continuation Brief, Agent Handoff Drift, Agent Acceptance Trace, Handoff Scope Warnings Before Codex, and Brief Readiness Before Codex.
- Protect pre-existing user edits and enforce expected changed paths instead of trusting the agent to stay inside scope. See agent-worktree-guard, agent-scope-guard, Agent Worktree Guard, Hash-Backed Worktree Snapshots, Scope Guard for Agent Diffs, and Proof Packet Backed Scope Guards.
- Keep broad agent diffs below an explicit review budget, then split oversized changes into reviewable slices instead of pushing one broad patch. See agent-diff-budget, agent-diff-splitter, Agent Diff Budget, Proof Packet Backed Diff Budgets, Agent Diff Splitter, and Proof Packet Backed Diff Splits.
- Route mixed diffs to concrete reviewer lanes instead of accepting one flat handoff. See agent-review-map, Agent Review Map, and Proof Packet Backed Review Maps.
- Map repo context, budget context bundles, screen untrusted context, lint tool schemas, replay captured tool calls against current schemas, validate structured outputs, check evidence-chain consistency, require source grounding, check readiness, and audit instruction quality before handing work to an agent. See agent-repo-map, agent-context-budget, agent-context-sentinel, agent-tool-schema-lint, agent-tool-call-replay, agent-output-contract, agent-evidence-chain, agent-source-grounding, repo-flightcheck, agent-instruction-audit, Agent Repo Map, Agent Context Budget, Agent Context Sentinel, Agent Tool Schema Lint, Schema-Backed Tool Examples, Agent Tool Call Replay, Stable Tool Call IDs, Agent Output Contract, Structured Output Checks, Agent Evidence Chain, Run Identity Evidence Chains, Agent Source Grounding, Concrete Grounding Pointers, Agent Instruction Audit, Structured Task Contract Readiness, Task Contract Readiness Before Agent Work, Remote Failure Classification Before Public Proof, Published HEAD Before Public Proof, Remote Readiness Before Public Proof, Agent Readiness Contract Output, Local Tool Availability Preflight, Python CLI Entrypoint Readiness, Node CLI Entrypoint Readiness, Python Unittest Readiness Check, GitHub Action Repo Readiness, and Documented Command Drift Check.
- Keep operational docs tied to executable reality after agent or automation changes. See runbook-drift-check and Runbook Drift Check.
- Record durable intent when diffs alter CI, automation, config, security, product scope, or future agent behavior. See agent-decision-guard and Agent Decision Guard.
- Package repo-aware context so reviews can be stricter and more useful, audit the findings themselves, then check the PR description against the actual diff before posting it. See codex-review-packet, agent-review-finding-check, agent-pr-brief, Task Contracts in Review Packets, Task Contract Envelope Summary in Review Packets, Published HEAD Proof in Review Packets, Sensitive Change Checks in Review Packets, CI Evidence in Review Packets, Readiness Contract in Review Packets, Repo Readiness in Review Packets, Review Packet With Generated Verification, Generated Verification Envelopes in Review Packets, Verification Envelope in Review Packets, Review Map in Agent Packets, Proof Packet Backed Review Findings, Agent Review Finding Check, Agent PR Brief, and AI Repo Review Findings.
- Route each diff to the gates it actually needs before review. See agent-change-risk, agent-dependency-guard, Change Risk Matrix for Agent Diffs, and Agent Dependency Guard.
- Make merge readiness an explicit verdict, not a confident sentence. See agent-merge-readiness and Merge Readiness Gate for Agent Diffs.
- Package final evidence into a compact review artifact, then redact public-facing artifacts before sharing. See agent-proof-packet, agent-artifact-redactor, Agent Proof Packet for Review, Agent Artifact Redactor, and Redacted Artifact Manifests.
- Keep local proof repos, remotes, TODOs, and public profile claims in sync before promotion. See agent-publish-queue and Publish Queue for Local Agent Repos.
- Audit profile claims before treating the public surface as ready. See profile-proof-audit and Profile Proof Audit.
- Match verification to the actual change type, inspect test-impact evidence, and reuse real diffs as regression cases. See verify-by-change, agent-test-impact, diff-to-eval, agent-eval-runner, Agent Test Impact, Diff to Eval Case, Agent Eval Runner, Security-Sensitive Change Verification, Review Packet Readiness to Verification Envelope, Task Contract Metadata in Verification Envelopes, Verification by Change Type, Python CLI Change Verification, Node CLI Change Verification, GitHub Action Change Verification, JSON Envelope for Verification Gates, Review Packet to Verification Checklist, and Repo Readiness Contract for Agent Repos.
- Check the final answer for exact evidence, then compare its claims, completed plan items, actual diff, and hashed command receipts before accepting the handoff. See agent-plan-trace, agent-closeout-check, agent-claim-check, agent-command-receipt, Agent Plan Trace, Closeout Evidence Check for Agents, Agent Claim Check, and Agent Command Receipt.
- Turn vague bugs, retry loops, failed CI, release-note drift, and rollback risk into compact packets before asking another agent to continue or publishing maintainer-facing output. See agent-bug-repro, agent-retry-guard, agent-ci-failure-packet, agent-release-note-check, agent-rollback-plan, Agent Bug Repro, Agent Retry Guard, Receipt-Backed Retry Guard, CI Failure Packet for Agent Reruns, Receipt-Backed CI Failure Packets, Agent Release Note Check, Proof Packet Backed Release Notes, and Rollback Plan for Agent Diffs.
- Leave an audit trail for non-trivial agent runs, including tool-call history, approval evidence, and memory hygiene when context is reused. See agent-run-ledger, agent-tool-call-audit, agent-memory-audit, Agent Tool Call Audit, Approval-Backed Tool Call Audits, Agent Memory Audit, Concrete Source Memory Audits, Readiness Task Contract to Ledger Evidence, Task Contract Envelope to Ledger Evidence, Task Contract Evidence to Ledger, Published HEAD Proof to Ledger, Sensitive Review Packet to Ledger Evidence, Rendered Verification Envelope to Ledger Evidence, Review Packet CI Evidence to Ledger, GitHub Actions Run Evidence to Ledger, Review Packet Readiness to Ledger Evidence, Verification Envelope Readiness to Ledger Evidence, Readiness Contract to Ledger Evidence, Review Packet Verification to Ledger, Review Packet to Ledger Evidence, Repo Readiness to Ledger Evidence, and Verification Envelope to Ledger Evidence.
- Catch secret exposure before agent diffs become public artifacts, redact evidence packets before publication, tie redacted artifacts to manifests, then gate sensitive actions with explicit human authorization where execution risk is higher than review risk. See agent-secret-sentinel, agent-artifact-redactor, Agent Diff Secret Sentinel, Agent Artifact Redactor, Redacted Artifact Manifests, deploy-gate, and mcp-guard.
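
The hashed command receipts mentioned above can be sketched as follows. This is a minimal sketch, assuming a receipt is a small JSON object whose own hash covers the command, exit code, and evidence hash; the field names and both function names are hypothetical and do not describe agent-command-receipt's real format.

```python
import hashlib
import json


def _canonical(body):
    """Serialize with sorted keys so the same body always hashes the same."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(command, exit_code, evidence):
    """Build a receipt that binds a command outcome to its evidence bytes."""
    body = {
        "command": command,
        "exit_code": exit_code,
        "evidence_sha256": hashlib.sha256(evidence).hexdigest(),
    }
    receipt_hash = hashlib.sha256(_canonical(body)).hexdigest()
    return {**body, "receipt_sha256": receipt_hash}


def verify_receipt(receipt, evidence):
    """Check that neither the receipt fields nor the evidence were altered."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    fields_intact = (
        hashlib.sha256(_canonical(body)).hexdigest() == receipt.get("receipt_sha256")
    )
    evidence_intact = (
        body.get("evidence_sha256") == hashlib.sha256(evidence).hexdigest()
    )
    return fields_intact and evidence_intact


if __name__ == "__main__":
    log = b"Ran 12 tests\nOK\n"
    receipt = make_receipt("make test", 0, log)
    print(verify_receipt(receipt, log))
```

A plain hash like this detects accidental or careless edits to a claim; it does not prove who produced the receipt, which is where signed receipts come in.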
- AI lab notes: build notes, decisions, and launch logs tied to real repos or workflows.
- Recipes: reusable prompts, checklists, and implementation patterns that came from actual work.
- Examples: concrete proof-packet shapes for verifying profile and agent-workbench claims.
- Tooling radar: short research only when it changes a build or tooling decision.
- Docs: public operating docs, including the automation runbook and profile strategy.
This profile repo has a small local check so maintenance changes are not just copy edits:

    make test
    make lint
    make build

The check validates shell scripts, compiles local Python audit tools, runs Python unit tests, runs the commit-script shell fixture, executes the profile quality audit, regenerates public indexes, checks latest-proof freshness, refreshes latest-proof links, and fails if generated files drift.
- Latest lab note: 2026-06-07 - Publish Guard Coverage Stops at No-Op
- Latest recipes:
- Ship useful proof, not activity theater.
- Optimize for reviewability: strong AI workflows should leave evidence.
- Prefer own repos and working artifacts over meta commentary.
- Keep claims honest: what exists, what was tested, and what is still limited.
- Use the workbench as supporting evidence, not as a substitute for real projects.