Manuel Sampedro manuelsampedro1

Manuel Sampedro

I build agentic engineering tools: repo readiness checks, review handoffs, verification gates, run ledgers, and local-first product prototypes.

My focus is the practical layer around coding agents: the prompts, scripts, workflows, and small products that keep AI-generated work scoped, inspectable, and easier to trust.

If you are building with Codex or evaluating how AI changes software work, find me on X @manuelsampedrop.

Current Focus

Agent reliability: repo setup checks, context contracts, review packets, and repeatable handoffs.
Verification discipline: change-aware test plans, honest closeout notes, and evidence before claims.
Agent auditability: run ledgers, decisions, file changes, command history, and blockers a reviewer can inspect.
Agent safety: permission gates, MCP tool controls, and receipt-based authorization for sensitive actions.
Product judgment: small local-first prototypes that test the workflow before adding backend weight.
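To make the auditability item concrete, here is a minimal sketch of a JSONL run ledger: one JSON object per line, plus a helper that surfaces recorded blockers for a reviewer. The field names (`run_id`, `commands`, `blockers`) and both function names are assumptions for illustration, not the schema or API of agent-run-ledger.

```python
import json


def to_ledger_line(record):
    """Serialize one run record as a single JSONL line.

    json.dumps escapes newlines inside strings, so each record stays on one line.
    """
    return json.dumps(record, sort_keys=True)


def open_blockers(lines):
    """Collect (run_id, blocker) pairs from an iterable of JSONL lines."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline in the file
        record = json.loads(line)
        for blocker in record.get("blockers", []):
            pairs.append((record["run_id"], blocker))
    return pairs


# Hypothetical records; a real ledger would append these lines to a file.
ledger = [
    to_ledger_line({"run_id": "run-1", "commands": ["make test"], "blockers": []}),
    to_ledger_line({"run_id": "run-2", "commands": [], "blockers": ["CI not run"]}),
]
print(open_blockers(ledger))
```

An append-only, line-per-run format keeps the ledger diffable and easy to inspect with ordinary text tools.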

Reviewer Path

If you have five minutes, start with the workflow rather than the full repo list:

Read repo-flightcheck, codex-review-packet, verify-by-change, and agent-run-ledger as the core reliability loop: readiness, review context, verification, and durable audit evidence.
Check agent-context-sentinel, agent-secret-sentinel, and mcp-guard for safety judgment around untrusted context, secrets, and tool permissions.
Use the Profile Evidence Map and the cross-repo examples (Agent Release Readiness Chain and Agent Review Packet to Ledger Chain) to see how the claims and tools compose.

Selected Work

Repo	What it proves	Why it matters
agent-request-brief	Raw request clarification	Audits raw coding-agent requests for objective, scope, acceptance criteria, constraints, context, verification, risks, and next actions before broad edits start.
agent-task-contract	Task scope readiness	Validates Markdown task briefs before coding-agent work starts, including stable acceptance-criteria IDs for traceable review packets, closeouts, ledgers, and profile proof.
agent-repo-map	Repository context mapping	Generates compact pre-run maps of docs, languages, entrypoints, commands, verification signals, CI, Git state, and risk paths before a coding-agent handoff.
agent-handoff-brief	Pre-run agent handoffs	Turns task contracts and repo context into compact coding-agent briefs with required reading, commands, verification, risk paths, gaps, and a ready-to-use prompt.
agent-continuation-brief	Long-run continuation handoffs	Audits continuation notes for original objective, current state, completed work, blockers, changed files, commands, next actions, risks, and next-agent instructions before another coding agent resumes.
agent-handoff-drift	Handoff state drift	Checks handoff notes against the live repo for missing files, stale HEAD or branch claims, false clean-tree claims, and weak command-success evidence before another agent continues.
agent-start-gate	Pre-run start gate	Gates coding-agent starts from a Markdown packet with objective, scope, inputs, traceable evidence pointers, worktree state, context screening, verification commands, and stop conditions before the first edit.
agent-context-sentinel	Context injection preflight	Audits untrusted context for prompt-injection language, hidden authority claims, secret exfiltration requests, dangerous commands, unattended sensitive actions, local paths, and missing source metadata before agent handoff.
agent-context-budget	Context budget planning	Estimates token pressure, detects oversized or duplicate context, and produces keep/summarize/drop plans before coding-agent handoffs.
agent-tool-schema-lint	Tool schema quality	Lints JSON tool definitions for strong descriptions, object schemas, required fields, extra-property control, parameter guidance, enum clarity, schema-aligned input examples, and safety language before agents can call them.
agent-tool-call-replay	Tool-call schema replay	Replays captured agent tool calls against current schemas to catch unknown tools, invalid JSON arguments, missing required fields, type mismatches, enum drift, extra arguments, and missing or duplicate stable call IDs before reruns or proof packets reuse them.
agent-output-contract	Structured output evidence	Validates coding-agent JSON outputs for schema version, outcome, score, issue lists, blocker consistency, structured check evidence, and obvious secret or local-path leakage before CI, ledgers, or review packets trust them.
agent-evidence-chain	Cross-artifact proof consistency	Validates that multiple JSON evidence artifacts from one coding-agent run share task, run, repo, and commit identity before review packets, ledgers, or closeouts treat them as one proof chain.
agent-source-grounding	Source-grounded agent claims	Audits agent-written Markdown and JSON artifacts for explicit sources, concrete evidence pointers, claim grounding, placeholder citations, and optional HTTP link checks before public proof or decisions reuse them.
agent-acceptance-trace	Acceptance criteria traceability	Turns task acceptance criteria, diffs, and closeout evidence into a criterion-by-criterion matrix with covered, partial, or missing status before a final answer is accepted.
agent-scope-guard	Scope boundary enforcement	Fails coding-agent diffs when changed paths fall outside an explicit file or glob allowlist, with text and JSON output, tests, CI, repo readiness proof, and validated proof-packet evidence.
agent-worktree-guard	Dirty worktree protection	Snapshots pre-existing user edits before a coding-agent run, emits reusable snapshot hashes, and blocks tampered baselines, protected-file drift, or unexpected dirty paths outside the task allowlist.
agent-instruction-audit	Agent instruction readiness	Audits `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, `CURSOR.md`, and `.cursorrules` for actionable scope, constraints, verification, safety, closeout guidance, and risky commands.
agent-decision-guard	Decision documentation gate	Blocks decision-worthy diffs when CI, automation, config, security, product scope, or agent-instruction changes lack `DECISIONS.md`, `TODO.md`, or an explicit no-follow-up waiver.
agent-diff-budget	Diff size and risk budget	Fails broad coding-agent diffs when changed files, line volume, or high-risk files exceed explicit review budgets, with path-level risk tags, reviewer questions, and validated proof-packet evidence.
agent-diff-splitter	Oversized diff split planning	Turns broad coding-agent diffs into ordered review slices by security, data, release, automation, agent instructions, tests, application, and product/docs with files, line counts, rationale, reviewer questions, and validated proof-packet evidence.
agent-review-map	Review lane routing	Maps mixed coding-agent diffs into security, data, release, automation, agent-instruction, product/docs, tests, and application lanes with owners, handoff order, reviewer questions, and validated proof-packet evidence.
agent-review-finding-check	Review finding quality	Audits coding-agent review findings for severity, concrete file lines, impact, actionable fixes, vague language, diff membership, and validated proof-packet evidence.
agent-run-ledger	Agent audit trails	Records AI agent runs as JSONL; imports review packets with embedded task contracts, CI, published-HEAD proof, readiness, sensitive-change blockers, rendered verification-envelope metadata, task-contract metadata from JSON envelopes and readiness reports, repo readiness reports and contracts, Markdown or JSON-envelope verification plans, and GitHub Actions run evidence; gates unresolved evidence with strict doctor mode; and renders static review reports.
agent-tool-call-audit	Tool-call history review	Audits coding-agent tool-call logs for destructive commands, sensitive tool actions, repeated failures, secret markers, missing working directories, skipped safety hooks, and missing approval evidence.
agent-memory-audit	Agent memory hygiene	Audits local agent memory files for stale current-state claims, missing or weak concrete source evidence, public-action risk, local-path exposure, weak memory policy, and secret-material markers before context is reused.
agent-pr-brief	PR description quality	Audits pull-request descriptions against real diffs for required sections, unmentioned changed files, risky paths, weak verification evidence, vague language, and large-diff scope notes.
agent-plan-trace	Plan execution traceability	Audits coding-agent plans against diffs, command logs, and closeouts so completed steps stay tied to evidence and pending work cannot hide behind a confident final answer.
repo-flightcheck	Pre-agent readiness	Audits whether a repository is ready for Codex, Claude Code, and human reviewers, including agent-readiness contract output, structured task-contract metadata, optional task-contract validation, CI/local verification coverage, local tool availability, remote publication failure classification, published-HEAD readiness, Python unittest detection, Python and Node CLI entrypoint readiness, GitHub Action repos, and stale documented commands.
codex-review-packet	Review context quality	Packages diffs, repo rules, local context, task contracts, review lanes, sensitive-change checks, repo readiness reports or contracts, generated Markdown or JSON-envelope verification plans with task-contract summaries, published-HEAD proof, and GitHub Actions CI evidence into a sharper handoff for Codex or Claude Code.
verify-by-change	Evidence-based closeout	Suggests honest checks from committed diffs, working-tree changes, and generated review packets, with JSON envelope metadata, packet readiness context, task-contract metadata, CI-local command parity, GitHub Action/workflow guidance, secret-material and security-sensitive path checks, Python and Node CLI context, and a repo readiness contract.
diff-to-eval	Agent learning loops	Turns real unified diffs into reusable JSON evaluation cases with changed files, risk tags, suggested checks, expected outcomes, tests, CI, and repo readiness proof.
agent-eval-runner	Eval case scoring	Scores proof packets, closeouts, or review notes against saved `diff-to-eval` cases for changed files, suggested checks, risk tags, expected outcomes, and pass/fail thresholds.
agent-bug-repro	Reproducible bug handoffs	Audits bug reports for summary, repro steps, expected and actual behavior, environment, evidence, regression context, vague language, and error-log signals before a debugging agent starts guessing.
agent-retry-guard	Retry-loop control	Detects repeated failed commands, unchanged error signatures, consecutive blind retries, missing strategy shifts, and verified failed command receipts before another run wastes context.
agent-ci-failure-packet	Focused CI retries	Turns noisy CI logs or verified failed command receipts into Markdown or JSON packets with failing commands, error signals, referenced files, suggested checks, and a scoped next-agent prompt.
agent-release-note-check	Release note accuracy	Audits release notes against real diffs so maintainers do not miss breaking, security, dependency, CI, test, or code changes, or let unsupported verification claims through, before publishing a release.
agent-rollback-plan	Operational rollback review	Turns risky agent diffs into rollback packets with changed files, risk tags, rollback steps, post-rollback checks, and reviewer questions before changes ship.
agent-closeout-check	Evidence-backed final answers	Lints coding-agent closeouts for summaries, changed-file evidence, exact verification commands, residual risks, vague claims, tests, CI, and repo readiness proof.
agent-claim-check	Closeout claim verification	Checks coding-agent closeout claims against changed files, exact claimed commands, explicit command evidence, risky paths, and no-risk language before PR comments or proof packets reuse the final answer.
agent-command-receipt	Command evidence receipts	Creates and verifies hashed command-outcome receipts with evidence file hashes before closeouts, proof packets, or ledgers reuse test and verification claims.
agent-test-impact	Test-impact evidence	Maps coding-agent diffs to direct, partial, or missing test evidence for each changed source file before a broad test pass is treated as enough.
agent-change-risk	Review-gate routing	Classifies coding-agent diffs into risk tags and required gates for scope, secrets, runbook drift, CI failure packets, rollback, eval cases, closeout checks, and change-aware verification.
agent-dependency-guard	Dependency-surface review	Classifies dependency manifest, lockfile, package spec, and install-script changes in coding-agent diffs before tests or closeouts treat the change as safe.
agent-merge-readiness	Merge verdict gate	Turns diff risk, explicit check results, and closeout evidence into strict `ready`, `needs-review`, or `blocked` verdicts with non-ready exit codes for automation.
agent-proof-packet	Review proof packets	Packages coding-agent diffs, explicit checks, evidence files, risks, decisions, open questions, and missing evidence into Markdown or JSON proof packets.
agent-publish-queue	Publication queue audit	Audits local proof repos for branch, HEAD, dirty state, GitHub remote, optional public HTTP status, blockers, and next actions before profile promotion.
profile-proof-audit	Profile claim audit	Audits a GitHub profile README for required proof sections, Selected Work shape, Latest Proof items, relative links, optional public HTTP status, and unsupported claim language.
runbook-drift-check	Operational doc drift	Checks README, AGENTS.md, and runbooks for missing local links, stale path references, script command drift, optional bash syntax checks, tests, CI, and repo readiness proof.
briefboard-local	Product scoping taste	Turns messy kickoff notes into a structured build brief, flags missing essentials and weak handoff scope, and generates a Codex-ready prompt with no backend, importable examples, and CI-local checks.

These are small on purpose. I prefer tools a reviewer can clone, inspect, run, and challenge over larger demos with less operational signal.
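As one example of how small these checks can be, the scope-boundary idea from the table (fail a diff when changed paths fall outside an allowlist) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the agent-scope-guard implementation: the function name, the sample paths, and the use of `fnmatchcase` patterns are all hypothetical choices.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_paths, allowlist):
    """Return changed paths that match no allowlist pattern, sorted.

    Note: in fnmatch-style patterns, "*" also matches "/", so "src/*"
    covers nested paths such as "src/pkg/mod.py".
    """
    return sorted(
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowlist)
    )


# Hypothetical diff and allowlist.
changed = ["src/app.py", "README.md", ".github/workflows/ci.yml"]
violations = out_of_scope(changed, ["src/*", "README.md"])
exit_code = 1 if violations else 0  # non-zero lets CI fail the run
print(violations, exit_code)
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform.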

Agent Safety Layer

Repo	What it proves	Why it matters
agent-secret-sentinel	Secret leak preflight	Scans agent-generated diffs for likely private keys, provider tokens, suspicious assignments, and high-entropy values before commits, pull requests, or public examples.
agent-artifact-redactor	Artifact redaction before public proof	Redacts logs, transcripts, proof packets, and command artifacts for sensitive markers, then writes hash manifests that tie redacted copies to source artifacts before publication.
deploy-gate	Human authorization for AI-driven deploys	Blocks sensitive PRs until a named human approves the exact action with a signed receipt.
mcp-guard	Tool-call control for MCP agents	Enforces allow, block, or approval rules before dangerous MCP tool calls execute.
pp-cli	Local receipt verification	Verifies Permission Protocol receipts with local Ed25519 signature checks.
python-sdk	Approval workflow integration	Lets Python workflows request and verify authority receipts around sensitive actions.
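The allow, block, or approval decision described for MCP tool calls can be illustrated with a first-match rule table. This is a sketch only: the rule format, the tool names, and the fail-closed default are assumptions made for the example and are not taken from mcp-guard.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (tool-name pattern, action). First match wins.
RULES = [
    ("fs.read_*", "allow"),
    ("fs.delete_*", "block"),
    ("deploy.*", "approval"),
]


def decide(tool_name, rules, default="block"):
    """Return "allow", "block", or "approval" for a tool call.

    Unknown tools fall through to the default, which is "block" here so
    the guard fails closed (an assumption of this sketch).
    """
    for pattern, action in rules:
        if fnmatchcase(tool_name, pattern):
            return action
    return default


for name in ("fs.read_file", "deploy.production", "shell.exec"):
    print(name, "->", decide(name, RULES))
```

Evaluating the decision before the call executes is what keeps it a gate rather than an after-the-fact audit.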

How I Work With Codex

Start with a raw request audit, a real brief, explicit acceptance criteria, and the smallest useful scope. Preserve that state across long-running agent sessions, then check handoff notes against the live repo instead of asking the next run to infer what happened. See agent-request-brief, agent-task-contract, agent-handoff-brief, agent-continuation-brief, agent-handoff-drift, agent-acceptance-trace, briefboard-local, Agent Request Brief, Agent Task Contract Preflight, Stable Acceptance Criteria IDs, Agent Handoff Brief, Agent Continuation Brief, Agent Handoff Drift, Agent Acceptance Trace, Handoff Scope Warnings Before Codex, and Brief Readiness Before Codex.
Protect pre-existing user edits and enforce expected changed paths instead of trusting the agent to stay inside scope. See agent-worktree-guard, agent-scope-guard, Agent Worktree Guard, Hash-Backed Worktree Snapshots, Scope Guard for Agent Diffs, and Proof Packet Backed Scope Guards.
Keep broad agent diffs below an explicit review budget, then split oversized changes into reviewable slices instead of pushing one broad patch. See agent-diff-budget, agent-diff-splitter, Agent Diff Budget, Proof Packet Backed Diff Budgets, Agent Diff Splitter, and Proof Packet Backed Diff Splits.
Route mixed diffs to concrete reviewer lanes instead of accepting one flat handoff. See agent-review-map, Agent Review Map, and Proof Packet Backed Review Maps.
Map repo context, budget context bundles, screen untrusted context, lint tool schemas, replay captured tool calls against current schemas, validate structured outputs, check evidence-chain consistency, require source grounding, check readiness, and audit instruction quality before handing work to an agent. See agent-repo-map, agent-context-budget, agent-context-sentinel, agent-tool-schema-lint, agent-tool-call-replay, agent-output-contract, agent-evidence-chain, agent-source-grounding, repo-flightcheck, agent-instruction-audit, Agent Repo Map, Agent Context Budget, Agent Context Sentinel, Agent Tool Schema Lint, Schema-Backed Tool Examples, Agent Tool Call Replay, Stable Tool Call IDs, Agent Output Contract, Structured Output Checks, Agent Evidence Chain, Run Identity Evidence Chains, Agent Source Grounding, Concrete Grounding Pointers, Agent Instruction Audit, Structured Task Contract Readiness, Task Contract Readiness Before Agent Work, Remote Failure Classification Before Public Proof, Published HEAD Before Public Proof, Remote Readiness Before Public Proof, Agent Readiness Contract Output, Local Tool Availability Preflight, Python CLI Entrypoint Readiness, Node CLI Entrypoint Readiness, Python Unittest Readiness Check, GitHub Action Repo Readiness, and Documented Command Drift Check.
Keep operational docs tied to executable reality after agent or automation changes. See runbook-drift-check and Runbook Drift Check.
Record durable intent when diffs alter CI, automation, config, security, product scope, or future agent behavior. See agent-decision-guard and Agent Decision Guard.
Package repo-aware context so reviews can be stricter and more useful, audit the findings themselves, then check the PR description against the actual diff before posting it. See codex-review-packet, agent-review-finding-check, agent-pr-brief, Task Contracts in Review Packets, Task Contract Envelope Summary in Review Packets, Published HEAD Proof in Review Packets, Sensitive Change Checks in Review Packets, CI Evidence in Review Packets, Readiness Contract in Review Packets, Repo Readiness in Review Packets, Review Packet With Generated Verification, Generated Verification Envelopes in Review Packets, Verification Envelope in Review Packets, Review Map in Agent Packets, Proof Packet Backed Review Findings, Agent Review Finding Check, Agent PR Brief, and AI Repo Review Findings.
Route each diff to the gates it actually needs before review. See agent-change-risk, agent-dependency-guard, Change Risk Matrix for Agent Diffs, and Agent Dependency Guard.
Make merge readiness an explicit verdict, not a confident sentence. See agent-merge-readiness and Merge Readiness Gate for Agent Diffs.
Package final evidence into a compact review artifact, then redact public-facing artifacts before sharing. See agent-proof-packet, agent-artifact-redactor, Agent Proof Packet for Review, Agent Artifact Redactor, and Redacted Artifact Manifests.
Keep local proof repos, remotes, TODOs, and public profile claims in sync before promotion. See agent-publish-queue and Publish Queue for Local Agent Repos.
Audit profile claims before treating the public surface as ready. See profile-proof-audit and Profile Proof Audit.
Match verification to the actual change type, inspect test-impact evidence, and reuse real diffs as regression cases. See verify-by-change, agent-test-impact, diff-to-eval, agent-eval-runner, Agent Test Impact, Diff to Eval Case, Agent Eval Runner, Security-Sensitive Change Verification, Review Packet Readiness to Verification Envelope, Task Contract Metadata in Verification Envelopes, Verification by Change Type, Python CLI Change Verification, Node CLI Change Verification, GitHub Action Change Verification, JSON Envelope for Verification Gates, Review Packet to Verification Checklist, and Repo Readiness Contract for Agent Repos.
Check the final answer for exact evidence, then compare its claims, completed plan items, actual diff, and hashed command receipts before accepting the handoff. See agent-plan-trace, agent-closeout-check, agent-claim-check, agent-command-receipt, Agent Plan Trace, Closeout Evidence Check for Agents, Agent Claim Check, and Agent Command Receipt.
Turn vague bugs, retry loops, failed CI, release-note drift, and rollback risk into compact packets before asking another agent to continue or publishing maintainer-facing output. See agent-bug-repro, agent-retry-guard, agent-ci-failure-packet, agent-release-note-check, agent-rollback-plan, Agent Bug Repro, Agent Retry Guard, Receipt-Backed Retry Guard, CI Failure Packet for Agent Reruns, Receipt-Backed CI Failure Packets, Agent Release Note Check, Proof Packet Backed Release Notes, and Rollback Plan for Agent Diffs.
Leave an audit trail for non-trivial agent runs, including tool-call history, approval evidence, and memory hygiene when context is reused. See agent-run-ledger, agent-tool-call-audit, agent-memory-audit, Agent Tool Call Audit, Approval-Backed Tool Call Audits, Agent Memory Audit, Concrete Source Memory Audits, Readiness Task Contract to Ledger Evidence, Task Contract Envelope to Ledger Evidence, Task Contract Evidence to Ledger, Published HEAD Proof to Ledger, Sensitive Review Packet to Ledger Evidence, Rendered Verification Envelope to Ledger Evidence, Review Packet CI Evidence to Ledger, GitHub Actions Run Evidence to Ledger, Review Packet Readiness to Ledger Evidence, Verification Envelope Readiness to Ledger Evidence, Readiness Contract to Ledger Evidence, Review Packet Verification to Ledger, Review Packet to Ledger Evidence, Repo Readiness to Ledger Evidence, and Verification Envelope to Ledger Evidence.
Catch secret exposure before agent diffs become public artifacts, redact evidence packets before publication, tie redacted artifacts to manifests, then gate sensitive actions with explicit human authorization where execution risk is higher than review risk. See agent-secret-sentinel, agent-artifact-redactor, Agent Diff Secret Sentinel, Agent Artifact Redactor, Redacted Artifact Manifests, deploy-gate, and mcp-guard.
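Several steps above rely on hashed command receipts. A minimal sketch of the idea, assuming a receipt is just the command, its exit code, and a SHA-256 of the captured output, could look like this. The field names and functions are hypothetical and do not describe the agent-command-receipt format.

```python
import hashlib
import json


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def _digest(body):
    """Hash a canonical JSON encoding so key order cannot change the result."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return _sha256(canonical)


def make_receipt(command, exit_code, evidence_text):
    """Build a receipt that binds a command outcome to its evidence."""
    body = {
        "command": command,
        "exit_code": exit_code,
        "evidence_sha256": _sha256(evidence_text),
    }
    return {**body, "receipt_sha256": _digest(body)}


def verify_receipt(receipt, evidence_text):
    """Check that the evidence and the receipt fields still match their hashes."""
    body = {k: v for k, v in receipt.items() if k != "receipt_sha256"}
    evidence_ok = body.get("evidence_sha256") == _sha256(evidence_text)
    return evidence_ok and receipt.get("receipt_sha256") == _digest(body)


receipt = make_receipt("make test", 0, "OK\n")
print(verify_receipt(receipt, "OK\n"))
```

An unkeyed hash like this catches accidental drift between a claim and its evidence. It does not stop someone who rewrites the receipt and recomputes the hash; that would need a signature, as in the Ed25519 receipts mentioned in the safety table.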

Public Workbench

AI lab notes: build notes, decisions, and launch logs tied to real repos or workflows.
Recipes: reusable prompts, checklists, and implementation patterns that came from actual work.
Examples: concrete proof-packet shapes for verifying profile and agent-workbench claims.
Tooling radar: short research only when it changes a build or tooling decision.
Docs: public operating docs, including the automation runbook and profile strategy.

Verify This Repo

This profile repo has a small local check so maintenance changes are not just copy edits:

make test
make lint
make build

The check validates shell scripts, compiles local Python audit tools, runs Python unit tests, runs the commit-script shell fixture, executes the profile quality audit, regenerates public indexes, checks latest-proof freshness, refreshes latest-proof links, and fails if generated files drift.
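The "fails if generated files drift" step can be pictured as comparing freshly generated text with what is committed. The sketch below is an assumption about the general technique, not this repo's actual check; the function name and file paths are hypothetical.

```python
def drifted_paths(generated, on_disk):
    """Return sorted paths whose committed text is missing or differs.

    Both arguments map a relative path to file text: `generated` holds
    fresh generator output, `on_disk` holds what is currently committed.
    """
    return sorted(
        path for path, text in generated.items() if on_disk.get(path) != text
    )


# Hypothetical generated indexes compared with committed copies.
fresh = {"docs/index.md": "# Index\n", "docs/recipes.md": "# Recipes\n"}
committed = {"docs/index.md": "# Index\n", "docs/recipes.md": "# Old\n"}
stale = drifted_paths(fresh, committed)
print(stale, 1 if stale else 0)  # a non-zero status would fail the check
```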

Latest Proof

Latest lab note: 2026-06-07 - Publish Guard Coverage Stops at No-Op
Latest recipes:

Principles

Ship useful proof, not activity theater.
Optimize for reviewability: strong AI workflows should leave evidence.
Prefer own repos and working artifacts over meta commentary.
Keep claims honest: what exists, what was tested, and what is still limited.
Use the workbench as supporting evidence, not as a substitute for real projects.
