[Ready for Review] Adapter | Condense tutorial for human, create agent version separately #21
crystalxyz wants to merge 10 commits into harbor-framework:main from
Ternura143 left a comment
LGTM! I only have a small issue that needs to be addressed.
| `split` | string | yes | Split name matching original. Use `"full"` if adapter works for all splits collectively. If different splits are registered/validated in different ways, split them out separately. |
| `adapted_benchmark_size` | int | yes | Number of tasks the adapter can convert. May differ from original if tasks were excluded for sufficient reasons documented in the README. |
| `parity_benchmark_size` | int | yes | Number of tasks used for parity. Equals `adapted_benchmark_size` if full set. |
| `parity_sampling_rate` | float | yes | `adapted_benchmark_size / parity_benchmark_size` |
The `parity_sampling_rate` formula is inverted in the schema: it should be `parity_benchmark_size / adapted_benchmark_size`, not `adapted_benchmark_size / parity_benchmark_size`.
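To make the corrected direction concrete, here is a minimal Python sketch of the ratio as described in this comment. The function name, the validation rules, and the sizes are illustrative assumptions, not part of the Harbor schema; in particular, the check that the parity set is no larger than the adapted set is inferred from the field descriptions above.

```python
def parity_sampling_rate(parity_benchmark_size: int, adapted_benchmark_size: int) -> float:
    """Fraction of the adapted tasks that were used for parity runs."""
    if adapted_benchmark_size <= 0:
        raise ValueError("adapted_benchmark_size must be positive")
    # Assumption: the parity set is a subset of the adapted set.
    if not 0 <= parity_benchmark_size <= adapted_benchmark_size:
        raise ValueError("parity_benchmark_size must be between 0 and adapted_benchmark_size")
    return parity_benchmark_size / adapted_benchmark_size


# Hypothetical sizes, chosen only for illustration.
full_rate = parity_sampling_rate(500, 500)    # full set used for parity
subset_rate = parity_sampling_rate(100, 500)  # 100 of 500 tasks sampled
```

With this direction the rate stays in the range 0 to 1 and equals 1.0 when the full set is used, which matches the description of `parity_benchmark_size`.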
### Step 3: Verify Oracle Solutions

Run the oracle agent on your entire dataset and confirm **100% reward on all tasks**.
For cases where the original benchmark has broken oracles, maybe we can advise the agent to document the tasks with oracle issues and file bugs against the upstream benchmark instead of attempting to fix them on the Harbor side.
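As a rough illustration of what "document the tasks with oracle issues" could look like, the sketch below takes per-task oracle rewards and lists the tasks that fall short of full reward. The function name, the reward dictionary, and the task IDs are hypothetical; this is not a Harbor API, and how rewards are actually collected is left out.

```python
def tasks_with_oracle_issues(oracle_rewards: dict[str, float], expected: float = 1.0) -> list[str]:
    """Return the IDs of tasks whose oracle run did not reach the expected reward."""
    return sorted(task_id for task_id, reward in oracle_rewards.items() if reward < expected)


# Hypothetical oracle results, for illustration only.
oracle_rewards = {"task-a": 1.0, "task-b": 0.0, "task-c": 1.0}
broken = tasks_with_oracle_issues(oracle_rewards)

# One line per affected task, suitable for pasting into a README or an upstream bug report.
notes = [f"{task_id}: oracle reward {oracle_rewards[task_id]} (expected 1.0)" for task_id in broken]
```

The resulting list can go into the adapter README alongside a link to the upstream issue, rather than patching the oracle on the Harbor side.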
6a77ccd to 99eb1f1
|----------|-----------|-----------------|---------|
| **A: Compatible agents exist** | Original benchmark supports Harbor-compatible agents (OpenHands, Codex, Claude-Code, Gemini-CLI) | None — run parity with identical settings on both sides | [ADEBench](https://github.com/harbor-framework/harbor/tree/main/adapters/adebench) — original benchmark already supports Claude Code |
| **B: LLM-based, no compatible agents** | Original benchmark is LLM-based but lacks Harbor agents | Fork the original repo, implement Harbor-compatible agents, document in fork's README | [EvoEval](https://github.com/harbor-framework/harbor/tree/main/adapters/evoeval) — forked repo to add codex agent support |
| **C: Custom agents** | Original benchmark uses custom agents unavailable in Harbor | Implement custom agent in `adapters/<name>/`. Also run with standard agents (Codex, Claude-Code) to show generalization | [MedAgentBench](https://github.com/harbor-framework/harbor/tree/main/adapters/medagentbench) — implements custom HTTPAgent matching original GET/POST/FINISH semantics |
Maybe we can add after the MedAgentBench example: "For custom agents, ensure the evaluation/scoring logic lives in the verifier, not inside the custom agent."
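A minimal sketch of that separation, under the assumption that the agent only produces an answer and a separate verifier turns it into a reward. All names here (`AgentResult`, `run_custom_agent`, `verify`) are hypothetical and do not correspond to Harbor or MedAgentBench interfaces.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    final_answer: str  # what the agent produced; no score is attached here


def run_custom_agent(task_input: str) -> AgentResult:
    # Placeholder agent: it produces an answer and never grades itself.
    return AgentResult(final_answer=task_input.strip().upper())


def verify(result: AgentResult, expected_answer: str) -> float:
    # Scoring lives in the verifier, so every agent is graded by the same logic.
    return 1.0 if result.final_answer == expected_answer else 0.0


reward = verify(run_custom_agent(" finish "), expected_answer="FINISH")
```

Keeping the scoring out of the agent means the same verifier can grade standard agents (Codex, Claude-Code) and the custom agent alike, which is what makes the generalization runs comparable.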
Your task instruction here...
Multiple lines...
author_email: example@email.com
author_name: Author Name
I'm not sure where the formatting instructions for `task.toml` should go (like those in Harbor PR 1289). For example, `author_name` in `task.toml` must credit the original benchmark authors (matching the Citation BibTeX), not the adapter builder.
Summary
This PR was motivated by a prior Harbor meeting discussion noting that the current Harbor adapter tutorial is long and hard for human readers to follow. To lower the barrier to entry and make it easier for people to contribute, we decided to condense the adapter tutorial for human readers so that they can easily keep track of their progress. A separate agent-oriented tutorial is also created so that contributors can easily use agents to build adapters.
NOTE: New structure updates (Mar 28, 2026)
/doc/datasets/adapter-human (Title: Adapters (Human Guide)) -> Human version tutorial
/doc/datasets/adapter (Title: Adapters (Agent Guide)) -> Agent version tutorial
Design details
The /adapter URL is reserved for agents and carries the most comprehensive information, with a callout box that redirects human readers to the more concise page. This way, all adapter readers will be aware of the agent page and can use it to build their adapter.
The adapters.mdx file hosts the agent guide, because this makes the git history and changelogs easier to follow.
The adapters-human.mdx file hosts the human guide; it is created from scratch to present a concise version of the tutorial steps.