[Ready for Review] Adapter | Condense tutorial for human, create agent version separately by crystalxyz · Pull Request #21 · harbor-framework/harbor-docs

crystalxyz · 2026-03-25T05:31:33Z

Summary

This PR was motivated by a prior Harbor meeting discussion, where it was noted that the current Harbor adapter tutorial is too long and hard for human readers to follow. To lower the barrier to entry and make contributing easier, we decided to condense the adapter tutorial for human readers so that they can easily keep track of their progress. A separate agent-oriented tutorial is also created so that contributors can easily use agents to build adapters.

NOTE: New structure updates (Mar 28, 2026)

/doc/datasets/adapter-human (Title: Adapters (Human Guide)) -> Human version tutorial
/doc/datasets/adapter (Title: Adapters (Agent Guide)) -> Agent version tutorial
A block added at the beginning of the human tutorial to point agents to the other page

Design details

The /adapter URL is reserved for agents and carries the most comprehensive information, with a callout box that redirects human readers to the more concise page. This way, all adapter readers will be aware of the agent page and can use it to build their adapters.
The adapters.mdx file hosts the agent guide, because this keeps git history and changelogs easier to follow.
The adapters-human.mdx file hosts the human guide; it is created from scratch to present a condensed version of the tutorial steps.

Ternura143

LGTM! I only have a small issue that needs to be addressed.

Ternura143 · 2026-03-25T12:23:52Z

+| `split` | string | yes | Split name matching original. Use `"full"` if adapter works for all splits collectively. If different splits are registered/validated in different ways, split them out separately. |
+| `adapted_benchmark_size` | int | yes | Number of tasks the adapter can convert. May differ from original if tasks were excluded for sufficient reasons documented in the README. |
+| `parity_benchmark_size` | int | yes | Number of tasks used for parity. Equals `adapted_benchmark_size` if full set. |
+| `parity_sampling_rate` | float | yes | `adapted_benchmark_size / parity_benchmark_size` |


parity_sampling_rate formula is inverted in the schema — should be parity_benchmark_size / adapted_benchmark_size, not adapted_benchmark_size / parity_benchmark_size.
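
To make the corrected direction concrete, here is a minimal sketch of the computation, assuming the rate is meant to be the fraction of adapted tasks used for parity. The function name and the validation rules are illustrative and not part of the Harbor schema; only the two field names come from the table above.

```python
def parity_sampling_rate(parity_benchmark_size: int, adapted_benchmark_size: int) -> float:
    """Fraction of adapted tasks that were used for the parity run.

    Hypothetical helper for illustration; not a Harbor API.
    """
    if adapted_benchmark_size <= 0:
        raise ValueError("adapted_benchmark_size must be positive")
    if not 0 <= parity_benchmark_size <= adapted_benchmark_size:
        raise ValueError("parity_benchmark_size must be between 0 and adapted_benchmark_size")
    # Corrected orientation: parity subset over adapted set, so the result is at most 1.0.
    return parity_benchmark_size / adapted_benchmark_size


print(parity_sampling_rate(50, 200))   # 0.25: a quarter of the adapted tasks were sampled
print(parity_sampling_rate(200, 200))  # 1.0: the full adapted set was used for parity
```

With the inverted formula, the first case would yield 4.0, which cannot be read as a sampling rate.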

AlienKevin · 2026-03-26T04:40:24Z

+
+### Step 3: Verify Oracle Solutions
+
+Run the oracle agent on your entire dataset and confirm **100% reward on all tasks**.


For cases where the original benchmark has broken oracles, maybe we can advise the agent to document the tasks with oracle issues and file bugs against the upstream benchmark instead of attempting to fix them on the Harbor side.
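
As a rough illustration of that suggestion, the sketch below lists the tasks whose oracle run fell short of full reward so they can be documented and reported upstream. It assumes the rewards have already been collected into a plain mapping from task ID to reward; that mapping, the function name, and the task IDs are hypothetical and do not reflect Harbor's actual result format.

```python
def find_oracle_failures(rewards: dict[str, float], expected: float = 1.0) -> list[str]:
    """Return the IDs of tasks whose oracle reward is below the expected value.

    Hypothetical helper: `rewards` maps task ID -> oracle reward.
    """
    return sorted(task_id for task_id, reward in rewards.items() if reward < expected)


# Made-up rewards for three tasks, for illustration only.
oracle_rewards = {"task-001": 1.0, "task-002": 0.0, "task-003": 1.0}

for task_id in find_oracle_failures(oracle_rewards):
    # These are the tasks to document in the README and report upstream.
    print(f"oracle did not reach full reward: {task_id}")
```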

Ternura143

LGTM! Two small issues.

Ternura143 · 2026-04-15T15:35:25Z

+|----------|-----------|-----------------|---------|
+| **A: Compatible agents exist** | Original benchmark supports Harbor-compatible agents (OpenHands, Codex, Claude-Code, Gemini-CLI) | None — run parity with identical settings on both sides | [ADEBench](https://github.com/harbor-framework/harbor/tree/main/adapters/adebench) — original benchmark already supports Claude Code |
+| **B: LLM-based, no compatible agents** | Original benchmark is LLM-based but lacks Harbor agents | Fork the original repo, implement Harbor-compatible agents, document in fork's README | [EvoEval](https://github.com/harbor-framework/harbor/tree/main/adapters/evoeval) — forked repo to add codex agent support |
+| **C: Custom agents** | Original benchmark uses custom agents unavailable in Harbor | Implement custom agent in `adapters/<name>/`. Also run with standard agents (Codex, Claude-Code) to show generalization | [MedAgentBench](https://github.com/harbor-framework/harbor/tree/main/adapters/medagentbench) — implements custom HTTPAgent matching original GET/POST/FINISH semantics |


Maybe we can add after the MedAgentBench example: "For custom agents, ensure the evaluation/scoring logic lives in the verifier, not inside the custom agent."

Ternura143 · 2026-04-15T15:43:20Z

  Your task instruction here...
-  Multiple lines...
 author_email: example@email.com
 author_name: Author Name


I'm not quite sure where to add the formatting instructions for task.toml, like those in harbor PR 1289. For example, task.toml's author_name must credit the original benchmark authors (matching the Citation BibTeX), not the adapter builder.

crystalxyz changed the title ~~Condense adapter tutorial for human readers, create agent version separately~~ [Ready for Review] Adapter | Condense tutorial for human, create agent version separately Mar 25, 2026

Ternura143 suggested changes Mar 25, 2026


AlienKevin reviewed Mar 26, 2026


Create human-readable and agent-readable tutorials separately

99eb1f1

crystalxyz force-pushed the adapter-tutorial-fix branch from 6a77ccd to 99eb1f1 on March 28, 2026 05:40

crystalxyz added 8 commits March 28, 2026 01:48

Add back conflicts from registry updates

9f797be

Update formats

97db1c1

fix registry command

3634803

Address comments, update adapter structure, add examples and

a8e7087

Remove broken urls

33b3184

More changes

8e5416e

Update urls

c13ec71

Fix a char

965104c

Ternura143 mentioned this pull request Apr 2, 2026

[Ready for Review] fix: align review bot and adapter template harbor-framework/harbor#1346

Merged

update registry dataset instructions

ca163eb

Ternura143 suggested changes Apr 15, 2026


crystalxyz wants to merge 10 commits into harbor-framework:main from crystalxyz:adapter-tutorial-fix


4 participants

