Skip to content

Add RLVR adapter and verifier environment#30

Open
yannickvilleneuve-hash wants to merge 1 commit into
allenai:mainfrom
yannickvilleneuve-hash:ifbench-rlvr-adapter
Open

Add RLVR adapter and verifier environment#30
yannickvilleneuve-hash wants to merge 1 commit into
allenai:mainfrom
yannickvilleneuve-hash:ifbench-rlvr-adapter

Conversation

@yannickvilleneuve-hash

Copy link
Copy Markdown

Summary

  • Add rlvr_adapter.py to export IFBench prompts as chat-message JSONL tasks and score response JSONL files with scalar instruction-following rewards.
  • Add ifbench_verifiers.py, an optional Prime Intellect Verifiers environment that reuses the existing IFBench checkers as rollout rewards.
  • Make evaluation response lookup tolerant of harmless prompt whitespace drift and avoid mutating input kwargs during strict scoring.
  • Document the RLVR export, scoring, and Verifiers loading flow.

Motivation

This is intended to make IFBench directly usable for RLVR-style training/evaluation workflows while keeping the existing benchmark verifiers as the source of truth.

Validation

uv run python -m pytest rlvr_adapter_test.py -q
uv run python -m py_compile evaluation_lib.py rlvr_adapter.py ifbench_verifiers.py
uv run python -m run_eval --input_data=data/IFBench_test.jsonl --input_response_data=data/sample_output.jsonl --output_dir=/tmp/ifbench-eval-check
uv run --extra rlvr python - <<'PY'
from ifbench_verifiers import IFBenchEnvConfig, IFBenchTasksetConfig, load_environment
config = IFBenchEnvConfig(taskset=IFBenchTasksetConfig(limit=2, eval_limit=1))
env = load_environment(config)
print(len(env.taskset.get_dataset()))
print(len(env.taskset.get_eval_dataset()))
PY

Also verified the Verifiers reward path returns 1.0 for a positive single-example completion.

Bounty reference: https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant