Summary
This PR adds a small, self-contained Prime Verifiers environment for IFBench instruction-following RLVR work.
It includes:
- `environments/ifbench_rlvr/` with a packaged `load_environment()` entrypoint for `prime eval run` / Environment Hub usage
- normalization for both released IFBench eval rows (`prompt`, `instruction_id_list`, `kwargs`) and IF-RLVR training rows from `allenai/IF_multi_constraints_upto5` (`messages`, `ground_truth`)
- reward functions that reuse IFBench verifier classes, falling back to the `allenai/open-instruct` IF-RLVR verifier registry for training-only constraint IDs
- README quickstart and environment arguments
- focused tests for dataset normalization, JSONL loading, and fractional verifier reward scoring
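To make the normalization item concrete, here is a minimal sketch of mapping both row schemas onto one shape. It is not the code in this PR: `normalize_row` is a hypothetical name, and the layout of `ground_truth` (a JSON string holding groups with parallel `instruction_id` and `kwargs` lists) is an assumption that should be checked against the actual dataset.

```python
import json
from typing import Any


def normalize_row(row: dict[str, Any]) -> dict[str, Any]:
    """Map either row schema onto: prompt, instruction_id_list, kwargs."""
    if "instruction_id_list" in row:
        # Eval-style row: fields are already in the target shape.
        return {
            "prompt": row["prompt"],
            "instruction_id_list": list(row["instruction_id_list"]),
            "kwargs": [dict(kw or {}) for kw in row["kwargs"]],
        }

    # Training-style row: use the first user message as the prompt.
    prompt = next(m["content"] for m in row["messages"] if m["role"] == "user")

    # ASSUMPTION: ground_truth is a JSON string (or an already-parsed list)
    # of groups shaped like {"instruction_id": [...], "kwargs": [...]}.
    ground_truth = row["ground_truth"]
    if isinstance(ground_truth, str):
        ground_truth = json.loads(ground_truth)

    instruction_ids: list[str] = []
    kwargs: list[dict[str, Any]] = []
    for group in ground_truth:
        for instruction_id, kw in zip(group["instruction_id"], group["kwargs"]):
            instruction_ids.append(instruction_id)
            kwargs.append(dict(kw or {}))  # treat a null kwargs entry as empty
    return {"prompt": prompt, "instruction_id_list": instruction_ids, "kwargs": kwargs}
```

Normalizing to the eval-row shape means the reward code only has to handle one schema, whichever dataset a row came from.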
Why
The README points users to the released IF-RLVR training data and Open Instruct verifier code, but this repo does not yet include a train/eval-ready Prime `verifiers` environment. This PR gives downstream users a reproducible environment wrapper while keeping the core benchmark code unchanged.
Validation
- `PYTHONDONTWRITEBYTECODE=1 uv run pytest` -> 63 passed, 1 warning
- `uv build environments/ifbench_rlvr` -> wheel and source distribution built successfully
- `PYTHONDONTWRITEBYTECODE=1 uv run --project environments/ifbench_rlvr --python 3.12 python ...` -> installed the packaged environment, resolved `last_word:last_word_answer` through `open_instruct.IFEvalG`, constructed `verifiers.envs.singleturn_env.SingleTurnEnv`, and scored a demo response as `reward_ok=1.0`, `reward_bad=0.0`
- Dataset registry smoke check on `allenai/IF_multi_constraints_upto5[:20]` and `allenai/IFBench_test[:20]` -> `missing=[]` for both
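The fallback lookup and fractional scoring exercised by the demo run can be sketched as follows. This is illustrative only: the two registries and the toy checkers are hypothetical plain functions, not the IFBench or `open_instruct.IFEvalG` verifier classes, and `count:min_words` is an invented ID.

```python
from typing import Any, Callable

# Hypothetical stand-in: a checker takes (response, **kwargs) and returns bool.
Checker = Callable[..., bool]


def last_word_answer(response: str, last_word: str) -> bool:
    """Toy checker: the response must end with the given word."""
    words = response.strip().rstrip(".!?").split()
    return bool(words) and words[-1].lower() == last_word.lower()


def min_words(response: str, n: int) -> bool:
    """Toy checker: the response must contain at least n words."""
    return len(response.split()) >= n


# ASSUMPTION: each registry maps a constraint ID to a checker.
PRIMARY_REGISTRY: dict[str, Checker] = {"count:min_words": min_words}
FALLBACK_REGISTRY: dict[str, Checker] = {"last_word:last_word_answer": last_word_answer}


def resolve(instruction_id: str) -> Checker:
    """Look in the primary registry first, then the training-only fallback."""
    if instruction_id in PRIMARY_REGISTRY:
        return PRIMARY_REGISTRY[instruction_id]
    if instruction_id in FALLBACK_REGISTRY:
        return FALLBACK_REGISTRY[instruction_id]
    raise KeyError(f"no verifier registered for {instruction_id!r}")


def fractional_reward(
    response: str, instruction_ids: list[str], kwargs_list: list[dict[str, Any]]
) -> float:
    """Return the fraction of constraints the response satisfies."""
    if not instruction_ids:
        return 0.0
    passed = sum(
        bool(resolve(instruction_id)(response, **kw))
        for instruction_id, kw in zip(instruction_ids, kwargs_list)
    )
    return passed / len(instruction_ids)
```

Under these toy checkers, a response that satisfies its only constraint scores 1.0 and one that fails it scores 0.0. With several constraints the reward is the share satisfied.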
Submitted for the public Prime Intellect Algora IF-RLVR/Bench bounty: https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4
Current verification on this branch:
- `PYTHONDONTWRITEBYTECODE=1 uv run pytest` -> 63 passed, 1 warning
- `uv build environments/ifbench_rlvr` -> wheel and sdist built successfully
The implementation is intentionally scoped as a standalone `verifiers` environment wrapper around the existing IFBench/Open Instruct verifier logic, so the core benchmark code stays unchanged.
Follow-up verification update: I found and fixed a packaging/runtime issue while doing an end-to-end environment install. The environment now pins open-instruct to a packaged revision that includes open_instruct.IFEvalG, and constrains Python to the dependency-supported >=3.12,<3.13 range.
Additional validation after commit 3d9d406:
- `PYTHONDONTWRITEBYTECODE=1 uv run pytest` -> 63 passed, 1 warning
- `uv build environments/ifbench_rlvr` -> wheel and sdist built successfully
- `PYTHONDONTWRITEBYTECODE=1 uv run --project environments/ifbench_rlvr --python 3.12 python ...` -> installed the packaged environment, resolved `last_word:last_word_answer` via `open_instruct.IFEvalG`, constructed `verifiers.envs.singleturn_env.SingleTurnEnv`, and scored the demo response as `reward_ok=1.0`, `reward_bad=0.0`
- Dataset registry smoke check on `allenai/IF_multi_constraints_upto5[:20]` and `allenai/IFBench_test[:20]` -> `missing=[]` for both
This should make the submitted environment reproducible from the package metadata rather than relying on whatever open-instruct HEAD happens to install.
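For readers who want to repeat the dataset registry smoke check, one possible shape is sketched below. `missing_instruction_ids` is a hypothetical helper, not part of this PR. It assumes rows are already normalized to carry an `instruction_id_list` and that each registry is a mapping keyed by constraint ID.

```python
from typing import Any, Iterable, Mapping


def missing_instruction_ids(
    rows: Iterable[Mapping[str, Any]], *registries: Mapping[str, Any]
) -> list[str]:
    """Return constraint IDs used by the rows but absent from every registry."""
    # Iterating a mapping yields its keys, so this collects all known IDs.
    known = set().union(*registries)
    seen = {
        instruction_id
        for row in rows
        for instruction_id in row["instruction_id_list"]
    }
    return sorted(seen - known)
```

An empty list corresponds to the `missing=[]` result reported above: every constraint ID in the sampled rows resolves in at least one registry.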
This PR was prepared with AI assistance for the public Prime Intellect Algora IF-RLVR/Bench bounty: https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4. I have reviewed and tested the changes locally.
/claim https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4