Add RLVR reward wrapper for IFBench verifiers #29

Open
Catnap7 wants to merge 1 commit into allenai:main from Catnap7:codex/ifbench-rlvr-reward-env
@Catnap7 Catnap7 commented May 22, 2026

Summary

Algora bounty: https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4

This PR adds a small train-ready RLVR wrapper around the existing IFBench verifier logic.

It provides:

  • rlvr_env.score_response(...) for single prompt/completion reward scoring
  • strict and loose evaluation modes matching the benchmark verifier behavior
  • binary "all" rewards and dense, partial-credit "fraction" rewards
  • python -m rlvr_env ... to convert prompt/response JSONL files into reward-labeled JSONL
  • README usage examples for RLVR loops
  • UTF-8 file handling in evaluation_lib.py, which avoids Windows locale decode errors on the included JSONL data
  • focused tests for scalar rewards, per-instruction diagnostics, and JSONL output
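The two reward shapes listed above can be illustrated with a minimal sketch. It assumes each instruction attached to a prompt yields one pass/fail flag, that "all" means every instruction must pass, and that "fraction" means the share of instructions passed. The function name aggregate_reward and its signature are hypothetical and are not the API added by this PR.

```python
from typing import Sequence


def aggregate_reward(followed: Sequence[bool], mode: str = "all") -> float:
    """Collapse per-instruction pass/fail flags into one scalar reward.

    Hypothetical helper for illustration only; not part of rlvr_env.
    """
    if not followed:
        raise ValueError("need at least one instruction result")
    if mode == "all":
        # Binary reward: 1.0 only when every instruction was followed.
        return 1.0 if all(followed) else 0.0
    if mode == "fraction":
        # Dense reward: share of instructions that were followed.
        return sum(followed) / len(followed)
    raise ValueError(f"unknown reward mode: {mode!r}")


# Example: a response that satisfies two of its three instructions.
results = [True, True, False]
binary = aggregate_reward(results, "all")
dense = aggregate_reward(results, "fraction")
```

Under these assumptions the binary mode gives no credit for a partially compliant response, while the dense mode still provides a learning signal, which is the usual reason to offer both in an RL loop.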

This is intended as a lightweight adapter for IF-RLVR style training/evaluation loops without changing the underlying instruction verifier implementations.
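As a rough sketch of the adapter pattern described here, the snippet below labels a prompt/response JSONL file with rewards while opening files with an explicit UTF-8 encoding, so the result does not depend on the platform's locale default. The label_jsonl function, the "prompt", "response", and "reward" field names, and the placeholder scorer are all assumptions made for illustration; the actual CLI and schema in this PR may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def label_jsonl(src: Path, dst: Path, scorer: Callable[[str, str], float]) -> int:
    """Copy each JSONL record from src to dst with an added "reward" field.

    Hypothetical helper: field names and signature are assumptions.
    """
    count = 0
    # Explicit encoding avoids locale-dependent decoding (e.g. cp1252 on Windows).
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            record["reward"] = scorer(record["prompt"], record["response"])
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo with a placeholder scorer; a real loop would call the verifier wrapper here.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "responses.jsonl"
    dst = Path(tmp) / "rewards.jsonl"
    row = {"prompt": "Réponds en français.", "response": "Bonjour"}
    src.write_text(json.dumps(row, ensure_ascii=False) + "\n", encoding="utf-8")
    n_labeled = label_jsonl(src, dst, lambda prompt, response: 1.0 if response else 0.0)
    labeled = [json.loads(l) for l in dst.read_text(encoding="utf-8").splitlines()]
```

Passing the scorer in as a callable keeps the file handling separate from the verifier logic, which matches the stated goal of leaving the underlying verifiers unchanged.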

Verification

Ran locally on Windows with Python 3.12:

.\.venv\Scripts\python -m pytest instructions_test.py tests/test_rlvr_env.py
62 passed, 1 warning

The warning is from the existing syllapy dependency importing pkg_resources; local verification used setuptools<81 to keep that dependency path working.

Catnap7 commented May 22, 2026

Submitting this PR for the Prime Intellect IF-RLVR/Bench bounty:
https://algora.io/PrimeIntellect-ai/bounties/dderbjHtPwTiGVY4

The change adds a train-ready RLVR reward wrapper around IFBench's existing verifiers, with a reproducible CLI and tests. Happy to adjust scope if the expected deliverable differs.

Catnap7 force-pushed the codex/ifbench-rlvr-reward-env branch from 264a514 to c3ab2d1 on May 22, 2026 18:34