Knowledge-to-Verification: Unlocking Reinforcement Learning with Verifiable Rewards for LLMs in Knowledge-Intensive Domains
K2V (Knowledge-to-Verification) is a framework that extends RLVR (Reinforcement Learning with Verifiable Rewards) to knowledge-intensive domains and enables verification of the model's reasoning process without any human supervision.
- Clone the repository
    git clone --recurse-submodules https://github.com/superfarther/K2V.git
    cd K2V
- Install the dependencies of graphgen-mask according to its README, and then synthesize the fill-blank style QA pairs.
    cd graphgen-mask
    vim README.md
- Synthesize a question-specific checklist for each QA pair.
    cd utils
    vim README.md
- Install the dependencies of verl according to its README, and then start training.
    cd verl
    vim README.md
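
The clone step above relies on `--recurse-submodules`, so a clone made without that flag may leave some directories empty or missing. The following is a minimal sketch of a sanity check, not part of the repository: the helper name `check_k2v_layout` is hypothetical, and it assumes that `graphgen-mask`, `utils`, and `verl` all sit directly under the repository root, which the steps above suggest but do not state.

```shell
# Hypothetical helper (not shipped with K2V): report which of the
# directories named in the steps above are absent under a given root.
# Assumption: all three directories live directly under the repository root.
check_k2v_layout() {
    root="${1:-.}"
    missing=0
    for dir in graphgen-mask utils verl; do
        if [ ! -d "$root/$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

Run `check_k2v_layout .` from inside `K2V`. If a directory is reported missing and it is a Git submodule, `git submodule update --init --recursive` from the repository root is the standard way to fetch it after the fact.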