This repo contains the framework used by K2V to synthesize fill-in-the-blank QA pairs. We built this framework on top of GraphGen.
We recommend starting from a fresh conda environment.
conda create --name graphgen-mask python=3.10 -y
conda activate graphgen-mask
Install the necessary dependencies.
git clone https://github.com/superfarther/graphgen-mask.git
cd graphgen-mask
pip install -r requirements_K2V.txt
Note: This repo is significantly outdated compared to the official GraphGen. To use the latest code for synthesizing QA pairs, visit the official GraphGen repository and navigate to examples/generate/generate_masked_fill_in_blank_qa.
To construct a knowledge graph (KG) from the corpus, K2V deploys an LLM with vLLM to perform named entity recognition (NER) and relation extraction (RE).
vllm serve Qwen/Qwen2.5-72B-Instruct --max_model_len 32768
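Loading a 72B model may take a while, so it can help to wait until the server actually answers before moving on. The sketch below assumes vLLM's default address (http://localhost:8000, with the OpenAI-compatible routes under /v1) and that curl is installed; wait_for_vllm is a hypothetical helper, not part of this repo.

```bash
# Hypothetical helper (not part of this repo): poll the vLLM server until its
# OpenAI-compatible /models route responds, or give up after a timeout.
# Assumes curl is installed; the default URL is vLLM's default address.
wait_for_vllm() {
  local base_url="${1:-http://localhost:8000/v1}"  # base URL including /v1
  local timeout_s="${2:-600}"                      # seconds to wait before failing
  local waited=0
  # -f treats HTTP errors as failures; -s hides curl's progress output
  until curl -sf "${base_url%/}/models" > /dev/null; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "vLLM server at $base_url not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "vLLM server at $base_url is ready"
}
```

For example, after starting vllm serve, source this function in another shell and run wait_for_vllm http://localhost:8000/v1 1200.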
Configure the environment
- Create a .env file in the root directory:
  cp .env.example .env
- Fill in the necessary keys in the .env file:
  - SYNTHESIZER_MODEL: Local path of the LLM deployed with vLLM.
  - SYNTHESIZER_BASE_URL: Service endpoint of the LLM deployed with vLLM.
  - SYNTHESIZER_API_KEY: (Optional) API key.
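A typo in .env is easy to miss until generation fails. The minimal sketch below checks that the two required keys have non-empty values; it assumes the file uses plain KEY=value lines, and check_env_file as well as the example values in the comments (a placeholder model path and vLLM's default address) are hypothetical, not taken from this repo.

```bash
# Hypothetical helper (not part of this repo): verify that the required keys
# have non-empty values in .env. Assumes plain KEY=value lines.
# Example .env with placeholder values:
#   SYNTHESIZER_MODEL=/path/to/Qwen2.5-72B-Instruct
#   SYNTHESIZER_BASE_URL=http://localhost:8000/v1
#   SYNTHESIZER_API_KEY=
check_env_file() {
  local env_file="${1:-.env}"
  local key
  local missing=0
  if [ ! -f "$env_file" ]; then
    echo "$env_file not found" >&2
    return 1
  fi
  # SYNTHESIZER_API_KEY is optional, so only these two keys are required
  for key in SYNTHESIZER_MODEL SYNTHESIZER_BASE_URL; do
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key is missing or empty in $env_file" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_env_file from the root directory returns 0 when both keys are set and prints the missing keys otherwise.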
We provide an example corpus in K2V-example/data/example_corpus.json and an example configuration file in K2V-example/config.yaml.
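Before launching the script, a quick check of the example inputs can catch path mistakes early. This is a minimal sketch assuming example_corpus.json is a single JSON document (not JSON Lines) and that python3 is on PATH; check_example_inputs is a hypothetical helper, and config.yaml is only checked to exist and be non-empty, not parsed.

```bash
# Hypothetical pre-flight check (not part of this repo) for the example inputs.
# Assumes example_corpus.json is one JSON document (not JSON Lines) and that
# python3 is on PATH; config.yaml is only checked to exist and be non-empty.
check_example_inputs() {
  local root="${1:-.}"  # repository root
  local corpus="$root/K2V-example/data/example_corpus.json"
  local config="$root/K2V-example/config.yaml"
  local f
  for f in "$corpus" "$config"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  # python3 -m json.tool exits non-zero when the file is not valid JSON
  if ! python3 -m json.tool "$corpus" > /dev/null; then
    echo "invalid JSON: $corpus" >&2
    return 1
  fi
  echo "inputs found: $corpus, $config"
}
```

Run check_example_inputs from the repository root; if you point the configuration at your own corpus, adjust the two paths accordingly.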
Run the generation script
bash K2V-example/run.sh