SquRL (SQL Query Reinforcement Learning) is a reinforcement learning framework that enables LLMs to adaptively construct Text-to-SQL workflows at inference time. While Text-to-SQL has achieved impressive progress, real-world applicability remains limited by reliance on single static pipelines that struggle with out-of-distribution and long-tail scenarios. SquRL learns dynamic policies that consistently outperform the best static workflow, with gains driven by heterogeneity across candidate workflows. The framework employs a rule-based reward function and two training mechanisms: dynamic actor masking for broader exploration, and pseudo rewards for improved training efficiency. Experiments on widely used benchmarks show that SquRL outperforms state-of-the-art static methods, with especially pronounced gains on complex and out-of-distribution queries.
| Model | Huggingface | ModelScope |
|---|---|---|
| SquRL-1.5B (only SFT) | - | SquRL-1.5B-SFT |
| SquRL-1.5B | SquRL-1.5B | SquRL-1.5B |
| SquRL-7B (only SFT) | - | SquRL-7B-SFT |
| SquRL-7B | - | SquRL-7B |
- Python 3.11 or higher
- CUDA 11.8+ (for GPU training)
- Multiple GPUs recommended for distributed training
- Sufficient disk space for model checkpoints and datasets
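The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the SquRL repository: the function names are hypothetical, and the GPU report assumes PyTorch is already installed (it returns `None` otherwise).

```python
import shutil
import sys


def check_python(min_version=(3, 11)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= tuple(min_version)


def free_disk_gb(path="."):
    """Return the free disk space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def describe_gpus():
    """Report CUDA details via PyTorch, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,  # CUDA version PyTorch was built with
        "gpu_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print("Python >= 3.11:", check_python())
    print("Free disk (GB):", round(free_disk_gb(), 1))
    print("GPU info:", describe_gpus())
```

How much disk space counts as sufficient depends on the model size and the number of checkpoints kept, so the script only reports the figure rather than enforcing a threshold.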
1. Clone the repository:

       git clone https://github.com/Satissss/SquRL.git
       cd SquRL

2. Install dependencies:

       pip install -r requirements.txt

   Note: Make sure you have the correct CUDA version installed that matches your PyTorch installation.
3. Set up the Squrve backend:

   SquRL requires the Squrve backend for reward computation and evaluation.

   a. Clone the Squrve repository:

          git clone https://github.com/Satissss/Squrve.git

   b. Download the benchmark dataset from Hugging Face: satissss/Squrve-Benchmarks

   c. Organize the directory structure. Place the `SquRL` directory under Squrve's `benchmarks/` folder:

          Squrve/
          └── benchmarks/
              └── SquRL/
                  ├── database/
                  ├── rl/
                  └── ...

   d. Configure API keys:

          cd Squrve/app
          vim app_config.json  # Add your API key to the configuration file
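A misplaced benchmark directory is an easy mistake to make, so the layout described above can be verified with a short script. This is a sketch under stated assumptions: `missing_benchmark_dirs` is a hypothetical helper, and it only checks the `database/` and `rl/` folders named in the layout, since the remaining entries are elided there.

```python
from pathlib import Path

# Subdirectories named in the layout above; the elided "..." entries are not checked.
EXPECTED_SUBDIRS = ("database", "rl")


def missing_benchmark_dirs(squrve_root):
    """Return the expected directories that are absent under squrve_root."""
    base = Path(squrve_root) / "benchmarks" / "SquRL"
    candidates = [base] + [base / name for name in EXPECTED_SUBDIRS]
    return [str(p) for p in candidates if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_benchmark_dirs("Squrve")
    print("Layout OK" if not missing else f"Missing: {missing}")
```

Run it from the directory that contains the `Squrve` checkout, or pass the path to the checkout explicitly.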
SquRL follows a two-stage training pipeline: SFT (Supervised Fine-Tuning) first, then RL (Reinforcement Learning) with PPO.
Train a base model using supervised fine-tuning with LoRA (Low-Rank Adaptation):

    bash scripts/sft_peft_sp.sh

After SFT completes, continue with PPO-based reinforcement learning. Prerequisite: The Squrve backend must be running to compute rewards.
a. Start the Squrve backend server (in a separate terminal):
    cd Squrve/app
    python run.py

This starts the reward computation service for evaluating SQL query quality.
b. Launch RL training:
    cd SquRL
    bash scripts/rl_train_fsdp.sh

Use SquRL for inference within Squrve to evaluate Text-to-SQL performance. SquRL employs the ForkGatherAgent for dynamic workflow construction at inference time.
- Squrve backend set up (see Setup section)
- SquRL model deployed as an LLM API service (e.g., via vLLM or SGLang)
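Because the deployed model is reached over the network, it can save time to confirm that the endpoint accepts connections before starting an evaluation run. The sketch below is an assumption-laden illustration: `endpoint_reachable` is a hypothetical helper, the URL in the usage line is a placeholder, and a successful TCP connection only shows that something is listening, not that the right model is loaded.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(base_url, timeout=3.0):
    """Return True if a TCP connection to base_url's host and port succeeds."""
    parsed = urlparse(base_url)
    if parsed.hostname is None:
        raise ValueError(f"no host found in URL: {base_url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder URL; replace it with the address of your own deployment.
    print(endpoint_reachable("http://localhost:8000/v1"))
```

The same check can be pointed at the Squrve backend before RL training, provided you know the host and port it listens on.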
1. Edit the startup config:

       cd Squrve
       vim startup/startup_config.json

2. Add a ForkGatherAgent task to the `task_meta` list with the following configuration:

       {
         "task_id": "agent",
         "task_type": "AgentTask",
         "data_source": "<path to your NL question data>",
         "schema_source": "<path to database schemas>",
         "dataset_save_path": "<path to save evaluation results>",
         "is_save_dataset": true,
         "eval_type": ["execute_accuracy"],
         "meta": {
           "task": {
             "agent_type": "ForkGatherAgent"
           },
           "actor": {
             "max_n": 5,
             "select_type": "FastExecSelector",
             "rollout_llm_args": {
               "api_key": "<API key for your LLM service>",
               "base_url": "<URL of your deployed SquRL model>",
               "temperature": 1.0
             }
           }
         },
         "open_parallel": true,
         "max_workers": 5
       }
Key parameters:
| Parameter | Description |
|---|---|
| `data_source` | Path to natural language question dataset |
| `schema_source` | Path to database schema definitions |
| `dataset_save_path` | Directory to save evaluation outputs |
| `max_n` | Number of candidate workflows to explore per query |
| `select_type` | Selector for choosing final SQL (e.g., `FastExecSelector`) |
| `rollout_llm_args.base_url` | API endpoint of your deployed SquRL model |
| `rollout_llm_args.api_key` | API key for the LLM service |
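If you run several evaluations, editing the config by hand becomes error-prone, and the task entry can be generated instead. The following is a minimal sketch rather than a Squrve utility: `build_agent_task` and `add_task` are hypothetical helpers, and the code assumes `startup_config.json` is a JSON object with a top-level `task_meta` list. The field names and default values are copied from the example configuration above.

```python
import json
from pathlib import Path


def build_agent_task(data_source, schema_source, save_path, base_url, api_key, max_n=5):
    """Build a ForkGatherAgent task entry mirroring the example configuration."""
    return {
        "task_id": "agent",
        "task_type": "AgentTask",
        "data_source": data_source,
        "schema_source": schema_source,
        "dataset_save_path": save_path,
        "is_save_dataset": True,
        "eval_type": ["execute_accuracy"],
        "meta": {
            "task": {"agent_type": "ForkGatherAgent"},
            "actor": {
                "max_n": max_n,
                "select_type": "FastExecSelector",
                "rollout_llm_args": {
                    "api_key": api_key,
                    "base_url": base_url,
                    "temperature": 1.0,
                },
            },
        },
        "open_parallel": True,
        "max_workers": 5,
    }


def add_task(config_path, task):
    """Add task to task_meta, replacing any entry that has the same task_id.

    Assumes the config file is a JSON object with a top-level task_meta list.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    tasks = [t for t in config.get("task_meta", []) if t.get("task_id") != task["task_id"]]
    tasks.append(task)
    config["task_meta"] = tasks
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

Replacing an existing entry with the same `task_id` is a choice made for this sketch so that repeated runs do not accumulate duplicates. Back up the config file first if it contains tasks you want to keep unchanged.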
Contributions are welcome! Please feel free to:
- Report bugs and issues
- Suggest new features or improvements
- Submit pull requests
Please ensure your code follows the existing style and includes appropriate tests.
If you use SquRL in your research, please cite our paper:
Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL
Yihan Wang, Peiyu Liu, Runyu Chen, Wei Xu
arXiv preprint arXiv:2602.15564, 2026
📄 Paper: https://arxiv.org/abs/2602.15564 | DOI: 10.48550/arXiv.2602.15564
BibTeX:
@article{wang2026beyond,
title = {Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL},
author = {Wang, Yihan and Liu, Peiyu and Chen, Runyu and Xu, Wei},
journal = {arXiv preprint arXiv:2602.15564},
year = {2026},
doi = {10.48550/arXiv.2602.15564},
url = {https://arxiv.org/abs/2602.15564}
}

Star ⭐ this repository if you find it helpful!
