Code, benchmarks, and trained models for the paper "Learned Control Layers for Combinatorial Local Search."
We propose a control-layer architecture for Dynamic Algorithm Configuration (DAC): an external RL controller adjusts solver parameters at runtime through a standardized checkpoint interface. The same PPO training pipeline is validated on two solver families without modification beyond domain-specific state--action definitions.
| Comparison | PPO Mean Cost | Baseline Mean Cost | Cost Reduction | Wilcoxon p |
|---|---|---|---|---|
| PPO vs Random | 372.1 | 459.6 | 19.0% | 2.4 x 10^-5 |
| PPO vs Best Static | 372.1 | 415.1 | 10.4% | 0.0069 |
| PPO vs BO Static (v500) | -- | -- | 7/10 wins | 0.037 |
Scale-dependent crossover: BO static wins at small scale, PPO overtakes at 500 variables.
PPO achieves the lowest mean unsatisfied clause count among all tested configurations, with statistically significant improvements over suboptimal baselines (Wilcoxon p < 0.01).
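The significance tests above use the paired Wilcoxon signed-rank test. A minimal sketch of how such a comparison can be run with `scipy.stats.wilcoxon` (the cost arrays here are illustrative placeholders, not the paper's data):

```python
from scipy.stats import wilcoxon

# Illustrative per-instance final unsat-clause counts on the same 10 instances
# (placeholder numbers, NOT the paper's measurements).
ppo_costs    = [371, 380, 365, 377, 369, 374, 368, 381, 370, 366]
static_costs = [410, 422, 401, 430, 415, 409, 418, 425, 412, 408]

# Two-sided paired Wilcoxon signed-rank test on per-instance differences.
stat, p = wilcoxon(ppo_costs, static_costs)
print(f"W={stat}, p={p:.4g}")
```

Pairing by instance matters: instance difficulty varies widely near the phase transition, so an unpaired test would lose most of the statistical power.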
- Exploration parameter dominance: `noise_prob` accounts for nearly all DAC benefit in both domains.
- Scale-dependent value of dynamic control: learned policies improve relative to static tuning as instance size grows.
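To make the role of `noise_prob` concrete: in WalkSAT-style solvers it is the probability of taking a random-walk step instead of a greedy flip. A minimal illustrative variable-selection rule (a sketch of the standard WalkSAT heuristic, not the repo's implementation):

```python
import random

def pick_variable(unsat_clause, break_counts, noise_prob, rng=random):
    """WalkSAT-style pick: greedy on break count, random walk with prob noise_prob.

    unsat_clause: variables in a randomly chosen unsatisfied clause.
    break_counts: dict mapping variable -> clauses its flip would break.
    """
    best = min(unsat_clause, key=lambda v: break_counts[v])
    if break_counts[best] == 0 or rng.random() >= noise_prob:
        return best                      # "freebie" or greedy best-break-count flip
    return rng.choice(unsat_clause)      # noisy random-walk step
```

Raising `noise_prob` at checkpoints where the search has stagnated, and lowering it while progress is steady, is exactly the kind of schedule the learned controller can express and a static configuration cannot.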
The checkpoint protocol decouples parameter scheduling from solver implementation:
- Solver pauses every N flips
- Reports search state features to the controller
- Receives updated parameter values
- Resumes with new parameters
Any solver exposing this interface can be governed by the same RL pipeline.
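A minimal sketch of what such an interface might look like; the class and field names below are illustrative, not the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    """Search-state snapshot reported to the controller (illustrative fields)."""
    flips: int
    unsat_clauses: int
    best_unsat: int

class CheckpointSolver:
    """Hypothetical solver skeleton following the pause/report/update/resume loop."""

    def __init__(self, checkpoint_every: int, noise_prob: float = 0.5):
        self.checkpoint_every = checkpoint_every
        self.noise_prob = noise_prob  # parameter the controller may adjust

    def run(self, controller, max_flips: int):
        flips, unsat, best = 0, 100, 100
        while flips < max_flips:
            # ... perform self.checkpoint_every local-search flips here ...
            flips += self.checkpoint_every
            unsat = max(0, unsat - 1)     # stand-in for actual search progress
            best = min(best, unsat)
            # Pause: report state, receive updated parameters, resume.
            new_params = controller(Checkpoint(flips, unsat, best))
            self.noise_prob = new_params.get("noise_prob", self.noise_prob)
        return best
```

In this framing a controller is just a callable mapping checkpoints to parameter dicts, which is what lets the same RL pipeline govern any conforming solver.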
```
maxsat-dac/
├── src/
│   ├── train_domain.py              # PPO training script
│   ├── eval_domain.py               # Evaluation with statistical tests
│   └── domains/sat/
│       ├── solver.py                # WalkSAT solver with checkpoint protocol
│       ├── gym_env.py               # Gymnasium DAC environment (8 features, 2 actions)
│       ├── baselines.py             # 5 static baseline configurations
│       └── generate_instances.py    # Phase-transition 3-SAT instance generator
├── data/
│   ├── benchmarks/sat/              # 90 CNF instances (v50/v100/v200) + split.json
│   └── results/                     # Trained models and evaluation results (both domains)
├── archive/                         # Original MaxSAT-only code (see archive/README.md)
├── configs/sat.json                 # SAT domain configuration reference
├── requirements.txt
└── REPRODUCE.md                     # Step-by-step reproduction guide
```
MaxSAT code is in archive/ (NuWLS C++ solver wrapper, Gymnasium env, all experiment scripts). MaxSAT results are in data/results/ (JSON files + trained PPO models in ppo_csolver_500k/).
SAT code is the active codebase in src/domains/sat/ (pure Python WalkSAT). SAT results are in data/results/sat_*/.
```shell
# Install dependencies
pip install -r requirements.txt

# Generate benchmark instances (or use the included 90 instances)
python src/domains/sat/generate_instances.py

# Train PPO controller (3 seeds as in the paper)
python src/train_domain.py --domain sat --timesteps 200000 --seed 42
python src/train_domain.py --domain sat --timesteps 200000 --seed 123
python src/train_domain.py --domain sat --timesteps 200000 --seed 999

# Evaluate against baselines
python src/eval_domain.py --domain sat --model data/results/sat_ppo_*/best_model/best_model.zip
```

No external solver compilation is needed for SAT -- the WalkSAT solver is pure Python.
For MaxSAT reproduction, see archive/ and the original training script archive/train_500k.py (requires NuWLS C++ compilation).
See REPRODUCE.md for detailed SAT reproduction instructions.
- Python 3.10+
- SAT domain: no C/C++ compiler needed (pure Python solver), ~1 hour training
- MaxSAT domain (archive): g++ or MSVC for NuWLS compilation, ~4 hours training
Learned Control Layers for Combinatorial Local Search. Alex Chengyu Li, 2026.
Submitted to the Journal of Heuristics (Springer).
```bibtex
@article{li2026controllayers,
  author  = {Li, Alex Chengyu},
  title   = {Learned Control Layers for Combinatorial Local Search},
  journal = {Journal of Heuristics},
  year    = {2026},
  note    = {Under review}
}
```

This project is licensed under the MIT License. See LICENSE for details.