crabsatellite/learned-control-layers
Learned Control Layers for Combinatorial Local Search

Code, benchmarks, and trained models for the paper "Learned Control Layers for Combinatorial Local Search."

We propose a control-layer architecture for Dynamic Algorithm Configuration (DAC): an external RL controller adjusts solver parameters at runtime through a standardized checkpoint interface. The same PPO training pipeline is validated on two solver families without modification beyond domain-specific state-action definitions.

Key Results

MaxSAT (NuWLS, 4 parameters, 10 state features)

| Comparison | PPO Mean Cost | Baseline Mean Cost | Improvement | Wilcoxon p |
|---|---|---|---|---|
| PPO vs Random | 372.1 | 459.6 | -19.0% | 2.4 x 10^-5 |
| PPO vs Best Static | 372.1 | 415.1 | -10.4% | 0.0069 |
| PPO vs BO Static (v500) | -- | -- | 7/10 wins | 0.037 |

Scale-dependent crossover: BO static wins at small scale, PPO overtakes at 500 variables.

SAT (WalkSAT, 2 parameters, 8 state features)

PPO achieves the lowest mean unsatisfied clause count among all tested configurations, with statistically significant improvements over suboptimal baselines (Wilcoxon p < 0.01).
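The reported p-values come from paired, per-instance Wilcoxon signed-rank tests. A minimal sketch of one such comparison with `scipy.stats.wilcoxon`, using made-up per-instance costs rather than the paper's data:

```python
# Paired per-instance comparison via the Wilcoxon signed-rank test.
# The costs below are illustrative placeholders, not the paper's data.
from scipy.stats import wilcoxon

ppo_costs    = [3, 2, 4, 1, 2, 3, 2, 4, 3, 2]  # unsat clauses per instance (PPO)
static_costs = [5, 4, 6, 3, 4, 6, 3, 5, 4, 4]  # same instances, static baseline

stat, p = wilcoxon(ppo_costs, static_costs)
print(f"W = {stat}, p = {p:.4f}")
```

Pairing by instance matters here: it controls for per-instance difficulty, which varies widely near the phase transition.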

Cross-Domain Patterns

  1. Exploration parameter dominance: noise_prob accounts for nearly all DAC benefit in both domains
  2. Scale-dependent value of dynamic control: learned policies improve relative to static tuning as instance size grows
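In WalkSAT, noise_prob is the probability of taking a random-walk step instead of the greedy minimum-break flip, so it directly sets the exploration/exploitation balance. A sketch of that single decision point (illustrative names, not the repository's solver.py):

```python
import random

def pick_flip_var(clause, break_count, noise_prob, rng=random):
    """Choose which variable of an unsatisfied clause to flip.

    With probability noise_prob, pick uniformly at random (exploration);
    otherwise pick the variable whose flip breaks the fewest currently
    satisfied clauses (greedy). Illustrative sketch only.
    """
    if rng.random() < noise_prob:
        return rng.choice(clause)        # random-walk step
    return min(clause, key=break_count)  # greedy minimum-break step
```

Since every flip passes through this one branch, it is plausible that a controller adjusting noise_prob at checkpoints captures most of the achievable dynamic-control benefit.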

Architecture

The checkpoint protocol decouples parameter scheduling from solver implementation:

  1. Solver pauses every N flips
  2. Reports search state features to the controller
  3. Receives updated parameter values
  4. Resumes with new parameters

Any solver exposing this interface can be governed by the same RL pipeline.
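The four-step loop above can be sketched as a small interface; all names here are illustrative, not the repository's actual API:

```python
from typing import Protocol, Sequence

class CheckpointSolver(Protocol):
    """Any solver exposing this interface can be driven by the controller.
    Method names are illustrative, not the repository's actual API."""
    def run(self, n_flips: int) -> None: ...          # resume for N flips, then pause
    def state_features(self) -> Sequence[float]: ...  # report search-state features
    def set_params(self, params: dict) -> None: ...   # receive updated parameters
    def done(self) -> bool: ...

def control_loop(solver: CheckpointSolver, controller, checkpoint_every: int) -> None:
    # 1. solver runs and pauses every N flips; 2. reports features;
    # 3. receives updated parameters; 4. resumes with them.
    while not solver.done():
        solver.run(checkpoint_every)
        features = solver.state_features()
        solver.set_params(controller.act(features))
```

The solver never imports RL code and the controller never sees solver internals, which is what lets the same PPO pipeline govern both NuWLS and WalkSAT.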

Repository Structure

maxsat-dac/
├── src/
│   ├── train_domain.py               # PPO training script
│   ├── eval_domain.py                # Evaluation with statistical tests
│   └── domains/sat/
│       ├── solver.py                 # WalkSAT solver with checkpoint protocol
│       ├── gym_env.py                # Gymnasium DAC environment (8 features, 2 actions)
│       ├── baselines.py              # 5 static baseline configurations
│       └── generate_instances.py     # Phase-transition 3-SAT instance generator
├── data/
│   ├── benchmarks/sat/               # 90 CNF instances (v50/v100/v200) + split.json
│   └── results/                      # Trained models and evaluation results (both domains)
├── archive/                          # Original MaxSAT-only code (see archive/README.md)
├── configs/sat.json                  # SAT domain configuration reference
├── requirements.txt
└── REPRODUCE.md                      # Step-by-step reproduction guide

MaxSAT code is in archive/ (NuWLS C++ solver wrapper, Gymnasium env, all experiment scripts). MaxSAT results are in data/results/ (JSON files + trained PPO models in ppo_csolver_500k/).

SAT code is the active codebase in src/domains/sat/ (pure Python WalkSAT). SAT results are in data/results/sat_*/.

Quick Start (SAT domain)

# Install dependencies
pip install -r requirements.txt

# Generate benchmark instances (or use the included 90 instances)
python src/domains/sat/generate_instances.py

# Train PPO controller (3 seeds as in the paper)
python src/train_domain.py --domain sat --timesteps 200000 --seed 42
python src/train_domain.py --domain sat --timesteps 200000 --seed 123
python src/train_domain.py --domain sat --timesteps 200000 --seed 999

# Evaluate against baselines
python src/eval_domain.py --domain sat --model data/results/sat_ppo_*/best_model/best_model.zip

No external solver compilation is needed for SAT; the WalkSAT solver is pure Python.

For MaxSAT reproduction, see archive/ and the original training script archive/train_500k.py (requires NuWLS C++ compilation).

See REPRODUCE.md for detailed SAT reproduction instructions.

Requirements

  • Python 3.10+
  • SAT domain: no C/C++ compiler needed (pure Python solver), ~1 hour training
  • MaxSAT domain (archive): g++ or MSVC for NuWLS compilation, ~4 hours training

Paper

Learned Control Layers for Combinatorial Local Search
Alex Chengyu Li, 2026

Submitted to the Journal of Heuristics (Springer).

Citation

@article{li2026controllayers,
  author    = {Li, Alex Chengyu},
  title     = {Learned Control Layers for Combinatorial Local Search},
  journal   = {Journal of Heuristics},
  year      = {2026},
  note      = {Under review}
}

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments

  • NuWLS solver by Yi Chu et al. (AAAI 2023)
  • DACBench framework
