Training Artificial Neural Networks via Mixed Integer Programming
This project formulates the training of feedforward neural networks as a Mixed Integer Program (MIP). Instead of gradient-based optimization, network weights are found by solving a MIP that minimizes the mean absolute error (L1 loss) over a training dataset, subject to constraints that exactly encode the network's forward pass.
Given a training set ${(x^k, y^k)}{k=1}^{n}$, a layered network with weights $w{ij}$, and node activation functions, the training problem is formulated as:
subject to:
-
L1 linearization:
$z_{d,k} \geq |y^k_d - h_{d,k}|$ (via two linear constraints) -
Bilinear terms:
$\theta_{ij,k} = w_{ij} \cdot h_{i,k}$ — linearized using McCormick envelopes - Activation functions: Piecewise linear functions (ReLU, hard-tanh, sigmoid) encoded via a convex combination (SOS2-like) formulation
- Node bounds: Tightened using interval arithmetic propagated from input bounds
| Variable | Type | Description |
|---|---|---|
w[i,j] |
Continuous | Edge weight between nodes i and j |
h[j,k] |
Continuous | Output of node j for sample k |
theta[i,j,k] |
Continuous | Auxiliary for bilinear term w[i,j] * h[i,k] |
z[d,k] |
Continuous | L1 auxiliary for output d, sample k |
pi[j] |
Binary | (arch mode) Node activation selector |
When enabled, binary variables pi[j] are introduced for each node. A node can be turned off (pi[j] = 0), and the model simultaneously optimizes weights and which nodes to keep active — performing combined architecture search and training.
ann-ip/
├── cpp_version/ # Production implementation (C++ / LibTorch / Gurobi)
│ ├── CMakeLists.txt
│ ├── build.sh
│ ├── build_and_run.sh
│ └── src/
│ ├── main.cpp # Entry point: data loading, torch pre-training, Gurobi solve
│ ├── formulation.cpp # Gurobi MIP model construction and optimization
│ ├── formulation.h
│ ├── network.cpp # LayeredNetwork: node bounds, weight management
│ ├── network.h
│ ├── graph.cpp # Graph data structure (nodes, edges, properties)
│ ├── graph.h
│ ├── std_nn.cpp # Standard feedforward net (LibTorch) for pre-training
│ ├── std_nn.h
│ ├── functions.cpp # Activation/aggregation function registry
│ ├── functions.h
│ └── globals.h # Global flags (arch_mode)
│
├── general_form/ # Julia prototype (JuMP + Gurobi/SCIP + Flux.jl)
│ ├── formulation.jl # MIP formulation via JuMP
│ ├── network.jl # Network construction using MetaGraphs
│ ├── functions.jl # Activation and aggregation function types
│ ├── flux_network.jl # Flux.jl model creation, training, weight extraction
│ └── flux_test.jl # Experiment scripts
│
├── perceptron/ # Python prototype (python-mip / CBC solver)
│ └── main.py # Perceptron-level MIP formulation
│
├── data/
│ ├── processed_data.csv # Input dataset
│ ├── results_gurobi.csv # Predictions after MIP optimization
│ └── results_torch.csv # Predictions from torch pre-training baseline
│
└── plots/ # Visualization outputs (PNG/GIF)
1. Load data from data/processed_data.csv
|
2. Pre-train a feedforward network with LibTorch (SGD, MAE loss, 10k epochs)
|
3. Extract trained weights → warm-start the MIP
|
4. Build the Gurobi MIP model
- Propagate node bounds via interval arithmetic
- Add McCormick envelopes for each bilinear term
- Add piecewise linear constraints for each activation
- Set objective (L1 loss)
|
5. Solve with Gurobi (time limit: 150s, all CPU threads)
|
6. Extract optimal weights → evaluate on test set → save results
| Dependency | Purpose |
|---|---|
| Gurobi | MIP solver (requires license) |
| LibTorch | Tensor ops and neural network pre-training |
| CMake ≥ 3.18 | Build system |
| C++17 | Language standard |
LibTorch must be placed at ~/libtorch. Gurobi must be installed and the cmake/ module path must include a FindGURUBI.cmake.
cd cpp_version
bash build.shThis runs:
rm -rf build/ && mkdir build && cd build && cmake .. && cmake --build . --config Releasecd cpp_version/build
./ann-ip <n_samples> <arch_mode><n_samples>: number of training samples to use<arch_mode>:0for weight training only,1for combined architecture + weight optimization
Network architecture is read from config.txt in the format:
activations: identity, relu, identity
aggregations: input, sum, sum
layer_sizes: 3, 4, 1
- McCormick envelopes (
formulation.cpp:85–106): Four linear inequalities per bilinear term, plus a nonlinear general constraint added viaGRBaddgenconstrNL(Gurobi 11+). - Piecewise activation (
formulation.cpp:222–312): 20-segment piecewise linear approximation. Convex combination variables with SOS2-adjacency constraints. Domain bounds clipped to[lb_in, ub_in]per sample. - Interval arithmetic bounds (
network.cpp): For each samplek, bounds[lb_out, ub_out]are propagated layer by layer to tighten the MIP relaxation. - Warm start: Torch-trained weights are set as
GRB_DoubleAttr_Starton each weight variable before solving. - Parallelism: Gurobi is configured with
std::thread::hardware_concurrency()threads.
Uses JuMP as the modeling layer with Gurobi as the solver. Flux.jl is used for pre-training.
Supported activation functions (functions.jl):
-
LinearActivation(a)— identity ($a=1$ ) and linear scaling -
PiecewiseActivation(x_vals, y_vals)— arbitrary piecewise linear (ReLU, hard-tanh, hard-sigmoid predefined)
Supported aggregation functions: SumAggregation, InputAggregation, BiasAggregation, MaxAggregation.
Network is represented as a MetaDiGraph from MetaGraphs.jl, with node/edge properties storing activation, aggregation, weights, and bounds.
Located in perceptron/, uses python-mip with the CBC open-source solver. Implements the full MIP for perceptrons with Heaviside and ReLU activations. Intended as a minimal reference implementation.
Results are saved as CSV files:
data/results_torch.csv— test set predictions from the torch pre-trained model (baseline)data/results_gurobi.csv— test set predictions after MIP optimization
Format:
y_test,y_pred_0
80.5,82.8437
92.0,89.5578
...