ann-ip

Training Artificial Neural Networks via Mixed Integer Programming

This project formulates the training of feedforward neural networks as a Mixed Integer Program (MIP). Instead of gradient-based optimization, network weights are found by solving a MIP that minimizes the mean absolute error (L1 loss) over a training dataset, subject to constraints that exactly encode the network's forward pass.

Problem Formulation

Given a training set ${(x^k, y^k)}{k=1}^{n}$, a layered network with weights $w{ij}$, and node activation functions, the training problem is formulated as:

$$\min_{w, h, \theta, z} \quad \frac{1}{n} \sum_{d,k} z_{d,k}$$

subject to:

L1 linearization: $z_{d,k} \geq |y^k_d - h_{d,k}|$ (via two linear constraints)
Bilinear terms: $\theta_{ij,k} = w_{ij} \cdot h_{i,k}$ — linearized using McCormick envelopes
Activation functions: Piecewise linear functions (ReLU, hard-tanh, sigmoid) encoded via a convex combination (SOS2-like) formulation
Node bounds: Tightened using interval arithmetic propagated from input bounds

Variables

Variable	Type	Description
`w[i,j]`	Continuous	Edge weight between nodes `i` and `j`
`h[j,k]`	Continuous	Output of node `j` for sample `k`
`theta[i,j,k]`	Continuous	Auxiliary for bilinear term `w[i,j] * h[i,k]`
`z[d,k]`	Continuous	L1 auxiliary for output `d`, sample `k`
`pi[j]`	Binary	(arch mode) Node activation selector

Architecture Mode (`arch_mode`)

When enabled, binary variables pi[j] are introduced for each node. A node can be turned off (pi[j] = 0), and the model simultaneously optimizes weights and which nodes to keep active — performing combined architecture search and training.

Repository Structure

ann-ip/
├── cpp_version/             # Production implementation (C++ / LibTorch / Gurobi)
│   ├── CMakeLists.txt
│   ├── build.sh
│   ├── build_and_run.sh
│   └── src/
│       ├── main.cpp         # Entry point: data loading, torch pre-training, Gurobi solve
│       ├── formulation.cpp  # Gurobi MIP model construction and optimization
│       ├── formulation.h
│       ├── network.cpp      # LayeredNetwork: node bounds, weight management
│       ├── network.h
│       ├── graph.cpp        # Graph data structure (nodes, edges, properties)
│       ├── graph.h
│       ├── std_nn.cpp       # Standard feedforward net (LibTorch) for pre-training
│       ├── std_nn.h
│       ├── functions.cpp    # Activation/aggregation function registry
│       ├── functions.h
│       └── globals.h        # Global flags (arch_mode)
│
├── general_form/            # Julia prototype (JuMP + Gurobi/SCIP + Flux.jl)
│   ├── formulation.jl       # MIP formulation via JuMP
│   ├── network.jl           # Network construction using MetaGraphs
│   ├── functions.jl         # Activation and aggregation function types
│   ├── flux_network.jl      # Flux.jl model creation, training, weight extraction
│   └── flux_test.jl         # Experiment scripts
│
├── perceptron/              # Python prototype (python-mip / CBC solver)
│   └── main.py              # Perceptron-level MIP formulation
│
├── data/
│   ├── processed_data.csv   # Input dataset
│   ├── results_gurobi.csv   # Predictions after MIP optimization
│   └── results_torch.csv    # Predictions from torch pre-training baseline
│
└── plots/                   # Visualization outputs (PNG/GIF)

Workflow

1. Load data from data/processed_data.csv
        |
2. Pre-train a feedforward network with LibTorch (SGD, MAE loss, 10k epochs)
        |
3. Extract trained weights → warm-start the MIP
        |
4. Build the Gurobi MIP model
   - Propagate node bounds via interval arithmetic
   - Add McCormick envelopes for each bilinear term
   - Add piecewise linear constraints for each activation
   - Set objective (L1 loss)
        |
5. Solve with Gurobi (time limit: 150s, all CPU threads)
        |
6. Extract optimal weights → evaluate on test set → save results

C++ Implementation

Dependencies

Dependency	Purpose
Gurobi	MIP solver (requires license)
LibTorch	Tensor ops and neural network pre-training
CMake ≥ 3.18	Build system
C++17	Language standard

LibTorch must be placed at ~/libtorch. Gurobi must be installed and the cmake/ module path must include a FindGURUBI.cmake.

Build

cd cpp_version
bash build.sh

This runs:

rm -rf build/ && mkdir build && cd build && cmake .. && cmake --build . --config Release

Run

cd cpp_version/build
./ann-ip <n_samples> <arch_mode>

<n_samples>: number of training samples to use
<arch_mode>: 0 for weight training only, 1 for combined architecture + weight optimization

Network architecture is read from config.txt in the format:

activations: identity, relu, identity
aggregations: input, sum, sum
layer_sizes: 3, 4, 1

Key implementation details

McCormick envelopes (formulation.cpp:85–106): Four linear inequalities per bilinear term, plus a nonlinear general constraint added via GRBaddgenconstrNL (Gurobi 11+).
Piecewise activation (formulation.cpp:222–312): 20-segment piecewise linear approximation. Convex combination variables with SOS2-adjacency constraints. Domain bounds clipped to [lb_in, ub_in] per sample.
Interval arithmetic bounds (network.cpp): For each sample k, bounds [lb_out, ub_out] are propagated layer by layer to tighten the MIP relaxation.
Warm start: Torch-trained weights are set as GRB_DoubleAttr_Start on each weight variable before solving.
Parallelism: Gurobi is configured with std::thread::hardware_concurrency() threads.

Julia Prototype

Uses JuMP as the modeling layer with Gurobi as the solver. Flux.jl is used for pre-training.

Supported activation functions (functions.jl):

LinearActivation(a) — identity ($a=1$) and linear scaling
PiecewiseActivation(x_vals, y_vals) — arbitrary piecewise linear (ReLU, hard-tanh, hard-sigmoid predefined)

Supported aggregation functions: SumAggregation, InputAggregation, BiasAggregation, MaxAggregation.

Network is represented as a MetaDiGraph from MetaGraphs.jl, with node/edge properties storing activation, aggregation, weights, and bounds.

Python Prototype

Located in perceptron/, uses python-mip with the CBC open-source solver. Implements the full MIP for perceptrons with Heaviside and ReLU activations. Intended as a minimal reference implementation.

Output

Results are saved as CSV files:

data/results_torch.csv — test set predictions from the torch pre-trained model (baseline)
data/results_gurobi.csv — test set predictions after MIP optimization

Format:

y_test,y_pred_0
80.5,82.8437
92.0,89.5578
...

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
.vscode		.vscode
cpp_version		cpp_version
data		data
general_form		general_form
perceptron		perceptron
plots		plots
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ann-ip

Problem Formulation

Variables

Architecture Mode (`arch_mode`)

Repository Structure

Workflow

C++ Implementation

Dependencies

Build

Run

Key implementation details

Julia Prototype

Python Prototype

Output

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ann-ip

Problem Formulation

Variables

Architecture Mode (arch_mode)

Repository Structure

Workflow

C++ Implementation

Dependencies

Build

Run

Key implementation details

Julia Prototype

Python Prototype

Output

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages

Architecture Mode (`arch_mode`)