Skip to content

Franky03/ann-ip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

86 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ann-ip

Training Artificial Neural Networks via Mixed Integer Programming

This project formulates the training of feedforward neural networks as a Mixed Integer Program (MIP). Instead of gradient-based optimization, network weights are found by solving a MIP that minimizes the mean absolute error (L1 loss) over a training dataset, subject to constraints that exactly encode the network's forward pass.


Problem Formulation

Given a training set ${(x^k, y^k)}{k=1}^{n}$, a layered network with weights $w{ij}$, and node activation functions, the training problem is formulated as:

$$\min_{w, h, \theta, z} \quad \frac{1}{n} \sum_{d,k} z_{d,k}$$

subject to:

  • L1 linearization: $z_{d,k} \geq |y^k_d - h_{d,k}|$ (via two linear constraints)
  • Bilinear terms: $\theta_{ij,k} = w_{ij} \cdot h_{i,k}$ — linearized using McCormick envelopes
  • Activation functions: Piecewise linear functions (ReLU, hard-tanh, sigmoid) encoded via a convex combination (SOS2-like) formulation
  • Node bounds: Tightened using interval arithmetic propagated from input bounds

Variables

Variable Type Description
w[i,j] Continuous Edge weight between nodes i and j
h[j,k] Continuous Output of node j for sample k
theta[i,j,k] Continuous Auxiliary for bilinear term w[i,j] * h[i,k]
z[d,k] Continuous L1 auxiliary for output d, sample k
pi[j] Binary (arch mode) Node activation selector

Architecture Mode (arch_mode)

When enabled, binary variables pi[j] are introduced for each node. A node can be turned off (pi[j] = 0), and the model simultaneously optimizes weights and which nodes to keep active — performing combined architecture search and training.


Repository Structure

ann-ip/
├── cpp_version/             # Production implementation (C++ / LibTorch / Gurobi)
│   ├── CMakeLists.txt
│   ├── build.sh
│   ├── build_and_run.sh
│   └── src/
│       ├── main.cpp         # Entry point: data loading, torch pre-training, Gurobi solve
│       ├── formulation.cpp  # Gurobi MIP model construction and optimization
│       ├── formulation.h
│       ├── network.cpp      # LayeredNetwork: node bounds, weight management
│       ├── network.h
│       ├── graph.cpp        # Graph data structure (nodes, edges, properties)
│       ├── graph.h
│       ├── std_nn.cpp       # Standard feedforward net (LibTorch) for pre-training
│       ├── std_nn.h
│       ├── functions.cpp    # Activation/aggregation function registry
│       ├── functions.h
│       └── globals.h        # Global flags (arch_mode)
│
├── general_form/            # Julia prototype (JuMP + Gurobi/SCIP + Flux.jl)
│   ├── formulation.jl       # MIP formulation via JuMP
│   ├── network.jl           # Network construction using MetaGraphs
│   ├── functions.jl         # Activation and aggregation function types
│   ├── flux_network.jl      # Flux.jl model creation, training, weight extraction
│   └── flux_test.jl         # Experiment scripts
│
├── perceptron/              # Python prototype (python-mip / CBC solver)
│   └── main.py              # Perceptron-level MIP formulation
│
├── data/
│   ├── processed_data.csv   # Input dataset
│   ├── results_gurobi.csv   # Predictions after MIP optimization
│   └── results_torch.csv    # Predictions from torch pre-training baseline
│
└── plots/                   # Visualization outputs (PNG/GIF)

Workflow

1. Load data from data/processed_data.csv
        |
2. Pre-train a feedforward network with LibTorch (SGD, MAE loss, 10k epochs)
        |
3. Extract trained weights → warm-start the MIP
        |
4. Build the Gurobi MIP model
   - Propagate node bounds via interval arithmetic
   - Add McCormick envelopes for each bilinear term
   - Add piecewise linear constraints for each activation
   - Set objective (L1 loss)
        |
5. Solve with Gurobi (time limit: 150s, all CPU threads)
        |
6. Extract optimal weights → evaluate on test set → save results

C++ Implementation

Dependencies

Dependency Purpose
Gurobi MIP solver (requires license)
LibTorch Tensor ops and neural network pre-training
CMake ≥ 3.18 Build system
C++17 Language standard

LibTorch must be placed at ~/libtorch. Gurobi must be installed and the cmake/ module path must include a FindGURUBI.cmake.

Build

cd cpp_version
bash build.sh

This runs:

rm -rf build/ && mkdir build && cd build && cmake .. && cmake --build . --config Release

Run

cd cpp_version/build
./ann-ip <n_samples> <arch_mode>
  • <n_samples>: number of training samples to use
  • <arch_mode>: 0 for weight training only, 1 for combined architecture + weight optimization

Network architecture is read from config.txt in the format:

activations: identity, relu, identity
aggregations: input, sum, sum
layer_sizes: 3, 4, 1

Key implementation details

  • McCormick envelopes (formulation.cpp:85–106): Four linear inequalities per bilinear term, plus a nonlinear general constraint added via GRBaddgenconstrNL (Gurobi 11+).
  • Piecewise activation (formulation.cpp:222–312): 20-segment piecewise linear approximation. Convex combination variables with SOS2-adjacency constraints. Domain bounds clipped to [lb_in, ub_in] per sample.
  • Interval arithmetic bounds (network.cpp): For each sample k, bounds [lb_out, ub_out] are propagated layer by layer to tighten the MIP relaxation.
  • Warm start: Torch-trained weights are set as GRB_DoubleAttr_Start on each weight variable before solving.
  • Parallelism: Gurobi is configured with std::thread::hardware_concurrency() threads.

Julia Prototype

Uses JuMP as the modeling layer with Gurobi as the solver. Flux.jl is used for pre-training.

Supported activation functions (functions.jl):

  • LinearActivation(a) — identity ($a=1$) and linear scaling
  • PiecewiseActivation(x_vals, y_vals) — arbitrary piecewise linear (ReLU, hard-tanh, hard-sigmoid predefined)

Supported aggregation functions: SumAggregation, InputAggregation, BiasAggregation, MaxAggregation.

Network is represented as a MetaDiGraph from MetaGraphs.jl, with node/edge properties storing activation, aggregation, weights, and bounds.


Python Prototype

Located in perceptron/, uses python-mip with the CBC open-source solver. Implements the full MIP for perceptrons with Heaviside and ReLU activations. Intended as a minimal reference implementation.


Output

Results are saved as CSV files:

  • data/results_torch.csv — test set predictions from the torch pre-trained model (baseline)
  • data/results_gurobi.csv — test set predictions after MIP optimization

Format:

y_test,y_pred_0
80.5,82.8437
92.0,89.5578
...