A two-layer feedforward neural network implemented entirely from scratch in NumPy, with full forward and backward propagation derivations. Solves a 3D classification task (4 classes) and a 1D regression task on toy datasets.
Two FFNNs trained without any deep-learning library:
- Classification head (3 -> 10 -> 4): a sigmoid hidden layer + softmax output trained with cross-entropy. Reaches 100% training accuracy on a 99-point 3D dataset with 4 classes.
- Regression head (1 -> 10 -> 1): a sigmoid hidden layer + linear output trained with MSE. Reaches R² = 0.98 on the training set; R² on test reflects the difficulty of extrapolation outside the training range.
Every quantity from the lecture (X, X̃, V, X̂, F, F̃, W, F̂, G, E) is exposed and printed at every training step so the math stays visible.
Modern frameworks make training a network a one-liner. Writing one by hand once, including the chain rule for the backward pass and the softmax + cross-entropy gradient simplification, is the cleanest way to internalize what those one-liners actually do.
- Forward pass: `X̃ = [1, X]`, `X̂ = X̃ V`, `F = sigmoid(X̂)`, `F̃ = [1, F]`, `F̂ = F̃ W`, `G = softmax(F̂)` (or identity for regression).
- Loss: cross-entropy for classification, MSE for regression.
- Backward pass: closed-form gradients, no autograd. The softmax + CE gradient collapses to `(G - Y)/N`, which is propagated back through the sigmoid hidden layer.
- Parameters are stored as `V` (input -> hidden) and `W` (hidden -> output), both with bias rows.
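
The steps above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the notebook's actual code: the random stand-in data, the 0.1 initialization scale, the learning rate, and the helper names (`add_bias`, `forward`, `backward`, `loss_at`) are all assumptions. Only the layer sizes (3 -> 10 -> 4) and the formulas come from the description above. The sketch also compares one analytic gradient entry with a central finite difference, which is a common way to sanity-check a hand-written backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data (not the lab's dataset): 99 points in 3D, 4 classes.
N, d_in, d_hidden, n_classes = 99, 3, 10, 4
X = rng.normal(size=(N, d_in))
Y = np.eye(n_classes)[rng.integers(0, n_classes, size=N)]  # one-hot targets

# Row 0 of V and W is the bias row, matching X_tilde = [1, X], F_tilde = [1, F].
V = rng.normal(scale=0.1, size=(d_in + 1, d_hidden))
W = rng.normal(scale=0.1, size=(d_hidden + 1, n_classes))


def add_bias(A):
    """Prepend a column of ones: A -> [1, A]."""
    return np.hstack([np.ones((A.shape[0], 1)), A])


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift rows for numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)


def forward(X, V, W):
    X_tilde = add_bias(X)
    F = sigmoid(X_tilde @ V)        # X_hat = X_tilde V, F = sigmoid(X_hat)
    F_tilde = add_bias(F)
    G = softmax(F_tilde @ W)        # F_hat = F_tilde W, G = softmax(F_hat)
    return X_tilde, F, F_tilde, G


def cross_entropy(G, Y):
    return -np.mean(np.sum(Y * np.log(G + 1e-12), axis=1))


def backward(X_tilde, F, F_tilde, G, Y, W):
    dF_hat = (G - Y) / Y.shape[0]   # softmax + cross-entropy shortcut
    dW = F_tilde.T @ dF_hat
    dF = dF_hat @ W[1:].T           # the bias row of W gets no upstream signal
    dX_hat = dF * F * (1.0 - F)     # derivative of the sigmoid
    dV = X_tilde.T @ dX_hat
    return dV, dW


def loss_at(V, W):
    return cross_entropy(forward(X, V, W)[-1], Y)


# Gradient check on a single entry of V using a central finite difference.
X_tilde, F, F_tilde, G = forward(X, V, W)
dV, dW = backward(X_tilde, F, F_tilde, G, Y, W)
eps = 1e-6
V_plus, V_minus = V.copy(), V.copy()
V_plus[1, 2] += eps
V_minus[1, 2] -= eps
numeric = (loss_at(V_plus, W) - loss_at(V_minus, W)) / (2 * eps)
analytic = dV[1, 2]
print("analytic vs numeric gradient:", analytic, numeric)

# A few steps of full-batch gradient descent (assumed learning rate).
initial_loss = loss_at(V, W)
lr = 0.1
for _ in range(200):
    X_tilde, F, F_tilde, G = forward(X, V, W)
    dV, dW = backward(X_tilde, F, F_tilde, G, Y, W)
    V -= lr * dV
    W -= lr * dW
final_loss = loss_at(V, W)
print("loss before / after:", initial_loss, final_loss)
```

Because the labels here are random, the sketch is not expected to reach the accuracy reported for the real dataset; it only shows that the gradients are consistent and that the loss goes down.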
git clone https://github.com/Mathos34/ffnn-from-scratch
cd ffnn-from-scratch
python -m venv .venv && source .venv/bin/activate # or .venv\Scripts\activate on Windows
pip install -r requirements.txt
jupyter notebook lab_ffnn.ipynb

The notebook runs end-to-end in under a minute on a laptop CPU.
| Task | Architecture | Final metric |
|---|---|---|
| 3D classification (4 classes, 99 points) | 3 -> 10 -> 4 (sigmoid + softmax, CE) | 100.00% training accuracy |
| 1D regression (70/30 split) | 1 -> 10 -> 1 (sigmoid + linear, MSE) | Train R² = 0.98 |
The regression test R² is intentionally weak: the test split is held out at the right end of the X range, so the model is asked to extrapolate beyond what it ever saw at training time. This is a useful pedagogical illustration of the difference between interpolation and extrapolation.
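
The effect of an ordered split can be reproduced with a small sketch. Everything specific here is an assumption made for illustration: the noisy sine data, the initialization scale, the learning rate, the step count, and the helper names (`predict`, `r2_score`) are hypothetical and not taken from the lab. What it keeps from the description above is the 1 -> 10 -> 1 architecture (sigmoid hidden layer, linear output, MSE) and a 70/30 split in which the test points lie to the right of every training point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D data (not the lab's dataset): a noisy sine on [-3, 3].
x = np.sort(rng.uniform(-3.0, 3.0, size=100))
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# Ordered 70/30 split: the test set is the right end of the X range.
n_train = (7 * len(x)) // 10
X_train, y_train = x[:n_train, None], y[:n_train, None]
X_test, y_test = x[n_train:, None], y[n_train:, None]


def add_bias(A):
    """Prepend a column of ones: A -> [1, A]."""
    return np.hstack([np.ones((A.shape[0], 1)), A])


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def predict(X, V, W):
    F = sigmoid(add_bias(X) @ V)
    return add_bias(F) @ W          # linear (identity) output


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# 1 -> 10 -> 1 network; row 0 of V and W is the bias row.
V = rng.normal(scale=0.5, size=(2, 10))
W = rng.normal(scale=0.5, size=(11, 1))

lr = 0.02                            # assumed learning rate
X_tilde = add_bias(X_train)
for _ in range(20000):
    F = sigmoid(X_tilde @ V)
    F_tilde = add_bias(F)
    G = F_tilde @ W
    dG = 2.0 * (G - y_train) / len(y_train)   # gradient of MSE w.r.t. G
    dW = F_tilde.T @ dG
    dX_hat = (dG @ W[1:].T) * F * (1.0 - F)   # back through the sigmoid
    dV = X_tilde.T @ dX_hat
    V -= lr * dV
    W -= lr * dW

r2_train = r2_score(y_train, predict(X_train, V, W))
r2_test = r2_score(y_test, predict(X_test, V, W))
print("train R2:", r2_train)
print("test R2 (extrapolation):", r2_test)
```

Shuffling `x` and `y` together before splitting would turn the test set into an interpolation problem, which gives a direct comparison between the two regimes.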
Lab from the Advanced Machine Learning course at ECE Paris (4th-year engineering, Major Data & AI).
Built by Mathis Lacombe, AI Maker at the Intelligence Lab, ECE Paris. LinkedIn · Hugging Face