Simple neural networks with a constant number N of neurons per layer (where the
fixed N is actually part of the type of the matrices and vectors involved), or even
with the number of neurons differing between layers.
The type Matrix[A, M, N] specifies that the matrix contains elements of type A
and has M rows and N columns. A must have a Ring typeclass instance, as defined
in spire.
The type Vector[A, N] specifies that the vector contains elements of type A
and has N rows. A must have a Ring typeclass instance, as defined in spire.
The type Tensor[A, M, N, O] specifies that the 3D tensor contains elements of
type A and has M rows, N columns, and depth O. A must have a Ring typeclass
instance, as defined in spire.
Each operation performed with tensors, matrices, or vectors must type-check: one
can multiply a Matrix[A, M, N] and a Matrix[A, N, P], yielding a Matrix[A, M, P],
but one cannot multiply the former with a Matrix[B, N, P] or a Matrix[A, P, Q],
because A differs from B and, respectively, N differs from P.
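As an illustration of this kind of dimension checking, here is a minimal, self-contained sketch using Scala 3 literal types. It is only a toy model of the idea, not the library's actual Matrix, which requires a spire Ring[A] instance rather than the standard Numeric[A] used here:

```scala
// Toy model of a dimension-indexed matrix; M and N are literal Int types.
// The real Matrix[A, M, N] requires a spire Ring[A] instead of Numeric[A].
case class Matrix[A, M <: Int, N <: Int](data: List[List[A]]):
  // Multiplication only compiles when the inner dimensions agree.
  def *[P <: Int](that: Matrix[A, N, P])(using num: Numeric[A]): Matrix[A, M, P] =
    val rows = data.map(row =>
      that.data.transpose.map(col =>
        row.zip(col).map((x, y) => num.times(x, y)).sum))
    Matrix[A, M, P](rows)

@main def demo(): Unit =
  val a = Matrix[Int, 2, 3](List(List(1, 2, 3), List(4, 5, 6)))
  val b = Matrix[Int, 3, 2](List(List(7, 8), List(9, 10), List(11, 12)))
  val c: Matrix[Int, 2, 2] = a * b // type-checks: (2x3) * (3x2) = (2x2)
  // val bad = b * b               // does not compile: inner dimensions 2 != 3
  println(c.data)                  // List(List(58, 64), List(139, 154))
```

The dimensions live purely at the type level (phantom type parameters), so this check costs nothing at runtime.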
N Neural Network
A neural network Network[N] is composed of M layers, each of N neurons, where
each neuron has N weights plus 1 bias, hence N+1 parameters. For arbitrary-precision
arithmetic using spire.math.Real, for instance, each neuron of type
Neuron[N] has a Vector[Real, N] of weights, a Real bias, and an Activation
function, where the latter is a Scala 3 enum. This means that the Activation
function may differ with each neuron. Both the definition and the derivative of an
Activation function must be known and given.
For a neural network, training with backpropagation is implemented, as well as
plain prediction (the forward pass), once the neurons' weights (and biases) have
been trained.
The package nnn.float builds around a generalized neural network, where
the assumption of a constant number of neurons per layer is relaxed, and
thus the number of neurons may differ between layers.
Although the dimension types of the variables and values involved in the algorithms
are the wildcard ?, operations on matrices and vectors remain type-safe through shapes.
Even assignment is safe (though not the := method by itself), because the algorithms
perform reassignment only, and thus the types of matrices/vectors are asserted before
assignment.
An example of a neural network with two inputs, a hidden layer with ten neurons,
and one output is the following:
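The following self-contained sketch reconstructs such a definition from the description below; Network and MSE are reduced to stubs here (in the library they are real classes from the nnn package), and the exact layer arguments are assumptions:

```scala
// Stub stand-ins for the library's Network and MSE (illustrative only).
case class MSE[O <: Int]()
case class Network[N[_ <: Int], L <: Int](loss: MSE[?])(using shape: List[Int])

// The shape as types: a match type N[_] mapping layer index to layer size.
type N[I <: Int] = I match
  case 0 => 2   // two inputs
  case 1 => 10  // ten neurons in the hidden layer
  case 2 => 1   // one output

// The shape as values (the implicit given_List_Int).
given List[Int] = List(2, 10, 1)

@main def build(): Unit =
  // 2 hidden layers (the output layer counts as one); MSE[1] matches the
  // single neuron of the output layer.
  val net = Network[N, 2](loss = MSE[1]())
  println(summon[List[Int]])
```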
Each type mapped by the higher-kinded type N[_] differs with the type argument:
N[0] is the number of inputs, N[1] is the number of neurons in the hidden layer,
while N[2] is the number of outputs. It is hence called a shape.
The implicit given_List_Int is the shape as values. Both kinds of shape (types
and values) must be given. The neural network is defined as Network[N, 2](...),
where N is the shape and 2 is the number of hidden layers, while the (value)
shape is passed as an implicit parameter. Then, the 1 occurring in the argument
loss = MSE[1]() is the number of neurons in the output layer.
Note that the output layer is also counted as a hidden layer.
Convolutional Neural Network
The package nnn.cnn.float builds around a generalized convolutional neural network.
A modified version of LeNet with Kaiming initialization, LeakyReLU activation,
and max pooling (instead of Xavier/Glorot initialization, Sigmoid activation, and
subsampling) tests the implementation of CNNs. The MNIST databases are used, and
the four .gz files must be placed under the root-relative "./data/MNIST" folder:
Training on the entire shuffled set of 60000 images, in 100 batches over 20 epochs,
takes about 16 hours (on an Intel i5 CPU at 2.90GHz); testing on the entire shuffled
set of 10000 test images then yields an accuracy of 98%.
An experimental reduced version of AlexNet with 10 classes and 512 neurons in
the dense layers (instead of 1000 and 4096) is also included for demonstration
purposes, but it is disabled because it is too slow. The CIFAR-10 database
must be downloaded:
The RGB images are 32x32, scaled to 224x224, and finally padded to 227x227.
Testing
Use, for instance, the following sbt command:
sbt:N Neural Networks> testOnly *double*Network*
This will run all tests with the word "Network" in package "nnn.double" (where all
values, functions, or networks are based on the Double type): there is only one
such suite, nnn.double.NetworkSuite.
Math
Consider a neural network with three hidden layers (the last of which is the output
layer) and three neurons per layer, as in the following table:
| Input | Layer | Layer | Layer/Output |
|-------|-------|-------|--------------|
| $1$   | $1$   | $1$   |              |
| $x_1$ | $a_{11} {\phi_{11} \atop \longrightarrow} h_{11}$ | $a_{12} {\phi_{12} \atop \longrightarrow} h_{12}$ | $a_{13} {\phi_{13} \atop \longrightarrow} y_1$ |
| $x_2$ | $a_{21} {\phi_{21} \atop \longrightarrow} h_{21}$ | $a_{22} {\phi_{22} \atop \longrightarrow} h_{22}$ | $a_{23} {\phi_{23} \atop \longrightarrow} y_2$ |
| $x_3$ | $a_{31} {\phi_{31} \atop \longrightarrow} h_{31}$ | $a_{32} {\phi_{32} \atop \longrightarrow} h_{32}$ | $a_{33} {\phi_{33} \atop \longrightarrow} y_3$ |
The layers are fully connected, in the sense of the following nine equations:
where $w_{ij}^k$ is the weight of the $i^{th}$ neuron on the $j^{th}$ layer with
respect to the $k^{th}$ output from the previous layer. We have, in matrix form:
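A sketch of those nine equations and of the matrix form, reconstructed from the weight notation just given (writing $h_{k0} = x_k$ for the inputs and using $w_{ij}^0$ as the bias weight; the original grouping may differ):

$$a_{ij} = w_{ij}^0 \cdot 1 + \sum_{k=1}^{3} w_{ij}^k \, h_{k,j-1}, \qquad i = \overline{1 \dots 3},\ j = \overline{1 \dots 3}$$

$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ a_{3j} \end{pmatrix} = W^{(j)} \cdot \begin{pmatrix} 1 \\ h_{1,j-1} \\ h_{2,j-1} \\ h_{3,j-1} \end{pmatrix}, \qquad W^{(j)} = \begin{pmatrix} w_{1j}^0 & w_{1j}^1 & w_{1j}^2 & w_{1j}^3 \\ w_{2j}^0 & w_{2j}^1 & w_{2j}^2 & w_{2j}^3 \\ w_{3j}^0 & w_{3j}^1 & w_{3j}^2 & w_{3j}^3 \end{pmatrix}$$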
These were the equations corresponding to the forward pass. For backpropagation,
we proceed backwards, from the output layer towards the input layer. Assume $L$
is the loss function; it has three known partial derivatives:
$\frac{\partial{L}}{\partial{y_i}}$, where $i = \overline{1 \dots 3}$.
From these, we start with the derivatives of $L$ with respect to the weights in
the last (output) layer ($w_{i3}^k$, where $i = \overline{1 \dots 3}$ and
$k = \overline{0 \dots 3}$). Using the chain rule
the following equations hold:
We note that $\frac{\partial{L}}{\partial{y_i}} \times \phi_{i3}'(a_{i3})$,
where $i = \overline{1 \dots 3}$, occur repeatedly: we may thus introduce
the following matrix named $\delta$ (using the Hadamard product $\odot$):
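A sketch of this $\delta$, reconstructed from the description:

$$\delta = \begin{pmatrix} \frac{\partial L}{\partial y_1} \\ \frac{\partial L}{\partial y_2} \\ \frac{\partial L}{\partial y_3} \end{pmatrix} \odot \begin{pmatrix} \phi_{13}'(a_{13}) \\ \phi_{23}'(a_{23}) \\ \phi_{33}'(a_{33}) \end{pmatrix} = \begin{pmatrix} \frac{\partial L}{\partial y_1} \, \phi_{13}'(a_{13}) \\ \frac{\partial L}{\partial y_2} \, \phi_{23}'(a_{23}) \\ \frac{\partial L}{\partial y_3} \, \phi_{33}'(a_{33}) \end{pmatrix}$$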
Then, the previous equations become (under the notation $\nabla^{(3)}$ - the
partial derivatives of $L$ with respect to the weights on the $3^{rd}$ layer):
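A consistent reconstruction of this result, since $\frac{\partial L}{\partial w_{i3}^k} = \delta_i \, h_{k2}$ (with $h_{02} = 1$ for the bias), is the outer product:

$$\nabla^{(3)} = \begin{pmatrix} \frac{\partial L}{\partial w_{13}^0} & \cdots & \frac{\partial L}{\partial w_{13}^3} \\ \vdots & & \vdots \\ \frac{\partial L}{\partial w_{33}^0} & \cdots & \frac{\partial L}{\partial w_{33}^3} \end{pmatrix} = \delta \cdot \begin{pmatrix} 1 & h_{12} & h_{22} & h_{32} \end{pmatrix}$$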
Let us further drop the first row (because it is not used) in the above matrix
(denoting this transient matrix by ${\left( {W^{(3)}}^T \cdot \delta \right)}^*$):
The first row of this matrix corresponds to equations $(4)-(7)$, the second row to
equations $(8)-(11)$, and the third row to equations $(12)-(15)$.
Then, the equations $(4)-(15)$ become (under the notation $\nabla^{(2)}$ - the
partial derivatives of $L$ with respect to the weights on the $2^{nd}$ layer):
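A reconstruction consistent with the starred matrix defined above (each entry scaled by the corresponding $\phi_{i2}'(a_{i2})$ via the Hadamard product, then an outer product with the previous layer's outputs and bias):

$$\nabla^{(2)} = \left( {\left( {W^{(3)}}^T \cdot \delta \right)}^* \odot \begin{pmatrix} \phi_{12}'(a_{12}) \\ \phi_{22}'(a_{22}) \\ \phi_{32}'(a_{32}) \end{pmatrix} \right) \cdot \begin{pmatrix} 1 & h_{11} & h_{21} & h_{31} \end{pmatrix}$$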