Neural Networks : Introduction

Introduction

In this post, I would like to introduce how neural networks are built under the hood. I will explore the essential components and mechanisms that enable neural networks to learn and make predictions. By the end of this article, you should have a clearer understanding of how these powerful models work under the hood.

Overview of Neural Networks

Before diving deep into neural networks, I would like to roughly remind the reader of the main goal and how it is reached. Neural networks (NNs) are a class of machine learning models inspired by the structure and function of the human brain. The primary objective of a neural network is to learn a mapping from input data to desired output values by adjusting its parameters through a process of optimization.

1. Goal of neural networks

The central aim of a neural network is to learn how to produce accurate outputs when given specific inputs. This process involves training the network on a dataset, where each data point consists of an input and a corresponding target output. For instance, in a classification problem, the input might be an image, and the target output would be the label of the object in the image. Or maybe in a regression problem, we want to train our model to answer the question, how much?

2. Training Process

2.1 Feeding data into the net and comparison of the output

During training, the network is provided with a set of input data and its corresponding target outputs. The network makes predictions based on the input data, which are then compared to the target outputs. This comparison is crucial for evaluating how well the network is performing.

2.2 Loss Function

To quantify the difference between the network’s predictions and the target outputs, we use a loss function (or cost function). The loss function measures the prediction error. Common loss functions include Mean Squared Error for regression tasks and Cross-Entropy Loss for classification tasks. A lower value of the loss function indicates better performance of the network.

2.3 Optimization via backpropagation

The goal of training is to minimize the loss function, which involves finding the optimal set of parameters (weights and biases) for the network. To achieve this, we use an optimization algorithm that adjusts the parameters to reduce the loss. Backpropagation is the method used to compute the gradients of the loss function with respect to the network’s parameters. It involves:

Forward Pass: Computing the predictions of the network and the loss.
Backward Pass: Calculating the gradients of the loss function with respect to each parameter using the chain rule of calculus.

2.4 Gradient Descent

Once the gradients are computed, they are used by the optimization algorithm (often gradient descent or its variants) to update the network’s parameters. The network parameters are adjusted in the direction that reduces the loss function. This process is iterated over multiple epochs (passes through the entire dataset) until the loss function converges to a minimum value or sufficiently small error.

Roughly speaking, this is the process that we want to follow to achieve our goal. With the foundational concepts established, we will now delve into the detailed steps required to build a neural network. Firstly, we need to create a class that represents data within our neural network. This class must track the origin of each data value, including the operations and values used to compute it, in order to facilitate accurate gradient calculation during backpropagation. We are going to call the class for representing data Value class.

Breakdown of the code for the Value class

Now I will start to implement in code what we have seen in the foundational concepts about NN.

Building the basic item of the class and the attribute of the object

class Value: 
  
    def __init__(self, data, _children=(), _op=''):
    
        self.data = data
        self._previously = set(_children)
        self._operations = _op
        self.grad = 0.0
        self._backward = lambda: None

    def __repr__(self):
        return f"Value(data={self.data})"

Here we initialize the value objects with its own attribute. Let’s break down each attribute:

self.data = data, saves the value of the object
self._previously = set(_children), saves in a set the children of its value, that means that we save what values generated this value. Thanks to the set we can avoid duplicates
self._operations = _op, we keep track of what operation generated this value
self.grad = 0.0, we initially set it to zero, then we are going to modify it accordingly to the rules of backpropagation, we store it in this attribute
self._backward = lambda: None, we need to store here the function that provides us the gradient, we set it initially to a lambda None because it depends on the operation that generated this Value object.

The repr method is used to print the value object. So for example, if we want to generate a value object we can do as follows:

a = Value(3)
print(a)
>>> Value(data=3)

Building the basic operation and the relative backward function for value object

class Value: 
    ... # existing code

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), '+')
  
        def _backward():
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad
        out._backward = _backward
  
        return out

Here we have the code for the add method. Simply in this method, we create an out Value object featured by a new value and a set of children (the two Value objects that originated it) and then we append the operation that originated that new Value object. Then from the fundamentals of calculus, we know that the gradient of the two children due to this operation can be calculated as shown in the code. For much more detail watch gradient explained here. Let’s watch how the add function behaves:

a = Value(3)
b = Value(4)
c = a + b
print(c)
>>> Value(data=7)
print(c._previously)
>>> (Value(data=3), Value(data=4))
print(c._operations)
>>> '+'

Now let’s look at the code for the mul method:

class Value:
    ... # existing code

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), '*')
    
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
    
        return out

In the add method, we create a new Value object with a new value and a set of children, which are the two Value objects that originated this result. We also append the operation that produced this new Value object. The backward function is designed to compute the gradient according to calculus rules.

Note that we use the += operator when updating the .grad attribute. This is because we want to accumulate the gradient. For example, if the Value object is involved in multiple operations, the gradient should reflect all these operations, making the gradient cumulative. Generally, this is the process that we follow to create new methods for operations for this class. For the purpose of creating a neural network, we need also a function that can work for us as an activation function. In this case, we add the tanh function for this scope.

class Value:
    ... # existing code

    def tanh(self):
        x = self.data
        t = (math.exp(2*x) - 1) / (math.exp(2*x) + 1)

        out = Value(t, (self,), 'tanh')
  
        def _backward():
            self.grad += (1 - t**2) * out.grad
        out._backward = _backward
    
        return out

From calculus, we know that tanh is defined as:

\[\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}\]

Then we can implement the backward function knowing that:

\[\frac{d}{dx} \tanh(x) = 1 - \tanh^2(x)\]

Now we implement the exponential method for being able to use tanh:

class Value:
    ... # existing code

    def exp(self):
        x = self.data
        out = Value(math.exp(x), (self,), 'exp')
    
        def _backward():
            self.grad += out.data * out.grad 
        out._backward = _backward
    
        return out

We have now implemented most of the basic operations for our Value object. The final step is to implement a backward method. This method will be responsible for calling the backward functions of each object in the proper order.

Building the backward method

Here’s the code for the function:

class Value: 
    ... # existing code

    def backward(self):
  
        topo = []
        visited = set()
        def build_topo(v):
            if v not in visited:
                visited.add(v)
                for child in v._previously:
                    build_topo(child)
                topo.append(v)
        build_topo(self)
    
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

Thanks to the implementation of the Value object, we are able to construct a computation graph that captures the entire history of how a Value object was derived. This graph is formed by following the _previously set of each Value object, which tracks all the predecessor values involved in its computation.

To compute the gradients, we begin by focusing on the final Value object, which typically results from a loss function in a neural network. Since we are interested in the gradient of this final Value with respect to itself, we initialize its gradient (self.grad) to 1. This initialization signifies that the gradient of the final Value with respect to itself is 1.

Following this, we execute the backward pass by invoking the _backward function for each Value object. This process starts with the final Value and proceeds backward through the computation graph to the initial Value objects. This reverse traversal ensures that the gradient for each Value is computed correctly based on the chain rule of differentiation, allowing us to accumulate the gradients appropriately.

We’ve now looked at the basics of how a neural network works. This simple approach helps us understand the core concepts behind its operation. Now it’s time to create a very simple net.

Creating a Neural Network

Now we will look more deeply into how to create a net by little steps.

Create one artificial neuron

Now we can create an artificial neuron with two inputs. This is an example of how a neuron works:

# Inputs x1, x2
x1 = Value(1.0)
x2 = Value(2.0)
# Weights w1, w2
w1 = Value(6.0)
w2 = Value(-4.0)
# Bias of the neuron
b = Value(9.8)
# x1*w1 + x2*w2 + b
x1w1 = x1 * w1
x2w2 = x2 * w2
x1w1x2w2 = x1w1 + x2w2
n = x1w1x2w2 + b
# Activation function
o = n.tanh()

o is the output of our neuron. The next step is to define the class Neuron, here’s the code:

class Neuron:
  
    def __init__(self, nin):
        self.w = list()
        self.b = Value(random.uniform(-1, 1))

        for _ in range(nin):
            self.w.append(Value(random.uniform(-1, 1)))
  
    def __call__(self, x):
        # w * x + b
        activation = 0
        for wi, xi in zip(self.w, x):
            activation += wi * xi
        activation += self.b
        out = activation.tanh()

        return out
  
    def parameters(self): # For storing the parameters
        return self.w + [self.b]

Here we define the class Neuron. In the init method, we create the weights based on the number of inputs (nin) and then we create the bias Value. When we call the Neuron, it performs the activation based on the input and gives as output the result. Here’s an example of usage:

x = [1, 2, 3]
a = Neuron(3)
print(a(x))
>>> Value(data=something)

Create a layer

The next step is to create a layer made of neurons, here’s the code:

class Layer:
  
    def __init__(self, nin, nout):
        self.neurons = list()

        for _ in range(nout):
            self.neurons.append(Neuron(nin))
  
    def __call__(self, x):
        outs = list()
    
        for neuron in self.neurons:
            outs.append(neuron(x))

        return outs[0] if len(outs) == 1 else outs
  
    def parameters(self):
        return [p for neuron in self.neurons for p in neuron.parameters()]

In the init method, we initialize a layer specifying the number of inputs for each neuron (nin) and the number of outputs for the layer (nout). When we call it, the call method pushes the input to every neuron and returns the output of every neuron.

Create the multilayer perceptron

The final step is to create an MLP fully connected, here’s the code:

class MLP:
  
    def __init__(self, nin, nouts):
        sz = [nin] + nouts
        self.layers = list()

        for i in range(len(nouts)):
            self.layers.append(Layer(sz[i], sz[i+1]))
  
    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
  
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]

In the init method, as usual, we initialize the MLP by giving the number of inputs of the neurons (nin) and then a list (nouts) where we store the number of neurons for each layer. When we create the self.layer attribute, we iterate over a list that we create where we have the number of inputs and outputs required for every layer. Now let’s look at how we can use it:

x = [1.0, 5.0, -6.0]
net = MLP(3, [4, 4, 1])
net(x)
>>> Value(data=something)

Now we are ready to train the net that we have built.

Creating the dataset

To start, we can manually create a very easy dataset and the desired target:

xs = [
    [1.0, 2.0, -1.0],
    [6.0, -4.0, 1],
    [2, 1.0, -1.0],
    [4.0, 3.0, -2.0],
]
ys = [2.0, -4.0, -3.0, 2.0]

So when we feed our net the first list, we want to obtain as a result the first element of the ys list. Now all we need is a loss function that can tell us how good the output of our net is. When we have our loss function, we can call the backpropagation on it (remember that it is a Value object so we can do it) and then we can slightly adjust the parameters according to what minimizes the loss function. Now we can implement it:

for k in range(20):
  
    # forward pass
    ypred = list()
    for x in xs:
        ypred.append(net(x))
    loss = 0

    for ygt, yout in zip(ys, ypred):
        loss += (yout - ygt)**2

    # backward pass
    for p in net.parameters():
        p.grad = 0.0
    loss.backward()

    # update
    for p in net.parameters():
        p.data += -0.1 * p.grad

    print(k, loss.data)

In this piece of code, we train our net following these steps:

Forward pass We feed the net the dataset and then we save the output of the net. Then we evaluate the loss using the MSE function.
Backward Pass We reset all the gradients of all parameters and then we compute the new gradients of each parameter by calling backward on the loss function.
Update We update the parameters in the direction that minimizes the loss function using a learning rate of 0.1.

Conclusion

In this first part, we built the fundamentals for understanding neural networks. We built our own api to implement it to have a deeper understand of how they work. In the next part, we are going to see how to implement a neural network for linear regression using an higher level approach.