Understanding the mechanics of machine learning models by building them from scratch is an invaluable exercise for any aspiring data scientist or machine learning engineer. This blog post guides you through creating a Multi-Layer Perceptron (MLP) using Python and NumPy, inspired by the Machine-Learning-from-Scratch repository.
What is a Multi-Layer Perceptron?
An MLP is a type of neural network consisting of at least three layers: an input layer, one or more hidden layers, and an output layer. Each neuron in one layer is connected to every neuron in the next layer. The MLP learns by adjusting the weights of these connections to minimize the error of its predictions, using a process called backpropagation.
Key Components of an MLP
1. Layers and Nodes:
- Input Layer: Receives input data.
- Hidden Layers: Perform intermediate computations.
- Output Layer: Produces the final prediction.
2. Activation Functions:
- Sigmoid: Squashes values into the range (0, 1); used here for the hidden layer.
- ReLU: Passes positive inputs through unchanged and zeroes out negatives; a common alternative for hidden layers.
- Softmax: Converts the output layer's raw scores into a probability distribution over classes.
3. Loss Function:
- Cross-Entropy Loss: Measures how far the predicted class probabilities are from the true labels (a short NumPy sketch of this loss and ReLU follows this list).
4. Optimization:
- Gradient Descent: Updates weights to minimize the loss function.
- Backpropagation: Computes gradients of the loss function with respect to each weight.
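Sigmoid and softmax are implemented as methods of the MLP class in Step 1 below, so the quick sketch here only covers ReLU and the cross-entropy loss. These standalone helpers are for illustration and are not used by the class that follows.

```python
import numpy as np

def relu(x):
    # ReLU: pass positive values through, zero out negatives.
    return np.maximum(0, x)

def cross_entropy(y_true, y_pred, eps=1e-9):
    # Average negative log-likelihood of the true classes.
    # y_true is one-hot encoded; y_pred holds predicted probabilities.
    # The small eps guards against log(0).
    return -np.sum(y_true * np.log(y_pred + eps)) / y_true.shape[0]
```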
Step-by-Step Implementation
Step 1: Initialize Network Parameters
```python
import numpy as np

class MLP:
    def __init__(self, input_size, hidden_size, output_size):
        # Randomly initialize weights; biases start at zero.
        self.weights_input_hidden = np.random.randn(input_size, hidden_size)
        self.weights_hidden_output = np.random.randn(hidden_size, output_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.bias_output = np.zeros((1, output_size))

    def sigmoid(self, x):
        # Squash values into (0, 1) for the hidden layer activations.
        return 1 / (1 + np.exp(-x))

    def softmax(self, x):
        # Subtract the row-wise max for numerical stability, then normalize
        # so each row forms a probability distribution.
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / exp_x.sum(axis=1, keepdims=True)
```
Explanation:
- Initialize weights and biases: Weights are drawn from a standard normal distribution and biases start at zero. The two weight matrices have shapes (input_size, hidden_size) and (hidden_size, output_size), so every neuron in one layer connects to every neuron in the next.
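With the constructor in place, creating a network is a single call. The layer sizes here are placeholders for illustration; in practice they should match your data. (One optional refinement: the raw np.random.randn weights are often scaled down, for example multiplied by 0.01, to keep early activations from saturating the sigmoid, but the small example below works without it.)

```python
# Hypothetical sizes: 4 input features, 16 hidden units, 3 output classes.
mlp = MLP(input_size=4, hidden_size=16, output_size=3)
```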
Step 2: Forward Propagation
```python
    def forward(self, X):
        # Hidden layer: linear transform followed by sigmoid activation.
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_input)
        # Output layer: linear transform followed by softmax to get class probabilities.
        self.final_input = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.final_output = self.softmax(self.final_input)
        return self.final_output
```
Explanation:
- Compute the layer activations: Each layer applies a linear transformation (matrix multiplication plus bias) followed by its activation function, sigmoid for the hidden layer and softmax for the output layer. The intermediate results are cached on self because the backward pass needs them.
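A quick sanity check of the forward pass (assuming the illustrative mlp instance from Step 1) is to push a small random batch through it and confirm the output shape and that each row sums to one:

```python
# Five random samples with 4 features each, matching the illustrative sizes above.
X_demo = np.random.randn(5, 4)
probs = mlp.forward(X_demo)
print(probs.shape)        # (5, 3): one probability per class for each sample
print(probs.sum(axis=1))  # every row sums to 1 because of the softmax
```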
Step 3: Backward Propagation
```python
    def backward(self, X, y, output, learning_rate):
        # Gradient of the cross-entropy loss w.r.t. the softmax input simplifies to (output - y).
        output_error = output - y
        # Propagate the error back through the hidden layer (sigmoid derivative is s * (1 - s)).
        hidden_error = np.dot(output_error, self.weights_hidden_output.T) * self.hidden_output * (1 - self.hidden_output)
        # Gradient descent updates for weights and biases.
        self.weights_hidden_output -= learning_rate * np.dot(self.hidden_output.T, output_error)
        self.bias_output -= learning_rate * np.sum(output_error, axis=0, keepdims=True)
        self.weights_input_hidden -= learning_rate * np.dot(X.T, hidden_error)
        self.bias_hidden -= learning_rate * np.sum(hidden_error, axis=0, keepdims=True)
```
Explanation:
- Compute errors and update parameters: The output error (output - y) is the combined derivative of the softmax and cross-entropy loss. It is propagated back to the hidden layer with the chain rule (multiplying by the sigmoid derivative), and each weight and bias is nudged in the direction that reduces the loss, scaled by the learning rate.
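If you want to convince yourself the analytic gradients are right, a finite-difference check is a handy debugging sketch (loss_fn and numerical_gradient below are helper names introduced here, not part of the class). Note that the backward pass above sums gradients over the batch rather than averaging, so its update direction corresponds to batch_size times the derivative of the averaged loss; the learning rate simply absorbs that constant.

```python
def loss_fn(mlp, X, y, eps=1e-9):
    # Average cross-entropy loss, the same quantity printed during training.
    out = mlp.forward(X)
    return -np.sum(y * np.log(out + eps)) / X.shape[0]

def numerical_gradient(mlp, X, y, i=0, j=0, eps=1e-5):
    # Central-difference estimate of dLoss/dW for one entry of weights_input_hidden.
    original = mlp.weights_input_hidden[i, j]
    mlp.weights_input_hidden[i, j] = original + eps
    loss_plus = loss_fn(mlp, X, y)
    mlp.weights_input_hidden[i, j] = original - eps
    loss_minus = loss_fn(mlp, X, y)
    mlp.weights_input_hidden[i, j] = original  # restore the weight
    return (loss_plus - loss_minus) / (2 * eps)
```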
Step 4: Training the Model
```python
    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # One full pass: forward to get predictions, backward to update parameters.
            output = self.forward(X)
            self.backward(X, y, output, learning_rate)
            if (epoch + 1) % 100 == 0:
                # Average cross-entropy loss (the small constant avoids log(0)).
                loss = -np.sum(y * np.log(output + 1e-9)) / X.shape[0]
                print(f'Epoch {epoch + 1}, Loss: {loss:.4f}')
```
Explanation:
- Train the network: Run forward and backward propagation for the specified number of epochs, printing the average cross-entropy loss every 100 epochs to monitor convergence. Note that y must be one-hot encoded so it lines up with the softmax output (see the usage sketch below).
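As a usage sketch (the dataset and layer sizes are made up for illustration), the integer class labels are one-hot encoded before calling train:

```python
# Tiny synthetic dataset: 150 samples, 4 features, 3 classes.
# Random data won't produce a meaningful model; it only demonstrates the call signature.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 4))
labels = rng.integers(0, 3, size=150)

# One-hot encode the integer labels to match the softmax output.
y_train = np.zeros((150, 3))
y_train[np.arange(150), labels] = 1

mlp = MLP(input_size=4, hidden_size=16, output_size=3)
mlp.train(X_train, y_train, epochs=1000, learning_rate=0.01)
```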
Step 5: Predicting
```python
    def predict(self, X):
        # Run a forward pass and pick the class with the highest probability.
        output = self.forward(X)
        return np.argmax(output, axis=1)
```
Explanation:
- Make predictions: Run a forward pass and return the index of the highest-probability class for each sample.
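Continuing the usage sketch from Step 4, predict returns integer class indices that can be compared directly against the original labels:

```python
predictions = mlp.predict(X_train)
accuracy = np.mean(predictions == labels)
print(f'Training accuracy: {accuracy:.2%}')
```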
Conclusion
Building an MLP from scratch using NumPy helps demystify neural networks and provides a deeper understanding of their mechanics. This exercise not only enhances your appreciation for the complexity of machine learning models but also equips you with practical skills for developing and troubleshooting them. For more details and the full implementation, check out the Machine-Learning-from-Scratch repository. Happy coding!