Capsule Neural Networks: What They Are, How They Work, Architecture & Components

How do you recognize things? If I write ‘Their’ and ‘Thier,’ would you read both of them as ‘Their’? Your answer would probably be yes. 

Your brain can identify primary features and help you recognize things. That’s why you can spot faces easily. Capsule neural networks work similarly. In this article, we’ll take a look at what they are and how they work. If you’re interested in machine learning algorithms, you’d surely like this article. So, let’s get started. 

What is a Capsule Neural Network?

A capsule neural network (CapsNet) is a type of artificial neural network, loosely inspired by how biological neural systems are organized, that aims to perform better recognition and segmentation. The word ‘capsule’ refers to a group of neurons nested inside a layer: each layer is divided into such groups, and each group outputs a vector rather than a single number. 

The capsules in these networks determine the parameters of an object’s features. Suppose your capsule network has to identify a face. The capsules will determine whether specific facial features are present, but they aren’t restricted to this alone. They will also check how the features of that face are arranged. So, your system identifies a face only when the capsules determine that its elements are in the right arrangement. 

You might wonder, how do they determine the arrangement of those features? These networks learn it from the training data you give them. Once they have examined hundreds (or even thousands) of images, they can perform this task efficiently. 

How do Capsule Networks Work?

Now, let’s take a look at how these networks operate. First, each lower-level capsule multiplies its output vector by a set of learned weight matrices, one for every higher-level capsule. Each product is a prediction of what that higher-level capsule’s output should be, and it encodes the spatial relationship between the low-level feature and the high-level feature. 
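
Here’s a minimal sketch of that multiplication step, assuming NumPy. The capsule counts, the vector sizes, and the names `u`, `W`, and `u_hat` are made up for illustration, and the weights are random stand-ins for values a real network would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

num_lower, dim_lower = 6, 8     # lower-level capsules and their vector size (assumed)
num_upper, dim_upper = 3, 16    # higher-level capsules and their vector size (assumed)

# Output vectors of the lower-level capsules, one row per capsule.
u = rng.normal(size=(num_lower, dim_lower))

# One weight matrix per (lower, upper) capsule pair; random here, learned in practice.
W = rng.normal(size=(num_lower, num_upper, dim_upper, dim_lower))

# u_hat[i, j] = W[i, j] @ u[i]: capsule i's prediction for higher-level capsule j.
u_hat = np.einsum("ijdk,ik->ijd", W, u)

print(u_hat.shape)
```

Every lower-level capsule ends up with one prediction vector per higher-level capsule, which is what the routing step works with next.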

After that, the capsules select a parent capsule. They make the selection through dynamic routing, which we discuss later in this article. Each parent capsule then takes the weighted sum of the predictions routed to it and squashes that sum so that its length falls between 0 and 1 while its direction stays the same. The length of the squashed vector serves as the probability that the feature exists, and the similarity between a prediction and the parent’s output (a dot product in the original dynamic routing formulation) serves as the measure of agreement. 
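
Written out as equations, the sum and the squashing look like this. The symbols are assumptions borrowed from the original dynamic routing formulation rather than from this article: `\hat{u}_{j|i}` is the prediction lower-level capsule i makes for parent j, `c_{ij}` is the routing weight between them, and `v_j` is the parent’s output.

```latex
s_j = \sum_i c_{ij}\, \hat{u}_{j|i},
\qquad
v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2}\,
      \frac{s_j}{\lVert s_j \rVert}
```

The second factor is a unit vector, so the direction of `s_j` is kept, while the first factor maps any length into the range from 0 to 1.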

There’s a significant difference between standard neural networks and capsule neural networks. While capsule networks use capsules to encapsulate essential bits of information about an image, standard neural networks use neurons for this purpose. A capsule outputs a vector, whereas a neuron outputs only a scalar. For this reason, capsules can represent the orientation of a face (or a specific feature), but single neurons can’t. If you change the orientation of a feature, the vector’s length stays the same, but its direction changes to reflect the new pose. 
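
To make the length-versus-direction idea concrete, here’s a toy sketch, assuming NumPy. It treats a capsule output as a plain 2-D vector and rotates it by hand; real capsule vectors are higher-dimensional and their relation to pose is learned, so the helper `rotate` and the numbers are purely illustrative.

```python
import numpy as np

def rotate(v, degrees):
    """Rotate a 2-D vector counter-clockwise by the given angle."""
    theta = np.deg2rad(degrees)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ v

v = np.array([0.6, 0.3])       # pretend capsule output for some feature
v_rot = rotate(v, 30.0)        # the same feature seen at a different orientation

# Length stands in for "how likely is the feature present?"
length_before = np.linalg.norm(v)
length_after = np.linalg.norm(v_rot)

# Direction stands in for "how is the feature posed?"
angle_before = np.degrees(np.arctan2(v[1], v[0]))
angle_after = np.degrees(np.arctan2(v_rot[1], v_rot[0]))

print(length_before, length_after)
print(angle_before, angle_after)
```

The two lengths match while the angles differ by 30 degrees, which is the behaviour a single scalar activation has no way to express.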

Capsule networks perform well on small datasets, and they cope better with images in which objects are rotated, shifted, or overlapping. Apart from that, they retain details of the picture, such as texture, location, and pose, that the pooling layers in standard convolutional networks tend to discard. Their main drawback is that they haven’t yet matched the best conventional networks on large, complex datasets. 

What is the Architecture of a Capsule Neural Network?

The two primary components of a capsule network are an encoder and a decoder. In total, they contain six layers. The encoder holds the first three layers, which are responsible for converting the input image into a 16-dimensional vector. The first layer of the encoder is a convolutional layer, and it extracts the basic features of the picture. 

The second layer is the PrimaryCaps layer, which takes those basic features and finds more detailed patterns among them. For example, it could capture the spatial relationship between particular strokes. The number of capsules in the PrimaryCaps layer depends on the dataset; for example, the network originally proposed for MNIST uses 32 channels of primary capsules. The third layer is the DigitCaps layer, and its capsule count varies as well, since it has one capsule per class (ten for the MNIST digits). Each of these capsules outputs a 16-dimensional vector, and the vector for the predicted class goes to the decoder. 
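
As a rough check on the encoder’s sizes, here’s a small sketch of the shape arithmetic in plain Python. The kernel sizes, strides, and capsule dimensions are assumptions taken from the original MNIST CapsNet design, not from this article, and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

image = 28                                       # MNIST images are 28x28 pixels
conv1 = conv_out(image, kernel=9, stride=1)      # first convolutional layer
primary = conv_out(conv1, kernel=9, stride=2)    # PrimaryCaps spatial grid

# Assumed: 32 capsule channels of 8-dimensional capsules at each grid position.
primary_channels, primary_dim = 32, 8
num_primary_caps = primary * primary * primary_channels

# Assumed: one 16-dimensional DigitCaps capsule per digit class.
digit_caps, digit_dim = 10, 16

print(conv1, primary, num_primary_caps)
print((digit_caps, digit_dim))
```

Under these assumptions, the 32 channels repeat over a 6x6 grid, so the layer holds far more than 32 individual capsules.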

The decoder has three fully connected layers. It takes the 16-dimensional vector and tries to reconstruct the original image from that vector alone. Training the network to reconstruct its input this way acts as a regularizer: it pushes the capsules to keep the information needed to redraw the image, which makes the network more robust. 
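
Here’s a minimal sketch of the decoder’s forward pass, assuming NumPy. The layer widths (512, 1024, 784) and the masking step are assumptions based on the original MNIST CapsNet, the weights are random rather than trained, and `dense` is a hypothetical helper, so the output is noise rather than a real digit.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, activation):
    # Randomly initialised fully connected layer; a trained network learns W and b.
    W = rng.normal(scale=0.05, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return activation(x @ W + b)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-in for the encoder output: 10 class capsules, 16 dimensions each.
digit_caps = rng.normal(size=(10, 16))

# Keep only the longest capsule vector (the predicted class), zero the rest.
lengths = np.linalg.norm(digit_caps, axis=1)
winner = int(np.argmax(lengths))
mask = np.zeros((10, 1))
mask[winner] = 1.0
masked = (digit_caps * mask).reshape(-1)    # 160 values, only 16 non-zero

# Three fully connected layers map the vector back to a 28x28 image.
h1 = dense(masked, 512, relu)
h2 = dense(h1, 1024, relu)
reconstruction = dense(h2, 784, sigmoid).reshape(28, 28)

print(reconstruction.shape)
```

In training, the difference between this reconstruction and the input image would be added to the loss, so the winning capsule has to carry enough detail to redraw the picture.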

Computations in a Capsule Network

Matrix Multiplication

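Between one capsule layer and the next, we perform matrix multiplication: each lower-level capsule’s output vector is multiplied by a learned weight matrix. This encodes the spatial relationship between the lower-level feature and a higher-level one, and the result is the lower-level capsule’s prediction for the higher-level capsule. At the final layer, the lengths of the capsule vectors show the probability of each class label.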

Scalar Weights

In this stage of computations, each lower-level capsule holds one scalar weight for every high-level capsule. The weight decides how much of the lower-level capsule’s prediction goes to that high-level capsule, and for any one lower-level capsule the weights add up to 1. The lower-level capsules adjust these weights to match the high-level capsules: the high-level capsule whose output agrees most with a prediction receives the largest share. This adjustment happens through dynamic routing. 
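
One way to get weights that behave like this is a softmax over routing scores, which is what the original dynamic routing formulation uses. Here’s a small sketch, assuming NumPy; the names `b` and `c` and the capsule counts are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# b[i, j] is the routing score between lower capsule i and high-level capsule j.
# Scores start at zero, so every lower capsule splits its output evenly.
b = np.zeros((4, 3))
c = softmax(b, axis=1)

# Suppose capsule 0's prediction agreed strongly with high-level capsule 2.
b[0, 2] += 2.0
c_updated = softmax(b, axis=1)

print(c[0])
print(c_updated[0])
```

Raising one score shifts that capsule’s weight toward the matching parent while the row still sums to 1, and the other lower-level capsules are unaffected.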

Dynamic Routing 

In dynamic routing, the lower-level capsules send their predictions to the capsules in the layer above. Each lower-level capsule sends most of its output to the high-level capsule that suits it best, and the capsule that receives the most agreeing predictions becomes the parent capsule. The routing weights are then updated according to this agreement, and the process repeats for a few iterations. 

To understand dynamic routing, suppose you give your capsule network images of a house, and it has trouble identifying the house’s roof. The capsules analyze the image, specifically the parts that stay consistent from one picture to the next, and work out the coordinate frame of the house relative to its walls and roof.

The low-level capsules first predict whether the object is a house and then send their predictions to the high-level capsules. If the prediction made from the roof, taken relative to the walls, matches the predictions from the other low-level capsules, the output says the object is a house. This is the process of routing by agreement. 
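
Here’s a minimal sketch of routing by agreement, assuming NumPy. It follows the commonly described loop of softmax weights, weighted sum, squash, and dot-product agreement; the prediction values, the second "other" capsule, and the function names are made up for illustration.

```python
import numpy as np

def squash(s, eps=1e-9):
    # Shrink each vector's length into [0, 1) while keeping its direction.
    sq = np.sum(s * s, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, iterations=3):
    """u_hat[i, j] is lower capsule i's prediction for high-level capsule j."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))              # routing scores
    for _ in range(iterations):
        c = softmax(b, axis=1)                        # weights per lower capsule
        s = np.einsum("ij,ijd->jd", c, u_hat)         # weighted sum per parent
        v = squash(s)                                 # parent outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)     # reward agreement
    return v, c

# Three low-level capsules (say roof, walls, door) and two candidate parents:
# index 0 is "house", index 1 is a hypothetical "other" object.
u_hat = np.array([
    [[1.0,  0.1], [ 1.0, 0.0]],
    [[0.9,  0.0], [-1.0, 0.0]],
    [[1.0, -0.1], [ 0.0, 1.0]],
])
v, c = route(u_hat)
print(np.linalg.norm(v, axis=1))   # length of each parent's output
print(c)                           # final routing weights
```

The predictions for "house" point the same way, so that capsule ends up with the longer output and the larger share of every low-level capsule’s weight, while the conflicting predictions for "other" largely cancel out.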

Vector-to-vector nonlinearity

Once dynamic routing is complete, the system squashes each capsule’s output vector: it shrinks the vector’s length to a value between 0 and 1 without changing its direction. That length gives you the probability that the feature the capsule represents is present. 
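
Here’s a short sketch of a squashing function, assuming NumPy and the form used in the original dynamic routing formulation. The small `eps` term is an added assumption to avoid dividing by zero.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Scale vectors so their length lies in [0, 1) and their direction is kept."""
    squared_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = squared_norm / (1.0 + squared_norm)
    return scale * s / np.sqrt(squared_norm + eps)

short_vec = squash(np.array([0.1, 0.0]))    # weak evidence for the feature
long_vec = squash(np.array([10.0, 0.0]))    # strong evidence for the feature

print(np.linalg.norm(short_vec))
print(np.linalg.norm(long_vec))
```

Short vectors shrink to almost zero length and long vectors approach a length of 1, so the length can be read as a probability.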

Final Thoughts

After going through this article, you should be familiar with capsule neural networks and how they operate. You should also have a sense of how useful they can be. 

If you want to learn more about machine learning algorithms, check out our blog. You’ll find more informative articles there.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.
