Neural Networks are a subset of Machine Learning techniques that learn patterns in data in a different way, using interconnected neurons arranged in layers, including hidden layers. Their layered structure makes them far more powerful than many traditional Machine Learning algorithms, so they can be used in applications where those algorithms fall short.
By the end of this tutorial, you will know about:
- A brief history of Neural Networks
- What are Neural Networks
- Types of Neural Networks
- Feed Forward Networks
- Multi-Layer Perceptron
- Radial Basis Networks
- Convolutional Neural Networks
- Recurrent Neural Networks
- Long Short-Term Memory Networks
A Brief History of Neural Networks
Since the 1950s and 1960s, researchers have been looking for ways to imitate how human neurons and the brain work. Although the brain is extremely complex to decode, they proposed a simplified, brain-like structure that turned out to be very effective at learning hidden patterns in data.
For most of the 20th century, Neural Networks were considered impractical. They were complex, performed poorly, and required far more computing power than was available at the time. However, when Geoffrey Hinton, often dubbed “the Godfather of Deep Learning”, co-authored the 1986 paper with David Rumelhart and Ronald Williams that popularized backpropagation, the tables turned. Neural Networks could now achieve results that had previously been thought impossible.
What are Neural Networks?
Neural Networks are loosely inspired by the architecture of human neurons: each artificial neuron has multiple inputs, a processing unit, and one or more outputs. Each connection between neurons has an associated weight. By adjusting these weights during training, a neural network learns a mathematical function that maps inputs to outputs, which it then uses to predict outputs on new, unseen data. The weights are adjusted using backpropagation, which computes how much each weight contributed to the prediction error, followed by a weight update such as gradient descent.
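To make the weight-adjustment idea concrete, here is a minimal sketch of a single neuron and repeated gradient-descent updates, in plain Python. It assumes a sigmoid activation and a squared-error loss, and the names `neuron` and `sgd_step` are hypothetical; for a single neuron, backpropagation reduces to one application of the chain rule.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, then a non-linear activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def sgd_step(inputs, target, weights, bias, lr=0.5):
    # One gradient-descent update on the squared error 0.5 * (y - target)**2.
    y = neuron(inputs, weights, bias)
    # Chain rule: dLoss/dz = (y - target) * sigmoid'(z), with sigmoid'(z) = y * (1 - y).
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

x, t = [1.0, 0.5], 1.0      # one training example and its target
w, b = [0.1, -0.2], 0.0     # arbitrary starting weights
before = neuron(x, w, b)
for _ in range(20):
    w, b = sgd_step(x, t, w, b)
after = neuron(x, w, b)
print(f"prediction before: {before:.3f}, after: {after:.3f}")
```

Each update moves the weights a small step against the gradient, so the prediction drifts toward the target.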
Types of Neural Networks
Different types of neural networks suit different kinds of data and applications, and each architecture is designed specifically for a particular type of data or domain. Let’s start with the most basic ones and work towards the more complex ones.
The Perceptron is the most basic and oldest form of neural network. It consists of a single neuron that takes the inputs and applies an activation function to produce a binary output. It doesn’t contain any hidden layers and can only be used for binary classification tasks in which the two classes are linearly separable.
The neuron multiplies each input value by its weight and adds the results together, along with a bias term. The resulting weighted sum is then passed to the activation function, typically a step function, to produce a binary output.
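The Perceptron’s weighted sum, step activation, and classic learning rule (Rosenblatt’s perceptron rule, which nudges the weights only on misclassified examples) can be sketched in plain Python. The function names and the AND toy data are illustrative assumptions.

```python
def step(z):
    # Step activation: output 1 when the weighted sum reaches the threshold 0.
    return 1 if z >= 0 else 0

def predict(x, weights, bias):
    # Multiply each input by its weight, add them up with the bias, then threshold.
    return step(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def train_perceptron(samples, labels, lr=1.0, epochs=20):
    # With zero initial weights the learning rate only rescales the solution,
    # so lr=1.0 keeps the arithmetic exact on integer inputs.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(x, weights, bias)  # -1, 0 or +1
            # Perceptron rule: weights change only when the prediction is wrong.
            weights = [wi + lr * error * xi for wi, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Logical AND is linearly separable, so a single Perceptron can learn it.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(x, w, b) for x in X])
```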
Feed Forward Network
Feed Forward (FF) networks consist of multiple neurons, optionally arranged in hidden layers, that are connected to each other. They are called “feed-forward” because data flows in the forward direction only, from the input layer to the output layer, with no loops; in the basic form described here, there is also no backward propagation of errors. Depending on the application, the network may not need any hidden layers at all.
The more layers a network has, the more weights can be tuned, and hence the greater its capacity to learn. In this basic form, however, the weights are not updated through backpropagation. The weighted sum of the inputs is fed to an activation function, which acts as a threshold.
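The forward flow of data through layers, and the threshold activation, can be sketched in plain Python. The weights below are hand-picked for illustration rather than learned, in keeping with the basic form described above, and the function names are hypothetical. The hidden layer lets this tiny network compute XOR, which a single Perceptron cannot represent because XOR is not linearly separable.

```python
def step(z):
    # Threshold activation.
    return 1 if z >= 0 else 0

def dense_layer(inputs, weights, biases):
    # Each row of `weights` holds one neuron's incoming weights.
    return [step(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(x, layers):
    # Data flows strictly forward: each layer's output is the next layer's input.
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

# Hand-picked weights: the hidden neurons compute OR and NAND,
# and the output neuron ANDs them together, which yields XOR.
xor_net = [
    ([[1, 1], [-1, -1]], [-0.5, 1.5]),  # hidden layer: OR, NAND
    ([[1, 1]], [-1.5]),                 # output layer: AND
]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, feed_forward(list(x), xor_net))
```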
FF networks are used in:
- Speech recognition
- Face recognition
- Pattern recognition
Multi-Layer Perceptron
The main shortcoming of basic Feed Forward networks is their inability to learn with backpropagation. Multi-layer Perceptrons (MLPs) are neural networks that incorporate multiple hidden layers and activation functions. Learning takes place in a supervised manner, with the weights updated by means of gradient descent.
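A minimal sketch of an MLP trained with backpropagation and full-batch gradient descent, in plain Python, is shown below. The network size (2 inputs, 3 sigmoid hidden neurons, 1 sigmoid output), the squared-error loss, the learning rate, and the XOR toy data are all illustrative assumptions; in practice you would use a framework such as Keras. Whether the network fully solves XOR depends on the random initialization, but the training loss should decrease.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_mlp(X, Y, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    # Small random initial weights; the last entry of each row is the bias.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    losses = []
    for _ in range(epochs):
        # Accumulate gradients over the whole dataset (full-batch gradient descent).
        g1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g2 = [0.0] * (n_hidden + 1)
        loss = 0.0
        for x, t in zip(X, Y):
            xb = list(x) + [1.0]  # constant 1 input for the bias
            # Forward propagation.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W1]
            hb = h + [1.0]
            y = sigmoid(sum(w * hi for w, hi in zip(W2, hb)))
            loss += 0.5 * (y - t) ** 2
            # Backward propagation of the error via the chain rule.
            d_out = (y - t) * y * (1 - y)
            for j in range(n_hidden + 1):
                g2[j] += d_out * hb[j]
            for j in range(n_hidden):
                d_hid = d_out * W2[j] * h[j] * (1 - h[j])
                for i in range(n_in + 1):
                    g1[j][i] += d_hid * xb[i]
        losses.append(loss / len(X))
        # Gradient-descent update of both layers.
        W2 = [w - lr * g for w, g in zip(W2, g2)]
        W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, g1)]
    return W1, W2, losses

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]  # XOR targets
W1, W2, losses = train_mlp(X, Y)
print(f"mean loss: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```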
Training a Multi-layer Perceptron is bi-directional: the inputs are propagated forward to produce a prediction, and the error is propagated backward to update the weights. The activation function of the output layer can be changed depending on the type of target: Softmax is usually used for multi-class classification, Sigmoid for binary classification, and so on. MLPs are also called dense (fully connected) networks because every neuron in a layer is connected to every neuron in the next layer.
They are used in Deep Learning based applications but can be slow to train because of their large number of connections.
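The two output activations mentioned above can be written in a few lines. This is an illustrative sketch; subtracting the maximum score inside softmax is a standard trick for numerical stability and does not change the result.

```python
import math

def sigmoid(z):
    # Binary classification: squashes one score into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    # Multi-class classification: turns a vector of scores into probabilities
    # that sum to 1. Subtracting the max avoids overflow in exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))              # 0.5
print(softmax([2.0, 1.0, 0.1]))  # the largest score gets the largest probability
```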
Radial Basis Networks
Radial Basis Networks (RBNs) predict targets in a completely different way. They consist of an input layer, a layer of RBF (radial basis function) neurons, and an output layer. Each RBF neuron stores a prototype, for example one of the training instances, along with its class. RBNs differ from the usual Multi-layer Perceptron in their activation function: a radial basis function, typically a Gaussian, whose value depends on the distance between the input and the neuron’s prototype.
When new data is fed into the network, each RBF neuron computes the Euclidean distance between the input’s feature values and its stored prototype. This is similar to finding which cluster a particular instance belongs to. The class of the prototype with the minimum distance is assigned as the predicted class.
RBNs are used mostly in function approximation applications, such as power restoration systems.
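The nearest-prototype behavior described above can be sketched as follows. It assumes a Gaussian radial basis function with a hypothetical width parameter `beta`, and hand-picked prototypes; because the Gaussian decreases with distance, the neuron with the highest activation is the one with the smallest Euclidean distance. Full RBF networks usually combine the activations through learned output weights, which this sketch leaves out.

```python
import math

def rbf(x, center, beta=1.0):
    # Gaussian radial basis function: 1 at the center, decaying with
    # the squared Euclidean distance from it.
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-beta * sq_dist)

def rbf_classify(x, prototypes):
    # prototypes: list of (center_vector, class_label) stored in the RBF neurons.
    # The highest activation belongs to the nearest prototype.
    _, label = max(prototypes, key=lambda p: rbf(x, p[0]))
    return label

prototypes = [
    ((1.0, 1.0), "A"),
    ((5.0, 5.0), "B"),
    ((1.0, 6.0), "C"),
]
print(rbf_classify((1.5, 0.5), prototypes))
print(rbf_classify((4.0, 5.5), prototypes))
```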
Convolutional Neural Networks
When it comes to image classification, the most widely used neural networks are Convolutional Neural Networks (CNNs). A CNN contains multiple convolutional layers, which are responsible for extracting important features from the image. The earlier layers capture low-level details, such as edges, and the later layers capture more high-level features.
The convolution operation slides a small matrix of weights, called a filter or kernel, over the input image to produce feature maps. These filters are initialized randomly and then updated via backpropagation. Classical image processing uses hand-designed filters of the same kind, such as the Sobel kernel for detecting edges; a CNN instead learns its filter values from the data.
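The sliding-filter operation can be sketched in plain Python. It assumes a single-channel image, no padding, and a stride of 1; like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped). The Sobel kernel is hand-set here only to show the kind of feature a filter can detect.

```python
def conv2d(image, kernel):
    # "Valid" convolution: slide the kernel over the image without padding
    # and take the sum of element-wise products at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hand-designed Sobel kernel that responds to vertical edges.
# In a CNN these values would be learned instead.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# 5x5 image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 10, 10, 10] for _ in range(5)]
feature_map = conv2d(image, sobel_x)
for row in feature_map:
    print(row)
```

The feature map is large where the kernel overlaps the edge and zero over the uniform bright region.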
After the convolutional layer, there is usually a pooling layer, which aggregates (downsamples) the feature maps produced by the convolutional layer. It can be max pooling, average pooling, etc. For regularization, CNNs can also include dropout layers, which randomly make a fraction of neurons inactive during training to reduce overfitting.
CNNs typically use ReLU (Rectified Linear Unit) as the activation function in the hidden layers. The last layer is usually a fully connected dense layer, with Softmax as the activation function for classification and, for regression, usually a linear (identity) activation.
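Pooling and dropout are both simple to sketch. The functions below are hypothetical and assume non-overlapping 2x2 pooling windows and the common “inverted dropout” formulation, in which surviving activations are rescaled during training so that nothing needs to change at inference time.

```python
import random

def max_pool2d(fmap, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    return [
        [max(fmap[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def avg_pool2d(fmap, size=2):
    # Average each non-overlapping size x size window.
    return [
        [sum(fmap[i + a][j + b] for a in range(size) for b in range(size)) / (size * size)
         for j in range(0, len(fmap[0]) - size + 1, size)]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def dropout(values, rate=0.5, rng=random):
    # Inverted dropout (training time only): zero each value with probability
    # `rate` and scale the survivors by 1 / (1 - rate).
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

fmap = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 8, 3],
]
print(max_pool2d(fmap))
print(avg_pool2d(fmap))
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```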
Recurrent Neural Networks
Recurrent Neural Networks (RNNs) come into the picture when predictions need to be made from sequential data, such as a sequence of images or words. An RNN has a structure similar to that of a Feed-Forward Network, except that each step also receives a time-delayed input: the hidden state computed at the previous step. The RNN cell stores this state and uses it as a second input, alongside the current element of the sequence, for every prediction.
However, the main disadvantage of RNNs is the vanishing gradient problem: the gradients propagated back through many time steps shrink towards zero, which makes it very difficult for the network to learn dependencies on inputs seen many steps earlier.
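The recurrence can be sketched with a scalar RNN in plain Python. The tanh activation and the scalar weights `w_x`, `w_h` and `b` are simplifying assumptions; real RNNs use weight matrices and vector hidden states. Feeding one non-zero input followed by zeros shows the first input still influencing later states through the hidden state.

```python
import math

def rnn_forward(sequence, w_x, w_h, b, h0=0.0):
    # Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    # The same weights are reused at every time step.
    h = h0
    states = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print([round(s, 3) for s in states])
```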
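The vanishing gradient can be seen numerically in the same kind of scalar tanh RNN. During backpropagation through time, the gradient of the state at step t with respect to the initial state is a product of one factor per step, each equal to `w_h * (1 - h_t**2)`; when these factors are below 1, the product shrinks exponentially. The weight values are arbitrary assumptions chosen for illustration.

```python
import math

def gradient_wrt_initial_state(sequence, w_x, w_h, b=0.0):
    # Backpropagation through time for a scalar tanh RNN:
    # dh_t/dh_{t-1} = w_h * (1 - h_t**2), so dh_t/dh_0 is a product of t factors.
    h, grad, history = 0.0, 1.0, []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)
        grad *= w_h * (1.0 - h * h)
        history.append(abs(grad))
    return history

grads = gradient_wrt_initial_state([1.0] * 30, w_x=1.0, w_h=0.9)
for t in (1, 5, 10, 30):
    print(f"after {t:2d} steps: |dh_t/dh_0| = {grads[t - 1]:.2e}")
```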
Long Short-Term Memory Networks
LSTM networks mitigate the vanishing gradient problem of RNNs by adding a special memory cell that can store information over long periods of time. An LSTM uses gates to decide what information should be kept, updated, or forgotten. It uses 3 gates: an input gate, an output gate, and a forget gate. The input gate controls what new information is written to the memory cell, the output gate controls how much of the memory is passed on as output to the next time step and layer, and the forget gate controls when to discard information that is no longer required.
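A single LSTM step can be sketched with scalar values, using the standard gate equations (sigmoid gates, a tanh candidate, and an additive cell-state update). The function name and parameter values are hypothetical and hand-picked: a large forget-gate bias keeps that gate close to 1, so the value written at the first step survives several later steps with only slight decay.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    # p maps each gate name to (weight on x, weight on h_prev, bias); scalars for readability.
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    g = math.tanh(pre("candidate"))  # candidate value for the cell state
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    c = f * c_prev + i * g           # additive update of the memory cell
    h = o * math.tanh(c)             # hidden state passed to the next step and layer
    return h, c

params = {
    "forget": (0.0, 0.0, 5.0),      # large bias: gate stays near 1, memory is kept
    "input": (5.0, 0.0, -2.5),      # opens mainly when x is large
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 5.0),
}
h, c = 0.0, 0.0
cell_states = []
for x in [1.0, 0.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c, params)
    cell_states.append(c)
    print(f"x={x:.1f}  c={c:.3f}  h={h:.3f}")
```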
LSTMs are used in various applications such as:
- Gesture recognition
- Speech recognition
- Text prediction
Before you go
Neural Networks can get very complex very quickly as you keep adding layers to the network. In many cases, we can leverage the immense research in this field by using pre-trained networks for our own tasks.
This is called Transfer Learning. In this tutorial, we covered most of the basic neural networks and how they work. Make sure to try them out using Deep Learning frameworks like Keras and TensorFlow.