In recent years, the popularity of deep learning has grown sharply, and it is now applied in almost every sector of industry. From image recognition and speech generation to machine translation, almost every company wants to integrate this technology into one product or another. The main reason deep learning has overtaken many traditional machine learning algorithms is the accuracy and performance these models deliver on tasks like these.
Although the infrastructure plays an important role in delivering these results, the core processing happens inside a neural network. Let's explore the components of such a network and then look at some fundamental units built from these components.
Various Components of a Neural Network
The basic building block of a neural network is the neuron. The idea is loosely inspired by the biological neurons in the human brain. An artificial neuron takes all of its inputs, aggregates them (typically as a weighted sum), and then passes the result through a function that determines the neuron's output.
A neural network consists of many such neurons connected to each other in layers, known as the input, hidden, and output layers. Such a network can learn to map complex patterns in data to outputs, and the universal approximation theorem backs this up mathematically: a network with at least one hidden layer and a suitable non-linear activation function can approximate any continuous function on a bounded input range to arbitrary accuracy, given enough hidden neurons.
Each input to a neuron is multiplied by a weight, which controls how much that input contributes to the output. Weights can also be negative, which lets a high input value reduce the sum instead of increasing it. Consider predicting whether someone will buy a smartphone: the higher the price, the lower the chance of a purchase. If the model simply added up all the input values and compared the total with a threshold, a high price would push the sum up and could produce a wrong "will buy" prediction. A negative weight on the price reverses this effect, so a high price lowers the sum and the model can make the right prediction.
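To make the smartphone example concrete, here is a minimal sketch of a weighted sum compared against a threshold. The feature names, their values (scaled to the range 0 to 1), the weights, and the threshold are all hypothetical, chosen only to show how a negative weight on price changes the decision.

```python
# Hypothetical features for one smartphone, each scaled to the range [0, 1].
# Feature names, weights, and the threshold are illustrative assumptions.
features = {"price": 0.9, "camera_quality": 0.8, "battery_life": 0.7}

# A negative weight lets a high price pull the weighted sum down.
weights = {"price": -1.5, "camera_quality": 1.0, "battery_life": 0.5}

THRESHOLD = 0.5  # hypothetical decision threshold


def weighted_sum(features: dict, weights: dict) -> float:
    """Aggregate the inputs, scaling each one by its weight."""
    return sum(weights[name] * value for name, value in features.items())


def will_buy(features: dict, weights: dict, threshold: float = THRESHOLD) -> bool:
    """Predict a purchase (True) when the weighted sum reaches the threshold."""
    return weighted_sum(features, weights) >= threshold


# Compare a plain, unweighted sum with the weighted one.
unweighted = sum(features.values())
print(f"unweighted sum: {unweighted:.2f}")
print(f"weighted sum:   {weighted_sum(features, weights):.2f}")
print("prediction:", will_buy(features, weights))
```

With these made-up numbers, the plain sum (about 2.4) clears the threshold even though the phone is expensive, while the weighted sum (about -0.2) does not, so the model predicts no purchase.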
As mentioned in the definition of a neuron, the neuron applies a function to its aggregated input to produce its output. If the neuron is in the input or a hidden layer, this output is passed on to the next layer; if it is in the output layer, it is used for further processing into the network's prediction.
This function is called the activation function, and it defines the state of the neuron. Many activation functions are available, and the right choice depends on the use case. Common examples are the sigmoid function, the tanh function, the softmax function, ReLU (rectified linear unit), and leaky ReLU.
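The activation functions named above can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard `math` module, not any framework's implementation; the leaky ReLU slope of 0.01 is an assumed, commonly used default.

```python
import math


def sigmoid(z: float) -> float:
    """Squash z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    """Squash z into (-1, 1); zero-centred, unlike the sigmoid."""
    return math.tanh(z)


def relu(z: float) -> float:
    """Pass positive values through unchanged; clip negatives to 0."""
    return max(0.0, z)


def leaky_relu(z: float, alpha: float = 0.01) -> float:
    """Like ReLU, but let a small slope (alpha, an assumed default) through for negatives."""
    return z if z > 0 else alpha * z


def softmax(zs: list[float]) -> list[float]:
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z), leaky_relu(z))
print(softmax([1.0, 2.0, 3.0]))
```

Note that softmax differs from the others: it acts on a whole list of scores at once, which is why it is typically used in the output layer for multi-class problems.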
The learning rate controls the pace of the weight updates: it scales how far each weight moves in a single training step. Consider two cases where the learning rate matters. If an input feature is sparse (mostly zeros), its weight receives meaningful updates only occasionally, so a larger learning rate for that feature helps it learn from the few updates it gets. Conversely, for dense features that contribute to almost every update, a smaller learning rate works well.
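The effect of the learning rate on the pace of updates can be seen on a toy problem. This sketch assumes a hypothetical one-weight loss L(w) = (w - 3)^2, whose gradient is 2(w - 3), and applies plain gradient descent with three different learning rates. It illustrates only how the rate scales each step; it does not implement per-feature rates for sparse data, which is what adaptive optimizers such as AdaGrad do.

```python
def gradient_descent(lr: float, steps: int = 20, w0: float = 0.0) -> float:
    """Minimise the toy loss L(w) = (w - 3)^2 and return the final weight.

    The gradient of this loss is dL/dw = 2 * (w - 3); each step moves w
    against the gradient by an amount scaled by the learning rate lr.
    """
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = w - lr * grad
    return w


for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {gradient_descent(lr):.4f}")
```

With a very small rate the weight barely moves toward the minimum at w = 3 in 20 steps, a moderate rate gets close to it, and a rate that is too large overshoots further on every step and diverges.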
Let’s look at some fundamental units making use of these components in larger neural networks.
The McCulloch-Pitts (MP) neuron is the most basic form of artificial neuron. It sums its inputs and passes the sum to a threshold function: the output is 1 if the sum reaches a threshold, and 0 otherwise.
The limiting factor is that the inputs must be binary; real numbers are not allowed. If we want to use a dataset with other values, those values first need to be converted to binary before they can be passed to the model.
The outputs of this model are also binary, which makes it hard to interpret the quality of the results. The inputs don't have any weights either, so we can't control how much each feature contributes to the result.
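A minimal sketch of the MP neuron described above: binary inputs, no weights, and a fixed threshold. The function name `mp_neuron` and the choice of AND/OR gates as examples are illustrative assumptions.

```python
def mp_neuron(inputs: list[int], threshold: int) -> int:
    """McCulloch-Pitts neuron: output 1 if the number of active binary
    inputs reaches the threshold, else 0. Inputs carry no weights."""
    if any(x not in (0, 1) for x in inputs):
        raise ValueError("MP neuron inputs must be binary (0 or 1)")
    return 1 if sum(inputs) >= threshold else 0


# With two inputs, threshold 2 behaves like AND and threshold 1 like OR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mp_neuron([a, b], 2), "OR:", mp_neuron([a, b], 1))
```

The only thing we can tune here is the threshold, which shows the limitation noted above: every input counts equally, and a real-valued input such as 0.5 is rejected outright.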
One of the significant drawbacks of the MP neuron is that it can't accept real numbers as inputs: any real-valued input feature has to be converted to 0s and 1s first, which can lead to undesirable results. The Perceptron neuron has no such limitation on its inputs, and it assigns a weight to each one. Even so, passing standardized inputs tends to give better results in less training time, because all features are then on a comparable scale and contribute fairly to the weighted sum.
The Perceptron also introduces a learning algorithm, which adjusts the weights from data instead of requiring them to be set by hand, making the model more robust to new inputs. The algorithm updates the weight applied to each input based on a loss function, which measures the difference between the actual value and the value predicted by the model. Squared error loss is one popular loss function used in deep learning models.
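As a worked formula: for $n$ training examples with true values $y_i$ and model predictions $\hat{y}_i$, squared error loss sums the squared differences between them. Some texts divide by $n$ (mean squared error) or multiply by $\tfrac{1}{2}$; both only rescale the loss and its gradient.

```latex
\mathcal{L} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```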
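Since the Perceptron neuron also produces a binary output, its loss for a single example is either zero or one. This means we can state the loss compactly: when the prediction differs from the true value, the loss is one and the weights need to be updated; otherwise the loss is zero and no update is needed. Writing the true label as $y$ and folding the threshold into $\mathbf{w}$ as the weight of a constant input of 1, the weights are updated as follows for each misclassified input $\mathbf{x}$:
$$\mathbf{w} \leftarrow \mathbf{w} + \mathbf{x} \quad \text{if } y = 1 \text{ and } \mathbf{w} \cdot \mathbf{x} < 0$$
$$\mathbf{w} \leftarrow \mathbf{w} - \mathbf{x} \quad \text{if } y = 0 \text{ and } \mathbf{w} \cdot \mathbf{x} \geq 0$$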
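The update rule can be turned into a small training loop. This is a minimal sketch, assuming the threshold is folded into the weights through a constant first input of 1, a prediction of 1 whenever w.x >= 0, and the AND function as hypothetical training data; `predict` and `train_perceptron` are illustrative names, not a library API.

```python
def predict(w: list[float], x: list[float]) -> int:
    """Perceptron output: 1 if w.x >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0


def train_perceptron(data, epochs: int = 100) -> list[float]:
    """Apply the perceptron update rule until every example is classified
    correctly or the epoch budget runs out.

    Each x starts with a constant 1, so w[0] acts as the (negated) threshold.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            if y == 1 and dot < 0:       # positive example predicted as 0
                w = [wi + xi for wi, xi in zip(w, x)]
                errors += 1
            elif y == 0 and dot >= 0:    # negative example predicted as 1
                w = [wi - xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:  # zero loss on every example: no more updates needed
            break
    return w


# Hypothetical training data: the AND function, with a leading constant input of 1.
AND_DATA = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(AND_DATA)
print("learned weights:", w)
for x, y in AND_DATA:
    print(x[1:], "->", predict(w, x), "(target", y, ")")
```

Because AND is linearly separable, the loop stops once an entire pass produces no updates. On data that is not linearly separable (XOR, for example) it would simply run until the epoch limit.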
The Perceptron neuron is an improvement over the MP neuron, but some issues remain. One major limitation of both is that they only support binary classification. Another is their hard decision boundary: the output is just 0 or 1, saying only whether a case belongs to the class or not. There is no room for graded predictions such as probabilities, which are more informative and easier to interpret than binary outputs.
To resolve these issues, the Sigmoid neuron was introduced, which can be used for multi-class classification and regression tasks. It uses a function from the sigmoid family, most commonly the logistic function:
$$y = \frac{1}{1 + e^{-(\mathbf{w} \cdot \mathbf{x} + b)}}$$
If we plot this function, it takes an 'S' shape. The bias $b$ shifts the curve along the input axis (the curve crosses 0.5 where $\mathbf{w} \cdot \mathbf{x} + b = 0$), while the weights control how steep it is. The output of this function always lies between 0 and 1, no matter how many inputs are passed, so it can be read as the probability of the positive class, which is more informative than a rigid 0/1 output. Because the output is a continuous value, the neuron can also be used for regression, and several sigmoid neurons can be combined to handle multiple classes.
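A minimal sketch of a single sigmoid neuron, showing how the bias slides the 'S' curve. It uses one input with a hypothetical weight of 2; the function name `sigmoid_neuron` is illustrative.

```python
import math


def sigmoid_neuron(w: list[float], x: list[float], b: float) -> float:
    """Sigmoid neuron: y = 1 / (1 + exp(-(w.x + b)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


w = [2.0]  # a single input with a hypothetical weight of 2
for b in (-2.0, 0.0, 2.0):
    outputs = [round(sigmoid_neuron(w, [x], b), 3) for x in (-2.0, 0.0, 2.0)]
    print(f"b={b:+.1f}: {outputs}")
```

For a single input the output equals 0.5 exactly where w * x + b = 0, that is at x = -b / w. With w = 2, that point sits at x = 1 when b = -2 and at x = 0 when b = 0, so changing b moves the curve without changing its shape.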
The learning algorithm for the sigmoid neuron differs from the previous ones: the weights and bias are updated using the derivative (gradient) of the loss function with respect to each of them.
This algorithm is commonly known as gradient descent. Its derivation is fairly long and mathematical, so it is beyond the scope of this article. In simple terms, to reach a minimum of the loss function, we repeatedly move the weights and bias a small step, scaled by the learning rate, in the direction opposite to the gradient.
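To show the idea without the full derivation, here is a minimal sketch of batch gradient descent for a one-input sigmoid neuron trained with squared error loss. The toy dataset, learning rate, and epoch count are hypothetical choices. The gradient formulas in the docstring come from the chain rule together with the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(w: float, b: float, data) -> float:
    """Sum of squared differences between targets y and predictions."""
    return sum((sigmoid(w * x + b) - y) ** 2 for x, y in data)


def gradient_descent(data, lr: float = 0.5, epochs: int = 1000):
    """Fit a one-input sigmoid neuron by batch gradient descent.

    For L = sum((y_hat - y)^2) with y_hat = sigmoid(w*x + b), the chain rule
    gives dL/dw = sum(2 * (y_hat - y) * y_hat * (1 - y_hat) * x), and
    dL/db is the same sum without the trailing x.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw = db = 0.0
        for x, y in data:
            y_hat = sigmoid(w * x + b)
            common = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
            dw += common * x
            db += common
        # Step against the gradient, scaled by the learning rate.
        w -= lr * dw
        b -= lr * db
    return w, b


# Hypothetical toy data: negative x values map to 0, positive ones to 1.
DATA = [(-3.0, 0.0), (-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
print("loss before:", round(squared_error(0.0, 0.0, DATA), 4))
w, b = gradient_descent(DATA)
print("loss after: ", round(squared_error(w, b, DATA), 4))
print("learned w, b:", round(w, 3), round(b, 3))
```

Starting from w = b = 0 every prediction is 0.5, giving a loss of 1.5 on these six points; after training the loss is much smaller because the learned positive weight pushes predictions toward the correct side.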
This was a brief introduction to neural networks. We saw the basic components: the neuron, which acts as a mini-brain and processes the inputs; the weights, which balance how much each input contributes; the learning rate, which controls the pace of weight updates; and the activation function, which fires up the neurons.
We also saw how the basic building block, the neuron, takes different forms as the complexity of the task increases. We started with the most basic form, the MP neuron, then saw how the Perceptron neuron eliminates some of its issues, and finally how the sigmoid neuron adds support for regression and multi-class classification tasks.