Supervised learning has revolutionized several tasks such as sequence generation, natural language processing, and even computer vision. This approach relies on a dataset containing a set of input features and a corresponding set of labels. The machine uses the information present in these features and labels to learn the distribution and patterns of the data, and then makes statistical predictions on unseen inputs.
A paramount step in designing deep learning models is evaluating model performance, especially on new and unseen data points. The key goal is to develop models that generalize beyond the data they were trained on, so that they make reliable predictions in the real world. Two concepts that help us achieve this are model validation and regularization, which we will cover today.
Building a machine learning model almost always begins with splitting the available data into three sets: training, validation, and test. The training set is used by the model to learn the characteristics of the data distribution.
A key point to note here is that satisfactory performance on the training set does not mean the model will generalize to new data with similar performance, because the model has become biased towards the training set. The validation and test sets are therefore used to report how well the model generalizes to new data points.
The standard procedure is to fit the model on the training data, tune and evaluate it using the validation data, and finally use the test data to estimate how well the model will perform on entirely new examples.
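As a minimal sketch of this three-way split, scikit-learn's train_test_split can be applied twice. The 80/20 and 75/25 fractions below are illustrative choices of ours, not prescribed by any standard:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 examples with 10 features each
X = np.arange(1000).reshape(100, 10)
y = np.arange(100)

# First carve out the test set, then split the remainder into train/validation
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```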
The validation set is used to tune the hyperparameters (number of hidden layers, learning rate, dropout rate, etc.) so that the model can generalize well. A common conundrum faced by machine learning novices is understanding the need for separate validation and test sets.
The need for two distinct sets can be understood with the following intuition: every deep neural network has multiple hyperparameters that need to be adjusted for satisfactory performance.
Multiple models can be trained with different hyperparameter settings, and the model with the best performance metric on the validation set can then be selected. However, each time the hyperparameters are tweaked for better performance on the validation set, some information about that set leaks into the model, so the final weights of the neural network may become biased towards the validation set.
After each hyperparameter adjustment, our model continues to perform well on the validation set because that is what we optimized it for. This is why the validation set cannot accurately reflect the generalization ability of the model. To overcome this drawback, the test set comes into play.
The most accurate representation of a model's generalization ability is its performance on the test set: since we never optimized the model against this set, it provides the most realistic estimate of the model's real-world performance.
Implementing Validation Strategies using TensorFlow 2.0
TensorFlow 2.0 provides an extremely easy way to track the performance of our model on a separate held-out validation set: we can pass the validation_split keyword argument to the model.fit() method.
The validation_split keyword takes a floating-point number between 0 and 1 representing the fraction of the training data to be used as validation data. Passing a value of 0.1 therefore reserves 10% of the training data for validation.
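A minimal sketch of the keyword in action, using a toy model and random data purely for illustration (the shapes, layer sizes, and epoch count here are arbitrary choices of ours):

```python
import numpy as np
import tensorflow as tf

# Toy regression data; shapes and sizes are arbitrary
X = np.random.rand(100, 4)
y = np.random.rand(100)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

# 10% of the data is held out; Keras reports val_loss alongside loss each epoch
history = model.fit(X, y, epochs=3, validation_split=0.1, verbose=0)
print('val_loss' in history.history)  # True
```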
The practical implementation of validation split can be demonstrated easily using the Diabetes Dataset from sklearn. The dataset has 442 instances with 10 baseline variables (age, sex, BMI, etc.) as training features and the measure of disease progression after one year as its label.
We import the dataset using TensorFlow and sklearn:
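A possible loading step is sketched below. Standardizing the targets and the 10% test fraction are our own choices, not requirements of the dataset:

```python
import tensorflow as tf  # used for the modelling steps that follow
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load the 442-instance Diabetes dataset
data, targets = load_diabetes(return_X_y=True)

# Standardize the targets so the loss stays in a manageable range (optional)
targets = (targets - targets.mean()) / targets.std()

# Hold out 10% of the examples as the final test set
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=42)

print(train_data.shape)  # (397, 10)
print(test_data.shape)   # (45, 10)
```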
The next step after data pre-processing is to build a sequential feedforward neural network with dense layers:
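One way to sketch this architecture is shown below. The 128-unit layer width and the helper name get_model are our own assumptions; the tutorial only specifies six ReLU hidden layers and a linear output:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

def get_model():
    # Six hidden layers with ReLU activation and one output unit;
    # the 128-unit width is an arbitrary choice
    return Sequential([
        tf.keras.Input(shape=(10,)),  # 10 baseline features
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'),
        Dense(1)  # linear activation by default, suited to regression
    ])

model = get_model()
print(len(model.layers))  # 7
```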
Here, we have a neural network with six hidden layers with relu activation and one output layer with linear activation.
We then compile the model with the Adam optimizer and mean squared error loss function.
The model.fit() method is then used to train the model for 100 epochs with a validation_split of 15%.
We may also plot the loss of the model as observed for both the training data and the validation data:
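The compile, fit, and plotting steps above can be sketched together as one self-contained script. It repeats the data loading and model definition so it runs on its own; the 128-unit layer width and target standardization remain our assumptions:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Reload and split the data so this snippet runs on its own
data, targets = load_diabetes(return_X_y=True)
targets = (targets - targets.mean()) / targets.std()
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=42)

# Six ReLU hidden layers and a linear output layer
model = Sequential([tf.keras.Input(shape=(10,))] +
                   [Dense(128, activation='relu') for _ in range(6)] +
                   [Dense(1)])

# Adam optimizer with mean squared error loss
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Reserve 15% of the training data for validation, as described above
history = model.fit(train_data, train_targets,
                    epochs=100, validation_split=0.15, verbose=0)

# Plot training and validation loss per epoch
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('mse loss')
plt.legend()
plt.savefig('loss_curves.png')
```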
The plot displayed above shows that the validation loss steadily rises after about 10 epochs while the training loss continues to decrease. This trend is a textbook example of an incredibly significant problem in machine learning called overfitting.
A lot of seminal research has been conducted to overcome this problem and collectively these solutions are called regularization techniques. The following section will cover the aspect of regularization and the procedure for regularizing any deep learning model.
Regularizing our Model
In the previous section, we observed opposite trends in the loss plots of the training and validation sets: the cost on the validation set rises while that on the training set continues decreasing, creating a gap known as the generalization gap.
The existence of such a gap between the two loss curves indicates that the model cannot generalize well to the validation set (unseen data), and hence the cost/loss incurred on that data will inevitably be high.
This happens because the weights and biases of the trained model become so co-adapted to the distribution of the training data that the model fails to predict the labels of new and unseen features, leading to an increased validation loss.
The rationale is that an overly complex model produces such anomalies, since its parameters become highly specialized to the training data. Simplifying the model, i.e. reducing its capacity or complexity, therefore reduces the overfitting effect. One way to achieve this is by using dropout in our deep learning model, which we cover in the next section.
Understanding and Implementing Dropouts in TensorFlow
The key idea behind dropout is to randomly drop hidden and visible units during training in order to obtain a less complex model. This restricts the model's parameters from co-adapting too strongly and therefore makes the model more robust on unseen data.
This widely adopted practice is a powerful approach used by machine learning practitioners to induce a regularizing effect in any deep learning model. Dropout can be implemented effortlessly using the Keras API in TensorFlow by importing the Dropout layer and passing the rate argument, which specifies the fraction of units to be dropped.
These dropout layers are generally stacked right after each dense layer, producing an alternating dense-dropout architecture.
We can modify our previously defined feedforward neural network to include six dropout layers, one for each hidden layer:
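A sketch of the modified architecture, again assuming 128-unit layers; the helper name get_regularized_model is our own:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

def get_regularized_model(dropout_rate):
    # Alternate each of the six ReLU hidden layers with a dropout layer
    layers = [tf.keras.Input(shape=(10,))]
    for _ in range(6):
        layers.append(Dense(128, activation='relu'))
        layers.append(Dropout(dropout_rate))
    layers.append(Dense(1))  # linear output for regression
    return Sequential(layers)

model = get_regularized_model(dropout_rate=0.2)
print(len(model.layers))  # 13: six dense-dropout pairs plus the output layer
```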
Here, the dropout_rate has been set to 0.2, which means that 20% of the nodes will be dropped while training the model. We compile and train the model with the same optimizer, loss function, metrics, and number of epochs to make a fair comparison.
The primary impact of regularizing the model with dropout can be seen by again plotting the loss curves of the model on the training and validation sets:
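A self-contained sketch of this comparison run follows; it repeats the data loading and training so it can run on its own, under the same assumptions (128-unit layers, standardized targets) as before:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; remove this line in a notebook
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

# Reload and split the data so this snippet runs on its own
data, targets = load_diabetes(return_X_y=True)
targets = (targets - targets.mean()) / targets.std()
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=42)

# Six dense-dropout pairs followed by a linear output layer
layers = [tf.keras.Input(shape=(10,))]
for _ in range(6):
    layers += [Dense(128, activation='relu'), Dropout(0.2)]
layers.append(Dense(1))
model = Sequential(layers)

# Same optimizer, loss, metrics, and epochs as the unregularized model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(train_data, train_targets,
                    epochs=100, validation_split=0.15, verbose=0)

# Plot training and validation loss per epoch
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('mse loss')
plt.legend()
plt.savefig('regularized_loss_curves.png')
```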
It is evident from the above plot that the generalization gap after regularizing the model is much smaller, which makes the model less susceptible to overfitting the training data.
Model validation and regularization are essential parts of the workflow for building any machine learning solution. A lot of research is being conducted to improve supervised learning, and this hands-on tutorial has provided a brief insight into some of the most widely accepted practices and techniques used when assembling any learning algorithm.
If you’re interested in learning more about deep learning techniques and machine learning, check out IIIT-B & upGrad’s PG Certification in Machine Learning & Deep Learning which is designed for working professionals and offers 240+ hours of rigorous training, 5+ case studies & assignments, IIIT-B Alumni status & job assistance with top firms.