
Hands-On Introduction to Model Validation and Regularization in Deep Learning using TensorFlow

Last updated: 27th Oct, 2020
Read Time: 8 Mins

Introduction 

Supervised learning has revolutionized tasks such as sequence generation, natural language processing, and even computer vision. The approach relies on a dataset containing a set of input features and a corresponding set of labels. The machine uses these features and labels to learn the distribution and patterns of the data and to make statistical predictions on unseen inputs.


A paramount step in designing deep learning models is evaluating model performance, especially on new and unseen data points. The key goal is to develop models that generalize beyond the data they were trained on: models that can make good, reliable predictions in the real world. Two important concepts that help us with this are model validation and regularization, which we will cover today.


Model Validation 

Building a machine learning model always involves splitting the available data into three sets: a training set, a validation set, and a test set. The training data is used by the model to learn the patterns and characteristics of the distribution.
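As a minimal sketch of this three-way split using scikit-learn (the sample sizes and split fractions below are illustrative choices, not taken from the original tutorial):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with 10 features each (illustrative sizes).
X, y = np.random.rand(100, 10), np.random.rand(100)

# First carve out 20% as the test set, then 25% of the remainder as
# validation, giving roughly a 60/20/20 train/validation/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)
```

The exact fractions depend on how much data you have; with small datasets, a larger share is usually kept for training.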

A key point to note here is that satisfactory performance on the training set does not mean the model will generalize to new data with similar performance, because the model has become biased towards the training set. The validation and test sets are therefore used to report how well the model generalizes to new data points.

The standard procedure is to fit the model on the training data, evaluate model performance on the validation data, and finally use the test data to estimate how well the model will perform on entirely new examples.

The validation set is used to tune the hyperparameters (number of hidden layers, learning rate, dropout rate, etc.) so that the model can generalize well. A common conundrum faced by machine learning novices is understanding the need for separate validation and test sets.

The need for two distinct sets can be understood with the following intuition: every deep neural network has multiple hyperparameters that need to be adjusted for satisfactory performance.

Multiple models can be trained with different hyperparameter settings, and the model with the best performance on the validation set can then be selected. However, each time the hyperparameters are tweaked for better performance on the validation set, some information about that set leaks into the model, so the final weights of the neural network may become biased towards the validation set.

After each adjustment of the hyperparameters, our model continues to perform well on the validation set because that is what we optimized it for. This is why the validation set cannot accurately reflect the generalization ability of the model. To overcome this drawback, the test set comes into play.

The most accurate representation of a model's generalization ability is its performance on the test set, because the model was never optimized for this set; it therefore gives the most realistic estimate of the model's ability.

Must Read: Top Deep Learning Techniques You Should Know About

Implementing Validation Strategies using TensorFlow 2.0 

TensorFlow 2.0 supplies an extremely easy way to track the performance of our model on a separate held-out validation set. We can pass the validation_split keyword argument to the model.fit() method.

The validation_split keyword takes a floating-point number between 0 and 1, representing the fraction of the training data to be used as validation data. So, passing a value of 0.1 reserves 10% of the training data for validation.

The practical implementation of validation split can be demonstrated easily using the Diabetes Dataset from sklearn. The dataset has 442 instances with 10 baseline variables (age, sex, BMI, etc.) as training features and the measure of disease progression after one year as its label.  

We import the dataset using TensorFlow and sklearn:
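The original code listing is not reproduced here, but a self-contained sketch might look like this (the 10% test split and the random seed are assumptions):

```python
import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# 442 samples with 10 baseline features; the target measures
# disease progression one year after baseline.
diabetes = load_diabetes()
data, targets = diabetes.data, diabetes.target

# Hold back 10% of the data as a final test set.
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=42)
```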

The fundamental step after data pre-processing is to build a sequential feedforward neural network with dense layers:
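A sketch of such a network follows; the hidden-layer width of 128 units is an assumption, since the article does not fix it here:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Six hidden Dense layers with ReLU activation and a single
# linear output unit for the regression target.
model = Sequential(
    [tf.keras.Input(shape=(10,))]                      # 10 input features
    + [Dense(128, activation='relu') for _ in range(6)]
    + [Dense(1)]                                       # linear by default
)
```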

Here, we have a neural network with six hidden layers with relu activation and one output layer with linear activation.  

We then compile the model with the Adam optimizer and mean squared error loss function.

The model.fit() method is then used to train the model for 100 epochs with a validation_split of 15%. 
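Putting the compile and fit steps together, a self-contained sketch might look like this (the 128-unit layer width and the standardization of the targets are assumptions, not steps fixed by the article):

```python
import tensorflow as tf
from sklearn.datasets import load_diabetes
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

# Load the data; standardizing the targets keeps the MSE loss
# at a readable scale.
diabetes = load_diabetes()
data = diabetes.data
targets = (diabetes.target - diabetes.target.mean()) / diabetes.target.std()

# Six hidden ReLU layers and a linear output, as described above.
model = Sequential(
    [tf.keras.Input(shape=(10,))]
    + [Dense(128, activation='relu') for _ in range(6)]
    + [Dense(1)]
)

# Compile with Adam and mean squared error, then train for 100
# epochs, reserving 15% of the training data for validation.
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(data, targets, epochs=100,
                    validation_split=0.15, verbose=0)
```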

We may also plot the loss of the model as observed for both the training data and the validation data:
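Using the history object returned by model.fit(), the curves can be plotted with matplotlib. The loss values below are placeholder numbers so the snippet runs standalone; substitute the real history.history from your training run:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs in scripts
import matplotlib.pyplot as plt

# Stand-in for history.history; replace with your actual run.
history_dict = {
    'loss':     [3.0, 2.4, 2.0, 1.7, 1.5],
    'val_loss': [3.1, 2.8, 2.7, 2.9, 3.2],
}

plt.plot(history_dict['loss'], label='training loss')
plt.plot(history_dict['val_loss'], label='validation loss')
plt.xlabel('Epochs')
plt.ylabel('MSE loss')
plt.title('Loss vs. epochs')
plt.legend()
plt.savefig('loss_curves.png')
```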

The plot shows that the validation loss steadily rises after about 10 epochs while the training loss continues to decrease. This trend is a textbook example of a fundamental problem in machine learning called overfitting.

A lot of seminal research has been conducted to overcome this problem and collectively these solutions are called regularization techniques. The following section will cover the aspect of regularization and the procedure for regularizing any deep learning model. 


Regularizing our Model 

In the previous section, we observed diverging trends in the loss plots of the training and validation sets: the loss of the validation set rises while that of the training set continues to decrease, creating a gap (the generalization gap). Learn more about regularization in machine learning.

Such a gap between the two loss curves indicates that the model cannot generalize well to the validation set (unseen data), and hence the loss incurred on that dataset will inevitably be high.

This happens because the weights and biases of the trained model become so co-adapted to the distribution of the training data that the model fails to predict the labels of new, unseen inputs, leading to increased validation loss.

The rationale is that an overly complex model produces such anomalies, since the model's parameters fit the training data too closely. Hence, simplifying or reducing the model's capacity/complexity will reduce the overfitting effect. One way to achieve this is by using dropout in our deep learning model, which we will cover in the next section.

Understanding and Implementing Dropouts in TensorFlow

The key idea behind dropout is to randomly drop hidden and visible units during training in order to obtain a less complex model. This restricts the model's parameters from co-adapting and therefore makes the model more robust on unseen data.

This widely adopted practice is a powerful approach used by machine learning practitioners to induce a regularizing effect in any deep learning model. Dropout can be implemented effortlessly with the Keras API in TensorFlow by importing the Dropout layer and passing the rate argument to specify the fraction of units to be dropped.

These dropout layers are generally stacked right after each dense layer, producing an alternating dense-dropout layer architecture.

We can modify our previously defined feedforward neural network to include six dropout layers, one for each hidden layer:
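A sketch of the modified architecture, again assuming 128-unit hidden layers:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Sequential

dropout_rate = 0.2  # fraction of units dropped at each training step

# Alternate Dense and Dropout layers: one Dropout after each of the
# six hidden layers, followed by the linear output layer.
layers = [tf.keras.Input(shape=(10,))]
for _ in range(6):
    layers += [Dense(128, activation='relu'), Dropout(dropout_rate)]
layers.append(Dense(1))

model = Sequential(layers)
```

Note that Keras applies dropout only during training; at inference time the Dropout layers are automatically bypassed.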

Here, the dropout_rate has been set to 0.2, which means 20% of the nodes are dropped while training the model. We compile and train the model with the same optimizer, loss function, metrics, and number of epochs to make a fair comparison.

The impact of regularizing the model with dropout can be seen by again plotting the loss curves of the model on the training and validation sets:


It is evident from the plot above that the generalization gap after regularizing the model is much smaller, making the model less susceptible to overfitting the training data.

Also Read: Deep Learning Project Ideas

Conclusion

Model validation and regularization are essential parts of the workflow of building any machine learning solution. A lot of research is being conducted to improve supervised learning, and this hands-on tutorial provides a brief insight into some of the most widely accepted practices and techniques for assembling any learning algorithm.

If you’re interested in learning more about deep learning techniques and machine learning, check out IIIT-B & upGrad’s PG Certification in Machine Learning & Deep Learning, which is designed for working professionals and offers 240+ hours of rigorous training, 5+ case studies & assignments, IIIT-B alumni status, and job assistance with top firms.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.