
Hands-On Introduction to Model Validation and Regularization in Deep Learning using TensorFlow

Last updated:
27th Oct, 2020

Introduction 

Supervised learning has transformed tasks such as sequence generation, natural language processing, and even computer vision. This approach relies on a dataset with a set of input features and a corresponding set of labels. The machine uses these features and labels to learn the distribution and patterns of the data in order to make statistical predictions on unseen inputs.


A paramount step in designing deep learning models is evaluating model performance, especially on new and unseen data points. The key goal is to develop models that generalize beyond the data they were trained on and make good, reliable predictions in the real world. Two concepts that help us with this are model validation and regularization, which we will cover today.


Model Validation 

Building a machine learning model typically begins with splitting the available data into three sets: training, validation, and test. The training data is used by the model to learn the characteristics of the underlying distribution.

A focal point here is that satisfactory performance on the training set does not mean the model will generalize to new data with similar performance, because the model has become biased towards the training set. The validation and test sets are therefore used to report how well the model generalizes to new data points.

The standard procedure is to fit the model on the training data, evaluate model performance on the validation data, and finally use the test data to estimate how well the model will perform on entirely new examples.
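The three-way split described above can be sketched with scikit-learn's `train_test_split` (the toy data and split fractions here are assumptions for illustration, not from the original article):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 1000 samples, 10 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = rng.random(1000)

# First hold out a test set, then split the remainder into
# training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15, random_state=0)
```

The test set is carved off first so that no information from it ever influences model selection.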

The validation set is used to tune the hyperparameters (number of hidden layers, learning rate, dropout rate, etc.) so that the model can generalize well. A common conundrum faced by machine learning novices is understanding the need for separate validation and test sets.

The need for two distinct sets can be understood by the following intuition: every deep neural network has multiple hyperparameters that need to be adjusted for satisfactory performance.

Multiple models can be trained with different hyperparameter settings, and the model with the best performance on the validation set can then be selected. However, each time the hyperparameters are tweaked for better performance on the validation set, some information about that set leaks into the model, so the final weights of the neural network may become biased towards the validation set.

After each adjustment of the hyperparameters, our model continues to perform well on the validation set because that is what we optimized it for. This is why the validation set cannot accurately indicate the generalization ability of the model. To overcome this drawback, the test set comes into play.

The performance on the test set gives the most accurate representation of the model's generalization ability, since we never optimized the model for this set; it therefore provides the most pragmatic estimate of how the model will perform in practice.

Must Read: Top Deep Learning Techniques You Should Know About

Implementing Validation Strategies using TensorFlow 2.0 

TensorFlow 2.0 provides an extremely easy way to track the performance of our model on a separate held-out validation set: we can pass the validation_split keyword argument to the model.fit() method.

The validation_split keyword takes a floating-point number between 0 and 1 representing the fraction of the training data to be used as validation data. Passing a value of 0.1 reserves 10% of the training data for validation.

The practical implementation of validation split can be demonstrated easily using the Diabetes Dataset from sklearn. The dataset has 442 instances with 10 baseline variables (age, sex, BMI, etc.) as training features and the measure of disease progression after one year as its label.  

We import the dataset using TensorFlow and sklearn:
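The original code listing is not included in this page, but the import step might look like the following sketch (the target normalization is an assumption added to keep loss values in a manageable range):

```python
import tensorflow as tf
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load the Diabetes dataset: 442 instances, 10 baseline features
diabetes = load_diabetes()
data = diabetes.data
targets = diabetes.target

# Standardize the targets (zero mean, unit variance)
targets = (targets - targets.mean()) / targets.std()

# Hold out 10% of the data as a final test set
train_data, test_data, train_targets, test_targets = train_test_split(
    data, targets, test_size=0.1, random_state=0)
```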

The fundamental step after data pre-processing is to build a sequential feedforward neural network with dense layers:
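A sketch of the architecture described below, with six ReLU hidden layers and a linear output layer; the layer width of 128 units is an assumption, since the original listing is not shown:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Six hidden layers with ReLU activation, one linear output unit
# for the regression target (hidden width of 128 is illustrative).
model = Sequential(
    [tf.keras.Input(shape=(10,))]
    + [Dense(128, activation='relu') for _ in range(6)]
    + [Dense(1)]
)
```

Calling `model.summary()` prints the resulting layer stack and parameter counts.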

Here, we have a neural network with six hidden layers using ReLU activation and one output layer with linear activation.

We then compile the model with the Adam optimizer and mean squared error loss function.

The model.fit() method is then used to train the model for 100 epochs with a validation_split of 15%. 
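Putting the compile and fit steps together, the training call might look like this sketch (the hidden width and metric choice are assumptions; the 100 epochs and 15% validation split come from the text):

```python
import tensorflow as tf
from sklearn.datasets import load_diabetes
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Rebuild the data and model so this snippet runs on its own
diabetes = load_diabetes()
data = diabetes.data
targets = (diabetes.target - diabetes.target.mean()) / diabetes.target.std()

model = Sequential(
    [tf.keras.Input(shape=(10,))]
    + [Dense(128, activation='relu') for _ in range(6)]
    + [Dense(1)]
)

# Adam optimizer and mean squared error loss; train for 100 epochs,
# reserving the last 15% of the training data for validation.
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
history = model.fit(data, targets, epochs=100,
                    validation_split=0.15, verbose=0)
```

The returned `history` object records the per-epoch training and validation losses, which we plot next.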

We may also plot the loss of the model as observed for both the training data and the validation data:
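The plotting listing is also missing from this page; a minimal matplotlib sketch follows. The `history_dict` values here are illustrative stand-ins for `history.history` returned by `model.fit`:

```python
import matplotlib.pyplot as plt

# Stand-in for history.history; replace with the real dict
# returned by model.fit (values below are illustrative only).
history_dict = {'loss': [2.9, 1.8, 1.1, 0.8, 0.6],
                'val_loss': [2.8, 1.9, 1.5, 1.6, 1.9]}

plt.plot(history_dict['loss'], label='training loss')
plt.plot(history_dict['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.show()
```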

The resulting plot shows that the validation loss climbs steadily after about 10 epochs while the training loss continues to decrease. This trend is a textbook example of a critically important problem in machine learning called overfitting.

A lot of seminal research has been conducted to overcome this problem and collectively these solutions are called regularization techniques. The following section will cover the aspect of regularization and the procedure for regularizing any deep learning model. 


Regularizing our Model 

In the previous section we observed diverging trends in the loss plots of the training and validation sets: the loss on the latter rises while the loss on the former continues to decrease, creating a gap (the generalization gap). Learn more about regularization in machine learning.

The existence of such a gap between the two loss curves indicates that the model cannot generalize well to the validation set (unseen data), and hence the loss incurred on that set is inevitably high.

This happens because the weights and biases of the trained model become so closely co-adapted to the distribution of the training data that the model fails to predict the labels of new and unseen inputs, leading to an increased validation loss.

The rationale is that an overly complex model produces such anomalies, since the model's parameters become highly tuned to the training data. Hence, simplifying or reducing the model's capacity/complexity reduces the overfitting effect. One way to achieve this is by using dropout in our deep learning model, which we will cover in the next section.

Understanding and Implementing Dropouts in TensorFlow

The key idea behind dropout is to randomly drop hidden and visible units during training in order to obtain a less complex model, which restricts the model's parameters from co-adapting and therefore makes the model more robust on unseen data.

This widely adopted practice is a powerful approach used by machine learning practitioners to induce a regularizing effect in any deep learning model. Dropout can be implemented effortlessly using the Keras API in TensorFlow by importing the Dropout layer and passing the rate argument to specify the fraction of units to be dropped.

These dropout layers are generally stacked right after each dense layer, producing an alternating dense-dropout architecture.

We can modify our previously defined feedforward neural network to include six dropout layers, one for each hidden layer:
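Since the original listing is not shown, here is a sketch of the regularized network; the hidden width of 128 is an assumption, while the dropout rate of 0.2 comes from the text below:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

dropout_rate = 0.2  # fraction of units dropped during training

# Same six-hidden-layer architecture as before, with a Dropout
# layer stacked after every Dense hidden layer.
layers = [tf.keras.Input(shape=(10,))]
for _ in range(6):
    layers.append(Dense(128, activation='relu'))
    layers.append(Dropout(dropout_rate))
layers.append(Dense(1))

model = Sequential(layers)
```

Note that dropout is only active during training; at inference time Keras automatically disables it and scales activations accordingly.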

Here, the dropout_rate has been set to 0.2 which signifies that 20% of the nodes will be dropped while training the model. We compile and train the model with the same optimizer, loss function, metrics, and the number of epochs for making a fair comparison.

The impact of regularizing the model with dropout can be seen by again plotting the loss curves obtained on the training and validation sets, using the same plotting code as before.


The resulting plot makes it evident that the generalization gap after regularizing the model is much smaller, which makes the model less susceptible to overfitting the training data.

Also Read: Deep Learning Project Ideas

Conclusion

Model validation and regularization are essential parts of the workflow for building any machine learning solution. A great deal of research is being conducted to improve supervised learning, and this hands-on tutorial has provided a brief introduction to some of the most widely accepted practices and techniques used when assembling any learning algorithm.

If you’re interested to learn more about deep learning techniques and machine learning, check out IIIT-B & upGrad’s PG Certification in Machine Learning & Deep Learning, which is designed for working professionals and offers 240+ hours of rigorous training, 5+ case studies & assignments, IIIT-B Alumni status & job assistance with top firms.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.