Using Convolutional Neural Network for Image Classification

Last updated:
14th Aug, 2020

Image Classification Gets a Makeover. Thanks to CNN.

Convolutional Neural Networks (CNNs) are the backbone of image classification, a deep learning task that takes an image and assigns it a class label. Image classification using CNN forms a significant part of machine learning experiments.

Thanks to CNNs and the capabilities they bring, image classification is now widely used for a range of applications, right from Facebook picture tagging to Amazon product recommendations and from healthcare imagery to self-driving cars. The reason CNN is so popular is that it requires very little pre-processing: it reads 2D images directly and learns the filters it applies to them, whereas conventional algorithms typically rely on hand-engineered features. We will delve deeper into the process of how image classification using CNN works.

How Does CNN work?

CNNs are equipped with an input layer, an output layer, and hidden layers, all of which help process and classify images. The hidden layers comprise convolutional layers, ReLU layers, pooling layers, and fully connected layers, all of which play a crucial role.

Let’s look at how image classification using CNN works:

Imagine that the input image is that of an elephant. This image, represented as pixels, is first fed into the convolutional layers. If it is a black and white picture, the image is interpreted as a 2D array, with every pixel assigned a value between ‘0’ and ‘255’, ‘0’ being wholly black, and ‘255’ completely white. If, on the other hand, it is a colour picture, this becomes a 3D array, with a blue, green, and red layer, with each colour value between 0 and 255.
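
To make the two layouts concrete, here is a minimal sketch using plain nested lists. The pixel values and the `shape` helper are made up for illustration, and real code would normally hold images in an array library rather than in lists.

```python
# A tiny 2x3 grayscale image: one intensity per pixel (0 = black, 255 = white).
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A colour image of the same size: every pixel holds three channel values.
# The channel order (blue, green, red) follows the text; many libraries use RGB.
colour = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[0, 0, 0], [255, 255, 255], [128, 128, 128]],
]


def shape(array):
    """Return the nested-list dimensions, e.g. (height, width, channels)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)


print(shape(gray))    # (2, 3)
print(shape(colour))  # (2, 3, 3)
```

The extra third dimension of the colour image is the "depth" that a filter has to match.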

The reading of the matrix then begins, for which the software uses a smaller matrix, known as the ‘filter’ (or kernel). The depth of the filter is the same as the depth of the input. The filter then slides (convolves) across the input image, moving right along the image by 1 unit at a time.

At each position, it multiplies the filter values with the original picture values underneath them. All the multiplied figures are added up together, and a single number is generated. The process is repeated across the entire image, and a matrix is obtained, smaller than the original input image.
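
The multiply-and-add step can be sketched in a few lines of plain Python. This is an illustrative implementation for a single-channel image with no padding; the function name `convolve2d`, the sample image, and the vertical-edge filter are assumptions made for the example. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Multiply each filter value with the pixel under it, then sum.
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


# 4x5 image: bright on the left, dark on the right.
image = [
    [10, 10, 10, 0, 0],
    [10, 10, 10, 0, 0],
    [10, 10, 10, 0, 0],
    [10, 10, 10, 0, 0],
]

# Hypothetical 3x3 filter that responds to vertical edges.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(convolve2d(image, kernel))  # [[0, 30, 30], [0, 30, 30]]
```

The 4×5 input shrinks to a 2×3 feature map, and the non-zero entries mark where the bright region meets the dark one.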

The final array is called the feature map or activation map. Convolution of an image helps perform operations such as edge detection, sharpening, and blurring, by applying different filters. All one needs to do is specify aspects such as the size of the filter, the number of filters and/or the architecture of the network.

From a human perspective, this action is akin to identifying the simple colours and boundaries of an image. However, to classify the image and recognize the features that make it, say, that of an elephant and not of a cat, unique features such as large ears and trunk of the elephant need to be identified. This is where the non-linear and pooling layers come in. 

The non-linear layer (ReLU) follows the convolution layer, where an activation function is applied to the feature maps to introduce non-linearity into the network. The ReLU layer replaces all negative values with zero and leaves positive values unchanged. Although there are other activation functions like tanh or sigmoid, ReLU is the most popular since it typically lets the network train much faster.
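
As a quick illustration, ReLU applied to a small feature map might look like this (the values are made up for the example):

```python
def relu(feature_map):
    """Replace every negative value with zero; keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


feature_map = [
    [-3, 5],
    [2, -8],
]

print(relu(feature_map))  # [[0, 5], [2, 0]]
```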

The next step is to make sure the network can always recognize the same object, whatever its size or location in the image. For instance, in the elephant picture, the network must recognize the elephant, whether it is walking, standing still, or running. There must be image flexibility, and that’s where the pooling layer comes in.

It works with the image’s measurements (height and width) to progressively reduce the size of the input so that the objects in the image can be spotted and identified wherever they are located.

Pooling also helps control ‘overfitting’, where the model fits the training data so closely that it fails to generalize to new images. Perhaps the most common example of pooling is max pooling, where the image is divided into a series of non-overlapping areas.

Max pooling is all about identifying the maximum value in each area so that all extra information is excluded, and the image becomes smaller in size. This action helps account for distortions in the image as well. 
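
Here is a minimal sketch of max pooling over non-overlapping 2×2 areas, written in plain Python. The function name `max_pool2d` and the sample values are illustrative; rows or columns that do not fill a complete window are simply dropped in this version.

```python
def max_pool2d(feature_map, size=2):
    """Keep only the largest value in each non-overlapping size x size area."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

print(max_pool2d(feature_map))  # [[6, 2], [2, 9]]
```

The 4×4 map becomes 2×2, and the strongest response in each area survives even if it shifts by a pixel within that area.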

Now comes the fully connected layer, which attaches an ordinary artificial neural network to the end of the CNN. This network combines the different features and helps predict the image classes with greater accuracy. At this stage, the gradient of the error function is calculated with respect to the neural network’s weights (backpropagation). The weights and feature detectors are adjusted to optimize performance, and this process is repeated many times.
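
The weight-update loop can be sketched for a single fully connected layer. This is a simplified illustration, not a full CNN training procedure: it assumes a softmax output with cross-entropy loss, one training example, and made-up feature values, and it updates only the fully connected weights rather than the convolutional filters.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, features):
    """Fully connected layer: one weighted sum per class, then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def loss(weights, features, label):
    """Cross-entropy error for a single labelled example."""
    return -math.log(forward(weights, features)[label])


def gradient_step(weights, features, label, learning_rate):
    """Move each weight against the gradient of the error function."""
    probs = forward(weights, features)
    updated = []
    for k, row in enumerate(weights):
        # For softmax + cross-entropy, d(loss)/d(w[k][i]) = (p_k - y_k) * x_i.
        error = probs[k] - (1.0 if k == label else 0.0)
        updated.append([w - learning_rate * error * x for w, x in zip(row, features)])
    return updated


features = [6.0, 2.0, 2.0, 9.0]          # flattened pooled features (hypothetical)
weights = [[0.0] * 4 for _ in range(2)]  # class 0 = elephant, class 1 = cat
label = 0

before = loss(weights, features, label)
for _ in range(20):
    weights = gradient_step(weights, features, label, learning_rate=0.01)
after = loss(weights, features, label)

print(after < before)  # True
```

Each pass nudges the weights so that the correct class receives a higher probability, which is why the error shrinks as the process repeats.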

Here’s what the CNN architecture looks like:


Leveraging datasets for CNN Application-MNIST

Several datasets can be used to apply CNN effectively. The three most popular ones for image classification using CNN are MNIST, CIFAR-10, and ImageNet. Let’s look at MNIST first.


MNIST is an acronym for the Modified National Institute of Standards and Technology dataset and comprises 70,000 small, square 28×28 grayscale images of single, handwritten digits between 0 and 9: 60,000 for training and 10,000 for testing. MNIST is a popular and well-understood dataset that is, for the greater part, ‘solved.’ It can be used in computer vision and deep learning to practice, develop, and evaluate image classification using CNN. Among other things, this includes steps to evaluate the performance of the model, explore possible improvements, and use it to predict new data.

Its USP is that it already has a well-defined train and test dataset that we can use. The training set can further be divided into a train and a validation dataset if one needs to evaluate the performance of a model during a training run. The model’s performance on the train and validation sets in each epoch can be recorded as learning curves for greater insight into how well the model is learning the problem.

Keras, one of the leading neural network APIs, supports this through the validation_data argument to the model.fit() function when training the model, which returns an object that records the loss and metrics for each training epoch. Fortunately, MNIST ships with Keras by default, and the train and test files can be loaded using just a few lines of code.
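
A rough sketch of how this could look with the Keras API bundled in TensorFlow is shown below. The helper names `split_sizes` and `train_with_validation` are made up for this example, as is the 10% hold-out fraction, and `model` is assumed to be an already compiled Keras classifier that accepts 28×28×1 inputs and integer labels (for instance, compiled with a sparse categorical cross-entropy loss). TensorFlow is imported inside the function because it is a third-party dependency and the first call downloads the dataset.

```python
def split_sizes(n_samples, validation_fraction=0.1):
    """Return (n_train, n_validation) for a simple hold-out split."""
    n_val = round(n_samples * validation_fraction)
    return n_samples - n_val, n_val


def train_with_validation(model, epochs=5):
    """Fit a compiled Keras model on MNIST and return per-epoch metrics."""
    from tensorflow.keras.datasets import mnist  # requires TensorFlow

    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Add a channel axis and scale pixel values from 0-255 to 0-1.
    x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0

    # Hold out the last part of the training set for validation.
    n_train, _ = split_sizes(len(x_train))
    x_tr, y_tr = x_train[:n_train], y_train[:n_train]
    x_val, y_val = x_train[n_train:], y_train[n_train:]

    history = model.fit(
        x_tr, y_tr,
        epochs=epochs,
        validation_data=(x_val, y_val),
    )
    # history.history maps names such as "loss" and "val_loss" to one value
    # per epoch, which is the raw material for learning curves.
    return history.history


print(split_sizes(60000))  # (54000, 6000)
```

The test images are left untouched here so that they can still serve as an unbiased final check once training is finished.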

Interestingly, an article by Yann LeCun, Professor at The Courant Institute of Mathematical Sciences at New York University and Corinna Cortes, Research Scientist at Google Labs in New York, points out that NIST originally designated its Special Database 3 (SD-3) as a training set and Special Database 1 (SD-1) as a test set.

However, they note that SD-3 is much cleaner and easier to recognize than SD-1 because SD-3 was gathered from employees working in the Census Bureau, while SD-1 was sourced from among high-school students. Since drawing accurate conclusions from learning experiments requires that the result be independent of the choice of training set and test set, it was deemed necessary to build a fresh database by mixing the two datasets.

When using the dataset, it is recommended to divide it into minibatches, store it in shared variables, and access it based on the minibatch index. You might wonder at the need for shared variables, but this is connected with using the GPU. If you copy each minibatch into the GPU memory separately as and when it is needed, the repeated copying slows the GPU code down so much that it is not much faster than the CPU code. If you have your data in Theano shared variables, there is a good chance of copying the whole dataset onto the GPU in one go when the shared variables are built.

Later, the GPU can use a minibatch by accessing these shared variables without needing to copy information from the CPU memory. Also, because the data points are usually real numbers and the labels are integers, it would be good to use different variables for them, as well as separate variables for the training set, validation set, and testing set, to make the code easier to read.

The code below shows you how to store data and access a minibatch:
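
The original listing is not reproduced in this copy of the article, so what follows is only a plain-Python stand-in for the idea, not Theano code. The class name `MinibatchStore` and the sample values are made up for illustration, and nothing here touches the GPU: the sketch just shows storing the whole dataset once, keeping real-valued inputs and integer labels in separate variables, and slicing out a minibatch by its index.

```python
class MinibatchStore:
    """Holds a whole dataset in one place and serves minibatches by index."""

    def __init__(self, inputs, labels, batch_size):
        if len(inputs) != len(labels):
            raise ValueError("inputs and labels must have the same length")
        self.inputs = list(inputs)  # real-valued data points
        self.labels = list(labels)  # integer class labels, kept separately
        self.batch_size = batch_size

    @property
    def n_batches(self):
        """Number of complete minibatches available."""
        return len(self.inputs) // self.batch_size

    def get(self, index):
        """Return the inputs and labels of minibatch number `index`."""
        start = index * self.batch_size
        stop = start + self.batch_size
        return self.inputs[start:stop], self.labels[start:stop]


# One store per split keeps the code easy to read; only "train" is shown.
train = MinibatchStore(
    inputs=[[0.0, 0.1], [0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9]],
    labels=[3, 1, 4, 1, 5],
    batch_size=2,
)

batch_x, batch_y = train.get(1)
print(train.n_batches)   # 2
print(batch_x, batch_y)  # [[0.4, 0.5], [0.6, 0.7]] [4, 1]
```

In a GPU framework, the stored arrays would live in device memory, and `get` would become an index-based slice of those device-resident arrays.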


2. CIFAR-10 Dataset

CIFAR stands for the Canadian Institute for Advanced Research, and the CIFAR-10 dataset was developed by researchers at the CIFAR institute, along with the CIFAR-100 dataset. The CIFAR-10 dataset consists of 60,000 32×32 pixel colour images of objects belonging to ten classes such as cats, ships, birds, frogs, etc. These images are much smaller than an average photograph and are intended for computer vision purposes. 

CIFAR-10 is a well-understood, straightforward dataset: it is relatively easy to reach around 80% classification accuracy on it, and well-designed CNNs achieve above 90% on the test dataset. The images are spread out over one test batch and five training batches, each containing 10,000 images.

The test batch consists of exactly 1,000 randomly selected images from each class. The training batches hold the remaining images in random order, so some batches might contain more images from one class than another. Taken together, however, the training batches contain exactly 5,000 images from each class. The CIFAR-10 dataset is preferred for its ease of use as a starting point for solving image classification problems using CNN.

The test harness for a model on this dataset can be kept modular and built from five elements: dataset loading, dataset preparation, model definition, model evaluation, and result presentation. The example below loads the CIFAR-10 dataset using the Keras API and plots the first nine images in the training dataset:
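
The article's own listing is missing from this copy, so the following is a sketch of what such an example could look like. It assumes TensorFlow (for its bundled Keras API) and matplotlib are installed; the function name `show_first_nine` is made up for this example. The imports sit inside the function because both packages are third-party and the first call downloads the dataset.

```python
# Class names in CIFAR-10 label order (label 0 is "airplane", 9 is "truck").
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def show_first_nine():
    """Load CIFAR-10 through Keras, print array shapes, and plot nine images."""
    from tensorflow.keras.datasets import cifar10
    import matplotlib.pyplot as plt

    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    print(f"Train: X={train_x.shape}, y={train_y.shape}")
    print(f"Test: X={test_x.shape}, y={test_y.shape}")

    # Draw the first nine training images in a 3x3 grid.
    for i in range(9):
        plt.subplot(3, 3, i + 1)
        plt.imshow(train_x[i])
        plt.title(CIFAR10_CLASSES[int(train_y[i][0])])
        plt.axis("off")
    plt.show()
```

Calling `show_first_nine()` should report 50,000 training images and 10,000 test images, each of shape 32×32×3.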


Running the example loads the CIFAR-10 dataset and prints the shape of its train and test arrays.

3. ImageNet

ImageNet aims to categorize and label images into nearly 22,000 categories based on predefined words and phrases. To do this, it follows the WordNet hierarchy, where every meaningful concept, described by one or more words or phrases, is called a synonym set (or synset, in short). In ImageNet, all images are organized according to these synsets, with the aim of having over a thousand images per synset.

However, when ImageNet is referred to in computer vision and deep learning, what is actually meant is the ImageNet Large Scale Visual Recognition Challenge, or ILSVRC. The goal here is to train a model that classifies an image into one of 1,000 different categories. The training dataset contains around 1.2 million images, and the model is evaluated on about 100,000 test images.

Perhaps the greatest challenge here is scale: ImageNet images are far larger than those in MNIST or CIFAR-10 (models commonly take 224×224 inputs), and there are over a million of them, so processing such a large amount of data requires massive CPU, GPU, and RAM capacity. This might prove impossible for an average laptop, so how does one overcome this problem?

One way of doing this is to use Imagenette, a small subset of classes extracted from ImageNet that doesn’t require too many resources. This dataset has two folders named ‘train’ (training) and ‘val’ (validation) with individual folders for each class. All these classes have the same ID as in the original dataset, with each of the classes having around 1,000 images, so the whole set up is pretty balanced.

Another option is to use transfer learning, a method that reuses weights pre-trained on large datasets. This is a very effective way of doing image classification using CNN because it lets us produce models that work well without training from scratch. The one thing an image classification model should be able to do is group images belonging to the same class and distinguish them from those that are different, and the feature detectors learned on a large dataset already do much of this work. This is where we can make use of the pre-trained weights. The advantage here is that we can use different methods depending on the kind of dataset we’re working with.
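
One possible way to set this up with the Keras API bundled in TensorFlow is sketched below. The article does not name a specific network, so the choice of MobileNetV2, the frozen base, and the helper name `build_transfer_model` are assumptions made for illustration. TensorFlow is imported inside the function because it is a third-party dependency and the pre-trained weights are downloaded on first use.

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3)):
    """Return a classifier that reuses ImageNet-pretrained convolutional features."""
    from tensorflow import keras  # requires TensorFlow

    base = keras.applications.MobileNetV2(
        weights="imagenet",   # pre-trained weights
        include_top=False,    # drop the original 1,000-class output layer
        input_shape=input_shape,
    )
    base.trainable = False    # freeze the pre-trained feature detectors

    # Only the small classification head on top is trained on the new data.
    model = keras.Sequential([
        base,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

For a ten-class dataset such as Imagenette, `build_transfer_model(10)` would give a model ready for `fit()`. Input images would first need the scaling the chosen base network expects (for MobileNetV2, `keras.applications.mobilenet_v2.preprocess_input`).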

Summing up

To sum up, image classification using CNN has made the process easier, more accurate, and less process-heavy. If you’d like to delve deeper into machine learning, upGrad has a range of courses that help you master it like a pro! 

upGrad offers various courses online with a wide range of subcategories; visit the official site for further information.      

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
Frequently Asked Questions (FAQs)

1. What are convolutional neural networks?

Convolutional neural networks (CNNs), or convnets, are a category of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. The design of CNNs is loosely inspired by the organization of the mammalian visual cortex, although they have also been applied to audio, speech, and other domains. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. This makes them less dependent on hand-engineered features and more portable to a diverse set of problems.

2. Why are convolutional neural networks good for image classification?

CNNs suit image classification because they learn their filters directly from pixel data, need very little pre-processing, and use pooling to recognize objects wherever they appear in the image. They do have limitations: a CNN has only a limited grasp of the wider context of an image, and the learning techniques used in neural networks are not, by themselves, sufficient to reproduce higher cognitive functions such as spatial awareness and the ability to transfer experience.

3. Why is CNN best for image classification?
