
Image Classification in CNN: Everything You Need to Know

Last updated:
25th Feb, 2021


While scrolling through your Facebook feed, have you ever wondered how the people in a group photo are labelled automatically? Behind the interface is an algorithm that recognizes and labels the faces in each picture uploaded to the platform, and every picture we upload helps improve its accuracy. Image Classification is one of the most widely used applications of Artificial Intelligence.


In recent times, Convolutional Neural Networks (CNNs) have become one of the most prominent Deep Learning architectures. One popular application of these networks is Image Classification. In this tutorial, we will go through the basics of Convolutional Neural Networks, look at the layers involved in building a CNN model and finally walk through an example Image Classification task.

Image Classification 

Before we get into the details of Deep Learning and Convolutional Neural Networks, let us understand the basics of Image Classification. In general, Image Classification is the task in which we give an image as input to a model, and the model outputs the class the image belongs to or the probability of each class. Because the model learns from images that have already been labelled with their classes, this is a Supervised Learning task.


There is a huge difference between how we see an image and how a machine (computer) sees the same image. We can look at an image and characterize it by its colours, shapes and sizes. The machine, on the other hand, sees only numbers: the intensity values of the image's pixels.

In a standard 8-bit image, each pixel value lies between 0 and 255. To work with this numerical data, the machine needs some processing steps to derive the patterns or features that distinguish one image from another. Convolutional Neural Networks help us build algorithms that are capable of deriving these patterns from images.

What We See Vs What the Computer Sees

Source: Difference between Computer and Human Eye

Deep Learning for Image Classification 

Now that we have understood what Image Classification is, let us see how we can implement it using Artificial Intelligence. For this, we use Deep Learning methods. Deep Learning is a subset of Machine Learning, and therefore of Artificial Intelligence, that uses neural networks with many layers. Given a large image dataset, such a network learns patterns from the images that differentiate the classes present in the dataset.

The major challenge with Deep Learning is that training on a huge dataset takes a long time and has a high computational cost. Convolutional Neural Networks, a type of Deep Learning model, address this problem well for images because they share filter weights across the image and so need far fewer parameters than fully connected networks.

Convolutional Neural Networks 

In Deep Learning, Convolutional Neural Networks are a class of Deep Neural Networks that are mostly used for analysing visual imagery. They are a special architecture of Artificial Neural Networks (ANN), popularized by Yann LeCun and his colleagues with the LeNet-5 network in 1998. A Convolutional Neural Network consists of two parts.

The first part consists of the Convolutional layers and the Pooling layers, in which the main feature extraction takes place. In the second part, the Fully Connected (Dense) layers perform several non-linear transformations on the extracted features and act as the classifier.

Consider the image example shown above of what the human and the machine see. The computer sees an array of pixel values. For example, if the image size is 500×500, then the size of the array will be 500×500×3. Here, 500 stands for both the height and the width, and 3 stands for the RGB channels, where each colour channel is represented by a separate 500×500 array. Each pixel intensity varies from 0 to 255.
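To make the array view concrete, here is a minimal sketch in plain Python. The tiny 2×2 image, its pixel values and the helper names `image_shape` and `value_count` are illustrative assumptions and not part of the original tutorial; a real 500×500 RGB image has the same structure, only larger.

```python
# A tiny 2x2 RGB "image": rows -> pixels -> [R, G, B] intensities (0-255).
# The pixel values are made up for illustration.
tiny_image = [
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
]

def image_shape(image):
    """Return (height, width, channels) of a nested-list image."""
    return (len(image), len(image[0]), len(image[0][0]))

def value_count(height, width, channels):
    """Total number of intensity values stored for an image of this shape."""
    return height * width * channels

print(image_shape(tiny_image))   # (2, 2, 3)
print(value_count(500, 500, 3))  # 750000
```

So a single 500×500 colour image already amounts to 750,000 numbers, which is why extracting compact features matters.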

Now for Image Classification, the computer looks for features at the base level. For us humans, the base-level features of the cat are its ears, nose and whiskers. For the computer, the base-level features are curves and boundaries (edges). By stacking several layers such as the Convolutional layers and the Pooling layers, the computer extracts these base-level features from the images and combines them into higher-level ones.

A Convolutional Neural Network model is built from several types of layers and components:

  • Input Layer
  • Convolutional Layer
  • Pooling Layer
  • Fully Connected Layer
  • Output Layer
  • Activation Functions

Let us go through each of these in brief before we get into their application in Image Classification.

Input Layer  

As the name suggests, this is the layer through which the input image is fed into the CNN model. Depending on our requirement, we can reshape the image to a fixed size such as (28,28,3), meaning 28×28 pixels with 3 colour channels.

Convolutional Layer 

Then comes the most important layer, which consists of one or more filters (also known as kernels) of a fixed size. The filter slides over the input image, and at each position the mathematical operation of convolution (an element-wise multiplication followed by a sum) is performed between the filter and the patch of the image beneath it. This is the stage in which most of the base features, such as sharp edges and curves, are extracted from the image, and hence this layer is also known as the feature extractor layer.
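The sliding-filter idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not how Keras implements it: the function name `conv2d`, the toy image and the hand-picked vertical-edge kernel are all hypothetical, and there is no padding, stride or bias. Like most deep learning libraries, the sketch does not flip the kernel (strictly speaking it computes cross-correlation).

```python
def conv2d(image, kernel):
    """Slide a square kernel over a 2D image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch beneath it and sum the products
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 4x6 image: dark left half (0), bright right half (9)
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
# Hand-picked kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is zero over the flat regions and large only where the window straddles the edge, which is what "extracting an edge feature" means in practice. In a trained CNN the kernel values are learned rather than hand-picked.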

Pooling Layer  

After the convolution operation, we perform the Pooling operation. This is also known as downsampling, where the spatial size of the feature map is reduced. For example, if we perform a Pooling operation with a 2×2 window and a stride of 2 on an image with dimensions 28×28, then the output size is 14×14: the height and the width are each reduced to half.
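As a rough sketch of downsampling, the snippet below implements max pooling in plain Python. Max pooling is one common variant (the LeNet model later in this article uses average pooling instead); the function name `max_pool2d` and the toy feature map are assumptions made for illustration.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the largest value in each pool_size x pool_size window."""
    pooled = []
    for i in range(0, len(feature_map) - pool_size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - pool_size + 1, stride):
            window = [
                feature_map[i + di][j + dj]
                for di in range(pool_size)
                for dj in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

toy_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]
print(max_pool2d(toy_map))  # [[6, 4], [7, 9]]

# A 28x28 input shrinks to 14x14 with a 2x2 window and stride 2
big_map = [[0] * 28 for _ in range(28)]
small_map = max_pool2d(big_map)
print(len(small_map), len(small_map[0]))  # 14 14
```

Replacing `max(window)` with `sum(window) / len(window)` would give average pooling with the same output size.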

Fully Connected Layer  

The Fully Connected (FC) layers are placed just before the final classification output of the CNN model. The feature maps from the previous layers are first flattened into a one-dimensional vector, which is then passed through the FC layers. Each FC layer is made up of neurons with weights and biases. The last FC layer produces an N-dimensional vector, where N is the number of classes from which the model has to choose.
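A minimal sketch of flattening followed by one fully connected layer is shown below. The helper names `flatten` and `dense`, and the weights and biases, are hypothetical values chosen so the arithmetic is easy to follow; in a real model they are learned during training.

```python
def flatten(feature_maps):
    """Flatten a list of 2D feature maps into one 1D feature vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(x * w for x, w in zip(inputs, neuron_weights)) + b
        for neuron_weights, b in zip(weights, biases)
    ]

feature_maps = [[[1, 2], [3, 4]]]  # a single 2x2 feature map
features = flatten(feature_maps)
print(features)  # [1, 2, 3, 4]

# Hypothetical parameters for N = 2 classes: one row of weights per class
weights = [[1, 0, 0, 1], [0, 1, 1, 0]]
biases = [0.5, -0.5]
scores = dense(features, weights, biases)
print(scores)  # [5.5, 4.5]
```

The result is an N-dimensional vector of class scores (here N = 2), which an activation function such as Softmax can then turn into probabilities.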

Output Layer 

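Finally, the Output Layer produces one value per class, and the class with the highest value is the predicted label. During training, these outputs are compared with the true labels, which are usually encoded using the one-hot encoding method.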
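One-hot encoding can be sketched as follows. The helper names `one_hot` and `decode` are assumptions for illustration; Keras ships its own utility for the encoding step, so this is only meant to show what the encoding looks like.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a vector with a single 1."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector

def decode(scores):
    """Return the index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda i: scores[i])

print(one_hot(3))          # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(decode(one_hot(3)))  # 3
```

With 10 classes, every label becomes a vector of length 10, matching the 10 output units of the network.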

Activation Function 

Activation Functions are at the core of any Convolutional Neural Network model. An activation function determines the output of a neuron; in short, it decides whether a particular neuron should be activated ("fired") or not. These are usually non-linear functions applied to the neuron's input signal. The transformed output is then sent as input to the next layer of neurons. Common activation functions include Sigmoid, ReLU, Leaky ReLU, TanH and Softmax.
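For reference, the functions named above can be written in plain Python roughly as follows. This is a sketch using the standard textbook definitions; the `alpha=0.01` slope for Leaky ReLU is a commonly used default and an assumption here, not something the article specifies.

```python
import math

def relu(x):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negatives keep a small slope alpha."""
    return x if x > 0 else alpha * x

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0))    # 0.0
print(sigmoid(0.0))  # 0.5
print(softmax([1.0, 2.0, 3.0]))
```

ReLU and its variants are typically used in the hidden layers, while Softmax is used in the output layer of a multi-class classifier because its outputs can be read as class probabilities.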

Basic CNN Architecture 

Source: Basic CNN Architecture

The diagram shown above is the basic architecture of a Convolutional Neural Network model. Now that we are ready with the basics of Image Classification and CNN, let us dive into its application with a real-world problem.

Convolutional Neural Networks Implementation 

Now that we have understood the basics of Image Classification and Convolutional Neural Networks, let us walk through an implementation in TensorFlow/Keras with Python. We shall build a simple Convolutional Neural Network model with a basic LeNet architecture, train the model on the training set and finally obtain the accuracy of the model on the test set.

Problem Set  

To build and train the Convolutional Neural Network model in this article, we shall use the well-known Fashion MNIST dataset. MNIST stands for Modified National Institute of Standards and Technology. Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes.

Each training and test example is assigned to one of the following labels:

0 – T-shirt/top

1 – Trouser

2 – Pullover

3 – Dress

4 – Coat

5 – Sandal

6 – Shirt

7 – Sneaker

8 – Bag

9 – Ankle boot

Source: Fashion MNIST Dataset Images

Program Code 

Step 1 – Importing the Libraries 

The first step in building any Deep Learning model is to import the libraries that the program needs. In our example, as we are using the TensorFlow framework, we shall import the Keras library along with other important libraries such as NumPy for numerical calculation and matplotlib for plotting.

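#TensorFlow – Importing the Libraries
import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic that renders plots inline; omit it when running as a plain script
%matplotlib inline
import tensorflow as tf
# Keras API bundled with TensorFlow
from tensorflow import keras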

Step 2 – Getting and Splitting the Dataset

Once we have imported the libraries, the next step is to download the Fashion MNIST dataset and split it into the respective 60,000 training and 10,000 test examples. Fortunately, Keras provides a predefined function to load the Fashion MNIST dataset, which returns the training and test splits as two (images, labels) tuples.

#TensorFlow – Getting and Splitting the Dataset

fashion_mnist = keras.datasets.fashion_mnist

(train_images_tf, train_labels_tf), (test_images_tf, test_labels_tf) = fashion_mnist.load_data()

Step 3 – Visualizing the Data

Once the dataset is downloaded along with the images and their corresponding labels, it is always advisable to view the data so that we understand the type of data we are dealing with and can build the Convolutional Neural Network model accordingly. With the simple block of code given below, we shall visualize 3 images of the training dataset together with their labels.

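#TensorFlow – Visualizing the Data
def imshowTensorFlow(img, label):
  plt.imshow(img, cmap='gray')  # img is a 28x28 array of pixel intensities
  plt.show()
  print("Label:", label)

# Assumption: the original call is not shown; here we display the first three training images
for i in range(3):
  imshowTensorFlow(train_images_tf[i], train_labels_tf[i])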


Label: 9              Label: 0          Label: 3

The images and their labels can be verified against the label list given in the Fashion MNIST dataset details above. From this, we see that each image is a grayscale image with a height of 28 pixels and a width of 28 pixels.

Hence, the model can be built with an input size of (28,28,1), where 1 stands for the single grayscale channel.

Step 4 – Building the Model

As mentioned above, in this article we will be building a simple Convolutional Neural Network with the LeNet architecture. LeNet is a convolutional neural network structure proposed by Yann LeCun et al. in 1989. In general, LeNet refers to LeNet-5 and is a simple Convolutional Neural Network.

Source: The LeNet Architecture

From the architecture diagram of the LeNet CNN model given above, we see that there are 5+2 layers. The first and second layers are a Convolutional layer followed by a Pooling layer. The third and fourth layers again consist of a Convolutional layer and a Pooling layer. The convolutions here use "same" padding and keep the size unchanged, while each Pooling layer halves the height and width, so the 28×28 input is reduced to 14×14 and then to 7×7.
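The size reduction can be checked with a little arithmetic. The sketch below assumes the settings used in this tutorial's model code (5×5 kernels with stride 1 and "same" padding, 2×2 pooling with stride 2); the helper names are hypothetical, and the formulas are the standard output-size rules for "same" and "valid" padding.

```python
import math

def conv_output_size(size, kernel_size, stride=1, padding="same"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel_size) // stride + 1  # "valid": no padding

def pool_output_size(size, pool_size=2, stride=2):
    """Spatial output size of a pooling layer without padding."""
    return (size - pool_size) // stride + 1

def trace_sizes(size, padding):
    """Sizes after each of conv -> pool -> conv -> pool."""
    sizes = [size]
    for _ in range(2):
        size = conv_output_size(size, kernel_size=5, padding=padding)
        sizes.append(size)
        size = pool_output_size(size)
        sizes.append(size)
    return sizes

print(trace_sizes(28, "same"))   # [28, 28, 14, 14, 7]
print(trace_sizes(28, "valid"))  # [28, 24, 12, 8, 4]
```

The second line shows why the padding choice matters: without padding, the same stack would shrink a 28×28 input to 4×4 instead of 7×7.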

Next, the output of the last Pooling layer is flattened into a one-dimensional vector and passed through two Fully Connected (Dense) layers with 120 and 84 units. The final output layer of the CNN model has 10 units with a Softmax activation function, which predicts a class probability for each of the 10 classes of the Fashion MNIST dataset.

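#TensorFlow – Building the Model
model = keras.Sequential([
    # Block 1: 6 filters of size 5x5, then 2x2 average pooling (28x28 -> 14x14)
    keras.layers.Conv2D(input_shape=(28,28,1), filters=6, kernel_size=5, strides=1, padding="same", activation=tf.nn.relu),
    keras.layers.AveragePooling2D(pool_size=2, strides=2),
    # Block 2: 16 filters of size 5x5, then 2x2 average pooling (14x14 -> 7x7)
    keras.layers.Conv2D(16, kernel_size=5, strides=1, padding="same", activation=tf.nn.relu),
    keras.layers.AveragePooling2D(pool_size=2, strides=2),
    # Flatten the 7x7x16 feature maps into a vector for the Dense layers
    keras.layers.Flatten(),
    keras.layers.Dense(120, activation=tf.nn.relu),
    keras.layers.Dense(84, activation=tf.nn.relu),
    # One output unit per class
    keras.layers.Dense(10, activation=tf.nn.softmax)
])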

Step 5 – Model Summary

Once the layers of the LeNet model are finalized, we can proceed to compile the model and view a summarized version of the CNN model we have designed.

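#TensorFlow – Model Summary
# Settings taken from the description in this step: Adam optimizer and categorical cross-entropy loss.
# The accuracy metric is an assumption, inferred from the "acc" value in the training log.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
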
Since the output has more than two classes (10 classes), we use categorical cross-entropy as the loss function and Adam as the optimizer. The model summary lists each layer with its output shape and number of trainable parameters.
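As a rough cross-check of what such a summary should contain, the trainable parameters can be counted by hand. The sketch below assumes the layer sizes from Step 4 (5×5 kernels, 6 and 16 filters, a 7×7×16 flattened input, Dense layers of 120, 84 and 10 units); the helper names are hypothetical. Pooling and flattening have no trainable parameters.

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

layer_params = {
    "conv1": conv_params(5, 1, 6),            # 156
    "conv2": conv_params(5, 6, 16),           # 2416
    "dense1": dense_params(7 * 7 * 16, 120),  # 94200
    "dense2": dense_params(120, 84),          # 10164
    "output": dense_params(84, 10),           # 850
}
total_params = sum(layer_params.values())
print(total_params)  # 107786
```

Most of the parameters sit in the first Dense layer, which is typical: the convolutional layers are cheap because each filter is reused across the whole image.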

Step 6 – Training the Model 

Finally, we come to the part where we begin training the LeNet CNN model. First, we reshape the training images to (28, 28, 1) and scale the pixel values to the range 0 to 1 by dividing by 255.0, which helps the training converge. Then the training labels are converted from an integer class vector to a binary class matrix (one-hot encoding). For example, label 3 is converted to [0, 0, 0, 1, 0, 0, 0, 0, 0, 0].

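#TensorFlow – Training the Model
# Scale pixels to [0, 1] and add the channel dimension expected by Conv2D
train_images_tensorflow = (train_images_tf / 255.0).reshape(train_images_tf.shape[0], 28, 28, 1)
test_images_tensorflow = (test_images_tf / 255.0).reshape(test_images_tf.shape[0], 28, 28, 1)
# One-hot encode the integer labels, as described above (to_categorical is an assumed choice of helper)
train_labels_tensorflow = keras.utils.to_categorical(train_labels_tf)
# Train for 30 epochs; H holds the per-epoch loss and accuracy history
H = model.fit(train_images_tensorflow, train_labels_tensorflow, epochs=30, batch_size=32)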

At the end of training after 30 epochs, we obtain the final training accuracy and loss as,

Epoch 30/30

1875/1875 [==============================] - 4s 2ms/step - loss: 0.0421 - acc: 0.9850

Training Accuracy:  98.294997215271 %

Training Loss:  0.04584110900759697

Step 7 – Predicting the Results

Finally, once we are done with training the CNN model, we use the same model to predict the classes of the 10,000 test images and compute its accuracy on them.

#TensorFlow – Comparing the Results
# Each prediction is a vector of 10 class probabilities
predictions = model.predict(test_images_tensorflow)
correct = 0
for i, pred in enumerate(predictions):
  # The predicted class is the index of the highest probability
  if np.argmax(pred) == test_labels_tf[i]:
    correct += 1
print('Test Accuracy of the model on the {} test images: {}% with TensorFlow'.format(test_images_tf.shape[0], 100 * correct/test_images_tf.shape[0]))


The output that we get is,

Test Accuracy of the model on the 10000 test images: 90.67% with TensorFlow


With this, we come to the end of the program for building an Image Classification model with Convolutional Neural Networks.



Thus, in this tutorial on implementing Image Classification with a CNN, we have covered the basic concepts behind Image Classification and Convolutional Neural Networks, along with an implementation in Python with the TensorFlow framework.



Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. Which CNN model is considered to be the most optimum for image classification?

No single CNN model is best for every image classification task, but VGG-16 is a popular and strong choice. It was introduced in the paper Very Deep Convolutional Networks for Large-Scale Image Recognition by the Visual Geometry Group (VGG) at the University of Oxford. VGG, which was designed as a deep CNN, outperforms baselines on a wide range of tasks and datasets outside of ImageNet. The model's distinguishing feature is that its design relies on stacking convolution layers with small 3×3 filters rather than on tuning a large number of hyperparameters. It has 16 weight layers (13 convolutional and 3 fully connected) arranged in 5 convolutional blocks, each ending with a max pooling layer, making it a fairly large network.

2. What are the disadvantages of using CNN models for image classification?

CNN models are highly successful at image classification. However, there are several drawbacks to employing CNNs. If the object in the picture is tilted or rotated, a CNN may have trouble identifying it accurately unless it was trained on similar examples. A CNN also has no explicit internal representation of an object's components and their part-whole relationships. Furthermore, if the CNN model includes numerous convolutional layers, training and classification take longer.

3. Why is the use of the CNN model preferred over the ANN for image data as input?

By stacking filters, a CNN can learn many layers of feature representations for every input image. Overfitting is reduced since the number of parameters to learn in a CNN is substantially smaller than in a fully connected multilayer network. A plain ANN treats the image as a flat vector of pixels, so for complex images it tends to classify poorly because it cannot exploit the spatial dependencies between neighbouring pixels in the input image.
