Handwritten Digit Recognition with CNN Using Python

By Rohit Sharma

Updated on Jul 28, 2025 | 9 min read | 1.65K+ views


Recognizing handwritten digits is a long-standing benchmark problem in machine learning and computer vision. While humans identify digits easily, computers struggle with the wide variety of handwriting styles.

In this project, you’ll solve the handwritten digit recognition problem using the MNIST dataset (available on Kaggle and built into Keras), which contains 70,000 grayscale images of digits from 0 to 9, each 28×28 pixels. You will build a Convolutional Neural Network (CNN) to learn and identify these digits accurately.


What You Should Know Before You Begin

It’s good to have some basic knowledge of the following before starting this handwritten digit recognition project:

  • Python programming (variables, functions, loops, basic syntax)
  • Pandas and NumPy (for handling and analyzing data)
  • Matplotlib or Seaborn (for creating charts and visualizing trends)
  • TensorFlow or PyTorch: Build, train, and evaluate Convolutional Neural Networks (CNNs) using these deep learning libraries.
  • Convolutional Neural Networks (CNNs): Know the basic structure of CNNs, including layers like Conv2D, Pooling, and Dense layers.
  • Model optimization techniques: Familiarity with data augmentation, overfitting prevention, and hyperparameter tuning will help boost accuracy.


Project Timeline & Difficulty Level

  • Time required: Around 2 to 3 hours
  • Difficulty level: Moderate

This project is ideal if you're confident with Python and want hands-on experience applying deep learning to real-world image data. You’ll learn how to build and optimize a CNN for handwritten digit recognition using Machine Learning.

Tools/Libraries Used in This Project: 

We will use the following tools and Python libraries to build and evaluate the Handwritten Digit Recognition system:

Tool / Library         | Purpose
Python                 | The core programming language used to write and execute the model pipeline
NumPy                  | Efficient array operations and numerical computing
TensorFlow / PyTorch   | Building, training, and testing Convolutional Neural Networks (CNNs)
Google Colab / Jupyter | An interactive environment to run code and visualize model results


How the Model Recognizes Handwritten Digits

To identify handwritten digits accurately, you'll apply deep learning techniques alongside image processing steps. Here’s what the project focuses on:

  • Convolutional Neural Networks (CNNs) – Extract spatial patterns from handwritten images and classify digits from 0 to 9.
  • Image Preprocessing – Apply steps like grayscale conversion, normalization, and reshaping to prepare images for the model.
  • Data Augmentation – Improve model generalization by slightly modifying training images (rotation, zoom, etc.).
  • Model Evaluation – Assess accuracy, confusion matrix, and loss curves to fine-tune performance and reduce errors.
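
To make the first point concrete, here is a minimal NumPy sketch of what a single convolution filter followed by ReLU and 2×2 max pooling computes. The function names (`conv2d_valid`, `relu`, `max_pool_2x2`), the 6×6 toy image, and the hand-picked edge kernel are illustrative assumptions; a real CNN learns its kernels during training, and Keras implements these operations far more efficiently.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for r in range(out_h):
        for c in range(out_w):
            # Element-wise product of the window and the kernel, then sum
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    """Keep positive responses, zero out negative ones."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """Take the maximum of each non-overlapping 2x2 block."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Toy 6x6 "image": dark left half, bright right half (a vertical edge)
toy_image = np.zeros((6, 6), dtype=np.float32)
toy_image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges
edge_kernel = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]], dtype=np.float32)

feature_map = relu(conv2d_valid(toy_image, edge_kernel))  # shape (4, 4)
pooled = max_pool_2x2(feature_map)                         # shape (2, 2)
print(feature_map[0])   # strongest response where the edge sits
print(pooled)
```

The feature map is largest exactly where the window straddles the edge, which is the sense in which a convolution layer "extracts spatial patterns".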


From Pixels to Predictions: Building a Handwritten Digit Recognition Model

This section walks you through each step of building a CNN-based handwritten digit recognition model from the ground up:

  • Load and explore the dataset
  • Preprocess the image data
  • Apply data augmentation
  • Build a Convolutional Neural Network (CNN)
  • Train and test the model
  • Evaluate model performance

Without any further delay, let’s get started!

Step 1: Download the Dataset

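Download the digit dataset from Kaggle by searching for "Handwritten digit recognition using Machine Learning", downloading the ZIP file, and extracting the CSV files. The code in Step 2 loads the MNIST digits through Keras's built-in loader, so this download is only needed if you prefer to work from the CSV files.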

Now, after downloading the dataset, move to the next step.

Step 2: Import Libraries and Load the Dataset in Any IDE

If you downloaded the files and are working in Google Colab, upload them using the code below:

from google.colab import files
uploaded = files.upload()

We start by importing important libraries and loading the dataset.


# Import deep learning and utility libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import cv2
from sklearn.model_selection import train_test_split

# Load the built-in MNIST dataset from TensorFlow
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Display the shape of datasets to understand the structure
print(f"Training data shape: {x_train.shape}")
print(f"Training labels shape: {y_train.shape}")
print(f"Test data shape: {x_test.shape}")

Output:

Training data shape: (60000, 28, 28)
Training labels shape: (60000,)
Test data shape: (10000, 28, 28)


Step 3: Visualize and Preprocess the Data

Before training our model, it's helpful to visualize a few handwritten digits and then preprocess the data so it's ready for input into a Convolutional Neural Network (CNN).

Here is the code for this step:

# Visualize 12 sample images from the training set
plt.figure(figsize=(12, 8))
for i in range(12):
    plt.subplot(3, 4, i+1)
    plt.imshow(x_train[i], cmap='gray')         # Show grayscale image
    plt.title(f'Label: {y_train[i]}')           # Display the digit label
    plt.axis('off')                              # Hide axis for better clarity
plt.tight_layout()
plt.show()

# Define a function to preprocess the data
def preprocess_data(x_train, x_test, y_train, y_test):
    # Scale pixel values to range 0–1
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0

    # Reshape input data to fit CNN: (samples, height, width, channels)
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

    # Convert labels to one-hot encoded format (for multiclass classification)
    y_train = keras.utils.to_categorical(y_train, 10)
    y_test = keras.utils.to_categorical(y_test, 10)

    return x_train, x_test, y_train, y_test

# Apply preprocessing to the data
x_train, x_test, y_train, y_test = preprocess_data(x_train, x_test, y_train, y_test)

# Check new shape after preprocessing
print(f"Processed training data shape: {x_train.shape}")

Output: 

Processed training data shape: (60000, 28, 28, 1)
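
The three preprocessing steps can be checked on a tiny array before running them on the full dataset. The sketch below is a NumPy-only illustration. The `toy_images` and `toy_labels` arrays are made up, and `np.eye(10)[labels]` is used as a stand-in that should produce the same one-hot layout as `keras.utils.to_categorical(labels, 10)`.

```python
import numpy as np

# Hypothetical stand-in for two 28x28 uint8 images and their labels
toy_images = np.zeros((2, 28, 28), dtype=np.uint8)
toy_images[0, 10:18, 10:18] = 255   # a bright square in the first image
toy_labels = np.array([3, 7])

# 1. Scale pixel values from the 0-255 range to the 0-1 range
scaled = toy_images.astype("float32") / 255.0

# 2. Add a trailing channel axis: (samples, height, width, channels)
reshaped = scaled.reshape(scaled.shape[0], 28, 28, 1)

# 3. One-hot encode the labels (NumPy stand-in for to_categorical)
one_hot = np.eye(10, dtype="float32")[toy_labels]

print(reshaped.shape, reshaped.min(), reshaped.max())
print(one_hot[0])   # a single 1.0 at index 3, zeros elsewhere
```

Scaling to 0–1 keeps the inputs in a small, consistent range for gradient-based training, and the extra channel axis is what `Conv2D` expects for grayscale images.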

Step 4: Build a Convolutional Neural Network (CNN)

Now we’ll define a basic CNN architecture to classify handwritten digits. The model consists of convolutional and pooling layers followed by dense layers for classification.

Here is the code for building a CNN model:

# Define a function to create a basic CNN model
def create_basic_cnn():
    model = keras.Sequential([
        # First Convolution + Pooling
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        layers.MaxPooling2D((2, 2)),

        # Second Convolution + Pooling
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),

        # Third Convolution layer (no pooling)
        layers.Conv2D(64, (3, 3), activation='relu'),

        # Flatten the feature maps and add Dense layers
        layers.Flatten(),
        layers.Dense(64, activation='relu'),     # Fully connected layer
        layers.Dropout(0.5),                     # Dropout to reduce overfitting
        layers.Dense(10, activation='softmax')   # Output layer for 10 digit classes
    ])
    return model

# Create the CNN model
model = create_basic_cnn()

# Compile the model with optimizer, loss function, and evaluation metric
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Print a summary of the model architecture
model.summary()

Output: 

Layer Type     | Output Shape       | Parameters
Conv2D         | (None, 26, 26, 32) | 320
MaxPooling2D   | (None, 13, 13, 32) | 0
Conv2D         | (None, 11, 11, 64) | 18,496
MaxPooling2D   | (None, 5, 5, 64)   | 0
Conv2D         | (None, 3, 3, 64)   | 36,928
Flatten        | (None, 576)        | 0
Dense          | (None, 64)         | 36,928
Dropout        | (None, 64)         | 0
Dense (Output) | (None, 10)         | 650

Total params: 93,322 (364.54 KB)

Trainable params: 93,322 (364.54 KB)

Non-trainable params: 0 (0.00 B)
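
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below is plain-Python arithmetic, assuming the Keras defaults used above (no padding and stride 1 for the convolutions, and 2×2 pooling that floors odd sizes). The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling (odd sizes are floored)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units

size = conv_out(28)        # 26
size = pool_out(size)      # 13
size = conv_out(size)      # 11
size = pool_out(size)      # 5
size = conv_out(size)      # 3
flat = size * size * 64    # 576 values after Flatten

params = [
    conv_params(1, 32),       # 320
    conv_params(32, 64),      # 18,496
    conv_params(64, 64),      # 36,928
    dense_params(flat, 64),   # 36,928
    dense_params(64, 10),     # 650
]
total = sum(params)
print(flat, params, total)             # total is 93,322
print(f"{total * 4 / 1024:.2f} KB")    # 4 bytes per float32 parameter
```

Pooling, Flatten, and Dropout contribute no parameters, which is why only five layers appear in the sum.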


Step 5: Train and Evaluate the CNN Model

Let’s now train the CNN model using the training dataset and visualize its accuracy and loss over the epochs.

Here is the code:

# Train the model on training data for 10 epochs
history = model.fit(
    x_train, y_train,                  # Input images and labels
    batch_size=128,                    # Number of samples processed before model update
    epochs=10,                         # Total training cycles
    validation_data=(x_test, y_test),  # Use test set for validation
    verbose=1                          # Display training progress
)

# Function to visualize training and validation performance
def plot_training_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Accuracy over epochs
    ax1.plot(history.history['accuracy'], label='Training Accuracy')
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
    ax1.set_title('Model Accuracy per Epoch')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Accuracy')
    ax1.legend()

    # Loss over epochs
    ax2.plot(history.history['loss'], label='Training Loss')
    ax2.plot(history.history['val_loss'], label='Validation Loss')
    ax2.set_title('Model Loss per Epoch')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.legend()

    plt.tight_layout()
    plt.show()

# Call the function to display accuracy and loss plots
plot_training_history(history)

Output: [Figure: training and validation accuracy and loss per epoch]

Conclusion: The model shows excellent performance, achieving over 98% accuracy with steadily decreasing loss, indicating strong learning and generalization.
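
One caveat about the training call in this step: it passes the test set as `validation_data`, so the validation curves and the final test score come from the same images. A common alternative is to carve a validation set out of the training data and keep the test set for the final evaluation only. The sketch below shows one way to do that with NumPy. The 10% fraction, the fixed seed, the `holdout_split` helper, and the random stand-in arrays are all assumptions for illustration.

```python
import numpy as np

def holdout_split(x, y, val_fraction=0.1, seed=42):
    """Shuffle the sample indices once and split off val_fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

# Random stand-ins with the same layout as the preprocessed MNIST arrays
x_demo = np.random.default_rng(0).random((1000, 28, 28, 1), dtype=np.float32)
y_demo = np.eye(10, dtype=np.float32)[np.arange(1000) % 10]

x_tr, y_tr, x_val, y_val = holdout_split(x_demo, y_demo)
print(x_tr.shape, x_val.shape)   # (900, 28, 28, 1) (100, 28, 28, 1)
```

With the real arrays, the result could be passed as `validation_data=(x_val, y_val)` in `model.fit`, leaving `x_test` and `y_test` untouched until the final evaluation.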

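Step 6: Improve Model with Data Augmentation and Early Stopping

To improve generalization and reduce overfitting, we add random rotations and zooms to the training images, insert batch normalization after each convolution, and stop training early once validation accuracy stops improving. `RandomRotation(0.1)` rotates each image by up to 10% of a full turn (±36°) and `RandomZoom(0.1)` zooms in or out by up to 10%. Both layers are active only during training and pass images through unchanged at inference time.

Here is the code: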


# Step 6: Define a CNN model with Data Augmentation and Batch Normalization
def create_augmented_model():
    model = keras.Sequential([
        # Data Augmentation Layers: Improve generalization by rotating and zooming
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),

        # Convolutional Block 1
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),

        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),

        # Convolutional Block 3
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.BatchNormalization(),

        # Fully Connected Layers
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),                     # Regularization
        layers.Dense(10, activation='softmax')   # Output for 10 digit classes
    ])
    return model

# Create the improved CNN model
improved_model = create_augmented_model()

# Compile with Adam optimizer and categorical cross-entropy loss
improved_model.compile(optimizer='adam',
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

# Step 6: Train the Model with Early Stopping to avoid overfitting
early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_accuracy',     # Watch validation accuracy
    patience=3,                 # Stop after 3 non-improving epochs
    restore_best_weights=True   # Revert to best weights
)

# Train the model
history_improved = improved_model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=15,
    validation_data=(x_test, y_test),
    callbacks=[early_stopping],  # Apply early stopping
    verbose=1
)
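
To see what `patience=3` and `restore_best_weights=True` imply, the following plain-Python sketch replays the stopping rule on a made-up list of validation accuracies. It is a simplified illustration, not Keras's actual implementation. The `simulate_early_stopping` helper and the accuracy values are hypothetical.

```python
def simulate_early_stopping(val_accuracies, patience=3):
    """Return (stop_epoch, best_epoch), both counted from 1.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation accuracy seen so far. The best epoch is the one whose weights
    would be restored.
    """
    best_value = float("-inf")
    best_epoch = 0
    waited = 0
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best_value:
            best_value, best_epoch, waited = value, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Ran out of epochs before the patience limit was reached
    return len(val_accuracies), best_epoch

# Hypothetical validation accuracies, for illustration only
demo_val_acc = [0.970, 0.981, 0.986, 0.990, 0.989, 0.990, 0.988, 0.991]
stop_epoch, best_epoch = simulate_early_stopping(demo_val_acc, patience=3)
print(stop_epoch, best_epoch)   # 7 4
```

In this made-up run, epoch 4 sets the best score, epochs 5 to 7 fail to beat it (a tie does not count as an improvement), so training stops after epoch 7 and the epoch 4 weights are kept. The later 0.991 is never reached, which is the trade-off a small patience value makes.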

Step 7: Model Evaluation and Testing

This section evaluates the improved CNN model on test data, makes predictions, and visualizes the results to show which digits were predicted correctly or incorrectly.

# Evaluate the improved model on the test set
test_loss, test_accuracy = improved_model.evaluate(x_test, y_test, verbose=0)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Generate predicted probabilities for each digit (0 to 9)
predictions = improved_model.predict(x_test)

# Get the predicted class (highest probability) for each test image
predicted_classes = np.argmax(predictions, axis=1)

# Get the actual class labels from one-hot encoded test labels
true_classes = np.argmax(y_test, axis=1)

# Function to visualize a few predictions from the test set
def visualize_predictions(x_test, true_classes, predicted_classes, num_images=12):
    plt.figure(figsize=(15, 10))

    # Loop through the number of images to display
    for i in range(num_images):
        plt.subplot(3, 4, i + 1)

        # Show the digit image in grayscale
        plt.imshow(x_test[i].reshape(28, 28), cmap='gray')

        # Set title color: green if correct, red if incorrect
        color = 'green' if true_classes[i] == predicted_classes[i] else 'red'

        # Show true and predicted labels
        plt.title(f'True: {true_classes[i]}, Pred: {predicted_classes[i]}', color=color)
        plt.axis('off')

    plt.tight_layout()
    plt.show()

# Call the function to display predictions
visualize_predictions(x_test, true_classes, predicted_classes)

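Output: [Figure: grid of 12 test digits titled with true and predicted labels, green when correct and red when incorrect]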

Conclusion: The final prediction grid shows that the model classifies most digits correctly, with only minor errors on visually similar digits.
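
The prediction grid only covers the first 12 test images. A confusion matrix summarizes every prediction at once and shows which digits get mistaken for which. The sketch below builds one with NumPy. The `confusion_matrix` helper is an illustrative implementation (libraries such as scikit-learn provide an equivalent), and the small label arrays are made up so the block runs on its own.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes=10):
    """Count predictions: rows are true digits, columns are predicted digits."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Hypothetical labels; in the project these would be true_classes and predicted_classes
demo_true = np.array([0, 1, 2, 3, 4, 4, 9, 9, 7, 7])
demo_pred = np.array([0, 1, 2, 3, 4, 9, 9, 4, 7, 1])

cm = confusion_matrix(demo_true, demo_pred)
overall_accuracy = np.trace(cm) / cm.sum()

# Per-digit recall: correct predictions divided by the number of true samples
row_totals = cm.sum(axis=1)
recall = np.divide(np.diag(cm), row_totals,
                   out=np.zeros(len(cm)), where=row_totals > 0)

print(f"Overall accuracy: {overall_accuracy:.2f}")   # 0.70 on this toy data
print("Recall for digit 4:", recall[4])              # 0.5
print("4 predicted as 9:", cm[4, 9])                 # 1
```

Calling `confusion_matrix(true_classes, predicted_classes)` on the real arrays from this step would give the full 10×10 table, where large off-diagonal entries point to the digit pairs the model confuses most.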

Final Conclusion: What We Learned from the Handwritten Digit Recognition Project

In this project, we built a complete pipeline for handwritten digit recognition using CNNs. We started by preprocessing the dataset, reshaping and normalizing image data, and converting labels to one-hot encoding.

Our best model achieved high accuracy on unseen test data, correctly identifying most digits. Visualization of predictions made the results easy to interpret. This project reinforced the practical use of CNNs in image classification tasks and highlighted key steps from data handling to model evaluation in computer vision workflows.


Colab Link:
https://colab.research.google.com/drive/1iRXiyBpNrxG2WLqwZHXmUi1RwnLVvj1C?usp=sharing


Rohit Sharma


Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
