How to Build a CNN Model for Sign Language MNIST Classification?
By Rohit Sharma
Updated on Aug 08, 2025 | 7 min read | 1.55K+ views
Sign language recognition plays a vital role in helping deaf and hard-of-hearing communities communicate. For this project, we'll perform sign language MNIST classification on a popular Kaggle image dataset. The dataset contains grayscale images of American Sign Language (ASL) hand gestures in the same format as the original MNIST digit dataset.
With 24 static sign classes (A-Y, excluding J and Z, which require motion), this classification task poses a real challenge for computer vision models. Our goal is to build a model that can reliably recognize ASL letters from raw pixel values using machine learning, specifically Convolutional Neural Networks (CNNs).
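Each image's label encodes the letter's position in the alphabet (A = 0 through Y = 24). A small lookup table, a hypothetical helper rather than anything shipped with the dataset, makes numeric predictions easier to read later on:

import string

# Map numeric labels to ASL letters: each label is the letter's alphabet index.
# Label 9 (J) never occurs, and label 25 (Z) does not exist, since both signs require motion.
label_to_letter = {i: string.ascii_uppercase[i] for i in range(25) if i != 9}

print(label_to_letter[3])   # D
print(label_to_letter[24])  # Y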
For more project ideas like this one, check out our blog post - Top 25+ Essential Data Science Projects GitHub to Explore in 2025.
Before starting this project, you should be familiar with the basics of convolutional neural networks (CNNs) and Python. Knowledge of NumPy arrays, image classification, and one-hot encoding will also come in handy, as will a general sense of how deep learning models represent and process image data.
We will use the following tools and libraries, all of which appear in the code below: Python, kagglehub (to download the dataset), pandas and NumPy (data handling), Matplotlib and Seaborn (visualization), TensorFlow/Keras (to build the CNN), and scikit-learn (evaluation metrics).
This project should take you 3-4 hours. It is perfect for beginner to intermediate learners exploring CNNs for the first time.
Let’s start building the project from scratch. So, without wasting any more time, let’s begin!
To begin, we will download the Kaggle dataset for American Sign Language. It contains images of hand signs with labels ranging from 0 to 24 (label 9, which corresponds to J, is absent because signing J requires motion).
Use the code below:
import kagglehub
# Download latest version
path = kagglehub.dataset_download("datamunge/sign-language-mnist")
print("Path to dataset files:", path)
Output:
Downloading from https://www.kaggle.com/api/v1/datasets/download/datamunge/sign-language-mnist?dataset_version_number=1...
100%|██████████| 62.6M/62.6M [00:02<00:00, 22.6MB/s]
Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1
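The cache directory shown above can vary across machines and dataset versions, so instead of hardcoding it, you can build the CSV paths from the returned path variable. A minimal sketch:

import os

# List the extracted files and build portable paths from `path`
print(os.listdir(path))

train_path = os.path.join(path, 'sign_mnist_train.csv')
test_path = os.path.join(path, 'sign_mnist_test.csv')

The next step hardcodes the printed paths for transparency; either approach works as long as the paths point to the extracted CSV files.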
To understand the structure of the data, we will load the training and test datasets with pandas and view the first few rows.
Use the code below:
import pandas as pd
# Define file paths
train_path = '/root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1/sign_mnist_train.csv'
test_path = '/root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1/sign_mnist_test.csv'
# Load the CSV files
train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)
# Display basic info and first few rows
print("Training data shape:", train_df.shape)
print("Testing data shape:", test_df.shape)
print("\nFirst few rows of training data:")
print(train_df.head())
Output:
Training data shape: (27455, 785)
Testing data shape: (7172, 785)
First few rows of training data:
   label  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  \
0      3     107     118     127     134     139     143     146     150
1      6     155     157     156     156     156     157     156     158
2      2     187     188     188     187     187     186     187     188
3      2     211     211     212     212     211     210     211     210
4     13     164     167     170     172     176     179     180     184

   pixel9  ...  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  \
0     153  ...       207       207       207       207       206       206
1     158  ...        69       149       128        87        94       163
2     187  ...       202       201       200       199       198        99
3     210  ...       235       234       233       231       230       226
4     185  ...        92       105       105       108       133       163

   pixel781  pixel782  pixel783  pixel784
0       206       204       203       202
1       175       103       135       149
2       198       195       194       195
3       225       222       229       163
4       157       163       164       179

[5 rows x 785 columns]
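Before preprocessing, it is worth confirming the label set: there should be 24 distinct values between 0 and 24, with 9 (J) missing. A quick check:

# Sanity check: which labels are present, and how many samples per class?
print(sorted(train_df['label'].unique()))   # expect 0-24 with 9 missing
print(train_df['label'].value_counts().sort_index())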
In this step, we normalize the pixel values to the interval [0, 1] and reshape each flat 784-pixel vector into a 28x28 grayscale image array. The labels are then one-hot encoded for training.
Use the code below:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
# Separate features (X) and labels (y)
X_train = train_df.iloc[:, 1:].values
y_train = train_df.iloc[:, 0].values
X_test = test_df.iloc[:, 1:].values
y_test = test_df.iloc[:, 0].values
# Normalize pixel values to [0,1]
X_train = X_train / 255.0
X_test = X_test / 255.0
# Reshape into 28x28 images with 1 channel (grayscale)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
# One-hot encode the labels
y_train_cat = to_categorical(y_train, num_classes=25)
y_test_cat = to_categorical(y_test, num_classes=25)
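A quick shape check, based on the dataset sizes printed earlier, helps catch reshaping mistakes before training:

# Images should be (N, 28, 28, 1); one-hot labels should be (N, 25)
print(X_train.shape, y_train_cat.shape)  # expected: (27455, 28, 28, 1) (27455, 25)
print(X_test.shape, y_test_cat.shape)    # expected: (7172, 28, 28, 1) (7172, 25)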
Before moving ahead, let's plot the first 12 sign language images to verify the preprocessing. Use the code below:
# Plot the first 12 training images with their labels
plt.figure(figsize=(12, 6))
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.suptitle("Sample Sign Language Images", fontsize=16)
plt.tight_layout()
plt.show()
Output:
To make sure the label arrays match the CNN's softmax output layer, we re-encode them to categorical format using to_categorical(). Note that this overwrites y_train and y_test in place with the same one-hot arrays we stored in y_train_cat and y_test_cat earlier, so the model can be trained with either pair. Use the code below:
from tensorflow.keras.utils import to_categorical
# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=25)
y_test = to_categorical(y_test, num_classes=25)
In this step, we build a sequential CNN that uses Conv2D and MaxPooling layers to extract features. Dense layers with dropout then help minimize overfitting, and a final softmax layer performs the classification.
Use the code below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# Initialize the CNN
model = Sequential()
# First convolutional block
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Second convolutional block
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten and fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3)) # Prevent overfitting
model.add(Dense(25, activation='softmax'))  # 25 output units for labels 0-24 (label 9/J never occurs in the data)
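Before compiling, you can print a summary to verify the layer stack and parameter counts:

# Print layer output shapes and parameter counts
model.summary()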
Use the below-mentioned code to compile the model:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
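As an aside, if you prefer to keep the integer labels and skip one-hot encoding entirely, Keras supports a sparse variant of the same loss. A minimal alternative, not used in this project:

# Alternative: integer labels with sparse categorical cross-entropy
# (works with y_train/y_test as plain integers instead of one-hot vectors)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])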
In this step, the model will be trained over ten epochs using the Adam optimizer. To visually assess performance over time, we will also plot training versus validation accuracy.
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=128,
    validation_data=(X_test, y_test)
)
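Once training finishes, a one-line evaluation reports the final test accuracy (using the one-hot encoded labels from the earlier step):

# Evaluate on the held-out test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")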
Use the below-mentioned code to plot training and validation accuracy:
import matplotlib.pyplot as plt
# Plot training & validation accuracy values
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
Output:
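The same history object also stores loss values, so you can plot training versus validation loss with an analogous snippet:

# Plot training & validation loss values
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()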
To evaluate accuracy and class-wise performance, in this step we will generate predictions on the test set, print a classification report (precision, recall, and F1-score per class), and plot a confusion matrix.
Use the below-mentioned code:
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
# Predict class probabilities
y_pred_probs = model.predict(X_test)
# Convert probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)
# True class labels
y_true = np.argmax(y_test, axis=1)
# Classification report
print("Classification Report:\n")
print(classification_report(y_true, y_pred_classes))
# Confusion matrix
cm = confusion_matrix(y_true, y_pred_classes)
plt.figure(figsize=(12, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
Output:
225/225 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step
Classification Report:
              precision    recall  f1-score   support

           0       0.94      1.00      0.97       331
           1       1.00      1.00      1.00       432
           2       1.00      1.00      1.00       310
           3       1.00      1.00      1.00       245
           4       0.94      0.92      0.93       498
           5       1.00      1.00      1.00       247
           6       0.85      0.94      0.89       348
           7       0.91      0.95      0.93       436
           8       0.93      0.86      0.89       288
          10       1.00      0.91      0.95       331
          11       0.91      1.00      0.95       209
          12       0.90      0.95      0.92       394
          13       1.00      0.70      0.82       291
          14       1.00      0.96      0.98       246
          15       1.00      0.97      0.99       347
          16       0.92      1.00      0.96       164
          17       0.99      0.59      0.74       144
          18       0.72      0.92      0.80       246
          19       0.77      0.58      0.67       248
          20       0.88      1.00      0.93       266
          21       0.94      1.00      0.97       346
          22       0.97      1.00      0.98       206
          23       0.92      0.90      0.91       267
          24       0.83      0.89      0.86       332

    accuracy                           0.93      7172
   macro avg       0.93      0.92      0.92      7172
weighted avg       0.93      0.93      0.92      7172
In this project, we built and trained a CNN on the Sign Language MNIST classification dataset. The model performed exceptionally well on the majority of classes and reached an overall accuracy of 93%. The results confirm strong gesture recognition, although weaker recall on a few labels (notably 13, 17, and 19) indicates room for improvement.
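To pinpoint which signs are confused most often, you can rank the largest off-diagonal entries of the confusion matrix from the evaluation step. A minimal sketch, assuming cm, y_true, and y_pred_classes are still in scope:

import numpy as np

# Labels actually present in the test set and predictions (matches sklearn's row/column ordering)
labels_present = np.unique(np.concatenate([y_true, y_pred_classes]))

off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)  # zero out correct predictions

# Flattened indices of the largest misclassification counts, in descending order
flat_order = np.argsort(off_diag, axis=None)[::-1]
rows, cols = np.unravel_index(flat_order[:5], off_diag.shape)

for i, j in zip(rows, cols):
    print(f"True {labels_present[i]} predicted as {labels_present[j]}: {off_diag[i, j]} times")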
Colab Link:
https://colab.research.google.com/drive/1cAs_8BryRAGbj8OK89v-YJX1i2-OQMqP?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...