How to Build a CNN Model for Sign Language MNIST Classification?
By Rohit Sharma
Updated on Aug 08, 2025 | 7 min read | 1.55K+ views
Sign language recognition plays a vital role in helping deaf and hard-of-hearing communities communicate. For this project, we'll perform sign language MNIST classification on a popular Kaggle image dataset. The dataset contains grayscale images of American Sign Language (ASL) hand gestures in the same format as the original MNIST digit dataset.
With 24 static sign classes (A-Y, excluding J and Z, which require motion), this classification task poses a real challenge for computer vision models. Our goal is to build a model that can reliably recognize ASL letters from raw pixel values using machine learning, specifically Convolutional Neural Networks (CNNs).
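Each image's label encodes the letter's position in the alphabet (A = 0 through Y = 24). A small lookup table, a hypothetical helper rather than anything shipped with the dataset, makes numeric predictions easier to read later on:

import string

# Map numeric labels to ASL letters: each label is the letter's alphabet index.
# Label 9 (J) never occurs, and label 25 (Z) does not exist, since both signs require motion.
label_to_letter = {i: string.ascii_uppercase[i] for i in range(25) if i != 9}

print(label_to_letter[3])   # D
print(label_to_letter[24])  # Y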
For more project ideas like this one, check out our blog post - Top 25+ Essential Data Science Projects GitHub to Explore in 2025.
Before starting this project, you should be familiar with the basics of convolutional neural networks (CNNs) and Python. Knowledge of NumPy arrays, image classification, and one-hot encoding will also come in handy, as will a general sense of how deep learning models represent and process image data.
We will use the following tools and libraries, all of which appear in the code below: Python, kagglehub (to download the dataset), pandas and NumPy (data handling), Matplotlib and Seaborn (visualization), TensorFlow/Keras (to build the CNN), and scikit-learn (evaluation metrics).
This project should take you 3-4 hours. It is perfect for beginner to intermediate learners exploring CNNs for the first time.
Let’s start building the project from scratch. So, without wasting any more time, let’s begin!
To begin, we will download the Kaggle dataset for American Sign Language. It contains images of hand signs with labels ranging from 0 to 24 (label 9, which corresponds to J, is absent because signing J requires motion).
Use the code below:
import kagglehub
# Download latest version
path = kagglehub.dataset_download("datamunge/sign-language-mnist")
print("Path to dataset files:", path)
Output:
Downloading from https://www.kaggle.com/api/v1/datasets/download/datamunge/sign-language-mnist?dataset_version_number=1...
100%|██████████| 62.6M/62.6M [00:02<00:00, 22.6MB/s]
Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1
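The cache directory shown above can vary across machines and dataset versions, so instead of hardcoding it, you can build the CSV paths from the returned path variable. A minimal sketch:

import os

# List the extracted files and build portable paths from `path`
print(os.listdir(path))

train_path = os.path.join(path, 'sign_mnist_train.csv')
test_path = os.path.join(path, 'sign_mnist_test.csv')

The next step hardcodes the printed paths for transparency; either approach works as long as the paths point to the extracted CSV files.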
To understand the structure of the data, we will load the training and test datasets with pandas and view the first few rows.
Use the code below:
import pandas as pd
# Define file paths
train_path = '/root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1/sign_mnist_train.csv'
test_path = '/root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1/sign_mnist_test.csv'
# Load the CSV files
train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)
# Display basic info and first few rows
print("Training data shape:", train_df.shape)
print("Testing data shape:", test_df.shape)
print("\nFirst few rows of training data:")
print(train_df.head())
Output:
Training data shape: (27455, 785)
Testing data shape: (7172, 785)
First few rows of training data:
   label  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  \
0      3     107     118     127     134     139     143     146     150
1      6     155     157     156     156     156     157     156     158
2      2     187     188     188     187     187     186     187     188
3      2     211     211     212     212     211     210     211     210
4     13     164     167     170     172     176     179     180     184

   pixel9  ...  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  \
0     153  ...       207       207       207       207       206       206
1     158  ...        69       149       128        87        94       163
2     187  ...       202       201       200       199       198        99
3     210  ...       235       234       233       231       230       226
4     185  ...        92       105       105       108       133       163

   pixel781  pixel782  pixel783  pixel784
0       206       204       203       202
1       175       103       135       149
2       198       195       194       195
3       225       222       229       163
4       157       163       164       179

[5 rows x 785 columns]
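Before preprocessing, it is worth confirming the label set: there should be 24 distinct values between 0 and 24, with 9 (J) missing. A quick check:

# Sanity check: which labels are present, and how many samples per class?
print(sorted(train_df['label'].unique()))   # expect 0-24 with 9 missing
print(train_df['label'].value_counts().sort_index())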
In this step, we normalize the pixel values to the interval [0, 1] and reshape each flat 784-pixel vector into a 28x28 grayscale image array. The labels are then one-hot encoded for training.
Use the code below:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
# Separate features (X) and labels (y)
X_train = train_df.iloc[:, 1:].values
y_train = train_df.iloc[:, 0].values
X_test = test_df.iloc[:, 1:].values
y_test = test_df.iloc[:, 0].values
# Normalize pixel values to [0,1]
X_train = X_train / 255.0
X_test = X_test / 255.0
# Reshape into 28x28 images with 1 channel (grayscale)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)
# One-hot encode the labels
y_train_cat = to_categorical(y_train, num_classes=25)
y_test_cat = to_categorical(y_test, num_classes=25)
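A quick shape check, based on the dataset sizes printed earlier, helps catch reshaping mistakes before training:

# Images should be (N, 28, 28, 1); one-hot labels should be (N, 25)
print(X_train.shape, y_train_cat.shape)  # expected: (27455, 28, 28, 1) (27455, 25)
print(X_test.shape, y_test_cat.shape)    # expected: (7172, 28, 28, 1) (7172, 25)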
Before moving ahead, let's plot the first 12 sign language images to verify the preprocessing. Use the code below:
# Plot the first 12 training images with their labels
plt.figure(figsize=(12, 6))
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.suptitle("Sample Sign Language Images", fontsize=16)
plt.tight_layout()
plt.show()
Output:
To make sure the label arrays match the CNN's softmax output layer, we re-encode them to categorical format using to_categorical(). Note that this overwrites y_train and y_test in place with the same one-hot arrays we stored in y_train_cat and y_test_cat earlier, so the model can be trained with either pair. Use the code below:
from tensorflow.keras.utils import to_categorical
# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=25)
y_test = to_categorical(y_test, num_classes=25)
In this step, we build a sequential CNN that uses Conv2D and MaxPooling layers to extract features. Dense layers with dropout then help minimize overfitting, and a final softmax layer performs the classification.
Use the code below:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
# Initialize the CNN
model = Sequential()
# First convolutional block
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Second convolutional block
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten and fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3)) # Prevent overfitting
model.add(Dense(25, activation='softmax'))  # 25 output units for labels 0-24 (label 9/J never occurs in the data)
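Before compiling, you can print a summary to verify the layer stack and parameter counts:

# Print layer output shapes and parameter counts
model.summary()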
Use the below-mentioned code to compile the model:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
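As an aside, if you prefer to keep the integer labels and skip one-hot encoding entirely, Keras supports a sparse variant of the same loss. A minimal alternative, not used in this project:

# Alternative: integer labels with sparse categorical cross-entropy
# (works with y_train/y_test as plain integers instead of one-hot vectors)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])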
In this step, the model will be trained over ten epochs using the Adam optimizer. To visually assess performance over time, we will also plot training versus validation accuracy.
# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=128,
    validation_data=(X_test, y_test)
)
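Once training finishes, a one-line evaluation reports the final test accuracy (using the one-hot encoded labels from the earlier step):

# Evaluate on the held-out test set
test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")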
Use the below-mentioned code to plot training and validation accuracy:
import matplotlib.pyplot as plt
# Plot training & validation accuracy values
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
Output:
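The same history object also stores loss values, so you can plot training versus validation loss with an analogous snippet:

# Plot training & validation loss values
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()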
To evaluate accuracy and class-wise performance, in this step we will generate predictions on the test set, print a classification report (precision, recall, and F1-score per class), and plot a confusion matrix.
Use the below-mentioned code:
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
# Predict class probabilities
y_pred_probs = model.predict(X_test)
# Convert probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)
# True class labels
y_true = np.argmax(y_test, axis=1)
# Classification report
print("Classification Report:\n")
print(classification_report(y_true, y_pred_classes))
# Confusion matrix
cm = confusion_matrix(y_true, y_pred_classes)
plt.figure(figsize=(12, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
Output:
225/225 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step
Classification Report:
              precision    recall  f1-score   support

           0       0.94      1.00      0.97       331
           1       1.00      1.00      1.00       432
           2       1.00      1.00      1.00       310
           3       1.00      1.00      1.00       245
           4       0.94      0.92      0.93       498
           5       1.00      1.00      1.00       247
           6       0.85      0.94      0.89       348
           7       0.91      0.95      0.93       436
           8       0.93      0.86      0.89       288
          10       1.00      0.91      0.95       331
          11       0.91      1.00      0.95       209
          12       0.90      0.95      0.92       394
          13       1.00      0.70      0.82       291
          14       1.00      0.96      0.98       246
          15       1.00      0.97      0.99       347
          16       0.92      1.00      0.96       164
          17       0.99      0.59      0.74       144
          18       0.72      0.92      0.80       246
          19       0.77      0.58      0.67       248
          20       0.88      1.00      0.93       266
          21       0.94      1.00      0.97       346
          22       0.97      1.00      0.98       206
          23       0.92      0.90      0.91       267
          24       0.83      0.89      0.86       332

    accuracy                           0.93      7172
   macro avg       0.93      0.92      0.92      7172
weighted avg       0.93      0.93      0.92      7172
In this project, we built and trained a CNN on the Sign Language MNIST classification dataset. The model performed exceptionally well on the majority of classes and reached an overall accuracy of 93%. The results confirm strong gesture recognition, although weaker recall on a few labels (notably 13, 17, and 19) indicates room for improvement.
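To pinpoint which signs are confused most often, you can rank the largest off-diagonal entries of the confusion matrix from the evaluation step. A minimal sketch, assuming cm, y_true, and y_pred_classes are still in scope:

import numpy as np

# Labels actually present in the test set and predictions (matches sklearn's row/column ordering)
labels_present = np.unique(np.concatenate([y_true, y_pred_classes]))

off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)  # zero out correct predictions

# Flattened indices of the largest misclassification counts, in descending order
flat_order = np.argsort(off_diag, axis=None)[::-1]
rows, cols = np.unravel_index(flat_order[:5], off_diag.shape)

for i, j in zip(rows, cols):
    print(f"True {labels_present[i]} predicted as {labels_present[j]}: {off_diag[i, j]} times")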
Colab Link:
https://colab.research.google.com/drive/1cAs_8BryRAGbj8OK89v-YJX1i2-OQMqP?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...