How to Build a CNN Model for Sign Language MNIST Classification?

By Rohit Sharma

Updated on Aug 08, 2025 | 7 min read | 1.55K+ views

Sign language recognition can help deaf and hard-of-hearing people communicate with others. In this project, we'll classify images from Sign Language MNIST, a Kaggle image dataset. It contains grayscale images of American Sign Language (ASL) hand gestures in the same format as the original MNIST digit dataset.

The dataset has 24 static sign classes: the letters A-Y without J. J and Z are left out because signing them requires motion. In this project, our goal is to build a model that can reliably recognize ASL letters from image pixels using machine learning, specifically Convolutional Neural Networks (CNNs).

For more project ideas like this one, check out our blog post - Top 25+ Essential Data Science Projects GitHub to Explore in 2025.

What You Should Know Beforehand

Before starting this project, you should be familiar with Python and the basic concepts of convolutional neural networks (CNNs). Knowledge of NumPy arrays, image classification, and one-hot encoding will also come in handy, as will an understanding of how deep learning models represent and process image data.

Technologies and Libraries Used in Sign Language MNIST Classification

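We will use the following:

  • Python
  • kagglehub (downloading the dataset)
  • pandas and NumPy (loading and preparing the data)
  • Matplotlib and seaborn (plots)
  • TensorFlow / Keras (building and training the CNN)
  • scikit-learn (classification report and confusion matrix)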

Duration and Challenge

This project should take you 3-4 hours. It suits beginner to intermediate learners who are working with CNNs for the first time.

How to Build a Sign Language MNIST Classifier Using a CNN

Let’s start building the project from scratch. So, without wasting any more time, let’s begin!

Step 1: Download the Dataset

To begin, we will download the Sign Language MNIST dataset from Kaggle. It contains images of hand signs with integer labels from 0 to 24; label 9 (J) is not used.

Use the code below:

import kagglehub

# Download latest version
path = kagglehub.dataset_download("datamunge/sign-language-mnist")

print("Path to dataset files:", path)

Output:

Downloading from https://www.kaggle.com/api/v1/datasets/download/datamunge/sign-language-mnist?dataset_version_number=1...
100%|██████████| 62.6M/62.6M [00:02<00:00, 22.6MB/s]
Extracting files...
Path to dataset files: /root/.cache/kagglehub/datasets/datamunge/sign-language-mnist/versions/1
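
To make the label scheme concrete, here is a minimal sketch that turns an integer label into its letter. It assumes the labels follow alphabetical order (0 = A, 1 = B, and so on), which is how the dataset is usually described; the helper name label_to_letter is ours and not part of any library.

```python
import string

def label_to_letter(label: int) -> str:
    """Map a Sign Language MNIST label (0-24, with 9 unused) to its letter.

    Assumes alphabetical order: 0 -> 'A', 1 -> 'B', ..., 24 -> 'Y'.
    """
    if not 0 <= label <= 24 or label == 9:
        raise ValueError(f"label {label} is not used by the dataset")
    return string.ascii_uppercase[label]

# A few labels on either side of the unused label 9
print([label_to_letter(k) for k in (0, 8, 10, 24)])
```

A helper like this can make plot titles and reports easier to read than raw integer labels.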

Step 2: Preview and Load the Data

To understand the structure of the data, we will use pandas to load the training and test datasets and then view the first few rows.

Use the code below:

import os
import pandas as pd

# Build file paths from the download directory returned in Step 1 (`path`)
train_path = os.path.join(path, 'sign_mnist_train.csv')
test_path = os.path.join(path, 'sign_mnist_test.csv')

# Load the CSV files
train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)

# Display basic info and first few rows
print("Training data shape:", train_df.shape)
print("Testing data shape:", test_df.shape)
print("\nFirst few rows of training data:")
print(train_df.head())

Output:

Training data shape: (27455, 785)
Testing data shape: (7172, 785)

First few rows of training data:

   label  pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  \
0      3     107     118     127     134     139     143     146     150   
1      6     155     157     156     156     156     157     156     158   
2      2     187     188     188     187     187     186     187     188   
3      2     211     211     212     212     211     210     211     210   
4     13     164     167     170     172     176     179     180     184   

   pixel9  ...  pixel775  pixel776  pixel777  pixel778  pixel779  pixel780  \
0     153  ...       207       207       207       207       206       206   
1     158  ...        69       149       128        87        94       163   
2     187  ...       202       201       200       199       198        99   
3     210  ...       235       234       233       231       230       226   
4     185  ...        92       105       105       108       133       163   

   pixel781  pixel782  pixel783  pixel784  
0       206       204       203       202  
1       175       103       135       149  
2       198       195       194       195  
3       225       222       229       163  
4       157       163       164       179  

[5 rows x 785 columns]
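
Before preprocessing, it can be worth checking which label ids actually occur in the label column. The sketch below uses a toy array in place of train_df['label'].values; the helper name unused_labels is hypothetical. On the real training labels with the default of 25 outputs, we would expect it to report only label 9, based on the label range described in Step 1.

```python
import numpy as np

def unused_labels(labels, num_outputs=25):
    """Return the label ids in [0, num_outputs) that never occur in `labels`."""
    present = np.unique(np.asarray(labels))
    return sorted(set(range(num_outputs)) - set(present.tolist()))

# Toy stand-in for train_df['label'].values
toy_labels = np.array([3, 6, 2, 2, 13, 0, 24])
print(unused_labels(toy_labels, num_outputs=5))
```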

Step 3: Prepare the Data

In this step, the pixel values are normalized to the interval [0, 1], and each flat 784-value row is reshaped into a 28x28 grayscale image array with a single channel. The labels are then converted to a one-hot encoded format for training.

Use the code below:

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical

# Separate features (X) and labels (y)
X_train = train_df.iloc[:, 1:].values
y_train = train_df.iloc[:, 0].values

X_test = test_df.iloc[:, 1:].values
y_test = test_df.iloc[:, 0].values

# Normalize pixel values to [0,1]
X_train = X_train / 255.0
X_test = X_test / 255.0

# Reshape into 28x28 images with 1 channel (grayscale)
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# One-hot encode the labels
y_train_cat = to_categorical(y_train, num_classes=25)
y_test_cat = to_categorical(y_test, num_classes=25)
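
As a quick sanity check on the normalization and reshape logic, here is a small self-contained sketch that applies the same two operations to random toy data instead of the real CSV. The function name preprocess_pixels and the float32 dtype are our choices, not something the steps above require.

```python
import numpy as np

def preprocess_pixels(flat_pixels):
    """Scale 0-255 pixel rows to [0, 1] and reshape them to (n, 28, 28, 1)."""
    arr = np.asarray(flat_pixels, dtype=np.float32)
    if arr.ndim != 2 or arr.shape[1] != 784:
        raise ValueError("expected an array of shape (n_samples, 784)")
    return (arr / 255.0).reshape(-1, 28, 28, 1)

# Four fake "images" with random pixel values in 0..255
toy = np.random.default_rng(0).integers(0, 256, size=(4, 784))
images = preprocess_pixels(toy)
print(images.shape, images.dtype)
```

The reshape is row-major, so the first 28 pixel columns become the first image row, the next 28 the second row, and so on.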

Before moving ahead, let's plot the first 12 sign language images to verify the data. Use the code below:

# Plot the first 12 training images with their labels
plt.figure(figsize=(12, 6))
for i in range(12):
    plt.subplot(3, 4, i + 1)
    plt.imshow(X_train[i].reshape(28, 28), cmap='gray')
    plt.title(f"Label: {y_train[i]}")
    plt.axis('off')
plt.suptitle("Sample Sign Language Images", fontsize=16)
plt.tight_layout()
plt.show()

Output: a 3x4 grid of the first 12 training images, each titled with its integer label.

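Step 4: One-Hot Encode the Labels

Step 3 stored one-hot labels in y_train_cat and y_test_cat. Here we overwrite y_train and y_test themselves with their one-hot versions using to_categorical(), because the training and evaluation code in the following steps uses those names and the CNN's softmax output layer expects categorical labels. Run this code only once: calling to_categorical() again on labels that are already encoded would produce arrays of the wrong shape.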

from tensorflow.keras.utils import to_categorical

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=25)
y_test = to_categorical(y_test, num_classes=25)
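
To see what one-hot encoding does to the labels, here is a NumPy-only sketch that we believe produces the same kind of array for integer labels. It is an illustration, not the Keras implementation, and the name one_hot is ours.

```python
import numpy as np

def one_hot(labels, num_classes=25):
    """Return a (n, num_classes) array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for label k
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([3, 6, 24])
print(encoded.shape)
print(encoded.argmax(axis=1))
```

Because label 9 never occurs in the data, column 9 of the encoded training labels would stay all zeros.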

Step 5: Construct the CNN Model

In this step, we build a sequential CNN. Conv2D and MaxPooling2D layers extract features, a dense layer followed by dropout helps reduce overfitting, and a softmax layer performs the final classification.

Use the code below:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Initialize the CNN
model = Sequential()

# First convolutional block
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Second convolutional block
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten and fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))  # Prevent overfitting
model.add(Dense(25, activation='softmax'))  # 25 outputs for labels 0-24; label 9 (J) is unused, so 24 letters actually occur
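
To follow how the image shrinks through the network, the arithmetic can be sketched in plain Python. This assumes the Keras defaults used above: 'valid' padding and stride 1 for Conv2D, and a pooling stride equal to the pool size. The helper names are hypothetical; model.summary() is the authoritative source for the real numbers.

```python
def conv_out(size, kernel):
    """Output width/height of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping max pooling (floor division)."""
    return size // pool

s = conv_out(28, 3)   # after first Conv2D
s = pool_out(s, 2)    # after first MaxPooling2D
s = conv_out(s, 3)    # after second Conv2D
s = pool_out(s, 2)    # after second MaxPooling2D
flat = s * s * 64     # Flatten: height * width * channels

# Trainable parameters per layer: weights + biases
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 128 + 128,
    "dense2": 128 * 25 + 25,
}
total_params = sum(params.values())
print(s, flat, total_params)
```

Most of the parameters sit in the first dense layer, which is one reason dropout is applied right after it.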

Use the code below to compile the model:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
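
For intuition about the loss chosen here, the following NumPy sketch computes categorical cross-entropy by hand on two toy predictions over three classes. It is a simplified illustration (including the clipping constant eps, which is our choice), not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))

# One-hot targets and predicted probabilities for two samples
y_true_toy = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)
y_prob_toy = np.array([[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
loss = categorical_crossentropy(y_true_toy, y_prob_toy)
print(round(loss, 4))
```

Only the probability assigned to the true class contributes to each sample's loss, so here the result is the average of -log(0.8) and -log(0.6).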

Step 6: Model Training

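In this step, the model is trained for ten epochs with a batch size of 128, using the Adam optimizer configured above. The test set is passed as validation_data, so the "validation" accuracy reported after each epoch is the accuracy on the test set. To visually assess performance over time, we will also plot training versus validation accuracy.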

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=10,
    batch_size=128,
    validation_data=(X_test, y_test)
)
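
The step counts shown in the Keras progress bars follow directly from the dataset sizes and batch sizes. A small sketch of that arithmetic, using the row counts printed in Step 2 and assuming the default batch size of 32 for model.predict():

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover all samples (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)

train_steps = steps_per_epoch(27455, 128)   # training with batch_size=128
predict_steps = steps_per_epoch(7172, 32)   # prediction with the assumed default of 32
print(train_steps, predict_steps)
```

The second value agrees with the 225/225 progress line in the Step 7 output.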

Use the code below to plot training and validation accuracy:

import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

Output: a line plot of training and validation accuracy over the ten epochs.

Step 7: Assess Model Performance

To evaluate accuracy and class-wise performance, in this step, we will: 

  • Create predictions on the test set
  • Compare them with real labels
  • Show a confusion matrix and a classification report

Use the code below:

from sklearn.metrics import classification_report, confusion_matrix

# Predict class probabilities
y_pred_probs = model.predict(X_test)

# Convert probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)

# True class labels
y_true = np.argmax(y_test, axis=1)

# Classification report
print("Classification Report:\n")
print(classification_report(y_true, y_pred_classes))

# Confusion matrix
import seaborn as sns
import matplotlib.pyplot as plt

cm = confusion_matrix(y_true, y_pred_classes)
plt.figure(figsize=(12, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
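
To make the argmax and confusion-matrix steps less of a black box, here is a NumPy-only sketch on toy data with three classes. The helper confusion_counts is a hypothetical stand-in that counts (actual, predicted) pairs; it is not how scikit-learn implements confusion_matrix.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Count how often each (actual, predicted) pair occurs."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at handles repeated index pairs correctly
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

# Toy softmax outputs for four samples and their true classes
toy_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
])
toy_true = np.array([0, 1, 2, 2])
toy_pred = np.argmax(toy_probs, axis=1)  # most probable class per row
toy_cm = confusion_counts(toy_true, toy_pred, num_classes=3)
print(toy_cm)
```

Rows are actual classes and columns are predicted classes, matching the axis labels in the heatmap code above.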

Output:

225/225 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step

Classification Report:

              precision    recall  f1-score   support

           0       0.94      1.00      0.97       331
           1       1.00      1.00      1.00       432
           2       1.00      1.00      1.00       310
           3       1.00      1.00      1.00       245
           4       0.94      0.92      0.93       498
           5       1.00      1.00      1.00       247
           6       0.85      0.94      0.89       348
           7       0.91      0.95      0.93       436
           8       0.93      0.86      0.89       288
          10       1.00      0.91      0.95       331
          11       0.91      1.00      0.95       209
          12       0.90      0.95      0.92       394
          13       1.00      0.70      0.82       291
          14       1.00      0.96      0.98       246
          15       1.00      0.97      0.99       347
          16       0.92      1.00      0.96       164
          17       0.99      0.59      0.74       144
          18       0.72      0.92      0.80       246
          19       0.77      0.58      0.67       248
          20       0.88      1.00      0.93       266
          21       0.94      1.00      0.97       346
          22       0.97      1.00      0.98       206
          23       0.92      0.90      0.91       267
          24       0.83      0.89      0.86       332

    accuracy                           0.93      7172
   macro avg       0.93      0.92      0.92      7172
weighted avg       0.93      0.93      0.92      7172
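
The f1-score column is the harmonic mean of precision and recall. As a rough check, the sketch below recomputes it for labels 17 and 13 from the rounded values in the report; the function name f1_from is ours.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1_label_17 = round(f1_from(0.99, 0.59), 2)
f1_label_13 = round(f1_from(1.00, 0.70), 2)
print(f1_label_17, f1_label_13)
```

Both results agree with the report. Because the report itself shows rounded precision and recall, recomputing other rows this way can differ in the last digit.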

Conclusion

In this project, we used the Sign Language MNIST dataset to build and train a CNN model. The model reached an overall accuracy of 93% on the test set and performed well on most classes. Recall was noticeably lower for labels 13, 17, and 19 (0.70, 0.59, and 0.58), which shows there is still room for improvement.

Colab Link:
https://colab.research.google.com/drive/1cAs_8BryRAGbj8OK89v-YJX1i2-OQMqP?usp=sharing
