Named Entity Recognition (NER) Model with BiLSTM and Deep Learning in NLP
By Rohit Sharma
Updated on Jul 31, 2025 | 11 min read | 1.33K+ views
Named Entity Recognition (NER) is a key task in Natural Language Processing that identifies and classifies entities like names, locations, organizations, and dates in text.
In this project, you'll build an NER model using a Bidirectional LSTM network with TensorFlow and Keras. You'll also explore how to preprocess data, label sequences, and train deep learning models to recognize entities in a custom text corpus.
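Before building a model from scratch, it helps to see what NER output looks like. The short sketch below uses a pre-trained spaCy pipeline as an assumption about your setup (the small English model en_core_web_sm must be downloaded first); it is only an illustration, not part of the project code.
# Quick NER demo with a pre-trained spaCy pipeline
# (assumes: python -m spacy download en_core_web_sm has been run)
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected entities include "Apple" (ORG), "U.K." (GPE), and "$1 billion" (MONEY)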
Fast-track your data science career with upGrad’s premier Online Data Science Courses. Taught by industry experts, these courses will equip you with real-world skills in Python, Machine Learning, AI, SQL, Tableau, and more, preparing you for immediate job readiness. Enroll today and begin your learning journey!
Transform theoretical concepts into practical expertise. Explore our premier Python Data Science Projects to commence project development.
Also Read - 15+ Top Natural Language Processing Techniques To Learn in 2025
Advance your data science career with upGrad's top-ranked courses, offering the opportunity to learn from industry-established mentors.
To build this NER project, you'll use a focused set of Python tools and libraries designed for natural language understanding, deep learning, and efficient model training:
| Tool / Library | Purpose |
| --- | --- |
| Python | Core language for scripting NLP workflows and training models |
| Google Colab | Cloud-based environment for running and sharing Jupyter notebooks |
| Pandas | Reads and manages structured text data and annotations |
| NumPy | Handles numerical operations in tokenization and model input prep |
| TensorFlow / Keras | Builds and trains BiLSTM-based entity recognition models |
| spaCy | Offers pre-trained pipelines for quick entity tagging and evaluation |
| NLTK | Assists with tokenization, POS tagging, and text cleaning |
| Matplotlib / Seaborn | Visualizes entity distributions and training performance |
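If you are working in a fresh environment, the following cell is a minimal setup sketch; the package names are the standard PyPI ones, and Google Colab already ships with most of them pre-installed, so you may only need a subset.
# Install the core libraries used in this project (most are pre-installed on Google Colab)
!pip install tensorflow pandas numpy scikit-learn spacy nltk matplotlib seaborn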
Also Read - Introduction to Deep Learning & Neural Networks with Keras
To build an effective Named Entity Recognition (NER) model, you’ll apply core NLP techniques such as tokenization, sequence labeling with the BIO tagging scheme, word-to-index encoding, and sequence padding, all of which help detect and label useful information from raw text (see the illustration after this paragraph).
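The dataset you'll use follows the BIO scheme: B- marks the beginning of an entity, I- marks its continuation, and O marks non-entity tokens. The snippet below is a hand-labeled illustration (the sentence is hypothetical, not taken from the dataset).
# Hand-labeled BIO example: B-per/I-per = person, B-geo/I-geo = location, B-tim = time
sentence = ["Sachin", "Tendulkar", "visited", "New", "Delhi", "in", "March"]
bio_tags = ["B-per", "I-per", "O", "B-geo", "I-geo", "O", "B-tim"]
for word, tag in zip(sentence, bio_tags):
    print(f"{word:10} -> {tag}")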
Also Read - Large Language Models: What They Are, Examples, and Open-Source Disadvantages
You can complete this Named Entity Recognition project in around 5 to 6 hours. It’s well-suited for beginners and intermediate learners who have basic knowledge of Python and want to build real-world NLP pipelines.
Let’s build this project from scratch with a clear, step-by-step workflow:
Alright, let's dive in!
Before building the Named Entity Recognition model, we need to import the essential Python libraries for data handling, model creation, and training.
Here's the code:
# Step 1: Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical  # needed later to one-hot encode the tag sequences
from sklearn.model_selection import train_test_split
Also Read - Libraries in Python Explained: List of Important Libraries
In this step, you’ll load the CSV file containing labeled text data and organize it into sentence-wise groups.
This structure is essential for Named Entity Recognition tasks, which operate on sequences.
try:
# Load the dataset
df = pd.read_csv('NER dataset.csv', encoding='latin1')
except FileNotFoundError:
print("Error: 'NER dataset.csv' not found. Please ensure the dataset is in the correct path.")
exit()
# Fill missing sentence identifiers (forward-fill the "Sentence #" column)
df = df.ffill()
# Display sample records and metadata
print("Dataset Head:")
print(df.head())
print("\nDataset Info:")
df.info()
# Extract unique words and tags
words = list(set(df["Word"].values))
tags = list(set(df["Tag"].values))
num_words = len(words)
num_tags = len(tags)
print(f"\nNumber of unique words: {num_words}")
print(f"Number of unique tags: {num_tags}")
print(f"Tags: {tags}")
# Define a sentence grouping utility
class SentenceGetter(object):
def __init__(self, data):
self.n_sent = 1
self.data = data
self.empty = False
agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
s["POS"].values.tolist(),
s["Tag"].values.tolist())]
self.grouped = self.data.groupby("Sentence #").apply(agg_func)
self.sentences = [s for s in self.grouped]
# Generate sentence-level data
getter = SentenceGetter(df)
sentences = getter.sentences
# Output sample and statistics
print(f"\nExample sentence (first 5 words): {sentences[0][:5]}")
print(f"Total number of sentences: {len(sentences)}")
Output:
Dataset Head:
Sentence # Word POS Tag
0 Sentence: 1 Thousands NNS O
1 Sentence: 1 of IN O
2 Sentence: 1 demonstrators NNS O
3 Sentence: 1 have VBP O
4 Sentence: 1 marched VBN O
Dataset Info:
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sentence # 1048575 non-null object
1 Word 1048575 non-null object
2 POS 1048575 non-null object
3 Tag 1048575 non-null object
dtypes: object(4)
Number of unique words: 35177
Number of unique tags: 17
Tags: ['B-nat', 'B-geo', 'B-eve', 'I-per', 'I-nat', 'I-gpe', 'B-art', 'B-org', 'I-eve', 'B-gpe', 'O', 'I-art', 'I-org', 'I-geo', 'B-per', 'I-tim', 'B-tim']
Example sentence (first 5 words): [('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ('demonstrators', 'NNS', 'O'), ('have', 'VBP', 'O'), ('marched', 'VBN', 'O')]
Total number of sentences: 47959
Also Read - Data Cleaning Techniques: 15 Simple & Effective Ways To Clean Data
Neural networks can’t understand raw text. To train a model for Named Entity Recognition, you must convert each word and tag into a numerical representation.
Then you’ll pad all sequences to a uniform length to maintain consistency.
# Map each word and tag to a unique index
word2idx = {w: i + 1 for i, w in enumerate(words)} # Index 0 reserved for padding
tag2idx = {t: i for i, t in enumerate(tags)}
# Convert words and tags in sentences to their index values
X = [[word2idx.get(w[0], 0) for w in s] for s in sentences] # Unknown words mapped to 0
y = [[tag2idx[w[2]] for w in s] for s in sentences]
# Pad each sequence to the same length (50)
MAX_LEN = 50
X = pad_sequences(maxlen=MAX_LEN, sequences=X, padding="post", value=0)
y = pad_sequences(maxlen=MAX_LEN, sequences=y, padding="post", value=0)
# Convert tag sequences to one-hot encoding
y = to_categorical(y, num_classes=num_tags)
# Split the data into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Display data shapes
print(f"\nShape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print(f"Shape of y_test: {y_test.shape}")
Output:
Step 2: Creating word/tag mappings and padding sequences...
Shape of X_train: (38367, 50)
Shape of y_train: (38367, 50, 17)
Shape of X_test: (9592, 50)
Shape of y_test: (9592, 50, 17)
Also Read - Structured Data vs Semi-Structured Data: Differences, Examples & Challenges
With your data ready, it’s time to define the model. You'll use a Bidirectional LSTM, which captures patterns in both forward and backward directions, essential for understanding context in a sentence.
The code for this step is as follows:
# Input layer for padded word-index sequences of length MAX_LEN
input_layer = Input(shape=(MAX_LEN,))
# Embedding layer to learn word representations
embedding_layer = Embedding(input_dim=num_words + 1, output_dim=100, input_length=MAX_LEN)(input_layer)
# Bidirectional LSTM to capture sequence patterns
lstm_layer = Bidirectional(
LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)
)(embedding_layer)
# TimeDistributed layer applies dense layer at each time step
output_layer = TimeDistributed(Dense(num_tags, activation="softmax"))(lstm_layer)
# Define and compile the model
model = Model(input_layer, output_layer)
model.compile(
optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"]
)
# View model summary
model.summary()
Output:
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 50) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ embedding (Embedding) │ (None, 50, 100) │ 3,517,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional (Bidirectional) │ (None, 50, 200) │ 160,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ time_distributed │ (None, 50, 17) │ 3,417 │
│ (TimeDistributed) │ │ │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 3,682,017 (14.05 MB)
Trainable params: 3,682,017 (14.05 MB)
Non-trainable params: 0 (0.00 B)
Now that your BiLSTM model is ready, it's time to train it. You'll feed in the training data, set the batch size and number of epochs, and hold out a small portion of the data for validation to monitor learning progress.
print("\nStep 4: Training the model...")
history = model.fit(
X_train, y_train,
batch_size=32,
epochs=5,
validation_split=0.1,
verbose=1
)
print("\nModel training complete.")
Output:
Training the model...
Epoch 1/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 268s 239ms/step - accuracy: 0.9261 - loss: 0.2953 - val_accuracy: 0.9824 - val_loss: 0.0589
Epoch 2/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 254s 235ms/step - accuracy: 0.9864 - loss: 0.0450 - val_accuracy: 0.9850 - val_loss: 0.0486
Epoch 3/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 261s 234ms/step - accuracy: 0.9901 - loss: 0.0314 - val_accuracy: 0.9858 - val_loss: 0.0473
Epoch 4/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 260s 233ms/step - accuracy: 0.9917 - loss: 0.0255 - val_accuracy: 0.9854 - val_loss: 0.0493
Epoch 5/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 264s 235ms/step - accuracy: 0.9932 - loss: 0.0208 - val_accuracy: 0.9852 - val_loss: 0.0517
Model training complete.
During training, the model updates its weights to better predict entity tags. Monitoring the validation accuracy after each epoch helps identify whether the model is learning effectively or overfitting.
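Since Matplotlib is already imported, a short sketch like the following (using the history object returned by model.fit above) plots training versus validation accuracy and loss so you can spot overfitting at a glance.
# Plot training vs. validation curves from the Keras History object
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="val accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.tight_layout()
plt.show()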
Also Read- Neural Network Architecture: Types, Components & Key Algorithms
With training complete, it’s time to test how well your Named Entity Recognition model performs on unseen data. You’ll evaluate the model on the test set and run predictions on custom sentences.
# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest Accuracy: {accuracy*100:.2f}%")
print(f"Test Loss: {loss:.4f}")
# Create an inverted index to map tag indices back to tag names
idx2tag = {i: w for w, i in tag2idx.items()}
# Function to predict named entities in a new sentence
def predict_entities(sentence):
# Tokenize and convert words to indices
words = sentence.split()
word_indices = [word2idx.get(w, 0) for w in words] # Use 0 for unknown words
# Pad the input sequence
padded_sequence = pad_sequences([word_indices], maxlen=MAX_LEN, padding="post", value=0)
# Make predictions
p = model.predict(padded_sequence)
p = np.argmax(p, axis=-1)
print(f"\nPrediction for: '{sentence}'")
print("{:15} | {:5}".format("Word", "Tag"))
print("-" * 25)
for w, pred_idx in zip(words, p[0][:len(words)]):
print("{:15} | {:5}".format(w, idx2tag[pred_idx]))
# Run example predictions
test_sentence = "Narendra Modi is the Prime Minister of India"
predict_entities(test_sentence)
test_sentence_2 = "Apple is looking at buying U.K. startup for $1 billion"
predict_entities(test_sentence_2)
Output:
Evaluating the model and making predictions...
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 38ms/step - accuracy: 0.9857 - loss: 0.0509
Test Accuracy: 98.58%
Test Loss: 0.0506
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step
Prediction for: 'Narendra Modi is the Prime Minister of India'
Word | Tag
-------------------------
Narendra | B-nat
Modi | B-nat
is | O
the | O
Prime | B-per
Minister | O
of | O
India | B-geo
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
Prediction for: 'Apple is looking at buying U.K. startup for $1 billion'
Word | Tag
-------------------------
Apple | B-org
is | O
looking | O
at | O
buying | O
U.K. | B-nat
startup | B-nat
for | O
$1 | B-nat
billion | B-nat
Your Named Entity Recognition model achieved 98.58% accuracy on the test data, a strong result for a beginner-level project. Keep in mind that this is token-level accuracy computed over padded 50-token sequences, so the easily predicted padding positions and the dominant 'O' tag inflate the number; entity-level performance will be lower.
You now have a working NER model. The stray B-nat tags in the sample predictions appear because unknown words are mapped to index 0 and the label sequences are also padded with index 0, which in this run happens to correspond to the B-nat tag. You can improve the model further by padding labels with tag2idx["O"] instead, using pretrained word embeddings, training on more data, or moving to more advanced architectures such as transformer-based models.
You successfully built a Named Entity Recognition model using a Bidirectional LSTM network with TensorFlow and Keras. The model achieved high accuracy on the test set and was able to identify entities like names, locations, and organizations from raw text.
While it performed well, a few misclassifications suggest that further improvements are possible with more data or advanced models. This project gave you hands-on experience in sequence labeling, deep learning for NLP, and real-world entity extraction.
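If you want to see exactly where those misclassifications occur, one option is a per-tag report. The sketch below uses scikit-learn's classification_report (installed alongside train_test_split) and skips padded positions; it is an illustration, not part of the original notebook.
from sklearn.metrics import classification_report
# Predict tag probabilities for the test set and take the most likely tag per token
y_pred = np.argmax(model.predict(X_test), axis=-1)
y_true = np.argmax(y_test, axis=-1)
# Flatten to one long list of tags, skipping padded positions (word index 0)
mask = X_test != 0
pred_tags = [idx2tag[i] for i in y_pred[mask]]
true_tags = [idx2tag[i] for i in y_true[mask]]
# Token-level precision, recall, and F1 for each tag
print(classification_report(true_tags, pred_tags))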
Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!
Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!
Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!
Colab Link:
https://colab.research.google.com/drive/1UQFIzPN4_aCcLUAp6GL0axxWRmtg9fbF