Named Entity Recognition (NER) Model with BiLSTM and Deep Learning in NLP
By Rohit Sharma
Updated on Jul 31, 2025 | 11 min read | 1.33K+ views
Named Entity Recognition (NER) is a key task in Natural Language Processing that identifies and classifies entities like names, locations, organizations, and dates in text.
In this project, you'll build an NER model using a Bidirectional LSTM network with TensorFlow and Keras. You'll also explore how to preprocess data, label sequences, and train deep learning models to recognize entities in a custom text corpus.
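Before building a model from scratch, it helps to see what NER output looks like. The short sketch below uses a pre-trained spaCy pipeline as an assumption about your setup (the small English model en_core_web_sm must be downloaded first); it is only an illustration, not part of the project code.
# Quick NER demo with a pre-trained spaCy pipeline
# (assumes: python -m spacy download en_core_web_sm has been run)
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected entities include "Apple" (ORG), "U.K." (GPE), and "$1 billion" (MONEY)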
Fast-track your data science career with upGrad’s premier Online Data Science Courses. Taught by industry experts, these courses will equip you with real-world skills in Python, Machine Learning, AI, SQL, Tableau, and more, preparing you for immediate job readiness. Enroll today and begin your learning journey!
Transform theoretical concepts into practical expertise. Explore our premier Python Data Science Projects to commence project development.
Also Read - 15+ Top Natural Language Processing Techniques To Learn in 2025
Advance your data science career with upGrad's top-ranked courses, offering the opportunity to learn from industry-established mentors.
To build this NER project, you'll use a focused set of Python tools and libraries designed for natural language understanding, deep learning, and efficient model training:
| Tool / Library | Purpose |
| --- | --- |
| Python | Core language for scripting NLP workflows and training models |
| Google Colab | Cloud-based environment for running and sharing Jupyter notebooks |
| Pandas | Reads and manages structured text data and annotations |
| NumPy | Handles numerical operations in tokenization and model input prep |
| TensorFlow / Keras | Builds and trains BiLSTM-based entity recognition models |
| spaCy | Offers pre-trained pipelines for quick entity tagging and evaluation |
| NLTK | Assists with tokenization, POS tagging, and text cleaning |
| Matplotlib / Seaborn | Visualizes entity distributions and training performance |
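If you are working in a fresh environment, the following cell is a minimal setup sketch; the package names are the standard PyPI ones, and Google Colab already ships with most of them pre-installed, so you may only need a subset.
# Install the core libraries used in this project (most are pre-installed on Google Colab)
!pip install tensorflow pandas numpy scikit-learn spacy nltk matplotlib seaborn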
Also Read - Introduction to Deep Learning & Neural Networks with Keras
To build an effective Named Entity Recognition (NER) model, you’ll apply core NLP techniques such as tokenization, sequence labeling with the BIO tagging scheme, word-to-index encoding, and sequence padding, all of which help detect and label useful information from raw text (see the illustration after this paragraph).
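The dataset you'll use follows the BIO scheme: B- marks the beginning of an entity, I- marks its continuation, and O marks non-entity tokens. The snippet below is a hand-labeled illustration (the sentence is hypothetical, not taken from the dataset).
# Hand-labeled BIO example: B-per/I-per = person, B-geo/I-geo = location, B-tim = time
sentence = ["Sachin", "Tendulkar", "visited", "New", "Delhi", "in", "March"]
bio_tags = ["B-per", "I-per", "O", "B-geo", "I-geo", "O", "B-tim"]
for word, tag in zip(sentence, bio_tags):
    print(f"{word:10} -> {tag}")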
Also Read - Large Language Models: What They Are, Examples, and Open-Source Disadvantages
You can complete this Named Entity Recognition project in around 5 to 6 hours. It’s well-suited for beginners and intermediate learners who have basic knowledge of Python and want to build real-world NLP pipelines.
Let’s build this project from scratch with a clear, step-by-step workflow:
Alright, let's dive in!
Before building the Named Entity Recognition model, we need to import the essential Python libraries for data handling, model creation, and training.
Here's the code:
# Step 1: Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical  # needed later to one-hot encode the tag sequences
from sklearn.model_selection import train_test_split
Also Read - Libraries in Python Explained: List of Important Libraries
In this step, you’ll load the CSV file containing labeled text data and organize it into sentence-wise groups.
This structure is essential for Named Entity Recognition tasks, which operate on sequences.
try:
# Load the dataset
df = pd.read_csv('NER dataset.csv', encoding='latin1')
except FileNotFoundError:
print("Error: 'NER dataset.csv' not found. Please ensure the dataset is in the correct path.")
exit()
# Fill missing sentence identifiers (forward-fill the "Sentence #" column)
df = df.ffill()
# Display sample records and metadata
print("Dataset Head:")
print(df.head())
print("\nDataset Info:")
df.info()
# Extract unique words and tags
words = list(set(df["Word"].values))
tags = list(set(df["Tag"].values))
num_words = len(words)
num_tags = len(tags)
print(f"\nNumber of unique words: {num_words}")
print(f"Number of unique tags: {num_tags}")
print(f"Tags: {tags}")
# Define a sentence grouping utility
class SentenceGetter(object):
def __init__(self, data):
self.n_sent = 1
self.data = data
self.empty = False
agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
s["POS"].values.tolist(),
s["Tag"].values.tolist())]
self.grouped = self.data.groupby("Sentence #").apply(agg_func)
self.sentences = [s for s in self.grouped]
# Generate sentence-level data
getter = SentenceGetter(df)
sentences = getter.sentences
# Output sample and statistics
print(f"\nExample sentence (first 5 words): {sentences[0][:5]}")
print(f"Total number of sentences: {len(sentences)}")
Output:
Dataset Head:
Sentence # Word POS Tag
0 Sentence: 1 Thousands NNS O
1 Sentence: 1 of IN O
2 Sentence: 1 demonstrators NNS O
3 Sentence: 1 have VBP O
4 Sentence: 1 marched VBN O
Dataset Info:
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sentence # 1048575 non-null object
1 Word 1048575 non-null object
2 POS 1048575 non-null object
3 Tag 1048575 non-null object
dtypes: object(4)
Number of unique words: 35177
Number of unique tags: 17
Tags: ['B-nat', 'B-geo', 'B-eve', 'I-per', 'I-nat', 'I-gpe', 'B-art', 'B-org', 'I-eve', 'B-gpe', 'O', 'I-art', 'I-org', 'I-geo', 'B-per', 'I-tim', 'B-tim']
Example sentence (first 5 words): [('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ('demonstrators', 'NNS', 'O'), ('have', 'VBP', 'O'), ('marched', 'VBN', 'O')]
Total number of sentences: 47959
Also Read - Data Cleaning Techniques: 15 Simple & Effective Ways To Clean Data
Neural networks can’t understand raw text. To train a model for Named Entity Recognition, you must convert each word and tag into a numerical representation.
Then you’ll pad all sequences to a uniform length to maintain consistency.
# Map each word and tag to a unique index
word2idx = {w: i + 1 for i, w in enumerate(words)} # Index 0 reserved for padding
tag2idx = {t: i for i, t in enumerate(tags)}
# Convert words and tags in sentences to their index values
X = [[word2idx.get(w[0], 0) for w in s] for s in sentences] # Unknown words mapped to 0
y = [[tag2idx[w[2]] for w in s] for s in sentences]
# Pad each sequence to the same length (50)
MAX_LEN = 50
X = pad_sequences(maxlen=MAX_LEN, sequences=X, padding="post", value=0)
y = pad_sequences(maxlen=MAX_LEN, sequences=y, padding="post", value=0)
# Convert tag sequences to one-hot encoding
y = to_categorical(y, num_classes=num_tags)
# Split the data into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Display data shapes
print(f"\nShape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print(f"Shape of y_test: {y_test.shape}")
Output:
Step 2: Creating word/tag mappings and padding sequences...
Shape of X_train: (38367, 50)
Shape of y_train: (38367, 50, 17)
Shape of X_test: (9592, 50)
Shape of y_test: (9592, 50, 17)
Also Read - Structured Data vs Semi-Structured Data: Differences, Examples & Challenges
With your data ready, it’s time to define the model. You'll use a Bidirectional LSTM, which captures patterns in both forward and backward directions, essential for understanding context in a sentence.
The code for this step is as follows:
# Input layer for padded word-index sequences of length MAX_LEN
input_layer = Input(shape=(MAX_LEN,))
# Embedding layer to learn word representations
embedding_layer = Embedding(input_dim=num_words + 1, output_dim=100, input_length=MAX_LEN)(input_layer)
# Bidirectional LSTM to capture sequence patterns
lstm_layer = Bidirectional(
LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)
)(embedding_layer)
# TimeDistributed layer applies dense layer at each time step
output_layer = TimeDistributed(Dense(num_tags, activation="softmax"))(lstm_layer)
# Define and compile the model
model = Model(input_layer, output_layer)
model.compile(
optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"]
)
# View model summary
model.summary()
Output:
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type) ┃ Output Shape ┃ Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer) │ (None, 50) │ 0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ embedding (Embedding) │ (None, 50, 100) │ 3,517,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ bidirectional (Bidirectional) │ (None, 50, 200) │ 160,800 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ time_distributed │ (None, 50, 17) │ 3,417 │
│ (TimeDistributed) │ │ │
└─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 3,682,017 (14.05 MB)
Trainable params: 3,682,017 (14.05 MB)
Non-trainable params: 0 (0.00 B)
Now that your BiLSTM model is ready, it's time to train it. You'll feed in the training data, set the batch size and number of epochs, and hold out a small portion of the data for validation to monitor learning progress.
print("\nStep 4: Training the model...")
history = model.fit(
X_train, y_train,
batch_size=32,
epochs=5,
validation_split=0.1,
verbose=1
)
print("\nModel training complete.")
Output:
Training the model...
Epoch 1/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 268s 239ms/step - accuracy: 0.9261 - loss: 0.2953 - val_accuracy: 0.9824 - val_loss: 0.0589
Epoch 2/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 254s 235ms/step - accuracy: 0.9864 - loss: 0.0450 - val_accuracy: 0.9850 - val_loss: 0.0486
Epoch 3/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 261s 234ms/step - accuracy: 0.9901 - loss: 0.0314 - val_accuracy: 0.9858 - val_loss: 0.0473
Epoch 4/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 260s 233ms/step - accuracy: 0.9917 - loss: 0.0255 - val_accuracy: 0.9854 - val_loss: 0.0493
Epoch 5/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 264s 235ms/step - accuracy: 0.9932 - loss: 0.0208 - val_accuracy: 0.9852 - val_loss: 0.0517
Model training complete.
During training, the model updates its weights to better predict entity tags. Monitoring the validation accuracy after each epoch helps identify whether the model is learning effectively or overfitting.
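Since Matplotlib is already imported, a short sketch like the following (using the history object returned by model.fit above) plots training versus validation accuracy and loss so you can spot overfitting at a glance.
# Plot training vs. validation curves from the Keras History object
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="val accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="val loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.tight_layout()
plt.show()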
Also Read- Neural Network Architecture: Types, Components & Key Algorithms
With training complete, it’s time to test how well your Named Entity Recognition model performs on unseen data. You’ll evaluate the model on the test set and run predictions on custom sentences.
# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest Accuracy: {accuracy*100:.2f}%")
print(f"Test Loss: {loss:.4f}")
# Create an inverted index to map tag indices back to tag names
idx2tag = {i: w for w, i in tag2idx.items()}
# Function to predict named entities in a new sentence
def predict_entities(sentence):
# Tokenize and convert words to indices
words = sentence.split()
word_indices = [word2idx.get(w, 0) for w in words] # Use 0 for unknown words
# Pad the input sequence
padded_sequence = pad_sequences([word_indices], maxlen=MAX_LEN, padding="post", value=0)
# Make predictions
p = model.predict(padded_sequence)
p = np.argmax(p, axis=-1)
print(f"\nPrediction for: '{sentence}'")
print("{:15} | {:5}".format("Word", "Tag"))
print("-" * 25)
for w, pred_idx in zip(words, p[0][:len(words)]):
print("{:15} | {:5}".format(w, idx2tag[pred_idx]))
# Run example predictions
test_sentence = "Narendra Modi is the Prime Minister of India"
predict_entities(test_sentence)
test_sentence_2 = "Apple is looking at buying U.K. startup for $1 billion"
predict_entities(test_sentence_2)
Output:
Evaluating the model and making predictions...
300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 38ms/step - accuracy: 0.9857 - loss: 0.0509
Test Accuracy: 98.58%
Test Loss: 0.0506
1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step
Prediction for: 'Narendra Modi is the Prime Minister of India'
Word | Tag
-------------------------
Narendra | B-nat
Modi | B-nat
is | O
the | O
Prime | B-per
Minister | O
of | O
India | B-geo
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step
Prediction for: 'Apple is looking at buying U.K. startup for $1 billion'
Word | Tag
-------------------------
Apple | B-org
is | O
looking | O
at | O
buying | O
U.K. | B-nat
startup | B-nat
for | O
$1 | B-nat
billion | B-nat
Your Named Entity Recognition model achieved 98.58% accuracy on the test data, a strong result for a beginner-level project. Keep in mind that this is token-level accuracy computed over padded 50-token sequences, so the easily predicted padding positions and the dominant 'O' tag inflate the number; entity-level performance will be lower.
You now have a working NER model. The stray B-nat tags in the sample predictions appear because unknown words are mapped to index 0 and the label sequences are also padded with index 0, which in this run happens to correspond to the B-nat tag. You can improve the model further by padding labels with tag2idx["O"] instead, using pretrained word embeddings, training on more data, or moving to more advanced architectures such as transformer-based models.
You successfully built a Named Entity Recognition model using a Bidirectional LSTM network with TensorFlow and Keras. The model achieved high accuracy on the test set and was able to identify entities like names, locations, and organizations from raw text.
While it performed well, a few misclassifications suggest that further improvements are possible with more data or advanced models. This project gave you hands-on experience in sequence labeling, deep learning for NLP, and real-world entity extraction.
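If you want to see exactly where those misclassifications occur, one option is a per-tag report. The sketch below uses scikit-learn's classification_report (installed alongside train_test_split) and skips padded positions; it is an illustration, not part of the original notebook.
from sklearn.metrics import classification_report
# Predict tag probabilities for the test set and take the most likely tag per token
y_pred = np.argmax(model.predict(X_test), axis=-1)
y_true = np.argmax(y_test, axis=-1)
# Flatten to one long list of tags, skipping padded positions (word index 0)
mask = X_test != 0
pred_tags = [idx2tag[i] for i in y_pred[mask]]
true_tags = [idx2tag[i] for i in y_true[mask]]
# Token-level precision, recall, and F1 for each tag
print(classification_report(true_tags, pred_tags))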
Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!
Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!
Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!
Colab Link:
https://colab.research.google.com/drive/1UQFIzPN4_aCcLUAp6GL0axxWRmtg9fbF