Named Entity Recognition (NER) Model with BiLSTM and Deep Learning in NLP

By Rohit Sharma

Updated on Jul 31, 2025 | 11 min read | 1.33K+ views


Named Entity Recognition (NER) is a key task in Natural Language Processing that identifies and classifies entities like names, locations, organizations, and dates in text.

In this project, you'll build an NER model using a Bidirectional LSTM network with TensorFlow and Keras. You'll also explore how to preprocess data, label sequences, and train deep learning models to recognize entities in a custom text corpus. 

Fast-track your data science career with upGrad’s premier Online Data Science Courses. Taught by industry experts, these courses equip you with real-world, job-ready skills in Python, Machine Learning, AI, SQL, Tableau, and more. Enroll today and begin your learning journey!

Turn theory into practice: explore our premier Python Data Science Projects and start building.

Prerequisites for Successful Project Development

  • Basic understanding of Python programming
  • Familiarity with Natural Language Processing (NLP) concepts
  • Experience with TensorFlow and Keras for deep learning
  • Knowledge of sequence models like LSTM and BiLSTM
  • Understanding of word embeddings and text preprocessing
  • Ability to label and tokenize text for training
  • Awareness of evaluation metrics like accuracy and F1-score

Also Read - 15+ Top Natural Language Processing Techniques To Learn in 2025

Advance your data science career with upGrad's top-ranked courses, offering the opportunity to learn from industry-established mentors.

Tools Powering Named Entity Recognition: An Inside Look

To build this NER project, you'll use a focused set of Python tools and libraries designed for natural language understanding, deep learning, and efficient model training:

Tool / Library       | Purpose
---------------------|-----------------------------------------------------------------
Python               | Core language for scripting NLP workflows and training models
Google Colab         | Cloud-based environment for running and sharing Jupyter notebooks
Pandas               | Reads and manages structured text data and annotations
NumPy                | Handles numerical operations in tokenization and model input prep
TensorFlow / Keras   | Builds and trains BiLSTM-based entity recognition models
spaCy                | Offers pre-trained pipelines for quick entity tagging and evaluation
NLTK                 | Assists with tokenization, POS tagging, and text cleaning
Matplotlib / Seaborn | Visualizes entity distributions and training performance

Also Read - Introduction to Deep Learning & Neural Networks with Keras

Techniques Used For Named Entity Recognition: Smart Insights

To build an effective Named Entity Recognition (NER) model, you’ll apply core NLP techniques that help detect and label useful information from raw text:

  • Text Preprocessing: Clean the input by tokenizing text, removing punctuation, and normalizing words to improve model accuracy.
  • Sequence Tagging with BiLSTM: Use Bidirectional LSTM networks to learn context from both directions and label entities like names, organizations, and locations.
  • Transfer Learning with spaCy and BERT: Apply pre-trained NLP models to boost recognition quality without extensive training from scratch (a quick spaCy sketch follows this list).
  • Entity Annotation and Labeling: Define custom entity categories and manually tag training samples to prepare your data for supervised learning.
  • Model Evaluation: Use metrics like precision, recall, and F1-score to evaluate how well your model detects named entities.
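
For a quick feel of what pre-trained entity tagging looks like before building your own model, here is a minimal sketch using spaCy. It assumes spaCy and its small English pipeline are installed (pip install spacy, then python -m spacy download en_core_web_sm); this is separate from the BiLSTM model you'll train later in this project.

# Minimal sketch: entity tagging with a pre-trained spaCy pipeline
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a built-in NER component
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print each detected entity span with its predicted label (e.g., ORG, GPE, MONEY)
for ent in doc.ents:
    print(ent.text, ent.label_)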

Also Read - Large Language Models: What They Are, Examples, and Open-Source Disadvantages

Our Plan & Timeline

You can complete this Named Entity Recognition project in around 5 to 6 hours. It’s well-suited for beginners and intermediate learners who have basic knowledge of Python and want to build real-world NLP pipelines.

How to Build a Named Entity Recognition (NER) Model

Let’s build this project from scratch with a clear, step-by-step workflow:

  1. Load the Dataset
  2. Clean and Preprocess the Text
  3. Label and Format Data for NER
  4. Build the BiLSTM NER Model
  5. Train the Model on Labeled Sequences
  6. Test on Custom Sentences

Alright, let's dive in!

Step 1: Import Required Libraries

Before building the Named Entity Recognition model, we need to import the essential Python libraries for data handling, model creation, and training.

Here's the code:

# Step 1: Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

Also Read - Libraries in Python Explained: List of Important Libraries

Step 2: Load and Preprocess the NER Dataset

In this step, you’ll load the CSV file containing labeled text data and organize it into sentence-wise groups.

This structure is essential for Named Entity Recognition tasks, which operate on sequences.

try:
    # Load the dataset
    df = pd.read_csv('NER dataset.csv', encoding='latin1')
except FileNotFoundError:
    print("Error: 'NER dataset.csv' not found. Please ensure the dataset is in the correct path.")
    exit()

# Fill missing sentence identifiers (forward-fill the 'Sentence #' column)
df = df.ffill()

# Display sample records and metadata
print("Dataset Head:")
print(df.head())
print("\nDataset Info:")
df.info()

# Extract unique words and tags
words = list(set(df["Word"].values))
tags = list(set(df["Tag"].values))
num_words = len(words)
num_tags = len(tags)
print(f"\nNumber of unique words: {num_words}")
print(f"Number of unique tags: {num_tags}")
print(f"Tags: {tags}")

# Define a sentence grouping utility
class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values.tolist(),
                                                           s["POS"].values.tolist(),
                                                           s["Tag"].values.tolist())]
        self.grouped = self.data.groupby("Sentence #").apply(agg_func)
        self.sentences = [s for s in self.grouped]

# Generate sentence-level data
getter = SentenceGetter(df)
sentences = getter.sentences

# Output sample and statistics
print(f"\nExample sentence (first 5 words): {sentences[0][:5]}")
print(f"Total number of sentences: {len(sentences)}")

Output: 

Dataset Head:
    Sentence #           Word  POS Tag
0  Sentence: 1      Thousands  NNS   O
1  Sentence: 1             of   IN   O
2  Sentence: 1  demonstrators  NNS   O
3  Sentence: 1           have  VBP   O
4  Sentence: 1        marched  VBN   O

Dataset Info:
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 4 columns):
 #   Column      Non-Null Count    Dtype
---  ------      --------------    -----
 0   Sentence #  1048575 non-null  object
 1   Word        1048575 non-null  object
 2   POS         1048575 non-null  object
 3   Tag         1048575 non-null  object
dtypes: object(4)

Number of unique words: 35177
Number of unique tags: 17
Tags: ['B-nat', 'B-geo', 'B-eve', 'I-per', 'I-nat', 'I-gpe', 'B-art', 'B-org', 'I-eve', 'B-gpe', 'O', 'I-art', 'I-org', 'I-geo', 'B-per', 'I-tim', 'B-tim']

Example sentence (first 5 words): [('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ('demonstrators', 'NNS', 'O'), ('have', 'VBP', 'O'), ('marched', 'VBN', 'O')]

Total number of sentences: 47959

Also Read - Data Cleaning Techniques: 15 Simple & Effective Ways To Clean Data

Step 3: Create Word and Tag Mappings with Padded Sequences

Neural networks can’t understand raw text. To train a model for Named Entity Recognition, you must convert each word and tag into a numerical representation. 

Then you’ll pad all sequences to a uniform length to maintain consistency.

# Map each word and tag to a unique index
word2idx = {w: i + 1 for i, w in enumerate(words)}  # Index 0 reserved for padding
tag2idx = {t: i for i, t in enumerate(tags)}

# Convert words and tags in sentences to their index values
X = [[word2idx.get(w[0], 0) for w in s] for s in sentences]  # Unknown words mapped to 0
y = [[tag2idx[w[2]] for w in s] for s in sentences]

# Pad each sequence to the same length (50)
MAX_LEN = 50
X = pad_sequences(maxlen=MAX_LEN, sequences=X, padding="post", value=0)
y = pad_sequences(maxlen=MAX_LEN, sequences=y, padding="post", value=0)

# Convert tag sequences to one-hot encoding
y = to_categorical(y, num_classes=num_tags)

# Split the data into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Display data shapes
print(f"\nShape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of X_test: {X_test.shape}")
print(f"Shape of y_test: {y_test.shape}")

Output:

Shape of X_train: (38367, 50)
Shape of y_train: (38367, 50, 17)
Shape of X_test: (9592, 50)
Shape of y_test: (9592, 50, 17)

Also Read - Structured Data vs Semi-Structured Data: Differences, Examples & Challenges 


Step 4: Build the BiLSTM Model for Named Entity Recognition

With your data ready, it’s time to define the model. You'll use a Bidirectional LSTM, which captures patterns in both forward and backward directions, essential for understanding context in a sentence.

The code for this step is as follows:

input_layer = Input(shape=(MAX_LEN,))

# Embedding layer to learn word representations
embedding_layer = Embedding(input_dim=num_words + 1, output_dim=100, input_length=MAX_LEN)(input_layer)

# Bidirectional LSTM to capture sequence patterns
lstm_layer = Bidirectional(
    LSTM(units=100, return_sequences=True, recurrent_dropout=0.1)
)(embedding_layer)

# TimeDistributed layer applies dense layer at each time step
output_layer = TimeDistributed(Dense(num_tags, activation="softmax"))(lstm_layer)

# Define and compile the model
model = Model(input_layer, output_layer)
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)

# View model summary
model.summary()

Output:

Model: "functional"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓

┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃

┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩

│ input_layer (InputLayer)        │ (None, 50)             │             0 │

├─────────────────────────────────┼────────────────────────┼───────────────┤

│ embedding (Embedding)           │ (None, 50, 100)        │     3,517,800 │

├─────────────────────────────────┼────────────────────────┼───────────────┤

│ bidirectional (Bidirectional)   │ (None, 50, 200)        │       160,800 │

├─────────────────────────────────┼────────────────────────┼───────────────┤

│ time_distributed                │ (None, 50, 17)         │         3,417 │

│ (TimeDistributed)               │                        │               │

└─────────────────────────────────┴────────────────────────┴───────────────┘

Total params: 3,682,017 (14.05 MB)
Trainable params: 3,682,017 (14.05 MB)
Non-trainable params: 0 (0.00 B)

Step 5: Train the Named Entity Recognition Model

Now that your Bi-LSTM model is ready, it's time to train it. You'll feed in your training data, define the batch size, set the number of epochs, and keep a small portion of data for validation to monitor learning progress.

print("\nStep 4: Training the model...")
history = model.fit(
    X_train, y_train,
    batch_size=32,
    epochs=5,
    validation_split=0.1,
    verbose=1
)
print("\nModel training complete.")

Output:

Training the model...

Epoch 1/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 268s 239ms/step - accuracy: 0.9261 - loss: 0.2953 - val_accuracy: 0.9824 - val_loss: 0.0589
Epoch 2/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 254s 235ms/step - accuracy: 0.9864 - loss: 0.0450 - val_accuracy: 0.9850 - val_loss: 0.0486
Epoch 3/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 261s 234ms/step - accuracy: 0.9901 - loss: 0.0314 - val_accuracy: 0.9858 - val_loss: 0.0473
Epoch 4/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 260s 233ms/step - accuracy: 0.9917 - loss: 0.0255 - val_accuracy: 0.9854 - val_loss: 0.0493
Epoch 5/5
1080/1080 ━━━━━━━━━━━━━━━━━━━━ 264s 235ms/step - accuracy: 0.9932 - loss: 0.0208 - val_accuracy: 0.9852 - val_loss: 0.0517

Model training complete.

During training, the model updates its weights to better predict entity tags. Monitoring the validation accuracy after each epoch helps identify whether the model is learning effectively or overfitting.
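
One simple way to eyeball this is to plot the curves stored in the history object returned by model.fit, using the matplotlib import from Step 1. Here's a small optional sketch:

# Optional: visualize training vs. validation accuracy across epochs
plt.figure(figsize=(8, 4))
plt.plot(history.history["accuracy"], label="Training accuracy")
plt.plot(history.history["val_accuracy"], label="Validation accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("BiLSTM NER training progress")
plt.legend()
plt.show()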

Also Read - Neural Network Architecture: Types, Components & Key Algorithms

Step 6: Evaluate and Test the NER Model

With training complete, it’s time to test how well your Named Entity Recognition model performs on unseen data. You’ll evaluate the model on the test set and run predictions on custom sentences.

# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=1)
print(f"\nTest Accuracy: {accuracy*100:.2f}%")
print(f"Test Loss: {loss:.4f}")
# Create an inverted index to map tag indices back to tag names
idx2tag = {i: w for w, i in tag2idx.items()}

# Function to predict named entities in a new sentence
def predict_entities(sentence):
    # Tokenize and convert words to indices
    words = sentence.split()
    word_indices = [word2idx.get(w, 0) for w in words]  # Use 0 for unknown words

    # Pad the input sequence
    padded_sequence = pad_sequences([word_indices], maxlen=MAX_LEN, padding="post", value=0)

    # Make predictions
    p = model.predict(padded_sequence)
    p = np.argmax(p, axis=-1)
    print(f"\nPrediction for: '{sentence}'")
    print("{:15} | {:5}".format("Word", "Tag"))
    print("-" * 25)
    for w, pred_idx in zip(words, p[0][:len(words)]):
        print("{:15} | {:5}".format(w, idx2tag[pred_idx]))

# Run example predictions
test_sentence = "Narendra Modi is the Prime Minister of India"
predict_entities(test_sentence)
test_sentence_2 = "Apple is looking at buying U.K. startup for $1 billion"
predict_entities(test_sentence_2)

Output: 

300/300 ━━━━━━━━━━━━━━━━━━━━ 12s 38ms/step - accuracy: 0.9857 - loss: 0.0509

Test Accuracy: 98.58%
Test Loss: 0.0506

1/1 ━━━━━━━━━━━━━━━━━━━━ 2s 2s/step

Prediction for: 'Narendra Modi is the Prime Minister of India'
Word            | Tag
-------------------------
Narendra        | B-nat
Modi            | B-nat
is              | O
the             | O
Prime           | B-per
Minister        | O
of              | O
India           | B-geo

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step

Prediction for: 'Apple is looking at buying U.K. startup for $1 billion'
Word            | Tag
-------------------------
Apple           | B-org
is              | O
looking         | O
at              | O
buying          | O
U.K.            | B-nat
startup         | B-nat
for             | O
$1              | B-nat
billion         | B-nat

Your Named Entity Recognition model reached 98.58% token-level accuracy on the test data, a strong result for a beginner-level project. Keep in mind that this figure is inflated by padding positions and the very common 'O' tag, and the sample predictions above include misclassifications (for example, 'Narendra Modi' tagged as B-nat instead of B-per / I-per), so entity-level precision, recall, and F1-score give a more realistic picture, as sketched below.
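
As a rough sketch of how you could compute those entity-level metrics, the snippet below uses the seqeval library, which is not part of this project's setup (install with pip install seqeval). Note that the padded positions carry the tag at index 0 from preprocessing, so they leak into this report unless you mask them out.

# Rough sketch: entity-level precision/recall/F1 with seqeval (assumes: pip install seqeval)
from seqeval.metrics import classification_report

# Predicted and true tag indices for the test set (shape: [num_sentences, MAX_LEN])
pred_idx = np.argmax(model.predict(X_test), axis=-1)
true_idx = np.argmax(y_test, axis=-1)

# Convert index sequences back to tag strings so seqeval can group B-/I- spans
y_pred_tags = [[idx2tag[i] for i in row] for row in pred_idx]
y_true_tags = [[idx2tag[i] for i in row] for row in true_idx]

# Per-entity-type precision, recall, and F1-score
print(classification_report(y_true_tags, y_pred_tags))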

You now have a working NER model. You can improve it further by:

  • Adding more annotated data. 
  • Using pre-trained embeddings (e.g., GloVe, FastText); a short sketch follows this list.
  • Trying transformer-based models like BERT for better context handling.
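
To illustrate the second suggestion, here is a hedged sketch of how pre-trained GloVe vectors could replace the randomly initialized Embedding layer from Step 4. It assumes you have separately downloaded glove.6B.100d.txt from the Stanford GloVe release into the working directory; the file name and path are placeholders.

# Sketch: seed the Embedding layer with pre-trained GloVe vectors (file path is a placeholder)
from tensorflow.keras.initializers import Constant

EMBEDDING_DIM = 100

# Load GloVe vectors into a {word: vector} lookup
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Build a (num_words + 1) x EMBEDDING_DIM matrix aligned with word2idx (row 0 stays zero for padding)
embedding_matrix = np.zeros((num_words + 1, EMBEDDING_DIM))
for word, idx in word2idx.items():
    vector = embeddings_index.get(word.lower())  # the GloVe 6B vocabulary is lowercased
    if vector is not None:
        embedding_matrix[idx] = vector

# Swap this in for the Embedding layer in Step 4, then rebuild and recompile the model
embedding_layer = Embedding(
    input_dim=num_words + 1,
    output_dim=EMBEDDING_DIM,
    embeddings_initializer=Constant(embedding_matrix),
    trainable=False,  # freeze the vectors, or set True to fine-tune them
)(input_layer)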

Conclusion

You successfully built a Named Entity Recognition model using a Bidirectional LSTM network with TensorFlow and Keras. The model achieved high accuracy on the test set and was able to identify entities like names, locations, and organizations from raw text.

While it performed well, a few misclassifications suggest that further improvements are possible with more data or advanced models. This project gave you hands-on experience in sequence labeling, deep learning for NLP, and real-world entity extraction.

Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!

Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!

Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!

Colab Link:
https://colab.research.google.com/drive/1UQFIzPN4_aCcLUAp6GL0axxWRmtg9fbF


