How to Make a Chatbot in Python Step by Step [With Source Code] in 2025

By Kechit Goyal

Updated on Oct 03, 2025 | 11 min read | 43.9K+ views

Building a chatbot project in Python with source code is one of the most exciting ways to dive into the world of Artificial Intelligence and Natural Language Processing (NLP). In 2025, chatbots are more than novelties; they are essential tools for customer service, information retrieval, and user engagement across countless industries. Creating your own chatbot from scratch not only looks great on a resume but also gives you a deep, hands-on understanding of how machines learn to understand and respond to human language. 

In this blog, we will walk you through every step of building a functional, intelligent chatbot. You'll learn the core concepts, from preparing the data to training an AI model and integrating it into a live chat application. We'll provide the complete source code and a detailed explanation, ensuring that even beginners can follow along and succeed. Get ready to bring your very own chatbot to life. 

Step-by-Step Guide: Your Chatbot Project in Python with Source Code 

Let's start building! Our chatbot will work by processing a knowledge base file, training a model to understand different categories of user queries, and then using that model to predict the best response. 

Prerequisites 

Before you begin, make sure you have Python 3 installed and can run commands in a terminal. Basic familiarity with Python syntax is all you need to follow along. 

Step 1: Setting Up the Environment and Installing Libraries 

First, we need to install the necessary Python libraries. We'll use the Natural Language Toolkit (NLTK) for processing text and scikit-learn for building our classification model. 

Open your terminal or command prompt and run the following commands: 

Bash 
pip install nltk 
pip install scikit-learn 
 

We also need to download two NLTK data packages: punkt for tokenization and wordnet for lemmatization. Open a Python shell by typing python in your terminal, and then run the following: 

Python 
import nltk 
nltk.download('punkt') 
nltk.download('wordnet') 
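# Newer NLTK releases may also need the 'punkt_tab' package:
# nltk.download('punkt_tab')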
 

Step 2: Creating the Knowledge Base (intents.json) 

Our chatbot needs data to learn from. We'll create a JSON file named intents.json. This file will contain categories of user inputs (we'll call these "tags" or "intents"), patterns of what users might say for each intent, and a list of possible responses for the bot. 

Create a file named intents.json and add the following content: 

JSON 
{ 
  "intents": [ 
    { 
      "tag": "greeting", 
      "patterns": ["Hi", "Hello", "Hey", "Good morning", "Good afternoon"], 
      "responses": ["Hello!", "Hi there!", "Hey! How can I help you?"] 
    }, 
    { 
      "tag": "goodbye", 
      "patterns": ["Bye", "Goodbye", "See you later", "Take care"], 
      "responses": ["Goodbye!", "See you later!", "Take care!"] 
    }, 
    { 
      "tag": "thanks", 
      "patterns": ["Thanks", "Thank you", "That's helpful", "Thanks a lot"], 
      "responses": ["You're welcome!", "Happy to help!", "Any time!"] 
    }, 
    { 
      "tag": "about", 
      "patterns": ["Who are you?", "What are you?", "Tell me about yourself"], 
      "responses": ["I am a chatbot created with Python and NLTK.", "I'm your friendly neighborhood chatbot!"] 
    }, 
    { 
      "tag": "help", 
      "patterns": ["Help", "Can you help me?", "I need help", "What can you do?"], 
      "responses": ["I can answer your questions. What do you need help with?", "Sure, I'm here to help. What's up?"] 
    } 
  ] 
} 
 

You can add as many tags, patterns, and responses as you want to make your chatbot more knowledgeable. 

Also Read: How to Open a JSON File? A Complete Guide on Creating and Reading JSON 

Step 3: Preprocessing the Data 

Now we need to write a Python script to read this JSON file and prepare the data for training our model. This involves a few key NLP steps: 

  1. Loading the data from the JSON file. 
  2. Tokenization: Breaking down each sentence into individual words (see the quick demo after this list). 
  3. Lemmatization: Reducing words to their base or root form (e.g., "running" becomes "run"). This helps the model treat different forms of the same word as the same thing. 
  4. Creating a vocabulary and a bag-of-words representation for our training data. 
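Here's a quick demo of tokenization and lemmatization in isolation (outputs shown as comments; note that the lemmatizer needs a part-of-speech hint to turn "running" into "run"): 

Python 
import nltk 
from nltk.stem import WordNetLemmatizer 

lemmatizer = WordNetLemmatizer() 

print(nltk.word_tokenize("Good morning!"))       # ['Good', 'morning', '!'] 
print(lemmatizer.lemmatize("dogs"))              # 'dog' 
print(lemmatizer.lemmatize("running", pos="v"))  # 'run' 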

Let's start our Python script, which we'll call chatbot.py.

Python 
import json 
import random 
import pickle 
import numpy as np 
 
import nltk 
from nltk.stem import WordNetLemmatizer 
 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.linear_model import LogisticRegression 
 
lemmatizer = WordNetLemmatizer() 
 
# Load the intents file 
with open('intents.json') as file: 
    intents = json.load(file) 
 
words = [] 
classes = [] 
documents = [] 
ignore_letters = ['?', '!', '.', ','] 
 
# Process each intent 
for intent in intents['intents']: 
    for pattern in intent['patterns']: 
        # Tokenize each word in the sentence 
        word_list = nltk.word_tokenize(pattern) 
        words.extend(word_list) 
        # Add the pair to documents 
        documents.append((word_list, intent['tag'])) 
        # Add the tag to our classes list if not already there 
        if intent['tag'] not in classes: 
            classes.append(intent['tag']) 
 
# Lemmatize words and remove duplicates 
words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_letters] 
words = sorted(list(set(words))) 
classes = sorted(list(set(classes))) 
 
# Save the processed words and classes 
pickle.dump(words, open('words.pkl', 'wb')) 
pickle.dump(classes, open('classes.pkl', 'wb')) 
 
 

Step 4: Building and Training the Model 

With our data preprocessed, we can now create the training data and build our machine learning model. We will convert our text data into numerical vectors using TfidfVectorizer and then train a Logistic Regression classifier. The model learns to predict the correct "tag" (intent) for a given user input. This is the core of the chatbot project in Python with source code. 

Python 
 
# Prepare documents for TF-IDF 
corpus = [" ".join(doc[0]) for doc in documents] 
tags = [doc[1] for doc in documents] 
 
# Vectorize the corpus. Define the tokenizer as a named, module-level 
# function (not a lambda): pickle can't serialize lambdas, and we save 
# this vectorizer to disk below. 
def lemma_tokenizer(text): 
    return [lemmatizer.lemmatize(w.lower()) for w in nltk.word_tokenize(text) if w not in ignore_letters] 

vectorizer = TfidfVectorizer(tokenizer=lemma_tokenizer) 
X = vectorizer.fit_transform(corpus) 
y = np.array(tags) 
 
# Train the Logistic Regression model 
model = LogisticRegression(random_state=42, max_iter=200) 
model.fit(X, y) 
 
# Save the trained model and vectorizer 
pickle.dump(model, open('chatbot_model.pkl', 'wb')) 
pickle.dump(vectorizer, open('vectorizer.pkl', 'wb')) 
 
print("Training is complete!") 
 

After running this script, you will have four new files: words.pkl, classes.pkl, vectorizer.pkl, and chatbot_model.pkl. These files contain the processed data and the trained AI model, ready to be used. 

Step 5: Developing the Chatting Interface 

The final step is to create the functions that will handle the conversation. We need to: 

  1. Take user input from the command line. 
  2. Process that input in the same way we processed our training data. 
  3. Use our trained model to predict the intent of the input. 
  4. Select a random response from the list of responses for that intent. 
  5. Display the response to the user. 

This step brings all our chatbot programming efforts together into a usable application. 

Python 
# This part of the code is for the interactive chat 
def get_response(user_input): 
    # Load the saved model and vectorizer (loaded on each call here for 
    # simplicity; in practice, load them once outside the function) 
    model = pickle.load(open('chatbot_model.pkl', 'rb')) 
    vectorizer = pickle.load(open('vectorizer.pkl', 'rb')) 
 
    # Transform user input 
    user_input_vec = vectorizer.transform([user_input]) 
 
    # Predict the tag 
    prediction = model.predict(user_input_vec)[0] 
     
    # Get a random response from the predicted tag 
    for intent in intents['intents']: 
        if intent['tag'] == prediction: 
            response = random.choice(intent['responses']) 
            return response 
     
    return "I'm not sure how to respond to that." 
 
# Main chat loop 
print("Chatbot is live! Type 'quit' to exit.") 
while True: 
    message = input("You: ") 
    if message.lower() == 'quit': 
        break 
     
    response = get_response(message) 
    print(f"Bot: {response}") 
 

You can now run this entire script. The first part will train and save the model, and the second part will launch the interactive chat session in your terminal. 
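
For example, a session might look like this (responses are picked at random from intents.json, so yours may differ): 

Bash 
$ python chatbot.py 
Training is complete! 
Chatbot is live! Type 'quit' to exit. 
You: Hi 
Bot: Hey! How can I help you? 
You: quit 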

Also Read: Top 50 Python Project Ideas with Source Code in 2025 

The Complete Chatbot Project in Python with Source Code 

For clarity and ease of use, here is the complete and final code for the project. 

File 1: intents.json 

JSON 
{ 
  "intents": [ 
    { 
      "tag": "greeting", 
      "patterns": ["Hi", "Hello", "Hey", "Good morning", "Good afternoon", "What's up"], 
      "responses": ["Hello!", "Hi there!", "Hey! How can I help you?"] 
    }, 
    { 
      "tag": "goodbye", 
      "patterns": ["Bye", "Goodbye", "See you later", "Take care", "I have to go"], 
      "responses": ["Goodbye!", "See you later!", "Take care!"] 
    }, 
    { 
      "tag": "thanks", 
      "patterns": ["Thanks", "Thank you", "That's helpful", "Thanks a lot", "Much appreciated"], 
      "responses": ["You're welcome!", "Happy to help!", "Any time!", "My pleasure!"] 
    }, 
    { 
      "tag": "about", 
      "patterns": ["Who are you?", "What are you?", "Tell me about yourself", "Your name?"], 
      "responses": ["I am a chatbot created with Python and NLTK.", "I'm your friendly neighborhood chatbot!", "You can call me Bot."] 
    }, 
    { 
      "tag": "help", 
      "patterns": ["Help", "Can you help me?", "I need help", "What can you do?", "Support"], 
      "responses": ["I can answer your questions based on my training. What do you need help with?", "Sure, I'm here to help. What's up?"] 
    }, 
    { 
      "tag": "name", 
      "patterns": ["What is your name?", "What should I call you?"], 
      "responses": ["You can call me ChatBot.", "I don't have a name, I am a bot."] 
    } 
  ] 
} 
 

File 2: chatbot.py (The Complete Python Script) 

Python 
import json 
import random 
import pickle 
import numpy as np 
import nltk 
from nltk.stem import WordNetLemmatizer 
from sklearn.feature_extraction.text import TfidfVectorizer 
from sklearn.linear_model import LogisticRegression 
 
# ----------------- 
# 1. DATA PREPARATION AND MODEL TRAINING 
# ----------------- 
lemmatizer = WordNetLemmatizer() 
ignore_letters = ['?', '!', '.', ','] 

# Define the tokenizer at module level (not as a lambda or nested function) 
# so the fitted vectorizer can be pickled later. 
def lemma_tokenizer(text): 
    return [lemmatizer.lemmatize(w.lower()) for w in nltk.word_tokenize(text) if w not in ignore_letters] 

def train_model(): 
 
    # Load the intents file 
    with open('intents.json') as file: 
        intents = json.load(file) 
 
    words = [] 
    classes = [] 
    documents = [] 
 
    # Process each intent 
    for intent in intents['intents']: 
        for pattern in intent['patterns']: 
            word_list = nltk.word_tokenize(pattern) 
            words.extend(word_list) 
            documents.append((word_list, intent['tag'])) 
            if intent['tag'] not in classes: 
                classes.append(intent['tag']) 
 
    # Lemmatize words, remove duplicates 
    words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_letters] 
    words = sorted(list(set(words))) 
    classes = sorted(list(set(classes))) 
 
    # Save the processed words and classes 
    pickle.dump(words, open('words.pkl', 'wb')) 
    pickle.dump(classes, open('classes.pkl', 'wb')) 
 
    # Prepare documents for TF-IDF 
    corpus = [" ".join(doc[0]) for doc in documents] 
    tags = [doc[1] for doc in documents] 
 
    # Vectorize the corpus using the module-level tokenizer defined above 
    vectorizer = TfidfVectorizer(tokenizer=lemma_tokenizer) 
    X = vectorizer.fit_transform(corpus) 
    y = np.array(tags) 
 
    # Train the Logistic Regression model 
    model = LogisticRegression(random_state=42, max_iter=200) 
    model.fit(X, y) 
 
    # Save the trained model and vectorizer 
    pickle.dump(model, open('chatbot_model.pkl', 'wb')) 
    pickle.dump(vectorizer, open('vectorizer.pkl', 'wb')) 
 
    print("Training is complete! Model and necessary files are saved.") 
    return intents 
 
# ----------------- 
# 2. CHATBOT INTERACTION 
# ----------------- 
def run_chatbot(intents): 
    # Load the saved model, vectorizer, and classes 
    model = pickle.load(open('chatbot_model.pkl', 'rb')) 
    vectorizer = pickle.load(open('vectorizer.pkl', 'rb')) 
 
    def get_response(user_input): 
        # Transform user input 
        user_input_vec = vectorizer.transform([user_input]) 
        # Predict the tag 
        prediction = model.predict(user_input_vec)[0] 
         
        # Get a random response from the predicted tag 
        for intent in intents['intents']: 
            if intent['tag'] == prediction: 
                response = random.choice(intent['responses']) 
                return response 
         
        return "I'm not sure how to respond to that, but I'm still learning." 
 
    # Main chat loop 
    print("\nChatbot is live! Type 'quit' to exit.") 
    while True: 
        message = input("You: ") 
        if message.lower() == 'quit': 
            break 
         
        response = get_response(message) 
        print(f"Bot: {response}") 
 
# ----------------- 
# MAIN EXECUTION 
# ----------------- 
if __name__ == "__main__": 
    trained_intents = train_model() 
    run_chatbot(trained_intents)

How to Enhance Your Python Chatbot 

You've successfully built a foundational chatbot project in Python with source code. Now, let's explore some ways to make it even better. 

1. Add a Graphical User Interface (GUI) 

A command-line interface works, but a graphical chat window is far more engaging. You can use Python's built-in Tkinter library to create a simple GUI. 

Example Snippet: 

Python 
import tkinter as tk 

# A basic structure for a GUI; assumes get_response() from chatbot.py is 
# defined in (or imported into) the same file. 
def setup_gui(): 
    window = tk.Tk() 
    window.title("My Chatbot") 

    chat_history = tk.Text(window, state='disabled') 
    chat_history.pack(padx=10, pady=10) 

    message_entry = tk.Entry(window, width=50) 
    message_entry.pack(padx=10, pady=5) 

    def send_message(): 
        user_message = message_entry.get() 
        if not user_message: 
            return 
        bot_reply = get_response(user_message) 
        chat_history.config(state='normal') 
        chat_history.insert(tk.END, f"You: {user_message}\nBot: {bot_reply}\n") 
        chat_history.config(state='disabled') 
        message_entry.delete(0, tk.END) 

    send_button = tk.Button(window, text="Send", command=send_message) 
    send_button.pack(pady=5) 

    window.mainloop() 

# setup_gui() 
 

2. Connect to a Database 

To make your chatbot remember conversations or user preferences, you can connect it to a database like SQLite. You could store each conversation with a timestamp, allowing you to analyze what users are asking most frequently. 
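
Here is a minimal sketch of conversation logging with SQLite; the conversations table and the log_exchange helper are illustrative choices, not part of the tutorial's code: 

Python 
import sqlite3 
from datetime import datetime 

conn = sqlite3.connect('chat_history.db') 
conn.execute(""" 
    CREATE TABLE IF NOT EXISTS conversations ( 
        id INTEGER PRIMARY KEY AUTOINCREMENT, 
        timestamp TEXT, 
        user_message TEXT, 
        bot_response TEXT 
    ) 
""") 

def log_exchange(user_message, bot_response): 
    # Record one user/bot exchange with a timestamp 
    conn.execute( 
        "INSERT INTO conversations (timestamp, user_message, bot_response) VALUES (?, ?, ?)", 
        (datetime.now().isoformat(), user_message, bot_response), 
    ) 
    conn.commit() 

# Call log_exchange(message, response) inside the main chat loop. 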

3. Deploy Your Chatbot on a Website 

You can turn your chatbot into a web application using a web framework like Flask or Django. This would allow you to create an API that a website's frontend can communicate with, letting users interact with your chatbot directly in their browser. 
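
A rough Flask sketch, assuming you refactor get_response() into an importable function (the /chat route and JSON shape here are illustrative choices): 

Python 
from flask import Flask, request, jsonify 
from chatbot import get_response  # assumes get_response() is importable 

app = Flask(__name__) 

@app.route('/chat', methods=['POST']) 
def chat(): 
    # Expect JSON like {"message": "Hi"} and reply with the bot's response 
    user_message = request.get_json().get('message', '') 
    return jsonify({'response': get_response(user_message)}) 

if __name__ == '__main__': 
    app.run(debug=True) 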

Also Read: The Ultimate Guide to Python Web Development: Fundamental Concepts Explained 

4. Use More Advanced NLP Models 

While our model works well, you can achieve even more human-like conversations using state-of-the-art models. The Hugging Face Transformers library provides easy access to powerful pre-trained models like BERT and GPT-2, which can understand context and nuance far better. This is a great next step after mastering the fundamentals of chatbot code. 
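
As a small taste, a zero-shot classifier from Transformers can score intents it was never explicitly trained on (a minimal sketch; the default model downloads on first use): 

Python 
from transformers import pipeline 

classifier = pipeline("zero-shot-classification") 
result = classifier( 
    "I have to go now", 
    candidate_labels=["greeting", "goodbye", "thanks", "help"], 
) 
print(result["labels"][0])  # the highest-scoring label, e.g. 'goodbye' 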

Also Read: Natural Language Processing: The Only Guide You'll Ever Need! 

Understanding the Core of a Python Chatbot 

Now that you've built one, let's step back and look at what a chatbot is and why Python is the perfect tool for the job. A chatbot is a computer program designed to simulate human conversation through text or voice commands. The goal is a program that users can interact with in a natural, intuitive way. 

Types of Chatbots 

There are generally two types of chatbots: 

  1. Rule-Based Chatbots: These are the simplest form. They operate on a predefined set of rules: if a user's query matches a specific keyword or pattern, the bot responds with a pre-written answer (see the toy example after this list). They are easy to build but not very flexible. 
  2. AI/ML-Based Chatbots: These are more advanced. They use Machine Learning (ML) and Natural Language Processing (NLP) to understand the intent behind a user's query, not just the keywords. They can handle variations in language and learn from conversations to improve over time. 
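
To make the contrast concrete, here is a toy rule-based bot; the RULES dictionary is an illustrative stand-in for a real rule set: 

Python 
# Pure keyword matching: no model, no learning 
RULES = {"hello": "Hi there!", "bye": "Goodbye!", "thanks": "You're welcome!"} 

def rule_based_reply(message): 
    for keyword, reply in RULES.items(): 
        if keyword in message.lower(): 
            return reply 
    return "Sorry, I don't understand." 

print(rule_based_reply("Hello bot"))  # Hi there! 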

In this tutorial, we will build an AI-based chatbot. It will be simple enough for beginners but powerful enough to understand user intent and provide intelligent responses. This approach provides a solid foundation in chatbot programming. 

Why Use Python for Chatbot Development? 

Python is the undisputed king of AI and Machine Learning for several reasons: 

  • Simple Syntax: Python's code is clean and readable, making it easy for beginners to learn and write. 
  • Extensive Libraries: It has a massive ecosystem of libraries specifically for AI and NLP, such as NLTK, TensorFlow, PyTorch, and scikit-learn. These libraries simplify complex tasks, allowing us to focus on the logic of our chatbot. 
  • Strong Community: A large and active community means you can always find tutorials, documentation, and support when you get stuck. 

Conclusion 

Congratulations on completing your chatbot project in Python with source code! You have successfully built an intelligent application that can understand and respond to human language. You've learned the entire pipeline of a chatbot project, from gathering and preprocessing data with NLTK to training a machine learning model with scikit-learn and building an interactive user interface. 

 


Frequently Asked Questions (FAQs)

1. What is NLTK and why is it used for chatbots?

NLTK, or the Natural Language Toolkit, is a powerful Python library for working with human language data. It's used in this chatbot project for essential NLP tasks like tokenization (splitting text into words) and lemmatization (reducing words to their root form), which are crucial for preparing text data for a machine learning model. 

2. Can I use this chatbot on my website?

Yes, but not directly. You would first need to wrap the chatbot logic in a web framework like Flask or Django to create an API. Then, your website's front-end (built with HTML, CSS, and JavaScript) can send user messages to this API and display the chatbot's responses. 

3. How do I add more knowledge to my chatbot?

To add more knowledge, you simply need to edit the intents.json file. You can add new "tags" (categories), and under each tag, add more "patterns" (example user phrases) and "responses" (what the bot should say). After updating the JSON file, you must re-run the training script to update the model. 

4. Is this project considered a rule-based or an AI chatbot?

This is an AI chatbot. While it uses a structured intents.json file (which feels like rules), the core logic relies on a machine learning model to predict the user's intent based on the patterns it learned during training. It doesn't rely on simple keyword matching. 

5. What's the difference between stemming and lemmatization?

Both are techniques to reduce words to their root form. Stemming is a cruder, rule-based process that chops off the end of words (e.g., "studies" becomes "studi"), which may not be a real word. Lemmatization is a more advanced, dictionary-based process that reduces words to their actual root (lemma), so "studies" becomes "study." 

6. How can I measure my chatbot's accuracy?

To measure accuracy, you would need to split your intents.json data into a training set and a testing set. You train the model on the training set and then use it to make predictions on the unseen testing set. You can then compare the model's predictions to the actual tags to calculate an accuracy score. 
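
A minimal sketch with scikit-learn, assuming the corpus and tags lists from the training script above: 

Python 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score 

# Hold out 20% of the patterns for testing 
X_train, X_test, y_train, y_test = train_test_split( 
    corpus, tags, test_size=0.2, random_state=42 
) 
X_train_vec = vectorizer.fit_transform(X_train) 
X_test_vec = vectorizer.transform(X_test) 
model.fit(X_train_vec, y_train) 
print(accuracy_score(y_test, model.predict(X_test_vec))) 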

7. What are some alternatives to NLTK for chatbot development?

Popular alternatives include spaCy, which is known for its speed and production-readiness, and libraries from Hugging Face like Transformers and Tokenizers, which provide access to state-of-the-art, pre-trained language models for more advanced NLP tasks. 

8. Can I make my chatbot speak its responses?

Yes. You can integrate a Text-to-Speech (TTS) library like gTTS (Google Text-to-Speech) or pyttsx3. After your chatbot generates a text response, you would pass that text to the TTS library, which would then convert it into an audio file that can be played. 
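
For example, with pyttsx3 (which works offline; install it with pip install pyttsx3): 

Python 
import pyttsx3 

engine = pyttsx3.init() 
engine.say("Hello! How can I help you?")  # pass your chatbot's response here 
engine.runAndWait() 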

9. How much data do I need to train a good chatbot?

The more, the better. For a simple chatbot like this, a few dozen intents with 5-10 patterns each can produce decent results. For a production-level chatbot that needs to handle a wide variety of queries, you might need thousands of examples to achieve high accuracy. 

10. What's a "bag-of-words"?

A "bag-of-words" is a simple way to represent text data numerically for a machine learning model. It describes the occurrence of words within a document, disregarding grammar and word order but keeping track of frequency. Our TF-IDF vectorizer is a more advanced version of this concept. 

11. Why do we save the model to a file using pickle?

Training a machine learning model can take time, especially with large datasets. We use pickle to save the trained model object to a file. This allows us to simply load and use the pre-trained model for chatting without having to retrain it every time we run the application. 

12. Can this chatbot handle typos from users?

This simple model has limited ability to handle typos. Because it's trained on a specific vocabulary, an unknown misspelled word will be ignored. More advanced chatbots use techniques like fuzzy string matching or character-level embeddings to become more robust against spelling errors. 

13. What is Logistic Regression in the context of this chatbot project?

Logistic Regression is a simple and efficient classification algorithm. In our project, it acts as the "brain" of the chatbot. After we convert text into numbers (vectors), this algorithm learns to map those vectors to the correct intent or "tag" (e.g., greeting, goodbye). 

14. Could I use a neural network instead of Logistic Regression?

Yes, absolutely. For more complex datasets, a simple neural network built with libraries like TensorFlow/Keras or PyTorch could potentially achieve higher accuracy. It would replace the LogisticRegression part of the code while the data preprocessing steps would remain largely the same. 

15. How can I make the chatbot remember the context of a conversation?

To handle context, you need to build a more complex system that manages "state." This involves storing information from previous user interactions (like their name or a question they asked earlier) and using that information to inform future responses. This is a feature of more advanced chatbot frameworks like Rasa or Google Dialogflow. 

16. Is the chatbot's response always random from the list?

Yes, in this implementation, once the model predicts an intent (like "greeting"), the code randomly selects one of the available responses from the responses list for that intent. This makes the conversation feel slightly less repetitive. 

17. What is a "corpus" in NLP?

A corpus is simply a large and structured collection of text data. In our project, the collection of all the "patterns" from our intents.json file is considered our training corpus. It's the body of text that our model learns from. 

18. How can I handle user inputs that don't match any intent?

Our current code provides a default fallback response like "I'm not sure how to respond." A more advanced system might use prediction probabilities. If the model's confidence in its top prediction is below a certain threshold, it can trigger the fallback response, indicating it's unsure. 
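
A sketch of that thresholding, assuming the model and vectorizer from this project (the 0.4 cutoff is an illustrative choice): 

Python 
probabilities = model.predict_proba(user_input_vec)[0] 
best = probabilities.argmax() 
if probabilities[best] < 0.4: 
    response = "I'm not sure how to respond to that." 
else: 
    predicted_tag = model.classes_[best] 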

19. What's the purpose of the random_state parameter in the model?

The random_state parameter is used to ensure reproducibility. Machine learning algorithms often have a random element (e.g., for initializing weights). Setting random_state to a specific number ensures that you get the exact same results every time you run the training process with the same data. 

20. Is this chatbot project scalable for a business?

This project is an excellent educational tool but is not production-ready for a large business. A commercial-grade chatbot would require a more robust framework (like Rasa or a cloud provider's AI service), extensive error handling, scalability to handle many users, and a much larger, more dynamic knowledge base. 
