Multithreading in Python [With Coding Examples]

Q: 1. What is a thread in Python?

A thread is the smallest unit of execution that can be scheduled within a process. In simple terms, it is a sequence of instructions within a program that can run independently of the rest of the program. Threads are lightweight compared with processes because threads in the same process share its memory. They let a program make progress on several tasks concurrently, which improves responsiveness and makes better use of time the CPU would otherwise spend idle, for example while waiting on I/O.

Q: 2. What is the use of multi-thread in Python?

Multithreading lets a Python program run many threads concurrently by switching rapidly between them (called context switching). It is useful when a task can be divided into separate, independent parts. For example, suppose a complex database query can be broken up into several individual queries. In that case, you can assign a thread to each query and run them concurrently, so that the time spent waiting on the database overlaps.
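As a rough illustration of the one-thread-per-query idea, the sketch below uses ThreadPoolExecutor from the standard concurrent.futures module. The run_query function and the query names are hypothetical stand-ins: a short time.sleep simulates the wait for a database rather than talking to a real one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(query):
    """Hypothetical stand-in for a database call; sleep simulates the wait."""
    time.sleep(0.1)
    return "rows for " + query

# Hypothetical sub-queries that a larger query was split into
queries = ["orders_2021", "orders_2022", "orders_2023"]

# One worker thread per sub-query; map() yields results in input order
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(run_query, queries))

for line in results:
    print(line)
```

Because the three simulated waits overlap, the whole batch should take roughly as long as one query rather than three.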

Q: 3. What is thread synchronization?

Thread synchronization is a mechanism that ensures two or more concurrent threads do not execute a critical section of a program at the same time. Synchronization primitives control which thread may enter a critical section. When we start two or more threads inside a program, several of them may try to access the same resource at once, producing unexpected results. For example, if many threads write to the same file, the data may be corrupted because one thread can overwrite another's data, or because one thread closes the file while another is still using it.

By Rohit Sharma

Updated on Dec 20, 2023 | 9 min read | 9.77K+ views

Once you have a basic knowledge of Python, the next step is making your code faster. Multithreading is one way to achieve that optimization, using “threads”. What are these threads? And how are they different from processes? Let’s find out.

import threading

def cuber(n):
    print("Cube: {}".format(n * n * n))

def squarer(n):
    print("Square: {}".format(n * n))

if __name__ == "__main__":
    # create the threads
    t1 = threading.Thread(target=squarer, args=(5,))
    t2 = threading.Thread(target=cuber, args=(5,))

    # start the thread t1
    t1.start()
    # start the thread t2
    t2.start()

    # wait until t1 is completed
    t1.join()
    # wait until t2 is completed
    t2.join()

    # both threads completed
    print("Done!")

#Output:
Square: 25
Cube: 125
Done!

Now let’s try to understand the code.

First, we import the threading module, which provides the Thread class. Inside the main block, we create two threads by instantiating Thread objects. For each one we pass target, which is the function to execute in that thread, and args, a tuple of the arguments to pass to that function.

Once the threads are created, we need to start them by calling the start method on each thread. The main program then needs to wait for the threads to finish their processing. We call the join method, which pauses the main program until threads t1 and t2 have finished executing.

Thread Synchronization

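As we discussed above, threads do not execute Python code in parallel; instead, Python switches from one thread to another. (In the standard CPython interpreter, the Global Interpreter Lock allows only one thread to run Python bytecode at a time.) Because a switch can happen at almost any point, threads that share data need correct synchronization to avoid unexpected behavior.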

Race Condition

Threads that belong to the same process share data and files, which can lead to a “race” for that data between threads. If several threads read and modify the same piece of data without coordination, the final result depends on the order in which they happen to run and may not be what we expect. This is called a Race Condition.

So, if two threads have access to the same data, either can read and modify it whenever it is the one executing. Suppose T1 starts executing and modifies some data while T2 is waiting. Then T1 is paused and control passes to T2, which has access to the same data. T2 now modifies and overwrites that data, which causes problems when T1 resumes with its now outdated view of it.
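The lost update described above can be reproduced on demand. In this minimal sketch, the balance variable, the unsafe_deposit function and the use of threading.Barrier are illustrative choices, not part of the original example: the barrier simply forces both threads to read the shared value before either writes, so the unlucky interleaving happens on every run instead of only occasionally.

```python
import threading

balance = 0
# Demo-only helper: makes both threads finish reading before either writes
both_have_read = threading.Barrier(2)

def unsafe_deposit(amount):
    global balance
    snapshot = balance            # step 1: read the shared value
    both_have_read.wait()         # force the unlucky interleaving every time
    balance = snapshot + amount   # step 2: write back, based on a stale read

t1 = threading.Thread(target=unsafe_deposit, args=(10,))
t2 = threading.Thread(target=unsafe_deposit, args=(20,))
t1.start()
t2.start()
t1.join()
t2.join()

# Both threads started from 0, so one deposit is lost:
# the result is 10 or 20, never the expected 30
print("Final balance:", balance)
```

In real code there is no barrier, so the bug appears only when the thread switch happens to land between the read and the write, which is what makes race conditions hard to reproduce.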

The aim of Thread Synchronization is to make sure this Race Condition never occurs, by letting only one thread at a time execute the critical section of code.

Locks

To prevent race conditions and their consequences, the threading module offers a Lock class, which behaves like a binary semaphore. A binary semaphore is essentially a flag with two states. Think of the “Engaged” sign on a telephone booth, which reads either “Engaged” (equivalent to 1) or “Not Engaged” (equivalent to 0). Every time a thread reaches a segment of code protected by a lock, it checks whether the lock is already in state 1. If it is, the thread has to wait until the lock returns to 0 before it can acquire it.

The Lock class has two primary methods:

acquire(blocking=True, timeout=-1): The acquire method takes a blocking parameter that is either True or False. If thread T1 calls acquire with blocking set to True (the default) while another thread T2 holds the lock, T1 waits, or remains blocked, until T2 releases it. Once T2 releases the lock, T1 acquires it and acquire returns True. The optional timeout parameter limits how many seconds to wait; if the lock cannot be acquired within that time, acquire returns False.

On the other hand, if thread T1 calls acquire with blocking set to False, T1 won’t wait if thread T2 already holds the lock. Instead, acquire returns False immediately. If no other thread holds the lock, T1 acquires it and acquire returns True.

release(): When the release method is called, it unlocks the lock and returns nothing (None). If any threads are blocked waiting for the lock, exactly one of them is allowed to proceed and acquire it.

However, if the lock is already unlocked, a RuntimeError is raised in Python 3.
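The behavior of the two methods can be checked with a short single-threaded sketch. It relies only on the documented Python 3 behavior of threading.Lock: a non-blocking acquire on a held lock returns False, and releasing an unlocked lock raises RuntimeError. The variable names are illustrative.

```python
import threading

lock = threading.Lock()

first = lock.acquire()                  # lock was free, so this returns True
second = lock.acquire(blocking=False)   # already held: returns False right away
print(first, second)

lock.release()                          # unlock; release() itself returns None
print("locked after release?", lock.locked())

try:
    lock.release()                      # releasing an unlocked lock is an error
except RuntimeError as exc:
    double_release_error = exc
    print("RuntimeError:", exc)
```

Note that the second acquire fails even though the same thread holds the lock: a plain Lock does not track which thread owns it.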

Deadlocks

Another issue that arises when we work with locks is the deadlock. A deadlock occurs when a thread waits forever for a lock that is never released, for example because the thread holding it is itself waiting or has failed before releasing it. Let’s consider a simple example where we do the following:

import threading

l = threading.Lock()
# Before the 1st acquire
l.acquire()
# Before the 2nd acquire
l.acquire()
# Never reached: the second acquire() above blocks forever

In the above code, we call the acquire method twice but don’t release the lock after acquiring it the first time. A Lock is not reentrant, so when the thread reaches the second acquire call, it waits indefinitely for a release that never happens.
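To observe this without hanging the program, one option is to pass a timeout to the second acquire call, as in the sketch below. It also shows threading.RLock, a reentrant lock from the same module that the owning thread may acquire more than once. The variable names and the 0.2 second timeout are arbitrary choices for the demonstration.

```python
import threading

lock = threading.Lock()
lock.acquire()
# A timeout turns the endless wait into a False return value
got_it_again = lock.acquire(timeout=0.2)
print("Second acquire on Lock succeeded:", got_it_again)
lock.release()

# RLock is reentrant: the owning thread may acquire it repeatedly,
# as long as it releases it the same number of times
rlock = threading.RLock()
rlock.acquire()
nested = rlock.acquire(timeout=0.2)
print("Second acquire on RLock succeeded:", nested)
rlock.release()
rlock.release()
```

A timeout only reports the problem; the underlying fix is still to make sure every acquire is paired with a release.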

These deadlock conditions might creep into your code without you realizing it. Even if you include a release call, your code may fail midway, so that release is never called and the lock stays locked. One way to avoid this is to use the lock in a with statement, since Lock objects are context managers. The lock is acquired on entering the with block and released automatically when the block ends, whether it completes normally or fails with an exception.
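A minimal sketch of this pattern follows. The shared counter, the add_many function and the simulated failure are hypothetical examples; the with lock form itself is standard for threading.Lock.

```python
import threading

lock = threading.Lock()
counter = 0

def add_many(times):
    global counter
    for _ in range(times):
        with lock:            # acquire() on entry, release() on exit
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("counter:", counter)

# The lock is released even when the block fails part-way through
try:
    with lock:
        raise ValueError("simulated failure inside the critical section")
except ValueError:
    pass
print("still locked after the error?", lock.locked())
```

Because every increment happens while the lock is held, the four threads always add up to 40000, and the lock is free again after the exception.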

Before you go

As we discussed earlier, multithreading does not help every application, because in Python threads do not truly run Python code in parallel. Its main use is in I/O-bound tasks, where the CPU would otherwise sit idle while waiting for data to load. Multithreading lets the program use that idle time for other tasks, which makes it well suited to speeding up such workloads.
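The effect on I/O-bound work can be sketched with simulated waits. Here fake_download and the file names are hypothetical, and time.sleep stands in for waiting on a network or disk; a waiting thread lets the other threads run in the meantime.

```python
import threading
import time

def fake_download(name, results):
    """Hypothetical I/O task: sleep stands in for waiting on a network or disk."""
    time.sleep(0.3)
    results.append(name)

results = []
names = ["a.csv", "b.csv", "c.csv", "d.csv"]

start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(n, results)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run one after another, the four waits would take about 1.2 seconds in total
print("finished {} tasks in {:.2f} seconds".format(len(results), elapsed))
```

On a typical machine the threaded version finishes in a little over 0.3 seconds, since the four waits overlap. A CPU-bound function in place of the sleep would not see this kind of gain.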

If you are curious to learn about data science, check out IIIT-B & upGrad’s Executive PG Program in Data Science which is created for working professionals and offers 10+ case studies & projects, practical hands-on workshops, mentorship with industry experts, 1-on-1 with industry mentors, 400+ hours of learning and job assistance with top firms.