
Create Your Own Movie Recommendation System Using Python

Last updated: 9th Mar, 2021
Read Time: 10 Mins

Do you wonder how Netflix suggests movies that align so closely with your interests? Or maybe you want to build a system that can make similar suggestions to its own users?

If your answer is yes, you’ve come to the right place: this article will teach you how to build a movie recommendation system using Python.

However, before we discuss the ‘How’, we must be familiar with the ‘What’.


Recommendation System: What is It?

Recommendation systems have become an integral part of our daily lives. From online retailers like Amazon and Flipkart to platforms like YouTube and Facebook, every major digital company uses recommendation systems to provide a personalized experience to its users.

Some examples of recommendation systems in your everyday life include:

  • The suggestions you get from Amazon when you buy products are a result of a recommender system.
  • YouTube uses a recommender system to suggest videos suited to your taste.
  • Netflix has a famous recommendation system for suggesting shows and movies according to your interests. 

A recommender system uses data to suggest products to users. This data can include the interests a user has entered, their history, and so on. If you’re studying machine learning and AI, recommender systems are a must-study topic, as they are becoming increasingly popular and advanced.

Types of Recommendation Systems

There are two main types of recommendation systems:

1. Collaborative Recommendation Systems

A collaborative recommendation system suggests items based on how much users with similar tastes liked them. It groups users with similar interests and recommends the items that other members of the group have liked.

For example, suppose you and one other user liked Sholay. Now, after watching Sholay and liking it, the other user liked Golmaal. Because you and the other user have similar interests, the recommender system would suggest you watch Golmaal based on this data. This is collaborative filtering. 
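To make the Sholay/Golmaal example concrete, here is a minimal sketch of user-based collaborative filtering on a tiny ratings table. All ratings, the extra titles Deewaar and Dhamaal, and the helper names are hypothetical; cosine similarity over co-rated movies and a similarity-weighted average are common choices, not the only ones.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-5 ratings; NaN means the user has not watched the movie.
ratings = pd.DataFrame(
    {
        "Sholay": [5.0, 5.0, 1.0],
        "Deewaar": [4.0, 4.0, 5.0],
        "Golmaal": [np.nan, 5.0, 2.0],
        "Dhamaal": [np.nan, 2.0, 5.0],
    },
    index=["you", "other_user", "third_user"],
)


def cosine_similarity(a: pd.Series, b: pd.Series) -> float:
    """Cosine similarity computed over the movies both users have rated."""
    both = a.notna() & b.notna()
    if not both.any():
        return 0.0
    x, y = a[both].to_numpy(), b[both].to_numpy()
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


def recommend_user_based(ratings: pd.DataFrame, user: str) -> pd.Series:
    """Score the user's unseen movies by a similarity-weighted average of others' ratings."""
    target = ratings.loc[user]
    others = ratings.drop(index=user)
    sims = others.apply(lambda row: cosine_similarity(target, row), axis=1)
    scores = {}
    for movie in target[target.isna()].index:
        rated = others[movie].notna()  # neighbours who rated this movie
        weights = sims[rated]
        if weights.sum() > 0:
            scores[movie] = float((weights * others.loc[rated, movie]).sum() / weights.sum())
    return pd.Series(scores, dtype=float).sort_values(ascending=False)


print(recommend_user_based(ratings, "you"))
```

Golmaal ends up ahead of Dhamaal because the user whose past ratings match yours most closely rated it highly.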

2. Content-Based Recommendation Systems

A content-based recommender system suggests items similar to the ones a user has already shown interest in. It builds a user-specific profile from explicit data (‘Likes’, ‘Shares’, etc.) or implicit data (watch history), describes each item by its features (for a movie: genre, cast, keywords and so on), and recommends the items whose features best match that profile.
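A content-based recommender can be sketched in a few lines: represent each movie as a feature vector (here, one-hot genres), build the user profile by averaging the vectors of movies the user liked, and rank unseen movies by cosine similarity to that profile. The movie names, genres and liked list are hypothetical, and genre-only features are a simplifying assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical item features: one-hot genres for four made-up movies.
features = pd.DataFrame(
    [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]],
    index=["movie_a", "movie_b", "movie_c", "movie_d"],
    columns=["Action", "Comedy", "Drama"],
)


def recommend_content_based(features: pd.DataFrame, liked: list, top_n: int = 2) -> pd.Series:
    """Rank unseen movies by cosine similarity to the user's profile vector."""
    # User profile: average feature vector of the movies the user liked.
    profile = features.loc[liked].mean(axis=0).to_numpy(dtype=float)
    candidates = features.drop(index=liked)
    mat = candidates.to_numpy(dtype=float)
    # Assumes every movie has at least one genre (no all-zero vectors).
    sims = mat @ profile / (np.linalg.norm(mat, axis=1) * np.linalg.norm(profile))
    return pd.Series(sims, index=candidates.index).sort_values(ascending=False).head(top_n)


print(recommend_content_based(features, liked=["movie_a"]))
```

Because no other user’s data is involved, this works even for a brand-new catalogue, which matches the trade-offs listed in the FAQ below.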

Building a Basic Movie Recommendation System

Now that we have covered the basics of recommender systems, let’s get started on building a movie recommendation system. 

We will build our Python-based movie recommendation system using the full MovieLens dataset. It contains more than 26 million ratings and 750,000 tag applications across more than 45,000 movies, as well as tag genome data with more than 12 million relevance scores.

We use the full dataset to create a basic movie recommendation system, but you’re free to use a smaller version of it for this project.

A basic Python-based movie recommendation system suggests movies according to their popularity and genre. It rests on the notion that popular, critically acclaimed movies have a high probability of being liked by the general audience. Keep in mind that such a system doesn’t give personalized suggestions.

To implement it, we sort the movies by popularity and rating, and pass in a genre argument to get a genre’s top movies. First, we load the movie metadata:


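import numpy as np
import pandas as pd
from ast import literal_eval

# Load the movie metadata and preview the first rows
md = pd.read_csv('../input/movies_metadata.csv')
md.head()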



   title              budget    genres                                 id     imdb_id    revenue
0  Toy Story          30000000  [{'id': 16, 'name': 'Animation'}, …]   862    tt0114709  373554033
1  Jumanji            65000000  [{'id': 12, 'name': 'Adventure'}, …]   8844   tt0113497  262797249
2  Grumpier Old Men   0         [{'id': 10749, 'name': 'Romance'}, …]  15602  tt0113228  0
3  Waiting to Exhale  16000000  [{'id': 35, 'name': 'Comedy'}, …]      31357  tt0114885  81452156

(Selected columns only; the full output also includes fields such as adult, belongs_to_collection and overview.)


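# 'genres' is stored as a string such as "[{'id': 16, 'name': 'Animation'}, ...]";
# parse it with literal_eval and keep only the genre names
md['genres'] = md['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])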
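The genres column stores each movie’s genres as text that looks like a Python list of dictionaries, which is why the line above runs literal_eval before pulling out the names. The helper below, parse_genres (a hypothetical name, not part of any library), repeats that step on a sample string written in the same format as the rows shown earlier.

```python
from ast import literal_eval


def parse_genres(raw) -> list:
    """Turn a stringified list of {'id': ..., 'name': ...} dicts into genre names."""
    value = literal_eval(raw) if isinstance(raw, str) else []  # NaN -> no genres
    return [item["name"] for item in value] if isinstance(value, list) else []


sample = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"
print(parse_genres(sample))  # ['Animation', 'Comedy']
print(parse_genres("[]"))    # []
```

literal_eval only evaluates Python literals, so it is a safer choice than eval for parsing text read from a CSV file.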

The Formula for Our Chart

To create our chart of top movies, we use the TMDB ratings together with IMDB’s weighted rating formula:

$$\text{Weighted Rating (WR)} = \left(\frac{v}{v+m}\right) R + \left(\frac{m}{v+m}\right) C$$

Here, v is the number of votes a movie received, m is the minimum number of votes a movie needs to appear on the chart, R is the movie’s average rating, and C is the mean rating across all movies in the dataset.
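A quick worked example shows why this formula is fair to movies with few votes. Using the threshold m = 434 and mean C = 5.24 that this article reports for the full dataset, a hypothetical movie rated 8.0 from 1,000 votes keeps most of its score, while a hypothetical movie rated 9.0 from only 10 votes is pulled almost all the way down to C.

```python
def imdb_weighted_rating(v: float, R: float, m: float, C: float) -> float:
    """Blend a movie's own average R with the global mean C, weighted by its votes v."""
    return (v / (v + m)) * R + (m / (v + m)) * C


m, C = 434, 5.24  # threshold and global mean reported for the full dataset
popular = imdb_weighted_rating(v=1000, R=8.0, m=m, C=C)  # hypothetical movie
niche = imdb_weighted_rating(v=10, R=9.0, m=m, C=C)      # hypothetical movie
print(round(popular, 2))  # 7.16
print(round(niche, 2))    # 5.32
```

The two weights always sum to 1, so WR is a weighted average of R and C; the more votes a movie has, the more its own rating counts.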

Building the Charts

Now that we have the dataset and the formula in place, we can start building the chart. We’ll only include movies whose vote count is in the top 5%, that is, at or above the 95th percentile of all vote counts. We’ll begin by creating a top 250 chart.


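# Vote counts and vote averages of every movie that has them.
# Note: astype('int') truncates averages (7.9 becomes 7); the figures reported
# below were produced this way. Use astype('float') to keep full precision.
vote_counts = md[md['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = md[md['vote_average'].notnull()]['vote_average'].astype('int')

# C: mean rating across all movies
C = vote_averages.mean()
print(C)

# m: minimum votes required, i.e. the 95th percentile of vote counts
m = vote_counts.quantile(0.95)
print(m)

# Release year as a string; missing or unparseable dates become NaN
md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)

# Keep only movies with at least m votes and a known rating
qualified = md[(md['vote_count'] >= m) & (md['vote_count'].notnull()) & (md['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']].copy()
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape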



(2274, 6)

With the full dataset, a movie needs at least 434 votes to qualify for our chart, and the mean rating across all movies (C) is 5.24 on a scale of 10. In total, 2,274 movies qualify.
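To see what the 95th-percentile cut-off does, the sketch below applies the same quantile-and-filter step to 20 hypothetical movies with 10, 20, …, 200 votes. By default pandas interpolates linearly between neighbouring values, so m can fall between two observed vote counts.

```python
import pandas as pd

# 20 hypothetical movies with 10, 20, ..., 200 votes.
toy = pd.DataFrame({
    "title": [f"movie_{i}" for i in range(1, 21)],
    "vote_count": list(range(10, 201, 10)),
})

# 95th percentile; linear interpolation between 190 and 200.
m = toy["vote_count"].quantile(0.95)
qualified = toy[toy["vote_count"] >= m]

print(round(m, 2))                  # 190.5
print(qualified["title"].tolist())  # ['movie_20']
```

Only the top 5% of vote counts survive, which here is a single movie; on the full dataset the same rule keeps 2,274 movies.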


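# IMDB weighted rating for one row; uses the global m and C computed above
def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

qualified['wr'] = qualified.apply(weighted_rating, axis=1)

# Keep the 250 highest-ranked movies
qualified = qualified.sort_values('wr', ascending=False).head(250)
qualified.head(15)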

With all of this in place, let’s build the chart:


Top Movies Overall




       title                                              year  vote_count  vote_average  popularity  genres                                           wr
15480  Inception                                          2010  14075       8             29.1081     [Action, Thriller, Science Fiction, Mystery, A…  7.917588
12481  The Dark Knight                                    2008  12269       8             123.167     [Drama, Action, Crime, Thriller]                 7.905871
22879  Interstellar                                       2014  11187       8             32.2135     [Adventure, Drama, Science Fiction]              7.897107
2843   Fight Club                                         1999  9678        8             63.8696     [Drama]                                          7.881753
4863   The Lord of the Rings: The Fellowship of the Ring  2001  8892        8             32.0707     [Adventure, Fantasy, Action]                     7.871787
292    Pulp Fiction                                       1994  8670        8             140.95      [Thriller, Crime]                                7.868660
314    The Shawshank Redemption                           1994  8358        8             51.6454     [Drama, Crime]                                   7.864000
7000   The Lord of the Rings: The Return of the King      2003  8226        8             29.3244     [Adventure, Fantasy, Action]                     7.861927
351    Forrest Gump                                       1994  8147        8             48.3072     [Comedy, Drama, Romance]                         7.860656
5814   The Lord of the Rings: The Two Towers              2002  7641        8             29.4235     [Adventure, Fantasy, Action]                     7.851924
256    Star Wars                                          1977  6778        8             42.1497     [Adventure, Action, Science Fiction]             7.834205
1225   Back to the Future                                 1985  6239        8             25.7785     [Adventure, Comedy, Science Fiction, Family]     7.820813
834    The Godfather                                      1972  6024        8             41.1093     [Drama, Crime]                                   7.814847
1154   The Empire Strikes Back                            1980  5998        8             19.471      [Adventure, Action, Science Fiction]             7.814099
46     Se7en                                              1995  5915        8             18.4574     [Crime, Mystery, Thriller]

Voilà, you have created a basic Python-based movie recommendation system!

We will now make our recommender’s suggestions genre-specific so they can be more precise. After all, not everyone likes The Godfather equally.


Narrowing Down the Genre

So, now we’ll modify our recommender system to be more genre-specific:


# Expand the genres list so that each (movie, genre) pair gets its own row
s = md.apply(lambda x: pd.Series(x['genres'], dtype=object), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_md = md.drop('genres', axis=1).join(s)


def build_chart(genre, percentile=0.85):
    # Movies that belong to the requested genre
    df = gen_md[gen_md['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    # Genre-specific mean rating (C) and vote threshold (m)
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)

    qualified = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']].copy()
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')

    # Same weighted-rating formula as before, using this genre's m and C
    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)
    return qualified


build_chart('Romance').head(15)

We have now created a recommender that takes a genre, ranks that genre’s movies by weighted rating and recommends the top ones. Here we use Romance, because it barely showed up in our previous chart.
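The genre chart boils down to three steps: expand each movie’s genre list into one row per genre, keep the rows for the requested genre, then apply the percentile cut-off and weighted rating within that genre only. The toy sketch below uses pandas’ DataFrame.explode in place of the apply/stack/join idiom above; the movies, numbers and the genre_chart name are hypothetical, and percentile=0.5 is chosen only because the toy data is tiny.

```python
import pandas as pd

# Hypothetical movies; genres are lists, as in md['genres'] after parsing.
movies = pd.DataFrame({
    "title": ["m1", "m2", "m3", "m4"],
    "genres": [["Romance", "Drama"], ["Romance"], ["Action"], ["Romance", "Comedy"]],
    "vote_count": [500, 50, 900, 300],
    "vote_average": [7.5, 9.0, 6.0, 8.0],
})


def genre_chart(df: pd.DataFrame, genre: str, percentile: float = 0.5) -> pd.DataFrame:
    # One row per (movie, genre) pair, like gen_md in the article.
    per_genre = df.explode("genres").rename(columns={"genres": "genre"})
    g = per_genre[per_genre["genre"] == genre]
    # Genre-specific mean rating and vote threshold.
    C = g["vote_average"].mean()
    m = g["vote_count"].quantile(percentile)
    q = g[g["vote_count"] >= m].copy()
    v, R = q["vote_count"], q["vote_average"]
    q["wr"] = v / (v + m) * R + m / (v + m) * C
    return q.sort_values("wr", ascending=False)[["title", "vote_count", "vote_average", "wr"]]


print(genre_chart(movies, "Romance"))
```

Note that m2 has the highest raw rating (9.0) but does not make the chart, because its 50 votes fall below the Romance threshold of 300.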


Top Movies in Romance




       title                        year  vote_count  vote_average  popularity  wr
10309  Dilwale Dulhania Le Jayenge  1995  661         9             34.457      8.565285
351    Forrest Gump                 1994  8147        8             48.3072     7.971357
40251  Your Name.                   2016  1030        8             34.461252   7.789489
883    Some Like It Hot             1959  835         8             11.8451     7.745154
1132   Cinema Paradiso              1988  834         8             14.177      7.744878
37863  Sing Street                  2016  669         8             10.672862   7.689483
882    The Apartment                1960  498         8             11.9943     7.599317
38718  The Handmaiden               2016  453         8             16.727405   7.566166
3189   City Lights                  1931  444         8             10.8915     7.558867
24886  The Way He Looks             2014  262         8             5.71127     7.331363
45437  In a Heartbeat               2017  146         8             20.82178    7.003959
19731  Silver Linings Playbook      2012  4840        7             14.4881     6.970581

Now you have a movie recommender that suggests the top movies in a chosen genre. We recommend testing it with other genres too, such as Action, Drama or Thriller.

Learn More About a Movie Recommendation System 

As you may have noticed, building a Python-based movie recommendation system is quite simple. All you need is a little data science knowledge and some effort to create a functional recommender system.

However, what if you want to build more advanced recommender systems, ones that a large company might consider using?

If you’re interested in learning more about recommender systems and data science, we recommend taking a data science course. A course teaches you the fundamental and advanced concepts of data science and machine learning, and industry experts guide you throughout to help clear up doubts and confusion.

At upGrad, we offer multiple data science and machine learning courses, so you can pick the one that best matches your interests.

Apart from these courses, we offer many other courses in data science and machine learning. Be sure to check them out!

Final Thoughts

You now know how to build a movie recommendation system. Once you have created yours, share it with others and show them your progress. Recommender systems have a diverse range of applications, so learning about them can give you an edge in the industry.


Rohit Sharma

Blog Author
Rohit Sharma is the Program Director for the UpGrad-IIIT Bangalore, PG Diploma Data Analytics Program.

Frequently Asked Questions (FAQs)

1. What is collaborative filtering and what are its types?

Collaborative filtering is a type of recommendation system that builds its model from users’ preferences; the users’ history acts as its dataset. There are two types of collaborative filtering:

1. User-based collaborative filtering: we take a user, say “A”, find other users with similar preferences, and recommend to “A” the items those users liked that “A” has not encountered yet.
2. Item-based collaborative filtering: instead of finding users with similar preferences, we find movies similar to the ones “A” has liked, judged by how users have rated them, and recommend those that “A” has not watched yet.
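Item-based filtering can be sketched the same way, except that similarities are computed between movies (the columns of a ratings matrix) rather than between users. In this minimal sketch the ratings are hypothetical, a 0 stands for “not rated” (a common simplification, not the only option), and each unwatched movie is scored by summing its similarity to the movies “A” has rated, weighted by A’s ratings.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; 0 means "not rated" (a simplifying assumption).
ratings = pd.DataFrame(
    [[5, 4, 0, 0], [5, 5, 4, 1], [1, 2, 1, 5], [4, 5, 5, 0]],
    index=["A", "B", "C", "D"],
    columns=["movie_1", "movie_2", "movie_3", "movie_4"],
    dtype=float,
)


def item_similarity(r: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between every pair of movies (columns)."""
    mat = r.to_numpy()
    norms = np.linalg.norm(mat, axis=0)  # assumes every movie has at least one rating
    sim = (mat.T @ mat) / np.outer(norms, norms)
    return pd.DataFrame(sim, index=r.columns, columns=r.columns)


def recommend_item_based(r: pd.DataFrame, user: str) -> pd.Series:
    """Score each unwatched movie by its similarity-weighted sum over the user's ratings."""
    sim = item_similarity(r)
    user_ratings = r.loc[user]
    rated = user_ratings[user_ratings > 0]
    unseen = user_ratings[user_ratings == 0].index
    scores = sim.loc[unseen, rated.index] @ rated
    return scores.sort_values(ascending=False)


print(recommend_item_based(ratings, "A"))
```

movie_3 ranks first because the users who rated it did so much like they rated movie_1 and movie_2, the two movies “A” liked.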

2. What are the advantages and disadvantages of content-based filtering?

Content-based filtering collects data from the user and suggests items accordingly. Its main advantages and disadvantages are listed below.
Advantages:
1. Unlike collaborative filtering, the model does not need data about other users with similar preferences, since it bases its suggestions on the target user’s own data.
2. The model can recommend movies that match your preferences even if only a few other people have watched them.
Disadvantages:
1. Because item features are largely hand-engineered, this technique requires a lot of domain knowledge, and the model can only be as good as those features.
2. Its ability to recommend movies is limited, since it only works from the user’s existing interests.

3. Which popular applications use collaborative filtering algorithms?

Collaborative filtering is becoming the primary driving algorithm for many popular applications, as more and more businesses focus on delivering rich personalized content. For example, you have probably seen the message “Customers who bought this also bought” on many e-commerce websites.
The following are some of the applications having a popular user base worldwide:
1. YouTube uses this algorithm along with some other powerful algorithms to provide video recommendations on the home page.
2. E-commerce websites such as Amazon, Flipkart, and Myntra also use this algorithm to provide product recommendations.
3. Video streaming platforms are the biggest example: they use user ratings, average ratings and related content to provide personalized suggestions.


Suggested Blogs

  • 17 Must Read Pandas Interview Questions & Answers [For Freshers & Experienced] (by Rohit Sharma, 04 Oct 2023)
  • 13 Interesting Data Structure Project Ideas and Topics For Beginners [2023] (by Rohit Sharma, 03 Oct 2023)
  • How To Remove Excel Duplicate: Deleting Duplicates in Excel (by Keerthi Shivakumar, 26 Sep 2023)
  • Python Free Online Course with Certification [2023] (by Rohit Sharma, 20 Sep 2023)
  • Information Retrieval System Explained: Types, Comparison & Components (by Rohit Sharma, 19 Sep 2023)
  • 40 Scripting Interview Questions & Answers [For Freshers & Experienced] (by Rohit Sharma, 17 Sep 2023)
  • Best Capstone Project Ideas & Topics in 2023 (by Rohit Sharma, 15 Sep 2023)
  • 4 Types of Data: Nominal, Ordinal, Discrete, Continuous (by Rohit Sharma, 14 Sep 2023)
  • Data Science Course Eligibility Criteria: Syllabus, Skills & Subjects (by Rohit Sharma, 14 Sep 2023)
