Create Your Own Movie Recommendation System Using Python

Do you wonder how Netflix suggests movies that align with your interests so well? Or maybe you want to build a system that makes such suggestions to its own users?

If your answer was yes, then you’ve come to the right place as this article will teach you how to build a movie recommendation system by using Python. 

However, before we start discussing the ‘How’ we must be familiar with the ‘What.’

Recommendation System: What is It?

Recommendation systems have become an integral part of our daily lives. From online retailers like Amazon and Flipkart to social media platforms like YouTube and Facebook, every major digital company uses recommendation systems to provide a personalized experience to its users.

Some examples of recommendation systems in your everyday life include:

  • The suggestions you get from Amazon when you buy products are a result of a recommender system.
  • YouTube uses a recommender system to suggest videos suited for your taste.
  • Netflix has a famous recommendation system for suggesting shows and movies according to your interests. 

A recommender system uses data to suggest products to users. This data could include the interests a user has entered, their history, and so on. If you’re studying machine learning and AI, recommender systems are a must-study topic, as they are becoming increasingly popular and advanced.

Types of Recommendation Systems

There are two main types of recommendation systems:

1. Collaborative Recommendation Systems

A collaborative recommendation system suggests items based on how much similar users liked them. It groups users with similar interests and tastes and recommends products to each user based on what others in the group liked.

For example, suppose you and one other user liked Sholay. Now, after watching Sholay and liking it, the other user liked Golmaal. Because you and the other user have similar interests, the recommender system would suggest you watch Golmaal based on this data. This is collaborative filtering. 
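As a rough illustration of the example above, the sketch below recommends titles liked by users whose tastes overlap with yours. The `likes` dictionary, the user names, and the `recommend_for` helper are made up for this illustration; a real system would work with large rating matrices and a proper similarity measure.

```python
# Toy data: which movies each (hypothetical) user liked.
likes = {
    "you": {"Sholay"},
    "other_user": {"Sholay", "Golmaal"},
    "third_user": {"Titanic"},
}


def recommend_for(user, likes):
    """Suggest movies liked by users who share at least one liked movie with `user`."""
    seen = likes[user]
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        overlap = len(seen & other_likes)  # number of movies both users liked
        if overlap == 0:
            continue  # no shared taste, so this user is ignored
        for movie in other_likes - seen:
            scores[movie] = scores.get(movie, 0) + overlap
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda title: (-scores[title], title))


print(recommend_for("you", likes))  # ['Golmaal']
```

Titanic is not suggested here because the only user who liked it shares no liked movie with you.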

2. Content-Based Recommendation Systems

A content-based recommender system suggests items similar to those a user has already shown interest in. It can draw on explicit data (‘Likes’, ‘Shares’, etc.) or implicit data (watch history) to build a user-specific profile, and then suggests items whose attributes match that profile.
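One way to picture this is the minimal sketch below, which builds a profile from the genres of the movies a user liked and ranks the remaining movies by genre overlap. The genre sets are the ones listed in this article’s top-movies chart; the `jaccard` and `recommend_similar` helpers and the choice of Jaccard similarity are assumptions made for illustration only.

```python
# Genre labels as listed in this article's top-movies chart.
movie_genres = {
    "The Dark Knight": {"Drama", "Action", "Crime", "Thriller"},
    "Pulp Fiction": {"Thriller", "Crime"},
    "Forrest Gump": {"Comedy", "Drama", "Romance"},
    "Se7en": {"Crime", "Mystery", "Thriller"},
    "Back to the Future": {"Adventure", "Comedy", "Science Fiction", "Family"},
}


def jaccard(a, b):
    """Share of genres two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_similar(liked, movie_genres, top_n=2):
    """Rank unseen movies by how closely their genres match the user's profile."""
    # The user profile is the union of genres of all liked movies.
    profile = set()
    for title in liked:
        profile |= movie_genres[title]
    candidates = [t for t in movie_genres if t not in liked]
    ranked = sorted(candidates, key=lambda t: (-jaccard(profile, movie_genres[t]), t))
    return ranked[:top_n]


print(recommend_similar({"Pulp Fiction"}, movie_genres))  # ['Se7en', 'The Dark Knight']
```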

Building a Basic Movie Recommendation System

Now that we have covered the basics of recommender systems, let’s get started on building a movie recommendation system. 

We can start building a Python-based movie recommendation system by using the full MovieLens dataset. This dataset contains more than 26 million ratings and 750,000 tag applications across over 45,000 movies. It also includes tag genome data with more than 12 million relevance scores.

We are using the full dataset to create a basic movie recommendation system. However, you’re free to use a smaller dataset for this project. First, we’ll have to import all the required libraries:

Input

import pandas as pd
import numpy as np
from ast import literal_eval

A basic Python-based movie recommendation system suggests movies according to their popularity and genre. It works on the notion that popular, critically acclaimed movies have a high probability of being liked by the general audience. Keep in mind that such a system doesn’t give personalized suggestions.

To implement it, we will sort the movies according to their popularity and rating and pass in a genre argument to get a genre’s top movies:

Input

md = pd.read_csv('../input/movies_metadata.csv')

md.head()

Output

| | adult | belongs_to_collection | budget | genres | video | id | imdb_id | original_title | overview | revenue | title |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | {'id': 10194, 'name': 'Toy Story Collection'} | 30000000 | [{'id': 16, 'name': 'Animation'}… | False | 862 | tt0114709 | Toy Story | Led by Woody, Andy's toys live happily… | 373554033 | Toy Story |
| 1 | False | NaN | 65000000 | [{'id': 12, 'name': 'Adventure'}… | False | 8844 | tt0113497 | Jumanji | When siblings Judy and Peter… | 262797249 | Jumanji |
| 2 | False | {'id': 119050, 'name': 'Grumpy Old Men'} | 0 | [{'id': 10749, 'name': 'Romance'}… | False | 15602 | tt0113228 | Grumpy Old Men | A family wedding reignites the ancient… | 0 | Grumpier Old Men |
| 3 | False | NaN | 16000000 | [{'id': 35, 'name': 'Comedy'}… | False | 31357 | tt0114885 | Waiting to Exhale | Cheated on, mistreated and stepped… | 81452156 | Waiting to Exhale |

Input

md['genres'] = md['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
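To see what the line above does, the sketch below applies the same steps to a single raw value. The `raw` string is a made-up sample in the format of the genres column, and `genre_names` is a hypothetical helper written for this illustration, not part of pandas or the dataset.

```python
from ast import literal_eval

# A sample raw value, shaped like an entry in the 'genres' column (illustrative).
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"


def genre_names(value):
    """Turn a stringified list of genre dicts into a plain list of genre names."""
    parsed = literal_eval(value)  # safely evaluates the string as a Python literal
    return [g["name"] for g in parsed] if isinstance(parsed, list) else []


print(genre_names(raw))   # ['Animation', 'Comedy']
print(genre_names("[]"))  # []
```

Missing values are replaced with the string '[]' first, which is why the empty case returns an empty list instead of raising an error.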

The Formula for Our Chart

To create our chart of top movies, we will use the TMDB ratings together with IMDb’s weighted rating formula, which is as follows:

$$\text{Weighted Rating (WR)} = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C$$

Here, v is the number of votes a movie received, m is the minimum number of votes required to appear on the chart, R is the movie’s average rating, and C is the mean rating across the entire dataset.
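As a quick worked example of the formula, the sketch below plugs in the values this article reports: Inception’s 14,075 votes and rating of 8, a cutoff m of 434, and a mean C of about 5.24. The function name `weighted_rating_value` and the second, low-vote movie are made up for illustration.

```python
def weighted_rating_value(v, R, m, C):
    """Blend a movie's own rating R with the dataset mean C, weighted by vote count v."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# Values reported in this article for Inception, the cutoff m, and the mean C.
popular = weighted_rating_value(v=14075, R=8, m=434, C=5.244896612406511)
print(round(popular, 6))  # 7.917588

# A hypothetical movie rated 9 by only 10 voters is pulled strongly toward C.
obscure = weighted_rating_value(v=10, R=9, m=434, C=5.244896612406511)
print(round(obscure, 2))  # 5.33
```

The more votes a movie has relative to m, the closer its weighted rating stays to its own average; with few votes, the score falls back toward the dataset mean.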

Building the Charts

Now that we have the dataset and the formula in place, we can start building the chart. We’ll only add movies whose vote count is at or above the 95th percentile, meaning they have received more votes than at least 95% of the movies in the dataset. We’ll begin by creating a top 250 chart.

Input

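# Keep only non-null values; note that casting to int truncates the ratings
vote_counts = md[md['vote_count'].notnull()]['vote_count'].astype('int')
vote_averages = md[md['vote_average'].notnull()]['vote_average'].astype('int')

# C is the mean rating across the whole dataset
C = vote_averages.mean()
C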

Output

5.244896612406511

Input

m = vote_counts.quantile(0.95)

m

Output

434.0
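The cutoff above is the 95th percentile of the vote counts. The sketch below, on a made-up list of ten vote counts, shows how such a cutoff can be computed with linear interpolation between neighbouring values, which is the default behaviour of the pandas quantile method. `linear_quantile`, `toy_counts`, and `toy_cutoff` are hypothetical names chosen for this illustration.

```python
import math


def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q  # fractional position in the sorted list
    lower = math.floor(pos)
    upper = math.ceil(pos)
    fraction = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


toy_counts = [0, 2, 5, 10, 20, 50, 100, 400, 1000, 5000]
toy_cutoff = linear_quantile(toy_counts, 0.95)
toy_qualified = [v for v in toy_counts if v >= toy_cutoff]

print(round(toy_cutoff), toy_qualified)  # 3200 [5000]
```

Only the most-voted entry clears the cutoff, which mirrors how the real chart keeps just the top 5% of movies by vote count.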

Input

md['year'] = pd.to_datetime(md['release_date'], errors='coerce').apply(lambda x: str(x).split('-')[0] if pd.notnull(x) else np.nan)

Input

qualified = md[(md['vote_count'] >= m) & (md['vote_count'].notnull()) & (md['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity', 'genres']]
qualified['vote_count'] = qualified['vote_count'].astype('int')
qualified['vote_average'] = qualified['vote_average'].astype('int')
qualified.shape

Output

(2274, 6)

As you can see, a movie needs at least 434 votes to earn a place on our chart, and 2,274 movies qualify. The mean rating across all movies, C, is about 5.24 on a scale of 10.

Input

def weighted_rating(x):
    v = x['vote_count']
    R = x['vote_average']
    # Blend the movie's own rating R with the global mean C, weighted by vote count
    return (v/(v+m) * R) + (m/(m+v) * C)

Input

qualified['wr'] = qualified.apply(weighted_rating, axis=1)

Input

qualified = qualified.sort_values('wr', ascending=False).head(250)

With all of this in place, let’s build the chart:

Top Movies Overall

Input

qualified.head(15)

Output

| | title | year | vote_count | vote_average | popularity | genres | wr |
|---|---|---|---|---|---|---|---|
| 15480 | Inception | 2010 | 14075 | 8 | 29.1081 | [Action, Thriller, Science Fiction, Mystery, A… | 7.917588 |
| 12481 | The Dark Knight | 2008 | 12269 | 8 | 123.167 | [Drama, Action, Crime, Thriller] | 7.905871 |
| 22879 | Interstellar | 2014 | 11187 | 8 | 32.2135 | [Adventure, Drama, Science Fiction] | 7.897107 |
| 2843 | Fight Club | 1999 | 9678 | 8 | 63.8696 | [Drama] | 7.881753 |
| 4863 | The Lord of the Rings: The Fellowship of the Ring | 2001 | 8892 | 8 | 32.0707 | [Adventure, Fantasy, Action] | 7.871787 |
| 292 | Pulp Fiction | 1994 | 8670 | 8 | 140.95 | [Thriller, Crime] | 7.868660 |
| 314 | The Shawshank Redemption | 1994 | 8358 | 8 | 51.6454 | [Drama, Crime] | 7.864000 |
| 7000 | The Lord of the Rings: The Return of the King | 2003 | 8226 | 8 | 29.3244 | [Adventure, Fantasy, Action] | 7.861927 |
| 351 | Forrest Gump | 1994 | 8147 | 8 | 48.3072 | [Comedy, Drama, Romance] | 7.860656 |
| 5814 | The Lord of the Rings: The Two Towers | 2002 | 7641 | 8 | 29.4235 | [Adventure, Fantasy, Action] | 7.851924 |
| 256 | Star Wars | 1977 | 6778 | 8 | 42.1497 | [Adventure, Action, Science Fiction] | 7.834205 |
| 1225 | Back to the Future | 1985 | 6239 | 8 | 25.7785 | [Adventure, Comedy, Science Fiction, Family] | 7.820813 |
| 834 | The Godfather | 1972 | 6024 | 8 | 41.1093 | [Drama, Crime] | 7.814847 |
| 1154 | The Empire Strikes Back | 1980 | 5998 | 8 | 19.471 | [Adventure, Action, Science Fiction] | 7.814099 |
| 46 | Se7en | 1995 | 5915 | 8 | 18.4574 | [Crime, Mystery, Thriller] | |

Voila, you have created a basic Python-based movie recommendation system!

We will now narrow our recommender system’s suggestions down by genre so they can be more precise. After all, not everyone will like The Godfather equally.

Narrowing Down the Genre

So, now we’ll modify our recommender system to be more genre-specific:

Input

# Expand each movie's genre list so that every (movie, genre) pair gets its own row
s = md.apply(lambda x: pd.Series(x['genres']), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'genre'
gen_md = md.drop('genres', axis=1).join(s)
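The reshaping above gives each movie one row per genre, so that filtering by a single genre becomes a simple equality test. The plain-Python sketch below shows the same idea on three records whose genres come from this article’s chart; `toy_movies` and `one_row_per_genre` are names invented for this illustration.

```python
toy_movies = [
    {"title": "Forrest Gump", "genres": ["Comedy", "Drama", "Romance"]},
    {"title": "Fight Club", "genres": ["Drama"]},
    {"title": "Pulp Fiction", "genres": ["Thriller", "Crime"]},
]


def one_row_per_genre(movies):
    """Expand each movie into one (title, genre) row per genre it belongs to."""
    return [
        {"title": movie["title"], "genre": genre}
        for movie in movies
        for genre in movie["genres"]
    ]


rows = one_row_per_genre(toy_movies)
print(len(rows))  # 6
print([r["title"] for r in rows if r["genre"] == "Drama"])  # ['Forrest Gump', 'Fight Club']
```

A movie with several genres appears several times in the expanded data, which is why Forrest Gump can show up in both the overall chart and a genre chart.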

Input

def build_chart(genre, percentile=0.85):
    # Keep only the rows for the requested genre
    df = gen_md[gen_md['genre'] == genre]
    vote_counts = df[df['vote_count'].notnull()]['vote_count'].astype('int')
    vote_averages = df[df['vote_average'].notnull()]['vote_average'].astype('int')
    # C and m are computed within the genre, not across the whole dataset
    C = vote_averages.mean()
    m = vote_counts.quantile(percentile)

    qualified = df[(df['vote_count'] >= m) & (df['vote_count'].notnull()) & (df['vote_average'].notnull())][['title', 'year', 'vote_count', 'vote_average', 'popularity']]
    qualified['vote_count'] = qualified['vote_count'].astype('int')
    qualified['vote_average'] = qualified['vote_average'].astype('int')

    # Same weighted rating formula as before
    qualified['wr'] = qualified.apply(lambda x: (x['vote_count']/(x['vote_count']+m) * x['vote_average']) + (m/(m+x['vote_count']) * C), axis=1)
    qualified = qualified.sort_values('wr', ascending=False).head(250)

    return qualified

We now have a function, build_chart, that filters movies by genre and ranks the top ones by weighted rating; by default it uses the 85th percentile of vote counts as the cutoff. We’ll try it on the romance genre because romance didn’t show up much in our previous chart.

Top Movies in Romance

Input

build_chart('Romance').head(15)

Output

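| | title | year | vote_count | vote_average | popularity | wr |
|---|---|---|---|---|---|---|
| 10309 | Dilwale Dulhania Le Jayenge | 1995 | 661 | 9 | 34.457 | 8.565285 |
| 351 | Forrest Gump | 1994 | 8147 | 8 | 48.3072 | 7.971357 |
| 876 | Vertigo | 1958 | 1162 | 8 | 18.2082 | 7.811667 |
| 40251 | Your Name. | 2016 | 1030 | 8 | 34.461252 | 7.789489 |
| 883 | Some Like It Hot | 1959 | 835 | 8 | 11.8451 | 7.745154 |
| 1132 | Cinema Paradiso | 1988 | 834 | 8 | 14.177 | 7.744878 |
| 19901 | Paperman | 2012 | 734 | 8 | 7.19863 | 7.713951 |
| 37863 | Sing Street | 2016 | 669 | 8 | 10.672862 | 7.689483 |
| 882 | The Apartment | 1960 | 498 | 8 | 11.9943 | 7.599317 |
| 38718 | The Handmaiden | 2016 | 453 | 8 | 16.727405 | 7.566166 |
| 3189 | City Lights | 1931 | 444 | 8 | 10.8915 | 7.558867 |
| 24886 | The Way He Looks | 2014 | 262 | 8 | 5.71127 | 7.331363 |
| 45437 | In a Heartbeat | 2017 | 146 | 8 | 20.82178 | 7.003959 |
| 1639 | Titanic | 1997 | 7770 | 7 | 26.8891 | 6.981546 |
| 19731 | Silver Linings Playbook | 2012 | 4840 | 7 | 14.4881 | 6.970581 |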

Now you have a movie recommender system that suggests top movies according to a chosen genre. We recommend testing it with other genres too, such as Action, Drama, and Thriller. Share the top three movies the recommender system suggests in your favourite genre in the comment section below.

Learn More About a Movie Recommendation System 

As you must have noticed by now, building a Python-based movie recommendation system is quite simple. All you need is a little knowledge of data science and a little effort to create a fully functional recommender system.

However, what if you want to build more advanced recommender systems? What if you want to create a recommender system that a large corporation might consider using?

If you’re interested in learning more about recommender systems and data science, then we recommend taking a data science course. With a course, you’ll learn all the fundamental and advanced concepts of data science and machine learning. Moreover, you’ll study from industry experts who will guide you throughout the course to help you avoid doubts and confusion.

At upGrad, we offer multiple data science and machine learning courses, so you can pick any one of them depending on your interests. Be sure to check them out!

Final Thoughts

You now know how to build a movie recommendation system. After you have created the system, be sure to share it with others and show them your progress. Recommender systems have a diverse range of applications so learning about them will surely give you an upper hand in the industry.

What is collaborative filtering and what are its types?

Collaborative filtering is a type of recommendation system that builds a model from users’ preferences. The users’ history acts as the dataset. Collaborative filtering comes in two types:

1. User-based collaborative filtering: We take a user, say “A”, find other users with similar preferences, and then recommend to “A” the items those users liked that “A” has not encountered yet.
2. Item-based collaborative filtering: Instead of finding users with similar preferences, we find movies similar to those “A” already likes and recommend the ones “A” has not watched yet.
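To make the item-based variant concrete, here is a minimal sketch that scores each unseen movie by how much its audience overlaps with the audiences of movies user “A” already likes. The users, their likes, and the helper names `item_similarity` and `recommend_items` are all hypothetical, and cosine similarity over sets of fans is just one of several possible measures.

```python
from math import sqrt

# Hypothetical users and the movies each one liked.
user_likes = {
    "A": {"Sholay"},
    "B": {"Sholay", "Golmaal"},
    "C": {"Sholay", "Golmaal", "Titanic"},
    "D": {"Sholay", "Golmaal"},
    "E": {"Titanic", "Vertigo"},
}


def item_similarity(users_a, users_b):
    """Cosine similarity between two movies, each given as the set of users who liked it."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def recommend_items(user, user_likes):
    """Rank movies the user has not seen by similarity to the movies they liked."""
    # Invert the data: movie -> set of users who liked it.
    fans = {}
    for name, movies in user_likes.items():
        for movie in movies:
            fans.setdefault(movie, set()).add(name)
    seen = user_likes[user]
    scores = {}
    for candidate in fans:
        if candidate in seen:
            continue
        score = sum(item_similarity(fans[candidate], fans[liked]) for liked in seen)
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda title: (-scores[title], title))


print(recommend_items("A", user_likes))  # ['Golmaal', 'Titanic']
```

Golmaal ranks first because three of the four users who liked Sholay also liked it, while Vertigo is left out because it shares no fans with Sholay.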

What are the advantages and disadvantages of content-based filtering?

Content-based filtering collects data from the user and suggests items accordingly. Some of its advantages and disadvantages are listed below:
Advantages
1. Unlike collaborative filtering, the model does not need data about other users, since its recommendations are based only on the primary user’s own preferences.
2. The model can recommend niche movies that match your preferences, even if only a few other people have watched them.
Disadvantages
1. This technique requires a lot of domain knowledge, because the item features are hand-engineered to some extent; the model can only be as good as those features.
2. Its ability to recommend movies is limited, since it only works from the user’s existing interests.

Which popular applications use collaborative filtering algorithms?

Collaborative filtering is becoming the primary driving algorithm for many popular applications, as more and more businesses focus on delivering rich personalized content. For example, you have probably seen the message “Customers who bought this also bought” on many e-commerce websites.
The following are some applications with a large worldwide user base:
1. YouTube uses this algorithm along with other powerful algorithms to provide video recommendations on the home page.
2. E-commerce websites such as Amazon, Flipkart, and Myntra also use this algorithm to provide product recommendations.
3. Video streaming platforms are the biggest example; they use user ratings, average ratings, and related content to provide personalized suggestions.
