Movie Recommendation System: How To Build it with Machine Learning?

By Rohit Sharma

Updated on Jul 04, 2025 | 14 min read | 20.14K+ views

Did you know? YouTube’s recommendation engine drives over 70% of what people watch on the platform. That makes it the real power player behind what you watch next. But unlike Netflix, YouTube’s system plays by a different set of rules, with its own unique goals and challenges.

Movie recommendation systems use machine learning to analyze user preferences, watch history, ratings, and behavior to suggest films tailored to individual tastes. These systems learn patterns over time to deliver more accurate and personalized suggestions. 

For example, Netflix uses collaborative filtering and deep learning to recommend movies based on what similar users watched. This helps viewers discover content they’re likely to enjoy, even if they’ve never searched for it before.

In this blog, you’ll learn how movie recommendation systems use machine learning to deliver personalized viewing experiences. 

If you want to learn machine learning, upGrad’s online AI and ML courses can help you. By the end of the program, participants will be equipped with the skills to build AI models, analyze complex data, and solve industry-specific challenges.

Build Your Own Movie Recommendation System with ML

You’ve seen Netflix suggest that crime drama at 3 AM, or YouTube nudge you toward that oddly addictive mini doc. But have you ever thought: “How do these platforms know what I’ll like?”

That’s where Movie Recommendation Systems come in. They can be built using either collaborative filtering or content-based filtering. 

Collaborative Filtering: Collaborative filtering finds users with similar tastes and recommends what they liked. It analyzes user behavior patterns like ratings, views, and clicks. If User A and User B both loved the same movies, the system assumes they have similar preferences. It then suggests movies User A enjoyed to User B. This "people like you also liked" approach doesn't need to understand movie content. It relies purely on user interaction patterns to make predictions.

Content-Based Filtering: Content-based filtering analyzes movie characteristics to find similar content. It examines features like genre, director, actors, and plot keywords. If you rate action movies with specific actors highly, it suggests more action films with those actors. This method focuses on item attributes, not user opinions. It works well for new users since it only needs a few preferences to start making recommendations.

In this guide, we’ll build a movie recommendation system using:

  1. Collaborative Filtering (using user behavior)
  2. Content-Based Filtering (based on movie features)

Let’s begin.

Step 1: Set Up Your Environment

Before you dive into building your movie recommendation system, you need to set up a clean environment with the right libraries. We'll use Google Colab for this tutorial because it's cloud-based, beginner-friendly, and doesn’t require local setup. All you need is a Google account.

The key libraries we’ll use are:

  • pandas for handling and manipulating data
  • scikit-surprise for building collaborative filtering models
  • scikit-learn for content-based filtering (TF-IDF, cosine similarity)

Start by running the following in a Colab cell to install everything you need:

!pip install scikit-surprise
!pip install pandas scikit-learn

Once installed, you can import them and start writing your code right away. Colab also gives you GPU and TPU access if you need it later for larger-scale models.

Step 2: Create a Movie Ratings Dataset

To build a movie recommendation system, you first need a dataset that simulates real-world user behavior. Since this is a hands-on tutorial, we’ll start with a small, hypothetical dataset. This helps you focus on understanding the logic without being overwhelmed by thousands of rows.

Think of this as a mini version of what platforms like Netflix or Amazon Prime collect every time you watch or rate something.

Each entry in the dataset contains:

  • A user_id: the viewer
  • A movie: the name of the movie watched
  • A rating: the viewer’s score, from 1 (didn’t like it) to 5 (loved it)

Let’s define this sample data in code:

import pandas as pd

# Simulated user ratings for various movies
ratings_data = {
    'user_id': ['U1', 'U1', 'U2', 'U2', 'U3', 'U3', 'U4', 'U4', 'U5'],
    'movie': ['Inception', 'Avengers', 'Inception', 'Titanic', 'Avengers', 'Shrek', 'Titanic', 'Shrek', 'Inception'],
    'rating': [5, 4, 4, 5, 3, 5, 4, 4, 3]
}

# Create a DataFrame
ratings_df = pd.DataFrame(ratings_data)

# Display the dataset
ratings_df

When you run this, you’ll see a table that looks like this:

user_id  movie      rating
U1       Inception  5
U1       Avengers   4
U2       Inception  4
U2       Titanic    5
U3       Avengers   3
U3       Shrek      5
U4       Titanic    4
U4       Shrek      4
U5       Inception  3

This dataset will be the foundation of your collaborative filtering model. Later, we’ll use this to predict ratings for movies a user hasn’t watched yet, just like a real recommendation engine would do behind the scenes.
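To see why this data is a useful (if tiny) stand-in for real rating logs, it helps to lay it out as a user-by-movie grid. The sketch below uses plain Python rather than pandas, and the variable names are illustrative assumptions, not part of the tutorial's code. It prints the grid with "-" for missing ratings and reports how much of it is empty.

```python
# Same ratings as the sample dataset, as (user, movie, rating) tuples
ratings = [
    ("U1", "Inception", 5), ("U1", "Avengers", 4),
    ("U2", "Inception", 4), ("U2", "Titanic", 5),
    ("U3", "Avengers", 3), ("U3", "Shrek", 5),
    ("U4", "Titanic", 4), ("U4", "Shrek", 4),
    ("U5", "Inception", 3),
]

# Nested dict: user -> {movie: rating}
user_ratings = {}
for user, movie, rating in ratings:
    user_ratings.setdefault(user, {})[movie] = rating

users = sorted(user_ratings)
movies = sorted({movie for _, movie, _ in ratings})

# Print the user-item grid, with "-" where a user has not rated a movie
print("user " + " ".join(f"{m:>9}" for m in movies))
for user in users:
    row = " ".join(f"{user_ratings[user].get(m, '-'):>9}" for m in movies)
    print(f"{user:<5}{row}")

# Fraction of empty cells in the grid
sparsity = 1 - len(ratings) / (len(users) * len(movies))
print(f"Sparsity: {sparsity:.0%}")
```

Only 9 of the 20 possible user-movie pairs have a rating. Collaborative filtering is essentially the task of filling in the blanks.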

If you're working with a larger dataset in the future (like MovieLens), this same logic will apply, just at a bigger scale.

Step 3: Build a Collaborative Filtering Recommender System

Now that you’ve got your sample data ready, let’s move on to building the actual recommendation engine. We'll start with Collaborative Filtering, one of the most common and powerful techniques used in platforms like Netflix, Amazon, and YouTube.

This approach makes recommendations by analyzing user behavior. In simple terms, if two users liked similar movies in the past, they’re likely to enjoy the same ones in the future. We'll implement this using the Surprise library, which is perfect for building and testing recommender systems.

1. Prepare Your Data for the Surprise Library

The Surprise library requires your dataset to be in a specific format: a dataframe with three columns — user ID, item (movie) name, and rating. We already have that in ratings_df. Now let’s load it into Surprise:

from surprise import Dataset, Reader

# Define the rating scale (our ratings go from 1 to 5)
reader = Reader(rating_scale=(1, 5))

# Load the dataframe into Surprise's format
data = Dataset.load_from_df(ratings_df[['user_id', 'movie', 'rating']], reader)

# Build the full training set
trainset = data.build_full_trainset()

What’s happening here?

  • Reader tells Surprise the scale of your ratings.
  • Dataset.load_from_df converts your Pandas dataframe to a format Surprise understands.
  • build_full_trainset() takes all your data and prepares it for training.

2. Train the Collaborative Filtering Model

Next, we’ll use a K-Nearest Neighbors (KNN) algorithm to find similar users based on their ratings. We’ll use cosine similarity as our metric (a common choice in recommendations).

from surprise import KNNBasic

# Define similarity options
sim_options = {
    'name': 'cosine',      # Use cosine similarity
    'user_based': True     # Compute similarities between users (not items)
}

# Initialize the algorithm
model = KNNBasic(sim_options=sim_options)

# Train the model on the training set
model.fit(trainset)

Why KNN? It’s simple, intuitive, and works well for small to medium-sized datasets. It compares users and recommends items liked by similar users.

Why cosine similarity? It measures how similar two users are based on the "angle" between their rating vectors, rather than the difference in absolute rating values.
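To make the idea concrete, here is a minimal hand-rolled sketch of cosine similarity between two users. It assumes the similarity is computed only over movies both users have rated; the function name cosine_on_common is a hypothetical helper, not a Surprise API.

```python
import math

# Sample ratings as user -> {movie: rating}
user_ratings = {
    "U1": {"Inception": 5, "Avengers": 4},
    "U2": {"Inception": 4, "Titanic": 5},
    "U3": {"Avengers": 3, "Shrek": 5},
    "U4": {"Titanic": 4, "Shrek": 4},
    "U5": {"Inception": 3},
}

def cosine_on_common(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

# Compare U5 with every other user
for other in ["U1", "U2", "U3", "U4"]:
    score = cosine_on_common(user_ratings["U5"], user_ratings[other])
    print("U5 vs", other, round(score, 2))
```

One thing this sketch makes visible: when two users share only a single rated movie, the cosine over that one movie is 1.0 whatever the ratings are. That is a known weakness of cosine similarity on very sparse data, and a reason to test on a larger dataset before trusting the scores.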

Once this step runs, you have a trained model ready to predict ratings!

Step 4: Generate Movie Recommendations for a User

Now that your collaborative filtering model is trained, it’s time to put it to work. You’ll predict how much a user might enjoy movies they haven’t rated yet. Based on those predictions, you’ll recommend the highest-rated ones.

1. Identify Unwatched Movies for a User:

Let’s focus on User U5. Based on our sample data, U5 has only rated Inception. Let’s find out which movies they haven’t rated yet, and then predict ratings for those.

# Get all unique movie names
all_movies = ratings_df['movie'].unique()

# Movies that U5 has already rated
watched_by_u5 = ratings_df[ratings_df['user_id'] == 'U5']['movie'].values

# Find the unwatched ones
unwatched_by_u5 = [movie for movie in all_movies if movie not in watched_by_u5]

print(f"U5 has not watched: {unwatched_by_u5}")

Output:

U5 has not watched: ['Avengers', 'Titanic', 'Shrek']

Now let’s predict how U5 would rate each of these movies.

2. Predict Ratings for Unwatched Movies

# Predict ratings for each unwatched movie
for movie in unwatched_by_u5:
    prediction = model.predict('U5', movie)
    estimated_rating = round(prediction.est, 2)
    print(f"Predicted rating for U5 → {movie}: {estimated_rating}")

Sample output (illustrative; the exact values you get from this tiny dataset will differ):

Predicted rating for U5 → Avengers: 3.68
Predicted rating for U5 → Titanic: 4.12
Predicted rating for U5 → Shrek: 3.91

What just happened?

  • The model compared U5 with other users who have rated the same movies.
  • It looked for patterns: for example, if users similar to U5 liked Titanic, the model assumes U5 might like it too.
  • It gave you an estimated rating that reflects U5’s likely preference.
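The steps above can be sketched by hand as a similarity-weighted average. This is a simplified illustration of the idea behind user-based KNN, not Surprise's actual implementation: the helper names are hypothetical, and the fallback to the global mean rating when no similar user has rated the movie is an assumption of this sketch.

```python
import math

user_ratings = {
    "U1": {"Inception": 5, "Avengers": 4},
    "U2": {"Inception": 4, "Titanic": 5},
    "U3": {"Avengers": 3, "Shrek": 5},
    "U4": {"Titanic": 4, "Shrek": 4},
    "U5": {"Inception": 3},
}

def similarity(a, b):
    """Cosine similarity over co-rated movies (0.0 if there are none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    all_ratings = [r for rs in user_ratings.values() for r in rs.values()]
    global_mean = sum(all_ratings) / len(all_ratings)

    weighted_sum = sim_sum = 0.0
    for other, rs in user_ratings.items():
        if other == user or movie not in rs:
            continue
        sim = similarity(user_ratings[user], rs)
        if sim > 0:  # only users with some overlap contribute
            weighted_sum += sim * rs[movie]
            sim_sum += sim
    # No usable neighbour: fall back to the global mean rating
    return weighted_sum / sim_sum if sim_sum else global_mean

for movie in ["Avengers", "Titanic", "Shrek"]:
    print(movie, round(predict("U5", movie), 2))
```

Because U5 has rated only one movie, each prediction here rests on at most one similar user, so treat the numbers as a demonstration of the mechanics rather than a meaningful estimate.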

3. Recommend the Top-Rated Movies

Now, you probably don’t want to recommend everything, just the best picks. Let’s sort and show the top recommendations:

# Store predictions in a list
predictions = []

for movie in unwatched_by_u5:
    prediction = model.predict('U5', movie)
    predictions.append((movie, round(prediction.est, 2)))

# Sort by predicted rating in descending order
recommendations = sorted(predictions, key=lambda x: x[1], reverse=True)

print("Recommended movies for U5:")
for movie, rating in recommendations:
    print(f"{movie} (Predicted Rating: {rating})")

Output (using the illustrative values above):

Recommended movies for U5:
Titanic (Predicted Rating: 4.12)
Shrek (Predicted Rating: 3.91)
Avengers (Predicted Rating: 3.68)

You now know how to:

  • Detect what a user hasn’t watched
  • Predict how much they’d like those movies
  • Sort predictions to recommend the best ones

Just like that, you've created a functioning recommendation system, one that mimics how platforms like Netflix suggest shows and movies.

Step 5: Build a Content-Based Recommender

Let’s say U5 loved Titanic, a romance-drama. With content-based filtering, you’ll recommend movies that are similar in content, not just in who liked them.

In this step, we’ll use TF-IDF (Term Frequency–Inverse Document Frequency) to convert text data (genres) into vectors and use cosine similarity to find how close different movies are.

1. Create Metadata for Movies

Start by building a mini dataset of movies and their genres. This data could be fetched from IMDb or TMDb in a real-world project.

movie_data = {
    'movie': ['Inception', 'Avengers', 'Titanic', 'Shrek'],
    'genre': ['Sci-Fi Action', 'Action Fantasy', 'Romance Drama', 'Animation Comedy']
}

movies_df = pd.DataFrame(movie_data)
movies_df

Output:

movie      genre
Inception  Sci-Fi Action
Avengers   Action Fantasy
Titanic    Romance Drama
Shrek      Animation Comedy

Each movie now has a genre description we’ll convert into numerical values for similarity calculation.

2. Convert Genres to Feature Vectors

Use TfidfVectorizer to extract keyword importance from each genre string.

from sklearn.feature_extraction.text import TfidfVectorizer

# Convert genre text to TF-IDF vectors
tfidf = TfidfVectorizer(stop_words='english')
genre_matrix = tfidf.fit_transform(movies_df['genre'])

What’s happening? Each genre string is split into terms and transformed into a vector of term weights. Terms that appear in one movie but rarely in others get a higher weight, which helps distinguish distinctive genres from common ones.
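If you want to see where those weights come from, here is a plain-Python sketch of the calculation. It assumes scikit-learn's default settings (smoothed IDF, unit-length rows, tokens of two or more word characters); the helper names are hypothetical and this is not the library's own code.

```python
import math
import re

genres = {
    "Inception": "Sci-Fi Action",
    "Avengers": "Action Fantasy",
    "Titanic": "Romance Drama",
    "Shrek": "Animation Comedy",
}

def tokenize(text):
    # Lowercase and keep runs of 2+ word characters, so "Sci-Fi" -> "sci", "fi"
    return re.findall(r"\b\w\w+\b", text.lower())

docs = {title: tokenize(text) for title, text in genres.items()}
n_docs = len(docs)

# Document frequency: in how many movies each term appears
doc_freq = {}
for tokens in docs.values():
    for term in set(tokens):
        doc_freq[term] = doc_freq.get(term, 0) + 1

def tfidf_vector(tokens):
    """TF-IDF weights for one movie, scaled to unit length."""
    weights = {}
    for term in tokens:
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # smoothed IDF
        weights[term] = weights.get(term, 0.0) + idf  # term count * IDF
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()}

vectors = {title: tfidf_vector(tokens) for title, tokens in docs.items()}
for term, weight in sorted(vectors["Inception"].items()):
    print(f"{term:<8}{weight:.3f}")
```

In the Inception vector, "action" ends up with a lower weight than "sci" or "fi" because it also appears in the Avengers genre string.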

3. Compute Cosine Similarity Between Movies

Now we’ll compare each movie to every other movie based on genre vectors using cosine similarity.

from sklearn.metrics.pairwise import linear_kernel

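# TfidfVectorizer L2-normalizes each row by default, so the plain dot product
# computed by linear_kernel is the pairwise cosine similarity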
cosine_sim = linear_kernel(genre_matrix, genre_matrix)

This will give you a 4x4 similarity matrix comparing each movie with every other movie, including itself (score of 1.0).
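You may wonder why a function called linear_kernel, which computes plain dot products, yields cosine similarity. TfidfVectorizer scales every row to unit length by default, and the dot product of two unit-length vectors is their cosine. The sketch below checks this with two made-up vectors; the values and helper names are illustrative assumptions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Textbook cosine similarity: dot product divided by both lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Two made-up feature vectors (illustrative values only)
a = [1.0, 2.0, 0.0]
b = [2.0, 1.0, 1.0]

plain_cosine = cosine(a, b)
dot_of_unit_vectors = dot(l2_normalize(a), l2_normalize(b))
print(round(plain_cosine, 6), round(dot_of_unit_vectors, 6))
```

Both numbers match. Skipping the division saves a little work on large matrices, but it only holds when the vectors are already normalized.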

4. Create a Movie Recommendation Function

Now we’ll write a function to recommend the top N similar movies based on genre.

# Create a reverse lookup of movie names to their index
indices = pd.Series(movies_df.index, index=movies_df['movie'])

def content_recommendations(title, num_recs=2):
    idx = indices[title]
    
    # Get similarity scores for the selected movie
    sim_scores = list(enumerate(cosine_sim[idx]))
    
    # Sort scores (excluding the movie itself)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[1:num_recs+1]
    
    # Get indices of most similar movies
    movie_indices = [i[0] for i in sim_scores]
    
    return movies_df['movie'].iloc[movie_indices]

Try It Out: Recommend Similar Movies to Titanic

content_recommendations('Titanic')

Output:

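0    Inception
1     Avengers
Name: movie, dtype: object

Treat this result with caution. Titanic's genre string ("Romance Drama") shares no terms with any other movie in this tiny dataset, so its similarity to every other title is 0. The function simply returns the first two movies in index order, Inception and Avengers, and neither is a genuine match.

For a meaningful result, try content_recommendations('Inception'): Avengers ranks first because both genre strings contain "Action". TF-IDF over genre labels only measures word overlap; it cannot detect shared themes or tone. Richer metadata such as plot keywords, cast, and tags gives the similarity scores something to work with.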
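The function above drops the query movie by position, which assumes it always sorts first, and it will happily return movies whose similarity is 0 when nothing better exists. One possible refinement is sketched below in plain Python. The name top_similar and the similarity values are hypothetical, chosen only to illustrate the two cases.

```python
def top_similar(titles, sim_row, query_idx, num_recs=2):
    """Rank other titles by similarity, skipping the query and zero scores."""
    candidates = [
        (score, title)
        for i, (title, score) in enumerate(zip(titles, sim_row))
        if i != query_idx and score > 0
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in candidates[:num_recs]]

titles = ["Inception", "Avengers", "Titanic", "Shrek"]

# Hypothetical similarity rows: Inception and Avengers share one genre term,
# while Titanic shares none with any other movie
inception_row = [1.0, 0.3, 0.0, 0.0]
titanic_row = [0.0, 0.0, 1.0, 0.0]

print(top_similar(titles, inception_row, query_idx=0))
print(top_similar(titles, titanic_row, query_idx=2))
```

Returning an empty list when nothing is similar lets the calling code fall back to another strategy, such as popular titles, instead of showing arbitrary picks.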

You just built a fully functional content-based movie recommender using:

  • Movie metadata (genre)
  • TF-IDF to convert text to vectors
  • Cosine similarity to find closeness
  • A custom function to recommend movies

Unlike collaborative filtering, this method doesn't depend on user ratings. It can work even if a movie is brand new.

Blending Both Methods: In real-world scenarios, companies blend collaborative and content-based filtering to build hybrid systems. You can average the scores from both systems or use more advanced ensemble methods.
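As a rough illustration of the averaging idea, the sketch below rescales a predicted rating to the 0-1 range and blends it with a content similarity score. The function name hybrid_score, the alpha weight, and the candidate values are all assumptions made for this example, not a production recipe.

```python
def hybrid_score(predicted_rating, content_similarity, alpha=0.5,
                 rating_scale=(1, 5)):
    """Blend a collaborative rating and a content similarity into one 0-1 score."""
    low, high = rating_scale
    cf_score = (predicted_rating - low) / (high - low)  # rescale rating to 0-1
    return alpha * cf_score + (1 - alpha) * content_similarity

# Hypothetical inputs for three candidates: (predicted rating, content similarity)
candidates = {
    "Movie A": (4.5, 0.10),
    "Movie B": (3.5, 0.80),
    "Movie C": (4.0, 0.40),
}

ranked = sorted(candidates, key=lambda m: hybrid_score(*candidates[m]), reverse=True)
print(ranked)
```

Raising alpha toward 1 trusts user behavior more, while lowering it leans on movie metadata, which is handy for new users or new titles with few ratings.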

Next Steps (Optional):

  • Use MovieLens dataset for scale
  • Integrate with Flask or Streamlit for a working app
  • Try item-based filtering or matrix factorization with SVD
  • Expand metadata: actors, tags, plot summaries

You will learn more about ML techniques with upGrad’s free Unsupervised Learning: Clustering course. Explore K-Means, Hierarchical Clustering, and practical applications to uncover hidden patterns in unlabelled data.

Next, let’s look at some of the challenges of building a movie recommendation system with machine learning and how you can overcome them.

Challenges of Building a Movie Recommendation System and How to Overcome Them?

While building a movie recommendation system sounds exciting, there are several practical challenges that can trip you up along the way. For example, how do you recommend a movie to a brand-new user who hasn't rated anything? Or how do you ensure your model doesn't suggest only blockbuster hits and ignore hidden gems?

Building an effective recommendation engine means going beyond just algorithms. You need to solve for data sparsity, ensure fairness, and adapt to changing user preferences in real time. All while delivering a personalized experience.

To make this easier, here’s a breakdown of some common challenges and how you can solve them:

Challenge | Solution
Cold Start Problem: It’s hard to recommend movies to new users or include new movies without prior data. | Use content-based filtering initially, and encourage onboarding feedback (e.g., favorite genres) to collect early signals.
Data Sparsity: Most users only rate a few movies, making user-user comparisons weak. | Apply matrix factorization (like SVD) or hybrid models to infer hidden relationships between users and items.
Popularity Bias: Models tend to over-recommend popular movies and ignore niche titles. | Introduce diversity constraints or re-rank recommendations to include lesser-known but relevant options.
Scalability: As data grows, models can slow down or become memory-intensive. | Use approximate nearest neighbors (e.g., FAISS) or distributed computing to speed up similarity searches.
Dynamic User Preferences: A user’s taste may evolve over time, making static models less effective. | Incorporate time-aware or session-based models that adapt based on recent user behavior.
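To give one of these solutions some shape, here is a small sketch of re-ranking to counter popularity bias. It subtracts a log-scaled penalty based on how many ratings a movie already has. The function name, the penalty value, and the sample numbers are hypothetical; a real system would tune them against user feedback.

```python
import math

def rerank_with_popularity_penalty(predictions, rating_counts, penalty=0.2):
    """Sort movies by predicted rating minus a log-scaled popularity penalty."""
    def adjusted(movie):
        return predictions[movie] - penalty * math.log1p(rating_counts.get(movie, 0))
    return sorted(predictions, key=adjusted, reverse=True)

# Hypothetical predicted ratings and number of ratings each title has received
predictions = {"Blockbuster": 4.3, "Hidden Gem": 4.2, "Average Film": 3.0}
rating_counts = {"Blockbuster": 5000, "Hidden Gem": 40, "Average Film": 300}

print(rerank_with_popularity_penalty(predictions, rating_counts))
```

With these numbers, the lesser-known title moves ahead of the blockbuster even though its raw predicted rating is slightly lower. Setting penalty to 0 restores the plain ranking by predicted rating.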

You can get a better understanding of more advanced ML models with upGrad’s free Fundamentals of Deep Learning and Neural Networks course. Get expert-led deep learning training, and hands-on insights, and earn a free certification.

Next, let’s look at how upGrad can help you learn how to build a movie recommendation system with machine learning.

How Can upGrad Help You Build Machine Learning Models?

Movie recommendation systems are at the core of user engagement for platforms like Netflix, Prime Video, and YouTube. These ML-based systems help millions discover content they actually care about. Learning ML applications can set you apart in today’s AI-powered job market.

With upGrad, you can learn how to build smart, scalable ML systems from scratch. With expert mentorship, real-world projects, and personalized guidance, you'll gain the confidence to turn machine learning into a tool for impact.

If you're unsure where to begin or which area to focus on, upGrad’s expert career counselors can guide you based on your goals. You can also visit a nearby upGrad offline center to explore course options, get hands-on experience, and speak directly with mentors!

References:
https://labelyourdata.com/articles/movie-recommendation-with-machine-learning
https://qz.com/1178125/youtubes-recommendations-drive-70-of-what-we-watch
