Song Recommendation System Using Machine Learning

Updated on Aug 05, 2025 | 7 min read | 1.35K+ views

Ever noticed how, after you like or listen to a song on Spotify, JioSaavn, YouTube, or any other platform, your feed starts showing more similar content? That’s Machine Learning at work: recommender systems personalize the user experience and keep you engaged. In this project, we will develop a song recommendation system that follows the same principle.

Using metadata like genre, artist, and track name, the system will suggest songs that are similar to the one you select. The dataset used is tcc_ceds_music.csv, which contains lyrics, metadata, and audio features for songs released from 1950 onward.

What Should You Know Beforehand?

To follow along smoothly, make sure you have a basic understanding of:

Python programming
Data preprocessing
Pandas, NumPy, and basic visualization
TF-IDF and cosine similarity
Recommender system concepts

Technologies and Libraries Used

We will work with the following Python libraries:

Pandas – for handling tabular data
NumPy – for numerical operations
Matplotlib & Seaborn – for data visualization
Scikit-learn – to vectorize text and calculate similarity

Time Taken and Difficulty

Difficulty: Beginner–Intermediate
Time Estimate: 1.5–2 hours

How to Build a Song Recommendation System

Let’s build the project from scratch, one step at a time.

Step 1: Upload the Dataset & Import Libraries

To begin, we’ll upload our dataset and import the essential Python libraries. The dataset file (tcc_ceds_music.csv) needs to be manually uploaded into Colab for access. Here is the code to do so:

from google.colab import files

# This will prompt you to upload the file manually
uploaded = files.upload()

Output:

tcc_ceds_music.csv(text/csv) - 27655251 bytes, last modified: 7/30/2025 - 100% done

Saving tcc_ceds_music.csv to tcc_ceds_music.csv

Now that the CSV file has been uploaded, let’s import the libraries and load the file using Pandas:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
data = pd.read_csv('tcc_ceds_music.csv')
data.head()

Output:

	Unnamed: 0	artist_name	track_name	release_date	genre	lyrics	len	dating	violence	world/life	...	sadness	feelings	danceability	loudness	acousticness	instrumentalness	valence	energy	topic	age
0	0	mukesh	mohabbat bhi jhoothi	1950	pop	hold time feel break feel untrue convince spea...	95	0.000598	0.063746	0.000598	...	0.380299	0.117175	0.357739	0.454119	0.997992	0.901822	0.339448	0.137110	sadness	1.0
1	4	frankie laine	i believe	1950	pop	believe drop rain fall grow believe darkest ni...	51	0.035537	0.096777	0.443435	...	0.001284	0.001284	0.331745	0.647540	0.954819	0.000002	0.325021	0.263240	world/life	1.0
2	6	johnnie ray	cry	1950	pop	sweetheart send letter goodbye secret feel bet...	24	0.002770	0.002770	0.002770	...	0.002770	0.225422	0.456298	0.585288	0.840361	0.000000	0.351814	0.139112	music	1.0
3	10	pérez prado	patricia	1950	pop	kiss lips want stroll charm mambo chacha merin...	54	0.048249	0.001548	0.001548	...	0.225889	0.001548	0.686992	0.744404	0.083935	0.199393	0.775350	0.743736	romantic	1.0
4	12	giorgos papadopoulos	apopse eida oneiro	1950	pop	till darling till matter know till dream live ...	48	0.001350	0.001350	0.417772	...	0.068800	0.001350	0.291671	0.646489	0.975904	0.000246	0.597073	0.394375	romantic	1.0

5 rows × 31 columns

The output shows us that the dataset has successfully loaded. The dataset contains detailed columns, such as artist_name, track_name, lyrics, genre, valence, acousticness, etc.
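Before building on these columns, it can help to check them for missing values and repeated rows. The snippet below is a minimal sketch of such a check. The `sample` frame and the `summarize_metadata` helper are hypothetical stand-ins for illustration; in the notebook you would pass the `data` frame loaded above.

```python
import pandas as pd

# Hypothetical stand-in rows; in the notebook, use the `data` frame instead
sample = pd.DataFrame({
    "artist_name": ["frankie laine", "johnnie ray", None, "frankie laine"],
    "track_name": ["i believe", "cry", "patricia", "i believe"],
    "genre": ["pop", "pop", "pop", "pop"],
})

def summarize_metadata(df, columns=("genre", "artist_name", "track_name")):
    """Return missing-value counts per column and the number of repeated rows."""
    cols = list(columns)
    missing = df[cols].isna().sum().to_dict()
    duplicates = int(df.duplicated(subset=cols).sum())
    return missing, duplicates

missing, duplicates = summarize_metadata(sample)
print(missing)
print(duplicates)
```

Missing values matter here because the text features are later joined into one string, and repeated rows would show up as near-identical recommendations.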

Step 2: Exploratory Data Analysis (EDA)

In this step, we will explore the dataset visually. Doing so will help us understand its structure and uncover patterns. It will also help us pinpoint which features may contribute the most to our song recommendation system.

Most Common Music Genres

First, we will look at the top 10 genres to see the musical diversity in the dataset. Use the code mentioned below to do so:

plt.figure(figsize=(10,6))
sns.countplot(
    y='genre', 
    data=data, 
    order=data['genre'].value_counts().index[:10],
    palette='pastel'
)
plt.title('Top 10 Genres in the Dataset')
plt.xlabel('Number of Songs')
plt.ylabel('Genre')
plt.show()

Output:

The plot shows us which genres dominate the dataset. These genres might influence recommendations depending on song distribution.
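If you prefer numbers to a chart, the same information can be read from a small summary table. This is only a sketch: the `genres` series and the `genre_share` helper are hypothetical, and in the notebook you would pass `data['genre']` instead.

```python
import pandas as pd

# Hypothetical genre labels; in the notebook, pass data['genre'] instead
genres = pd.Series(["pop", "pop", "country", "blues", "pop", "country"])

def genre_share(series, top_n=10):
    """Return the top_n genres with their song counts and percentage share."""
    counts = series.value_counts().head(top_n)
    share = (counts / len(series) * 100).round(1)
    return pd.DataFrame({"songs": counts, "share_pct": share})

summary = genre_share(genres)
print(summary)
```

A genre with a very large share will tend to supply most of the candidates for any song in that genre, which is worth keeping in mind when reading the recommendations later.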

Top Artists by Number of Songs

Now, we will identify the most frequently appearing artists. Because the artist name is one of the features we use for similarity, heavily represented artists could introduce bias into our recommendations.

Use the following code to do so:

top_artists = data['artist_name'].value_counts().head(10)

plt.figure(figsize=(10,6))
sns.barplot(
    x=top_artists.values, 
    y=top_artists.index, 
    palette='viridis'
)
plt.title('Top 10 Artists by Song Count')
plt.xlabel('Number of Songs')
plt.ylabel('Artist Name')
plt.show()

Output:

The output tells us which artists have the most representation. This information can be used to analyze how the number of songs per artist affects the similarity scores.

Step 3: Preprocess the Data for Recommendation

For a song recommendation system, we prepare the data by combining the relevant text features and transforming them into numerical vectors. This allows us to measure how similar one song is to another based on its metadata.

The following sub-steps are executed in one block:

Combine genre, artist_name, and track_name into one feature.
Vectorize the combined text via TF-IDF.
Calculate pairwise similarity scores via cosine similarity.

The following code performs all three steps:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 1. Combine important text fields into a single string
data['combined_features'] = (
    data['genre'].fillna('') + ' ' +
    data['artist_name'].fillna('') + ' ' +
    data['track_name'].fillna('')
)

# 2. Vectorize the combined text using TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(data['combined_features'])

# 3. Compute similarity scores between songs
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

TF-IDF weights each word by how distinctive it is across the entire dataset, so a rare artist name counts for more than a common genre label. Cosine similarity then compares the metadata vectors of any two songs: a score near 1 means the songs share most of their weighted words, and a score of 0 means they share none.
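To see what these two steps produce, here is a small sketch on three made-up feature strings, built the same way as `combined_features`. The strings and the `toy_` names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical "genre artist track" strings for three songs
toy_features = [
    "pop frankie laine i believe",
    "pop frankie laine jezebel",
    "blues muddy waters mannish boy",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_features)  # sparse, one row per song
toy_sim = cosine_similarity(toy_matrix)             # dense 3 x 3 matrix

# Each row holds one song's similarity to every song, itself included
print(toy_sim.round(2))
```

The first two songs share a genre and an artist, so they score well above zero against each other, while the third shares no words with them and scores 0. Note that the vectorizer's default tokenizer keeps only tokens of two or more characters and `stop_words='english'` drops common words, so a title like "i believe" contributes only "believe".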

Step 4: Build the Song Recommendation System

Now that we have a matrix showing how similar each song is to every other song, let’s write a function that uses this information to suggest similar tracks. This is the core of our recommendation system.

We’ll define a function recommend_songs() that:

Takes a song name as input.
Finds its index in the dataset.
Retrieves similarity scores for that song.
Sorts and returns the most similar songs (excluding the input itself).

The code for this function is given below:

# Function to recommend similar songs using cosine similarity
def recommend_songs(song_name, data, similarity_matrix):
    # Compare lowercase titles so the lookup is case-insensitive
    # (a local Series is used so the DataFrame itself is not modified)
    track_names_lower = data['track_name'].str.lower()
    matches = np.flatnonzero((track_names_lower == song_name.lower()).to_numpy())

    # Check if the song exists in the dataset
    if len(matches) == 0:
        return "Sorry, this song is not in our database."

    # Use the row position of the first matching title
    idx = matches[0]

    # Fetch similarity scores for the song, dropping the input song itself
    sim_scores = [
        (i, score) for i, score in enumerate(similarity_matrix[idx]) if i != idx
    ]

    # Sort songs by similarity score (highest first) and keep the top 10
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:10]

    # Extract the row positions of the recommended songs
    song_indices = [i[0] for i in sim_scores]

    # Return the names of the recommended songs
    return data['track_name'].iloc[song_indices].values

# Replace "i believe" with any song title from the dataset to get recommendations
recommend_songs("i believe", data, cosine_sim)

Output:

array(["that's my desire", "after you've gone", 'laura',
       "that ain't right", 'jezebel', "your cheatin' heart", 'wanted man',
       'granada', "you've changed", 'high noon'], dtype=object)
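Two practical points are worth noting about recommend_songs(). It looks up a song by title only, so when several artists share a title it uses the first match. It also relies on a full song-by-song similarity matrix, whose memory use grows with the square of the number of songs. The sketch below shows one possible variant that scores a single song on demand and accepts an optional artist name. The function name `recommend_on_demand`, its parameters, and the `demo` rows are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def recommend_on_demand(song_name, data, tfidf_matrix, artist_name=None, top_n=10):
    """Hypothetical variant: score one song against all others, no full matrix."""
    mask = data["track_name"].str.lower() == song_name.lower()
    if artist_name is not None:
        mask &= data["artist_name"].str.lower() == artist_name.lower()
    positions = np.flatnonzero(mask.to_numpy())
    if len(positions) == 0:
        return []
    pos = positions[0]
    # TfidfVectorizer L2-normalizes rows by default, so the dot product
    # computed by linear_kernel equals cosine similarity here
    scores = linear_kernel(tfidf_matrix[pos:pos + 1], tfidf_matrix).ravel()
    scores[pos] = -1.0  # exclude the query song itself
    top = np.argsort(-scores, kind="stable")[:top_n]
    return data.iloc[top][["track_name", "artist_name"]].values.tolist()

# Hypothetical rows for illustration; in the notebook, use `data` and `tfidf_matrix`
demo = pd.DataFrame({
    "genre": ["pop", "pop", "pop", "blues"],
    "artist_name": ["frankie laine", "frankie laine", "elvis presley", "muddy waters"],
    "track_name": ["i believe", "jezebel", "i believe", "mannish boy"],
})
demo_features = demo["genre"] + " " + demo["artist_name"] + " " + demo["track_name"]
demo_matrix = TfidfVectorizer(stop_words="english").fit_transform(demo_features)

result = recommend_on_demand("I Believe", demo, demo_matrix,
                             artist_name="elvis presley", top_n=2)
print(result)
```

Returning the artist alongside the title also makes it easier to see when the recommendations are driven mostly by a shared artist name.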

Conclusion

In this project, we built a content-based song recommender system using the tcc_ceds_music.csv dataset. We combined each song’s genre, artist name, and track name into a single text feature, converted it into TF-IDF vectors, and measured song-to-song similarity using cosine similarity. Given a track like “I Believe”, the system returns the ten songs with the closest metadata.

This project highlights a practical application of machine learning in building music recommendation engines for platforms like Spotify, JioSaavn, and YouTube. It is based entirely on content, so it needs no user listening history to produce suggestions.

Colab Link:
https://colab.research.google.com/drive/1WmHNjM6Bs2Zn9p3wWPzN3un5WNMuC0qi

Frequently Asked Questions (FAQs)

1. What is the goal of this song recommender system project?

The main goal is to recommend songs that are similar to a selected track, based on metadata such as genre, artist name, and track name. It follows the same principle that platforms like Spotify and YouTube use to suggest related content.

2. How does the system determine which songs are similar?

It converts each song’s combined metadata into a TF-IDF vector and uses cosine similarity to measure how close two songs are to one another in that feature space.

3. Can this model be adapted for Spotify or YouTube recommendations?

Yes. The same logic can be applied to any platform with access to song metadata or audio features.

4. Is this a content-based or collaborative filtering system?

It is a content-based system, meaning it makes recommendations using only the features of the songs, not user preferences.

5. What are the limitations of this song recommendation system?

It doesn’t account for user taste or popularity trends. It only suggests songs with similar metadata (genre, artist, and title words) and does not use lyrics, audio features, or real user listening data.

Rohit Sharma

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
