Song Recommendation System Using Machine Learning

By Rohit Sharma

Updated on Aug 05, 2025 | 7 min read | 1.16K+ views


Ever noticed how, after you like or listen to a song on Spotify, JioSaavn, YouTube, or any other platform, your feed starts showing more similar content? That’s a recommender system at work: a machine learning application that personalizes your experience to keep you engaged. In this project, we will build a song recommendation system that follows the same principle.

Using metadata such as genre, artist name, and track name, the system suggests songs that are similar to the one you select. We use the TCC CEDs Music Dataset, which contains lyrics, metadata, and audio features for songs released over several decades.

What Should You Know Beforehand?

To follow along smoothly, make sure you have a basic understanding of:

Technologies and Libraries Used

We will work with the following Python libraries:

  • Pandas – for handling tabular data
  • NumPy – for numerical operations
  • Matplotlib & Seaborn – for data visualization
  • Scikit-learn – to vectorize text and calculate similarity

Time Taken and Difficulty

  • Difficulty: Beginner–Intermediate
  • Time Estimate: 1.5–2 hours

How to Build a Song Recommendation System

Let’s build the project from scratch.

Step 1: Upload the Dataset & Import Libraries

To begin, we’ll upload the dataset and import the essential Python libraries. In Google Colab, the dataset file (tcc_ceds_music.csv) must be uploaded manually before we can read it:

from google.colab import files

# This will prompt you to upload the file manually
uploaded = files.upload()

Output:

tcc_ceds_music.csv(text/csv) - 27655251 bytes, last modified: 7/30/2025 - 100% done

Saving tcc_ceds_music.csv to tcc_ceds_music.csv
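The `files.upload()` call only works inside Google Colab. If you run the notebook elsewhere, a minimal sketch like the one below can detect the environment and read the CSV straight from disk. The helper names `running_in_colab` and `load_songs` are hypothetical, not part of the original project. The sketch also assumes that, in Colab, you upload a file whose name matches `path`.

```python
import os

import pandas as pd


def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only installed inside Colab)
    except ImportError:
        return False
    return True


def load_songs(path: str = "tcc_ceds_music.csv") -> pd.DataFrame:
    """Read the songs CSV, asking for an upload only in Colab when it is missing."""
    if running_in_colab() and not os.path.exists(path):
        from google.colab import files

        files.upload()  # saves the selected file into the current directory
    return pd.read_csv(path)


# Usage: data = load_songs()  # or load_songs("/path/to/tcc_ceds_music.csv")
```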

Now that the CSV file has been uploaded, let’s import the libraries and load it with Pandas:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the dataset
data = pd.read_csv('tcc_ceds_music.csv')
data.head()

Output:

   Unnamed: 0           artist_name            track_name  release_date  genre
0           0                mukesh  mohabbat bhi jhoothi          1950    pop
1           4         frankie laine             i believe          1950    pop
2           6           johnnie ray                   cry          1950    pop
3          10           pérez prado              patricia          1950    pop
4          12  giorgos papadopoulos    apopse eida oneiro          1950    pop

                                              lyrics  len
0  hold time feel break feel untrue convince spea...   95
1  believe drop rain fall grow believe darkest ni...   51
2  sweetheart send letter goodbye secret feel bet...   24
3  kiss lips want stroll charm mambo chacha merin...   54
4  till darling till matter know till dream live ...   48

     dating  violence  world/life  ...   sadness  feelings
0  0.000598  0.063746    0.000598  ...  0.380299  0.117175
1  0.035537  0.096777    0.443435  ...  0.001284  0.001284
2  0.002770  0.002770    0.002770  ...  0.002770  0.225422
3  0.048249  0.001548    0.001548  ...  0.225889  0.001548
4  0.001350  0.001350    0.417772  ...  0.068800  0.001350

   danceability  loudness  acousticness  instrumentalness   valence    energy       topic  age
0      0.357739  0.454119      0.997992          0.901822  0.339448  0.137110     sadness  1.0
1      0.331745  0.647540      0.954819          0.000002  0.325021  0.263240  world/life  1.0
2      0.456298  0.585288      0.840361          0.000000  0.351814  0.139112       music  1.0
3      0.686992  0.744404      0.083935          0.199393  0.775350  0.743736    romantic  1.0
4      0.291671  0.646489      0.975904          0.000246  0.597073  0.394375    romantic  1.0

5 rows × 31 columns

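The output confirms that the dataset loaded successfully. It has 31 columns, including song metadata (artist_name, track_name, release_date, genre), preprocessed lyrics, scores for lyrical themes (such as dating, violence, and sadness), and audio features (such as danceability, loudness, acousticness, valence, and energy).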
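Before building features, it is worth checking the three text columns the recommender relies on. The sketch below uses a hypothetical helper, `check_text_columns`, that reports missing and distinct values for each column. The toy frame reuses rows from the output above, with one title blanked out to illustrate a missing value. Step 3 fills missing text with empty strings, so gaps will not crash the pipeline, but songs with empty fields have less to be compared on. Having fewer distinct titles than rows also signals repeated titles. That matters because the recommender looks songs up by title.

```python
import pandas as pd

# The three text fields that Step 3 combines into one feature
TEXT_COLUMNS = ["genre", "artist_name", "track_name"]


def check_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize missing and distinct values for the recommender's text columns."""
    absent = [c for c in TEXT_COLUMNS if c not in df.columns]
    if absent:
        raise KeyError(f"Missing expected columns: {absent}")
    return pd.DataFrame({
        "missing_values": df[TEXT_COLUMNS].isna().sum(),
        "unique_values": df[TEXT_COLUMNS].nunique(),  # NaN is not counted
    })


# Toy frame built from the head() rows; one title is blanked out on purpose
toy = pd.DataFrame({
    "artist_name": ["mukesh", "frankie laine", "johnnie ray", "pérez prado"],
    "track_name": ["mohabbat bhi jhoothi", "i believe", "cry", None],
    "genre": ["pop", "pop", "pop", "pop"],
})
summary = check_text_columns(toy)
print(summary)
```

On the real data, you would call `check_text_columns(data)`.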

Step 2: Exploratory Data Analysis (EDA)

In this step, we explore the dataset visually. This helps us understand its structure, uncover patterns, and pinpoint which features may matter most for our song recommendation system.

Most Common Music Genres

First, let’s look at the 10 most common genres to gauge the dataset’s musical diversity:

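plt.figure(figsize=(10,6))
sns.countplot(
    y='genre',
    data=data,
    order=data['genre'].value_counts().index[:10],
    hue='genre',       # seaborn >= 0.13 expects hue when a palette is given
    palette='pastel',
    legend=False       # the y-axis labels already name each genre
)
plt.title('Top 10 Genres in the Dataset')
plt.xlabel('Number of Songs')
plt.ylabel('Genre')
plt.show()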

Output:

The plot shows which genres dominate the dataset. Since genre is one of the text fields we use for similarity in Step 3, a heavily skewed distribution can influence which songs get recommended.

Top Artists by Number of Songs

Next, we’ll identify the artists with the most songs in the dataset. This shows which artists could skew our recommendations simply because they appear so often.

Use the following code:

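top_artists = data['artist_name'].value_counts().head(10)

plt.figure(figsize=(10,6))
sns.barplot(
    x=top_artists.values,
    y=top_artists.index,
    hue=top_artists.index,   # seaborn >= 0.13 expects hue when a palette is given
    palette='viridis',
    legend=False             # the y-axis labels already name each artist
)
plt.title('Top 10 Artists by Song Count')
plt.xlabel('Number of Songs')
plt.ylabel('Artist Name')
plt.show()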

Output:

The output shows which artists are most heavily represented. Since artist_name is also one of the text fields we use for similarity, songs by prolific artists share artist words with many other tracks, which can tilt recommendations toward those artists.
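To put a number on that concentration, a hypothetical helper, `top_artist_share`, can report what fraction of all songs belong to the most frequent artists. The heuristic behind it is an assumption, not a measured result: a high share suggests that many songs carry the same artist words and are therefore likely to surface together. The toy data below is illustrative only. On the real dataset, you would call `top_artist_share(data)`.

```python
import pandas as pd


def top_artist_share(df: pd.DataFrame, top_n: int = 10) -> float:
    """Fraction of songs (with a known artist) that belong to the top_n artists."""
    counts = df["artist_name"].value_counts()  # NaN artists are excluded
    return counts.head(top_n).sum() / counts.sum()


# Toy data (illustrative only): one prolific artist and three one-off artists
toy = pd.DataFrame({
    "artist_name": ["frankie laine"] * 4 + ["mukesh", "johnnie ray", "pérez prado"],
})
share = top_artist_share(toy, top_n=1)
print(f"Top artist accounts for {share:.0%} of songs")
```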

Step 3: Preprocess the Data for Recommendation

To compare songs, we combine their relevant text fields into one string and transform that string into a numerical vector. Once every song is represented this way, we can measure how similar any two songs are based on their metadata.
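The sketch below applies the same combination rule to two rows from the dataset preview and prints the words that survive tokenization. The variable names `songs` and `demo_tfidf` are ours, chosen so that they do not clash with the main code. Note that `stop_words="english"` removes words from titles as well. scikit-learn’s documentation warns that its built-in "english" list has known issues, so the last loop prints which title words are dropped. If a whole title disappears (for example, if "cry" shows up in that list), the title contributes nothing to the song’s vector.

```python
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# Two rows copied from the head() output above
songs = pd.DataFrame({
    "genre": ["pop", "pop"],
    "artist_name": ["frankie laine", "johnnie ray"],
    "track_name": ["i believe", "cry"],
})

# Same combination rule as the main code: genre + artist + title in one string
songs["combined_features"] = (
    songs["genre"].fillna("") + " " +
    songs["artist_name"].fillna("") + " " +
    songs["track_name"].fillna("")
)
print(songs["combined_features"].tolist())

# Fit a separate vectorizer just to inspect which words survive preprocessing
demo_tfidf = TfidfVectorizer(stop_words="english")
demo_tfidf.fit(songs["combined_features"])
print(sorted(demo_tfidf.vocabulary_))

# Title words removed by the built-in English stop-word list
for title in songs["track_name"]:
    print(title, "->", [w for w in title.split() if w in ENGLISH_STOP_WORDS])
```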

We’ll carry out the following sub-steps in a single code block:

  • Combine genre, artist_name, and track_name into one feature.
  • Vectorize the combined text via TF-IDF.
  • Calculate pairwise similarity scores via cosine similarity.

Here is the code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 1. Combine important text fields into a single string
data['combined_features'] = (
    data['genre'].fillna('') + ' ' +
    data['artist_name'].fillna('') + ' ' +
    data['track_name'].fillna('')
)

# 2. Vectorize the combined text using TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(data['combined_features'])

# 3. Compute similarity scores between songs
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

TF-IDF weights each word by how informative it is across the whole dataset: words shared by many songs (such as a common genre) count for less than rare ones. Cosine similarity then compares any pair of songs by the angle between their vectors, so the songs whose metadata overlaps most on informative words score highest.
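The toy sketch below applies the same two steps to three hand-written strings of the form genre + artist + title; these strings are illustrative, not taken from the full dataset. Cosine similarity is cos(a, b) = (a · b) / (‖a‖ ‖b‖). `TfidfVectorizer` L2-normalizes each row by default (`norm="l2"`), so the cosine similarity of two rows equals their plain dot product. The sketch checks this directly.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "combined_features" strings (genre + artist + title), illustrative only
docs = [
    "pop frankie laine i believe",
    "pop frankie laine jezebel",
    "pop johnnie ray cry",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)  # sparse matrix, one unit-length row per song

sim = cosine_similarity(X)  # same as cosine_similarity(X, X)

# With L2-normalized rows, cosine similarity reduces to a dot product
dot = (X @ X.T).toarray()
assert np.allclose(sim, dot)

print(np.round(sim, 3))
```

The first two strings share "pop frankie laine", so their score is well above that of the first and third strings, which share only the common word "pop".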

Step 4: Build the Song Recommendation System

Now that we have a matrix showing how similar each song is to every other song, let’s write a function that uses this information to suggest similar tracks. This is the core of our recommendation system.

We’ll define a function recommend_songs() that:

  • Takes a song name as input.
  • Finds its index in the dataset.
  • Retrieves similarity scores for that song.
  • Sorts and returns the most similar songs (excluding the input itself).

Here is the function, along with an example call:

# Function to recommend similar songs using cosine similarity
def recommend_songs(song_name, data, similarity_matrix, top_n=10):
    # Normalize the query and the track names (trimmed, lowercase) for matching,
    # without adding helper columns to the caller's DataFrame
    song_name = song_name.strip().lower()
    track_names = data['track_name'].str.strip().str.lower().to_numpy()

    # Find the row position(s) of the song
    matches = np.flatnonzero(track_names == song_name)
    if len(matches) == 0:
        return "Sorry, this song is not in our database."

    # If several tracks share this title, use the first one
    idx = matches[0]

    # Pair each row position with its similarity score to the chosen song
    sim_scores = list(enumerate(similarity_matrix[idx]))

    # Sort songs by similarity score (highest first)
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the top_n most similar songs, explicitly skipping the input song
    # (slicing [1:11] assumes it always ranks first, which ties can break)
    song_indices = [i for i, _ in sim_scores if i != idx][:top_n]

    # Return the names of the recommended songs
    return data['track_name'].iloc[song_indices].values

# Replace "i believe" with any song title from the dataset to get recommendations
recommend_songs("i believe", data, cosine_sim)

Output:

array(["that's my desire", "after you've gone", 'laura',
       "that ain't right", 'jezebel', "your cheatin' heart", 'wanted man',
       'granada', "you've changed", 'high noon'], dtype=object)
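The full cosine_sim matrix holds one score for every pair of songs. A dense float64 matrix therefore needs 8·n² bytes: about 800 MB for 10,000 songs and about 7.2 GB for 30,000. If that is too much for your runtime, one alternative is to compute similarities only for the query song, on demand. This is our own variant, not part of the original project. Because the TF-IDF rows are already L2-normalized, scikit-learn’s `linear_kernel` gives the same scores as `cosine_similarity`. The function name `recommend_songs_on_demand` and its optional `artist` argument (for disambiguating repeated titles) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def recommend_songs_on_demand(song_name, data, tfidf_matrix, top_n=10, artist=None):
    """Return up to top_n similar track names without an n x n similarity matrix.

    Assumes the rows of tfidf_matrix are L2-normalized TF-IDF vectors (the
    TfidfVectorizer default) in the same order as the rows of data.
    """
    titles = data["track_name"].str.strip().str.lower().to_numpy()
    mask = titles == song_name.strip().lower()
    if artist is not None:
        # Hypothetical extension: disambiguate repeated titles by artist
        artists = data["artist_name"].str.strip().str.lower().to_numpy()
        mask &= artists == artist.strip().lower()
    matches = np.flatnonzero(mask)
    if len(matches) == 0:
        return np.array([], dtype=object)  # no match (the original returns a message)
    idx = matches[0]

    # Similarities between the query row and every song: shape (n_songs,)
    scores = linear_kernel(tfidf_matrix[idx], tfidf_matrix).ravel()
    scores[idx] = -np.inf  # exclude the query song itself

    # Highest scores first; a stable sort keeps ties in dataset order
    order = np.argsort(-scores, kind="stable")
    top = order[: min(top_n, len(scores) - 1)]
    return data["track_name"].iloc[top].to_numpy()


# Toy usage with rows resembling the dataset (illustrative only)
toy = pd.DataFrame({
    "genre": ["pop", "pop", "pop", "pop"],
    "artist_name": ["frankie laine", "frankie laine", "johnnie ray", "mukesh"],
    "track_name": ["i believe", "jezebel", "cry", "mohabbat bhi jhoothi"],
})
toy_features = toy["genre"] + " " + toy["artist_name"] + " " + toy["track_name"]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_features)
print(recommend_songs_on_demand("I Believe", toy, toy_matrix, top_n=2))
```

With the real data, you would call `recommend_songs_on_demand("i believe", data, tfidf_matrix)`. This reuses the `tfidf_matrix` from Step 3 and skips the `cosine_similarity(tfidf_matrix, tfidf_matrix)` call entirely.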

Conclusion

In this project, we built a content-based song recommender using the tcc_ceds_music.csv dataset. We combined each song’s genre, artist name, and track name into a single text feature, converted it into TF-IDF vectors, and measured song-to-song similarity with cosine similarity. Given a track like “I Believe”, the system returns the ten songs whose metadata is most similar.

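This project illustrates the content-based approach behind music recommendations on platforms like Spotify, JioSaavn, and YouTube. Because it relies only on song metadata, it needs no user listening history. However, the full pairwise similarity matrix grows quadratically with the number of songs, so memory use rises quickly as the catalog grows.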

Colab Link:
https://colab.research.google.com/drive/1WmHNjM6Bs2Zn9p3wWPzN3un5WNMuC0qi

Frequently Asked Questions (FAQs)

1. What is the goal of this song recommender system project?

2. How does the system determine which songs are similar?

3. Can this model be adapted for Spotify or YouTube recommendations?

4. Is this a content-based or collaborative filtering system?

5. What are the limitations of this song recommendation system?

Rohit Sharma

827 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
