World Happiness Report Analysis with Python

Updated on Aug 05, 2025 | 10 min read | 1.57K+ views

The World Happiness Report ranks countries by how people rate their own lives. It draws on data such as income, health, freedom, and social support.

In this project, you’ll break down that data to find out which factors matter most.
You’ll also group countries with similar happiness patterns using clustering. This helps you see global trends in happiness through simple analysis and visuals.

Getting Your Project Done Right: What You Need

Before you start working on the World Happiness Report analysis, make sure you're comfortable with these tools and concepts:

Python programming (You’ll use Python throughout for data processing, visualisation, and modelling.)
Pandas and NumPy (These libraries help you clean, explore, and structure the dataset for analysis.)
Matplotlib or Seaborn (You’ll use these tools to create heatmaps and scatter plots; the interactive world map is built with Plotly.)
Scikit‑learn basics (You’ll need this to scale your data before clustering and to run K-Means.)
Correlation analysis (Understanding how different features relate to the happiness score is a key part of this project.)
K-Means clustering (This helps you group countries based on similar happiness-related factors.)

Behind the Scenes: How To Do World Happiness Report Analysis

To analyse and visualise the World Happiness Report, you’ll use a set of Python libraries focused on data handling, visualisation, and clustering:

Tool / Library	Purpose
Python	Core language for scripting and data analysis
Pandas	Loads, cleans, and explores the dataset
NumPy	Supports numerical operations and calculations
Matplotlib / Seaborn	Creates static visualisations like scatter plots and heatmaps
Plotly	Builds interactive world maps and charts
Scikit-learn	Standardises data and applies K-Means clustering
K-Means (part of Scikit-learn)	Groups countries based on similar happiness factors

Time Required to Complete the Project

You can complete this World Happiness Report Analysis project in about 2 to 3 hours.

It’s a good fit if you’re comfortable with Python and want to practice real-world data analysis.

Smart Insights: Techniques That Power World Happiness Report Analysis

To get the most out of the World Happiness Report, you’ll apply these key data science techniques:

Exploratory Data Analysis (EDA):
Explore patterns and trends in happiness scores and related factors like GDP, social support, and life expectancy.
Correlation Analysis:
Identify which features have the strongest relationship with happiness across countries.
Data Visualisation:
Use scatter plots, heatmaps, and interactive maps to make the data easier to understand.
K-Means Clustering:
Group countries into clusters based on similar characteristics to find patterns in happiness levels.

Let’s build this project from scratch with clear, step-by-step guidance:

Load the Dataset
Check for Missing Values
Explore the Data (EDA)
Visualise Key Factors
Map Global Happiness Scores
Standardise Data for Clustering
Apply K-Means Clustering
Visualise Clusters

Without any further delay, let’s get started!

Step 1: Download the Dataset

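To start the analysis, download the dataset, which is freely available online. You can find it on Kaggle by searching for the World Happiness Report 2021. The code in this walkthrough expects the file to be named world-happiness-report-2021.csv and to sit in the same directory as your script.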
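Different copies of the dataset can use different column names, so it may help to check the header before running the rest of the analysis. The sketch below is one possible way to do that: it reads only the header row and reports which expected columns are absent. The names REQUIRED_COLUMNS and missing_columns are hypothetical helpers introduced here, and the demo runs on a throwaway CSV rather than the real file.

```python
import os
import tempfile

import pandas as pd

# Columns that the plotting and clustering steps of this walkthrough rely on.
REQUIRED_COLUMNS = [
    'Country name', 'Regional indicator', 'Ladder score',
    'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
    'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
]


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Read only the header row and return the required columns that are absent."""
    header = pd.read_csv(csv_path, nrows=0).columns
    return [col for col in required if col not in header]


# Demo on a throwaway CSV with an incomplete header (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.csv')
    pd.DataFrame({'Country name': ['Finland'], 'Ladder score': [7.842]}).to_csv(
        demo_path, index=False
    )
    demo_missing = missing_columns(demo_path)

print(demo_missing)
```

On the real file you would call missing_columns('world-happiness-report-2021.csv') and expect an empty list.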

Step 2: Import Required Libraries

To begin your analysis, you need to import all the necessary Python libraries. These libraries will help you load data, create visualisations, perform clustering, and analyse results.

Here’s the list of tools you’ll use:

# main.py
import pandas as pd                  # For loading and handling data
import numpy as np                   # For numerical operations
import matplotlib.pyplot as plt      # For creating plots
import seaborn as sns                # For advanced visualizations
from sklearn.cluster import KMeans   # For clustering countries
from sklearn.preprocessing import StandardScaler  # For scaling features
import plotly.express as px          # For interactive charts and maps

Step 3: Load the Dataset and Check Missing Values

Now that your tools are ready, the next step is to load the dataset and understand its structure.

You’ll start by reading the CSV file and checking the first few rows.

Then, you'll check for any missing values that might need cleaning later.

# --- 1. Data Loading and Initial Exploration ---

# Load the dataset
try:
    df = pd.read_csv('world-happiness-report-2021.csv')
except FileNotFoundError:
    print("Error: 'world-happiness-report-2021.csv' not found.")
    print("Please make sure the dataset file is in the same directory as the script.")
    raise SystemExit(1)  # stop with a non-zero exit code; exit() is intended for interactive sessions

# Display the first few rows of the dataframe
print("--- First 5 Rows of the Dataset ---")
print(df.head())
print("\n" + "="*50 + "\n")

# Check for missing values
print("--- Missing Values ---")
print(df.isnull().sum())
print("\n" + "="*50 + "\n")

Output:

--- First 5 Rows of the Dataset ---

Country name	Regional indicator	Ladder score	Standard error of ladder score	upperwhisker	lowerwhisker	Logged GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption	Ladder score in Dystopia	Explained by: Log GDP per capita	Explained by: Social support	Explained by: Healthy life expectancy	Explained by: Freedom to make life choices	Explained by: Generosity	Explained by: Perceptions of corruption	Dystopia + residual
Finland	Western Europe	7.842	0.032	7.904	7.78	10.775	0.954	72	0.949	-0.098	0.186	2.43	1.446	1.106	0.741	0.691	0.124	0.481	3.253
Denmark	Western Europe	7.62	0.035	7.687	7.552	10.933	0.954	72.7	0.946	0.03	0.179	2.43	1.502	1.108	0.763	0.686	0.208	0.485	2.868
Switzerland	Western Europe	7.571	0.036	7.643	7.5	11.117	0.942	74.4	0.919	0.025	0.292	2.43	1.566	1.079	0.816	0.653	0.204	0.413	2.839
Iceland	Western Europe	7.554	0.059	7.67	7.438	10.878	0.983	73	0.955	0.16	0.673	2.43	1.482	1.172	0.772	0.698	0.293	0.17	2.967
Netherlands	Western Europe	7.464	0.027	7.518	7.41	10.932	0.942	72.4	0.913	0.175	0.338	2.43	1.501	1.079	0.753	0.647	0.302	0.384	2.798

--- Missing Values ---

Column Name	Value
Country name	0
Regional indicator	0
Ladder score	0
Standard error of ladder score	0
upperwhisker	0
lowerwhisker	0
Logged GDP per capita	0
Social support	0
Healthy life expectancy	0
Freedom to make life choices	0
Generosity	0
Perceptions of corruption	0
Ladder score in Dystopia	0
Explained by: Log GDP per capita	0
Explained by: Social support	0
Explained by: Healthy life expectancy	0
Explained by: Freedom to make life choices	0
Explained by: Generosity	0
Explained by: Perceptions of corruption	0
Dystopia + residual	0

dtype: int64
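Every column reports zero missing values, so this file needs no cleaning. If you run the same check on another year's file and find gaps, one common option (an assumption here, not something this project required) is to fill numeric columns with their median. The toy frame below uses made-up values purely to illustrate the call.

```python
import numpy as np
import pandas as pd

# Toy frame with one gap; the country names and values are made up for illustration.
toy = pd.DataFrame({
    'Country name': ['A', 'B', 'C', 'D'],
    'Social support': [0.90, np.nan, 0.70, 0.80],
    'Generosity': [0.10, -0.05, 0.00, 0.20],
})

numeric_cols = toy.select_dtypes(include=np.number).columns

# Fill each numeric column's gaps with that column's median.
filled = toy.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())

print(filled.isnull().sum())
```

Median filling keeps every row, which matters for a dataset with only one row per country, but dropping rows with dropna() is also reasonable when only a few are affected.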

Step 4: Explore the Data with Visual Analysis

Now that the dataset is loaded and you’ve confirmed it has no missing values, it’s time to explore the patterns in the data.

You’ll check correlations, create scatter plots, and build an interactive world map.

4.1 Correlation Analysis

Check how strongly each feature relates to the happiness score:

# Set the style for the plots
sns.set_style("whitegrid")

# Correlation Analysis
print("--- Correlation Analysis ---")
numeric_df = df.select_dtypes(include=np.number)
correlation_matrix = numeric_df.corr()

# Heatmap
plt.figure(figsize=(16, 12))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of World Happiness Report 2021 Features', fontsize=18)
plt.show()

# Print sorted correlation with Ladder score
print("Correlation with Ladder score (Happiness Score):")
print(correlation_matrix['Ladder score'].sort_values(ascending=False))
print("\n" + "="*50 + "\n")

Output:

Correlation with Ladder score (Happiness Score):

Metric	Value
Ladder score	1
lowerwhisker	0.999396
upperwhisker	0.999347
Logged GDP per capita	0.78976
Explained by: Log GDP per capita	0.789745
Explained by: Healthy life expectancy	0.768138
Healthy life expectancy	0.768099
Social support	0.756888
Explained by: Social support	0.756869
Explained by: Freedom to make life choices	0.607793
Freedom to make life choices	0.607753
Dystopia + residual	0.49201
Explained by: Perceptions of corruption	0.421205
Explained by: Generosity	-0.017631
Generosity	-0.017799
Perceptions of corruption	-0.42114
Standard error of ladder score	-0.470787
Ladder score in Dystopia	NaN

Name: Ladder score, dtype: float64
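Some rows in this output say little about what drives happiness. The whisker columns sit almost exactly on the score itself, each "Explained by" column mirrors its raw counterpart, and "Ladder score in Dystopia" is NaN because the column holds the same value (2.43) in every row shown, which leaves its correlation undefined. A minimal sketch of restricting the comparison to chosen factor columns follows; it uses only the five rows printed in Step 3, so the numbers it produces are not meaningful, only the pattern is.

```python
import pandas as pd

# The five rows shown in the Step 3 output, limited to a few columns.
sample = pd.DataFrame({
    'Ladder score': [7.842, 7.620, 7.571, 7.554, 7.464],
    'Logged GDP per capita': [10.775, 10.933, 11.117, 10.878, 10.932],
    'Social support': [0.954, 0.954, 0.942, 0.983, 0.942],
    'Ladder score in Dystopia': [2.43, 2.43, 2.43, 2.43, 2.43],
})

# Correlate only the chosen factor columns with the happiness score.
factors = ['Logged GDP per capita', 'Social support']
factor_corr = sample[factors].corrwith(sample['Ladder score']).sort_values(ascending=False)
print(factor_corr)

# A constant column has zero variance, so its correlation is undefined (NaN).
constant_corr = sample['Ladder score'].corr(sample['Ladder score in Dystopia'])
print(constant_corr)
```

On the full dataframe you would pass the six factor columns used later for clustering instead of the two shown here.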

4.2 Visualising Key Factors

Use scatter plots to see how happiness is linked to individual features:

# GDP vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Logged GDP per capita', y='Ladder score', data=df,
                hue='Regional indicator', palette='viridis', s=80)
plt.title('Happiness Score vs. Logged GDP per Capita', fontsize=16)
plt.xlabel('Logged GDP per Capita')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Social Support vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Social support', y='Ladder score', data=df,
                hue='Regional indicator', palette='plasma', s=80)
plt.title('Happiness Score vs. Social Support', fontsize=16)
plt.xlabel('Social Support')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Healthy Life Expectancy vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Healthy life expectancy', y='Ladder score', data=df,
                hue='Regional indicator', palette='magma', s=80)
plt.title('Happiness Score vs. Healthy Life Expectancy', fontsize=16)
plt.xlabel('Healthy Life Expectancy')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

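Output: three scatter plots (happiness score against logged GDP per capita, social support, and healthy life expectancy), with points coloured by region.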

4.3 World Map of Happiness Scores

Use Plotly to build an interactive map:

fig = px.choropleth(df,
                    locations="Country name",
                    locationmode='country names',
                    color="Ladder score",
                    hover_name="Country name",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title='World Happiness Scores 2021')
fig.show()

Output: an interactive world map with each country shaded by its ladder score.

Step 5: Prepare Data for Clustering

Now that you’ve explored the data, the next goal is to group countries with similar happiness profiles.

Before applying K-Means, you need to select key features and scale them properly.

Here is the code for this step:

# Select features for clustering
features = ['Logged GDP per capita', 'Social support', 'Healthy life expectancy',
            'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
X = df[features]

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

This prepares your data for clustering in the next step.
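Scaling matters because K-Means relies on Euclidean distance: without it, healthy life expectancy (values in the tens) would outweigh social support (values between 0 and 1). The sketch below shows what StandardScaler does, using two columns from the five rows printed in Step 3. The variable names X_demo, X_scaled_demo, and X_manual are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Logged GDP per capita and healthy life expectancy for the five rows shown in Step 3.
X_demo = np.array([
    [10.775, 72.0],
    [10.933, 72.7],
    [11.117, 74.4],
    [10.878, 73.0],
    [10.932, 72.4],
])

X_scaled_demo = StandardScaler().fit_transform(X_demo)

# Same result by hand: subtract the column mean, divide by the population std (ddof=0).
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled_demo, X_manual))
print(X_scaled_demo.mean(axis=0).round(6), X_scaled_demo.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so each feature contributes on a comparable scale to the distance calculation.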

Step 6: Find the Optimal Number of Clusters (Elbow Method)

Now that your data is scaled, the next step is to decide how many clusters to create.

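The Elbow Method helps you choose the number of clusters by tracking the within-cluster sum of squares (WCSS, which scikit-learn exposes as inertia_). WCSS measures how tightly the points in each cluster sit around their cluster centre, and it always falls as you add clusters.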

Here is the code for this step:

# a) Finding the Optimal Number of Clusters (Elbow Method)
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)

# Plotting the elbow graph
plt.figure(figsize=(10, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method for Optimal k', fontsize=16)
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.xticks(range(1, 11))
plt.show()

Output: a line plot of WCSS against the number of clusters, for k from 1 to 10.

Look for the point where the curve starts to bend, that is, where adding another cluster no longer reduces WCSS by much. This gives you a solid starting point for setting the number of clusters.
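The bend in an elbow plot can be hard to read by eye. A different, complementary check that this project did not use is the silhouette score, which rewards clusters that are both compact and well separated (higher is better). The sketch below runs on synthetic blobs from make_blobs as a stand-in for X_scaled, so its result says nothing about the happiness data; it only shows the shape of the loop.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled: four blobs in six dimensions.
X_raw, _ = make_blobs(n_samples=150, centers=4, n_features=6,
                      cluster_std=1.0, random_state=42)
X_demo = StandardScaler().fit_transform(X_raw)

# The silhouette score needs at least two clusters, so start at k=2.
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

If the elbow plot and the silhouette score point to different values of k, it is worth inspecting the clusters for both before settling on one.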

Step 7: Apply K-Means Clustering and Visualise the Results

Once you've selected the number of clusters using the Elbow Method, the next step is to apply K-Means to group the countries.

Each country is assigned a cluster label based on its happiness-related features.

Here is the code:

# b) Applying K-Means with the Optimal k
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, init='k-means++', max_iter=300, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)

# Add the cluster labels to the original dataframe
df['Cluster'] = cluster_labels

# c) Visualizing the Clusters using GDP and Social Support
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Logged GDP per capita', y='Social support',
                hue='Cluster', data=df, palette='Set1', s=100, alpha=0.8)
plt.title('Country Clusters based on GDP and Social Support', fontsize=16)
plt.xlabel('Logged GDP per Capita', fontsize=12)
plt.ylabel('Social Support', fontsize=12)
plt.legend(title='Cluster')
plt.show()

Output: a scatter plot of social support against logged GDP per capita, with points coloured by cluster.

This chart helps you see how different countries group together based on their wealth and social support levels.
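The scatter plot shows only two of the six clustering features. To see where each cluster sits on every feature, you can map the fitted cluster centres back to the original units with the scaler's inverse_transform. The sketch below uses a small made-up two-feature dataset; with the real data you would reuse the scaler, kmeans, and features objects from the earlier steps.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data: two obvious groups on two features (values are made up).
toy = pd.DataFrame({
    'Logged GDP per capita': [10.8, 10.9, 11.0, 7.8, 7.9, 8.0],
    'Social support': [0.95, 0.94, 0.96, 0.65, 0.66, 0.67],
})

scaler = StandardScaler()
X_toy = scaler.fit_transform(toy)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_toy)

# Centroids live in standardised space; map them back to the original units.
centroids = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=toy.columns,
)
print(centroids.round(3))
```

Each row of the result is one cluster centre expressed in the same units as the input columns, which is easier to read than standardised values.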

Step 8: Analyse and Interpret the Clusters

Now that each country is assigned to a cluster, it’s time to understand what each group represents.
You’ll compare average values for key happiness factors across clusters.
Then, you’ll view the clusters on a world map for a global perspective.

Here is the code for this step:

# d) Analyzing the Clusters
print("\n--- Analysis of Clusters ---")
cluster_analysis = df.groupby('Cluster')[features + ['Ladder score']].mean().sort_values(by='Ladder score', ascending=False)
print(cluster_analysis)
print("\n" + "="*50 + "\n")

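# World Map of Clusters
# Build the figure first; cluster labels are cast to strings so Plotly
# colours them as discrete categories rather than on a continuous scale.
fig_cluster = px.choropleth(df.assign(Cluster=df['Cluster'].astype(str)),
                            locations="Country name",
                            locationmode='country names',
                            color="Cluster",
                            hover_name="Country name",
                            title='World Happiness Clusters 2021')
fig_cluster.show()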

Output:

--- Analysis of Clusters ---

Cluster	Logged GDP per capita	Social support	Healthy life expectancy	Freedom to make life choices	Generosity	Perceptions of corruption	Ladder score
0	10.87644	0.92156	72.46796	0.8984	0.05168	0.43536	6.94904
2	9.867868	0.856868	67.482162	0.781706	-0.114897	0.803088	5.690441
1	8.750958	0.782167	63.030958	0.861583	0.158292	0.7355	5.118292
3	7.889062	0.666219	55.334313	0.676687	0.014594	0.788875	4.402438
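The cluster numbers assigned by K-Means are arbitrary, which is why the sorted table reads 0, 2, 1, 3. If you want labels that follow the ranking, one option is to map each cluster to its rank by mean ladder score. The sketch below copies the four mean scores from the table above; label_map and raw_labels are illustrative names, and the short list of raw labels is made up.

```python
import pandas as pd

# Mean ladder score per cluster, copied from the table above.
cluster_means = pd.Series(
    {0: 6.94904, 2: 5.690441, 1: 5.118292, 3: 4.402438},
    name='Ladder score',
)

# Rank 1 = cluster with the highest mean score.
rank = cluster_means.rank(ascending=False).astype(int)
label_map = rank.to_dict()
print(label_map)  # {0: 1, 2: 2, 1: 3, 3: 4}

# Applying the map to a column of raw labels (a short made-up example).
raw_labels = pd.Series([0, 3, 2, 1, 0])
print(raw_labels.map(label_map).tolist())  # [1, 4, 2, 3, 1]
```

With the real dataframe, the same map could be applied as df['Cluster'].map(label_map) to create a ranked label column.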

Final Conclusion

This project examined the World Happiness Report to see which factors are most closely associated with happiness across nations. By examining attributes such as GDP, social support, and life expectancy, and by applying K-Means clustering, you separated nations into four clusters. The cluster averages show a consistent pattern: clusters with higher GDP, social support, and healthy life expectancy also have higher happiness scores. These are associations rather than proof of cause. The visualisations helped surface regional variations and made the conclusions easier to understand.

Colab Link:
https://colab.research.google.com/drive/1nVlQq8muCiZBPPYvcevXfl_07TX7gYd5?usp=sharing

Frequently Asked Questions (FAQs)

1. What is the goal of this project?

The goal is to analyse the World Happiness Report and group countries by the factors associated with happiness, such as GDP, social support, and health.

2. Why is clustering used in this project?

Clustering helps you find patterns by grouping countries with similar happiness-related features, making comparisons easier.

3. How many clusters were used and why?

Four clusters were used because the elbow plot suggested that this number offered a good balance between detail and simplicity.

4. What features had the strongest link to happiness?

Logged GDP per capita, social support, and healthy life expectancy had the highest positive correlation with happiness scores.

5. What libraries were used in this project?

The project used Python with libraries like Pandas, NumPy, Seaborn, Matplotlib, Plotly, and Scikit-learn.

Rohit Sharma

840 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
