World Happiness Report Analysis with Python
By Rohit Sharma
Updated on Aug 05, 2025 | 10 min read | 1.25K+ views
The World Happiness Report ranks countries by how people rate their own lives. It draws on data such as income, health, freedom, and social support.
In this project, you’ll break down that data to find out which factors matter most.
You’ll also group countries with similar happiness patterns using clustering. This helps you see global trends in happiness through simple analysis and visuals.
Before you start working on the World Happiness Report analysis, make sure you're comfortable with these tools and concepts:
To analyse and visualise the World Happiness Report, you’ll use a set of Python libraries focused on data handling, visualisation, and clustering:
Tool / Library | Purpose |
Python | Core language for scripting and data analysis |
Pandas | Loads, cleans, and explores the dataset |
NumPy | Supports numerical operations and calculations |
Matplotlib / Seaborn | Creates static visualisations like scatter plots and heatmaps |
Plotly | Builds interactive world maps and charts |
Scikit-learn | Standardises data and applies K-Means clustering |
K-Means | Groups countries based on similar happiness factors |
You can complete this World Happiness Report Analysis project in about 2 to 3 hours.
It’s a good fit if you’re comfortable with Python and want to practice real-world data analysis.
To get the most out of the World Happiness Report, you’ll apply these key data science techniques:
Let’s build this project from scratch with clear, step-by-step guidance:
Without any further delay, let’s get started!
To start the analysis, download the World Happiness Report 2021 dataset, which is freely available online, for example on Kaggle. The code below expects a file named world-happiness-report-2021.csv in the same directory as your script.
Next, import the Python libraries you need. They will help you load data, create visualisations, perform clustering, and analyse results.
Here’s the list of tools you’ll use:
# main.py
import pandas as pd # For loading and handling data
import numpy as np # For numerical operations
import matplotlib.pyplot as plt # For creating plots
import seaborn as sns # For advanced visualizations
from sklearn.cluster import KMeans # For clustering countries
from sklearn.preprocessing import StandardScaler # For scaling features
import plotly.express as px # For interactive charts and maps
Now that your tools are ready, the next step is to load the dataset and understand its structure.
You’ll start by reading the CSV file and checking the first few rows.
Then, you'll check for any missing values that might need cleaning later.
# --- 1. Data Loading and Initial Exploration ---
# Load the dataset
try:
    df = pd.read_csv('world-happiness-report-2021.csv')
except FileNotFoundError:
    print("Error: 'world-happiness-report-2021.csv' not found.")
    print("Please make sure the dataset file is in the same directory as the script.")
    raise SystemExit(1)  # Stop the script; nothing below can run without the data
# Display the first few rows of the dataframe
print("--- First 5 Rows of the Dataset ---")
print(df.head())
print("\n" + "="*50 + "\n")
# Check for missing values
print("--- Missing Values ---")
print(df.isnull().sum())
print("\n" + "="*50 + "\n")
Output:
--- First 5 Rows of the Dataset ---
Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual |
Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.78 | 10.775 | 0.954 | 72 | 0.949 | -0.098 | 0.186 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 3.253 |
Denmark | Western Europe | 7.62 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.03 | 0.179 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2.868 |
Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.5 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 2.839 |
Iceland | Western Europe | 7.554 | 0.059 | 7.67 | 7.438 | 10.878 | 0.983 | 73 | 0.955 | 0.16 | 0.673 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.17 | 2.967 |
Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.41 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 2.798 |
--- Missing Values ---
Column Name | Value |
Country name | 0 |
Regional indicator | 0 |
Ladder score | 0 |
Standard error of ladder score | 0 |
upperwhisker | 0 |
lowerwhisker | 0 |
Logged GDP per capita | 0 |
Social support | 0 |
Healthy life expectancy | 0 |
Freedom to make life choices | 0 |
Generosity | 0 |
Perceptions of corruption | 0 |
Ladder score in Dystopia | 0 |
Explained by: Log GDP per capita | 0 |
Explained by: Social support | 0 |
Explained by: Healthy life expectancy | 0 |
Explained by: Freedom to make life choices | 0 |
Explained by: Generosity | 0 |
Explained by: Perceptions of corruption | 0 |
Dystopia + residual | 0 |
dtype: int64
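Every column reports zero missing values, so this file needs no cleaning. If a different year's file did contain gaps, two common options are dropping rows that lack the target and filling the remaining numeric gaps with the column median. The sketch below is only an illustration under that assumption: the tiny DataFrame and its values are made up, and only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; NOT taken from the real report.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D"],
    "Ladder score": [7.0, 6.0, np.nan, 4.0],
    "Social support": [0.9, np.nan, 0.7, 0.6],
})

# Count missing values per column, as in the step above.
missing_counts = toy.isnull().sum()
print(missing_counts)

# Option 1: drop rows where the target (happiness score) is missing.
cleaned = toy.dropna(subset=["Ladder score"]).copy()

# Option 2: fill the remaining numeric gaps with each column's median.
numeric_cols = cleaned.select_dtypes(include=np.number).columns
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())

print(cleaned)
```

Which option is appropriate depends on how much data is missing and why, so treat this as a starting point rather than a rule.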
Now that the dataset is loaded and you've confirmed it has no missing values, it’s time to explore the patterns in the data.
You’ll check correlations, create scatter plots, and build an interactive world map.
Check how strongly each feature relates to the happiness score:
# Set the style for the plots
sns.set_style("whitegrid")
# Correlation Analysis
print("--- Correlation Analysis ---")
numeric_df = df.select_dtypes(include=np.number)
correlation_matrix = numeric_df.corr()
# Heatmap
plt.figure(figsize=(16, 12))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of World Happiness Report 2021 Features', fontsize=18)
plt.show()
# Print sorted correlation with Ladder score
print("Correlation with Ladder score (Happiness Score):")
print(correlation_matrix['Ladder score'].sort_values(ascending=False))
print("\n" + "="*50 + "\n")
Output:
Correlation with Ladder score (Happiness Score):
Metric | Value |
Ladder score | 1 |
lowerwhisker | 0.999396 |
upperwhisker | 0.999347 |
Logged GDP per capita | 0.78976 |
Explained by: Log GDP per capita | 0.789745 |
Explained by: Healthy life expectancy | 0.768138 |
Healthy life expectancy | 0.768099 |
Social support | 0.756888 |
Explained by: Social support | 0.756869 |
Explained by: Freedom to make life choices | 0.607793 |
Freedom to make life choices | 0.607753 |
Dystopia + residual | 0.49201 |
Explained by: Perceptions of corruption | 0.421205 |
Explained by: Generosity | -0.017631 |
Generosity | -0.017799 |
Perceptions of corruption | -0.42114 |
Standard error of ladder score | -0.470787 |
Ladder score in Dystopia | NaN |
Name: Ladder score, dtype: float64
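Two details in this output deserve a closer look. "Ladder score in Dystopia" shows NaN because the column holds the same value (2.43) in every row shown earlier, and the correlation of a constant column is undefined. The whisker and "Explained by:" columns correlate with the score almost exactly as the score itself or the matching raw factors do (with the sign flipped for perceptions of corruption), so they add little new information. The sketch below reproduces both effects on made-up numbers and shows one way to restrict the correlation to chosen columns; only the column names mirror the dataset.

```python
import pandas as pd

# Toy data with made-up values; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Ladder score": [7.8, 6.5, 5.2, 4.1, 3.0],
    "Logged GDP per capita": [10.8, 10.1, 9.4, 8.6, 7.9],
    "Generosity": [0.10, -0.05, 0.20, 0.00, -0.10],
    "Ladder score in Dystopia": [2.43, 2.43, 2.43, 2.43, 2.43],
})

corr = toy.corr()["Ladder score"]
print(corr)

# A constant column has zero variance, so its correlation is NaN.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]
print("Constant columns:", constant_cols)

# Keep only the columns of interest before correlating.
core = ["Logged GDP per capita", "Generosity"]
core_corr = toy[core + ["Ladder score"]].corr()["Ladder score"].drop("Ladder score")
print(core_corr.sort_values(ascending=False))
```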
Use scatter plots to see how happiness is linked to individual features:
# GDP vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Logged GDP per capita', y='Ladder score', data=df,
                hue='Regional indicator', palette='viridis', s=80)
plt.title('Happiness Score vs. Logged GDP per Capita', fontsize=16)
plt.xlabel('Logged GDP per Capita')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Social Support vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Social support', y='Ladder score', data=df,
                hue='Regional indicator', palette='plasma', s=80)
plt.title('Happiness Score vs. Social Support', fontsize=16)
plt.xlabel('Social Support')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
# Healthy Life Expectancy vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Healthy life expectancy', y='Ladder score', data=df,
                hue='Regional indicator', palette='magma', s=80)
plt.title('Happiness Score vs. Healthy Life Expectancy', fontsize=16)
plt.xlabel('Healthy Life Expectancy')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
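Output: [Figure: three scatter plots of Happiness Score against Logged GDP per Capita, Social Support, and Healthy Life Expectancy, coloured by region]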
Use Plotly to build an interactive map:
fig = px.choropleth(df,
                    locations="Country name",
                    locationmode='country names',
                    color="Ladder score",
                    hover_name="Country name",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title='World Happiness Scores 2021')
fig.show()
Output: [Figure: interactive world map titled "World Happiness Scores 2021", coloured by ladder score]
Now that you’ve explored the data, the next goal is to group countries with similar happiness profiles.
Before applying K-Means, you need to select key features and scale them properly.
Here is the code for this step:
# Select features for clustering
features = ['Logged GDP per capita', 'Social support', 'Healthy life expectancy',
            'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
X = df[features]
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
This prepares your data for clustering in the next step.
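Scaling matters because K-Means relies on distances. Healthy life expectancy is measured in years (around 72 to 74 in the first rows above), while social support lies between 0 and 1, so without scaling the distances would be dominated by life expectancy. Standardising applies z = (x - mean) / standard deviation to each column. The NumPy sketch below shows that calculation on made-up numbers; it illustrates the formula and is not a replacement for the StandardScaler call above.

```python
import numpy as np

# Made-up values for two features on very different scales:
# column 0 ~ healthy life expectancy (years), column 1 ~ social support (0-1).
X = np.array([
    [72.0, 0.95],
    [67.5, 0.86],
    [63.0, 0.78],
    [55.3, 0.67],
])

# z = (x - column mean) / column standard deviation.
# ddof=0 (population standard deviation) matches what StandardScaler computes.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)
X_scaled = (X - mean) / std

print(X_scaled.round(3))
print("means:", X_scaled.mean(axis=0).round(6))
print("stds: ", X_scaled.std(axis=0).round(6))
```

After scaling, every column has mean 0 and standard deviation 1, so each feature contributes on a comparable footing to the distance calculation.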
Now that your data is scaled, the next step is to decide how many clusters to create.
The Elbow Method helps you choose the number of clusters by plotting the within-cluster sum of squares (WCSS), a measure of how compact the clusters are, for each candidate value of k.
Here is the code for this step:
# a) Finding the Optimal Number of Clusters (Elbow Method)
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)
# Plotting the elbow graph
plt.figure(figsize=(10, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method for Optimal k', fontsize=16)
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.xticks(range(1, 11))
plt.show()
Output: [Figure: elbow plot of WCSS against the number of clusters (k), for k from 1 to 10]
Look for the point where the curve bends and starts to flatten (the "elbow"): adding more clusters beyond it reduces WCSS only slightly. This gives you a solid starting point for setting the number of clusters.
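Reading the elbow by eye can be subjective. One simple heuristic, offered here as an assumption rather than as part of the walkthrough, is to pick the k where the drop in WCSS slows the most, which is where the second difference of the curve is largest. The helper name `pick_elbow` and the WCSS numbers below are hypothetical.

```python
import numpy as np

def pick_elbow(wcss, k_values):
    """Return the k at which the WCSS curve bends most sharply.

    Uses the discrete second difference: wcss[i-1] - 2*wcss[i] + wcss[i+1].
    A large positive value means a big drop before k and a small drop after it.
    """
    wcss = np.asarray(wcss, dtype=float)
    if len(wcss) != len(k_values) or len(wcss) < 3:
        raise ValueError("need at least 3 WCSS values, one per k")
    second_diff = wcss[:-2] - 2 * wcss[1:-1] + wcss[2:]
    # second_diff[j] corresponds to k_values[j + 1]
    return k_values[int(np.argmax(second_diff)) + 1]

# Hypothetical WCSS values for k = 1..8 (made up for illustration).
k_values = list(range(1, 9))
example_wcss = [900.0, 680.0, 480.0, 300.0, 270.0, 250.0, 235.0, 225.0]

print(pick_elbow(example_wcss, k_values))
```

On real data the largest bend often falls at a very small k, so it is worth checking the suggestion against the plot before settling on a value.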
Once you've selected the number of clusters using the Elbow Method, the next step is to apply K-Means to group the countries.
Each country is assigned a cluster label based on its happiness-related features.
Here is the code:
# b) Applying K-Means with the Optimal k
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, init='k-means++', max_iter=300, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)
# Add the cluster labels to the original dataframe
df['Cluster'] = cluster_labels
# c) Visualizing the Clusters using GDP and Social Support
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Logged GDP per capita', y='Social support',
                hue='Cluster', data=df, palette='Set1', s=100, alpha=0.8)
plt.title('Country Clusters based on GDP and Social Support', fontsize=16)
plt.xlabel('Logged GDP per Capita', fontsize=12)
plt.ylabel('Social Support', fontsize=12)
plt.legend(title='Cluster')
plt.show()
Output: [Figure: scatter plot of Social Support against Logged GDP per Capita, coloured by cluster]
This chart shows how countries group together in terms of wealth and social support. The clusters themselves were formed from all six features, so two of them give only a partial view of the separation.
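To see which regions dominate each cluster, a cross-tabulation can complement the scatter plot. The sketch below uses `pd.crosstab` on made-up rows; the column names mirror the dataframe used in this project, but the countries and cluster assignments are hypothetical.

```python
import pandas as pd

# Made-up rows; column names mirror the dataframe used in this project.
toy = pd.DataFrame({
    "Country name": ["A", "B", "C", "D", "E", "F"],
    "Regional indicator": ["Western Europe", "Western Europe", "South Asia",
                           "South Asia", "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "Cluster": [0, 0, 1, 3, 3, 3],
})

# Number of countries in each cluster.
cluster_sizes = toy["Cluster"].value_counts().sort_index()
print(cluster_sizes)

# Rows: regions, columns: clusters, cells: number of countries.
region_by_cluster = pd.crosstab(toy["Regional indicator"], toy["Cluster"])
print(region_by_cluster)
```

With the real dataframe, the same two calls would run on `df['Cluster']` and `df['Regional indicator']` once the cluster labels have been added.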
Now that each country is assigned to a cluster, it’s time to understand what each group represents.
You’ll compare average values for key happiness factors across clusters.
Then, you’ll view the clusters on a world map for a global perspective.
Here is the code for this step:
# d) Analyzing the Clusters
print("\n--- Analysis of Clusters ---")
cluster_analysis = df.groupby('Cluster')[features + ['Ladder score']].mean().sort_values(by='Ladder score', ascending=False)
print(cluster_analysis)
print("\n" + "="*50 + "\n")
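# World Map of Clusters
# Assumed reconstruction: build the map like the earlier choropleth, coloured by cluster label
fig_cluster = px.choropleth(df,
                            locations="Country name",
                            locationmode='country names',
                            color=df['Cluster'].astype(str),
                            hover_name="Country name",
                            title='World Happiness Clusters 2021')
fig_cluster.show()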
Output:
--- Analysis of Clusters ---
Cluster | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score |
0 | 10.87644 | 0.92156 | 72.46796 | 0.8984 | 0.05168 | 0.43536 | 6.94904 |
2 | 9.867868 | 0.856868 | 67.482162 | 0.781706 | -0.114897 | 0.803088 | 5.690441 |
1 | 8.750958 | 0.782167 | 63.030958 | 0.861583 | 0.158292 | 0.7355 | 5.118292 |
3 | 7.889062 | 0.666219 | 55.334313 | 0.676687 | 0.014594 | 0.788875 | 4.402438 |
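K-Means assigns cluster numbers arbitrarily, which is why the table lists the clusters in the order 0, 2, 1, 3 once they are sorted by happiness. If ordered labels are easier to read, the clusters can be renumbered by their mean score. The sketch below uses the four cluster means from the table above, rounded to three decimals; the `Rank` column name and the sample country assignments are assumptions for illustration.

```python
import pandas as pd

# Mean ladder score per cluster, rounded from the table above.
cluster_means = pd.Series(
    {0: 6.949, 1: 5.118, 2: 5.690, 3: 4.402}, name="Ladder score"
)

# Rank clusters from happiest (1) to least happy (4).
rank = cluster_means.rank(ascending=False).astype(int)
label_map = rank.to_dict()
print(label_map)

# Apply the mapping to a column of cluster labels (made-up assignments).
toy = pd.DataFrame({"Country name": ["A", "B", "C", "D"], "Cluster": [2, 0, 3, 1]})
toy["Rank"] = toy["Cluster"].map(label_map)
print(toy)
```

Keeping the original `Cluster` column alongside the new one makes it easy to trace results back to the K-Means output.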
This project examined the World Happiness Report to see which factors are most strongly associated with happiness across nations. By examining attributes such as GDP, social support, and life expectancy, and applying K-Means clustering, you separated countries into four clusters. The results showed clear patterns linking economic and social factors to higher happiness scores. Visualisations helped reveal regional variations and made the conclusions easier to understand.
Colab Link:
https://colab.research.google.com/drive/1nVlQq8muCiZBPPYvcevXfl_07TX7gYd5?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...