World Happiness Report Analysis with Python

By Rohit Sharma

Updated on Aug 05, 2025

The World Happiness Report ranks countries by how people rate their own lives. It draws on data such as income, health, freedom, and social support.

In this project, you’ll break down that data to find out which factors matter most.
You’ll also group countries with similar happiness patterns using clustering. This helps you see global trends in happiness through simple analysis and visuals.


Getting Your Project Done Right: What You Need

Before you start working on the World Happiness Report analysis, make sure you're comfortable with these tools and concepts:

  • Python programming (You’ll use Python throughout for data processing, visualisation, and modelling.)
  • Pandas and Numpy (These libraries help you clean, explore, and structure the dataset for analysis.)
  • Matplotlib, Seaborn, and Plotly (You’ll use Matplotlib and Seaborn for heatmaps and scatter plots, and Plotly for interactive maps.)
  • Scikit‑learn basics (You’ll need this to prepare your data before applying clustering)
  • Correlation analysis (Understanding how different features relate to the happiness score is a key part of this project)
  • K-Means clustering (This helps you group countries based on similar happiness-related factors.)


Behind the Scenes: How To Do World Happiness Report Analysis

To analyse and visualise the World Happiness Report, you’ll use a set of Python libraries focused on data handling, visualisation, and clustering:

Tool / Library | Purpose
Python | Core language for scripting and data analysis
Pandas | Loads, cleans, and explores the dataset
NumPy | Supports numerical operations and calculations
Matplotlib / Seaborn | Creates static visualisations like scatter plots and heatmaps
Plotly | Builds interactive world maps and charts
Scikit-learn | Standardises data and applies K-Means clustering
K-Means | Groups countries based on similar happiness factors


Time Required to Complete the Project

You can complete this World Happiness Report Analysis project in about 2 to 3 hours.

It’s a good fit if you’re comfortable with Python and want to practice real-world data analysis.

Smart Insights: Techniques That Power World Happiness Report Analysis

To get the most out of the World Happiness Report, you’ll apply these key data science techniques:

  • Exploratory Data Analysis (EDA):
    Explore patterns and trends in happiness scores and related factors like GDP, social support, and life expectancy.
  • Correlation Analysis:
    Identify which features have the strongest relationship with happiness across countries.
  • Data Visualisation:
    Use scatter plots, heatmaps, and interactive maps to make the data easier to understand.
  • K-Means Clustering:
    Group countries into clusters based on similar characteristics to find patterns in happiness levels.


Let’s build this project from scratch with clear, step-by-step guidance:

  1. Download the Dataset
  2. Import Required Libraries
  3. Load the Dataset and Check Missing Values
  4. Explore the Data with Visual Analysis
  5. Prepare Data for Clustering
  6. Find the Optimal Number of Clusters (Elbow Method)
  7. Apply K-Means Clustering and Visualise the Results
  8. Analyse and Interpret the Clusters

Without any further delay, let’s get started!

Step 1: Download the Dataset

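To start the analysis, download the dataset, which is freely available online. One option is Kaggle: search for “World Happiness Report 2021” and save the CSV file in the same directory as your script. The loading code in Step 3 expects the file to be named world-happiness-report-2021.csv.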

Step 2:  Import Required Libraries

To begin your analysis, you need to import all the necessary Python libraries. These libraries will help you load data, create visualisations, perform clustering, and analyse results.

Here’s the list of tools you’ll use:

# main.py
import pandas as pd                  # For loading and handling data
import numpy as np                   # For numerical operations
import matplotlib.pyplot as plt      # For creating plots
import seaborn as sns                # For advanced visualizations
from sklearn.cluster import KMeans   # For clustering countries
from sklearn.preprocessing import StandardScaler  # For scaling features
import plotly.express as px          # For interactive charts and maps


Step 3:  Load the Dataset and Check Missing Values

Now that your tools are ready, the next step is to load the dataset and understand its structure.

You’ll start by reading the CSV file and checking the first few rows.

Then, you'll check for any missing values that might need cleaning later.

# --- 1. Data Loading and Initial Exploration ---

# Load the dataset
try:
    df = pd.read_csv('world-happiness-report-2021.csv')
except FileNotFoundError:
    # Stop with a non-zero exit status; SystemExit needs no extra import
    raise SystemExit("Error: 'world-happiness-report-2021.csv' not found. "
                     "Please make sure the dataset file is in the same directory as the script.")

# Display the first few rows of the dataframe
print("--- First 5 Rows of the Dataset ---")
print(df.head())
print("\n" + "="*50 + "\n")

# Check for missing values
print("--- Missing Values ---")
print(df.isnull().sum())
print("\n" + "="*50 + "\n")

Output:

--- First 5 Rows of the Dataset ---

Country name | Regional indicator | Ladder score | Standard error of ladder score | upperwhisker | lowerwhisker | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score in Dystopia | Explained by: Log GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption | Dystopia + residual
Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.78 | 10.775 | 0.954 | 72 | 0.949 | -0.098 | 0.186 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 3.253
Denmark | Western Europe | 7.62 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.03 | 0.179 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2.868
Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.5 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 2.839
Iceland | Western Europe | 7.554 | 0.059 | 7.67 | 7.438 | 10.878 | 0.983 | 73 | 0.955 | 0.16 | 0.673 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.17 | 2.967
Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.41 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 2.798

--- Missing Values ---

Column Name Value
Country name 0
Regional indicator 0
Ladder score 0
Standard error of ladder score 0
upperwhisker 0
lowerwhisker 0
Logged GDP per capita 0
Social support 0
Healthy life expectancy 0
Freedom to make life choices 0
Generosity 0
Perceptions of corruption 0
Ladder score in Dystopia 0
Explained by: Log GDP per capita 0
Explained by: Social support 0
Explained by: Healthy life expectancy 0
Explained by: Freedom to make life choices 0
Explained by: Generosity 0
Explained by: Perceptions of corruption 0
Dystopia + residual 0

dtype: int64
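Before moving on, it can help to confirm that the columns used in the later steps are actually present, since a file from a different source or year may not use the same headers (an assumption worth checking against your own copy). A minimal sketch: `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names introduced here, and the small DataFrame is a stand-in for the real CSV.

```python
import pandas as pd

# Columns that the later steps of this walkthrough rely on
REQUIRED_COLUMNS = [
    'Country name', 'Regional indicator', 'Ladder score',
    'Logged GDP per capita', 'Social support', 'Healthy life expectancy',
    'Freedom to make life choices', 'Generosity', 'Perceptions of corruption',
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Toy stand-in for the dataset: two rows copied from the output above,
# with one required column deliberately left out
sample = pd.DataFrame({
    'Country name': ['Finland', 'Denmark'],
    'Regional indicator': ['Western Europe', 'Western Europe'],
    'Ladder score': [7.842, 7.620],
    'Logged GDP per capita': [10.775, 10.933],
    'Social support': [0.954, 0.954],
    'Healthy life expectancy': [72.0, 72.7],
    'Freedom to make life choices': [0.949, 0.946],
    'Generosity': [-0.098, 0.030],
})

absent = missing_columns(sample)
print("Missing columns:", absent)
```

In the project itself you would call `missing_columns(df)` right after loading and stop if the list is not empty.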


Step 4:  Explore the Data with Visual Analysis

Now that the dataset is loaded and the check shows no missing values, it’s time to explore the patterns in the data.

You’ll check correlations, create scatter plots, and build an interactive world map.

4.1 Correlation Analysis

Check how strongly each feature relates to the happiness score:

# Set the style for the plots
sns.set_style("whitegrid")

# Correlation Analysis
print("--- Correlation Analysis ---")
numeric_df = df.select_dtypes(include=np.number)
correlation_matrix = numeric_df.corr()

# Heatmap
plt.figure(figsize=(16, 12))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Matrix of World Happiness Report 2021 Features', fontsize=18)
plt.show()

# Print sorted correlation with Ladder score
print("Correlation with Ladder score (Happiness Score):")
print(correlation_matrix['Ladder score'].sort_values(ascending=False))
print("\n" + "="*50 + "\n")

Output:

Correlation with Ladder score (Happiness Score):

Metric Value
Ladder score 1
lowerwhisker 0.999396
upperwhisker 0.999347
Logged GDP per capita 0.78976
Explained by: Log GDP per capita 0.789745
Explained by: Healthy life expectancy 0.768138
Healthy life expectancy 0.768099
Social support 0.756888
Explained by: Social support 0.756869
Explained by: Freedom to make life choices 0.607793
Freedom to make life choices 0.607753
Dystopia + residual 0.49201
Explained by: Perceptions of corruption 0.421205
Explained by: Generosity -0.017631
Generosity -0.017799
Perceptions of corruption -0.42114
Standard error of ladder score -0.470787
Ladder score in Dystopia NaN

Name: Ladder score, dtype: float64
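Two details in this output are worth noting. The NaN for `Ladder score in Dystopia` appears because that column holds the same value in every row (2.43 in the rows shown earlier), so its variance is zero and the Pearson correlation is undefined. The whisker and `Explained by:` columns are derived from the score or from the raw factors, which is why they mirror the raw correlations so closely. The sketch below uses a small illustrative DataFrame (not the report data) to show one way of dropping constant columns before ranking the correlations.

```python
import pandas as pd

# Toy stand-in for the numeric part of the dataset (values are illustrative only)
toy = pd.DataFrame({
    'Ladder score': [7.8, 6.9, 5.5, 4.7, 3.6],
    'Logged GDP per capita': [10.8, 10.5, 9.6, 8.9, 7.7],
    'Generosity': [-0.10, 0.09, -0.05, 0.12, 0.01],
    'Ladder score in Dystopia': [2.43, 2.43, 2.43, 2.43, 2.43],  # constant
})

# A constant column has zero variance, so its correlation comes out as NaN
full_corr = toy.corr()['Ladder score']
print(full_corr)

# Keep only the columns that take more than one distinct value
varying = toy.loc[:, toy.nunique() > 1]
clean_corr = (varying.corr()['Ladder score']
              .drop('Ladder score')          # self-correlation is always 1
              .sort_values(ascending=False))
print(clean_corr)
```

On the real data, the same filter could be applied to `numeric_df` before calling `.corr()`.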

4.2 Visualising Key Factors

Use scatter plots to see how happiness is linked to individual features:

# GDP vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Logged GDP per capita', y='Ladder score', data=df,
                hue='Regional indicator', palette='viridis', s=80)
plt.title('Happiness Score vs. Logged GDP per Capita', fontsize=16)
plt.xlabel('Logged GDP per Capita')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Social Support vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Social support', y='Ladder score', data=df,
                hue='Regional indicator', palette='plasma', s=80)
plt.title('Happiness Score vs. Social Support', fontsize=16)
plt.xlabel('Social Support')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

# Healthy Life Expectancy vs. Happiness
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Healthy life expectancy', y='Ladder score', data=df,
                hue='Regional indicator', palette='magma', s=80)
plt.title('Happiness Score vs. Healthy Life Expectancy', fontsize=16)
plt.xlabel('Healthy Life Expectancy')
plt.ylabel('Happiness Score')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

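Output: three scatter plots of the happiness score against logged GDP per capita, social support, and healthy life expectancy, with points coloured by region.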
 

4.3 World Map of Happiness Scores

Use Plotly to build an interactive map:

fig = px.choropleth(df,
                    locations="Country name",
                    locationmode='country names',
                    color="Ladder score",
                    hover_name="Country name",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title='World Happiness Scores 2021')
fig.show()

Output: an interactive world map with each country shaded by its ladder score.


Step 5: Prepare Data for Clustering

Now that you’ve explored the data, the next goal is to group countries with similar happiness profiles.

Before applying K-Means, you need to select key features and scale them properly.

Here is the code for this step:

# Select features for clustering
features = ['Logged GDP per capita', 'Social support', 'Healthy life expectancy',
            'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
X = df[features]

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

This prepares your data for clustering in the next step.
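Scaling matters here because K-Means measures Euclidean distance, and the selected features sit on very different ranges: healthy life expectancy is in the tens of years, while social support lies between 0 and 1. As a small sketch, the example below takes two of those columns from the five rows shown earlier and checks that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation. The name `X_demo` is introduced only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales, copied from the first five rows
# shown earlier: healthy life expectancy and social support
X_demo = np.array([
    [72.0, 0.954],
    [72.7, 0.954],
    [74.4, 0.942],
    [73.0, 0.983],
    [72.4, 0.942],
])

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0)
scaled = StandardScaler().fit_transform(X_demo)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.round(scaled.mean(axis=0), 6))  # each column mean is approximately 0
print(np.round(scaled.std(axis=0), 6))   # each column std is approximately 1
```

Without this step, the life expectancy column would dominate the distance calculation simply because its numbers are larger.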

Step 6:  Find the Optimal Number of Clusters (Elbow Method)

Now that your data is scaled, the next step is to decide how many clusters to create.

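The Elbow Method helps you choose the number of clusters by plotting, for each candidate k, the within-cluster sum of squares (WCSS), which measures how compact the groups are. scikit-learn exposes this value as inertia_ on a fitted KMeans model.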
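WCSS, the quantity plotted on the elbow curve, is the sum of squared distances from each point to the centre of its assigned cluster. As a hedged sketch on synthetic points (not the report data), it can be recomputed by hand and compared with the `inertia_` attribute of a fitted scikit-learn `KMeans` model:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points in two loose groups (illustrative, not report data)
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(30, 2)),
])

model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(points)

# WCSS: squared distance from every point to the centre of its own cluster
assigned_centres = model.cluster_centers_[model.labels_]
wcss_manual = np.sum((points - assigned_centres) ** 2)

print(wcss_manual, model.inertia_)
```

The two printed numbers should agree up to floating-point rounding, which is why the loop in this step can simply collect `kmeans.inertia_` for each k.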

Here is the code for this step:

# a) Finding the Optimal Number of Clusters (Elbow Method)
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)

# Plotting the elbow graph
plt.figure(figsize=(10, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method for Optimal k', fontsize=16)
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.xticks(range(1, 11))
plt.show()

Output: a line plot of WCSS against the number of clusters k, for k from 1 to 10.

Look for the point where the curve starts to bend. This gives you a solid starting point for setting the number of clusters.
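Reading the bend by eye can be subjective. One complementary check, which is a different technique from the elbow method and is not part of the original walkthrough, is the silhouette score from `sklearn.metrics`: it ranges from -1 to 1, and higher values indicate better-separated clusters. The sketch below runs on synthetic blobs; in the project you would pass `X_scaled` instead.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups (illustrative only)
X_blobs, _ = make_blobs(n_samples=150,
                        centers=[[0, 0], [10, 10], [-10, 10]],
                        cluster_std=0.5, random_state=42)

scores = {}
for k in range(2, 7):  # the silhouette score needs at least two clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

If the elbow and the silhouette score point to different values of k, it is reasonable to inspect the cluster averages for both before settling on one.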

Step 7:  Apply K-Means Clustering and Visualise the Results

Once you've selected the number of clusters using the Elbow Method, the next step is to apply K-Means to group the countries.

Each country is assigned a cluster label based on its happiness-related features.

Here is the code:

# b) Applying K-Means with the Optimal k
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, init='k-means++', max_iter=300, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X_scaled)

# Add the cluster labels to the original dataframe
df['Cluster'] = cluster_labels

# c) Visualizing the Clusters using GDP and Social Support
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Logged GDP per capita', y='Social support',
                hue='Cluster', data=df, palette='Set1', s=100, alpha=0.8)
plt.title('Country Clusters based on GDP and Social Support', fontsize=16)
plt.xlabel('Logged GDP per Capita', fontsize=12)
plt.ylabel('Social Support', fontsize=12)
plt.legend(title='Cluster')
plt.show()

Output: a scatter plot of social support against logged GDP per capita, with points coloured by cluster.

This chart helps you see how different countries group together based on their wealth and social support levels.


Step 8: Analyse and Interpret the Clusters

Now that each country is assigned to a cluster, it’s time to understand what each group represents.
You’ll compare average values for key happiness factors across clusters.
Then, you’ll view the clusters on a world map for a global perspective.

Here is the code for this step:

# d) Analyzing the Clusters
print("\n--- Analysis of Clusters ---")
cluster_analysis = df.groupby('Cluster')[features + ['Ladder score']].mean().sort_values(by='Ladder score', ascending=False)
print(cluster_analysis)
print("\n" + "="*50 + "\n")

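# World Map of Clusters
# Cast the labels to strings so Plotly uses discrete colours instead of a continuous scale
fig_cluster = px.choropleth(df.assign(Cluster=df['Cluster'].astype(str)),
                            locations="Country name",
                            locationmode='country names',
                            color="Cluster",
                            hover_name="Country name",
                            title='World Happiness Clusters 2021')
fig_cluster.show()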

Output:

--- Analysis of Clusters ---

Cluster | Logged GDP per capita | Social support | Healthy life expectancy | Freedom to make life choices | Generosity | Perceptions of corruption | Ladder score
0 | 10.87644 | 0.92156 | 72.46796 | 0.8984 | 0.05168 | 0.43536 | 6.94904
2 | 9.867868 | 0.856868 | 67.482162 | 0.781706 | -0.114897 | 0.803088 | 5.690441
1 | 8.750958 | 0.782167 | 63.030958 | 0.861583 | 0.158292 | 0.7355 | 5.118292
3 | 7.889062 | 0.666219 | 55.334313 | 0.676687 | 0.014594 | 0.788875 | 4.402438
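The cluster numbers that K-Means assigns are arbitrary, which is why the table sorted by ladder score reads 0, 2, 1, 3. If you want the labels themselves to carry meaning, one option is to remap them so that 0 is the happiest cluster. The sketch below uses a toy DataFrame with placeholder country names and ladder scores chosen near the cluster means above. The column name `Happiness rank` is introduced here for illustration.

```python
import pandas as pd

# Toy frame: placeholder countries with ladder scores near the cluster means above
toy = pd.DataFrame({
    'Country name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'Cluster':      [0, 0, 2, 2, 1, 1, 3, 3],
    'Ladder score': [7.1, 6.8, 5.8, 5.6, 5.2, 5.0, 4.5, 4.3],
})

# Rank the clusters by mean ladder score, happiest first
order = (toy.groupby('Cluster')['Ladder score']
            .mean()
            .sort_values(ascending=False)
            .index)
rank_of = {int(old): new for new, old in enumerate(order)}

# New label 0 is the happiest cluster, 3 the least happy
toy['Happiness rank'] = toy['Cluster'].map(rank_of)
print(rank_of)
print(toy.groupby('Happiness rank')['Country name'].apply(list))
```

The final `groupby` also lists which countries fall into each group, which is often the quickest way to sanity-check what a cluster represents.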

Final Conclusion

This project examined the World Happiness Report to see which factors are most closely associated with happiness across nations. By examining attributes such as GDP, social support, and life expectancy, and by applying K-Means clustering, you separated nations into four clusters. The correlations showed that logged GDP per capita, healthy life expectancy, and social support have the strongest links with the happiness score (roughly 0.76 to 0.79), while generosity shows almost none. These are associations rather than proof of what causes happiness. The visualisations revealed regional variations and made the conclusions easier to understand.


Colab Link:
https://colab.research.google.com/drive/1nVlQq8muCiZBPPYvcevXfl_07TX7gYd5?usp=sharing


Rohit Sharma

826 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
