COVID-19 Project: Data Visualization & Insights

Updated on Jul 24, 2025 | 17 min read | 1.97K+ views

Table of Contents

Ready to Dive In? Here's What You Need
The Tools That Made It Happen
Model Selection: Our Choices and Why
Time Commitment & Skill Level
How to Build a COVID-19 Project
Conclusion: What We Learned from This Project

The COVID-19 pandemic touched every region of the globe, and behind the headlines were enormous quantities of data. In this project, we will analyze COVID-19 data and make it tangible through visualizations.

If you're a data science newcomer or just want to tune up your skills, this blog will guide you through working with actual public health data, visualizing trends with interactive graphs, and even tracing the virus's spread across regions.

Ready to Dive In? Here's What You Need

It’s helpful to have some basic knowledge of the following before starting this project:

Python programming (variables, functions, loops, basic syntax)
Pandas and Numpy (for handling and analyzing data)
Matplotlib or Seaborn (for creating charts and visualizing trends)
Plotly (Creating interactive line charts, scatter plots, and subplots)
Geospatial Tools (Folium / GeoPandas) (Basic idea of plotting on maps)
Date and time handling (using datetime in pandas)

The Tools That Made It Happen

For this COVID-19 Project, the following tools and libraries will be used:

Skill Area	Purpose
Python Programming	You'll be writing Python code to clean, explore, and visualize data.
Pandas & NumPy	These libraries help in cleaning and analyzing large datasets efficiently.
Matplotlib & Seaborn	Useful for quick visualizations and exploring trends in the data.
Plotly	This project uses Plotly to build rich, interactive dashboards.
Geospatial Tools (Folium / GeoPandas)	Used for mapping how the virus spread across regions or countries.
Jupyter/Colab Environment (Optional)	Makes it easier to test code and view visualizations inline.

Model Selection: Our Choices and Why

In this COVID-19 Project, you're not predicting future trends, but instead learning to explore, analyze, and visualize real-world health data effectively using the following tools and techniques:

Data Cleaning & Manipulation with Pandas
You’ll learn how to handle messy, real-world datasets, fixing date formats, filling missing values, and reshaping data to make it ready for analysis.
Data Visualization with Matplotlib & Seaborn
Create clear charts like bar graphs, line plots, and heatmaps to explore COVID-19 trends such as case spikes, death rates, and recoveries.
Interactive Dashboards with Plotly
Learn to build interactive visuals that respond to user input — such as clickable country-wise maps or timelines showing case growth over months.
Geospatial Mapping (Optional: with Folium or GeoPandas)
This lets you visualize how the virus spread across countries or regions on real-world maps.

Time Commitment & Skill Level

You can complete the COVID-19 project in 2 to 3 hours. It’s a beginner-friendly yet impactful project that helps you learn how to work with real-world public health data, create insightful visualizations, and build interactive charts and maps using Python.

How to Build a COVID-19 Project

Let’s start building the project from scratch. We'll go step-by-step through the process of:

Downloading the dataset
Cleaning and preparing the data
Exploratory Data Analysis (EDA)
Interactive Visualizations with Plotly
Advanced Interactive Dashboard

Without any further delay, let’s get started!

Step 1: Download the Dataset

To build our COVID-19 Project, we’ll use a publicly available dataset from Kaggle. The file we work with holds one row per country, with real-world COVID-19 statistics such as confirmed cases, deaths, recoveries, active cases, and week-over-week changes.

Follow the steps below to download the dataset:

Open a new tab in your web browser.
Go to: Kaggle
Search for the dataset and click the Download button to download the dataset as a .zip file.
Once downloaded, extract the ZIP file.
From the extracted files, we’ll use country_wise_latest.csv for this project.

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.

Step 2: Upload and Read the Dataset in Google Colab

Upload country_wise_latest.csv to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()
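
In Colab, files.upload() saves the file to the working directory and also returns a dictionary that maps each uploaded file name to its raw bytes. The sketch below shows one way to read a CSV straight from that dictionary. The helper name read_uploaded_csv is hypothetical, and a small in-memory dictionary stands in for a real upload so the example can run outside Colab.

```python
import io
import pandas as pd

def read_uploaded_csv(uploaded, filename):
    """Read one CSV from a {filename: bytes} mapping such as the one files.upload() returns."""
    if filename not in uploaded:
        raise KeyError(f"{filename!r} was not uploaded; got {list(uploaded)}")
    return pd.read_csv(io.BytesIO(uploaded[filename]))

# Stand-in for a real upload (assumption: only two rows and three columns, for illustration)
fake_upload = {
    "country_wise_latest.csv": (
        b"Country/Region,Confirmed,Deaths\n"
        b"Afghanistan,36263,1269\n"
        b"Albania,4880,144\n"
    )
}

sample = read_uploaded_csv(fake_upload, "country_wise_latest.csv")
print(sample.shape)
```

Reading from the dictionary is optional here, since pd.read_csv('country_wise_latest.csv') finds the saved file on disk, but it can help when a file name differs from what you expected.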

Once uploaded, import the required libraries and use the following Python code to read and check the data:

# Import all necessary libraries for data analysis and visualization
import pandas as pd                 # For data manipulation
import numpy as np                  # For numerical operations
import matplotlib.pyplot as plt     # For basic plotting
import seaborn as sns              # For statistical visualizations
import plotly.express as px        # For interactive visualizations
import plotly.graph_objects as go  # For custom interactive plots
import plotly.offline as pyo       # For offline plotting
from plotly.subplots import make_subplots  # For multiple subplots
# Install required packages if not already installed
# Run these in separate cells if needed:
# !pip install plotly
# !pip install folium
# !pip install geopandas
# Load the COVID-19 dataset
df = pd.read_csv('country_wise_latest.csv')
# Display basic information about the dataset
print("\nFirst 5 rows:")
print(df.head())

Output :

First 5 rows:
  Country/Region  Confirmed  Deaths  Recovered  Active  New cases  New deaths  \
0    Afghanistan      36263    1269      25198    9796        106          10
1        Albania       4880     144       2745    1991        117           6
2        Algeria      27973    1163      18837    7973        616           8
3        Andorra        907      52        803      52         10           0
4         Angola        950      41        242     667         18           1

   New recovered  Deaths / 100 Cases  Recovered / 100 Cases  \
0             18                3.50                  69.49
1             63                2.95                  56.25
2            749                4.16                  67.34
3              0                5.73                  88.53
4              0                4.32                  25.47

   Deaths / 100 Recovered  Confirmed last week  1 week change  \
0                    5.04                35526            737
1                    5.25                 4171            709
2                    6.17                23691           4282
3                    6.48                  884             23
4                   16.94                  749            201

   1 week % increase             WHO Region
0               2.07  Eastern Mediterranean
1              17.00                 Europe
2              18.07                 Africa
3               2.60                 Europe
4              26.84                 Africa
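
Before moving on, it can help to confirm that the loaded file has the columns the later steps rely on. The following is a minimal sketch, not part of the original notebook: the helper missing_columns and the REQUIRED_COLUMNS list are hypothetical names, and the two-row frame stands in for the real df.

```python
import pandas as pd

# Columns used by the cleaning and plotting steps (assumed list, adjust to your needs)
REQUIRED_COLUMNS = ["Country/Region", "Confirmed", "Deaths", "Recovered", "Active", "WHO Region"]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Two rows copied from the preview above, standing in for the full dataset
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania"],
    "Confirmed": [36263, 4880],
    "Deaths": [1269, 144],
    "Recovered": [25198, 2745],
    "Active": [9796, 1991],
    "WHO Region": ["Eastern Mediterranean", "Europe"],
})

print(missing_columns(sample))                           # → []
print(missing_columns(sample.drop(columns=["Active"])))  # → ['Active']
```

An empty list means the file matches what the rest of the walkthrough expects.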

Step 3: Clean and Prepare the Data

Before visualizing COVID-19 data, it’s crucial to clean and prepare the dataset. This includes handling missing values, calculating useful metrics like death and recovery rates, and formatting country names for mapping and analysis.

Here is the code:

# Step 1: Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())
# Step 2: Handle missing values and correct data types
# Replace any missing values in key numerical columns with 0
numerical_cols = ['Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 
                  'New deaths', 'New recovered']
for col in numerical_cols:
    df[col] = df[col].fillna(0)
# Step 3: Create additional calculated columns for deeper analysis
# Calculate death rate as a percentage of confirmed cases
df['Death_Rate'] = (df['Deaths'] / df['Confirmed']) * 100
# Calculate recovery rate as a percentage of confirmed cases
df['Recovery_Rate'] = (df['Recovered'] / df['Confirmed']) * 100
# Calculate active case rate as a percentage of confirmed cases
df['Active_Rate'] = (df['Active'] / df['Confirmed']) * 100
# Step 4: Clean country names for mapping compatibility
# Remove asterisks (if any) from country names
df['Country/Region'] = df['Country/Region'].str.replace('*', '', regex=False)
# Display a preview of the cleaned and enriched dataset
print("Cleaned Dataset Info:")
print(df[['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Death_Rate', 'Recovery_Rate']].head())

Output:

Column Name	Missing Values
Country/Region	0
Confirmed	0
Deaths	0
Recovered	0
Active	0
New cases	0
New deaths	0
New recovered	0
Deaths / 100 Cases	0
Recovered / 100 Cases	0
Deaths / 100 Recovered	0
Confirmed last week	0
1 week change	0
1 week % increase	0
WHO Region	0

Cleaned Data Preview :

Country/Region	Confirmed	Deaths	Recovered	Death_Rate (%)	Recovery_Rate (%)
Afghanistan	36,263	1,269	25,198	3.50	69.49
Albania	4,880	144	2,745	2.95	56.25
Algeria	27,973	1,163	18,837	4.16	67.34
Andorra	907	52	803	5.73	88.53
Angola	950	41	242	4.32	25.47
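
The rate columns divide by Confirmed, so a row with zero confirmed cases would produce an infinite or undefined value. The preview above shows no such row, but if you reuse the code on another snapshot, a guarded division may be safer. This is a sketch under that assumption: safe_rate is a hypothetical helper, and the third row is invented purely to show the zero case.

```python
import numpy as np
import pandas as pd

def safe_rate(numerator, denominator):
    """Percentage of numerator over denominator, NaN where the denominator is zero."""
    return numerator / denominator.replace(0, np.nan) * 100

sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Hypothetical"],
    "Confirmed": [36263, 4880, 0],   # last row is made up to trigger the zero case
    "Deaths": [1269, 144, 0],
})

sample["Death_Rate"] = safe_rate(sample["Deaths"], sample["Confirmed"]).round(2)
print(sample)
```

The first two rates come out as 3.50 and 2.95, matching the dataset's own Deaths / 100 Cases column, while the invented row gets NaN, which the later histogram code already drops with dropna().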

Now that our real-world health data is clean and enriched with rate columns, the next task is to perform EDA.

Step 4: Exploratory Data Analysis (EDA) for COVID-19 Dataset

In this step, we’ll perform a comprehensive visual analysis of the COVID-19 dataset to uncover global trends. We’ll examine the countries most affected, distribution of recovery rates, and the relationship between confirmed cases and death rate.

Here is the code:

# Create comprehensive exploratory analysis using visualizations
# Set the figure size
plt.figure(figsize=(15, 8))
# 1. Top 10 countries by confirmed COVID-19 cases
top_10_confirmed = df.nlargest(10, 'Confirmed')
plt.subplot(2, 2, 1)
plt.barh(top_10_confirmed['Country/Region'], top_10_confirmed['Confirmed'], color='red', alpha=0.7)
plt.title('Top 10 Countries by Confirmed Cases')
plt.xlabel('Confirmed Cases')
# 2. Top 10 countries by COVID-19 deaths
top_10_deaths = df.nlargest(10, 'Deaths')
plt.subplot(2, 2, 2)
plt.barh(top_10_deaths['Country/Region'], top_10_deaths['Deaths'], color='black', alpha=0.7)
plt.title('Top 10 Countries by Deaths')
plt.xlabel('Deaths')
# 3. Histogram showing the distribution of recovery rates across countries
plt.subplot(2, 2, 3)
plt.hist(df['Recovery_Rate'].dropna(), bins=30, color='green', alpha=0.7, edgecolor='black')
plt.title('Distribution of Recovery Rates')
plt.xlabel('Recovery Rate (%)')
plt.ylabel('Frequency')
# 4. Scatter plot comparing confirmed cases and death rate
plt.subplot(2, 2, 4)
plt.scatter(df['Confirmed'], df['Death_Rate'], alpha=0.6, color='purple')
plt.title('Death Rate vs Confirmed Cases')
plt.xlabel('Confirmed Cases')
plt.ylabel('Death Rate (%)')
plt.xscale('log')  # Log scale to handle wide range of confirmed cases
# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()
# Print key global statistics
print("Global COVID-19 Statistics:")
print(f"Total Confirmed Cases: {df['Confirmed'].sum():,}")
print(f"Total Deaths: {df['Deaths'].sum():,}")
print(f"Total Recovered: {df['Recovered'].sum():,}")
print(f"Global Death Rate: {(df['Deaths'].sum() / df['Confirmed'].sum() * 100):.2f}%")
print(f"Global Recovery Rate: {(df['Recovered'].sum() / df['Confirmed'].sum() * 100):.2f}%")

Output:

Global COVID-19 Statistics:

Total Confirmed Cases: 16,480,485

Total Deaths: 654,036

Total Recovered: 9,468,087

Global Death Rate: 3.97%

Global Recovery Rate: 57.45%
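
Note that the global death rate above is a pooled figure: total deaths divided by total confirmed cases. That is not the same as averaging the Death_Rate column, because the average gives a small country the same weight as a large one. The sketch below illustrates the difference on the five countries from the preview; it is an illustration only, not part of the original notebook.

```python
import pandas as pd

# Five rows copied from the dataset preview
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
})
sample["Death_Rate"] = sample["Deaths"] / sample["Confirmed"] * 100

# Pooled rate: every case counts equally
pooled_rate = sample["Deaths"].sum() / sample["Confirmed"].sum() * 100

# Mean of rates: every country counts equally
mean_of_rates = sample["Death_Rate"].mean()

print(f"Pooled death rate:      {pooled_rate:.2f}%")
print(f"Mean of country rates:  {mean_of_rates:.2f}%")
```

For these five rows the pooled rate is about 3.76% and the mean of country rates is about 4.13%. The gap comes from Andorra, whose high rate on a small case count pulls the unweighted mean up.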

Step 5: Interactive Visualizations for COVID-19 Insights

In this section, we’ll create interactive visualizations using Plotly to explore COVID-19 data across countries and WHO regions. These dynamic charts help reveal deeper insights such as country-wise confirmed cases and how recovery and death rates vary globally.

Here is the code:

# Create interactive visualizations for dynamic COVID-19 data insights
# 1. Interactive bar chart for the top 20 countries with highest confirmed cases
top_20_countries = df.nlargest(20, 'Confirmed')  # Get top 20 rows with highest confirmed cases
fig1 = px.bar(
    top_20_countries,                     # Data to plot
    x='Country/Region',                   # Countries on X-axis
    y='Confirmed',                        # Confirmed cases on Y-axis
    color='Deaths',                       # Color bar by number of deaths
    title='Top 20 Countries by Confirmed COVID-19 Cases',
    hover_data=['Deaths', 'Recovered', 'Active'],  # Extra info when hovering over bars
    color_continuous_scale='Reds'         # Red color gradient for deaths
)
# Customize layout for better readability
fig1.update_layout(
    xaxis_tickangle=-45,                 # Rotate country names on X-axis
    height=600,
    xaxis_title="Country",
    yaxis_title="Confirmed Cases"
)
# Show the interactive bar chart
fig1.show()
# 2. Interactive scatter plot comparing Recovery Rate vs Death Rate
fig2 = px.scatter(
    df,                                  # Full dataset
    x='Recovery_Rate',                   # Recovery rate on X-axis
    y='Death_Rate',                      # Death rate on Y-axis
    size='Confirmed',                    # Bubble size indicates total confirmed cases
    color='WHO Region',                  # Color based on WHO region
    hover_name='Country/Region',         # Show country name on hover
    hover_data=['Confirmed', 'Deaths', 'Recovered'],  # Extra info on hover
    title='COVID-19: Death Rate vs Recovery Rate by WHO Region',
    labels={
        'Recovery_Rate': 'Recovery Rate (%)',
        'Death_Rate': 'Death Rate (%)'
    }
)
# Adjust chart height for better display
fig2.update_layout(height=600)
# Show the interactive scatter plot
fig2.show()

Output:

Note- The charts above are originally interactive and dynamic. However, they are shown here as static images for display purposes. In a real project or dashboard, you can hover, zoom, and filter data directly on these plots for deeper exploration.

Step 5.1: Interactive Bar Charts

In this section, we use Plotly to build interactive visualizations for deeper insights into COVID-19 trends by WHO region and global distribution. The grouped bar chart and pie chart help users visually compare confirmed, death, and recovery numbers across regions and countries.

Here is the Code:

# Create interactive multi-metric visualization
# Using the WHO region data for comparative analysis
# 3. Regional analysis - Aggregate key metrics by WHO Region
# We'll sum up total Confirmed, Deaths, Recovered, and Active cases for each region
regional_data = df.groupby('WHO Region').agg({
    'Confirmed': 'sum',
    'Deaths': 'sum',
    'Recovered': 'sum',
    'Active': 'sum'
}).reset_index()
# Create a multi-bar chart using Plotly's go.Figure
fig3 = go.Figure()
# Add bar trace for confirmed cases
fig3.add_trace(go.Bar(
    name='Confirmed', 
    x=regional_data['WHO Region'], 
    y=regional_data['Confirmed'], 
    marker_color='blue'
))
# Add bar trace for deaths
fig3.add_trace(go.Bar(
    name='Deaths', 
    x=regional_data['WHO Region'], 
    y=regional_data['Deaths'], 
    marker_color='red'
))
# Add bar trace for recovered
fig3.add_trace(go.Bar(
    name='Recovered', 
    x=regional_data['WHO Region'], 
    y=regional_data['Recovered'], 
    marker_color='green'
))
# Customize the layout
fig3.update_layout(
    title='COVID-19 Cases by WHO Region',
    xaxis_title='WHO Region',
    yaxis_title='Number of Cases',
    barmode='group',            # Group bars next to each other
    height=600,
    hovermode='x unified'       # Unified tooltip on hover for better comparison
)
# Show the interactive chart
fig3.show()
# 4. Interactive pie chart for global distribution of confirmed cases (Top 10 countries)
# We use top_10_confirmed (already defined earlier using df.nlargest(10, 'Confirmed'))
fig4 = px.pie(
    top_10_confirmed, 
    values='Confirmed', 
    names='Country/Region',
    title='Global COVID-19 Cases Distribution (Top 10 Countries)',
    hover_data=['Deaths', 'Recovered']  # Show more info when hovered
)
# Customize labels and layout
fig4.update_traces(
    textposition='inside', 
    textinfo='percent+label'  # Show both percent and country name
)
# Display the interactive pie chart
fig4.show()
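
In a grouped bar chart of raw counts, the confirmed bars tend to dwarf the death bars, which makes regions hard to compare. One option is to derive rates from the aggregated sums before plotting. The sketch below assumes the same column names as the dataset and uses the five preview rows in place of the full df.

```python
import pandas as pd

# Five rows copied from the dataset preview
sample = pd.DataFrame({
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
})

# Sum the counts per region, then compute rates from the sums
regional = (
    sample.groupby("WHO Region")[["Confirmed", "Deaths", "Recovered"]]
    .sum()
    .reset_index()
)
regional["Death_Rate"] = regional["Deaths"] / regional["Confirmed"] * 100
regional["Recovery_Rate"] = regional["Recovered"] / regional["Confirmed"] * 100

print(regional.round(2))
```

Computing the rate after summing keeps it weighted by case count, so a region is not skewed by one small country. The resulting Death_Rate column can be passed to go.Bar in the same way as the counts.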

Step 6: COVID-19 Comprehensive Dashboard with Subplots

In this final step, we create an interactive dashboard that combines multiple visualizations into a single layout. Using Plotly subplots, this dashboard helps analyze the pandemic's impact by region, country, and rate metrics—all at once.

Here is the Code:

# Create a comprehensive dashboard with multiple subplots
# This combines bar, scatter, and pie charts into a single interactive layout
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# Initialize subplot layout with 2 rows × 2 columns
# Each cell will host a different type of chart
fig5 = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Cases by Region', 
        'Top 5 Countries by Confirmed Cases', 
        'Death vs Recovery Rate', 
        'Case Distribution (Top 5 Countries)'
    ),
    specs=[[{"type": "bar"}, {"type": "bar"}],     # Row 1: bar charts
           [{"type": "scatter"}, {"type": "pie"}]] # Row 2: scatter and pie
)
# Subplot 1: Bar chart - Total confirmed cases by WHO Region
fig5.add_trace(
    go.Bar(
        x=regional_data['WHO Region'], 
        y=regional_data['Confirmed'], 
        name='Confirmed by Region'
    ),
    row=1, col=1
)
# Subplot 2: Bar chart - Top 5 countries with most confirmed cases
fig5.add_trace(
    go.Bar(
        x=top_10_confirmed['Country/Region'][:5], 
        y=top_10_confirmed['Confirmed'][:5], 
        name='Top 5 Countries'
    ),
    row=1, col=2
)
# Subplot 3: Scatter plot - Death Rate vs Recovery Rate
fig5.add_trace(
    go.Scatter(
        x=df['Recovery_Rate'], 
        y=df['Death_Rate'], 
        mode='markers',
        text=df['Country/Region'],              # Country name as hover text
        marker=dict(
            size=(df['Confirmed'] / 10000).clip(lower=5, upper=50),  # Bubble size from confirmed cases, clipped to 5-50 px
            color='red', 
            opacity=0.6
        ),
        name='Death vs Recovery'
    ),
    row=2, col=1
)
# Subplot 4: Pie chart - Distribution of confirmed cases (Top 5 countries)
fig5.add_trace(
    go.Pie(
        labels=top_10_confirmed['Country/Region'][:5], 
        values=top_10_confirmed['Confirmed'][:5], 
        name='Distribution'
    ),
    row=2, col=2
)
# Final layout adjustments
fig5.update_layout(
    height=800,
    showlegend=False,                      # Hide legend for cleaner look
    title_text="COVID-19 Comprehensive Dashboard"
)

# Display the dashboard

fig5.show()

Output:

Note- This interactive dashboard allows dynamic zoom, pan, and hover interactions. It’s ideal for use in Jupyter/Colab notebooks, Dash apps, or Streamlit.

Here you're seeing a static version; it may not reflect the true interactivity.
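
Bubble sizes derived directly from confirmed counts span several orders of magnitude, so the largest countries can cover the whole subplot. An alternative is Plotly's area-based sizing, where marker.sizeref rescales the raw values so that the largest bubble reaches a chosen pixel size. The sketch below follows the scaling formula from Plotly's bubble chart documentation. The helper area_sizeref and the 40-pixel target are assumptions for illustration.

```python
import pandas as pd

def area_sizeref(sizes, max_marker_px=40):
    """Scaling factor for marker.sizeref when marker.sizemode is 'area'."""
    return 2.0 * max(sizes) / (max_marker_px ** 2)

# Confirmed counts for the five preview rows, standing in for df['Confirmed']
confirmed = pd.Series([36263, 4880, 27973, 907, 950])

# Marker settings that could replace the marker=dict(...) in the scatter trace
marker = dict(
    size=confirmed,
    sizemode="area",                                   # size values map to bubble area
    sizeref=area_sizeref(confirmed, max_marker_px=40),
    sizemin=4,                                         # keep small countries visible
    color="red",
    opacity=0.6,
)

print(marker["sizeref"])
```

With area-based sizing the visual weight of each bubble stays proportional to its case count, and sizemin keeps the smallest countries from vanishing.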

Step 6.1: Geospatial Mapping for Disease Spread Analysis with Plotly

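To visualize the global spread of COVID-19, we use choropleth maps that color countries based on the number of confirmed cases and death rates. For this, we manually map country names to their ISO-3 codes, which Plotly uses to identify countries. The mapping covers only a subset of countries, and its keys must match the dataset's spelling exactly; any country without a match gets no code and is left uncolored on the map.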

Here is the code:

# Create interactive world map showing COVID-19 spread
# Choropleth maps require ISO 3-letter country codes
# Mapping country names to their ISO codes (for visualization)
country_codes = {
    'US': 'USA', 'Brazil': 'BRA', 'India': 'IND', 'Russia': 'RUS', 'Peru': 'PER',
    'Chile': 'CHL', 'United Kingdom': 'GBR', 'Iran': 'IRN', 'Germany': 'DEU', 'Turkey': 'TUR',
    'Bangladesh': 'BGD', 'France': 'FRA', 'Saudi Arabia': 'SAU', 'Italy': 'ITA', 'Pakistan': 'PAK',
    'Spain': 'ESP', 'Mexico': 'MEX', 'South Africa': 'ZAF', 'Canada': 'CAN', 'Qatar': 'QAT',
    'China': 'CHN', 'Egypt': 'EGY', 'Sweden': 'SWE', 'Belarus': 'BLR', 'Belgium': 'BEL',
    'Ecuador': 'ECU', 'Kazakhstan': 'KAZ', 'Indonesia': 'IDN', 'UAE': 'ARE', 'Portugal': 'PRT',
    'Netherlands': 'NLD', 'Singapore': 'SGP', 'Kuwait': 'KWT', 'Ukraine': 'UKR', 'Philippines': 'PHL',
    'Argentina': 'ARG', 'Afghanistan': 'AFG', 'Japan': 'JPN', 'Poland': 'POL', 'Romania': 'ROU',
    'Israel': 'ISR', 'Switzerland': 'CHE', 'Thailand': 'THA', 'Armenia': 'ARM', 'Nigeria': 'NGA',
    'Bahrain': 'BHR', 'Iraq': 'IRQ', 'Azerbaijan': 'AZE', 'Dominican Republic': 'DOM', 'Panama': 'PAN',
    'Bolivia': 'BOL', 'Ireland': 'IRL', 'South Korea': 'KOR', 'Austria': 'AUT', 'Serbia': 'SRB',
    'Oman': 'OMN', 'Czech Republic': 'CZE', 'Moldova': 'MDA', 'Denmark': 'DNK', 'Guatemala': 'GTM'
}
# Add ISO-3 codes as a new column for mapping
df['iso_code'] = df['Country/Region'].map(country_codes)
# ---------------------------------------------
# Choropleth Map 1: Confirmed COVID-19 Cases
# ---------------------------------------------
fig6 = px.choropleth(
    df,
    locations='iso_code',                  # ISO-3 country codes
    color='Confirmed',                     # Color scale based on confirmed cases
    hover_name='Country/Region',           # Hover label
    hover_data=['Deaths', 'Recovered', 'Death_Rate'],  # Extra info on hover
    color_continuous_scale='Reds',
    title='Global COVID-19 Confirmed Cases Distribution'
)
# Update map layout
fig6.update_layout(
    height=600,
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth'     # Natural Earth projection
    )
)
fig6.show()
# ---------------------------------------------
#  Choropleth Map 2: COVID-19 Death Rate (%)
# ---------------------------------------------
fig7 = px.choropleth(
    df,
    locations='iso_code',
    color='Death_Rate',                    # Color scale based on death rate %
    hover_name='Country/Region',
    hover_data=['Confirmed', 'Deaths', 'Recovered'],
    color_continuous_scale='Oranges',
    title='Global COVID-19 Death Rate Distribution (%)'
)
# Update map layout
fig7.update_layout(
    height=600,
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth'
    )
)
fig7.show()

Output:

Note- These choropleth maps are fully interactive, allowing zoom, pan, and hover. Here they are displayed as static images. To experience their full functionality, run them in a Jupyter Notebook, Google Colab, or Streamlit dashboard.
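
Because the ISO mapping is written by hand, it is worth checking which countries ended up without a code before trusting the map. The sketch below shows one way to list them. The three-row frame and the deliberately incomplete mapping are made up for illustration; in the notebook you would run the same two lines on df after creating the iso_code column.

```python
import pandas as pd

# Hypothetical three-row frame and a deliberately incomplete mapping
sample = pd.DataFrame({"Country/Region": ["Afghanistan", "Albania", "Algeria"]})
country_codes = {"Afghanistan": "AFG", "Algeria": "DZA"}

# Same mapping step as in the choropleth code
sample["iso_code"] = sample["Country/Region"].map(country_codes)

# Countries whose name had no entry in the mapping
unmapped = sample.loc[sample["iso_code"].isna(), "Country/Region"].tolist()
print(f"{len(unmapped)} of {len(sample)} countries have no ISO code: {unmapped}")
```

Any name in the list either needs a new entry in country_codes or is spelled differently in the dataset than in the dictionary key.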

Step 6.2: Advanced Geospatial Analysis with Folium

This interactive map uses Folium, a Python mapping library, to visualize COVID-19's global impact. Countries are represented using circle markers, where:

Size is proportional to confirmed cases.
Color indicates the death rate:
- Green: Low (< 2%)
- Orange: Moderate (2–5%)
- Red: High (> 5%)

Here is the Code:

# Install folium if not already installed
# !pip install folium
import folium
from folium import plugins
# Create a base world map centered at lat=20, lon=0
world_map = folium.Map(location=[20, 0], zoom_start=2, tiles='OpenStreetMap')
# Coordinates for major countries (used if latitude/longitude not in dataset)
coordinates = {
    'US': [39.8283, -98.5795], 'Brazil': [-14.2350, -51.9253], 'India': [20.5937, 78.9629],
    'Russia': [61.5240, 105.3188], 'Peru': [-9.1900, -75.0152], 'Chile': [-35.6751, -71.5430],
    'United Kingdom': [55.3781, -3.4360], 'Iran': [32.4279, 53.6880], 'Germany': [51.1657, 10.4515],
    'Turkey': [38.9637, 35.2433], 'Bangladesh': [23.6850, 90.3563], 'France': [46.6034, 1.8883],
    'Saudi Arabia': [23.8859, 45.0792], 'Italy': [41.8719, 12.5674], 'Pakistan': [30.3753, 69.3451],
    'Spain': [40.4637, -3.7492], 'Mexico': [23.6345, -102.5528], 'South Africa': [-30.5595, 22.9375],
    'Canada': [56.1304, -106.3468], 'China': [35.8617, 104.1954]
}
# Add latitude and longitude to the DataFrame
df['lat'] = df['Country/Region'].map(lambda x: coordinates.get(x, [None, None])[0])
df['lon'] = df['Country/Region'].map(lambda x: coordinates.get(x, [None, None])[1])
# Drop countries without coordinate data
map_data = df.dropna(subset=['lat', 'lon'])
# Plot each country's data as a circle marker
for idx, row in map_data.iterrows():
    # Scale marker size by confirmed cases
    marker_size = min(max(row['Confirmed'] / 10000, 5), 50)  # Keep between 5–50
    # Choose marker color based on death rate
    if row['Death_Rate'] < 2:
        color = 'green'
    elif row['Death_Rate'] < 5:
        color = 'orange'
    else:
        color = 'red'
    # Info popup for each country
    popup_text = f"""
    <b>{row['Country/Region']}</b><br>
    Confirmed: {row['Confirmed']:,}<br>
    Deaths: {row['Deaths']:,}<br>
    Recovered: {row['Recovered']:,}<br>
    Death Rate: {row['Death_Rate']:.2f}%<br>
    Recovery Rate: {row['Recovery_Rate']:.2f}%<br>
    WHO Region: {row['WHO Region']}
    """
    # Add marker to map
    folium.CircleMarker(
        location=[row['lat'], row['lon']],
        radius=marker_size,
        popup=folium.Popup(popup_text, max_width=300),
        color='black',
        fill=True,              # Folium's documented keyword names use snake_case
        fill_color=color,
        fill_opacity=0.7,
        weight=2
    ).add_to(world_map)
# Custom legend HTML for the map
legend_html = '''
<div style="position: fixed; 
            bottom: 50px; left: 50px; width: 150px; height: 90px; 
            background-color: white; border:2px solid grey; z-index:9999; 
            font-size:14px; padding: 10px">
<p><b>COVID-19 Death Rate</b></p>
<p><i class="fa fa-circle" style="color:green"></i> < 2%</p>
<p><i class="fa fa-circle" style="color:orange"></i> 2% - 5%</p>
<p><i class="fa fa-circle" style="color:red"></i> > 5%</p>
</div>
'''
# Add legend to the map
world_map.get_root().html.add_child(folium.Element(legend_html))
# Save interactive map as an HTML file
world_map.save('covid19_world_map.html')
print("Interactive world map saved as 'covid19_world_map.html'")
# Display map inline (works in Jupyter/Colab)
world_map
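
The sizing and coloring rules inside the loop are easier to check when pulled out into small functions. The sketch below mirrors the thresholds used above. The function names marker_radius and death_rate_color are hypothetical, and no Folium import is needed to try them.

```python
def marker_radius(confirmed, scale=10_000, min_radius=5, max_radius=50):
    """Scale confirmed cases to a circle radius, clamped to [min_radius, max_radius]."""
    return min(max(confirmed / scale, min_radius), max_radius)

def death_rate_color(death_rate):
    """Map a death rate in percent to the legend color used on the map."""
    if death_rate < 2:
        return "green"
    if death_rate < 5:
        return "orange"
    return "red"

# Values taken from the dataset preview
print(marker_radius(36263), death_rate_color(3.50))  # Afghanistan
print(marker_radius(907), death_rate_color(5.73))    # Andorra
```

Two things stand out when testing the rules this way. Every country below 50,000 confirmed cases gets the minimum radius, so most markers end up the same size. Also, a rate of exactly 5% is colored red, although the legend labels red as "> 5%".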

Step 7: Save Results and Create Final Dashboard

This final visualization combines six key subplots into a single interactive dashboard using Plotly’s make_subplots. It gives a holistic view of the global COVID-19 situation by showing confirmed cases, regional trends, recovery vs. death rates, new case distribution, recovery rates by region, and an overall status breakdown.

Here is the code:

# Save cleaned and enriched dataset for future analysis or sharing
df.to_csv('covid19_processed_data.csv', index=False)
print("Processed data saved to 'covid19_processed_data.csv'")
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
# Create a 3x2 dashboard layout
final_dashboard = make_subplots(
    rows=3, cols=2,
    subplot_titles=(
        'Top 10 Countries - Confirmed Cases', 'Regional Distribution', 
        'Death Rate vs Recovery Rate', 'New Cases Distribution',
        'Recovery Rate by Region', 'Case Status Distribution'
    ),
    specs=[[{"type": "bar"}, {"type": "bar"}],
           [{"type": "scatter"}, {"type": "histogram"}],
           [{"type": "box"}, {"type": "pie"}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)
# --- Subplot 1: Top 10 Countries by Confirmed Cases ---
final_dashboard.add_trace(
    go.Bar(
        x=top_10_confirmed['Country/Region'], 
        y=top_10_confirmed['Confirmed'], 
        name='Confirmed Cases', 
        marker_color='red'
    ),
    row=1, col=1
)
# --- Subplot 2: Regional Distribution ---
final_dashboard.add_trace(
    go.Bar(
        x=regional_data['WHO Region'], 
        y=regional_data['Confirmed'], 
        name='Regional Cases', 
        marker_color='blue'
    ),
    row=1, col=2
)
# --- Subplot 3: Recovery Rate vs Death Rate Scatter Plot ---
final_dashboard.add_trace(
    go.Scatter(
        x=df['Recovery_Rate'], 
        y=df['Death_Rate'], 
        mode='markers',
        text=df['Country/Region'], 
        name='Countries',
        marker=dict(size=8, color='purple', opacity=0.6)
    ),
    row=2, col=1
)
# --- Subplot 4: New Cases Histogram ---
final_dashboard.add_trace(
    go.Histogram(
        x=df['New cases'], 
        name='New Cases Distribution', 
        marker_color='orange', 
        nbinsx=30
    ),
    row=2, col=2
)
# --- Subplot 5: Recovery Rate Box Plot by Region ---
# Prepare recovery rate data for box plot
box_data = []
regions = df['WHO Region'].unique()
for region in regions:
    region_data = df[df['WHO Region'] == region]['Recovery_Rate'].dropna()
    box_data.extend([(rate, region) for rate in region_data])
box_df = pd.DataFrame(box_data, columns=['Recovery_Rate', 'WHO Region'])
# Plot only the first three regions in order of appearance (to keep the visualization clean)
for region in regions[:3]:
    region_data = box_df[box_df['WHO Region'] == region]['Recovery_Rate']
    final_dashboard.add_trace(
        go.Box(y=region_data, name=region, boxmean=True),
        row=3, col=1
    )
# --- Subplot 6: Global Case Status Pie Chart ---
total_confirmed = df['Confirmed'].sum()
total_deaths = df['Deaths'].sum()
total_recovered = df['Recovered'].sum()
total_active = df['Active'].sum()
final_dashboard.add_trace(
    go.Pie(
        labels=['Active', 'Recovered', 'Deaths'], 
        values=[total_active, total_recovered, total_deaths],
        name='Global Status'
    ),
    row=3, col=2
)
# --- Layout Settings ---
final_dashboard.update_layout(
    height=1200,
    showlegend=True,
    title_text="COVID-19 Comprehensive Analysis Dashboard",
    title_x=0.5  # Center the title
)
# Show interactive dashboard in browser or notebook
final_dashboard.show()
# Save dashboard as standalone HTML file
final_dashboard.write_html("covid19_final_dashboard.html")
print("Final dashboard saved as 'covid19_final_dashboard.html'")

Output:

This dashboard gives a complete data-driven snapshot of COVID-19 trends across countries and regions.
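
The case status pie chart treats Active, Recovered, and Deaths as slices of one whole, which only holds if the three add up to Confirmed. A quick consistency check like the one below can confirm that before plotting. It is a sketch on the five preview rows, not part of the original notebook.

```python
import pandas as pd

# Five rows copied from the dataset preview
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
    "Recovered": [25198, 2745, 18837, 803, 242],
    "Active": [9796, 1991, 7973, 52, 667],
})

# The three statuses should partition the confirmed cases
status_total = sample[["Active", "Recovered", "Deaths"]].sum(axis=1)
mismatched = sample[status_total != sample["Confirmed"]]

print(f"Rows where the statuses do not add up: {len(mismatched)}")
```

All five preview rows pass the check. Running the same lines on the full df would show whether any country breaks the assumption.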

Conclusion: What We Learned from This Project

This COVID-19 Project gave us a comprehensive, hands-on experience in working with real-world public health data. Throughout the process, we learned how to:

Preprocess and clean complex datasets for meaningful analysis.
Perform exploratory data analysis (EDA) to uncover trends, distributions, and anomalies.
Build interactive visualizations that communicate insights clearly.
Apply geospatial analysis to map global pandemic impact.
Understand and compute critical health metrics like death rates, recovery rates, and active cases.

From a technical perspective, this project helped us gain proficiency in several powerful tools and technologies, including:

Pandas for data manipulation
Plotly for dynamic plots and subplots
Folium for interactive mapping
NumPy for numerical calculations
Plotly subplots (make_subplots) for dashboard development

Ultimately, we transformed raw COVID-19 data into an insightful, interactive, and visually engaging dashboard demonstrating the real-world value of data science in public health decision-making.

Colab Link-
https://colab.research.google.com/drive/1ti-gc6N5zgUFZ_hQX3CAOClYg0LY7i5N?usp=sharing

Frequently Asked Questions (FAQs)

1. How can I visualize COVID-19 data using Python?

You can visualize COVID-19 data in Python using libraries like Plotly, Folium, and Matplotlib. This project shows how to build bar charts, scatter plots, and interactive maps to understand the global impact effectively.

2. What is the best dataset for COVID-19 analysis?

The most reliable COVID-19 datasets are available on platforms like Kaggle, Johns Hopkins University, and WHO. In this project, we used a Kaggle dataset with country-wise confirmed, recovered, and death cases.

3. How do I make an interactive COVID-19 dashboard?

You can build an interactive dashboard using Plotly’s make_subplots() and add Folium for maps. This project combines bar charts, scatter plots, histograms, box plots, and pie charts into one dashboard saved as an HTML file, and saves the Folium map as a separate HTML file.

4. How do I calculate COVID-19 recovery and death rates?

The recovery rate is calculated as (Recovered / Confirmed) × 100, and the death rate is (Deaths / Confirmed) × 100. These metrics provide deeper insights into the severity and healthcare response across regions.

5. What insights can I get from a COVID-19 project?

You can uncover patterns like which countries have high fatality or recovery rates, regional disparities, case surges, and healthcare effectiveness—all through visual analytics.

Rohit Sharma

844 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
