COVID-19 Project: Data Visualization & Insights

By Rohit Sharma

Updated on Jul 24, 2025 | 17 min read | 1.29K+ views

The COVID-19 pandemic touched every region of the globe, and behind the headlines were enormous quantities of data. In this project, we will analyze COVID-19 data and make it tangible through visualization.

If you're a data science newcomer or just want to tune up your skills, this blog will guide you through working with actual public health data, visualizing trends with interactive graphs, and even tracing the virus's spread across regions.

Ready to Dive In? Here's What You Need

It’s helpful to have some basic knowledge of the following before starting this project:

  • Python programming (variables, functions, loops, basic syntax)
  • Pandas and NumPy (for handling and analyzing data)
  • Matplotlib or Seaborn (for creating charts and visualizing trends)
  • Plotly (for creating interactive line charts, scatter plots, and subplots)
  • Geospatial tools such as Folium or GeoPandas (a basic idea of plotting on maps)
  • Date and time handling (using datetime in pandas)

The Tools That Made It Happen

For this COVID-19 Project, the following tools and libraries will be used:

Skill Area | Purpose
Python Programming | You'll be writing Python code to clean, explore, and visualize data.
Pandas & NumPy | These libraries help in cleaning and analyzing large datasets efficiently.
Matplotlib & Seaborn | Useful for quick visualizations and exploring trends in the data.
Plotly | This project uses Plotly to build rich, interactive dashboards.
Geospatial Tools (Folium / GeoPandas) | Used for mapping how the virus spread across regions or countries.
Jupyter/Colab Environment (Optional) | Makes it easier to test code and view visualizations inline.

Model Selection: Our Choices and Why

In this COVID-19 Project, you're not predicting future trends. Instead, you're learning to explore, analyze, and visualize real-world health data effectively using the following tools and techniques:

  • Data Cleaning & Manipulation with Pandas
    You’ll learn how to handle messy, real-world datasets: fixing date formats, filling missing values, and reshaping data to make it ready for analysis.
  • Data Visualization with Matplotlib & Seaborn
    Create clear charts like bar graphs, line plots, and heatmaps to explore COVID-19 trends such as case spikes, death rates, and recoveries.
  • Interactive Dashboards with Plotly
    Learn to build interactive visuals that respond to user input, such as clickable country-wise maps or timelines showing case growth over months.
  • Geospatial Mapping (Optional: with Folium or GeoPandas)
    This lets you visualize how the virus spread across countries or regions on real-world maps.
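
As a rough illustration of the cleaning techniques listed above, the sketch below reshapes a wide table into long form, parses date strings, and fills a missing value. The tiny table and its values are made up for illustration and are not part of the project dataset.

```python
import pandas as pd

# Hypothetical wide-format table: one column per date (illustrative data only)
wide = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "1/22/20": [1, None],   # None simulates a missing value
    "1/23/20": [3, 5],
})

# Reshape from wide to long: one row per (country, date) pair
long_df = wide.melt(id_vars="Country/Region", var_name="Date", value_name="Confirmed")

# Parse the date strings into proper datetime values
long_df["Date"] = pd.to_datetime(long_df["Date"], format="%m/%d/%y")

# Fill missing counts with 0 and store them as integers
long_df["Confirmed"] = long_df["Confirmed"].fillna(0).astype(int)

print(long_df.sort_values(["Country/Region", "Date"]))
```

Whether filling with 0 is appropriate depends on what a missing value means in your data, so treat that choice as an assumption.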

Time Commitment & Skill Level

You can complete the COVID-19 project in 2 to 3 hours. It’s a beginner-friendly yet impactful project that helps you learn how to work with real-world public health data, create insightful visualizations, and build interactive charts and maps using Python.

How to Build a COVID-19 Project

Let’s start building the project from scratch. We'll go step by step through the following:

  1. Downloading the dataset
  2. Uploading and reading the dataset in Google Colab
  3. Cleaning and preparing the data
  4. Exploratory data analysis (EDA)
  5. Interactive visualizations with Plotly
  6. A comprehensive dashboard and geospatial maps
  7. Saving the results and building the final dashboard

Without any further delay, let’s get started!

Step 1: Download the Dataset

To build our COVID-19 Project, we’ll use a publicly available dataset from Kaggle. This dataset includes real-world COVID-19 statistics such as daily confirmed cases, deaths, recoveries, and testing rates across different countries and periods.

Follow the steps below to download the dataset:

  1. Open a new tab in your web browser.
  2. Go to: Kaggle
  3. Search for the dataset and click the Download button to download it as a .zip file.
  4. Once downloaded, extract the ZIP file.
  5. Locate country_wise_latest.csv in the extracted folder. We’ll use this CSV file for the project.

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.
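
If you would rather extract the archive with Python than with your file manager, a minimal sketch using the standard library's zipfile module could look like this. The helper name extract_csvs and the demo archive are illustrative assumptions; in practice you would pass the path of the ZIP file you downloaded.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_csvs(zip_path, dest_dir):
    """Extract a ZIP archive and return the sorted names of the CSV files inside."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.rglob("*.csv"))


# Demo with a throwaway archive so the sketch runs anywhere.
# In practice, zip_path would point at the file downloaded from Kaggle.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("country_wise_latest.csv", "Country/Region,Confirmed\nA,1\n")
        archive.writestr("readme.txt", "not a csv")
    found = extract_csvs(demo_zip, Path(tmp) / "extracted")
    print(found)
```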

Step 2: Upload and Read the Dataset in Google Colab

Now that you have downloaded the dataset, upload country_wise_latest.csv to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once uploaded, import the required libraries and use the following Python code to read and check the data:

# Import all necessary libraries for data analysis and visualization
import pandas as pd                 # For data manipulation
import numpy as np                  # For numerical operations
import matplotlib.pyplot as plt     # For basic plotting
import seaborn as sns              # For statistical visualizations
import plotly.express as px        # For interactive visualizations
import plotly.graph_objects as go  # For custom interactive plots
import plotly.offline as pyo       # For offline plotting
from plotly.subplots import make_subplots  # For multiple subplots
# Install required packages if not already installed
# Run these in separate cells if needed:
# !pip install plotly
# !pip install folium
# !pip install geopandas
# Load the COVID-19 dataset
df = pd.read_csv('country_wise_latest.csv')
# Display basic information about the dataset
print("\nFirst 5 rows:")
print(df.head())

Output:

First 5 rows:
  Country/Region  Confirmed  Deaths  Recovered  Active  New cases  New deaths  \
0    Afghanistan      36263    1269      25198    9796        106          10
1        Albania       4880     144       2745    1991        117           6
2        Algeria      27973    1163      18837    7973        616           8
3        Andorra        907      52        803      52         10           0
4         Angola        950      41        242     667         18           1

   New recovered  Deaths / 100 Cases  Recovered / 100 Cases  \
0             18                3.50                  69.49
1             63                2.95                  56.25
2            749                4.16                  67.34
3              0                5.73                  88.53
4              0                4.32                  25.47

   Deaths / 100 Recovered  Confirmed last week  1 week change  \
0                    5.04                35526            737
1                    5.25                 4171            709
2                    6.17                23691           4282
3                    6.48                  884             23
4                   16.94                  749            201

   1 week % increase             WHO Region
0               2.07  Eastern Mediterranean
1              17.00                 Europe
2              18.07                 Africa
3               2.60                 Europe
4              26.84                 Africa
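
Before moving on, it can help to confirm that the loaded file actually contains the columns the later steps rely on. The sketch below is one way to do that; the helper name missing_columns and the in-memory sample are illustrative assumptions, and the column names are taken from the preview above.

```python
import io
import pandas as pd

# Columns the later steps rely on (taken from the preview above)
REQUIRED_COLUMNS = ["Country/Region", "Confirmed", "Deaths", "Recovered", "Active", "WHO Region"]


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the DataFrame."""
    return [col for col in required if col not in frame.columns]


# Small in-memory stand-in for the real CSV (first two rows of the preview)
sample_csv = io.StringIO(
    "Country/Region,Confirmed,Deaths,Recovered,Active,WHO Region\n"
    "Afghanistan,36263,1269,25198,9796,Eastern Mediterranean\n"
    "Albania,4880,144,2745,1991,Europe\n"
)
sample = pd.read_csv(sample_csv)

print(missing_columns(sample))                           # nothing missing
print(missing_columns(sample.drop(columns=["Deaths"])))  # 'Deaths' is reported
```

With the real data, you would call missing_columns(df) right after pd.read_csv and stop if the list is not empty.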

Step 3: Clean and Prepare the Data

Before visualizing COVID-19 data, it’s crucial to clean and prepare the dataset. This includes handling missing values, calculating useful metrics like death and recovery rates, and formatting country names for mapping and analysis.

Here is the code:

# Step 1: Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())
# Step 2: Handle missing values and correct data types
# Replace any missing values in key numerical columns with 0
numerical_cols = ['Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases', 
                  'New deaths', 'New recovered']
for col in numerical_cols:
    df[col] = df[col].fillna(0)
# Step 3: Create additional calculated columns for deeper analysis
# Calculate death rate as a percentage of confirmed cases
df['Death_Rate'] = (df['Deaths'] / df['Confirmed']) * 100
# Calculate recovery rate as a percentage of confirmed cases
df['Recovery_Rate'] = (df['Recovered'] / df['Confirmed']) * 100
# Calculate active case rate as a percentage of confirmed cases
df['Active_Rate'] = (df['Active'] / df['Confirmed']) * 100
# Step 4: Clean country names for mapping compatibility
# Remove asterisks (if any) from country names
df['Country/Region'] = df['Country/Region'].str.replace('*', '', regex=False)
# Display a preview of the cleaned and enriched dataset
print("Cleaned Dataset Info:")
print(df[['Country/Region', 'Confirmed', 'Deaths', 'Recovered', 'Death_Rate', 'Recovery_Rate']].head())

Output:

Column Name | Missing Values
Country/Region | 0
Confirmed | 0
Deaths | 0
Recovered | 0
Active | 0
New cases | 0
New deaths | 0
New recovered | 0
Deaths / 100 Cases | 0
Recovered / 100 Cases | 0
Deaths / 100 Recovered | 0
Confirmed last week | 0
1 week change | 0
1 week % increase | 0
WHO Region | 0

Cleaned Data Preview:

Country/Region | Confirmed | Deaths | Recovered | Death_Rate (%) | Recovery_Rate (%)
Afghanistan | 36,263 | 1,269 | 25,198 | 3.50 | 69.49
Albania | 4,880 | 144 | 2,745 | 2.95 | 56.25
Algeria | 27,973 | 1,163 | 18,837 | 4.16 | 67.34
Andorra | 907 | 52 | 803 | 5.73 | 88.53
Angola | 950 | 41 | 242 | 4.32 | 25.47
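
The rate columns divide by Confirmed, so a country with zero confirmed cases would produce a division by zero. The sketch below shows one defensive variant and also checks the computed rate against the dataset's own "Deaths / 100 Cases" column. The helper name safe_rate and the "Hypothetical" row are assumptions added for illustration.

```python
import numpy as np
import pandas as pd


def safe_rate(numerator, denominator):
    """Percentage numerator/denominator, with NaN where the denominator is 0."""
    return (numerator / denominator.where(denominator > 0)) * 100


# Illustrative rows; the last one is a made-up country with zero confirmed cases
demo = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Hypothetical"],
    "Confirmed": [36263, 4880, 0],
    "Deaths": [1269, 144, 0],
    "Deaths / 100 Cases": [3.50, 2.95, 0.0],
})

demo["Death_Rate"] = safe_rate(demo["Deaths"], demo["Confirmed"])

# Compare the computed rate with the dataset's own column (rounded to 2 places)
demo["matches_source"] = np.isclose(
    demo["Death_Rate"].round(2), demo["Deaths / 100 Cases"]
)
print(demo[["Country/Region", "Death_Rate", "matches_source"]])
```

Rows where the rate is undefined come out as NaN, which the later dropna() calls and plots can handle explicitly.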

 

Now that our data is clean and enriched with rate columns, the next task is to perform EDA.

Step 4: Exploratory Data Analysis (EDA) for COVID-19 Dataset

In this step, we’ll perform a comprehensive visual analysis of the COVID-19 dataset to uncover global trends. We’ll examine the countries most affected, distribution of recovery rates, and the relationship between confirmed cases and death rate.

Here is the code:

# Create comprehensive exploratory analysis using visualizations
# Set the figure size
plt.figure(figsize=(15, 8))
# 1. Top 10 countries by confirmed COVID-19 cases
top_10_confirmed = df.nlargest(10, 'Confirmed')
plt.subplot(2, 2, 1)
plt.barh(top_10_confirmed['Country/Region'], top_10_confirmed['Confirmed'], color='red', alpha=0.7)
plt.title('Top 10 Countries by Confirmed Cases')
plt.xlabel('Confirmed Cases')
# 2. Top 10 countries by COVID-19 deaths
top_10_deaths = df.nlargest(10, 'Deaths')
plt.subplot(2, 2, 2)
plt.barh(top_10_deaths['Country/Region'], top_10_deaths['Deaths'], color='black', alpha=0.7)
plt.title('Top 10 Countries by Deaths')
plt.xlabel('Deaths')
# 3. Histogram showing the distribution of recovery rates across countries
plt.subplot(2, 2, 3)
plt.hist(df['Recovery_Rate'].dropna(), bins=30, color='green', alpha=0.7, edgecolor='black')
plt.title('Distribution of Recovery Rates')
plt.xlabel('Recovery Rate (%)')
plt.ylabel('Frequency')
# 4. Scatter plot comparing confirmed cases and death rate
plt.subplot(2, 2, 4)
plt.scatter(df['Confirmed'], df['Death_Rate'], alpha=0.6, color='purple')
plt.title('Death Rate vs Confirmed Cases')
plt.xlabel('Confirmed Cases')
plt.ylabel('Death Rate (%)')
plt.xscale('log')  # Log scale to handle wide range of confirmed cases
# Adjust layout to prevent overlap
plt.tight_layout()
plt.show()
# Print key global statistics
print("Global COVID-19 Statistics:")
print(f"Total Confirmed Cases: {df['Confirmed'].sum():,}")
print(f"Total Deaths: {df['Deaths'].sum():,}")
print(f"Total Recovered: {df['Recovered'].sum():,}")
print(f"Global Death Rate: {(df['Deaths'].sum() / df['Confirmed'].sum() * 100):.2f}%")
print(f"Global Recovery Rate: {(df['Recovered'].sum() / df['Confirmed'].sum() * 100):.2f}%")

Output:

Global COVID-19 Statistics:
Total Confirmed Cases: 16,480,485
Total Deaths: 654,036
Total Recovered: 9,468,087
Global Death Rate: 3.97%
Global Recovery Rate: 57.45%
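
Note that the global death rate printed above is a pooled figure: total deaths divided by total confirmed cases. It is not the same as averaging the Death_Rate column, where every country counts equally regardless of size. The sketch below illustrates the difference using the five countries from the dataset preview.

```python
import pandas as pd

# First five countries from the dataset preview
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
    "Deaths": [1269, 144, 1163, 52, 41],
})

# Pooled rate: total deaths over total cases (the approach used for the global figure)
pooled_rate = sample["Deaths"].sum() / sample["Confirmed"].sum() * 100

# Unweighted mean of per-country rates: every country counts equally
mean_of_rates = (sample["Deaths"] / sample["Confirmed"] * 100).mean()

print(f"Pooled death rate:     {pooled_rate:.2f}%")
print(f"Mean of country rates: {mean_of_rates:.2f}%")
```

In this sample the unweighted mean is higher because small countries with high rates, such as Andorra, weigh as much as large ones.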

Step 5: Interactive Visualizations for COVID-19 Insights

In this section, we’ll create interactive visualizations using Plotly to explore COVID-19 data across countries and WHO regions. These dynamic charts help reveal deeper insights such as country-wise confirmed cases and how recovery and death rates vary globally.

Here is the code:

# Create interactive visualizations for dynamic COVID-19 data insights
# 1. Interactive bar chart for the top 20 countries with highest confirmed cases
top_20_countries = df.nlargest(20, 'Confirmed')  # Get top 20 rows with highest confirmed cases
fig1 = px.bar(
    top_20_countries,                     # Data to plot
    x='Country/Region',                   # Countries on X-axis
    y='Confirmed',                        # Confirmed cases on Y-axis
    color='Deaths',                       # Color bar by number of deaths
    title='Top 20 Countries by Confirmed COVID-19 Cases',
    hover_data=['Deaths', 'Recovered', 'Active'],  # Extra info when hovering over bars
    color_continuous_scale='Reds'         # Red color gradient for deaths
)
# Customize layout for better readability
fig1.update_layout(
    xaxis_tickangle=-45,                 # Rotate country names on X-axis
    height=600,
    xaxis_title="Country",
    yaxis_title="Confirmed Cases"
)
# Show the interactive bar chart
fig1.show()
# 2. Interactive scatter plot comparing Recovery Rate vs Death Rate
fig2 = px.scatter(
    df,                                  # Full dataset
    x='Recovery_Rate',                   # Recovery rate on X-axis
    y='Death_Rate',                      # Death rate on Y-axis
    size='Confirmed',                    # Bubble size indicates total confirmed cases
    color='WHO Region',                  # Color based on WHO region
    hover_name='Country/Region',         # Show country name on hover
    hover_data=['Confirmed', 'Deaths', 'Recovered'],  # Extra info on hover
    title='COVID-19: Death Rate vs Recovery Rate by WHO Region',
    labels={
        'Recovery_Rate': 'Recovery Rate (%)',
        'Death_Rate': 'Death Rate (%)'
    }
)
# Adjust chart height for better display
fig2.update_layout(height=600)
# Show the interactive scatter plot
fig2.show()

Output:

Note- The charts above are originally interactive and dynamic. However, they are shown here as static images for display purposes. In a real project or dashboard, you can hover, zoom, and filter data directly on these plots for deeper exploration.

Step 5.1: Interactive Bar Charts

In this section, we use Plotly to build interactive charts for deeper insight into COVID-19 figures by WHO region and their global distribution. The grouped bar chart compares confirmed, death, and recovery counts across regions, and the pie chart shows how confirmed cases are distributed among the top 10 countries.

Here is the Code:

# Create interactive multi-metric visualization
# Using the WHO region data for comparative analysis
# 3. Regional analysis - Aggregate key metrics by WHO Region
# We'll sum up total Confirmed, Deaths, Recovered, and Active cases for each region
regional_data = df.groupby('WHO Region').agg({
    'Confirmed': 'sum',
    'Deaths': 'sum',
    'Recovered': 'sum',
    'Active': 'sum'
}).reset_index()
# Create a multi-bar chart using Plotly's go.Figure
fig3 = go.Figure()
# Add bar trace for confirmed cases
fig3.add_trace(go.Bar(
    name='Confirmed', 
    x=regional_data['WHO Region'], 
    y=regional_data['Confirmed'], 
    marker_color='blue'
))
# Add bar trace for deaths
fig3.add_trace(go.Bar(
    name='Deaths', 
    x=regional_data['WHO Region'], 
    y=regional_data['Deaths'], 
    marker_color='red'
))
# Add bar trace for recovered
fig3.add_trace(go.Bar(
    name='Recovered', 
    x=regional_data['WHO Region'], 
    y=regional_data['Recovered'], 
    marker_color='green'
))
# Customize the layout
fig3.update_layout(
    title='COVID-19 Cases by WHO Region',
    xaxis_title='WHO Region',
    yaxis_title='Number of Cases',
    barmode='group',            # Group bars next to each other
    height=600,
    hovermode='x unified'       # Unified tooltip on hover for better comparison
)
# Show the interactive chart
fig3.show()
# 4. Interactive pie chart for global distribution of confirmed cases (Top 10 countries)
# We use top_10_confirmed (already defined earlier using df.nlargest(10, 'Confirmed'))
fig4 = px.pie(
    top_10_confirmed, 
    values='Confirmed', 
    names='Country/Region',
    title='Global COVID-19 Cases Distribution (Top 10 Countries)',
    hover_data=['Deaths', 'Recovered']  # Show more info when hovered
)
# Customize labels and layout
fig4.update_traces(
    textposition='inside', 
    textinfo='percent+label'  # Show both percent and country name
)
# Display the interactive pie chart
fig4.show()

Output: 
 

Note- The charts above are originally interactive and dynamic. However, they are shown here as static images for display purposes. In a real project or dashboard, you can hover, zoom, and filter data directly on these plots for deeper exploration.
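
One thing to keep in mind with the pie chart: its percentages are shares of the top 10 total, not of the global total. If you want the slices to reflect the whole dataset, one option is to collapse the remaining countries into an "Other" slice. The sketch below shows the idea on made-up numbers; the helper name top_n_with_other is an assumption, not part of the original notebook.

```python
import pandas as pd


def top_n_with_other(frame, value_col, label_col, n):
    """Keep the n largest rows and collapse the rest into a single 'Other' row."""
    top = frame.nlargest(n, value_col)[[label_col, value_col]]
    other_total = frame[value_col].sum() - top[value_col].sum()
    other = pd.DataFrame({label_col: ["Other"], value_col: [other_total]})
    return pd.concat([top, other], ignore_index=True)


# Made-up numbers for illustration
cases = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [50, 30, 15, 5],
})

pie_data = top_n_with_other(cases, "Confirmed", "Country/Region", n=2)
pie_data["Share (%)"] = pie_data["Confirmed"] / pie_data["Confirmed"].sum() * 100
print(pie_data)
```

The resulting frame can be passed to px.pie in place of top_10_confirmed, using the same values and names arguments.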

Step 6: COVID-19 Comprehensive Dashboard with Subplots

In this step, we create an interactive dashboard that combines multiple visualizations into a single layout. Using Plotly subplots, this dashboard helps analyze the pandemic's impact by region, country, and rate metrics, all at once.

Here is the Code: 

# Create a comprehensive dashboard with multiple subplots
# This combines bar, scatter, and pie charts into a single interactive layout
from plotly.subplots import make_subplots
import plotly.graph_objects as go
# Initialize subplot layout with 2 rows × 2 columns
# Each cell will host a different type of chart
fig5 = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Cases by Region', 
        'Top 5 Countries by Confirmed Cases', 
        'Death vs Recovery Rate', 
        'Case Distribution (Top 5 Countries)'
    ),
    specs=[[{"type": "bar"}, {"type": "bar"}],     # Row 1: bar charts
           [{"type": "scatter"}, {"type": "pie"}]] # Row 2: scatter and pie
)
# Subplot 1: Bar chart - Total confirmed cases by WHO Region
fig5.add_trace(
    go.Bar(
        x=regional_data['WHO Region'], 
        y=regional_data['Confirmed'], 
        name='Confirmed by Region'
    ),
    row=1, col=1
)
# Subplot 2: Bar chart - Top 5 countries with most confirmed cases
fig5.add_trace(
    go.Bar(
        x=top_10_confirmed['Country/Region'][:5], 
        y=top_10_confirmed['Confirmed'][:5], 
        name='Top 5 Countries'
    ),
    row=1, col=2
)
# Subplot 3: Scatter plot - Death Rate vs Recovery Rate
fig5.add_trace(
    go.Scatter(
        x=df['Recovery_Rate'], 
        y=df['Death_Rate'], 
        mode='markers',
        text=df['Country/Region'],              # Country name as hover text
        marker=dict(
            size=(df['Confirmed'] / 10000).clip(lower=4, upper=50),  # Bubble size from confirmed cases, clamped to 4-50 px
            color='red', 
            opacity=0.6
        ),
        name='Death vs Recovery'
    ),
    row=2, col=1
)
# Subplot 4: Pie chart - Distribution of confirmed cases (Top 5 countries)
fig5.add_trace(
    go.Pie(
        labels=top_10_confirmed['Country/Region'][:5], 
        values=top_10_confirmed['Confirmed'][:5], 
        name='Distribution'
    ),
    row=2, col=2
)
# Final layout adjustments
fig5.update_layout(
    height=800,
    showlegend=False,                      # Hide legend for cleaner look
    title_text="COVID-19 Comprehensive Dashboard"
)

# Display the dashboard

fig5.show()

Output:

Note- This interactive dashboard supports zoom, pan, and hover interactions. It’s well suited to Jupyter/Colab notebooks, Dash apps, or Streamlit.

What you see here is a static version, so it does not reflect the full interactivity.
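
Bubble sizes driven directly by raw case counts can span several orders of magnitude, so a few large countries may dominate the scatter subplot. One common alternative, sketched here as an assumption rather than the method used in this project, is square-root scaling, so that the marker area rather than the diameter tracks the value. The helper name scale_marker_sizes and the pixel limits are illustrative.

```python
import numpy as np
import pandas as pd


def scale_marker_sizes(values, min_px=4, max_px=40):
    """Map non-negative values to marker diameters so that area tracks the value."""
    roots = np.sqrt(values.clip(lower=0))   # area ~ value, so diameter ~ sqrt(value)
    largest = roots.max()
    if largest == 0:
        return pd.Series(min_px, index=values.index, dtype=float)
    return (roots / largest * max_px).clip(lower=min_px)


# Two values from the dataset preview plus one made-up larger value
confirmed = pd.Series([907, 36263, 400000])
sizes = scale_marker_sizes(confirmed)
print(sizes)
```

The result can be passed as the size inside marker=dict(...) of go.Scatter, which interprets marker sizes as diameters in pixels by default.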

Step 6.1: Geospatial Mapping for Disease Spread Analysis with Plotly

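To visualize the global spread of COVID-19, we use choropleth maps that color countries based on the number of confirmed cases and death rates. For this, we manually map country names to their ISO-3 codes, which Plotly uses to identify countries. The dictionary below covers 60 countries; any country whose name is not listed gets no code and is left uncolored on the map.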

Here is the code:

# Create interactive world map showing COVID-19 spread
# Choropleth maps require ISO 3-letter country codes
# Mapping country names to their ISO codes (for visualization)
country_codes = {
    'US': 'USA', 'Brazil': 'BRA', 'India': 'IND', 'Russia': 'RUS', 'Peru': 'PER',
    'Chile': 'CHL', 'United Kingdom': 'GBR', 'Iran': 'IRN', 'Germany': 'DEU', 'Turkey': 'TUR',
    'Bangladesh': 'BGD', 'France': 'FRA', 'Saudi Arabia': 'SAU', 'Italy': 'ITA', 'Pakistan': 'PAK',
    'Spain': 'ESP', 'Mexico': 'MEX', 'South Africa': 'ZAF', 'Canada': 'CAN', 'Qatar': 'QAT',
    'China': 'CHN', 'Egypt': 'EGY', 'Sweden': 'SWE', 'Belarus': 'BLR', 'Belgium': 'BEL',
    'Ecuador': 'ECU', 'Kazakhstan': 'KAZ', 'Indonesia': 'IDN', 'UAE': 'ARE', 'Portugal': 'PRT',
    'Netherlands': 'NLD', 'Singapore': 'SGP', 'Kuwait': 'KWT', 'Ukraine': 'UKR', 'Philippines': 'PHL',
    'Argentina': 'ARG', 'Afghanistan': 'AFG', 'Japan': 'JPN', 'Poland': 'POL', 'Romania': 'ROU',
    'Israel': 'ISR', 'Switzerland': 'CHE', 'Thailand': 'THA', 'Armenia': 'ARM', 'Nigeria': 'NGA',
    'Bahrain': 'BHR', 'Iraq': 'IRQ', 'Azerbaijan': 'AZE', 'Dominican Republic': 'DOM', 'Panama': 'PAN',
    'Bolivia': 'BOL', 'Ireland': 'IRL', 'South Korea': 'KOR', 'Austria': 'AUT', 'Serbia': 'SRB',
    'Oman': 'OMN', 'Czech Republic': 'CZE', 'Moldova': 'MDA', 'Denmark': 'DNK', 'Guatemala': 'GTM'
}
# Add ISO-3 codes as a new column for mapping
df['iso_code'] = df['Country/Region'].map(country_codes)
# ---------------------------------------------
# Choropleth Map 1: Confirmed COVID-19 Cases
# ---------------------------------------------
fig6 = px.choropleth(
    df,
    locations='iso_code',                  # ISO-3 country codes
    color='Confirmed',                     # Color scale based on confirmed cases
    hover_name='Country/Region',           # Hover label
    hover_data=['Deaths', 'Recovered', 'Death_Rate'],  # Extra info on hover
    color_continuous_scale='Reds',
    title='Global COVID-19 Confirmed Cases Distribution'
)
# Update map layout
fig6.update_layout(
    height=600,
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth'     # Natural Earth projection
    )
)
fig6.show()
# ---------------------------------------------
#  Choropleth Map 2: COVID-19 Death Rate (%)
# ---------------------------------------------
fig7 = px.choropleth(
    df,
    locations='iso_code',
    color='Death_Rate',                    # Color scale based on death rate %
    hover_name='Country/Region',
    hover_data=['Confirmed', 'Deaths', 'Recovered'],
    color_continuous_scale='Oranges',
    title='Global COVID-19 Death Rate Distribution (%)'
)
# Update map layout
fig7.update_layout(
    height=600,
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type='natural earth'
    )
)
fig7.show()

Output:

Note- These choropleth maps are fully interactive, allowing zoom, pan, and hover. Here they are displayed as static images. To experience their full functionality, run them in a Jupyter Notebook, Google Colab, or Streamlit dashboard.
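
Because the ISO codes come from a hand-written dictionary, any country whose name is missing or spelled differently ends up with NaN in iso_code. A quick coverage check, sketched below on made-up rows with a shortened lookup, shows which names need attention.

```python
import pandas as pd

# Shortened, illustrative lookup; the full dictionary appears in the code above
country_codes = {"US": "USA", "Brazil": "BRA", "India": "IND"}

demo = pd.DataFrame({
    "Country/Region": ["US", "Brazil", "India", "Andorra"],
    "Confirmed": [100, 80, 60, 1],   # made-up counts
})

demo["iso_code"] = demo["Country/Region"].map(country_codes)

# Rows whose name had no entry in the lookup get NaN in iso_code
unmapped = demo.loc[demo["iso_code"].isna(), "Country/Region"].tolist()
coverage = demo["iso_code"].notna().mean() * 100

print(f"Mapped {coverage:.0f}% of countries; unmapped: {unmapped}")
```

Running the same two lines against the real df after the .map(country_codes) call lists every country that still needs a dictionary entry.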

Step 6.2: Advanced Geospatial Analysis with Folium

This interactive map uses Folium, a Python mapping library, to visualize COVID-19's global impact. Countries are represented using circle markers, where:

  • Size is proportional to confirmed cases.
  • Color indicates the death rate:
    •  Green: Low (< 2%)
    •  Orange: Moderate (2–5%)
    •  Red: High (> 5%)
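
The sizing and color rules above can be written as two small helpers. This is a sketch only: the function names and default parameters are illustrative, while the thresholds and the 5 to 50 pixel clamp follow the map code in this step.

```python
def death_rate_color(rate):
    """Return the marker color for a death rate given in percent."""
    if rate < 2:
        return "green"
    if rate < 5:
        return "orange"
    return "red"


def marker_radius(confirmed, scale=10000, smallest=5, largest=50):
    """Scale confirmed cases to a radius in pixels, clamped to [smallest, largest]."""
    return min(max(confirmed / scale, smallest), largest)


for rate in (1.5, 2.0, 3.5, 5.0, 5.73):
    print(rate, death_rate_color(rate))

print(marker_radius(907), marker_radius(200000), marker_radius(4000000))
```

Under this rule a rate of exactly 2% is orange and a rate of exactly 5% is red, so decide deliberately which bucket the boundary values belong to.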

Here is the Code: 

# Install folium if not already installed
# !pip install folium
import folium
from folium import plugins
# Create a base world map centered at lat=20, lon=0
world_map = folium.Map(location=[20, 0], zoom_start=2, tiles='OpenStreetMap')
# Coordinates for major countries (used if latitude/longitude not in dataset)
coordinates = {
    'US': [39.8283, -98.5795], 'Brazil': [-14.2350, -51.9253], 'India': [20.5937, 78.9629],
    'Russia': [61.5240, 105.3188], 'Peru': [-9.1900, -75.0152], 'Chile': [-35.6751, -71.5430],
    'United Kingdom': [55.3781, -3.4360], 'Iran': [32.4279, 53.6880], 'Germany': [51.1657, 10.4515],
    'Turkey': [38.9637, 35.2433], 'Bangladesh': [23.6850, 90.3563], 'France': [46.6034, 1.8883],
    'Saudi Arabia': [23.8859, 45.0792], 'Italy': [41.8719, 12.5674], 'Pakistan': [30.3753, 69.3451],
    'Spain': [40.4637, -3.7492], 'Mexico': [23.6345, -102.5528], 'South Africa': [-30.5595, 22.9375],
    'Canada': [56.1304, -106.3468], 'China': [35.8617, 104.1954]
}
# Add latitude and longitude to the DataFrame
df['lat'] = df['Country/Region'].map(lambda x: coordinates.get(x, [None, None])[0])
df['lon'] = df['Country/Region'].map(lambda x: coordinates.get(x, [None, None])[1])
# Drop countries without coordinate data
map_data = df.dropna(subset=['lat', 'lon'])
# Plot each country's data as a circle marker
for idx, row in map_data.iterrows():
    # Scale marker size by confirmed cases
    marker_size = min(max(row['Confirmed'] / 10000, 5), 50)  # Keep between 5–50
    # Choose marker color based on death rate
    if row['Death_Rate'] < 2:
        color = 'green'
    elif row['Death_Rate'] < 5:
        color = 'orange'
    else:
        color = 'red'
    # Info popup for each country
    popup_text = f"""
    <b>{row['Country/Region']}</b><br>
    Confirmed: {row['Confirmed']:,}<br>
    Deaths: {row['Deaths']:,}<br>
    Recovered: {row['Recovered']:,}<br>
    Death Rate: {row['Death_Rate']:.2f}%<br>
    Recovery Rate: {row['Recovery_Rate']:.2f}%<br>
    WHO Region: {row['WHO Region']}
    """
    # Add marker to map
    folium.CircleMarker(
        location=[row['lat'], row['lon']],
        radius=marker_size,
        popup=folium.Popup(popup_text, max_width=300),
        color='black',
        fill=True,               # Enable the fill so the color below is drawn
        fill_color=color,
        fill_opacity=0.7,
        weight=2
    ).add_to(world_map)
# Custom legend HTML for the map
legend_html = '''
<div style="position: fixed; 
            bottom: 50px; left: 50px; width: 150px; height: 90px; 
            background-color: white; border:2px solid grey; z-index:9999; 
            font-size:14px; padding: 10px">
<p><b>COVID-19 Death Rate</b></p>
<p><i class="fa fa-circle" style="color:green"></i> < 2%</p>
<p><i class="fa fa-circle" style="color:orange"></i> 2% - 5%</p>
<p><i class="fa fa-circle" style="color:red"></i> > 5%</p>
</div>
'''
# Add legend to the map
world_map.get_root().html.add_child(folium.Element(legend_html))
# Save interactive map as an HTML file
world_map.save('covid19_world_map.html')
print("Interactive world map saved as 'covid19_world_map.html'")
# Display map inline (works in Jupyter/Colab)
world_map

Output:

Step 7: Save Results and Create Final Dashboard

This final visualization combines six key subplots into a single interactive dashboard using Plotly’s make_subplots. It gives a holistic view of the global COVID-19 situation by showing confirmed cases, regional trends, recovery vs. death rates, new case distribution, recovery rates by region, and an overall status breakdown.

Here is the code: 

# Save cleaned and enriched dataset for future analysis or sharing
df.to_csv('covid19_processed_data.csv', index=False)
print("Processed data saved to 'covid19_processed_data.csv'")
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
# Create a 3x2 dashboard layout
final_dashboard = make_subplots(
    rows=3, cols=2,
    subplot_titles=(
        'Top 10 Countries - Confirmed Cases', 'Regional Distribution', 
        'Death Rate vs Recovery Rate', 'New Cases Distribution',
        'Recovery Rate by Region', 'Case Status Distribution'
    ),
    specs=[[{"type": "bar"}, {"type": "bar"}],
           [{"type": "scatter"}, {"type": "histogram"}],
           [{"type": "box"}, {"type": "pie"}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.1
)
# --- Subplot 1: Top 10 Countries by Confirmed Cases ---
final_dashboard.add_trace(
    go.Bar(
        x=top_10_confirmed['Country/Region'], 
        y=top_10_confirmed['Confirmed'], 
        name='Confirmed Cases', 
        marker_color='red'
    ),
    row=1, col=1
)
# --- Subplot 2: Regional Distribution ---
final_dashboard.add_trace(
    go.Bar(
        x=regional_data['WHO Region'], 
        y=regional_data['Confirmed'], 
        name='Regional Cases', 
        marker_color='blue'
    ),
    row=1, col=2
)
# --- Subplot 3: Recovery Rate vs Death Rate Scatter Plot ---
final_dashboard.add_trace(
    go.Scatter(
        x=df['Recovery_Rate'], 
        y=df['Death_Rate'], 
        mode='markers',
        text=df['Country/Region'], 
        name='Countries',
        marker=dict(size=8, color='purple', opacity=0.6)
    ),
    row=2, col=1
)
# --- Subplot 4: New Cases Histogram ---
final_dashboard.add_trace(
    go.Histogram(
        x=df['New cases'], 
        name='New Cases Distribution', 
        marker_color='orange', 
        nbinsx=30
    ),
    row=2, col=2
)
# --- Subplot 5: Recovery Rate Box Plot by Region ---
# Prepare recovery rate data for box plot
box_data = []
regions = df['WHO Region'].unique()
for region in regions:
    region_data = df[df['WHO Region'] == region]['Recovery_Rate'].dropna()
    box_data.extend([(rate, region) for rate in region_data])
box_df = pd.DataFrame(box_data, columns=['Recovery_Rate', 'WHO Region'])
# Plot the first 3 regions in order of appearance (to keep visualization clean)
for region in regions[:3]:
    region_data = box_df[box_df['WHO Region'] == region]['Recovery_Rate']
    final_dashboard.add_trace(
        go.Box(y=region_data, name=region, boxmean=True),
        row=3, col=1
    )
# --- Subplot 6: Global Case Status Pie Chart ---
total_confirmed = df['Confirmed'].sum()
total_deaths = df['Deaths'].sum()
total_recovered = df['Recovered'].sum()
total_active = df['Active'].sum()
final_dashboard.add_trace(
    go.Pie(
        labels=['Active', 'Recovered', 'Deaths'], 
        values=[total_active, total_recovered, total_deaths],
        name='Global Status'
    ),
    row=3, col=2
)
# --- Layout Settings ---
final_dashboard.update_layout(
    height=1200,
    showlegend=True,
    title_text="COVID-19 Comprehensive Analysis Dashboard",
    title_x=0.5  # Center the title
)
# Show interactive dashboard in browser or notebook
final_dashboard.show()
# Save dashboard as standalone HTML file
final_dashboard.write_html("covid19_final_dashboard.html")
print("Final dashboard saved as 'covid19_final_dashboard.html'")

Output:

This dashboard gives a complete data-driven snapshot of COVID-19 trends across countries and regions. 
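
If you want numbers to go with the "Recovery Rate by Region" box plot, the quartiles can be computed directly with a groupby. The sketch below uses made-up recovery rates; only the column names follow the project's DataFrame.

```python
import pandas as pd

# Made-up recovery rates for illustration; column names follow the project's DataFrame
demo = pd.DataFrame({
    "WHO Region": ["Europe", "Europe", "Europe", "Africa", "Africa", "Africa"],
    "Recovery_Rate": [56.25, 88.53, 70.00, 67.34, 25.47, 40.00],
})

# Quartiles per region: the values a box plot is built around
summary = (
    demo.groupby("WHO Region")["Recovery_Rate"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["Q1", "Median", "Q3"]
print(summary)
```

The exact quartile values may differ slightly from what Plotly draws, since the two libraries can use different quartile methods.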

Conclusion: What We Learned from This Project

This COVID-19 Project gave us a comprehensive, hands-on experience in working with real-world public health data. Throughout the process, we learned how to:

  • Preprocess and clean complex datasets for meaningful analysis.
  • Perform exploratory data analysis (EDA) to uncover trends, distributions, and anomalies.
  • Build interactive visualizations that communicate insights clearly.
  • Apply geospatial analysis to map global pandemic impact.
  • Understand and compute critical health metrics like death rates, recovery rates, and active cases.

From a technical perspective, this project helped us gain proficiency in several powerful tools and technologies, including:

  • Pandas for data manipulation
  • Plotly for dynamic plots and subplots
  • Folium for interactive mapping
  • NumPy for numerical calculations
  • Plotly subplots (make_subplots) for dashboard development

Ultimately, we transformed raw COVID-19 data into an insightful, interactive, and visually engaging dashboard demonstrating the real-world value of data science in public health decision-making.

Colab Link-
https://colab.research.google.com/drive/1ti-gc6N5zgUFZ_hQX3CAOClYg0LY7i5N?usp=sharing
