
Complete Airline Passenger Traffic Analysis Project Using Python

By Rohit Sharma

Updated on Jul 24, 2025 | 16 min read | 1.86K+ views

Air travel has evolved significantly over the years, but behind every flight statistic lies a story of demand, seasonality, and real-world events. 

In this project, we will clean and explore monthly passenger data, uncover the highs and lows of travel seasons, and predict future demand using a simple forecasting tool. Along the way, we’ll highlight peak months for smarter pricing and examine how outside factors, such as holiday schedules, influence when people fly. By the end, you’ll have hands‑on experience turning raw airline data into clear insights.

What Do You Need to Start?

It’s helpful to have some basic knowledge of the following before starting this project:

  • Python programming (variables, functions, loops, basic syntax)
  • Pandas and Numpy (for handling and analyzing data)
  • Matplotlib or Seaborn (for creating charts and visualizing trends)
  • Statsmodels (seasonal decomposition for trend analysis)
  • Prophet (installing, fitting, and forecasting time‑series data)
  • Working with dates in pandas (using pd.to_datetime, resampling)
  • Merging external data (e.g., holiday calendars) for correlation studies

How We Built This: The Tools

For this Airline Passenger Traffic Analysis project, the following tools and libraries will be used:

Tool / Library    Purpose
Python            Core programming language for data analysis and scripting
Pandas            Data loading, cleaning, manipulation, and data handling
Matplotlib        Basic plotting of time‑series trends and visual checks
Statsmodels       Seasonal decomposition to separate trend and seasonality
Prophet           Time‑series forecasting of future passenger demand
Jupyter/Colab     Interactive environment for writing and running code
NumPy             Efficient numerical operations and array handling
Holiday CSV       An external data source to analyze the impact of holidays

The Brains of the Operation: Our Models

We’ll use a few straightforward yet powerful techniques to understand and predict airline passenger traffic:

  • Seasonal Decomposition
    Breaks the monthly passenger series into three parts: trend (long‑term direction), seasonality (repeated yearly patterns), and residual (random noise). This helps us see underlying patterns.
  • Moving Averages
    Calculates rolling averages (e.g., 12‑month) to smooth out short‑term fluctuations and highlight longer‑term trends in passenger volumes.
  • Prophet Forecasting
    A user‑friendly time‑series model by Meta/Facebook that automatically handles seasonality, holidays, and trend changes to forecast future passenger demand.
  • Correlation Analysis with External Factors
    Measures how external data, like the number of holidays in a month, correlates with passenger counts, giving insight into what drives air travel volumes.
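
The moving-average idea above can be sketched in a few lines. This is a minimal illustration on a made-up monthly series (the dates and values are assumptions, not the airline data): a 12-month window spans one full yearly cycle, so the seasonal swing averages out and only the slower trend remains.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly totals: 36 months with a steady rise plus a yearly cycle.
months = pd.date_range("2020-01-01", periods=36, freq="MS")
month_numbers = months.month.to_numpy()
cycle = 10 * np.sin(2 * np.pi * (month_numbers - 1) / 12)
monthly = pd.Series(100 + np.arange(36) + cycle, index=months, name="Passenger Count")

# A 12-month rolling mean covers one full yearly cycle, so the cycle averages out.
# The first 11 values are NaN because a full 12-month window is not yet available.
rolling_12 = monthly.rolling(window=12).mean()

print(rolling_12.dropna().head())
```

On the real data, the same `rolling(window=12).mean()` call would be applied to the monthly totals rather than to the raw per-record rows.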

Project Snapshot: Time & Difficulty

You can complete this Airline Passenger Traffic Analysis project in about 2 to 3 hours. The difficulty is easy to moderate, which suits beginners who know basic Python and want to learn time‑series forecasting.

How to Build an Airline Passenger Traffic Analysis Project

Let’s start building the project from scratch. We'll go step by step through the following stages:

  1. Load the Dataset
  2. Parse Dates
  3. Visualize Trends
  4. Decompose Seasonality
  5. Forecast Demand
  6. Identify Peak Months
  7. Analyze External Factors

Without any further delay, let’s get started!

Step 1: Download the Dataset

To build our Airline Passenger Traffic Analysis project, we’ll use a publicly available dataset from Kaggle. This dataset includes real‑world monthly passenger counts for various airlines, giving us historical figures to practice time‑series analysis, forecast future demand, and explore seasonal travel patterns.

Follow the steps below to download the dataset:

  1. Open a new tab in your web browser.
  2. Go to: Kaggle
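  3. Search for the air traffic passenger statistics dataset and click the Download button to get it as a .zip file.
  4. Once downloaded, extract the ZIP file.
  5. Locate the extracted CSV file (named Air_Traffic_Passenger_Statistics (1).csv in the code below). This is the file we’ll use for the project.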

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.

Step 2: Upload and Read the Dataset in Google Colab

Now that you have downloaded the CSV file, upload it to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once uploaded, import the required libraries and use the following Python code to read and check the data:

# Step 2: Load and inspect the dataset
import pandas as pd  # Import pandas for data handling
# Read the CSV file into a DataFrame
# The file name must match the uploaded file exactly; adjust it if yours
# does not carry the ' (1)' suffix that browsers add to repeated downloads
df = pd.read_csv('Air_Traffic_Passenger_Statistics (1).csv')
# Display the first 5 rows to see column names and sample data
print("First 5 rows of the dataset:")
print(df.head())

Output:

First 5 rows of the dataset:
   Activity Period Activity Period Start Date  \
0           199907                 1999/07/01
1           199907                 1999/07/01
2           199907                 1999/07/01
3           199907                 1999/07/01
4           199907                 1999/07/01

                         Operating Airline Operating Airline IATA Code  \
0                             ATA Airlines                          TZ
1                             ATA Airlines                          TZ
2                             ATA Airlines                          TZ
3  Aeroflot Russian International Airlines                         NaN
4  Aeroflot Russian International Airlines                         NaN

                         Published Airline Published Airline IATA Code  \
0                             ATA Airlines                          TZ
1                             ATA Airlines                          TZ
2                             ATA Airlines                          TZ
3  Aeroflot Russian International Airlines                         NaN
4  Aeroflot Russian International Airlines                         NaN

     GEO Summary GEO Region Activity Type Code Price Category Code  \
0       Domestic         US           Deplaned            Low Fare
1       Domestic         US           Enplaned            Low Fare
2       Domestic         US     Thru / Transit            Low Fare
3  International     Europe           Deplaned               Other
4  International     Europe           Enplaned               Other

     Terminal Boarding Area  Passenger Count              data_as_of  \
0  Terminal 1             B            31432  2025/06/20 01:00:30 PM
1  Terminal 1             B            31353  2025/06/20 01:00:30 PM
2  Terminal 1             B             2518  2025/06/20 01:00:30 PM
3  Terminal 2             D             1324  2025/06/20 01:00:30 PM
4  Terminal 2             D             1198  2025/06/20 01:00:30 PM

           data_loaded_at
0  2025/07/20 03:02:25 PM
1  2025/07/20 03:02:25 PM
2  2025/07/20 03:02:25 PM
3  2025/07/20 03:02:25 PM
4  2025/07/20 03:02:25 PM

Step 3: Clean and Prepare the Data

In this step, we’ll convert the date column to a proper datetime type and make it the index of our DataFrame. This is an essential cleaning step because it allows us to easily sort by date, resample data, and run time‑series analyses.

Here is the code:

#  Convert Activity Period Start Date to datetime and set as index
import pandas as pd  # Ensure pandas is imported
# Convert the 'Activity Period Start Date' column from text to datetime objects
df['Activity Period Start Date'] = pd.to_datetime(df['Activity Period Start Date'], errors='coerce')
# Set this datetime column as the DataFrame index and sort by date
df = df.set_index('Activity Period Start Date').sort_index()
# Confirm the date range and get basic stats on passenger counts
print("Date range:", df.index.min(), "to", df.index.max())
print("\nPassenger Count Summary:")
print(df['Passenger Count'].describe())

Output: 

Metric               Value
Date Range           1999-07-01 to 2025-05-01
Total Records        38,196
Mean Passengers      27,814.07
Standard Deviation   61,982.22
Minimum              0
25th Percentile      4,355.75
Median (50%)         8,596
75th Percentile      19,675.75
Maximum              856,501
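The dates are now parsed as datetimes, set as the index, and sorted chronologically, which is what resampling, decomposition, and forecasting expect. Note that the DataFrame still holds several rows per month (one per airline, terminal, and activity type), so the summary above describes individual records, not monthly totals.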
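
Because `errors='coerce'` silently turns unparseable dates into `NaT`, it can be worth counting them before relying on the index. The sketch below uses a tiny made-up frame (the malformed value and the explicit `format` string are assumptions for illustration) to show one way to check and drop such rows.

```python
import pandas as pd

# Hypothetical raw values: two valid dates and one malformed entry.
raw = pd.DataFrame({
    "Activity Period Start Date": ["1999/07/01", "1999/08/01", "not a date"],
    "Passenger Count": [31432, 31353, 2518],
})

# With errors='coerce', anything that does not match the format becomes NaT.
parsed = pd.to_datetime(
    raw["Activity Period Start Date"], format="%Y/%m/%d", errors="coerce"
)
n_bad = int(parsed.isna().sum())
print("Unparseable dates:", n_bad)

# Keep only rows with a valid date, then index and sort by it.
clean = raw.assign(**{"Activity Period Start Date": parsed})
clean = clean.dropna(subset=["Activity Period Start Date"])
clean = clean.set_index("Activity Period Start Date").sort_index()
```

If the count is zero on the real file, the `dropna` step is a no-op and can be omitted.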

Step 4: Visualize Monthly Passenger Traffic Over Time

Before diving into advanced modeling or forecasting, it's helpful to visually understand how passenger traffic changes over time. This line plot helps us spot trends, seasonal peaks, and any unusual dips (such as during the COVID-19 pandemic).

Here is the code:

import matplotlib.pyplot as plt  # Library used for creating static, interactive, and animated plots
# Set figure size for better visibility
plt.figure(figsize=(12, 5))
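# Sum all records (airlines, terminals, activity types) into one total per month
monthly_totals = df.groupby(level=0)['Passenger Count'].sum()
# Plot the monthly totals (each point is one month's total passenger count)
plt.plot(monthly_totals.index, monthly_totals, marker='o', linestyle='-')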
# Set title and axis labels
plt.title('Monthly Airline Passenger Traffic')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
# Add a grid to make the chart easier to read
plt.grid(True)
# Display the plot
plt.show()

Output:

This visual gives us a clear look at seasonal patterns, long-term growth or decline, and any disruptions in air traffic, all of which are crucial for the forecasting step coming up next.

Step 5: Break Down the Trend, Seasonality & Noise with Seasonal Decomposition

Understanding how time series data behaves over time is key to accurate forecasting. A powerful technique for this is seasonal decomposition, which breaks a time series into:

  • Trend: Long-term upward or downward movement
  • Seasonality: Repeating patterns or cycles (e.g., more flights during holidays)
  • Residual (Noise): Irregularities or randomness in the data
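
To make the three components concrete, here is a hand-rolled sketch of a classical additive decomposition on a synthetic series. Everything in it (the series, the July peak, the noise level) is an assumption made for illustration; it mirrors the general idea behind additive decomposition rather than reproducing any library's exact implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + yearly cycle peaking in July + small noise.
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
month_no = idx.month.to_numpy()
rng = np.random.default_rng(42)
true_trend = 200 + 0.5 * np.arange(120)
true_seasonal = 20 * np.sin(2 * np.pi * (month_no - 4) / 12)
observed = pd.Series(true_trend + true_seasonal + rng.normal(0, 0.5, 120), index=idx)

# Trend: centered 2x12 moving average (a 12-month mean, then a 2-month mean, re-centered).
# The first and last 6 months have no full window, so the trend is NaN there.
trend = observed.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonality: average detrended value for each calendar month, centered on zero.
detrended = observed - trend
seasonal_means = detrended.groupby(detrended.index.month).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = pd.Series(seasonal_means.reindex(month_no).to_numpy(), index=idx)

# Residual: whatever trend and seasonality do not explain.
resid = observed - trend - seasonal

print("Estimated peak month:", seasonal_means.idxmax())
```

By construction, trend + seasonal + residual adds back up to the observed series wherever the trend is defined, which is exactly what the additive model assumes.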

Let’s decompose the passenger data to identify these components.

Here is the code:

from statsmodels.tsa.seasonal import seasonal_decompose  # For decomposition
import matplotlib.pyplot as plt
# Step 1: Aggregate passenger count by month
# If there are multiple entries per month, this sums them into a single monthly value
monthly_data = df.groupby('Activity Period Start Date')['Passenger Count'].sum()
# Step 2: Ensure the data has a consistent monthly frequency
# 'MS' means "Month Start" — it sets each index to the start of the month
monthly_series = monthly_data.asfreq('MS')
# Step 3: Apply additive seasonal decomposition
# 'additive' assumes that trend + seasonality + noise = observed data
decomp = seasonal_decompose(monthly_series, model='additive', period=12)  # 12 = yearly seasonality
# Step 4: Plot the decomposition results
decomp.plot()
# Add title and layout adjustments
plt.suptitle('Seasonal Decomposition of Airline Passengers', fontsize=16)
plt.tight_layout()
plt.show()

Output:

Step 6: Forecast Future Airline Passenger Trends Using Prophet

After identifying trends and seasonality, it's time to predict the future of airline passenger traffic. For this, we'll use Facebook Prophet, a powerful forecasting tool built specifically for time series data like ours.

Prophet is beginner-friendly and handles seasonality, holidays, and trends with minimal tuning, perfect for our airline analysis project.

Here is the Code:

# Step 6: Forecast Future Airline Passenger Trends Using Prophet
#  6.1 Install Prophet (only run once in Google Colab or local environment)
# !pip install prophet
#  6.2 Import necessary libraries
from prophet import Prophet
import matplotlib.pyplot as plt
#  6.3 Prepare the dataset for Prophet
# Prophet requires the dataframe to have two specific column names:
# 'ds' -> datestamp (date column), and 'y' -> numeric target variable
# Reset the index so 'Activity Period Start Date' becomes a column again,
# then rename columns to match Prophet's expected format
prophet_df = monthly_series.reset_index().rename(columns={
    'Activity Period Start Date': 'ds',   # Date column
    'Passenger Count': 'y'                # Target value to forecast
})
#  6.4 Initialize and fit the Prophet model
# Since airline traffic often repeats yearly patterns, we enable yearly seasonality
model = Prophet(yearly_seasonality=True, daily_seasonality=False)
# Fit the model to the prepared dataset
model.fit(prophet_df)
#  6.5 Generate future dates for forecasting
# We'll forecast the next 24 months (2 years); 'MS' (month start) matches the month-start dates in the history
future = model.make_future_dataframe(periods=24, freq='MS')
#  6.6 Predict the future passenger counts
forecast = model.predict(future)
#  6.7 Plot the forecast results
# This graph shows the historical data and predicted future passenger numbers
fig_forecast = model.plot(forecast)
plt.title('Airline Passenger Forecast for Next 24 Months')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True)
plt.show()
#  6.8 Plot the forecast components
# Prophet automatically breaks down the forecast into:
#  - Trend (overall direction)
#  - Yearly seasonality (travel peaks/troughs)
fig_components = model.plot_components(forecast)
plt.tight_layout()
plt.show()

Output:

What we gain here:

  • A visual forecast of passenger traffic for the next 2 years
  • Breakdown of trend, seasonality, and uncertainty
  • Foundation for decision-making, like resource planning, marketing campaigns, or pricing adjustments around high/low travel seasons.
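
Before acting on any forecast, it helps to have a simple baseline to compare it against. The sketch below is not part of the Prophet workflow above; it is a seasonal-naive baseline (each future month repeats the same month of the last observed year) on a made-up monthly history, with all names and values chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly history (48 months) with trend, yearly cycle, and noise.
history_index = pd.date_range("2021-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
history = pd.Series(
    1000
    + 5 * np.arange(48)
    + 100 * np.sin(2 * np.pi * history_index.month.to_numpy() / 12)
    + rng.normal(0, 10, 48),
    index=history_index,
)

# Future month-start dates for a 24-month horizon, starting right after the history.
horizon = 24
future_index = pd.date_range(
    history.index[-1] + pd.offsets.MonthBegin(1), periods=horizon, freq="MS"
)

# Seasonal naive: repeat the last observed 12 months across the horizon.
last_year = history.iloc[-12:].to_numpy()
baseline = pd.Series(np.tile(last_year, horizon // 12), index=future_index)

print(baseline.head())
```

If a fitted model cannot beat this baseline on held-out months, its forecast deserves extra scrutiny before it informs pricing or staffing decisions.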

Step 7: Identify Peak Travel Seasons

In the airline industry, knowing when most people travel is crucial. These high-traffic times, or peak travel seasons, often align with holidays, festivals, school vacations, or weather patterns. Identifying these periods helps:

  •  Airlines optimize ticket pricing and flight schedules
  •  Airports manage staffing and logistics
  •  Analysts make data-driven decisions on demand forecasting

In our project, we’ll use the dataset to detect the months with the highest number of passengers, giving us real insights into seasonal demand patterns. This forms the foundation for smarter forecasting, route planning, and even promotional strategies in the airline industry.

Let’s now dive into the code to find those historical high-demand months.

# Step 7.1: Identify Top 3 Historical Peak Months
# The dataset is indexed by date and includes the number of passengers per entry
# We use 'nlargest(3)' to find the top 3 records with the highest passenger counts
top_peaks = df['Passenger Count'].nlargest(3)
# Display the dates and values for these peak records
print(" Top 3 Peak Months (Historical):")
print(top_peaks)

Output:

Rank   Month        Passenger Count
1      1999-08-01   856,501
2      1999-08-01   846,421
3      1999-07-01   792,965

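The top two entries share the same date (1999-08-01) because the DataFrame still holds one row per airline, terminal, and activity type. As a result, nlargest(3) returns the three largest individual records, not the three busiest months. These records suggest that July and August 1999 saw very high volumes for at least one carrier, but ranking whole months requires summing the counts per month first.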
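
A minimal sketch of the difference between ranking records and ranking months, using a small made-up frame (the values are assumptions, not taken from the dataset): summing per month before calling `nlargest` removes the duplicate dates.

```python
import pandas as pd

# Hypothetical per-record rows: two records for each of three months.
records = pd.DataFrame(
    {
        "Activity Period Start Date": pd.to_datetime(
            ["1999-07-01", "1999-07-01", "1999-08-01",
             "1999-08-01", "1999-09-01", "1999-09-01"]
        ),
        "Passenger Count": [500, 480, 450, 100, 300, 290],
    }
).set_index("Activity Period Start Date")

# Ranking raw records can return the same month more than once.
top_records = records["Passenger Count"].nlargest(2)

# Summing per month first gives one value per month, so each month appears once.
monthly_totals = records["Passenger Count"].resample("MS").sum()
top_months = monthly_totals.nlargest(2)

print(top_records)
print(top_months)
```

In this toy data both top records fall in July, while the monthly ranking puts July first and September second.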

Step 7.2: Visualizing Peak Travel Months

After identifying the top 3 peak passenger months, it's useful to highlight them on a line graph to visually understand when air travel was at its highest. The following code does exactly that:

# Highlight the peak months on the original series
plt.figure(figsize=(12, 5))  # Set the figure size
plt.plot(df.index, df['Passenger Count'], label='Passenger Count (per record)')  # Plot every record; the DataFrame has several rows per month
plt.scatter(top_peaks.index, top_peaks.values, color='red', s=100, label='Peak Months')  # Highlight top peaks
plt.title('Historical Peak Travel Months')  # Title for the chart
plt.xlabel('Date')  # X-axis label
plt.ylabel('Passengers')  # Y-axis label
plt.legend()  # Show legend
plt.grid(True)  # Add grid for readability
plt.show()  # Display the plot

Output:

Step 8: Analyze External Factors Affecting Airline Traffic

Airline passenger traffic doesn’t just follow random patterns; it’s heavily influenced by various external factors. Things like public holidays, seasonal breaks, long weekends, and major events (like festivals or sports tournaments) often cause spikes or dips in travel.

In this section, we’ll explore how one of the most common external factors, public holidays, relates to the volume of airline passengers. This can reveal useful patterns and suggest whether holiday information is worth adding to a demand forecast.

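This step needs a separate holiday dataset (a CSV with one row per holiday date); sample holiday datasets are available on Kaggle. Upload it to Colab the same way as the passenger file and name it holidays.csv to match the code below.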

Understanding these external influences is super helpful for:

  • Airlines planning their flight schedules
  • Marketers creating travel offers
  • Analysts estimating revenue during peak travel windows

Let’s dive in and analyze how holidays may be driving changes in air travel!

Step 8.1: Load Holiday Dataset to Analyze External Influences

To understand how external factors such as holidays affect passenger traffic, we’ll load a supplementary dataset that contains information about holidays.

Here is the code:

# Step 8.1: Load holiday dataset
holidays = pd.read_csv('holidays.csv', parse_dates=['date'])
# 'parse_dates' ensures that the 'date' column is read as datetime objects
# The code in this step and the next expects the following columns:
# - 'date': Date of the holiday (format: YYYY-MM-DD)
# - 'holiday': Name of the holiday (e.g., "New Year's Day")
# Display the first few rows of the dataset to confirm it loaded correctly
holidays.head()

Output:

Date         Holiday
2012-01-02   New Year's Day
2012-02-20   Family Day
2012-04-06   Good Friday
2012-05-21   Victoria Day
2012-07-02   Canada Day

Step 8.2: Prepare Holiday Flags for Monthly Analysis

To analyze how holidays affect airline passenger traffic, we need to align daily holiday data with monthly passenger data. This step ensures both datasets are at the same time frequency (monthly) for proper comparison.

 What This Code Does:

  • Converts daily holiday entries into monthly counts (i.e., how many holidays in a month).
  • Resamples passenger traffic data to monthly totals.
  • Merges the two datasets on the month.
  • Fills any missing values (months with no holidays) with 0.

Here is the code:

# Step 8.2: Prepare holiday counts at monthly level
# 1. Extract 'month' from holiday date
holidays['month'] = holidays['date'].dt.to_period('M')
# 2. Count number of holidays in each month
monthly_holidays = holidays.groupby('month')['holiday'].count().reset_index()
# 3. Rename the holiday count column
monthly_holidays = monthly_holidays.rename(columns={'holiday': 'is_holiday'})
# 4. Convert month back to datetime
monthly_holidays['month'] = monthly_holidays['month'].dt.to_timestamp()
# 5. Resample passenger count to monthly totals
df_monthly = df['Passenger Count'].resample('MS').sum().to_frame().reset_index()
# 6. Merge monthly passengers with holiday counts
df_merged = pd.merge(df_monthly, monthly_holidays,
                     left_on='Activity Period Start Date',
                     right_on='month', how='left')
# 7. Fill months without holidays with 0
df_merged['is_holiday'] = df_merged['is_holiday'].fillna(0)
# 8. Drop the extra 'month' column
df_merged = df_merged.drop(columns=['month'])
# 9. View the first few merged rows
df_merged.head()

Output:

Activity Period Start Date   Passenger Count   is_holiday
1999-07-01                   3,976,746         0.0
1999-08-01                   3,972,694         0.0
1999-09-01                   3,341,964         0.0
1999-10-01                   3,468,846         0.0
1999-11-01                   3,145,240         0.0

This table shows:

  • Total monthly passenger counts.
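  • Number of holidays in each month. A value of 0 means the holiday file lists no holiday for that month. Since the holiday sample shown in Step 8.1 starts in 2012, the zeros for 1999 reflect months the file does not cover rather than months known to have no holidays.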
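
One way to avoid treating uncovered months as holiday-free is to fill zeros only inside the date range the holiday file actually spans. The sketch below uses small made-up frames (the passenger values are assumptions, and `holiday_count` is a hypothetical column name) and leaves months outside the coverage as NaN so they drop out of later statistics.

```python
import pandas as pd

# Hypothetical monthly passenger totals from 2011-11 to 2012-04.
df_monthly = pd.DataFrame({
    "Activity Period Start Date": pd.date_range("2011-11-01", periods=6, freq="MS"),
    "Passenger Count": [310, 330, 300, 290, 340, 350],
})

# Hypothetical holiday file that only starts in 2012.
holidays = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-02", "2012-02-20", "2012-04-06"]),
    "holiday": ["New Year's Day", "Family Day", "Good Friday"],
})

# Count holidays per month, keyed by the first day of the month.
month_start = holidays["date"].dt.to_period("M").dt.to_timestamp()
holiday_counts = (
    holidays.groupby(month_start)["holiday"].count()
    .rename("holiday_count")
    .reset_index()
)

merged = df_monthly.merge(
    holiday_counts, left_on="Activity Period Start Date", right_on="date", how="left"
).drop(columns=["date"])

# Fill zeros only for months inside the holiday file's coverage.
first_month, last_month = month_start.min(), month_start.max()
covered = merged["Activity Period Start Date"].between(first_month, last_month)
merged.loc[covered, "holiday_count"] = merged.loc[covered, "holiday_count"].fillna(0)

print(merged)
```

Here March 2012 becomes 0 (covered, no holiday listed), while the two 2011 months stay NaN and are skipped by `corr()` automatically.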

Step 8.3: Correlation Between Holidays and Airline Passenger Traffic

Now that we have the merged dataset combining monthly passenger numbers and holiday counts, we can check whether holidays impact airline traffic.

Here’s the code that calculates the correlation:

# Step 8.3: Calculate correlation between holidays and passenger traffic
corr = df_merged['Passenger Count'].corr(df_merged['is_holiday'])
print(f"Correlation between holiday count and passenger traffic: {corr:.2f}")

Output:

Correlation between holiday count and passenger traffic: 0.20

What This Code Does:

  • corr() computes the Pearson correlation coefficient between the two variables:
    • Passenger Count
    • is_holiday (number of holidays in the month) 
  • It helps us understand:
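    • Positive correlation (> 0): months with more holidays tend to have higher passenger traffic
    • Negative correlation (< 0): months with more holidays tend to have lower traffic
    • Near 0: no strong linear relationship
  • A value of 0.20 indicates a weak positive relationship. It describes association, not causation.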
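
For readers who want to see what the coefficient measures, the sketch below computes Pearson's r by hand on a small made-up frame (the values are assumptions) and checks it against pandas. Squaring r gives the share of variance associated with a linear fit.

```python
import numpy as np
import pandas as pd

# Hypothetical merged monthly data (values are made up for illustration).
df_merged = pd.DataFrame({
    "Passenger Count": [300, 320, 310, 400, 380, 450],
    "is_holiday": [0, 1, 0, 2, 1, 3],
})

x = df_merged["is_holiday"].to_numpy(dtype=float)
y = df_merged["Passenger Count"].to_numpy(dtype=float)

# Pearson r: co-variation of x and y divided by the product of their spreads.
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# pandas computes the same quantity.
r_pandas = df_merged["Passenger Count"].corr(df_merged["is_holiday"])
r_squared = r_manual ** 2

print(f"manual r = {r_manual:.4f}, pandas r = {r_pandas:.4f}, r^2 = {r_squared:.4f}")
```

Applied to the reported value, r = 0.20 gives r² = 0.04, so a straight-line relationship with holiday count is associated with only about 4% of the variation in monthly passengers.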

Step 8.4: Visualize the Relationship Between Holidays and Airline Traffic

After calculating the correlation between the number of holidays and monthly airline passenger count, it's important to visualize how they relate.

A scatter plot gives a clear view of whether more holidays in a month tend to bring more air travelers or not. This step helps analyze the influence of external temporal events (like national holidays) on airline traffic.

# Step 8.4: Scatter plot to visualize relationship
import matplotlib.pyplot as plt  # Ensure this is already imported
# Create a scatter plot
plt.figure(figsize=(10, 5))
plt.scatter(df_merged['is_holiday'], df_merged['Passenger Count'], alpha=0.7)
# Add plot title and axis labels
plt.title('Passengers vs. Number of Holidays per Month')
plt.xlabel('Number of Holidays in Month')
plt.ylabel('Passenger Count')
# Add grid for better readability
plt.grid(True)
# Display the plot
plt.show()

Output:

The scatter plot titled "Passengers vs. Number of Holidays per Month" shows how airline passenger traffic varies with the number of holidays in a month. Here's what we can observe:

  • Most months have 0 or 1 holiday, and they exhibit a wide range of passenger counts, from under 1 million to over 5 million. This suggests that other factors besides holidays also strongly influence airline traffic.
  • When the number of holidays increases (4 to 6 holidays in a month), we see a concentration of higher passenger counts, indicating that months with more holidays tend to attract more travelers, likely due to extended breaks or festive seasons.
  • However, the pattern is not perfectly linear or consistent, which means holidays may contribute to spikes, but they are not the only driver of high air traffic.
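
A scatter plot with only a handful of distinct x-values can be hard to read, so a grouped summary is a useful companion. The sketch below uses a made-up frame (values are assumptions) to tabulate how many months fall into each holiday count and their typical passenger volume.

```python
import pandas as pd

# Hypothetical merged monthly data (values are made up for illustration).
df_merged = pd.DataFrame({
    "Passenger Count": [300, 320, 310, 400, 380, 450, 330],
    "is_holiday": [0, 1, 0, 2, 1, 2, 0],
})

# For each holiday count: number of months, mean, and median passenger count.
summary = (
    df_merged.groupby("is_holiday")["Passenger Count"]
    .agg(["count", "mean", "median"])
)

print(summary)
```

The `count` column matters as much as the averages: a group with only a few months can look high or low purely by chance.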

Final Conclusion: What We Learned from the Airline Passenger Traffic Analysis Project

In this project, we conducted a comprehensive time series analysis of airline passenger traffic. Here's a summary of the key insights and skills we developed:

 1. Data Cleaning and Preparation

  • We transformed raw data into a usable format by parsing dates, setting appropriate indices, and handling missing values.
  • Aggregated passenger counts to a monthly level to standardize analysis.

 2. Exploratory Data Analysis (EDA)

  • Plotted the raw passenger trends over time to observe growth patterns and anomalies.
  • Identified seasonal patterns, trends, and volatility in passenger movement.

 3. Time Series Decomposition

  • Decomposed the time series into trend, seasonality, and residuals using seasonal_decompose.
  • This helped us understand the underlying components that influence air traffic volumes.

 4. Forecasting with Prophet

  • Utilized Facebook's Prophet model to forecast airline passenger traffic for the next 24 months.
  • Prophet modeled the seasonality and trend components and produced a visually interpretable forecast. We did not measure its accuracy on held-out data in this project.

 5. Identifying Peak Travel Seasons

  • Analyzed historical data to find the top peak months in terms of passenger traffic.
  • Visualized these peaks to understand when travel demand surges most.

 6. Analyzing External Factors (Holidays)

  • Merged public holiday data with passenger traffic to investigate external influences.
  • Found a weak positive correlation (about 0.20) between the number of holidays in a month and passenger volume.
  • Visualized this relationship using scatter plots to reveal patterns or clusters.

In short, this project showcased a full data pipeline, from raw ingestion to insights and forecasting, using scalable and industry-relevant tools, making it a strong technical case study for aspiring data scientists.

Colab Link:
https://colab.research.google.com/drive/1_ap86w1WnHy7Pcp7Zm2biMtxtWa33aPb
