Complete Airline Passenger Traffic Analysis Project Using Python
By Rohit Sharma
Updated on Jul 24, 2025 | 16 min read | 1.86K+ views
Air travel has evolved significantly over the years, but behind every flight statistic lies a story of demand, seasonality, and real-world events.
In this project, we will clean and explore monthly passenger data, uncover the highs and lows of travel seasons, and predict future demand using a simple forecasting tool. Along the way, we’ll highlight peak months for smarter pricing and examine how outside factors, such as holiday schedules, influence when people fly. By the end, you’ll have hands‑on experience turning raw airline data into clear insights.
It’s helpful to have some basic knowledge of the following before starting this project:
For this Airline Passenger Traffic Analysis project, the following tools and libraries will be used:
Tool / Library | Purpose |
Python | Core programming language for data analysis and scripting |
Pandas | Data loading, cleaning, manipulation, and data handling |
Matplotlib | Basic plotting of time‑series trends and visual checks |
Statsmodels | Seasonal decomposition to separate trend and seasonality |
Prophet | Time‑series forecasting of future passenger demand |
Jupyter/Colab | Interactive environment for writing and running code |
NumPy | Efficient numerical operations and array handling |
Holiday CSV | An external data source to analyze the impact of holidays |
We’ll use a few straightforward yet powerful techniques to understand and predict airline passenger traffic:
You can complete this Airline Passenger Traffic Analysis project in about 2 to 3 hours. The difficulty is easy to moderate, which suits beginners who are familiar with basic Python and eager to learn time‑series forecasting.
Let’s start building the project from scratch. We'll go step-by-step through the process of:
Without any further delay, let’s get started!
To build our Airline Passenger Traffic Analysis project, we’ll use a publicly available dataset from Kaggle. This dataset includes real‑world monthly passenger counts for various airlines, giving us historical figures to practice time‑series analysis, forecast future demand, and explore seasonal travel patterns.
Follow the steps below to download the dataset:
Once you have downloaded the dataset (and the holiday CSV used later in the project), upload the files to Google Colab using the code below:
from google.colab import files
uploaded = files.upload()
Once uploaded, import the required libraries and use the following Python code to read and check the data:
# Step 1: Load and inspect the dataset
import pandas as pd # Import pandas for data handling
# Read the CSV file into a DataFrame
# Make sure 'Air_Traffic_Passenger_Statistics (1).csv' is in your working directory
df = pd.read_csv('Air_Traffic_Passenger_Statistics (1).csv')
# Display the first 5 rows to see column names and sample data
print("First 5 rows of the dataset:")
print(df.head())
Output :
First 5 rows of the dataset:
Activity Period Activity Period Start Date \
0 199907 1999/07/01
1 199907 1999/07/01
2 199907 1999/07/01
3 199907 1999/07/01
4 199907 1999/07/01
Operating Airline Operating Airline IATA Code \
0 ATA Airlines TZ
1 ATA Airlines TZ
2 ATA Airlines TZ
3 Aeroflot Russian International Airlines NaN
4 Aeroflot Russian International Airlines NaN
Published Airline Published Airline IATA Code \
0 ATA Airlines TZ
1 ATA Airlines TZ
2 ATA Airlines TZ
3 Aeroflot Russian International Airlines NaN
4 Aeroflot Russian International Airlines NaN
GEO Summary GEO Region Activity Type Code Price Category Code \
0 Domestic US Deplaned Low Fare
1 Domestic US Enplaned Low Fare
2 Domestic US Thru / Transit Low Fare
3 International Europe Deplaned Other
4 International Europe Enplaned Other
Terminal Boarding Area Passenger Count data_as_of \
0 Terminal 1 B 31432 2025/06/20 01:00:30 PM
1 Terminal 1 B 31353 2025/06/20 01:00:30 PM
2 Terminal 1 B 2518 2025/06/20 01:00:30 PM
3 Terminal 2 D 1324 2025/06/20 01:00:30 PM
4 Terminal 2 D 1198 2025/06/20 01:00:30 PM
data_loaded_at
0 2025/07/20 03:02:25 PM
1 2025/07/20 03:02:25 PM
2 2025/07/20 03:02:25 PM
3 2025/07/20 03:02:25 PM
4 2025/07/20 03:02:25 PM
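The preview above already shows NaN in the IATA code columns, so it can be worth counting missing values per column before going further. The snippet below is a minimal sketch on a tiny hypothetical frame named `sample` that mimics three of the dataset's columns; it is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-sample mimicking three columns of the dataset
sample = pd.DataFrame({
    "Operating Airline": [
        "ATA Airlines",
        "ATA Airlines",
        "Aeroflot Russian International Airlines",
    ],
    "Operating Airline IATA Code": ["TZ", "TZ", np.nan],
    "Passenger Count": [31432, 31353, 1324],
})

# Count missing values in each column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Inspect the data type pandas inferred for each column
print(sample.dtypes)
```

On the real data, the same two calls would be `df.isna().sum()` and `df.dtypes`.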
In this step, we’ll convert the date column to a proper datetime type and make it the index of our DataFrame. This is an essential cleaning step because it allows us to easily sort by date, resample data, and run time‑series analyses.
Here is the code:
# Convert Activity Period Start Date to datetime and set as index
import pandas as pd # Ensure pandas is imported
# Convert the 'Activity Period Start Date' column from text to datetime objects
df['Activity Period Start Date'] = pd.to_datetime(df['Activity Period Start Date'], errors='coerce')
# Set this datetime column as the DataFrame index and sort by date
df = df.set_index('Activity Period Start Date').sort_index()
# Confirm the date range and get basic stats on passenger counts
print("Date range:", df.index.min(), "to", df.index.max())
print("\nPassenger Count Summary:")
print(df['Passenger Count'].describe())
Output:
Metric | Value |
Date Range | 1999-07-01 to 2025-05-01 |
Total Records | 38,196 |
Mean Passengers | 27,814.07 |
Standard Deviation | 61,982.22 |
Minimum | 0 |
25th Percentile | 4,355.75 |
Median (50%) | 8,596 |
75th Percentile | 19,675.75 |
Maximum | 856,501 |
Now the data is cleaned and structured, with dates properly formatted and set as the index. This ensures our analysis and time-series forecasting models will perform accurately on chronological passenger trends.
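Because the conversion uses `errors='coerce'`, any date string that cannot be parsed silently becomes `NaT` rather than raising an error. The sketch below, which uses a hypothetical three-value series called `raw_dates`, shows that behaviour and one way to count the affected rows.

```python
import pandas as pd

# Hypothetical raw values: two valid dates and one malformed entry
raw_dates = pd.Series(["1999/07/01", "1999/08/01", "not a date"])

# errors='coerce' turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw_dates, format="%Y/%m/%d", errors="coerce")

# Count how many rows failed to parse
n_bad = parsed.isna().sum()
print(parsed)
print("Unparseable rows:", n_bad)
```

If the count is non-zero on the real column, those rows would end up with a missing index value and may need to be dropped or corrected before resampling.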
Before diving into advanced modeling or forecasting, it's helpful to visually understand how passenger traffic changes over time. This line plot helps us spot trends, seasonal peaks, and any unusual dips (such as during the COVID-19 pandemic).
Here is the code:
import matplotlib.pyplot as plt # Library used for creating static, interactive, and animated plots
# Set figure size for better visibility
plt.figure(figsize=(12, 5))
# Sum all records within each month so that each point is one month's total
monthly_total = df['Passenger Count'].resample('MS').sum()
# Plot the monthly totals over time
plt.plot(monthly_total.index, monthly_total, marker='o', linestyle='-')
# Set title and axis labels
plt.title('Monthly Airline Passenger Traffic')
plt.xlabel('Date')
plt.ylabel('Number of Passengers')
# Add a grid to make the chart easier to read
plt.grid(True)
# Display the plot
plt.show()
Output:
This visual gives us a clear look at seasonal patterns, long-term growth or decline, and any disruptions in air traffic, all of which are crucial for the forecasting step coming up next.
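One simple way to make the long-term direction easier to see is a 12-month rolling mean, which averages over one full yearly cycle. The following is a minimal sketch on a synthetic series (`monthly_total` here is made-up data, not the airline figures).

```python
import numpy as np
import pandas as pd

# Hypothetical monthly totals: 36 months with a linear rise plus a 12-month cycle
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
months = np.arange(36)
monthly_total = pd.Series(
    1000 + 10 * months + 100 * np.sin(2 * np.pi * months / 12),
    index=idx,
)

# A 12-month rolling mean averages over one full seasonal cycle,
# leaving mostly the long-term direction
rolling_12 = monthly_total.rolling(window=12).mean()

# The first 11 values are NaN because a full 12-month window is not yet available
print(rolling_12.dropna().head())
```

Plotting `rolling_12` on top of the raw monthly totals would show the smoothed line rising steadily while the seasonal swings are averaged out.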
Understanding how time series data behaves over time is key to accurate forecasting. A powerful technique for this is seasonal decomposition, which breaks a time series into:
Let’s decompose the passenger data to identify these components.
Here is the code:
from statsmodels.tsa.seasonal import seasonal_decompose # For decomposition
import matplotlib.pyplot as plt
# Step 1: Aggregate passenger count by month
# If there are multiple entries per month, this sums them into a single monthly value
monthly_data = df.groupby('Activity Period Start Date')['Passenger Count'].sum()
# Step 2: Ensure the data has a consistent monthly frequency
# 'MS' means "Month Start" — it sets each index to the start of the month
monthly_series = monthly_data.asfreq('MS')
# Step 3: Apply additive seasonal decomposition
# 'additive' assumes that trend + seasonality + noise = observed data
decomp = seasonal_decompose(monthly_series, model='additive', period=12) # 12 = yearly seasonality
# Step 4: Plot the decomposition results
decomp.plot()
# Add title and layout adjustments
plt.suptitle('Seasonal Decomposition of Airline Passengers', fontsize=16)
plt.tight_layout()
plt.show()
Output:
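To see what an additive decomposition is doing, it can help to work through a simplified version by hand. The sketch below is an illustration on synthetic data, not the statsmodels implementation: it estimates the trend with a centered 12-month moving average, takes the average detrended value per calendar month as the seasonal effect, and treats whatever is left as the residual. The names `pattern`, `trend_est`, `month_effect`, and `resid_est` are hypothetical, and the library's exact filtering and edge handling may differ.

```python
import numpy as np
import pandas as pd

# Synthetic series: linear trend plus a fixed monthly pattern that sums to zero
idx = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
pattern = np.array([-30, -20, -10, 0, 10, 30, 50, 40, 0, -20, -30, -20])
observed = pd.Series(500.0 + 2.0 * t + np.tile(pattern, 4), index=idx)

# Trend: centered moving average over 12 months (a 12-term mean followed by
# a 2-term mean, shifted back 6 steps so the window is centered)
trend_est = observed.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal effect: average detrended value for each calendar month,
# re-centered so the twelve effects sum to zero
detrended = observed - trend_est
month_effect = detrended.groupby(detrended.index.month).mean()
month_effect = month_effect - month_effect.mean()
seasonal_est = pd.Series(month_effect.loc[idx.month].to_numpy(), index=idx)

# Residual: what remains after removing trend and seasonality
resid_est = observed - trend_est - seasonal_est

print(month_effect.round(2))
```

Because the synthetic series contains no noise, the residual is essentially zero wherever the trend is defined, and the recovered monthly effects match the pattern that was built in.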
After identifying trends and seasonality, it's time to predict the future of airline passenger traffic. For this, we'll use Facebook Prophet, a powerful forecasting tool built specifically for time series data like ours.
Prophet is beginner-friendly and handles seasonality, holidays, and trends with minimal tuning, perfect for our airline analysis project.
Here is the code:
# Step 6: Forecast Future Airline Passenger Trends Using Prophet
# 6.1 Install Prophet (only run once in Google Colab or local environment)
# !pip install prophet
# 6.2 Import necessary libraries
from prophet import Prophet
import matplotlib.pyplot as plt
# 6.3 Prepare the dataset for Prophet
# Prophet requires the dataframe to have two specific column names:
# 'ds' -> datestamp (date column), and 'y' -> numeric target variable
# Reset the index so 'Activity Period Start Date' becomes a column again,
# then rename columns to match Prophet's expected format
prophet_df = monthly_series.reset_index().rename(columns={
'Activity Period Start Date': 'ds', # Date column
'Passenger Count': 'y' # Target value to forecast
})
# 6.4 Initialize and fit the Prophet model
# Since airline traffic often repeats yearly patterns, we enable yearly seasonality
model = Prophet(yearly_seasonality=True, daily_seasonality=False)
# Fit the model to the prepared dataset
model.fit(prophet_df)
# 6.5 Generate future dates for forecasting
# We'll forecast for the next 24 months (2 years); 'MS' (month start) matches the dates in the history
future = model.make_future_dataframe(periods=24, freq='MS')
# 6.6 Predict the future passenger counts
forecast = model.predict(future)
# 6.7 Plot the forecast results
# This graph shows the historical data and predicted future passenger numbers
fig_forecast = model.plot(forecast)
plt.title('Airline Passenger Forecast for Next 24 Months')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.grid(True)
plt.show()
# 6.8 Plot the forecast components
# Prophet automatically breaks down the forecast into:
# - Trend (overall direction)
# - Yearly seasonality (travel peaks/troughs)
fig_components = model.plot_components(forecast)
plt.tight_layout()
plt.show()
Output:
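A detail worth checking when generating future dates for monthly data is whether the new dates fall on the same day of the month as the history. The passenger data is stamped at the start of each month, so a month-end frequency would produce future dates that do not line up with it. The sketch below uses plain pandas to show the difference; the helper `next_dates` is hypothetical and only approximates how a future date range might be extended, it is not Prophet's own code.

```python
import pandas as pd

# Last month-start date in the passenger data, per the date range reported earlier
last_observed = pd.Timestamp("2025-05-01")


def next_dates(freq, periods=3):
    """Return the first `periods` dates after last_observed for a given offset."""
    candidates = pd.date_range(start=last_observed, periods=periods + 1, freq=freq)
    return candidates[candidates > last_observed][:periods]


# Month-start offsets continue the existing pattern (first day of each month)
month_start = next_dates(pd.offsets.MonthBegin())

# Month-end offsets land on the last day of each month instead
month_end = next_dates(pd.offsets.MonthEnd())

print("Month start:", list(month_start.strftime("%Y-%m-%d")))
print("Month end:  ", list(month_end.strftime("%Y-%m-%d")))
```

With a month-end frequency the first "future" date falls inside May 2025, a month that already has an observation, which is why a month-start frequency is the safer match for this dataset.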
What we gain here:
In the airline industry, knowing when most people travel is crucial. These high-traffic times, or peak travel seasons, often align with holidays, festivals, school vacations, or weather patterns. Identifying these periods helps:
In our project, we’ll use the dataset to detect the months with the highest number of passengers, giving us real insights into seasonal demand patterns. This forms the foundation for smarter forecasting, route planning, and even promotional strategies in the airline industry.
Let’s now dive into the code to find those historical high-demand months.
# Step 7.1: Identify Top 3 Historical Peak Months
# The dataset is indexed by date and includes the number of passengers per entry
# We use 'nlargest(3)' to find the top 3 records with the highest passenger counts
top_peaks = df['Passenger Count'].nlargest(3)
# Display the dates and values for these peak records
print(" Top 3 Peak Months (Historical):")
print(top_peaks)
Output:
Rank | Month | Passenger Count |
1 | 1999-08-01 | 856,501 |
2 | 1999-08-01 | 846,421 |
3 | 1999-07-01 | 792,965 |
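The top two entries share the same date (1999-08-01) because the dataset holds several records per month (for example, separate rows for different airlines, terminals, and activity types), and nlargest(3) ranks individual records rather than monthly totals. These rows are therefore the largest single records; they suggest that August 1999 was a high-traffic period, but ranking whole months requires summing the records for each month first.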
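The difference between the largest single record and the busiest month can be seen on a small made-up example. The sketch below assumes a hypothetical frame `records` with several rows per month; it sums the rows for each month before ranking.

```python
import pandas as pd

# Hypothetical records: three rows per month, indexed by month-start date
records = pd.DataFrame(
    {"Passenger Count": [900, 100, 50, 400, 400, 400, 300, 200, 100]},
    index=pd.to_datetime(
        ["2000-01-01"] * 3 + ["2000-02-01"] * 3 + ["2000-03-01"] * 3
    ),
)

# Largest single record: picks out the one big row (January, 900)
top_record = records["Passenger Count"].nlargest(1)

# Busiest month: sum all rows per month first, then rank (February, 1200)
monthly_total = records["Passenger Count"].groupby(level=0).sum()
top_month = monthly_total.nlargest(1)

print(top_record)
print(top_month)
```

On the real data, applying `nlargest(3)` to a monthly total such as the `monthly_series` built in the decomposition step would rank months rather than individual rows.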
After identifying the top 3 peak passenger months, it's useful to highlight them on a line graph to visually understand when air travel was at its highest. The following code does exactly that:
# Highlight the peak months on the original series
plt.figure(figsize=(12, 5)) # Set the figure size
plt.plot(df.index, df['Passenger Count'], label='Monthly Passengers') # Plot the full time series
plt.scatter(top_peaks.index, top_peaks.values, color='red', s=100, label='Peak Months') # Highlight top peaks
plt.title('Historical Peak Travel Months') # Title for the chart
plt.xlabel('Date') # X-axis label
plt.ylabel('Passengers') # Y-axis label
plt.legend() # Show legend
plt.grid(True) # Add grid for readability
plt.show() # Display the plot
Output:
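Individual peaks show when traffic was highest in absolute terms, but a peak season is about which calendar months tend to be busy year after year. One way to check that, sketched here on synthetic monthly totals (the values are made up for illustration), is to average by month of the year.

```python
import pandas as pd

# Hypothetical monthly totals for two years with a summer peak
idx = pd.date_range("2001-01-01", periods=24, freq="MS")
values = [50, 45, 55, 60, 65, 80, 95, 90, 70, 60, 55, 65] * 2
monthly_total = pd.Series(values, index=idx)

# Average passengers for each calendar month (1 = January, ..., 12 = December)
avg_by_calendar_month = monthly_total.groupby(monthly_total.index.month).mean()

# The three calendar months with the highest average traffic
busiest = avg_by_calendar_month.nlargest(3)
print(busiest)
```

In this toy series the busiest calendar months are July, August, and June, in that order. Running the same grouping on the real monthly totals would show which months form the peak season in the airline data.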
Airline passenger traffic doesn’t just follow random patterns; it’s heavily influenced by various external factors. Things like public holidays, seasonal breaks, long weekends, and major events (like festivals or sports tournaments) often cause spikes or dips in travel.
In this section, we’ll explore how one of the most common external factors, public holidays, affects the volume of airline passengers. By doing this, we can uncover interesting patterns and potentially improve our forecasting accuracy for future travel demand.
This step requires an additional holiday dataset; sample holiday datasets are available on Kaggle.
Understanding these external influences is super helpful for:
Let’s dive in and analyze how holidays may be driving changes in air travel!
To understand how external factors such as holidays affect passenger traffic, we’ll load a supplementary dataset that contains information about holidays.
Here is the code:
# Step 8.1: Load holiday dataset
holidays = pd.read_csv('holidays.csv', parse_dates=['date'])
# The 'parse_dates' ensures that the 'date' column is read as datetime objects
# The dataset should have the following columns:
# - 'date': Date of the holiday (format: YYYY-MM-DD)
# - 'holiday': Name of the holiday (for example, "New Year's Day")
# Display the first few rows of the dataset to confirm it loaded correctly
holidays.head()
Output:
Date | Holiday |
2012-01-02 | New Year's Day |
2012-02-20 | Family Day |
2012-04-06 | Good Friday |
2012-05-21 | Victoria Day |
2012-07-02 | Canada Day |
To analyze how holidays affect airline passenger traffic, we need to align daily holiday data with monthly passenger data. This step ensures both datasets are at the same time frequency (monthly) for proper comparison.
What This Code Does:
# Step 8.2: Prepare holiday flags at monthly level
# 1. Extract 'month' from holiday date
holidays['month'] = holidays['date'].dt.to_period('M')
# 2. Count number of holidays in each month
monthly_holidays = holidays.groupby('month')['holiday'].count().reset_index()
# 3. Rename the holiday count column
monthly_holidays = monthly_holidays.rename(columns={'holiday': 'is_holiday'})
# 4. Convert month back to datetime
monthly_holidays['month'] = monthly_holidays['month'].dt.to_timestamp()
# 5. Resample passenger count to monthly totals
df_monthly = df['Passenger Count'].resample('MS').sum().to_frame().reset_index()
# 6. Merge monthly passengers with holiday counts
df_merged = pd.merge(df_monthly, monthly_holidays,
left_on='Activity Period Start Date',
right_on='month', how='left')
# 7. Fill months without holidays with 0
df_merged['is_holiday'] = df_merged['is_holiday'].fillna(0)
# 8. Drop the extra 'month' column
df_merged = df_merged.drop(columns=['month'])
# 9. View the first few merged rows
df_merged.head()
Output:
Activity Period Start Date | Passenger Count | is_holiday |
1999-07-01 | 3,976,746 | 0.0 |
1999-08-01 | 3,972,694 | 0.0 |
1999-09-01 | 3,341,964 | 0.0 |
1999-10-01 | 3,468,846 | 0.0 |
1999-11-01 | 3,145,240 | 0.0 |
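This table shows the total passenger count for each month alongside the number of holidays that fall in it. The first rows show 0.0 holidays, most likely because the holiday file previewed above begins in 2012 and so has no entries for 1999.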
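One thing to watch for in a merged table like this is a mismatch in date coverage. If the holiday file only spans part of the passenger history, filling every unmatched month with 0 treats "no holiday data" the same as "no holidays". The sketch below uses two small hypothetical frames (`passengers` and `holiday_counts`) to show one way to restrict the analysis to the overlapping date range before filling gaps; whether this applies depends on the holiday file you actually use.

```python
import pandas as pd

# Hypothetical monthly passengers covering late 2011 to early 2012
passengers = pd.DataFrame({
    "month": pd.date_range("2011-11-01", periods=4, freq="MS"),
    "Passenger Count": [300, 320, 280, 260],
})

# Hypothetical monthly holiday counts that only begin in 2012
holiday_counts = pd.DataFrame({
    "month": pd.to_datetime(["2012-01-01", "2012-02-01"]),
    "holiday_count": [1, 1],
})

# Left merge keeps every passenger month; uncovered months get NaN
merged = passengers.merge(holiday_counts, on="month", how="left")

# Keep only months inside the holiday file's date range, then fill true gaps with 0
start, end = holiday_counts["month"].min(), holiday_counts["month"].max()
in_range = merged[merged["month"].between(start, end)].copy()
in_range["holiday_count"] = in_range["holiday_count"].fillna(0)

print(merged)
print(in_range)
```

Here the two 2011 months are dropped rather than being recorded as holiday-free, so they cannot distort any later comparison.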
Now that we have the merged dataset combining monthly passenger numbers and holiday counts, we can check whether holidays impact airline traffic.
Here’s the code that calculates the correlation:
# Step 8.3: Calculate correlation between holidays and passenger traffic
corr = df_merged['Passenger Count'].corr(df_merged['is_holiday'])
print(f"Correlation between holiday count and passenger traffic: {corr:.2f}")
Output:
Correlation between holiday count and passenger traffic: 0.20
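What This Code Does: it calls the pandas Series.corr method, which by default computes the Pearson correlation coefficient, on the monthly passenger totals and the monthly holiday counts. A value of 0.20 points to a weak positive linear relationship, so holiday counts alone explain little of the month-to-month variation in traffic.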
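To make the correlation figure less of a black box, the Pearson coefficient can be computed by hand as the sum of the products of deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below does this on two short made-up series (`holiday_count` and `passenger_total` are hypothetical) and compares the result with pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values for illustration only
holiday_count = pd.Series([0, 1, 0, 2, 1, 0])
passenger_total = pd.Series([100, 120, 90, 150, 110, 95])

# Deviations of each value from its series mean
x_dev = holiday_count - holiday_count.mean()
y_dev = passenger_total - passenger_total.mean()

# Pearson r: co-variation divided by the product of the two spreads
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev**2).sum() * (y_dev**2).sum())

# The same quantity via pandas (Pearson is the default method)
r_pandas = passenger_total.corr(holiday_count)

print(f"manual: {r_manual:.4f}, pandas: {r_pandas:.4f}")
```

The two numbers agree, and the coefficient always lies between -1 and 1. It measures only linear association, so a weak value does not rule out other kinds of relationship.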
After calculating the correlation between the number of holidays and monthly airline passenger count, it's important to visualize how they relate.
A scatter plot gives a clear view of whether more holidays in a month tend to bring more air travelers or not. This step helps analyze the influence of external temporal events (like national holidays) on airline traffic.
# Step 8.4: Scatter plot to visualize relationship
import matplotlib.pyplot as plt # Ensure this is already imported
# Create a scatter plot
plt.figure(figsize=(10, 5))
plt.scatter(df_merged['is_holiday'], df_merged['Passenger Count'], alpha=0.7)
# Add plot title and axis labels
plt.title('Passengers vs. Number of Holidays per Month')
plt.xlabel('Number of Holidays in Month')
plt.ylabel('Passenger Count')
# Add grid for better readability
plt.grid(True)
# Display the plot
plt.show()
Output:
The scatter plot titled "Passengers vs. Number of Holidays per Month" shows how airline passenger traffic varies with the number of holidays in a month. Here's what we can observe:
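A scatter plot with only a few distinct x-values can be hard to read by eye, so a small summary table is a useful companion. The sketch below groups a hypothetical frame `merged_demo` (made-up numbers using the same column names as `df_merged`) by holiday count and reports how many months fall in each group and their average traffic.

```python
import pandas as pd

# Hypothetical merged data using the same column names as df_merged
merged_demo = pd.DataFrame({
    "is_holiday": [0, 0, 1, 1, 2, 0],
    "Passenger Count": [100, 110, 120, 130, 150, 90],
})

# Number of months and mean passenger count for each holiday count
summary = merged_demo.groupby("is_holiday")["Passenger Count"].agg(["count", "mean"])
print(summary)
```

Applied to the real `df_merged`, the same grouping would show whether months with more holidays have noticeably higher average traffic, and how many months each average is based on.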
In this project, we conducted a comprehensive time series analysis of airline passenger traffic. Here's a summary of the key insights and skills we developed:
1. Data Cleaning and Preparation
2. Exploratory Data Analysis (EDA)
3. Time Series Decomposition
4. Forecasting with Prophet
5. Identifying Peak Travel Seasons
6. Analyzing External Factors (Holidays)
In short, this project showcased a full data pipeline, from raw ingestion to insights and forecasting, using scalable and industry-relevant tools, making it a strong technical case study for aspiring data scientists.
Colab Link:
https://colab.research.google.com/drive/1_ap86w1WnHy7Pcp7Zm2biMtxtWa33aPb
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...