Sales Data Analysis Project – Learn, Analyze & Drive Business Growth!
By Rohit Sharma
Updated on Jul 23, 2025 | 9 min read | 1.4K+ views
Sales data reveals buying habits, emerging trends, and unusual market activity.
This project uses Python to explore historical sales records. We plot the data, smooth daily fluctuations with moving averages, decompose monthly sales into trend and seasonality, flag unusually high sales days, and forecast future sales with Prophet. This beginner-level approach helps you understand sales patterns and make better data-driven decisions.
Before starting this project, it helps to have basic familiarity with Python and with the tools and libraries listed below.
For this Sales Data Analysis project, the following tools and libraries will be used:
Tool/Library | Purpose |
Python | Programming language |
Google Colab | Cloud-based environment to write and run code |
Pandas | Data manipulation and analysis |
NumPy | Efficient array and numerical operations |
Matplotlib / Seaborn | Data visualization and trend plotting |
Statsmodels | Time series decomposition and statistical tools |
Prophet | Sales forecasting using time-series modeling |
To analyze trends and predict upcoming sales, we will combine three straightforward time series techniques: moving averages, seasonal decomposition, and a Prophet forecasting model.
You can complete the Sales Data Analysis project in about 2 to 3 hours. It’s a great beginner-friendly, hands-on project for learning how to work with time-series data, perform trend analysis, and apply basic forecasting using Python.
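Let’s start building the project from scratch. We'll go step by step through downloading the dataset, loading it into Google Colab, cleaning the data, visualizing sales, adding moving averages, decomposing the time series, detecting anomalies, and forecasting with Prophet.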
Without any further delay, let’s get started!
To build our Sales Data Analysis model, we’ll use a sample sales dataset available on Kaggle. It contains historical sales records such as order dates, product lines, revenue, and more, perfect for practicing time-series trend analysis and forecasting.
Follow the steps below to download the dataset:
With the dataset downloaded, the next step is to upload it to Google Colab and load it.
Upload the CSV file to Google Colab using the code below:
from google.colab import files
uploaded = files.upload()
Once uploaded, use the following Python code to read and check the data:
import pandas as pd
# Read the uploaded CSV file
df = pd.read_csv('sales_data_sample.csv', encoding='ISO-8859-1')
# Show the first 5 rows
df.head()
Note: We use encoding='ISO-8859-1' because the file may contain special characters that can cause reading errors with the default encoding.
Output:
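The encoding note above can be illustrated with a small sketch. It is not part of the original notebook: the helper `read_sales_csv`, the column name `CUSTOMER`, and the two sample rows are made up for illustration. The idea is to try UTF-8 first and fall back to ISO-8859-1 only when decoding fails.

```python
import io
import pandas as pd

# Hypothetical two-row sample (not the Kaggle file). Encoded as ISO-8859-1,
# "é" and "è" become single bytes that are not valid UTF-8.
raw_bytes = "CUSTOMER,SALES\nCafé Lumière,2871.0\nNorth Traders,3746.7\n".encode("ISO-8859-1")

def read_sales_csv(source_bytes):
    """Try UTF-8 first; fall back to ISO-8859-1 if decoding fails."""
    try:
        return pd.read_csv(io.BytesIO(source_bytes), encoding="utf-8"), "utf-8"
    except UnicodeDecodeError:
        return pd.read_csv(io.BytesIO(source_bytes), encoding="ISO-8859-1"), "ISO-8859-1"

sample_df, used_encoding = read_sales_csv(raw_bytes)
print(used_encoding)
print(sample_df)
```

ISO-8859-1 maps every byte to a character, so the fallback never raises. If the file is actually in some other encoding, it will load without an error but show wrong characters, so it is worth glancing at the text columns after loading.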
We only need the ORDERDATE and SALES columns. Let’s clean the data by converting dates, grouping sales by day, and sorting them.
Here is the code:
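# Step 3: Keep only the columns we need (Date and Sales)
# .copy() gives an independent DataFrame, so the assignment below does not trigger SettingWithCopyWarning
df = df[['ORDERDATE', 'SALES']].copy()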
# Step 4: Convert the 'ORDERDATE' column to datetime format (e.g., 2/24/2003 → 2003-02-24)
df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])
# Step 5: Group sales by date (add sales from same date)
df = df.groupby('ORDERDATE').sum()
# Step 6: Sort by date (from oldest to newest)
df = df.sort_index()
# Step 7: View the first few rows
print(df.head())
Output:
ORDERDATE | SALES |
2003-01-06 | 12133.25 |
2003-01-09 | 11432.34 |
2003-01-10 | 6864.05 |
2003-01-29 | 54702.00 |
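To check that the cleaning steps behave as intended, the same operations can be run on a tiny hand-made table. This is a sketch with made-up rows, not the Kaggle data. It confirms that rows sharing a date are summed, that the result is sorted, and that no sales are lost along the way.

```python
import pandas as pd

# Hypothetical raw rows with repeated, unsorted dates
raw = pd.DataFrame({
    "ORDERDATE": ["2/24/2003", "1/6/2003", "2/24/2003", "1/6/2003"],
    "SALES": [100.0, 50.0, 25.5, 10.0],
})

clean = raw.copy()
clean["ORDERDATE"] = pd.to_datetime(clean["ORDERDATE"])  # 2/24/2003 -> 2003-02-24
daily = clean.groupby("ORDERDATE")[["SALES"]].sum().sort_index()

# Sanity checks: one row per date, oldest first, total sales unchanged
assert daily.index.is_unique
assert daily.index.is_monotonic_increasing
assert abs(daily["SALES"].sum() - raw["SALES"].sum()) < 1e-9
print(daily)
```

The same three checks can be run against the real `df` after Step 6.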
Now that our sales data is clean and sorted, our very first task is to perform sales data visualization.
Let’s create a line chart to see how sales change over time.
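What This Step Does: it plots the total sales for each order date as a line, so peaks and dips over time are easy to spot.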
Here is the code:
# Import the matplotlib library for plotting
import matplotlib.pyplot as plt
# Set the figure size (width=14, height=6) so the graph is large and clear
plt.figure(figsize=(14, 6))
# Plot sales over time; x = date, y = sales values
plt.plot(df.index, df['SALES'], color='blue', label='Daily Sales')
# Set a title for the chart
plt.title('Daily Sales Trend')
# Label for the X-axis
plt.xlabel('Date')
# Label for the Y-axis
plt.ylabel('Sales ($)')
# Add grid lines for better readability
plt.grid(True)
# Show the legend (Daily Sales label)
plt.legend()
# Adjust layout so nothing gets cut off
plt.tight_layout()
# Finally, display the plot
plt.show()
Output: a line chart titled 'Daily Sales Trend', with date on the x-axis and sales ($) on the y-axis.
Raw sales data can be noisy due to daily fluctuations. To get a smoother and clearer trend, we calculate moving averages:
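Why Use Moving Averages? Averaging over a window smooths out short-term fluctuations, so the underlying trend is easier to see.
We’ll add a 7-day moving average for the short-term trend and a 30-day moving average for the longer-term trend.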
Here is the code:
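# Add 7-day moving average (rolling mean)
# Note: window=7 averages the last 7 rows (order dates), not 7 calendar days,
# because the index only contains dates that had orders
df['7-day MA'] = df['SALES'].rolling(window=7).mean()
# Add 30-day moving average (last 30 rows, same caveat)
df['30-day MA'] = df['SALES'].rolling(window=30).mean()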
# Create a new plot
plt.figure(figsize=(12, 6))
# Plot original daily sales
plt.plot(df['SALES'], label='Original', color='lightgray')
# Plot 7-day moving average
plt.plot(df['7-day MA'], label='7-Day MA', color='blue')
# Plot 30-day moving average
plt.plot(df['30-day MA'], label='30-Day MA', color='green')
# Add a title and legend
plt.title('Sales with Moving Averages')
plt.legend()
# Show the chart
plt.show()
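Output: a line chart titled 'Sales with Moving Averages' showing the original daily sales (light gray), the 7-day moving average (blue), and the 30-day moving average (green).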
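The cleaned data has gaps between order dates (the first rows jump from 2003-01-10 to 2003-01-29), so a window of 7 rows can span much more than 7 calendar days. The sketch below uses made-up values to compare a row-based window with a time-based one. The `'7D'` offset window is a standard pandas feature, but using it here is a suggestion, not what the original notebook does.

```python
import pandas as pd

# Hypothetical daily totals with gaps between order dates
dates = pd.to_datetime(["2003-01-06", "2003-01-09", "2003-01-10", "2003-01-29", "2003-01-31"])
sales = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=dates)

# Row-based: averages the last 3 rows, however far apart their dates are
row_ma = sales.rolling(window=3).mean()

# Time-based: averages only the rows dated within the last 7 calendar days
time_ma = sales.rolling("7D").mean()

print(pd.DataFrame({"sales": sales, "row_ma_3": row_ma, "time_ma_7D": time_ma}))
```

On 2003-01-29 the row-based average still includes sales from early January, while the time-based average only sees that day's value. Which behavior is preferable depends on whether days without orders should count as zero sales.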
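Sales often have hidden patterns, such as a long-term trend, repeating seasonality, and random noise (residuals).
To see these parts clearly, we decompose the sales data using a built-in function.
What We’ll Do: convert daily sales to monthly totals, split the monthly series into trend, seasonal, and residual components, and plot each component.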
Here is the code:
# Import the seasonal decomposition function
from statsmodels.tsa.seasonal import seasonal_decompose
# Step 11: Convert daily sales to monthly total sales
# ('M' means month-end; pandas 2.2 and later prefer the alias 'ME' and warn on 'M')
monthly_sales = df['SALES'].resample('M').sum()
# Step 12: Break the monthly sales data into trend, seasonality, and noise
result = seasonal_decompose(monthly_sales, model='additive')
# Step 13: Plot the result
result.plot() # Creates 4 subplots: observed, trend, seasonal, residual
plt.suptitle('Time Series Decomposition of Monthly Sales', fontsize=16) # Add title
plt.tight_layout() # Prevents label cutoff
plt.show() # Show the chart
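Output: a four-panel chart titled 'Time Series Decomposition of Monthly Sales' showing the observed, trend, seasonal, and residual components.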
This step gives you a clear look at how your sales are changing, what patterns repeat, and what’s just noise.
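To make the additive model (observed = trend + seasonal + residual) more concrete, here is a minimal hand-rolled sketch of a classical additive decomposition using only pandas. It runs on a synthetic series, not the Kaggle data, and it illustrates the idea rather than reproducing the internals of `seasonal_decompose`.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend plus a yearly cycle
idx = pd.date_range("2003-01-01", periods=36, freq="MS")
t = np.arange(36)
y = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

# Trend: centered 2x12 moving average (12-month mean, then a 2-term mean,
# shifted back 6 months so each average is centered on its month)
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centered on zero
detrended = y - trend
month_means = detrended.groupby(detrended.index.month).mean()
month_means -= month_means.mean()
seasonal = pd.Series(month_means.loc[y.index.month].to_numpy(), index=y.index)

# Residual: what remains after removing trend and seasonality
resid = y - trend - seasonal

parts = pd.DataFrame({"observed": y, "trend": trend, "seasonal": seasonal, "resid": resid})
print(parts.round(2).head(12))
```

The trend is undefined for the first and last six months because the centered average needs data on both sides. This is also why a decomposition with a 12-month period needs at least two full years of monthly data.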
Next, let's look at sales anomaly detection.
Sometimes, sales spike unexpectedly due to special events, promotions, or errors. These unusual days are called anomalies.
What We’ll Do: compute a threshold two standard deviations above mean daily sales, flag the days that exceed it, and mark them on a chart.
Here is the code:
# Step 14: Calculate threshold for anomaly detection
mean_sales = df['SALES'].mean() # Average sales
std_sales = df['SALES'].std() # Spread/variation in sales
threshold = mean_sales + 2 * std_sales # Set threshold at 2 standard deviations above mean
# Step 15: Identify days where sales exceed this threshold
anomalies = df[df['SALES'] > threshold] # These are the anomaly days
# Step 16: Visualize daily sales and mark anomalies
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['SALES'], label='Daily Sales', color='lightgray') # Main sales line
plt.scatter(anomalies.index, anomalies['SALES'], color='red', label='Anomalies', zorder=5) # Red dots for anomalies
# Add horizontal line for the anomaly threshold
plt.axhline(y=threshold, color='orange', linestyle='--', label='Anomaly Threshold')
plt.title('Anomaly Detection in Sales')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# Optional: Display the dates and values of anomalies
print("Unusual Sales Days:")
print(anomalies[['SALES']])
Output:
ORDERDATE | SALES |
2003-11-06 | 114,456.85 |
2003-11-12 | 111,156.73 |
2003-11-14 | 131,236.00 |
2003-11-20 | 95,344.63 |
2003-12-02 | 109,432.27 |
2004-08-20 | 96,139.50 |
2004-09-08 | 93,717.43 |
2004-10-16 | 103,815.53 |
2004-10-22 | 96,850.65 |
2004-11-04 | 105,074.98 |
2004-11-05 | 106,240.69 |
2004-11-17 | 97,958.38 |
2004-11-24 | 137,644.72 |
2004-12-10 | 93,587.56 |
2005-02-17 | 92,236.97 |
This helps you spot outliers in your data, days that performed way above normal, which may need special attention or deeper analysis.
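The threshold rule above is equivalent to flagging days whose z-score, (value - mean) / standard deviation, is greater than 2. The sketch below shows this on a made-up series. The check for unusually low days is an optional extension, not something the original code does.

```python
import pandas as pd

# Hypothetical daily totals with one obvious spike
dates = pd.date_range("2004-11-01", periods=10, freq="D")
sales = pd.Series([100, 110, 95, 105, 100, 98, 102, 400, 101, 99], index=dates, dtype=float)

# z-score: how many standard deviations each day sits from the mean
z = (sales - sales.mean()) / sales.std()

high = sales[z > 2]    # same days as sales > mean + 2 * std
low = sales[z < -2]    # optional: unusually low days

print(high)
print(low)
```

A single large spike inflates both the mean and the standard deviation, so very extreme days can hide smaller anomalies. For noisy data, the cutoff of 2 is a starting point to tune, not a fixed rule.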
Next, let's forecast sales with Prophet.
We'll now use Prophet (originally released as Facebook Prophet), a time series forecasting tool developed by Meta. In this step, we prepare monthly sales totals in the format Prophet expects, ready for fitting a model and forecasting future sales.
Here is the code:
# STEP 1: Install Prophet if you haven't already
!pip install prophet
# STEP 2: Import Prophet
from prophet import Prophet
# STEP 3: Prepare monthly data (we'll forecast based on monthly sales totals)
monthly_df = df['SALES'].resample('M').sum().reset_index()
# STEP 4: Prophet expects columns named 'ds' for date and 'y' for value
monthly_df.columns = ['ds', 'y']
# Let's see the first few rows
monthly_df.head()
Output:
ds | y |
2003-01-31 | 112441.82 |
2003-02-28 | 248348.12 |
2003-03-31 | 289763.94 |
2003-04-30 | 297828.61 |
2003-05-31 | 303148.69 |
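Newer pandas versions warn about the `'M'` resampling alias, so one alternative is to group by calendar-month periods instead. The sketch below uses made-up daily totals and is only one possible approach. Note two differences from the code above: the dates land on the first day of each month instead of the last, and months with no orders are omitted instead of appearing as 0.

```python
import pandas as pd

# Hypothetical daily totals indexed by order date (same layout as df above)
daily = pd.DataFrame(
    {"SALES": [100.0, 50.0, 25.0, 10.0]},
    index=pd.to_datetime(["2003-01-06", "2003-01-29", "2003-02-11", "2003-02-24"]),
)
daily.index.name = "ORDERDATE"

# Group by calendar month using periods, then convert periods to timestamps
monthly = daily["SALES"].groupby(daily.index.to_period("M")).sum()
monthly.index = monthly.index.to_timestamp()  # first day of each month

# Prophet expects a 'ds' date column and a 'y' value column
monthly_df = monthly.rename("y").rename_axis("ds").reset_index()
print(monthly_df)
```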
Now that we have our monthly sales data ready, let’s move ahead with forecasting the next 12 months.
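What We'll Do: create and fit a Prophet model, extend the timeline by 12 months, and predict sales for every date, past and future.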
Here is the code:
# STEP 6: Create and fit the model
model = Prophet() # Initialize Prophet model
model.fit(monthly_df) # Fit the model to our sales data
# STEP 7: Create future dates (12 more months from the last date in the dataset)
# (freq='M' means month-end; pandas 2.2 and later prefer 'ME' and warn on 'M')
future = model.make_future_dataframe(periods=12, freq='M') # Add 12 future months
# STEP 8: Predict future sales using the trained model
forecast = model.predict(future) # Generate forecast for all dates (past + future)
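The `forecast` DataFrame covers both historical and future dates. Among its columns are `ds`, `yhat` (the prediction), and `yhat_lower` / `yhat_upper` (the uncertainty interval). To look only at the new months, keep the rows dated after the last observed month. The sketch below uses small stand-in frames with made-up numbers so that it runs without Prophet installed; with the real objects, the same filter applies to `forecast` and `monthly_df`.

```python
import pandas as pd

# Stand-in for the history passed to Prophet (values are made up)
history = pd.DataFrame({
    "ds": pd.to_datetime(["2005-03-31", "2005-04-30", "2005-05-31"]),
    "y": [300.0, 320.0, 310.0],
})

# Stand-in for Prophet's output; the real frame has many more columns
forecast_like = pd.DataFrame({
    "ds": pd.to_datetime(["2005-03-31", "2005-04-30", "2005-05-31", "2005-06-30", "2005-07-31"]),
    "yhat": [305.0, 315.0, 312.0, 325.0, 330.0],
    "yhat_lower": [280.0, 290.0, 287.0, 295.0, 298.0],
    "yhat_upper": [330.0, 340.0, 337.0, 355.0, 362.0],
})

# Keep only rows dated after the last observed month
last_observed = history["ds"].max()
future_only = forecast_like.loc[
    forecast_like["ds"] > last_observed,
    ["ds", "yhat", "yhat_lower", "yhat_upper"],
]
print(future_only)
```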
Let’s visualize the predicted sales values using Prophet’s built-in plot.
Here is the code:
# STEP 9: Plot the forecast
fig = model.plot(forecast) # Prophet's built-in forecast plot
plt.title("Monthly Sales Forecast") # Add title
plt.xlabel("Date") # X-axis label
plt.ylabel("Sales") # Y-axis label
plt.grid(True) # Add grid for clarity
plt.show() # Show the plot
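Output: Prophet's forecast plot, titled "Monthly Sales Forecast", with date on the x-axis and sales on the y-axis.
This chart shows the observed monthly sales as black dots, the forecast as a blue line, and the uncertainty interval as a shaded band around the line.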
In this project, we analyzed sales data to identify patterns, trends, and deviations. After cleaning the data, we visualized it, smoothed it with moving averages, decomposed monthly sales into trend and seasonality, flagged anomalies using a threshold of two standard deviations above the mean, and used Prophet to forecast sales. This hands-on project offers an introduction to time series analysis and forecasting, and to how both support data-driven business decisions.
Reference:
https://colab.research.google.com/drive/1mqhEBihgCoRVZOUNBZLU5dEley7Ff8PM?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...