Sales Data Analysis Project – Learn, Analyze & Drive Business Growth!

By Rohit Sharma

Updated on Jul 23, 2025

Sales data reveals buying habits and emerging trends, and it can expose unusual market activity.

This project uses Python to explore historical sales records. It applies moving averages and seasonal decomposition to reveal patterns, flags unusual sales spikes, and uses Prophet to forecast future sales. The beginner-level approach helps you understand sales patterns and make better decisions from data.

What Should You Know Beforehand?

It’s helpful to have some basic knowledge of the following before starting this project:

  • Python programming (variables, functions, loops, basic syntax)
  • Pandas and NumPy (for handling and analyzing data)
  • Matplotlib or Seaborn (for creating charts and visualizing trends)
  • Date and time handling (using datetime in pandas)
  • Basic understanding of time series (optional, but useful for forecasting)

Technologies and Libraries Used

For this Sales Data Analysis project, the following tools and libraries will be used:

Tool/Library            Purpose
Python                  Programming language
Google Colab            Cloud-based environment to write and run code
Pandas                  Data manipulation and analysis
NumPy                   Efficient array and numerical operations
Matplotlib / Seaborn    Data visualization and trend plotting
Statsmodels             Time series decomposition and statistical tools
Prophet                 Sales forecasting using time-series modeling

Models That Will Be Utilized for Learning

To analyze trends and predict upcoming sales, we will use three straightforward yet effective time series techniques:

  • Moving Averages: This method averages recent values to reduce short-term noise and reveal longer-term patterns and cycles in sales data.
  • Seasonal Decomposition (using Statsmodels): This method splits sales data into three components (trend, seasonal variation, and residual error) so that recurring patterns can be analyzed separately.
  • Prophet (by Meta/Facebook): This time-series forecasting tool provides an easy entry point for users who are new to forecasting.
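
To make the moving-average idea concrete, here is a minimal sketch on made-up numbers (the values and the window size of 3 are assumptions chosen for illustration, not part of the project dataset). It compares pandas' rolling mean with the same average computed by hand.

```python
import pandas as pd

# Toy sales figures (made up for illustration)
sales = pd.Series([10, 20, 30, 40, 50])

# 3-point moving average using pandas
ma = sales.rolling(window=3).mean()

# The same value by hand for the last position: mean of the 3 most recent values
by_hand = (sales.iloc[2] + sales.iloc[3] + sales.iloc[4]) / 3

print(ma.tolist())  # the first two entries are NaN: not enough history yet
print(by_hand)
```

The first entries are empty because a window of 3 needs three values before it can produce an average; the same happens with the 7- and 30-point windows used later.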

Time Taken and Difficulty

You can complete the Sales Data Analysis project in about 2 to 3 hours. It’s a great beginner-friendly, hands-on project for learning how to work with time-series data, perform trend analysis, and apply basic forecasting using Python. 

How to Build a Sales Data Analysis Model

Let’s start building the project from scratch. We'll go step-by-step through the process of:

  1. Loading the sales dataset
  2. Cleaning and preparing the data
  3. Visualizing trends and seasonal patterns
  4. Detecting anomalies
  5. Forecasting future sales using Prophet

Without any further delay, let’s get started!

Step 1: Download the Dataset

To build our Sales Data Analysis model, we’ll use a sample sales dataset available on Kaggle. It contains historical sales records such as order dates, product lines, revenue, and more, perfect for practicing time-series trend analysis and forecasting.

Follow the steps below to download the dataset:

  1. Open a new tab in your web browser.
  2. Go to: https://www.kaggle.com/datasets/kyanyoga/sample-sales-data
  3. Click the Download button to download the dataset as a .zip file.
  4. Once downloaded, extract the ZIP file. You’ll find a file named sales_data_sample.csv.
  5. We’ll use this CSV file for the project.

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.

Step 2: Upload and Read the Dataset in Google Colab

Now that you have downloaded the CSV file, upload it to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once uploaded, use the following Python code to read and check the data:

import pandas as pd

# Read the uploaded CSV file
df = pd.read_csv('sales_data_sample.csv', encoding='ISO-8859-1')

# Show the first 5 rows
df.head()

Note: We use encoding='ISO-8859-1' because the file may contain special characters that can cause reading errors with the default encoding.
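
The effect of the encoding argument can be seen in a small, self-contained sketch. The CSV content, column names, and customer name below are made up for illustration; the only point is that a byte sequence written as ISO-8859-1 may not be valid UTF-8, which is pandas' default.

```python
import io
import pandas as pd

# A tiny made-up CSV, encoded as ISO-8859-1, containing accented characters
raw = "CUSTOMERNAME,SALES\nCafé Lumière,2871.0\n".encode("ISO-8859-1")

# Reading with the default encoding (UTF-8) fails on these bytes
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Reading with the matching encoding succeeds
df_demo = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")

print(utf8_ok)
print(df_demo)
```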

Output: a table showing the first five rows of the dataset.

Step 3: Clean and Prepare the Data

We only need the ORDERDATE and SALES columns. Let’s clean the data by converting dates, grouping sales by day, and sorting them.

Here is the code:

# Keep only the columns we need (date and sales)
# .copy() avoids pandas' SettingWithCopyWarning when we modify the column below
df = df[['ORDERDATE', 'SALES']].copy()

# Convert the 'ORDERDATE' column to datetime format (e.g., 2/24/2003 → 2003-02-24)
df['ORDERDATE'] = pd.to_datetime(df['ORDERDATE'])

# Group sales by date (add up sales from the same date)
df = df.groupby('ORDERDATE').sum()

# Sort by date (from oldest to newest)
df = df.sort_index()

# View the first few rows
print(df.head())

Output:

ORDERDATE      SALES
2003-01-06  12133.25
2003-01-09  11432.34
2003-01-10   6864.05
2003-01-29  54702.00
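
What the grouping step does is easier to see on a handful of rows. The sketch below uses made-up order lines (not taken from the dataset) in which two orders share a date. It passes an explicit date format as a precaution, which assumes dates look like 2/24/2003; adjust it if your column also carries a time.

```python
import pandas as pd

# Made-up order lines: two orders fall on the same date
orders = pd.DataFrame({
    "ORDERDATE": ["2/24/2003", "2/24/2003", "1/6/2003"],
    "SALES": [100.0, 50.0, 75.0],
})

# Parse the strings as month/day/year
orders["ORDERDATE"] = pd.to_datetime(orders["ORDERDATE"], format="%m/%d/%Y")

# One row per date: same-day sales are added together, then sorted oldest first
daily = orders.groupby("ORDERDATE")["SALES"].sum().sort_index()

print(daily)
```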

Now that our sales data is clean and sorted, the next task is to visualize it.

Step 4: Visualize Daily Sales

Let’s create a line chart to see how sales change over time.

What This Step Does:

  • Plots daily sales to show overall trends.
  • Helps identify patterns, spikes, or drops in sales.
  • Makes the data easier to understand visually.

Here is the code:

# Import the matplotlib library for plotting
import matplotlib.pyplot as plt
# Set the figure size (width=14, height=6) so the graph is large and clear
plt.figure(figsize=(14, 6))

# Plot sales over time; x = date, y = sales values
plt.plot(df.index, df['SALES'], color='blue', label='Daily Sales')

# Set a title for the chart
plt.title('Daily Sales Trend')

# Label for the X-axis
plt.xlabel('Date')

# Label for the Y-axis
plt.ylabel('Sales ($)')

# Add grid lines for better readability
plt.grid(True)

# Show the legend (Daily Sales label)
plt.legend()

# Adjust layout so nothing gets cut off
plt.tight_layout()

# Finally, display the plot
plt.show()

Output: a line chart of daily sales over time.

Step 5: Apply Moving Averages

Raw sales data can be noisy due to daily fluctuations. To get a smoother and clearer trend, we calculate moving averages:

 Why Use Moving Averages?

  •  Smooths short-term ups and downs.
  •  Helps reveal longer-term trends.
  •  Easier to analyze than raw sales data.

We’ll add:

  •  7-day Moving Average – short-term trend
  •  30-day Moving Average – long-term trend

Here is the code:

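# Add 7-day moving average (rolling mean)
# Note: window=7 averages the last 7 rows (order dates); this equals
# 7 calendar days only when no dates are missing from the index
df['7-day MA'] = df['SALES'].rolling(window=7).mean()

# Add 30-day moving average (same caveat: the last 30 rows)
df['30-day MA'] = df['SALES'].rolling(window=30).mean()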

# Create a new plot
plt.figure(figsize=(12, 6))

# Plot original daily sales
plt.plot(df['SALES'], label='Original', color='lightgray')

# Plot 7-day moving average
plt.plot(df['7-day MA'], label='7-Day MA', color='blue')

# Plot 30-day moving average
plt.plot(df['30-day MA'], label='30-Day MA', color='green')

# Add a title and legend
plt.title('Sales with Moving Averages')
plt.legend()

# Show the chart
plt.show()

Output: a line chart of daily sales overlaid with the 7-day and 30-day moving averages.
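
One detail worth knowing: an integer window counts rows, not calendar days, and the cleaned data has gaps between order dates. As a hedged alternative, pandas also accepts a time-based window such as "7D" when the index is a DatetimeIndex. The sketch below contrasts the two on made-up values with irregular dates; it is an illustration, not a replacement for the code above.

```python
import pandas as pd

# Made-up sales on irregular dates (note the gap between Jan 10 and Jan 29)
idx = pd.to_datetime(["2003-01-06", "2003-01-09", "2003-01-10", "2003-01-29"])
s = pd.Series([100.0, 200.0, 300.0, 400.0], index=idx)

# Row-based window: averages the last 3 rows, however far apart they are
row_ma = s.rolling(window=3).mean()

# Time-based window: averages only the rows within the last 7 calendar days
time_ma = s.rolling("7D").mean()

print(row_ma)
print(time_ma)
```

On Jan 29 the row-based average still mixes in values from early January, while the time-based one only sees that day's sale.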

Step 6: Decompose Time Series for Seasonality

Sales often have hidden patterns like:

  •  Overall growth or decline (trend)
  •  Regular ups and downs (seasonality)
  •  Random noise (residual)

To see these parts clearly, we decompose the sales data using a built-in function.

 What We’ll Do:

  •  Resample daily sales into monthly totals.
  •  Use seasonal_decompose() to break the time series into:
    • Trend (long-term direction)
    • Seasonality (repeating patterns)
    • Residual (noise or error)

Here is the code:

# Import the seasonal decomposition function
from statsmodels.tsa.seasonal import seasonal_decompose

# Convert daily sales to monthly total sales
# (pandas 2.2+ deprecates the 'M' alias; use 'ME' for month end there)
monthly_sales = df['SALES'].resample('M').sum()

# Break the monthly sales data into trend, seasonality, and noise
result = seasonal_decompose(monthly_sales, model='additive')

# Plot the result
result.plot()  # Creates 4 subplots: observed, trend, seasonal, residual
plt.suptitle('Time Series Decomposition of Monthly Sales', fontsize=16)  # Add title
plt.tight_layout()  # Prevents label cutoff
plt.show()  # Show the chart

Output: a four-panel chart showing the observed, trend, seasonal, and residual series.

This step gives you a clear look at how your sales are changing, what patterns repeat, and what’s just noise.
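
To show what "additive" means here (observed = trend + seasonal + residual), the following is a hand-rolled sketch of a classical additive decomposition using only pandas and NumPy. It is a simplified illustration on synthetic data, not the statsmodels implementation, and the series, pattern, and variable names are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + a fixed 12-month pattern (no noise)
idx = pd.date_range("2003-01-31", periods=36, freq=pd.offsets.MonthEnd())
pattern = np.array([-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20], dtype=float)
y = pd.Series(100 + 5 * np.arange(36) + np.tile(pattern, 3), index=idx)

# Trend: centered 12-month moving average (a 2x12 average, because 12 is even)
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centered on zero
detrended = y - trend
seasonal = detrended.groupby(detrended.index.month).transform("mean")
seasonal = seasonal - seasonal.iloc[:12].mean()

# Residual: whatever trend and seasonality do not explain
resid = y - trend - seasonal

print(resid.dropna().abs().max())  # close to zero for this noise-free series
```

The trend (and therefore the residual) is undefined for the first and last six months, because a centered 12-month average needs data on both sides.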

Next, let's look at the anomaly detection part of this project.

Step 7: Detect Anomalies

Sometimes, sales spike unexpectedly due to special events, promotions, or errors. These unusual days are called anomalies.

 What We’ll Do:

  • Calculate a threshold: average sales + 2 × standard deviation.
  • Find days where sales go above this threshold.
  • Highlight those unusual sales spikes on a graph.

Here is the code:

# Calculate threshold for anomaly detection
mean_sales = df['SALES'].mean()  # Average sales
std_sales = df['SALES'].std()    # Spread/variation in sales
threshold = mean_sales + 2 * std_sales  # Set threshold at 2 standard deviations above mean

# Identify days where sales exceed this threshold
anomalies = df[df['SALES'] > threshold]  # These are the anomaly days

# Visualize daily sales and mark anomalies
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['SALES'], label='Daily Sales', color='lightgray')  # Main sales line
plt.scatter(anomalies.index, anomalies['SALES'], color='red', label='Anomalies', zorder=5)  # Red dots for anomalies

# Add horizontal line for the anomaly threshold
plt.axhline(y=threshold, color='orange', linestyle='--', label='Anomaly Threshold')

plt.title('Anomaly Detection in Sales')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

# Optional: Display the dates and values of anomalies
print("Unusual Sales Days:")
print(anomalies[['SALES']])

Output:

ORDERDATE  SALES
2003-11-06 114,456.85
2003-11-12 111,156.73
2003-11-14 131,236.00
2003-11-20 95,344.63
2003-12-02 109,432.27
2004-08-20 96,139.50
2004-09-08 93,717.43
2004-10-16 103,815.53
2004-10-22 96,850.65
2004-11-04 105,074.98
2004-11-05 106,240.69
2004-11-17 97,958.38
2004-11-24 137,644.72
2004-12-10 93,587.56
2005-02-17 92,236.97

This helps you spot outliers in your data, days that performed way above normal, which may need special attention or deeper analysis.
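
The threshold rule can be checked by hand on a small example. The numbers below are made up (ten ordinary days and one spike); the sketch only illustrates the mean + 2 × standard deviation rule used above.

```python
import pandas as pd

# Made-up daily sales: ten ordinary days and one spike
daily = pd.Series([100, 110, 90, 105, 95, 100, 102, 98, 104, 96, 300], dtype=float)

# pandas' std() uses the sample formula (ddof=1) by default
threshold = daily.mean() + 2 * daily.std()

# Days above the threshold are flagged
spikes = daily[daily > threshold]

print(round(threshold, 2))
print(spikes)
```

Note that the spike itself pulls the mean and standard deviation upward, so on short series a single extreme day can raise the threshold considerably.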

Next, let's see how to forecast sales using Prophet.

Step 8: Forecast Sales Using Prophet

We'll now use Facebook Prophet, a powerful time series forecasting tool developed by Meta. This step will help us:

  • Forecast the next 12 months of sales
  • View a graph of future predictions with uncertainty ranges
  • Visualize trends and seasonal patterns clearly

What We'll Do:

  • Prepare monthly sales data for Prophet
  • Rename columns as required (ds for date and y for value)
  • Fit the Prophet model
  • Forecast for the next 12 months
  • Plot forecast and seasonality

Here is the code:

# STEP 1: Install Prophet if you haven't already
!pip install prophet

# STEP 2: Import Prophet
from prophet import Prophet

# STEP 3: Prepare monthly data (we'll forecast based on monthly sales totals)
monthly_df = df['SALES'].resample('M').sum().reset_index()

# STEP 4: Prophet expects columns named 'ds' for date and 'y' for value
monthly_df.columns = ['ds', 'y']

# Let's see the first few rows
monthly_df.head()

Output:

ds         y
2003-01-31 112441.82
2003-02-28 248348.12
2003-03-31 289763.94
2003-04-30 297828.61
2003-05-31 303148.69
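
The reshaping into ds and y columns can be tried without installing Prophet. The sketch below uses made-up daily totals and passes a MonthEnd offset object to resample, which is one way (an assumption on our part, not the approach in the code above) to avoid depending on the 'M' or 'ME' string alias across pandas versions.

```python
import pandas as pd

# Made-up daily totals spanning two months
idx = pd.to_datetime(["2003-01-06", "2003-01-29", "2003-02-11"])
daily = pd.DataFrame({"SALES": [100.0, 50.0, 75.0]}, index=idx)
daily.index.name = "ORDERDATE"

# Sum each month's sales, labeled by the month's last day
monthly = daily["SALES"].resample(pd.offsets.MonthEnd()).sum().reset_index()

# Rename to the column names Prophet expects
monthly.columns = ["ds", "y"]

print(monthly)
```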

Train the Prophet Model and Forecast Future Sales

Now that we have our monthly sales data ready, let’s move ahead with forecasting the next 12 months.

What We'll Do:

  • Create and train a Prophet model using historical sales
  • Generate future dates for the next 12 months
  • Predict sales for those future dates using the trained model

Here is the code:

# Create and fit the model
model = Prophet()               # Initialize Prophet model
model.fit(monthly_df)           # Fit the model to our sales data

# Create future dates (12 more months from the last date in the dataset)
future = model.make_future_dataframe(periods=12, freq='M')  # Add 12 future months

# Predict future sales using the trained model
forecast = model.predict(future)  # Generate forecast for all dates (past + future)
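
To see what "12 future months" looks like as dates, here is a rough pandas-only approximation of the future portion of that dataframe. The last history date below is an assumed example value standing in for monthly_df['ds'].max(), and the sketch does not call Prophet itself.

```python
import pandas as pd

# Assumed example value; in the project this would be monthly_df['ds'].max()
last_date = pd.Timestamp("2005-05-31")

# 13 month-end stamps starting at last_date, then drop the first one,
# since it is already part of the history
future_dates = pd.date_range(
    start=last_date, periods=13, freq=pd.offsets.MonthEnd()
)[1:]

print(future_dates[0], future_dates[-1], len(future_dates))
```

In the real dataframe returned by make_future_dataframe, these future dates are appended after the historical ones, which is why the forecast covers both past and future.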

Plot the Sales Forecast

Let’s visualize the predicted sales values using Prophet’s built-in plot.

Here is the code: 

# Plot the forecast
fig = model.plot(forecast)          # Prophet's built-in forecast plot
plt.title("Monthly Sales Forecast") # Add title
plt.xlabel("Date")                  # X-axis label
plt.ylabel("Sales")                 # Y-axis label
plt.grid(True)                      # Add grid for clarity
plt.show()                          # Show the plot

Output:

This chart shows:

  • Historical sales (black dots)
  • Forecasted sales (blue line)
  • Uncertainty interval (shaded light-blue region; Prophet uses an 80% interval by default)
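
The same information is available as numbers. Prophet's forecast dataframe includes the columns ds, yhat, yhat_lower, and yhat_upper; the sketch below uses a hand-made stand-in frame with made-up values (and an assumed last history date) to show how the future rows and their interval bounds could be selected.

```python
import pandas as pd

# Hand-made stand-in for a Prophet forecast frame (all values are made up)
forecast_demo = pd.DataFrame({
    "ds": pd.to_datetime(["2005-04-30", "2005-05-31", "2005-06-30", "2005-07-31"]),
    "yhat": [300.0, 310.0, 320.0, 330.0],
    "yhat_lower": [280.0, 288.0, 295.0, 300.0],
    "yhat_upper": [320.0, 332.0, 345.0, 360.0],
})

# Assumed end of the training data
last_history_date = pd.Timestamp("2005-05-31")

# Keep only rows after the history ends, with the point forecast and its bounds
future_only = forecast_demo.loc[
    forecast_demo["ds"] > last_history_date,
    ["ds", "yhat", "yhat_lower", "yhat_upper"],
]

print(future_only)
```

With the real forecast dataframe, the same selection would list the 12 predicted months alongside the lower and upper edges of the shaded region.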

Conclusion

This project analyzed sales data to identify patterns, trends, and deviations. After cleaning the data, we visualized it, smoothed it with moving averages, decomposed it into trend and seasonal components, flagged anomalies with a mean-plus-two-standard-deviations threshold, and used Prophet to forecast sales. Along the way, we gained an introductory understanding of time series analysis and forecasting, and of how they support data-driven business decisions.

Reference:
https://colab.research.google.com/drive/1mqhEBihgCoRVZOUNBZLU5dEley7Ff8PM?usp=sharing
