
Demand Forecasting for E-commerce Using Python (Machine Learning Project)

Updated on Jul 29, 2025

Table of Contents


What Should You Know to Build This Project Successfully?
Tools That Power the Forecast: Tech Stack and Libraries Explained
How Long Does It Take and What to Expect
Smart Forecasting: Techniques That Drive E-commerce Demand Prediction
How to Build an E-commerce Demand Forecasting Model
Conclusion

Demand forecasting for e-commerce is a critical application of data science that helps online retailers optimize stock levels, reduce wastage, and meet customer demand efficiently.

In this project, you'll use Python to analyze historical sales data, identify trends and seasonality, and build a machine learning model to forecast future demand. This hands-on approach will help you master time series analysis and regression techniques tailored for real-world e-commerce scenarios.


What Should You Know to Build This Project Successfully?

It’s helpful to have some basic knowledge of the following before starting this project:

Python programming (variables, functions, loops, basic syntax)
Pandas and NumPy (data loading, cleaning, and numerical operations)
Matplotlib or Seaborn (data visualization)
Scikit-learn basics (how to train a regression model, make predictions, and evaluate performance with metrics such as MAE, RMSE, and R²)
Intro to time series concepts (the basics of trend, seasonality, and autocorrelation, which help when preparing data for forecasting models)


Tools That Power the Forecast: Tech Stack and Libraries Explained

To build this e-commerce demand forecasting model, you'll work with the following Python tools and libraries for data analysis, modeling, and visualization:

Tool / Library	Purpose
Python	The main programming language used to build the model
Google Colab	Cloud-based environment for writing, running, and sharing code
Pandas	For reading data, cleaning missing values, and wrangling time series
NumPy	Performs fast numerical operations and array manipulations
Matplotlib / Seaborn	Helps visualize trends, patterns, and actual vs. predicted values
Scikit-learn	Trains the regression model and evaluates its performance
Datetime / statsmodels	Deals with timestamps and enhances forecasting with statistical tools


How Long Does It Take and What to Expect

You can finish this e-commerce demand forecasting project in 3 to 4 hours. It’s well suited to beginner and intermediate learners.

Smart Forecasting: Techniques That Drive E-commerce Demand Prediction

To build a reliable demand forecasting model for e-commerce, you'll apply key techniques that help uncover patterns in historical sales and predict future demand:

Linear Regression: Predict future product demand from past sales data.
Time Series Analysis (Lag Features, Rolling Means): Use lag values and rolling averages to capture trends and seasonality.

Together, these techniques turn historical sales into data-driven forecasts for inventory planning.
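As a sketch of how the two techniques fit together: the model trained later regresses each day's total quantity y_t on calendar features, lag features, and a 7-day rolling-mean feature (written here as \bar{y}^{(7)}_t), and scikit-learn's LinearRegression chooses the coefficients β by ordinary least squares. The feature names are the columns created in Step 5; the notation is our own shorthand.

```latex
\begin{aligned}
\hat{y}_t &= \beta_0 + \beta_1\,\mathrm{dayofweek}_t + \beta_2\,\mathrm{month}_t + \beta_3\,\mathrm{year}_t + \beta_4\,\mathrm{dayofyear}_t
           + \beta_5\,y_{t-1} + \beta_6\,y_{t-7} + \beta_7\,\bar{y}^{(7)}_t \\
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}} \sum_{t} \left( y_t - \hat{y}_t \right)^2
\end{aligned}
```

Because the model is linear in these features, each coefficient can be read as the change in predicted daily quantity per unit change in that feature, holding the others fixed.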


How to Build an E-commerce Demand Forecasting Model

Let’s build this project from scratch with clear, step-by-step guidance:

Download the Dataset
Upload and Read the Dataset in Google Colab
Clean and Preprocess the Data
Engineer Features and Create the Daily Time Series
Create Time-Based Features for the Model
Define Features and Target
Split the Data Chronologically and Train the Linear Regression Model
Evaluate the Model
Visualize the Predictions

Without any further delay, let’s get started!

Step 1: Download the Dataset

Download the e-commerce transactions dataset from Kaggle and extract the ZIP file. The code in this tutorial reads the file as Dataset.csv, so rename the extracted CSV if needed.

With the dataset downloaded, the next step is to upload it to Google Colab and load it.

Step 2: Upload and Read the Dataset in Google Colab

Upload the downloaded file to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once it is uploaded, use the following Python code to import the required libraries, then read and inspect the data:

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

# --- 1. Data Loading and Initial Exploration ---
try:
    df = pd.read_csv('Dataset.csv', encoding='ISO-8859-1')
except FileNotFoundError:
    # Raise a clear error so that later cells do not run without the data
    raise FileNotFoundError("'Dataset.csv' not found. Please upload it to the current working directory.")
print("--- Initial Data Overview ---")
print("First 5 rows of the dataset:")
print(df.head())
print("\nDataset Information:")
df.info()

Output:

--- Initial Data Overview ---
First 5 rows of the dataset:
  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6
1    536365     71053                  WHITE METAL LANTERN         6
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6

        InvoiceDate  UnitPrice  CustomerID         Country
0  01-12-2010 08:26       2.55     17850.0  United Kingdom
1  01-12-2010 08:26       3.39     17850.0  United Kingdom
2  01-12-2010 08:26       2.75     17850.0  United Kingdom
3  01-12-2010 08:26       3.39     17850.0  United Kingdom
4  01-12-2010 08:26       3.39     17850.0  United Kingdom

Dataset Information:
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype
---  ------       --------------   -----
 0   InvoiceNo    541909 non-null  object
 1   StockCode    541909 non-null  object
 2   Description  540455 non-null  object
 3   Quantity     541909 non-null  int64
 4   InvoiceDate  541909 non-null  object
 5   UnitPrice    541909 non-null  float64
 6   CustomerID   406829 non-null  float64
 7   Country      541909 non-null  object
dtypes: float64(2), int64(1), object(5)

Step 3: Data Cleaning and Preprocessing

We’ll clean the dataset by dropping rows with a missing CustomerID, converting InvoiceDate to a datetime type, and removing returns/cancellations (negative quantities) and entries with a zero or negative unit price.

Here is the code for this step:

# Drop rows with missing CustomerID
df.dropna(subset=['CustomerID'], inplace=True)
print(f"\nDropped rows with missing CustomerID. New shape: {df.shape}")

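# Convert InvoiceDate to datetime. Dates are stored day-first (e.g. '01-12-2010 08:26' is
# 1 December 2010), so pass dayfirst=True; otherwise pandas reads them month-first.
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'], format='mixed', dayfirst=True)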

# Remove returns/cancellations (Quantity < 0)
df = df[df['Quantity'] > 0]
print(f"Removed returned items. New shape: {df.shape}")

# Remove zero or negative UnitPrice entries
df = df[df['UnitPrice'] > 0]
print(f"Removed items with zero or negative price. New shape: {df.shape}")

Output:

Dropped rows with missing CustomerID. New shape: (406829, 8)

Removed returned items. New shape: (397924, 8)

Removed items with zero or negative price. New shape: (397884, 8)
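It is worth checking how InvoiceDate is parsed. The strings are day-first: the first invoice, 01-12-2010 08:26, was placed on 1 December 2010. pandas reads ambiguous strings month-first unless told otherwise, and the aggregated output in Step 4 begins on 2010-01-12, which is the month-first reading of that timestamp. The sketch below uses two sample strings in the same layout (pandas 2.0 or later is assumed, since format="mixed" was added there). A string whose first number is above 12, such as 13-12-2010, can only be read as a day, so the swap tends to scramble just the first twelve days of each month, which makes it easy to miss.

```python
import pandas as pd

# Two timestamps in the same DD-MM-YYYY HH:MM layout as the InvoiceDate column
raw = pd.Series(["01-12-2010 08:26", "13-12-2010 09:00"])

# Default parsing treats ambiguous dates as month-first: "01-12-2010" becomes 12 January 2010
month_first = pd.to_datetime(raw, format="mixed")

# dayfirst=True reads both strings as December 2010 dates
day_first = pd.to_datetime(raw, format="mixed", dayfirst=True)

print(month_first.dt.date.tolist())
print(day_first.dt.date.tolist())
```

A quick sanity check on the real data is to print df['InvoiceDate'].min() and .max() after conversion and confirm the range matches the period the dataset is supposed to cover.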


Step 4: Feature Engineering and Time Series Creation

To forecast sales, we need to convert raw transactional data into a structured time series format. This step includes calculating the total price and aggregating daily sales quantities.

Tasks:

Create a TotalPrice column
Aggregate daily quantity sold into a time series format

Here is the code for this step:

# Create a 'TotalPrice' column
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']

# Aggregate total quantity sold per day
daily_sales = df.set_index('InvoiceDate').resample('D')['Quantity'].sum().reset_index()
daily_sales.rename(columns={'InvoiceDate': 'Date', 'Quantity': 'TotalQuantity'}, inplace=True)
print("\nAggregated daily sales data (first 5 rows):")
print(daily_sales.head())

Output:

Aggregated daily sales data (first 5 rows):
        Date  TotalQuantity
0 2010-01-12          24215
1 2010-01-13              0
2 2010-01-14              0
3 2010-01-15              0
4 2010-01-16              0
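resample('D') creates one row for every calendar day between the first and last timestamp, and sum() returns 0 for a day with no transactions, which is why zero-quantity days can appear in the aggregated series. A toy illustration with made-up transactions (the dates and quantities are hypothetical):

```python
import pandas as pd

# Hypothetical transactions: two on 3 Jan, none on 4 Jan, one on 5 Jan
tx = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-01-03 09:00", "2011-01-03 15:30", "2011-01-05 11:00"]),
    "Quantity": [6, 4, 10],
})

# Bucket by calendar day and total the quantities; the empty day sums to 0
daily = tx.set_index("InvoiceDate").resample("D")["Quantity"].sum()
print(daily)
```

The zero rows matter for the model: they become real observations, and they also feed into the lag and rolling features created in the next step.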


Step 5: Creating Time-Based Features for the Model

Now that we have a clean time series, the next step is to engineer features that help the model understand patterns in time, like weekdays, seasonality, and recent trends.

Here is the code for this step:

# Set Date as index for easy feature creation
daily_sales.set_index('Date', inplace=True)

# Calendar-based features
daily_sales['dayofweek'] = daily_sales.index.dayofweek
daily_sales['month'] = daily_sales.index.month
daily_sales['year'] = daily_sales.index.year
daily_sales['dayofyear'] = daily_sales.index.dayofyear

# Lag features
daily_sales['lag_1'] = daily_sales['TotalQuantity'].shift(1)
daily_sales['lag_7'] = daily_sales['TotalQuantity'].shift(7)

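# Rolling window feature: mean of the previous 7 days. shift(1) keeps the current day's
# TotalQuantity (the value being predicted) out of its own feature.
daily_sales['rolling_mean_7'] = daily_sales['TotalQuantity'].shift(1).rolling(window=7).mean()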

# Drop rows with NaN values caused by lags and rolling calculations
daily_sales.dropna(inplace=True)
print("\nCreated time-based, lag, and rolling features.")
print("Final features for the model (first 5 rows):")
print(daily_sales.head())

Output:

Created time-based, lag, and rolling features.
Final features for the model (first 5 rows):
            TotalQuantity  dayofweek  month  year  dayofyear  lag_1    lag_7  \
Date
2010-01-19              0          1      1  2010         19    0.0  24215.0
2010-01-20              0          2      1  2010         20    0.0      0.0
2010-01-21              0          3      1  2010         21    0.0      0.0
2010-01-22              0          4      1  2010         22    0.0      0.0
2010-01-23              0          5      1  2010         23    0.0      0.0

            rolling_mean_7
Date
2010-01-19             0.0
2010-01-20             0.0
2010-01-21             0.0
2010-01-22             0.0
2010-01-23             0.0
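One detail to watch with rolling features: pandas' rolling(window=7) window ends on, and includes, the current row. Applied directly to TotalQuantity, the feature for day t would contain the very value being predicted (target leakage). Shifting by one day first restricts the window to the previous seven days. A small sketch on a hypothetical series:

```python
import pandas as pd

# Hypothetical daily quantities indexed by date
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90],
              index=pd.date_range("2011-01-01", periods=9, freq="D"))

# Window ends on the current day, so it includes the value we want to predict
with_today = s.rolling(window=7).mean()

# Window ends on the previous day, so it uses only past observations
past_only = s.shift(1).rolling(window=7).mean()

print(pd.DataFrame({"y": s, "with_today": with_today, "past_only": past_only}))
```

On 2011-01-08 the past-only feature averages 1 to 7 January (40.0), whereas the unshifted window already includes that day's own value of 80. The shifted version also needs one extra day of history before its first non-missing value.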

Step 6: Defining Features and Target for the Forecast

Now that we’ve created relevant time-based features, it’s time to separate them into input features (X) and the output we want to predict (y).

Here, TotalQuantity is the target variable, and all other columns help in making predictions.

Here is the code for this step:

# The target 'y' is the total daily quantity we want to predict.
y = daily_sales['TotalQuantity']

# The features 'X' are all the columns we created to help predict the target.
X = daily_sales.drop('TotalQuantity', axis=1)
print("\nShape of Feature Matrix (X):", X.shape)
print("Shape of Target Vector (y):", y.shape)

Output:

Shape of Feature Matrix (X): (691, 7)

Shape of Target Vector (y): (691,)

Step 7: Splitting the Data and Training the Model

For time series forecasting, it's important to maintain chronological order.

We’ll split the data based on time, training on earlier records, and testing on more recent ones.

Then we train a simple Linear Regression model on the training data as a baseline.

Here is the code:

# For time series data, we split chronologically to train on the past and test on the future.
split_ratio = 0.8
split_index = int(len(X) * split_ratio)
# .iloc slices by position: the first 80% of days for training, the rest for testing
X_train = X.iloc[:split_index]
X_test = X.iloc[split_index:]
y_train = y.iloc[:split_index]
y_test = y.iloc[split_index:]
print(f"\nSplit data into {split_ratio:.0%} training and {1 - split_ratio:.0%} testing.")
print("Training set size:", len(X_train))
print("Testing set size:", len(X_test))

# --- 7. Model Training ---
print("\n--- Model Training ---")
# Linear Regression serves as a straightforward baseline for this regression task.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
print("Linear Regression model trained successfully.")

Output:

Split data into 80% training and 20% testing.

Training set size: 552

Testing set size: 139

Linear Regression model trained successfully.
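A single 80/20 split gives one estimate of test error. If you want several chronological estimates instead, scikit-learn's TimeSeriesSplit produces expanding training windows that always come before their test window. The sketch below runs it on a stand-in array of 12 ordered rows (not the tutorial's data) just to show the fold boundaries:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for 12 chronologically ordered rows
X_demo = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = []
for train_idx, test_idx in tscv.split(X_demo):
    # In every fold, all training indices precede all test indices
    folds.append((train_idx.tolist(), test_idx.tolist()))
    print(f"train={train_idx.min()}..{train_idx.max()}  test={test_idx.min()}..{test_idx.max()}")
```

With this tutorial's variables, you would loop over tscv.split(X) and fit on X.iloc[train_idx], y.iloc[train_idx] in each fold, then average the test metrics across folds.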

Step 8: Model Prediction and Evaluation

Once the model is trained, it's time to test how well it performs on unseen data.

We’ll make predictions using the test set and evaluate the results using standard regression metrics like MAE, RMSE, and R².
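For reference, with y_i the actual and \hat{y}_i the predicted quantity on test day i, n the number of test days, and \bar{y} the mean of the actual test values, the three metrics are defined as follows. MAE and RMSE are in units of quantity sold; RMSE penalizes large misses more heavily, and R² compares the model's squared error with that of always predicting \bar{y} (it can be negative when the model does worse than that).

```latex
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \left\lvert y_i - \hat{y}_i \right\rvert \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
\end{aligned}
```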

Here is the code for this step:

print("\n--- Model Evaluation ---")
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

# Predict on test set
y_pred = model.predict(X_test)

# Calculate evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

# Display results
print(f"Mean Absolute Error (MAE): {mae:,.2f} units")
print(f"Root Mean Squared Error (RMSE): {rmse:,.2f} units")
print(f"R-squared (R²): {r2:.2f}")

Output:

Mean Absolute Error (MAE): 8,059.80 units

Root Mean Squared Error (RMSE): 11,250.11 units

R-squared (R²): 0.19
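An R² of 0.19 means the model explains only a small share of the day-to-day variation, so it helps to compare it with a naive forecast. A common choice is the seasonal-naive baseline, which predicts each day from the same weekday one week earlier: exactly what the lag_7 column holds. The helper regression_report and the two weeks of quantities below are hypothetical, for illustration only:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true, y_hat):
    """Return MAE, RMSE and R² for one set of predictions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return {
        "MAE": mean_absolute_error(y_true, y_hat),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_hat)),
        "R2": r2_score(y_true, y_hat),
    }

# Hypothetical two weeks of daily quantities with a weekly pattern
y_true = np.array([100, 120, 130, 90, 80, 0, 60, 110, 125, 135, 95, 85, 0, 65])

# Seasonal-naive forecast: the value 7 days earlier (undefined for the first week)
naive = np.concatenate([np.full(7, np.nan), y_true[:-7]])

# Score only the days where the naive forecast exists
mask = ~np.isnan(naive)
report = regression_report(y_true[mask], naive[mask])
print(report)
```

With the tutorial's variables, regression_report(y_test, y_pred) and regression_report(y_test, X_test['lag_7']) would put the model and the seasonal-naive baseline side by side; a model worth keeping should beat the baseline on the same test days.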


Step 9: Visualizing Actual vs Predicted Sales

After evaluation, it's helpful to visualize how well the model predictions align with actual sales.

This line chart lets you see patterns, gaps, and overall prediction accuracy over time.

# --- Visualizing Results ---
import matplotlib.pyplot as plt
print("\n--- Visualizing Results ---")
plt.style.use('seaborn-v0_8-whitegrid')
fig, ax = plt.subplots(figsize=(15, 7))

# Plot actual vs predicted sales
ax.plot(y_test.index, y_test.values, label='Actual Daily Sales', color='blue', linewidth=2)
ax.plot(y_test.index, y_pred, label='Predicted Daily Sales', color='red', linestyle='--', linewidth=2)

# Customize the chart
ax.set_title('Demand Forecast: Actual vs. Predicted Sales', fontsize=16)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Total Quantity Sold', fontsize=12)
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()

# Display the plot
print("Displaying the plot of actual vs. predicted sales.")
plt.show()

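Output: a line chart titled "Demand Forecast: Actual vs. Predicted Sales", plotting actual (solid blue) and predicted (dashed red) daily quantities over the test period.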


Conclusion

This project demonstrated how to forecast daily product demand using linear regression. You cleaned the sales data, aggregated it into a daily time series, created calendar, lag, and rolling-mean features, and trained a regression model to predict future sales. The model's performance was modest (see the metrics in Step 8), so treat it as a baseline, and use the actual vs. predicted plot to see where it falls short. You can improve accuracy by exploring more advanced models or adding external factors.

Reference:
https://colab.research.google.com/drive/1EES-nOpZTlAnGOwOGpr_cqC49Vd3SzWq?usp=sharing


Frequently Asked Questions (FAQs)

1. What is demand forecasting in e-commerce?

Demand forecasting in e-commerce involves predicting future customer demand for products using historical sales data. It helps online businesses optimize inventory, reduce stockouts, and plan logistics more efficiently.

2. How can machine learning be used for demand forecasting?

Machine learning models like linear regression, decision trees, and LSTM can analyze past sales trends and time-based patterns. These models learn from data to predict future sales, enabling more accurate and automated forecasting.

3. Why did this project use linear regression?

Linear regression was used as a simple, interpretable baseline for forecasting daily product demand. It works naturally with time-based features and provides a reference point for evaluating more complex models.

4. What features improve demand forecasting accuracy?

Features like day of the week, month, holidays, promotions, and historical sales averages can improve model accuracy. In this project, we used calendar features (day of week, month, year, day of year), lag features, and a 7-day rolling mean to capture demand patterns.

5. Can this model be used in real e-commerce applications?

Yes, as a starting point. This approach can be extended to predict sales across categories or regions. With further tuning and more data (like weather, ads, or social trends), it can support strategic decisions in inventory and marketing.

Rohit Sharma

834 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
