Build a Stock Price Prediction Model Using ML Techniques

By Rohit Sharma

Updated on Jul 30, 2025 | 10 min read | 1.2K+ views


Stock price prediction is one of the most practical applications of time series forecasting in finance. 

In this project, you'll learn how to build a model that forecasts future stock prices from historical data. You'll load and clean a price dataset, explore its trends, fit an ARIMA time series model, and plot the forecast together with its confidence interval.


What Should You Know Before Building a Stock Price Prediction Project?

Before starting your stock price prediction project, it’s important to be familiar with these key concepts and tools:

  • Python programming: you'll use Python throughout for data processing, visualization, and modeling.
  • Pandas and NumPy: these libraries help you handle time series data, perform calculations, and structure your dataset for modeling.
  • Matplotlib or Seaborn: you'll use them to visualize stock trends, forecast results, and model performance.
  • Time series concepts: trend, seasonality, and stationarity.
  • Forecasting models: familiarity with ARIMA and moving averages.
  • Model evaluation metrics: RMSE, MAE, and MAPE measure how far a model's predictions are from the actual values.
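
As a quick refresher on the three metrics named above, here is a minimal sketch in plain Python. The function name `forecast_errors` and the sample prices are made up for illustration; they are not part of the project's dataset or code.

```python
import math

def forecast_errors(actual, predicted):
    """Return RMSE, MAE, and MAPE (in percent) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    # MAPE divides by the actual value, so it is undefined when an actual price is 0
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(errors)
    return {"rmse": rmse, "mae": mae, "mape": mape}

# Made-up prices, for illustration only
actual = [100.0, 102.0, 104.0]
predicted = [101.0, 101.0, 106.0]
metrics = forecast_errors(actual, predicted)
print(metrics)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the price level, which makes it easier to compare across stocks.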


The Tech Stack Fueling Stock Price Prediction Project

To build this stock price prediction project, you'll work with powerful Python libraries that specialize in time series forecasting, data manipulation, and visualization:

Tool / Library | Purpose
Python | Core programming language for data handling and modeling
Google Colab | Free cloud-based platform to run code and experiments
Pandas | Loads, structures, and preprocesses stock market time series data
NumPy | Handles numerical computations required for data smoothing and scaling
Matplotlib / Seaborn | Visualizes historical trends, model predictions, and evaluation metrics
Statsmodels | Implements ARIMA and other statistical forecasting models
warnings (standard library) | Suppresses unwanted warnings raised by forecasting libraries such as Statsmodels


How Long Will It Take and What Will You Learn?

You can complete this stock price prediction project in 3 to 4 hours. It’s great for beginners with Python skills who want to explore time series forecasting and apply machine learning to real-world financial data.

Smart Forecasting: Techniques That Drive Stock Price Prediction

To build a reliable stock price prediction model, you'll use essential techniques that transform historical market data into actionable forecasts:

  • Time Series Analysis: Understand stock trends and patterns over time using date-based indexing and chronological modeling.
  • ARIMA Modeling: Apply the ARIMA model to capture autocorrelation and trend (through differencing) in stock price data. Plain ARIMA has no seasonal terms; seasonal patterns call for an extension such as SARIMA.
  • Data Visualization: Use tools like Matplotlib to plot actual vs. predicted prices, helping you evaluate model performance visually.
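
To make the differencing idea behind ARIMA concrete, the sketch below builds a hypothetical random-walk price path with NumPy and differences it once. The seed, the starting price of 100, and the variable names are assumptions chosen for illustration, not values from the project's dataset.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is repeatable

# Hypothetical price path: a random walk that starts near 100
steps = rng.normal(loc=0.0, scale=1.0, size=365)
prices = 100 + np.cumsum(steps)

# First-order differencing (the "I" in ARIMA with d=1): y[t] - y[t-1]
diffs = np.diff(prices)

print(f"Std of price levels:      {prices.std():.2f}")
print(f"Std of first differences: {diffs.std():.2f}")

# Differencing is reversible once the first price is known
rebuilt = np.concatenate(([prices[0]], prices[0] + np.cumsum(diffs)))
print("Levels recovered from differences:", np.allclose(rebuilt, prices))
```

The price levels wander over time, while the differences are just the independent daily steps. That is why a model with d=1 works on the changes rather than on the raw prices.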


How to Build a Stock Price Prediction Model

Let’s build this project from scratch with clear, step-by-step guidance:

1. Import Essential Libraries

2. Load and Prepare the Stock Data

3. Choose a Stock and Analyze Its Historical Data

4. Visualize the Historical Price Trends

5. Train the ARIMA Model

6. Forecast Future Prices

7. Visualize the Complete Forecast

Let’s jump in and get started.

Step 1: Import Essential Libraries

To begin building your stock price prediction model, start by importing the core Python libraries. These include tools for data handling, visualization, and time series forecasting using ARIMA.

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tools.sm_exceptions import ValueWarning
import warnings

# Suppress warnings from statsmodels for cleaner output
warnings.filterwarnings("ignore", category=ValueWarning)
warnings.filterwarnings("ignore", category=UserWarning)

Step 2: Load and Prepare the Stock Data

In this step, load the historical stock price dataset, convert the date column to a proper datetime format, and set it as the index. This prepares the data for time series forecasting.

# STEP 2: LOADING AND PREPARING THE DATA
try:
    # Load the dataset
    file_path = 'stock_data.csv'
    df = pd.read_csv(file_path)

    # Rename the first column to 'Date'
    df.rename(columns={df.columns[0]: 'Date'}, inplace=True)

    # Convert the 'Date' column to datetime objects
    df['Date'] = pd.to_datetime(df['Date'])

    # Set the 'Date' column as the index of the DataFrame
    df.set_index('Date', inplace=True)

    print("\nDataset Information:")
    df.info()
    print("\nFirst 5 rows of the dataset:")
    print(df.head())
    print(f"\nDataset Summary:")
    print(f"  • Total number of stocks: {len(df.columns)}")
    print(f"  • Stock names: {list(df.columns)}")
    print(f"  • Date range: {df.index.min().date()} to {df.index.max().date()}")
    print(f"  • Total trading days: {len(df)}")

except FileNotFoundError:
    print(f"Error: '{file_path}' not found.")
    print("Please make sure you have uploaded the file to your session.")
    # Stop here: every later step needs the DataFrame
    raise SystemExit(1)

Output:

Dataset Information:
DatetimeIndex: 365 entries, 2020-01-01 to 2020-12-30
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Stock_1  365 non-null    float64
 1   Stock_2  365 non-null    float64
 2   Stock_3  365 non-null    float64
 3   Stock_4  365 non-null    float64
 4   Stock_5  365 non-null    float64
dtypes: float64(5)
memory usage: 17.1 KB

First 5 rows of the dataset:
Date           Stock_1     Stock_2    Stock_3     Stock_4     Stock_5
2020-01-01  101.764052  100.160928  99.494642   99.909756  101.761266
2020-01-02  102.171269   99.969968  98.682973  100.640755  102.528643
2020-01-03  103.171258   99.575237  98.182139  100.574847  101.887811
2020-01-04  105.483215   99.308641  97.149381  100.925017  101.490049
2020-01-05  107.453175   98.188428  99.575396  101.594411  101.604283

Dataset Summary:
  • Total number of stocks: 5
  • Stock names: ['Stock_1', 'Stock_2', 'Stock_3', 'Stock_4', 'Stock_5']
  • Date range: 2020-01-01 to 2020-12-30
  • Total trading days: 365

This sets the foundation for forecasting by ensuring the dataset is clean, properly indexed by date, and ready for model building.
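
If you don't have stock_data.csv at hand, the following sketch generates a stand-in file with the same shape (365 daily rows, five columns named Stock_1 to Stock_5). The prices are synthetic random walks, the seed is arbitrary, and the file name stock_data_standin.csv is hypothetical; none of this reproduces the original dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, chosen only for repeatability

dates = pd.date_range(start="2020-01-01", periods=365, freq="D")
stock_names = [f"Stock_{i}" for i in range(1, 6)]

# Each column is a random walk starting near 100 (an assumption, not the real data)
daily_moves = rng.normal(loc=0.0, scale=1.0, size=(len(dates), len(stock_names)))
prices = 100 + daily_moves.cumsum(axis=0)

standin = pd.DataFrame(prices, index=dates, columns=stock_names)
# Hypothetical file name, so an existing stock_data.csv is never overwritten
standin.to_csv("stock_data_standin.csv")

# Reload it the same way Step 2 does: first column becomes the 'Date' index
check = pd.read_csv("stock_data_standin.csv")
check = check.rename(columns={check.columns[0]: "Date"})
check["Date"] = pd.to_datetime(check["Date"])
check = check.set_index("Date")
print(check.shape)
print(check.index.min().date(), "to", check.index.max().date())
```

To run the rest of the steps against the stand-in, point file_path in Step 2 at the generated file. The numbers you get will differ from the outputs shown in this article.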

Step 3: Choose a Stock and Analyze Its Historical Data

Now that the dataset is ready, pick a specific stock for forecasting. This step involves selecting one stock from the dataset, handling missing values, and analyzing its historical trends and statistics.

Here is the code for this step:

# STEP 3: STOCK SELECTION AND DATA EXPLORATION
# --------------------------------------------------

# Select the stock to analyze
stock_to_predict = 'Stock_1'
print(f"Selected stock for analysis: {stock_to_predict}")

# Extract and clean the data
stock_data = df[[stock_to_predict]].dropna()

print(f"\nData Analysis for {stock_to_predict}:")
print(f"  • Available data points: {len(stock_data)}")
print(f"  • Date range: {stock_data.index.min().date()} to {stock_data.index.max().date()}")
print(f"  • Price statistics:")
print(f"    - Minimum price: ${stock_data[stock_to_predict].min():.2f}")
print(f"    - Maximum price: ${stock_data[stock_to_predict].max():.2f}")
print(f"    - Average price: ${stock_data[stock_to_predict].mean():.2f}")
print(f"    - Standard deviation: ${stock_data[stock_to_predict].std():.2f}")

# Display basic statistics
print(f"\nDetailed Statistics:")
print(stock_data[stock_to_predict].describe())

Output:

Selected stock for analysis: Stock_1

Data Analysis for Stock_1:
  • Available data points: 365
  • Date range: 2020-01-01 to 2020-12-30
  • Price statistics:
    - Minimum price: $91.47
    - Maximum price: $121.90
    - Average price: $107.77
    - Standard deviation: $7.40

Detailed Statistics:
count    365.000000
mean     107.772577
std        7.398296
min       91.474442
25%      101.603117
50%      107.421299
75%      113.741728
max      121.901773
Name: Stock_1, dtype: float64
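
Beyond summary statistics, a moving average and the day-over-day percentage change are two common ways to look at trend and volatility. The sketch below uses a short made-up price series; the values and the 3-day window are illustrative assumptions.

```python
import pandas as pd

# Made-up prices on consecutive days, for illustration only
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 106.0],
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
    name="price",
)

# 3-day simple moving average; the first two rows are NaN because the window is not full yet
sma_3 = prices.rolling(window=3).mean()

# Day-over-day percentage change, a simple view of volatility
pct_change = prices.pct_change() * 100

summary = pd.DataFrame({"price": prices, "sma_3": sma_3, "pct_change": pct_change})
print(summary.round(2))
```

With the project's data, the same two calls can be applied to stock_data[stock_to_predict], typically with a longer window such as 7 or 30 days.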


Step 4: Visualize the Historical Price Trends

After selecting and analyzing a stock, the next step is to visualize its historical price movements. This helps spot trends, volatility, and any seasonal patterns in the data before building the prediction model.

Here's the code to generate the line chart of historical prices:

# STEP 4: VISUALIZE HISTORICAL STOCK DATA
print(f"Visualization for {stock_to_predict} historical prices...")

# Create a comprehensive plot
plt.figure(figsize=(15, 7))
plt.plot(
    stock_data.index,
    stock_data[stock_to_predict],
    label=f'Historical Prices for {stock_to_predict}',
    linewidth=2,
    color='blue'
)

plt.title(f'Historical Price Data for {stock_to_predict}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Stock Price (USD)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)

# Add some styling
plt.tight_layout()
plt.show()

Output: a line chart of the historical prices for the selected stock (image not included here).


Step 5: Train the ARIMA Model for Stock Price Forecasting

With the data visualized, you're now ready to train an ARIMA model to forecast future stock prices. ARIMA is widely used for time series forecasting because it captures patterns based on past values (AR), differencing (I), and forecast errors (MA).

Below is the code to configure, train, and inspect the ARIMA model:

# STEP 5: IMPLEMENT AND TRAIN THE ARIMA MODEL

# Define ARIMA parameters
p, d, q = 5, 1, 0
print(f"\nModel Configuration:")
print(f"  • p (AR order): {p} - Using last 5 values for prediction")
print(f"  • d (Differencing): {d} - Making data stationary")
print(f"  • q (MA order): {q} - No moving average component")

# Create and train the ARIMA model
print(f"\nTraining ARIMA({p},{d},{q}) model...")
model = ARIMA(stock_data[stock_to_predict], order=(p, d, q))

# Fit the model
model_fit = model.fit()

# Display comprehensive training results
print("\nTraining Results:")
print(f"  • AIC (Akaike Information Criterion): {model_fit.aic:.2f}")
print(f"  • BIC (Bayesian Information Criterion): {model_fit.bic:.2f}")
print(f"  • Log Likelihood: {model_fit.llf:.2f}")
print(f"  • Model Parameters: ARIMA({p},{d},{q})")

# Display model coefficients
print("\nModel Coefficients:")
params = model_fit.params
for i, param in enumerate(params):
    if i < p:
        param_name = f"AR.L{i+1}"
    elif i == len(params) - 1:
        param_name = "sigma2"
    else:
        param_name = f"param_{i}"
    print(f"  • {param_name}: {param:.4f}")

print(f"\nModel Performance:")
print(f"  • Successfully fitted on {len(stock_data)} data points")

Output:

Model Configuration:
  • p (AR order): 5 - Using last 5 values for prediction
  • d (Differencing): 1 - Making data stationary
  • q (MA order): 0 - No moving average component

Training ARIMA(5,1,0) model...

Training Results:
  • AIC (Akaike Information Criterion): 1086.54
  • BIC (Bayesian Information Criterion): 1109.92
  • Log Likelihood: -537.27
  • Model Parameters: ARIMA(5,1,0)

Model Coefficients:
  • AR.L1: -0.0252
  • AR.L2: 0.0508
  • AR.L3: 0.0416
  • AR.L4: -0.0474
  • AR.L5: 0.0216
  • sigma2: 1.1208

Model Performance:
  • Successfully fitted on 365 data points
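
For reference, the ARIMA(5,1,0) configuration used here can be written out as follows. This assumes the Statsmodels default of no constant term when d > 0, which is consistent with the six parameters listed in the output (five AR coefficients plus sigma2).

```latex
\Delta y_t = y_t - y_{t-1}, \qquad
\Delta y_t = \sum_{i=1}^{5} \phi_i \, \Delta y_{t-i} + \varepsilon_t, \qquad
\varepsilon_t \sim \mathcal{N}(0, \sigma^2)
```

Here the coefficients φ₁ to φ₅ correspond to AR.L1 through AR.L5 and σ² to sigma2. Because every fitted coefficient is below 0.06 in absolute value, the daily change is almost entirely the noise term, which means the fitted model behaves much like a random walk.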

Step 6: Forecast Future Stock Prices Using ARIMA

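After training the model, the next step is to predict future stock prices. Here, you'll forecast the price for the next 30 days and inspect the predicted trend. The dataset has one row per calendar day, so the forecast dates are calendar days as well, even though the printed label says "trading days".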

# STEP 6: GENERATE FUTURE PREDICTIONS

# Set forecast parameters
n_forecast = 30
print(f"Generating predictions for the next {n_forecast} trading days...")

# Generate forecasts
print("\nCalculating forecasts...")
forecast_result = model_fit.get_forecast(steps=n_forecast)

# Extract forecast components
predicted_mean = forecast_result.predicted_mean
confidence_intervals = forecast_result.conf_int()

# Create date range for forecasts
last_date = stock_data.index[-1]
forecast_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_forecast)

# Display forecast summary
print(f"\nForecast Summary:")
print(f"  • Forecast period: {forecast_dates[0].date()} to {forecast_dates[-1].date()}")
print(f"  • Number of predictions: {len(predicted_mean)}")
print(f"  • Last historical price: ${stock_data[stock_to_predict].iloc[-1]:.2f}")
print(f"  • First forecast price: ${predicted_mean.iloc[0]:.2f}")
print(f"  • Average forecast price: ${predicted_mean.mean():.2f}")

# Show first 10 days of predictions
print(f"\nDetailed Predictions (First 10 Days):")
print("Date          | Predicted Price | Lower Bound | Upper Bound")
print("-" * 60)
for i in range(min(10, len(predicted_mean))):
    lower_bound = confidence_intervals.iloc[i, 0]
    upper_bound = confidence_intervals.iloc[i, 1]
    print(f"{forecast_dates[i].date()} | ${predicted_mean.iloc[i]:>11.2f} | ${lower_bound:>9.2f} | ${upper_bound:>9.2f}")

if n_forecast > 10:
    print(f"\n... and {n_forecast - 10} more predictions available")

# Analyze forecasted trend
print(f"\nForecast Analysis:")
trend = "upward" if predicted_mean.iloc[-1] > predicted_mean.iloc[0] else "downward"
print(f"  • Overall trend: {trend}")
print(f"  • Price range: ${predicted_mean.min():.2f} - ${predicted_mean.max():.2f}")
print(f"  • Volatility: Confidence intervals show prediction uncertainty")

Output:

Generating predictions for the next 30 trading days...

Forecast Summary:
  • Forecast period: 2020-12-31 to 2021-01-29
  • Number of predictions: 30
  • Last historical price: $93.86
  • First forecast price: $93.92
  • Average forecast price: $93.91

Detailed Predictions (First 10 Days):
Date          | Predicted Price | Lower Bound | Upper Bound
------------------------------------------------------------
2020-12-31 | $      93.92 | $    91.85 | $    96.00
2021-01-01 | $      93.89 | $    90.99 | $    96.79
2021-01-02 | $      93.90 | $    90.30 | $    97.49
2021-01-03 | $      93.91 | $    89.69 | $    98.13
2021-01-04 | $      93.90 | $    89.18 | $    98.62
2021-01-05 | $      93.91 | $    88.71 | $    99.10
2021-01-06 | $      93.91 | $    88.28 | $    99.53
2021-01-07 | $      93.91 | $    87.88 | $    99.93
2021-01-08 | $      93.91 | $    87.50 | $   100.31
2021-01-09 | $      93.91 | $    87.14 | $   100.67

... and 20 more predictions available

Forecast Analysis:
  • Overall trend: downward
  • Price range: $93.89 - $93.92
  • Volatility: Confidence intervals show prediction uncertainty
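
The forecast dates above run over calendar days, which matches this dataset because it has a row for every day. For real market data that skips weekends, a business-day range may fit better. The sketch below compares the two using pandas; it is only an illustration and, as noted in the comments, it does not account for exchange holidays.

```python
import pandas as pd

last_date = pd.Timestamp("2020-12-30")  # last date in this article's dataset
n_forecast = 30

# Calendar-day range, built the same way as in Step 6 (includes Saturdays and Sundays)
calendar_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_forecast)

# Business-day range: skips weekends, but does NOT know about exchange holidays
business_dates = pd.bdate_range(start=last_date + pd.Timedelta(days=1), periods=n_forecast)

print("Calendar range:", calendar_dates[0].date(), "to", calendar_dates[-1].date())
print("Business range:", business_dates[0].date(), "to", business_dates[-1].date())
print("Weekend dates in calendar range:", int((calendar_dates.dayofweek >= 5).sum()))
```

Whichever range you choose, it should follow the same spacing as the index the model was trained on, otherwise the forecast steps and their dates will not line up.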

Step 7: Visualize the Complete Forecast

In this step, you'll plot both historical stock data and future forecasts, making it easier to understand the ARIMA model's predictions visually.

# STEP 7: VISUALIZE THE COMPLETE FORECAST

# Create the final comprehensive plot
plt.figure(figsize=(15, 8))

# Plot historical data
plt.plot(stock_data.index, stock_data[stock_to_predict],
         label='Historical Prices', linewidth=2, color='blue')

# Plot forecasted data
plt.plot(forecast_dates, predicted_mean,
         color='red', linestyle='--', linewidth=2, label='Forecasted Prices')

# Plot confidence intervals
plt.fill_between(forecast_dates,
                 confidence_intervals.iloc[:, 0],
                 confidence_intervals.iloc[:, 1],
                 color='pink', alpha=0.5, label='95% Confidence Interval')

# Add vertical line to separate historical and forecast data
# (drawn before plt.legend() so its label appears in the legend)
plt.axvline(x=last_date, color='gray', linestyle=':', alpha=0.7, label='Forecast Start')

# Enhance the plot
plt.title(f'Stock Price Forecast for {stock_to_predict}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Stock Price (USD)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Output:

Visualization Components:

  • Blue line: Historical stock prices
  • Red dashed line: Forecasted prices
  • Pink shaded area: 95% confidence interval
  • Gray dotted line: Forecast starting point

How to Interpret This Plot:

  • The red line shows the model's expected price for each forecast day.
  • The pink area indicates uncertainty; the wider it is, the less certain the prediction.
  • In this run the red line is almost flat, staying between $93.89 and $93.92, so the model is essentially carrying the last observed price forward rather than predicting a clear rise or fall.
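
To see why the shaded band widens with the horizon, here is a rough approximation that treats the series as a pure random walk and ignores the small AR terms. The helper name `random_walk_half_width` is hypothetical; the variance is the sigma2 value reported in the Step 5 output.

```python
import math

sigma2 = 1.1208   # residual variance reported in the Step 5 output
z_95 = 1.96       # approximate two-sided 95% normal quantile

def random_walk_half_width(horizon, variance=sigma2, z=z_95):
    """Approximate 95% interval half-width h steps ahead for a pure random walk."""
    return z * math.sqrt(variance * horizon)

for h in (1, 5, 10, 30):
    print(f"h={h:>2}: +/- {random_walk_half_width(h):.2f}")
```

For one step ahead this gives roughly the same half-width as the first forecast row in Step 6 (93.92 minus 91.85). At longer horizons the intervals reported by Statsmodels are slightly wider, because the fitted AR terms also contribute to the forecast variance.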

Once actual prices for the forecast period become available, compare them with this forecast to see how well the model did and to refine it.
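
You don't have to wait for new data to check a forecast: you can hold back the last part of the history and score predictions against it. The sketch below shows the idea with a synthetic series and a naive "repeat the last value" baseline; the seed, the series, and the 30-day hold-out size are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # arbitrary seed; the series below is synthetic

dates = pd.date_range("2020-01-01", periods=365, freq="D")
series = pd.Series(100 + np.cumsum(rng.normal(0, 1, len(dates))), index=dates)

# Chronological split: never shuffle time series data
horizon = 30
train, test = series.iloc[:-horizon], series.iloc[-horizon:]

# Naive (persistence) baseline: repeat the last training value for every held-out day
naive_forecast = pd.Series(train.iloc[-1], index=test.index)

rmse = float(np.sqrt(((test - naive_forecast) ** 2).mean()))
mae = float((test - naive_forecast).abs().mean())
print(f"Naive baseline over {horizon} held-out days: RMSE={rmse:.2f}, MAE={mae:.2f}")
```

To evaluate ARIMA the same way, fit the model on train only, call get_forecast(steps=horizon), and compute the same metrics against test. A model that cannot beat the naive baseline is adding little.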


Final Conclusion

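This project used ARIMA to forecast stock prices from historical data. You loaded and explored the dataset, trained an ARIMA(5,1,0) model, and visualized both past prices and the 30-day forecast. The fitted AR coefficients are all close to zero and the forecast stays almost flat near the last observed price ($93.86), so the model mostly carries the latest price forward with a widening uncertainty band. Future improvements could include tuning the ARIMA order, adding more features, or trying more advanced models, each checked against held-out data.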


Colab Link -
https://colab.research.google.com/drive/1D10SqVf_Au2yzLbSsArnHEmGPp5buybN?usp=sharing


Rohit Sharma

802 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
