Build a Stock Price Prediction Model Using ML Techniques
By Rohit Sharma
Updated on Jul 30, 2025 | 10 min read | 1.2K+ views
Stock price prediction is one of the most practical applications of time series forecasting in finance.
In this project, you'll learn how to build a model that forecasts future stock prices from historical data. Using data preprocessing, exploratory analysis, and an ARIMA time series model, you’ll uncover patterns in stock trends and generate forecasts with confidence intervals.
Before starting your stock price prediction project, it’s important to be familiar with these key concepts and tools:
To build this stock price prediction project, you'll work with powerful Python libraries that specialize in time series forecasting, data manipulation, and visualization:
Tool / Library | Purpose |
Python | Core programming language for data handling and modeling |
Google Colab | Free cloud-based platform to run code and experiments |
Pandas | Loads, structures, and preprocesses stock market time series data |
NumPy | Handles numerical computations required for data smoothing and scaling |
Matplotlib / Seaborn | Visualizes historical trends, model predictions, and evaluation metrics |
Statsmodels | Implements ARIMA and other statistical forecasting models |
Warnings Library | Suppresses non-critical warnings raised by forecasting libraries such as statsmodels |
You can complete this stock price prediction project in 3 to 4 hours. It’s great for beginners with Python skills who want to explore time series forecasting and apply machine learning to real-world financial data.
To build a reliable stock price prediction model, you'll use essential techniques that transform historical market data into actionable forecasts:
Let’s build this project from scratch with clear, step-by-step guidance:
1. Load the Stock Price Dataset
2. Clean and Preprocess the Data
3. Visualize Stock Trends
4. Apply Time Series Model (ARIMA)
5. Forecast Future Prices
6. Evaluate the Forecast
Let’s jump in and get started.
To begin building your stock price prediction model, start by importing the core Python libraries. These include tools for data handling, visualization, and time series forecasting using ARIMA.
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tools.sm_exceptions import ValueWarning
import warnings
# Suppress warnings from statsmodels for cleaner output
warnings.filterwarnings("ignore", category=ValueWarning)
warnings.filterwarnings("ignore", category=UserWarning)
In this step, load the historical stock price dataset, convert the date column to a proper datetime format, and set it as the index. This prepares the data for time series forecasting.
# STEP 2: LOADING AND PREPARING THE DATA
try:
    # Load the dataset
    file_path = 'stock_data.csv'
    df = pd.read_csv(file_path)
    # Rename the first column to 'Date'
    df.rename(columns={df.columns[0]: 'Date'}, inplace=True)
    # Convert the 'Date' column to datetime objects
    df['Date'] = pd.to_datetime(df['Date'])
    # Set the 'Date' column as the index of the DataFrame
    df.set_index('Date', inplace=True)
    print("\nDataset Information:")
    df.info()
    print("\nFirst 5 rows of the dataset:")
    print(df.head())
    print("\nDataset Summary:")
    print(f" • Total number of stocks: {len(df.columns)}")
    print(f" • Stock names: {list(df.columns)}")
    print(f" • Date range: {df.index.min().date()} to {df.index.max().date()}")
    print(f" • Total trading days: {len(df)}")
except FileNotFoundError:
    print("Error: 'stock_data.csv' not found.")
    print("Please make sure you have uploaded the file to your session.")
    # Re-raise so that later steps do not run without a DataFrame
    raise
Output:
Dataset Information:
DatetimeIndex: 365 entries, 2020-01-01 to 2020-12-30
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Stock_1 365 non-null float64
1 Stock_2 365 non-null float64
2 Stock_3 365 non-null float64
3 Stock_4 365 non-null float64
4 Stock_5 365 non-null float64
dtypes: float64(5)
memory usage: 17.1 KB
First 5 rows of the dataset:
Date Stock_1 Stock_2 Stock_3 Stock_4 Stock_5
2020-01-01 101.764052 100.160928 99.494642 99.909756 101.761266
2020-01-02 102.171269 99.969968 98.682973 100.640755 102.528643
2020-01-03 103.171258 99.575237 98.182139 100.574847 101.887811
2020-01-04 105.483215 99.308641 97.149381 100.925017 101.490049
2020-01-05 107.453175 98.188428 99.575396 101.594411 101.604283
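Dataset Summary:
 • Total number of stocks: 5
 • Stock names: ['Stock_1', 'Stock_2', 'Stock_3', 'Stock_4', 'Stock_5']
 • Date range: 2020-01-01 to 2020-12-30
 • Total trading days: 365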
This sets the foundation for forecasting by ensuring the dataset is clean, properly indexed by date, and ready for model building.
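If you do not have `stock_data.csv` at hand, a stand-in file with the same shape can be generated to follow along. The sketch below is an assumption, not the dataset used in this project: the function name `write_synthetic_prices`, the file name `synthetic_stock_data.csv`, the starting price of 100, and the step size are all hypothetical choices. It writes five random-walk price columns for 365 daily dates, with an unnamed first column that the loading step would rename to 'Date'.

```python
import csv
import random
from datetime import date, timedelta


def write_synthetic_prices(path, n_days=365, n_stocks=5,
                           start=date(2020, 1, 1), seed=42):
    """Write a random-walk price table shaped like the tutorial's dataset."""
    rng = random.Random(seed)          # fixed seed keeps the file reproducible
    prices = [100.0] * n_stocks        # hypothetical starting price
    rows = []
    for day in range(n_days):
        # Each stock takes an independent Gaussian step (std 1.0 is arbitrary)
        prices = [p + rng.gauss(0.0, 1.0) for p in prices]
        stamp = (start + timedelta(days=day)).isoformat()
        rows.append([stamp] + [round(p, 6) for p in prices])
    # The first header cell is left empty, mimicking an unnamed date column
    header = [""] + [f"Stock_{i + 1}" for i in range(n_stocks)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return rows


rows = write_synthetic_prices("synthetic_stock_data.csv")
print(len(rows), rows[0][0], rows[-1][0])  # 365 2020-01-01 2020-12-30
```

To use it with the loading step, point `file_path` at the generated file. The numbers will differ from the outputs shown in this article because the values are random.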
Now that the dataset is ready, pick a specific stock for forecasting. This step involves selecting one stock from the dataset, handling missing values, and analyzing its historical trends and statistics.
Here is the code for this step:
# STEP 3: STOCK SELECTION AND DATA EXPLORATION
# --------------------------------------------------
# Select the stock to analyze
stock_to_predict = 'Stock_1'
print(f"Selected stock for analysis: {stock_to_predict}")
# Extract and clean the data
stock_data = df[[stock_to_predict]].dropna()
print(f"\nData Analysis for {stock_to_predict}:")
print(f" • Available data points: {len(stock_data)}")
print(f" • Date range: {stock_data.index.min().date()} to {stock_data.index.max().date()}")
print(f" • Price statistics:")
print(f" - Minimum price: ${stock_data[stock_to_predict].min():.2f}")
print(f" - Maximum price: ${stock_data[stock_to_predict].max():.2f}")
print(f" - Average price: ${stock_data[stock_to_predict].mean():.2f}")
print(f" - Standard deviation: ${stock_data[stock_to_predict].std():.2f}")
# Display basic statistics
print(f"\nDetailed Statistics:")
print(stock_data[stock_to_predict].describe())
Output:
Selected stock for analysis: Stock_1
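Data Analysis for Stock_1:
 • Available data points: 365
 • Date range: 2020-01-01 to 2020-12-30
 • Price statistics:
 - Minimum price: $91.47
 - Maximum price: $121.90
 - Average price: $107.77
 - Standard deviation: $7.40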
Detailed Statistics:
count 365.000000
mean 107.772577
std 7.398296
min 91.474442
25% 101.603117
50% 107.421299
75% 113.741728
max 121.901773
Name: Stock_1, dtype: float64
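The selection step handles missing values with `dropna()`, which removes the affected dates entirely. For a time series that can leave gaps in the date index, so a common alternative is forward filling, where the last known price is carried forward (pandas exposes this as `Series.ffill()`). The plain-Python sketch below only illustrates the logic; the helper name `forward_fill` and the sample prices are hypothetical.

```python
def forward_fill(values):
    """Replace each None with the most recent non-missing value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # Leading gaps stay None: there is nothing to carry forward yet
        filled.append(last_seen)
    return filled


prices = [None, 101.5, None, None, 103.0, None]
print(forward_fill(prices))  # [None, 101.5, 101.5, 101.5, 103.0, 103.0]
```

Whether dropping or filling is the better choice depends on why the values are missing. In the sample dataset all 365 rows are non-null, so either approach gives the same result.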
After selecting and analyzing a stock, the next step is to visualize its historical price movements. This helps spot trends, volatility, and any seasonal patterns in the data before building the prediction model.
Here's the code to generate the line chart of historical prices:
# STEP 4: VISUALIZE HISTORICAL STOCK DATA
print(f"Visualization for {stock_to_predict} historical prices...")
# Create a comprehensive plot
plt.figure(figsize=(15, 7))
plt.plot(
    stock_data.index,
    stock_data[stock_to_predict],
    label=f'Historical Prices for {stock_to_predict}',
    linewidth=2,
    color='blue'
)
plt.title(f'Historical Price Data for {stock_to_predict}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Stock Price (USD)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
# Add some styling
plt.tight_layout()
plt.show()
Output:
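Trends and volatility are often easier to see after smoothing. One hedged way to do that is a trailing rolling mean (trend) and rolling standard deviation (volatility), which pandas provides through `Series.rolling(window)`. The sketch below shows the same idea in plain Python; the helper `rolling`, the window of 3, and the sample prices are illustrative assumptions, not part of the original project.

```python
from statistics import mean, stdev


def rolling(values, window, func):
    """Apply func to each full trailing window; earlier positions get None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(func(values[i + 1 - window:i + 1]))
    return out


prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # hypothetical prices
trend = rolling(prices, 3, mean)        # smoothed level
volatility = rolling(prices, 3, stdev)  # sample std within each window

for price, level, spread in zip(prices, trend, volatility):
    print(price, level, spread)
```

Plotting the rolling mean on top of the raw line makes a sustained rise or fall stand out, while a growing rolling standard deviation points to periods of higher volatility.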
With the data visualized, you're now ready to train an ARIMA model to forecast future stock prices. ARIMA is widely used for time series forecasting because it captures patterns based on past values (AR), differencing (I), and forecast errors (MA).
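To make the AR and I parts concrete, here is a minimal plain-Python sketch of what they do mechanically. It is not how statsmodels fits the model: the coefficients below are hypothetical, any constant term is ignored, and the MA part is left out (as it is when q = 0). Differencing turns prices into day-to-day changes, the AR step predicts the next change as a weighted sum of recent changes, and adding that change to the last price gives the forecast.

```python
def difference(series):
    """First-order differencing (d = 1): price changes between neighbours."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert differencing by accumulating changes from the first price."""
    levels = [first_value]
    for change in diffs:
        levels.append(levels[-1] + change)
    return levels


def ar_next_change(changes, coefs):
    """AR step: coefs[0] weights the most recent change, coefs[1] the one before."""
    recent = changes[::-1][:len(coefs)]
    return sum(c * x for c, x in zip(coefs, recent))


prices = [100.0, 102.0, 101.0, 105.0]   # hypothetical prices
changes = difference(prices)             # [2.0, -1.0, 4.0]
restored = undifference(prices[0], changes)

coefs = [0.5, -0.25]                     # hypothetical AR(2) coefficients
next_price = prices[-1] + ar_next_change(changes, coefs)
print(changes, restored, next_price)     # ... 107.25
```

In the real model the coefficients are estimated from the data rather than chosen by hand.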
Below is the code to configure, train, and inspect the ARIMA model:
# STEP 5: IMPLEMENT AND TRAIN THE ARIMA MODEL
# Define ARIMA parameters
p, d, q = 5, 1, 0
print(f"\nModel Configuration:")
print(f" • p (AR order): {p} - Uses the last {p} differenced values for prediction")
print(f" • d (Differencing): {d} - Differences the series {d} time(s) to reduce trend")
print(f" • q (MA order): {q} - No moving average component")
# Create and train the ARIMA model
print(f"\nTraining ARIMA({p},{d},{q}) model...")
model = ARIMA(stock_data[stock_to_predict], order=(p, d, q))
# Fit the model
model_fit = model.fit()
# Display comprehensive training results
print("\nTraining Results:")
print(f" • AIC (Akaike Information Criterion): {model_fit.aic:.2f}")
print(f" • BIC (Bayesian Information Criterion): {model_fit.bic:.2f}")
print(f" • Log Likelihood: {model_fit.llf:.2f}")
print(f" • Model Parameters: ARIMA({p},{d},{q})")
# Display model coefficients
print("\nModel Coefficients:")
# params is a pandas Series here, so its index already holds the
# coefficient names (for example ar.L1 ... ar.L5 and sigma2)
for param_name, param in model_fit.params.items():
    print(f" • {param_name}: {param:.4f}")
print(f"\nModel Performance:")
print(f" • Successfully fitted on {len(stock_data)} data points")
Output:
Model Configuration:
Training ARIMA(5,1,0) model...
Training Results:
Model Coefficients:
Model Performance:
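The AIC and BIC values printed in the training results are mainly useful for comparing candidate orders on the same data, where lower is better. As a rough sketch, both can be written in terms of the log-likelihood using their standard definitions. The numbers below are hypothetical, and the observation count that statsmodels uses internally may differ from the raw row count, so treat this as an illustration rather than a reproduction of `model_fit.aic` or `model_fit.bic`.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical values: an ARIMA(5,1,0) fit has 6 parameters
# (five AR coefficients plus the noise variance sigma2)
llf = -520.0
k = 6
n = 364

print(f"AIC: {aic(llf, k):.2f}")
print(f"BIC: {bic(llf, k, n):.2f}")
```

Because BIC multiplies the parameter count by ln(n), it penalizes extra AR or MA terms more heavily than AIC once the series has more than a handful of points.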
After training the model, the next step is to predict future stock prices. Here, you'll forecast the stock movement over the next 30 trading days and evaluate the predicted trend.
# STEP 6: GENERATE FUTURE PREDICTIONS
# Set forecast parameters
n_forecast = 30
print(f"Generating predictions for the next {n_forecast} trading days...")
# Generate forecasts
print("\nCalculating forecasts...")
forecast_result = model_fit.get_forecast(steps=n_forecast)
# Extract forecast components
predicted_mean = forecast_result.predicted_mean
confidence_intervals = forecast_result.conf_int()
# Create date range for forecasts
last_date = stock_data.index[-1]
forecast_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_forecast)
# Display forecast summary
print(f"\nForecast Summary:")
print(f" • Forecast period: {forecast_dates[0].date()} to {forecast_dates[-1].date()}")
print(f" • Number of predictions: {len(predicted_mean)}")
print(f" • Last historical price: ${stock_data[stock_to_predict].iloc[-1]:.2f}")
print(f" • First forecast price: ${predicted_mean.iloc[0]:.2f}")
print(f" • Average forecast price: ${predicted_mean.mean():.2f}")
# Show first 10 days of predictions
print(f"\nDetailed Predictions (First 10 Days):")
print("Date | Predicted Price | Lower Bound | Upper Bound")
print("-" * 60)
for i in range(min(10, len(predicted_mean))):
    lower_bound = confidence_intervals.iloc[i, 0]
    upper_bound = confidence_intervals.iloc[i, 1]
    print(f"{forecast_dates[i].date()} | ${predicted_mean.iloc[i]:>11.2f} | ${lower_bound:>9.2f} | ${upper_bound:>9.2f}")
if n_forecast > 10:
    print(f"\n... and {n_forecast - 10} more predictions available")
# Analyze forecasted trend
print(f"\nForecast Analysis:")
trend = "upward" if predicted_mean.iloc[-1] > predicted_mean.iloc[0] else "downward"
print(f" • Overall trend: {trend}")
print(f" • Price range: ${predicted_mean.min():.2f} - ${predicted_mean.max():.2f}")
print(f" • Volatility: Confidence intervals show prediction uncertainty")
Output:
Generating predictions for the next 30 trading days...
Forecast Summary:
Detailed Predictions (First 10 Days):
Date | Predicted Price | Lower Bound | Upper Bound
------------------------------------------------------------
2020-12-31 | $ 93.92 | $ 91.85 | $ 96.00
2021-01-01 | $ 93.89 | $ 90.99 | $ 96.79
2021-01-02 | $ 93.90 | $ 90.30 | $ 97.49
2021-01-03 | $ 93.91 | $ 89.69 | $ 98.13
2021-01-04 | $ 93.90 | $ 89.18 | $ 98.62
2021-01-05 | $ 93.91 | $ 88.71 | $ 99.10
2021-01-06 | $ 93.91 | $ 88.28 | $ 99.53
2021-01-07 | $ 93.91 | $ 87.88 | $ 99.93
2021-01-08 | $ 93.91 | $ 87.50 | $ 100.31
2021-01-09 | $ 93.91 | $ 87.14 | $ 100.67
... and 20 more predictions available
Forecast Analysis:
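The table above shows the predicted price staying almost flat while the bounds widen every day. A simple way to see why is the random-walk approximation: if each daily change has standard deviation s, the uncertainty h days ahead grows roughly like s times the square root of h. The sketch below applies that approximation to the first row of the table. It is only an approximation, since the fitted ARIMA(5,1,0) also has AR terms, and the helper name is hypothetical.

```python
import math


def random_walk_half_width(step_std, horizon, z=1.96):
    """Approximate 95% interval half-width h steps ahead for a random walk."""
    return z * step_std * math.sqrt(horizon)


# Half-width of the first forecast interval in the table: (96.00 - 91.85) / 2
first_half_width = (96.00 - 91.85) / 2
step_std = first_half_width / 1.96  # implied one-step standard deviation

for h in (1, 4, 10):
    approx = random_walk_half_width(step_std, h)
    print(f"day {h:>2}: approx half-width {approx:.2f}")
```

Comparing these values with the printed bounds (for example, the day 4 row spans 89.69 to 98.13, a half-width of about 4.22) shows the same square-root growth pattern, with the model's intervals widening slightly faster.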
In this step, you'll plot both historical stock data and future forecasts, making it easier to understand the ARIMA model's predictions visually.
# STEP 7: VISUALIZE THE COMPLETE FORECAST
# Create the final comprehensive plot
plt.figure(figsize=(15, 8))
# Plot historical data
plt.plot(stock_data.index, stock_data[stock_to_predict],
         label='Historical Prices', linewidth=2, color='blue')
# Plot forecasted data
plt.plot(forecast_dates, predicted_mean,
         color='red', linestyle='--', linewidth=2, label='Forecasted Prices')
# Plot confidence intervals
plt.fill_between(forecast_dates,
                 confidence_intervals.iloc[:, 0],
                 confidence_intervals.iloc[:, 1],
                 color='pink', alpha=0.5, label='95% Confidence Interval')
# Add vertical line to separate historical and forecast data
# (drawn before the legend call so that its label appears in the legend)
plt.axvline(x=last_date, color='gray', linestyle=':', alpha=0.7, label='Forecast Start')
# Enhance the plot
plt.title(f'Stock Price Forecast for {stock_to_predict}', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Stock Price (USD)', fontsize=12)
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Output:
Visualization Components:
How to Interpret This Plot:
Once actual prices for the forecast period are available, compare them against this plot and use the gap to refine your model.
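One hedged way to quantify that comparison without waiting for new data is a hold-out test: set aside the last few observations, forecast them from the earlier part of the series, and score the forecast with error metrics. The sketch below uses plain Python, a hypothetical price series, and a naive "repeat the last price" baseline in place of ARIMA. In this project the same metrics could be applied to `predicted_mean` from a model fitted on all but the last 30 rows.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in price units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error (assumes no actual value is zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series; the last `horizon` points are held out for testing
series = [100.0, 101.0, 103.0, 102.0, 100.0, 102.0, 101.0, 103.0]
horizon = 4
train, test = series[:-horizon], series[-horizon:]

# Naive baseline: repeat the last training price for every future day
naive_forecast = [train[-1]] * horizon

print(f"MAE:  {mae(test, naive_forecast):.2f}")
print(f"RMSE: {rmse(test, naive_forecast):.2f}")
print(f"MAPE: {mape(test, naive_forecast):.2f}%")
```

A forecasting model is only worth its complexity if it beats a baseline like this on held-out data, so scoring both side by side is a useful sanity check.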
This project used ARIMA to forecast stock prices based on historical data. You cleaned and explored the dataset, trained the model, and visualized both past trends and future predictions. In the sample run, the ARIMA(5,1,0) forecast settles near $93.91 within a few days while its confidence interval keeps widening, so it is best read as a baseline. Future improvements could include more features or advanced models for better accuracy.
Colab Link -
https://colab.research.google.com/drive/1D10SqVf_Au2yzLbSsArnHEmGPp5buybN?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...