Weather Forecasting Model Using Machine Learning and Time Series Analysis

By Rohit Sharma

Updated on Jul 30, 2025 | 8 min read | 1.21K+ views

Accurate weather prediction plays a major role in agriculture, travel, disaster planning, and daily life. In this project, you'll build a Weather Forecasting Model using machine learning techniques and time series data. 

You'll apply regression models to predict continuous variables such as temperature, humidity, or pressure. You'll also learn how to prepare time-based data and evaluate your model's accuracy using standard metrics.

What Should You Know Beforehand?

It’s helpful to have some basic knowledge of the following before starting this project:

  • Python programming (variables, functions, loops, basic syntax)
  • Pandas and NumPy (data loading, cleaning, and numerical operations)
  • Matplotlib or Seaborn (data visualization)
  • Scikit‑learn basics (knowing how to train models, make predictions, and evaluate performance using metrics like MAE, RMSE, or R²)
  • Intro to time series concepts (understanding trends, seasonality, and autocorrelation will help in preprocessing and modeling)
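
To make the autocorrelation idea concrete, here is a small sketch on synthetic hourly data (a made-up 24-hour temperature cycle, not the project dataset). It uses pandas' Series.autocorr to compare readings 24 hours apart with readings 12 hours apart; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic hourly "temperature" with a clean 24-hour cycle (illustrative data only).
hours = np.arange(24 * 10)  # 10 days of hourly readings
temps = pd.Series(15 + 5 * np.sin(2 * np.pi * hours / 24))

# Autocorrelation: correlation between the series and a lagged copy of itself.
daily_lag = temps.autocorr(lag=24)     # same hour on the previous day
half_day_lag = temps.autocorr(lag=12)  # opposite point of the daily cycle

print(f"Autocorrelation at lag 24: {daily_lag:.2f}")
print(f"Autocorrelation at lag 12: {half_day_lag:.2f}")
```

A strong positive value at lag 24 and a strong negative value at lag 12 is the signature of daily seasonality. Real weather data is noisier, so the values will be less extreme.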

Technologies and Libraries Used

For this Weather Forecasting Model project, the following tools and libraries will be used:

Tool / Library | Purpose
Python | Core programming language
Google Colab | Cloud-based notebook for coding and collaboration
Pandas | Data loading, cleaning, and manipulation
NumPy | Numerical computations and array operations
Matplotlib / Seaborn | Visualizing time series trends and correlations
Scikit-learn | Building and evaluating regression models
Datetime / statsmodels | Handling time-based data and forecasting

Models That Will Be Utilized for Learning

For our Weather Forecasting Model, we’ll use the following key techniques to build and evaluate predictive models:

  • Linear Regression
    A simple yet effective method for predicting continuous weather variables like temperature based on historical data.
  • Time Series Analysis (Lag Features, Rolling Means)
    Techniques that help the model learn from past weather values and identify trends or seasonal patterns.
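
As a rough illustration of lag features and rolling means, the sketch below builds both with pandas on a small made-up hourly series. The column names (temp, temp_lag_1, and so on) and the values are assumptions for this example, not part of the project dataset.

```python
import numpy as np
import pandas as pd

# Small synthetic hourly series standing in for 'Temperature (C)' (illustrative only).
idx = pd.date_range("2006-04-01", periods=8, freq="h", tz="UTC")
demo = pd.DataFrame({"temp": [9.0, 9.5, 9.2, 8.3, 8.8, 9.1, 10.0, 11.2]}, index=idx)

# Lag features: the temperature 1 and 2 hours earlier.
demo["temp_lag_1"] = demo["temp"].shift(1)
demo["temp_lag_2"] = demo["temp"].shift(2)

# Rolling mean of the previous 3 hours. Shifting first keeps the current
# value out of its own feature, which would otherwise leak the target.
demo["temp_roll_mean_3"] = demo["temp"].shift(1).rolling(window=3).mean()

# The first rows have no history, so they contain NaN and are dropped.
demo_features = demo.dropna()
print(demo_features)
```

Features like these let a regression model use recent history directly. The rows lost to NaN at the start are the usual cost of that history.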

Time Taken and Difficulty

You can complete this Weather Forecasting Model project in about 3 to 4 hours. It’s ideal for beginner to intermediate users, offering hands-on experience in:

  • Predicting continuous variables using regression
  • Working with time series data
  • Evaluating model performance using real metrics

How to Build a Weather Forecasting Model

Let’s build this project from scratch with clear, step-by-step guidance:

  1. Load the Weather Dataset 
  2. Preprocess and Clean the Data
  3. Create Time-Based Features
  4. Convert Categorical Variables
  5. Define Features and Target & Split Data Chronologically
  6. Train the Linear Regression Model
  7. Evaluate the Model
  8. Visualize the Predictions

Without any further delay, let’s get started!

Step 1: Download the Dataset

Download the dataset from Kaggle, extract the ZIP file, and keep the extracted CSV file for the project. The code in the next step expects it to be named weatherHistory.csv.

Now that you’ve downloaded the dataset, let’s move on to the next step, uploading and loading it into Google Colab.

Step 2: Upload and Read the Dataset in Google Colab

Upload the dataset file to Google Colab using the code below:

from google.colab import files
uploaded = files.upload()

Once uploaded, use the following Python code to read and check the data and import the required libraries:

# main.py
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns
# --- 1. Data Loading and Initial Exploration ---
# Load the dataset from the uploaded CSV file.
try:
    df = pd.read_csv('weatherHistory.csv')
except FileNotFoundError:
    # Stop with a clear message; every later step depends on df.
    raise SystemExit("Error: 'weatherHistory.csv' not found. Please make sure the file is in the correct directory.")
print("--- Initial Data Overview ---")
print("First 5 rows of the dataset:")
print(df.head())
print("\nDataset Information:")
# df.info() prints its summary directly, so it is not wrapped in print().
df.info()
print("\nChecking for missing values:")
print(df.isnull().sum())

Output:

--- Initial Data Overview ---
First 5 rows of the dataset:
                  Formatted Date        Summary Precip Type  Temperature (C)  \
0  2006-04-01 00:00:00.000 +0200  Partly Cloudy        rain         9.472222
1  2006-04-01 01:00:00.000 +0200  Partly Cloudy        rain         9.355556
2  2006-04-01 02:00:00.000 +0200  Mostly Cloudy        rain         9.377778
3  2006-04-01 03:00:00.000 +0200  Partly Cloudy        rain         8.288889
4  2006-04-01 04:00:00.000 +0200  Mostly Cloudy        rain         8.755556

   Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \
0                  7.388889      0.89            14.1197
1                  7.227778      0.86            14.2646
2                  9.377778      0.89             3.9284
3                  5.944444      0.83            14.1036
4                  6.977778      0.83            11.0446

   Wind Bearing (degrees)  Visibility (km)  Loud Cover  Pressure (millibars)  \
0                   251.0          15.8263         0.0               1015.13
1                   259.0          15.8263         0.0               1015.63
2                   204.0          14.9569         0.0               1015.94
3                   269.0          15.8263         0.0               1016.41
4                   259.0          15.8263         0.0               1016.51

                       Daily Summary
0  Partly cloudy throughout the day.
1  Partly cloudy throughout the day.
2  Partly cloudy throughout the day.
3  Partly cloudy throughout the day.
4  Partly cloudy throughout the day.

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Formatted Date            96453 non-null  object
 1   Summary                   96453 non-null  object
 2   Precip Type               95936 non-null  object
 3   Temperature (C)           96453 non-null  float64
 4   Apparent Temperature (C)  96453 non-null  float64
 5   Humidity                  96453 non-null  float64
 6   Wind Speed (km/h)         96453 non-null  float64
 7   Wind Bearing (degrees)    96453 non-null  float64
 8   Visibility (km)           96453 non-null  float64
 9   Loud Cover                96453 non-null  float64
 10  Pressure (millibars)      96453 non-null  float64
 11  Daily Summary             96453 non-null  object
dtypes: float64(8), object(4)
memory usage: 8.8+ MB

Checking for missing values:
Formatted Date                0
Summary                       0
Precip Type                 517
Temperature (C)               0
Apparent Temperature (C)      0
Humidity                      0
Wind Speed (km/h)             0
Wind Bearing (degrees)        0
Visibility (km)               0
Loud Cover                    0
Pressure (millibars)          0
Daily Summary                 0
dtype: int64

Step 3: Data Preprocessing and Feature Engineering

We’ll clean the dataset, extract useful time-based features, handle missing values, and convert categorical variables into a numerical format that the machine learning model can understand.

Here is the code for this step:

# --- Step 3: Data Preprocessing and Feature Engineering ---
print("\n--- Data Preprocessing ---")
# Convert 'Formatted Date' to datetime; utc=True converts each row's UTC offset to a single UTC timezone
df['Formatted Date'] = pd.to_datetime(df['Formatted Date'], utc=True)
# Set datetime as index and sort chronologically
df.set_index('Formatted Date', inplace=True)
df.sort_index(inplace=True)
# Fill missing values in 'Precip Type' with the most frequent value.
# Assigning the result back avoids a chained in-place fillna, which newer pandas versions warn about.
precip_mode = df['Precip Type'].mode()[0]
df['Precip Type'] = df['Precip Type'].fillna(precip_mode)
print(f"Filled missing 'Precip Type' values with '{precip_mode}'.")
# Create new time-based features from the datetime index
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['hour'] = df.index.hour
print("Created time-based features: year, month, day, hour.")
# Drop columns that add no value or cause data leakage.
# 'Loud Cover' is the column's spelling in the dataset; 'Apparent Temperature (C)' is closely tied to the target.
df_processed = df.drop(['Loud Cover', 'Apparent Temperature (C)', 'Daily Summary'], axis=1)
# Convert categorical variables into numerical format using one-hot encoding
df_processed = pd.get_dummies(df_processed, columns=['Summary', 'Precip Type'], drop_first=True)
print("Converted categorical features to numeric using one-hot encoding.")

Output:

--- Data Preprocessing ---
Filled missing 'Precip Type' values with 'rain'.
Created time-based features: year, month, day, hour.
Converted categorical features to numeric using one-hot encoding.
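
To see what the datetime conversion does to individual rows, here is a minimal sketch on two hand-written timestamps. The first matches the format of the sample rows shown earlier; the second, with a +0100 offset, is an assumed value added only to illustrate how differing offsets are handled.

```python
import pandas as pd

# Two timestamps in the dataset's format. The first matches the sample rows;
# the second (+0100) is an assumed value for illustration.
raw = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2006-01-01 00:00:00.000 +0100",
])

# utc=True converts every offset to UTC, giving a single timezone-aware dtype
# whose .dt accessors (year, month, day, hour) work on the whole column.
parsed = pd.to_datetime(raw, utc=True)

features = pd.DataFrame({
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "day": parsed.dt.day,
    "hour": parsed.dt.hour,
})
print(parsed)
print(features)
```

One consequence worth keeping in mind: the extracted hour is the UTC hour, so local midnight at +0200 becomes hour 22 of the previous day. If local time of day matters for your features, convert the timezone before extracting them.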

Step 4: Define Features and Target Variable

We now separate the target variable and features: Temperature (C) will be our prediction target, and the rest will serve as input features for the model.

Here is the code for this step:

# --- Step 4: Defining Features (X) and Target (y) ---
# The target variable 'y' is what we want to predict.
y = df_processed['Temperature (C)']
# The feature matrix 'X' contains all the variables used for prediction.
X = df_processed.drop('Temperature (C)', axis=1)
# Display shapes to confirm the structure
print("\nShape of Feature Matrix (X):", X.shape)
print("Shape of Target Vector (y):", y.shape)

Output: 

Shape of Feature Matrix (X): (96453, 36)
Shape of Target Vector (y): (96453,)

Step 5: Split Data into Training and Testing Sets

Since this is time series data, we avoid random splitting. Instead, we split the dataset chronologically, training on the past and testing on the future. 

Here is the code for this step:

# --- Step 5: Splitting Data into Training and Testing Sets ---
# Define split ratio
split_ratio = 0.8
split_index = int(len(X) * split_ratio)
# Chronologically split the dataset
X_train = X.iloc[:split_index]
X_test = X.iloc[split_index:]
y_train = y.iloc[:split_index]
y_test = y.iloc[split_index:]
# Print the sizes of the splits (the :.0% format avoids floating-point noise in the percentages)
print(f"\nSplit data into {split_ratio:.0%} training and {1 - split_ratio:.0%} testing.")
print("Training set size:", len(X_train))
print("Testing set size:", len(X_test))

Output:

Split data into 80% training and 20% testing.
Training set size: 77162
Testing set size: 19291
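
A single chronological split gives one estimate of performance. As an optional extension (not part of the original project), scikit-learn's TimeSeriesSplit applies the same "train on the past, test on the future" idea across several folds. The sketch below uses ten dummy rows in place of the sorted weather data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered samples standing in for the chronologically sorted rows.
X_demo = np.arange(10).reshape(-1, 1)

# Each fold trains on an expanding window of the past and tests on the block
# that immediately follows it, so the test rows are always later in time.
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_demo))

for fold, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"Fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")
```

Averaging a metric over these folds is usually a steadier estimate than a single split, at the cost of fitting the model several times.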

Step 6: Train the Linear Regression Model

Now we train a Linear Regression model, which is a simple yet effective algorithm for predicting continuous variables like temperature. The model learns patterns from the training data to make future predictions.

Here is the code for this step:

# --- Step 6: Model Training ---
print("\n--- Model Training ---")
# Initialize the Linear Regression model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)
print("Linear Regression model trained successfully.")

Output:

--- Model Training ---
Linear Regression model trained successfully.
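
Once a linear model is fitted, its coefficients show how each feature moves the prediction. The sketch below demonstrates the pattern on synthetic data with a known relationship; the feature names and numbers are invented for illustration and are not results from the weather dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known relationship (illustrative, not the weather data):
# target = 20 - 10 * humidity + 0.5 * hour + small noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "humidity": rng.uniform(0.3, 1.0, size=500),
    "hour": rng.integers(0, 24, size=500).astype(float),
})
y_demo = 20 - 10 * X_demo["humidity"] + 0.5 * X_demo["hour"] + rng.normal(0, 0.1, size=500)

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each learned coefficient with its feature name for easier reading.
coefs = pd.Series(demo_model.coef_, index=X_demo.columns).sort_values()
print(coefs)
print(f"Intercept: {demo_model.intercept_:.2f}")
```

The same pattern should work for the model trained above by pairing model.coef_ with X_train.columns. Coefficient sizes depend on each feature's units, so compare them with care.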

Step 7: Make Predictions and Evaluate the Model

After training the model, we test its performance on unseen data. We use standard regression metrics to measure accuracy and understand how well the model predicts temperature.

Here is the code:

# --- Step 7: Model Prediction and Evaluation ---
print("\n--- Model Evaluation ---")
# Predict on the test set
y_pred = model.predict(X_test)
# Calculate evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
# Print the evaluation results
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R-squared (R²): {r2:.2f}")

Output:

--- Model Evaluation ---

Mean Absolute Error (MAE): 5.22
Mean Squared Error (MSE): 38.56
Root Mean Squared Error (RMSE): 6.21
R-squared (R²): 0.53
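
To clarify what these metrics measure, the sketch below computes MAE, RMSE, and R² by hand with NumPy on four made-up values and checks them against scikit-learn. The numbers are illustrative only and unrelated to the results above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Tiny made-up example values (not results from the weather dataset).
y_true = np.array([10.0, 12.0, 14.0, 16.0])
y_hat = np.array([11.0, 11.0, 15.0, 17.0])

errors = y_true - y_hat

# MAE: average size of the errors, in the target's units.
mae_manual = np.mean(np.abs(errors))
# RMSE: square root of the average squared error; penalizes large misses more.
rmse_manual = np.sqrt(np.mean(errors ** 2))
# R²: 1 minus (model squared error / squared error of always predicting the mean).
r2_manual = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae_manual, rmse_manual, r2_manual)

# The manual formulas agree with scikit-learn's implementations.
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_hat))
assert np.isclose(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

Read this way, the project's MAE of 5.22 means predictions are off by about 5.2 °C on average, and an R² of 0.53 means the model's squared error is about half that of always predicting the mean test temperature.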

Step 8: Visualize the Results

To better understand model performance, we visualize the predictions:

  • A line plot comparing actual vs. predicted temperatures over time.
  • A scatter plot showing how closely the predictions match the actual values.

Here is the code for this step:

# --- Step 8: Visualizing the Results ---
print("\n--- Visualizing Results ---")
# Set plot style
plt.style.use('seaborn-v0_8-whitegrid')
# Line plot: Actual vs Predicted temperatures (sample of 500)
fig, ax = plt.subplots(figsize=(15, 7))
sample_size = 500
ax.plot(y_test.index[:sample_size], y_test.values[:sample_size], label='Actual Temperature', color='blue', linewidth=2)
ax.plot(y_test.index[:sample_size], y_pred[:sample_size], label='Predicted Temperature', color='red', linestyle='--', linewidth=2)
ax.set_title('Weather Forecast: Actual vs. Predicted Temperature (Sample of Test Data)', fontsize=16)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Temperature (C)', fontsize=12)
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
print("Displaying the plot of actual vs. predicted temperatures.")
plt.show()
# Scatter plot: Predicted vs Actual values
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(y_test, y_pred, alpha=0.3)
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2, label='Perfect Prediction')
ax.set_xlabel('Actual Temperature (C)', fontsize=12)
ax.set_ylabel('Predicted Temperature (C)', fontsize=12)
ax.set_title('Actual vs. Predicted Scatter Plot', fontsize=16)
ax.legend()
plt.tight_layout()
plt.show()

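Output: two figures. The first is a line plot of actual vs. predicted temperature for the first 500 test samples; the second is a scatter plot of actual vs. predicted temperature with the perfect-prediction diagonal.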

Conclusion

In this project, we built a weather forecasting model using linear regression on historical weather data. After cleaning the dataset, creating time-based features, and encoding categorical variables, we trained the model on the first 80% of the records and evaluated it on the final 20% using MAE, RMSE, and R². On that test set the model reached an MAE of 5.22 °C, an RMSE of 6.21 °C, and an R² of 0.53, so it explains roughly half of the variance in temperature and still leaves clear room for improvement. This project reinforced essential skills in regression, time series handling, and model evaluation.

Colab Link:
https://colab.research.google.com/drive/1KBrnnWm6Ka858VkrhyKDVBVHcwyIm3RG
