Weather Forecasting Model Using Machine Learning and Time Series Analysis
By Rohit Sharma
Updated on Jul 30, 2025 | 8 min read | 1.21K+ views
Accurate weather prediction plays a major role in agriculture, travel, disaster planning, and daily life. In this project, you'll build a Weather Forecasting Model using machine learning techniques and time series data.
You'll apply regression models to predict continuous variables such as temperature, humidity, or pressure. You'll also learn how to prepare time-based data and evaluate your model's accuracy using standard metrics.
If you're looking to accelerate your data science journey, check out the Online Data Science Courses at upGrad. The programs help you learn Python, Machine Learning, AI, Tableau, SQL, and more from top-tier faculty. Enroll today!
Spark your next big idea. Browse our full collection of data science projects in Python.
Before starting this project, it helps to have basic knowledge of Python, pandas, and core machine learning concepts such as regression.
Also Read - Data Structures in Python
Start your journey of career advancement in data science with upGrad’s top-ranked courses and get a chance to learn from industry-established mentors.
For this Weather Forecasting Model project, the following tools and libraries will be used:
Tool / Library | Purpose
Python | Core programming language
Google Colab | Cloud-based notebook for coding and collaboration
Pandas | Data loading, cleaning, and manipulation
NumPy | Numerical computations and array operations
Matplotlib / Seaborn | Visualizing time series trends and correlations
Scikit-learn | Building and evaluating regression models
Datetime / statsmodels | Handling time-based data and forecasting
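All of these libraries come preinstalled in Google Colab. If you are working locally instead, a quick check like the sketch below confirms that everything imports and shows the versions you have:

# Quick environment check: every library used in this project, with its version.
# In Google Colab these are preinstalled; locally, install any missing one with pip.
import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns
import sklearn
import statsmodels

for name, module in [("pandas", pd), ("numpy", np), ("matplotlib", matplotlib),
                     ("seaborn", sns), ("scikit-learn", sklearn), ("statsmodels", statsmodels)]:
    print(f"{name}: {module.__version__}")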
Also Read - Top 6 Python IDEs of 2025 That Will Change Your Workflow!
For our Weather Forecasting Model, we’ll use the following key techniques to build and evaluate predictive models: time-based feature engineering, one-hot encoding of categorical variables, a chronological train/test split, linear regression, and standard regression metrics (MAE, MSE, RMSE, and R²).
Check out this free Linear Regression – Step by Step Guide course to enhance your skills in predictive modeling, feature engineering, and model evaluation.
You can complete this Weather Forecasting Model project in about 3 to 4 hours. It’s ideal for beginners to intermediate users, offering hands-on experience in data cleaning, time-based feature engineering, regression modeling, and model evaluation.
Let’s build this project from scratch with clear, step-by-step guidance:
Without any further delay, let’s get started!
Download the dataset from Kaggle, extract the ZIP file, and use the downloaded dataset file for the project.
Now that you’ve downloaded the dataset, upload it to Google Colab using the code below:
# Upload weatherHistory.csv from your local machine into the Colab session
from google.colab import files
uploaded = files.upload()
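files.upload() is fine for a one-off upload. If you plan to rerun the notebook, two common alternatives are mounting Google Drive or unzipping the Kaggle archive inside Colab. The sketch below uses placeholder file names, so adjust the paths to wherever you stored the download:

# Alternative 1: mount Google Drive so the CSV persists across Colab sessions.
from google.colab import drive
drive.mount('/content/drive')
# After mounting, read the file from your Drive path (placeholder shown):
# df = pd.read_csv('/content/drive/MyDrive/weatherHistory.csv')

# Alternative 2: if you uploaded the Kaggle ZIP itself, extract it in place.
import zipfile
with zipfile.ZipFile('archive.zip') as zf:  # 'archive.zip' is a placeholder name
    zf.extractall('.')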
Once uploaded, use the following Python code to import the required libraries, then read and inspect the data:
# main.py
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns
# --- 1. Data Loading and Initial Exploration ---
# Load the dataset from the uploaded CSV file.
try:
    df = pd.read_csv('weatherHistory.csv')
except FileNotFoundError:
    print("Error: 'weatherHistory.csv' not found. Please make sure the file is in the correct directory.")
    exit()
print("--- Initial Data Overview ---")
print("First 5 rows of the dataset:")
print(df.head())
print("\nDataset Information:")
df.info()
print("\nChecking for missing values:")
print(df.isnull().sum())
Output:
--- Initial Data Overview ---
First 5 rows of the dataset:
Formatted Date Summary Precip Type Temperature (C) \
0 2006-04-01 00:00:00.000 +0200 Partly Cloudy rain 9.472222
1 2006-04-01 01:00:00.000 +0200 Partly Cloudy rain 9.355556
2 2006-04-01 02:00:00.000 +0200 Mostly Cloudy rain 9.377778
3 2006-04-01 03:00:00.000 +0200 Partly Cloudy rain 8.288889
4 2006-04-01 04:00:00.000 +0200 Mostly Cloudy rain 8.755556
Apparent Temperature (C) Humidity Wind Speed (km/h) \
0 7.388889 0.89 14.1197
1 7.227778 0.86 14.2646
2 9.377778 0.89 3.9284
3 5.944444 0.83 14.1036
4 6.977778 0.83 11.0446
Wind Bearing (degrees) Visibility (km) Loud Cover Pressure (millibars) \
0 251.0 15.8263 0.0 1015.13
1 259.0 15.8263 0.0 1015.63
2 204.0 14.9569 0.0 1015.94
3 269.0 15.8263 0.0 1016.41
4 259.0 15.8263 0.0 1016.51
Daily Summary
0 Partly cloudy throughout the day.
1 Partly cloudy throughout the day.
2 Partly cloudy throughout the day.
3 Partly cloudy throughout the day.
4 Partly cloudy throughout the day.
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Formatted Date 96453 non-null object
1 Summary 96453 non-null object
2 Precip Type 95936 non-null object
3 Temperature (C) 96453 non-null float64
4 Apparent Temperature (C) 96453 non-null float64
5 Humidity 96453 non-null float64
6 Wind Speed (km/h) 96453 non-null float64
7 Wind Bearing (degrees) 96453 non-null float64
8 Visibility (km) 96453 non-null float64
9 Loud Cover 96453 non-null float64
10 Pressure (millibars) 96453 non-null float64
11 Daily Summary 96453 non-null object
dtypes: float64(8), object(4)
memory usage: 8.8+ MB
Checking for missing values:
Formatted Date 0
Summary 0
Precip Type 517
Temperature (C) 0
Apparent Temperature (C) 0
Humidity 0
Wind Speed (km/h) 0
Wind Bearing (degrees) 0
Visibility (km) 0
Loud Cover 0
Pressure (millibars) 0
Daily Summary 0
dtype: int64
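Before preprocessing, a quick exploratory plot helps justify the choices in the next step: Apparent Temperature (C) moves almost in lockstep with the target, and Loud Cover never changes. This optional sketch reuses the seaborn import from above to draw a correlation heatmap of the numeric columns:

# Optional EDA: confirm that 'Loud Cover' is constant and visualize correlations
# between the remaining numeric columns.
numeric_cols = df.select_dtypes(include='number')
print("Unique values in 'Loud Cover':", numeric_cols['Loud Cover'].unique())

plt.figure(figsize=(10, 6))
sns.heatmap(numeric_cols.drop(columns=['Loud Cover']).corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Between Numeric Weather Variables')
plt.tight_layout()
plt.show()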
We’ll clean the dataset, extract useful time-based features, handle missing values, and convert categorical variables into a numerical format that the machine learning model can understand.
Here is the code for this step:
# --- Step 3: Data Preprocessing and Feature Engineering ---
print("\n--- Data Preprocessing ---")
# Convert 'Formatted Date' to datetime format
df['Formatted Date'] = pd.to_datetime(df['Formatted Date'], utc=True)
# Set datetime as index and sort chronologically
df.set_index('Formatted Date', inplace=True)
df.sort_index(inplace=True)
# Fill missing values in 'Precip Type' with the most frequent value
precip_mode = df['Precip Type'].mode()[0]
df['Precip Type'] = df['Precip Type'].fillna(precip_mode)
print(f"Filled missing 'Precip Type' values with '{precip_mode}'.")
# Create new time-based features from the datetime index
df['year'] = df.index.year
df['month'] = df.index.month
df['day'] = df.index.day
df['hour'] = df.index.hour
print("Created time-based features: year, month, day, hour.")
# Drop columns that add no value or cause data leakage
df_processed = df.drop(['Loud Cover', 'Apparent Temperature (C)', 'Daily Summary'], axis=1)
# Convert categorical variables into numerical format using one-hot encoding
df_processed = pd.get_dummies(df_processed, columns=['Summary', 'Precip Type'], drop_first=True)
print("Converted categorical features to numeric using one-hot encoding.")
Output:
--- Data Preprocessing ---
Filled missing 'Precip Type' values with 'rain'.
Created time-based features: year, month, day, hour.
Converted categorical features to numeric using one-hot encoding.
Check out this tutorial on How to Work with datetime in Python to learn how to handle dates and times, parse timestamps, format outputs, and perform date-based operations.
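As a small, self-contained illustration of those ideas, here is how pandas parses a timestamp in the same format as the Formatted Date column and exposes the pieces we used as features:

# Mini datetime demo: parse one timestamp in the dataset's format,
# convert it to UTC, and pull out the components used as features above.
import pandas as pd

ts = pd.to_datetime('2006-04-01 00:00:00.000 +0200', utc=True)
print(ts)                               # 2006-03-31 22:00:00+00:00
print(ts.year, ts.month, ts.day, ts.hour)
print(ts.strftime('%Y-%m-%d %H:%M'))    # custom output formatting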
We now separate the target variable and features: Temperature (C) will be our prediction target, and the rest will serve as input features for the model.
Here is the code for this step:
# --- Step 4: Defining Features (X) and Target (y) ---
# The target variable 'y' is what we want to predict.
y = df_processed['Temperature (C)']
# The feature matrix 'X' contains all the variables used for prediction.
X = df_processed.drop('Temperature (C)', axis=1)
# Display shapes to confirm the structure
print("\nShape of Feature Matrix (X):", X.shape)
print("Shape of Target Vector (y):", y.shape)
Output:
Shape of Feature Matrix (X): (96453, 36)
Shape of Target Vector (y): (96453,)
Also Read - Feature Engineering for Machine Learning: Process, Techniques, and Examples
Since this is time series data, we avoid random splitting. Instead, we split the dataset chronologically, training on the past and testing on the future.
Here is the code for this step:
# --- Step 5: Splitting Data into Training and Testing Sets ---
# Define split ratio
split_ratio = 0.8
split_index = int(len(X) * split_ratio)
# Chronologically split the dataset
X_train = X[:split_index]
X_test = X[split_index:]
y_train = y[:split_index]
y_test = y[split_index:]
# Print the sizes of the splits
print(f"\nSplit data into {split_ratio*100}% training and {(1 - split_ratio)*100}% testing.")
print("Training set size:", len(X_train))
print("Testing set size:", len(X_test))
Output:
Split data into 80% training and 20% testing.
Training set size: 77162
Testing set size: 19291
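The single chronological split above mirrors how the model would be used in practice, training on the past and predicting the future. If you want a more robust estimate of performance, scikit-learn's TimeSeriesSplit produces several ordered train/validation folds; a minimal sketch:

# Optional: rolling-origin cross-validation with TimeSeriesSplit.
# Each fold trains on an earlier slice of the data and validates on the slice
# that immediately follows it, so the time order is never violated.
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X), start=1):
    print(f"Fold {fold}: train={len(train_idx)} rows, validation={len(val_idx)} rows")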
Now we train a Linear Regression model, which is a simple yet effective algorithm for predicting continuous variables like temperature. The model learns patterns from the training data to make future predictions.
Here is the code for this step:
# --- Step 6: Model Training ---
print("\n--- Model Training ---")
# Initialize the Linear Regression model
model = LinearRegression()
# Train the model on the training data
model.fit(X_train, y_train)
print("Linear Regression model trained successfully.")
Output:
--- Model Training ---
Linear Regression model trained successfully.
Also Read - What is Regression: Regression Analysis Explained
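Once the model is fitted, it can be instructive to look at the learned coefficients. A quick sketch (keep in mind the features are not standardized, so coefficient sizes mix feature scale with feature influence):

# Inspect the fitted Linear Regression model's coefficients.
# Because the features are unscaled, large coefficients reflect feature scale
# as well as influence, so treat this as a rough indication only.
coefficients = pd.Series(model.coef_, index=X_train.columns)
print("Intercept:", round(model.intercept_, 3))
print("\nTop 10 coefficients by absolute value:")
print(coefficients.sort_values(key=abs, ascending=False).head(10))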
After training the model, we test its performance on unseen data. We use standard regression metrics to measure accuracy and understand how well the model predicts temperature.
Here is the code:
# --- Step 7: Model Prediction and Evaluation ---
print("\n--- Model Evaluation ---")
# Predict on the test set
y_pred = model.predict(X_test)
# Calculate evaluation metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
# Print the evaluation results
print(f"Mean Absolute Error (MAE): {mae:.2f}")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
print(f"R-squared (R²): {r2:.2f}")
Output:
--- Model Evaluation ---
Mean Absolute Error (MAE): 5.22
Mean Squared Error (MSE): 38.56
Root Mean Squared Error (RMSE): 6.21
R-squared (R²): 0.53
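An R² of 0.53 leaves clear room for improvement. One optional experiment, not part of the original walkthrough, is to swap in a non-linear model on the same chronological split; the sketch below uses scikit-learn's RandomForestRegressor and will take noticeably longer to train:

# Optional extension: a non-linear baseline on the same train/test split.
# Random forests can capture interactions and non-linear effects that a
# linear model misses, at the cost of longer training time.
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

print(f"Random Forest MAE:  {mean_absolute_error(y_test, rf_pred):.2f}")
print(f"Random Forest RMSE: {np.sqrt(mean_squared_error(y_test, rf_pred)):.2f}")
print(f"Random Forest R²:   {r2_score(y_test, rf_pred):.2f}")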
To better understand model performance, we visualize the predictions:
Here is the code for this step:
# --- Step 8: Visualizing the Results ---
print("\n--- Visualizing Results ---")
# Set plot style
plt.style.use('seaborn-v0_8-whitegrid')
# Line plot: Actual vs Predicted temperatures (sample of 500)
fig, ax = plt.subplots(figsize=(15, 7))
sample_size = 500
ax.plot(y_test.index[:sample_size], y_test.values[:sample_size], label='Actual Temperature', color='blue', linewidth=2)
ax.plot(y_test.index[:sample_size], y_pred[:sample_size], label='Predicted Temperature', color='red', linestyle='--', linewidth=2)
ax.set_title('Weather Forecast: Actual vs. Predicted Temperature (Sample of Test Data)', fontsize=16)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Temperature (C)', fontsize=12)
ax.legend()
plt.xticks(rotation=45)
plt.tight_layout()
print("Displaying the plot of actual vs. predicted temperatures.")
plt.show()
# Scatter plot: Predicted vs Actual values
fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(y_test, y_pred, alpha=0.3)
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2, label='Perfect Prediction')
ax.set_xlabel('Actual Temperature (C)', fontsize=12)
ax.set_ylabel('Predicted Temperature (C)', fontsize=12)
ax.set_title('Actual vs. Predicted Scatter Plot', fontsize=16)
ax.legend()
plt.tight_layout()
plt.show()
Output: two plots are displayed, a line chart comparing actual and predicted temperatures over a 500-hour sample of the test set, and a scatter plot of predicted vs. actual temperatures against the perfect-prediction line.
Also Read - Top 15 Types of Data Visualization: Benefits and How to Choose the Right Tool for Your Needs in 2025
In this project, we built a weather forecasting model using linear regression on historical weather data. After cleaning the dataset, creating time-based features, and encoding categorical variables, we trained the model and evaluated it using MAE, RMSE, and R². The results showed reasonable prediction accuracy, and visualizations confirmed the model's ability to capture key trends. This project reinforced essential skills in regression, time series handling, and model evaluation.
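If you want to take the time series angle further, statsmodels (listed in the tools table but not needed for the regression workflow above) provides classical forecasting models. A minimal sketch, assuming you first resample the hourly data to daily mean temperatures, might look like this:

# Possible next step: a classical ARIMA forecast on daily mean temperature.
# The (2, 1, 2) order is an arbitrary starting point, not a tuned choice.
from statsmodels.tsa.arima.model import ARIMA

# Drop the timezone and resample the hourly series to daily means.
daily_temp = df['Temperature (C)'].tz_convert(None).resample('D').mean().dropna()
train = daily_temp[:-30]                    # hold out the last 30 days

arima_model = ARIMA(train, order=(2, 1, 2)).fit()
forecast = arima_model.forecast(steps=30)   # predict the held-out month
print(forecast.head())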
Colab Link:
https://colab.research.google.com/drive/1KBrnnWm6Ka858VkrhyKDVBVHcwyIm3RG