Indian Rainfall Analysis and Prediction Using Linear Regression

By Rohit Sharma

Updated on Aug 11, 2025 | 1.25K+ views

Rainfall plays a key role in agriculture, water supply, and climate balance across India. Seasonal variations impact crop yields, reservoir levels, and daily life.

This project on Indian Rainfall Analysis and Prediction studies historical rainfall data to identify trends, seasonal contributions, and high-rainfall regions. You’ll also build a Linear Regression model to predict annual rainfall using the first five months of data.

Heads Up Before You Dive In!

To work effectively on the Indian Rainfall Analysis and Prediction project, make sure you're comfortable with the following:

  • Basic Python programming knowledge (You should be able to write simple scripts, use loops and conditions, and define functions.)
  • Experience with data manipulation using Pandas and NumPy (These libraries are essential for reading the dataset, handling missing values, and preparing the data for analysis.)
  • Understanding of data visualisation with Matplotlib and Seaborn (You'll use these tools to plot graphs like bar plots, box plots, and heatmaps for better data understanding.)
  • Knowledge of data preprocessing techniques (You know how to clean data, encode categorical variables, scale features, and split the dataset into training and test sets.)
  • Familiarity with Regression Algorithms (Understanding models like Linear Regression is important, as it’s used here to predict annual rainfall from early-year data.)

Indian Rainfall Analysis and Prediction: Methodology

To predict annual rainfall, we used historical district-wise rainfall data and built a regression model that learns patterns from early-year monthly rainfall. Here's what we did:

  • Data Preprocessing and Cleaning
  • Feature Engineering
  • Train-Test Split
  • Regression Model (Linear Regression)
  • Model Evaluation (R² Score and RMSE)

Estimated Time to Complete: The Indian Rainfall Analysis and Prediction project is estimated to take 3 to 4 hours. The time may vary depending on your familiarity with Python, especially in data loading, exploratory data analysis, feature selection, regression modelling, and model evaluation.

Predicting Annual Rainfall: A Step-by-Step Guide Using Machine Learning

Here’s how you can build the Indian Rainfall Analysis and Prediction project from scratch using Python and machine learning:

  1. Load the Rainfall Dataset
    Import district-wise rainfall data with monthly values from January to December and annual totals.
  2. Clean and Preprocess the Data
    Remove extra spaces in column names, handle any missing values, and structure the dataset for analysis.
  3. Explore and Visualise the Data
    Create bar plots for top rainfall states and districts, line charts for monthly averages, and pie charts for seasonal contributions.
  4. Train the Regression Model
    Use Linear Regression to predict annual rainfall based on early-year (January–May) rainfall data.
  5. Evaluate Model Performance
    Measure accuracy with R² score and RMSE to check how well the model predicts annual rainfall.

Step 1: Import Required Libraries

First, import all the necessary Python libraries for data handling, visualisation, model building, and evaluation.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import warnings
warnings.filterwarnings('ignore')

Step 2: Loading and Preparing Data

This step loads the rainfall dataset and performs initial exploration to understand its structure and contents.


print("---Loading and Preparing Data ---")
try:
    df = pd.read_csv('district wise rainfall normal.csv')
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'district wise rainfall normal.csv' not found. Please check the file path.")
    raise SystemExit(1)  # stop here; the remaining steps need df

# Initial Data Exploration
print("\n--- Initial Data Exploration ---")
print("DataFrame Head:")
display(df.head())  # display() exists in Jupyter/Colab; use print() in a plain script
print("\nDataFrame Info:")
df.info()  # info() prints its own summary and returns None, so it is not wrapped in display()
print("\nDataFrame Description:")
display(df.describe())

Output:

---Loading and Preparing Data ---
Dataset loaded successfully.

--- Initial Data Exploration ---
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 641 entries, 0 to 640
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   STATE_UT_NAME  641 non-null    object 
 1   DISTRICT       641 non-null    object 
 2   JAN            641 non-null    float64
 3   FEB            641 non-null    float64
 4   MAR            641 non-null    float64
 5   APR            641 non-null    float64
 6   MAY            641 non-null    float64
 7   JUN            641 non-null    float64
 8   JUL            641 non-null    float64
 9   AUG            641 non-null    float64
 10  SEP            641 non-null    float64
 11  OCT            641 non-null    float64
 12  NOV            641 non-null    float64
 13  DEC            641 non-null    float64
 14  ANNUAL         641 non-null    float64
 15  Jan-Feb        641 non-null    float64
 16  Mar-May        641 non-null    float64
 17  Jun-Sep        641 non-null    float64
 18  Oct-Dec        641 non-null    float64
dtypes: float64(17), object(2)
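
The column listing shows twelve monthly columns alongside an ANNUAL total, so one quick sanity check is whether the months actually add up to ANNUAL. The sketch below illustrates the idea on a tiny made-up DataFrame; the sample values, the helper name annual_mismatch, and the 1 mm tolerance are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# Hypothetical rows standing in for the real dataset
sample = pd.DataFrame(
    [[10, 20, 30, 40, 50, 200, 300, 250, 150, 60, 20, 10],
     [5, 5, 10, 20, 40, 150, 250, 200, 100, 30, 10, 5]],
    columns=MONTHS, dtype=float)
# The second ANNUAL value is deliberately inconsistent with its monthly values
sample['ANNUAL'] = [1140.0, 900.0]


def annual_mismatch(frame, tol=1.0):
    """Return rows whose monthly sum differs from ANNUAL by more than tol mm."""
    diff = (frame[MONTHS].sum(axis=1) - frame['ANNUAL']).abs()
    return frame[diff > tol]


bad_rows = annual_mismatch(sample)
print(f"Rows with inconsistent totals: {len(bad_rows)}")
```

On the real data the same helper could be called as annual_mismatch(df) once the column names have been cleaned.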

Step 3: Cleaning Column Names

This step removes extra spaces from column names to ensure consistent data handling.

# Clean up column names
df.columns = df.columns.str.strip()
print("Cleaned DataFrame shape:", df.shape)

Output: 

Cleaned DataFrame shape: (641, 19)
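
The info() output above reports 641 non-null values in every column, so this particular dataset needs no imputation. For a rainfall file that does contain gaps, a minimal sketch of the "handle missing values" step might look like the following. The toy DataFrame and the choice of median imputation are assumptions made for illustration, not the method used in the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame with padded column names and a few gaps (hypothetical values)
demo = pd.DataFrame({
    ' JAN ': [12.0, np.nan, 8.0],
    'FEB ': [20.0, 18.0, np.nan],
    'DISTRICT': ['A', 'B', 'C'],
})

# Same clean-up as in this step: strip stray spaces from the headers
demo.columns = demo.columns.str.strip()

# Count gaps per column before filling anything
missing_before = demo.isna().sum()
print(missing_before)

# Fill numeric gaps with each column's median (one possible choice among many)
numeric_cols = demo.select_dtypes(include='number').columns
demo[numeric_cols] = demo[numeric_cols].fillna(demo[numeric_cols].median())
print(demo)
```

Median filling is robust to a few extreme values, which matters for skewed quantities such as rainfall, but dropping rows or using a regional average could be equally reasonable depending on the data.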

Step 4: Exploratory Data Analysis (EDA)

This step visualises rainfall patterns across states, districts, months, and seasons.



sns.set_style("whitegrid")

# a. Top 10 States with Highest Annual Rainfall
plt.figure(figsize=(12, 7))
state_rainfall = df.groupby('STATE_UT_NAME')['ANNUAL'].mean().sort_values(ascending=False).head(10)
sns.barplot(x=state_rainfall.values, y=state_rainfall.index, palette='Blues_r')
plt.title('Top 10 States by Average Annual Rainfall', fontsize=16)
plt.xlabel('Average Annual Rainfall (mm)', fontsize=12)
plt.ylabel('State / Union Territory', fontsize=12)
plt.savefig('top_10_states_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'top_10_states_rainfall.png'")

# b. Top 10 Districts with Highest Annual Rainfall
plt.figure(figsize=(12, 7))
district_rainfall = df.groupby('DISTRICT')['ANNUAL'].mean().sort_values(ascending=False).head(10)
sns.barplot(x=district_rainfall.values, y=district_rainfall.index, palette='Greens_r')
plt.title('Top 10 Districts by Average Annual Rainfall', fontsize=16)
plt.xlabel('Average Annual Rainfall (mm)', fontsize=12)
plt.ylabel('District', fontsize=12)
plt.savefig('top_10_districts_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'top_10_districts_rainfall.png'")

# c. Monthly Rainfall Distribution (National Average)
months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
monthly_avg_rainfall = df[months].mean()
plt.figure(figsize=(14, 7))
sns.lineplot(x=monthly_avg_rainfall.index, y=monthly_avg_rainfall.values, marker='o', color='crimson', lw=2)
plt.title('Average Monthly Rainfall Across India', fontsize=16)
plt.xlabel('Month', fontsize=12)
plt.ylabel('Average Rainfall (mm)', fontsize=12)
plt.xticks(rotation=45)
plt.savefig('national_monthly_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'national_monthly_rainfall.png'")

# d. Contribution of Different Seasons to Annual Rainfall
df['MONSOON'] = df['JUN'] + df['JUL'] + df['AUG'] + df['SEP']
df['PRE_MONSOON'] = df['MAR'] + df['APR'] + df['MAY']
df['POST_MONSOON'] = df['OCT'] + df['NOV'] + df['DEC']
df['WINTER'] = df['JAN'] + df['FEB']

seasonal_contribution = df[['PRE_MONSOON', 'MONSOON', 'POST_MONSOON', 'WINTER']].mean()
plt.figure(figsize=(10, 8))
plt.pie(seasonal_contribution, labels=seasonal_contribution.index, autopct='%1.1f%%',
        colors=sns.color_palette('pastel'), wedgeprops={'edgecolor': 'black'})
plt.title('Contribution of Seasons to Total Annual Rainfall', fontsize=16)
plt.savefig('seasonal_rainfall_contribution.png', dpi=300, bbox_inches='tight')
print("Generated 'seasonal_rainfall_contribution.png'")

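Output: four charts, saved as top_10_states_rainfall.png, top_10_districts_rainfall.png, national_monthly_rainfall.png, and seasonal_rainfall_contribution.png (the images are not reproduced here).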
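
The pie chart labels each season with its percentage share via autopct. The same shares can be computed as plain numbers, which is handy when the figure is not available. The sketch below uses made-up seasonal means (an assumption for illustration, not values from the dataset) to show the arithmetic.

```python
import pandas as pd

# Hypothetical seasonal means in mm, standing in for seasonal_contribution
seasonal_means = pd.Series({
    'PRE_MONSOON': 120.0,
    'MONSOON': 900.0,
    'POST_MONSOON': 150.0,
    'WINTER': 30.0,
})

# Each season's share of the total, as a percentage rounded to one decimal
seasonal_share = (seasonal_means / seasonal_means.sum() * 100).round(1)
print(seasonal_share)
```

With the real data, replacing seasonal_means by the seasonal_contribution Series from the EDA code would give the percentages drawn on the pie chart.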

Step 5: Defining Features and Target Variable

This step selects the input features and target for the rainfall prediction model.

Here is the code for this step: 

# --- 3. Annual Rainfall Prediction Model ---
print("\n--- 3. Annual Rainfall Prediction Model ---")

# Define features (X) and target (y)
# We will predict the ANNUAL rainfall based on the rainfall in the first 5 months.
features = ['JAN', 'FEB', 'MAR', 'APR', 'MAY']
target = 'ANNUAL'

X = df[features]
y = df[target]
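
Before fitting a model, it can help to see how strongly each chosen feature relates to the target. The sketch below computes Pearson correlations on synthetic data; the generated values and the rule that makes MAY drive the target are assumptions for illustration only, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
features = ['JAN', 'FEB', 'MAR', 'APR', 'MAY']

# Synthetic early-year rainfall (mm); gamma keeps values positive and skewed
demo = pd.DataFrame(rng.gamma(shape=2.0, scale=20.0, size=(200, 5)),
                    columns=features)
# Synthetic target that depends mainly on MAY, plus noise (an assumption)
demo['ANNUAL'] = 800 + 6 * demo['MAY'] + rng.normal(0, 50, size=200)

# Correlation of every feature with the target
correlations = demo[features + ['ANNUAL']].corr()['ANNUAL'].drop('ANNUAL')
print(correlations.sort_values(ascending=False).round(2))
```

On the project's DataFrame the equivalent call would be df[features + [target]].corr()[target].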

Step 6: Splitting Data into Training and Testing Sets

This step divides the dataset into training and testing portions for model building and evaluation.

Here is the code for this step:

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")

Output:

Training data shape: (512, 5)
Testing data shape: (129, 5)
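
The 512/129 split follows from how train_test_split handles a fractional test_size: the test count is rounded up and the remainder goes to training. The short sketch below reproduces those shapes with placeholder arrays of the same size as the dataset (641 rows, 5 features); the zero-filled arrays are stand-ins, not real data.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 641  # number of districts reported by df.info()

# Placeholder arrays with the same shape as X and y
X_demo = np.zeros((n_rows, 5))
y_demo = np.zeros(n_rows)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# 0.2 * 641 = 128.2, which is rounded up to 129 test rows
expected_test = math.ceil(0.2 * n_rows)
expected_train = n_rows - expected_test
print(X_tr.shape, X_te.shape, expected_train, expected_test)
```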

Step 7: Training the Linear Regression Model

This step fits a Linear Regression model to predict annual rainfall using the first five months' rainfall data.

# Initialize and train a simple Linear Regression model
print("\nTraining the Annual Rainfall prediction model...")
model = LinearRegression()
model.fit(X_train, y_train)
print("Model training complete.")

# Print model coefficients and intercept
# Each coefficient is the change in predicted annual rainfall (mm)
# for a 1 mm increase in that month's rainfall.
print("\nModel Coefficients (mm of annual rainfall per mm of monthly rainfall):")
for feature, coef in zip(features, model.coef_):
    print(f"  {feature}: {coef:.3f}")
print(f"Model Intercept (mm): {model.intercept_:.3f}")

Output:

Training the Annual Rainfall prediction model...
Model training complete.

Model Coefficients (mm of annual rainfall per mm of monthly rainfall):
  JAN: -1.867
  FEB: 3.907
  MAR: 1.062
  APR: -3.200
  MAY: 7.394
Model Intercept (mm): 809.245
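
One point worth keeping in mind when reading these coefficients: the target ANNUAL already contains the January to May rainfall used as features. If ANNUAL is exactly the sum of the twelve months (an assumption about the dataset), a coefficient of 1 means a month contributes only its own rainfall, and anything above or below 1 reflects its association with the rest of the year. The sketch below demonstrates this on synthetic data: fitting the same features against the full-year total and against the June to December total gives coefficients that differ by exactly 1. All generated values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic JAN..MAY rainfall for 300 hypothetical districts (mm)
early = rng.gamma(shape=2.0, scale=25.0, size=(300, 5))
# Synthetic JUN..DEC total, loosely tied to MAY plus noise (an assumption)
rest_of_year = 700 + 5 * early[:, 4] + rng.normal(0, 80, size=300)
# Annual total = early months + rest of year
annual = early.sum(axis=1) + rest_of_year

annual_model = LinearRegression().fit(early, annual)
rest_model = LinearRegression().fit(early, rest_of_year)

# The gap between the two coefficient sets is 1 for every feature
coef_gap = annual_model.coef_ - rest_model.coef_
print(np.round(coef_gap, 6))
```

Read this way, and only under that assumption, the MAY coefficient of 7.394 would suggest that each extra millimetre in May goes with roughly 6.4 mm more rain over the remaining months.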

Step 8: Model Evaluation

This step tests the trained model on unseen data and measures prediction accuracy.

# --- Model Evaluation ---
print("\n--- Model Evaluation ---")
# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate and print evaluation metrics
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"Model R-squared (R²): {r2:.3f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f} mm")
print("An R² score close to 1.0 indicates the model can predict annual rainfall well based on early-year data.")

Output: 

--- Model Evaluation ---
Model R-squared (R²): 0.683
Root Mean Squared Error (RMSE): 532.29 mm
An R² score close to 1.0 indicates the model can predict annual rainfall well based on early-year data.
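
To make the two metrics concrete, the sketch below computes them from their definitions with NumPy and compares the results with scikit-learn. R² is one minus the ratio of the residual sum of squares to the total sum of squares, and RMSE is the square root of the mean squared residual. The four true and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical annual rainfall values (mm) and predictions
y_true = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y_hat = np.array([1100.0, 1400.0, 2100.0, 2300.0])

residuals = y_true - y_hat

# R² = 1 - SS_res / SS_tot
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# RMSE = sqrt(mean of squared residuals), in the same unit as the target (mm)
rmse_manual = np.sqrt(np.mean(residuals ** 2))

print(f"R² manual: {r2_manual:.3f}, sklearn: {r2_score(y_true, y_hat):.3f}")
print(f"RMSE manual: {rmse_manual:.2f}, "
      f"sklearn: {np.sqrt(mean_squared_error(y_true, y_hat)):.2f}")
```

Applied to the results above, an R² of 0.683 means the model accounts for about 68% of the variance in annual rainfall on the test set, and an RMSE of 532.29 mm is the typical size of a prediction error.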

Step 9: Example Prediction

This step uses the trained model to forecast annual rainfall from sample early-year data.

print("\n--- Example Prediction ---")
# Create a hypothetical data point for prediction
hypothetical_data = {
    'JAN': [20],
    'FEB': [30],
    'MAR': [50],
    'APR': [100],
    'MAY': [200]
}
example_df = pd.DataFrame(hypothetical_data)

print("Predicting Annual Rainfall for the following early-year data:")
print(example_df)

# Use the trained model to predict the annual rainfall
predicted_rainfall = model.predict(example_df)
print(f"\nPredicted Annual Rainfall: {predicted_rainfall[0]:.2f} mm")

Output: 

--- Example Prediction ---
Predicting Annual Rainfall for the following early-year data:
   JAN  FEB  MAR  APR  MAY
0   20   30   50  100  200
Predicted Annual Rainfall: 2101.08 mm
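
A linear regression prediction is just the intercept plus the sum of each input multiplied by its coefficient. As a check, the sketch below redoes the calculation by hand using the rounded coefficients and intercept printed in Step 7. Because those printed values are rounded to three decimals, the result matches the model's 2101.08 mm only to within a small rounding difference.

```python
import numpy as np

# Rounded values copied from the Step 7 output
coefficients = np.array([-1.867, 3.907, 1.062, -3.200, 7.394])  # JAN..MAY
intercept = 809.245

# The hypothetical early-year rainfall from this step (mm)
early_year = np.array([20, 30, 50, 100, 200])

# prediction = intercept + sum(coefficient_i * rainfall_i)
manual_prediction = intercept + coefficients @ early_year
print(f"Manual prediction: {manual_prediction:.2f} mm")
```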

Final Conclusion

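This project explored district-wise rainfall data, cleaned and prepared it for analysis, and visualised key seasonal and regional patterns. Using rainfall from January to May as inputs, a Linear Regression model was trained to predict total annual rainfall. On the test set the model reached an R² of 0.683 with an RMSE of 532.29 mm, so early-year rainfall explains roughly two-thirds of the variation in annual totals across districts, while individual predictions can still be off by several hundred millimetres. That makes the model a useful baseline rather than a precise forecast.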

Colab Link:
https://colab.research.google.com/drive/1AfJH7QFC8yHFso3QKWm8f4zRn1SeTW2-?usp=sharing 

Rohit Sharma

834 articles published

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
