Indian Rainfall Analysis and Prediction Using Linear Regression
By Rohit Sharma
Updated on Aug 11, 2025 | 1.25K+ views
Rainfall plays a key role in agriculture, water supply, and climate balance across India. Seasonal variations impact crop yields, reservoir levels, and daily life.
This project on Indian Rainfall Analysis and Prediction studies historical rainfall data to identify trends, seasonal contributions, and high-rainfall regions. You’ll also build a Linear Regression model to predict annual rainfall using the first five months of data.
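To work effectively on the Indian Rainfall Analysis and Prediction project, you should be comfortable with Python basics, pandas and NumPy for data handling, Matplotlib and Seaborn for visualisation, and scikit-learn for regression modelling and evaluation.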
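To predict annual rainfall, we used historical district-wise rainfall data and built a regression model that learns patterns from early-year monthly rainfall. The workflow loads and explores the dataset, cleans the column names, visualises state, district, monthly, and seasonal patterns, selects January to May rainfall as features, splits the data into training and test sets, trains a Linear Regression model, evaluates it, and runs an example prediction.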
Estimated Time to Complete: The Indian Rainfall Analysis and Prediction project is estimated to take 3 to 4 hours. The time may vary depending on your familiarity with Python, especially in data loading, exploratory data analysis, feature selection, regression modelling, and model evaluation.
Here’s how you can build the Indian Rainfall Analysis and Prediction project from scratch using Python and machine learning:
First, import all the necessary Python libraries for data handling, visualisation, model building, and evaluation.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import warnings
warnings.filterwarnings('ignore')
This step loads the rainfall dataset and performs initial exploration to understand its structure and contents.
print("---Loading and Preparing Data ---")
try:
    df = pd.read_csv('district wise rainfall normal.csv')
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: 'district wise rainfall normal.csv' not found. Please check the file path.")
    exit()
# Initial Data Exploration
print("\n--- Initial Data Exploration ---")
print("DataFrame Head:")
display(df.head())
print("\nDataFrame Info:")
df.info()  # info() prints its summary directly and returns None, so display() is not needed
print("\nDataFrame Description:")
display(df.describe())
Output:
---Loading and Preparing Data ---
Dataset loaded successfully.
--- Initial Data Exploration ---
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 641 entries, 0 to 640
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   STATE_UT_NAME  641 non-null    object
 1   DISTRICT       641 non-null    object
 2   JAN            641 non-null    float64
 3   FEB            641 non-null    float64
 4   MAR            641 non-null    float64
 5   APR            641 non-null    float64
 6   MAY            641 non-null    float64
 7   JUN            641 non-null    float64
 8   JUL            641 non-null    float64
 9   AUG            641 non-null    float64
 10  SEP            641 non-null    float64
 11  OCT            641 non-null    float64
 12  NOV            641 non-null    float64
 13  DEC            641 non-null    float64
 14  ANNUAL         641 non-null    float64
 15  Jan-Feb        641 non-null    float64
 16  Mar-May        641 non-null    float64
 17  Jun-Sep        641 non-null    float64
 18  Oct-Dec        641 non-null    float64
dtypes: float64(17), object(2)
This step removes extra spaces from column names to ensure consistent data handling.
# Clean up column names
df.columns = df.columns.str.strip()
print("Cleaned DataFrame shape:", df.shape)
Output:
Cleaned DataFrame shape: (641, 19)
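Beyond stripping whitespace, it can be worth confirming that the columns are internally consistent before modelling. The sketch below is not part of the original notebook. It uses a small made-up DataFrame (the `toy` values and the 1 mm tolerance are assumptions for illustration) with header names that mimic the dataset. It applies the same cleaning step, then checks for missing values and for rows where ANNUAL does not match the sum of the monthly columns.

```python
import pandas as pd

# Toy rows with invented values; header names mimic the dataset, with stray spaces.
toy = pd.DataFrame({
    " JAN": [10.0, 5.0],
    "FEB ": [20.0, 5.0],
    " MAR ": [30.0, 10.0],
    "ANNUAL": [60.0, 25.0],  # second row is deliberately inconsistent (5 + 5 + 10 = 20)
})

# Same cleaning step as in the article: strip whitespace from the headers.
toy.columns = toy.columns.str.strip()

month_cols = ["JAN", "FEB", "MAR"]  # the real dataset has twelve month columns

# Count missing values per column.
missing_per_column = toy.isna().sum()

# Flag rows where ANNUAL differs from the monthly total by more than 1 mm (assumed tolerance).
monthly_total = toy[month_cols].sum(axis=1)
inconsistent = toy[(toy["ANNUAL"] - monthly_total).abs() > 1.0]

print(list(toy.columns))
print(int(missing_per_column.sum()))
print(inconsistent.index.tolist())
```

On the real data, `month_cols` would be the full list of twelve month names, and any flagged rows would deserve a manual look before training.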
This step visualises rainfall patterns across states, districts, months, and seasons.
sns.set_style("whitegrid")
# a. Top 10 States with Highest Annual Rainfall
plt.figure(figsize=(12, 7))
state_rainfall = df.groupby('STATE_UT_NAME')['ANNUAL'].mean().sort_values(ascending=False).head(10)
sns.barplot(x=state_rainfall.values, y=state_rainfall.index, palette='Blues_r')
plt.title('Top 10 States by Average Annual Rainfall', fontsize=16)
plt.xlabel('Average Annual Rainfall (mm)', fontsize=12)
plt.ylabel('State / Union Territory', fontsize=12)
plt.savefig('top_10_states_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'top_10_states_rainfall.png'")
# b. Top 10 Districts with Highest Annual Rainfall
plt.figure(figsize=(12, 7))
district_rainfall = df.groupby('DISTRICT')['ANNUAL'].mean().sort_values(ascending=False).head(10)
sns.barplot(x=district_rainfall.values, y=district_rainfall.index, palette='Greens_r')
plt.title('Top 10 Districts by Average Annual Rainfall', fontsize=16)
plt.xlabel('Average Annual Rainfall (mm)', fontsize=12)
plt.ylabel('District', fontsize=12)
plt.savefig('top_10_districts_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'top_10_districts_rainfall.png'")
# c. Monthly Rainfall Distribution (National Average)
months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
monthly_avg_rainfall = df[months].mean()
plt.figure(figsize=(14, 7))
sns.lineplot(x=monthly_avg_rainfall.index, y=monthly_avg_rainfall.values, marker='o', color='crimson', lw=2)
plt.title('Average Monthly Rainfall Across India', fontsize=16)
plt.xlabel('Month', fontsize=12)
plt.ylabel('Average Rainfall (mm)', fontsize=12)
plt.xticks(rotation=45)
plt.savefig('national_monthly_rainfall.png', dpi=300, bbox_inches='tight')
print("Generated 'national_monthly_rainfall.png'")
# d. Contribution of Different Seasons to Annual Rainfall
df['MONSOON'] = df['JUN'] + df['JUL'] + df['AUG'] + df['SEP']
df['PRE_MONSOON'] = df['MAR'] + df['APR'] + df['MAY']
df['POST_MONSOON'] = df['OCT'] + df['NOV'] + df['DEC']
df['WINTER'] = df['JAN'] + df['FEB']
seasonal_contribution = df[['PRE_MONSOON', 'MONSOON', 'POST_MONSOON', 'WINTER']].mean()
plt.figure(figsize=(10, 8))
plt.pie(seasonal_contribution, labels=seasonal_contribution.index, autopct='%1.1f%%',
        colors=sns.color_palette('pastel'), wedgeprops={'edgecolor': 'black'})
plt.title('Contribution of Seasons to Total Annual Rainfall', fontsize=16)
plt.savefig('seasonal_rainfall_contribution.png', dpi=300, bbox_inches='tight')
print("Generated 'seasonal_rainfall_contribution.png'")
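Output: the four charts are saved as 'top_10_states_rainfall.png', 'top_10_districts_rainfall.png', 'national_monthly_rainfall.png', and 'seasonal_rainfall_contribution.png' (images not reproduced here).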
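One caveat about the district ranking: grouping by DISTRICT alone assumes every district name is unique, which is not guaranteed when the same name occurs in more than one state. The sketch below uses invented rows with hypothetical names (`STATE_X`, `DISTRICT_A`, and so on) to show how grouping by name alone would average two unrelated districts, and how grouping by state and district keeps them apart. Whether this affects the actual dataset is an assumption that should be checked.

```python
import pandas as pd

# Invented rows: the name "DISTRICT_A" appears in two different states.
toy = pd.DataFrame({
    "STATE_UT_NAME": ["STATE_X", "STATE_Y", "STATE_Y"],
    "DISTRICT": ["DISTRICT_A", "DISTRICT_A", "DISTRICT_B"],
    "ANNUAL": [3000.0, 1000.0, 1500.0],
})

# Grouping by district name alone averages the two unrelated "DISTRICT_A" rows.
by_name = toy.groupby("DISTRICT")["ANNUAL"].mean()

# Grouping by state and district keeps them separate.
by_state_and_name = (
    toy.groupby(["STATE_UT_NAME", "DISTRICT"])["ANNUAL"]
    .mean()
    .sort_values(ascending=False)
)

print(by_name["DISTRICT_A"])
print(by_state_and_name.index[0], by_state_and_name.iloc[0])
```

A quick way to test the real data for this would be `df['DISTRICT'].duplicated().any()`.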
This step selects the input features and target for the rainfall prediction model.
Here is the code for this step:
# --- 3. Annual Rainfall Prediction Model ---
print("\n--- 3. Annual Rainfall Prediction Model ---")
# Define features (X) and target (y)
# We will predict the ANNUAL rainfall based on the rainfall in the first 5 months.
features = ['JAN', 'FEB', 'MAR', 'APR', 'MAY']
target = 'ANNUAL'
X = df[features]
y = df[target]
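Before fitting, a quick look at how each candidate feature correlates with the target can help judge whether the feature choice is sensible. The following is a minimal sketch on invented numbers, not a step from the original notebook. The `toy` frame and its values are assumptions, and ANNUAL is constructed as an exact linear function of MAY so the result is easy to verify.

```python
import pandas as pd

# Invented values for four hypothetical districts.
toy = pd.DataFrame({
    "JAN": [10.0, 20.0, 15.0, 5.0],
    "MAY": [50.0, 100.0, 150.0, 200.0],
    "ANNUAL": [600.0, 1100.0, 1600.0, 2100.0],  # constructed as 10 * MAY + 100
})

# Pearson correlation of each candidate feature with the target.
corr_with_target = toy[["JAN", "MAY"]].corrwith(toy["ANNUAL"])
print(corr_with_target.round(3))
```

With the real data, the equivalent call would be `df[features].corrwith(df[target])`.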
This step divides the dataset into training and testing portions for model building and evaluation.
Here is the code for this step:
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")
Output:
Training data shape: (512, 5)
Testing data shape: (129, 5)
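The split sizes follow directly from the row count. A short sketch of the arithmetic, assuming the 641 rows reported by df.info() and the rounding scikit-learn applies when test_size is a fraction:

```python
import math

n_samples = 641   # rows reported by df.info()
test_size = 0.2   # fraction passed to train_test_split

# For a float test_size, scikit-learn rounds the test count up and gives
# the remainder to the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 512 129
```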
This step fits a Linear Regression model to predict annual rainfall using the first five months' rainfall data.
# Initialize and train a simple Linear Regression model
print("\nTraining the Annual Rainfall prediction model...")
model = LinearRegression()
model.fit(X_train, y_train)
print("Model training complete.")
# Print model coefficients and intercept
print("\nModel Coefficients (mm per month):")
for feature, coef in zip(features, model.coef_):
    print(f" {feature}: {coef:.3f}")
print(f"Model Intercept (mm): {model.intercept_:.3f}")
Output:
Training the Annual Rainfall prediction model...
Model training complete.
Model Coefficients (mm per month):
JAN: -1.867
FEB: 3.907
MAR: 1.062
APR: -3.200
MAY: 7.394
Model Intercept (mm): 809.245
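Each coefficient is the change in predicted annual rainfall (in mm) for one additional millimetre of rain in that month, holding the other months fixed. Negative values such as those for JAN and APR do not necessarily mean that rain in those months reduces the annual total. One plausible explanation is that the monthly features are correlated with each other, so the fit distributes weight among them.

scikit-learn's LinearRegression fits by ordinary least squares. The sketch below reproduces that idea with NumPy on synthetic data. All values, including `true_coef` and `true_intercept`, are invented for illustration and are unrelated to the rainfall dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic early-month rainfall (mm) for 200 made-up districts, 5 features.
X = rng.uniform(0, 200, size=(200, 5))

# Synthetic target built from known coefficients plus noise (all values invented).
true_coef = np.array([1.0, 2.0, 0.5, -1.0, 4.0])
true_intercept = 800.0
y = true_intercept + X @ true_coef + rng.normal(0, 5, size=200)

# Ordinary least squares: append a column of ones so the intercept is estimated too.
X_design = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X_design, y, rcond=None)
coef, intercept = solution[:-1], solution[-1]

print(np.round(coef, 2), round(float(intercept), 1))
```

The recovered coefficients should land close to the ones used to generate the data, which is the same estimation that model.fit performs on the rainfall features.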
This step tests the trained model on unseen data and measures prediction accuracy.
# --- Model Evaluation ---
print("\n--- Model Evaluation ---")
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate and print evaluation metrics
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Model R-squared (R²): {r2:.3f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.2f} mm")
print("An R² score close to 1.0 indicates the model can predict annual rainfall well based on early-year data.")
Output:
--- Model Evaluation ---
Model R-squared (R²): 0.683
Root Mean Squared Error (RMSE): 532.29 mm
An R² score close to 1.0 indicates the model can predict annual rainfall well based on early-year data.
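To make the two metrics concrete, here is a small hand calculation on invented numbers (the `y_true` and `y_pred` values are assumptions, not model output). R² compares the model's squared error with the error of always predicting the mean, and RMSE is the square root of the average squared error, expressed in the same unit as the target.

```python
import math

# Invented observed and predicted annual rainfall values (mm).
y_true = [1000.0, 1500.0, 2000.0, 2500.0]
y_pred = [1100.0, 1400.0, 2100.0, 2400.0]

n = len(y_true)
mean_true = sum(y_true) / n

# Residual sum of squares: error left over after the model's predictions.
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
# Total sum of squares: error of always predicting the mean.
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
rmse = math.sqrt(ss_res / n)

print(f"R2 = {r2:.3f}, RMSE = {rmse:.1f} mm")  # R2 = 0.968, RMSE = 100.0 mm
```

Read the same way, the reported R² of 0.683 means the model accounts for roughly 68% of the variation in annual rainfall across the test districts, while the RMSE of 532.29 mm gives the typical size of a prediction error.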
This step uses the trained model to forecast annual rainfall from sample early-year data.
print("\n--- Example Prediction ---")
# Create a hypothetical data point for prediction
hypothetical_data = {
    'JAN': [20],
    'FEB': [30],
    'MAR': [50],
    'APR': [100],
    'MAY': [200]
}
example_df = pd.DataFrame(hypothetical_data)
print("Predicting Annual Rainfall for the following early-year data:")
print(example_df)
# Use the trained model to predict the annual rainfall
predicted_rainfall = model.predict(example_df)
print(f"\nPredicted Annual Rainfall: {predicted_rainfall[0]:.2f} mm")
Output:
--- Example Prediction ---
Predicting Annual Rainfall for the following early-year data:
   JAN  FEB  MAR  APR  MAY
0   20   30   50  100  200
Predicted Annual Rainfall: 2101.08 mm
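The prediction can be checked by hand, since a linear model is the intercept plus each coefficient multiplied by its feature value. The sketch below uses the coefficients as printed in the training output, which are rounded to three decimals, so the result differs slightly from the model's 2101.08 mm.

```python
# Coefficients and intercept as printed in the training output (rounded to 3 decimals).
coefficients = {"JAN": -1.867, "FEB": 3.907, "MAR": 1.062, "APR": -3.200, "MAY": 7.394}
intercept = 809.245

# The hypothetical early-year rainfall (mm) used in the example prediction.
rainfall = {"JAN": 20, "FEB": 30, "MAR": 50, "APR": 100, "MAY": 200}

# Linear model: intercept + sum of coefficient * feature value.
prediction = intercept + sum(coefficients[m] * rainfall[m] for m in rainfall)

print(f"{prediction:.1f} mm")  # 2101.0 mm
```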
This project explored district-wise rainfall data, cleaned and prepared it for analysis, and visualised key seasonal and regional patterns. Using early-year rainfall data from January to May, a Linear Regression model was developed to predict the total annual rainfall. On the held-out test set the model reached an R² of 0.683 with an RMSE of 532.29 mm, so early-year rainfall explains about two-thirds of the variation in annual rainfall, but individual predictions can still miss by several hundred millimetres.
Colab Link:
https://colab.research.google.com/drive/1AfJH7QFC8yHFso3QKWm8f4zRn1SeTW2-?usp=sharing
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...