Forest Fire Project Using R - A Step-by-Step Guide
By Rohit Sharma
Updated on Jul 28, 2025 | 13 min read | 1.39K+ views
We've all seen or heard about forest fires. Every year, thousands of acres of land burn in forest fires, and environmental factors such as temperature, humidity, and wind influence how they start and spread.
In this Forest Fire Project Using R, we’ll use data related to forest fires and learn how to clean, preprocess, and analyze it using R. We’ll also apply classification models such as Decision Tree, Random Forest, and Logistic Regression to predict fire risk. This project includes techniques like data visualization, feature engineering, and model evaluation.
Want More Projects on R? Read This: Top 25+ R Projects for Beginners to Boost Your Data Science Skills in 2025
Before starting the forest fire project using R, there are a few things you need to know to complete the project properly.
These are the tools and libraries that’ll be used in this project. These tools help you extract and use the data for various purposes.
| Library / Tool | Purpose |
| --- | --- |
| ggplot2 | To visualize environmental patterns and fire risk relationships |
| dplyr | For data cleaning, transformation, and manipulation |
| rpart | To build and visualize Decision Tree models for classification |
| randomForest | To apply the Random Forest algorithm for better fire risk prediction |
| readr | To load CSV files and handle data input efficiently |
| tidyverse | A collection of packages (including ggplot2 and dplyr) for tidy workflows |
Read More To Understand About: Machine Learning with R: Everything You Need to Know
These are the classification models that’ll be used in this forest fire project.
| Model | Purpose |
| --- | --- |
| Decision Tree | An easy-to-understand model that splits data into clear rules |
| Random Forest | An ensemble method that improves accuracy and reduces overfitting |
| Logistic Regression | A simple, interpretable baseline for binary fire prediction |
| Other Options | Advanced models like SVM and k-NN can be tested for comparison |
Read About: R For Data Science: Why Should You Choose R for Data Science?
This is a beginner-level project. It requires basic knowledge of R along with familiarity with classification models and techniques.
This Forest Fire Project Using R can typically be completed in 2 to 4 hours, depending on your familiarity with the R language and classification models. That estimate covers all the steps from start to finish: data cleaning, visualization, model building, and evaluation.
The project is ideal for beginners who have a basic understanding of R and are looking to apply classification models. It also suits intermediate learners who want experience in environmental data analysis and predictive modeling.
In this section, we’ll go through the entire project, from loading the dataset to building and evaluating classification models. Each step includes simple explanations and R code.
Download the dataset from Kaggle. Next, open Google Colab and change the runtime to use R instead of Python.
How to switch Colab to R: go to Runtime > Change runtime type, select R from the runtime/language dropdown, and click Save.
Now you’re ready to upload the dataset and install the libraries you'll use for this project.
Before starting the project, install the libraries needed to run the forest fire project properly. These R packages give us the tools to handle data, build models, and visualize results.
Use this code:
# Install required packages (run this once to set things up)
install.packages("ggplot2") # For creating charts and visualizations
install.packages("rpart") # For building decision tree models
install.packages("randomForest") # For building random forest models
install.packages("dplyr") # For data cleaning and manipulation
In this step, we’ll load and read the dataset. Use this code:
# Read the uploaded forest fire CSV file into R
data <- read.csv("forestfires.csv")
# Show the first few rows of the data to understand its structure
head(data)
This gives us a table showing the first few rows of the data, like this:

| | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | &lt;int&gt; | &lt;int&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;int&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; |
| 1 | 7 | 5 | mar | fri | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0 |
| 2 | 7 | 4 | oct | tue | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0 |
| 3 | 7 | 4 | oct | sat | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0 |
| 4 | 8 | 6 | mar | fri | 91.7 | 33.3 | 77.5 | 9.0 | 8.3 | 97 | 4.0 | 0.2 | 0 |
| 5 | 8 | 6 | mar | sun | 89.3 | 51.3 | 102.2 | 9.6 | 11.4 | 99 | 1.8 | 0.0 | 0 |
| 6 | 8 | 6 | aug | sun | 92.3 | 85.3 | 488.0 | 14.7 | 22.2 | 29 | 5.4 | 0.0 | 0 |
You Can Also Read This: 18 Types of Regression in Machine Learning You Should Know
In this step, we’ll analyze the data and understand what it consists of. Use this code:
# View the first few rows of the dataset (to get a quick look at the values)
head(data)
# Check the structure of the dataset (see data types of each column)
str(data)
# Get summary statistics (like mean, min, max, etc.) for each column
summary(data)
This will give us an overview of the dataset: head(data) shows the same first six rows as in the table above, str(data) lists each column's data type, and summary(data) reports statistics such as the mean, minimum, and maximum for each column.
In this section, we’ll create a new column for fire risk classification. Here’s the code:
# Create a new column called 'fire_risk' based on the 'area' column
# If area > 0, mark it as "yes" (fire occurred), otherwise "no"
data$fire_risk <- ifelse(data$area > 0, "yes", "no")
# Convert the new column to a factor (important for classification models)
data$fire_risk <- as.factor(data$fire_risk)
# Check how many 'yes' and 'no' labels we have
table(data$fire_risk)
The output from this code was:
no yes
247 270
That means:
We have 247 instances where no fire occurred, and 270 instances where a fire did occur.
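The ggplot2 package listed in the tools table can be used here for a quick visual check of how an environmental variable relates to the new label. As a sketch (assuming the data frame and the fire_risk column created above):

```r
# Load ggplot2 for visualization
library(ggplot2)

# Boxplot of temperature, grouped by whether a fire occurred
ggplot(data, aes(x = fire_risk, y = temp, fill = fire_risk)) +
  geom_boxplot() +
  labs(title = "Temperature vs. fire occurrence",
       x = "Fire occurred", y = "Temperature (°C)")
```

A visible shift between the two boxes would suggest temperature carries signal for classification, which the feature importance step later confirms.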
In this step, we’ll prepare the dataset for modeling. Here we’ll remove the ‘area’ column to avoid data leakage. By cleaning up the data, we can identify important patterns.
Use this code:
# Load dplyr package for data manipulation
library(dplyr)
# Remove the 'area' column (we've already used it to create 'fire_risk')
data <- data %>% select(-area)
# Convert 'month' and 'day' columns to categorical (factor) type
# These are names like 'aug', 'fri', etc. — not numbers
data$month <- as.factor(data$month)
data$day <- as.factor(data$day)
# Check the updated structure of the dataset
str(data)
The output for this code will be like the following:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
'data.frame': 517 obs. of 13 variables:
$ X : int 7 7 7 8 8 8 8 8 8 7 ...
$ Y : int 5 4 4 6 6 6 6 6 6 5 ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 8 11 11 8 8 2 2 2 12 12 ...
$ day : Factor w/ 7 levels "fri","mon","sat",..: 1 6 3 1 4 4 2 2 6 3 ...
$ FFMC : num 86.2 90.6 90.6 91.7 89.3 92.3 92.3 91.5 91 92.5 ...
$ DMC : num 26.2 35.4 43.7 33.3 51.3 ...
$ DC : num 94.3 669.1 686.9 77.5 102.2 ...
$ ISI : num 5.1 6.7 6.7 9 9.6 14.7 8.5 10.7 7 7.1 ...
$ temp : num 8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...
$ RH : int 51 33 33 97 99 29 27 86 63 40 ...
$ wind : num 6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
$ rain : num 0 0 0 0.2 0 0 0 0 0 0 ...
$ fire_risk: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
In this section, we’ll split the dataset into two parts: a training set (70%) and a testing set (30%).
# Set a random seed so results stay the same every time you run this code
set.seed(123)
# Randomly select 70% of the rows to be used for training
train_index <- sample(1:nrow(data), 0.7 * nrow(data))
# Split the data: 70% for training, 30% for testing
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
# Print the number of rows in each set
cat("Training rows:", nrow(train_data), "\n")
cat("Testing rows:", nrow(test_data), "\n")
We get the output as:
Training rows: 361
Testing rows: 156
That means:
361 rows will be used to train this model, and 156 rows will be used to test how well the model predicts fire risk.
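It’s worth confirming that the random split kept a similar mix of "yes" and "no" labels in both sets. A quick check, assuming the train_data and test_data objects from the split above:

```r
# Proportion of fire / no-fire labels in each split
prop.table(table(train_data$fire_risk))
prop.table(table(test_data$fire_risk))
```

If the proportions differ sharply between the two sets, a stratified split would be a safer choice.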
Also Read To Level Up: What is Data Wrangling? Exploring Its Role in Data Analysis
Here, we're using a Decision Tree model, which is simple and easy to understand. It uses features like temperature, humidity, and wind to predict whether there’s a fire risk ("yes" or "no").
# Load the decision tree package
library(rpart)
# Train a decision tree model to predict 'fire_risk' based on all other variables
tree_model <- rpart(fire_risk ~ ., data = train_data, method = "class")
# Print the summary of the model (tree structure and splits)
print(tree_model)
This gives the output as:
n= 361
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 361 168 yes (0.4653740 0.5346260)
2) temp< 18.85 166 75 no (0.5481928 0.4518072)
4) month=apr,aug,jan,jul,mar,oct 89 30 no (0.6629213 0.3370787)
8) X< 4.5 51 10 no (0.8039216 0.1960784) *
9) X>=4.5 38 18 yes (0.4736842 0.5263158)
18) temp>=16.9 10 1 no (0.9000000 0.1000000) *
19) temp< 16.9 28 9 yes (0.3214286 0.6785714) *
5) month=dec,feb,jun,may,sep 77 32 yes (0.4155844 0.5844156)
10) wind< 7.8 70 32 yes (0.4571429 0.5428571)
20) DMC< 115.35 44 19 no (0.5681818 0.4318182)
40) RH< 52 17 4 no (0.7647059 0.2352941) *
41) RH>=52 27 12 yes (0.4444444 0.5555556)
82) RH>=73.5 8 2 no (0.7500000 0.2500000) *
83) RH< 73.5 19 6 yes (0.3157895 0.6842105) *
21) DMC>=115.35 26 7 yes (0.2692308 0.7307692) *
11) wind>=7.8 7 0 yes (0.0000000 1.0000000) *
3) temp>=18.85 195 77 yes (0.3948718 0.6051282)
6) Y< 2.5 17 4 no (0.7647059 0.2352941) *
7) Y>=2.5 178 64 yes (0.3595506 0.6404494)
14) DC< 562.75 33 15 no (0.5454545 0.4545455)
28) temp< 24.4 22 6 no (0.7272727 0.2727273) *
29) temp>=24.4 11 2 yes (0.1818182 0.8181818) *
15) DC>=562.75 145 46 yes (0.3172414 0.6827586) *
The output shows how the decision tree splits the data to predict fire risk.
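The printed rules can be hard to read at a glance. As a minimal sketch, the fitted tree can also be drawn with base graphics using rpart's plot method (assuming the tree_model object from above):

```r
# Draw the fitted decision tree and label the splits
plot(tree_model, uniform = TRUE, margin = 0.1)
text(tree_model, use.n = TRUE, cex = 0.8)
```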
In this step, we’ll test the model’s accuracy on unseen test data. Use this code:
# Predict fire risk labels on test data
tree_pred <- predict(tree_model, test_data, type = "class")
# Calculate prediction accuracy
accuracy <- mean(tree_pred == test_data$fire_risk)
# Print accuracy percentage
cat("Decision Tree Accuracy:", round(accuracy * 100, 2), "%\n")
The output for this model came out to be:
Decision Tree Accuracy: 53.85 %
This means the decision tree correctly identified fire risk about 54% of the time, which is a modest start and indicates room for improvement.
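Accuracy alone hides which class the tree gets wrong. A confusion matrix (a sketch, reusing tree_pred and test_data from the steps above) breaks the errors down by class:

```r
# Cross-tabulate predictions against the true labels:
# rows = predicted class, columns = actual class
table(Predicted = tree_pred, Actual = test_data$fire_risk)
```

The off-diagonal counts show how many fires were missed versus how many false alarms the model raised.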
Here we’ll load the randomForest package and train a model using 100 decision trees. We’ll then predict fire risk on the test data and calculate how many predictions were correct.
# Load the random forest package
library(randomForest)
# Train the model on training data (100 trees)
rf_model <- randomForest(fire_risk ~ ., data = train_data, ntree = 100)
# Make predictions on the test set
rf_pred <- predict(rf_model, test_data)
# Calculate and display accuracy
rf_accuracy <- mean(rf_pred == test_data$fire_risk)
cat("Random Forest Accuracy:", round(rf_accuracy * 100, 2), "%\n")
The output for this model came out to be:
Random Forest Accuracy: 54.49 %
This means the model correctly predicted fire risk in about 54.49% of cases, slightly better than the Decision Tree result.
In this step, we check the importance of each feature and how much it contributed to the model’s decisions. The code for this section is:
# Show importance values of each variable
importance(rf_model)
# Visualize the importance of features
varImpPlot(rf_model)
This gives the output:

| Variable | MeanDecreaseGini |
| --- | --- |
| X | 15.0032994 |
| Y | 12.4502925 |
| month | 10.0166118 |
| day | 16.6330798 |
| FFMC | 13.9930943 |
| DMC | 18.1110317 |
| DC | 15.8684236 |
| ISI | 13.7510319 |
| temp | 23.5911713 |
| RH | 21.1361677 |
| wind | 15.1371009 |
| rain | 0.1780586 |
Top important features (based on the plot and table):

| Rank | Feature | Importance (MeanDecreaseGini) |
| --- | --- | --- |
| 1 | temp | 23.59 (most important) |
| 2 | RH | 21.14 |
| 3 | DMC | 18.11 |
| 4 | day | 16.63 |
| 5 | DC | 15.87 |
This shows us that temperature and humidity (RH) are the most crucial indicators of fire risk in this dataset.
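The Logistic Regression baseline listed in the models table was not built above. As a hedged sketch (assuming the same train_data and test_data objects from the split step), it could be added for comparison using R's built-in glm():

```r
# Fit a logistic regression baseline on the training data
glm_model <- glm(fire_risk ~ ., data = train_data, family = binomial)

# Predicted probability of the "yes" class on the test set
glm_prob <- predict(glm_model, test_data, type = "response")

# Convert probabilities to class labels at a 0.5 cutoff
glm_pred <- ifelse(glm_prob > 0.5, "yes", "no")

# Accuracy on the test set
glm_accuracy <- mean(glm_pred == test_data$fire_risk)
cat("Logistic Regression Accuracy:", round(glm_accuracy * 100, 2), "%\n")
```

Note that factor columns like month and day must contain only levels seen during training, otherwise predict() will raise an error for those rows.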
In this forest fire project using R, we analyzed forest fire risk using environmental data and built two classification models: a Decision Tree and a Random Forest.
We started by preprocessing the dataset, converting necessary variables to factors, and creating a binary target variable (fire_risk). After splitting the data into training and testing sets, we trained and evaluated both models.
The Decision Tree model achieved an accuracy of 53.85%, while the Random Forest model performed slightly better with 54.49% accuracy. Random Forest also identified key features like temperature, relative humidity, and DMC as the most influential in predicting fire risk.
Reference:
https://colab.research.google.com/drive/1oPFk4aH-Hy0lrtiTMk80neRcbh6kPRFQ#scrollTo=LKrm_UOKb-OY
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...