Forest Fire Project Using R - A Step-by-Step Guide
By Rohit Sharma
Updated on Jul 28, 2025 | 13 min read | 1.39K+ views
We've all seen or heard about forest fires. Every year, thousands of acres of land burn in forest fires, and environmental factors such as temperature, humidity, and wind influence how they start and spread.
In this Forest Fire Project Using R, we’ll use data related to forest fires and learn how to clean, preprocess, and analyze it using R. We’ll also apply classification models such as Decision Tree, Random Forest, and Logistic Regression to predict fire risk. This project includes techniques like data visualization, feature engineering, and model evaluation.
Want More Projects on R? Read This: Top 25+ R Projects for Beginners to Boost Your Data Science Skills in 2025
Before starting the forest fire project using R, there are a few things you need to know to complete the project properly.
These are the tools and libraries that’ll be used in this project. These tools help you extract and use the data for various purposes.
| Library / Tool | Purpose |
| --- | --- |
| ggplot2 | To visualize environmental patterns and fire risk relationships |
| dplyr | For data cleaning, transformation, and manipulation |
| rpart | To build and visualize Decision Tree models for classification |
| randomForest | To apply the Random Forest algorithm for better fire risk prediction |
| readr | To load CSV files and handle data input efficiently |
| tidyverse | A collection of packages (including ggplot2 and dplyr) for tidy workflows |
Read More To Understand About: Machine Learning with R: Everything You Need to Know
These are the classification models that’ll be used in this forest fire project.
| Model | Purpose |
| --- | --- |
| Decision Tree | An easy-to-understand model that splits data into clear rules |
| Random Forest | An ensemble method that improves accuracy and reduces overfitting |
| Logistic Regression | A simple, interpretable baseline for binary fire prediction |
| Other Options | Advanced models like SVM and k-NN can be tested for comparison |
Read About: R For Data Science: Why Should You Choose R for Data Science?
This is a beginner-level project. It requires basic knowledge of R along with familiarity with classification models and techniques.
This Forest Fire Project Using R can typically be completed in 2 to 4 hours, depending on your familiarity with the R language and classification models. That estimate covers all the steps from start to finish: data cleaning, visualization, model building, and evaluation.
The project is ideal for beginners who have a basic understanding of R and are looking to apply classification models. It also suits intermediate learners who want experience in environmental data analysis and predictive modeling.
In this section, we’ll go through the entire project, from loading the dataset to building and evaluating classification models. Each step includes simple explanations and R code.
Download the dataset from Kaggle. Next, open Google Colab and change the runtime to use R instead of Python.
How to switch Colab to R: go to Runtime > Change runtime type, select R from the runtime/language dropdown, and click Save.
Now you’re ready to upload the dataset and install the libraries you'll use for this project.
Before starting the project, install the libraries needed to run the forest fire project properly. These R packages give us the tools to handle data, build models, and visualize results.
Use this code:
# Install required packages (run this once to set things up)
install.packages("ggplot2") # For creating charts and visualizations
install.packages("rpart") # For building decision tree models
install.packages("randomForest") # For building random forest models
install.packages("dplyr") # For data cleaning and manipulation
In this step, we’ll load and read the dataset. Use this code:
# Read the uploaded forest fire CSV file into R
data <- read.csv("forestfires.csv")
# Show the first few rows of the data to understand its structure
head(data)
This gives us a table showing the first few rows of the data, like this:

| | X | Y | month | day | FFMC | DMC | DC | ISI | temp | RH | wind | rain | area |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | &lt;int&gt; | &lt;int&gt; | &lt;chr&gt; | &lt;chr&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;int&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; |
| 1 | 7 | 5 | mar | fri | 86.2 | 26.2 | 94.3 | 5.1 | 8.2 | 51 | 6.7 | 0.0 | 0 |
| 2 | 7 | 4 | oct | tue | 90.6 | 35.4 | 669.1 | 6.7 | 18.0 | 33 | 0.9 | 0.0 | 0 |
| 3 | 7 | 4 | oct | sat | 90.6 | 43.7 | 686.9 | 6.7 | 14.6 | 33 | 1.3 | 0.0 | 0 |
| 4 | 8 | 6 | mar | fri | 91.7 | 33.3 | 77.5 | 9.0 | 8.3 | 97 | 4.0 | 0.2 | 0 |
| 5 | 8 | 6 | mar | sun | 89.3 | 51.3 | 102.2 | 9.6 | 11.4 | 99 | 1.8 | 0.0 | 0 |
| 6 | 8 | 6 | aug | sun | 92.3 | 85.3 | 488.0 | 14.7 | 22.2 | 29 | 5.4 | 0.0 | 0 |
You Can Also Read This: 18 Types of Regression in Machine Learning You Should Know
In this step, we’ll analyze the data and understand what it consists of. Use this code:
# View the first few rows of the dataset (to get a quick look at the values)
head(data)
# Check the structure of the dataset (see data types of each column)
str(data)
# Get summary statistics (like mean, min, max, etc.) for each column
summary(data)
This will give us an overview of the dataset: head(data) shows the same first six rows as in the table above, str(data) lists each column's data type, and summary(data) reports statistics such as the mean, minimum, and maximum for each column.
In this section, we’ll create a new column for fire risk classification. Here’s the code:
# Create a new column called 'fire_risk' based on the 'area' column
# If area > 0, mark it as "yes" (fire occurred), otherwise "no"
data$fire_risk <- ifelse(data$area > 0, "yes", "no")
# Convert the new column to a factor (important for classification models)
data$fire_risk <- as.factor(data$fire_risk)
# Check how many 'yes' and 'no' labels we have
table(data$fire_risk)
The output from this code was:
no yes
247 270
That means:
We have 247 instances where no fire occurred, and 270 instances where a fire did occur.
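The ggplot2 package listed in the tools table can be used here for a quick visual check of how an environmental variable relates to the new label. As a sketch (assuming the data frame and the fire_risk column created above):

```r
# Load ggplot2 for visualization
library(ggplot2)

# Boxplot of temperature, grouped by whether a fire occurred
ggplot(data, aes(x = fire_risk, y = temp, fill = fire_risk)) +
  geom_boxplot() +
  labs(title = "Temperature vs. fire occurrence",
       x = "Fire occurred", y = "Temperature (°C)")
```

A visible shift between the two boxes would suggest temperature carries signal for classification, which the feature importance step later confirms.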
In this step, we’ll prepare the dataset for modeling. Here we’ll remove the ‘area’ column to avoid data leakage. By cleaning up the data, we can identify important patterns.
Use this code:
# Load dplyr package for data manipulation
library(dplyr)
# Remove the 'area' column (we've already used it to create 'fire_risk')
data <- data %>% select(-area)
# Convert 'month' and 'day' columns to categorical (factor) type
# These are names like 'aug', 'fri', etc. — not numbers
data$month <- as.factor(data$month)
data$day <- as.factor(data$day)
# Check the updated structure of the dataset
str(data)
The output for this code will be like the following:
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
'data.frame': 517 obs. of 13 variables:
$ X : int 7 7 7 8 8 8 8 8 8 7 ...
$ Y : int 5 4 4 6 6 6 6 6 6 5 ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 8 11 11 8 8 2 2 2 12 12 ...
$ day : Factor w/ 7 levels "fri","mon","sat",..: 1 6 3 1 4 4 2 2 6 3 ...
$ FFMC : num 86.2 90.6 90.6 91.7 89.3 92.3 92.3 91.5 91 92.5 ...
$ DMC : num 26.2 35.4 43.7 33.3 51.3 ...
$ DC : num 94.3 669.1 686.9 77.5 102.2 ...
$ ISI : num 5.1 6.7 6.7 9 9.6 14.7 8.5 10.7 7 7.1 ...
$ temp : num 8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...
$ RH : int 51 33 33 97 99 29 27 86 63 40 ...
$ wind : num 6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
$ rain : num 0 0 0 0.2 0 0 0 0 0 0 ...
$ fire_risk: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
In this section, we’ll split the dataset into two parts: a training set (70%) and a testing set (30%).
# Set a random seed so results stay the same every time you run this code
set.seed(123)
# Randomly select 70% of the rows to be used for training
train_index <- sample(1:nrow(data), 0.7 * nrow(data))
# Split the data: 70% for training, 30% for testing
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
# Print the number of rows in each set
cat("Training rows:", nrow(train_data), "\n")
cat("Testing rows:", nrow(test_data), "\n")
We get the output as:
Training rows: 361
Testing rows: 156
That means:
361 rows will be used to train this model, and 156 rows will be used to test how well the model predicts fire risk.
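It’s worth confirming that the random split kept a similar mix of "yes" and "no" labels in both sets. A quick check, assuming the train_data and test_data objects from the split above:

```r
# Proportion of fire / no-fire labels in each split
prop.table(table(train_data$fire_risk))
prop.table(table(test_data$fire_risk))
```

If the proportions differ sharply between the two sets, a stratified split would be a safer choice.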
Also Read To Level Up: What is Data Wrangling? Exploring Its Role in Data Analysis
Here, we're using a Decision Tree model, which is simple and easy to understand. It uses features like temperature, humidity, and wind to predict whether there’s a fire risk ("yes" or "no").
# Load the decision tree package
library(rpart)
# Train a decision tree model to predict 'fire_risk' based on all other variables
tree_model <- rpart(fire_risk ~ ., data = train_data, method = "class")
# Print the summary of the model (tree structure and splits)
print(tree_model)
This gives the output as:
n= 361
node), split, n, loss, yval, (yprob)
* denotes terminal node
1) root 361 168 yes (0.4653740 0.5346260)
2) temp< 18.85 166 75 no (0.5481928 0.4518072)
4) month=apr,aug,jan,jul,mar,oct 89 30 no (0.6629213 0.3370787)
8) X< 4.5 51 10 no (0.8039216 0.1960784) *
9) X>=4.5 38 18 yes (0.4736842 0.5263158)
18) temp>=16.9 10 1 no (0.9000000 0.1000000) *
19) temp< 16.9 28 9 yes (0.3214286 0.6785714) *
5) month=dec,feb,jun,may,sep 77 32 yes (0.4155844 0.5844156)
10) wind< 7.8 70 32 yes (0.4571429 0.5428571)
20) DMC< 115.35 44 19 no (0.5681818 0.4318182)
40) RH< 52 17 4 no (0.7647059 0.2352941) *
41) RH>=52 27 12 yes (0.4444444 0.5555556)
82) RH>=73.5 8 2 no (0.7500000 0.2500000) *
83) RH< 73.5 19 6 yes (0.3157895 0.6842105) *
21) DMC>=115.35 26 7 yes (0.2692308 0.7307692) *
11) wind>=7.8 7 0 yes (0.0000000 1.0000000) *
3) temp>=18.85 195 77 yes (0.3948718 0.6051282)
6) Y< 2.5 17 4 no (0.7647059 0.2352941) *
7) Y>=2.5 178 64 yes (0.3595506 0.6404494)
14) DC< 562.75 33 15 no (0.5454545 0.4545455)
28) temp< 24.4 22 6 no (0.7272727 0.2727273) *
29) temp>=24.4 11 2 yes (0.1818182 0.8181818) *
15) DC>=562.75 145 46 yes (0.3172414 0.6827586) *
The output shows how the decision tree splits the data to predict fire risk.
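The printed rules can be hard to read at a glance. As a minimal sketch, the fitted tree can also be drawn with base graphics using rpart's plot method (assuming the tree_model object from above):

```r
# Draw the fitted decision tree and label the splits
plot(tree_model, uniform = TRUE, margin = 0.1)
text(tree_model, use.n = TRUE, cex = 0.8)
```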
In this step, we’ll test the model’s accuracy on unseen test data. Use this code:
# Predict fire risk labels on test data
tree_pred <- predict(tree_model, test_data, type = "class")
# Calculate prediction accuracy
accuracy <- mean(tree_pred == test_data$fire_risk)
# Print accuracy percentage
cat("Decision Tree Accuracy:", round(accuracy * 100, 2), "%\n")
The output for this model came out to be:
Decision Tree Accuracy: 53.85 %
This means the decision tree correctly identified fire risk about 54% of the time, which is a modest start and indicates room for improvement.
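Accuracy alone hides which class the tree gets wrong. A confusion matrix (a sketch, reusing tree_pred and test_data from the steps above) breaks the errors down by class:

```r
# Cross-tabulate predictions against the true labels:
# rows = predicted class, columns = actual class
table(Predicted = tree_pred, Actual = test_data$fire_risk)
```

The off-diagonal counts show how many fires were missed versus how many false alarms the model raised.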
Here we’ll load the randomForest package and train a model using 100 decision trees. We’ll then predict fire risk on the test data and calculate how many predictions were correct.
# Load the random forest package
library(randomForest)
# Train the model on training data (100 trees)
rf_model <- randomForest(fire_risk ~ ., data = train_data, ntree = 100)
# Make predictions on the test set
rf_pred <- predict(rf_model, test_data)
# Calculate and display accuracy
rf_accuracy <- mean(rf_pred == test_data$fire_risk)
cat("Random Forest Accuracy:", round(rf_accuracy * 100, 2), "%\n")
The output for this model came out to be:
Random Forest Accuracy: 54.49 %
This means the model correctly predicted fire risk in about 54.49% of cases, slightly better than the Decision Tree result.
In this step, we check the importance of each feature and how much it contributed to the model’s decisions. The code for this section is:
# Show importance values of each variable
importance(rf_model)
# Visualize the importance of features
varImpPlot(rf_model)
This gives the output:

| Variable | MeanDecreaseGini |
| --- | --- |
| X | 15.0032994 |
| Y | 12.4502925 |
| month | 10.0166118 |
| day | 16.6330798 |
| FFMC | 13.9930943 |
| DMC | 18.1110317 |
| DC | 15.8684236 |
| ISI | 13.7510319 |
| temp | 23.5911713 |
| RH | 21.1361677 |
| wind | 15.1371009 |
| rain | 0.1780586 |
Top important features (based on the plot and table):

| Rank | Feature | Importance (MeanDecreaseGini) |
| --- | --- | --- |
| 1 | temp | 23.59 (most important) |
| 2 | RH | 21.14 |
| 3 | DMC | 18.11 |
| 4 | day | 16.63 |
| 5 | DC | 15.87 |
This shows us that temperature and humidity (RH) are the most crucial indicators of fire risk in this dataset.
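The Logistic Regression baseline listed in the models table was not built above. As a hedged sketch (assuming the same train_data and test_data objects from the split step), it could be added for comparison using R's built-in glm():

```r
# Fit a logistic regression baseline on the training data
glm_model <- glm(fire_risk ~ ., data = train_data, family = binomial)

# Predicted probability of the "yes" class on the test set
glm_prob <- predict(glm_model, test_data, type = "response")

# Convert probabilities to class labels at a 0.5 cutoff
glm_pred <- ifelse(glm_prob > 0.5, "yes", "no")

# Accuracy on the test set
glm_accuracy <- mean(glm_pred == test_data$fire_risk)
cat("Logistic Regression Accuracy:", round(glm_accuracy * 100, 2), "%\n")
```

Note that factor columns like month and day must contain only levels seen during training, otherwise predict() will raise an error for those rows.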
In this forest fire project using R, we analyzed forest fire risk using environmental data and built two classification models: a Decision Tree and a Random Forest.
We started by preprocessing the dataset, converting necessary variables to factors, and creating a binary target variable (fire_risk). After splitting the data into training and testing sets, we trained and evaluated both models.
The Decision Tree model achieved an accuracy of 53.85%, while the Random Forest model performed slightly better with 54.49% accuracy. Random Forest also identified key features like temperature, relative humidity, and DMC as the most influential in predicting fire risk.
Reference:
https://colab.research.google.com/drive/1oPFk4aH-Hy0lrtiTMk80neRcbh6kPRFQ#scrollTo=LKrm_UOKb-OY
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...