Forest Fire Project Using R - A Step-by-Step Guide

By Rohit Sharma

Updated on Jul 28, 2025

We've all seen or heard about forest fires. Every year, forest fires burn thousands of acres of land, and environmental factors such as temperature, humidity, and wind influence how they behave.

In this Forest Fire Project Using R, we'll work with a forest fire dataset and learn how to clean, preprocess, and analyze it in R. We'll then build Decision Tree and Random Forest classification models to predict fire risk, with Logistic Regression as a simple baseline for comparison. Along the way, the project covers data exploration and visualization, simple feature engineering, and model evaluation.


What Should You Know Before Starting the Forest Fire Project Using R?

Before starting the forest fire project using R, you should be comfortable with the following:

  • Basic R programming and running code in Google Colab or RStudio
    You'll need to navigate the environment, install packages, and run scripts.
  • Data manipulation with the dplyr and tidyverse packages
    These packages help you clean, filter, transform, and prepare the forest fire dataset for modeling.
  • Classification concepts and binary target variables
    The project is framed as a binary classification task: predicting fire vs. no fire.
  • Basic machine learning models such as Decision Tree and Logistic Regression
    Familiarity with these models helps you follow how models are trained, evaluated, and compared in R.
  • Basic data visualization with ggplot2
    Visualization helps you explore patterns and present the environmental factors that influence fire risk.


Tools, Technologies, and R Libraries You'll Be Using

The table below lists the tools and R packages referenced in this project and what each one is for. Not every package is needed in every step; for example, Step 3 loads the CSV with base R's read.csv() rather than readr.

Library / Tool    Purpose
ggplot2           Visualize environmental patterns and fire risk relationships
dplyr             Clean, transform, and manipulate data
rpart             Build and visualize Decision Tree models for classification
randomForest      Apply the Random Forest algorithm for fire risk prediction
readr             Load CSV files and handle data input efficiently
tidyverse         A collection of packages (including ggplot2, dplyr, and readr) for tidy workflows


Classification Models You Will Explore in This Forest Fire Project Using R

The walkthrough below builds and compares a Decision Tree and a Random Forest. Logistic Regression and other algorithms are natural candidates for comparison.

Model                  Purpose
Decision Tree          An easy-to-understand model that splits the data into clear if-then rules
Random Forest          An ensemble of decision trees that usually improves accuracy and reduces overfitting
Logistic Regression    A simple, interpretable baseline for binary fire prediction
Other options          Models such as SVM and k-NN can be tested for comparison


Project Duration, Difficulty, and Skill Level Required

This forest fire project is suitable for beginners. It requires basic knowledge of R and of classification models and techniques.

Estimated Time Commitment (2–4 Hours)

This Forest Fire Project Using R can typically be completed in 2 to 4 hours, depending on your familiarity with R and classification models. That estimate covers every step from start to finish: data cleaning, visualization, model building, and evaluation.

Complexity Level: Beginner to Intermediate

The project is ideal for beginners who have a basic understanding of R and are looking to apply classification models. It also suits intermediate learners who want experience in environmental data analysis and predictive modeling.

Step-by-Step Forest Fire Project Using R: A Hands-On Guide

In this section, we'll walk through the entire project, from loading the dataset to building and evaluating classification models. Each step includes a short explanation and the R code.

Step 1: Download the Forest Fires Dataset and Start Colab

Download the Forest Fires dataset (forestfires.csv) from Kaggle. Next, open Google Colab and change the runtime to use R instead of Python.

How to switch Colab to R:

  • Click on Runtime → Change runtime type
  • Select R from the "Runtime type" dropdown
  • Click Save

Now, you’re ready to upload the dataset and install the libraries you'll use for this project.

Step 2: Install Required Libraries for the Forest Fire Project

Before starting, install the R packages the project depends on. They provide the tools to handle data, build models, and visualize results.

Use this code:

# Install required packages (run this once to set things up)
install.packages("ggplot2")       # For creating charts and visualizations
install.packages("rpart")         # For building decision tree models
install.packages("randomForest")  # For building random forest models
install.packages("dplyr")         # For data cleaning and manipulation
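If you re-run the notebook often, a small helper can skip packages that are already installed instead of reinstalling all four every time. This is a minimal sketch; find_missing() and install_missing() are hypothetical helper names, not functions from any package.

```r
# Packages used in this project
required_pkgs <- c("ggplot2", "rpart", "randomForest", "dplyr")

# Return the packages in 'pkgs' that cannot be loaded in this session
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Install only the missing packages; returns their names invisibly
install_missing <- function(pkgs) {
  missing_pkgs <- find_missing(pkgs)
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  invisible(missing_pkgs)
}
```

Calling install_missing(required_pkgs) in place of the four install.packages() lines installs only what the runtime lacks, so re-running the setup cell is quick.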

Step 3: Load the Dataset and View the Data

In this step, we’ll load and read the dataset. Use this code:

# Read the uploaded forest fire CSV file into R
data <- read.csv("forestfires.csv")  
# Show the first few rows of the data to understand its structure
head(data)

This prints the first six rows of the dataset, along with each column's type:

   X      Y      month  day    FFMC   DMC    DC     ISI    temp   RH     wind   rain   area
   <int>  <int>  <chr>  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <int>  <dbl>  <dbl>  <dbl>
1  7      5      mar    fri    86.2   26.2   94.3   5.1    8.2    51     6.7    0.0    0
2  7      4      oct    tue    90.6   35.4   669.1  6.7    18.0   33     0.9    0.0    0
3  7      4      oct    sat    90.6   43.7   686.9  6.7    14.6   33     1.3    0.0    0
4  8      6      mar    fri    91.7   33.3   77.5   9.0    8.3    97     4.0    0.2    0
5  8      6      mar    sun    89.3   51.3   102.2  9.6    11.4   99     1.8    0.0    0
6  8      6      aug    sun    92.3   85.3   488.0  14.7   22.2   29     5.4    0.0    0


Step 4: Explore and Understand the Forest Fire Dataset

In this step, we'll explore the structure and contents of the data. Use this code:

# View the first few rows of the dataset (to get a quick look at the values)
head(data)
# Check the structure of the dataset (see data types of each column)
str(data)
# Get summary statistics (like mean, min, max, etc.) for each column
summary(data)

The first call prints the same six rows shown in Step 3. str() lists each column with its type, confirming that the data has 517 observations of 13 variables, with month and day stored as text. summary() reports the minimum, quartiles, median, mean, and maximum of each numeric column.
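Beyond head(), str(), and summary(), it is worth confirming that there are no missing values and checking how many rows have a burned area of zero, since that drives the target column in Step 5. The sketch below runs on the six rows shown in Step 3, read from an inline string so it works without the CSV file; on the real data you would use data in place of the hypothetical six_rows object.

```r
# The six rows printed by head(data) in Step 3, inlined so the sketch is self-contained
csv_text <- "X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0
7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0
7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0
8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0
8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0
8,6,aug,sun,92.3,85.3,488.0,14.7,22.2,29,5.4,0.0,0"
six_rows <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Count missing values per column (all zeros means nothing to impute)
na_counts <- colSums(is.na(six_rows))
print(na_counts)

# Share of rows with no recorded burned area (area == 0)
zero_area_share <- mean(six_rows$area == 0)

# Column classes, to confirm month and day are read as text
col_classes <- sapply(six_rows, class)
print(col_classes)
```

On the full dataset, the same zero-area share works out to 247 of 517 rows (about 47.8%), which matches the "no" count you'll see in Step 5.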

 

Step 5: Create a Target Column for Fire Risk Classification

In this section, we’ll create a new column for fire risk classification. Here’s the code:

# Create a new column called 'fire_risk' based on the 'area' column
# If area > 0, mark it as "yes" (fire occurred), otherwise "no"
data$fire_risk <- ifelse(data$area > 0, "yes", "no")
# Convert the new column to a factor (important for classification models)
data$fire_risk <- as.factor(data$fire_risk)
# Check how many 'yes' and 'no' labels we have
table(data$fire_risk)

The output from this code was:

 no yes 
247 270 

That means there are 247 instances where no fire occurred (area = 0) and 270 instances where a fire did occur (area > 0), so the two classes are roughly balanced.
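Checking the class balance also gives a reference point for the models later on: a classifier that always predicts the majority class ("yes") would be right 270 of 517 times, about 52.2% on the full dataset. A minimal sketch, rebuilding a label vector with the counts above instead of reading the CSV:

```r
# Rebuild a factor with the class counts reported above (247 "no", 270 "yes")
fire_risk <- factor(rep(c("no", "yes"), times = c(247, 270)), levels = c("no", "yes"))

# Class counts and proportions
counts <- table(fire_risk)
props <- prop.table(counts)
print(round(props, 3))

# Majority-class baseline: accuracy of always predicting the most common label
baseline_accuracy <- max(props)
cat("Majority-class baseline:", round(100 * baseline_accuracy, 2), "%\n")
```

With the real data, prop.table(table(data$fire_risk)) gives the same proportions. Any model worth keeping should beat this baseline on unseen data.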


Step 6: Prepare the Dataset for Modeling

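In this step, we'll prepare the dataset for modeling. First, we remove the area column: fire_risk was derived directly from area, so leaving it in would let a model read the answer straight from its inputs (data leakage). We also convert month and day to factors so the models treat them as categories.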

Use this code:

# Load dplyr package for data manipulation
library(dplyr)
# Remove the 'area' column (we've already used it to create 'fire_risk')
data <- data %>% select(-area)
# Convert 'month' and 'day' columns to categorical (factor) type
# These are names like 'aug', 'fri', etc. — not numbers
data$month <- as.factor(data$month)
data$day <- as.factor(data$day)
# Check the updated structure of the dataset
str(data)

The output for this code will be like the following:

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

'data.frame':  517 obs. of  13 variables:
 $ X        : int  7 7 7 8 8 8 8 8 8 7 ...
 $ Y        : int  5 4 4 6 6 6 6 6 6 5 ...
 $ month    : Factor w/ 12 levels "apr","aug","dec",..: 8 11 11 8 8 2 2 2 12 12 ...
 $ day      : Factor w/ 7 levels "fri","mon","sat",..: 1 6 3 1 4 4 2 2 6 3 ...
 $ FFMC     : num  86.2 90.6 90.6 91.7 89.3 92.3 92.3 91.5 91 92.5 ...
 $ DMC      : num  26.2 35.4 43.7 33.3 51.3 ...
 $ DC       : num  94.3 669.1 686.9 77.5 102.2 ...
 $ ISI      : num  5.1 6.7 6.7 9 9.6 14.7 8.5 10.7 7 7.1 ...
 $ temp     : num  8.2 18 14.6 8.3 11.4 22.2 24.1 8 13.1 22.8 ...
 $ RH       : int  51 33 33 97 99 29 27 86 63 40 ...
 $ wind     : num  6.7 0.9 1.3 4 1.8 5.4 3.1 2.2 5.4 4 ...
 $ rain     : num  0 0 0 0.2 0 0 0 0 0 0 ...
 $ fire_risk: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
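To see why area has to go, consider what a tree does when it is left in. The sketch below uses a small hypothetical toy dataset (random temp and RH values, not the forest fire data) and labels each row exactly as Step 5 does. Because fire_risk is a deterministic function of area, rpart can separate the classes perfectly with a single split on area.

```r
library(rpart)

set.seed(1)
n <- 200
# Hypothetical toy data: roughly half the rows have zero burned area
toy <- data.frame(
  temp = runif(n, 5, 30),
  RH   = runif(n, 20, 100),
  area = ifelse(runif(n) < 0.5, 0, rexp(n))
)
# Same labelling rule as Step 5
toy$fire_risk <- factor(ifelse(toy$area > 0, "yes", "no"), levels = c("no", "yes"))

# A tree trained with 'area' still present
leaky_tree <- rpart(fire_risk ~ ., data = toy, method = "class")
leaky_acc <- mean(predict(leaky_tree, toy, type = "class") == toy$fire_risk)

# The root node's split variable reveals the leak
first_split <- as.character(leaky_tree$frame$var[1])
cat("First split on:", first_split, "| training accuracy:", leaky_acc, "\n")
```

On the real data, the same tree would look "perfect" while telling you nothing about weather conditions, which is why area is dropped before any model sees it.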

Step 7: Split the Data into Training and Testing Sets

In this section, we'll split the dataset into two parts:

  • Training data (70%, used to teach the model)
  • Testing data (30%, used to see how well the model performs on unseen data)

# Set a random seed so the split is the same every time you run this code
set.seed(123)
# Randomly select 70% of the row numbers for training
# (0.7 * 517 = 361.9, so floor() keeps 361 rows)
train_index <- sample(seq_len(nrow(data)), size = floor(0.7 * nrow(data)))
# Rows in train_index go to training; all remaining rows go to testing
train_data <- data[train_index, ]
test_data <- data[-train_index, ]
# Print the number of rows in each set
cat("Training rows:", nrow(train_data), "\n")
cat("Testing rows:", nrow(test_data), "\n")

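We get the output as:

Training rows: 361
Testing rows: 156

That means 361 rows will be used to train the models, and 156 rows will be used to test how well they predict fire risk. Because the split is random, the exact accuracies you get in later steps may differ slightly from the ones shown here.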
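A purely random split does not guarantee that the "yes"/"no" ratio is the same in the training and test sets. A stratified split samples 70% within each class separately. This is a minimal base-R sketch; stratified_split() is a hypothetical helper, and packages such as caret offer similar functionality through createDataPartition().

```r
# Return row indices for a training set, sampling train_frac of each class
stratified_split <- function(y, train_frac = 0.7) {
  rows_by_class <- split(seq_along(y), y)
  picked <- lapply(rows_by_class, function(rows) {
    rows[sample.int(length(rows), floor(train_frac * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Example with the class counts from Step 5 (247 "no", 270 "yes")
set.seed(123)
y <- factor(rep(c("no", "yes"), times = c(247, 270)))
train_idx <- stratified_split(y)
print(table(y[train_idx]))
```

This keeps 172 "no" and 189 "yes" rows (361 in total), preserving the original class ratio. With the real data, idx <- stratified_split(data$fire_risk) followed by data[idx, ] and data[-idx, ] would replace the random split.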


Step 8: Build and Evaluate a Decision Tree Model

Here, we're using a Decision Tree model, which is simple and easy to interpret. It uses all the remaining columns, such as temperature, humidity, and wind, to predict whether there's a fire risk ("yes" or "no").

# Load the decision tree package
library(rpart)
# Train a decision tree model to predict 'fire_risk' based on all other variables
tree_model <- rpart(fire_risk ~ ., data = train_data, method = "class")
# Print the summary of the model (tree structure and splits)
print(tree_model)

This gives the output as:

n= 361

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 361 168 yes (0.4653740 0.5346260)
   2) temp< 18.85 166  75 no (0.5481928 0.4518072)
     4) month=apr,aug,jan,jul,mar,oct 89  30 no (0.6629213 0.3370787)
       8) X< 4.5 51  10 no (0.8039216 0.1960784) *
       9) X>=4.5 38  18 yes (0.4736842 0.5263158)
        18) temp>=16.9 10   1 no (0.9000000 0.1000000) *
        19) temp< 16.9 28   9 yes (0.3214286 0.6785714) *
     5) month=dec,feb,jun,may,sep 77  32 yes (0.4155844 0.5844156)
      10) wind< 7.8 70  32 yes (0.4571429 0.5428571)
        20) DMC< 115.35 44  19 no (0.5681818 0.4318182)
          40) RH< 52 17   4 no (0.7647059 0.2352941) *
          41) RH>=52 27  12 yes (0.4444444 0.5555556)
            82) RH>=73.5 8   2 no (0.7500000 0.2500000) *
            83) RH< 73.5 19   6 yes (0.3157895 0.6842105) *
        21) DMC>=115.35 26   7 yes (0.2692308 0.7307692) *
      11) wind>=7.8 7   0 yes (0.0000000 1.0000000) *
   3) temp>=18.85 195  77 yes (0.3948718 0.6051282)
     6) Y< 2.5 17   4 no (0.7647059 0.2352941) *
     7) Y>=2.5 178  64 yes (0.3595506 0.6404494)
      14) DC< 562.75 33  15 no (0.5454545 0.4545455)
        28) temp< 24.4 22   6 no (0.7272727 0.2727273) *
        29) temp>=24.4 11   2 yes (0.1818182 0.8181818) *
      15) DC>=562.75 145  46 yes (0.3172414 0.6827586) *

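The output shows how the decision tree splits the data to predict fire risk. Each line lists the split rule, the number of rows reaching that node, how many of them the node misclassifies (loss), the predicted class, and the class probabilities (no, yes).

  • The root node starts with all 361 training rows.
  • The first split is on temperature: temp < 18.85 °C versus temp >= 18.85 °C.
  • Further splits depend on month, wind, RH (relative humidity), DMC, DC, and the X and Y map coordinates.
  • Each terminal node (marked *) predicts "yes" or "no" according to the majority class of the training rows that reach it.
  • For example, when the temperature is below 18.85 and the month is apr, aug, jan, jul, mar, or oct, the tree leans toward no fire (about 66% of those training rows had no fire), although one sub-branch (X >= 4.5 and temp < 16.9) still predicts "yes".
  • When the temperature is 18.85 or higher, Y >= 2.5, and DC >= 562.75, the tree predicts "yes" (about 68% of the 145 training rows in that node had a fire).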
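rpart chooses each classification split using the Gini index by default: it keeps the candidate split that most reduces class impurity. The root split above can be checked by hand from the printed counts. The root has 168 "no" rows out of 361 (its loss, since it predicts "yes"); node 2 (temp < 18.85) has 91 "no" and 75 "yes"; node 3 has 77 "no" and 118 "yes". A small worked sketch:

```r
# Gini impurity of a vector of class counts: 1 - sum(p_k^2)
gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

# Class counts read off the printed tree (no, yes)
root  <- c(no = 168, yes = 193)  # node 1: 361 rows
left  <- c(no = 91,  yes = 75)   # node 2: temp < 18.85
right <- c(no = 77,  yes = 118)  # node 3: temp >= 18.85

# Impurity of the children, weighted by how many rows each receives
weighted_children <- (sum(left) * gini(left) + sum(right) * gini(right)) / sum(root)
gain <- gini(root) - weighted_children

cat("Root Gini:", round(gini(root), 4),
    "| after split:", round(weighted_children, 4),
    "| decrease:", round(gain, 4), "\n")
```

The impurity drops by only about 0.012, from roughly 0.498 to 0.486. Even the best first split separates the classes weakly, which is consistent with the modest accuracies in the next steps.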

Step 9: Make Predictions Using the Tree and Measure Accuracy

In this step, we'll measure how accurately the model predicts fire risk on the unseen test data. Use this code:

# Predict fire risk labels on test data
tree_pred <- predict(tree_model, test_data, type = "class")
# Calculate prediction accuracy
accuracy <- mean(tree_pred == test_data$fire_risk)
# Print accuracy percentage
cat("Decision Tree Accuracy:", round(accuracy * 100, 2), "%\n")

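The output for this model came out to be:

Decision Tree Accuracy: 53.85 %

This means the decision tree correctly classified 84 of the 156 test rows (about 54%). That is only a modest improvement over a naive baseline: 79 of the test rows are "no" (247 "no" rows in total minus the 168 in the training set, shown as the root node's loss), so always predicting "no" would already score about 50.6%. There is clear room for improvement.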
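Accuracy alone hides which kind of mistake the model makes. A confusion matrix, plus precision and recall for the "yes" class, gives a fuller picture. In this sketch, evaluate_classifier() is a hypothetical helper and the short label vectors are made up for illustration; on the real model you would call evaluate_classifier(tree_pred, test_data$fire_risk).

```r
# Confusion matrix plus precision/recall for the positive class
evaluate_classifier <- function(pred, actual, positive = "yes") {
  lvls <- levels(factor(actual))
  pred <- factor(pred, levels = lvls)
  actual <- factor(actual, levels = lvls)
  cm <- table(Predicted = pred, Actual = actual)
  tp <- cm[positive, positive]
  fp <- sum(cm[positive, ]) - tp  # predicted positive, actually negative
  fn <- sum(cm[, positive]) - tp  # predicted negative, actually positive
  list(
    confusion = cm,
    accuracy  = sum(diag(cm)) / sum(cm),
    precision = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
    recall    = if (tp + fn > 0) tp / (tp + fn) else NA_real_
  )
}

# Made-up labels purely to show the output shape
actual <- c("yes", "yes", "no", "no", "yes")
pred   <- c("yes", "no",  "no", "yes", "yes")
result <- evaluate_classifier(pred, actual)
print(result$confusion)
cat("Accuracy:", result$accuracy, "| Precision:", round(result$precision, 3),
    "| Recall:", round(result$recall, 3), "\n")
```

For fire prediction, recall on the "yes" class (how many actual fires the model catches) is often more important than overall accuracy.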

Step 10: Train and Test a Random Forest Model

Here we'll load the randomForest package and train a model with 100 decision trees. We then predict fire risk on the test data and calculate the share of correct predictions.

# Load the random forest package
library(randomForest)
# Train the model on training data (100 trees)
rf_model <- randomForest(fire_risk ~ ., data = train_data, ntree = 100)
# Make predictions on the test set
rf_pred <- predict(rf_model, test_data)
# Calculate and display accuracy
rf_accuracy <- mean(rf_pred == test_data$fire_risk)
cat("Random Forest Accuracy:", round(rf_accuracy * 100, 2), "%\n")

The output for this model came out to be:

Random Forest Accuracy: 54.49 %

This means the Random Forest correctly predicted fire risk for 85 of the 156 test cases (54.49%), one more than the Decision Tree. On a test set this small, a one-row difference is not strong evidence that either model is better.
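The introduction also mentions Logistic Regression as an interpretable baseline. The sketch below fits one with base R's glm(family = binomial) on a hypothetical toy dataset (random temp and RH values with a made-up relationship, not the forest fire data), so it runs on its own. With the real data, you would fit something like glm(fire_risk ~ ., data = train_data, family = binomial) and compare its test accuracy with the two tree models.

```r
set.seed(42)
n <- 300
# Hypothetical toy data: hotter, drier days get a higher fire probability
toy <- data.frame(temp = runif(n, 2, 33), RH = runif(n, 15, 100))
p_fire <- plogis(-1 + 0.12 * toy$temp - 0.03 * toy$RH)
toy$fire_risk <- factor(ifelse(runif(n) < p_fire, "yes", "no"), levels = c("no", "yes"))

# 70/30 split, as in Step 7
train_index <- sample(seq_len(n), floor(0.7 * n))
train_toy <- toy[train_index, ]
test_toy  <- toy[-train_index, ]

# With a factor response, glm models P(second level), i.e. P(fire_risk == "yes")
logit_model <- glm(fire_risk ~ temp + RH, data = train_toy, family = binomial)
probs <- predict(logit_model, newdata = test_toy, type = "response")

# Classify with a 0.5 probability threshold
logit_pred <- factor(ifelse(probs > 0.5, "yes", "no"), levels = c("no", "yes"))
logit_acc <- mean(logit_pred == test_toy$fire_risk)
cat("Logistic Regression accuracy:", round(logit_acc * 100, 2), "%\n")
print(coef(logit_model))
```

With this made-up relationship, the fitted temp coefficient should come out positive and the RH coefficient negative. On the real data, the signs and sizes of the coefficients are what make Logistic Regression easy to interpret.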

Step 11: Check Feature Importance in the Random Forest Model

In this step, we check how important each feature (variable) was to the Random Forest's decisions. The code for this section is:

# Show importance values of each variable
importance(rf_model)
# Visualize the importance of features
varImpPlot(rf_model)

This gives the output:

      MeanDecreaseGini
X           15.0032994
Y           12.4502925
month       10.0166118
day         16.6330798
FFMC        13.9930943
DMC         18.1110317
DC          15.8684236
ISI         13.7510319
temp        23.5911713
RH          21.1361677
wind        15.1371009
rain         0.1780586

  • varImpPlot() draws the same values as a dot chart, showing which variables matter most for predicting fire risk.
  • MeanDecreaseGini is the total decrease in Gini impurity from splits on each variable, averaged over all the trees in the forest.
  • A higher value means a more important feature.
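importance() returns a matrix with one row per predictor, so ranking the features takes a single sort. The sketch below inlines the MeanDecreaseGini values printed above so it runs without refitting the model; with the fitted model, sort(importance(rf_model)[, "MeanDecreaseGini"], decreasing = TRUE) should give the same ordering.

```r
# MeanDecreaseGini values copied from the importance(rf_model) output above
gini_importance <- c(
  X = 15.0032994, Y = 12.4502925, month = 10.0166118, day = 16.6330798,
  FFMC = 13.9930943, DMC = 18.1110317, DC = 15.8684236, ISI = 13.7510319,
  temp = 23.5911713, RH = 21.1361677, wind = 15.1371009, rain = 0.1780586
)

# Rank features from most to least important
ranked <- sort(gini_importance, decreasing = TRUE)

# Express each score as a percentage of the total, for easier comparison
share_pct <- round(100 * ranked / sum(ranked), 1)
print(head(share_pct, 5))
```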

Top Important Features (based on the plot and table):

Rank   Feature   Importance (MeanDecreaseGini)
1      temp      23.59 (most important)
2      RH        21.14
3      DMC       18.11
4      day       16.63
5      DC        15.87

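This shows that temperature and relative humidity (RH) are the most influential predictors of fire risk in this model. Keep in mind that Gini-based importance tends to favor continuous and many-level variables, so the ranking is a useful guide rather than a definitive answer.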
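One way to cross-check a Gini-based ranking is permutation importance: shuffle one column in the test set and measure how much accuracy drops. The sketch below applies the idea to an rpart tree on a hypothetical toy dataset (fire risk driven by temp, plus an unrelated noise column), not the forest fire data. For the real model, randomForest can report a permutation-based measure (MeanDecreaseAccuracy) when fitted with importance = TRUE.

```r
library(rpart)

set.seed(7)
n <- 400
# Hypothetical toy data: fire risk depends on temp; 'noise' is unrelated
toy <- data.frame(temp = runif(n, 2, 33), RH = runif(n, 15, 100), noise = runif(n))
toy$fire_risk <- factor(ifelse(toy$temp + rnorm(n, sd = 3) > 18, "yes", "no"),
                        levels = c("no", "yes"))

train_index <- sample(seq_len(n), floor(0.7 * n))
train_toy <- toy[train_index, ]
test_toy  <- toy[-train_index, ]

tree <- rpart(fire_risk ~ ., data = train_toy, method = "class")
accuracy_on <- function(df) mean(predict(tree, df, type = "class") == df$fire_risk)
base_acc <- accuracy_on(test_toy)

# Shuffle one predictor at a time and record the drop in test accuracy
features <- c("temp", "RH", "noise")
perm_importance <- sapply(features, function(v) {
  shuffled <- test_toy
  shuffled[[v]] <- sample(shuffled[[v]])
  base_acc - accuracy_on(shuffled)
})
print(round(sort(perm_importance, decreasing = TRUE), 3))
```

Shuffling temp should cost a large share of the accuracy, while shuffling noise should change it very little. Because it is measured on held-out data, this kind of check is less prone to favoring many-level variables.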

Conclusion

In this forest fire project using R, we analyzed forest fire risk using environmental data and built two classification models: a Decision Tree and a Random Forest.

We started by preprocessing the dataset, converting necessary variables to factors, and creating a binary target variable (fire_risk). After splitting the data into training and testing sets, we trained and evaluated both models.

The Decision Tree model achieved an accuracy of 53.85%, while the Random Forest model performed slightly better with 54.49% accuracy. Random Forest also identified key features like temperature, relative humidity, and DMC as the most influential in predicting fire risk.

Reference:
https://colab.research.google.com/drive/1oPFk4aH-Hy0lrtiTMk80neRcbh6kPRFQ#scrollTo=LKrm_UOKb-OY


Rohit Sharma


Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
