Daily Temperature Forecast Analysis Using R

By Rohit Sharma

Updated on Aug 06, 2025 | 10 min read | 1.45K+ views


Learn how to build a project on the Daily Temperature Forecast Using R. In this blog, we'll explore time series data, clean and visualize it, and apply the ARIMA model to forecast future temperatures. 

Using a dataset of Delhi’s daily climate, this blog will explain each step, from importing the data to evaluating model accuracy, all using R on Google Colab. This project will help you understand the basics with simple code and clear explanations.

Accelerate your data science career with upGrad’s cutting-edge online data science programs. Gain hands-on expertise in Python, Machine Learning, AI, Tableau, and SQL, guided by industry leaders. Enrol now and lead the data-driven revolution.

Take your Data Science skills to the next level with these Top 25+ R Projects for Beginners.

What Tools and Libraries You’ll Need to Get Started with This Forecasting Project

Before starting this project on Daily Temperature Forecast Using R, it's helpful to know what tools and libraries you'll be working with. The tools and libraries used in the project are given in the table below.

| Tool / Library | Purpose |
|---|---|
| Google Colab | Cloud platform to run R code in-browser |
| R Language | Programming language for data analysis |
| tidyverse | Data wrangling and manipulation |
| ggplot2 | Data visualization |
| lubridate | Date conversion and manipulation |
| zoo | Time series handling |
| forecast | Building ARIMA models and forecasting |
| tseries | Running stationarity tests like ADF |

Level up your future with world-class Data Science and AI programs. Whether it's an executive edge, a global master’s, or a smart start with a B.Sc., upGrad has the path to your success. Learn from the best. Lead the rest. Enrol now.

What You Must Understand Before Starting the Daily Temperature Forecast Analysis

Before you begin this project, it’s important to understand a few basics that will help you work smoothly with the dataset and tools:

  • You’ll be working with a time series dataset containing daily temperature readings.
  • This project uses the ARIMA model, which requires the data to be stationary for accurate forecasting.
  • You must be comfortable using R packages like forecast, tseries, and ggplot2.
  • Time series projects focus on trends over time and seasonality, so correct date formatting matters a lot.
  • A basic understanding of R syntax and data frames will be useful.
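To make the stationarity idea concrete before you start: a series with a trend is not stationary, but differencing (subtracting each value from the next) often makes it so. A small illustrative sketch using simulated data, not the project dataset:

```r
# Illustrative sketch: differencing removes a linear trend
# (simulated data, not the Delhi climate dataset)
set.seed(42)
trend_series <- ts(1:100 + rnorm(100))  # upward trend + noise

# First difference: each value minus the previous one
diff_series <- diff(trend_series)

# The differenced series fluctuates around a constant mean,
# which is what ARIMA's stationarity assumption requires
plot(diff_series, main = "Differenced Series (roughly stationary)")
```

This is the same transformation ARIMA applies internally when its `d` parameter is greater than zero.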

Project Duration, Difficulty, and Skill Level Required

If you're wondering how much time this project will take or whether it's the right fit for your current skill level, this quick overview will help you plan better:

| Aspect | Details |
|---|---|
| Estimated Duration | 1–2 hours |
| Project Difficulty | Beginner |
| R Skill Level Needed | Basic understanding of R and data frames |
| Libraries Used | tidyverse, ggplot2, forecast, tseries |
| Tools Required | R (in Google Colab) and a CSV dataset |

How This Daily Temperature Forecasting Project in R Is Structured Step by Step

In this section, you’ll find a complete breakdown of the project into simple, easy-to-follow steps. Each step includes the R code you need, along with short comments and a brief explanation to help you understand what’s happening at that stage.

Step 1: Configure Google Colab for R Programming

To begin working with R in Google Colab, you'll need to switch the runtime from Python to R. This setup ensures that all code cells in your notebook will run R code instead of Python by default.

Here's how to switch to R:

  • Open Google Colab and create a new notebook
  • Go to the Runtime menu at the top
  • Click Change runtime type
  • Under the Language option, select R from the dropdown
  • Click Save to apply the change

Here’s an R Project: Loan Approval Classification Using Logistic Regression in R

Step 2: Install the Required R Packages

Before running the analysis, we need to install a few essential R packages. These libraries will help you clean the data, work with dates, visualize trends, and build forecasting models. You only need to install them once per session in Google Colab. Here’s the code to install the libraries and packages:

# Install packages (Run only once)

install.packages("tidyverse")   # For data handling and manipulation (includes dplyr, readr, etc.)

install.packages("lubridate")   # Makes working with dates easier

install.packages("forecast")    # Used for time series modeling like ARIMA

install.packages("tseries")     # For running statistical tests like ADF (stationarity check)

install.packages("ggplot2")     # For creating data visualizations

install.packages("zoo")         # Helps in time series data conversion and handling

The output will confirm that the libraries and packages are installed and loaded:

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

also installing the dependencies ‘xts’, ‘TTR’, ‘quadprog’, ‘quantmod’, ‘colorspace’, ‘fracdiff’, ‘lmtest’, ‘timeDate’, ‘tseries’, ‘urca’, ‘zoo’, ‘RcppArmadillo’

(The same “Installing package into…” message is printed once for each package.)
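Installing a package makes it available on disk, but each Colab session still needs to load it before use. The later steps load these libraries one by one as they are needed; if you prefer, you can load them all up front:

```r
# Load the installed packages (run once per session)
library(tidyverse)   # data handling and manipulation
library(lubridate)   # date parsing and manipulation
library(forecast)    # ARIMA modeling and forecasting
library(tseries)     # ADF stationarity test
library(ggplot2)     # plotting (also attached by tidyverse)
library(zoo)         # time series utilities
```

If a `library()` call fails with “there is no package called …”, rerun the corresponding `install.packages()` line from above.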

Step 3: Load the Dataset into Your R Environment

Since you’ve already uploaded the dataset to your Google Colab session, this step involves reading that file into R. We’ll use read.csv() to load the data and head() to preview the first few rows. Here’s the code:

# Read the uploaded dataset from its path in Colab
data <- read.csv("DailyDelhiClimateTest.csv")

# View the first few rows of the dataset
head(data)

The above step will read the dataset and give us a glimpse of the data:

 

| # | date | meantemp | humidity | wind_speed | meanpressure |
|---|------|----------|----------|------------|--------------|
| 1 | 2017-01-01 | 15.91304 | 85.86957 | 2.743478 | 59.000 |
| 2 | 2017-01-02 | 18.50000 | 77.22222 | 2.894444 | 1018.278 |
| 3 | 2017-01-03 | 17.11111 | 81.88889 | 4.016667 | 1018.333 |
| 4 | 2017-01-04 | 18.70000 | 70.05000 | 4.545000 | 1015.700 |
| 5 | 2017-01-05 | 18.38889 | 74.94444 | 3.300000 | 1014.333 |
| 6 | 2017-01-06 | 19.31818 | 79.31818 | 8.681818 | 1011.773 |

(The date column is `<chr>`; the other columns are `<dbl>`.)

Step 4: Clean and Explore the Dataset

Now that the data is loaded, the next step is to inspect and prepare it for analysis. This includes checking the structure, converting the date column, identifying missing values, and reviewing a summary of the dataset. Here’s the code:

# Load required libraries

library(tidyverse)

library(lubridate)

library(zoo)


# Check structure of the data
str(data)  # Shows column types and sample values

# Convert 'date' column to Date format
data$date <- as.Date(data$date)  # Ensures proper date handling

# Check for missing values
sum(is.na(data))  # Returns the total number of NAs in the dataset

# View summary of the dataset
summary(data)  # Gives min, max, mean, and quartiles for each column

The output of the above code is:

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──

 dplyr    1.1.4      readr    2.1.5

 forcats  1.0.0      stringr  1.5.1

 ggplot2  3.5.2      tibble   3.3.0

 lubridate 1.9.4      tidyr    1.3.1

 purrr    1.1.0     

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──

 dplyr::filter() masks stats::filter()

 dplyr::lag()    masks stats::lag()

Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

 

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

 

    as.Date, as.Date.numeric

 

'data.frame': 114 obs. of  5 variables:

 $ date        : chr  "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04" ...

 $ meantemp    : num  15.9 18.5 17.1 18.7 18.4 ...

 $ humidity    : num  85.9 77.2 81.9 70 74.9 ...

 $ wind_speed  : num  2.74 2.89 4.02 4.54 3.3 ...

 $ meanpressure: num  59 1018 1018 1016 1014 ...

 

0

(The 0 above is the result of sum(is.na(data)) — the dataset has no missing values.)

     date               meantemp        humidity        wind_speed    
 Min.   :2017-01-01   Min.   :11.00   Min.   :17.75   Min.   : 1.387  
 1st Qu.:2017-01-29   1st Qu.:16.44   1st Qu.:39.62   1st Qu.: 5.564  
 Median :2017-02-26   Median :19.88   Median :57.75   Median : 8.069  
 Mean   :2017-02-26   Mean   :21.71   Mean   :56.26   Mean   : 8.144  
 3rd Qu.:2017-03-26   3rd Qu.:27.71   3rd Qu.:71.90   3rd Qu.:10.069  
 Max.   :2017-04-24   Max.   :34.50   Max.   :95.83   Max.   :19.314  

  meanpressure  
 Min.   :  59   
 1st Qu.:1007   
 Median :1013   
 Mean   :1004   
 3rd Qu.:1017   
 Max.   :1023   
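The NA count here is 0, so no imputation is needed. If a dataset did contain gaps, the zoo package (installed in Step 2) offers simple ways to fill them. A hypothetical sketch, not part of this project's pipeline:

```r
library(zoo)

# Hypothetical example: a numeric series with one gap
x <- c(15.9, 18.5, NA, 18.7, 18.4)

# Linear interpolation between neighbouring values;
# the NA becomes 18.6, midway between 18.5 and 18.7
x_filled <- na.approx(x)

# na.locf(x) would instead carry the last observation forward
```

For daily weather data, linear interpolation is usually a reasonable choice because adjacent days tend to have similar values.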

Check This R Project: Customer Segmentation Project Using R: A Step-by-Step Guide

Step 5: Visualize the Daily Temperature Trend

Before jumping into forecasting, it's important to understand the overall trend of the temperature data. A line chart gives a clear view of how daily mean temperatures have changed over time. The code for this step is:

library(ggplot2)


# Line plot of temperature over time

ggplot(data, aes(x = date, y = meantemp)) +

  geom_line(color = "blue") +  # Draws a blue line for temperature

  labs(title = "Daily Temperature in Delhi", 

       x = "Date", y = "Mean Temperature (°C)") +  # Axis labels and title

  theme_minimal()  # Clean and minimal visual style

The graph for this step shows how the temperature increases gradually over the months.

Step 6: Convert the Data to a Time Series Format

To apply time series forecasting methods, you need to convert the temperature values into a proper time series object in R. This prepares the data for models like ARIMA. Note that with only about four months of data, a yearly seasonal pattern cannot actually be estimated; frequency = 365 simply records that the observations are daily. The code for this step is:

# Create a time series object

temp_ts <- ts(data$meantemp, frequency = 365)  # Daily data, assumes yearly seasonality


# Plot time series

plot(temp_ts, main = "Time Series of Daily Mean Temperature",

     ylab = "Mean Temperature (°C)", xlab = "Days")  # Basic time series plot

The above graph shows the same gradual rise in temperature across the roughly four months of daily observations.
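As an alternative, the zoo package can index the series by its actual calendar dates, which makes the x-axis easier to read than abstract day counts. A sketch using the packages already installed in Step 2:

```r
library(zoo)

# Index the temperature readings by their real calendar dates
temp_zoo <- zoo(data$meantemp, order.by = as.Date(data$date))

# The x-axis now shows dates instead of day numbers
plot(temp_zoo, main = "Daily Mean Temperature (date-indexed)",
     xlab = "Date", ylab = "Mean Temperature (°C)")
```

The plain ts object from above is still what we pass to the modeling functions; the zoo version is just a convenience for plotting.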

Here’s an R Project For You: Car Data Analysis Project Using R

Step 7: Check if the Time Series Is Stationary

A stationary time series has a consistent mean and variance over time, which is a key assumption for ARIMA modeling. The Augmented Dickey-Fuller (ADF) test helps us know whether a time series is stationary. Here’s the code:

library(tseries)

# Augmented Dickey-Fuller test for stationarity

adf.test(temp_ts)  # Returns a p-value to assess stationarity

The output for this step is:

Registered S3 method overwritten by 'quantmod':

  method            from

  as.zoo.data.frame zoo 

 

Augmented Dickey-Fuller Test
 

data:  temp_ts

Dickey-Fuller = -3.6378, Lag order = 4, p-value = 0.03297

alternative hypothesis: stationary

The above output shows that:

  • ADF Statistic = -3.6378
  • p-value = 0.03297
  • Since p-value < 0.05, we reject the null hypothesis.

Thus, the ADF test indicates that this temperature time series is stationary, which means we do not need to difference the data manually before modeling.
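As a cross-check, the forecast package provides ndiffs(), which estimates how many rounds of differencing a series needs to become stationary (it uses a KPSS unit-root test by default, a different test from ADF, so the two can disagree):

```r
library(forecast)

# Estimated number of differences needed for stationarity
# (KPSS test by default; compare with the ADF result above)
ndiffs(temp_ts)
```

Keep in mind that auto.arima() in the next step runs its own unit-root test and may still decide to apply differencing regardless of the ADF result here.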

Here’s a Must-Try R Project: Forest Fire Project Using R - A Step-by-Step Guide

Step 8: Build an ARIMA Model Automatically

ARIMA is a powerful forecasting method for time series data. The auto.arima() function finds the best ARIMA configuration by testing multiple combinations of parameters automatically. Here’s the code:

library(forecast)

# Build the ARIMA model automatically
model <- auto.arima(temp_ts)  # Automatically selects the best (p,d,q) model

# Print the summary of the model
summary(model)  # Displays model coefficients and diagnostics

The output for the above code is:

Series: temp_ts 

ARIMA(0,1,0) 

 

sigma^2 = 2.856:  log likelihood = -219.63

AIC=441.26   AICc=441.3   BIC=443.99

Training set error measures:

| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.1412532 | 1.682507 | 1.306216 | 0.2263582 | 6.598771 | NaN | -0.1295276 |


Here’s what the ARIMA model performance means:

| Metric | Value | What It Means |
|---|---|---|
| ME (Mean Error) | 0.14 | The average error is close to zero, so the model shows very little bias. |
| RMSE | 1.68 | The typical size of the errors; smaller is better. |
| MAE | 1.31 | On average, the forecast is off by about 1.3°C. |
| MAPE | 6.6% | Only ~6.6% average error, which is considered very good for forecasting. |
| ACF1 | -0.13 | No strong autocorrelation in the residuals, which is good. |

The ARIMA model is doing a solid job! A MAPE below 10% indicates high forecasting accuracy.
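auto.arima() selected ARIMA(0,1,0), i.e. a random walk on the once-differenced series. If you want to fit a specific configuration yourself, or compare nearby alternatives, the forecast package's Arima() function accepts the (p, d, q) order directly. A sketch (the alternative order shown is just an example):

```r
library(forecast)

# Refit the model that auto.arima() chose
manual_model <- Arima(temp_ts, order = c(0, 1, 0))

# Try a nearby configuration for comparison
alt_model <- Arima(temp_ts, order = c(1, 1, 1))

# Lower AIC suggests a better trade-off between fit and complexity
AIC(manual_model)
AIC(alt_model)
```

This is a useful sanity check: if a hand-picked order beats the automatic choice on AIC, auto.arima()'s default stepwise search may have skipped it.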

Step 9: Forecast Daily Temperature for the Next 30 Days

After training the ARIMA model, we can now predict future values. Here, we generate a 30-day temperature forecast and visualize it using a simple plot. Here’s the code to generate the graph:

# Forecast the next 30 days
forecast_temp <- forecast(model, h = 30)  # h = number of days to forecast


# Plot the forecast

plot(forecast_temp, 

     main = "Temperature Forecast for Next 30 Days",

     xlab = "Time", ylab = "Mean Temperature (°C)")  # Visualize forecast and confidence intervals

The output of the above code is a plot of the historical series with the 30-day forecast line and its shaded confidence bands.
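Besides the plot, the forecast object holds the numeric predictions and their confidence intervals, which you can inspect or export directly:

```r
# Point forecasts for the next 30 days
head(forecast_temp$mean)

# Point forecasts together with 80% and 95% confidence intervals
head(as.data.frame(forecast_temp))
```

Note that the widening confidence bands are expected for an ARIMA(0,1,0) model: a random walk's uncertainty grows with the forecast horizon.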


Step 10: Evaluate the Accuracy of the ARIMA Model

After forecasting, it's important to evaluate how well the ARIMA model fits the training data. The accuracy() function provides several statistical metrics to assess performance. Here’s the code:

# Evaluate the model's accuracy

accuracy(model)  # Returns metrics like RMSE, MAE, MAPE, etc.

The output for the above code gives us a table showing the model’s accuracy:

| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
|---|---|---|---|---|---|---|---|
| Training set | 0.1412532 | 1.682507 | 1.306216 | 0.2263582 | 6.598771 | NaN | -0.1295276 |

The above output shows that:

| Metric | Meaning | Explanation (In Easy Terms) |
|---|---|---|
| ME (Mean Error) | Average of all forecast errors | A value close to 0 (like 0.14) means the model isn’t consistently over- or under-predicting. |
| RMSE (Root Mean Square Error) | Standard deviation of prediction errors | Measures how far off the predictions are, on average. Lower RMSE = better model. (Here: 1.68) |
| MAE (Mean Absolute Error) | Average of absolute errors (ignores direction) | Tells you how much the forecast is off, on average. (Here: 1.30°C) |
| MPE (Mean Percentage Error) | Average of percentage errors | Shows average error in percentage terms; can be misleading with small values. |
| MAPE (Mean Absolute Percentage Error) | Mean of absolute percentage errors | Very popular metric. Here, 6.59% means the forecast is, on average, about 6.6% off the actual values. |
| MASE (Mean Absolute Scaled Error) | Scaled version of MAE | Can’t be calculated here (shows NaN) because it needs a benchmark method or test data to scale against. |
| ACF1 | Autocorrelation of errors at lag 1 | -0.129 means there’s little autocorrelation — a good sign (errors are not following a pattern). |
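The metrics above are computed on the training data, so they can be optimistic. A more honest check is to hold out the last few weeks, fit the model on the rest, and compare its forecasts against the held-out values. A sketch, assuming temp_ts from Step 6:

```r
library(forecast)

# Hold out the last 14 days as a test set
n <- length(temp_ts)
train <- window(temp_ts, end = time(temp_ts)[n - 14])
test  <- window(temp_ts, start = time(temp_ts)[n - 13])

# Fit on the training portion only and forecast the held-out span
holdout_model <- auto.arima(train)
holdout_fc <- forecast(holdout_model, h = 14)

# Compare forecasts against the actual held-out temperatures;
# the "Test set" row now shows true out-of-sample accuracy
accuracy(holdout_fc, test)
```

When a test set is supplied this way, accuracy() also produces a valid MASE value, since it can scale the errors against a naive benchmark.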

Conclusion

In this Daily Temperature Forecast Using R project, we used an ARIMA time series model in Google Colab to predict future temperatures based on daily historical data from Delhi.

After cleaning and exploring the data, we converted it into a time series format, checked for stationarity using the Augmented Dickey-Fuller test, and automatically fitted an ARIMA model.

We forecasted the next 30 days and evaluated the model using metrics like RMSE and MAPE. The model achieved an RMSE of 1.68 and a MAPE of 6.6%, showing reasonable accuracy for daily temperature predictions.

Unlock the power of data with our popular Data Science courses, designed to make you proficient in analytics, machine learning, and big data!

Elevate your career by learning essential Data Science skills such as statistical modeling, big data processing, predictive analytics, and SQL!

Stay informed and inspired with our popular Data Science articles, offering expert insights, trends, and practical tips for aspiring data professionals!

Frequently Asked Questions (FAQs)

1. What is the goal of the Daily Temperature Forecasting project in R?

2. Which tools and libraries are required to build this temperature forecasting model?

3. Can I use machine learning models instead of ARIMA for forecasting?

4. How can I improve the accuracy of my temperature forecast model?

5. What other beginner-friendly R projects can I explore after this?

