Daily Temperature Forecast Analysis Using R
By Rohit Sharma
Updated on Aug 06, 2025 | 10 min read | 1.45K+ views
Learn how to build a project on the Daily Temperature Forecast Using R. In this blog, we'll explore time series data, clean and visualize it, and apply the ARIMA model to forecast future temperatures.
Using a dataset of Delhi’s daily climate, this blog will explain each step, from importing the data to evaluating model accuracy, all using R on Google Colab. This project will help you understand the basics with simple code and clear explanations.
Take your Data Science skills to the next level with these Top 25+ R Projects for Beginners.
Before starting this project on Daily Temperature Forecast Using R, it's helpful to know what tools and libraries you'll be working with. The tools and libraries used in the project are given in the table below.
Tool / Library | Purpose
Google Colab | Cloud platform to run R code in the browser
R Language | Programming language for data analysis
tidyverse | Data wrangling and manipulation
ggplot2 | Data visualization
lubridate | Date conversion and manipulation
zoo | Time series handling
forecast | Building ARIMA models and forecasting
tseries | Running stationarity tests like ADF
Before you begin this project, it helps to be comfortable with a few basics: writing simple R code, working with data frames, and handling dates. These will help you work smoothly with the dataset and tools.
If you're wondering how much time this project will take or whether it's the right fit for your current skill level, this quick overview will help you plan better:
Aspect | Details
Estimated Duration | 1–2 hours
Project Difficulty | Beginner
R Skill Level Needed | Basic understanding of R and data frames
Libraries Used | tidyverse, ggplot2, forecast, tseries
Tools Required | R (in Google Colab) and a CSV dataset
In this section, you’ll find a complete breakdown of the project into simple, easy-to-follow steps. Each step includes the R code you need, along with short comments and a brief explanation to help you understand what’s happening at that stage.
To begin working with R in Google Colab, you'll need to switch the runtime from Python to R. This setup ensures that all code cells in your notebook will run R code instead of Python by default.
Here's how to switch to R: in Colab, open the Runtime menu, choose Change runtime type, select R as the language, and save. All code cells in the notebook will then execute R instead of Python.
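Once the runtime is switched, a quick way to confirm that the notebook is actually running R (and not Python) is to print the R version in a code cell. This is just a sanity check, not part of the analysis:
# Confirm the notebook is running an R kernel
R.version.string # Prints something like "R version 4.x.x (...)" if the runtime switch worked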
Here’s an R Project: Loan Approval Classification Using Logistic Regression in R
Before running the analysis, we need to install a few essential R packages. These libraries will help you clean the data, work with dates, visualize trends, and build forecasting models. You only need to install them once per session in Google Colab. Here’s the code to install the libraries and packages:
# Install packages (Run only once)
install.packages("tidyverse") # For data handling and manipulation (includes dplyr, readr, etc.)
install.packages("lubridate") # Makes working with dates easier
install.packages("forecast") # Used for time series modeling like ARIMA
install.packages("tseries") # For running statistical tests like ADF (stationarity check)
install.packages("ggplot2") # For creating data visualizations
install.packages("zoo") # Helps in time series data conversion and handling
The output will confirm that the packages are being installed into the Colab environment:
Installing package into ‘/usr/local/lib/R/site-library’
Installing package into ‘/usr/local/lib/R/site-library’
Installing package into ‘/usr/local/lib/R/site-library’
also installing the dependencies ‘xts’, ‘TTR’, ‘quadprog’, ‘quantmod’, ‘colorspace’, ‘fracdiff’, ‘lmtest’, ‘timeDate’, ‘tseries’, ‘urca’, ‘zoo’, ‘RcppArmadillo’
Installing package into ‘/usr/local/lib/R/site-library’
Installing package into ‘/usr/local/lib/R/site-library’
Installing package into ‘/usr/local/lib/R/site-library’
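If you restart your Colab session frequently, re-installing everything each time can be slow. As an optional convenience (a small sketch using only base R functions), you can install a package only when it isn't already available:
# Optional: install only the packages that are missing
pkgs <- c("tidyverse", "lubridate", "forecast", "tseries", "ggplot2", "zoo")
for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE)) { # TRUE only if the package is already installed
    install.packages(p)
  }
}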
Since you’ve already uploaded the dataset to your Google Colab session, this step involves reading that file into R. We’ll use read.csv() to load the data and head() to preview the first few rows. Here’s the code:
# Read the uploaded dataset from its path in Colab
data <- read.csv("DailyDelhiClimateTest.csv")
# View the first few rows of the dataset
head(data)
The above step will read the dataset and give us a glimpse of the data:
  | date | meantemp | humidity | wind_speed | meanpressure
  | <chr> | <dbl> | <dbl> | <dbl> | <dbl>
1 | 2017-01-01 | 15.91304 | 85.86957 | 2.743478 | 59.000
2 | 2017-01-02 | 18.50000 | 77.22222 | 2.894444 | 1018.278
3 | 2017-01-03 | 17.11111 | 81.88889 | 4.016667 | 1018.333
4 | 2017-01-04 | 18.70000 | 70.05000 | 4.545000 | 1015.700
5 | 2017-01-05 | 18.38889 | 74.94444 | 3.300000 | 1014.333
6 | 2017-01-06 | 19.31818 | 79.31818 | 8.681818 | 1011.773
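If read.csv() instead throws a "cannot open file" error, the upload probably didn't land in the working directory. A quick, optional check (assuming the file is named exactly DailyDelhiClimateTest.csv):
# Troubleshooting: make sure the CSV is visible to R
list.files() # Lists the files in the current working directory
file.exists("DailyDelhiClimateTest.csv") # Should return TRUE if the upload succeeded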
Now that the data is loaded, the next step is to inspect and prepare it for analysis. This includes checking the structure, converting the date column, identifying missing values, and reviewing a summary of the dataset. Here’s the code:
# Load required libraries
library(tidyverse)
library(lubridate)
library(zoo)
# Check structure of the data
str(data) # Shows column types and sample values
# Convert 'date' column to Date format
data$date <- as.Date(data$date) # Ensures proper date handling
# Check for missing values
sum(is.na(data)) # Returns the total number of NAs in the dataset
# View summary of the dataset
summary(data) # Gives min, max, mean, and quartiles for each column
The output of the above code is:
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
    as.Date, as.Date.numeric

'data.frame':	114 obs. of 5 variables:
 $ date        : chr "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04" ...
 $ meantemp    : num 15.9 18.5 17.1 18.7 18.4 ...
 $ humidity    : num 85.9 77.2 81.9 70 74.9 ...
 $ wind_speed  : num 2.74 2.89 4.02 4.54 3.3 ...
 $ meanpressure: num 59 1018 1018 1016 1014 ...

0

      date               meantemp        humidity       wind_speed
 Min.   :2017-01-01   Min.   :11.00   Min.   :17.75   Min.   : 1.387
 1st Qu.:2017-01-29   1st Qu.:16.44   1st Qu.:39.62   1st Qu.: 5.564
 Median :2017-02-26   Median :19.88   Median :57.75   Median : 8.069
 Mean   :2017-02-26   Mean   :21.71   Mean   :56.26   Mean   : 8.144
 3rd Qu.:2017-03-26   3rd Qu.:27.71   3rd Qu.:71.90   3rd Qu.:10.069
 Max.   :2017-04-24   Max.   :34.50   Max.   :95.83   Max.   :19.314
  meanpressure
 Min.   :  59
 1st Qu.:1007
 Median :1013
 Mean   :1004
 3rd Qu.:1017
 Max.   :1023
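Here the missing-value count is 0, so no imputation is needed. If your copy of the dataset did contain gaps, one common option (a sketch using the zoo package we already installed) is to fill interior gaps in the temperature column by linear interpolation:
# Optional: fill missing temperature values by linear interpolation (only needed if NAs exist)
library(zoo)
data$meantemp <- na.approx(data$meantemp, na.rm = FALSE) # Interpolates interior NAs; leading/trailing NAs are left as-is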
Check This R Project: Customer Segmentation Project Using R: A Step-by-Step Guide
Before jumping into forecasting, it's important to understand the overall trend of the temperature data. A line chart gives a clear view of how daily mean temperatures have changed over time. The code for this step is:
library(ggplot2)
# Line plot of temperature over time
ggplot(data, aes(x = date, y = meantemp)) +
geom_line(color = "blue") + # Draws a blue line for temperature
labs(title = "Daily Temperature in Delhi",
x = "Date", y = "Mean Temperature (°C)") + # Axis labels and title
theme_minimal() # Clean and minimal visual style
The graph for this step shows how the temperature increases gradually over the months.
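If you want the underlying trend to stand out from day-to-day noise, you can layer a smoothed curve on top of the same plot. This is an optional variation, not a required step:
# Optional: same plot with a LOESS trend line overlaid
ggplot(data, aes(x = date, y = meantemp)) +
  geom_line(color = "blue") + # Daily mean temperature
  geom_smooth(method = "loess", se = FALSE) + # Smoothed curve highlights the overall trend
  labs(title = "Daily Temperature in Delhi (with Trend)",
       x = "Date", y = "Mean Temperature (°C)") +
  theme_minimal()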
To apply time series forecasting methods, you need to convert the temperature values into a proper time series object in R. This prepares the data for models like ARIMA. The code for this step is:
# Create a time series object
temp_ts <- ts(data$meantemp, frequency = 365) # Daily data, assumes yearly seasonality
# Plot time series
plot(temp_ts, main = "Time Series of Daily Mean Temperature",
ylab = "Mean Temperature (°C)", xlab = "Days") # Basic time series plot
The above graph shows the mean temperature rising steadily as the days progress, this time indexed by day number rather than calendar date.
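A note on frequency: this dataset covers only about four months, so a full yearly cycle (frequency = 365) never completes within the data. If you want to experiment with short-term seasonality instead, a hedged alternative is to treat the series as weekly; this is optional and doesn't change the steps that follow:
# Optional alternative: treat the series as having weekly seasonality
temp_ts_weekly <- ts(data$meantemp, frequency = 7) # 7 observations per seasonal cycle
plot(temp_ts_weekly, main = "Daily Mean Temperature (Weekly Frequency)",
     ylab = "Mean Temperature (°C)", xlab = "Weeks")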
Here’s an R Project For You: Car Data Analysis Project Using R
A stationary time series has a consistent mean and variance over time, which is a key assumption for ARIMA modeling. The Augmented Dickey-Fuller (ADF) test helps us know whether a time series is stationary. Here’s the code:
library(tseries)
# Augmented Dickey-Fuller test for stationarity
adf.test(temp_ts) # Returns a p-value to assess stationarity
The output for this step is:
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

	Augmented Dickey-Fuller Test

data:  temp_ts
Dickey-Fuller = -3.6378, Lag order = 4, p-value = 0.03297
alternative hypothesis: stationary
The above output shows that the p-value (0.03297) is below the 0.05 significance level, so we reject the null hypothesis of non-stationarity.
Thus, this temperature time series is stationary, which means we do NOT need to difference the data manually.
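Had the p-value come out above 0.05, the usual remedy is to difference the series once and re-run the test. A minimal sketch of that workflow (not needed here, shown only for reference; temp_diff is an illustrative name):
# Only needed if the ADF test had NOT indicated stationarity
temp_diff <- diff(temp_ts) # First difference: day-to-day change in temperature
adf.test(temp_diff) # Re-check stationarity on the differenced series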
Here’s a Must-Try R Project: Forest Fire Project Using R - A Step-by-Step Guide
ARIMA is a powerful forecasting method for time series data. The auto.arima() function finds the best ARIMA configuration by testing multiple combinations of parameters automatically. Here’s the code:
library(forecast)
# Build the ARIMA model automatically
model <- auto.arima(temp_ts) # Automatically selects the best (p,d,q) model
# Print the summary of the model
summary(model) # Displays model coefficients and diagnostics
The output for the above code is:
Series: temp_ts
ARIMA(0,1,0)

sigma^2 = 2.856:  log likelihood = -219.63
AIC=441.26   AICc=441.3   BIC=443.99
Training set error measures:
 | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1
Training set | 0.1412532 | 1.682507 | 1.306216 | 0.2263582 | 6.598771 | NaN | -0.1295276
The above output means that auto.arima() selected an ARIMA(0,1,0) model: no autoregressive (p) or moving-average (q) terms, with one order of differencing (d = 1), which is essentially a random-walk model.
Here's what the ARIMA model's training-set performance means:
Metric | Value | What It Means
ME (Mean Error) | 0.14 | On average, the model only slightly overestimates.
RMSE | 1.68 | Standard deviation of the errors; smaller is better.
MAE | 1.31 | On average, the forecast is off by about 1.3°C.
MAPE | 6.6% | Only about 6.6% average error, which is very good for forecasting.
ACF1 | -0.13 | No strong autocorrelation in the residuals, which is good.
The ARIMA model is doing a solid job! A MAPE below 10% indicates high forecasting accuracy.
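Before relying on the forecasts, it's also worth checking that the residuals behave like white noise; if they do, the model has captured most of the structure in the data. A quick, optional check using the forecast package's built-in diagnostics:
# Optional: residual diagnostics (time plot, ACF, and Ljung-Box test in one call)
checkresiduals(model) # Residuals should look like uncorrelated noise centered on zero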
After training the ARIMA model, we can now predict future values. Here, we generate a 30-day temperature forecast and visualize it using a simple plot. Here’s the code to generate the graph:
# Forecast the next 30 days
forecast_temp <- forecast(model, h = 30) # h = number of days to forecast
# Plot the forecast
plot(forecast_temp,
main = "Temperature Forecast for Next 30 Days",
xlab = "Time", ylab = "Mean Temperature (°C)") # Visualize forecast and confidence intervals
The output of the above code gives the graph:
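If you need the numbers behind the plot, for example to export them, you can pull the point forecasts and their confidence intervals into a data frame. A small sketch (forecast_df is just an illustrative name):
# Optional: inspect the forecast values and their 80%/95% confidence intervals
forecast_df <- as.data.frame(forecast_temp) # Columns: Point Forecast, Lo 80, Hi 80, Lo 95, Hi 95
head(forecast_df) # First few forecasted days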
After forecasting, it's important to evaluate how well the ARIMA model fits the training data. The accuracy() function provides several statistical metrics to assess performance. Here’s the code:
# Evaluate the model's accuracy
accuracy(model) # Returns metrics like RMSE, MAE, MAPE, etc.
The output for the above code gives us a table showing the model’s accuracy:
 | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1
Training set | 0.1412532 | 1.682507 | 1.306216 | 0.2263582 | 6.598771 | NaN | -0.1295276
Here's what each metric in the accuracy output means:
Metric | Meaning | Explanation (In Easy Terms)
ME (Mean Error) | Average of all forecast errors | A value close to 0 (here 0.14) means the model isn't consistently over- or under-predicting.
RMSE (Root Mean Square Error) | Standard deviation of prediction errors | Measures how far off the predictions are, on average. Lower RMSE = better model. (Here: 1.68)
MAE (Mean Absolute Error) | Average of absolute errors (ignores direction) | Tells you how much the forecast is off, on average. (Here: about 1.3°C)
MPE (Mean Percentage Error) | Average of percentage errors | Shows average error in percentage terms; it can be misleading when actual values are small.
MAPE (Mean Absolute Percentage Error) | Mean of absolute percentage errors | A very popular metric. Here, 6.6% means the forecast is, on average, about 6.6% off the actual values.
MASE (Mean Absolute Scaled Error) | MAE scaled against a naive benchmark forecast | Shows NaN here, most likely because the series (114 days) is shorter than the seasonal period implied by frequency = 365, so the scaling benchmark can't be computed.
ACF1 | Autocorrelation of errors at lag 1 | -0.13 means there's little autocorrelation, a good sign (the errors are not following a pattern).
In this Daily Temperature Forecast Using R project, we used an ARIMA time series model in Google Colab to predict future temperatures based on daily historical data from Delhi.
After cleaning and exploring the data, we converted it into a time series format, checked for stationarity using the Augmented Dickey-Fuller test, and automatically fitted an ARIMA model.
We forecasted the next 30 days and evaluated the model using metrics like RMSE and MAPE. The model achieved an RMSE of 1.68 and a MAPE of 6.6%, showing reasonable accuracy for daily temperature predictions.