Food Delivery Analysis Project Using R
By Rohit Sharma
Updated on Aug 04, 2025 | 15 min read | 1.31K+ views
This Food Delivery Analysis project in R will look into customer ordering patterns, delivery performance, and payment behavior using real-world food order data from New Delhi.
By cleaning and analyzing the dataset in Google Colab, we will try to understand key metrics such as peak order hours, top restaurants, delivery duration trends, and preferred payment methods.
We will also examine the correlation between order value and delivery time to understand logistical patterns. This project will use R libraries like tidyverse, ggplot2, and lubridate to build a solid foundation in data analysis and visualization.
This section will tell you how long the project will take and what skills you'll need to complete the Food Delivery Analysis project in R successfully.
Aspect | Details |
Duration | 2–4 hours (including exploration) |
Difficulty Level | Beginner |
Skills Needed | Basic R knowledge, data handling, plotting |
Tools Used | Google Colab (R), tidyverse, ggplot2 |
Output | Visualizations and insights from food delivery data |
Before you begin this Food Delivery Analysis project in R, it's helpful to be familiar with the following:
To complete the Food Delivery Analysis project in R, we’ll use various tools and R libraries for processes like data cleaning, exploration, and visualization. Here's a quick overview of what you'll be working with:
Tool/Library | Purpose |
Google Colab (R runtime) | Cloud-based coding environment for R |
tidyverse | Collection of packages for data manipulation and visualization (includes dplyr, ggplot2, etc.) |
lubridate | Handles date and time conversions and calculations |
janitor | Simplifies data cleaning and column name formatting |
ggplot2 | Used for creating powerful and flexible visualizations |
skimr | (Optional) Provides quick summaries of the dataset |
This section will walk through each part of the project with clear explanations, well-commented code, and meaningful insights to help you understand the process from start to finish.
To begin working with R in Google Colab, you'll first need to switch the default programming environment from Python to R. This enables you to write and execute R code directly within the notebook.
Here’s how to set it up:
This step ensures that all the necessary R packages are installed and loaded into your Colab session. These libraries will help with data cleaning, exploration, visualization, and time manipulation throughout the project. Here’s the code:
# Install libraries (you only need to do this once)
install.packages("tidyverse") # For data handling and visualization
install.packages("lubridate") # For handling date and time
install.packages("skimr") # For summary statistics and data overview
install.packages("janitor") # For cleaning and standardizing column names
# Load the libraries
library(tidyverse) # Loads core packages like dplyr and ggplot2
library(lubridate) # Makes it easier to work with dates and times
library(skimr) # Summarizes data in a reader-friendly format
library(janitor) # Cleans up column names for consistency
The above step installs the required libraries and packages. The output is:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependency ‘snakecase’
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
    chisq.test, fisher.test
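Reinstalling every package on each run is slow in a fresh Colab session. As a minimal sketch, the helper below (the name find_missing_packages is hypothetical, not part of any library) lists only the packages that are not yet installed. The install call is left commented out so the snippet has no side effects:

```r
# Hypothetical helper: return the packages in `required` that are not in `installed`
find_missing_packages <- function(required,
                                  installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

required_pkgs <- c("tidyverse", "lubridate", "skimr", "janitor")
missing_pkgs <- find_missing_packages(required_pkgs)

# Show what would be installed; uncomment the last line to actually install
print(missing_pkgs)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The library() calls from the step above are still needed afterwards, since installing a package does not load it.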
Now that your environment is set up, the next step is to upload and load your dataset into R. This code reads the CSV file you uploaded into a data frame and previews the first few rows to confirm it loaded correctly. The code for this step is:
# Load the uploaded file from Colab storage
food_data <- read.csv("food_orders_new_delhi (1).csv") # Read CSV into a data frame
# See first few rows
head(food_data) # Displays the top rows of the dataset
The output gives us a preview of our dataset:
Row | Order.ID | Customer.ID | Restaurant.ID | Order.Date.and.Time | Delivery.Date.and.Time | Order.Value | Delivery.Fee | Payment.Method | Discounts.and.Offers | Commission.Fee | Payment.Processing.Fee | Refunds.Chargebacks |
Type | <int> | <chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <int> | <int> | <int> |
1 | 1 | C8270 | R2924 | 2024-02-01 01:11:52 | 2024-02-01 02:39:52 | 1914 | 0 | Credit Card | 5% on App | 150 | 47 | 0 |
2 | 2 | C1860 | R2054 | 2024-02-02 22:11:04 | 2024-02-02 22:46:04 | 986 | 40 | Digital Wallet | 10% | 198 | 23 | 0 |
3 | 3 | C6390 | R2870 | 2024-01-31 05:54:35 | 2024-01-31 06:52:35 | 937 | 30 | Cash on Delivery | 15% New User | 195 | 45 | 0 |
4 | 4 | C6191 | R2642 | 2024-01-16 22:52:49 | 2024-01-16 23:38:49 | 1463 | 50 | Cash on Delivery | None | 146 | 27 | 0 |
5 | 5 | C6734 | R2799 | 2024-01-29 01:19:30 | 2024-01-29 02:48:30 | 1992 | 30 | Cash on Delivery | 50 off Promo | 130 | 50 | 0 |
6 | 6 | C7265 | R2777 | 2024-01-25 04:36:52 | 2024-01-25 05:27:52 | 439 | 20 | Cash on Delivery | 10% | 92 | 27 | 150 |
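The dotted column names in this preview (Order.ID, Order.Date.and.Time, and so on) are produced by read.csv(), which by default converts headers into syntactically valid R names. The sketch below uses a made-up inline CSV, on the assumption that the real file's headers contain spaces (the raw file is not shown in this article), to illustrate the effect of the check.names argument:

```r
# Tiny inline CSV with spaces in the headers (made-up sample, not the real file)
csv_text <- "Order ID,Order Value,Payment Method\n1,1914,Credit Card\n2,986,Digital Wallet"

# Default behaviour: headers are passed through make.names()
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers exactly as written in the file
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)  # spaces replaced by dots
print(raw_names)      # original headers kept
```

Either form works for the rest of the project, because the column names are standardized in the next step anyway.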
Before starting the analysis, it's important to standardize the column names and understand the structure of your dataset. This step uses the janitor package to clean the column names, then inspects the data types and content. The code is:
# Clean column names to make them easier to work with
food_data <- janitor::clean_names(food_data) # Converts column names to lowercase with underscores
# View structure of dataset
str(food_data) # Shows data types and structure of each column
# Preview first few rows
head(food_data) # Confirms changes and provides a sample of the dataset
The output is:
'data.frame': 1000 obs. of 12 variables:
 $ order_id              : int 1 2 3 4 5 6 7 8 9 10 ...
 $ customer_id           : chr "C8270" "C1860" "C6390" "C6191" ...
 $ restaurant_id         : chr "R2924" "R2054" "R2870" "R2642" ...
 $ order_date_and_time   : chr "2024-02-01 01:11:52" "2024-02-02 22:11:04" "2024-01-31 05:54:35" "2024-01-16 22:52:49" ...
 $ delivery_date_and_time: chr "2024-02-01 02:39:52" "2024-02-02 22:46:04" "2024-01-31 06:52:35" "2024-01-16 23:38:49" ...
 $ order_value           : int 1914 986 937 1463 1992 439 303 260 1663 491 ...
 $ delivery_fee          : int 0 40 30 50 30 20 30 0 40 40 ...
 $ payment_method        : chr "Credit Card" "Digital Wallet" "Cash on Delivery" "Cash on Delivery" ...
 $ discounts_and_offers  : chr "5% on App" "10%" "15% New User" "None" ...
 $ commission_fee        : int 150 198 195 146 130 92 144 55 116 189 ...
 $ payment_processing_fee: int 47 23 45 27 50 27 12 19 48 10 ...
 $ refunds_chargebacks   : int 0 0 0 0 0 150 50 0 0 0 ...
Row | order_id | customer_id | restaurant_id | order_date_and_time | delivery_date_and_time | order_value | delivery_fee | payment_method | discounts_and_offers | commission_fee | payment_processing_fee | refunds_chargebacks |
Type | <int> | <chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <int> | <int> | <int> |
1 | 1 | C8270 | R2924 | 2024-02-01 01:11:52 | 2024-02-01 02:39:52 | 1914 | 0 | Credit Card | 5% on App | 150 | 47 | 0 |
2 | 2 | C1860 | R2054 | 2024-02-02 22:11:04 | 2024-02-02 22:46:04 | 986 | 40 | Digital Wallet | 10% | 198 | 23 | 0 |
3 | 3 | C6390 | R2870 | 2024-01-31 05:54:35 | 2024-01-31 06:52:35 | 937 | 30 | Cash on Delivery | 15% New User | 195 | 45 | 0 |
4 | 4 | C6191 | R2642 | 2024-01-16 22:52:49 | 2024-01-16 23:38:49 | 1463 | 50 | Cash on Delivery | None | 146 | 27 | 0 |
5 | 5 | C6734 | R2799 | 2024-01-29 01:19:30 | 2024-01-29 02:48:30 | 1992 | 30 | Cash on Delivery | 50 off Promo | 130 | 50 | 0 |
6 | 6 | C7265 | R2777 | 2024-01-25 04:36:52 | 2024-01-25 05:27:52 | 439 | 20 | Cash on Delivery | 10% | 92 | 27 | 150 |
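To make the renaming less of a black box, here is a rough base-R approximation of the lowercase-with-underscores conversion seen above. The helper to_snake_case is hypothetical and is not how janitor implements clean_names(); it only handles the simple dotted names in this dataset:

```r
# Hypothetical helper: a rough base-R approximation of snake_case cleaning.
# It is NOT janitor's implementation and ignores many edge cases.
to_snake_case <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # runs of non-alphanumerics become "_"
  x <- gsub("^_+|_+$", "", x)         # trim leading and trailing underscores
  tolower(x)
}

old_names <- c("Order.ID", "Order.Date.and.Time", "Refunds.Chargebacks")
new_names <- to_snake_case(old_names)
print(new_names)
```

For real projects, janitor::clean_names() remains the better choice because it also deals with duplicates, special characters, and mixed-case names.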
This step gives a quick overview of the dataset. We’ll look at column types and basic statistics, and check for missing values, which tells us how much cleaning the data needs before analysis. The code used is:
# View column types and sample values
glimpse(food_data)
# Get summary statistics for each column with base R
summary(food_data)
# View first few rows
head(food_data)
# Count missing values
colSums(is.na(food_data))
The output of the above step is:
Rows: 1,000
Columns: 12
$ order_id               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
$ customer_id            <chr> "C8270", "C1860", "C6390", "C6191", "C6734", "C…
$ restaurant_id          <chr> "R2924", "R2054", "R2870", "R2642", "R2799", "R…
$ order_date_and_time    <chr> "2024-02-01 01:11:52", "2024-02-02 22:11:04", "…
$ delivery_date_and_time <chr> "2024-02-01 02:39:52", "2024-02-02 22:46:04", "…
$ order_value            <int> 1914, 986, 937, 1463, 1992, 439, 303, 260, 1663…
$ delivery_fee           <int> 0, 40, 30, 50, 30, 20, 30, 0, 40, 40, 0, 20, 0,…
$ payment_method         <chr> "Credit Card", "Digital Wallet", "Cash on Deliv…
$ discounts_and_offers   <chr> "5% on App", "10%", "15% New User", "None", "50…
$ commission_fee         <int> 150, 198, 195, 146, 130, 92, 144, 55, 116, 189,…
$ payment_processing_fee <int> 47, 23, 45, 27, 50, 27, 12, 19, 48, 10, 36, 36,…
$ refunds_chargebacks    <int> 0, 0, 0, 0, 0, 150, 50, 0, 0, 0, 0, 0, 0, 50, 0…
    order_id      customer_id        restaurant_id      order_date_and_time
 Min.   :   1.0   Length:1000        Length:1000        Length:1000
 1st Qu.: 250.8   Class :character   Class :character   Class :character
 Median : 500.5   Mode  :character   Mode  :character   Mode  :character
 Mean   : 500.5
 3rd Qu.: 750.2
 Max.   :1000.0
 delivery_date_and_time  order_value      delivery_fee    payment_method
 Length:1000             Min.   : 104.0   Min.   : 0.00   Length:1000
 Class :character        1st Qu.: 597.8   1st Qu.:20.00   Class :character
 Mode  :character        Median :1038.5   Median :30.00   Mode  :character
                         Mean   :1054.0   Mean   :28.62
                         3rd Qu.:1494.0   3rd Qu.:40.00
                         Max.   :1995.0   Max.   :50.00
 discounts_and_offers  commission_fee  payment_processing_fee  refunds_chargebacks
 Length:1000           Min.   : 50     Min.   :10.00           Min.   :  0.0
 Class :character      1st Qu.: 90     1st Qu.:20.00           1st Qu.:  0.0
 Mode  :character      Median :127     Median :30.00           Median :  0.0
                       Mean   :127     Mean   :29.83           Mean   : 28.3
                       3rd Qu.:164     3rd Qu.:40.00           3rd Qu.: 50.0
                       Max.   :200     Max.   :50.00           Max.   :150.0
Row | order_id | customer_id | restaurant_id | order_date_and_time | delivery_date_and_time | order_value | delivery_fee | payment_method | discounts_and_offers | commission_fee | payment_processing_fee | refunds_chargebacks |
Type | <int> | <chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <int> | <int> | <int> |
1 | 1 | C8270 | R2924 | 2024-02-01 01:11:52 | 2024-02-01 02:39:52 | 1914 | 0 | Credit Card | 5% on App | 150 | 47 | 0 |
2 | 2 | C1860 | R2054 | 2024-02-02 22:11:04 | 2024-02-02 22:46:04 | 986 | 40 | Digital Wallet | 10% | 198 | 23 | 0 |
3 | 3 | C6390 | R2870 | 2024-01-31 05:54:35 | 2024-01-31 06:52:35 | 937 | 30 | Cash on Delivery | 15% New User | 195 | 45 | 0 |
4 | 4 | C6191 | R2642 | 2024-01-16 22:52:49 | 2024-01-16 23:38:49 | 1463 | 50 | Cash on Delivery | None | 146 | 27 | 0 |
5 | 5 | C6734 | R2799 | 2024-01-29 01:19:30 | 2024-01-29 02:48:30 | 1992 | 30 | Cash on Delivery | 50 off Promo | 130 | 50 | 0 |
6 | 6 | C7265 | R2777 | 2024-01-25 04:36:52 | 2024-01-25 05:27:52 | 439 | 20 | Cash on Delivery | 10% | 92 | 27 | 150 |
order_id 0
customer_id 0
restaurant_id 0
order_date_and_time 0
delivery_date_and_time 0
order_value 0
delivery_fee 0
payment_method 0
discounts_and_offers 0
commission_fee 0
payment_processing_fee 0
refunds_chargebacks 0
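A zero from colSums(is.na(...)) only rules out true NA values. Placeholder text such as an empty string or the literal "None" (which appears in discounts_and_offers) is stored as an ordinary character value and is not counted. The sketch below uses a small made-up data frame, not the project data, to show how both kinds can be counted separately:

```r
# Made-up toy data with one real NA per affected column plus placeholder strings
toy <- data.frame(
  order_value = c(1914, NA, 937),
  payment_method = c("Credit Card", "", "Cash on Delivery"),
  discounts_and_offers = c("5% on App", "None", NA),
  stringsAsFactors = FALSE
)

# True NA values per column
na_counts <- colSums(is.na(toy))

# Empty strings or the literal text "None" are not NA, so count them separately
blank_or_none <- sapply(toy, function(col) sum(col %in% c("", "None")))

print(na_counts)
print(blank_or_none)
```

Whether "None" should be treated as missing depends on the question being asked; for discount analysis it is a meaningful category rather than a gap.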
To analyze delivery durations or time-based patterns, we need to convert the order_date_and_time and delivery_date_and_time columns from character strings to proper datetime objects using the lubridate package.
# Convert character strings to proper datetime format
library(lubridate)
food_data$order_date_and_time <- ymd_hms(food_data$order_date_and_time)
food_data$delivery_date_and_time <- ymd_hms(food_data$delivery_date_and_time)
# Check if it worked
str(food_data[c("order_date_and_time", "delivery_date_and_time")])
The output of the above step is:
'data.frame': 1000 obs. of 2 variables:
 $ order_date_and_time   : POSIXct, format: "2024-02-01 01:11:52" "2024-02-02 22:11:04" ...
 $ delivery_date_and_time: POSIXct, format: "2024-02-01 02:39:52" "2024-02-02 22:46:04" ...
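Strings that do not match the expected format are turned into NA by datetime parsers, so it is worth counting the NAs after conversion. The sketch below does this in base R on made-up values, using as.POSIXct() with an explicit format as a stand-in for ymd_hms(). Setting tz = "UTC" mirrors the default time zone of ymd_hms():

```r
# Two valid timestamps and one deliberately malformed value (made-up sample)
raw_times <- c("2024-02-01 01:11:52", "2024-02-02 22:11:04", "not a timestamp")

# Base R parsing with an explicit format; unparseable strings become NA
parsed <- as.POSIXct(raw_times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Number of values that failed to parse
n_failed <- sum(is.na(parsed))
print(n_failed)
```

The same check applies to the project data: sum(is.na(food_data$order_date_and_time)) should be 0 after the conversion.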
Now that the order and delivery timestamps are in datetime format, we can compute the actual delivery duration for each order in minutes using the difftime() function. The code for this step is:
# The timestamps were already converted to POSIXct in the previous step,
# so they do not need to be parsed again here
# Calculate delivery time
food_data$delivery_time_mins <- as.numeric(difftime(
food_data$delivery_date_and_time,
food_data$order_date_and_time,
units = "mins"
))
# Summary of delivery durations
summary(food_data$delivery_time_mins)
The output of this step is:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   50.00   74.00   73.58   96.00  119.00
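As a quick sanity check, the same calculation can be reproduced by hand for the first three orders in the preview. This sketch uses base R's as.POSIXct() so that it runs without extra packages; the timestamps are copied from the rows shown earlier:

```r
# Order and delivery timestamps of the first three preview rows
order_time <- as.POSIXct(
  c("2024-02-01 01:11:52", "2024-02-02 22:11:04", "2024-01-31 05:54:35"),
  tz = "UTC"
)
delivery_time <- as.POSIXct(
  c("2024-02-01 02:39:52", "2024-02-02 22:46:04", "2024-01-31 06:52:35"),
  tz = "UTC"
)

# Difference in minutes, converted from a difftime object to plain numbers
delivery_time_mins <- as.numeric(difftime(delivery_time, order_time, units = "mins"))
print(delivery_time_mins)  # 88 35 58
```

All three values fall inside the 30 to 119 minute range reported by summary().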
To understand customer behavior, we’ll extract the hour from each order's timestamp and visualize when orders are most frequent. This helps identify peak ordering times throughout the day. The code is:
# Extract the hour from order timestamps (0 to 23)
food_data$order_hour <- hour(food_data$order_date_and_time)
# Load ggplot2 for visualization (already part of tidyverse, but can be loaded again for clarity)
library(ggplot2)
# Create a bar plot of order frequency by hour
ggplot(food_data, aes(x = order_hour)) +
geom_bar(fill = "orange") + # Orange bars for visibility
labs(
title = "Order Volume by Hour of Day",
x = "Hour of Day (0–23)",
y = "Number of Orders"
)
The above code gives us a graph that shows the order volume by hour of day.
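The bar heights in this chart are simply counts per hour, and they can also be inspected as numbers. The sketch below tabulates the six order timestamps from the preview rows in base R. Wrapping the hours in factor(..., levels = 0:23) is a choice made here so that hours without orders still appear with a count of zero:

```r
# Order timestamps of the six preview rows
order_times <- as.POSIXct(
  c("2024-02-01 01:11:52", "2024-02-02 22:11:04", "2024-01-31 05:54:35",
    "2024-01-16 22:52:49", "2024-01-29 01:19:30", "2024-01-25 04:36:52"),
  tz = "UTC"
)

# Extract the hour (0 to 23); base-R equivalent of lubridate::hour()
order_hour <- as.POSIXlt(order_times)$hour

# Count orders per hour, keeping all 24 hours as levels
hour_counts <- table(factor(order_hour, levels = 0:23))

# Show only the hours that actually have orders
print(hour_counts[hour_counts > 0])
```

Six rows are far too few to say anything about peak hours; the point is only to show the counts that sit behind the bars.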
To see which restaurants received the most orders, we’ll count the number of orders per restaurant and visualize the top 10. This helps reveal which eateries are most popular on the platform. The code for this step is:
# Count the number of orders placed for each restaurant and select the top 10
top_restaurants <- food_data %>%
count(restaurant_id, sort = TRUE) %>%
slice_max(n, n = 10, with_ties = FALSE) # Keep exactly 10 restaurants by order count (top_n() is superseded and keeps ties)
# Plot top 10 restaurants using a horizontal bar chart
ggplot(top_restaurants, aes(x = reorder(restaurant_id, n), y = n)) +
geom_bar(stat = "identity", fill = "purple") + # Purple bars
coord_flip() + # Flip coordinates for better readability
labs(
title = "Top 10 Restaurants by Number of Orders",
x = "Restaurant ID",
y = "Number of Orders"
)
The output for this step gives us the top 10 restaurants by number of orders.
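Order counts per restaurant can be tied, and the way ties are handled decides how many rows a "top N" selection returns. The sketch below uses a made-up set of restaurant IDs to compare dplyr's top_n(), which keeps ties, with slice_max(..., with_ties = FALSE), which returns exactly the requested number of rows:

```r
library(dplyr)

# Made-up orders: A has 3 orders, B and C have 2 each, D has 1
toy_orders <- data.frame(
  restaurant_id = c("A", "A", "A", "B", "B", "C", "C", "D")
)

counts <- toy_orders %>% count(restaurant_id, sort = TRUE)

# top_n() keeps ties, so asking for 2 rows returns 3 here (B and C are tied)
with_ties <- counts %>% top_n(2, n)

# slice_max() with with_ties = FALSE returns exactly 2 rows
without_ties <- counts %>% slice_max(n, n = 2, with_ties = FALSE)

print(nrow(with_ties))
print(nrow(without_ties))
```

If the chart shows more than ten bars, tied counts are the likely reason.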
Understanding how order values are distributed gives insight into customer spending behavior. A histogram is useful to see the spread, concentration, and any unusual patterns in the order amounts. The code is given below:
# Plot a histogram to visualize how order values are distributed
ggplot(food_data, aes(x = order_value)) +
geom_histogram(fill = "steelblue", bins = 30, color = "black") + # Binned histogram
labs(
title = "Distribution of Order Values",
x = "Order Value (₹)", # X-axis label
y = "Number of Orders" # Y-axis label
)
The output for the above step gives us a graph of the distribution of order values.
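A histogram is a picture of binned counts, and the same idea can be expressed as a table. The sketch below groups the six order values from the preview rows into ₹500 bands with cut(); the band width is an arbitrary choice for illustration and differs from the 30 bins used in the plot:

```r
# Order values of the six preview rows
order_value <- c(1914, 986, 937, 1463, 1992, 439)

# Assign each value to a band of width 500; dig.lab = 4 avoids scientific
# notation in the band labels
bands <- cut(order_value, breaks = seq(0, 2000, by = 500), dig.lab = 4)

# Count how many orders fall into each band
band_counts <- table(bands)
print(band_counts)
```

On the full dataset the same call gives a compact numeric companion to the histogram.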
Knowing which payment methods are most popular can help delivery services optimize for user preferences. This step visualizes how customers typically choose to pay for their orders. Here’s the code for this step:
# Count how many times each payment method was used
payment_counts <- food_data %>%
count(payment_method, sort = TRUE)
# Plot a bar chart of payment method usage
ggplot(payment_counts, aes(x = reorder(payment_method, n), y = n)) +
geom_bar(stat = "identity", fill = "darkcyan") + # Create bars based on count
coord_flip() + # Flip axes for better readability
labs(
title = "Preferred Payment Methods", # Chart title
x = "Payment Method", # X-axis label
y = "Number of Orders" # Y-axis label
)
The output of the above step gives us a graph that shows how many times each payment method was used.
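Raw counts are easier to compare once they are expressed as shares. A minimal sketch in base R, using the payment methods of the six preview rows, converts the counts into percentages with prop.table():

```r
# Payment methods of the six preview rows
payment_method <- c("Credit Card", "Digital Wallet", "Cash on Delivery",
                    "Cash on Delivery", "Cash on Delivery", "Cash on Delivery")

# Share of each payment method in percent, rounded to one decimal place
share_pct <- round(100 * prop.table(table(payment_method)), 1)

print(sort(share_pct, decreasing = TRUE))
```

The percentages from six rows are not representative of the dataset; run the same lines on food_data$payment_method for the real shares.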
Discounts and promotional offers can influence customer behavior. In this step, we explore which discount types were most frequently applied to orders. Here’s the code:
# Count how many times each discount or offer was used
offer_counts <- food_data %>%
count(discounts_and_offers, sort = TRUE)
# Plot a bar chart of discount usage
ggplot(offer_counts, aes(x = reorder(discounts_and_offers, n), y = n)) +
geom_bar(stat = "identity", fill = "darkorange") + # Bar chart
coord_flip() + # Flip axes to make labels easier to read
labs(
title = "Most Used Discounts and Offers", # Chart title
x = "Discount Type", # X-axis label
y = "Number of Orders" # Y-axis label
)
The above code gives us a graph that shows how many times each discount or offer was used.
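Because orders without a promotion are recorded as the text "None", a related question is what fraction of orders used any discount at all. The sketch below answers it for the six preview rows; treating "None" as "no discount" is an interpretation of the data, not something the dataset documents:

```r
# Discount labels of the six preview rows
discounts_and_offers <- c("5% on App", "10%", "15% New User",
                          "None", "50 off Promo", "10%")

# TRUE where some discount or offer was applied
has_discount <- discounts_and_offers != "None"

# The mean of a logical vector is the share of TRUE values
discount_share <- mean(has_discount)

print(sum(has_discount))         # 5
print(round(discount_share, 2))  # 0.83
```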
This step explores whether there’s a correlation between how much a customer spends and how long the delivery takes. A scatter plot with a linear trend line helps visualize this relationship. The code for this step is:
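# Scatter plot of order value against delivery time, with a fitted linear trend line
ggplot(food_data, aes(x = order_value, y = delivery_time_mins)) +
geom_point(alpha = 0.4, color = "tomato") + # Semi-transparent points reduce overplotting
geom_smooth(method = "lm", se = FALSE, color = "darkblue") + # Linear fit without a confidence band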
labs(
title = "Order Value vs Delivery Time",
x = "Order Value (₹)",
y = "Delivery Time (minutes)"
)
The above code gives us a graph to show the relation between order value and delivery time.
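A trend line is easier to interpret alongside numbers. The sketch below computes the Pearson correlation with cor() and fits the same kind of linear model that geom_smooth(method = "lm") draws. It uses only the six preview rows, with delivery times worked out from their timestamps, so the values illustrate the calls rather than a finding about the full dataset:

```r
# Order values and delivery times (minutes) of the six preview rows
order_value <- c(1914, 986, 937, 1463, 1992, 439)
delivery_time_mins <- c(88, 35, 58, 46, 89, 51)

# Pearson correlation coefficient, always between -1 and 1
r <- cor(order_value, delivery_time_mins)

# Simple linear regression; the slope is the change in minutes per rupee
fit <- lm(delivery_time_mins ~ order_value)

print(r)
print(coef(fit))
```

On the project data, the equivalent calls are cor(food_data$order_value, food_data$delivery_time_mins) and lm(delivery_time_mins ~ order_value, data = food_data). A coefficient close to zero would indicate that order value has little linear relationship with delivery time.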
The above graph means that:
In this Food Delivery Analysis project, we used R in Google Colab to explore and understand patterns in a real food order dataset.
We cleaned and prepared the data, extracted key time-based features, and conducted exploratory analysis through visualizations on delivery durations, order values, payment methods, and discount usage.
We also examined peak order times, top-performing restaurants, and the relationship between order value and delivery time. Together, these insights describe customer behavior, delivery efficiency, and service trends, which is useful for improving logistics and the customer experience on food delivery platforms.
Colab Link:
https://colab.research.google.com/drive/12CCIF-GeDEFP0ORE2fjfrPjgu-NlpKDA
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...