Food Delivery Analysis Project Using R

By Rohit Sharma

Updated on Aug 04, 2025 | 15 min read | 1.31K+ views

This Food Delivery Analysis project in R examines customer ordering patterns, delivery performance, and payment behavior using real-world food order data from New Delhi.

We clean and analyze the dataset in Google Colab to find key metrics such as peak order hours, top restaurants, delivery duration trends, and preferred payment methods.

We also examine the relationship between order value and delivery time to understand logistical patterns. The project uses the tidyverse (which includes ggplot2 and lubridate) together with janitor, and gives you a solid foundation in data analysis and visualization.

How Much Time and Skill Does This Project Need?

This section will tell you how long the project will take and what skills you'll need to complete the Food Delivery Analysis project in R successfully.

| Aspect | Details |
| --- | --- |
| Duration | 2–4 hours (including exploration) |
| Difficulty Level | Beginner |
| Skills Needed | Basic R knowledge, data handling, plotting |
| Tools Used | Google Colab (R), tidyverse, ggplot2 |
| Output | Visualizations and insights from food delivery data |

What to Know Before Starting the Food Delivery Analysis Project

Before you begin this Food Delivery Analysis project in R, it's helpful to be familiar with the following:

  • Basic understanding of how data frames work in R
  • Concepts of data cleaning, such as renaming columns and handling missing values
  • Working with date and time formats using lubridate
  • Creating simple plots using ggplot2
  • Logical thinking to frame data-driven questions and interpret insights

Tools and R Libraries You'll Work With in This Project

To complete the Food Delivery Analysis project in R, we’ll use various tools and R libraries for processes like data cleaning, exploration, and visualization. Here's a quick overview of what you'll be working with:

| Tool/Library | Purpose |
| --- | --- |
| Google Colab (R runtime) | Cloud-based coding environment for R |
| tidyverse | Collection of packages for data manipulation and visualization (includes dplyr, ggplot2, etc.) |
| lubridate | Handles date and time conversions and calculations |
| janitor | Simplifies data cleaning and column name formatting |
| ggplot2 | Used for creating powerful and flexible visualizations |
| skimr (Optional) | Provides quick summaries of the dataset |

How This Food Delivery Analysis Project Works: Step-by-Step Explanation

This section will walk through each part of the project with clear explanations, well-commented code, and meaningful insights to help you understand the process from start to finish.

Step 1: Configure Google Colab for R

To begin working with R in Google Colab, you'll first need to switch the default programming environment from Python to R. This enables you to write and execute R code directly within the notebook.

Here’s how to set it up:

  1. Open a new notebook on Google Colab
  2. Go to the top menu and click Runtime
  3. Choose Change runtime type
  4. In the "Language" dropdown, select R
  5. Click Save to apply the changes

Step 2: Install and Load the Required R Libraries

This step ensures that all the necessary R packages are installed and loaded into your Colab session. These libraries will help with data cleaning, exploration, visualization, and time manipulation throughout the project. Here’s the code:

# Install libraries (you only need to do this once)
install.packages("tidyverse")   # For data handling and visualization
install.packages("lubridate")   # For handling date and time
install.packages("skimr")       # For summary statistics and data overview
install.packages("janitor")     # For cleaning and standardizing column names

# Load the libraries
library(tidyverse)   # Loads core packages like dplyr and ggplot2
library(lubridate)   # Makes it easier to work with dates and times
library(skimr)       # Summarizes data in a reader-friendly format
library(janitor)     # Cleans up column names for consistency

The above step installs the required libraries and packages. The output is:

Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependency ‘snakecase’

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘janitor’

The following objects are masked from ‘package:stats’:

    chisq.test, fisher.test
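
Since `install.packages()` downloads a package again every time the cell runs, it can help to install only what is missing. The following is a minimal sketch and not part of the original notebook; the helper name `missing_packages` is hypothetical, and the actual install call is left commented out so the sketch has no side effects.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

required <- c("tidyverse", "lubridate", "skimr", "janitor")
to_install <- missing_packages(required)

# Install only the missing packages (uncomment to run in Colab):
# if (length(to_install) > 0) install.packages(to_install)
```

On a fresh Colab runtime `to_install` will typically list whichever of these packages the runtime does not ship with; on later runs in the same session it should be empty.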

Step 3: Load the Dataset into Your R Environment

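Now that your environment is set up, upload the dataset to your Colab session (for example, through the Files panel in the left sidebar) and read it into R. The code below reads the uploaded CSV file into a data frame and previews the first few rows to confirm it loaded correctly. The file name in read.csv() must match the name of your uploaded file exactly, including the "(1)" suffix used here. The code for this step is: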

# Load the uploaded file from Colab storage
food_data <- read.csv("food_orders_new_delhi (1).csv")  # Read CSV into a data frame

# See first few rows
head(food_data)  # Displays the top rows of the dataset

The output gives us a preview of our dataset:

 

| | Order.ID | Customer.ID | Restaurant.ID | Order.Date.and.Time | Delivery.Date.and.Time | Order.Value | Delivery.Fee | Payment.Method | Discounts.and.Offers | Commission.Fee | Payment.Processing.Fee | Refunds.Chargebacks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | <int> | <chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <int> | <int> | <int> |
| 1 | 1 | C8270 | R2924 | 2024-02-01 01:11:52 | 2024-02-01 02:39:52 | 1914 | 0 | Credit Card | 5% on App | 150 | 47 | 0 |
| 2 | 2 | C1860 | R2054 | 2024-02-02 22:11:04 | 2024-02-02 22:46:04 | 986 | 40 | Digital Wallet | 10% | 198 | 23 | 0 |
| 3 | 3 | C6390 | R2870 | 2024-01-31 05:54:35 | 2024-01-31 06:52:35 | 937 | 30 | Cash on Delivery | 15% New User | 195 | 45 | 0 |
| 4 | 4 | C6191 | R2642 | 2024-01-16 22:52:49 | 2024-01-16 23:38:49 | 1463 | 50 | Cash on Delivery | None | 146 | 27 | 0 |
| 5 | 5 | C6734 | R2799 | 2024-01-29 01:19:30 | 2024-01-29 02:48:30 | 1992 | 30 | Cash on Delivery | 50 off Promo | 130 | 50 | 0 |
| 6 | 6 | C7265 | R2777 | 2024-01-25 04:36:52 | 2024-01-25 05:27:52 | 439 | 20 | Cash on Delivery | 10% | 92 | 27 | 150 |
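
The dotted column names in this preview (for example `Order.ID`) come from `read.csv()`, which by default rewrites headers into syntactically valid R names. The sketch below illustrates the behavior with a tiny temporary file. The two-column header is an assumed stand-in, since the article does not show the raw header of the real file.

```r
# Write a small CSV whose headers contain spaces (assumed example, not the real file)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Order ID,Order Value", "1,1914", "2,986"), csv_path)

# Default behavior: check.names = TRUE converts spaces to dots
default_names <- names(read.csv(csv_path))
print(default_names)  # "Order.ID" "Order.Value"

# check.names = FALSE keeps the headers exactly as written in the file
raw_names <- names(read.csv(csv_path, check.names = FALSE))
print(raw_names)      # "Order ID" "Order Value"

unlink(csv_path)  # Remove the temporary file
```

Either form works for this project, because the next step standardizes the names anyway.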

 

Step 4: Clean Column Names and Inspect the Dataset Structure

Before starting the analysis, it's important to standardize the column names and understand the structure of your dataset. This step uses the janitor package to clean the column names, then inspects the structure and contents of the data. The code is:

# Clean column names to make them easier to work with
food_data <- janitor::clean_names(food_data)  # Converts column names to lowercase with underscores

# View structure of dataset
str(food_data)  # Shows data types and structure of each column

# Preview first few rows
head(food_data)  # Confirms changes and provides a sample of the dataset

We get the output as:

'data.frame': 1000 obs. of  12 variables:
 $ order_id              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ customer_id           : chr  "C8270" "C1860" "C6390" "C6191" ...
 $ restaurant_id         : chr  "R2924" "R2054" "R2870" "R2642" ...
 $ order_date_and_time   : chr  "2024-02-01 01:11:52" "2024-02-02 22:11:04" "2024-01-31 05:54:35" "2024-01-16 22:52:49" ...
 $ delivery_date_and_time: chr  "2024-02-01 02:39:52" "2024-02-02 22:46:04" "2024-01-31 06:52:35" "2024-01-16 23:38:49" ...
 $ order_value           : int  1914 986 937 1463 1992 439 303 260 1663 491 ...
 $ delivery_fee          : int  0 40 30 50 30 20 30 0 40 40 ...
 $ payment_method        : chr  "Credit Card" "Digital Wallet" "Cash on Delivery" "Cash on Delivery" ...
 $ discounts_and_offers  : chr  "5% on App" "10%" "15% New User" "None" ...
 $ commission_fee        : int  150 198 195 146 130 92 144 55 116 189 ...
 $ payment_processing_fee: int  47 23 45 27 50 27 12 19 48 10 ...
 $ refunds_chargebacks   : int  0 0 0 0 0 150 50 0 0 0 ...

| | order_id | customer_id | restaurant_id | order_date_and_time | delivery_date_and_time | order_value | delivery_fee | payment_method | discounts_and_offers | commission_fee | payment_processing_fee | refunds_chargebacks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | <int> | <chr> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | <int> | <int> | <int> |
| 1 | 1 | C8270 | R2924 | 2024-02-01 01:11:52 | 2024-02-01 02:39:52 | 1914 | 0 | Credit Card | 5% on App | 150 | 47 | 0 |
| 2 | 2 | C1860 | R2054 | 2024-02-02 22:11:04 | 2024-02-02 22:46:04 | 986 | 40 | Digital Wallet | 10% | 198 | 23 | 0 |
| 3 | 3 | C6390 | R2870 | 2024-01-31 05:54:35 | 2024-01-31 06:52:35 | 937 | 30 | Cash on Delivery | 15% New User | 195 | 45 | 0 |
| 4 | 4 | C6191 | R2642 | 2024-01-16 22:52:49 | 2024-01-16 23:38:49 | 1463 | 50 | Cash on Delivery | None | 146 | 27 | 0 |
| 5 | 5 | C6734 | R2799 | 2024-01-29 01:19:30 | 2024-01-29 02:48:30 | 1992 | 30 | Cash on Delivery | 50 off Promo | 130 | 50 | 0 |
| 6 | 6 | C7265 | R2777 | 2024-01-25 04:36:52 | 2024-01-25 05:27:52 | 439 | 20 | Cash on Delivery | 10% | 92 | 27 | 150 |
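
To make the renaming concrete, here is a rough base R approximation of what happens to these particular column names. It is only a sketch for dot-separated names like the ones in this dataset; `janitor::clean_names()` itself handles many more cases, such as special characters and duplicate names.

```r
# Three of the original column names from the preview
original_names <- c("Order.ID", "Order.Date.and.Time", "Refunds.Chargebacks")

# Approximation for these names only: dots become underscores, then lowercase
approx_clean <- tolower(gsub(".", "_", original_names, fixed = TRUE))
print(approx_clean)  # "order_id" "order_date_and_time" "refunds_chargebacks"
```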

 

Step 5: Explore Data Types, Summary Stats, and Missing Values

This step gives a quick overview of the dataset. We’ll look at column types, basic statistics, and check for any missing data. This step is important as it helps us analyze the quality of data. The code used is:

# View column types and sample values
glimpse(food_data)

# Get summary statistics for every column
summary(food_data)

# View first few rows
head(food_data)

# Count missing values in each column
colSums(is.na(food_data))

The output of the above step is:

Rows: 1,000
Columns: 12
$ order_id               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
$ customer_id            <chr> "C8270", "C1860", "C6390", "C6191", "C6734", "C…
$ restaurant_id          <chr> "R2924", "R2054", "R2870", "R2642", "R2799", "R…
$ order_date_and_time    <chr> "2024-02-01 01:11:52", "2024-02-02 22:11:04", "…
$ delivery_date_and_time <chr> "2024-02-01 02:39:52", "2024-02-02 22:46:04", "…
$ order_value            <int> 1914, 986, 937, 1463, 1992, 439, 303, 260, 1663…
$ delivery_fee           <int> 0, 40, 30, 50, 30, 20, 30, 0, 40, 40, 0, 20, 0, …
$ payment_method         <chr> "Credit Card", "Digital Wallet", "Cash on Deliv…
$ discounts_and_offers   <chr> "5% on App", "10%", "15% New User", "None", "50…
$ commission_fee         <int> 150, 198, 195, 146, 130, 92, 144, 55, 116, 189, …
$ payment_processing_fee <int> 47, 23, 45, 27, 50, 27, 12, 19, 48, 10, 36, 36, …
$ refunds_chargebacks    <int> 0, 0, 0, 0, 0, 150, 50, 0, 0, 0, …

 

    order_id      customer_id        restaurant_id      order_date_and_time
 Min.   :   1.0   Length:1000        Length:1000        Length:1000
 1st Qu.: 250.8   Class :character   Class :character   Class :character
 Median : 500.5   Mode  :character   Mode  :character   Mode  :character
 Mean   : 500.5
 3rd Qu.: 750.2
 Max.   :1000.0
 delivery_date_and_time  order_value      delivery_fee   payment_method
 Length:1000            Min.   : 104.0   Min.   : 0.00   Length:1000
 Class :character       1st Qu.: 597.8   1st Qu.:20.00   Class :character
 Mode  :character       Median :1038.5   Median :30.00   Mode  :character
                        Mean   :1054.0   Mean   :28.62
                        3rd Qu.:1494.0   3rd Qu.:40.00
                        Max.   :1995.0   Max.   :50.00
 discounts_and_offers commission_fee payment_processing_fee refunds_chargebacks
 Length:1000          Min.   : 50    Min.   :10.00          Min.   :  0.0
 Class :character     1st Qu.: 90    1st Qu.:20.00          1st Qu.:  0.0
 Mode  :character     Median :127    Median :30.00          Median :  0.0
                      Mean   :127    Mean   :29.83          Mean   : 28.3
                      3rd Qu.:164    3rd Qu.:40.00          3rd Qu.: 50.0
                      Max.   :200    Max.   :50.00          Max.   :150.0

 

(head(food_data) prints the same six rows shown in the Step 4 output.)

 

order_id                 0
customer_id              0
restaurant_id            0
order_date_and_time      0
delivery_date_and_time   0
order_value              0
delivery_fee             0
payment_method           0
discounts_and_offers     0
commission_fee           0
payment_processing_fee   0
refunds_chargebacks      0
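
This dataset has no missing values, so the check returns zeros everywhere. To show what the same check looks like when data is missing, here is a small sketch on a made-up data frame (`toy_orders` and its values are illustrative assumptions, not rows from the real file).

```r
# Made-up data with some missing values (illustration only)
toy_orders <- data.frame(
  order_value  = c(1914, NA, 937, 1463),
  delivery_fee = c(0, 40, NA, NA)
)

# Count and percentage of missing values per column
missing_counts <- colSums(is.na(toy_orders))
missing_pct <- round(100 * colMeans(is.na(toy_orders)), 1)
print(missing_counts)  # order_value: 1, delivery_fee: 2
print(missing_pct)     # order_value: 25, delivery_fee: 50

# Keep only rows with no missing values in any column
complete_rows <- toy_orders[complete.cases(toy_orders), ]
print(nrow(complete_rows))  # 1
```

Whether to drop or fill such rows depends on the analysis; dropping is simply the easiest option to show here.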

Step 6: Convert Order and Delivery Times to Datetime Format

To analyze delivery durations or time-based patterns, we need to convert the order_date_and_time and delivery_date_and_time columns from character strings to proper datetime objects using the lubridate package.

# Convert character strings to proper datetime format
library(lubridate)

food_data$order_date_and_time <- ymd_hms(food_data$order_date_and_time)
food_data$delivery_date_and_time <- ymd_hms(food_data$delivery_date_and_time)

# Check if it worked
str(food_data[c("order_date_and_time", "delivery_date_and_time")])

The output of the above step is:

'data.frame': 1000 obs. of  2 variables:

 $ order_date_and_time   : POSIXct, format: "2024-02-01 01:11:52" "2024-02-02 22:11:04" ...

 $ delivery_date_and_time: POSIXct, format: "2024-02-01 02:39:52" "2024-02-02 22:46:04" ...
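
A conversion like this can fail quietly for badly formatted strings, which become NA. The sketch below shows one way to count such failures. It uses base R's `as.POSIXct()` as a stand-in for `ymd_hms()` so that it runs without extra packages, and the third input string is a deliberately invalid, made-up value.

```r
# Two valid timestamps from the dataset preview plus one invalid, made-up value
raw_times <- c("2024-02-01 01:11:52", "2024-02-02 22:11:04", "not a timestamp")

# Parse with an explicit format; strings that do not match become NA
parsed <- as.POSIXct(raw_times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Count values that were present in the input but failed to parse
n_failed <- sum(is.na(parsed) & !is.na(raw_times))
print(n_failed)  # 1
```

Running the same count on the converted `food_data` columns should give 0 if every timestamp was parsed.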

Step 7: Calculate Delivery Time in Minutes

Now that the order and delivery timestamps are in datetime format, we can compute the actual delivery duration for each order in minutes using the difftime() function. The code for this step is:

# The timestamp columns were already converted to POSIXct in Step 6,
# so they can be subtracted directly.

# Calculate delivery time in minutes
food_data$delivery_time_mins <- as.numeric(difftime(
  food_data$delivery_date_and_time,
  food_data$order_date_and_time,
  units = "mins"
))

# Summary of delivery durations
summary(food_data$delivery_time_mins)

The output of this step is:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  30.00   50.00   74.00   73.58   96.00  119.00
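
As a worked example of the calculation, the sketch below applies `difftime()` to the first order in the preview (ordered at 01:11:52, delivered at 02:39:52). It also shows why `units = "mins"` is set explicitly: without it, R picks a unit automatically, which for this pair would be hours.

```r
# First order from the dataset preview
ordered   <- as.POSIXct("2024-02-01 01:11:52", tz = "UTC")
delivered <- as.POSIXct("2024-02-01 02:39:52", tz = "UTC")

# Explicit units give a predictable result in minutes
delivery_mins <- as.numeric(difftime(delivered, ordered, units = "mins"))
print(delivery_mins)  # 88

# Plain subtraction lets R choose the unit, here "hours"
auto_unit <- units(delivered - ordered)
print(auto_unit)  # "hours"
```

The result of 88 minutes falls inside the 30 to 119 minute range reported in the summary above.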

Step 8: Analyze Order Volume by Hour of the Day

To understand customer behavior, we’ll extract the hour from each order's timestamp and visualize when orders are most frequent. This helps identify peak ordering times throughout the day. The code is:

# Extract the hour from order timestamps (0 to 23)
food_data$order_hour <- hour(food_data$order_date_and_time)

# Load ggplot2 for visualization (already part of tidyverse, but can be loaded again for clarity)
library(ggplot2)

# Create a bar plot of order frequency by hour
ggplot(food_data, aes(x = order_hour)) +
  geom_bar(fill = "orange") +  # Orange bars for visibility
  labs(
    title = "Order Volume by Hour of Day",
    x = "Hour of Day (0–23)",
    y = "Number of Orders"
  )

The above code gives us a graph that shows the order volume by hour of day.
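
If you want the hourly counts as numbers and not only as a chart, a table works well. The sketch below uses only the six order timestamps from the dataset preview, so the counts are illustrative and say nothing about the real peak hours. It uses base R's `format()` in place of lubridate's `hour()` to stay self-contained.

```r
# Order timestamps of the six preview rows
order_times <- as.POSIXct(
  c("2024-02-01 01:11:52", "2024-02-02 22:11:04", "2024-01-31 05:54:35",
    "2024-01-16 22:52:49", "2024-01-29 01:19:30", "2024-01-25 04:36:52"),
  tz = "UTC"
)

# Extract the hour (0 to 23) as an integer
order_hours <- as.integer(format(order_times, "%H"))

# Count orders per hour, keeping hours with zero orders
hourly_counts <- table(factor(order_hours, levels = 0:23))

# Hour or hours with the highest count (ties are all returned)
busiest <- names(hourly_counts)[hourly_counts == max(hourly_counts)]
print(busiest)  # "1" "22"
```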

Step 9: Identify Top 10 Most Ordered-From Restaurants

To see which restaurants received the most orders, we’ll count the number of orders per restaurant and visualize the top 10. This helps reveal which eateries are most popular on the platform. The code for this step is:

# Count the number of orders placed for each restaurant and keep the top 10
top_restaurants <- food_data %>%
  count(restaurant_id, sort = TRUE) %>%
  slice_max(n, n = 10, with_ties = FALSE)  # Exactly 10 rows, even if counts tie at the cutoff

# Plot top 10 restaurants using a horizontal bar chart
ggplot(top_restaurants, aes(x = reorder(restaurant_id, n), y = n)) +
  geom_bar(stat = "identity", fill = "purple") +  # Purple bars
  coord_flip() +  # Flip coordinates for better readability
  labs(
    title = "Top 10 Restaurants by Number of Orders",
    x = "Restaurant ID",
    y = "Number of Orders"
  )

The output for this step gives us the top 10 restaurants by number of orders.
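
One detail to watch in any "top N" ranking is ties: when several restaurants share the same order count at the cutoff, the result can contain more than N rows or an arbitrary pick among the tied ones, depending on the function used. The base R sketch below shows both behaviors on made-up restaurant IDs.

```r
# Made-up order log: R1 has 3 orders, R2 and R3 have 2 each, R4 has 1
restaurant_ids <- c("R1", "R1", "R1", "R2", "R2", "R3", "R3", "R4")
order_counts <- sort(table(restaurant_ids), decreasing = TRUE)

top_k <- 2

# Option 1: exactly top_k entries, cutting through the tie
top_exact <- head(order_counts, top_k)

# Option 2: keep every entry whose count reaches the cutoff value
cutoff <- as.integer(order_counts)[top_k]
top_with_ties <- order_counts[as.integer(order_counts) >= cutoff]

print(length(top_exact))      # 2
print(length(top_with_ties))  # 3
```

With 1,000 orders spread over many restaurants, ties at the cutoff are likely, so it is worth checking how many rows the top-10 table actually has before plotting it.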

Step 10: Visualize the Distribution of Order Values

Understanding how order values are distributed gives insight into customer spending behavior. A histogram is useful to see the spread, concentration, and any unusual patterns in the order amounts. The code is given below:

# Plot a histogram to visualize how order values are distributed
ggplot(food_data, aes(x = order_value)) +
  geom_histogram(fill = "steelblue", bins = 30, color = "black") +  # Binned histogram
  labs(
    title = "Distribution of Order Values",
    x = "Order Value (₹)",       # X-axis label
    y = "Number of Orders"       # Y-axis label
  )

The output for the above step gives us a graph of the distribution of order values.
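
A histogram is essentially a count of values per interval. The sketch below does the same count by hand with `cut()`, using the first ten order values shown in the Step 4 output and ₹500-wide bands. The band width is an arbitrary choice for illustration, not something taken from the original notebook.

```r
# First ten order values from the str() output in Step 4
order_values <- c(1914, 986, 937, 1463, 1992, 439, 303, 260, 1663, 491)

# Assign each value to a 500-wide band; dig.lab = 4 avoids scientific notation in labels
bands <- cut(order_values, breaks = seq(0, 2000, by = 500), dig.lab = 4)

# Count orders per band
band_counts <- table(bands)
print(band_counts)  # (0,500]: 4, (500,1000]: 2, (1000,1500]: 1, (1500,2000]: 3
```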

Step 11: Analyze Preferred Payment Methods

Knowing which payment methods are most popular can help delivery services optimize for user preferences. This step visualizes how customers typically choose to pay for their orders. Here’s the code for this step:

# Count how many times each payment method was used
payment_counts <- food_data %>%
  count(payment_method, sort = TRUE)

# Plot a bar chart of payment method usage
ggplot(payment_counts, aes(x = reorder(payment_method, n), y = n)) +
  geom_bar(stat = "identity", fill = "darkcyan") +  # Create bars based on count
  coord_flip() +  # Flip axes for better readability
  labs(
    title = "Preferred Payment Methods",  # Chart title
    x = "Payment Method",                 # X-axis label
    y = "Number of Orders"                # Y-axis label
  )

The output of the above step gives us a graph that shows how many times each payment method was used.
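
Counts are easier to compare when expressed as shares. The sketch below converts counts to percentages with `prop.table()`, using only the payment methods of the six preview rows, so the percentages are illustrative and not the real split.

```r
# Payment methods of the six preview rows
payment_method <- c("Credit Card", "Digital Wallet", "Cash on Delivery",
                    "Cash on Delivery", "Cash on Delivery", "Cash on Delivery")

# Share of orders per payment method, in percent
payment_share <- round(100 * prop.table(table(payment_method)), 1)
print(payment_share)  # Cash on Delivery 66.7, Credit Card 16.7, Digital Wallet 16.7
```

On the full dataset, the same idea would apply to `food_data$payment_method`.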

Step 12: Explore Usage of Discounts and Offers

Discounts and promotional offers can influence customer behavior. In this step, we explore which discount types were most frequently applied to orders. Here’s the code:

# Count how many times each discount or offer was used
offer_counts <- food_data %>%
  count(discounts_and_offers, sort = TRUE)

# Plot a bar chart of discount usage
ggplot(offer_counts, aes(x = reorder(discounts_and_offers, n), y = n)) +
  geom_bar(stat = "identity", fill = "darkorange") +  # Bar chart
  coord_flip() +  # Flip axes to make labels easier to read
  labs(
    title = "Most Used Discounts and Offers",  # Chart title
    x = "Discount Type",                       # X-axis label
    y = "Number of Orders"                     # Y-axis label
  )

The above code gives us a graph that shows how many times each discount or offer was used.
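
Note that orders without a discount are stored as the text "None", not as a missing value, which is why the missing-value check in Step 5 reported zero for this column. The sketch below computes the share of orders that used any discount, based only on the six preview rows (illustrative, not the real rate).

```r
# Discount labels of the six preview rows
discounts_and_offers <- c("5% on App", "10%", "15% New User", "None", "50 off Promo", "10%")

# "None" is an ordinary string, so is.na() does not flag it
print(sum(is.na(discounts_and_offers)))  # 0

# Flag orders that used any discount and compute their share
has_discount <- discounts_and_offers != "None"
discount_rate <- mean(has_discount)
print(round(100 * discount_rate, 1))  # 83.3
```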

Step 13: Analyze the Relationship Between Order Value and Delivery Time

This step explores whether there’s a correlation between how much a customer spends and how long the delivery takes. A scatter plot with a linear trend line helps visualize this relationship. The code for this step is:

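# Scatter plot of order value against delivery time, with a linear trend line
ggplot(food_data, aes(x = order_value, y = delivery_time_mins)) +
  geom_point(alpha = 0.4, color = "tomato") +  # Semi-transparent points reduce overplotting
  geom_smooth(method = "lm", se = FALSE, color = "darkblue") +  # Linear fit without confidence band
  labs(
    title = "Order Value vs Delivery Time",
    x = "Order Value (₹)",
    y = "Delivery Time (minutes)"
  )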

The above code gives us a graph to show the relation between order value and delivery time.

The graph suggests that:

  • Most orders are scattered, with no strong pattern
    The red dots are spread widely, meaning delivery times vary a lot even for orders with similar values.
  • Slight trend: higher order value, slightly faster delivery
    The blue trend line slopes slightly downward, which suggests that higher-value orders may be delivered a little faster on average. The slope is small relative to the scatter, so this is a weak tendency, not a rule.
  • Other factors likely affect delivery time
    Because the points are so scattered, order value explains little of the variation in delivery time. Factors such as distance, traffic, or location, none of which are recorded in this dataset, may matter more.
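
The plot shows the relationship visually, but a number makes it easier to judge how weak or strong it is. The sketch below computes the Pearson correlation and the slope of a linear fit. It uses only the six preview rows (delivery times derived from their timestamps), so its result is not representative of the full dataset and may even differ in sign from the trend in the plot.

```r
# Order values and delivery times (minutes) for the six preview rows
order_value        <- c(1914, 986, 937, 1463, 1992, 439)
delivery_time_mins <- c(88, 35, 58, 46, 89, 51)

# Pearson correlation: values near 0 mean a weak linear relationship
r <- cor(order_value, delivery_time_mins)

# Slope of the fitted line: change in minutes per additional rupee of order value
fit <- lm(delivery_time_mins ~ order_value)
slope <- unname(coef(fit)["order_value"])

print(r)
print(slope)

# On the full data, the equivalent call would be:
# cor(food_data$order_value, food_data$delivery_time_mins)
```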

Conclusion

In this Food Delivery Analysis project, we used R in Google Colab to explore and understand patterns in a real food order dataset. 

We cleaned and prepared the data, extracted key time-based features, and conducted exploratory analysis through visualizations on delivery durations, order values, payment methods, and discount usage.

We also examined peak order times, top-performing restaurants, and the relationship between order value and delivery time. These insights help us understand customer behavior, delivery efficiency, and service trends, all of which are valuable for improving logistics and enhancing the customer experience in food delivery.

Colab Link:
https://colab.research.google.com/drive/12CCIF-GeDEFP0ORE2fjfrPjgu-NlpKDA
