Student Performance Analysis In R With Code and Explanation
By Rohit Sharma
Updated on Aug 05, 2025 | 20 min read | 1.2K+ views
This Student Performance Analysis in R project will focus on key factors that influence students’ final grades using a dataset of Portuguese secondary school students. We'll use Google Colab to run the project.
The project includes data cleaning, visual exploration, correlation analysis, and a linear regression model that predicts student performance from features like prior grades, study time, and absences.
This Student Performance Analysis in R project is beginner-friendly and easy to complete in one sitting. The skills and timeline of this project are given in the table below:
Aspect | Details |
Estimated Duration | 1.5 to 2 hours |
Difficulty Level | Easy to Moderate |
Skill Level Needed | Beginner in R and basic data analysis |
Tools Required | Google Colab, R, ggplot2, corrplot, dplyr |
Project Type | Exploratory Data Analysis + Regression |
To get the most out of this Student Performance Analysis in R project, it's helpful to have a basic understanding of a few core concepts. While the steps are beginner-friendly, being familiar with the following will ensure a smoother learning experience:
This project is entirely built in R using Google Colab, which allows you to run R code without installing anything on your local machine. We'll use a few essential R libraries to clean data, explore patterns, visualize relationships, and build a regression model.
Category | Name / Package | Purpose |
Platform | Google Colab (R kernel) | Run and share R code in the cloud |
Programming Language | R | Perform data manipulation, analysis, and modeling |
Data Wrangling | dplyr, tidyverse | Filter, select, transform, and manage data |
Visualization | ggplot2, corrplot | Create plots, graphs, and correlation matrices |
Data Summary | skimr (optional) | Get quick overviews of datasets |
Modeling | Base R (lm) | Build and evaluate linear regression models |
This section will break down the entire project into individual steps to help you understand the concepts of data analysis and modeling used in this project.
Google Colab runs Python by default, so we first need to switch the environment to R. This allows you to write and execute R code directly in the notebook.
To set it up:
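Once the runtime has been switched, a quick check can confirm that the notebook is really executing R rather than Python. This is a minimal sketch and not part of the original notebook; it relies only on base R's built-in `R.version.string`, and `r_version` is just an illustrative variable name.

```r
# Store and print the R version string.
# In a Python runtime this cell would fail, which signals the kernel was not switched.
r_version <- R.version.string
print(r_version)
```

If the cell prints a version string, the R kernel is active and the remaining steps can be run as written.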
Before starting data analysis, we need to install and load the libraries that will help with data cleaning, visualization, and correlation analysis. We only need to install the packages once; after that, simply load them each time you run the notebook. The code to install and load the libraries is given below:
# Install required packages (only needed once, skip if already installed)
install.packages("tidyverse") # Collection of packages for data manipulation and visualization
install.packages("skimr") # Provides an overview of dataset structure and summaries
install.packages("corrplot") # Helps in visualizing correlation matrices
# Load the libraries into the current session
library(tidyverse) # Loads ggplot2, dplyr, readr, and other useful packages
library(skimr) # Useful for summarizing datasets quickly
library(corrplot) # Used to draw correlation plots
The above code installs and loads the required libraries. The output is:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
corrplot 0.95 loaded
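Because reinstalling packages on every run is slow, one option is to install only what is missing. The sketch below is an assumption about how you might structure this, not the notebook's original code: `missing_packages` is a hypothetical helper name, and it uses base R's `requireNamespace()` to test whether each package is available.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Base packages ship with R, so nothing is reported as missing here
print(missing_packages(c("stats", "utils")))

# In the notebook you might then run (requires internet access):
# to_install <- missing_packages(c("tidyverse", "skimr", "corrplot"))
# if (length(to_install) > 0) install.packages(to_install)
```

The install lines are left commented out so the helper can be tried without triggering any downloads.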
Now that the libraries are ready, the next step is to bring the data into the notebook. We'll load the uploaded CSV file and preview the first few records to understand its structure. Here’s the code:
# Load the uploaded dataset using its filename
student_data <- read.csv("student-por.csv")
# Display the first six rows to get a quick look at the data
head(student_data)
This gives us an overview of the dataset we’re working with. The output for the above code is:
 | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ⋯ | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
 | <chr> | <chr> | <int> | <chr> | <chr> | <chr> | <int> | <int> | <chr> | <chr> | ⋯ | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> |
1 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ⋯ | 4 | 3 | 4 | 1 | 1 | 3 | 4 | 0 | 11 | 11 |
2 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ⋯ | 5 | 3 | 3 | 1 | 1 | 3 | 2 | 9 | 11 | 11 |
3 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ⋯ | 4 | 3 | 2 | 2 | 3 | 3 | 6 | 12 | 13 | 12 |
4 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ⋯ | 3 | 2 | 2 | 1 | 1 | 5 | 0 | 14 | 14 | 14 |
5 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ⋯ | 4 | 3 | 2 | 1 | 2 | 5 | 0 | 11 | 13 | 13 |
6 | GP | M | 16 | U | LE3 | T | 4 | 3 | services | other | ⋯ | 5 | 4 | 2 | 1 | 2 | 5 | 6 | 12 | 12 | 13 |
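If `head()` shows a single wide column instead of separate fields, the file is probably not comma-delimited; some copies of this dataset are distributed with semicolons as the separator (an assumption worth checking for your own file). The sketch below shows one way to fall back to a semicolon delimiter. `read_student_csv` is a hypothetical helper, and the temporary file only imitates the layout with three columns.

```r
# Hypothetical helper: try a comma first, fall back to a semicolon
read_student_csv <- function(path) {
  df <- read.csv(path)
  if (ncol(df) == 1) {
    # Everything landed in one column, so the delimiter was probably ";"
    df <- read.csv(path, sep = ";")
  }
  df
}

# Small semicolon-delimited file written to a temp path for illustration
tmp <- tempfile(fileext = ".csv")
writeLines(c("school;sex;age", "GP;F;18", "GP;M;17"), tmp)

demo <- read_student_csv(tmp)
print(dim(demo))
print(names(demo))
```

With the real file you would call `read_student_csv("student-por.csv")` and then confirm the column count with `ncol()`.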
Some column names may contain dots (.), which can make referencing them in code a bit tricky. Replacing them with underscores (_) makes the column names easier to work with. Here’s the code:
# Replace all dots in column names with underscores for cleaner access
colnames(student_data) <- gsub("\\.", "_", colnames(student_data))
# Display the cleaned column names
colnames(student_data)
The above code standardizes the column names. In this dataset none of the names contain dots, so the output shows them unchanged:
'school'
'sex'
'age'
'address'
'famsize'
'Pstatus'
'Medu'
'Fedu'
'Mjob'
'Fjob'
'reason'
'guardian'
'traveltime'
'studytime'
'failures'
'schoolsup'
'famsup'
'paid'
'activities'
'nursery'
'higher'
'internet'
'romantic'
'famrel'
'freetime'
'goout'
'Dalc'
'Walc'
'health'
'absences'
'G1'
'G2'
'G3'
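Since the student dataset has no dotted names, the effect of the `gsub()` call is easier to see on made-up names. The sketch below uses hypothetical column names that are not in the dataset, and also shows why the dot has to be escaped in the pattern.

```r
# Hypothetical column names containing dots, for illustration only
example_names <- c("study.time", "G3", "family.support")
cleaned_names <- gsub("\\.", "_", example_names)
print(cleaned_names)

# Without escaping, "." is a regex wildcard, so every character is replaced
over_matched <- gsub(".", "_", "G3")
print(over_matched)
```

An alternative to escaping is `gsub(".", "_", x, fixed = TRUE)`, which treats the pattern as a literal string.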
In this step, we'll examine the overall structure of the dataset, summarize its contents, and check for any missing values. This helps us understand what we're working with and identify any cleanup needed before deeper analysis. The code for this step is:
# View the structure of the dataset: shows data types and sample values
str(student_data)
# Get summary statistics for each column (min, max, mean, median, etc.)
summary(student_data)
# dplyr and tidyr are already attached by library(tidyverse), so no extra library() call is needed
# Check how many missing values exist in each column
student_data %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%  # Count NAs in each column
  pivot_longer(cols = everything(),                      # Convert to long format
               names_to = "Column",
               values_to = "Missing_Values") %>%
  filter(Missing_Values > 0)                             # Show only columns with missing data
The output for the above code is:
'data.frame': 649 obs. of 33 variables:
$ school : chr "GP" "GP" "GP" "GP" ...
$ sex : chr "F" "F" "F" "F" ...
$ age : int 18 17 15 15 16 16 16 17 15 15 ...
$ address : chr "U" "U" "U" "U" ...
$ famsize : chr "GT3" "GT3" "LE3" "GT3" ...
$ Pstatus : chr "A" "T" "T" "T" ...
$ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
$ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
$ Mjob : chr "at_home" "at_home" "at_home" "health" ...
$ Fjob : chr "teacher" "other" "other" "services" ...
$ reason : chr "course" "course" "other" "home" ...
$ guardian : chr "mother" "father" "mother" "mother" ...
$ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
$ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
$ failures : int 0 0 0 0 0 0 0 0 0 0 ...
$ schoolsup : chr "yes" "no" "yes" "no" ...
$ famsup : chr "no" "yes" "no" "yes" ...
$ paid : chr "no" "no" "no" "no" ...
$ activities: chr "no" "no" "no" "yes" ...
$ nursery : chr "yes" "no" "yes" "yes" ...
$ higher : chr "yes" "yes" "yes" "yes" ...
$ internet : chr "no" "yes" "yes" "yes" ...
$ romantic : chr "no" "no" "no" "yes" ...
$ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
$ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
$ goout : int 4 3 2 2 2 2 4 4 2 1 ...
$ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
$ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
$ health : int 3 3 3 5 5 5 3 1 1 5 ...
$ absences : int 4 2 6 0 0 6 0 2 0 0 ...
$ G1 : int 0 9 12 14 11 12 13 10 15 12 ...
$ G2 : int 11 11 13 14 13 12 12 13 16 12 ...
$ G3 : int 11 11 12 14 13 13 13 13 17 13 ...
school sex age address
Length:649 Length:649 Min. :15.00 Length:649
Class :character Class :character 1st Qu.:16.00 Class :character
Mode :character Mode :character Median :17.00 Mode :character
Mean :16.74
3rd Qu.:18.00
Max. :22.00
famsize Pstatus Medu Fedu
Length:649 Length:649 Min. :0.000 Min. :0.000
Class :character Class :character 1st Qu.:2.000 1st Qu.:1.000
Mode :character Mode :character Median :2.000 Median :2.000
Mean :2.515 Mean :2.307
3rd Qu.:4.000 3rd Qu.:3.000
Max. :4.000 Max. :4.000
Mjob Fjob reason guardian
Length:649 Length:649 Length:649 Length:649
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
traveltime studytime failures schoolsup
Min. :1.000 Min. :1.000 Min. :0.0000 Length:649
1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 Class :character
Median :1.000 Median :2.000 Median :0.0000 Mode :character
Mean :1.569 Mean :1.931 Mean :0.2219
3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:0.0000
Max. :4.000 Max. :4.000 Max. :3.0000
famsup paid activities nursery
Length:649 Length:649 Length:649 Length:649
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
higher internet romantic famrel
Length:649 Length:649 Length:649 Min. :1.000
Class :character Class :character Class :character 1st Qu.:4.000
Mode :character Mode :character Mode :character Median :4.000
Mean :3.931
3rd Qu.:5.000
Max. :5.000
freetime goout Dalc Walc health
Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
1st Qu.:3.00 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:2.000
Median :3.00 Median :3.000 Median :1.000 Median :2.00 Median :4.000
Mean :3.18 Mean :3.185 Mean :1.502 Mean :2.28 Mean :3.536
3rd Qu.:4.00 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:5.000
Max. :5.00 Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000
absences G1 G2 G3
Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
1st Qu.: 0.000 1st Qu.:10.0 1st Qu.:10.00 1st Qu.:10.00
Median : 2.000 Median :11.0 Median :11.00 Median :12.00
Mean : 3.659 Mean :11.4 Mean :11.57 Mean :11.91
3rd Qu.: 6.000 3rd Qu.:13.0 3rd Qu.:13.00 3rd Qu.:14.00
Max. :32.000 Max. :19.0 Max. :19.00 Max. :19.00
Column | Missing_Values |
<chr> | <int> |
The final table has no rows because none of the columns contain missing values.
While the previous step filtered only columns with missing values, here we’re doing a quick scan of all columns to count how many missing values each one contains, whether it's zero or more. Here’s the code:
# Count and display the number of missing values in each column
colSums(is.na(student_data))
The output for the above code shows the number of missing values in each column.
    school        sex        age    address    famsize    Pstatus       Medu
         0          0          0          0          0          0          0
      Fedu       Mjob       Fjob     reason   guardian traveltime  studytime
         0          0          0          0          0          0          0
  failures  schoolsup     famsup       paid activities    nursery     higher
         0          0          0          0          0          0          0
  internet   romantic     famrel   freetime      goout       Dalc       Walc
         0          0          0          0          0          0          0
    health   absences         G1         G2         G3
         0          0          0          0          0
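Because every count is zero here, it can help to see what the same check looks like when values are actually missing. The sketch below uses a small made-up data frame (not the student data) with deliberate `NA` entries; `toy`, `na_counts`, and `complete_rows` are illustrative names.

```r
# Toy data frame with deliberately missing entries (not the student dataset)
toy <- data.frame(
  G1 = c(10, NA, 12, 14),
  G2 = c(11, 11, NA, NA),
  G3 = c(11, 11, 12, 14)
)

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the columns that actually have missing values
print(na_counts[na_counts > 0])

# One simple (but lossy) option: drop incomplete rows before modeling
complete_rows <- na.omit(toy)
print(nrow(complete_rows))
```

Dropping rows is only one possible treatment; whether it is appropriate depends on how much data would be lost.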
To create clear and informative plots, we'll use the ggplot2 package, which is part of the tidyverse collection. It was already attached by library(tidyverse) in the setup step, so loading it again is optional and harmless. The code to load ggplot2 explicitly is:
# Load the ggplot2 package for creating data visualizations
library(ggplot2)
In this step, we'll create a histogram to see how students' final grades (G3) are distributed. This gives us a quick view of whether most students performed well, poorly, or fell in the middle. Here’s the code:
# Create a histogram to visualize how the final grades (G3) are distributed
ggplot(student_data, aes(x = G3)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") + # Set bar color and outline
  labs(title = "Distribution of Final Grades (G3)", # Add a chart title
       x = "Final Grade",                           # Label for x-axis
       y = "Number of Students")                    # Label for y-axis
The above code produces a histogram showing the distribution of final grades (G3).
The above graph shows:
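The counts behind a histogram with `binwidth = 1` can also be tabulated directly, which is useful when the plot itself is not at hand. The sketch below is only an illustration: it uses the first ten G3 values printed by `str()` earlier rather than the full column, so the counts are not representative of all 649 students.

```r
# First ten G3 values, copied from the str() output above
g3_sample <- c(11, 11, 12, 14, 13, 13, 13, 13, 17, 13)

# Frequency of each grade, equivalent to bar heights for binwidth = 1
grade_counts <- table(g3_sample)
print(grade_counts)

# Most common grade in this small sample
most_common <- names(grade_counts)[which.max(grade_counts)]
print(most_common)
```

On the full dataset the same idea would be `table(student_data$G3)`.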
This section visualizes how access to the internet at home might impact students’ final grades. Using a boxplot, we can observe whether students with or without internet access tend to perform better academically. Here’s the code:
ggplot(student_data, aes(x = internet, y = G3, fill = internet)) +
  geom_boxplot() +
  labs(title = "Final Grades by Internet Access",
       x = "Internet Access at Home",
       y = "Final Grade")
The above code gives us a graph showing the final grades by internet access.
The above graph shows that:
This step explores whether students who dedicate more weekly time to studying tend to perform better in their final grades. Here’s the code:
ggplot(student_data, aes(x = factor(studytime), y = G3, fill = factor(studytime))) +
  geom_boxplot() +
  labs(title = "Study Time vs Final Grade",
       x = "Study Time (1 = <2 hrs, 4 = >10 hrs)",
       y = "Final Grade")
The above code gives the output:
The above plot shows that:
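The thick line in each box is the group median, and the same figure can be computed directly as a numeric check on the plot. The sketch below is a minimal illustration using only the first ten `studytime` and `G3` values from the `str()` output, so it covers just two of the four study-time levels.

```r
# First ten values of each column, copied from the str() output above
studytime_sample <- c(2, 2, 2, 3, 2, 2, 2, 2, 2, 2)
g3_sample <- c(11, 11, 12, 14, 13, 13, 13, 13, 17, 13)

# Median final grade within each study-time level
median_by_studytime <- tapply(g3_sample, factor(studytime_sample), median)
print(median_by_studytime)
```

On the full dataset this would be `tapply(student_data$G3, student_data$studytime, median)`; swapping in `student_data$internet` gives the medians for the internet-access boxplot.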
This step revisits the internet-access comparison from earlier with a different title and axis labels. The boxplot again compares the final grades (G3) of students who have internet access at home with those who do not. Here’s the code:
ggplot(student_data, aes(x = internet, y = G3, fill = internet)) +
  geom_boxplot() +
  labs(title = "Internet Access vs Final Grade",
       x = "Internet Access (Yes/No)",
       y = "Final Grade")
The above code gives us the graph:
The above plot shows that:
To visualize relationships between numeric variables, we'll use the corrplot package, which draws an easy-to-read graphical representation of a correlation matrix. We already installed and loaded it in the setup step, so the code below reinstalls it only if it is missing. Here’s the code:
# Install corrplot only if it is not already installed
if (!requireNamespace("corrplot", quietly = TRUE)) install.packages("corrplot")
# Load the library
library(corrplot)
Before creating a correlation plot, we need to isolate the numeric columns, because cor() only accepts numeric input. We'll use select(where(is.numeric)) from dplyr for this; the older select_if(is.numeric) still works but has been superseded. Here’s the code:
# Keep only the numeric columns
numeric_data <- student_data %>% select(where(is.numeric))
# View column names
colnames(numeric_data)
The output for the above code is:
'age'
'Medu'
'Fedu'
'traveltime'
'studytime'
'failures'
'famrel'
'freetime'
'goout'
'Dalc'
'Walc'
'health'
'absences'
'G1'
'G2'
'G3'
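The same selection can be done in base R, which makes it clearer what is being tested for each column. This is a sketch on a tiny made-up data frame whose values mirror the first three rows shown earlier; `toy` and `numeric_cols` are illustrative names.

```r
# Tiny data frame mixing character and integer columns
toy <- data.frame(
  school = c("GP", "GP", "GP"),
  sex    = c("F", "F", "F"),
  age    = c(18L, 17L, 15L),
  G3     = c(11L, 11L, 12L)
)

# is.numeric() is applied to each column; vapply guarantees a logical result
is_num <- vapply(toy, is.numeric, logical(1))
numeric_cols <- names(toy)[is_num]
print(numeric_cols)

# Subset to the numeric columns only
toy_numeric <- toy[, is_num, drop = FALSE]
print(ncol(toy_numeric))
```

Character columns such as `school` and `sex` are dropped, which is why the correlation matrix later has 16 rather than 33 variables.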
Now that we've isolated the numeric columns, we calculate the correlation matrix to understand how these variables relate to each other. This matrix shows the strength and direction of linear relationships between pairs of numeric features. Here’s the code:
# Calculate correlation between numeric features
cor_matrix <- cor(numeric_data)
# View the correlation matrix rounded to 2 decimal places
round(cor_matrix, 2)
The output for the above step is:
 | age | Medu | Fedu | traveltime | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
age | 1.00 | -0.11 | -0.12 | 0.03 | -0.01 | 0.32 | -0.02 | 0.00 | 0.11 | 0.13 | 0.09 | -0.01 | 0.15 | -0.17 | -0.11 | -0.11 |
Medu | -0.11 | 1.00 | 0.65 | -0.27 | 0.10 | -0.17 | 0.02 | -0.02 | 0.01 | -0.01 | -0.02 | 0.00 | -0.01 | 0.26 | 0.26 | 0.24 |
Fedu | -0.12 | 0.65 | 1.00 | -0.21 | 0.05 | -0.17 | 0.02 | 0.01 | 0.03 | 0.00 | 0.04 | 0.04 | 0.03 | 0.22 | 0.23 | 0.21 |
traveltime | 0.03 | -0.27 | -0.21 | 1.00 | -0.06 | 0.10 | -0.01 | 0.00 | 0.06 | 0.09 | 0.06 | -0.05 | -0.01 | -0.15 | -0.15 | -0.13 |
studytime | -0.01 | 0.10 | 0.05 | -0.06 | 1.00 | -0.15 | 0.00 | -0.07 | -0.08 | -0.14 | -0.21 | -0.06 | -0.12 | 0.26 | 0.24 | 0.25 |
failures | 0.32 | -0.17 | -0.17 | 0.10 | -0.15 | 1.00 | -0.06 | 0.11 | 0.05 | 0.11 | 0.08 | 0.04 | 0.12 | -0.38 | -0.39 | -0.39 |
famrel | -0.02 | 0.02 | 0.02 | -0.01 | 0.00 | -0.06 | 1.00 | 0.13 | 0.09 | -0.08 | -0.09 | 0.11 | -0.09 | 0.05 | 0.09 | 0.06 |
freetime | 0.00 | -0.02 | 0.01 | 0.00 | -0.07 | 0.11 | 0.13 | 1.00 | 0.35 | 0.11 | 0.12 | 0.08 | -0.02 | -0.09 | -0.11 | -0.12 |
goout | 0.11 | 0.01 | 0.03 | 0.06 | -0.08 | 0.05 | 0.09 | 0.35 | 1.00 | 0.25 | 0.39 | -0.02 | 0.09 | -0.07 | -0.08 | -0.09 |
Dalc | 0.13 | -0.01 | 0.00 | 0.09 | -0.14 | 0.11 | -0.08 | 0.11 | 0.25 | 1.00 | 0.62 | 0.06 | 0.17 | -0.20 | -0.19 | -0.20 |
Walc | 0.09 | -0.02 | 0.04 | 0.06 | -0.21 | 0.08 | -0.09 | 0.12 | 0.39 | 0.62 | 1.00 | 0.11 | 0.16 | -0.16 | -0.16 | -0.18 |
health | -0.01 | 0.00 | 0.04 | -0.05 | -0.06 | 0.04 | 0.11 | 0.08 | -0.02 | 0.06 | 0.11 | 1.00 | -0.03 | -0.05 | -0.08 | -0.10 |
absences | 0.15 | -0.01 | 0.03 | -0.01 | -0.12 | 0.12 | -0.09 | -0.02 | 0.09 | 0.17 | 0.16 | -0.03 | 1.00 | -0.15 | -0.12 | -0.09 |
G1 | -0.17 | 0.26 | 0.22 | -0.15 | 0.26 | -0.38 | 0.05 | -0.09 | -0.07 | -0.20 | -0.16 | -0.05 | -0.15 | 1.00 | 0.86 | 0.83 |
G2 | -0.11 | 0.26 | 0.23 | -0.15 | 0.24 | -0.39 | 0.09 | -0.11 | -0.08 | -0.19 | -0.16 | -0.08 | -0.12 | 0.86 | 1.00 | 0.92 |
G3 | -0.11 | 0.24 | 0.21 | -0.13 | 0.25 | -0.39 | 0.06 | -0.12 | -0.09 | -0.20 | -0.18 | -0.10 | -0.09 | 0.83 | 0.92 | 1.00 |
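A 16-by-16 matrix is hard to scan, so it can help to pull out just the G3 column and rank the other variables by the strength of their correlation with it. The sketch below rebuilds a five-variable subset of the matrix by hand from the rounded values printed above (so it is an illustration, not a recomputation); `cor_subset`, `g3_cor`, and `ranked` are hypothetical names.

```r
# Subset of the correlation matrix, typed in from the rounded output above
vars <- c("studytime", "failures", "G1", "G2", "G3")
cor_subset <- matrix(
  c( 1.00, -0.15,  0.26,  0.24,  0.25,
    -0.15,  1.00, -0.38, -0.39, -0.39,
     0.26, -0.38,  1.00,  0.86,  0.83,
     0.24, -0.39,  0.86,  1.00,  0.92,
     0.25, -0.39,  0.83,  0.92,  1.00),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars)
)

# Correlations with G3, excluding G3's correlation with itself
g3_cor <- cor_subset[, "G3"]
g3_cor <- g3_cor[names(g3_cor) != "G3"]

# Rank by absolute value so strong negative correlations are not buried
ranked <- g3_cor[order(abs(g3_cor), decreasing = TRUE)]
print(ranked)
```

With the real matrix the same three lines work on `cor_matrix[, "G3"]`. Ranking by absolute value matters because `failures` is negatively correlated with G3 yet is more strongly related to it than `studytime`.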
To better understand the strength and direction of relationships between numeric variables, we use a correlation heatmap. This colorful plot helps us quickly identify strong positive or negative correlations among the features. Here’s the code:
# Create a colorful correlation plot
corrplot(cor_matrix, method = "color", type = "upper", tl.cex = 0.8)
The output for the above code is shown below:
In this step, we fit a multiple linear regression model to predict the final grade (G3) from first and second period grades (G1, G2), study time, number of failures, and absences. The model summary gives us coefficients, significance levels, and overall model performance. Here’s the code:
# Build a linear regression model
model <- lm(G3 ~ G1 + G2 + studytime + failures + absences, data = student_data)
# View model summary
summary(model)
The output for the above code is:
Call:
lm(formula = G3 ~ G1 + G2 + studytime + failures + absences, data = student_data)
Residuals:
Min 1Q Median 3Q Max
-9.0716 -0.4624 -0.0796 0.6346 5.8068
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15519 0.25863 -0.600 0.54868
G1 0.13946 0.03623 3.849 0.00013 ***
G2 0.88571 0.03393 26.107 < 2e-16 ***
studytime 0.09670 0.06181 1.564 0.11820
failures -0.21829 0.09086 -2.402 0.01657 *
absences 0.02337 0.01079 2.165 0.03077 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.254 on 643 degrees of freedom
Multiple R-squared: 0.8506, Adjusted R-squared: 0.8494
F-statistic: 732 on 5 and 643 DF, p-value: < 2.2e-16
The above output means that:
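One way to read the coefficient table is to plug a single student's values into the fitted equation. The sketch below does this by hand for the first student in the dataset (G1 = 0, G2 = 11, studytime = 2, failures = 0, absences = 4, taken from the outputs shown earlier), using the rounded estimates from the summary. The variable names are illustrative, and the result differs slightly from `predict()` because the coefficients are rounded.

```r
# Rounded coefficient estimates copied from the model summary above
coefs <- c(intercept = -0.15519, G1 = 0.13946, G2 = 0.88571,
           studytime = 0.09670, failures = -0.21829, absences = 0.02337)

# Predictor values for the first student in the dataset
student_1 <- c(G1 = 0, G2 = 11, studytime = 2, failures = 0, absences = 4)

# Intercept plus the sum of coefficient * value for each predictor
manual_prediction <- coefs[["intercept"]] + sum(coefs[names(student_1)] * student_1)
print(round(manual_prediction, 2))
```

The G2 term contributes by far the most to the total (0.88571 × 11), which matches its very large t value in the summary.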
Now that we’ve trained the regression model, let’s use it to predict students’ final grades (G3). We’ll compare the predicted results with the actual values to see how closely they match. Here’s the code:
# Add predicted G3 values to the dataset
student_data$predicted_G3 <- predict(model, student_data)
# View first few actual vs predicted
head(student_data[, c("G3", "predicted_G3")])
The output gives us a table:
 | G3 | predicted_G3 |
 | <int> | <dbl> |
1 | 11 | 9.87447 |
2 | 11 | 11.08285 |
3 | 12 | 13.36611 |
4 | 14 | 14.48723 |
5 | 13 | 13.08645 |
6 | 13 | 12.48040 |
The above output shows:
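The gap between actual and predicted grades can be summarized with standard error metrics such as RMSE and MAE. The sketch below computes both for the six rows shown in the table; this is only an illustration of the calculation, since six rows are not representative of the full dataset, and the variable names are hypothetical.

```r
# Actual and predicted G3 for the six rows shown in the table above
actual <- c(11, 11, 12, 14, 13, 13)
predicted <- c(9.87447, 11.08285, 13.36611, 14.48723, 13.08645, 12.48040)

# Residual = actual minus predicted
residuals_sample <- actual - predicted

# Root mean squared error penalizes large misses more heavily
rmse <- sqrt(mean(residuals_sample^2))

# Mean absolute error is the average size of a miss, in grade points
mae <- mean(abs(residuals_sample))

print(round(c(RMSE = rmse, MAE = mae), 2))
```

For the full dataset, replace the two vectors with `student_data$G3` and `student_data$predicted_G3`.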
This step creates a scatter plot that compares the actual final grades (G3) with the predicted grades from the regression model. A red dashed line indicates perfect prediction, where actual equals predicted. Points closer to this line represent better predictions. Here’s the code:
ggplot(student_data, aes(x = G3, y = predicted_G3)) +
  geom_point(color = "blue", alpha = 0.6) +
  geom_abline(intercept = 0, slope = 1, color = "red", linetype = "dashed") +
  labs(title = "Actual vs Predicted Final Grades",
       x = "Actual G3",
       y = "Predicted G3")
The above code gives us a graph of the predicted vs actual grades.
The above output shows that:
In this Student Performance Analysis project, we built a linear regression model in R using Google Colab to predict students’ final grades (G3) based on features like first and second period grades (G1 and G2), study time, failures, and absences.
After preprocessing the data and exploring key relationships, we trained the model and evaluated it using R-squared and an actual-versus-predicted scatter plot.
The model achieved an R-squared of about 0.85, meaning it explains roughly 85% of the variance in final grades. G2 and G1 carry most of that explanatory power, so the model fits well mainly because prior grades are included. Note that this fit was measured on the same data used to train the model.
Colab Link:
https://colab.research.google.com/drive/1XcU-XxV2j76DnWrfDdn7TpuTEh3xq9wi#scrollTo=jqe2jLRsgpUk
Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...