Multiple Linear Regression in R: A Complete Guide
By Rohit Sharma
Updated on Jun 17, 2025 | 16 min read | 29.57K+ views
Did you know? Reported R² estimates for multiple regression models improve as the number of predictors rises from 4 to 10, most markedly in larger, more complex datasets. Adding relevant predictors helps the model explain more of the variability in the data. Keep in mind that R² never decreases when a predictor is added, so adjusted R² is the fairer measure when comparing models of different sizes.
Multiple regression in R is widely used to analyze relationships between several variables at once. For example, businesses use it to predict sales based on factors like advertising budget, location, and seasonality.
R's powerful statistical functions and extensive libraries make it easy to apply and interpret these models. It offers flexibility for handling large datasets and complex relationships, ensuring reliable results.
In this blog, you’ll explore how to perform multiple linear regression in R, with practical examples and visualizations to help you master the technique.
Multiple linear regression is a statistical technique used to predict the value of one variable based on two or more other variables. It is an extension of simple linear regression and is also known as multiple regression. The variable to be predicted is the dependent variable, and the variables used to predict it are known as independent or explanatory variables.
R’s built-in functions, such as lm(), simplify model fitting, diagnostics, and data visualization. Its flexibility, open-source nature, and strong community support make it well suited to working with complex datasets.
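The process involves five steps: collecting the data, importing it into R, checking for linearity, fitting the model, and making predictions. Let’s explore these steps in more detail: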
The first step is to collect the data you’ll use for prediction. In our example, we need data that includes both the dependent variable (heart disease) and the independent variables (biking and smoking). The data should reflect real-world conditions and contain enough variation in each variable for the model to estimate its effect.
For example, the dataset might look like:
Heart Disease | Biking | Smoking |
0 | 5 | 10 |
1 | 3 | 15 |
0 | 8 | 5 |
1 | 2 | 20 |
... | ... | ... |
Once you have your data, you need to import it into R for analysis. In R, you can use the read.csv() function to load data from a CSV file into a data frame.
heart.data <- read.csv("path_to_file/heart_data.csv")
This loads the data into the heart.data data frame, where you can start working on it.
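Before moving on, it helps to confirm that the import worked as expected. The following is a minimal sketch, assuming the CSV has columns named biking, smoking, and heart.disease (the names used in the later code). Because the file path above is a placeholder, the sketch writes the four rows shown in the sample table to a temporary file and reads that instead.

```r
# Stand-in for heart_data.csv: the four sample rows from the table above,
# written to a temporary file so the example runs without the real dataset.
tmp <- tempfile(fileext = ".csv")
writeLines(c("biking,smoking,heart.disease",
             "5,10,0",
             "3,15,1",
             "8,5,0",
             "2,20,1"), tmp)

heart.data <- read.csv(tmp)

str(heart.data)             # column names and types
head(heart.data)            # first few rows
colSums(is.na(heart.data))  # number of missing values per column
```

If a column shows up with an unexpected type (for example, character instead of numeric), it is usually easier to fix that here than after the model has been fitted.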
Before applying the regression model, it's important to check that the relationship between the dependent variable (heart disease) and each independent variable (biking and smoking) is roughly linear. You can do this visually with scatter plots, or numerically by computing the correlation between each predictor and the response.
plot(heart.data$biking, heart.data$heart.disease, main="Biking vs Heart Disease", xlab="Biking", ylab="Heart Disease")
plot(heart.data$smoking, heart.data$heart.disease, main="Smoking vs Heart Disease", xlab="Smoking", ylab="Heart Disease")
If both plots display linear patterns, it implies that a linear regression model could be an appropriate choice for modeling the relationship between the variables.
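The visual check can be backed up with numbers. The sketch below uses simulated data, not the article's dataset: the column names match the code above, but the sample size, value ranges, and coefficients are assumptions made up for illustration. It computes the correlation of each predictor with the response and draws all pairwise scatter plots in one figure.

```r
# Simulated data for illustration only (all numbers here are assumptions)
set.seed(42)
n <- 200
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.7)
heart.data <- data.frame(biking, smoking, heart.disease)

# Pearson correlation of each predictor with the response
cor_biking  <- cor(heart.data$biking,  heart.data$heart.disease)
cor_smoking <- cor(heart.data$smoking, heart.data$heart.disease)
print(c(biking = cor_biking, smoking = cor_smoking))

# All pairwise scatter plots in a single figure
pairs(heart.data)
```

A correlation near zero does not rule out a relationship; it only suggests there is no strong straight-line pattern, so the plots are still worth a look.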
Once you've confirmed linearity, the next step is to apply the Multiple Linear Regression model using the lm() function in R. This function will calculate the regression equation, determining how much each independent variable contributes to the prediction of the dependent variable.
lm_model <- lm(heart.disease ~ biking + smoking, data = heart.data)
summary(lm_model)
The summary() function provides key results from the regression analysis, including the coefficients (intercept, biking, smoking), R², and p-values for hypothesis testing. This tells you how well biking and smoking predict heart disease.
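The individual pieces of the summary can also be pulled out directly, which is handy when you want to reuse them in later code. This is a sketch on simulated data (the sample size and the coefficients used to generate it are assumptions), using the same model formula as above.

```r
# Simulated data for illustration only (all numbers here are assumptions)
set.seed(1)
n <- 200
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.7)
heart.data <- data.frame(biking, smoking, heart.disease)

lm_model <- lm(heart.disease ~ biking + smoking, data = heart.data)

coef(lm_model)                            # intercept and one slope per predictor
confint(lm_model, level = 0.95)           # 95% confidence interval for each coefficient
coef_table <- summary(lm_model)$coefficients
coef_table                                # estimate, std. error, t value, p-value
```

Each slope is read as the expected change in the response for a one-unit increase in that predictor while the other predictor is held constant.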
After building the regression model, you can use it to make predictions. For example, if you have new data (e.g., someone who bikes 4 times a week and smokes 12 cigarettes daily), you can predict their likelihood of developing heart disease using the model.
new_data <- data.frame(biking = 4, smoking = 12)
predicted_heart_disease <- predict(lm_model, new_data)
print(predicted_heart_disease)
This will give you the predicted heart disease risk based on the input values for biking and smoking.
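A single predicted value says nothing about how uncertain it is. predict() can also return interval estimates through its interval argument. The sketch below again relies on simulated data (the generating numbers and the second row of new_data are assumptions) and shows both a confidence interval for the mean response and a wider prediction interval for an individual observation.

```r
# Simulated data for illustration only (all numbers here are assumptions)
set.seed(2)
n <- 200
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.7)
heart.data <- data.frame(biking, smoking, heart.disease)

lm_model <- lm(heart.disease ~ biking + smoking, data = heart.data)

# Column names in new_data must match the predictor names in the model formula
new_data <- data.frame(biking = c(4, 30), smoking = c(12, 5))

# Interval for the average response at these predictor values
conf_int <- predict(lm_model, newdata = new_data, interval = "confidence")

# Interval for a single new observation (always wider)
pred_int <- predict(lm_model, newdata = new_data, interval = "prediction")

conf_int
pred_int
```

Both results are matrices with the columns fit, lwr, and upr. Predictions are most trustworthy for inputs that lie within the range of the data used to fit the model.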
By following these steps, you can effectively perform Multiple Linear Regression in R to analyze the relationship between multiple variables and make predictions.
Now, let’s look at how you can evaluate the goodness of fit for the model.
From the model output, you can write down the fitted multiple linear regression equation, which is what you use to predict the heart disease value for new observations. The following metrics help you evaluate how well the multiple regression model in R fits the data:
Multiple R-squared: This metric measures the strength of the linear relationship between the response variable and the predictor variables. A multiple R-squared of 1 indicates a perfect linear relationship, whereas a value of 0 indicates no linear relationship.
R-squared is the proportion of the variance in the response variable that can be explained by the predictor variables. Multiple R is the square root of R-squared.
Residual standard error: This metric estimates the typical distance between the observed values and the values fitted by the regression model.
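These metrics are stored in the object returned by summary() and can be extracted by name. The sketch below is fitted on simulated data (the generating numbers are assumptions) and also confirms two relationships: multiple R equals the correlation between fitted and observed values, and the residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom.

```r
# Simulated data for illustration only (all numbers here are assumptions)
set.seed(3)
n <- 200
biking  <- runif(n, 1, 75)
smoking <- runif(n, 0.5, 30)
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.7)
heart.data <- data.frame(biking, smoking, heart.disease)

lm_model <- lm(heart.disease ~ biking + smoking, data = heart.data)
fit_summary <- summary(lm_model)

r_squared     <- fit_summary$r.squared      # multiple R-squared
adj_r_squared <- fit_summary$adj.r.squared  # adjusted for the number of predictors
rse           <- fit_summary$sigma          # residual standard error

# Multiple R: square root of R-squared, equal to cor(fitted, observed)
multiple_r <- sqrt(r_squared)
cor_fitted <- cor(fitted(lm_model), heart.data$heart.disease)

# Residual standard error computed by hand
rse_manual <- sqrt(sum(residuals(lm_model)^2) / df.residual(lm_model))

print(c(r_squared = r_squared, adj_r_squared = adj_r_squared,
        multiple_r = multiple_r, rse = rse))
```

Adjusted R-squared is usually the better figure to compare when two models use a different number of predictors.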
Next, let’s look at some of the benefits of using multiple linear regression in R.
Multiple linear regression in R is a powerful statistical method used to understand the relationship between a dependent variable and multiple independent variables. Unlike simple linear regression, which examines the relationship between only two variables, multiple linear regression allows for a more comprehensive analysis by controlling for the effects of other variables.
This technique is particularly valuable in real-world data analysis, where many factors influence the outcome. By incorporating multiple predictors, analysts can gain a more accurate and nuanced understanding of complex systems.
Here’s an overview of the key benefits of using multiple linear regression in R, along with relevant examples:
Benefit | Example |
Controls the Effect of Other Variables | In predicting heart disease, controlling for factors like age, smoking, and exercise provides a clearer understanding of each variable’s impact. |
Incorporates Multiple Variables | Analyzing the effect of various factors like temperature, rainfall, and fertilizer on crop yields, leading to better agricultural predictions. |
Helps Understand the Relationship Between Variables | Understanding how the number of study hours and student height affect GPA, while controlling for other factors like socioeconomic background. |
Provides Clear Decision-Making Insights | Using the model to predict gold prices based on interest rates, inflation, and historical trends, helping businesses make informed investment decisions. |
Enables Graphical Representation of Effects | Visualizing how biking and smoking independently affect heart disease risk by plotting the relationships with the dependent variable. |
Estimates Regression Coefficients for Better Prediction | Estimating the relationship between a driver’s age, experience, and distance covered, which helps improve operational efficiency for ride-sharing services like Uber. |
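The first benefit in the table, controlling for other variables, can be illustrated with a small simulation. Everything in this sketch is made up for illustration: the population, the assumption that heavier smokers bike less, and the true coefficients. It compares the smoking coefficient from a simple regression with the one from a multiple regression that also includes biking.

```r
# Simulated population for illustration only (all numbers here are assumptions)
set.seed(7)
n <- 500
biking  <- runif(n, 1, 75)
smoking <- 35 - 0.3 * biking + rnorm(n, sd = 3)   # heavier smokers bike less
heart.disease <- 15 - 0.2 * biking + 0.18 * smoking + rnorm(n, sd = 0.7)
sim_data <- data.frame(biking, smoking, heart.disease)

# Simple regression: the smoking slope also absorbs part of the biking effect
simple_fit <- lm(heart.disease ~ smoking, data = sim_data)

# Multiple regression: the smoking slope is estimated with biking held constant
multiple_fit <- lm(heart.disease ~ biking + smoking, data = sim_data)

smoking_simple   <- unname(coef(simple_fit)["smoking"])
smoking_multiple <- unname(coef(multiple_fit)["smoking"])
print(c(simple = smoking_simple, multiple = smoking_multiple))
```

In this setup the simple model overstates the effect of smoking because smoking and biking move together, while the multiple model should land close to the 0.18 used to generate the data.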
Multiple linear regression is one of the most widely used tools in an analyst’s toolkit, so it helps to know where it is typically applied.
Below are five real-life applications of multiple linear regression in R:
In agriculture, multiple linear regression is used to predict crop yields based on variables like rainfall, temperature, and fertilizer levels. For instance, a farmer could use regression to estimate the expected yield of a crop (dependent variable) by analyzing how different factors like rainfall, temperature, and fertilizer usage (independent variables) influence the yield.
R Example:
lm(crop_yield ~ rainfall + temperature + fertilizer, data = crop_data)
Multiple linear regression is used by financial analysts to forecast future prices of stocks or commodities based on historical data. For example, a regression model could predict the price of gold (dependent variable) based on factors like interest rates, inflation, and historical gold prices (independent variables).
R Example:
lm(gold_price ~ interest_rate + inflation + historical_price, data = stock_data)
Ride-hailing services like Uber can use multiple linear regression to predict the distance covered by a driver (dependent variable) based on their age and years of experience (independent variables). This helps in understanding how experience and age influence driver productivity.
R Example:
lm(distance_covered ~ driver_age + years_of_experience, data = uber_data)
In educational settings, multiple linear regression can be applied to determine the relationship between a student's GPA (dependent variable) and the number of hours they study and their height (independent variables). This can help identify which factors contribute to academic performance and which do not.
R Example:
lm(gpa ~ study_hours + height, data = student_data)
Companies use multiple regression analysis to determine employee salaries (dependent variable) based on independent variables such as years of experience and age. This helps in setting salary structures that are fair and competitive based on these factors.
R Example:
lm(salary ~ years_of_experience + age, data = employee_data)
Multiple Linear Regression in R is a powerful tool for analyzing complex relationships between multiple variables. You can easily apply this technique to make data-driven decisions. Companies rely on regression models for various tasks, from predicting sales and analyzing consumer behavior to understanding market trends.
Reference:
https://www.scielo.br/j/abmvz/a/DRQJqHHBVkvZbq77hjZFW4k/
Rohit Sharma shares insights, skill building advice, and practical tips tailored for professionals aiming to achieve their career goals.