
Multiple Linear Regression in R: A Complete Guide

By Rohit Sharma

Updated on Jun 17, 2025 | 16 min read | 29.57K+ views


Did you know? The accuracy of multiple regression models tends to increase as the number of predictors rises from 4 to 10, with R² estimates improving markedly in larger, more complex datasets. In other words, adding more relevant predictors helps the model explain more of the variability in the data, leading to better predictive accuracy.

Performing multiple regression in R is widely preferred for analyzing complex relationships between multiple variables. For example, businesses use it to predict sales based on factors like advertising budget, location, and seasonality. 

R's powerful statistical functions and extensive libraries make it easy to apply and interpret these models. It offers flexibility for handling large datasets and complex relationships, ensuring reliable results.

In this blog, you’ll explore how to perform multiple linear regression in R with practical examples and visualizations to help you master the technique.

If you want to build skills in using regression models, upGrad’s online AI and ML courses can help you. By the end of the program, participants will be able to build AI models, analyze complex data, and solve industry-specific challenges.

Multiple Linear Regression in R: A Simple Guide

Multiple linear regression is a statistical technique used to predict the outcome of a variable based on two or more other variables. It is an extension of simple linear regression and is also known as multiple regression. The variable to be predicted is the dependent variable, and the variables used to predict its value are known as independent or explanatory variables.
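In general form, the model expresses the dependent variable as a weighted sum of the independent variables plus an error term:

y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε

Here, β₀ is the intercept, β₁ through βₖ are the regression coefficients estimated from the data, and ε captures the variation that the predictors do not explain.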

R’s powerful statistical functions, such as lm(), and its extensive packages simplify model fitting, diagnostics, and data visualization. Its flexibility, open-source nature, and strong community support make it ideal for handling complex datasets and ensuring accurate, reliable results.

If you're looking to enhance your skills in data analysis and regression modeling, upGrad offers top-rated courses to help you get there.

Steps to Perform Multiple Regression in R:


The process involves five main steps: collecting the data, importing it into R, checking linearity, fitting the model, and making predictions. Now, let’s explore these steps in more detail:

1. Data Collection

The first step is to gather the data you’ll use for prediction. In our example, we need data that includes both the dependent variable (heart disease) and the independent variables (biking and smoking). The data should ideally reflect real-world scenarios and contain enough variation to support accurate predictions.

For example, the dataset might look like:

Heart Disease | Biking | Smoking
0             | 5      | 10
1             | 3      | 15
0             | 8      | 5
1             | 2      | 20
...           | ...    | ...
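For illustration, here is a minimal sketch of how a small dataset like this could be created directly in R. The values are hypothetical and simply mirror the table above; in practice, you would work with a full dataset loaded from a file.

# Hypothetical sample data mirroring the table above
heart.data <- data.frame(
  heart.disease = c(0, 1, 0, 1),
  biking        = c(5, 3, 8, 2),
  smoking       = c(10, 15, 5, 20)
)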

Also Read: Data Collection for Mining: Your Essential Guide

2. Data Import and Capturing in R

Once you have your data, you need to import it into R for analysis. In R, you can use the read.csv() function to load data from a CSV file into a data frame.

heart.data <- read.csv("path_to_file/heart_data.csv")

This loads the data into the heart.data data frame, where you can start working on it.
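Before modeling, it’s a good idea to take a quick look at the imported data. A brief sketch using base R functions (assuming the file contains columns named heart.disease, biking, and smoking):

head(heart.data)     # preview the first six rows
str(heart.data)      # check column names and data types
summary(heart.data)  # basic descriptive statistics for each column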

Also Read: Top 7 Data Extraction Tools in the Market [Hand-picked]

3. Checking Data Linearity

Before applying the regression model, it's important to ensure that there is a linear relationship between the dependent variable (heart disease) and the independent variables (biking and smoking). This can be done by creating scatter plots to visually inspect the relationship or using statistical tests.

plot(heart.data$biking, heart.data$heart.disease, main="Biking vs Heart Disease", xlab="Biking", ylab="Heart Disease")
plot(heart.data$smoking, heart.data$heart.disease, main="Smoking vs Heart Disease", xlab="Smoking", ylab="Heart Disease")

Output: two scatter plots, one of biking against heart disease and one of smoking against heart disease.

If both plots display linear patterns, it implies that a linear regression model could be an appropriate choice for modeling the relationship between the variables. 
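As a quick numerical complement to the scatter plots, you can also check the pairwise correlations; strong linear relationships show up as correlations well away from zero. A minimal sketch:

cor(heart.data$biking, heart.data$heart.disease)   # correlation between biking and heart disease
cor(heart.data$smoking, heart.data$heart.disease)  # correlation between smoking and heart disease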

Also Read: Top 15 Types of Data Visualization: Benefits and How to Choose the Right Tool for Your Needs in 2025

4. Applying Multiple Linear Regression in R

Once you've confirmed linearity, the next step is to apply the Multiple Linear Regression model using the lm() function in R. This function will calculate the regression equation, determining how much each independent variable contributes to the prediction of the dependent variable.

lm_model <- lm(heart.disease ~ biking + smoking, data = heart.data)
summary(lm_model)

The summary() function provides key results from the regression analysis, including the coefficients (intercept, biking, smoking), R², and p-values for hypothesis testing. This tells you how well biking and smoking predict heart disease.
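If you want to work with the estimates programmatically rather than reading them off the summary, the fitted model object can be queried directly. A brief sketch using base R accessors:

coef(lm_model)     # intercept and slope estimates for biking and smoking
confint(lm_model)  # 95% confidence intervals for the coefficients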

5. Making Predictions

After building the regression model, you can use it to make predictions. For example, if you have new data (e.g., someone who bikes 4 times a week and smokes 12 cigarettes daily), you can predict their likelihood of developing heart disease using the model.

new_data <- data.frame(biking = 4, smoking = 12)
predicted_heart_disease <- predict(lm_model, new_data)
print(predicted_heart_disease)

This will give you the predicted heart disease risk based on the input values for biking and smoking.
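If you also want a sense of the uncertainty around a prediction, predict() can return interval estimates. A short sketch:

predict(lm_model, new_data, interval = "confidence")  # interval for the average predicted value
predict(lm_model, new_data, interval = "prediction")  # wider interval for an individual observation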

By following these steps, you can effectively perform Multiple Linear Regression in R to analyze the relationship between multiple variables and make predictions. 

Also Read: Mastering Linear Regression in Machine Learning With Key Examples

Now, let’s look at how you can evaluate the goodness of fit for the model.

Evaluating the Goodness of Fit of the Model

From the multiple linear regression model output, you can determine the fitted regression equation. This equation is useful for making predictions about the dependent variable (heart disease, in our example) for new observations. You can use the metrics discussed below to evaluate how well the multiple regression model in R fits the data:

  • Multiple R-Squared:

This metric of multiple regression in R measures the strength of the linear relationship between the response variable and the predictor variables. It represents the proportion of variance in the response variable that can be explained by the predictors: a multiple R-squared of 1 indicates a perfect linear relationship, whereas a value of 0 indicates that no linear relationship exists.

Multiple R is simply the square root of R-squared.

  • Residual Standard Error:

This metric of multiple regression in R measures the average distance that the observed values fall from the fitted regression line; smaller values indicate a better fit.
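Both metrics are reported by summary(), and they can also be extracted from the summary object directly. A minimal sketch:

model_summary <- summary(lm_model)
model_summary$r.squared      # multiple R-squared
model_summary$adj.r.squared  # adjusted R-squared, penalized for the number of predictors
model_summary$sigma          # residual standard error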

If you want to learn more about linear regression, try upGrad’s free Linear Regression - Step by Step Guide. It will help you build a strong foundation in predictive modeling, covering simple and multiple regression, performance metrics, and applications across data science domains.

Also Read: R Programming Cheat Sheet: Essential Data Manipulation

Next, let’s look at some of the benefits of using multiple linear regression in R.

Benefits of using Multiple Linear Regression in R

Multiple linear regression in R is a powerful statistical method used to understand the relationship between a dependent variable and multiple independent variables. Unlike simple linear regression, which only examines the relationship between two variables, multiple linear regression allows for a more comprehensive analysis by controlling for the effects of other variables.

This technique is particularly valuable in real-world data analysis, where many factors influence the outcome. By incorporating multiple predictors, analysts can gain a more accurate and nuanced understanding of complex systems.

Here’s an overview of the key benefits of using multiple linear regression in R, along with relevant examples:

Benefit | Example
Controls the Effect of Other Variables | In predicting heart disease, controlling for factors like age, smoking, and exercise provides a clearer understanding of each variable’s impact.
Incorporates Multiple Variables | Analyzing the effect of various factors like temperature, rainfall, and fertilizer on crop yields, leading to better agricultural predictions.
Helps Understand the Relationship Between Variables | Understanding how the number of study hours and student height affect GPA, while controlling for other factors like socioeconomic background.
Provides Clear Decision-Making Insights | Using the model to predict gold prices based on interest rates, inflation, and historical trends, helping businesses make informed investment decisions.
Enables Graphical Representation of Effects | Visualizing how biking and smoking independently affect heart disease risk by plotting their relationships with the dependent variable.
Estimates Regression Coefficients for Better Prediction | Estimating the relationship between a driver’s age, experience, and distance covered, which helps improve operational efficiency for ride-sharing services like Uber.

You can also learn the fundamentals of logistic regression with upGrad’s free Logistic Regression for Beginners course. It covers univariate and multivariate models and their practical applications in data analysis and prediction.

Also Read: Linear Regression Vs. Logistic Regression: Difference Between Linear Regression & Logistic Regression

Applications of Multiple Linear Regression in R

Multiple linear regression is an essential technique in an analyst’s toolkit. Before looking at the details of how to build and plot multiple linear regression models in R, it helps to know the situations where the technique is commonly applied.

Below are five real-life applications where multiple linear regression in R can be applied:

1. Predicting Crop Yields Based on Environmental Factors

In agriculture, multiple linear regression is used to predict crop yields based on variables like rainfall, temperature, and fertilizer levels. For instance, a farmer could use regression to estimate the expected yield of a crop (dependent variable) by analyzing how different factors like rainfall, temperature, and fertilizer usage (independent variables) influence the yield.

R Example:

lm(crop_yield ~ rainfall + temperature + fertilizer, data = crop_data)

2. Forecasting Stock Prices or Commodity Prices

Multiple linear regression is used by financial analysts to forecast future prices of stocks or commodities based on historical data. For example, a regression model could predict the price of gold (dependent variable) based on factors like interest rates, inflation, and historical gold prices (independent variables).

R Example:

lm(gold_price ~ interest_rate + inflation + historical_price, data = stock_data)

3. Predicting an Uber Driver’s Distance Covered Based on Experience and Age

Ride-hailing services like Uber can use multiple linear regression to predict the distance covered by a driver (dependent variable) based on their age and years of experience (independent variables). This helps in understanding how experience and age influence driver productivity.

R Example:

lm(distance_covered ~ driver_age + years_of_experience, data = uber_data)

4. Analyzing GPA Based on Study Hours and Height

In educational settings, multiple linear regression can be applied to examine the relationship between a student's GPA (dependent variable) and the number of hours they study and their height (independent variables). This can help identify factors contributing to academic performance.

R Example:

lm(gpa ~ study_hours + height, data = student_data)

5. Predicting Employee Salaries Based on Experience and Age

Companies use multiple regression analysis to determine employee salaries (dependent variable) based on independent variables such as years of experience and age. This helps in setting salary structures that are fair and competitive based on these factors.

R Example:

lm(salary ~ years_of_experience + age, data = employee_data)

Understand the basics of building hypotheses with upGrad’s free Hypothesis Testing course. Learn hypothesis types, test statistics, p-value, and critical value methods from the ground up.

Also Read: 6 Types of Regression Models in Machine Learning You Should Know About

Next, let’s look at how upGrad can help you learn linear regression.

How Can upGrad Help You Learn Regression?

Multiple Linear Regression in R is a powerful tool for analyzing complex relationships between multiple variables. You can easily apply this technique to make data-driven decisions. Companies rely on regression models for various tasks, from predicting sales and analyzing consumer behavior to understanding market trends. 

upGrad can help you learn multiple linear regression in R by offering hands-on courses with expert mentorship. You’ll learn to apply multiple linear regression in R to solve real-life business problems.

In addition to the programs covered in this blog, upGrad also offers free courses to support you on your learning journey.

If you're unsure where to begin or which area to focus on, upGrad’s expert career counselors can guide you based on your goals. You can also visit a nearby upGrad offline center to explore course options, get hands-on experience, and speak directly with mentors! 


Reference:
https://www.scielo.br/j/abmvz/a/DRQJqHHBVkvZbq77hjZFW4k/

Frequently Asked Questions (FAQs)

1. How do I handle multicollinearity in multiple linear regression in R?

2. How do I assess the model’s goodness-of-fit in multiple linear regression?

3. Can I use multiple linear regression for categorical variables?

4. How do I interpret the coefficients of a multiple linear regression model?

5. What is the role of residual plots in multiple linear regression?

6. How do I check for outliers in multiple linear regression?

7. How can I perform variable selection in multiple linear regression?

8. What do I do if my multiple linear regression model is overfitting?

9. How do I interpret the p-value in multiple linear regression?

10. Can multiple linear regression be used for time series data in R?

11. What is the difference between simple and multiple linear regression in R?

Rohit Sharma

763 articles published

Rohit Sharma shares insights, skill building advice, and practical tips tailored for professionals aiming to achieve their career goals.
