
Linear Regression Explained with Example

Last updated: 13th Oct, 2021
Read time: 7 mins

Linear regression is one of the most common techniques for establishing relationships between the variables of a dataset. Mathematical models like this are essential tools for data scientists performing predictive analysis. This blog covers the fundamental concepts and walks through several linear regression examples.


What are Regression Models?

A regression model describes the relationship between variables in a dataset by fitting a line to the observed data. It is a statistical analysis that identifies which variables have an impact and which matter most. It also indicates how certain we can be about the factors involved. The two kinds of variables are:

  • Dependent: the factor you are trying to predict or understand.
  • Independent: the factors you suspect have an impact on the dependent variable.

Regression models are generally used when the dependent variable is quantitative, although it may be binary, as in logistic regression. In this blog, we mainly focus on the linear regression model, where both variables are quantitative.


Suppose you have data on monthly sales and monthly rainfall for the past three years, and you plot it on a chart. The y-axis represents the number of units sold (the dependent variable), and the x-axis shows the total rainfall for the month (the independent variable). Each dot on the chart shows how much it rained in a particular month and the corresponding sales for that month.

If you take another look at the data, you might notice a pattern: sales tend to be higher in months when it rained more. But it would be tricky to estimate how much you would typically sell when it rained a certain amount, say 3 or 4 inches. Drawing a line through the middle of all the data points on the chart would give you some degree of certainty.

Nowadays, Excel and statistical software such as SPSS, R, or Stata can draw the line that best fits the data at hand. They can also output a formula for that line, including its slope and intercept.

Consider this formula for the above example: Y = 200 + 3X. It tells you that you would expect to sell 200 units in a month with no rain (i.e., when X = 0). Assuming the relationship continues to hold, every additional inch of rain is associated with an average of three more units sold. You would sell about 203 units if it rains 1 inch, 206 units if it rains 2 inches, 209 units if it rains 3 inches, and so on.
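To make the arithmetic concrete, here is a small Python sketch that evaluates the example line Y = 200 + 3X for a few rainfall values. The function name `predicted_sales` is purely illustrative.

```python
# The example line from the text: Y = 200 + 3X
INTERCEPT = 200  # expected units sold in a month with no rain (X = 0)
SLOPE = 3        # additional units sold per extra inch of rain


def predicted_sales(rain_inches: float) -> float:
    """Return the sales predicted by Y = 200 + 3X for a given rainfall."""
    return INTERCEPT + SLOPE * rain_inches


for inches in range(4):
    print(f"{inches} inch(es) of rain -> {predicted_sales(inches)} units")
```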

Typically, the regression formula also includes an error term (Y = 200 + 3X + error term). It accounts for the fact that independent variables are rarely perfect predictors of the dependent variable, and that the line only gives you an estimate based on the available data. The larger the error term, the less certain your regression line is.
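You can see the error term in practice by comparing the line's predictions with actual observations. The sketch below uses hypothetical monthly observations (invented for illustration, not taken from the article). It computes each residual, meaning the observed sales minus the sales predicted by Y = 200 + 3X.

```python
# Hypothetical observed months: (rainfall in inches, units actually sold).
observations = [(0, 201), (1, 202), (2, 206), (3, 209), (4, 211), (5, 216)]


def line(x):
    """Sales predicted by the example line Y = 200 + 3X."""
    return 200 + 3 * x


# The error (residual) for each month is observed minus predicted sales.
residuals = [y - line(x) for x, y in observations]
print("Residuals:", residuals)

# A simple summary of how far, on average, the line is from the observations.
mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
print(f"Mean absolute error: {mean_abs_error:.2f} units")
```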

Linear Regression Basics

A simple linear regression model uses a straight line to estimate the relationship between two quantitative variables. If you have more than one independent variable, you will use multiple linear regression instead.

Simple linear regression analysis is concerned with two things. First, it tells you how strong the relationship between the dependent and independent variables is in the historical data. Second, it estimates the value of the dependent variable at a given value of the independent variable.
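Both outputs can be sketched in a few lines of Python. The example below assumes a small, made-up dataset. It computes the least-squares slope and intercept, then the Pearson correlation coefficient r as a measure of strength (in simple linear regression, R² equals r²), and finally a prediction at one chosen value of x. The helper name `simple_regression_summary` is hypothetical.

```python
from math import sqrt


def simple_regression_summary(x, y):
    """Return (slope, intercept, r) for a simple least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # Pearson correlation: strength and direction
    return slope, intercept, r


# Hypothetical data for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [201, 202, 206, 209, 211, 216]

slope, intercept, r = simple_regression_summary(x, y)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")        # 1) strength of the relationship
prediction = intercept + slope * 2.5
print(f"Predicted y at x = 2.5: {prediction}")  # 2) value of y at a given x
```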

Consider this linear regression example. A social researcher who wants to know how individuals' income affects their happiness performs a simple regression analysis to see whether a linear relationship exists. The researcher collects quantitative values of the dependent variable (happiness) and the independent variable (income) by surveying people in a particular geographical area.

For instance, the data might contain income figures and happiness levels (rated on a scale from 1 to 10) for 500 people in the Indian state of Maharashtra. The researcher would then plot the data points and fit a regression line to see how much the respondents' earnings influence their wellbeing.

Linear regression analysis is based on a few assumptions about the data. These are:

  • Linearity of the relationship between the dependent and independent variables, i.e., the line of best fit is straight, not curved.
  • Homogeneity of variance, meaning the size of the prediction error does not change significantly across different values of the independent variable.
  • Independence of observations, meaning there are no hidden relationships among the observations in the dataset.
  • Normality of the errors: the residuals (and hence the dependent variable at any given value of the independent variable) should be approximately normally distributed. You can check this visually with a histogram, for example using the hist() function in R.
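Some of these assumptions can be eyeballed from the residuals. The following is a rough, informal Python sketch and not a substitute for proper diagnostics. It compares the residual variance in the lower and upper halves of x, which gives a crude look at homogeneity of variance. It also prints a text histogram of the residuals, a crude look at normality similar in spirit to hist() in R. The data, the fitted coefficients, and the function names are hypothetical. Independence usually has to be judged from how the data were collected.

```python
from collections import Counter
from statistics import pvariance

# Hypothetical data and an already-fitted line (slope 3, intercept 200).
x = [0, 1, 2, 3, 4, 5]
y = [201, 202, 206, 209, 211, 216]
slope, intercept = 3, 200

# Residual = observed minus predicted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, res):
    """Residual variance for the lower and upper halves of x.

    Very different values hint that the equal-variance assumption may not hold.
    """
    pairs = sorted(zip(x, res))
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pvariance(low), pvariance(high)


def text_histogram(res, width=1.0):
    """Print a crude text histogram of the residuals and return the bin counts."""
    bins = Counter(int(r // width) for r in res)
    for b in sorted(bins):
        print(f"[{b * width:5.1f}, {(b + 1) * width:5.1f}) {'#' * bins[b]}")
    return bins


low_var, high_var = spread_by_half(x, residuals)
print(f"Residual variance, low x: {low_var:.3f}, high x: {high_var:.3f}")
bins = text_histogram(residuals)
```

With real data you would want far more than six points before reading anything into these checks.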

The Math Behind Linear Regression

The standard equation is y = c + ax, where y is the output (the value we want to estimate), x is the input variable (the value we know), a is the slope of the line, and c is the constant, or intercept.

Here, the output varies linearly with the input. The slope determines how much y changes for each one-unit change in x. The constant is the value of y when x is zero.

Let's understand this through another linear regression example. Imagine that you work at an automobile company and want to study India's passenger vehicle market. Suppose that national GDP influences passenger vehicle sales. To plan better for the business, you might want to find a linear equation relating the number of vehicles sold in the country to GDP.

For this, you would need sample data on year-wise passenger vehicle sales and the GDP figures for each year. You might discover that the current year's GDP affects the following year's sales: in years when GDP was lower, vehicle sales were lower in the subsequent year.

To prepare this data for Machine Learning analytics, you would need to do a little more work. 

  • Start with the equation y = c + ax, where y is the number of vehicles sold in a year and x is the GDP of the prior year.
  • To find c and a in the above problem, you can create a model using Python.
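Here is one way the preparation and fit might look in Python, as a minimal sketch. The GDP and sales figures are invented for illustration and are not real Indian data. The least-squares formulas are written out by hand rather than taken from a particular library. The key step is pairing each year's sales with the previous year's GDP.

```python
# Hypothetical, illustrative figures only (not real data).
years = [2015, 2016, 2017, 2018, 2019]
gdp = [100, 104, 110, 113, 120]                 # GDP index, arbitrary units
vehicles_sold = [2700, 2800, 2900, 3100, 3250]  # thousands of vehicles

# Pair each year's sales (y) with the PRIOR year's GDP (x).
x = gdp[:-1]           # GDP for 2015-2018
y = vehicles_sold[1:]  # sales for 2016-2019
for year, gdp_prev, sold in zip(years[1:], x, y):
    print(f"{year}: sales {sold} vs prior-year GDP {gdp_prev}")

# Least-squares estimates for y = c + a*x.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
a = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
c = y_bar - a * x_bar
print(f"sales = {c:.1f} + {a:.2f} * prior-year GDP")
```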


If you perform simple linear regression in R, interpreting and reporting the results becomes much easier.

For the same linear regression example, let us write the equation as y = B0 + B1x + e. Again, y is the dependent variable, and x is the independent or known variable. B0 is the constant or intercept, B1 is the regression coefficient (the slope of the line), and e is the error of the estimate.

Statistical software like R can find the line of best fit through the data by searching for the values of B0 and B1 that minimise the total squared error of the model (the ordinary least-squares method).
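To illustrate what minimising the total error means, the sketch below computes the ordinary least-squares estimates of B0 and B1 in closed form. It then checks that nudging B1 away from its estimate increases the sum of squared errors. This is the criterion that R's lm() optimises. The data and the helper names (`ols`, `sse`) are hypothetical.

```python
def sse(x, y, b0, b1):
    """Sum of squared errors of the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def ols(x, y):
    """Closed-form ordinary least-squares estimates (B0, B1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical GDP (x) and passenger vehicle sales (y), for illustration only.
x = [100, 104, 110, 113, 120]
y = [2800, 2900, 3100, 3250, 3400]

b0, b1 = ols(x, y)
best = sse(x, y, b0, b1)
print(f"B0 = {b0:.2f}, B1 = {b1:.3f}, total squared error = {best:.2f}")

# Any other slope (keeping B0 fixed) gives a larger total squared error.
for step in (-0.5, 0.5):
    print(f"B1 = {b1 + step:.3f}: error = {sse(x, y, b0, b1 + step):.2f}")
```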

Follow these steps to begin:

  • Load the passenger vehicle sales dataset into the R environment.
  • Run the command to generate a linear model that describes the relationship between passenger vehicle sales and GDP. 
    • sales.gdp.lm <- lm(sales ~ gdp, data = sales.data)
  • Use the summary() function to view the most important linear model parameters in tabulated form.
    • summary(sales.gdp.lm)

    Note: The output contains sections such as Call, Residuals, and Coefficients. The 'Call' section states the formula used. The 'Residuals' section reports the minimum, first quartile, median, third quartile, and maximum of the residuals, which indicate how well the model fits the real data. The first row of the 'Coefficients' table estimates the y-intercept, and the second row gives the regression coefficient. The columns of this table are labelled Estimate, Std. Error, t value, and Pr(>|t|) (the p-value).

  • Plug the (Intercept) estimate and the gdp coefficient into the regression equation to predict sales across the range of GDP values.
  • Check the Estimate column to see the effect. The regression coefficient tells you how much sales change, on average, for each one-unit increase in GDP.
  • Use the Std. Error column to see how much variation there is in your estimate of the relationship between sales and GDP.
  • Look at the test statistic in the t value column (the estimate divided by its standard error) to judge whether the result could have occurred by chance. The larger the absolute t value, the less likely this is.
  • Check the Pr(>|t|) column, which gives the p-value: the probability of seeing an effect of GDP on sales at least this large if the null hypothesis (no effect) were true.
  • Present your results with the estimated effect, standard error, and p-value, clearly communicating what the regression coefficient means.
  • Include a graph with the report. A simple linear regression can be shown as a scatter plot with the regression line and its equation.
  • Calculate the error by measuring the distance between the observed and predicted y values at each value of x, squaring these distances, and taking their mean (the mean squared error).
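The main numbers from the Coefficients table, and the mean squared error from the last step, can be reproduced by hand. The Python sketch below uses hypothetical GDP and sales values. It computes the Estimate, Std. Error, and t value for the intercept and the slope with the standard simple-regression formulas. It omits p-values, which require the t-distribution CDF; R's summary() reports them for you. Note that R's residual standard error divides the sum of squared errors by n - 2, whereas the mean squared error described above divides by n.

```python
from math import sqrt


def coefficient_table(x, y):
    """Estimate, Std. Error and t value for the intercept and slope.

    Loosely mirrors the Coefficients table of R's summary(lm(...));
    p-values are omitted (they need a t-distribution CDF).
    Also returns the mean squared error (sum of squared residuals / n).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    s = sqrt(sse / (n - 2))  # residual standard error, n - 2 degrees of freedom
    se_b1 = s / sqrt(sxx)
    se_b0 = s * sqrt(1 / n + x_bar ** 2 / sxx)
    mse = sse / n  # mean of the squared observed-minus-predicted distances
    table = {
        "(Intercept)": (b0, se_b0, b0 / se_b0),
        "gdp": (b1, se_b1, b1 / se_b1),
    }
    return table, mse


# Hypothetical data for illustration only.
gdp = [100, 104, 110, 113, 120]
sales = [2800, 2900, 3100, 3250, 3400]

table, mse = coefficient_table(gdp, sales)
print(f"{'':12}{'Estimate':>12}{'Std. Error':>12}{'t value':>10}")
for name, (est, se, t) in table.items():
    print(f"{name:12}{est:12.3f}{se:12.3f}{t:10.2f}")
print(f"Mean squared error: {mse:.2f}")
```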

Conclusion

With the above linear regression examples, we have given you an overview of building a simple linear regression model, finding the regression coefficient, and calculating the error of the estimate. We also touched upon the relevance of Python and R for predictive data analytics and statistics. Practical knowledge of such tools is crucial for pursuing careers in data science and machine learning today.

If you want to hone your programming skills, check out the Advanced Certificate Programme in Machine Learning by IIT Madras and upGrad. The online course also includes case studies, projects, and expert mentorship sessions that give the training an industry focus.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.