
All about Linear Regression using Scikit

Last updated: 7th Sep, 2022

Supervised machine learning covers two primary kinds of problems: classification and regression. Classification predicts discrete outputs, while regression predicts continuous values.

In algebra, linearity denotes a straight-line relationship between variables. Plotted on a graph, such a relationship appears as a straight line.

Linear regression is a supervised machine learning algorithm. It looks for the line that best fits the data points on a plot. In its simplest form, it models the relationship between one dependent variable and one independent variable with a straight line, which can then be used to estimate the dependent variable for a given value of the independent one.

A linear regression model fits the linear relationship between the independent variables and the dependent variable that has the lowest cost, that is, the smallest overall error between the predicted and the observed values of the dependent variable.

In mathematics, a simple linear regression model is commonly written in three equivalent ways (y being the dependent variable and x the independent variable):

  • y = intercept + (slope × x) + error
  • y = constant + (coefficient × x) + error
  • y = a + bx + e
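To make the notation concrete, here is a minimal sketch that estimates a (the intercept) and b (the slope) with the standard closed-form least-squares formulas. The data is hypothetical and generated from y = 2 + 3x with no error term, so the fit should recover those values.

```python
import numpy as np

# Hypothetical data: y = 2 + 3x with no error term, so the fit is exact.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Closed-form least-squares estimates for y = a + b*x + e
x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

# Residuals: e = y - (a + b*x)
e = y - (a + b * x)

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```

With real, noisy data the residuals e would not be zero; the fitted line is the one that makes their squared sum as small as possible.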

Why is linear regression essential?

Linear regression models are comparatively simple and easy to use. They provide an interpretable mathematical formula for generating predictions. Linear regression is useful in many fields, for instance academic research and business studies.

Linear regression is a well-established statistical method for making predictions from data. It is used across the environmental, behavioural, and social sciences, among others.

Because linear regression is a long-established statistical procedure, the properties of these models are well understood and the models are quick to train. It also helps turn large raw data sets into actionable information.

Key assumptions of effective linear regression

  • The number of valid cases, mean, and standard deviation should be considered for each variable. 
  • For each model: Regression coefficients, correlation matrix, part and partial correlations, standard error of the estimate, analysis-of-variance table, predicted values, and residuals should be considered.
  • Plots: Scatterplots, histograms, partial plots, and normal probability plots are considered.
  • Data: The dependent and independent variables must be quantitative. Categorical variables need to be re-coded to binary (dummy) variables or other types of contrast variables.
  • Other assumptions: For every value of a given independent variable, the dependent variable should be normally distributed. The variance of that distribution should be constant across all values of the independent variable. The relationship between the dependent variable and each independent variable should be linear. Plus, all observations should be independent.
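As an illustration of the data requirement above, here is a minimal sketch of re-coding a categorical variable into binary (dummy) columns with pandas. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'region' is categorical, 'sales' is quantitative.
df = pd.DataFrame({
    "region": ["north", "south", "east", "south"],
    "sales": [10.0, 12.5, 9.0, 13.0],
})

# One binary (dummy) column per category; drop_first=True drops one level
# to avoid perfectly collinear columns when the model has an intercept.
encoded = pd.get_dummies(df, columns=["region"], drop_first=True, dtype=int)

print(encoded)
```

The dropped category ("east" here) becomes the baseline against which the other regions are compared.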

Here is an existing example of a simple linear regression:

The dataset in the example contains daily weather conditions recorded around the world over a particular period. It includes factors like precipitation, snowfall, temperatures, wind speed, thunderstorms, and other possible weather conditions.

This problem aims to use the simple linear regression model to predict the maximum temperature while taking the minimum temperature as the input. 

Firstly, all the libraries need to be imported.

import pandas as pd  

import numpy as np  

import matplotlib.pyplot as plt  

import seaborn as seabornInstance 

from sklearn.model_selection import train_test_split 

from sklearn.linear_model import LinearRegression

from sklearn import metrics

%matplotlib inline

To load the dataset using pandas, use the following command:

dataset = pd.read_csv('/Users/nageshsinghchauhan/Documents/projects/ML/ML_BLOG_LInearRegression/Weather.csv')

To check the number of rows and columns in the dataset, use the following command:

dataset.shape

The output received should be (119040, 31), which means the data contains 119040 rows and 31 columns.

To see the statistical details of the dataset, use describe():

dataset.describe()
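The walkthrough above ends with describing the data. As a minimal sketch of the stated goal (predicting the maximum temperature from the minimum temperature), the fitting step could look like the following. The column names MinTemp and MaxTemp are assumptions, and the small synthetic frame is only a stand-in so that the code runs without the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weather data; 'MinTemp' and 'MaxTemp' are
# assumed column names, not confirmed by the walkthrough above.
rng = np.random.default_rng(0)
min_temp = rng.uniform(-5.0, 25.0, size=200)
weather = pd.DataFrame({
    "MinTemp": min_temp,
    "MaxTemp": 10.0 + 0.9 * min_temp + rng.normal(0.0, 1.0, size=200),
})

X = weather[["MinTemp"]]  # 2-D input: one feature column
y = weather["MaxTemp"]    # 1-D target

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
print("R^2 on test data:", model.score(X_test, y_test))
```

With the real dataset, the synthetic frame would be replaced by the loaded one, and the two column names adjusted to whatever the file actually uses.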

Here is another example that demonstrates how to import and use the Python libraries needed to apply linear regression to a given data set:

1. Importing all the required libraries

import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

from sklearn import preprocessing, svm

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

2. Reading the data set and exploring the data scatter

%cd C:\Users\Dev\Desktop\Kaggle\Salinity

# Changing the working directory to the location of the dataset (IPython/Jupyter command)

df = pd.read_csv('bottle.csv')

df_binary = df[['Salnty', 'T_degC']].copy()

# Taking only the selected two attributes from the dataset

# (.copy() so that later in-place changes do not act on a slice of df)

df_binary.columns = ['Sal', 'Temp']

# Renaming the columns for easier writing of the code

df_binary.head()

# Displaying the first five rows along with the column names

sns.lmplot(x="Sal", y="Temp", data=df_binary, order=2, ci=None)

# Plotting the data scatter

3. Data cleaning

# Filling NaN (missing) values with the previous valid value (forward fill)

df_binary.ffill(inplace=True)
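Forward filling replaces a missing value with the last valid value before it, so a missing value at the very start of a column has nothing to copy and stays missing. A minimal sketch on a hypothetical four-row column shows why dropping any remaining NaN rows is still worthwhile afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading and an interior missing value.
sample = pd.DataFrame({"Sal": [np.nan, 33.4, np.nan, 33.6]})

filled = sample.ffill()      # interior NaN takes the previous valid value
cleaned = filled.dropna()    # the leading NaN has nothing before it, so drop it

print(filled)
print(cleaned)
```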

4. Training the model

df_binary.dropna(inplace=True)

# Dropping any rows that still contain NaN values, before building the arrays

X = np.array(df_binary['Sal']).reshape(-1, 1)

y = np.array(df_binary['Temp']).reshape(-1, 1)

# Separating the data into independent and dependent variables

# Converting each column into a numpy array and reshaping it to 2-D,

# since scikit-learn expects a 2-D feature matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

# Splitting the data into training and testing data

regr = LinearRegression()

regr.fit(X_train, y_train)

# Fitting the model on the training data

print(regr.score(X_test, y_test))

# Printing the R² score of the model on the test data

5. Exploring the results

y_pred = regr.predict(X_test)

plt.scatter(X_test, y_test, color='b')

plt.plot(X_test, y_pred, color='k')

plt.show()

# Data scatter of predicted values
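Beyond the R² score and the plot, error metrics give a numeric summary of how far the predictions are from the true values. The sketch below uses sklearn.metrics on small hypothetical arrays that stand in for y_test and y_pred.

```python
import numpy as np
from sklearn import metrics

# Hypothetical true and predicted values standing in for y_test and y_pred.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 10.0])

mae = metrics.mean_absolute_error(y_true, y_hat)  # average absolute error
mse = metrics.mean_squared_error(y_true, y_hat)   # average squared error
rmse = np.sqrt(mse)                               # same units as the target
r2 = metrics.r2_score(y_true, y_hat)              # share of variance explained

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
```

In the walkthrough, the same calls would take y_test and y_pred as their arguments.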

6. Working with a smaller dataset

df_binary500 = df_binary.iloc[:500]

# Selecting the first 500 rows of the data

sns.lmplot(x="Sal", y="Temp", data=df_binary500, order=2, ci=None)

Rohan Vats

Blog Author
Software Engineering Manager @ upGrad. Passionate about building large scale web apps with delightful experiences. In pursuit of transforming engineers into leaders.
Frequently Asked Questions (FAQs)

1. What is linear regression used for?

This kind of analysis is generally used to predict the value of one variable based on another known variable. The variable being predicted is called the dependent variable, and the variable used to predict it is called the independent variable.

2. How to install scikit-learn?

There are two common options. The quickest, for people who have it available, is to install the scikit-learn version provided by the operating system or Python distribution. The other is to install the latest official release.

3. How does scikit-learn work?

Scikit-learn provides a range of supervised and unsupervised learning algorithms through a consistent Python interface. It is licensed under a permissive BSD license and is distributed with many Linux distributions, which encourages its use in both business and education.
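The consistent interface means that different estimators are trained and used through the same methods. The sketch below fits two regressors on hypothetical data following y = 1 + 2x. Ridge is not part of this article; it is chosen here only as a second estimator to show that fit, predict, and score are called in the same way.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical data following y = 1 + 2x exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel()

# Different estimators share the same fit / predict / score methods.
results = {}
for model in (LinearRegression(), Ridge(alpha=1.0)):
    model.fit(X, y)
    name = type(model).__name__
    results[name] = model.predict(np.array([[10.0]]))[0]
    print(name, results[name], model.score(X, y))
```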
