Blog_Banner_Asset
    Homebreadcumb forward arrow iconBlogbreadcumb forward arrow iconArtificial Intelligences USbreadcumb forward arrow iconLogistic Regression for Machine Learning: A Complete Guide

Logistic Regression for Machine Learning: A Complete Guide

Last updated:
4th Oct, 2021
Views
Read Time
8 Mins
share image icon
In this article
Chevron in toc
View All
Logistic Regression for Machine Learning: A Complete Guide

Machine Learning models require algorithms to work. Depending on the set of conditions, a particular ML model can perform best using one or the other algorithm. As a result, Machine Learning engineers and enthusiasts should be aware of the different types of algorithms that can be used in different contexts – to know which one to use when the time comes. There is never a one-size-fits-all solution in Machine Learning, and tweaking with different algorithms can deliver the desired results. 

For example, you must already know about Linear Regression. However, this algorithm just cannot be applied to categorically dependent variables. This is where Logistic Regression comes in handy. 

In Machine Learning, Logistic Regression is a supervised method of learning used for predicting the probability of a dependent or a target variable. Using Logistic Regression, you can predict and establish relationships between dependent and one or more independent variables.

 Logistic Regression equations and models are generally used for predictive analytics for binary classification. You can also use them for multi-class classification. 

Ads of upGrad blog

Here is how the Logistic Regression equation for Machine Learning looks like: 

logit(p) = ln(p/(1-p)) = h0+h1X1+h2X2+h3X3….+hkXk

Where;

p= probability of the occurrence of the feature

x1,x2,..xk = set of input features

h1,h2,….hk = parametric values to be estimated in the Logistic Regression equation. 

 

Types of Logistic Regression Models in Machine Learning

Based on the way Logistic Regression is used, the type of Logistic Regression models can be classified as follows: 

1. Binary Logistic Regression Model

This is one of the most popularly used regression models for Logistic Regression. It helps categorize data into two classes and predict the value of a new input as belonging to either of the two classes. For instance, a patient’s tumour can either be benign or malignant but never both.

2. Multinomial Logistic Regression Model

This model helps classify target variables into more than two classes – regardless of any quantitative significance. An example of this could be predicting the type of food an individual is likely to order based on their diet preferences and past experiences. 

Join the Machine Learning Course online from the World’s top Universities – Masters, Executive Post Graduate Programs, and Advanced Certificate Program in ML & AI to fast-track your career.

3. Ordinal Logistic Regression Model

This model is used for classifying the target variable. For instance, a student’s performance in an examination can be classified as poor, good, and excellent in a hierarchical order. That way, the data is classified into three distinct categories, with each class having a specific level of importance.

The Logistic Regression equation can be used in several cases, such as spam detection, tumour classification, sex categorization, and many more. Let’s look at two of the most common example use cases of Logistic Regression equation in Machine Learning to help you understand better. 

Example use cases of Logistic Regression Equation

Example 1: Identifying Spam E-mails

Consider the class of 1 if the email is spam and 0 if the email is not. To detect this, multiple attributes are analyzed from the mail body. These include: 

  • The sender
  • Spelling errors
  • Keywords in the email such as “bank details”, “lucky”, “winner”, “congratulations”. 
  • Contact details or URL in the email

This extracted data can then be fed into the Logistic Regression equation for Machine Learning which will analyze all inputs and deliver a score between 0 and 1. If the score is greater than 0 but less than 0.5, the email will be classified as spam, and if the score is between 0.5 to 1, the mail is marked as non-spam. 

Example 2: Identifying Credit Card Fraud

Using Logistic Regression equations or Logistic Regression-based Machine Learning models, banks can promptly identify fraudulent credit card transactions. For this, details such as PoS, card number, transaction value, transaction data, and the likes are fed into the Logistic Regression model, which decides whether a given transaction is genuine (0) or fraudulent (1). For instance, if the purchase value is too high and deviates from typical values, the regression model assigns a value (between 0.5 and 1) that classifies the transaction as fraud.

Working of Logistic Regression in Machine Learning

Logistic Regression works by using the Sigmoid function to map the predictions to the output probabilities. This function is an S-shaped curve that plots the predicted values between 0 and 1. The values are then plotted towards the margins at the top and the bottom of the Y-axis, using 0 and 1 as the labels. Then, depending on these values, the independent variables can be classified. 

Here’s how the Sigmoid function looks like: 

The Sigmoid function is based on the following equation: 

y=1/(1+e^x) 

Where e^x= the exponential constant with a value of 2.718.

The Sigmoid function’s equation above provides the predicted value (y) as zero if x is considered negative. If x is a large positive number, the value predicted is close to one. 

Building a Logistic Regression Model in Python

Let’s walk through the process of building a Logistic Regression model in Python. For that, let’s use the Social Network dataset to carry out the regression analysis, and let’s try to predict whether or not an individual will purchase a particular car. Here’s how the steps look. 

Step 1: Importing the Libraries and Dataset

It begins by importing the libraries that are needed to build the model. This includes Pandas, Numpy, and Matplotlib. We also need to import the dataset we’ll be working with. Here’s what the code looks like: 

import numpy as np

import matplotlib.pyplot as pt

import pandas as pd

dataset = pd.read_csv(‘Social_Network.csv’)

Step 2: Splitting into Dependent and Independent Variables

Now it’s time to split the fed data into dependent and independent variables. For this example, we will consider the purchase value as the dependent variable during the estimated salary and age of the individuals as independent variables. 

x = dataset.iloc[:, [2,3]].values

y = dataset.iloc[:, 4].values

Step 3: Splitting the Dataset into a Training set and Test set

It is essential to split the dataset into specific training and test sets. The training set will train the Logistic Regression equation, while the test data will be used to validate the model’s training and test it. Sklearn is used to split the given dataset into two sets. We use the train_split_function by specifying the amount of data we wish to set aside for training and testing. 

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 0)

As you can see, we have defined the test size as 33% of the complete dataset. So, the remaining 66% will be used as the training data. 

Step 4: Scaling

To improve the accuracy of your Logistic Regression model, you’ll need to rescale the data and bring values that might be highly varying in nature. 

from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)

Step 5: Building the Logistic Regression model

Once that is done, you need to build the Logistic Regression model and fit it into the training set. Begin by importing the Logistic Regression algorithm from Sklearn.

from sklearn.linear_model import LogisticRegression

Then, create an instance classifier to fit the training data.

classifier = LogisticRegression(random_state=0)

classifier.fit(x_train, y_train)

Next, create predictions on the test dataset.

y_pred = classifier.predict(x_test)

Finally, check the performance of your Logistic Regression model using the Confusion matrix. 

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

acc = accuracy_score(y_test, y_pred)

print(acc)

print(cm)

Now, you can use Matplotlib to visualize the entire data set, including training and test sets! 

In Conclusion

Ads of upGrad blog

Logistic Regression is one of the tools that help in the development of Machine Learning models and algorithms. Likewise, there are multiple other algorithms, too, that are used depending on the use case at hand. However, to know which algorithm to use, you should be aware of all possible options. Only then will you be in a position to select the most fitting algorithm for your data set. 

Check out our Executive PG Program in Machine Learning designed in a way that takes you from scratch and helps you build your skills to the very top – so that you are in a position to solve any real-world Machine Learning problem. Check out the different courses and enrol in the one that feels right for you. Join upGrad and experience a holistic learning environment and placement support! 

Profile

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
Get Free Consultation

Select Coursecaret down icon
Selectcaret down icon
By clicking 'Submit' you Agree to  
UpGrad's Terms & Conditions

Our Best Artificial Intelligence Course

Frequently Asked Questions (FAQs)

1How many kinds of Logistic Regression for Machine Learning are possible?

Logistic Regression is broadly of three types:
1. Binary
2. Multinomial
3. Ordinal.

2What is Logistic Regression used for in Machine Learning?

Logistic Regression is one of the supervised learning methods used to find and build the best fit relationship between dependent and independent variables to make proper future predictions.

3What is the function that Logistic Regression for Machine Learning uses?

Logistic Regression for Machine Learning uses the Sigmoid function to find the best fit curve.

4What is Logistic Regression?

Logistic regression is a statistical analysis approach that predicts a data value based on previous observations from a data collection. The method enables a Machine Learning system to categorize input data based on past data. The system should improve its ability to anticipate classes within data sets as more relevant data is received. During the extract, transform and load process, logistic regression can help with data preparation by allowing data sets to be placed into specified buckets to stage data for analysis. A logistic regression model analyses the connection between one or more existing independent variables to determine a dependent data variable.

5How does Logistic Regression help machine learners?

Under the Supervised Learning technique, the most well-known Machine Learning algorithm is logistic regression. In Machine Learning, a categorical dependent variable's output is predicted using logistic regression. So, the result of the program must be either categorical or discrete. It can be Yes/No, 0/1, true/false, etc., but instead of giving precise values, it provides probabilistic values that are between 0 and 1. As it can generate probabilities and classify new data using both continuous and discrete datasets, logistic regression is a key Machine Learning approach. Logistic regression may be used to categorize observations based on multiple forms of data and can determine the most beneficial elements for classification.

6Is logistic regression a topic from Mathematics or Computer Science?

Machine Learning (ML) is a part of Data Science that lies at the confluence of Computer Science and Mathematics, with data-driven learning as its core. Supervised Learning, Unsupervised Learning, and Reinforcement Learning are the three subparts of Machine Learning, depending on the kind of learning. Supervised Learning is the most common and well-known of these learning styles. The two primary kinds of issues tackled in Supervised Learning are Classification and Regression. Classification is useful for categorizing data. You may further subdivide classification into generative and discriminative models. The most prevalent strategy in discriminative models is logistic regression.

Explore Free Courses

Suggested Blogs

Top 25 New & Trending Technologies in 2024 You Should Know About
63210
Introduction As someone deeply immersed in the ever-changing landscape of technology, I’ve witnessed firsthand the rapid evolution of trending
Read More

by Rohit Sharma

23 Jan 2024

Basic CNN Architecture: Explaining 5 Layers of Convolutional Neural Network [US]
6375
A CNN (Convolutional Neural Network) is a type of deep learning neural network that uses a combination of convolutional and subsampling layers to lear
Read More

by Pavan Vadapalli

15 Apr 2023

Top 10 Speech Recognition Softwares You Should Know About
5508
What is a Speech Recognition Software? Speech Recognition Software programs are computer programs that interpret human speech and convert it into tex
Read More

by Sriram

26 Feb 2023

Top 16 Artificial Intelligence Project Ideas & Topics for Beginners [2024]
6133
Artificial intelligence controls computers to resemble the decision-making and problem-solving competencies of a human brain. It works on tasks usuall
Read More

by Sriram

26 Feb 2023

15 Interesting Machine Learning Project Ideas For Beginners & Experienced [2024]
5614
Taking on machine learning projects as a beginner is an excellent way to gain hands-on experience and develop a better understanding of the fundamenta
Read More

by Sriram

26 Feb 2023

Explaining 5 Layers of Convolutional Neural Network
5208
A CNN (Convolutional Neural Network) is a type of deep learning neural network that uses a combination of convolutional and subsampling layers to lear
Read More

by Sriram

26 Feb 2023

20 Exciting IoT Project Ideas & Topics in 2024 [For Beginners & Experienced]
9770
IoT (Internet of Things) is a network that houses multiple smart devices connected to one Cloud source. This network can be regulated in several ways
Read More

by Sriram

25 Feb 2023

Why Is Time Complexity Important: Algorithms, Types & Comparison
7574
Time complexity is a measure of the amount of time needed to execute an algorithm. It is a function of the algorithm’s input size and the type o
Read More

by Sriram

25 Feb 2023

Curse of dimensionality in Machine Learning: How to Solve The Curse?
11274
Machine learning can effectively analyze data with several dimensions. However, it becomes complex to develop relevant models as the number of dimensi
Read More

by Sriram

25 Feb 2023

Schedule 1:1 free counsellingTalk to Career Expert
icon
footer sticky close icon