Logistic Regression for Machine Learning: A Complete Guide

Machine Learning models require algorithms to work. Depending on the set of conditions, a particular ML model can perform best using one or the other algorithm. As a result, Machine Learning engineers and enthusiasts should be aware of the different types of algorithms that can be used in different contexts – to know which one to use when the time comes. There is never a one-size-fits-all solution in Machine Learning, and tweaking with different algorithms can deliver the desired results. 

For example, you must already know about Linear Regression. However, this algorithm just cannot be applied to categorically dependent variables. This is where Logistic Regression comes in handy. 

In Machine Learning, Logistic Regression is a supervised method of learning used for predicting the probability of a dependent or a target variable. Using Logistic Regression, you can predict and establish relationships between dependent and one or more independent variables.

 Logistic Regression equations and models are generally used for predictive analytics for binary classification. You can also use them for multi-class classification. 

Here is how the Logistic Regression equation for Machine Learning looks like: 

logit(p) = ln(p/(1-p)) = h0+h1X1+h2X2+h3X3….+hkXk

Where;

p= probability of the occurrence of the feature

x1,x2,..xk = set of input features

h1,h2,….hk = parametric values to be estimated in the Logistic Regression equation. 

 

Types of Logistic Regression Models in Machine Learning

Based on the way Logistic Regression is used, the type of Logistic Regression models can be classified as follows: 

1. Binary Logistic Regression Model

This is one of the most popularly used regression models for Logistic Regression. It helps categorize data into two classes and predict the value of a new input as belonging to either of the two classes. For instance, a patient’s tumour can either be benign or malignant but never both.

2. Multinomial Logistic Regression Model

This model helps classify target variables into more than two classes – regardless of any quantitative significance. An example of this could be predicting the type of food an individual is likely to order based on their diet preferences and past experiences. 

3. Ordinal Logistic Regression Model

This model is used for classifying the target variable. For instance, a student’s performance in an examination can be classified as poor, good, and excellent in a hierarchical order. That way, the data is classified into three distinct categories, with each class having a specific level of importance.

The Logistic Regression equation can be used in several cases, such as spam detection, tumour classification, sex categorization, and many more. Let’s look at two of the most common example use cases of Logistic Regression equation in Machine Learning to help you understand better. 

Example use cases of Logistic Regression Equation

Example 1: Identifying Spam E-mails

Consider the class of 1 if the email is spam and 0 if the email is not. To detect this, multiple attributes are analyzed from the mail body. These include: 

  • The sender
  • Spelling errors
  • Keywords in the email such as “bank details”, “lucky”, “winner”, “congratulations”. 
  • Contact details or URL in the email

This extracted data can then be fed into the Logistic Regression equation for Machine Learning which will analyze all inputs and deliver a score between 0 and 1. If the score is greater than 0 but less than 0.5, the email will be classified as spam, and if the score is between 0.5 to 1, the mail is marked as non-spam. 

Example 2: Identifying Credit Card Fraud

Using Logistic Regression equations or Logistic Regression-based Machine Learning models, banks can promptly identify fraudulent credit card transactions. For this, details such as PoS, card number, transaction value, transaction data, and the likes are fed into the Logistic Regression model, which decides whether a given transaction is genuine (0) or fraudulent (1). For instance, if the purchase value is too high and deviates from typical values, the regression model assigns a value (between 0.5 and 1) that classifies the transaction as fraud.

Working of Logistic Regression in Machine Learning

Logistic Regression works by using the Sigmoid function to map the predictions to the output probabilities. This function is an S-shaped curve that plots the predicted values between 0 and 1. The values are then plotted towards the margins at the top and the bottom of the Y-axis, using 0 and 1 as the labels. Then, depending on these values, the independent variables can be classified. 

Here’s how the Sigmoid function looks like: 

The Sigmoid function is based on the following equation: 

y=1/(1+e^x) 

Where e^x= the exponential constant with a value of 2.718.

The Sigmoid function’s equation above provides the predicted value (y) as zero if x is considered negative. If x is a large positive number, the value predicted is close to one. 

Building a Logistic Regression Model in Python

Let’s walk through the process of building a Logistic Regression model in Python. For that, let’s use the Social Network dataset to carry out the regression analysis, and let’s try to predict whether or not an individual will purchase a particular car. Here’s how the steps look. 

Step 1: Importing the Libraries and Dataset

It begins by importing the libraries that are needed to build the model. This includes Pandas, Numpy, and Matplotlib. We also need to import the dataset we’ll be working with. Here’s what the code looks like: 

import numpy as np

import matplotlib.pyplot as pt

import pandas as pd

dataset = pd.read_csv(‘Social_Network.csv’)

Step 2: Splitting into Dependent and Independent Variables

Now it’s time to split the fed data into dependent and independent variables. For this example, we will consider the purchase value as the dependent variable during the estimated salary and age of the individuals as independent variables. 

x = dataset.iloc[:, [2,3]].values

y = dataset.iloc[:, 4].values

Step 3: Splitting the Dataset into a Training set and Test set

It is essential to split the dataset into specific training and test sets. The training set will train the Logistic Regression equation, while the test data will be used to validate the model’s training and test it. Sklearn is used to split the given dataset into two sets. We use the train_split_function by specifying the amount of data we wish to set aside for training and testing. 

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.33, random_state = 0)

As you can see, we have defined the test size as 33% of the complete dataset. So, the remaining 66% will be used as the training data. 

Step 4: Scaling

To improve the accuracy of your Logistic Regression model, you’ll need to rescale the data and bring values that might be highly varying in nature. 

from sklearn.preprocessing import StandardScaler

sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)

X_test = sc_X.transform(X_test)

Step 5: Building the Logistic Regression model

Once that is done, you need to build the Logistic Regression model and fit it into the training set. Begin by importing the Logistic Regression algorithm from Sklearn.

from sklearn.linear_model import LogisticRegression

Then, create an instance classifier to fit the training data.

classifier = LogisticRegression(random_state=0)

classifier.fit(x_train, y_train)

Next, create predictions on the test dataset.

y_pred = classifier.predict(x_test)

Finally, check the performance of your Logistic Regression model using the Confusion matrix. 

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

acc = accuracy_score(y_test, y_pred)

print(acc)

print(cm)

Now, you can use Matplotlib to visualize the entire data set, including training and test sets! 

In Conclusion

Logistic Regression is one of the tools that help in the development of Machine Learning models and algorithms. Likewise, there are multiple other algorithms, too, that are used depending on the use case at hand. However, to know which algorithm to use, you should be aware of all possible options. Only then will you be in a position to select the most fitting algorithm for your data set. 

Check out our Executive PG Program in Machine Learning designed in a way that takes you from scratch and helps you build your skills to the very top – so that you are in a position to solve any real-world Machine Learning problem. Check out the different courses and enrol in the one that feels right for you. Join upGrad and experience a holistic learning environment and placement support! 

How many kinds of Logistic Regression for Machine Learning are possible?

Logistic Regression is broadly of three types:
1. Binary
2. Multinomial
3. Ordinal.

What is Logistic Regression used for in Machine Learning?

Logistic Regression is one of the supervised learning methods used to find and build the best fit relationship between dependent and independent variables to make proper future predictions.

What is the function that Logistic Regression for Machine Learning uses?

Logistic Regression for Machine Learning uses the Sigmoid function to find the best fit curve.

Enhance Your Career in Machine Learning and Artificial Intelligence

0 replies on “Logistic Regression for Machine Learning: A Complete Guide”

Accelerate Your Career with upGrad

Our Best Artificial Intelligence Course

×