
Multinomial Logistic Regression in Machine Learning: Examples and Applications

By Pavan Vadapalli

Updated on Jun 26, 2025 | 27 min read | 8.49K+ views


Did you know? The PIANO algorithm accelerates Multinomial Logistic Regression by using parallel methods to speed up parameter estimation, reducing computation time for large datasets. This breakthrough is transforming industries like image processing and bioinformatics!

Multinomial Logistic Regression (MLR) extends binary logistic regression to handle dependent variables with more than two categories. While binary logistic regression is used when the outcome has only two possible categories (such as "yes" or "no"), MLR is applied when the outcome involves more than two categories.

It estimates the probability of each class based on independent variables. For example, it can predict a customer's choice among multiple products or classify types of diseases based on symptoms. This blog covers MLR's mechanism, applications, and Python implementation, offering guidance on its effective use in machine learning.


Understanding Multinomial Logistic Regression: Key Elements and Examples

Multinomial Logistic Regression models the relationship between multiple independent variables and a categorical dependent variable with more than two possible outcomes. The model computes the probability of each class relative to a reference category, utilizing a logit function.

The following table outlines the core elements of MLR, providing a technical breakdown of how the model works and how each component contributes to the overall prediction process.

| Element | Description |
| --- | --- |
| Iteration History | Records the log-likelihood and parameter estimates at each step of the fitting algorithm, showing how the model is gradually refined until it converges. |
| Parameter Coefficients | Numerical values that indicate each independent variable's influence on the probability of each outcome relative to the reference category. |
| Asymptotic Covariance & Correlation | Estimated covariance and correlation matrices of the parameter estimates, used to derive standard errors and to see how the coefficient estimates depend on one another. |
| Classification: Observed vs. Predicted | A comparison of predicted versus actual outcomes to assess the model's accuracy. |


Practical Example: Predicting Movie Genre Preferences

To illustrate the application of MLR, consider a scenario where a movie studio wants to predict the type of film a moviegoer is likely to watch. The prediction is based on factors like age, gender, and relationship status. In this case:

  • Dependent Variable: Movie genre (e.g., Action, Comedy, Drama)
  • Independent Variables: Age (continuous), Gender (categorical), Relationship Status (categorical)

By applying MLR, the studio can estimate the probability of a customer selecting each genre based on factors like age and gender, enabling targeted marketing. This approach applies to healthcare (predicting disease outcomes) and retail (predicting product choices).

The graph below illustrates movie genre preferences based on age and gender, showing how these factors influence genre selection. This visualization highlights the core functionality of multinomial logistic regression in predictive modeling.
 


Also Read: Reinforcement Learning in Machine Learning: How It Works, Key Algorithms, and Challenges

Key Characteristics and Assumptions of Multinomial Logistic Regression

Multinomial Logistic Regression (MLR) extends binary logistic regression to handle multiclass classification problems. It predicts the probability of each outcome relative to a reference category. 

It uses the logit function to model probabilities for categorical outcomes without inherent ordering. MLR is applied across fields like marketing, healthcare, and retail to predict multiple possible outcomes based on continuous and categorical data.

The model’s flexibility makes it powerful for complex decision-making and predictive analysis.

Below are its key characteristics:

1. Handles Multiple Categories Without Assuming an Order

MLR works with categorical outcomes with no natural ranking or order, such as predicting transportation modes (car, bus, train).

2. Uses the Logit Function to Model Probabilities

The model calculates the log odds of each category relative to a reference category, which are then converted into probabilities ranging from 0 to 1. This helps predict outcomes like movie genre preferences, where categories are not ordered.

3. Requires a Reference Category

MLR uses one category as the reference, and the other categories are compared against it to calculate the relative probabilities. This choice does not affect overall predictive accuracy but influences model interpretation.

4. Supports Categorical and Continuous Predictor Variables

MLR can work with both types of predictor variables. For example, it can analyze a continuous variable like income and a categorical variable like occupation type to predict customer behavior.

5. Assumes Independence of Irrelevant Alternatives (IIA)

The model assumes that the relative odds of choosing between any two categories are unaffected by the presence or absence of other alternatives. While this assumption keeps the model simple, it may not hold in more complex, real-world situations, for example when two alternatives are close substitutes.
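The IIA property can be checked numerically. The following is a minimal sketch, assuming made-up linear scores for three transport modes (the numbers and variable names are illustrative, not taken from any dataset): under a softmax model, the odds of car versus bus stay the same whether or not train is available.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of linear scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical linear scores for three alternatives: car, bus, train
scores_all = np.array([1.2, 0.4, 0.9])
p_all = softmax(scores_all)

# Drop "train" and recompute probabilities over the remaining alternatives
p_two = softmax(scores_all[:2])

odds_car_vs_bus_all = p_all[0] / p_all[1]
odds_car_vs_bus_two = p_two[0] / p_two[1]

print("car:bus odds with train available:", odds_car_vs_bus_all)
print("car:bus odds with train removed:  ", odds_car_vs_bus_two)
```

Both ratios equal exp(1.2 − 0.4), which is why the model can misbehave when a removed alternative would in reality shift choices unevenly toward a close substitute.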

Assumptions Underlying Multinomial Logistic Regression

Multinomial Logistic Regression (MLR) relies on certain assumptions that, if met, lead to more reliable and valid results. Before implementing MLR, ensuring that the data adheres to the following assumptions is essential for accurate results:

| Assumption | Description |
| --- | --- |
| Dependent and Independent Variables | The dependent variable must be categorical with more than two classes, and the independent variables can be continuous, categorical, or ordinal. |
| Nominal or Ordinal Dependent Variables | MLR is suited for nominal variables with categories that have no inherent order. Ordinal logistic regression is preferred for ordinal outcomes. |
| Independent Variables Can Be Continuous, Ordinal, or Nominal | MLR can handle various variable types, providing flexibility in modeling complex real-world scenarios. |
| Categories Must Be Mutually Exclusive and Exhaustive | Each observation must belong to only one category. This ensures accurate predictions and prevents overlapping categories. |
| No Multicollinearity Among Independent Variables | Independent variables should not be highly correlated, as multicollinearity can distort the model's estimates and make interpretations difficult. |
| No Significant Outliers | Outliers can skew model results, so they must be detected and handled during preprocessing to avoid inaccurate predictions. |
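One common way to screen for multicollinearity is the variance inflation factor (VIF). The sketch below is an illustration on synthetic data, not part of the original walkthrough: the column names, the helper `variance_inflation_factors`, and the rule of thumb that values above roughly 5 to 10 deserve a closer look are all assumptions for demonstration.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF per column: 1 / (1 - R^2) from regressing that column on the others."""
    n, d = X.shape
    vifs = []
    for j in range(d):
        target = X[:, j]
        # Design matrix: intercept plus every column except column j
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r_squared = 1.0 - residuals.var() / target.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Synthetic predictors: "spending" is built to be strongly correlated with "income"
rng = np.random.default_rng(0)
income = rng.normal(50, 10, size=200)
spending = 0.8 * income + rng.normal(0, 2, size=200)
age = rng.normal(40, 12, size=200)

X = np.column_stack([income, spending, age])
vifs = variance_inflation_factors(X)
print(dict(zip(["income", "spending", "age"], vifs.round(2))))
```

In this setup the two correlated columns receive large VIF values while the independent column stays close to 1, which is the pattern to look for before fitting the model.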

Also Read: ML Types Explained: A Complete Guide to Data Types in Machine Learning


Having covered the assumptions behind multinomial logistic regression, let's now explore how it functions and break down the key mechanics of this technique. 

How Multinomial Logistic Regression Functions: A Detailed Breakdown

Multinomial Logistic Regression extends binary logistic regression to handle classification tasks with three or more unordered categories. It estimates the probability of each possible outcome relative to a reference category using the logit function, calculating distinct sets of coefficients for each category. 

Unlike linear regression, which assumes a continuous dependent variable, MLR models categorical outcomes by transforming probabilities into log odds, ensuring that the sum of all predicted probabilities equals 1. This approach is particularly suited for multi-class classification problems, offering a robust method for predicting complex categorical outcomes.

The Logit Function and Probability Estimation

Multinomial logistic regression applies the softmax (generalized logit) function to map predicted values to probabilities, ensuring they fall between 0 and 1. The logit function is expressed as:

$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$$

Source: The analysis factor

Where:

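  • P is the probability of an instance belonging to the category.
  • 1-P is the probability of the reference category when only the category and the reference are compared, as in binary logistic regression. With more than two categories, each non-reference category gets its own equation of this form, with the reference category's probability in the denominator.
  • β0 is the intercept.
  • β1, β2, ..., βk are the coefficients for predictor variables X1, X2, ..., Xk.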

This function ensures that:

  • The sum of all predicted probabilities for a given instance equals 1.
  • The probability of a category increases if its exponentiated logit value is larger than others.
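To make the two properties above concrete, here is a minimal sketch that turns log odds relative to a reference category into class probabilities. The coefficient values and the helper name `class_probabilities` are hypothetical, chosen only for illustration; the reference category is treated as having all-zero coefficients.

```python
import numpy as np

def class_probabilities(x, coefs, intercepts):
    """Return probabilities for the reference category followed by each other category.

    coefs has shape (n_categories - 1, n_features). The reference category
    implicitly has zero coefficients, so its linear score is 0 and exp(0) = 1.
    """
    log_odds = intercepts + coefs @ x      # log(P_j / P_ref) for each non-reference j
    exp_terms = np.exp(log_odds)
    denom = 1.0 + exp_terms.sum()          # the 1.0 is the reference category's term
    p_ref = 1.0 / denom
    return np.concatenate(([p_ref], exp_terms / denom))

# Hypothetical values: 2 predictors, 3 categories (index 0 is the reference)
x = np.array([0.5, -1.0])
coefs = np.array([[0.8, 0.1],
                  [-0.3, 0.6]])
intercepts = np.array([0.2, -0.1])

probs = class_probabilities(x, coefs, intercepts)
print("probabilities:", probs)
print("sum:", probs.sum())
```

Each probability lies between 0 and 1, the values sum to 1, and the category with the largest exponentiated score receives the largest probability.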

Interpreting Log Odds and Odds Ratios:

In multinomial logistic regression, we often use log odds and odds ratios (OR) to interpret the impact of predictor variables on the outcome.

  • Log Odds (log(P/(1−P))) – Measures the likelihood of an outcome relative to the reference category.
  • Odds Ratio (OR = e^β) – Represents the change in odds for a one-unit increase in a predictor variable. An OR > 1 suggests an increased likelihood of an outcome, while OR < 1 suggests a decrease.

The logit function's output is a linear function of predictor variables, making it easier to model complex categorical relationships.
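As a small worked example of the odds ratio interpretation, the sketch below exponentiates a made-up coefficient. The value 0.05 and the variable name `beta_age` are assumptions for illustration only, not estimates from any fitted model.

```python
import math

# Hypothetical coefficient for "age" in the equation comparing one genre to the reference
beta_age = 0.05

# Odds ratio for a one-unit (one-year) increase in age
odds_ratio = math.exp(beta_age)

# Equivalent percentage change in the odds
pct_change = (odds_ratio - 1) * 100

# A ten-unit increase multiplies the odds by exp(10 * beta)
odds_ratio_10 = math.exp(10 * beta_age)

print(f"OR per year: {odds_ratio:.4f} ({pct_change:.2f}% change in odds)")
print(f"OR per 10 years: {odds_ratio_10:.4f}")
```

Because the effect is multiplicative on the odds scale, the ten-year odds ratio is the one-year odds ratio raised to the tenth power rather than ten times larger.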

Model Estimation Techniques

Multinomial logistic regression involves estimating multiple coefficients simultaneously, which requires robust iterative optimization methods to determine the best-fitting parameters. 

These techniques aim to minimize the difference between the observed and predicted outcomes, ensuring an accurate model. The most common estimation methods include:

1. Maximum Likelihood Estimation (MLE)

MLE finds parameter estimates that maximize the likelihood of observing the given data. It defines a likelihood function. Assume that P(X|θ) is a likelihood function. Then, for the parameter we wish to infer, θ, the MLE is:

$$\theta_{MLE} = \arg\max_{\theta} P(X \mid \theta) \;\Rightarrow\; \theta_{MLE} = \arg\max_{\theta} \prod_{i} P(x_i \mid \theta)$$

Source: Sefidian

Computing this product directly is impractical: multiplying many numbers that are each less than one drives the result toward zero, and it underflows in floating-point arithmetic as the number of observations grows. Because the logarithm increases monotonically, we instead work in log space, where maximizing a function is equivalent to maximizing its logarithm.

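$$\theta_{MLE} = \arg\max_{\theta} P(X \mid \theta) \;\Rightarrow\; \theta_{MLE} = \arg\max_{\theta} \prod_{i} P(x_i \mid \theta) = \arg\max_{\theta} \sum_{i} \log P(x_i \mid \theta)$$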

Source: Sefidian

To utilize this method, we just need to calculate the model's log-likelihood and then apply our preferred optimization procedure (such as gradient ascent, or equivalently gradient descent on the negative log-likelihood) to maximize it with respect to θ.
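The reason for working in log space can be seen with a quick numerical experiment. This is a minimal sketch using randomly generated stand-in likelihood values (they are not produced by a real model): the raw product underflows to zero, while the sum of logarithms remains a usable finite number.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in per-observation likelihoods P(x_i | theta), each between 0.1 and 0.9
likelihoods = rng.uniform(0.1, 0.9, size=2000)

# Multiplying thousands of values below 1 underflows in 64-bit floating point
product = np.prod(likelihoods)

# Summing logarithms keeps the same information in a representable range
log_likelihood = np.sum(np.log(likelihoods))

print("product of likelihoods:", product)
print("sum of log-likelihoods:", log_likelihood)
```

An optimizer can compare log-likelihood values for different θ, whereas the underflowed product would be identical (zero) for almost every candidate.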

Advantages and Limitations:

| Advantages | Limitations |
| --- | --- |
| Provides unbiased, efficient parameter estimates for large datasets. | Computationally expensive for large datasets due to iterative probability computations. |
| Strong theoretical foundation, widely used in statistical modeling. | Sensitive to multicollinearity, requiring careful feature selection. |
| Can be applied to a wide range of models, including complex ones. | May not perform well with small datasets or noisy data. |

Use Case: 

In predictive modeling, MLE is often used to estimate the parameters of regression models, such as in logistic regression, where the goal is to find the parameters that maximize the likelihood of the observed class labels given the feature set. 

MLE can also be applied in areas like time-series forecasting, machine learning models, and even in complex models like Hidden Markov Models (HMMs).

Also Read: 18 Types of Regression in Machine Learning You Should Know [Explained With Examples]

2. Gradient Descent (For Large Datasets)

Gradient Descent is an optimization algorithm used to compute the maximum likelihood estimates when exact second-order solvers become too expensive for the dataset size. It minimizes the loss function (the negative log-likelihood) iteratively:

  • Starts with random coefficients.
  • Computes the gradient (derivative) of the loss with respect to the coefficients.
  • Updates the coefficients in the direction that reduces the error.

Variants of Gradient Descent:
  • Batch Gradient Descent: Updates coefficients using the entire dataset. Stable but slow for large data.
  • Stochastic Gradient Descent (SGD): Updates coefficients using one data point at a time. Faster but noisier.
  • Mini-Batch Gradient Descent: Updates using small subsets of data, balancing speed and stability.
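The steps and variants above can be sketched in a few lines of NumPy. This is an illustrative mini-batch implementation on synthetic data, not a production solver: the function name `fit_softmax_gd`, the learning rate, the batch size, and the cluster centers are all assumptions chosen for the demo.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax_gd(X, y, n_classes, lr=0.1, n_epochs=200, batch_size=32, seed=0):
    """Mini-batch gradient descent on the average negative log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))  # small random starting coefficients
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                         # one-hot encoded targets
    for _ in range(n_epochs):
        order = rng.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            P = softmax(X[idx] @ W + b)
            grad_logits = (P - Y[idx]) / len(idx)    # gradient of the loss w.r.t. the scores
            W -= lr * X[idx].T @ grad_logits         # step against the gradient
            b -= lr * grad_logits.sum(axis=0)
    return W, b

# Synthetic 3-class data: Gaussian clusters around three made-up centers
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(scale=0.7, size=(300, 2))

W, b = fit_softmax_gd(X, y, n_classes=3)
accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print("training accuracy:", accuracy)
```

Setting `batch_size` to the dataset size gives batch gradient descent, and setting it to 1 gives stochastic gradient descent, so the same loop covers all three variants.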

Advantages and Limitations:

| Advantages | Limitations |
| --- | --- |
| Efficient for large datasets where direct optimization methods are impractical. | Sensitive to the choice of learning rate (η). If too high, it may not converge; if too low, convergence can be slow. |
| Can be used with different types of optimization problems. | Convergence is not guaranteed without tuning, and on non-convex losses it may get stuck in local minima (the multinomial logistic loss itself is convex). |
| Scalable and works well with high-dimensional data. | Requires careful selection of batch sizes for stochastic or mini-batch gradient descent. |

Use Case:

Gradient Descent is widely used in training neural networks for image classification tasks. For example, in convolutional neural networks (CNNs), which are commonly used for object detection in large image datasets, Gradient Descent allows the model to adjust weights iteratively to minimize the error between predicted and actual class labels. 

3. Iteratively Reweighted Least Squares (IRLS)

IRLS computes the maximum likelihood estimates by solving a sequence of weighted least squares problems. It iteratively adjusts weights for observations based on their predicted probabilities.

How It Works:
  • Assigns initial weights to each observation based on expected probabilities.
  • Performs weighted least squares regression to refine coefficients.
  • Recomputes the weights from the updated probabilities and repeats the process until the coefficients converge.
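The loop described above can be sketched for the simpler binary case, where each observation has a single weight p(1 − p). This is a simplified illustration on simulated data rather than a full multinomial implementation; the function name `irls_logistic` and the simulated coefficients are assumptions.

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit binary logistic regression by IRLS. X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                               # current linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))               # predicted probabilities
        w = np.clip(p * (1.0 - p), 1e-10, None)      # observation weights
        z = eta + (y - p) / w                        # working response
        XtW = X.T * w                                # X^T W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z) # weighted least squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data with overlapping classes (perfectly separable data would not converge)
rng = np.random.default_rng(1)
x = rng.normal(size=500)
X = np.column_stack([np.ones(500), x])
true_beta = np.array([-0.5, 1.5])
y = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = irls_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
score = X.T @ (y - p_hat)   # gradient of the log-likelihood, near zero at the optimum
print("estimated coefficients:", beta_hat)
```

The multinomial version follows the same pattern, but the weights for each observation form a small matrix across categories instead of a single number.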

Advantages and Limitations:

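| Advantages | Limitations |
| --- | --- |
| Efficient for GLMs, typically converging in a small number of iterations. | Each iteration solves a weighted least squares problem, which becomes computationally expensive with many observations or predictors. |
| Works well with models involving categorical outcomes. | Sensitive to the starting values of parameters and may require careful initialization. |
| For logistic regression it is equivalent to Newton-Raphson, so it usually needs far fewer iterations than first-order methods such as gradient descent. | Requires the design matrix to have full column rank (no perfectly collinear predictors). |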

Use Case:

IRLS is highly effective in medical research for predicting the likelihood of disease occurrence based on various risk factors. For instance, in predicting whether a patient has a particular disease (e.g., diabetes) based on features like age, blood pressure, and BMI, IRLS can be applied in logistic regression models to estimate the parameters of the model iteratively. 

Choosing the Right Estimation Method:

  • Maximum Likelihood Estimation (MLE) is ideal for those working with moderate-sized datasets and seeking statistically robust results. It is suitable for scenarios where precision and accuracy are critical, and the dataset isn’t overwhelmingly large.
  • Gradient Descent should be the go-to method for large-scale machine learning models. If you’re dealing with vast datasets or high-dimensional data, Gradient Descent provides scalability and adaptability, but it requires careful tuning of hyperparameters for optimal performance.
  • Iteratively Reweighted Least Squares (IRLS) is best suited for small to medium-sized datasets and those looking for faster convergence and simplicity in statistical modeling. If computational efficiency and stability are your priorities, IRLS can offer a practical solution, though it may not scale well for more complex problems.

Also Read: Machine Learning vs Neural Networks: Understanding the Key Differences

Now that we understand how MLR works, let’s look at when to use it and explore its practical applications.

When to Use Multinomial Logistic Regression: Key Insights for Practical Applications and Limitations

Multinomial Logistic Regression (MLR) is essential for modeling classification problems where the dependent variable has more than two unordered categories. It is beneficial when the response variable represents distinct, non-ranked groups. This capability makes it invaluable in healthcare, political science, and behavioral research, where multiple categorical outcomes must be predicted from a set of predictor variables.

The key distinction when using MLR is identifying whether the outcome variable is nominal or ordinal. Since MLR is designed explicitly for nominal outcomes, it's critical to understand this difference to apply the right model. Ordinal logistic regression is the appropriate choice for ordinal variables (those with an inherent order).

The table below compares the two variable types:

| Aspect | Nominal Variables | Ordinal Variables |
| --- | --- | --- |
| Definition | Categories with no ranking or order | Categories with a meaningful ranking order |
| Example | Mode of transportation (Car, Bus, Train) | Education levels (High School, Bachelor's, Master's, PhD) |
| Best Model | Multinomial Logistic Regression | Ordinal Logistic Regression |

Nominal Variables have no meaningful ranking between categories, such as mode of transportation or product preferences. 

In contrast, Ordinal Variables involve categories with a natural ordering (e.g., satisfaction levels or education levels).

Real-World Applications of Multinomial Logistic Regression

Multinomial Logistic Regression (MLR) is applied across various sectors to predict outcomes with multiple, unordered categories. Below is a table outlining key use cases and the variables involved in each application:

| Application Area | Use Case | Predictor Variables | Impact |
| --- | --- | --- | --- |
| Urban Transportation | Predicting the mode of transportation for city planning | Home-to-work distance, Household income, Environmental concerns, Fuel prices | Optimizes public transportation policy, infrastructure planning, and sustainability efforts. |
| E-commerce & Retail | Personalized product recommendations based on consumer preferences | Past purchase behavior, Browsing history, Age group, Geographic location | Enhances targeted marketing, improves inventory management, and personalizes customer experiences. |
| Healthcare & Medical Diagnosis | Predicting diseases based on patient symptoms | Fever severity, Cough type (dry or wet), Fatigue level, Shortness of breath | Improves diagnostic accuracy, reduces misdiagnosis, and supports early disease detection and telemedicine. |
| Political Science | Predicting political party affiliation based on demographic data | Age, Education level, Past voting history, Geographic region | Assists in electoral forecasting, tailors campaign strategies, and helps understand voter behavior. |
| Brand Loyalty & Marketing | Predicting customer brand preferences for targeted marketing | Price sensitivity, Quality ratings, Marketing exposure, Product reviews | Improves brand positioning, enhances marketing strategies, and boosts customer loyalty. |

 

Also Read: A Guide to the Types of AI Algorithms and Their Applications


Advantages and Limitations of Multinomial Logistic Regression

Multinomial Logistic Regression is a powerful tool for classifying outcomes with more than two categories, making it ideal for problems like customer segmentation, political affiliation, and medical diagnosis. While versatile, MLR requires attention to data quality, sample size, and model assumptions to ensure accurate and actionable insights.

Below is an in-depth look at the key benefits and challenges associated with MLR: 

| Advantages | Limitations |
| --- | --- |
| Handles Multi-Class Problems Efficiently | Computational Complexity: As the number of categories increases, MLR requires estimating multiple coefficients, which can be computationally expensive, particularly with large datasets. |
| Interpretable Coefficients | Assumption of Independence of Irrelevant Alternatives (IIA): MLR assumes that the odds of choosing one category over another do not depend on which other categories are available, which may not hold true in many real-world scenarios, leading to biased predictions. |
| No Requirement for Ordinal Assumptions | Sensitivity to Multicollinearity: Highly correlated predictors can cause unstable coefficient estimates, leading to poor model performance. Techniques like VIF and PCA can mitigate this. |
| Probabilistic Predictions | Overfitting in High-Dimensional Data: In datasets with many predictors, MLR may overfit, learning noise rather than meaningful patterns. Regularization methods like Lasso and Ridge help address this. |
| Extension of Binary Logistic Regression | Need for Large Sample Sizes: MLR requires a sufficiently large sample size to generate reliable and stable predictions, especially as the number of categories and predictors grows. |

Also Read: 52+ Must-Know Machine Learning Viva Questions and Interview Questions for 2025


Now that we've covered when to use MLR, let's explore how to implement it in Python. The next section guides you through training and evaluating MLR using scikit-learn for practical applications.

Implementing Multinomial Logistic Regression in Python

Python’s scikit-learn library makes implementing Multinomial Logistic Regression (MLR) straightforward. This section walks you through data preparation, model training, and evaluation. 

With scikit-learn, you can efficiently train the model, predict categorical outcomes, and assess performance. Learn how to preprocess data, split the dataset, fine-tune the model, and evaluate its accuracy using standard metrics. This step-by-step process ensures the practical application of MLR in real-world classification tasks.

Setting Up the Environment

Make sure you have the required libraries installed before beginning to build the model. To install them, use the command below:

pip install numpy pandas scikit-learn matplotlib seaborn

  • NumPy and Pandas are tools for managing datasets and doing numerical calculations.
  • Multinomial logistic regression is one of the machine learning tools offered by scikit-learn.
  • Data visualization tools include Seaborn and Matplotlib.

Preparing the Dataset

First, the dataset is loaded, divided into training and testing sets, and some basic preprocessing is done. Here is a step-by-step guide to understand how to prepare the dataset for multinomial logistic regression:

Step 1: Load the Dataset

First, we import the necessary libraries and load the dataset.

import pandas as pd
from sklearn.model_selection import train_test_split
# Loading the dataset
df = pd.read_csv("dataset.csv")
# Display basic dataset information
print(df.head())

Step 2: Determine the Target Variable and Features

We have the following in multinomial logistic regression:

  • The input characteristics that aid in outcome prediction are known as independent variables (X).
  • The categorical result we wish to forecast is the dependent variable (y).

# Splitting features and target variable
X = df.drop("target", axis=1)  # Independent variables
y = df["target"]  # Dependent variable
# Display dataset dimensions
print("Feature matrix shape:", X.shape)
print("Target variable shape:", y.shape)

Step 3: Splitting Data into Training and Testing Sets

We divide the dataset into 70% training data and 30% testing data so that the model can be evaluated on data it was not trained on.

# Splitting the dataset into 70% training and 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Display dataset sizes
print("Training set size:", X_train.shape[0], "samples")
print("Testing set size:", X_test.shape[0], "samples")
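When the classes are imbalanced, a purely random split can leave a rare category underrepresented in the test set. As an optional variation on the split above, the sketch below uses the `stratify` argument of `train_test_split` on a made-up target (the arrays `X_demo` and `y_demo` are placeholders, not the tutorial's dataset) to keep class proportions the same in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced target: 70% class 0, 20% class 1, 10% class 2
y_demo = np.array([0] * 70 + [1] * 20 + [2] * 10)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo asks for the same class proportions in the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

train_counts = np.bincount(y_tr)
test_counts = np.bincount(y_te)
print("train class counts:", train_counts)
print("test class counts: ", test_counts)
```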

Step 4: Check for Missing Values

Prior to model training, handling missing values is essential.

# Check for missing values
print("Missing values:\n", df.isnull().sum())

The df.isnull().sum() call displays the number of missing values in each column. If there are missing values, apply an imputation method (such as substituting the mean, median, or mode) before training.
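A minimal sketch of such imputation with pandas follows, assuming a tiny made-up DataFrame (the column names `age` and `occupation` are illustrative). It fills numeric columns with the median and categorical columns with the most frequent value; in a real project these statistics would normally be computed on the training set only.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing numeric value and one missing categorical value
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "occupation": ["clerk", "engineer", None, "engineer"],
    "target": [0, 1, 2, 1],
})

numeric_cols = df.select_dtypes(include="number").columns.drop("target")
categorical_cols = df.select_dtypes(exclude="number").columns

# Median for numeric columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Most frequent value (mode) for categorical columns
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```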

Building and Evaluating the Model

Once the dataset is prepared, the next step is to train the multinomial logistic regression model and evaluate its performance. This involves:

  • Setting up and training the model.
  • Generating predictions on the test data.
  • Assessing the accuracy of the model.

Step 1: Import Required Libraries

We import the required libraries before beginning to build the model.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Explanation:

  • LogisticRegression is used to construct and train the multinomial logistic regression model.
  • accuracy_score reports the fraction of samples whose category the model predicts correctly.
  • classification_report gives the precision, recall, and F1-score for every class.
  • confusion_matrix counts how often each actual category is predicted as each category, which helps evaluate how well the model distinguishes between categories.

Step 2: Set Up the Model and Train It

Next, we build a multinomial logistic regression model and use our dataset to train it.

# Initialize multinomial logistic regression model.
# With the lbfgs solver, scikit-learn fits a multinomial model whenever the target
# has more than two classes; the multi_class argument is deprecated from version 1.5 on.
model = LogisticRegression(solver="lbfgs", max_iter=1000)
# Train the model using the training dataset
model.fit(X_train, y_train)

Explanation:

  • With solver="lbfgs", LogisticRegression fits a multinomial (softmax) model whenever the target has more than two categories. Older code passes multi_class="multinomial" explicitly, but that argument is deprecated in scikit-learn 1.5 and later.
  • solver="lbfgs" selects a quasi-Newton optimization algorithm that efficiently determines the model parameters.
  • max_iter=1000 raises the iteration limit from the default of 100, which is often too low to reach convergence.
  • .fit(X_train, y_train) trains the model using the training dataset.

Step 3: Make Predictions on Test Data

After training, we apply the model to predict categories for data it has not seen.

# Predict target categories for test data
y_pred = model.predict(X_test)

The .predict(X_test) method returns the predicted category for each row of the test data.
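Besides hard labels, a fitted LogisticRegression can return the estimated probability of every class through predict_proba. The sketch below illustrates this on synthetic data from make_classification, used here as a stand-in for the tutorial's dataset; the names `X_demo`, `y_demo`, and `demo_model` are placeholders, and multi_class is left at its default, which yields a multinomial model with the lbfgs solver.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for a real dataset
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

demo_model = LogisticRegression(solver="lbfgs", max_iter=1000)
demo_model.fit(X_demo, y_demo)

# One row per sample, one column per class (ordered as in demo_model.classes_)
proba = demo_model.predict_proba(X_demo[:5])
labels = demo_model.predict(X_demo[:5])

print("class order:", demo_model.classes_)
print("probabilities:\n", proba.round(3))
print("predicted labels:", labels)
```

Each row of probabilities sums to 1, and the predicted label is the class with the highest probability, which is useful when a decision needs a confidence threshold rather than just a label.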

Step 4: Assess the Performance of the Model

This step is important because it shows how well the model performs on unseen data and lets you compare candidate models on the same scores.

1. Accuracy Score

We employ several evaluation measures to gauge the model's efficacy. accuracy_score(y_test, y_pred) calculates the fraction of correct predictions. Accuracy is a valuable metric, but if the dataset is imbalanced, it might not represent model performance well. Use the following code:

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

2. Classification Report

For every category, classification_report(y_test, y_pred) yields the F1-score, precision, and recall:

  • Precision: Of the samples predicted as a given category, the proportion that actually belong to it.
  • Recall: Of the samples that actually belong to a given category, the proportion the model identifies correctly.
  • F1-score: The harmonic mean of precision and recall.

To find the classification report use the following code:

# Generate classification report
print("Classification Report:\n", classification_report(y_test, y_pred))

3. Confusion Matrix

The confusion matrix illustrates how frequently the model incorrectly classifies categories by comparing actual and predicted values. It helps determine which classes are frequently mistaken for one another.

Use the following command:

# Compute confusion matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
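The metrics in the classification report can all be derived from the confusion matrix. The sketch below shows the arithmetic on a made-up 3-class matrix (the counts are hypothetical, not results from any model), following scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
cm = np.array([[5, 1, 0],
               [2, 3, 1],
               [0, 1, 7]])

true_positives = np.diag(cm)

precision = true_positives / cm.sum(axis=0)  # column sums: everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # row sums: everything that truly is the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()

for k in range(cm.shape[0]):
    print(f"class {k}: precision={precision[k]:.2f} recall={recall[k]:.2f} f1={f1[k]:.2f}")
print(f"accuracy={accuracy:.2f}")
```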

Here is the combined and final code that incorporates all the steps outlined above:

# Step 1: Import Required Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 2: Load the Dataset
df = pd.read_csv("dataset.csv")  # Replace 'dataset.csv' with your dataset path

# Display basic dataset information
print(df.head())

# Step 3: Determine the Target Variable and Features
X = df.drop("target", axis=1)  # Independent variables (features)
y = df["target"]  # Dependent variable (target)

# Display dataset dimensions
print("Feature matrix shape:", X.shape)
print("Target variable shape:", y.shape)

# Step 4: Splitting Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Display dataset sizes
print("Training set size:", X_train.shape[0], "samples")
print("Testing set size:", X_test.shape[0], "samples")

# Step 5: Check for Missing Values
print("Missing values:\n", df.isnull().sum())

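# Handle missing values (if any) in the feature sets, for example by imputing
# each numeric column with its training-set mean. Filling df at this point would
# have no effect, because X_train and X_test were already created from it.
train_means = X_train.mean(numeric_only=True)
X_train = X_train.fillna(train_means)
X_test = X_test.fillna(train_means)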

# Step 6: Initialize Multinomial Logistic Regression Model
# (lbfgs fits a multinomial model for 3+ classes; multi_class is deprecated from scikit-learn 1.5 on)
model = LogisticRegression(solver="lbfgs", max_iter=1000)

# Step 7: Train the Model
model.fit(X_train, y_train)

# Step 8: Make Predictions on Test Data
y_pred = model.predict(X_test)

# Step 9: Assess the Performance of the Model

# 1. Accuracy Score
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

# 2. Classification Report
print("Classification Report:\n", classification_report(y_test, y_pred))

# 3. Confusion Matrix
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Explanation:

  1. Libraries: We start by importing necessary libraries such as Pandas, scikit-learn for machine learning, and evaluation metrics.
  2. Dataset Loading: The dataset is loaded using pandas.read_csv() and the first few rows are printed using head().
  3. Features and Target Variables: The dataset is split into independent variables (X) and dependent (target) variable (y).
  4. Train-Test Split: The data is divided into a training set (70%) and a testing set (30%) using train_test_split().
  5. Handling Missing Values: We check for missing values using .isnull().sum() and impute them using the mean value if necessary.
  6. Model Setup and Training: We initialize and train the multinomial logistic regression model using LogisticRegression().
  7. Predictions: After training the model, we use .predict() to forecast the target values for the test data.
  8. Evaluation:
    • Accuracy Score: This tells us how many of the predictions were correct.
    • Classification Report: This gives us detailed performance metrics like F1-score, precision, and recall.
    • Confusion Matrix: This matrix helps to visualize how well the model classified each category.

Make sure to replace "dataset.csv" with the actual path to your dataset. Also, if your dataset has missing values or requires other preprocessing steps, adjust the handling accordingly.

Sample Output:

   feature1  feature2  feature3  target
0       1.2       3.4       2.5       0
1       2.3       3.5       1.8       1
2       3.1       4.2       3.6       2
3       2.0       3.1       2.7       0
...

Feature matrix shape: (100, 3)
Target variable shape: (100,)
Training set size: 70 samples
Testing set size: 30 samples

Missing values:
feature1    0
feature2    0
feature3    0
target      0
dtype: int64

Model Accuracy: 0.85

Classification Report:
               precision    recall  f1-score   support

           0       0.84      0.87      0.85        10
           1       0.88      0.83      0.85        10
           2       0.83      0.88      0.85        10

    accuracy                           0.85        30
   macro avg       0.85      0.86      0.85        30
weighted avg       0.85      0.85      0.85        30

Confusion Matrix:
 [[ 8  2  0]
 [ 1  8  1]
 [ 0  1  9]]

Here’s a breakdown of the sample output:

1. Dataset Information: The feature matrix (X) has 100 rows and 3 columns (features). The target variable (y) contains 100 rows.

2. Missing Values: There are no missing values in the dataset, as indicated by df.isnull().sum() returning zeros for all columns.

3. Accuracy: The model achieved an accuracy of 85%, which means 85% of the predictions on the test set were correct.

4. Classification Report:

  • Precision: Of all samples predicted as a given class, the proportion that actually belong to it.
  • Recall: Of all samples that actually belong to a given class, the proportion the model identified correctly.
  • F1-score: The harmonic mean of precision and recall, providing a balance between the two.

The macro avg (the unweighted mean over classes) and the weighted avg (the mean weighted by each class's support) summarize performance across all classes, with an average F1-score of 0.85.

5. Confusion Matrix: This matrix compares the predicted vs actual values, with rows for the actual classes and columns for the predicted classes. The diagonal elements represent correct classifications, while the off-diagonal elements show misclassifications.

For example, in the first row, the model correctly predicted 8 instances as class 0, but mistakenly predicted 2 instances as class 1 and 0 instances as class 2.

This output would help in understanding the effectiveness of the model and where improvements might be needed.
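As a rough check, the metrics described above can be recomputed directly from a confusion matrix. The sketch below uses the illustrative matrix from the sample output; the variable names are assumptions of this example. Because the sample figures are illustrative, values recomputed this way (for instance, 25 correct predictions out of 30, about 0.83 accuracy) need not match the rounded report exactly.

```python
import numpy as np

# Confusion matrix from the sample output: rows = actual, columns = predicted
cm = np.array([
    [8, 2, 0],
    [1, 8, 1],
    [0, 1, 9],
])

correct = np.diag(cm)                  # correctly classified samples per class
accuracy = correct.sum() / cm.sum()    # overall share of correct predictions
recall = correct / cm.sum(axis=1)      # per class: correct / actual members
precision = correct / cm.sum(axis=0)   # per class: correct / predicted members
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy:.3f}")
for label in range(cm.shape[0]):
    print(f"Class {label}: precision={precision[label]:.3f}, "
          f"recall={recall[label]:.3f}, f1={f1[label]:.3f}")
```

Working from the matrix this way makes it easy to see which classes drive the errors: here, most mistakes are confusions between class 0 and class 1.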

Take your software development skills to the next level with a focus on AI and machine learning. The Gen AI Mastery Certificate for Software Development will help you understand advanced modeling techniques like Multinomial Logistic Regression, enabling you to develop robust AI-driven applications.


Now that we’ve covered MLR implementation in Python, let’s look at practical examples of applying it to real-world classification problems.

Practical Examples of Multinomial Logistic Regression in Python

Multinomial Logistic Regression (MLR) is essential for handling classification problems where the outcome variable involves multiple unordered categories. 

Below are two real-world applications demonstrating how MLR can be effectively implemented in Python, specifically using scikit-learn, to tackle these complex classification tasks.

Predicting Consumer Product Preferences

Retailers and online businesses study shoppers' buying habits to forecast the product categories that consumers are most likely to pick. By understanding these preferences, businesses can tailor marketing initiatives and personalize product recommendations.

1. Outcome Categories: 

A business can classify its goods into:

  • Electronics (e.g., phones, computers)
  • Clothing (e.g., tops, pants)
  • Home Appliances (e.g., vacuum, refrigerator)
  • Books (e.g., fiction books, non-fiction books, scholarly books)

2. Independent Variables (Predictors): 

For predicting consumer behavior, the model takes into account several factors, including:

  • Demographics: Age, gender, income level, occupation.
  • Shopping Behavior: Browsing history, purchase history, and cart abandonment rate.
  • Location: Urban vs. rural, regional shopping patterns.
  • Marketing Influence: Impact of targeted advertisements, coupons, and offers.

Code:

In this example, we generate a synthetic e-commerce dataset and predict each customer's product category from their age, income, and browsing time.

# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 1: Generate a synthetic dataset for product purchase prediction
np.random.seed(42)
data_size = 300  # Increase dataset size for better training

age = np.random.randint(18, 65, data_size)  # Random ages
income = np.random.randint(30000, 150000, data_size)  # Random income
browsing_time = np.random.randint(5, 60, data_size)  # Time spent on e-commerce site

# Assign product categories uniformly at random (independent of the features above)
product_categories = np.random.choice(["Electronics", "Clothing", "Books", "Home Appliances"], data_size)

# Create DataFrame
df = pd.DataFrame({
    "Age": age,
    "Income": income,
    "Browsing_Time": browsing_time,
    "Product_Category": product_categories
})

# Encode categorical target variable
label_encoder = LabelEncoder()
df["Product_Category_Encoded"] = label_encoder.fit_transform(df["Product_Category"])

# Define features and target variable
X = df[["Age", "Income", "Browsing_Time"]]
y = df["Product_Category_Encoded"]

# Step 2: Split the data into train and test sets with stratification
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Step 3: Standardize the features for better model performance
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

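# Step 4: Train multinomial logistic regression model
# (with the lbfgs solver, scikit-learn fits a multinomial model for multiclass targets by default;
# the multi_class argument is deprecated since scikit-learn 1.5)
model = LogisticRegression(solver="lbfgs", max_iter=500)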
model.fit(X_train_scaled, y_train)

# Step 5: Make predictions
y_pred = model.predict(X_test_scaled)

# Step 6: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nProduct Preferences Model Accuracy: {accuracy:.2f}\n")

# Classification report
print("Product Preferences Classification Report:")
print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))

# Step 7: Visualize confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix for Product Preferences Model")
plt.show()

 

Explanation:

The model reaches 25% accuracy: it picks the correct product category about one time in four, which is what random guessing achieves with four classes. It does reasonably well on Clothing (71% recall) but fails completely on Electronics and Home Appliances (0% recall). The main reason is the data itself: the synthetic labels are drawn at random, independently of age, income, and browsing time, so there is no pattern for any classifier to learn. Switching to Decision Trees or Neural Networks would not help on this dataset. On real data, weak results like these usually point to uninformative features, class imbalance, or too few samples, and the first remedy is better features and more representative data rather than a different model.
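To see the model learn something, the labels have to depend on the features. The sketch below is one hypothetical way to build such a dataset: it scores each category with a made-up weight matrix W and samples the label from the resulting softmax probabilities (via the Gumbel-max trick). The weights, sample size, and category order are assumptions chosen for illustration, not properties of any real shopper data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000
categories = np.array(["Books", "Clothing", "Electronics", "Home Appliances"])

age = rng.integers(18, 65, n)
income = rng.integers(30000, 150000, n)
browsing_time = rng.integers(5, 60, n)
X = np.column_stack([age, income, browsing_time]).astype(float)

# Hypothetical weights linking standardized features to each category's score
Z = (X - X.mean(axis=0)) / X.std(axis=0)
W = np.array([
    [3.0, 0.0, 0.0],    # Books: score rises with age
    [-3.0, 0.0, 0.0],   # Clothing: score falls with age
    [0.0, 3.0, 0.0],    # Electronics: score rises with income
    [0.0, 0.0, 3.0],    # Home Appliances: score rises with browsing time
])
logits = Z @ W.T

# Gumbel-max trick: argmax(logits + Gumbel noise) samples from softmax(logits)
y = np.argmax(logits + rng.gumbel(size=logits.shape), axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=500)
model.fit(X_train_scaled, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
baseline = np.bincount(y_test).max() / len(y_test)  # always guess the largest class
print(f"Accuracy: {accuracy:.2f} (majority-class baseline: {baseline:.2f})")
print("Category order:", list(categories))
```

Because the labels are still sampled with noise, the model will not reach 100% accuracy; the point is only that it can now beat the chance-level result seen with purely random labels.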

Application of Multinomial Logistic Regression

The model predicts probabilities for every product category depending on consumer traits. For example, it can forecast:

  • A person with technical expertise has a higher chance of buying electronics.
  • A young urban dweller may have a preference for fashionable clothing.
  • A family with kids may indicate an increased likelihood of purchasing home appliances.
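A minimal sketch of how those per-category probabilities can be read from a fitted model with predict_proba follows. The data comes from scikit-learn's make_classification as a hypothetical stand-in for real customer features, and the mapping of class indices to category names is an assumption made only for display.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 3 numeric features, 4 classes
X, y = make_classification(
    n_samples=400, n_features=3, n_informative=3, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=42,
)
# Assumed mapping from class index to product category (for display only)
category_names = np.array(["Books", "Clothing", "Electronics", "Home Appliances"])

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
model.fit(X, y)

# One row of probabilities per customer, one column per category
probs = model.predict_proba(X[:3])
for row in probs:
    ranked = np.argsort(row)[::-1]  # most likely category first
    print(", ".join(f"{category_names[i]}: {row[i]:.2f}" for i in ranked))
```

Ranking the probabilities, rather than keeping only the top prediction, is what makes the model usable for recommendations: the second and third most likely categories can fill the remaining slots.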

Business Advantages

  • Personalized Recommendations: Enhances customer satisfaction by displaying appropriate products.
  • Optimized Inventory Management: Helps companies stock the right products based on predicted demand.
  • Targeted Advertising: Minimizes marketing expenses by targeting the most responsive customer segments.


Analyzing Voting Behavior in Elections

Political analysts use multinomial logistic regression to forecast a voter's choice among several candidates or parties. This helps them understand political trends and plan campaign strategy.

1. Outcome Categories:

A voter may pick any one of:

  • Democratic Party
  • Republican Party
  • Independent Candidate
  • Green Party

2. Independent Variables (Predictors):

Voting decisions are influenced by many variables, such as:

  • Demographics: Age, gender, ethnicity, education level
  • Economic Status: Income level, employment status
  • Political Ideology: Liberal, conservative, moderate.
  • Past Voting History: Whether or not the voter voted for the same party in past elections.
  • Geographical Region: Urban versus rural voting.

Code:

# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Create a sample dataset for voter behavior prediction
data = {
    "age": [22, 34, 45, 53, 28, 40, 67, 55, 30, 43],
    "education_years": [16, 18, 12, 14, 20, 16, 10, 12, 18, 14],
    "income": [30000, 60000, 55000, 45000, 75000, 50000, 20000, 32000, 70000, 58000],
    "party_affiliation": ["Democrat", "Republican", "Independent", "Democrat", "Green Party",
                          "Republican", "Independent", "Democrat", "Green Party", "Republican"]
}

df = pd.DataFrame(data)

# Step 2: Encode target variable
label_encoder = LabelEncoder()
df["party_encoded"] = label_encoder.fit_transform(df["party_affiliation"])

# Define features and target
X = df.drop(columns=["party_affiliation", "party_encoded"])
y = df["party_encoded"]

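# Normalize features (fitted on all rows here for brevity; on real data, fit the scaler on the training split only to avoid leakage)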
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

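# Train multinomial logistic regression model
# (lbfgs fits a multinomial model for multiclass targets by default;
# the multi_class argument is deprecated since scikit-learn 1.5)
model = LogisticRegression(solver="lbfgs", max_iter=500)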
model.fit(X_train, y_train)

# Step 4: Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("\nVoting Behavior Model Accuracy:", accuracy)

# Report only the labels that occur in the test set or predictions,
# and look up their names by label value (not by position) so rows are named correctly
unique_labels = sorted(set(y_test) | set(y_pred))
print("\nVoting Behavior Classification Report:\n", classification_report(y_test, y_pred, labels=unique_labels, target_names=label_encoder.inverse_transform(unique_labels)))

Output:

Explanation:

The model scores 100% accuracy: it classified both test samples correctly, so each class present in the test set shows precision, recall, and F1-score of 1.00. This does not make the model trustworthy. With only 10 rows, the 20% split leaves just 2 test samples, and an estimate based on two predictions is far too noisy to say anything about generalization. The 8 training rows spread across four parties also make overfitting likely. In practice, a much larger and more varied dataset is needed, ideally evaluated with cross-validation.
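One hedged way to get a less fragile estimate from a small dataset is stratified cross-validation, sketched below on the same ten rows. The choice of two folds is an assumption forced by the data (the smallest parties have only two voters each), and wrapping the scaler and model in a pipeline means the scaler is refit on each training fold.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same ten rows as the voting example above
df = pd.DataFrame({
    "age": [22, 34, 45, 53, 28, 40, 67, 55, 30, 43],
    "education_years": [16, 18, 12, 14, 20, 16, 10, 12, 18, 14],
    "income": [30000, 60000, 55000, 45000, 75000, 50000, 20000, 32000, 70000, 58000],
    "party_affiliation": ["Democrat", "Republican", "Independent", "Democrat", "Green Party",
                          "Republican", "Independent", "Democrat", "Green Party", "Republican"],
})
X = df[["age", "education_years", "income"]]
y = df["party_affiliation"]

# Scaling happens inside the pipeline, so each fold is scaled using its own training rows
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))

# Two folds is the maximum here: the rarest classes have only two samples
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Even so, ten rows remain far too few for a reliable estimate; the sketch shows the evaluation pattern to use once more data is available.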

Application of Multinomial Logistic Regression:

The model predicts the likelihood of a voter voting for a given party on the basis of their demographic and ideological background.

For instance:

  • A very educated young person in an urban area may tend towards the Democratic Party.
  • A middle-aged person in a rural area may tend to vote for the Republican Party.
  • A voter who thinks independently and does not strongly affiliate with a party might find an Independent appealing.

Political Gains and Insights:

  • Targeting in Campaigns: Allows parties to focus efforts on voters who are likely to be more impacted.
  • Policy Making: Indicates which policies matter most to various groups of voters.
  • Forecasting Elections: Helps forecasters estimate how votes are likely to be distributed across parties.


After exploring practical examples, let’s examine the advantages and limitations of Multinomial Logistic Regression, providing a deeper understanding of when and why to use this technique effectively.

How upGrad Can Help You in Your Machine Learning and MLR Journey?

Multinomial Logistic Regression (MLR) is crucial for handling classification problems with multiple, unordered categories. MLR is highly effective in market analysis, disease classification, and predictive analytics where outcomes fall into numerous categories. Understanding how MLR works, primarily through Python, provides valuable insights into solving complex multiclass classification problems across industries.

However, starting with machine learning, including MLR, can be challenging without proper guidance. upGrad offers specialized data science and machine learning courses, providing the skills, mentorship, and resources needed to excel and advance your career.


For additional support, you can schedule a free career counseling session with upGrad’s experts or visit one of our offline centers for a personalized experience. Let’s turn your aspirations into achievements.


Reference:
https://arxiv.org/abs/2002.09133
Logistic Regression in Machine Learning - Analytics Vidhya

Frequently Asked Questions (FAQs)

1. How can Multinomial Logistic Regression be used in predictive maintenance for industrial equipment?

2. Can MLR be used to predict customer churn in a subscription-based service?

3. How is MLR applied in medical diagnostics for multi-disease classification?

4. How can MLR be used to predict political party affiliation based on demographic data?

5. How can Multinomial Logistic Regression assist in financial credit scoring?

6. Can MLR be applied in social media analysis to predict user preferences?

7. How does MLR work in predicting customer preferences for different types of insurance products?

8. How can Multinomial Logistic Regression be used to predict traffic patterns in smart cities?

9. How does MLR help in analyzing customer feedback across various product categories?

10. How can MLR be used in education to predict student performance across various subjects?

11. How can MLR be applied in employee satisfaction and retention prediction?

