Getting Started With Negative Binomial Regression: Step by Step Guide

Updated on Jul 07, 2025 | 10 min read | 7.34K+ views

Table of Contents

Example of Negative Binomial Regression
Analysis Using the Negative Binomial Regression
Process of Doing Negative Binomial Regression Analysis in Python
Steps to Perform Negative Binomial Regression in Python
Considerations for Negative Binomial Regression
Conclusion

Negative binomial regression is a technique for modeling count variables. It resembles multiple regression, except that the dependent variable, Y, is assumed to follow the negative binomial distribution. The values of Y are therefore non-negative integers such as 0, 1, 2.

The method is also an extension of Poisson regression that relaxes the assumption that the mean is equal to the variance. One of the traditional negative binomial regression models, known as “NB2,” is based on the Poisson-gamma mixture distribution.

Poisson regression is generalized by adding a gamma-distributed noise variable. This variable has a mean of one and a scale parameter “v.”
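The Poisson-gamma mixture can be illustrated with a short simulation. This is only a sketch under stated assumptions: the values of n, mu, and alpha are made up, and alpha is used here as the variance of the gamma noise. How alpha maps onto the scale parameter “v” depends on the parameterization, which the text does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200_000      # number of simulated observations (illustrative)
mu = 5.0         # Poisson mean before mixing (illustrative)
alpha = 0.5      # variance of the gamma noise (illustrative)

# Gamma noise with mean 1 and variance alpha: shape = 1/alpha, scale = alpha.
noise = rng.gamma(shape=1.0 / alpha, scale=alpha, size=n)

# Each observation gets its own Poisson rate, mu * noise.
counts = rng.poisson(mu * noise)

sample_mean = counts.mean()
sample_var = counts.var()
theoretical_var = mu + alpha * mu**2   # NB2 variance function

print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}, "
      f"NB2 variance = {theoretical_var:.2f}")
```

The simulated mean stays close to mu, while the variance is inflated to roughly mu + alpha * mu**2. That gap is what the relaxed mean-equals-variance assumption allows for.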

Here are a few examples of where negative binomial regression can be applied:

School administrators study the attendance behavior of high school students from two schools. The outcome is the number of days each student was absent, and the factors that might influence it include the program in which the student is enrolled and the student’s math score.
A health researcher studies how many hospital visits senior citizens made in the last 12 months. The study is based on the individuals’ characteristics and the health plans that the senior citizens purchased.

Example of Negative Binomial Regression

Suppose there is an attendance record for 314 high school students. The data is taken from two urban schools and stored in a file named nb_data.dta. The response variable of interest in this example is the number of days absent, “daysabs.” The variable “math” gives the math score for every student. Another variable, “prog,” indicates the program in which each student is enrolled.

Each of the variables has 314 observations, and their distributions look reasonable. For the outcome variable, the unconditional mean is lower than the variance.

Now, focus on the description of the variables in the dataset. A table lists the average number of days a student was absent for every program type. Because the mean of the outcome variable varies by “prog,” the program type is a good candidate for predicting the number of days absent. In addition, within each level of “prog,” the variance is higher than the mean. These are the conditional means and variances. The differences suggest that over-dispersion is present, and therefore a negative binomial model is appropriate.
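The per-program comparison of means and variances can be sketched as follows. The absence counts below are made up for illustration and are not taken from nb_data.dta. The point is only the check itself: a variance well above the mean within each group hints at over-dispersion.

```python
from statistics import mean, variance

# Made-up absence counts per program, for illustration only.
days_absent_by_prog = {
    1: [0, 3, 7, 12, 28],
    2: [0, 1, 4, 6, 19],
    3: [0, 0, 1, 2, 7],
}

summary = {}
for prog, days in days_absent_by_prog.items():
    m = mean(days)
    v = variance(days)          # sample variance (n - 1 denominator)
    summary[prog] = (m, v, v / m)
    print(f"prog {prog}: mean = {m:.1f}, variance = {v:.1f}, ratio = {v / m:.2f}")
```

A variance-to-mean ratio near 1 would be consistent with a Poisson model, while ratios far above 1, as in this toy data, point toward a negative binomial model.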

A researcher can consider several analysis methods for this type of study. A few of them are described below:

1. Negative binomial regression

Negative binomial regression is used for overdispersed data, that is, when the conditional variance exceeds the conditional mean. It can be considered a generalization of Poisson regression because both methods have the same mean structure. Negative binomial regression, however, has an additional parameter to model the overdispersion. Because Poisson regression tends to understate the standard errors when the conditional distribution of the outcome variable is over-dispersed, the confidence intervals from negative binomial regression are likely to be wider than those from Poisson regression.

2. Poisson regression

Poisson regression is commonly used for modeling count data. It has many extensions that can be used for modeling count variables.

3. OLS regression

Count outcome variables are sometimes log-transformed and then analyzed with OLS regression. However, this approach has issues. Data can be lost because taking the log of zero produces an undefined value, and the approach does not model the dispersion in the data.

4. Zero-inflated models

These models try to account for the excess zeros in the data. Zero-inflated negative binomial regression is usually applicable to overdispersed count outcome variables.
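The data-loss problem described under OLS regression above can be seen in a few lines. The counts are made up for illustration. The sketch only shows that every zero count has no defined logarithm and therefore drops out of a log-transformed analysis.

```python
import math

# Made-up count outcomes; zeros are common in count data.
counts = [0, 0, 1, 3, 0, 7, 2]

logged = []
dropped = 0
for c in counts:
    try:
        logged.append(math.log(c))
    except ValueError:          # math.log(0) is undefined
        dropped += 1

print(f"{dropped} of {len(counts)} observations lost to log(0)")
```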

Analysis Using the Negative Binomial Regression

The Stata command “nbreg” estimates a negative binomial regression model, for example “nbreg daysabs math i.prog”. The “i.” before the variable “prog” indicates that it is a factor variable, i.e., a categorical variable, which should be included in the model as a series of indicator variables.

The output of the model begins with an iteration log. It starts by fitting a Poisson model, followed by a null model, and then the negative binomial model. The method uses maximum likelihood estimation and keeps iterating until the change in the log likelihood is sufficiently small. The final log likelihood is used for comparing models.
The next information is in the header.
Just below the header are the negative binomial regression coefficients. They are listed for every variable along with their standard errors, z-scores, and p-values. There is also a 95% confidence interval for each coefficient. The coefficient for the “math” variable is -0.006 and is statistically significant. This means that for a one-unit increase in “math,” the expected log count of the number of days absent decreases by 0.006. The coefficient for the indicator variable 2.prog is the expected difference in log count between group 2 and the reference group.
The log-transformed over-dispersion parameter is estimated and then displayed along with the untransformed value. In a Poisson model, this value is zero.
Below the coefficients table is a likelihood ratio test. The model can be further understood through the use of the “margins” command.
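Because the model works on the log scale, the “math” coefficient is often easier to read after exponentiating it. The short sketch below uses the -0.006 value reported above. The ten-point comparison is an illustrative assumption, not something taken from the output.

```python
import math

coef_math = -0.006              # coefficient for math reported in the article

irr = math.exp(coef_math)       # multiplicative change in expected count
pct_change = (irr - 1.0) * 100  # percent change per one-unit increase

print(f"rate ratio per math point: {irr:.4f}")
print(f"percent change per math point: {pct_change:.2f}%")

# A ten-point difference multiplies the expected count by exp(10 * coef).
irr_10 = math.exp(10 * coef_math)
print(f"rate ratio per ten math points: {irr_10:.4f}")
```

In other words, each additional point on “math” multiplies the expected number of days absent by a factor slightly below one, which amounts to a decrease of roughly 0.6%.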

Process of Doing Negative Binomial Regression Analysis in Python

The packages required for carrying out the regression need to be imported first. They are listed below:

import statsmodels.api as sm
import matplotlib.pyplot as plt
import numpy as np
from patsy import dmatrices
import pandas as pd

Steps to Perform Negative Binomial Regression in Python

You will have to follow these steps to perform negative binomial regression in Python:

Step 1: Fitting the Poisson regression model on the training data set

Begin by setting up the regression expression. It tells Patsy that BB_COUNT is the dependent variable and that the regression variables are DAY, DAY_OF_WEEK, MONTH, HIGH_T, LOW_T, and PRECIP.

expr = """BB_COUNT ~ DAY + DAY_OF_WEEK + MONTH + HIGH_T + LOW_T + PRECIP"""

Set up the X and y matrices for the training and testing data sets with the help of Patsy.

y_train, X_train = dmatrices(expr, df_train, return_type='dataframe')

y_test, X_test = dmatrices(expr, df_test, return_type='dataframe')
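To see what Patsy produces, here is a minimal, self-contained sketch. The DataFrame is filled with made-up numbers that only mimic some of the column names used above. The 80/20 random split is also an assumption, since the text does not say how df_train and df_test were created.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices

# Made-up rows that only mimic column names used in the article.
rng = np.random.default_rng(42)
n = 50
df = pd.DataFrame({
    "BB_COUNT": rng.poisson(2000, size=n),
    "DAY": rng.integers(1, 29, size=n),
    "HIGH_T": rng.uniform(50, 90, size=n),
})

# Hypothetical 80/20 random split; the article does not describe its split.
mask = rng.random(n) < 0.8
df_train, df_test = df[mask], df[~mask]

expr = "BB_COUNT ~ DAY + HIGH_T"
y_train, X_train = dmatrices(expr, df_train, return_type="dataframe")
y_test, X_test = dmatrices(expr, df_test, return_type="dataframe")

print(list(X_train.columns))   # Patsy adds an 'Intercept' column
print(y_train.shape, X_train.shape)
```

Patsy adds the intercept column automatically, which is why a later expression needs an explicit “- 1” when a model without an intercept is wanted.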

Use the statsmodels GLM class to train the Poisson regression model on the training data set.

poisson_training_results = sm.GLM(y_train, X_train, family=sm.families.Poisson()).fit()

This completes the training of the Poisson regression model.

Step 2: Fitting the auxiliary ordinary least squares (OLS) regression model and finding α

Start by importing the statsmodels formula API into your project.

import statsmodels.formula.api as smf

In the training set DataFrame, add the vector of fitted Poisson rates as a new column named 'BB_LAMBDA'.

Remember that its dimensions are (n x 1); in this example, (161 x 1). The vector is available in poisson_training_results.mu:

df_train['BB_LAMBDA'] = poisson_training_results.mu

Now, add a derived column named 'AUX_OLS_DEP' to the Pandas DataFrame. This new column holds the values of the dependent variable of the OLS regression.

df_train['AUX_OLS_DEP'] = df_train.apply(lambda x: ((x['BB_COUNT'] - x['BB_LAMBDA'])**2 - x['BB_LAMBDA']) / x['BB_LAMBDA'], axis=1)

You can now use Patsy to build the OLS model specification. The '- 1' at the end of the expression means “don’t use a regression intercept.”

ols_expr = """AUX_OLS_DEP ~ BB_LAMBDA - 1"""

Next, fit the OLS model:

aux_olsr_results = smf.ols(ols_expr, df_train).fit()
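The logic of the auxiliary regression can be checked on simulated data with NumPy alone. This is a sketch under stated assumptions: the data is synthetic, true_alpha is made up, and the known simulated rates in lam stand in for the fitted values in poisson_training_results.mu. A regression through the origin has the closed-form slope sum(lam * aux_dep) / sum(lam**2), which is what an OLS model without an intercept estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_alpha = 0.4                       # dispersion used to simulate (illustrative)

# Stand-in for the fitted Poisson rates (poisson_training_results.mu).
x = rng.uniform(0.0, 1.0, size=n)
lam = np.exp(1.0 + 1.5 * x)

# NB2 counts via the Poisson-gamma mixture with mean lam.
y = rng.poisson(lam * rng.gamma(1.0 / true_alpha, true_alpha, size=n))

# Dependent variable of the auxiliary regression.
aux_dep = ((y - lam) ** 2 - lam) / lam

# OLS through the origin: alpha = sum(lam * aux_dep) / sum(lam**2).
alpha_hat = (lam @ aux_dep) / (lam @ lam)
print(f"estimated alpha = {alpha_hat:.3f} (simulated with {true_alpha})")
```

The estimate works because, under NB2, the expected value of the auxiliary dependent variable is alpha times the rate, so the slope of the no-intercept regression recovers alpha.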

Step 3: Supplying the alpha value found in the last step to the NB2 model

nb2_training_results = sm.GLM(y_train, X_train, family=sm.families.NegativeBinomial(alpha=aux_olsr_results.params.iloc[0])).fit()

Step 4: Making predictions using the trained NB2 model

nb2_predictions = nb2_training_results.get_prediction(X_test)

The NB2 model's predictions track the trend in the bicycle counts quite closely.

Step 5: Evaluating the goodness-of-fit of the NB2 model

The training summary of the NB2 model includes three items that are relevant to the goodness-of-fit. You should go over each of them individually, starting with the Log-Likelihood value.

Considerations for Negative Binomial Regression

There are a few things to consider when applying negative binomial regression. These include:

Negative binomial regression is not recommended for small samples.
Excess zeros can be one cause of overdispersion. These zeros might be generated by an additional data-generating process. In such a case, a zero-inflated model is recommended.
If the data-generating process does not allow for any zeros, a zero-truncated model is recommended.
Count data often has an exposure variable associated with it, which denotes the number of times the event could have occurred. This variable should be incorporated into the negative binomial regression model. In Stata, this is done through the exposure() option, which can be abbreviated to exp().
The outcome variable in a negative binomial regression model cannot take negative values. Also, the exposure variable cannot have the value 0.
The “glm” command can also be used for running a negative binomial regression. This is done with the log link and the negative binomial family.
The “glm” command is required for obtaining the residuals, which are used to check the other assumptions of the negative binomial regression model.
There are various measures of pseudo-R-squared. They all attempt to provide information similar to that provided by R-squared in OLS regression, but none of them can be interpreted exactly as R-squared is in OLS regression.

Conclusion

This article discussed negative binomial regression. We have seen that it resembles multiple regression and is a generalization of Poisson regression. The method has several applications and can be applied in Python or in R.

Several case studies also show its application in fields such as aging research. The classical regression models that can be used on count data are Poisson regression, negative binomial regression, and geometric regression. These methods belong to the family of generalized linear models and are included in almost all statistical packages, such as R.

Pavan Vadapalli

900 articles published

Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology s...
