Machine learning applications have been increasing with wide applicability in research, social media, advertising, etc. However, the applications mostly deal with the prediction that involves a huge amount of data. Statistics are often used for the quantification of the measurement of values of uncertainty. If we have different events, then three approaches can determine the probability of the event.
These three methods are:
Let us consider an example of a dice being rolled to find the probability of whether it will show the face of “four.” It will help in the understanding of the three types of methods of determining probability. Suppose you consider the classical method of probability estimation. In that case, it will be believed that there will be a total of six outcomes, and the probability of any outcome occurring will be the same. In such an assumption, the probability that the outcome will be four will be 1/6. The classical method usually works fine when outcomes have equally likely results. But when the outcomes become more subjective, this method cannot be used.
If we consider the Frequentist method, it is required that there is an infinite sequence of an event that is hypothetical. It then requires the search of relevant frequency in the infinite hypothetical sequence. Considering the above example of dice, if the dice are rolled an infinite number of times, the outcome, i.e., 1/6, we can get the outcome as four. Therefore, the probability that the outcome will be four in the six-sided dice will be 1/6 as per the definition of the frequentist method.
Now coming towards the Bayesian approach, it provides you with some advantages. As per the perspective of this method, you can incorporate a personal belief in the process of decision-making. That means it will consider the things such as the information known regarding the problem. The fact that different individuals can have different beliefs is also considered in this approach. For example, suppose if someone mentions that the probability of rain will be 90% tomorrow, for some other person, the probability of rain might be 60%. Therefore, the method of the Bayesian approach is subjective. However, the results are more intuitive compared to the Frequentist method.
Table of Contents
Bayesian Inference is used mostly for the problem of statistical Inference. In these cases, there is always an unknown quantity (data) that needs to be estimated. And then, from the data, the amount desired is to be estimated. The unknown quantity is referred to as θ. There is an assumption that the θ is a random quantity, and there are some initial guesses for the values of θ. This type of distribution is referred to as prior distribution. The update of the value is usually done through the Bayes rule. Therefore, the approach is referred to as the Bayesian approach.
The application of Bayesian Inference depends on the understanding of the Bayes’ Theorem.
Consider there are two outcomes sets, such as Set A and set B. These sets are also called events. Let us denote the probability for event A as P(A) and event B’s as P(B). These were the probabilities of the events individually. However, a joint probability can be defined through the term P(A, B). The conditional probabilities can be expanded as:
P(A,B) = P(A|B)P(B),
This means that while B is given, the conditional probability of A and B results in the joint probability of the two events.
P(A,B) = P(B|A)P(A)
In both the above equations, the left-hand side of the equations are the same, so the right-hand side of the equations should be equal.
P(A|B)P(B) = P(B|A)P(A)
P(A|B) = P(B|A)P(A)/P(B)
This equation is known as the Bayes’ theorem.
In the field of data science, the Bayes’ Theorem can be written in a way as
P(hypothesis|data) = P(data|hypothesis) P(hypothesis)/p(data)
The denominator, which is the evidence, ensures that the posterior distribution on the left side of the equation is the valid probability density. This is also called a normalizing constant.
There are three components in the equation of the Bayes theorem.
One of the key factors in the Bayesian Inference method is the Prior distribution. Through this, you can incorporate personal beliefs into the process of decision-making. Also, you can incorporate the judgments based on different individuals into the study. This is done through a mathematical expression. An unknown parameter, represented by θ, is used for expressing one’s belief. For expressing these beliefs, a distribution function is used, which is the prior distribution. Therefore, before running any experiment, the distribution is chosen.
Beginners Guide to Bayesian Inference
1. Choosing the prior
A cumulative distribution is usually defined for the parameter θ. Those events with the value of prior probability as zero will have the value of posterior probability as zero. And for those events which have the value of prior probability, one will have the value of posterior probability as one. Therefore, a good framework of the Bayesian approach will not define some point estimates to those events that already occurred, or there is no information of its occurrence. There are certain techniques for choosing the prior. One technique that is widely used for choosing the prior is through the use of distribution functions. The family of all the functions is used. These functions should be flexible and will be able to represent the beliefs of the individuals.
Let us consider θ as the unknown parameter that is to be estimated. The fairness of a coin can be expressed through θ, considering the Bayesian Inference example. The coin is being flipped infinitely to check its fairness. So, every time while flipping, either there will be the head or a tail. The values that are assigned to the events are 0 and 1. This is also referred to as the Bernoulli trials. All the outcomes are considered independent. This can be expressed through an equation that defines the concept of likelihood. The likelihood is a density function which is a function of θ. For maximizing likelihood, the value of θ should result in the largest likelihood value. The method of estimation is also known as the Maximum likelihood estimate.
3. Posterior distribution
The result of the Bayes theorem is known as the posterior distribution. It is the updated probability of any event that takes place after considering the new information.
4. Bayesian Inference mechanism
As we have seen above, the Bayesian Inference method treats the concept of probability as some degree of belief. These beliefs are associated with the fact that the event might occur under such evidence. Therefore, the parameter theta “θ” is considered to be the random variable.
5. Bayesian Inference application in financial risk
There are a lot of algorithms where Bayesian Inference can be applied. Some of the algorithms are neural networks, random forest, regression, etc. The method has also found popularity in the financial sector. It can be used for the operational risk modeling of several banks. The data of the banks that show the loss of operations shows some events that were lost. These lost events had a low frequency but had a high severity. Therefore, in such cases, the Bayesian Inference proves to be quite useful. This is because, in this method, a lot of data is also not required for the analysis.
Other statistical analysis methods, such as the frequentist methods, were also applied earlier for modeling operational risks. But there was a problem in estimating the uncertainty parameter. Therefore, Bayesian Inference has been considered to be the most effective method. This is because the expert opinions and the data can be used for deriving posterior distributions. In this type of task, the data of internal loss of the banks is broken down into several smaller fragments, and then the frequency of each of the fragments is estimated through expert judgment. This is then fitted into the distributions of the probability.
Join the Machine Learning Course online from the World’s top Universities – Masters, Executive Post Graduate Programs, and Advanced Certificate Program in ML & AI to fast-track your career.
In statistics and machine learning, the two main approaches that can be applied are the methods of Frequentist and Bayesian Inference. We have discussed the Bayesian Inference method in the article, where the probabilities are calculated as subjective beliefs. Along with the data, the personal beliefs of the people are also incorporated while estimating the probabilities. These make the model far more widely accepted in a lot of estimation studies. Therefore, the techniques of Bayesian Inference specify the methods or ways to apply your beliefs to the observation of data. Moreover, in many types of applications with a lot of noisy data, the Bayesian Inference technique can be used. Therefore, the power that lies in the Bayes’ rule can relate to a quantity that can be calculated to the one that can be used to answer questions of arbitrary nature.