
Bayesian Machine Learning – Exploring A Paradigm Shift In Statistical Data Modelling

Last updated: 24th Nov, 2020
Read Time: 6 Mins

What is Bayesian Machine Learning?

Bayesian Machine Learning (also known as Bayesian ML) is a systematic approach to constructing statistical models based on Bayes’ Theorem.

Any standard machine learning problem includes two primary datasets that need analysis:

  1. A comprehensive set of training data
  2. A collection of all available inputs and all recorded outputs

The traditional approach to analysing this data for modelling is to determine patterns that map one dataset to the other. An analyst will usually piece together a model to capture that mapping, and the result is a purely deterministic method for generating predictions about a target variable.


The only problem is that there is absolutely no way to explain what is happening inside this model with a clear set of definitions. All that is accomplished, essentially, is the minimisation of some loss functions on the training data set – but that hardly qualifies as true modelling.

An ideal (and preferably, lossless) model entails an objective summary of the model’s inherent parameters, supplemented with statistical easter eggs (such as confidence intervals) that can be defined and defended in the language of mathematical probability. This “ideal” scenario is what Bayesian Machine Learning sets out to accomplish.


The Goals (And Magic) Of Bayesian Machine Learning

The primary objective of Bayesian Machine Learning is to estimate the posterior distribution, given the likelihood (which is derived from the training data) and the prior distribution.

When training a regular machine learning model, this is exactly what we end up doing in theory and practice. Analysts typically perform successive iterations of Maximum Likelihood Estimation (MLE) on the training data, updating the model’s parameters so as to maximise the probability of seeing that data given the parameters. Yet those parameters are precisely what the model is supposed to learn from the data in the first place.

This leads to a chicken-and-egg problem, which Bayesian Machine Learning aims to solve beautifully.

Things take an entirely different turn when an analyst instead maximises the posterior distribution: the training data is taken as fixed, and the question becomes how probable each parameter setting is, given that data. This process is called Maximum A Posteriori estimation, shortened to MAP. An easier way to grasp the concept is to think about it in terms of the likelihood function.
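To make the contrast concrete, here is a minimal sketch in Python (the numbers are hypothetical, not from the article) comparing the MLE and MAP estimates of a coin’s bias, where a Beta prior encodes the belief that the coin is roughly fair:

    # Minimal sketch (hypothetical numbers): MLE vs MAP for a coin's bias.
    heads, flips = 7, 10
    alpha, beta = 5, 5  # Beta(5, 5) prior: pseudo-counts encoding a "fair coin" belief

    # MLE maximises P(data | theta) alone -- the raw observed frequency.
    theta_mle = heads / flips                                      # 0.700

    # MAP maximises P(data | theta) * P(theta); the prior pulls the
    # estimate back towards its mean of 0.5.
    theta_map = (heads + alpha - 1) / (flips + alpha + beta - 2)   # ~0.611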

Taking Bayes’ Theorem into account, the posterior can be defined as:

P(θ | x) = P(x | θ) · P(θ) / P(x)

where P(θ | x) is the posterior over the parameters θ, P(x | θ) is the likelihood of the data x, P(θ) is the prior, and P(x) is the evidence.
In this scenario, we can simply leave the denominator out: P(x) does not depend on the model’s parameters, and anything that does not depend on the parameters can be ignored in the maximisation procedure. The other key piece of the puzzle, the prior distribution, is what allows Bayesian models to stand out in contrast to their classical MLE-trained counterparts.
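A small numerical illustration (hypothetical coin-flip numbers, not from the article) of why the denominator can be dropped: dividing by the evidence P(x) rescales every posterior value by the same constant, so the location of the maximum never moves.

    import numpy as np

    # Grid of candidate values for a coin's bias theta.
    theta = np.linspace(0.01, 0.99, 99)
    likelihood = theta**7 * (1 - theta)**3   # P(x | theta): 7 heads, 3 tails
    prior = theta * (1 - theta)              # an unnormalised Beta(2, 2) prior

    unnormalised = likelihood * prior                 # numerator only
    posterior = unnormalised / unnormalised.sum()     # divided by the evidence

    # The argmax is identical with or without the denominator.
    assert theta[unnormalised.argmax()] == theta[posterior.argmax()]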

Analysts can often make reasonable assumptions about how well-suited a specific parameter configuration is, and this goes a long way towards encoding their beliefs about those parameters even before they’ve seen the data. It’s relatively commonplace, for instance, to use a Gaussian prior over the model’s parameters.

The analyst here is assuming that these parameters have been drawn from a normal distribution with some mean and variance. This sort of distribution features the classic bell-curve shape, concentrating a significant portion of its mass close to the mean.

Occurrences of values towards the tail-ends, on the other hand, are pretty rare. Using such a prior effectively states the belief that the majority of the model’s weights must fall within a narrow range around the mean, with only a few exceptional outliers. This is a reasonable belief to hold, taking real-world phenomena and non-ideal circumstances into consideration.

The effects of a Bayesian model are even more interesting when you observe that using these prior distributions (through the MAP process) generates results strikingly similar, if not identical, to those obtained by performing classical MLE with some added regularisation.

It’s very amusing to note that just by constraining the “accepted” model weights with the prior, we end up creating a regulariser.
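As a sketch of that equivalence (with assumed toy data, not from the article): for linear regression with Gaussian noise of variance sigma², placing a Gaussian prior N(0, tau² I) on the weights makes the MAP solution coincide exactly with ridge (L2-regularised) regression, with lambda = sigma² / tau².

    import numpy as np

    # Assumed toy regression problem.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    w_true = np.array([1.5, -2.0, 0.5])
    sigma, tau = 0.5, 1.0
    y = X @ w_true + rng.normal(scale=sigma, size=100)

    # MAP under a Gaussian prior == ridge regression in closed form:
    # w = (X^T X + lambda I)^(-1) X^T y, with lambda = sigma^2 / tau^2.
    lam = sigma**2 / tau**2
    w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(w_map)   # close to w_true; lambda shrinks the weights towards 0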

On the whole, Bayesian Machine Learning is evolving rapidly as a subfield of machine learning, and further development and inroads into the established canon appear a natural and likely outcome of the current pace of advances in computational hardware and statistical methods.

Read: Bayesian Networks

The Different Methods Of Bayesian Machine Learning

There are three widely accepted approaches to Bayesian Machine Learning, namely MAP, MCMC, and the Gaussian process.

Bayesian Machine Learning with MAP: Maximum A Posteriori

MAP enjoys the distinction of being the first step towards true Bayesian Machine Learning. However, it is limited to computing what experienced statisticians refer to as a point estimate.

The problem with point estimates is that they don’t reveal much about a parameter other than its optimum setting. Analysts and statisticians are often after richer information, for instance, the probability that a certain parameter’s value falls within a predefined range. After all, that’s where the real predictive power of Bayesian Machine Learning lies.
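For instance (a hypothetical coin-flip posterior, using scipy): with a Beta(8, 4) posterior over a coin’s bias, range questions that a point estimate cannot answer become one-liners.

    from scipy import stats

    posterior = stats.beta(8, 4)   # posterior after 7 heads, 3 tails, flat prior

    print(posterior.mean())        # a point summary, ~0.667
    # P(0.5 < theta < 0.8): the kind of range statement MAP alone can't give.
    print(posterior.cdf(0.8) - posterior.cdf(0.5))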

Must Read: Naive Bayes Explained

Bayesian Machine Learning with MCMC: Markov Chain Monte Carlo

Markov Chain Monte Carlo, also known commonly as MCMC, is a popular and celebrated “umbrella” algorithm, applied through a set of famous subsidiary methods such as Gibbs sampling and slice sampling.

And while the mathematics of MCMC is generally considered difficult, it remains equally intriguing and impressive. These subsidiary methods all culminate in the construction of a Markov chain that settles into a stationary distribution equivalent to the posterior.
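A minimal random-walk Metropolis sampler (one simple member of the MCMC family; the numbers are hypothetical) shows the idea. Note that the chain only ever needs the unnormalised posterior:

    import numpy as np

    # Unnormalised log posterior for a coin's bias: 7 heads, 3 tails, flat prior.
    def log_post(theta):
        if not 0 < theta < 1:
            return -np.inf
        return 7 * np.log(theta) + 3 * np.log(1 - theta)

    rng = np.random.default_rng(0)
    theta, samples = 0.5, []
    for _ in range(20_000):
        proposal = theta + rng.normal(scale=0.1)      # random-walk step
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
            theta = proposal
        samples.append(theta)

    draws = np.array(samples[5_000:])                 # discard burn-in
    print(draws.mean())   # ~0.667, the mean of the Beta(8, 4) posterior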

Many successor algorithms improve upon the basic MCMC method by including gradient information, letting analysts navigate the parameter space with increased efficiency; Hamiltonian Monte Carlo is perhaps the best-known example.

There are simpler ways to achieve this accuracy, however. For instance, there are Bayesian equivalents of linear and logistic regression in which analysts use the Laplace Approximation. What sets this process apart is an analytical approximation to the posterior distribution, one that can be written down and worked with on paper.
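A sketch of the Laplace Approximation on the same toy posterior (hypothetical numbers): fit a Gaussian centred at the posterior mode, with variance given by the inverse curvature of the negative log posterior at that mode.

    import numpy as np
    from scipy import optimize, stats

    # Negative log posterior for a coin's bias: 7 heads, 3 tails, flat prior.
    def neg_log_post(theta):
        return -(7 * np.log(theta) + 3 * np.log(1 - theta))

    mode = optimize.minimize_scalar(neg_log_post, bounds=(0.01, 0.99),
                                    method="bounded").x          # ~0.7
    # Second derivative of neg_log_post at the mode: 7/t^2 + 3/(1-t)^2.
    curvature = 7 / mode**2 + 3 / (1 - mode)**2
    laplace = stats.norm(loc=mode, scale=1 / np.sqrt(curvature))
    print(laplace.mean(), laplace.std())  # a pen-and-paper Gaussian stand-in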


Bayesian Machine Learning with the Gaussian process

The Gaussian process is a stochastic process in which every finite collection of the constituent random variables follows a joint Gaussian distribution. For regression, it works by defining a probability distribution over the space of all possible functions (lines, in the simplest case) and then selecting the function that is most likely to be the actual predictor, taking the data into account.

These processes allow analysts to perform regression directly in function space. Given that the entire posterior distribution is computed analytically in this method, this is undoubtedly Bayesian estimation at its truest, and therefore, both statistically and logically, its most admirable form.
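A from-scratch sketch of GP regression with an RBF kernel (toy data and an assumed length-scale, not from the article): the posterior mean and covariance over the test points are available in closed form.

    import numpy as np

    def rbf(a, b, length=1.0):
        # Squared-exponential kernel between two 1-D point sets.
        return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / length**2)

    X_train = np.array([-2.0, 0.0, 1.5])
    y_train = np.sin(X_train)               # assumed toy targets
    X_test = np.linspace(-3.0, 3.0, 7)

    K = rbf(X_train, X_train) + 1e-4 * np.eye(3)   # jitter for stability
    K_s = rbf(X_train, X_test)

    # Closed-form GP posterior over f(X_test) given the training data.
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = rbf(X_test, X_test) - K_s.T @ np.linalg.solve(K, K_s)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # per-point uncertainty
    print(mean, std)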

If you would like to know more about careers in Machine Learning and Artificial Intelligence, check out IIIT Bangalore and upGrad’s Master of Science in Machine Learning & AI.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
