In statistics, probabilistic models define relationships between variables and are used to calculate the probabilities of those variables. Many problems involve a large number of variables. In such cases, a fully conditional model requires a huge amount of data to cover every case of the probability functions, and it may be intractable to compute in real time. There have been several attempts to simplify the conditional probability calculations, such as Naïve Bayes, but Naïve Bayes assumes that the variables are conditionally independent of one another given the class, which discards most of the dependencies between them.
A better approach is to develop a model that preserves the conditional dependencies between random variables where they exist and assumes conditional independence in the other cases. This leads us to the concept of Bayesian Networks. Bayesian Networks help us to visualize the probabilistic model for a domain and to study the relationships between random variables in the form of a user-friendly graph.
What are Bayesian Networks?
By definition, Bayesian Networks are a type of Probabilistic Graphical Model that uses Bayesian inference for probability computations. A Bayesian Network represents a set of variables and their conditional dependencies with a Directed Acyclic Graph (DAG). Bayesian Networks are well suited to taking an event that has occurred and predicting the likelihood that any one of several possible known causes was the contributing factor.
By making use of the relationships specified by the Bayesian Network, we can obtain the Joint Probability Distribution (JPD) from the conditional probabilities. Each node in the graph represents a random variable and each arc (or directed arrow) represents the relationship between two nodes. The variables can be either continuous or discrete in nature.
In the above diagram, A, B, C and D are four random variables represented by nodes in the network. For node B, A is its parent node and C is its child node. Node C is conditionally independent of node A given node B.
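The parent and child relationships described above can be sketched as a small data structure. This is only an illustration: the edges A to B and B to C come from the text, while the edge A to D is an assumption about the diagram, and the helper names `children` and `is_acyclic` are hypothetical. It relies on `graphlib`, which is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter, CycleError

# Map each node to the list of its parents.
# A -> B and B -> C are stated in the text; A -> D is an assumption.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}

def children(node, graph=parents):
    """Return the nodes that list `node` as a parent."""
    return sorted(n for n, ps in graph.items() if node in ps)

def is_acyclic(graph):
    """Return True if the parent map contains no directed cycle."""
    try:
        # static_order() is lazy, so consume it to trigger cycle detection.
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

print(children("B"))        # ['C']
print(is_acyclic(parents))  # True
```

Storing parents rather than children is a convenient choice here because each conditional probability in a Bayesian Network is indexed by a node's parents.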
Before we get into the implementation of a Bayesian Network, there are a few probability basics that have to be understood.
Local Markov Property
Bayesian Networks satisfy a property known as the Local Markov Property. It states that a node is conditionally independent of its non-descendants, given its parents. In the above example, P(D|A, B) is equal to P(D|A) because, once its parent A is known, D is independent of its non-descendant B. This property helps us simplify the Joint Distribution. A related concept is the Markov Random Field, an undirected graphical model whose variables satisfy analogous Markov properties.
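The equality P(D|A, B) = P(D|A) can be checked numerically. The sketch below assumes a network with edges A to B and A to D; all probability values are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical CPTs for a network with edges A -> B and A -> D.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # indexed [a][b]
p_d_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # indexed [a][d]

# Joint distribution built from the factorization P(A) * P(B|A) * P(D|A).
joint = {
    (a, b, d): p_a[a] * p_b_given_a[a][b] * p_d_given_a[a][d]
    for a, b, d in product((0, 1), repeat=3)
}

def p_d1_given_ab(a, b):
    """P(D=1 | A=a, B=b), computed directly from the joint table."""
    return joint[(a, b, 1)] / (joint[(a, b, 0)] + joint[(a, b, 1)])

for a, b in product((0, 1), repeat=2):
    # Once A is known, also knowing B does not change the belief about D.
    assert abs(p_d1_given_ab(a, b) - p_d_given_a[a][1]) < 1e-12
```

The loop passes for every combination of A and B, which is the Local Markov Property in this small network.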
In mathematics, the conditional probability of event A is the probability that event A will occur given that another event B has already occurred. It is written P(A | B). Events A and B may be either dependent or independent, and the calculation simplifies in the independent case.
- In general, and in particular when A and B are dependent events, the conditional probability is calculated as P(A | B) = P(A and B) / P(B), provided P(B) > 0
- If A and B are independent events, then P(A and B) = P(A) * P(B), so the expression reduces to P(A | B) = P(A)
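Both cases can be illustrated with a short sketch. The two joint tables below are hypothetical; the first is constructed so that A and B are independent, the second so that they are dependent.

```python
import math

def prob_a(joint):
    """Marginal P(A) from a joint table keyed by (a, b)."""
    return joint[(True, True)] + joint[(True, False)]

def prob_b(joint):
    """Marginal P(B) from a joint table keyed by (a, b)."""
    return joint[(True, True)] + joint[(False, True)]

def prob_a_given_b(joint):
    """P(A | B) = P(A and B) / P(B)."""
    return joint[(True, True)] / prob_b(joint)

# Hypothetical independent events: P(A and B) equals P(A) * P(B).
independent = {(True, True): 0.12, (True, False): 0.18,
               (False, True): 0.28, (False, False): 0.42}

# Hypothetical dependent events: P(A and B) differs from P(A) * P(B).
dependent = {(True, True): 0.20, (True, False): 0.10,
             (False, True): 0.20, (False, False): 0.50}

# Independent case: conditioning on B leaves P(A) unchanged.
assert math.isclose(prob_a_given_b(independent), prob_a(independent))
# Dependent case: conditioning on B changes the probability of A.
assert not math.isclose(prob_a_given_b(dependent), prob_a(dependent))
```

In the dependent table P(A) is 0.3 but P(A | B) is 0.5, so learning that B occurred makes A more likely.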
Joint Probability Distribution
Before we get into an example of Bayesian Networks, let us understand the concept of Joint Probability Distribution. Consider 3 variables a1, a2 and a3. By definition, the probabilities of all the different possible combinations of values of a1, a2, and a3 are called their Joint Probability Distribution.
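If P[a1, a2, a3, ..., an] is the JPD of the variables a1 to an, it can be expanded with the chain rule of probability as a product of conditional terms,
P[a1, a2, a3, ..., an] = P[a1 | a2, a3, ..., an] * P[a2, a3, ..., an]
= P[a1 | a2, a3, ..., an] * P[a2 | a3, ..., an] * ... * P[an-1 | an] * P[an]
In a Bayesian Network, if the variables are ordered so that every variable appears after all of its descendants, everything to the right of the bar is a non-descendant and each term depends only on the parents of the variable,
P(Xi | Xi+1, ..., Xn) = P(Xi | Parents(Xi))
Generalizing the above equation, we can write the Joint Probability Distribution as,
P(X1, X2, ..., Xn) = P(X1 | Parents(X1)) * P(X2 | Parents(X2)) * ... * P(Xn | Parents(Xn))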
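One practical consequence of conditioning each variable only on its parents is that far fewer probabilities need to be specified. The sketch below counts them for Boolean variables; the five-node structure and the function names are hypothetical and serve only as an illustration.

```python
def full_joint_parameters(n):
    """Free parameters in an unrestricted joint over n Boolean variables."""
    # 2**n combinations, minus one because the probabilities sum to 1.
    return 2 ** n - 1

def network_parameters(parents):
    """Free parameters when each Boolean node stores P(node=1 | parents)."""
    # A node with k Boolean parents needs one number per parent combination.
    return sum(2 ** len(ps) for ps in parents.values())

# Hypothetical structure: each key is a node, each value its list of parents.
parents = {"x1": [], "x2": [], "x3": ["x1"], "x4": ["x1", "x2"], "x5": ["x4"]}

print(full_joint_parameters(len(parents)))  # 31
print(network_parameters(parents))          # 10
```

The gap widens quickly as the number of variables grows, provided each node keeps only a few parents.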
Example of Bayesian Networks
Let us now understand the mechanism of Bayesian Networks and their advantages with the help of a simple example. In this example, let us imagine that we are given the task of modeling a student's marks (m) for an exam he has just taken. From the given Bayesian Network Graph below, we see that the marks depend upon two other variables. They are,
- Exam Level (e)– This discrete variable denotes the difficulty of the exam and has two values (0 for easy and 1 for difficult)
- IQ Level (i) – This represents the Intelligence Quotient level of the student and is also discrete in nature having two values (0 for low and 1 for high)
Additionally, the IQ level of the student also leads us to another variable, which is the Aptitude Score of the student (s). With the marks the student has scored, he can secure admission to a particular university. The probability distribution for getting admitted (a) to a university is also given below.
In the above graph, we see several tables representing the probability distribution values of the given 5 variables. These tables are called Conditional Probability Tables or CPTs. A few properties of a CPT are given below:
- The sum of the CPT values in each row must be equal to 1 because all the possible cases for a particular variable are exhaustive (representing all possibilities).
- If a variable that is Boolean in nature has k Boolean parents, then its CPT has 2^k rows, one for each combination of parent values.
Coming back to our problem, let us first list all the variables that appear in the above-given graph.
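Both properties can be checked mechanically. The sketch below uses a hypothetical CPT for a Boolean variable with k = 2 Boolean parents; the function name `validate_cpt` and all the numbers are assumptions made for illustration.

```python
from itertools import product
import math

def validate_cpt(cpt, n_parents):
    """Check a Boolean CPT: one row per parent assignment, each summing to 1."""
    expected_rows = set(product((0, 1), repeat=n_parents))
    if set(cpt) != expected_rows:
        return False
    return all(math.isclose(sum(row.values()), 1.0) for row in cpt.values())

# Hypothetical CPT: keys are parent assignments, values map outcome -> probability.
cpt = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.6, 1: 0.4},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.2, 1: 0.8},
}

print(len(cpt))              # 4
print(validate_cpt(cpt, 2))  # True
```

A table with a missing row, or a row whose entries do not sum to 1, makes `validate_cpt` return False.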
- Exam Level (e)
- IQ Level (i)
- Aptitude Score (s)
- Marks (m)
- Admission (a)
These five variables are represented in the form of a Directed Acyclic Graph (DAG) in a Bayesian Network format with their Conditional Probability tables. Now, to calculate the Joint Probability Distribution of the 5 variables the formula is given by,
P[a, m, i, e, s]= P(a | m) . P(m | i, e) . P(i) . P(e) . P(s | i)
From the above formula,
- P(a | m) denotes the conditional probability of the student getting admission based on the marks he has scored in the examination.
- P(m | i, e) denotes the conditional probability of the marks that the student will score, given his IQ level and the difficulty of the Exam Level.
- P(i) and P(e) represent the probability of the IQ Level and the Exam Level.
- P(s | i) is the conditional probability of the student’s Aptitude Score, given his IQ Level.
With these probabilities, we can find the Joint Probability Distribution of the entire Bayesian Network.
Calculation of Joint Probability Distribution
Let us now calculate the JPD for two cases.
Case 1: Calculate the probability that in spite of the exam level being difficult, the student having a low IQ level and a low Aptitude Score, manages to pass the exam and secure admission to the university.
From the above word problem statement, the Joint Probability Distribution can be written as below,
P[a=1, m=1, i=0, e=1, s=0]
From the above Conditional Probability tables, the values for the given conditions are substituted into the formula and the result is calculated as below.
P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) . P(m=1 | i=0, e=1) . P(i=0) . P(e=1) . P(s=0 | i=0)
= 0.1 * 0.1 * 0.8 * 0.3 * 0.75 = 0.0018
Case 2: Calculate the probability that the student has a high IQ level and a high Aptitude Score and the exam is easy, yet he fails the exam and does not secure admission to the university.
The formula for the JPD is given by
P[a=0, m=0, i=1, e=0, s=1]
P[a=0, m=0, i=1, e=0, s=1]= P(a=0 | m=0) . P(m=0 | i=1, e=0) . P(i=1) . P(e=0) . P(s=1 | i=1)
= 0.6 * 0.5 * 0.2 * 0.7 * 0.6 = 0.0252
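The two hand calculations above can be reproduced with a few lines of code. The sketch below uses only the CPT entries quoted in the two cases, plus their complements (which follow from each row summing to 1). The rows of the marks table for the parent combinations not used in the text are left out rather than guessed, and the dictionary layout is an assumption made for illustration.

```python
# Entries quoted in the worked cases; complements follow from rows summing to 1.
p_i = {0: 0.8, 1: 0.2}                                        # P(i)
p_e = {0: 0.7, 1: 0.3}                                        # P(e)
p_s_given_i = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}    # indexed [i][s]
p_a_given_m = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.9, 1: 0.1}}      # indexed [m][a]
# Only the two parent combinations that appear in the text are filled in.
p_m_given_ie = {(0, 1): {0: 0.9, 1: 0.1},
                (1, 0): {0: 0.5, 1: 0.5}}                     # indexed [(i, e)][m]

def joint(a, m, i, e, s):
    """P[a, m, i, e, s] = P(a|m) . P(m|i,e) . P(i) . P(e) . P(s|i)."""
    return (p_a_given_m[m][a] * p_m_given_ie[(i, e)][m]
            * p_i[i] * p_e[e] * p_s_given_i[i][s])

case1 = joint(a=1, m=1, i=0, e=1, s=0)
case2 = joint(a=0, m=0, i=1, e=0, s=1)
print(round(case1, 6))  # 0.0018
print(round(case2, 6))  # 0.0252
```

Rounding is applied only to hide floating-point noise in the printed values.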
Hence, in this way, we can make use of Bayesian Networks and Probability tables to calculate the probability for various possible events that occur.
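The same tables also support reasoning from an observed effect back to a likely cause. As a minimal sketch, the code below applies Bayes' rule to the IQ Level and Aptitude Score variables, using only the values of P(i) and P(s | i) that appear in the two cases (with complements filled in). The function name `posterior_i_given_s` is hypothetical.

```python
# Values taken from the worked cases; complements follow from rows summing to 1.
p_i = {0: 0.8, 1: 0.2}                                      # P(i)
p_s_given_i = {0: {0: 0.75, 1: 0.25}, 1: {0: 0.4, 1: 0.6}}  # indexed [i][s]

def posterior_i_given_s(s):
    """P(i | s) via Bayes' rule: P(i) * P(s | i), normalized over i."""
    unnormalized = {i: p_i[i] * p_s_given_i[i][s] for i in p_i}
    total = sum(unnormalized.values())  # this is P(s)
    return {i: value / total for i, value in unnormalized.items()}

post = posterior_i_given_s(1)
print(round(post[1], 3))  # 0.375
```

Observing a high Aptitude Score raises the probability of a high IQ Level from the prior 0.2 to about 0.375.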
Bayesian Networks have many applications, including Spam Filtering, Semantic Search, and Information Retrieval. For example, given a symptom, we can predict the probability of a disease, taking into account several other factors that contribute to the disease. Thus, this article has introduced the concept of the Bayesian Network and worked through a simple example.
How are Bayesian networks implemented?
A Bayesian network is a graphical model in which each node represents a random variable. Nodes are connected to other nodes by directed arcs, which represent the influence of a parent on its children. Each node carries a conditional probability distribution of that node given its parents. The nodes usually represent some real-world objects and the arcs represent some physical or logical relationship between them. Bayesian networks are used in many applications like automatic speech recognition, document/image classification, medical diagnosis, and robotics.
Why is the Bayesian network important?
The Bayesian network is an important part of machine learning and statistics. It is used in data mining and scientific discovery. A Bayesian network is a directed acyclic graph (DAG) with nodes representing random variables and arcs representing direct influence. Bayesian networks are used in applications such as text analysis, fraud detection, cancer detection, and image recognition. They are an important tool for analyzing the past, predicting the future, and improving the quality of decisions. The Bayesian network has its origins in statistics, but it is now used by professionals in many fields, including Research Scientists, Operations Research Analysts, Industrial Engineers, Marketing Professionals, Business Consultants, and Managers.
What is a Sparse Bayesian Network?
A Sparse Bayesian Network (SBN) is a Bayesian network whose graph is sparse, that is, each node has only a few parents, so the DAG contains few edges relative to the number of variables. It might be appropriate to use an SBN when the number of variables is large and/or the number of observations is small. In general, Bayesian Networks are most useful when you are interested in explaining an observation or event by conditioning on a number of factors.