# Bayesian Networks: Introduction, Examples and Practical Applications

Anyone who has worked with data or statistics knows one thing for sure: correlation does not imply causation. This may sound obvious, but many errors in data analysis come from confusing the two. The main reason is that correlation is easy to define and measure, whereas causation is much harder to define or quantify.

In fact, Judea Pearl, author of Causality: Models, Reasoning, and Inference, states in the book that humans focus their mathematical efforts on probabilistic and statistical inferences, leaving causal considerations “to the mercy of intuition and good judgement.” He argues that this is a major reason why we are still far behind in terms of scientific progress.

This is where Bayesian Networks come in. They help us distinguish correlation from causation by allowing us to see various independent causes at once. This can be done accurately because machine learning algorithms do not work on subjectivity or intuition; they work on data.

Let’s see an example to understand how Bayesian Networks operate.

## Example of Bayesian Networks

For the sake of this example, let us suppose that the world is stricken by an extremely rare yet fatal disease; say there is a 1 in 1000 chance that you are infected by the disease.

Now, to figure out whether someone is suffering from the disease, doctors develop a test. The catch is that it is only 99% accurate.

How will you know for sure whether you have the disease or not? Will taking another test affect the results?

Let’s see what happens when you conduct…

### Test 1

As the disease affects only 1 in 1000 people, the probability of you being infected before any test is taken is:

| Infected | Free |
| --- | --- |
| 0.001 | 0.999 |

Disease CPT (Conditional Probability Table)

Since 1 person in 1000 has the disease, the remaining 999 in 1000 are free from it.

Similarly, we will create a table for the probability of each test result. As mentioned before, the test is only 99% accurate. That means that if you are infected, there is a 99% chance that the result is positive. The same holds for negative results: if you are free from the disease, there is a 99% chance that the result is negative.

| Virus Presence | Infected | Free |
| --- | --- | --- |
| Test 1 (Positive) | 0.99 | 0.01 |
| Test 1 (Negative) | 0.01 | 0.99 |

Test1 CPT (Conditional Probability Table)
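The two tables can be written down directly as data. The sketch below is one possible encoding, assuming plain Python dictionaries (the names `prior`, `test_cpt` and `marginal` are illustrative, not part of any library). It also computes how often the test comes back positive overall, a number that Bayes’ theorem needs as its denominator.

```python
# Prior over disease status, taken from the Disease CPT.
prior = {"infected": 0.001, "free": 0.999}

# P(test result | disease status), taken from the Test 1 CPT.
test_cpt = {
    "infected": {"positive": 0.99, "negative": 0.01},
    "free": {"positive": 0.01, "negative": 0.99},
}

def marginal(result):
    """P(result), summed over both disease states (law of total probability)."""
    return sum(prior[state] * test_cpt[state][result] for state in prior)

p_positive = marginal("positive")
p_negative = marginal("negative")
print(f"P(positive) = {p_positive:.5f}")  # 0.01098
print(f"P(negative) = {p_negative:.5f}")  # 0.98902
```

Note that a positive result is roughly eleven times more common than the disease itself, because most positives come from the large group of healthy people.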

Now, let’s plot a graph to see how the presence of the disease is affected by the test results.

Entering the result of the test into the network gives the following.


If the test comes out positive, there is only about a 9% chance that you are suffering from the disease.

Now, how did we get this number?

Bayes’ Theorem!


In our example,

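$$P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(E)}$$

Here $H$ is the hypothesis that you are infected, $E$ is the evidence of a positive test, and $H^c$ is the complement of $H$ (you are free from the disease). Expanding $P(E)$ over both cases gives:

$$P(H \mid E) = \frac{P(H)\,P(E \mid H)}{P(E \mid H)\,P(H) + P(E \mid H^c)\,P(H^c)}$$

$$P(H \mid E) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} \approx 0.09 = 9\%$$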

What does this tell us?

Even when the test is positive, due to the disease being rare, there is only a 9% chance of having the disease.
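The same arithmetic can be checked in a few lines of Python. This is a minimal sketch; the function name `posterior` and its parameter names are chosen here for illustration only.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for a yes/no hypothesis H and observed evidence E."""
    numerator = p_h * p_e_given_h
    # Total probability of the evidence: true positives plus false positives.
    evidence = numerator + (1 - p_h) * p_e_given_not_h
    return numerator / evidence

# Prior 1 in 1000, 99% chance of a positive if infected,
# 1% chance of a (false) positive if free from the disease.
p_infected = posterior(0.001, 0.99, 0.01)
print(f"P(infected | positive) = {p_infected:.4f}")  # 0.0902
```

Changing the first argument shows how strongly the answer depends on how rare the disease is.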

So, what happens when you take another test to be sure and it, too, turns out to be positive?

### Test 2

The second test is also only 99% accurate.

| Virus Presence | Infected | Free |
| --- | --- | --- |
| Test 2 (Positive) | 0.99 | 0.01 |
| Test 2 (Negative) | 0.01 | 0.99 |

The Bayesian Network now has two test nodes, each of which depends only on the presence of the virus.


The results have reversed!

This means that if you get positive results on both tests, the probability of being infected by the virus increases from about 9% to about 91%. But again, it is not 100%!
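One way to reproduce this jump is to apply Bayes’ theorem twice, using the result after the first test as the starting probability for the second. The sketch below assumes that the two tests are independent of each other once the true disease status is known, which is what the network structure expresses. The helper name `update` is illustrative.

```python
def update(prior, positive, accuracy=0.99):
    """Return P(infected) after one more test result, starting from `prior`."""
    # Probability of seeing this result under each hypothesis.
    p_result_if_infected = accuracy if positive else 1 - accuracy
    p_result_if_free = 1 - accuracy if positive else accuracy
    numerator = prior * p_result_if_infected
    return numerator / (numerator + (1 - prior) * p_result_if_free)

after_first = update(0.001, positive=True)
after_second = update(after_first, positive=True)
print(f"after one positive test:  {after_first:.4f}")   # 0.0902
print(f"after two positive tests: {after_second:.4f}")  # 0.9075
```

The second test is no more accurate than the first; it simply starts from a much higher prior.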

Now, what if you get one positive and one negative result from the two tests?


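In this case, there is about a 99.9% chance that you don’t have the disease. Because both tests are equally accurate, the negative result cancels out the positive one, and the probability of infection falls back to the original 0.1%.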

### Test 3

It gets even better when you conduct three tests and all of them come out positive.


Now there is about a 99.9% chance that you’re infected, which is very close to, but still not exactly, 100%.

Now let’s see what happens when one of the tests is negative but the other two are positive.


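Here the probability of infection is back to about 9%. The single negative result cancels out one of the two positive results, so the evidence is the same as after one positive test.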
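All of the scenarios in this example can be computed with one small function. The sketch below uses the odds form of Bayes’ theorem, in which each test result multiplies the odds of infection by a likelihood ratio. It assumes the tests are conditionally independent given the disease status and share the same 99% accuracy; the name `posterior_from_results` is made up for this illustration.

```python
def posterior_from_results(results, prior=0.001, accuracy=0.99):
    """P(infected) after a list of test results (True means positive)."""
    odds = prior / (1 - prior)
    for positive in results:
        # Likelihood ratio: P(result | infected) / P(result | free).
        if positive:
            odds *= accuracy / (1 - accuracy)
        else:
            odds *= (1 - accuracy) / accuracy
    # Convert odds back into a probability.
    return odds / (1 + odds)

scenarios = {
    "one positive": [True],
    "two positives": [True, True],
    "one positive, one negative": [True, False],
    "three positives": [True, True, True],
    "two positives, one negative": [True, True, False],
}
for name, results in scenarios.items():
    print(f"{name:28s} {posterior_from_results(results):.4f}")
```

Because a positive result multiplies the odds by 99 and a negative result divides them by 99, only the difference between the number of positive and negative results matters in this model.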

## Bayesian Networks and Data Modeling

In the example above, it can be seen that Bayesian Networks play a significant role when it comes to modeling data to deliver accurate results.

In fact, the network can be refined by including more factors that might affect the result, which also allows us to visualize and simulate different scenarios.

Bayesian Networks are also a great tool to quantify unfairness in data and to devise techniques that reduce it.

In such cases, it is best to use path-specific techniques to identify sensitive factors that affect the end results.

## Top 5 Practical Applications of Bayesian Networks

Bayesian Networks are widely used in the data science field to draw reliable conclusions from uncertain data.

### Applications of Bayesian Networks

#### 1. Spam Filter

You have probably wondered how Gmail filters spam emails (unwanted and unsolicited emails). It uses a Bayesian spam filter, a robust and widely used filtering technique.
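To give a feel for the idea, here is a minimal sketch of a naive Bayes text classifier, which can be seen as a very simple Bayesian Network in which the spam/not-spam node is the only parent of every word node. It is not Gmail’s implementation; the training messages are invented toy data, and all function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for illustration only.
spam = ["win money now", "free money offer", "win a free prize"]
ham = ["meeting schedule for monday", "project report attached", "lunch on monday"]

def train(messages):
    """Count how often each word appears in a list of messages."""
    counts = Counter(word for msg in messages for word in msg.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_score(message, counts, total, class_prior):
    """log P(class) + sum of log P(word | class), with add-one smoothing."""
    score = math.log(class_prior)
    for word in message.split():
        score += math.log((counts[word] + 1) / (total + len(vocab)))
    return score

def p_spam(message):
    """Posterior probability that a message is spam."""
    prior_spam = len(spam) / (len(spam) + len(ham))
    s = log_score(message, spam_counts, spam_total, prior_spam)
    h = log_score(message, ham_counts, ham_total, 1 - prior_spam)
    # Normalise the two scores into a probability.
    return 1 / (1 + math.exp(h - s))

print(f"{p_spam('free money'):.2f}")
print(f"{p_spam('monday meeting'):.2f}")
```

Working in log space avoids multiplying many small probabilities together, and add-one smoothing keeps a word that was never seen in one class from forcing its probability to zero.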

#### 2. Turbo Code

Bayesian Networks are used with turbo codes, which are high-performance forward error correction codes: the decoding of a turbo code can be described as inference on a Bayesian Network. Turbo codes are used in 3G and 4G mobile networks.

#### 3. Image Processing

In image processing, mathematical operations are used to convert images into digital format, and Bayesian Networks can be applied to such tasks. They also support image enhancement.

#### 4. Biomonitoring

Bayesian Networks make it easier to quantify the concentration of chemicals. In biomonitoring, the concentration of chemicals in human blood and tissue is measured using indicators.

#### 5. Gene Regulatory Network (GRN)

A GRN contains various DNA segments of a cell that interact with other cell contents through protein and RNA expression products. Its behavior can be predicted and analyzed using Bayesian Networks.

## Conclusion

In this blog post, you learned how Bayesian Networks help us get accurate results from the data at hand. Even the slightest variation in data can significantly affect the end result. Bayesian Networks help us analyze data using causation instead of just correlation.

They have proved to be revolutionary in the data science field.


## What are the components of a Bayesian network?

Bayesian Networks have their origin in Bayes’ Theorem, which is named after Thomas Bayes, the famous British mathematician. This theorem is essentially a mathematical formula used to determine conditional probability. Bayesian Networks in the field of artificial intelligence are derived from Bayesian statistics, which has Bayes’ Theorem as its foundation. A Bayesian Network consists of two parts: a directed acyclic graph as its qualitative part and conditional probabilities as its quantitative part. In AI and machine learning, Bayesian Networks are tools used for reasoning and modeling based on uncertain beliefs.
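The two parts can be made concrete with a small sketch. It assumes a disease node with two test nodes as children, a 1 in 1000 prior, and 99% accurate tests; the dictionaries `parents` and `cpt` and the function `joint` are illustrative names, not a library API. The standard-library `graphlib` module (Python 3.9 or later) is used only to confirm that the graph has no cycles and to get an ordering of the nodes.

```python
from graphlib import TopologicalSorter
from itertools import product

# Qualitative part: a directed acyclic graph, stored as node -> list of parents.
parents = {"Disease": [], "Test1": ["Disease"], "Test2": ["Disease"]}

# Quantitative part: P(node is True | parent values), keyed by parent values.
cpt = {
    "Disease": {(): 0.001},
    "Test1": {(True,): 0.99, (False,): 0.01},
    "Test2": {(True,): 0.99, (False,): 0.01},
}

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(parents).static_order())

def joint(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    p = 1.0
    for node in order:
        key = tuple(assignment[parent] for parent in parents[node])
        p_true = cpt[node][key]
        p *= p_true if assignment[node] else 1 - p_true
    return p

# Probability of being infected and testing positive twice.
print(joint({"Disease": True, "Test1": True, "Test2": True}))

# The joint probabilities of all eight assignments sum to 1.
total = sum(joint(dict(zip(order, values)))
            for values in product([True, False], repeat=len(order)))
print(round(total, 10))
```

The graph says which variables depend on which, and the tables say how strongly; together they define the full joint distribution without having to list every combination by hand.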

## How much probability and statistics do you need to know for machine learning?

A considerable part of AI and its different subfields is based on probability and statistics. Machine learning is best seen as an interdisciplinary field that employs probability, statistics, and various algorithms. Statistics and probability are related fields of mathematics used to analyze the relative occurrence of events. This combination of statistics, probability, and algorithms is ultimately used to build intelligent applications that learn from data and offer valuable insights. So, a basic understanding of statistics and probability is mandatory if you want to learn machine learning. You should be familiar with foundational concepts like empirical and theoretical probability, joint probability, conditional probability, Bayes’ Theorem, univariate and bivariate descriptive statistics, correlation, etc.

## What are the advantages of using Bayesian Networks in AI?

Bayesian Networks are a hugely popular technique for creating models for complex and uncertain domains. Using Bayesian Networks, you can develop a mathematically sound and robust framework for uncertain domains like ecosystems and environment management. The most significant advantage of this technique is that you can easily incorporate data from heterogeneous sources with varying accuracy levels into a mathematically coherent model. It also makes it possible to combine data with expert knowledge, which is especially useful for variables for which no data is available.
