Anyone who has worked with data or statistics knows one thing for sure: correlation does not imply causation. While this may sound obvious, many errors in data analysis come from confusing the two. This is primarily because correlation is easy to define and measure, whereas causation is far harder to define or quantify.
In fact, Judea Pearl, author of Causality: Models, Reasoning, and Inference, states in the book that humans focus their mathematical efforts on probabilistic and statistical inferences, leaving causal considerations “to the mercy of intuition and good judgement.” He says that this is a major reason why we are still so far behind in terms of scientific progress.
This is where Bayesian Networks help. They help us distinguish correlation from causation by letting us model several independent causes at once. They do this without relying on subjectivity or intuition, because the underlying algorithms work on data.
Let’s see an example to understand how Bayesian Networks operate.
Example of Bayesian Networks
For the sake of this example, let us suppose that the world is stricken by an extremely rare yet fatal disease; say there is a 1 in 1000 chance that you are infected by the disease.
Now, to figure out whether someone is suffering from the disease, doctors develop a test. The catch is that it is only 99% accurate.
How will you know for sure whether you have the disease or not? Will taking another test affect the results?
Let’s see what happens when you conduct…
Test 1
As the disease affects only 1 in 1000 people, the probability of you being infected is:
Infected | 0.001 |
Free | 0.999 |
Disease CPT (Conditional Probability Table)
Clearly, if 1 in 1000 people has the disease, the remaining 999 in 1000 are free from it.
Similarly, we will create a table for the probability of each test result. As mentioned before, the test is only 99% accurate. That means that if you are infected, there is a 99% chance that the test comes back positive, and if you are free of the disease, there is a 99% chance that it comes back negative.
Virus Presence | Infected | Free |
Test 1 (Positive) | 0.99 | 0.01 |
Test 1 (Negative) | 0.01 | 0.99 |
Test1 CPT (Conditional Probability Table)
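To make the two tables concrete, here is a minimal Python sketch that stores them as dictionaries and multiplies them into a joint distribution. The variable names (disease_cpt, test1_cpt, joint) are illustrative assumptions, not part of any library.

```python
# Prior over the disease state (the Disease table above).
disease_cpt = {"infected": 0.001, "free": 0.999}

# P(test result | disease state) (the Test 1 table above).
test1_cpt = {
    "infected": {"positive": 0.99, "negative": 0.01},
    "free": {"positive": 0.01, "negative": 0.99},
}

# Joint probability of every (state, result) pair: P(state) * P(result | state).
joint = {
    (state, result): disease_cpt[state] * test1_cpt[state][result]
    for state in disease_cpt
    for result in ("positive", "negative")
}

# Overall chance that Test 1 comes back positive, summed over both states.
p_positive = joint[("infected", "positive")] + joint[("free", "positive")]
print(f"P(Test 1 positive) = {p_positive:.5f}")  # about 0.01098
```

Note that most of this 1.098% comes from the "free" group (0.999 x 0.01), which is why a positive result on its own is weaker evidence than the 99% accuracy figure suggests.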
Now, let’s draw the network to see how our belief about the presence of the disease changes with the test results.
Entering the result of the test into the network gives the following.
As you can see, if the test comes out to be positive, there is only a 9% chance that you are suffering from the disease.
Now, how did we get this number?
Bayes Theorem!
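In our example, H is the event that you are infected, E is the event that the test is positive, and Hc is the complement of H (you are free of the disease):
P(H|E) = P(H) x P(E|H) / P(E)
- P(H|E) = P(H) x P(E|H) / {P(E|H) x P(H) + P(E|Hc) x P(Hc)}
- P(H|E) = (0.001 x 0.99) / (0.001 x 0.99 + 0.999 x 0.01) ≈ 0.09 = 9%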
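The same calculation can be checked in a few lines of Python. This is only a sketch: the function name posterior_given_positive and its parameter names are assumptions made for illustration.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(infected | positive test) using Bayes' theorem."""
    # P(E): total probability of a positive result over infected and free people.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / evidence


p = posterior_given_positive(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(infected | positive) = {p:.4f}")  # about 0.0902, i.e. roughly 9%
```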
What does this tell us?
Even when the test is positive, due to the disease being rare, there is only a 9% chance of having the disease.
So, then, what happens when you take another test to be sure and it, too, turns out to be positive?
Test 2
The second test is again only 99% accurate, and we assume that its result depends only on whether you are infected, not on the outcome of the first test.
Virus Presence | Infected | Free |
Test 2 (Positive) | 0.99 | 0.01 |
Test 2 (Negative) | 0.01 | 0.99 |
The Bayesian Network now would be:
The results have reversed!
This means that if you get positive results on both tests, the probability of being infected by the virus increases from 9% to 91%. But again, it doesn’t say 100%!
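One way to see where the 91% comes from is to apply Bayes' theorem twice, using the belief after the first test as the starting point for the second. The sketch below assumes that both tests are 99% accurate and that their results are independent once the disease state is known; the helper name update is hypothetical.

```python
def update(prior, positive, accuracy=0.99):
    """Return the updated probability of infection after one test result."""
    # Likelihood of the observed result under each disease state.
    like_infected = accuracy if positive else 1.0 - accuracy
    like_free = 1.0 - accuracy if positive else accuracy
    numerator = like_infected * prior
    return numerator / (numerator + like_free * (1.0 - prior))


belief = 0.001  # prior: 1 in 1000
history = []
for test_number in (1, 2):
    belief = update(belief, positive=True)
    history.append(belief)
    print(f"after positive test {test_number}: {belief:.4f}")
# prints about 0.0902 after the first test and about 0.9075 after the second
```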
Now, what if you get one positive and one negative result from the test?
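As you can see, if one of the two tests is negative, the positive and negative results cancel out and the probability of infection falls back to the original 0.1%. In other words, there is about a 99.9% chance (100% after rounding) that you don’t have the disease.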
Test 3
It gets even better when you conduct three tests and all of them come out positive.
Now there is roughly a 99.9% chance that you’re infected, which rounds to 100%, although it never quite reaches certainty.
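For any combination of test results, the network can be evaluated by brute-force enumeration over the two disease states. The following is a sketch under the same assumptions as the example (a 0.001 prior, 99% accuracy for every test, and results that are independent given the disease state); the function name p_infected is made up for illustration.

```python
from math import prod

PRIOR_INFECTED = 0.001
ACCURACY = 0.99


def p_infected(results):
    """Return P(infected | results), where each result is True for positive."""

    def likelihood(infected):
        # A result agrees with the state when it is positive for an infected
        # person or negative for a disease-free person.
        return prod(ACCURACY if r == infected else 1.0 - ACCURACY for r in results)

    joint_infected = PRIOR_INFECTED * likelihood(True)
    joint_free = (1.0 - PRIOR_INFECTED) * likelihood(False)
    return joint_infected / (joint_infected + joint_free)


print(f"one positive, one negative: {p_infected([True, False]):.4f}")   # about 0.0010
print(f"three positives: {p_infected([True, True, True]):.4f}")         # about 0.9990
```

Passing a different list of booleans is enough to explore other scenarios without redrawing the network.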
Now let’s see what happens when one of the tests is negative but the other two are positive.
Here the negative result cancels out one of the positives, so the outcome is the same as for a single positive test: about a 9% chance that the virus is present.
Bayesian Networks and Data Modeling
In the example above, it can be seen that Bayesian Networks play a significant role when it comes to modeling data to deliver accurate results.
In fact, refining the network by including more factors that might affect the result also allows us to visualize and simulate different scenarios using Bayesian Networks.
Bayesian Networks are also a great tool to quantify unfairness in data and to devise techniques that reduce it.
In such cases, it is best to use path-specific techniques to identify sensitive factors that affect the end results.
Top 5 Practical Applications of Bayesian Networks
Bayesian Networks are being widely used in the data science field to get accurate results with uncertain data.
1. Spam Filter
You have probably wondered how email services such as Gmail filter out spam (unwanted and unsolicited emails). A Bayesian spam filter is a common and robust approach: it estimates the probability that a message is spam from the words it contains.
2. Turbo Code
Bayesian Networks are used in turbo codes, which are high-performance forward error correction codes; decoding them can be viewed as probabilistic inference on a Bayesian Network. Turbo codes are used in 3G and 4G mobile networks.
3. Image Processing
Image processing uses mathematical operations to convert images into digital format and to enhance them. Bayesian Networks are applied to these tasks.
4. Biomonitoring
Bayesian Networks make it easier to quantify the concentration of chemicals. In biomonitoring, the amount of a chemical in human blood and tissue is measured using indicators.
5. Gene Regulatory Network (GRN)
A GRN contains various DNA segments of a cell that interact with other cell contents through protein and RNA expression products. Its behavior can be predicted and analyzed using Bayesian Networks.
Conclusion
In this blog post, you learned how Bayesian Networks help us get accurate results from the data at hand. Even the slightest variation in data can significantly affect the end result. Bayesian Networks help us analyze data using causation instead of just correlation.
They have proved to be revolutionary in the data science field.