
Understanding the Decision Tree Entropy in Machine Learning

Last updated:
29th Dec, 2020

A Decision Tree is a part of Supervised Machine Learning, in which the training data contains both the inputs and the outputs the model should learn. In a decision tree, the data is split multiple times according to given parameters: it keeps breaking into smaller subsets, and simultaneously the tree is developed incrementally. The tree has two kinds of entities: decision nodes and leaf nodes.


Different Entities of a Decision Tree

1. Decision Nodes

The decision nodes are the points where the data splits. A decision node usually has two or more branches.

2. Leaf Nodes

The leaf nodes represent the outcomes, classifications, or decisions of the event.


Let us take a simple binary tree as an example to understand decision trees: “Eligibility for the Miss India Beauty Pageant.” Suppose you want to determine whether a girl is eligible for a beauty pageant contest such as Miss India.

The first decision node asks whether the girl is a resident of India. If yes, the next question is whether her age is between 18 and 25 years; if it is, she is eligible, otherwise she is not. If she is not a resident of India, the question is whether she has valid certificates; if she does, she is eligible, otherwise she is not. This was a simple yes-or-no type of problem, sketched in code below.
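To make the flow concrete, here is a minimal Python sketch of that tree written as plain if/else logic (the function name and rule details are illustrative, not from any library):

```python
# A hypothetical sketch of the eligibility tree as nested conditionals.
# Each `if` is a decision node; each `return` is a leaf node.
def is_eligible(resident_of_india: bool, age: int, has_valid_certificates: bool) -> bool:
    if resident_of_india:                  # decision node: residency
        return 18 <= age <= 25             # decision node: age -> leaf
    return has_valid_certificates          # decision node: certificates -> leaf

print(is_eligible(True, 22, False))   # True  (resident, age in range)
print(is_eligible(True, 30, True))    # False (resident, age out of range)
print(is_eligible(False, 22, True))   # True  (non-resident with valid certificates)
```

Decision trees are classified into two main types: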

Must Read: Decision Tree in AI

Types of Decision Trees

1. Classification Trees

Classification trees are the simple yes-or-no type of trees. This is similar to the example we saw above, where the outcome had values like ‘eligible’ or ‘not eligible.’ The decision variable here is categorical.


2. Regression Trees

In regression trees, the outcome variable or decision is continuous, e.g., a numeric quantity such as a price or a temperature.

Now that you are aware of decision trees and their types, we can get into the depths of the topic. Decision trees can be constructed using many algorithms; one of the most popular is ID3, the Iterative Dichotomiser 3 algorithm. This is where decision tree entropy comes into the frame.

On every iteration, the ID3 algorithm goes through each unused attribute of the set and calculates the entropy H(S) or the information gain IG(S) for it, then splits on the attribute with the best score. Since we are more interested in decision tree entropy in this article, let us first understand the term entropy and simplify it with an example.
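As a quick illustration, here is a minimal scikit-learn sketch. Note that scikit-learn implements the CART algorithm rather than ID3, but setting criterion="entropy" makes it choose splits by entropy, in the spirit of ID3:

```python
# Minimal sketch: an entropy-based decision tree with scikit-learn
# (CART, not ID3, but using entropy as the splitting criterion).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:3]))  # predicted classes for the first three samples
```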

Entropy: For a finite set S, entropy, also called Shannon entropy, is the measure of the amount of randomness or uncertainty in the data. It is denoted by H(S). If the elements of S fall into classes with proportions p1, p2, …, pk, it is defined as H(S) = −Σ pi log2(pi).

In simple terms, entropy measures the purity of a set: the lower the entropy, the purer (more homogeneous) the set. The decision tree is built in a top-down manner, starting with a root node. The data of this root node is further partitioned into subsets containing instances that are as homogeneous as possible.

For example, consider a plate used in cafés with “we are open” written on one side and “we are closed” on the other. The probability of “we are open” is 0.5, and the probability of “we are closed” is 0.5. Since there is no way of determining the outcome, the entropy is the highest possible: H = −0.5 log2(0.5) − 0.5 log2(0.5) = 1.

Returning to the same example, if the plate had “we are open” written on both of its sides, the outcome can be predicted perfectly: whichever side faces out, we will still see “we are open.” In other words, there is no randomness, so the entropy is zero. Remember: the lower the value of entropy, the higher the purity of the event, and the higher the value of entropy, the lower the purity.
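A tiny Python check confirms both cases (entropy in bits, i.e., base-2 logarithm):

```python
import math

def entropy(probabilities):
    """Shannon entropy H(S) = -sum(p * log2(p)), skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> maximum uncertainty (different text on each side)
print(entropy([1.0]))       # 0.0 -> no randomness ("we are open" on both sides)
```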

Read: Decision Tree Classification

Example

Let us consider that you have 110 balls: 89 of them are green, and 21 are blue. Calculate the entropy of the overall dataset.

Total number of balls (n) = 110

Since we have 89 green balls out of 110, the probability of a green ball is 89/110 = 0.8091, or 80.91%. Multiplying this probability by the log of the probability gives −0.2473; remember that the log of a probability is always negative, so we attach a negative sign to obtain a positive term. This can be expressed simply as:

−P(green) × log2(P(green)) = −0.8091 × log2(0.8091) = 0.2473

Now, performing the same steps for the blue balls: we have 21 out of 110, so the probability of a blue ball is 21/110 = 0.1909, or 19.09%. Multiplying the probability of blue by the log of that probability gives −0.4561, and again we attach a negative sign, since the log of a probability is always negative. Expressing this simply:

−P(blue) × log2(P(blue)) = −0.1909 × log2(0.1909) = 0.4561

Now, the decision tree entropy of the overall data is given by the sum of the individual terms: the negated product of the probability of green with its log, plus the negated product of the probability of blue with its log.

Entropy (Overall Data) = 0.2473 + 0.4561 = 0.7034 bits
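The whole calculation can be verified with a few lines of Python (a sketch, using base-2 logarithms as above):

```python
import math

# Reproducing the worked example: 89 green and 21 blue balls out of 110.
counts = {"green": 89, "blue": 21}
total = sum(counts.values())

h = 0.0
for colour, count in counts.items():
    p = count / total
    term = -p * math.log2(p)          # -P(class) * log2(P(class))
    print(f"{colour}: p = {p:.4f}, term = {term:.4f}")
    h += term

print(f"Entropy of the overall data: {h:.4f}")  # 0.7034
```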

This was one example to help you understand how entropy is calculated. Hopefully, the concept is now clear; calculating decision tree entropy is not rocket science as such.


However, you must be careful while doing the calculations. Since you are reading this page, you are clearly a machine learning enthusiast, and you know how important every minute detail is. Even the tiniest mistake can cause trouble, so always double-check your calculations.

Checkout: Types of Binary Tree

Bottom Line


A decision tree is a supervised machine learning technique that can be constructed using various algorithms. Among these, the ID3 algorithm uses entropy, which is nothing but a measure of the randomness, or impurity, of the data.

We know that a career in machine learning has a promising future. The industry still has a long way to go before it reaches its peak, so the opportunities for machine learning enthusiasts are growing rapidly, along with many other advantages. Make your remarkable place in the machine learning industry with the help of the right knowledge and skills.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What is the difference between Entropy and Gini Impurity?

Decision tree algorithms are classification methods that split the input data into progressively purer subsets. Entropy is calculated at each candidate split to optimise the tree: it measures how impure a subset is, and the split that leaves the purest subsets is preferred. For a two-class problem, entropy lies between 0 and 1. The Gini index (or Gini impurity) also measures the data’s impurity to select the most appropriate split: it is the probability of misclassifying a randomly chosen element if it were labelled at random according to the class distribution, and for two classes it lies between 0 and 0.5. Ideally, all the resulting subsets would contain a single classification, i.e., perfect purity.
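A small sketch makes the contrast concrete; both measures are 0 for a pure subset and peak when two classes are perfectly mixed:

```python
import math

def entropy(probs):
    # Shannon entropy: 0 for a pure node, up to 1.0 for two evenly mixed classes.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # Gini impurity: 0 for a pure node, up to 0.5 for two evenly mixed classes.
    return 1.0 - sum(p * p for p in probs)

for split in ([1.0, 0.0], [0.8, 0.2], [0.5, 0.5]):
    print(split, f"entropy={entropy(split):.3f}", f"gini={gini(split):.3f}")
# [1.0, 0.0] entropy=0.000 gini=0.000
# [0.8, 0.2] entropy=0.722 gini=0.320
# [0.5, 0.5] entropy=1.000 gini=0.500
```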

2. What is Information Gain in Decision Trees?

Decision trees involve a lot of splitting to achieve purity in the subsets; the purer the subsets, the stronger the resulting predictions. Information gain measures how much a candidate split reduces impurity, using entropy: it is the entropy of the parent subset minus the size-weighted average entropy of the child subsets the split produces. At each node, the attribute whose split yields the highest information gain, and hence the greatest increase in purity, is chosen.
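In formula form, IG(parent, split) = H(parent) − Σ (|child| / |parent|) × H(child). A minimal Python sketch (the example labels below are made up purely for illustration):

```python
from collections import Counter
import math

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """H(parent) minus the size-weighted average entropy of the child subsets."""
    total = len(parent_labels)
    weighted = sum(len(c) / total * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - weighted

# Illustrative split: a mixed parent separated into two purer children.
parent = ["yes"] * 6 + ["no"] * 4
children = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(round(information_gain(parent, children), 4))  # ~0.2564
```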

3. What are the disadvantages of a Decision Tree?

The decision tree algorithm is among the most widely used machine learning mechanisms for decision-making. Analogous to a tree, it uses nodes to classify data into subsets until the most appropriate decision is reached. Decision trees help predict successful solutions, but they have their limitations as well. Excessively large decision trees are hard to follow and interpret, which is often a symptom of overfitting the data. They are also unstable: if the dataset is tweaked in any manner, the final decision can change considerably. Hence, decision trees might be complex, but they can be appropriately executed with training.
