Understanding the Decision Tree Entropy in Machine Learning

A decision tree is a supervised machine learning method: it learns from training data in which each input is paired with its known output. In a decision tree, the data is split repeatedly according to the given parameters. Each split breaks the data into smaller subsets, and the tree is developed incrementally as this happens. The tree has two kinds of entities: decision nodes and leaf nodes.

Different Entities of Decision Tree

1. Decision Node

A decision node is where the data splits. It usually has two or more branches.

2. Leaf Nodes

The leaf nodes represent the outcomes, classifications, or decisions of the event.

A binary tree for “Eligibility for Miss India Beauty Pageant”:

Let us use a simple binary tree to understand decision trees. Suppose you want to find out whether a girl is eligible for a beauty pageant such as Miss India.

The first decision node asks whether the girl is a resident of India. If she is, the next question is whether her age is between 18 and 25 years: if yes, she is eligible, else not. If she is not a resident, the next question is whether she has valid certificates: if yes, she is eligible, else not. This is a simple yes-or-no problem, which makes it an example of a classification tree, one of the two main types of decision tree.
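The eligibility tree described above can be sketched in a few lines of Python. This is only an illustration, not a standard library structure: the class names `DecisionNode` and `Leaf`, the `classify` helper, and the dictionary keys (`resident_of_india`, `age`, `has_valid_certificates`) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """A leaf node: holds the final outcome."""
    outcome: str


@dataclass
class DecisionNode:
    """A decision node: asks a yes/no question and branches on the answer."""
    question: Callable[[dict], bool]
    if_yes: Union["DecisionNode", Leaf]
    if_no: Union["DecisionNode", Leaf]


def classify(node, candidate: dict) -> str:
    """Walk from the root to a leaf, following the answer at each decision node."""
    while isinstance(node, DecisionNode):
        node = node.if_yes if node.question(candidate) else node.if_no
    return node.outcome


eligible, not_eligible = Leaf("eligible"), Leaf("not eligible")

# Root: resident of India? Yes -> check age; No -> check certificates.
tree = DecisionNode(
    question=lambda c: c["resident_of_india"],
    if_yes=DecisionNode(lambda c: 18 <= c["age"] <= 25, eligible, not_eligible),
    if_no=DecisionNode(lambda c: c["has_valid_certificates"], eligible, not_eligible),
)

candidate = {"resident_of_india": True, "age": 22, "has_valid_certificates": False}
print(classify(tree, candidate))
```

Each `DecisionNode` has exactly two branches here because the example is a binary tree; the two `Leaf` objects are the only possible outcomes.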

Decision Tree Classification

1. Classification Trees

Classification trees are the simple yes-or-no type of tree. They are similar to the example we have seen above, where the outcome took values like ‘eligible’ or ‘not eligible.’ The decision variable here is categorical.

2. Regression Trees

In regression trees, the outcome variable or the decision is continuous, e.g., a number like 123.

Now that you are familiar with the decision tree and its types, we can get into the depths of it. Decision trees can be constructed using many algorithms; ID3, or Iterative Dichotomiser 3, is one of the best known. This is where decision tree entropy comes into the frame.

On every iteration, the ID3 algorithm goes through each unused attribute of the set and calculates the Entropy H(S) or Information Gain IG(S) of that attribute. Since we are more interested in decision tree entropy in this article, let us first understand the term Entropy and simplify it with an example.

Entropy: For a finite set S, Entropy, also called Shannon Entropy, is the measure of the amount of randomness or uncertainty in the data. It is denoted by H(S).

In simple terms, entropy tells you how predictable an event is by measuring the purity of the data. The decision tree is built in a top-down manner and starts with a root node. The data of this root node is further partitioned into subsets that contain homogeneous instances.

For example, consider a plate used in cafes with “we are open” written on one side and “we are closed” on the other. The probability of “we are open” is 0.5, and the probability of “we are closed” is 0.5. Since both outcomes are equally likely, there is no way of predicting which one you will see, and the entropy is the highest possible.

Coming back to the same example, if the plate had “we are open” written on both of its sides, the outcome can be predicted with certainty: whichever side faces the front, it will read “we are open.” In other words, there is no randomness, meaning the entropy is zero. Remember that the lower the value of entropy, the higher the purity of the event, and the higher the value of entropy, the lower the event’s purity.
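The two plates can be checked numerically. The sketch below assumes the standard Shannon formula with a base-2 logarithm, H(S) = -sum(p * log2(p)) over the probabilities of each outcome; the function name `entropy` and the variable names are chosen for this example.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# "we are open" on one side, "we are closed" on the other
open_or_closed = entropy([0.5, 0.5])

# "we are open" on both sides: only one possible outcome
always_open = entropy([1.0])

print(open_or_closed, always_open)
```

With two equally likely outcomes the entropy reaches its maximum of 1 bit, and with a single certain outcome it drops to zero, matching the reasoning above.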

Suppose you have 110 balls. 89 of these are green, and 21 are blue. Let us calculate the entropy for the overall dataset.

Total number of balls (n) = 110

Since 89 of the 110 balls are green, the probability of green is 89 divided by 110, which gives 0.8091, or 80.91%. Multiplying the probability of green by the base-2 log of that probability gives -0.2473. Remember that the log of a probability is never a positive number, so we attach a negative sign to get a non-negative result. This can be expressed simply as:

Entropy (Green) = -(89/110) × log2(89/110) = 0.2473

Now, performing the same steps for the blue balls, we have 21 out of 110. Hence, the probability of blue is 21 divided by 110, which gives 0.1909, or 19.09%. Multiplying the probability of blue by the base-2 log of that probability gives -0.4561. Again, we attach a negative sign, since the log of a probability is never positive and we want a non-negative result. Expressing this simply:

Entropy (Blue) = -(21/110) × log2(21/110) = 0.4561

Now, the decision tree entropy of the overall data is the sum of these individual terms. That is, we add the negated product of the probability of green and the log of the probability of green to the negated product of the probability of blue and the log of the probability of blue.

Entropy (Overall Data) = 0.2473 + 0.4561 = 0.7034
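The same arithmetic can be reproduced in Python as a quick check. This is a minimal sketch that assumes base-2 logarithms, which is what the figures above correspond to; the helper name `entropy_of_labels` and the list `balls` are made up for this example.

```python
import math
from collections import Counter


def entropy_of_labels(labels):
    """Entropy in bits of a list of class labels, using each label's relative frequency."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


# 89 green balls and 21 blue balls, 110 in total
balls = ["green"] * 89 + ["blue"] * 21

print(round(entropy_of_labels(balls), 4))  # 0.7034
```

Counting the labels first means the function works for any number of colours, not just two.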

This example shows how the entropy is calculated. Hopefully, the concept is now clear. Calculating the decision tree entropy is not rocket science as such.

However, you must be careful while doing the calculations. In machine learning, every minute detail matters: even a tiny mistake, such as a dropped negative sign or the wrong logarithm base, can give a wrong result, so always double-check your calculations.
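To connect entropy back to ID3, here is a minimal sketch of how the information gain of an attribute can be computed and compared. It assumes the standard definition (the entropy of the whole set minus the size-weighted entropy of the subsets produced by splitting on the attribute), and it shows only the attribute-selection step, not the full algorithm. The dataset, the attribute names `size` and `shiny`, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the size-weighted entropy of each subset."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - weighted


# Hypothetical data: "size" separates the colours perfectly, "shiny" does not help.
rows = [
    {"size": "large", "shiny": "yes", "colour": "green"},
    {"size": "large", "shiny": "yes", "colour": "green"},
    {"size": "large", "shiny": "no", "colour": "green"},
    {"size": "large", "shiny": "no", "colour": "green"},
    {"size": "small", "shiny": "yes", "colour": "blue"},
    {"size": "small", "shiny": "yes", "colour": "blue"},
    {"size": "small", "shiny": "no", "colour": "blue"},
    {"size": "small", "shiny": "no", "colour": "blue"},
]

# ID3 splits on the attribute with the highest information gain.
best = max(["size", "shiny"], key=lambda a: information_gain(rows, a, "colour"))
print(best)
```

In this toy data, splitting on `size` leaves two perfectly pure subsets, so it removes all the uncertainty, while splitting on `shiny` leaves each subset as mixed as the original set.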

Bottom Line

A decision tree is a supervised machine learning model, and various algorithms can be used to construct it. Among these algorithms, ID3 uses Entropy. Entropy is a measure of the impurity, or uncertainty, of the data: the lower the entropy, the purer the set.

We know that a career in machine learning has a promising future. This industry still has a long way to go before it reaches its peak, and hence the opportunities for machine learning enthusiasts are growing quickly. Make your place in the machine learning industry with the help of the right knowledge and skills.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.
