A decision tree is a supervised machine learning model: it is trained on data in which each input is labelled with a known output. A decision tree splits the data multiple times according to the given parameters, breaking it into smaller and smaller subsets while the tree is developed incrementally. The tree has two kinds of entities: decision nodes and leaf nodes.


**Different Entities of Decision Tree**

**1. Decision Node**

The decision nodes are the ones where the data splits. Each decision node usually has two or more branches.

**2. Leaf Nodes**

The leaf nodes represent the outcomes, classifications, or decisions of the event.

Let us take a simple binary tree as an example: suppose you want to determine whether a girl is eligible for a beauty pageant contest like Miss India.

The first decision node asks whether the girl is a resident of India. If yes, is her age between 18 and 25 years? If so, she is eligible; otherwise she is not. If she is not a resident, does she have valid certificates? If yes, she is eligible; otherwise not. This is a simple yes-or-no type of problem. Decision trees are classified into two main types:
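The branching logic above can be sketched as plain conditionals; the function name and signature below are hypothetical, chosen only for this illustration:

```python
def is_eligible(resident_of_india: bool, age: int, has_valid_certificates: bool) -> bool:
    """Sketch of the pageant-eligibility decision tree described above."""
    if resident_of_india:
        # Decision node: the resident branch checks the age criterion
        return 18 <= age <= 25
    # Non-resident branch: check for valid certificates
    return has_valid_certificates

print(is_eligible(True, 21, False))    # resident aged 21 -> True
print(is_eligible(False, 30, False))   # non-resident without certificates -> False
```

Each `if` corresponds to a decision node, and each `return` corresponds to a leaf node.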


**Decision Tree Classification**

**1. Classification Trees**

Classification trees handle the simple yes-or-no type of problem, like the example we saw above, where the outcome variable takes values such as ‘eligible’ or ‘not eligible.’ The decision variable here is categorical.

**2. Regression Trees**

In regression trees, the outcome variable or decision is continuous, e.g., a number such as a house price or a temperature.

Now that you are aware of the decision tree and its types, we can get into the depths of it. Decision trees can be constructed using many algorithms; one of the most widely used is ID3, the Iterative Dichotomiser 3 algorithm. This is where **decision tree entropy** comes into the frame.

On every iteration, the ID3 algorithm goes through the unused attributes of the set S, calculates the entropy H(S) and the information gain IG(A) of each attribute, and splits on the attribute with the largest information gain. Since we are more interested in **decision tree entropy** in the current article, let us first understand the term entropy and simplify it with an example.

**Entropy:** For a finite set S, entropy, also called Shannon entropy, is the measure of the amount of randomness or uncertainty in the data. It is denoted by H(S) and, for class proportions p₁, …, pₙ, is given by H(S) = −Σᵢ pᵢ log₂ pᵢ.

In simple terms, entropy measures how pure or mixed a set is, which tells us how predictable an event drawn from it will be. A decision tree is built top-down, starting with a root node whose data is partitioned into subsets that each contain, as far as possible, homogeneous instances.

For example, consider a plate used in cafes with “we are open” written on one side and “we are closed” on the other. The probability of “we are open” is 0.5, and the probability of “we are closed” is 0.5. Since there is no way to predict the outcome in this example, the entropy is the highest possible.

In the same example, if the plate had “we are open” written on both of its sides, the outcome is known in advance: whichever side faces out, it still reads “we are open.” In other words, there is no randomness, so the entropy is zero. Remember: the lower the value of entropy, the higher the purity of the event, and the higher the value of entropy, the lower the event’s purity.
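The two plate scenarios can be checked with a few lines of Python; the `entropy` helper below is a generic implementation of the Shannon entropy formula, written for this sketch:

```python
import math

def entropy(probabilities):
    """Shannon entropy H(S) = sum of -p * log2(p); zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Plate with "we are open" / "we are closed": both outcomes equally likely
print(entropy([0.5, 0.5]))  # 1.0 -- maximum uncertainty for two outcomes

# Plate with "we are open" on both sides: the outcome is certain
print(entropy([1.0]))       # 0.0 -- no randomness
```

A 50/50 split gives the maximum entropy of 1 bit, while a certain outcome gives 0, matching the discussion above.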


**Example**

Let us consider that you have 110 balls: 89 of these are green, and 21 are blue. Calculate the entropy of the overall dataset.

Total number of balls (n) = 110

Since we have 89 green balls out of 110, the probability of green is 89/110 = 0.8091, i.e., 80.91%. Multiplying this probability by the log (base 2) of the probability gives −0.2473; since the log of a probability is always negative, we attach a negative sign to make the contribution positive. This can be expressed simply as:

−p(green) × log₂ p(green) = −0.8091 × log₂(0.8091) ≈ 0.2473

Now, performing the same steps for the blue balls: we have 21 out of 110, so a blue ball’s probability is 21/110 = 0.1909, i.e., 19.09%. Multiplying the probability of blue by the log of that probability and, as before, attaching a negative sign gives:

−p(blue) × log₂ p(blue) = −0.1909 × log₂(0.1909) ≈ 0.4561

Now, the **decision tree entropy** of the overall data is the sum of these individual contributions: the green-ball term plus the blue-ball term.

**Entropy (Overall Data) = 0.2473 + 0.4561 = 0.7034**
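The arithmetic above can be verified in a few lines of Python; this is a generic sketch that works from class counts rather than precomputed probabilities:

```python
import math

def entropy_from_counts(counts):
    """Entropy of a dataset given class counts: H = sum of -(c/n) * log2(c/n)."""
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

# 89 green balls and 21 blue balls out of 110
h = entropy_from_counts([89, 21])
print(round(h, 4))  # 0.7034, matching the hand calculation above
```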

This was one example to help you understand how the entropy is calculated; as you can see, computing **decision tree entropy** is not rocket science.

However, you must be careful while doing the calculations. As a machine learning enthusiast, you know how much every minute detail matters: even a small slip in the probabilities or logarithms will throw off the result.


**Bottom Line**

A decision tree is a supervised machine learning model that can be constructed with various algorithms. Among them, the ID3 algorithm uses entropy, which measures the randomness, or impurity, of the data: the lower the entropy, the purer the subset.

A career in machine learning has a promising future. The industry still has a long way to go before reaching its peak, so opportunities for machine learning enthusiasts are growing rapidly, along with many other advantages. Make your mark in the machine learning industry with the help of the right knowledge and skills.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

## What is the difference between Entropy and Gini Impurity?

Decision tree algorithms are classification methods used to predict reliable outcomes. Entropy is calculated at each node to choose the split that yields the purest subsets: it measures how mixed the classes in a subset are, and the attribute whose split reduces entropy the most is selected. For a binary problem, entropy lies between 0 and 1. The Gini index, or Gini impurity, also measures the impurity of the data in order to select the best split: it gives the probability that a randomly chosen sample would be misclassified if it were labelled according to the class distribution of its subset. Ideally, each split produces subsets whose samples all share the same class, i.e., maximum purity.
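A quick way to see how the two measures behave is to compute both on the same class distribution; this is a generic sketch, not code tied to any particular library:

```python
import math

def entropy(probs):
    """Shannon entropy: sum of -p * log2(p) over nonzero class probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity: 1 - sum(p^2) over class probabilities."""
    return 1 - sum(p * p for p in probs)

# Both measures peak for a 50/50 mix and drop to 0 for a pure subset
for p in (0.5, 0.8, 1.0):
    dist = [p, 1 - p]
    print(f"p={p}: entropy={entropy(dist):.3f}, gini={gini(dist):.3f}")
```

Note that for a binary problem entropy peaks at 1.0 while Gini impurity peaks at 0.5; both reach 0 for a pure subset.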

## What is Information gain in Decision Trees?

Decision trees split the data repeatedly to achieve purity in the subsets; the purer the subsets, the stronger the prediction. Information gain measures how much a candidate split reduces impurity: it is the entropy of the parent set minus the weighted average entropy of the child subsets produced by the split, where each child is weighted by its share of the samples. At each node, the attribute with the highest information gain is selected for the split, since it yields the purest subsets.
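The definition above (parent entropy minus the weighted entropy of the child subsets) can be sketched as follows; the split counts are made up for illustration:

```python
import math

def entropy(counts):
    """Entropy from class counts: sum of -(c/n) * log2(c/n)."""
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    """IG = H(parent) - weighted average of H(child) over the split subsets."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 positive / 10 negative samples into two subsets
print(round(information_gain([10, 10], [[9, 1], [1, 9]]), 3))  # 0.531 -- informative split
print(round(information_gain([10, 10], [[5, 5], [5, 5]]), 3))  # 0.0 -- useless split
```

A split that separates the classes well (9/1 vs 1/9) gains over half a bit of information, while a split that leaves both children as mixed as the parent gains nothing.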

## What are the disadvantages of a Decision Tree?

The decision tree algorithm is one of the most widely used machine learning methods for decision making. Analogous to a tree, it uses nodes to classify data into subsets until the most appropriate decision is reached, and it helps predict successful solutions. However, decision trees have limitations. Very large trees are hard to follow and interpret, and excessive depth is often a symptom of overfitting. They are also unstable: if the dataset is tweaked even slightly, the final tree (and its decisions) can change considerably. Decision trees can therefore become complex, but with proper training they can be executed appropriately.