# Understanding the Decision Tree Entropy in Machine Learning

Last updated: 29th Dec, 2020

A decision tree is a supervised machine learning method: it learns from training data in which every input is paired with its known output. The algorithm splits the data repeatedly according to the values of its attributes, breaking it into smaller and smaller subsets while the tree grows incrementally. The finished tree is made up of two kinds of entities: decision nodes and leaf nodes.

## Different Entities of Decision Tree

### 1. Decision Node

A decision node is a point where the data is split. It usually has two or more branches.

### 2. Leaf Nodes

Leaf nodes represent the outcomes, classifications, or decisions of the event.

Let us take a simple binary tree to understand decision trees: “Eligibility for Miss India Beauty Pageant”. Suppose you want to find out whether a girl is eligible for a beauty pageant such as Miss India.

The root decision node asks whether the girl is a resident of India. If yes, the next node asks whether her age is between 18 and 25 years; if it is, she is eligible, otherwise she is not. If she is not a resident, the next node asks whether she has valid certificates; if she does, she is eligible, otherwise she is not. This is a simple yes-or-no problem. Decision trees are classified into two main types: classification trees and regression trees.
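The pageant example can be written as nested conditions, one per decision node. This is only an illustrative sketch: the function name `is_eligible`, its parameters, and the choice to treat the age range as inclusive are assumptions made for the example.

```python
def is_eligible(is_resident: bool, age: int, has_valid_certificates: bool) -> bool:
    """Walk the example tree from the root decision node down to a leaf."""
    if is_resident:
        # Decision node for residents: check the age range
        # (assumed inclusive at both ends).
        return 18 <= age <= 25
    # Decision node for non-residents: check the certificates.
    return has_valid_certificates


print(is_eligible(True, 22, False))   # resident within the age range
print(is_eligible(False, 30, True))   # non-resident with valid certificates
```

Each `return` corresponds to a leaf node, and each condition corresponds to a decision node.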

## Types of Decision Trees

### 1. Classification Trees

Classification trees are the simple yes-or-no type of tree. They are similar to the example above, where the outcome took values such as ‘eligible’ or ‘not eligible’. The decision variable here is categorical.

### 2. Regression Trees

In regression trees, the outcome variable or the decision is continuous, e.g., a number like 123.
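One way to see the difference between the two types is to look at what a leaf predicts. The sketch below assumes a common convention: a classification leaf returns the most frequent class among its samples, and a regression leaf returns their average. The function names and sample values are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the samples in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average target value of the samples in the leaf."""
    return mean(values)


# Hypothetical samples that ended up in one leaf.
print(classification_leaf(["eligible", "eligible", "not eligible"]))
print(regression_leaf([18.5, 21.0, 23.5]))
```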

Now that you know what a decision tree is and which types it comes in, we can go deeper. Decision trees can be constructed using many algorithms; ID3 (Iterative Dichotomiser 3) is one of the best known. This is where decision tree entropy comes into the picture.

On every iteration, the ID3 algorithm goes through each unused attribute of the set S and calculates the entropy H(S) or the information gain IG(S) of that attribute. It then splits on the attribute with the smallest entropy or, equivalently, the largest information gain. Since this article is about decision tree entropy, let us first understand the term entropy and simplify it with an example.
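The attribute-selection step can be sketched in a few lines of Python. This is not a full ID3 implementation, only the "score every unused attribute and pick the best" step. The toy rows, the attribute names `color` and `size`, and the helper names are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after


# Hypothetical toy data: `color` separates the labels perfectly, `size` does not.
rows = [
    {"color": "green", "size": "small", "label": "keep"},
    {"color": "green", "size": "large", "label": "keep"},
    {"color": "blue", "size": "small", "label": "drop"},
    {"color": "blue", "size": "large", "label": "drop"},
]
unused_attributes = ["color", "size"]

# Pick the unused attribute with the largest information gain.
best = max(unused_attributes, key=lambda a: information_gain(rows, a, "label"))
print(best)
```

A complete tree builder would repeat this step on each resulting subset, removing the chosen attribute from the unused list each time.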

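Entropy: For a finite set S, entropy, also called Shannon entropy, is a measure of the amount of randomness or uncertainty in the data. It is denoted by H(S). If the classes in S occur with probabilities p1, p2, ..., pk, then H(S) = -(p1 × log2(p1) + p2 × log2(p2) + ... + pk × log2(pk)).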

In simple terms, entropy tells us how predictable an event is by measuring the purity of the data. The decision tree is built in a top-down manner and starts with a root node. The data at this root node is then partitioned into subsets that contain homogeneous instances.

For example, consider a sign used in cafes that has “we are open” written on one side and “we are closed” on the other. The probability of “we are open” is 0.5, and the probability of “we are closed” is 0.5. Since there is no way of determining the outcome in this example, the entropy is the highest possible: 1 for two equally likely outcomes.

Returning to the same example, if the sign had “we are open” written on both of its sides, the outcome could be predicted perfectly: whichever side faces the front, it still reads “we are open.” In other words, there is no randomness, so the entropy is zero. Remember that the lower the entropy, the higher the purity of the event, and the higher the entropy, the lower its purity.
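The two cafe-sign cases can be checked with a small function. The sketch below assumes base-2 logarithms, so entropy is measured in bits. The function name `entropy` is an illustrative choice.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))  # "open" and "closed" equally likely -> 1.0
print(entropy([1.0, 0.0]))  # "open" on both sides -> 0.0
```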

### Example

Let us consider that you have 110 balls. 89 out of these are green balls, and 21 are blue. Calculate the entropy for the overall dataset.

Total number of balls (n) = 110

Since we have 89 green balls out of 110, the probability of green is 89 divided by 110, which gives 0.8091, or 80.91%. Multiplying the probability of a green ball by the base-2 log of that probability gives -0.2473. The log of a probability below 1 is always a negative number, so we attach a negative sign to make the term positive. This can be expressed simply as:

Entropy (Green) = -(89/110) × log2(89/110) = 0.2473

Now, performing the same steps for the blue balls, we have 21 out of 110. Hence, the probability of a blue ball is 21 divided by 110, which gives 0.1909, or 19.09%. Multiplying the probability of a blue ball by the base-2 log of that probability gives -0.4561. Again, we attach a negative sign, since the log of a probability below 1 is negative and we want a positive value. Expressing this simply:

Entropy (Blue) = -(21/110) × log2(21/110) = 0.4561

Now, the decision tree entropy of the overall data is the sum of the individual entropies: the negated product of the probability of a green ball and the log of that probability, plus the negated product of the probability of a blue ball and the log of that probability.

Entropy (Overall Data) = 0.2473 + 0.4561 = 0.7034
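The same arithmetic can be reproduced step by step in Python. The variable names are illustrative, and base-2 logarithms are assumed, which is the base that reproduces the figures in this example.

```python
from math import log2

green, blue = 89, 21
total = green + blue  # 110 balls

p_green = green / total
p_blue = blue / total

# Each term is the negated product of a probability and its base-2 log.
term_green = -p_green * log2(p_green)
term_blue = -p_blue * log2(p_blue)
overall = term_green + term_blue

print(round(term_green, 4), round(term_blue, 4), round(overall, 4))
# 0.2473 0.4561 0.7034
```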

This was one example to help you understand how entropy is calculated. Hopefully, the concept is now clear. Calculating decision tree entropy is not rocket science.

However, you must be careful while doing the calculations. As a machine learning enthusiast, you know how much every small detail matters. Even a tiny mistake, such as a dropped negative sign or the wrong log base, can throw the result off, so always double-check your calculations.

## Bottom Line

A decision tree is a supervised machine learning model that can be constructed with various algorithms. Among them, the ID3 algorithm uses entropy. Entropy is simply a measure of the impurity of the data: the lower the entropy, the purer the data.

A career in machine learning has a promising future. The industry still has a long way to go before it reaches its peak, so opportunities for machine learning enthusiasts keep growing. Make your place in the machine learning industry with the right knowledge and skills.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
## Frequently Asked Questions

### 1. What is the difference between Entropy and Gini Impurity?

Decision tree algorithms choose, at every node, the split that produces the purest subsets. Entropy is one way to measure that purity: it is calculated for the candidate subsets, and the split that lowers it the most is chosen. For a two-class problem, entropy lies between 0 and 1. Gini impurity, also called the Gini index, measures the data’s impurity for the same purpose. It is the probability that a randomly chosen sample would be misclassified if it were labelled at random according to the class proportions in the subset, and for two classes it lies between 0 and 0.5. Both measures are 0 when every sample in a subset belongs to the same class, which is the ideal outcome of a split.
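A short side-by-side comparison can make the difference concrete. The sketch below uses the standard definition of Gini impurity, 1 minus the sum of squared class probabilities, together with base-2 entropy. The function names are illustrative, and the middle case reuses the 89 green and 21 blue balls from the example in this article.

```python
from math import log2


def entropy(probabilities):
    """Shannon entropy in bits; outcomes with probability 0 contribute nothing."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


def gini_impurity(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probabilities)


cases = ([0.5, 0.5], [89 / 110, 21 / 110], [1.0, 0.0])
for probs in cases:
    print(probs, round(entropy(probs), 4), round(gini_impurity(probs), 4))
```

Both measures peak for the evenly mixed case and drop to zero for the pure case; they differ in scale and in how they are computed, not in what they reward.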

### 2. What is Information gain in Decision Trees?

Decision trees split the data repeatedly to achieve purity in the subsets. The purer the subsets, the stronger the prediction. Information gain measures how much a split reduces impurity: it is the entropy of the data before the split minus the weighted average entropy of the subsets after it. At each node, the information gain of every candidate attribute is calculated, and the attribute with the highest gain is chosen for the next split, since it promises the purest subsets.
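As a rough illustration, information gain can be computed directly from class counts. The parent counts below are the 89 green and 21 blue balls from the example in this article; the way they are divided into two child subsets is hypothetical, as are the function names.

```python
from math import log2


def entropy_from_counts(counts):
    """Shannon entropy in bits of a subset described by its class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(
        sum(child) / total * entropy_from_counts(child) for child in child_counts
    )
    return entropy_from_counts(parent_counts) - weighted


parent = [89, 21]              # [green, blue]
children = [[80, 5], [9, 16]]  # hypothetical split into two subsets
gain = information_gain(parent, children)
print(round(gain, 4))
```

A split that leaves the class proportions unchanged has a gain of zero, while a split that separates the classes completely has a gain equal to the parent's entropy.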

### 3. What are the disadvantages of a Decision Tree?

The decision tree is one of the most widely used machine learning methods for decision making. Like a tree, it uses nodes to divide the data into subsets until a decision can be made. However, decision trees have limitations. Very large trees are hard to follow and interpret, and their size is often a sign of overfitting to the training data. They are also unstable: a small change in the data set can lead to a different tree and a different final decision. Decision trees can therefore become complex, but they work well when trained carefully.
