Decision Tree is one of the most commonly used, practical approaches for supervised learning. It can be used to solve both Regression and Classification tasks with the latter being put more into practical application. In these trees, the class labels are represented by the leaves and the branches denote the conjunctions of features leading to those class labels. It is widely used in Machine learning algorithms. Typically, a machine learning approach includes controlling many hyperparameters and optimizations.
The regression tree is used when the predicted outcome is a real number and the classification tree is used to predict the class to which the data belongs. These two terms are collectively called as Classification and Regression Trees (CART).
These are non-parametric decision tree learning techniques that provide regression or classification trees, relying on whether the dependent variable is categorical or numerical respectively. This algorithm deploys the method of Gini Index to originate binary splits. Both Gini Index and Gini Impurity are used interchangeably.
Decision trees have influenced regression models in machine learning. While designing the tree, developers set the nodes’ features and the possible attributes of that feature with edges.
The Gini Index or Gini Impurity is calculated by subtracting the sum of the squared probabilities of each class from one. It favours mostly the larger partitions and are very simple to implement. In simple terms, it calculates the probability of a certain randomly selected feature that was classified incorrectly.
The Gini Index varies between 0 and 1, where 0 represents purity of the classification and 1 denotes random distribution of elements among various classes. A Gini Index of 0.5 shows that there is equal distribution of elements across some classes.
Mathematically, The Gini Index is represented by
The Gini Index works on categorical variables and gives the results in terms of “success” or “failure” and hence performs only binary split. It isn’t computationally intensive as its counterpart – Information Gain. From the Gini Index, the value of another parameter named Gini Gain is calculated whose value is maximised with each iteration by the Decision Tree to get the perfect CART
Let us understand the calculation of the Gini Index with a simple example. In this, we have a total of 10 data points with two variables, the reds and the blues. The X and Y axes are numbered with spaces of 100 between each term. From the given example, we shall calculate the Gini Index and the Gini Gain.
For a decision tree, we need to split the dataset into two branches. Consider the following data points with 5 Reds and 5 Blues marked on the X-Y plane. Suppose we make a binary split at X=200, then we will have a perfect split as shown below.
It is seen that the split is correctly performed and we are left with two branches each with 5 reds (left branch) and 5 blues (right branch).
But what will be the outcome if we make the split at X=250?
We are left with two branches, the left branch consisting of 5 reds and 1 blue, while the right branch consists of 4 blues. The following is referred to as an imperfect split. In training the Decision Tree model, to quantify the amount of imperfectness of the split, we can use the Gini Index.
To calculate the Gini Impurity, let us first understand it’s basic mechanism.
- First, we shall randomly pick up any data point from the dataset
- Then, we will classify it randomly according to the class distribution in the given dataset. In our dataset, we shall give a data point chosen with a probability of 5/10 for red and 5/10 for blue as there are five data points of each colour and hence the probability.
Now, in order to calculate the Gini Index, the formula is given by
Where, C is the total number of classes and p(i) is the probability of picking the data point with the class i.
In the above example, we have C=2 and p(1) = p(2) = 0.5, Hence the Gini Index can be calculated as,
G =p(1) ∗ (1−p(1)) + p(2) ∗ (1−p(2))
=0.5 ∗ (1−0.5) + 0.5 ∗ (1−0.5)
Where 0.5 is the total probability of classifying a data point imperfectly and hence is exactly 50%.
Now, let us calculate the Gini Impurity for both the perfect and imperfect split that we performed earlier,
The left branch has only reds and hence its Gini Impurity is,
G(left) =1 ∗ (1−1) + 0 ∗ (1−0) = 0
The right branch also has only blues and hence its Gini Impurity is also given by,
G(right) =1 ∗ (1−1) + 0 ∗ (1−0) = 0
From the quick calculation, we see that both the left and right branches of our perfect split have probabilities of 0 and hence is indeed perfect. A Gini Impurity of 0 is the lowest and the best possible impurity for any data set.
In this case, the left branch has 5 reds and 1 blue. Its Gini Impurity can be given by,
G(left) =1/6 ∗ (1−1/6) + 5/6 ∗ (1−5/6) = 0.278
The right branch has all blues and hence as calculated above its Gini Impurity is given by,
G(right) =1 ∗ (1−1) + 0 ∗ (1−0) = 0
Now that we have the Gini Impurities of the imperfect split, in order to evaluate the quality or extent of the split, we will give a specific weight to the impurity of each branch with the number of elements it has.
(0.6∗0.278) + (0.4∗0) = 0.167
Now that we have calculated the Gini Index, we shall calculate the value of another parameter, Gini Gain and analyse its application in Decision Trees. The amount of impurity removed with this split is calculated by deducting the above value with the Gini Index for the entire dataset (0.5)
0.5 – 0.167 = 0.333
This value calculated is called as the “Gini Gain”. In simple terms, Higher Gini Gain = Better Split.
Hence, in a Decision Tree algorithm, the best split is obtained by maximizing the Gini Gain, which is calculated in the above manner with each iteration.
After calculating the Gini Gain for each attribute in the data set, the class, sklearn.tree.DecisionTreeClassifier will choose the largest Gini Gain as the Root Node. When a branch with Gini of 0 is encountered it becomes the leaf node and the other branches with Gini more than 0 need further splitting. These nodes are grown recursively till all of them are classified.
Use in Machine Learning
There are various algorithms designed for different purposes in the world of machine learning. The problem lies in identifying which algorithm to suit best on a given dataset. The decision tree algorithm seems to show convincing results too. To recognize it, one must think that decision trees somewhat mimic human subjective power.
So, a problem with more human cognitive questioning is likely to be more suited for decision trees. The underlying concept of decision trees can be easily understandable for its tree-like structure.
An alternative to the Gini Index is the Information Entropy which used to determine which attribute gives us the maximum information about a class. It is based on the concept of entropy, which is the degree of impurity or uncertainty. It aims to decrease the level of entropy from the root nodes to the leaf nodes of the decision tree.
In this way, the Gini Index is used by the CART algorithms to optimise the decision trees and create decision points for classification trees.
If you’re interested to learn more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.