# Decision Tree Example: Function & Implementation [Step-by-step]

Last updated: 28th Dec, 2020

## Introduction

Decision Trees are one of the most popular and powerful algorithms for both regression and classification tasks. They have a flowchart-like structure and fall under the category of supervised algorithms. Because a decision tree can be visualised like a flowchart, it closely mirrors human decision-making, which is why decision trees are easy to understand and interpret.


## What is a Decision Tree?

Decision Trees are tree-structured classifiers. They have three types of nodes:

• Root Nodes
• Internal Nodes
• Leaf Nodes


The root node is the primary node, representing the entire sample, which is further split into several other nodes. The internal nodes represent tests on attributes, while the branches represent the outcomes of those tests. Finally, the leaf nodes denote the class labels, i.e. the decision reached after evaluating all the attributes along the path. Learn more about decision tree learning.


## How do Decision Trees work?

Decision trees perform classification by sorting data points down the tree from the root node to a leaf node. This is called the top-down approach. Once a particular data point is fed into the decision tree, it passes through a sequence of nodes, answering a yes/no question at each one, until it reaches its designated leaf node.

Each node in the decision tree represents a test case for an attribute and each descent (branch) to a new node corresponds to one of the possible answers to that test case. In this way, with multiple iterations, the decision tree predicts a value for the regression task or classifies the object in a classification task.
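To make this concrete, here is an illustrative, hand-written sketch (not a fitted model) of how a single data point could be routed through yes/no tests to a leaf. The thresholds are hypothetical, chosen only to mimic the traversal described above:

```python
# Illustrative only: a hand-written decision path with hypothetical thresholds,
# mimicking how a fitted tree routes one data point from root to leaf.
def classify_iris(petal_length_cm, petal_width_cm):
    # Root node test
    if petal_length_cm < 2.45:
        return "setosa"        # leaf node
    # Internal node test
    if petal_width_cm < 1.75:
        return "versicolor"    # leaf node
    return "virginica"         # leaf node

print(classify_iris(1.4, 0.2))  # a short petal routes to "setosa"
```

A real decision tree learns these tests and thresholds from data; the next sections show how.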

## Decision Tree Implementation

Now that we have the basics of a decision tree, let us go through one implementation of it in Python.

### Problem Analysis

In the following example we are going to use the famous “Iris Flower” dataset. Originally introduced by Ronald Fisher in 1936 and now hosted at the UCI Machine Learning Repository (Link: https://archive.ics.uci.edu/ml/datasets/Iris), this small dataset is widely used for testing machine learning algorithms and visualisations.

The dataset contains a total of 150 rows and 5 columns, of which 4 columns are the attributes or features and the last column is the Iris flower species. Iris is a genus of flowering plants in botany. The four attributes, measured in cm, are:

• Sepal Length
• Sepal Width
• Petal Length
• Petal Width

These four features are used to define and classify the type of Iris flower depending upon its size and shape. The 5th and last column contains the Iris flower class: Iris Setosa, Iris Versicolor, and Iris Virginica.

For our problem, we have to build a Machine Learning model utilizing Decision Tree Algorithm to learn the features and classify them based on the Iris flower class.

Let us go through its implementation in Python, step by step:

### Step 1: Importing the libraries

The first step in building any machine learning model in Python is to import the necessary libraries such as NumPy, Pandas, and Matplotlib. The tree module is imported from the sklearn library to visualise the Decision Tree model at the end.
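A minimal version of this step might look as follows (assuming NumPy, Pandas, Matplotlib, and scikit-learn are installed):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")      # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from sklearn import tree   # used in the final step to visualise the model
```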

### Step 2: Importing the dataset

Once we have imported the Iris dataset, we store the .csv file in a Pandas DataFrame, from which we can easily access the rows and columns of the table. The first four columns of the dataframe are the independent variables, or features, which the decision tree classifier learns from; they are stored in the variable X.

The dependent variable, which is the Iris flower class consisting of 3 species, is stored in the variable y. The dataset is inspected by printing the first 5 rows.
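A sketch of this step is shown below. Since the article's local .csv file path is not given, this example loads scikit-learn's bundled copy of the same Iris dataset instead:

```python
import pandas as pd
from sklearn.datasets import load_iris

# The article reads a local .csv; scikit-learn's bundled copy of the same
# dataset keeps this example self-contained.
iris = load_iris(as_frame=True)
df = iris.frame                    # 150 rows x 5 columns (4 features + target)

X = df.iloc[:, :4].values          # the four features (sepal/petal, in cm)
y = df["target"].values            # species encoded as 0, 1, 2

print(df.head())                   # inspect the first 5 rows
```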

### Step 3: Splitting the dataset into the Training set and Test set

In the following step, after reading the dataset, we split it into a training set, on which the classifier model will be trained, and a test set, on which the trained model will be evaluated. The predictions obtained on the test set are compared against the true labels to check the accuracy of the trained model.

Here, we have used a test size of 0.25, which denotes that 25% of the entire dataset will be randomly split as the test set and the remaining 75% will consist of the training set to be used in training the model. Hence, out of 150 datapoints, 38 random datapoints are retained as the test set and the remaining 112 samples are used in the training set.
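A sketch of the split, with a fixed random_state (an assumption on our part, added so the random split is reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.25 holds out 25% of the 150 samples (38 rows) for testing;
# the remaining 112 rows form the training set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

print(len(X_train), len(X_test))
```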

### Step 4: Training the Decision Tree Classification model on the Training Set

Once the dataset has been split and is ready for training, the DecisionTreeClassifier class is imported from the sklearn library and fitted on the training variables (X_train and y_train) to build the model. During training, the classifier grows the tree greedily: at each node it chooses the attribute split that most reduces an impurity measure such as the Gini index. (Unlike neural networks, decision trees do not use gradient descent or backpropagation.)
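A sketch of the training step, again with an assumed random_state for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# criterion="gini" is scikit-learn's default impurity measure; it is spelled
# out here because the visualisation step discusses the Gini index.
classifier = DecisionTreeClassifier(criterion="gini", random_state=0)
classifier.fit(X_train, y_train)
```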

### Step 5: Predicting the Test Set Results

As we have our model ready, shouldn’t we check its accuracy on the test set? This step involves testing the model built using the decision tree algorithm on the test set that was split earlier. The results are stored in a variable, “y_pred”.
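The prediction step might look like this (the earlier assumed steps are repeated so the snippet is self-contained):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Route every test point down the tree to its leaf and collect the labels
y_pred = classifier.predict(X_test)
print(y_pred[:5])
```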

### Step 6: Comparing the Real Values with Predicted Values

This is another simple step, where we build a dataframe with two columns: the real values of the test set on one side and the predicted values on the other. This lets us compare the results obtained by the model.
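A sketch of the comparison dataframe (column names are our choice):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Side-by-side view of true labels and model predictions
comparison = pd.DataFrame({"Real Values": y_test,
                           "Predicted Values": y_pred})
print(comparison.head())
```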

### Step 7: Confusion Matrix and Accuracy

Now that we have both the real and predicted values for the test set, let us build a confusion matrix and calculate the accuracy of our model using simple library functions within sklearn. The accuracy score is calculated from the real and predicted values of the test set. The model built using the above steps gives an accuracy of 92.1%, reported as 0.92105.

The confusion matrix is a table that is used to show the correct and incorrect predictions on a classification problem. For simple usage, the values across the diagonal represent the correct predictions and the other values outside of the diagonal are incorrect predictions.

Of the 38 test-set datapoints, we get 35 correct predictions and 3 incorrect predictions, which corresponds to roughly 92% accuracy. The accuracy can be improved by tuning the hyperparameters passed as arguments to the classifier before training the model.
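The metrics step might look like this. Note that the exact accuracy depends on which 38 points land in the test set, so a different random split can give a figure other than the article's 0.92105:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Diagonal entries of the confusion matrix are correct predictions;
# off-diagonal entries are mistakes.
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
print(cm)
print(acc)
```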

### Step 8: Visualizing the Decision Tree Classifier

Finally, in the last step we visualise the Decision Tree that was built. Looking at the root node, the number of “samples” is 112, consistent with the training-set split performed earlier. The Gini index is calculated at each split of the decision tree, and the counts of the 3 classes are shown in the “value” field of each node.
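A sketch of the visualisation step, using sklearn's plot_tree (the output file name is an arbitrary choice):

```python
import matplotlib
matplotlib.use("Agg")      # headless backend; save the figure to a file
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0)

classifier = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Render the fitted tree: each node shows its test, gini, samples and value
fig, ax = plt.subplots(figsize=(12, 8))
plot_tree(classifier, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True, ax=ax)
fig.savefig("iris_decision_tree.png")

# The root node covers all 112 training samples
print(classifier.tree_.n_node_samples[0])
```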


## Conclusion

In this way, we have understood the concept of the Decision Tree algorithm and built a simple classifier to solve a classification problem with it.

If you’re interested to learn more about decision trees, machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
## Frequently Asked Questions

1. What are the cons of using decision trees?

While decision trees help in the classification or sorting of data, their use sometimes creates a few problems too. Often, decision trees lead to the overfitting of data, which further makes the final result highly inaccurate. In case of large datasets, the use of a single decision tree is not recommended because it causes complexity. Also, decision trees are highly unstable, which means that if you cause a small change in the given dataset, the structure of the decision tree changes greatly.

2. How does a random forest algorithm work?

A random forest is essentially a collection of diverse decision trees, just like a forest is made up of many trees. The random forest algorithm's outcomes are actually dependent on the decision trees' predictions. The random forest technique also minimizes the likelihood of data over-fitting. To get the required outcome, random forest classification employs an ensemble approach. The training data is used to train various decision trees. When nodes are separated, this dataset contains observations and attributes that will be picked at random.
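As a sketch, a random forest on the same Iris split might look like this (parameter values are illustrative, not a recommendation):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 100 trees, each trained on a bootstrap sample of the training data and
# considering a random subset of features at each split; their votes are
# aggregated into the final prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```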

3. How is a decision table different from a decision tree?

A decision table may be produced from a decision tree, but not the other way around. A decision tree is made up of nodes and branches, whereas a decision table is made up of rows and columns. In decision tables, more than one ‘or’ condition can be inserted; in decision trees, this is not the case. Decision tables are only useful when only a few properties are present; decision trees, on the other hand, can be used effectively with a large number of properties and sophisticated logic.
