
Decision Tree Example: Function & Implementation [Step-by-step]

Last updated: 28th Dec, 2020
Read Time: 9 Mins

Introduction

Decision Trees are one of the most powerful and popular algorithms for both regression and classification tasks. They have a flowchart-like structure and fall under the category of supervised algorithms. Because decision trees can be visualized like a flowchart, they closely mimic human decision-making, which is why they are so easily understood and interpreted.


What is a Decision Tree?

Decision Trees are tree-structured classifiers. They have three types of nodes:

  • Root Nodes
  • Internal Nodes
  • Leaf Nodes


The root node is the primary node that represents the entire sample, which is further split into several other nodes. The internal nodes represent a test on an attribute, while the branches represent the outcomes of that test. Finally, the leaf nodes denote the class labels, i.e. the decision reached after evaluating all the attributes. Learn more about decision tree learning.


How do Decision Trees work?

Decision trees classify a data point by sorting it down the tree structure from the root node to a leaf node, which is why this approach is called a top-down approach. Once a particular data point is fed into the decision tree, it passes through the nodes of the tree, answering a yes/no question at each one, until it reaches its designated leaf node.

Each node in the decision tree represents a test case for an attribute, and each descent (branch) to a new node corresponds to one of the possible answers to that test case. In this way, through a sequence of such tests, the decision tree predicts a value in a regression task or assigns a class in a classification task.
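As a purely illustrative sketch (the thresholds below are chosen by hand, not learned from any data, and the classes come from the Iris example that follows), the prediction path of a small decision tree can be written as a chain of yes/no tests:

    def classify_iris(petal_length, petal_width):
        # Root node test (illustrative, hand-picked threshold)
        if petal_length < 2.5:
            return 'Iris Setosa'        # leaf node
        # Internal node test (also illustrative)
        elif petal_width < 1.8:
            return 'Iris Versicolor'    # leaf node
        else:
            return 'Iris Virginica'     # leaf node

    print(classify_iris(1.4, 0.2))      # takes the first branch -> 'Iris Setosa'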

Decision Tree Implementation

Now that we have covered the basics of a decision tree, let us walk through its implementation in Python.

Problem Analysis 

In the following example we are going to use the famous "Iris Flower" dataset. Originally published in 1936 and now hosted at the UCI Machine Learning Repository (Link: https://archive.ics.uci.edu/ml/datasets/Iris), this small dataset is widely used for testing machine learning algorithms and visualizations.

The dataset contains a total of 150 rows and 5 columns, of which 4 columns are the attributes, or features, and the last column is the type of Iris flower species. Iris is a genus of flowering plants in botany. The four attributes, measured in cm, are:

  • Sepal Length
  • Sepal Width
  • Petal Length 
  • Petal Width

These four features are used to define and classify the type of Iris flower depending upon its size and shape. The 5th and last column contains the Iris flower class, which takes one of three values: Iris Setosa, Iris Versicolor and Iris Virginica.

For our problem, we have to build a machine learning model using the Decision Tree algorithm to learn the features and classify each sample into the correct Iris flower class.

Let us go through the implementation in Python, step by step:

Step 1: Importing the libraries

The first step in building any machine learning model in Python is to import the necessary libraries, such as NumPy, Pandas and Matplotlib. The tree module is imported from the sklearn library to visualize the Decision Tree model at the end.
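A minimal sketch of what these imports might look like (the aliases are the conventional ones; the original code is not reproduced here):

    import numpy as np                  # numerical operations
    import pandas as pd                 # dataframe handling
    import matplotlib.pyplot as plt     # plotting
    from sklearn import tree            # used later to visualize the trained model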

Step 2: Importing the dataset

Once we have imported the Iris dataset, we store the .csv file in a Pandas DataFrame, from which we can easily access the columns and rows of the table. The first four columns of the dataframe are the independent variables, or features, which the decision tree classifier has to learn, and they are stored in the variable X.

The dependent variable, which is the Iris flower class consisting of 3 species, is stored in the variable y. The dataset is previewed by printing the first 5 rows.
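A sketch of this step, assuming the dataset is saved locally as iris.csv with the four feature columns first and the species column last (the file name is an assumption; the same data can also be loaded via sklearn.datasets.load_iris()):

    dataset = pd.read_csv('iris.csv')     # assumed file name
    X = dataset.iloc[:, :4].values        # four feature columns -> independent variables
    y = dataset.iloc[:, -1].values        # species column -> dependent variable
    print(dataset.head())                 # preview the first 5 rows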


Step 3: Splitting the dataset into the Training set and Test set

In the following step, after reading the dataset, we split the entire dataset into a training set, on which the classifier model will be trained, and a test set, on which the trained model will be evaluated. The predictions obtained on the test set are compared against the true labels to check the accuracy of the trained model.

Here, we have used a test size of 0.25, which means that 25% of the dataset is randomly split off as the test set and the remaining 75% forms the training set used to train the model. Hence, out of 150 data points, 38 random data points are retained as the test set and the remaining 112 samples are used as the training set.
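A sketch of the split described above; the random_state value is an assumption added only so that the 112/38 split is reproducible:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)   # 75% train, 25% test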

Step 4: Training the Decision Tree Classification model on the Training Set

Once the dataset has been split and is ready for training, the DecisionTreeClassifier module is imported from the sklearn library and the training variables (X_train and y_train) are fitted to the classifier to build the model. During this training process, the classifier recursively splits the training data on the attribute values that best separate the classes, using an impurity measure such as the Gini index, and thereby builds the Decision Tree Classifier model.
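A sketch of the training step; the criterion and random_state arguments are assumptions, with criterion='gini' chosen to match the Gini index shown in the tree plot later:

    from sklearn.tree import DecisionTreeClassifier

    classifier = DecisionTreeClassifier(criterion='gini', random_state=0)
    classifier.fit(X_train, y_train)     # grow the tree from the training data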

Step 5: Predicting the Test Set Results

As we have our model ready, shouldn't we check its accuracy on the test set? This step involves testing the model built using the decision tree algorithm on the test set that was split off earlier. The results are stored in a variable, "y_pred".
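A sketch of this step:

    y_pred = classifier.predict(X_test)   # predicted classes for the 38 test samples
    print(y_pred)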

Step 6: Comparing the Real Values with Predicted Values

This is another simple step, in which we build a small dataframe with two columns: the real values of the test set on one side and the predicted values on the other. This lets us compare, at a glance, the results obtained by the model we built.
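A sketch of the comparison (the column names are illustrative):

    comparison = pd.DataFrame({'Real Values': y_test, 'Predicted Values': y_pred})
    print(comparison)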

Step 7: Confusion Matrix and Accuracy

Now that we have both the real and predicted values for the test set, let us build a confusion matrix and calculate the accuracy of our model using the standard library functions within sklearn. The accuracy score is calculated by passing in both the real and predicted values of the test set. The model built using the above steps gives an accuracy of about 92.1%, reported as 0.92105.

The confusion matrix is a table used to show the correct and incorrect predictions of a classification model. Read simply, the values along the diagonal represent correct predictions and the values off the diagonal are incorrect predictions.
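A sketch of this step using sklearn's metrics functions:

    from sklearn.metrics import confusion_matrix, accuracy_score

    cm = confusion_matrix(y_test, y_pred)
    accuracy = accuracy_score(y_test, y_pred)
    print(cm)          # correct predictions lie on the diagonal
    print(accuracy)    # the article reports roughly 0.921 for its split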


Counting over the 38 test set data points, we get 35 correct predictions and 3 incorrect predictions, which corresponds to roughly 92% accuracy. The accuracy can be improved by tuning the hyperparameters, which can be passed as arguments to the classifier before training the model.

Step 8: Visualizing the Decision Tree Classifier

Finally, in the last step we visualize the Decision Tree that was built. Looking at the root node, the number of "samples" is 112, which matches the size of the training set from the earlier split. The Gini index is calculated at each split made by the decision tree algorithm, and the distribution of the 3 classes at each node is shown in the "value" field of the plot.
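A sketch of the visualization using the tree module imported in Step 1; the feature and class names are passed in explicitly here to mirror the dataset description, rather than being taken from the original code:

    plt.figure(figsize=(12, 8))
    tree.plot_tree(classifier,
                   feature_names=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'],
                   class_names=['Iris Setosa', 'Iris Versicolor', 'Iris Virginica'],
                   filled=True)
    plt.show()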


Conclusion

In this way, we have understood the concept of the Decision Tree algorithm and built a simple classifier to solve a classification problem with it.

If you're interested in learning more about decision trees and machine learning, check out IIIT-B & upGrad's PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What are the cons of using decision trees?

While decision trees help in classifying or sorting data, their use sometimes creates a few problems too. Decision trees often lead to overfitting of the data, which makes the final results inaccurate. For large datasets, a single decision tree is not recommended because it becomes too complex. Decision trees are also highly unstable: a small change in the dataset can greatly change the structure of the tree.

2. How does a random forest algorithm work?

A random forest is essentially a collection of diverse decision trees, just as a forest is made up of many trees. The random forest algorithm's outcome depends on the predictions of its decision trees. The random forest technique also reduces the likelihood of overfitting the data. To get the required outcome, random forest classification employs an ensemble approach: various decision trees are trained on the training data, and when nodes are split, the observations and attributes considered are picked at random.

3. How is a decision table different from a decision tree?

A decision table can be produced from a decision tree, but not the other way around. A decision tree is made up of nodes and branches, whereas a decision table is made up of rows and columns. In decision tables, more than one 'or' condition can be inserted; this is not the case in decision trees. Decision tables are useful only when a few properties are involved; decision trees, on the other hand, can be used effectively with a large number of properties and sophisticated logic.
