Decision Tree Classification: Everything You Need to Know

Last updated:
29th May, 2020

Introduction

Many analogies carry over from nature into our real lives, and trees are among the most influential of them. Tree-based methods have had a considerable impact on machine learning, where they cover both classification and regression. When analyzing a decision, a decision tree classifier can be used to represent the decision-making process.

A decision tree is a supervised machine learning method that processes data by repeatedly splitting it according to a particular parameter.

What are decision trees made of?

Decision trees are made of three essential parts, each of which has an analogy in a real-life tree:

  1. Nodes: A node is where a test happens. The value of a specific attribute is checked against a condition to decide which branch to follow.
  2. Edges/Branches: Each edge corresponds to one outcome of a test. Edges also link a node to the next node or leaf.
  3. Leaf Nodes: These are the terminal nodes of the tree. A leaf node holds the predicted outcome.
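
The three parts listed above can be sketched as a small data structure. This is only an illustration: the class names `Node` and `Leaf`, the `classify` helper, and the toy "fit/unfit" tree are assumptions made for this example and do not come from any particular library.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    prediction: str  # outcome returned when a sample reaches this leaf


@dataclass
class Node:
    attribute: str                            # attribute tested at this node
    branches: Dict[str, Union["Node", Leaf]]  # test outcome (edge) -> child


def classify(tree, sample):
    """Follow the edge matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[sample[tree.attribute]]
    return tree.prediction


# Hypothetical toy tree, invented for illustration.
tree = Node("exercises", {
    "yes": Leaf("fit"),
    "no": Node("eats_junk_food", {"yes": Leaf("unfit"), "no": Leaf("fit")}),
})

print(classify(tree, {"exercises": "no", "eats_junk_food": "yes"}))  # unfit
```

Each dictionary key plays the role of an edge, so adding a new test outcome only means adding a key to `branches`.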

Decision tree classification

Decision trees can be broadly classified into two categories: classification trees and regression trees.

1. Classification trees

Classification trees are decision trees that ask a series of “Yes” or “No” questions and use the answers to reach a decision. For example, a tree that determines whether a person is fit or unfit by asking a set of related questions is a classification tree.

These trees are usually constructed with a process called binary recursive partitioning. It splits the data into partitions, and then splits each partition again along every branch of the decision tree classifier.
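
As a rough sketch of one step of binary recursive partitioning, the snippet below splits a list of records in two with a yes/no question and then splits one of the resulting partitions again. The `partition` function, the attribute names, and the records are all hypothetical and chosen only for illustration.

```python
def partition(rows, attribute, threshold):
    """Answer the yes/no question 'row[attribute] >= threshold?' for every row."""
    yes = [row for row in rows if row[attribute] >= threshold]
    no = [row for row in rows if row[attribute] < threshold]
    return yes, no


# Hypothetical records, invented for illustration only.
people = [
    {"age": 25, "weekly_workouts": 4},
    {"age": 32, "weekly_workouts": 0},
    {"age": 47, "weekly_workouts": 3},
    {"age": 58, "weekly_workouts": 1},
]

# First binary split, then a second split applied to one partition only.
active, inactive = partition(people, "weekly_workouts", 3)
older_active, younger_active = partition(active, "age", 40)

print(len(active), len(inactive), len(older_active), len(younger_active))
```

A full tree builder would keep applying the same function to every partition until a stopping rule is met.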

2. Regression Trees

A regression tree differs from a classification tree in one respect: the kind of value it predicts. Classification trees handle discrete data, such as class labels, while regression trees handle continuous data. Good examples of regression tree problems are predicting a house price or how long a patient will typically stay in the hospital.
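
One way to picture the difference is to look at what a leaf returns. The sketch below assumes a common convention: a classification leaf predicts the majority class of the training records that reach it, and a regression leaf predicts their mean. The function names and numbers are made up for this example.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most frequent (discrete) class label in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average of the (continuous) target values in the leaf."""
    return mean(values)


# Hypothetical training targets that ended up in one leaf.
print(classification_leaf(["fit", "unfit", "fit"]))
print(regression_leaf([200_000, 250_000, 300_000]))  # house prices
```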

How are the decision trees created?

Decision trees are created from the dataset that the model is trained on (decision trees are a part of supervised machine learning, so this data is labeled). The training dataset is repeatedly split into smaller subsets, and an associated tree is built incrementally as the data is broken down. Once learning has finished, the completed tree is returned to the user.

The central idea behind a decision tree is to separate the data into two kinds of regions: densely populated regions (clusters) and empty or sparse regions.

Decision tree classification works on the elementary principle of divide and conquer: any new example fed into the tree goes through a series of tests and is then assigned a class label. The divide and conquer algorithm is discussed in detail below:

Divide and conquer

The decision tree classifier is built with a heuristic known as recursive partitioning, also called the divide and conquer algorithm. It breaks the data down into smaller and smaller subsets, and it stops when the data within each subset is homogeneous or when another stopping criterion defined by the user is met.

How does the decision tree classifier work?

  1. The divide and conquer algorithm is used to create a decision tree classifier. We always begin at the root of the tree and split the dataset so as to reduce the uncertainty in the final decision.
  2. The process is repeated at every node, and it continues until the nodes reach the purity we desire.
  3. Generally, to avoid overfitting, we set a limit on the purity to be achieved. This means the final result might not be 100% pure.
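
The article does not name a specific purity measure, so the sketch below assumes Gini impurity, one commonly used choice (0 means a perfectly pure node). The `is_pure_enough` helper and its `max_impurity` parameter are hypothetical names used to illustrate a purity limit.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher for more mixed nodes."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def is_pure_enough(labels, max_impurity=0.4):
    """Hypothetical stopping rule: stop splitting below an impurity limit."""
    return gini(labels) <= max_impurity


# Toy labels, invented for illustration.
parent = ["fit"] * 4 + ["unfit"] * 4
left = ["fit"] * 3 + ["unfit"]
right = ["fit"] + ["unfit"] * 3

# Weighted impurity of the children after the split.
after = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)

print(gini(parent), after)    # the split lowers the impurity
print(is_pure_enough(left))   # the limit is met without 100% purity
```

Here the split reduces the impurity from 0.5 to 0.375, and the left child is accepted as a leaf even though it still contains one "unfit" record.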

Basics of the divide and conquer algorithm:

  1. First, select a test for the root node. Then create the branches, one for each possible outcome of the test that has been defined.
  2. Next, split the instances of data into smaller subsets. Each branch gets its own subset, which is connected to the node.
  3. Repeat this process for each branch, using only the instances that reach the branch in question.
  4. Stop the recursion when all the instances at a branch belong to the same class.
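
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not a production algorithm: attributes are categorical, the test for each node is chosen by lowest weighted Gini impurity (the article does not prescribe a selection rule), and the tree is returned as nested dictionaries. The function names and the dataset are invented for the example.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split(rows, attribute):
    """Group rows by their value for `attribute` (one branch per outcome)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups


def build(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Step 4: stop when all instances share one class (or no tests remain).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 1: choose the test whose branches are purest on average.
    def weighted_gini(attribute):
        return sum(len(group) / len(rows) * gini([r[target] for r in group])
                   for group in split(rows, attribute).values())

    best = min(attributes, key=weighted_gini)
    remaining = [a for a in attributes if a != best]
    # Steps 2 and 3: one subset per branch, then recurse on each subset.
    return {best: {value: build(subset, remaining, target)
                   for value, subset in split(rows, best).items()}}


# Hypothetical training data, invented for illustration.
rows = [
    {"exercises": "yes", "eats_junk_food": "yes", "label": "fit"},
    {"exercises": "yes", "eats_junk_food": "no", "label": "fit"},
    {"exercises": "no", "eats_junk_food": "yes", "label": "unfit"},
    {"exercises": "no", "eats_junk_food": "no", "label": "fit"},
    {"exercises": "yes", "eats_junk_food": "yes", "label": "fit"},
]

tree = build(rows, ["exercises", "eats_junk_food"], "label")
print(tree)
```

On this toy data the root tests `exercises`, and only the "no" branch needs a second test, because the "yes" branch is already pure.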

Advantages of using decision tree classification

  1. It is inexpensive to construct.
  2. It classifies new or unknown records quickly.
  3. It is easy to interpret, especially if the tree is small.
  4. The prediction accuracy of a decision tree classifier is comparable to that of other prediction or classification methods.
  5. It can exclude unimportant features. This elimination of irrelevant features happens automatically.

Disadvantages of using the decision tree classifier

  1. It is very easy to overfit the dataset.
  2. The decision boundary is restricted: it can only be parallel to the axes of the attributes.
  3. Models based on decision trees are often biased toward splits on attributes that have a large number of levels.
  4. Small changes in the dataset can have a significant impact on the logic that governs the decision.
  5. Larger trees are challenging to understand because they can feel very counter-intuitive.

Conclusion

Decision trees come in handy when we are faced with problems that cannot be handled with linear solutions. Tree-based models can map non-linear relationships in the inputs and so handle problems that linear models cannot. Sophisticated methods like random forests and gradient boosting are built on decision trees.

Decision trees are a potent tool that can be used in many areas of real life, such as biomedical engineering, astronomy, system control, medicine, and physics. This makes decision tree classification a critical and indispensable tool of machine learning.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
Frequently Asked Questions (FAQs)

1. Are Decision Trees inclined to overfit?

Decision trees break complex data down into simpler pieces. Left unchecked, a decision tree keeps dividing the data until it cannot be divided further. A very large tree with numerous splits fits the training data closely, but it can cause problems on test data. This excessive splitting leads to overfitting: the tree grows tremendously, its predictive ability on new data is compromised, and it becomes unreliable. Pruning is a technique used to deal with overfitting, where the excess branches are removed.

2. Do Decision Trees need normalisation?

Decision trees are among the most commonly used machine learning algorithms for classification and regression. This supervised method splits the data into groups, subset by subset, until it reaches leaf nodes where the data is not divided further. Each split depends on how the values of an attribute are ordered relative to a threshold, not on their scale. As a result, data that went through normalisation and data that didn’t are split in the same way. Therefore, normalisation is not a prerequisite for decision tree models.
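
A small experiment can make this point concrete. The sketch below assumes a Gini-based threshold search, which is one common way to pick a split; `best_split` and the sample data are hypothetical. It finds the best split on raw values and on min-max normalised values and compares which records go left.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, left_mask) with the lowest weighted Gini impurity."""
    best = None
    unique = sorted(set(values))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    for low, high in zip(unique, unique[1:]):
        threshold = (low + high) / 2
        mask = [value <= threshold for value in values]
        left = [label for label, m in zip(labels, mask) if m]
        right = [label for label, m in zip(labels, mask) if not m]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, mask)
    return best[1], best[2]


# Hypothetical data, invented for illustration.
ages = [22, 25, 31, 47, 52, 60]
labels = ["fit", "fit", "fit", "unfit", "unfit", "unfit"]

lowest, highest = min(ages), max(ages)
scaled = [(age - lowest) / (highest - lowest) for age in ages]  # min-max

raw_threshold, raw_mask = best_split(ages, labels)
scaled_threshold, scaled_mask = best_split(scaled, labels)

print(raw_mask == scaled_mask)  # same records on each side of the split
```

The threshold itself changes with the scale, but the grouping of records does not, because min-max scaling preserves the order of the values.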

3. How to split Decision Trees?

Decision trees are a reliable mechanism for classifying data and making predictions. Splitting in a decision tree requires care, because a poor split can compromise the tree’s integrity. Splitting is done using recursive partitioning. It starts by dividing the data into subsets based on the values of an attribute. The data is then split recursively until further splitting no longer adds value to the predictions, or until all records in a subset share the same value of the target variable. Splitting has to be methodical and repeated for good accuracy.
