Pros and Cons of Decision Tree Regression in Machine Learning

Last updated: 24th Dec, 2020

Decision tree regression is one of the most popular machine learning algorithms, used both by participants in data science competitions and by data science professionals. Decision trees are predictive models that calculate a target value based on a set of binary rules.

Decision trees can be used to build both regression and classification models in the form of a tree structure. The algorithm breaks the dataset down into smaller and smaller subsets while the corresponding tree is built incrementally.

A decision tree reaches an estimate by asking a series of questions about the data. By asking these true/false questions, the model is able to narrow down the possible values and make a prediction. The order and content of the questions are decided by the model itself.

What are the Decision Tree Terms?

A decision tree has branches, nodes, and leaves. The root node is the initial node; it represents the entire sample or population and gets divided further into other nodes, or more homogeneous sets. A decision node is a node that splits into two or more sub-nodes, each representing a separate value or range of the attribute tested.

A leaf/terminal node does not split into further nodes, and it represents a decision (in regression, a predicted value). A branch or sub-tree is a subsection of an entire tree. Splitting is the process of dividing a node into two or more sub-nodes. The opposite of splitting is called pruning, i.e., the removal of sub-nodes of a decision node. A parent node is a node that gets divided into sub-nodes, and each sub-node is a child node.
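To make these terms concrete, here is a small sketch that fits a tiny regression tree and counts its nodes. It assumes scikit-learn's `DecisionTreeRegressor`, which the article itself does not name; the toy data is made up purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative only): one feature, a target with three flat levels.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = np.where(X.ravel() < 5, 1.0, np.where(X.ravel() < 12, 5.0, 9.0))

model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
tree = model.tree_

is_leaf = tree.children_left == -1   # leaf/terminal nodes have no children
n_nodes = tree.node_count            # root + other decision nodes + leaves
n_leaves = model.get_n_leaves()
n_decision = n_nodes - n_leaves      # nodes that split (the root is one of them)

print(f"nodes={n_nodes}, decision nodes={n_decision}, "
      f"leaves={n_leaves}, depth={model.get_depth()}")
for node_id in range(n_nodes):
    kind = "leaf" if is_leaf[node_id] else "decision"
    print(node_id, kind, "samples:", tree.n_node_samples[node_id])
```

Node 0 is the root, so its sample count equals the size of the whole training set; every other node holds a subset of it.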

How Does it Work?

To make a prediction, the decision tree algorithm takes a data point and passes it down the tree by asking true/false questions. Starting from the root node, the answer to each question selects the branch to follow, and this continues until a leaf node is reached. The tree itself is constructed by recursive partitioning.
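The traversal described above can be sketched in a few lines of plain Python. The tree below is hand-written rather than learned, and its feature names, thresholds, and leaf values are hypothetical, chosen only to show how a data point moves from the root to a leaf.

```python
# Hypothetical hand-built regression tree (all names and numbers are illustrative).
tree = {
    "feature": "size_sqm", "threshold": 100,          # root node question
    "true": {"value": 150_000.0},                     # leaf: size_sqm <= 100
    "false": {
        "feature": "age_years", "threshold": 20,      # decision node
        "true": {"value": 320_000.0},                 # leaf: age_years <= 20
        "false": {"value": 250_000.0},                # leaf: age_years > 20
    },
}

def predict(node, sample):
    """Follow true/false answers from the root until a leaf is reached."""
    while "value" not in node:
        answer = sample[node["feature"]] <= node["threshold"]
        node = node["true"] if answer else node["false"]
    return node["value"]

small_house = predict(tree, {"size_sqm": 80, "age_years": 5})
old_large_house = predict(tree, {"size_sqm": 140, "age_years": 30})
print(small_house, old_large_house)
```

Only the nodes along one path are visited, so the cost of a prediction grows with the depth of the tree, not with its total number of nodes.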

A decision tree is a supervised machine learning model, and therefore, it learns to map data to the outputs in the training phase of the model building. This is done by fitting the model with historical data that needs to be relevant to the problem, along with its true value that the model should learn to predict accurately. This helps the model learn the relationships between the data and the target variable.

After this phase, the decision tree has worked out which questions to ask, and in what order, to make the most accurate estimate, and it can apply the same tree to new data. Thus, the prediction depends on the training data that is fed into the model.

How is the Splitting Decided?

The splitting criterion differs between classification and regression trees, and the accuracy of the tree’s predictions is highly dependent on it. In decision tree regression, mean squared error (MSE) is usually used to decide how to split a node into two or more sub-nodes. In the case of a binary tree, the algorithm tries candidate threshold values; for each one it splits the data into two subsets, calculates the MSE of each subset, and combines the two, weighted by subset size. It then chooses the split with the smallest resulting MSE.
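As a rough sketch of this search for a single feature, the function below tries the midpoint between each pair of neighbouring values and keeps the threshold with the lowest size-weighted MSE. The helper names (`mse`, `best_split`) and the toy numbers are illustrative assumptions, and production libraries use faster, more careful implementations.

```python
import numpy as np

def mse(y):
    """Mean squared error of a subset around its own mean."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def best_split(x, y):
    """Return (threshold, weighted MSE) of the best binary split on one feature."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                                  # no threshold between equal values
        threshold = (x[i] + x[i - 1]) / 2             # midpoint candidate
        left, right = y[:i], y[i:]
        score = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy data: two clearly separated groups of target values.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])
threshold, score = best_split(x, y)
print(threshold, score)
```

Applying the same search recursively to each resulting subset is what builds the full tree.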

Implementing Decision Tree Regression

The basic structure to implement a decision tree regression algorithm is provided in the following steps.

Importing libraries

The first step to developing any machine learning model is to import all the needed libraries for the development.

Loading the data

After importing libraries, the next step is to load the dataset. The data can be downloaded or loaded from the user’s local folders.

Splitting the dataset

Once the data is loaded, the x and y variables are created and the data is split into a training set and a test set. The values may also need to be reshaped into the format that the library requires.
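A minimal sketch of these first steps, assuming NumPy and scikit-learn (the article does not name a library). The data is generated synthetically here in place of a real dataset, so the sine-shaped target and the noise level are assumptions made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a loaded dataset: one feature x, one target y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)                  # shape (100,)
y = np.sin(x) + rng.normal(0, 0.1, size=100)

# scikit-learn expects features as a 2-D array of shape (n_samples, n_features),
# so a single feature has to be reshaped into one column.
X = x.reshape(-1, 1)                              # shape (100, 1)

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```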

Training the model

Here the decision tree regression model is trained by using the training set created in the previous step.

Predicting the results

Here the results of the test set are predicted by using the model trained on the training set.
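The training and prediction steps might look like the sketch below. It assumes scikit-learn's `DecisionTreeRegressor` and uses synthetic data; `max_depth=4` is an arbitrary choice for the example, not a recommendation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only so the example runs on its own.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train the tree on the training set only.
regressor = DecisionTreeRegressor(max_depth=4, random_state=0)
regressor.fit(X_train, y_train)

# Predict the target for the unseen test rows.
y_pred = regressor.predict(X_test)
print(y_pred[:5])
```

Each prediction is the mean target value of the training rows that ended up in the same leaf, so the model can only output as many distinct values as it has leaves.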

Model evaluation

In the final step, the model’s performance is checked by comparing the real values with the predicted values. For a regression model, this comparison is usually summarized with an error metric such as mean squared error. Visualizing the results by plotting the real and predicted values on a graph also helps in gauging how well the model fits.
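As a small illustration of the comparison, the sketch below scores a handful of made-up real and predicted values with scikit-learn's `mean_squared_error` and `r2_score`. The numbers are hypothetical, and the choice of these two metrics is an assumption, since the article does not prescribe one.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical real and predicted values for four test rows.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 10.0])

mse = mean_squared_error(y_true, y_pred)   # average squared difference
rmse = float(np.sqrt(mse))                 # same units as the target
r2 = r2_score(y_true, y_pred)              # 1.0 would be a perfect fit

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```

Lower MSE and RMSE mean predictions are closer to the real values, while an R² near 1 means the model explains most of the variation in the target.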

Advantages

  • The decision tree model can be used for both classification and regression problems, and it is easy to interpret, understand, and visualize.
  • The output of a decision tree can also be easily understood, because each prediction follows a single path of rules from the root to a leaf.
  • Compared with other algorithms, a decision tree requires less data preparation during pre-processing and does not require normalization of data.
  • The implementation can also be done without scaling the data, because splits depend only on the ordering of feature values.
  • A decision tree is one of the quickest ways to identify relationships between variables and to find the most significant variable.
  • The splits a tree learns can also suggest new features that predict the target variable better.
  • Decision trees are not heavily influenced by outliers in the input features, and some implementations handle missing values directly. They can handle both numerical and categorical variables.
  • Since it is a non-parametric method, it makes no assumptions about the distribution of the data or the structure of the model.
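The point about scaling can be checked with a short experiment. This is a sketch assuming scikit-learn and synthetic data: one feature is multiplied by 1024 (an arbitrary factor), and the predictions of the tree trained on the rescaled data are compared with those of the tree trained on the raw data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with two features (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = X[:, 0] ** 2 + 3 * X[:, 1] + rng.normal(0, 1.0, size=200)

# Put the second feature on a very different scale.
X_scaled = X.copy()
X_scaled[:, 1] *= 1024.0

tree_raw = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_scaled, y)

pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)
print("same predictions:", np.allclose(pred_raw, pred_scaled))
```

Rescaling moves the split thresholds but not the groups of rows they separate, which is why no normalization step is needed before training.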

Disadvantages

  • Overfitting is one of the practical difficulties for decision tree models. It happens when the learning algorithm keeps adding splits that reduce the training set error at the cost of increasing the test set error. This issue can be mitigated by pruning and by setting constraints on the model parameters, such as a maximum depth.
  • Decision trees handle continuous numerical variables less gracefully. A regression tree predicts a constant value in each leaf, so its output is a step function that approximates smooth relationships coarsely and cannot extrapolate beyond the range of the training targets.
  • A small change in the data tends to cause a big difference in the tree structure, which causes instability.
  • Training can be more computationally expensive than for simpler algorithms, because candidate splits have to be evaluated at every node, so it takes longer and costs more as the dataset grows.
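The overfitting issue, and the effect of constraining the model, can be seen in a short experiment. This sketch assumes scikit-learn and noisy synthetic data; the constraint values (`max_depth=3`, `min_samples_leaf=5`) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# One tree grown without limits, one with constraints on its parameters.
models = {
    "unconstrained": DecisionTreeRegressor(random_state=0),
    "constrained": DecisionTreeRegressor(
        max_depth=3, min_samples_leaf=5, random_state=0
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "leaves": model.get_n_leaves(),
        "train_mse": mean_squared_error(y_train, model.predict(X_train)),
        "test_mse": mean_squared_error(y_test, model.predict(X_test)),
    }
    print(name, results[name])
```

The unconstrained tree typically drives its training error close to zero by memorizing the noise, so the gap between its training and test error is the thing to watch. Cost-complexity pruning, available in scikit-learn through the `ccp_alpha` parameter, is another way to limit tree size.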

Conclusion

The decision tree regression algorithm was explained through this article by describing how the tree gets constructed along with brief definitions of various terms regarding it. A brief description of how the decision tree works and how the decision about splitting any node is taken is also included.

How a basic decision tree regression can be implemented was also explained through a sequence of steps. Lastly, the advantages and disadvantages of a decision tree algorithm were provided. 

If you’re interested in learning more about decision trees and machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What is regression in machine learning?

Regression is used to predict continuous variables, that is, when we have to predict a number. For example, if you want to predict the prices of houses in a city based on features like the size of the house and the area of the city, regression would be used. Linear regression is one of the simplest ways to solve such problems. In a nutshell, regression is the act of estimating an unknown output value based on input values.

2. What are decision trees?

A decision tree is a diagram that shows all possible decisions and the possible outcomes. Decision trees are often used to examine how decisions influence future outcomes. For example, a decision tree can help a company analyze whether it should buy additional warehouses or build a new distribution center. In general, decision trees are used in operations research and management science. Decision trees are a common and popular concept in decision making and program planning. They can be used in choosing between courses of action when some of the possible courses are mutually exclusive, and when the outcome of each course of action depends on the state of the world.

3. What are the advantages and disadvantages of decision trees?

A decision tree model can be used for both classification and numeric prediction (regression), and it can handle a mixture of numeric and categorical features. Its predictions are easy to trace, because each one follows a single path of rules from the root to a leaf. However, the results are not stable: a small change in the training data can produce a very different tree. The tree also tends to overfit unless it is pruned or constrained, and training can become computationally intensive on large amounts of data.
