Understanding Decision Tree Classification: Implementation in Python
Updated on Jul 23, 2025 | 12 min read | 7.17K+ views
Did you know? Hyperparameter tuning can significantly enhance the predictive power of decision tree algorithms, such as CART, particularly when applied to diverse datasets. This simple technique can transform a good model into an exceptional one, making decision trees even more effective for real-world applications!
Decision tree classification is a powerful machine learning technique that models decisions and predicts outcomes based on feature values. In Python, it can be easily implemented using Scikit-learn, providing an effective solution for tasks in finance, healthcare, and marketing.
Decision trees also enhance deep learning models in ensemble methods, combining interpretability with the accuracy of complex models.
This blog will guide you through implementing decision trees in Python, covering key concepts, optimization techniques, and visualization to solve real-world problems effectively.
Master decision trees and boost your ML skills with upGrad's AI and ML courses. Gain practical skills with mentorship from top 1% global universities and experience an average 51% salary hike upon completion.
A Decision Tree is a supervised machine learning algorithm that, in classification tasks, repeatedly partitions the data through feature comparisons until a class label can be predicted.
Its strength lies in interpretability, as its structure mirrors human decision-making, allowing stakeholders to trace decision paths.
Decision trees are crucial in healthcare for disease classification and in marketing for predicting customer behavior. They are also utilized in credit scoring, fraud detection, and personalized recommendations, underscoring their versatility across various industries.
Learn more concepts, such as decision trees, and advance your career in machine learning. Explore upGrad's programs to gain hands-on experience and solve real-world AI challenges.
A decision tree's structure consists of several key components that work in tandem to drive the classification process. Here's an overview of each:
Component | Description
Root Node | The starting point of the decision tree, representing the entire dataset before any splits. It is the node where the data is first divided on the feature that best partitions the dataset.
Branches | Lines that connect nodes, representing the flow of decisions. Each branch reflects the outcome of a decision at the parent node and directs the data to the next decision point.
Internal Nodes | Points within the tree where decisions are made based on feature values. Each internal node tests a specific feature and splits the data according to the chosen criterion (e.g., Information Gain, Gini Index).
Leaf Nodes | The end points of the tree, representing the final classifications or predictions. Each leaf node corresponds to a class label assigned after all the splits; the path from root to leaf reveals the full decision process.
In domains such as medical diagnosis, a decision tree could begin by testing whether a patient has a fever (root node), followed by tests for additional symptoms (internal nodes), ultimately leading to a conclusion like "flu" or "common cold" (leaf nodes).
The ability to trace these decisions helps not only in explaining the results but also in fine-tuning the model for higher accuracy.
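To see these components in code, here is a minimal sketch on a made-up symptom dataset (the feature values and labels are purely illustrative); export_text prints the root split first, then each internal split, and ends in leaf nodes with the predicted class:

# A minimal sketch: fit a tiny tree on invented symptom data and print its structure.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [fever, cough] encoded as 0/1; labels: "cold" vs "flu" (toy values)
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = ["flu", "flu", "cold", "cold", "flu", "cold"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# The printed text shows the root node at the top, internal splits below it,
# and leaf nodes with the final class labels.
print(export_text(clf, feature_names=["fever", "cough"]))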
Also Read: 4 Types of Trees in Data Structures Explained [2025]
Lead the future of AI-driven decision-making with the Executive Programme in Generative AI for Leaders. Learn to apply decision trees and AI techniques to optimize business strategies and enhance leadership in tech-driven industries.
Now that we’ve covered the concepts of decision trees, let’s explore how the algorithm works in classification and the key components behind its decision-making.
A decision tree algorithm splits the dataset into subsets based on criteria to make predictions. Classification Trees are used for categorical data, such as predicting whether an email is spam or not, based on features like specific words or sender domain.
In contrast, Regression Trees predict continuous outcomes, such as house prices, by splitting data based on features. The key difference is that classification trees predict discrete classes, while regression trees predict continuous values.
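As a small, hedged illustration of that difference, the snippet below fits a classification tree and a regression tree on invented toy data (the feature values, labels, and prices are made up for illustration):

# Classification: predict a discrete class (spam = 1, not spam = 0)
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X_cls = [[3, 1], [0, 0], [5, 1], [1, 0]]   # e.g. [suspicious_words, unknown_sender]
y_cls = [1, 0, 1, 0]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[4, 1]]))               # -> a class label, e.g. [1]

# Regression: predict a continuous value (house price in thousands)
X_reg = [[1200], [1500], [2000], [2500]]   # square footage
y_reg = [200, 250, 330, 410]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[1800]]))               # -> a continuous estimate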
Now, let's understand the core idea behind decision tree splitting.
The decision tree algorithm recursively splits the dataset based on the feature that best partitions the data, aiming to maximize the "purity" of the resulting subsets. This continues until a stopping criterion is met, such as when all data points in a node belong to the same class or the tree reaches a maximum depth.
Decision trees use splitting criteria to measure "impurity" or disorder, guiding the algorithm's feature selection for each split.
Also Read: Introduction to Classification Algorithm: Concepts & Various Types
The process of selecting the most relevant feature for splitting the data is crucial for the performance of the decision tree. Several attribute selection measures are commonly used in decision trees to evaluate which features provide the best splits:
1. Information Gain
Information Gain is based on entropy, a measure of uncertainty in the class labels. It selects the feature that reduces entropy the most, i.e., the one that yields the greatest gain in information. The formula for Information Gain is:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} ( |S_v| / |S| ) × Entropy(S_v),  where  Entropy(S) = − Σ_i p_i log₂(p_i)

Where:
- S is the set of samples at the current node and A is the candidate feature.
- Values(A) are the possible values of A, and S_v is the subset of S for which A takes the value v.
- p_i is the proportion of samples in S belonging to class i.

At each split, the algorithm evaluates this quantity for every candidate feature and chooses the one with the highest Information Gain, i.e., the feature that best reduces uncertainty about the target class.
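To make the calculation concrete, here is a minimal NumPy sketch of entropy and Information Gain for a single binary split; the sample counts are invented for illustration:

import numpy as np

def entropy(labels):
    # Shannon entropy of a label array: -sum(p_i * log2(p_i))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    # Entropy of the parent minus the weighted entropy of the child subsets
    n = len(parent_labels)
    weighted_child = sum(len(c) / n * entropy(c) for c in child_label_groups)
    return entropy(parent_labels) - weighted_child

# Example: splitting 10 samples (6 positive, 4 negative) into two subsets
parent = np.array([1] * 6 + [0] * 4)
left, right = np.array([1] * 5 + [0] * 1), np.array([1] * 1 + [0] * 3)
print(round(information_gain(parent, [left, right]), 3))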
2. Gain Ratio
Gain Ratio adjusts Information Gain for features with many unique values (e.g., ID numbers) by normalizing it against the feature's intrinsic information to provide a more balanced measure.
This penalizes features with many values that offer little predictive power, ensuring more meaningful splits.
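Here is a compact, self-contained NumPy sketch of the Gain Ratio calculation on the same kind of toy split; the helper names and sample counts are illustrative only:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(parent_labels, child_label_groups):
    n = len(parent_labels)
    # Information Gain: parent entropy minus weighted child entropy
    info_gain = entropy(parent_labels) - sum(
        len(c) / n * entropy(c) for c in child_label_groups
    )
    # Intrinsic information: entropy of the split proportions themselves,
    # which penalizes features that fragment the data into many small subsets
    sizes = np.array([len(c) for c in child_label_groups]) / n
    split_info = -np.sum(sizes * np.log2(sizes))
    return info_gain / split_info

parent = np.array([1] * 6 + [0] * 4)
left, right = np.array([1] * 5 + [0] * 1), np.array([1] * 1 + [0] * 3)
print(round(gain_ratio(parent, [left, right]), 3))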
Also Read: Sentiment Analysis: What is it and Why Does it Matter?
3. Gini Index
The Gini Index measures impurity and is the splitting criterion used by the CART algorithm (it is also Scikit-learn's default). It reflects the probability of misclassifying a randomly chosen element from the node if it were labeled according to the node's class distribution. The formula is:

Gini(S) = 1 − Σ_i p_i²

Where p_i is the proportion of class i in the node S. A Gini Index of 0 indicates perfect purity, where all elements in the node belong to a single class.
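A quick check of the formula, computed with NumPy on toy label arrays:

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum(p_i^2); 0 means the node is perfectly pure
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([1, 1, 1, 1]))   # 0.0 -> pure node
print(gini([1, 1, 0, 0]))   # 0.5 -> maximally mixed two-class node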
Also Read: Understanding Decision Tree In AI: Types, Examples, and How to Create One
In decision tree construction, the best feature for splitting is the one that maximizes Information Gain, minimizes the Gini Index, or maximizes the Gain Ratio, depending on the algorithm being used. These measures guide the greedy, split-by-split construction toward purer nodes, although they do not guarantee a globally optimal tree.
Gain hands-on experience with AI tools like Tableau and Python. The Gen AI Mastery Certificate in Data Analysis helps you apply decision trees in data analysis for real-world business solutions, such as fraud detection and predictive analytics!
With a clear understanding of how decision trees function, let's move on to building and optimizing decision tree classifiers in Python to apply these concepts effectively.
In this section, we’ll build a decision tree model using Python's Scikit-learn library, optimize its performance, and visualize the results.
We'll use the Pima Indians Diabetes Dataset, a widely used dataset for binary classification. It contains data on 768 patients, with 8 input features: pregnancies, glucose level, blood pressure, skin thickness, insulin level, BMI, diabetes pedigree function, and age, plus a binary outcome label (1 for diabetes, 0 for no diabetes).
This dataset helps predict whether a patient has diabetes based on these input features.
Below is the complete Python code for building a decision tree classifier using the Pima Indians Diabetes Dataset in Scikit-learn. The code will load the data, train a decision tree model, and evaluate its performance.
Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# Load the Pima Indians Diabetes Dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
column_names = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=column_names)
# Split the dataset into features (X) and target (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
# Make predictions and evaluate the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# Output the accuracy
print(f"Accuracy: {accuracy * 100:.2f}%")
# Visualize the decision tree
plt.figure(figsize=(20,10))
plot_tree(clf, filled=True, feature_names=X.columns, class_names=["No Diabetes", "Diabetes"], rounded=True, fontsize=14)
plt.show()
Output:
1. Accuracy:
Accuracy: the script prints the test-set accuracy (the figure reported in the original run was 77.34%; your exact value may differ depending on the data columns and library version).
2. Graph:
The decision tree will be displayed as a graphical plot showing how it splits the data based on different features like Pregnancies, Glucose, BMI, etc., to classify whether the patient has diabetes (1) or not (0).
Explanation:
1. Dataset Loading:
The dataset is loaded from a CSV file containing data about 768 patients, including features such as Pregnancies, Glucose, BMI, Age, and the target variable Outcome, which indicates whether the patient has diabetes (1) or not (0).
2. Data Splitting:
The dataset is divided into features (X) and the target variable (y). We then split the data into training and testing sets, using 70% of the data for training and 30% for testing. The train_test_split function shuffles the data randomly before splitting, and setting random_state=42 makes the split reproducible.
3. Model Training:
We instantiate a DecisionTreeClassifier and fit it to the training data. The model learns patterns in the data to predict the outcome (diabetes or not).
4. Prediction and Evaluation:
The model makes predictions on the test set, and we calculate the accuracy using accuracy_score. This metric tells us the percentage of correct predictions made by the model on the test data.
In this example, the reported run correctly predicted the outcome for approximately 77% of the test data, indicating that a plain decision tree performs reasonably well on this dataset before any tuning.
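Accuracy alone can hide class-level errors on an imbalanced dataset like this one. As an optional follow-up (not part of the original walkthrough), you can inspect the confusion matrix and per-class precision/recall, reusing y_test and y_pred from the code above:

from sklearn.metrics import confusion_matrix, classification_report

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
# Precision, recall, and F1 per class
print(classification_report(y_test, y_pred, target_names=["No Diabetes", "Diabetes"]))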
Also Read: Top 5 Machine Learning Models Explained For Beginners
Decision trees are powerful but prone to overfitting, particularly with complex data or deep trees. To optimize performance, techniques like pruning and hyperparameter tuning help ensure the model generalizes well to unseen data.
Several methods can be employed to optimize decision tree performance:
1. Hyperparameter Tuning
Adjust key parameters such as:
- max_depth: caps how deep the tree can grow.
- min_samples_split: the minimum number of samples required to split an internal node.
- min_samples_leaf: the minimum number of samples allowed in a leaf.
- criterion: the impurity measure used for splits ('gini' or 'entropy').
2. Pruning
Remove branches that add little predictive power. Pre-pruning constrains growth up front (via max_depth or the min_samples_* parameters), while post-pruning trims a fully grown tree; in Scikit-learn, cost-complexity post-pruning is controlled by the ccp_alpha parameter.
3. Cross-Validation
Use k-fold cross-validation to assess model performance across different data subsets, reducing the risk of overfitting to a single train/test split. A combined sketch of points 1-3 follows after this list.
4. Feature Engineering
Feature engineering can enhance feature quality or remove irrelevant features to improve model performance. Decision trees are less sensitive to scaling but benefit from well-selected features.
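The snippet below is a combined sketch of points 1-3: a grid search over depth and leaf-size limits plus the cost-complexity pruning strength ccp_alpha, scored with 5-fold cross-validation. The parameter values are illustrative rather than tuned recommendations, and X_train and y_train are reused from the earlier training code:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": [3, 5, 7, None],
    "min_samples_split": [2, 10, 20],
    "min_samples_leaf": [1, 5, 10],
    "ccp_alpha": [0.0, 0.001, 0.01],   # post-pruning strength
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X_train, y_train)   # reuses the training split from the earlier code

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")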
Did You Know? Decision trees can serve as base learners in Gradient Boosting and Random Forest models, greatly improving performance in complex datasets.
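As a quick, hedged illustration of that point, the sketch below fits both ensembles on the same Pima train/test split created earlier; the printed scores will vary by run and are not figures from this article:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Both ensembles build many decision trees internally and combine their predictions
for model in (RandomForestClassifier(n_estimators=100, random_state=42),
              GradientBoostingClassifier(random_state=42)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{model.__class__.__name__}: {acc:.3f}")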
Also Read: Random Forest Hyperparameter Tuning in Python: Complete Guide
Decision trees are valued for their simplicity and interpretability, but they also have limitations that require careful attention.
Decision trees are effective in use cases such as customer segmentation for marketing. For example, a decision tree can classify customers into high or low-value segments based on features like age, income, and purchase history.
The model's transparency allows marketers to understand the reasoning behind segment assignments. However, decision trees also have limitations that can impact their performance in specific scenarios. Let's look at both sides in more detail.
Decision trees are highly interpretable and effective in use cases where transparency is crucial. For example, in loan approval systems, a decision tree can classify applicants as approved or denied based on features like credit score, income, and loan amount.
The model's decision-making process is straightforward, allowing financial institutions to justify their decisions to applicants clearly. Some more advantages include:
- High interpretability: the full decision path from root to leaf can be read directly off the tree.
- Minimal preprocessing: no feature scaling or normalization is required.
- Ability to handle both numerical and categorical features (with suitable encoding).
- Applicability to both classification and regression tasks.
- Fast training and prediction on moderately sized datasets.
Boost your ability to apply decision trees to text classification and sentiment analysis. The Introduction to Natural Language Processing course will teach you NLP skills that integrate seamlessly with decision tree models for advanced text-based prediction tasks.
Despite their advantages, decision trees have limitations that can affect their performance, especially in complex datasets.
For example, in predicting stock prices, decision trees might struggle to capture the nuanced, non-linear relationships between features, leading to inaccurate predictions. The main limitations of decision trees include:
- Overfitting: deep, unpruned trees tend to memorize noise in the training data.
- Instability: small changes in the training data can produce a very different tree.
- Bias toward features with many distinct values when Information Gain is used.
- Greedy splitting: locally optimal splits do not guarantee a globally optimal tree.
- Piecewise-constant predictions, which handle smooth, continuous relationships poorly.
Also Read: Artificial Intelligence Jobs in 2025: Skills and Opportunities
Now let’s look at how upGrad can guide you in learning decision trees and more advanced machine learning concepts.
Decision trees are key in machine learning, applied in finance, healthcare, and marketing for tasks like credit scoring, disease prediction, and customer segmentation. Mastering them is essential for a successful ML career.
To excel in decision trees, master libraries like Pandas, NumPy, and Scikit-learn. Practice on datasets like Iris and Pima Indians Diabetes, focusing on hyperparameter tuning and pruning. Then, apply regression and classification models to real-world projects.
While learning, challenges such as model optimization and feature selection can slow progress. For this, upGrad offers specialized machine learning courses where you can learn from industry experts and gain a career boost.
With personalized mentorship, you'll receive guidance tailored to your learning pace and career goals. Additionally, upGrad's offline centers offer a hands-on learning experience, providing you with the opportunity to interact directly with experts and peers.
Reference:
https://arxiv.org/abs/1812.02207