Iris Dataset Classification Project Using Python

Updated on Jul 22, 2025 | 6 min read | 1.5K+ views

The Iris dataset is one of the datasets most commonly used by beginners getting started with data science and machine learning. It offers a clean, well-structured way to learn about classification, feature importance, and model accuracy.

In this project, we will build a machine learning model that classifies an iris flower into one of three species (Setosa, Versicolor, or Virginica) based on the length and width of its sepals and petals. The project is ideal for those who are new to supervised learning and provides a strong foundation in the main ML pipeline: loading data, preprocessing, model building, and evaluation.

What Should You Know Beforehand?

It helps to have some background in:

Python programming language (functions, loops, and handling basic data types)
Pandas and NumPy tools (for reading, analyzing, and manipulating structured data)
Matplotlib and Seaborn libraries (for plotting and visualizing relationships between features)
Scikit-learn (for applying basic classification models and evaluating them using accuracy scores)

Technologies and Libraries Used

For this project, the following tools and libraries will be used:

Tool / Library	Purpose
Python	Programming language used for development
Google Colab	Cloud-based notebook environment for running Python code
Pandas	For data manipulation and analysis
NumPy	For numerical operations and array handling
Matplotlib	To visualize data through basic plots and graphs
Seaborn	For enhanced and beautiful statistical visualizations
Scikit-learn	To train classification models and evaluate performance

Models That Will Be Utilized for Learning

Two simple but effective algorithms will be applied to classify the iris flower species:

Logistic Regression: A linear model suited to both binary and multiclass classification. It estimates the probability that a sample belongs to each class and works well when the classes can be separated by roughly linear boundaries in feature space.
K-Nearest Neighbors (KNN): A nonparametric model that classifies a data point by majority vote among its nearest neighbors in the training data. KNN is intuitive and easy to apply, and it is often used on problems like the Iris dataset because it can capture nonlinear patterns.
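
To make the KNN idea concrete, here is a minimal from-scratch sketch of the majority vote. It is not how scikit-learn implements KNN internally; the function name knn_predict and the six training points are hypothetical values chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count how often each label appears
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (petal length, petal width) pairs in cm, for illustration only
train_points = [(1.4, 0.2), (1.3, 0.2), (4.5, 1.5), (4.7, 1.4), (6.0, 2.5), (5.8, 2.2)]
train_labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

prediction = knn_predict(train_points, train_labels, query=(4.6, 1.3), k=3)
print(prediction)  # versicolor
```

Two of the three nearest neighbours of the query are versicolor points, so the vote goes to versicolor. Note that math.dist requires Python 3.8 or later.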

Time Taken and Difficulty

You can complete this Iris dataset classification project in about 1.5 to 2 hours. It’s a beginner-level machine learning project that offers a hands-on introduction to supervised classification.

How to Build the Iris Dataset Classification Model

Let’s build the project from scratch. We will work through the following stages:

Loading and exploring the dataset
Preprocessing the features
Visualizing relationships between flower measurements
Training and evaluating classification models

Without any further delay, let’s start!

Step 1: Download the Dataset

To build the Iris flower classification model, we will use the popular Iris dataset available on Kaggle. It contains 150 samples divided equally among three species of iris flowers: Setosa, Versicolor, and Virginica.

Follow the steps mentioned below to download the dataset:

Open a new tab in any web browser.
Go to https://www.kaggle.com/datasets/uciml/iris/data
On the Iris Species page, in the right pane, under the Data Explorer section, click Iris.csv.
Click the download icon.
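
If downloading from Kaggle is not convenient, a possible alternative is the copy of Iris that ships with scikit-learn. The sketch below is an assumption on our part rather than part of the original workflow: the column names differ from the Kaggle CSV (for example "sepal length (cm)" instead of SepalLengthCm), and the bundled copy is reported to differ from the UCI-derived file in a couple of values, so summary statistics may not match the CSV exactly. It also checks the claim that the 150 samples are split equally among the three species.

```python
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the Iris data as a pandas DataFrame
iris = load_iris(as_frame=True)
iris_df = iris.frame  # four feature columns plus an integer "target" column

# Map the integer target (0, 1, 2) to readable species names
name_by_index = {i: str(name) for i, name in enumerate(iris.target_names)}
iris_df["species"] = iris_df["target"].map(name_by_index)

print(iris_df.shape)                      # (150, 6)
print(iris_df["species"].value_counts())  # 50 rows per species
```

The rest of this walkthrough uses the Kaggle CSV, so treat this only as a quick way to experiment without the download step.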

Step 2: Upload and Load the Dataset in Google Colab

Now that you have downloaded the file, upload it to Google Colab using the code below:

from google.colab import files

# Upload the CSV file
uploaded = files.upload()

This will prompt you to choose a file from your system. Select the Iris.csv file you just downloaded.

Next, use the code below to load the dataset into a Pandas DataFrame and preview the first five rows:

import pandas as pd

# Load the CSV file
df = pd.read_csv('Iris.csv')

# Show the first five rows
df.head()

Doing so will help you verify that the dataset is loaded correctly.

Output

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa

Step 3: Explore the Dataset

As the preview shows, the Iris dataset is well-structured, so a quick look at its shape and column names is enough. We should also check for missing values to be sure the data is ready for modeling.

Use the code below to do so:

# Check the shape of the dataset
print("Dataset shape:", df.shape)

# Display column names
print("Column names:", df.columns.tolist())

# Check for any missing values
print("Missing values:\n", df.isnull().sum())

# Get a quick statistical summary
df.describe()

The output below confirms that all four features are numerical and that there are no missing values.

Output

Dataset shape: (150, 6)
Column names: ['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species']
Missing values:
 Id               0
SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm
count	150.000000	150.000000	150.000000	150.000000	150.000000
mean	75.500000	5.843333	3.054000	3.758667	1.198667
std	43.445368	0.828066	0.433594	1.764420	0.763161
min	1.000000	4.300000	2.000000	1.000000	0.100000
25%	38.250000	5.100000	2.800000	1.600000	0.300000
50%	75.500000	5.800000	3.000000	4.350000	1.300000
75%	112.750000	6.400000	3.300000	5.100000	1.800000
max	150.000000	7.900000	4.400000	6.900000	2.500000
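
The summary statistics hint that petal measurements vary much more than sepal measurements. A quick scatter plot makes the relationship between flower measurements visible. The following is a minimal sketch, assuming scikit-learn's bundled copy of the data so that the block runs on its own; with the DataFrame loaded above you could plot df["PetalLengthCm"] against df["PetalWidthCm"] instead. The output filename iris_petals.png is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Colab
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target  # columns 2 and 3 hold petal length and petal width

fig, ax = plt.subplots(figsize=(6, 4))
for class_index, species in enumerate(iris.target_names):
    mask = y == class_index  # select the rows belonging to one species
    ax.scatter(X[mask, 2], X[mask, 3], label=species, alpha=0.7)

ax.set_xlabel("Petal length (cm)")
ax.set_ylabel("Petal width (cm)")
ax.set_title("Iris petal measurements by species")
ax.legend()
fig.savefig("iris_petals.png", dpi=100)
```

In a plot like this, setosa typically forms a tight cluster well away from the other two species, while versicolor and virginica sit closer together.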

Step 4: Preprocess the Data

Now we will prepare the data for model training by:

Separating features (X) and target (y)
Encoding the target species
Splitting the data into training and test sets

Here is the code to do so:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Separate features and target
X = df.drop(['Id', 'Species'], axis=1)
y = df['Species']

# Encode target labels into numerical values
le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Split the dataset (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)

# Check the shape
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)

Output

X_train shape: (120, 4)

X_test shape: (30, 4)

The data is now ready to be used by the classification models.
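
A plain random split does not guarantee that each species is equally represented in the test set. If you want that guarantee, train_test_split accepts a stratify argument. The sketch below is an optional variation and not the split used in this walkthrough; it assumes scikit-learn's bundled copy of the data so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions identical in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(np.bincount(y_train))  # [40 40 40]
print(np.bincount(y_test))   # [10 10 10]
```

With 50 samples per species and a 20% test size, stratification puts exactly 10 flowers of each species in the test set.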

Step 5: Train and Evaluate the Models

In this step, we will train two classifiers: Logistic Regression and K-Nearest Neighbors. We will then evaluate each model using its accuracy score and classification report.

Let’s start with Logistic Regression.

Model 1: Logistic Regression

Here’s the code:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Initialize and train the model
lr_model = LogisticRegression(max_iter=200)
lr_model.fit(X_train, y_train)

# Make predictions
y_pred_lr = lr_model.predict(X_test)

# Evaluate
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_lr))
print("Classification Report:\n", classification_report(y_test, y_pred_lr, target_names=le.classes_))

Output

Logistic Regression Accuracy: 1.0
Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

What does this output convey?

Logistic Regression was able to obtain 100% accuracy on the test set.
It was able to correctly classify all 30 test samples across the three iris species: setosa, versicolor, and virginica.
The precision was 1.00, and the same holds for recall and F1-score. No misclassifications occurred.
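
Logistic Regression also exposes the class probabilities behind each prediction. The sketch below is a minimal, self-contained illustration that assumes scikit-learn's bundled copy of the data instead of the Kaggle CSV. The measurements in new_flower are taken from the first row of the preview shown earlier.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

lr_model = LogisticRegression(max_iter=200)
lr_model.fit(X_train, y_train)

# Sepal length, sepal width, petal length, petal width (cm)
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

# predict_proba returns one probability per class; each row sums to 1
probabilities = lr_model.predict_proba(new_flower)[0]
predicted_species = iris.target_names[lr_model.predict(new_flower)[0]]

for species, p in zip(iris.target_names, probabilities):
    print(f"{species}: {p:.3f}")
print("Predicted:", predicted_species)
```

With the variables from this walkthrough, le.inverse_transform(lr_model.predict(...)) would turn the encoded predictions back into species names such as Iris-setosa.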

Now let’s try KNN.

Model 2: K-Nearest Neighbors (KNN)

Here is the code:

from sklearn.neighbors import KNeighborsClassifier

# Initialize and train the model
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

# Make predictions
y_pred_knn = knn_model.predict(X_test)

# Evaluate
print("KNN Accuracy:", accuracy_score(y_test, y_pred_knn))
print("Classification Report:\n", classification_report(y_test, y_pred_knn, target_names=le.classes_))

Output

KNN Accuracy: 1.0
Classification Report:
                  precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

What does this output mean?

The KNN model was able to achieve 100% accuracy on the test set.
It perfectly predicted the correct class for all 30 test samples.
Each of the three iris species was classified with precision, recall, and F1-score of 1.00.
The chosen K value (k=5, set explicitly with n_neighbors=5, which is also scikit-learn's default) worked well on this balanced dataset with clearly separable classes.
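
A single 30-sample test set is a small basis for choosing K. One way to check how sensitive KNN is to this setting is to compare a few values with cross-validation. This is a rough sketch, assuming scikit-learn's bundled copy of the data; the candidate values of k are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

mean_scores = {}
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy; folds are stratified for classifiers
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k}: mean accuracy {mean_scores[k]:.3f}")
```

On a dataset this clean the differences between values of k tend to be small, but the same loop is useful on messier data where the choice matters more.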

Step 6: Compare Model Performances

Now that we have seen the performance of both models, let’s quickly compare their results and understand how they performed on the test data:

Metric	Logistic Regression	K-Nearest Neighbors
Accuracy	100%	100%
Precision (avg)	1.00	1.00
Recall (avg)	1.00	1.00
F1-Score (avg)	1.00	1.00
Misclassifications	0	0
Suitability	Simple, fast, great for linearly separable data	Works well with small datasets, non-parametric
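
The "Misclassifications" row can be verified with a confusion matrix, where every off-diagonal entry counts a flower assigned to the wrong species. The sketch below reuses the same split settings and model parameters as above, but assumes scikit-learn's bundled copy of the data, so the counts may differ slightly from those obtained with the Kaggle CSV.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
}

matrices = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Rows are true classes, columns are predicted classes
    matrices[name] = confusion_matrix(y_test, model.predict(X_test))
    print(name)
    print(matrices[name])
```

A matrix whose non-zero entries all lie on the diagonal corresponds to zero misclassifications.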

Conclusion

In this project, you built and compared two classification models using the famous Iris dataset. You learned how to load and explore a dataset, preprocess features, and apply machine learning algorithms like Logistic Regression and KNN.

While the results look ideal here, real-world data is often messier. Still, this project gives you a solid foundation in solving multi-class classification problems using Python and scikit-learn.

Reference link:
https://www.kaggle.com/datasets/uciml/iris/data.

Colab Link -
https://colab.research.google.com/drive/1Gc_lUuqqVyisFj27PL4PgojkntKJmLhX?usp=sharing

Rohit Sharma

Rohit Sharma is the Head of Revenue & Programs (International), with over 8 years of experience in business analytics, EdTech, and program management. He holds an M.Tech from IIT Delhi and specializes...
