
Clustering in Machine Learning: 3 Types of Clustering Explained

Last updated:
30th Nov, 2020


Machine Learning is one of the hottest technologies of 2020: as the volume of data grows day by day, the need for Machine Learning is growing rapidly as well. Machine Learning is a vast topic, with different algorithms and use cases in each domain and industry. One branch of it is Unsupervised Learning, which is where Clustering is used.


Unsupervised learning is a technique in which the machine learns from unlabeled data. Because the labels are unknown, there is no right answer for the machine to learn from; instead, the machine finds patterns in the given data on its own and uses them to answer the business problem.

Clustering is an unsupervised Machine Learning technique that groups unlabeled data. Given a cleaned data set, a clustering algorithm assigns each data point to a group (cluster). Clustering algorithms assume that data points in the same cluster have similar properties, while data points in different clusters have highly dissimilar properties.


In this article, we are going to learn why clustering is needed and look at different types of clustering along with their pros and cons.


What is the Need for Clustering?

Clustering is a widely used ML technique that allows us to find hidden relationships between the data points in our dataset. Some common use cases:


1)     Customer segmentation: customers are grouped by their similarity to previous customers, and the segments can be used for recommendations.

2)     Topic hierarchies: a collection of text data can be organized by content similarity in order to create a topic hierarchy.

3)     Image processing, mainly in biology research, for identifying underlying patterns.

4)     Spam filtering.

5)     Identifying fraudulent and criminal activities.

6)     Sports analytics, including fantasy football.

Types of Clustering     

There are many types of clustering algorithms in Machine Learning. We are going to discuss the following three in this article:

1)     K-Means Clustering.

2)     Mean-Shift Clustering.

3)     DBSCAN.

1. K-Means Clustering

K-Means is the most popular of the clustering algorithms in Machine Learning. It is used across many industries and taught in many introductory courses. It is one of the easiest models to start with, both to implement and to understand.

Step-1 We first choose the number of clusters, k, and randomly initialize their respective center points.

Step-2 Each data point is then assigned to a cluster by calculating the distance (Euclidean or Manhattan) between that point and each group center, and placing the point in the cluster whose center is closest to it.

Step-3 We recompute each group center by taking the mean of all the vectors in the group.

Step-4 We repeat Steps 2 and 3 for a fixed number of iterations, n, or until the group centers no longer change much.
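The four steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names (`kmeans`, `assign`, `squared_distance`), the parameters `max_iters`, `tol` and `seed`, and the sample points are all assumptions made for this example. It uses squared Euclidean distance only and picks k of the data points as the initial centers.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Step 2: index of the closest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iters=100, tol=1e-12, seed=0):
    rng = random.Random(seed)
    # Step 1: k is chosen by the caller; k random data points become the initial centers.
    centers = rng.sample(points, k)
    for _ in range(max_iters):
        labels = assign(points, centers)
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        # Step 4: stop once no center moves by more than the tolerance.
        shift = max(squared_distance(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, assign(points, centers)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Changing `seed` changes which points are used as the initial centers. On well-separated data like this the final grouping stays the same, but on harder data different seeds can give different clusterings.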


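Pros

1)     Very fast.

2)     Very few computations.

3)     Linear complexity: each iteration takes O(n) time in the number of data points n, for a fixed k and a fixed number of dimensions.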


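Cons

1)     The value of k has to be selected in advance.

2)     Because the centers are initialized randomly, different runs can produce different cluster centers.

3)     Lack of consistency: results may not be repeatable from one run to the next.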

2. Mean-Shift Clustering

Mean-shift clustering is a sliding-window-based algorithm that tries to identify the dense areas of the data points. It is a centroid-based algorithm, meaning that the goal is to locate the center point of each group; it works by updating candidates for center points to be the mean of the points within the sliding window.

These candidate windows are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of center points and their corresponding groups.

Step-1 We begin with a circular sliding window centered at a randomly selected point C and having radius r as the kernel. Mean shift is a hill-climbing algorithm: it shifts this kernel iteratively to a higher-density region on each step until it converges.

Step-2 At each iteration, the sliding window is shifted towards a region of higher density by moving its center point to the mean of the points within the window. The density within the sliding window is proportional to the number of points inside it, so shifting to the mean of the points in the window gradually moves the window towards areas of higher point density.

Step-3 We continue to shift the sliding window according to the mean until there is no direction in which a shift would bring more points inside the kernel.

Step-4 Steps 1 to 3 are carried out with many sliding windows until all points lie within a window. When multiple sliding windows overlap, the window containing the most points is kept. The data points are then clustered according to the sliding window in which they reside.
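The procedure above can be sketched in plain Python with a flat (circular) window. Several choices here are assumptions made for illustration and are not taken from the steps themselves: one window is started on every data point rather than at random locations, two converged windows are treated as overlapping when their centers lie within one radius of each other, and the names `mean_shift`, `radius`, `max_iters` and `tol` are hypothetical.

```python
import math


def mean_shift(points, radius, max_iters=100, tol=1e-9):
    def window(center):
        """All data points inside the circular window around `center`."""
        return [p for p in points if math.dist(p, center) <= radius]

    # Steps 1-3: start one window on every data point and climb towards density.
    modes = []
    for start in points:
        center = start
        for _ in range(max_iters):
            inside = window(center)
            new_center = tuple(sum(axis) / len(inside) for axis in zip(*inside))
            moved = math.dist(center, new_center)
            center = new_center
            if moved <= tol:
                break
        modes.append(center)

    # Step 4: among overlapping windows, keep the one holding the most points.
    by_density = sorted(modes, key=lambda m: len(window(m)), reverse=True)
    centers = []
    for mode in by_density:
        if all(math.dist(mode, kept) > radius for kept in centers):
            centers.append(mode)

    # Each point joins the kept window nearest to where its own window converged.
    labels = [
        min(range(len(centers)), key=lambda j: math.dist(mode, centers[j]))
        for mode in modes
    ]
    return centers, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = mean_shift(points, radius=3.0)
print(len(centers), labels)
```

Note that the number of clusters is never passed in; it falls out of the data and the chosen `radius`, which is why picking the radius matters so much.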


Pros

1)     No need to select the number of clusters.

2)     The cluster centers converge towards the points of maximum density, which fits the data in a natural, data-driven sense.


Cons

1)     The only drawback is that selecting the window size (r) can be non-trivial.

3. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN, like Mean-Shift clustering, is a density-based algorithm, but with a few notable differences.

Step-1 It begins with an arbitrary, unvisited starting point. The neighborhood of this point is extracted using a distance called epsilon: all points within epsilon of the point are its neighbors.

Step-2 If there are enough points in this neighborhood (at least a minimum number, often called minPoints), clustering starts and the current data point becomes the first point in a new cluster. Otherwise, the point is labelled as noise (it may later become part of a cluster). In both cases the point is marked as visited.

Step-3 The points within epsilon of this first point also become part of the same cluster. This procedure is then repeated for every point newly added to the cluster.

Step-4 Steps 2 and 3 are repeated until all points in the cluster have been visited and labelled.


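Step-5 Once the current cluster is complete, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or of noise. This repeats until every point has been visited and labelled as belonging to a cluster or as noise.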
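The five steps above can be sketched in plain Python as follows. The names `dbscan`, `eps` and `min_pts` are assumptions for this example, as are the sample points and the convention that a point counts as its own neighbor; distances are Euclidean and found by brute force, which is fine for illustration but slow on large data.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        """Step 1: indices of all points within eps of point i (including i itself)."""
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        # Step 2: too few neighbors means noise (for now); otherwise start a new cluster.
        if len(nbrs) < min_pts:
            labels[i] = NOISE
            continue
        cluster_id += 1
        labels[i] = cluster_id
        # Steps 3-4: grow the cluster through the neighborhoods of its dense points.
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise point on the cluster border
            if labels[j] is not None:
                continue  # already labelled, nothing more to do
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)
        # Step 5: the outer loop moves on to the next unvisited point.
    return labels


# Hypothetical sample data: two dense groups plus one isolated point.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
          (20.0, 0.0)]
labels = dbscan(points, eps=2.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point is labelled -1 (noise) instead of being forced into one of the two clusters.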


Pros

1)     No need to set the number of clusters.

2)     Identifies outliers as noise.

3)     Finds arbitrarily sized and arbitrarily shaped clusters quite well.


Cons

1)     Does not perform well on clusters of varying density.


2)     Does not perform well with high-dimensional data.



In this article, we covered why clustering is needed and looked at three types of clustering algorithms along with their pros and cons. Clustering is a very interesting topic in Machine Learning, and there are many other types of clustering algorithms worth learning.



Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. What is meant by Gaussian mixture clustering?

Gaussian mixture models (GMMs) assume that the data points are generated from a mixture of several Gaussian distributions. Based on this assumption, the model groups together the data points that belong to the same distribution. These are probabilistic models that use a soft clustering approach: each point receives a probability of belonging to each cluster, and a hard assignment can be obtained by choosing the most probable cluster.
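To make the idea of soft clustering concrete, here is a small sketch of how a one-dimensional Gaussian mixture turns a data point into per-cluster probabilities. It only shows the assignment step: the component weights, means and standard deviations are assumed values chosen for this example rather than parameters fitted to data, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def responsibilities(x, components):
    """Probability that x belongs to each component (soft assignment).

    `components` is a list of (weight, mean, std) tuples.
    """
    weighted = [w * gaussian_pdf(x, mean, std) for w, mean, std in components]
    total = sum(weighted)
    return [value / total for value in weighted]


# Assumed mixture: two equally weighted Gaussians centered at 0 and 4.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

soft = responsibilities(2.0, components)  # halfway between the two means
hard = max(range(len(soft)), key=lambda j: soft[j])  # hard assignment, if needed
near_first = responsibilities(0.0, components)
print(soft, near_first)
```

A point halfway between the two means is shared equally between the clusters, while a point sitting on the first mean belongs almost entirely to the first cluster.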

2. What is the silhouette coefficient in clustering?

The silhouette coefficient measures how well a clustering has been carried out. For each data point, we compute the average distance to the other points in its own cluster (a) and the average distance to the points in the nearest other cluster (b); the silhouette width of the point is then (b - a) / max(a, b), which lies between -1 and 1. Averaging this value over all points gives a score for the whole clustering, and comparing the score across different numbers of clusters helps in choosing a suitable number of clusters for the given data.
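As a rough sketch, the average silhouette coefficient can be computed in plain Python using the standard definition: for each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The function name `silhouette_score`, the Euclidean distance and the sample points are assumptions for this example; points in single-member clusters are given a score of 0, and the sketch does not handle clusters made up entirely of identical points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points (needs at least two clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])   # labels mix the groups
print(good, bad)
```

The labelling that matches the two visible groups scores close to 1, while the mixed labelling scores much lower.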

3. What is meant by fuzzy clustering in machine learning?

When a data point can belong to more than one cluster or group, a fuzzy clustering method is used, typically the fuzzy C-means (also called fuzzy K-means) algorithm. It is a soft clustering method. Based on the distance between each cluster center and the data point, the method assigns the data point a membership value for each cluster center.
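The membership values can be illustrated with the usual fuzzy C-means membership formula, in which the membership of a point in cluster j is 1 / sum over k of (d_j / d_k)^(2 / (m - 1)), where d_j is the distance to center j and m > 1 is the fuzziness parameter. The sketch below covers only this membership step for fixed centers; the function name, the choice m = 2 and the sample centers are assumptions for this example.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster; the values sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        # The point sits exactly on one or more centers: share full membership among them.
        return [1.0 / sum(on_center) if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** power for d_k in dists)
        for d_j in dists
    ]


# Assumed cluster centers for illustration.
centers = [(0.0, 0.0), (4.0, 0.0)]

close_to_first = fuzzy_memberships((1.0, 0.0), centers)  # distances 1 and 3
in_between = fuzzy_memberships((2.0, 0.0), centers)      # distances 2 and 2
print(close_to_first, in_between)
```

With m = 2, a point at distance 1 from the first center and 3 from the second gets memberships of about 0.9 and 0.1, and a point midway between the centers gets 0.5 in each.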
