Clustering in Machine Learning: 3 Types of Clustering Explained

Last updated: 30th Nov, 2020

Introduction

Machine Learning is one of the hottest technologies of 2020. As the amount of data grows day by day, the need for Machine Learning grows with it. Machine Learning is a vast topic, with different algorithms and use cases in each domain and industry. One of its branches is Unsupervised Learning, and Clustering is one of the main techniques within it.

Unsupervised learning is a technique in which the machine learns from unlabeled data. Because the labels are unknown, there is no right answer for the machine to learn from. Instead, the machine finds patterns in the given data on its own and uses them to answer the business problem.

Clustering is an Unsupervised Learning technique that groups unlabeled data. Given a cleaned data set, a Clustering Algorithm assigns each data point to a group (a cluster). The algorithm assumes that data points in the same cluster should have similar properties, while data points in different clusters should have highly dissimilar properties.

In this article, we are going to learn why clustering is needed and look at three types of clustering along with their pros and cons.

What is the need for Clustering?

Clustering is a widely used ML technique that allows us to find hidden relationships between the data points in our dataset.

Examples:

1)     Customer segmentation: customers are grouped according to their similarity to previous customers, and the groups can be used for recommendations.

2)     Topic discovery: a collection of text documents can be organized by content similarity in order to create a topic hierarchy.

3)     Image processing, mainly in biology research, for identifying underlying patterns.

4)     Spam filtering.

5)     Identifying fraudulent and criminal activities.

6)     Fantasy football and other sports analytics.

Types of Clustering     

There are many types of Clustering Algorithms in Machine Learning. We are going to discuss the following three algorithms in this article:

1)     K-Means Clustering.

2)     Mean-Shift Clustering.

3)     DBSCAN.

1. K-Means Clustering

K-Means is the most popular clustering algorithm in Machine Learning. It is used widely in industry and taught in many introductory courses. It is one of the easiest models to start with, both to understand and to implement.

Step-1 We first choose the number of clusters, k, and randomly initialize their respective center points.

Step-2 Each data point is then assigned to a cluster by calculating the distance (for example Euclidean or Manhattan) between that point and each cluster center, and placing the point in the cluster whose center is closest to it.

Step-3 We recompute each cluster center by taking the mean of all the vectors in that cluster.

Step-4 We repeat Steps 2 and 3 for a fixed number of iterations, n, or until the cluster centers no longer change much between iterations.
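The four steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the parameters `max_iters`, `tol` and `seed`, and the sample data are all assumptions made for this example. It also assumes Euclidean distance and picks the initial centers from the data points themselves, which is one common way of doing the random initialization.

```python
import math
import random


def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: use k randomly chosen data points as the initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once no center moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(data, k=2)
print(centers)
print(labels)
```

Changing `seed` changes the starting centers. On data this cleanly separated the grouping stays the same, but on harder data different seeds can give different results.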

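Pros

1)     Very fast.

2)     Very few computations: each iteration only needs the distances between the points and the cluster centers.

3)     Linear complexity, O(n), in the number of data points for each iteration.

Cons

1)     The value of k has to be selected in advance.

2)     Different runs can produce different cluster centers, because the centers are initialized randomly.

3)     Because of this, the results can lack consistency and may not be repeatable.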

2. Mean-Shift Clustering

Mean shift clustering is a sliding-window-based algorithm that tries to identify the dense areas of the data points. It is a centroid-based algorithm, meaning that the goal is to locate the center point of each cluster. It does this by repeatedly updating candidate center points to be the mean of the points inside the sliding window.

These candidate windows are then filtered in a post-processing stage to eliminate near-duplicates, which gives the final set of centers and their corresponding clusters.

Step-1 We begin with a circular sliding window centered at a randomly selected point C and having radius r as the kernel. Mean shift is a hill-climbing algorithm: it shifts this kernel iteratively to a higher-density region on each step until it converges.

Step-2 At each iteration the sliding window is shifted towards a region of higher density by moving its center to the mean of the points within the window. The density within the window is proportional to the number of points inside it, so moving to the mean of the points in the window gradually moves the window towards areas of higher point density.

Step-3 We continue to shift the sliding window according to the mean until there is no direction in which a shift would bring more points inside the kernel.

Step-4 Steps 1 to 3 are repeated with many sliding windows until all points lie within a window. When multiple sliding windows overlap, the window containing the most points is kept. The data points are then clustered according to the sliding window in which they reside.
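A rough sketch of this procedure in plain Python is shown below. It makes several simplifying assumptions that are not part of the description above: one window is started at every data point instead of at random locations, the kernel is a flat (uniform) window of radius `radius`, and windows that converge to nearly the same spot (within half a radius) are merged instead of keeping the one with the most points. The names `mean_shift`, `radius`, `max_iters` and `tol`, and the sample data, are illustrative.

```python
import math


def mean_shift(points, radius, max_iters=100, tol=1e-6):
    """Flat-kernel mean shift. Returns (centers, labels)."""
    modes = []
    for start in points:
        # Step 1: start a window of the given radius at this point.
        center = start
        for _ in range(max_iters):
            # Step 2: move the center to the mean of the points in the window.
            window = [p for p in points if math.dist(p, center) <= radius]
            new_center = tuple(sum(c) / len(window) for c in zip(*window))
            moved = math.dist(center, new_center)
            center = new_center
            # Step 3: stop when the window no longer moves.
            if moved <= tol:
                break
        modes.append(center)

    # Step 4 (simplified): merge windows that ended up in almost the same place.
    centers = []
    labels = []
    for mode in modes:
        for idx, c in enumerate(centers):
            if math.dist(mode, c) <= radius / 2:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = mean_shift(data, radius=2.0)
print(len(centers))
print(labels)
```

Note that the number of clusters is never passed in. It falls out of the chosen radius, which is why picking that radius matters so much.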

Pros

1)     No need to select the number of clusters.

2)     The cluster centers converge towards the points of maximum density, which fits the data in a natural, data-driven way.

Cons

1)     Selecting the window size (radius r) can be non-trivial.

3. Density-Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN is a density-based algorithm like Mean-Shift clustering, with a few notable differences.

Step-1 It begins with an arbitrary starting point that has not been visited. The neighborhood of this point is extracted using a distance called epsilon: all points within epsilon of it are its neighbors.

Step-2 If there are enough points in this neighborhood (at least a chosen minimum number), clustering starts and the current point becomes the first point of a new cluster. If there are not enough points, the point is labelled as noise, although it may still become part of a cluster later. In both cases the point is marked as visited.

Step-3 The points within the epsilon neighborhood of this first point also become part of the same cluster. This procedure is then repeated for every point that has just been added to the cluster.

Step-4 Steps 2 and 3 are repeated until all the points in the cluster have been visited and labelled.

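Step-5 Once the current cluster is complete, a new unvisited point is retrieved and processed, which leads to the discovery of a further cluster or of noise. The process repeats until every point has been visited and labelled as either belonging to a cluster or being noise.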
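The five steps can be sketched in plain Python as shown below. This is a minimal illustration under a few assumptions: distances are Euclidean, a point counts as its own neighbor, and neighbors are found by a brute-force scan. The names `dbscan`, `eps`, `min_pts` and `NOISE`, and the sample data, are chosen for this example only.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [
            j for j, q in enumerate(points) if math.dist(points[i], q) <= eps
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)  # Step 1: extract the epsilon neighborhood.
        if len(seeds) < min_pts:
            # Step 2: too few neighbors, so mark as noise for now.
            labels[i] = NOISE
            continue
        # Step 2: enough neighbors, so point i starts a new cluster.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        # Steps 3 and 4: grow the cluster until no reachable point is left.
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                # Earlier marked as noise, but reachable: it joins the cluster.
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                # j has a dense neighborhood too, so its neighbors join as well.
                queue.extend(j_neighbors)
        # Step 5: the outer loop moves on to the next unvisited point.
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
data = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
    (4.5, 15.0),
]
labels = dbscan(data, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point has no neighbors besides itself, so it is reported as noise instead of being forced into a cluster.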

Pros

1)     No need to set the number of clusters.

2)     Identifies outliers as noise.

3)     Finds arbitrarily sized and arbitrarily shaped clusters quite well.

Cons

1)     Does not perform well when clusters have varying densities.

2)     Does not perform well with high dimensional data.

Conclusion

In this article, we looked at why clustering is needed and at three clustering algorithms, K-Means, Mean-Shift and DBSCAN, along with their pros and cons. Clustering is a very interesting topic in Machine Learning, and there are many other clustering algorithms worth learning.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
Frequently Asked Questions (FAQs)

1. What is meant by Gaussian mixture clustering?

Gaussian mixture models (GMMs) can be used to perform either hard or soft clustering. They assume that the data were generated by a mixture of a finite number of Gaussian distributions, and they group together the data points that most likely belong to the same distribution. Because these are probabilistic models, they lend themselves to soft clustering: each point receives a probability of belonging to each cluster instead of a single fixed label.

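2. What is the silhouette coefficient in clustering?

The silhouette coefficient measures how well a clustering has been carried out. For each data point, we compute a, the average distance to the other points in its own cluster, and b, the average distance to the points in the nearest other cluster. The silhouette width of the point is (b - a) / max(a, b), which ranges from -1 to 1; values close to 1 mean the point sits well inside its cluster. Averaging over all points gives a score for the whole clustering, and comparing this score across different numbers of clusters helps to find the optimal number of clusters for the given data.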
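As a small illustration of the silhouette coefficient, the sketch below computes the average silhouette width of a labelled data set in plain Python. It uses the standard definition, s = (b - a) / max(a, b), where a is a point's mean distance to the other points of its own cluster and b is its mean distance to the points of the nearest other cluster. The function name `silhouette_score`, the Euclidean distance, the convention that a point alone in its cluster scores 0, and the sample data are all assumptions made for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other points of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical sample data: two well-separated groups of three points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
good = silhouette_score(data, [0, 0, 0, 1, 1, 1])  # matches the real groups
bad = silhouette_score(data, [0, 1, 0, 1, 0, 1])   # mixes the two groups
print(good, bad)
```

A labelling that matches the real groups scores close to 1, while one that mixes the groups scores much lower, which is how the coefficient can be used to compare candidate clusterings.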

3. What is meant by fuzzy clustering in machine learning?

When a data point can belong to more than one cluster or group, a fuzzy clustering method is used, typically the fuzzy C-means (also called fuzzy K-means) algorithm. It is a soft clustering method. Based on the distance between each cluster center and the data point, the method assigns the point a membership value for every cluster center, with closer centers receiving higher membership.
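To make the idea of membership values concrete, the sketch below applies the standard fuzzy C-means membership rule to a single data point, given cluster centers that are assumed to be already known. The membership in cluster i is 1 / sum over k of (d_i / d_k)^(2 / (m - 1)), where d_i is the distance to center i and m > 1 is the fuzzifier. The function name `fuzzy_memberships`, the default m = 2, the Euclidean distance and the sample centers are illustrative assumptions. A full algorithm would alternate this step with re-estimating the centers.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Membership of `point` in each cluster under the fuzzy C-means rule."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in dists)
        for d_i in dists
    ]


# Hypothetical cluster centers and a point lying closer to the first one.
centers = [(1.0, 1.0), (8.0, 8.0)]
u = fuzzy_memberships((3.0, 3.0), centers)
print(u)
```

The memberships always sum to 1, and the nearer center receives the larger share, so the point belongs mostly, but not entirely, to the first cluster.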
