K Means Clustering Matlab [With Source Code]

Last updated:
9th Dec, 2020

K-means clustering is one of the most widely used techniques among data professionals. Because the algorithm is simple and effective, it is applied across many industries.

A data scientist's job involves clustering at many stages. Many large-scale projects rely on clustering algorithms, which has raised the demand for data science professionals.

One of those algorithms is K-means clustering, which is the subject of this article, along with its implementation in MATLAB source code.

Before diving into the topic, let's take a quick look at what clustering is, why it matters, and how it can be applied in real life. By the end of the post, you will see how useful this algorithm is for understanding large data sets.

What is Clustering?

Data is the most critical component of any application, and a cluster is simply a collection of similar data points grouped together. As the name suggests, clustering is the process of dividing a large set of data into subgroups, or clusters, based on patterns in the data.

In machine learning, clustering is applied when no predefined labels are available. The aim is to group data into classes with high intra-class similarity and low inter-class similarity.

Clustering is used to explore data. Some real-life examples are market segmentation to find customers with similar behaviours, image segmentation and compression, and clustering documents by topic.

It is often a useful step before building supervised models, because it identifies homogeneous groups in the data. K-means is an unsupervised learning algorithm: it needs no labelled outcomes and instead groups similar observations into distinct clusters.

Let's take a look at the K-means algorithm, one of the simplest and most widely applied clustering algorithms.

K-Means Clustering

K-means clustering is one of the most widely used unsupervised machine learning algorithms.

Unsupervised algorithms draw conclusions from datasets using input vectors alone, without referring to labelled outcomes.

K-means is an iterative, centroid-based (distance-based) algorithm that partitions the dataset into K distinct subgroups (clusters), where each data point belongs to exactly one group. It tries to make the points within a cluster as similar as possible while keeping the clusters as far apart as possible.

In K-means, each cluster is represented by a centroid. The primary aim is to minimise the distance, typically the squared Euclidean distance, between each point and the centroid of its cluster.
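Written out, the quantity being minimised is usually the within-cluster sum of squared distances. The symbols in the sketch below are notation introduced here for illustration and do not come from the article: x_i is a data point, C_k is the set of points assigned to cluster k, and mu_k is the centroid of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

A smaller J means the points sit closer to their own centroids, that is, the clusters are tighter.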

How Does K-Means Clustering Work?

K-means works iteratively, repeating the same few steps until the clusters settle. Here is a step-by-step explanation:

Step 1: Define the number of clusters, K. For example, if the data should be split into 2 clusters, the value of K is 2.

Step 2: Pick K data points at random as the initial centroids, one for each cluster.

Step 3: Calculate the squared distance between every data point and every centroid.

Step 4: Allocate each data point to the closest cluster (centroid) to minimise the distance.

Step 5: Recompute each centroid as the average of the data points assigned to its cluster.

Step 6: Repeat steps 3 to 5 until the assignment of data points to clusters no longer changes.

Steps 3 to 5 make up a single iteration: points are assigned to clusters based on their distance from the centroids, and the centroids are then recomputed. Once the assignments stop changing, the centroids no longer move and the process stops.
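The procedure above can be sketched in a few lines of base MATLAB, without any toolbox. This is a minimal illustration under stated assumptions, not a replacement for a library implementation: the six data points, the value of K, and all variable names are made up for the example, and the initial centroids are fixed rather than random so that the run is repeatable.

```matlab
% Minimal K-means sketch in base MATLAB (no toolbox required).
% The data, K, and all variable names are illustrative assumptions.
X = [1.0 1.0; 1.2 0.8; 0.8 1.1; 5.0 5.0; 5.2 4.9; 4.8 5.1];  % six 2-D points
K = 2;                       % number of clusters
C = X([1 4], :);             % initial centroids: fixed here for repeatability;
                             % X(randperm(size(X, 1), K), :) would pick them at random
idx = zeros(size(X, 1), 1);  % cluster index of every point (0 = not yet assigned)
maxIter = 100;               % safety cap on the number of iterations

for iter = 1:maxIter
    % Squared Euclidean distance from every point to every centroid
    D = zeros(size(X, 1), K);
    for k = 1:K
        diffs = X - repmat(C(k, :), size(X, 1), 1);
        D(:, k) = sum(diffs .^ 2, 2);
    end

    % Assign each point to its closest centroid
    [~, newIdx] = min(D, [], 2);

    % Stop once the assignments no longer change
    if isequal(newIdx, idx)
        break;
    end
    idx = newIdx;

    % Move each centroid to the mean of the points assigned to it
    for k = 1:K
        if any(idx == k)     % leave the centroid of an empty cluster where it is
            C(k, :) = mean(X(idx == k, :), 1);
        end
    end
end

disp(idx');   % cluster index of each point
disp(C);      % final centroid coordinates, one row per cluster
```

On this toy data the first three points end up in cluster 1 and the last three in cluster 2, and the loop stops on its second pass because the assignments repeat.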

An Illustrative Example Depicting the Implementation of K-Means Clustering

Statement: McDonald's, one of the best-known food chains, wants to open a chain of outlets across California and wants to find the locations that will bring in the most revenue.

What McDonald's Already Has

  • A strong e-commerce presence

  • Online customer data for analysing the locations from which orders are placed most often

Possible challenges they could face

  • Analysing the areas from which orders are placed most often.
  • Working out how many outlets to open in each area.
  • Choosing outlet locations within each area that minimise the distance between the store and its delivery points.

All of these points require a good deal of analysis and mathematics.

How can the K-means Clustering Method be used here?

With a predefined value of K, the K-means algorithm can be applied in the following steps:

  • Partition the order locations into K non-empty subsets.
  • Determine the centroid of each subset; each centroid is a candidate outlet position.
  • Calculate the distance from each order location to each centroid.
  • Assign each location to the cluster whose centroid (outlet) is closest.
  • After re-assigning the points, find the centroid of each newly formed cluster, and repeat until the assignments stop changing.
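To make the outlet example concrete, here is a rough sketch of a single assign-and-update pass in base MATLAB. The order coordinates and the two starting outlet positions are invented for illustration (they are not real McDonald's data), and the variable names are hypothetical.

```matlab
% Hypothetical order locations (x, y in km); made-up values for illustration.
orders  = [2 3; 3 3; 2 4; 8 7; 9 8; 8 9];
outlets = [0 0; 10 10];          % initial guess for K = 2 outlet positions
K = size(outlets, 1);
n = size(orders, 1);

% Euclidean distance from every order to every candidate outlet
dist = zeros(n, K);
for k = 1:K
    delta = orders - repmat(outlets(k, :), n, 1);
    dist(:, k) = sqrt(sum(delta .^ 2, 2));
end

% Assign each order to its nearest outlet
[nearest, cluster] = min(dist, [], 2);
avgBefore = mean(nearest);

% Re-position each outlet at the centroid of the orders it serves
for k = 1:K
    outlets(k, :) = mean(orders(cluster == k, :), 1);
end

% Average delivery distance after the move (same assignment)
avgAfter = 0;
for i = 1:n
    avgAfter = avgAfter + norm(orders(i, :) - outlets(cluster(i), :)) / n;
end

fprintf('Average distance: %.2f km before, %.2f km after\n', avgBefore, avgAfter);
```

Moving each outlet to the centroid of its orders shortens the average delivery distance in this example. A full run would repeat the assignment and update until the clusters stop changing.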

Likewise, the K-means clustering algorithm can be applied to a wide variety of problems at different scales, in the hospitality industry, crime investigation departments, and image resizing, to name a few.

K-means is implemented in many languages, such as R, Python, and MATLAB. In the next section, we will look at how K-means clustering is applied in MATLAB.

K-Means Algorithm Using MATLAB

K-means is widely used by professionals working in data science, machine learning, artificial intelligence, cryptography, and cybersecurity.

The core objective of the algorithm is to find the centroid of each cluster in data that is otherwise unlabelled and heterogeneous. The MATLAB code below clusters a sample data set, plots each cluster together with its centroid, and prints the coordinates of each centroid. It relies on the kmeans and statset functions from the Statistics and Machine Learning Toolbox.

Code:

rng default; % For reproducibility

% Two Gaussian blobs of 100 points each, centred near (1,1) and (-1,-1)
X = [randn(100,2)*0.75+ones(100,2);
    randn(100,2)*0.5-ones(100,2)];

% Print the final summary of each replicate
opts = statset('Display','final');

% Cluster into 4 groups using city-block distance; keep the best of 5 runs
[idx,C] = kmeans(X,4,'Distance','cityblock','Replicates',5,'Options',opts);

% Plot each cluster in its own colour
plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12);
hold on;
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12);
plot(X(idx==3,1),X(idx==3,2),'g.','MarkerSize',12);
plot(X(idx==4,1),X(idx==4,2),'y.','MarkerSize',12);

% Mark the centroids with black crosses
plot(C(:,1),C(:,2),'kx','MarkerSize',15,'LineWidth',3);
legend('Cluster 1','Cluster 2','Cluster 3','Cluster 4','Centroids','Location','NW');
title('Cluster Assignments and centroids');
hold off;

% Print the coordinates of each centroid
for i = 1:size(C,1)
    disp(['Centroid ', num2str(i), ': X1 = ', num2str(C(i,1)), '; X2 = ', num2str(C(i,2))]);
end

 Output:

MATLAB Window Showing Four Clusters and Respective Centroids

Results:

The centroids obtained are as follows:

  1. The value of X1 & X2 for Centroid 1: 1.3661; 1.7232
  2. The value of X1 & X2 for Centroid 2: -1.015; -1.053
  3. The value of X1 & X2 for Centroid 3: 1.6565; 0.36376
  4. The value of X1 & X2 for Centroid 4: 0.35134; 0.85358
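Once the centroids are known, a new observation can be assigned to a cluster by finding its nearest centroid. The sketch below does this in base MATLAB with the four centroids listed above and the same city-block distance that was passed to kmeans. The new point (1, 1) is a hypothetical value chosen for illustration.

```matlab
% Centroids copied from the results listed above (rows: centroid 1 to 4).
C = [ 1.3661   1.7232;
     -1.015   -1.053;
      1.6565   0.36376;
      0.35134  0.85358];

newPoint = [1 1];   % hypothetical new observation, not part of the original data

% City-block (Manhattan) distance from the new point to every centroid
d = sum(abs(C - repmat(newPoint, size(C, 1), 1)), 2);

% The nearest centroid decides the cluster
[dMin, nearest] = min(d);
fprintf('Nearest centroid: %d (city-block distance %.4f)\n', nearest, dMin);
```

For this point the nearest centroid is Centroid 4, so the observation would join cluster 4.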

Some business areas where K-Means clustering can be implemented

K-means clustering is a versatile algorithm and can be used in many business cases that involve grouping. Some examples are:

Behavioural segmentation:

  • Segment customers by purchase history
  • Segment users by activity on an application, website, or platform
  • Define customer personas based on their interests
  • Create profiles from monitored activity

Image scaling:

  • Image compression, for example by reducing the number of colours

Sensor measurements:

  • Detect activity types from motion sensors
  • Group images
  • Separate audio
  • Identify groups in health monitoring data

Detecting bots or anomalies:

  • Separate valid activity groups from bots
  • Group valid activity to clean up outlier detection

Inventory classification:

  • Group inventory by sales activity
  • Group inventory by manufacturing metrics

Advantages of K-Means Clustering

There are good reasons why many professionals prefer the K-means clustering algorithm. Some of the benefits it offers:

  • It is fast and easy to understand.
  • It is computationally efficient: the work in each iteration grows linearly with the number of data points.
  • It gives its best results when the clusters are distinct and well separated. With a large number of variables, K-means is usually quicker than hierarchical clustering, provided K is small.
  • The clusters produced by K-means tend to be tighter than those from hierarchical clustering, especially when the clusters are roughly spherical.

Conclusion

K-means clustering is a widely used approach for finding clusters in data. Once you are comfortable with it, it is easy to apply and delivers results quickly.

We hope this article has introduced you to this analysis technique. For any queries regarding the K-means algorithm, feel free to comment below.

Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.
Frequently Asked Questions (FAQs)

1. What is K-means clustering in machine learning?

K-means is a popular clustering algorithm used in unsupervised machine learning. It starts by choosing K centroids, usually at random. From then on, the algorithm tries to minimise the overall within-cluster distance, which in turn keeps the clusters well separated. K-means is iterative: in each iteration, it assigns every observation to the closest centroid and then recomputes each centroid as the average of all the observations in its cluster. The iterations stop when the assignments no longer change.

2. What are the limitations of the K-means clustering algorithm?

There are some limitations of K-means to keep in mind. K-means is not robust to outliers: because each centroid is a mean, a few points far from the rest can pull the centroid towards them and bias the assignment of other points. It works best when clusters are compact, roughly spherical, and similar in size. K-means does not guarantee a unique solution: the number of clusters is fixed at K, but the clusters themselves depend on the initial centroids, so different runs can give different results. The algorithm usually converges within a few iterations, but only to a local optimum, which is why it is common to run it several times and keep the best result. Finally, K has to be chosen in advance.
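The sensitivity to outliers is easy to see on a tiny example. The sketch below uses a hypothetical one-dimensional cluster (made-up values) and compares the mean, which K-means uses as the centroid, with the median, which is less affected by extreme values.

```matlab
% Hypothetical one-dimensional cluster; the last value is an outlier.
points = [1; 2; 3; 4; 100];

centroidMean   = mean(points);    % what K-means uses as the centroid
centroidMedian = median(points);  % a more outlier-resistant alternative

fprintf('Mean: %g, median: %g\n', centroidMean, centroidMedian);
```

A single outlier drags the mean to 22, far from the bulk of the points, while the median stays at 3.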

3. What are the advantages of K-means clustering?

K-means works with data of any number of dimensions, not only two or three. It is simple to implement and scales to large data sets. Each cluster is summarised by its centroid, the mean of the points assigned to it. Because the algorithm compares distances, features are often standardised first, by subtracting the mean and dividing by the standard deviation, so that no single feature dominates.
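Scaling each feature by its standard deviation before clustering can be done in base MATLAB as sketched below. The data and the choice of features (age in years, income in dollars) are hypothetical, chosen only to show two columns on very different scales.

```matlab
% Hypothetical data: column 1 is age in years, column 2 is income in dollars.
X = [25  30000;
     35  60000;
     45  90000];

mu    = mean(X, 1);     % per-column mean
sigma = std(X, 0, 1);   % per-column sample standard deviation

% Z-score every column: subtract the mean, then divide by the standard deviation
Z = (X - repmat(mu, size(X, 1), 1)) ./ repmat(sigma, size(X, 1), 1);

disp(Z);
```

After scaling, both columns have mean 0 and standard deviation 1, so income no longer dominates the distance calculation simply because its numbers are larger.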
