K-Means Clustering in MATLAB [With Source Code]
Updated on Apr 25, 2025 | 18 min read | 9.9k views
K-Means clustering in MATLAB is a powerful tool for diverse applications like image segmentation and market analysis, thanks to MATLAB's specialized toolboxes and seamless integrations.
MATLAB’s ability to handle complex data preprocessing, advanced visualization, and algorithm customization (e.g., custom distance metrics or k-means++) makes it ideal for these tasks.
Its intuitive interface and built-in functions streamline workflows, enabling precise parameter tuning and validation for efficient clustering, even with large datasets.
Enhance your data skills with upGrad’s comprehensive data science courses and learn how to apply K-means clustering in MATLAB to solve complex data problems. Become an expert in analyzing datasets effectively and deriving meaningful insights. Start your learning journey today!
K-means clustering is a popular unsupervised learning algorithm that groups data into clusters based on similarity, with each data point assigned to the nearest mean. MATLAB streamlines this process using the kmeans() function, offering an efficient and user-friendly platform for both beginners and experts.
Its robust tools make it ideal for pattern recognition and data simplification across domains like finance, biology, and image processing.
Here are the major features of this process:
Key Features of K-Means Clustering in MATLAB
MATLAB provides a robust environment for clustering tasks, bringing data preprocessing, clustering, visualization, and evaluation together in one place.
Example: Before applying K-Means, a dataset with numerical variables can be standardized using MATLAB’s normalize() function to ensure uniform scaling.
Now that the main features of K-Means clustering in MATLAB have been covered, let's look at what makes MATLAB highly suitable for this algorithm.
Why MATLAB Excels at K-Means Clustering
MATLAB provides a seamless workflow for clustering tasks, integrating preprocessing, clustering, and evaluation in one platform. Its built-in kmeans function allows easy implementation without extensive coding. Some examples that showcase its utility include:
Examples of K-Means Clustering in MATLAB
Cluster RGB values of an image to divide it into regions based on color similarity. This technique is widely used in practical scenarios like isolating tumors in medical imaging or identifying objects in satellite images for environmental analysis.
img = imread('peppers.png'); % Load a sample image that ships with MATLAB
L = imsegkmeans(img, 5); % Segment the image into 5 clusters by color
imshow(label2rgb(L));
Cluster customers by age, income, and spending to identify behavioral patterns. This helps in creating personalized marketing strategies.
% customerData: an N-by-3 matrix of [age, income, spending] values
[idx, C] = kmeans(customerData, 3); % Group customers into 3 segments
scatter3(customerData(:,1), customerData(:,2), customerData(:,3), 10, idx);
Also Read: Clustering in Machine Learning: Learn About Different Techniques and Applications
Having understood the significance and basic concepts of K-means clustering in MATLAB, the next step is to learn how to implement it effectively. Let’s dive into the practical aspects of coding and executing K-means clustering to bring its capabilities to life in MATLAB.
K-means clustering is a powerful technique for grouping data points into clusters based on their similarities. MATLAB simplifies the implementation of this algorithm through its built-in kmeans() function, which enables efficient clustering with minimal code.
This section provides a detailed explanation of K-means clustering in MATLAB, starting with the syntax, progressing through the steps to implement it, and concluding with result interpretation.
MATLAB Syntax for K-Means Clustering
Before diving into implementation, it's essential to understand the syntax of MATLAB's kmeans() function.
[clusterIdx, clusterCenters] = kmeans(data, k, Name, Value);
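Here, clusterIdx holds the cluster index assigned to each row of data, clusterCenters holds the centroid coordinates, k is the number of clusters, and the optional name-value pairs tune the algorithm. A brief illustrative call using a few standard kmeans options (the specific values are arbitrary):
% 'Distance'   - distance metric ('sqeuclidean' is the default)
% 'Replicates' - number of restarts from fresh initial centroids; the best run is kept
% 'MaxIter'    - iteration limit per replicate
[clusterIdx, clusterCenters] = kmeans(data, 3, 'Distance', 'cityblock', 'Replicates', 10, 'MaxIter', 300);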
Quick Tip: Always preprocess your data by scaling features to avoid bias caused by differing feature scales. For instance, use MATLAB's normalize() function to standardize your data before clustering:
data = normalize(data);
Understanding the syntax ensures you can tailor the function to specific data and clustering needs. With this foundation, let’s explore how K-means clustering is implemented in MATLAB:
Here are the major steps involved in implementing K-Means clustering in MATLAB:
Step 1: Data Preparation in MATLAB
Data should be in matrix format, with rows as observations and columns as features. In MATLAB, you can load data from external sources or generate synthetic data to prepare it for clustering.
Example: Generating synthetic data for clustering.
% Generate random data with three clusters
rng(1); % Set seed for reproducibility
data = [randn(50,2)*0.75 + ones(50,2);
        randn(50,2)*0.5  - ones(50,2);
        randn(50,2)*0.6  + [2, -2]]; % Create three clusters
With the data prepared, the next step is applying the clustering algorithm.
Step 2: Applying the kmeans() Function
The kmeans() function is the core of the clustering process. It assigns each data point to one of the specified clusters and calculates the cluster centroids iteratively.
Example: Applying the kmeans() function to the prepared data.
% Number of clusters
k = 3;
% Perform K-means clustering
[clusterIdx, clusterCenters] = kmeans(data, k, 'Distance', 'sqeuclidean', 'Replicates', 5);
The output includes clusterIdx, the cluster index assigned to each observation, and clusterCenters, the coordinates of each cluster centroid.
Now that clustering is complete, the next step involves visualizing and interpreting the results.
Step 3: Visualizing the Results
Visualization is crucial for understanding the clustering process and verifying the quality of clusters. MATLAB offers plotting tools to display the clustered data and centroids.
Example: Scatter plot visualization of clusters.
% Plot data points with clusters
figure;
gscatter(data(:,1), data(:,2), clusterIdx, 'rgb', 'o', 8);
hold on;
% Plot centroids
plot(clusterCenters(:,1), clusterCenters(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
title('K-Means Clustering Results');
xlabel('Feature 1');
ylabel('Feature 2');
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Centroids');
The plot reveals the distribution of clusters and the central positions of the cluster centroids, providing visual confirmation of the clustering process.
Step 4: Interpreting the Results
Once the clusters are visualized, it's essential to interpret the output for meaningful insights.
Each data point is assigned a cluster label (e.g., Cluster 1, 2, or 3), indicating its group.
The centroid coordinates represent the mean values of the points within each cluster, summarizing the cluster’s characteristics.
Check for well-separated clusters in the plot.
Analyze cluster sizes and centroid positions to ensure they align with expectations.
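As a quick numeric check, you can also print the cluster sizes and centroid coordinates from the variables created in the steps above (a small sketch):
counts = accumarray(clusterIdx, 1) % Number of points assigned to each cluster
clusterCenters % Centroid coordinates (mean feature values per cluster)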
Complete MATLAB Code Example
% Step 1: Data Preparation
rng(1);
data = [randn(50,2)+2; randn(50,2)-2; randn(50,2)+[-2, 2]];
% Step 2: K-Means Clustering
k = 3;
[idx, C] = kmeans(data, k, 'Distance', 'sqeuclidean', 'Replicates', 5);
% Step 3: Visualization
gscatter(data(:,1), data(:,2), idx, 'rbg', 'o', 8);
hold on;
plot(C(:,1), C(:,2), 'kx', 'MarkerSize', 10, 'LineWidth', 2);
title('K-Means Clustering in MATLAB');
xlabel('Feature 1'); ylabel('Feature 2');
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Centroids');
hold off;
Output: A scatter plot showing the three clusters in different colors, with the cluster centroids marked by black crosses.
For larger or more complex datasets, using metrics like the silhouette coefficient or the Davies-Bouldin index helps assess the quality of clustering.
The silhouette coefficient measures how similar each point is to its own cluster compared to other clusters, while the Davies-Bouldin index evaluates the compactness and separation of clusters. Both metrics provide insights into the effectiveness of the clustering algorithm, helping you choose the best clustering configuration.
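A minimal sketch of both checks, reusing data and clusterIdx from the earlier steps (silhouette and evalclusters are built-in Statistics and Machine Learning Toolbox functions):
s = silhouette(data, clusterIdx); % Silhouette value per point; closer to 1 means better separated
meanSilhouette = mean(s)
dbEval = evalclusters(data, clusterIdx, 'DaviesBouldin'); % Lower values mean more compact, well-separated clusters
dbEval.CriterionValues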
Dive deeper into clustering methods with upGrad's free course in Unsupervised Learning: Clustering. This course will enhance your understanding of K-Means clustering in MATLAB and its application to real data analysis.
By understanding the syntax of the kmeans() function, preparing the data, and visualizing the results, you can gain insights from your dataset.
Also Read: Supervised vs Unsupervised Learning: Difference Between Supervised and Unsupervised Learning
While these are the basics of K-Means clustering, MATLAB also offers more advanced techniques for handling complex datasets and improving clustering performance.
These will be explored in-depth later, but before that, let’s have a look at the various practical uses of K-Means clustering with MATLAB.
K-Means clustering is best understood through practical, real applications. MATLAB’s powerful computational tools and built-in functions like kmeans() make it straightforward to apply the algorithm to diverse datasets.
This section explores detailed examples of K-Means clustering in action, showcasing its utility in solving complex problems such as location optimization for businesses and analyzing renowned datasets like Iris.
Through these examples of K-Means clustering, you'll gain a hands-on understanding of the algorithm's implementation, visualization, and interpretation in MATLAB.
Also Read: 21 Best Ideas for MATLAB Projects & Topics For Beginners [2025]
A fast-food chain like McDonald’s aims to optimize new outlet locations based on customer distribution. Using K-means clustering, you can identify high-density customer areas and strategically place new outlets to maximize accessibility.
Here is a step-by-step look at the process:
Step 1: Data Preparation
Prepare the customer location dataset, which includes geographical coordinates (latitude and longitude).
% Generate synthetic customer location data
rng(2); % Ensure reproducibility
customerLocations = [randn(100,2)*0.8 + [10, 20];
                     randn(80,2)*0.5  + [15, 25];
                     randn(60,2)*0.6  + [20, 15]]; % Three clusters
Step 2: Apply K-Means Clustering
Cluster customer locations into three groups to determine potential outlet areas.
% Number of clusters (outlets)
k = 3;
% Perform K-means clustering
[clusterIdx, clusterCenters] = kmeans(customerLocations, k, 'Replicates', 10);
Step 3: Visualization
Visualize the clusters and potential outlet locations.
% Scatter plot of customer clusters
figure;
gscatter(customerLocations(:,1), customerLocations(:,2), clusterIdx, 'rgb', 'o', 8);
hold on;
% Mark outlet locations (centroids)
plot(clusterCenters(:,1), clusterCenters(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
title('Optimal Outlet Locations');
xlabel('Longitude');
ylabel('Latitude');
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Proposed Outlets');
Results
The scatter plot shows three customer clusters, with the centroid of each cluster marking a proposed outlet location.
Understand the various aspects behind consumer choices with upGrad's free course in Introduction to Consumer Behavior. Learn how clustering techniques, like K-means, can segment consumers and optimize marketing strategies.
Let’s have a look at another example next:
The Iris dataset contains petal and sepal measurements for three flower species. K-means clustering can classify the flowers into species groups based on their features without prior labels.
Here are the required steps for this process:
Step 1: Load and Prepare Data
Load the Iris dataset, selecting features for clustering (e.g., petal length and width).
% Load Iris dataset
load fisheriris;
data = meas(:,3:4); % Petal length and width
Step 2: Perform K-Means Clustering
Cluster the data into three groups corresponding to the three Iris species.
% Number of clusters
k = 3;
% K-means clustering
[clusterIdx, clusterCenters] = kmeans(data, k, 'Replicates', 5);
Step 3: Visualization
Visualize the clustering results using a scatter plot.
% Plot clusters
figure;
gscatter(data(:,1), data(:,2), clusterIdx, 'rgb', 'o', 8);
hold on;
% Mark centroids
plot(clusterCenters(:,1), clusterCenters(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
title('Iris Clustering Using K-Means');
xlabel('Petal Length');
ylabel('Petal Width');
legend('Cluster 1', 'Cluster 2', 'Cluster 3', 'Centroids');
Results
The three clusters correspond closely to the setosa, versicolor, and virginica species, with some overlap between versicolor and virginica in the petal measurements.
While these practical examples of K-Means clustering show how it can be implemented in MATLAB for specific use cases, its versatility extends far beyond isolated examples.
Also Read: MATLAB Data Types: Everything You Need to Know
Let’s explore the broader applications of K-means clustering in real business scenarios to understand its widespread impact.
K-means clustering is applied across diverse industries to solve clustering problems. Below is a table summarizing its use in various fields.
Industry | Application | Example
Retail | Customer segmentation | Grouping customers based on purchase behavior for targeted marketing.
Healthcare | Disease pattern analysis | Clustering patient data to identify disease subtypes.
Finance | Risk profiling | Categorizing clients by risk levels for investment strategies.
Logistics | Distribution network optimization | Determining optimal warehouse locations.
Telecommunications | User behavior analysis | Segmenting users based on data usage.
These practical examples and uses in various fields demonstrate the power of K-means clustering in MATLAB, making it an indispensable tool for data-driven decision-making.
Also Read: Top 15 Types of Data Visualization: Benefits and How to Choose the Right Tool for Your Needs in 2025
Having seen the practical examples of K-means clustering in MATLAB, it's also important to understand its strengths and weaknesses. Next, let's look at the key advantages and limitations of using K-means clustering in MATLAB, helping you make more informed decisions about when and how to apply it effectively.
MATLAB offers many advantages, like an intuitive interface, powerful computation, and excellent visualization tools, making it a favorite among data scientists. However, these strengths are balanced by limitations, such as handling non-spherical clusters, sensitivity to initialization, and challenges with large or high-dimensional datasets in complex scenarios.
Below is a detailed exploration of the strengths and weaknesses of K-means clustering in MATLAB, accompanied by potential solutions for overcoming these limitations.
K-means clustering in MATLAB offers simplicity, scalability, and speed, making it ideal for partitioning large datasets. It is easy to implement and can handle various types of data. Let's explore the key benefits it brings to clustering tasks.
1. Intuitive Implementation with the kmeans() Function
2. High Computational Efficiency
3. Powerful Visualization Tools
4. Flexibility in Distance Metrics
5. Support for Parallel Computing
While K-means clustering in MATLAB offers several advantages, it’s important to recognize that no algorithm is perfect. Despite its strengths, K-means comes with some limitations that can affect its performance under certain conditions.
Also Read: Clustering vs Classification: Difference Between Clustering & Classification
Let’s now take a look at these challenges and explore potential solutions to overcome them.
While K-means is efficient, it has limitations, such as sensitivity to initial centroids and difficulty handling non-spherical clusters.
1. Sensitivity to Initialization
Poor centroid initialization can lead to suboptimal clustering results, especially with complex datasets.
Solutions:
Multiple Replicates: Run the algorithm several times from different random starting centroids and keep the best solution, for example:
[clusterIdx, clusterCenters] = kmeans(data, k, 'Replicates', 20);
Advanced Initialization Methods: Use k-means++ for smarter centroid placement. MATLAB's kmeans applies k-means++ seeding by default (the 'Start','plus' option), so no custom implementation is needed.
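For example, the seeding method can also be set explicitly (data and k as defined earlier):
[clusterIdx, clusterCenters] = kmeans(data, k, 'Start', 'plus', 'Replicates', 20); % 'plus' requests k-means++ seeding (also the default)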
2. Requirement to Predefine the Number of Clusters (k)
Solutions:
Elbow Method: Plot the within-cluster sum of squares (WCSS) against different k values to find the optimal number of clusters.
wcss = zeros(10, 1);
for i = 1:10
    [~, ~, sumd] = kmeans(data, i); % sumd: within-cluster sums of point-to-centroid distances
    wcss(i) = sum(sumd);
end
plot(1:10, wcss, '-o');
title('Elbow Method');
xlabel('Number of Clusters');
ylabel('WCSS');
Silhouette Analysis: Use MATLAB’s silhouette function to evaluate cluster quality for varying values of k.
silhouette(data, clusterIdx);
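To automate this search, evalclusters can scan a range of candidate k values and report the best one (a brief sketch; the range 2:10 is an arbitrary choice):
eva = evalclusters(data, 'kmeans', 'silhouette', 'KList', 2:10);
bestK = eva.OptimalK % k with the highest average silhouette value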
3. Poor Performance on Non-Spherical Data
Solutions:
Switch to Other Algorithms: Use algorithms like DBSCAN or Gaussian Mixture Models for non-spherical data.
Kernel K-Means: Transform the data into a higher-dimensional feature space using a kernel function before applying K-means. MATLAB does not ship a kernel K-means function, so this approach requires a custom implementation or a third-party library.
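As an illustration of the first option, MATLAB's built-in dbscan function (available since R2019a) groups arbitrarily shaped clusters without a predefined k; the radius and minimum-points values below are placeholders that need tuning for a real dataset:
idx = dbscan(data, 0.5, 5); % neighborhood radius 0.5, at least 5 points per core neighborhood; points labeled -1 are noise
gscatter(data(:,1), data(:,2), idx);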
4. Impact of Outliers
Solutions:
Preprocess Data: Remove or transform outliers using statistics before clustering.
% Remove outlier rows using the interquartile range (per feature)
Q1 = quantile(data, 0.25);
Q3 = quantile(data, 0.75);
iqrRange = Q3 - Q1;
inBounds = all(data >= Q1 - 1.5*iqrRange & data <= Q3 + 1.5*iqrRange, 2);
data = data(inBounds, :);
Robust Clustering Variants: Implement robust K-means variants that assign lower weights to outliers, such as Trimmed K-means.
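MATLAB has no built-in trimmed K-means, but a rough approximation is to cluster once, discard the points farthest from their assigned centroids, and re-cluster (a sketch; the 5% trimming fraction is an arbitrary choice):
[idx, C, ~, D] = kmeans(data, k); % D holds each point's distance to every centroid
dOwn = D(sub2ind(size(D), (1:size(data,1))', idx)); % distance of each point to its assigned centroid
keep = dOwn <= quantile(dOwn, 0.95); % drop the farthest 5% of points
[idxTrimmed, cTrimmed] = kmeans(data(keep, :), k); % re-cluster the trimmed data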
5. Challenges with High-Dimensional Data
Solutions:
Dimensionality Reduction: Use techniques like Principal Component Analysis (PCA) or t-SNE before applying K-means.
[coeff, score] = pca(data);
reducedData = score(:,1:3); % Select top 3 components
[clusterIdx, clusterCenters] = kmeans(reducedData, k);
Feature Selection: Reduce irrelevant features by calculating feature importance scores.
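One simple unsupervised heuristic, sketched below, is to rank features by their variance on a common scale and keep only the most variable ones (retaining 10 features is an arbitrary choice):
scaled = normalize(data, 'range'); % scale every feature to [0, 1]
[~, order] = sort(var(scaled), 'descend'); % rank features by variance on the common scale
selected = scaled(:, order(1:min(10, end))); % keep up to the 10 most variable features
[clusterIdx, clusterCenters] = kmeans(selected, k);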
After understanding the advantages and limitations of K-means clustering, it’s clear that while the algorithm is useful, there are areas where it can be further improved. To address these challenges and enhance its performance, advanced techniques can be applied.
Learn the essential data concepts with upGrad's free course in Data Structures & Algorithms. Strengthen your foundation for implementing K-means clustering efficiently, especially for large and complex datasets.
Let’s explore some of these methods that can take K-means clustering to the next level in MATLAB.
K-means clustering is a popular algorithm for grouping data, but it struggles with high-dimensional, large, or irregular datasets. MATLAB offers advanced techniques to enhance K-means, making it effective for complex clustering scenarios.
Here's a look at these methods, their use cases, and practical MATLAB implementations.
Mini-Batch K-Means, ideal for large datasets, processes small random batches of data at each iteration instead of the entire dataset, making it faster and more memory-efficient.
Advantages:
Implementation in MATLAB:
MATLAB does not have a built-in Mini-Batch K-Means function, but it can be implemented using custom code. Here's a step-by-step process:
data = rand(10000, 3); % Simulated dataset with 10,000 points
batchSize = 100; % Define batch size
k = 5; % Number of clusters
maxIterations = 50;
% Initialize centroids randomly
centroids = data(randperm(size(data, 1), k), :);
for iter = 1:maxIterations
    batchIndices = randperm(size(data, 1), batchSize); % Select random batch
    batch = data(batchIndices, :);
    % Assign data points in batch to nearest centroid
    [~, clusterIdx] = pdist2(centroids, batch, 'euclidean', 'Smallest', 1);
    % Update centroids
    for i = 1:k
        pointsInCluster = batch(clusterIdx == i, :);
        if ~isempty(pointsInCluster)
            centroids(i, :) = (1 - 0.1) * centroids(i, :) + 0.1 * mean(pointsInCluster, 1);
        end
    end
end
Result: The centroids are updated using only a small random batch of points per iteration, which keeps memory use and run time low on large datasets.
While Mini-Batch K-Means offers an efficient solution for large datasets, another powerful optimization technique is parallel processing. By distributing the computational load across multiple cores or GPUs, you can further enhance the speed and scalability of K-means clustering.
K-means clustering can be computationally intensive for large datasets or high-dimensional data. MATLAB’s parallel computing tools allow the workload to be distributed across multiple CPU cores or GPUs, significantly speeding up computation.
Advantages:
Implementation in MATLAB:
MATLAB supports parallel execution through the parpool and parfor commands, and the built-in kmeans function can run its replicates in parallel when passed a statset options structure with 'UseParallel' set to true.
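A minimal sketch, assuming the Parallel Computing Toolbox is installed: open a pool of workers and pass an options structure so that kmeans distributes its replicates across them:
parpool; % start a pool of local workers (skip if one is already running)
opts = statset('UseParallel', true); % allow kmeans to use the pool
[clusterIdx, clusterCenters] = kmeans(data, k, 'Replicates', 20, 'Options', opts);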
High-dimensional data often complicates K-means clustering due to the "curse of dimensionality," where distances between points become less meaningful. Dimensionality reduction techniques, including PCA or t-SNE, can simplify the data while preserving its essential structure.
Advantages:
Implementation in MATLAB:
data = rand(500, 50); % Simulated high-dimensional dataset (500 points, 50 features)
[coeff, score, ~] = pca(data); % Perform PCA
reducedData = score(:, 1:3); % Retain top 3 principal components
k = 5; % Number of clusters
[clusterIdx, centroids] = kmeans(reducedData, k);
scatter3(reducedData(:,1), reducedData(:,2), reducedData(:,3), 10, clusterIdx, 'filled');
title('Clusters After PCA Dimensionality Reduction');
xlabel('PC1'); ylabel('PC2'); zlabel('PC3');
Result: Clustering the three retained principal components is faster and easier to visualize than clustering the original 50-dimensional data.
While dimensionality reduction addresses high-dimensional data, K-means struggles with non-spherical clusters. For such cases, alternative methods like Gaussian Mixture Models (GMM) can be used to capture complex cluster shapes and relationships better.
Traditional K-means assumes clusters are spherical, which may not hold true for all datasets. Alternatives available in MATLAB, such as Gaussian Mixture Models (GMM), can better handle non-spherical cluster shapes.
Advantages:
Implementation Using MATLAB’s GMM:
data = rand(500, 3); % Simulated dataset
k = 3; % Number of clusters
% Fit a GMM model
gm = fitgmdist(data, k);
% Assign clusters based on posterior probabilities
clusterIdx = cluster(gm, data);
scatter3(data(:,1), data(:,2), data(:,3), 10, clusterIdx, 'filled');
title('Non-Spherical Clusters Using GMM');
xlabel('Feature 1'); ylabel('Feature 2'); zlabel('Feature 3');
Result: Each point is assigned to the Gaussian component with the highest posterior probability, which allows elongated or overlapping clusters to be captured.
From Mini-Batch K-Means for large-scale data to dimensionality reduction and parallel processing, these methods address specific challenges, making clustering more accurate and efficient.
While learning K-Means clustering in MATLAB is key, truly understanding it requires ongoing learning and practice. With upGrad, you’ll get expert guidance and hands-on projects to build confidence and take your skills to the next level.
To learn K-Means clustering in MATLAB and advance your data science career in 2025, it’s essential to understand machine learning algorithms, data preprocessing, and model evaluation.
upGrad’s specialized courses offer in-depth expertise in K-Means clustering and other essential techniques, empowering you to analyze and interpret data effectively.
Struggling to analyze complex datasets or master tools like K-Means clustering? Speak with upGrad counselors or visit your nearest Career Center to find the right program to get you started. Build confidence in data analysis and take the next big step in your tech career.