
K-Nearest Neighbors Algorithm in Machine Learning [With Examples]

Last updated: 27th Oct, 2020

Introduction

Machine learning is one of the most powerful and widely adopted technologies in today’s data-driven world, where more data is collected every second. It is also one of the fastest-growing technologies, with use cases and projects in nearly every domain and sector.

Machine learning, or model development, is one of the phases of a data science project life cycle, and arguably one of the most important. This article is an introduction to K-Nearest Neighbors (KNN) in machine learning.


K-Nearest Neighbors

If you’re familiar with machine learning or have been part of a data science or AI team, you’ve probably heard of the k-Nearest Neighbors algorithm, or simply KNN. It is one of the go-to algorithms in machine learning because it is easy to implement, non-parametric, and a lazy learner: it has essentially no training phase, since all computation is deferred to prediction time.

Another advantage of the k-Nearest Neighbors algorithm is that it can be used for both classification and regression problems. The main difference between the two is that the output variable in regression is numerical (continuous), while in classification it is categorical (discrete).
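To make the distinction concrete, here is a minimal sketch of the only step that differs between the two settings: how the outputs of the nearest neighbors are combined. The function names and the sample values are illustrative assumptions, not part of any library.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(neighbor_labels):
    """Majority vote: return the most frequent label among the neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]

def aggregate_regression(neighbor_targets):
    """Average: return the mean of the neighbors' numeric targets."""
    return mean(neighbor_targets)

# Hypothetical outputs of the 3 nearest neighbors of one query point
print(aggregate_classification(["spam", "ham", "spam"]))  # spam
print(aggregate_regression([10.0, 12.0, 14.0]))           # 12.0
```

In both cases the neighbors are found in exactly the same way; only the final aggregation changes.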


How does k-Nearest Neighbors work?

The K-nearest neighbors (KNN) algorithm uses ‘feature similarity’, or ‘nearest neighbors’, to predict the class that a new data point falls into. The steps below show how the algorithm works:


Step 1: To implement any machine learning algorithm, we need a cleaned dataset ready for modelling. Let’s assume that we already have a cleaned dataset that has been split into training and testing sets.

Step 2: With the datasets ready, we choose the value of K (a positive integer), which tells us how many nearest data points to take into consideration. How to determine the value of K is covered later in the article.

Step 3: This step is iterative and is applied to each data point in the test set.

I. Calculate the distance between the test point and each row of the training data using one of the following distance metrics:

a. Euclidean distance

b. Manhattan distance

c. Minkowski distance

d. Hamming distance

Many data scientists tend to use the Euclidean distance; each metric is described later in this article.

II. Sort the training rows by the distance computed in the previous step, in ascending order.

III. Choose the top K rows of the sorted data.

IV. Assign a class to the test point based on the most frequent class among these K rows.

Step 4: End
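The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming numeric features, Euclidean distance, and a tiny made-up training set; the function and variable names are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Step 3.I: distance from the query point to every training row
    distances = [(euclidean(row, query), label)
                 for row, label in zip(train_X, train_y)]
    # Step 3.II: sort by distance, nearest first
    distances.sort(key=lambda pair: pair[0])
    # Step 3.III: keep the top K rows
    top_k = distances[:k]
    # Step 3.IV: return the most frequent class among those rows
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]

# Made-up training data: two well-separated groups
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
           (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # A
print(knn_predict(train_X, train_y, (6.0, 7.0), k=3))  # B
```

Note that there is no separate training function: the "model" is simply the stored training data, which is what makes KNN a lazy learner.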

How to determine the K value? 

We need to select an appropriate value of K to achieve the maximum accuracy of the model, but there is no pre-defined statistical method for finding the most favorable value. A commonly used heuristic is the elbow method.

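The elbow method starts by computing the Sum of Squared Errors (SSE) for several values of k. The SSE is the sum of the squared distances between each member of a cluster and the cluster’s centroid. Note that this definition comes from k-means clustering, where k is the number of clusters. KNN has no clusters or centroids, so for KNN the same idea is applied by plotting an error measure, such as the error rate on held-out data, against K.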

$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in c_i} \operatorname{dist}(x, c_i)^2$$
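As a worked example of the SSE as defined above, the sketch below computes it for a hand-made grouping of points. It assumes squared Euclidean distance and takes each centroid to be the mean of its cluster; the helper names and the data are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(clusters):
    """Sum, over all clusters, of squared distances to the cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(squared_distance(p, c) for p in points)
    return total

clusters = [
    [(0.0, 0.0), (2.0, 0.0)],  # centroid (1, 0): contributes 1 + 1
    [(5.0, 5.0), (5.0, 7.0)],  # centroid (5, 6): contributes 1 + 1
]
print(sse(clusters))  # 4.0
```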

If we plot the SSE against different values of k, we can see that the error decreases as k gets larger. This happens because, as the number of clusters increases, the clusters become smaller, so the distortion also becomes smaller. The idea of the elbow method is to choose the k at which the decrease in SSE slows down sharply, giving the curve the shape of an elbow.

In some cases, there is more than one elbow, or no elbow at all. In such cases, we usually pick the best k by evaluating how well the algorithm performs for each candidate value in the context of the problem we are trying to solve.
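For KNN, one way to evaluate candidate values of K is to measure the prediction error on data the model has not seen. The sketch below uses leave-one-out validation on a tiny made-up dataset; this is one possible evaluation scheme chosen for illustration, and the function names are hypothetical. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_error(X, y, k):
    """Fraction of points misclassified when each is predicted from the rest."""
    errors = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            errors += 1
    return errors / len(X)

# Made-up data: three points per class
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
     (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
y = ["A", "A", "A", "B", "B", "B"]

for k in (1, 3, 5):
    print(k, leave_one_out_error(X, y, k))
# 1 0.0
# 3 0.0
# 5 1.0
```

With K = 5, each held-out point is outvoted by the three points of the other class, which shows how a K that is too large for the dataset can hurt accuracy. Plotting such an error against K gives the curve on which an elbow, or simply a minimum, can be sought.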


Types of Distance Metric

Let’s look at the different distance metrics used to calculate the distance between two data points, one by one.

1. Euclidean distance – The square root of the sum of the squared differences between the coordinates of two points.

2. Manhattan distance – The sum of the absolute differences between the coordinates of two points.

3. Minkowski distance – A generalization of the two metrics above, defined as $d(x, y) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$. It reduces to Manhattan distance when p = 1 and to Euclidean distance when p = 2.

4. Hamming distance – Used for categorical variables. It counts the number of positions at which two data points have different values, so for a single categorical variable it tells us whether the two values are the same or not.
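The four metrics can be written as short Python functions. This is a minimal sketch for equal-length tuples; the function names are illustrative, and Manhattan and Euclidean distance are expressed through Minkowski distance to show the relationship described above.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def manhattan(a, b):
    """Minkowski distance with p = 1."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Minkowski distance with p = 2."""
    return minkowski(a, b, 2)

def hamming(a, b):
    """Number of positions at which two categorical points differ."""
    return sum(x != y for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(manhattan(p, q))  # 7.0
print(euclidean(p, q))  # 5.0
print(hamming(("red", "small", "round"), ("red", "large", "round")))  # 1
```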

Applications of KNN

  1. Predicting a new customer’s credit rating based on existing customers’ credit usage and ratings.
  2. Deciding whether or not to sanction a loan to an applicant.
  3. Classifying whether a given transaction is fraudulent.
  4. Recommendation systems (YouTube, Netflix).
  5. Handwriting detection (like OCR).
  6. Image recognition.
  7. Video recognition.

Pros and Cons of KNN

Machine learning comprises many algorithms, each with its own advantages and disadvantages. Depending on the industry, the domain, the type of data, and the evaluation metrics that matter, a data scientist should choose the algorithm that best fits and answers the business problem. Let us look at a few pros and cons of K-Nearest Neighbors.


Pros

  1. Easy to use, understand and interpret.
  2. No explicit training phase, so the model is quick to set up.
  3. No assumptions about the underlying data distribution.
  4. Can give high prediction accuracy when K and the distance metric suit the data.
  5. Versatile – can be used for both classification and regression business problems.
  6. Can be used for multi-class problems as well.
  7. Few hyperparameters to tune: mainly K, along with the choice of distance metric.

Cons

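  1. Computationally expensive at prediction time and memory-hungry, as the algorithm stores all the training data.
  2. The algorithm gets slower as the number of variables increases.
  3. Very sensitive to irrelevant features.
  4. Suffers from the curse of dimensionality: distances become less informative as the number of features grows.
  5. Choosing the optimal value of K is not straightforward.
  6. Class-imbalanced datasets cause problems, as the majority class tends to dominate the vote.
  7. Missing values in the data also cause problems, since distances cannot be computed directly.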


Conclusion

KNN is a fundamental machine learning algorithm, popular for its ease of use and the absence of a training phase. It is a decent algorithm to pick if you are new to machine learning and would like to complete a given task without much hassle.

If you’re interested in learning more about machine learning, check out IIIT-B & upGrad’s PG Diploma in Machine Learning & AI, which is designed for working professionals and offers 450+ hours of rigorous training, 30+ case studies & assignments, IIIT-B Alumni status, 5+ practical hands-on capstone projects & job assistance with top firms.


Pavan Vadapalli

Blog Author
Director of Engineering @ upGrad. Motivated to leverage technology to solve problems. Seasoned leader for startups and fast moving orgs. Working on solving problems of scale and long term technology strategy.

Frequently Asked Questions (FAQs)

1. Is the K-Nearest Neighbors algorithm expensive?

For enormous datasets, the K-Nearest Neighbors algorithm can be expensive in terms of both computing time and storage, because it has to store the entire training dataset to work. KNN is also highly sensitive to the scale of the features in the training data, since it depends on calculating distances. On the other hand, the algorithm makes no assumptions about the training data, which is not generally the case for other supervised learning algorithms, and it is considered highly effective on problems with non-linear data.
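The sensitivity to scale can be seen in a small sketch. It assumes min-max scaling as the remedy, which is one common choice among several, and uses made-up customer records of (age in years, annual income); the function names are hypothetical. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def nearest_to_first(rows):
    """Index of the row closest to rows[0] under Euclidean distance."""
    others = range(1, len(rows))
    return min(others, key=lambda i: math.dist(rows[0], rows[i]))

# Made-up customers: (age in years, annual income)
rows = [(25, 30000), (60, 31000), (27, 36000), (40, 90000)]

print(nearest_to_first(rows))                 # 1
print(nearest_to_first(min_max_scale(rows)))  # 2
```

On the raw data, income dominates the distance, so the 25-year-old is matched with the 60-year-old who earns almost the same. After scaling, both features contribute comparably and the nearest neighbor becomes the 27-year-old.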

2. What are some of the practical applications of the K-NN algorithm?

The KNN algorithm is often used by businesses to recommend products to individuals who share common interests. For instance, companies can suggest TV shows based on viewer choices, apparel designs based on previous purchases, and hotel and accommodation options based on booking history. It can also be employed by financial institutions to assign credit ratings to customers based on similar financial features. Banks can base loan disbursal decisions on whether an application shares characteristics with those of past defaulters. Advanced applications of this algorithm include image recognition, handwriting detection using OCR, and video recognition.

3. What does the future look like for machine learning engineers?

With further advancements in AI and machine learning, the demand for machine learning engineers looks very promising. By the latter half of 2021, there were around 23,000 jobs listed on LinkedIn for machine learning engineers. Large global organizations, from Amazon and Google to PayPal, Autodesk, Morgan Stanley, Accenture, and others, are always scouting for top talent. With strong fundamentals in subjects like programming, statistics, and machine learning, engineers can also assume leadership roles in data analytics, automation, AI integration, and other areas.
