K-Nearest Neighbors Clearly Explained
What is K-Nearest Neighbors?
K-Nearest Neighbors (KNN) is a Supervised Learning method. It’s
quite similar to the K-Means Clustering we saw earlier, which is an
Unsupervised Learning method. We use KNN when we already have a
labeled dataset and we’re trying to predict the labels for a given
set of unlabelled data points. It can be used for both classification
and regression.

Suppose we had a dataset with labeled fruits, and a new fruit came in
that we wanted to classify. As the name suggests, we would look at its
nearest neighbors. How many neighbors do we look at? That’s the K that
we decide beforehand. Here let’s say our K is 4.

That new fruit would have 3 Apples and 1 Orange as its nearest
neighbors. We would then classify it based on the majority class,
which is Apple.

[Figure: fruits plotted by type (Apples, Pineapples, Oranges), with the new fruit’s 4 nearest neighbors highlighted]

How does KNN work?
We saw earlier what KNN is. So how does it work? How does it find the
nearest neighbors? How do we decide on the value of K? Let’s take a look.

We’ll quantify our initial fruits example. Given below is a table with the
fruits’ weight and color intensity score along with their label.

Fruit  Weight (kg)  Color Score  Type
F1     0.15         0.8          Apple
F2     0.17         0.6          Orange
F3     0.14         0.9          Apple
F4     0.18         0.7          Orange
F5     0.35         0.8          Pineapple
F6     0.3          0.65         Pineapple
F7     0.145        0.85         Apple

[Scatter plot: fruits F1–F7 plotted with Color Score on the x-axis and Weight on the y-axis]

How does KNN work?
We have our data set. Now suppose we get a new fruit that we need
to classify:

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         ?

[Scatter plot: the new fruit F8 plotted among F1–F7 (Color Score vs. Weight)]

Now, in order to classify our new fruit, F8, using KNN, we have to
execute the following steps:

Step 1: Choose the number of neighbors (k).
Step 2: Measure the distance between data points (e.g., Euclidean distance).
Step 3: Identify the k nearest neighbors.
Step 4: Make a prediction (majority class in classification).
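As a minimal sketch of these four steps in plain Python (the function name and data layout here are illustrative, not from the original):

```python
import math
from collections import Counter

def knn_classify(new_point, labeled_points, k=3):
    """Predict a label for new_point from (features, label) pairs."""
    # Step 2: Euclidean distance from new_point to every labeled point
    distances = [
        (math.dist(new_point, features), label)
        for features, label in labeled_points
    ]
    # Step 3: keep the k neighbors with the smallest distances
    neighbors = sorted(distances)[:k]
    # Step 4: majority vote among the neighbors' labels
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]
```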

linkedin.com/in/vikrantkumar95
Choosing K & Calculating Distance
Step 1: Choose the number of neighbors (k)

Let’s arbitrarily take k = 3 for this example.

Step 2: Measure the distance between data points

We need to first calculate the distances between the point we’re
trying to predict and all the other labeled points. There are a number
of ways to calculate the distance, but we’ll use the most common one:
Euclidean Distance.

[Scatter plot: F8 and the labeled fruits F1–F7 (Color Score vs. Weight)]

The formula for Euclidean Distance between two points (x₁, y₁) and (x₂, y₂) is:

d = √((x₂ − x₁)² + (y₂ − y₁)²)
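The same formula as a small Python helper (the function name is ours for illustration; the standard library’s math.dist computes the same thing):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(round(euclidean((0.27, 0.75), (0.15, 0.8)), 3))  # 0.13
```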

Calculating Distances
We need to calculate the distance of our new fruit, F8, from all the
other fruits in the labelled dataset.

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         ?

Let’s calculate the distance from F1 (0.15, 0.8) as an example:

d(F8, F1) = √((0.27 − 0.15)² + (0.75 − 0.8)²) = √(0.0144 + 0.0025) = √0.0169 ≈ 0.130

Similarly we calculate the distances of F8 from all the other fruits:

Fruit  Weight (kg)  Color Score  Type       Distance from F8
F1     0.15         0.8          Apple      0.130
F2     0.17         0.6          Orange     0.180
F3     0.14         0.9          Apple      0.198
F4     0.18         0.7          Orange     0.103
F5     0.35         0.8          Pineapple  0.094
F6     0.3          0.65         Pineapple  0.104
F7     0.145        0.85         Apple      0.160
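A short sketch reproducing this table, with the fruit data hard-coded from the tables above:

```python
import math

fruits = {
    "F1": ((0.15, 0.8), "Apple"),
    "F2": ((0.17, 0.6), "Orange"),
    "F3": ((0.14, 0.9), "Apple"),
    "F4": ((0.18, 0.7), "Orange"),
    "F5": ((0.35, 0.8), "Pineapple"),
    "F6": ((0.3, 0.65), "Pineapple"),
    "F7": ((0.145, 0.85), "Apple"),
}
f8 = (0.27, 0.75)

# Distance of F8 from every labeled fruit
for name, (features, label) in fruits.items():
    print(name, label, round(math.dist(f8, features), 3))
```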

Identify Nearest Neighbors
Step 3: Identify the k nearest neighbors.

We’ve calculated the distances. Now we need to identify the k nearest
neighbors. Since we assumed k to be equal to 3, we need to find the 3
neighbors with the least distances.

We’ll order the dataset by increasing distance from F8 and create a
cutoff at 3.

Fruit  Weight (kg)  Color Score  Type       Distance from F8
F5     0.35         0.8          Pineapple  0.094
F4     0.18         0.7          Orange     0.103
F6     0.3          0.65         Pineapple  0.104
F1     0.15         0.8          Apple      0.130
F7     0.145        0.85         Apple      0.160
F2     0.17         0.6          Orange     0.180
F3     0.14         0.9          Apple      0.198

Looking at the distances, the 3 nearest neighbors are: F5, F4, and F6.
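In code, this step is just a sort and a slice (distances carried over from the previous step):

```python
# (distance, fruit, label) tuples from the previous step
distances = [
    (0.130, "F1", "Apple"), (0.180, "F2", "Orange"), (0.198, "F3", "Apple"),
    (0.103, "F4", "Orange"), (0.094, "F5", "Pineapple"), (0.104, "F6", "Pineapple"),
    (0.160, "F7", "Apple"),
]

k = 3
nearest = sorted(distances)[:k]  # smallest distances first
print([fruit for _, fruit, _ in nearest])  # ['F5', 'F4', 'F6']
```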

Now let’s see next how we classify our unknown fruit based on the
nearest neighbors we’ve identified.

Make a Prediction

To classify / predict which type of fruit F8 is, based on the nearest
neighbors we identified, we simply take the majority class.

In this case, out of the 3 nearest neighbors, 2 were Pineapple and 1
was Orange. Hence, we’ll classify our new point F8 as a Pineapple!
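The majority vote is a one-liner with the standard library’s collections.Counter:

```python
from collections import Counter

neighbor_labels = ["Pineapple", "Orange", "Pineapple"]  # labels of F5, F4, F6
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # Pineapple
```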

[Scatter plot: F8 shown alongside its 3 nearest neighbors F5, F4, and F6]

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         Pineapple

Now that we’ve seen how KNN works, there’s still a major question
that you might have: How do we decide on the value of K? Let’s take
a look in the next section!

How to Choose the value of K?
First let’s recap: What is K?

K is the number of nearest neighbors the algorithm considers when making predictions.
The value of K directly affects the accuracy and performance of the model.

What are the Pros and Cons of having a very high or a very low K?

Low K (e.g. K = 1)
  Pros: Captures small patterns in the data.
  Cons: Very sensitive to noise, leading to overfitting.

High K (e.g. K = 40)
  Pros: More stable predictions, less sensitive to noise.
  Cons: May oversmooth, ignoring small but important patterns.

How to Find the Optimal K?

Cross-Validation:
Use cross-validation to test different values of K and choose the one
with the best performance (see the sketch below).

Avoid Too Small or Too Large K:
Too Small K: High variance, overfits to the training data.
Too Large K: High bias, oversmooths the data.

Rule of thumb:
As a general rule of thumb, the square root of n (the total number of
data points) is a good place to start.
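A minimal sketch of K selection with scikit-learn’s cross-validation. The iris dataset stands in for ours here, since cross-validation needs more samples per fold than our 7 fruits provide:

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

# Rule-of-thumb starting point: sqrt(n); search the values around it.
start = round(math.sqrt(len(X)))
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 2 * start)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```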
Let’s Summarise

Let’s summarise what KNN is:

K-Nearest Neighbors (KNN) is a simple and versatile supervised
learning algorithm used for classification and regression.
It works by identifying the K closest data points (neighbors) and
predicting based on their majority vote (classification) or average
value (regression).
Key Components:
Distance metrics (e.g., Euclidean, Manhattan).
The value of K (number of neighbors to consider).
Feature scaling (important for accurate distance calculations).
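For contrast with the Euclidean metric we used throughout, here is a Manhattan distance helper (the function name is ours, for illustration):

```python
def manhattan(p, q):
    """L1 distance: the sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# F8 vs F1: |0.27 - 0.15| + |0.75 - 0.8| = 0.12 + 0.05
print(round(manhattan((0.27, 0.75), (0.15, 0.8)), 3))  # 0.17
```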

Tips for Using KNN:

Scale Your Data: Use normalization or standardization to ensure
features contribute equally to distance calculations (see the pipeline
sketch after this list).
Choose the Right K: Use cross-validation to find the optimal K value
for your dataset.
Eliminate Irrelevant Features: Use feature selection or dimensionality
reduction to remove noise and improve performance.
Monitor Class Imbalance: If your dataset is imbalanced, use
techniques like weighted voting or synthetic data generation to
balance the classes.
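Putting the scaling tip together with our fruit data, a scikit-learn pipeline that standardizes features before the distance computation (a sketch; the 7-point dataset is tiny, so treat the fitted model as illustrative only):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[0.15, 0.8], [0.17, 0.6], [0.14, 0.9], [0.18, 0.7],
     [0.35, 0.8], [0.3, 0.65], [0.145, 0.85]]  # weight (kg), color score
y = ["Apple", "Orange", "Apple", "Orange", "Pineapple", "Pineapple", "Apple"]

# StandardScaler keeps weight and color score on comparable scales,
# so neither feature dominates the Euclidean distance.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[0.27, 0.75]]))  # ['Pineapple'], matching our walkthrough
```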

Enjoyed reading?
Follow for everything Data and AI!

linkedin.com/in/vikrantkumar95
