K-Nearest Neighbors Clearly Explained
What is K-Nearest Neighbors?
K-Nearest Neighbors (KNN) is a Supervised Learning method. It’s
quite similar to the K-Means Clustering we saw earlier, which is an
Unsupervised Learning method. We use KNN when we already have a
labeled dataset and we’re trying to predict the labels for a given
set of unlabelled data points. It can be used for both classification
and regression.

Suppose we had a dataset with labeled fruits, and a new fruit came in
that we wanted to classify. As the name suggests, we would look at its
nearest neighbors. How many neighbors do we look at? That’s the K that
we decide beforehand. Here let’s say our K is 4.

That new fruit would have 3 Apples and 1 Orange as its nearest
neighbors. We would then classify it based on the majority class,
which is Apple.

[Figure: fruits plotted by type (Apples, Pineapples, Oranges), with the new fruit’s 4 nearest neighbors highlighted]

How does KNN work?
We saw earlier what KNN is. So how does it work? How does it find the
nearest neighbors? How do we decide on the value of K? Let’s take a look.

We’ll quantify our initial fruits example. Given below is a table with the
fruits’ weight and color intensity score along with their label.

Fruit  Weight (kg)  Color Score  Type
F1     0.15         0.8          Apple
F2     0.17         0.6          Orange
F3     0.14         0.9          Apple
F4     0.18         0.7          Orange
F5     0.35         0.8          Pineapple
F6     0.3          0.65         Pineapple
F7     0.145        0.85         Apple

[Scatter plot: fruits F1–F7 plotted with Color Score on the x-axis and Weight on the y-axis]

How does KNN work?
We have our data set. Now suppose we get a new fruit that we need
to classify:

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         ?

[Scatter plot: the new fruit F8 plotted among F1–F7 (Color Score vs. Weight)]

Now, in order to classify our new fruit, F8, using KNN, we have to
execute the following steps:

Step 1: Choose the number of neighbors (k).
Step 2: Measure the distance between data points (e.g., Euclidean distance).
Step 3: Identify the k nearest neighbors.
Step 4: Make a prediction (majority class in classification).
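As a minimal sketch of these four steps in plain Python (the function name and data layout here are illustrative, not from the original):

```python
import math
from collections import Counter

def knn_classify(new_point, labeled_points, k=3):
    """Predict a label for new_point from (features, label) pairs."""
    # Step 2: Euclidean distance from new_point to every labeled point
    distances = [
        (math.dist(new_point, features), label)
        for features, label in labeled_points
    ]
    # Step 3: keep the k neighbors with the smallest distances
    neighbors = sorted(distances)[:k]
    # Step 4: majority vote among the neighbors' labels
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]
```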

linkedin.com/in/vikrantkumar95
Choosing K & Calculating Distance
Step 1: Choose the number of neighbors (k)

Let’s arbitrarily take k = 3 for this example.

Step 2: Measure the distance between data points

We need to first calculate the distances between the point we’re
trying to predict and all the other labeled points. There are a number
of ways to calculate the distance, but we’ll use the most common one:
Euclidean Distance.

[Scatter plot: F8 and the labeled fruits F1–F7 (Color Score vs. Weight)]

The formula for Euclidean Distance between two points (x₁, y₁) and (x₂, y₂) is:

d = √((x₂ − x₁)² + (y₂ − y₁)²)
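The same formula as a small Python helper (the function name is ours for illustration; the standard library’s math.dist computes the same thing):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(round(euclidean((0.27, 0.75), (0.15, 0.8)), 3))  # 0.13
```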

Calculating Distances
We need to calculate the distance of our new fruit, F8, from all the
other fruits in the labelled dataset.

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         ?

Let’s calculate the distance from F1 (0.15, 0.8) as an example:

d(F8, F1) = √((0.27 − 0.15)² + (0.75 − 0.8)²) = √(0.0144 + 0.0025) = √0.0169 ≈ 0.130

Similarly we calculate the distances of F8 from all the other fruits:

Fruit  Weight (kg)  Color Score  Type       Distance from F8
F1     0.15         0.8          Apple      0.130
F2     0.17         0.6          Orange     0.180
F3     0.14         0.9          Apple      0.198
F4     0.18         0.7          Orange     0.103
F5     0.35         0.8          Pineapple  0.094
F6     0.3          0.65         Pineapple  0.104
F7     0.145        0.85         Apple      0.160
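A short sketch reproducing this table, with the fruit data hard-coded from the tables above:

```python
import math

fruits = {
    "F1": ((0.15, 0.8), "Apple"),
    "F2": ((0.17, 0.6), "Orange"),
    "F3": ((0.14, 0.9), "Apple"),
    "F4": ((0.18, 0.7), "Orange"),
    "F5": ((0.35, 0.8), "Pineapple"),
    "F6": ((0.3, 0.65), "Pineapple"),
    "F7": ((0.145, 0.85), "Apple"),
}
f8 = (0.27, 0.75)

# Distance of F8 from every labeled fruit
for name, (features, label) in fruits.items():
    print(name, label, round(math.dist(f8, features), 3))
```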

Identify Nearest Neighbors
Step 3: Identify the k nearest neighbors.

We’ve calculated the distances. Now we need to identify the k nearest
neighbors. Since we assumed k to be equal to 3, we need to find the 3
neighbors with the least distances.

We’ll order the dataset by increasing distance from F8 and create a
cutoff at 3.

Fruit  Weight (kg)  Color Score  Type       Distance from F8
F5     0.35         0.8          Pineapple  0.094
F4     0.18         0.7          Orange     0.103
F6     0.3          0.65         Pineapple  0.104
F1     0.15         0.8          Apple      0.130
F7     0.145        0.85         Apple      0.160
F2     0.17         0.6          Orange     0.180
F3     0.14         0.9          Apple      0.198

Looking at the distances, the 3 nearest neighbors are: F5, F4, and F6.
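In code, this step is just a sort and a slice (distances carried over from the previous step):

```python
# (distance, fruit, label) tuples from the previous step
distances = [
    (0.130, "F1", "Apple"), (0.180, "F2", "Orange"), (0.198, "F3", "Apple"),
    (0.103, "F4", "Orange"), (0.094, "F5", "Pineapple"), (0.104, "F6", "Pineapple"),
    (0.160, "F7", "Apple"),
]

k = 3
nearest = sorted(distances)[:k]  # smallest distances first
print([fruit for _, fruit, _ in nearest])  # ['F5', 'F4', 'F6']
```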

Now let’s see next how we classify our unknown fruit based on the
nearest neighbors we’ve identified.

Make a Prediction

To classify / predict which type of fruit F8 is, based on the nearest
neighbors we identified, we simply take the majority class.

In this case, out of the 3 nearest neighbors, 2 were Pineapple and 1
was Orange. Hence, we’ll classify our new point F8 as a Pineapple!
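The majority vote is a one-liner with the standard library’s collections.Counter:

```python
from collections import Counter

neighbor_labels = ["Pineapple", "Orange", "Pineapple"]  # labels of F5, F4, F6
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # Pineapple
```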

[Scatter plot: F8 shown alongside its 3 nearest neighbors F5, F4, and F6]

Fruit  Weight (kg)  Color Score  Type
F8     0.27         0.75         Pineapple

Now that we’ve seen how KNN works, there’s still a major question
that you might have: How do we decide on the value of K? Let’s take
a look in the next section!

How to Choose the value of K?
First let’s recap: What is K?

K is the number of nearest neighbors the algorithm considers when making predictions.
The value of K directly affects the accuracy and performance of the model.

What are the Pros and Cons of having a very high or a very low K?

Low K (e.g. K = 1)
  Pros: Captures small patterns in the data.
  Cons: Very sensitive to noise, leading to overfitting.

High K (e.g. K = 40)
  Pros: More stable predictions, less sensitive to noise.
  Cons: May oversmooth, ignoring small but important patterns.

How to Find the Optimal K?

Cross-Validation:
Use cross-validation to test different values of K and choose the one
with the best performance (see the sketch below).

Avoid Too Small or Too Large K:
Too Small K: High variance, overfits to the training data.
Too Large K: High bias, oversmooths the data.

Rule of thumb:
As a general rule of thumb, the square root of n (the total number of
data points) is a good place to start.
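A minimal sketch of K selection with scikit-learn’s cross-validation. The iris dataset stands in for ours here, since cross-validation needs more samples per fold than our 7 fruits provide:

```python
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

# Rule-of-thumb starting point: sqrt(n); search the values around it.
start = round(math.sqrt(len(X)))
scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in range(1, 2 * start)
}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```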
Let’s Summarise

Let’s summarise what KNN is:

K-Nearest Neighbors (KNN) is a simple and versatile supervised
learning algorithm used for classification and regression.
It works by identifying the K closest data points (neighbors) and
predicting based on their majority vote (classification) or average
value (regression).
Key Components:
Distance metrics (e.g., Euclidean, Manhattan).
The value of K (number of neighbors to consider).
Feature scaling (important for accurate distance calculations).
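For contrast with the Euclidean metric we used throughout, here is a Manhattan distance helper (the function name is ours, for illustration):

```python
def manhattan(p, q):
    """L1 distance: the sum of absolute coordinate differences."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# F8 vs F1: |0.27 - 0.15| + |0.75 - 0.8| = 0.12 + 0.05
print(round(manhattan((0.27, 0.75), (0.15, 0.8)), 3))  # 0.17
```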

Tips for Using KNN:

Scale Your Data: Use normalization or standardization to ensure
features contribute equally to distance calculations (see the pipeline
sketch after this list).
Choose the Right K: Use cross-validation to find the optimal K value
for your dataset.
Eliminate Irrelevant Features: Use feature selection or dimensionality
reduction to remove noise and improve performance.
Monitor Class Imbalance: If your dataset is imbalanced, use
techniques like weighted voting or synthetic data generation to
balance the classes.
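Putting the scaling tip together with our fruit data, a scikit-learn pipeline that standardizes features before the distance computation (a sketch; the 7-point dataset is tiny, so treat the fitted model as illustrative only):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = [[0.15, 0.8], [0.17, 0.6], [0.14, 0.9], [0.18, 0.7],
     [0.35, 0.8], [0.3, 0.65], [0.145, 0.85]]  # weight (kg), color score
y = ["Apple", "Orange", "Apple", "Orange", "Pineapple", "Pineapple", "Apple"]

# StandardScaler keeps weight and color score on comparable scales,
# so neither feature dominates the Euclidean distance.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[0.27, 0.75]]))  # ['Pineapple'], matching our walkthrough
```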

Enjoyed reading?
Follow for everything Data and AI!

linkedin.com/in/vikrantkumar95
