K-NN (KNearestNeighbors)


 

K-Nearest Neighbors

K-nearest neighbors (K-NN) is a supervised machine learning algorithm that can be used to solve both classification and regression tasks. K-NN works similarly with the value of a data point is determined by the data points around it.

  • KNN classifies data points based on the majority of the other points that are closest to the point in question.
  • However, in order to use the algorithm, there was a need to input the number of neighbors as a parameter.
  • In the case of underfitting and overfitting, the number of neighbors do play an important role. This is because the number of neighbours determine how likely the model will overfit. The higher the number of neighbours, the less likely the model will overfit. If the number of neighbors is too high, it will be likely for the model to underfit.

let's understand through graph:- 

 


  • Graph shows a random distribution of a data for two separate values red and blue.
  • Now suppose you have entered a value of yellow data type (you can see in the graph a yellow dot).
  • Looking at the data, would you say that the unknown( yellow dot) input belong to the red data or blue data. 
  • Well you can see that yellow dot is very much closer to the red dot so, most likey this unknown input represents a red data. So, this is the problem where K-NN algorithm is applied.
  • Fig(b) shows the concept how K-NN decides or you can say predict output.
  • In fig (b) by taking the default value of K-NN i.e; 5 you can predict by seeing the graph that unknown input belongs to red data. 
  • This method of determining a group by its distance to other, already known data points, is called the K-nearest neighbours.  

 

K-NN algorithm assumes the similarity between the new case/data and available cases and put the new case into the category that is most similar to the available categories.

K-NN algorithm at the training phase just stores the dataset and when it gets new data, then it classifies that data into a category that is much similar to the new data. 

Here is an example of how K-NN is applied on dataset in Python3.



Comments