K-Nearest Neighbors (KNN) is a simple, non-parametric algorithm used for both classification and regression tasks. It makes predictions for a new data point based on the K closest data points (neighbors).
Not to be confused with K-means clustering: KNN is a supervised algorithm for classification and regression that finds the nearest neighbors of a query point, while K-means is an unsupervised clustering algorithm that finds cluster centroids.

How It Works

  1. Choose the number K of neighbors
  2. Calculate the distance between the query instance and all training samples
  3. Sort the distances and determine the K nearest neighbors
  4. For classification:
    • Aggregate the class labels of K neighbors
    • Return the majority vote as the prediction
  5. For regression:
    • Aggregate the values of K neighbors
    • Return the mean value as the prediction

Because KNN does not build a model during a training phase, it is called a lazy learner: it simply memorizes the entire training dataset and defers all computation to prediction time.
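As a concrete illustration of the steps above, here is a minimal from-scratch sketch in Python (the knn_predict helper and the toy data are ours for illustration, not from any library), using Euclidean distance and a majority vote:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training sample
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote; for regression (step 5) return y_train[nearest].mean()
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two classes in 2-D
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [8.0, 8.0], [9.0, 10.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 2.5]), k=3))  # -> 0
```

Note that all the work happens at prediction time: there is no fitting step, which is exactly what "lazy learner" means.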

Distance Metrics

  1. Euclidean Distance: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

  2. Manhattan Distance: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

  3. Minkowski Distance: $d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$, which reduces to Manhattan at $p = 1$ and Euclidean at $p = 2$ (see the sketch after this list)
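A quick NumPy sketch of that generalization (the minkowski helper here is illustrative, not a library call):

```python
import numpy as np

def minkowski(a, b, p=2):
    # p = 1 gives Manhattan distance, p = 2 gives Euclidean
    return (np.abs(a - b) ** p).sum() ** (1 / p)

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(a, b, p=2))  # 5.0 (Euclidean)
print(minkowski(a, b, p=1))  # 7.0 (Manhattan)
```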

Computational Approaches

  1. Brute Force: Calculates distances to all points. Simple but inefficient for large datasets.
  2. K-D Tree: A space-partitioning data structure that organizes points in k-dimensional space: “The basic idea is that if point A is very distant from point B, and point B is very close to point C, then we know that points A and C are very distant, without having to explicitly calculate their distance”. Efficient for low-dimensional data.
  3. Ball Tree: Hierarchical data structure using hyper-spheres. Efficient for high-dimensional data.
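For reference, scikit-learn's KNeighborsClassifier exposes exactly these strategies through its algorithm parameter:

```python
from sklearn.neighbors import KNeighborsClassifier

# algorithm can be 'brute', 'kd_tree', 'ball_tree', or 'auto'
# ('auto' picks a strategy based on the training data)
clf = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
```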

Advantages

  1. Simple to implement
  2. No assumptions about data distribution (non-parametric)
  3. Can capture complex decision boundaries

Disadvantages

  1. Computationally expensive for large datasets
  2. Sensitive to the scale of features (see the scaling sketch after this list)
  3. Curse of dimensionality: performance degrades with high-dimensional data
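The scale sensitivity in point 2 is commonly mitigated by standardizing features before the neighbor search, so that no single wide-range feature dominates the distances. A minimal sketch using scikit-learn's StandardScaler:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scale each feature to zero mean and unit variance, then run KNN
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
```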

Choosing K

  1. Odd K for binary classification to avoid ties
  2. Square root of N (where N is the number of samples) as a rule of thumb
  3. Use cross-validation to find optimal K
  4. Elbow Method: calculate the error for different values of K, plot it, and look for an “elbow” where the error begins to level off, as sketched below.
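A minimal sketch combining points 3 and 4 with scikit-learn's cross_val_score (the iris dataset here is just a stand-in for your own data):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset

# Cross-validated error for each candidate K; plot `errors` to spot the elbow
errors = []
for k in range(1, 31):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors.append(1 - scores.mean())

best_k = int(np.argmin(errors)) + 1  # K values start at 1
```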