Nearest neighbor (NN) classifier

  • Also called an Instance-Based (IB) classifier:
    1. Find the closest example.
    2. Copy the class value.
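
A minimal sketch of this procedure in NumPy (the array names `X_train`/`y_train`, the toy data, and the choice of Euclidean distance are illustrative assumptions):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """1-NN: find the closest training example and copy its class value."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every example
    return y_train[np.argmin(dists)]             # class of the closest one

# Toy usage
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array(["A", "B"])
print(nn_classify(X_train, y_train, np.array([0.9, 0.8])))  # -> B
```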

What is the complexity?

  • With a dataset of size $n$ and dimension $k$:
    • O(________) to train.
      • What is the knowledge learned? L____ learning algorithm.
    • O(________) to classify.
      • O(________) on average is possible by pre-sorting the data into a search tree ($k$-d tree) if $n \gg 2^k$; see the sketch after this list.
      • Speed up using partial distance, editing/pruning/condensing.
  • Does the NN classifier make good decisions?
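
A sketch of the tree-based speed-up using `scipy.spatial.cKDTree` (synthetic data; the dataset sizes are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # n = 10000 examples, k = 3 dimensions
y_train = rng.integers(0, 2, 10_000)

tree = cKDTree(X_train)             # pre-sort the data into a k-d tree

# Nearest-neighbor query; sub-linear in n on average for low dimensions
dist, idx = tree.query([0.5, 0.5, 0.5], k=1)  # k=1: just the single nearest neighbor
print(y_train[idx])                 # copy the class of the nearest example
```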

Decision regions and boundaries

  • Decision boundaries separate regions with different decisions (decision r________).
  • What are the decision boundaries for a NN classifier?
  • The decision boundaries for the NN classifier can be obtained from the V__________ diagram (visualized in the sketch below).
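
To see this, one can draw the cells of the training points; the 1-NN decision boundary is the union of cell edges whose two sides carry different classes. A sketch with `scipy.spatial` on synthetic data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

rng = np.random.default_rng(1)
X = rng.random((20, 2))                      # training points
y = rng.integers(0, 2, 20)                   # their class labels

vor = Voronoi(X)
voronoi_plot_2d(vor, show_vertices=False)    # one cell per training point
plt.scatter(X[:, 0], X[:, 1], c=y)           # color the points by class
plt.title("1-NN decision regions are unions of same-class cells")
plt.show()
```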

NN without normalization

  • Consider predicting weight using height and age. Will the decision regions/boundaries depend on the unit, e.g., cm vs m?
  • Same decision regions? Yes/No (see the check below)
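
A quick numerical check (made-up heights and ages) showing that the nearest neighbor, and hence the decision regions, can change when height is measured in m instead of cm:

```python
import numpy as np

def nearest(X, x):
    """Index of the training example closest to x (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(X - x, axis=1)))

# Columns: (height, age); the query person is 171 cm tall and 21 years old
X_cm = np.array([[180.0, 20.0], [170.0, 25.0]])
x_cm = np.array([171.0, 21.0])
print(nearest(X_cm, x_cm))   # 1: the height difference (in cm) dominates

X_m, x_m = X_cm / [100, 1], x_cm / [100, 1]   # same data, height in m
print(nearest(X_m, x_m))     # 0: now the age difference dominates
```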

Min-max normalization
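
For a feature with observed values $z_1, \dots, z_n$, min-max normalization rescales each value by the observed range (in the notation of the next section):

$$z_i' := \frac{z_i - \min_j z_j}{\max_j z_j - \min_j z_j} \in [0, 1].$$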

Standard normalization

  • What about features with possibly unbounded support?
    • Min-max normalization fails because, as _____ increases, the normalization factor
      $$\max_j z_j - \min_j z_j \to \infty.$$
  • z-score/standard normalization:
    $$z_i' := \frac{z_i - \mu}{\sigma}$$
    • with mean $\mu$ and standard deviation $\sigma$ of the $z_i$'s.
    • This works for features with unbounded support because ____ is 1, not zero.
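
A sketch of both normalizations applied column-wise in NumPy (the feature matrix is made up):

```python
import numpy as np

Z = np.array([[180.0, 20.0],
              [170.0, 25.0],
              [175.0, 40.0]])           # rows: examples; columns: features

# Min-max normalization: rescale each column to [0, 1]
Z_minmax = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0))

# z-score/standard normalization: zero mean and unit standard deviation
Z_std = (Z - Z.mean(axis=0)) / Z.std(axis=0)

print(Z_minmax.min(axis=0), Z_minmax.max(axis=0))  # [0. 0.] [1. 1.]
print(Z_std.std(axis=0))                           # [1. 1.]
```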

Measure of distance/similarity

  • Numeric attributes: Euclidean, Manhattan, Minkowski, or supremum distances; Jaccard coefficient; cosine measure or Tanimoto coefficient for term-frequency vectors; … (see the sketch after this list).

  • Nominal attributes: indicator of mismatch or d___________________.

  • Missing values: e.g., use the maximum possible difference:

    • Numeric: $\operatorname{dist}(?, ?) = \underline{\phantom{x}}, \quad \operatorname{dist}(?, v) = \underline{\phantom{x}}$
    • Nominal: $\operatorname{dist}(?, ?) = \operatorname{dist}(?, v) = \underline{\phantom{x}}$
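
A few of the numeric measures above are available in `scipy.spatial.distance`; a sketch on toy vectors (the mismatch indicator for nominal attributes is computed by hand):

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 2.0])

print(distance.euclidean(u, v))       # Euclidean (L2)
print(distance.cityblock(u, v))       # Manhattan (L1)
print(distance.minkowski(u, v, p=3))  # Minkowski with p = 3
print(distance.chebyshev(u, v))       # supremum (L-infinity)
print(distance.cosine(u, v))          # 1 - cosine similarity

# Nominal attributes: indicator of mismatch (0 if equal, 1 if different)
a, b = np.array(["red", "S"]), np.array(["red", "M"])
print(np.mean(a != b))                # fraction of mismatched attributes
```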

Pros and Cons of NN classification

  • Can learn without too many examples: True/False
  • Can avoid overfitting: True/False

$k$-nearest neighbor ($k$-NN or IB$k$) classifier

  • $\hat{y} = \underline{\phantom{x}}$ for $\mathbf{x} = (0.75, 0.75)$.
  • Instance 4 is regarded as an outlier.
  • Any issue? u__________
  • How to choose the best $k$?
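
One standard way to pick $k$ is cross-validation: estimate the accuracy of each candidate and keep the best. A sketch with scikit-learn (synthetic data; the candidate grid and fold count are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 5-fold cross-validated accuracy for each candidate k
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())
```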

References

  • §9.5 “Lazy Learners (or Learning from Your Neighbors),” in J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.