Nearest neighbor (NN) classifier

  • Also called an Instance-Based (IB) classifier:
    1. Find the closest example.
    2. Copy the class value.
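
A minimal sketch of this procedure in NumPy (the array names `X_train`/`y_train`, the toy data, and the choice of Euclidean distance are illustrative assumptions):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """1-NN: find the closest training example and copy its class value."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every example
    return y_train[np.argmin(dists)]             # class of the closest one

# Toy usage
X_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array(["A", "B"])
print(nn_classify(X_train, y_train, np.array([0.9, 0.8])))  # -> B
```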

What is the complexity?

  • With a dataset of size $n$ and dimension $k$:
    • O(________) to train.
      • What is the knowledge learned? L____ learning algorithm.
    • O(________) to classify.
      • O(________) on average is possible by pre-sorting the data into a search tree ($k$-d tree) if $n \gg 2^k$; see the sketch after this list.
      • Speed up using partial distance, editing/pruning/condensing.
  • Does the NN classifier make good decisions?
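
A sketch of the tree-based speed-up using `scipy.spatial.cKDTree` (synthetic data; the dataset sizes are arbitrary):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # n = 10000 examples, k = 3 dimensions
y_train = rng.integers(0, 2, 10_000)

tree = cKDTree(X_train)             # pre-sort the data into a k-d tree

# Nearest-neighbor query; sub-linear in n on average for low dimensions
dist, idx = tree.query([0.5, 0.5, 0.5], k=1)  # k=1: just the single nearest neighbor
print(y_train[idx])                 # copy the class of the nearest example
```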

Decision regions and boundaries

  • Decision boundaries separate regions with different decisions (decision r________).
  • What are the decision boundaries for a NN classifier?
  • The decision boundaries for the NN classifier can be obtained from the V__________ diagram (visualized in the sketch below).
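
To see this, one can draw the cells of the training points; the 1-NN decision boundary is the union of cell edges whose two sides carry different classes. A sketch with `scipy.spatial` on synthetic data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

rng = np.random.default_rng(1)
X = rng.random((20, 2))                      # training points
y = rng.integers(0, 2, 20)                   # their class labels

vor = Voronoi(X)
voronoi_plot_2d(vor, show_vertices=False)    # one cell per training point
plt.scatter(X[:, 0], X[:, 1], c=y)           # color the points by class
plt.title("1-NN decision regions are unions of same-class cells")
plt.show()
```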

NN without normalization

  • Consider predicting weight using height and age. Will the decision regions/boundaries depend on the unit, e.g., cm vs m?
  • Same decision regions? Yes/No (see the check below)
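
A quick numerical check (made-up heights and ages) showing that the nearest neighbor, and hence the decision regions, can change when height is measured in m instead of cm:

```python
import numpy as np

def nearest(X, x):
    """Index of the training example closest to x (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(X - x, axis=1)))

# Columns: (height, age); the query person is 171 cm tall and 21 years old
X_cm = np.array([[180.0, 20.0], [170.0, 25.0]])
x_cm = np.array([171.0, 21.0])
print(nearest(X_cm, x_cm))   # 1: the height difference (in cm) dominates

X_m, x_m = X_cm / [100, 1], x_cm / [100, 1]   # same data, height in m
print(nearest(X_m, x_m))     # 0: now the age difference dominates
```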

Min-max normalization
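
For a feature with observed values $z_1, \dots, z_n$, min-max normalization rescales each value by the observed range (in the notation of the next section):

$$z_i' := \frac{z_i - \min_j z_j}{\max_j z_j - \min_j z_j} \in [0, 1].$$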

Standard normalization

  • What about features with possibly unbounded support?
    • Min-max normalization fails because, as _____ increases, the normalization factor
      $$\max_j z_j - \min_j z_j \to \infty.$$
  • z-score/standard normalization:
    $$z_i' := \frac{z_i - \mu}{\sigma}$$
    • with mean $\mu$ and standard deviation $\sigma$ of the $z_i$'s.
    • This works for features with unbounded support because ____ is 1, not zero.
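
A sketch of both normalizations applied column-wise in NumPy (the feature matrix is made up):

```python
import numpy as np

Z = np.array([[180.0, 20.0],
              [170.0, 25.0],
              [175.0, 40.0]])           # rows: examples; columns: features

# Min-max normalization: rescale each column to [0, 1]
Z_minmax = (Z - Z.min(axis=0)) / (Z.max(axis=0) - Z.min(axis=0))

# z-score/standard normalization: zero mean and unit standard deviation
Z_std = (Z - Z.mean(axis=0)) / Z.std(axis=0)

print(Z_minmax.min(axis=0), Z_minmax.max(axis=0))  # [0. 0.] [1. 1.]
print(Z_std.std(axis=0))                           # [1. 1.]
```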

Measure of distance/similarity

  • Numeric attributes: Euclidean, Manhattan, Minkowski, or supremum distances; Jaccard coefficient; cosine measure or Tanimoto coefficient for term-frequency vectors; … (see the sketch after this list).

  • Nominal attributes: indicator of mismatch or d___________________.

  • Missing values: e.g., use the maximum possible difference:

    • Numeric: $\operatorname{dist}(?, ?) = \underline{\phantom{x}}, \quad \operatorname{dist}(?, v) = \underline{\phantom{x}}$
    • Nominal: $\operatorname{dist}(?, ?) = \operatorname{dist}(?, v) = \underline{\phantom{x}}$
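
A few of the numeric measures above are available in `scipy.spatial.distance`; a sketch on toy vectors (the mismatch indicator for nominal attributes is computed by hand):

```python
import numpy as np
from scipy.spatial import distance

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 2.0])

print(distance.euclidean(u, v))       # Euclidean (L2)
print(distance.cityblock(u, v))       # Manhattan (L1)
print(distance.minkowski(u, v, p=3))  # Minkowski with p = 3
print(distance.chebyshev(u, v))       # supremum (L-infinity)
print(distance.cosine(u, v))          # 1 - cosine similarity

# Nominal attributes: indicator of mismatch (0 if equal, 1 if different)
a, b = np.array(["red", "S"]), np.array(["red", "M"])
print(np.mean(a != b))                # fraction of mismatched attributes
```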

Pros and Cons of NN classification

  • Can learn without too many examples: True/False
  • Can avoid overfitting: True/False

$k$-nearest neighbor ($k$-NN or IB$k$) classifier

  • $\hat{y} = \underline{\phantom{x}}$ for $\mathbf{x} = (0.75, 0.75)$.
  • Instance 4 is regarded as an outlier.
  • Any issue? u__________
  • How to choose the best $k$?
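
One standard way to pick $k$ is cross-validation: estimate the accuracy of each candidate and keep the best. A sketch with scikit-learn (synthetic data; the candidate grid and fold count are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 5-fold cross-validated accuracy for each candidate k
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())
```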

References

  • §9.5 “Lazy Learners (or Learning from Your Neighbors),” in J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2011.