Nearest neighbor (NN) classifier¶
- Also called an Instance-Based (IB) classifier (see the sketch after this list):
- Find the closest example.
- Copy the class value.
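A minimal sketch of this rule in Python (NumPy only; the toy data below is made up for illustration):

```python
import numpy as np

def nn_classify(X_train, y_train, x):
    """1-NN: find the closest training example and copy its class value."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every example
    return y_train[np.argmin(dists)]             # copy the class of the closest one

# Toy data: two numeric features, two classes.
X_train = np.array([[1.0, 1.0], [2.0, 1.5], [8.0, 8.5], [9.0, 8.0]])
y_train = np.array([0, 0, 1, 1])
print(nn_classify(X_train, y_train, np.array([1.5, 1.2])))  # -> 0
```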
What is the complexity?¶
- With a dataset of size $N$ and dimension $d$:
- $O(1)$ to train: nothing is computed up front.
- What is the knowledge learned? None: the stored examples themselves are the model, so NN is a lazy learning algorithm.
- $O(Nd)$ to classify each query by brute-force search.
- $O(\log N)$ per query on average is possible by pre-sorting the data into a search tree (a $k$-d tree) when the dimension $d$ is small, roughly $N \gg 2^d$ (see the `KDTree` sketch after this list).
- Speed up using partial distance, editing/pruning/condensing.
- $O(N \log N)$ to train, i.e., to build the search tree.
- Does the NN classifier make good decisions?
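Before turning to that question, a sketch of the search-tree speed-up using SciPy's `KDTree` (the library choice is an assumption of this note): build the tree once, then answer queries in roughly logarithmic time on average when the dimension is small.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 2))         # N = 10,000 examples, d = 2
y_train = rng.integers(0, 2, size=10_000)

tree = KDTree(X_train)                    # "training": pre-sort the data into a k-d tree
dist, idx = tree.query([0.5, 0.5], k=1)   # average-case ~O(log N) nearest-neighbor lookup
print(y_train[idx])                       # copy the class of the nearest example
```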
Decision regions and boundaries¶
- Decision boundaries separate regions with different decisions (decision regions).
- What are the decision boundaries for an NN classifier?
- The decision boundaries for an NN classifier can be obtained from the Voronoi diagram of the training examples (sketched below).
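A quick way to see this (a sketch with SciPy/Matplotlib; the points are arbitrary): each Voronoi cell is the set of query points closest to one training example, and the NN decision boundary is the union of cell edges between examples of different classes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

rng = np.random.default_rng(1)
points = rng.random((12, 2))   # 12 training examples in 2-D
vor = Voronoi(points)          # each cell is the 1-NN region of one example
voronoi_plot_2d(vor)
plt.title("Voronoi cells = 1-NN regions of the training examples")
plt.show()
```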
NN without normalization¶
- Consider predicting weight using height and age. Will the decision regions/boundaries depend on the units, e.g., cm vs m? (See the check after this list.)
- Same decision regions? Yes/No
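A small numeric check (the heights and ages below are hypothetical): converting height from m to cm rescales one axis, which changes the distances and hence can change which neighbor is nearest.

```python
import numpy as np

def nearest(X, q):
    """Index of the training example closest to query q."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

X_m = np.array([[1.60, 30.0], [1.90, 31.0]])   # (height in m, age in years)
q_m = np.array([1.62, 30.9])

print(nearest(X_m, q_m))                        # age dominates -> example 1
scale = np.array([100.0, 1.0])                  # height in cm instead of m
print(nearest(X_m * scale, q_m * scale))        # height dominates -> example 0
```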
Min-max normalization¶
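The rule itself is not spelled out above; a standard statement (with $x_1, \dots, x_N$ denoting the observed values of a feature, notation assumed here), rescaling each feature to $[0,1]$:

$$\tilde{x}_i = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j} \in [0, 1].$$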
Standard normalization¶
- What about features with possibly unbounded support?
- Min-max normalization fails because, as the sample size increases, the normalization factor $\max_j x_j - \min_j x_j$ can grow without bound, driving the normalized values toward zero.
- z-score/standard normalization: $z_i = \frac{x_i - \mu}{\sigma}$, with the mean $\mu$ and standard deviation $\sigma$ of the $x_i$'s.
- This works for features with unbounded support because the standard deviation of the normalized feature is 1, not zero: the feature keeps a non-degenerate spread no matter how many samples are seen (contrast sketched below).
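A sketch of the contrast (NumPy, with simulated data of unbounded support): as more samples arrive, the min-max factor keeps growing while the standard deviation settles.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)   # a feature with unbounded support

for n in (100, 10_000, 100_000):
    sample = x[:n]
    print(f"n={n:>6}: min-max factor = {sample.max() - sample.min():.2f}, "
          f"std = {sample.std():.2f}")   # min-max factor grows with n; std stays near 1
```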
Measure of distance/similarity¶
Numeric attributes: Euclidean, Manhattan, Minkowski, or supremum distances; for term-frequency vectors, the Jaccard coefficient, cosine measure, Tanimoto coefficient, … (a few are sketched after this list).
Nominal attributes: indicator of mismatch (0 if identical, 1 otherwise) or more sophisticated differential grading.
Missing values: e.g., assume the maximum possible difference:
- Numeric (normalized to $[0,1]$): take the difference to be $1$ if both values are missing, and $\max(v, 1-v)$ if only one value $v$ is present.
- Nominal: take the difference to be $1$ if either or both values are missing.
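A few of these measures in NumPy/SciPy (a sketch; the vectors are arbitrary):

```python
import numpy as np
from scipy.spatial.distance import chebyshev, cosine, minkowski

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 2.0])

print(np.linalg.norm(u - v))   # Euclidean distance (Minkowski, p = 2)
print(np.abs(u - v).sum())     # Manhattan distance (Minkowski, p = 1)
print(minkowski(u, v, p=3))    # general Minkowski distance
print(chebyshev(u, v))         # supremum (L-infinity) distance
print(1 - cosine(u, v))        # cosine similarity (SciPy's cosine() is a distance)

# Nominal attributes: indicator of mismatch (0 if identical, 1 otherwise).
print(int("red" != "blue"))
```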
Pros and Cons of NN classification¶
- Can it learn without too many examples? True/False
- Can it avoid overfitting? True/False
$k$-nearest neighbor ($k$-NN or IB$k$) classifier¶
- Find the $k$ closest examples and take their majority class, e.g., for $k = 3$:
- Instance 4 is regarded as an outlier.
- Any issue? Underfitting: too large a $k$ over-smooths the decision boundary.
- How to choose the best $k$? (See the cross-validation sketch after this list.)
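A common answer is cross-validation; a sketch with scikit-learn (the library and the bundled iris data are assumptions of this note, not prescribed by the lecture):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Score each candidate k by 5-fold cross-validation and keep the best.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in (1, 3, 5, 7, 9)}
print(scores)
print("best k:", max(scores, key=scores.get))
```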
References¶
- Han, Kamber, and Pei, *Data Mining: Concepts and Techniques*, 3rd ed., §9.5 Lazy Learners (or Learning from Your Neighbors).