## What is a decision tree?
- Internal nodes (circles):
  - label: a splitting criterion;
  - for each outcome, an edge to a child node.
- Leaf nodes (squares):
  - label: a decision.
## How to classify?
- Trace a path from the root to a leaf, following at each internal node the edge whose outcome matches the instance; the leaf's label is the decision. (See the sketch below.)
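A minimal sketch of this tracing step, assuming the tree is stored as nested tuples and dicts; the attribute names (`Outlook`, `Humidity`, `Windy`) and labels are made up for illustration:

```python
# A tiny hand-built decision tree: internal nodes are ("attribute", {outcome: child, ...});
# leaves are plain class labels.
tree = ("Outlook", {
    "sunny": ("Humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("Windy", {True: "no", False: "yes"}),
})

def classify(node, instance):
    """Trace from the root to a leaf by following the edge that matches
    the instance's value for the node's splitting attribute."""
    while isinstance(node, tuple):          # internal node
        attribute, children = node
        node = children[instance[attribute]]
    return node                             # leaf label = decision

print(classify(tree, {"Outlook": "sunny", "Humidity": "normal", "Windy": False}))  # yes
```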
## How to build a decision stump?
- A decision stump is a decision tree with depth $1$.
- Choose a splitting attribute.
- Use majority voting to determine the decision (class label) at each leaf, as in the sketch after this list.
- Which decision stump is better? Left/right because of o__________.
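A sketch of these two steps on a toy dataset (the attribute and class names are made up for the example):

```python
from collections import Counter

# Toy data: (attribute values, class label).
data = [({"Outlook": "sunny"}, "no"), ({"Outlook": "sunny"}, "no"),
        ({"Outlook": "overcast"}, "yes"),
        ({"Outlook": "rainy"}, "yes"), ({"Outlook": "rainy"}, "yes"),
        ({"Outlook": "rainy"}, "no")]

def build_stump(data, attribute):
    """Depth-1 tree: split once on `attribute`, then label each leaf by
    majority vote over the class values of the tuples it receives."""
    groups = {}
    for x, y in data:
        groups.setdefault(x[attribute], []).append(y)
    return attribute, {v: Counter(ys).most_common(1)[0][0] for v, ys in groups.items()}

print(build_stump(data, "Outlook"))
# ('Outlook', {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'})
```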
## Binary splits for numeric attributes
- C__________ m__-points as s________ points (see the sketch after this list).
- Which is/are the best split(s)? left/middle/right.
- How to build a tree instead of a stump? R__________ split (d_____-and-c______).
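A sketch of generating the candidate binary splits for a numeric attribute from the mid-points of consecutive distinct values; the threshold form "value $\le t$" is the usual convention, and the numbers are illustrative:

```python
values = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]   # numeric attribute values

# Candidate thresholds: mid-points between consecutive distinct sorted values.
distinct = sorted(set(values))
midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
print(midpoints)   # each t defines the binary split "value <= t" vs "value > t"
```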
## How to build a decision tree?
- Greedy algorithm: build a stump, then recursively split each child node in the same way (see Han11 Fig. 8.3 for the full version, and the sketch below).
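A compact sketch of the greedy, divide-and-conquer recursion. This is not the full Han11 Fig. 8.3 algorithm: the stopping rules are simplified, the split is scored by misclassification error, and all identifiers and the toy data are illustrative.

```python
from collections import Counter

def majority(labels):
    """Majority class label."""
    return Counter(labels).most_common(1)[0][0]

def stump_error(groups):
    """Weighted misclassification error of a candidate split.
    groups: dict mapping outcome -> list of class labels."""
    n = sum(len(ys) for ys in groups.values())
    wrong = sum(len(ys) - Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return wrong / n

def split_by(data, attribute):
    """Partition (features, label) pairs by the value of one attribute."""
    groups = {}
    for x, y in data:
        groups.setdefault(x[attribute], []).append((x, y))
    return groups

def build_tree(data, attributes):
    labels = [y for _, y in data]
    if len(set(labels)) == 1 or not attributes:    # stop: pure node or no attribute left
        return majority(labels)                    # leaf labelled by majority vote
    # Greedy step: the attribute whose stump currently has the smallest error.
    best = min(attributes,
               key=lambda a: stump_error({v: [y for _, y in g]
                                          for v, g in split_by(data, a).items()}))
    # Divide and conquer: recurse on each child with the remaining attributes.
    return (best, {v: build_tree(g, [a for a in attributes if a != best])
                   for v, g in split_by(data, best).items()})

data = [({"Outlook": "sunny", "Windy": False}, "no"),
        ({"Outlook": "sunny", "Windy": True}, "no"),
        ({"Outlook": "rainy", "Windy": False}, "yes"),
        ({"Outlook": "rainy", "Windy": True}, "no")]
print(build_tree(data, ["Outlook", "Windy"]))   # splits on 'Outlook' first, then on 'Windy' under 'rainy'
```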
## How to find a good splitting attribute?
- Given the data $D$ to split, choose the splitting attribute $A$ that minimizes the e____ of the resulting decision stump.
- What is the precise formula? (A numeric check follows this list.)
  $$\sum_j \frac{|D_j|}{|D|}\Bigl(1 - \max_k p_{D_j}(k)\Bigr)$$
  - $D_j$: set of tuples in $D$ with $A = a_j$, the $j$-th outcome of $A$.
  - $p_{D_j}(k)$: fraction of tuples in $D_j$ belonging to class $k$.
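A quick numeric check of the formula with made-up class counts: if splitting on $A$ yields two children with counts $(3\ \text{yes},\ 1\ \text{no})$ and $(2\ \text{yes},\ 4\ \text{no})$, then

$$\frac{4}{10}\Bigl(1-\frac{3}{4}\Bigr) + \frac{6}{10}\Bigl(1-\frac{4}{6}\Bigr) = \frac{1+2}{10} = 0.3,$$

i.e., the stump misclassifies the $1+2$ minority tuples out of $10$.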
## Example
- What is the best splitting attribute?
## Further split on ___
## Issue with the greedy algorithm
- A locally optimal split may not be g_______ optimal.
- Why is splitting on ___ not good here? The child nodes of ___ are less p___.
- Why does the misclassification rate fail? It neglects the distribution of the class values of the m____________ instances.
## How to remain greedy but not myopic?
- Find better impurity measures than the misclassification rate.
- How to measure impurity?
  E.g., order the following distributions in ascending order of impurity:
  ___ (purest) < ___ < ___ < ___
- Given a distribution $(p_1, \dots, p_K)$ of the class values of $D$, how to define a non-negative function of the $p_k$'s that respects the above ordering?
  - Does ___ work? Yes/No
  - Does ___ work? Yes/No
## Gini impurity index
$$\operatorname{Gini}(D) := \sum_k p_k (1 - p_k) = 1 - \sum_k p_k^2$$

## Why does it work?
- $\operatorname{Gini}(D) \geq 0$, with equality iff $p_k = 1$ for some class $k$, i.e., iff $D$ is pure. Why?
- $\operatorname{Gini}(D) \leq 1 - \frac{1}{K}$, with equality iff $p_k = \frac{1}{K}$ for every class $k$, i.e., iff the class distribution is uniform. Why?
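A small sketch of the Gini impurity $1-\sum_k p_k^2$, checking the two bounds above on a pure, a uniform, and an intermediate distribution (the numbers are illustrative):

```python
def gini(p):
    """Gini impurity of a class distribution p = (p_1, ..., p_K)."""
    return 1 - sum(q * q for q in p)

print(gini([1.0, 0.0]))   # 0.0  -> minimum, attained iff the node is pure
print(gini([0.5, 0.5]))   # 0.5  -> maximum 1 - 1/K for K = 2, attained iff uniform
print(gini([0.9, 0.1]))   # 0.18 -> in between
```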
## Finding the best split using Gini impurity
- Minimize the Gini impurity given a split on attribute $A$ (see the sketch after this list):
  $$\operatorname{Gini}_A(D) := \sum_j \frac{|D_j|}{|D|}\operatorname{Gini}(D_j)$$
- What is the best splitting attribute?
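A sketch of scoring candidate splits by the weighted Gini impurity and keeping the smallest; the two candidate attributes and their children's class counts are made up for illustration:

```python
def gini_counts(counts):
    """Gini impurity of one child node, given its class counts."""
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def split_gini(children):
    """Weighted Gini impurity sum_j |D_j|/|D| * Gini(D_j) of one candidate split."""
    n = sum(sum(c.values()) for c in children)
    return sum(sum(c.values()) / n * gini_counts(c) for c in children)

# Two candidate splitting attributes, described by their children's class counts.
candidates = {
    "A": [{"yes": 4, "no": 0}, {"yes": 1, "no": 5}],
    "B": [{"yes": 3, "no": 2}, {"yes": 2, "no": 3}],
}
best = min(candidates, key=lambda a: split_gini(candidates[a]))
print(best, {a: round(split_gini(c), 3) for a, c in candidates.items()})
```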
## An impurity measure from information theory
- Shannon's entropy (see the sketch below):
  $$H(D) := -\sum_k p_k \log_2 p_k$$
- Measured in bits when the logarithm is base 2. Why?
- $0 \log_2 0$ is regarded as $0$ even though $\log_2 0$ is undefined, since $p \log_2 p \to 0$ as $p \to 0^+$.
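A sketch of the entropy computation that implements the $0 \log_2 0 := 0$ convention explicitly (illustrative values):

```python
from math import log2

def entropy(p):
    """Shannon entropy -sum_k p_k log2 p_k in bits, taking 0*log2(0) as 0."""
    return -sum(q * log2(q) for q in p if q > 0)

print(entropy([1.0, 0.0]))   # 0.0 bit   (pure distribution)
print(entropy([0.5, 0.5]))   # 1.0 bit   (fair coin: one bit of uncertainty)
print(entropy([0.9, 0.1]))   # ~0.469 bit
```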
## Why does it work?
- $H(D) \geq 0$, with equality iff $p_k = 1$ for some class $k$, i.e., iff $D$ is pure. Why?
- $H(D) \leq \log_2 K$, with equality iff $p_k = \frac{1}{K}$ for every class $k$, i.e., iff the class distribution is uniform. Why?
## Finding the best split by conditional entropy
- Minimize the conditional entropy given a split on attribute $A$ (see the sketch after this list):
  $$H(D \mid A) := \sum_j \frac{|D_j|}{|D|} H(D_j)$$
- What is the best splitting attribute?
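A sketch computing the conditional entropy of a split from per-child class counts, together with the information gain that ID3 maximizes below; the parent and children counts are made up for illustration:

```python
from math import log2

def entropy_counts(counts):
    """Entropy in bits of a node, given its class counts."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c > 0)

def conditional_entropy(children):
    """H(D|A) = sum_j |D_j|/|D| * H(D_j) for one candidate split."""
    n = sum(sum(c.values()) for c in children)
    return sum(sum(c.values()) / n * entropy_counts(c) for c in children)

parent = {"yes": 5, "no": 5}
children = [{"yes": 4, "no": 0}, {"yes": 1, "no": 5}]
gain = entropy_counts(parent) - conditional_entropy(children)   # information gain of this split
print(round(conditional_entropy(children), 3), round(gain, 3))
```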
## Which impurity measure is used?
- ID3 (Iterative Dichotomiser 3) maximizes the information gain
  $$\operatorname{Gain}(A) := H(D) - H(D \mid A).$$
- CART (Classification and Regression Tree) minimizes the Gini impurity $\operatorname{Gini}_A(D)$.
- Is ___ a good splitting attribute? Yes/No.
## Bias towards attributes with many outcomes
- An attribute with more outcomes tends to
  - reduce the impurity more, but
  - result in more comparisons.
- Issue: such an attribute may not minimize the impurity per comparison (see the illustration after this list).
- Remedies?
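A small illustration of this bias, assuming an ID-like attribute that takes a distinct value on every tuple: it drives the conditional entropy to zero and thus looks best, even though it does so by creating one near-empty branch per tuple (the counts are toy numbers):

```python
from math import log2

def entropy_counts(c):
    n = sum(c.values())
    return -sum(v / n * log2(v / n) for v in c.values() if v > 0)

def cond_entropy(children):
    n = sum(sum(c.values()) for c in children)
    return sum(sum(c.values()) / n * entropy_counts(c) for c in children)

# A 2-outcome attribute: both children remain mixed.
few_outcomes = [{"yes": 3, "no": 2}, {"yes": 2, "no": 3}]
# An ID-like attribute with one outcome per tuple: every child is trivially pure.
id_like = [{"yes": 1}] * 5 + [{"no": 1}] * 5
print(cond_entropy(few_outcomes))   # ~0.97: little impurity removed
print(cond_entropy(id_like))        # 0.0: all impurity removed, but via 10 one-tuple branches
```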
## Binary split also for nominal attributes
- CART uses a s____________ $S$ of the attribute's values to generate a binary split (whether $A \in S$).
- The number of outcomes is therefore limited to ___.
## Normalization by split information
- C4.5/J48 allows m________ splits but uses the information gain ratio (see the sketch after this list):
  $$\operatorname{GainRatio}(A) := \frac{\operatorname{Gain}(A)}{\operatorname{SplitInfo}(A)},
  \quad\text{where}\quad
  \operatorname{SplitInfo}(A) := -\sum_j \frac{|D_j|}{|D|}\log_2\frac{|D_j|}{|D|}.$$
- $\operatorname{SplitInfo}(A)$ is the entropy of __________ because __________________.
- Attributes with many outcomes tend to have a smaller/larger $\operatorname{SplitInfo}(A)$.
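A sketch of the gain-ratio normalization: the split information is the entropy of the child-size proportions, and it grows with the number of outcomes. The parent and children class counts here are toy numbers.

```python
from math import log2

def entropy(p):
    """Entropy in bits of a probability distribution p, with 0*log2(0) taken as 0."""
    return -sum(q * log2(q) for q in p if q > 0)

def gain_ratio(parent, children):
    """parent / children are class-count dicts; returns Gain(A) / SplitInfo(A)."""
    n = sum(parent.values())
    sizes = [sum(c.values()) for c in children]
    child_entropy = lambda c: entropy([v / sum(c.values()) for v in c.values()])
    gain = entropy([v / n for v in parent.values()]) \
           - sum(s / n * child_entropy(c) for s, c in zip(sizes, children))
    split_info = entropy([s / n for s in sizes])   # entropy of the child-size proportions
    return gain / split_info

parent = {"yes": 5, "no": 5}
print(gain_ratio(parent, [{"yes": 4, "no": 0}, {"yes": 1, "no": 5}]))  # informative 2-way split
print(gain_ratio(parent, [{"yes": 1}] * 5 + [{"no": 1}] * 5))          # ID-like 10-way split
```

Here the ID-like split has the larger raw gain (1 bit vs. about 0.61 bit), but its split information of $\log_2 10 \approx 3.32$ bits pushes its gain ratio below that of the two-way split.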
## How to avoid overfitting?
- P__-pruning: limit the size of the tree as we build it (see the note after this list). E.g.,
  - ensure each node is supported by enough examples (C4.5: minimum number of objects);
  - split only if we are confident enough about the improvement (C4.5: confidence factor).
- P___-pruning: reduce the size of the tree after we build it. E.g.,
  - contract leaf nodes if the complexity outweighs the risk (CART: cost-complexity pruning).
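For reference, scikit-learn's `DecisionTreeClassifier` exposes analogous knobs; it is a different implementation from C4.5 and CART, so this is only an analogy: `min_samples_leaf` acts as a pre-pruning constraint, and `ccp_alpha` performs minimal cost-complexity (post-)pruning.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: refuse splits that would leave a leaf with too few examples.
pre_pruned = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5).fit(X, y)

# Post-pruning: grow the tree, then prune by minimal cost-complexity.
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```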
## References
- [Han11] J. Han, M. Kamber, and J. Pei, *Data Mining: Concepts and Techniques*, 3rd ed., Morgan Kaufmann, 2011:
  - Section 8.1 Basic Concepts
  - Section 8.2 Decision Tree Induction
- Optional readings