Motivation
- When is the decision equal to 1?
- If _____________________, then the decision is 1.
- Else _____________________.
Knowledge representation
- Benefits of representing knowledge by rules (c.f. decision tree or NN):
- M____________________________
- I_____________________________
- How to generate rules?
Generate rules from a decision tree
- Each path from the root to a leaf corresponds to a rule (see the sketch below):
- Does the ordering of these rules matter? Yes/No because__________________________________________________________________
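A minimal sketch of this path-to-rule conversion, assuming scikit-learn's `DecisionTreeClassifier` and the iris data purely for illustration (neither is part of the notes); each leaf is printed as an IF-THEN rule whose antecedent is the conjunction of the tests on the path to it:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
t = clf.tree_

def print_rules(node=0, conds=()):
    """Walk the tree; every root-to-leaf path becomes one IF-THEN rule."""
    if t.children_left[node] == -1:                    # leaf node
        label = iris.target_names[t.value[node].argmax()]
        antecedent = " AND ".join(conds) if conds else "TRUE"
        print(f"IF {antecedent} THEN class = {label}")
        return
    name = iris.feature_names[t.feature[node]]
    thr = t.threshold[node]
    print_rules(t.children_left[node], conds + (f"{name} <= {thr:.2f}",))
    print_rules(t.children_right[node], conds + (f"{name} > {thr:.2f}",))

print_rules()
```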
Sequential covering
- S________-and-c________ (c.f. divide-and-conquer):
  1. Learn a good rule.
  2. Remove the covered instances and repeat step 1 until all instances are covered (a covering skeleton is sketched below).
- How to learn a good rule?
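A minimal sketch of the separate-and-conquer loop, assuming a rule object with a `covers(instance)` method and a `learn_one_rule` routine (both placeholders, e.g. the confidence-based grower sketched in a later section):

```python
def sequential_covering(instances, learn_one_rule):
    """Learn rules one at a time, removing the instances each new rule covers."""
    rules = []
    remaining = list(instances)
    while remaining:
        rule = learn_one_rule(remaining)   # step 1: learn a good rule on what is left
        if rule is None:                   # no useful rule can be found any more
            break
        rules.append(rule)
        # step 2: remove the covered instances and repeat
        remaining = [x for x in remaining if not rule.covers(x)]
    return rules
```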
PART (partial tree) decision list
- PART (partial tree) decision list:
  1. Build a new decision tree (by C4.5) and extract the rule that maximizes coverage, i.e., the fraction of instances satisfying the antecedent.
  2. Remove the covered instances and repeat step 1 until all instances are covered (a sketch of this loop follows).
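A rough sketch of this outer loop under simplifying assumptions that are not from the notes: scikit-learn's CART tree stands in for a C4.5 partial tree, a rule is represented by a (tree, leaf, class) triple instead of an explicit conjunction, and class labels are assumed to be small non-negative integers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def part_style_rules(X, y, max_rules=20):
    rules = []
    idx = np.arange(len(y))                         # indices of not-yet-covered instances
    while len(idx) > 0 and len(rules) < max_rules:
        tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        leaves = tree.apply(X[idx])                  # leaf reached by each remaining instance
        leaf_ids, counts = np.unique(leaves, return_counts=True)
        best_leaf = leaf_ids[np.argmax(counts)]      # leaf with the largest coverage
        covered = leaves == best_leaf
        majority = np.bincount(y[idx][covered]).argmax()
        # the extracted rule is "IF <path to best_leaf> THEN class = majority"
        rules.append((tree, best_leaf, majority))
        idx = idx[~covered]                          # remove covered instances and repeat
    return rules
```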
Example
Rule 1: ________________
Rule 2: ________________
Default rule:
Issue: [Time complexity] _______________________________________
Generating rules directly
Start with ZeroR and add conjuncts to improve confidence: the fraction of covered instances that are correctly classified.
- Rule 1:
- Confidence:
- Rule 1 (refined):
- Confidence:
- Rule 1:
Repeatedly add new rules to cover the remaining instances (sketched at the end of this section).
- Rule 2:
- Confidence:
- Rule 2 (refined):
- Confidence:
- Default rule:
- Rule 2:
- Decision list
- Rule 1:
- Rule 2:
- Default rule:
- Is the list the best possible? Yes/No
- Time to detect positive class:
- Length of the list:
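A minimal sketch of the procedure above on made-up toy data (the attribute names, values, and `min_conf` threshold are illustrative assumptions): grow one rule by greedily adding the attribute=value conjunct that most improves confidence, remove the covered instances, repeat, and close the list with a default rule.

```python
def covers(rule, x):
    return all(x[a] == v for a, v in rule)

def confidence(rule, target, data):
    covered = [x for x in data if covers(rule, x)]
    if not covered:
        return 0.0
    return sum(x["class"] == target for x in covered) / len(covered)

def grow_rule(data, target, min_conf=1.0):
    """Start from the empty antecedent (ZeroR-like) and greedily add the conjunct
    that most improves confidence on the remaining data."""
    rule = []
    attrs = [a for a in data[0] if a != "class"]
    best = confidence(rule, target, data)
    while best < min_conf:
        improved = None
        for cand in {(a, x[a]) for x in data for a in attrs if a not in dict(rule)}:
            c = confidence(rule + [cand], target, data)
            if c > best:
                best, improved = c, cand
        if improved is None:            # no conjunct improves confidence any further
            break
        rule.append(improved)
    return rule

def decision_list(data, target):
    rules, remaining = [], list(data)
    while any(x["class"] == target for x in remaining):
        rule = grow_rule(remaining, target)
        if not rule:                    # cannot do better than the empty antecedent
            break
        rules.append((rule, target))
        remaining = [x for x in remaining if not covers(rule, x)]
    classes = {x["class"] for x in remaining} or {target}
    default = max(classes, key=lambda c: sum(x["class"] == c for x in remaining))
    return rules + [([], default)]      # default rule catches everything left over

# Hypothetical toy data:
toy = [
    {"outlook": "sunny", "windy": False, "class": "yes"},
    {"outlook": "sunny", "windy": True,  "class": "no"},
    {"outlook": "rainy", "windy": False, "class": "no"},
    {"outlook": "rainy", "windy": True,  "class": "no"},
]
print(decision_list(toy, "yes"))
```

Passing the positive class as `target` is also what the class-based ordering below does: rules are learned for that class first, and everything left over falls to the default rule.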
Class-based ordering
- Learn rules for positive class first:
- Rule 1:
- Default rule:
- Rule 1:
- Will the above guarantee a short decision list in general? Yes/No because __________________________________________________________________
First Order Inductive Learner (FOIL) gain
- Add the conjunct that maximizes
  $$\text{FOIL\_Gain} = \underbrace{p_1}_{(1)}\Bigg(\underbrace{\log_2\frac{p_1}{p_1+n_1}}_{(2)}\;\underbrace{-\,\log_2\frac{p_0}{p_0+n_0}}_{(3)}\Bigg),$$
  where $p_0, n_0$ are the numbers of positive and negative instances covered by the rule before adding the conjunct, and $p_1, n_1$ after.
- Change in the number of positives:
- Change in the number of negatives:
____________: ____________
____________: ____________
First/Second is better.
____________: ____________
____________: ____________
First/Second is better.
Heuristics:
- (1) favors rules with large coverage/confidence.
- (2) and (3) together favor rules with large coverage/confidence given the same coverage/confidence.
- (3) ensures the gain is positive if coverage/confidence increases.
[Challenge] Why not use information gain or gain ratio?
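A small numeric sketch of FOIL_Gain with hypothetical counts (the numbers are made up); it illustrates how the leading $p_1$ factor rewards coverage and not only confidence:

```python
from math import log2

def foil_gain(p0, n0, p1, n1):
    """Gain of extending a rule covering p0 positives / n0 negatives (p0 > 0)
    into one covering p1 positives / n1 negatives."""
    if p1 == 0:
        return float("-inf")            # the extended rule covers no positives
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Two candidate conjuncts applied to a rule that covers 10 positives and 10 negatives:
print(foil_gain(10, 10, 8, 2))   # ~5.42: good coverage and good confidence
print(foil_gain(10, 10, 2, 0))   # 2.0: perfect confidence but little coverage
```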
How to avoid overfitting?
- Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
- After each new rule, eliminate a conjunct (starting with the most recently added one) if it improves the following on a v_________ set: $\frac{p-n}{p+n}$, where $p$ and $n$ count the positive and negative instances covered by the rule; or equivalently, reduces the error rate $\frac{n}{p+n}$ among the covered instances.
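A minimal sketch of this pruning step, assuming rules are lists of attribute=value conjuncts and instances are dicts with a "class" key (the same toy representation as in the earlier sketch); $p$ and $n$ are counted on the held-out pruning data:

```python
def covers(rule, x):
    return all(x[a] == v for a, v in rule)

def prune_metric(rule, val_data, target):
    """(p - n) / (p + n) on the pruning data, where p / n count the positive /
    negative instances the rule covers there."""
    covered = [x for x in val_data if covers(rule, x)]
    if not covered:
        return -1.0
    p = sum(x["class"] == target for x in covered)
    return (2 * p - len(covered)) / len(covered)    # equals (p - n) / (p + n)

def prune_rule(rule, val_data, target):
    """Drop trailing conjuncts (most recently added first) as long as the metric
    does not get worse on the pruning data."""
    best, best_value = rule, prune_metric(rule, val_data, target)
    while len(best) > 1:
        value = prune_metric(best[:-1], val_data, target)
        if value >= best_value:
            best, best_value = best[:-1], value
        else:
            break
    return best
```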
References
- 8.4 Rule-Based Classification
- (Optional) Frank, Eibe, and Ian H. Witten. “Generating accurate rule sets without global optimization.” Fifteenth International Conference on Machine Learning, 1998, pp. 144-151.
- A partial tree is built by expanding nodes (subsets of the data) in increasing order of their entropy, lowest first.
- A node is considered for pruning by subtree replacement if all its children are leaf nodes.
- (Optional) Cohen, William W. “Fast effective rule induction.” Machine Learning Proceedings 1995, pp. 115-123. (See also WEKA JRIP or its source code.)
- The algorithm stops adding rules to the rule set when the description length after adding the new rule is more than 64 bits larger than the minimum description length encountered so far.
- After the algorithm stops adding rules, a rule-optimization step revises each rule one by one.