
Motivation

  • When is the decision equal to 1?
    1. If _____________________, then $\R{Y}=1$.
    2. Else $\R{Y}=0$.

Knowledge representation

  • Benefits of representing knowledge by rules (c.f. decision tree or NN):
    • M____________________________
    • I_____________________________
  • How to generate rules?

Generate rules from a decision tree

  • Each path from root to leaf corresponds to a rule:
    1. $\R{X}_1 = \underline{\phantom{x}} \Rightarrow \R{Y} = 0$
    2. $\R{X}_1 = \underline{\phantom{x}}, \R{X}_2 = \underline{\phantom{x}} \Rightarrow \R{Y} = 0$
    3. $\R{X}_1 = \underline{\phantom{x}}, \R{X}_2 = \underline{\phantom{x}} \Rightarrow \R{Y} = 1$
  • Does the ordering of these rules matter? Yes/No because __________________________________________________________________
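
As a concrete (hypothetical) illustration, the sketch below fits a tiny AND dataset and walks each root-to-leaf path to print a rule; the dataset and the use of scikit-learn are assumptions, not part of these notes.

```python
# Sketch: turn each root-to-leaf path of a fitted tree into a rule.
# The AND dataset and the use of scikit-learn are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # toy dataset: Y = X1 AND X2
y = [0, 0, 0, 1]

clf = DecisionTreeClassifier().fit(X, y)
t = clf.tree_

def paths(node=0, conds=()):
    """Yield (antecedent, class) for every root-to-leaf path."""
    if t.children_left[node] == -1:            # leaf node
        yield conds, clf.classes_[t.value[node][0].argmax()]
        return
    f, thr = t.feature[node], t.threshold[node]
    yield from paths(t.children_left[node], conds + ((f, "<=", thr),))
    yield from paths(t.children_right[node], conds + ((f, ">", thr),))

for conds, label in paths():
    ante = " AND ".join(f"X{f + 1} {op} {thr:.1f}" for f, op, thr in conds)
    print(f"{ante} => Y = {label}")
```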

Sequential covering

  • S________-and-c________ (c.f. divide-and-conquer):
    1. Learn a good rule.
    2. Remove covered instances and repeat step 1 until all instances are covered.
  • How to learn a good rule?
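
Before answering that, the outer loop itself can be written down as a minimal skeleton (plain Python; `learn_one_rule` and `covers` are placeholders, not library functions):

```python
# Skeleton of sequential covering (separate-and-conquer).
# learn_one_rule is a placeholder for any single-rule learner,
# e.g. the PART or confidence-greedy learners sketched below.
# Assumes each learned rule covers at least one remaining instance.
def sequential_covering(instances, learn_one_rule, covers):
    rules = []
    while instances:                       # until every instance is covered
        rule = learn_one_rule(instances)   # step 1: learn a good rule
        rules.append(rule)
        # step 2: separate out the covered instances, then repeat step 1
        instances = [x for x in instances if not covers(rule, x)]
    return rules
```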

PART (partial tree) decision list

  • PART (partial tree) decision list:
    1. Build a new decision tree (by C4.5) and extract the rule that maximizes coverage: the fraction of instances satisfying the antecedent.
    2. Remove covered instances and repeat 1 until all instances are covered.
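
A sketch of one PART iteration under two assumptions not in the notes: rules are `(antecedent_dict, label)` pairs, e.g. as read off the path-walking sketch above, and the C4.5 tree-building step is left abstract:

```python
# Sketch of one PART iteration: among the rules read off a freshly
# built (partial) tree, keep the one with maximum coverage and drop
# the instances it covers. build_tree_rules is a placeholder for the
# C4.5 step and is assumed to return at least one rule.
def coverage(antecedent, X):
    """Fraction of instances satisfying every test in the antecedent."""
    return sum(all(x[f] == v for f, v in antecedent.items())
               for x in X) / len(X)

def part_step(X, y, build_tree_rules):
    rules = build_tree_rules(X, y)          # paths of a new decision tree
    best = max(rules, key=lambda r: coverage(r[0], X))
    keep = [i for i, x in enumerate(X)
            if not all(x[f] == v for f, v in best[0].items())]
    return best, [X[i] for i in keep], [y[i] for i in keep]
```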

Example

  1. Rule 1: ________________

    1. $\R{X}_1 = 0 \Rightarrow \R{Y} = 0 \quad (\text{coverage: } \underline{\phantom{xxx}}\%)$
    2. $\R{X}_1 = 1, \R{X}_2 = 0 \Rightarrow \R{Y} = 0 \quad (\text{coverage: } \underline{\phantom{xxx}}\%)$
    3. $\R{X}_1 = 1, \R{X}_2 = 1 \Rightarrow \R{Y} = 1 \quad (\text{coverage: } \underline{\phantom{xxx}}\%)$
  2. Rule 2: ________________

    1. $\R{X}_2 = 0 \Rightarrow \R{Y} = 0 \quad (\text{coverage: } \underline{\phantom{xxx}}\%)$
    2. $\R{X}_2 = 1 \Rightarrow \R{Y} = 1 \quad (\text{coverage: } \underline{\phantom{xxx}}\%)$
  3. Default rule: $\R{Y} = \underline{\phantom{xxx}}$

  4. Issue: [Time complexity] _______________________________________
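
The coverage percentages above are left blank because they depend on the class distribution; on a hypothetical dataset with exactly one instance per (X1, X2) combination, they work out as in this check:

```python
# Coverage check on a hypothetical dataset with one instance per
# (X1, X2) combination; the notes leave the real distribution blank.
data = [{"X1": 0, "X2": 0}, {"X1": 0, "X2": 1},
        {"X1": 1, "X2": 0}, {"X1": 1, "X2": 1}]

def coverage(antecedent):
    return 100 * sum(all(x[f] == v for f, v in antecedent.items())
                     for x in data) / len(data)

print(coverage({"X1": 0}))             # X1=0       => Y=0: 50.0 %
print(coverage({"X1": 1, "X2": 0}))    # X1=1, X2=0 => Y=0: 25.0 %
print(coverage({"X1": 1, "X2": 1}))    # X1=1, X2=1 => Y=1: 25.0 %
```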

Generating rules directly

  1. Start with ZeroR and add conjuncts to improve confidence: the fraction of covered instances that are correctly classified (see the sketch after this list).

    • Rule 1: $\R{Y} = 0$
      • Confidence: $\underline{\phantom{xxx}}\%$
    • Rule 1 (refined): $\R{X}_1 = 0 \Rightarrow \R{Y} = 0$
      • Confidence: $\underline{\phantom{xxx}}\%$
  2. Repeatedly add new rules to cover remaining tuples

    • Rule 2: $\R{Y} = 0$
      • Confidence: $\underline{\phantom{xxx}}\%$
    • Rule 2 (refined): $\R{X}_2 = 0 \Rightarrow \R{Y} = 0$
      • Confidence: $\underline{\phantom{xxx}}\%$
    • Default rule: $\R{Y} = \underline{\phantom{xxx}}$
  • Decision list
    1. Rule 1: $\R{X}_1 = 0 \Rightarrow \R{Y} = 0$
    2. Rule 2: $\R{X}_2 = 0 \Rightarrow \R{Y} = 0$
    3. Default rule: $\R{Y} = 1$
  • Is the list best possible? Yes/No
    1. Time to detect positive class: $\underline{\phantom{xxx}}$
    2. Length of the list: $\underline{\phantom{xxx}}$
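
A minimal sketch of this greedy growth, assuming binary features and rows stored as dicts with the label under key "Y" (both assumptions, not from the notes):

```python
# Greedy rule growth: start from an empty antecedent (ZeroR-style)
# and keep adding the single test that most improves confidence.
# Assumes binary features and rows as dicts with the label under "Y".
def confidence(antecedent, data, label):
    covered = [x for x in data
               if all(x[f] == v for f, v in antecedent.items())]
    if not covered:
        return 0.0
    return sum(x["Y"] == label for x in covered) / len(covered)

def grow_rule(data, label, features):
    antecedent = {}
    while True:
        candidates = [(f, v) for f in features if f not in antecedent
                      for v in (0, 1)]
        if not candidates:
            return antecedent
        f, v = max(candidates,
                   key=lambda c: confidence({**antecedent, c[0]: c[1]},
                                            data, label))
        # stop when no conjunct strictly improves confidence
        if confidence({**antecedent, f: v}, data, label) <= \
                confidence(antecedent, data, label):
            return antecedent
        antecedent[f] = v
```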

Class-based ordering

  • Learn rules for positive class first:
    1. Rule 1:
      1. $\R{Y} = 1 \quad (\text{confidence: } \underline{\phantom{xxx}}\%)$
      2. $\R{X}_1 = \underline{\phantom{xxx}} \Rightarrow \R{Y} = 1 \quad (\text{confidence: } \underline{\phantom{xxx}}\%)$
      3. $\R{X}_1 = \underline{\phantom{xxx}}, \R{X}_2 = \underline{\phantom{xxx}} \Rightarrow \R{Y} = 1 \quad (\text{confidence: } \underline{\phantom{xxx}}\%)$
    2. Default rule: $\R{Y} = \underline{\phantom{xxx}}$
  • Will the above guarantee a short decision list in general? Yes/No because $\underline{\phantom{xxx}}$
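
Schematically, class-based ordering only grows rules for the positive class and lets a default rule absorb the rest (a sketch; `grow_rule` and `covers` are placeholders, as before):

```python
# Sketch of class-based ordering: grow rules for the positive class
# until all positives are covered, then append a default rule.
# grow_rule and covers are placeholders, as in the earlier sketches.
def class_based_list(instances, grow_rule, covers):
    rules = []
    while any(x["Y"] == 1 for x in instances):
        rule = grow_rule(instances, label=1)    # rules predict Y = 1 only
        rules.append(rule)
        instances = [x for x in instances if not covers(rule, x)]
    rules.append("default: Y = 0")              # everything left is Y = 0
    return rules
```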

First Order Inductive Learner (FOIL) gain

  • Add the conjunct that maximizes
    \begin{align} \op{FOIL\_Gain} &= p' \left( \log \frac{p'}{p' + n'} - \log \frac{p}{p + n} \right) \end{align}
    • Change in the number of positives: $p \rightarrow p'$
    • Change in the number of negatives: $n \rightarrow n'$
  • $\R{Y} = 1 \;\rightarrow\; \R{X}_1 = 0 \Rightarrow \R{Y} = 1$:

    $\op{FOIL\_Gain} = \underline{\phantom{\kern11em}}$

  • $\R{Y} = 1 \;\rightarrow\; \R{X}_1 = 1 \Rightarrow \R{Y} = 1$:

    $\op{FOIL\_Gain} = \underline{\phantom{\kern11em}}$

  • First/Second is better.

  • $\R{X}_1 = 1 \Rightarrow \R{Y} = 1 \;\rightarrow\; \R{X}_1 = 1, \R{X}_2 = 0 \Rightarrow \R{Y} = 1$:

    $\op{FOIL\_Gain} = \underline{\phantom{\kern11em}}$

  • $\R{X}_1 = 1 \Rightarrow \R{Y} = 1 \;\rightarrow\; \R{X}_1 = 1, \R{X}_2 = 1 \Rightarrow \R{Y} = 1$:

    $\op{FOIL\_Gain} = \underline{\phantom{\kern11em}}$

  • First/Second is better.

\begin{align}
\op{FOIL\_Gain} &= p' \left( \log \frac{p'}{p' + n'} - \log \frac{p}{p + n} \right)\\
&= \underbrace{(p' + n')}_{\text{(1)}} \underbrace{\frac{p'}{p' + n'}}_{\text{(2)}} \underbrace{\left( \log \frac{p'}{p' + n'} - \log \frac{p}{p + n} \right)}_{\text{(3)}}
\end{align}
  • Heuristics:

    • (1) favors rules with large coverage/confidence.
    • (2)×(3) favors rules with large coverage/confidence given the same coverage/confidence.
    • (3) ensures FOIL_Gain\op{FOIL\_Gain} is positive if coverage/confidence increases.
  • [Challenge] Why not use information gain or gain ratio?
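
A direct transcription of the formula above (log base 2, the usual FOIL convention, is an assumption here since the notes write only log):

```python
# FOIL_Gain exactly as in the formula above; p, n are the positives
# and negatives covered before refining, p2, n2 after. Assumes p2 > 0
# so that the logarithm is defined.
from math import log2

def foil_gain(p, n, p2, n2):
    return p2 * (log2(p2 / (p2 + n2)) - log2(p / (p + n)))
```

Comparing two candidate refinements, as in the blanks above, amounts to calling this once per candidate and keeping the larger value.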

How to avoid overfitting?

  • Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
  • After each new rule, eliminate a conjunct (starting with the most recently added one) if it improves the following on a v_________ set:
    $\op{FOIL\_Prune} = \frac{p - n}{p + n}$
    or equivalently reduces
    $\op{error} = \frac{n}{p + n}.$
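
A sketch of the pruning step, with the same dict-based row representation assumed earlier; a rule is a list of (feature, value) conjuncts with the most recently added one last:

```python
# RIPPER-style pruning sketch: drop the most recently added conjunct
# while doing so improves FOIL_Prune on a held-out pruning set.
# Rows are dicts with the label under "Y", as in earlier sketches.
def foil_prune(conjuncts, prune_set, label):
    covered = [x for x in prune_set
               if all(x[f] == v for f, v in conjuncts)]
    if not covered:
        return float("-inf")
    p = sum(x["Y"] == label for x in covered)
    n = len(covered) - p
    return (p - n) / (p + n)

def prune(conjuncts, prune_set, label):
    while len(conjuncts) > 1 and \
            foil_prune(conjuncts[:-1], prune_set, label) > \
            foil_prune(conjuncts, prune_set, label):
        conjuncts = conjuncts[:-1]     # eliminate the last-added conjunct
    return conjuncts
```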
