Frequent-Pattern Analysis

load("datamining.mac")$

Apriori algorithm¶

Transactional data¶

A transactional data set is a list of transactions in the form of subsets of items purchased. all_items(data) returns the list of all items in a transactional data set data.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ]
    ],
    all_items(data)
);

Frequent itemsets¶

The support count of an item set (a set of items) is the number of transactions that contain all items in the item set:

support_count('data, 'A);

where $\chi(p)$ is the indicator function of the predicate $p$ , and subsetp(A, T) returns true iff A is a subset of T:

chi('p);

describe(subsetp)$

support_counts(data, C) returns the list of [A,c] where A is an itemset in C, and c is the support count.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        C
    ],
    C: makelist({i},i,all_items(data)),
    support_counts(data, C)
);

frequent_itemsets(C, min_sup) filters the list of [A,c] to remove itemsets A with count c strictly below min_sup.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2,
        C
    ],
    C: support_counts(data, makelist({i},i,all_items(data))),
    frequent_itemsets(C, min_sup)
);

Frequent 1-itemsets¶

apriori1(data, min_sup) implements the apriori algorithm to generate the frequent 1-itemsets of data with support counts at least min_sup.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2
    ],
    apriori1(data, min_sup)
);

Frequent $k$ -itemsets¶

Given a transactional data set data and a list L of frequent $(k-1)$ -itemsets with $k>1$ , apriorik(data, L, min_sup) generates the list of [A,c] where A is a frequent $k$ -itemset of data with support count c at least min_sup.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2,
        L_1
    ],
    L_1: apriori1(data, min_sup),
    L_2: apriorik(data, L_1, min_sup),
    L_3: apriorik(data, L_2, min_sup),
    map(lambda([k, L], build_data_from_list([sconcat(k,"-itemset"), "count"], L)), [2, 3], [L_2, L_3])
);

apriori(data, min_sup) generates the list of frequent $k$ -itemsets for $k$ from 1 until there are no more frequent itemsets.

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2,
        L
    ],
    L:apriori(data, min_sup),
    makelist(build_data_from_list([sconcat(length(Lk[1][1]),"-itemset"), "count"], Lk), Lk, L)
);

Join and prune steps¶

apriori_join and apriori_prune implement the join and prune steps, respectively:

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2,
        L_1
    ],
    L_1: apriori1(data, min_sup),
    L_2: apriorik(data, L_1, min_sup),
    C:apriori_join(data, L_2),
    [C, apriori_prune(data, C, setify(map(first,L_2)))]
);

Association rule mining¶

Association rules¶

An association rule can be created as follows:

block(
    [
        R:ar('A,'B)
    ],
    build_data_from_list(
        ["rule", "antecedent", "consequence"],
        [[R, ar_A(R), ar_B(R)]]
    )
);

ar(A,B) creates an association rule (A ⇒ B).
ar_A(R) returns the itemset associated with the antecedent of R.
ar_B(R) returns the itemset associated with the consequence of R.

Rule qualities¶

The following computes various qualities of an association rule from transaction data:

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        min_sup:2,
        R: ar({1,2},{3})
    ],
    [coverage(data, R), support(data, R), confidence(data, R), prior(data, R), lift(data, R)] 
);

Support-confidence framework¶

Association rules can be generated using the support-confidence framework as follows:

block(
    [
        data: [
            {    3,2,1},
            {    3,2,1},
            {  4,  2,1},
            {  4,  2,1},
            {5,  3,  1}
        ],
        c:6/10, s:4/10
    ],
    lst:support_confidence_framework(data, s, c),
    build_data_from_list(
        ["rule", "coverage", "support", "confidence", "prior", "lift"],
        sort(
            lst, 
            lambda([a,b], a[4]>b[4] or (a[4]=b[4] and a[6]>b[6])) /* descending order in (confidence, lift) */
        )
    )
);

Data cube computation¶

Bottom-up construction¶

BUC(data, dims, fact, min_val) implements the bottom-up construction of the iceberg cube where

data is the base cuboid,
dims is the names of the dimensions,
fact is the name of the fact, and
min_val being the minimum value of fact required by the iceberg condition.

block(
    [
        fns: ['A, 'B, 'C, "fact"],
        lst: [
            ['a_2, 'b_2, 'c_2, 1],
            ['a_1, 'b_2, 'c_1, 1],
            ['a_2, 'b_2, 'c_1, 1],
            ['a_1, 'b_1, 'c_1, 1]
        ],
        dims, fact, data
    ],
    dims: rest(fns,-1),
    fact: last(fns),
    data: build_data_from_list(fns, lst),
    BUC(data, dims, fact, 2)
);