load("datamining.mac")$Apriori algorithm¶
Transactional data¶
A transactional data set is a list of transactions in the form of subsets of items purchased. all_items(data) returns the list of all items in a transactional data set data.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
]
],
all_items(data)
);Frequent itemsets¶
The support count of an item set (a set of items) is the number of transactions that contain all items in the item set:
support_count('data, 'A);where is the indicator function of the predicate , and subsetp(A, T) returns true iff A is a subset of T:
chi('p);describe(subsetp)$support_counts(data, C) returns the list of [A,c] where A is an itemset in C, and c is the support count.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
C
],
C: makelist({i},i,all_items(data)),
support_counts(data, C)
);frequent_itemsets(C, min_sup) filters the list of [A,c] to remove itemsets A with count c strictly below min_sup.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
C
],
C: support_counts(data, makelist({i},i,all_items(data))),
frequent_itemsets(C, min_sup)
);Frequent 1-itemsets¶
apriori1(data, min_sup) implements the apriori algorithm to generate the frequent 1-itemsets of data with support counts at least min_sup.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2
],
apriori1(data, min_sup)
);Frequent -itemsets¶
Given a transactional data set data and a list L of frequent -itemsets with , apriorik(data, L, min_sup) generates the list of [A,c] where A is a frequent -itemset of data with support count c at least min_sup.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L_1
],
L_1: apriori1(data, min_sup),
L_2: apriorik(data, L_1, min_sup),
L_3: apriorik(data, L_2, min_sup),
map(lambda([k, L], build_data_from_list([sconcat(k,"-itemset"), "count"], L)), [2, 3], [L_2, L_3])
);apriori(data, min_sup) generates the list of frequent -itemsets for from 1 until there are no more frequent itemsets.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L
],
L:apriori(data, min_sup),
makelist(build_data_from_list([sconcat(length(Lk[1][1]),"-itemset"), "count"], Lk), Lk, L)
);Join and prune steps¶
apriori_join and apriori_prune implement the join and prune steps, respectively:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L_1
],
L_1: apriori1(data, min_sup),
L_2: apriorik(data, L_1, min_sup),
C:apriori_join(data, L_2),
[C, apriori_prune(data, C, setify(map(first,L_2)))]
);Association rule mining¶
Association rules¶
An association rule can be created as follows:
block(
[
R:ar('A,'B)
],
build_data_from_list(
["rule", "antecedent", "consequence"],
[[R, ar_A(R), ar_B(R)]]
)
);ar(A,B)creates an association rule(A ⇒ B).ar_A(R)returns the itemset associated with the antecedent ofR.ar_B(R)returns the itemset associated with the consequence ofR.
Rule qualities¶
The following computes various qualities of an association rule from transaction data:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
R: ar({1,2},{3})
],
[coverage(data, R), support(data, R), confidence(data, R), prior(data, R), lift(data, R)]
);Support-confidence framework¶
Association rules can be generated using the support-confidence framework as follows:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
c:6/10, s:4/10
],
lst:support_confidence_framework(data, s, c),
build_data_from_list(
["rule", "coverage", "support", "confidence", "prior", "lift"],
sort(
lst,
lambda([a,b], a[4]>b[4] or (a[4]=b[4] and a[6]>b[6])) /* descending order in (confidence, lift) */
)
)
);Data cube computation¶
Bottom-up construction¶
BUC(data, dims, fact, min_val) implements the bottom-up construction of the iceberg cube where
datais the base cuboid,dimsis the names of the dimensions,factis the name of the fact, andmin_valbeing the minimum value of fact required by the iceberg condition.
block(
[
fns: ['A, 'B, 'C, "fact"],
lst: [
['a_2, 'b_2, 'c_2, 1],
['a_1, 'b_2, 'c_1, 1],
['a_2, 'b_2, 'c_1, 1],
['a_1, 'b_1, 'c_1, 1]
],
dims, fact, data
],
dims: rest(fns,-1),
fact: last(fns),
data: build_data_from_list(fns, lst),
BUC(data, dims, fact, 2)
);