load("datamining.mac")$
Apriori algorithm¶
Transactional data¶
A transactional data set is a list of transactions in the form of subsets of items purchased. all_items(data)
returns the list of all items in a transactional data set data
.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
]
],
all_items(data)
);
Frequent itemsets¶
The support count of an item set (a set of items) is the number of transactions that contain all items in the item set:
support_count('data, 'A);
where is the indicator function of the predicate , and subsetp(A, T)
returns true iff A
is a subset of T
:
chi('p);
describe(subsetp)$
support_counts(data, C)
returns the list of [A,c]
where A
is an itemset in C
, and c
is the support count.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
C
],
C: makelist({i},i,all_items(data)),
support_counts(data, C)
);
frequent_itemsets(C, min_sup)
filters the list of [A,c]
to remove itemsets A
with count c
strictly below min_sup
.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
C
],
C: support_counts(data, makelist({i},i,all_items(data))),
frequent_itemsets(C, min_sup)
);
Frequent 1-itemsets¶
apriori1(data, min_sup)
implements the apriori algorithm to generate the frequent 1-itemsets of data
with support counts at least min_sup
.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2
],
apriori1(data, min_sup)
);
Frequent -itemsets¶
Given a transactional data set data
and a list L
of frequent -itemsets with , apriorik(data, L, min_sup)
generates the list of [A,c]
where A
is a frequent -itemset of data
with support count c
at least min_sup
.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L_1
],
L_1: apriori1(data, min_sup),
L_2: apriorik(data, L_1, min_sup),
L_3: apriorik(data, L_2, min_sup),
map(lambda([k, L], build_data_from_list([sconcat(k,"-itemset"), "count"], L)), [2, 3], [L_2, L_3])
);
apriori(data, min_sup)
generates the list of frequent -itemsets for from 1 until there are no more frequent itemsets.
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L
],
L:apriori(data, min_sup),
makelist(build_data_from_list([sconcat(length(Lk[1][1]),"-itemset"), "count"], Lk), Lk, L)
);
Join and prune steps¶
apriori_join
and apriori_prune
implement the join and prune steps, respectively:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
L_1
],
L_1: apriori1(data, min_sup),
L_2: apriorik(data, L_1, min_sup),
C:apriori_join(data, L_2),
[C, apriori_prune(data, C, setify(map(first,L_2)))]
);
Association rule mining¶
Association rules¶
An association rule can be created as follows:
block(
[
R:ar('A,'B)
],
build_data_from_list(
["rule", "antecedent", "consequence"],
[[R, ar_A(R), ar_B(R)]]
)
);
ar(A,B)
creates an association rule(A ⇒ B)
.ar_A(R)
returns the itemset associated with the antecedent ofR
.ar_B(R)
returns the itemset associated with the consequence ofR
.
Rule qualities¶
The following computes various qualities of an association rule from transaction data:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
min_sup:2,
R: ar({1,2},{3})
],
[coverage(data, R), support(data, R), confidence(data, R), prior(data, R), lift(data, R)]
);
Support-confidence framework¶
Association rules can be generated using the support-confidence framework as follows:
block(
[
data: [
{ 3,2,1},
{ 3,2,1},
{ 4, 2,1},
{ 4, 2,1},
{5, 3, 1}
],
c:6/10, s:4/10
],
lst:support_confidence_framework(data, s, c),
build_data_from_list(
["rule", "coverage", "support", "confidence", "prior", "lift"],
sort(
lst,
lambda([a,b], a[4]>b[4] or (a[4]=b[4] and a[6]>b[6])) /* descending order in (confidence, lift) */
)
)
);
Data cube computation¶
Bottom-up construction¶
BUC(data, dims, fact, min_val)
implements the bottom-up construction of the iceberg cube where
data
is the base cuboid,dims
is the names of the dimensions,fact
is the name of the fact, andmin_val
being the minimum value of fact required by the iceberg condition.
block(
[
fns: ['A, 'B, 'C, "fact"],
lst: [
['a_2, 'b_2, 'c_2, 1],
['a_1, 'b_2, 'c_1, 1],
['a_2, 'b_2, 'c_1, 1],
['a_1, 'b_1, 'c_1, 1]
],
dims, fact, data
],
dims: rest(fns,-1),
fact: last(fns),
data: build_data_from_list(fns, lst),
BUC(data, dims, fact, 2)
);