Main Page | Namespace List | Class Hierarchy | Class List | File List | Class Members | File Members

An efficient implemenation of APRIORI algorithm

This program is a very efficient implementation of APRIORI algorithm proposed by Rakesh Agrawal and Ramakrishnan Srikant. APRIORI is the most basic and well-known algorithm to find frequent itemsets in a transactional database.

Frequent Itemset Mining problem

A transactional database consists of sequence of transaction: $T=\langle t_1,\ldots ,t_n\rangle $. A transaction is a set of items ($t_i\in I$). Transactions are often called baskets, referring to the primary application domain (i.e. market-basket analysis). A set of items is often called itemset by the data mining community. The (absolute) support or the occurrence of $X$ (denoted by $supp(X)$) is the number of transactions that are supersets of $X$ (i.e. that contain $X$). The realtive support is the absolute support divided by the number of transactions (i.e. n). An itemset is frequent if its support is greater or equal than a threshold value.

In the frequent itemset mining problem a transaction database and a relative support threshold (traditionally denoted by min_supp) is given and we have to find all frequent itemsets.

Association Rule Mining problem

This program is also capable of mining association rules. An association rule is like an implication: $X\to Y $ means that if itemset X occurs in a transaction, than itemset Y also occurs with high probability. This probability is given by the confidence of the rule. It is like an approxiamtion of p(Y|X), it is the number of transactions that contain both X and Y divided by the number of transaction that contain X, thus $conf(X\to Y)=\frac{supp(X\cup Y)}{supp(X)}$. The relative support of the association rule $X\to Y $ is the support of itemset $X \cup Y $. The lift of $X\to Y $ tries to capture the independence of the antecedent and the consequent of the rule: $lift(X\to Y)=\frac{supp(X\cup Y)}{supp(X)supp(Y)}$ An association rule is valid if its confidence, support and lift are greater than or equal than corresponding threshold values.

In the association rule mining problem a transaction database and a relative support threshold (traditionally denoted by min_supp), a confidence threshold (traditionally denoted by min_conf), and a lift threshold is given and we have to find all valid association rules.


Generated on Tue Mar 2 18:12:10 2004 for APRIORI algorithm by doxygen 1.3.5