Apriori Class Reference

This class implements the APRIORI algorithm.

#include <Apriori.hpp>

Public Member Functions

 Apriori (const bool &quiet, const bool &store_input, const int &trie_type=1, const int &child_threshold=5)
void APRIORI_alg (ofstream &outcomefile, const char *basket_filename, const double &min_supp, const double &min_conf)
 This procedure implements the APRIORI algorithm.


Private Member Functions

void read_in_a_line (FILE *filepoint)
 Reads in one transaction from the datafile.

void support (FILE *filepoint, const itemtype &candidate_size)
 Determines the support of the candidates of the given size.


Private Attributes

Trie * trie
vector< itemtype > basket
unsigned long basket_number
map< vector< itemtype >, unsigned long > reduced_baskets
bool quiet
bool store_input


Detailed Description

This class implements the APRIORI algorithm.

APRIORI is a levelwise algorithm that scans the transaction database several times. The first scan finds the frequent 1-itemsets and, in general, the $k^{th}$ scan extracts the frequent k-itemsets. The method does not determine the support of every possible itemset. To narrow the search space, it generates candidate itemsets before every pass: an itemset becomes a candidate if every proper subset of it is frequent. Every frequent itemset is necessarily a candidate, so only the support of candidates needs to be calculated. After the $k^{th}$ scan, the frequent k-itemsets are used to generate the candidate (k+1)-itemsets.

After all the candidate (k+1)-itemsets have been generated, the transactions are scanned again and the exact support of each candidate is determined. Candidates whose support falls below the threshold are discarded. The algorithm ends when no candidates can be generated.
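As a minimal sketch of one such scan, assuming the transactions are already held in memory as sorted vectors of integer item ids (the class itself reads them from a file), the following counts each candidate and discards those below an absolute threshold. The names `Itemset` and `count_and_filter` are illustrative and are not part of Apriori.hpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Assumption: an itemset is a sorted, duplicate-free vector of item ids.
using Itemset = std::vector<int>;

// One pass over the transactions: count how many transactions contain each
// candidate, then keep only the candidates that reach min_count.
// Hypothetical helper, not the library's support() member.
std::map<Itemset, std::size_t> count_and_filter(
    const std::vector<Itemset>& candidates,
    const std::vector<Itemset>& transactions,
    std::size_t min_count) {
  std::map<Itemset, std::size_t> support;
  for (const Itemset& t : transactions) {
    for (const Itemset& c : candidates) {
      // std::includes requires both ranges to be sorted.
      if (std::includes(t.begin(), t.end(), c.begin(), c.end())) {
        ++support[c];
      }
    }
  }
  // Throw away the candidates with low support.
  for (auto it = support.begin(); it != support.end();) {
    if (it->second < min_count) {
      it = support.erase(it);
    } else {
      ++it;
    }
  }
  return support;
}
```

Testing every candidate against every transaction is the simplest way to express the scan, but it is also the slowest. It is shown only to make the counting and filtering steps concrete.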

The intuition behind candidate generation is based on the following simple fact:

Every subset of a frequent itemset is frequent.

This is immediate, because if a transaction t supports an itemset X, then t supports every subset $Y\subseteq X$.
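The same argument can be written in symbols. Here $D$ (the transaction database) and $\mathrm{supp}(\cdot)$ (the number of transactions supporting an itemset) are notation assumed for this sketch:

$$Y \subseteq X \;\Rightarrow\; \{t \in D : X \subseteq t\} \subseteq \{t \in D : Y \subseteq t\} \;\Rightarrow\; \mathrm{supp}(Y) \ge \mathrm{supp}(X).$$

So if $\mathrm{supp}(X)$ reaches the support threshold, $\mathrm{supp}(Y)$ does as well.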

Using this fact in its contrapositive form, we infer that an itemset with an infrequent subset cannot be frequent. So in APRIORI only those itemsets whose every proper subset is frequent become candidates. The frequent k-itemsets are available when we generate the candidate (k+1)-itemsets. The algorithm seeks candidate (k+1)-itemsets among the sets that are unions of two frequent k-itemsets. After forming such a union X, we need to verify that all of its proper subsets are frequent; otherwise it should not be a candidate. To this end, it is enough to check that all the k-subsets of X are frequent: every smaller subset of X is contained in some k-subset, and every subset of a frequent itemset is frequent.
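The join and prune steps described above can be sketched as follows. This is an illustration on `std::set`, assuming all input itemsets have the same size k and are stored as sorted vectors. The class itself generates candidates on its trie, and `generate_candidates` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Assumption: an itemset is a sorted, duplicate-free vector of item ids.
using Itemset = std::vector<int>;

// Build the candidate (k+1)-itemsets from the frequent k-itemsets.
// Hypothetical helper; assumes every itemset in frequent_k has size k.
std::set<Itemset> generate_candidates(const std::set<Itemset>& frequent_k) {
  std::set<Itemset> candidates;
  for (auto a = frequent_k.begin(); a != frequent_k.end(); ++a) {
    for (auto b = std::next(a); b != frequent_k.end(); ++b) {
      // Join step: the union of two k-itemsets has k+1 items only if the
      // two sets differ in exactly one item.
      Itemset joined;
      std::set_union(a->begin(), a->end(), b->begin(), b->end(),
                     std::back_inserter(joined));
      if (joined.size() != a->size() + 1) continue;

      // Prune step: every k-subset (drop one item) must be frequent.
      bool all_frequent = true;
      for (std::size_t skip = 0; skip < joined.size() && all_frequent; ++skip) {
        Itemset subset = joined;
        subset.erase(subset.begin() + static_cast<std::ptrdiff_t>(skip));
        all_frequent = frequent_k.count(subset) > 0;
      }
      if (all_frequent) candidates.insert(joined);
    }
  }
  return candidates;
}
```

For example, from the frequent 2-itemsets {1,2}, {1,3}, {2,3}, {2,4} only {1,2,3} survives: {1,2,4} is pruned because {1,4} is not frequent, and {2,3,4} is pruned because {3,4} is not frequent.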

Next the supports of the candidates are calculated. This is done by reading the transactions one by one. For each transaction t the algorithm decides which candidates are supported by t. To solve this task efficiently, the original APRIORI uses a hash-tree. This implementation uses a trie (prefix-tree) instead. A trie has several advantages over a hash-tree:

  1. It is faster.
  2. It needs no parameters (the main drawback of a hash-tree is that its performance is very sensitive to its parameters).
  3. Candidate generation is very simple.
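To illustrate how a trie decides which candidates a transaction supports, here is a minimal sketch. It assumes all candidates have the same size and that itemsets and transactions are sorted vectors of integer item ids. `TrieNode`, `insert_candidate`, `count_transaction` and `support_of` are hypothetical names; the library's own `Trie` class is not documented on this page and may be organised differently.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Assumption: an itemset is a sorted, duplicate-free vector of item ids.
using Itemset = std::vector<int>;

// Hypothetical minimal trie node: each edge is labelled with an item, and the
// path from the root to a node spells out an itemset.
struct TrieNode {
  std::map<int, std::unique_ptr<TrieNode>> children;
  unsigned long counter = 0;  // support of the itemset ending at this node
};

// Add one candidate to the trie, creating nodes along its path as needed.
void insert_candidate(TrieNode& root, const Itemset& candidate) {
  TrieNode* node = &root;
  for (int item : candidate) {
    std::unique_ptr<TrieNode>& child = node->children[item];
    if (!child) child.reset(new TrieNode());
    node = child.get();
  }
}

// Increment the counter of every candidate of size `remaining` (measured from
// `node`) that is contained in transaction t, using items t[start..].
void count_transaction(TrieNode& node, const Itemset& t, std::size_t start,
                       std::size_t remaining) {
  if (remaining == 0) {
    ++node.counter;
    return;
  }
  // Stop as soon as too few items are left to complete a candidate.
  for (std::size_t i = start; i + remaining <= t.size(); ++i) {
    auto it = node.children.find(t[i]);
    if (it != node.children.end()) {
      count_transaction(*it->second, t, i + 1, remaining - 1);
    }
  }
}

// Look up the counter of an itemset; returns 0 if it is not in the trie.
unsigned long support_of(const TrieNode& root, const Itemset& itemset) {
  const TrieNode* node = &root;
  for (int item : itemset) {
    auto it = node->children.find(item);
    if (it == node->children.end()) return 0;
    node = it->second.get();
  }
  return node->counter;
}
```

Because the trie and the transaction are both ordered, a single recursive descent visits only the paths that the transaction can actually match, instead of testing each candidate separately.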


Constructor & Destructor Documentation

Apriori::Apriori ( const bool &  quiet,
const bool &  store_input,
const int &  trie_type = 1,
const int &  child_threshold = 5
)


Member Function Documentation

void Apriori::APRIORI_alg ( ofstream &  outcomefile,
const char *  basket_filename,
const double &  min_supp,
const double &  min_conf
)

This procedure implements the APRIORI algorithm.

Parameters:
outcomefile The file the output will be written to.
basket_filename The name of the datafile that contains the transactions.
min_supp The relative support threshold.
min_conf The confidence threshold for association rules. If min_conf=0, no association rules will be extracted.
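The two thresholds can be made concrete with a short sketch. It assumes that a relative threshold is a fraction in [0, 1] of the number of transactions and that it is converted to an absolute count by rounding up; this page does not say how the implementation performs the conversion. The confidence of a rule X -> Y is the standard ratio supp(X ∪ Y) / supp(X). Both function names are hypothetical.

```cpp
#include <cmath>

// Assumption: min_supp is a fraction of the number of baskets, and the
// absolute threshold is obtained by rounding up. The library may differ.
unsigned long min_support_count(double min_supp, unsigned long basket_number) {
  return static_cast<unsigned long>(
      std::ceil(min_supp * static_cast<double>(basket_number)));
}

// Confidence of the rule X -> Y: the fraction of transactions supporting X
// that also support Y. Returns 0 when X is supported by no transaction.
double confidence(unsigned long supp_xy, unsigned long supp_x) {
  if (supp_x == 0) return 0.0;
  return static_cast<double>(supp_xy) / static_cast<double>(supp_x);
}
```

A rule would then be reported when `confidence(...)` is at least min_conf.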

void Apriori::read_in_a_line ( FILE *  filepoint  )  [private]
 

Reads in one transaction from the datafile.

void Apriori::support ( FILE *  filepoint,
const itemtype &  candidate_size
)  [private]
 

Determines the support of the candidates of the given size.


Member Data Documentation

vector<itemtype> Apriori::basket [private]
 

unsigned long Apriori::basket_number [private]
 

bool Apriori::quiet [private]
 

map<vector<itemtype>, unsigned long> Apriori::reduced_baskets [private]
 

bool Apriori::store_input [private]
 

Trie* Apriori::trie [private]
 


The documentation for this class was generated from the following files:
Generated on Tue Mar 2 18:12:10 2004 for APRIORI algorithm by doxygen 1.3.5