#include <Apriori.hpp>
Collaboration diagram for Apriori< C, D, MAIN_DATA_STRUCTURE, SEC_DATA_STRUCTURE, FII, FPI, CG, IR, SUPP_C >:
Public Member Functions | |
Apriori (MAIN_DATA_STRUCTURE &data_structure, SEC_DATA_STRUCTURE &s_alloc, IR &infrequent_remover, C &coder, D &decoder, FII &fii) | |
void | findFrequentItemsets (const counter_t emptyset_support, std::vector< counter_t > &freq_counters, std::vector< std::pair< counter_t, std::pair< item_t, item_t > > > &freq_pairs_with_counters, const counter_t min_supp, unsigned int maxsize=largest_itemsetsize) |
This method finds the frequent itemsets. | |
Private Attributes | |
MAIN_DATA_STRUCTURE & | data_structure |
the data strucure that stores candidate. | |
SEC_DATA_STRUCTURE & | s_alloc |
a secondary data strucure; in general it is an allocator | |
IR & | infrequent_remover |
the infrequent remover | |
C & | coder |
The coder. | |
D & | decoder |
The decoder. | |
FII & | fii |
unsigned int | candidate_size |
APRIORI is a levelwise algorithm. It scans the transaction database several times. After the first scan the frequent 1-itemsets are found, and in general after the kth scan the frequent k-itemsets are extracted. The method does not determine the support of every possible itemset. In an attempt to narrow the domain to be searched, before every pass it generates candidate itemsets. An itemset becomes a candidate if every subset of it is frequent. Obviously every frequent itemset needs to be candidate too, hence only the support of candidates is calculated. Frequent k-itemsets generate the candidate k+1-itemsets after the kth scan.
After all the candidate (k+1)-itemsets have been generated, a new scan of the transactions is effected and the precise support of the candidates is determined. The candidates with low support are thrown away. The algorithm ends when no candidates can be generated.
The intuition behind candidate generation is based on the following simple fact:
Using the fact indirectly, we infer, that if an itemset has a subset that is infrequent, then it cannot be frequent. So in the algorithm APRIORI only those itemsets will be candidates whose every subset is frequent. The frequent k-itemsets are available when we attempt to generate candidate (k+1)-itemsets. The algorithm seeks candidate k+1-itemsets among the sets which are unions of two frequent k-itemsets. After forming the union we need to verify that all of its subsets are frequent, otherwise it should not be a candidate. To this end, it is clearly enough to check if all the k-subsets of X are frequent.
Next the supports of the candidates are calculated. This is done by reading transactions one by one. For each transaction t the algorithm decides which candidates are supported by t. To solve this task efficiently it is useful to store the candidates in a special datastructure like trie or hash-tree. The template parameters are the following:
C | type of the coder, it has to implement the same function as Coder does. | |
D | type of the decoder, it has to implement interface DecoderBase. | |
MAIN_DATA_STRUCTURE | the main data structure. It is basically a trie (prefix-trie) | |
SEC_DATA_STRUCTURE | the secondary data structure | |
FII | the type of a frequent item inserter (it inserts frequent items into the data structure) | |
FPI | the type of a frequent item inserter (it inserts frequent pairs into the data structure) | |
CG | the type of candidate generator | |
IR | the type of the infrequent remover | |
SUPP_C | the type of the support counter |
Definition at line 84 of file Apriori.hpp.
|
Definition at line 104 of file Apriori.hpp. |
|
This method finds the frequent itemsets.
Definition at line 138 of file Apriori.hpp. |
|
Definition at line 100 of file Apriori.hpp. |
|
The coder.
Definition at line 94 of file Apriori.hpp. |
|
the data strucure that stores candidate.
Definition at line 88 of file Apriori.hpp. |
|
The decoder.
Definition at line 96 of file Apriori.hpp. |
|
Definition at line 98 of file Apriori.hpp. |
|
the infrequent remover
Definition at line 92 of file Apriori.hpp. |
|
a secondary data strucure; in general it is an allocator
Definition at line 90 of file Apriori.hpp. |