#include <Apriori.hpp>
Public Member Functions
Apriori (ifstream &basket_file, const char *output_file_name, const bool store_input)
void APRIORI_alg (const double min_supp, const bool quiet, const countertype size_threshold)
This procedure implements the APRIORI algorithm.
~Apriori ()
Private Member Functions
void support (const itemtype &candidate_size)
Determines the support of the candidates of the given size.
Private Attributes
Apriori_Trie * apriori_trie
A trie that stores the candidates.
Input_Output_Manager input_output_manager
The input_output_manager that is responsible for the input, output and recoding operations.
map< vector< itemtype >, countertype > reduced_baskets
Stores the reduced baskets if store_input = true.
bool store_input
If store_input = true, then the reduced baskets will be stored in memory.
APRIORI is a levelwise algorithm that scans the transaction database several times. The first scan finds the frequent 1-itemsets and, in general, the kth scan extracts the frequent k-itemsets. The method does not determine the support of every possible itemset. To narrow the search space, it generates candidate itemsets before every pass: an itemset becomes a candidate if every proper subset of it is frequent. Every frequent itemset is necessarily a candidate, so it suffices to calculate the support of the candidates only. After the kth scan, the frequent k-itemsets are used to generate the candidate (k+1)-itemsets.
After all the candidate (k+1)-itemsets have been generated, a new scan of the transactions determines the exact support of each candidate. Candidates whose support is below the minimum support threshold are discarded. The algorithm ends when no candidates can be generated.
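The levelwise loop described above can be sketched as follows. This is a minimal illustration, not the code of APRIORI_alg: the names `Itemset`, `count_support` and `levelwise_frequent` are hypothetical, the threshold is assumed to be an absolute transaction count, transactions are assumed to be sorted and duplicate-free, and support is counted naively by rescanning the transactions for each candidate. For brevity, the candidates are simply all unions of two frequent k-itemsets that have k+1 items, without the subset check; this only enlarges the candidate set and does not change the result.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

using Itemset = std::vector<int>;  // assumed: items sorted and unique

// Number of transactions that contain every item of `candidate`.
std::size_t count_support(const std::vector<Itemset>& transactions,
                          const Itemset& candidate) {
    std::size_t n = 0;
    for (const Itemset& t : transactions)
        if (std::includes(t.begin(), t.end(), candidate.begin(), candidate.end()))
            ++n;
    return n;
}

// Levelwise loop: pass k keeps the frequent k-itemsets, which then seed
// the candidates of size k+1. The loop stops when no candidate is left.
std::vector<Itemset> levelwise_frequent(const std::vector<Itemset>& transactions,
                                        std::size_t min_count) {
    // Candidates for the first pass: every distinct item as a 1-itemset.
    std::set<Itemset> candidates;
    for (const Itemset& t : transactions)
        for (int item : t) candidates.insert(Itemset{item});

    std::vector<Itemset> all_frequent;
    for (std::size_t k = 1; !candidates.empty(); ++k) {
        // Support counting pass: keep the candidates with enough support.
        std::vector<Itemset> frequent;
        for (const Itemset& c : candidates)
            if (count_support(transactions, c) >= min_count) frequent.push_back(c);
        all_frequent.insert(all_frequent.end(), frequent.begin(), frequent.end());

        // Next candidates: unions of two frequent k-itemsets with k+1 items.
        candidates.clear();
        for (std::size_t i = 0; i < frequent.size(); ++i)
            for (std::size_t j = i + 1; j < frequent.size(); ++j) {
                Itemset u;
                std::set_union(frequent[i].begin(), frequent[i].end(),
                               frequent[j].begin(), frequent[j].end(),
                               std::back_inserter(u));
                if (u.size() == k + 1) candidates.insert(u);
            }
    }
    return all_frequent;
}
```

Calling `levelwise_frequent(db, 3)` on a small in-memory database returns the frequent itemsets of all sizes, ordered by size and then lexicographically.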
The intuition behind candidate generation is based on the following simple fact: every subset of a frequent itemset is frequent.
Applying this fact in its contrapositive form, we infer that an itemset with an infrequent subset cannot be frequent. So in APRIORI only those itemsets become candidates whose every subset is frequent. The frequent k-itemsets are available when we attempt to generate the candidate (k+1)-itemsets. The algorithm seeks candidate (k+1)-itemsets among the sets that are unions of two frequent k-itemsets. After forming such a union X, we need to verify that all of its subsets are frequent; otherwise X must not be a candidate. For this it is enough to check that all the k-subsets of X are frequent, because every smaller subset of X lies inside some k-subset and is therefore frequent as well.
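A minimal sketch of this generation step is given below. It is an illustration under stated assumptions, not the method used by Apriori_Trie: `Itemset` and `generate_candidates` are hypothetical names, all input itemsets are assumed to be sorted, non-empty and of the same size k, and only pairs that share their first k-1 items are joined. That restriction is a common way to produce each union once; it loses no candidate, because the two k-subsets obtained by dropping the last or the second-to-last item of a candidate always form such a pair.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

using Itemset = std::vector<int>;  // assumed: sorted, unique, non-empty

// Builds the candidate (k+1)-itemsets from the frequent k-itemsets.
std::vector<Itemset> generate_candidates(const std::set<Itemset>& frequent_k) {
    std::vector<Itemset> candidates;
    for (auto a = frequent_k.begin(); a != frequent_k.end(); ++a) {
        for (auto b = std::next(a); b != frequent_k.end(); ++b) {
            // Join step: a and b must agree on all items but the last.
            // Itemsets with a common prefix are adjacent in the sorted set,
            // so the first mismatch ends the search for partners of a.
            if (!std::equal(a->begin(), a->end() - 1, b->begin())) break;
            Itemset x = *a;
            x.push_back(b->back());  // union of a and b, sorted because a < b

            // Prune step: every k-subset of x must be frequent.
            bool all_frequent = true;
            for (std::size_t drop = 0; drop < x.size() && all_frequent; ++drop) {
                Itemset sub = x;
                sub.erase(sub.begin() + drop);
                all_frequent = frequent_k.count(sub) > 0;
            }
            if (all_frequent) candidates.push_back(x);
        }
    }
    return candidates;
}
```

For the frequent 2-itemsets {1,2}, {1,3}, {2,3} and {2,4}, the union {1,2,3} passes the check, while {2,3,4} is rejected because its subset {3,4} is not frequent.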
Next the supports of the candidates are calculated by reading the transactions one by one. For each transaction t the algorithm decides which candidates are supported by t, that is, contained in t. The original APRIORI solves this task efficiently with a hash-tree; this implementation uses a trie (prefix tree) instead. Tries have many advantages over hash-trees.
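To illustrate how a trie can decide which candidates a transaction supports, here is a minimal sketch. It is not the Apriori_Trie class: `TrieNode`, `insert_candidate`, `count_transaction` and `support_of` are hypothetical names, and the sketch assumes that all stored candidates have the same size and that transactions and candidates are sorted and duplicate-free. Each edge is labelled with an item, so a path from the root spells out a candidate.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

using Itemset = std::vector<int>;  // assumed: sorted and unique

struct TrieNode {
    std::size_t counter = 0;  // support; meaningful at candidate depth only
    std::map<int, std::unique_ptr<TrieNode>> children;  // edge label = item
};

// Adds the path that spells out `candidate`, creating nodes as needed.
void insert_candidate(TrieNode& root, const Itemset& candidate) {
    TrieNode* node = &root;
    for (int item : candidate) {
        std::unique_ptr<TrieNode>& child = node->children[item];
        if (!child) child.reset(new TrieNode());
        node = child.get();
    }
}

// Increments the counter of every candidate below `node` that can be
// completed with `depth_left` items taken from transaction[from..].
void count_transaction(TrieNode& node, const Itemset& transaction,
                       std::size_t from, std::size_t depth_left) {
    if (depth_left == 0) {  // reached a candidate contained in the transaction
        ++node.counter;
        return;
    }
    // Stop once too few items remain to complete a candidate.
    for (std::size_t i = from; i + depth_left <= transaction.size(); ++i) {
        auto it = node.children.find(transaction[i]);
        if (it != node.children.end())
            count_transaction(*it->second, transaction, i + 1, depth_left - 1);
    }
}

// Reads the counter stored for `candidate`; 0 if it is not in the trie.
std::size_t support_of(const TrieNode& root, const Itemset& candidate) {
    const TrieNode* node = &root;
    for (int item : candidate) {
        auto it = node->children.find(item);
        if (it == node->children.end()) return 0;
        node = it->second.get();
    }
    return node->counter;
}
```

After inserting the candidates, calling `count_transaction(root, t, 0, k)` once per transaction t yields the support of every candidate k-itemset in a single pass over the data.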
Definition at line 78 of file Apriori.hpp.
Apriori::Apriori (ifstream &basket_file, const char *output_file_name, const bool store_input)
Definition at line 89 of file Apriori.hpp.
Apriori::~Apriori ()
Definition at line 97 of file Apriori.hpp.
void Apriori::APRIORI_alg (const double min_supp, const bool quiet, const countertype size_threshold)
This procedure implements the APRIORI algorithm.
Referenced by main().
void Apriori::support (const itemtype &candidate_size)
Determines the support of the candidates of the given size.
Definition at line 21 of file Apriori.cpp. References apriori_trie, Input_Output_Manager::basket_recode(), countertype, Apriori_Trie::find_candidate(), input_output_manager, Input_Output_Manager::read_in_a_line(), and reduced_baskets.
Apriori_Trie * Apriori::apriori_trie
A trie that stores the candidates.
Definition at line 108 of file Apriori.hpp. Referenced by support().
Input_Output_Manager Apriori::input_output_manager
The input_output_manager that is responsible for the input, output and recoding operations.
Definition at line 112 of file Apriori.hpp. Referenced by support().
map< vector< itemtype >, countertype > Apriori::reduced_baskets
Stores the reduced baskets if store_input = true.
Definition at line 115 of file Apriori.hpp. Referenced by support().
bool Apriori::store_input
If store_input = true, then the reduced baskets will be stored in memory.
Definition at line 119 of file Apriori.hpp.