Frequent Itemset Mining (FIM) is the most researched field of frequent pattern mining. Over one hundred FIM algorithms were proposed - the majority claiming to be the most efficient. To clarify this chaos and the contradictions, two FIMI competitions were organized. It not only resulted in some excellent implementations but the community also get better understanding (limits, drawbacks, advantages) of the algorithms. Although FIMI'03 and FIMI'04 raised higher the level of publication, it has not solved many problem and we still suffer from the lack of understanding. For more information about the curse of FIM click here.
The separate work of different research groups, however, cannot be so efficient than tightly cooperating groups that interchange their knowledge. The goal of this FIM environment is to support this tight cooperation by providing an environment that includes base library functions, like IO, coding, decoding, ... and some high level description of the most important algorithms, such as apriori, eclat, FP-growth together with some very efficient implementations of them. The environment also contains many test classes to test the solutions for specific problems.
Flexibility and plugability are achieved by a template-based solution, that does not prevent the compiler form code optimization (like virtual function-based solution), which results in codes that are competitive with the best solutions.
If you use this library in a research paper then a citation to either of the following papers is welcome.
The library includes an some optimized input/output and coding/decoding classes, allocators, many well designed data structures (trie, Patricia-tree, ) database cachers, some very efficient Apriori, Eclat and FP-growth algorithms, an Apriori algorithm that finds frequent sequences of items and an association rule miner that uses an Apriori to find frequent itemsets. Also, there exist many test classes that aims to test the efficiency of differrent techniques, like transaction caching, dead-end pruning, equisupport pruning, candidate generation technique, etc.
Name | Description | arguments |
---|---|---|
compile command | ||
main source file | ||
fim | This program includes the most important FIM algorithms, i.e. Apriori, Eclat, FP-growth | There are 4 mandatory parameters:
Example: fim apriori ../data/kosarak.dat 1500 output.txt 8 |
make ../fim* | ||
main.cpp | ||
assoc_rule | This is an association rule miner, that uses Apriori to find the frequent itemsets | There are 5 mandatory parameters:
assoc_rule ../data/kosarak.dat 1500 0.2 3 output.txt |
make assoc_rule* | ||
main-assoc-rule.cpp | ||
apriori_simple | A simplified Apriori implementation that omits many speed-up techniques. We intend this program for students or researchers, who would like to have an efficient but still easy-to-understand and easy-to-modify Apriori implementation. If you don't find the speed of this Apriori satisfactory, you can check the implementation in main.cpp. | There are 3 mandatory parameters:
Example: ./apriori-simple ../data/kosarak.dat 1500 output.txt 8 |
make apriori_simple* | ||
main-apriori-simple.cpp | ||
fsm | An Apriori-based frequent item sequence miner that uses a trie to store the candidates. More information is found in the paper: Ferenc Bodon, Trie-based APRIORI Implementation for Mining Frequent Itemsequences, ACM SIGKDD Workshop on Open Source Data Mining Workshop (OSDM'05), in Bart Goethals and Siegfried Nijssen and Mohammed J. Zaki editors, pages 56 - 65, Chicago, IL, USA. 2005. [PDF] [PS][Slides][BibTex] |
There are 4 mandatory parameters:
Example: fsm apriori ../data/seq/kosarak2_10_4.dat 1500 output.txt 5 |
make ../fsm* | ||
main-fsm.cpp |
For
Before compiling please run the cd src;make dep command.
The compile commands are given in the table above.
To get (update) your local copy of the documentation run make doc in the src directory. The documentation will be found under doc/html. Please note, that you need doxygen and graphviz installed to be able to generate the documentation.
Similar template library, called DMTL, was proposed by Hasan et al. The purpose of DMTL differs from the purpose of our library. The DTML is more pattern and application oriented, we concentrate on algorithms and data structures. DMTL is an excellent tool if we would like to mine a new type of patterns using the existing algorithms, and our library comes at hand if we would like to change a part of an existing algorithm. In DTML the algorithms are the building blocks, while in our library we disassemble the methods as much as it makes sense.
Note, that the library is in developement phase. It will be extended by many new classes and functionalities, some interfaces will change, the documentation will be improved and a restructuring is also expected.
A more detailed todo list is found here.