Turi Create
4.0
|
#include <toolkits/pattern_mining/fp_results_tree.hpp>
Public Member Functions | |
void | save (oarchive &oarc) const |
void | load (iarchive &iarc) |
bool | is_itemset_redundant (const std::vector< size_t > &potential_itemset, const size_t &support) const |
virtual void | add_itemset (const std::vector< size_t > &potential_itemset, const size_t &support) |
void | build_tree (const gl_sframe &closed_itemsets) |
gl_sframe | get_closed_itemsets (const std::shared_ptr< topk_indexer > &indexer=nullptr) const |
std::vector< dense_bitset > | get_top_k_closed_bitsets (const size_t &size, const size_t &top_k=TOP_K_MAX, const size_t &min_length=1) const |
gl_sframe | get_top_k_closed_itemsets (const size_t &top_k=TOP_K_MAX, const size_t &min_length=1, const std::shared_ptr< topk_indexer > &indexer=nullptr) const |
std::vector< size_t > | sort_itemset (const std::vector< size_t > &itemset) const |
size_t | get_support (const std::vector< size_t > &sorted_itemset, const size_t &lower_bound_on_support=0) const |
size_t | get_num_transactions () const |
void | prune_tree (const size_t &min_support) |
Tree data structure for keeping track of the frequent 'closed' itemsets This is a compressed, memory efficient data structure used to store and during the mining of 'closed' itemsets. Similar to the fp_tree structure, but stores itemsets, not transactions.
Members: id_order_map (map of size_t to ranking) - global ordering of the ids (heuristic is to use decreasing order of support) hash_id_map (map to fp_nodes) - map to the head of linked lists for each id root_node
See: http://web.engr.illinois.edu/~hanj/pdf/icdm02_topk.pdf
Definition at line 44 of file fp_results_tree.hpp.
|
virtual |
Add a potential closed itemset to the tree
Args: potential_itemset (vector of size_ts) - itemset to add support (size_t) - support of the itemset
Reimplemented in turi::pattern_mining::fp_top_k_results_tree.
void turi::pattern_mining::fp_results_tree::build_tree | ( | const gl_sframe & | closed_itemsets | ) |
Build results tree from a collection of closed itemsets
Args: closed_itemsets (gl_sframe) - sframe of closed itemset (ids), support
gl_sframe turi::pattern_mining::fp_results_tree::get_closed_itemsets | ( | const std::shared_ptr< topk_indexer > & | indexer = nullptr | ) | const |
Return the current collection of closed itemsets
Returns: closed_itemsets (gl_sframe) - sframe of closed itemset (ids), support
|
inline |
Get the number of transaction
Definition at line 191 of file fp_results_tree.hpp.
size_t turi::pattern_mining::fp_results_tree::get_support | ( | const std::vector< size_t > & | sorted_itemset, |
const size_t & | lower_bound_on_support = 0 |
||
) | const |
Get the support for a frequent itemset Note: the support for the empty set is the total number of transactions
Args: sorted_itemset (vector of size_t) - a sorted frequent itemset (not necessarily closed) (see sort_itemset). lower_bound_on_support (size_t) - optional, for efficiency Returns: support (size_t)
std::vector<dense_bitset> turi::pattern_mining::fp_results_tree::get_top_k_closed_bitsets | ( | const size_t & | size, |
const size_t & | top_k = TOP_K_MAX , |
||
const size_t & | min_length = 1 |
||
) | const |
Return the current collection of closed itemsets as bitsets.
Args: size: Size of each bitset returned. top_k (size_t) - maximum number of closed itemsets to returns min_length (size_t) - minimum required length for a closed itemset
Returns: closed_bitsets - Vector of bitsets representing each set.
gl_sframe turi::pattern_mining::fp_results_tree::get_top_k_closed_itemsets | ( | const size_t & | top_k = TOP_K_MAX , |
const size_t & | min_length = 1 , |
||
const std::shared_ptr< topk_indexer > & | indexer = nullptr |
||
) | const |
Return the top_k closed itemsets in descending order.
Takes longer than get_closed_itemsets() due to sorting, but is faster than trying to sort a gl_sframe.
Args: top_k (size_t) - maximum number of closed itemsets to returns min_length (size_t) - minimum required length for a closed itemset
Returns: top_k_closed_itemsets (gl_sframe) - sframe of top_k itemsets, support
bool turi::pattern_mining::fp_results_tree::is_itemset_redundant | ( | const std::vector< size_t > & | potential_itemset, |
const size_t & | support | ||
) | const |
Check if frequent itemset cannot be closed
Checks if the itemset is a subset of an existing itemset in the tree with equal support. Returns true if a subset with equal support exists.
Args: potential_itemset (vector of size_t) - itemset to check Returns: is_redundant (bool) - whether the itemset passes the test
|
inline |
Load the fp_results_tree from an iarc.
Definition at line 73 of file fp_results_tree.hpp.
void turi::pattern_mining::fp_results_tree::prune_tree | ( | const size_t & | min_support | ) |
Prune Tree Pass over the tree, removing all nodes (corresponding to itemsets) with support strictly less than the given (final) min support Rebuilds tree -> Should be rarely called (ideally only at the end)
Args: min_support (size_t)
|
inline |
Save the fp_results_tree into a oarc.
Definition at line 58 of file fp_results_tree.hpp.
std::vector<size_t> turi::pattern_mining::fp_results_tree::sort_itemset | ( | const std::vector< size_t > & | itemset | ) | const |
Sort itemset by id_order_map (last element first) Args: itemset (vector of size_ts) Returns: sorted_itemset (vector of size_ts)