Turi Create  4.0
turi::pattern_mining::fp_results_tree Class Reference

#include <toolkits/pattern_mining/fp_results_tree.hpp>

Public Member Functions

void save (oarchive &oarc) const
 
void load (iarchive &iarc)
 
bool is_itemset_redundant (const std::vector< size_t > &potential_itemset, const size_t &support) const
 
virtual void add_itemset (const std::vector< size_t > &potential_itemset, const size_t &support)
 
void build_tree (const gl_sframe &closed_itemsets)
 
gl_sframe get_closed_itemsets (const std::shared_ptr< topk_indexer > &indexer=nullptr) const
 
std::vector< dense_bitsetget_top_k_closed_bitsets (const size_t &size, const size_t &top_k=TOP_K_MAX, const size_t &min_length=1) const
 
gl_sframe get_top_k_closed_itemsets (const size_t &top_k=TOP_K_MAX, const size_t &min_length=1, const std::shared_ptr< topk_indexer > &indexer=nullptr) const
 
std::vector< size_t > sort_itemset (const std::vector< size_t > &itemset) const
 
size_t get_support (const std::vector< size_t > &sorted_itemset, const size_t &lower_bound_on_support=0) const
 
size_t get_num_transactions () const
 
void prune_tree (const size_t &min_support)
 

Detailed Description

Tree data structure for keeping track of the frequent 'closed' itemsets This is a compressed, memory efficient data structure used to store and during the mining of 'closed' itemsets. Similar to the fp_tree structure, but stores itemsets, not transactions.

Members: id_order_map (map of size_t to ranking) - global ordering of the ids (heuristic is to use decreasing order of support) hash_id_map (map to fp_nodes) - map to the head of linked lists for each id root_node

See: http://web.engr.illinois.edu/~hanj/pdf/icdm02_topk.pdf

Definition at line 44 of file fp_results_tree.hpp.

Member Function Documentation

◆ add_itemset()

virtual void turi::pattern_mining::fp_results_tree::add_itemset ( const std::vector< size_t > &  potential_itemset,
const size_t &  support 
)
virtual

Add a potential closed itemset to the tree

Args: potential_itemset (vector of size_ts) - itemset to add support (size_t) - support of the itemset

Reimplemented in turi::pattern_mining::fp_top_k_results_tree.

◆ build_tree()

void turi::pattern_mining::fp_results_tree::build_tree ( const gl_sframe closed_itemsets)

Build results tree from a collection of closed itemsets

Args: closed_itemsets (gl_sframe) - sframe of closed itemset (ids), support

◆ get_closed_itemsets()

gl_sframe turi::pattern_mining::fp_results_tree::get_closed_itemsets ( const std::shared_ptr< topk_indexer > &  indexer = nullptr) const

Return the current collection of closed itemsets

Returns: closed_itemsets (gl_sframe) - sframe of closed itemset (ids), support

◆ get_num_transactions()

size_t turi::pattern_mining::fp_results_tree::get_num_transactions ( ) const
inline

Get the number of transaction

Definition at line 191 of file fp_results_tree.hpp.

◆ get_support()

size_t turi::pattern_mining::fp_results_tree::get_support ( const std::vector< size_t > &  sorted_itemset,
const size_t &  lower_bound_on_support = 0 
) const

Get the support for a frequent itemset Note: the support for the empty set is the total number of transactions

Args: sorted_itemset (vector of size_t) - a sorted frequent itemset (not necessarily closed) (see sort_itemset). lower_bound_on_support (size_t) - optional, for efficiency Returns: support (size_t)

◆ get_top_k_closed_bitsets()

std::vector<dense_bitset> turi::pattern_mining::fp_results_tree::get_top_k_closed_bitsets ( const size_t &  size,
const size_t &  top_k = TOP_K_MAX,
const size_t &  min_length = 1 
) const

Return the current collection of closed itemsets as bitsets.

Args: size: Size of each bitset returned. top_k (size_t) - maximum number of closed itemsets to returns min_length (size_t) - minimum required length for a closed itemset

Returns: closed_bitsets - Vector of bitsets representing each set.

◆ get_top_k_closed_itemsets()

gl_sframe turi::pattern_mining::fp_results_tree::get_top_k_closed_itemsets ( const size_t &  top_k = TOP_K_MAX,
const size_t &  min_length = 1,
const std::shared_ptr< topk_indexer > &  indexer = nullptr 
) const

Return the top_k closed itemsets in descending order.

Takes longer than get_closed_itemsets() due to sorting, but is faster than trying to sort a gl_sframe.

Args: top_k (size_t) - maximum number of closed itemsets to returns min_length (size_t) - minimum required length for a closed itemset

Returns: top_k_closed_itemsets (gl_sframe) - sframe of top_k itemsets, support

◆ is_itemset_redundant()

bool turi::pattern_mining::fp_results_tree::is_itemset_redundant ( const std::vector< size_t > &  potential_itemset,
const size_t &  support 
) const

Check if frequent itemset cannot be closed

Checks if the itemset is a subset of an existing itemset in the tree with equal support. Returns true if a subset with equal support exists.

Args: potential_itemset (vector of size_t) - itemset to check Returns: is_redundant (bool) - whether the itemset passes the test

◆ load()

void turi::pattern_mining::fp_results_tree::load ( iarchive iarc)
inline

Load the fp_results_tree from an iarc.

Definition at line 73 of file fp_results_tree.hpp.

◆ prune_tree()

void turi::pattern_mining::fp_results_tree::prune_tree ( const size_t &  min_support)

Prune Tree Pass over the tree, removing all nodes (corresponding to itemsets) with support strictly less than the given (final) min support Rebuilds tree -> Should be rarely called (ideally only at the end)

Args: min_support (size_t)

◆ save()

void turi::pattern_mining::fp_results_tree::save ( oarchive oarc) const
inline

Save the fp_results_tree into a oarc.

Definition at line 58 of file fp_results_tree.hpp.

◆ sort_itemset()

std::vector<size_t> turi::pattern_mining::fp_results_tree::sort_itemset ( const std::vector< size_t > &  itemset) const

Sort itemset by id_order_map (last element first) Args: itemset (vector of size_ts) Returns: sorted_itemset (vector of size_ts)


The documentation for this class was generated from the following file: