Turi Create  4.0
turi::block_cache Class Reference

#include <core/storage/fileio/block_cache.hpp>

Public Member Functions

 block_cache ()=default
 
 ~block_cache ()
 Default destructor. Deletes all associated files.
 
void init (const std::string &storage_prefix, size_t max_file_handle_cache=16)
 
bool write (const std::string &key, const std::string &value)
 
bool evict_key (const std::string &key)
 
int64_t value_length (const std::string &key)
 
int64_t read (const std::string &key, char *output, size_t start=0, size_t end=(size_t)(-1))
 
int64_t read (const std::string &key, std::string &output, size_t start=0, size_t end=(size_t)(-1))
 
size_t file_handle_cache_hits () const
 
size_t file_handle_cache_misses () const
 
size_t get_max_capacity ()
 
void set_max_capacity (size_t)
 
void hold_cache_provider (std::shared_ptr< const fileio::fixed_size_cache_manager > instance)
 

Static Public Member Functions

static block_cache & get_instance ()
 

Detailed Description

The block cache implements a simple key-value store for extremely large values (at least ~16MB). Each key can be written exactly once, and its value supports arbitrary range reads (i.e. read byte X to byte Y of this key).

Essentially every value is stored as a single file inside the storage_prefix parameter set at init.

The block_cache is safe for concurrent use.

Use On a Distributed File System

The storage prefix can be located on a distributed filesystem (for instance HDFS or NFS), in which case every machine sharing the same storage prefix also shares keys.

When sharing a storage prefix with other processes on a distributed filesystem, the atomicity guarantees of the filesystem become important.

In particular, on HDFS, you may find keys in an "indeterminate" state, where a key can be neither written to nor queried (because the writer has created the file but has not finished writing to it yet). On NFS, multiple machines may be able to write to the same key, but only one will win. Also, the length and contents of the key may be wrong if you read the key while someone else is writing to it.

Design Notes

We would like these "interesting" distributed file system properties not to appear when the block_cache is merely used concurrently, so a bit of care is needed to ensure atomicity, at least within the context of the same block_cache object. Essentially, we want write-once semantics with arbitrary parallel reads.
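
One way to picture the write-once guarantee inside a single object is a map guarded by a mutex, where only the first writer of a key succeeds. The sketch below is an illustration under stated assumptions, not Turi's implementation: `write_once_store` is a hypothetical in-memory class, the `-1` error code is an assumption (the documentation only promises a value less than 0), and readers here share the writer's mutex, so it does not model fully parallel reads.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical illustration only; the real block_cache stores values as files.
class write_once_store {
 public:
  // Returns true for the first writer of a key, false for every later one.
  bool write(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> guard(lock_);
    // emplace leaves an existing entry untouched and reports whether it inserted.
    return values_.emplace(key, value).second;
  }

  // Returns the value length, or -1 (an assumed error code) for an unknown key.
  std::int64_t value_length(const std::string& key) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = values_.find(key);
    return it == values_.end() ? -1
                               : static_cast<std::int64_t>(it->second.size());
  }

 private:
  mutable std::mutex lock_;
  std::map<std::string, std::string> values_;
};
```

Because the existence check and the insertion happen under one lock, two threads racing on the same key cannot both observe "key absent" and both succeed.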

Definition at line 53 of file block_cache.hpp.

Constructor & Destructor Documentation

◆ block_cache()

turi::block_cache::block_cache ( )
default

Constructs the block cache. init must be called before the block_cache can be used.

Make sure the cache manager is destroyed after all cache files are deleted.

Member Function Documentation

◆ evict_key()

bool turi::block_cache::evict_key ( const std::string &  key)

Evicts a particular key. Returns true on success, false on failure.

Parameters
key  The key name

◆ file_handle_cache_hits()

size_t turi::block_cache::file_handle_cache_hits ( ) const

Returns the number of file handle cache hits. This function is for profiling purposes, since file handles are cached for performance reasons.

◆ file_handle_cache_misses()

size_t turi::block_cache::file_handle_cache_misses ( ) const

Returns the number of file handle cache misses. This function is for profiling purposes, since file handles are cached for performance reasons.
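
To make the hit and miss counters concrete, here is a minimal sketch of a least-recently-used tracker that counts both. It is an assumption-laden illustration, not Turi's file handle cache: `lru_handle_tracker` and `touch` are hypothetical names, and the class tracks key names instead of holding real file handles.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical LRU tracker; records which keys are "open" and counts lookups.
class lru_handle_tracker {
 public:
  explicit lru_handle_tracker(std::size_t capacity) : capacity_(capacity) {}

  // Marks key as most recently used; counts a hit if it was already cached.
  void touch(const std::string& key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      ++hits_;
      order_.splice(order_.begin(), order_, it->second);  // move to front
      return;
    }
    ++misses_;
    if (capacity_ == 0) return;  // nothing can be cached
    if (order_.size() == capacity_) {
      index_.erase(order_.back());  // drop the least recently used key
      order_.pop_back();
    }
    order_.push_front(key);
    index_[key] = order_.begin();
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  std::size_t capacity_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};
```

A high miss count relative to hits in a profile like this would suggest that the working set of keys is larger than the handle cache capacity.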

◆ get_instance()

static block_cache& turi::block_cache::get_instance ( )
static

Gets a singleton instance. The singleton instance has this default behavior:

Location of storage:

  • If temp files are located on HDFS, the cache just writes through and is always located on HDFS.
  • If temp files are located on local disk, the cache is set to the cache:// file system. This allows for a degree of in-memory caching.

File handle LRU cache size:

  • 4 * ncpus
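
As a rough sketch of how a `4 * ncpus` default could be computed, the helper below uses the standard library's CPU count. This is an assumption, not the singleton's actual code: `default_file_handle_cache_size` is a hypothetical name, and treating `std::thread::hardware_concurrency()` as "ncpus" is a guess at what the documentation means.

```cpp
#include <cstddef>
#include <thread>

// Assumption: "ncpus" is approximated by std::thread::hardware_concurrency(),
// which may return 0 when the count cannot be determined.
inline std::size_t default_file_handle_cache_size() {
  unsigned int ncpus = std::thread::hardware_concurrency();
  if (ncpus == 0) ncpus = 1;  // fall back to a single CPU
  return 4 * static_cast<std::size_t>(ncpus);
}
```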

◆ get_max_capacity()

size_t turi::block_cache::get_max_capacity ( )

Gets the maximum number of files managed. If 0, there is no max capacity.

◆ hold_cache_provider()

void turi::block_cache::hold_cache_provider ( std::shared_ptr< const fileio::fixed_size_cache_manager >  instance)
inline

Dependency injection: holds a reference to the underlying cache provider instance so that it is released only after the block_cache singleton is released.

Definition at line 175 of file block_cache.hpp.

◆ init()

void turi::block_cache::init ( const std::string &  storage_prefix,
size_t  max_file_handle_cache = 16 
)

init must be called exactly once on block cache construction before the block cache can be used. Multiple calls to init will raise an exception.

Parameters
storage_prefix  The location where all values are stored
max_file_handle_cache  The maximum number of file handles to cache

Essentially, every value is stored as a separate file inside the directory.

◆ read() [1/2]

int64_t turi::block_cache::read ( const std::string &  key,
char *  output,
size_t  start = 0,
size_t  end = (size_t)(-1) 
)

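Reads the value of a key into the output buffer and returns the number of bytes read. Unlike the std::string overload, which resizes the output string if necessary, a char* buffer cannot be resized, so it must be large enough to hold the requested range.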

Parameters
key  The key to read
output  The output buffer
start  Optional. The start offset of the range to read. Defaults to 0.
end  Optional. The end offset of the range to read; the byte at end is not read. For example, to read the first 5 bytes of the file, call read(key, output, 0, 5). Defaults to the length of the file.

Note that the number of bytes read can be 0 if:

  • start is past the end of the value
  • end is less than start

If start and end are not passed, the entire block is read.

Returns
A value less than 0 on failure.
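
The half-open range rule above can be sketched as a small helper over an in-memory value. This is an illustration, not the library's code: `read_range` is a hypothetical function, and clamping an explicit `end` that overshoots the value length is an assumption (the documentation only states that the default `end` is the length of the file).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: copies bytes [start, end) of value into output and
// returns the number of bytes copied. The caller must supply a buffer that
// is large enough for the requested range.
inline std::int64_t read_range(const std::string& value, char* output,
                               std::size_t start = 0,
                               std::size_t end = static_cast<std::size_t>(-1)) {
  end = std::min(end, value.size());  // default end becomes the value length
  if (start >= end) return 0;         // start past the value, or end < start
  std::memcpy(output, value.data() + start, end - start);
  return static_cast<std::int64_t>(end - start);
}
```

With the value "hello world", `read_range(value, buf, 0, 5)` copies "hello" and returns 5, while `read_range(value, buf, 6, 3)` copies nothing and returns 0 because `end` is less than `start`.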

◆ read() [2/2]

int64_t turi::block_cache::read ( const std::string &  key,
std::string &  output,
size_t  start = 0,
size_t  end = (size_t)(-1) 
)

std::string overload of read. Note that the char* overload is faster.

◆ set_max_capacity()

void turi::block_cache::set_max_capacity ( size_t  )

Sets the maximum number of files managed. If 0, there is no max capacity.

◆ value_length()

int64_t turi::block_cache::value_length ( const std::string &  key)

Returns the length of the value of a particular key.

Parameters
key  The key to query
Returns
the length of the value on success, a value < 0 on failure.

◆ write()

bool turi::block_cache::write ( const std::string &  key,
const std::string &  value 
)

Writes a string to a key. Returns true on success. The key must not already exist; if it does, the write fails and false is returned. When operating on a distributed filesystem, note that all machines sharing the same storage prefix share a common key space.

Parameters
key  The key to write to
value  The value to write
Returns
true on success, false on failure

The documentation for this class was generated from the following file: