Turi Create
4.0
#include <core/storage/fileio/block_cache.hpp>
Public Member Functions
block_cache () = default
~block_cache ()
    Default destructor. Deletes all associated files.
void init (const std::string &storage_prefix, size_t max_file_handle_cache = 16)
bool write (const std::string &key, const std::string &value)
bool evict_key (const std::string &key)
int64_t value_length (const std::string &key)
int64_t read (const std::string &key, char *output, size_t start = 0, size_t end = (size_t)(-1))
int64_t read (const std::string &key, std::string &output, size_t start = 0, size_t end = (size_t)(-1))
size_t file_handle_cache_hits () const
size_t file_handle_cache_misses () const
size_t get_max_capacity ()
void set_max_capacity (size_t)
void hold_cache_provider (std::shared_ptr< const fileio::fixed_size_cache_manager > instance)

Static Public Member Functions
static block_cache & get_instance ()
The block cache implements a simple key-value store for extremely large values (at least ~16 MB). Every key can be written exactly once, and arbitrary range reads are supported (i.e., reading byte X through byte Y of a key's value).
Essentially, every value is stored as a single file inside the storage_prefix directory set at init.
The block_cache is safe for concurrent use.
The storage prefix may be located on a distributed filesystem (for instance HDFS or NFS), in which case every machine sharing the same storage prefix also shares keys.
When sharing a storage prefix with other processes on a distributed filesystem, the atomicity guarantees of the filesystem become important.
In particular, on HDFS you may find keys in an "indeterminate" state, where a key cannot be written to but also cannot be queried (because the writer has created the file but has not finished writing to it yet). On NFS, multiple machines may be able to write to the same key, but only one will win. Also, the length and contents of a key may be wrong if you read it while someone else is writing to it.
We would like these "interesting" distributed filesystem properties not to hold when the block_cache is merely used concurrently. So some care is needed to ensure atomicity, at least within the context of a single block_cache object. Essentially, we want write-once semantics with arbitrary parallel reads.
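The write-once, parallel-read requirement within a single process can be illustrated with a small in-memory sketch. It is not Turi's implementation: `WriteOnceStore` and `race_writers` are hypothetical names, a `std::map` stands in for the per-key files, and a C++17 `std::shared_mutex` lets readers proceed in parallel while a writer's check-and-insert happens atomically, so exactly one of several racing writers succeeds.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory stand-in for write-once, parallel-read semantics.
class WriteOnceStore {
 public:
  // Only the first writer of `key` succeeds; later writers get false.
  bool write(const std::string& key, const std::string& value) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive access
    // emplace never overwrites: .second is false if the key already exists.
    return values_.emplace(key, value).second;
  }

  // Any number of readers may hold the shared lock at the same time.
  bool read(const std::string& key, std::string& out) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = values_.find(key);
    if (it == values_.end()) return false;
    out = it->second;
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, std::string> values_;
};

// Races `n` threads writing the same key; returns how many reported success.
int race_writers(WriteOnceStore& store, int n) {
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    threads.emplace_back([&store, &winners, i] {
      if (store.write("shared_key", "value from writer " + std::to_string(i))) {
        ++winners;
      }
    });
  }
  for (auto& t : threads) t.join();
  return winners.load();
}
```

A block cache writing multi-megabyte files would probably not hold one exclusive lock for an entire write; how block_cache actually achieves atomicity is not described on this page.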
Definition at line 53 of file block_cache.hpp.
turi::block_cache::block_cache () = default
Constructs the block cache. init must be called before the block_cache can be used.
turi::block_cache::~block_cache ()
Makes sure the cache manager is destroyed after all cache files are deleted.
bool turi::block_cache::evict_key (const std::string &key)
Evicts a particular key. Returns true on success, false on failure.
key | The key name
size_t turi::block_cache::file_handle_cache_hits () const
Returns the number of file handle cache hits. File handles are cached for performance; this function is intended for profiling that cache.
size_t turi::block_cache::file_handle_cache_misses () const
Returns the number of file handle cache misses. File handles are cached for performance; this function is intended for profiling that cache.
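Hit and miss counters are easiest to interpret against an LRU cache of open handles (the get_instance documentation mentions a file handle LRU cache). The following is a minimal sketch, assuming a fixed capacity like init's `max_file_handle_cache` and using an integer id in place of a real file handle; `HandleCache` is a hypothetical name, not part of Turi's API.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache of "open file handles" keyed by file path.
// An int id stands in for a real handle object.
class HandleCache {
 public:
  // A capacity of at least 1 is enforced so eviction is always well defined.
  explicit HandleCache(std::size_t capacity)
      : capacity_(capacity == 0 ? 1 : capacity) {}

  // Returns the handle for `path`, "opening" a new one on a miss.
  int get(const std::string& path) {
    auto it = index_.find(path);
    if (it != index_.end()) {
      ++hits_;
      // Move the entry to the front: it is now the most recently used.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    ++misses_;
    if (lru_.size() == capacity_) {
      // Evict the least recently used handle (back of the list).
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    int handle = next_handle_++;  // simulated open
    lru_.emplace_front(path, handle);
    index_[path] = lru_.begin();
    return handle;
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  using Entry = std::pair<std::string, int>;
  std::size_t capacity_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
  int next_handle_ = 0;
};
```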
static block_cache & turi::block_cache::get_instance ()
Gets a singleton instance. The singleton instance has this default behavior:
Location of storage:
File handle LRU cache size:
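A common C++11 idiom for a process-wide instance such as the one get_instance returns is a function-local static, whose initialization the language guarantees to be thread-safe. The sketch below shows only that idiom; `CacheSingleton` is hypothetical, and it does not reproduce the singleton's default storage location or cache size, which are not recoverable from this page.

```cpp
#include <cassert>

// Hypothetical class illustrating the function-local-static singleton idiom.
class CacheSingleton {
 public:
  static CacheSingleton& get_instance() {
    // Since C++11, concurrent first calls construct this exactly once.
    static CacheSingleton instance;
    return instance;
  }

  // Copying would create a second instance, so it is disabled.
  CacheSingleton(const CacheSingleton&) = delete;
  CacheSingleton& operator=(const CacheSingleton&) = delete;

  // Shared state to show every caller sees the same object (not thread-safe).
  int bump() { return ++uses_; }

 private:
  CacheSingleton() = default;  // only get_instance() can construct
  int uses_ = 0;
};
```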
size_t turi::block_cache::get_max_capacity ()
Gets the maximum number of files managed. If 0, there is no max capacity.
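The capacity convention, where 0 means unlimited, reduces to a one-line admission check. `has_room` is a hypothetical helper; this page does not say what block_cache does once the limit is reached, so the sketch stops at the check.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: may one more file be admitted, given how many are
// currently managed and the configured max capacity? 0 means "no limit".
bool has_room(std::size_t files_managed, std::size_t max_capacity) {
  return max_capacity == 0 || files_managed < max_capacity;
}
```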
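void turi::block_cache::hold_cache_provider (std::shared_ptr< const fileio::fixed_size_cache_manager > instance)   inline
Dependency injection: the underlying cache provider instance should be released after the block_cache singleton is released.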
Definition at line 175 of file block_cache.hpp.
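The release-order requirement follows naturally from shared ownership: if the block cache keeps a `std::shared_ptr` to the provider, the provider cannot be destroyed while the block cache still exists. `Provider` and `Holder` below are hypothetical stand-ins for `fileio::fixed_size_cache_manager` and block_cache; the sketch only demonstrates the shared-ownership mechanism, not Turi's code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the underlying cache manager.
struct Provider {};

// Hypothetical stand-in for block_cache: holding a shared reference keeps
// the provider alive for at least as long as the holder exists.
class Holder {
 public:
  void hold_cache_provider(std::shared_ptr<const Provider> instance) {
    provider_ = std::move(instance);
  }

 private:
  std::shared_ptr<const Provider> provider_;
};
```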
void turi::block_cache::init (const std::string &storage_prefix, size_t max_file_handle_cache = 16)
init must be called exactly once after the block cache is constructed, before the block cache can be used. Multiple calls to init will raise an exception.
storage_prefix | The location where all values are stored
max_file_handle_cache | The maximum number of file handles to cache
Essentially, every value is stored as a separate file inside the directory.
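The init contract (called exactly once, values stored as separate files under the prefix) can be sketched as follows. `MiniBlockCache` and `path_for_key` are hypothetical; the exception type (`std::logic_error`) and the naming scheme of prefix, then '/', then key are assumptions, since this page specifies neither.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of init-once semantics and a one-file-per-key layout.
class MiniBlockCache {
 public:
  void init(const std::string& storage_prefix,
            std::size_t max_file_handle_cache = 16) {
    if (initialized_) {
      // Exception type is an assumption; the docs only say one is raised.
      throw std::logic_error("init called more than once");
    }
    storage_prefix_ = storage_prefix;
    max_file_handle_cache_ = max_file_handle_cache;
    initialized_ = true;
  }

  // Assumed naming scheme: each key maps to one file under the prefix.
  std::string path_for_key(const std::string& key) const {
    if (!initialized_) throw std::logic_error("init must be called first");
    std::string path = storage_prefix_;
    if (!path.empty() && path.back() != '/') path += '/';
    return path + key;
  }

  std::size_t max_file_handle_cache() const { return max_file_handle_cache_; }

 private:
  bool initialized_ = false;
  std::string storage_prefix_;
  std::size_t max_file_handle_cache_ = 0;
};
```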
int64_t turi::block_cache::read (const std::string &key, char *output, size_t start = 0, size_t end = (size_t)(-1))
Reads the value of a key (or the byte range [start, end) of it) into the output buffer. Returns the number of bytes read.
key | The key to read
output | The output buffer
start | Optional, denotes the start offset of the value to read. Defaults to 0.
end | Optional, denotes the end offset of the value to read. The byte at end is not read; i.e., to read the first 5 bytes of the value, call read(key, output, 0, 5). Defaults to the length of the value.
Note that the number of bytes read can be 0 if:
If start and end are not passed, the entire block is read.
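The half-open [start, end) convention is easiest to see on an in-memory value. `read_range` is hypothetical; clamping `end` to the value length, returning 0 for an empty range, and requiring `output` to hold at least end - start bytes are assumptions of this sketch and do not reconstruct the list of zero-byte cases above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical half-open range read [start, end) over an in-memory value.
// Assumes `output` has room for at least end - start bytes.
std::int64_t read_range(const std::string& value, char* output,
                        std::size_t start = 0,
                        std::size_t end = static_cast<std::size_t>(-1)) {
  if (end > value.size()) end = value.size();  // default end -> whole value
  if (start >= end) return 0;                  // empty range: nothing read
  std::size_t n = end - start;
  std::memcpy(output, value.data() + start, n);
  return static_cast<std::int64_t>(n);
}
```

With the value "hello world", `read_range(v, buf, 0, 5)` copies "hello", and omitting both offsets copies all 11 bytes.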
int64_t turi::block_cache::read (const std::string &key, std::string &output, size_t start = 0, size_t end = (size_t)(-1))
std::string overload: reads into output, resizing the output string if necessary. Note that the char* overload is faster.
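One plausible way to layer a `std::string` overload on a `char*` reader is to size the string, read into its buffer, and shrink it to the number of bytes read; that extra resizing is one reason such an overload can be slower. Everything here is hypothetical: `g_values` stands in for the stored files, and `read_chars` and `read_string` are illustrative names, not Turi functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Hypothetical in-memory backing store standing in for the per-key files.
std::map<std::string, std::string> g_values;

// Hypothetical char* reader: copies bytes [start, end) of `key` into output.
std::int64_t read_chars(const std::string& key, char* output,
                        std::size_t start, std::size_t end) {
  auto it = g_values.find(key);
  if (it == g_values.end()) return 0;
  const std::string& v = it->second;
  if (end > v.size()) end = v.size();
  if (start >= end) return 0;
  std::memcpy(output, v.data() + start, end - start);
  return static_cast<std::int64_t>(end - start);
}

// String overload built on the char* reader: size the string for the
// largest possible result, read into its buffer, then shrink to fit.
std::int64_t read_string(const std::string& key, std::string& output,
                         std::size_t start = 0,
                         std::size_t end = static_cast<std::size_t>(-1)) {
  auto it = g_values.find(key);
  std::size_t length = (it == g_values.end()) ? 0 : it->second.size();
  if (end > length) end = length;
  output.resize(start < end ? end - start : 0);
  // read_chars writes nothing when the range is empty, so &output[0]
  // is never written through when output is empty.
  std::int64_t n = read_chars(key, &output[0], start, end);
  output.resize(static_cast<std::size_t>(n));
  return n;
}
```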
void turi::block_cache::set_max_capacity (size_t)
Sets the maximum number of files managed. If 0, there is no max capacity.
int64_t turi::block_cache::value_length (const std::string &key)
Returns the length of the value of a particular key.
key | The key to query
bool turi::block_cache::write (const std::string &key, const std::string &value)
Writes a string to a key. Returns true on success. The key must not already exist; if it does, the write fails and false is returned. When operating on a distributed filesystem, note that every machine sharing the same storage prefix has a common key space.
key | The key to write to
value | The value to write
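The write, value_length, and evict_key contracts can be exercised against a small single-threaded mock. `MockBlockCache` is hypothetical; returning -1 from value_length for a missing key, and evict_key returning false when there is nothing to evict, are assumptions not stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory mock of the write / value_length / evict_key contract.
class MockBlockCache {
 public:
  // Write-once: fails (returns false) if the key already exists.
  bool write(const std::string& key, const std::string& value) {
    return values_.emplace(key, value).second;
  }

  // Assumption: -1 signals a missing key.
  std::int64_t value_length(const std::string& key) const {
    auto it = values_.find(key);
    return it == values_.end() ? -1
                               : static_cast<std::int64_t>(it->second.size());
  }

  // Assumption: evicting a key that does not exist counts as a failure.
  bool evict_key(const std::string& key) { return values_.erase(key) == 1; }

 private:
  std::map<std::string, std::string> values_;
};
```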