#include <core/storage/fileio/block_cache.hpp>

Public Member Functions
	block_cache ()=default

	~block_cache ()
	Default destructor. Deletes all associated files.

void	init (const std::string &storage_prefix, size_t max_file_handle_cache=16)

bool	write (const std::string &key, const std::string &value)

bool	evict_key (const std::string &key)

int64_t	value_length (const std::string &key)

int64_t	read (const std::string &key, char *output, size_t start=0, size_t end=(size_t)(-1))

int64_t	read (const std::string &key, std::string &output, size_t start=0, size_t end=(size_t)(-1))

size_t	file_handle_cache_hits () const

size_t	file_handle_cache_misses () const

size_t	get_max_capacity ()

void	set_max_capacity (size_t)

void	hold_cache_provider (std::shared_ptr< const fileio::fixed_size_cache_manager > instance)

Static Public Member Functions
static block_cache &	get_instance ()

Detailed Description

The block cache implements a simple key-value store for extremely large values (at least ~16MB). Each key can be written exactly once, and arbitrary range reads are supported (i.e. read byte X to byte Y of a key).

Essentially, every value is stored as a single file under the storage_prefix passed to init.

The block_cache is safe for concurrent use.

Use On a Distributed File System

The storage prefix may be located on a distributed filesystem (for instance HDFS or NFS), in which case every machine sharing the same storage prefix also shares keys.

When sharing a storage prefix with other processes on a distributed filesystem, the atomicity guarantees of the filesystem become important.

In particular, on HDFS, you may find a key in an "indeterminate" state, where it cannot be written to but also cannot be queried (because the writer has created the file but has not finished writing to it yet). On NFS, multiple machines may be able to write to the same key, but only one will win. Also, the length and contents of a key may be wrong if you read the key while someone else is writing to it.

Design Notes

We would like these "interesting" distributed file system properties not to apply when the block_cache is merely used concurrently. So a bit of care is needed to ensure atomicity, at least within the context of the same block_cache object. Essentially, we want write-once semantics with arbitrary parallel reads.

Definition at line 53 of file block_cache.hpp.
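
The write-once behaviour described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: toy_block_store and its members are hypothetical names, values live in a map rather than in files, and a single mutex serializes every call, which is simpler than the parallel reads the real class aims for.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical in-memory model of a write-once key-value store.
// The real block_cache stores each value as a file; this does not.
class toy_block_store {
 public:
  // Returns true if the key was newly written, false if it already exists.
  bool write(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> guard(lock_);
    // emplace leaves the map untouched when the key is already present.
    return values_.emplace(key, value).second;
  }

  // Returns the length of the value, or -1 if the key does not exist.
  std::int64_t value_length(const std::string& key) const {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = values_.find(key);
    if (it == values_.end()) return -1;
    return static_cast<std::int64_t>(it->second.size());
  }

 private:
  mutable std::mutex lock_;  // one lock for everything: simple, not parallel
  std::unordered_map<std::string, std::string> values_;
};

// Usage: the second write to the same key is rejected.
void toy_block_store_demo() {
  toy_block_store store;
  assert(store.write("k", "hello"));
  assert(!store.write("k", "other"));
  assert(store.value_length("k") == 5);
  assert(store.value_length("missing") < 0);
}
```

Within one object the lock makes "check that the key is absent, then insert" a single atomic step, which is the property the design notes ask for.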

Constructor & Destructor Documentation

◆ block_cache()

turi::block_cache::block_cache ( )

default

Constructs the block cache. init must be called before the block_cache can be used.

Make sure the cache manager is destroyed after all cache files are deleted.

Member Function Documentation

◆ evict_key()

bool turi::block_cache::evict_key ( const std::string & key )

Evicts a particular key. Returns true on success, false on failure.

Parameters

key	The key name

◆ file_handle_cache_hits()

size_t turi::block_cache::file_handle_cache_hits ( ) const

Returns the number of file handle cache hits. This function is for profiling purposes, since file handles are cached for performance reasons.

◆ file_handle_cache_misses()

size_t turi::block_cache::file_handle_cache_misses ( ) const

Returns the number of file handle cache misses. This function is for profiling purposes, since file handles are cached for performance reasons.
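
To make the hit and miss counters concrete, here is a minimal sketch of an LRU cache that counts both. It is an illustration only: toy_handle_cache and touch() are hypothetical names, entries are plain strings instead of open file handles, and the eviction policy is assumed to be LRU as suggested by the "File handle LRU cache size" note in get_instance().

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cache of open file handles, keyed by file name.
class toy_handle_cache {
 public:
  // A capacity of 0 means unbounded in this sketch.
  explicit toy_handle_cache(std::size_t capacity) : capacity_(capacity) {}

  // Simulates looking up (or opening) the handle for `name`.
  void touch(const std::string& name) {
    auto it = index_.find(name);
    if (it != index_.end()) {
      ++hits_;
      // Move the entry to the front: it is now the most recently used.
      order_.splice(order_.begin(), order_, it->second);
      return;
    }
    ++misses_;
    if (capacity_ > 0 && order_.size() == capacity_) {
      // Evict the least recently used entry (a real cache would close the file).
      index_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(name);
    index_[name] = order_.begin();
  }

  std::size_t hits() const { return hits_; }
  std::size_t misses() const { return misses_; }

 private:
  std::size_t capacity_;
  std::size_t hits_ = 0;
  std::size_t misses_ = 0;
  std::list<std::string> order_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> index_;
};

// Usage: with room for two entries, re-touching "a" is the only hit.
void toy_handle_cache_demo() {
  toy_handle_cache cache(2);
  cache.touch("a");  // miss
  cache.touch("b");  // miss
  cache.touch("a");  // hit
  cache.touch("c");  // miss, evicts "b"
  assert(cache.hits() == 1);
  assert(cache.misses() == 3);
}
```

A high miss count relative to hits would suggest the handle cache is too small for the access pattern.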

◆ get_instance()

static block_cache& turi::block_cache::get_instance ( )

static

Gets a singleton instance. The singleton instance has this default behavior:

Location of storage:

If temp files are located on HDFS, the cache just writes through and is always located on HDFS.
If temp files are located on local disk, the cache is set to the cache:// file system. This allows for a degree of in-memory caching.

File handle LRU cache size:

4 * ncpus

◆ get_max_capacity()

size_t turi::block_cache::get_max_capacity ( )

Gets the maximum number of files managed. If 0, there is no max capacity.

◆ hold_cache_provider()

void turi::block_cache::hold_cache_provider ( std::shared_ptr< const fileio::fixed_size_cache_manager > instance )

inline

Dependency injection: the block_cache holds a reference to the underlying cache provider so that the provider instance is released only after the block_cache singleton is released.

Definition at line 175 of file block_cache.hpp.
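
The ordering guarantee can be sketched with standard C++ alone: a destructor body runs before the object's members are destroyed, so a provider held in a std::shared_ptr member outlives any cleanup done in that body. The types toy_provider and toy_cache below are hypothetical and only record the order of events; they are not the library's classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical provider that records when it is destroyed.
struct toy_provider {
  explicit toy_provider(std::vector<std::string>& log) : log_(log) {}
  ~toy_provider() { log_.push_back("provider destroyed"); }
  std::vector<std::string>& log_;
};

// Hypothetical owner that holds the provider for its whole lifetime.
class toy_cache {
 public:
  explicit toy_cache(std::vector<std::string>& log) : log_(log) {}

  void hold_provider(std::shared_ptr<const toy_provider> instance) {
    provider_ = std::move(instance);
  }

  ~toy_cache() {
    // The destructor body runs before members are destroyed,
    // so the provider is still alive at this point.
    log_.push_back("cache files deleted");
  }

 private:
  std::vector<std::string>& log_;
  std::shared_ptr<const toy_provider> provider_;
};

// Returns the recorded order of events.
std::vector<std::string> destruction_order() {
  std::vector<std::string> log;
  {
    std::shared_ptr<const toy_provider> provider =
        std::make_shared<toy_provider>(log);
    toy_cache cache(log);
    cache.hold_provider(provider);
    provider.reset();  // the cache now holds the only reference
  }                    // cache is destroyed here
  assert(log.size() == 2);
  return log;
}
```

In this sketch the log reads "cache files deleted" followed by "provider destroyed", matching the order the documentation asks for.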

◆ init()

void turi::block_cache::init ( const std::string & storage_prefix, size_t max_file_handle_cache = 16 )

init must be called exactly once after construction, before the block cache can be used. Calling init more than once raises an exception.

Parameters

storage_prefix	The location where all values are stored
max_file_handle_cache	The maximum number of file handles to cache

Essentially, every value is stored as a separate file inside the directory.
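
The "exactly once" contract can be sketched as a simple guard. This is a hypothetical illustration: init_once_store is not part of the library, and std::logic_error is an assumed exception type, since the documentation does not say which exception the real init raises.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical class showing an init-exactly-once guard.
class init_once_store {
 public:
  void init(const std::string& storage_prefix) {
    // Assumed exception type; the real class may throw something else.
    if (initialized_) throw std::logic_error("init() called more than once");
    storage_prefix_ = storage_prefix;
    initialized_ = true;
  }

  bool initialized() const { return initialized_; }

 private:
  bool initialized_ = false;
  std::string storage_prefix_;
};

// Returns true if a second init() call throws.
bool second_init_throws() {
  init_once_store store;
  store.init("/tmp/example_prefix");  // placeholder path, never accessed
  assert(store.initialized());
  try {
    store.init("/tmp/other_prefix");
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```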

◆ read() [1/2]

int64_t turi::block_cache::read ( const std::string & key, char * output, size_t start = 0, size_t end = (size_t)(-1) )

Reads the value of a key into output and returns the number of bytes read. The std::string overload resizes the output string if necessary; this char* overload writes into a caller-supplied buffer.

Parameters

key	The key to read
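output	The destination for the bytes read (a character buffer for this overload, a reference to the output string for the std::string overload)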
start	Optional, denotes the start offset of the value to read. Defaults to 0.
end	Optional, denotes the end offset of the value to read. The byte at end is not read. i.e. to read the first 5 bytes of the file, you call read(key, output, 0, 5); Defaults to the length of the file.

Note that the number of bytes read can be 0 if:

start is past the end of the value
end is less than start

If start and end are not passed, the entire block is read.

Returns: A value less than 0 on failure.
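
The start/end rules above can be sketched on an in-memory string. The helper read_range is hypothetical and models only the range arithmetic, using a std::string output for brevity. One assumption is labeled in the code: an end past the value is clamped to the value length, which is how "Defaults to the length of the file" is read here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: copies value[start, end) into output and
// returns the number of bytes copied. The byte at `end` is not read.
std::int64_t read_range(const std::string& value, std::string& output,
                        std::size_t start = 0,
                        std::size_t end = static_cast<std::size_t>(-1)) {
  // Assumption: an end past the value is clamped to the value length.
  end = std::min(end, value.size());
  if (start >= end) {
    // Covers both "start is past the end of the value" and "end is less than start".
    output.clear();
    return 0;
  }
  output.assign(value, start, end - start);
  return static_cast<std::int64_t>(end - start);
}

// Usage: reading the first 5 bytes, as in read(key, output, 0, 5).
void read_range_demo() {
  std::string out;
  assert(read_range("hello world", out, 0, 5) == 5);
  assert(out == "hello");
  assert(read_range("hello", out, 3, 1) == 0);  // end less than start
}
```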

◆ read() [2/2]

int64_t turi::block_cache::read ( const std::string & key, std::string & output, size_t start = 0, size_t end = (size_t)(-1) )

std::string overload. Note that the char* overload is faster.

◆ set_max_capacity()

void turi::block_cache::set_max_capacity ( size_t )

Sets the maximum number of files managed. If 0, there is no max capacity.

◆ value_length()

int64_t turi::block_cache::value_length ( const std::string & key )

Returns the length of the value of a particular key.

Parameters

key	The key to query

Returns: the length of the value on success, a value < 0 on failure.

◆ write()

bool turi::block_cache::write ( const std::string & key, const std::string & value )

Writes a string to a key. Returns true on success. The key must not already exist; if it does, the write fails and false is returned. When operating on a distributed filesystem, note that all machines sharing the same storage prefix share a common key space.

Parameters

key	The key to write to
value	The value to write

Returns: true on success, false on failure

The documentation for this class was generated from the following file:

core/storage/fileio/block_cache.hpp
