Turi Create  4.0
Serialization

Modules

 Technical Details: Serialization
 

Classes

class  turi::dir_archive
 
class  turi::iarchive
 The serialization input archive object which, provided with a reference to an istream, will read from the istream, providing deserialization capabilities.
 
class  turi::iarchive_soft_fail
 When this archive is used to deserialize an object, and the object does not support serialization, failure will only occur at runtime. Otherwise equivalent to turi::iarchive.
 
struct  turi::IS_POD_TYPE
 Inheriting from this type will force the serializer to treat the derived type as a POD type.
 
struct  turi::gl_is_pod< T >
 Tests if T is a POD type.
 
class  turi::oarchive
 The serialization output archive object which, provided with a reference to an ostream, will write to the ostream, providing serialization capabilities.
 
class  turi::oarchive_soft_fail
 When this archive is used to serialize an object, and the object does not support serialization, failure will only occur at runtime. Otherwise equivalent to turi::oarchive.
 
struct  turi::unsupported_serialize
 Inheriting from this class will prevent the serialization of the derived class. Used for debugging purposes.
 

Macros

#define BEGIN_OUT_OF_PLACE_LOAD(arc, tname, tval)
 Macro to make it easy to define out-of-place loads.
 
#define BEGIN_OUT_OF_PLACE_SAVE(arc, tname, tval)
 Macro to make it easy to define out-of-place saves.
 
#define TURI_UNSERIALIZABLE(tname)
 A macro which disables the serialization of a type so that it will fault at runtime.
 

Functions

template<typename OutArcType , typename RandomAccessIterator >
void turi::serialize_iterator (OutArcType &oarc, RandomAccessIterator begin, RandomAccessIterator end)
 Serializes the contents between the iterators begin and end.
 
template<typename OutArcType , typename InputIterator >
void turi::serialize_iterator (OutArcType &oarc, InputIterator begin, InputIterator end, size_t vsize)
 Serializes the contents between the iterators begin and end.
 
template<typename InArcType , typename T , typename OutputIterator >
void turi::deserialize_iterator (InArcType &iarc, OutputIterator result)
 The accompanying function to serialize_iterator(). Reads elements from the stream and writes them to the output iterator.
 
template<typename T >
std::string turi::serialize_to_string (const T &t)
 Serializes an object to a string.
 
template<typename T >
void turi::deserialize_from_string (const std::string &s, T &t)
 Deserializes an object from a string.
 

Detailed Description

We have a custom serialization scheme which is designed for performance rather than compatibility. It does not perform type checking or pointer tracking, and has only limited support across platforms. It has been tested, and should be compatible, across x86 platforms.

For a summary of all serialization functionality, see Serialization. For more technical details, see Technical Details: Serialization.

There are two serialization classes: turi::oarchive and turi::iarchive. The former does output, while the latter does input. To include all serialization headers, #include <turicreate/serialization/serialization_includes.hpp>.

Basic serialize/deserialize

To serialize data to disk, create an output archive and associate it with an output stream.

For instance, to serialize to a file called "file.bin":

std::ofstream fout("file.bin", std::fstream::binary);
turi::oarchive oarc(fout);

The << stream operators are then used to write data into the archive.

int i = 10;
double j = 20;
std::vector<float> v(10,1.0); // create a vector of 10 "1.0" values
oarc << i << j << v;

To read back, you use the iarchive with an input stream, and read back the variables in the same order:

std::ifstream fin("file.bin", std::fstream::binary);
turi::iarchive iarc(fin);
int i;
double j;
std::vector<float> v;
iarc >> i >> j >> v;

Serializable

So what type of data is serializable?

Integer Types

All integer datatypes are serializable.

Since all fixed width integer types from stdint (int16_t, int32_t, etc.) are typedefs of these basic types, all fixed width integer types are also serializable.

All integer types are saved in their raw binary form without any additional re-encoding. It is therefore important to deserialize with the same integer width as what was serialized.

The following code will fail in dramatic ways:

int i = 10;
oarc << i; // writes sizeof(int) bytes to the file
...
// some time later we need to read back the integer.
long j;
iarc >> j; // this will fail wherever sizeof(long) differs from sizeof(int)
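
The failure mode can be reproduced without the library. The following is a minimal sketch, assuming only what the text above states (values are copied as sizeof(T) raw bytes). raw_write and raw_read are hypothetical stand-ins written against standard streams; they are not part of turi::oarchive or turi::iarchive.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>

// Hypothetical stand-in for a raw archive write: copies sizeof(T) bytes.
template <typename T>
void raw_write(std::ostream& out, const T& value) {
  out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}

// Hypothetical stand-in for a raw archive read: returns false if the
// stream could not supply sizeof(T) bytes.
template <typename T>
bool raw_read(std::istream& in, T& value) {
  in.read(reinterpret_cast<char*>(&value), sizeof(T));
  return static_cast<bool>(in);
}

// Writes a 32-bit integer, then tries to read it back as a 64-bit one.
// Returns true if the read succeeded.
bool read_int32_as_int64() {
  std::stringstream buffer;
  raw_write(buffer, std::int32_t{10});
  std::int64_t wide = 0;
  return raw_read(buffer, wide);  // asks for 8 bytes, only 4 were written
}

// Writes and reads with the same width and returns the value read back.
std::int32_t read_int32_as_int32() {
  std::stringstream buffer;
  raw_write(buffer, std::int32_t{10});
  std::int32_t same = 0;
  raw_read(buffer, same);
  return same;
}
```

In this sketch the mismatched read fails because the stream runs out of bytes. In a longer stream it would instead consume bytes that belong to the next value, which is harder to notice.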

Floating Point Types

All floating point data types are serializable.

Similar to integer types, all floating point types are saved in raw binary form without re-encoding. You must deserialize with the same floating point width as what was serialized (i.e. if you serialize a double, you must deserialize a double).

Containers

The following template containers are serializable as long as the contained types are all serializable. This can be recursively applied.

For instance, a std::vector<int> is serializable. A std::list<std::vector<int> > is therefore also serializable.

There is special handling for std::vector<T> for performance in the event that T is a simple POD (Plain Old Data) data type. POD types are data types which occupy a contiguous region in memory: for instance, basic types (double, int, etc.), or structs which contain only basic types. Such types can be copied or replicated using a simple mem-copy operation, which greatly accelerates serialization and deserialization. All basic data types are automatically POD types. We will discuss structs and other user types in the next section.
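
To illustrate why a vector of POD values can be handled with one bulk copy, here is a minimal sketch using standard streams. The helper names and the 64-bit length prefix are assumptions made for this example; they do not describe the library's actual on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <type_traits>
#include <vector>

// Hypothetical helper: writes a length prefix, then the whole element
// array in a single write() call instead of one call per element.
template <typename T>
void write_pod_vector(std::ostream& out, const std::vector<T>& values) {
  static_assert(std::is_trivially_copyable<T>::value,
                "bulk copy is only valid for trivially copyable types");
  const std::uint64_t count = values.size();
  out.write(reinterpret_cast<const char*>(&count), sizeof(count));
  if (count > 0) {
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(count * sizeof(T)));
  }
}

// Hypothetical helper: reads the length prefix, sizes the vector, then
// fills it with a single read() call.
template <typename T>
std::vector<T> read_pod_vector(std::istream& in) {
  static_assert(std::is_trivially_copyable<T>::value,
                "bulk copy is only valid for trivially copyable types");
  std::uint64_t count = 0;
  in.read(reinterpret_cast<char*>(&count), sizeof(count));
  std::vector<T> values(static_cast<std::size_t>(count));
  if (count > 0) {
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(T)));
  }
  return values;
}

// Round-trips a vector<float> through an in-memory buffer.
std::vector<float> round_trip_floats(const std::vector<float>& input) {
  std::stringstream buffer;
  write_pod_vector(buffer, input);
  return read_pod_vector<float>(buffer);
}
```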

User Structs and Classes

To serialize a struct/class, all you need to do is to define a public load/save function. For instance:

class TestClass {
 public:
  int i, j;
  std::vector<int> k;
  void save(turi::oarchive& oarc) const {
    oarc << i << j << k;
  }
  void load(turi::iarchive& iarc) {
    iarc >> i >> j >> k;
  }
};

The save() and load() function prototypes must match exactly. Other conditions are that the class must be Default Constructible:

// it must be possible to create a variable of TestClass type like this
TestClass a;

And that the class must be Assignable:

TestClass a, b;
// it must be possible to assign one variable of TestClass to another
b = a;

After which, TestClass becomes serializable, and can be stored to and read from an archive:

TestClass t;
// set values to t
oarc << t; // write it to a file
// ... some time afterwards ...
TestClass t2;
iarc >> t2; // read it back from a file

Since TestClass is now serializable, containers of TestClass listed in Containers are also serializable.
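
The save()/load() pattern can be tried out in isolation. The sketch below is a self-contained illustration in which mini_oarchive and mini_iarchive are hypothetical miniature archives built on standard streams; they only mimic the way the stream operators forward to the member functions, and are not the library's classes.

```cpp
#include <cassert>
#include <sstream>
#include <type_traits>

// Hypothetical miniature archives wrapping standard streams.
struct mini_oarchive { std::ostream& out; };
struct mini_iarchive { std::istream& in; };

// Arithmetic types are copied as raw bytes.
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, mini_oarchive&>::type
operator<<(mini_oarchive& arc, const T& value) {
  arc.out.write(reinterpret_cast<const char*>(&value), sizeof(T));
  return arc;
}
template <typename T>
typename std::enable_if<std::is_arithmetic<T>::value, mini_iarchive&>::type
operator>>(mini_iarchive& arc, T& value) {
  arc.in.read(reinterpret_cast<char*>(&value), sizeof(T));
  return arc;
}

// Class types are forwarded to their member save() / load().
template <typename T>
typename std::enable_if<std::is_class<T>::value, mini_oarchive&>::type
operator<<(mini_oarchive& arc, const T& value) {
  value.save(arc);
  return arc;
}
template <typename T>
typename std::enable_if<std::is_class<T>::value, mini_iarchive&>::type
operator>>(mini_iarchive& arc, T& value) {
  value.load(arc);
  return arc;
}

// A user type: load() reads the fields in the order save() wrote them.
struct Point {
  int x = 0;
  double weight = 0.0;
  void save(mini_oarchive& oarc) const { oarc << x << weight; }
  void load(mini_iarchive& iarc) { iarc >> x >> weight; }
};

// Writes p into an in-memory buffer and reads it back into a new Point.
Point round_trip(const Point& p) {
  std::stringstream buffer;
  mini_oarchive oarc{buffer};
  oarc << p;
  mini_iarchive iarc{buffer};
  Point q;  // relies on Point being default constructible
  iarc >> q;
  return q;
}
```

Note how round_trip() depends on the default-constructibility requirement stated above: the target object must exist before load() can fill it in.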

POD Serialization

As mentioned in Containers, POD data types occupy a contiguous region in memory and hence can be serialized and deserialized very quickly. Ideally, determination of whether a data type is POD or not should be handled by the compiler. However, this capability is only available in C++11 and not all compilers support it yet. We therefore implemented a simple workaround which will allow you to identify to the serializer that a class is POD, and avoid writing a save/load function.

We consider the following Coordinate struct.

struct Coordinate{
int x, y, z;
};

This struct can be declared a POD type, and thereby use the accelerated serializer, by simply inheriting from turi::IS_POD_TYPE:

struct Coordinate: public turi::IS_POD_TYPE{
int x, y, z;
};

Now, Coordinate variables, or even vector<Coordinate> variables will serialize/deserialize faster. Also, you avoid writing a save() and load() function.
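
One way such an opt-in tag can be detected at compile time is sketched below. This is an illustration of the general technique only: pod_tag and takes_fast_path are hypothetical names, and the sketch makes no claim about how turi::IS_POD_TYPE or turi::gl_is_pod are actually implemented.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical empty tag type that a struct inherits from to opt in.
struct pod_tag {};

// A type takes the fast path if it is an arithmetic type or if it
// opts in by deriving from the tag.
template <typename T>
struct takes_fast_path
    : std::integral_constant<bool, std::is_arithmetic<T>::value ||
                                       std::is_base_of<pod_tag, T>::value> {};

// Opts in through inheritance.
struct Coordinate : public pod_tag {
  int x, y, z;
};

// Same layout, but does not opt in.
struct Untagged {
  int x, y, z;
};
```

Because the tag is an explicit opt-in, pointer types are not picked up automatically in this sketch, which sidesteps the concern about pointers raised in the note below.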

Note
Currently POD detection is performed through the boost type traits library. When compilers implement std::is_pod (in C++11), POD detection will improve, increasing the scope of types which can be serialized quickly and automatically. A minor concern is that the scope of POD types is still slightly too large, since technically pointer types are POD, and those cannot be serialized automatically.

Out of Place Serialization

In some situations, you may find that you need to make a data type serializable, but the data type is implemented by someone else, in a different library, making it impossible to extend and write a member save() and load() function as described in User Structs and Classes.

In this situation, it is necessary to implement an "Out of place" serializer. This is unfortunately somewhat more complicated.

For instance, suppose there is an external type called Matrix, implemented by some other library, which I would like to make serializable. The following code will have to be written in the global namespace:

BEGIN_OUT_OF_PLACE_SAVE(oarc, Matrix, mat)
// write the "mat" variable which is of the type Matrix
// into the output archive oarc
END_OUT_OF_PLACE_SAVE()
BEGIN_OUT_OF_PLACE_LOAD(iarc, Matrix, mat)
// read the "mat" variable which is of the type Matrix
// from the input archive iarc
END_OUT_OF_PLACE_LOAD()

To facilitate reading and writing of data from the archives, the output oarchive object provides a turi::oarchive::write() function which directly writes a sequence of bytes to the stream. Similarly, the input iarchive object provides a turi::iarchive::read() function which directly reads a sequence of bytes from the stream.

For instance, if the Matrix type example above is defined in the following way:

struct Matrix {
int width; // width of the matrix
int height; // height of the matrix
double* data; // an array containing all the values in the matrix
int datalen; // the number of elements in the "data" array.
};

An "out of place" serializer could be implemented the following way:

BEGIN_OUT_OF_PLACE_SAVE(oarc, Matrix, mat)
// store the dimensions of the matrix
oarc << mat.width << mat.height;
// store the length of the data array
oarc << mat.datalen;
// write the double array
oarc.write((char*)(mat.data), sizeof(double) * mat.datalen);
END_OUT_OF_PLACE_SAVE()
BEGIN_OUT_OF_PLACE_LOAD(iarc, Matrix, mat)
// clear the matrix data if there is any
if (mat.data != NULL) delete [] mat.data;
// read the dimensions of the matrix
iarc >> mat.width >> mat.height;
// read the length of the data array
iarc >> mat.datalen;
// allocate sufficient storage for the array
mat.data = new double[mat.datalen];
// read the double array
iarc.read((char*)(mat.data), sizeof(double) * mat.datalen);
END_OUT_OF_PLACE_LOAD()
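
The same save and load logic can be exercised without the macros. The following is a minimal sketch that uses plain free functions over standard streams in place of oarc.write() and iarc.read(). save_matrix and load_matrix are hypothetical names, and the Matrix members are given default values here (an addition to the struct shown above) so that load_matrix can safely release any previous array.

```cpp
#include <cassert>
#include <sstream>

// Same fields as the Matrix example, with defaults added for this sketch.
struct Matrix {
  int width = 0;
  int height = 0;
  double* data = nullptr;
  int datalen = 0;
};

// Hypothetical free function: dimensions, then length, then raw doubles.
void save_matrix(std::ostream& out, const Matrix& mat) {
  out.write(reinterpret_cast<const char*>(&mat.width), sizeof(mat.width));
  out.write(reinterpret_cast<const char*>(&mat.height), sizeof(mat.height));
  out.write(reinterpret_cast<const char*>(&mat.datalen), sizeof(mat.datalen));
  out.write(reinterpret_cast<const char*>(mat.data),
            static_cast<std::streamsize>(sizeof(double) * mat.datalen));
}

// Hypothetical free function: reads fields in the order they were written.
void load_matrix(std::istream& in, Matrix& mat) {
  delete[] mat.data;  // deleting a null pointer is a no-op
  mat.data = nullptr;
  in.read(reinterpret_cast<char*>(&mat.width), sizeof(mat.width));
  in.read(reinterpret_cast<char*>(&mat.height), sizeof(mat.height));
  in.read(reinterpret_cast<char*>(&mat.datalen), sizeof(mat.datalen));
  mat.data = new double[mat.datalen];
  in.read(reinterpret_cast<char*>(mat.data),
          static_cast<std::streamsize>(sizeof(double) * mat.datalen));
}

// Saves a 2x2 matrix, loads it into a fresh Matrix, and returns the sum
// of the loaded elements.
double round_trip_sum() {
  double values[] = {1.0, 2.0, 3.0, 4.0};
  Matrix src;
  src.width = 2;
  src.height = 2;
  src.data = values;  // borrowed stack storage, never deleted
  src.datalen = 4;

  std::stringstream buffer;
  save_matrix(buffer, src);

  Matrix dst;
  load_matrix(buffer, dst);
  double sum = 0.0;
  for (int i = 0; i < dst.datalen; ++i) sum += dst.data[i];
  delete[] dst.data;
  return sum;
}
```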

Macro Definition Documentation

◆ BEGIN_OUT_OF_PLACE_LOAD

#define BEGIN_OUT_OF_PLACE_LOAD(arc, tname, tval)
Value:
namespace turi{ namespace archive_detail { \
template <typename InArcType> \
struct deserialize_impl<InArcType, tname, false>{ \
static void exec(InArcType& arc, tname & tval) {

Macro to make it easy to define out-of-place loads.

In the event that it is impractical to implement a save() and load() function in the class one wants to serialize, it is necessary to define an "out of place" save and load.

See Out of Place Serialization for an example.

Note
Important: this must be defined in the global namespace.

Definition at line 314 of file iarchive.hpp.

◆ BEGIN_OUT_OF_PLACE_SAVE

#define BEGIN_OUT_OF_PLACE_SAVE(arc, tname, tval)
Value:
namespace turi{ namespace archive_detail { \
template <typename OutArcType> struct serialize_impl<OutArcType, tname, false> { \
static void exec(OutArcType& arc, const tname & tval) {

Macro to make it easy to define out-of-place saves.

In the event that it is impractical to implement a save() and load() function in the class one wants to serialize, it is necessary to define an "out of place" save and load.

See Out of Place Serialization for an example.

Note
Important: this must be defined in the global namespace.

Definition at line 346 of file oarchive.hpp.

◆ TURI_UNSERIALIZABLE

#define TURI_UNSERIALIZABLE(tname)
Value:
BEGIN_OUT_OF_PLACE_LOAD(arc, tname, tval) \
ASSERT_MSG(false, "trying to deserialize an unserializable object"); \
END_OUT_OF_PLACE_LOAD() \
BEGIN_OUT_OF_PLACE_SAVE(arc, tname, tval) \
ASSERT_MSG(false, "trying to serialize an unserializable object"); \
END_OUT_OF_PLACE_SAVE() \

A macro which disables the serialization of a type so that it will fault at runtime.

Writing TURI_UNSERIALIZABLE(T) for some typename T in the global namespace will result in an assertion failure if any attempt is made to serialize or deserialize the type T. This is largely used for debugging purposes to enforce that certain types are never serialized.

Definition at line 46 of file unsupported_serialize.hpp.

Function Documentation

◆ deserialize_from_string()

template<typename T >
inline void turi::deserialize_from_string (const std::string &s, T &t)

Deserializes an object from a string.

Deserializes a serializable object t from a string using the deserializer.

Template Parameters
T  The type of object to deserialize. Typically will be inferred by the compiler.
Parameters
s  The string to deserialize.
t  A reference to the object which will contain the deserialized object when the function returns.
See also
serialize_to_string()

Definition at line 54 of file serialize_to_from_string.hpp.

◆ deserialize_iterator()

template<typename InArcType , typename T , typename OutputIterator >
void turi::deserialize_iterator (InArcType &iarc, OutputIterator result)

The accompanying function to serialize_iterator(). Reads elements from the stream and writes them to the output iterator.

Note that this requires an additional template parameter T, which is the type of object to deserialize. This is necessary, for instance, for the map type: for a map<K,V>, map<K,V>::value_type is pair<const K,V>, which is not useful since it cannot be assigned to. In this case, use T = pair<K,V>.

Template Parameters
InArcType  The input archive type.
T  The type of values to deserialize.
OutputIterator  The type of the output iterator to be written to. This should not need to be specified. The compiler will typically infer this correctly.
Parameters
iarc  A reference to the input archive.
result  The output iterator to write to.

Definition at line 100 of file iterator.hpp.
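
The reason for the explicit T can be seen in a small standalone sketch. Here read_counted is a hypothetical reader over a standard stream that assumes a 64-bit element count followed by the elements; it is not turi::deserialize_iterator and says nothing about the library's actual stream format.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <sstream>
#include <utility>

// Hypothetical raw helpers over standard streams.
template <typename T>
void write_raw(std::ostream& out, const T& value) {
  out.write(reinterpret_cast<const char*>(&value), sizeof(T));
}
template <typename T>
void read_raw(std::istream& in, T& value) {
  in.read(reinterpret_cast<char*>(&value), sizeof(T));
}

// Reads a pair member by member. This would not compile for
// pair<const K, V>, because the key could not be written into.
template <typename A, typename B>
void read_element(std::istream& in, std::pair<A, B>& element) {
  read_raw(in, element.first);
  read_raw(in, element.second);
}

// Hypothetical counterpart of a counted range write: T is given
// explicitly because it cannot be deduced from the output iterator.
template <typename T, typename OutputIterator>
void read_counted(std::istream& in, OutputIterator result) {
  std::uint64_t count = 0;
  read_raw(in, count);
  for (std::uint64_t i = 0; i < count; ++i) {
    T element{};  // must be default constructible and writable
    read_element(in, element);
    *result = element;
    ++result;
  }
}

// Builds a small stream by hand and loads it into a map.
std::map<int, double> demo_load_map() {
  std::stringstream buffer;
  write_raw(buffer, std::uint64_t{2});
  write_raw(buffer, 1);
  write_raw(buffer, 0.5);
  write_raw(buffer, 2);
  write_raw(buffer, 1.5);

  std::map<int, double> result;
  // T is pair<int, double>, not the map's value_type pair<const int, double>.
  read_counted<std::pair<int, double>>(buffer,
                                       std::inserter(result, result.end()));
  return result;
}
```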

◆ serialize_iterator() [1/2]

template<typename OutArcType , typename RandomAccessIterator >
void turi::serialize_iterator (OutArcType &oarc, RandomAccessIterator begin, RandomAccessIterator end)

Serializes the contents between the iterators begin and end.

This function prefers random access iterators since it needs the distance between the begin and end iterators. As implemented, it will work for other input iterators but is extremely inefficient.

Template Parameters
OutArcType  The output archive type. This should not need to be specified. The compiler will typically infer this correctly.
RandomAccessIterator  The iterator type. This should not need to be specified. The compiler will typically infer this correctly.
Parameters
oarc  A reference to the output archive to write to.
begin  The start of the iterator range to write.
end  The end of the iterator range to write.

Definition at line 36 of file iterator.hpp.

◆ serialize_iterator() [2/2]

template<typename OutArcType , typename InputIterator >
void turi::serialize_iterator (OutArcType &oarc, InputIterator begin, InputIterator end, size_t vsize)

Serializes the contents between the iterators begin and end.

This function takes all iterator types, but takes a "count" for efficiency. The count is checked, and the call fails if the number of elements serialized does not match the count.

Template Parameters
OutArcType  The output archive type. This should not need to be specified. The compiler will typically infer this correctly.
InputIterator  The iterator type. This should not need to be specified. The compiler will typically infer this correctly.
Parameters
oarc  A reference to the output archive to write to.
begin  The start of the iterator range to write.
end  The end of the iterator range to write.
vsize  The distance between the iterators begin and end. Must match std::distance(begin, end).

Definition at line 67 of file iterator.hpp.
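
The idea of passing the count up front, so that the range is traversed only once, can be sketched as follows. write_counted is a hypothetical helper over a standard stream. The 64-bit count prefix and the use of an exception to report a mismatch are assumptions of this sketch, not a description of how turi::serialize_iterator stores data or signals failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <sstream>
#include <stdexcept>

// Hypothetical helper: writes the declared count, then each element as
// raw bytes in a single pass, and verifies the count afterwards.
template <typename InputIterator>
void write_counted(std::ostream& out, InputIterator begin, InputIterator end,
                   std::size_t vsize) {
  const std::uint64_t declared = vsize;
  out.write(reinterpret_cast<const char*>(&declared), sizeof(declared));
  std::size_t written = 0;
  for (; begin != end; ++begin, ++written) {
    const auto& element = *begin;
    out.write(reinterpret_cast<const char*>(&element), sizeof(element));
  }
  if (written != vsize) {
    throw std::length_error("vsize does not match the range length");
  }
}

// Number of bytes produced for a list, which has no random access.
std::size_t bytes_for_list(const std::list<int>& values) {
  std::ostringstream out;
  write_counted(out, values.begin(), values.end(), values.size());
  return out.str().size();
}

// Returns true if a wrong count is detected.
bool rejects_wrong_count() {
  std::list<int> values{1, 2, 3};
  std::ostringstream out;
  try {
    write_counted(out, values.begin(), values.end(), 2);
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}
```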

◆ serialize_to_string()

template<typename T >
inline std::string turi::serialize_to_string (const T &t)

Serializes an object to a string.

Converts a serializable object t to a string using the serializer.

Template Parameters
T  The type of object to serialize. Typically will be inferred by the compiler.
Parameters
t  The object to serialize.
Returns
A string containing a serialized form of t.
See also
deserialize_from_string()

Definition at line 28 of file serialize_to_from_string.hpp.