Turi Create  4.0
turi::integer_pack Namespace Reference

Functions

template<typename OutArcType >
static void variable_encode (OutArcType &oarc, uint64_t s)
 
template<typename InArcType >
static void variable_decode (InArcType &iarc, uint64_t &s)
 
uint64_t shifted_integer_encode (int64_t val)
 
int64_t shifted_integer_decode (uint64_t val)
 
template<typename OutArcType >
GL_HOT void frame_of_reference_encode_128 (const uint64_t *input, size_t len, OutArcType &oarc)
 
template<typename InArcType >
void frame_of_reference_decode_128 (InArcType &iarc, size_t len, uint64_t *output)
 
static size_t pack_1 (const uint64_t *src, size_t srclen, uint8_t *out)
 
static size_t pack_2 (const uint64_t *src, size_t srclen, uint8_t *out)
 
static size_t pack_4 (const uint64_t *src, size_t srclen, uint8_t *out)
 
static size_t pack_8 (const uint64_t *src, size_t srclen, uint8_t *out)
 
static size_t pack_16 (const uint64_t *src, size_t srclen, uint16_t *out)
 
static size_t pack_32 (const uint64_t *src, size_t srclen, uint32_t *out)
 
static void unpack_1 (const uint8_t *src, size_t nout_values, uint64_t *out)
 
static void unpack_2 (const uint8_t *src, size_t nout_values, uint64_t *out)
 
static void unpack_4 (const uint8_t *src, size_t nout_values, uint64_t *out)
 
static void unpack_8 (const uint8_t *src, size_t nout_values, uint64_t *out)
 
static void unpack_16 (const uint16_t *src, size_t nout_values, uint64_t *out)
 
static void unpack_32 (const uint32_t *src, size_t nout_values, uint64_t *out)
 

Variables

static constexpr unsigned char FRAME_OF_REFERENCE = 0
 
static constexpr unsigned char FRAME_OF_REFERENCE_DELTA = 1
 
static constexpr unsigned char FRAME_OF_REFERENCE_DELTA_NEGATIVE = 2
 
static constexpr unsigned char FRAME_OF_REFERENCE_HEADER_NUM_BITS = 2
 
static constexpr unsigned char FRAME_OF_REFERENCE_HEADER_MASK = 3
 

Detailed Description

Integer Packing Routines

Function Documentation

◆ frame_of_reference_decode_128()

template<typename InArcType >
void turi::integer_pack::frame_of_reference_decode_128 ( InArcType &  iarc,
size_t  len,
uint64_t *  output 
)

Performs a group decode of a collection of up to 128 64-bit numbers. See frame_of_reference_encode_128() for the encoding details.

Definition at line 427 of file integer_pack.hpp.

◆ frame_of_reference_encode_128()

template<typename OutArcType >
GL_HOT void turi::integer_pack::frame_of_reference_encode_128 ( const uint64_t *  input,
size_t  len,
OutArcType &  oarc 
)

Performs a group encode of a collection of up to 128 64-bit numbers.

There are 3 basic strategies for computing the code.

  • Frame of Reference Coding: (code difference to a minimum value) Use variable_encode() to code the smallest value. Then compute an array which is the difference of every number against the smallest value, and pack it using as few bits as possible. See below for the details on the packing.
  • Frame of Reference Delta Coding: (code incremental gaps) Use variable_encode() to code the first value. Then compute a delta array (which is the the difference between consecutive values), and pack that using as few bits as possible. See below for the details on the packing.
  • Frame of Reference Delta Negative Coding: (Like delta, but supports negative gaps) Use variable_encode() to code the first value. Then compute a delta array (which is the the difference between consecutive values), and apply the shifted_integer_encode() to the delta array, and pack that using as few bits as possible. See below for the details on the packing.

Packing

After the values to be coded are generated (see above), the remaining numbers are packed by finding the maximum number of bits required to represent any value. i.e.

[0,0,0,0,0,0] --> 0 bit maximum
[1,0,1,0,1,0] --> 1 bit maximum
[1,0,1,0,1,7] --> 3 bits maximum
[1,0,16,0,1,7] --> 5 bits maximum

This maximum is then rounded up to the nearest power of two (thus ensuring the coding/decoding is simple and is generally word-aligned.) i.e.

[0,0,0,0,0,0] --> 0 bit code
[1,0,1,0,1,0] --> 1 bit code
[1,0,1,0,1,7] --> 4 bit code
[1,0,16,0,1,7] --> 8 bit code

Then one of the pack_... functions are used to code the values.

Coding

First there is a 1 byte header: [6 bit: 1 + log2 code length] [2 bits codec type] first 6 bits: If code length == 0 , we write 0 here. Otherwise, 1 + log2 code length; The codec type is one of:

  • FRAME_OF_REFERENCE
  • FRAME_OF_REFERENCE_DELTA
  • FRAME_OF_REFERENCE_DELTA_NEGATIVE
Note
The coding does not store the number of values stored. The decoder frame_of_reference_decode_128() requires the number of values to decode correctly.

Definition at line 282 of file integer_pack.hpp.

◆ pack_1()

static size_t turi::integer_pack::pack_1 ( const uint64_t *  src,
size_t  srclen,
uint8_t *  out 
)
inlinestatic

Packs a sequence of 1 bit values

Definition at line 28 of file integer_pack_impl.hpp.

◆ pack_16()

static size_t turi::integer_pack::pack_16 ( const uint64_t *  src,
size_t  srclen,
uint16_t *  out 
)
inlinestatic

Packs a sequence of 16 bit values

Definition at line 118 of file integer_pack_impl.hpp.

◆ pack_2()

static size_t turi::integer_pack::pack_2 ( const uint64_t *  src,
size_t  srclen,
uint8_t *  out 
)
inlinestatic

Packs a sequence of 2 bit values

Definition at line 51 of file integer_pack_impl.hpp.

◆ pack_32()

static size_t turi::integer_pack::pack_32 ( const uint64_t *  src,
size_t  srclen,
uint32_t *  out 
)
inlinestatic

Packs a sequence of 32 bit values

Definition at line 130 of file integer_pack_impl.hpp.

◆ pack_4()

static size_t turi::integer_pack::pack_4 ( const uint64_t *  src,
size_t  srclen,
uint8_t *  out 
)
inlinestatic

Packs a sequence of 4 bit values

Definition at line 77 of file integer_pack_impl.hpp.

◆ pack_8()

static size_t turi::integer_pack::pack_8 ( const uint64_t *  src,
size_t  srclen,
uint8_t *  out 
)
inlinestatic

Packs a sequence of 8 bit values

Definition at line 106 of file integer_pack_impl.hpp.

◆ shifted_integer_decode()

int64_t turi::integer_pack::shifted_integer_decode ( uint64_t  val)
inline

Reverse of shifted_integer_encode. Maps values [0,1,2,3,4,5,6,...] to [0,-1,1,-2,2,-3,3,-4,4...]

Definition at line 189 of file integer_pack.hpp.

◆ shifted_integer_encode()

uint64_t turi::integer_pack::shifted_integer_encode ( int64_t  val)
inline

Maps values [0,-1,1,-2,2,-3,3,-4,4...] to [0,1,2,3,4,5,6,...] return 2*|val| - sign.

(This is equivalent Google Protobuf's ZigZag encoding (see https://developers.google.com/protocol-buffers/docs/encoding#types) and can be more compactly written as (n << 1) ^ (n >> 63) (requiring only 3 ops instead of 5 ops here)

Definition at line 177 of file integer_pack.hpp.

◆ unpack_1()

static void turi::integer_pack::unpack_1 ( const uint8_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 1 bit values

Definition at line 143 of file integer_pack_impl.hpp.

◆ unpack_16()

static void turi::integer_pack::unpack_16 ( const uint16_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 16 bit values

Definition at line 227 of file integer_pack_impl.hpp.

◆ unpack_2()

static void turi::integer_pack::unpack_2 ( const uint8_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 2 bit values

Definition at line 168 of file integer_pack_impl.hpp.

◆ unpack_32()

static void turi::integer_pack::unpack_32 ( const uint32_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 32 bit values

Definition at line 237 of file integer_pack_impl.hpp.

◆ unpack_4()

static void turi::integer_pack::unpack_4 ( const uint8_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 4 bit values

Definition at line 191 of file integer_pack_impl.hpp.

◆ unpack_8()

static void turi::integer_pack::unpack_8 ( const uint8_t *  src,
size_t  nout_values,
uint64_t *  out 
)
inlinestatic

Unpacks a sequence of 8 bit values

Definition at line 216 of file integer_pack_impl.hpp.

◆ variable_decode()

template<typename InArcType >
static void turi::integer_pack::variable_decode ( InArcType &  iarc,
uint64_t &  s 
)
inlinestatic

Performs a byte aligned variable length decode of 8 byte wide integers. See the documentation for variable_encode() for details on the encoding.

Definition at line 131 of file integer_pack.hpp.

◆ variable_encode()

template<typename OutArcType >
static void turi::integer_pack::variable_encode ( OutArcType &  oarc,
uint64_t  s 
)
inlinestatic

Performs a byte aligned variable length encode of 8 byte wide integers.

It is important to remember that x86(-64) is little endian. i.e. the least significant byte is at the lowest memory address. This makes the presentation a little different from how most other integer compression blogs which generally "prefix" the binary string with stuff, thus in the most significant bits. This makes it very complicated to decode. Here we are instead going to attach stuff to the lowest significant bits which will then appear as a prefix when writing to a stream.

The format is relatively straightforward. [... binary representation of number .. ] [0] [as many "1" bits as (length of number in bytes, rounded up. - 1)]

Essentially this leaves 8 possible encodings

  • [ 7 bit number ] [0] –> 1 byte
  • [ 14 bit number ] [0] [1] –> 2 bytes
  • [ 21 bit number ] [0] [11] –> 3 bytes
  • [ 28 bit number ] [0] [111] –> 4 bytes
  • [ 35 bit number ] [0] [1111] –> 5 bytes
  • [ 42 bit number ] [0] [11111] –> 6 bytes
  • [ 49 bit number ] [0] [111111] –> 7 bytes
  • [ 64 bit number ] [0] [1111111] –> 9 bytes

Note that there is no code of total length 8 bytes. (since the suffix has 7 "1"'s, the suffix is 1 byte long, and the remaining 8 bytes can hold all 64-bit numbers.

The basic encode algorithm is then as such: if s >> [...] == 0 (test if s can be encoded with just that many bits) Then to put in the suffix, (s << ...) | (...)

Example codes: [0001100] [0] – decodes to –> [00001100] = 12 [0001100] [0] – decodes to –> [00001100] = 12 [00000010] [001100] [0] [1] – decodes to –> [00000010,00001100] = 524

Definition at line 70 of file integer_pack.hpp.

Variable Documentation

◆ FRAME_OF_REFERENCE

constexpr unsigned char turi::integer_pack::FRAME_OF_REFERENCE = 0
static

The codec number for "frame of reference" coding.

Definition at line 200 of file integer_pack.hpp.

◆ FRAME_OF_REFERENCE_DELTA

constexpr unsigned char turi::integer_pack::FRAME_OF_REFERENCE_DELTA = 1
static

The codec number for "frame of reference" delta-coding.

Definition at line 205 of file integer_pack.hpp.

◆ FRAME_OF_REFERENCE_DELTA_NEGATIVE

constexpr unsigned char turi::integer_pack::FRAME_OF_REFERENCE_DELTA_NEGATIVE = 2
static

The codec number for "frame of reference" delta-coding with negative numbers.

Definition at line 209 of file integer_pack.hpp.

◆ FRAME_OF_REFERENCE_HEADER_MASK

constexpr unsigned char turi::integer_pack::FRAME_OF_REFERENCE_HEADER_MASK = 3
static

The mask you can apply to the codec header to extract the codec number

Definition at line 219 of file integer_pack.hpp.

◆ FRAME_OF_REFERENCE_HEADER_NUM_BITS

constexpr unsigned char turi::integer_pack::FRAME_OF_REFERENCE_HEADER_NUM_BITS = 2
static

Number of bits used by the header to store the codec number

Definition at line 214 of file integer_pack.hpp.