Release Notes

6.2.19

Fixes

  • Protect the proxies from running out of memory when bombarded with requests from clients. (PR #2812).
  • A process with the proxy class would not become the first proxy when placed alongside other stateless class processes. (PR #2819).
  • If a transaction log stalled on a disk operation during recruitment the cluster would become unavailable until the process died. (PR #2815).
  • Avoid recruiting satellite transaction logs when usable_regions=1. (PR #2813).
  • Prevent the cluster from having too many active generations as a safety measure against repeated failures. (PR #2814).
  • fdbcli status JSON could become truncated because of unprintable characters. (PR #2807).
  • The data distributor used too much CPU in large clusters (broken in 6.2.16). (PR #2806).

Status

  • Added cluster.workload.operations.memory_errors to measure the number of requests rejected by the proxies because the memory limit has been exceeded (an example of reading these new fields follows this list). (PR #2812).
  • Added cluster.workload.operations.location_requests to measure the number of outgoing key server location responses from the proxies. (PR #2812).
  • Added cluster.recovery_state.active_generations to track the number of generations for which the cluster still requires transaction logs. (PR #2814).
  • Added network.tls_policy_failures to the processes section to record the number of TLS policy failures each process has observed. (PR #2811).
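
A minimal Python sketch of reading these new fields, assuming the standard fdb Python bindings and the \xff\xff/status/json special key (the field paths mirror the names above):

    import json
    import fdb

    fdb.api_version(620)
    db = fdb.open()  # uses the default cluster file

    # The machine-readable status document is exposed through a special key.
    status = json.loads(db[b'\xff\xff/status/json'])

    ops = status['cluster']['workload']['operations']
    print('memory_errors:', ops.get('memory_errors'))
    print('location_requests:', ops.get('location_requests'))
    print('active_generations:',
          status['cluster']['recovery_state'].get('active_generations'))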

Features

  • Added --debug-tls as a command-line argument to fdbcli to help diagnose TLS issues (a usage sketch follows). (PR #2810).
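
A hedged sketch of invoking the new flag from Python, assuming fdbcli is on the PATH and that the caller supplies the cluster file path; -C is fdbcli's existing cluster-file argument:

    import subprocess

    def fdbcli_debug_tls(cluster_file):
        # --debug-tls asks fdbcli to report TLS diagnostics for the connection attempt.
        result = subprocess.run(
            ["fdbcli", "-C", cluster_file, "--debug-tls"],
            capture_output=True, text=True,
        )
        return result.stdout + result.stderr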

6.2.18

Fixes

  • When configuring a cluster to usable_regions=2, data distribution would not react to machine failures while copying data to the remote region. (PR #2774).
  • When a cluster is configured with usable_regions=2, data distribution could push a cluster into saturation by relocating too many shards simultaneously. (PR #2776).
  • Do not allow the cluster controller to mark any process as failed within 30 seconds of startup. (PR #2780).
  • Backup could not establish TLS connections (broken in 6.2.16). (PR #2775).
  • Certificates were not refreshed automatically (broken in 6.2.16). (PR #2781).

Performance

  • Improved the efficiency of establishing large numbers of network connections. (PR #2777).

Features

  • Add support for setting knobs to modify the behavior of fdbcli. (PR #2773).

Other Changes

  • Setting invalid knobs in backup and DR binaries is now a warning instead of an error and will not result in the application being terminated. (PR #2773).

6.2.17

Fixes

  • Restored the ability to set TLS configuration using environment variables (broken in 6.2.16). (PR #2755).

6.2.16

Performance

  • Reduced tail commit latencies by improving commit pipelining on the proxies. (PR #2589).
  • Data distribution does a better job balancing data when disks are more than 70% full. (PR #2722).
  • Reverse range reads could read too much data from disk, resulting in poor performance relative to forward range reads. (PR #2650).
  • Switched from LibreSSL to OpenSSL to improve the speed of establishing connections. (PR #2650).
  • The cluster controller does a better job avoiding multiple recoveries when first recruited. (PR #2698).

Fixes

  • Storage servers could fail to advance their version correctly in response to empty commits. (PR #2617).
  • Status could not label more than 5 processes as proxies. (PR #2653).
  • The TR_FLAG_DISABLE_MACHINE_TEAM_REMOVER, TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS, TR_FLAG_DISABLE_SERVER_TEAM_REMOVER, and BUGGIFY_ALL_COORDINATION knobs could not be set at runtime. (PR #2661).
  • Backup container filename parsing unnecessarily consulted the local filesystem, which would produce an error when permission was denied. (PR #2693).
  • Rebalancing data movement could stop doing work even though the data in the cluster was not well balanced. (PR #2703).
  • Data movement uses available space rather than free space when deciding how full a process is. (PR #2708).
  • Fetching status attempts to reuse its connection with the cluster controller. (PR #2583).

6.2.15

Fixes

  • TLS throttling could block legitimate connections. (PR #2575).

6.2.14

Fixes

  • Data distribution was prioritizing shard merges too highly. (PR #2562).
  • Status would incorrectly mark clusters as having no fault tolerance. (PR #2562).
  • A proxy could run out of memory if disconnected from the cluster for too long. (PR #2562).

6.2.13

Performance

  • Optimized the commit path on the proxies to significantly reduce commit latencies in large clusters. (PR #2536).
  • Data distribution could create temporarily untrackable shards which could not be split if they became hot. (PR #2546).

6.2.12

Performance

  • Throttle TLS connect attempts from misconfigured clients. (PR #2529).
  • Reduced master recovery times in large clusters. (PR #2430).
  • Improved performance while a remote region is catching up. (PR #2527).
  • The data distribution algorithm does a better job preventing hot shards while recovering from machine failures. (PR #2526).

Fixes

  • Improve the reliability of a kill command from fdbcli. (PR #2512).
  • The --traceclock parameter to fdbserver incorrectly had no effect. (PR #2420).
  • Clients could throw an internal error during commit if client buggification was enabled. (PR #2427).
  • Backup and DR agent transactions which update and clean up status had an unnecessarily high conflict rate. (PR #2483).
  • The slow task profiler used an unsafe call to get a timestamp in its signal handler that could lead to rare crashes. (PR #2515).

6.2.11

Fixes

  • Clients could hang indefinitely on reads if all storage servers holding a keyrange were removed from a cluster since the last time the client read a key in the range. (PR #2377).
  • In rare scenarios, status could falsely report no replicas remain of some data. (PR #2380).
  • Latency band tracking could fail to configure correctly after a recovery or upon process startup. (PR #2371).

6.2.10

Fixes

6.2.9

Fixes

  • Small clusters using specific sets of process classes could cause the data distributor to be continuously killed and re-recruited. (PR #2344).
  • The data distributor and ratekeeper could be recruited on non-optimal processes. (PR #2344).
  • A kill command from fdbcli could take a long time before being executed by a busy process. (PR #2339).
  • Committing transactions larger than 1 MB could cause the proxy to stall for up to a second. (PR #2350).
  • Transaction timeouts would use memory for the entire duration of the timeout, regardless of whether the transaction had been destroyed. (PR #2353).

6.2.8

Fixes

  • Significantly improved the rate at which the transaction logs in a remote region can pull data from the primary region. (PR #2307) (PR #2323).
  • The system_kv_size_bytes status field could report a size much larger than the actual size of the system keyspace. (PR #2305).

6.2.7

Performance

  • A new transaction log spilling implementation is now the default. Write bandwidth and latency will no longer degrade during storage server or remote region failures. (PR #1731).
  • Storage servers will locally throttle incoming read traffic when they are falling behind. (PR #1447).
  • Use CRC32 checksum for SQLite pages. (PR #1582).
  • Added a 96-byte fast allocator, so storage queue nodes use less memory. (PR #1336).
  • Improved network performance when sending large packets. (PR #1684).
  • Spilled data can be consumed from transaction logs more quickly and with less overhead. (PR #1584).
  • Clients no longer talk to the cluster controller for failure monitoring information. (PR #1640).
  • Reduced the number of connection monitoring messages between clients and servers. (PR #1768).
  • Close connections which have been idle for a long period of time. (PR #1768).
  • Each client connects to exactly one coordinator, and at most five proxies. (PR #1909).
  • Ratekeeper will throttle traffic when too many storage servers are not making versions durable fast enough. (PR #1784).
  • Storage servers recovering a memory storage engine will abort recovery if the cluster is already healthy. (PR #1713).
  • Improved how the data distribution algorithm balances data across teams of storage servers. (PR #1785).
  • Lowered the priority for data distribution team removal, to avoid prioritizing team removal work over splitting shards. (PR #1853).
  • Made the storage cache eviction policy configurable, and added an LRU policy. (PR #1506).
  • Improved the speed of recoveries on large clusters at log_version >= 4. (PR #1729).
  • Log routers will prefer to peek from satellites at log_version >= 4. (PR #1795).
  • In clusters using a region configuration, clients will read from the remote region if all of the servers in the primary region are overloaded. [6.2.3] (PR #2019).
  • Significantly improved the rate at which the transaction logs in a remote region can pull data from the primary region. [6.2.4] (PR #2101).
  • Raised the data distribution priority of splitting shards because delaying splits can cause hot write shards. [6.2.6] (PR #2234).

Fixes

  • During an upgrade, the multi-version client now persists database default options and transaction options that aren’t reset on retry (e.g. transaction timeout). In order for these options to function correctly during an upgrade, a 6.2 or later client should be used as the primary client. (PR #1767).
  • If a cluster is upgraded during an onError call, the cluster could return a cluster_version_changed error. (PR #1734).
  • Data distribution will now pick a random destination when merging shards in the \xff keyspace. This avoids an issue with backup where the write-heavy mutation log shards could concentrate on a single process that has less data than everybody else. (PR #1916).
  • Setting --machine_id (or -i) for an fdbserver process now sets locality_machineid in addition to locality_zoneid. (PR #1928).
  • File descriptors opened by clients and servers set close-on-exec, if available on the platform. (PR #1581).
  • fdbrestore commands other than start required a default cluster file to be found but did not actually use it. (PR #1912).
  • Unneeded network connections were not being closed because peer reference counts were handled improperly. (PR #1768).
  • In very rare scenarios, master recovery would restart because system metadata was loaded incorrectly. (PR #1919).
  • Ratekeeper will aggressively throttle when unable to fetch the list of storage servers for a considerable period of time. (PR #1858).
  • Proxies could become overloaded when all storage servers on a team fail. [6.2.1] (PR #1976).
  • Proxies could start too few transactions if they didn’t receive get read version requests frequently enough. [6.2.3] (PR #1999).
  • The fileconfigure command in fdbcli could fail with an unknown error if the file did not contain a valid JSON object. (PR #2017).
  • Configuring regions would fail with an internal error if the cluster contained storage servers that didn’t set a datacenter ID. (PR #2017).
  • Clients no longer prefer reading from servers with the same zone ID, because it could create hot shards. [6.2.3] (PR #2019).
  • Data distribution could fail to start if any storage servers had misconfigured locality information. This problem could persist even after the offending storage servers were removed or fixed. [6.2.5] (PR #2110).
  • Data distribution was running at too high of a priority, which sometimes caused other roles on the same process to stall. [6.2.5] (PR #2170).
  • Loading a 6.1 or newer fdb_c library as a secondary client using the multi-version client could lead to an infinite recursion when run with API versions older than 610. [6.2.5] (PR #2169).
  • Using C API functions that were removed in 6.1 when using API version 610 or above now results in a compilation error. [6.2.5] (PR #2169).
  • Coordinator changes could fail to complete if the database wasn’t allowing any transactions to start. [6.2.6] (PR #2191).
  • Status would report incorrect fault tolerance metrics when a remote region was configured and the primary region lost a storage replica. [6.2.6] (PR #2230).
  • The cluster would not change to a new set of satellite transaction logs when they became available in a better satellite location. [6.2.6] (PR #2241).
  • The existence of proxy or resolver class processes prevented stateless class processes from being recruited as proxies or resolvers. [6.2.6] (PR #2241).
  • The cluster controller could become saturated in clusters with large numbers of connected clients using TLS. [6.2.6] (PR #2252).
  • Backup and DR would not share a mutation stream if they were started on different versions of FoundationDB. Either backup or DR must be restarted to resolve this issue. [6.2.6] (PR #2202).
  • Don’t track batch priority GRV requests in latency bands. [6.2.7] (PR #2279).
  • Transaction log processes used twice their normal memory when switching spill types. [6.2.7] (PR #2256).
  • Under certain conditions, cross region replication could stall for 10 minute periods. [6.2.7] (PR #1818) (PR #2276).
  • When dropping a remote region from the configuration after processes in the region have failed, data distribution would create teams from the dead servers for one minute. [6.2.7] (PR #2286).

Status

  • Added run_loop_busy to the processes section to record the fraction of time the run loop is busy. (PR #1760).
  • Added cluster.page_cache section to status. In this section, added two new statistics storage_hit_rate and log_hit_rate that indicate the fraction of recent page reads that were served by cache. (PR #1823).
  • Added transaction start counts by priority to cluster.workload.transactions. The new counters are named started_immediate_priority, started_default_priority, and started_batch_priority. (PR #1836).
  • Removed cluster.datacenter_version_difference and replaced it with cluster.datacenter_lag, which has subfields versions and seconds (see the sketch after this list). (PR #1800).
  • Added local_rate to the roles section to record the throttling rate of the local ratekeeper. (PR #1712).
  • Renamed cluster.fault_tolerance fields max_machines_without_losing_availability and max_machines_without_losing_data to max_zones_without_losing_availability and max_zones_without_losing_data. (PR #1925).
  • fdbcli status now reports the configured zone count. The fault tolerance is now reported in terms of the number of zones unless machine IDs are being used as zone IDs. (PR #1924).
  • connected_clients is now only a sample of the connected clients, rather than a complete list. (PR #1902).
  • Added max_protocol_clients to the supported_versions section, which provides a sample of connected clients which cannot connect to any higher protocol version. (PR #1902).
  • Clients which connect without specifying their supported versions are tracked as an Unknown version in the supported_versions section. [6.2.2] (PR #1990).
  • Add coordinator to the list of roles that can be reported for a process. [6.2.3] (PR #2006).
  • Added worst_durability_lag_storage_server and limiting_durability_lag_storage_server to the cluster.qos section, each with subfields versions and seconds. These report the durability lag values being used by ratekeeper to potentially limit the transaction rate. [6.2.3] (PR #2003).
  • Added worst_data_lag_storage_server and limiting_data_lag_storage_server to the cluster.qos section, each with subfields versions and seconds. These are meant to replace worst_version_lag_storage_server and limiting_version_lag_storage_server, which are now deprecated. [6.2.3] (PR #2003).
  • Added system_kv_size_bytes to the cluster.data section to record the size of the system keyspace. [6.2.5] (PR #2170).
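
A short Python sketch of consuming a few of the fields above, assuming the same \xff\xff/status/json special key used earlier in these notes:

    import json
    import fdb

    fdb.api_version(620)
    db = fdb.open()
    cluster = json.loads(db[b'\xff\xff/status/json'])['cluster']

    # cluster.datacenter_lag replaces the removed datacenter_version_difference field.
    lag = cluster.get('datacenter_lag', {})
    print('datacenter lag: %s versions, %s seconds'
          % (lag.get('versions'), lag.get('seconds')))

    # The new durability-lag fields carry the values ratekeeper uses when throttling.
    worst = cluster.get('qos', {}).get('worst_durability_lag_storage_server', {})
    print('worst storage durability lag: %s versions, %s seconds'
          % (worst.get('versions'), worst.get('seconds')))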

Bindings

  • API version updated to 620. See the API version upgrade guide for upgrade details.
  • Add a transaction size limit as both a database option and a transaction option. (PR #1725).
  • Added a new API to get the approximate transaction size before commit, e.g., fdb_transaction_get_approximate_size in the C binding (see the sketch after this list). (PR #1756).
  • C: fdb_future_get_version has been renamed to fdb_future_get_int64. (PR #1756).
  • C: Applications linking to libfdb_c can now use pkg-config foundationdb-client or find_package(FoundationDB-Client ...) (for CMake) to get the proper flags for compiling and linking. (PR #1636).
  • Go: The Go bindings now require Go version 1.11 or later.
  • Go: Finalizers could run too early leading to undefined behavior. (PR #1451).
  • Added a transaction option to control the field length of keys and values in debug transaction logging in order to avoid truncation. (PR #1844).
  • Added a transaction option to control whether get_addresses_for_key includes a port in the address. This will be deprecated in API version 700, and addresses will include ports by default. [6.2.4] (PR #2060).
  • Python: Versionstamp comparisons didn’t work in Python 3. [6.2.4] (PR #2089).
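
A minimal Python sketch of the size-limit options and the new size estimate, assuming the Python binding at API version 620; the key names and limit values are arbitrary placeholders:

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    # Database-level default for all transactions created from this handle.
    db.options.set_transaction_size_limit(1000000)  # bytes

    @fdb.transactional
    def write_batch(tr, items):
        # Per-transaction override of the size limit.
        tr.options.set_size_limit(500000)
        for k, v in items:
            tr[k] = v
        # Estimate the commit size (mutations and conflict ranges) before committing.
        print('approximate size:', tr.get_approximate_size().wait(), 'bytes')

    write_batch(db, [(b'example-key-1', b'v1'), (b'example-key-2', b'v2')])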

Features

  • Added the cleanup command to fdbbackup which can be used to remove orphaned backups or DRs. [6.2.5] (PR #2170).
  • Added the ability to configure satellite_logs by satellite location. This will overwrite the region configuration of satellite_logs if both are present. [6.2.6] (PR #2241).

Other Changes

  • Added the primitives for FDB backups based on disk snapshots. This provides an ability to take a cluster level backup based on disk level snapshots of the storage, tlogs and coordinators. (PR #1733).
  • FoundationDB now uses the FlatBuffers serialization format for all network messages. (PR #1090).
  • Clients will throw transaction_too_old when attempting to read if setVersion was called with a version smaller than the smallest read version obtained from the cluster. This is a protection against reading from the wrong cluster in multi-cluster scenarios (see the sketch after this list). (PR #1413).
  • Trace files are now ordered lexicographically. This means that the filename format for trace files has changed. (PR #1828).
  • Improved TransactionMetrics log events by adding a random UID to distinguish multiple open connections, a flag to identify internal vs. client connections, and logging of rates and roughness in addition to total count for several metrics. (PR #1808).
  • FoundationDB can now be built with clang and libc++ on Linux. (PR #1666).
  • Added an experimental framework to run C and Java clients in the simulator. (PR #1678).
  • Added new network options for client buggify which will randomly throw expected exceptions in the client. This is intended to be used for client testing. (PR #1417).
  • Added --cache_memory parameter for fdbserver processes to control the amount of memory dedicated to caching pages read from disk. (PR #1889).
  • Added MakoWorkload, used as a benchmark to do performance testing of FDB. (PR #1586).
  • fdbserver now accepts a comma-separated list of public and listen addresses. (PR #1721).
  • CAUSAL_READ_RISKY has been enhanced to further reduce the chance of causally inconsistent reads. Existing users of CAUSAL_READ_RISKY may see increased GRV latency if proxies are distantly located from logs. (PR #1841).
  • CAUSAL_READ_RISKY can be turned on for all transactions using a database option. (PR #1841).
  • Added a no_wait option to the fdbcli exclude command to avoid blocking. (PR #1852).
  • Idle clusters will fsync much less frequently. (PR #1697).
  • CMake is now the official build system. The Makefile-based build system is deprecated.
  • The incompatible client list in status (cluster.incompatible_connections) may now spuriously include clients that use the multi-version API to try connecting to the cluster at multiple versions.
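
A sketch of the setVersion protection in Python, using a deliberately stale version number as a placeholder; reads through such a transaction now fail with transaction_too_old:

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    # Obtain a legitimate read version first, so this client has observed the cluster's versions.
    db.create_transaction().get_read_version().wait()

    tr = db.create_transaction()
    tr.set_read_version(1)  # far below any version this client has observed
    try:
        tr.get(b'example-key').wait()
    except fdb.FDBError as e:
        print('read rejected with error code', e.code)  # expected: transaction_too_old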

Fixes only impacting 6.2.0+

  • Clients could crash when closing connections with incompatible servers. [6.2.1] (PR #1976).
  • Do not close idle network connections with incompatible servers. [6.2.1] (PR #1976).
  • In status, max_protocol_clients were incorrectly added to the connected_clients list. [6.2.2] (PR #1990).
  • Ratekeeper ignores the (default 5 second) MVCC window when controlling on durability lag. [6.2.3] (PR #2012).
  • The macOS client was not compatible with a Linux server. [6.2.3] (PR #2045).
  • Incompatible clients would continually reconnect with coordinators. [6.2.3] (PR #2048).
  • Connections were being closed as idle when there were still unreliable requests waiting for a response. [6.2.3] (PR #2048).
  • The cluster controller would saturate its CPU for a few seconds when sending configuration information to all of the worker processes. [6.2.4] (PR #2086).
  • The data distributor would build all possible team combinations if it was tracking an unhealthy server with less than 10 teams. [6.2.4] (PR #2099).
  • The cluster controller could crash if a coordinator was unreachable when compiling cluster status. [6.2.4] (PR #2065).
  • A storage server could crash if it took longer than 10 minutes to fetch a key range from another server. [6.2.5] (PR #2170).
  • Excluding or including servers would restart the data distributor. [6.2.5] (PR #2170).
  • The data distributor could read invalid memory when estimating database size. [6.2.6] (PR #2225).
  • Status could incorrectly report that backup and DR were not sharing a mutation stream. [6.2.7] (PR #2274).