Release Notes

6.1.13

  • Loading a 6.1 or newer fdb_c library as a secondary client using the multi-version client could lead to an infinite recursion when run with API versions older than 610. (PR #2169)
  • Using C API functions that were removed in 6.1 when using API version 610 or above now results in a compilation error. (PR #2169)
  • fdbrestore commands other than start required a default cluster file to be found but did not actually use it. (PR #1912)

6.1.12

Fixes

  • Fixed a thread safety issue while writing large keys or values. (Issue #1846)
  • An untracked data distributor could prevent a newly recruited data distributor from being started. (PR #1849)

6.1.11

Fixes

  • Machines which were added to a cluster immediately after the cluster was upgraded to 6.1 would not be given data. (PR #1764)

6.1.10

Performance

  • Improved the recovery speed of storage servers with a large amount of data. (PR #1700)

Fixes

  • The fdbrestore commands abort, wait, and status would use a default cluster file instead of the destination cluster file argument. (PR #1701)

6.1.9

Fixes

  • Sometimes a minority of coordinators would not converge to the leader. (PR #1649)
  • HTTP responses indicating a server-side error are no longer expected to contain a ResponseID header. (PR #1651)

6.1.8

Features

  • Improved replication mechanism using a new hierarchical technique that significantly reduces the frequency of data loss events even when multiple fault-tolerance zones permanently fail at the same time. After upgrading to 6.1, clusters will experience a low level of background data movement to store data in accordance with the new policy. (PR #964)
  • Added a background actor to remove redundant teams from team collection so that the healthy team number is guaranteed to not exceed the desired number. (PR #1139)
  • Get read version, read, and commit requests are counted and aggregated by server-side latency in configurable latency bands and output in JSON status. (PR #1084)
  • Added a configuration option to choose the log spilling implementation. (PR #1160)
  • Added a configuration option to choose the log system implementation. (PR #1160)
  • Batch priority transactions are now limited separately by ratekeeper and will be throttled at lower levels of cluster saturation. This makes it possible to run a more intense background load at saturation without significantly affecting normal priority transactions. It is still recommended not to run excessive loads at batch priority. (PR #1198)
  • Restore now requires the destination cluster to be specified explicitly to avoid confusion. (PR #1240)
  • Restore now accepts a timestamp that can be used to determine the restore version if the original cluster is available. (PR #1240)
  • Backup status and describe commands now have a --json output option. (PR #1248)
  • Separated data distribution from the master into its own role. (PR #1062)
  • Separated ratekeeper from the master into its own role. (PR #1176)
  • Added a CompareAndClear atomic op that clears a key if its value matches the supplied value. (PR #1105)
  • Added support for IPv6. (PR #1178)
  • FDB can now simultaneously listen to TLS and unencrypted ports to facilitate smoother migration to and from TLS. (PR #1157)
  • Added the DISABLE_POSIX_KERNEL_AIO knob to fall back to libeio instead of kernel async I/O (KAIO) on systems that do not support KAIO or the O_DIRECT flag. (PR #1283)
  • Added support for configuring the cluster to use the primary and remote DCs as satellites. (PR #1320)
  • Added support for restoring multiple key ranges in a single restore job. (PR #1190)
  • Deprecated the transaction option TRANSACTION_LOGGING_ENABLE. Added two new transaction options, DEBUG_TRANSACTION_IDENTIFIER and LOG_TRANSACTION, which set an identifier for the transaction and log the transaction to the trace file, respectively. (PR #1200)
  • Clients can now specify default transaction timeouts and retry limits for all transactions through a database option. (Issue #775)
  • The “timeout”, “max retry delay”, and “retry limit” transaction options are no longer reset when the transaction is reset after a call to onError (as of API version 610). (Issue #775)
  • Added the force_recovery_with_data_loss command to fdbcli. When a cluster is configured with usable_regions=2, this command will force the database to recover in the remote region. (PR #1168)
  • Added a limit to the number of status requests the cluster controller will handle. (PR #1093) (submitted by tclinken)
  • Added a coordinator process class. Processes with this class can only be used as a coordinator, and the coordinators auto command will prefer to choose processes of this class. (PR #1069) (submitted by tclinken)
  • The consistencycheck fdbserver role will check the entire database at most once every week. (PR #1126)
  • Added the metadata version key (\xff/metadataVersion). The value of this key is sent with every read version. It is intended to help clients cache rarely changing metadata. (PR #1213)
  • The fdbdr switch command verifies a dr_agent exists in both directions. (Issue #1220)
  • Transaction logs that cannot commit to disk for more than 5 seconds are marked as degraded. The cluster controller will prefer to recruit transaction logs on other processes before using degraded processes. (Issue #690)
  • The memory storage engine configuration now uses the ssd engine for transaction log spilling. Transaction log spilling only happens when the transaction logs are using too much memory, so using the memory storage engine for this purpose can cause the process to run out of memory. Existing clusters will NOT automatically change their configuration. (PR #1314)
  • Trace logs can be output as JSON instead of XML using the --trace_format command line option. (PR #976) (by atn34)
  • Added a modify command to fdbbackup for modifying parameters of a running backup. (PR #1237)
  • Added a header parameter to blobstore backup URLs for setting custom HTTP headers. (PR #1237)
  • Added the maintenance command to fdbcli. This command will stop data distribution from moving data away from processes with a specified zoneID. (PR #1397)
  • Added the three_data_hall_fallback configuration, which can be used to drop storage replicas in a dead data hall. [6.1.1] (PR #1422)
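
The CompareAndClear atomic op listed above clears a key only if its current value matches the supplied value. A minimal sketch of that behavior follows. It models the database as a plain Python dict, and the function name compare_and_clear and the in-memory store are assumptions made for illustration, not the bindings API.

```python
from typing import Dict


def compare_and_clear(store: Dict[bytes, bytes], key: bytes, expected: bytes) -> None:
    """Remove `key` from `store` only if its current value equals `expected`."""
    current = store.get(key)
    if current is not None and current == expected:
        del store[key]


# Hypothetical in-memory stand-in for a key-value store.
store = {b"counter": b"\x00", b"name": b"fdb"}

compare_and_clear(store, b"counter", b"\x00")  # value matches, key is cleared
compare_and_clear(store, b"name", b"other")    # value differs, key is kept
compare_and_clear(store, b"missing", b"\x00")  # absent key, nothing happens

print(sorted(store))
```

In the real database the comparison and the clear are applied atomically at commit time, which this single-threaded model does not attempt to show.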

Performance

  • Increased the get read version batch size in the client. This change reduces the load on the proxies when doing many transactions with only a few operations per transaction. (PR #1311)
  • Clients no longer attempt to connect to the master during recovery. (PR #1317)
  • Increased the rate at which deleted pages are made available for reuse in the SQLite storage engine. Renamed and added knobs to provide more control over this process. [6.1.3] (PR #1485)
  • SQLite page files now grow and shrink in chunks based on a knob which defaults to an effective chunk size of 100MB. [6.1.4] (PR #1482) (PR #1499)
  • Reduced the rate at which data is moved between servers, to reduce the impact a failure has on cluster performance. [6.1.4] (PR #1499)
  • Avoid closing saturated network connections which have not received ping packets. [6.1.7] (PR #1601)
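
To illustrate what growing a file "in chunks" means in the SQLite page file item above, here is a small sketch that rounds a required size up to a whole number of chunks. The function name, the reading of 100MB as 100 * 1024 * 1024 bytes, and the rounding rule are assumptions made for illustration. The notes do not describe the actual policy or name the knob.

```python
CHUNK_BYTES = 100 * 1024 * 1024  # assumed interpretation of the 100MB default


def chunked_file_size(needed_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Round needed_bytes up to the next whole multiple of chunk_bytes."""
    if needed_bytes <= 0:
        return 0
    chunks = -(-needed_bytes // chunk_bytes)  # ceiling division
    return chunks * chunk_bytes


# A file needing one byte more than a chunk occupies two chunks.
print(chunked_file_size(CHUNK_BYTES + 1) // CHUNK_BYTES)
```

Sizing in whole chunks means the file changes size less often, at the cost of some unused space at the end of the file.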

Fixes

  • Python: Creating a SingleFloat for the tuple layer didn’t work with integers. (PR #1216)
  • In some cases, calling OnError with a non-retryable error would partially reset a transaction. As of API version 610, the transaction is no longer reset in these cases and is instead put into an error state. (PR #1298)
  • Standardized datetime string format across all backup and restore command options and outputs. (PR #1248)
  • Read workload status metrics would disappear when a storage server was missing. (PR #1348)
  • The coordinators auto command could recruit multiple coordinators with the same zone ID. (Issue #988)
  • The data version of a cluster after a restore could have been lower than the restore version, causing versionstamp operations to produce smaller values than before the restore. (PR #1213)
  • Fixed a few thread safety issues with slow task profiling. (PR #1085)
  • Changing the class of a process would not change its preference for becoming the cluster controller. (PR #1350)
  • The Go bindings reported an incorrect required version when trying to load an incompatible fdb_c library. (PR #1053)
  • The include command in fdbcli would falsely include all machines with IP addresses that have the included IP address as a prefix (for example, include 1.0.0.1 would also include 1.0.0.10). (PR #1121)
  • Restore could crash when reading a file that ends on a block boundary (1MB default). (PR #1205)
  • Java: Successful commits and range reads no longer create FDBException objects, which avoids wasting resources and reduces memory pressure. (Issue #1235)
  • Windows: Fixed a crash when deleting files. (Issue #1380) (by KrzysFR)
  • Starting a restore on a tag already in-use would hang and the process would eventually run out of memory. (PR #1394)
  • The proxy_memory_limit_exceeded error was treated as retryable, but fdb_error_predicate returned that it is not retryable. (PR #1438)
  • Consistency check could report inaccurate shard size estimates if there were enough keys with large values and a small number of keys with small values. [6.1.3] (PR #1468)
  • Storage servers could not rejoin the cluster when the proxies were saturated. [6.1.4] (PR #1486) (PR #1499)
  • The configure command in fdbcli returned successfully even when the configuration was not changed for some error types. [6.1.4] (PR #1509)
  • Safety protections in the configure command in fdbcli would trigger spuriously when changing between three_datacenter replication and a region configuration. [6.1.4] (PR #1509)
  • Status could report an incorrect reason for ongoing data movement. [6.1.5] (PR #1544)
  • Storage servers were considered failed as soon as they were rebooted, instead of waiting to see if they rejoin the cluster. [6.1.8] (PR #1618)
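
The include fix above concerns string-prefix matching on IP addresses. The sketch below reproduces that class of bug and contrasts it with an exact comparison. Both function names and the address list are hypothetical, and this is not the fdbcli implementation.

```python
import ipaddress
from typing import List


def matches_by_prefix(included: str, addresses: List[str]) -> List[str]:
    """Buggy style of matching: any address whose text starts with `included`."""
    return [a for a in addresses if a.startswith(included)]


def matches_exact(included: str, addresses: List[str]) -> List[str]:
    """Intended matching: compare parsed IP addresses for equality."""
    target = ipaddress.ip_address(included)
    return [a for a in addresses if ipaddress.ip_address(a) == target]


addresses = ["1.0.0.1", "1.0.0.10", "1.0.0.100", "1.0.0.2"]

print(matches_by_prefix("1.0.0.1", addresses))  # also picks up 1.0.0.10 and 1.0.0.100
print(matches_exact("1.0.0.1", addresses))      # only 1.0.0.1
```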

Status

  • Report the number of connected coordinators for each client. This aids in monitoring client TLS support when enabling TLS on a live cluster. (PR #1222)
  • Degraded processes are reported in status json. (Issue #690)

Bindings

  • API version updated to 610. See the API version upgrade guide for upgrade details.
  • The API to create a database has been simplified across the bindings. All changes are backward compatible with previous API versions, with one exception in Java noted below. (PR #942)
  • C: FDBCluster objects and related methods (fdb_create_cluster, fdb_cluster_create_database, fdb_cluster_set_option, fdb_cluster_destroy, fdb_future_get_cluster) have been removed. (PR #942)
  • C: Added fdb_create_database that creates a new FDBDatabase object synchronously and removed fdb_future_get_database. (PR #942)
  • Python: Removed fdb.init, fdb.create_cluster, and fdb.Cluster. fdb.open no longer accepts a database_name parameter. (PR #942)
  • Java: Deprecated FDB.createCluster and Cluster. The preferred way to get a Database is by using FDB.open, which should work in both new and old API versions. (PR #942)
  • Java: Removed Cluster(long cPtr, Executor executor) constructor. This is API breaking for any code that has subclassed the Cluster class and is not protected by API versioning. (PR #942)
  • Java: Several methods relevant to read-only transactions have been moved into the ReadTransaction interface.
  • Java: Tuples now cache previous hash codes and equality checking no longer requires packing the underlying Tuples. (PR #1166)
  • Java: Tuple performance has been improved to use fewer allocations when packing and unpacking. (Issue #1206)
  • Java: Unpacking a Tuple with a byte array or string that is missing the end-of-string character now throws an error. (Issue #671)
  • Java: Unpacking a Tuple constrained to a subset of the underlying array now throws an error when it encounters a truncated integer. (Issue #672)
  • Ruby: Removed FDB.init, FDB.create_cluster, and FDB.Cluster. FDB.open no longer accepts a database_name parameter. (PR #942)
  • Golang: Deprecated fdb.StartNetwork, fdb.Open, fdb.MustOpen, and fdb.CreateCluster and added fdb.OpenDatabase and fdb.MustOpenDatabase. The preferred way to start the network and get a Database is by using fdb.OpenDatabase or fdb.OpenDefault. (PR #942)
  • Flow: Removed API::createCluster and Cluster and added API::createDatabase. The new way to get a Database is by using API::createDatabase. (PR #942) (PR #1215)
  • Flow: Changed DatabaseContext to Database, and API::createDatabase returns Reference<Database> instead of Reference<DatabaseContext>. (PR #1215)
  • Flow: Converted Transaction into an interface and moved its implementation into an internal class. Transactions should now be created using Database::createTransaction(db). (PR #1215)
  • Flow: Added ReadTransaction interface that allows only read operations on a transaction. The Transaction interface inherits from ReadTransaction and can be used when a ReadTransaction is required. (PR #1215)
  • Flow: Changed Transaction::setVersion to Transaction::setReadVersion. (PR #1215)
  • Flow: On update to this version of the Flow bindings, client code will fail to build due to the changes in the API, irrespective of the API version used. Client code must be updated to use the new bindings API. These changes affect the bindings only and won’t impact compatibility with different versions of the cluster. (PR #1215)
  • Golang: Added fdb.Printable to print a human-readable string for a given byte array. Added Key.String(), which converts the Key to a string using the Printable function. (PR #1010) (submitted by pjvds)
  • Golang: Tuples now support Versionstamp operations. (PR #1187) (submitted by ryanworl)
  • Python: Python signal handling didn’t work when waiting on a future. In particular, pressing Ctrl-C would not successfully interrupt the program. (PR #1138)
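
The Golang fdb.Printable item above describes turning a byte array into a human-readable string. The following Python sketch shows one way such a conversion can work. The escaping rules used here (printable ASCII kept as-is, backslash doubled, everything else written as \xNN) are an assumption for illustration and may differ from the Go implementation.

```python
def printable(data: bytes) -> str:
    """Render bytes as text, escaping backslashes and non-printable bytes."""
    parts = []
    for b in data:
        if b == 0x5C:            # backslash is doubled so the output stays unambiguous
            parts.append("\\\\")
        elif 32 <= b < 127:      # printable ASCII is kept unchanged
            parts.append(chr(b))
        else:                    # everything else becomes a \xNN escape
            parts.append("\\x%02x" % b)
    return "".join(parts)


print(printable(b"hello\x00\xffworld"))
```

A rendering like this is useful in logs and debugging output, where keys often contain tuple-layer type codes and other non-text bytes.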

Other Changes

  • Migrated to Boost 1.67. (PR #1242)
  • IPv4 address in trace log filename is no longer zero-padded. (PR #1157)
  • The process_behind error can now be thrown by clients and is treated as retryable. [6.1.1] (PR #1438)

Fixes only impacting 6.1.0+

  • The consistencycheck fdbserver role would repeatedly exit. [6.1.1] (PR #1437)
  • The consistencycheck fdbserver role could proceed at a very slow rate after inserting data into an empty database. [6.1.2] (PR #1452)
  • The background actor which removes redundant teams could leave data unbalanced. [6.1.3] (PR #1479)
  • The transaction log spill-by-reference policy could read too much data from disk. [6.1.5] (PR #1527)
  • Memory tracking trace events could cause the program to crash when called from inside a trace event. [6.1.5] (PR #1541)
  • TLogs will replace a large file with an empty file rather than doing a large truncate operation. [6.1.5] (PR #1545)
  • Fix PR #1545 to work on Windows and Linux. [6.1.6] (PR #1556)
  • Adding a read conflict range for the metadata version key no longer requires read access to the system keys. [6.1.6] (PR #1556)
  • The TLog’s disk queue files would grow indefinitely after a storage server was removed from the cluster. [6.1.8] (PR #1617)