Release Notes
6.1.13
Loading a 6.1 or newer fdb_c library as a secondary client using the multi-version client could lead to an infinite recursion when run with API versions older than 610. (PR #2169)
Using C API functions that were removed in 6.1 when using API version 610 or above now results in a compilation error. (PR #2169)
fdbrestore commands other than start required a default cluster file to be found but did not actually use it. (PR #1912)
6.1.12
Fixes
Fixed a thread safety issue while writing large keys or values. (Issue #1846)
An untracked data distributor could prevent a newly recruited data distributor from being started. (PR #1849)
6.1.11
Fixes
Machines which were added to a cluster immediately after the cluster was upgraded to 6.1 would not be given data. (PR #1764)
6.1.10
Performance
Improved the recovery speed of storage servers with large amounts of data. (PR #1700)
Fixes
The fdbrestore commands abort, wait, and status would use a default cluster file instead of the destination cluster file argument. (PR #1701)
6.1.9
Fixes
Sometimes a minority of coordinators would not converge to the leader. (PR #1649)
HTTP responses indicating a server-side error are no longer expected to contain a ResponseID header. (PR #1651)
6.1.8
Features
Improved replication mechanism using a new hierarchical technique that significantly reduces the frequency of data loss events even when multiple fault-tolerance zones permanently fail at the same time. After upgrading to 6.1, clusters will experience a low level of background data movement to store data in accordance with the new policy. (PR #964)
Added a background actor to remove redundant teams from team collection so that the healthy team number is guaranteed to not exceed the desired number. (PR #1139)
Get read version, read, and commit requests are counted and aggregated by server-side latency in configurable latency bands and output in JSON status. (PR #1084)
Added configuration option to choose log spilling implementation. (PR #1160)
Added configuration option to choose log system implementation. (PR #1160)
Batch priority transactions are now limited separately by ratekeeper and will be throttled at lower levels of cluster saturation. This makes it possible to run a more intense background load at saturation without significantly affecting normal priority transactions. It is still recommended not to run excessive loads at batch priority. (PR #1198)
Restore now requires the destination cluster to be specified explicitly to avoid confusion. (PR #1240)
Restore now accepts a timestamp that can be used to determine the restore version if the original cluster is available. (PR #1240)
Backup status and describe commands now have a --json output option. (PR #1248)
Separated data distribution from the master into its own role. (PR #1062)
Separated ratekeeper from the master into its own role. (PR #1176)
Added a CompareAndClear atomic op that clears a key if its value matches the supplied value. (PR #1105)
Added support for IPv6. (PR #1178)
FDB can now simultaneously listen to TLS and unencrypted ports to facilitate smoother migration to and from TLS. (PR #1157)
Added DISABLE_POSIX_KERNEL_AIO knob to fall back to libeio instead of kernel async I/O (KAIO) for systems that do not support KAIO or the O_DIRECT flag. (PR #1283)
Added support for configuring the cluster to use the primary and remote DCs as satellites. (PR #1320)
Added support for restoring multiple key ranges in a single restore job. (PR #1190)
Deprecated transaction option TRANSACTION_LOGGING_ENABLE. Added two new transaction options, DEBUG_TRANSACTION_IDENTIFIER and LOG_TRANSACTION, that set an identifier for the transaction and log the transaction to the trace file, respectively. (PR #1200)
Clients can now specify default transaction timeouts and retry limits for all transactions through a database option. (Issue #775)
The “timeout”, “max retry delay”, and “retry limit” transaction options are no longer reset when the transaction is reset after a call to onError (as of API version 610). (Issue #775)
Added the force_recovery_with_data_loss command to fdbcli. When a cluster is configured with usable_regions=2, this command will force the database to recover in the remote region. (PR #1168)
Added a limit to the number of status requests the cluster controller will handle. (PR #1093) (submitted by tclinken)
Added a coordinator process class. Processes with this class can only be used as a coordinator, and coordinators auto will prefer to choose processes of this class. (PR #1069) (submitted by tclinken)
The consistencycheck fdbserver role will check the entire database at most once every week. (PR #1126)
Added the metadata version key (\xff/metadataVersion). The value of this key is sent with every read version. It is intended to help clients cache rarely changing metadata. (PR #1213)
The fdbdr switch command verifies a dr_agent exists in both directions. (Issue #1220)
Transaction logs that cannot commit to disk for more than 5 seconds are marked as degraded. The cluster controller will prefer to recruit transaction logs on other processes before using degraded processes. (Issue #690)
The memory storage engine configuration now uses the ssd engine for transaction log spilling. Transaction log spilling only happens when the transaction logs are using too much memory, so using the memory storage engine for this purpose can cause the process to run out of memory. Existing clusters will NOT automatically change their configuration. (PR #1314)
Trace logs can be output as JSON instead of XML using the --trace_format command line option. (PR #976) (by atn34)
Added modify command to fdbbackup for modifying parameters of a running backup. (PR #1237)
Added header parameter to blobstore backup URLs for setting custom HTTP headers. (PR #1237)
Added the maintenance command to fdbcli. This command will stop data distribution from moving data away from processes with a specified zoneID. (PR #1397)
Added the three_data_hall_fallback configuration, which can be used to drop storage replicas in a dead data hall. [6.1.1] (PR #1422)
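The CompareAndClear atomic op listed among the features above clears a key only if its current value equals the supplied value. The following is a minimal sketch of that rule, modeled on a plain Python dict rather than a real database. The function name compare_and_clear and the dict-based store are illustrative assumptions, not the bindings' API, and the sketch does not model atomicity or transactions.

```python
def compare_and_clear(store: dict, key: bytes, expected: bytes) -> None:
    """Model of the compare-and-clear rule on an in-memory dict (hypothetical helper).

    The key is removed only when it exists and its value equals `expected`;
    otherwise the store is left unchanged.
    """
    if key in store and store[key] == expected:
        del store[key]


# Example: drop a counter key once its value has reached an all-zero encoding.
store = {b"counter": b"\x00", b"other": b"x"}
compare_and_clear(store, b"counter", b"\x01")  # value differs: key is kept
compare_and_clear(store, b"counter", b"\x00")  # value matches: key is cleared
```

One plausible use, under these assumptions, is garbage-collecting counters that have dropped back to zero without first reading them.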
Performance
Increased the get read version batch size in the client. This change reduces the load on the proxies when doing many transactions with only a few operations per transaction. (PR #1311)
Clients no longer attempt to connect to the master during recovery. (PR #1317)
Increased the rate at which deleted pages are made available for reuse in the SQLite storage engine. Renamed and added knobs to provide more control over this process. [6.1.3] (PR #1485)
SQLite page files now grow and shrink in chunks based on a knob which defaults to an effective chunk size of 100MB. [6.1.4] (PR #1482) (PR #1499)
Reduced the rate at which data is moved between servers, to reduce the impact a failure has on cluster performance. [6.1.4] (PR #1499)
Avoid closing saturated network connections which have not received ping packets. [6.1.7] (PR #1601)
Fixes
Python: Creating a SingleFloat for the tuple layer didn’t work with integers. (PR #1216)
In some cases, calling OnError with a non-retryable error would partially reset a transaction. As of API version 610, the transaction is no longer reset in these cases and is instead put into an error state. (PR #1298)
Standardized datetime string format across all backup and restore command options and outputs. (PR #1248)
Read workload status metrics would disappear when a storage server was missing. (PR #1348)
The coordinators auto command could recruit multiple coordinators with the same zone ID. (Issue #988)
The data version of a cluster after a restore could have been lower than the restore version, making versionstamp operations get smaller. (PR #1213)
Fixed a few thread safety issues with slow task profiling. (PR #1085)
Changing the class of a process would not change its preference for becoming the cluster controller. (PR #1350)
The Go bindings reported an incorrect required version when trying to load an incompatible fdb_c library. (PR #1053)
The include command in fdbcli would falsely include all machines with IP addresses that have the included IP address as a prefix (for example include 1.0.0.1 would also include 1.0.0.10). (PR #1121)
Restore could crash when reading a file that ends on a block boundary (1MB default). (PR #1205)
Java: Successful commits and range reads no longer create FDBException objects, which avoids wasting resources and reduces memory pressure. (Issue #1235)
Windows: Fixed a crash when deleting files. (Issue #1380) (by KrzysFR)
Starting a restore on a tag already in-use would hang and the process would eventually run out of memory. (PR #1394)
The proxy_memory_limit_exceeded error was treated as retryable, but fdb_error_predicate returned that it is not retryable. (PR #1438)
Consistency check could report inaccurate shard size estimates if there were enough keys with large values and a small number of keys with small values. [6.1.3] (PR #1468)
Storage servers could not rejoin the cluster when the proxies were saturated. [6.1.4] (PR #1486) (PR #1499)
The configure command in fdbcli returned successfully even when the configuration was not changed for some error types. [6.1.4] (PR #1509)
Safety protections in the configure command in fdbcli would trigger spuriously when changing between three_datacenter replication and a region configuration. [6.1.4] (PR #1509)
Status could report an incorrect reason for ongoing data movement. [6.1.5] (PR #1544)
Storage servers were considered failed as soon as they were rebooted, instead of waiting to see if they rejoin the cluster. [6.1.8] (PR #1618)
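The include fix listed above (PR #1121) concerns a class of bug where an address is compared as a string prefix instead of as a whole address. The sketch below contrasts the two approaches. It is an illustration only, not fdbcli's actual code, and it assumes bare IP addresses without ports. The function names are hypothetical.

```python
import ipaddress


def prefix_match(included: str, candidates: list) -> list:
    """Buggy style: treats the included address as a plain string prefix."""
    return [c for c in candidates if c.startswith(included)]


def exact_match(included: str, candidates: list) -> list:
    """Compares parsed addresses, so only the identical address matches."""
    target = ipaddress.ip_address(included)
    return [c for c in candidates if ipaddress.ip_address(c) == target]


machines = ["1.0.0.1", "1.0.0.10", "1.0.0.2"]
print(prefix_match("1.0.0.1", machines))  # also picks up 1.0.0.10
print(exact_match("1.0.0.1", machines))   # only 1.0.0.1
```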
Status
Report the number of connected coordinators for each client. This aids in monitoring client TLS support when enabling TLS on a live cluster. (PR #1222)
Degraded processes are reported in status json. (Issue #690)
Bindings
API version updated to 610. See the API version upgrade guide for upgrade details.
The API to create a database has been simplified across the bindings. All changes are backward compatible with previous API versions, with one exception in Java noted below. (PR #942)
C: FDBCluster objects and related methods (fdb_create_cluster, fdb_cluster_create_database, fdb_cluster_set_option, fdb_cluster_destroy, fdb_future_get_cluster) have been removed. (PR #942)
C: Added fdb_create_database that creates a new FDBDatabase object synchronously and removed fdb_future_get_database. (PR #942)
Python: Removed fdb.init, fdb.create_cluster, and fdb.Cluster. fdb.open no longer accepts a database_name parameter. (PR #942)
Java: Deprecated FDB.createCluster and Cluster. The preferred way to get a Database is by using FDB.open, which should work in both new and old API versions. (PR #942)
Java: Removed Cluster(long cPtr, Executor executor) constructor. This is API breaking for any code that has subclassed the Cluster class and is not protected by API versioning. (PR #942)
Java: Several methods relevant to read-only transactions have been moved into the ReadTransaction interface.
Java: Tuples now cache previous hash codes and equality checking no longer requires packing the underlying Tuples. (PR #1166)
Java: Tuple performance has been improved to use fewer allocations when packing and unpacking. (Issue #1206)
Java: Unpacking a Tuple with a byte array or string that is missing the end-of-string character now throws an error. (Issue #671)
Java: Unpacking a Tuple constrained to a subset of the underlying array now throws an error when it encounters a truncated integer. (Issue #672)
Ruby: Removed FDB.init, FDB.create_cluster, and FDB.Cluster. FDB.open no longer accepts a database_name parameter. (PR #942)
Golang: Deprecated fdb.StartNetwork, fdb.Open, fdb.MustOpen, and fdb.CreateCluster and added fdb.OpenDatabase and fdb.MustOpenDatabase. The preferred way to start the network and get a Database is by using FDB.OpenDatabase or FDB.OpenDefault. (PR #942)
Flow: Removed API::createCluster and Cluster and added API::createDatabase. The new way to get a Database is by using API::createDatabase. (PR #942) (PR #1215)
Flow: Changed DatabaseContext to Database, and API::createDatabase returns Reference<Database> instead of Reference<DatabaseContext>. (PR #1215)
Flow: Converted Transaction into an interface and moved its implementation into an internal class. Transactions should now be created using Database::createTransaction(db). (PR #1215)
Flow: Added ReadTransaction interface that allows only read operations on a transaction. The Transaction interface inherits from ReadTransaction and can be used when a ReadTransaction is required. (PR #1215)
Flow: Changed Transaction::setVersion to Transaction::setReadVersion. (PR #1215)
Flow: On update to this version of the Flow bindings, client code will fail to build due to the changes in the API, irrespective of the API version used. Client code must be updated to use the new bindings API. These changes affect the bindings only and won’t impact compatibility with different versions of the cluster. (PR #1215)
Golang: Added fdb.Printable to print a human-readable string for a given byte array. Added Key.String(), which converts the Key to a string using the Printable function. (PR #1010) (submitted by pjvds)
Golang: Tuples now support Versionstamp operations. (PR #1187) (submitted by ryanworl)
Python: Python signal handling didn’t work when waiting on a future. In particular, pressing Ctrl-C would not successfully interrupt the program. (PR #1138)
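The Java tuple change above (Issue #671) makes unpacking fail when a byte array or string lacks its end-of-string character. As a rough illustration of that check, the sketch below decodes a single byte-string element, assuming the tuple-layer encoding in which a byte string starts with type code 0x01, ends with 0x00, and stores an embedded 0x00 as the pair 0x00 0xFF. The function decode_byte_string is a hypothetical helper written for this note, not code from any binding.

```python
BYTES_CODE = 0x01  # assumed tuple-layer type code for a byte string


def decode_byte_string(packed: bytes, pos: int = 0):
    """Decode one byte-string element starting at pos; return (value, next_pos).

    Raises ValueError if the terminating 0x00 is missing.
    """
    if pos >= len(packed) or packed[pos] != BYTES_CODE:
        raise ValueError("expected a byte-string type code")
    out = bytearray()
    i = pos + 1
    while i < len(packed):
        b = packed[i]
        if b == 0x00:
            # 0x00 followed by 0xFF is an escaped NUL inside the value.
            if i + 1 < len(packed) and packed[i + 1] == 0xFF:
                out.append(0x00)
                i += 2
                continue
            # A bare 0x00 terminates the string.
            return bytes(out), i + 1
        out.append(b)
        i += 1
    # Ran off the end of the input without seeing a terminator.
    raise ValueError("byte string is missing its terminating 0x00")
```

With this sketch, a well-formed input such as b"\x01foo\x00" decodes to b"foo", while the truncated input b"\x01foo" raises an error instead of silently returning a partial value.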
Other Changes
Migrated to Boost 1.67. (PR #1242)
IPv4 address in trace log filename is no longer zero-padded. (PR #1157)
The process_behind error can now be thrown by clients and is treated as retryable. [6.1.1] (PR #1438)
Fixes only impacting 6.1.0+
The consistencycheck fdbserver role would repeatedly exit. [6.1.1] (PR #1437)
The consistencycheck fdbserver role could proceed at a very slow rate after inserting data into an empty database. [6.1.2] (PR #1452)
The background actor which removes redundant teams could leave data unbalanced. [6.1.3] (PR #1479)
The transaction log spill-by-reference policy could read too much data from disk. [6.1.5] (PR #1527)
Memory tracking trace events could cause the program to crash when called from inside a trace event. [6.1.5] (PR #1541)
TLogs will replace a large file with an empty file rather than doing a large truncate operation. [6.1.5] (PR #1545)
Fix PR #1545 to work on Windows and Linux. [6.1.6] (PR #1556)
Adding a read conflict range for the metadata version key no longer requires read access to the system keys. [6.1.6] (PR #1556)
The TLog’s disk queue files would grow indefinitely after a storage server was removed from the cluster. [6.1.8] (PR #1617)