Perpetual Storage Wiggle

This document covers the concept and usage of perpetual storage wiggle.

Summary

Perpetual storage wiggle is a feature that forces the data distributor to constantly build new storage teams while the cluster is healthy. At a high level, the process works as follows:

Order storage servers by their created time, from oldest to newest. For each storage server n:

  1. Exclude storage server n.
  2. Wait until all data has been moved off the storage server.
  3. Include storage server n.

Go back to step 1 to wiggle the next storage server.
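The loop above can be sketched as a small in-memory simulation. This is only an illustration of the ordering and the exclude/drain/include cycle, not how the data distributor is implemented; the StorageServer class, its fields, and the round-robin placement of shards are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    # Hypothetical model of a storage server, for illustration only.
    server_id: str
    created_time: int
    shards: list = field(default_factory=list)
    excluded: bool = False


def wiggle_once(servers):
    """Wiggle every server once, oldest first; return the order visited."""
    order = []
    for target in sorted(servers, key=lambda s: s.created_time):
        # Step 1: exclude the server.
        target.excluded = True
        others = [s for s in servers if s is not target and not s.excluded]
        if not others:
            raise RuntimeError("no other server can take the data")
        # Step 2: move all data off the server (round-robin is an assumption).
        for i, shard in enumerate(target.shards):
            others[i % len(others)].shards.append(shard)
        target.shards = []
        # Step 3: include the server again.
        target.excluded = False
        order.append(target.server_id)
    return order


servers = [
    StorageServer("ss-b", created_time=20, shards=["s3"]),
    StorageServer("ss-a", created_time=10, shards=["s1", "s2"]),
    StorageServer("ss-c", created_time=30),
]
order = wiggle_once(servers)
total_shards = sum(len(s.shards) for s in servers)
```

In the sketch, no data is lost and every server ends up included again; only the placement of shards changes.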

With perpetual wiggle, storage migrations have much less impact on the cluster. The wiggler determines whether the cluster is healthy based on the number of healthy teams, the available disk space, and the number of unhealthy relocations. When the cluster is unhealthy, the wiggle pauses until the cluster is healthy again.
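The pause decision can be pictured as a predicate over the three signals named above. The sketch below is an assumption-laden illustration: the ClusterHealth class, its field names, and the threshold values are hypothetical and do not reflect the actual knobs used by the data distributor.

```python
from dataclasses import dataclass


@dataclass
class ClusterHealth:
    # Hypothetical snapshot of the three signals; names are illustrative.
    healthy_teams: int
    required_healthy_teams: int
    free_disk_ratio: float  # fraction of disk space still available
    unhealthy_relocations: int


def should_pause_wiggle(h, min_free_disk_ratio=0.1, max_unhealthy_relocations=0):
    """Return True if any signal says the cluster is unhealthy.

    Both default thresholds are made-up values for the example.
    """
    return (
        h.healthy_teams < h.required_healthy_teams
        or h.free_disk_ratio < min_free_disk_ratio
        or h.unhealthy_relocations > max_unhealthy_relocations
    )


healthy = ClusterHealth(12, 10, 0.40, 0)
low_disk = ClusterHealth(12, 10, 0.05, 0)
few_teams = ClusterHealth(8, 10, 0.40, 0)
```

A wiggler built on this predicate would skip starting the next server while it returns True and resume once it returns False.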

Configuration

You can configure the Perpetual Storage Wiggle via the FDB command line interface.

Note that to have the Perpetual Storage Wiggle change the storage engine type, you must configure storage_migration_type=gradual.

Example commands

Enable perpetual storage wiggle on the cluster: configure perpetual_storage_wiggle=1.

Disable perpetual storage wiggle on the cluster: configure perpetual_storage_wiggle=0.

Enable perpetual storage wiggle only for processes matching the given locality key and value: configure perpetual_storage_wiggle=1 perpetual_storage_wiggle_locality=<LOCALITY_KEY>:<LOCALITY_VALUE>.

Disable the perpetual storage wiggle locality filter, so that all processes are wiggled: configure perpetual_storage_wiggle_locality=0.
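For scripted use, the commands above can be assembled as strings and passed to fdbcli through its --exec option. The helper names below are hypothetical, and the locality key and value in the usage lines are placeholder examples rather than values from this document.

```python
def wiggle_configure_command(enable, locality=None):
    """Build the configure command that turns the wiggle on or off.

    locality is an optional (key, value) pair for the locality filter.
    """
    parts = ["configure", f"perpetual_storage_wiggle={1 if enable else 0}"]
    if locality is not None:
        key, value = locality
        parts.append(f"perpetual_storage_wiggle_locality={key}:{value}")
    return " ".join(parts)


def clear_locality_filter_command():
    """Build the command that removes the locality filter."""
    return "configure perpetual_storage_wiggle_locality=0"


def fdbcli_argv(command, cluster_file=None):
    """Build an argument list suitable for subprocess.run()."""
    argv = ["fdbcli"]
    if cluster_file is not None:
        argv += ["-C", cluster_file]
    argv += ["--exec", command]
    return argv


# Placeholder locality key and value, for illustration only.
cmd = wiggle_configure_command(True, ("zoneid", "zone-1"))
argv = fdbcli_argv(cmd)
```

Passing argv to subprocess.run(argv, check=True) would execute the command against the default cluster file; the sketch stops short of that so it can be read and tested without a running cluster.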

Monitor

  • The status command reports the IP address of the storage server currently being wiggled.
  • The status json command in the FDB command line interface shows the current perpetual_storage_wiggle value. In addition, the cluster.storage_wiggler field reports storage wiggle details.
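A monitoring script could pull both pieces out of the status json output roughly as follows. This is a sketch under assumptions: it assumes perpetual_storage_wiggle appears under cluster.configuration, and the contents of storage_wiggler in the sample are placeholders, since this document does not list its subfields.

```python
import json


def wiggle_status(status_json_text):
    """Extract wiggle-related fields from `status json` output."""
    status = json.loads(status_json_text)
    cluster = status.get("cluster", {})
    return {
        # Assumed location of the configured value.
        "perpetual_storage_wiggle": cluster.get("configuration", {}).get(
            "perpetual_storage_wiggle"
        ),
        # Named in the text above as cluster.storage_wiggler.
        "storage_wiggler": cluster.get("storage_wiggler"),
    }


# Hand-written sample; the storage_wiggler payload is a placeholder.
sample = json.dumps(
    {
        "cluster": {
            "configuration": {"perpetual_storage_wiggle": 1},
            "storage_wiggler": {"placeholder_detail": "value"},
        }
    }
)
result = wiggle_status(sample)
```

Missing fields come back as None rather than raising, which keeps the helper usable against clusters where the wiggle has never been configured.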

Trace Events

The PerpetualStorageWiggleOpen event appears when you switch perpetual storage wiggle on, and PerpetualStorageWiggleClose appears when you turn it off.

The PerpetualStorageWiggleStart event means the wiggler has started wiggling one process. It contains the process ID of the process being wiggled and the current number of healthy teams. Two fields are worth noting: ExtraHealthyTeamCount, which indicates how many healthy teams are needed to restart a paused wiggler, and HealthyTeamCount. If ExtraHealthyTeamCount stays larger than the healthy team count, you may need to add more storage servers.

The PerpetualStorageWigglePause event appears when the wiggler pauses because it has detected that the cluster is unhealthy.

The PerpetualStorageWiggleFinish event indicates that the wiggle of the current process is done.

In the MovingData event, the PriorityStorageWiggle field shows how many relocations are in the queue because of storage wiggle.
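The ExtraHealthyTeamCount guidance can be turned into a simple log check. The sketch below assumes trace events are available as one JSON object per line with a Type field, and that ExtraHealthyTeamCount and HealthyTeamCount appear on PerpetualStorageWiggleStart events; the function name and the three-in-a-row threshold are hypothetical.

```python
import json


def needs_more_storage(trace_lines, min_consecutive=3):
    """Return True if ExtraHealthyTeamCount exceeds HealthyTeamCount
    in at least min_consecutive PerpetualStorageWiggleStart events in a row."""
    streak = 0
    for line in trace_lines:
        event = json.loads(line)
        if event.get("Type") != "PerpetualStorageWiggleStart":
            continue
        # int() accepts both numeric and string-encoded field values.
        extra = int(event["ExtraHealthyTeamCount"])
        healthy = int(event["HealthyTeamCount"])
        if extra > healthy:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0
    return False


def start_event(extra, healthy):
    # Helper that fabricates a sample event line for the example.
    return json.dumps(
        {
            "Type": "PerpetualStorageWiggleStart",
            "ExtraHealthyTeamCount": str(extra),
            "HealthyTeamCount": str(healthy),
        }
    )


short_of_teams = [start_event(5, 3)] * 3
recovering = [start_event(5, 3), start_event(2, 6), start_event(5, 3)]
```

Requiring several consecutive events avoids alerting on a single transient dip in healthy teams.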