# Internal Dev Tools

## Code Probes
Code probes are a mechanism in FDB to prove that certain code paths are being tested under the right conditions. They differ from plain code coverage in multiple ways (explained below).
The general format of a code probe is:

```c++
CODE_PROBE(<condition>, "Comment", [annotations...]);
```
A simple example of a code probe could look as follows:

```c++
CODE_PROBE(self->forceRecovery, "Resolver detects forced recovery", probe::context::sim2);
```
At a very high level, the above code indicates that whenever this line is executed and `self->forceRecovery` is true, we ran into some interesting case. In addition, this probe is annotated with `probe::context::sim2`, which indicates that we expect this code to eventually be hit in simulation.
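As an illustration, here is a minimal sketch of how such a probe typically sits in surrounding code. The function, types, and control flow below are hypothetical and not from the FDB source; only the `CODE_PROBE` call itself follows the format shown above:

```c++
// Hypothetical resolver code -- only the CODE_PROBE line follows
// the documented format; everything else is made up for illustration.
void resolveBatch(Resolver* self, const ResolutionRequest& req) {
    // The probe fires (and is traced) only when this line executes
    // while self->forceRecovery is true.
    CODE_PROBE(self->forceRecovery, "Resolver detects forced recovery", probe::context::sim2);
    // ... normal resolution logic ...
}
```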
By default, FDB will simply write a trace line when this code is hit and the condition is true. If the code is never hit, the simulator will, at the end of the run, print the code probe with the `covered` field set to `false`. All of this happens in the context of a single simulation run (`fdbserver` doesn't have a concept of ensembles), and the information is written into the log file. `TestHarness` (see below) then uses this information to write code probe statistics for the ensemble to the Joshua cluster (if the test is run in Joshua).
We expect that ALL code probes will be hit in a nightly run. In the future we could potentially use this feature for other things, such as instructing the simulator to start an extensive search when one of these probes is hit.
In addition to `context` annotations, users can also define and pass assertions. For example:

```c++
CODE_PROBE(condition, "Some comment", assert::simOnly);
```
These add an assertion to the code. In addition, the simulator will not report missed code probes that assert they won't be hit in simulation.
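A hedged sketch combining both kinds of annotations (the condition and comment are made up, and the fully qualified `probe::assert::simOnly` spelling is an assumption; only `context::sim2` and `assert::simOnly` appear in this document):

```c++
// Hypothetical probe: expected to eventually be hit in simulation
// (context annotation), and asserted to never fire outside of
// simulation (assert annotation). If it is never hit in simulation,
// the simulator will not report it as missed.
CODE_PROBE(self->injectedFaultSeen,
           "Storage engine sees an injected fault",
           probe::context::sim2,
           probe::assert::simOnly);
```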
## Test Harness
`TestHarness` is our primary testing tool. It has multiple jobs:

- Running: It can run a test in Joshua.
- Statistics: It chooses which test to run based on how much CPU time previous runs (within the same ensemble) spent on each test. To make this possible, it writes statistics about the test at the end of each run.
- Reporting: After an ensemble has finished (or while it is still running), `TestHarness` can be used to generate a report in `xml` or `json`.
Test Harness can be found in the FDB source repository under `contrib/TestHarness2`. It has a weak dependency on Joshua: if Test Harness can find Joshua, it will report back about failed tests; otherwise it will just print general statistics about the ensemble. Joshua calls Test Harness as follows:

```sh
python3 -m test_harness.app -s ${JOSHUA_SEED} --old-binaries-path ${OLDBINDIR}
```

Here the seed is a random number generated by Joshua, and `OLDBINDIR` is a directory path where the old FDB binaries can be found (this is needed for restart tests). To retry a test, one can pass the previous Joshua seed, a directory path with exactly the same content as `OLDBINDIR`, plus the reported statistics to the Test Harness app. This will re-run the same code as before.
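As a sketch, a retry invocation could look like the following. The seed value and directory are placeholders, and only the flags shown above are used; the exact mechanism for passing the previous run's statistics can be looked up in `config.py` (see below):

```sh
# Re-run the same simulation as a previous ensemble run (sketch): pass
# the seed reported by that run and a directory whose contents match the
# original ${OLDBINDIR}. The statistics from the previous run must also
# be provided (see config.py for the corresponding option).
python3 -m test_harness.app -s 1234567 --old-binaries-path /opt/fdb/old-binaries
```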
To figure out which command line arguments `test_harness.app` (and `test_harness.results`) accept, one can check the contents of `contrib/TestHarness2/test_harness/config.py`.
### Reporting
After a Joshua ensemble has completed, `test_harness.results` can be used to generate a report on the ensemble. By default this includes a list of all failed tests (similar to `joshua tail --errors`, though in a more human-readable format). For a completed ensemble it will also print code probes that weren't hit often enough. An ensemble is considered successful if no simulation run completed with an error AND all code probes have been hit sufficiently often.
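As a sketch, `test_harness.results` is invoked as a module just like `test_harness.app` above. Depending on the environment it may require additional arguments; the available options (including how to select `xml` or `json` output) are defined in `config.py`:

```sh
# Generate a report for a Joshua ensemble (sketch; see
# contrib/TestHarness2/test_harness/config.py for the accepted options).
python3 -m test_harness.results
```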