Algorithms#

Abstract base classes#

Algorithms derived from the base class FederatedAlgorithm implement the local training of models and the processing of central model updates. These two parts of the end-to-end training loop are what define the behaviour of a specific federated algorithm; the remaining parts belong to the private federated learning framework itself and do not change with the training algorithm.

class pfl.algorithm.base.FederatedAlgorithm#

Base class for federated algorithms.

Federated algorithms consist of a computation on multiple clients and a computation on the server that processes the results from the clients. This class is where all pieces of computation can be implemented. It enforces a specific structure so that it is possible to switch between simulation and live training with minimal changes.

A subclass of FederatedAlgorithm should provide process_aggregated_statistics, which is the server-side part of the computation. In addition, to run simulations with the client-side computation performed locally, simulate_one_user can implement that computation in the same class. Finally, the run method performs the orchestration.

The object connecting the server-side and the client-side computation is the backend, passed into run. In simulation, this may be a SimulatedBackend, which calls simulate_one_user to perform the simulation. In live training, the backend will call out to real devices.

When subclassing, new parameters can go either into the constructor or into a subclass of AlgorithmHyperParams. As a rule of thumb, put them in an AlgorithmHyperParams subclass if the new algorithm should itself be suitable for subclassing.
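For orientation, below is a minimal sketch of such a subclass. The parameter class, its scale field and the local_train helper are hypothetical, and apply_model_update is assumed from pfl's StatefulModel API:

    from dataclasses import dataclass

    from pfl.algorithm.base import AlgorithmHyperParams, FederatedAlgorithm


    @dataclass
    class MyAlgorithmParams(AlgorithmHyperParams):
        # Hypothetical hyperparameter of the new algorithm.
        scale: float = 1.0


    class MyAlgorithm(FederatedAlgorithm):

        def process_aggregated_statistics(self, central_context,
                                          aggregate_metrics, model,
                                          statistics):
            # Server side: consume the aggregated statistics and update
            # the model (apply_model_update is assumed from StatefulModel).
            model, update_metrics = model.apply_model_update(statistics)
            return model, update_metrics

        def simulate_one_user(self, model, user_dataset, central_context):
            # Client side (simulation only): train on one user's dataset
            # and return statistics in a form that can be summed over
            # users. local_train is a hypothetical helper.
            statistics, metrics = local_train(model, user_dataset)
            return statistics, metrics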

save(dir_path)#

Save the state of the object to disk. The saved state should be interpretable as a checkpoint that can be restored with load.

Parameters:

dir_path (str) – Directory on disk to store state.

Return type:

None

load(dir_path)#

Load checkpoint from disk, which is the state previously saved with save.

Parameters:

dir_path (str) – Path to the root directory from which the checkpoint can be loaded. Should be the same path as was used with save.

Return type:

None
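For example, checkpointing an algorithm between runs might look as follows; the directory path is illustrative, and FederatedAveraging (documented below) is assumed to take no constructor arguments:

    from pfl.algorithm.federated_averaging import FederatedAveraging

    algorithm = FederatedAveraging()
    # ... run training ...
    algorithm.save('/tmp/checkpoints/iteration_100')

    # Later, e.g. in a new process, restore the saved state:
    algorithm = FederatedAveraging()
    algorithm.load('/tmp/checkpoints/iteration_100')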

abstract process_aggregated_statistics(central_context, aggregate_metrics, model, statistics)#

The server-side part of the computation.

This should process aggregated model statistics and use them to update a model. If the algorithm performs multiple central training iterations at once, which can happen when the algorithm returns multiple contexts for training from get_next_central_contexts, then this method is called once for each combination of central context and statistics.

Parameters:
  • central_context (CentralContext[AlgorithmHyperParamsType, ModelHyperParamsType]) – Settings used to gather the metrics and statistics also given as input.

  • aggregate_metrics (Metrics) – A Metrics object with aggregated metrics accumulated from local training on users.

  • model (ModelType) – The model in its state before the aggregate statistics were processed.

  • statistics (StatisticsType) – Aggregated model statistics from a cohort of devices (simulated or real).

Return type:

Tuple[ModelType, Metrics]

Returns:

The updated model, and a metrics object with new metrics generated from this model update. Do not include any of the aggregate_metrics!

process_aggregated_statistics_from_all_contexts(stats_context_pairs, aggregate_metrics, model)#

Override this method if the algorithm you are developing produces aggregated statistics from multiple cohorts, using multiple central contexts, in each central iteration. Usually, an algorithm requires only one aggregate of statistics from one cohort per central iteration, hence by default this method simply forwards that single result to process_aggregated_statistics.

Parameters:
  • stats_context_pairs (Tuple[Tuple[CentralContext[AlgorithmHyperParamsType, ModelHyperParamsType], StatisticsType], ...]) – Tuple of pairs of a central context and its model statistics. Each set of model statistics was accumulated from a cohort that used the corresponding central context.

  • aggregate_metrics (Metrics) – A Metrics object with aggregated metrics accumulated from local training on users.

  • model (ModelType) – The model in its state before the aggregate statistics were processed.

Return type:

Tuple[ModelType, Metrics]

Returns:

The updated model, and a metrics object with new metrics generated from this model update. Do not include any of the aggregate_metrics!
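If an algorithm does use multiple cohorts per central iteration, an override might fold each cohort's aggregate into the model in turn. A rough sketch, assuming Metrics objects can be merged with the | operator:

    from pfl.algorithm.base import FederatedAlgorithm
    from pfl.metrics import Metrics


    class MultiCohortAlgorithm(FederatedAlgorithm):

        def process_aggregated_statistics_from_all_contexts(
                self, stats_context_pairs, aggregate_metrics, model):
            new_metrics = Metrics()
            # Apply each cohort's aggregate in turn; a real algorithm may
            # instead combine the statistics before applying them.
            for central_context, statistics in stats_context_pairs:
                model, metrics = self.process_aggregated_statistics(
                    central_context, aggregate_metrics, model, statistics)
                new_metrics = new_metrics | metrics
            return model, new_metrics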

simulate_one_user(model, user_dataset, central_context)#

Simulate the client-side part of the computation.

This should train a single user on its dataset (simulated as user_dataset) and return the model statistics. The statistics should be in a form that can be aggregated over users. For a cohort of multiple users, this method is called once per user.

Parameters:
  • model (ModelType) – The current state of the model.

  • user_dataset (AbstractDatasetType) – The data simulated for a single user.

  • central_context (CentralContext[AlgorithmHyperParamsType, ModelHyperParamsType]) – Settings to use for this round. Contains the model params used for training/evaluating a user’s model.

Return type:

Tuple[Optional[StatisticsType], Metrics]

Returns:

A tuple (statistics, metrics), with model statistics and metrics generated from the training of a single user. Both statistics and metrics will be aggregated and the aggregate will be passed to process_aggregated_statistics.
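For intuition, a toy client-side computation is sketched below. The statistic (a per-user sum) is purely illustrative; MappedVectorStatistics and Weighted are pfl's statistics container and weighted metric value, and the raw_data attribute and string metric keys are assumptions about pfl's Dataset and Metrics APIs:

    import numpy as np

    from pfl.metrics import Metrics, Weighted
    from pfl.stats import MappedVectorStatistics


    def toy_simulate_one_user(model, user_dataset, central_context):
        # Toy statistic: the sum of the user's first data tensor. A real
        # algorithm typically returns a model delta after local training.
        statistics = MappedVectorStatistics(
            {'sum': np.sum(user_dataset.raw_data[0], axis=0)})
        metrics = Metrics()
        metrics['num datapoints'] = Weighted.from_unweighted(
            len(user_dataset.raw_data[0]))
        return statistics, metrics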

run(algorithm_params, backend, model, model_train_params, model_eval_params=None, callbacks=None, *, send_metrics_to_platform=True)#

Orchestrate the federated computation.

Parameters:
  • backend (Backend) – The Backend that aggregates the contributions from individual users. This may be simulated (in which case the backend will call simulate_one_user), or it may perform live training (in which case your client code will be called).

  • model (ModelType) – The model to train.

  • callbacks (Optional[List[TrainingProcessCallback]]) – A list of callbacks for hooking into the training loop, potentially performing complementary actions to the model training, e.g. central evaluation or training parameter schemes.

  • send_metrics_to_platform (bool) – Allow the platform to process the aggregated metrics after each central iteration.

Return type:

ModelType

Returns:

The trained model. It may be the same object as given in the input, or it may be different.

class pfl.algorithm.base.NNAlgorithmParams(central_num_iterations, evaluation_frequency, train_cohort_size, val_cohort_size)#

Parameters for algorithms that involve training neural networks.

Parameters:
  • central_num_iterations (int) – Total number of central iterations.

  • evaluation_frequency (int) – Frequency with which the model will be evaluated (in terms of central iterations).

  • train_cohort_size (Union[HyperParam[int], int]) – Cohort size for training.

  • val_cohort_size (Optional[int]) – Cohort size for evaluation on validation users.
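As an end-to-end illustration, the sketch below runs federated averaging in simulation. SimulatedBackend, NNTrainHyperParams and NNEvalHyperParams are assumed from other parts of the pfl API, all numbers are illustrative, and the federated datasets and model are presumed to have been built elsewhere:

    from pfl.aggregate.simulate import SimulatedBackend
    from pfl.algorithm.base import NNAlgorithmParams
    from pfl.algorithm.federated_averaging import FederatedAveraging
    from pfl.hyperparam import NNEvalHyperParams, NNTrainHyperParams

    # train_federated_dataset, val_federated_dataset and model are assumed
    # to be constructed with pfl's data and model APIs.
    backend = SimulatedBackend(training_data=train_federated_dataset,
                               val_data=val_federated_dataset,
                               postprocessors=[])

    trained_model = FederatedAveraging().run(
        algorithm_params=NNAlgorithmParams(central_num_iterations=100,
                                           evaluation_frequency=10,
                                           train_cohort_size=200,
                                           val_cohort_size=100),
        backend=backend,
        model=model,
        model_train_params=NNTrainHyperParams(local_num_epochs=1,
                                              local_learning_rate=0.1,
                                              local_batch_size=32),
        model_eval_params=NNEvalHyperParams(local_batch_size=128),
        callbacks=[])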

class pfl.algorithm.base.FederatedNNAlgorithm#
simulate_one_user(model, user_dataset, central_context)#

If population is Population.TRAIN, trains one user and returns the difference between the model before and after training. Also evaluates the performance before and after training the user. Metrics with the postfix “after local training” measure the performance after training the user. If population is not Population.TRAIN, only evaluation is performed.

Return type:

Tuple[Optional[StatisticsType], Metrics]

class pfl.algorithm.base.PersonalizedNNAlgorithmParams(central_num_iterations, evaluation_frequency, train_cohort_size, val_cohort_size, val_split_fraction, min_train_size, min_val_size)#

Base parameter config for federated personalization algorithms.

Has the same parameters as NNAlgorithmParams, in addition to:

Parameters:
  • val_split_fraction (float) – Parameter to Dataset.split. Defines the fraction of data to use for training. A good starting value in many cases is 0.8, i.e. split into 80% training data and 20% validation data.

  • min_train_size (int) – Parameter to Dataset.split. Defines the minimum number of data samples for training. A good starting value in many cases is 1.

  • min_val_size (int) – Parameter to Dataset.split. Defines the minimum number of data samples for validation. A good starting value in many cases is 1.
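Using the starting values suggested above (the cohort sizes and iteration counts are illustrative):

    from pfl.algorithm.base import PersonalizedNNAlgorithmParams

    params = PersonalizedNNAlgorithmParams(
        central_num_iterations=100,
        evaluation_frequency=10,
        train_cohort_size=200,
        val_cohort_size=100,
        val_split_fraction=0.8,  # 80% local training / 20% validation
        min_train_size=1,
        min_val_size=1)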

class pfl.algorithm.base.PersonalizedNNAlgorithm#
simulate_one_user(model, user_dataset, central_context)#

Trains one user and returns the difference between the model before and after training. Also evaluates the performance before and after training the user. Metrics with the postfix “after local training” measure the performance after training the user. Unlike in FederatedNNAlgorithm, training happens whether or not the user is in the train population.

The performance after training on the val population is a measure of how well the model can personalize to one user.

Return type:

Tuple[Optional[StatisticsType], Metrics]

Federated learning#

Federated averaging#

class pfl.algorithm.federated_averaging.FederatedAveraging#

Defines the Federated Averaging algorithm by providing the implementation as hooks into the training process.

process_aggregated_statistics(central_context, aggregate_metrics, model, statistics)#

Average the statistics and update the model.

Return type:

Tuple[StatefulModelType, Metrics]

FedProx#

class pfl.algorithm.fedprox.FedProx#

FedProx algorithm, introduced in T. Li et al., Federated Optimization in Heterogeneous Networks (https://arxiv.org/pdf/1812.06127.pdf).

Adds a proximal term to the loss during local training, a soft constraint that keeps the local model from diverging too far from the current global model.

class pfl.algorithm.fedprox.FedProxParams(central_num_iterations, evaluation_frequency, train_cohort_size, val_cohort_size, mu)#

Parameters for FedProx algorithm.

Parameters:

mu (Union[HyperParam[float], float]) – Scales the additional loss term added by FedProx. Only values in [0, 1] make sense. A value of 0 means the additional loss term has no effect and the algorithm reduces to Federated Averaging.
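For example, with a fixed proximal term (the value 0.1 and the other numbers are illustrative):

    from pfl.algorithm.fedprox import FedProx, FedProxParams

    fedprox_params = FedProxParams(central_num_iterations=100,
                                   evaluation_frequency=10,
                                   train_cohort_size=200,
                                   val_cohort_size=100,
                                   mu=0.1)  # fixed weight in [0, 1]
    algorithm = FedProx()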

class pfl.algorithm.fedprox.AdaptMuOnMetricCallback(metric_name, adapt_frequency, initial_value=0.0, step_size=0.1, decrease_mu_after_consecutive_improvements=1)#

Adaptive scalar for the proximal term (mu) in the FedProx algorithm, as described in Appendix C.3.3 of T. Li et al., Federated Optimization in Heterogeneous Networks (https://arxiv.org/pdf/1812.06127.pdf).

Set an instance of this as mu in FedProxParams and add it to the algorithm’s list of callbacks to make mu adaptive.

Parameters:
  • metric_name (Union[str, StringMetricName]) – The metric name to use for adapting mu. Lower should be better.

  • adapt_frequency (int) –

    Adapt mu according to the rules once every this many central iterations.

    Note

    If you adapt based on a metric that is not reported every central iteration, make sure mu is adapted at the same frequency, so that the metric is available in the aggregated metrics.

  • initial_value (float) – Initial value for mu. Appendix C.3.3 suggests 0.0 for homogeneous federated datasets and 1.0 for heterogeneous federated datasets.

  • step_size (float) – How much to increase or decrease mu at each calibration step. T. Li et al. suggest 0.1.

  • decrease_mu_after_consecutive_improvements (int) – Decrease mu if the metric value has improved (decreased) this many times in a row.
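Putting this together, mu is made adaptive by passing the same instance both as mu and as a callback. The metric name and numbers below are illustrative; the metric must be one your setup actually reports, with lower being better:

    from pfl.algorithm.fedprox import (AdaptMuOnMetricCallback, FedProx,
                                       FedProxParams)

    adaptive_mu = AdaptMuOnMetricCallback(
        metric_name='Central val | loss',  # hypothetical metric name
        adapt_frequency=10,
        initial_value=0.0,  # 0.0: homogeneous data; 1.0: heterogeneous
        step_size=0.1)

    fedprox_params = FedProxParams(central_num_iterations=100,
                                   evaluation_frequency=10,
                                   train_cohort_size=200,
                                   val_cohort_size=100,
                                   mu=adaptive_mu)

    # The same instance must also be passed to run as a callback:
    # FedProx().run(..., callbacks=[adaptive_mu])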

value()#

The current state (inner value) of the hyperparameter.

Return type:

float

after_central_iteration(aggregate_metrics, model, *, central_iteration)#

Extends adaptive mu as a callback, so that mu can be calibrated after each central iteration.

Return type:

Tuple[bool, Metrics]

Meta-learning / personalisation#

class pfl.algorithm.reptile.Reptile#

Defines the Reptile algorithm by providing the implementation as hooks into the training process.

Expectation maximization GMM#

class pfl.algorithm.expectation_maximization_gmm.EMMGMMHyperParams(central_num_iterations, evaluation_frequency, val_cohort_size, compute_cohort_size, compute_new_num_components)#

Parameters for EM GMM algorithms.

Parameters:
  • central_num_iterations (int) – Total number of central iterations.

  • evaluation_frequency (int) – Frequency with which the model will be evaluated (in terms of central iterations).

  • val_cohort_size (Optional[int]) – Cohort size for evaluation on validation users.

  • compute_cohort_size (Callable[[int, int], int]) – Function (int, int) -> int that receives (iteration, num_components) and returns the desired cohort size for that iteration.

  • compute_new_num_components (Callable[[int, int, int], int]) – Function (int, int, int) -> int that computes the desired number of components. This is passed (iteration, num_iterations_since_last_mix_up, num_components). It is called after the iteration. The number of components that it returns must be at least num_components. If the return value is greater, components with the greatest weight will be split, so that they can differentiate themselves in the next round of training. If this parameter is set to None, then the number of components will stay constant. It is often easy to generate this function using make_compute_new_num_components().

pfl.algorithm.expectation_maximization_gmm.make_compute_new_num_components(num_initial_iterations, mix_up_interval, max_num_components=None, step_components=0, fraction_new_components=0.0)#

Make a function to compute the desired number of components to generate. This can be passed to EMMGMMHyperParams as compute_new_num_components.

It is often useful in training GMMs to increase the number of components incrementally during training (“mixing up”). This is done by splitting up components every few iterations. During the iterations in between, the components can settle.

Parameters:
  • num_initial_iterations (int) – The number of iterations to wait until mixing up starts.

  • mix_up_interval (int) – The number of iterations to wait between increasing the number of components.

  • max_num_components (Optional[int]) – The number of components after which no more components should be added. If None, then there is no limit.

  • step_components (int) – The number of components to add when mixing up. If both fraction_new_components and this are given, the max of the two results is used.

  • fraction_new_components (float) – The fraction of current components to add when mixing up. The actual number to add is rounded down. It is usually useful to also supply a non-zero step_components, so that mixing up happens even when the current number of components is small.

Return type:

Callable[[int, int, int], int]
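For example, the following configuration (numbers illustrative) lets the initial components settle for 5 iterations, then doubles the number of components every 2 iterations, adding at least one component per mix-up, up to a cap of 100:

    from pfl.algorithm.expectation_maximization_gmm import (
        EMMGMMHyperParams, make_compute_new_num_components)

    compute_new_num_components = make_compute_new_num_components(
        num_initial_iterations=5,
        mix_up_interval=2,
        max_num_components=100,
        step_components=1,
        fraction_new_components=1.0)  # add 100% of current components

    gmm_params = EMMGMMHyperParams(
        central_num_iterations=30,
        evaluation_frequency=1,
        val_cohort_size=100,
        compute_cohort_size=lambda iteration, num_components: 1000,
        compute_new_num_components=compute_new_num_components)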

class pfl.algorithm.expectation_maximization_gmm.ExpectationMaximizationGMM#

Train a GaussianMixtureModel with expectation–maximization. For private federated learning, this uses sufficient statistic perturbation.

process_aggregated_statistics(central_context, aggregate_metrics, model, statistics)#

The server-side part of the computation.

This should process aggregated model statistics and use them to update a model. If the algorithm performs multiple central training iterations at once, which can happen when the algorithm returns multiple contexts for training from get_next_central_contexts, then this method is called once for each combination of central context and statistics.

Parameters:
  • central_context (CentralContext) – Settings used to gather the metrics and statistics also given as input.

  • aggregate_metrics (Metrics) – A Metrics object with aggregated metrics accumulated from local training on users.

  • model (GaussianMixtureModel) – The model in its state before the aggregate statistics were processed.

  • statistics (MappedVectorStatistics) – Aggregated model statistics from a cohort of devices (simulated or real).

Return type:

Tuple[GaussianMixtureModel, Metrics]

Returns:

The updated model, and a metrics object with new metrics generated from this model update. Do not include any of the aggregate_metrics!

simulate_one_user(model, user_dataset, central_context)#

Simulate the client-side part of the computation.

This should train a single user on its dataset (simulated as user_dataset) and return the model statistics. The statistics should be in a form that can be aggregated over users. For a cohort of multiple users, this method is called once per user.

Parameters:
  • model (GaussianMixtureModel) – The current state of the model.

  • user_dataset (AbstractDatasetType) – The data simulated for a single user.

  • central_context (CentralContext) – Settings to use for this round. Contains the model params used for training/evaluating a user’s model.

Return type:

Tuple[Optional[MappedVectorStatistics], Metrics]

Returns:

A tuple (statistics, metrics), with model statistics and metrics generated from the training of a single user. Both statistics and metrics will be aggregated and the aggregate will be passed to process_aggregated_statistics.

Utils#

pfl.algorithm.algorithm_utils.run_train_eval(algorithm, backend, model, central_contexts)#

Run training/evaluation and gather aggregated model updates and metrics for multiple rounds given multiple contexts. Used in FederatedAlgorithm.run.

Parameters:
  • algorithm – The FederatedAlgorithm to use. Implements local training behaviour and central model update behaviour. Note that some algorithms are incompatible with some models.

  • backend (Backend) – The Backend to use to distribute training of individual users and aggregate the results. Can be used for simulation or live training with the infrastructure.

  • model (Model) – The model to train.

  • central_contexts (Tuple[CentralContext, ...]) – A tuple of multiple contexts. Aggregated results are gathered once for each context.

Return type:

List[Tuple[StatisticsType, Metrics]]

Returns:

Aggregated model updates and metrics.
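FederatedAlgorithm.run calls this internally; a rough sketch of manual use, assuming central_contexts were produced by the algorithm (e.g. via get_next_central_contexts) and that algorithm, backend and model are set up as for run:

    from pfl.algorithm.algorithm_utils import run_train_eval

    # One (statistics, metrics) pair is returned per central context.
    results = run_train_eval(algorithm, backend, model, central_contexts)
    for central_context, (statistics, metrics) in zip(
            central_contexts, results):
        model, update_metrics = algorithm.process_aggregated_statistics(
            central_context, metrics, model, statistics)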