Contributing#
We welcome contributions from anyone who wishes to improve pfl for research.
The framework is maintained in the pfl-research Git repository, and anything related to benchmarking and new research also goes into pfl-research.
Below are guidelines on how to make contributions to pfl that follow our best practices.
See the open issues for community-reported bugs and beginner-friendly tasks.
Setting up development environment#
The main branch always contains the latest release of pfl.
Any new features and improvements should be based off the development branch, called develop.
If you would like to contribute to pfl, a first step is to set up your development environment by following these steps:
1. Create a fork of the pfl-research repo using the GitHub UI: go to the pfl-research Git repository and click on Fork. Do not keep your branches in the main pfl-research repo.
2. Clone the forked repository to your local computer.
3. Execute these commands:
# Add apple/pfl-research as an upstream
git remote add upstream https://github.com/apple/pfl-research
git fetch upstream
# Checkout a new feature branch, called 'new-awesome-feature', based off "develop" from upstream
git checkout upstream/develop -b new-awesome-feature
Prerequisite - you need to have Poetry installed:
curl -sSL https://install.python-poetry.org | python3 -
Prerequisite - if you don’t have the correct Python version installed, as constrained in pyproject.toml, install it with conda. You only need to have this environment activated the first time you install the poetry environment.
conda create -n py310 python=3.10
conda activate py310
Install environment:
# If you have the new Python 3.10 environment active, it should be cloned.
poetry env use `which python`
# Install with whatever extras and dev groups you need for development.
poetry install -E pytorch -E tf -E trees --with dev,docs
# Activate environment
poetry shell
This will install pfl in a reproducible environment identical to that of everyone working on pfl, including our CI build pipelines.
Compiling the documentation#
If you have improvements for the documentation, or if you add things that require a change in the documentation, we look forward to finding this in a pull request to the develop branch.
The documentation is built with Sphinx.
To build the documentation, you need to install pfl with at minimum the docs dependency group:
poetry install --only docs
Once all required packages are installed, building is easy:
make docs
Or use the underlying poetry command:
poetry run make -C docs html
Any PR that adds a public interface should have a docstring with Sphinx-style docstring formatting and be imported into the reference documentation at doc/source/reference.
Contributing to code#
Development process#
We have a few rules with regards to the process of developing a new feature in pfl.
We follow semantic versioning.
This means that we do not accept any new changes into pfl’s current major version that cause a breaking change to the public API.
See code structure for information about what parts of pfl are classified as public API.
We don’t have an RFC process in place yet for external contributions. If you wish to implement a new feature that will require at least a minor version bump according to semantic versioning, we suggest that you first open an issue explaining the why, what and how, to get some initial feedback.
There are 2 active branches in the main repository (and 1 additional when preparing to release):
- main - This is the current released version of pfl, which is also available on PyPI. The only branches that should be merged into this branch are release branches.
- develop - This is the working branch to base from when developing a new feature. PRs should be directed toward this branch as long as they are compatible with the current major version of pfl.
- release-x.y.z - These branches are made when we prepare for a release, and are merge-committed into main.
Feature PRs merged into develop and release-x.y.z should be squash-merged. Merging release-x.y.z → main and main → develop should be regular merge commits so that the history is kept.
Standardizing the code#
We like to keep the quality of the code high and reduce the ambiguity in how it should be formatted. For that, we use yapf, ruff and mypy.
Following Google’s Python styleguide is preferred, but it is acceptable to deviate from it at your own discretion, because we do not strictly enforce it.
Unlike Google’s styleguide, we use Sphinx-style docstring formatting.
Do not include :type and :rtype, since that is replaced by type hints.
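As an illustration, here is a docstring in the expected Sphinx style, with type hints carrying the type information instead of :type and :rtype fields. The function and its parameters are hypothetical, not part of pfl:

```python
import math


def clip_norm(vector: list[float], max_norm: float) -> list[float]:
    """
    Scale ``vector`` down so that its L2 norm does not exceed ``max_norm``.

    :param vector:
        The values to clip.
    :param max_norm:
        The maximum allowed L2 norm of the returned vector.
    :returns:
        The input values, rescaled if their norm exceeded ``max_norm``.
    """
    # Note: no :type: or :rtype: fields; the annotations carry that info.
    norm = math.sqrt(sum(v * v for v in vector))
    if norm <= max_norm:
        return list(vector)
    return [v * max_norm / norm for v in vector]
```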
For consistency, code should be formatted with yapf of the version specified in pyproject.toml.
yapf automates a subset of the rules mentioned in the styleguide.
To run yapf, integrate it in your IDE or manually run the following command in the root directory of pfl-research
:
make yapf
Or use the underlying poetry command:
poetry run yapf -i --recursive --parallel pfl/ tests/
pfl uses type hints to ensure the design is coherent and to provide valuable documentation about parameters of interfaces.
We use mypy for the type checking itself.
poetry run mypy
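For example (a hypothetical function, not a pfl API), mypy uses annotations like these to catch type mismatches before the code ever runs:

```python
def average_weight(weights: list[float]) -> float:
    # mypy checks every call site against this signature; e.g.
    # average_weight("oops") would be reported as an incompatible
    # argument type, without executing any code.
    return sum(weights) / len(weights)
```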
We also use ruff for consistency.
The settings to use for linting pfl are included in pyproject.toml in the root directory.
You can integrate ruff with your IDE or manually run the following command to check for linting issues in all of pfl:
poetry run ruff check --diff pfl/ tests/
In summary, the checks your branch needs to pass before the changes can be contributed are pytest, yapf, mypy, ruff and a successful documentation build. Don’t worry if you miss any of these: there is a CI build pipeline for each check that will run when you submit your PR.
Testing#
We like making sure that your code works and keeps working even if we make changes. Therefore, we have unit tests for anything that is reasonably unit-testable. To be merged, the code in the pull request should pass all tests. To run all tests locally, install pytest and run it in the root dir:
make test
Or use the underlying poetry command:
poetry run pytest -svx
To test all the different environments pfl is expected to be compatible with, install tox and run tox in the root dir.
Tests are all located in the tests directory.
Larger functional/integration tests are located in tests/integration and are currently only required for features related to multi-worker and multi-process simulations.
New unit tests should be placed in a file in tests which mirrors the path of the feature in the pfl module.
The class and method names should also be mirrored. Example:
# pfl/model/my_model.py
class MyModel:
    def my_method(self):
        # ... implementation

# tests/model/test_my_model.py
class TestMyModel:
    def test_my_method(self):
        # ... test implementation
Try to re-use existing mocked components from the conftest.py files before you decide to create a new mocked component.
If your test will only be able to run on macOS (currently, CI build pipelines do not support macOS), use the decorator @pytest.mark.macos.
The test will then only be enabled when run with:
poetry run pytest --macos
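A minimal sketch of how the marker is applied (the test name and body are illustrative placeholders):

```python
import pytest


@pytest.mark.macos
def test_macos_only_behavior():
    # Placeholder body; with the marker above, this test is skipped
    # unless pytest is invoked with the --macos flag.
    assert True
```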
Package dependencies#
It is not trivial to manage a package that supports use cases with TF, PyTorch, or neither of these packages. We keep the dependencies of the different deep learning frameworks separate with install extras; see pyproject.toml. If your feature has a new dependency that is only relevant in combination with a particular deep learning framework, then put the dependency in the appropriate install extra. In pyproject.toml, constrain the new dependency from the earliest known minor version, up to but not including the next major version. Thereafter, update the lock file to update the pinned versions for local development and CI:
poetry lock --no-update
This will ensure both developers and CI have reproducible builds.
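For illustration, a sketch of what this could look like in pyproject.toml. The package name and versions are made up; only the shape of the constraint and the extras mapping is the point:

```toml
[tool.poetry.dependencies]
# Hypothetical dependency, only needed together with PyTorch.
# Constrained from the earliest known minor version, up to but not
# including the next major version.
some-pytorch-helper = { version = ">=1.4,<2.0", optional = true }

[tool.poetry.extras]
pytorch = ["torch", "some-pytorch-helper"]
```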
Code structure#
Note
When implementing components for your own research without the plan to contribute back to pfl, you can of course import e.g. torch anywhere you want and don’t have to adhere to these rules.
Everything in pfl.internal is not considered a part of the public API; everything else in the pfl module is.
This means that you are free to make breaking interface changes if the code is located in pfl.internal, but not to other components, because of semantic versioning.
Users should be able to run pfl with 0 or 1 deep learning frameworks without the need to install all deep learning frameworks.
For this we keep the TF and PyTorch code encapsulated in a few modules, which should not be automatically imported into __init__.py files.
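One way to honor this rule is to resolve framework modules only on demand. A minimal sketch, where lazy_import is a hypothetical helper (not a pfl API) and the module path in the final comment mirrors the real layout:

```python
import importlib


def lazy_import(module_name: str):
    # Import only when actually requested, so users who have not
    # installed a given deep learning framework never trigger its
    # import (or the ImportError it would raise).
    return importlib.import_module(module_name)


# A PyTorch user would then resolve the framework module explicitly, e.g.:
# pytorch_ops = lazy_import('pfl.internal.ops.pytorch_ops')
```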
Code for each deep learning framework should be encapsulated in 1 dataset module, 1 model module, 1 ops module and 1 bridge module.
Note
For example, all PyTorch code is encapsulated in:
- pfl.data.pytorch - PyTorch native dataset support.
- pfl.model.pytorch - PyTorch models.
- pfl.internal.ops.pytorch_ops - low-level PyTorch-specific operations. This module does not depend on any other components in pfl.
- pfl.internal.bridge.pytorch - concrete PyTorch implementations of interfaces that are used to inject framework-specific code into algorithms. pfl primitives such as Metrics or TrainingStatistics may be used here.
To inject code from the particular deep learning framework currently in use, put it in the ops module if it is a single function that has no other dependencies, or make a new interface in pfl.internal.bridge if it is a collection of functions that belong together to make a certain feature work and is optional to implement for any one deep learning framework.
How to use the current selected ops module:
from pfl.internal.ops.selector import get_default_framework_module as ops
ops().get_shape(tensor)
Making a pull request#
Once the code is standardized and tests are implemented in the tests directory, you are ready to open a pull request.
For that you just need to push the files you modified to the branch of your forked repository:
git add files-you-changed
git commit -m 'Message explaining your changes'
git push origin new-awesome-feature
Finally, you can open a pull request from the GitHub UI: go to https://github.com/apple/pfl-research/compare and click on Create pull request. Then, make sure that you reference any open issues that the PR solves and leave an informative message about the changes made in the new-awesome-feature branch.
The next step is to wait for the CI builds to pass (progress is shown at the bottom of the PR). Contributors with admin status may have to kickstart the CI for you.
Checklist for a PR to pass review
This is a checklist for the reviewer to go through in addition to reviewing the code quality. The contributor should take this checklist into account to ensure a smooth and quick process for getting the PR merged.
- Corresponding documentation should also be updated: reference docstrings and tutorials.
- Unit tests should have good coverage: positive test (should pass), negative test (should fail), test for numerical stability if relevant, statistical test if relevant.
- If relevant, add a test case to any integration test this PR can affect (in tests/integration).
- Check if the pfl-research repository should be updated along with this PR.
- Pass all PR builds (pytest, yapf, ruff, mypy, build wheel, build docs). This is run by a pfl admin on the contributor’s behalf.
- A small description of the change is included in CHANGELOG.md if it is relevant to notify users about it.