utils.third_party package

Submodules

utils.third_party.ddp_functional_utils module

utils.third_party.ddp_functional_utils.broadcast(tensor, src, group=None)[source]

Broadcasts the tensor to the whole group.

tensor must have the same number of elements in all processes participating in the collective.

Parameters:
  • tensor (Tensor) – Data to be sent if src is the rank of the current process.

  • src (int) – Source rank.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Received tensor from the broadcast op.

Return type:

Tensor
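
Example (a minimal sketch; assumes torch.distributed has already been initialized with init_process_group and the snippet runs on every rank; ddp_f is just an import alias and the shapes are illustrative):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Every rank provides a tensor with the same number of elements;
    # after the call, every rank holds the values from rank 0.
    t = torch.arange(4.0) if dist.get_rank() == 0 else torch.zeros(4)
    result = ddp_f.broadcast(t, src=0)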

utils.third_party.ddp_functional_utils.gather(tensor, dst=0, group=None)[source]

Gathers a list of tensors in a single process.

Parameters:
  • tensor (Tensor) – Input tensor.

  • dst (int, optional) – Destination rank (default is 0).

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

List of appropriately-sized tensors with the gathered data.

Return type:

tuple[Tensor]
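
Example (a minimal sketch; assumes an initialized process group; ddp_f is an import alias, and only the destination rank ends up with the gathered data):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Each rank contributes one tensor; rank 0 collects one tensor per rank.
    t = torch.full((2,), float(dist.get_rank()))
    gathered = ddp_f.gather(t, dst=0)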

utils.third_party.ddp_functional_utils.scatter(tensors, src=0, group=None)[source]

Scatters a list of tensors to all processes in a group.

Each process receives exactly one tensor, returned by the call.

Parameters:
  • tensors (list[Tensor]) – List of tensors to scatter on the source rank. Receivers must pass None.

  • src (int, optional) – Source rank (default is 0).

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output tensor from the scatter operation.

Return type:

Tensor
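
Example (a minimal sketch; assumes an initialized process group; ddp_f is an import alias and the chunk shapes are illustrative):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Rank 0 supplies one tensor per rank; every other rank passes None.
    if dist.get_rank() == 0:
        chunks = [torch.full((2,), float(r)) for r in range(dist.get_world_size())]
    else:
        chunks = None
    mine = ddp_f.scatter(chunks, src=0)  # each rank receives its own chunk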

utils.third_party.ddp_functional_utils.reduce(tensor, dst, op=ReduceOp.SUM, group=None)[source]

Reduces the tensor data across all machines.

Only the process with rank dst is going to receive the final result.

Parameters:
  • tensor (Tensor) – Input of the collective.

  • dst (int) – Destination rank.

  • op (optional) – One of the values from the torch.distributed.ReduceOp enum. Specifies an operation used for element-wise reductions.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

Tensor
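
Example (a minimal sketch; assumes an initialized process group and that the default op performs an element-wise sum; ddp_f is an import alias):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Each rank contributes (rank + 1) * ones; rank 0 receives the
    # element-wise reduction over all ranks.
    t = torch.ones(3) * (dist.get_rank() + 1)
    summed = ddp_f.reduce(t, dst=0)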

utils.third_party.ddp_functional_utils.reduce_scatter(output, input_list, op=ReduceOp.SUM, group=None)[source]

Reduces, then scatters a list of tensors to all processes in a group.

Parameters:
  • output (Tensor) – Output tensor.

  • input_list (list[Tensor]) – List of tensors to reduce and scatter.

  • op (optional) – One of the values from the torch.distributed.ReduceOp enum. Specifies an operation used for element-wise reductions.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

Tensor
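
Example (a minimal sketch; assumes an initialized process group and that the default op performs an element-wise sum; ddp_f is an import alias and the shapes are illustrative):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    world_size = dist.get_world_size()
    # Each rank supplies one input tensor per rank; rank i receives the
    # reduction of every rank's i-th input tensor.
    input_list = [torch.ones(2) * (dist.get_rank() + 1) for _ in range(world_size)]
    output = torch.empty(2)
    result = ddp_f.reduce_scatter(output, input_list)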

utils.third_party.ddp_functional_utils.all_gather(tensor, group=None)[source]

Gathers tensors from the whole group in a list.

Parameters:
  • tensor (Tensor) – Tensor to be broadcast from the current process.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

tuple[Tensor]
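
Example (a minimal sketch; assumes an initialized process group; ddp_f is an import alias):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Every rank contributes one tensor and receives the full collection,
    # one tensor per rank, identical on every rank.
    t = torch.tensor([float(dist.get_rank())])
    gathered = ddp_f.all_gather(t)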

utils.third_party.ddp_functional_utils.all_to_all(output_tensor_list, input_tensor_list, group=None)[source]

Each process scatters a list of input tensors to all processes in the group and returns the gathered list of tensors in the output list.

Parameters:
  • output_tensor_list (list[Tensor]) – List of tensors to gather, one per rank.

  • input_tensor_list (list[Tensor]) – List of tensors to scatter, one per rank.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

tuple[Tensor]
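
Example (a minimal sketch; assumes an initialized process group; ddp_f is an import alias and the one-element tensors are illustrative):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    world_size = dist.get_world_size()
    # Rank r sends its i-th input tensor to rank i and receives one tensor
    # from every rank in return.
    inputs = [torch.full((1,), float(dist.get_rank() * world_size + i))
              for i in range(world_size)]
    outputs = [torch.empty(1) for _ in range(world_size)]
    received = ddp_f.all_to_all(outputs, inputs)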

utils.third_party.ddp_functional_utils.all_to_all_single(output, input, output_split_sizes=None, input_split_sizes=None, group=None)[source]

Each process splits the input tensor and scatters the resulting list to all processes in the group, then concatenates the tensors received from all processes into a single output tensor.

Parameters:
  • output (Tensor) – Gathered concatenated output tensor.

  • input (Tensor) – Input tensor to scatter.

  • output_split_sizes (list[Int], optional) – Output split sizes for dim 0. If None or empty, dim 0 of the output tensor must divide equally by world_size.

  • input_split_sizes (list[Int], optional) – Input split sizes for dim 0. If None or empty, dim 0 of the input tensor must divide equally by world_size.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

Tensor
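
Example (a minimal sketch; assumes an initialized process group and no split sizes, so dim 0 is split evenly across ranks; ddp_f is an import alias and the shapes are illustrative):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    world_size = dist.get_world_size()
    # Chunk i of the input goes to rank i; the chunks received from all
    # ranks are concatenated into the output tensor.
    inp = torch.arange(2 * world_size, dtype=torch.float32)
    out = torch.empty(2 * world_size)
    result = ddp_f.all_to_all_single(out, inp)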

utils.third_party.ddp_functional_utils.all_reduce(tensor, op=ReduceOp.SUM, group=None)[source]

Reduces the tensor data across all machines in such a way that all get the final result.

After the call the returned tensor is going to be bitwise identical in all processes.

Parameters:
  • tensor (Tensor) – Input of the collective.

  • op (optional) – One of the values from the torch.distributed.ReduceOp enum. Specifies an operation used for element-wise reductions.

  • group (ProcessGroup, optional) – The process group to work on.

Returns:

Output of the collective.

Return type:

Tensor
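
Example (a minimal sketch; assumes an initialized process group and that the default op performs an element-wise sum; ddp_f is an import alias):

    import torch
    import torch.distributed as dist
    from utils.third_party import ddp_functional_utils as ddp_f

    # Each rank contributes (rank + 1) * ones; every rank receives the
    # same element-wise sum over all ranks.
    t = torch.ones(3) * (dist.get_rank() + 1)
    total = ddp_f.all_reduce(t)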

Module contents