data.transforms package

Subpackages

data.transforms.audio_aux package

Submodules

data.transforms.audio module

class data.transforms.audio.Gain(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements gain augmentation for audio.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.audio.Noise(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements ambient noise augmentation for audio.

__init__(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs) → None[source]

load_noise_files(cache_size: int) → List[TensorType][source]: This method caches a list of noise files for on-the-fly augmentation.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.audio.SetFixedLength(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Set the audio buffer to a fixed length.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.audio.Roll(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Perform a roll augmentation by shifting the window in a circular manner.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
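A circular shift moves samples that fall off one end of the window back onto the other end, so no audio is discarded. The following is a minimal sketch in plain Python; the helper name `roll_samples` and its `shift` parameter are illustrative and are not part of this package, which may operate on tensors and choose the shift randomly.

```python
from typing import List, Sequence


def roll_samples(samples: Sequence[float], shift: int) -> List[float]:
    """Circularly shift `samples` to the right by `shift` positions."""
    n = len(samples)
    if n == 0:
        return []
    k = shift % n  # normalise negative and oversized shifts
    # The last k samples wrap around to the front.
    return list(samples[n - k:]) + list(samples[:n - k])


rolled = roll_samples([0.1, 0.2, 0.3, 0.4, 0.5], shift=2)
```

A negative `shift` rolls to the left, and a shift equal to the buffer length leaves the buffer unchanged.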

class data.transforms.audio.MFCCs(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.audio.LambdaAudio(opts: Namespace, func: Callable[[Tensor], Tensor], *args, **kwargs)[source]

Bases: BaseTransformation

Similar to @torchvision.transforms.Lambda, applies a user-defined lambda on the audio samples as a transform.

__init__(opts: Namespace, func: Callable[[Tensor], Tensor], *args, **kwargs) → None[source]

class data.transforms.audio.AudioResample(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Resample audio to a specified framerate.

classmethod add_arguments(parser: ArgumentParser) → None[source]

__init__(opts: Namespace, *args, **kwargs) → None[source]

class data.transforms.audio.StandardizeChannels(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.audio_bytes module

class data.transforms.audio_bytes.TorchaudioSave(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Encode audio with a supported file encoding.

Parameters: opts – The global options.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.base_transforms module

class data.transforms.base_transforms.BaseTransformation(opts, *args, **kwargs)[source]

Bases: object

Base class for augmentation methods

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.common module

class data.transforms.common.Compose(opts, img_transforms: List, *args, **kwargs)[source]

Bases: BaseTransformation

This class applies a list of transforms sequentially.

__init__(opts, img_transforms: List, *args, **kwargs) → None[source]
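Sequential composition can be pictured as a loop that feeds the output of each transform into the next. The sketch below illustrates the idea only; `ComposeSketch`, `double`, and `add_one` are hypothetical names, and the real class also takes `opts`, which is omitted here.

```python
from typing import Any, Callable, Dict, List

Transform = Callable[[Dict[str, Any]], Dict[str, Any]]


class ComposeSketch:
    """Apply transforms one after another, feeding each output to the next."""

    def __init__(self, transforms: List[Transform]) -> None:
        self.transforms = transforms

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for transform in self.transforms:
            data = transform(data)
        return data


def double(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "value": data["value"] * 2}


def add_one(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "value": data["value"] + 1}


pipeline = ComposeSketch([double, add_one])
result = pipeline({"value": 3})  # (3 * 2) + 1
```

The order of the list matters: swapping `double` and `add_one` would compute (3 + 1) * 2 instead.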

data.transforms.image_bytes module

class data.transforms.image_bytes.PILSave(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Encode an image with a supported file encoding.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_bytes.ShuffleBytes(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Reorder the bytes in a 1-dimensional buffer.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_bytes.MaskPositions(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Mask out values in a 1-dimensional buffer using a fixed masking pattern.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_bytes.BytePermutation(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Remap byte values in [0, 255] to new values in [0, 255] using a permutation.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
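A byte permutation is a one-to-one lookup table over the 256 possible byte values. The sketch below shows one way to build and apply such a table with the standard library; the function names and the `seed` parameter are assumptions for illustration and do not describe how this class derives its permutation.

```python
import random
from typing import List


def make_byte_permutation(seed: int) -> List[int]:
    """Return a shuffled copy of 0..255; index = old value, entry = new value."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def permute_bytes(buffer: bytes, table: List[int]) -> bytes:
    """Replace every byte value b with table[b]."""
    return bytes(table[b] for b in buffer)


table = make_byte_permutation(seed=0)
encoded = permute_bytes(b"\x00\x01\xff\x01", table)
```

Because the table is a permutation, equal input bytes always map to equal output bytes and the mapping can be inverted.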

class data.transforms.image_bytes.RandomUniformNoise(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

Add random uniform noise to integer values.

__init__(opts: Namespace, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.image_pil module

class data.transforms.image_pil.FixedSizeCrop(opts, size: int | Tuple[int, int] | None = None, *args, **kwargs)[source]

Bases: BaseTransformation

__init__(opts, size: int | Tuple[int, int] | None = None, *args, **kwargs)[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.ScaleJitter(opts, *args, **kwargs)[source]

Bases: BaseTransformation

Randomly resizes the input within the scale range

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomResizedCrop(opts: Namespace, size: Sequence | int, *args, **kwargs)[source]

Bases: BaseTransformation, RandomResizedCrop

This class crops a random portion of an image and resizes it to a given size.

__init__(opts: Namespace, size: Sequence | int, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

get_rrc_params(image: Image) → Tuple[int, int, int, int][source]

class data.transforms.image_pil.AutoAugment(opts, *args, **kwargs)[source]

Bases: BaseTransformation, AutoAugment

This class implements the AutoAugment data augmentation method.

__init__(opts, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandAugment(opts, *args, **kwargs)[source]

Bases: BaseTransformation, RandAugment

This class implements the RandAugment data augmentation method.

__init__(opts, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.TrivialAugmentWide(opts, *args, **kwargs)[source]

Bases: BaseTransformation, TrivialAugmentWide

This class implements the TrivialAugment (Wide) data augmentation method.

__init__(opts, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomHorizontalFlip(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements random horizontal flipping.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomRotate(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements random rotation.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.Resize(opts, img_size: int | Tuple[int, int] | None = None, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements resizing operation.

Two resizing modes are supported:
1. Resize while maintaining the aspect ratio. To enable this option, pass an int as the size.
2. Resize to a fixed size. To enable this option, pass a tuple of height and width as the size.

Note

If img_size is passed as a positional argument, it overrides the size specified in the arguments.

__init__(opts, img_size: int | Tuple[int, int] | None = None, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
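The difference between the two modes is easiest to see from the output size they produce. The sketch below assumes that an int size is matched to the shorter image side, as in torchvision's Resize; that convention, and the helper name `target_size`, are assumptions rather than documented behavior of this class.

```python
from typing import Tuple, Union


def target_size(
    height: int, width: int, size: Union[int, Tuple[int, int]]
) -> Tuple[int, int]:
    """Return the (height, width) an image would have after resizing."""
    if isinstance(size, int):
        # Mode 1 (assumed): scale so the shorter side equals `size`,
        # which keeps the aspect ratio.
        scale = size / min(height, width)
        return round(height * scale), round(width * scale)
    # Mode 2: use the requested (height, width) as-is.
    return size[0], size[1]


keep_ratio = target_size(480, 640, 256)
fixed = target_size(480, 640, (224, 224))
```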

class data.transforms.image_pil.CenterCrop(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements center cropping method.

Note

This class assumes that the input size is greater than or equal to the desired size.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
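The note above matters because a center crop is defined by offsets that would become negative for an undersized input. A minimal sketch of the offset computation follows; `center_crop_box` is a hypothetical helper, and raising an error for small inputs is a choice made for this example.

```python
from typing import Tuple


def center_crop_box(
    img_h: int, img_w: int, crop_h: int, crop_w: int
) -> Tuple[int, int, int, int]:
    """Return (top, left, height, width) of a centered crop window."""
    if crop_h > img_h or crop_w > img_w:
        raise ValueError("input must be at least as large as the crop size")
    top = (img_h - crop_h) // 2
    left = (img_w - crop_w) // 2
    return top, left, crop_h, crop_w


box = center_crop_box(480, 640, 224, 224)
```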

class data.transforms.image_pil.SSDCroping(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements the cropping method for the single-shot object detector (SSD).

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.PhotometricDistort(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements photometric distortion.

Note

The hyper-parameters of PhotometricDistort differ between the PIL and OpenCV implementations, so values tuned for one may not transfer to the other.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.BoxPercentCoords(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class converts the box coordinates to percent

__init__(opts, *args, **kwargs) → None[source]
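One plausible reading of "percent" is that pixel coordinates are divided by the image width and height, giving values in [0, 1]. The sketch below illustrates that reading only; the helper name, the (x0, y0, x1, y1) box layout, and the [0, 1] range are assumptions, not confirmed behavior of this class.

```python
from typing import List, Sequence


def boxes_to_fractions(
    boxes: Sequence[Sequence[float]], img_h: int, img_w: int
) -> List[List[float]]:
    """Scale (x0, y0, x1, y1) pixel boxes to fractions of the image size."""
    return [
        [x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h]
        for x0, y0, x1, y1 in boxes
    ]


normalized = boxes_to_fractions([[64, 48, 320, 240]], img_h=480, img_w=640)
```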

class data.transforms.image_pil.InstanceProcessor(opts, instance_size: int | Tuple[int, ...] | None = 16, *args, **kwargs)[source]

Bases: BaseTransformation

This class processes the instance masks.

__init__(opts, instance_size: int | Tuple[int, ...] | None = 16, *args, **kwargs) → None[source]

class data.transforms.image_pil.RandomResize(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements random resizing method.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomShortSizeResize(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements random resizing such that the shortest side is between the specified minimum and maximum values.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomErasing(opts, *args, **kwargs)[source]

Bases: BaseTransformation, RandomErasing

This class randomly selects a region in a tensor and erases its pixels. See this paper for details.

__init__(opts, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomGaussianBlur(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class randomly blurs the input image.

__init__(opts, *args, **kwargs)[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomCrop(opts, size: Sequence | int, ignore_idx: int | None = 255, *args, **kwargs)[source]

Bases: BaseTransformation

This class randomly crops a region of an image.

Note

If the size of input image is smaller than the desired crop size, the input image is first resized while maintaining the aspect ratio and then cropping is performed.

__init__(opts, size: Sequence | int, ignore_idx: int | None = 255, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

static get_params(img_h, img_w, target_h, target_w)[source]

static get_params_from_box(boxes, img_h, img_w)[source]

get_params_from_mask(data, i, j, h, w)[source]
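The resize-then-crop behavior described in the note can be sketched as a single parameter computation. This is an illustration under stated assumptions: `resize_then_crop_params` is a hypothetical helper, and the choice to upscale by the smallest factor that covers the crop is a guess at one reasonable policy, not the package's exact rule.

```python
import random
from typing import Tuple


def resize_then_crop_params(
    img_h: int, img_w: int, crop_h: int, crop_w: int, rng: random.Random
) -> Tuple[int, int, int, int]:
    """Return (resized_h, resized_w, top, left) for a random crop."""
    # Upscale only when the image is smaller than the crop on some side;
    # a single scale factor keeps the aspect ratio.
    scale = max(crop_h / img_h, crop_w / img_w, 1.0)
    new_h = max(crop_h, round(img_h * scale))
    new_w = max(crop_w, round(img_w * scale))
    # Pick the top-left corner uniformly among positions that fit.
    top = rng.randint(0, new_h - crop_h)
    left = rng.randint(0, new_w - crop_w)
    return new_h, new_w, top, left


params = resize_then_crop_params(100, 300, 224, 224, random.Random(0))
```

Images that are already large enough keep their size, so only the crop offsets are random.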

class data.transforms.image_pil.ToTensor(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class converts an image into a tensor and optionally normalizes it by a mean and standard deviation.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandomOrder(opts, img_transforms: List, *args, **kwargs)[source]

Bases: BaseTransformation

This class applies all or a subset of the given transforms in a random order.

__init__(opts, img_transforms: List, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_pil.RandAugmentTimm(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements the RandAugment data augmentation method, as described in the ResNet Strikes Back paper.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.image_torch module

class data.transforms.image_torch.RandomMixup(opts: Namespace, num_classes: int, *args, **kwargs)[source]

Bases: BaseTransformation

Given a batch of input images and labels, this class randomly applies the MixUp transformation

Parameters:

opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset

__init__(opts: Namespace, num_classes: int, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.image_torch.RandomCutmix(opts: Namespace, num_classes: int, *args, **kwargs)[source]

Bases: BaseTransformation

Given a batch of input images and labels, this class randomly applies the CutMix transformation

Parameters:

opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset

__init__(opts: Namespace, num_classes: int, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

data.transforms.image_torch.apply_mixing_transforms(opts: Namespace, data: Dict) → Dict[source]

Helper function to apply MixUp/CutMix transforms. If both MixUp and CutMix transforms are selected with 0.0 < p <= 1.0, then one of them is chosen randomly and applied.

Input data format (one of the following mappings):

data: {
    “samples”: {“sample_key”: Tensor of shape [Batch, Channels, Height, Width]},
    “targets”: {“target_key”: IntTensor of shape [Batch]}
}

OR data: {
    “samples”: {“sample_key”: Tensor of shape [Batch, Channels, Height, Width]},
    “targets”: IntTensor of shape [Batch]
}

OR data: {
    “samples”: Tensor of shape [Batch, Channels, Height, Width],
    “targets”: {“target_key”: IntTensor of shape [Batch]}
}

OR data: {
    “samples”: Tensor of shape [Batch, Channels, Height, Width],
    “targets”: IntTensor of shape [Batch]
}

Output data format: Same as the input
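To make the four input layouts concrete, the sketch below applies a simplified MixUp to any of them and returns the same layout. It is an illustration, not this function's implementation: `mixup` and `mix_dict` are hypothetical helpers, NumPy arrays stand in for tensors, each nested dict is assumed to hold a single key, the mixing weight `lam` is passed in rather than sampled, and targets are assumed to become soft one-hot labels.

```python
from typing import Any, Dict, Tuple

import numpy as np


def mixup(
    samples: np.ndarray, targets: np.ndarray, num_classes: int, lam: float
) -> Tuple[np.ndarray, np.ndarray]:
    """Blend each item with its neighbour in the batch using weight `lam`."""
    one_hot = np.eye(num_classes, dtype=np.float32)[targets]  # [Batch, Classes]
    # Pair every item with another one by rolling the batch by one position.
    mixed_x = lam * samples + (1.0 - lam) * np.roll(samples, 1, axis=0)
    mixed_y = lam * one_hot + (1.0 - lam) * np.roll(one_hot, 1, axis=0)
    return mixed_x, mixed_y


def mix_dict(data: Dict[str, Any], num_classes: int, lam: float) -> Dict[str, Any]:
    """Apply `mixup` to any of the four layouts and keep the layout intact."""
    samples, targets = data["samples"], data["targets"]
    # Unwrap nested dicts (assumed to hold exactly one key each).
    s_key = next(iter(samples)) if isinstance(samples, dict) else None
    t_key = next(iter(targets)) if isinstance(targets, dict) else None
    x = samples[s_key] if s_key is not None else samples
    y = targets[t_key] if t_key is not None else targets
    x, y = mixup(x, y, num_classes, lam)
    # Re-wrap so the output mirrors the input structure.
    return {
        "samples": {s_key: x} if s_key is not None else x,
        "targets": {t_key: y} if t_key is not None else y,
    }


images = np.ones((2, 3, 4, 4), dtype=np.float32)
images[1] *= 3.0
batch = {"samples": {"image": images}, "targets": np.array([0, 1])}
out = mix_dict(batch, num_classes=2, lam=0.75)
```

In common MixUp implementations `lam` is drawn from a Beta distribution and the pairing is a random permutation; both are fixed here to keep the example deterministic.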

data.transforms.utils module

data.transforms.utils.setup_size(size: Any, error_msg='Need a tuple of length 2')[source]

data.transforms.utils.intersect(box_a, box_b)[source]: Computes the intersection between box_a and box_b

data.transforms.utils.jaccard_numpy(box_a: ndarray, box_b: ndarray)[source]

Computes the intersection over union (Jaccard index) of two sets of boxes.

Parameters:

box_a (np.ndarray) – Boxes of shape [Num_boxes_A, 4]
box_b (np.ndarray) – Boxes of shape [Num_boxes_B, 4]

Returns: Intersection over union scores. Shape is [box_a.shape[0], box_a.shape[1]]
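For reference, intersection over union divides the overlap area of two boxes by the area of their union. The sketch below computes it for every pair of boxes with NumPy broadcasting; `pairwise_iou` is a hypothetical helper that assumes (x0, y0, x1, y1) boxes with non-zero area, and its [A, B] output shape may differ from the convention used by the function documented above.

```python
import numpy as np


def pairwise_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """IoU between every box in `boxes_a` [A, 4] and `boxes_b` [B, 4].

    Boxes are (x0, y0, x1, y1). Returns an array of shape [A, B].
    """
    # Broadcast to [A, B, 2] to get the corners of each overlap rectangle.
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    # Negative extents mean no overlap, so clip them to zero.
    wh = np.clip(bottom_right - top_left, a_min=0, a_max=None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union


a = np.array([[0.0, 0.0, 2.0, 2.0]])
b = np.array([[1.0, 1.0, 3.0, 3.0], [0.0, 0.0, 2.0, 2.0], [5.0, 5.0, 6.0, 6.0]])
scores = pairwise_iou(a, b)
```

The three columns of `scores` correspond to a partial overlap, an identical box, and a disjoint box.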

data.transforms.video module

class data.transforms.video.ToTensor(opts: Namespace, *args, **kwargs)[source]

Bases: BaseTransformation

This method converts an image into a tensor.

Note

We do not perform any mean-std normalization. If mean-std normalization is desired, please modify this class.

__init__(opts: Namespace, *args, **kwargs) → None[source]

class data.transforms.video.SaveInputs(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs)[source]

Bases: BaseTransformation

__init__(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs) → None[source]

Saves the clips that are returned by VideoDataset.__getitem__() to disk for debugging use cases. This transformation operates on multiple clips that are extracted out of a single raw video. The video and audio of the clips are concatenated and saved into 1 video file.

1 raw input video ==> VideoDataset.__getitem__() ==> multiple clips in data[“samples”][“video”] ==> SaveInputs() ==> 1 output debugging video.

This is useful for visualizing training and/or validation videos to make sure preprocessing logic is behaving as expected.

Parameters:

opts – Command line options.
get_frame_captions – If provided, this function returns a list of strings (one string per video frame). The frame captions will be added to the video as subtitles.

classmethod add_arguments(parser: ArgumentParser) → None[source]

save_video_with_annotations(data: Dict, output_video_path: Path) → None[source]

Save a video with audio and captions.

Parameters:

data – Dataset output dict. Schema:

{
    “samples”: {
        “video”: Tensor[N x T x C x H x W],
        “audio”: Tensor[N x T_audio x C],  # Optional
        “audio_raw”: Tensor[N x T_audio x C],  # Optional - if provided, “audio” will be ignored.
        “metadata”: {
            “video_fps”: Union[float,int],
            “audio_fps”: Union[float,int],
        }
    }
}

output_video_path – Path for saving the video.
get_frame_captions – A callback that receives @data as input and returns a list of captions (one string per video frame). If provided, the captions will be added to the output video as subtitles.

class data.transforms.video.RandomResizedCrop(opts, size: Tuple | int, *args, **kwargs)[source]

Bases: BaseTransformation

This class crops a random portion of an image and resizes it to a given size.

__init__(opts, size: Tuple | int, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

get_params(height: int, width: int) → Tuple[int, int, int, int][source]

class data.transforms.video.RandomShortSizeResizeCrop(opts, size: Tuple | int, *args, **kwargs)[source]

Bases: BaseTransformation

This class first randomly resizes the input video such that the shortest side is between the specified minimum and maximum values, and then crops a video of the desired size.

Note

This class assumes that the video size after resizing is greater than or equal to the desired size.

__init__(opts, size: Tuple | int, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

get_params(height, width) → Tuple[int, int, int, int][source]

class data.transforms.video.RandomCrop(opts, size: Tuple | int, *args, **kwargs)[source]

Bases: BaseTransformation

This class randomly crops a region of a video.

Note

This class assumes that the input video size is greater than or equal to the desired size.

__init__(opts, size: Tuple | int, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

get_params(height: int, width: int) → Tuple[int, int, int, int][source]

class data.transforms.video.RandomHorizontalFlip(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements random horizontal flipping.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.video.CenterCrop(opts, size: Sequence, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements center cropping method.

Note

This class assumes that the input size is greater than or equal to the desired size.

__init__(opts, size: Sequence, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.video.Resize(opts, *args, **kwargs)[source]

Bases: BaseTransformation

This class implements resizing operation.

Two resizing modes are supported:
1. Resize while maintaining the aspect ratio. To enable this option, pass an int as the size.
2. Resize to a fixed size. To enable this option, pass a tuple of height and width as the size.

__init__(opts, *args, **kwargs) → None[source]

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

class data.transforms.video.CropByBoundingBox(opts: Namespace, image_size: Tuple[int, int] | None = None, *args, **kwargs)[source]

Bases: BaseTransformation

Crops video frames based on bounding boxes and adjusts the @targets “box_coordinates” annotations. Before cropping, the bounding boxes are expanded with @multiplier, while the “box_coordinates” cover the original areas of the image. Note that the cropped images may be padded with 0 values in the boundaries of the cropped image when the bounding boxes are near the edges.

__init__(opts: Namespace, image_size: Tuple[int, int] | None = None, *args, **kwargs) → None[source]

expand_boxes(box_coordinates: Tensor) → Tuple[Tensor, Tensor][source]

Parameters: box_coordinates – Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1]

Outputs (tuple items):

expanded_corners: Tensor of shape […, 4] with (x0, y0, x1, y1), containing the coordinates for cropping. Because of the expansion, coordinates could be negative or >1.
box_coordinates: Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1], to be used as bounding boxes after cropping.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
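The two outputs of expand_boxes can be illustrated for a single box. The sketch below assumes that expansion scales the box around its center by a `multiplier` and that the returned box coordinates are expressed relative to the expanded crop; both are assumptions made for this example, and `expand_box` is a hypothetical helper that works on plain tuples rather than tensors.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def expand_box(box: Box, multiplier: float) -> Tuple[Box, Box]:
    """Return (expanded_corners, box_in_crop) for one (x0, y0, x1, y1) box.

    Assumes a non-degenerate box and multiplier >= 1.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) / 2.0 * multiplier
    half_h = (y1 - y0) / 2.0 * multiplier
    # Crop window; it may extend past the image, i.e. below 0 or above 1.
    ex0, ey0, ex1, ey1 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    crop_w, crop_h = ex1 - ex0, ey1 - ey0
    # Original box expressed in the normalised coordinates of the crop.
    inner = (
        (x0 - ex0) / crop_w,
        (y0 - ey0) / crop_h,
        (x1 - ex0) / crop_w,
        (y1 - ey0) / crop_h,
    )
    return (ex0, ey0, ex1, ey1), inner


expanded, inner = expand_box((0.0, 0.25, 0.5, 0.75), multiplier=2.0)
```

In this example the expanded window starts at x = -0.25, which is the situation where the cropped frame would need zero padding at the image boundary.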

class data.transforms.video.ShuffleAudios(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs)[source]

Bases: BaseTransformation

__init__(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs) → None[source]

Transforms a batch of audio-visual clips. Generates binary labels, useful for self-supervised audio-visual training.

At each invocation, a subset of the clips within a video (batch) have their audio shuffled. The ratio of clips that participate in the shuffling is configurable through argparse options.

When training, the shuffle order is random. When evaluating, the shuffle order is deterministic.

Parameters:

is_training – When False, the decision whether to shuffle the audio is made deterministically.
is_evaluation – Combined with @is_training, determines which shuffle ratio argument to use (train/val/eval).
item_index – Used for deterministic shuffling based on the item_index.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]
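The contrast between random shuffling in training and deterministic shuffling in evaluation can be sketched by seeding a random generator from the item index. Everything below is hypothetical: the helper `plan_audio_shuffle`, the `shuffle_ratio` parameter, the rotation scheme, and the label convention (1 when a clip receives another clip's audio) are assumptions chosen to illustrate the idea, not the class's actual logic.

```python
import random
from typing import List, Tuple


def plan_audio_shuffle(
    num_clips: int, shuffle_ratio: float, is_training: bool, item_index: int
) -> Tuple[List[int], List[int]]:
    """Return (audio_source, labels) for the clips of one video.

    audio_source[i] is the clip whose audio clip i receives; labels[i] is 1
    when that audio comes from a different clip (assumed label convention).
    """
    # Training: fresh randomness. Otherwise: seed from the item index so the
    # same item is always shuffled the same way.
    rng = random.Random() if is_training else random.Random(item_index)
    num_shuffled = round(num_clips * shuffle_ratio)
    chosen = rng.sample(range(num_clips), num_shuffled)
    audio_source = list(range(num_clips))
    # Rotate audio among the chosen clips so each gets another clip's track.
    for dst, src in zip(chosen, chosen[1:] + chosen[:1]):
        audio_source[dst] = src
    labels = [int(src != i) for i, src in enumerate(audio_source)]
    return audio_source, labels


first = plan_audio_shuffle(4, 0.5, is_training=False, item_index=7)
second = plan_audio_shuffle(4, 0.5, is_training=False, item_index=7)
```

Calling the helper twice with the same `item_index` in evaluation mode yields the same plan, which keeps validation metrics comparable across epochs.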

Module contents

data.transforms.arguments_augmentation(parser: ArgumentParser) → ArgumentParser[source]