data.transforms package
Subpackages
Submodules
data.transforms.audio module
- class data.transforms.audio.Gain(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements gain augmentation for audio.
- class data.transforms.audio.Noise(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs)
Bases: BaseTransformation
This class implements ambient noise augmentation for audio.
- __init__(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs) -> None
- class data.transforms.audio.SetFixedLength(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Set the audio buffer to a fixed length.
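A minimal sketch of what setting a buffer to a fixed length could look like, assuming that longer buffers are trimmed at the end and shorter ones are zero-padded at the end. The function name `set_fixed_length` and the `pad_value` parameter are hypothetical, and the sketch uses plain Python lists rather than tensors.

```python
from typing import List, Sequence


def set_fixed_length(
    samples: Sequence[float], length: int, pad_value: float = 0.0
) -> List[float]:
    """Trim or pad `samples` so that the result has exactly `length` items."""
    if len(samples) >= length:
        # Assumption: keep the leading part of the buffer.
        return list(samples[:length])
    # Assumption: pad at the end with a constant value.
    return list(samples) + [pad_value] * (length - len(samples))
```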
- class data.transforms.audio.Roll(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Perform a roll augmentation by shifting the window in a circular manner.
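The circular shift can be illustrated on a plain list. This is only a sketch of the idea: the function name `roll` and the convention that a positive shift moves samples toward the end are assumptions, not the class's documented behavior.

```python
from typing import List, Sequence


def roll(samples: Sequence[float], shift: int) -> List[float]:
    """Circularly shift `samples`; items pushed past the end wrap to the front."""
    n = len(samples)
    if n == 0:
        return []
    k = shift % n  # normalize negative and oversized shifts
    if k == 0:
        return list(samples)
    return list(samples[-k:]) + list(samples[:-k])
```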
- class data.transforms.audio.MFCCs(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
- class data.transforms.audio.LambdaAudio(opts: Namespace, func: Callable[[Tensor], Tensor], *args, **kwargs)
Bases: BaseTransformation
Similar to @torchvision.transforms.Lambda, applies a user-defined lambda on the audio samples as a transform.
- class data.transforms.audio.AudioResample(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Resample audio to a specified framerate.
data.transforms.audio_bytes module
- class data.transforms.audio_bytes.TorchaudioSave(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Encode audio with a supported file encoding.
- Parameters:
opts – The global options.
data.transforms.base_transforms module
data.transforms.common module
- class data.transforms.common.Compose(opts, img_transforms: List, *args, **kwargs)
Bases: BaseTransformation
This class applies a list of transforms sequentially.
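Sequential application can be sketched as follows. `SequentialTransforms` is a hypothetical stand-in, not the library's `Compose`; it assumes each transform is a callable that takes and returns a data dictionary.

```python
from typing import Callable, Dict, Iterable, List

Transform = Callable[[Dict], Dict]


class SequentialTransforms:
    """Hypothetical stand-in: applies each transform to the previous output."""

    def __init__(self, transforms: Iterable[Transform]) -> None:
        self.transforms: List[Transform] = list(transforms)

    def __call__(self, data: Dict) -> Dict:
        for transform in self.transforms:
            data = transform(data)
        return data


def add_one(data: Dict) -> Dict:
    return {**data, "value": data["value"] + 1}


def double(data: Dict) -> Dict:
    return {**data, "value": data["value"] * 2}


pipeline = SequentialTransforms([add_one, double])
result = pipeline({"value": 3})  # (3 + 1) * 2
```

The order of the list matters: swapping the two toy transforms gives 3 * 2 + 1 instead.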
data.transforms.image_bytes module
- class data.transforms.image_bytes.PILSave(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Encode an image with a supported file encoding.
- class data.transforms.image_bytes.ShuffleBytes(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Reorder the bytes in a 1-dimensional buffer.
- class data.transforms.image_bytes.MaskPositions(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Mask out values in a 1-dimensional buffer using a fixed masking pattern.
- class data.transforms.image_bytes.BytePermutation(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
Remap byte values in [0, 255] to new values in [0, 255] using a permutation.
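A byte-value permutation is a one-to-one lookup table over the 256 possible values. The sketch below assumes the table is drawn from a seeded shuffle; the helper names and the seeding scheme are hypothetical and may differ from how the class builds its permutation.

```python
import random


def make_permutation(seed: int) -> bytes:
    """Build a 256-entry lookup table that is a permutation of 0..255."""
    values = list(range(256))
    random.Random(seed).shuffle(values)
    return bytes(values)


def invert_permutation(table: bytes) -> bytes:
    """Build the table that undoes `table`."""
    inverse = [0] * 256
    for source, target in enumerate(table):
        inverse[target] = source
    return bytes(inverse)


def permute_bytes(buffer: bytes, table: bytes) -> bytes:
    """Replace every byte value b in `buffer` with table[b]."""
    return buffer.translate(table)
```

Because the mapping is a permutation, no information is lost: applying the inverse table recovers the original buffer.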
data.transforms.image_pil module
- class data.transforms.image_pil.FixedSizeCrop(opts, size: int | Tuple[int, int] | None = None, *args, **kwargs)
Bases: BaseTransformation
- class data.transforms.image_pil.ScaleJitter(opts, *args, **kwargs)
Bases: BaseTransformation
Randomly resizes the input within the scale range.
- class data.transforms.image_pil.RandomResizedCrop(opts: Namespace, size: Sequence | int, *args, **kwargs)
Bases: BaseTransformation, RandomResizedCrop
This class crops a random portion of an image and resizes it to a given size.
- class data.transforms.image_pil.AutoAugment(opts, *args, **kwargs)
Bases: BaseTransformation, AutoAugment
This class implements the AutoAugment data augmentation method.
- class data.transforms.image_pil.RandAugment(opts, *args, **kwargs)
Bases: BaseTransformation, RandAugment
This class implements the RandAugment data augmentation method.
- class data.transforms.image_pil.TrivialAugmentWide(opts, *args, **kwargs)
Bases: BaseTransformation, TrivialAugmentWide
This class implements the TrivialAugment (Wide) data augmentation method.
- class data.transforms.image_pil.RandomHorizontalFlip(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the random horizontal flipping method.
- class data.transforms.image_pil.RandomRotate(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the random rotation method.
- class data.transforms.image_pil.Resize(opts, img_size: int | Tuple[int, int] | None = None, *args, **kwargs)
Bases: BaseTransformation
This class implements the resizing operation.
Two modes are possible for resizing:
1. Resize while maintaining the aspect ratio. To enable this option, pass an int as the size.
2. Resize to a fixed size. To enable this option, pass a tuple of height and width as the size.
Note
If img_size is passed as a positional argument, then it will override size from args
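The two modes can be sketched as a function that only computes the output size. It assumes that an int size is matched to the shorter side (the convention used by torchvision's `Resize`); this page does not state which side is used, so treat that as an assumption. `resize_output_size` is a hypothetical helper.

```python
from typing import Tuple, Union


def resize_output_size(
    height: int, width: int, size: Union[int, Tuple[int, int]]
) -> Tuple[int, int]:
    """Return the (height, width) produced by the two resizing modes."""
    if isinstance(size, int):
        # Mode 1: keep the aspect ratio; assumed to match the shorter side.
        scale = size / min(height, width)
        return round(height * scale), round(width * scale)
    # Mode 2: fixed (height, width), the aspect ratio may change.
    new_height, new_width = size
    return new_height, new_width
```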
- class data.transforms.image_pil.CenterCrop(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the center cropping method.
Note
This class assumes that the input size is greater than or equal to the desired size.
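The size assumption in the note follows from how a centered crop box is computed. A minimal sketch, with a hypothetical helper name and an explicit error for the case the note rules out:

```python
from typing import Tuple


def center_crop_box(
    height: int, width: int, crop_height: int, crop_width: int
) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a crop centered in the input."""
    if crop_height > height or crop_width > width:
        # The documented assumption: input size >= desired size.
        raise ValueError("crop size must not exceed the input size")
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width
```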
- class data.transforms.image_pil.SSDCroping(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the cropping method for the single-shot object detector (SSD).
- class data.transforms.image_pil.PhotometricDistort(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements photometric distortion.
Note
The hyper-parameters of PhotometricDistort differ between PIL and OpenCV. Be careful.
- class data.transforms.image_pil.BoxPercentCoords(opts, *args, **kwargs)
Bases: BaseTransformation
This class converts the box coordinates to percentages of the image size.
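One plausible reading of this conversion is dividing x coordinates by the image width and y coordinates by the image height. The sketch assumes boxes in (x0, y0, x1, y1) pixel format and fractional outputs in [0, 1]; both the layout and the helper name are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_percent_coords(boxes: Sequence[Box], height: int, width: int) -> List[Box]:
    """Convert pixel boxes (x0, y0, x1, y1) to fractions of the image size."""
    return [
        (x0 / width, y0 / height, x1 / width, y1 / height)
        for x0, y0, x1, y1 in boxes
    ]
```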
- class data.transforms.image_pil.InstanceProcessor(opts, instance_size: int | Tuple[int, ...] | None = 16, *args, **kwargs)
Bases: BaseTransformation
This class processes the instance masks.
- class data.transforms.image_pil.RandomResize(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the random resizing method.
- class data.transforms.image_pil.RandomShortSizeResize(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements random resizing such that the shortest side is between the specified minimum and maximum values.
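A sketch of the size computation, assuming the short side is sampled uniformly from the integer range and the aspect ratio is preserved. The helper name, the uniform sampling, and the absence of any cap on the longer side are assumptions.

```python
import random
from typing import Tuple


def random_short_side_size(
    height: int, width: int, short_min: int, short_max: int, rng: random.Random
) -> Tuple[int, int]:
    """Pick a short-side length in [short_min, short_max] and scale to it."""
    short_side = rng.randint(short_min, short_max)  # inclusive on both ends
    scale = short_side / min(height, width)
    return round(height * scale), round(width * scale)
```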
- class data.transforms.image_pil.RandomErasing(opts, *args, **kwargs)
Bases: BaseTransformation, RandomErasing
This class randomly selects a region in a tensor and erases its pixels. See this paper for details.
- class data.transforms.image_pil.RandomGaussianBlur(opts, *args, **kwargs)
Bases: BaseTransformation
This method randomly blurs the input image.
- class data.transforms.image_pil.RandomCrop(opts, size: Sequence | int, ignore_idx: int | None = 255, *args, **kwargs)
Bases: BaseTransformation
This method randomly crops an image area.
Note
If the size of input image is smaller than the desired crop size, the input image is first resized while maintaining the aspect ratio and then cropping is performed.
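The resize-then-crop behavior described in the note can be sketched in two steps. The helper names are hypothetical, and scaling up by the smallest factor that makes both sides fit is an assumption about how the resize is chosen.

```python
import math
import random
from typing import Tuple


def size_before_crop(
    height: int, width: int, crop_height: int, crop_width: int
) -> Tuple[int, int]:
    """Upscale (keeping the aspect ratio) only if the crop would not fit."""
    scale = max(crop_height / height, crop_width / width, 1.0)
    return math.ceil(height * scale), math.ceil(width * scale)


def random_crop_box(
    height: int, width: int, crop_height: int, crop_width: int, rng: random.Random
) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) in the (possibly resized) image."""
    height, width = size_before_crop(height, width, crop_height, crop_width)
    top = rng.randint(0, height - crop_height)
    left = rng.randint(0, width - crop_width)
    return top, left, top + crop_height, left + crop_width
```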
- class data.transforms.image_pil.ToTensor(opts, *args, **kwargs)
Bases: BaseTransformation
This method converts an image into a tensor and optionally normalizes by a mean and std.
- class data.transforms.image_pil.RandomOrder(opts, img_transforms: List, *args, **kwargs)
Bases: BaseTransformation
This method applies all, or a subset, of the listed transforms in a random order.
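A sketch of applying all or a few transforms in random order, assuming the subset size is given explicitly. The function name and the `keep_k` parameter are hypothetical; the actual class reads its configuration from `opts`.

```python
import random
from typing import Callable, List, Optional

Transform = Callable[[int], int]


def apply_in_random_order(
    value: int,
    transforms: List[Transform],
    keep_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Apply `keep_k` of the transforms (all if None) in a random order."""
    rng = rng or random.Random()
    k = len(transforms) if keep_k is None else keep_k
    # sample() draws k distinct transforms in a random order.
    for transform in rng.sample(transforms, k):
        value = transform(value)
    return value


toy_transforms = [lambda v: v + 1, lambda v: v + 10, lambda v: v + 100]
```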
- class data.transforms.image_pil.RandAugmentTimm(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the RandAugment data augmentation method, as described in the ResNet Strikes Back paper.
data.transforms.image_torch module
- class data.transforms.image_torch.RandomMixup(opts: Namespace, num_classes: int, *args, **kwargs)
Bases: BaseTransformation
Given a batch of input images and labels, this class randomly applies the MixUp transformation.
- Parameters:
opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset
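The MixUp idea, and the reason `num_classes` is needed, can be sketched on plain lists: labels are turned into one-hot vectors so that they can be blended with the same weight as the samples. This follows the commonly used formulation (Beta-distributed weight, each item paired with its neighbor in the batch); the function names, the `alpha` default, and the pairing rule are assumptions rather than this class's implementation.

```python
import random
from typing import List, Optional, Tuple


def one_hot(label: int, num_classes: int) -> List[float]:
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_batch(
    samples: List[List[float]],
    labels: List[int],
    num_classes: int,
    alpha: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[List[float]], List[List[float]], float]:
    """Blend every item with its predecessor in the batch (rolled by one)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    partner_samples = samples[-1:] + samples[:-1]
    partner_labels = labels[-1:] + labels[:-1]
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]
        for x, y in zip(samples, partner_samples)
    ]
    targets = [
        [
            lam * p + (1.0 - lam) * q
            for p, q in zip(one_hot(l, num_classes), one_hot(m, num_classes))
        ]
        for l, m in zip(labels, partner_labels)
    ]
    return mixed, targets, lam
```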
- class data.transforms.image_torch.RandomCutmix(opts: Namespace, num_classes: int, *args, **kwargs)
Bases: BaseTransformation
Given a batch of input images and labels, this class randomly applies the CutMix transformation.
- Parameters:
opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset
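CutMix pastes a rectangular patch from one image into another and weights the labels by the area that survives. The sketch below computes only the patch box and the adjusted label weight, following the commonly used formulation; the helper name and the clipping details are assumptions, not taken from this class.

```python
import math
import random
from typing import Tuple


def cutmix_box(
    height: int, width: int, lam: float, rng: random.Random
) -> Tuple[Tuple[int, int, int, int], float]:
    """Return ((y0, x0, y1, x1), adjusted_lam) for a patch covering ~(1 - lam)."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_height = int(height * cut_ratio)
    cut_width = int(width * cut_ratio)
    center_y = rng.randrange(height)
    center_x = rng.randrange(width)
    # Clip the patch to the image borders.
    y0 = max(center_y - cut_height // 2, 0)
    y1 = min(center_y + cut_height // 2, height)
    x0 = max(center_x - cut_width // 2, 0)
    x1 = min(center_x + cut_width // 2, width)
    # Recompute the label weight from the area actually pasted.
    adjusted_lam = 1.0 - ((y1 - y0) * (x1 - x0)) / (height * width)
    return (y0, x0, y1, x1), adjusted_lam
```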
- data.transforms.image_torch.apply_mixing_transforms(opts: Namespace, data: Dict) -> Dict
Helper function to apply MixUp/CutMix transforms. If both MixUp and CutMix transforms are selected with 0.0 < p <= 1.0, then one of them is chosen randomly and applied.
- Input data format: data is a mapping in one of the following forms:
{"samples": {"sample_key": Tensor of shape [Batch, Channels, Height, Width]}, "targets": {"target_key": IntTensor of shape [Batch]}}
OR {"samples": {"sample_key": Tensor of shape [Batch, Channels, Height, Width]}, "targets": IntTensor of shape [Batch]}
OR {"samples": Tensor of shape [Batch, Channels, Height, Width], "targets": {"target_key": IntTensor of shape [Batch]}}
OR {"samples": Tensor of shape [Batch, Channels, Height, Width], "targets": IntTensor of shape [Batch]}
- Output data format: Same as the input
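The selection rule and the four accepted input layouts can be sketched with small helpers. All names here are hypothetical; the sketch assumes that a nested "samples" or "targets" dictionary holds exactly one key, and it uses plain Python values where the real function handles tensors.

```python
import random
from typing import Any, Optional, Tuple


def pick_mixing_transform(
    mixup_p: float, cutmix_p: float, rng: Optional[random.Random] = None
) -> Optional[str]:
    """Choose which transform to apply; pick randomly when both are enabled."""
    rng = rng or random.Random()
    enabled = []
    if 0.0 < mixup_p <= 1.0:
        enabled.append("mixup")
    if 0.0 < cutmix_p <= 1.0:
        enabled.append("cutmix")
    return rng.choice(enabled) if enabled else None


def unwrap(value: Any) -> Tuple[Any, Optional[str]]:
    """Return (payload, key); key is None when `value` is not a dict."""
    if isinstance(value, dict):
        if len(value) != 1:
            raise ValueError("expected exactly one key")  # sketch assumption
        key, payload = next(iter(value.items()))
        return payload, key
    return value, None


def rewrap(payload: Any, key: Optional[str]) -> Any:
    """Restore the original layout so the output format matches the input."""
    return payload if key is None else {key: payload}
```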
data.transforms.utils module
- data.transforms.utils.intersect(box_a, box_b)
Computes the intersection between box_a and box_b
- data.transforms.utils.jaccard_numpy(box_a: ndarray, box_b: ndarray)
Computes the intersection over union (Jaccard index) of two sets of boxes.
- Parameters:
box_a (np.ndarray) – Boxes of shape [Num_boxes_A, 4]
box_b (np.ndarray) – Boxes of shape [Num_boxes_B, 4]
- Returns:
intersection over union scores. Shape is [box_a.shape[0], box_a.shape[1]]
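For reference, intersection over union for boxes in (x0, y0, x1, y1) format can be sketched in plain Python. This is not the NumPy implementation documented above: the helper names are hypothetical, and returning one score per pair of boxes is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def intersection_area(box_a: Box, box_b: Box) -> float:
    """Area of the overlap rectangle; zero when the boxes are disjoint."""
    overlap_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    overlap_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    return max(overlap_w, 0.0) * max(overlap_h, 0.0)


def iou(box_a: Box, box_b: Box) -> float:
    inter = intersection_area(box_a, box_b)
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union > 0 else 0.0


def pairwise_iou(boxes_a: Sequence[Box], boxes_b: Sequence[Box]) -> List[List[float]]:
    """scores[i][j] is the IoU of boxes_a[i] and boxes_b[j]."""
    return [[iou(a, b) for b in boxes_b] for a in boxes_a]
```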
data.transforms.video module
- class data.transforms.video.ToTensor(opts: Namespace, *args, **kwargs)
Bases: BaseTransformation
This method converts an image into a tensor.
Note
We do not perform any mean-std normalization. If mean-std normalization is desired, please modify this class.
- class data.transforms.video.SaveInputs(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs)
Bases: BaseTransformation
- __init__(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs) -> None
Saves the clips that are returned by VideoDataset.__getitem__() to disk for debugging use cases. This transformation operates on multiple clips that are extracted out of a single raw video. The video and audio of the clips are concatenated and saved into 1 video file.
1 raw input video ==> VideoDataset.__getitem__() ==> multiple clips in data["samples"]["video"] ==> SaveInputs() ==> 1 output debugging video.
This is useful for visualizing training and/or validation videos to make sure preprocessing logic is behaving as expected.
- Parameters:
opts – Command line options.
get_frame_captions – If provided, this function returns a list of strings (one string per video frame). The frame captions will be added to the video as subtitles.
- save_video_with_annotations(data: Dict, output_video_path: Path) -> None
Save a video with audio and captions.
- Parameters:
data – Dataset output dict. Schema:
{
    "samples": {
        "video": Tensor[N x T x C x H x W],
        "audio": Tensor[N x T_audio x C],  # Optional
        "audio_raw": Tensor[N x T_audio x C],  # Optional - if provided, "audio" will be ignored.
        "metadata": {
            "video_fps": Union[float,int],
            "audio_fps": Union[float,int],
        }
    }
}
output_video_path – Path for saving the video.
get_frame_captions – A callback that receives @data as input and returns a list of captions (one string per video frame). If provided, the captions will be added to the output video as subtitles.
- class data.transforms.video.RandomResizedCrop(opts, size: Tuple | int, *args, **kwargs)
Bases: BaseTransformation
This class crops a random portion of an image and resizes it to a given size.
- class data.transforms.video.RandomShortSizeResizeCrop(opts, size: Tuple | int, *args, **kwargs)
Bases: BaseTransformation
This class first randomly resizes the input video such that the shortest side is between the specified minimum and maximum values, and then crops a video of the desired size.
Note
This class assumes that the video size after resizing is greater than or equal to the desired size.
- class data.transforms.video.RandomCrop(opts, size: Tuple | int, *args, **kwargs)
Bases: BaseTransformation
This method randomly crops a video area.
Note
This class assumes that the input video size is greater than or equal to the desired size.
- class data.transforms.video.RandomHorizontalFlip(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the random horizontal flipping method.
- class data.transforms.video.CenterCrop(opts, size: Sequence, *args, **kwargs)
Bases: BaseTransformation
This class implements the center cropping method.
Note
This class assumes that the input size is greater than or equal to the desired size.
- class data.transforms.video.Resize(opts, *args, **kwargs)
Bases: BaseTransformation
This class implements the resizing operation.
Two modes are possible for resizing:
1. Resize while maintaining the aspect ratio. To enable this option, pass an int as the size.
2. Resize to a fixed size. To enable this option, pass a tuple of height and width as the size.
- class data.transforms.video.CropByBoundingBox(opts: Namespace, image_size: Tuple[int, int] | None = None, *args, **kwargs)
Bases: BaseTransformation
Crops video frames based on bounding boxes and adjusts the @targets "box_coordinates" annotations. Before cropping, the bounding boxes are expanded with @multiplier, while the "box_coordinates" cover the original areas of the image. Note that the cropped images may be padded with 0 values in the boundaries of the cropped image when the bounding boxes are near the edges.
- expand_boxes(box_coordinates: Tensor) -> Tuple[Tensor, Tensor]
- Parameters:
box_coordinates – Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1]
- Outputs (tuple items):
- expanded_corners: Tensor of shape […, 4] with (x0, y0, x1, y1), containing the coordinates for cropping. Because of the expansion, coordinates could be negative or >1.
- box_coordinates: Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1] to be used as bounding boxes after cropping.
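The relationship between the two outputs can be sketched for a single box. The sketch assumes the expansion scales the box around its center by a `multiplier` greater than zero, and that the returned box coordinates are the original box expressed relative to the expanded crop. The function name and both assumptions are illustrative and are not confirmed by this page.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def expand_box(box: Box, multiplier: float) -> Tuple[Box, Box]:
    """Return (expanded_corners, box_relative_to_expanded_crop)."""
    x0, y0, x1, y1 = box
    center_x, center_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * multiplier / 2.0
    half_h = (y1 - y0) * multiplier / 2.0
    # The expanded corners may fall outside [0, 1] near the image edges.
    ex0, ey0 = center_x - half_w, center_y - half_h
    ex1, ey1 = center_x + half_w, center_y + half_h
    crop_w, crop_h = ex1 - ex0, ey1 - ey0
    relative = (
        (x0 - ex0) / crop_w,
        (y0 - ey0) / crop_h,
        (x1 - ex0) / crop_w,
        (y1 - ey0) / crop_h,
    )
    return (ex0, ey0, ex1, ey1), relative
```

For a box touching the left edge, such as (0.0, 0.25, 0.5, 0.75) with a multiplier of 2, the expanded x0 becomes negative, which is the situation where zero padding would be needed.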
- class data.transforms.video.ShuffleAudios(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs)
Bases: BaseTransformation
- __init__(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs) -> None
Transforms a batch of audio-visual clips. Generates binary labels, useful for self-supervised audio-visual training.
At each invocation, a subset of the clips within a video (batch) have their audio shuffled. The ratio of clips that participate in the shuffling is configurable by argparse options.
When training, the shuffle order is random. When evaluating, the shuffle order is deterministic.
- Parameters:
is_training – When False, the decision whether to shuffle the audios is made deterministically.
is_evaluation – Combined with @is_training, determines which shuffle ratio argument to use (train/val/eval).
item_index – Used for deterministic shuffling based on the item_index.
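A sketch of the deterministic versus random behavior, assuming that evaluation seeds a generator with `item_index`, that label 1 marks a clip whose audio was replaced, and that the selected clips swap audio by rotation so none keeps its own. The function name, the `shuffle_ratio` parameter, and the label convention are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_audios(
    audios: Sequence[str], shuffle_ratio: float, is_training: bool, item_index: int
) -> Tuple[List[str], List[int]]:
    """Return (audios_after_shuffling, labels); label 1 means mismatched audio."""
    # Training: fresh randomness. Otherwise: reproducible per item.
    rng = random.Random() if is_training else random.Random(item_index)
    n = len(audios)
    k = int(round(n * shuffle_ratio))
    shuffled, labels = list(audios), [0] * n
    if k < 2:
        return shuffled, labels  # nothing to swap with
    chosen = rng.sample(range(n), k)
    # Rotate audio among the chosen clips so that each one gets another's audio.
    for source, target in zip(chosen, chosen[1:] + chosen[:1]):
        shuffled[target] = audios[source]
        labels[target] = 1
    return shuffled, labels
```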