data.transforms package
Subpackages
Submodules
data.transforms.audio module
- class data.transforms.audio.Gain(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements gain augmentation for audio.
- class data.transforms.audio.Noise(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements ambient noise augmentation for audio.
- __init__(opts: Namespace, is_training: bool = True, noise_files_dir: str | None = None, *args, **kwargs) None [source]
- class data.transforms.audio.SetFixedLength(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Set the audio buffer to a fixed length.
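One way to picture a fixed-length transform is as "trim if too long, pad if too short". The sketch below is illustrative only: the function name `set_fixed_length`, the list-based buffer, and the choice of zero-padding at the tail are assumptions, not the class's documented behavior.

```python
from typing import List


def set_fixed_length(
    samples: List[float], length: int, pad_value: float = 0.0
) -> List[float]:
    """Trim or right-pad `samples` so the result has exactly `length` items."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if len(samples) >= length:
        # Too long (or exact): keep the leading samples.
        return samples[:length]
    # Too short: pad the tail (tail zero-padding is an assumption).
    return samples + [pad_value] * (length - len(samples))


padded = set_fixed_length([0.5, -0.5], 4)
trimmed = set_fixed_length([0.1, 0.2, 0.3], 2)
```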
- class data.transforms.audio.Roll(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Perform a roll augmentation by shifting the window in a circular manner.
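A circular shift can be sketched in plain Python as follows. The helper name `roll_samples` and the list-based buffer are assumptions made for illustration; the class itself operates on audio tensors and takes its shift amount from options not shown here.

```python
from typing import List


def roll_samples(samples: List[float], shift: int) -> List[float]:
    """Circularly shift `samples` to the right by `shift` positions."""
    if not samples:
        return []
    shift %= len(samples)  # normalize negative or oversized shifts
    if shift == 0:
        return list(samples)
    # Samples pushed off the end wrap around to the front.
    return samples[-shift:] + samples[:-shift]


rolled = roll_samples([0.1, 0.2, 0.3, 0.4], 1)
```

No samples are lost by the shift: the output always has the same length and the same values as the input, only in a rotated order.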
- class data.transforms.audio.MFCCs(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
- class data.transforms.audio.LambdaAudio(opts: Namespace, func: Callable[[Tensor], Tensor], *args, **kwargs)[source]
Bases:
BaseTransformation
Similar to @torchvision.transforms.Lambda, applies a user-defined lambda on the audio samples as a transform.
- class data.transforms.audio.AudioResample(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Resample audio to a specified framerate.
data.transforms.audio_bytes module
- class data.transforms.audio_bytes.TorchaudioSave(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Encode audio with a supported file encoding.
- Parameters:
opts – The global options.
data.transforms.base_transforms module
data.transforms.common module
- class data.transforms.common.Compose(opts, img_transforms: List, *args, **kwargs)[source]
Bases:
BaseTransformation
This class applies a list of transforms sequentially.
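Sequential composition means the output of each transform becomes the input of the next. A minimal sketch, assuming each transform is a callable that takes and returns a data dict; `SequentialCompose`, `add_one`, and `double` are hypothetical names, not part of this package.

```python
from typing import Any, Callable, Dict, List

Transform = Callable[[Dict[str, Any]], Dict[str, Any]]


class SequentialCompose:
    """Hypothetical stand-in: apply each transform in order to a data dict."""

    def __init__(self, transforms: List[Transform]) -> None:
        self.transforms = transforms

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for transform in self.transforms:
            data = transform(data)  # output of one step feeds the next
        return data


def add_one(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "value": data["value"] + 1}


def double(data: Dict[str, Any]) -> Dict[str, Any]:
    return {**data, "value": data["value"] * 2}


pipeline = SequentialCompose([add_one, double])
result = pipeline({"value": 3})
```

Order matters: swapping `add_one` and `double` in the list gives a different result for the same input.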
data.transforms.image_bytes module
- class data.transforms.image_bytes.PILSave(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Encode an image with a supported file encoding.
- class data.transforms.image_bytes.ShuffleBytes(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Reorder the bytes in a 1-dimensional buffer.
- class data.transforms.image_bytes.MaskPositions(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Mask out values in a 1-dimensional buffer using a fixed masking pattern.
- class data.transforms.image_bytes.BytePermutation(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
Remap byte values in [0, 255] to new values in [0, 255] using a permutation.
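Remapping byte values with a permutation amounts to a 256-entry lookup table. The sketch below uses only the standard library; the helper names and the seeded shuffle are assumptions for illustration and do not describe how the class builds its permutation.

```python
import random


def make_byte_permutation(seed: int) -> bytes:
    """Build a 256-entry lookup table that is a permutation of 0..255."""
    values = list(range(256))
    random.Random(seed).shuffle(values)  # seeded, so the mapping is reproducible
    return bytes(values)


def permute_bytes(buffer: bytes, table: bytes) -> bytes:
    """Replace every byte b in `buffer` with table[b]."""
    return buffer.translate(table)


def invert_permutation(table: bytes) -> bytes:
    """Build the inverse lookup table, so that inverse[table[b]] == b."""
    inverse = bytearray(256)
    for source, target in enumerate(table):
        inverse[target] = source
    return bytes(inverse)


table = make_byte_permutation(seed=0)
encoded = permute_bytes(b"\x00\x01\xff", table)
decoded = permute_bytes(encoded, invert_permutation(table))
```

Because the table is a permutation, the mapping is lossless and can be undone with the inverse table.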
data.transforms.image_pil module
- class data.transforms.image_pil.FixedSizeCrop(opts, size: int | Tuple[int, int] | None = None, *args, **kwargs)[source]
Bases:
BaseTransformation
- class data.transforms.image_pil.ScaleJitter(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
Randomly resizes the input within the scale range.
- class data.transforms.image_pil.RandomResizedCrop(opts: Namespace, size: Sequence | int, *args, **kwargs)[source]
Bases:
BaseTransformation, RandomResizedCrop
This class crops a random portion of an image and resizes it to a given size.
- class data.transforms.image_pil.AutoAugment(opts, *args, **kwargs)[source]
Bases:
BaseTransformation, AutoAugment
This class implements the AutoAugment data augmentation method.
- class data.transforms.image_pil.RandAugment(opts, *args, **kwargs)[source]
Bases:
BaseTransformation, RandAugment
This class implements the RandAugment data augmentation method.
- class data.transforms.image_pil.TrivialAugmentWide(opts, *args, **kwargs)[source]
Bases:
BaseTransformation, TrivialAugmentWide
This class implements the TrivialAugment (Wide) data augmentation method.
- class data.transforms.image_pil.RandomHorizontalFlip(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements random horizontal flipping.
- class data.transforms.image_pil.RandomRotate(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements random rotation.
- class data.transforms.image_pil.Resize(opts, img_size: int | Tuple[int, int] | None = None, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements the resizing operation.
There are two resizing modes:
1. Resize while maintaining the aspect ratio. To enable this mode, pass an int as the size.
2. Resize to a fixed size. To enable this mode, pass a tuple of height and width as the size.
Note
If img_size is passed as a positional argument, it overrides the size from args.
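The two resizing modes differ only in how the output size is derived. The sketch below computes the output (height, width) for each mode. It assumes that in the int mode the shorter side is set to the requested size, which is a common convention but is not confirmed by this page; `resized_hw` is a hypothetical helper name.

```python
from typing import Tuple, Union


def resized_hw(
    in_hw: Tuple[int, int], size: Union[int, Tuple[int, int]]
) -> Tuple[int, int]:
    """Return the output (height, width) for the two resize modes."""
    in_h, in_w = in_hw
    if isinstance(size, int):
        # Mode 1 (assumed convention): the shorter side becomes `size`,
        # and the longer side is scaled by the same factor.
        scale = size / min(in_h, in_w)
        return round(in_h * scale), round(in_w * scale)
    # Mode 2: fixed (height, width); the aspect ratio is not preserved.
    return size[0], size[1]


keep_ratio = resized_hw((480, 640), 240)
fixed = resized_hw((480, 640), (224, 224))
```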
- class data.transforms.image_pil.CenterCrop(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements center cropping method.
Note
This class assumes that the input size is greater than or equal to the desired size.
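The size assumption in the note follows from how a center crop is located: the crop offsets are half of the leftover margin, which would be negative if the crop were larger than the input. A minimal sketch, with `center_crop_box` as a hypothetical helper that returns a (top, left, bottom, right) box:

```python
from typing import Tuple


def center_crop_box(
    in_hw: Tuple[int, int], crop_hw: Tuple[int, int]
) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered crop."""
    in_h, in_w = in_hw
    crop_h, crop_w = crop_hw
    if crop_h > in_h or crop_w > in_w:
        # Mirrors the documented assumption: input must be at least crop-sized.
        raise ValueError("crop size must not exceed input size")
    top = (in_h - crop_h) // 2
    left = (in_w - crop_w) // 2
    return top, left, top + crop_h, left + crop_w


box = center_crop_box((256, 256), (224, 224))
```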
- class data.transforms.image_pil.SSDCroping(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements the cropping method for the Single Shot Detector (SSD) object detector.
- class data.transforms.image_pil.PhotometricDistort(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements photometric distortion.
Note
The hyper-parameters of PhotometricDistort differ between PIL and OpenCV, so values tuned for one may not transfer to the other.
- class data.transforms.image_pil.BoxPercentCoords(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class converts the box coordinates to percentages of the image size.
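Converting box coordinates to percentages usually means dividing x values by the image width and y values by the image height. The sketch below assumes boxes are given as (x0, y0, x1, y1) in pixels; both the box layout and the helper name `boxes_to_percent` are assumptions, not taken from the class.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def boxes_to_percent(boxes: List[Box], width: int, height: int) -> List[Box]:
    """Scale pixel boxes (x0, y0, x1, y1) into the [0, 1] range."""
    return [
        (x0 / width, y0 / height, x1 / width, y1 / height)
        for x0, y0, x1, y1 in boxes
    ]


percent_boxes = boxes_to_percent([(32, 24, 160, 120)], width=320, height=240)
```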
- class data.transforms.image_pil.InstanceProcessor(opts, instance_size: int | Tuple[int, ...] | None = 16, *args, **kwargs)[source]
Bases:
BaseTransformation
This class processes the instance masks.
- class data.transforms.image_pil.RandomResize(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements random resizing.
- class data.transforms.image_pil.RandomShortSizeResize(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements random resizing such that the shortest side is between the specified minimum and maximum values.
- class data.transforms.image_pil.RandomErasing(opts, *args, **kwargs)[source]
Bases:
BaseTransformation, RandomErasing
This class randomly selects a region in a tensor and erases its pixels. See this paper for details.
- class data.transforms.image_pil.RandomGaussianBlur(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class randomly blurs the input image.
- class data.transforms.image_pil.RandomCrop(opts, size: Sequence | int, ignore_idx: int | None = 255, *args, **kwargs)[source]
Bases:
BaseTransformation
This class randomly crops an image area.
Note
If the input image is smaller than the desired crop size, the input image is first resized while maintaining the aspect ratio, and then cropping is performed.
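The resize-then-crop behavior in the note can be sketched as a planning step that returns the intermediate size and the crop box. This is an illustration under assumptions: the helper name `plan_random_crop`, the use of a single upscaling factor, and the rounding with `math.ceil` are not taken from the class.

```python
import math
import random
from typing import Tuple


def plan_random_crop(
    in_hw: Tuple[int, int], crop_hw: Tuple[int, int], rng: random.Random
) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return (resized_hw, (top, left, bottom, right)) for a random crop."""
    in_h, in_w = in_hw
    crop_h, crop_w = crop_hw
    # Upscale with one factor (keeps aspect ratio) only if the input is too small.
    scale = max(1.0, crop_h / in_h, crop_w / in_w)
    new_h = max(crop_h, math.ceil(in_h * scale))
    new_w = max(crop_w, math.ceil(in_w * scale))
    # Pick the top-left corner uniformly among positions that keep the crop inside.
    top = rng.randint(0, new_h - crop_h)
    left = rng.randint(0, new_w - crop_w)
    return (new_h, new_w), (top, left, top + crop_h, left + crop_w)


resized, crop_box = plan_random_crop((100, 200), (150, 150), random.Random(0))
```

When the input is already large enough, the scale factor stays at 1.0 and the plan reduces to an ordinary random crop.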
- class data.transforms.image_pil.ToTensor(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class converts an image into a tensor and optionally normalizes it by a mean and std.
- class data.transforms.image_pil.RandomOrder(opts, img_transforms: List, *args, **kwargs)[source]
Bases:
BaseTransformation
This class applies all or a few of the transforms in a list, in a random order.
- class data.transforms.image_pil.RandAugmentTimm(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements the RandAugment data augmentation method, as described in the "ResNet Strikes Back" paper.
data.transforms.image_torch module
- class data.transforms.image_torch.RandomMixup(opts: Namespace, num_classes: int, *args, **kwargs)[source]
Bases:
BaseTransformation
Given a batch of input images and labels, this class randomly applies the MixUp transformation.
- Parameters:
opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset
- class data.transforms.image_torch.RandomCutmix(opts: Namespace, num_classes: int, *args, **kwargs)[source]
Bases:
BaseTransformation
Given a batch of input images and labels, this class randomly applies the CutMix transformation.
- Parameters:
opts (argparse.Namespace) – Arguments
num_classes (int) – Number of classes in the dataset
- data.transforms.image_torch.apply_mixing_transforms(opts: Namespace, data: Dict) Dict [source]
Helper function to apply MixUp/CutMix transforms. If both MixUp and CutMix transforms are selected with 0.0 < p <= 1.0, then one of them is chosen randomly and applied.
- Input data format:
data is a mapping in one of the following four layouts:
{
    “samples”: {“sample_key”: Tensor of shape [Batch, Channels, Height, Width]},
    “targets”: {“target_key”: IntTensor of shape [Batch]}
}
OR
{
    “samples”: {“sample_key”: Tensor of shape [Batch, Channels, Height, Width]},
    “targets”: IntTensor of shape [Batch]
}
OR
{
    “samples”: Tensor of shape [Batch, Channels, Height, Width],
    “targets”: {“target_key”: IntTensor of shape [Batch]}
}
OR
{
    “samples”: Tensor of shape [Batch, Channels, Height, Width],
    “targets”: IntTensor of shape [Batch]
}
- Output data format:
Same as the input.
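The selection rule described for this helper can be sketched on its own, independent of tensors. In the sketch below, `choose_mixing_transform` is a hypothetical name, and the uniform choice between the two transforms is an assumption: the text only says that one of them is chosen randomly.

```python
import random
from typing import Optional


def choose_mixing_transform(
    mixup_p: float, cutmix_p: float, rng: random.Random
) -> Optional[str]:
    """Pick which mixing transform to run for one batch (illustrative only)."""
    use_mixup = 0.0 < mixup_p <= 1.0
    use_cutmix = 0.0 < cutmix_p <= 1.0
    if use_mixup and use_cutmix:
        # Both enabled: choose one at random (uniform choice is an assumption).
        return rng.choice(["mixup", "cutmix"])
    if use_mixup:
        return "mixup"
    if use_cutmix:
        return "cutmix"
    return None  # neither transform is enabled


rng = random.Random(0)
picked = choose_mixing_transform(0.5, 0.5, rng)
```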
data.transforms.utils module
- data.transforms.utils.intersect(box_a, box_b)[source]
Computes the intersection between box_a and box_b.
- data.transforms.utils.jaccard_numpy(box_a: ndarray, box_b: ndarray)[source]
Computes the Jaccard overlap (intersection over union) between box_a and box_b.
- Parameters:
box_a (np.ndarray) – Boxes of shape [Num_boxes_A, 4]
box_b (np.ndarray) – Boxes of shape [Num_boxes_B, 4]
- Returns:
intersection over union scores. Shape is [box_a.shape[0], box_a.shape[1]]
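For a single pair of boxes, the intersection over union can be worked through in plain Python. This is a sketch of the general formula, not the module's NumPy implementation; it assumes boxes in (x0, y0, x1, y1) form, and the names `intersection_area` and `iou` are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def intersection_area(box_a: Box, box_b: Box) -> float:
    """Area of the overlap between two (x0, y0, x1, y1) boxes."""
    width = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    height = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    return width * height


def iou(box_a: Box, box_b: Box) -> float:
    """Jaccard overlap: intersection area divided by union area."""
    inter = intersection_area(box_a, box_b)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

In this example the boxes overlap in a 1 x 1 square, each box has area 4, and the union is 4 + 4 - 1 = 7, so the score is 1/7.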
data.transforms.video module
- class data.transforms.video.ToTensor(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseTransformation
This class converts an image into a tensor.
Note
We do not perform any mean-std normalization. If mean-std normalization is desired, please modify this class.
- class data.transforms.video.SaveInputs(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs)[source]
Bases:
BaseTransformation
- __init__(opts: Namespace, get_frame_captions: Callable[[Dict], List[str]] | None = None, *args, **kwargs) None [source]
Saves the clips that are returned by VideoDataset.__getitem__() to disk for debugging use cases. This transformation operates on multiple clips that are extracted out of a single raw video. The video and audio of the clips are concatenated and saved into 1 video file.
1 raw input video ==> VideoDataset.__getitem__() ==> multiple clips in data[“samples”][“video”] ==> SaveInputs() ==> 1 output debugging video.
This is useful for visualizing training and/or validation videos to make sure preprocessing logic is behaving as expected.
- Parameters:
opts – Command line options.
get_frame_captions – If provided, this function returns a list of strings (one string per video frame). The frame captions will be added to the video as subtitles.
- save_video_with_annotations(data: Dict, output_video_path: Path) None [source]
Save a video with audio and captions.
- Parameters:
data – Dataset output dict. Schema:
{
    “samples”: {
        “video”: Tensor[N x T x C x H x W],
        “audio”: Tensor[N x T_audio x C],  # Optional
        “audio_raw”: Tensor[N x T_audio x C],  # Optional - if provided, “audio” will be ignored.
        “metadata”: {
            “video_fps”: Union[float,int],
            “audio_fps”: Union[float,int],
        }
    }
}
output_video_path – Path for saving the video.
get_frame_captions – A callback that receives @data as input and returns a list of captions (one string per video frame). If provided, the captions will be added to the output video as subtitles.
- class data.transforms.video.RandomResizedCrop(opts, size: Tuple | int, *args, **kwargs)[source]
Bases:
BaseTransformation
This class crops a random portion of an image and resizes it to a given size.
- class data.transforms.video.RandomShortSizeResizeCrop(opts, size: Tuple | int, *args, **kwargs)[source]
Bases:
BaseTransformation
This class first randomly resizes the input video such that the shortest side is between the specified minimum and maximum values, and then crops a video of the desired size.
Note
This class assumes that the video size after resizing is greater than or equal to the desired size.
- class data.transforms.video.RandomCrop(opts, size: Tuple | int, *args, **kwargs)[source]
Bases:
BaseTransformation
This class randomly crops a video area.
Note
This class assumes that the input video size is greater than or equal to the desired size.
- class data.transforms.video.RandomHorizontalFlip(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements random horizontal flipping.
- class data.transforms.video.CenterCrop(opts, size: Sequence, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements center cropping method.
Note
This class assumes that the input size is greater than or equal to the desired size.
- class data.transforms.video.Resize(opts, *args, **kwargs)[source]
Bases:
BaseTransformation
This class implements the resizing operation.
There are two resizing modes:
1. Resize while maintaining the aspect ratio. To enable this mode, pass an int as the size.
2. Resize to a fixed size. To enable this mode, pass a tuple of height and width as the size.
- class data.transforms.video.CropByBoundingBox(opts: Namespace, image_size: Tuple[int, int] | None = None, *args, **kwargs)[source]
Bases:
BaseTransformation
Crops video frames based on bounding boxes and adjusts the @targets “box_coordinates” annotations. Before cropping, the bounding boxes are expanded with @multiplier, while the “box_coordinates” cover the original areas of the image. Note that the cropped images may be padded with 0 values in the boundaries of the cropped image when the bounding boxes are near the edges.
- __init__(opts: Namespace, image_size: Tuple[int, int] | None = None, *args, **kwargs) None [source]
- expand_boxes(box_coordinates: Tensor) Tuple[Tensor, Tensor] [source]
- Parameters:
box_coordinates – Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1]
- Outputs (tuple items):
- expanded_corners: Tensor of shape […, 4] with (x0, y0, x1, y1), containing the coordinates for cropping. Because of the expansion, coordinates could be negative or >1.
- box_coordinates: Tensor of shape […, 4] with (x0, y0, x1, y1) in [0,1] to be used as bounding boxes after cropping.
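The relationship between the two outputs can be illustrated for a single box. The sketch below assumes the expansion is applied symmetrically about the box center and that the returned box is the original box expressed relative to the expanded crop. Both points are assumptions, as are the helper name `expand_box` and its `multiplier` argument.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def expand_box(box: Box, multiplier: float) -> Tuple[Box, Box]:
    """Return (expanded_corners, box_in_crop) for one (x0, y0, x1, y1) box.

    Assumes a box with non-zero width and height and a positive multiplier.
    """
    x0, y0, x1, y1 = box
    center_x, center_y = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) / 2 * multiplier
    half_h = (y1 - y0) / 2 * multiplier
    # Expanded corners may fall outside [0, 1] near the image edges.
    ex0, ey0 = center_x - half_w, center_y - half_h
    ex1, ey1 = center_x + half_w, center_y + half_h
    crop_w, crop_h = ex1 - ex0, ey1 - ey0
    # Original box expressed in the coordinate frame of the expanded crop.
    box_in_crop = (
        (x0 - ex0) / crop_w,
        (y0 - ey0) / crop_h,
        (x1 - ex0) / crop_w,
        (y1 - ey0) / crop_h,
    )
    return (ex0, ey0, ex1, ey1), box_in_crop


expanded, in_crop = expand_box((0.0, 0.25, 0.5, 0.75), multiplier=2.0)
```

Here the expanded x0 is -0.25, which is the case where a crop would need padding at the image boundary.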
- class data.transforms.video.ShuffleAudios(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs)[source]
Bases:
BaseTransformation
- __init__(opts: Namespace, is_training: bool, is_evaluation: bool, item_index: int, *args, **kwargs) None [source]
Transforms a batch of audio-visual clips. Generates binary labels, useful for self-supervised audio-visual training.
At each invocation, a subset of the clips within a video (batch) have their audios shuffled. The ratio of clips that participate in the shuffling is configurable through argparse options.
When training, the shuffle order is random. When evaluating, the shuffle order is deterministic.
- Parameters:
is_training – When False, decide to shuffle the audios or not deterministically.
is_evaluation – Combined with @is_training, determines which shuffle ratio argument to use (train/val/eval).
item_index – Used for deterministic shuffling based on the item_index.
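The difference between random and deterministic shuffling can be sketched with a seeded generator. Everything specific here is an assumption made for illustration: the helper name `shuffle_audio_indices`, the `shuffle_ratio` argument, seeding from `item_index`, rotating audios among the chosen clips, and the label convention (1 means the clip received another clip's audio).

```python
import random
from typing import List, Tuple


def shuffle_audio_indices(
    num_clips: int, shuffle_ratio: float, is_training: bool, item_index: int
) -> Tuple[List[int], List[int]]:
    """Return (audio_source_per_clip, binary_labels) for one batch of clips."""
    # Training: fresh randomness. Otherwise: seed from item_index so that
    # repeated calls for the same item give the same result.
    rng = random.Random() if is_training else random.Random(item_index)
    num_shuffled = int(round(num_clips * shuffle_ratio))
    chosen = sorted(rng.sample(range(num_clips), num_shuffled))
    order = list(range(num_clips))
    if len(chosen) > 1:
        # Rotate audios among the chosen clips so each gets a different source.
        rotated = chosen[1:] + chosen[:1]
        for destination, source in zip(chosen, rotated):
            order[destination] = source
    labels = [int(order[i] != i) for i in range(num_clips)]
    return order, labels


first = shuffle_audio_indices(8, 0.5, is_training=False, item_index=3)
second = shuffle_audio_indices(8, 0.5, is_training=False, item_index=3)
```

With `is_training=False`, the two calls above return identical results, which is what makes evaluation metrics reproducible across runs.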