The Structure of Parametrized Transforms#
Summary#
This tutorial covers the details of how parameterized transforms are implemented.
We have two different core transform classes: `AtomicTransform` and `ComposingTransform`. These two classes inherit from the `Transform` base class, which provides their common functionality.
The Core Structure of a Parametrized Transform#
The classes described in this tutorial are implemented in core.py.
The `Transform` Base Class#
A Transform is an object that performs some processing on an input image to modify it.
Every transform in this package has a certain number of parameters associated with it.
- For instance, the `Grayscale` transform always converts the input `RGB` image to grayscale and thus has `0` parameters.
- On the other hand, the `RandomGrayscale` transform converts the input `RGB` image to grayscale with a given probability, so it has `1` parameter to specify whether the input image was indeed converted to grayscale or not.
- As another example, the `ColorJitter` transform perturbs the brightness, contrast, saturation, and hue of the input image with randomly sampled strengths and in a random order. Thus, this transform has `8` parameters: four parameters to specify the strengths of the brightness, contrast, saturation, and hue perturbations, and four more parameters to indicate the order in which these perturbations were applied.

All provided transforms store this number of parameters as an attribute `param_count`, which is inferred using the `set_param_count` method. In order to write your own transforms, you MUST define this attribute while initializing your class. You may use the provided `set_param_count` method to do so, but it is NOT mandatory.
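To make the count of 8 concrete, here is a minimal sketch of how a `ColorJitter`-like transform could pack its parameters into one flat tuple. The function name `sample_jitter_params`, the sampling ranges, and the tuple layout (strengths first, order second) are assumptions made for illustration; they are not taken from the package.

```python
import random
from typing import Tuple


def sample_jitter_params(rng: random.Random) -> Tuple[float, ...]:
    """Hypothetical sampler: 4 strengths followed by 4 order indices."""
    # Strengths for brightness, contrast, saturation (around 1.0) and hue (around 0.0).
    strengths = (
        rng.uniform(0.6, 1.4),
        rng.uniform(0.6, 1.4),
        rng.uniform(0.6, 1.4),
        rng.uniform(-0.1, 0.1),
    )
    # A random permutation of the four operations, stored as indices 0..3.
    order = [0, 1, 2, 3]
    rng.shuffle(order)
    return strengths + tuple(order)


params = sample_jitter_params(random.Random(0))
param_count = len(params)  # 4 strengths + 4 order indices
```

Whatever the exact encoding, the point is that `param_count` is fixed per transform, so a flat tuple of parameters can later be split unambiguously.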
Each transform is designed to take as input a tuple of the following two objects: (1) an image, and (2) a tuple of parameters. It also outputs a tuple of the following two objects: (1) the augmented image, and (2) the updated parameters. This is a major change in comparison with `torchvision`-based transforms, which input only an image and output only the processed image. Check the type hints `IMAGE_TYPE`, `PARAM_TYPE`, and `TRANSFORM_RETURN_TYPE`, and the signature of the `__call__` method of the base class, for details.
This package implements all transforms in two different modes– CASCADE and CONSUME.
Each transform MUST have a mode, which is stored in the `tx_mode` attribute. Its values are defined in the `TransformMode` enum.
In CASCADE mode, the transform inputs an image and a tuple of parameters. It then samples local parameters, applies the transform using these parameters, and returns a tuple of the following two objects: (1) the augmented image, and (2) the previously input parameters with the local parameters appended.
In CONSUME mode, the transform inputs an image and a tuple of parameters. It then extracts the required number of local parameters from the input parameters, applies the transform with these parameters, and returns a tuple of the following two objects: (1) the augmented image, and (2) the parameters that remain after the local ones are removed.
Together, these two modes let us extract the parameters of a transform and later reproduce the transform defined by a given (well-defined) parametrization.
The `__call__` method of the base class redirects the inputs to `cascade_transform` or `consume_transform` based on the mode of the transform.
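The dispatch and the two modes can be sketched with a toy, self-contained example. Everything below is a stand-in written for this illustration: `SketchTransform` and `AddOffset` are hypothetical classes, the enum values are assumptions, the "image" is a plain list of integers, and the offset is fixed rather than sampled so that the result is deterministic.

```python
from enum import Enum
from typing import Any, Tuple

# Hypothetical stand-ins for the type hints named above; see core.py for the real ones.
IMAGE_TYPE = Any
PARAM_TYPE = Tuple[Any, ...]
TRANSFORM_RETURN_TYPE = Tuple[IMAGE_TYPE, PARAM_TYPE]


class TransformMode(Enum):
    # Member names follow the text; the values are assumptions.
    CASCADE = "CASCADE"
    CONSUME = "CONSUME"


class SketchTransform:
    """Toy base class that mirrors the mode-based dispatch."""

    def __init__(self, tx_mode: TransformMode = TransformMode.CASCADE) -> None:
        self.tx_mode = tx_mode

    def cascade_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
        raise NotImplementedError

    def consume_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
        raise NotImplementedError

    def __call__(self, img: IMAGE_TYPE, params: PARAM_TYPE = ()) -> TRANSFORM_RETURN_TYPE:
        # Redirect to the implementation that matches the mode.
        if self.tx_mode is TransformMode.CASCADE:
            return self.cascade_transform(img, params)
        return self.consume_transform(img, params)


class AddOffset(SketchTransform):
    """Toy transform with one parameter: an offset added to every pixel."""

    param_count = 1

    def __init__(self, tx_mode: TransformMode = TransformMode.CASCADE, offset: int = 3) -> None:
        super().__init__(tx_mode)
        self.offset = offset  # fixed here; a real transform would sample it

    def cascade_transform(self, img, params):
        # Apply with the local parameter, then append it to the incoming ones.
        aug_img = [pixel + self.offset for pixel in img]
        return aug_img, tuple(params) + (self.offset,)

    def consume_transform(self, img, params):
        # Take the local parameter from the front and pass the rest along.
        local, rest = params[: self.param_count], params[self.param_count:]
        aug_img = [pixel + local[0] for pixel in img]
        return aug_img, tuple(rest)


img = [0, 10, 20]
aug_img, params = AddOffset(TransformMode.CASCADE)(img, ())
replayed_img, remaining = AddOffset(TransformMode.CONSUME)(img, params)
```

In this sketch, the parameters recorded in CASCADE mode are enough to replay the same augmentation in CONSUME mode, and nothing is left over once they are consumed.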
For each transform, we define a method `get_default_params`, which is intended to return parameters that preserve the identity of the input image whenever possible. If this is not possible, these parameters are intended to preserve as much information in the input image as possible.
- For example, the `RandomRotation` transform has `param_count = 1` to capture the angle of rotation of the image. Thus, the default parameters are the singleton tuple `(0, )`, as a `0`-degree rotation preserves the image.
- For the `ToTensor` transform applied to input images of the class `PIL.Image.Image`, we have `param_count = 0` and the default parameters are the empty tuple `()`; these parameters preserve the information in the image and only change the type in which the image is represented.
- However, the `CenterCrop` transform differs from the above two cases. It has `param_count = 0`, and when the desired crop size does NOT match the image size, the resulting augmented image will NEVER be identical to the original image. Here still, the default parameters are the empty tuple `()`; they do NOT preserve the identity of the image but retain as much information as possible.
- Another example is the `Compose` transform, whose default parameters are the concatenation of the default parameters of its components.

Whenever possible, the default parameters are intended to act as "identity parameters"; applying the transform with those parameters retains the image information. In other cases, these are the parameters that retain as much information from the image as possible.
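The concatenation rule for `Compose` can be sketched as follows. The three classes are hypothetical stand-ins, and the zero-argument `get_default_params` signature is a simplification assumed for this example; the package's method may take additional arguments.

```python
from typing import Any, Sequence, Tuple


class RotationSketch:
    """Stand-in for a rotation transform: one parameter, the angle in degrees."""

    param_count = 1

    def get_default_params(self) -> Tuple[Any, ...]:
        return (0,)  # a 0-degree rotation leaves the image unchanged


class ToTensorSketch:
    """Stand-in for a type conversion: no parameters at all."""

    param_count = 0

    def get_default_params(self) -> Tuple[Any, ...]:
        return ()


class ComposeSketch:
    """Stand-in for a sequential composition of core transforms."""

    def __init__(self, transforms: Sequence[Any]) -> None:
        self.transforms = list(transforms)
        # Assumption: the count of a composition is the sum over its components.
        self.param_count = sum(t.param_count for t in self.transforms)

    def get_default_params(self) -> Tuple[Any, ...]:
        # Concatenate the defaults of the components, in order.
        params: Tuple[Any, ...] = ()
        for transform in self.transforms:
            params += tuple(transform.get_default_params())
        return params


pipeline = ComposeSketch([RotationSketch(), ToTensorSketch(), RotationSketch()])
defaults = pipeline.get_default_params()
```

Because each component contributes exactly `param_count` values, the length of the concatenated defaults matches the `param_count` of the composition in this sketch.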
Note that in some cases, more than one choice of default parameters may be possible. For instance, the `ColorJitter` transform can preserve the input image as it is by applying brightness, contrast, and saturation perturbations of `1.0` and a hue perturbation of `0.0`. However, these four operations can be applied in ANY order to get the identical image back from the transform. To cater for such cases, the corresponding transforms have an extra attribute `default_params_mode`.
The values taken by `default_params_mode` are defined in the `DefaultParamsMode` enum. The value `DefaultParamsMode.RANDOMIZED` is used to obtain a randomly sampled default parameter from the set of all possible default parameters whenever the transform is applied. The other value, `DefaultParamsMode.UNIQUE`, is used to obtain a fixed, pre-defined default parameter value whenever the transform is applied.
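A minimal sketch of the two modes for a `ColorJitter`-like transform is shown below. The enum values, the helper `color_jitter_default_params`, and the tuple layout (identity strengths followed by an order permutation) are assumptions for illustration only.

```python
import random
from enum import Enum
from typing import Optional, Tuple


class DefaultParamsMode(Enum):
    # Member names follow the text; the values are assumptions.
    UNIQUE = "UNIQUE"
    RANDOMIZED = "RANDOMIZED"


# Identity strengths: brightness, contrast, saturation of 1.0 and hue of 0.0.
IDENTITY_STRENGTHS = (1.0, 1.0, 1.0, 0.0)


def color_jitter_default_params(
    mode: DefaultParamsMode, rng: Optional[random.Random] = None
) -> Tuple[float, ...]:
    """Hypothetical helper returning identity strengths plus an order."""
    order = [0, 1, 2, 3]  # the fixed order used in UNIQUE mode
    if mode is DefaultParamsMode.RANDOMIZED:
        # Any permutation is an equally valid default, so sample one.
        rng = rng if rng is not None else random.Random()
        rng.shuffle(order)
    return IDENTITY_STRENGTHS + tuple(order)


unique_defaults = color_jitter_default_params(DefaultParamsMode.UNIQUE)
random_defaults = color_jitter_default_params(
    DefaultParamsMode.RANDOMIZED, random.Random(0)
)
```

In both modes the strengths are the identity values; only the recorded order differs, which is why either result preserves the image.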
Thus, to define your own transform, you can do the following:
1. Subclass from `Transform`.
2. Define the attributes `tx_mode` and `param_count`; for the latter attribute, you may choose to do so with your concrete definition of the `set_param_count` method.
3. Define your concrete implementation of the `cascade_transform` method.
4. Define your concrete implementation of the `consume_transform` method.
5. Define your concrete implementation of the `get_default_params` method.
6. (Optional) Define your concrete implementation of the `__str__` method.
We classify all transforms into two major types– Atomic and Composing. As the names suggest, Atomic transforms perform one simple set of actions on a data point. On the other hand, Composing transforms input one or more other transforms (either Atomic or Composing) and combine their functionalities to achieve the desired compositional behavior. We refer to these one or more transforms as the core transforms.
- For example, `Grayscale` would be an Atomic transform that converts given `RGB` images into grayscale.
- However, `Compose` would be a Composing transform that has a list of other core transforms and applies them sequentially on a given input image.
- Similarly, `RandomChoice` would also be a Composing transform that has a list of other core transforms and applies a randomly sampled transform from this list on a given input image.
The `AtomicTransform` Base Class#
The atomic transforms are implemented using the base class `AtomicTransform`, which subclasses the base class `Transform`. `AtomicTransform` provides a partial implementation of `cascade_transform` and `consume_transform` to support the functionality of all atomic transforms, as described below.
Consider an atomic transform that is called with the input image `img` and the input parameters `params`. It should perform the following five steps to operate in CASCADE mode:
1. Generate raw parameters `local_raw_params` for applying the transform.
2. Apply the transform with these raw parameters on `img` to obtain the augmented image `aug_img`.
3. Post-process `local_raw_params` to obtain the processed parameters `local_proc_params`.
4. Append `local_proc_params` to the input `params` to obtain the concatenated processed parameters `concat_proc_params`.
5. Return the tuple of the augmented image `aug_img` and the concatenated parameters `concat_proc_params`.
These steps are captured in the partial implementation of the `cascade_transform` method of `AtomicTransform`:
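def cascade_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
    # Sample the raw local parameters for this call.
    local_raw_params = self.get_raw_params(img=img)
    # Apply the transform using the raw parameters.
    aug_img = self.apply_transform(img=img, params=local_raw_params)
    # Convert the raw parameters to their processed form.
    local_proc_params = self.post_process_params(img=img, params=local_raw_params)
    # Append the processed local parameters to the incoming ones.
    concat_proc_params = Transform.concat_params(params, local_proc_params)
    return aug_img, concat_proc_params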
For the transform to operate in CONSUME mode, it should perform the following steps:
1. From the given parameters, extract the processed parameters `local_proc_params` for the transform and the remaining processed parameters `rem_proc_params` to be passed on.
2. Pre-process `local_proc_params` to obtain the raw local parameters `local_raw_params`.
3. Apply the transform with these raw parameters on `img` to obtain the augmented image `aug_img`.
4. Return the tuple of the augmented image `aug_img` and the remaining processed parameters `rem_proc_params`.
These steps are captured in the partial implementation of the `consume_transform` method of `AtomicTransform`:
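def consume_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
    # Split off this transform's parameters from the rest.
    local_proc_params, rem_proc_params = self.extract_params(params=params)
    # Convert the processed parameters back to their raw form.
    local_raw_params = self.pre_process_params(img=img, params=local_proc_params)
    # Apply the transform using the recovered raw parameters.
    aug_img = self.apply_transform(img=img, params=local_raw_params)
    return aug_img, rem_proc_params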
Thus, to define your own atomic transform, you can do the following:
1. Subclass from `AtomicTransform`.
2. Define the attributes `tx_mode` and `param_count`; for the latter, you may choose to do so with your concrete definition of the `set_param_count` method.
3. Define your concrete implementation of the `get_raw_params` method.
4. Define your concrete implementation of the `apply_transform` method.
5. Define your concrete implementation of the `post_process_params` method.
6. Define your concrete implementation of the `extract_params` method.
7. Define your concrete implementation of the `pre_process_params` method.
8. Define your concrete implementation of the `get_default_params` method.
9. (Optional) Define your concrete implementation of the `__str__` method.
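As a rough, self-contained illustration of how these hooks fit together, the sketch below defines a toy flip transform on a 1-D "image" (a plain list). `AtomicSketch` is a stand-in that mimics the two partial implementations rather than the package's `AtomicTransform`, plain tuple concatenation stands in for `Transform.concat_params`, and `RandomFlipSketch` with its bool-to-int encoding is hypothetical.

```python
import random
from typing import Any, List, Optional, Tuple

PARAM_TYPE = Tuple[Any, ...]


class AtomicSketch:
    """Stand-in base class mimicking the two partial implementations."""

    param_count = 0

    def cascade_transform(self, img, params):
        local_raw_params = self.get_raw_params(img=img)
        aug_img = self.apply_transform(img=img, params=local_raw_params)
        local_proc_params = self.post_process_params(img=img, params=local_raw_params)
        # Plain tuple concatenation stands in for Transform.concat_params.
        return aug_img, tuple(params) + tuple(local_proc_params)

    def consume_transform(self, img, params):
        local_proc_params, rem_proc_params = self.extract_params(params=params)
        local_raw_params = self.pre_process_params(img=img, params=local_proc_params)
        aug_img = self.apply_transform(img=img, params=local_raw_params)
        return aug_img, rem_proc_params


class RandomFlipSketch(AtomicSketch):
    """Reverses the list with probability p; one parameter records the outcome."""

    param_count = 1

    def __init__(self, p: float = 0.5, seed: Optional[int] = None) -> None:
        self.p = p
        self.rng = random.Random(seed)

    def get_raw_params(self, img: List[Any]) -> PARAM_TYPE:
        return (self.rng.random() < self.p,)  # raw form: a bool

    def apply_transform(self, img: List[Any], params: PARAM_TYPE) -> List[Any]:
        (flip,) = params
        return list(reversed(img)) if flip else list(img)

    def post_process_params(self, img: List[Any], params: PARAM_TYPE) -> PARAM_TYPE:
        return (int(params[0]),)  # processed form: 0 or 1

    def extract_params(self, params: PARAM_TYPE) -> Tuple[PARAM_TYPE, PARAM_TYPE]:
        return tuple(params[: self.param_count]), tuple(params[self.param_count:])

    def pre_process_params(self, img: List[Any], params: PARAM_TYPE) -> PARAM_TYPE:
        return (bool(params[0]),)  # back to the raw form

    def get_default_params(self) -> PARAM_TYPE:
        return (0,)  # "no flip" preserves the image


tx = RandomFlipSketch(p=1.0)  # p=1.0 makes the flip certain
img = [1, 2, 3]
aug_img, params = tx.cascade_transform(img, ("earlier",))
replayed_img, remaining = tx.consume_transform(img, params[1:])
identity_img, _ = tx.consume_transform(img, tx.get_default_params())
```

The split between raw and processed parameters is what lets a transform sample in whatever form is convenient while exposing a stable, flat encoding to callers.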
The `ComposingTransform` Base Class#
The composing transforms are implemented using the base class `ComposingTransform`, which subclasses the base class `Transform`. `ComposingTransform` provides a partial implementation of `cascade_transform` and `consume_transform` to support the composing functionality of all composing transforms, as described below.
Consider a composing transform that is called with the input image `img` and the input parameters `params`. The composing transform is designed to perform some composing functionality on top of its core transforms, so it should perform the following five steps to operate in CASCADE mode:
1. Generate raw parameters `local_raw_params` to guide the application of the core transforms.
2. Apply the core transforms guided by `local_raw_params` on `img` to obtain the augmented image `aug_img` along with the parameters `aug_params` generated by the core transforms.
3. Post-process `local_raw_params` and `aug_params` to obtain their processed versions, `local_proc_params` and `aug_proc_params` respectively.
4. Append `local_proc_params` and `aug_proc_params` to the input `params` to obtain the concatenated processed parameters `concat_proc_params`.
5. Return the tuple of the augmented image `aug_img` and the concatenated parameters `concat_proc_params`.
These steps are captured in the partial implementation of the `cascade_transform` method of `ComposingTransform`:
def cascade_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
    # Sample the raw parameters that guide the composition.
    local_raw_params = self.get_raw_params(img=img)
    # Run the core transforms; they report their own parameters.
    aug_img, aug_params = self.apply_cascade_transform(img=img, params=local_raw_params)
    # Convert both sets of parameters to their processed form.
    local_proc_params, aug_proc_params = self.post_process_params(img=img, params=local_raw_params, aug_params=aug_params)
    # Append both sets to the incoming parameters.
    concat_proc_params = Transform.concat_params(params, local_proc_params, aug_proc_params)
    return aug_img, concat_proc_params
For the transform to operate in CONSUME mode, the composing transform should perform the following steps:
1. From the given parameters, extract the processed parameters `concat_local_params` for the transform and the remaining processed parameters `rem_proc_params` to be passed on.
2. Pre-process `concat_local_params` to obtain the raw local parameters `local_raw_params` along with the core augmentation parameters `aug_params`.
3. Apply the core transforms guided by `local_raw_params` and defined using `aug_params` on `img` to obtain the augmented image `aug_img` and the remaining augmentation parameters `rem_aug_params`.
   Note that `rem_aug_params` MUST be empty. Otherwise, there is an error in the implementation.
4. Return the tuple of the augmented image `aug_img` and the remaining processed parameters `rem_proc_params`.
These steps are captured in the partial implementation of the `consume_transform` method of `ComposingTransform`:
def consume_transform(self, img: IMAGE_TYPE, params: PARAM_TYPE) -> TRANSFORM_RETURN_TYPE:
    # Split off this transform's parameters from the rest.
    concat_local_params, rem_proc_params = self.extract_params(params=params)
    # Recover the raw guiding parameters and the core transforms' parameters.
    local_raw_params, aug_params = self.pre_process_params(img=img, params=concat_local_params)
    aug_img, rem_aug_params = self.apply_consume_transform(img=img, params=local_raw_params, aug_params=aug_params)
    # The core transforms must consume all of their parameters.
    assert len(rem_aug_params) == 0
    return aug_img, rem_proc_params
Thus, to define your own composing transform, you can do the following:
1. Subclass from `ComposingTransform`.
2. Define the attributes `tx_mode` and `param_count`; for the latter, you may choose to do so with your concrete definition of the `set_param_count` method.
3. Define the core transforms with the attribute `transforms`.
4. Define your concrete implementation of the `get_raw_params` method.
5. Define your concrete implementation of the `apply_cascade_transform` method.
6. Define your concrete implementation of the `post_process_params` method.
7. Define your concrete implementation of the `extract_params` method.
8. Define your concrete implementation of the `pre_process_params` method.
9. Define your concrete implementation of the `apply_consume_transform` method.
10. Define your concrete implementation of the `get_default_params` method.
11. (Optional) Define your concrete implementation of the `__str__` method.
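The sketch below walks through these hooks for a toy composing transform that applies a single core transform with some probability. All three classes are hypothetical stand-ins written for this example: `ComposingSketch` mimics the partial implementations, tuple concatenation stands in for `Transform.concat_params`, and the parameter layout (one 0/1 flag followed by the core transform's parameters) is an assumption.

```python
import random
from typing import Optional


class ScaleSketch:
    """Toy core transform: multiplies every pixel by a factor (1 parameter)."""

    param_count = 1

    def __init__(self, factor: int) -> None:
        self.factor = factor

    def cascade_transform(self, img, params):
        return [p * self.factor for p in img], tuple(params) + (self.factor,)

    def consume_transform(self, img, params):
        return [p * params[0] for p in img], tuple(params[1:])

    def get_default_params(self):
        return (1,)  # scaling by 1 preserves the image


class ComposingSketch:
    """Stand-in base class mimicking the two partial implementations."""

    def cascade_transform(self, img, params):
        local_raw = self.get_raw_params(img=img)
        aug_img, aug_params = self.apply_cascade_transform(img=img, params=local_raw)
        local_proc, aug_proc = self.post_process_params(
            img=img, params=local_raw, aug_params=aug_params
        )
        return aug_img, tuple(params) + tuple(local_proc) + tuple(aug_proc)

    def consume_transform(self, img, params):
        concat_local, rem_proc = self.extract_params(params=params)
        local_raw, aug_params = self.pre_process_params(img=img, params=concat_local)
        aug_img, rem_aug = self.apply_consume_transform(
            img=img, params=local_raw, aug_params=aug_params
        )
        assert len(rem_aug) == 0  # core parameters must be fully consumed
        return aug_img, rem_proc


class RandomApplySketch(ComposingSketch):
    """Applies one core transform with probability p."""

    def __init__(self, core, p: float = 0.5, seed: Optional[int] = None) -> None:
        self.transforms = [core]
        self.p = p
        self.rng = random.Random(seed)
        self.param_count = 1 + core.param_count  # flag + core parameters

    def get_raw_params(self, img):
        return (self.rng.random() < self.p,)

    def apply_cascade_transform(self, img, params):
        core = self.transforms[0]
        if params[0]:
            return core.cascade_transform(img, ())
        # Skipped: record the core's defaults so the length stays fixed.
        return list(img), tuple(core.get_default_params())

    def post_process_params(self, img, params, aug_params):
        return (int(params[0]),), tuple(aug_params)

    def extract_params(self, params):
        return tuple(params[: self.param_count]), tuple(params[self.param_count:])

    def pre_process_params(self, img, params):
        return (bool(params[0]),), tuple(params[1:])

    def apply_consume_transform(self, img, params, aug_params):
        core = self.transforms[0]
        if params[0]:
            return core.consume_transform(img, aug_params)
        # Skipped: drop the core's slots without applying it.
        return list(img), tuple(aug_params[core.param_count:])

    def get_default_params(self):
        return (0,) + tuple(self.transforms[0].get_default_params())


tx = RandomApplySketch(ScaleSketch(2), p=1.0)  # p=1.0 makes the branch certain
img = [1, 2, 3]
aug_img, params = tx.cascade_transform(img, ())
replayed_img, remaining = tx.consume_transform(img, params + ("next",))
identity_img, _ = tx.consume_transform(img, tx.get_default_params())
```

Recording default parameters for a skipped core transform is one way to keep `param_count` fixed regardless of which branch was taken; whether the package's own composing transforms do the same is not confirmed here.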
About the Next Tutorial#
In the next tutorial 002-How-to-Write-Your-Own-Transforms.md, we will use the structure of parameterized transforms as described in this tutorial to write our own custom transforms!
In particular, we will write one atomic transform named `RandomColorErasing` and one composing transform named `RandomSubsetApply` to better understand the structure of parameterized transforms. We highly recommend spending some time going through this tutorial to understand the details of actually writing parameterized transforms.
However, if you are only interested in using the parameterized transforms provided by this package, you may skip the next tutorial and jump directly to the subsequent tutorial, 003-A-Brief-Introduction-to-the-Transforms-in-This-Package.