How to Write Your Own Transforms#
26 Minute Read
Summary#
In this tutorial, we will write our own custom transforms to illustrate the structure of parameterized transforms as described in the previous tutorial.
We will write one custom atomic transform
RandomColorErasing
and one custom composing transformRandomSubsetApply
.We will then visualize sample augmentations generated by these transforms along with the parameters corresponding to those augmentations.
A Quick Recap of the Atomic Transform Structure#
Recall that to define your own atomic transform, you can do the following–
Subclass from
AtomicTransform
.Define the attributes
tx_mode
andparam_count
; you may choose to do so with your concrete definition of theset_param_count
method for the latter.Define your concrete implementation of
get_raw_params
method.Define your concrete implementation of
apply_transform
method.Define your concrete implementation of
post_process_params
method.Define your concrete implementation of
extract_params
method.Define your concrete implementation of
pre_process_params
method.Define your concrete implementation of
get_default_params
method.(Optional) Define your concrete implementation of
__str__
method.
RandomColorErasing
: A Custom Atomic Transform#
Follow the notebook 002-001-RandomColorErasing.ipynb while going through the details below.
Let us write a transform
RandomColorErasing
that inputsPIL
-images and erases a randomly selected rectangular region of the image by replacing it with a randomly sampled color.Since we want a well-defined and simple functionality, we implement it as an atomic transform. Thus, we subclass from
AtomicTransform
to derive theRandomColorErasing
class.
The Initializer __init__
#
The initializer of the class sets the transform mode
tx_mode
and the default parameters modedefault_params_mode
. The former is a requirement as per the transform structure and the latter is needed since this transform has multiple default parameter possibilities, as explained later on.The initializer sets the number of parameters of the transform
param_count
, which is7
. This is because we need to uniquely describe the rectangle to be erased and the fill-color for this erased region. The rectangle can be uniquely defined with4
parameters; two parameters describing the coordinates of the location of the rectangle, which is defined as its top-left corner, and two more parameters describing its size, which comprises of its height and width. Next, the fill-color can be captured with3
parameters for its(r, g, b)
values. Thus, we need7
parameters in total, which is why we haveparam_count = 7
.
Getting Raw Parameters with get_raw_params
#
We define our implementation of
get_raw_params
as follows.The raw parameters are the
4
parameters for the rectangle to be erasedi, j, h, w
, The tuple(i, j)
represents the top-left corner of the rectangle in height-first format. The tuple(h, w)
represents the tuple of the height and width of this rectangle.We define a random color to be used for filling in the erased image, the
fill_color
, as annp.ndarray
instance of size[3, ]
and of dtypenp.uint8
.Thus, the raw parameters is the tuple
(i, j, h, w, fill_color)
, which is output by this method.
Applying the Transform with apply_transform
#
We define the our implementation of
apply_transform
in a straightfoward manner as follows.We select the rectangle slice of the image starting at pixel
(i, j)
and of shape(h, w)
. This slice is replaced with the colorfill_color
. We then return the augmented image to complete the implementation.
Standard Format Parameters with post_process_params
#
Our implementation requires that final parameters to be output by the transform MUST be a tuple of scalars.
Thus, we define our implementation of
post_process_params
as follows.We take the starting pixel
(i, j)
, rectangle size(h, w)
, and the fill colorfill_color = array([r, g, b])
, and return the7
-tuple of parameters(i, j, h, w, r, g, b)
.
Extracting Required Parameters with extract_params
#
The scenario for this function is that we are given a tuple of scalar parameters and we want to extract the parameters that we need for the current transform. By design, these required parameters are going to be at the beginning of the parameters tuple.
Thus, we define our implementation of
extract_params
as follows.We extract the first
self.param_count
many parameters as the required processed parameters and keep the rest of the parameters to the next transforms.Note that in almost all the cases, this is going to be the implementation of this
post_process_params
.
Recovering Raw Parameters with pre_process_params
#
Note that the extracted parameters are going to be post-processed versions of the corresponding raw parameters. Thus, we need to convert these back to obtain these raw parameters, which can then be used to apply the transform.
Thus, we define our implementation of
pre_process_params
as follows.We keep the
i, j, h, w
parameters as is and convert the last three scalar parametersr, g, b
into anumpy.ndarray
instance of size[3, ]
and dtypenp.uint8
to obtain the fill colorfill_color
.We then return the tuple of the recovered raw parameters
(i, j, h, w, fill_color)
.
Defining Default Parameters with get_default_params
#
This is a tricky method as we want to define the default parameters, which preserve the image as much as possible.
In case of this transform, we can indeed obtain an “identity transform”; we can define parameter settings such that the transformed image will be identical to the input one. In fact, we have more than one parameter settings that can preserve the identity of the image. Thus, we also choose to provide control over these identity parameters via the
DefaultParamsMode
enum and thedefault_params_mode
attribute.With this in consideration, we define our implementation of
get_default_params
as follows.First, note that if
default_params_mode
isDefaultParamsMode.UNIQUE
, we can use the location of the rectangle to be erased as(i=0, j=0)
and keep its height and width as(h=0, w=0)
. Here, the idea is that a rectangle at location(0, 0)
and of shape(0, 0)
is being erased, which means that we are NOT erasing anything. Further, we define thefill_color
to be(r=0, g=0, b=0)
to fill in the erased rectangle. Thus, the processed default parameters fordefault_params_mode = DefaultParamsMode.UNIQUE
are the tuple(0, 0, 0, 0, 0, 0, 0)
.Otherwise, we have
default_params_mode = DefaultParamsMode.RANDOMIZED
and we want to obtain randomized default parameters. We can indeed have randomized default parameters because a(0, 0)
shaped rectangle at ANY location in the image will leave the image unchanged. Further, vacuously, this rectangle can be colored with ANY fill color and we will still get the same image back.Note that we can have some other ways in which we can obtain an identity transform too.
For instance, we can have any location for rectangle to be erased, any color to fill in this region, but exactly one of height and width to be 0.
This parameter setting is indeed possible to achieve an identity transform but for the sake of simplicity in this example, we will ignore these parameter settings!
Thus, in this remaining case, we sample any valid location
(sample_i, sample_j)
in the input image for the rectangular region to be erased. We define the height and width of this rectangle to be(h=0, w=0)
. Finally, we sample any color(sample_r, sample_g, sample_b)
to be used to fill in this rectangular region. And thus, the returned default parameters by this transform will be the sample–(sample_i, sample_j, 0, 0, sample_r, sample_g, sample_b)
.
And that’s it! We have written our own parameterized transform! Now, we will write a custom composing transform to illustrate its structure. As we will see, the structure for composing transforms is almost the same as that for atomic transforms but with minor changes.
A Quick Recap of the Composing Transform Structure#
Recall that to define your own composing transform, you can do the following–
Subclass from
ComposingTransform
.Define the attributes
tx_mode
andparam_count
; you may choose to do so with your concrete definition of theset_param_count
method for the latter.Define the core transform with the attribute
transforms
.Define your concrete implementation of
get_raw_params
method.Define your concrete implementation of
apply_cascade_transform
method.Define your concrete implementation of
post_process_params
method.Define your concrete implementation of
extract_params
method.Define your concrete implementation of
pre_process_params
method.Define your concrete implementation of
apply_consume_transform
method.Define your concrete implementation of
get_default_params
method.(Optional) Define your concrete implementation of
__str__
method.
RandomSubsetApply
: A Custom Composing Transform#
Follow the notebook 002-002-RandomSubsetApply.ipynb while going through the details below.
Let us write a transform
RandomSubsetApply
that wants to perform the following functionality. Given a sequence of transformst1, t2, ..., tN
, it flips a coin for each transform and based on the outcome, decides to either apply that transform or skip applying that transform. As a concrete example, suppose we are given transformsRandomRotation
,ColorJitter
, andRandomAffine
. We flip a coin thrice and observeH
,T
,H
. Then, we applyRandomRotation
to get partially augmented image, skipColorJitter
, and finally applyRandomAffine
on the partially augmented image to get the final augmentation.We name this transform so because effectively, we are taking a subset of all the transforms and applying in order!
Firstly, we observe that this transform inputs a variable number of other transforms and defines a composing behavior on top of their functionality. Thus, we subclass from
ComposingTransform
to define ourRandomSubsetApply
class.
The Initializer __init__
#
Since we need to work with any number of core transforms, we set the attribute
transforms
to hold the list of input core transforms.As seen before in the case of
RandomColorErasing
, we set the transform modetx_mode
attribute.Now, we need to set the
param_count
attribute, which is interesting! For this, we first need to decide how we want to parameterize the augmentation strategy. In this example, we will go about it as follows. When a core transform is indeed chosen to be applied, we will take its augmentation parameters. However, when a transform is skipped, we will use its default parameters, which are intended to act as identity-preserving parameters by design.There can be several ways to parameterized the transform uniquely and we provide one such way here!
Thus, the total parameter count for our composing transform is simply going to be the sum of the parameter counts of all the core transforms. Towards this, we have a special helper function in
parameterized_transforms/utils.py
, namedget_total_params_count
. Thus, we set theparam_count
attribute of the transform using it.In addition, we also set extra attributes
idx_limits
andnum_core_transforms
as we will need them later on. The former attribute holds the indices where parameters of different core transforms begin and end and the latter attribute holds the total number of core transforms. For convenience, let us say that there areN
core transforms.
Getting Raw Parameters with get_raw_params
#
We define our implementation of
get_raw_params
as follows.The raw parameters of our transform are going to be
N
coin flips, where thei-
th coin flip decides whether to apply or skip the transform.We encode this using an
[N, ]
shapednumpy.ndarray
with values0
and1
, representing the outputs of the coin tosses and return it as a tuple.
Applying the CASCADE Mode Transform with apply_cascade_transform
#
Now, having access to the raw parameters, we easily define our implementation of
apply_cascade_transform
as follows.For each transform
tx
in core transforms, we check if we want to use it or skip it based on the corresponding raw parameter. If we want to use the core transform, we ask the core transform to apply a random augmentation and collect the corresponding augmentation parameters. However, if we want to skip the core transform, we ask the core transform to get its default parameters, apply those parameters on the image to get the default-augmented image, and collect the default parameters.Here, the reason to apply the default augmentations is very subtle. It can help in the cases where the augmentation output by one transform might affect the augmentation parameters of the next transform in line. We then concatenate all the collected augmentation parameters together. Finally, we return the tuple of the augmented image and the combined parameters as output.
Here, we can find a subtle difference in the structure of composing transforms versus that of the atomic transforms. The atomic transforms only have one method
apply_transform
, whereas composing transforms have two separate methods–apply_cascade_transform
andapply_consume_transform
. The reason for this will be apparent later on but in brief, this is required due to possibly different functionalities for the core transform based on the mode of the composing transform.Another subtle difference is the signature of these methods;
apply_transform
only returns anIMAGE_TYPE
instance, whereasapply_cascade_transform
andapply_consume_transform
return a tuple ofIMAGE_TYPE
andPARAM_TYPE
. The reason for this is that we want to have access to the augmentation parameters of the core transforms.
Standard Format Parameters with post_process_params
#
Here, we define our implementation of
post_process_params
.As the structure of composing transform shows, there is a subtle difference in the signatures of this method in the atomic and the composing transform templates. In composing transforms, this method inputs the image along with two groups of parameters– 1.
params
, representing the local raw parameters of the composing transforms, and 2.aug_params
, the augmentation parameters collected from the core transforms. The job of this method is to post-process both these groups of parameters and return their processed versions.In our case, by design, the local raw parameters are redundant; the augmentation parameters, along with the
idx_limits
attribute essentially encode all the information that we need to uniquely describe the parametrization.Thus, we simply ignore the
params
and return the tuple(), aug_params
as output!
Extracting Required Parameters with extract_params
#
Again, as seen before, the scenario for this function is that we are given a tuple of scalar parameters and we want to extract the parameters that we need for the current composing transform. These required parameters are going to be at the beginning of the input parameters tuple.
Thus, we define our implementation of
extract_params
exactly as done before.We extract the first
self.param_count
many parameters as the required processed parameters and keep the rest of the parameters to the next transforms.
Recovering Raw Parameters with pre_process_params
#
This is another place where the composing transform structure differs subtly from the atomic transform structure. Obviously, we want to recover raw parameters from the extracted parameters but in this case, we want to recover the local raw parameters that would be needed by the transform to perform the task of applying CONSUME mode transform AS WELL AS the processed augmentation parameters to be supplied to the core transform to make it perform the task of applying its CONSUME mode transform. Thus, instead of returning a single parameter group, here we return a tuple of two parameter groups.
Thus, we define our implementation of
pre_process_params
as follows.Since all the required information would be present in the augmentation params to be fed to the core transforms, we ignore the local raw parameter requirement and return the tuple of empty parameters (for the local raw parameters) and the input
params
as they are (for augmentation parameters).
Applying the CONSUME Mode Transform with apply_consume_transform
#
We define our implementation of the
apply_consume_transform
, which is a separate method to be used to input the local raw parameters as well as augmentation parameters for the core transforms, and return the corresponding augmentation obtained with these parameters.Since local raw parameters are
()
as explained above, we just use the augmentation parameters and chunk them using theidx_limits
attribute to recover the augmentation parameter of each of the core transforms in order.We then just apply these chunked augmentation parameters in order on the core transforms and get the required image.
For the sake of a safety check, we return the augmented image along with the remaining augmentation parameters.
If everything is working correctly, the remaining augmentation parameters MUST be empty, i.e.,
()
!
Defining Default Parameters with get_default_params
#
We define our implementation of the
get_default_params
, which returns the default parameters of this composing transform.Now, note that this transform can preserve identity by just using the default parameters of each of the core transforms; note that which transform were actually used and which were skipped becomes irrelevant in this parametrization scheme!
Thus, we simply obtain the default parameters of each of the core transforms, apply these default parameters on the image to obtain its default-augmented version, and return the combined default augmentation parameters!
As seen before, the application of default parameters is a subtle but important detail; it might be possible that application of a transform might affect the default parameters for some transform next-in-line.
And voilà! We have written our own custom composing transform!
Visualizing Augmentations with RandomColorErasing
and RandomSubsetApply
#
Follow the notebook 002-003-Visualizations.ipynb while going through the details below.
Now, having implemented the transforms
RandomColorErasing
andRandomSubsetApply
, we visualize the augmentations generated by them. This visualization is only for the sake of verification of the efficacy of the transforms and we will explore the power and capabilities of the parameterized transforms in a subsequent tutorial. In the notebook for this section, we provide a function namedshow_image
to quickly visualize different images.We will visualize the augmentations with two beautiful images– Curly, the cat, and Pep, the dog. These images are displayed below–
Visualizing RandomColorErasing
#
We will visualize the augmentations generated by the
RandomColorErasing
with Curly, the cat.Let us create an instace
tx_RCE
of ourRandomColorErasing
transform and print it to see the output of our custom__str__
method.Next, we generate multiple samples from our transform. Observe that the augmentations consist of a rectangular region of the image being erased and replaced by a random color, which is the desired behavior of the transform. Also, observe the parameters of this augmentation and check whether the location and size of the erased rectangle as well as the color used for filling in the erased rectangle are consistent and make sense!
Here are some of the augmentations generated by our transform–
Note that in some cases, the erased rectangle can be difficult to spot due to the size or fill-in color of the erased rectangle.
Next, we also perform two sanity checks–
Let us say that for the original image of Curly, we pass it through the CASCADE mode of
tx_RCE
and obtain an augmentationaug
and corresponding parametersparams
. Now, by definition, if we were to take the original image of Curly along with these parametersparams
, and pass it through the CONSUME mode of thetx_RCE
transform, we ought to get the same augmented imageaug
back. We can observe that this is indeed the case!Note that the default parameters for
RandomColorErasing
transform indeed preserve the image and act as identity transform parameters. Thus, if we were to pass the original image of Curly, along with the default parameters, and pass these through the CONSUME mode of thetx_RCE
transform, we ought to get the original image back. We can also observe that this too is indeed true!
So, we can see that our custom atomic transform
RandomColorErasing
behaves as expected!
Visualizing RandomSubsetApply
#
We will visualize the augmentations generated by the
RandomColorErasing
with Pep, the dog.Note that
RandomSubsetApply
is a composing transform and requires one or more core transform(s) in order to function properly. Given these core transforms, this composing transform will select a subset of the core transforms and apply them IN ORDER (i.e., in the order in which they appear in the core transforms).Thus, we will use the following two transforms as the core transforms for
RandomSubsetApply
– one instance of our ownRandomColorErasing
transform and one instance ofRandomRotation
transform provided by our package.This
RandomRotation
transform inputs an image and rotates it by a random angle sampled from the range provided to it.
So, let us create an instace
tx_RSA
of ourRandomSubsetApply
transform as described above containing the following core transforms in the given order–RandomColorErasing(param_count=7, tx_mode=TransformMode.CASCADE, default_params_mode=DefaultParamsMode.RANDOMIZED)
, which erases a randomly chosen rectangle from the image and replaces it with a randomly chosen fill-in color, andRandomRotation(degrees=[-30.0, 30.0], interpolation=InterpolationMode.NEAREST, expand=False, center=None, fill=0, param_count=1, tx_mode=TransformMode.CASCADE)
, which rotates the input image by a randomly sampled angle (in degrees) from the range[-30.0, 30.0]
. Here, we see the utility of defining the custom implementation of__str__
; it allows for a clean, readable description of the transform, and may also help in debugging issues with complex transforms.
Since our composing transform flips a (fair) coin per core transform and choses to apply it or skip, it should result in the following behavior–
With probability
1/4
, it skips both the transforms and keeps the image as it is.With probability
1/4
, it ONLY applies theRandomColorErasing
transform, thereby erasing a randomly chosen rectangle from the image and filling it in with a randomly chosen color, and skips theRandomRotation
transform.With probability
1/4
, it skips theRandomColorErasing
transform and applies ONLY theRandomRotation
transform, which rotates the original image by a randomly chosen angle (in degrees) from the range[-30.0, 30.0]
.With probability
1/4
, it first applies theRandomColorErasing
to erase a randomly chosen rectangular region from the image and to fill it in with a randomly chosen color. It then applies theRandomRotation
transform to rotate this partially augmented image by a randomly chosen angle (in degrees) from the range[-30.0, 30.0]
.
Thus, extremely importantly, we will NEVER observe images that are rotated but have a rectangular region colored with a random color that is axis-aligned with the original image, which would be the result of first applying the
RandomRotation
core transform followed by theRandomColorErasing
core transform.Further, by definition of composing transforms, the output parameters will be a tuple of
8
scalars, the first7
scalars representing the parameters for theRandomColorErasing
transform and the last parameter will be the angle of rotation of the image in degrees, which is the single parameter generated by theRandomRotation
core transform.Here are some of the augmentations generated by our transform–
Note that in some cases, the erased rectangle can be difficult to spot due to the size or fill-in color of the erased rectangle.
Ensure that the parameters for the augmentations make sense and that the fraction of different types of behaviors
of this composing transform is consistent with the description above.
Further, ensure that you do NOT see any augmentations where the image is rotated by a non-zero angle but it has a non-rotated rectangular patch erased and colored with a random color. This is because as explained above, that behavior would be the result of first applying
RandomRotation
and thenRandomColorErasing
. In particular, you should NOT observe any image similar to the ones shown below–
Next, similar to the case of
RandomColorErasing
, we also perform two sanity checks for theRandomSubsetApply
transform–Let us say that for the original image of Pep, we pass it through the CASCADE mode of
tx_RSA
and obtain an augmentationaug
and corresponding parametersparams
. Now, by definition, if we were to take the original image of Pep along with these parametersparams
, and pass it through the CONSUME mode of thetx_RSA
transform, we ought to get the same augmented imageaug
back. We can observe that this is indeed the case!Note that the default parameters for
RandomSubsetApply
transform indeed preserve the image and act as identity transform parameters. This is because by definition, the identity parameters ofRandomSubsetApply
are a concatenation of the identity parameters ofRandomColorErasing
and the identity parameters ofRandomRotation
, both of which preserve the image identity separately. Thus, if we were to pass the original image of Pep the cat, along with the default parameters, through the CONSUME mode of thetx_RSA
transform, we ought to get the original image of Pep back. We can observe that this too is indeed true!
So, we can see that our custom composing transform
RandomSubsetApply
behaves as expected!
About the Next Tutorial#
Now that we have written our own parameterized transforms and verified experimentally that they work as expected, we will see the transforms that are already provided in the package in our next tutorial– 003-A-Brief-Introduction-to-the-Transforms-in-This-Package.