cvnets.layers package

Subpackages

Submodules

cvnets.layers.adaptive_pool module

class cvnets.layers.adaptive_pool.AdaptiveAvgPool2d(output_size: int | Tuple[int, int] = 1, *args, **kwargs)[source]

Bases: AdaptiveAvgPool2d

Applies a 2D adaptive average pooling over an input tensor.

Parameters:: output_size (Optional, int or Tuple[int, int]) – The target output size. If a single int $h$ is passed, then a square output of size $h \times h$ is produced. If a tuple of size $h \times w$ is passed, then an output of size $h \times w$ is produced. Default is 1.

Shape:

Input: $(N, C, H, W)$ where $N$ is the batch size, $C$ is the number of input channels, $H$ is the input height, and $W$ is the input width
Output: $(N, C, h, h)$ or $(N, C, h, w)$

__init__(output_size: int | Tuple[int, int] = 1, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
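The relationship between `output_size` and the pooled windows can be sketched in pure Python. This is a minimal illustration over a single $H \times W$ plane, not the library's implementation. The function name `adaptive_avg_pool2d` and the nested-list input are assumptions made for this example, and the window boundaries follow the floor/ceil rule used by PyTorch's adaptive pooling.

```python
import math
from typing import List, Tuple, Union


def adaptive_avg_pool2d(
    x: List[List[float]], output_size: Union[int, Tuple[int, int]] = 1
) -> List[List[float]]:
    """Adaptive average pooling over one H x W plane (hypothetical helper)."""
    # A single int h means a square h x h output.
    out_h, out_w = (
        (output_size, output_size) if isinstance(output_size, int) else output_size
    )
    in_h, in_w = len(x), len(x[0])
    out = []
    for i in range(out_h):
        # Rows covered by output row i: [floor(i*H/h), ceil((i+1)*H/h))
        r0, r1 = (i * in_h) // out_h, math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, math.ceil((j + 1) * in_w / out_w)
            window = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


plane = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pooled_1 = adaptive_avg_pool2d(plane)          # default output_size=1
pooled_2 = adaptive_avg_pool2d(plane, (2, 2))  # explicit (h, w)
```

With the default `output_size=1` the whole plane collapses to its mean, which is why this layer is often used as a global pooling step before a classifier.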

cvnets.layers.base_layer module

class cvnets.layers.base_layer.BaseLayer(*args, **kwargs)[source]

Bases: Module

Base class for neural network layers. Subclass must implement forward function.

__init__(*args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]: Add layer specific arguments

get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs) → Tuple[List[Dict], List[float]][source]

Get parameters for training along with the learning rate.

Parameters:

weight_decay – weight decay
no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.

Returns:

A tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter.

Note

Learning rate multiplier is set to 1.0 here as it is handled inside the Central Model.

forward(*args, **kwargs) → Any[source]: Forward function.

cvnets.layers.conv_layer module

Bases: Conv2d

Applies a 2D convolution over an input.

Parameters:

in_channels – $C_{in}$ from an expected input of size $(N, C_{in}, H_{in}, W_{in})$.
out_channels – $C_{out}$ from an expected output of size $(N, C_{out}, H_{out}, W_{out})$.
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
padding – Padding for convolution. Default: 0.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode (‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’). Default: zeros.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
act_name – Use specific activation function. Overrides the one specified in command line args.

Shape:

Input: $(N, C_{in}, H_{in}, W_{in})$.
Output: $(N, C_{out}, H_{out}, W_{out})$.

__init__(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] | None = 1, padding: int | Tuple[int, int] | None = 0, dilation: int | Tuple[int, int] | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
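How `kernel_size`, `stride`, `padding`, and `dilation` determine $H_{out}$ and $W_{out}$ can be checked with a small helper. This sketch uses the output-size formula documented for `torch.nn.Conv2d`, which this class lists as its base. The name `conv_output_size` is hypothetical and applies to one spatial dimension at a time.

```python
def conv_output_size(
    size_in: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
    dilation: int = 1,
) -> int:
    """Output length along one spatial dimension of a convolution.

    floor((size_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    """
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (size_in + 2 * padding - effective_kernel) // stride + 1


# 3x3 kernel, stride 2, padding 1 halves a 224-pixel side.
h_out_strided = conv_output_size(224, kernel_size=3, stride=2, padding=1)
# Defaults (stride 1, padding 0) shrink the side by kernel_size - 1.
h_out_default = conv_output_size(32, kernel_size=3)
# Dilation 2 with padding 2 keeps the size unchanged for a 3x3 kernel.
h_out_dilated = conv_output_size(32, kernel_size=3, padding=2, dilation=2)
```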

class cvnets.layers.conv_layer.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 1

module_cls: alias of Conv1d

class cvnets.layers.conv_layer.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 2

module_cls: alias of Conv2d

class cvnets.layers.conv_layer.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 3

module_cls: alias of Conv3d

Bases: BaseLayer

Applies a 2D transposed convolution (also known as deconvolution) over an input.

Parameters:

opts – Command line arguments.
in_channels – $C_{in}$ from an expected input of size $(N, C_{in}, H_{in}, W_{in})$.
out_channels – $C_{out}$ from an expected output of size $(N, C_{out}, H_{out}, W_{out})$.
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode. Default: zeros.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
padding – Padding will be done on both sides of each dimension in the input.
output_padding – Additional padding on the output tensor.
auto_padding – Compute padding automatically. Default: True.

Shape:

Input: $(N, C_{in}, H_{in}, W_{in})$.
Output: $(N, C_{out}, H_{out}, W_{out})$.

__init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.conv_layer.NormActLayer(opts, num_features, *args, **kwargs)[source]

Bases: BaseLayer

Applies a normalization layer followed by an activation layer.

Parameters:

opts – Command-line arguments.
num_features – $C$ from an expected input of size $(N, C, H, W)$ .

Shape:

Input: $(N, C, H, W)$ .
Output: $(N, C, H, W)$ .

__init__(opts, num_features, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.conv_layer.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer1d

class cvnets.layers.conv_layer.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer2d

class cvnets.layers.conv_layer.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer3d

cvnets.layers.dropout module

class cvnets.layers.dropout.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)[source]

Bases: Dropout

This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

Parameters:

p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False

Shape:

Input: $(N, *)$ where $N$ is the batch size
Output: same as the input

__init__(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
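The training-time behavior described above can be sketched without any tensor library. This is an illustrative version only: the function name `dropout`, the flat-list input, and the `seed` argument are assumptions for the example. The rescaling of surviving elements by $1/(1-p)$ follows the documented behavior of `torch.nn.Dropout`, the base class listed here.

```python
import random
from typing import List


def dropout(
    x: List[float], p: float = 0.5, training: bool = True, seed: int = 0
) -> List[float]:
    """Zero each element with probability p and rescale the survivors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not training or p == 0.0:
        return list(x)  # identity at evaluation time
    if p == 1.0:
        return [0.0] * len(x)
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    scale = 1.0 / (1.0 - p)    # keeps the expected value of each element unchanged
    return [0.0 if rng.random() < p else v * scale for v in x]


values = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(values, p=0.5, training=True)
eval_out = dropout(values, p=0.5, training=False)
```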

class cvnets.layers.dropout.Dropout2d(p: float = 0.5, inplace: bool = False)[source]

Bases: Dropout2d

This layer, during training, randomly zeroes some of the elements of the 4D input tensor with probability p using samples from a Bernoulli distribution.

Parameters:

p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False

Shape:

Input: $(N, C, H, W)$ where $N$ is the batch size, $C$ is the input channels,
$H$ is the input tensor height, and $W$ is the input tensor width
Output: same as the input

__init__(p: float = 0.5, inplace: bool = False)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.embedding module

class cvnets.layers.embedding.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Bases: Embedding

A lookup table that stores embeddings of a fixed dictionary and size.

Parameters:

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.

Shape:

Input: $(*)$ , IntTensor or LongTensor of arbitrary shape containing the indices to extract
Output: $(*, H)$ , where * is the input shape and $H = embedding\_dim$

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() → None[source]

cvnets.layers.flatten module

class cvnets.layers.flatten.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Bases: Flatten

This layer flattens a contiguous range of dimensions into a tensor.

Parameters:

start_dim (Optional[int]) – first dim to flatten. Default: 1
end_dim (Optional[int]) – last dim to flatten. Default: -1

Shape:

Input: $(*, S_{start}, \dots, S_{i}, \dots, S_{end}, *)$, where $S_{i}$ is the size at dimension $i$ and $*$ means any number of dimensions including none.
Output: $(*, \prod_{i = start}^{end} S_{i}, *)$.

__init__(start_dim: int | None = 1, end_dim: int | None = -1)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
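The shape rule above reduces to a product over the flattened range. The helper below is a small sketch of that arithmetic; `flatten_shape` is a hypothetical name and it operates on shape tuples rather than tensors.

```python
import math
from typing import Tuple


def flatten_shape(
    shape: Tuple[int, ...], start_dim: int = 1, end_dim: int = -1
) -> Tuple[int, ...]:
    """Shape obtained by merging dimensions start_dim..end_dim (inclusive)."""
    ndim = len(shape)
    start, end = start_dim % ndim, end_dim % ndim  # resolve negative indices
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = math.prod(shape[start : end + 1])
    return shape[:start] + (merged,) + shape[end + 1 :]


default_flat = flatten_shape((2, 3, 4, 5))        # keeps the batch dimension
leading_flat = flatten_shape((2, 3, 4, 5), 0, 1)  # merges the first two dims
```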

cvnets.layers.global_pool module

class cvnets.layers.global_pool.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies global pooling over a 4D or 5D input tensor.

Parameters:

pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean
keep_dim (Optional[bool]) – Do not squeeze the dimensions of a tensor. Default: False

Shape:

Input: $(N, C, H, W)$ or $(N, C, D, H, W)$
Output: $(N, C, 1, 1)$ or $(N, C, 1, 1, 1)$ if keep_dim else $(N, C)$

pool_types = ['mean', 'rms', 'abs']

__init__(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

forward(x: Tensor) → Tensor[source]: Forward function.
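The three pooling types and the effect of `keep_dim` can be illustrated on nested lists. This sketch assumes that `rms` means the root of the mean of squares and that `abs` means the mean of absolute values; the library's exact computation may differ. The function name `global_pool` and the 4D nested-list input are assumptions for the example.

```python
import math
from typing import List

POOL_TYPES = ("mean", "rms", "abs")


def global_pool(x: List, pool_type: str = "mean", keep_dim: bool = False) -> List:
    """Reduce an N x C x H x W nested list over its spatial dimensions."""
    if pool_type not in POOL_TYPES:
        raise ValueError(f"pool_type must be one of {POOL_TYPES}")
    out = []
    for sample in x:
        pooled = []
        for channel in sample:
            values = [v for row in channel for v in row]
            if pool_type == "mean":
                stat = sum(values) / len(values)
            elif pool_type == "rms":  # assumed: sqrt(mean(x^2))
                stat = math.sqrt(sum(v * v for v in values) / len(values))
            else:  # assumed: mean(|x|)
                stat = sum(abs(v) for v in values) / len(values)
            # keep_dim retains singleton H and W axes: (N, C, 1, 1)
            pooled.append([[stat]] if keep_dim else stat)
        out.append(pooled)
    return out


feature_map = [[[[3.0, -4.0], [-3.0, 4.0]]]]  # N=1, C=1, H=2, W=2
mean_out = global_pool(feature_map, "mean")
rms_out = global_pool(feature_map, "rms")
abs_out = global_pool(feature_map, "abs")
kept = global_pool(feature_map, "mean", keep_dim=True)
```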

cvnets.layers.identity module

class cvnets.layers.identity.Identity[source]

Bases: BaseLayer

This is a place-holder and returns the same tensor.

__init__()[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

cvnets.layers.linear_attention module

class cvnets.layers.linear_attention.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies a self-attention with linear complexity, as described in MobileViTv2 paper. This layer can be used for self- as well as cross-attention.

Parameters:

opts – command line arguments
embed_dim (int) – $C$ from an expected input of size $(N, C, H, W)$
attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0
bias (Optional[bool]) – Use bias in learnable layers. Default: True

Shape:

Input: $(B, C, P, N)$ where $B$ is the batch size, $C$ is the input channels, $P$ is the number of pixels in the patch, and $N$ is the number of patches
Output: same as the input

Note

For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.

__init__(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

static visualize_context_scores(context_scores)[source]

forward(x: Tensor, x_prev: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.

cvnets.layers.linear_layer module

class cvnets.layers.linear_layer.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a linear transformation to the input data

Parameters:

in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
bias (Optional[bool]) – use bias or not
channel_first (Optional[bool]) – Channels are first or last dimension. If first, then use Conv2d

Shape:

Input: $(N, *, C_{in})$ if not channel_first else $(N, C_{in}, *)$ where $*$ means any number of dimensions.
Output: $(N, *, C_{out})$ if not channel_first else $(N, C_{out}, *)$

__init__(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

reset_params()[source]

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.linear_layer.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a GroupLinear transformation layer, as defined here, here and here

Parameters:

in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
n_groups (int) – number of groups
bias (Optional[bool]) – use bias or not
feature_shuffle (Optional[bool]) – Shuffle features between groups

Shape:

Input: $(N, *, C_{in})$
Output: $(N, *, C_{out})$

__init__(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

reset_params()[source]

forward(x: Tensor) → Tensor[source]: Forward function.

cvnets.layers.multi_head_attention module

class cvnets.layers.multi_head_attention.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies multi-head self- or cross-attention, as described in the "Attention Is All You Need" paper.

Parameters:

embed_dim (int) – $C_{in}$ from an expected input of size $(N, S, C_{in})$
num_heads (int) – Number of heads in multi-head attention
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True

Shape:

Input:

- Query tensor (x_q) $(N, S, C_{in})$ where $N$ is batch size, $S$ is number of source tokens, and $C_{in}$ is input embedding dim
- Optional Key-Value tensor (x_kv) $(N, T, C_{in})$ where $T$ is number of target tokens

Output: same shape as the input

__init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward_pytorch(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.

cvnets.layers.normalization_layers module

class cvnets.layers.normalization_layers.AdjustBatchNormMomentum(opts, *args, **kwargs)[source]

Bases: object

This class enables adjusting the momentum in batch normalization layer.

Note

It’s an experimental feature and should be used with caution.

round_places = 6

__init__(opts, *args, **kwargs)[source]

adjust_momentum(model: Module, iteration: int, epoch: int) → None[source]

cvnets.layers.pixel_shuffle module

class cvnets.layers.pixel_shuffle.PixelShuffle(upscale_factor: int, *args, **kwargs)[source]

Bases: PixelShuffle

Rearranges elements in a tensor of shape $(*, C \times r^{2}, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where $r$ is an upscale factor.

Parameters:: upscale_factor (int) – factor to increase spatial resolution by

Shape:

Input: $(*, C \times r^{2}, H, W)$, where * is zero or more dimensions
Output: $(*, C, H \times r, W \times r)$

__init__(upscale_factor: int, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
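The rearrangement from channels to spatial positions can be written out as an explicit index mapping. The sketch below follows the element ordering of `torch.nn.PixelShuffle` (the base class listed here) for a single sample. The function name `pixel_shuffle` and the nested-list layout are assumptions for the example.

```python
from typing import List


def pixel_shuffle(x: List[List[List[float]]], r: int) -> List[List[List[float]]]:
    """Rearrange a (C*r^2) x H x W nested list into C x (H*r) x (W*r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r^2")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # vertical offset inside each r x r block
            for j in range(r):      # horizontal offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
channels = [[[0.0]], [[1.0]], [[2.0]], [[3.0]]]
shuffled = pixel_shuffle(channels, r=2)
```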

cvnets.layers.pooling module

class cvnets.layers.pooling.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)[source]

Bases: MaxPool2d

Applies a 2D max pooling over a 4D input tensor.

Parameters:

kernel_size (Optional[int]) – the size of the window to take a max over
stride (Optional[int]) – The stride of the window. Default: 2
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1

Shape:

Input: $(N, C, H_{in}, W_{in})$ where $N$ is the batch size, $C$ is the input channels, $H_{in}$ is the input height, and $W_{in}$ is the input width
Output: $(N, C, H_{out}, W_{out})$ where $H_{out}$ is the output height, and $W_{out}$ is the output width

__init__(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
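With the defaults shown in the signature (`kernel_size=3`, `stride=2`, `padding=1`), each spatial side is roughly halved. The pure-Python sketch below illustrates this on one plane; `max_pool2d` is a hypothetical helper, and it treats padded positions as negative infinity, as `torch.nn.MaxPool2d` does.

```python
from typing import List


def max_pool2d(
    x: List[List[float]], kernel_size: int = 3, stride: int = 2, padding: int = 1
) -> List[List[float]]:
    """Max pooling over one H x W plane with implicit -inf padding."""
    h, w = len(x), len(x[0])
    out_h = (h + 2 * padding - kernel_size) // stride + 1
    out_w = (w + 2 * padding - kernel_size) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            best = float("-inf")
            for di in range(kernel_size):
                for dj in range(kernel_size):
                    r = i * stride - padding + di
                    c = j * stride - padding + dj
                    if 0 <= r < h and 0 <= c < w:  # skip padded positions
                        best = max(best, x[r][c])
            row.append(best)
        out.append(row)
    return out


grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pooled = max_pool2d(grid)  # 4x4 input, defaults give a 2x2 output
```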

Bases: AvgPool2d

Applies a 2D average pooling over a 4D input tensor.

Parameters:

kernel_size (Optional[int]) – the size of the window to take an average over
stride (Optional[int]) – The stride of the window. Default: None
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: (0, 0)
ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None

Shape:

Input: $(N, C, H_{in}, W_{in})$ where $N$ is the batch size, $C$ is the input channels, $H_{in}$ is the input height, and $W_{in}$ is the input width
Output: $(N, C, H_{out}, W_{out})$ where $H_{out}$ is the output height, and $W_{out}$ is the output width

__init__(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.positional_embedding module

class cvnets.layers.positional_embedding.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: BaseLayer

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(seq_len: int, *args, **kwargs) → Tensor[source]: Forward function.

class cvnets.layers.positional_embedding.LearnablePositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: Module

Learnable Positional embedding

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() → None[source]

forward(seq_len: int, *args, **kwargs) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class cvnets.layers.positional_embedding.SinusoidalPositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: Module

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

get_weights() → Tensor[source]: Build sinusoidal embeddings. Adapted from Fairseq.

forward(seq_len: int, *args, **kwargs) → Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

cvnets.layers.positional_encoding module

class cvnets.layers.positional_encoding.SinusoidalPositionalEncoding(d_model: int, dropout: float | None = 0.0, max_len: int | None = 5000, channels_last: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer adds sinusoidal positional embeddings to a 3D input tensor. The code has been adapted from the PyTorch tutorial.

Parameters:

d_model (int) – dimension of the input tensor
dropout (Optional[float]) – Dropout rate. Default: 0.0
max_len (Optional[int]) – Max. number of patches (or seq. length). Default: 5000
channels_last (Optional[bool]) – Channels dimension is the last in the input tensor

Shape:

Input: $(N, C, P)$ or $(N, P, C)$ where $N$ is the batch size, $C$ is the embedding dimension,
$P$ is the number of patches
Output: same shape as the input

__init__(d_model: int, dropout: float | None = 0.0, max_len: int | None = 5000, channels_last: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_patch_last(x, indices: Tensor | None = None, *args, **kwargs) → Tensor[source]

forward_others(x, indices: Tensor | None = None, *args, **kwargs) → Tensor[source]

forward(x, indices: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.
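The sinusoidal table that gets added to the input can be sketched in pure Python. This follows the interleaved sine/cosine formulation from the PyTorch tutorial that the description refers to. The function names, the list-based layout, and the omission of dropout are assumptions for this example; the library's buffer layout may differ.

```python
import math
from typing import List


def sinusoidal_table(max_len: int, d_model: int) -> List[List[float]]:
    """Build a max_len x d_model table: sin on even channels, cos on odd ones."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Frequency 1 / 10000^(i / d_model), computed in log space.
            angle = pos * math.exp(-math.log(10000.0) * i / d_model)
            table[pos][i] = math.sin(angle)
            table[pos][i + 1] = math.cos(angle)
    return table


def add_positions(x: List[List[float]], table: List[List[float]]) -> List[List[float]]:
    """Add the encoding to one P x C sequence (channels-last layout)."""
    return [[v + e for v, e in zip(tok, table[p])] for p, tok in enumerate(x)]


pe = sinusoidal_table(max_len=4, d_model=4)
tokens = [[0.0] * 4 for _ in range(3)]  # P=3 patches, C=4 channels
encoded = add_positions(tokens, pe)
```

Because the table depends only on position and channel index, it can be built once up to `max_len` and sliced for shorter sequences.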

class cvnets.layers.positional_encoding.LearnablePositionEncoding(embed_dim: int, num_embeddings: int, dropout: float | None = 0.0, channels_last: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer adds learnable positional embeddings to a 3D input tensor.

Parameters:

embed_dim (int) – dimension of the input tensor
num_embeddings (int) – number of input embeddings. This is similar to vocab size in NLP.
dropout (Optional[float]) – Dropout rate. Default: 0.0
channels_last (Optional[bool]) – Channels dimension is the last in the input tensor

Shape:

Input: $(N, *, C, P)$ or $(N, *, P, C)$ where $N$ is the batch size, $C$ is the embedding dimension,
$P$ is the number of patches
Output: same shape as the input

__init__(embed_dim: int, num_embeddings: int, dropout: float | None = 0.0, channels_last: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, *args, **kwargs) → Tensor[source]: Forward function.

cvnets.layers.random_layers module

class cvnets.layers.random_layers.RandomApply(module_list: List, keep_p: float | None = 0.8, *args, **kwargs)[source]

Bases: BaseLayer

This layer randomly applies a list of modules during training.

Parameters:

module_list (List) – List of modules
keep_p (Optional[float]) – Keep P modules from the list during training. Default: 0.8 (or 80%)

__init__(module_list: List, keep_p: float | None = 0.8, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

cvnets.layers.single_head_attention module

class cvnets.layers.single_head_attention.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies a single-head attention as described in DeLighT paper

Parameters:

embed_dim (int) – $C_{in}$ from an expected input of size $(N, P, C_{in})$
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True

Shape:

Input: $(N, P, C_{in})$ where $N$ is batch size, $P$ is number of patches, and $C_{in}$ is input embedding dim
Output: same shape as the input

__init__(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.
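The core of single-head attention is a scaled dot product followed by a softmax-weighted sum. The sketch below shows only that core for one sample. It assumes identity query/key/value/output projections and omits the masks and dropout, so it is not the layer's actual computation; all names are hypothetical.

```python
import math
from typing import List, Optional


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    x_q: List[List[float]], x_kv: Optional[List[List[float]]] = None
) -> List[List[float]]:
    """x_q is P x C; x_kv is T x C (defaults to x_q, i.e. self-attention)."""
    x_kv = x_q if x_kv is None else x_kv
    scale = 1.0 / math.sqrt(len(x_q[0]))
    out = []
    for q in x_q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x_kv]
        weights = softmax(scores)  # one weight per key/value token
        out.append(
            [sum(wt * v[c] for wt, v in zip(weights, x_kv)) for c in range(len(q))]
        )
    return out


queries = [[1.0, 0.0], [0.0, 1.0]]
self_out = scaled_dot_product_attention(queries)
cross_out = scaled_dot_product_attention(queries, [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]])
```

The output has one row per query token, so its shape matches `x_q` regardless of the number of key/value tokens.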

cvnets.layers.softmax module

class cvnets.layers.softmax.Softmax(dim: int | None = -1, *args, **kwargs)[source]

Bases: Softmax

Applies the Softmax function to an input tensor along the specified dimension

Parameters:: dim (int) – Dimension along which softmax to be applied. Default: -1

Shape:

Input: $(*)$ where $*$ is one or more dimensions
Output: same shape as the input

__init__(dim: int | None = -1, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.stochastic_depth module

class cvnets.layers.stochastic_depth.StochasticDepth(p: float, mode: str)[source]

Bases: StochasticDepth

Implements Stochastic Depth from “Deep Networks with Stochastic Depth”, used for randomly dropping residual branches of residual architectures.

__init__(p: float, mode: str) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
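The two arguments can be illustrated with a small sketch. It assumes the semantics of torchvision's `StochasticDepth` (the base class listed here): `mode="row"` drops the residual branch independently per sample, `mode="batch"` makes one decision for the whole batch, and survivors are rescaled by $1/(1-p)$. The function name, list layout, and `seed` argument are assumptions for the example.

```python
import random
from typing import List


def stochastic_depth(
    branch: List[List[float]], p: float, mode: str,
    training: bool = True, seed: int = 0,
) -> List[List[float]]:
    """Randomly zero residual-branch outputs (one inner list per sample)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if mode not in ("batch", "row"):
        raise ValueError("mode must be 'batch' or 'row'")
    if not training or p == 0.0:
        return [list(sample) for sample in branch]
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    survival = 1.0 - p
    if mode == "row":
        keep = [rng.random() < survival for _ in branch]  # per-sample decision
    else:
        keep = [rng.random() < survival] * len(branch)    # one shared decision
    scale = 1.0 / survival if survival > 0.0 else 0.0
    return [
        [v * scale if kept else 0.0 for v in sample]
        for sample, kept in zip(branch, keep)
    ]


residual = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
row_out = stochastic_depth(residual, p=0.5, mode="row")
batch_out = stochastic_depth(residual, p=0.5, mode="batch")
eval_out = stochastic_depth(residual, p=0.5, mode="row", training=False)
```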

cvnets.layers.token_merging module

class cvnets.layers.token_merging.TokenMerging(dim: int, window: int = 2)[source]

Bases: Module

Merge tokens from a [batch_size, sequence_length, num_channels] tensor using a linear projection.

This function also updates masks and adds padding as needed to make the sequence length divisible by the window size before merging tokens.

Parameters:

dim – Number of input channels.
window – The size of the window to merge into a single token.

__init__(dim: int, window: int = 2) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, key_padding_mask: Tensor) → Tuple[Tensor, Tensor][source]

Perform token merging.

Parameters:

x – A tensor of shape [batch_size, sequence_length, num_channels].
key_padding_mask – A tensor of shape [batch_size, sequence_length] with “-inf” values at mask tokens, and “0” values at unmasked tokens.

Returns:

A tensor of shape [batch_size, math.ceil(sequence_length / self.window), num_channels], where @self.window is the window size.

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

cvnets.layers.token_merging.pad_x_and_mask(x: Tensor, key_padding_mask: Tensor, window_size: int) → Tuple[Tensor, Tensor][source]

Apply padding to @x and @key_padding_mask to make their lengths divisible by @window_size.

Parameters:

x – The input tensor of shape [B, N, C].
key_padding_mask – The mask of shape [B, N].
window_size – the N dimension of @x and @key_padding_mask will be padded to make them divisible by this number.

Returns:

A tuple containing @x and @key_padding_mask, with padding applied.
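The padding arithmetic can be sketched on nested lists. This illustration assumes that new token positions are filled with zeros in `x` and marked as masked (`-inf`) in `key_padding_mask`, consistent with the mask convention described for `TokenMerging.forward`; the fill values used by the library are not stated here. The helper name `pad_to_window` is hypothetical.

```python
import math
from typing import List, Tuple


def pad_to_window(
    x: List[List[List[float]]],
    key_padding_mask: List[List[float]],
    window_size: int,
) -> Tuple[List[List[List[float]]], List[List[float]]]:
    """Pad the N dimension of x [B, N, C] and mask [B, N] to a multiple of window_size."""
    n, c = len(x[0]), len(x[0][0])
    pad = (-n) % window_size  # number of tokens to append (0 if already divisible)
    padded_x = [sample + [[0.0] * c for _ in range(pad)] for sample in x]
    # Assumed: appended tokens are marked as masked with -inf.
    padded_mask = [row + [float("-inf")] * pad for row in key_padding_mask]
    return padded_x, padded_mask


x = [[[1.0, 2.0]] * 5]   # B=1, N=5, C=2
mask = [[0.0] * 5]       # no masked tokens initially
x_pad, mask_pad = pad_to_window(x, mask, window_size=2)
merged_len = len(x_pad[0]) // 2  # tokens remaining after merging windows of 2
```

After padding, the merged sequence length equals `math.ceil(N / window_size)`, which matches the return shape documented for `TokenMerging.forward`.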

cvnets.layers.upsample module

Bases: Upsample

This layer upsamples a given input tensor.

Parameters:

size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None
scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None
mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic' and 'trilinear'). Default: 'nearest'
align_corners (Optional[bool]) – if True, the corner pixels of the input and output tensors are aligned, and thus preserving the values at those pixels. This only has effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None

Shape:

Input: $(N, C, W_{in})$ or $(N, C, H_{in}, W_{in})$ or $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, W_{out})$ or $(N, C, H_{out}, W_{out})$ or $(N, C, D_{out}, H_{out}, W_{out})$

__init__(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
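As an illustration of the default 'nearest' mode, the following plain-Python sketch upsamples a small 2D grid. It is a simplified stand-in rather than the layer's implementation, and it assumes an integer scale_factor; the function names are hypothetical.

```python
def upsample_nearest_1d(row, scale_factor):
    """Nearest-neighbour upsampling of a 1D signal by an integer factor."""
    out_len = len(row) * scale_factor
    # Output position i copies the input sample at index i // scale_factor.
    return [row[i // scale_factor] for i in range(out_len)]


def upsample_nearest_2d(image, scale_factor):
    """Apply the same rule along the width, then along the height, of a 2D list."""
    rows = [upsample_nearest_1d(r, scale_factor) for r in image]
    return [list(rows[i // scale_factor]) for i in range(len(rows) * scale_factor)]


print(upsample_nearest_2d([[1, 2], [3, 4]], 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Each spatial dimension grows by the scale factor, so an input of spatial size 2 x 2 becomes 4 x 4 here.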

Module contents

class cvnets.layers.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 1

module_cls: alias of Conv1d

training: bool

class cvnets.layers.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 2

module_cls: alias of Conv2d

training: bool

class cvnets.layers.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 3

module_cls: alias of Conv3d

training: bool

class cvnets.layers.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer1d

class cvnets.layers.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer2d

class cvnets.layers.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls: alias of ConvLayer3d

class cvnets.layers.NormActLayer(opts, num_features, *args, **kwargs)[source]

Bases: BaseLayer

Applies a normalization layer followed by an activation layer.

Parameters:

opts – Command-line arguments.
num_features – $C$ from an expected input of size $(N, C, H, W)$ .

Shape:

Input: $(N, C, H, W)$ .
Output: $(N, C, H, W)$ .

__init__(opts, num_features, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

Bases: BaseLayer

Applies a 2D transposed convolution (also known as deconvolution) over an input.

Parameters:

opts – Command line arguments.
in_channels – $C_{in}$ from an expected input of size $(N, C_{in}, H_{in}, W_{in})$ .
out_channels – $C_{out}$ from an expected output of size $(N, C_{out}, H_{out}, W_{out})$ .
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode. Default: zeros.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
padding – Padding will be done on both sides of each dimension in the input.
output_padding – Additional padding on the output tensor.
auto_padding – Compute padding automatically. Default: True.

Shape:

Input: $(N, C_{in}, H_{in}, W_{in})$ .
Output: $(N, C_{out}, H_{out}, W_{out})$ .

__init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.
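The relation between $H_{in}$ and $H_{out}$ for a transposed convolution can be computed with the formula used by torch.nn.ConvTranspose2d. The sketch below applies it to one spatial dimension. How this layer chooses padding when auto_padding is True is not described in this reference, so the padding value is passed explicitly here; the function name is hypothetical.

```python
def conv_transpose_out_size(in_size, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    return ((in_size - 1) * stride
            - 2 * padding
            + dilation * (kernel_size - 1)
            + output_padding
            + 1)


# A 4-pixel input with kernel 4, stride 2, padding 1 doubles in size.
print(conv_transpose_out_size(4, kernel_size=4, stride=2, padding=1))  # 8
```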

class cvnets.layers.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a linear transformation to the input data

Parameters:

in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
bias (Optional[bool]) – use bias or not
channel_first (Optional[bool]) – Channels are first or last dimension. If first, then use Conv2d

Shape:

Input: $(N, *, C_{in})$ if not channel_first else $(N, C_{in}, *)$ where $*$ means any number of dimensions.
Output: $(N, *, C_{out})$ if not channel_first else $(N, C_{out}, *)$

__init__(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

reset_params()[source]

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a GroupLinear transformation layer, as defined here, here and here

Parameters:

in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
n_groups (int) – number of groups
bias (Optional[bool]) – use bias or not
feature_shuffle (Optional[bool]) – Shuffle features between groups

Shape:

Input: $(N, *, C_{in})$
Output: $(N, *, C_{out})$

__init__(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

reset_params()[source]

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies global pooling over a 4D or 5D input tensor.

Parameters:

pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean
keep_dim (Optional[bool]) – Do not squeeze the dimensions of a tensor. Default: False

Shape:

Input: $(N, C, H, W)$ or $(N, C, D, H, W)$
Output: $(N, C, 1, 1)$ or $(N, C, 1, 1, 1)$ if keep_dim else $(N, C)$

pool_types = ['mean', 'rms', 'abs']

__init__(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]: Add layer specific arguments

forward(x: Tensor) → Tensor[source]: Forward function.

training: bool
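The three pool types can be sketched for a single channel as follows. This is an illustration only: it assumes the conventional definitions, with 'rms' as the square root of the mean of squares and 'abs' as the mean of absolute values. The library's exact formulas are not given in this reference and may differ. The function name is hypothetical.

```python
import math


def global_pool(feature_map, pool_type="mean"):
    """Reduce one channel (a 2D list of size H x W) to a scalar.

    Assumption: 'rms' is sqrt(mean(x^2)) and 'abs' is mean(|x|).
    """
    values = [v for row in feature_map for v in row]
    n = len(values)
    if pool_type == "mean":
        return sum(values) / n
    if pool_type == "rms":
        return math.sqrt(sum(v * v for v in values) / n)
    if pool_type == "abs":
        return sum(abs(v) for v in values) / n
    raise ValueError(f"Unsupported pool_type: {pool_type}")


channel = [[3.0, -4.0], [0.0, 1.0]]
print(global_pool(channel, "mean"))  # 0.0
print(global_pool(channel, "abs"))   # 2.0
```

Applying such a reduction to every channel of every sample yields the $(N, C)$ output described above, or $(N, C, 1, 1)$ when keep_dim is True.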

class cvnets.layers.Identity[source]

Bases: BaseLayer

This is a placeholder layer that returns its input tensor unchanged.

__init__()[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]: Forward function.

class cvnets.layers.PixelShuffle(upscale_factor: int, *args, **kwargs)[source]

Bases: PixelShuffle

Rearranges elements in a tensor of shape $(*, C \times r^{2}, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$ , where r is an upscale factor.

Parameters:: upscale_factor (int) – factor to increase spatial resolution by

Shape:

Input: $(*, C \times r^{2}, H, W)$ , where * is zero or more dimensions
Output: $(*, C, H \times r, W \times r)$

__init__(upscale_factor: int, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
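The rearrangement can be written out as an index mapping. The sketch below follows the mapping used by torch.nn.PixelShuffle for a single sample stored as nested lists; it is an illustration in plain Python, not the layer's implementation, and the function name is hypothetical.

```python
def pixel_shuffle(x, r):
    """x: list of C*r*r channels, each an H x W list. Returns C channels of size (H*r) x (W*r)."""
    in_channels = len(x)
    assert in_channels % (r * r) == 0, "channel count must be divisible by r^2"
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])
    out = [[[0] * (width * r) for _ in range(height * r)] for _ in range(out_channels)]
    for c in range(out_channels):
        for h in range(height * r):
            for w in range(width * r):
                # The offset inside each r x r output block selects the source channel.
                src_channel = c * r * r + (h % r) * r + (w % r)
                out[c][h][w] = x[src_channel][h // r][w // r]
    return out


x = [[[1]], [[2]], [[3]], [[4]]]  # 4 channels of size 1 x 1, r = 2
print(pixel_shuffle(x, 2))  # [[[1, 2], [3, 4]]]
```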

Bases: Upsample

This layer upsamples a given input tensor.

Parameters:

size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None
scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None
mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic', or 'trilinear'). Default: 'nearest'
align_corners (Optional[bool]) – If True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None

Shape:

Input: $(N, C, W_{in})$ or $(N, C, H_{in}, W_{in})$ or $(N, C, D_{in}, H_{in}, W_{in})$
Output: $(N, C, W_{out})$ or $(N, C, H_{out}, W_{out})$ or $(N, C, D_{out}, H_{out}, W_{out})$

__init__(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)[source]

Bases: MaxPool2d

Applies a 2D max pooling over a 4D input tensor.

Parameters:

kernel_size (Optional[int]) – The size of the window to take a max over. Default: 3
stride (Optional[int]) – The stride of the window. Default: 2
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1

Shape:

Input: $(N, C, H_{in}, W_{in})$ where $N$ is the batch size, $C$ is the number of input channels, $H_{in}$ is the input height, and $W_{in}$ is the input width
Output: $(N, C, H_{out}, W_{out})$ where $H_{out}$ is the output height and $W_{out}$ is the output width

__init__(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
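The output height and width follow the usual pooling arithmetic. The sketch below computes one spatial dimension with the defaults listed above, assuming a dilation of 1 and floor rounding as in torch.nn.MaxPool2d with ceil_mode=False; the function name is hypothetical.

```python
def pool_out_size(in_size, kernel_size=3, stride=2, padding=1):
    """floor((in_size + 2 * padding - kernel_size) / stride) + 1."""
    return (in_size + 2 * padding - kernel_size) // stride + 1


print(pool_out_size(224))  # 112
print(pool_out_size(7))    # 4
```

With the default arguments the layer therefore halves even spatial sizes, for example 224 to 112.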

Bases: AvgPool2d

Applies a 2D average pooling over a 4D input tensor.

Parameters:

kernel_size (Optional[int]) – The size of the window to take an average over
stride (Optional[int]) – The stride of the window. Default: None
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: (0, 0)
ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None

Shape:

Input: $(N, C, H_{in}, W_{in})$ where $N$ is the batch size, $C$ is the number of input channels, $H_{in}$ is the input height, and $W_{in}$ is the input width
Output: $(N, C, H_{out}, W_{out})$ where $H_{out}$ is the output height and $W_{out}$ is the output width

__init__(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
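The effect of count_include_pad and divisor_override is easiest to see in one dimension. The following plain-Python sketch is a simplified illustration with ceil_mode fixed to False. It assumes that stride=None falls back to the kernel size, as in torch.nn.AvgPool2d; the function name is hypothetical.

```python
def avg_pool_1d(row, kernel_size, stride=None, padding=0,
                count_include_pad=True, divisor_override=None):
    """Average pooling over a 1D list with zero padding on both sides."""
    stride = kernel_size if stride is None else stride
    padded = [0.0] * padding + list(row) + [0.0] * padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window_sum = sum(padded[start:start + kernel_size])
        if divisor_override is not None:
            divisor = divisor_override
        elif count_include_pad:
            divisor = kernel_size
        else:
            # Count only the positions that fall inside the original row.
            lo = max(start, padding)
            hi = min(start + kernel_size, padding + len(row))
            divisor = hi - lo
        out.append(window_sum / divisor)
    return out


row = [2.0, 4.0, 6.0, 8.0]
print(avg_pool_1d(row, kernel_size=2, padding=1))                           # [1.0, 5.0, 4.0]
print(avg_pool_1d(row, kernel_size=2, padding=1, count_include_pad=False))  # [2.0, 5.0, 8.0]
```

The border windows differ between the two calls because the zero padding is counted in the divisor only when count_include_pad is True.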

class cvnets.layers.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)[source]

Bases: Dropout

This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

Parameters:

p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False

Shape:

Input: $(N, *)$ where $N$ is the batch size
Output: same as the input

__init__(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
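A plain-Python sketch of the training-time behaviour is shown below. It follows the inverted-dropout convention of torch.nn.Dropout, in which surviving elements are scaled by 1 / (1 - p) so that the expected value is unchanged. The function name and the rng argument are hypothetical.

```python
import random


def dropout(values, p=0.5, training=True, rng=None):
    """Zero each element with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(values)      # identity at evaluation time
    if p >= 1.0:
        return [0.0 for _ in values]
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - p)
    # One independent Bernoulli draw per element.
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


sample = dropout([1.0] * 8, p=0.25, rng=random.Random(0))
print(sample)  # every entry is either 0.0 or 1 / 0.75
```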

class cvnets.layers.Dropout2d(p: float = 0.5, inplace: bool = False)[source]

Bases: Dropout2d

This layer, during training, randomly zeroes some of the elements of the 4D input tensor with probability p using samples from a Bernoulli distribution.

Parameters:

p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False

Shape:

Input: $(N, C, H, W)$ where $N$ is the batch size, $C$ is the input channels,
$H$ is the input tensor height, and $W$ is the input tensor width
Output: same as the input

__init__(p: float = 0.5, inplace: bool = False)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.AdjustBatchNormMomentum(opts, *args, **kwargs)[source]

Bases: object

This class enables adjusting the momentum of batch normalization layers.

Note

It’s an experimental feature and should be used with caution.

round_places = 6

__init__(opts, *args, **kwargs)[source]

adjust_momentum(model: Module, iteration: int, epoch: int) → None[source]

class cvnets.layers.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Bases: Flatten

This layer flattens a contiguous range of dimensions into a tensor.

Parameters:

start_dim (Optional[int]) – first dim to flatten. Default: 1
end_dim (Optional[int]) – last dim to flatten. Default: -1

Shape:

Input: $(*, S_{start}, \ldots, S_{i}, \ldots, S_{end}, *)$ , where $S_{i}$ is the size at dimension $i$ and $*$ means any number of dimensions including none.
Output: $(*, \prod_{i = start}^{end} S_{i}, *)$ .

__init__(start_dim: int | None = 1, end_dim: int | None = -1)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
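The shape rule above can be checked with a few lines of plain Python. The helper below only computes the resulting shape and does not touch any tensor data; its name is hypothetical.

```python
from math import prod


def flatten_shape(shape, start_dim=1, end_dim=-1):
    """Return the shape obtained by merging dimensions start_dim..end_dim (inclusive)."""
    ndim = len(shape)
    start = start_dim % ndim  # maps negative indices such as -1 to ndim - 1
    end = end_dim % ndim
    merged = prod(shape[start:end + 1])
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1:])


print(flatten_shape((2, 3, 4, 5)))        # (2, 60)
print(flatten_shape((2, 3, 4, 5), 1, 2))  # (2, 12, 5)
```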

class cvnets.layers.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies multi-head self- or cross-attention, as described in the “Attention Is All You Need” paper.

Parameters:

embed_dim (int) – $C_{in}$ from an expected input of size $(N, S, C_{in})$
num_heads (int) – Number of heads in multi-head attention
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True

Shape:

Input:
Query tensor (x_q): $(N, S, C_{in})$ where $N$ is the batch size, $S$ is the number of source tokens, and $C_{in}$ is the input embedding dimension
Optional key-value tensor (x_kv): $(N, T, C_{in})$ where $T$ is the number of target tokens
Output: same shape as the input

__init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward_pytorch(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor[source]

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.
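The core computation inside each attention head is scaled dot-product attention. The sketch below shows it for one head and one sample in plain Python. It omits the learned query, key, value, and output projections as well as the splitting into num_heads heads. It also assumes an additive key_padding_mask with -inf at masked positions, which this reference does not confirm for this layer. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, key_padding_mask=None):
    """queries: S x d; keys and values: T x d; mask: T entries of 0.0 or -inf."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        if key_padding_mask is not None:
            # Masked keys receive a score of -inf and hence zero weight.
            scores = [s + m for s, m in zip(scores, key_padding_mask)]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[10.0, 0.0], [0.0, 10.0]]
print(attention(q, k, v, key_padding_mask=[0.0, float("-inf")]))  # [[10.0, 0.0]]
```

Passing x_kv corresponds to taking keys and values from a second sequence of length T (cross-attention); without it, keys and values come from the query sequence itself.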

class cvnets.layers.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies single-head attention, as described in the DeLighT paper.

Parameters:

embed_dim (int) – $C_{in}$ from an expected input of size $(N, P, C_{in})$
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True

Shape:

Input: $(N, P, C_{in})$ where $N$ is the batch size, $P$ is the number of patches, and $C_{in}$ is the input embedding dimension
Output: same shape as the input

__init__(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.

class cvnets.layers.Softmax(dim: int | None = -1, *args, **kwargs)[source]

Bases: Softmax

Applies the Softmax function to an input tensor along the specified dimension

Parameters:: dim (int) – Dimension along which the softmax is applied. Default: -1

Shape:

Input: $(*)$ where $*$ is one or more dimensions
Output: same shape as the input

__init__(dim: int | None = -1, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
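The role of the dim argument can be illustrated on a 2D input. The sketch below is plain Python over nested lists and is not the layer's implementation; the function names are hypothetical.

```python
import math


def softmax_1d(values):
    """Numerically stable softmax of a list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix, dim=-1):
    """Softmax over rows (dim=-1 or 1) or over columns (dim=0 or -2) of a 2D list."""
    if dim in (-1, 1):
        return [softmax_1d(row) for row in matrix]
    if dim in (0, -2):
        columns = [softmax_1d(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("dim must be one of -2, -1, 0, 1 for a 2D input")


scores = [[0.0, 0.0], [1.0, 3.0]]
by_row = softmax_2d(scores, dim=-1)  # each row sums to 1
by_col = softmax_2d(scores, dim=0)   # each column sums to 1
print(by_row[0])  # [0.5, 0.5]
```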

class cvnets.layers.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies a self-attention with linear complexity, as described in MobileViTv2 paper. This layer can be used for self- as well as cross-attention.

Parameters:

opts – command line arguments
embed_dim (int) – $C$ from an expected input of size $(N, C, H, W)$
attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0
bias (Optional[bool]) – Use bias in learnable layers. Default: True

Shape:

Input: $(B, C, P, N)$ where $B$ is the batch size, $C$ is the number of input channels, $P$ is the number of pixels in the patch, and $N$ is the number of patches
Output: same as the input

Note

For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.

__init__(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

static visualize_context_scores(context_scores)[source]

forward(x: Tensor, x_prev: Tensor | None = None, *args, **kwargs) → Tensor[source]: Forward function.

class cvnets.layers.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Bases: Embedding

A lookup table that stores embeddings of a fixed dictionary and size.

Parameters:

num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.

Shape:

Input: $(*)$ , IntTensor or LongTensor of arbitrary shape containing the indices to extract
Output: $(*, H)$ , where * is the input shape and $H = embedding\_dim$

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() → None[source]

class cvnets.layers.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: BaseLayer

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(seq_len: int, *args, **kwargs) → Tensor[source]: Forward function.

class cvnets.layers.StochasticDepth(p: float, mode: str)[source]

Bases: StochasticDepth

Implements Stochastic Depth, from “Deep Networks with Stochastic Depth”, which randomly drops the residual branches of residual architectures.

__init__(p: float, mode: str) → None[source]: Initializes internal Module state, shared by both nn.Module and ScriptModule.
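The two arguments can be illustrated with a plain-Python sketch that follows the behaviour of torchvision's StochasticDepth: p is the drop probability, mode "row" makes one keep-or-drop decision per sample, mode "batch" makes a single decision for the whole batch, and surviving values are scaled by 1 / (1 - p). The function name and the rng argument are hypothetical.

```python
import random


def stochastic_depth(batch, p, mode, training=True, rng=None):
    """batch: list of rows, one per sample. Returns the randomly dropped residual branch."""
    if mode not in ("row", "batch"):
        raise ValueError("mode must be 'row' or 'batch'")
    if not training or p == 0.0:
        return [list(row) for row in batch]
    rng = rng or random.Random()
    survival = 1.0 - p
    if mode == "row":
        keep = [rng.random() < survival for _ in batch]  # one decision per sample
    else:
        keep = [rng.random() < survival] * len(batch)    # one decision for the whole batch
    scale = 1.0 / survival if survival > 0.0 else 0.0
    return [[v * scale if k else 0.0 for v in row] for row, k in zip(batch, keep)]


branch = [[1.0, 2.0], [3.0, 4.0]]
print(stochastic_depth(branch, p=0.5, mode="row", rng=random.Random(0)))
```

In a residual block, the result would be added to the skip connection, so a dropped sample simply passes its input through unchanged.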

cvnets.layers.get_normalization_layer(opts: Namespace, num_features: int, norm_type: str | None = None, num_groups: int | None = None, momentum: float | None = None) → Module: Helper function to build the normalization layer. The function can be used in either of the following ways:

Scenario 1: Set the default normalization layer using command-line arguments. This is useful when the same normalization layer is used for the entire network (e.g., ResNet).
Scenario 2: The network uses different normalization layers. In that case, the default normalization layer can be overridden by passing its name via the norm_type argument.