cvnets.layers package

Subpackages

Submodules

cvnets.layers.adaptive_pool module

class cvnets.layers.adaptive_pool.AdaptiveAvgPool2d(output_size: int | Tuple[int, int] = 1, *args, **kwargs)[source]

Bases: AdaptiveAvgPool2d

Applies a 2D adaptive average pooling over an input tensor.

Parameters:

output_size (Optional, int or Tuple[int, int]) – The target output size. If a single int \(h\) is passed, then a square output of size \(h \times h\) is produced. If a tuple \((h, w)\) is passed, then an output of size \(h \times w\) is produced. Default is 1.

Shape:
  • Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the number of input channels, \(H\) is the input height, and \(W\) is the input width

  • Output: \((N, C, h, h)\) or \((N, C, h, w)\)
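
The mapping from an \(H \times W\) plane to an \(h \times w\) output can be sketched in plain Python. This is only an illustration on nested lists, not the layer's implementation; the function name adaptive_avg_pool2d is hypothetical, and the bin boundaries (floor for the start, ceil for the end) are an assumption based on the rule commonly used for adaptive pooling.

```python
import math
from typing import List, Tuple, Union


def adaptive_avg_pool2d(
    plane: List[List[float]], output_size: Union[int, Tuple[int, int]] = 1
) -> List[List[float]]:
    """Average-pool one H x W plane down to h x w (pure-Python illustration)."""
    # A single int h is expanded to a square (h, h) target.
    out_h, out_w = (output_size, output_size) if isinstance(output_size, int) else output_size
    in_h, in_w = len(plane), len(plane[0])
    pooled = []
    for i in range(out_h):
        # Assumed bin rule: floor for the start index, ceil for the end index.
        r0, r1 = (i * in_h) // out_h, math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, math.ceil((j + 1) * in_w / out_w)
            window = [plane[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


plane = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
global_mean = adaptive_avg_pool2d(plane)        # default output_size=1 -> 1 x 1
quadrants = adaptive_avg_pool2d(plane, (2, 2))  # tuple (h, w) -> 2 x 2
```

With the default output_size of 1 the whole plane collapses to its mean, which is why this layer is often used as a global pooling step before a classifier.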

__init__(output_size: int | Tuple[int, int] = 1, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.base_layer module

class cvnets.layers.base_layer.BaseLayer(*args, **kwargs)[source]

Bases: Module

Base class for neural network layers. Subclasses must implement the forward function.

__init__(*args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add layer specific arguments

get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs) Tuple[List[Dict], List[float]][source]

Get parameters for training along with the learning rate.

Parameters:
  • weight_decay – Weight decay coefficient. Defaults to 0.0.

  • no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.

Returns:

A tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter.

Note

Learning rate multiplier is set to 1.0 here as it is handled inside the Central Model.
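
The shape of the returned value can be illustrated with a dependency-free sketch. Everything here is hypothetical: the function build_param_groups is not part of the library, parameter tensors are stood in for by their names, and the rule that 1-D parameters (biases and normalization scales) are the ones exempted from decay is an assumption about what no_decay_bn_filter_bias selects.

```python
from typing import Dict, List, Tuple


def build_param_groups(
    named_shapes: Dict[str, Tuple[int, ...]],
    weight_decay: float = 0.0,
    no_decay_bn_filter_bias: bool = False,
) -> Tuple[List[Dict], List[float]]:
    """Return (param_list, lr_mult) in the two-entry layout described above."""
    decay, no_decay = [], []
    for name, shape in named_shapes.items():
        # Assumption: 1-D parameters are biases or normalization weights.
        if no_decay_bn_filter_bias and len(shape) == 1:
            no_decay.append(name)
        else:
            decay.append(name)
    # Names stand in for the actual tensors under the "params" key.
    param_list = [{"params": decay, "weight_decay": weight_decay, "param_names": decay}]
    if no_decay:
        param_list.append({"params": no_decay, "weight_decay": 0.0, "param_names": no_decay})
    # Multipliers are fixed at 1.0 (here, one per group, which is an assumption).
    return param_list, [1.0] * len(param_list)


shapes = {"conv.weight": (16, 3, 3, 3), "bn.weight": (16,), "bn.bias": (16,)}
groups, lr_mult = build_param_groups(shapes, weight_decay=1e-4, no_decay_bn_filter_bias=True)
```

In this sketch the convolution weight keeps the requested decay while the two 1-D parameters land in a second group with weight_decay set to 0.0.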

forward(*args, **kwargs) Any[source]

Forward function.

cvnets.layers.conv_layer module

class cvnets.layers.conv_layer.Conv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] | None = 1, padding: int | Tuple[int, int] | None = 0, dilation: int | Tuple[int, int] | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', *args, **kwargs)[source]

Bases: Conv2d

Applies a 2D convolution over an input.

Parameters:
  • in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).

  • out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).

  • kernel_size – Kernel size for convolution.

  • stride – Stride for convolution. Default: 1.

  • padding – Padding for convolution. Default: 0.

  • dilation – Dilation rate for convolution. Default: 1.

  • groups – Number of groups in convolution. Default: 1.

  • bias – Use bias. Default: False.

  • padding_mode – Padding mode (‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’). Default: zeros.

  • use_norm – Use normalization layer after convolution. Default: True.

  • use_act – Use activation layer after convolution (or convolution and normalization). Default: True.

  • act_name – Use specific activation function. Overrides the one specified in command line args.

Shape:
  • Input: \((N, C_{in}, H_{in}, W_{in})\).

  • Output: \((N, C_{out}, H_{out}, W_{out})\).
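
The relation between \(H_{in}\) and \(H_{out}\) (and likewise for the width) is not spelled out above. A minimal sketch, assuming the standard convolution arithmetic that PyTorch's Conv2d documents and the same value for both spatial dimensions; the helper name conv2d_output_size is hypothetical.

```python
def conv2d_output_size(
    size_in: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1
) -> int:
    """Output length along one spatial dimension of a standard convolution."""
    # The effective kernel extent grows with dilation: dilation * (kernel_size - 1) + 1.
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Defaults (stride=1, padding=0, dilation=1) shrink the input by kernel_size - 1.
unpadded = conv2d_output_size(32, kernel_size=3)
# A strided 3x3 convolution with padding 1 halves the resolution.
strided = conv2d_output_size(224, kernel_size=3, stride=2, padding=1)
# Padding equal to the dilation keeps the size for a dilated 3x3 kernel.
dilated = conv2d_output_size(32, kernel_size=3, padding=2, dilation=2)
```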

__init__(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] | None = 1, padding: int | Tuple[int, int] | None = 0, dilation: int | Tuple[int, int] | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.conv_layer.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 1
module_cls

alias of Conv1d

class cvnets.layers.conv_layer.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 2
module_cls

alias of Conv2d

class cvnets.layers.conv_layer.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 3
module_cls

alias of Conv3d

class cvnets.layers.conv_layer.TransposeConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

Applies a 2D transposed convolution (also known as deconvolution) over an input.

Parameters:
  • opts – Command line arguments.

  • in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).

  • out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).

  • kernel_size – Kernel size for convolution.

  • stride – Stride for convolution. Default: 1.

  • dilation – Dilation rate for convolution. Default: 1.

  • groups – Number of groups in convolution. Default: 1.

  • bias – Use bias. Default: False.

  • padding_mode – Padding mode. Default: zeros.

  • use_norm – Use normalization layer after convolution. Default: True.

  • use_act – Use activation layer after convolution (or convolution and normalization). Default: True.

  • padding – Padding will be done on both sides of each dimension in the input.

  • output_padding – Additional padding on the output tensor.

  • auto_padding – Compute padding automatically. Default: True.

Shape:
  • Input: \((N, C_{in}, H_{in}, W_{in})\).

  • Output: \((N, C_{out}, H_{out}, W_{out})\).
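
For reference, the output size of a transposed convolution with explicitly given padding can be computed as below. This is a sketch of the standard transposed-convolution arithmetic (as documented for PyTorch's ConvTranspose2d); it does not model what auto_padding computes, and the helper name is hypothetical.

```python
def transpose_conv2d_output_size(
    size_in: int,
    kernel_size: int,
    stride: int = 1,
    padding: int = 0,
    output_padding: int = 0,
    dilation: int = 1,
) -> int:
    """Output length along one spatial dimension of a transposed convolution."""
    return (
        (size_in - 1) * stride
        - 2 * padding
        + dilation * (kernel_size - 1)
        + output_padding
        + 1
    )


# A 4x4 kernel with stride 2 and padding 1 doubles the resolution exactly.
doubled_k4 = transpose_conv2d_output_size(16, kernel_size=4, stride=2, padding=1)
# A 3x3 kernel needs output_padding=1 to reach the same size.
doubled_k3 = transpose_conv2d_output_size(
    16, kernel_size=3, stride=2, padding=1, output_padding=1
)
```

The second call shows why output_padding exists: with stride 2, several input sizes of a forward convolution map to the same output size, so the inverse direction needs an extra term to pick one.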

__init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

class cvnets.layers.conv_layer.NormActLayer(opts, num_features, *args, **kwargs)[source]

Bases: BaseLayer

Applies a normalization layer followed by an activation layer.

Parameters:
  • opts – Command-line arguments.

  • num_features – \(C\) from an expected input of size \((N, C, H, W)\).

Shape:
  • Input: \((N, C, H, W)\).

  • Output: \((N, C, H, W)\).

__init__(opts, num_features, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

class cvnets.layers.conv_layer.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer1d

class cvnets.layers.conv_layer.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer2d

class cvnets.layers.conv_layer.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer3d

cvnets.layers.dropout module

class cvnets.layers.dropout.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)[source]

Bases: Dropout

This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

Parameters:
  • p – probability of an element to be zeroed. Default: 0.5

  • inplace – If set to True, will do this operation in-place. Default: False

Shape:
  • Input: \((N, *)\) where \(N\) is the batch size

  • Output: same as the input

__init__(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.dropout.Dropout2d(p: float = 0.5, inplace: bool = False)[source]

Bases: Dropout2d

This layer, during training, randomly zeroes some of the elements of the 4D input tensor with probability p using samples from a Bernoulli distribution.

Parameters:
  • p – probability of an element to be zeroed. Default: 0.5

  • inplace – If set to True, will do this operation in-place. Default: False

Shape:
  • Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H\) is the input tensor height, and \(W\) is the input tensor width

  • Output: same as the input

__init__(p: float = 0.5, inplace: bool = False)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.embedding module

class cvnets.layers.embedding.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Bases: Embedding

A lookup table that stores embeddings of a fixed dictionary and size.

Parameters:
  • num_embeddings (int) – size of the dictionary of embeddings

  • embedding_dim (int) – the size of each embedding vector

  • padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.

Shape:
  • Input: \((*)\), IntTensor or LongTensor of arbitrary shape containing the indices to extract

  • Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\)
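
The lookup behaviour, including the all-zero row at padding_idx, can be sketched without any tensor library. The class TinyEmbedding is hypothetical, handles only a flat list of indices, and uses Gaussian initial values as an assumption; it does not model gradients.

```python
import random
from typing import List, Optional


class TinyEmbedding:
    """Pure-Python lookup table: one row of length embedding_dim per index."""

    def __init__(
        self,
        num_embeddings: int,
        embedding_dim: int,
        padding_idx: Optional[int] = None,
        seed: int = 0,
    ) -> None:
        rng = random.Random(seed)
        # Assumption: rows start as standard-normal samples.
        self.weight = [
            [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)
        ]
        self.padding_idx = padding_idx
        if padding_idx is not None:
            # The padding row defaults to all zeros, as described above.
            self.weight[padding_idx] = [0.0] * embedding_dim

    def __call__(self, indices: List[int]) -> List[List[float]]:
        # Input shape (L,) -> output shape (L, embedding_dim).
        return [list(self.weight[i]) for i in indices]


emb = TinyEmbedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
vectors = emb([3, 0, 3])
```

Here the middle vector is the zero padding row, and the first and last vectors are identical because they come from the same table row.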

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() None[source]

cvnets.layers.flatten module

class cvnets.layers.flatten.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Bases: Flatten

This layer flattens a contiguous range of dimensions into a tensor.

Parameters:
  • start_dim (Optional[int]) – first dim to flatten. Default: 1

  • end_dim (Optional[int]) – last dim to flatten. Default: -1

Shape:
  • Input: \((*, S_{\text{start}},..., S_{i}, ..., S_{\text{end}}, *)\), where \(S_{i}\) is the size at dimension \(i\) and \(*\) means any number of dimensions including none.

  • Output: \((*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)\).
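
The output shape follows directly from the product above. A small sketch that computes it from an input shape; the helper flatten_shape is hypothetical and only does the shape bookkeeping, not the data movement.

```python
import math
from typing import Tuple


def flatten_shape(shape: Tuple[int, ...], start_dim: int = 1, end_dim: int = -1) -> Tuple[int, ...]:
    """Shape after merging dimensions start_dim..end_dim (inclusive) into one."""
    ndim = len(shape)
    # Negative indices count from the end, as usual in Python.
    start, end = start_dim % ndim, end_dim % ndim
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = math.prod(shape[start : end + 1])
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1 :])


# Defaults keep the batch dimension and merge everything else.
features = flatten_shape((8, 64, 7, 7))
# Merging only the two spatial dimensions keeps the channel dimension.
tokens = flatten_shape((8, 64, 7, 7), start_dim=2, end_dim=3)
```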

__init__(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.global_pool module

class cvnets.layers.global_pool.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies global pooling over a 4D or 5D input tensor.

Parameters:
  • pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean

  • keep_dim (Optional[bool]) – Do not squeeze the dimensions of a tensor. Default: False

Shape:
  • Input: \((N, C, H, W)\) or \((N, C, D, H, W)\)

  • Output: \((N, C, 1, 1)\) or \((N, C, 1, 1, 1)\) if keep_dim else \((N, C)\)
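
The three pooling types are not defined above, so the sketch below assumes their conventional meanings: arithmetic mean, root mean square, and mean of absolute values, each taken over all spatial positions of one channel. The layer's exact computation may differ; both helper names are hypothetical.

```python
import math
from typing import List, Tuple


def global_pool(values: List[float], pool_type: str = "mean") -> float:
    """Reduce the spatial values of one channel to a single number."""
    if pool_type == "mean":
        return sum(values) / len(values)
    if pool_type == "rms":
        # Assumed definition: square root of the mean of squares.
        return math.sqrt(sum(v * v for v in values) / len(values))
    if pool_type == "abs":
        # Assumed definition: mean of absolute values.
        return sum(abs(v) for v in values) / len(values)
    raise ValueError(f"unsupported pool_type: {pool_type}")


def pooled_shape(shape: Tuple[int, ...], keep_dim: bool = False) -> Tuple[int, ...]:
    """Output shape for a 4D (N, C, H, W) or 5D (N, C, D, H, W) input."""
    n, c = shape[0], shape[1]
    return (n, c) + (1,) * (len(shape) - 2) if keep_dim else (n, c)


channel = [-2.0, 2.0, -2.0, 2.0]
mean_value = global_pool(channel, "mean")
rms_value = global_pool(channel, "rms")
abs_value = global_pool(channel, "abs")
```

For a zero-mean channel like this one, mean pooling discards the signal entirely while the rms and abs variants retain its magnitude.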

pool_types = ['mean', 'rms', 'abs']
__init__(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

forward(x: Tensor) Tensor[source]

Forward function.

cvnets.layers.identity module

class cvnets.layers.identity.Identity[source]

Bases: BaseLayer

This is a placeholder layer that returns its input tensor unchanged.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

cvnets.layers.linear_attention module

class cvnets.layers.linear_attention.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies self-attention with linear complexity, as described in the MobileViTv2 paper. This layer can be used for self- as well as cross-attention.

Parameters:
  • opts – command line arguments

  • embed_dim (int) – \(C\) from an expected input of size \((N, C, H, W)\)

  • attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0

  • bias (Optional[bool]) – Use bias in learnable layers. Default: True

Shape:
  • Input: \((B, C, P, N)\) where \(B\) is the batch size, \(C\) is the number of input channels, \(P\) is the number of pixels in the patch, and \(N\) is the number of patches

  • Output: same as the input

Note

For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.
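
The unfolding in the note can be followed with simple shape arithmetic. The sketch below assumes non-overlapping patches whose height and width (hypothetical parameters patch_h and patch_w) divide the feature map exactly; the function name is also hypothetical.

```python
from typing import Tuple


def unfolded_shape(
    b: int, c: int, h: int, w: int, patch_h: int, patch_w: int
) -> Tuple[int, int, int, int]:
    """Shape [B, C, P, N] obtained by unfolding a [B, C, H, W] feature map."""
    if h % patch_h or w % patch_w:
        raise ValueError("feature map must be divisible by the patch size")
    p = patch_h * patch_w                  # pixels per patch
    n = (h // patch_h) * (w // patch_w)    # number of patches
    return (b, c, p, n)


shape = unfolded_shape(b=2, c=64, h=32, w=32, patch_h=2, patch_w=2)
```

Note that \(P \cdot N = H \cdot W\), so unfolding only regroups positions; the channel dimension stays in place, which is what lets a point-wise convolution act on it directly.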

__init__(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static visualize_context_scores(context_scores)[source]
forward(x: Tensor, x_prev: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.

cvnets.layers.linear_layer module

class cvnets.layers.linear_layer.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a linear transformation to the input data

Parameters:
  • in_features (int) – number of features in the input tensor

  • out_features (int) – number of features in the output tensor

  • bias (Optional[bool]) – use bias or not

  • channel_first (Optional[bool]) – Channels are first or last dimension. If first, then use Conv2d

Shape:
  • Input: \((N, *, C_{in})\) if not channel_first else \((N, C_{in}, *)\) where \(*\) means any number of dimensions.

  • Output: \((N, *, C_{out})\) if not channel_first else \((N, C_{out}, *)\)

__init__(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

reset_params()[source]
forward(x: Tensor) Tensor[source]

Forward function.

class cvnets.layers.linear_layer.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a GroupLinear transformation layer, as defined here, here and here

Parameters:
  • in_features (int) – number of features in the input tensor

  • out_features (int) – number of features in the output tensor

  • n_groups (int) – number of groups

  • bias (Optional[bool]) – use bias or not

  • feature_shuffle (Optional[bool]) – Shuffle features between groups

Shape:
  • Input: \((N, *, C_{in})\)

  • Output: \((N, *, C_{out})\)

__init__(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

reset_params()[source]
forward(x: Tensor) Tensor[source]

Forward function.

cvnets.layers.multi_head_attention module

class cvnets.layers.multi_head_attention.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies multi-head self- or cross-attention, as described in the “Attention Is All You Need” paper.

Parameters:
  • embed_dim (int) – \(C_{in}\) from an expected input of size \((N, S, C_{in})\)

  • num_heads (int) – Number of heads in multi-head attention

  • attn_dropout (Optional[float]) – Attention dropout. Default: 0.0

  • bias (Optional[bool]) – Use bias or not. Default: True

Shape:
  • Input:
    • Query tensor (x_q) \((N, S, C_{in})\) where \(N\) is batch size, \(S\) is number of source tokens, and \(C_{in}\) is input embedding dim

    • Optional Key-Value tensor (x_kv) \((N, T, C_{in})\) where \(T\) is number of target tokens

  • Output: same shape as the input

__init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward_pytorch(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.

cvnets.layers.normalization_layers module

class cvnets.layers.normalization_layers.AdjustBatchNormMomentum(opts, *args, **kwargs)[source]

Bases: object

This class enables adjusting the momentum in batch normalization layer.

Note

It’s an experimental feature and should be used with caution.

round_places = 6
__init__(opts, *args, **kwargs)[source]
adjust_momentum(model: Module, iteration: int, epoch: int) None[source]

cvnets.layers.pixel_shuffle module

class cvnets.layers.pixel_shuffle.PixelShuffle(upscale_factor: int, *args, **kwargs)[source]

Bases: PixelShuffle

Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\), where r is an upscale factor.

Parameters:

upscale_factor (int) – factor to increase spatial resolution by

Shape:
  • Input: \((*, C \times r^2, H, W)\), where * is zero or more dimensions

  • Output: \((*, C, H \times r, W \times r)\)
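
The rearrangement can be made concrete with a nested-list sketch for a single image. The function below is an illustration only; it assumes the channel ordering used by PyTorch's PixelShuffle, where input channel \(c r^2 + i r + j\) supplies the output pixel at offset \((i, j)\) inside each \(r \times r\) block of output channel \(c\).

```python
from typing import List

Image = List[List[List[float]]]  # indexed as [channel][row][column]


def pixel_shuffle(x: Image, r: int) -> Image:
    """Rearrange [C*r*r][H][W] into [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        # Each input pixel expands into one cell of an r x r block.
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel when r = 2.
shuffled = pixel_shuffle([[[0.0]], [[1.0]], [[2.0]], [[3.0]]], r=2)
```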

__init__(upscale_factor: int, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.pooling module

class cvnets.layers.pooling.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)[source]

Bases: MaxPool2d

Applies a 2D max pooling over a 4D input tensor.

Parameters:
  • kernel_size (Optional[int]) – the size of the window to take a max over

  • stride (Optional[int]) – The stride of the window. Default: 2

  • padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1

Shape:
  • Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H_{in}\) is the input height, and \(W_{in}\) is the input width

  • Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is

    the output width
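
To see how the defaults (kernel_size=3, stride=2, padding=1) determine the output size and values, here is a one-dimensional sketch; the 2D layer applies the same rule along height and width. The helper max_pool1d is hypothetical, and it assumes padded positions never win the max (modelled here with negative infinity) and floor rounding of the output length.

```python
from typing import List


def max_pool1d(
    row: List[float], kernel_size: int = 3, stride: int = 2, padding: int = 1
) -> List[float]:
    """Max pooling along one dimension with implicit -inf padding."""
    pad = [float("-inf")] * padding
    padded = pad + list(row) + pad
    # Output length: floor((L + 2 * padding - kernel_size) / stride) + 1.
    n_out = (len(row) + 2 * padding - kernel_size) // stride + 1
    return [max(padded[i * stride : i * stride + kernel_size]) for i in range(n_out)]


pooled = max_pool1d([1.0, 5.0, 2.0, 4.0, 3.0, 0.0])
```

With these defaults an even input length L yields L / 2 outputs, which is why this configuration is a common choice for halving spatial resolution.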

__init__(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.pooling.AvgPool2d(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]

Bases: AvgPool2d

Applies a 2D average pooling over a 4D input tensor.

Parameters:
  • kernel_size (tuple) – the size of the window to take an average over

  • stride (Optional[tuple]) – The stride of the window. Default: None

  • padding (Optional[tuple]) – Padding to be added on both sides of the tensor. Default: (0, 0)

  • ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False

  • count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True

  • divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None

Shape:
  • Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H_{in}\) is the input height, and \(W_{in}\) is the input width

  • Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is

    the output width

__init__(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.positional_embedding module

class cvnets.layers.positional_embedding.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: BaseLayer

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(seq_len: int, *args, **kwargs) Tensor[source]

Forward function.

class cvnets.layers.positional_embedding.LearnablePositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: Module

Learnable Positional embedding

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() None[source]
forward(seq_len: int, *args, **kwargs) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class cvnets.layers.positional_embedding.SinusoidalPositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: Module

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

get_weights() Tensor[source]

Build sinusoidal embeddings. Adapted from Fairseq.

forward(seq_len: int, *args, **kwargs) Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

cvnets.layers.positional_encoding module

class cvnets.layers.positional_encoding.SinusoidalPositionalEncoding(d_model: int, dropout: float | None = 0.0, max_len: int | None = 5000, channels_last: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer adds sinusoidal positional embeddings to a 3D input tensor. The code has been adapted from the PyTorch tutorial.

Parameters:
  • d_model (int) – dimension of the input tensor

  • dropout (Optional[float]) – Dropout rate. Default: 0.0

  • max_len (Optional[int]) – Max. number of patches (or seq. length). Default: 5000

  • channels_last (Optional[bool]) – Channels dimension is the last in the input tensor

Shape:
  • Input: \((N, C, P)\) or \((N, P, C)\) where \(N\) is the batch size, \(C\) is the embedding dimension,

    \(P\) is the number of patches

  • Output: same shape as the input
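
The embedding table itself is not shown above. A minimal sketch, assuming the standard sinusoidal formulation (sine on even channels, cosine on odd channels, wavelengths in a geometric progression with base 10000) and an even d_model; the function name is hypothetical.

```python
import math
from typing import List


def sinusoidal_table(max_len: int, d_model: int) -> List[List[float]]:
    """Table of shape (max_len, d_model) with one encoding row per position."""
    if d_model % 2:
        raise ValueError("this sketch assumes an even d_model")
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000.0 ** (i / d_model))
            table[pos][i] = math.sin(angle)      # even channel
            table[pos][i + 1] = math.cos(angle)  # odd channel
    return table


table = sinusoidal_table(max_len=16, d_model=4)

# Adding the first P rows to a (P, C) sequence leaves its shape unchanged.
tokens = [[1.0, 1.0, 1.0, 1.0] for _ in range(3)]
encoded = [[t + e for t, e in zip(tok, table[p])] for p, tok in enumerate(tokens)]
```

Because the table depends only on position and channel, it can be computed once for max_len positions and sliced for shorter sequences.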

__init__(d_model: int, dropout: float | None = 0.0, max_len: int | None = 5000, channels_last: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_patch_last(x, indices: Tensor | None = None, *args, **kwargs) Tensor[source]
forward_others(x, indices: Tensor | None = None, *args, **kwargs) Tensor[source]
forward(x, indices: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.

class cvnets.layers.positional_encoding.LearnablePositionEncoding(embed_dim: int, num_embeddings: int, dropout: float | None = 0.0, channels_last: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer adds learnable positional embeddings to a 3D input tensor.

Parameters:
  • embed_dim (int) – dimension of the input tensor

  • num_embeddings (int) – number of input embeddings. This is similar to vocab size in NLP.

  • dropout (Optional[float]) – Dropout rate. Default: 0.0

  • channels_last (Optional[bool]) – Channels dimension is the last in the input tensor

Shape:
  • Input: \((N, *, C, P)\) or \((N, *, P, C)\) where \(N\) is the batch size, \(C\) is the embedding dimension,

    \(P\) is the number of patches

  • Output: same shape as the input

__init__(embed_dim: int, num_embeddings: int, dropout: float | None = 0.0, channels_last: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x, *args, **kwargs) Tensor[source]

Forward function.

cvnets.layers.random_layers module

class cvnets.layers.random_layers.RandomApply(module_list: List, keep_p: float | None = 0.8, *args, **kwargs)[source]

Bases: BaseLayer

This layer randomly applies a list of modules during training.

Parameters:
  • module_list (List) – List of modules

  • keep_p (Optional[float]) – Fraction of modules from the list to keep during training. Default: 0.8 (or 80%)

__init__(module_list: List, keep_p: float | None = 0.8, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

cvnets.layers.single_head_attention module

class cvnets.layers.single_head_attention.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies single-head attention, as described in the DeLighT paper.

Parameters:
  • embed_dim (int) – \(C_{in}\) from an expected input of size \((N, P, C_{in})\)

  • attn_dropout (Optional[float]) – Attention dropout. Default: 0.0

  • bias (Optional[bool]) – Use bias or not. Default: True

Shape:
  • Input: \((N, P, C_{in})\) where \(N\) is batch size, \(P\) is number of patches, and \(C_{in}\) is input embedding dim

  • Output: same shape as the input
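
The core of single-head attention is scaled dot-product attention. The sketch below works on one sequence of nested lists and deliberately omits the learned query, key, value and output projections, dropout, and masks that the layer has, so it is a simplified illustration under those assumptions rather than the layer's computation. All names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(C)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_i in q:
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        weights = _softmax(scores)
        out.append(
            [sum(w * v_j[d] for w, v_j in zip(weights, v)) for d in range(len(v[0]))]
        )
    return out


# Self-attention: queries, keys, and values all come from the same (P, C) input.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
self_attended = single_head_attention(x, x, x)

# Identical keys give uniform weights, so the output is the mean of the values.
averaged = single_head_attention([[1.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 3.0], [3.0, 5.0]])
```

For cross-attention, k and v would come from a second sequence (x_kv in the forward signature) while q comes from x_q; the output still has one row per query.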

__init__(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.

cvnets.layers.softmax module

class cvnets.layers.softmax.Softmax(dim: int | None = -1, *args, **kwargs)[source]

Bases: Softmax

Applies the Softmax function to an input tensor along the specified dimension

Parameters:

dim (int) – Dimension along which softmax is applied. Default: -1

Shape:
  • Input: \((*)\) where \(*\) is one or more dimensions

  • Output: same shape as the input
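
The role of dim is easiest to see on a 2D input, where softmax can normalize either each row or each column. A small sketch on nested lists; the helper names are hypothetical, and the max-subtraction step is a common stability trick rather than something stated above.

```python
import math
from typing import List


def softmax_1d(values: List[float]) -> List[float]:
    """Softmax of a flat list; outputs are positive and sum to 1."""
    m = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix: List[List[float]], dim: int = -1) -> List[List[float]]:
    """Softmax over rows (dim=-1 or 1) or columns (dim=0 or -2) of a 2D list."""
    if dim in (-1, 1):
        return [softmax_1d(row) for row in matrix]
    if dim in (0, -2):
        cols = [softmax_1d(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("dim out of range for a 2D input")


by_row = softmax_2d([[0.0, 0.0], [1000.0, 1000.0]])        # default dim=-1
by_col = softmax_2d([[0.0, 1.0], [0.0, 1.0]], dim=0)
```

The second row of the first example would overflow a naive exp(1000.0); subtracting the row maximum first keeps the computation finite.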

__init__(dim: int | None = -1, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.stochastic_depth module

class cvnets.layers.stochastic_depth.StochasticDepth(p: float, mode: str)[source]

Bases: StochasticDepth

Implements stochastic depth, from “Deep Networks with Stochastic Depth”, used for randomly dropping residual branches of residual architectures.

__init__(p: float, mode: str) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.token_merging module

class cvnets.layers.token_merging.TokenMerging(dim: int, window: int = 2)[source]

Bases: Module

Merge tokens from a [batch_size, sequence_length, num_channels] tensor using a linear projection.

This function also updates masks and adds padding as needed to make the sequence length divisible by the window size before merging tokens.

Parameters:
  • dim – Number of input channels.

  • window – The size of the window to merge into a single token.

__init__(dim: int, window: int = 2) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, key_padding_mask: Tensor) Tuple[Tensor, Tensor][source]

Perform token merging.

Parameters:
  • x – A tensor of shape [batch_size, sequence_length, num_channels].

  • key_padding_mask – A tensor of shape [batch_size, sequence_length] with “-inf” values at mask tokens, and “0” values at unmasked tokens.

Returns:

A tensor of shape [batch_size, math.ceil(sequence_length / self.window), num_channels], where @self.window is the window size.
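To make the output length concrete, the following sketch computes the merged sequence length and groups consecutive tokens. It assumes that the tokens in a window are concatenated channel-wise before the linear projection; the documentation above states only that a linear projection is used, so treat the grouping step and both helper names as hypothetical.

```python
import math
from typing import List


def merged_length(sequence_length: int, window: int) -> int:
    """Number of tokens after merging, as given in the Returns section."""
    return math.ceil(sequence_length / window)


def group_tokens(tokens: List[List[float]], window: int) -> List[List[float]]:
    """Concatenate each run of `window` tokens (length must be divisible)."""
    if len(tokens) % window != 0:
        raise ValueError("pad the sequence to a multiple of window first")
    return [
        [value for token in tokens[i:i + window] for value in token]
        for i in range(0, len(tokens), window)
    ]
```

With window=2, a sequence of 5 tokens is first padded to 6 and then merged into 3 tokens.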

extra_repr() str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

cvnets.layers.token_merging.pad_x_and_mask(x: Tensor, key_padding_mask: Tensor, window_size: int) Tuple[Tensor, Tensor][source]

Apply padding to @x and @key_padding_mask to make their lengths divisible by @window_size.

Parameters:
  • x – The input tensor of shape [B, N, C].

  • key_padding_mask – The mask of shape [B, N].

  • window_size – The N dimension of @x and @key_padding_mask will be padded to make it divisible by this number.

Returns:

A tuple containing @x and @key_padding_mask, with padding applied.
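A minimal sketch of the padding arithmetic for a single sequence follows. The fill values are assumptions: padded tokens are set to zero and padded mask entries to -inf (the "masked" value described for key_padding_mask in TokenMerging.forward). The name pad_tokens_and_mask is hypothetical.

```python
import math
from typing import List, Tuple


def pad_tokens_and_mask(
    x: List[List[float]], key_padding_mask: List[float], window_size: int
) -> Tuple[List[List[float]], List[float]]:
    """Pad one [N, C] sequence and its [N] mask to a multiple of window_size."""
    n = len(x)
    channels = len(x[0]) if x else 0
    padded_len = math.ceil(n / window_size) * window_size
    extra = padded_len - n
    # Assumption: zero tokens are appended and marked as masked (-inf).
    x_padded = [list(t) for t in x] + [[0.0] * channels for _ in range(extra)]
    mask_padded = list(key_padding_mask) + [float("-inf")] * extra
    return x_padded, mask_padded
```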

cvnets.layers.upsample module

class cvnets.layers.upsample.UpSample(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs)[source]

Bases: Upsample

This layer upsamples a given input tensor.

Parameters:
  • size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None

  • scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None

  • mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic', or 'trilinear'). Default: 'nearest'

  • align_corners (Optional[bool]) – If True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None

Shape:
  • Input: \((N, C, W_{in})\) or \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)

  • Output: \((N, C, W_{out})\) or \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\)

__init__(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
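The following one-dimensional sketch illustrates the default 'nearest' mode with a scale_factor. It is a simplified pure-Python stand-in (the helper name upsample_nearest_1d is hypothetical) and assumes the output length is floor(input length × scale_factor), as for the underlying Upsample layer.

```python
import math
from typing import List


def upsample_nearest_1d(signal: List[float], scale_factor: float) -> List[float]:
    """Nearest-neighbour upsampling of a 1D signal."""
    out_len = math.floor(len(signal) * scale_factor)
    last = len(signal) - 1
    # Each output position copies the input sample it maps back onto.
    return [signal[min(int(i / scale_factor), last)] for i in range(out_len)]
```

For example, upsampling [1, 2, 3] by a factor of 2 repeats every sample twice.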

Module contents

class cvnets.layers.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 1
module_cls

alias of Conv1d

training: bool
class cvnets.layers.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 2
module_cls

alias of Conv2d

training: bool
class cvnets.layers.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)[source]

Bases: _BaseConvNormActLayer

ndim = 3
module_cls

alias of Conv3d

training: bool
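
The spatial output size of these convolution layers can be estimated with the standard convolution formula. The sketch below is not library code: conv_output_size is a hypothetical helper, and the handling of padding=None (a "same"-style padding of ((kernel_size - 1) // 2) × dilation) is an assumption about how the default is resolved.

```python
from typing import Optional


def conv_output_size(size: int, kernel_size: int, stride: int = 1,
                     dilation: int = 1, padding: Optional[int] = None) -> int:
    """Output length along one spatial dimension of a convolution."""
    if padding is None:
        # Assumption: automatic padding that preserves size when stride == 1.
        padding = ((kernel_size - 1) // 2) * dilation
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
```

Under that assumption, a 3×3 convolution with stride 2 halves a 224-pixel dimension to 112.
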
class cvnets.layers.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer1d

class cvnets.layers.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer2d

class cvnets.layers.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)[source]

Bases: _BaseSeparableConv

conv_layer_cls

alias of ConvLayer3d
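To show why separable convolutions are attractive, this sketch compares weight counts for the 2D case. It assumes the usual factorization into a depthwise convolution followed by a 1×1 pointwise convolution, with no bias terms; both function names are hypothetical.

```python
def standard_conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a dense 2D convolution (no bias)."""
    return in_channels * out_channels * kernel_size * kernel_size


def separable_conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a depthwise (k x k per channel) plus pointwise (1 x 1) pair."""
    depthwise = in_channels * kernel_size * kernel_size
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

For 32 input channels, 64 output channels, and a 3×3 kernel, this gives 18432 weights for the dense layer against 2336 for the separable one.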

class cvnets.layers.NormActLayer(opts, num_features, *args, **kwargs)[source]

Bases: BaseLayer

Applies a normalization layer followed by an activation layer.

Parameters:
  • opts – Command-line arguments.

  • num_features – \(C\) from an expected input of size \((N, C, H, W)\).

Shape:
  • Input: \((N, C, H, W)\).

  • Output: \((N, C, H, W)\).

__init__(opts, num_features, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

class cvnets.layers.TransposeConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

Applies a 2D transposed convolution (also known as deconvolution) over an input.

Parameters:
  • opts – Command line arguments.

  • in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).

  • out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).

  • kernel_size – Kernel size for convolution.

  • stride – Stride for convolution. Default: 1.

  • dilation – Dilation rate for convolution. Default: 1.

  • groups – Number of groups in convolution. Default: 1.

  • bias – Use bias. Default: False.

  • padding_mode – Padding mode. Default: zeros.

  • use_norm – Use normalization layer after convolution. Default: True.

  • use_act – Use activation layer after convolution (or convolution and normalization). Default: True.

  • padding – Padding applied to both sides of each spatial dimension of the input. Default: (0, 0).

  • output_padding – Additional padding on the output tensor. Default: None.

  • auto_padding – Compute padding automatically. Default: True.

Shape:
  • Input: \((N, C_{in}, H_{in}, W_{in})\).

  • Output: \((N, C_{out}, H_{out}, W_{out})\).

__init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.
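The relationship between \(H_{in}\) and \(H_{out}\) for a transposed convolution can be sketched with the standard formula below. The helper transpose_conv_output_size is hypothetical, and it takes explicit padding values; how auto_padding derives them is not covered here.

```python
def transpose_conv_output_size(size: int, kernel_size: int, stride: int = 1,
                               padding: int = 0, output_padding: int = 0,
                               dilation: int = 1) -> int:
    """Output length along one spatial dimension of a transposed convolution."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)
```

With kernel_size=4, stride=2, and padding=1, a 16-pixel dimension is upsampled to 32.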

class cvnets.layers.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a linear transformation to the input data

Parameters:
  • in_features (int) – number of features in the input tensor

  • out_features (int) – number of features in the output tensor

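  • bias (Optional[bool]) – Use bias or not. Default: True

  • channel_first (Optional[bool]) – Whether channels are the first (True) or the last (False) dimension after the batch dimension. If channels are first, Conv2d is used. Default: False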

Shape:
  • Input: \((N, *, C_{in})\) if not channel_first else \((N, C_{in}, *)\) where \(*\) means any number of dimensions.

  • Output: \((N, *, C_{out})\) if not channel_first else \((N, C_{out}, *)\)

__init__(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

reset_params()[source]
forward(x: Tensor) Tensor[source]

Forward function.
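The Shape section can be read as the following rule, sketched here with a hypothetical helper (linear_output_shape is not part of cvnets): the feature dimension is the last one by default and dimension 1 when channel_first is set.

```python
from typing import Tuple


def linear_output_shape(input_shape: Tuple[int, ...], in_features: int,
                        out_features: int, channel_first: bool = False) -> Tuple[int, ...]:
    """Replace the feature dimension of input_shape with out_features."""
    axis = 1 if channel_first else len(input_shape) - 1
    if input_shape[axis] != in_features:
        raise ValueError("feature dimension does not match in_features")
    shape = list(input_shape)
    shape[axis] = out_features
    return tuple(shape)
```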

class cvnets.layers.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

Applies a GroupLinear transformation layer, as defined here, here and here

Parameters:
  • in_features (int) – number of features in the input tensor

  • out_features (int) – number of features in the output tensor

  • n_groups (int) – number of groups

  • bias (Optional[bool]) – use bias or not

  • feature_shuffle (Optional[bool]) – Shuffle features between groups

Shape:
  • Input: \((N, *, C_{in})\)

  • Output: \((N, *, C_{out})\)

__init__(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

reset_params()[source]
forward(x: Tensor) Tensor[source]

Forward function.
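As a rough illustration of a grouped linear transformation, the sketch below splits one feature vector into n_groups chunks and transforms each with its own weight matrix (bias omitted). The interleaving used for feature_shuffle is an assumption about how features are mixed between groups, and group_linear is a hypothetical helper rather than the layer's implementation.

```python
from typing import List

Matrix = List[List[float]]


def group_linear(x: List[float], weights: List[Matrix],
                 feature_shuffle: bool = False) -> List[float]:
    """Apply one [in/g, out/g] weight matrix per group to a feature vector."""
    n_groups = len(weights)
    group_in = len(x) // n_groups
    outputs = []
    for g, w in enumerate(weights):
        chunk = x[g * group_in:(g + 1) * group_in]
        group_out = len(w[0])
        outputs.append([
            sum(chunk[i] * w[i][j] for i in range(group_in))
            for j in range(group_out)
        ])
    if feature_shuffle:
        # Assumption: interleave so neighbouring features come from different groups.
        return [outputs[g][j] for j in range(len(outputs[0])) for g in range(n_groups)]
    return [v for out in outputs for v in out]


def group_linear_weight_count(in_features: int, out_features: int, n_groups: int) -> int:
    """Grouping divides the dense weight count by n_groups."""
    return in_features * out_features // n_groups
```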

class cvnets.layers.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies global pooling over a 4D or 5D input tensor

Parameters:
  • pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean

  • keep_dim (Optional[bool]) – Do not squeeze the dimensions of a tensor. Default: False

Shape:
  • Input: \((N, C, H, W)\) or \((N, C, D, H, W)\)

  • Output: \((N, C, 1, 1)\) or \((N, C, 1, 1, 1)\) if keep_dim else \((N, C)\)

pool_types = ['mean', 'rms', 'abs']
__init__(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add layer specific arguments

forward(x: Tensor) Tensor[source]

Forward function.
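The three pool types can be sketched for the spatial values of a single channel as follows. This reads 'rms' as the conventional root mean square and 'abs' as the mean absolute value; those readings are assumptions, and the library's exact formulas may differ. The helper global_pool is hypothetical.

```python
import math
from typing import List


def global_pool(values: List[float], pool_type: str = "mean") -> float:
    """Reduce all spatial values of one channel to a single number."""
    n = len(values)
    if pool_type == "mean":
        return sum(values) / n
    if pool_type == "abs":
        return sum(abs(v) for v in values) / n
    if pool_type == "rms":
        return math.sqrt(sum(v * v for v in values) / n)
    raise ValueError("pool_type must be one of 'mean', 'rms', 'abs'")
```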

training: bool
class cvnets.layers.Identity[source]

Bases: BaseLayer

This is a placeholder layer that returns its input tensor unchanged.

__init__()[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) Tensor[source]

Forward function.

class cvnets.layers.PixelShuffle(upscale_factor: int, *args, **kwargs)[source]

Bases: PixelShuffle

Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\), where r is an upscale factor.

Parameters:

upscale_factor (int) – factor to increase spatial resolution by

Shape:
  • Input: \((*, C \times r^2, H, W)\), where * is zero or more dimensions

  • Output: \((*, C, H \times r, W \times r)\)

__init__(upscale_factor: int, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
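The shape rule above can be checked with a small helper. pixel_shuffle_shape is a hypothetical function that only computes the output shape; it does not rearrange any data.

```python
from typing import Tuple


def pixel_shuffle_shape(shape: Tuple[int, ...], upscale_factor: int) -> Tuple[int, ...]:
    """Map (*, C * r^2, H, W) to (*, C, H * r, W * r)."""
    *lead, channels, height, width = shape
    r2 = upscale_factor * upscale_factor
    if channels % r2 != 0:
        raise ValueError("channel count must be divisible by upscale_factor ** 2")
    return (*lead, channels // r2, height * upscale_factor, width * upscale_factor)
```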

class cvnets.layers.UpSample(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs)[source]

Bases: Upsample

This layer upsamples a given input tensor.

Parameters:
  • size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None

  • scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None

  • mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic', or 'trilinear'). Default: 'nearest'

  • align_corners (Optional[bool]) – If True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None

Shape:
  • Input: \((N, C, W_{in})\) or \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)

  • Output: \((N, C, W_{out})\) or \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\)

__init__(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)[source]

Bases: MaxPool2d

Applies a 2D max pooling over a 4D input tensor.

Parameters:
  • kernel_size (Optional[int]) – The size of the window to take a max over. Default: 3

  • stride (Optional[int]) – The stride of the window. Default: 2

  • padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1

Shape:
  • Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H_{in}\) is the input height, and \(W_{in}\) is the input width

  • Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is the output width

__init__(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
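With the defaults shown in the signature (kernel_size=3, stride=2, padding=1), the layer roughly halves each spatial dimension. The sketch below applies the standard pooling output-size formula, assuming a dilation of 1 and floor rounding; max_pool_output_size is a hypothetical helper.

```python
def max_pool_output_size(size: int, kernel_size: int = 3,
                         stride: int = 2, padding: int = 1) -> int:
    """Output length along one spatial dimension (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1
```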

class cvnets.layers.AvgPool2d(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]

Bases: AvgPool2d

Applies a 2D average pooling over a 4D input tensor.

Parameters:
  • kernel_size (tuple) – The size of the window to take an average over

  • stride (Optional[tuple]) – The stride of the window. Default: None

  • padding (Optional[tuple]) – Padding to be added on both sides of the tensor. Default: (0, 0)

  • ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False

  • count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True

  • divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None

Shape:
  • Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H_{in}\) is the input height, and \(W_{in}\) is the input width

  • Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is the output width

__init__(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
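The effect of count_include_pad and divisor_override is easiest to see in one dimension. The sketch below is a simplified stand-in that assumes ceil_mode=False and that stride=None falls back to the kernel size, as in the underlying AvgPool2d; avg_pool_1d is a hypothetical helper.

```python
from typing import List, Optional


def avg_pool_1d(signal: List[float], kernel_size: int, stride: Optional[int] = None,
                padding: int = 0, count_include_pad: bool = True,
                divisor_override: Optional[int] = None) -> List[float]:
    """1D average pooling with zero padding (ceil_mode=False)."""
    stride = kernel_size if stride is None else stride
    n = len(signal)
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        if divisor_override is not None:
            divisor = divisor_override
        elif count_include_pad:
            divisor = kernel_size
        else:
            # Count only positions that overlap the unpadded signal.
            divisor = min(start + kernel_size, padding + n) - max(start, padding)
        out.append(sum(window) / divisor)
    return out
```

For the signal [2, 4, 6, 8] with kernel_size=2 and padding=1, including the padding in the average gives [1, 5, 4], while excluding it gives [2, 5, 8].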

class cvnets.layers.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)[source]

Bases: Dropout

This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

Parameters:
  • p – probability of an element to be zeroed. Default: 0.5

  • inplace – If set to True, will do this operation in-place. Default: False

Shape:
  • Input: \((N, *)\) where \(N\) is the batch size

  • Output: same as the input

__init__(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
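A minimal sketch of the training-time behaviour follows. It assumes the usual "inverted dropout" convention of the underlying Dropout layer, in which surviving elements are scaled by 1 / (1 - p) so that the expected value is unchanged; dropout here is a hypothetical pure-Python helper.

```python
import random
from typing import List, Optional


def dropout(values: List[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each element with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(values)          # identity at inference time
    if p == 1.0:
        return [0.0] * len(values)   # everything is dropped
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]
```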

class cvnets.layers.Dropout2d(p: float = 0.5, inplace: bool = False)[source]

Bases: Dropout2d

This layer, during training, randomly zeroes entire channels of the 4D input tensor with probability p using samples from a Bernoulli distribution.

Parameters:
  • p – probability of a channel to be zeroed. Default: 0.5

  • inplace – If set to True, will do this operation in-place. Default: False

Shape:
  • Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the input channels,

    \(H\) is the input tensor height, and \(W\) is the input tensor width

  • Output: same as the input

__init__(p: float = 0.5, inplace: bool = False)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.AdjustBatchNormMomentum(opts, *args, **kwargs)[source]

Bases: object

This class enables adjusting the momentum of batch normalization layers.

Note

It’s an experimental feature and should be used with caution.

round_places = 6
__init__(opts, *args, **kwargs)[source]
adjust_momentum(model: Module, iteration: int, epoch: int) None[source]
class cvnets.layers.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Bases: Flatten

This layer flattens a contiguous range of dimensions into a tensor.

Parameters:
  • start_dim (Optional[int]) – first dim to flatten. Default: 1

  • end_dim (Optional[int]) – last dim to flatten. Default: -1

Shape:
  • Input: \((*, S_{\text{start}}, ..., S_{i}, ..., S_{\text{end}}, *)\), where \(S_{i}\) is the size at dimension \(i\) and \(*\) means any number of dimensions including none.

  • Output: \((*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)\).

__init__(start_dim: int | None = 1, end_dim: int | None = -1)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.
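The output shape given above can be computed with a short helper. flatten_shape is a hypothetical function that mirrors the shape rule only; it does not move any data.

```python
import math
from typing import Tuple


def flatten_shape(shape: Tuple[int, ...], start_dim: int = 1,
                  end_dim: int = -1) -> Tuple[int, ...]:
    """Collapse dimensions start_dim..end_dim (inclusive) into one."""
    ndim = len(shape)
    start, end = start_dim % ndim, end_dim % ndim  # resolve negative indices
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    merged = math.prod(shape[start:end + 1])
    return tuple(shape[:start]) + (merged,) + tuple(shape[end + 1:])
```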

class cvnets.layers.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies multi-head self- or cross-attention, as described in the “Attention Is All You Need” paper

Parameters:
  • embed_dim (int) – \(C_{in}\) from an expected input of size \((N, S, C_{in})\)

  • num_heads (int) – Number of heads in multi-head attention

  • attn_dropout (Optional[float]) – Attention dropout. Default: 0.0

  • bias (Optional[bool]) – Use bias or not. Default: True

Shape:
  • Input:
    • Query tensor (x_q): \((N, S, C_{in})\) where \(N\) is the batch size, \(S\) is the number of source tokens, and \(C_{in}\) is the input embedding dimension

    • Optional key-value tensor (x_kv): \((N, T, C_{in})\) where \(T\) is the number of target tokens

  • Output: same shape as the input

__init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward_pytorch(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor[source]
forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.
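The bookkeeping behind embed_dim and num_heads can be sketched as follows. This assumes the standard multi-head formulation, in which the embedding is split evenly across heads and attention scores are scaled by head_dim ** -0.5; both helper names are hypothetical.

```python
from typing import Optional, Tuple


def head_dim_and_scale(embed_dim: int, num_heads: int) -> Tuple[int, float]:
    """Per-head dimension and the usual 1/sqrt(head_dim) score scaling."""
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    head_dim = embed_dim // num_heads
    return head_dim, head_dim ** -0.5


def attention_score_shape(batch: int, num_heads: int, src_tokens: int,
                          tgt_tokens: Optional[int] = None) -> Tuple[int, int, int, int]:
    """Shape of the score matrix; self-attention when tgt_tokens is None."""
    tgt = src_tokens if tgt_tokens is None else tgt_tokens
    return (batch, num_heads, src_tokens, tgt)
```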

class cvnets.layers.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies single-head attention, as described in the DeLighT paper

Parameters:
  • embed_dim (int) – \(C_{in}\) from an expected input of size \((N, P, C_{in})\)

  • attn_dropout (Optional[float]) – Attention dropout. Default: 0.0

  • bias (Optional[bool]) – Use bias or not. Default: True

Shape:
  • Input: \((N, P, C_{in})\) where \(N\) is the batch size, \(P\) is the number of patches, and \(C_{in}\) is the input embedding dimension

  • Output: same shape as the input

__init__(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.
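For intuition, here is a pure-Python sketch of scaled dot-product attention with a single head. It deliberately leaves out the learned query, key, value, and output projections as well as dropout and masking, so it shows the core computation only and is not the layer's implementation; single_head_attention is a hypothetical helper.

```python
import math
from typing import List

Vectors = List[List[float]]


def single_head_attention(queries: Vectors, keys: Vectors, values: Vectors) -> Vectors:
    """softmax(Q K^T / sqrt(d)) V for one sequence and one head."""
    scale = len(queries[0]) ** -0.5
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        shift = max(scores)  # stabilise the softmax
        exps = [math.exp(s - shift) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out
```

The output has one vector per query token, which is why self-attention preserves the input shape.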

class cvnets.layers.Softmax(dim: int | None = -1, *args, **kwargs)[source]

Bases: Softmax

Applies the Softmax function to an input tensor along the specified dimension

Parameters:

dim (int) – Dimension along which softmax is applied. Default: -1

Shape:
  • Input: \((*)\) where \(*\) is one or more dimensions

  • Output: same shape as the input

__init__(dim: int | None = -1, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

class cvnets.layers.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]

Bases: BaseLayer

This layer applies self-attention with linear complexity, as described in the MobileViTv2 paper. This layer can be used for self- as well as cross-attention.

Parameters:
  • opts – command line arguments

  • embed_dim (int) – \(C\) from an expected input of size \((N, C, H, W)\)

  • attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0

  • bias (Optional[bool]) – Use bias in learnable layers. Default: True

Shape:
  • Input: \((B, C, P, N)\) where \(B\) is the batch size, \(C\) is the number of input channels, \(P\) is the number of pixels in a patch, and \(N\) is the number of patches

  • Output: same as the input

Note

For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.

__init__(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

static visualize_context_scores(context_scores)[source]
forward(x: Tensor, x_prev: Tensor | None = None, *args, **kwargs) Tensor[source]

Forward function.
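The linear-complexity idea can be sketched for one pixel position across N patches. The sketch assumes the separable self-attention formulation from the MobileViTv2 paper: a scalar query score per patch is turned into context scores with a softmax, the keys are summed into one context vector, and that vector gates the ReLU-activated values. The learned point-wise projections and dropout are omitted, and linear_self_attention is a hypothetical helper.

```python
import math
from typing import List

Vectors = List[List[float]]


def linear_self_attention(query_scores: List[float], keys: Vectors,
                          values: Vectors) -> Vectors:
    """Cost grows linearly with the number of patches N."""
    shift = max(query_scores)
    exps = [math.exp(s - shift) for s in query_scores]
    total = sum(exps)
    context_scores = [e / total for e in exps]           # one weight per patch
    dim = len(keys[0])
    context_vector = [
        sum(cs * k[c] for cs, k in zip(context_scores, keys))
        for c in range(dim)
    ]                                                    # a single d-dim summary
    return [[max(v[c], 0.0) * context_vector[c] for c in range(dim)] for v in values]
```

No N×N score matrix is ever formed, unlike in standard multi-head attention.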

class cvnets.layers.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Bases: Embedding

A lookup table that stores embeddings of a fixed dictionary and size.

Parameters:
  • num_embeddings (int) – size of the dictionary of embeddings

  • embedding_dim (int) – the size of each embedding vector

  • padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.

Shape:
  • Input: \((*)\), IntTensor or LongTensor of arbitrary shape containing the indices to extract

  • Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\)

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters() None[source]
class cvnets.layers.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Bases: BaseLayer

__init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(seq_len: int, *args, **kwargs) Tensor[source]

Forward function.

class cvnets.layers.StochasticDepth(p: float, mode: str)[source]

Bases: StochasticDepth

Implements Stochastic Depth, from “Deep Networks with Stochastic Depth”, which randomly drops the residual branches of residual architectures.

__init__(p: float, mode: str) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

cvnets.layers.get_normalization_layer(opts: Namespace, num_features: int, norm_type: str | None = None, num_groups: int | None = None, momentum: float | None = None) Module

Helper function to build the normalization layer. The function can be used in either of the following ways:

  • Scenario 1: Set the default normalization layer using command-line arguments. This is useful when the same normalization layer is used for the entire network (e.g., ResNet).

  • Scenario 2: The network uses different normalization layers. In that case, override the default normalization layer by specifying its name with the norm_type argument.
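The two scenarios amount to a simple precedence rule, sketched below. The option key "model.normalization.name", the fallback "batch_norm", and the helper resolve_norm_type are all assumptions made for illustration; consult the source for the actual option names.

```python
import argparse
from typing import Optional


def resolve_norm_type(opts: argparse.Namespace, norm_type: Optional[str] = None,
                      default: str = "batch_norm") -> str:
    """An explicit norm_type wins; otherwise use the command-line default."""
    if norm_type is not None:
        return norm_type  # Scenario 2: per-layer override
    # Scenario 1: network-wide default taken from the parsed options.
    # Assumption: the option is stored under this dotted attribute name.
    return getattr(opts, "model.normalization.name", default)


opts = argparse.Namespace()
setattr(opts, "model.normalization.name", "layer_norm")
```

Here resolve_norm_type(opts) would pick the network-wide setting, while resolve_norm_type(opts, norm_type="group_norm") overrides it for a single layer.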