cvnets.layers package
Subpackages
- cvnets.layers.activation package
  - Submodules
    - cvnets.layers.activation.gelu module
    - cvnets.layers.activation.hard_sigmoid module
    - cvnets.layers.activation.hard_swish module
    - cvnets.layers.activation.leaky_relu module
    - cvnets.layers.activation.prelu module
    - cvnets.layers.activation.relu module
    - cvnets.layers.activation.relu6 module
    - cvnets.layers.activation.sigmoid module
    - cvnets.layers.activation.swish module
    - cvnets.layers.activation.tanh module
  - Module contents
- cvnets.layers.normalization package
Submodules
cvnets.layers.adaptive_pool module
- class cvnets.layers.adaptive_pool.AdaptiveAvgPool2d(output_size: int | Tuple[int, int] = 1, *args, **kwargs)
Bases: AdaptiveAvgPool2d
Applies a 2D adaptive average pooling over an input tensor.
- Parameters:
output_size (Optional, int or Tuple[int, int]) – The target output size. If a single int \(h\) is passed, a square output of size \(h \times h\) is produced. If a tuple \((h, w)\) is passed, an output of size \(h \times w\) is produced. Default: 1.
- Shape:
Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the number of input channels, \(H\) is the input height, and \(W\) is the input width
Output: \((N, C, h, h)\) or \((N, C, h, w)\)
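To make the output-size behaviour concrete, here is a minimal pure-Python sketch (not the cvnets implementation) of adaptive average pooling on a single \(H \times W\) plane. It assumes the bin boundaries used by PyTorch's `nn.AdaptiveAvgPool2d`, where output cell \(i\) averages input indices from \(\lfloor iH/h \rfloor\) up to (but excluding) \(\lceil (i+1)H/h \rceil\); the helper names are hypothetical.

```python
from typing import List, Tuple, Union


def adaptive_bins(in_size: int, out_size: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) input ranges, one per output cell (bins may overlap)."""
    return [
        ((i * in_size) // out_size, -((-(i + 1) * in_size) // out_size))  # floor, ceil
        for i in range(out_size)
    ]


def adaptive_avg_pool2d(
    plane: List[List[float]], output_size: Union[int, Tuple[int, int]] = 1
) -> List[List[float]]:
    """Average-pool one (H, W) plane down to (h, w) cells."""
    h, w = (output_size, output_size) if isinstance(output_size, int) else output_size
    row_bins = adaptive_bins(len(plane), h)
    col_bins = adaptive_bins(len(plane[0]), w)
    out = []
    for r0, r1 in row_bins:
        out_row = []
        for c0, c1 in col_bins:
            window = [plane[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


x = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4x4 plane holding 0..15
print(adaptive_avg_pool2d(x))     # [[7.5]]  (default output_size=1 is a global average)
print(adaptive_avg_pool2d(x, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```

When \(H\) is not divisible by \(h\), neighbouring bins overlap by one element (for example, `adaptive_bins(5, 3)` gives `[(0, 2), (1, 4), (3, 5)]`).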
cvnets.layers.base_layer module
- class cvnets.layers.base_layer.BaseLayer(*args, **kwargs)
Bases: Module
Base class for neural network layers. Subclasses must implement the forward function.
- __init__(*args, **kwargs) → None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- classmethod add_arguments(parser: ArgumentParser) → ArgumentParser
Adds layer-specific arguments to the parser.
- get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs) → Tuple[List[Dict], List[float]]
Get parameters for training along with the learning rate.
- Parameters:
weight_decay – Weight decay value. Default: 0.0.
no_decay_bn_filter_bias – Do not apply weight decay to normalization-layer parameters and biases. Default: False.
- Returns:
A tuple of length 2. The first entry is a list of dictionaries with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter group.
Note
The learning rate multiplier is set to 1.0 here because it is handled inside the Central Model.
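The return structure described above can be illustrated with a small, framework-free sketch. It is an assumption, not the cvnets logic: here "BN and biases" are identified as the 1-D parameters, parameter names stand in for the tensors in `params`, and the helper `split_param_groups` is hypothetical.

```python
from typing import Dict, List, Tuple


def split_param_groups(
    named_shapes: List[Tuple[str, Tuple[int, ...]]],
    weight_decay: float = 0.0,
    no_decay_bn_filter_bias: bool = False,
) -> Tuple[List[Dict], List[float]]:
    """Group parameters into decay / no-decay sets (names stand in for tensors)."""
    decay: List[str] = []
    no_decay: List[str] = []
    for name, shape in named_shapes:
        # Assumption: normalization scales/shifts and biases are exactly the 1-D parameters.
        if no_decay_bn_filter_bias and len(shape) == 1:
            no_decay.append(name)
        else:
            decay.append(name)
    groups: List[Dict] = []
    if decay:
        groups.append({"params": decay, "weight_decay": weight_decay, "param_names": decay})
    if no_decay:
        groups.append({"params": no_decay, "weight_decay": 0.0, "param_names": no_decay})
    # One learning-rate multiplier of 1.0 per group, matching the note above.
    return groups, [1.0] * len(groups)


shapes = [("conv.weight", (16, 3, 3, 3)), ("bn.weight", (16,)), ("bn.bias", (16,))]
groups, lr_mults = split_param_groups(shapes, weight_decay=1e-4, no_decay_bn_filter_bias=True)
for g in groups:
    print(g["param_names"], g["weight_decay"])
print(lr_mults)  # [1.0, 1.0]
```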
cvnets.layers.conv_layer module
- class cvnets.layers.conv_layer.Conv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] | None = 1, padding: int | Tuple[int, int] | None = 0, dilation: int | Tuple[int, int] | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', *args, **kwargs)
Bases: Conv2d
Applies a 2D convolution over an input.
- Parameters:
in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).
out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
padding – Padding for convolution. Default: 0.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode ('zeros', 'reflect', 'replicate' or 'circular'). Default: 'zeros'.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
act_name – Use a specific activation function. Overrides the one specified in the command-line arguments.
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\).
Output: \((N, C_{out}, H_{out}, W_{out})\).
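The spatial output size follows the standard convolution arithmetic of PyTorch's `nn.Conv2d`, computed independently for height and width. A minimal sketch (the helper names are hypothetical):

```python
from typing import Tuple, Union

IntPair = Union[int, Tuple[int, int]]


def _pair(v: IntPair) -> Tuple[int, int]:
    return (v, v) if isinstance(v, int) else v


def conv2d_out_hw(hw_in: Tuple[int, int], kernel_size: IntPair, stride: IntPair = 1,
                  padding: IntPair = 0, dilation: IntPair = 1) -> Tuple[int, int]:
    """Per dimension: floor((in + 2*pad - dil*(k - 1) - 1) / stride) + 1."""
    k, s, p, d = _pair(kernel_size), _pair(stride), _pair(padding), _pair(dilation)
    return tuple(
        (hw_in[i] + 2 * p[i] - d[i] * (k[i] - 1) - 1) // s[i] + 1 for i in range(2)
    )


print(conv2d_out_hw((224, 224), kernel_size=3, stride=2, padding=1))  # (112, 112)
print(conv2d_out_hw((32, 32), kernel_size=3, padding=1))              # (32, 32)
```

With `kernel_size=3`, `padding=1` and `stride=1`, the spatial size is preserved, which is why that combination is common in conv-norm-act blocks.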
- __init__(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] | None = 1, padding: int | Tuple[int, int] | None = 0, dilation: int | Tuple[int, int] | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', *args, **kwargs) → None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class cvnets.layers.conv_layer.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 1
- module_cls
alias of Conv1d
- class cvnets.layers.conv_layer.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 2
- class cvnets.layers.conv_layer.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 3
- module_cls
alias of Conv3d
- class cvnets.layers.conv_layer.TransposeConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)
Bases: BaseLayer
Applies a 2D transposed convolution (also known as deconvolution) over an input.
- Parameters:
opts – Command line arguments.
in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).
out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode. Default: 'zeros'.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
padding – Padding applied on both sides of each dimension in the input. Default: (0, 0).
output_padding – Additional padding on the output tensor. Default: None.
auto_padding – Compute padding automatically. Default: True.
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\).
Output: \((N, C_{out}, H_{out}, W_{out})\).
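For reference, the output size of a transposed convolution, as defined for PyTorch's `nn.ConvTranspose2d`, is computed per spatial dimension below. This sketch ignores `auto_padding`, whose exact rule is not described here; the helper name is hypothetical.

```python
def conv_transpose_out_size(size_in: int, kernel_size: int, stride: int = 1,
                            padding: int = 0, output_padding: int = 0,
                            dilation: int = 1) -> int:
    """(in - 1)*stride - 2*padding + dilation*(k - 1) + output_padding + 1."""
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# A common 2x upsampling configuration: kernel 4, stride 2, padding 1.
print(conv_transpose_out_size(16, kernel_size=4, stride=2, padding=1))  # 32
# Kernel 3 with stride 2 needs output_padding=1 to exactly double the size.
print(conv_transpose_out_size(16, kernel_size=3, stride=2, padding=1, output_padding=1))  # 32
```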
- __init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class cvnets.layers.conv_layer.NormActLayer(opts, num_features, *args, **kwargs)
Bases: BaseLayer
Applies a normalization layer followed by an activation layer.
- Parameters:
opts – Command-line arguments.
num_features – \(C\) from an expected input of size \((N, C, H, W)\).
- Shape:
Input: \((N, C, H, W)\).
Output: \((N, C, H, W)\).
- class cvnets.layers.conv_layer.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer1d
- class cvnets.layers.conv_layer.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer2d
- class cvnets.layers.conv_layer.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer3d
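The entries above only list the signatures. Assuming the separable layers follow the usual depthwise-then-pointwise factorization (a \(k^{d}\) depthwise convolution with one filter per input channel, then a 1x1 pointwise convolution), which the names suggest but the text does not state, the weight savings can be estimated as follows. Biases and normalization parameters are ignored, and the helper names are hypothetical.

```python
def standard_conv_weights(c_in: int, c_out: int, k: int, ndim: int = 2) -> int:
    """Weights of a dense k^ndim convolution."""
    return c_in * c_out * k ** ndim


def separable_conv_weights(c_in: int, c_out: int, k: int, ndim: int = 2) -> int:
    """Depthwise (one k^ndim filter per input channel) plus pointwise (1x1) weights."""
    depthwise = c_in * k ** ndim
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_weights(64, 128, 3)   # 73728
sep = separable_conv_weights(64, 128, 3)  # 576 + 8192 = 8768
print(std, sep)
```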
cvnets.layers.dropout module
- class cvnets.layers.dropout.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)
Bases: Dropout
This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
- Parameters:
p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False
- Shape:
Input: \((N, *)\) where \(N\) is the batch size
Output: same as the input
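PyTorch's dropout is "inverted": surviving elements are scaled by \(1/(1-p)\) during training, so no rescaling is needed at inference. A minimal list-based sketch of that behaviour (the function is hypothetical, not the cvnets code):

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Zero each element with probability p and scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(x)          # identity at inference time
    if p == 1.0:
        return [0.0] * len(x)   # everything dropped
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in x]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, p=0.5, rng=random.Random(0)))  # each entry is 0.0 or doubled
```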
- class cvnets.layers.dropout.Dropout2d(p: float = 0.5, inplace: bool = False)
Bases: Dropout2d
This layer, during training, randomly zeroes entire channels of the 4D input tensor with probability p using samples from a Bernoulli distribution.
- Parameters:
p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False
- Shape:
Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the number of input channels, \(H\) is the input tensor height, and \(W\) is the input tensor width
Output: same as the input
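Unlike element-wise dropout, PyTorch's `nn.Dropout2d` zeroes whole channels (feature maps). A sketch for one \((C, H, W)\) sample stored as nested lists; the function name is hypothetical:

```python
import random
from typing import List, Optional

Plane = List[List[float]]


def dropout2d(x: List[Plane], p: float = 0.5, training: bool = True,
              rng: Optional[random.Random] = None) -> List[Plane]:
    """Drop each channel of a (C, H, W) sample with probability p; rescale survivors."""
    if not training or p == 0.0:
        return [[list(row) for row in plane] for plane in x]
    rng = rng or random.Random()
    out = []
    for plane in x:
        keep = p < 1.0 and rng.random() >= p      # one draw per channel, not per element
        scale = 1.0 / (1.0 - p) if keep else 0.0
        out.append([[v * scale for v in row] for row in plane])
    return out


sample = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]  # C=4, H=W=2
print(dropout2d(sample, p=0.5, rng=random.Random(1)))
```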
cvnets.layers.embedding module
- class cvnets.layers.embedding.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)
Bases: Embedding
A lookup table that stores embeddings of a fixed dictionary and size.
- Parameters:
num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.
- Shape:
Input: \((*)\), IntTensor or LongTensor of arbitrary shape containing the indices to extract
Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\)
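As a rough illustration of the lookup-table semantics, the toy class below mimics the initialization of PyTorch's `nn.Embedding` (rows drawn from \(\mathcal{N}(0, 1)\), the `padding_idx` row set to zeros). It is a hypothetical sketch; the gradient masking at `padding_idx` needs autograd and is not shown.

```python
import random
from typing import List, Optional


class ToyEmbedding:
    """Lookup table of shape (num_embeddings, embedding_dim)."""

    def __init__(self, num_embeddings: int, embedding_dim: int,
                 padding_idx: Optional[int] = None, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                       for _ in range(num_embeddings)]
        if padding_idx is not None:
            self.weight[padding_idx] = [0.0] * embedding_dim  # fixed "pad" vector
        self.padding_idx = padding_idx

    def __call__(self, indices: List[int]) -> List[List[float]]:
        # Input shape (*,) -> output shape (*, embedding_dim)
        return [list(self.weight[i]) for i in indices]


emb = ToyEmbedding(num_embeddings=10, embedding_dim=4, padding_idx=0)
vectors = emb([3, 0, 3])
print(len(vectors), len(vectors[0]))  # 3 4
print(vectors[1])                     # [0.0, 0.0, 0.0, 0.0]
```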
cvnets.layers.flatten module
- class cvnets.layers.flatten.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)
Bases: Flatten
This layer flattens a contiguous range of dimensions into a tensor.
- Parameters:
start_dim (Optional[int]) – first dim to flatten. Default: 1
end_dim (Optional[int]) – last dim to flatten. Default: -1
- Shape:
Input: \((*, S_{\text{start}}, ..., S_{i}, ..., S_{\text{end}}, *)\), where \(S_{i}\) is the size at dimension \(i\) and \(*\) means any number of dimensions including none.
Output: \((*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)\).
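The shape rule above can be computed directly. A small hypothetical helper that mirrors the `start_dim`/`end_dim` semantics, including negative indices:

```python
from typing import Sequence, Tuple


def flatten_shape(shape: Sequence[int], start_dim: int = 1, end_dim: int = -1) -> Tuple[int, ...]:
    """Collapse dimensions start_dim..end_dim (inclusive) into their product."""
    n = len(shape)
    start = start_dim + n if start_dim < 0 else start_dim
    end = end_dim + n if end_dim < 0 else end_dim
    if not 0 <= start <= end < n:
        raise ValueError("invalid start_dim/end_dim for this shape")
    prod = 1
    for s in shape[start:end + 1]:
        prod *= s
    return tuple(shape[:start]) + (prod,) + tuple(shape[end + 1:])


print(flatten_shape((32, 64, 7, 7)))                        # (32, 3136)
print(flatten_shape((2, 3, 4, 5), start_dim=1, end_dim=2))  # (2, 12, 5)
```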
cvnets.layers.global_pool module
- class cvnets.layers.global_pool.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)
Bases: BaseLayer
This layer applies global pooling over a 4D or 5D input tensor.
- Parameters:
pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean
keep_dim (Optional[bool]) – If True, keep the pooled dimensions (with size 1) instead of squeezing them. Default: False
- Shape:
Input: \((N, C, H, W)\) or \((N, C, D, H, W)\)
Output: \((N, C, 1, 1)\) or \((N, C, 1, 1, 1)\) if keep_dim else \((N, C)\)
- pool_types = ['mean', 'rms', 'abs']
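The three pooling types are not defined precisely above. The sketch below assumes the usual meanings (mean value, root mean square, and mean absolute value), applied per channel of one \((C, H, W)\) sample; the function names are hypothetical and the actual cvnets formulas may differ.

```python
import math
from typing import List


def _reduce(values: List[float], pool_type: str) -> float:
    n = len(values)
    if pool_type == "mean":
        return sum(values) / n
    if pool_type == "rms":
        return math.sqrt(sum(v * v for v in values) / n)
    if pool_type == "abs":
        return sum(abs(v) for v in values) / n
    raise ValueError(f"unsupported pool_type: {pool_type!r}")


def global_pool(x: List[List[List[float]]], pool_type: str = "mean", keep_dim: bool = False):
    """Pool each channel of a (C, H, W) sample: result is (C,) or (C, 1, 1) if keep_dim."""
    out = []
    for plane in x:
        pooled = _reduce([v for row in plane for v in row], pool_type)
        out.append([[pooled]] if keep_dim else pooled)
    return out


x = [[[1.0, -1.0], [3.0, -3.0]]]  # C=1, H=W=2
for t in ("mean", "rms", "abs"):
    print(t, global_pool(x, t))
print(global_pool(x, keep_dim=True))  # [[[0.0]]]
```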
cvnets.layers.identity module
cvnets.layers.linear_attention module
- class cvnets.layers.linear_attention.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)
Bases: BaseLayer
This layer applies self-attention with linear complexity, as described in the MobileViTv2 paper. It can be used for self-attention as well as cross-attention.
- Parameters:
opts – command line arguments
embed_dim (int) – \(C\) from an expected input of size \((N, C, H, W)\)
attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0
bias (Optional[bool]) – Use bias in learnable layers. Default: True
- Shape:
Input: \((B, C, P, N)\) where \(B\) is the batch size, \(C\) is the number of input channels, \(P\) is the number of pixels in the patch, and \(N\) is the number of patches
Output: same as the input
Note
For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.
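The core of the linear-complexity attention described in the MobileViTv2 paper can be sketched for one batch element and one pixel position: a single score per patch is softmax-normalized over the \(N\) patches, used to pool the keys into one context vector, and that vector then gates the ReLU-activated values. This is a simplified sketch based on the paper's description, not the cvnets code: the 1x1 projection that produces `q`, `k`, `v`, the dropout, and the output projection are omitted, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def separable_attention(q: List[float], k: List[List[float]],
                        v: List[List[float]]) -> List[List[float]]:
    """q: (N,) one score per patch; k, v: (C, N). Returns (C, N).

    Cost is O(C * N): no N x N attention matrix is formed.
    """
    scores = softmax(q)                                                         # (N,)
    context = [sum(kc[n] * scores[n] for n in range(len(scores))) for kc in k]  # (C,)
    # Broadcast the context vector over all patches and gate ReLU(values) with it.
    return [[max(0.0, vc[n]) * context[c] for n in range(len(vc))]
            for c, vc in enumerate(v)]


print(separable_attention(q=[0.0, 0.0], k=[[2.0, 4.0]], v=[[1.0, -1.0]]))  # [[3.0, 0.0]]
```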
cvnets.layers.linear_layer module
- class cvnets.layers.linear_layer.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)
Bases: BaseLayer
Applies a linear transformation to the input data.
- Parameters:
in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
bias (Optional[bool]) – Use bias or not. Default: True
channel_first (Optional[bool]) – Whether channels are the first or the last dimension. If first, a Conv2d is used. Default: False
- Shape:
Input: \((N, *, C_{in})\) if not channel_first else \((N, C_{in}, *)\) where \(*\) means any number of dimensions.
Output: \((N, *, C_{out})\) if not channel_first else \((N, C_{out}, *)\)
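The `channel_first` option works because a 1x1 convolution over a channel-first tensor applies the same weight matrix at every position as a linear layer over the channel-last view. A framework-free sketch of that equivalence (function names hypothetical):

```python
from typing import List

Matrix = List[List[float]]


def linear_channels_last(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """x: (positions, C_in) -> (positions, C_out); weight: (C_out, C_in)."""
    return [[sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(weight, bias)]
            for feat in x]


def linear_channels_first(x: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """x: (C_in, positions) -> (C_out, positions), i.e. a 1x1 convolution."""
    n_pos = len(x[0])
    return [[sum(row[c] * x[c][p] for c in range(len(x))) + b for p in range(n_pos)]
            for row, b in zip(weight, bias)]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


w = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]       # C_out=3, C_in=2
b = [0.0, 1.0, 0.0]
x_last = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 positions, 2 channels
out_last = linear_channels_last(x_last, w, b)
out_first = linear_channels_first(transpose(x_last), w, b)
print(transpose(out_last) == out_first)  # True
```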
- class cvnets.layers.linear_layer.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)
Bases: BaseLayer
Applies a GroupLinear transformation layer, as defined here, here and here
- Parameters:
in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
n_groups (int) – number of groups
bias (Optional[bool]) – Use bias or not. Default: True
feature_shuffle (Optional[bool]) – Shuffle features between groups. Default: False
- Shape:
Input: \((N, *, C_{in})\)
Output: \((N, *, C_{out})\)
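A group linear layer splits the \(C_{in}\) features into `n_groups` chunks and applies a separate, smaller weight matrix to each, which reduces the weight count from \(C_{in} C_{out}\) to \(C_{in} C_{out} / g\). The sketch below handles a single token; it assumes `feature_shuffle` means a channel shuffle that interleaves the group outputs, omits the bias, and uses a hypothetical function name.

```python
from typing import List


def group_linear(x: List[float], weights: List[List[List[float]]],
                 feature_shuffle: bool = False) -> List[float]:
    """x: (C_in,); weights: n_groups matrices, each (C_out / g, C_in / g)."""
    g = len(weights)
    if len(x) % g:
        raise ValueError("in_features must be divisible by n_groups")
    d_in = len(x) // g
    out: List[float] = []
    for i, w in enumerate(weights):
        chunk = x[i * d_in:(i + 1) * d_in]  # features of group i only
        out.extend(sum(a * b for a, b in zip(row, chunk)) for row in w)
    if feature_shuffle:
        d_out = len(out) // g
        # View as (g, d_out), transpose to (d_out, g), flatten: interleaves the groups.
        out = [out[i * d_out + j] for j in range(d_out) for i in range(g)]
    return out


eye = [[1.0, 0.0], [0.0, 1.0]]
print(group_linear([1.0, 2.0, 3.0, 4.0], [eye, eye]))                        # [1.0, 2.0, 3.0, 4.0]
print(group_linear([1.0, 2.0, 3.0, 4.0], [eye, eye], feature_shuffle=True))  # [1.0, 3.0, 2.0, 4.0]
```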
cvnets.layers.multi_head_attention module
- class cvnets.layers.multi_head_attention.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)
Bases: BaseLayer
This layer applies multi-head self- or cross-attention, as described in the paper “Attention Is All You Need”.
- Parameters:
embed_dim (int) – \(C_{in}\) from an expected input of size \((N, S, C_{in})\)
num_heads (int) – Number of heads in multi-head attention
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True
- Shape:
Input:
Query tensor (x_q) \((N, S, C_{in})\) where \(N\) is the batch size, \(S\) is the number of source tokens, and \(C_{in}\) is the input embedding dim
Optional key-value tensor (x_kv) \((N, T, C_{in})\) where \(T\) is the number of target tokens
Output: same shape as the input
- __init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) → None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor
- forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) → Tensor
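Stripped of the learned projections, dropout, and masks, multi-head attention splits the embedding into `num_heads` slices of size `embed_dim // num_heads`, runs scaled dot-product attention \(\mathrm{softmax}(QK^\top/\sqrt{d})V\) in each slice, and concatenates the results. A list-based sketch under those simplifications for one batch element (function names hypothetical; when `x_kv` is omitted, the query tensor also serves as keys and values, as in self-attention):

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def _softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """q: (S, d); k, v: (T, d) -> (S, d)."""
    d = len(q[0])
    out = []
    for qi in q:
        weights = _softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(weights[t] * v[t][c] for t in range(len(v)))
                    for c in range(len(v[0]))])
    return out


def multi_head_attention(x_q: Matrix, x_kv: Optional[Matrix], num_heads: int) -> Matrix:
    """x_q: (S, C); x_kv: (T, C) or None for self-attention. Projections omitted."""
    x_kv = x_q if x_kv is None else x_kv
    embed_dim = len(x_q[0])
    if embed_dim % num_heads:
        raise ValueError("embed_dim must be divisible by num_heads")
    h = embed_dim // num_heads
    heads = []
    for i in range(num_heads):
        cols = slice(i * h, (i + 1) * h)
        heads.append(attention([r[cols] for r in x_q], [r[cols] for r in x_kv],
                               [r[cols] for r in x_kv]))
    # Concatenate the head outputs along the channel dimension: (S, C).
    return [sum((head[s] for head in heads), []) for s in range(len(x_q))]


# With a single key-value token, every query attends to it with weight 1.
print(multi_head_attention([[1.0, 2.0, 3.0, 4.0]], [[5.0, 6.0, 7.0, 8.0]], num_heads=2))
# [[5.0, 6.0, 7.0, 8.0]]
```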
cvnets.layers.normalization_layers module
cvnets.layers.pixel_shuffle module
- class cvnets.layers.pixel_shuffle.PixelShuffle(upscale_factor: int, *args, **kwargs)
Bases: PixelShuffle
Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\), where \(r\) is an upscale factor.
- Parameters:
upscale_factor (int) – factor to increase spatial resolution by
- Shape:
Input: \((*, C \times r^2, H, W)\), where * is zero or more dimensions
Output: \((*, C, H \times r, W \times r)\)
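The rearrangement sends input channel \(c r^2 + i r + j\) at position \((h, w)\) to output channel \(c\) at position \((h r + i, w r + j)\), which is how PyTorch's `nn.PixelShuffle` orders elements. A nested-list sketch for one sample (function name hypothetical):

```python
from typing import List

Tensor3 = List[List[List[float]]]


def pixel_shuffle(x: Tensor3, r: int) -> Tensor3:
    """(C * r^2, H, W) -> (C, H * r, W * r)."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by upscale_factor ** 2")
    c_out = c_in // (r * r)
    out = [[[0.0] * (width * r) for _ in range(height * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


x = [[[0.0]], [[1.0]], [[2.0]], [[3.0]]]  # (4, 1, 1), r = 2
print(pixel_shuffle(x, 2))                # [[[0.0, 1.0], [2.0, 3.0]]]
```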
cvnets.layers.pooling module
- class cvnets.layers.pooling.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)
Bases: MaxPool2d
Applies a 2D max pooling over a 4D input tensor.
- Parameters:
kernel_size (Optional[int]) – The size of the window to take a max over. Default: 3
stride (Optional[int]) – The stride of the window. Default: 2
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1
- Shape:
Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the number of input channels, \(H_{in}\) is the input height, and \(W_{in}\) is the input width
Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is the output width
- class cvnets.layers.pooling.AvgPool2d(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)
Bases: AvgPool2d
Applies a 2D average pooling over a 4D input tensor.
- Parameters:
kernel_size (tuple) – The size of the window to average over
stride (Optional[tuple]) – The stride of the window. Default: None (the stride then equals kernel_size)
padding (Optional[tuple]) – Padding to be added on both sides of the tensor. Default: (0, 0)
ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None
- Shape:
Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the number of input channels, \(H_{in}\) is the input height, and \(W_{in}\) is the input width
Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is the output width
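Both pooling layers compute their output size per spatial dimension with the same arithmetic as PyTorch's `nn.MaxPool2d` and `nn.AvgPool2d` (dilation 1). The sketch below also includes PyTorch's rule that, with `ceil_mode=True`, a final window that would start in the right padding is dropped; the helper name is hypothetical.

```python
from typing import Optional


def pool_out_size(size_in: int, kernel_size: int, stride: Optional[int] = None,
                  padding: int = 0, ceil_mode: bool = False) -> int:
    """floor (or ceil) of (in + 2*pad - k) / stride, plus 1."""
    stride = kernel_size if stride is None else stride   # stride=None means stride = kernel_size
    numerator = size_in + 2 * padding - kernel_size
    out = (-(-numerator // stride) if ceil_mode else numerator // stride) + 1
    if ceil_mode and (out - 1) * stride >= size_in + padding:
        out -= 1   # the last window would start beyond the input, in the right padding
    return out


print(pool_out_size(112, kernel_size=3, stride=2, padding=1))  # 56 (MaxPool2d defaults)
print(pool_out_size(7, kernel_size=2))                         # 3
print(pool_out_size(7, kernel_size=2, ceil_mode=True))         # 4
```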
cvnets.layers.positional_embedding module
- class cvnets.layers.positional_embedding.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Bases: BaseLayer
- __init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class cvnets.layers.positional_embedding.LearnablePositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Bases: Module
Learnable positional embedding.
- __init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(seq_len: int, *args, **kwargs) → Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class cvnets.layers.positional_embedding.SinusoidalPositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Bases: Module
- __init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(seq_len: int, *args, **kwargs) → Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
cvnets.layers.positional_encoding module
- class cvnets.layers.positional_encoding.SinusoidalPositionalEncoding(d_model: int, dropout: float | None = 0.0, max_len: int | None = 5000, channels_last: bool | None = True, *args, **kwargs)
Bases: BaseLayer
This layer adds sinusoidal positional embeddings to a 3D input tensor. The code has been adapted from the PyTorch tutorial.
- Parameters:
d_model (int) – dimension of the input tensor
dropout (Optional[float]) – Dropout rate. Default: 0.0
max_len (Optional[int]) – Max. number of patches (or seq. length). Default: 5000
channels_last (Optional[bool]) – Whether the channel dimension is the last dimension of the input tensor. Default: True
- Shape:
Input: \((N, C, P)\) or \((N, P, C)\) where \(N\) is the batch size, \(C\) is the embedding dimension, and \(P\) is the number of patches
Output: same shape as the input
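The encoding table follows the formulation used in the PyTorch transformer tutorial: \(PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_{model}})\) and \(PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_{model}})\). A pure-Python sketch for the channels-last case, with dropout omitted; the function names are hypothetical.

```python
import math
from typing import List


def sinusoidal_table(max_len: int, d_model: int) -> List[List[float]]:
    """(max_len, d_model) table: sin on even channels, cos on odd channels."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for ch in range(0, d_model, 2):            # ch plays the role of 2i above
            angle = pos / (10000 ** (ch / d_model))
            pe[pos][ch] = math.sin(angle)
            if ch + 1 < d_model:
                pe[pos][ch + 1] = math.cos(angle)
    return pe


def add_positional_encoding(x: List[List[float]], pe: List[List[float]]) -> List[List[float]]:
    """x: (P, C) for one sample, channels last; P must not exceed max_len."""
    return [[v + pe[p][c] for c, v in enumerate(row)] for p, row in enumerate(x)]


pe = sinusoidal_table(max_len=16, d_model=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
x = [[0.0] * 4 for _ in range(3)]
print(add_positional_encoding(x, pe)[1])
```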
- class cvnets.layers.positional_encoding.LearnablePositionEncoding(embed_dim: int, num_embeddings: int, dropout: float | None = 0.0, channels_last: bool | None = True, *args, **kwargs)
Bases: BaseLayer
This layer adds learnable positional embeddings to a 3D input tensor.
- Parameters:
embed_dim (int) – dimension of the input tensor
num_embeddings (int) – number of input embeddings. This is similar to vocab size in NLP.
dropout (Optional[float]) – Dropout rate. Default: 0.0
channels_last (Optional[bool]) – Whether the channel dimension is the last dimension of the input tensor. Default: True
- Shape:
Input: \((N, *, C, P)\) or \((N, *, P, C)\) where \(N\) is the batch size, \(C\) is the embedding dimension, and \(P\) is the number of patches
Output: same shape as the input
cvnets.layers.random_layers module
- class cvnets.layers.random_layers.RandomApply(module_list: List, keep_p: float | None = 0.8, *args, **kwargs)
Bases: BaseLayer
This layer randomly applies a list of modules during training.
- Parameters:
module_list (List) – List of modules
keep_p (Optional[float]) – Fraction of the modules in the list to keep during training. Default: 0.8 (i.e., 80%)
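The sampling rule is not spelled out above. One plausible reading, sketched here as an assumption rather than the cvnets behaviour, is: during training, keep `round(keep_p * len(module_list))` randomly chosen modules (at least one) and apply them in their original order; at inference, apply every module. The function name is hypothetical.

```python
import random
from typing import Callable, List, Optional


def random_apply(x, modules: List[Callable], keep_p: float = 0.8, training: bool = True,
                 rng: Optional[random.Random] = None):
    """Apply a random, order-preserving subset of modules in training; all of them otherwise."""
    if not training or not modules:
        for m in modules:
            x = m(x)
        return x
    rng = rng or random.Random()
    keep_k = max(1, round(keep_p * len(modules)))  # assumption: keep at least one module
    for idx in sorted(rng.sample(range(len(modules)), keep_k)):
        x = modules[idx](x)
    return x


add_one = [lambda v: v + 1 for _ in range(5)]
print(random_apply(0, add_one, keep_p=0.8, rng=random.Random(0)))  # 4 (four of five modules)
print(random_apply(0, add_one, training=False))                    # 5
```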
cvnets.layers.single_head_attention module
- class cvnets.layers.single_head_attention.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)
Bases: BaseLayer
This layer applies single-head attention, as described in the DeLighT paper.
- Parameters:
embed_dim (int) – \(C_{in}\) from an expected input of size \((N, P, C_{in})\)
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True
- Shape:
Input: \((N, P, C_{in})\) where \(N\) is the batch size, \(P\) is the number of patches, and \(C_{in}\) is the input embedding dim
Output: same shape as the input
cvnets.layers.softmax module
- class cvnets.layers.softmax.Softmax(dim: int | None = -1, *args, **kwargs)
Bases: Softmax
Applies the Softmax function to an input tensor along the specified dimension.
- Parameters:
dim (int) – Dimension along which softmax is applied. Default: -1
- Shape:
Input: \((*)\) where \(*\) is one or more dimensions
Output: same shape as the input
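Softmax is usually computed after subtracting the maximum along the chosen dimension, which leaves the result unchanged but avoids overflow in `exp`. A sketch for a 2-D nested list (function names hypothetical):

```python
import math
from typing import List


def softmax_1d(v: List[float]) -> List[float]:
    m = max(v)                        # shift for numerical stability; result is unchanged
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def softmax_2d(x: List[List[float]], dim: int = -1) -> List[List[float]]:
    """Softmax over rows (dim=-1 or 1) or columns (dim=0 or -2)."""
    if dim in (-1, 1):
        return [softmax_1d(row) for row in x]
    if dim in (0, -2):
        cols = [softmax_1d(list(col)) for col in zip(*x)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("dim must be one of -2, -1, 0, 1")


print(softmax_1d([1000.0, 1000.0]))  # [0.5, 0.5], no overflow
print(softmax_2d([[1.0, 2.0], [3.0, 4.0]], dim=0))
```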
cvnets.layers.stochastic_depth module
- class cvnets.layers.stochastic_depth.StochasticDepth(p: float, mode: str)
Bases: StochasticDepth
Implements stochastic depth, from the paper “Deep Networks with Stochastic Depth”, which randomly drops residual branches of residual architectures.
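The base class name suggests torchvision's `torchvision.ops.StochasticDepth`, where `p` is the probability of dropping the residual branch and `mode` is either `"batch"` (one draw for the whole batch) or `"row"` (one draw per sample), with survivors scaled by \(1/(1-p)\). The sketch below assumes those semantics and is not the cvnets code; the function name is hypothetical.

```python
import random
from typing import List, Optional


def stochastic_depth(branch: List[List[float]], p: float, mode: str, training: bool = True,
                     rng: Optional[random.Random] = None) -> List[List[float]]:
    """branch: residual-branch outputs, one list of values per sample."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if mode not in ("batch", "row"):
        raise ValueError("mode must be 'batch' or 'row'")
    if not training or p == 0.0:
        return [list(s) for s in branch]
    rng = rng or random.Random()
    survival = 1.0 - p

    def draw() -> float:
        # Keep with probability 1 - p and rescale so the expected value is unchanged.
        return 1.0 / survival if rng.random() < survival else 0.0

    factors = [draw()] * len(branch) if mode == "batch" else [draw() for _ in branch]
    return [[v * f for v in s] for s, f in zip(branch, factors)]


batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(stochastic_depth(batch, p=0.5, mode="row", rng=random.Random(0)))
```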
cvnets.layers.token_merging module
- class cvnets.layers.token_merging.TokenMerging(dim: int, window: int = 2)
Bases: Module
Merges tokens from a [batch_size, sequence_length, num_channels] tensor using a linear projection.
This layer also updates masks and adds padding as needed to make the sequence length divisible by the window size before merging tokens.
- Parameters:
dim – Number of input channels.
window – The size of the window to merge into a single token.
- __init__(dim: int, window: int = 2) → None
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x: Tensor, key_padding_mask: Tensor) → Tuple[Tensor, Tensor]
Perform token merging.
- Parameters:
x – A tensor of shape [batch_size, sequence_length, num_channels].
key_padding_mask – A tensor of shape [batch_size, sequence_length] with “-inf” values at mask tokens, and “0” values at unmasked tokens.
- Returns:
- A tensor of shape [batch_size, math.ceil(sequence_length / self.window), num_channels], where @self.window is the window size.
- cvnets.layers.token_merging.pad_x_and_mask(x: Tensor, key_padding_mask: Tensor, window_size: int) → Tuple[Tensor, Tensor]
Apply padding to @x and @key_padding_mask to make their lengths divisible by @window_size.
- Parameters:
x – The input tensor of shape [B, N, C].
key_padding_mask – The mask of shape [B, N].
window_size – The N dimension of @x and @key_padding_mask will be padded to make it divisible by this number.
- Returns:
A tuple containing @x and @key_padding_mask, with padding applied.
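A list-based sketch of the padding and merging steps for one sample. It assumes padded positions receive a `-inf` mask value, that a merged token stays unmasked if any token in its window is unmasked (a max over the window), and that merging concatenates the `window` tokens along the channel axis before the linear projection, which is omitted here. These choices and the function names are hypothetical, not taken from the cvnets source.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def pad_tokens_and_mask(x: List[List[float]], mask: List[float],
                        window_size: int) -> Tuple[List[List[float]], List[float]]:
    """Pad x (N, C) and mask (N,) so that N becomes divisible by window_size."""
    pad = (-len(x)) % window_size
    channels = len(x[0]) if x else 0
    return x + [[0.0] * channels for _ in range(pad)], mask + [NEG_INF] * pad


def merge_tokens(x: List[List[float]], mask: List[float],
                 window: int) -> Tuple[List[List[float]], List[float]]:
    """(N, C) -> (ceil(N / window), window * C), before any linear projection."""
    x, mask = pad_tokens_and_mask(x, mask, window)
    merged = [sum(x[i:i + window], []) for i in range(0, len(x), window)]
    merged_mask = [max(mask[i:i + window]) for i in range(0, len(mask), window)]
    return merged, merged_mask


tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # N=3, C=2
merged, merged_mask = merge_tokens(tokens, [0.0, 0.0, 0.0], window=2)
print(merged)       # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 0.0, 0.0]]
print(merged_mask)  # [0.0, 0.0]
```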
cvnets.layers.upsample module
- class cvnets.layers.upsample.UpSample(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs)
Bases: Upsample
This layer upsamples a given input tensor.
- Parameters:
size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None
scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None
mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic' or 'trilinear'). Default: 'nearest'
align_corners (Optional[bool]) – If True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None
- Shape:
Input: \((N, C, W_{in})\) or \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, W_{out})\) or \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\)
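For `mode='nearest'`, each output index \(i\) copies input index \(\lfloor i \cdot \text{in} / \text{out} \rfloor\), which is our understanding of PyTorch's `'nearest'` rule when the output size is given. The sketch below always uses that in/out ratio (for integer scale factors this coincides with using the scale factor directly) and upsamples a 2-D plane by picking rows, then columns. Function names are hypothetical, and the interpolating modes are not shown.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def nearest_1d(x: Sequence[T], size: Optional[int] = None,
               scale_factor: Optional[float] = None) -> List[T]:
    """Resize a sequence with nearest-neighbour indexing."""
    n_in = len(x)
    if size is None:
        if scale_factor is None:
            raise ValueError("either size or scale_factor must be given")
        size = int(n_in * scale_factor)
    return [x[min((i * n_in) // size, n_in - 1)] for i in range(size)]


def nearest_2d(x: List[List[float]], size: Tuple[int, int]) -> List[List[float]]:
    """(H_in, W_in) -> (H_out, W_out): pick rows, then pick columns within each row."""
    rows = nearest_1d(x, size=size[0])
    return [nearest_1d(row, size=size[1]) for row in rows]


print(nearest_1d([1, 2, 3], scale_factor=2))  # [1, 1, 2, 2, 3, 3]
print(nearest_2d([[1.0, 2.0], [3.0, 4.0]], (4, 4)))
```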
Module contents
- class cvnets.layers.ConvLayer1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 1
- module_cls
alias of Conv1d
- training: bool
- class cvnets.layers.ConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 2
- training: bool
- class cvnets.layers.ConvLayer3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, padding: int | Tuple[int, ...] | None = None, groups: int = 1, bias: bool = False, padding_mode: str = 'zeros', use_norm: bool = True, use_act: bool = True, norm_layer: Module | None = None, act_layer: Module | None = None, *args, **kwargs)
Bases: _BaseConvNormActLayer
- ndim = 3
- module_cls
alias of Conv3d
- training: bool
- class cvnets.layers.SeparableConv1d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer1d
- class cvnets.layers.SeparableConv2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer2d
- class cvnets.layers.SeparableConv3d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple[int, ...], stride: int | Tuple[int, ...] = 1, dilation: int | Tuple[int, ...] = 1, use_norm: bool = True, use_act: bool = True, use_act_depthwise: bool = False, bias: bool = False, padding_mode: str = 'zeros', act_name: str | None = None, *args, **kwargs)
Bases: _BaseSeparableConv
- conv_layer_cls
alias of ConvLayer3d
- class cvnets.layers.NormActLayer(opts, num_features, *args, **kwargs)
Bases: BaseLayer
Applies a normalization layer followed by an activation layer.
- Parameters:
opts – Command-line arguments.
num_features – \(C\) from an expected input of size \((N, C, H, W)\).
- Shape:
Input: \((N, C, H, W)\).
Output: \((N, C, H, W)\).
- class cvnets.layers.TransposeConvLayer2d(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)
Bases: BaseLayer
Applies a 2D transposed convolution (also known as deconvolution) over an input.
- Parameters:
opts – Command line arguments.
in_channels – \(C_{in}\) from an expected input of size \((N, C_{in}, H_{in}, W_{in})\).
out_channels – \(C_{out}\) from an expected output of size \((N, C_{out}, H_{out}, W_{out})\).
kernel_size – Kernel size for convolution.
stride – Stride for convolution. Default: 1.
dilation – Dilation rate for convolution. Default: 1.
groups – Number of groups in convolution. Default: 1.
bias – Use bias. Default: False.
padding_mode – Padding mode. Default: zeros.
use_norm – Use normalization layer after convolution. Default: True.
use_act – Use activation layer after convolution (or convolution and normalization). Default: True.
padding – Padding applied to both sides of each dimension in the input. Default: (0, 0).
output_padding – Additional padding on the output tensor. Default: None.
auto_padding – Compute padding automatically. Default: True.
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\).
Output: \((N, C_{out}, H_{out}, W_{out})\).
- __init__(opts: Namespace, in_channels: int, out_channels: int, kernel_size: int | Tuple, stride: int | Tuple | None = 1, dilation: int | Tuple | None = 1, groups: int | None = 1, bias: bool | None = False, padding_mode: str | None = 'zeros', use_norm: bool | None = True, use_act: bool | None = True, padding: int | Tuple | None = (0, 0), output_padding: int | Tuple | None = None, auto_padding: bool | None = True, *args, **kwargs)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
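Suppose TransposeConvLayer2d wraps torch.nn.ConvTranspose2d. This is not confirmed here, and auto_padding may compute the padding itself. Under that assumption, the spatial output size follows PyTorch's documented formula, H_out = (H_in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The sketch below evaluates that formula per dimension. The function name is hypothetical.

```python
from typing import Tuple, Union

IntOrPair = Union[int, Tuple[int, int]]


def _pair(v: IntOrPair) -> Tuple[int, int]:
    # Expand a scalar into an (height, width) pair.
    return (v, v) if isinstance(v, int) else (v[0], v[1])


def conv_transpose2d_output_size(
    h_in: int,
    w_in: int,
    kernel_size: IntOrPair,
    stride: IntOrPair = 1,
    padding: IntOrPair = 0,
    output_padding: IntOrPair = 0,
    dilation: IntOrPair = 1,
) -> Tuple[int, int]:
    """Output (H_out, W_out) per the torch.nn.ConvTranspose2d shape formula."""
    sizes = []
    for size, k, s, p, op, d in zip(
        (h_in, w_in), _pair(kernel_size), _pair(stride), _pair(padding),
        _pair(output_padding), _pair(dilation),
    ):
        sizes.append((size - 1) * s - 2 * p + d * (k - 1) + op + 1)
    return sizes[0], sizes[1]


if __name__ == "__main__":
    # A common 2x upsampling configuration: kernel 4, stride 2, padding 1.
    print(conv_transpose2d_output_size(8, 8, 4, stride=2, padding=1))  # (16, 16)
```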
- class cvnets.layers.LinearLayer(in_features: int, out_features: int, bias: bool | None = True, channel_first: bool | None = False, *args, **kwargs)[source]
Bases:
BaseLayer
Applies a linear transformation to the input data.
- Parameters:
in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
bias (Optional[bool]) – Whether to use a bias. Default: True
channel_first (Optional[bool]) – Whether channels are the first (True) or last (False) dimension. If True, the transformation is implemented with Conv2d. Default: False
- Shape:
Input: \((N, *, C_{in})\) if not channel_first else \((N, C_{in}, *)\) where \(*\) means any number of dimensions.
Output: \((N, *, C_{out})\) if not channel_first else \((N, C_{out}, *)\)
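With channel_first=True, the same weight matrix is applied across the channel dimension at every spatial position. This is exactly what a 1x1 convolution computes. The pure-Python sketch below checks that equivalence on a tiny (C_in, H, W) input, with the batch dimension omitted and no bias. All helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Tensor3D = List[List[List[float]]]


def linear_channel_last(x: Tensor3D, weight: Matrix) -> Tensor3D:
    # x: (H, W, C_in); weight: (C_out, C_in). Output: (H, W, C_out).
    return [[[sum(w_row[c] * pix[c] for c in range(len(pix))) for w_row in weight]
             for pix in row] for row in x]


def conv1x1_channel_first(x: Tensor3D, weight: Matrix) -> Tensor3D:
    # x: (C_in, H, W); weight: (C_out, C_in). Output: (C_out, H, W).
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_row[c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
             for i in range(h)] for w_row in weight]


def to_channel_last(x: Tensor3D) -> Tensor3D:
    # (C, H, W) -> (H, W, C)
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[x[k][i][j] for k in range(c)] for j in range(w)] for i in range(h)]
```

Permuting the channel-first result to channel-last gives the same numbers as the plain linear layer. The channel-first path just avoids that permute.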
- class cvnets.layers.GroupLinear(in_features: int, out_features: int, n_groups: int, bias: bool | None = True, feature_shuffle: bool | None = False, *args, **kwargs)[source]
Bases:
BaseLayer
Applies a GroupLinear transformation layer, as defined here, here and here
- Parameters:
in_features (int) – number of features in the input tensor
out_features (int) – number of features in the output tensor
n_groups (int) – number of groups
bias (Optional[bool]) – Whether to use a bias. Default: True
feature_shuffle (Optional[bool]) – Whether to shuffle features between groups. Default: False
- Shape:
Input: \((N, *, C_{in})\)
Output: \((N, *, C_{out})\)
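The sketch below illustrates a group linear transformation, assuming in_features and out_features are both divisible by n_groups. The input is split into n_groups contiguous chunks, each chunk is multiplied by its own small weight matrix, and the results are concatenated. The optional interleaving shown here, a ShuffleNet-style channel shuffle, is only an assumption about what feature_shuffle does. Helper names are hypothetical and bias is omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def group_linear(x: Vector, weights: List[Matrix], shuffle: bool = False) -> Vector:
    """x: (in_features,); weights[g]: (out_features/g, in_features/g)."""
    n_groups = len(weights)
    assert len(x) % n_groups == 0, "in_features must be divisible by n_groups"
    chunk = len(x) // n_groups
    outputs: List[Vector] = []
    for g, w in enumerate(weights):
        x_g = x[g * chunk:(g + 1) * chunk]  # features owned by group g
        outputs.append([sum(wi * xi for wi, xi in zip(row, x_g)) for row in w])
    if shuffle:
        # Assumed shuffle: feature 0 of every group, then feature 1, ...
        per_group = len(outputs[0])
        return [outputs[g][i] for i in range(per_group) for g in range(n_groups)]
    return [v for out in outputs for v in out]


def group_linear_num_weights(in_features: int, out_features: int, n_groups: int) -> int:
    # Each group maps in/g -> out/g features, so the total is in * out / g.
    return n_groups * (in_features // n_groups) * (out_features // n_groups)
```

Compared with a dense linear layer, the weight count drops by a factor of n_groups. This is the main motivation for grouping.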
- class cvnets.layers.GlobalPool(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs)[source]
Bases:
BaseLayer
This layer applies global pooling over a 4D or 5D input tensor.
- Parameters:
pool_type (Optional[str]) – Pooling type. It can be mean, rms, or abs. Default: mean
keep_dim (Optional[bool]) – Do not squeeze the dimensions of a tensor. Default: False
- Shape:
Input: \((N, C, H, W)\) or \((N, C, D, H, W)\)
Output: \((N, C, 1, 1)\) or \((N, C, 1, 1, 1)\) if keep_dim else \((N, C)\)
- pool_types = ['mean', 'rms', 'abs']
- __init__(pool_type: str | None = 'mean', keep_dim: bool | None = False, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- training: bool
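The sketch below shows the three pool types on a single (C, H, W) sample. It assumes that rms is the square root of the mean of squares and that abs is the mean of absolute values. The source only names these types and does not define them. Returning one value per channel corresponds to keep_dim=False. The function name is hypothetical.

```python
import math
from typing import List


def global_pool(x: List[List[List[float]]], pool_type: str = "mean") -> List[float]:
    """x: (C, H, W) for one sample; returns one value per channel, i.e. (C,)."""
    out = []
    for channel in x:
        values = [v for row in channel for v in row]  # flatten H x W
        n = len(values)
        if pool_type == "mean":
            out.append(sum(values) / n)
        elif pool_type == "rms":
            # Assumed definition: square root of the mean of squares.
            out.append(math.sqrt(sum(v * v for v in values) / n))
        elif pool_type == "abs":
            # Assumed definition: mean of absolute values.
            out.append(sum(abs(v) for v in values) / n)
        else:
            raise ValueError(f"Unsupported pool_type: {pool_type}")
    return out
```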
- class cvnets.layers.Identity[source]
Bases:
BaseLayer
A placeholder layer that returns its input tensor unchanged.
- class cvnets.layers.PixelShuffle(upscale_factor: int, *args, **kwargs)[source]
Bases:
PixelShuffle
Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\), where \(r\) is an upscale factor.
- Parameters:
upscale_factor (int) – factor to increase spatial resolution by
- Shape:
Input: \((*, C \times r^2, H, W)\), where * is zero or more dimensions
Output: \((*, C, H \times r, W \times r)\)
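torch.nn.PixelShuffle maps input channel c * r^2 + i * r + j at position (h, w) to output channel c at position (h * r + i, w * r + j). The pure-Python sketch below applies that index mapping to a single sample stored as nested lists. The function name is hypothetical, and leading batch dimensions are omitted.

```python
from typing import List

Tensor3D = List[List[List[float]]]


def pixel_shuffle(x: Tensor3D, r: int) -> Tensor3D:
    """x: (C * r^2, H, W) -> (C, H * r, W * r)."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r^2"
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


if __name__ == "__main__":
    # Four 1x1 channels become one 2x2 channel.
    print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2))  # [[[0, 1], [2, 3]]]
```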
- class cvnets.layers.UpSample(size: int | Tuple[int, ...] | None = None, scale_factor: float | None = None, mode: str | None = 'nearest', align_corners: bool | None = None, *args, **kwargs)[source]
Bases:
Upsample
This layer upsamples a given input tensor.
- Parameters:
size (Optional[Union[int, Tuple[int, ...]]]) – Output spatial size. Default: None
scale_factor (Optional[float]) – Scale each spatial dimension of the input by this factor. Default: None
mode (Optional[str]) – Upsampling algorithm ('nearest', 'linear', 'bilinear', 'bicubic', or 'trilinear'). Default: 'nearest'
align_corners (Optional[bool]) – If True, the corner pixels of the input and output tensors are aligned, thus preserving the values at those pixels. This only has an effect when mode is 'linear', 'bilinear', 'bicubic', or 'trilinear'. Default: None
- Shape:
Input: \((N, C, W_{in})\) or \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\)
Output: \((N, C, W_{out})\) or \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\)
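The effect of align_corners is easiest to see in 1D linear interpolation with an explicit output size. With True, the first and last samples of the input and output grids coincide. With False, the usual half-pixel mapping is used and negative source coordinates are clamped to 0. The sketch below is a minimal illustration under those assumptions, not this layer's implementation, and the function name is hypothetical.

```python
import math
from typing import List


def upsample_linear_1d(x: List[float], out_len: int, align_corners: bool = False) -> List[float]:
    """Linearly resize a 1D signal to out_len samples."""
    n = len(x)
    out = []
    for d in range(out_len):
        if align_corners:
            # Endpoints of the input and output grids coincide.
            src = d * (n - 1) / (out_len - 1) if out_len > 1 else 0.0
        else:
            # Half-pixel mapping; clamp negative coordinates to 0.
            src = max((d + 0.5) * n / out_len - 0.5, 0.0)
        i0 = min(math.floor(src), n - 1)
        i1 = min(i0 + 1, n - 1)
        lam = src - i0  # interpolation weight toward x[i1]
        out.append(x[i0] * (1.0 - lam) + x[i1] * lam)
    return out


if __name__ == "__main__":
    print(upsample_linear_1d([0.0, 1.0], 4, align_corners=False))  # [0.0, 0.25, 0.75, 1.0]
```

With align_corners=True, the same input yields evenly spaced values 0, 1/3, 2/3, 1, so both endpoints are preserved exactly.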
- class cvnets.layers.MaxPool2d(kernel_size: int | None = 3, stride: int | None = 2, padding: int | None = 1, *args, **kwargs)[source]
Bases:
MaxPool2d
Applies a 2D max pooling over a 4D input tensor.
- Parameters:
kernel_size (Optional[int]) – The size of the window to take a max over. Default: 3
stride (Optional[int]) – The stride of the window. Default: 2
padding (Optional[int]) – Padding to be added on both sides of the tensor. Default: 1
- Shape:
- Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,
\(H_{in}\) is the input height, and \(W_{in}\) is the input width
- Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is
the output width
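Each output spatial size follows PyTorch's pooling formula, floor((size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1. The sketch below evaluates that formula for one dimension, with defaults matching this layer's signature (kernel_size=3, stride=2, padding=1). With those defaults, even input sizes are halved. The function name is hypothetical.

```python
def pool2d_output_size(size: int, kernel_size: int = 3, stride: int = 2,
                       padding: int = 1, dilation: int = 1) -> int:
    """Output length of one spatial dimension (floor mode)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


if __name__ == "__main__":
    print(pool2d_output_size(224))  # 112
```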
- class cvnets.layers.AvgPool2d(kernel_size: tuple, stride: tuple | None = None, padding: tuple | None = (0, 0), ceil_mode: bool | None = False, count_include_pad: bool | None = True, divisor_override: bool | None = None)[source]
Bases:
AvgPool2d
Applies a 2D average pooling over a 4D input tensor.
- Parameters:
kernel_size (int or tuple) – The size of the window to take an average over.
stride (Optional[int or tuple]) – The stride of the window. Default: None, in which case torch.nn.AvgPool2d uses kernel_size as the stride.
padding (Optional[int or tuple]) – Padding to be added on both sides of the tensor. Default: (0, 0)
ceil_mode (Optional[bool]) – When True, will use ceil instead of floor to compute the output shape. Default: False
count_include_pad (Optional[bool]) – When True, will include the zero-padding in the averaging calculation. Default: True
divisor_override – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None
- Shape:
- Input: \((N, C, H_{in}, W_{in})\) where \(N\) is the batch size, \(C\) is the input channels,
\(H_{in}\) is the input height, and \(W_{in}\) is the input width
- Output: \((N, C, H_{out}, W_{out})\) where \(H_{out}\) is the output height, and \(W_{out}\) is
the output width
- class cvnets.layers.Dropout(p: float | None = 0.5, inplace: bool | None = False, *args, **kwargs)[source]
Bases:
Dropout
This layer, during training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.
- Parameters:
p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False
- Shape:
Input: \((N, *)\) where \(N\) is the batch size
Output: same as the input
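During training, torch.nn.Dropout scales the surviving elements by 1 / (1 - p), so each element keeps its expected value. In evaluation mode it is the identity. This scheme is known as inverted dropout. Assuming this layer inherits that behavior from its torch base class, the sketch below reproduces it on a flat list. The function name and the rng argument are hypothetical.

```python
import random
from typing import List, Optional


def dropout(x: List[float], p: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on a flat list of activations."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if not training or p == 0.0:
        return list(x)  # identity at evaluation time
    if p == 1.0:
        return [0.0 for _ in x]
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    # Keep each element with probability 1 - p and rescale the survivors.
    return [v * scale if rng.random() >= p else 0.0 for v in x]
```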
- class cvnets.layers.Dropout2d(p: float = 0.5, inplace: bool = False)[source]
Bases:
Dropout2d
This layer, during training, randomly zeroes entire channels (i.e., complete \(H \times W\) feature maps) of the 4D input tensor, each with probability p, using samples from a Bernoulli distribution.
- Parameters:
p – probability of an element to be zeroed. Default: 0.5
inplace – If set to True, will do this operation in-place. Default: False
- Shape:
- Input: \((N, C, H, W)\) where \(N\) is the batch size, \(C\) is the input channels,
\(H\) is the input tensor height, and \(W\) is the input tensor width
- Output: same as the input
- class cvnets.layers.AdjustBatchNormMomentum(opts, *args, **kwargs)[source]
Bases:
object
This class enables adjusting the momentum in batch normalization layers.
Note
It’s an experimental feature and should be used with caution.
- round_places = 6
- class cvnets.layers.Flatten(start_dim: int | None = 1, end_dim: int | None = -1)[source]
Bases:
Flatten
This layer flattens a contiguous range of dimensions into a tensor.
- Parameters:
start_dim (Optional[int]) – first dim to flatten. Default: 1
end_dim (Optional[int]) – last dim to flatten. Default: -1
- Shape:
Input: \((*, S_{\text{start}}, ..., S_{i}, ..., S_{\text{end}}, *)\), where \(S_{i}\) is the size at dimension \(i\) and \(*\) means any number of dimensions including none.
Output: \((*, \prod_{i=\text{start}}^{\text{end}} S_{i}, *)\).
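The output shape above can be computed directly. Negative dimension indices count from the end, as in torch.nn.Flatten. The sketch below multiplies the sizes from start_dim to end_dim, inclusive, into a single dimension. The function name is hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def flatten_shape(shape: Sequence[int], start_dim: int = 1, end_dim: int = -1) -> Tuple[int, ...]:
    """Shape produced by flattening dims start_dim..end_dim (inclusive)."""
    ndim = len(shape)
    start = start_dim + ndim if start_dim < 0 else start_dim
    end = end_dim + ndim if end_dim < 0 else end_dim
    if not 0 <= start <= end < ndim:
        raise ValueError("invalid start_dim/end_dim for this shape")
    return tuple(shape[:start]) + (prod(shape[start:end + 1]),) + tuple(shape[end + 1:])


if __name__ == "__main__":
    print(flatten_shape((2, 3, 4, 5)))  # (2, 60)
```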
- class cvnets.layers.MultiHeadAttention(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs)[source]
Bases:
BaseLayer
This layer applies multi-head self- or cross-attention as described in the Attention Is All You Need paper.
- Parameters:
embed_dim (int) – \(C_{in}\) from an expected input of size \((N, S, C_{in})\)
num_heads (int) – Number of heads in multi-head attention
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True
- Shape:
- Input:
Query tensor (x_q) \((N, S, C_{in})\) where \(N\) is batch size, \(S\) is number of source tokens, and \(C_{in}\) is input embedding dim
Optional Key-Value tensor (x_kv) \((N, T, C_{in})\) where \(T\) is number of target tokens
- Output: same shape as the input
- __init__(embed_dim: int, num_heads: int, attn_dropout: float | None = 0.0, bias: bool | None = True, output_dim: int | None = None, coreml_compatible: bool | None = False, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward_tracing(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor [source]
- forward_default(x_q: Tensor, x_kv: Tensor | None = None, key_padding_mask: Tensor | None = None, attn_mask: Tensor | None = None) Tensor [source]
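Multi-head attention splits the embedding dimension into num_heads slices. It then runs scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, independently in each slice and concatenates the results. The pure-Python sketch below assumes identity query/key/value/output projections, no dropout, and no masks. It therefore illustrates only the head-splitting and attention arithmetic, not this layer's learned weights. When x_kv is omitted, it falls back to self-attention. All names are hypothetical.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]  # rows are tokens, columns are channels


def _softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = _softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def multi_head_attention(x_q: Matrix, num_heads: int, x_kv: Optional[Matrix] = None) -> Matrix:
    """x_q: (S, C); x_kv: (T, C). Returns (S, C). Batch dim omitted."""
    x_kv = x_q if x_kv is None else x_kv
    embed_dim = len(x_q[0])
    assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
    head_dim = embed_dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        cols = slice(h * head_dim, (h + 1) * head_dim)  # channels for head h
        q = [row[cols] for row in x_q]
        kv = [row[cols] for row in x_kv]
        head_outputs.append(_attention(q, kv, kv))
    # Concatenate heads along the channel dimension for each query token.
    return [[val for head in head_outputs for val in head[t]] for t in range(len(x_q))]
```

If every key/value token is identical, each query's attention weights are uniform, so every output row equals that shared value row.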
- class cvnets.layers.SingleHeadAttention(embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]
Bases:
BaseLayer
This layer applies single-head attention as described in the DeLighT paper.
- Parameters:
embed_dim (int) – \(C_{in}\) from an expected input of size \((N, P, C_{in})\)
attn_dropout (Optional[float]) – Attention dropout. Default: 0.0
bias (Optional[bool]) – Use bias or not. Default: True
- Shape:
Input: \((N, P, C_{in})\) where \(N\) is batch size, \(P\) is number of patches, and \(C_{in}\) is input embedding dim
Output: same shape as the input
- class cvnets.layers.Softmax(dim: int | None = -1, *args, **kwargs)[source]
Bases:
Softmax
Applies the Softmax function to an input tensor along the specified dimension.
- Parameters:
dim (Optional[int]) – Dimension along which softmax is applied. Default: -1
- Shape:
Input: \((*)\) where \(*\) is one or more dimensions
Output: same shape as the input
- class cvnets.layers.LinearSelfAttention(opts, embed_dim: int, attn_dropout: float | None = 0.0, bias: bool | None = True, *args, **kwargs)[source]
Bases:
BaseLayer
This layer applies self-attention with linear complexity, as described in the MobileViTv2 paper. This layer can be used for self- as well as cross-attention.
- Parameters:
opts – command line arguments
embed_dim (int) – \(C\) from an expected input of size \((N, C, H, W)\)
attn_dropout (Optional[float]) – Dropout value for context scores. Default: 0.0
bias (Optional[bool]) – Use bias in learnable layers. Default: True
- Shape:
Input: \((B, C, P, N)\) where \(B\) is the batch size, \(C\) is the input channels, \(P\) is the number of pixels in a patch, and \(N\) is the number of patches
Output: same as the input
Note
For MobileViTv2, we unfold the feature map [B, C, H, W] into [B, C, P, N] where P is the number of pixels in a patch and N is the number of patches. Because channel is the first dimension in this unfolded tensor, we use point-wise convolution (instead of a linear layer). This avoids a transpose operation (which may be expensive on resource-constrained devices) that may be required to convert the unfolded tensor from channel-first to channel-last format in case of a linear layer.
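Following the MobileViTv2 description, linear self-attention replaces the N x N attention matrix with a single context vector. A query branch produces one score per patch, and the scores are softmax-normalized over the N patches. The keys are then summed with those weights into a C-dimensional context vector, which scales the ReLU-activated values. The sketch below handles one pixel position, a C x N slice of the (B, C, P, N) input. It assumes identity key/value projections, a given query weight vector (the hypothetical w_query), and no output projection or dropout. It only shows why the cost grows linearly in N.

```python
import math
from typing import List


def linear_self_attention(x: List[List[float]], w_query: List[float]) -> List[List[float]]:
    """x: (C, N) for one pixel position; w_query: (C,). Returns (C, N)."""
    c, n = len(x), len(x[0])
    # Query branch: one scalar score per patch (like a 1x1 conv from C to 1 channel).
    scores = [sum(w_query[ch] * x[ch][j] for ch in range(c)) for j in range(n)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    context_scores = [e / total for e in exps]  # softmax over the N patches
    # Context vector: score-weighted sum of keys (assumed keys = x), length C.
    context = [sum(x[ch][j] * context_scores[j] for j in range(n)) for ch in range(c)]
    # Output: ReLU(values) (assumed values = x) scaled by the context, broadcast over patches.
    return [[max(x[ch][j], 0.0) * context[ch] for j in range(n)] for ch in range(c)]
```

Every step touches each of the C x N entries a constant number of times, so the work is O(C * N) instead of the O(C * N^2) of standard self-attention.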
- class cvnets.layers.Embedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, *args, **kwargs)[source]
Bases:
Embedding
A lookup table that stores embeddings of a fixed dictionary and size.
- Parameters:
num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – If specified, the entries at padding_idx do not contribute to the gradient; therefore, the embedding vector at padding_idx is not updated during training, i.e. it remains as a fixed “pad”. For a newly constructed Embedding, the embedding vector at padding_idx will default to all zeros, but can be updated to another value to be used as the padding vector.
- Shape:
Input: \((*)\), IntTensor or LongTensor of arbitrary shape containing the indices to extract
Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\)
- class cvnets.layers.PositionalEmbedding(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]
Bases:
BaseLayer
- __init__(opts, num_embeddings: int, embedding_dim: int, padding_idx: int | None = None, is_learnable: bool | None = False, sequence_first: bool | None = False, interpolation_mode: str | None = 'bilinear', *args, **kwargs)[source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- class cvnets.layers.StochasticDepth(p: float, mode: str)[source]
Bases:
StochasticDepth
Implements Stochastic Depth, from the paper “Deep Networks with Stochastic Depth”, which randomly drops residual branches of residual architectures.
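The base class here is StochasticDepth, presumably torchvision.ops.StochasticDepth. Its mode is either 'batch', which drops the residual branch for the whole batch, or 'row', which drops it independently per sample. Surviving branches are rescaled by 1 / (1 - p). Assuming this layer inherits that behavior, the pure-Python sketch below mirrors it on a list of per-sample branch outputs. In a residual block, the result would then be added back to the identity path. The function name and the rng argument are hypothetical.

```python
import random
from typing import List, Optional


def stochastic_depth(x: List[List[float]], p: float, mode: str, training: bool = True,
                     rng: Optional[random.Random] = None) -> List[List[float]]:
    """x: (N, D) residual-branch outputs, one row per sample."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if mode not in ("batch", "row"):
        raise ValueError("mode must be 'batch' or 'row'")
    if not training or p == 0.0:
        return [list(row) for row in x]
    rng = rng or random.Random()
    survival = 1.0 - p
    scale = 1.0 / survival if survival > 0.0 else 0.0
    if mode == "batch":
        # One coin flip keeps or drops the branch for the whole batch.
        batch_keep = rng.random() < survival
        masks = [batch_keep] * len(x)
    else:
        # "row": an independent coin flip per sample.
        masks = [rng.random() < survival for _ in x]
    return [[v * scale if keep else 0.0 for v in row] for keep, row in zip(masks, x)]
```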
- cvnets.layers.get_normalization_layer(opts: Namespace, num_features: int, norm_type: str | None = None, num_groups: int | None = None, momentum: float | None = None) Module
Helper function to build the normalization layer. The function can be used in either of the following ways:
- Scenario 1: Set the default normalization layer using command-line arguments. This is useful when the same normalization layer is used for the entire network (e.g., ResNet).
- Scenario 2: The network uses different normalization layers. In that case, the default normalization layer can be overridden by passing its name via the norm_type argument.