cvnets.models.classification package

Submodules

cvnets.models.classification.base_image_encoder module

class cvnets.models.classification.base_image_encoder.BaseImageEncoder(opts: Namespace, *args, **kwargs)[source]

Bases: BaseAnyNNModel

Base class for different image classification models

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

check_model() None[source]

Check whether the model adheres to the image encoder structure. Subclasses are not required to adhere to this structure; it is only required for easy integration with downstream tasks.

update_classifier(opts: Namespace, n_classes: int) None[source]

This function updates the classification layer in a model. Useful for fine-tuning purposes.
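
Example (a minimal fine-tuning sketch; assumes model is an already-built BaseImageEncoder subclass and opts is the Namespace it was built from):

    import torch

    # Replace the pretrained classification head with a fresh 10-way head.
    model.update_classifier(opts, n_classes=10)

    logits = model(torch.randn(2, 3, 224, 224))  # dummy image batch
    assert logits.shape[-1] == 10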

extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor][source]

Extract feature maps from different spatial levels of the model.

Parameters:
  • x – Input image tensor

  • use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.

  • use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.

Returns:

A mapping containing the name and output at each spatial level of the model.

Note

This is useful for downstream tasks.
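
Example (an illustrative sketch; assumes model is a BaseImageEncoder subclass in eval mode):

    import torch

    x = torch.randn(1, 3, 224, 224)  # dummy input image
    end_points = model.extract_end_points_all(x, use_l5=True, use_l5_exp=False)
    for name, feat in end_points.items():
        # One entry per spatial level, e.g. for building a feature pyramid.
        print(name, tuple(feat.shape))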

extract_end_points_l4(x: Tensor, *args, **kwargs) Dict[str, Tensor][source]

This function is similar to extract_end_points_all, except that it returns outputs in a dictionary only up to layer_4 of the model.

extract_features(x: Tensor, *args, **kwargs) Tensor[source]

This function is similar to extract_end_points_all. However, it returns a single tensor as the output of the last layer instead of a dictionary, and is typically used during classification tasks where intermediate feature maps are not required.

forward_classifier(x: Tensor, *args, **kwargs) Tensor[source]

A helper function that extracts features and runs the classifier.

forward(x: Any, *args, **kwargs) Any[source]

A forward function of the model, optionally training the model with neural augmentation.
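
Example (an inference sketch; assumes model is a built subclass with neural augmentation disabled, so the output is a logits tensor):

    import torch

    model.eval()
    with torch.no_grad():
        logits = model(torch.randn(1, 3, 224, 224))  # dummy image
    pred = logits.argmax(dim=-1)  # predicted class index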

get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs) Tuple[List[Mapping], List[float]][source]

Get parameters for training along with the learning rate.

Parameters:
  • weight_decay – weight decay

  • no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.

Returns:

A tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter group.
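
Example (a sketch of feeding the output to an optimizer; treating the floats as per-group learning-rate scales and choosing base_lr are assumptions):

    import torch

    param_groups, lr_scales = model.get_trainable_parameters(
        weight_decay=1e-4, no_decay_bn_filter_bias=True
    )
    base_lr = 0.1
    for group, scale in zip(param_groups, lr_scales):
        group["lr"] = base_lr * scale  # per-group learning rate
    optimizer = torch.optim.SGD(param_groups, lr=base_lr, momentum=0.9)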

dummy_input_and_label(batch_size: int) Dict[source]

Create a dummy input and labels for CI/CD purposes. Child classes must override this method if their required inputs differ.

get_exportable_model() Module[source]

This function can be used to prepare the architecture for inference, for example by re-parameterizing branches when possible. The functionality of this method may vary from model to model, so child classes must implement it if such a transformation exists.

classmethod build_model(opts: Namespace, *args, **kwargs) BaseAnyNNModel[source]

Helper function to build a model.

Parameters:

opts – Command-line arguments

Returns:

An instance of cvnets.models.BaseAnyNNModel.
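
Example (a hypothetical build sketch; ResNet stands in for any concrete subclass, and a real configuration typically needs more library-level options than the bare parser shown here):

    from argparse import ArgumentParser
    from cvnets.models.classification.resnet import ResNet

    parser = ArgumentParser()
    parser = ResNet.add_arguments(parser)
    opts = parser.parse_args([])      # model-specific defaults only
    model = ResNet.build_model(opts)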

cvnets.models.classification.base_image_encoder.set_model_specific_opts_before_model_building(opts: Namespace) Dict[str, Any][source]

Override library-level defaults with model-specific default values.

Parameters:

opts – Command-line arguments

Returns:

A dictionary containing the names of the arguments that were updated along with their original values. This dictionary is used by the unset_model_specific_opts_after_model_building function to restore the model-specific overrides back to the library-level defaults.

cvnets.models.classification.base_image_encoder.unset_model_specific_opts_after_model_building(opts: Namespace, default_opts_info: Dict[str, Any], *args, **kwargs) None[source]

Given command-line arguments and a mapping of opts that need to be unset, this function restores the library-level defaults that were previously overridden in set_model_specific_opts_before_model_building.
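
The two functions are meant to be used as a pair around model construction. A sketch of that pairing (MyEncoder is a hypothetical subclass; opts is assumed to come from the cvnets argument parser):

    from cvnets.models.classification.base_image_encoder import (
        set_model_specific_opts_before_model_building,
        unset_model_specific_opts_after_model_building,
    )

    default_opts_info = set_model_specific_opts_before_model_building(opts)
    model = MyEncoder.build_model(opts)
    unset_model_specific_opts_after_model_building(opts, default_opts_info)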

cvnets.models.classification.byteformer module

cvnets.models.classification.byteformer.unfold_tokens(t: Tensor, kernel_size: int) Tensor[source]

Group tokens from tensor @t with torch.Tensor.unfold, using the given kernel size. This amounts to windowing @t with overlapping windows of size @kernel_size and a stride of @kernel_size // 2.

Parameters:
  • t – A tensor of shape [batch_size, sequence_length, num_channels].

  • kernel_size – The kernel size.

Returns:

A tensor of shape [batch_size * ((sequence_length - kernel_size) // (kernel_size // 2) + 1), kernel_size, num_channels].
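
An illustrative re-implementation of the described windowing with plain torch ops (a sketch of the behavior, not necessarily the library's exact code):

    import torch

    t = torch.randn(2, 16, 8)  # [batch_size, sequence_length, num_channels]
    kernel_size = 4
    # Overlapping windows of size 4 with stride kernel_size // 2 = 2.
    windows = t.unfold(1, kernel_size, kernel_size // 2)  # [2, 7, 8, 4]
    windows = windows.permute(0, 1, 3, 2)                 # [2, 7, 4, 8]
    out = windows.reshape(-1, kernel_size, t.shape[-1])   # [14, 4, 8]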

class cvnets.models.classification.byteformer.ByteFormer(opts: Namespace, *args, **kwargs)[source]

Bases: BaseAnyNNModel

This class defines the ByteFormer architecture.

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add model-specific arguments

dummy_input_and_label(batch_size: int) Dict[source]

Get a dummy input and label that could be passed to the model.

Parameters:

batch_size – The batch size to use for the generated inputs.

Returns:

A dict of the form:

{
    "samples": tensor of shape [batch_size, sequence_length],
    "targets": tensor of shape [batch_size],
}
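
Example (a CI-style smoke-test sketch; assumes byteformer is a built ByteFormer instance):

    batch = byteformer.dummy_input_and_label(batch_size=2)
    logits = byteformer(batch["samples"])
    assert logits.shape[0] == batch["targets"].shape[0]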

apply_token_reduction_net(x: Tensor, x_mask: Tensor) Tuple[Tensor, Tensor][source]

Apply the portion of the network used to reduce sequence lengths before the transformer backbone.

Parameters:
  • x – The input token embeddings of shape [batch_size, sequence_length, embed_dim].

  • x_mask – The input mask of shape [batch_size, sequence_length].

Returns:

New versions of @x and @x_mask, downsampled along the sequence dimension by the token reduction net.

get_backbone_inputs(x: Tensor) Tuple[Tensor, Tensor][source]

Convert input bytes into embeddings to be passed to the network’s transformer backbone.

Parameters:

x – The input bytes as an integer tensor of shape [batch_size, sequence_length]. Integer tensors are expected (rather than byte tensors) since -1 is usually used for padding.

Returns:

The embeddings of shape [batch_size, new_sequence_length, embed_dim] and a mask tensor of shape [batch_size, new_sequence_length]. The mask contains 0 at unmasked positions and float(-inf) at masked positions.

backbone_forward(x: Tensor, key_padding_mask: Tensor) Tuple[Tensor, Tensor][source]

Execute the forward pass of the network’s transformer backbone.

Parameters:
  • x – The input embeddings as a [batch_size, sequence_length, embed_dim] tensor.

  • key_padding_mask – The mask tensor of shape [batch_size, sequence_length].

Returns:

The outputs of the backbone as a tuple. The first element is the feature tensor, and the second element is the updated key_padding_mask.

get_downsampler_name(idx: int) str[source]

Get the name of the downsampling layer with index @idx.

Parameters:

idx – The index of the downsampling layer.

Returns:

A string representing the name of the downsampling layer.

get_downsampler(idx: int) Module | None[source]

Get the module that performs downsampling after transformer layer @idx. If no downsampling occurs after that layer, return None.

Parameters:

idx – The desired index.

Returns:

The downsampling layer, or None.

forward(x: Tensor, *args, **kwargs) Tensor[source]

Perform a forward pass on input bytes. The input is an integer tensor of shape [batch_size, sequence_length]. Integer tensors are used because @x usually contains mask tokens.

Parameters:

x – The input tensor of shape [batch_size, sequence_length].

Returns:

The output logits.
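
Example (a hypothetical inference sketch; byteformer is a built ByteFormer instance, and per the docstring -1 marks padded positions):

    import torch

    x = torch.randint(0, 256, (1, 1024))  # file bytes as an integer tensor
    x[:, 900:] = -1                       # pad the tail of the sequence
    logits = byteformer(x)                # [batch_size, num_classes]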

classmethod build_model(opts: Namespace, *args, **kwargs) BaseAnyNNModel[source]

Helper function to build a model.

Parameters:

opts – Command-line arguments.

Returns:

An instance of cvnets.models.BaseAnyNNModel.

cvnets.models.classification.efficientnet module

class cvnets.models.classification.efficientnet.EfficientNet(opts, *args, **kwargs: Any)[source]

Bases: BaseImageEncoder

This class defines the EfficientNet architecture

__init__(opts, *args, **kwargs: Any) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.fastvit module

cvnets.models.classification.fastvit.basic_blocks(opts: Namespace, dim: int, block_index: int, num_blocks: List[int], token_mixer_type: str, kernel_size: int = 3, mlp_ratio: float = 4.0, drop_rate: float = 0.0, drop_path_rate: float = 0.0, inference_mode: bool = False, use_layer_scale: bool = True, layer_scale_init_value: float = 1e-05) Sequential[source]

Build FastViT blocks within a stage.

Parameters:
  • opts – Command line arguments.

  • dim – Number of embedding dimensions.

  • block_index – block index.

  • num_blocks – List containing number of blocks per stage.

  • token_mixer_type – Token mixer type.

  • kernel_size – Kernel size for repmixer.

  • mlp_ratio – MLP expansion ratio.

  • drop_rate – Dropout rate.

  • drop_path_rate – Drop path rate.

  • inference_mode – Flag to instantiate block in inference mode.

  • use_layer_scale – Flag to turn on layer scale regularization.

  • layer_scale_init_value – Layer scale value at initialization.

Returns:

An nn.Sequential object containing all the blocks within the stage.

class cvnets.models.classification.fastvit.FastViT(opts: Namespace, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the FastViT architecture

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add model specific arguments

get_exportable_model() Module[source]

Returns a re-parameterized model for faster inference.

Returns:

Reparametrized FastViT model for faster inference.
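
Example (an export sketch; assumes model is a trained FastViT instance):

    import torch

    model.eval()
    export_model = model.get_exportable_model()  # branches re-parameterized
    with torch.no_grad():
        y = export_model(torch.randn(1, 3, 256, 256))  # dummy image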

cvnets.models.classification.mobilenetv1 module

class cvnets.models.classification.mobilenetv1.MobileNetv1(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class defines the MobileNetv1 architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add model specific arguments

cvnets.models.classification.mobilenetv2 module

class cvnets.models.classification.mobilenetv2.MobileNetV2(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class defines the MobileNetv2 architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.mobilenetv3 module

class cvnets.models.classification.mobilenetv3.MobileNetV3(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the MobileNetv3 architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser)[source]

Add image classification model-specific arguments

cvnets.models.classification.mobileone module

class cvnets.models.classification.mobileone.MobileOne(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the MobileOne architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add model specific arguments

get_exportable_model() Module[source]

Returns a model where the multi-branched structure used during training is re-parameterized into a single branch for inference.

Returns:

Reparametrized MobileOne model for faster inference.

cvnets.models.classification.mobilevit module

class cvnets.models.classification.mobilevit.MobileViT(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the MobileViT architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.mobilevit_v2 module

class cvnets.models.classification.mobilevit_v2.MobileViTv2(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class defines the MobileViTv2 architecture

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.regnet module

class cvnets.models.classification.regnet.RegNet(opts: Namespace, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the RegNet architecture

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.resnet module

class cvnets.models.classification.resnet.ResNet(opts: Namespace, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class implements the ResNet architecture

Note

Our ResNet implementation differs from the original implementation in two ways:

1. The first 7x7 strided conv is replaced with a 3x3 strided conv.
2. The MaxPool operation is replaced with another 3x3 strided depth-wise conv.

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.swin_transformer module

class cvnets.models.classification.swin_transformer.SwinTransformer(opts, *args, **kwargs)[source]

Bases: BaseImageEncoder

Implements the Swin Transformer from the “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows” paper.

The code is adapted from the Torchvision repository.

__init__(opts, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor][source]

Extract feature maps from different spatial levels of the model.

Parameters:
  • x – Input image tensor

  • use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.

  • use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.

Returns:

A mapping containing the name and output at each spatial level of the model.

Note

This is useful for downstream tasks.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

cvnets.models.classification.vit module

class cvnets.models.classification.vit.VisionTransformer(opts: Namespace, *args, **kwargs)[source]

Bases: BaseImageEncoder

This class defines the Vision Transformer architecture. Our model implementation is inspired by Early Convolutions Help Transformers See Better.

Note

Our implementation differs from the original implementation in the following ways:

1. The kernel size is odd.
2. Our positional encoding implementation allows ViT to be used with multiple input scales.
3. We do not use StochasticDepth.
4. We do not add positional encoding to the class token (if enabled), as suggested in the DeiT-3 paper.

__init__(opts: Namespace, *args, **kwargs) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

update_layer_norm_eps()[source]

reset_simple_fpn_params() None[source]

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]

Add image classification model-specific arguments

extract_patch_embeddings(x: Tensor) Tuple[Tensor, Tuple[int, int]][source]

extract_features(x: Tensor, *args, **kwargs) Tuple[Tensor, Tensor | None][source]

This function is similar to extract_end_points_all. However, it returns a single tensor as the output of the last layer instead of a dictionary, and is typically used during classification tasks where intermediate feature maps are not required.

forward_classifier(x: Tensor, *args, **kwargs) Tuple[Tensor, Tensor][source]

A helper function that extracts features and runs the classifier.

forward(x: Tensor, *args, **kwargs) Tensor | Dict[str, Tensor][source]

A forward function of the model, optionally training the model with neural augmentation.

extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor][source]

Extract feature maps from different spatial levels of the model.

Parameters:
  • x – Input image tensor

  • use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.

  • use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.

Returns:

A mapping containing the name and output at each spatial level of the model.

Note

This is useful for downstream tasks.
