cvnets.models.classification package
Subpackages
- cvnets.models.classification.config package
- Submodules
- cvnets.models.classification.config.byteformer module
- cvnets.models.classification.config.efficientnet module
- cvnets.models.classification.config.fastvit module
- cvnets.models.classification.config.mobilenetv1 module
- cvnets.models.classification.config.mobilenetv2 module
- cvnets.models.classification.config.mobilenetv3 module
- cvnets.models.classification.config.mobileone module
- cvnets.models.classification.config.mobilevit module
- cvnets.models.classification.config.mobilevit_v2 module
- cvnets.models.classification.config.regnet module
- cvnets.models.classification.config.resnet module
- cvnets.models.classification.config.swin_transformer module
- cvnets.models.classification.config.vit module
- Module contents
Submodules
cvnets.models.classification.base_image_encoder module
- class cvnets.models.classification.base_image_encoder.BaseImageEncoder(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseAnyNNModel
Base class for different image classification models
- __init__(opts: Namespace, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add image classification model-specific arguments
- check_model() None [source]
Check whether the model adheres to the image encoder structure. Sub-classes are not required to adhere to this structure; it is only required for easy integration with downstream tasks.
- update_classifier(opts: Namespace, n_classes: int) None [source]
This function updates the classification layer in a model and is useful for fine-tuning; a usage sketch follows.
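For illustration, a minimal hedged sketch of fine-tuning with update_classifier. The names model and opts are hypothetical and are assumed to come from the usual cvnets model-building path.

```python
# Hypothetical fine-tuning flow (not the library's prescribed recipe):
# `model` is a BaseImageEncoder subclass built from pre-trained weights and
# `opts` is the populated argparse.Namespace used to build it.
model.update_classifier(opts, n_classes=10)  # replace the head for a 10-class task
```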
- extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor] [source]
Extract feature maps from different spatial levels of the model.
- Parameters:
x – Input image tensor
use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.
use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.
- Returns:
A mapping containing the name and output at each spatial-level of the model.
Note
This is useful for downstream tasks; a usage sketch follows.
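For illustration, a hedged sketch of consuming the returned mapping in a downstream task. The variable model is assumed to be a built BaseImageEncoder subclass, and the key names mentioned in the comment (e.g. out_l1 through out_l5) are assumptions based on the layer names used on this page, not a guaranteed contract.

```python
import torch

# `model` is assumed to be a BaseImageEncoder subclass in eval mode.
x = torch.randn(2, 3, 224, 224)  # [batch_size, channels, height, width]
end_points = model.extract_end_points_all(x, use_l5=True, use_l5_exp=False)
for name, feat in end_points.items():
    # Each entry is a feature map from one spatial level, e.g. "out_l3".
    print(name, tuple(feat.shape))
```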
- extract_end_points_l4(x: Tensor, *args, **kwargs) Dict[str, Tensor] [source]
This function is similar to extract_end_points_all, with the exception that it only returns outputs, in dictionary form, up to layer_4 of the model.
- extract_features(x: Tensor, *args, **kwargs) Tensor [source]
This function is similar to extract_end_points_all. However, it returns a single tensor as the output of the last layer instead of a dictionary, and is typically used during classification tasks where intermediate feature maps are not required.
- forward_classifier(x: Tensor, *args, **kwargs) Tensor [source]
A helper function to extract features and run the classifier.
- forward(x: Any, *args, **kwargs) Any [source]
A forward function of the model, optionally training the model with neural augmentation.
- get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs) Tuple[List[Mapping], List[float]] [source]
Get parameters for training along with the learning rate.
- Parameters:
weight_decay – weight decay
no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.
- Returns:
Returns a tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter. A sketch of wiring these groups into an optimizer follows.
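A hedged sketch of wiring the returned parameter groups into a standard PyTorch optimizer. The variable model is assumed to be a built BaseImageEncoder subclass, and treating the returned floats as per-group scales on a base learning rate is an assumption about how the two lists correspond; cvnets' own optimizer builders may differ.

```python
import torch

base_lr = 0.1  # hypothetical base learning rate
param_groups, lrs = model.get_trainable_parameters(
    weight_decay=1e-4, no_decay_bn_filter_bias=True
)
# Attach a learning rate to each group (assumed one-to-one correspondence).
for group, lr in zip(param_groups, lrs):
    group["lr"] = base_lr * lr
optimizer = torch.optim.SGD(param_groups, lr=base_lr, momentum=0.9)
```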
- dummy_input_and_label(batch_size: int) Dict [source]
Create dummy inputs and labels for CI/CD purposes. Child classes must override this method if their functionality differs.
- get_exportable_model() Module [source]
This function can be used to prepare the architecture for inference. For example, re-parameterizing branches when possible. The functionality of this method may vary from model to model, so child model classes have to implement this method, if such a transformation exists.
- classmethod build_model(opts: Namespace, *args, **kwargs) BaseAnyNNModel [source]
Helper function to build a model.
- Parameters:
opts – Command-line arguments
- Returns:
An instance of cvnets.models.BaseAnyNNModel.
- cvnets.models.classification.base_image_encoder.set_model_specific_opts_before_model_building(opts: Namespace) Dict[str, Any] [source]
Override library-level defaults with model-specific default values.
- Parameters:
opts – Command-line arguments
- Returns:
A dictionary containing the names of the arguments that were updated, along with their original values. This dictionary is used by the unset_model_specific_opts_after_model_building function to reset the model-specific values back to the library-level defaults.
- cvnets.models.classification.base_image_encoder.unset_model_specific_opts_after_model_building(opts: Namespace, default_opts_info: Dict[str, Any], *ars, **kwargs) None [source]
Given command-line arguments and a mapping of opts that need to be unset, this function restores the library-level defaults that were previously overridden in set_model_specific_opts_before_model_building. A sketch of the intended pairing follows.
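A hedged sketch of the intended pairing around model construction; opts is a populated argparse.Namespace and MyModel is a hypothetical BaseImageEncoder subclass.

```python
from cvnets.models.classification.base_image_encoder import (
    set_model_specific_opts_before_model_building,
    unset_model_specific_opts_after_model_building,
)

# Temporarily override library-level defaults with model-specific values,
# build the model, then restore the original defaults.
default_opts_info = set_model_specific_opts_before_model_building(opts)
model = MyModel.build_model(opts)
unset_model_specific_opts_after_model_building(opts, default_opts_info)
```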
cvnets.models.classification.byteformer module
- cvnets.models.classification.byteformer.unfold_tokens(t: Tensor, kernel_size: int) Tensor [source]
Group tokens from tensor @t using torch.Tensor.unfold with the given kernel size. This amounts to windowing @t with overlapping windows of size @kernel_size and an overlap of @kernel_size // 2.
- Parameters:
t – A tensor of shape [batch_size, sequence_length, num_channels].
kernel_size – The kernel size.
- Returns:
A tensor of shape [batch_size * num_windows, kernel_size, num_channels], where num_windows = (sequence_length - kernel_size) // (kernel_size // 2) + 1 (see the sketch below).
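For illustration, a functionally equivalent sketch of the windowing described above, written in plain PyTorch; it is not necessarily the library's exact implementation.

```python
import torch

def unfold_tokens_sketch(t: torch.Tensor, kernel_size: int) -> torch.Tensor:
    # t: [batch_size, sequence_length, num_channels]
    step = kernel_size // 2
    # unfold -> [batch_size, num_windows, num_channels, kernel_size]
    windows = t.unfold(dimension=1, size=kernel_size, step=step)
    batch_size, num_windows, num_channels, _ = windows.shape
    # Merge the batch and window dimensions, then put kernel_size before channels.
    windows = windows.reshape(batch_size * num_windows, num_channels, kernel_size)
    return windows.transpose(1, 2)

t = torch.randn(2, 16, 8)          # [batch, seq_len, channels]
out = unfold_tokens_sketch(t, 4)   # [2 * ((16 - 4) // 2 + 1), 4, 8] == [14, 4, 8]
```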
- class cvnets.models.classification.byteformer.ByteFormer(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseAnyNNModel
This class defines the ByteFormer architecture.
- __init__(opts: Namespace, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add model-specific arguments
- dummy_input_and_label(batch_size: int) Dict [source]
Get a dummy input and label that could be passed to the model.
- Parameters:
batch_size – The batch size to use for the generated inputs.
- Returns:
A dict with "samples" (a tensor of shape [batch_size, sequence_length]) and "targets" (a tensor of shape [batch_size]).
- apply_token_reduction_net(x: Tensor, x_mask: Tensor) Tuple[Tensor, Tensor] [source]
Apply the portion of the network used to reduce sequence lengths before the transformer backbone.
- Parameters:
x – The input token embeddings of shape [batch_size, sequence_length, embed_dim].
x_mask – The input mask of shape [batch_size, sequence_length].
- Returns:
New versions of @x and @x_mask, downsampled along the sequence dimension by the token reduction net.
- get_backbone_inputs(x: Tensor) Tuple[Tensor, Tensor] [source]
Convert input bytes into embeddings to be passed to the network’s transformer backbone.
- Parameters:
x – The input bytes as an integer tensor of shape [batch_size, sequence_length]. Integer tensors are expected (rather than byte tensors) since -1 is usually used for padding.
- Returns:
The embeddings of shape [batch_size, new_sequence_length, embed_dim] and a mask tensor of shape [batch_size, new_sequence_length]. The mask contains 0 at unmasked positions and float(-inf) at masked positions (see the mask sketch below).
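A hedged sketch of the mask convention described above: padding bytes (value -1) receive float(-inf) while real tokens receive 0. How ByteFormer actually builds its embeddings and mask internally may differ.

```python
import torch

x = torch.tensor([[72, 105, 33, -1, -1]])  # [batch_size, sequence_length], -1 = padding
key_padding_mask = torch.zeros(x.shape, dtype=torch.float)
key_padding_mask[x == -1] = float("-inf")
# key_padding_mask == tensor([[0., 0., 0., -inf, -inf]])
```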
- backbone_forward(x: Tensor, key_padding_mask: Tensor) Tuple[Tensor, Tensor] [source]
Execute the forward pass of the network’s transformer backbone.
- Parameters:
x – The input embeddings as a [batch_size, sequence_length, embed_dim] tensor.
key_padding_mask – The mask tensor of shape [batch_size, sequence_length].
- Returns:
The outputs of the backbone as a tuple. The first element is the feature tensor, and the second element is the updated key_padding_mask.
- get_downsampler_name(idx: int) str [source]
Get the name of the downsampling layer with index @idx.
- Parameters:
idx – The index of the downsampling layer.
- Returns:
A string representing the name of the downsampling layer.
- get_downsampler(idx: int) Module | None [source]
Get the module that performs downsampling after transformer layer @idx. If no downsampling occurs after that layer, return None.
- Parameters:
idx – The desired index.
- Returns:
The downsampling layer, or None.
- forward(x: Tensor, *args, **kwargs) Tensor [source]
Perform a forward pass on input bytes. The input is an integer tensor of shape [batch_size, sequence_length]; integer tensors are used because @x usually contains mask tokens.
- Parameters:
x – The input tensor of shape [batch_size, sequence_length].
- Returns:
The output logits.
- classmethod build_model(opts: Namespace, *args, **kwargs) BaseAnyNNModel [source]
Helper function to build a model.
- Parameters:
opts – Command-line arguments.
- Returns:
An instance of cvnets.models.BaseAnyNNModel.
cvnets.models.classification.efficientnet module
- class cvnets.models.classification.efficientnet.EfficientNet(opts, *args, **kwargs: Any)[source]
Bases:
BaseImageEncoder
This class defines the EfficientNet architecture
cvnets.models.classification.fastvit module
- cvnets.models.classification.fastvit.basic_blocks(opts: Namespace, dim: int, block_index: int, num_blocks: List[int], token_mixer_type: str, kernel_size: int = 3, mlp_ratio: float = 4.0, drop_rate: float = 0.0, drop_path_rate: float = 0.0, inference_mode: bool = False, use_layer_scale: bool = True, layer_scale_init_value: float = 1e-05) Sequential [source]
Build FastViT blocks within a stage.
- Parameters:
opts – Command line arguments.
dim – Number of embedding dimensions.
block_index – block index.
num_blocks – List containing number of blocks per stage.
token_mixer_type – Token mixer type.
kernel_size – Kernel size for repmixer.
mlp_ratio – MLP expansion ratio.
drop_rate – Dropout rate.
drop_path_rate – Drop path rate.
inference_mode – Flag to instantiate block in inference mode.
use_layer_scale – Flag to turn on layer scale regularization.
layer_scale_init_value – Layer scale value at initialization.
- Returns:
nn.Sequential object of all the blocks within the stage.
- class cvnets.models.classification.fastvit.FastViT(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements FastViT architecture
- __init__(opts: Namespace, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
cvnets.models.classification.mobilenetv1 module
- class cvnets.models.classification.mobilenetv1.MobileNetv1(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class defines the MobileNet architecture
cvnets.models.classification.mobilenetv2 module
- class cvnets.models.classification.mobilenetv2.MobileNetV2(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class defines the MobileNetv2 architecture
cvnets.models.classification.mobilenetv3 module
- class cvnets.models.classification.mobilenetv3.MobileNetV3(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements the MobileNetv3 architecture
cvnets.models.classification.mobileone module
- class cvnets.models.classification.mobileone.MobileOne(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements MobileOne architecture
- __init__(opts, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
cvnets.models.classification.mobilevit module
- class cvnets.models.classification.mobilevit.MobileViT(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements the MobileViT architecture
cvnets.models.classification.mobilevit_v2 module
- class cvnets.models.classification.mobilevit_v2.MobileViTv2(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class defines the MobileViTv2 architecture
cvnets.models.classification.regnet module
- class cvnets.models.classification.regnet.RegNet(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements the RegNet architecture
cvnets.models.classification.resnet module
- class cvnets.models.classification.resnet.ResNet(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class implements the ResNet architecture
Note
Our ResNet implementation differs from the original implementation in two ways:
1. The first 7x7 strided conv is replaced with a 3x3 strided conv.
2. The MaxPool operation is replaced with another 3x3 strided depth-wise conv.
An illustrative sketch of this modified stem follows.
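For illustration, a hedged plain-PyTorch sketch of the stem changes described in the note; the channel counts, normalization, and activation choices are assumptions, not the library's exact configuration.

```python
import torch.nn as nn

stem = nn.Sequential(
    # 3x3 strided conv in place of the original 7x7 strided conv
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    # 3x3 strided depth-wise conv in place of the MaxPool operation
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, groups=64, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)
```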
cvnets.models.classification.swin_transformer module
- class cvnets.models.classification.swin_transformer.SwinTransformer(opts, *args, **kwargs)[source]
Bases:
BaseImageEncoder
Implements Swin Transformer from the “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows” paper.
The code is adapted from the Torchvision repository.
- __init__(opts, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor] [source]
Extract feature maps from different spatial levels of the model.
- Parameters:
x – Input image tensor
use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.
use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.
- Returns:
A mapping containing the name and output at each spatial-level of the model.
Note
This is useful for downstream tasks.
cvnets.models.classification.vit module
- class cvnets.models.classification.vit.VisionTransformer(opts: Namespace, *args, **kwargs)[source]
Bases:
BaseImageEncoder
This class defines the Vision Transformer architecture. Our model implementation is inspired by the paper Early Convolutions Help Transformers See Better.
Note
Our implementation differs from the original implementation in the following ways:
1. The kernel size is odd.
2. Our positional encoding implementation allows ViT to be used with multiple input scales.
3. We do not use StochasticDepth.
4. We do not add positional encoding to the class token (if enabled), as suggested in the DeiT-3 paper.
- __init__(opts: Namespace, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- classmethod add_arguments(parser: ArgumentParser) ArgumentParser [source]
Add image classification model-specific arguments
- extract_features(x: Tensor, *args, **kwargs) Tuple[Tensor, Tensor | None] [source]
This function is similar to extract_end_points_all. However, it returns a single tensor as the output of the last layer instead of a dictionary, and is typically used during classification tasks where intermediate feature maps are not required.
- forward_classifier(x: Tensor, *args, **kwargs) Tuple[Tensor, Tensor] [source]
A helper function to extract features and run the classifier.
- forward(x: Tensor, *args, **kwargs) Tensor | Dict[str, Tensor] [source]
A forward function of the model, optionally training the model with neural augmentation.
- extract_end_points_all(x: Tensor, use_l5: bool | None = True, use_l5_exp: bool | None = False, *args, **kwargs) Dict[str, Tensor] [source]
Extract feature maps from different spatial levels of the model.
- Parameters:
x – Input image tensor
use_l5 – Include features from layer_5 in the output dictionary. Defaults to True.
use_l5_exp – Include features from conv_1x1_exp in the output dictionary. Defaults to False.
- Returns:
A mapping containing the name and output at each spatial-level of the model.
Note
This is useful for downstream tasks.