cvnets.models.multi_modal_img_text package
Submodules
cvnets.models.multi_modal_img_text.base_multi_modal_img_text module
- class cvnets.models.multi_modal_img_text.base_multi_modal_img_text.BaseMultiModalImageText(opts, *args, **kwargs)[source]
Bases:
BaseAnyNNModel
Base class for multi-modal image-text data
- Parameters:
opts – Command-line arguments
cvnets.models.multi_modal_img_text.clip module
- class cvnets.models.multi_modal_img_text.clip.CLIP(opts: Namespace, image_encoder: BaseImageEncoder, text_encoder: BaseTextEncoder, *args, **kwargs)[source]
Bases:
BaseMultiModalImageText
CLIP model that pairs an image encoder with a text encoder for contrastive image-text pre-training
- __init__(opts: Namespace, image_encoder: BaseImageEncoder, text_encoder: BaseTextEncoder, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
opts – Command-line arguments
image_encoder – Image encoder, an instance of BaseImageEncoder
text_encoder – Text encoder, an instance of BaseTextEncoder
- get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs)[source]
Get parameters for training along with the learning rate.
- Parameters:
weight_decay – weight decay
no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.
- Returns:
Returns a tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter.
Note
Kwargs may contain module_name. To avoid multiple arguments with the same name, we pop it and concatenate it with the encoder or head name.
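The two-entry return value described above can be sketched as follows. This is a hypothetical, framework-free illustration of the shape of the result, not cvnets' actual implementation; the group contents and parameter names are placeholders.

```python
def get_trainable_parameters_sketch(weight_decay=0.0):
    # Each entry is a parameter-group dictionary with the three keys
    # named in the docstring: params, weight_decay, param_names.
    # In the real model, "params" would hold tensors from the image
    # encoder, text encoder, and logit scale (names here are made up).
    param_groups = [
        {"params": ["<image_encoder tensors>"],
         "weight_decay": weight_decay,
         "param_names": ["image_encoder.conv1.weight"]},
        {"params": ["<text_encoder tensors>"],
         "weight_decay": weight_decay,
         "param_names": ["text_encoder.embedding.weight"]},
    ]
    # The second entry: one learning-rate value per group, same order.
    lr_list = [1.0, 1.0]
    return param_groups, lr_list

groups, lrs = get_trainable_parameters_sketch(weight_decay=0.01)
```

An optimizer would typically consume `groups` directly as its parameter groups, with `lrs` supplying the per-group learning rates.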
- dummy_input_and_label(batch_size: int) Dict [source]
Create a dummy input and labels for CI/CD purposes. Child classes must override this method if their functionality differs.
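The idea behind such a dummy batch can be sketched as below. The key names, shapes, and label convention are illustrative assumptions, not cvnets' real ones; the point is a fixed-shape input/label Dict built from a batch size for smoke tests.

```python
def dummy_input_and_label_sketch(batch_size, img_dim=4, seq_len=3):
    # Placeholder "images" and "text" with deterministic contents,
    # sized only by batch_size (shapes here are invented).
    images = [[0.0] * img_dim for _ in range(batch_size)]
    text = [[0] * seq_len for _ in range(batch_size)]
    # Contrastive-style label: each image matches the text at its own
    # index within the batch (an assumption for illustration).
    labels = list(range(batch_size))
    return {"samples": {"image": images, "text": text}, "targets": labels}

batch = dummy_input_and_label_sketch(batch_size=2)
```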
- forward(input: Dict, *args, **kwargs) Dict [source]
Implement the model-specific forward function in sub-classes.
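To make the Dict-in/Dict-out contract concrete, here is a minimal, framework-free sketch of a CLIP-style forward pass: embed both modalities, L2-normalize, and return scaled cosine-similarity logits. The input keys ("image", "text"), the output keys, and the identity "encoders" are assumptions for illustration; the actual cvnets keys and outputs may differ.

```python
import math

def l2_normalize(v):
    # Normalize a vector to unit length (guarding against zero vectors).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def clip_forward_sketch(batch, logit_scale=100.0):
    # Stand-ins for the image/text encoders: use the raw vectors.
    img = [l2_normalize(v) for v in batch["image"]]
    txt = [l2_normalize(v) for v in batch["text"]]
    # logits[i][j] = logit_scale * cosine(image_i, text_j)
    logits = [[logit_scale * sum(a * b for a, b in zip(iv, tv)) for tv in txt]
              for iv in img]
    return {"image": img, "text": txt, "logits_per_image": logits}

out = clip_forward_sketch({"image": [[1.0, 0.0]],
                           "text": [[1.0, 0.0], [0.0, 1.0]]})
# The matching pair gets the maximal logit; the orthogonal pair gets 0.
```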
- classmethod build_model(opts, *args, **kwargs) BaseAnyNNModel [source]
Helper function to build the multi-modal image-text model
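The classmethod-factory pattern behind build_model can be sketched as follows: a class method reads command-line options from a Namespace, constructs the two encoders, and returns an assembled model. The option names and string "encoders" below are purely illustrative, not cvnets' real flags or classes.

```python
from argparse import Namespace

class MultiModalModelSketch:
    def __init__(self, opts, image_encoder, text_encoder):
        self.opts = opts
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder

    @classmethod
    def build_model(cls, opts):
        # In cvnets, the encoders would themselves be built from opts;
        # here we just read hypothetical option names with defaults.
        image_encoder = getattr(opts, "image_encoder_name", "vit")
        text_encoder = getattr(opts, "text_encoder_name", "transformer")
        return cls(opts, image_encoder, text_encoder)

# Usage: opts mimics the parsed command-line arguments.
opts = Namespace(image_encoder_name="resnet50", text_encoder_name="transformer")
model = MultiModalModelSketch.build_model(opts)
```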