cvnets.models.multi_modal_img_text package
Submodules
cvnets.models.multi_modal_img_text.base_multi_modal_img_text module
- class cvnets.models.multi_modal_img_text.base_multi_modal_img_text.BaseMultiModalImageText(opts, *args, **kwargs)[source]
Bases:
BaseAnyNNModel
Base class for multi-modal image-text data
- Parameters:
opts – Command-line arguments
cvnets.models.multi_modal_img_text.clip module
- class cvnets.models.multi_modal_img_text.clip.CLIP(opts: Namespace, image_encoder: BaseImageEncoder, text_encoder: BaseTextEncoder, *args, **kwargs)[source]
Bases:
BaseMultiModalImageText
CLIP model that pairs an image encoder with a text encoder for contrastive image-text pre-training
- __init__(opts: Namespace, image_encoder: BaseImageEncoder, text_encoder: BaseTextEncoder, *args, **kwargs) None [source]
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
opts – Command-line arguments
image_encoder – Image encoder, an instance of BaseImageEncoder
text_encoder – Text encoder, an instance of BaseTextEncoder
- get_trainable_parameters(weight_decay: float | None = 0.0, no_decay_bn_filter_bias: bool | None = False, *args, **kwargs)[source]
Get parameters for training along with the learning rate.
- Parameters:
weight_decay – weight decay
no_decay_bn_filter_bias – Do not decay BN and biases. Defaults to False.
- Returns:
Returns a tuple of length 2. The first entry is a list of dictionaries, each with three keys (params, weight_decay, param_names). The second entry is a list of floats containing the learning rate for each parameter.
Note
Kwargs may contain module_name. To avoid multiple arguments with the same name, we pop it and concatenate it with the encoder or head name.
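The two-entry return value described above can be sketched as follows. This is a hypothetical, framework-free illustration of the shape of the result, not cvnets' actual implementation; the group contents and parameter names are placeholders.

```python
def get_trainable_parameters_sketch(weight_decay=0.0):
    # Each entry is a parameter-group dictionary with the three keys
    # named in the docstring: params, weight_decay, param_names.
    # In the real model, "params" would hold tensors from the image
    # encoder, text encoder, and logit scale (names here are made up).
    param_groups = [
        {"params": ["<image_encoder tensors>"],
         "weight_decay": weight_decay,
         "param_names": ["image_encoder.conv1.weight"]},
        {"params": ["<text_encoder tensors>"],
         "weight_decay": weight_decay,
         "param_names": ["text_encoder.embedding.weight"]},
    ]
    # The second entry: one learning-rate value per group, same order.
    lr_list = [1.0, 1.0]
    return param_groups, lr_list

groups, lrs = get_trainable_parameters_sketch(weight_decay=0.01)
```

An optimizer would typically consume `groups` directly as its parameter groups, with `lrs` supplying the per-group learning rates.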
- dummy_input_and_label(batch_size: int) Dict [source]
Create a dummy input and labels for CI/CD purposes. Child classes must override this method if their functionality differs.
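The idea behind such a dummy batch can be sketched as below. The key names, shapes, and label convention are illustrative assumptions, not cvnets' real ones; the point is a fixed-shape input/label Dict built from a batch size for smoke tests.

```python
def dummy_input_and_label_sketch(batch_size, img_dim=4, seq_len=3):
    # Placeholder "images" and "text" with deterministic contents,
    # sized only by batch_size (shapes here are invented).
    images = [[0.0] * img_dim for _ in range(batch_size)]
    text = [[0] * seq_len for _ in range(batch_size)]
    # Contrastive-style label: each image matches the text at its own
    # index within the batch (an assumption for illustration).
    labels = list(range(batch_size))
    return {"samples": {"image": images, "text": text}, "targets": labels}

batch = dummy_input_and_label_sketch(batch_size=2)
```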
- forward(input: Dict, *args, **kwargs) Dict [source]
Implement the model-specific forward function in sub-classes.
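To make the Dict-in/Dict-out contract concrete, here is a minimal, framework-free sketch of a CLIP-style forward pass: embed both modalities, L2-normalize, and return scaled cosine-similarity logits. The input keys ("image", "text"), the output keys, and the identity "encoders" are assumptions for illustration; the actual cvnets keys and outputs may differ.

```python
import math

def l2_normalize(v):
    # Normalize a vector to unit length (guarding against zero vectors).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def clip_forward_sketch(batch, logit_scale=100.0):
    # Stand-ins for the image/text encoders: use the raw vectors.
    img = [l2_normalize(v) for v in batch["image"]]
    txt = [l2_normalize(v) for v in batch["text"]]
    # logits[i][j] = logit_scale * cosine(image_i, text_j)
    logits = [[logit_scale * sum(a * b for a, b in zip(iv, tv)) for tv in txt]
              for iv in img]
    return {"image": img, "text": txt, "logits_per_image": logits}

out = clip_forward_sketch({"image": [[1.0, 0.0]],
                           "text": [[1.0, 0.0], [0.0, 1.0]]})
# The matching pair gets the maximal logit; the orthogonal pair gets 0.
```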
- classmethod build_model(opts, *args, **kwargs) BaseAnyNNModel [source]
Helper function to build the multi-modal image-text model
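The classmethod-factory pattern behind build_model can be sketched as follows: a class method reads command-line options from a Namespace, constructs the two encoders, and returns an assembled model. The option names and string "encoders" below are purely illustrative, not cvnets' real flags or classes.

```python
from argparse import Namespace

class MultiModalModelSketch:
    def __init__(self, opts, image_encoder, text_encoder):
        self.opts = opts
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder

    @classmethod
    def build_model(cls, opts):
        # In cvnets, the encoders would themselves be built from opts;
        # here we just read hypothetical option names with defaults.
        image_encoder = getattr(opts, "image_encoder_name", "vit")
        text_encoder = getattr(opts, "text_encoder_name", "transformer")
        return cls(opts, image_encoder, text_encoder)

# Usage: opts mimics the parsed command-line arguments.
opts = Namespace(image_encoder_name="resnet50", text_encoder_name="transformer")
model = MultiModalModelSketch.build_model(opts)
```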