cvnets.models.audio_classification package

Submodules

cvnets.models.audio_classification.audio_byteformer module

class cvnets.models.audio_classification.audio_byteformer.AudioByteFormer(opts: Namespace, *args, **kwargs)[source]

Bases: ByteFormer, BaseAudioClassification

Identical to byteformer.ByteFormer, but registered as an audio classification model.

forward(x: Dict[str, Tensor], *args, **kwargs) → Tensor[source]

Perform a forward pass on input bytes. The input is a dictionary containing the input tensor, stored as an integer tensor of shape [batch_size, sequence_length]. An integer tensor is used because it usually contains mask tokens.

Parameters:

x – A dictionary containing {“audio”: audio_bytes}.

Returns:

The output logits.
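A minimal, self-contained sketch of the dict-in, logits-out call convention described above. The toy module is a hypothetical stand-in, not AudioByteFormer, and the shapes and class count are illustrative only:

    from typing import Dict

    import torch
    from torch import Tensor, nn

    class ToyByteClassifier(nn.Module):
        # Hypothetical stand-in that mimics the interface: {"audio": int tensor} -> logits.
        def __init__(self, vocab_size: int = 257, num_classes: int = 10) -> None:
            super().__init__()
            self.embed = nn.Embedding(vocab_size, 32)
            self.head = nn.Linear(32, num_classes)

        def forward(self, x: Dict[str, Tensor]) -> Tensor:
            tokens = self.embed(x["audio"])       # [batch_size, sequence_length, 32]
            return self.head(tokens.mean(dim=1))  # [batch_size, num_classes]

    batch_size, sequence_length = 2, 1024
    audio_bytes = torch.randint(0, 256, (batch_size, sequence_length))  # integer byte values
    logits = ToyByteClassifier()({"audio": audio_bytes})
    print(logits.shape)  # torch.Size([2, 10])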

dummy_input_and_label(batch_size: int) → Dict[source]

Get a dummy input and label that could be passed to the model.

Parameters:

batch_size – The batch size to use for the generated inputs.

Returns:

A dict of the form:
{
    "samples": {"audio": tensor of shape [batch_size, sequence_length]},
    "targets": tensor of shape [batch_size],
}
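For illustration, a hedged sketch of a batch with the documented structure; the sequence length and num_classes values are assumptions, while the real method derives them from the model's opts:

    import torch

    batch_size, sequence_length, num_classes = 4, 1024, 10  # illustrative values
    dummy_batch = {
        "samples": {"audio": torch.randint(0, 256, (batch_size, sequence_length))},
        "targets": torch.randint(0, num_classes, (batch_size,)),
    }
    print(dummy_batch["samples"]["audio"].shape)  # torch.Size([4, 1024])
    print(dummy_batch["targets"].shape)           # torch.Size([4])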

cvnets.models.audio_classification.base_audio_classification module

class cvnets.models.audio_classification.base_audio_classification.BaseAudioClassification(opts: Namespace, *args, **kwargs)[source]

Bases: BaseAnyNNModel

Base class for audio classification.

Parameters:

opts – Command-line arguments

__init__(opts: Namespace, *args, **kwargs) → None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]

Add model-specific arguments to the parser.
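A hedged sketch of how a subclass might extend add_arguments with an argparse argument group; the class name and the dotted flag below are hypothetical and not part of cvnets:

    import argparse

    class ExampleAudioModel:  # hypothetical stand-in for a BaseAudioClassification subclass
        @classmethod
        def add_arguments(cls, parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
            group = parser.add_argument_group(title=cls.__name__)
            group.add_argument(
                "--model.audio-classification.example-depth",  # hypothetical flag
                type=int,
                default=12,
                help="Hypothetical model-specific argument.",
            )
            return parser

    parser = ExampleAudioModel.add_arguments(argparse.ArgumentParser())
    opts = parser.parse_args([])  # the resulting Namespace is what the model's __init__ receives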

Module contents