cvnets.models.audio_classification package
Submodules
cvnets.models.audio_classification.audio_byteformer module
- class cvnets.models.audio_classification.audio_byteformer.AudioByteFormer(opts: Namespace, *args, **kwargs)
Bases: ByteFormer, BaseAudioClassification
Identical to byteformer.ByteFormer, but registered as an audio classification model.
- forward(x: Dict[str, Tensor], *args, **kwargs) → Tensor
Perform a forward pass on input bytes. The input is a dictionary that stores the input tensor under the key "audio". The tensor is an integer tensor of shape [batch_size, sequence_length]. An integer dtype is used rather than a byte dtype because, besides byte values, the tensor usually contains mask tokens.
- Parameters:
x – A dictionary containing {"audio": audio_bytes}.
- Returns:
The output logits.
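An input dictionary in the format forward() expects can be built from raw encoded audio (for example, the bytes of a WAV file) with a short helper. This is a sketch under assumptions: `bytes_to_batch` is a hypothetical helper, and the value -1 used for padded positions is an illustrative choice, not necessarily the token value ByteFormer uses. What it shows is that a value outside 0-255 cannot be stored in a `torch.uint8` tensor, so the batch is kept as `torch.long`.

```python
from typing import Dict, List

import torch

# Hypothetical mask/padding value. It lies outside the byte range 0-255,
# so it cannot be represented in torch.uint8; torch.long is used instead.
MASK_VALUE = -1


def bytes_to_batch(clips: List[bytes], mask_value: int = MASK_VALUE) -> Dict[str, torch.Tensor]:
    """Pack raw byte strings into {"audio": LongTensor[batch_size, sequence_length]}.

    Shorter clips are right-padded with mask_value up to the longest clip.
    """
    sequence_length = max(len(clip) for clip in clips)
    batch = torch.full((len(clips), sequence_length), mask_value, dtype=torch.long)
    for i, clip in enumerate(clips):
        # list(bytes) yields ints in 0..255
        batch[i, : len(clip)] = torch.tensor(list(clip), dtype=torch.long)
    return {"audio": batch}


x = bytes_to_batch([b"RIFF\x00\x01", b"RIF"])
print(x["audio"].shape)  # torch.Size([2, 6])
print(x["audio"][1])     # tensor([82, 73, 70, -1, -1, -1])
```

Constructing AudioByteFormer itself requires a fully populated opts Namespace, so the call to the model is not shown.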
- dummy_input_and_label(batch_size: int) → Dict
Get a dummy input and label that could be passed to the model.
- Parameters:
batch_size – The batch size to use for the generated inputs.
- Returns:
A dict of the form
{
    "samples": {"audio": tensor of shape [batch_size, sequence_length]},
    "targets": tensor of shape [batch_size],
}
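For shape-checking code outside the library, a stand-in with the same return structure as dummy_input_and_label() might look like the following. `make_dummy_input_and_label` and its defaults `sequence_length=1024` and `num_classes=12` are hypothetical values chosen for illustration; the sizes the real method uses are not documented here.

```python
from typing import Dict, Union

import torch


def make_dummy_input_and_label(
    batch_size: int, sequence_length: int = 1024, num_classes: int = 12
) -> Dict[str, Union[Dict[str, torch.Tensor], torch.Tensor]]:
    """Hypothetical stand-in mirroring the documented return structure."""
    # Random byte values in [0, 255]; torch.randint's upper bound is exclusive.
    audio = torch.randint(0, 256, (batch_size, sequence_length))
    # One integer class label per sample.
    targets = torch.randint(0, num_classes, (batch_size,))
    return {"samples": {"audio": audio}, "targets": targets}


batch = make_dummy_input_and_label(batch_size=4)
print(batch["samples"]["audio"].shape)  # torch.Size([4, 1024])
print(batch["targets"].shape)           # torch.Size([4])
```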
cvnets.models.audio_classification.base_audio_classification module
- class cvnets.models.audio_classification.base_audio_classification.BaseAudioClassification(opts: Namespace, *args, **kwargs)
Bases: BaseAnyNNModel
Base class for audio classification.
- Parameters:
opts – Command-line arguments.