data.datasets.audio_classification package

Submodules

data.datasets.audio_classification.speech_commands_v2 module

class data.datasets.audio_classification.speech_commands_v2.SpeechCommandsv2Dataset(opts: Namespace, *args, **kwargs)

Bases: BaseDataset

Google’s Speech Commands dataset for keyword spotting (https://arxiv.org/abs/1804.03209).

This contains the “v2” version for 12-way classification (10 commands, plus unknown and background categories).

Parameters:

opts – Command-line arguments

__init__(opts: Namespace, *args, **kwargs) → None

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser

Add dataset-specific arguments

get_sample(index: int) → Tuple[Tensor, float, Tensor]

Get the dataset sample at the given index.

get_transformed_sample(index: int) → Dict[str, Dict[str, Tensor] | Tensor | int]

Get the sample at the index specified by @index.

Parameters:

index – The index of the sample.

Returns:

A sample as a dictionary. It contains:

{
    "samples": {
        "audio": A [C, N] tensor, where C is the number of channels, and N is the length.
    }
    "targets": an integer class label.
    "sample_id": an integer giving the sample index.
    "metadata": {
        "audio_fps": The sampling rate of the audio.
    }
}
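The dictionary layout documented for get_transformed_sample can be read as in the minimal sketch below. The helper name summarize_sample is hypothetical and not part of the package. The sketch assumes that "audio_fps" is expressed in samples per second, so that N / audio_fps gives a duration in seconds, and it uses a stand-in object with a .shape attribute in place of a real tensor.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_sample(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a dictionary shaped like the documented return value.

    Hypothetical helper for illustration; it is not part of the package.
    """
    audio = sample["samples"]["audio"]           # documented as a [C, N] tensor
    num_channels, num_frames = audio.shape       # C channels, N samples per channel
    audio_fps = sample["metadata"]["audio_fps"]  # sampling rate (assumed to be in Hz)
    return {
        "sample_id": sample["sample_id"],
        "target": sample["targets"],
        "num_channels": num_channels,
        "num_frames": num_frames,
        # Assumption: audio_fps counts samples per second.
        "duration_seconds": num_frames / audio_fps,
    }


# Stand-in for a real sample. Only the .shape attribute of the audio is read,
# so a SimpleNamespace replaces the tensor here (illustrative values only).
example = {
    "samples": {"audio": SimpleNamespace(shape=(1, 16000))},
    "targets": 3,
    "sample_id": 0,
    "metadata": {"audio_fps": 16000.0},
}

summary = summarize_sample(example)
print(summary)
```

With a real dataset instance, the same helper would be applied to the value returned by get_transformed_sample(index), provided the audio tensor exposes a two-element .shape.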

Module contents