data.datasets.audio_classification package

Submodules

class data.datasets.audio_classification.speech_commands_v2.SpeechCommandsv2Dataset(opts: Namespace, *args, **kwargs)[source]

Google’s Speech Commands dataset for keyword spotting (https://arxiv.org/abs/1804.03209).

This contains the “v2” version for 12-way classification (10 commands, plus unknown and background categories).

classmethod add_arguments(parser: ArgumentParser) → ArgumentParser[source]: Add dataset-specific arguments.

get_sample(index: int) → Tuple[Tensor, float, Tensor][source]: Get the dataset sample at the given index.

get_transformed_sample(index: int) → Dict[str, Dict[str, Tensor] | Tensor | int][source]

Get the transformed sample at the given index.

Parameters:

index – The index of the sample.

Returns:

A sample as a dictionary. It contains:

{
    “samples”: {
        “audio”: A [C, N] tensor, where C is the number of channels and N is the length.
    },
    “targets”: An integer class label.
    “sample_id”: An integer giving the sample index.
    “metadata”: {
        “audio_fps”: The sampling rate of the audio.
    }
}

Return type:

Dict[str, Dict[str, Tensor] | Tensor | int]
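The dictionary layout described above can be unpacked as in the following minimal sketch. It does not call the dataset class itself: `summarize_sample` is a hypothetical helper, and `example` is a hand-built stand-in that uses an object with a `.shape` attribute in place of a real Tensor so the snippet runs without PyTorch. The values shown (one channel, 16000 samples, a sampling rate of 16000, label 3) are illustrative assumptions, not values taken from the library.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple


def summarize_sample(sample: Dict[str, Any]) -> Tuple[int, int, int, float]:
    """Return (channels, length, class label, duration in seconds).

    Hypothetical helper: it only relies on the documented keys of the
    dictionary returned by get_transformed_sample.
    """
    audio = sample["samples"]["audio"]  # documented shape: [C, N]
    channels, length = audio.shape
    sampling_rate = sample["metadata"]["audio_fps"]
    return channels, length, sample["targets"], length / sampling_rate


# Hand-built stand-in for a returned sample (assumed values, not real data).
# SimpleNamespace(shape=...) mimics the .shape attribute of a [C, N] tensor.
example = {
    "samples": {"audio": SimpleNamespace(shape=(1, 16000))},
    "targets": 3,
    "sample_id": 0,
    "metadata": {"audio_fps": 16000},
}

print(summarize_sample(example))
```

With a real sample, the same helper should work unchanged as long as the audio tensor is two-dimensional, since a tensor's shape unpacks into two integers in the same way.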