data.datasets.audio_classification package
Submodules
data.datasets.audio_classification.speech_commands_v2 module
- class data.datasets.audio_classification.speech_commands_v2.SpeechCommandsv2Dataset(opts: Namespace, *args, **kwargs)
Bases: BaseDataset
Google’s Speech Commands dataset for keyword spotting (https://arxiv.org/abs/1804.03209).
This contains the “v2” version for 12-way classification (10 commands, plus unknown and background categories).
- Parameters:
opts – Command-line arguments
- classmethod add_arguments(parser: ArgumentParser) -> ArgumentParser
Add dataset-specific arguments
- get_sample(index: int) -> Tuple[Tensor, float, Tensor]
Get the dataset sample at the given index.
- get_transformed_sample(index: int) -> Dict[str, Dict[str, Tensor] | Tensor | int]
Get the transformed sample at the given index.
- Parameters:
index – The index of the sample.
- Returns:
A sample as a dictionary. It contains:
{
    "samples": {
        "audio": A [C, N] tensor, where C is the number of channels and N is the length.
    },
    "targets": an integer class label,
    "sample_id": an integer giving the sample index,
    "metadata": {
        "audio_fps": The sampling rate of the audio.
    }
}
- Return type:
Dict[str, Dict[str, Tensor] | Tensor | int]
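The dictionary layout described above can be unpacked as in the following minimal sketch. It is not part of the library: summarize_sample and _StandInAudio are hypothetical names introduced only for illustration, and the example values (one channel, 16000 samples, label 3) are assumptions rather than actual dataset output. The stand-in object exposes only a shape attribute; in practice the "audio" entry would be a real Tensor.

```python
from typing import Any, Dict, Tuple


def summarize_sample(sample: Dict[str, Any]) -> Tuple[Tuple[int, ...], int, int, Any]:
    """Unpack a sample dictionary laid out as documented for get_transformed_sample.

    Hypothetical helper, not part of the documented API.
    Returns (audio_shape, target, sample_id, audio_fps).
    """
    audio = sample["samples"]["audio"]  # documented as a [C, N] tensor
    shape = tuple(audio.shape)
    if len(shape) != 2:
        raise ValueError(f"expected a [C, N] audio tensor, got shape {shape}")
    target = int(sample["targets"])        # integer class label
    sample_id = int(sample["sample_id"])   # index of the sample
    audio_fps = sample["metadata"]["audio_fps"]  # sampling rate of the audio
    return shape, target, sample_id, audio_fps


class _StandInAudio:
    """Hypothetical stand-in for a Tensor; only the shape attribute is used."""

    def __init__(self, channels: int, length: int) -> None:
        self.shape = (channels, length)


# Illustrative values only (assumed, not read from the dataset).
example = {
    "samples": {"audio": _StandInAudio(channels=1, length=16000)},
    "targets": 3,
    "sample_id": 0,
    "metadata": {"audio_fps": 16000},
}

shape, target, sample_id, audio_fps = summarize_sample(example)
print(shape, target, sample_id, audio_fps)
```

With a constructed dataset instance, the same helper could be applied to the result of get_transformed_sample(index), assuming the returned dictionary follows the layout documented here.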