data.transforms.audio_aux package
Submodules
data.transforms.audio_aux.mfccs module
- data.transforms.audio_aux.mfccs.get_mfccs(data: Tensor, sampling_rate: float, num_mfccs: int, window_length: float = 0.023) → Tensor [source]
Get Mel Frequency Cepstral Coefficients from an audio signal.
Background on the short-time Fourier transform used when computing Mel-Frequency Cepstral Coefficients (MFCCs): https://librosa.org/doc/main/generated/librosa.stft.html#librosa.stft
- Parameters:
data – one channel of the audio signal, as a 1-D tensor.
sampling_rate – the sampling rate of the audio.
num_mfccs – the number of cepstral coefficients to use.
window_length – the window length used for computing the spectrogram. Defaults to 23 ms, a common choice for human speech.
- data.transforms.audio_aux.mfccs.calculate_mfccs(audio: Tensor, sampling_rate: float, num_mfccs: int, window_length: float = 0.023) → Tensor [source]
Calculate MFCCs on a batch of data.
- Parameters:
audio – the audio signal, in [batch_size, num_channels, temporal_size] order.
sampling_rate – the sampling rate of the audio signal.
num_mfccs – the number of coefficients to use.
window_length – the window length used for computing the spectrogram. Defaults to 23 ms, a common choice for human speech.
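One way such a batched computation can be structured is to flatten the batch and channel dimensions, apply a per-signal routine to each 1-D slice, and restore the leading dimensions. A sketch under the assumption that some per-signal MFCC function exists; fake_mfcc below is a hypothetical stand-in for it:

```python
import torch

def apply_per_channel(fn, audio: torch.Tensor) -> torch.Tensor:
    # audio: [batch_size, num_channels, temporal_size]
    # fn maps a 1-D signal to a [num_mfccs, time] tensor.
    batch, channels, _ = audio.shape
    flat = audio.reshape(batch * channels, -1)          # [batch*channels, T]
    outputs = torch.stack([fn(x) for x in flat])        # [batch*channels, num_mfccs, time]
    return outputs.reshape(batch, channels, *outputs.shape[1:])

# Hypothetical stand-in for the per-signal MFCC computation.
fake_mfcc = lambda x: torch.zeros(13, 50)
out = apply_per_channel(fake_mfcc, torch.randn(4, 2, 16000))
print(out.shape)  # torch.Size([4, 2, 13, 50])
```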
- data.transforms.audio_aux.mfccs.get_mfcc_features(audio: Tensor, sampling_rate: float, num_mfccs: int, num_frames: int, window_length: float = 0.023) → Tensor [source]
Get MFCC features for a batch of audio data.
- Parameters:
audio – the audio signal, in [batch_size, temporal_size, num_channels] order.
sampling_rate – the sampling rate of the audio signal.
num_mfccs – the number of coefficients to use.
num_frames – each MFCC spectrogram gets divided into @num_frames frames (sub-time-slice temporal components) of length ceil(spectrogram_length/num_frames).
window_length – the window length used for computing the spectrogram. Defaults to 23 ms, a common choice for human speech.
- Returns:
MFCCs in [N, C, num_mfccs, num_frames, ceil(spectrogram_length/num_frames)] order.
- data.transforms.audio_aux.mfccs.get_padded_features(features: Tensor, num_frames: int) → Tensor [source]
Splits the temporal dimension (of length T) of MFCC features into @num_frames sub-vectors, each of length ceil(T/num_frames). As T may not be divisible by @num_frames, pads the temporal dimension if required.
- Parameters:
features – MFCC features, in [batch_size, C (num_audio_channels), num_mfccs, T] order.
num_frames – the number of sub-vectors to split the padded temporal dimension into.
- Returns:
Tensor in [batch_size, C, num_mfccs, num_frames, ceil(T/num_frames)] order.
- Return type:
padded_features
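The pad-and-split behavior can be sketched in a few lines of torch. pad_and_split below is a hypothetical reimplementation for illustration, not the module's code:

```python
import math
import torch
import torch.nn.functional as F

def pad_and_split(features: torch.Tensor, num_frames: int) -> torch.Tensor:
    # features: [batch_size, C, num_mfccs, T]
    T = features.shape[-1]
    frame_len = math.ceil(T / num_frames)
    # Zero-pad the temporal dimension on the right so it divides evenly.
    pad = num_frames * frame_len - T
    padded = F.pad(features, (0, pad))
    # Split T into num_frames sub-vectors of length frame_len.
    return padded.reshape(*features.shape[:-1], num_frames, frame_len)

x = torch.randn(2, 1, 13, 100)
y = pad_and_split(x, num_frames=8)  # ceil(100/8) = 13, so T is padded to 104
print(y.shape)  # torch.Size([2, 1, 13, 8, 13])
```

Here positions 100–103 of the padded temporal axis are zeros, and they land at the tail of the last frame.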