data.transforms.audio_aux package

Submodules

data.transforms.audio_aux.mfccs module

data.transforms.audio_aux.mfccs.get_mfccs(data: Tensor, sampling_rate: float, num_mfccs: int, window_length: float = 0.023) → Tensor

Get Mel-frequency cepstral coefficients (MFCCs) from an audio signal.

Explanation of Mel-frequency cepstral coefficients (MFCCs): https://librosa.org/doc/main/generated/librosa.stft.html#librosa.stft

Parameters:
  • data – one channel of the audio signal, as a 1-D tensor.

  • sampling_rate – the sampling rate of the audio.

  • num_mfccs – the number of cepstral coefficients to use.

  • window_length – the window length used for computing the spectrogram. By default, we choose 23ms, which is a good value for human speech.
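The window length is given in seconds, so it must be converted to a sample count before computing the spectrogram. A minimal sketch of that conversion (the helper name is hypothetical, not part of this module):

```python
# Hypothetical helper: convert a window length in seconds to samples.
def window_length_in_samples(sampling_rate: float, window_length: float = 0.023) -> int:
    """Number of samples covered by one analysis window."""
    return int(round(window_length * sampling_rate))

# At a 16 kHz sampling rate, the default 23 ms window spans 368 samples.
print(window_length_in_samples(16000))  # 368
```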

data.transforms.audio_aux.mfccs.calculate_mfccs(audio: Tensor, sampling_rate: float, num_mfccs: int, window_length: float = 0.023) → Tensor

Calculate MFCCs on a batch of data.

Parameters:
  • audio – the audio signal, in [batch_size, num_channels, temporal_size] order.

  • sampling_rate – the sampling rate of the audio signal.

  • num_mfccs – the number of coefficients to use.

  • window_length – the window length used for computing the spectrogram. By default, we choose 23ms, which is a good value for human speech.
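One way such a batched variant can be built on top of a single-channel MFCC routine is to apply that routine independently to each [batch, channel] slice. The sketch below assumes this structure (the implementation of calculate_mfccs is not shown in this documentation) and uses NumPy with a stand-in MFCC function for illustration:

```python
import numpy as np

def per_channel_mfcc(audio: np.ndarray, mfcc_fn) -> np.ndarray:
    """Apply a 1-D MFCC function to each channel of a
    [batch_size, num_channels, temporal_size] batch (illustrative sketch)."""
    batch_size, num_channels, _ = audio.shape
    out = [
        [mfcc_fn(audio[b, c]) for c in range(num_channels)]
        for b in range(batch_size)
    ]
    return np.asarray(out)

# Stand-in "MFCC" returning a [num_mfccs, spectrogram_length] array per channel.
dummy_mfcc = lambda signal: np.zeros((13, 4))
result = per_channel_mfcc(np.random.randn(2, 1, 1600), dummy_mfcc)
print(result.shape)  # (2, 1, 13, 4)
```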

data.transforms.audio_aux.mfccs.get_mfcc_features(audio: Tensor, sampling_rate: float, num_mfccs: int, num_frames: int, window_length: float = 0.023) → Tensor

Get MFCC features for a batch of audio data.

Parameters:
  • audio – the audio signal, in [batch_size, temporal_size, num_channels] order.

  • sampling_rate – the sampling rate of the audio signal.

  • num_mfccs – the number of coefficients to use.

  • num_frames – each MFCC spectrogram is divided into @num_frames frames (sub-time-slice temporal components) of length ceil(spectrogram_length/num_frames).

  • window_length – the window length used for computing the spectrogram. By default, we choose 23ms, which is a good value for human speech.

Returns:

MFCCs in [N, C, num_mfccs, num_frames, ceil(spectrogram_length/num_frames)] order.
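The returned shape can be computed directly from the inputs. A small sketch (the helper name is hypothetical) showing the documented [N, C, num_mfccs, num_frames, ceil(spectrogram_length/num_frames)] order:

```python
import math

def mfcc_feature_shape(batch_size: int, num_channels: int, num_mfccs: int,
                       spectrogram_length: int, num_frames: int) -> tuple:
    """Expected output shape of get_mfcc_features, per the documented order."""
    frame_len = math.ceil(spectrogram_length / num_frames)
    return (batch_size, num_channels, num_mfccs, num_frames, frame_len)

# A batch of 8 mono clips, 13 coefficients, 100 spectrogram steps, 8 frames:
print(mfcc_feature_shape(8, 1, 13, 100, 8))  # (8, 1, 13, 8, 13)
```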

data.transforms.audio_aux.mfccs.get_padded_features(features: Tensor, num_frames: int) → Tensor

Splits the temporal dimension (of length T) of MFCC features into @num_frames sub-vectors (each of length ceil(T/num_frames)). Since T may not be divisible by @num_frames, the temporal dimension is padded if required.

Parameters:
  • features – a tensor of shape [batch_size, C (num_audio_channels), num_mfccs, T].

  • num_frames – number of padded sub-vectors

Returns:

a tensor of shape [batch_size, C, num_mfccs, num_frames, ceil(T/num_frames)].

Return type:

padded_features
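The pad-and-split behaviour described above can be sketched as follows. This is an illustrative NumPy re-implementation under the documented contract, not the module's actual code:

```python
import math
import numpy as np

def pad_and_split(features: np.ndarray, num_frames: int) -> np.ndarray:
    """Pad T up to num_frames * ceil(T/num_frames), then split the last
    dimension into num_frames sub-vectors (sketch of get_padded_features)."""
    *lead, T = features.shape
    frame_len = math.ceil(T / num_frames)
    pad = num_frames * frame_len - T
    # Zero-pad only the trailing (temporal) dimension.
    padded = np.pad(features, [(0, 0)] * len(lead) + [(0, pad)])
    return padded.reshape(*lead, num_frames, frame_len)

x = np.ones((2, 1, 13, 10))          # T = 10, not divisible by num_frames = 3
y = pad_and_split(x, num_frames=3)
print(y.shape)  # (2, 1, 13, 3, 4)
```

The two padded positions at the end of each temporal slice are filled with zeros, so downstream code should treat the final frame as partially padded.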

Module contents