data.video_reader package

Submodules

data.video_reader.base_av_reader module

exception data.video_reader.base_av_reader.VideoDurationDoesNotMatchAudioDurationError[source]

Bases: AssertionError

class data.video_reader.base_av_reader.BaseAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]

Bases: object

Base AudioVideo Reader

Parameters:
  • opts – command line arguments

  • is_training – Training or validation mode. Default: False.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]
__init__(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]
static get_frame_transform(opts: Namespace, is_training: bool, *args, **kwargs) BaseTransformation[source]
check_video(filename: str) bool[source]
read_video(filename: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]
num_frames(filename: str) int[source]
static random_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None][source]

For a given video, sample clips_per_video indices randomly along with aligned audio indices (optionally).

Parameters:
  • total_video_frames – number of video frames in the given video.

  • video_frames_per_clip – number of frames required per clip.

  • clips_per_video – number of clips needed from a given video.

  • total_audio_frames – number of audio frames in the given video.

Returns:

vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].

aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)].

Return type:

Tuple[Tensor, Tensor | None]
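The random sampling strategy can be sketched in plain Python. This is a hypothetical re-implementation for illustration only: the real method returns torch Tensors, and its exact audio-alignment arithmetic may differ.

```python
import random

def random_clip_indices(total_video_frames, video_frames_per_clip,
                        clips_per_video, total_audio_frames=None, seed=None):
    """Sketch of random clip sampling with optionally aligned audio indices."""
    rng = random.Random(seed)
    # Each clip starts at a random offset that still leaves room for a full clip.
    max_start = max(total_video_frames - video_frames_per_clip, 0)
    starts = [rng.randint(0, max_start) for _ in range(clips_per_video)]
    vclip_ids = [[s + f for f in range(video_frames_per_clip)] for s in starts]

    aclip_ids = None
    if total_audio_frames is not None:
        # Align audio by scaling video positions onto the audio timeline.
        scale = total_audio_frames / total_video_frames
        audio_frames_per_clip = round(video_frames_per_clip * scale)
        aclip_ids = [
            [min(round(s * scale) + f, total_audio_frames - 1)
             for f in range(audio_frames_per_clip)]
            for s in starts
        ]
    return vclip_ids, aclip_ids
```

Each of the `clips_per_video` clips is an independent random draw, so clips may overlap.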

static uniform_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None][source]

For a given video, sample clips_per_video indices uniformly along with aligned audio indices (optionally).

Parameters:
  • total_video_frames – number of video frames in the given video.

  • video_frames_per_clip – number of frames required per clip.

  • clips_per_video – number of clips needed from a given video.

  • total_audio_frames – number of audio frames in the given video.

Returns:

vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].

aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)].

Return type:

Tuple[Tensor, Tensor | None]
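The uniform variant spreads clip start offsets evenly across the video instead of drawing them at random. The sketch below is a hypothetical re-implementation with the same caveats as above (the real method returns torch Tensors):

```python
def uniform_clip_indices(total_video_frames, video_frames_per_clip,
                         clips_per_video, total_audio_frames=None):
    """Sketch of uniform clip sampling with optionally aligned audio indices."""
    max_start = max(total_video_frames - video_frames_per_clip, 0)
    if clips_per_video == 1:
        starts = [max_start // 2]  # a single clip is taken from the middle
    else:
        # Spread clip starts evenly over the valid start range.
        step = max_start / (clips_per_video - 1)
        starts = [round(i * step) for i in range(clips_per_video)]
    vclip_ids = [[s + f for f in range(video_frames_per_clip)] for s in starts]

    aclip_ids = None
    if total_audio_frames is not None:
        # Align audio by scaling video positions onto the audio timeline.
        scale = total_audio_frames / total_video_frames
        audio_frames_per_clip = round(video_frames_per_clip * scale)
        aclip_ids = [
            [min(round(s * scale) + f, total_audio_frames - 1)
             for f in range(audio_frames_per_clip)]
            for s in starts
        ]
    return vclip_ids, aclip_ids
```

Uniform sampling is deterministic, which makes it the natural choice for validation, while random sampling adds augmentation diversity during training.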

read_video_file_into_clips(vid_filename: str, num_frames_per_clip: int, clips_per_video: int, is_training: bool, video_only: bool = False, output_video_fps: float = -1, output_audio_fps: int = -1, num_samples_per_clip: int = 1, custom_frame_transforms: BaseTransformation | None = None, *args, **kwargs) Dict[source]

Read a video file into clips and sample the clips at the specified video/audio frame rates. First, all video and audio frames are read into memory, with audio resampled to output_audio_fps if specified; then clips_per_video clips are sampled from the full video/audio tensors. If a desired video frame rate is specified, the video is subsampled at output_video_fps. Regardless of whether the video is subsampled, each clip contains num_frames_per_clip video frames.

Parameters:
  • vid_filename – The path of the video to be read.

  • num_frames_per_clip – Number of frames per clip to read.

  • clips_per_video – Number of clips to read for each video.

  • is_training – A boolean indicating whether the model is in training mode.

  • output_video_fps – The frame rate of the output video. Default is -1, which means no resampling is required.

  • output_audio_fps – The frame rate of the output audio. Default is -1, which means no resampling is required.

  • num_samples_per_clip – Number of random samples to generate per clip.

  • custom_frame_transforms – If provided, this transformation is used instead of the default BaseAVReader.get_frame_transform. Note: be careful when customizing frame transforms, because the data type of frames read by different AVReaders may differ slightly before ToTensor() is applied.

Returns:

A dictionary with video/audio tensors and metadata. The metadata has the following format:

{
    "video_fps": float,
    "audio_fps": int,
    "video_frame_timestamps": [num_clips x video_frames_per_clip] tensor,
}
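The fps-preserving subsampling described above can be illustrated with a small helper. This is a hypothetical sketch; the helper name and signature are not part of the API:

```python
def subsample_frame_indices(input_fps, output_fps, num_frames_per_clip,
                            total_frames):
    """Pick source-frame indices so a clip plays back at output_fps.

    Hypothetical helper: with output_fps <= 0 (mirroring the default -1),
    no resampling happens and consecutive frames are taken.
    """
    if output_fps <= 0 or output_fps >= input_fps:
        stride = 1.0  # no subsampling requested (upsampling is not attempted)
    else:
        stride = input_fps / output_fps
    # Always emit num_frames_per_clip indices, clamped to the video length,
    # so every clip has the same shape regardless of subsampling.
    return [min(round(i * stride), total_frames - 1)
            for i in range(num_frames_per_clip)]
```

For example, reading a 30 fps video at output_video_fps=10 keeps every third frame, while still returning exactly num_frames_per_clip indices per clip.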

dummy_audio_video_clips(clips_per_video: int, num_frames_to_sample: int, height: int, width: int) Dict[source]

data.video_reader.decord_reader module

class data.video_reader.decord_reader.DecordAVReader(*args, **kwargs)[source]

Bases: BaseAVReader

Video Reader using Decord.

__init__(*args, **kwargs)[source]
read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]

data.video_reader.pyav_reader module

class data.video_reader.pyav_reader.PyAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]

Bases: BaseAVReader

Video Reader using PyAV.

read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]

Module contents

data.video_reader.arguments_video_reader(parser: ArgumentParser)[source]
data.video_reader.get_video_reader(opts, *args, **kwargs) BaseAVReader[source]

Helper function to build the video reader from command-line arguments.

Parameters:
  • opts – Command-line arguments

  • is_training – Training or validation mode.

Returns:

A BaseAVReader instance built according to the command-line arguments.
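The helper's dispatch can be pictured as a small registry keyed by a command-line option. This is a sketch only: the option name video_reader_name, the registry, and the simplified reader constructors are assumptions, not the actual implementation.

```python
import argparse

# Hypothetical registry mapping reader names to reader classes.
READER_REGISTRY = {}

def register_reader(name):
    """Class decorator that records a reader class under a string key."""
    def decorator(cls):
        READER_REGISTRY[name] = cls
        return cls
    return decorator

@register_reader("pyav")
class PyAVReader:
    def __init__(self, opts, is_training=False):
        self.opts, self.is_training = opts, is_training

@register_reader("decord")
class DecordAVReader(PyAVReader):
    pass

def get_video_reader(opts, *args, **kwargs):
    """Build the configured reader from command-line arguments."""
    name = getattr(opts, "video_reader_name", "pyav")
    return READER_REGISTRY[name](opts, *args, **kwargs)

opts = argparse.Namespace(video_reader_name="decord")
reader = get_video_reader(opts, is_training=True)
```

A registry like this lets new readers be added without touching the factory function itself.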