data.video_reader package

Submodules

data.video_reader.base_av_reader module

exception data.video_reader.base_av_reader.VideoDurationDoesNotMatchAudioDurationError[source]

Bases: AssertionError

class data.video_reader.base_av_reader.BaseAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]

Bases: object

Base AudioVideo Reader

Parameters:
  • opts – command line arguments

  • is_training – Training or validation mode. Default: False.

classmethod add_arguments(parser: ArgumentParser) ArgumentParser[source]
__init__(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]
static get_frame_transform(opts: Namespace, is_training: bool, *args, **kwargs) BaseTransformation[source]
check_video(filename: str) bool[source]
read_video(filename: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]
num_frames(filename: str) int[source]
static random_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None][source]

For a given video, sample clips_per_video indices randomly along with aligned audio indices (optionally).

Parameters:
  • total_video_frames – number of video frames in the given video.

  • video_frames_per_clip – number of frames required per clip.

  • clips_per_video – number of clips needed from a given video.

  • total_audio_frames – number of audio frames in the given video.

Returns:

vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].

aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)].

Return type:

Tuple[Tensor, Tensor | None]
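The random sampling strategy can be sketched in plain Python. This is a hypothetical re-implementation for illustration only: the real method returns torch Tensors, and its exact audio-alignment arithmetic may differ.

```python
import random

def random_clip_indices(total_video_frames, video_frames_per_clip,
                        clips_per_video, total_audio_frames=None, seed=None):
    """Sketch of random clip sampling with optionally aligned audio indices."""
    rng = random.Random(seed)
    # Each clip starts at a random offset that still leaves room for a full clip.
    max_start = max(total_video_frames - video_frames_per_clip, 0)
    starts = [rng.randint(0, max_start) for _ in range(clips_per_video)]
    vclip_ids = [[s + f for f in range(video_frames_per_clip)] for s in starts]

    aclip_ids = None
    if total_audio_frames is not None:
        # Align audio by scaling video positions onto the audio timeline.
        scale = total_audio_frames / total_video_frames
        audio_frames_per_clip = round(video_frames_per_clip * scale)
        aclip_ids = [
            [min(round(s * scale) + f, total_audio_frames - 1)
             for f in range(audio_frames_per_clip)]
            for s in starts
        ]
    return vclip_ids, aclip_ids
```

Each of the `clips_per_video` clips is an independent random draw, so clips may overlap.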

static uniform_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None][source]

For a given video, sample clips_per_video indices uniformly along with aligned audio indices (optionally).

Parameters:
  • total_video_frames – number of video frames in the given video.

  • video_frames_per_clip – number of frames required per clip.

  • clips_per_video – number of clips needed from a given video.

  • total_audio_frames – number of audio frames in the given video.

Returns:

vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].

aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)].

Return type:

Tuple[Tensor, Tensor | None]
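The uniform variant spreads clip start offsets evenly across the video instead of drawing them at random. The sketch below is a hypothetical re-implementation with the same caveats as above (the real method returns torch Tensors):

```python
def uniform_clip_indices(total_video_frames, video_frames_per_clip,
                         clips_per_video, total_audio_frames=None):
    """Sketch of uniform clip sampling with optionally aligned audio indices."""
    max_start = max(total_video_frames - video_frames_per_clip, 0)
    if clips_per_video == 1:
        starts = [max_start // 2]  # a single clip is taken from the middle
    else:
        # Spread clip starts evenly over the valid start range.
        step = max_start / (clips_per_video - 1)
        starts = [round(i * step) for i in range(clips_per_video)]
    vclip_ids = [[s + f for f in range(video_frames_per_clip)] for s in starts]

    aclip_ids = None
    if total_audio_frames is not None:
        # Align audio by scaling video positions onto the audio timeline.
        scale = total_audio_frames / total_video_frames
        audio_frames_per_clip = round(video_frames_per_clip * scale)
        aclip_ids = [
            [min(round(s * scale) + f, total_audio_frames - 1)
             for f in range(audio_frames_per_clip)]
            for s in starts
        ]
    return vclip_ids, aclip_ids
```

Uniform sampling is deterministic, which makes it the natural choice for validation, while random sampling adds augmentation diversity during training.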

read_video_file_into_clips(vid_filename: str, num_frames_per_clip: int, clips_per_video: int, is_training: bool, video_only: bool = False, output_video_fps: float = -1, output_audio_fps: int = -1, num_samples_per_clip: int = 1, custom_frame_transforms: BaseTransformation | None = None, *args, **kwargs) Dict[source]

Read a video file into clips and sample the clips at the specified video/audio frame rates. First, all video and audio frames are read into memory, with audio resampled to output_audio_fps if specified; then clips_per_video clips are sampled from the full video/audio tensors. If a desired video frame rate is specified, the video is subsampled at output_video_fps. Regardless of whether the video is subsampled, each clip contains num_frames_per_clip video frames.

Parameters:
  • vid_filename – The path of the video to be read.

  • num_frames_per_clip – Number of frames per clip to read.

  • clips_per_video – Number of clips to read for each video.

  • is_training – A boolean indicating whether the model is in training mode.

  • output_video_fps – The frame rate of the output video. Default is -1, which means no resampling is required.

  • output_audio_fps – The frame rate of the output audio. Default is -1, which means no resampling is required.

  • num_samples_per_clip – Number of random samples to generate per clip.

  • custom_frame_transforms – If provided, this transformation is used instead of the default BaseAVReader.get_frame_transform. Note: be careful when customizing frame transforms, because the data type of frames read by different AVReaders may differ slightly before ToTensor() is applied.

Returns:

A dictionary with video/audio tensors and metadata. The metadata has the following format:

{
    "video_fps": float,
    "audio_fps": int,
    "video_frame_timestamps": [num_clips x video_frames_per_clip] tensor,
}
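The fps-preserving subsampling described above can be illustrated with a small helper. This is a hypothetical sketch; the helper name and signature are not part of the API:

```python
def subsample_frame_indices(input_fps, output_fps, num_frames_per_clip,
                            total_frames):
    """Pick source-frame indices so a clip plays back at output_fps.

    Hypothetical helper: with output_fps <= 0 (mirroring the default -1),
    no resampling happens and consecutive frames are taken.
    """
    if output_fps <= 0 or output_fps >= input_fps:
        stride = 1.0  # no subsampling requested (upsampling is not attempted)
    else:
        stride = input_fps / output_fps
    # Always emit num_frames_per_clip indices, clamped to the video length,
    # so every clip has the same shape regardless of subsampling.
    return [min(round(i * stride), total_frames - 1)
            for i in range(num_frames_per_clip)]
```

For example, reading a 30 fps video at output_video_fps=10 keeps every third frame, while still returning exactly num_frames_per_clip indices per clip.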

dummy_audio_video_clips(clips_per_video: int, num_frames_to_sample: int, height: int, width: int) Dict[source]

data.video_reader.decord_reader module

class data.video_reader.decord_reader.DecordAVReader(*args, **kwargs)[source]

Bases: BaseAVReader

Video Reader using Decord.

__init__(*args, **kwargs)[source]
read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]

data.video_reader.pyav_reader module

class data.video_reader.pyav_reader.PyAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]

Bases: BaseAVReader

Video Reader using PyAV.

read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict[source]

Module contents

data.video_reader.arguments_video_reader(parser: ArgumentParser)[source]
data.video_reader.get_video_reader(opts, *args, **kwargs) BaseAVReader[source]

Helper function to build the video reader from command-line arguments.

Parameters:
  • opts – Command-line arguments

  • is_training – Training or validation mode.

Returns:

A BaseAVReader instance built according to the command-line arguments.
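The helper's dispatch can be pictured as a small registry keyed by a command-line option. This is a sketch only: the option name video_reader_name, the registry, and the simplified reader constructors are assumptions, not the actual implementation.

```python
import argparse

# Hypothetical registry mapping reader names to reader classes.
READER_REGISTRY = {}

def register_reader(name):
    """Class decorator that records a reader class under a string key."""
    def decorator(cls):
        READER_REGISTRY[name] = cls
        return cls
    return decorator

@register_reader("pyav")
class PyAVReader:
    def __init__(self, opts, is_training=False):
        self.opts, self.is_training = opts, is_training

@register_reader("decord")
class DecordAVReader(PyAVReader):
    pass

def get_video_reader(opts, *args, **kwargs):
    """Build the configured reader from command-line arguments."""
    name = getattr(opts, "video_reader_name", "pyav")
    return READER_REGISTRY[name](opts, *args, **kwargs)

opts = argparse.Namespace(video_reader_name="decord")
reader = get_video_reader(opts, is_training=True)
```

A registry like this lets new readers be added without touching the factory function itself.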