data.video_reader package
Submodules
data.video_reader.base_av_reader module
- exception data.video_reader.base_av_reader.VideoDurationDoesNotMatchAudioDurationError[source]
Bases:
AssertionError
- class data.video_reader.base_av_reader.BaseAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]
Bases:
object
Base AudioVideo Reader
- Parameters:
opts – command line arguments
is_training – Training or validation mode. Default: False.
- static get_frame_transform(opts: Namespace, is_training: bool, *args, **kwargs) BaseTransformation [source]
- read_video(filename: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict [source]
- static random_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None] [source]
For a given video, sample clips_per_video indices randomly along with aligned audio indices (optionally).
- Parameters:
total_video_frames – number of video frames in the given video.
video_frames_per_clip – number of frames required per clip.
clips_per_video – number of clips needed from a given video.
total_audio_frames – number of audio frames in the given video.
- Returns:
vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].
aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)], or None if total_audio_frames is not provided.
- Return type:
Tuple[Tensor, Tensor | None]
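The random sampling described above can be sketched in plain Python (a minimal illustration of one plausible strategy, not the library's implementation; Tensors are replaced by lists of indices for brevity):

```python
import random

def random_sampling(total_video_frames, video_frames_per_clip,
                    clips_per_video, total_audio_frames=None):
    """Pick `clips_per_video` random contiguous clips of video frame indices,
    optionally with audio indices aligned by the audio/video frame ratio."""
    max_start = max(0, total_video_frames - video_frames_per_clip)
    vclip_ids = [
        list(range(start, start + video_frames_per_clip))
        for start in (random.randint(0, max_start) for _ in range(clips_per_video))
    ]
    aclip_ids = None
    if total_audio_frames is not None:
        # Map each video frame index to the temporally aligned audio frame index.
        ratio = total_audio_frames / total_video_frames
        aclip_ids = [[int(i * ratio) for i in clip] for clip in vclip_ids]
    return vclip_ids, aclip_ids
```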
- static uniform_sampling(total_video_frames: int, video_frames_per_clip: int, clips_per_video: int, total_audio_frames: int | None = None) Tuple[Tensor, Tensor | None] [source]
For a given video, sample clips_per_video indices uniformly along with aligned audio indices (optionally).
- Parameters:
total_video_frames – number of video frames in the given video.
video_frames_per_clip – number of frames required per clip.
clips_per_video – number of clips needed from a given video.
total_audio_frames – number of audio frames in the given video.
- Returns:
vclip_ids – indices corresponding to video frames [Tensor (clips_per_video x video_frames_per_clip)].
aclip_ids – indices corresponding to audio frames [Tensor (clips_per_video x audio_frames_per_clip)], or None if total_audio_frames is not provided.
- Return type:
Tuple[Tensor, Tensor | None]
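Uniform sampling differs from the random variant only in how clip start positions are chosen: they are spaced evenly across the video. A minimal sketch (again an illustration with lists in place of Tensors, not the actual implementation):

```python
def uniform_sampling(total_video_frames, video_frames_per_clip,
                     clips_per_video, total_audio_frames=None):
    """Pick `clips_per_video` evenly spaced contiguous clips of video frame
    indices, optionally with aligned audio indices."""
    max_start = max(0, total_video_frames - video_frames_per_clip)
    if clips_per_video == 1:
        starts = [max_start // 2]  # single clip: take it from the middle
    else:
        # Spread clip starts evenly from the first to the last valid position.
        starts = [round(i * max_start / (clips_per_video - 1))
                  for i in range(clips_per_video)]
    vclip_ids = [list(range(s, s + video_frames_per_clip)) for s in starts]
    aclip_ids = None
    if total_audio_frames is not None:
        ratio = total_audio_frames / total_video_frames
        aclip_ids = [[int(i * ratio) for i in clip] for clip in vclip_ids]
    return vclip_ids, aclip_ids
```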
- read_video_file_into_clips(vid_filename: str, num_frames_per_clip: int, clips_per_video: int, is_training: bool, video_only: bool = False, output_video_fps: float = -1, output_audio_fps: int = -1, num_samples_per_clip: int = 1, custom_frame_transforms: BaseTransformation | None = None, *args, **kwargs) Dict [source]
Read a video file into clips and sample the clips at the specified video/audio frame rates. First, all video and audio frames are read into memory, with audio resampled to output_audio_fps if specified; then clips_per_video clips are sampled from the resulting video/audio tensors. If a desired video frame rate is specified, the video is subsampled to output_video_fps. Regardless of whether the video is subsampled, each clip contains num_frames_per_clip video frames.
- Parameters:
vid_filename – The path of the video to be read.
num_frames_per_clip – Number of frames per clip to read.
clips_per_video – Number of clips to read for each video.
is_training – Whether the model is in training mode.
output_video_fps – The frame rate of the output video. Default is -1, which means no resampling is required.
output_audio_fps – The frame rate of the output audio. Default is -1, which means no resampling is required.
num_samples_per_clip – Number of random samples to generate per clip.
custom_frame_transforms – If provided, this transformation is used instead of the default from BaseAVReader.get_frame_transform. Note: be careful when customizing frame transforms, because the data type of frames read by different AVReaders may differ slightly before ToTensor() is applied.
- Returns:
A dictionary with video/audio tensors and metadata. The metadata has the following format: {
"video_fps": float, "audio_fps": int, "video_frame_timestamps": [num_clips x video_frames_per_clip] tensor
}
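The fps-subsampling step described above (dropping frames so the output plays at output_video_fps) can be illustrated with a small index helper. This is a hypothetical sketch of the index arithmetic, not the reader's actual code:

```python
def subsample_frame_indices(total_frames, input_fps, output_fps):
    """Return the indices of frames to keep so that `total_frames` frames
    recorded at `input_fps` are thinned to approximately `output_fps`.
    A non-positive `output_fps` means no resampling (keep every frame)."""
    if output_fps <= 0 or output_fps >= input_fps:
        return list(range(total_frames))
    step = input_fps / output_fps          # keep one frame every `step` frames
    n_out = int(total_frames / step)       # number of frames in the output
    return [min(total_frames - 1, round(i * step)) for i in range(n_out)]
```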
data.video_reader.decord_reader module
- class data.video_reader.decord_reader.DecordAVReader(*args, **kwargs)[source]
Bases:
BaseAVReader
Video Reader using Decord.
- read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict [source]
data.video_reader.pyav_reader module
- class data.video_reader.pyav_reader.PyAVReader(opts: Namespace, is_training: bool | None = False, *args, **kwargs)[source]
Bases:
BaseAVReader
Video Reader using PyAV.
- read_video(av_file: str, stream_idx: int = 0, audio_sample_rate: int = -1, custom_frame_transforms: BaseTransformation | None = None, video_only: bool = False, *args, **kwargs) Dict [source]
Module contents
- data.video_reader.get_video_reader(opts, *args, **kwargs) BaseAVReader [source]
Helper function to build the video reader from command-line arguments.
- Parameters:
opts – Command-line arguments.
- Returns:
Video reader module.
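A helper like this typically dispatches on a command-line option to pick a registered reader class. The sketch below is an assumption about the mechanism (the attribute name `video_reader_name` and the stub classes are hypothetical, not the package's actual option or classes):

```python
from argparse import Namespace

class DecordAVReader:  # stand-in for data.video_reader.decord_reader.DecordAVReader
    def __init__(self, opts, *args, **kwargs):
        self.opts = opts

class PyAVReader:  # stand-in for data.video_reader.pyav_reader.PyAVReader
    def __init__(self, opts, *args, **kwargs):
        self.opts = opts

# Registry mapping a reader name to its class.
VIDEO_READER_REGISTRY = {"decord": DecordAVReader, "pyav": PyAVReader}

def get_video_reader(opts, *args, **kwargs):
    """Look up the reader class named in `opts` and instantiate it."""
    name = getattr(opts, "video_reader_name", "pyav")
    try:
        reader_cls = VIDEO_READER_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown video reader: {name!r}")
    return reader_cls(opts, *args, **kwargs)
```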