pytorchvideo.data
pytorchvideo.data.Ava(frame_paths_file, frame_labels_file, video_path_prefix='', label_map_file=None, clip_sampler=<class 'pytorchvideo.data.clip_sampling.ClipSampler'>, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None)

Parameters
frame_paths_file (str) – Path to a file containing relative paths to all the frames in the video. Each line in the file is of the form <original_vido_id video_id frame_id rel_path labels>
frame_labels_file (str) – Path to the file containing labels per key frame. Acceptable file formats are:
Type 1: <original_vido_id, frame_time_stamp, bbox_x_1, bbox_y_1, … bbox_x_2, bbox_y_2, action_label, detection_iou>
Type 2: <original_vido_id, frame_time_stamp, bbox_x_1, bbox_y_1, … bbox_x_2, bbox_y_2, action_label, person_label>
video_path_prefix (str) – Path prepended to each relative frame path to obtain the global frame path.
label_map_file (str) – Path to a .pbtxt containing class IDs and class names. If not set, label_map is not loaded and bbox labels are not pruned based on allowable class IDs in label_map.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output and the corresponding bounding boxes before the clip and the bounding boxes are returned. It can be used for user defined preprocessing and augmentations to the clips. If transform is None, the clip and bounding boxes are returned as is.
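
For illustration, a minimal construction sketch. The manifest paths are hypothetical, and make_clip_sampler (documented below) builds the clip sampler:

    from pytorchvideo.data import Ava, make_clip_sampler

    # Hypothetical manifest paths; see the parameter descriptions above for
    # the expected file formats.
    dataset = Ava(
        frame_paths_file="ava/frame_lists/train.csv",
        frame_labels_file="ava/annotations/train.csv",
        clip_sampler=make_clip_sampler("random", 2.0),
    )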
class pytorchvideo.data.Charades(*args, **kwds)
Bases: torch.utils.data.dataset.IterableDataset
Action recognition video dataset for Charades stored as image frames.
This dataset handles the parsing of frames, loading and clip sampling for the videos. All IO is done through iopath.common.file_io.PathManager, enabling non-local storage URIs to be used.
NUM_CLASSES = 157
__init__(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None)

Parameters
data_path (str) – Path to the data file. This file must be a space separated csv with the format: (original_vido_id video_id frame_id path labels)
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
video_path_prefix (str) – prefix path to add to all paths from data_path.
frames_per_clip (Optional[int]) – The number of frames per clip to sample.
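
A minimal construction sketch, assuming a hypothetical frame-list csv in the space-separated format described above:

    from pytorchvideo.data import Charades, make_clip_sampler

    dataset = Charades(
        data_path="charades/train.csv",        # hypothetical frame-list csv
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="charades/frames",   # hypothetical frame root
        frames_per_clip=8,
    )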
property video_sampler
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
class pytorchvideo.data.ClipSampler(clip_duration)
Bases: abc.ABC
Interface for clip samplers that take a video time and the previously sampled clip time, and return a ClipInfo named tuple.
class pytorchvideo.data.RandomClipSampler(clip_duration)
Bases: pytorchvideo.data.clip_sampling.ClipSampler
Randomly samples a clip of size clip_duration from the video.
__call__(last_clip_time, video_duration, annotation)

Returns
clip_info (ClipInfo) – includes the clip information of (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip). The times are in seconds. clip_index, aug_index and is_last_clip are always 0, 0 and True, respectively.
Return type
pytorchvideo.data.clip_sampling.ClipInfo
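
For example, drawing a single 2-second clip from a 10-second video (a sketch; the annotation argument is unused by this sampler):

    from pytorchvideo.data import RandomClipSampler

    sampler = RandomClipSampler(clip_duration=2.0)
    # last_clip_time is 0.0 because no clip has been sampled from this video yet.
    clip_info = sampler(last_clip_time=0.0, video_duration=10.0, annotation=None)
    # clip_info.clip_start_time falls uniformly in [0.0, 8.0]; is_last_clip is True.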
class pytorchvideo.data.UniformClipSampler(clip_duration, stride=None, backpad_last=False, eps=1e-06)
Bases: pytorchvideo.data.clip_sampling.ClipSampler
Evenly splits the video into clips of size clip_duration.
__init__(clip_duration, stride=None, backpad_last=False, eps=1e-06)

Parameters
clip_duration (float) – The length of the clip to sample (in seconds)
stride (float, optional) – The amount of seconds to offset the next clip by. The default value of None is equivalent to no stride, i.e. stride == clip_duration.
eps (float) – Epsilon for floating point comparisons. Used to check the last clip.
backpad_last (bool) – Whether to include the last frame(s) by “back padding”. For instance, if we have a video of 39 frames (30 fps = 1.3 s), a stride of 16 frames (0.533 s), and a clip duration of 32 frames (1.0667 s), the clips will be (in frame numbers):
with backpad_last = False: [0, 32]
with backpad_last = True: [0, 32] and [8, 40], where the last clip is “back-padded” from [16, 48] to fit the video
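
As a sketch, the configuration from the frame example above (durations expressed in seconds):

    from pytorchvideo.data import UniformClipSampler

    # 32-frame clips at 30 fps (~1.0667 s), advanced 16 frames (~0.533 s) at a
    # time, back-padding the final clip so the last frames are included.
    sampler = UniformClipSampler(
        clip_duration=32 / 30,
        stride=16 / 30,
        backpad_last=True,
    )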
__call__(last_clip_time, video_duration, annotation)

Returns
clip_info (ClipInfo) – includes the clip information (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip), where the times are in seconds and is_last_clip is False while there is still more time in the video to be sampled.
Return type
pytorchvideo.data.clip_sampling.ClipInfo
pytorchvideo.data.make_clip_sampler(sampling_type, *args)
Constructs the clip samplers found in pytorchvideo.data.clip_sampling from the given arguments.

Parameters
sampling_type (str) – chooses the clip sampler to return. It has three options:
uniform: constructs and returns UniformClipSampler
random: constructs and returns RandomClipSampler
constant_clips_per_video: constructs and returns ConstantClipsPerVideoSampler
*args – the args to pass to the chosen clip sampler constructor.
Return type
pytorchvideo.data.clip_sampling.ClipSampler
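
For example (a sketch; each call forwards *args to the named sampler's constructor):

    from pytorchvideo.data import make_clip_sampler

    uniform = make_clip_sampler("uniform", 2.0)  # UniformClipSampler(clip_duration=2.0)
    random_ = make_clip_sampler("random", 2.0)   # RandomClipSampler(clip_duration=2.0)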
class pytorchvideo.data.DomsevFrameDataset(*args, **kwds)
Bases: torch.utils.data.dataset.Dataset

Egocentric video classification frame-based dataset for DoMSEV. This dataset handles the loading, decoding, and configurable sampling of the image frames.
__init__(video_data_manifest_file_path, video_info_file_path, labels_file_path, transform=None, multithreaded_io=False)

Parameters
video_data_manifest_file_path (str) – The path to a json file outlining the available video data for the associated videos. File must be a csv (with header) whose columns are the field names of the EncodedVideoInfo dataclass. To generate this file from a directory of video frames, see the helper functions in the pytorchvideo.data.domsev.utils module.
video_info_file_path (str) – Path or URI to manifest with basic metadata of each video. File must be a csv (with header) whose columns are the field names of the VideoInfo dataclass.
labels_file_path (str) – Path or URI to manifest with temporal annotations for each video. File must be a csv (with header) whose columns are the field names of the LabelData dataclass.
transform (Optional[Callable[[Dict[str, Any]], Any]]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentations to the clips. The clip output format is described in __next__().
multithreaded_io (bool) – Boolean controlling whether IO operations are performed across multiple threads.
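
A minimal construction sketch with hypothetical manifest paths (the three csv manifests are described above):

    from pytorchvideo.data import DomsevFrameDataset

    dataset = DomsevFrameDataset(
        video_data_manifest_file_path="domsev/video_data_manifest.csv",
        video_info_file_path="domsev/video_info.csv",
        labels_file_path="domsev/labels.csv",
    )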
class pytorchvideo.data.DomsevVideoDataset(*args, **kwds)
Bases: torch.utils.data.dataset.Dataset

Egocentric classification video clip-based dataset for DoMSEV stored as an encoded video (with frame-level labels). This dataset handles the loading, decoding, and configurable clip sampling for the videos.
__init__(video_data_manifest_file_path, video_info_file_path, labels_file_path, clip_sampler, dataset_type=<VideoDatasetType.Frame: 1>, frames_per_second=1, transform=None, frame_filter=None, multithreaded_io=False)

Parameters
video_data_manifest_file_path (str) – The path to a json file outlining the available video data for the associated videos. File must be a csv (with header) whose columns are the field names of the EncodedVideoInfo dataclass. To generate this file from a directory of video frames, see the helper functions in the pytorchvideo.data.domsev.utils module.
video_info_file_path (str) – Path or URI to manifest with basic metadata of each video. File must be a csv (with header) whose columns are the field names of the VideoInfo dataclass.
labels_file_path (str) – Path or URI to manifest with annotations for each video. File must be a csv (with header) whose columns are the field names of the LabelData dataclass.
clip_sampler (Callable[[Dict[str, pytorchvideo.data.video.Video], Dict[str, List[pytorchvideo.data.domsev.LabelData]]], List[pytorchvideo.data.dataset_manifest_utils.VideoClipInfo]]) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
dataset_type (VideoDatasetType) – The data format in which dataset video data is stored (e.g. video frames, encoded video, etc.).
frames_per_second (int) – The FPS of the stored videos. (NOTE: this is variable and may differ from the original FPS reported on the DoMSEV dataset website – it depends on the preprocessed subsampling and frame extraction.)
transform (Optional[Callable[[Dict[str, Any]], Any]]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentations to the clips. The clip output format is described in __next__().
frame_filter (Optional[Callable[[List[int]], List[int]]]) – This callable is evaluated on the set of available frame indices to be included in a sampled clip. This can be used to subselect frames within a clip to be loaded.
multithreaded_io (bool) – Boolean controlling whether IO operations are performed across multiple threads.
class pytorchvideo.data.EpicKitchenForecasting(*args, **kwds)
Bases: pytorchvideo.data.epic_kitchen.epic_kitchen_dataset.EpicKitchenDataset
Action forecasting video dataset for the EpicKitchen-55 dataset <https://epic-kitchens.github.io/2019/>.
This dataset handles the loading, decoding, and clip sampling for the videos.
class pytorchvideo.data.EpicKitchenRecognition(*args, **kwds)
Bases: pytorchvideo.data.epic_kitchen.epic_kitchen_dataset.EpicKitchenDataset
Action recognition video dataset for the EpicKitchen-55 dataset <https://epic-kitchens.github.io/2019/>.
This dataset handles the loading, decoding, and clip sampling for the videos.
pytorchvideo.data.Hmdb51(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', split_id=1, split_type='train', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the HMDB51 dataset.

Parameters
data_path (pathlib.Path) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
split_id (int) – Fold id to be loaded. Options are 1, 2 or 3
split_type (str) – Split/Fold type to be loaded. Options are (“train”, “test” or “unused”)
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines which backend should be used to decode videos.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
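
A minimal sketch for loading fold 1 of the train split; the paths are hypothetical:

    from pytorchvideo.data import Hmdb51, make_clip_sampler

    dataset = Hmdb51(
        data_path="hmdb51/splits",           # hypothetical split-file directory
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="hmdb51/videos",   # hypothetical video root
        split_id=1,
        split_type="train",
    )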
pytorchvideo.data.Kinetics(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Kinetics dataset.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
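
A minimal sketch, assuming a hypothetical directory whose subdirectories are class names. The returned dataset is an IterableDataset, so the DataLoader is used without a shuffle flag:

    import torch
    from pytorchvideo.data import Kinetics, make_clip_sampler

    dataset = Kinetics(
        data_path="kinetics/train",   # hypothetical class-per-subdirectory root
        clip_sampler=make_clip_sampler("random", 2.0),
        decode_audio=False,
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=8, num_workers=4)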
-
class
pytorchvideo.data.
LabeledVideoDataset
(*args, **kwds)[source]¶ Bases:
torch.utils.data.dataset.IterableDataset
LabeledVideoDataset handles the storage, loading, decoding and clip sampling for a video dataset. It assumes each video is stored as either an encoded video (e.g. mp4, avi) or a frame video (e.g. a folder of jpg, or png)
__init__(labeled_video_paths, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, decode_audio=True, decoder='pyav')

Parameters
labeled_video_paths (List[Tuple[str, Optional[dict]]]) – List containing video file paths and associated labels. If a video path is a folder, it is interpreted as a frame video; otherwise it must be an encoded video.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video. Not used for frame videos.
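
A minimal construction sketch; the file names and labels are hypothetical, and the second entry shows a frame-video folder:

    from pytorchvideo.data import LabeledVideoDataset, make_clip_sampler

    labeled_video_paths = [
        ("videos/archery_001.mp4", {"label": 0}),  # encoded video
        ("frames/bowling_002/", {"label": 1}),     # frame video (folder of images)
    ]
    dataset = LabeledVideoDataset(
        labeled_video_paths=labeled_video_paths,
        clip_sampler=make_clip_sampler("uniform", 2.0),
        decode_audio=False,
    )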
property video_sampler
Returns: The video sampler that defines video sample order. Note that you’ll need to use this property to set the epoch for a torch.utils.data.DistributedSampler.
property num_videos
Returns: Number of videos in dataset.
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
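
Typical consumption is plain iteration; each item is the dictionary above (a sketch, assuming a dataset constructed as in __init__):

    for clip in dataset:
        video = clip["video"]  # decoded clip tensor
        label = clip["label"]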
pytorchvideo.data.labeled_video_dataset(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Ucf101 and Kinetics datasets.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
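
A minimal sketch of the file-path variant, assuming a hypothetical csv where each line holds a video path and an integer label:

    from pytorchvideo.data import labeled_video_dataset, make_clip_sampler

    dataset = labeled_video_dataset(
        data_path="train.csv",                  # hypothetical "<path> <label>" file
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="/datasets/videos",   # prepended to every relative path
    )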
class pytorchvideo.data.SSv2(*args, **kwds)
Bases: torch.utils.data.dataset.IterableDataset
Action recognition video dataset for Something-something v2 (SSv2) stored as image frames.
This dataset handles the parsing of frames, loading and clip sampling for the videos. All IO is done through iopath.common.file_io.PathManager, enabling non-local storage URIs to be used.

__init__(label_name_file, video_label_file, video_path_label_file, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None, rand_sample_frames=False)

Parameters
label_name_file (str) – SSV2 label file that contains the label names and indexes.
video_label_file (str) – a file that contains video ids and the corresponding video label.
video_path_label_file (str) – a file that contains frame paths for each video and the corresponding frame label. The file must be a space separated csv of the format: (original_vido_id video_id frame_id path labels).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
video_path_prefix (str) – prefix path to add to all paths from data_path.
frames_per_clip (Optional[int]) – The number of frames per clip to sample.
rand_sample_frames (bool) – If True, randomly sample frames for each clip.
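
A minimal construction sketch with hypothetical manifest paths (see the parameter descriptions above):

    from pytorchvideo.data import SSv2, make_clip_sampler

    dataset = SSv2(
        label_name_file="ssv2/label_names.csv",
        video_label_file="ssv2/train_video_labels.csv",
        video_path_label_file="ssv2/train_frame_paths.csv",
        clip_sampler=make_clip_sampler("constant_clips_per_video", 2.0, 1),
        frames_per_clip=8,
    )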
property video_sampler
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
pytorchvideo.data.Ucf101(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Ucf101 dataset.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
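
A minimal sketch showing a user-defined transform; the callable receives and returns the clip dictionary, and the paths are hypothetical:

    from pytorchvideo.data import Ucf101, make_clip_sampler

    def normalize(clip):
        # Illustrative transform: scale pixel values from [0, 255] to [0, 1].
        clip["video"] = clip["video"] / 255.0
        return clip

    dataset = Ucf101(
        data_path="ucf101/videos",   # hypothetical class-per-subdirectory root
        clip_sampler=make_clip_sampler("uniform", 2.0),
        transform=normalize,
        decode_audio=False,
    )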