pytorchvideo.data
pytorchvideo.data.Ava(frame_paths_file, frame_labels_file, video_path_prefix='', label_map_file=None, clip_sampler=<class 'pytorchvideo.data.clip_sampling.ClipSampler'>, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None)

Parameters
frame_paths_file (str) – Path to a file containing relative paths to all the frames in the video. Each line in the file is of the form <original_vido_id video_id frame_id rel_path labels>
frame_labels_file (str) – Path to the file containing labels per key frame. Acceptable file formats are:
Type 1: <original_vido_id, frame_time_stamp, bbox_x_1, bbox_y_1, … bbox_x_2, bbox_y_2, action_label, detection_iou>
Type 2: <original_vido_id, frame_time_stamp, bbox_x_1, bbox_y_1, … bbox_x_2, bbox_y_2, action_label, person_label>
video_path_prefix (str) – Path prepended to each relative frame path to obtain the global frame path.
label_map_file (str) – Path to a .pbtxt containing class IDs and class names. If not set, label_map is not loaded and bbox labels are not pruned based on allowable class IDs in label_map.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output and the corresponding bounding boxes before the clip and the bounding boxes are returned. It can be used for user defined preprocessing and augmentations to the clips. If transform is None, the clip and bounding boxes are returned as is.
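
For illustration, a minimal construction sketch. The manifest paths are hypothetical, and make_clip_sampler (documented below) builds the clip sampler:

    from pytorchvideo.data import Ava, make_clip_sampler

    # Hypothetical manifest paths; see the parameter descriptions above for
    # the expected file formats.
    dataset = Ava(
        frame_paths_file="ava/frame_lists/train.csv",
        frame_labels_file="ava/annotations/train.csv",
        clip_sampler=make_clip_sampler("random", 2.0),
    )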
class pytorchvideo.data.Charades(*args, **kwds)
Bases: torch.utils.data.dataset.IterableDataset
Action recognition video dataset for Charades stored as image frames.
This dataset handles the parsing of frames, loading and clip sampling for the videos. All IO is done through iopath.common.file_io.PathManager, enabling non-local storage URIs to be used.
NUM_CLASSES = 157
__init__(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None)

Parameters
data_path (str) – Path to the data file. This file must be a space separated csv with the format: (original_vido_id video_id frame_id path labels)
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
video_path_prefix (str) – prefix path to add to all paths from data_path.
frames_per_clip (Optional[int]) – The number of frames per clip to sample.
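
A minimal construction sketch, assuming a hypothetical frame-list csv in the space-separated format described above:

    from pytorchvideo.data import Charades, make_clip_sampler

    dataset = Charades(
        data_path="charades/train.csv",        # hypothetical frame-list csv
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="charades/frames",   # hypothetical frame root
        frames_per_clip=8,
    )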
property video_sampler
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
class pytorchvideo.data.ClipSampler(clip_duration)
Bases: abc.ABC
Interface for clip samplers that take a video time and the previously sampled clip time, and return a ClipInfo named tuple.
class pytorchvideo.data.RandomClipSampler(clip_duration)
Bases: pytorchvideo.data.clip_sampling.ClipSampler
Randomly samples a clip of size clip_duration from the video.
__call__(last_clip_time, video_duration, annotation)

Returns
clip_info (ClipInfo) – includes the clip information of (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip). The times are in seconds. clip_index, aug_index and is_last_clip are always 0, 0 and True, respectively.
Return type
pytorchvideo.data.clip_sampling.ClipInfo
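
For example, drawing a single 2-second clip from a 10-second video (a sketch; the annotation argument is unused by this sampler):

    from pytorchvideo.data import RandomClipSampler

    sampler = RandomClipSampler(clip_duration=2.0)
    # last_clip_time is 0.0 because no clip has been sampled from this video yet.
    clip_info = sampler(last_clip_time=0.0, video_duration=10.0, annotation=None)
    # clip_info.clip_start_time falls uniformly in [0.0, 8.0]; is_last_clip is True.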
class pytorchvideo.data.UniformClipSampler(clip_duration, stride=None, backpad_last=False, eps=1e-06)
Bases: pytorchvideo.data.clip_sampling.ClipSampler
Evenly splits the video into clips of size clip_duration.
__init__(clip_duration, stride=None, backpad_last=False, eps=1e-06)

Parameters
clip_duration (float) – The length of the clip to sample (in seconds)
stride (float, optional) – The amount of seconds to offset the next clip by. The default value of None is equivalent to no stride, i.e. stride == clip_duration.
eps (float) – Epsilon for floating point comparisons. Used to check the last clip.
backpad_last (bool) – Whether to include the last frame(s) by “back padding”. For instance, if we have a video of 39 frames (30 fps = 1.3 s), a stride of 16 frames (0.533 s), and a clip duration of 32 frames (1.0667 s), the clips will be (in frame numbers):
with backpad_last = False: [0, 32]
with backpad_last = True: [0, 32] and [8, 40], where the last clip is “back-padded” from [16, 48] to fit the video
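
As a sketch, the configuration from the frame example above (durations expressed in seconds):

    from pytorchvideo.data import UniformClipSampler

    # 32-frame clips at 30 fps (~1.0667 s), advanced 16 frames (~0.533 s) at a
    # time, back-padding the final clip so the last frames are included.
    sampler = UniformClipSampler(
        clip_duration=32 / 30,
        stride=16 / 30,
        backpad_last=True,
    )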
__call__(last_clip_time, video_duration, annotation)

Returns
clip_info (ClipInfo) – includes the clip information (clip_start_time, clip_end_time, clip_index, aug_index, is_last_clip), where the times are in seconds and is_last_clip is False while there is still more time in the video to be sampled.
Return type
pytorchvideo.data.clip_sampling.ClipInfo
pytorchvideo.data.make_clip_sampler(sampling_type, *args)
Constructs the clip samplers found in pytorchvideo.data.clip_sampling from the given arguments.

Parameters
sampling_type (str) – chooses the clip sampler to return. It has three options:
uniform: constructs and returns UniformClipSampler
random: constructs and returns RandomClipSampler
constant_clips_per_video: constructs and returns ConstantClipsPerVideoSampler
*args – the args to pass to the chosen clip sampler constructor.
Return type
pytorchvideo.data.clip_sampling.ClipSampler
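
For example (a sketch; each call forwards *args to the named sampler's constructor):

    from pytorchvideo.data import make_clip_sampler

    uniform = make_clip_sampler("uniform", 2.0)  # UniformClipSampler(clip_duration=2.0)
    random_ = make_clip_sampler("random", 2.0)   # RandomClipSampler(clip_duration=2.0)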
class pytorchvideo.data.DomsevFrameDataset(*args, **kwds)
Bases: torch.utils.data.dataset.Dataset

Egocentric video classification frame-based dataset for DoMSEV. This dataset handles the loading, decoding, and configurable sampling of the image frames.
__init__(video_data_manifest_file_path, video_info_file_path, labels_file_path, transform=None, multithreaded_io=False)

Parameters
video_data_manifest_file_path (str) – The path to a json file outlining the available video data for the associated videos. File must be a csv (with header) whose columns are the field names of the EncodedVideoInfo dataclass. To generate this file from a directory of video frames, see the helper functions in the pytorchvideo.data.domsev.utils module.
video_info_file_path (str) – Path or URI to manifest with basic metadata of each video. File must be a csv (with header) whose columns are the field names of the VideoInfo dataclass.
labels_file_path (str) – Path or URI to manifest with temporal annotations for each video. File must be a csv (with header) whose columns are the field names of the LabelData dataclass.
transform (Optional[Callable[[Dict[str, Any]], Any]]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentations to the clips. The clip output format is described in __next__().
multithreaded_io (bool) – Boolean controlling whether IO operations are performed across multiple threads.
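
A minimal construction sketch with hypothetical manifest paths (the three csv manifests are described above):

    from pytorchvideo.data import DomsevFrameDataset

    dataset = DomsevFrameDataset(
        video_data_manifest_file_path="domsev/video_data_manifest.csv",
        video_info_file_path="domsev/video_info.csv",
        labels_file_path="domsev/labels.csv",
    )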
class pytorchvideo.data.DomsevVideoDataset(*args, **kwds)
Bases: torch.utils.data.dataset.Dataset

Egocentric classification video clip-based dataset for DoMSEV stored as an encoded video (with frame-level labels). This dataset handles the loading, decoding, and configurable clip sampling for the videos.
__init__(video_data_manifest_file_path, video_info_file_path, labels_file_path, clip_sampler, dataset_type=<VideoDatasetType.Frame: 1>, frames_per_second=1, transform=None, frame_filter=None, multithreaded_io=False)

Parameters
video_data_manifest_file_path (str) – The path to a json file outlining the available video data for the associated videos. File must be a csv (with header) whose columns are the field names of the EncodedVideoInfo dataclass. To generate this file from a directory of video frames, see the helper functions in the pytorchvideo.data.domsev.utils module.
video_info_file_path (str) – Path or URI to manifest with basic metadata of each video. File must be a csv (with header) whose columns are the field names of the VideoInfo dataclass.
labels_file_path (str) – Path or URI to manifest with annotations for each video. File must be a csv (with header) whose columns are the field names of the LabelData dataclass.
clip_sampler (Callable[[Dict[str, pytorchvideo.data.video.Video], Dict[str, List[pytorchvideo.data.domsev.LabelData]]], List[pytorchvideo.data.dataset_manifest_utils.VideoClipInfo]]) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
dataset_type (VideoDatasetType) – The data format in which dataset video data is stored (e.g. video frames, encoded video, etc.).
frames_per_second (int) – The FPS of the stored videos. (NOTE: this is variable and may differ from the original FPS reported on the DoMSEV dataset website – it depends on the preprocessed subsampling and frame extraction.)
transform (Optional[Callable[[Dict[str, Any]], Any]]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user-defined preprocessing and augmentations to the clips. The clip output format is described in __next__().
frame_filter (Optional[Callable[[List[int]], List[int]]]) – This callable is evaluated on the set of available frame indices to be included in a sampled clip. This can be used to subselect frames within a clip to be loaded.
multithreaded_io (bool) – Boolean controlling whether IO operations are performed across multiple threads.
class pytorchvideo.data.EpicKitchenForecasting(*args, **kwds)
Bases: pytorchvideo.data.epic_kitchen.epic_kitchen_dataset.EpicKitchenDataset
Action forecasting video dataset for the EpicKitchen-55 dataset <https://epic-kitchens.github.io/2019/>.
This dataset handles the loading, decoding, and clip sampling for the videos.
class pytorchvideo.data.EpicKitchenRecognition(*args, **kwds)
Bases: pytorchvideo.data.epic_kitchen.epic_kitchen_dataset.EpicKitchenDataset
Action recognition video dataset for the EpicKitchen-55 dataset <https://epic-kitchens.github.io/2019/>.
This dataset handles the loading, decoding, and clip sampling for the videos.
pytorchvideo.data.Hmdb51(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', split_id=1, split_type='train', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the HMDB51 dataset.

Parameters
data_path (pathlib.Path) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
split_id (int) – Fold id to be loaded. Options are 1, 2 or 3
split_type (str) – Split/Fold type to be loaded. Options are (“train”, “test” or “unused”)
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines which backend should be used to decode videos.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
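
A minimal sketch for loading fold 1 of the train split; the paths are hypothetical:

    from pytorchvideo.data import Hmdb51, make_clip_sampler

    dataset = Hmdb51(
        data_path="hmdb51/splits",           # hypothetical split-file directory
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="hmdb51/videos",   # hypothetical video root
        split_id=1,
        split_type="train",
    )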
pytorchvideo.data.Kinetics(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Kinetics dataset.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
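
A minimal sketch, assuming a hypothetical directory whose subdirectories are class names. The returned dataset is an IterableDataset, so the DataLoader is used without a shuffle flag:

    import torch
    from pytorchvideo.data import Kinetics, make_clip_sampler

    dataset = Kinetics(
        data_path="kinetics/train",   # hypothetical class-per-subdirectory root
        clip_sampler=make_clip_sampler("random", 2.0),
        decode_audio=False,
    )
    loader = torch.utils.data.DataLoader(dataset, batch_size=8, num_workers=4)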
-
class
pytorchvideo.data.
LabeledVideoDataset
(*args, **kwds)[source]¶ Bases:
torch.utils.data.dataset.IterableDataset
LabeledVideoDataset handles the storage, loading, decoding and clip sampling for a video dataset. It assumes each video is stored as either an encoded video (e.g. mp4, avi) or a frame video (e.g. a folder of jpg, or png)
__init__(labeled_video_paths, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, decode_audio=True, decoder='pyav')

Parameters
labeled_video_paths (List[Tuple[str, Optional[dict]]]) – List containing video file paths and associated labels. If a video path is a folder, it is interpreted as a frame video; otherwise it must be an encoded video.
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video. Not used for frame videos.
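
A minimal construction sketch; the file names and labels are hypothetical, and the second entry shows a frame-video folder:

    from pytorchvideo.data import LabeledVideoDataset, make_clip_sampler

    labeled_video_paths = [
        ("videos/archery_001.mp4", {"label": 0}),  # encoded video
        ("frames/bowling_002/", {"label": 1}),     # frame video (folder of images)
    ]
    dataset = LabeledVideoDataset(
        labeled_video_paths=labeled_video_paths,
        clip_sampler=make_clip_sampler("uniform", 2.0),
        decode_audio=False,
    )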
property video_sampler
Returns: The video sampler that defines video sample order. Note that you’ll need to use this property to set the epoch for a torch.utils.data.DistributedSampler.
property num_videos
Returns: Number of videos in dataset.
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
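
Typical consumption is plain iteration; each item is the dictionary above (a sketch, assuming a dataset constructed as in __init__):

    for clip in dataset:
        video = clip["video"]  # decoded clip tensor
        label = clip["label"]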
pytorchvideo.data.labeled_video_dataset(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Ucf101 and Kinetics datasets.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
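
A minimal sketch of the file-path variant, assuming a hypothetical csv where each line holds a video path and an integer label:

    from pytorchvideo.data import labeled_video_dataset, make_clip_sampler

    dataset = labeled_video_dataset(
        data_path="train.csv",                  # hypothetical "<path> <label>" file
        clip_sampler=make_clip_sampler("random", 2.0),
        video_path_prefix="/datasets/videos",   # prepended to every relative path
    )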
class pytorchvideo.data.SSv2(*args, **kwds)
Bases: torch.utils.data.dataset.IterableDataset
Action recognition video dataset for Something-something v2 (SSv2) stored as image frames.
This dataset handles the parsing of frames, loading and clip sampling for the videos. All IO is done through iopath.common.file_io.PathManager, enabling non-local storage URIs to be used.

__init__(label_name_file, video_label_file, video_path_label_file, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', frames_per_clip=None, rand_sample_frames=False)

Parameters
label_name_file (str) – SSV2 label file that contains the label names and indexes.
video_label_file (str) – a file that contains video ids and the corresponding video label.
video_path_label_file (str) – a file that contains frame paths for each video and the corresponding frame label. The file must be a space separated csv of the format: (original_vido_id video_id frame_id path labels).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Optional[Callable]) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations on the clips. The clip output format is described in __next__().
video_path_prefix (str) – prefix path to add to all paths from data_path.
frames_per_clip (Optional[int]) – The number of frames per clip to sample.
rand_sample_frames (bool) – If True, randomly sample frames for each clip.
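
A minimal construction sketch with hypothetical manifest paths (see the parameter descriptions above):

    from pytorchvideo.data import SSv2, make_clip_sampler

    dataset = SSv2(
        label_name_file="ssv2/label_names.csv",
        video_label_file="ssv2/train_video_labels.csv",
        video_path_label_file="ssv2/train_frame_paths.csv",
        clip_sampler=make_clip_sampler("constant_clips_per_video", 2.0, 1),
        frames_per_clip=8,
    )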
property video_sampler
__next__()
Retrieves the next clip based on the clip sampling strategy and video sampler.
Returns
A dictionary with the following format:

    {
        'video': <video_tensor>,
        'label': <index_label>,
        'video_label': <index_label>,
        'video_index': <video_index>,
        'clip_index': <clip_index>,
        'aug_index': <aug_index>,
    }
pytorchvideo.data.Ucf101(data_path, clip_sampler, video_sampler=<class 'torch.utils.data.sampler.RandomSampler'>, transform=None, video_path_prefix='', decode_audio=True, decoder='pyav')
A helper function to create a LabeledVideoDataset object for the Ucf101 dataset.

Parameters
data_path (str) –
Path to the data. The path type defines how the data should be read:
For a file path, the file is read and each line is parsed into a video path and label.
For a directory, the directory structure defines the classes (i.e. each subdirectory is a class).
clip_sampler (ClipSampler) – Defines how clips should be sampled from each video. See the clip sampling documentation for more information.
video_sampler (Type[torch.utils.data.Sampler]) – Sampler for the internal video container. This defines the order videos are decoded and, if necessary, the distributed split.
transform (Callable) – This callable is evaluated on the clip output before the clip is returned. It can be used for user defined preprocessing and augmentations to the clips. See the LabeledVideoDataset class for the clip output format.
video_path_prefix (str) – Path to the root directory with the videos that are loaded in LabeledVideoDataset. All the video paths before loading are prefixed with this path.
decode_audio (bool) – If True, also decode audio from video.
decoder (str) – Defines what type of decoder is used to decode a video.
Return type
pytorchvideo.data.labeled_video_dataset.LabeledVideoDataset
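
A minimal sketch showing a user-defined transform; the callable receives and returns the clip dictionary, and the paths are hypothetical:

    from pytorchvideo.data import Ucf101, make_clip_sampler

    def normalize(clip):
        # Illustrative transform: scale pixel values from [0, 255] to [0, 1].
        clip["video"] = clip["video"] / 255.0
        return clip

    dataset = Ucf101(
        data_path="ucf101/videos",   # hypothetical class-per-subdirectory root
        clip_sampler=make_clip_sampler("uniform", 2.0),
        transform=normalize,
        decode_audio=False,
    )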