
pytorchvideo.transforms

class pytorchvideo.transforms.AugMix(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]

Bases: object

This implements AugMix for video. AugMix generates several chains of augmentations on the original video, which are then mixed together with each other and with the original video to create an augmented video. The input video tensor should have shape (T, C, H, W).

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (https://arxiv.org/pdf/1912.02781.pdf)

__init__(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]
Parameters
  • magnitude (int) – Magnitude used for transform function. Default is 3.

  • alpha (float) – Parameter for choosing mixing weights from the beta and Dirichlet distributions. Default is 1.0.

  • width (int) – The number of transformation chains. Default is 3.

  • depth (int) – The number of transformations in each chain. If depth is -1, each chain will have a random length between 1 and 3 inclusive. Default is -1.

  • transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, the fill value is (0.5, 0.5, 0.5).

  • sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_AUGMIX_DEFAULT_HPARAS.

Return type

None

__call__(video)[source]

Perform AugMix on the input video tensor.

Parameters

video (torch.Tensor) – Input video tensor with shape (T, C, H, W).

Return type

torch.Tensor
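
Example (a minimal usage sketch; the clip size and uint8 dtype are illustrative assumptions, not requirements stated above):

>>> import torch
>>> from pytorchvideo.transforms import AugMix
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> augmix = AugMix(magnitude=3, alpha=1.0, width=3, depth=-1)
>>> augmented = augmix(video)  # same (T, C, H, W) shape as the input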

class pytorchvideo.transforms.MixVideo(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

Stochastically applies either MixUp or CutMix to the input video.

__init__(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]
Parameters
  • cutmix_prob (float) – Probability of using CutMix. MixUp will be used with probability 1 - cutmix_prob. If cutmix_prob is 0, then MixUp is always used. If cutmix_prob is 1, then CutMix is always used.

  • mixup_alpha (float) – MixUp alpha value.

  • cutmix_alpha (float) – CutMix alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

training: bool
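
Example (a hedged sketch of batch-level mixing; the batch size, clip shape, and the assumption that forward returns a (mixed video, soft label) pair mirror the CutMix and MixUp entries below):

>>> import torch
>>> from pytorchvideo.transforms import MixVideo
>>> videos = torch.randn(4, 3, 8, 224, 224)  # (B, C, T, H, W)
>>> labels = torch.randint(0, 400, (4,))     # (B,) integer class indices
>>> mix = MixVideo(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.1, num_classes=400)
>>> mixed_videos, mixed_labels = mix(videos, labels)  # mixed_labels: soft targets of shape (B, num_classes)
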
class pytorchvideo.transforms.CutMix(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (https://arxiv.org/abs/1905.04899)

__init__(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

This implements CutMix for videos.

Parameters
  • alpha (float) – CutMix alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

Return type

None

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

Return type

Tuple[torch.Tensor, torch.Tensor]

training: bool
class pytorchvideo.transforms.MixUp(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

Mixup: Beyond Empirical Risk Minimization (https://arxiv.org/abs/1710.09412)

__init__(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

This implements MixUp for videos.

Parameters
  • alpha (float) – Mixup alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

Return type

None

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

Return type

Tuple[torch.Tensor, torch.Tensor]

training: bool
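
Example (a hedged sketch; the random inputs and the stand-in logits are assumptions, and the soft cross-entropy shown is one common way to consume the returned soft targets):

>>> import torch
>>> import torch.nn.functional as F
>>> from pytorchvideo.transforms import MixUp
>>> videos = torch.randn(4, 3, 8, 224, 224)  # (B, C, T, H, W)
>>> labels = torch.randint(0, 400, (4,))     # (B,)
>>> mixup = MixUp(alpha=1.0, label_smoothing=0.1, num_classes=400)
>>> mixed_videos, soft_labels = mixup(videos, labels)  # soft_labels: (B, num_classes)
>>> logits = torch.randn(4, 400)  # stand-in for model(mixed_videos)
>>> loss = -(soft_labels * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
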
class pytorchvideo.transforms.RandAugment(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]

Bases: object

This implements RandAugment for video. The input video tensor should have shape (T, C, H, W).

RandAugment: Practical automated data augmentation with a reduced search space (https://arxiv.org/abs/1909.13719)

__init__(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]

This implements RandAugment for video.

Parameters
  • magnitude (int) – Magnitude used for transform function.

  • num_layers (int) – How many transform functions to apply for each augmentation.

  • prob (float) – The probability of applying each transform function.

  • transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, it uses transform_default_hparas.

  • sampling_type (str) – Sampling method for magnitude of transform. It should be either gaussian or uniform.

  • sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_RANDAUG_DEFAULT_HPARAS.

Return type

None

__call__(video)[source]

Perform RandAugment on the input video tensor.

Parameters

video (torch.Tensor) – Input video tensor with shape (T, C, H, W).

Return type

torch.Tensor
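
Example (a minimal sketch; the uint8 clip with shape (T, C, H, W) is an illustrative assumption):

>>> import torch
>>> from pytorchvideo.transforms import RandAugment
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> randaug = RandAugment(magnitude=9, num_layers=2, prob=0.5)
>>> augmented = randaug(video)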

pytorchvideo.transforms.create_video_transform(mode, video_key=None, remove_key=None, num_samples=8, convert_to_float=True, video_mean=(0.45, 0.45, 0.45), video_std=(0.225, 0.225, 0.225), min_size=256, max_size=320, crop_size=224, horizontal_flip_prob=0.5, aug_type='default', aug_paras=None, random_resized_crop_paras=None)[source]

Function that returns a factory default callable video transform, with default parameters that can be modified. The transform that is returned depends on the mode parameter: when in “train” mode, we use randomized transformations, and when in “val” mode, we use the corresponding deterministic transformations. Depending on whether video_key is set, the input to the transform can either be a video tensor or a dict containing video_key that maps to a video tensor. The video tensor should be of shape (C, T, H, W).

The returned composition follows one of two pipelines, depending on mode:

“train” mode: (UniformTemporalSubsample) → (RandAugment/AugMix) → (ConvertUint8ToFloat) → Normalize → RandomResizedCrop or RandomShortSideScale + RandomCrop → RandomHorizontalFlip

“val” mode: (UniformTemporalSubsample) → (ConvertUint8ToFloat) → Normalize → ShortSideScale + CenterCrop

Transforms shown in parentheses may be included in or excluded from the returned composition, depending on the arguments.

Parameters
  • mode (str) – ‘train’ or ‘val’. We use randomized transformations in ‘train’ mode, and we use the corresponding deterministic transformation in ‘val’ mode.

  • video_key (str, optional) – Optional key for video value in dictionary input. When video_key is None, the input is assumed to be a torch.Tensor. Default is None.

  • remove_key (List[str], optional) – Optional key to remove from a dictionary input. Default is None.

  • num_samples (int, optional) – The number of equispaced samples to be selected in UniformTemporalSubsample. If None, then UniformTemporalSubsample will not be used. Default is 8.

  • convert_to_float (bool) – If True, converts images from uint8 to float. Otherwise, leaves the image as is. Default is True.

  • video_mean (Tuple[float, float, float]) – Sequence of means for each channel to normalize to zero mean and unit variance. Default is (0.45, 0.45, 0.45).

  • video_std (Tuple[float, float, float]) – Sequence of standard deviations for each channel to normalize to zero mean and unit variance. Default is (0.225, 0.225, 0.225).

  • min_size (int) – Minimum size that the shorter side is scaled to for RandomShortSideScale. If in “val” mode, this is the exact size the shorter side is scaled to for ShortSideScale. Default is 256.

  • max_size (int) – Maximum size that the shorter side is scaled to for RandomShortSideScale. Default is 320.

  • crop_size (int or Tuple[int, int]) – Desired output size of the crop for RandomCrop in “train” mode and CenterCrop in “val” mode. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. Default is 224.

  • horizontal_flip_prob (float) – Probability of the video being flipped in RandomHorizontalFlip. Default value is 0.5.

  • aug_type (str) – Currently supports ‘default’, ‘randaug’, or ‘augmix’. No augmentations other than RandomShortSideScale and RandomCrop are performed when aug_type is ‘default’. RandAugment is used when aug_type is ‘randaug’ and AugMix is used when aug_type is ‘augmix’. Default is ‘default’.

  • aug_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for the augmentation set in aug_type. If any parameters are missing or if None, default parameters will be used. Default is None.

  • random_resized_crop_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for Inception-style cropping. This crops the given videos to a random size and aspect ratio relative to the original, then resizes the crop to the given size; this is popularly used to train the Inception networks. If any parameters are missing, the corresponding defaults in _RANDOM_RESIZED_CROP_DEFAULT_PARAS are used. If random_resized_crop_paras is None, RandomShortSideScale and RandomCrop are used instead. Default is None.

Returns

A factory-default callable composition of transforms.

Return type

Union[Callable[[torch.Tensor], torch.Tensor], Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]
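
Example (a hedged sketch of the dictionary-input path; the clip shape, key name, and aug_type choice are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms import create_video_transform
>>> train_transform = create_video_transform(mode="train", video_key="video", num_samples=8, crop_size=224, aug_type="randaug")
>>> sample = {"video": torch.randint(0, 256, (3, 32, 256, 320), dtype=torch.uint8)}  # (C, T, H, W)
>>> out = train_transform(sample)  # out["video"]: float32 tensor of shape (3, 8, 224, 224)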

class pytorchvideo.transforms.ApplyTransformToKey(key, transform)[source]

Bases: object

Applies transform to key of dictionary input.

Parameters
  • key (str) – the dictionary key the transform is applied to

  • transform (callable) – the transform that is applied

Example

>>>   transforms.ApplyTransformToKey(
>>>       key='video',
>>>       transform=UniformTemporalSubsample(num_video_samples),
>>>   )
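
A fuller hedged sketch that composes several of the transforms documented on this page under the “video” key (the clip shape and normalization statistics are illustrative, taken from the defaults quoted above):

>>> import torch
>>> from torchvision.transforms import Compose
>>> from pytorchvideo.transforms import (ApplyTransformToKey, UniformTemporalSubsample, ConvertUint8ToFloat, Normalize, ShortSideScale)
>>> transform = ApplyTransformToKey(
>>>     key="video",
>>>     transform=Compose([
>>>         UniformTemporalSubsample(8),
>>>         ConvertUint8ToFloat(),
>>>         Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
>>>         ShortSideScale(256),
>>>     ]),
>>> )
>>> clip = {"video": torch.randint(0, 256, (3, 32, 240, 320), dtype=torch.uint8)}  # (C, T, H, W)
>>> out = transform(clip)  # out["video"]: float32, 8 frames, shorter side 256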

class pytorchvideo.transforms.ConvertUint8ToFloat[source]

Bases: torch.nn.modules.module.Module

Converts a video from dtype uint8 to dtype float32.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training

class pytorchvideo.transforms.Div255[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.div_255.

forward(x)[source]

Scale clip frames from [0, 255] to [0, 1].

Parameters

x (torch.Tensor) – A tensor of the clip’s RGB frames with shape (C, T, H, W).

Returns

The scaled tensor, obtained by dividing by 255.

Return type

torch.Tensor

training

class pytorchvideo.transforms.Normalize(mean, std, inplace=False)[source]

Bases: torchvision.transforms.transforms.Normalize

Normalize the (C, T, H, W) video clip by mean subtraction and division by standard deviation.

Parameters
  • mean (3-tuple) – pixel RGB mean

  • std (3-tuple) – pixel RGB standard deviation

  • inplace (boolean) – whether to do in-place normalization

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
class pytorchvideo.transforms.OpSampler(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]

Bases: torch.nn.modules.module.Module

Given a list of transforms with weights, OpSampler applies weighted sampling to select n transforms, which are then applied sequentially to the input.

__init__(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]
Parameters
  • transforms_list (List[Callable]) – A list of all available transforms to sample from.

  • transforms_prob (Optional[List[float]]) – The sampling weights associated with each transform in transforms_list. If not provided, the sampler assumes a uniform distribution over all transforms. The weights do not need to sum to one, but they must be positive.

  • num_sample_op (int) – Number of transforms to sample and apply to input.

  • randomly_sample_depth (bool) – If randomly_sample_depth is True, then uniformly sample the number of transforms to apply, between 1 and num_sample_op.

  • replacement (bool) – If replacement is True, transforms are drawn with replacement.

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor.

Return type

torch.Tensor

training
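
Example (a hedged sketch that samples one of two augmentation policies per clip; the 3:1 weighting and the uint8 clip are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms import OpSampler, RandAugment, AugMix
>>> sampler = OpSampler(transforms_list=[RandAugment(), AugMix()], transforms_prob=[0.75, 0.25], num_sample_op=1)
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> augmented = sampler(video)
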
class pytorchvideo.transforms.Permute(dims)[source]

Bases: torch.nn.modules.module.Module

Permutes the dimensions of a video.

__init__(dims)[source]
Parameters

dims (Tuple[int]) – The desired ordering of dimensions.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor whose dimensions are to be permuted.

Return type

torch.Tensor

training
class pytorchvideo.transforms.RandomResizedCrop(target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.random_resized_crop.

__call__(x)[source]
Parameters

x (torch.Tensor) – Input video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
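
Example (a hedged sketch; the Inception-style scale and aspect-ratio ranges shown are common choices, not defaults of this class):

>>> import torch
>>> from pytorchvideo.transforms import RandomResizedCrop
>>> crop = RandomResizedCrop(target_height=224, target_width=224, scale=(0.08, 1.0), aspect_ratio=(3.0 / 4.0, 4.0 / 3.0))
>>> clip = torch.randn(3, 8, 256, 320)  # (C, T, H, W)
>>> out = crop(clip)  # (3, 8, 224, 224)
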
class pytorchvideo.transforms.RandomShortSideScale(min_size, max_size)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale. The size parameter is chosen randomly in [min_size, max_size].

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
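
Example (a hedged sketch of the train-time pairing used by create_video_transform above; torchvision’s RandomCrop operates on the trailing (H, W) dims, so it composes with the (C, T, H, W) layout):

>>> import torch
>>> from torchvision.transforms import Compose, RandomCrop
>>> from pytorchvideo.transforms import RandomShortSideScale
>>> train_spatial = Compose([RandomShortSideScale(min_size=256, max_size=320), RandomCrop(224)])
>>> clip = torch.randn(3, 8, 240, 426)  # (C, T, H, W)
>>> out = train_spatial(clip)  # (3, 8, 224, 224)
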
class pytorchvideo.transforms.RemoveKey(key)[source]

Bases: torch.nn.modules.module.Module

Removes the given key from the input dict. Useful for removing modalities from a video clip that aren’t needed.

__call__(x)[source]
Parameters

x (Dict[str, torch.Tensor]) – video clip dict.

Return type

Dict[str, torch.Tensor]

training
class pytorchvideo.transforms.ShortSideScale(size)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training

class pytorchvideo.transforms.UniformCropVideo(size, video_key='video', aug_index_key='aug_index')[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_crop.

__call__(x)[source]
Parameters

x (Dict[str, torch.Tensor]) – video clip dict.

Return type

Dict[str, torch.Tensor]

training
class pytorchvideo.transforms.UniformTemporalSubsample(num_samples)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
class pytorchvideo.transforms.UniformTemporalSubsampleRepeated(frame_ratios)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample_repeated.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

training

pytorchvideo.transforms.functional


pytorchvideo.transforms.functional.uniform_temporal_subsample(x, num_samples, temporal_dim=-3)[source]

Uniformly subsamples num_samples indices from the temporal dimension of the video. When num_samples is larger than the size of the temporal dimension, frames are repeated via nearest-neighbor interpolation.

Parameters
  • x (torch.Tensor) – A video tensor with more than one dimension; any torch dtype (int, long, float, complex, etc.) is supported.

  • num_samples (int) – The number of equispaced samples to be selected.

  • temporal_dim (int) – The temporal dimension along which to subsample.

Returns

An x-like Tensor with subsampled temporal dimension.

Return type

torch.Tensor
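
Example (a minimal sketch; the 32-frame clip is an illustrative assumption, and the default temporal_dim=-3 matches the (C, T, H, W) layout):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_temporal_subsample
>>> clip = torch.randn(3, 32, 224, 224)  # (C, T, H, W)
>>> uniform_temporal_subsample(clip, num_samples=8).shape
torch.Size([3, 8, 224, 224])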

pytorchvideo.transforms.functional.short_side_scale(x, size, interpolation='bilinear', backend='pytorch')[source]

Determines the shorter spatial dim of the video (i.e. width or height) and scales it to the given size. To maintain aspect ratio, the longer side is then scaled accordingly.

Parameters
  • x (torch.Tensor) – A video tensor of shape (C, T, H, W) and type torch.float32.

  • size (int) – The size the shorter side is scaled to.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch (default) and opencv.

Returns

An x-like Tensor with scaled spatial dims.

Return type

torch.Tensor

pytorchvideo.transforms.functional.uniform_temporal_subsample_repeated(frames, frame_ratios, temporal_dim=-3)[source]

Prepares the output as a list of tensors subsampled from the input frames. Each tensor maintains a unique copy of the subsampled frames, corresponding to a unique pathway.

Parameters
  • frames (tensor) – Frames sampled from the video. Expected to be a torch tensor (of any dtype, including int, long, float, complex, etc.) with more than one dimension.

  • frame_ratios (tuple) – Temporal down-sampling ratio for each pathway.

  • temporal_dim (int) – The temporal dimension along which to subsample.

Returns

frame_list (tuple) – A tuple of subsampled tensors, one per pathway.

Return type

Tuple[torch.Tensor]
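
Example (a hedged SlowFast-style sketch; the frame count and the (4, 1) pathway ratios are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_temporal_subsample_repeated
>>> frames = torch.randn(3, 32, 224, 224)  # (C, T, H, W)
>>> slow, fast = uniform_temporal_subsample_repeated(frames, frame_ratios=(4, 1))
>>> slow.shape[-3], fast.shape[-3]  # 8 frames for the slow pathway, 32 for the fast pathway
(8, 32)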

pytorchvideo.transforms.functional.convert_to_one_hot(targets, num_class, label_smooth=0.0)[source]

This function converts target class indices to one-hot vectors, given the number of classes.

Parameters
  • targets (torch.Tensor) – Index labels to be converted.

  • num_class (int) – Total number of classes.

  • label_smooth (float) – Label smoothing value for non-target classes. Label smoothing is disabled by default (0).

Return type

torch.Tensor
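
Example (a minimal sketch; the class count and smoothing value are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import convert_to_one_hot
>>> targets = torch.tensor([0, 2])  # integer class indices
>>> soft = convert_to_one_hot(targets, num_class=4, label_smooth=0.1)
>>> soft.shape  # each row sums to 1, with most mass on the target class
torch.Size([2, 4])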

pytorchvideo.transforms.functional.short_side_scale_with_boxes(images, boxes, size, interpolation='bilinear', backend='pytorch')[source]

Perform a spatial short scale jittering on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

  • size (int) – The size the shorter side is scaled to.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch as default, and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181

Returns

The scaled images with dimension of channel x num frames x height x width, and the scaled boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, numpy.ndarray]

pytorchvideo.transforms.functional.random_short_side_scale_with_boxes(images, boxes, min_size, max_size, interpolation='bilinear', backend='pytorch')[source]

Perform a spatial short scale jittering on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

  • min_size (int) – The minimal size to scale the frames.

  • max_size (int) – The maximal size to scale the frames.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch as default, and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181

Returns

The scaled images with dimension of channel x num frames x height x width, and the scaled boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.random_crop_with_boxes(images, size, boxes)[source]

Perform random spatial crop on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform random crop. The dimension is channel x num frames x height x width.

  • size (int) – The size of height and width to crop on the image.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The cropped images with dimension of channel x num frames x height x width, and the cropped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.uniform_crop(images, size, spatial_idx)[source]

Perform uniform spatial sampling on the images.

Parameters
  • images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.

  • size (int) – Size of the height and width to crop the images.

  • spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.

Returns

The cropped images with dimension of channel x num frames x height x width.

Return type

torch.Tensor
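
Example (a minimal sketch; the landscape clip shape is an illustrative assumption):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_crop
>>> clip = torch.randn(3, 8, 256, 342)  # (C, T, H, W), wider than tall
>>> crops = [uniform_crop(clip, size=256, spatial_idx=i) for i in range(3)]  # left, center, right
>>> crops[0].shape
torch.Size([3, 8, 256, 256])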

pytorchvideo.transforms.functional.uniform_crop_with_boxes(images, size, spatial_idx, boxes)[source]

Perform uniform spatial sampling on the images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.

  • size (int) – Size of the height and width to crop the images.

  • spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The cropped images with dimension of channel x num frames x height x width, and the cropped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, numpy.ndarray]

pytorchvideo.transforms.functional.horizontal_flip_with_boxes(prob, images, boxes)[source]

Perform horizontal flip on the given images and corresponding boxes.

Parameters
  • prob (float) – Probability to flip the images.

  • images (tensor) – Images to perform horizontal flip. The dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The (possibly flipped) images with dimension of channel x num frames x height x width, and the flipped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.clip_boxes_to_image(boxes, height, width)[source]

Clip an array of boxes to an image with the given height and width.

Parameters
  • boxes (tensor) – Bounding boxes to perform clipping. Dimension is num boxes x 4.

  • height (int) – Given image height.

  • width (int) – Given image width.

Returns

The clipped boxes with dimension of num boxes x 4.

Return type

torch.Tensor
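
Example (a minimal sketch; the (x1, y1, x2, y2) pixel-coordinate box and the image size are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import clip_boxes_to_image
>>> boxes = torch.tensor([[-10.0, 5.0, 400.0, 250.0]])  # (num boxes, 4)
>>> clipped = clip_boxes_to_image(boxes, height=240, width=320)  # coordinates clamped to lie within the image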

pytorchvideo.transforms.functional.crop_boxes(boxes, x_offset, y_offset)[source]

Perform crop on the bounding boxes given the offsets.

Parameters
  • boxes (torch.Tensor) – Bounding boxes to perform crop. The dimension is num boxes x 4.

  • x_offset (int) – Cropping offset in the x axis.

  • y_offset (int) – Cropping offset in the y axis.

Returns

The cropped boxes with dimension of num boxes x 4.

Return type

torch.Tensor

pytorchvideo.transforms.functional.random_resized_crop(frames, target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]

Crop the given images to a random size and aspect ratio. A crop of random size relative to the original size and a random aspect ratio is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters
  • frames (torch.Tensor) – Video tensor to be resized with shape (C, T, H, W).

  • target_height (int) – Desired height after cropping.

  • target_width (int) – Desired width after cropping.

  • scale (Tuple[float, float]) – Scale range of Inception-style area based random resizing. Should be between 0.0 and 1.0.

  • aspect_ratio (Tuple[float, float]) – Aspect ratio range of Inception-style area based random resizing. Should be between 0.0 and +infinity.

  • shift (bool) – Bool that determines whether or not to sample two different boxes (for cropping) for the first and last frame. If True, it then linearly interpolates the two boxes for other frames. If False, the same box is cropped for every frame. Default is False.

  • log_uniform_ratio (bool) – Whether to use a log-uniform distribution to sample the aspect ratio. Default is True.

  • interpolation (str) – Algorithm used for upsampling. Currently supports ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’. Default is ‘bilinear’.

  • num_tries (int) – The number of times to attempt a randomly resized crop. Falls back to a central crop after all attempts are exhausted. Default is 10.

Returns

cropped (tensor) – A cropped video tensor of shape (C, T, target_height, target_width).

Return type

torch.Tensor

pytorchvideo.transforms.functional.div_255(x)[source]

Divide the given tensor x by 255.

Parameters

x (torch.Tensor) – The input tensor.

Returns

y (torch.Tensor) – The tensor scaled by dividing by 255.

Return type

torch.Tensor
