
pytorchvideo.transforms

class pytorchvideo.transforms.AugMix(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]

Bases: object

This implements AugMix for video. AugMix generates several chains of augmentations on the original video, which are then mixed together with each other and with the original video to create an augmented video. The input video tensor should have shape (T, C, H, W).

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (https://arxiv.org/pdf/1912.02781.pdf)

__init__(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]
Parameters
  • magnitude (int) – Magnitude used for transform function. Default is 3.

  • alpha (float) – Parameter for choosing mixing weights from the beta and Dirichlet distributions. Default is 1.0.

  • width (int) – The number of transformation chains. Default is 3.

  • depth (int) – The number of transformations in each chain. If depth is -1, each chain will have a random length between 1 and 3 inclusive. Default is -1.

  • transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, the fill value is (0.5, 0.5, 0.5).

  • sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_AUGMIX_DEFAULT_HPARAS.

Return type

None

__call__(video)[source]

Perform AugMix on the input video tensor.

Parameters

video (torch.Tensor) – Input video tensor with shape (T, C, H, W).

Return type

torch.Tensor
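
Example (a minimal usage sketch; the clip size and uint8 dtype are illustrative assumptions, not requirements stated above):

>>> import torch
>>> from pytorchvideo.transforms import AugMix
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> augmix = AugMix(magnitude=3, alpha=1.0, width=3, depth=-1)
>>> augmented = augmix(video)  # same (T, C, H, W) shape as the input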

class pytorchvideo.transforms.MixVideo(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

Stochastically applies either MixUp or CutMix to the input video.

__init__(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]
Parameters
  • cutmix_prob (float) – Probability of using CutMix. MixUp will be used with probability 1 - cutmix_prob. If cutmix_prob is 0, then MixUp is always used. If cutmix_prob is 1, then CutMix is always used.

  • mixup_alpha (float) – MixUp alpha value.

  • cutmix_alpha (float) – CutMix alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

training: bool
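
Example (a hedged sketch of batch-level mixing; the batch size, clip shape, and the assumption that forward returns a (mixed video, soft label) pair mirror the CutMix and MixUp entries below):

>>> import torch
>>> from pytorchvideo.transforms import MixVideo
>>> videos = torch.randn(4, 3, 8, 224, 224)  # (B, C, T, H, W)
>>> labels = torch.randint(0, 400, (4,))     # (B,) integer class indices
>>> mix = MixVideo(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.1, num_classes=400)
>>> mixed_videos, mixed_labels = mix(videos, labels)  # mixed_labels: soft targets of shape (B, num_classes)
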
class pytorchvideo.transforms.CutMix(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (https://arxiv.org/abs/1905.04899)

__init__(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

This implements CutMix for videos.

Parameters
  • alpha (float) – CutMix alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

Return type

None

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

Return type

Tuple[torch.Tensor, torch.Tensor]

training: bool
class pytorchvideo.transforms.MixUp(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

Bases: torch.nn.modules.module.Module

Mixup: Beyond Empirical Risk Minimization (https://arxiv.org/abs/1710.09412)

__init__(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]

This implements MixUp for videos.

Parameters
  • alpha (float) – Mixup alpha value.

  • label_smoothing (float) – Label smoothing value.

  • num_classes (int) – Number of total classes.

Return type

None

forward(x, labels)[source]

The input is a batch of samples and their corresponding labels.

Parameters
  • x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).

  • labels (torch.Tensor) – Labels for input with shape (B).

Return type

Tuple[torch.Tensor, torch.Tensor]

training: bool
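
Example (a hedged sketch; the random inputs and the stand-in logits are assumptions, and the soft cross-entropy shown is one common way to consume the returned soft targets):

>>> import torch
>>> import torch.nn.functional as F
>>> from pytorchvideo.transforms import MixUp
>>> videos = torch.randn(4, 3, 8, 224, 224)  # (B, C, T, H, W)
>>> labels = torch.randint(0, 400, (4,))     # (B,)
>>> mixup = MixUp(alpha=1.0, label_smoothing=0.1, num_classes=400)
>>> mixed_videos, soft_labels = mixup(videos, labels)  # soft_labels: (B, num_classes)
>>> logits = torch.randn(4, 400)  # stand-in for model(mixed_videos)
>>> loss = -(soft_labels * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
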
class pytorchvideo.transforms.RandAugment(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]

Bases: object

This implements RandAugment for video. The input video tensor should have shape (T, C, H, W).

RandAugment: Practical automated data augmentation with a reduced search space (https://arxiv.org/abs/1909.13719)

__init__(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]

This implements RandAugment for video.

Parameters
  • magnitude (int) – Magnitude used for transform function.

  • num_layers (int) – How many transform functions to apply for each augmentation.

  • prob (float) – The probability of applying each transform function.

  • transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, it uses transform_default_hparas.

  • sampling_type (str) – Sampling method for magnitude of transform. It should be either gaussian or uniform.

  • sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_RANDAUG_DEFAULT_HPARAS.

Return type

None

__call__(video)[source]

Perform RandAugment on the input video tensor.

Parameters

video (torch.Tensor) – Input video tensor with shape (T, C, H, W).

Return type

torch.Tensor
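
Example (a minimal sketch; the uint8 clip with shape (T, C, H, W) is an illustrative assumption):

>>> import torch
>>> from pytorchvideo.transforms import RandAugment
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> randaug = RandAugment(magnitude=9, num_layers=2, prob=0.5)
>>> augmented = randaug(video)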

pytorchvideo.transforms.create_video_transform(mode, video_key=None, remove_key=None, num_samples=8, convert_to_float=True, video_mean=(0.45, 0.45, 0.45), video_std=(0.225, 0.225, 0.225), min_size=256, max_size=320, crop_size=224, horizontal_flip_prob=0.5, aug_type='default', aug_paras=None, random_resized_crop_paras=None)[source]

Function that returns a factory default callable video transform, with default parameters that can be modified. The transform that is returned depends on the mode parameter: when in “train” mode, we use randomized transformations, and when in “val” mode, we use the corresponding deterministic transformations. Depending on whether video_key is set, the input to the transform can either be a video tensor or a dict containing video_key that maps to a video tensor. The video tensor should be of shape (C, T, H, W).

The returned composition follows one of two pipelines, depending on mode:

“train” mode: (UniformTemporalSubsample) → (RandAugment/AugMix) → (ConvertUint8ToFloat) → Normalize → RandomResizedCrop or RandomShortSideScale + RandomCrop → RandomHorizontalFlip

“val” mode: (UniformTemporalSubsample) → (ConvertUint8ToFloat) → Normalize → ShortSideScale + CenterCrop

Transforms shown in parentheses may be included in or excluded from the returned composition, depending on the arguments.

Parameters
  • mode (str) – ‘train’ or ‘val’. We use randomized transformations in ‘train’ mode, and we use the corresponding deterministic transformation in ‘val’ mode.

  • video_key (str, optional) – Optional key for video value in dictionary input. When video_key is None, the input is assumed to be a torch.Tensor. Default is None.

  • remove_key (List[str], optional) – Optional key to remove from a dictionary input. Default is None.

  • num_samples (int, optional) – The number of equispaced samples to be selected in UniformTemporalSubsample. If None, then UniformTemporalSubsample will not be used. Default is 8.

  • convert_to_float (bool) – If True, converts images from uint8 to float. Otherwise, leaves the image as is. Default is True.

  • video_mean (Tuple[float, float, float]) – Sequence of means for each channel to normalize to zero mean and unit variance. Default is (0.45, 0.45, 0.45).

  • video_std (Tuple[float, float, float]) – Sequence of standard deviations for each channel to normalize to zero mean and unit variance. Default is (0.225, 0.225, 0.225).

  • min_size (int) – Minimum size that the shorter side is scaled to for RandomShortSideScale. If in “val” mode, this is the exact size the shorter side is scaled to for ShortSideScale. Default is 256.

  • max_size (int) – Maximum size that the shorter side is scaled to for RandomShortSideScale. Default is 320.

  • crop_size (int or Tuple[int, int]) – Desired output size of the crop for RandomCrop in “train” mode and CenterCrop in “val” mode. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. Default is 224.

  • horizontal_flip_prob (float) – Probability of the video being flipped in RandomHorizontalFlip. Default value is 0.5.

  • aug_type (str) – Currently supports ‘default’, ‘randaug’, or ‘augmix’. No augmentations other than RandomShortSideScale and RandomCrop are performed when aug_type is ‘default’. RandAugment is used when aug_type is ‘randaug’ and AugMix is used when aug_type is ‘augmix’. Default is ‘default’.

  • aug_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for the augmentation set in aug_type. If any parameters are missing or if None, default parameters will be used. Default is None.

  • random_resized_crop_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for Inception-style cropping. This crops the given videos to a random size and aspect ratio relative to the original, then resizes the crop to the given size; this is popularly used to train the Inception networks. If any parameters are missing, the corresponding defaults in _RANDOM_RESIZED_CROP_DEFAULT_PARAS are used. If random_resized_crop_paras is None, RandomShortSideScale and RandomCrop are used instead. Default is None.

Returns

A factory-default callable composition of transforms.

Return type

Union[Callable[[torch.Tensor], torch.Tensor], Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]
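
Example (a hedged sketch of the dictionary-input path; the clip shape, key name, and aug_type choice are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms import create_video_transform
>>> train_transform = create_video_transform(mode="train", video_key="video", num_samples=8, crop_size=224, aug_type="randaug")
>>> sample = {"video": torch.randint(0, 256, (3, 32, 256, 320), dtype=torch.uint8)}  # (C, T, H, W)
>>> out = train_transform(sample)  # out["video"]: float32 tensor of shape (3, 8, 224, 224)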

class pytorchvideo.transforms.ApplyTransformToKey(key, transform)[source]

Bases: object

Applies transform to key of dictionary input.

Parameters
  • key (str) – the dictionary key the transform is applied to

  • transform (callable) – the transform that is applied

Example

>>>   transforms.ApplyTransformToKey(
>>>       key='video',
>>>       transform=UniformTemporalSubsample(num_video_samples),
>>>   )
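
A fuller hedged sketch that composes several of the transforms documented on this page under the “video” key (the clip shape and normalization statistics are illustrative, taken from the defaults quoted above):

>>> import torch
>>> from torchvision.transforms import Compose
>>> from pytorchvideo.transforms import (ApplyTransformToKey, UniformTemporalSubsample, ConvertUint8ToFloat, Normalize, ShortSideScale)
>>> transform = ApplyTransformToKey(
>>>     key="video",
>>>     transform=Compose([
>>>         UniformTemporalSubsample(8),
>>>         ConvertUint8ToFloat(),
>>>         Normalize((0.45, 0.45, 0.45), (0.225, 0.225, 0.225)),
>>>         ShortSideScale(256),
>>>     ]),
>>> )
>>> clip = {"video": torch.randint(0, 256, (3, 32, 240, 320), dtype=torch.uint8)}  # (C, T, H, W)
>>> out = transform(clip)  # out["video"]: float32, 8 frames, shorter side 256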

class pytorchvideo.transforms.ConvertUint8ToFloat[source]

Bases: torch.nn.modules.module.Module

Converts a video from dtype uint8 to dtype float32.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training

class pytorchvideo.transforms.Div255[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.div_255.

forward(x)[source]

Scale clip frames from [0, 255] to [0, 1].

Parameters

x (torch.Tensor) – A tensor of the clip’s RGB frames with shape (C, T, H, W).

Returns

The scaled tensor, obtained by dividing by 255.

Return type

torch.Tensor

training

class pytorchvideo.transforms.Normalize(mean, std, inplace=False)[source]

Bases: torchvision.transforms.transforms.Normalize

Normalize the (C, T, H, W) video clip by mean subtraction and division by standard deviation.

Parameters
  • mean (3-tuple) – pixel RGB mean

  • std (3-tuple) – pixel RGB standard deviation

  • inplace (boolean) – whether to do in-place normalization

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
class pytorchvideo.transforms.OpSampler(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]

Bases: torch.nn.modules.module.Module

Given a list of transforms with weights, OpSampler applies weighted sampling to select n transforms, which are then applied sequentially to the input.

__init__(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]
Parameters
  • transforms_list (List[Callable]) – A list of all available transforms to sample from.

  • transforms_prob (Optional[List[float]]) – The sampling weights associated with each transform in transforms_list. If not provided, the sampler assumes a uniform distribution over all transforms. The weights do not need to sum to one, but they must be positive.

  • num_sample_op (int) – Number of transforms to sample and apply to input.

  • randomly_sample_depth (bool) – If randomly_sample_depth is True, then uniformly sample the number of transforms to apply, between 1 and num_sample_op.

  • replacement (bool) – If replacement is True, transforms are drawn with replacement.

forward(x)[source]
Parameters

x (torch.Tensor) – Input tensor.

Return type

torch.Tensor

training
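
Example (a hedged sketch that samples one of two augmentation policies per clip; the 3:1 weighting and the uint8 clip are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms import OpSampler, RandAugment, AugMix
>>> sampler = OpSampler(transforms_list=[RandAugment(), AugMix()], transforms_prob=[0.75, 0.25], num_sample_op=1)
>>> video = torch.randint(0, 256, (8, 3, 224, 224), dtype=torch.uint8)  # (T, C, H, W)
>>> augmented = sampler(video)
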
class pytorchvideo.transforms.Permute(dims)[source]

Bases: torch.nn.modules.module.Module

Permutes the dimensions of a video.

__init__(dims)[source]
Parameters

dims (Tuple[int]) – The desired ordering of dimensions.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor whose dimensions are to be permuted.

Return type

torch.Tensor

training
class pytorchvideo.transforms.RandomResizedCrop(target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.random_resized_crop.

__call__(x)[source]
Parameters

x (torch.Tensor) – Input video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
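
Example (a hedged sketch; the Inception-style scale and aspect-ratio ranges shown are common choices, not defaults of this class):

>>> import torch
>>> from pytorchvideo.transforms import RandomResizedCrop
>>> crop = RandomResizedCrop(target_height=224, target_width=224, scale=(0.08, 1.0), aspect_ratio=(3.0 / 4.0, 4.0 / 3.0))
>>> clip = torch.randn(3, 8, 256, 320)  # (C, T, H, W)
>>> out = crop(clip)  # (3, 8, 224, 224)
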
class pytorchvideo.transforms.RandomShortSideScale(min_size, max_size)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale. The size parameter is chosen randomly in [min_size, max_size].

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
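
Example (a hedged sketch of the train-time pairing used by create_video_transform above; torchvision’s RandomCrop operates on the trailing (H, W) dims, so it composes with the (C, T, H, W) layout):

>>> import torch
>>> from torchvision.transforms import Compose, RandomCrop
>>> from pytorchvideo.transforms import RandomShortSideScale
>>> train_spatial = Compose([RandomShortSideScale(min_size=256, max_size=320), RandomCrop(224)])
>>> clip = torch.randn(3, 8, 240, 426)  # (C, T, H, W)
>>> out = train_spatial(clip)  # (3, 8, 224, 224)
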
class pytorchvideo.transforms.RemoveKey(key)[source]

Bases: torch.nn.modules.module.Module

Removes the given key from the input dict. Useful for removing modalities from a video clip that aren’t needed.

__call__(x)[source]
Parameters

x (Dict[str, torch.Tensor]) – video clip dict.

Return type

Dict[str, torch.Tensor]

training
class pytorchvideo.transforms.ShortSideScale(size)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training

class pytorchvideo.transforms.UniformCropVideo(size, video_key='video', aug_index_key='aug_index')[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_crop.

__call__(x)[source]
Parameters

x (Dict[str, torch.Tensor]) – video clip dict.

Return type

Dict[str, torch.Tensor]

training
class pytorchvideo.transforms.UniformTemporalSubsample(num_samples)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

Return type

torch.Tensor

training
class pytorchvideo.transforms.UniformTemporalSubsampleRepeated(frame_ratios)[source]

Bases: torch.nn.modules.module.Module

nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample_repeated.

forward(x)[source]
Parameters

x (torch.Tensor) – video tensor with shape (C, T, H, W).

training

pytorchvideo.transforms.functional


pytorchvideo.transforms.functional.uniform_temporal_subsample(x, num_samples, temporal_dim=-3)[source]

Uniformly subsamples num_samples indices from the temporal dimension of the video. When num_samples is larger than the size of the temporal dimension, frames are repeated via nearest-neighbor interpolation.

Parameters
  • x (torch.Tensor) – A video tensor with more than one dimension; any torch dtype (int, long, float, complex, etc.) is supported.

  • num_samples (int) – The number of equispaced samples to be selected.

  • temporal_dim (int) – The temporal dimension along which to subsample.

Returns

An x-like Tensor with subsampled temporal dimension.

Return type

torch.Tensor
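
Example (a minimal sketch; the 32-frame clip is an illustrative assumption, and the default temporal_dim=-3 matches the (C, T, H, W) layout):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_temporal_subsample
>>> clip = torch.randn(3, 32, 224, 224)  # (C, T, H, W)
>>> uniform_temporal_subsample(clip, num_samples=8).shape
torch.Size([3, 8, 224, 224])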

pytorchvideo.transforms.functional.short_side_scale(x, size, interpolation='bilinear', backend='pytorch')[source]

Determines the shorter spatial dim of the video (i.e. width or height) and scales it to the given size. To maintain aspect ratio, the longer side is then scaled accordingly.

Parameters
  • x (torch.Tensor) – A video tensor of shape (C, T, H, W) and type torch.float32.

  • size (int) – The size the shorter side is scaled to.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch (default) and opencv.

Returns

An x-like Tensor with scaled spatial dims.

Return type

torch.Tensor

pytorchvideo.transforms.functional.uniform_temporal_subsample_repeated(frames, frame_ratios, temporal_dim=-3)[source]

Prepares the output as a list of tensors subsampled from the input frames. Each tensor maintains a unique copy of the subsampled frames, corresponding to a unique pathway.

Parameters
  • frames (tensor) – Frames sampled from the video. Expected to be a torch tensor (of any dtype, including int, long, float, complex, etc.) with more than one dimension.

  • frame_ratios (tuple) – Temporal down-sampling ratio for each pathway.

  • temporal_dim (int) – The temporal dimension along which to subsample.

Returns

frame_list (tuple) – A tuple of subsampled tensors, one per pathway.

Return type

Tuple[torch.Tensor]
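
Example (a hedged SlowFast-style sketch; the frame count and the (4, 1) pathway ratios are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_temporal_subsample_repeated
>>> frames = torch.randn(3, 32, 224, 224)  # (C, T, H, W)
>>> slow, fast = uniform_temporal_subsample_repeated(frames, frame_ratios=(4, 1))
>>> slow.shape[-3], fast.shape[-3]  # 8 frames for the slow pathway, 32 for the fast pathway
(8, 32)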

pytorchvideo.transforms.functional.convert_to_one_hot(targets, num_class, label_smooth=0.0)[source]

This function converts target class indices to one-hot vectors, given the number of classes.

Parameters
  • targets (torch.Tensor) – Index labels to be converted.

  • num_class (int) – Total number of classes.

  • label_smooth (float) – Label smoothing value for non-target classes. Label smoothing is disabled by default (0).

Return type

torch.Tensor
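
Example (a minimal sketch; the class count and smoothing value are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import convert_to_one_hot
>>> targets = torch.tensor([0, 2])  # integer class indices
>>> soft = convert_to_one_hot(targets, num_class=4, label_smooth=0.1)
>>> soft.shape  # each row sums to 1, with most mass on the target class
torch.Size([2, 4])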

pytorchvideo.transforms.functional.short_side_scale_with_boxes(images, boxes, size, interpolation='bilinear', backend='pytorch')[source]

Perform a spatial short scale jittering on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

  • size (int) – The size the shorter side is scaled to.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch as default, and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181

Returns

The scaled images with dimension of channel x num frames x height x width, and the scaled boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, numpy.ndarray]

pytorchvideo.transforms.functional.random_short_side_scale_with_boxes(images, boxes, min_size, max_size, interpolation='bilinear', backend='pytorch')[source]

Perform a spatial short scale jittering on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

  • min_size (int) – The minimal size to scale the frames.

  • max_size (int) – The maximal size to scale the frames.

  • interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.

  • backend (str) – Backend used to perform interpolation. Options include pytorch as default, and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181

Returns

The scaled images with dimension of channel x num frames x height x width, and the scaled boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.random_crop_with_boxes(images, size, boxes)[source]

Perform random spatial crop on the given images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform random crop. The dimension is channel x num frames x height x width.

  • size (int) – The size of height and width to crop on the image.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The cropped images with dimension of channel x num frames x height x width, and the cropped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.uniform_crop(images, size, spatial_idx)[source]

Perform uniform spatial sampling on the images.

Parameters
  • images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.

  • size (int) – Size of the height and width to crop the images.

  • spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.

Returns

The cropped images with dimension of channel x num frames x height x width.

Return type

torch.Tensor
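
Example (a minimal sketch; the landscape clip shape is an illustrative assumption):

>>> import torch
>>> from pytorchvideo.transforms.functional import uniform_crop
>>> clip = torch.randn(3, 8, 256, 342)  # (C, T, H, W), wider than tall
>>> crops = [uniform_crop(clip, size=256, spatial_idx=i) for i in range(3)]  # left, center, right
>>> crops[0].shape
torch.Size([3, 8, 256, 256])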

pytorchvideo.transforms.functional.uniform_crop_with_boxes(images, size, spatial_idx, boxes)[source]

Perform uniform spatial sampling on the images and corresponding boxes.

Parameters
  • images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.

  • size (int) – Size of the height and width to crop the images.

  • spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The cropped images with dimension of channel x num frames x height x width, and the cropped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, numpy.ndarray]

pytorchvideo.transforms.functional.horizontal_flip_with_boxes(prob, images, boxes)[source]

Perform horizontal flip on the given images and corresponding boxes.

Parameters
  • prob (float) – Probability to flip the images.

  • images (tensor) – Images to perform horizontal flip. The dimension is channel x num frames x height x width.

  • boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.

Returns

The (possibly flipped) images with dimension of channel x num frames x height x width, and the flipped boxes with dimension of num boxes x 4.

Return type

Tuple[torch.Tensor, torch.Tensor]

pytorchvideo.transforms.functional.clip_boxes_to_image(boxes, height, width)[source]

Clip an array of boxes to an image with the given height and width.

Parameters
  • boxes (tensor) – Bounding boxes to perform clipping. Dimension is num boxes x 4.

  • height (int) – Given image height.

  • width (int) – Given image width.

Returns

The clipped boxes with dimension of num boxes x 4.

Return type

torch.Tensor
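
Example (a minimal sketch; the (x1, y1, x2, y2) pixel-coordinate box and the image size are illustrative assumptions):

>>> import torch
>>> from pytorchvideo.transforms.functional import clip_boxes_to_image
>>> boxes = torch.tensor([[-10.0, 5.0, 400.0, 250.0]])  # (num boxes, 4)
>>> clipped = clip_boxes_to_image(boxes, height=240, width=320)  # coordinates clamped to lie within the image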

pytorchvideo.transforms.functional.crop_boxes(boxes, x_offset, y_offset)[source]

Perform crop on the bounding boxes given the offsets.

Parameters
  • boxes (torch.Tensor) – Bounding boxes to perform crop. The dimension is num boxes x 4.

  • x_offset (int) – Cropping offset in the x axis.

  • y_offset (int) – Cropping offset in the y axis.

Returns

The cropped boxes with dimension of num boxes x 4.

Return type

torch.Tensor

pytorchvideo.transforms.functional.random_resized_crop(frames, target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]

Crop the given images to a random size and aspect ratio. A crop of random size relative to the original size and a random aspect ratio is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters
  • frames (torch.Tensor) – Video tensor to be resized with shape (C, T, H, W).

  • target_height (int) – Desired height after cropping.

  • target_width (int) – Desired width after cropping.

  • scale (Tuple[float, float]) – Scale range of Inception-style area based random resizing. Should be between 0.0 and 1.0.

  • aspect_ratio (Tuple[float, float]) – Aspect ratio range of Inception-style area based random resizing. Should be between 0.0 and +infinity.

  • shift (bool) – Bool that determines whether or not to sample two different boxes (for cropping) for the first and last frame. If True, it then linearly interpolates the two boxes for other frames. If False, the same box is cropped for every frame. Default is False.

  • log_uniform_ratio (bool) – Whether to use a log-uniform distribution to sample the aspect ratio. Default is True.

  • interpolation (str) – Algorithm used for upsampling. Currently supports ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’. Default is ‘bilinear’.

  • num_tries (int) – The number of times to attempt a randomly resized crop. Falls back to a central crop after all attempts are exhausted. Default is 10.

Returns

cropped (tensor) – A cropped video tensor of shape (C, T, target_height, target_width).

Return type

torch.Tensor

pytorchvideo.transforms.functional.div_255(x)[source]

Divide the given tensor x by 255.

Parameters

x (torch.Tensor) – The input tensor.

Returns

y (torch.Tensor) – The tensor scaled by dividing by 255.

Return type

torch.Tensor
