pytorchvideo.transforms¶
-
class
pytorchvideo.transforms.
AugMix
(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]¶ Bases:
object
This implements AugMix for video. AugMix generates several chains of augmentations on the original video, which are then mixed together with each other and with the original video to create an augmented video. The input video tensor should have shape (T, C, H, W).
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty (https://arxiv.org/pdf/1912.02781.pdf)
-
__init__
(magnitude=3, alpha=1.0, width=3, depth=-1, transform_hparas=None, sampling_hparas=None)[source]¶ - Parameters
magnitude (int) – Magnitude used for transform function. Default is 3.
alpha (float) – Parameter for choosing mixing weights from the beta and Dirichlet distributions. Default is 1.0.
width (int) – The number of transformation chains. Default is 3.
depth (int) – The number of transformations in each chain. If depth is -1, each chain will have a random length between 1 and 3 inclusive. Default is -1.
transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, the fill value is (0.5, 0.5, 0.5).
sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_AUGMIX_DEFAULT_HPARAS.
- Return type
-
__call__
(video)[source]¶ Apply AugMix to the input video tensor.
- Parameters
video (torch.Tensor) – Input video tensor with shape (T, C, H, W).
- Return type
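The mixing step described for AugMix can be illustrated with a small, library-free sketch. It follows the formulation in the AugMix paper (chain weights drawn from a Dirichlet distribution, and a Beta-distributed weight for blending with the original). The function names and the use of scalars in place of video tensors are assumptions made for illustration; this is not pytorchvideo's implementation.

```python
import random


def sample_dirichlet(alpha, width, rng):
    """Draw `width` non-negative weights that sum to 1 (symmetric Dirichlet)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(width)]
    total = sum(draws)
    return [d / total for d in draws]


def augmix_combine(original, chain_outputs, chain_weights, mix_weight):
    """Blend the augmented chains together, then blend the result with the original.

    Scalars stand in for video tensors; the same arithmetic applies elementwise.
    """
    mixed_chains = sum(w * out for w, out in zip(chain_weights, chain_outputs))
    return mix_weight * original + (1.0 - mix_weight) * mixed_chains


rng = random.Random(0)
weights = sample_dirichlet(alpha=1.0, width=3, rng=rng)  # one weight per chain
mix_weight = rng.betavariate(1.0, 1.0)                   # weight of the original
augmented = augmix_combine(1.0, [2.0, 4.0, 6.0], weights, mix_weight)
```

Because both sets of weights are convex, the result always lies between the smallest and largest of the original and chain outputs.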
-
-
class
pytorchvideo.transforms.
MixVideo
(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ Bases:
torch.nn.modules.module.Module
Stochastically applies either MixUp or CutMix to the input video.
-
__init__
(cutmix_prob=0.5, mixup_alpha=1.0, cutmix_alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ - Parameters
cutmix_prob (float) – Probability of using CutMix. MixUp will be used with probability 1 - cutmix_prob. If cutmix_prob is 0, then MixUp is always used. If cutmix_prob is 1, then CutMix is always used.
mixup_alpha (float) – MixUp alpha value.
cutmix_alpha (float) – CutMix alpha value.
label_smoothing (float) – Label smoothing value.
num_classes (int) – Number of total classes.
-
forward
(x, labels)[source]¶ The input is a batch of samples and their corresponding labels.
- Parameters
x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).
labels (torch.Tensor) – Labels for input with shape (B).
-
-
class
pytorchvideo.transforms.
CutMix
(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ Bases:
torch.nn.modules.module.Module
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (https://arxiv.org/abs/1905.04899)
-
__init__
(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ This implements CutMix for videos.
-
forward
(x, labels)[source]¶ The input is a batch of samples and their corresponding labels.
- Parameters
x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).
labels (torch.Tensor) – Labels for input with shape (B).
- Return type
Tuple[torch.Tensor, torch.Tensor]
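As a rough illustration of the CutMix idea from the cited paper, the sketch below computes a patch whose area is proportional to 1 - lam and then recomputes the label weight from the patch that actually fits inside the frame. The helper name, the argument layout, and the clipping behaviour are assumptions for illustration, not the library's API.

```python
def cutmix_box(height, width, lam, center_y, center_x):
    """Return ((y0, y1, x0, x1), adjusted_lam) for a patch centred at the given point."""
    # Side lengths scale with sqrt(1 - lam) so the patch area scales with (1 - lam).
    ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(height * ratio), int(width * ratio)

    # Clip the patch to the frame boundaries.
    y0 = max(0, min(height, center_y - cut_h // 2))
    y1 = max(0, min(height, center_y + cut_h // 2))
    x0 = max(0, min(width, center_x - cut_w // 2))
    x1 = max(0, min(width, center_x + cut_w // 2))

    # The label weight reflects the area that was really replaced.
    box_area = (y1 - y0) * (x1 - x0)
    adjusted_lam = 1.0 - box_area / (height * width)
    return (y0, y1, x0, x1), adjusted_lam


centered = cutmix_box(100, 100, lam=0.75, center_y=50, center_x=50)
corner = cutmix_box(100, 100, lam=0.75, center_y=0, center_x=0)
```

A patch centred on a corner is partly clipped, so its adjusted weight is closer to 1 than the sampled value.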
-
-
class
pytorchvideo.transforms.
MixUp
(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ Bases:
torch.nn.modules.module.Module
Mixup: Beyond Empirical Risk Minimization (https://arxiv.org/abs/1710.09412)
-
__init__
(alpha=1.0, label_smoothing=0.0, num_classes=400)[source]¶ This implements MixUp for videos.
-
forward
(x, labels)[source]¶ The input is a batch of samples and their corresponding labels.
- Parameters
x (torch.Tensor) – Input tensor. The input should be a batch of videos with shape (B, C, T, H, W).
labels (torch.Tensor) – Labels for input with shape (B).
- Return type
Tuple[torch.Tensor, torch.Tensor]
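The MixUp operation from the cited paper can be sketched without any tensor library. In the sketch below each sample is paired with the one at the mirrored batch position; that pairing rule, the function name, and the use of scalars for samples are assumptions for illustration only.

```python
import random


def mixup_pairs(samples, one_hot_labels, lam):
    """Mix each sample (and its label) with the one at the mirrored batch position."""
    flipped_samples = samples[::-1]
    flipped_labels = one_hot_labels[::-1]
    mixed_samples = [
        lam * a + (1.0 - lam) * b for a, b in zip(samples, flipped_samples)
    ]
    mixed_labels = [
        [lam * p + (1.0 - lam) * q for p, q in zip(row, flipped_row)]
        for row, flipped_row in zip(one_hot_labels, flipped_labels)
    ]
    return mixed_samples, mixed_labels


# In training, lam would be drawn from Beta(alpha, alpha); it is fixed here for clarity.
sampled_lam = random.Random(0).betavariate(1.0, 1.0)
mixed_x, mixed_y = mixup_pairs([0.0, 10.0], [[1.0, 0.0], [0.0, 1.0]], lam=0.25)
```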
-
-
class
pytorchvideo.transforms.
RandAugment
(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]¶ Bases:
object
This implements RandAugment for video. The input video tensor is assumed to have shape (T, C, H, W).
RandAugment: Practical automated data augmentation with a reduced search space (https://arxiv.org/abs/1909.13719)
-
__init__
(magnitude=9, num_layers=2, prob=0.5, transform_hparas=None, sampling_type='gaussian', sampling_hparas=None)[source]¶ This implements RandAugment for video.
- Parameters
magnitude (int) – Magnitude used for transform function.
num_layers (int) – How many transform functions to apply for each augmentation.
prob (float) – The probability of applying each transform function.
transform_hparas (Optional[Dict[Any]]) – Transform hyper parameters. Needs to have key fill. By default, it uses transform_default_hparas.
sampling_type (str) – Sampling method for magnitude of transform. It should be either gaussian or uniform.
sampling_hparas (Optional[Dict[Any]]) – Hyper parameters for sampling. If gaussian sampling is used, it needs to have key sampling_std. By default, it uses SAMPLING_RANDAUG_DEFAULT_HPARAS.
- Return type
-
__call__
(video)[source]¶ Apply RandAugment to the input video tensor.
- Parameters
video (torch.Tensor) – Input video tensor with shape (T, C, H, W).
- Return type
-
-
pytorchvideo.transforms.
create_video_transform
(mode, video_key=None, remove_key=None, num_samples=8, convert_to_float=True, video_mean=(0.45, 0.45, 0.45), video_std=(0.225, 0.225, 0.225), min_size=256, max_size=320, crop_size=224, horizontal_flip_prob=0.5, aug_type='default', aug_paras=None, random_resized_crop_paras=None)[source]¶ Function that returns a factory default callable video transform, with default parameters that can be modified. The transform that is returned depends on the mode parameter: when in “train” mode, we use randomized transformations, and when in “val” mode, we use the corresponding deterministic transformations. Depending on whether video_key is set, the input to the transform can either be a video tensor or a dict containing video_key that maps to a video tensor. The video tensor should be of shape (C, T, H, W).
“train” mode                                          “val” mode
(UniformTemporalSubsample)                            (UniformTemporalSubsample)
↓
(RandAugment/AugMix)                                  ↓
↓
(ConvertUint8ToFloat)                                 (ConvertUint8ToFloat)
↓                                                     ↓
Normalize                                             Normalize
↓                                                     ↓
RandomResizedCrop/RandomShortSideScale+RandomCrop     ShortSideScale+CenterCrop
↓
RandomHorizontalFlip
(transform) = transform can be included or excluded in the returned composition of transformations
- Parameters
mode (str) – ‘train’ or ‘val’. We use randomized transformations in ‘train’ mode, and we use the corresponding deterministic transformation in ‘val’ mode.
video_key (str, optional) – Optional key for video value in dictionary input. When video_key is None, the input is assumed to be a torch.Tensor. Default is None.
remove_key (List[str], optional) – Optional key to remove from a dictionary input. Default is None.
num_samples (int, optional) – The number of equispaced samples to be selected in UniformTemporalSubsample. If None, then UniformTemporalSubsample will not be used. Default is 8.
convert_to_float (bool) – If True, converts images from uint8 to float. Otherwise, leaves the image as is. Default is True.
video_mean (Tuple[float, float, float]) – Sequence of means for each channel to normalize to zero mean and unit variance. Default is (0.45, 0.45, 0.45).
video_std (Tuple[float, float, float]) – Sequence of standard deviations for each channel to normalize to zero mean and unit variance. Default is (0.225, 0.225, 0.225).
min_size (int) – Minimum size that the shorter side is scaled to for RandomShortSideScale. If in “val” mode, this is the exact size the shorter side is scaled to for ShortSideScale. Default is 256.
max_size (int) – Maximum size that the shorter side is scaled to for RandomShortSideScale. Default is 320.
crop_size (int or Tuple[int, int]) – Desired output size of the crop for RandomCrop in “train” mode and CenterCrop in “val” mode. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. Default is 224.
horizontal_flip_prob (float) – Probability of the video being flipped in RandomHorizontalFlip. Default value is 0.5.
aug_type (str) – Currently supports ‘default’, ‘randaug’, or ‘augmix’. No augmentations other than RandomShortSideScale and RandomCrop are performed when aug_type is ‘default’. RandAugment is used when aug_type is ‘randaug’ and AugMix is used when aug_type is ‘augmix’. Default is ‘default’.
aug_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for the augmentation set in aug_type. If any parameters are missing or if None, default parameters will be used. Default is None.
random_resized_crop_paras (Dict[str, Any], optional) – A dictionary that contains the necessary parameters for Inception-style cropping, which crops the given videos to a random size and aspect ratio relative to the original and then resizes the crop to the given size. This is popularly used to train the Inception networks. If the dictionary is provided but some parameters are missing, the defaults in _RANDOM_RESIZED_CROP_DEFAULT_PARAS are used for them. If None, RandomShortSideScale and RandomCrop are used as a fallback. Default is None.
- Returns
A factory-default callable composition of transforms.
- Return type
Union[Callable[[torch.Tensor], torch.Tensor], Callable[[Dict[str, torch.Tensor]], Dict[str, torch.Tensor]]]
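The stage ordering described for the “train” and “val” modes can be summarised with a small sketch that only lists stage names. The function name and the use_random_resized_crop flag are hypothetical; the sketch mirrors the flow described above and does not call or reproduce create_video_transform itself.

```python
def describe_video_transform(mode, num_samples=8, convert_to_float=True,
                             aug_type="default", use_random_resized_crop=False):
    """Return the ordered stage names for the given mode and options."""
    if mode not in ("train", "val"):
        raise ValueError("mode must be 'train' or 'val'")

    stages = []
    if num_samples is not None:          # optional stage
        stages.append("UniformTemporalSubsample")
    if mode == "train" and aug_type == "randaug":
        stages.append("RandAugment")
    elif mode == "train" and aug_type == "augmix":
        stages.append("AugMix")
    if convert_to_float:                 # optional stage
        stages.append("ConvertUint8ToFloat")
    stages.append("Normalize")

    if mode == "train":
        if use_random_resized_crop:
            stages.append("RandomResizedCrop")
        else:
            stages += ["RandomShortSideScale", "RandomCrop"]
        stages.append("RandomHorizontalFlip")
    else:
        stages += ["ShortSideScale", "CenterCrop"]
    return stages


val_stages = describe_video_transform("val")
train_stages = describe_video_transform("train", num_samples=None, aug_type="randaug")
```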
-
class
pytorchvideo.transforms.
ApplyTransformToKey
(key, transform)[source]¶ Bases:
object
Applies transform to key of dictionary input.
- Parameters
key (str) – the dictionary key the transform is applied to
transform (callable) – the transform that is applied
Example
>>> transforms.ApplyTransformToKey(
>>>     key='video',
>>>     transform=UniformTemporalSubsample(num_video_samples),
>>> )
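To make the behaviour concrete, here is a minimal stand-in written in plain Python. The class ApplyToKey and the helper keep_every_other_frame are hypothetical names used only for this sketch, and a list of frame indices stands in for a video tensor.

```python
class ApplyToKey:
    """Hypothetical stand-in: apply `transform` to one entry of a dict sample."""

    def __init__(self, key, transform):
        self._key = key
        self._transform = transform

    def __call__(self, sample):
        # Only the selected entry changes; every other key is passed through.
        sample[self._key] = self._transform(sample[self._key])
        return sample


def keep_every_other_frame(frames):
    return frames[::2]


clip = {"video": [0, 1, 2, 3, 4, 5], "label": 7}
result = ApplyToKey("video", keep_every_other_frame)(clip)
```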
-
class
pytorchvideo.transforms.
ConvertUint8ToFloat
[source]¶ Bases:
torch.nn.modules.module.Module
Converts a video from dtype uint8 to dtype float32.
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
Div255
[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.div_255.
-
forward
(x)[source]¶ Scale clip frames from [0, 255] to [0, 1].
- Parameters
x (Tensor) – A tensor of the clip’s RGB frames with shape (C, T, H, W).
- Returns
x (Tensor) – The input tensor divided by 255.
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
Normalize
(mean, std, inplace=False)[source]¶ Bases:
torchvision.transforms.transforms.Normalize
Normalize the (CTHW) video clip by mean subtraction and division by standard deviation
- Parameters
mean (3-tuple) – pixel RGB mean
std (3-tuple) – pixel RGB standard deviation
inplace (boolean) – whether to perform the normalization in place
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
class
pytorchvideo.transforms.
OpSampler
(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]¶ Bases:
torch.nn.modules.module.Module
Given a list of transforms with weights, OpSampler applies weighted sampling to select n transforms, which are then applied sequentially to the input.
-
__init__
(transforms_list, transforms_prob=None, num_sample_op=1, randomly_sample_depth=False, replacement=False)[source]¶ - Parameters
transforms_list (List[Callable]) – A list of tuples of all available transforms to sample from.
transforms_prob (Optional[List[float]]) – The probabilities associated with each transform in transforms_list. If not provided, the sampler assumes a uniform distribution over all transforms. The weights do not need to sum to one, but they must be positive.
num_sample_op (int) – Number of transforms to sample and apply to input.
randomly_sample_depth (bool) – If randomly_sample_depth is True, then uniformly sample the number of transforms to apply, between 1 and num_sample_op.
replacement (bool) – If replacement is True, transforms are drawn with replacement.
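The sampling behaviour these parameters describe can be sketched in plain Python. The function below is a hypothetical illustration built on the standard random module; it is not the OpSampler implementation, and the way sampling without replacement is realised here is an assumption.

```python
import random


def sample_and_apply(x, transforms, weights=None, num_sample_op=1,
                     randomly_sample_depth=False, replacement=False, rng=None):
    """Pick transforms by weight and apply them to x one after another."""
    rng = rng or random.Random()
    if weights is None:
        weights = [1.0] * len(transforms)  # uniform by default

    # Optionally draw how many transforms to apply, between 1 and num_sample_op.
    depth = rng.randint(1, num_sample_op) if randomly_sample_depth else num_sample_op
    if not replacement and depth > len(transforms):
        raise ValueError("cannot sample more transforms than are available")

    if replacement:
        chosen = rng.choices(range(len(transforms)), weights=weights, k=depth)
    else:
        remaining = list(range(len(transforms)))
        chosen = []
        for _ in range(depth):
            pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
            remaining.remove(pick)
            chosen.append(pick)

    for index in chosen:
        x = transforms[index](x)
    return x


ops = [lambda v: v + 1, lambda v: v + 10, lambda v: v + 100]
all_three = sample_and_apply(0, ops, num_sample_op=3, rng=random.Random(0))
repeated = sample_and_apply(0, [lambda v: v + 1], num_sample_op=4, replacement=True)
```

With three additive transforms sampled without replacement, every transform is applied exactly once, so the result does not depend on the sampled order.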
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – Input tensor.
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
Permute
(dims)[source]¶ Bases:
torch.nn.modules.module.Module
Permutes the dimensions of a video.
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor whose dimensions are to be permuted.
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
RandomResizedCrop
(target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.random_resized_crop.
-
__call__
(x)[source]¶ - Parameters
x (torch.Tensor) – Input video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
RandomShortSideScale
(min_size, max_size)[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale. The size parameter is chosen randomly in [min_size, max_size].
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
RemoveKey
(key)[source]¶ Bases:
torch.nn.modules.module.Module
Removes the given key from the input dict. Useful for removing modalities from a video clip that aren’t needed.
-
__call__
(x)[source]¶ - Parameters
x (Dict[str, torch.Tensor]) – video clip dict.
- Return type
Dict[str, torch.Tensor]
-
training
¶
-
-
class
pytorchvideo.transforms.
ShortSideScale
(size)[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.short_side_scale.
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
UniformCropVideo
(size, video_key='video', aug_index_key='aug_index')[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.uniform_crop.
-
__call__
(x)[source]¶ - Parameters
x (Dict[str, torch.Tensor]) – video clip dict.
- Return type
Dict[str, torch.Tensor]
-
training
¶
-
-
class
pytorchvideo.transforms.
UniformTemporalSubsample
(num_samples)[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample.
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
- Return type
-
training
¶
-
-
class
pytorchvideo.transforms.
UniformTemporalSubsampleRepeated
(frame_ratios)[source]¶ Bases:
torch.nn.modules.module.Module
nn.Module wrapper for pytorchvideo.transforms.functional.uniform_temporal_subsample_repeated.
-
forward
(x)[source]¶ - Parameters
x (torch.Tensor) – video tensor with shape (C, T, H, W).
-
training
¶
-
pytorchvideo.transforms.functional¶
-
pytorchvideo.transforms.functional.
uniform_temporal_subsample
(x, num_samples, temporal_dim=-3)[source]¶ Uniformly subsamples num_samples indices from the temporal dimension of the video. When num_samples is larger than the size of the temporal dimension of the video, it will sample frames based on nearest neighbor interpolation.
- Parameters
x (torch.Tensor) – A video tensor with more than one dimension. Any torch tensor dtype is accepted, including int, long, float, and complex.
num_samples (int) – The number of equispaced samples to be selected.
temporal_dim (int) – The dimension along which to perform temporal subsampling.
- Returns
An x-like Tensor with subsampled temporal dimension.
- Return type
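How equispaced frame indices might be chosen can be sketched without tensors. The helper below is hypothetical: it spreads num_samples positions evenly between the first and last frame and truncates them to integer indices, which is an assumption about the rounding rather than a statement of the library's exact behaviour.

```python
def uniform_temporal_indices(num_frames, num_samples):
    """Return num_samples frame indices spread evenly over [0, num_frames - 1]."""
    if num_frames < 1 or num_samples < 1:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Integer arithmetic keeps the first and last indices exact.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


downsampled = uniform_temporal_indices(9, 3)
oversampled = uniform_temporal_indices(3, 5)
```

When more samples are requested than there are frames, indices repeat, so some frames are duplicated in the output.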
-
pytorchvideo.transforms.functional.
short_side_scale
(x, size, interpolation='bilinear', backend='pytorch')[source]¶ Determines the shorter spatial dim of the video (i.e. width or height) and scales it to the given size. To maintain aspect ratio, the longer side is then scaled accordingly.
- Parameters
x (torch.Tensor) – A video tensor of shape (C, T, H, W) and type torch.float32.
size (int) – The size the shorter side is scaled to.
interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.
backend (str) – Backend used to perform interpolation. Options include pytorch (the default) and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181
- Returns
An x-like Tensor with scaled spatial dims.
- Return type
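The target size implied by this description can be computed as follows. This is a sketch with a hypothetical helper name; rounding the longer side down with floor is an assumption.

```python
import math


def short_side_scaled_size(height, width, size):
    """Return (new_height, new_width) with the shorter side equal to `size`."""
    if width < height:
        # Width is the shorter side; scale height by the same factor.
        return int(math.floor(height / width * size)), size
    # Height is the shorter side (or the frame is square).
    return size, int(math.floor(width / height * size))


landscape = short_side_scaled_size(240, 320, 256)
portrait = short_side_scaled_size(320, 240, 256)
```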
-
pytorchvideo.transforms.functional.
uniform_temporal_subsample_repeated
(frames, frame_ratios, temporal_dim=-3)[source]¶ Prepare output as a list of tensors subsampled from the input frames. Each tensor maintains a unique copy of subsampled frames, which corresponds to a unique pathway.
- Parameters
- Returns
frame_list (tuple) – list of tensors as output.
- Return type
Tuple[torch.Tensor]
-
pytorchvideo.transforms.functional.
convert_to_one_hot
(targets, num_class, label_smooth=0.0)[source]¶ This function converts target class indices to one-hot vectors, given the number of classes.
- Parameters
targets (torch.Tensor) – Index labels to be converted.
num_class (int) – Total number of classes.
label_smooth (float) – Label smooth value for non-target classes. Label smooth is disabled by default (0).
- Return type
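A plain-Python sketch of one-hot conversion with label smoothing is shown below. The smoothing convention used here (each class receives label_smooth / num_class and the target class receives the remainder) is an assumption, and the function name is hypothetical.

```python
def one_hot_with_smoothing(targets, num_class, label_smooth=0.0):
    """Convert class indices to (optionally smoothed) one-hot rows."""
    non_target_value = label_smooth / num_class
    target_value = 1.0 - label_smooth + non_target_value
    rows = []
    for index in targets:
        row = [non_target_value] * num_class
        row[index] = target_value
        rows.append(row)
    return rows


hard = one_hot_with_smoothing([2, 0], num_class=4)
soft = one_hot_with_smoothing([1], num_class=4, label_smooth=0.1)
```

Under this convention every row still sums to one, so the smoothed labels remain a valid distribution.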
-
pytorchvideo.transforms.functional.
short_side_scale_with_boxes
(images, boxes, size, interpolation='bilinear', backend='pytorch')[source]¶ Perform a spatial short scale jittering on the given images and corresponding boxes.
- Parameters
images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.
boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.
size (int) – The size the shorter side is scaled to.
interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.
backend (str) – Backend used to perform interpolation. Options include pytorch (the default) and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181
- Returns
(tensor) – The scaled images with dimension channel x num frames x height x width.
(tensor) – The scaled boxes with dimension num boxes x 4.
- Return type
Tuple[torch.Tensor, numpy.ndarray]
-
pytorchvideo.transforms.functional.
random_short_side_scale_with_boxes
(images, boxes, min_size, max_size, interpolation='bilinear', backend='pytorch')[source]¶ Perform a spatial short scale jittering on the given images and corresponding boxes.
- Parameters
images (tensor) – Images to perform scale jitter. Dimension is channel x num frames x height x width.
boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.
min_size (int) – The minimal size to scale the frames.
max_size (int) – The maximal size to scale the frames.
interpolation (str) – Algorithm used for upsampling, options: ‘nearest’ | ‘linear’ | ‘bilinear’ | ‘bicubic’ | ‘trilinear’ | ‘area’.
backend (str) – Backend used to perform interpolation. Options include pytorch (the default) and opencv. Note that opencv and pytorch behave differently on linear interpolation on some versions. https://discuss.pytorch.org/t/pytorch-linear-interpolation-is-different-from-pil-opencv/71181
- Returns
(tensor) – The scaled images with dimension channel x num frames x height x width.
(tensor) – The scaled boxes with dimension num boxes x 4.
- Return type
Tuple[torch.Tensor, torch.Tensor]
-
pytorchvideo.transforms.functional.
random_crop_with_boxes
(images, size, boxes)[source]¶ Perform random spatial crop on the given images and corresponding boxes.
- Parameters
images (tensor) – Images to perform random crop. The dimension is channel x num frames x height x width.
size (int) – The size of height and width to crop on the image.
boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.
- Returns
cropped (tensor) – Cropped images with dimension channel x num frames x height x width.
cropped_boxes (tensor) – The cropped boxes with dimension num boxes x 4.
- Return type
Tuple[torch.Tensor, torch.Tensor]
-
pytorchvideo.transforms.functional.
uniform_crop
(images, size, spatial_idx)[source]¶ Perform uniform spatial sampling on the images.
- Parameters
images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.
- Returns
cropped (tensor) – Images with dimension channel x num frames x height x width.
- Return type
-
pytorchvideo.transforms.functional.
uniform_crop_with_boxes
(images, size, spatial_idx, boxes)[source]¶ Perform uniform spatial sampling on the images and corresponding boxes.
- Parameters
images (tensor) – Images to perform uniform crop. The dimension is channel x num frames x height x width.
size (int) – Size of height and width to crop the images.
spatial_idx (int) – 0, 1, or 2 for left, center, and right crop if width is larger than height. Or 0, 1, or 2 for top, center, and bottom crop if height is larger than width.
boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.
- Returns
cropped (tensor) – Images with dimension channel x num frames x height x width.
cropped_boxes (tensor) – The cropped boxes with dimension num boxes x 4.
- Return type
Tuple[torch.Tensor, numpy.ndarray]
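The meaning of spatial_idx can be illustrated by computing the crop offsets it implies. The helper below is hypothetical, and rounding the centered offset up with ceil is an assumption.

```python
import math


def uniform_crop_offsets(height, width, size, spatial_idx):
    """Return (y_offset, x_offset) of a size x size crop for spatial_idx 0, 1, or 2."""
    if spatial_idx not in (0, 1, 2):
        raise ValueError("spatial_idx must be 0, 1, or 2")

    # Start from the centered crop, then slide along the longer side.
    y_offset = int(math.ceil((height - size) / 2))
    x_offset = int(math.ceil((width - size) / 2))
    if height > width:
        if spatial_idx == 0:
            y_offset = 0                  # top
        elif spatial_idx == 2:
            y_offset = height - size      # bottom
    else:
        if spatial_idx == 0:
            x_offset = 0                  # left
        elif spatial_idx == 2:
            x_offset = width - size       # right
    return y_offset, x_offset


offsets = [uniform_crop_offsets(240, 320, 224, idx) for idx in (0, 1, 2)]
```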
-
pytorchvideo.transforms.functional.
horizontal_flip_with_boxes
(prob, images, boxes)[source]¶ Perform horizontal flip on the given images and corresponding boxes.
- Parameters
prob (float) – Probability to flip the images.
images (tensor) – Images to perform horizontal flip. The dimension is channel x num frames x height x width.
boxes (tensor) – Corresponding boxes to images. Dimension is num boxes x 4.
- Returns
images (tensor) – Images with dimension channel x num frames x height x width.
flipped_boxes (tensor) – The flipped boxes with dimension num boxes x 4.
- Return type
Tuple[torch.Tensor, torch.Tensor]
-
pytorchvideo.transforms.functional.
clip_boxes_to_image
(boxes, height, width)[source]¶ Clip an array of boxes to an image with the given height and width.
- Parameters
boxes (tensor) – Bounding boxes to perform clipping. Dimension is num boxes x 4.
- Returns
clipped_boxes (tensor) – The clipped boxes with dimension num boxes x 4.
- Return type
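Box clipping can be sketched with nested lists standing in for a num boxes x 4 tensor. The (x1, y1, x2, y2) coordinate layout and the clipping range of [0, width - 1] and [0, height - 1] are assumptions made for this illustration, as is the function name.

```python
def clip_boxes(boxes, height, width):
    """Clamp (x1, y1, x2, y2) boxes so every coordinate lies inside the image."""
    clipped = []
    for x1, y1, x2, y2 in boxes:
        clipped.append([
            min(width - 1.0, max(0.0, float(x1))),
            min(height - 1.0, max(0.0, float(y1))),
            min(width - 1.0, max(0.0, float(x2))),
            min(height - 1.0, max(0.0, float(y2))),
        ])
    return clipped


clipped = clip_boxes([[-5, 10, 350, 250]], height=240, width=320)
```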
-
pytorchvideo.transforms.functional.
crop_boxes
(boxes, x_offset, y_offset)[source]¶ Perform crop on the bounding boxes given the offsets.
- Parameters
boxes (torch.Tensor) – Bounding boxes to perform crop. The dimension is num boxes x 4.
x_offset (int) – Cropping offset in the x axis.
y_offset (int) – Cropping offset in the y axis.
- Returns
cropped_boxes (torch.Tensor) – The cropped boxes with dimension num boxes x 4.
- Return type
-
pytorchvideo.transforms.functional.
random_resized_crop
(frames, target_height, target_width, scale, aspect_ratio, shift=False, log_uniform_ratio=True, interpolation='bilinear', num_tries=10)[source]¶ Crop the given images to random size and aspect ratio. A crop of random size relative to the original size and a random aspect ratio is made. This crop is finally resized to given size. This is popularly used to train the Inception networks.
- Parameters
frames (torch.Tensor) – Video tensor to be resized with shape (C, T, H, W).
target_height (int) – Desired height after cropping.
target_width (int) – Desired width after cropping.
scale (Tuple[float, float]) – Scale range of Inception-style area based random resizing. Should be between 0.0 and 1.0.
aspect_ratio (Tuple[float, float]) – Aspect ratio range of Inception-style area based random resizing. Should be between 0.0 and +infinity.
shift (bool) – Bool that determines whether or not to sample two different boxes (for cropping) for the first and last frame. If True, it then linearly interpolates the two boxes for other frames. If False, the same box is cropped for every frame. Default is False.
log_uniform_ratio (bool) – Whether to use a log-uniform distribution to sample the aspect ratio. Default is True.
interpolation (str) – Algorithm used for upsampling. Currently supports ‘nearest’, ‘bilinear’, ‘bicubic’, ‘area’. Default is ‘bilinear’.
num_tries (int) – The number of times to attempt a randomly resized crop. Falls back to a central crop after all attempts are exhausted. Default is 10.
- Returns
cropped (tensor) – A cropped video tensor of shape (C, T, target_height, target_width).
- Return type
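The sampling procedure described by these parameters can be sketched as follows. The helper is hypothetical and only chooses the crop box; the subsequent resize to (target_height, target_width) is omitted. The central-crop fallback shown here follows the common Inception-style recipe and is an assumption rather than the library's exact logic.

```python
import math
import random


def sample_crop_box(height, width, scale, aspect_ratio,
                    log_uniform_ratio=True, num_tries=10, rng=None):
    """Return (top, left, crop_height, crop_width) for an Inception-style crop."""
    rng = rng or random.Random()
    for _ in range(num_tries):
        # Sample the crop area as a fraction of the frame area.
        area = rng.uniform(scale[0], scale[1]) * height * width
        if log_uniform_ratio:
            log_lo, log_hi = math.log(aspect_ratio[0]), math.log(aspect_ratio[1])
            ratio = math.exp(rng.uniform(log_lo, log_hi))
        else:
            ratio = rng.uniform(aspect_ratio[0], aspect_ratio[1])
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w

    # All attempts failed: fall back to a central crop within the ratio bounds.
    in_ratio = width / height
    if in_ratio < aspect_ratio[0]:
        crop_w, crop_h = width, int(round(width / aspect_ratio[0]))
    elif in_ratio > aspect_ratio[1]:
        crop_h, crop_w = height, int(round(height * aspect_ratio[1]))
    else:
        crop_h, crop_w = height, width
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


box = sample_crop_box(240, 320, (0.08, 1.0), (3 / 4, 4 / 3), rng=random.Random(0))
fallback = sample_crop_box(240, 320, (0.08, 1.0), (0.5, 1.0), num_tries=0)
```

Sampling the ratio log-uniformly makes a crop that is twice as wide as it is tall as likely as one that is twice as tall as it is wide.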
-
pytorchvideo.transforms.functional.
div_255
(x)[source]¶ Divide the given tensor x by 255.
- Parameters
x (torch.Tensor) – The input tensor.
- Returns
y (torch.Tensor) – Scaled tensor by dividing 255.
- Return type