pytorchvideo.models.head
-
class
pytorchvideo.models.head.
SequencePool
(mode)
Sequence pool produces a single embedding from a sequence of embeddings. It currently supports the “mean” and “cls” modes.
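The two modes can be illustrated with a minimal sketch in plain PyTorch. It assumes the input is laid out as (batch, sequence, channels) and that the cls token sits at index 0 of the sequence. The `sequence_pool` helper is hypothetical and written only for this example; it is not the library's implementation.

```python
import torch


def sequence_pool(x: torch.Tensor, mode: str) -> torch.Tensor:
    """Hypothetical helper: reduce (batch, seq, channels) to (batch, channels)."""
    if mode == "cls":
        # Assumes the cls token is the first element of the sequence.
        return x[:, 0]
    if mode == "mean":
        # Average over the sequence dimension.
        return x.mean(dim=1)
    raise ValueError(f"unsupported mode: {mode}")


# Two sequences of three 4-dimensional embeddings each.
tokens = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
cls_embedding = sequence_pool(tokens, "cls")
mean_embedding = sequence_pool(tokens, "mean")
print(cls_embedding.shape, mean_embedding.shape)
```

Both modes return one embedding per batch element, so either result can be fed directly to a projection layer.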
-
pytorchvideo.models.head.
create_res_basic_head
(*, in_features, out_features, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), dropout_rate=0.5, activation=None, output_with_global_average=True)
Creates a ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an optional activation layer, and a global spatiotemporal averaging.
Pooling ↓ Dropout ↓ Projection ↓ Activation ↓ Averaging
Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None.
- Parameters
in_features (int) – input channel size of the resnet head.
out_features (int) – output channel size of the resnet head.
pool (callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).
pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.
pool_stride (tuple) – pooling stride size(s) when not using adaptive pooling.
pool_padding (tuple) – pooling padding size(s) when not using adaptive pooling.
output_size (tuple) – spatiotemporal output size when using adaptive pooling.
activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).
dropout_rate (float) – dropout rate.
output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.
- Return type
torch.nn.modules.module.Module
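The pipeline described above can be traced step by step with standard torch.nn modules. This is a sketch under stated assumptions, not the builder's actual code: the input is assumed to be a (batch, channels, T, H, W) feature map, the projection is assumed to be an nn.Linear applied over the channel dimension, and Softmax is picked arbitrarily from the listed activation examples.

```python
import torch
from torch import nn

in_features, out_features = 64, 10

# Modules mirroring the documented defaults where the signature gives them.
pool = nn.AvgPool3d(kernel_size=(1, 7, 7), stride=(1, 1, 1), padding=(0, 0, 0))
dropout = nn.Dropout(p=0.5)
proj = nn.Linear(in_features, out_features)   # assumption: linear projection
activation = nn.Softmax(dim=1)                # assumption: softmax over channels
output_pool = nn.AdaptiveAvgPool3d(1)

x = torch.randn(2, in_features, 4, 7, 7)      # (batch, channels, T, H, W)

x = pool(x)                                   # -> (2, 64, 4, 1, 1)
x = dropout(x)
# nn.Linear acts on the last dimension, so move channels there and back.
x = proj(x.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)  # -> (2, 10, 4, 1, 1)
x = activation(x)
x = output_pool(x)                            # -> (2, 10, 1, 1, 1)
out = x.flatten(start_dim=1)                  # -> (batch_size, out_features)
print(out.shape)
```

The last two steps correspond to output_with_global_average=True: the temporal and spatial dimensions are averaged away and the result is reshaped to batch_size x out_features.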
-
pytorchvideo.models.head.
create_vit_basic_head
(*, in_features, out_features, seq_pool_type='cls', dropout_rate=0.5, activation=None)
Creates a vision transformer basic head.
Pooling ↓ Dropout ↓ Projection ↓ Activation
Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool type examples include: cls, mean and none.
- Parameters
in_features (int) – input channel size of the vision transformer head.
out_features (int) – output channel size of the vision transformer head.
seq_pool_type (str) – pooling type. It supports “cls”, “mean” and “none”. If set to “cls”, it assumes the first element in the input is the cls token and returns it. If set to “mean”, it returns the mean of the entire sequence. If set to “none”, no sequence pooling is applied.
activation (callable) – a callable that constructs vision transformer head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).
dropout_rate (float) – dropout rate.
- Return type
torch.nn.modules.module.Module
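The four stages of this head can be sketched with plain torch.nn modules. The sketch assumes a (batch, sequence, channels) input with the cls token at index 0 and an nn.Linear projection; Sigmoid is chosen arbitrarily from the listed activation examples. It illustrates the data flow only and is not the builder's implementation.

```python
import torch
from torch import nn

in_features, out_features = 32, 8

dropout = nn.Dropout(p=0.5)
proj = nn.Linear(in_features, out_features)   # assumption: linear projection
activation = nn.Sigmoid()

tokens = torch.randn(2, 5, in_features)       # (batch, sequence, channels)

x = tokens[:, 0]          # "cls" pooling: keep the first token -> (2, 32)
x = dropout(x)
x = proj(x)               # -> (2, 8)
scores = activation(x)
print(scores.shape)
```

Switching to “mean” pooling would replace the indexing step with `tokens.mean(dim=1)`; the remaining stages are unchanged.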
-
pytorchvideo.models.head.
create_res_roi_pooling_head
(*, in_features, out_features, resolution, spatial_scale, sampling_ratio=0, roi=<class 'torchvision.ops.roi_align.RoIAlign'>, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), pool_spatial=<class 'torch.nn.modules.pooling.MaxPool2d'>, dropout_rate=0.5, activation=None, output_with_global_average=True)
Creates a ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an optional activation layer, and a global spatiotemporal averaging.
- Pool3d
↓
- RoI Align
↓
- Pool2d
↓
- Dropout
↓
- Projection
↓
- Activation
↓
Averaging
Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None. RoI examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, torchvision.ops.RoIAlign, and None. Pool2d examples include: MaxPool2d, AvgPool2d, and None.
- Parameters
in_features (int) – input channel size of the resnet head.
out_features (int) – output channel size of the resnet head.
resolution (tuple) – h, w sizes of the RoI interpolation.
spatial_scale (float) – scale the input boxes by this number.
sampling_ratio (int) – number of input samples to take for each output sample interpolation. 0 to take samples densely.
roi (callable) – a callable that constructs the RoI interpolation layer, examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, torchvision.ops.RoIAlign, and None.
pool (callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).
pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.
pool_stride (tuple) – pooling stride size(s) when not using adaptive pooling.
pool_padding (tuple) – pooling padding size(s) when not using adaptive pooling.
output_size (tuple) – spatiotemporal output size when using adaptive pooling.
pool_spatial (callable) – a callable that constructs the 2D pooling layer which follows the RoI layer, examples include: nn.AvgPool2d, nn.MaxPool2d, and None (not applying spatial pooling).
activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).
dropout_rate (float) – dropout rate.
output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.
- Return type
torch.nn.modules.module.Module
-
class
pytorchvideo.models.head.
ResNetBasicHead
(pool=None, dropout=None, proj=None, activation=None, output_pool=None)
ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an optional activation layer, and a global spatiotemporal averaging.
Pool3d ↓ Dropout ↓ Projection ↓ Activation ↓ Averaging
The builder can be found in create_res_basic_head.
-
__init__
(pool=None, dropout=None, proj=None, activation=None, output_pool=None)
- Parameters
pool (torch.nn.Module) – pooling module.
dropout (torch.nn.Module) – dropout module.
proj (torch.nn.Module) – projection module.
activation (torch.nn.Module) – activation module.
output_pool (torch.nn.Module) – pooling module for output.
-
-
class
pytorchvideo.models.head.
ResNetRoIHead
(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)
ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an optional activation layer, and a global spatiotemporal averaging.
- Pool3d
↓
- RoI Align
↓
- Pool2d
↓
- Dropout
↓
- Projection
↓
- Activation
↓
Averaging
The builder can be found in create_res_roi_pooling_head.
-
__init__
(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)
- Parameters
pool (torch.nn.Module) – pooling module.
pool_spatial (torch.nn.Module) – 2D spatial pooling module applied after the RoI layer.
roi_layer (torch.nn.Module) – RoI (e.g. align, pool) module.
dropout (torch.nn.Module) – dropout module.
proj (torch.nn.Module) – projection module.
activation (torch.nn.Module) – activation module.
output_pool (torch.nn.Module) – pooling module for output.
- Return type