pytorchvideo.models.head

class pytorchvideo.models.head.SequencePool(mode)[source]

Sequence pool produces a single embedding from a sequence of embeddings. Currently it supports “mean” and “cls”.

__init__(mode)[source]
Parameters

mode (str) – Options include “cls” and “mean”. If set to “cls”, it assumes the first element in the input is the cls token and returns it. If set to “mean”, it returns the mean of the entire sequence.

Return type

None
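
The two modes can be illustrated with a minimal sketch in plain PyTorch. This is not the library's implementation: the function name sequence_pool and the (batch, seq_len, dim) input layout are assumptions made for illustration.

```python
import torch


def sequence_pool(x: torch.Tensor, mode: str) -> torch.Tensor:
    """Hypothetical helper: reduce (batch, seq_len, dim) to (batch, dim)."""
    if mode == "cls":
        # Assumes the cls token sits at sequence position 0.
        return x[:, 0]
    if mode == "mean":
        # Average over the sequence dimension.
        return x.mean(dim=1)
    raise ValueError(f"unsupported mode: {mode}")


tokens = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)
cls_embedding = sequence_pool(tokens, "cls")    # shape (2, 4)
mean_embedding = sequence_pool(tokens, "mean")  # shape (2, 4)
```

Both modes drop the sequence dimension, so downstream layers see one embedding per batch element.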

pytorchvideo.models.head.create_res_basic_head(*, in_features, out_features, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), dropout_rate=0.5, activation=None, output_with_global_average=True)[source]

Creates ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an activation layer and a global spatiotemporal averaging.

 Pooling
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None.

Parameters
  • in_features (int) – input channel size of the resnet head.

  • out_features (int) – output channel size of the resnet head.

  • pool (callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).

  • pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.

  • pool_stride (tuple) – pooling stride size(s) when not using adaptive pooling.

  • pool_padding (tuple) – pooling padding size(s) when not using adaptive pooling.

  • output_size (tuple) – spatiotemporal output size when using adaptive pooling.

  • activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • dropout_rate (float) – dropout rate.

  • output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.

Return type

torch.nn.modules.module.Module
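
The split between the adaptive and non-adaptive pooling arguments can be sketched as follows. The helper build_head_pool is hypothetical and only illustrates how the documented arguments could be consumed. It is not claimed to be the builder's actual code.

```python
import torch
from torch import nn


def build_head_pool(pool, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7),
                    pool_stride=(1, 1, 1), pool_padding=(0, 0, 0)):
    """Hypothetical helper: construct a pooling layer from the documented arguments."""
    if pool is None:
        return None  # no pooling is applied
    if pool is nn.AdaptiveAvgPool3d:
        # Adaptive pooling is configured only by its target output size.
        return pool(output_size)
    # Fixed-window pooling uses kernel size, stride, and padding.
    return pool(kernel_size=pool_kernel_size, stride=pool_stride, padding=pool_padding)


features = torch.randn(2, 16, 4, 7, 7)  # (batch, channels, time, height, width)
fixed = build_head_pool(nn.AvgPool3d)(features)             # (2, 16, 4, 1, 1)
adaptive = build_head_pool(nn.AdaptiveAvgPool3d)(features)  # (2, 16, 1, 1, 1)
```

With the default (1, 7, 7) kernel the temporal dimension is preserved by the fixed-window pool, whereas adaptive pooling to (1, 1, 1) collapses it.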

pytorchvideo.models.head.create_vit_basic_head(*, in_features, out_features, seq_pool_type='cls', dropout_rate=0.5, activation=None)[source]

Creates vision transformer basic head.

 Pooling
    ↓
 Dropout
    ↓
Projection
    ↓
Activation

Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool type examples include: cls, mean and none.

Parameters
  • in_features (int) – input channel size of the vision transformer head.

  • out_features (int) – output channel size of the vision transformer head.

  • seq_pool_type (str) – Pooling type. It supports “cls”, “mean” and “none”. If set to “cls”, it assumes the first element in the input is the cls token and returns it. If set to “mean”, it returns the mean of the entire sequence. If set to “none”, no sequence pooling is applied.

  • activation (callable) – a callable that constructs vision transformer head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • dropout_rate (float) – dropout rate.

Return type

torch.nn.modules.module.Module
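
The pooling, dropout, and projection stages of the diagram can be sketched in plain PyTorch as below. The class TinyViTHead is an illustrative stand-in, not the module returned by the builder. It assumes a (batch, seq_len, in_features) input, “cls” pooling, and no activation.

```python
import torch
from torch import nn


class TinyViTHead(nn.Module):
    """Illustrative head: cls pooling -> dropout -> linear projection."""

    def __init__(self, in_features: int, out_features: int, dropout_rate: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout_rate)
        self.proj = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x[:, 0]          # "cls" pooling: keep the first token only
        x = self.dropout(x)  # identity in eval mode
        return self.proj(x)  # no activation, mirroring activation=None


head = TinyViTHead(in_features=8, out_features=5).eval()
tokens = torch.randn(2, 10, 8)  # (batch, seq_len, in_features)
logits = head(tokens)           # (batch, out_features)
```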

pytorchvideo.models.head.create_res_roi_pooling_head(*, in_features, out_features, resolution, spatial_scale, sampling_ratio=0, roi=<class 'torchvision.ops.roi_align.RoIAlign'>, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), pool_spatial=<class 'torch.nn.modules.pooling.MaxPool2d'>, dropout_rate=0.5, activation=None, output_with_global_average=True)[source]

Creates ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an activation layer and a global spatiotemporal averaging.

  Pool3d
    ↓
 RoI Align
    ↓
  Pool2d
    ↓
  Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None. RoI examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, torchvision.ops.RoIAlign, and None. Pool2d examples include: MaxPool2d, AvgPool2d, and None.

Parameters
  • in_features (int) – input channel size of the resnet head.

  • out_features (int) – output channel size of the resnet head.

  • resolution (tuple) – h, w sizes of the RoI interpolation.

  • spatial_scale (float) – scale the input boxes by this number.

  • sampling_ratio (int) – number of input samples to take for each output sample interpolation. 0 to take samples densely.

  • roi (callable) – a callable that constructs the RoI interpolation layer, examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, torchvision.ops.RoIAlign, and None.

  • pool (callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).

  • pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.

  • pool_stride (tuple) – pooling stride size(s) when not using adaptive pooling.

  • pool_padding (tuple) – pooling padding size(s) when not using adaptive pooling.

  • output_size (tuple) – spatiotemporal output size when using adaptive pooling.

  • pool_spatial (callable) – a callable that constructs the 2d pooling layer which follows the RoI layer, examples include: nn.AvgPool2d, nn.MaxPool2d, and None (not applying spatial pooling).

  • activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • dropout_rate (float) – dropout rate.

  • output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.

Return type

torch.nn.modules.module.Module

class pytorchvideo.models.head.ResNetBasicHead(pool=None, dropout=None, proj=None, activation=None, output_pool=None)[source]

ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an optional activation layer and a global spatiotemporal averaging.

 Pool3d
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

The builder can be found in create_res_basic_head.

__init__(pool=None, dropout=None, proj=None, activation=None, output_pool=None)[source]
Parameters
  • pool (torch.nn.modules) – pooling module.

  • dropout (torch.nn.modules) – dropout module.

  • proj (torch.nn.modules) – projection module.

  • activation (torch.nn.modules) – activation module.

  • output_pool (torch.nn.Module) – pooling module for output.

Return type

None
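
How a head built from optional submodules might chain them can be sketched as follows. SketchResHead is a hypothetical re-implementation in plain PyTorch. The channels-last permutation around the linear projection and the final flatten are assumptions chosen so that the output is batch_size x out_features, not a statement about the library's internals.

```python
import torch
from torch import nn


class SketchResHead(nn.Module):
    """Illustrative head: every stage is optional and skipped when None."""

    def __init__(self, pool=None, dropout=None, proj=None, activation=None, output_pool=None):
        super().__init__()
        self.pool = pool
        self.dropout = dropout
        self.proj = proj
        self.activation = activation
        self.output_pool = output_pool

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pool is not None:
            x = self.pool(x)
        if self.dropout is not None:
            x = self.dropout(x)
        if self.proj is not None:
            # nn.Linear acts on the last dimension: move channels last, project, move back.
            x = self.proj(x.permute(0, 2, 3, 4, 1)).permute(0, 4, 1, 2, 3)
        if self.activation is not None:
            x = self.activation(x)
        if self.output_pool is not None:
            # Global average, then flatten to (batch, out_features).
            x = self.output_pool(x).flatten(1)
        return x


head = SketchResHead(
    pool=nn.AvgPool3d((1, 7, 7), stride=(1, 1, 1)),
    dropout=nn.Dropout(0.5),
    proj=nn.Linear(16, 10),
    output_pool=nn.AdaptiveAvgPool3d(1),
).eval()
clip_features = torch.randn(2, 16, 4, 7, 7)  # (batch, channels, time, height, width)
out = head(clip_features)                    # (batch, out_features)
```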

class pytorchvideo.models.head.ResNetRoIHead(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)[source]

ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an activation layer and a global spatiotemporal averaging.

  Pool3d
    ↓
 RoI Align
    ↓
  Pool2d
    ↓
  Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

The builder can be found in create_res_roi_pooling_head.

__init__(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)[source]
Parameters
  • pool (torch.nn.modules) – pooling module.

  • pool_spatial (torch.nn.modules) – 2D spatial pooling module applied after the RoI layer.

  • roi_layer (torch.nn.modules) – RoI (e.g. Align, Pool) module.

  • dropout (torch.nn.modules) – dropout module.

  • proj (torch.nn.modules) – projection module.

  • activation (torch.nn.modules) – activation module.

  • output_pool (torch.nn.Module) – pooling module for output.

Return type

None

forward(x, bboxes)[source]
Parameters
  • x (torch.tensor) – input tensor

  • bboxes (torch.tensor) – Associated bounding boxes. The format is N*5 (Index, X_1, Y_1, X_2, Y_2) if using RoIAlign and N*6 (Index, x_ctr, y_ctr, width, height, angle_degrees) if using RoIAlignRotated.

Return type

torch.Tensor

class pytorchvideo.models.head.VisionTransformerBasicHead(sequence_pool=None, dropout=None, proj=None, activation=None)[source]

Vision transformer basic head.

SequencePool
     ↓
  Dropout
     ↓
 Projection
     ↓
 Activation

The builder can be found in create_vit_basic_head.

__init__(sequence_pool=None, dropout=None, proj=None, activation=None)[source]
Parameters
  • sequence_pool (torch.nn.modules) – pooling module.

  • dropout (torch.nn.modules) – dropout module.

  • proj (torch.nn.modules) – projection module.

  • activation (torch.nn.modules) – activation module.

Return type

None
