pytorchvideo.models.head

pytorchvideo.models.head.create_res_basic_head(*, in_features, out_features, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), dropout_rate=0.5, activation=None, output_with_global_average=True)[source]

Creates a ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an optional activation layer, and an optional global spatiotemporal averaging.

 Pooling
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None.

Parameters
  • in_features (int) – input channel size of the resnet head.

  • out_features (int) – output channel size of the resnet head.

  • pool (callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).

  • pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.

  • pool_stride (tuple) – pooling stride size(s) when not using adaptive pooling.

  • pool_padding (tuple) – pooling padding size(s) when not using adaptive pooling.

  • output_size (tuple) – spatiotemporal output size when using adaptive pooling.

  • activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • dropout_rate (float) – dropout rate.

  • output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.

Return type

torch.nn.modules.module.Module
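
The default pool_kernel_size, pool_stride, and pool_padding are easier to reason about with the shapes worked out. The sketch below applies the output-size formula documented for torch.nn.AvgPool3d (assuming ceil_mode=False) in plain Python. The helper name pool3d_output_size and the 8 x 7 x 7 input size are illustrative assumptions, not part of pytorchvideo.

```python
def pool3d_output_size(input_size, kernel_size, stride, padding):
    """Return the (T, H, W) size after a non-adaptive 3D pool.

    Uses floor((n + 2 * p - k) / s) + 1 per dimension, which assumes
    ceil_mode=False and no dilation.
    """
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(input_size, kernel_size, stride, padding)
    )


# Hypothetical feature map entering the head: 8 frames of 7 x 7 features.
pooled = pool3d_output_size(
    input_size=(8, 7, 7),
    kernel_size=(1, 7, 7),  # default pool_kernel_size
    stride=(1, 1, 1),       # default pool_stride
    padding=(0, 0, 0),      # default pool_padding
)
print(pooled)  # (8, 1, 1)
```

Under these assumptions the default kernel collapses the spatial dimensions and leaves the temporal dimension intact, so the remaining T x 1 x 1 positions are what output_with_global_average=True would average away.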

pytorchvideo.models.head.create_res_roi_pooling_head(*, in_features, out_features, resolution, spatial_scale, sampling_ratio=0, roi=<class 'torchvision.ops.roi_align.RoIAlign'>, pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, output_size=(1, 1, 1), pool_kernel_size=(1, 7, 7), pool_stride=(1, 1, 1), pool_padding=(0, 0, 0), pool_spatial=<class 'torch.nn.modules.pooling.MaxPool2d'>, dropout_rate=0.5, activation=None, output_with_global_average=True)[source]

Creates a ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an optional activation layer, and an optional global spatiotemporal averaging.

 Pool3d
    ↓
RoI Align
    ↓
 Pool2d
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

Activation examples include: ReLU, Softmax, Sigmoid, and None. Pool3d examples include: AvgPool3d, MaxPool3d, AdaptiveAvgPool3d, and None. RoI examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, torchvision.ops.RoIAlign, and None. Pool2d examples include: MaxPool2d, AvgPool2d, and None.

Parameters
  • in_features (int) – input channel size of the resnet head.

  • out_features (int) – output channel size of the resnet head.

  • resolution (Tuple) – h, w sizes of the RoI interpolation.

  • spatial_scale (float) – scale the input boxes by this number.

  • sampling_ratio (int) – number of input samples to take for each output sample interpolation. 0 to take samples densely.

  • roi (Callable) – a callable that constructs the RoI interpolation layer, examples include: detectron2.layers.ROIAlign, detectron2.layers.ROIAlignRotated, and None.

  • pool (Callable) – a callable that constructs resnet head pooling layer, examples include: nn.AvgPool3d, nn.MaxPool3d, nn.AdaptiveAvgPool3d, and None (not applying pooling).

  • pool_kernel_size (Tuple[int]) – pooling kernel size(s) when not using adaptive pooling.

  • pool_stride (Tuple[int]) – pooling stride size(s) when not using adaptive pooling.

  • pool_padding (Tuple[int]) – pooling padding size(s) when not using adaptive pooling.

  • output_size (Tuple[int]) – spatiotemporal output size when using adaptive pooling.

  • pool_spatial (Callable) – a callable that constructs the 2D pooling layer which follows the RoI layer, examples include: nn.AvgPool2d, nn.MaxPool2d, and None (not applying spatial pooling).

  • activation (Callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • dropout_rate (float) – dropout rate.

  • output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.

Return type

torch.nn.modules.module.Module
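
The effect of spatial_scale can be illustrated without any tensor library. The sketch below is a conceptual illustration only: it multiplies box coordinates by spatial_scale to map them from input-pixel space to feature-map space and ignores any sub-pixel alignment offset an actual RoI layer may apply. The helper scale_boxes, the 1/16 scale, and the box values are assumptions chosen for illustration.

```python
def scale_boxes(boxes, spatial_scale):
    """Scale (index, x1, y1, x2, y2) rows by spatial_scale.

    The leading index column identifies the sample and is left untouched.
    """
    return [
        [row[0]] + [coord * spatial_scale for coord in row[1:]]
        for row in boxes
    ]


# One hypothetical box on sample 0, given in input-pixel coordinates.
boxes = [[0, 32.0, 64.0, 160.0, 192.0]]

# Assumed feature map that is 16x smaller than the input frames.
scaled = scale_boxes(boxes, spatial_scale=1.0 / 16)
print(scaled)  # [[0, 2.0, 4.0, 10.0, 12.0]]
```

A spatial_scale of 1.0 would leave the boxes unchanged, which fits the case where boxes are already expressed in feature-map coordinates.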

class pytorchvideo.models.head.ResNetBasicHead(pool=None, dropout=None, proj=None, activation=None, output_pool=None)[source]

ResNet basic head. This layer performs an optional pooling operation followed by an optional dropout, a fully-connected projection, an optional activation layer and a global spatiotemporal averaging.

 Pool3d
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

The builder can be found in create_res_basic_head.

__init__(pool=None, dropout=None, proj=None, activation=None, output_pool=None)[source]
Parameters
  • pool (torch.nn.modules) – pooling module.

  • dropout (torch.nn.modules) – dropout module.

  • proj (torch.nn.modules) – projection module.

  • activation (torch.nn.modules) – activation module.

  • output_pool (torch.nn.Module) – pooling module for output.

Return type

None

class pytorchvideo.models.head.ResNetRoIHead(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)[source]

ResNet RoI head. This layer performs an optional pooling operation followed by an RoI projection, an optional 2D spatial pool, an optional dropout, a fully-connected projection, an optional activation layer, and a global spatiotemporal averaging.

 Pool3d
    ↓
RoI Align
    ↓
 Pool2d
    ↓
 Dropout
    ↓
Projection
    ↓
Activation
    ↓
Averaging

The builder can be found in create_res_roi_pooling_head.

__init__(pool=None, pool_spatial=None, roi_layer=None, dropout=None, proj=None, activation=None, output_pool=None)[source]
Parameters
  • pool (torch.nn.modules) – pooling module.

  • pool_spatial (torch.nn.modules) – 2D spatial pooling module.

  • roi_layer (torch.nn.Module) – RoI (Ex: Align, pool) module.

  • dropout (torch.nn.modules) – dropout module.

  • proj (torch.nn.modules) – projection module.

  • activation (torch.nn.modules) – activation module.

  • output_pool (torch.nn.Module) – pooling module for output.

Return type

None

forward(x, bboxes)[source]
Parameters
  • x (torch.tensor) – input tensor

  • bboxes (torch.tensor) – Associated bounding boxes. The format is N*5 (Index, X_1, Y_1, X_2, Y_2) if using RoIAlign and N*6 (Index, x_ctr, y_ctr, width, height, angle_degrees) if using RoIAlignRotated.

Return type

torch.Tensor
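
The N*5 layout expected for bboxes can be assembled from per-clip box lists. The sketch below is a plain-Python illustration that assumes the Index column is the position of the clip within the batch, as it is for torchvision's RoIAlign. The helper build_roi_rows and the box coordinates are hypothetical, and converting the rows to a tensor is left to the caller.

```python
def build_roi_rows(boxes_per_clip):
    """Flatten per-clip (x1, y1, x2, y2) boxes into N x 5 rows.

    Each row is (index, x1, y1, x2, y2), where index is assumed to be
    the position of the clip in the batch.
    """
    rows = []
    for index, boxes in enumerate(boxes_per_clip):
        for x1, y1, x2, y2 in boxes:
            rows.append([float(index), float(x1), float(y1), float(x2), float(y2)])
    return rows


# Hypothetical batch of two clips: one box on clip 0, two boxes on clip 1.
rows = build_roi_rows([
    [(10, 20, 110, 220)],
    [(0, 0, 50, 50), (30, 40, 90, 120)],
])
for row in rows:
    print(row)
```

Here N is the total number of boxes across the batch (3 in this example), not the batch size.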
