
pytorchvideo.models.x3d

pytorchvideo.models.x3d.create_x3d_stem(*, in_channels, out_channels, conv_kernel_size=(5, 3, 3), conv_stride=(1, 2, 2), conv_padding=(2, 1, 1), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Creates the stem layer for X3D. It performs a spatial Conv, a temporal Conv, BN, and ReLU.

   Conv_xy
      ↓
   Conv_t
      ↓
Normalization
      ↓
  Activation
Parameters
  • in_channels (int) – input channel size of the convolution.

  • out_channels (int) – output channel size of the convolution.

  • conv_kernel_size (tuple) – convolutional kernel size(s).

  • conv_stride (tuple) – convolutional stride size(s).

  • conv_padding (tuple) – convolutional padding size(s).

  • norm (callable) – a callable that constructs normalization layer, options include nn.BatchNorm3d, None (not performing normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • activation (callable) – a callable that constructs activation layer, options include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not performing activation).

Returns

(nn.Module) – X3D stem layer.

Return type

torch.nn.modules.module.Module
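
Example (a minimal sketch; the 24-channel output and the 13 x 160 x 160 clip shape are illustrative assumptions, not library requirements):

    import torch
    from pytorchvideo.models.x3d import create_x3d_stem

    # Build the stem with its default kernel size, stride, and padding.
    stem = create_x3d_stem(in_channels=3, out_channels=24)
    # Dummy clip of shape (batch, channels, time, height, width).
    clip = torch.randn(1, 3, 13, 160, 160)
    out = stem(clip)
    # The default (1, 2, 2) stride halves only the spatial dimensions.
    print(out.shape)  # torch.Size([1, 24, 13, 80, 80])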

pytorchvideo.models.x3d.create_x3d_bottleneck_block(*, dim_in, dim_inner, dim_out, conv_kernel_size=(3, 3, 3), conv_stride=(1, 2, 2), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, se_ratio=0.0625, activation=<class 'torch.nn.modules.activation.ReLU'>, inner_act=<class 'pytorchvideo.layers.swish.Swish'>)[source]

Bottleneck block for X3D: a sequence of Conv, Normalization (with an optional Squeeze-and-Excitation block), and Activation layers, repeated in the following order:

   Conv3d (conv_a)
          ↓
Normalization (norm_a)
          ↓
  Activation (act_a)
          ↓
   Conv3d (conv_b)
          ↓
Normalization (norm_b)
          ↓
Squeeze-and-Excitation
          ↓
  Activation (act_b)
          ↓
   Conv3d (conv_c)
          ↓
Normalization (norm_c)
Parameters
  • dim_in (int) – input channel size to the bottleneck block.

  • dim_inner (int) – intermediate channel size of the bottleneck.

  • dim_out (int) – output channel size of the bottleneck.

  • conv_kernel_size (tuple) – convolutional kernel size(s) for conv_b.

  • conv_stride (tuple) – convolutional stride size(s) for conv_b.

  • norm (callable) – a callable that constructs normalization layer, examples include nn.BatchNorm3d, None (not performing normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • se_ratio (float) – if > 0, apply SE to the 3x3x3 conv, with the SE channel dimensionality being se_ratio times the 3x3x3 conv dim.

  • activation (callable) – a callable that constructs activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not performing activation).

  • inner_act (callable) – a callable that constructs the inner activation layer act_b, e.g. Swish.

Returns

(nn.Module) – X3D bottleneck block.

Return type

torch.nn.modules.module.Module
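
Example (a minimal sketch; the channel sizes and input shape are illustrative assumptions):

    import torch
    from pytorchvideo.models.x3d import create_x3d_bottleneck_block

    # Bottleneck that keeps 24 channels, expands to 54 inside, and uses the
    # default (1, 2, 2) stride on conv_b, so only the spatial dims shrink.
    block = create_x3d_bottleneck_block(dim_in=24, dim_inner=54, dim_out=24)
    x = torch.randn(2, 24, 13, 80, 80)
    y = block(x)
    print(y.shape)  # torch.Size([2, 24, 13, 40, 40])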

pytorchvideo.models.x3d.create_x3d_res_block(*, dim_in, dim_inner, dim_out, bottleneck=<function create_x3d_bottleneck_block>, use_shortcut=True, conv_kernel_size=(3, 3, 3), conv_stride=(1, 2, 2), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, se_ratio=0.0625, activation=<class 'torch.nn.modules.activation.ReLU'>, inner_act=<class 'pytorchvideo.layers.swish.Swish'>)[source]

Residual block for X3D. Performs a summation between an identity shortcut in branch1 and a main block in branch2. When the input and output dimensions differ, a convolution followed by normalization is applied on the shortcut branch.

  Input
    |-------+
    ↓       |
  Block     |
    ↓       |
Summation ←-+
    ↓
Activation
Parameters
  • dim_in (int) – input channel size to the bottleneck block.

  • dim_inner (int) – intermediate channel size of the bottleneck.

  • dim_out (int) – output channel size of the bottleneck.

  • bottleneck (callable) – a callable for create_x3d_bottleneck_block.

  • conv_kernel_size (tuple) – convolutional kernel size(s) for conv_b.

  • conv_stride (tuple) – convolutional stride size(s) for conv_b.

  • norm (callable) – a callable that constructs normalization layer, examples include nn.BatchNorm3d, None (not performing normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • se_ratio (float) – if > 0, apply SE to the 3x3x3 conv, with the SE channel dimensionality being se_ratio times the 3x3x3 conv dim.

  • activation (callable) – a callable that constructs activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not performing activation).

  • inner_act (callable) – a callable that constructs the inner activation layer act_b, e.g. Swish.

  • use_shortcut (bool) – if True, apply the convolutional shortcut when the input and output dimensions differ.

Returns

(nn.Module) – X3D block layer.

Return type

torch.nn.modules.module.Module
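
Example (a minimal sketch; the channel sizes and input shape are illustrative assumptions):

    import torch
    from pytorchvideo.models.x3d import create_x3d_res_block

    # Input and output channel counts differ (24 -> 48), so the shortcut
    # branch applies a conv + normalization to match dimensions.
    block = create_x3d_res_block(dim_in=24, dim_inner=54, dim_out=48)
    x = torch.randn(2, 24, 13, 40, 40)
    y = block(x)
    print(y.shape)  # torch.Size([2, 48, 13, 20, 20])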

pytorchvideo.models.x3d.create_x3d_res_stage(*, depth, dim_in, dim_inner, dim_out, bottleneck=<function create_x3d_bottleneck_block>, conv_kernel_size=(3, 3, 3), conv_stride=(1, 2, 2), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, se_ratio=0.0625, activation=<class 'torch.nn.modules.activation.ReLU'>, inner_act=<class 'pytorchvideo.layers.swish.Swish'>)[source]

Creates a residual stage, which composes the sequential residual blocks that make up one stage of X3D.

 Input
    ↓
ResBlock
    ↓
    .
    .
    .
    ↓
ResBlock
Parameters
  • depth (int) – number of blocks to create.

  • dim_in (int) – input channel size to the bottleneck block.

  • dim_inner (int) – intermediate channel size of the bottleneck.

  • dim_out (int) – output channel size of the bottleneck.

  • bottleneck (callable) – a callable for create_x3d_bottleneck_block.

  • conv_kernel_size (tuple) – convolutional kernel size(s) for conv_b.

  • conv_stride (tuple) – convolutional stride size(s) for conv_b.

  • norm (callable) – a callable that constructs normalization layer, examples include nn.BatchNorm3d, None (not performing normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • se_ratio (float) – if > 0, apply SE to the 3x3x3 conv, with the SE channel dimensionality being se_ratio times the 3x3x3 conv dim.

  • activation (callable) – a callable that constructs activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not performing activation).

  • inner_act (callable) – a callable that constructs the inner activation layer act_b, e.g. Swish.

Returns

(nn.Module) – X3D stage layer.

Return type

torch.nn.modules.module.Module
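
Example (a minimal sketch; the depth, channel sizes, and input shape are illustrative assumptions):

    import torch
    from pytorchvideo.models.x3d import create_x3d_res_stage

    # A stage of three residual blocks. Following the usual ResNet stage
    # pattern, the stride is assumed to be applied by the first block only,
    # so the stage downsamples space once.
    stage = create_x3d_res_stage(depth=3, dim_in=24, dim_inner=54, dim_out=48)
    x = torch.randn(2, 24, 13, 40, 40)
    y = stage(x)
    print(y.shape)  # torch.Size([2, 48, 13, 20, 20])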

pytorchvideo.models.x3d.create_x3d_head(*, dim_in, dim_inner, dim_out, num_classes, pool_act=<class 'torch.nn.modules.activation.ReLU'>, pool_kernel_size=(13, 5, 5), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, bn_lin5_on=False, dropout_rate=0.5, activation=<class 'torch.nn.modules.activation.Softmax'>, output_with_global_average=True)[source]

Creates the X3D head. This layer performs a projected pooling operation followed by dropout, a fully-connected projection, an activation layer, and global spatiotemporal averaging.

ProjectedPool
      ↓
   Dropout
      ↓
  Projection
      ↓
  Activation
      ↓
  Averaging
Parameters
  • dim_in (int) – input channel size of the X3D head.

  • dim_inner (int) – intermediate channel size of the X3D head.

  • dim_out (int) – output channel size of the X3D head.

  • num_classes (int) – the number of classes for the video dataset.

  • pool_act (callable) – a callable that constructs the pool activation layer of the head, such as nn.ReLU.

  • pool_kernel_size (tuple) – pooling kernel size(s) when not using adaptive pooling.

  • norm (callable) – a callable that constructs normalization layer, examples include nn.BatchNorm3d, None (not performing normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • bn_lin5_on (bool) – if True, perform normalization on the features before the classifier.

  • dropout_rate (float) – dropout rate.

  • activation (callable) – a callable that constructs resnet head activation layer, examples include: nn.ReLU, nn.Softmax, nn.Sigmoid, and None (not applying activation).

  • output_with_global_average (bool) – if True, perform global averaging on temporal and spatial dimensions and reshape output to batch_size x out_features.

Returns

(nn.Module) – X3D head layer.

Return type

torch.nn.modules.module.Module
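
Example (a minimal sketch using X3D-S-like sizes, which are assumptions for illustration; the feature map entering the head matches the default pool_kernel_size of (13, 5, 5)):

    import torch
    from pytorchvideo.models.x3d import create_x3d_head

    head = create_x3d_head(
        dim_in=192, dim_inner=432, dim_out=2048, num_classes=400
    )
    feats = torch.randn(2, 192, 13, 5, 5)
    scores = head(feats)
    # Global averaging collapses time and space, leaving one score per class.
    print(scores.shape)  # torch.Size([2, 400])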

pytorchvideo.models.x3d.create_x3d(*, input_channel=3, input_clip_length=13, input_crop_size=160, model_num_class=400, dropout_rate=0.5, width_factor=2.0, depth_factor=2.2, norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>, stem_dim_in=12, stem_conv_kernel_size=(5, 3, 3), stem_conv_stride=(1, 2, 2), stage_conv_kernel_size=((3, 3, 3), (3, 3, 3), (3, 3, 3), (3, 3, 3)), stage_spatial_stride=(2, 2, 2, 2), stage_temporal_stride=(1, 1, 1, 1), bottleneck=<function create_x3d_bottleneck_block>, bottleneck_factor=2.25, se_ratio=0.0625, inner_act=<class 'pytorchvideo.layers.swish.Swish'>, head_dim_out=2048, head_pool_act=<class 'torch.nn.modules.activation.ReLU'>, head_bn_lin5_on=False, head_activation=<class 'torch.nn.modules.activation.Softmax'>, head_output_with_global_average=True)[source]

X3D model builder. It builds an X3D network backbone, which is a ResNet-style network.

Christoph Feichtenhofer. “X3D: Expanding Architectures for Efficient Video Recognition.” https://arxiv.org/abs/2004.04730

Input
  ↓
Stem
  ↓
Stage 1
  ↓
  .
  .
  .
  ↓
Stage N
  ↓
Head
Parameters
  • input_channel (int) – number of channels for the input video clip.

  • input_clip_length (int) – length of the input video clip. Value for different models: X3D-XS: 4; X3D-S: 13; X3D-M: 16; X3D-L: 16.

  • input_crop_size (int) – spatial resolution of the input video clip. Value for different models: X3D-XS: 160; X3D-S: 160; X3D-M: 224; X3D-L: 312.

  • model_num_class (int) – the number of classes for the video dataset.

  • dropout_rate (float) – dropout rate.

  • width_factor (float) – width expansion factor.

  • depth_factor (float) – depth expansion factor. Value for different models: X3D-XS: 2.2; X3D-S: 2.2; X3D-M: 2.2; X3D-L: 5.0.

  • norm (callable) – a callable that constructs normalization layer.

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • activation (callable) – a callable that constructs activation layer.

  • stem_dim_in (int) – input channel size for stem before expansion.

  • stem_conv_kernel_size (tuple) – convolutional kernel size(s) of stem.

  • stem_conv_stride (tuple) – convolutional stride size(s) of stem.

  • stage_conv_kernel_size (tuple) – convolutional kernel size(s) for conv_b.

  • stage_spatial_stride (tuple) – the spatial stride for each stage.

  • stage_temporal_stride (tuple) – the temporal stride for each stage.

  • bottleneck_factor (float) – bottleneck expansion factor for the 3x3x3 conv.

  • se_ratio (float) – if > 0, apply SE to the 3x3x3 conv, with the SE channel dimensionality being se_ratio times the 3x3x3 conv dim.

  • inner_act (callable) – a callable that constructs the inner activation layer act_b, e.g. Swish.

  • head_dim_out (int) – output channel size of the X3D head.

  • head_pool_act (callable) – a callable that constructs the pool activation layer of the head, such as nn.ReLU.

  • head_bn_lin5_on (bool) – if True, perform normalization on the features before the classifier.

  • head_activation (callable) – a callable that constructs activation layer.

  • head_output_with_global_average (bool) – if True, perform global averaging on the head output.

  • bottleneck (callable) – a callable for create_x3d_bottleneck_block.

Returns

(nn.Module) – the X3D network.

Return type

torch.nn.modules.module.Module
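
Example (a minimal sketch; the defaults above correspond to an X3D-S style model with 13-frame clips at 160 x 160):

    import torch
    from pytorchvideo.models.x3d import create_x3d

    model = create_x3d(
        input_clip_length=13, input_crop_size=160, model_num_class=400
    ).eval()
    clip = torch.randn(1, 3, 13, 160, 160)
    with torch.no_grad():
        scores = model(clip)
    print(scores.shape)  # torch.Size([1, 400])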

class pytorchvideo.models.x3d.ProjectedPool(*, pre_conv=None, pre_norm=None, pre_act=None, pool=None, post_conv=None, post_norm=None, post_act=None)[source]

A pooling module augmented with Conv, Normalization, and Activation applied both before and after pooling, used in the head layer of X3D.

   Conv3d (pre_conv)
          ↓
Normalization (pre_norm)
          ↓
  Activation (pre_act)
          ↓
       Pool3d
          ↓
   Conv3d (post_conv)
          ↓
Normalization (post_norm)
          ↓
  Activation (post_act)
__init__(*, pre_conv=None, pre_norm=None, pre_act=None, pool=None, post_conv=None, post_norm=None, post_act=None)[source]
Parameters
  • pre_conv (torch.nn.modules) – convolutional module.

  • pre_norm (torch.nn.modules) – normalization module.

  • pre_act (torch.nn.modules) – activation module.

  • pool (torch.nn.modules) – pooling module.

  • post_conv (torch.nn.modules) – convolutional module.

  • post_norm (torch.nn.modules) – normalization module.

  • post_act (torch.nn.modules) – activation module.

Return type

None
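
Example (a minimal sketch wiring ProjectedPool by hand with plain torch.nn modules; the channel sizes 192 -> 432 -> 2048 and the (13, 5, 5) pool size mirror the X3D-S head above and are illustrative assumptions):

    import torch
    import torch.nn as nn
    from pytorchvideo.models.x3d import ProjectedPool

    pool = ProjectedPool(
        pre_conv=nn.Conv3d(192, 432, kernel_size=(1, 1, 1), bias=False),
        pre_norm=nn.BatchNorm3d(432),
        pre_act=nn.ReLU(),
        pool=nn.AvgPool3d(kernel_size=(13, 5, 5)),
        post_conv=nn.Conv3d(432, 2048, kernel_size=(1, 1, 1), bias=True),
        post_norm=nn.BatchNorm3d(2048),
        post_act=nn.ReLU(),
    )
    x = torch.randn(2, 192, 13, 5, 5)
    y = pool(x)
    print(y.shape)  # torch.Size([2, 2048, 1, 1, 1])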
