pytorchvideo.models.stem

pytorchvideo.models.stem.create_res_basic_stem(*, in_channels, out_channels, conv_kernel_size=(3, 7, 7), conv_stride=(1, 2, 2), conv_padding=(1, 3, 3), conv_bias=False, conv=<class 'torch.nn.modules.conv.Conv3d'>, pool=<class 'torch.nn.modules.pooling.MaxPool3d'>, pool_kernel_size=(1, 3, 3), pool_stride=(1, 2, 2), pool_padding=(0, 1, 1), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Creates the basic ResNet stem layer. It performs a spatiotemporal convolution, batch normalization (BN), and ReLU, followed by spatiotemporal pooling.

   Conv3d
      ↓
Normalization
      ↓
  Activation
      ↓
   Pool3d

Normalization options include: BatchNorm3d and None (no normalization). Activation options include: ReLU, Softmax, Sigmoid, and None (no activation). Pool3d options include: AvgPool3d, MaxPool3d, and None (no pooling).

Parameters
  • in_channels (int) – input channel size of the convolution.

  • out_channels (int) – output channel size of the convolution.

  • conv_kernel_size (tuple) – convolutional kernel size(s).

  • conv_stride (tuple) – convolutional stride size(s).

  • conv_padding (tuple) – convolutional padding size(s).

  • conv_bias (bool) – convolutional bias. If true, adds a learnable bias to the output.

  • conv (callable) – Callable used to build the convolution layer.

  • pool (callable) – a callable that constructs the pooling layer; options include nn.AvgPool3d, nn.MaxPool3d, and None (no pooling).

  • pool_kernel_size (tuple) – pooling kernel size(s).

  • pool_stride (tuple) – pooling stride size(s).

  • pool_padding (tuple) – pooling padding size(s).

  • norm (callable) – a callable that constructs the normalization layer; options include nn.BatchNorm3d and None (no normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • activation (callable) – a callable that constructs the activation layer; options include nn.ReLU, nn.Softmax, nn.Sigmoid, and None (no activation).

Returns

(nn.Module) – resnet basic stem layer.

Return type

torch.nn.modules.module.Module
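
To make the default sizes concrete, here is a minimal sketch in plain Python (no PyTorch required) that applies the usual output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1, to the defaults in the signature above. The helper name out_size and the 8 x 224 x 224 input are illustrative assumptions and not part of pytorchvideo; dilation 1 and floor rounding are also assumed.

```python
def out_size(size, kernel, stride, padding):
    """Per-dimension output size of a conv/pool layer (dilation 1, floor rounding)."""
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(size, kernel, stride, padding)
    )


# Hypothetical input: (T, H, W) = (8, 224, 224).
clip = (8, 224, 224)

# Convolution defaults from the create_res_basic_stem signature.
after_conv = out_size(clip, kernel=(3, 7, 7), stride=(1, 2, 2), padding=(1, 3, 3))

# Pooling defaults from the same signature.
after_pool = out_size(after_conv, kernel=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))

print(after_conv)  # (8, 112, 112)
print(after_pool)  # (8, 56, 56)
```

With these defaults the temporal length is preserved, while each of the convolution and the pooling halves the spatial resolution.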

pytorchvideo.models.stem.create_acoustic_res_basic_stem(*, in_channels, out_channels, conv_kernel_size=(3, 7, 7), conv_stride=(1, 1, 1), conv_padding=(1, 3, 3), conv_bias=False, pool=<class 'torch.nn.modules.pooling.MaxPool3d'>, pool_kernel_size=(1, 3, 3), pool_stride=(1, 2, 2), pool_padding=(0, 1, 1), norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, norm_eps=1e-05, norm_momentum=0.1, activation=<class 'torch.nn.modules.activation.ReLU'>)[source]

Creates the acoustic ResNet stem layer. It performs a spatial and a temporal convolution in parallel, then batch normalization (BN) and ReLU, followed by spatiotemporal pooling.

Conv3d   Conv3d
       ↓
 Normalization
       ↓
   Activation
       ↓
    Pool3d

Normalization options include: BatchNorm3d and None (no normalization). Activation options include: ReLU, Softmax, Sigmoid, and None (no activation). Pool3d options include: AvgPool3d, MaxPool3d, and None (no pooling).

Parameters
  • in_channels (int) – input channel size of the convolution.

  • out_channels (int) – output channel size of the convolution.

  • conv_kernel_size (tuple) – convolutional kernel size(s).

  • conv_stride (tuple) – convolutional stride size(s). The convolution is performed as a temporal and a spatial convolution in parallel.

  • conv_padding (tuple) – convolutional padding size(s). The convolution is performed as a temporal and a spatial convolution in parallel.

  • conv_bias (bool) – convolutional bias. If true, adds a learnable bias to the output.

  • pool (callable) – a callable that constructs the pooling layer; options include nn.AvgPool3d, nn.MaxPool3d, and None (no pooling).

  • pool_kernel_size (tuple) – pooling kernel size(s).

  • pool_stride (tuple) – pooling stride size(s).

  • pool_padding (tuple) – pooling padding size(s).

  • norm (callable) – a callable that constructs the normalization layer; options include nn.BatchNorm3d and None (no normalization).

  • norm_eps (float) – normalization epsilon.

  • norm_momentum (float) – normalization momentum.

  • activation (callable) – a callable that constructs the activation layer; options include nn.ReLU, nn.Softmax, nn.Sigmoid, and None (no activation).

Returns

(nn.Module) – resnet basic stem layer.

Return type

torch.nn.modules.module.Module
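
The parallel arrangement can be illustrated with shape arithmetic alone. The sketch below rests on assumptions that the text above does not confirm: the temporal branch uses a (k_t, 1, 1) kernel with padding (p_t, 0, 0), the spatial branch uses a (1, k_h, k_w) kernel with padding (0, p_h, p_w), both share conv_stride, and the two outputs are combined elementwise. The names out_size and split_kernel and the 8 x 224 x 224 input are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Per-dimension output size of a conv/pool layer (dilation 1, floor rounding)."""
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(size, kernel, stride, padding)
    )


def split_kernel(kernel, padding):
    """Assumed split of a (t, h, w) kernel and padding into two parallel branches."""
    kt, kh, kw = kernel
    pt, ph, pw = padding
    temporal = {"kernel": (kt, 1, 1), "padding": (pt, 0, 0)}
    spatial = {"kernel": (1, kh, kw), "padding": (0, ph, pw)}
    return temporal, spatial


# Hypothetical input size and the documented defaults.
size = (8, 224, 224)
stride = (1, 1, 1)
temporal, spatial = split_kernel(kernel=(3, 7, 7), padding=(1, 3, 3))

temporal_out = out_size(size, temporal["kernel"], stride, temporal["padding"])
spatial_out = out_size(size, spatial["kernel"], stride, spatial["padding"])

# An elementwise combination is only possible if both branches agree in shape.
shapes_match = temporal_out == spatial_out

# Pooling defaults from the create_acoustic_res_basic_stem signature.
after_pool = out_size(temporal_out, (1, 3, 3), (1, 2, 2), (0, 1, 1))

print(temporal_out, spatial_out, shapes_match)  # (8, 224, 224) (8, 224, 224) True
print(after_pool)  # (8, 112, 112)
```

Under these assumptions, splitting the padding along with the kernel is what keeps the two branch outputs the same size.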

class pytorchvideo.models.stem.ResNetBasicStem(*, conv=None, norm=None, activation=None, pool=None)[source]

ResNet basic 3D stem module. Performs a spatiotemporal convolution, BN, and activation, followed by spatiotemporal pooling.

   Conv3d
      ↓
Normalization
      ↓
  Activation
      ↓
   Pool3d

The builder can be found in create_res_basic_stem.

__init__(*, conv=None, norm=None, activation=None, pool=None)[source]

Parameters
  • conv (torch.nn.modules) – convolutional module.

  • norm (torch.nn.modules) – normalization module.

  • activation (torch.nn.modules) – activation module.

  • pool (torch.nn.modules) – pooling module.

Return type

None
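
The ordering in the diagram, and the way a stage set to None drops out, can be sketched without PyTorch. The class StemSketch below is a plain-Python stand-in and not the library's implementation. It assumes that conv must always be supplied and that any of norm, activation, or pool is skipped when it is None. The toy stage functions are hypothetical and operate on lists of numbers only to make the order of application visible.

```python
class StemSketch:
    """Plain-Python stand-in: conv, then norm, activation, pool, skipping None."""

    def __init__(self, *, conv=None, norm=None, activation=None, pool=None):
        # Assumption: a stem without a convolution is not meaningful.
        if conv is None:
            raise ValueError("conv must be provided")
        self.conv = conv
        self.norm = norm
        self.activation = activation
        self.pool = pool

    def __call__(self, x):
        x = self.conv(x)
        for stage in (self.norm, self.activation, self.pool):
            if stage is not None:
                x = stage(x)
        return x


def scale(xs):
    """Stands in for the convolution: doubles every value."""
    return [2 * v for v in xs]


def relu(xs):
    """Stands in for the activation: clamps negatives to zero."""
    return [max(0, v) for v in xs]


def pairwise_max(xs):
    """Stands in for pooling: maximum over non-overlapping pairs."""
    return [max(xs[i:i + 2]) for i in range(0, len(xs), 2)]


full = StemSketch(conv=scale, activation=relu, pool=pairwise_max)
conv_only = StemSketch(conv=scale)

result_full = full([-1, 3, 2, -5])
result_conv_only = conv_only([-1, 3, 2, -5])

print(result_full)       # [6, 4]
print(result_conv_only)  # [-2, 6, 4, -10]
```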

class pytorchvideo.models.stem.PatchEmbed(*, patch_model=None)[source]

Basic transformer patch embedding module. It patchifies the input, then flattens and transposes the result.

PatchModel
    ↓
 flatten
    ↓
transpose

The builder can be found in create_conv_patch_embed.

pytorchvideo.models.stem.create_conv_patch_embed(*, in_channels, out_channels, conv_kernel_size=(1, 16, 16), conv_stride=(1, 4, 4), conv_padding=(1, 7, 7), conv_bias=True, conv=<class 'torch.nn.modules.conv.Conv3d'>)[source]

Creates the basic transformer patch embedding. It performs a convolution, then flattens and transposes the result.

 Conv3d
    ↓
 flatten
    ↓
transpose

Parameters
  • in_channels (int) – input channel size of the convolution.

  • out_channels (int) – output channel size of the convolution.

  • conv_kernel_size (tuple) – convolutional kernel size(s).

  • conv_stride (tuple) – convolutional stride size(s).

  • conv_padding (tuple) – convolutional padding size(s).

  • conv_bias (bool) – convolutional bias. If true, adds a learnable bias to the output.

  • conv (callable) – Callable used to build the convolution layer.

Returns

(nn.Module) – transformer patch embedding layer.

Return type

torch.nn.modules.module.Module
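
As a rough guide to the size of the resulting token sequence, the sketch below applies the standard convolution output-size formula to the defaults in the signature above, then tracks the shape through the flatten and transpose steps. It assumes that flatten merges every dimension after the channel axis and that transpose swaps the channel and token axes; the helper out_size, the batch size, the channel count, and the input size are all hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Per-dimension output size of a conv layer (dilation 1, floor rounding)."""
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(size, kernel, stride, padding)
    )


# Hypothetical values for illustration only.
batch, out_channels = 2, 96
clip = (8, 224, 224)  # (T, H, W)

# Convolution defaults from the create_conv_patch_embed signature.
t, h, w = out_size(clip, kernel=(1, 16, 16), stride=(1, 4, 4), padding=(1, 7, 7))

conv_shape = (batch, out_channels, t, h, w)
num_tokens = t * h * w
flat_shape = (batch, out_channels, num_tokens)   # assumed flatten of T', H', W'
token_shape = (batch, num_tokens, out_channels)  # assumed swap of axes 1 and 2

print(conv_shape)   # (2, 96, 10, 56, 56)
print(token_shape)  # (2, 31360, 96)
```

Note that with a temporal kernel of 1 and a temporal padding of 1, the formula gives a temporal length two frames longer than the input.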
