pytorchvideo.models.csn

pytorchvideo.models.csn.create_csn(*, input_channel=3, model_depth=50, model_num_class=400, dropout_rate=0, norm=<class 'torch.nn.modules.batchnorm.BatchNorm3d'>, activation=<class 'torch.nn.modules.activation.ReLU'>, stem_dim_out=64, stem_conv_kernel_size=(3, 7, 7), stem_conv_stride=(1, 2, 2), stem_pool=None, stem_pool_kernel_size=(1, 3, 3), stem_pool_stride=(1, 2, 2), stage_conv_a_kernel_size=(1, 1, 1), stage_conv_b_kernel_size=(3, 3, 3), stage_conv_b_width_per_group=1, stage_spatial_stride=(1, 2, 2, 2), stage_temporal_stride=(1, 2, 2, 2), bottleneck=<function create_bottleneck_block>, bottleneck_ratio=4, head_pool=<class 'torch.nn.modules.pooling.AvgPool3d'>, head_pool_kernel_size=(1, 7, 7), head_output_size=(1, 1, 1), head_activation=None, head_output_with_global_average=True)

Build a Channel-Separated Convolutional Network (CSN), as described in: Video classification with channel-separated convolutional networks. Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli. ICCV 2019.

CSN follows a ResNet-style architecture made up of three parts: Stem, Stages and Head. The three parts are assembled in the following order:

Input
  ↓
Stem
  ↓
Stage 1
  ↓
  .
  .
  .
  ↓
Stage N
  ↓
Head

CSN uses depthwise convolution. To further reduce the computational cost, it uses low-resolution input (112x112), short clips (4 frames), and different strides and kernel sizes.
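To give a sense of how much depthwise convolution saves, the sketch below counts the weights of a 3x3x3 convolution with and without grouping. It assumes the group count equals the channel count divided by stage_conv_b_width_per_group, which is an assumption about how that parameter is interpreted. The helper names and the 64-channel width are hypothetical and chosen only for illustration.

```python
from math import prod


def groups_for_width(channels, width_per_group):
    """Hypothetical helper: group count when each group spans `width_per_group` channels."""
    if channels % width_per_group:
        raise ValueError("channels must be divisible by width_per_group")
    return channels // width_per_group


def conv3d_weight_count(channels_in, channels_out, kernel_size, groups=1):
    """Number of weights (bias excluded) in a grouped 3D convolution."""
    if channels_in % groups or channels_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (channels_in // groups) * channels_out * prod(kernel_size)


channels = 64        # illustrative channel count (assumption)
kernel = (3, 3, 3)   # default stage_conv_b_kernel_size

# Ordinary convolution: every output channel sees every input channel.
dense = conv3d_weight_count(channels, channels, kernel)

# Depthwise convolution: width_per_group=1, so one group per channel.
depthwise = conv3d_weight_count(
    channels, channels, kernel, groups=groups_for_width(channels, 1)
)

print(dense, depthwise, dense // depthwise)  # 110592 1728 64
```

With one channel per group, the weight count of this layer drops by a factor equal to the channel count.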

Parameters

input_channel (int) – number of channels for the input video clip.
model_depth (int) – the depth of the resnet. Options include: 50, 101, 152.
model_num_class (int) – the number of classes for the video dataset.
dropout_rate (float) – dropout rate.
norm (callable) – a callable that constructs normalization layer.
activation (callable) – a callable that constructs activation layer.
stem_dim_out (int) – output channel size of the stem.
stem_conv_kernel_size (tuple) – convolutional kernel size(s) of stem.
stem_conv_stride (tuple) – convolutional stride size(s) of stem.
stem_pool (callable) – a callable that constructs the stem pooling layer.
stem_pool_kernel_size (tuple) – pooling kernel size(s).
stem_pool_stride (tuple) – pooling stride size(s).
stage_conv_a_kernel_size (tuple) – convolutional kernel size(s) for conv_a.
stage_conv_b_kernel_size (tuple) – convolutional kernel size(s) for conv_b.
stage_conv_b_width_per_group (int) – the width of each group for conv_b. Set it to 1 for depthwise convolution.
stage_spatial_stride (tuple) – the spatial stride for each stage.
stage_temporal_stride (tuple) – the temporal stride for each stage.
bottleneck (callable) – a callable that constructs bottleneck block layer. Examples include: create_bottleneck_block.
bottleneck_ratio (int) – the ratio between inner and outer dimensions for the bottleneck block.
head_pool (callable) – a callable that constructs resnet head pooling layer.
head_pool_kernel_size (tuple) – the pooling kernel size.
head_output_size (tuple) – the size of output tensor for head.
head_activation (callable) – a callable that constructs activation layer.
head_output_with_global_average (bool) – if True, perform global averaging on the head output.
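The stride parameters above determine how far a clip is downsampled before it reaches the head. The following is a rough sketch of that arithmetic using the default values from the signature. It assumes each convolution pads by half its kernel size, that only the strided convolution in each stage changes the resolution, and that stem_pool is None. The function names are hypothetical and are not part of pytorchvideo.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis (floor mode, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(
    frames,
    height,
    width,
    stem_kernel=(3, 7, 7),
    stem_stride=(1, 2, 2),
    stage_kernel=(3, 3, 3),
    stage_spatial_stride=(1, 2, 2, 2),
    stage_temporal_stride=(1, 2, 2, 2),
):
    """Hypothetical helper: (frames, height, width) after the stem and each stage."""
    # Stem convolution, assuming padding of kernel // 2 on every axis.
    t, h, w = (
        conv_out(n, k, s, k // 2)
        for n, k, s in zip((frames, height, width), stem_kernel, stem_stride)
    )
    shapes = [("stem", (t, h, w))]

    # One strided convolution per stage, again assuming padding of kernel // 2.
    kt, kh, kw = stage_kernel
    strides = zip(stage_spatial_stride, stage_temporal_stride)
    for index, (spatial, temporal) in enumerate(strides, start=1):
        t = conv_out(t, kt, temporal, kt // 2)
        h = conv_out(h, kh, spatial, kh // 2)
        w = conv_out(w, kw, spatial, kw // 2)
        shapes.append((f"stage{index}", (t, h, w)))
    return shapes


for name, shape in trace_shapes(4, 112, 112):
    print(name, shape)
```

Under these assumptions, a 4-frame 112x112 clip reaches the head as a 1x7x7 feature map, which lines up with the default head_pool_kernel_size of (1, 7, 7).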

Returns

(nn.Module) – the CSN model.

Return type

torch.nn.modules.module.Module