pytorchvideo.models.masked_multistream

class pytorchvideo.models.masked_multistream.MaskedTemporalPooling(method)[source]

Applies temporal pooling operations on masked inputs. For each pooling operation all masked values are ignored.

__init__(method)[source]
method (str): the method of pooling to use. Options:

  • ‘max’: takes the maximum over valid values in the temporal dimension.

  • ‘avg’: averages valid values in the temporal dimension.

  • ‘sum’: sums valid values in the temporal dimension.

Note: if all elements in a batch row are invalid, the temporal dimension is pooled to 0 values.


forward(x, mask=None)[source]
Parameters
  • x (torch.Tensor) – tensor with shape (batch_size, seq_len, feature_dim)

  • mask (torch.Tensor) – bool tensor with shape (batch_size, seq_len). Sequence elements that are False are invalid.

Returns

Tensor with shape (batch_size, feature_dim)

Return type

torch.Tensor
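
The masking rule above can be illustrated with a pure-Python sketch for a single feature column; this is only an illustration of the semantics (max/avg/sum over valid entries, 0 when nothing is valid), not the library's tensor implementation, which operates on torch tensors of shape (batch_size, seq_len, feature_dim).

```python
def masked_pool(values, mask, method):
    """Pool `values` over the temporal dimension, ignoring masked-out entries.

    values: list of floats, one per time step
    mask:   list of bools, True = valid
    """
    valid = [v for v, m in zip(values, mask) if m]
    if not valid:
        # All entries in this row are invalid: pool to 0.
        return 0.0
    if method == "max":
        return max(valid)
    if method == "avg":
        return sum(valid) / len(valid)
    if method == "sum":
        return sum(valid)
    raise ValueError(f"Unsupported method: {method}")

print(masked_pool([1.0, 5.0, 3.0], [True, False, True], "max"))  # 3.0
print(masked_pool([1.0, 5.0, 3.0], [True, False, True], "avg"))  # 2.0
```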

class pytorchvideo.models.masked_multistream.TransposeMultiheadAttention(feature_dim, num_heads=1)[source]

Wrapper for nn.MultiheadAttention that first transposes the input tensor from (batch_size, seq_len, feature_dim) to (seq_len, batch_size, feature_dim), applies the attention, and transposes the attention output back to the input shape.

__init__(feature_dim, num_heads=1)[source]
Parameters
  • feature_dim (int) – attention embedding dimension

  • num_heads (int) – number of attention heads

property attention_weights

Contains the attention weights from the last forward call.

forward(x, mask=None)[source]
Parameters
  • x (torch.Tensor) – tensor of shape (batch_size, seq_len, feature_dim)

  • mask (torch.Tensor) – bool tensor with shape (batch_size, seq_len). Sequence elements that are False are invalid.

Returns

Tensor with shape (batch_size, seq_len, feature_dim)

Return type

torch.Tensor
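
The axis swap the wrapper performs around the attention call can be sketched in pure Python on nested lists; this only illustrates the (batch, seq, feat) ↔ (seq, batch, feat) transpose described above, not the attention computation itself.

```python
def transpose_batch_seq(x):
    """Swap the first two axes of a nested-list 'tensor'."""
    return [list(rows) for rows in zip(*x)]

# One batch element with two time steps and feature_dim = 2:
x = [[[1, 2], [3, 4]]]          # shape (batch=1, seq=2, feat=2)
t = transpose_batch_seq(x)      # shape (seq=2, batch=1, feat=2)
print(t)  # [[[1, 2]], [[3, 4]]]

# Applying the transpose twice recovers the original layout, which is how
# the wrapper restores the input shape after attention.
print(transpose_batch_seq(t) == x)  # True
```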

class pytorchvideo.models.masked_multistream.LearnMaskedDefault(feature_dim, init_method='gaussian', freeze=False)[source]

Learns default values to fill invalid entries within input tensors. The invalid entries are represented by a mask, which is passed into forward alongside the input tensor. Note that the default value is used only when all entries in a batch row are invalid, not when only a portion of the entries within a batch row is invalid.

__init__(feature_dim, init_method='gaussian', freeze=False)[source]
Parameters
  • feature_dim (int) – the size of the default value parameter; this must match the input tensor size.

  • init_method (str) – the initialization method for the default value parameter. Options: ‘gaussian’, ‘zeros’

  • freeze (bool) – If True, the learned default parameter weights are frozen.

forward(x, mask)[source]
Parameters
  • x (torch.Tensor) – tensor of shape (batch_size, feature_dim).

  • mask (torch.Tensor) – bool tensor of shape (batch_size, seq_len). If all elements along the sequence dimension are False for a batch element, the learned default parameter is used for that element.

Returns

Tensor with shape (batch_size, feature_dim)

Return type

torch.Tensor
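
The fill rule can be sketched in pure Python; here the default is a fixed vector for illustration, whereas in the module it is a trainable parameter of size feature_dim, and this sketch is not the library's implementation.

```python
def fill_masked_defaults(x, mask, default):
    """Replace rows of `x` whose mask row is entirely False with `default`.

    x:       list of feature vectors, shape (batch_size, feature_dim)
    mask:    list of bool lists, shape (batch_size, seq_len)
    default: feature vector of length feature_dim
    """
    return [default if not any(row_mask) else row
            for row, row_mask in zip(x, mask)]

x = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [False, False]]  # second row is fully invalid
print(fill_masked_defaults(x, mask, [0.5, 0.5]))
# [[1.0, 2.0], [0.5, 0.5]]
```

Note that the first row keeps its original values even though one of its entries is masked: the default only applies to fully invalid rows.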

class pytorchvideo.models.masked_multistream.LSTM(dim_in, hidden_dim, dropout=0.0, bidirectional=False)[source]

Wrapper for torch.nn.LSTM that handles masked inputs.

__init__(dim_in, hidden_dim, dropout=0.0, bidirectional=False)[source]
Parameters
  • dim_in (int) – input feature dimension

  • hidden_dim (int) – hidden dimension of the LSTM layer

  • dropout (float) – dropout rate; use 0.0 for no dropout

  • bidirectional (bool) – bidirectional or forward only

forward(data, mask=None)[source]
Parameters
  • data (torch.Tensor) – tensor with shape (batch_size, seq_len, feature_dim)

  • mask (torch.Tensor) – bool tensor with shape (batch_size, seq_len). Sequence elements that are False are invalid.

Returns

Tensor with shape (batch_size, output_dim), where output_dim is determined by hidden_dim and whether the LSTM is bidirectional

Return type

torch.Tensor
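
A common way for a masked LSTM wrapper to produce a (batch_size, output_dim) result is to read, for each batch row, the output at that row's last valid time step. The following pure-Python sketch illustrates that index-selection idea; it is an assumption about the masking scheme for illustration, not the library's exact implementation.

```python
def last_valid_index(mask_row):
    """Index of the last True entry; falls back to 0 if none are valid."""
    for i in range(len(mask_row) - 1, -1, -1):
        if mask_row[i]:
            return i
    return 0

def select_last_valid(outputs, mask):
    """outputs: nested lists of shape (batch, seq, dim) -> (batch, dim)."""
    return [row[last_valid_index(m)] for row, m in zip(outputs, mask)]

# Two of three time steps are valid, so the output at index 1 is selected.
print(select_last_valid([[[1], [2], [3]]], [[True, True, False]]))  # [[2]]
```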

class pytorchvideo.models.masked_multistream.TransposeTransformerEncoder(dim_in, num_heads=1, num_layers=1)[source]

Wrapper for torch.nn.TransformerEncoder that handles masked inputs.

__init__(dim_in, num_heads=1, num_layers=1)[source]
Parameters
  • dim_in (int) – input feature dimension

  • num_heads (int) – number of heads in the nn.MultiHeadAttention layers

  • num_layers (int) – the number of sub-encoder-layers in the encoder

forward(data, mask=None)[source]
Parameters
  • data (torch.Tensor) – tensor with shape (batch_size, seq_len, feature_dim)

  • mask (torch.Tensor) – bool tensor with shape (batch_size, seq_len). Sequence elements that are False are invalid.

Returns

Tensor with shape (batch_size, feature_dim)

Return type

torch.Tensor

class pytorchvideo.models.masked_multistream.MaskedSequential(*args)[source]

A sequential container that overrides forward to take a mask as well as the usual input tensor. The mask is only passed to the modules in _MASK_MODULES (those that accept a mask argument).
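
The dispatch rule (pass the mask only to mask-aware modules) can be sketched as follows; the names here (`MASK_AWARE`, the toy modules) are illustrative stand-ins, not the library's actual identifiers.

```python
MASK_AWARE = set()  # stands in for _MASK_MODULES

def mask_aware(cls):
    """Register a module type as one that accepts a mask argument."""
    MASK_AWARE.add(cls)
    return cls

@mask_aware
class MaskedScale:
    def __call__(self, x, mask):
        # Zero out invalid entries.
        return [v if m else 0.0 for v, m in zip(x, mask)]

class Double:
    def __call__(self, x):
        return [2 * v for v in x]

def masked_sequential(modules, x, mask):
    """Apply modules in order, passing `mask` only to mask-aware ones."""
    for module in modules:
        if type(module) in MASK_AWARE:
            x = module(x, mask)
        else:
            x = module(x)
    return x

out = masked_sequential([MaskedScale(), Double()], [1.0, 2.0], [True, False])
print(out)  # [2.0, 0.0]
```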

class pytorchvideo.models.masked_multistream.MaskedMultiPathWay(*, multipathway_blocks, multipathway_fusion)[source]

Masked multi-pathway is composed of a list of stream nn.Modules followed by a fusion nn.Module that reduces these streams. Each stream module takes a mask and input tensor.

Pathway 1   ...   Pathway N
    ↓                ↓
 Block 1    ...   Block N
    ↓                ↓
    └─────Fusion─────┘
__init__(*, multipathway_blocks, multipathway_fusion)[source]
Parameters
  • multipathway_blocks (nn.module_list) – list of models from all pathways.

  • multipathway_fusion (nn.module) – fusion model.

Return type

None
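
The flow in the diagram above (one block per pathway, then a fusion step over the per-pathway outputs) can be sketched with toy callables standing in for the nn.Modules; this is only an illustration of the structure, not the library's implementation.

```python
def masked_multipathway(blocks, fusion, inputs, masks):
    """Apply each pathway's block to its own (input, mask) pair, then fuse.

    blocks: one callable per pathway, each taking (x, mask)
    fusion: callable reducing the list of per-pathway outputs
    """
    outs = [block(x, m) for block, x, m in zip(blocks, inputs, masks)]
    return fusion(outs)

# Toy stand-ins: each pathway sums its valid entries; fusion keeps the list.
sum_valid = lambda x, mask: sum(v for v, m in zip(x, mask) if m)
keep_all = lambda outs: outs

print(masked_multipathway(
    [sum_valid, sum_valid],
    keep_all,
    [[1.0, 2.0], [3.0, 4.0]],
    [[True, True], [True, False]],
))  # [3.0, 3.0]
```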
