Pad and Pack Variable-Length Sequences in PyTorch (pad_packed_sequence)

There are many kinds of sequence data, but here we will look at text data, the kind most commonly used in natural language processing. Text sequences differ in length; that is, sequence data has a variable length.

Now, let's look at an example. Consider three text sequences made up of words; within a single batch, their lengths differ.

If you want to pass these sequences to a recurrent neural network, you have to pad all of the sequences in the batch (typically with 0s) to the maximum sequence length, which here is 3.
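As a minimal sketch in plain Python (no PyTorch), padding at the word level can look like this. The `<pad>` token string and the `pad_batch` helper are illustrative names, not part of any library.

```python
def pad_batch(token_lists, pad_token="<pad>"):
    """Right-pad every token list to the length of the longest one."""
    max_len = max(len(tokens) for tokens in token_lists)
    return [tokens + [pad_token] * (max_len - len(tokens)) for tokens in token_lists]


docs = ["on your mark", "get set", "go"]
batch = [doc.split(" ") for doc in docs]
padded = pad_batch(batch)
for row in padded:
    print(row)
# ['on', 'your', 'mark']
# ['get', 'set', '<pad>']
# ['go', '<pad>', '<pad>']
```

Every row now has length 3, so the batch can be stacked into a (3, 3) matrix once the words are mapped to indices.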

Problem With Padding

For the sake of understanding, let's also assume that we matrix-multiply the padded batch of sequences, of shape (3, 3), with a weight matrix W that has a single column.

Thus, we would have to perform 3 × 3 = 9 multiplications and 3 × 2 = 6 additions (n_rows × (n_cols − 1)), even though every product involving a pad is 0: 3 of the 9 multiplications and 3 of the 6 additions are spent on padding and their results are effectively thrown away.
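The arithmetic can be checked with a short script. It assumes the sequence lengths 3, 2 and 1 from the example and a W with a single column, so a row of length n costs n multiplications and n − 1 additions; `matmul_ops` is a hypothetical helper written for this illustration.

```python
def matmul_ops(row_lengths):
    """Count multiplications and additions for multiplying each row by a one-column W."""
    mults = sum(row_lengths)
    adds = sum(n - 1 for n in row_lengths if n > 0)
    return mults, adds


lengths = [3, 2, 1]   # real (unpadded) lengths in the batch
max_len = max(lengths)

padded_ops = matmul_ops([max_len] * len(lengths))  # every row padded to length 3
packed_ops = matmul_ops(lengths)                   # only the real entries

print("padded:", padded_ops)  # padded: (9, 6)
print("packed:", packed_ops)  # packed: (6, 3)
```

Without padding, only 6 multiplications and 3 additions are needed for the same useful result.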

Packed Sequences

The second method is packing. Here, no padding is stored; instead, the length of each sequence is recorded alongside the data.
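To make "store the lengths instead of the padding" concrete, here is a plain-Python sketch of the layout that PyTorch's PackedSequence uses: sequences are sorted by length (longest first), the real elements are laid out time step by time step, and `batch_sizes[t]` records how many sequences are still active at step t. The `pack` function is a simplified illustration, not PyTorch's implementation.

```python
def pack(sequences):
    """Return (data, batch_sizes, sorted_indices) for a list of variable-length lists."""
    # Sort sequence indices by length, longest first
    sorted_indices = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
    ordered = [sequences[i] for i in sorted_indices]

    data, batch_sizes = [], []
    for t in range(len(ordered[0])):
        # Only sequences longer than t contribute an element at time step t
        step = [seq[t] for seq in ordered if len(seq) > t]
        data.extend(step)
        batch_sizes.append(len(step))
    return data, batch_sizes, sorted_indices


data, batch_sizes, sorted_indices = pack([["get", "set"], ["on", "your", "mark"], ["go"]])
print(data)            # ['on', 'get', 'go', 'your', 'set', 'mark']
print(batch_sizes)     # [3, 2, 1]
print(sorted_indices)  # [1, 0, 2]
```

The packed data holds exactly 3 + 2 + 1 = 6 elements, with no pads.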

You can imagine how much compute (and ultimately cost, energy, and time) packing saves on large batches with millions of entries, especially when many systems around the world run such workloads again and again.

With a PackedSequence, the computation is parallelized over exactly the elements you need, with no work spent on padding. In this tutorial, we will learn how to create a PackedSequence and a padded Tensor so that sequence batches can be fed to RNN/LSTM-family models.

I created some simple example data for practice. The thing to remember here is that the batch size is 3 and the longest sequence has length 3.

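docs = ['on your mark',
        'get set',
        'go']
# Vocabulary (assumed; not shown in the original): index 0 is reserved for padding
word2idx = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.split()}), start=1)}
vocab_size = len(word2idx) + 1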

Pad Sequences Using the pad_sequence() Function

To form one batch, padding is appended to the end of each sequence up to the length of the longest sequence. This is the most common padding scheme, and it is easy to apply with PyTorch's built-in function torch.nn.utils.rnn.pad_sequence.

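import torch
from torch.nn.utils.rnn import pad_sequence

# Convert each sentence into a 1-D tensor of word indices
x = [torch.LongTensor([word2idx[word] for word in seq.split(" ")]) for seq in docs]
x_padded = pad_sequence(x, batch_first=True, padding_value=0)  # shape (3, 3)

print(x_padded)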

By default, the padding value is 0, but you can pad with a different value by passing a parameter such as padding_value=42. pad_sequence returns a Tensor of shape (T, batch_size, a, b, …), where T is the length of the longest sequence in the batch. If you explicitly pass batch_first=True, it instead returns a Tensor of shape (batch_size, T, a, b, …).
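A small sketch of these options, using two hypothetical index sequences so that T (3) and batch_size (2) differ and the two layouts are easy to tell apart:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two hypothetical index sequences of lengths 3 and 2 (batch_size=2, T=3)
seqs = [torch.tensor([4, 6, 3]), torch.tensor([1, 5])]

time_major = pad_sequence(seqs)                     # default: batch_first=False
batch_major = pad_sequence(seqs, batch_first=True)  # batch dimension first
custom = pad_sequence(seqs, batch_first=True, padding_value=42)

print(time_major.shape)      # torch.Size([3, 2])  -> (T, batch_size)
print(batch_major.shape)     # torch.Size([2, 3])  -> (batch_size, T)
print(batch_major.tolist())  # [[4, 6, 3], [1, 5, 0]]
print(custom.tolist())       # [[4, 6, 3], [1, 5, 42]]

# Trailing dimensions (the a, b, ... above) are carried through unchanged
features = [torch.zeros(3, 8), torch.zeros(2, 8)]
print(pad_sequence(features, batch_first=True).shape)  # torch.Size([2, 3, 8])
```

The two layouts hold the same values; the batch-first result is simply the transpose of the default one.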

Creating a PackedSequence with pack_sequence and pack_padded_sequence

Unlike the approach above, a PackedSequence does not pad every sequence to the maximum length. It is a PyTorch data structure that lets the model operate only up to the exact length of each sequence, without padding. Note that pack_sequence takes a list of Tensors (a Python list, not a single Tensor), whereas pack_padded_sequence, used below, takes the padded Tensor together with the original sequence lengths.
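For instance, pack_sequence can be applied directly to a list of unpadded tensors. The word indices below are hypothetical; enforce_sorted=False lets the list arrive in any order.

```python
import torch
from torch.nn.utils.rnn import pack_sequence

# A plain Python list of 1-D tensors (hypothetical word indices), not a padded tensor
x = [torch.tensor([4, 6, 3]), torch.tensor([1, 5]), torch.tensor([2])]

packed = pack_sequence(x, enforce_sorted=False)

print(packed.data.tolist())         # [4, 1, 2, 6, 5, 3]  (time step by time step)
print(packed.batch_sizes.tolist())  # [3, 2, 1]  (active sequences per time step)
```

No padding is stored: `data` holds exactly 3 + 2 + 1 = 6 elements.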

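import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Original (unpadded) length of each sequence: tensor([3, 2, 1])
seq_len = torch.LongTensor(list(map(len, x)))
print(seq_len)

embedding_dim = 4  # illustrative value; the original does not specify it
embed = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_size=5, batch_first=True)

embedding_seq_tensor = embed(x_padded)  # shape (3, 3, embedding_dim)
print(embedding_seq_tensor)

# lengths must be on the CPU; enforce_sorted=False accepts a batch in any length order
packed_input = pack_padded_sequence(embedding_seq_tensor, seq_len.cpu(), batch_first=True, enforce_sorted=False)
print(packed_input.data.shape)  # torch.Size([6, 4]): 3 + 2 + 1 real time steps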

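Here, seq_len holds the length of each individual sequence before padding. Because enforce_sorted=False, the sequences do not have to be sorted by length; PyTorch sorts them internally and restores the original order when unpacking.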
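To see what pack_padded_sequence keeps and what it records, here is a sketch on a tiny hand-made batch whose rows are deliberately not sorted by length. The pad values (0.0) are dropped, and the permutation used for sorting is stored in `sorted_indices` and `unsorted_indices`.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Padded batch (batch_first) whose rows are NOT sorted by length
padded = torch.tensor([[1.0, 0.0, 0.0],
                       [2.0, 3.0, 0.0],
                       [4.0, 5.0, 6.0]])
lengths = torch.tensor([1, 2, 3])  # lengths before padding, on the CPU

packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

print(packed.data.tolist())              # [4.0, 2.0, 1.0, 5.0, 3.0, 6.0]
print(packed.batch_sizes.tolist())       # [3, 2, 1]
print(packed.sorted_indices.tolist())    # [2, 1, 0]  (longest row first)
print(packed.unsorted_indices.tolist())  # [2, 1, 0]  (maps back to the original order)
```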

Feed packed sequence in RNN

Now that I have created a PackedSequence from the padded Tensor, I am going to test it by feeding it as input to the RNN (the LSTM defined above).

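from torch.nn.utils.rnn import pad_packed_sequence

# The LSTM accepts a PackedSequence directly and returns one
packed_output, (ht, ct) = lstm(packed_input)
print(packed_output.data.shape)  # torch.Size([6, 5]): 6 real time steps, hidden_size=5

# Unpack back into a padded Tensor of shape (batch, T, hidden_size)
output, input_sizes = pad_packed_sequence(packed_output, batch_first=True)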

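We pack the batch so that the RNN never processes the padded positions. Otherwise the pads would be fed through the recurrence, wasting computation, and the final hidden state of each shorter sequence would be taken after its pad steps instead of at its last real token.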
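A sketch that makes this difference visible, using a randomly initialized LSTM and random inputs (all sizes here are arbitrary choices for illustration). With packing, the returned final hidden state equals the output at each sequence's last real step; with a plain padded input, the final hidden state is simply the output at the last (possibly padded) position.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=5, batch_first=True)

lengths = torch.tensor([3, 2, 1])
padded = torch.randn(3, 3, 4)
for i, n in enumerate(lengths.tolist()):
    padded[i, n:] = 0.0  # zero-filled padding after each real sequence

# Packed run: the LSTM stops at each sequence's real length
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_packed, _) = lstm(packed)
out_packed, _ = pad_packed_sequence(packed_out, batch_first=True)

# Padded run: the pad steps are fed through the recurrence as well
out_padded, (h_padded, _) = lstm(padded)

last_real = out_packed[torch.arange(3), lengths - 1]   # output at each last real step
print(torch.allclose(h_packed[-1], last_real))          # True
print(torch.allclose(h_padded[-1], out_padded[:, -1]))  # True: taken after the pads
```

For this unidirectional LSTM, the outputs at real positions agree between the two runs; what differs is which step the final hidden state comes from.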

Pad packed sequence

pad_packed_sequence() is the inverse operation of pack_padded_sequence(): it pads a packed batch of variable-length sequences.

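output, input_sizes = pad_packed_sequence(packed_output, batch_first=True)
print(output.shape)  # torch.Size([3, 3, 5]): (batch, T, hidden_size)
print(input_sizes)   # tensor([3, 2, 1])
print(ht[-1])        # final hidden state of the last layer for each sequence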

The returned Tensor has size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data is transposed into B x T x * format. pad_packed_sequence also returns a Tensor containing the length of each sequence in the batch. Batch elements are restored to the order they had when the batch was passed to pack_padded_sequence or pack_sequence.
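A round-trip sketch illustrates the reordering: an unsorted list of hypothetical sequences is packed and then unpacked, and the result matches what pad_sequence produces directly, with the lengths reported in the original order.

```python
import torch
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence, pad_sequence

# Hypothetical sequences in an arbitrary (unsorted) order
seqs = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0, 5.0]), torch.tensor([6.0])]

packed = pack_sequence(seqs, enforce_sorted=False)
unpacked, lengths = pad_packed_sequence(packed, batch_first=True)

print(unpacked.tolist())  # [[1.0, 2.0, 0.0], [3.0, 4.0, 5.0], [6.0, 0.0, 0.0]]
print(lengths.tolist())   # [2, 3, 1]  (original order restored)
print(torch.equal(unpacked, pad_sequence(seqs, batch_first=True)))  # True
```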