In this tutorial, we’re going to learn about a more advanced type of RNN: the bidirectional LSTM. It’s all about information flowing both left to right and right to left.
Sentiment classification is the task where you have some kind of input sentence, such as “the movie was terribly exciting !”, and you want to classify it as having positive or negative sentiment. In this example, it should be classified as positive. You might try to solve sentiment classification using a fairly simple RNN model that reads the sentence from left to right and produces a hidden state for each word.
Consider the hidden state the RNN produces at the word “terribly”. The idea is that it’s a representation of the word “terribly” in the context of the sentence. The thing to notice is that this contextual representation only contains information about the left context. It hasn’t seen the word “exciting” or the exclamation mark. What about the right context?
In this example, the right context is important because we’ve got the phrase “terribly exciting”. If you look at the word “terribly” in isolation, it usually means something bad. But “terribly exciting” means something good, because it just means very exciting. If you know about the right context, the word “exciting” might quite significantly modify your perception of the meaning of the word “terribly” in the context of the sentence.
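The left-context limitation described above can be checked with a toy forward RNN. This is only a minimal sketch: the one-number “embeddings” in EMBED and the weights w_x and w_h are made up for illustration, and a real model would learn vectors and weight matrices instead.

```python
import math

# Hypothetical one-number "embeddings", chosen only for illustration.
EMBED = {"the": 0.1, "movie": 0.3, "was": -0.2, "terribly": -0.9,
         "exciting": 0.8, "boring": -0.7, "!": 0.5}

def forward_rnn(words, w_x=0.6, w_h=0.4):
    """Return one hidden state per word, reading left to right."""
    h = 0.0
    states = []
    for word in words:
        # Each state depends only on the current word and the state to its left.
        h = math.tanh(w_x * EMBED[word] + w_h * h)
        states.append(h)
    return states

a = "the movie was terribly exciting !".split()
b = "the movie was terribly boring !".split()
idx = a.index("terribly")

h_a = forward_rnn(a)
h_b = forward_rnn(b)

# The state at "terribly" is identical in both sentences,
# because the word that differs sits to its right.
print(h_a[idx] == h_b[idx])
# The states only start to differ from the next position onwards.
print(h_a[idx + 1] == h_b[idx + 1])
```

The first comparison holds because both sentences share the same words up to and including “terribly”, so the forward RNN cannot tell “terribly exciting” from “terribly boring” at that position.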
Using a Bidirectional RNN in Practice
The idea is that you have two RNNs running. The forward RNN, as before, encodes the sentence left to right. Separately, you also have a backward RNN, which has completely separate weights from the forward RNN. The backward RNN does the same thing, except that it encodes the sequence from right to left, so each of its hidden states is computed based on the one to its right. Finally, you take the hidden states from the two RNNs and concatenate them to get your final representations.
If we think about the contextual representation of the word “terribly”, this vector now has information from both the left and the right, because the forward and backward RNNs respectively carry information from the left context and from the right context.
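The forward pass, backward pass, and concatenation described above can be sketched in plain Python. The toy embeddings in EMBED and both sets of weights are assumptions made up for this illustration; this is not how Keras implements its Bidirectional layer, only the same idea at scalar size.

```python
import math

# Hypothetical one-number "embeddings", chosen only for illustration.
EMBED = {"the": 0.1, "movie": 0.3, "was": -0.2, "terribly": -0.9,
         "exciting": 0.8, "boring": -0.7, "!": 0.5}

def run_rnn(words, w_x, w_h):
    """Run a toy RNN over words in the order given; return all hidden states."""
    h, states = 0.0, []
    for word in words:
        h = math.tanh(w_x * EMBED[word] + w_h * h)
        states.append(h)
    return states

def bidirectional_rnn(words):
    # Forward RNN: reads the sentence left to right.
    fwd = run_rnn(words, w_x=0.6, w_h=0.4)
    # Backward RNN: separate (made-up) weights, reads right to left.
    # Reverse the result so that position i lines up with word i again.
    bwd = run_rnn(words[::-1], w_x=-0.5, w_h=0.7)[::-1]
    # Concatenate the two states at every position.
    return list(zip(fwd, bwd))

a = "the movie was terribly exciting !".split()
b = "the movie was terribly boring !".split()
idx = a.index("terribly")

rep_a = bidirectional_rnn(a)
rep_b = bidirectional_rnn(b)

# Forward half at "terribly": same in both sentences (left context only).
print(rep_a[idx][0] == rep_b[idx][0])
# Backward half at "terribly": differs, because it has seen the word to the right.
print(rep_a[idx][1] == rep_b[idx][1])
```

Here the representation at “terribly” is a pair of numbers rather than a single one, and its backward half is what lets the two sentences be told apart at that position.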
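import tensorflow as tf

# vocab_size is assumed to be defined earlier, e.g. from the tokenizer.
model_multi_bi = tf.keras.Sequential()
model_multi_bi.add(tf.keras.layers.Embedding(vocab_size, 16))
# Stacked bidirectional LSTMs: all but the last return the full sequence of hidden states.
model_multi_bi.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)))
model_multi_bi.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=True)))
# The last bidirectional layer returns a single vector for the whole sequence.
model_multi_bi.add(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)))
# Sigmoid output for binary (positive/negative) sentiment.
model_multi_bi.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
model_multi_bi.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])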
These concatenated hidden states can be regarded as the outputs of the bidirectional RNN. If you’re going to use the hidden states for any kind of further computation, it’s these concatenated hidden states that you pass on to the next part of the network.
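Because the forward and backward states are concatenated, each bidirectional layer in the model above hands a vector twice the size of its LSTM on to the next layer. The arithmetic below is a rough sketch of the resulting sizes and weight counts. It assumes a standard LSTM cell with four gates and one bias vector per gate, and the helper names are made up here; the numbers are derived from that formula rather than copied from a training run.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def bidirectional_lstm_params(input_dim, units):
    # The forward and backward LSTMs have completely separate weights.
    return 2 * lstm_params(input_dim, units)

embedding_dim, units = 16, 32
out_dim = 2 * units  # forward and backward states concatenated

layers = [
    # First layer sees the 16-dimensional embeddings.
    ("bi_lstm_1", bidirectional_lstm_params(embedding_dim, units)),
    # Later layers see the concatenated output of the layer before.
    ("bi_lstm_2", bidirectional_lstm_params(out_dim, units)),
    ("bi_lstm_3", bidirectional_lstm_params(out_dim, units)),
    # Dense(1): one weight per input feature plus one bias.
    ("dense", out_dim * 1 + 1),
]

print("output size per bidirectional layer:", out_dim)
for name, count in layers:
    print(name, count)
```

Under those assumptions, the second and third bidirectional layers are larger than the first only because their input is the concatenated output rather than the embedding.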
When to Use Bidirectional LSTM
Bidirectional RNNs only apply when you have access to the entire input sequence, and there are some situations where you can’t assume this. For example, in language modeling you only have access to the left context: you don’t know what’s coming next, so you don’t have the full sequence. However, if you do have access to the entire sequence, for example when you’re doing any kind of encoding similar to the sentiment example, then bidirectionality is pretty powerful. You should probably regard it as a good thing to do by default, because getting information from both the left and the right context gives each word a fuller contextual representation.