In this post, we will develop a neural network for a word-based language model. We will give it text and ask it to predict the next word in the sequence.

We take an RNN cell and unroll it a certain number of times. We put our text on the inputs, and it produces some text on the outputs. To train it, we force the output sequence to be the **same text shifted by one word**.
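Concretely, the target sequence is just the input sequence shifted by one word. A minimal sketch in plain Python (the sample sentence and whitespace tokenization are illustrative):

```python
# Build (input, target) training pairs: the target is the same
# text shifted by one word.
text = "the quick brown fox jumps over the lazy dog"
words = text.split()

inputs = words[:-1]   # all words except the last
targets = words[1:]   # the same words, shifted by one

# At every position i, the network sees inputs[i] and must predict targets[i].
for x, y in zip(inputs, targets):
    print(x, "->", y)
```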

## RNN Cell

TensorFlow has a full API for working with RNNs. Instantiating an RNN cell defines all the weights and biases internally; the one parameter you need to specify is the internal size of the vectors in this cell.

```python
cell = tf.nn.rnn_cell.GRUCell(CELLSIZE)  # CELLSIZE = 512
```

## Deep RNN

There is a function for building a deep RNN. It's called `MultiRNNCell`.

You pass it a cell and specify how many times you want to stack it. It creates a new cell for you, which is the stacked cell.

```python
mCell = tf.nn.rnn_cell.MultiRNNCell([cell] * NLAYERS, state_is_tuple=False)
```

## Dynamic RNN

The last thing we do is unroll it using the dynamic RNN function. You give it the stacked cell that you just produced; how many times it unrolls depends on the shape of `X`.

```python
Hr, H = tf.nn.dynamic_rnn(mCell, X, initial_state=Hin)
```

Here my input `X` is a sequence of words. If I feed it a sequence of five words, my network will be unrolled five times.

Dynamic RNN doesn't physically duplicate this piece of the graph in memory. It uses a graph node which is a for loop, and TensorFlow manages to compute gradients and backpropagate across that loop. Because the loop itself is a node in your computation graph, at training time the network can be unrolled five, six or seven times as needed. It's very flexible.
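The effect of that in-graph loop can be sketched in plain numpy: one cell function applied repeatedly, carrying the state forward. This is not the GRU equations, just a stand-in `tanh` cell to show what "unrolling" means (all names and sizes here are illustrative):

```python
import numpy as np

CELLSIZE, INPUTSIZE, SEQLEN = 4, 3, 5

rng = np.random.default_rng(0)
Wx = rng.normal(size=(INPUTSIZE, CELLSIZE))
Wh = rng.normal(size=(CELLSIZE, CELLSIZE))

def cell(x, h):
    # A toy recurrent cell: new state from current input and previous state.
    return np.tanh(x @ Wx + h @ Wh)

X = rng.normal(size=(SEQLEN, INPUTSIZE))   # one example, SEQLEN time steps
h = np.zeros(CELLSIZE)                     # initial state (Hin)
outputs = []
for t in range(SEQLEN):                    # the "unrolling" is just this loop
    h = cell(X[t], h)
    outputs.append(h)

Hr = np.stack(outputs)                     # all intermediate states
H = h                                      # last state, fed back next time
```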

In our case this flexibility is not crucial, because all of our input sequences have the same length: we take chunks of 30 words of our text. But if you are working with sequences of different lengths, there is nothing special to do: you just pass the length of each sequence, and dynamic RNN unrolls your cell as many times as needed, on a per-example basis.
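The per-example behavior can be mimicked in numpy by freezing each example's state once its declared length is reached. This is only a sketch of the idea, not the real implementation (the `tanh` update is a stand-in for the cell):

```python
import numpy as np

BATCH, SEQLEN, CELLSIZE = 3, 5, 4
lengths = np.array([2, 5, 3])              # per-example sequence lengths
states = np.zeros((BATCH, CELLSIZE))

for t in range(SEQLEN):
    new_states = np.tanh(states + 1.0)     # stand-in for the cell update
    still_running = (t < lengths)[:, None] # which examples are still unrolling
    states = np.where(still_running, new_states, states)
# Each example's state received exactly `lengths[i]` updates.
```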

I will need to apply a softmax activation layer on the outputs `Hr` to obtain my result words. I also get the last state `H`, which is the state that I will need to pass back in at the beginning as I continue my training.

## Softmax Layer

Look at the output states H0 to H4 that we get in `Hr`. Their shape is **[BATCHSIZE, SEQLEN, CELLSIZE]**: one state vector per batch example and per unroll step, where CELLSIZE is the internal size of the cell.

We process each of those `H` vectors through the softmax activation layer in exactly the same way, whether it comes from one of the iterations of the cell or from one of the batch examples. So the only thing we do is reshape them into one big bag of vectors, of shape [BATCHSIZE x SEQLEN, CELLSIZE]: all the `H` vectors across all the examples in our batch of 100 go into the same bag.

```python
Hf = tf.reshape(Hr, [-1, CELLSIZE])  # [BATCHSIZE x SEQLEN, CELLSIZE]
```
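In numpy terms, the reshape just merges the batch and sequence dimensions; `-1` tells it to infer BATCHSIZE x SEQLEN (sizes are the ones used in the text):

```python
import numpy as np

BATCHSIZE, SEQLEN, CELLSIZE = 100, 30, 512
Hr = np.zeros((BATCHSIZE, SEQLEN, CELLSIZE))

# Merge the first two dimensions into one "bag of vectors".
Hf = Hr.reshape(-1, CELLSIZE)
```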

With just this reshape operation, we can now add a softmax activation layer, defined using the layers interface. A weight matrix and a bias vector compute the weighted sums, and feeding those through the softmax activation function treats every H-to-Y transition as one little cell. It simply processes more of them, some coming from different batch examples and some from different iterations of the cell, but that makes no difference: it produces all of my outputs in the same way.

```python
Ylogits = tf.layers.dense(Hf, ALPHASIZE)  # linear layer: weighted sums + bias
Y = tf.nn.softmax(Ylogits)
```
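The softmax step itself is simple to state: exponentiate each row of logits and normalize so each row becomes a probability distribution over the vocabulary. A minimal numpy version of the `Y = softmax(Ylogits)` step:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability, then normalize each row.
    z = np.exp(logits - logits.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

Ylogits = np.array([[2.0, 1.0, 0.1],
                    [0.0, 0.0, 0.0]])
Y = softmax(Ylogits)
```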

Now that I have my predictions, I can compute my loss, because I know what I wanted to have on the outputs: that was the goal.

```python
loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
```
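What this loss computes per example is the cross-entropy between the one-hot target and the log-softmax of the logits. A numpy sketch of the same quantity (the helper name mirrors the TensorFlow op for readability):

```python
import numpy as np

def softmax_cross_entropy_with_logits(logits, labels):
    # Numerically stable log-softmax, then the cross-entropy sum per example.
    z = logits - logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)

Ylogits = np.array([[5.0, 0.0, 0.0]])
Y_ = np.array([[1.0, 0.0, 0.0]])   # one-hot target: the correct word
loss = softmax_cross_entropy_with_logits(Ylogits, Y_)
# Confident, correct prediction -> small loss.
```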

## Prediction

To the predictions of my model I apply `argmax`, which, given a vector of probabilities, tells me which element is the biggest and returns the index of that element.

Remember that at this point, all the batch examples and all the outputs coming from the different unroll steps are still in the same bag.

```python
predictions = tf.argmax(Y, 1)
predictions = tf.reshape(predictions, [BATCHSIZE, -1])
```

I need to reshape it to get back to [BATCHSIZE, SEQLEN]: a nice matrix where each line holds one sequence of predicted words, and the next line holds the predicted sequence for the next example in the batch.
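The same two steps in numpy, on a small flattened batch (sizes here are illustrative): `argmax` picks the most probable word index per row, then the reshape restores the [BATCHSIZE, SEQLEN] layout:

```python
import numpy as np

BATCHSIZE, SEQLEN, ALPHASIZE = 2, 3, 4
rng = np.random.default_rng(1)
Y = rng.random((BATCHSIZE * SEQLEN, ALPHASIZE))   # flattened probabilities

predictions = Y.argmax(axis=1)                    # best word index per row
predictions = predictions.reshape(BATCHSIZE, -1)  # back to [BATCHSIZE, SEQLEN]
```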

## Batching

I need to batch this information correctly. Let's consider the first batch of sequences. I have a text and I extract 30-word sequences from it. Let's say the first sequence is "the quick…". These are the inputs to my recurrent neural network, unrolled here 4 times.

For the next batch, I need to pass in the output state that I obtained when I fed in "the quick…", at the position where the network processes the continuation of that sequence.

The sequences in the second batch have to continue, line by line, the sequences in the first batch, and the third batch has to continue the second, and so on, because the internal state obtained on one line is passed back in as the initial state for the same line of the next batch.
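This batching scheme can be sketched in plain Python: cut the text into BATCHSIZE parallel streams, then emit consecutive SEQLEN-word windows from each stream, so that line i of one batch is continued by line i of the next batch. A simplified version of the idea (not the exact code from the post):

```python
def continuous_batches(words, batchsize, seqlen):
    # Split the text into `batchsize` parallel streams of words.
    stream_len = len(words) // batchsize
    streams = [words[i * stream_len:(i + 1) * stream_len]
               for i in range(batchsize)]
    # Each batch takes the next seqlen-word window from every stream, so
    # line i of one batch is the continuation of line i of the previous one.
    for start in range(0, stream_len - seqlen + 1, seqlen):
        yield [s[start:start + seqlen] for s in streams]

words = [f"w{i}" for i in range(24)]
batches = list(continuous_batches(words, batchsize=2, seqlen=3))
```

With this layout, the state computed on line i of batch k is the correct initial state for line i of batch k+1.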