In this tutorial, we are going to build a machine translation seq2seq (encoder-decoder) model in TensorFlow. The objective of this seq2seq model is to translate English sentences into German sentences.

After training the model, you will be able to input an English sentence, such as “I am a student”, and receive the German translation: “Ich bin ein Student”.

Prepare Translation DataSet


In this tutorial, we will use an English to German dataset from the http://www.manythings.org/anki/ website. Download the dataset and decompress it. You will have a deu.txt file that contains pairs of English and German phrases, one pair per line, with a tab separating the two languages.
After downloading the dataset, here are the steps we’ll take to prepare the data:

  • First, load the data in a way that preserves the Unicode German characters.
  • Split the loaded text by line and then by phrase.
  • Clean the sentences by removing special characters.
  • Create a word index and reverse word index (dictionaries mapping word → id and id → word).
  • After cleaning, remove sentences with fewer than 2 words or more than 30 words from the dataset.

Any word that appears at least once in the English or German text is added to the vocabulary. New English and German files are then created using only the words in this vocabulary; any word not found in it is replaced by <UNK> in both files. This set acts as the training set. Each word in the vocabulary is also mapped to an integer id for identification purposes.
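The vocabulary step above can be sketched in plain Python as follows. The function names, the `min_count` parameter, and the particular ids reserved for the special tokens are illustrative assumptions, not taken from the repo:

```python
from collections import Counter

def build_vocab(sentences, min_count=1):
    """Count words and keep those appearing at least min_count times."""
    counts = Counter(w for s in sentences for w in s.split())
    words = sorted(w for w, c in counts.items() if c >= min_count)
    # Reserve ids for the special tokens used later in the pipeline
    # (the exact id assignment here is an assumption).
    vocab = {"<pad>": 0, "<UNK>": 1, "<go>": 2, "<eos>": 3}
    for w in words:
        vocab[w] = len(vocab)
    return vocab

def replace_unknowns(sentence, vocab):
    """Replace out-of-vocabulary words with <UNK>."""
    return " ".join(w if w in vocab else "<UNK>" for w in sentence.split())

vocab = build_vocab(["i am a student", "i am fine"])
print(replace_unknowns("i am a teacher", vocab))  # i am a <UNK>
```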

The dataset is then sorted by the number of words in the English sentences, to reduce the amount of padding required within each training batch. Please visit the GitHub repo for the detailed implementation and code.
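Sorting by source length is a one-liner; a minimal sketch (with made-up example pairs):

```python
# Hypothetical parallel data: (english, german) pairs.
pairs = [
    ("i am a student", "ich bin ein student"),
    ("hi", "hallo"),
    ("how are you", "wie geht es dir"),
]

# Sort by the number of English words so each batch holds sentences of
# similar length and therefore needs less padding.
pairs.sort(key=lambda pair: len(pair[0].split()))
print([en for en, de in pairs])
# ['hi', 'how are you', 'i am a student']
```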

Data Input Pipeline


The Estimator's input_fn function creates and returns the TF placeholders the model is built on.

The source placeholder will be fed with English sentence data, and its shape is [None, None]. The first None is the batch size, which is unknown since the user can set it. The second None is the length of the sentences: the maximum sentence length differs from batch to batch, so it cannot be set to an exact number. The targets placeholder is similar, except that it will be fed with German sentence data.
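A minimal sketch of these two placeholders, written against the TF 1.x-style API via `tf.compat.v1` so it also runs under TensorFlow 2 (the placeholder names are assumptions):

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()  # placeholders require graph mode

# [batch_size, max_sentence_length]; both vary, so both dims are None.
source = tf.placeholder(tf.int32, shape=[None, None], name="source")
targets = tf.placeholder(tf.int32, shape=[None, None], name="targets")
```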

Estimator Feed Data Function

Set the length of every sentence to the maximum length across all sentences in the batch; for shorter sentences, you need to pad with the special <pad> token.
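In plain Python, the padding step looks roughly like this (the function name is illustrative):

```python
def pad_batch(batch, pad_token="<pad>"):
    """Pad every sentence to the length of the longest sentence in the batch."""
    max_len = max(len(sentence) for sentence in batch)
    return [s + [pad_token] * (max_len - len(s)) for s in batch]

batch = [["i", "am", "a", "student"], ["hi"]]
print(pad_batch(batch))
# [['i', 'am', 'a', 'student'], ['hi', '<pad>', '<pad>', '<pad>']]
```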

Build NMT Seq2Seq Model


An encoder converts a source sentence into a “meaning” vector, which is passed to a decoder to produce a translation. You have two recurrent neural networks connected back to back: one is called the encoder and the other the decoder. You feed an English sentence into the encoder, then feed the encoder's output state into the decoder, and the decoder generates a German sentence.

Encoder

Let's first embed our words using embedding lookups. We then need a GRU cell for our encoder; these cells can be wrapped to implement various regularization techniques such as dropout. Finally, we use a dynamic RNN to unroll the encoder cell.
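A minimal sketch of this encoder, using the TF 1.x-style API via `tf.compat.v1`; the sizes and the `output_keep_prob` value are illustrative assumptions:

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()

vocab_size, embed_dim, num_units = 1000, 64, 128  # illustrative sizes
source = tf.placeholder(tf.int32, shape=[None, None])

# Embed the word ids: [batch, time] -> [batch, time, embed_dim].
embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
encoder_inputs = tf.nn.embedding_lookup(embedding, source)

# GRU cell wrapped with dropout as a regularizer.
cell = tf.nn.rnn_cell.GRUCell(num_units)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=0.8)

# Unroll the encoder over the (variable-length) time axis.
encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
    cell, encoder_inputs, dtype=tf.float32)
```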

Bidirectionality on the encoder side generally gives better performance. Here, we give a simplified example of how to build an encoder with a single bidirectional layer. encoder_outputs is the set of all source hidden states at the top layer and has the shape [max_len, batch_size, num_units].
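A single bidirectional layer can be sketched as follows (again via `tf.compat.v1`; the embedded-input placeholder and sizes are assumptions). The forward and backward outputs are concatenated along the feature axis:

```python
import tensorflow.compat.v1 as tf  # TF 1.x-style graph API

tf.disable_eager_execution()

num_units = 128
# Time-major embedded source: [max_len, batch_size, embed_dim].
encoder_inputs = tf.placeholder(tf.float32, shape=[None, None, 64])
source_sequence_length = tf.placeholder(tf.int32, shape=[None])

fw_cell = tf.nn.rnn_cell.GRUCell(num_units)
bw_cell = tf.nn.rnn_cell.GRUCell(num_units)

bi_outputs, bi_state = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, encoder_inputs,
    sequence_length=source_sequence_length,
    time_major=True, dtype=tf.float32)

# Concatenate forward and backward passes:
# shape [max_len, batch_size, 2 * num_units].
encoder_outputs = tf.concat(bi_outputs, -1)
```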

Decoder

The decoder is again a GRU cell. We will use beam search to produce, from the unrolled decoder, the most probable sequence of words rather than just the most probable next word at each step. The seq2seq API also has a dynamic decoder function to which we feed the decoder cell; it unrolls the sequence and builds the decoder.
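The beam search idea itself is framework-independent; here is a toy pure-Python sketch (the probability table stands in for the trained decoder, and all names are illustrative). At each step we keep only the `beam_width` highest-scoring partial sequences:

```python
import math

def beam_search(step_fn, start, eos, beam_width=2, max_len=5):
    """step_fn(seq) returns a dict mapping next token -> probability."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:          # finished sequences pass through
                candidates.append((seq, score))
                continue
            for tok, p in step_fn(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the beam_width best-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# Toy next-word distributions standing in for the decoder.
table = {
    "<go>": {"ich": 0.6, "du": 0.4},
    "ich": {"bin": 0.9, "<eos>": 0.1},
    "du": {"bist": 0.9, "<eos>": 0.1},
    "bin": {"<eos>": 1.0},
    "bist": {"<eos>": 1.0},
}
print(beam_search(lambda seq: table[seq[-1]], "<go>", "<eos>"))
# ['<go>', 'ich', 'bin', '<eos>']
```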

Attention Mechanism

In the encoder, encoder_outputs is the set of all source hidden states at the top layer and has the shape [max_len, batch_size, num_units]. For the attention mechanism, we need to make sure the “memory” passed in is batch major, so we need to transpose attention_states. We pass source_sequence_length to the attention mechanism to ensure that the attention weights are properly normalized (over non-padding positions only).
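To make the normalization point concrete, here is a NumPy sketch of dot-product attention over a batch-major memory, masking padded positions before the softmax. This is a conceptual illustration, not the seq2seq API's implementation:

```python
import numpy as np

def attention(query, memory, memory_lengths):
    """query:  [batch, num_units]            current decoder state
    memory:    [batch, max_len, num_units]   encoder outputs (batch major)
    memory_lengths: [batch]                  true source lengths"""
    scores = np.einsum("bu,blu->bl", query, memory)        # [batch, max_len]
    # Mask padded positions so they get (near-)zero attention weight.
    positions = np.arange(memory.shape[1])[None, :]
    scores = np.where(positions < memory_lengths[:, None], scores, -1e9)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax-normalize
    context = np.einsum("bl,blu->bu", weights, memory)     # [batch, num_units]
    return context, weights

# If encoder outputs are time major [max_len, batch, units], transpose first:
# memory = np.transpose(encoder_outputs, (1, 0, 2))
```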

Decoder Input
One obvious question is what to put on the inputs of this decoder network. During training it is simple, and works a little like a language model: each cell in the decoder is supposed to produce a word and an output state that feeds into the next cell, and you also feed the word that was produced before (during training, the ground-truth previous word) as the input to the next cell. At least, that is how you train it.
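In other words, the training-time decoder inputs are the target sentence shifted right by a <go> token, and the targets are the sentence followed by <eos>. A small sketch (function name is illustrative):

```python
def decoder_inputs_and_targets(sentence):
    """During training the decoder is fed the ground-truth previous word
    (teacher forcing): inputs start with <go>, targets end with <eos>."""
    words = sentence.split()
    dec_inputs = ["<go>"] + words
    dec_targets = words + ["<eos>"]
    return dec_inputs, dec_targets

inputs, targets = decoder_inputs_and_targets("ich bin ein student")
print(inputs)   # ['<go>', 'ich', 'bin', 'ein', 'student']
print(targets)  # ['ich', 'bin', 'ein', 'student', '<eos>']
```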

Loss

Given the logits above, we are now ready to compute our training loss.

weights is a zero-one matrix of the same size as decoder_outputs. It masks padding positions outside of the target sequence lengths with the value 0, so they contribute nothing to the loss.
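The effect of the weight mask can be sketched in NumPy (a conceptual illustration of the masked cross-entropy, not the seq2seq API's loss function):

```python
import numpy as np

def masked_loss(logits, targets, target_lengths):
    """Cross-entropy averaged over real (non-padding) positions only.

    logits:  [batch, max_len, vocab]
    targets: [batch, max_len] integer word ids"""
    # Softmax over the vocabulary axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    batch, max_len, _ = logits.shape
    b, t = np.meshgrid(np.arange(batch), np.arange(max_len), indexing="ij")
    xent = -np.log(probs[b, t, targets])                   # [batch, max_len]
    # Zero-one weights: 1 inside the target length, 0 on padding.
    weights = (np.arange(max_len)[None, :]
               < target_lengths[:, None]).astype(float)
    return (xent * weights).sum() / weights.sum()
```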

During Inference

When you are actually predicting something, it is a bit more complicated. Once you have trained this network, to translate a sentence you feed “I am a student” into the encoder and get an output vector. You then feed this output vector into the first decoder cell. The decoder cell also needs something on its input: the <go> token, which is a word, so it must be embedded, which is why we use an embedding lookup. Then we run this through the dynamic RNN cell, and each produced word is fed back in as the next input.
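The feedback loop at inference time can be sketched in pure Python (the next-word table stands in for the trained decoder; greedy decoding is shown here for simplicity, whereas the model above uses beam search):

```python
def greedy_decode(step_fn, go="<go>", eos="<eos>", max_len=10):
    """At inference time the decoder's own previous output (not a
    ground-truth word) is fed back in as the next input."""
    word, translation = go, []
    for _ in range(max_len):
        word = step_fn(word)        # most probable next word
        if word == eos:
            break
        translation.append(word)
    return translation

# Toy "model": deterministic next-word table standing in for the decoder.
table = {"<go>": "ich", "ich": "bin", "bin": "student", "student": "<eos>"}
print(greedy_decode(lambda w: table[w]))  # ['ich', 'bin', 'student']
```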

Train Model


Create Estimators

An Estimator is TensorFlow’s high-level representation of a complete model. It handles the details of initialization, logging, saving and restoring, and many other features so you can concentrate on your model.

TensorFlow has written a ton of boilerplate code for you that is not interesting to write, such as regularly outputting checkpoints: if your training crashes after 24 hours, you can restart from where you were. It also exports the model at the end, so you have something ready to deploy to a serving infrastructure, and it supports distributed training, with the distribution algorithms baked into the Estimator.

Train Model

Train the model by calling the Estimator’s train method as follows:



Predictions (inferring) from Trained Model

We now have a trained model and can use it to translate English sentences. As with training, we run inference with a single function call.


Please visit the GitHub repo for more detailed information and the actual code. It covers a few more topics, such as how to preprocess the dataset, how to define the inputs, and how to train the model and get predictions.
