In this tutorial, we use Google’s Tensor2Tensor library to build translation models with modern neural network architectures, specifically the Transformer.

Tensor2Tensor is built on top of TensorFlow, but it adds a layer that is more research-oriented.

It is a library of models, of hyperparameter sets for those models, and of datasets: everything from MNIST to many translation and sequence tasks.

Within the TensorFlow ecosystem, Tensor2Tensor has done a really good job of bridging research and production: many Tensor2Tensor models are quite easily exported to the TensorFlow Serving format, so you can serve them in production. It should be easy to try things out, easy to add new models, and easy to run them on many different datasets.

Install Tensor2Tensor

Tensor2Tensor is released on pip, with a new release about every one to three weeks; as you can see, there have been over 61 releases so far, from 145 contributors.

!pip install -q -U tensor2tensor
!pip install -q tensorflow matplotlib

Datasets or Problems

You don’t have to worry about preprocessing machine-learning datasets or getting them into a common format. You just run the data generation step: it downloads the data from wherever it is hosted, preprocesses it, and puts it into a common format suitable for high-performance training. Tensor2Tensor writes every dataset out to disk as TFRecord files containing TensorFlow Example protocol buffers.


Tensor2Tensor has a lot of models: autoencoders, language models, LSTM variants, and image models such as ResNet, which you can list with the following command. There are also many variants of the Transformer, the really powerful attention-based sequence model.

# There are many models available in Tensor2Tensor

Hyperparameter Sets

The next step is to instantiate the model with a hyperparameter set. All of our hyperparameter sets are defined in code, and we instantiate the model in training mode.

Normally, when you run t2t-trainer, the main command-line script for training and evaluating models, you won’t even have to think about which metrics to use for a given problem.

Generate Dataset

In this tutorial, we train a translation model using the Transformer. Typically we first generate the data, with a command that looks like this.

t2t-datagen \
  --problem=translate_ende_wmt32k \
  --data_dir=/tensor2tensor/data \
  --tmp_dir=/tensor2tensor/data_gen

With t2t-datagen we need to specify the problem we want to generate the data for. In this case, we want English-to-German translation on the WMT dataset, which is generated on disk with a vocabulary of 32,000 subwords. We also give it a data directory and a temp directory where it will download things to.

{Image Of Data Directory}

If we take a look at that data directory, we’ll find 101 files: one eval shard and 100 training shards. We’ll also find a vocab file, which is the one we’re going to use. This file has 32,000 tokens in its vocabulary, and it was created automatically, so we don’t have to worry about it.

Training Model

Here’s how we would train the model. t2t-trainer is the main workhorse, and it normally requires a triplet: the model, the hyperparameter set, and the problem.

t2t-trainer \
  --data_dir=/tensor2tensor/data \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=/tensor2tensor/train \
  --train_steps=10000

Here we’ll train a Transformer model on that same problem, translate_ende_wmt32k.

We’ll use the hyperparameter set transformer_base. All the hyperparameter sets are defined in the same file as the model definition, so if you look at the Transformer code in Tensor2Tensor you’ll see a pretty huge number of hyperparameter sets; the one you should almost always start with is the one that ends in _base.

The trainer takes a few more flags. We’ll just train for 10,000 steps (train_steps=10000). We also have to provide a couple of directories: the data directory, and an output directory where all the model checkpoints and event files will go.

This trainer both trains and evaluates, and we’ll see eval metrics at the end of every 1,000 steps.

{Image of training logs}

Running our model, we see the loss and step count in the logs. After it finishes 1,000 steps, it saves a checkpoint and goes through an evaluation.

{Image of evaluation logs}

After 10,000 training steps, we don’t really expect very good evaluation metrics, but as you can see it goes through the evaluation data and logs a bunch of metrics, for example per-token accuracy and accuracy over the entire sequence. For a real, full evaluation, it would report BLEU scores.


t2t-decoder \
  --data_dir=/tensor2tensor/data \
  --problem=translate_ende_wmt32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=/tensor2tensor/train \
  --decode_interactive

The t2t-decoder command takes the same triplet: the model, the hyperparameter set, and the problem. Here we provide the same data directory, so that it can grab the vocabulary file, and the same output directory, where the checkpoints are saved. There is also an interactive decode mode, which is nice because you can type right into the terminal and see predictions interactively.


Once interactive mode is ready, we can ask it to translate an English sentence.