In this tutorial we’ll build a seq2seq model in TensorFlow. We’ll use toy data: the model is given an input sequence and trained to predict that same sequence, and in the process we’ll learn how memory works in a sequence-to-sequence model.
Sequence-to-sequence models are used for a wide range of tasks, including chatbots, speech-to-text, dialogue systems, question answering, and image captioning.
Sequence models preserve the order of their inputs. This lets them process information that has an ordering element to it, which ordinary feed-forward neural networks cannot capture.
A sequence-to-sequence model has two main components. The encoder takes in the input one time step at a time and builds up a hidden state, which is then passed to the decoder. The decoder takes that hidden state and uses it to start predicting the output. Another crucial point with this kind of model is that it needs a lot of data.
The key idea is to convert a sequence into a fixed-size feature vector that encodes only the important information in the sequence and discards the rest.
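To make the fixed-size idea concrete, here is a minimal pure-Python sketch of a tiny recurrent encoder. It is not the TensorFlow model built later in this tutorial: the weights are random and untrained, and the names `make_weights`, `encode`, and `HIDDEN_DIM` are illustrative assumptions. The only point is that inputs of different lengths end up as a hidden state of the same size.

```python
import math
import random

HIDDEN_DIM = 4  # assumed size of the fixed-length state


def make_weights(vocab_size, hidden_dim, seed=0):
    """Random (untrained) weights, for illustration only."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
            for _ in range(vocab_size)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(token_ids, w_in, w_rec):
    """Fold a sequence of token ids into one hidden state, step by step."""
    hidden = [0.0] * len(w_rec)
    for token in token_ids:
        # New state = tanh(input contribution + recurrent contribution).
        hidden = [
            math.tanh(w_in[token][j]
                      + sum(hidden[i] * w_rec[i][j] for i in range(len(hidden))))
            for j in range(len(hidden))
        ]
    return hidden


w_in, w_rec = make_weights(vocab_size=10, hidden_dim=HIDDEN_DIM)
short_state = encode([3, 1, 4], w_in, w_rec)
long_state = encode([3, 1, 4, 1, 5, 9, 2, 6], w_in, w_rec)

# Both states have HIDDEN_DIM entries, regardless of input length.
print(len(short_state), len(long_state))
```

Because each step mixes the new token into the previous state, feeding the same tokens in a different order generally produces a different final state, which is how order information survives the compression.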
Prepare Dataset
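We’ll use a language dataset provided by Project Gutenberg (http://www.gutenberg.org/ebooks/1399). Download the text file and convert its characters into integers.
    import numpy as np

    def parse_text(file_path):
        # Read the whole book as one string.
        with open(file_path) as f:
            text = f.read()
        # Ids 0-2 are reserved for special tokens, so characters start at 3.
        # Sorting keeps the mapping identical from one run to the next.
        char2idx = {c: i + 3 for i, c in enumerate(sorted(set(text)))}
        char2idx['<pad>'] = 0
        char2idx['<start>'] = 1
        char2idx['<end>'] = 2
        # Encode every character of the text as its integer id.
        ints = np.array([char2idx[char] for char in text])
        return ints, char2idx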
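As a quick sanity check of the character-to-integer mapping, the sketch below builds the same kind of vocabulary for a short in-memory string instead of the downloaded file. The helper name `build_vocab` and the sample string are illustrative assumptions; the characters are sorted here so that the ids come out the same on every run.

```python
def build_vocab(text):
    """Map each distinct character to an id, reserving 0-2 for special tokens."""
    char2idx = {c: i + 3 for i, c in enumerate(sorted(set(text)))}
    char2idx['<pad>'] = 0
    char2idx['<start>'] = 1
    char2idx['<end>'] = 2
    return char2idx


sample = "hello world"
char2idx = build_vocab(sample)
idx2char = {i: c for c, i in char2idx.items()}

# Encode the string, then decode it again to confirm nothing is lost.
encoded = [char2idx[c] for c in sample]
decoded = ''.join(idx2char[i] for i in encoded)

print(encoded)
print(decoded)  # hello world
```

The round trip only works because every character maps to a unique id, which is also what lets predictions be turned back into text later.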
Create Input Function
In this tutorial, we use the TensorFlow Dataset API (tf.data) to feed data into the model. We initialize a Dataset from a generator, which is useful when the data is an array of elements with different lengths, such as sequences.
    import tensorflow as tf  # TensorFlow 1.x API

    # `params` and `next_batch` are assumed to be defined elsewhere.

    def create_dict(s1, s2):
        return {'input': s1, 'output': s2}

    def start_(x):
        # Prepend <start> to every sequence in the batch (decoder input).
        _x = tf.fill([tf.shape(x)[0], 1], params['char2idx']['<start>'])
        return tf.concat([_x, x], 1)

    def end_(x):
        # Append <end> to every sequence in the batch (decoder target).
        _x = tf.fill([tf.shape(x)[0], 1], params['char2idx']['<end>'])
        return tf.concat([x, _x], 1)

    def input_fn(ints):
        dataset1 = tf.data.Dataset.from_generator(
            lambda: next_batch(ints), tf.int32,
            tf.TensorShape([None, params['seq_len']]))
        dataset1 = dataset1.map(start_)

        dataset2 = tf.data.Dataset.from_generator(
            lambda: next_batch(ints), tf.int32,
            tf.TensorShape([None, params['seq_len']]))
        dataset2 = dataset2.map(end_)

        # Pair each input batch with its target batch.
        dataset = tf.data.Dataset.zip((dataset1, dataset2))
        dataset = dataset.map(create_dict)
        iterator = dataset.make_one_shot_iterator()
        return iterator.get_next()
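`next_batch` is called above but not defined in this section. One hypothetical version is sketched below in plain Python: it cuts the integer sequence into non-overlapping windows of `seq_len` ids and yields them in batches, which matches the `[None, seq_len]` shape declared in `input_fn`. The `seq_len` and `batch_size` defaults (the tutorial would presumably read them from `params`) and the decision to drop the incomplete tail are assumptions, not the original implementation. The loop at the end mimics what `start_` and `end_` do to each batch.

```python
def next_batch(ints, seq_len=5, batch_size=2):
    """Yield batches of up to batch_size rows, each row seq_len ids long.

    Hypothetical stand-in for the helper used by input_fn.
    """
    n_windows = len(ints) // seq_len  # the incomplete tail is dropped
    windows = [list(ints[i * seq_len:(i + 1) * seq_len])
               for i in range(n_windows)]
    for start in range(0, n_windows, batch_size):
        yield windows[start:start + batch_size]


START, END = 1, 2  # ids reserved for <start> and <end>

ints = list(range(3, 16))  # 13 toy ids; the last 3 do not fill a window
for batch in next_batch(ints):
    decoder_in = [[START] + row for row in batch]   # what start_ produces
    decoder_out = [row + [END] for row in batch]    # what end_ produces
    print(decoder_in)
    print(decoder_out)
```

Since `input_fn` builds its two datasets from two separate generator calls, a generator like this should stay deterministic (no shuffling inside it); otherwise the inputs and targets would no longer line up.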
Create a Model
You can separate the entire model into two parts: the encoder and the decoder. The encoder takes raw input text, just like any other RNN architecture, and in the end outputs a neural representation. That output becomes the input to the decoder.
    def seq2seq_model(features, labels, mode, params):
        # multi_cell_fn and clip_grads are helpers not shown in this section.
        ops = {}
        if mode == tf.estimator.ModeKeys.TRAIN:
            batch_sz = tf.shape(features['input'])[0]

            with tf.variable_scope('main', reuse=False):
                embedding = tf.get_variable(
                    'lookup_table',
                    [params['vocab_size'], params['hidden_dim']])
                cells = multi_cell_fn()

                # Teacher forcing: feed the ground-truth input at every step.
                helper = tf.contrib.seq2seq.TrainingHelper(
                    inputs=tf.nn.embedding_lookup(embedding, features['input']),
                    sequence_length=tf.count_nonzero(
                        features['input'], 1, dtype=tf.int32))
                decoder = tf.contrib.seq2seq.BasicDecoder(
                    cell=cells,
                    helper=helper,
                    initial_state=cells.zero_state(batch_sz, tf.float32),
                    output_layer=tf.layers.Dense(params['vocab_size']))
                decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(
                    decoder=decoder)
                logits = decoder_output.rnn_output

            output = features['output']
            ops['global_step'] = tf.Variable(0, trainable=False)
            ops['loss'] = tf.reduce_mean(tf.contrib.seq2seq.sequence_loss(
                logits=logits,
                targets=output,
                weights=tf.to_float(tf.ones_like(output))))
            ops['train'] = tf.train.AdamOptimizer().apply_gradients(
                clip_grads(ops['loss']),
                global_step=tf.train.get_global_step())
            return tf.estimator.EstimatorSpec(
                mode=mode, loss=ops['loss'], train_op=ops['train'])

        if mode == tf.estimator.ModeKeys.PREDICT:
            # reuse=True assumes the 'main' variables already exist in this graph.
            with tf.variable_scope('main', reuse=True):
                cells = multi_cell_fn()
                decoder = tf.contrib.seq2seq.BeamSearchDecoder(
                    cell=cells,
                    embedding=tf.get_variable('lookup_table'),
                    start_tokens=tf.tile(tf.constant(
                        [params['char2idx']['<start>']], dtype=tf.int32), [1]),
                    end_token=params['char2idx']['<end>'],
                    initial_state=tf.contrib.seq2seq.tile_batch(
                        cells.zero_state(1, tf.float32), params['beam_width']),
                    beam_width=params['beam_width'],
                    output_layer=tf.layers.Dense(
                        params['vocab_size'], _reuse=True))
                decoder_out, _, _ = tf.contrib.seq2seq.dynamic_decode(
                    decoder=decoder, maximum_iterations=params['seq_len'])

            # decoder_out is the final beam-search output; its predicted_ids
            # field has shape [batch, time, beam_width].
            tf.identity(decoder_out.predicted_ids, name='predictions')
            predict = decoder_out.predicted_ids[:, :, 0]  # keep the top beam
            return tf.estimator.EstimatorSpec(mode=mode, predictions=predict)
The embedding layer maps each token ID to a vector representation. The token embeddings are then fed to the encoder.
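Conceptually, an embedding lookup is just row selection from a `[vocab_size, hidden_dim]` table, which is what `tf.nn.embedding_lookup` does with the `lookup_table` variable. The plain-Python sketch below uses a tiny hand-written table; the numbers and the function name `embedding_lookup` are illustrative assumptions, and in the real model the table entries are learned during training.

```python
# Toy table: vocab_size = 4 rows, hidden_dim = 3 columns (values made up).
embedding = [
    [0.0, 0.0, 0.0],   # id 0: <pad>
    [0.1, 0.2, 0.3],   # id 1: <start>
    [0.4, 0.5, 0.6],   # id 2: <end>
    [0.7, 0.8, 0.9],   # id 3: first real character
]


def embedding_lookup(table, token_ids):
    """Replace every token id in a batch with its row from the table."""
    return [[table[token] for token in sequence] for sequence in token_ids]


batch = [[1, 3, 3], [1, 3, 2]]          # shape [batch=2, time=3]
vectors = embedding_lookup(embedding, batch)

# The result has shape [batch, time, hidden_dim].
print(len(vectors), len(vectors[0]), len(vectors[0][0]))  # 2 3 3
```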
Training Model
After defining the model, we run training steps by passing in batched inputs. In this tutorial, we use the TensorFlow Estimator API to train the model.
    ints, params['char2idx'] = parse_text(FILE_PATH)
    params['vocab_size'] = len(params['char2idx'])
    params['idx2char'] = {i: c for c, i in params['char2idx'].items()}

    est = tf.estimator.Estimator(
        model_fn=seq2seq_model,
        model_dir='model_dir',
        params=params)
    est.train(input_fn=lambda: input_fn(ints), steps=1000)
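Once the model is trained, predicted ids have to be mapped back to characters, which is what `params['idx2char']` is for. A minimal sketch of that step follows. The function name `ids_to_text` and the toy vocabulary are assumptions for illustration, and the id list stands in for one row of predictions from the estimator.

```python
def ids_to_text(ids, idx2char, end_id=2, pad_id=0, start_id=1):
    """Turn predicted ids into a string, stopping at the first <end> token."""
    chars = []
    for i in ids:
        if i == end_id:
            break
        if i in (pad_id, start_id):
            continue  # special tokens carry no text
        chars.append(idx2char[i])
    return ''.join(chars)


idx2char = {0: '<pad>', 1: '<start>', 2: '<end>', 3: 'a', 4: 'c', 5: 't'}
predicted = [4, 3, 5, 2, 0, 0]  # "cat" followed by <end> and padding
print(ids_to_text(predicted, idx2char))  # cat
```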