In this tutorial, we will build a language model that predicts the next word in a sequence from the previous word, using the TensorFlow Keras API.
Download and Prepare data
In this tutorial, we will use a Shakespeare dataset, but you can substitute any other text dataset you like. The model is deliberately simple: it takes a single word from the sequence as input and learns to predict the word that follows it.
For example:
Input Text: “who often drown could never die”

X        Y
who      often
often    drown
drown    could
could    never
never    die
The first step is to assign a unique integer to each word in the sequence and convert the sequences of words into sequences of integers. Keras provides the Tokenizer API for this. First, the Tokenizer is fit on the source text to build the mapping from words to unique integers. Then sequences of text can be converted to sequences of integers by calling the texts_to_sequences() function.
import os
import time
import numpy as np
import tensorflow as tf
import unidecode
from keras_preprocessing.text import Tokenizer

tf.enable_eager_execution()

# Read the corpus and strip accents / non-ASCII characters.
file_path = "INPUT_FILE_PATH"
text = unidecode.unidecode(open(file_path).read())

# Map each word to a unique integer and encode the whole text.
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
encoded = tokenizer.texts_to_sequences([text])[0]

# Vocabulary size (+1 because word indices start at 1).
vocab_size = len(tokenizer.word_index) + 1
word2idx = tokenizer.word_index
idx2word = tokenizer.index_word
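To see what the Tokenizer produces, here is a small illustrative sketch using the example sentence from above (in this toy corpus every word occurs once, so the ids simply follow the order of appearance):

# Fit a tokenizer on a toy corpus consisting of the example sentence only.
toy_text = "who often drown could never die"
toy_tokenizer = Tokenizer()
toy_tokenizer.fit_on_texts([toy_text])

# Each word is assigned a unique integer id starting at 1.
print(toy_tokenizer.word_index)
# {'who': 1, 'often': 2, 'drown': 3, 'could': 4, 'never': 5, 'die': 6}

# The sentence becomes a sequence of those ids.
print(toy_tokenizer.texts_to_sequences([toy_text])[0])
# [1, 2, 3, 4, 5, 6]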
Next, we need to create sequences of words to train the model with one word as input and one word as output.
# Build (previous word, next word) pairs over the whole encoded text.
sequences = list()
for i in range(1, len(encoded)):
    sequence = encoded[i - 1:i + 1]
    sequences.append(sequence)
Then split each pair into an input element X and an output element Y. This is straightforward because each pair has only two columns, but the list of pairs must first be converted to a NumPy array so it can be indexed column-wise.
sequences = np.array(sequences)  # shape: (num_pairs, 2)
X, Y = sequences[:, 0], sequences[:, 1]
X = np.expand_dims(X, 1)  # shape: (num_pairs, 1)
Y = np.expand_dims(Y, 1)  # shape: (num_pairs, 1)
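As a quick sanity check, the shapes and the first few pairs should mirror the X/Y table shown earlier:

# X and Y both have shape (num_pairs, 1): one input word id and its target word id per row.
print(X.shape, Y.shape)
print(X[:3], Y[:3])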
Create Input Pipelines
In this tutorial, we will use the TensorFlow Dataset API to feed data into the model. It lets you build complex input pipelines from simple, reusable pieces.
BUFFER_SIZE = 10000  # shuffle buffer size (example value)
BATCH_SIZE = 64      # number of word pairs per batch (example value)

dataset = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
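To confirm that the pipeline delivers what the model expects, you can pull a single batch and inspect its shape (a quick illustrative check; both tensors should be (BATCH_SIZE, 1)):

# Take one batch from the pipeline and print its shape.
for input_batch, target_batch in dataset.take(1):
    print(input_batch.shape, target_batch.shape)  # (BATCH_SIZE, 1) (BATCH_SIZE, 1)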
Create the Model
We will define the model with the Keras Model Subclassing API, which gives us full flexibility to create the model and change it however we like. The model consists of an Embedding layer, a GRU layer, and a fully connected (Dense) layer.
class Model(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, units, batch_size):
        super(Model, self).__init__()
        self.units = units
        self.batch_size = batch_size
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_activation='sigmoid',
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, hidden):
        # (batch, 1) word ids -> (batch, 1, embedding_dim)
        inputs = self.embedding(inputs)
        # output: (batch, 1, units), states: (batch, units)
        output, states = self.gru(inputs, initial_state=hidden)
        # Drop the time dimension so the Dense layer sees (batch, units).
        output = tf.reshape(output, (-1, output.shape[2]))
        # Logits over the vocabulary: (batch, vocab_size).
        x = self.fc(output)
        return x, states
embedding_dim = 100
units = 512
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)
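Before training, it can help to push a dummy batch through the model to verify the output shapes. This is a minimal sketch; the names dummy_input and dummy_hidden are just illustrative:

# Dummy forward pass: one word id per example in the batch.
dummy_input = tf.zeros((BATCH_SIZE, 1), dtype=tf.int32)
dummy_hidden = tf.zeros((BATCH_SIZE, units))
logits, state = model(dummy_input, dummy_hidden)
print(logits.shape)  # (BATCH_SIZE, vocab_size)
print(state.shape)   # (BATCH_SIZE, units)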
Save checkpoints during training
You can save checkpoints during and after training. This means a model can resume where it left off and avoid long training times. You can also use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted.
optimizer = tf.train.AdamOptimizer()
checkpoint_dir = './training_checkpoints_1'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
Train the model
We will write a custom training loop with the help of GradientTape(). At the start of every epoch we initialize the hidden state of the model with zeros of shape (batch_size, number of RNN units). Then we iterate over the dataset batch by batch and calculate the predictions and the hidden state associated with that input.
def loss_function(labels, logits):
    return tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

EPOCHS = 20  # number of training epochs (example value)

for epoch in range(EPOCHS):
    start = time.time()
    # Reset the hidden state to zeros at the start of every epoch.
    hidden = tf.zeros((BATCH_SIZE, units))
    for (batch, (input, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            predictions, hidden = model(input, hidden)
            target = tf.reshape(target, (-1,))
            loss = loss_function(target, predictions)
        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables))
        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, loss))
    print('Time taken for 1 epoch {:.2f} sec\n'.format(time.time() - start))
    if (epoch + 1) % 10 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)
Predict the Next Word
Finally, it is time to predict the next word using our trained model.
# Restore the latest checkpoint saved during training.
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

start_string = "you"
# Encode the start word and add a batch dimension: shape (1, 1).
input_eval = [word2idx[start_string]]
input_eval = tf.expand_dims(input_eval, 0)
text_generated = ''

# Hidden state for a batch of one.
hidden = [tf.zeros((1, units))]
predictions, hidden = model(input_eval, hidden)

# Pick the word with the highest logit as the prediction.
predicted_id = tf.argmax(predictions[-1]).numpy()
text_generated += " " + idx2word[predicted_id]
print(start_string + text_generated)
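The same idea extends naturally to generating several words: feed each predicted word back in as the next input and carry the GRU state forward. Below is a minimal sketch; num_generate and the greedy argmax choice are illustrative assumptions, not part of the original model:

num_generate = 20  # how many words to generate (illustrative value)
input_eval = tf.expand_dims([word2idx[start_string]], 0)
hidden = [tf.zeros((1, units))]
text_generated = ''

for _ in range(num_generate):
    predictions, hidden = model(input_eval, hidden)
    # Greedily pick the most likely next word.
    predicted_id = tf.argmax(predictions[-1]).numpy()
    text_generated += " " + idx2word[predicted_id]
    # Feed the prediction back in as the next input word.
    input_eval = tf.expand_dims([predicted_id], 0)

print(start_string + text_generated)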
