In this tutorial, we will build a language model that predicts the next word from the previous word in a sequence, using the TensorFlow Keras API.

Download and Prepare data

In this tutorial, we will use a Shakespeare dataset, but you can use any other text dataset you like. Our model is deliberately simple: it takes one word from a sequence as input and learns to predict the next word in that sequence.

For example:

Input Text: “who often drown could never die”

X        Y
who      often
often    drown
drown    could
could    never
never    die
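
As a quick illustration of the pairing above, the following plain-Python sketch builds the same (X, Y) pairs from the example sentence. The helper name make_pairs is hypothetical and is not part of this tutorial's code.

```python
def make_pairs(text):
    """Return (previous word, next word) pairs for a whitespace-split text."""
    words = text.split()
    # Zip the word list against itself shifted by one position.
    return list(zip(words[:-1], words[1:]))


pairs = make_pairs("who often drown could never die")
for x, y in pairs:
    print(x, "->", y)
```

A sentence of six words yields five pairs, one for every word that has a successor.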

The first step is to assign a unique integer to each word and convert the sequences of words to sequences of integers. Keras provides the Tokenizer API for this. First, the Tokenizer is fit on the source text with fit_on_texts() to build the mapping from words to unique integers. Then the text can be converted to sequences of integers by calling the texts_to_sequences() method.

import os
import time

import numpy as np
import tensorflow as tf
import unidecode
from keras_preprocessing.text import Tokenizer


file_path = "INPUT_FILE_PATH"

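# Read the corpus and transliterate any non-ASCII characters to ASCII.
with open(file_path) as f:
    text = unidecode.unidecode(f.read())

# Fit the tokenizer first so it learns the word -> integer mapping;
# texts_to_sequences() returns empty sequences on an unfitted tokenizer.
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])

encoded = tokenizer.texts_to_sequences([text])[0]

# Word indices start at 1 (0 is reserved), so add one to the vocabulary size.
vocab_size = len(tokenizer.word_index) + 1

word2idx = tokenizer.word_index
idx2word = tokenizer.index_word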
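
To make the word-to-integer mapping concrete, here is a minimal plain-Python sketch of the idea. It only approximates what the Keras Tokenizer does (lowercasing, splitting on whitespace, and ranking words by frequency with indices starting at 1), and the function name build_word_index is hypothetical.

```python
from collections import Counter


def build_word_index(text):
    """Map each word to an integer, most frequent word first, starting at 1."""
    words = text.lower().split()
    counts = Counter(words)
    # most_common() orders by descending frequency; ties keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


sample = "to be or not to be"
word_index = build_word_index(sample)
encoded_sample = [word_index[w] for w in sample.lower().split()]

# Index 0 is left unused (reserved), hence the "+ 1".
sample_vocab_size = len(word_index) + 1

print(word_index)
print(encoded_sample, sample_vocab_size)
```

The "+ 1" mirrors the vocab_size computation in the tutorial code: the Embedding layer needs one row per index, including the unused index 0.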

Next, we need to create sequences of words to train the model with one word as input and one word as output.

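sequences = list()

for i in range(1, len(encoded)):
    # Each training example is a (previous word, next word) pair.
    sequence = encoded[i - 1:i + 1]
    sequences.append(sequence)

# Convert to a 2-D array of shape (num_pairs, 2) so it can be sliced by column.
sequences = np.array(sequences)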

Then split the sequences into the input “X” and output elements “Y”. This is straightforward as we only have two columns in the data.

X, Y = sequences[:, 0], sequences[:, 1]
X = np.expand_dims(X, 1)
Y = np.expand_dims(Y, 1)
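
The effect of the column split and expand_dims is easiest to see on a small array. The sketch below uses made-up word ids; the names pairs_demo, X_demo and Y_demo are hypothetical.

```python
import numpy as np

# Hypothetical encoded pairs: each row is (previous word id, next word id).
pairs_demo = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])

X_demo, Y_demo = pairs_demo[:, 0], pairs_demo[:, 1]  # both have shape (5,)
X_demo = np.expand_dims(X_demo, 1)  # shape (5, 1): one time step per example
Y_demo = np.expand_dims(Y_demo, 1)  # shape (5, 1)

print(X_demo.shape, Y_demo.shape)
```

The extra axis gives every example a sequence length of 1, which is the (batch, time steps) layout the Embedding and GRU layers expect.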

Create Input Pipelines

In this tutorial, we will use the TensorFlow tf.data Dataset API to feed data into the model. It enables you to build complex input pipelines from simple, reusable pieces.

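BUFFER_SIZE, BATCH_SIZE = 10000, 64  # example values; tune them for your dataset
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).shuffle(BUFFER_SIZE)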
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
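
The following sketch shows, on toy data, what shuffling and batching with drop_remainder=True produce. The array contents and the names X_demo, Y_demo and demo are assumptions made for illustration only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 5 (input, target) pairs, each of shape (1,).
X_demo = np.arange(5).reshape(5, 1)
Y_demo = X_demo + 1

demo = tf.data.Dataset.from_tensor_slices((X_demo, Y_demo))
demo = demo.shuffle(5).batch(2, drop_remainder=True)

# 5 examples with batch size 2 give 2 full batches; the leftover example is dropped.
batch_shapes = [(x.numpy().shape, y.numpy().shape) for x, y in demo]
print(batch_shapes)
```

Dropping the remainder keeps every batch the same size, which matters here because the training loop creates a hidden state with a fixed batch dimension.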

Create the Model

We will use the Keras Model Subclassing API, which gives us full flexibility to create the model and change it however we like. (The Keras functional API is another good option for complex models such as multi-output models, directed acyclic graphs, or models with shared layers.) The model consists of an Embedding layer, a GRU layer, and a fully connected (Dense) layer.

class Model(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, units, batch_size):
        super(Model, self).__init__()
        self.units = units
        self.batch_size = batch_size

        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

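        # call() unpacks both the per-step outputs and the final hidden state,
        # so the GRU must return sequences and state.
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True)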
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, inputs, hidden):
        inputs = self.embedding(inputs)

        output, states = self.gru(inputs, initial_state=hidden)

        output = tf.reshape(output, (-1, output.shape[2]))

        x = self.fc(output)

        return x, states

embedding_dim = 100

units = 512

model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)
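
To follow the shapes through the three layers, here is a rough NumPy sketch of a single forward step. It uses a textbook GRU update without biases, which differs in details from the Keras GRU implementation, and all sizes, weights and names (gru_step, hidden_units, and so on) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h, W, U):
    """One textbook GRU step (biases omitted). x: (batch, emb), h: (batch, units)."""
    z = sigmoid(x @ W["z"] + h @ U["z"])            # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"])            # reset gate
    h_new = np.tanh(x @ W["h"] + (r * h) @ U["h"])  # candidate state
    return z * h + (1.0 - z) * h_new                # blend old and candidate state


rng = np.random.default_rng(0)
vocab, emb, hidden_units, batch = 10, 4, 8, 3       # hypothetical toy sizes

embedding = rng.normal(size=(vocab, emb))
W = {k: rng.normal(size=(emb, hidden_units)) for k in "zrh"}
U = {k: rng.normal(size=(hidden_units, hidden_units)) for k in "zrh"}
dense = rng.normal(size=(hidden_units, vocab))

word_ids = np.array([1, 5, 7])                      # one word id per example
x = embedding[word_ids]                             # embedding lookup: (batch, emb)
h = gru_step(x, np.zeros((batch, hidden_units)), W, U)  # (batch, units)
logits = h @ dense                                  # one score per vocabulary word

print(x.shape, h.shape, logits.shape)
```

The final matrix product plays the role of the Dense layer: it turns each hidden state into one unnormalized score per word in the vocabulary.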

Save checkpoints during training

You can save checkpoints during and after training. A checkpointed model can be used without retraining it, and training can pick up where it left off if the process was interrupted, which avoids repeating long training runs.

optimizer = tf.keras.optimizers.Adam()

checkpoint_dir = './training_checkpoints_1'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
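
As a small, self-contained illustration of saving and resuming, the sketch below checkpoints a single variable instead of the full model. The names weight, demo_ckpt and demo_dir are hypothetical stand-ins.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for a model: a single trackable variable.
weight = tf.Variable(1.0)
demo_ckpt = tf.train.Checkpoint(weight=weight)

demo_dir = tempfile.mkdtemp()
demo_ckpt.save(file_prefix=os.path.join(demo_dir, "ckpt"))

weight.assign(5.0)  # simulate losing the trained value

# latest_checkpoint() returns the path prefix of the most recent save in the directory.
demo_ckpt.restore(tf.train.latest_checkpoint(demo_dir))
restored_value = float(weight.numpy())
print(restored_value)
```

With the objects defined in this tutorial, the corresponding call would presumably be checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir)). For a subclassed model, variables that do not exist yet are restored once the model has been called for the first time.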

Train the model

We will use a custom training loop built on tf.GradientTape. At the start of each epoch, we initialize the hidden state with zeros of shape (batch_size, number of RNN units).

Next, we iterate over the dataset batch by batch and calculate the predictions and the hidden state for each input. The tape records these operations so that the gradients of the loss with respect to the model variables can be computed and applied by the optimizer.

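def loss_function(labels, logits):
    # Mean sparse softmax cross-entropy between integer labels and raw logits.
    return tf.reduce_mean(
        tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True))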

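EPOCHS = 10  # example value; tune it for your dataset

for epoch in range(EPOCHS):
    start = time.time()

    # Start every epoch from a zero hidden state of shape (batch_size, units).
    hidden = tf.zeros((BATCH_SIZE, units))

    for (batch, (inputs, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            predictions, hidden = model(inputs, hidden)

            target = tf.reshape(target, (-1,))
            loss = loss_function(target, predictions)

        # Compute and apply the gradients once the forward pass has been recorded.
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1, batch, loss))

    print('Time taken for 1 epoch: {:.2f} sec'.format(time.time() - start))

    # Save a checkpoint every 10 epochs.
    if (epoch + 1) % 10 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)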
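
The loss used in the loop above can be written out by hand. The NumPy sketch below is an illustrative re-implementation of sparse softmax cross-entropy, not the TensorFlow code itself; the demo labels and logits are made up.

```python
import numpy as np


def sparse_softmax_cross_entropy(labels, logits):
    """Mean cross-entropy between integer labels (N,) and raw logits (N, vocab)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each example.
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()


demo_logits = np.zeros((2, 4))   # equal scores over 4 classes
demo_labels = np.array([0, 3])
demo_loss = sparse_softmax_cross_entropy(demo_labels, demo_logits)
print(demo_loss)                 # equal scores over 4 classes give a loss of ln(4)
```

"Sparse" means the labels are plain integer word ids rather than one-hot vectors, which is why the target only needs to be reshaped to a flat vector before the loss is computed.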

Predict next Word

Finally, it is time to predict the next word using our trained model.


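start_string = "you"

# Encode the start word and add a batch dimension: shape (1, 1).
input_eval = [word2idx[start_string]]
input_eval = tf.expand_dims(input_eval, 0)

text_generated = ''

# Zero initial hidden state for a batch of one.
hidden = [tf.zeros((1, units))]

predictions, hidden = model(input_eval, hidden)

# Pick the word id with the highest score (greedy choice).
predicted_id = tf.argmax(predictions[-1]).numpy()

text_generated += " " + idx2word[predicted_id]

print(start_string + text_generated)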
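
The last step, turning scores into a word, can be illustrated without the model. The sketch below uses a made-up three-word vocabulary and made-up logits; demo_idx2word, step_logits and ranked are hypothetical names.

```python
import numpy as np

# Hypothetical toy vocabulary and logits for a single prediction step.
demo_idx2word = {1: "you", 2: "are", 3: "welcome"}
step_logits = np.array([-1.0, 0.2, 2.5, 0.7])  # position 0 is the reserved index

predicted = int(np.argmax(step_logits))        # greedy choice: highest score
next_word = demo_idx2word[predicted]

# Rank all candidates from most to least likely, skipping the reserved index 0.
ranked = [demo_idx2word[int(i)] for i in np.argsort(step_logits)[::-1] if i != 0]

print(next_word, ranked)
```

Taking the argmax always returns the single most likely word; looking at the ranked list is one simple way to see which alternatives the model considered.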