In sequence data, individual samples have different lengths. Consider the following example of text tokenized as words.
[
["The", "weather", "will", "be", "nice", "tomorrow"],
["Hello", "world", "!"],
["How", "are", "you", "today"]
]
The data is a 2D list where the samples have lengths 6, 3, and 4, respectively. A conventional deep learning model, however, expects its input as a single rectangular tensor, e.g. of shape (batch_size, seq_len, vocab_size). In this case, samples that are shorter than the longest item need to be padded with a placeholder value, and long samples may also be truncated before the short ones are padded.
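For contrast, here is a plain-Python sketch of the manual truncate-then-pad step that RaggedTensor makes unnecessary. The helper name pad_batch and the "<pad>" placeholder token are hypothetical choices for illustration and are not part of any library.

```python
def pad_batch(batch, pad_token="<pad>", max_len=None):
    """Hypothetical helper: truncate long samples, then pad short ones to a common length."""
    # Default target length: the longest sample in the batch.
    target = max_len if max_len is not None else max(len(sample) for sample in batch)
    padded = []
    for sample in batch:
        trimmed = sample[:target]  # truncate first when max_len is shorter than the sample
        padded.append(trimmed + [pad_token] * (target - len(trimmed)))
    return padded


batch = [
    ["The", "weather", "will", "be", "nice", "tomorrow"],
    ["Hello", "world", "!"],
    ["How", "are", "you", "today"],
]
padded = pad_batch(batch)
print(padded[1])  # ['Hello', 'world', '!', '<pad>', '<pad>', '<pad>']
truncated = pad_batch(batch, max_len=4)
print(truncated[0])  # ['The', 'weather', 'will', 'be']
```

Every padded slot still gets processed by the model. Choosing max_len also forces a trade-off between losing information and wasting memory.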
In this tutorial, you will discover RaggedTensor, which you can use to prepare variable-length sequence data for NLP in Python with Keras without any additional padding or user-facing logic.
RaggedTensor is a type of tensor in TensorFlow that efficiently represents sequence data. It is designed to handle text and other variable-length sequences, and it provides a native representation of sequences of varying lengths.
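The following is a minimal sketch that assumes TensorFlow 2.x is installed and imported as tf. It stores the word-level batch from the beginning of this tutorial directly as a RaggedTensor, with no padding:

```python
import tensorflow as tf

# The three tokenized sentences from the example above, kept at their natural lengths.
sentences = tf.ragged.constant([
    ["The", "weather", "will", "be", "nice", "tomorrow"],
    ["Hello", "world", "!"],
    ["How", "are", "you", "today"],
])
print(sentences.shape)                  # (3, None): the second dimension is ragged
print(sentences.row_lengths().numpy())  # [6 3 4]
```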

Difference Between RaggedTensor and SparseTensor
SparseTensors assume that the underlying dense tensor is regularly shaped and that unmentioned values are missing. RaggedTensors, on the other hand, make no such assumption.

For example, take a batch whose longest row has three elements and whose first row contains only "John". A SparseTensor interprets that first batch element as John, null, null, whereas a RaggedTensor interprets it as simply John.
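The difference becomes visible in operations such as concatenation. The sketch below assumes TensorFlow 2.x, and the word lists are illustrative. The first batch's longest row has three words, so its sparse form leaves two missing slots after "John". Concatenating along the rows keeps those gaps in the sparse result. The ragged result, by contrast, simply joins each row end to end.

```python
import tensorflow as tf

ragged_x = tf.ragged.constant([["John"], ["a", "big", "dog"], ["my", "cat"]])
ragged_y = tf.ragged.constant([["fell", "asleep"], ["barked"], ["is", "fuzzy"]])

# Ragged concat: each row of ragged_y is appended directly after the matching row of ragged_x.
ragged_result = tf.concat([ragged_x, ragged_y], axis=1)
print(ragged_result.to_list())

# Sparse concat: ragged_x becomes a 3x3 sparse tensor, so "John" keeps two empty slots.
sparse_x = ragged_x.to_sparse()
sparse_y = ragged_y.to_sparse()
sparse_result = tf.sparse.concat(axis=1, sp_inputs=[sparse_x, sparse_y])
dense_result = tf.sparse.to_dense(sparse_result, default_value="")
print(dense_result.numpy().tolist())
```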
Create RaggedTensor
The simplest way to construct a ragged tensor is using tf.ragged.constant, which builds the RaggedTensor corresponding to a given nested Python list or NumPy array.
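import tensorflow as tf
max_features = 20000   # vocabulary size: keep the 20,000 most frequent words
batch_size = 32
BUFFER_SIZE = 1000     # shuffle buffer for the tf.data pipeline used later
# Load IMDB reviews as lists of word indices; each review keeps its original length.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
    path="imdb.npz",
    num_words=max_features,
    skip_top=0,
    maxlen=None,
    seed=113,
    start_char=1,
    oov_char=2,
    index_from=3)
# Convert the nested lists of word indices into RaggedTensors (no padding needed).
r_train_x = tf.ragged.constant(x_train)
r_test_x = tf.ragged.constant(x_test)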
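tf.ragged.constant is not the only constructor. Internally, a RaggedTensor stores one flat tensor of values plus row-partitioning information. The tf.RaggedTensor.from_row_lengths factory builds a RaggedTensor directly from those two pieces. The token ids below are toy values, not real IMDB data:

```python
import tensorflow as tf

# Toy token ids for three "reviews" of lengths 3, 1 and 2, stored back to back.
flat_ids = tf.constant([1, 14, 22, 1, 1, 7], dtype=tf.int32)
rt = tf.RaggedTensor.from_row_lengths(flat_ids, row_lengths=[3, 1, 2])

print(rt.to_list())                      # [[1, 14, 22], [1], [1, 7]]
print(rt.row_splits.numpy().tolist())    # [0, 3, 4, 6]: row i spans flat_values[splits[i]:splits[i+1]]
print(rt.flat_values.numpy().tolist())   # [1, 14, 22, 1, 1, 7]
```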
Shape
A RaggedTensor can contain any number of irregular dimensions. The RaggedTensor.shape attribute returns a tf.TensorShape for a ragged tensor, where ragged dimensions have size None.
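r_train_x.shape  # the ragged (per-review) dimension is reported as None
The method tf.RaggedTensor.bounding_shape can be used to find a tight bounding shape for a given RaggedTensor:
r_train_x.bounding_shape()  # [number of reviews, length of the longest review]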
With RaggedTensors, however, you don't need to worry about maximum lengths or padding.
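The next sketch uses illustrative values, small enough to print, to compare .shape and bounding_shape() side by side. It also shows to_tensor(), which converts a RaggedTensor back into a padded dense tensor for the rare downstream operation that requires one.

```python
import tensorflow as tf

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(rt.shape)                     # (3, None)
print(rt.bounding_shape().numpy())  # [3 3]

# Pad only on demand: positions past each row's end are filled with default_value.
dense = rt.to_tensor(default_value=0)
print(dense.numpy().tolist())       # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```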
Create Model
RaggedTensors are supported by many TensorFlow APIs, including Keras, tf.data Datasets, and SavedModels.
RaggedTensors can be passed as inputs to a Keras model by setting ragged=True on tf.keras.Input. RaggedTensors may also be passed between Keras layers and returned by Keras models. The following LSTM model is trained on ragged tensors.
keras_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=[None], dtype=tf.int32, ragged=True),  # variable-length reviews
    tf.keras.layers.Embedding(max_features, 128),
    tf.keras.layers.LSTM(32, use_bias=False),
    tf.keras.layers.Dense(32),
    tf.keras.layers.Activation(tf.nn.relu),
    # Sigmoid output so predictions are probabilities, as binary_crossentropy expects.
    tf.keras.layers.Dense(1, activation='sigmoid')
])
num_epochs = 10
keras_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = keras_model.fit(r_train_x, y_train, epochs=num_epochs, batch_size=batch_size,
                          validation_data=(r_test_x, y_test))
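Many operations on ragged data only need to touch the flat values and can reuse the row partition unchanged, and an embedding lookup like the one in the model above is one of them. The sketch below illustrates that idea with tf.ragged.map_flat_values and a randomly initialized matrix. It is a conceptual illustration, not how tf.keras.layers.Embedding is actually implemented. The vocabulary size of 10 and the embedding width of 4 are arbitrary assumptions.

```python
import tensorflow as tf

tokens = tf.ragged.constant([[3, 7, 1], [4], [9, 2]])  # toy token ids, batch of 3
embedding_matrix = tf.random.normal([10, 4])           # hypothetical: vocab of 10, width 4

# tf.gather runs on the 6 flat token ids; the result is rewrapped with tokens' row splits.
embedded = tf.ragged.map_flat_values(tf.gather, embedding_matrix, tokens)
print(embedded.shape)                  # (3, None, 4)
print(embedded.row_lengths().numpy())  # [3 1 2]
```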
Create tf.data from RaggedTensor
tf.data is an API for building input pipelines, and it works with RaggedTensors as well. Datasets can be built from RaggedTensors using the same methods that are used to build them from tf.Tensors or NumPy arrays, such as Dataset.from_tensor_slices.
train_data=tf.data.Dataset.from_tensor_slices((r_train_x,y_train)).shuffle(BUFFER_SIZE).batch(32)
test_data=tf.data.Dataset.from_tensor_slices((r_test_x,y_test)).batch(32)
...
keras_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = keras_model.fit(train_data,epochs=5,validation_data=test_data)
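The pipeline above can be checked at toy scale. In this sketch, which uses illustrative data and assumes TensorFlow 2.x, each slice of the ragged feature tensor is one variable-length row. Calling batch() regroups the rows into RaggedTensor batches without padding them.

```python
import tensorflow as tf

features = tf.ragged.constant([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]])
labels = tf.constant([0, 1, 0, 1])

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)
batches = []
for batch_features, batch_labels in dataset:
    # batch_features is a RaggedTensor: each batch keeps its own row lengths.
    batches.append((batch_features.to_list(), batch_labels.numpy().tolist()))
print(batches)
# [([[1, 2, 3], [4]], [0, 1]), ([[5, 6], [7, 8, 9, 10]], [0, 1])]
```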
Save Model
RaggedTensors can be used transparently with the functions and methods defined by a SavedModel.
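import tempfile
import tensorflow as tf
# Save the trained model, including its ragged input signature, to a temporary directory.
keras_model_path = tempfile.mkdtemp()
tf.saved_model.save(keras_model, keras_model_path)
imported_model = tf.saved_model.load(keras_model_path)
# Predict on the first 10 training reviews, passed directly as a RaggedTensor.
imported_model(r_train_x[:10])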
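The sketch below checks the same save-and-load round trip without training a model. It saves a tf.Module after tracing one of its tf.functions with a tf.RaggedTensorSpec, reloads the module, and calls the function on a ragged batch. The module name RowSums and its row-summing method are hypothetical, chosen only to keep the output easy to verify.

```python
import tempfile

import tensorflow as tf


class RowSums(tf.Module):
    """Hypothetical module: sums each variable-length row of an int32 ragged batch."""

    @tf.function
    def row_sums(self, rt):
        return tf.reduce_sum(rt, axis=1)


module = RowSums()
# Trace a concrete function for ragged int32 input so it is stored in the SavedModel.
module.row_sums.get_concrete_function(tf.RaggedTensorSpec(shape=[None, None], dtype=tf.int32))

export_dir = tempfile.mkdtemp()
tf.saved_model.save(module, export_dir)

restored = tf.saved_model.load(export_dir)
result = restored.row_sums(tf.ragged.constant([[1, 2, 3], [4], [5, 6]]))
print(result.numpy().tolist())  # [6, 4, 11]
```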