Embedding Layer captures the semantic meaning of the input by placing semantically similar inputs close together. It maps high-dimensional input into a lower-dimensional space such that similar inputs are nearby.

There are many existing mathematical techniques for creating embedding. For example, Given a set of word vectors,  principal component analysis (PCA) tries to find highly correlated dimensions that can be collapsed into a single dimension.

The embedding layer determines which words are similar to each other. The following embeddings show geometrical relationships that capture semantic relations like the relation between a country and its capital.

Keras Embedding

Embedding is a dense vector of floating-point values. We do not have to specify these values manually, they are trainable parameters, weights learned by the model during training, in the same way, a model learns weights for a dense layer. 

Word embedding

An embedding is a matrix in which each column is the vector that corresponds to an item in your vocabulary. You could retrieve the embedding for each individual item.

After these weights have been learned, we can encode each word by looking up the dense vector it corresponds to in the table.

Embedding Dimension

It is common to see word embeddings that are 8-dimensional for small datasets, up to 1024 dimensions when working with large datasets. A higher dimensional embedding can capture fine-grained relationships between words but takes more data to learn.

When learning a d-dimensional embedding each item is mapped to a point in a d-dimensional space so that similar items are nearby in this space. 

How to use the embedding layer?

The embedding layer is just a special type of hidden layer of size d. This can be combined with any hidden layers. This layer connects to a single hidden layer that maps from integer indices to their embeddings. 

The Embedding layer takes the integer-encoded vocabulary. These vectors are learned as the model trains. The resulting dimensions are: (batch, sequence, embedding).

The weights for the embedding are randomly initialized. During training, they are gradually adjusted via backpropagation.


model = tf.keras.Sequential([
  layers.Embedding(encoder.vocab_size, embedding_dim),
  layers.Dense(16, activation='relu'),
  layers.Dense(1, activation='sigmoid')


After the model has been trained, you have an embedding. This embedding can be reused in other classifiers.

Save Embedding

You can get the word embeddings by using the get_weights() method of the embedding layer 



encoder = info.features['text'].encoder

out_v = io.open('vecs.tsv', 'w', encoding='utf-8')

for num, word in enumerate(encoder.subwords):
  vec = weights[num+1] # skip 0 for padding.
  out_v.write(word+'\t'.join([str(x) for x in vec]) + "\n")


Run this code in Google Colab