Once you have trained a model, you don’t want to just “hope” it generalizes to new cases. You want to evaluate it and fine-tune it if necessary. The only way to know how well a model will generalize to new cases is to actually try it out on a new dataset.

The standard way to do this is to split your data into two sets, the training set and the test set. You train your model using the training set, and you evaluate it using the test set. The error rate on new cases is called the generalization error, and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.
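As a minimal sketch of the idea, the snippet below shuffles instance indices into a training part and a test part and computes an error rate on held-out labels. The helper names `split_indices` and `error_rate`, the 80/20 ratio, and the toy labels are assumptions made for illustration; they are not part of the Keras example that follows.

```python
import random

def split_indices(n_samples, test_ratio=0.2, seed=42):
    """Shuffle the indices 0..n_samples-1 and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_ratio)
    return indices[n_test:], indices[:n_test]

def error_rate(y_true, y_pred):
    """Fraction of instances whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

train_idx, test_idx = split_indices(n_samples=10, test_ratio=0.2)
print(len(train_idx), len(test_idx))           # 8 2

# Made-up labels and predictions for four held-out instances.
print(error_rate([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.25
```

The error rate measured on the held-out part is what serves as the estimate of the generalization error.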

In this example, I build a multi-class NLP classifier (classes 0 to 5) in Keras using a Kaggle dataset. Because the classes are imbalanced, accuracy alone can be misleading, so we use precision and recall as evaluation metrics.
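To see why imbalance matters, here is a small sketch with made-up labels (not the Kaggle data): a classifier that always predicts the majority class scores high accuracy while never finding the rare class. The helpers `accuracy` and `precision_recall` are hypothetical names written for this illustration.

```python
# Made-up, heavily imbalanced labels: 95 instances of class 0, 5 of class 3.
y_true = [0] * 95 + [3] * 5
# A useless classifier that always predicts the majority class.
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    """Fraction of instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, cls):
    """Precision and recall for one class, treating it as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(accuracy(y_true, y_pred))             # 0.95
print(precision_recall(y_true, y_pred, 3))  # (0.0, 0.0)
```

Accuracy looks excellent at 0.95, yet precision and recall for class 3 are both zero, which is the failure that accuracy hides.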

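from tensorflow import keras

# max_words (vocabulary size) and max_len (padded sequence length) are
# assumed to be defined during preprocessing.
embed_size = 128  # dimensionality of the word embeddings

inp = keras.layers.Input(shape=(max_len, ))
x = keras.layers.Embedding(max_words, embed_size)(inp)
x = keras.layers.Bidirectional(keras.layers.LSTM(50, return_sequences=True))(x)
x = keras.layers.GlobalMaxPool1D()(x)  # max over time steps
x = keras.layers.Dropout(0.1)(x)
x = keras.layers.Dense(50, activation="relu")(x)
x = keras.layers.Dropout(0.1)(x)
# Six sigmoid units score each class independently; for mutually exclusive
# classes, softmax is the more common choice.
x = keras.layers.Dense(6, activation="sigmoid")(x)

model = keras.models.Model(inputs=inp, outputs=x)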

The next important step in the construction phase is to specify how to evaluate the model. We will track accuracy together with precision and recall as our performance measures.

There are many ways to evaluate a multiclass classifier, and selecting the right metric really depends on your project. For example, one approach is to measure the F1 score for each individual class, then simply compute the average score, which is known as the macro-averaged F1 score.
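The averaging described above can be sketched in plain Python as follows. The function names `f1_for_class` and `macro_f1` and the toy labels are assumptions for illustration only.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score for one class, treating it as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Made-up labels for three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print([round(f1_for_class(y_true, y_pred, c), 4) for c in (0, 1, 2)])  # [0.5, 0.8, 0.6667]
print(round(macro_f1(y_true, y_pred, classes=(0, 1, 2)), 4))           # 0.6556
```

Because every class contributes equally to the mean, a rare class that the model handles badly pulls the score down as much as a frequent one would. If scikit-learn is available, `f1_score(y_true, y_pred, average="macro")` computes the same quantity.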

Keras metrics are functions that are used to evaluate the performance of your deep learning model. Choosing a good metric for your problem is usually a difficult task. You need to understand which metrics are already available in Keras and how to use them.

The Keras library provides a way to calculate standard metrics when training and evaluating deep learning models. In Keras, metrics are passed during the compile stage as shown below. You can pass several metrics as a list.

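import tensorflow as tf

# binary_crossentropy pairs with the sigmoid output layer defined above.
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])

# batch_size is assumed to be defined earlier.
model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=2)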

After training your models for a while, you eventually have a model that performs sufficiently well. Now is the time to evaluate the final model on the test set. There is nothing special about this process: get the predictors and the labels from your test set and pass them to model.evaluate():

results = model.evaluate(x_test, y_test, batch_size=128)

#9ms/step - loss: 0.0486 - accuracy: 0.9940 - precision: 0.8008 - recall_1: 0.6826

model.evaluate() returns a scalar test loss if the model has a single output and no metrics, or a list of scalars if the model has multiple outputs or metrics.

The attribute model.metrics_names gives you the display labels for these scalar outputs, in the same order.

print(model.metrics_names)
print(results)

#['loss', 'accuracy', 'precision', 'recall_1']
#[0.048554591834545135, 0.9940181374549866, 0.8007616996765137, 0.6825559735298157]
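Since the names and the values come back as two parallel lists, it can be convenient to pair them up. The sketch below does this with `dict(zip(...))`, using the values copied from the output above so that it runs without a trained model; the variable name `scores` is an assumption.

```python
# Values copied from the printed output above.
metrics_names = ['loss', 'accuracy', 'precision', 'recall_1']
results = [0.048554591834545135, 0.9940181374549866,
           0.8007616996765137, 0.6825559735298157]

# Pair each metric name with its value.
scores = dict(zip(metrics_names, results))

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
# loss: 0.0486
# accuracy: 0.9940
# precision: 0.8008
# recall_1: 0.6826
```

In recent TensorFlow versions, calling model.evaluate() with return_dict=True should give a similar dictionary directly.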
