Accuracy is not the preferred performance measure for classifiers, especially when you are dealing with highly imbalanced validation data. A much better way to evaluate the performance of a classifier is to look at the Confusion Matrix, Precision, Recall or the ROC curve.
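To see why accuracy misleads on imbalanced data, here is a minimal sketch with made-up labels (95 negatives, 5 positives) and a trivial "classifier" that always predicts the negative class. It scores high accuracy while detecting none of the positives:

```python
# Made-up, highly imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "classifier" that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of all predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of actual positives that were predicted positive.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```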

In the previous tutorial, we discussed the Confusion Matrix. It gives you a lot of information, but sometimes you may prefer a more concise metric. An interesting one to look at is the accuracy of the positive predictions, which is called the precision of the classifier.

Keras Precision

What percentage of predicted Positives is truly Positive?

Precision = TP / (TP + FP)

True Positives (TP) is the number of instances correctly classified as positive, and False Positives (FP) is the number of negative instances incorrectly classified as positive.
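The formula translates directly into code. This is a framework-free sketch on made-up binary labels; the helper name `precision` is hypothetical, and returning 0.0 when nothing is predicted positive is a convention chosen here (precision is undefined in that case):

```python
def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the given positive label."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    # No positive predictions: precision is undefined, so return 0.0 by convention.
    return tp / (tp + fp) if tp + fp else 0.0

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
print(precision(y_true, y_pred))  # 3 TP, 2 FP -> 0.6
```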

Scikit-Learn provides functions to compute precision and recall:

from sklearn.metrics import precision_score, recall_score

precision_score(y_val, y_val_pred, average=None)  # one precision value per class

#[0.58609539 0.81415929 0.54846939 0.42559242 0.64769231 0.554 0.59534207 0.65247611 0.75 0.65563506]
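In a multi-class problem, each per-class precision treats that class as positive and every other class as negative (one-vs-rest). A minimal sketch on made-up 3-class labels; `per_class_precision` is a hypothetical helper, and 0.0 for a never-predicted class is a convention chosen here:

```python
def per_class_precision(y_true, y_pred, classes):
    """One-vs-rest precision for each class, in the order given by `classes`."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted_c = sum(p == c for p in y_pred)  # TP + FP for class c
        scores.append(tp / predicted_c if predicted_c else 0.0)
    return scores

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]
print(per_class_precision(y_true, y_pred, classes=[0, 1, 2]))
```

With scikit-learn installed, `precision_score(y_true, y_pred, average=None)` should return the same per-class values for this data.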

Precision is typically used together with another metric, Recall, also called sensitivity or the true positive rate (TPR). Recall is the ratio of positive instances that are correctly detected by the classifier.

Keras Recall

What percentage of actual Positives is correctly classified?

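Recall = TP / (TP + FN)

False Negatives (FN) is the number of positive instances incorrectly classified as negative.

recall_score(y_val, y_val_pred, average=None)  # one recall value per class

#[0.725 0.644 0.43  0.449 0.421 0.554 0.818 0.751 0.63  0.733]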
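Per-class recall follows the same one-vs-rest pattern, except that the denominator counts the actual members of each class (TP + FN) instead of the predicted ones. A sketch on made-up labels, with the hypothetical helper `per_class_recall`:

```python
def per_class_recall(y_true, y_pred, classes):
    """One-vs-rest recall = TP / (TP + FN) for each class in `classes`."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        actual_c = sum(t == c for t in y_true)  # TP + FN for class c
        scores.append(tp / actual_c if actual_c else 0.0)
    return scores

y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2]
print(per_class_recall(y_true, y_pred, classes=[0, 1, 2]))
```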

Precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned. 
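The two metrics usually pull in opposite directions. The sketch below assumes a binary classifier that outputs a probability for the positive class and predicts positive when that probability reaches a threshold; the scores and labels are made up. Raising the threshold makes the classifier more selective, which here raises precision and lowers recall (precision is not guaranteed to rise monotonically in general):

```python
# Made-up predicted probabilities for the positive class, with their true labels.
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

def precision_recall_at(threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.25, 0.5, 0.9):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```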

It is often convenient to combine precision and recall into a single metric called the F1 score (their harmonic mean), particularly if you need a simple way to compare classifiers.
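The F1 score is F1 = 2 · P · R / (P + R). Because it is a harmonic mean, it stays low unless both precision and recall are reasonably high. A small sketch; the helper name `f1` is hypothetical, and returning 0.0 when both inputs are 0 is a convention chosen here:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.9, 0.1))  # about 0.18, far below the arithmetic mean of 0.5
print(f1(0.5, 0.5))  # balanced inputs: F1 equals both of them
```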

You can get the precision and recall for each class in a multi-class classifier using sklearn.metrics.classification_report.

from sklearn.metrics import classification_report

print(classification_report(y_val, y_val_pred))
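A self-contained run on made-up 3-class labels standing in for `y_val` and `y_val_pred`. Besides the printed table, `output_dict=True` returns the same numbers as a nested dict keyed by the class label as a string, which is handy for logging:

```python
from sklearn.metrics import classification_report

# Made-up 3-class labels, standing in for y_val and y_val_pred.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 2]

# Text table: precision, recall, f1-score and support per class, plus averages.
print(classification_report(y_true, y_pred))

# The same numbers as a nested dict.
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["recall"])
```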

Compute Precision, Recall, F1 score for each epoch.

As of Keras 2.0, the precision and recall metrics were removed from the core library because they were computed batch-wise, so the reported values were only approximations of the values over the whole dataset.
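Why batch-wise values can be off: averaging precision over batches is generally not the same as computing precision over all predictions at once. A sketch with made-up labels split into two batches, assuming the epoch value is a plain mean of the per-batch values:

```python
# Made-up labels and predictions for one "epoch", split into two batches of 4.
y_true = [1, 0, 0, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 1]

def precision(t, p):
    """TP / (TP + FP); 0.0 when nothing is predicted positive."""
    tp = sum(a == 1 and b == 1 for a, b in zip(t, p))
    predicted_pos = sum(b == 1 for b in p)
    return tp / predicted_pos if predicted_pos else 0.0

batches = [(y_true[:4], y_pred[:4]), (y_true[4:], y_pred[4:])]

# Mean of per-batch precision: (1/3 + 3/4) / 2 = 13/24, about 0.542.
batch_mean = sum(precision(t, p) for t, p in batches) / len(batches)

# Precision over the whole epoch: 4 TP out of 7 positive predictions, about 0.571.
global_precision = precision(y_true, y_pred)

print(batch_mean, global_precision)
```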

Keras lets us access the model during training through callbacks. By subclassing tf.keras.callbacks.Callback, we can compute the desired quantities ourselves.

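import tensorflow as tf

class ModelMetrics(tf.keras.callbacks.Callback):
  def on_train_begin(self, logs=None):
    self.val_precisions, self.val_recalls, self.val_f1s = [], [], []  # one list per metric
  def on_epoch_end(self, epoch, logs=None):
    ...  # predict on the full validation set and append each metric to its list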

This callback computes precision, recall and F1 score at the end of each epoch, using the whole validation set.

on_train_begin is called once at the beginning of training. Here we initialize three lists to hold the metric values, which are filled in on_epoch_end. After training, these lists can be accessed like any other instance attribute.
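A fuller sketch of the callback, under several assumptions not stated in the original: TensorFlow 2.x, a multi-class model with a softmax output, integer class labels in `y_val`, macro averaging across classes, and scikit-learn for the metric computation. The validation data is passed through the constructor because TF2 callbacks do not expose it as an attribute:

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import precision_recall_fscore_support


class ModelMetrics(tf.keras.callbacks.Callback):
    """Precision, recall and F1 on the full validation set after each epoch.

    Assumes a softmax output layer and integer class labels in y_val.
    """

    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val

    def on_train_begin(self, logs=None):
        # One list per metric; each epoch appends one value.
        self.val_precisions = []
        self.val_recalls = []
        self.val_f1s = []

    def on_epoch_end(self, epoch, logs=None):
        # Collect predictions for the whole validation set first, so the metrics
        # are computed over all of it instead of being averaged per batch.
        probs = self.model.predict(self.x_val, verbose=0)
        y_pred = np.argmax(probs, axis=1)  # most likely class per sample
        # Macro averaging: unweighted mean of the per-class scores (an assumption).
        precision, recall, f1, _ = precision_recall_fscore_support(
            self.y_val, y_pred, average="macro", zero_division=0
        )
        self.val_precisions.append(precision)
        self.val_recalls.append(recall)
        self.val_f1s.append(f1)
        print(f" - val_precision: {precision:.4f} - val_recall: {recall:.4f} - val_f1: {f1:.4f}")
```

Usage, assuming the names from the text: create `model_metrics = ModelMetrics(x_val, y_val)` and pass `callbacks=[model_metrics]` to `fit`; afterwards `model_metrics.val_f1s` holds one F1 value per epoch.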

Define the model, then pass the callback to the fit function through its callbacks parameter:


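history = model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              callbacks=[model_metrics])  # model_metrics: an instance of the ModelMetrics callback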