Model accuracy is not a preferred performance measure for classifiers, especially when you are dealing with very imbalanced validation data. A much better way to evaluate the performance of a classifier is to look at the Confusion Matrix, Precision, Recall, or ROC curve.

In the previous tutorial, we discussed the Confusion Matrix. It gives you a lot of information, but sometimes you may prefer a more concise metric. An interesting one to look at is the **accuracy of the positive predictions**; this is called the precision of the classifier.

What percentage of predicted Positives is truly Positive?

True Positive (TP) is the number of instances correctly classified as positive, and False Positive (FP) is the number of negative instances incorrectly classified as positive, so precision = TP / (TP + FP).
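To make the definition concrete, here is a minimal sketch that computes precision by counting TP and FP directly. The labels are a hypothetical toy example, not the tutorial's dataset:

```python
import numpy as np

# Toy binary labels (hypothetical example)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative

precision = tp / (tp + fp)
print(precision)  # 3 / (3 + 1) = 0.75
```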

Scikit-Learn provides functions to compute precision and recall:

```
from sklearn.metrics import precision_score, recall_score

y_val_pred = model.predict_classes(x_val)
print(precision_score(y_val, y_val_pred, average=None))
# [0.58609539 0.81415929 0.54846939 0.42559242 0.64769231 0.554
#  0.59534207 0.65247611 0.75       0.65563506]
```

Precision is typically used together with another metric, recall, also called sensitivity or true positive rate (TPR). This is the ratio of positive instances that are correctly detected by the classifier.

What percentage of actual Positives is correctly classified?

False Negative (FN) is the number of positive instances incorrectly classified as negative, so recall = TP / (TP + FN).
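A minimal sketch of this computation, again with hypothetical toy labels rather than the tutorial's dataset:

```python
import numpy as np

# Toy binary labels (hypothetical example)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))  # actual positive, predicted positive
fn = np.sum((y_pred == 0) & (y_true == 1))  # actual positive, predicted negative

recall = tp / (tp + fn)
print(recall)  # 2 / (2 + 2) = 0.5
```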

```
print(recall_score(y_val, y_val_pred, average=None))
#[0.725 0.644 0.43 0.449 0.421 0.554 0.818 0.751 0.63 0.733]
```

Precision is a measure of result relevancy, while recall is a measure of how many truly relevant results are returned.

It is often convenient to combine precision and recall into a single metric called the F1 score, in particular, if you need a simple way to compare classifiers.
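The F1 score is the harmonic mean of precision and recall. A quick sketch showing that computing it by hand matches scikit-learn's `f1_score`, using the same hypothetical toy labels as above:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy binary labels (hypothetical example)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # 2/3
r = recall_score(y_true, y_pred)     # 1/2
f1 = 2 * p * r / (p + r)             # harmonic mean of precision and recall

print(f1, f1_score(y_true, y_pred))  # both ~0.5714
```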

You can get the precision and recall for each class in a multi-class classifier using `sklearn.metrics.classification_report`.

```
from sklearn.metrics import classification_report
print(classification_report(y_val, y_val_pred))
```

### Compute Precision, Recall, F1 score for each epoch.

As of Keras 2.0, precision and recall were removed from the master branch because they were computed batch-wise, so the reported values could be misleading.

Keras allows us to access the model during training via a Callback, which we can extend to compute the desired quantities.

```
from sklearn.metrics import precision_recall_fscore_support as score

class ModelMetrics(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.precisions = []
        self.recalls = []
        self.f1_scores = []

    def on_epoch_end(self, epoch, logs={}):
        y_val_pred = self.model.predict_classes(x_val)
        _precision, _recall, _f1, _support = score(y_val, y_val_pred)
        self.precisions.append(_precision)
        self.recalls.append(_recall)
        self.f1_scores.append(_f1)
```

The code above computes precision, recall, and F1 score at the end of each epoch, using the whole validation data.

`on_train_begin` is called at the beginning of training. Here we initialize three lists to hold the values of the metrics, which are computed in `on_epoch_end`. Later on, we can access these lists as usual instance variables.

Define the model, and add the callback parameter in the fit function:

```
metrics = ModelMetrics()

history = model.fit(x_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=2,
                    validation_data=(x_val, y_val),
                    callbacks=[metrics],
                    shuffle=True)

print(metrics.precisions)
```
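Because we passed `average=None` to the scoring function, each entry in `metrics.precisions` is a per-class array for one epoch. A small sketch (with made-up numbers standing in for the callback's output) of how you might macro-average each epoch to get a single curve to plot:

```python
import numpy as np

# Hypothetical stand-in for metrics.precisions: 2 epochs, 3 classes
precisions = [np.array([0.6, 0.8, 0.5]),
              np.array([0.7, 0.85, 0.6])]

# Macro average: unweighted mean over classes, one value per epoch
macro_per_epoch = [p.mean() for p in precisions]
print(macro_per_epoch)  # first epoch: (0.6 + 0.8 + 0.5) / 3
```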

### Related Post

- Micro and Macro Averages for imbalance multiclass classification
- Calculate F1 Macro in Keras
- Calculate and Plot AUC ROC Curve for Multi-Class Classification
- Create a Confusion Matrix for the Keras model and plot it in TensorBoard
- Split Imbalanced dataset using sklearn Stratified train_test_split().
- How to get the ROC curve and AUC for Keras model?