Training a model that performs well means adjusting its parameters to minimize a loss function, which measures how far the model's predictions are from the true labels. One of the most widely used loss functions for classification in machine learning is cross entropy.

In this tutorial, we will deep dive into Categorical Cross Entropy loss functions and their applications in machine learning, particularly for image classification.

## What is Entropy?

Entropy was first introduced by Claude Shannon in his groundbreaking 1948 paper, "A Mathematical Theory of Communication." Entropy is the average number of bits required to represent or transmit an event drawn from the probability distribution of a random variable.

Entropy indicates the amount of uncertainty of an event. Consider the outcome of a fair coin toss: a fair coin has two outcomes, heads (H) and tails (T), and both have probability P[X=H] = P[X=T] = 1/2.

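For the fair coin, each outcome contributes -(1/2) log2(1/2) = 1/2 to the sum, so the entropy is 1 bit, the maximum for two outcomes. For a coin that always lands on H or always on T, both terms are 0, so the entropy is 0. Now let’s understand how cross-entropy works for a deep neural network using a classification example.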
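The coin example can be checked numerically. The sketch below assumes the standard Shannon definition, H = -Σ p·log2(p), and uses a hypothetical helper named `entropy_bits`; the 0.9/0.1 coin is an extra illustrative case, not one from the text above.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits, skipping outcomes with zero probability."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy_bits([0.5, 0.5])    # maximally uncertain two-outcome event
fixed_coin = entropy_bits([1.0, 0.0])   # always lands on the same side
biased_coin = entropy_bits([0.9, 0.1])  # illustrative case in between

print(fair_coin, fixed_coin, biased_coin)
```

The fair coin gives exactly 1 bit, the fixed coin gives 0, and the biased coin falls between the two.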

The machine learning model predicts, for each input, the probability that it belongs to each class. Cross-entropy can be used to measure how far the model's output is from the true label.

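Each predicted class probability is compared to the desired output of 0 or 1. The calculated loss penalizes the probability based on how far it is from the expected value. The penalty is logarithmic: it is small when the prediction is close to the target and grows very large when the model is confidently wrong.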
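To get a feel for how the penalty behaves, here is a minimal sketch that prints the loss for a single example as the probability assigned to the correct class shrinks. It assumes the natural logarithm; the function name `penalty` and the sample probabilities are chosen purely for illustration.

```python
import math

def penalty(p_true_class):
    """Loss contribution of one example: -ln(probability given to the correct class)."""
    return -math.log(p_true_class)

probabilities = [0.99, 0.9, 0.5, 0.1, 0.01]
losses = [penalty(p) for p in probabilities]

for p, loss in zip(probabilities, losses):
    print(f"p = {p:<4}  loss = {loss:.3f}")
```

The loss stays close to zero for confident correct predictions and climbs steeply as the probability of the correct class approaches zero.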

## Cross Entropy Loss

Cross-entropy loss is used when adjusting model weights during training. The aim is to minimize the loss: the smaller the loss, the better the model fits the training data. It measures the difference between the true label distribution and the model’s predicted probability distribution.

An activation function (Sigmoid/Softmax) is applied to the scores before the CE loss computation.

Softmax converts logits into probabilities. The purpose of cross-entropy is to take the output probabilities (P) and measure the distance from the truth values.
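As a rough illustration of the softmax step, the sketch below converts a list of raw scores into probabilities. The logits are made-up values, and subtracting the maximum before exponentiating is a common numerical-stability trick rather than something the text above prescribes.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)

print(probs)
print(sum(probs))
```

Larger logits map to larger probabilities, and the ordering of the classes is preserved.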

## Categorical Cross Entropy

In multiclass classification, the raw outputs of the neural network are passed through the softmax activation, which then outputs a vector of predicted probabilities over the classes.

Categorical Cross Entropy is also known as Softmax Loss. It’s a softmax activation plus a Cross-Entropy loss used for multiclass classification. Using this loss, we can train a Convolutional Neural Network to output a probability over the N classes for each image.

For multi-class classification, the labels are one-hot, so only the positive class keeps its term in the loss. Only one element of the target vector is nonzero, so every other term in the sum is multiplied by zero and drops out.
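The effect of one-hot labels can be seen in a small hand calculation. This sketch assumes the usual per-example form, -Σ t·ln(p); the function name, the `eps` guard against log(0), and the sample prediction are illustrative choices.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for one example: -sum(t * ln(p)) over all classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]        # one-hot label: class 1 is correct
y_pred = [0.1, 0.8, 0.1]  # hypothetical softmax output

full_sum = categorical_cross_entropy(y_true, y_pred)
positive_only = -math.log(y_pred[y_true.index(1)])  # only the correct class

print(full_sum, positive_only)
```

Both values agree, because the terms for the two incorrect classes are multiplied by zero.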

## Categorical Cross Entropy in Keras

categorical_crossentropy computes the cross-entropy loss between the labels and predictions.

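```python
import tensorflow as tf

y_true = [[0, 1, 0], [0, 0, 1]]              # one-hot labels for two examples
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]  # predicted class probabilities
# Using 'auto'/'sum_over_batch_size' reduction type: per-example losses are averaged.
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # approximately 1.177
```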
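The same numbers can be worked through by hand without TensorFlow. The sketch below is a plain-Python approximation, not the Keras internals: it assumes the loss is averaged over the batch and that predictions are clipped by a small constant (`EPS`, an assumed value) so that log(0) is never evaluated.

```python
import math

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

EPS = 1e-7  # assumed clipping constant to keep log() away from 0

def example_loss(t_row, p_row):
    """Cross-entropy for one example with clipped probabilities."""
    return -sum(t * math.log(min(max(p, EPS), 1 - EPS)) for t, p in zip(t_row, p_row))

per_example = [example_loss(t, p) for t, p in zip(y_true, y_pred)]
batch_loss = sum(per_example) / len(per_example)  # average over the batch

print(per_example)
print(round(batch_loss, 3))
```

The first example is predicted well and contributes about 0.051, while the second assigns only 0.1 to the correct class and contributes about 2.303; their mean is about 1.177.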

This cross-entropy loss function is used when there are two or more label classes. Labels need to be provided in a one-hot representation. If you want to provide labels as integers, use the SparseCategoricalCrossentropy loss instead. Each prediction should contain one floating-point value per class.
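To make the two label formats concrete, here is a small sketch that converts integer class indices into the one-hot rows the categorical loss expects. `to_one_hot` is a hypothetical hand-written helper for illustration, not a Keras function.

```python
def to_one_hot(label, num_classes):
    """Hypothetical helper: turn an integer class index into a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

int_labels = [1, 2]  # integer class indices, the format used by the sparse loss
num_classes = 3

one_hot_labels = [to_one_hot(label, num_classes) for label in int_labels]

print(one_hot_labels)  # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Each row has exactly one nonzero entry and as many values as there are classes.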