One common way to evaluate a classification model is its accuracy: the higher the accuracy, the better the model performs. Accuracy is improved indirectly, by training the model to minimize a loss function, so choosing a suitable loss function is essential.

The Cross-Entropy function has a wide range of variants, of which the most common type is the Binary Cross-Entropy (BCE). The BCE Loss is mainly used for binary classification models, that is, models having only 2 classes. 

This article focuses on PyTorch's BCELoss and BCEWithLogitsLoss for binary and multi-label classification, and on understanding the cross-entropy formula.

The Binary Cross-Entropy Loss function measures the difference between predicted probabilities and actual binary labels in classification tasks, and it has become a staple in training neural networks.


In binary classification problems, we have to predict the value only for one class because the probability of the negative class can be easily derived from it. 

Suppose we are solving a binary classification problem whose outputs are Dog or Cat. We only need to predict the probability p of a particular example being a Dog; the probability of that example being a Cat is then 1 - p.

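import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Single output unit: the score for the positive class (Dog)
        self.classifier = nn.Linear(1024, 1)
        self.sigm = nn.Sigmoid()

    def forward(self, x):
        x = self.classifier(x)  # raw linear score (logit)
        x = self.sigm(x)        # squash the score into (0, 1)
        return x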



How do we ensure that the model's prediction is a continuous value in the range (0, 1)?

PyTorch BCELoss With Sigmoid

We apply an activation function to the model's linear output scores. Our example is a binary classification with two classes, Dog and Cat. In this case, the activation applied is the sigmoid function, which maps any real-valued score to a value in (0, 1).
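As a small illustration of that mapping, the sketch below pushes a few made-up scores through `torch.sigmoid`. The score values are assumptions chosen for the example, not outputs of a trained model.

```python
import torch

# Hypothetical raw scores (logits), e.g. outputs of a final linear layer.
scores = torch.tensor([-4.0, -1.0, 0.0, 1.0, 4.0])

# Sigmoid maps each score to a probability: 1 / (1 + exp(-score)).
p_dog = torch.sigmoid(scores)

# The Cat probability is simply the complement of the Dog probability.
p_cat = 1.0 - p_dog

print(p_dog)
print(p_cat)
```

Large negative scores land close to 0, large positive scores close to 1, and a score of 0 maps to exactly 0.5.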

BCELoss Formula

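$$
\mathrm{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log\big(p(y_i)\big) + (1 - y_i) \log\big(1 - p(y_i)\big) \right]
$$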

Here y is the label (1 for Dog and 0 for Cat) and p(y) is the predicted probability of the input being Dog; the loss is averaged over all N inputs. For each Dog (y = 1), it adds -log(p(y)) to the loss, that is, the negative log probability of it being Dog. Conversely, for each Cat (y = 0), it adds -log(1 - p(y)), that is, the negative log probability of it being Cat.
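To make the per-example terms concrete, here is a minimal sketch of the calculation in plain Python. The helper name `binary_cross_entropy`, the labels, and the probabilities are all assumptions made up for this example.

```python
import math

def binary_cross_entropy(labels, probs):
    """Mean BCE over N examples: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(labels, probs):
        # y = 1 (Dog) keeps only log(p); y = 0 (Cat) keeps only log(1 - p).
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)

# Hypothetical labels (1 = Dog, 0 = Cat) and predicted Dog probabilities.
labels = [1, 0, 1]
probs = [0.9, 0.2, 0.6]

loss = binary_cross_entropy(labels, probs)
print(round(loss, 4))  # 0.2798
```

Confident, correct predictions (0.9 for a Dog) add little to the loss, while hesitant ones (0.6 for a Dog) add more. On tensors holding the same values, `torch.nn.BCELoss` with its default mean reduction should give the same number.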


It’s called Binary Cross-Entropy Loss because it sets up a binary classification problem between C' = 2 classes for every class in C. So when using this loss, the formulation of Cross-Entropy Loss for binary problems is used. It is also called Sigmoid Cross-Entropy loss.

PyTorch BCEWithLogitsLoss

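BCEWithLogitsLoss combines a sigmoid activation and a cross-entropy loss in a single step, so it takes raw scores (logits) rather than probabilities. The loss is independent for each class, meaning that the loss computed for one output component is not affected by the values of the other components.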

That’s why it is used for multi-label classification, where the insight of an element belonging to a certain class should not influence the decision for another class.
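Both points can be checked with a short sketch: that `nn.BCEWithLogitsLoss` on logits agrees with a sigmoid followed by `nn.BCELoss`, and that changing the score of one class leaves the losses of the other classes untouched. The logits, targets, and three-label setup are assumptions invented for the example.

```python
import torch
import torch.nn as nn

# One sample with three hypothetical labels; it carries labels 0 and 2.
logits = torch.tensor([[2.0, -1.0, 0.5]])   # raw scores, no sigmoid applied
targets = torch.tensor([[1.0, 0.0, 1.0]])   # float targets, same shape

# BCEWithLogitsLoss applies the sigmoid internally ...
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
# ... so it should agree with sigmoid followed by BCELoss.
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# reduction="none" keeps one loss value per class instead of averaging.
per_class = nn.BCEWithLogitsLoss(reduction="none")

# Change only the logit of class 0 and recompute the per-class losses.
changed = logits.clone()
changed[0, 0] = -3.0
loss_before = per_class(logits, targets)
loss_after = per_class(changed, targets)

print(loss_logits.item(), loss_probs.item())
print(loss_before)
print(loss_after)
```

Only the first entry of the per-class loss changes. The PyTorch documentation also notes that the combined version is more numerically stable than a separate sigmoid followed by BCELoss, so when the model outputs logits it is usually the safer choice.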
