One way to evaluate a model is by its accuracy: the higher the accuracy, the better the model performs. To improve accuracy, we optimize the model during training by minimizing an appropriate loss function.
The Cross-Entropy function has a wide range of variants, of which the most common is Binary Cross-Entropy (BCE). BCE loss is mainly used for binary classification models, that is, models with only two classes.
This article focuses on PyTorch's BCELoss and BCEWithLogitsLoss for binary and multi-label classification, and on understanding the underlying cross-entropy formula.
The Binary Cross-Entropy loss is a mathematical function that measures the difference between predicted probabilities and the actual binary labels in a classification task, and it has become a staple in training neural networks.
BCELoss
In a binary classification problem, we only need to predict the probability of one class, because the probability of the negative class can be derived directly from it.
Suppose we are performing binary classification and our outputs are dog or cat. We only need to predict the probability of a particular example being a dog; the probability of that example being a cat follows directly from it.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # ... feature-extraction layers ...
        self.classifier = nn.Linear(1024, 1)  # single output neuron for the positive class
        self.sigm = nn.Sigmoid()

    def forward(self, x):
        # ... pass x through the feature-extraction layers ...
        x = self.classifier(x)
        x = self.sigm(x)  # squash the score into the range (0, 1)
        return x

loss_func = torch.nn.BCELoss()
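
As a minimal sketch of how this loss is applied in a training step (the batch of random features and the labels below are made-up values for illustration, and the elided feature-extraction layers are assumed to preserve the 1024-dimensional input), BCELoss expects probabilities from the model and float targets of the same shape:

# Hypothetical example: a batch of 4 feature vectors with binary labels.
features = torch.randn(4, 1024)                   # assumed to match nn.Linear(1024, 1)
labels = torch.tensor([[1.], [0.], [1.], [0.]])   # targets must be floats in {0, 1}

model = Net()
probs = model(features)        # probabilities in (0, 1) from the Sigmoid
loss = loss_func(probs, labels)
loss.backward()                # gradients for the optimizer step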
Sigmoid
How do we ensure that our model's prediction is a continuous value in the range (0, 1)?

We apply an activation function to the raw linear scores. Our example is a binary classification problem with two classes, Dog and Cat. In this case, the activation function applied is the sigmoid activation function.
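
For instance, applying torch.sigmoid to a few raw linear scores (the values below are made up for illustration) maps every score into (0, 1):

import torch

logits = torch.tensor([-2.0, 0.0, 3.0])   # arbitrary raw scores from the linear layer
probs = torch.sigmoid(logits)             # tensor([0.1192, 0.5000, 0.9526]) -> all in (0, 1)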
BCELoss Formula
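For a batch of N inputs, the loss is the average negative log-likelihood of the predictions:

BCE = −(1/N) · Σ [ y · log(p(y)) + (1 − y) · log(1 − p(y)) ]

where the sum runs over all N inputs.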

Here y is the label (1 for Dog and 0 for Cat) and p(y) is the predicted probability of the input being a Dog, for all N inputs. For each Dog input (y = 1), the loss adds log(p(y)), the log probability of it being a Dog. Conversely, for each Cat input (y = 0), it adds log(1 − p(y)), the log probability of it being a Cat.
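
As a quick sanity check (the probabilities and labels below are made-up values), computing the formula by hand gives the same result as nn.BCELoss:

import torch

probs = torch.tensor([0.9, 0.2, 0.8])     # predicted p(y) for three inputs
labels = torch.tensor([1.0, 0.0, 1.0])    # 1 = Dog, 0 = Cat

# Manual computation of the formula above.
manual = -(labels * torch.log(probs) + (1 - labels) * torch.log(1 - probs)).mean()

builtin = torch.nn.BCELoss()(probs, labels)
print(manual, builtin)                    # both hold the same value (~0.1839)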
BCEWithLogitsLoss
It is called Binary Cross-Entropy loss because it sets up a separate binary (two-class) classification problem for every one of the C output classes. When using this loss, the cross-entropy formulation for binary problems shown above is applied to each class. It is also called Sigmoid Cross-Entropy loss.

It is a Sigmoid activation combined with a Cross-Entropy loss. The loss is independent for each class, meaning that the value computed for one CNN output class is not affected by the values of the other classes.
That is why it is used for multi-label classification, where the fact that an element belongs to one class should not influence the decision for another class.
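
As a minimal sketch of this multi-label setup (the batch of random scores, the targets, and the choice of 3 classes are assumptions for illustration), BCEWithLogitsLoss takes the raw scores directly, so no Sigmoid layer is added to the model:

import torch
import torch.nn as nn

# Hypothetical multi-label problem with 3 independent classes per example.
logits = torch.randn(4, 3)                # raw scores from the network, no Sigmoid applied
targets = torch.tensor([[1., 0., 1.],
                        [0., 1., 0.],
                        [1., 1., 0.],
                        [0., 0., 1.]])    # each example may belong to several classes

loss_func = nn.BCEWithLogitsLoss()        # applies Sigmoid internally, then BCE per class
loss = loss_func(logits, targets)

# Equivalent, but less numerically stable, formulation with an explicit Sigmoid:
same_loss = nn.BCELoss()(torch.sigmoid(logits), targets)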
Related Posts
- Difference between BCELoss and BCEWithLogitsLoss in PyTorch
- Activation function for Output Layer in Regression, Binary, Multi-Class, and Multi-Label Classification
- Loss function for multi-class and multi-label classification in Keras and PyTorch
- How many output neurons for binary classification, one or two?
- Advantages of ReLU vs Tanh vs Sigmoid activation function in deep neural networks.
- What is Categorical Cross Entropy Loss Function in Keras?