A PyTorch loss function computes a single numerical value that the learning process attempts to minimize. The loss is typically calculated by comparing the desired outputs for a set of training samples with the outputs the model produces when fed those samples.

The loss function is a way of prioritizing which errors in our training samples to fix: parameter updates adjust the outputs for samples with a high loss more than the outputs for samples with a smaller loss.
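To make this weighting concrete, here is a minimal sketch that exposes the per-sample losses before they are averaged. The logits and targets are made-up values for illustration, and `reduction='none'` is used only to show the individual terms.

```python
import torch
from torch import nn

# Hypothetical logits for three samples that all belong to the positive class.
logits = torch.tensor([2.0, -2.0, 0.1])
targets = torch.tensor([1.0, 1.0, 1.0])

# reduction='none' keeps one loss value per sample instead of averaging.
per_sample = nn.BCEWithLogitsLoss(reduction='none')(logits, targets)

# The default reduction='mean' returns the single scalar that training minimizes.
mean_loss = nn.BCEWithLogitsLoss()(logits, targets)

print(per_sample)  # the confidently wrong sample (logit -2.0) has the largest loss
print(mean_loss)
```

The sample with the largest loss contributes the most to the mean, so it also has the most influence on the next parameter update.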

For image classification, the appropriate loss function depends on the type of problem (binary vs. multiclass) and the type of output (logits vs. probabilities). Binary cross-entropy is the usual loss function for binary classification with a single output unit, and categorical cross-entropy is the usual loss function for multiclass classification.

In this tutorial, we will discuss the BCEWithLogitsLoss loss function. Binary cross-entropy loss (nn.BCELoss) computes the BCE loss on predictions in the range [0, 1], that is, on probabilities. A more numerically stable variant, nn.BCEWithLogitsLoss, combines the sigmoid and the BCE loss into one loss function.

BCEWithLogitsLoss is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
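The stability difference can be seen with a single confidently wrong prediction. The sketch below uses a made-up logit of 30 with a target of 0; the exact loss for this case is very close to 30. It assumes the default float32 dtype.

```python
import torch
from torch import nn

# A confidently wrong prediction: large positive logit, but the target is 0.
logit = torch.tensor([30.0])
target = torch.tensor([0.0])

# Two-step version: in float32, sigmoid(30) rounds to exactly 1.0, so
# log(1 - p) would be -inf. BCELoss clamps its log terms at -100 to avoid that.
prob = torch.sigmoid(logit)
naive = nn.BCELoss()(prob, target)

# Fused version: works on the logit directly and never evaluates log(0).
stable = nn.BCEWithLogitsLoss()(logit, target)

print(naive.item(), stable.item())
```

The fused loss stays close to the true value, while the two-step version loses the information in the logit once the sigmoid saturates and reports the clamped value instead.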

Difference between BCELoss and BCEWithLogitsLoss in PyTorch

The following code shows how to use nn.BCEWithLogitsLoss with the model's logits given as inputs to the loss function.

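import torch

from torch import nn
from torchvision import transforms

from torch.utils.data import DataLoader
from torchvision.datasets import FakeData

class BCEModel(nn.Module):

  def __init__(self):
    super().__init__()
    self.layers = nn.Sequential(
      nn.Flatten(),                # (batch, 3, 28, 28) -> (batch, 28 * 28 * 3)
      nn.Linear(28 * 28 * 3, 64),
      nn.ReLU(),                   # non-linearity between the hidden layers
      nn.Linear(64, 32),
      nn.ReLU(),
      nn.Linear(32, 1)             # single output unit, no sigmoid: a raw logit
    )

  def forward(self, x):
    return self.layers(x)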

ReLU is mainly used in the hidden layers of a neural network to add non-linearity to the model. Other activations, such as sigmoid (for binary classification) and softmax (for multiclass classification), are added at the last (output) layer, so that the model outputs class-membership probabilities.

How many output neurons for binary classification, one or two?

If the sigmoid or softmax activations are not included at the output layer, then the model will compute the logits instead of the class-membership probabilities. 
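As a small illustration of the relationship between logits and probabilities, the sketch below converts some made-up single-unit logits with `torch.sigmoid`. The values are hypothetical and stand in for the output of a model without a final activation.

```python
import torch

# Hypothetical raw outputs (logits) of a model with a single output unit.
logits = torch.tensor([[-1.5], [0.0], [2.0]])

# Sigmoid maps each logit to the probability of the positive class.
probs = torch.sigmoid(logits)

# Thresholding probabilities at 0.5 is equivalent to thresholding logits at 0.
preds_from_probs = (probs >= 0.5).long()
preds_from_logits = (logits >= 0).long()

print(probs.squeeze(1))
print(preds_from_probs.squeeze(1))
```

Because the two thresholds agree, the sigmoid is only needed at inference time when actual probabilities are required, not to obtain class predictions.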

dataset = FakeData(size=10000, image_size=(3, 28, 28), num_classes=2, transform=transforms.ToTensor())
trainloader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=2, pin_memory=True)

model = BCEModel()

loss_function = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

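for epoch in range(5):  # train for 5 epochs
    print(f'Starting epoch {epoch + 1}')
    current_loss = 0.0
    for i, data in enumerate(trainloader):
        inputs, targets = data
        # BCEWithLogitsLoss expects float targets with the same shape as the outputs
        targets = targets.float().reshape(-1, 1)

        optimizer.zero_grad()    # reset gradients from the previous mini-batch
        outputs = model(inputs)  # forward pass: raw logits of shape (batch, 1)
        loss = loss_function(outputs, targets)
        loss.backward()          # backpropagate the loss
        optimizer.step()         # update the model parameters

        current_loss += loss.item()
        if i % 10 == 9:  # report the average loss every 10 mini-batches
            print('Loss after mini-batch %5d: %.3f' %
                  (i + 1, current_loss / 10))
            current_loss = 0.0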

At each step in the training loop, we evaluate our model on the samples we got from the data loader. We then compare the outputs of our model to the desired outputs using the loss function.

After we have compared our actual outputs to the ideal ones with the loss function, we need to push the model a little so that its outputs better resemble the targets. This is done by backpropagating the loss to obtain gradients and letting the optimizer update the parameters.
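A single "push" can be sketched in isolation as follows. The toy data, the one-layer model, and the SGD learning rate are assumptions chosen for illustration; they are not the model or optimizer used in this tutorial.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy data: 16 samples with 4 features and binary labels.
inputs = torch.randn(16, 4)
targets = torch.randint(0, 2, (16, 1)).float()

model = nn.Linear(4, 1)  # outputs one logit per sample
loss_function = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss_before = loss_function(model(inputs), targets)

optimizer.zero_grad()   # clear gradients left over from earlier steps
loss_before.backward()  # compute the gradient of the loss for every parameter
optimizer.step()        # move each parameter a small step against its gradient

# Re-evaluate on the same batch to see the effect of the update.
with torch.no_grad():
    loss_after = loss_function(model(inputs), targets)

print(loss_before.item(), loss_after.item())
```

On the same batch, the loss after the update is lower than before it, which is the small move toward the targets described above.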

Please note that for binary classification, we can either provide logits as inputs to the loss function nn.BCEWithLogitsLoss(), or compute the probabilities from the logits with a sigmoid and feed them to the loss function nn.BCELoss().
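For logits of moderate size, the two routes give the same value up to floating-point error. The following sketch checks this on a made-up random batch; the seed and shapes are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical batch: 8 logits of moderate size with binary targets.
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# Route 1: pass the logits directly to the fused loss.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Route 2: convert to probabilities first, then use the plain BCE loss.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_from_logits.item(), loss_from_probs.item())
```

The two routes only diverge for extreme logits, where the sigmoid saturates; that is the case in which nn.BCEWithLogitsLoss is the safer choice.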
