The process of running a trained model on new data is called evaluation or inference in deep learning. In order to do an evaluation, we need to put the network in eval mode:

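import torch

net.eval()  # Turns off training-time behavior (dropout, batch-norm updates)

correct, total = 0, 0
for data in testloader:
    images, labels = data
    outputs = net(images)
    # The index of the highest score along dimension 1 is the predicted class
    _, predicted = torch.max(outputs, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()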

Certain layers behave differently in training and evaluation modes, such as the dropout and batch normalization layers.

Dropout Layer

The Dropout layer is used for regularization. Its argument ‘p’ is the drop probability, that is, the probability that each unit is dropped (set to zero) during training.

One important point to remember is that units are dropped randomly only during training; in the evaluation or inference phase, all hidden units must be active. To keep the overall activations on the same scale during training and prediction, the activations of the active neurons have to be scaled appropriately. PyTorch's nn.Dropout handles this by scaling the surviving activations by 1/(1-p) during training, so no rescaling is needed at evaluation time.

The layer's behavior is controlled via model.train() and model.eval(), which specify whether subsequent calls are made during training or during inference. When using dropout, switching between these two modes is crucial: nodes should be randomly dropped only during training, not during evaluation or inference.
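The difference between the two modes is easy to observe directly. The following is a minimal sketch, assuming a standalone nn.Dropout layer with p=0.5 and an input of ones (both chosen only for illustration): in training mode each element is either zeroed or scaled by 1/(1-p), while in evaluation mode the input passes through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed only so repeated runs drop the same units

p = 0.5                    # illustrative drop probability
dropout = nn.Dropout(p=p)
x = torch.ones(1000)       # illustrative input: all ones

dropout.train()            # training mode: units are dropped at random
train_out = dropout(x)

dropout.eval()             # evaluation mode: dropout is a no-op
eval_out = dropout(x)

# In training mode every element is either 0 or 1 / (1 - p)
kept_value = 1.0 / (1.0 - p)
train_values = set(train_out.unique().tolist())

# In evaluation mode the output equals the input
eval_unchanged = torch.equal(eval_out, x)

print(train_values, kept_value, eval_unchanged)
```

Because the surviving units are scaled up, the mean activation in training mode stays close to the mean in evaluation mode, which is the point of the rescaling.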

Batch Normalization Layer

The PyTorch API provides a class, nn.BatchNorm2d, that we can use as a layer when defining our models. Note that its behavior depends on whether the model is in training mode. In training mode, the layer normalizes each batch with that batch's own mean and variance and updates running estimates of these statistics. In evaluation mode, the running estimates are frozen and used for normalization instead. The learnable scale and shift parameters are updated only during training.
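To make the mode dependence concrete, here is a small sketch, assuming a BatchNorm2d layer with 3 channels and random input shifted away from zero (illustrative choices, not taken from the text). It checks that the running mean moves in training mode and stays fixed in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(num_features=3)        # 3 channels, illustrative
x = torch.randn(8, 3, 4, 4) * 2.0 + 5.0    # batch with mean near 5, std near 2

# A freshly created layer starts with running_mean = 0 and running_var = 1
mean_before = bn.running_mean.clone()

bn.train()
train_out = bn(x)                           # normalizes with batch statistics
mean_after_train = bn.running_mean.clone()  # running estimate moved toward the batch mean

bn.eval()
eval_out = bn(x)                            # normalizes with the running statistics
mean_after_eval = bn.running_mean.clone()   # not changed by the eval-mode call

updated_in_train = not torch.equal(mean_before, mean_after_train)
frozen_in_eval = torch.equal(mean_after_train, mean_after_eval)

# Training-mode output is standardized per channel, so its mean is close to 0
train_channel_means = train_out.mean(dim=(0, 2, 3))

print(updated_in_train, frozen_in_eval)
```

With the default momentum of 0.1, one training-mode call moves the running mean only a tenth of the way toward the batch mean, which is why the evaluation-mode output here is not yet centered on zero.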

Note that calling model.train() before training and model.eval() before evaluation automatically sets the mode of the dropout and batch normalization layers, including the rescaling, so we do not have to handle it ourselves.
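As a quick illustration of that propagation, the sketch below builds a small hypothetical model (the layers and sizes are arbitrary) and inspects the `training` flag that every nn.Module carries. Calling eval() or train() on the top-level module flips the flag on all of its submodules.

```python
import torch.nn as nn

# Hypothetical toy model; the layer sizes are arbitrary
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3),
    nn.BatchNorm2d(4),
    nn.ReLU(),
    nn.Dropout(p=0.25),
)

model.eval()   # switches the model and all of its submodules to evaluation mode
flags_eval = [m.training for m in model.modules()]

model.train()  # switches everything back to training mode
flags_train = [m.training for m in model.modules()]

print(flags_eval)   # all False
print(flags_train)  # all True
```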

with torch.no_grad()

The validation loop looks very similar to the training loop but is somewhat simplified. The key difference is that validation is read-only: the loss value is only reported, never backpropagated, and the weights are not updated.

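import torch
import torch.nn.functional as F

# We do not need gradients because we are not going to update the parameters.
test_loss, correct = 0.0, 0
with torch.no_grad():
    for data, target in test_loader:
        output = model(data)
        # Sum the per-sample losses; reduction='sum' replaces the deprecated size_average=False
        test_loss += F.nll_loss(output, target, reduction='sum').item()
        pred = output.max(1, keepdim=True)[1]  # index of the highest log-probability
        correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)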

We wrap the evaluation loop in a no_grad context using the ‘with’ statement. This means that within the with block the PyTorch autograd mechanism looks away: operations are not recorded, so no edges are added to the forward graph, and the resulting tensors do not require gradients. During training, the forward graph that PyTorch records is consumed when we call backward. During evaluation we never call backward, so there is no reason to build the graph in the first place.
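The effect on the graph can be checked on a tiny example. This is a minimal sketch with made-up tensors `w` and `x` (not part of the loop above): the same computation is tracked by autograd outside the context and left untracked inside it.

```python
import torch

w = torch.ones(3, requires_grad=True)   # a parameter-like leaf tensor
x = torch.tensor([1.0, 2.0, 3.0])

# Normal forward pass: autograd records the operations
y_tracked = (w * x).sum()

# Same computation inside no_grad: nothing is recorded
with torch.no_grad():
    y_untracked = (w * x).sum()

print(y_tracked.requires_grad, y_tracked.grad_fn is not None)   # True True
print(y_untracked.requires_grad, y_untracked.grad_fn is None)   # False True
```

Both results hold the same value; only the bookkeeping differs, which is where the time and memory savings come from.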

Using the torch.no_grad context manager, we won’t see any meaningful advantage in terms of speed or memory consumption on our small problem. However, for larger models, the differences can add up. 


Using the related ‘set_grad_enabled’ context, we can also condition the code to run with autograd enabled or disabled, according to a boolean expression indicating whether we are running in training or inference mode.

with torch.set_grad_enabled(phase == 'train'):
    outputs = model(inputs)
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
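A self-contained sketch of this pattern follows. The model, criterion, optimizer, and random data are hypothetical stand-ins chosen for illustration; only the use of `phase` with set_grad_enabled mirrors the snippet above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for a real model and dataset
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

loss_tracks_grad = {}
for phase in ['train', 'val']:
    if phase == 'train':
        model.train()
    else:
        model.eval()

    optimizer.zero_grad()
    # Autograd is enabled only in the training phase
    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs)
        _, preds = torch.max(outputs, 1)
        loss = criterion(outputs, labels)
        if phase == 'train':
            loss.backward()
            optimizer.step()

    loss_tracks_grad[phase] = loss.requires_grad

print(loss_tracks_grad)  # {'train': True, 'val': False}
```

Sharing one code path for both phases keeps the training and validation logic from drifting apart, at the cost of a few `if phase == 'train'` checks.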