At each step of the training loop, we evaluate our model on the samples we got from the data loader and compare its outputs to the targets using the loss function. We then nudge the model’s parameters so that its outputs move a little closer to the targets.

The training loss will tell us if our model can fit the training set at all. If the training loss is not decreasing, chances are the model is too simple for the data. The other possibility is that our data just doesn’t contain meaningful information.

If the validation loss doesn’t decrease along with the training loss, our model is improving its fit to the training set, but it is not generalizing to samples outside that set. As soon as we evaluate the model on new, previously unseen samples, the values of the loss function are poor. If the training loss and the validation loss diverge, we’re overfitting.
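The divergence described above can be checked mechanically from per-epoch loss histories. The following is only a minimal sketch under stated assumptions: the helper name `is_overfitting`, the `patience` window, and the loss values are all made up for illustration and are not part of PyTorch.

```python
def is_overfitting(train_losses, val_losses, patience=3):
    """Return True if, over the last `patience` epochs, the training loss
    kept falling while the validation loss kept rising."""
    if len(train_losses) != len(val_losses):
        raise ValueError("need one training and one validation loss per epoch")
    if len(train_losses) < patience + 1:
        return False  # not enough history to judge
    recent_train = train_losses[-(patience + 1):]
    recent_val = val_losses[-(patience + 1):]
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    return train_falling and val_rising


# Hypothetical loss curves, one value per epoch.
train_hist = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val_hist = [0.95, 0.70, 0.62, 0.66, 0.71, 0.78]
print(is_overfitting(train_hist, val_hist))  # True
```

A strict "every epoch" rule like this is sensitive to noise in real loss curves, so in practice a smoothed or thresholded comparison may be a better fit.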

A PyTorch module produces outputs for a **batch of multiple inputs** at the same time. Thus, to run the model on 32 images, we can create an input tensor of size B × C × H × W, where B = 32 is the batch size, C = 3 is the number of color channels (red, green, and blue), and H and W are the image height and width in pixels. The output is a tensor of size B × num_out, where num_out is the number of output features.
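To make the shapes concrete, here is a small sketch that pushes one batch through a toy model. The model architecture, the 64 × 64 image size, and `num_out = 10` are assumptions chosen only for illustration.

```python
import torch
from torch import nn

num_out = 10  # hypothetical number of output features

# A deliberately tiny stand-in model: one conv layer, global pooling, one linear layer.
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # collapses H x W to 1 x 1, so any image size works
    nn.Flatten(),             # (B, 8, 1, 1) -> (B, 8)
    nn.Linear(8, num_out),
)

batch = torch.randn(32, 3, 64, 64)  # B x C x H x W
with torch.no_grad():               # shape check only, no gradients needed
    outputs = model(batch)

print(batch.shape)    # torch.Size([32, 3, 64, 64])
print(outputs.shape)  # torch.Size([32, 10])
```

The batch dimension passes through unchanged: 32 inputs go in, 32 rows of `num_out` values come out.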

```
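import copy

import torch

# Assumed to be defined elsewhere: `device` (a torch.device), `dataloaders`
# (dict of DataLoader objects keyed by phase), and `dataset_sizes`
# (dict with the number of samples per phase).
def train_model(model, criterion, optimizer, scheduler, num_epochs=10):
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = float('inf')
    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)
        model.train()  # Set model to training mode
        running_loss = 0.0
        # Iterate over the training data in mini-batches.
        for inputs, labels in dataloaders['train']:
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Zero the parameter gradients
            optimizer.zero_grad()
            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)  # predicted class per sample (not used for the loss)
            loss = criterion(outputs, labels)
            # Backward pass + optimizer step
            loss.backward()
            optimizer.step()
            # Statistics: undo the per-batch mean to accumulate a per-sample sum
            running_loss += loss.item() * inputs.size(0)
        scheduler.step()  # Advance the learning-rate schedule once per epoch
        epoch_loss = running_loss / dataset_sizes['train']
        print(f'Loss: {epoch_loss:.4f}')
        # Keep the weights with the lowest training loss so far (an assumption;
        # a validation metric is the more common selection criterion)
        if epoch_loss < best_loss:
            best_loss = epoch_loss
            best_model_wts = copy.deepcopy(model.state_dict())
    # Load best model weights
    model.load_state_dict(best_model_wts)
    return model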
```

At the core of our training, we have two nested loops: an outer one over the epochs and an inner one over the DataLoader, which produces batches from our Dataset. In each iteration of the inner loop, we feed the inputs through the model, compute the loss, backpropagate, and update the parameters.

## Loss.item()

loss.item() returns the value of a single-element tensor as a standard Python number. Because a plain Python number can only live on the CPU, the value is copied from the device (for example, the GPU) to the CPU; the loss tensor itself stays where it is.
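A quick way to see the conversion is to compare types before and after the call. In this sketch the tensor is created by hand as a stand-in for the 0-dimensional tensor a loss function would return.

```python
import torch

loss = torch.tensor(0.25)  # stand-in for the 0-dim tensor a loss function returns
value = loss.item()

print(type(loss).__name__, loss.dim())  # Tensor 0
print(type(value).__name__, value)      # float 0.25
```

Note that `.item()` only works on tensors holding exactly one element, and the returned float carries no gradient history.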

## Batch Loss

loss.item() contains the average loss over the mini-batch, not the sum. That is because loss functions average over the elements of the batch by default: the reduction parameter defaults to 'mean', which, with one loss value per sample, means dividing by the batch size.

```
torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')
```

That’s why loss.item() is multiplied by the batch size, given by inputs.size(0), when accumulating running_loss: the product recovers the summed loss of the batch.
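The relationship between the two reductions can be verified directly. This is a small sketch with made-up probabilities and targets, using one prediction per sample so that the number of elements equals the batch size.

```python
import torch
from torch import nn

# Hypothetical predicted probabilities and binary targets for a batch of 4.
probs = torch.tensor([0.9, 0.2, 0.7, 0.4])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

mean_loss = nn.BCELoss(reduction='mean')(probs, targets)
sum_loss = nn.BCELoss(reduction='sum')(probs, targets)

batch_size = probs.size(0)
# Multiplying the mean by the batch size recovers the summed loss.
print(torch.isclose(mean_loss * batch_size, sum_loss))  # tensor(True)
```

If each sample produces several output elements, 'mean' divides by the total number of elements rather than by the batch size alone.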

## Training Loss

Since each batch loss is an average, we sum the batch losses weighted by their batch sizes and, at the end of the epoch, divide by the dataset size (the number of samples, not the number of steps). This gives the correct average per-sample loss for the epoch, even when the last batch is smaller than the others. The training loss tells us how well the model performs on the training dataset.
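The difference between dividing by the dataset size and dividing by the number of batches shows up as soon as the last batch is smaller than the rest. The sketch below uses plain Python and made-up per-sample losses to mimic the bookkeeping; no model is involved.

```python
# Hypothetical per-sample losses for a dataset of 5 samples.
sample_losses = [0.2, 0.4, 0.6, 0.8, 2.0]
batch_size = 2  # the last batch holds a single sample

running_loss = 0.0
batch_means = []
for start in range(0, len(sample_losses), batch_size):
    batch = sample_losses[start:start + batch_size]
    batch_mean = sum(batch) / len(batch)      # what a 'mean'-reduced loss reports
    running_loss += batch_mean * len(batch)   # same role as loss.item() * inputs.size(0)
    batch_means.append(batch_mean)

epoch_loss = running_loss / len(sample_losses)       # divide by dataset size
mean_of_means = sum(batch_means) / len(batch_means)  # divide by number of batches

print(round(epoch_loss, 4))     # 0.8
print(round(mean_of_means, 4))  # 1.0
```

Averaging the batch means gives the single-sample last batch the same weight as the full batches, which is why it overstates the loss here.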