PyTorch deep neural networks often have millions of trainable parameters, and training them on Kaggle or Google Colab frequently leads to running out of GPU memory. There are several simple ways to reduce the GPU memory occupied by the model, for example:

  • Change the architecture of the model or use a model with fewer trainable parameters (for example, resnet18 instead of resnet50). This approach can affect the model’s performance metrics; a short sketch of such a swap is shown below.
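
As a rough illustration (my own sketch, assuming torchvision is installed), swapping a larger backbone for a smaller one immediately reduces the number of trainable parameters:

import torchvision.models as models

def count_params(model):
    # Count only trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print("resnet50:", count_params(models.resnet50()))  # roughly 25.6M parameters
print("resnet18:", count_params(models.resnet18()))  # roughly 11.7M parameters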

PyTorch in-place operations can also help to avoid running out of memory. However, using in-place operations is not recommended, for several reasons.

In this tutorial, I will describe what in-place operations are and how they can help to save GPU memory. I will also explain why you should avoid them.

An in-place operation directly changes the content of a given tensor without making a copy. Because they don’t allocate a copy of the input, in-place operations can help to reduce memory usage when operating on high-dimensional data.

PyTorch has a small number of operations that exist only as methods of the Tensor object. They are recognizable from a trailing underscore in their name, like add_, which indicates that the method operates in place by modifying the input instead of creating a new output tensor and returning it.

import torch

x = torch.tensor(15)
y = torch.tensor(10)

print("add",x + y)

print("x  ", x) #The value of x is unchanged.
print("y  ", y) #The value of y is unchanged.
Output:
add tensor(25)
x   tensor(15)
y   tensor(10)

Use in-place operations

print("add",x.add_(y))

print("x  ", x) #in-place addition modifies values of tensors itself, here the value of x changed.
print("y  ", y)
Output:
add tensor(25)
x   tensor(25)
y   tensor(10)
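
To see why this matters for GPU memory, here is a rough sketch (my own example, assuming a CUDA device is available): the out-of-place addition allocates a second large tensor, while the in-place addition reuses the existing storage.

import torch

t = torch.ones(1024, 1024, 256, device="cuda")   # ~1 GB of float32 values
print(torch.cuda.memory_allocated() / 1e9)       # roughly 1.07

u = t + 1.0                                      # out-of-place: allocates a new tensor
print(torch.cuda.memory_allocated() / 1e9)       # roughly 2.15

del u
t.add_(1.0)                                      # in-place: no new allocation
print(torch.cuda.memory_allocated() / 1e9)       # roughly 1.07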

Any method without the trailing underscore leaves the source tensor unchanged and returns a new one. Python augmented assignment operators such as += or *= are also in-place operations when applied to tensors.
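
A quick check of that claim (my own sketch): += modifies the tensor’s existing storage rather than allocating a new tensor.

import torch

a = torch.tensor([1.0, 2.0])
ptr_before = a.data_ptr()          # address of the underlying storage

a += 1                             # equivalent to a.add_(1)
print(a)                           # tensor([2., 3.])
print(ptr_before == a.data_ptr())  # True: same storage, nothing new was allocated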

Avoid In-place Operations

PyTorch discourages the use of in-place operations in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient, and there are very few occasions when in-place operations lower memory usage by any significant amount.

Unless you’re operating under heavy memory pressure, you might never need to use them. Two main reasons limit the applicability of in-place operations:

  • In-place operations can potentially overwrite values required to compute gradients, as the sketch after this list demonstrates.
  • Every in-place operation requires the implementation to rewrite the computational graph. Out-of-place versions simply allocate new objects and keep references to the old graph, while in-place operations require changing the creator of all inputs to the Function representing the operation.
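
Here is a minimal sketch of the first problem (my own example, not from the PyTorch docs): sigmoid saves its output for the backward pass, and modifying that output in place makes backward() fail.

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.sigmoid(x)   # sigmoid's backward pass needs the saved output y
y.add_(1.0)            # in-place addition overwrites y

try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # "... one of the variables needed for gradient computation
              #  has been modified by an inplace operation ..."

The out-of-place version, y = y + 1.0, works because the original sigmoid output is kept alive by the graph.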

Related Posts

How to assign num_workers to PyTorch DataLoader?

Difference between nn.ReLU() and nn.ReLU(inplace=True).