Rectified Linear Unit (ReLU) is an activation function used in deep neural networks; it is typically paired with the Softmax function, which serves as the classification function in the output layer.

ReLU is used as the activation function for the hidden layers of a deep neural network: each hidden layer computes a linear transformation of its input and passes the result through ReLU element-wise. The activation of the penultimate layer then feeds the Softmax classification layer, and the weight parameters of all layers are learned through backpropagation.
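As a concrete illustration (the layer sizes here are arbitrary, not taken from any particular model), a small classifier built this way applies ReLU in the hidden layer and Softmax at the output:

import torch.nn as nn

# Hypothetical layer sizes for illustration: 784 inputs, 10 classes
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),            # ReLU activation on the hidden layer
    nn.Linear(256, 10),
    nn.Softmax(dim=1),    # Softmax as the classification function
)

In practice, nn.CrossEntropyLoss expects raw logits and applies log-softmax internally, so the explicit Softmax layer is often left out during training.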

ReLU is the most commonly used activation function in deep learning. The function returns 0 for any negative input; for any positive value x, it returns that value. It can be written as f(x) = max(0, x).
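For example, applying ReLU to a small tensor in PyTorch (a minimal sketch) zeroes the negative entries and passes the positive ones through unchanged:

import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(F.relu(x))  # negative entries become 0, positive entries pass through unchanged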

In-place Operations

An in-place operation directly changes the content of a given tensor without making a copy. Because in-place operations don't allocate new memory for their output, they can help reduce memory usage when operating on high-dimensional data.
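A minimal sketch of the difference: an in-place add_ writes the result back into the same storage, while the out-of-place add allocates a new tensor:

import torch

x = torch.randn(3)
print(x.data_ptr())  # address of x's underlying storage

x.add_(1)            # in-place: x is modified directly, no copy is made
print(x.data_ptr())  # same address as before

y = x.add(1)         # out-of-place: the result lives in a newly allocated tensor
print(y.data_ptr())  # different address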

The following PyTorch code demonstrates how in-place operations help consume less GPU memory. To do this, I am going to measure the memory allocated by the out-of-place ReLU and by the in-place ReLU from PyTorch, with this simple function:

import torch
import torch.nn as nn 
import torch.nn.functional as F

def get_memory_allocation(device, inplace=False):
  # Create a large tensor on the target device
  # (the torch.cuda calls below assume a CUDA device is available)
  x = torch.randn(1000, 1000, device=device)

  # Measure allocated memory before the ReLU call
  torch.cuda.synchronize()
  start_max_memory = torch.cuda.max_memory_allocated() / 1024**2
  start_memory = torch.cuda.memory_allocated() / 1024**2

  if inplace:
    F.relu_(x)
  else:
    # Keep a reference so the newly allocated output stays alive
    output = F.relu(x)

  # Measure allocated memory after the call
  torch.cuda.synchronize()
  end_max_memory = torch.cuda.max_memory_allocated() / 1024**2
  end_memory = torch.cuda.memory_allocated() / 1024**2

  # Return the memory (in MB) allocated by the ReLU call
  return end_memory - start_memory, end_max_memory - start_max_memory

In-place operations are recognizable by the trailing underscore in their name, like relu_, which indicates that the method operates in place, modifying the input tensor instead of creating and returning a new output tensor.
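For example (a minimal sketch), F.relu leaves its input untouched, while F.relu_ overwrites it:

import torch
import torch.nn.functional as F

t = torch.tensor([-1.0, 2.0, -3.0])

out = F.relu(t)   # out-of-place: t is unchanged, out is a new tensor
print(t)          # still contains the negative values

F.relu_(t)        # in-place: t itself is modified
print(t)          # negative values replaced by 0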

Call the function to measure the allocated memory for the out-of-place ReLU:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

memory_allocated, max_memory_allocated = get_memory_allocation(device, inplace=False)

print('Allocated memory: {}'.format(memory_allocated))
print('Allocated max memory: {}'.format(max_memory_allocated))
Output:
Allocated memory: 3.81494140625
Allocated max memory: 0.0

Then call the in-place ReLU as follows:

memory_allocated, max_memory_allocated = get_memory_allocation(device, inplace=True)

print('Allocated memory: {}'.format(memory_allocated))
print('Allocated max memory: {}'.format(max_memory_allocated))
Output:
Allocated memory: 0.0
Allocated max memory: 0.0

It looks like using in-place operations helps us save some GPU memory: the in-place version allocates no new tensor for the output. Be careful when using autograd, though: we usually avoid in-place updates because PyTorch's autograd engine might need the original values we are modifying for the backward pass.
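As a minimal sketch of why this matters: modifying a tensor that autograd has saved for the backward pass typically raises a RuntimeError when backward() is called.

import torch
import torch.nn.functional as F

x = torch.randn(5, requires_grad=True)
y = x.sigmoid()     # sigmoid saves its output for the backward pass
F.relu_(y)          # in-place ReLU overwrites that saved output
y.sum().backward()  # typically raises: "one of the variables needed for gradient
                    # computation has been modified by an inplace operation"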
