The tensor data structure is a fundamental building block of PyTorch. Tensors are much like NumPy arrays, except that a tensor is designed to take advantage of the parallel computation capabilities of a GPU.

Tensors support gradient calculation and operation tracking: a tensor can record the history of the operations that produced it. Much of the tensor syntax is similar to that of NumPy arrays, but tensors carry some additional attributes.

A tensor is just an n-dimensional array in PyTorch, with a few enhancements that make it unique. Besides living on the CPU, tensors can be moved to the GPU for faster computation.
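For example, a tensor can be moved to the GPU with .to() when one is available (a small sketch; the check keeps it runnable on CPU-only machines):

import torch

t = torch.rand(2, 3)
if torch.cuda.is_available():   # only move the tensor if a GPU is present
    t = t.to("cuda")
print(t.device)                 # cpu, or cuda:0 when a GPU is available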

When requires_grad = True is set, the tensor starts forming a backward graph that records every operation applied to it, so that gradients can later be computed through this dynamic computation graph.

The tensor's grad attribute holds the value of the gradient. If requires_grad is False, it holds None. Even if requires_grad is True, it stays None until .backward() is called from some other node. For example, if you call out.backward() for some variable out that involved x in its calculations, then x.grad will hold ∂out/∂x.
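A minimal sketch of that behavior (the names x and out are only illustrative):

x = torch.tensor([2.0, 3.0], requires_grad=True)
print(x.grad)          # None: backward() has not been called yet

out = (x ** 2).sum()   # out = x0**2 + x1**2
out.backward()         # populate gradients along the recorded graph
print(x.grad)          # tensor([4., 6.]), i.e. ∂out/∂x = 2 * x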

The grad_fn attribute references the backward function that will be used to calculate the gradient. NumPy arrays don't have this kind of attribute.
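Continuing the snippet above: a tensor produced by an operation carries a grad_fn, while a user-created leaf tensor does not.

y = x * 3
print(x.grad_fn)   # None: x is a leaf tensor created by the user
print(y.grad_fn)   # <MulBackward0 object at 0x...>: the recorded backward function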

We create a tensor with requires_grad=True, which means that autograd and computation-history tracking are turned on.

a = torch.rand(2, requires_grad=True)  # turn on autograd
print(a)  # tensor([0.1265, 0.9635], requires_grad=True)

Create NumPy array from Tensor

If you have existing code with NumPy arrays, you may wish to express that same data as PyTorch tensors, whether to take advantage of PyTorch’s GPU acceleration or its efficient abstractions for building ML models. It’s easy to switch between NumPy arrays and PyTorch tensors:
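Going from NumPy to PyTorch is a single call, for example with torch.from_numpy (a quick sketch):

import numpy as np

n = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(n)   # shares its memory with n
print(t)                  # tensor([1., 2., 3.], dtype=torch.float64)

Going the other way, however, calling .numpy() directly on our tensor a, which requires grad, raises an error: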

a.numpy()

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

This is expected behavior: converting to NumPy would break the graph, so no gradient could be computed.

If you don’t actually need gradients, then you can explicitly .detach() the Tensor that requires grad to get a tensor with the same content that does not require grad. This other Tensor can then be converted to a NumPy array.

b = a.detach().numpy()
print(b)  # [0.12650299 0.96350586]

The resulting NumPy array has the same shape and contains the same data as the tensor.

The tensor also stores its operation history, and NumPy has no such object. You can reach the underlying tensor through the .data attribute, so this works as well:

c = a.data.numpy()
c[1] = 100

print(a)  # tensor([  0.1265, 100.0000], requires_grad=True)

When you create an np.ndarray from a torch.Tensor, or vice versa, both objects reference the same underlying storage in memory. Since an np.ndarray cannot carry the computational graph associated with the data, that graph has to be explicitly removed with detach() when NumPy and PyTorch are to share the same underlying tensor.

b[0] = 10

print(a)  # tensor([ 10., 100.], requires_grad=True)

The underlying values are shared between the tensor and the NumPy array: setting b[0] to 10 changes the corresponding element of the tensor as well.
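If you do not want this sharing, one option (continuing with the same tensor a; a sketch, not the only way) is to copy the data during the conversion:

d = a.detach().numpy().copy()  # an independent copy, no shared storage
d[0] = -1
print(a)                       # a is unchanged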

Wrap with no_grad()

To prevent tracking history, you can wrap the code block with torch.no_grad(). This can be particularly helpful when evaluating a model because the model may have trainable parameters with requires_grad=True, but we don’t need the gradients.

with torch.no_grad():
    y = (a * 2).numpy()  # a * 2 is computed without tracking, so it does not require grad

In this mode, the result of every computation has requires_grad=False, even when the inputs have requires_grad=True, which is why the result of a * 2 can be converted directly. Note that no_grad() only affects new computations: a leaf tensor such as a, which already has requires_grad=True, still needs detach() before .numpy(). This context manager does not affect computation in other threads.
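In the model-evaluation setting described above, this typically looks like the sketch below (nn.Linear stands in for a real model here):

import torch.nn as nn

model = nn.Linear(4, 2)      # its parameters have requires_grad=True
inp = torch.rand(3, 4)

with torch.no_grad():
    out = model(inp)         # no graph is recorded for this forward pass

print(out.requires_grad)     # False
print(out.numpy().shape)     # (3, 2) -- conversion works without detach()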

PyTorch tensors are designed to be used in the context of gradient-descent optimization, and therefore they hold not only numeric values but also the computational graph that produced those values. This graph is then used to compute the derivative of the loss function with respect to each of the independent variables used to compute the loss.
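As a final illustrative sketch (the toy data and the linear model below are made up), this is the pattern that gradient-descent training builds on:

# toy data and two trainable parameters
xs = torch.tensor([1.0, 2.0, 3.0])
ys = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(0.0, requires_grad=True)
bias = torch.tensor(0.0, requires_grad=True)

loss = ((w * xs + bias - ys) ** 2).mean()  # mean squared error
loss.backward()                            # walk the recorded graph backwards
print(w.grad, bias.grad)                   # d(loss)/dw and d(loss)/dbias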
