Tensors are the primary data structure PyTorch uses to store and manipulate numerical data, and they appear everywhere in the library. They represent the inputs to models, the weights of the layers within the models, and the outputs of models.

PyTorch tensors can remember where they come from, in terms of the operations and parent tensors that originated them, and they can automatically provide the chain of derivatives of such operations with respect to their inputs.

PyTorch tensors can be initialized with the argument **requires_grad**, which, when set to **True**, stores the tensor’s gradient in an attribute called **grad**.

```
params = torch.tensor([1.0, 0.0], requires_grad=True)
```

The **requires_grad=True** argument to the tensor constructor tells PyTorch to track the entire family tree of tensors resulting from operations on params. In other words, any tensor that has params as an ancestor will have access to the chain of functions that were called to get from params to that tensor.
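How tracking propagates from params to its descendants can be inspected directly. The following is a minimal sketch, not part of the original walkthrough: the names `scaled` and `total` are made up for illustration, and it relies only on the standard `requires_grad`, `is_leaf`, and `grad_fn` tensor attributes.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)

# Tensors computed from params inherit requires_grad=True and record
# the function that produced them in their grad_fn attribute.
scaled = params * 3.0   # hypothetical intermediate tensor
total = scaled.sum()    # hypothetical scalar result

# params was created directly by the user, so it is a leaf with no grad_fn.
print(params.is_leaf, params.grad_fn)

# Descendants carry a reference to the backward function of the op that made them.
print(scaled.requires_grad, scaled.grad_fn)
print(total.requires_grad, total.grad_fn)

# next_functions links a node to the nodes that fed into it,
# which is the "chain of functions" leading back to params.
print(total.grad_fn.next_functions)
```

Walking `grad_fn.next_functions` from `total` leads back through the multiplication to `params`, which is the path `backward()` follows.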

The value of the derivative will be automatically populated as a grad attribute of the params tensor. In general, all PyTorch tensors have an attribute named grad. Normally, it’s None:

```
print(params.grad) #None
```

PyTorch builds the autograd graph from the operations as they are executed. When we call **tensor.backward()**, PyTorch traverses this graph in the reverse direction to compute the gradients.

```
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = torch.tensor(4.0, requires_grad=True)
f = x**2 + y**2 + z**2
f.backward()  # populates x.grad, y.grad, z.grad with df/dx, df/dy, df/dz
print(x.grad, y.grad, z.grad, f)
# tensor(4.) tensor(6.) tensor(8.) tensor(29., grad_fn=<AddBackward0>)
```

The call to **backward()** computes the partial derivative of the output *f* with respect to each of the input variables. In the case of neural networks, we can represent the neural network as *f(x, θ)*, where *f* is the neural network, *x* is some vector representing the input, and *θ* denotes the parameters of *f*.
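To make the *f(x, θ)* view concrete, here is a minimal sketch under stated assumptions: the "network" is a hypothetical one-weight linear function with θ = (w, b), and the input, targets, and mean-squared-error loss are made up for illustration. Only the parameters are created with `requires_grad=True`, so only they receive gradients.

```python
import torch

# Hypothetical toy "network": f(x, theta) = w * x + b, with theta = (w, b).
x = torch.tensor([1.0, 2.0, 3.0])        # input: no gradient requested
target = torch.tensor([2.0, 4.0, 6.0])   # made-up targets for illustration

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

pred = w * x + b
loss = ((pred - target) ** 2).mean()     # mean squared error

# Backpropagate from the scalar loss to every tensor that requires grad.
loss.backward()

print(w.grad)  # d(loss)/dw
print(b.grad)  # d(loss)/db
print(x.grad)  # None: x was not created with requires_grad=True
```

Working it out by hand, the residuals are (-1, -2, -3), so d(loss)/dw = 2 · mean(residual · x) = -28/3 and d(loss)/db = 2 · mean(residual) = -4, which is what autograd stores in `w.grad` and `b.grad`.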

**requires_grad=False**

If you want to freeze part of a pretrained PyTorch VGG16 model and train the rest, you can set `requires_grad` to `False` on the parameters you want to freeze.

```
import torchvision

# Load the pretrained VGG16, then freeze its convolutional feature extractor
model = torchvision.models.vgg16(pretrained=True)
for param in model.features.parameters():
    param.requires_grad = False
```

By switching the `requires_grad` flags to `False`, no intermediate buffers will be saved until the computation reaches a point where one of the inputs of an operation requires the gradient.
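The effect of freezing can be checked on a small model without downloading any weights. The sketch below is an illustration only: the two-layer `nn.Sequential` model is a hypothetical stand-in for a large pretrained network, and the SGD optimizer with its learning rate is an arbitrary choice.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network: layer 0 plays the role
# of the frozen feature extractor, layer 2 the trainable head.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

for param in model[0].parameters():
    param.requires_grad = False

# The output of the frozen layer is not tracked, because none of its
# inputs (data, weight, bias) requires a gradient.
hidden = model[0](torch.randn(3, 4))
print(hidden.requires_grad)

# Tracking starts at the first operation that has a trainable input.
out = model(torch.randn(3, 4))
print(out.requires_grad)

out.sum().backward()
frozen_grads = [p.grad for p in model[0].parameters()]     # stay None
trainable_grads = [p.grad for p in model[2].parameters()]  # populated

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)
```

Filtering the parameters passed to the optimizer is optional, since frozen parameters never receive a gradient, but it keeps the optimizer state limited to what is actually being trained.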

## no_grad()

PyTorch allows us to switch off autograd when we don’t need it, using the **with torch.no_grad()** context manager. It prevents gradients from being tracked in the enclosed code block. It is typically used when evaluating a model, where there is no need to call `backward()` to calculate the gradients and update the corresponding parameters.

```
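import torch

x = torch.tensor([1.], requires_grad=True)

# As a context manager: operations inside the block are not tracked
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# As a decorator: the whole function runs with autograd disabled
@torch.no_grad()
def doubler(x):
    return x * 2

z = doubler(x)
print(z.requires_grad)  # False

# Without no_grad, the result is tracked as usual
def doubler(x):
    return x * 2

z = doubler(x)
print(z.requires_grad)  # True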
```

Using the related `set_grad_enabled` context manager, we can also condition the code to run with autograd enabled or disabled according to a Boolean expression, typically one indicating whether we are running in training or inference mode.
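As a minimal sketch of that pattern, the hypothetical `forward` function below takes an `is_train` flag (both names are assumptions for illustration) and passes it to `torch.set_grad_enabled`:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

def forward(x, is_train):
    # Autograd is enabled only when is_train is True.
    with torch.set_grad_enabled(is_train):
        return x * 2

y_train = forward(x, is_train=True)
y_eval = forward(x, is_train=False)

print(y_train.requires_grad)  # True
print(y_eval.requires_grad)   # False
```

This lets the same code path serve both training and evaluation, rather than duplicating it inside and outside a `no_grad()` block.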

## Count Trainable Parameters

Counting parameters might require us to check whether a parameter has `requires_grad` set to `True` as well. We might want to differentiate the number of trainable parameters from the overall model size. Let’s take a look at what we have right now:

```
# Count only the parameters that will receive gradients
numel_list = [p.numel() for p in model.parameters() if p.requires_grad]
sum(numel_list)
```
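To see the difference between trainable parameters and overall model size, the same counting can be run on a small model with one frozen layer. This is a sketch under assumptions: the two-layer model and the variable names `total` and `trainable` are hypothetical and chosen so the counts are easy to verify by hand.

```python
import torch.nn as nn

# Hypothetical small model: Linear(4, 8) has 4*8 + 8 = 40 parameters,
# Linear(8, 2) has 8*2 + 2 = 18 parameters.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze the first layer.
for param in model[0].parameters():
    param.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total)      # 58
print(trainable)  # 18
```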
