Tensors are the primary data structure PyTorch uses to store and manipulate numerical information. They appear everywhere in PyTorch: they represent the inputs to models, the weights of the layers within the models, and the outputs of models.
PyTorch tensors can remember where they come from, in terms of the operations and parent tensors that originated them, and they can automatically provide the chain of derivatives of such operations with respect to their inputs.
PyTorch tensors can be created with the argument requires_grad, which, when set to True, tells PyTorch to track operations on the tensor and to store the tensor’s gradient in an attribute called grad.
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
The requires_grad=True argument to the tensor constructor tells PyTorch to track the entire family tree of tensors resulting from operations on params. In other words, any tensor that has params as an ancestor will have access to the chain of functions that were called to get from params to that tensor.
The value of the derivative will be automatically populated as the grad attribute of the params tensor. In general, all PyTorch tensors have an attribute named grad. Normally, it’s None until backward() has been called.
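As a minimal sketch of the behavior described above, the snippet below checks grad before and after a backward pass. The loss expression is a made-up example chosen only so that the gradient is easy to verify by hand. It is not taken from the original text.

```python
import torch

params = torch.tensor([1.0, 0.0], requires_grad=True)
print(params.grad)  # None: backward() has not been called yet

# Hypothetical example computation: loss = 3 * params[0] + 3 * params[1].
# A tensor derived from params records the operation that produced it.
loss = (params * 3.0).sum()
print(loss.grad_fn is not None)  # True

# backward() populates params.grad with d(loss)/d(params).
loss.backward()
print(params.grad)  # tensor([3., 3.])
```

Since loss is linear in params with coefficient 3, each partial derivative is 3, which is what ends up in params.grad.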
PyTorch builds the autograd graph as the operations are executed. When we call tensor.backward(), PyTorch traverses this graph in the reverse direction to compute the gradients.
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = torch.tensor(4.0, requires_grad=True)
f = x**2 + y**2 + z**2
f.backward()  # df/dx = 2x, df/dy = 2y, df/dz = 2z
print(x.grad, y.grad, z.grad, f)
# tensor(4.) tensor(6.) tensor(8.) tensor(29., grad_fn=<AddBackward0>)
The call to backward() computes the partial derivative of the output f with respect to each of the input variables. In the case of neural networks, we can represent the network as f(x, θ), where f is the neural network, x is a vector representing the input, and θ denotes the parameters of f.
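To make the f(x, θ) notation concrete, here is a small sketch with a hypothetical one-feature linear model and a sum-of-squares loss. The model, the data, and the names theta, target, and loss are all illustrative assumptions. Only θ is created with requires_grad=True, so only θ receives a gradient.

```python
import torch

# Hypothetical toy model: f(x, theta) = theta[0] * x + theta[1]
theta = torch.tensor([2.0, 1.0], requires_grad=True)  # parameters
x = torch.tensor([1.0, 2.0, 3.0])                     # input, not tracked
target = torch.tensor([2.0, 4.0, 6.0])                # made-up targets

def f(x, theta):
    return theta[0] * x + theta[1]

# Sum of squared errors; each prediction is off by exactly 1 here.
loss = ((f(x, theta) - target) ** 2).sum()
loss.backward()

print(theta.grad)  # tensor([12., 6.])
print(x.grad)      # None, because x does not require a gradient
```

By hand: the residuals are all 1, so the derivative with respect to theta[0] is the sum of 2 * 1 * x = 12, and with respect to theta[1] it is the sum of 2 * 1 = 6.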
If you want to freeze part of a pretrained VGG16 PyTorch model and train the rest, you can set requires_grad of the parameters you want to freeze to False:
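import torchvision

# Load VGG16 with pretrained weights (newer torchvision releases use the weights= argument instead of pretrained=True)
model = torchvision.models.vgg16(pretrained=True)
# Freeze the convolutional feature extractor; the classifier stays trainable
for param in model.features.parameters():
    param.requires_grad = False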
When the requires_grad flags are switched to False, no intermediate buffers are saved for the affected operations. This continues until the computation reaches an operation where at least one of the inputs requires the gradient.
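The effect can be checked on a much smaller model than VGG16. The sketch below uses two hypothetical linear layers, named frozen and head purely for illustration. The output of the frozen layer does not require a gradient, while the output of the trainable layer does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-stage model: a frozen layer followed by a trainable one.
frozen = nn.Linear(4, 3)
for p in frozen.parameters():
    p.requires_grad = False
head = nn.Linear(3, 1)

x = torch.randn(2, 4)
hidden = frozen(x)
print(hidden.requires_grad)  # False: no input to this op requires a gradient

out = head(hidden)
print(out.requires_grad)     # True: head's parameters require a gradient

out.sum().backward()
print(frozen.weight.grad)      # None: frozen parameters receive no gradient
print(head.weight.grad.shape)  # torch.Size([1, 3])
```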
PyTorch allows us to switch off autograd when we don’t need it, using the torch.no_grad() context manager. Inside the with block, operations are not tracked and no gradients are calculated. This is typical when evaluating a model, where we do not need to call backward() to calculate the gradients and update the corresponding parameters. torch.no_grad() can also be applied to a function as a decorator.
x = torch.tensor([1.], requires_grad=True)

# As a context manager
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# As a decorator
@torch.no_grad()
def doubler(x):
    return x * 2

z = doubler(x)
print(z.requires_grad)  # False

# Without no_grad, the result is tracked again
def doubler(x):
    return x * 2

z = doubler(x)
print(z.requires_grad)  # True
Using the related torch.set_grad_enabled context manager, we can also condition the code to run with autograd enabled or disabled according to a Boolean expression, typically one indicating whether we are running in training or inference mode.
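A minimal sketch of this pattern follows. The function name forward and the flag is_train are illustrative choices, not names from any particular codebase.

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

def forward(x, is_train):
    # Autograd is enabled only when is_train is True.
    with torch.set_grad_enabled(is_train):
        return x * 2

train_out = forward(x, is_train=True)
eval_out = forward(x, is_train=False)

print(train_out.requires_grad)  # True
print(eval_out.requires_grad)   # False
```

This keeps a single code path for both modes instead of duplicating the forward pass inside and outside a no_grad block.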
Count Trainable Parameters
Counting parameters might require us to check whether a parameter has requires_grad set to True as well, since we might want to distinguish the number of trainable parameters from the overall model size. Let’s take a look at what we have right now:
numel_list = [p.numel() for p in model.parameters() if p.requires_grad]
sum(numel_list)
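To compare trainable parameters against the overall size without downloading VGG16, the same count can be sketched on a small stand-in model. The nn.Sequential below is a hypothetical example, and its first layer is frozen to mimic the setup above.

```python
import torch.nn as nn

# Hypothetical stand-in model: Linear(10, 5) -> ReLU -> Linear(5, 2)
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Freeze the first layer (10 * 5 weights + 5 biases = 55 parameters).
for p in model[0].parameters():
    p.requires_grad = False

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total, trainable)  # 67 12
```

The second layer contributes 5 * 2 weights plus 2 biases, so 12 of the 67 parameters remain trainable.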