Tensors are the building blocks of PyTorch neural networks. A neural network takes tensors as input and produces tensors as output. In fact, all operations within a neural network are performed on tensors, and all of its parameters (weights and biases) are tensors.

A PyTorch module is a Python class deriving from the nn.Module base class. A module can have one or more Parameter instances as attributes. These are tensors whose values are optimized during the training process (think w and b in our linear model). A module can also have one or more submodules (other nn.Module instances, such as nn.Linear layers or an nn.Sequential container) as attributes, and it will be able to track their parameters as well.

To optimize the model's parameters, the weights are changed in the direction that decreases the error, which is the direction opposite to the gradient of the error with respect to the weights. The procedure is repeated until the error, evaluated on unseen data, falls below an acceptable level.
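As a minimal sketch of this idea, assuming a one-input linear model y = w * x + b and a hypothetical toy dataset generated from y = 2x + 1, the loop below computes the gradient of the mean squared error by hand and moves w and b against it. For simplicity it runs a fixed number of steps instead of checking the error on unseen data.

```python
# Hypothetical toy data generated by y = 2x + 1 (for illustration only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0          # parameters, initialised to zero
learning_rate = 0.02


def mse(w, b):
    """Mean squared error of the linear model on the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


for step in range(2000):
    n = len(xs)
    # Analytic gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Step against the gradient, i.e. in the direction that lowers the error
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3), mse(w, b))
```

After training, w and b end up very close to 2 and 1, the values used to generate the data. PyTorch automates exactly the two gradient lines of this loop, which is what the rest of this post relies on.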

Create a VGG Model

The following model builder can be used to instantiate a VGG16 model with pre-trained weights.

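import torch
import torchvision.models as models

# torchvision 0.13 and later: pass a weights enum (older releases used pretrained=True)
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)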

The weights parameter of the model builder above accepts VGG16_Weights values such as VGG16_Weights.DEFAULT, which is equivalent to VGG16_Weights.IMAGENET1K_V1. You can also use strings, e.g. weights='DEFAULT' or weights='IMAGENET1K_V1'.

A pre-trained network is a model that has already been trained on a dataset. Such networks can typically produce useful results immediately after loading the network parameters.


Now we can use the parameters method to ask any nn.Module for the parameters owned by it or by any of its submodules. Calling model.parameters() returns an iterator over every weight and bias tensor in the model. It’s instructive to inspect the parameters in this case by printing their shapes:

[param.shape for param in model.parameters()]

The parameters() call recurses into the submodules assigned in the module’s __init__ constructor and yields every parameter it encounters as one flat sequence, so the list above is flat rather than nested.
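To see the recursion on something smaller than VGG16, here is a sketch with a hypothetical module, TinyNet, that owns one Parameter directly and holds a nn.Sequential submodule. The layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical module: one parameter of its own plus a submodule."""

    def __init__(self):
        super().__init__()
        # A Parameter assigned as an attribute is registered automatically
        self.scale = nn.Parameter(torch.ones(1))
        # Submodules assigned in __init__ are registered as well
        self.body = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

    def forward(self, x):
        return self.scale * self.body(x)


net = TinyNet()
shapes = [p.shape for p in net.parameters()]
print(shapes)
```

The module's own parameter comes first, followed by the weight and bias of each nn.Linear inside body, all in one flat list. The ReLU contributes nothing because it has no parameters.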

import torch.optim as optim
optimizer = optim.SGD(model.parameters(), lr=0.001)

The optimizer is given the tensors that were created with requires_grad=True. All Parameters are created this way, since they need to be optimized by gradient descent.

These are the tensors the optimizer will update. After we call loss.backward(), the grad attribute of every parameter is populated, and the optimizer then updates the parameter values accordingly during the optimizer.step() call.
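A single training step can be sketched as follows. To keep it small and free of downloads, a hypothetical nn.Linear model and random toy data stand in for VGG16; the zero_grad, backward, step sequence is the same for any model.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy regression data and a small stand-in model
x = torch.randn(16, 3)
y = torch.randn(16, 1)
model = nn.Linear(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

before = model.weight.detach().clone()

optimizer.zero_grad()          # clear gradients left over from a previous step
loss = loss_fn(model(x), y)    # forward pass
loss.backward()                # autograd fills param.grad for every parameter
optimizer.step()               # plain SGD update: param -= lr * param.grad

print(model.weight.grad.shape)
print(torch.allclose(model.weight, before - 0.001 * model.weight.grad))
```

The last line checks that, for plain SGD without momentum or weight decay, step() simply subtracts the learning rate times the stored gradient.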

Named Parameters

A few notes on the parameters of nn.Module instances. When inspecting the parameters of a model made up of several submodules, it is handy to be able to identify them by name. There’s a method for that: named_parameters.
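As a quick sketch, here is named_parameters applied to an nn.Sequential whose layers are not given names. The layer sizes (1 to 8 to 1) are an assumption chosen for illustration.

```python
import torch.nn as nn

# Layers passed positionally, without names
unnamed_model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

for name, param in unnamed_model.named_parameters():
    print(name, tuple(param.shape))
```

Each name is a dotted path: the position of the layer inside the Sequential, followed by the attribute name (weight or bias). The Tanh at position 1 owns no parameters, so no name starts with 1.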

The name of each module in Sequential is just the ordinal with which the module appears in the arguments. Interestingly, Sequential also accepts an OrderedDict in which we can name each module passed to Sequential:

from collections import OrderedDict

import torch

seq_model = torch.nn.Sequential(OrderedDict([
    ('hidden_linear', torch.nn.Linear(1, 8)),
    ('hidden_activation', torch.nn.Tanh()),
    ('output_linear', torch.nn.Linear(8, 1)),
]))

This allows us to get more explanatory names for submodules:

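for name, param in seq_model.named_parameters():
    print(name, param.shape, param.requires_grad)

# hidden_linear.weight torch.Size([8, 1]) True
# hidden_linear.bias torch.Size([8]) True
# output_linear.weight torch.Size([1, 8]) True
# output_linear.bias torch.Size([1]) True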

This is more descriptive, but it does not give us more flexibility in the flow of data through the network, which remains a purely sequential pass-through; nn.Sequential is very aptly named. We can also access a particular Parameter by using submodules as attributes:

print(seq_model.output_linear.bias)

# Parameter containing:
# tensor([-0.1537], requires_grad=True)
# (the value comes from random initialization, so yours will differ)

PyTorch offers a quick way to determine how many parameters a model has through the parameters() method of nn.Module (the same method we use to provide the parameters to the optimizer). To find out how many elements are in each tensor, we can call its numel() method. Summing those gives us the total count. Let’s take a look at what we have right now:

# count only trainable parameters (those with requires_grad=True)
numel_list = [p.numel() for p in model.parameters() if p.requires_grad]

print(sum(numel_list), numel_list)

Depending on our use case, counting parameters might require us to check whether a parameter has requires_grad set to True. We might want to differentiate the number of trainable parameters from the overall model size.
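A sketch of that distinction, using a hypothetical helper count_parameters and a small stand-in model. Freezing the first layer mimics what one might do with the pre-trained feature extractor of a network such as VGG16.

```python
import torch.nn as nn


def count_parameters(module: nn.Module) -> tuple[int, int]:
    """Hypothetical helper: return (trainable, total) element counts."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return trainable, total


model = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

# Freeze the first layer so the optimizer would leave it untouched
for p in model[0].parameters():
    p.requires_grad = False

print(count_parameters(model))
```

Linear(1, 8) holds 8 weights and 8 biases (16 elements) and Linear(8, 1) holds 8 weights and 1 bias (9 elements), so the call reports 9 trainable elements out of 25 in total.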

requires_grad=True

Notice the requires_grad=True flag on the parameters printed above? That flag tells PyTorch to track the entire family tree of tensors resulting from operations on parameters. In other words, any tensor that has parameters as an ancestor will have access to the chain of functions that were called to get from params to that tensor.

If these functions are differentiable (and most PyTorch tensor operations are), then once we call backward() on a tensor computed from params (typically the loss), the derivative of that tensor with respect to params is automatically populated in the grad attribute of the params tensor.
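A minimal sketch of this mechanism, assuming a single tensor params that holds w and b for the linear model y = w * x + b and a hypothetical target value of 10.0:

```python
import torch

params = torch.tensor([2.0, 3.0], requires_grad=True)   # stand-in for w and b
x = torch.tensor(4.0)

y = params[0] * x + params[1]     # y = w * x + b, recorded by autograd
loss = (y - 10.0) ** 2            # squared error against a hypothetical target

print(params.grad)   # None: nothing has been back-propagated yet
loss.backward()
print(params.grad)
```

Here y = 11 and loss = 1, so d(loss)/dw = 2 * (y - 10) * x = 8 and d(loss)/db = 2 * (y - 10) = 2, which is what ends up in params.grad. Gradients accumulate across backward() calls rather than being overwritten, which is why training loops clear them with optimizer.zero_grad() before each step.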

Accessing parameters by name is useful for inspecting parameters or their gradients: for instance, to monitor gradients during training. Say we want to print out the gradients of the weight of the linear portion of the hidden layer (seq_model.hidden_linear.weight). We can run the training loop for the new neural network model and then look at the resulting gradients after the last epoch.
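The following sketch does exactly that for the named seq_model. The toy data (a noisy line), learning rate, and epoch count are assumptions made for illustration, not values from a real experiment.

```python
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: 20 scalar inputs with a noisy linear target
x = torch.linspace(-1.0, 1.0, 20).unsqueeze(1)   # shape [20, 1]
y = 3.0 * x + 0.5 + 0.1 * torch.randn(20, 1)

seq_model = nn.Sequential(OrderedDict([
    ('hidden_linear', nn.Linear(1, 8)),
    ('hidden_activation', nn.Tanh()),
    ('output_linear', nn.Linear(8, 1)),
]))
optimizer = optim.SGD(seq_model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(seq_model(x), y)
    loss.backward()
    optimizer.step()

# The gradients from the last backward() call are still stored in .grad,
# because zero_grad() only clears them at the start of the next iteration
print('final loss:', loss.item())
print('hidden_linear.weight.grad:', seq_model.hidden_linear.weight.grad)
```

Because the submodule is reachable by name, the gradient tensor has the same [8, 1] shape as hidden_linear.weight and can be logged or plotted at any point in the loop.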
