PyTorch has a whole submodule dedicated to neural networks: torch.nn. It contains the building blocks needed to create all neural network architectures. Those building blocks are often referred to as layers in PyTorch.

A PyTorch module is a Python class deriving from the nn.Module base class. A module can have one or more Parameter instances (its weights and biases) as attributes, which are tensors. A module can also have one or more submodules (themselves subclasses of nn.Module) as attributes, and it will track their parameters as well.
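As a minimal sketch of this structure (the class name and sizes here are made up for illustration), a module holding both a Parameter and a submodule might look like this:

import torch
import torch.nn as nn

class TinyModel(nn.Module):                        # hypothetical example module
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1))   # a Parameter attribute
        self.linear = nn.Linear(4, 2)              # a submodule attribute

    def forward(self, x):
        return self.scale * self.linear(x)

# TinyModel tracks its own parameter and the submodule's parameters:
# scale, linear.weight, and linear.bias all show up in parameters().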

Tensors are the building blocks for data in PyTorch. Neural networks take tensors as input and produce tensors as output. In fact, all operations within a neural network and during optimization are operations between tensors, and all parameters (for example, weights and biases) in a neural network are tensors.
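For instance, a computation like w * x + b is just arithmetic between tensors:

import torch

x = torch.ones(3)         # input tensor
w = torch.tensor(2.0)     # a "weight" tensor
b = torch.tensor(0.5)     # a "bias" tensor
y = w * x + b             # an operation between tensors
print(y)                  # tensor([2.5000, 2.5000, 2.5000])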

Parameter is a subclass of Tensor that is to be considered a module parameter. When assigned as a module attribute, it is automatically added to the module’s list of parameters and will appear in the parameters() iterator. Assigning a plain Tensor doesn’t have this effect. This is deliberate: one might want to cache some temporary state in the model, like the last hidden state of an RNN, without registering it as a parameter.
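A short sketch of the difference, with illustrative attribute names:

import torch
import torch.nn as nn

class WithState(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(3))  # registered as a parameter
        self.last_hidden = torch.zeros(3)           # plain Tensor: cached state only

m = WithState()
print([name for name, _ in m.named_parameters()])   # ['weight']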

Create Model

We’ve focused on a very simple regression problem that used a linear model with only one input and one output. torch.nn.Linear accepts three arguments: the number of input features, the number of output features, and whether the linear model includes a bias (defaulting to True). Here we stack a hidden linear layer, a Tanh activation, and an output linear layer inside an nn.Sequential container:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1, 13),    # hidden layer: 1 input feature -> 13 hidden units
    torch.nn.Tanh(),           # activation between the two linear layers
    torch.nn.Linear(13, 1))    # output layer: 13 hidden units -> 1 output feature

model

# Output
Sequential(
  (0): Linear(in_features=1, out_features=13, bias=True)
  (1): Tanh()
  (2): Linear(in_features=13, out_features=1, bias=True)
)

Our earlier linear model was a single instance of nn.Linear with one input and one output feature, which requires only one weight and one bias. As the printout above shows, the numbers specifying the input and output dimensions of each layer directly determine the number of parameters in the model.

View Model Parameters

Any nn.Module subclass can recursively collect and return its own and its children’s parameters. This can be used to count them, feed them into the optimizer, or inspect their values. We can ask how many parameters our model has:

sum(p.numel() for p in model.parameters())  # 40
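The count checks out by hand: the hidden Linear(1, 13) has 13 weights and 13 biases, and the output Linear(13, 1) has 13 weights and 1 bias, so 13 + 13 + 13 + 1 = 40.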

This is useful for inspecting parameters or their gradients: for instance, to monitor gradients during training. Say we want to print out the gradient of the weight of the linear portion of the hidden layer. We can run the training loop for the new neural network model and then look at the resulting gradient after the last epoch.
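Here is a minimal sketch of such a training loop; the toy data, the MSE loss, the SGD optimizer, the learning rate, and the epoch count are all placeholder choices, not prescribed by the model above:

x = torch.randn(20, 1)                    # toy inputs (placeholder data)
y = 3 * x + 0.5                           # toy targets (placeholder data)
loss_fn = torch.nn.MSELoss()              # placeholder loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)  # placeholder optimizer

for epoch in range(100):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = loss_fn(model(x), y)           # forward pass and loss computation
    loss.backward()                       # populates param.grad for every parameter
    optimizer.step()                      # update the parameters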

print(model[0].weight.grad)   # gradient of the hidden layer’s weight, populated by backward()

When inspecting the parameters of a model made up of several submodules, it is handy to be able to identify parameters by name. There’s a method for that, called named_parameters:

for name, param in model.named_parameters():
    print(name, param.shape)

# Output
0.weight torch.Size([13, 1])
0.bias torch.Size([13])
2.weight torch.Size([1, 13])
2.bias torch.Size([1])

Calling model.parameters() will collect the weights and biases from all submodules.
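We can confirm that all four parameter tensors (two weights and two biases) are collected:

print(len(list(model.parameters())))    # 4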

Model parameters can be saved to disk and loaded back with one line of code each. We can get at our model’s parameters using the model.state_dict() method.
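A minimal sketch of the save side; the filename model_params.pt is just a placeholder:

torch.save(model.state_dict(), 'model_params.pt')   # one line to save the parameters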

PyTorch allows us to load those parameters into any model that expects parameters of the same shapes, even if its class doesn’t match the model the parameters were saved under. Saving only the parameters lets us reuse and remix our models in more ways than saving the entire model would.
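For instance, a freshly constructed model with the same layer shapes can load the saved parameters back (continuing the placeholder filename from above):

new_model = torch.nn.Sequential(
    torch.nn.Linear(1, 13),
    torch.nn.Tanh(),
    torch.nn.Linear(13, 1))
new_model.load_state_dict(torch.load('model_params.pt'))   # works because the shapes match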
