In a deep neural net, one forward pass simply performs consecutive matrix multiplications at each layer, between that layer’s inputs and its weight matrix. The product of this multiplication at one layer becomes the input of the subsequent layer, and so on.

The first thing to consider when building a neural network is the initialization of its parameters. Done well, it lets optimization converge quickly; done badly, gradient descent may converge very slowly or fail to reach a minimum at all.

The aim of weight initialization is to prevent layer activation outputs from exploding or vanishing during the forward pass through a deep neural network. If either occurs, the loss gradients will be too large or too small to flow backward usefully, and the network will take longer to converge, if it converges at all.
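To make the exploding and vanishing behaviour concrete, here is a minimal sketch. It assumes a purely linear stack of 100 layers of width 512 with no activation functions; the helper name forward_linear and the three scale values are made up for illustration.

```python
import math
import torch

torch.manual_seed(0)

def forward_linear(scale, n_layers=100, width=512):
    """Push a random vector through n_layers weight matrices drawn from N(0, scale^2)."""
    x = torch.randn(width)
    for _ in range(n_layers):
        w = torch.randn(width, width) * scale
        x = w @ x  # one layer = one matrix multiplication
    return x

exploded = forward_linear(scale=1.0)                 # weights too large
vanished = forward_linear(scale=0.01)                # weights too small
stable = forward_linear(scale=1 / math.sqrt(512))    # scaled by 1/sqrt(fan_in)

print(torch.isfinite(exploded).all())  # activations overflow to inf/nan
print(vanished.abs().max())            # activations shrink toward zero
print(stable.std())                    # activations stay at a usable scale
```

Scaling the weights by 1/sqrt(fan_in) keeps the variance of each layer's output roughly equal to the variance of its input, which is the idea behind the Xavier and Kaiming schemes.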

We assume that the reader is already familiar with the concepts of neural networks, weights, biases, activation functions, etc.

Default Initialization

This is a quick tutorial on how to initialize weights and biases for neural networks in PyTorch. PyTorch has built-in weight initialization that works quite well, so you usually don’t have to worry about it. You can check the default initialization of the Conv layer and Linear layer.
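One quick way to look at the defaults is to create a layer and inspect its parameters. The sketch below assumes the usual nn.Linear behaviour of drawing weights and biases uniformly within ±1/sqrt(in_features); check the bound against your installed PyTorch version.

```python
import math
import torch
import torch.nn as nn

linear = nn.Linear(in_features=100, out_features=10)

# Assumed default bound for nn.Linear: 1 / sqrt(fan_in) = 0.1 here
bound = 1 / math.sqrt(linear.in_features)

print(linear.weight.min().item(), linear.weight.max().item())
print(linear.bias.min().item(), linear.bias.max().item())
print("expected bound:", bound)
```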

There are a bunch of different initialization techniques, like uniform, normal, constant, Kaiming, and Xavier. You can read more about them in the torch.nn.init documentation. If you want to know how to change the defaults, that’s what we’re going to learn in this tutorial.
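As a rough sketch of what these techniques look like in code, each one is an in-place function in torch.nn.init that fills a tensor and returns it. The tensor shape and the argument values below are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

def fresh():
    # Hypothetical helper: an uninitialized 3x5 weight tensor (fan_out=3, fan_in=5)
    return torch.empty(3, 5)

u = nn.init.uniform_(fresh(), a=-0.1, b=0.1)                # U(-0.1, 0.1)
n = nn.init.normal_(fresh(), mean=0.0, std=0.02)            # N(0, 0.02^2)
c = nn.init.constant_(fresh(), 0.5)                         # every entry 0.5
k = nn.init.kaiming_uniform_(fresh(), nonlinearity='relu')  # Kaiming (He) uniform
x = nn.init.xavier_uniform_(fresh())                        # Xavier (Glorot) uniform

for name, t in [("uniform", u), ("normal", n), ("constant", c),
                ("kaiming", k), ("xavier", x)]:
    print(name, t.min().item(), t.max().item())
```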

Let’s get started on how to initialize our network. The layers are initialized after creation. We have a very simple CNN example, really nothing special here: just Conv layers, a pooling layer, Linear layers, and BatchNorm.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # kernel_size=5 is inferred: with 28x28 inputs it yields 20 * 4 * 4 = 320 features
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_bn = nn.BatchNorm2d(20)
        self.dense1 = nn.Linear(in_features=320, out_features=50)
        self.dense1_bn = nn.BatchNorm1d(50)
        self.dense2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_bn(self.conv2(x)), 2))
        x = x.view(-1, 320)  # flatten to (batch, 320)
        x = F.relu(self.dense1_bn(self.dense1(x)))
        x = F.relu(self.dense2(x))
        return F.log_softmax(x, dim=1)  # normalize over the class dimension

We’re gonna define another function to initialize the weights, which takes a layer (module) as its argument.

def initialize_weights(m):
  if isinstance(m, nn.Conv2d):
      nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')

We check whether m is an instance of a convolution layer. If it is, we can initialize it with a variety of different techniques; here we just apply kaiming_uniform_ to the weight of that specific module, and only if it’s a Conv2d.

You can also have a bias in the convolution. The bias argument defaults to True, so a bias is created by default, but we can check that the bias is not None before initializing it.

if m.bias is not None:
    nn.init.constant_(m.bias, 0)

We also have BatchNorm layers, and you can initialize them too. First check the layer type, then set the weight to 1 and the bias to 0, which is just the standard initialization for BatchNorm.

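def initialize_weights(m):
  if isinstance(m, nn.Conv2d):
      # Kaiming (He) uniform init for the conv weights, zero for the bias
      nn.init.kaiming_uniform_(m.weight, nonlinearity='relu')
      if m.bias is not None:
          nn.init.constant_(m.bias, 0)
  elif isinstance(m, nn.BatchNorm2d):
      nn.init.constant_(m.weight, 1)  # scale (gamma)
      nn.init.constant_(m.bias, 0)    # shift (beta)
  elif isinstance(m, nn.Linear):
      # assumed to target the bias; the weights keep their default init
      nn.init.constant_(m.bias, 0)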

That’s just an example of how to initialize the weights. Then you call the apply function on the model to initialize the weights after you’ve defined all of your layers.


Pass an initialization function to torch.nn.Module.apply. The apply function searches recursively for all the modules inside your network and calls the function on each of them, so every layer in your model is initialized with this one call.
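Here is a minimal sketch of that call. The small Sequential model and the init_fn function are stand-ins invented for this example, so that it runs on its own; with the CNN above you would pass your own initialization function in the same way.

```python
import torch
import torch.nn as nn

def init_fn(m):
    # Hypothetical initializer: Xavier-uniform weights and zero biases for Linear layers
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

# Small stand-in model with a nested block, to show the recursion
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Sequential(nn.Linear(16, 4)),
)

# apply() calls init_fn on every submodule and finally on the model itself
model.apply(init_fn)

for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```

Layers that the function does not recognize (here the ReLU and the Sequential containers) are simply left untouched.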

Single-layer initialization

To initialize the weights of a single layer, use a function from torch.nn.init. For instance:

conv1 = nn.Conv2d(4, 4, kernel_size=5)
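A minimal sketch of such a call, assuming Xavier uniform as the chosen scheme (any other function in torch.nn.init is used the same way):

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(4, 4, kernel_size=5)

# The trailing underscore means the function modifies the tensor in place
nn.init.xavier_uniform_(conv1.weight)
nn.init.zeros_(conv1.bias)

print(conv1.weight.shape)  # weight tensor of the layer, now re-initialized
```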

Alternatively, you can modify the parameters by writing to conv1.weight.data, which is a torch.Tensor. Example:
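As a small sketch of writing to the parameter tensors directly, the snippet below fills the weights and biases of a conv layer with a constant. The value 0.01 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(4, 4, kernel_size=5)

# .data is a plain torch.Tensor, so in-place tensor methods work on it
conv1.weight.data.fill_(0.01)
conv1.bias.data.fill_(0.01)

print(conv1.weight.data.unique())
```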

You can also set the weights manually. Let’s say you have an input of all ones:

input = torch.ones((4,4))

And you want to make a dense layer with no bias, so the effect is easy to see, and set all of its weights to 0.2 (or anything else):
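One way to sketch this, assuming a layer with 3 output features (the output size is an arbitrary pick, and the variable names are made up for the example):

```python
import torch
import torch.nn as nn

inputs = torch.ones((4, 4))  # all-ones input: 4 samples, 4 features each

# Dense layer without bias; out_features=3 is an arbitrary choice
dense = nn.Linear(in_features=4, out_features=3, bias=False)

# Overwrite the weights in place without tracking gradients
with torch.no_grad():
    dense.weight.fill_(0.2)

output = dense(inputs)
print(output)
```

Each output entry is the sum of four inputs of 1 multiplied by a weight of 0.2, so every value comes out at about 0.8, which makes it easy to confirm that the manual weights are really in use.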

