In this tutorial, we define what a learnable parameter is and show how to calculate the number of these parameters in each layer of a simple convolutional neural network.
What Are Learnable Parameters?
During training, a gradient-based optimizer such as stochastic gradient descent (SGD) learns and optimizes the weights and biases of a neural network. These weights and biases are learnable parameters. In fact, any parameter in the model that is learned during training is considered a learnable parameter. These parameters are also called trainable parameters, since they are optimized during training. (The model below uses RMSprop, another gradient-based optimizer, but the parameters it updates are the same weights and biases.)
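To make the idea concrete, here is a minimal pure-Python sketch (not TensorFlow) of SGD on a toy one-input linear model, y_hat = w * x + b. The function name train_linear_sgd, the learning rate, the number of epochs, and the training data (generated from y = 2x + 1) are hypothetical choices for illustration. The only values the loop updates are the weight w and the bias b, so those two numbers are this toy model's learnable parameters.

```python
def train_linear_sgd(data, lr=0.05, epochs=200):
    """Fit y_hat = w * x + b with plain stochastic gradient descent.

    w (weight) and b (bias) are the model's only learnable parameters.
    """
    w, b = 0.0, 0.0  # initial values of the learnable parameters
    for _ in range(epochs):
        for x, y in data:  # update after every single sample: "stochastic"
            error = (w * x + b) - y
            # Gradients of the per-sample loss 0.5 * error**2
            grad_w = error * x
            grad_b = error
            # Gradient descent step on each learnable parameter
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1
data = [(x, 2 * x + 1) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = train_linear_sgd(data)
print(f"w = {w:.3f}, b = {b:.3f}")
```

After training, w and b should end up very close to 2 and 1, the values used to generate the data. The learning rate and epoch count are hyperparameters: they are chosen by hand and are not learned.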
Here we create a simple CNN model for image classification with an input layer, four convolutional layers, two pooling layers, a hidden dense layer, and a dense output layer.
import tensorflow as tf

IMG_SIZE = 32  # 32x32 RGB input images

def create_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(kernel_size=(3, 3), filters=32, padding='same', activation='relu',
                               input_shape=[IMG_SIZE, IMG_SIZE, 3], name="Conv1"),
        tf.keras.layers.Conv2D(kernel_size=(3, 3), filters=64, padding='same', activation='relu',
                               name="Conv2", use_bias=True),
        tf.keras.layers.MaxPooling2D(pool_size=2, name="Max1"),
        tf.keras.layers.Conv2D(kernel_size=(3, 3), filters=128, padding='same', activation='relu',
                               name="Conv3"),
        tf.keras.layers.Conv2D(kernel_size=(1, 1), filters=256, padding='same', activation='relu',
                               name="Conv4"),
        tf.keras.layers.GlobalAveragePooling2D(name="GAP1"),
        tf.keras.layers.Dense(10, activation='relu', name="Dense1"),
        tf.keras.layers.Dense(10, activation='softmax', name="Output"),
    ])
    model.compile(optimizer=tf.keras.optimizers.RMSprop(),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model

model = create_model()
model.summary()  # prints each layer's "Param #" and the total parameter count
Our input layer consists of the input data: images of size 32×32×3 (IMG_SIZE = 32), where 32×32 is the width and height of each image and 3 is the number of channels. The three channels indicate that the images are in the RGB color space, and these three channels are the input features passed to the first convolutional layer.
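Our first convolutional layer (Conv1) has 32 filters of size 3×3, the second (Conv2) has 64 filters of size 3×3, the third (Conv3) has 128 filters of size 3×3, and the fourth (Conv4) has 256 filters of size 1×1. A max-pooling layer (Max1) sits between Conv2 and Conv3, and a global average pooling layer (GAP1) follows Conv4. Finally, the hidden dense layer (Dense1) has 10 nodes, and the output layer is a dense layer with 10 nodes.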
For each layer, we first need to know whether it contains biases. If it does, we add the number of biases, which equals the number of nodes (or filters) in the layer. Here we assume every layer in the network has biases. Keras Conv2D and Dense layers use biases by default (use_bias=True), so all of our convolutional and dense layers, including the output layer, have bias terms.
Parameters in the Input Layer
The input layer has no learnable parameters, because it consists only of the input data; its output is simply passed on as the input to the next layer.
Convolutional Layer Parameters
A convolutional layer has filters, also known as kernels. To count its parameters, we need to know how many filters the layer has and how large each filter is; both numbers enter the calculation.
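The input to a convolutional layer depends on the type of the previous layer. If the previous layer is a dense layer, the input is the number of nodes in that dense layer. If the previous layer is the input layer, the input is the number of image channels (3 for RGB images).
If the previous layer is a convolutional layer, the input is the number of filters in that convolutional layer. A pooling layer in between, such as Max1 before Conv3, leaves the channel count unchanged, so the input is still the number of filters in the last convolutional layer.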
For this calculation, we treat the output of a convolutional layer as the number of filters times the size of each filter (height × width). For a dense layer, the output is simply the number of nodes.
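The rule above can be written as a small helper. This is a plain-Python sketch, not a Keras API; the name conv2d_param_count and its arguments are hypothetical. It follows the inputs * outputs + biases framing used in this tutorial. Because each filter spans all input channels, the product equals kernel height × kernel width × input channels × filters, which is the shape of a Keras Conv2D kernel. Padding and strides do not change the count.

```python
def conv2d_param_count(in_channels, filters, kernel_size, use_bias=True):
    """Learnable parameters of a 2-D convolutional layer,
    computed as inputs * outputs + biases."""
    kh, kw = kernel_size
    inputs = in_channels                   # channels (or filters) from the previous layer
    outputs = filters * kh * kw            # number of filters times the filter size
    weights = inputs * outputs
    biases = filters if use_bias else 0    # one bias per filter
    return weights + biases


# Conv1: 3 input channels (RGB), 32 filters of size 3x3
print(conv2d_param_count(3, 32, (3, 3)))   # 896
# Conv2: 32 inputs (Conv1's filters), 64 filters of size 3x3
print(conv2d_param_count(32, 64, (3, 3)))  # 18496
```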
Let’s calculate the number of learnable parameters in the convolutional layers.
Convolutional Layer 1
tf.keras.layers.Conv2D(kernel_size=(3,3), filters=32, padding='same', activation='relu', input_shape=[IMG_SIZE,IMG_SIZE, 3],name="Conv1")
We have 3 inputs coming from the input layer. The number of outputs is the number of filters times the filter size: we have 32 filters, each of size 3×3, so 32*3*3 = 288. Multiplying our 3 inputs by these 288 outputs gives 864 weights. How many biases? Just 32, since the number of biases equals the number of filters. That gives 896 learnable parameters in this layer.
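Another way to see the same number is to count per filter: each of Conv1's 32 filters covers a 3×3 window across all 3 input channels and has one bias.

```latex
\begin{aligned}
\text{per filter: } & 3 \times 3 \times 3 + 1 = 28 \\
\text{Conv1 total: } & 28 \times 32 = 896 = 3 \times (32 \times 3 \times 3) + 32
\end{aligned}
```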
Convolutional Layer 2
tf.keras.layers.Conv2D(kernel_size=(3,3), filters=64, padding='same', activation='relu',name="Conv2",use_bias=True)
Now let’s move to the next convolutional layer. How many inputs come from the previous layer? 32, the number of filters in Conv1. How many outputs? We have 64 filters, again of size 3×3, so that’s 64*3*3 = 576 outputs. Multiplying the 32 inputs by the 576 outputs gives 18432 weights. Adding the 64 bias terms, one per filter, gives 18496 learnable parameters in this layer.
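We then repeat the same calculation for the remaining layers in the network. The pooling layers (Max1 and GAP1) have no weights or biases, so they contribute no learnable parameters.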
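Applying the same inputs * outputs + biases rule to the two remaining convolutional layers gives the following. Conv3 receives 64 inputs because max pooling (Max1) shrinks the feature maps spatially but keeps Conv2's 64 channels. Conv4 receives Conv3's 128 filters as inputs and uses 1×1 filters.

```latex
\begin{aligned}
\text{Conv3: } & 64 \times (128 \times 3 \times 3) + 128 = 73728 + 128 = 73856 \\
\text{Conv4: } & 128 \times (256 \times 1 \times 1) + 256 = 32768 + 256 = 33024
\end{aligned}
```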
For a dense layer, the number of learnable parameters is:
inputs * outputs + biases
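As a sketch, the dense-layer formula applied to this model's two dense layers looks like this in plain Python; the helper name dense_param_count is hypothetical. Dense1 receives 256 inputs because GAP1 averages each of Conv4's 256 feature maps down to a single number.

```python
def dense_param_count(inputs, units, use_bias=True):
    """Learnable parameters of a dense layer: inputs * outputs + biases,
    with one bias per node."""
    return inputs * units + (units if use_bias else 0)


# Dense1: 256 inputs from GAP1, 10 nodes
print(dense_param_count(256, 10))  # 2570
# Output: 10 inputs from Dense1, 10 nodes
print(dense_param_count(10, 10))   # 110
```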
Overall, convolutional and dense layers follow the same general formula: the number of learnable parameters in a layer is the number of inputs times the number of outputs, plus the number of biases.
This will give us the number of learnable parameters within a given layer. You can sum all the results together to get the total number of learnable parameters within the entire network.
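Summing the per-layer results for the model built by create_model() can be sketched in plain Python as follows. The helper names are hypothetical, and the layer sizes are taken from the model definition above with IMG_SIZE = 32 and 3 input channels. The total should match the total parameter count that model.summary() reports for this model.

```python
def conv2d_params(in_channels, filters, kernel_size, use_bias=True):
    # inputs * (filters * kernel height * kernel width) + one bias per filter
    kh, kw = kernel_size
    return in_channels * filters * kh * kw + (filters if use_bias else 0)


def dense_params(inputs, units, use_bias=True):
    # inputs * outputs + one bias per node
    return inputs * units + (units if use_bias else 0)


# Layer-by-layer learnable parameters, in model order
layers = [
    ("Conv1", conv2d_params(3, 32, (3, 3))),
    ("Conv2", conv2d_params(32, 64, (3, 3))),
    ("Max1", 0),    # pooling has no weights or biases
    ("Conv3", conv2d_params(64, 128, (3, 3))),
    ("Conv4", conv2d_params(128, 256, (1, 1))),
    ("GAP1", 0),    # global average pooling has no weights or biases
    ("Dense1", dense_params(256, 10)),
    ("Output", dense_params(10, 10)),
]

for name, count in layers:
    print(f"{name:<8}{count:>8}")

total = sum(count for _, count in layers)
print(f"{'Total':<8}{total:>8}")  # Total 128952
```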