A ConvNet is a bit of a black box. An input image of raw pixels passes through many convolution and pooling layers, and we end up with a set of class scores, bounding boxes, labeled pixels, or something similar. But what are all the layers in the middle doing? What kinds of things in the input image are they looking for?







Here we create a simple ConvNet for MNIST digit classification. You can assign a name to each layer using the layer's name attribute.
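The post doesn't show the exact architecture, so here is a minimal sketch consistent with the shapes mentioned later (26x26x32 after the first layer, 24x24x64 after the second); the layer names conv1, conv2, etc. are my own choice:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Simple MNIST ConvNet; each layer gets an explicit name
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", name="conv1"),
    layers.Conv2D(64, (3, 3), activation="relu", name="conv2"),
    layers.MaxPooling2D((2, 2), name="pool1"),
    layers.Flatten(name="flatten"),
    layers.Dense(10, activation="softmax", name="output"),
])
model.summary()
```

With 3×3 "valid" convolutions and no padding, the spatial size shrinks by 2 at each conv layer, which matches the 26×26 and 24×24 feature maps discussed below.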

Model Summary

The model summary gives the output shape of each layer, e.g. the shape of each convolutional layer's output.

ConvNet Summary

In our convolutional network, the input image is 28x28x1, and it goes through several stages of convolution. After each convolutional layer we get a three-dimensional chunk of numbers: the outputs of that layer of the network.

We call the entire three-dimensional chunk of numbers output by a convolutional layer an activation volume, and each individual slice of it an activation map.
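The shape arithmetic behind these volumes can be checked by hand with the standard convolution output-size formula (a small helper I'm adding for illustration):

```python
def conv_output_side(input_side, kernel, stride=1, padding=0):
    # Standard formula: (W + 2P - K) / S + 1
    return (input_side + 2 * padding - kernel) // stride + 1

# 3x3 "valid" convolutions shrink each spatial side by 2
print(conv_output_side(28, 3))  # first conv layer: 26
print(conv_output_side(26, 3))  # second conv layer: 24
```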

Change Names of Layers

We can change the name of a layer. Changing a layer's name attribute should not affect the accuracy of the model; names are simply descriptors. To get the name of a layer in a model, you can use its layer index.
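A quick sketch of both operations, assuming a model like the one above (note that `name` is read-only in tf.keras, so tutorials commonly overwrite the private `_name` attribute; this is a workaround, not an official API):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", name="conv1"),
    layers.Conv2D(64, (3, 3), activation="relu", name="conv2"),
])

# Look up layer names by index
names_before = [layer.name for layer in model.layers]
print(names_before)

# Rename via the private attribute, since `layer.name` has no setter
model.layers[0]._name = "first_conv"
print(model.layers[0].name)
```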


Create New Model

The new model has the same input layer as the original model, but its output is the output of a given convolutional layer, which we know is that layer's activation volume, or feature map.

def visualize_conv_layer(layer_name):
  # Assumes `model`, `x_train`, `keras`, and `matplotlib.pyplot as plt`
  # are already defined earlier in the post
  intermediate_model = keras.Model(inputs=model.input,
                                   outputs=model.get_layer(layer_name).output)
  intermediate_prediction = intermediate_model.predict(
      x_train[2].reshape(1, 28, 28, 1))

  row_size, col_size = 4, 8  # a 4x8 grid shows 32 activation maps
  img_index = 0
  fig, ax = plt.subplots(row_size, col_size, figsize=(10, 8))
  for row in range(0, row_size):
    for col in range(0, col_size):
      ax[row][col].imshow(intermediate_prediction[0, :, :, img_index], cmap='gray')
      img_index += 1
  plt.show()


The human visual system is known to detect edges in its very early layers. It turns out that convolutional networks tend to do something similar in their first convolutional layers as well.


Our model has 32 filters in the first layer. We can get some sense of what this layer is looking for by simply visualizing its output. Each filter produces a single-channel 26×26 activation map, and because there are 32 of these filters, we can visualize the layer as 32 little 26×26 images.

ConvNet input layer output

We can actually go and visualize each of those 26×26 slices of the feature map as a grayscale image, and this gives us some sense of what types of things in the input the features in that convolutional layer are looking for.


Layer 2 gives us a 24x24x64 tensor, but we can think of that as 64 different 24×24 images.
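To make this "stack of images" view concrete, here is a small sketch using random data as a stand-in for the real layer-2 activations:

```python
import numpy as np

# Stand-in for the (1, 24, 24, 64) activation volume produced by the
# second convolutional layer (random values instead of real activations)
activations = np.random.rand(1, 24, 24, 64).astype("float32")

# Each of the 64 channels is a separate 24x24 grayscale "image"
channel_images = [activations[0, :, :, c] for c in range(activations.shape[-1])]
print(len(channel_images), channel_images[0].shape)  # 64 (24, 24)
```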

Output of Convolutional Layers

The second convolutional layer receives the 32-channel input and applies 64 convolutional filters, each of size 3×3×32. The problem is that you can't really visualize these filters directly as images, since each one spans 32 channels.

Most of these intermediate feature maps are kind of noisy. But there is one highlighted intermediate feature that seems to activate on the portions of the feature map corresponding to the digit. That suggests this particular slice of the feature map, in this layer of this particular network, may be looking for digits or something like that.

Run this code in Google Colab