A convolutional neural network (CNN) is something of a black box. An image of raw pixels goes in, passes through many convolution and pooling layers, and we end up with a set of class scores, bounding boxes, labeled pixels, or something like that. But what are all the layers in the middle doing? What kinds of things in the input image are they looking for?







Here we create a simple ConvNet for MNIST digit classification. You can give each layer a name using the layer's name argument.
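A minimal sketch of such a model, assuming TensorFlow's bundled Keras and the functional API. The two convolutional layers match the shapes discussed in this post (32 and then 64 filters of size 3×3). The layer names and everything after the second convolution are illustrative assumptions, not necessarily the original architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_convnet():
    """Small MNIST classifier with explicitly named layers (names are hypothetical)."""
    inputs = tf.keras.Input(shape=(28, 28, 1), name="digit_image")
    x = layers.Conv2D(32, (3, 3), activation="relu", name="conv_1")(inputs)
    x = layers.Conv2D(64, (3, 3), activation="relu", name="conv_2")(x)
    # Assumed classifier head; the post only discusses the two conv layers.
    x = layers.MaxPooling2D((2, 2), name="pool_1")(x)
    x = layers.Flatten(name="flatten")(x)
    x = layers.Dense(128, activation="relu", name="dense_1")(x)
    outputs = layers.Dense(10, activation="softmax", name="class_scores")(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="mnist_convnet")

model = build_convnet()

# Each layer carries the name it was given at construction time.
layer_names = [layer.name for layer in model.layers]
print(layer_names)
```

Calling model.summary() on a model like this prints one row per layer, using these names, together with each layer's output shape.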

Model Summary

The model summary lists the output shape of each layer, for example the shape of the activations produced by each convolutional layer.

Keras Model ConvNet Summary

Our input image is 28x28x1, and it goes through several stages of convolution. The output of each convolutional layer is a three-dimensional chunk of numbers.
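The shapes quoted in this post can be checked with a little arithmetic. The sketch below assumes 3×3 kernels, stride 1, and no padding, which is consistent with a 28x28 input shrinking to 26x26 and then 24x24. The helper name conv2d_output_shape is made up for illustration.

```python
def conv2d_output_shape(input_shape, filters, kernel_size, stride=1, padding=0):
    """Output (height, width, channels) of a 2D convolution on a channels-last input."""
    height, width, _ = input_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # The number of output channels equals the number of filters.
    return (out_h, out_w, filters)

shape = (28, 28, 1)
shape_after_conv1 = conv2d_output_shape(shape, filters=32, kernel_size=3)
shape_after_conv2 = conv2d_output_shape(shape_after_conv1, filters=64, kernel_size=3)
print(shape_after_conv1)  # (26, 26, 32)
print(shape_after_conv2)  # (24, 24, 64)
```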

The entire three-dimensional chunk of numbers output by a convolutional layer is called an activation volume, and each two-dimensional slice of it (one per filter) is an activation map.
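To make the distinction concrete, here is a toy sketch in plain Python that builds a small channels-last volume as nested lists and pulls out one activation map per channel. The helper names and the tiny 4x4x3 size are illustrative only. With a NumPy array the same slice would be written volume[:, :, channel].

```python
def make_volume(height, width, channels):
    """Toy channels-last activation volume: volume[row][col][channel]."""
    return [[[100 * c + 10 * r + col for c in range(channels)]
             for col in range(width)]
            for r in range(height)]

def activation_map(volume, channel):
    """Extract the 2D slice belonging to one channel (one filter's output)."""
    return [[pixel[channel] for pixel in row] for row in volume]

volume = make_volume(height=4, width=4, channels=3)
maps = [activation_map(volume, c) for c in range(3)]

# 3 activation maps, each 4 rows by 4 columns.
print(len(maps), len(maps[0]), len(maps[0][0]))  # 3 4 4
```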

Change Names of Layers

We can change the name of a layer. Changing a layer's name does not affect the accuracy of the model; names are simply descriptors. To get the name of a particular layer, index into the model's layers list and read that layer's name attribute.


Create New Model

The new model has the same input layer as the original model, but its output is the output of a given convolutional layer, that is, the layer's activations or feature maps.
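A minimal sketch of this idea, assuming TensorFlow's bundled Keras. The small untrained model, the layer names conv_1 and conv_2, the helper activation_model, and the random input image are all stand-ins for illustration; in practice you would use your trained model and a real MNIST digit.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in for the trained classifier (untrained here, layer names are hypothetical).
inputs = tf.keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(32, (3, 3), activation="relu", name="conv_1")(inputs)
x = layers.Conv2D(64, (3, 3), activation="relu", name="conv_2")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

def activation_model(model, layer_name):
    """Model with the same input as `model` whose output is the named layer's activations."""
    return tf.keras.Model(inputs=model.input,
                          outputs=model.get_layer(layer_name).output)

# A random image stands in for an MNIST digit.
image = np.random.rand(1, 28, 28, 1).astype("float32")
activations = activation_model(model, "conv_1").predict(image, verbose=0)
print(activations.shape)  # (1, 26, 26, 32)
```

The new model shares its layers and weights with the original, so nothing is retrained; it simply stops the forward pass at the chosen layer.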

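def visualize_conv_layer(layer_name):
  # Assumes `model`, `x_train`, `tf` and `plt` (matplotlib.pyplot) are already defined.
  layer_output = model.get_layer(layer_name).output
  intermediate_model = tf.keras.models.Model(inputs=model.input, outputs=layer_output)
  # Activations for a single sample image (which sample to use is an assumption).
  intermediate_prediction = intermediate_model.predict(x_train[0].reshape(1, 28, 28, 1))
  # Grid sized for 32 feature maps; enlarge it for layers with more filters.
  row_size, col_size = 4, 8
  img_index = 0
  fig, ax = plt.subplots(row_size, col_size, figsize=(10, 8))
  for row in range(0, row_size):
    for col in range(0, col_size):
      ax[row][col].imshow(intermediate_prediction[0, :, :, img_index], cmap='gray')
      img_index += 1  # move on to the next feature map
  plt.show()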


The human visual system is known to detect edges in its very early layers. It turns out that convolutional networks tend to do something similar in their first convolutional layers as well.
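To see what "detecting an edge" means for a convolutional filter, here is a toy sketch in plain Python. The kernel is a hand-written, Sobel-like vertical-edge filter chosen for illustration; a trained network learns its own weights, which may only loosely resemble it.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    This is cross-correlation (no kernel flip), which is what deep learning
    libraries usually call convolution.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

# 5x5 toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (an assumption, not a learned filter).
vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

response = conv2d_valid(image, vertical_edge)
for row in response:
    print(row)  # each row prints [4, 4, 0]
```

The response is large where the 3x3 window straddles the dark-to-bright boundary and zero over the flat bright region, which is the kind of pattern an edge-like activation map shows.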


Our model has 32 filters in the first layer. We can get a sense of what these filters are looking for by simply visualizing the layer's output. Each filter produces a 26x26 activation map, which we can show as a little single-channel image. Because there are 32 filters, we visualize 32 little 26×26 images.

ConvNet input layer output

We can visualize each 26x26 slice of the feature map as a grayscale image, which gives us some sense of what kinds of things in the input the features in that convolutional layer are looking for.
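When imshow is called with cmap='gray' and no explicit value range, Matplotlib linearly maps the smallest value in the array to black and the largest to white. The sketch below mimics that scaling in plain Python; the helper name to_grayscale and the toy values are illustrative.

```python
def to_grayscale(activation_map):
    """Min-max scale a 2D activation map to integer gray levels in [0, 255]."""
    flat = [v for row in activation_map for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no contrast; render it as all black.
        return [[0 for _ in row] for row in activation_map]
    return [[round(255 * (v - lo) / (hi - lo)) for v in row]
            for row in activation_map]

toy_map = [[0.0, 1.0],
           [2.0, 5.0]]
print(to_grayscale(toy_map))  # [[0, 51], [102, 255]]
```

Because each map is scaled independently, brightness is not comparable across maps: a bright pixel in one map and a bright pixel in another may correspond to very different activation values.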


Layer 2 gives us a 24x24x64 tensor, which we can think of as 64 different 24×24 images.

Output of Convolutional Layers

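The second convolutional layer receives the 32-channel output of the first layer. It performs 3×3 convolutions with 64 filters, each of shape 3×3×32. Because each filter has 32 channels, you can't really visualize the filter weights directly as images, which is why we visualize the layer's activations instead.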
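The weight shapes make the problem concrete. The sketch below assumes 3×3 kernels, 32 filters in the first layer, and 64 in the second (consistent with the 26x26x32 and 24x24x64 outputs), and uses the Keras weight layout of (height, width, input channels, filters). The helper names are made up for illustration.

```python
def conv2d_kernel_shape(kernel_size, in_channels, filters):
    """Keras-style Conv2D weight shape: (height, width, input channels, filters)."""
    return (kernel_size, kernel_size, in_channels, filters)

def conv2d_param_count(kernel_size, in_channels, filters):
    """Number of weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

# First layer: each filter is 3x3x1, so it can be drawn as a tiny grayscale image.
print(conv2d_kernel_shape(3, 1, 32), conv2d_param_count(3, 1, 32))    # (3, 3, 1, 32) 320

# Second layer: each filter is 3x3x32, which has no direct image interpretation.
print(conv2d_kernel_shape(3, 32, 64), conv2d_param_count(3, 32, 64))  # (3, 3, 32, 64) 18496
```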

Most of these intermediate activation maps are fairly noisy. But one highlighted map seems to activate on the portions of the feature map corresponding to the digit. That suggests this particular slice of the feature map, in this layer of this particular network, may be looking for digits or something like that.
