Convolutional Neural Networks often use pooling layers to reduce the size of the representation, speed up computation, and make some of the detected features a bit more robust.

In this tutorial, we explain the different types of pooling layers and show how their outputs are calculated by working through some examples. We'll also discuss the motivation for using a pooling layer.

Max Pooling

Max pooling is an operation that is typically added to a CNN following individual convolutional layers. When added to a model, max pooling reduces the dimensionality of images by reducing the number of pixels in the output of the previous convolutional layer.

Suppose you have a 4×4 input and you want to apply max pooling. It is quite simple: take the 4×4 input and break it into four regions. The output is 2×2, and each output value is just the max of the corresponding shaded region.

Max Pooling

This is as if you're applying a filter of size 2, because you're taking 2×2 regions, with a stride of 2.
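Here is a minimal sketch of that computation in plain NumPy; the 4×4 input values are made up purely for illustration:

```python
import numpy as np

# A made-up 4x4 input; each value stands for a feature activation.
x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]])

f, s = 2, 2  # filter size and stride
out = np.zeros((x.shape[0] // s, x.shape[1] // s))

for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        region = x[i * s:i * s + f, j * s:j * s + f]  # one 2x2 region
        out[i, j] = region.max()                      # keep the largest activation

print(out)  # [[9. 2.]
            #  [6. 3.]]
```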

The intuition behind max pooling is that if you think of this 4×4 input as some set of features, then a large number means that a particular feature may have been detected. A feature detected anywhere in one of these quadrants is then preserved in the output of max pooling.

Max Pooling

So max pooling plays it safe: if a feature is detected anywhere in the filter region, a high number is kept; if the feature is not detected anywhere in that region, the maximum of that region stays small.

So far I've shown max pooling on a 2D input. If you have a 3D input, the output will have the same number of channels; for example, a 32x32x64 input produces a 16x16x64 output. The max pooling computation is done independently on each of the channels.
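As a sketch, here's how that looks with Keras's MaxPooling2D layer (assuming a TensorFlow/Keras setup; the input is random data, used only to inspect shapes):

```python
import tensorflow as tf

# Random 32x32 input with 64 channels (batch of 1), just to check shapes.
x = tf.random.normal([1, 32, 32, 64])

pool = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2))
y = pool(x)

print(y.shape)  # (1, 16, 16, 64) -- pooling is applied to each channel independently
```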

Average pooling

One type of pooling that isn't used very often is average pooling: instead of taking the max within each filter region, you take the average.

Average Pooling

In this example, the average of the numbers in orange is 2.75. This is average pooling with hyperparameters filter size f=2 and stride s=2; you can choose other hyperparameters as well.

Max pooling is used much more often than average pooling, with one exception: sometimes, very deep in a neural network, you might use average pooling to collapse your representation, say from 7x7x1000 down to 1x1x1000, by averaging over the whole spatial extent.
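A quick sketch of that collapse with Keras's AveragePooling2D (random data, used only to show the shapes):

```python
import tensorflow as tf

# A deep 7x7 feature map with 1000 channels.
x = tf.random.normal([1, 7, 7, 1000])

# Averaging over the full 7x7 spatial extent collapses it to 1x1.
pool = tf.keras.layers.AveragePooling2D(pool_size=(7, 7))
y = pool(x)

print(y.shape)  # (1, 1, 1, 1000)
```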

Hyper-parameters

The hyperparameters for pooling are f (filter size) and s (stride). A common choice is f=2 and s=2, which is used quite often and has the effect of roughly shrinking the height and width by a factor of 2. You can add an extra hyperparameter for padding, although it is very rarely used; when you do max pooling, you usually do not use any padding.
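The output size of a pooling layer follows the same formula as a convolution. A small sketch (using the standard output-size formula with padding p) shows the factor-of-2 shrink for f=2 and s=2:

```python
def pooled_size(n, f=2, s=2, p=0):
    """Output height/width for pooling an n x n input with filter f, stride s, padding p."""
    return (n + 2 * p - f) // s + 1

print(pooled_size(4))   # 2  -- the 4x4 example above
print(pooled_size(32))  # 16 -- height and width shrink by a factor of 2
```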

Global Average Pooling

The feature maps of the last convolutional layer are vectorized and fed into fully connected layers, followed by a softmax logistic regression layer. This structure bridges the convolutional structure with traditional neural network classifiers: it treats the convolutional layers as feature extractors, and the resulting features are classified in a traditional way.

The fully connected layers are prone to overfitting. You can use Dropout as a regularizer, which randomly sets half of the activations of the fully connected layers to zero during training. This improves the generalization ability and largely prevents overfitting.

You can use another strategy, called global average pooling, to replace the Flatten layer in a CNN. The idea is to generate, in the last convolutional layer, one feature map for each category of the classification task.

Global Average Pooling

Instead of adding fully connected layers on top of the feature maps, global average pooling takes the average of each feature map, and the resulting vector is fed directly into the softmax layer. One advantage of global average pooling over fully connected layers is that it is more native to the convolutional structure, enforcing correspondences between feature maps and categories.

Another advantage is that there is no parameter to optimize in the global average pooling thus overfitting is avoided at this layer. Global average pooling sums out the spatial information, thus it is more robust to spatial translations of the input. We can see global average pooling as a structural regularizer that explicitly enforces feature maps to be confidence maps of concepts (categories).
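Below is a minimal sketch of such a head in Keras, assuming 10 categories and a 32x32x3 input purely for illustration; the last convolutional layer produces one feature map per class, and global average pooling (which has no trainable parameters) turns each map into a single confidence value:

```python
import tensorflow as tf

num_classes = 10  # assumed number of categories, for illustration only

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Conv2D(num_classes, 1, activation='relu'),  # one feature map per class
    tf.keras.layers.GlobalAveragePooling2D(),                   # (batch, num_classes), no parameters
    tf.keras.layers.Softmax(),                                  # averages fed directly into softmax
])

model.summary()
```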

Flatten Layer vs GlobalAveragePooling

The Flatten layer takes a tensor of any shape and transforms it into a one-dimensional tensor (per sample), keeping all the values in the tensor. For example, a tensor of shape (samples, 10, 10, 32) will be flattened to (samples, 10 * 10 * 32).

An architecture like this has the risk of overfitting to the training dataset. In practice, dropout layers are used to avoid overfitting.

Global Average Pooling does something different. It applies average pooling on the spatial dimensions until each spatial dimension is one, and leaves other dimensions unchanged. For example, a tensor (samples, 10, 10, 32) would be output as (samples, 1, 1, 32).
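A small sketch comparing the two on a (10, 10, 32) feature map; note that Keras's GlobalAveragePooling2D squeezes the singleton spatial dimensions by default, so its output is (samples, 32) rather than (samples, 1, 1, 32):

```python
import tensorflow as tf

x = tf.random.normal([4, 10, 10, 32])  # (samples, height, width, channels)

flat = tf.keras.layers.Flatten()(x)
gap = tf.keras.layers.GlobalAveragePooling2D()(x)

print(flat.shape)  # (4, 3200) -- all 10 * 10 * 32 values kept per sample
print(gap.shape)   # (4, 32)   -- one average per channel
```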

Global Max Pooling

Global max pooling downsamples the input representation by taking the maximum value over the time (or spatial) dimension. Global pooling layers can be used in a variety of cases. Primarily, they reduce the dimensionality of the feature maps output by a convolutional layer, and can replace the Flatten layer and sometimes even the Dense layers in your classifier.
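As a sketch, Keras's GlobalMaxPooling1D takes the maximum over the time-steps dimension (GlobalMaxPooling2D does the same over height and width); the input here is random data, used only to show shapes:

```python
import tensorflow as tf

# (batch, time steps, features) -- e.g. the output of a 1D convolution over a sequence.
x = tf.random.normal([2, 50, 128])

y = tf.keras.layers.GlobalMaxPooling1D()(x)
print(y.shape)  # (2, 128) -- the maximum of each feature over the 50 time steps
```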


References

Network In Network