Very deep neural networks are difficult to train because of the vanishing and exploding gradient problems. In practice, as you increase the number of layers in a plain network, the training error decreases for a while but then tends to go back up. In theory, making the neural network deeper should only do better and better on the training set.

Vanishing Gradient Problem
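To get a feel for the scale of the problem, here is a rough numeric sketch. It assumes, as a strong simplification, that backpropagation through a plain stack of layers multiplies the gradient by one identical scalar factor per layer; the helper name gradient_scale is hypothetical and used only for illustration.

```python
# Illustrative only: treat backprop through a plain stack of layers as
# multiplying the gradient by one scalar factor per layer (a simplification).
def gradient_scale(per_layer_factor: float, num_layers: int) -> float:
    """Product of identical per-layer factors across num_layers layers."""
    return per_layer_factor ** num_layers


for depth in (10, 50, 100):
    shrinking = gradient_scale(0.5, depth)  # factors below 1 shrink toward 0
    growing = gradient_scale(1.5, depth)    # factors above 1 blow up
    print(f"depth={depth:3d}  0.5^depth={shrinking:.3e}  1.5^depth={growing:.3e}")
```

Real networks have matrix-valued factors that differ per layer, so this only shows the trend: the deeper the stack, the faster the product leaves a usable range.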

In reality, your training error gets worse if you pick a plain network that's too deep. With ResNet, even as the number of layers grows, the training error can keep going down.

ResNet enables you to train very, very deep neural networks, sometimes even networks of over 100 layers. A ResNet is built out of residual blocks.

Residual Block

Here are two layers of a neural network where you start off with some activation a[l], then you go to a[l+1], and then to a[l+2]. In other words, for information from a[l] to flow to a[l+2], it needs to go through all of these steps, which we call the main path of this set of layers.

Residual block

In a ResNet, we make a change to this: we take a[l] and fast-forward it, copying it much further into the neural network, to just before a[l+2]. We add a[l] before applying the non-linearity, and this is the shortcut.

Shortcut Connections

Shortcut connections, also called skip connections, allow you to take the activation from one layer and feed it directly to another layer deeper in the network.

Rather than following the main path, the information from a[l] can now follow a shortcut to go much deeper into the neural network. This means that the last equation, a[l+2] = g(z[l+2]), goes away, and we instead have the output a[l+2] = g(z[l+2] + a[l]).
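The difference between the main path and the shortcut can be sketched in a few lines of NumPy. This is a minimal illustration, assuming fully connected layers and ReLU as the non-linearity g; the function names and the weight names W1, b1, W2, b2 are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def plain_two_layers(a_l, W1, b1, W2, b2):
    """Main path only: a[l] -> a[l+1] -> a[l+2] = g(z[l+2])."""
    a_l1 = relu(W1 @ a_l + b1)
    z_l2 = W2 @ a_l1 + b2
    return relu(z_l2)


def residual_block(a_l, W1, b1, W2, b2):
    """Same main path, but a[l] is added before the last non-linearity:
    a[l+2] = g(z[l+2] + a[l])."""
    a_l1 = relu(W1 @ a_l + b1)
    z_l2 = W2 @ a_l1 + b2
    return relu(z_l2 + a_l)


rng = np.random.default_rng(0)
n = 4
a_l = relu(rng.normal(size=n))  # activations coming out of a ReLU are >= 0
W1, W2 = np.zeros((n, n)), np.zeros((n, n))
b1, b2 = np.zeros(n), np.zeros(n)

print(plain_two_layers(a_l, W1, b1, W2, b2))  # all zeros: the input is lost
print(residual_block(a_l, W1, b1, W2, b2))    # equals a_l: the shortcut carries it
```

With all weights set to zero, the plain layers output zeros while the residual block simply passes a[l] through, which shows that the shortcut gives information a direct route past the main path.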

The addition of a[l] here is what makes this a residual block.

Using residual blocks allows you to train much deeper neural networks. The way you build a ResNet is by taking many of these blocks and stacking them together to form a deep network.

Building ResNet in TensorFlow using Keras API

Based on the plain network, we insert shortcut connections which turn the network into its counterpart residual version. The identity shortcuts can be directly used when the input and output are of the same dimensions.

Identity Mapping by Shortcuts
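As a rough sketch of an identity shortcut in the Keras functional API, the block below adds the input back onto the output of two convolutions. The helper name identity_block, the 3×3 kernel size, and the batch normalization placement are assumptions made for illustration; the only hard requirement is that the input already has the same shape as the output of the main path.

```python
import tensorflow as tf
from tensorflow.keras import layers


def identity_block(x, filters, kernel_size=3):
    """Hypothetical helper: y = F(x) + x, with F as two conv layers.

    Assumes x already has `filters` channels, so no projection is needed.
    """
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])       # add the shortcut before the non-linearity
    return layers.Activation("relu")(y)


inputs = tf.keras.Input(shape=(32, 32, 64))
outputs = identity_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)
```

Because padding="same" keeps the spatial size and the filter count matches the input channels, the Add layer can combine the two tensors without any extra parameters.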

Projection Shortcuts

For identity mapping, the dimensions of x and F must be equal. If this is not the case (e.g., when changing the input/output channels), we can perform a linear projection W_s on the shortcut connection to match the dimensions:

y = F(x, {W_i}) + W_s x

The identity mapping is sufficient for addressing the degradation problem and is economical, so W_s is only used when the dimensions need to be matched.
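The projection can be illustrated with plain vectors in NumPy. This is a sketch under assumptions: x is a flat vector, the residual function F is an arbitrary single-layer stand-in, and the helper name residual_with_projection is hypothetical.

```python
import numpy as np


def residual_with_projection(x, residual_fn, W_s=None):
    """y = F(x) + x when shapes match, otherwise y = F(x) + W_s x."""
    f_x = residual_fn(x)
    if W_s is None:
        if f_x.shape != x.shape:
            raise ValueError("shapes differ: a projection W_s is required")
        return f_x + x            # identity shortcut, no extra parameters
    return f_x + W_s @ x          # projection shortcut matches the dimensions


rng = np.random.default_rng(0)
x = rng.normal(size=64)

# Stand-in residual function that changes the dimension from 64 to 128.
W = rng.normal(size=(128, 64)) * 0.1
F = lambda v: np.maximum(W @ v, 0.0)

W_s = rng.normal(size=(128, 64)) * 0.1   # linear projection on the shortcut
y = residual_with_projection(x, F, W_s)
print(y.shape)  # (128,)
```

In a convolutional network the same role is typically played by a 1×1 convolution on the shortcut, which changes the number of channels and, with a stride, the spatial size.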

For each residual function F, we use a stack of 3 layers. The three layers are 1×1, 3×3, and 1×1 convolutions, where the 1×1 layers are responsible for reducing and then increasing (restoring) dimensions, leaving the 3×3 layer a bottleneck with smaller input/output dimensions.
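A minimal Keras sketch of such a bottleneck stack with an identity shortcut might look as follows. The helper name bottleneck_residual is hypothetical, and the choice of 64 bottleneck filters on a 256-channel input is an assumption borrowed from the first stage of ResNet-50.

```python
import tensorflow as tf
from tensorflow.keras import layers


def bottleneck_residual(x, bottleneck_filters):
    """Hypothetical helper: 1x1 (reduce) -> 3x3 -> 1x1 (restore) plus identity."""
    out_filters = x.shape[-1]    # restore to the input channel count
    y = layers.Conv2D(bottleneck_filters, 1)(x)                  # 1x1 reduces
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(bottleneck_filters, 3, padding="same")(y)  # 3x3 bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(out_filters, 1)(y)                         # 1x1 restores
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, x])
    return layers.Activation("relu")(y)


inputs = tf.keras.Input(shape=(56, 56, 256))
outputs = bottleneck_residual(inputs, bottleneck_filters=64)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)
```

The 3×3 convolution only ever sees 64 channels here, which is where the savings come from compared with running it on all 256 channels.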

Implement ResNet50

Building blocks are shown in brackets with the numbers of blocks stacked. Downsampling is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2.

ResNet 50 architecture

We adopt batch normalization (BN) right after each convolution and before activation. Two options are considered for the shortcuts: (A) zero-padding shortcuts are used for increasing dimensions, and all shortcuts are parameter-free; (B) projection shortcuts are used for increasing dimensions, and the other shortcuts are identity.
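Putting the pieces together, a ResNet-50-style network could be assembled roughly as below. This is a sketch, not a reference implementation: the helper names conv_bn, bottleneck, and build_resnet50 are hypothetical, projection shortcuts are used only in the first block of each stage, and placing the stride of 2 in the first 1×1 convolution of conv3_1, conv4_1, and conv5_1 is an assumption. The stage layout of 3, 4, 6, and 3 bottleneck blocks is the standard ResNet-50 configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_bn(x, filters, kernel_size, strides=1, activate=True):
    """Convolution followed by BN, with ReLU applied after BN if requested."""
    x = layers.Conv2D(filters, kernel_size, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x) if activate else x


def bottleneck(x, filters, strides=1, project=False):
    """1x1 -> 3x3 -> 1x1 block; the output has 4 * filters channels."""
    # Projection shortcut only where dimensions change, identity elsewhere.
    shortcut = conv_bn(x, 4 * filters, 1, strides, activate=False) if project else x
    y = conv_bn(x, filters, 1, strides)
    y = conv_bn(y, filters, 3)
    y = conv_bn(y, 4 * filters, 1, activate=False)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))


def build_resnet50(input_shape=(224, 224, 3), num_classes=1000):
    inputs = tf.keras.Input(shape=input_shape)
    x = conv_bn(inputs, 64, 7, strides=2)                      # conv1
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
    # (bottleneck filters, number of blocks, stride of the first block)
    stages = [(64, 3, 1), (128, 4, 2), (256, 6, 2), (512, 3, 2)]
    for filters, num_blocks, first_stride in stages:
        x = bottleneck(x, filters, strides=first_stride, project=True)
        for _ in range(num_blocks - 1):
            x = bottleneck(x, filters)                         # identity shortcuts
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs, name="resnet50_sketch")


model = build_resnet50()
print(model.output_shape)
```

For real work, tf.keras.applications.ResNet50 provides a maintained implementation with optional pretrained weights; the sketch above is only meant to show how the blocks stack.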