Deep learning frameworks represent neural networks as computational graphs, and they use those graphs to calculate the gradients required for gradient descent optimization.

You only have to build the forward propagation graph; the framework takes care of the backward differentiation. There are two types of computational graphs: static and dynamic.
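
As a minimal sketch of this division of labor in PyTorch, the snippet below writes only the forward computation and lets autograd produce the gradients. The function y = w * x^2 + 1 and the values 2.0 and 3.0 are illustrative choices, not taken from any particular model.

```python
import torch

# Leaf tensors that should receive gradients.
x = torch.tensor(2.0, requires_grad=True)
w = torch.tensor(3.0, requires_grad=True)

# Forward pass: autograd records each operation as it is executed.
y = w * x ** 2 + 1.0

# Backward pass: gradients are accumulated into .grad of each leaf tensor.
y.backward()
dy_dx = x.grad.item()  # analytically 2 * w * x
dy_dw = w.grad.item()  # analytically x ** 2

# torch.autograd.grad returns the gradient directly instead of storing it in .grad.
(grad_x,) = torch.autograd.grad(w * x ** 2 + 1.0, x)
```

Both routes should agree with the analytic derivatives noted in the comments.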

Static Computational Graph

In a static graph, we create and connect all the variables at the beginning and initialize them in a static (unchanging) session. The session and graph persist and are reused. The graph is not rebuilt after each training iteration, which makes it efficient.

With a static graph, variable sizes have to be defined at the beginning, which can be inconvenient for some applications, such as NLP with variable-length inputs.

Dynamic Computational Graph

In a dynamic graph, the computational graph is built on the fly, as each operation on the variables is executed. The graph is therefore rebuilt on every training iteration. Dynamic graphs are flexible and allow us to modify and inspect the internals of the graph at any time. The main drawback is that rebuilding the graph takes time. PyTorch uses a dynamic computational graph.
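
One way to see the dynamic behavior is to let ordinary Python control flow decide how many operations are recorded. In the sketch below, `repeated_double` is a hypothetical toy function whose loop length depends on the input value, so each call yields a different graph and a different gradient.

```python
import torch

def repeated_double(x):
    """Double x until its magnitude reaches 10; the loop length depends on the data."""
    y, steps = x, 0
    while y.abs().item() < 10.0:
        y = y * 2.0  # each pass adds one multiplication node to the graph
        steps += 1
    return y, steps

grads = {}
for value in (1.0, 3.0):  # nonzero inputs, so the loop terminates
    x = torch.tensor(value, requires_grad=True)
    y, steps = repeated_double(x)
    y.backward()
    # y equals x * 2**steps, so the gradient is 2**steps.
    grads[value] = (steps, x.grad.item())
```

The graph for an input of 1.0 contains four multiplications, while the graph for 3.0 contains two, and the stored gradients differ accordingly.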

Figure: Computational graph with backward() and grad().

Input Size for Model

Convolutional layers in PyTorch are dynamic by design: they do not fix the height and width of their input, so there is no straightforward way to read an expected height and width from the layer itself. Any input image size is acceptable to a module composed completely of convolutions.
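
The following sketch passes two differently sized inputs through the same small stack of convolutions. The layer sizes and the name `conv_net` are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# A module made only of convolutions (plus an activation).
conv_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),             # keeps H and W
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # roughly halves H and W
)
conv_net.eval()

with torch.no_grad():
    out_a = conv_net(torch.randn(1, 3, 64, 64))
    out_b = conv_net(torch.randn(1, 3, 100, 37))

shape_a = tuple(out_a.shape)
shape_b = tuple(out_b.shape)
```

The output spatial size simply follows the input size; only the number of channels is determined by the layers.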

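If the network subsequently contains a Linear layer with a fixed input size, the number of flattened features reaching that layer must equal its in_features. An image of size (height/n, n*width) has the same number of pixels as the original, so it is accepted when the convolutional stack preserves the total number of features (for example, stride-1 convolutions that keep height and width unchanged). With other strides, padding, or pooling, that is not guaranteed, and the input size is then effectively fixed.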
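
A rough sketch of how a Linear layer constrains the input size is shown below. `SmallNet` is a hypothetical model that uses a size-preserving convolution and a Linear layer sized for 32x32 inputs, and `accepts` is a helper introduced only for this example.

```python
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    """Hypothetical model: one size-preserving conv followed by a fixed-size Linear."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)  # keeps H and W
        self.fc = nn.Linear(4 * 32 * 32, 10)                   # sized for 32x32 inputs

    def forward(self, x):
        x = self.conv(x)
        return self.fc(torch.flatten(x, start_dim=1))

net = SmallNet().eval()

def accepts(height, width):
    """Return True if the network runs on a 1 x 3 x height x width input."""
    try:
        with torch.no_grad():
            net(torch.randn(1, 3, height, width))
        return True
    except RuntimeError:  # shape mismatch in the Linear layer
        return False

results = {size: accepts(*size) for size in [(32, 32), (16, 64), (48, 48)]}
```

Here the 16x64 input is accepted because it yields the same number of flattened features as 32x32, while 48x48 is rejected. Being accepted by shape does not mean the prediction is meaningful, since flattening a 16x64 feature map orders the features differently from a 32x32 one.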

PyTorch allows the input size to change dynamically from one input to the next, during training or inference. For example, providing an image as a Tensor of size [1, 3, 150, 120] to your model, then another one as a Tensor of size [1, 3, 184, 100], and so forth, is completely fine, because your model adapts to each input. The only thing that must stay fixed is the depth of the image, that is, the number of channels (3 for RGB, 1 for grayscale).
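
The channel constraint can be checked with a single convolution. In this sketch the layer expects 3 input channels; the two RGB-shaped tensors use the sizes mentioned above, and the single-channel tensor is an added test case.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

with torch.no_grad():
    # Height and width may change freely between calls.
    shape_a = tuple(conv(torch.randn(1, 3, 150, 120)).shape)
    shape_b = tuple(conv(torch.randn(1, 3, 184, 100)).shape)

    # A grayscale-shaped input has 1 channel, which does not match in_channels=3.
    try:
        conv(torch.randn(1, 1, 150, 120))
        gray_accepted = True
    except RuntimeError:
        gray_accepted = False
```

A model trained on RGB images would therefore need a grayscale image to be expanded to three channels before it can be used as input.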

Batch Input Processing

When you use batch training or inference with a DataLoader, varying the input size within a batch does not work, because a batch is transformed into a single Tensor input with one extra dimension. For example, if you provide a list of n images, each of size [1, 3, 255, 255], PyTorch will stack them into [n, 1, 3, 255, 255], so that your model has a single Tensor input. (Dataset items usually omit the leading 1, in which case the batch has size [n, 3, 255, 255].)
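
The stacking step can be sketched as follows. A plain Python list of tensors is used as a stand-in dataset, and the batch size of 4 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader

# Stand-in dataset: 8 images without a leading batch dimension.
images = [torch.zeros(3, 255, 255) for _ in range(8)]

# The default collate function stacks the items of each batch along a new first axis.
loader = DataLoader(images, batch_size=4)
batch = next(iter(loader))
batch_shape = tuple(batch.shape)

# Stacking items that already carry a leading 1 keeps that axis as well.
stacked = torch.stack([torch.zeros(1, 3, 255, 255) for _ in range(4)])
stacked_shape = tuple(stacked.shape)
```

Convolutional layers expect a 4-dimensional batch, so items are normally stored without the leading 1.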

Stacking only works between images of the same shape. It cannot be done between images of different shapes, because the network cannot “guess” how the different images should “align” with one another in a batch if they are not all the same size.
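
A short sketch of the failure, using torch.stack directly on two tensors whose sizes are chosen only for illustration:

```python
import torch

a = torch.zeros(3, 150, 120)
b = torch.zeros(3, 184, 100)

# torch.stack requires every tensor to have exactly the same shape.
try:
    torch.stack([a, b])
    stack_succeeded = True
except RuntimeError:
    stack_succeeded = False

# Tensors of identical shape stack without problems.
same_shape = tuple(torch.stack([a, a]).shape)
```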

Instead, you can create batches in which all items have the same size, for example one batch for items of [1, 3, 150, 150] and another for items of [1, 3, 300, 200]. This is called “bucketing”.
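
One possible way to implement bucketing is to group sample indices by shape and then cut each group into batches. The helper `bucket_batches` below is a hypothetical name, and the shapes are written as plain tuples; in practice they would come from something like `tuple(image.shape)`.

```python
from collections import defaultdict

def bucket_batches(shapes, batch_size):
    """Group sample indices by shape, then split each group into batches."""
    buckets = defaultdict(list)
    for index, shape in enumerate(shapes):
        buckets[tuple(shape)].append(index)

    batches = []
    for indices in buckets.values():
        for start in range(0, len(indices), batch_size):
            batches.append(indices[start:start + batch_size])
    return batches

# Shapes of five hypothetical samples in a dataset.
shapes = [(3, 150, 150), (3, 300, 200), (3, 150, 150), (3, 150, 150), (3, 300, 200)]
batches = bucket_batches(shapes, batch_size=2)
```

Each inner list contains indices of samples that share one shape, so every batch can be stacked safely. A list like this could plausibly be passed to the `batch_sampler` argument of a DataLoader, though shuffling within and across buckets is left out of this sketch.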
