One error you may encounter when you create a convolutional neural network using PyTorch is:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1568 and 1050x10)
The error points to a shape mismatch between the input activation of a linear layer and that layer's weight matrix. This tutorial explains why this error occurs and how to fix it.
The core building block of a neural network is the layer: some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them.
Most of deep learning consists of chaining together simple layers that implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters: the layers.
import torch
import torch.nn as nn
from torchsummary import summary

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2))
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # fully connected layer, output 10 classes
        self.out = nn.Linear(1050, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output, x    # return x for visualization

cnn = CNN()
summary(cnn, (1, 28, 28))
Here, our network consists of two convolutional blocks followed by one Linear layer, which is a densely connected (also called fully connected) layer. This last layer returns an array of 10 scores, one for each of our 10 digit classes; applying softmax to these scores gives the probability that the current digit image belongs to each class.
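Before looking at the Linear layer itself, it helps to see what it actually receives. The following diagnostic sketch (run it after defining CNN, before the failing summary call) passes a dummy batch through the convolutional blocks and prints the shapes; the batch size of 2 mirrors the 2x1568 in the error message, and torchsummary also feeds a batch of 2 internally:

# Trace the activation shapes for a 28x28 single-channel input.
model = CNN()
x = torch.zeros(2, 1, 28, 28)
x = model.conv1(x)           # conv + ReLU + 2x2 max-pool -> torch.Size([2, 16, 14, 14])
print(x.shape)
x = model.conv2(x)           # second block               -> torch.Size([2, 32, 7, 7])
print(x.shape)
x = x.view(x.size(0), -1)    # flatten                    -> torch.Size([2, 1568]), since 32 * 7 * 7 = 1568
print(x.shape)

So the flattened activation reaching self.out has 1568 features, not the 1050 the layer was declared with.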
In our initial example, we built our network by stacking layers on top of each other. A PyTorch Linear layer instance looks like this:
torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
This layer can be interpreted as a function that takes as input a 2D tensor of shape (batch_size, in_features) and returns another 2D tensor of shape (batch_size, out_features), a new representation for the input tensor. Specifically, the function is as follows (where W is a 2D weight tensor of shape (out_features, in_features) and b is a bias vector, both attributes of the layer):
output = dot(input, transpose(W)) + b
Let's unpack this. We have two tensor operations here: a matrix product (dot) between the input tensor and the transposed weight tensor W, and an addition (+) between the resulting 2D tensor and the bias vector b. For the matrix product to be defined, the number of features in the input must equal the layer's in_features.
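You can reproduce the exact error with standalone tensors; this is a minimal sketch, and the numbers are chosen only to mirror the error message above:

layer = nn.Linear(in_features=1050, out_features=10)   # W has shape (10, 1050)
activation = torch.randn(2, 1568)                      # batch of 2 with 1568 features

# layer(activation) computes activation @ transpose(W) + b, which needs
# the activation to have 1050 features. It therefore raises:
# RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1568 and 1050x10)
output = layer(activation)

In the error message, mat1 is the input activation (2x1568) and mat2 is the transposed weight of the linear layer (1050x10).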
How to Fix?
A linear layer expects an input activation whose feature dimension matches its specified in_features. If you change the input shape of the model, the activation reaching the linear layer may have a different number of features and would thus yield a shape mismatch error.
Models in torchvision avoid this by making sure the activation reaching the first linear layer always has the right shape (typically with an adaptive pooling layer). If you are not using such layers, you need to adapt the model for different input shapes or stick to the original input shape.
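The same pattern can be applied to this model. The sketch below is only an illustration under assumptions (the class name FlexibleCNN is hypothetical, and the 7x7 pooled size is chosen so the linear layer still sees 1568 features): an nn.AdaptiveAvgPool2d before the flatten makes the flattened size independent of the input resolution.

import torch
import torch.nn as nn

class FlexibleCNN(nn.Module):
    """Variant of the CNN above whose linear layer works for any input size."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 16, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
        self.conv2 = nn.Sequential(nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d((7, 7))   # always outputs a 7x7 grid
        self.out = nn.Linear(32 * 7 * 7, 10)       # 32 * 7 * 7 = 1568 features

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = self.pool(x)              # fixed spatial size regardless of input resolution
        x = x.view(x.size(0), -1)
        return self.out(x)

model = FlexibleCNN()
print(model(torch.zeros(2, 1, 28, 28)).shape)   # torch.Size([2, 10])
print(model(torch.zeros(2, 1, 64, 64)).shape)   # torch.Size([2, 10])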
In our example, the shape mismatch is raised in the linear layer self.out: it expects an input activation with 1050 features, while the actual flattened activation has 32 * 7 * 7 = 1568 features. Change the in_features of this linear layer to 1568 and it should work.
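Concretely, the fix is a one-line change in __init__ (writing it as 32 * 7 * 7 keeps the origin of 1568 visible), and a quick forward pass confirms it:

# In CNN.__init__, replace
#     self.out = nn.Linear(1050, 10)
# with a layer whose in_features match the flattened conv2 output:
#     self.out = nn.Linear(32 * 7 * 7, 10)   # 32 * 7 * 7 = 1568

# After this change, both the forward pass and the summary work:
cnn = CNN()
output, features = cnn(torch.zeros(2, 1, 28, 28))
print(output.shape)     # torch.Size([2, 10])
print(features.shape)   # torch.Size([2, 1568])
summary(cnn, (1, 28, 28))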
Related Posts
Calculate Output Size of Convolutional and Pooling layers in CNN.
Calculate the number of parameters for a Convolutional and Dense layer in Keras