Matrix multiplication is an important operation in fully connected networks. To go from an input layer to a hidden layer, we employ matrix multiplication and add operations. 

In this tutorial, we will talk about some low-level matrix multiplication operations that underpin deep neural networks.

Let’s write a function that computes the matrix product of two tensors. We’ll need three nested for loops: one for the row indices, one for the column indices, and one for the inner sum. 

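import torch

def matmul(a,b):
  ar,ac = a.shape # n_rows, n_cols of a
  br,bc = b.shape # n_rows, n_cols of b
  assert ac==br   # inner dimensions must match
  c = torch.zeros(ar, bc)
  for i in range(ar):     # rows of a
    for j in range(bc):   # columns of b
      for k in range(ac): c[i,j] += a[i,k] * b[k,j] # inner sum
  return c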

Here ar and ac stand for the number of rows and the number of columns of a, respectively, and the same convention is followed for b. The assertion makes sure the matrix product is defined by checking that a has as many columns as b has rows.

To test this out, we’ll use random matrices and pretend that we’re working with a small batch of 5 MNIST images, each flattened into a vector of 28*28 = 784 values, with a linear model that turns them into 10 activations:

m1 = torch.randn(5,28*28)
m2 = torch.randn(784,10)

%time t1=matmul(m1, m2)
CPU times: user 782 ms, sys: 1.72 ms, total: 784 ms
Wall time: 784 ms

Now let’s see how that compares to PyTorch’s built-in @ operator:

%timeit -n 20 t2=m1@m2
#16.4 µs ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)

As we can see, three nested loops in Python is a bad idea! Python is a slow language, and this isn’t going to be efficient. With the timings above (784 ms versus 16.4 µs), PyTorch is roughly 50,000 times faster than our pure-Python loop.

PyTorch didn’t write its matrix multiplication in Python, but rather in C++ to make it fast. In general, whenever we do computations on tensors, we will need to vectorize them so that we can take advantage of the speed of PyTorch, usually by using two techniques: elementwise arithmetic and broadcasting. 
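As a rough sketch of what vectorizing means here, the innermost loop of the three-loop matmul can be replaced by an elementwise product followed by a sum, and the loop over columns can then be removed with broadcasting. The function names matmul_elementwise and matmul_broadcast are made up for this illustration, and the tolerance in the final check is an assumption chosen to absorb float32 rounding differences.

```python
import torch

def matmul_elementwise(a, b):
    # Hypothetical helper: same contract as the three-loop version,
    # but the inner sum over k is done by PyTorch instead of Python.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "a must have as many columns as b has rows"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            # Elementwise product of row i of a and column j of b, then summed.
            c[i, j] = (a[i, :] * b[:, j]).sum()
    return c

def matmul_broadcast(a, b):
    # Hypothetical helper: one Python loop left, over the rows of a.
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "a must have as many columns as b has rows"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # a[i] has shape (ac,). unsqueeze(-1) makes it (ac, 1), which
        # broadcasts against b of shape (ac, bc). Summing over dim 0
        # gives one full row of the result.
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

torch.manual_seed(0)
m1 = torch.randn(5, 28 * 28)
m2 = torch.randn(784, 10)

expected = m1 @ m2
r1 = matmul_elementwise(m1, m2)
r2 = matmul_broadcast(m1, m2)

# Compare against the built-in operator, allowing for float32 rounding.
print(torch.allclose(r1, expected, atol=1e-3))
print(torch.allclose(r2, expected, atol=1e-3))
```

Each step moves more of the work out of the Python interpreter and into PyTorch's compiled code, which is where the speedup comes from.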

Elementwise Matrix Multiplication

Multiplication can be applied elementwise. That means if we write a*b for two tensors a and b that have the same shape, we will get a tensor of the same shape composed of the products of the corresponding elements of a and b:

m = torch.tensor([[1,2,3], 

The elementwise operations work on tensors of any rank, as long as they have the same shape. However, you can’t perform elementwise operations on tensors that don’t have the same shape.

m = torch.tensor([[1,2,3], 

n = tensor([[1,2,3], 
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 0
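The following self-contained sketch illustrates both cases with made-up values: elementwise arithmetic on two tensors of the same shape, and the error raised when the shapes are incompatible. The tensors a, b and c are invented for this example and are not the ones used above.

```python
import torch

# Two tensors with the same shape (2, 3); the values are made up.
a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
b = torch.tensor([[10., 20., 30.],
                  [40., 50., 60.]])

prod = a * b    # each entry is a[i, j] * b[i, j]
total = a + b   # each entry is a[i, j] + b[i, j]
print(prod)
print(total)

# A tensor of shape (3, 2) cannot be combined elementwise with one of shape (2, 3).
c = torch.ones(3, 2)
try:
    a * c
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape mismatch:", err)
```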


Broadcasting is a term introduced by the NumPy library that describes how tensors of different ranks are treated during arithmetic operations.

For instance, it’s obvious there is no way to add a 4×3 matrix to a 3×2 matrix, but what if we want to add one scalar (which can be represented as a 1×1 tensor) to a matrix?

Broadcasting gives specific rules to codify when shapes are compatible when trying to do an elementwise operation, and how the tensor of the smaller shape is expanded to match the tensor of the bigger shape.
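A small sketch of those rules in action, using made-up values. Shapes are compared from the last axis backwards, and two axes are compatible when they are equal or when one of them is 1 (a missing leading axis is treated as size 1). The tensor names below are illustrative only.

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])          # shape (2, 3)

# Scalar with matrix: the scalar is applied to every element.
plus_one = m + 1

# Vector of shape (3,) with matrix of shape (2, 3):
# the vector is treated as shape (1, 3) and reused for every row.
row = torch.tensor([10., 20., 30.])
plus_row = m + row

# Column of shape (2, 1) with matrix of shape (2, 3):
# the column is reused for every column of m.
col = torch.tensor([[100.], [200.]])
plus_col = m + col

print(plus_one)
print(plus_row)
print(plus_col)

# torch.broadcast_shapes reports the resulting shape without doing any arithmetic.
print(torch.broadcast_shapes((2, 3), (3,)))
```

No data is actually copied when the smaller tensor is expanded, which is why broadcasting is both convenient and cheap.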

You can perform matrix multiplication between two tensors using the torch.matmul() function (also available as the matmul() method on a tensor). For two matrices, torch.matmul() performs ordinary matrix multiplication: if you have x of size [4,3] and y of size [3,2], the result is a [4,2] tensor. The figure illustrates the matrix multiplication operation.

[Figure: PyTorch matrix multiplication]
import torch

x = torch.tensor([[4,1,5],

y = torch.tensor([[2,3],
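For a complete, runnable illustration of the [4,3] by [3,2] case, here is a sketch with made-up values. The tensors p and q are generated with torch.arange purely for illustration and are not the x and y defined above.

```python
import torch

# Made-up inputs with the shapes discussed in the text.
p = torch.arange(12.).reshape(4, 3)   # shape (4, 3), values 0..11
q = torch.arange(6.).reshape(3, 2)    # shape (3, 2), values 0..5

# Three equivalent ways to compute the matrix product.
r_func = torch.matmul(p, q)   # function form
r_method = p.matmul(q)        # method form
r_op = p @ q                  # operator form

print(r_func.shape)
print(r_func)
```

All three forms produce the same [4,2] result; which one to use is mostly a matter of readability.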


More generally, if you have an n × m matrix a and an m × p matrix b, the result of the matrix multiplication is the n × p matrix c whose entries are given by

$$c_{ij} = \sum_{k=1}^{m} a_{ik}\, b_{kj}$$

However, if you have higher-dimensional tensors a and b, the sum product is performed over the last axis of a and the second-to-last axis of b, so those two sizes must be equal. The leading (batch) dimensions of a and b must match or be broadcastable to a common shape; only the last two axes take part in the matrix product. For example, if you have a tensor a of size [3,5,7] and b of size [3,7,8], the result would be a [3,5,8]-sized tensor.
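The shapes from this example can be checked directly. The sketch below uses random data and also shows, as an additional illustration, a single [7,8] matrix being applied to every entry of the batch; the name w and the tolerance in the comparison are assumptions made for the example.

```python
import torch

torch.manual_seed(0)
a = torch.randn(3, 5, 7)
b = torch.randn(3, 7, 8)

# Batched product: one (5, 7) @ (7, 8) product per batch entry.
c = torch.matmul(a, b)
print(c.shape)

# Each slice of the result matches the ordinary 2-D product of the slices.
slices_match = all(
    torch.allclose(c[i], a[i] @ b[i], atol=1e-4) for i in range(a.shape[0])
)
print(slices_match)

# A plain (7, 8) matrix is reused for every batch entry of a.
w = torch.randn(7, 8)
d = torch.matmul(a, w)
print(d.shape)
```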
