Max pooling isn't the only way to achieve downsampling. You can also use strides in the prior convolution layer, or you can use average pooling instead of max pooling.
Average pooling was a common approach early on but has fallen out of favor somewhat. As you might expect, it works exactly like a max pooling layer, except it computes the mean rather than the max.
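As a small sketch of the three options in PyTorch (the input size, channel count, and layer parameters below are illustrative assumptions), all three produce the same 2x spatial downsampling:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)  # (batch, channels, height, width)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
strided_conv = nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1)

print(max_pool(x).shape)      # torch.Size([1, 8, 16, 16])
print(avg_pool(x).shape)      # torch.Size([1, 8, 16, 16])
print(strided_conv(x).shape)  # torch.Size([1, 8, 16, 16])
```

Unlike the two pooling layers, the strided convolution has learnable weights, which is part of why it is often used as an alternative.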
Calling view on a tensor returns a new tensor with a different number of dimensions and different striding information, without changing the underlying storage.
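A minimal sketch of that shared storage, assuming PyTorch (the shapes are arbitrary):

```python
import torch

x = torch.arange(6)                  # storage holds [0, 1, 2, 3, 4, 5]
v = x.view(2, 3)                     # new shape and strides, same storage

print(x.data_ptr() == v.data_ptr())  # True: both tensors point at one storage
v[0, 0] = 99                         # writing through the view...
print(x[0])                          # tensor(99): ...is visible through x as well
```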
With contiguous memory allocation, the data occupies a single, unbroken block of memory. With non-contiguous allocation, the data may be spread across several separate regions of memory. For tensors this matters because operations such as view require a contiguous layout.
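To illustrate, a transpose produces a non-contiguous layout that view will refuse, while .contiguous() copies the data back into one row-major block (a sketch, assuming PyTorch):

```python
import torch

x = torch.arange(6).view(2, 3)
t = x.t()                  # transpose: same storage, but strides are no longer row-major

print(x.is_contiguous())   # True
print(t.is_contiguous())   # False
# t.view(6) would raise a RuntimeError here, because view needs a contiguous layout
c = t.contiguous()         # copies the elements into a new, contiguous block
print(c.view(6))           # tensor([0, 3, 1, 4, 2, 5])
```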
Sparse matrices also have significant advantages in terms of computational efficiency. Unlike operations with full matrices, operations with sparse matrices do not perform unnecessary low-level arithmetic, such as additions and multiplications involving zero elements.
Sparse matrices are common in machine learning: in data that contains counts, in data encodings that map categories to indicator vectors (such as one-hot encoding), and even in whole subfields of machine learning such as natural language processing.
We can see the advantage of sparse matrices if we create a much larger matrix with many more zero elements and then compare the memory needed by its dense representation with the memory needed by its sparse representation, as in the sketch below.
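A rough sketch of that comparison using SciPy's CSR format; the matrix size and fill pattern here are made-up illustrative values:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.zeros((1000, 1000))
dense[0, :100] = 1.0          # only 100 of the 1,000,000 entries are nonzero

sparse = csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes
print(dense_bytes)            # 8000000 bytes for the dense array
print(sparse_bytes)           # only a few kilobytes for the CSR representation
```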
For a multiclass classification problem, you can use the one-versus-all design and compute an ROC curve for each class. One-versus-all treats a multiclass classification problem as a set of binary classification problems, one per class, in which the chosen class is the positive label and all remaining classes together form the negative label.
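A hedged sketch of the per-class ROC computation with scikit-learn, using the iris data and logistic regression purely as stand-ins for your own dataset and model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)              # shape: (n_samples, n_classes)
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# One binary ROC curve per class: that class versus all the others
for k in range(3):
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], scores[:, k])
    print(f"class {k}: AUC = {auc(fpr, tpr):.3f}")
```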
If you try to call backward you will get an error. The in-place operation is simply harder to detect during the forward pass: autograd does not catch the in-place modification of a tensor that is part of the forward computation graph at the moment it happens. It only detects it during the backward pass, when the saved tensors are checked.
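For instance, a minimal sketch that runs through the forward pass without complaint and only fails once backward is called:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.exp(x)   # autograd saves y itself, because d/dx exp(x) = exp(x)
y += 1             # in-place change to a tensor needed for backward; no error yet

# The error only surfaces here:
# RuntimeError: one of the variables needed for gradient computation has been
# modified by an inplace operation
y.sum().backward()
```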
In this article, we will explain the difference between detach(), clone(), copy.deepcopy(), detach().clone(), and clone().detach() for a tensor.
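As a quick preview, here is a minimal sketch of how detach() and clone() differ (assuming a leaf tensor with requires_grad=True); the combined forms build on these two behaviors:

```python
import torch

x = torch.randn(3, requires_grad=True)

d = x.detach()           # shares x's storage, but is cut out of the autograd graph
c = x.clone()            # copies the storage, but stays in the graph (gradients flow back to x)
dc = x.detach().clone()  # an independent copy that is outside the graph

print(d.requires_grad, c.requires_grad, dc.requires_grad)  # False True False
print(d.data_ptr() == x.data_ptr())                        # True: detach shares memory
print(c.data_ptr() == x.data_ptr())                        # False: clone copies memory
```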