Every machine learning model comes with a number of hyperparameters that need to be specified. Hyperparameter tuning lets us control the behavior of a machine learning algorithm when optimizing for performance. Finding the right hyperparameter settings is an art in itself, and there are no hard-and-fast rules that guarantee the best performance on a given dataset.

In the holdout method, we split our dataset into two parts: a training set and a test set. After the machine learning algorithm fits a model to the training set, we evaluate it on the independent test set that we withheld during model fitting. We use fixed hyperparameter settings in our learning algorithm, such as the learning rate or batch size. Hyperparameters are the parameters of the learning algorithm itself, which we have to specify a priori, before model fitting.
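The holdout split described above can be sketched in a few lines of plain Python. The helper name holdout_split, the 80/20 ratio and the seed are assumptions chosen for illustration; they do not come from the original tutorial.

```python
import random

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) for a single holdout split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so the split does not depend on data order
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# Hypothetical 10-sample dataset, 20% held out for testing
train_idx, test_idx = holdout_split(n_samples=10, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

The two index lists never overlap, which is what keeps the test set independent of model fitting.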

Selecting a model based on test set performance seems to be a reasonable approach. However, reusing the test set multiple times would result in overly optimistic estimates of the generalization performance, because the chosen settings end up fitted to the test set itself.

In this tutorial, we will focus on K-Fold cross-validation, one of the most common techniques for model evaluation and model selection in machine learning.

The main idea behind K-Fold cross-validation is that each sample in our dataset has the opportunity of being tested. It is a special case of cross-validation where we iterate over a dataset k times. We split the dataset into k parts; in each round, one part is used for validation, and the remaining k-1 parts are merged into a training subset, so that every part serves as the validation set exactly once. The figure below illustrates the process of 5-fold cross-validation:

K-Fold Cross Validation

The final score is generally the average of the scores obtained across the k folds.
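To make the splitting and averaging concrete, here is a minimal plain-Python sketch. The function kfold_indices is a hypothetical helper written for this illustration (it does not shuffle), and the per-fold scores are placeholder numbers, not measured results.

```python
from statistics import mean

def kfold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds, without shuffling."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(kfold_indices(n_samples=10, k=5))
for fold, (train, val) in enumerate(folds):
    print(fold, val)

# Placeholder per-fold accuracies, only to illustrate the averaging step
fold_scores = [0.98, 0.97, 0.99, 0.98, 0.98]
final_score = mean(fold_scores)
```

Every index lands in exactly one validation part, and the final score is simply the mean of the k per-fold scores.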

PyTorch Model

Let’s define a simple convolutional neural network for the MNIST dataset.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
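The 9216 input features of fc1 follow from the layer shapes. The arithmetic below is a small sketch, assuming 28x28 MNIST inputs and the default stride and padding of the layers above; conv_out is a helper name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                  # MNIST images are 28x28
side = conv_out(side, kernel=3)            # conv1: 28 -> 26
side = conv_out(side, kernel=3)            # conv2: 26 -> 24
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12
flat_features = 64 * side * side           # 64 output channels from conv2
print(flat_features)  # 9216
```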

Reset Model Weights

You need to reset the weights of the model so that each cross-validation fold starts from a fresh random initial state instead of continuing from the weights learned in previous folds. One way to do this is to call reset_parameters() on all child modules that define it.

def reset_weights(m):
    # Re-initialize the layers that hold learnable parameters
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        m.reset_parameters()

This function is used at the start of each fold to reset the parameters of the model, for example with model.apply(reset_weights). This way, each fold trains a model whose weights are initialized randomly, avoiding weight leakage between folds.
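As a quick check that the reset really re-draws the weights, the sketch below applies such a function to a toy model. The small nn.Sequential model is a stand-in chosen for brevity, not the MNIST network from this tutorial.

```python
import torch
import torch.nn as nn

def reset_weights(m):
    # Re-initialize only the layer types that define reset_parameters()
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        m.reset_parameters()

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # toy stand-in model

before = model[0].weight.detach().clone()
model.apply(reset_weights)   # apply() visits every submodule recursively
after = model[0].weight.detach().clone()

weights_changed = not torch.equal(before, after)
print(weights_changed)
```

Because apply() walks all submodules, the same call works unchanged on a nested model.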

Concat Dataset

We need to concatenate the train and test parts of the MNIST dataset into a single dataset, which we’ll use for cross-validation. With K-Fold you generate the splits yourself, so you don’t want to rely on the predefined train/test split.

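dataset1 = datasets.MNIST('../data', train=True, download=True, transform=transform)
dataset2 = datasets.MNIST('../data', train=False, transform=transform)
# Merge both parts so that K-Fold can draw its own splits
dataset = torch.utils.data.ConcatDataset([dataset1, dataset2])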
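The concatenation step can be tried without downloading MNIST. The sketch below uses random tensors as stand-ins for the two parts; train_part and test_part are names made up for this example.

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Toy stand-ins for the MNIST train and test parts (random tensors, not real data)
train_part = TensorDataset(torch.randn(6, 1, 28, 28), torch.randint(0, 10, (6,)))
test_part = TensorDataset(torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,)))

dataset = ConcatDataset([train_part, test_part])
print(len(dataset))  # 10

# Indices past the end of the first part map into the second part
image, label = dataset[6]
same_as_test_first = torch.equal(image, test_part[0][0])
```

The merged dataset is indexed as one continuous range, which is what lets the fold indices address samples from either part.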


Now you can generate the folds and train your model. You do so by defining a loop that iterates over the folds; each iteration provides the fold number and the lists of indices of the training and testing samples for that particular fold. These indices are then used for the actual training process.

for fold, (train_idx, test_idx) in enumerate(kfold.split(dataset)):
  print('------------fold no---------{}----------------------'.format(fold))
  # Samplers restrict each loader to the indices of this fold
  train_subsampler = torch.utils.data.SubsetRandomSampler(train_idx)
  test_subsampler = torch.utils.data.SubsetRandomSampler(test_idx)

  trainloader = torch.utils.data.DataLoader(dataset,
                      batch_size=batch_size, sampler=train_subsampler)
  testloader = torch.utils.data.DataLoader(dataset,
                      batch_size=batch_size, sampler=test_subsampler)

  # Start every fold from freshly initialized weights
  model.apply(reset_weights)

  for epoch in range(1, epochs + 1):
    train(fold, model, device, trainloader, optimizer, epoch)
    test(fold, model, device, testloader)
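The loop relies on a kfold object with a split method, whose definition is not shown here. scikit-learn's KFold offers that interface, so the sketch below assumes it; the n_splits, shuffle and random_state values are illustrative choices, and a NumPy array stands in for the dataset.

```python
import numpy as np
from sklearn.model_selection import KFold

samples = np.arange(10)   # stand-in for a dataset of 10 items
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

splits = list(kfold.split(samples))
for fold, (train_idx, test_idx) in enumerate(splits):
    # Each element is a pair of integer index arrays for this fold
    print(fold, train_idx, test_idx)
```

Each sample index appears in the test part of exactly one fold, matching the K-Fold description earlier in this tutorial.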

Within the for loop, we first print the current fold. You then prepare the data for training by sampling the actual elements from train_idx and test_idx with a SubsetRandomSampler. A sampler can be used within a DataLoader to restrict it to particular samples. The SubsetRandomSampler samples elements randomly from a list of indices, without replacement. In other words, you create two subsamplers that adhere to the fold as specified within the for loop.

With the data loaders, we actually draw these samples from the full dataset. After preparing the data for this particular fold, we train and test the neural network by calling the train and test functions.
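The effect of the samplers can be verified on a toy dataset. In the sketch below every label equals its own index, so it is easy to see which samples a loader yields; the index lists and batch size are hypothetical values for illustration.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset in which each sample's label equals its own index
dataset = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))

train_idx = [0, 1, 2, 3, 4, 5, 6, 7]   # hypothetical indices for one fold
test_idx = [8, 9]

train_loader = DataLoader(dataset, batch_size=4, sampler=SubsetRandomSampler(train_idx))
test_loader = DataLoader(dataset, batch_size=4, sampler=SubsetRandomSampler(test_idx))

# Collect the labels each loader yields over one full pass
seen_train = sorted(int(y) for _, labels in train_loader for y in labels)
seen_test = sorted(int(y) for _, labels in test_loader for y in labels)
print(seen_train, seen_test)
```

Both loaders wrap the same full dataset; only the sampler decides which indices each one visits, and each index is visited once per pass.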

K-Fold cross-validation is computationally expensive because you train and evaluate the model k times, once per fold.
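A rough way to see the cost is to count how many training samples are processed in total. The sketch below assumes the combined MNIST dataset of 70,000 images (60,000 train plus 10,000 test); the choices of k=5 and 3 epochs are illustrative, and kfold_training_cost is a helper introduced for this example.

```python
def kfold_training_cost(n_samples, k, epochs):
    """Total number of training samples processed across all k folds."""
    train_size = n_samples * (k - 1) // k   # each fold trains on k-1 of the k parts
    return k * epochs * train_size

n_samples = 70_000                                       # MNIST train and test parts combined
cv_cost = kfold_training_cost(n_samples, k=5, epochs=3)
single_cost = 3 * (n_samples * 4 // 5)                   # one run on an equally sized 80% split
print(cv_cost // single_cost)  # 5
```

Under these assumptions the training work grows linearly with k compared to a single run on a split of the same size.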
