The learning rate is one of the key hyperparameters for gradient descent. It scales the magnitude of the model’s weight updates as training minimizes the loss. Choosing it is challenging: a value that is too small can make training slow or leave it stuck, whereas a value that is too large can make the model settle too quickly on a suboptimal solution or make training unstable.
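To make the effect concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(w) = w**2. The function, the starting point, and the three learning rates are illustrative assumptions, not values from this tutorial.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # the learning rate scales the size of the update
    return w

# Illustrative values: a small, a moderate, and an overly large learning rate.
for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: final w = {gradient_descent(lr):.4g}")
```

With the small rate, w is still far from the minimum at 0 after 20 steps; with the moderate rate it is essentially there; with the large rate each update overshoots and w moves further away.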

Learning Rate Decay

You need to find a learning rate that reduces the model’s loss effectively. One simple experiment is to gradually increase the learning rate after each mini-batch and record the loss at each increment. The increase can follow either a linear or an exponential scale.
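The experiment can be sketched without any framework. The example below assumes a toy problem (fitting a single weight to noisy data from y = 3x) and a hypothetical helper named lr_range_test; it increases the learning rate exponentially after every mini-batch and records the loss seen at each rate.

```python
import random

def lr_range_test(min_lr=1e-4, max_lr=10.0, num_steps=50, batch_size=8, seed=0):
    """Raise the learning rate exponentially after each mini-batch and record the loss."""
    rng = random.Random(seed)
    w = 0.0                                                 # single weight; the true value is 3.0
    growth = (max_lr / min_lr) ** (1.0 / (num_steps - 1))   # per-step multiplier
    lr = min_lr
    history = []
    for _ in range(num_steps):
        # Draw a fresh mini-batch from the toy problem y = 3x + noise.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / batch_size      # mean squared error
        grad = sum(2.0 * e * x for e, x in zip(errors, xs)) / batch_size
        history.append((lr, loss))                          # loss observed at this learning rate
        w -= lr * grad                                      # gradient descent update
        lr *= growth                                        # exponential increase
    return history

history = lr_range_test()
best_lr, best_loss = min(history, key=lambda pair: pair[1])
print(f"lowest recorded loss {best_loss:.4f} at lr={best_lr:.4g}")
```

In practice you would plot the recorded loss against the learning rate and pick a value from the region where the loss is still falling steeply, rather than relying on the single lowest point.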

The learning rate is often considered the most important hyperparameter, so it is vital to understand how it affects model performance and to build an intuition for how it shapes training behavior.

Learning Rate Schedulers

You can change the learning rate as training progresses by using a learning rate schedule. A schedule adjusts the learning rate during training according to a predefined rule, such as time-based, step-based, or exponential decay.

A popular learning rate schedule is step decay, where the learning rate is reduced by some percentage after a set number of training epochs.
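As a rough illustration of step decay, the sketch below computes the learning rate for a given epoch. The function name step_decay and the default values (halving every 10 epochs) are assumptions chosen for the example.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Return the learning rate for a 0-indexed epoch under step decay."""
    num_drops = epoch // epochs_per_drop    # how many reductions have happened so far
    return initial_lr * drop ** num_drops

# The rate stays flat within each 10-epoch block, then is multiplied by `drop`.
for epoch in (0, 9, 10, 19, 20):
    print(epoch, step_decay(0.1, epoch))
```

PyTorch offers the same idea as torch.optim.lr_scheduler.StepLR, where step_size plays the role of epochs_per_drop and gamma the role of drop.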

There are many different schedules. Here, I’ll show you ExponentialLR, which decays the learning rate of each parameter group by gamma every epoch.
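Here is a small, self-contained sketch of what that decay looks like in numbers. The placeholder model, the starting rate of 0.01, and gamma=0.5 are assumptions picked to keep the values easy to read; the bare optimizer.step() call stands in for a real training epoch.

```python
import torch

model = torch.nn.Linear(4, 2)                      # placeholder model for the sketch
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.5)

lrs = []
for epoch in range(4):
    lrs.append(optimizer.param_groups[0]["lr"])    # learning rate used during this epoch
    optimizer.step()                               # stands in for a full training epoch
    scheduler.step()                               # multiply the learning rate by gamma
print(lrs)
```

Each recorded value should be the previous one multiplied by gamma, so after n epochs the rate is the initial rate times gamma**n.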

You can use any of the built-in learning rate schedulers in PyTorch; the one shown here is just a very generic example. For more, see the PyTorch documentation, which covers a selection of different learning rate schedulers.

In this tutorial, we train a convolutional neural network on the MNIST dataset using a learning rate scheduler provided by PyTorch. 
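The model itself is not shown in this post, so the following is only a hypothetical stand-in: a deliberately small CNN (the class name SmallCNN and its layer sizes are assumptions) that accepts 28x28 MNIST images and returns log-probabilities, which is what F.nll_loss expects.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Hypothetical minimal CNN for 28x28 grayscale MNIST digits."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3)   # 1x28x28 -> 16x26x26
        self.fc = nn.Linear(16 * 13 * 13, 10)         # 10 digit classes

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)     # 16x26x26 -> 16x13x13
        x = torch.flatten(x, 1)                       # keep the batch dimension
        return F.log_softmax(self.fc(x), dim=1)       # log-probabilities for F.nll_loss

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SmallCNN().to(device)
```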

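import torch.nn.functional as F

def train(model, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):  # train_loader and device are defined elsewhere
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()  # compute gradients
        optimizer.step()  # update the weights
        if batch_idx % log_interval == 0:
            lr = optimizer.param_groups[0]["lr"]  # current learning rate
            print('Train Epoch: {} batch-{}\tLoss: {:.6f} Learning Rate: {}'.format(epoch, batch_idx, loss.item(), lr))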

For illustrative purposes, we use the Adam optimizer. Its base learning rate stays fixed unless a scheduler changes it.


torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs. Every scheduler has a step() method that updates the learning rate.

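scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)

for epoch in range(1, epochs + 1):
    train(model, optimizer, epoch, log_interval)
    scheduler.step()  # decay the learning rate once per epoch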

Visualize Learning Rate

We can then visualize the learning rate by recording optimizer.param_groups[0]["lr"] at each epoch and plotting the values.

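import matplotlib.pyplot as plt
plt.plot(lrs)  # lrs: assumed list of optimizer.param_groups[0]["lr"] values, recorded once per epoch
plt.title("PyTorch Learning Rate")
plt.ylabel("learning rate")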

param_groups is a list of dictionaries, one per parameter group. Each dictionary holds that group’s parameters along with its current hyperparameters, such as lr, which the optimizer uses when it updates the parameters from the computed gradients.
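To see what param_groups contains, here is a short sketch with two parameter groups. The layer names backbone and head and both learning rates are assumptions made up for the example.

```python
import torch

backbone = torch.nn.Linear(8, 4)   # placeholder layers for the sketch
head = torch.nn.Linear(4, 2)

optimizer = torch.optim.Adam(
    [
        {"params": backbone.parameters(), "lr": 1e-4},  # group 0 overrides the learning rate
        {"params": head.parameters()},                  # group 1 falls back to the default below
    ],
    lr=1e-3,
)

for i, group in enumerate(optimizer.param_groups):
    # Each group is a dict holding its parameters and hyperparameters.
    print(i, group["lr"], len(group["params"]))
```

With a single group, as in this tutorial, index 0 is the only entry, which is why optimizer.param_groups[0]["lr"] is enough to read the current learning rate.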

PyTorch LR Scheduler

The optimal learning rate depends on both your model architecture and your dataset. While a default learning rate may give decent results, you can often improve performance or speed up training by searching for a better one.
