In addition to the weight parameters defined in a neural network, the training process requires a few extra parameters of its own. One of these so-called hyperparameters is the learning rate, and picking it well is a hard problem.

If it's too small, training risks taking far too long. But if we pick a learning rate that's too big, the updates will most likely overshoot and diverge away from the minimum.

In this post, you will discover what the learning rate is and how to read it from the optimizer and from learning rate schedulers while training neural network models in PyTorch.

In practice, at each step we move perpendicular to the contour lines of the error surface, and we need to decide how far to walk before recalculating our direction. This distance should depend on the steepness of the surface.

The closer we are to the minimum, the shorter we want our steps to be. We know we are close because the surface becomes much flatter there, so the steepness itself serves as an indicator of how close we are to the minimum.

If our error surface is rather mellow, the gradients are small and training can take a very long time. As a result, we scale the gradient by a factor called the learning rate.
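
Concretely, the learning rate scales the gradient before it is subtracted from the weights. Here is a minimal, self-contained sketch of one such update step (the tensors are toy values, not a real model):

import torch

lr = 0.01                                  # the learning rate
w = torch.randn(3, requires_grad=True)     # a toy "weight" tensor
loss = (w ** 2).sum()                      # a toy loss
loss.backward()                            # compute the gradient of the loss w.r.t. w

with torch.no_grad():
    w -= lr * w.grad                       # update: new weight = old weight - lr * gradient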

First, you have to construct an optimizer object that will hold the current state and will update the parameters based on the computed gradients.

To construct an Optimizer you have to give it an iterable containing the parameters to optimize. Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.

model = Net().to(device)    # Net and device are defined earlier in the example
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=momentum)    # momentum is a float, e.g. 0.9

A stochastic gradient descent algorithm with momentum is used, with a fixed learning rate of 0.01.

Get Learning Rate From Optimizer

In PyTorch, a model is updated by an optimizer, and the learning rate is a parameter of that optimizer. A learning rate schedule is an algorithm that updates the learning rate inside the optimizer. To get the learning rate from the optimizer, you could use [group['lr'] for group in optimizer.param_groups].

model.train()
for batch_idx, (data, target) in enumerate(train_loader):
    data, target = data.to(device), target.to(device)
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    loss.backward()
    optimizer.step()
    if batch_idx % log_interval == 0:
        print(optimizer.param_groups[0]['lr'])    # current learning rate of the (only) param_group

A param_group is a dict that specifies which tensors should be optimized, along with group-specific optimization options.
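
As a hedged sketch of how param_groups can be used (the submodule names base and classifier are purely illustrative), you can pass a list of dicts to the optimizer, one per group, each with its own options:

optimizer = optim.SGD(
    [
        {'params': model.base.parameters(), 'lr': 0.001},    # illustrative submodule with its own learning rate
        {'params': model.classifier.parameters()},           # falls back to the default lr below
    ],
    lr=0.01,
    momentum=0.9,
)
print([group['lr'] for group in optimizer.param_groups])      # [0.001, 0.01]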

The state parameters of an optimizer can be found in optimizer.param_groups; the learning rate is a floating point value at optimizer.param_groups[0]['lr'].

Alternatively, you may use a learning rate scheduler along with your optimizer and simply call the scheduler's built-in get_last_lr() method.

Get Learning Rate From Scheduler

PyTorch provides several methods to adjust the learning rate based on the number of epochs. The learning rate schedule should be applied after the optimizer’s update; e.g., you should write your code this way:

for epoch in range(1, n_epochs + 1):
    train(epoch)
    test()
    scheduler.step()
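
The loop above assumes a scheduler has already been constructed. As a minimal sketch (StepLR is chosen here purely for illustration), it could be paired with the SGD optimizer from earlier:

scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)    # multiply the lr by 0.1 every 10 epochs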

You can access the list of learning rates via the method scheduler.get_last_lr(), or directly scheduler.get_last_lr()[0] if you only use a single learning rate. It returns the last learning rate computed by the current scheduler, one value per parameter group.
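
For example, right after scheduler.step() you could print it (the exact values depend on your scheduler; [0.01] is what the setup sketched above would show at the start):

print(scheduler.get_last_lr())       # a list with one entry per param_group, e.g. [0.01]
print(scheduler.get_last_lr()[0])    # the single learning rate when only one group is used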

Calling scheduler.get_lr() instead triggers a warning:

scheduler.get_lr()

UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.

The problem is the use of get_lr(). To read the current learning rate, what you actually need is get_last_lr(). Using get_lr() outside of the scheduler's internal manipulation of the learning rate yields the warning above.

Get Learning Rate From ReduceLROnPlateau Scheduler

ReduceLROnPlateau adjusts the learning rate dynamically, reducing it when some validation measurement stops improving. ReduceLROnPlateau is the only scheduler without a get_last_lr() method. To retrieve the learning rate in this case, you can use optimizer.param_groups[0]['lr'] (in the example below, val_loss stands in for whatever validation metric you monitor):

optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)

for i in range(10):
    # ... training and validation code that produces val_loss ...
    scheduler.step(val_loss)    # ReduceLROnPlateau expects the monitored metric
    print(optimizer.param_groups[0]['lr'])

You could also use the internal scheduler._last_lr attribute (or, for schedulers that have it, scheduler.get_last_lr()). Note that these two approaches only work after the first scheduler.step() call; alternatively, you can always check the learning rate in the optimizer via optimizer.param_groups[0]['lr'].
