It is one of the most important hyper-parameter for Long Short Term Models, always accounting for more than two third of the variance, which makes it very important to set it correctly to achieve good performance. Figure showing the importance and its impact on the error and the time, as observed from a research papers has been attached below to illustrate and emphasize its importance. [1]

Key Takeaway : Need to find the Goldilocks zone for the learning rate

  • Too small learning rate , longer training time and worst error rate (by getting stuck at local minima)
  • Too large learning rate, smaller training time but poor error rate.
  • “The analysis of hyperparameter interactions revealed that even the highest measured interaction (between learning rate and network size) is quite small. This implies that the hyperparameters can be tuned independently. In particular, the learning rate can be calibrated first using a fairly small network, thus saving a lot of experimentation time.” [1]

Learning Rate vs Error vs Training Time.jpg

Figure Showing the impact of the learning rate on the error and the training time

The following diagram 2 should also, be able to emphasize on its importance.


Figure. Showing importance of learning rate on accuracy [2]