The learning rate is one of the most important hyperparameters for Long Short-Term Memory (LSTM) models, accounting for more than two thirds of the variance in performance, which makes setting it correctly essential for good results. A figure from a research paper illustrating its impact on the error and the training time is attached below.
Key takeaway: you need to find the Goldilocks zone for the learning rate.
- Too small a learning rate: longer training time and a worse error rate (the optimizer can get stuck in a local minimum).
- Too large a learning rate: shorter training time but a poor error rate (the optimizer overshoots or diverges).
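The trade-off above can be sketched on a toy problem. This is a hypothetical illustration, not from the cited paper: plain gradient descent on the quadratic loss f(θ) = θ², whose gradient is 2θ, counting how many steps each learning rate needs to converge.

```python
def steps_to_converge(lr, theta=1.0, tol=1e-3, max_steps=10_000):
    """Count gradient steps on f(theta) = theta^2 until |theta| < tol.

    Each update is theta <- theta - lr * 2*theta, i.e. theta is scaled
    by (1 - 2*lr) per step. Returns max_steps if it never converges.
    """
    for step in range(max_steps):
        if abs(theta) < tol:
            return step
        theta -= lr * 2 * theta  # gradient of theta^2 is 2*theta
    return max_steps

# Too small: converges, but needs thousands of steps.
print(steps_to_converge(0.001))
# In the Goldilocks zone: converges in a handful of steps.
print(steps_to_converge(0.3))
# Too large (|1 - 2*lr| > 1): theta overshoots every step and diverges.
print(steps_to_converge(1.1))   # hits max_steps
```

The numbers differ from the figure's, but the pattern is the same: below the zone you pay in steps, above it you pay in convergence.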
- “The analysis of hyperparameter interactions revealed that even the highest measured interaction (between learning rate and network size) is quite small. This implies that the hyperparameters can be tuned independently. In particular, the learning rate can be calibrated first using a fairly small network, thus saving a lot of experimentation time.” 
Figure: Impact of the learning rate on the error and the training time.
The following diagram also emphasizes its importance.
Figure: Importance of the learning rate for accuracy.
Too small a learning rate:
- Learning rate = 0.5; number of computation steps = large (11)
Too large a learning rate (sub-optimal convergence):
- Learning rate = 5; number of computation steps = 2; sub-optimal convergence at 3.0
Too large a learning rate (possible divergence):
Here we can see that, because of the high learning rate, the parameters toggle back and forth between negative and positive values: for example, θ0 goes from 0.5 to -0.8 to 0.7 to -0.8 to 0.9 to -1.0. Each update overshoots the minimum in the hope of converging, but the magnitudes clearly grow, so the optimization diverges instead of converging.
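This sign-flipping divergence is easy to reproduce. The sketch below is a hypothetical minimal example (the numbers do not match the θ0 trace above): on f(θ) = θ², any learning rate with |1 − 2·lr| > 1 makes θ alternate sign and grow in magnitude every step.

```python
def gd_trajectory(lr, theta=0.5, steps=6):
    """Record theta after each gradient step on f(theta) = theta^2."""
    history = [theta]
    for _ in range(steps):
        theta -= lr * 2 * theta  # equivalent to theta *= (1 - 2*lr)
        history.append(theta)
    return history

# With lr = 1.1 the per-step factor is 1 - 2*1.1 = -1.2, so the sign
# flips on every update while the magnitude grows: divergence.
print(gd_trajectory(lr=1.1))
```

The printed trajectory alternates between positive and negative values with ever-larger magnitude, exactly the back-and-forth toggling described above.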