The gradient descent optimizer uses a constant learning rate, which you provide during initialization. You can also feed in different learning rates over the course of training, in the way shown by Mrry.
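For example, here is a minimal TF 1.x sketch of that idea (the one-weight toy model and the step-50 schedule are just illustrative assumptions): the learning rate is fed through a placeholder so it can change between steps.

```python
import tensorflow as tf

# Toy model: fit a single weight so the example is self-contained.
x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x - y))

# Feed the learning rate at run time instead of hard-coding it.
learning_rate = tf.placeholder(tf.float32, shape=[])
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        # Example schedule: drop the learning rate after 50 steps.
        lr = 0.1 if step < 50 else 0.01
        sess.run(train_step, feed_dict={x: [1., 2., 3.],
                                        y: [2., 4., 6.],
                                        learning_rate: lr})
```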
But instead of this, you can also use a more advanced optimizer, which typically converges faster and adapts the learning rate as training progresses.
Here is a brief explanation based on my understanding:
Adam, or adaptive moment estimation, is an algorithm similar to AdaDelta. But in addition to keeping per-parameter adaptive learning rates, it also keeps a separate momentum (running average of past gradients) for each parameter.
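In TensorFlow, switching to Adam is usually just a one-line change (this sketch assumes the same toy `loss` as above); the per-parameter moment estimates are created and updated for you as internal optimizer variables.

```python
# Adam keeps two extra slot variables per parameter (first- and
# second-moment estimates), so no manual learning-rate schedule is needed.
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(100):
        sess.run(train_step, feed_dict={x: [1., 2., 3.], y: [2., 4., 6.]})
```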