I am using TensorFlow to train a neural network. This is how I am initializing the GradientDescentOptimizer
:
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)
mse = tf.reduce_mean(tf.square(out - out_))
train_step = tf.train.GradientDescentOptimizer(0.3).minimize(mse)
The thing here is that I don't know how to set an update rule for the learning rate or a decay value for that.
How can I use an adaptive learning rate here?
This question is related to
python
tensorflow
Gradient descent algorithm uses the constant learning rate which you can provide in during the initialization. You can pass various learning rates in a way showed by Mrry.
But instead of it you can also use more advanced optimizers which have faster convergence rate and adapts to the situation.
Here is a brief explanation based on my understanding:
Adam or adaptive momentum is an algorithm similar to AdaDelta. But in addition to storing learning rates for each of the parameters it also stores momentum changes for each of them separately
From tensorflow official docs
global_step = tf.Variable(0, trainable=False)
starter_learning_rate = 0.1
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
100000, 0.96, staircase=True)
# Passing global_step to minimize() will increment it at each step.
learning_step = (
tf.train.GradientDescentOptimizer(learning_rate)
.minimize(...my loss..., global_step=global_step))
Tensorflow provides an op to automatically apply an exponential decay to a learning rate tensor: tf.train.exponential_decay
. For an example of it in use, see this line in the MNIST convolutional model example. Then use @mrry's suggestion above to supply this variable as the learning_rate parameter to your optimizer of choice.
The key excerpt to look at is:
# Optimizer: set up a variable that's incremented once per batch and
# controls the learning rate decay.
batch = tf.Variable(0)
learning_rate = tf.train.exponential_decay(
0.01, # Base learning rate.
batch * BATCH_SIZE, # Current index into the dataset.
train_size, # Decay step.
0.95, # Decay rate.
staircase=True)
# Use simple momentum for the optimization.
optimizer = tf.train.MomentumOptimizer(learning_rate,
0.9).minimize(loss,
global_step=batch)
Note the global_step=batch
parameter to minimize. That tells the optimizer to helpfully increment the 'batch' parameter for you every time it trains.
If you want to set specific learning rates for intervals of epochs like 0 < a < b < c < ...
. Then you can define your learning rate as a conditional tensor, conditional on the global step, and feed this as normal to the optimiser.
You could achieve this with a bunch of nested tf.cond
statements, but its easier to build the tensor recursively:
def make_learning_rate_tensor(reduction_steps, learning_rates, global_step):
assert len(reduction_steps) + 1 == len(learning_rates)
if len(reduction_steps) == 1:
return tf.cond(
global_step < reduction_steps[0],
lambda: learning_rates[0],
lambda: learning_rates[1]
)
else:
return tf.cond(
global_step < reduction_steps[0],
lambda: learning_rates[0],
lambda: make_learning_rate_tensor(
reduction_steps[1:],
learning_rates[1:],
global_step,)
)
Then to use it you need to know how many training steps there are in a single epoch, so that we can use the global step to switch at the right time, and finally define the epochs and learning rates you want. So if I want the learning rates [0.1, 0.01, 0.001, 0.0001]
during the epoch intervals of [0, 19], [20, 59], [60, 99], [100, \infty]
respectively, I would do:
global_step = tf.train.get_or_create_global_step()
learning_rates = [0.1, 0.01, 0.001, 0.0001]
steps_per_epoch = 225
epochs_to_switch_at = [20, 60, 100]
epochs_to_switch_at = [x*steps_per_epoch for x in epochs_to_switch_at ]
learning_rate = make_learning_rate_tensor(epochs_to_switch_at , learning_rates, global_step)
Source: Stackoverflow.com