Alexey Novakov published on
8 min, 1488 words
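Adam is another optimization algorithm used to train neural networks. It is based on adaptive estimates of lower-order moments of the gradients, namely the first moment (the mean) and the second moment (the uncentered variance). It has more hyperparameters than classic Gradient Descent, and they have to be tuned externally.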
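For reference, the standard Adam update at step t can be written as follows. Here g_t is the gradient of the objective with respect to the parameters θ, m_t and v_t are the running first- and second-moment estimates (both initialized to zero), and α, β1, β2 and ε (eps) are the hyperparameters listed below. The division by 1 − β^t corrects the bias toward zero caused by the zero initialization.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```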
Good default settings for the machine learning problems tested in the original Adam paper are:
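- α = 0.001: the learning rate, which we have already seen in classic Gradient Descent;
- β1 = 0.9: the exponential decay rate for the first-moment estimates;
- β2 = 0.999: the exponential decay rate for the second-moment estimates;
- ε = 10⁻⁸: a small constant that prevents division by zero.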
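To make the role of these defaults concrete, here is a minimal sketch of Adam for a single scalar parameter in plain Python. It assumes a one-dimensional problem and a user-supplied gradient function `grad`; the function name `adam_minimize` and its signature are hypothetical, chosen for illustration, and are not taken from any library or from the original implementation.

```python
import math


def adam_minimize(grad, theta0, alpha=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Hypothetical helper: run Adam on one scalar parameter.

    `grad` is a function returning the gradient of the objective at theta.
    Defaults are the hyperparameter values listed above.
    """
    theta = theta0
    m = 0.0  # first-moment estimate (running mean of gradients)
    v = 0.0  # second-moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early estimates are too small.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
def quad_grad(theta):
    return 2 * (theta - 3)


print(adam_minimize(quad_grad, 0.0, alpha=0.1, steps=2000))  # close to 3.0
```

Because Adam divides the first-moment estimate by the square root of the second-moment estimate, each step moves θ by roughly α at most while the gradient keeps the same sign, independently of the gradient's scale. With the default α = 0.001, θ therefore advances by only about 0.001 per iteration at first, which is why the usage example uses a larger α to reach the minimum of this toy problem quickly.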