Linear Regression with Adam Optimizer
Alexey Novakov published on
6 min, 1077 words
Adam is one more optimization algorithm used in neural networks. It is based on adaptive estimates of lower-order moments of the gradient. It has more hyper-parameters to tune externally than classic Gradient Descent.
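For reference, the lower-order moments Adam tracks are exponential moving averages of the gradient and of its square. The standard per-parameter update from the original Adam paper (Kingma & Ba) can be written as:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t          % first moment (mean of gradients)
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2        % second moment (uncentered variance)
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}                % bias correction for the zero-initialized averages
\theta_t = \theta_{t-1} - \alpha \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}   % parameter update
```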
Good default settings for the tested machine learning problems are listed below; a Scala sketch of one update step using these values follows the list:
- α = 0.001 (the learning rate, which we have already seen in classic Gradient Descent)
- β1 = 0.9
- β2 = 0.999
- ε = 10⁻⁸
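To make these defaults concrete, here is a minimal sketch (not the post's actual implementation) of a single Adam update step for one scalar parameter; the names `AdamState`, `Adam.step` and `AdamDemo` are illustrative only:

```scala
// Optimizer state carried between steps: first moment, second moment, step counter.
final case class AdamState(m: Double = 0.0, v: Double = 0.0, t: Int = 0)

object Adam {
  val alpha = 0.001 // learning rate
  val beta1 = 0.9   // decay rate for the first moment estimate
  val beta2 = 0.999 // decay rate for the second moment estimate
  val eps   = 1e-8  // small constant to avoid division by zero

  // Returns the updated parameter and optimizer state for one gradient step.
  def step(param: Double, grad: Double, s: AdamState): (Double, AdamState) = {
    val t = s.t + 1
    val m = beta1 * s.m + (1 - beta1) * grad        // moving average of the gradient
    val v = beta2 * s.v + (1 - beta2) * grad * grad // moving average of the squared gradient
    val mHat = m / (1 - math.pow(beta1, t))         // bias-corrected first moment
    val vHat = v / (1 - math.pow(beta2, t))         // bias-corrected second moment
    val updated = param - alpha * mHat / (math.sqrt(vHat) + eps)
    (updated, AdamState(m, v, t))
  }
}

object AdamDemo extends App {
  // Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
  var x = 0.0
  var state = AdamState()
  for (_ <- 1 to 10000) {
    val (next, nextState) = Adam.step(x, 2 * (x - 3), state)
    x = next
    state = nextState
  }
  println(x) // approaches 3.0
}
```

Note how α effectively caps the step size: when gradients are consistent, the ratio m̂/√v̂ stays close to 1, so each update moves the parameter by roughly the learning rate regardless of the gradient's magnitude.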
Categories: scala