## Linear Regression with Adam Optimizer

Alexey Novakov published on


Adam is another optimization algorithm used in neural networks. It is based on adaptive estimates of lower-order moments of the gradient. Compared to classic Gradient Descent, it has more hyper-parameters to tune externally.

Good default settings for the tested machine learning problems are:

- α = 0.001 — learning rate. We have already seen this one in classic Gradient Descent.
- β₁ = 0.9 — exponential decay rate for the first-moment estimates.
- β₂ = 0.999 — exponential decay rate for the second-moment estimates.
- ε = 10⁻⁸ — small constant to avoid division by zero.
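To make the defaults above concrete, here is a minimal sketch of Adam fitting a simple linear regression y = w·x + b. The object name, toy data, and iteration count are illustrative assumptions, not taken from the article:

```scala
// Minimal Adam sketch for linear regression (illustrative, not the article's code).
object AdamLinReg {
  // default hyper-parameters from the text
  val alpha = 0.001 // learning rate
  val beta1 = 0.9   // decay rate for first-moment estimates
  val beta2 = 0.999 // decay rate for second-moment estimates
  val eps   = 1e-8  // numerical stability term

  // Fit y = w*x + b to toy data generated from y = 2x + 1; returns (w, b).
  def train(steps: Int = 20000): (Double, Double) = {
    val xs = Array(0.0, 1.0, 2.0, 3.0, 4.0)
    val ys = xs.map(x => 2.0 * x + 1.0)
    val n  = xs.length

    var w = 0.0; var b = 0.0
    // Adam state: first (m) and second (v) moment estimates per parameter
    var mw = 0.0; var vw = 0.0
    var mb = 0.0; var vb = 0.0

    for (t <- 1 to steps) {
      // gradients of mean squared error w.r.t. w and b
      val gw = xs.indices.map(i => 2.0 * (w * xs(i) + b - ys(i)) * xs(i)).sum / n
      val gb = xs.indices.map(i => 2.0 * (w * xs(i) + b - ys(i))).sum / n

      // update biased moment estimates for w, then bias-correct and step
      mw = beta1 * mw + (1 - beta1) * gw
      vw = beta2 * vw + (1 - beta2) * gw * gw
      w -= alpha * (mw / (1 - math.pow(beta1, t))) /
        (math.sqrt(vw / (1 - math.pow(beta2, t))) + eps)

      // same update for b
      mb = beta1 * mb + (1 - beta1) * gb
      vb = beta2 * vb + (1 - beta2) * gb * gb
      b -= alpha * (mb / (1 - math.pow(beta1, t))) /
        (math.sqrt(vb / (1 - math.pow(beta2, t))) + eps)
    }
    (w, b)
  }

  def main(args: Array[String]): Unit = {
    val (w, b) = train()
    println(f"w = $w%.3f, b = $b%.3f") // should approach w ≈ 2, b ≈ 1
  }
}
```

Note how each parameter keeps its own first- and second-moment estimates, and how both are bias-corrected by 1 − βᵗ before the step — that correction is what keeps the early updates from being too small.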

Categories: scala