Picta

Linear Regression with Adam Optimizer

Alexey Novakov published on

6 min, 1077 words

Adam is one more optimization algorithm used in neural networks. It is based on adaptive estimates of lower-order moments. It has more hyper-parameters than classic Gradient Descent to tune externally

Good default settings for the tested machine learning problems are:

  • α = 0.001, // learning rate. We have already seen this one in classic Gradient Descent.
  • β1 = 0.9,
  • β2 = 0.999
  • eps = 10−8.
Read More