Alexey Novakov Notes

Linear Regression with Adam Optimizer

Alexey Novakov published on February 24, 2021

6 min, 1077 words

Adam is another optimization algorithm used to train neural networks. It is based on adaptive estimates of lower-order moments of the gradient. Compared with classic Gradient Descent, it has more hyper-parameters that have to be tuned externally.

Good default settings for the tested machine learning problems are:

α = 0.001, // learning rate. We have already seen this one in classic Gradient Descent.
β₁ = 0.9,
β₂ = 0.999,
eps = 10⁻⁸.
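
To make the role of these four values concrete, here is a minimal sketch of a single Adam update for one scalar weight, followed by a tiny linear regression fit of y = w * x. It follows the commonly published form of the algorithm (moving averages of the gradient and of its square, bias correction, then a scaled step). The names `AdamSketch`, `AdamState`, `step` and `fit` are hypothetical and chosen only for illustration; this is not necessarily how the full article implements it.

```scala
// Illustrative sketch only: all names here are hypothetical.
object AdamSketch {
  // Default hyper-parameters quoted above.
  val alpha = 0.001 // learning rate
  val beta1 = 0.9   // decay rate of the first-moment (mean) estimate
  val beta2 = 0.999 // decay rate of the second-moment (squared gradient) estimate
  val eps   = 1e-8  // small constant that guards against division by zero

  // Optimizer state for a single scalar parameter.
  final case class AdamState(param: Double, m: Double = 0.0, v: Double = 0.0, t: Int = 0)

  // One Adam update, given the gradient of the loss w.r.t. the parameter.
  def step(s: AdamState, grad: Double): AdamState = {
    val t = s.t + 1
    val m = beta1 * s.m + (1 - beta1) * grad        // moving average of gradients
    val v = beta2 * s.v + (1 - beta2) * grad * grad // moving average of squared gradients
    val mHat = m / (1 - math.pow(beta1, t))         // bias-corrected first moment
    val vHat = v / (1 - math.pow(beta2, t))         // bias-corrected second moment
    AdamState(s.param - alpha * mHat / (math.sqrt(vHat) + eps), m, v, t)
  }

  // Fit y = w * x (no bias term, for brevity) by minimising mean squared error.
  def fit(xs: Seq[Double], ys: Seq[Double], steps: Int): Double = {
    val n = xs.size.toDouble
    // d/dw of mean((w * x - y)^2)
    def grad(w: Double): Double =
      xs.zip(ys).map { case (x, y) => 2 * (w * x - y) * x }.sum / n
    (1 to steps).foldLeft(AdamState(param = 0.0))((s, _) => step(s, grad(s.param))).param
  }
}
```

For example, `AdamSketch.fit(Seq(1.0, 2.0, 3.0), Seq(2.0, 4.0, 6.0), 10000)` should move the weight from 0 towards 2. Note that on the very first step the bias correction makes the update roughly α in magnitude regardless of how large the gradient is, which is why so many iterations are needed with α = 0.001.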

Categories: scala

Tags: deep learning, machine learning, linear regression, Adam, Picta