
Gradient Descent Essentials


Gradient descent is arguably the most widely used optimization technique. At least in my personal research, it accounts for the majority of my solutions.

Gradient descent looks pretty simple at first glance, but it is worthwhile to dig into it a little bit.

What is Gradient Descent?

Think about optimizing an unconstrained, smooth convex optimization problem

$$\min_{x} f(x),$$

where $f$ is convex and differentiable.

Gradient descent solves the above problem as follows: choose an initial point $x^{(0)}$, then repeat the update

$$x^{(k)} = x^{(k-1)} - t_k \nabla f(x^{(k-1)}), \qquad k = 1, 2, 3, \ldots,$$

and stop at some point.

Intuitively, this is easy to understand: at every iteration we move along the negative gradient, the direction that decreases the function value, and how far we move is controlled by the step size $t_k$.
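To make the update concrete, here is a minimal sketch in Python; the gradient-norm stopping rule, the default step size, and the toy quadratic in the example are my own illustrative choices.

```python
import numpy as np

def gradient_descent(grad_f, x0, step_size=0.1, max_iters=1000, tol=1e-8):
    """Repeat x <- x - t * grad_f(x) and stop when the gradient is (almost) zero."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # "stop at some point": here, when the gradient vanishes
            break
        x = x - step_size * g         # move along the negative gradient direction
    return x

# Example: f(x) = (x1 - 3)^2 + (x2 + 1)^2, whose gradient is 2 * (x - [3, -1]).
x_min = gradient_descent(lambda x: 2.0 * (x - np.array([3.0, -1.0])), x0=[0.0, 0.0])
print(x_min)   # approximately [3, -1]
```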

Mathematical Interpretation

The intuition behind gradient descent is pretty clear. However, as a mathematical tool, it should be rooted in a solid mathematical foundation.

Remember the Taylor series,

$$f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) + \cdots$$

In optimization, a widely used trick is the quadratic approximation based on the Taylor expansion,

$$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x).$$

Here, we play the same trick with a slight change: we replace the Hessian $\nabla^2 f(x)$ by $\frac{1}{t} I$, which leads to the following approximation:

$$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \frac{1}{2t} \| y - x \|_2^2.$$

  • $f(x) + \nabla f(x)^T (y - x)$ represents the linear approximation to $f$.
  • $\frac{1}{2t} \| y - x \|_2^2$ represents the proximity term to $x$, with a weight $\frac{1}{2t}$.

Going back to our minimization problem, we want to find the minimizer iteratively. Assume we have just finished the $k$th iteration and are about to start the $(k+1)$th iteration. We have to solve the following problem:

$$x^{(k+1)} = \arg\min_{y} f(y).$$

By applying the slightly changed quadratic approximation around $x^{(k)}$, we will instead solve the following new problem:

$$x^{(k+1)} = \arg\min_{y} \; f(x^{(k)}) + \nabla f(x^{(k)})^T (y - x^{(k)}) + \frac{1}{2t} \| y - x^{(k)} \|_2^2.$$

Of course, this is much easier to solve. Let's do the high school algebra: setting the gradient of the objective with respect to $y$ to zero gives

$$\nabla f(x^{(k)}) + \frac{1}{t} \left( y - x^{(k)} \right) = 0 \quad \Longrightarrow \quad x^{(k+1)} = x^{(k)} - t \, \nabla f(x^{(k)}),$$

which is exactly the gradient descent update.

Note that we apply this approximation at every iteration, and we can set $t$ to a different value at each iteration, such as setting $t$ to $t_k$ at the $k$th iteration.
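As a quick numerical sanity check of this derivation, the sketch below minimizes the quadratic surrogate around a point $x^{(k)}$ with a generic solver and confirms that the minimizer coincides with $x^{(k)} - t \nabla f(x^{(k)})$; the particular convex function, point, and step size are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# An arbitrary smooth convex function and its gradient (illustration only).
f = lambda x: np.log(1.0 + np.exp(x[0])) + x[1] ** 2
grad_f = lambda x: np.array([1.0 / (1.0 + np.exp(-x[0])), 2.0 * x[1]])

x_k = np.array([1.0, -2.0])   # current iterate
t = 0.3                       # step size

def surrogate(y):
    """Quadratic approximation of f around x_k with the Hessian replaced by I / t."""
    d = y - x_k
    return f(x_k) + grad_f(x_k) @ d + d @ d / (2.0 * t)

y_star = minimize(surrogate, x_k).x   # numerical minimizer of the surrogate
gd_step = x_k - t * grad_f(x_k)       # closed-form gradient descent update
print(np.allclose(y_star, gd_step, atol=1e-4))   # True
```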

Here is a plot illustration from my instructor Ryan’s slides.

[Figure: gd_approx_illustration]

How to choose Step Size

Setting the step size is an art in gradient descent, and it can directly determine whether your optimization procedure succeeds or fails.

I have seen lots of presentations and papers talk about this. Some did it right, some did it wrong, and some did it partially right (which, of course, means partially wrong).

There is no need to argue the importance of the step size: if it is too small, it takes forever to converge; if it is too large, the iterates jump back and forth and never reach optimality.
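A toy run on $f(x) = x^2$ (an arbitrary example of mine) makes both failure modes visible:

```python
def run(step, iters=100):
    """Gradient descent on f(x) = x^2, whose gradient is 2x, starting from x = 1."""
    x = 1.0
    for _ in range(iters):
        x = x - step * 2.0 * x
    return x

print(run(0.001))   # about 0.82: too small a step, converges painfully slowly
print(run(0.5))     # 0.0: a well-chosen step lands on the minimizer
print(run(1.1))     # huge magnitude: too large a step, the iterates diverge
```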

A general way to choose the step size adaptively is backtracking line search. It is a simple and general solution, and lots of people have demonstrated its practical usefulness.

The backtracking line search works as follows:

  1. First fix parameters $0 < \beta < 1$ and $0 < \alpha \le 1/2$.
  2. At each iteration, start with $t = 1$, and while $f(x - t \nabla f(x)) > f(x) - \alpha t \| \nabla f(x) \|_2^2$, shrink $t = \beta t$. Else perform the gradient descent update $x^{+} = x - t \nabla f(x)$.

Usually, $\alpha$ can simply be set to 1/2.
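Here is a minimal sketch of the procedure above in Python; the defaults $\alpha = 1/2$ and $\beta = 1/2$ follow the text, while the function names and the example objective are assumptions of mine.

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha=0.5, beta=0.5):
    """One gradient descent step with the backtracking line search described above."""
    g = grad_f(x)
    t = 1.0
    # While f(x - t * grad) > f(x) - alpha * t * ||grad||^2, shrink t = beta * t.
    while f(x - t * g) > f(x) - alpha * t * np.dot(g, g):
        t *= beta
    return x - t * g   # perform the gradient descent update with the accepted t

# Example usage on the poorly scaled quadratic f(x) = x1^2 + 10 * x2^2.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([5.0, 5.0])
for _ in range(50):
    x = backtracking_step(f, grad_f, x)
print(x)   # close to the minimizer [0, 0]
```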

Interpretation

The exit condition $f(x - t \nabla f(x)) \le f(x) - \alpha t \| \nabla f(x) \|_2^2$ asks that the step achieve at least a fraction $\alpha$ of the decrease promised by the linear approximation at $x$; backtracking keeps shrinking $t$ until the step is small enough for this sufficient-decrease condition to hold.

Constant Step Size

I have to admit the power of backtracking line search. However, in some (actually, lots of) cases, we don't even need backtracking.

When the function $f$ is

  1. convex,
  2. differentiable,
  3. $\mathrm{dom}(f) = \mathbb{R}^n$, and
  4. $\nabla f$ is Lipschitz continuous with constant $L > 0$,

then, if our objective function satisfies the above conditions, we can simply set the step size to a constant!

Lipschitz continuous

A function $g$ is Lipschitz continuous if, for any $x$ and $y$, we can always find a positive constant $L$ such that

$$|g(x) - g(y)| \le L \, |x - y|.$$

Here we use the absolute value $|\cdot|$ since our input is a scalar; in general, the distance function depends on the metric space.

In our case, we require $\nabla f$ to be Lipschitz continuous, which means

$$\| \nabla f(x) - \nabla f(y) \|_2 \le L \, \| x - y \|_2 \quad \text{for any } x, y.$$
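For instance, for the least-squares objective $f(x) = \frac{1}{2} \| Ax - b \|_2^2$, the gradient $\nabla f(x) = A^T (Ax - b)$ is Lipschitz continuous with constant $L = \lambda_{\max}(A^T A)$. The sketch below checks this numerically on a random instance; the data and problem sizes are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

grad_f = lambda x: A.T @ (A @ x - b)
L = np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of grad_f

# ||grad_f(x) - grad_f(y)|| <= L * ||x - y|| should hold for any x, y.
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    assert np.linalg.norm(grad_f(x) - grad_f(y)) <= L * np.linalg.norm(x - y) + 1e-12
print("Lipschitz condition holds on all sampled pairs")
```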

Choice of step size

Theorem

Gradient descent with fixed step size $t \le 1/L$ satisfies

$$f(x^{(k)}) - f^{\star} \le \frac{\| x^{(0)} - x^{\star} \|_2^2}{2 t k}.$$

Proof

The above Theorem leads to the following two equivalent lemmas.

Lemma 1

Gradient descent with fixed step size $t \le 1/L$ has convergence rate $O(1/k)$.

Lemma 2

Gradient descent with fixed step size $t \le 1/L$ needs $O(1/\epsilon)$ iterations to get $f(x^{(k)}) - f^{\star} \le \epsilon$.
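To see the theorem and the lemmas in action, here is a small sketch that runs gradient descent with the fixed step size $t = 1/L$ on a least-squares problem and checks that the suboptimality gap stays below the $\| x^{(0)} - x^{\star} \|_2^2 / (2 t k)$ bound at every iteration; the random instance is again an arbitrary illustration of mine.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)

L = np.linalg.eigvalsh(A.T @ A).max()      # Lipschitz constant of the gradient
t = 1.0 / L                                # constant step size allowed by the theorem
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
f_star = f(x_star)

x0 = np.zeros(5)
x = x0.copy()
for k in range(1, 201):
    x = x - t * grad_f(x)
    bound = np.dot(x0 - x_star, x0 - x_star) / (2.0 * t * k)
    assert f(x) - f_star <= bound + 1e-12  # the O(1/k) guarantee from the theorem
print("bound holds for all k; final gap:", f(x) - f_star)
```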

