Efficiency is key in optimization.
To that end, how can you craft your algorithm
to respect a problem's geometry?
Standard algorithms often use Euclidean geometries,
but one can generalize these with
𝗕𝗿𝗲𝗴𝗺𝗮𝗻 𝗱𝗶𝘃𝗲𝗿𝗴𝗲𝗻𝗰𝗲𝘀.
Essentially, one picks a convex function
and measures the difference between two points
as the gap between the function's value
at one point and the tangent line
taken at the other point.
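In symbols (this is the standard definition,
with h a differentiable convex function):

D_h(x, y) = h(x) − h(y) − ⟨∇h(y), x − y⟩,

i.e., the gap at x between h and its
linearization taken at y.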
This can yield simpler algorithms
(e.g., when minimizing over the unit simplex)
and/or faster-converging algorithms.
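To make that concrete, here is a minimal
NumPy sketch (my own illustration, not taken
from the slides): with the negative-entropy
function, the Bregman divergence on the unit
simplex is the KL divergence, and mirror
descent becomes a multiplicative update that
never needs a Euclidean projection.

import numpy as np

def bregman_divergence(h, grad_h, x, y):
    # D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

# Negative entropy h(x) = sum_i x_i log x_i; on the
# unit simplex its Bregman divergence is KL(x || y).
h_ent = lambda v: np.sum(v * np.log(v))
grad_ent = lambda v: np.log(v) + 1.0

def entropic_mirror_step(x, grad, step):
    # Mirror descent in this geometry reduces to a
    # multiplicative update plus renormalization.
    z = x * np.exp(-step * grad)
    return z / z.sum()

# Toy problem: minimize <c, x> over the unit simplex.
c = np.array([0.9, 0.1, 0.5])
x = np.ones(3) / 3  # start at the uniform distribution
for _ in range(200):
    x = entropic_mirror_step(x, c, step=0.5)
print(x)  # mass concentrates where c_i is smallest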
For more details, check out the slides below.
Cheers,
Howard
𝗥𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀 𝗳𝗼𝗿 𝗳𝘂𝗿𝘁𝗵𝗲𝗿 𝗿𝗲𝗮𝗱𝗶𝗻𝗴:
- Chapter 2 of Yair Censor's book "Parallel Optimization: Theory, Algorithms, and Applications"
- Chapter 9 of Amir Beck's book "First-Order Methods in Optimization"
- Lieven Vandenberghe's 236C slides
𝗦𝗶𝗱𝗲 𝘀𝘁𝗼𝗿𝘆:
Yair Censor and I used to make annual trips
to this one Dairy Queen in SoCal.
As I recall, he said that back in the day (i.e., before the internet),
he and a collaborator would scroll
through lists of abstracts at the library
looking for anything interesting.
On one of those library trips,
they came across Lev Bregman's 1967 paper.
Apparently, the idea went unnoticed for 14 years
until Yair's 1981 paper introduced these
divergences in the form we use today.
What a good reminder not to study only recent works.
p.s. Thanks are due to reader Liang Dai for introducing me to the area perspective for visualizing Bregman divergences.