After the min-max exchange, the value of the inner minimization over the simplex equals the smallest coefficient.
Proposition 1.5.2.
Assume the setup of Definition 1.5.1 and the normalization (5.2).
Let ‖·‖ denote the ℓ_∞ norm on ℝ^n, so ‖s‖ = max_i |s^(i)|.
Then the optimal value ψ* of (5.1) satisfies the dual representation
-ψ* = min_{τ ≥ 0} { ∑_{i=1}^n xbar^(i) (gbar^(i) - 2τ)_+ + τ^2/(2L) }
with (a)_+ = max{a, 0} (equation (5.3)).
Consequently, ψ* can be computed by a one-dimensional search over τ ≥ 0 after sorting
the components of gbar.
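The objective in (5.3) is piecewise quadratic in τ, with breakpoints at the values gbar^(i)/2; this is what makes a sort-then-scan one-dimensional search work. A minimal Lean sketch of the objective (the name `dualObjective` and the parameter layout are illustrative, not part of the development):

```lean
import Mathlib

open Finset

/-- Sketch of the dual objective in (5.3):
φ(τ) = ∑ i, xbar i * (gbar i - 2τ)₊ + τ² / (2L).
Illustrative only; `xbar`, `gbar`, `L` are the data of Proposition 1.5.2. -/
noncomputable def dualObjective {n : ℕ} (xbar gbar : Fin n → ℝ)
    (L τ : ℝ) : ℝ :=
  (∑ i, xbar i * max (gbar i - 2 * τ) 0) + τ ^ 2 / (2 * L)
```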
Definition 1.5.2.1.
For μ > 0 and u ∈ ℝ^m, define the log-sum-exp smoothing function
η(u) = μ * log (∑_{j=1}^m exp (u^{(j)} / μ)) (equation (5.4)).
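A Lean sketch of definition (5.4) (the name `smoothMax` is illustrative, not part of the development):

```lean
import Mathlib

open Finset

/-- Sketch of the log-sum-exp smoothing (5.4):
η(u) = μ · log (∑ j, exp (u j / μ)). -/
noncomputable def smoothMax (μ : ℝ) {m : ℕ} (u : Fin m → ℝ) : ℝ :=
  μ * Real.log (∑ j, Real.exp (u j / μ))
```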
Proposition 1.5.2.1.
Let η be defined by (5.4). For any u ∈ ℝ^m, let ū = max_{1 ≤ j ≤ m} u^(j) and define
v ∈ ℝ^m by v^(j) = u^(j) - ū. Then η(u) = ū + η(v) and
∇η(u) = ∇η(v) (equation (eq:auto_Proposition_5_5_content_1)).
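The identity holds for an arbitrary shift c, not just c = ū: since exp((u^(j) - c)/μ) = exp(-c/μ) · exp(u^(j)/μ), the factor exp(-c/μ) pulls out of the sum and μ·log turns it into -c. A Lean statement sketch (names illustrative, proof omitted):

```lean
import Mathlib

open Finset

noncomputable def smoothMax (μ : ℝ) {m : ℕ} (u : Fin m → ℝ) : ℝ :=
  μ * Real.log (∑ j, Real.exp (u j / μ))

/-- Shift invariance behind Proposition 1.5.2.1: for any shift c
(in particular c = max_j u j), η(u) = c + η(u - c).
Statement sketch only; the proof factors exp(-c/μ) out of the sum. -/
theorem smoothMax_shift (μ : ℝ) (hμ : 0 < μ) {m : ℕ} (hm : 0 < m)
    (u : Fin m → ℝ) (c : ℝ) :
    smoothMax μ u = c + smoothMax μ (fun j => u j - c) := by
  sorry
```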
Definition 1.5.3.1.
Assume d : Q → ℝ is differentiable and σ-strongly convex on Q. Define the Bregman distance
ξ(z,x) = d(x) - d(z) - ⟪∇d(z), x - z⟫ for z, x ∈ Q
(equation (eq:auto_Definition_5_6_content_1)).
Equations
- bregmanDistance d z x = d x - d z - DualPairing (↑(fderiv ℝ d z)) (x - z)
Expand the Bregman distance using the Fréchet derivative.
Secant slope bound along the segment from z to x under strong convexity.
Derivative of t ↦ d (z + t • (x - z)) at t = 0.
Convert a right-hand secant bound into a bound on the derivative.
Proposition 1.5.3.1.
In the setting of Definition 1.5.3.1, the Bregman distance satisfies
ξ(z,x) ≥ (σ/2) ‖x - z‖^2 for all z, x ∈ Q.
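A Lean-style statement of this bound, phrased via Mathlib's `StrongConvexOn` (a sketch: the exact hypothesis bundle and the lemma name are illustrative, and the proof is omitted):

```lean
import Mathlib

/-- Sketch: σ-strong convexity of d on Q forces the Bregman distance
ξ(z,x) = d x - d z - (fderiv ℝ d z) (x - z) to dominate (σ/2)·‖x - z‖². -/
theorem bregman_lower_bound {E : Type*} [NormedAddCommGroup E]
    [InnerProductSpace ℝ E] (d : E → ℝ) (Q : Set E) (σ : ℝ)
    (hd : StrongConvexOn Q σ d) (hdiff : Differentiable ℝ d) :
    ∀ z ∈ Q, ∀ x ∈ Q,
      d x - d z - (fderiv ℝ d z) (x - z) ≥ σ / 2 * ‖x - z‖ ^ 2 := by
  sorry
```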
Definition 1.5.3.2.
Define the mapping
V_Q(z,g) = argmin_{x ∈ Q} { ⟪g, x - z⟫ + ξ(z,x) }
(equation (eq:auto_Definition_5_6_content_2)).
Equations
- V_Q Q d z g = if h : ∃ (x : ↑Q), IsMinOn (fun (x : E) => DualPairing g (x - ↑z) + bregmanDistance d (↑z) x) Q ↑x then Classical.choose h else z
If the minimization problem has a minimizer, V_Q selects one.
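As a concrete instance (a standard fact about the entropic setup, not taken verbatim from the text): with d(x) = ∑_i x^(i) log x^(i) on the simplex, the minimizer defining V_Q is the exponentiated-gradient update. A Lean sketch, with the illustrative name `entropyProx`:

```lean
import Mathlib

open Finset

/-- Closed form of V_Q for the entropy distance on the simplex
(standard fact, sketch):
V_Q(z,g) i = z i · exp (-g i) / ∑ j, z j · exp (-g j). -/
noncomputable def entropyProx {n : ℕ} (z g : Fin n → ℝ) : Fin n → ℝ :=
  fun i => z i * Real.exp (-(g i)) / (∑ j, z j * Real.exp (-(g j)))
```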
On the simplex, the entropy Bregman distance equals the KL divergence.
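To spell the equality out (a standard computation; names below are illustrative): for d(x) = ∑_i x^(i) log x^(i), the gradient at z has components 1 + log z^(i), and on the simplex ∑_i x^(i) = ∑_i z^(i) = 1 the linear terms cancel, leaving ∑_i x^(i) log(x^(i)/z^(i)). A Lean statement sketch, proof omitted:

```lean
import Mathlib

open Finset

/-- Negative entropy (sketch). -/
noncomputable def negEntropy {n : ℕ} (x : Fin n → ℝ) : ℝ :=
  ∑ i, x i * Real.log (x i)

/-- On the simplex, the entropy Bregman distance
negEntropy x - negEntropy z - ∑ i, (1 + log (z i)) * (x i - z i)
collapses to the KL divergence ∑ i, x i * log (x i / z i). -/
theorem entropy_bregman_eq_kl {n : ℕ} (x z : Fin n → ℝ)
    (hx : ∑ i, x i = 1) (hz : ∑ i, z i = 1)
    (hxpos : ∀ i, 0 < x i) (hzpos : ∀ i, 0 < z i) :
    negEntropy x - negEntropy z
        - ∑ i, (1 + Real.log (z i)) * (x i - z i)
      = ∑ i, x i * Real.log (x i / z i) := by
  sorry
```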