Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section05_part3

theorem simplexProximalValue_dual_after_exchange (n : ) (xbar gbar : Fin n) (L : ) (hxbar : xbar standardSimplex n) (hL : 0 < L) :
simplexProximalValue n xbar gbar L = sSup ((fun (s : Fin n) => -i : Fin n, (gbar i + s i) * xbar i - 1 / (2 * L) * s ^ 2 + sInf ((fun (i : Fin n) => gbar i + s i) '' Set.univ)) '' Set.univ)

After the min-max exchange, the inner simplex minimization equals the minimum coefficient.

theorem simplexProximalValue_exists_zero_coord (n : ) (gbar : Fin n) (hn : 0 < n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) :
∃ (i0 : Fin n), gbar i0 = 0

Normalization on a finite index set yields a zero coordinate.

theorem simplexProximalValue_gbar_nonneg (n : ) (gbar : Fin n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (i : Fin n) :
0 gbar i

Normalization implies nonnegativity of all coordinates.

theorem max_sub_eq (a b : ) :
max a b - a = max (b - a) 0

Shifting inside a max yields a positive-part expression.

theorem max_sub_ge_of_le (lam τ g : ) (hlam : lam τ) :
max lam (g - τ) - lam max (g - 2 * τ) 0

If λ ≤ τ, the shifted max dominates the truncation.

theorem simplexProximalValue_dual_reduce_to_tau_lower_bound (n : ) (xbar gbar : Fin n) (L : ) (hn : 0 < n) (hxbar : xbar standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (s : Fin n) :
i : Fin n, s i * xbar i + 1 / (2 * L) * s - gbar ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ) i : Fin n, xbar i * max (gbar i - 2 * s - gbar) 0 + s - gbar ^ 2 / (2 * L)

Lower bound the dual objective by the τ-reduced expression.

theorem simplexProximalValue_dual_reduce_to_tau_construct (n : ) (xbar gbar : Fin n) (L : ) (hn : 0 < n) (hxbar : xbar standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) {τ : } :
0 τ∃ (s : Fin n), i : Fin n, s i * xbar i + 1 / (2 * L) * s - gbar ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ) = i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)

Construct a dual variable achieving the τ-reduction.

theorem simplexProximalValue_dual_reduce_to_tau_core (n : ) (xbar gbar : Fin n) (L : ) (hxbar : xbar standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :
sInf ((fun (s : Fin n) => i : Fin n, s i * xbar i + 1 / (2 * L) * s - gbar ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ)) '' Set.univ) = sInf ((fun (τ : ) => i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Reduce the swapped dual expression to the one-dimensional τ infimum.

theorem simplexProximalValue_dual_reduce_to_tau (n : ) (xbar gbar : Fin n) (L : ) (hxbar : xbar standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :
-sSup ((fun (s : Fin n) => -i : Fin n, (gbar i + s i) * xbar i - 1 / (2 * L) * s ^ 2 + sInf ((fun (i : Fin n) => gbar i + s i) '' Set.univ)) '' Set.univ) = sInf ((fun (τ : ) => i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Reduce the swapped dual expression to the one-dimensional τ infimum.

theorem simplexProximalValue_dual_representation (n : ) (xbar gbar : Fin n) (L : ) (hxbar : xbar standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :
-simplexProximalValue n xbar gbar L = sInf ((fun (τ : ) => i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Proposition 1.5.2. Assume the setup of Definition 1.5.1 and the normalization (5.2). Let ‖·‖ denote the l_infty norm on ℝ^n, so ‖s‖ = max_i |s^{(i)}|. Then the optimal value psi* of (5.1) satisfies the dual representation -psi* = min_{τ ≥ 0} { ∑_{i=1}^n xbar^(i) (gbar^(i) - 2 τ)_+ + τ^2/(2L) } with (a)_+ = max{a,0} (equation (5.3)). Consequently, psi* can be computed by a one-dimensional search over τ ≥ 0 after sorting the components of gbar.

noncomputable def logSumExpSmooth (m : ) (μ : ) (u : Fin m) :

Definition 1.5.2.1. For μ > 0 and u ∈ ℝ^m, define the log-sum-exp smoothing function η(u) = μ * log (∑_{j=1}^m exp (u^{(j)} / μ)) (equation (5.4)).

Equations
Instances For
    theorem logSumExpSmooth_add_const (m : ) (μ : ) (hm : 0 < m) ( : 0 < μ) (u : Fin m) (c : ) :
    (logSumExpSmooth m μ fun (j : Fin m) => u j + c) = c + logSumExpSmooth m μ u

    Shifting the input of log-sum-exp by a constant adds that constant.

    theorem fderiv_logSumExpSmooth_add_const (m : ) (μ : ) (hm : 0 < m) ( : 0 < μ) (u : Fin m) (c : ) :
    fderiv (logSumExpSmooth m μ) (u + fun (x : Fin m) => c) = fderiv (logSumExpSmooth m μ) u

    The derivative of log-sum-exp is invariant under constant shifts.

    theorem logSumExpSmooth_shift (m : ) (μ : ) ( : 0 < μ) (u : Fin m) :
    have ubar := sSup (Set.range u); have v := fun (j : Fin m) => u j - ubar; logSumExpSmooth m μ u = ubar + logSumExpSmooth m μ v fderiv (logSumExpSmooth m μ) u = fderiv (logSumExpSmooth m μ) v

    Proposition 1.5.2.1. Let η be defined by (5.4). For any u ∈ ℝ^m, let \bar u = max_{1 ≤ j ≤ m} u^{(j)} and define v ∈ ℝ^m by v^{(j)} = u^{(j)} - \bar u. Then η(u) = \bar u + η(v) and \nabla η(u) = \nabla η(v) (equation (eq:auto_Proposition_5_5_content_1)).

    noncomputable def bregmanDistance {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (d : E) (z x : E) :

    Definition 1.5.3.1. Assume d : Q → ℝ is differentiable and σ-strongly convex on Q. Define the Bregman distance ξ(z,x) = d x - d z - ⟪∇ d z, x - z⟫ for z, x ∈ Q (equation (eq:auto_Definition_5_6_content_1)).

    Equations
    Instances For
      theorem bregmanDistance_eq_sub_fderiv {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (d : E) (z x : E) :
      bregmanDistance d z x = d x - d z - (fderiv d z) (x - z)

      Expand the Bregman distance using the Fréchet derivative.

      theorem strongConvexOn_secant_slope_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] {Q : Set E} {d : E} {σ : } (hconv : StrongConvexOn Q σ d) {z x : E} (hz : z Q) (hx : x Q) {t : } (ht : t Set.Ioo 0 1) :
      t⁻¹ * (d (z + t (x - z)) - d z) d x - d z - (1 - t) * (σ / 2 * x - z ^ 2)

      Secant slope bound along the segment from z to x under strong convexity.

      theorem hasDerivAt_bregman_line {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] {d : E} {z x : E} (hdiffz : DifferentiableAt d z) :
      HasDerivAt (fun (t : ) => d (z + t (x - z))) ((fderiv d z) (x - z)) 0

      Derivative of t ↦ d (z + t • (x - z)) at t = 0.

      theorem deriv_le_of_secant_bound_nhdsGT {φ g : } {φ' G : } (hderiv : HasDerivAt φ φ' 0) (hbound : tSet.Ioo 0 1, t⁻¹ * (φ t - φ 0) g t) (hlim : Filter.Tendsto g (nhdsWithin 0 (Set.Ioi 0)) (nhds G)) :
      φ' G

      Convert a right-hand secant bound into a bound on the derivative.

      theorem bregmanDistance_lower_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {d : E} {σ : } (hdiff : zQ, DifferentiableAt d z) (hconv : StrongConvexOn Q σ d) (z : E) :
      z QxQ, bregmanDistance d z x 1 / 2 * σ * x - z ^ 2

      Definition 1.5.3.1. In the setting of Definition 1.5.3.1, the Bregman distance satisfies ξ(z,x) ≥ (σ/2) ‖x - z‖^2 for all z, x ∈ Q.

      noncomputable def V_Q {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (Q : Set E) (d : E) (z : Q) (g : Module.Dual E) :
      Q

      Definition 1.5.3.1. Define the mapping V_Q(z,g) = argmin_{x ∈ Q} { ⟪g, x - z⟫ + ξ(z,x) } (equation (eq:auto_Definition_5_6_content_2)).

      Equations
      Instances For
        theorem V_Q_spec_isMinOn {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {d : E} (z : Q) (g : Module.Dual E) (hmin : ∃ (x : Q), IsMinOn (fun (x : E) => DualPairing g (x - z) + bregmanDistance d (↑z) x) Q x) :
        IsMinOn (fun (x : E) => DualPairing g (x - z) + bregmanDistance d (↑z) x) Q (V_Q Q d z g)

        If the minimization problem has a minimizer, V_Q selects one.

        theorem DualPairing_eq_sum_gcoord_standardBasis (n : ) (g : Module.Dual (Fin n)) (x : Fin n) :
        DualPairing g x = i : Fin n, (g fun (j : Fin n) => if j = i then 1 else 0) * x i

        Expand a linear functional on Fin n → ℝ in the standard basis.

        theorem fderiv_entropy_sum (n : ) (z : Fin n) (hz_pos : ∀ (i : Fin n), 0 < z i) :
        fderiv (fun (x : Fin n) => i : Fin n, x i * Real.log (x i)) z = i : Fin n, (Real.log (z i) + 1) ContinuousLinearMap.proj i

        Fréchet derivative of the entropy sum ∑ i, x i * log(x i) at a positive point.

        theorem bregmanDistance_entropy_eq_sum_mul_log_div_on_simplex (n : ) (z : (standardSimplex n)) (x : Fin n) (hx : x standardSimplex n) (hz_pos : ∀ (i : Fin n), 0 < z i) :
        have d := fun (y : Fin n) => Real.log n + i : Fin n, y i * Real.log (y i); bregmanDistance d (↑z) x = i : Fin n, x i * Real.log (x i / z i)

        On the simplex, the entropy Bregman distance equals the KL divergence.