Papers.SmoothMinimization_Nesterov_2004.Sections.section05

theorem simplexProximalValue_dual_after_exchange (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hxbar : xbar ∈ standardSimplex n) (hL : 0 < L) :

simplexProximalValue n xbar gbar L = sSup ((fun (s : Fin n → ℝ) => -∑ i : Fin n, (gbar i + s i) * xbar i - 1 / (2 * L) * ‖s‖ ^ 2 + sInf ((fun (i : Fin n) => gbar i + s i) '' Set.univ)) '' Set.univ)

After the min-max exchange, the inner minimization over the simplex equals the smallest coefficient, min_i (gbar i + s i).

theorem simplexProximalValue_exists_zero_coord (n : ℕ) (gbar : Fin n → ℝ) (hn : 0 < n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) :

∃ (i0 : Fin n), gbar i0 = 0

If min_i gbar i = 0 over a nonempty finite index set, then some coordinate of gbar is zero.

theorem simplexProximalValue_gbar_nonneg (n : ℕ) (gbar : Fin n → ℝ) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (i : Fin n) :

0 ≤ gbar i

If min_i gbar i = 0, then every coordinate of gbar is nonnegative.

theorem max_sub_eq (a b : ℝ) :

max a b - a = max (b - a) 0

Shifting inside a max yields a positive-part expression.
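
One way to check this identity independently is a case split on whether a ≤ b. The sketch below assumes a recent Mathlib and restates the claim as an anonymous `example`; it is an illustration, not necessarily the proof used in the source file.

```lean
import Mathlib

-- Sketch only: restates the statement of `max_sub_eq` as an `example`.
example (a b : ℝ) : max a b - a = max (b - a) 0 := by
  rcases le_total a b with h | h
  · -- Case a ≤ b: the max is b, and b - a is nonnegative.
    rw [max_eq_right h, max_eq_left (sub_nonneg.mpr h)]
  · -- Case b ≤ a: the max is a, and b - a is nonpositive.
    rw [max_eq_left h, max_eq_right (sub_nonpos.mpr h), sub_self]
```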

theorem max_sub_ge_of_le (lam τ g : ℝ) (hlam : lam ≤ τ) :

max lam (g - τ) - lam ≥ max (g - 2 * τ) 0

If lam ≤ τ, then max lam (g - τ) - lam dominates the positive part of g - 2τ.
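
The inequality follows from the two elementary bounds on a max, combined with the hypothesis lam ≤ τ. The following sketch assumes a recent Mathlib and restates the claim as an `example`; the auxiliary names `h₁` and `h₂` are introduced here for illustration only.

```lean
import Mathlib

-- Sketch only: restates the statement of `max_sub_ge_of_le` as an `example`.
example (lam τ g : ℝ) (hlam : lam ≤ τ) :
    max lam (g - τ) - lam ≥ max (g - 2 * τ) 0 := by
  -- Both arguments of the max are bounded above by the max itself.
  have h₁ : lam ≤ max lam (g - τ) := le_max_left _ _
  have h₂ : g - τ ≤ max lam (g - τ) := le_max_right _ _
  show max (g - 2 * τ) 0 ≤ max lam (g - τ) - lam
  -- First branch uses h₂ and hlam, second branch uses h₁.
  exact max_le (by linarith) (by linarith)
```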

theorem simplexProximalValue_dual_reduce_to_tau_lower_bound (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hn : 0 < n) (hxbar : xbar ∈ standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (s : Fin n → ℝ) :

∑ i : Fin n, s i * xbar i + 1 / (2 * L) * ‖s - gbar‖ ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ) ≥ ∑ i : Fin n, xbar i * max (gbar i - 2 * ‖s - gbar‖) 0 + ‖s - gbar‖ ^ 2 / (2 * L)

Lower-bound the dual objective by the τ-reduced expression evaluated at τ = ‖s - gbar‖.

theorem simplexProximalValue_dual_reduce_to_tau_construct (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hn : 0 < n) (hxbar : xbar ∈ standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) {τ : ℝ} :

0 ≤ τ → ∃ (s : Fin n → ℝ), ∑ i : Fin n, s i * xbar i + 1 / (2 * L) * ‖s - gbar‖ ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ) = ∑ i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)

For every τ ≥ 0, construct a dual variable s whose objective value equals the τ-reduced expression.

theorem simplexProximalValue_dual_reduce_to_tau_core (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hxbar : xbar ∈ standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :

sInf ((fun (s : Fin n → ℝ) => ∑ i : Fin n, s i * xbar i + 1 / (2 * L) * ‖s - gbar‖ ^ 2 - sInf ((fun (i : Fin n) => s i) '' Set.univ)) '' Set.univ) = sInf ((fun (τ : ℝ) => ∑ i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Core step: the infimum over s of the shifted dual objective equals the one-dimensional infimum over τ ≥ 0.

theorem simplexProximalValue_dual_reduce_to_tau (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hxbar : xbar ∈ standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :

-sSup ((fun (s : Fin n → ℝ) => -∑ i : Fin n, (gbar i + s i) * xbar i - 1 / (2 * L) * ‖s‖ ^ 2 + sInf ((fun (i : Fin n) => gbar i + s i) '' Set.univ)) '' Set.univ) = sInf ((fun (τ : ℝ) => ∑ i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Reduce the swapped dual expression, written as a negated supremum, to the one-dimensional infimum over τ ≥ 0.

theorem simplexProximalValue_dual_representation (n : ℕ) (xbar gbar : Fin n → ℝ) (L : ℝ) (hxbar : xbar ∈ standardSimplex n) (hmin : sInf ((fun (i : Fin n) => gbar i) '' Set.univ) = 0) (hL : 0 < L) :

-simplexProximalValue n xbar gbar L = sInf ((fun (τ : ℝ) => ∑ i : Fin n, xbar i * max (gbar i - 2 * τ) 0 + τ ^ 2 / (2 * L)) '' Set.Ici 0)

Proposition 1.5.2. Assume the setup of Definition 1.5.1 and the normalization (5.2). Let ‖·‖ denote the ℓ_∞ norm on ℝ^n, so ‖s‖ = max_i |s^{(i)}|. Then the optimal value ψ* of (5.1) satisfies the dual representation -ψ* = min_{τ ≥ 0} { ∑_{i=1}^n xbar^{(i)} (gbar^{(i)} - 2τ)_+ + τ^2/(2L) }, where (a)_+ = max{a, 0} (equation (5.3)). Consequently, ψ* can be computed by a one-dimensional search over τ ≥ 0 after sorting the components of gbar.

noncomputable def logSumExpSmooth (m : ℕ) (μ : ℝ) (u : Fin m → ℝ) :

ℝ

Definition 1.5.2.1. For μ > 0 and u ∈ ℝ^m, define the log-sum-exp smoothing function η(u) = μ * log (∑_{j=1}^m exp (u^{(j)} / μ)) (equation (5.4)).

Equations

logSumExpSmooth m μ u = μ * Real.log (∑ j : Fin m, Real.exp (u j / μ))

theorem logSumExpSmooth_add_const (m : ℕ) (μ : ℝ) (hm : 0 < m) (hμ : 0 < μ) (u : Fin m → ℝ) (c : ℝ) :

(logSumExpSmooth m μ fun (j : Fin m) => u j + c) = c + logSumExpSmooth m μ u

Shifting the input of log-sum-exp by a constant adds that constant.
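
The shift rule comes from factoring exp (c / μ) out of the sum and then splitting the logarithm of the product. The sketch below assumes a recent Mathlib and spells out the defining formula (5.4) inline instead of using `logSumExpSmooth`, so that it is self-contained; the helper names `hS`, `hsum`, and `hc` are illustrative and not taken from the source file.

```lean
import Mathlib

-- Sketch only: the shift rule, stated directly on the formula μ * log (∑ exp (u j / μ)).
example (m : ℕ) (μ : ℝ) (hm : 0 < m) (hμ : 0 < μ) (u : Fin m → ℝ) (c : ℝ) :
    μ * Real.log (∑ j : Fin m, Real.exp ((u j + c) / μ))
      = c + μ * Real.log (∑ j : Fin m, Real.exp (u j / μ)) := by
  have hμ' : μ ≠ 0 := hμ.ne'
  -- The sum of exponentials is positive because the index set is nonempty.
  have hS : 0 < ∑ j : Fin m, Real.exp (u j / μ) :=
    Finset.sum_pos (fun j _ => Real.exp_pos _) ⟨⟨0, hm⟩, Finset.mem_univ _⟩
  -- Factor exp (c / μ) out of the shifted sum.
  have hsum : ∑ j : Fin m, Real.exp ((u j + c) / μ)
      = (∑ j : Fin m, Real.exp (u j / μ)) * Real.exp (c / μ) := by
    rw [Finset.sum_mul]
    exact Finset.sum_congr rfl (fun j _ => by rw [add_div, Real.exp_add])
  have hc : μ * (c / μ) = c := by field_simp
  -- log of a product splits, and log (exp (c / μ)) = c / μ.
  rw [hsum, Real.log_mul hS.ne' (Real.exp_ne_zero _), Real.log_exp, mul_add, hc]
  ring
```

The hypothesis 0 < m is used only to make the sum strictly positive, which is what allows the logarithm of the product to be split.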

theorem fderiv_logSumExpSmooth_add_const (m : ℕ) (μ : ℝ) (hm : 0 < m) (hμ : 0 < μ) (u : Fin m → ℝ) (c : ℝ) :

fderiv ℝ (logSumExpSmooth m μ) (u + fun (x : Fin m) => c) = fderiv ℝ (logSumExpSmooth m μ) u

The derivative of log-sum-exp is invariant under constant shifts.

theorem logSumExpSmooth_shift (m : ℕ) (μ : ℝ) (hμ : 0 < μ) (u : Fin m → ℝ) :

have ubar := sSup (Set.range u); have v := fun (j : Fin m) => u j - ubar; logSumExpSmooth m μ u = ubar + logSumExpSmooth m μ v ∧ fderiv ℝ (logSumExpSmooth m μ) u = fderiv ℝ (logSumExpSmooth m μ) v

Proposition 1.5.2.1. Let η be defined by (5.4). For any u ∈ ℝ^m, let ū = max_{1 ≤ j ≤ m} u^{(j)} and define v ∈ ℝ^m by v^{(j)} = u^{(j)} - ū. Then η(u) = ū + η(v) and ∇η(u) = ∇η(v) (equation (eq:auto_Proposition_5_5_content_1)).

noncomputable def bregmanDistance {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (d : E → ℝ) (z x : E) :

ℝ

Definition 1.5.3.1. Assume d : Q → ℝ is differentiable and σ-strongly convex on Q. Define the Bregman distance ξ(z,x) = d x - d z - ⟪∇ d z, x - z⟫ for z, x ∈ Q (equation (eq:auto_Definition_5_6_content_1)).

Equations

bregmanDistance d z x = d x - d z - DualPairing (↑(fderiv ℝ d z)) (x - z)

theorem bregmanDistance_eq_sub_fderiv {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (d : E → ℝ) (z x : E) :

bregmanDistance d z x = d x - d z - (fderiv ℝ d z) (x - z)

Expand the Bregman distance using the Fréchet derivative.

theorem strongConvexOn_secant_slope_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {Q : Set E} {d : E → ℝ} {σ : ℝ} (hconv : StrongConvexOn Q σ d) {z x : E} (hz : z ∈ Q) (hx : x ∈ Q) {t : ℝ} (ht : t ∈ Set.Ioo 0 1) :

t⁻¹ * (d (z + t • (x - z)) - d z) ≤ d x - d z - (1 - t) * (σ / 2 * ‖x - z‖ ^ 2)

Secant slope bound along the segment from z to x under strong convexity.

theorem hasDerivAt_bregman_line {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {d : E → ℝ} {z x : E} (hdiffz : DifferentiableAt ℝ d z) :

HasDerivAt (fun (t : ℝ) => d (z + t • (x - z))) ((fderiv ℝ d z) (x - z)) 0

Derivative of t ↦ d (z + t • (x - z)) at t = 0.

theorem deriv_le_of_secant_bound_nhdsGT {φ g : ℝ → ℝ} {φ' G : ℝ} (hderiv : HasDerivAt φ φ' 0) (hbound : ∀ t ∈ Set.Ioo 0 1, t⁻¹ * (φ t - φ 0) ≤ g t) (hlim : Filter.Tendsto g (nhdsWithin 0 (Set.Ioi 0)) (nhds G)) :

φ' ≤ G

Convert a right-hand secant bound into a bound on the derivative.

theorem bregmanDistance_lower_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {d : E → ℝ} {σ : ℝ} (hdiff : ∀ z ∈ Q, DifferentiableAt ℝ d z) (hconv : StrongConvexOn Q σ d) (z : E) :

z ∈ Q → ∀ x ∈ Q, bregmanDistance d z x ≥ 1 / 2 * σ * ‖x - z‖ ^ 2

Definition 1.5.3.1 (continued). With d and Q as in that definition, the Bregman distance satisfies ξ(z,x) ≥ (σ/2) ‖x - z‖^2 for all z, x ∈ Q.

noncomputable def V_Q {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (d : E → ℝ) (z : ↑Q) (g : Module.Dual ℝ E) :

↑Q

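Definition 1.5.3.1. Define the mapping V_Q(z,g) = argmin_{x ∈ Q} { ⟪g, x - z⟫ + ξ(z,x) } (equation (eq:auto_Definition_5_6_content_2)). The Lean definition picks a minimizer via Classical.choose when one exists and returns z otherwise.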

Equations

V_Q Q d z g = if h : ∃ (x : ↑Q), IsMinOn (fun (x : E) => DualPairing g (x - ↑z) + bregmanDistance d (↑z) x) Q ↑x then Classical.choose h else z

theorem V_Q_spec_isMinOn {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {d : E → ℝ} (z : ↑Q) (g : Module.Dual ℝ E) (hmin : ∃ (x : ↑Q), IsMinOn (fun (x : E) => DualPairing g (x - ↑z) + bregmanDistance d (↑z) x) Q ↑x) :

IsMinOn (fun (x : E) => DualPairing g (x - ↑z) + bregmanDistance d (↑z) x) Q ↑(V_Q Q d z g)

If the minimization problem has a minimizer, V_Q selects one.

theorem DualPairing_eq_sum_gcoord_standardBasis (n : ℕ) (g : Module.Dual ℝ (Fin n → ℝ)) (x : Fin n → ℝ) :

DualPairing g x = ∑ i : Fin n, (g fun (j : Fin n) => if j = i then 1 else 0) * x i

Expand a linear functional on Fin n → ℝ in the standard basis.

theorem fderiv_entropy_sum (n : ℕ) (z : Fin n → ℝ) (hz_pos : ∀ (i : Fin n), 0 < z i) :

fderiv ℝ (fun (x : Fin n → ℝ) => ∑ i : Fin n, x i * Real.log (x i)) z = ∑ i : Fin n, (Real.log (z i) + 1) • ContinuousLinearMap.proj i

Fréchet derivative of the entropy sum ∑ i, x i * log(x i) at a positive point.

theorem bregmanDistance_entropy_eq_sum_mul_log_div_on_simplex (n : ℕ) (z : ↑(standardSimplex n)) (x : Fin n → ℝ) (hx : x ∈ standardSimplex n) (hz_pos : ∀ (i : Fin n), 0 < ↑z i) :

have d := fun (y : Fin n → ℝ) => Real.log ↑n + ∑ i : Fin n, y i * Real.log (y i); bregmanDistance d (↑z) x = ∑ i : Fin n, x i * Real.log (x i / ↑z i)

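On the simplex, the Bregman distance of the entropy function d(y) = log n + ∑ i, y i * log (y i) equals the KL divergence ∑ i, x i * log (x i / z i), provided every coordinate of z is positive.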
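
One ingredient that a proof of this identity plausibly relies on is the per-coordinate splitting of the logarithm of a quotient. The sketch below assumes a recent Mathlib and shows that step for a single pair of positive reals; it is an illustration of the algebra, not the source file's proof, and it does not cover coordinates with x = 0.

```lean
import Mathlib

-- Sketch only: per-coordinate step x * log (x / z) = x * log x - x * log z.
example (x z : ℝ) (hx : 0 < x) (hz : 0 < z) :
    x * Real.log (x / z) = x * Real.log x - x * Real.log z := by
  -- log of a quotient of nonzero reals is the difference of logs.
  rw [Real.log_div hx.ne' hz.ne']
  ring
```

Summing this over i and using ∑ i, x i = ∑ i, z i = 1 on the simplex is what cancels the linear terms coming from the derivative of the entropy sum.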
