Papers.SmoothMinimization_Nesterov_2004.Sections.section04

source

theorem entropySimplex_softmax_den_pos_and_mem (m : ℕ) (μ : ℝ) (s : Fin m → ℝ) (hm : 0 < m) :

have Z := ∑ j : Fin m, Real.exp (s j / μ); 0 < Z ∧ (fun (j : Fin m) => Real.exp (s j / μ) / Z) ∈ standardSimplex m

The softmax vector belongs to the standard simplex (and its normalizer is positive).

source

theorem entropySimplex_mul_log_div_ge_sub {u v : ℝ} (hu : 0 < u) (hv : 0 ≤ v) :

v * Real.log (v / u) ≥ v - u

Scalar inequality underlying KL nonnegativity: v * log(v/u) ≥ v - u for u > 0, v ≥ 0.

source

theorem entropySimplex_mul_log_div_eq (u v : ℝ) (hu : u ≠ 0) :

v * Real.log (v / u) = v * Real.log v - v * Real.log u

Rewrite v * log(v/u) as v * log v - v * log u (handles v = 0 by cases).

source

theorem entropySimplex_KL_nonneg {m : ℕ} {v u : Fin m → ℝ} (hv : v ∈ standardSimplex m) (hu_pos : ∀ (j : Fin m), 0 < u j) (hu_sum : ∑ j : Fin m, u j = 1) :

0 ≤ ∑ j : Fin m, v j * Real.log (v j / u j)

Nonnegativity of the KL divergence ∑ v_j log(v_j/u_j) on the simplex against a positive u.

source

theorem entropySimplex_obj_at_softmax (m : ℕ) (μ : ℝ) (s : Fin m → ℝ) (hm : 0 < m) (hμ : 0 < μ) :

have Z := ∑ j : Fin m, Real.exp (s j / μ); have uμ := fun (j : Fin m) => Real.exp (s j / μ) / Z; ∑ j : Fin m, uμ j * s j - μ * ∑ j : Fin m, uμ j * Real.log (uμ j) = μ * Real.log Z

The entropy-regularized objective at the softmax point equals μ * log(∑ exp(s/μ)).

source

theorem entropySimplex_obj_le_softmax (m : ℕ) (μ : ℝ) (s : Fin m → ℝ) (hm : 0 < m) (hμ : 0 < μ) :

have Z := ∑ j : Fin m, Real.exp (s j / μ); have _uμ := fun (j : Fin m) => Real.exp (s j / μ) / Z; ∀ v ∈ standardSimplex m, ∑ j : Fin m, v j * s j - μ * ∑ j : Fin m, v j * Real.log (v j) ≤ μ * Real.log Z

Any point of the simplex achieves objective value at most the softmax value μ * log(∑ exp(s/μ)).

source

theorem entropySimplex_softmax_maximizer (m : ℕ) (μ : ℝ) (s : Fin m → ℝ) (hm : 0 < m) (hμ : 0 < μ) :

have Φμ := sSup ((fun (u : Fin m → ℝ) => ∑ j : Fin m, u j * s j - μ * ∑ j : Fin m, u j * Real.log (u j)) '' standardSimplex m); have uμ := fun (j : Fin m) => Real.exp (s j / μ) / ∑ l : Fin m, Real.exp (s l / μ); uμ ∈ standardSimplex m ∧ (∀ v ∈ standardSimplex m, ∑ j : Fin m, v j * s j - μ * ∑ j : Fin m, v j * Real.log (v j) ≤ ∑ j : Fin m, uμ j * s j - μ * ∑ j : Fin m, uμ j * Real.log (uμ j)) ∧ Φμ = μ * Real.log (∑ j : Fin m, Real.exp (s j / μ))

Lemma 1.4.1.2. For μ > 0 and s ∈ ℝ^m, define Φ_μ(s) = max_{u ∈ Δ_m} {∑ u^{(j)} s^{(j)} - μ ∑ u^{(j)} ln u^{(j)}} (4.13). The maximizer u_μ(s) has entries u_μ^{(j)}(s) = exp(s^{(j)}/μ) / ∑_l exp(s^{(l)}/μ) (4.14), and Φ_μ(s) = μ ln(∑_j exp(s^{(j)}/μ)).

source

theorem matrixGame_contLin_apply_eq_sum_basis (m : ℕ) (ℓ : (Fin m → ℝ) →L[ℝ ] ℝ) (u : Fin m → ℝ) :

ℓ u = ∑ j : Fin m, u j * ℓ fun (k : Fin m) => if k = j then 1 else 0

Evaluate a continuous linear functional ℓ : (Fin m → ℝ) →L[ℝ] ℝ as a coordinate sum against the standard basis vectors e_j(k) = if k = j then 1 else 0.

source

theorem matrixGame_entropy_objective_rewrite (m : ℕ) (μ : ℝ) (ℓ : (Fin m → ℝ) →L[ℝ ] ℝ) (b u : Fin m → ℝ) (hu : u ∈ standardSimplex m) :

ℓ u + ∑ j : Fin m, b j * u j - μ * (Real.log ↑m + ∑ j : Fin m, u j * Real.log (u j)) = ∑ j : Fin m, u j * ((ℓ fun (k : Fin m) => if k = j then 1 else 0) + b j - μ * Real.log ↑m) - μ * ∑ j : Fin m, u j * Real.log (u j)

On the standard simplex Δ_m, rewrite the smoothed matrix-game integrand as the entropy objective ∑ u_j s_j - μ ∑ u_j log u_j with s_j = ℓ(e_j) + b_j - μ log m, absorbing the constant term via ∑ u_j = 1.

source

theorem matrixGame_logsumexp_shift_by_log_m (m : ℕ) (μ : ℝ) (t : Fin m → ℝ) (hm : 0 < m) (hμ : μ ≠ 0) :

∑ j : Fin m, Real.exp ((t j - μ * Real.log ↑m) / μ) = 1 / ↑m * ∑ j : Fin m, Real.exp (t j / μ)

Rewrite the log-sum-exp shift ∑ exp((t_j - μ log m)/μ) = (1/m) * ∑ exp(t_j/μ) for m > 0 and μ ≠ 0.

source

theorem matrixGame_smoothedObjective_explicit (n m : ℕ) (A : (Fin n → ℝ) →L[ℝ ] (Fin m → ℝ) →L[ℝ ] ℝ) (c : Fin n → ℝ) (b : Fin m → ℝ) (μ : ℝ) (hμ : 0 < μ) :

have fhat := fun (x : Fin n → ℝ) => ∑ i : Fin n, c i * x i; have phihat := fun (u : Fin m → ℝ) => -∑ j : Fin m, b j * u j; have d2 := fun (u : Fin m → ℝ) => Real.log ↑m + ∑ j : Fin m, u j * Real.log (u j); have fbar := SmoothedObjective (standardSimplex m) A phihat d2 μ fhat; fbar = fun (x : Fin n → ℝ) => ∑ i : Fin n, c i * x i + μ * Real.log (1 / ↑m * ∑ j : Fin m, Real.exp ((((A x) fun (k : Fin m) => if k = j then 1 else 0) + b j) / μ))

Proposition 1.4.1.3. In the matrix-game setting (4.10), under the entropy-distance setup of Lemma 1.4.1.2, the smoothed objective (4.1) for the primal problem is \bar f_μ(x) = ⟪c,x⟫_1 + μ ln((1/m) ∑_{j=1}^m exp((⟪a_j,x⟫_1 + b^{(j)})/μ)), with the minimization min_{x ∈ Δ_n} \bar f_μ(x) (eq:matrixgame_smooth).

Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section04_part5