Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section04_part5

theorem entropySimplex_softmax_den_pos_and_mem (m : ) (μ : ) (s : Fin m) (hm : 0 < m) :
have Z := j : Fin m, Real.exp (s j / μ); 0 < Z (fun (j : Fin m) => Real.exp (s j / μ) / Z) standardSimplex m

The softmax vector belongs to the standard simplex (and its normalizer is positive).

theorem entropySimplex_mul_log_div_ge_sub {u v : } (hu : 0 < u) (hv : 0 v) :
v * Real.log (v / u) v - u

Scalar inequality underlying KL nonnegativity: v * log(v/u) ≥ v - u for u > 0, v ≥ 0.

theorem entropySimplex_mul_log_div_eq (u v : ) (hu : u 0) :
v * Real.log (v / u) = v * Real.log v - v * Real.log u

Rewrite v * log(v/u) as v * log v - v * log u (handles v = 0 by cases).

theorem entropySimplex_KL_nonneg {m : } {v u : Fin m} (hv : v standardSimplex m) (hu_pos : ∀ (j : Fin m), 0 < u j) (hu_sum : j : Fin m, u j = 1) :
0 j : Fin m, v j * Real.log (v j / u j)

Nonnegativity of the KL divergence ∑ v_j log(v_j/u_j) on the simplex against a positive u.

theorem entropySimplex_obj_at_softmax (m : ) (μ : ) (s : Fin m) (hm : 0 < m) ( : 0 < μ) :
have Z := j : Fin m, Real.exp (s j / μ); have := fun (j : Fin m) => Real.exp (s j / μ) / Z; j : Fin m, j * s j - μ * j : Fin m, j * Real.log ( j) = μ * Real.log Z

The entropy-regularized objective at the softmax point equals μ * log(∑ exp(s/μ)).

theorem entropySimplex_obj_le_softmax (m : ) (μ : ) (s : Fin m) (hm : 0 < m) ( : 0 < μ) :
have Z := j : Fin m, Real.exp (s j / μ); have _uμ := fun (j : Fin m) => Real.exp (s j / μ) / Z; vstandardSimplex m, j : Fin m, v j * s j - μ * j : Fin m, v j * Real.log (v j) μ * Real.log Z

Any point of the simplex achieves objective value at most the softmax value μ * log(∑ exp(s/μ)).

theorem entropySimplex_softmax_maximizer (m : ) (μ : ) (s : Fin m) (hm : 0 < m) ( : 0 < μ) :
have Φμ := sSup ((fun (u : Fin m) => j : Fin m, u j * s j - μ * j : Fin m, u j * Real.log (u j)) '' standardSimplex m); have := fun (j : Fin m) => Real.exp (s j / μ) / l : Fin m, Real.exp (s l / μ); standardSimplex m (∀ vstandardSimplex m, j : Fin m, v j * s j - μ * j : Fin m, v j * Real.log (v j) j : Fin m, j * s j - μ * j : Fin m, j * Real.log ( j)) Φμ = μ * Real.log (∑ j : Fin m, Real.exp (s j / μ))

Lemma 1.4.1.2. For μ > 0 and s ∈ ℝ^m, define Φ_μ(s) = max_{u ∈ Δ_m} {∑ u^{(j)} s^{(j)} - μ ∑ u^{(j)} ln u^{(j)}} (4.13). The maximizer u_μ(s) has entries u_μ^{(j)}(s) = exp(s^{(j)}/μ) / ∑_l exp(s^{(l)}/μ) (4.14), and Φ_μ(s) = μ ln(∑_j exp(s^{(j)}/μ)).

theorem matrixGame_contLin_apply_eq_sum_basis (m : ) ( : (Fin m) →L[] ) (u : Fin m) :
u = j : Fin m, u j * fun (k : Fin m) => if k = j then 1 else 0

Evaluate a continuous linear functional ℓ : (Fin m → ℝ) →L[ℝ] ℝ as a coordinate sum against the standard basis vectors e_j(k) = if k = j then 1 else 0.

theorem matrixGame_entropy_objective_rewrite (m : ) (μ : ) ( : (Fin m) →L[] ) (b u : Fin m) (hu : u standardSimplex m) :
u + j : Fin m, b j * u j - μ * (Real.log m + j : Fin m, u j * Real.log (u j)) = j : Fin m, u j * (( fun (k : Fin m) => if k = j then 1 else 0) + b j - μ * Real.log m) - μ * j : Fin m, u j * Real.log (u j)

On the standard simplex Δ_m, rewrite the smoothed matrix-game integrand as the entropy objective ∑ u_j s_j - μ ∑ u_j log u_j with s_j = ℓ(e_j) + b_j - μ log m, absorbing the constant term via ∑ u_j = 1.

theorem matrixGame_logsumexp_shift_by_log_m (m : ) (μ : ) (t : Fin m) (hm : 0 < m) ( : μ 0) :
j : Fin m, Real.exp ((t j - μ * Real.log m) / μ) = 1 / m * j : Fin m, Real.exp (t j / μ)

Rewrite the log-sum-exp shift ∑ exp((t_j - μ log m)/μ) = (1/m) * ∑ exp(t_j/μ) for m > 0 and μ ≠ 0.

theorem matrixGame_smoothedObjective_explicit (n m : ) (A : (Fin n) →L[] (Fin m) →L[] ) (c : Fin n) (b : Fin m) (μ : ) ( : 0 < μ) :
have fhat := fun (x : Fin n) => i : Fin n, c i * x i; have phihat := fun (u : Fin m) => -j : Fin m, b j * u j; have d2 := fun (u : Fin m) => Real.log m + j : Fin m, u j * Real.log (u j); have fbar := SmoothedObjective (standardSimplex m) A phihat d2 μ fhat; fbar = fun (x : Fin n) => i : Fin n, c i * x i + μ * Real.log (1 / m * j : Fin m, Real.exp ((((A x) fun (k : Fin m) => if k = j then 1 else 0) + b j) / μ))

Proposition 1.4.1.3. In the matrix-game setting (4.10), under the entropy-distance setup of Lemma 1.4.1.2, the smoothed objective (4.1) for the primal problem is \bar f_μ(x) = ⟪c,x⟫_1 + μ ln((1/m) ∑_{j=1}^m exp((⟪a_j,x⟫_1 + b^{(j)})/μ)), with the minimization min_{x ∈ Δ_n} \bar f_μ(x) (eq:matrixgame_smooth).