Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section05_part4

theorem V_Q_entropy_objective_rewrite_on_simplex (n : ) (z : (standardSimplex n)) (g : Module.Dual (Fin n)) (hz_pos : ∀ (i : Fin n), 0 < z i) :
have d := fun (x : Fin n) => Real.log n + i : Fin n, x i * Real.log (x i); have gcoord := fun (i : Fin n) => g fun (j : Fin n) => if j = i then 1 else 0; xstandardSimplex n, DualPairing g (x - z) + bregmanDistance d (↑z) x = i : Fin n, x i * Real.log (x i) + i : Fin n, x i * (gcoord i - Real.log (z i)) - i : Fin n, z i * gcoord i

Rewrite the entropy V_Q objective on the simplex.

theorem V_Q_entropy_candidate_isMinOn (n : ) (z : (standardSimplex n)) (g : Module.Dual (Fin n)) (hz_pos : ∀ (i : Fin n), 0 < z i) :
have d := fun (x : Fin n) => Real.log n + i : Fin n, x i * Real.log (x i); have gcoord := fun (i : Fin n) => g fun (j : Fin n) => if j = i then 1 else 0; have s := fun (i : Fin n) => Real.log (z i) - gcoord i; have u := fun (i : Fin n) => Real.exp (s i) / j : Fin n, Real.exp (s j); u standardSimplex n IsMinOn (fun (x : Fin n) => DualPairing g (x - z) + bregmanDistance d (↑z) x) (standardSimplex n) u

The softmax candidate minimizes the entropy V_Q objective on the simplex.

theorem V_Q_entropy_objective_strictConvexOn (n : ) (z : (standardSimplex n)) (g : Module.Dual (Fin n)) :
have d := fun (x : Fin n) => Real.log n + i : Fin n, x i * Real.log (x i); StrictConvexOn (standardSimplex n) fun (x : Fin n) => DualPairing g (x - z) + bregmanDistance d (↑z) x

The entropy V_Q objective is strictly convex on the simplex.

theorem V_Q_entropy_simplex_explicit (n : ) (z : (standardSimplex n)) (g : Module.Dual (Fin n)) (hz_pos : ∀ (i : Fin n), 0 < z i) :
have d := fun (x : Fin n) => Real.log n + i : Fin n, x i * Real.log (x i); have gcoord := fun (i : Fin n) => g fun (j : Fin n) => if j = i then 1 else 0; ∀ (i : Fin n), (V_Q (standardSimplex n) d z g) i = z i * Real.exp (-gcoord i) / j : Fin n, z j * Real.exp (-gcoord j)

Proposition 1.5.3.1. Let Q = Δ_n and let the prox-function be the entropy distance d(x) = ln n + ∑ i x^{(i)} ln x^{(i)}. Then for z ∈ Δ_n and g ∈ ℝ^n, the mapping V_Q(z,g) from Definition 1.5.3.1 has the explicit form V_Q^{(i)}(z,g) = z^{(i)} e^{-g^{(i)}} [∑_j z^{(j)} e^{-g^{(j)}}]^{-1}, i = 1, …, n (equation (5.7)).

theorem section05_modified_Rk_support_at_xk1 {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq ySeq : Q) (k : ) (hQ_convex : Convex Q) (hf_convex : ConvexOn Q f) (hf_diff : xQ, DifferentiableAt f x) (hα_pos : ∀ (k : ), 0 < α k) (hRk : R_k Q f d L σ α xSeq ySeq k) :
have xk1 := (xSeq (k + 1)); have g := (fderiv f (xSeq (k + 1))); A_k α k * (f xk1 + DualPairing g ((ySeq k) - xk1)) psi_k Q f d L σ α xSeq k

Support-plane refinement of R_k at x_{k+1} for the modified update.

theorem section05_modified_dualpairing_combine {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq ySeq : Q) (k : ) (hA_pos : 0 < A_k α (k + 1)) (hx_update : (xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) (z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) (ySeq k)) (g : Module.Dual E) (x : E) :
A_k α k * DualPairing g ((ySeq k) - (xSeq (k + 1))) + α (k + 1) * DualPairing g (x - (xSeq (k + 1))) = α (k + 1) * DualPairing g (x - (z_k Q f d L σ α xSeq k))

Combine gradient pairings using the modified x_{k+1} update.

theorem section05_VQ_scaled_min_property {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq : Q) (k : ) (g : Module.Dual E) (hVQ_min : IsMinOn (fun (x : E) => L / σ * bregmanDistance d (↑(z_k Q f d L σ α xSeq k)) x + α (k + 1) * DualPairing g (x - (z_k Q f d L σ α xSeq k))) Q (V_Q Q d (z_k Q f d L σ α xSeq k) ((σ / L * α (k + 1)) g))) (x : E) :
x QL / σ * bregmanDistance d (z_k Q f d L σ α xSeq k) (V_Q Q d (z_k Q f d L σ α xSeq k) ((σ / L * α (k + 1)) g)) + α (k + 1) * DualPairing g ((V_Q Q d (z_k Q f d L σ α xSeq k) ((σ / L * α (k + 1)) g)) - (z_k Q f d L σ α xSeq k)) L / σ * bregmanDistance d (↑(z_k Q f d L σ α xSeq k)) x + α (k + 1) * DualPairing g (x - (z_k Q f d L σ α xSeq k))

Convert the V_Q minimizer into the scaled Bregman inequality.

theorem section05_yminusx_equals_tau_smul {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq ySeq : Q) (k : ) (hx_update : (xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) (z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) (ySeq k)) (hy_update : (ySeq (k + 1)) = have τk := α (k + 1) / A_k α (k + 1); have γk := σ / L * α (k + 1); τk (V_Q Q d (z_k Q f d L σ α xSeq k) (γk (fderiv f (xSeq (k + 1))))) + (1 - τk) (ySeq k)) :
have τk := α (k + 1) / A_k α (k + 1); have γk := σ / L * α (k + 1); have xhat := (V_Q Q d (z_k Q f d L σ α xSeq k) (γk (fderiv f (xSeq (k + 1))))); (ySeq (k + 1)) - (xSeq (k + 1)) = τk (xhat - (z_k Q f d L σ α xSeq k))

Express y_{k+1} - x_{k+1} via the modified prox update.

theorem section05_final_convexity_step {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f : E} (xSeq ySeq : Q) (k : ) (hQ_convex : Convex Q) (hf_convex : ConvexOn Q f) (hf_diff : xQ, DifferentiableAt f x) :
have xk1 := (xSeq (k + 1)); have g := (fderiv f (xSeq (k + 1))); f (ySeq (k + 1)) f xk1 + DualPairing g ((ySeq (k + 1)) - xk1)

Final convexity step for the modified invariant update.