Papers.SmoothMinimization_Nesterov_2004.Sections.section04

source

theorem Section04Part10.image_neg_eq_smul (s : Set ℝ) :

(fun (x : ℝ) => -x) '' s = -1 • s

Image under negation is scalar multiplication by -1 on ℝ.

source

theorem Section04Part10.sSup_image_neg_eq_neg_sInf (s : Set ℝ) :

sSup ((fun (x : ℝ) => -x) '' s) = -sInf s

Push a negation through sSup for subsets of ℝ.

source

theorem Section04Part10.sInf_image_neg_eq_neg_sSup (s : Set ℝ) :

sInf ((fun (x : ℝ) => -x) '' s) = -sSup s

Push a negation through sInf for subsets of ℝ.

source

theorem Section04Part10.variationalInequality.minimax_eq_of_saddlePoint {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] (Q : Set E) (B : E → Module.Dual ℝ E) {w0 : E} (hw0 : w0 ∈ Q) (hgap_le : ∀ v ∈ Q, (B v) (w0 - v) ≤ 0) (hgap_ge : ∀ w ∈ Q, 0 ≤ (B w0) (w - w0)) (hbdAbove : ∀ w ∈ Q, BddAbove ((fun (v : E) => (B v) (w - v)) '' Q)) (hbdBelow : ∀ v ∈ Q, BddBelow ((fun (w : E) => (B v) (w - v)) '' Q)) :

sInf ((fun (w : E) => sSup ((fun (v : E) => (B v) (w - v)) '' Q)) '' Q) = sSup ((fun (v : E) => sInf ((fun (w : E) => (B v) (w - v)) '' Q)) '' Q)

A saddle point with bounded images forces the minimax equality by showing both sides are zero.

source

theorem Section04Part10.variationalInequality.minimax_eq_of_exists_VI_solution {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (ℬ : E →ₗ[ℝ ] Module.Dual ℝ E) (b : Module.Dual ℝ E) (hmono : ∀ (h : E), 0 ≤ (ℬ h) h) (hex : ∃ (w0 : E), IsVariationalInequalitySolution Q ℬ b w0) (hbdAbove : ∀ w ∈ Q, BddAbove ((fun (v : E) => (ℬ v + b) (w - v)) '' Q)) (hbdBelow : ∀ v ∈ Q, BddBelow ((fun (w : E) => (ℬ v + b) (w - v)) '' Q)) :

sInf ((fun (w : E) => sSup ((fun (v : E) => (ℬ v + b) (w - v)) '' Q)) '' Q) = sSup ((fun (v : E) => sInf ((fun (w : E) => (ℬ v + b) (w - v)) '' Q)) '' Q)

A VI solution yields a saddle point for the gap payoff, hence the minimax equality.

source

theorem Section04Part10.variationalInequality_sign_flip {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] (Q : Set E) (B : E → Module.Dual ℝ E) :

sSup ((fun (v : E) => sInf ((fun (w : E) => (B v) (w - v)) '' Q)) '' Q) = -sInf ((fun (v : E) => sSup ((fun (w : E) => (B v) (v - w)) '' Q)) '' Q)

Reformulate the inner max by flipping the sign.

source

theorem Section04Part10.variationalInequality_dual_maxStructure {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (ℬ : E →L[ℝ ] E →L[ℝ ] ℝ) (b : Module.Dual ℝ E) (d : E → ℝ) (μ σ1 D1 ε : ℝ) :

have ℬ' := { toFun := fun (x : E) => ↑(ℬ x), map_add' := ⋯, map_smul' := ⋯ }; have B := fun (v : E) => ℬ' v + b; have phihat := fun (u : E) => b u; have fμ := SmoothedMaxFunction Q ℬ phihat d μ; ((∀ (h : E), 0 ≤ (ℬ' h) h) → (∃ (w0 : E), IsVariationalInequalitySolution Q ℬ' b w0) → (∀ w ∈ Q, BddAbove ((fun (v : E) => (B v) (w - v)) '' Q)) → (∀ v ∈ Q, BddBelow ((fun (w : E) => (B v) (w - v)) '' Q)) → sInf ((fun (w : E) => sSup ((fun (v : E) => (B v) (w - v)) '' Q)) '' Q) = sSup ((fun (v : E) => sInf ((fun (w : E) => (B v) (w - v)) '' Q)) '' Q)) ∧ sSup ((fun (v : E) => sInf ((fun (w : E) => (B v) (w - v)) '' Q)) '' Q) = -sInf ((fun (v : E) => sSup ((fun (w : E) => (B v) (v - w)) '' Q)) '' Q) ∧ (∀ (x : E), fμ x = sSup ((fun (u : E) => (ℬ x) u - μ * d u - b u) '' Q)) ∧ (0 < ε → 0 < σ1 → StrongConvexOn Q σ1 d → IsProxDiameterBound Q d D1 → 0 ≤ D1 → ∃ (N : ℕ), ↑N ≤ 4 * D1 * OperatorNormDef ℬ' / (σ1 * ε) + √(D1 * OperatorNormDef ℬ' / (σ1 * ε)))

Proposition 1.4.3.3. A dual max-structure representation of (4.19) comes from min_{w ∈ Q} max_{v ∈ Q} ⟪B(v), w - v⟫_1 = max_{v ∈ Q} min_{w ∈ Q} ⟪B(v), w - v⟫_1 = - min_{v ∈ Q} max_{w ∈ Q} ⟪B(v), v - w⟫_1. Equivalently, take E1 = E2 = E, Q1 = Q2 = Q, d1 = d2 = d, A = ℬ, fhat(x) = ⟪b, x⟫_1 + ⟪ℬ x, x⟫_1, and phihat(u) = ⟪b, u⟫_1. Then f_μ(x) = max_{u ∈ Q} {⟪ℬ x, u⟫_1 - μ d(u) - ⟪b, u⟫_1}. In this dual representation M = ‖ℬ‖_{1,2}, and the complexity bound corresponding to (4.21) is N ≤ (4 D1 ‖ℬ‖_{1,2})/(σ1 ε) + sqrt(D1 ‖ℬ‖_{1,2}/(σ1 ε)).

source

noncomputable def Section04Part10.piecewiseLinearMinimizationOptimalValue {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] (m : ℕ) (Q1 : Set E1) (a : Fin m → E1 →L[ℝ ] ℝ) (b : Fin m → ℝ) :

ℝ

Definition 1.4.4.1. Let a_j ∈ E1^* and b^{(j)} ∈ ℝ for j = 1, ..., m. Consider the piecewise linear minimization problem min_{x ∈ Q1} f(x), where f(x) := max_{1 ≤ j ≤ m} |⟪a_j, x⟫_1 - b^{(j)}| (4.22).

Equations

Section04Part10.piecewiseLinearMinimizationOptimalValue m Q1 a b = sInf ((fun (x : E1) => sSup ((fun (j : Fin m) => |(a j) x - b j|) '' Set.univ)) '' Q1)

Instances For

source

theorem Section04Part10.piecewiseLinear_entropy_rbar_nonneg (n : ℕ) (Q1 : Set (Fin n → ℝ)) :

0 ≤ sSup ((fun (x : Fin n → ℝ) => ‖x‖) '' Q1)

The radius defined as a supremum of norms is nonnegative.

source

theorem Section04Part10.piecewiseLinear_entropy_complexity_const_nonneg (n m : ℕ) (Q1 : Set (Fin n → ℝ)) (hatA' : (Fin n → ℝ) →ₗ[ℝ ] Module.Dual ℝ (Fin (2 * m) → ℝ)) {ε : ℝ} (hε : 0 < ε) :

have rbar := sSup ((fun (x : Fin n → ℝ) => ‖x‖) '' Q1); have maxAj := OperatorNormDef hatA'; 0 ≤ 2 * √2 * rbar * maxAj * √(Real.log (2 * ↑m)) * (1 / ε)

The entropy-complexity constant is nonnegative.

source

theorem Section04Part10.piecewiseLinear_entropy_complexity_exists_N {C : ℝ} (hC : 0 ≤ C) :

∃ (N : ℕ), ↑N ≤ C

Round a nonnegative constant to a natural upper bound.

source

theorem Section04Part10.piecewiseLinear_entropy_complexity (n m : ℕ) (Q1 : Set (Fin n → ℝ)) (a : Fin m → Module.Dual ℝ (Fin n → ℝ)) (b : Fin m → ℝ) (hatA : (Fin n → ℝ) →L[ℝ ] (Fin (2 * m) → ℝ) →L[ℝ ] ℝ) (hatb : (Fin (2 * m) → ℝ) → ℝ) (μ ε : ℝ) (hε : 0 < ε) :

have rbar := sSup ((fun (x : Fin n → ℝ) => ‖x‖) '' Q1); have σ1 := 1; have σ2 := 1; have D1 := 1 / 2 * rbar ^ 2; have D2 := Real.log (2 * ↑m); have f := fun (x : Fin n → ℝ) => sSup ((fun (u : Fin (2 * m) → ℝ) => (hatA x) u - hatb u) '' standardSimplex (2 * m)); have hatA' := { toFun := fun (x : Fin n → ℝ) => ↑(hatA x), map_add' := ⋯, map_smul' := ⋯ }; have maxAj := OperatorNormDef hatA'; have ξ := fun (τ : ℝ) => 1 / 2 * (Real.exp τ + Real.exp (-τ)); have fbar := fun (x : Fin n → ℝ) => μ * Real.log (1 / ↑m * ∑ j : Fin m, ξ (((a j) x + b j) / μ)); (∀ (x : Fin n → ℝ), f x = sSup ((fun (u : Fin (2 * m) → ℝ) => (hatA x) u - hatb u) '' standardSimplex (2 * m))) ∧ σ1 = 1 ∧ σ2 = 1 ∧ D1 = 1 / 2 * rbar ^ 2 ∧ D2 = Real.log (2 * ↑m) ∧ OperatorNormDef hatA' = maxAj ∧ (∃ (N : ℕ), ↑N ≤ 2 * √2 * rbar * maxAj * √(Real.log (2 * ↑m)) * (1 / ε)) ∧ ∀ (x : Fin n → ℝ), fbar x = μ * Real.log (1 / ↑m * ∑ j : Fin m, ξ (((a j) x + b j) / μ))

Proposition 1.4.4.1. In the setting of Definition 1.4.4.1, take E1 = ℝ^n with Euclidean norm, the prox-function d1(x) = (1/2) ‖x‖_1^2, and rbar = max_{x ∈ Q1} ‖x‖_1. Let E2 = ℝ^{2m} with simplex Δ_{2m} and define hatA = (A; -A) and hatb = (b; -b) from rows a_j and scalars b^{(j)}. Then f(x) = max_{u ∈ Δ_{2m}} {⟪hatA x, u⟫_2 - ⟪hatb, u⟫_2} (4.19). With the entropy prox-function d2(u) = ln(2m) + ∑ u^{(j)} ln u^{(j)}, we have σ1 = σ2 = 1, D1 = (1/2) rbar^2, D2 = ln(2m), and ‖hatA‖_{1,2} = max_{1≤j≤m} ‖a_j‖_{1,*}. Therefore the complexity estimate (4.4) gives N ≤ 2*sqrt 2 rbar (max_{1≤j≤m} ‖a_j‖_{1,*}) sqrt(ln(2m)) / ε iterations for (4.22), and the smooth approximation has the explicit form bar f_μ(x) = μ ln((1/m) ∑_{j=1}^m ξ((1/μ) [⟪a_j, x⟫_1 + b^{(j)}])) with ξ(τ) = (1/2) (e^τ + e^{-τ}) (4.22_smooth).

source

noncomputable def Section04Part10.piecewiseLinearSumAbsOptimalValue {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] (m : ℕ) (Q1 : Set E1) (a : Fin m → E1 →L[ℝ ] ℝ) (b : Fin m → ℝ) :

ℝ

Definition 1.4.4.2. Let a_j ∈ E1^* and b^{(j)} ∈ ℝ for j = 1, ..., m. Consider minimizing the sum of absolute values (4.23): min_{x ∈ Q1} f(x) with f(x) := ∑_{j=1}^m |⟪a_j, x⟫_1 - b^{(j)}|.

Equations

Section04Part10.piecewiseLinearSumAbsOptimalValue m Q1 a b = sInf ((fun (x : E1) => ∑ j : Fin m, |(a j) x - b j|) '' Q1)

Instances For

source

theorem Section04Part10.sumAbs_smoothed_unfold {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] (m : ℕ) (a : Fin m → E1 →L[ℝ ] ℝ) (b : Fin m → ℝ) (A : E1 →L[ℝ ] (Fin m → ℝ) →L[ℝ ] ℝ) (phihat : (Fin m → ℝ) → ℝ) (μ : ℝ) (hA : ∀ (x : E1) (u : Fin m → ℝ), (A x) u = ∑ j : Fin m, u j * (a j) x) (hphihat : ∀ (u : Fin m → ℝ), phihat u = ∑ j : Fin m, b j * u j) (x : E1) :

have Q2 := {u : Fin m → ℝ | ∀ (j : Fin m), |u j| ≤ 1}; have d2 := fun (u : Fin m → ℝ) => 1 / 2 * ∑ j : Fin m, ‖a j‖ * u j ^ 2; have fμ := SmoothedMaxFunction Q2 A phihat d2 μ; fμ x = sSup ((fun (u : Fin m → ℝ) => ∑ j : Fin m, (u j * ((a j) x - b j) - 1 / 2 * μ * ‖a j‖ * u j ^ 2)) '' Q2)

Unfold the smoothed max for the sum-of-abs objective into a separable quadratic sum.

source

theorem Section04Part10.sumAbs_sSup_separable {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] (m : ℕ) (a : Fin m → E1 →L[ℝ ] ℝ) (b : Fin m → ℝ) (μ : ℝ) (x : E1) :

have Q2 := {u : Fin m → ℝ | ∀ (j : Fin m), |u j| ≤ 1}; sSup ((fun (u : Fin m → ℝ) => ∑ j : Fin m, (u j * ((a j) x - b j) - 1 / 2 * μ * ‖a j‖ * u j ^ 2)) '' Q2) = ∑ j : Fin m, sSup ((fun (t : ℝ) => t * ((a j) x - b j) - 1 / 2 * μ * ‖a j‖ * t ^ 2) '' Set.Icc (-1) 1)

Split the separable quadratic supremum over Q2 into coordinatewise suprema.

source

theorem Section04Part10.sSup_quadratic_Icc_eq_norm_mul_psi (μ τ c : ℝ) (hc : 0 < c) :

sSup ((fun (t : ℝ) => t * τ - 1 / 2 * μ * c * t ^ 2) '' Set.Icc (-1) 1) = c * sSup ((fun (γ : ℝ) => γ * (|τ| / c) - 1 / 2 * μ * γ ^ 2) '' Set.Icc 0 1)

Express the quadratic supremum on [-1,1] via the scaled ψμ on [0,1].

source

theorem Section04Part10.sumAbs_operatorNorm_bound {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] [FiniteDimensional ℝ E1] (m : ℕ) (a : Fin m → E1 →L[ℝ ] ℝ) (A : E1 →L[ℝ ] (Fin m → ℝ) →L[ℝ ] ℝ) (hA : ∀ (x : E1) (u : Fin m → ℝ), (A x) u = ∑ j : Fin m, u j * (a j) x) :

have D := ∑ j : Fin m, ‖a j‖; have A' := { toFun := fun (x : E1) => ↑(A x), map_add' := ⋯, map_smul' := ⋯ }; OperatorNormDef A' ≤ D

A crude operator-norm bound using the supremum norm on Fin m → ℝ.

source

theorem Section04Part10.sumAbs_complexity_exists_N (D D1 σ1 ε : ℝ) (hε : 0 < ε) (hD : 0 ≤ D) :

∃ (N : ℕ), ↑N ≤ 2 / ε * √(2 * D1 / σ1) * D

Produce a natural bound from the explicit complexity constant.

source

theorem Section04Part10.piecewiseLinear_sumAbs_smoothedApprox {E1 : Type u_1} [NormedAddCommGroup E1] [NormedSpace ℝ E1] [FiniteDimensional ℝ E1] (m : ℕ) (a : Fin m → E1 →L[ℝ ] ℝ) (b : Fin m → ℝ) (A : E1 →L[ℝ ] (Fin m → ℝ) →L[ℝ ] ℝ) (phihat : (Fin m → ℝ) → ℝ) (μ σ1 D1 ε : ℝ) (hε : 0 < ε) (hpos : ∀ (j : Fin m), 0 < ‖a j‖) (hA : ∀ (x : E1) (u : Fin m → ℝ), (A x) u = ∑ j : Fin m, u j * (a j) x) (hphihat : ∀ (u : Fin m → ℝ), phihat u = ∑ j : Fin m, b j * u j) :

have Q2 := {u : Fin m → ℝ | ∀ (j : Fin m), |u j| ≤ 1}; have d2 := fun (u : Fin m → ℝ) => 1 / 2 * ∑ j : Fin m, ‖a j‖ * u j ^ 2; have fμ := SmoothedMaxFunction Q2 A phihat d2 μ; have ψμ := fun (τ : ℝ) => sSup ((fun (γ : ℝ) => γ * τ - 1 / 2 * μ * γ ^ 2) '' Set.Icc 0 1); have D := ∑ j : Fin m, ‖a j‖; have D2 := 1 / 2 * D; have A' := { toFun := fun (x : E1) => ↑(A x), map_add' := ⋯, map_smul' := ⋯ }; (∀ (x : E1), fμ x = ∑ j : Fin m, ‖a j‖ * ψμ (|(a j) x - b j| / ‖a j‖)) ∧ D2 = 1 / 2 * D ∧ OperatorNormDef A' ≤ D ∧ ∃ (N : ℕ), ↑N ≤ 2 / ε * √(2 * D1 / σ1) * D

Proposition 1.4.4.2. In the setting of Definition 1.4.4.2, let A be the matrix with rows a_j and take E2 = ℝ^m, Q2 = {u : |u^{(j)}| ≤ 1}, and d2(u) = (1/2) ‖u‖_2^2 = (1/2) ∑ ‖a_j‖_{1,*} (u^{(j)})^2. Then the smoothed approximation has the form f_μ(x) = max_{u ∈ Q2} {⟪A x - b, u⟫_2 - μ d2(u)} = ∑_{j=1}^m ‖a_j‖_{1,*} ψ_μ(|⟪a_j,x⟫_1 - b^{(j)}| / ‖a_j‖_{1,*}), where ψ_μ is the function in (4.17). Moreover, with D = ∑ ‖a_j‖_{1,*} we have D2 = (1/2) D, σ2 = 1, and ‖A‖_{1,2} ≤ D^{1/2}. Therefore, in the case M = 0, N ≤ (2/ε) * sqrt(2 D1/σ1) * ∑ ‖a_j‖_{1,*} (equation (4.24)).

Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section04_part10