Papers.SmoothMinimization_Nesterov_2004.Sections.section04

source

theorem weighted_sum_bound_hatu {E1 : Type u_1} {E2 : Type u_2} [NormedAddCommGroup E1] [NormedSpace ℝ E1] [FiniteDimensional ℝ E1] [NormedAddCommGroup E2] [NormedSpace ℝ E2] [FiniteDimensional ℝ E2] (Q1 : Set E1) (Q2 : Set E2) (A : E1 →L[ℝ ] E2 →L[ℝ ] ℝ) (fhat : E1 → ℝ) (phihat d2 : E2 → ℝ) (μ : ℝ) (uμ : E1 → E2) (A' : E1 →ₗ[ℝ ] Module.Dual ℝ E2) (xSeq : ℕ → ↑Q1) (N : ℕ) (hatu : ↑Q2) (hhatudef : ↑hatu = ∑ i ∈ Finset.range (N + 1), (2 * (↑i + 1) / ((↑N + 1) * (↑N + 2))) • uμ ↑(xSeq i)) (hQ1_open : IsOpen Q1) (hQ1_convex : Convex ℝ Q1) (hfhat_convex : ConvexOn ℝ Q1 fhat) (hfhat_diff : DifferentiableOn ℝ fhat Q1) (hfbar_diff : DifferentiableOn ℝ (SmoothedObjective Q2 A phihat d2 μ fhat) Q1) (hphihat_convex : ConvexOn ℝ Q2 phihat) (hμ_nonneg : 0 ≤ μ) (hd2_nonneg : ∀ u ∈ Q2, 0 ≤ d2 u) (hmax : ∀ (x : E1), IsSmoothedMaximizer Q2 A phihat d2 μ x (uμ x)) (hderivμ : ∀ (x : E1), fderiv ℝ (SmoothedMaxFunction Q2 A phihat d2 μ) x = LinearMap.toContinuousLinearMap ((AdjointOperator A') (uμ x))) (hpair : ∀ (x : E1) (u : E2), DualPairing ((AdjointOperator A') u) x = (A x) u) (x : E1) :

x ∈ Q1 → ∑ i ∈ Finset.range (N + 1), (↑i + 1) * (SmoothedObjective Q2 A phihat d2 μ fhat ↑(xSeq i) + DualPairing (↑(fderiv ℝ (SmoothedObjective Q2 A phihat d2 μ fhat) ↑(xSeq i))) (x - ↑(xSeq i))) ≤ (↑N + 1) * (↑N + 2) / 2 * (fhat x + (A x) ↑hatu - phihat ↑hatu)

Weighted linearization sum bound using the averaged dual point.

source

theorem mu_sq_identity (D1 D2 σ1 σ2 μ Aop : ℝ) (N : ℕ) (hμ_def : μ = 2 * Aop / (↑N + 1) * √(D1 / (σ1 * σ2 * D2))) (hσ1 : 0 < σ1) (hσ2 : 0 < σ2) (hD1 : 0 ≤ D1) (hD2 : 0 ≤ D2) (hD2ne : D2 ≠ 0) :

μ ^ 2 * (σ1 * σ2 * D2) * (↑N + 1) ^ 2 = 4 * Aop ^ 2 * D1

If μ is chosen as in the smoothing rule, its square satisfies the scaling identity.

source

theorem mu_mul_D2_sqrt (D1 D2 σ1 σ2 μ Aop : ℝ) (N : ℕ) (hμ_def : μ = 2 * Aop / (↑N + 1) * √(D1 / (σ1 * σ2 * D2))) (hD2 : 0 ≤ D2) (hσ1 : 0 < σ1) (hσ2 : 0 < σ2) :

μ * D2 = 2 * Aop / (↑N + 1) * √(D1 * D2 / (σ1 * σ2))

If μ is chosen as in the smoothing rule, then μ * D2 simplifies to the sqrt expression.

source

theorem mu_inv_term_eq (D1 D2 σ1 σ2 μ Aop : ℝ) (N : ℕ) (hμ_def : μ = 2 * Aop / (↑N + 1) * √(D1 / (σ1 * σ2 * D2))) (hσ1 : 0 < σ1) (hσ2 : 0 < σ2) (hD1 : 0 ≤ D1) (hD2 : 0 ≤ D2) (hμ : μ ≠ 0) :

4 * (1 / (μ * σ2)) * Aop ^ 2 * D1 / (σ1 * (↑N + 1) * (↑N + 2)) = μ * D2 * (↑N + 1) / (↑N + 2)

The inverse-μ term equals μ * D2 scaled by (N+1)/(N+2).

source

theorem mu_inv_term_le (D1 D2 σ1 σ2 μ Aop : ℝ) (N : ℕ) (hμ_def : μ = 2 * Aop / (↑N + 1) * √(D1 / (σ1 * σ2 * D2))) (hσ1 : 0 < σ1) (hσ2 : 0 < σ2) (hD1 : 0 ≤ D1) (hD2 : 0 ≤ D2) (hAop : 0 ≤ Aop) :

4 * (1 / (μ * σ2)) * Aop ^ 2 * D1 / (σ1 * (↑N + 1) * (↑N + 2)) ≤ 2 * Aop / (↑N + 1) * √(D1 * D2 / (σ1 * σ2))

The inverse-μ term is bounded by the sqrt expression.

source

theorem mu_simplify_bound (D1 D2 σ1 σ2 μ : ℝ) (N : ℕ) (Aop : ℝ) (hμ_def : μ = 2 * Aop / (↑N + 1) * √(D1 / (σ1 * σ2 * D2))) (hD1 : 0 ≤ D1) (hD2 : 0 ≤ D2) (hσ1 : 0 < σ1) (hσ2 : 0 < σ2) (hAop : 0 ≤ Aop) :

μ * D2 + 4 * (1 / (μ * σ2)) * Aop ^ 2 * D1 / (σ1 * (↑N + 1) * (↑N + 2)) ≤ 4 * Aop / (↑N + 1) * √(D1 * D2 / (σ1 * σ2))

Simplify the μ-dependent terms in the gap bound.

source

theorem epsilon_bound_from_gap {a b ε : ℝ} {N : ℕ} (hε : 0 < ε) (ha : 0 ≤ a) (hb : 0 ≤ b) (hN : ↑N + 1 ≥ a / ε + √(b / ε)) :

a / (↑N + 1) + b / (↑N + 1) ^ 2 ≤ ε

Convert an iteration lower bound to an ε-accuracy bound.

Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section04_part2