Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section03_part2

theorem section03_strongConvex_add_convex {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {Q : Set E} {m : ℝ} {f g : E → ℝ} (hf : StrongConvexOn Q m f) (hg : ConvexOn ℝ Q g) :
StrongConvexOn Q m (f + g)

Adding a convex function preserves strong convexity.
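As a sketch of the argument (stated with the standard inequality form of strong convexity; the precise `StrongConvexOn` definition in this development may be phrased slightly differently):

```latex
% For x, y ∈ Q and λ ∈ [0,1], strong convexity of f with modulus m gives
f(\lambda x + (1-\lambda) y)
  \le \lambda f(x) + (1-\lambda) f(y) - \tfrac{m}{2}\,\lambda(1-\lambda)\,\|x - y\|^2,
% while convexity of g gives
g(\lambda x + (1-\lambda) y) \le \lambda g(x) + (1-\lambda) g(y).
% Adding the two inequalities yields the same modulus-m bound for f + g.
```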

theorem section03_Fk_sum_convex {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) {f : E → ℝ} (α : ℕ → ℝ) (xSeq : ℕ → Q) (k : ℕ) (hα_pos : ∀ (k : ℕ), 0 < α k) :
ConvexOn ℝ Q fun (x : E) => ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))

Convexity of the affine sum term in F_k.

theorem section03_Fk_quadratic_lower_bound_at_zk {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq : ℕ → Q) (k : ℕ) (hQ_convex : Convex ℝ Q) (hconv : StrongConvexOn Q σ d) (hσ : 0 < σ) (hL : 0 < L) (hα_pos : ∀ (k : ℕ), 0 < α k) (hzk_min : IsMinOn (fun (x : E) => L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))) Q (z_k Q f d L σ α xSeq k)) (x : E) :
x ∈ Q → psi_k Q f d L σ α xSeq k + L / 2 * ‖x - (z_k Q f d L σ α xSeq k)‖ ^ 2 ≤ L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))

Quadratic lower bound at a minimizer of F_k.

theorem Rk_succ_update {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (hQ_closed : IsClosed Q) (hQ_convex : Convex ℝ Q) (hf_convex : ConvexOn ℝ Q f) (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef ((fderiv ℝ f x) - (fderiv ℝ f y)) ≤ L * ‖x - y‖) (hconv : StrongConvexOn Q σ d) (hσ : 0 < σ) (xSeq ySeq : ℕ → Q) (k : ℕ) (hα_pos : ∀ (k : ℕ), 0 < α k) (hα_sq : ∀ (k : ℕ), α (k + 1) ^ 2 ≤ A_k α (k + 1)) (hRk : R_k Q f d L σ α xSeq ySeq k) (hzk_min : IsMinOn (fun (x : E) => L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))) Q (z_k Q f d L σ α xSeq k)) (hx_update : ↑(xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) • (z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) • ↑(ySeq k)) (hy_update : ySeq (k + 1) = T_Q Q f L ↑(xSeq (k + 1))) :
R_k Q f d L σ α xSeq ySeq (k + 1)

Lemma 1.3.1. Let {α_k}_{k≥0} be positive numbers with α_0 ∈ (0,1] and α_{k+1}^2 ≤ A_{k+1} for all k ≥ 0, where A_{k+1} = ∑_{i=0}^{k+1} α_i (equation (3.6)). Assume (R_k) holds for some k ≥ 0. Define τ_k = α_{k+1} / A_{k+1} (equation (tau_def)) and update x_{k+1} = τ_k z_k + (1-τ_k) y_k, y_{k+1} = T_Q(x_{k+1}) (equation (3.7)). Then (R_{k+1}) holds.

theorem section03_Ak_alpha_closed_form (k : ℕ) :
A_k (fun (k : ℕ) => (↑k + 1) / 2) k = (↑k + 1) * (↑k + 2) / 4

Closed form for A_k when α_k = (k+1)/2.
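A small sanity check of this identity at k = 2, written directly against the partial sum that `A_k` is assumed to abbreviate here (`A_k α k = ∑ i ∈ Finset.range (k + 1), α i`):

```lean
import Mathlib

-- ∑_{i=0}^{2} (i+1)/2 = 1/2 + 1 + 3/2 = 3, and (2+1)(2+2)/4 = 3.
example : ∑ i ∈ Finset.range 3, ((i : ℝ) + 1) / 2 = (2 + 1) * (2 + 2) / 4 := by
  norm_num [Finset.sum_range_succ]
```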

theorem section03_tau_alpha_closed_form (k : ℕ) :
(fun (k : ℕ) => (↑k + 1) / 2) (k + 1) / A_k (fun (k : ℕ) => (↑k + 1) / 2) (k + 1) = 2 / (↑k + 3)

Closed form for τ_k = α_{k+1}/A_{k+1} when α_k = (k+1)/2.
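Unfolding α_{k+1} = (k+2)/2 and the closed form for A_{k+1}, the value of τ_k is a direct computation:

```latex
\tau_k \;=\; \frac{\alpha_{k+1}}{A_{k+1}}
  \;=\; \frac{(k+2)/2}{(k+2)(k+3)/4}
  \;=\; \frac{2}{k+3}.
```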

theorem section03_alpha0_mem_Ioc :
(fun (k : ℕ) => (↑k + 1) / 2) 0 ∈ Set.Ioc 0 1

The initial step size satisfies α_0 ∈ (0,1].

theorem section03_alpha_sq_le_Ak (k : ℕ) :
(fun (k : ℕ) => (↑k + 1) / 2) (k + 1) ^ 2 ≤ A_k (fun (k : ℕ) => (↑k + 1) / 2) (k + 1)

The inequality α_{k+1}^2 ≤ A_{k+1} for α_k = (k+1)/2.
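With α_{k+1} = (k+2)/2 and A_{k+1} = (k+2)(k+3)/4, the inequality reduces to comparing two products:

```latex
\alpha_{k+1}^2 \;=\; \frac{(k+2)^2}{4}
  \;\le\; \frac{(k+2)(k+3)}{4} \;=\; A_{k+1},
\qquad\text{because } k+2 \le k+3.
```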

theorem alpha_seq_closed_form :
have α := fun (k : ℕ) => (↑k + 1) / 2; (∀ (k : ℕ), A_k α k = (↑k + 1) * (↑k + 2) / 4) ∧ (∀ (k : ℕ), α (k + 1) / A_k α (k + 1) = 2 / (↑k + 3)) ∧ α 0 ∈ Set.Ioc 0 1 ∧ ∀ (k : ℕ), α (k + 1) ^ 2 ≤ A_k α (k + 1)

Lemma 1.3.2. For k ≥ 0, define α_k = (k+1)/2 (equation (alpha_k_def)). Then τ_k = 2/(k+3) and A_k = (k+1)(k+2)/4 (equation (3.10)), and the conditions (3.6) (α_0 ∈ (0,1] and α_{k+1}^2 ≤ A_{k+1}) are satisfied.

def OptimalSchemeAlgorithm {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f d : E → ℝ) (L σ : ℝ) (xSeq ySeq : ℕ → Q) :
Prop

Algorithm 1.3.1. Assume f satisfies (grad_lipschitz) on Q, and let d be a prox-function for Q with parameter σ. Initialize at a prox-center x0 of Q and set y0 = T_Q(x0). For k ≥ 0, iterate:

  1. compute f(x_k) and ∇ f(x_k);
  2. compute y_k = T_Q(x_k);
  3. compute z_k as an argmin of (L/σ) d(x) + ∑_{i=0}^k (i+1)/2 [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] over x ∈ Q (equation (alg311:zk));
  4. set x_{k+1} = (2/(k+3)) z_k + ((k+1)/(k+3)) y_k (equation (3.11)).
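Step 4 is the generic update x_{k+1} = τ_k z_k + (1 - τ_k) y_k of Lemma 1.3.1, specialized to α_k = (k+1)/2, since then

```latex
\tau_k \;=\; \frac{2}{k+3},
\qquad
1 - \tau_k \;=\; \frac{k+1}{k+3},
```

which are exactly the coefficients appearing in (3.11).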
def ModifiedSelectionRule {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f : E → ℝ) (L : ℝ) (xSeq ySeq : ℕ → Q) (k : ℕ) :
Prop

Definition 1.3.6. Define a modified selection rule: given y_{k-1}, set ỹ_k = T_Q(x_k) and choose y_k ∈ argmin { f x : x ∈ {y_{k-1}, x_k, ỹ_k} } (equation (3.14)).

theorem modified_selection_rule_monotone {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f : E → ℝ} {L : ℝ} (xSeq ySeq : ℕ → Q) (hselect : ∀ (k : ℕ), ModifiedSelectionRule Q f L xSeq ySeq k) (h0 : f ↑(ySeq 0) ≤ f ↑(xSeq 0)) :
(∀ (k : ℕ), f ↑(ySeq (k + 1)) ≤ f ↑(ySeq k)) ∧ ∀ (k : ℕ), f ↑(ySeq k) ≤ f ↑(xSeq 0)

Proposition 1.3.5. If {y_k} is chosen by the rule (3.14), then f(y_k) ≤ f(y_{k-1}) ≤ ⋯ ≤ f(x_0) (equation (3.15)).

theorem section03_psi_k_le_eval_of_isMinOn {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq : ℕ → Q) (k : ℕ) (hzk_min : IsMinOn (fun (x : E) => L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))) Q (z_k Q f d L σ α xSeq k)) (x : E) :
x ∈ Q → psi_k Q f d L σ α xSeq k ≤ L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))

The infimum defining ψ_k is bounded above by evaluation at any feasible point.

theorem section03_psik_upper_bound_at_xstar {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq : ℕ → Q) (k : ℕ) (hQ_convex : Convex ℝ Q) (hf_convex : ConvexOn ℝ Q f) (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) (hα_pos : ∀ (k : ℕ), 0 < α k) (hzk_min : IsMinOn (fun (x : E) => L / σ * d x + ∑ i ∈ Finset.range (k + 1), α i * (f ↑(xSeq i) + DualPairing (↑(fderiv ℝ f ↑(xSeq i))) (x - ↑(xSeq i)))) Q (z_k Q f d L σ α xSeq k)) (xstar : E) :
xstar ∈ Q → psi_k Q f d L σ α xSeq k ≤ L / σ * d xstar + A_k α k * f xstar

Upper bound ψ_k by evaluating the affine model at a point x★ ∈ Q.

theorem section03_rate_bound_from_Rk_and_psik_upper {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq ySeq : ℕ → Q) (xstar : E) (k : ℕ) (hA_pos : 0 < A_k α k) (hRk : R_k Q f d L σ α xSeq ySeq k) (hpsi : psi_k Q f d L σ α xSeq k ≤ L / σ * d xstar + A_k α k * f xstar) :
f ↑(ySeq k) - f xstar ≤ L / σ * d xstar / A_k α k

Convert R_k and an upper bound on ψ_k into a gap estimate.
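The derivation behind this conversion is a short chain: (R_k) bounds A_k f(y_k) by ψ_k, the hypothesis bounds ψ_k from above, and subtracting A_k f(x★) then dividing by A_k > 0 finishes:

```latex
A_k\, f(y_k) \;\le\; \psi_k \;\le\; \frac{L}{\sigma}\, d(x^\star) + A_k\, f(x^\star)
\;\Longrightarrow\;
f(y_k) - f(x^\star) \;\le\; \frac{L\, d(x^\star)}{\sigma\, A_k}.
```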

theorem section03_optimalScheme_Rk_all {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (xSeq ySeq : ℕ → Q) (hAlg : OptimalSchemeAlgorithm Q f d L σ xSeq ySeq) (k : ℕ) :
R_k Q f d L σ (fun (i : ℕ) => (↑i + 1) / 2) xSeq ySeq k

The invariant R_k holds for all iterates of the optimal scheme.

theorem optimal_scheme_rate {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (xSeq ySeq : ℕ → Q) (hAlg : OptimalSchemeAlgorithm Q f d L σ xSeq ySeq) :
(∀ (k : ℕ), (↑k + 1) * (↑k + 2) / 4 * f ↑(ySeq k) ≤ psi_k Q f d L σ (fun (i : ℕ) => (↑i + 1) / 2) xSeq k) ∧ ∀ xstar ∈ Q, (∀ y ∈ Q, f xstar ≤ f y) → ∀ (k : ℕ), f ↑(ySeq k) - f xstar ≤ 4 * L * d xstar / (σ * (↑k + 1) * (↑k + 2))

Theorem 1.3.1. Let sequences {x_k} ⊆ Q and {y_k} ⊆ Q be generated by Algorithm 1.3.1. Then for any k ≥ 0, `((k+1)(k+2)/4) f(y_k) ≤ min_{x ∈ Q} { (L/σ) d(x) + ∑_{i=0}^k (i+1)/2 [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] }` (equation (3.12)). If `x* ∈ argmin_{x ∈ Q} f(x)`, then `f(y_k) - f(x*) ≤ 4 L d(x*) / (σ (k+1)(k+2))` (equation (3.13)).
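The constant in (3.13) comes from substituting the closed form A_k = (k+1)(k+2)/4 into the generic gap estimate of the previous lemma:

```latex
f(y_k) - f(x^*) \;\le\; \frac{L\, d(x^*)}{\sigma\, A_k}
  \;=\; \frac{L\, d(x^*)}{\sigma\, \frac{(k+1)(k+2)}{4}}
  \;=\; \frac{4\, L\, d(x^*)}{\sigma\, (k+1)(k+2)}.
```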