Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section03_part1

theorem smooth_convex_upper_bound_gamma_mem {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] {Q : Set E} (hQ_convex : Convex Q) {x y : E} (hx : x Q) (hy : y Q) (t : ) :
t Set.Icc 0 1x + t (y - x) Q

Points on the segment between x and y stay in a convex set.

theorem smooth_convex_upper_bound_gamma_sub {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] (x y : E) (t : ) :
x + t (y - x) - x = t (y - x)

Segment parametrization difference from the left endpoint.

theorem smooth_convex_upper_bound_gamma_sub_sub {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] (x y : E) (s t : ) :
x + s (y - x) - (x + t (y - x)) = (s - t) (y - x)

Difference of two points on the same segment.

theorem smooth_convex_upper_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_convex : Convex Q) {f : E} (hf_diff : xQ, DifferentiableAt f x) {L : } (hL : 0 < L) (hgrad_lipschitz : xQ, yQ, DualNormDef ((fderiv f x) - (fderiv f y)) L * x - y) (x : E) :
x QyQ, f y f x + DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2

Proposition 1.3.1. Let Q be a closed convex subset of a finite-dimensional real normed space and let f : Q → Real be convex and differentiable. If the gradient is Lipschitz with constant L > 0 in the dual norm (equation (grad_lipschitz)), then for any x, y in Q, f y <= f x + pairing (grad f x) (y - x) + (L / 2) (norm (y - x))^2 (inequality (3.1)).

noncomputable def T_Q {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (Q : Set E) (f : E) (L : ) (x : Q) :
Q

Definition 1.3.1. Let f satisfy the assumptions of Proposition 1.3.1. For x ∈ Q, define T_Q(x) ∈ Q to be any minimizer of y ↦ ⟪∇ f(x), y - x⟫ + (L / 2) ‖y - x‖^2 over y ∈ Q (equation (3.2)). If the norm is not strictly convex, the minimizer may be non-unique, and T_Q(x) denotes an arbitrary choice.

Equations
Instances For
    theorem isMinOn_image_isLeast {α : Type u_1} {β : Type u_2} [Preorder β] (f : αβ) (s : Set α) (a : α) (ha : a s) (hmin : IsMinOn f s a) :
    IsLeast (f '' s) (f a)

    A pointwise minimizer yields an IsLeast on the image.

    theorem smooth_convex_TQ_exists_isMinOn {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E} {L : } (hL : 0 < L) (x : Q) :
    ∃ (y : Q), IsMinOn (fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) Q y

    Existence of a minimizer for the quadratic model on a closed set.

    theorem T_Q_isMinOn_of_exists {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f : E} {L : } (x : Q) (h : ∃ (y : Q), IsMinOn (fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) Q y) :
    IsMinOn (fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) Q (T_Q Q f L x)

    If a minimizer exists, T_Q is a minimizer of the quadratic model.

    theorem smooth_convex_TQ_value_eq_sInf {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E} {L : } (hL : 0 < L) (x : Q) :
    sInf ((fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) '' Q) = (fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) (T_Q Q f L x)

    The quadratic model at T_Q equals the infimum over Q.

    theorem smooth_convex_TQ_min_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_closed : IsClosed Q) (hQ_convex : Convex Q) {f : E} (hf_diff : xQ, DifferentiableAt f x) {L : } (hL : 0 < L) (hgrad_lipschitz : xQ, yQ, DualNormDef ((fderiv f x) - (fderiv f y)) L * x - y) (x : Q) :
    f (T_Q Q f L x) f x + sInf ((fun (y : E) => DualPairing (↑(fderiv f x)) (y - x) + L / 2 * y - x ^ 2) '' Q)

    Proposition 1.3.2. Under the assumptions of Proposition 1.3.1, for any x ∈ Q we have f(T_Q(x)) ≤ f(x) + min_{y ∈ Q} { ⟪∇ f(x), y - x⟫ + (L / 2) ‖y - x‖^2 } (equation (3.3)).

    def ProxFunction {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] (Q : Set E) (d : E) :

    Definition 1.3.2. Let Q ⊆ E be closed and convex. A function d : E → ℝ is a prox-function for Q if it is continuous and σ-strongly convex on Q for some σ > 0.

    Equations
    Instances For
      def IsNormalizedProxCenter {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] (Q : Set E) (d : E) (x0 : E) :

      Definition 1.3.2. Define the prox-center x0 ∈ Q by x0 ∈ argmin_{x ∈ Q} d x (equation (prox_center_3)) and assume the normalization d x0 = 0.

      Equations
      Instances For
        theorem prox_center_lower_bound_section03 {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] {Q : Set E} {d : E} {σ : } {x0 : E} (hconv : StrongConvexOn Q σ d) (hx0 : IsNormalizedProxCenter Q d x0) (x : E) :
        x Qd x 1 / 2 * σ * x - x0 ^ 2

        Proposition 1.3.3. Assume d is σ-strongly convex on Q and x0 ∈ argmin_{x ∈ Q} d x with d x0 = 0. Then for all x ∈ Q, d x ≥ (σ / 2) ‖x - x0‖^2 (equation (3.4)).

        Definition 1.3.3. Consider the smooth convex minimization problem min_{x ∈ Q} f x (equation (3.5)), where Q ⊆ E is closed and convex, and f is convex and differentiable on Q with L-Lipschitz continuous gradient as in (grad_lipschitz).

        Equations
        • One or more equations did not get rendered due to their size.
        Instances For
          def A_k (α : ) (k : ) :

          Definition 1.3.4. Let {α_i}_{i ≥ 0} be positive step-size parameters and define A_k = ∑_{i=0}^k α_i (equation (Ak_def)).

          Equations
          Instances For
            noncomputable def psi_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (Q : Set E) (f d : E) (L σ : ) (α : ) (xSeq : Q) (k : ) :

            Definition 1.3.4. Given points x_0, …, x_k ∈ Q, define ψ_k = min_{x ∈ Q} { (L/σ) d x + ∑_{i=0}^k α_i [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] } (equation (psi_k_def)).

            Equations
            • One or more equations did not get rendered due to their size.
            Instances For
              def R_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (Q : Set E) (f d : E) (L σ : ) (α : ) (xSeq ySeq : Q) (k : ) :

              Definition 1.3.4. Given a sequence {y_k} ⊆ Q, define the relation (R_k): A_k f(y_k) ≤ ψ_k (equation (Rk)).

              Equations
              • R_k Q f d L σ α xSeq ySeq k = (A_k α k * f (ySeq k) psi_k Q f d L σ α xSeq k)
              Instances For
                theorem R0_TQ_model_le {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E} {L : } (hL : 0 < L) (x0 : Q) (z : E) :
                z QDualPairing (↑(fderiv f x0)) ((T_Q Q f L x0) - x0) + L / 2 * (T_Q Q f L x0) - x0 ^ 2 DualPairing (↑(fderiv f x0)) (z - x0) + L / 2 * z - x0 ^ 2

                The quadratic model at T_Q is minimal on Q.

                theorem R0_scaled_quadratic_le_prox {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] {Q : Set E} {d : E} {σ L : } (α : ) {x0 : E} (hconv : StrongConvexOn Q σ d) (hx0 : IsNormalizedProxCenter Q d x0) ( : 0 < σ) (hL : 0 < L) (hα0 : α 0 Set.Ioc 0 1) (z : E) :
                z Qα 0 * (L / 2 * z - x0 ^ 2) L / σ * d z

                Scale the prox-center lower bound to compare with the quadratic model.

                theorem R0_pointwise_bound_for_csInf {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (hQ_closed : IsClosed Q) (hQ_convex : Convex Q) (hf_diff : xQ, DifferentiableAt f x) (hL : 0 < L) (hgrad_lipschitz : xQ, yQ, DualNormDef ((fderiv f x) - (fderiv f y)) L * x - y) (hconv : StrongConvexOn Q σ d) ( : 0 < σ) (x0 : Q) (hx0 : IsNormalizedProxCenter Q d x0) (hα0 : α 0 Set.Ioc 0 1) (z : E) :
                z Qα 0 * f (T_Q Q f L x0) L / σ * d z + α 0 * (f x0 + DualPairing (↑(fderiv f x0)) (z - x0))

                Pointwise lower bound for the psi_0 comparison.

                theorem R0_relation {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (hQ_closed : IsClosed Q) (hQ_convex : Convex Q) (hf_diff : xQ, DifferentiableAt f x) (hL : 0 < L) (hgrad_lipschitz : xQ, yQ, DualNormDef ((fderiv f x) - (fderiv f y)) L * x - y) (hconv : StrongConvexOn Q σ d) ( : 0 < σ) (xSeq ySeq : Q) (x0 : Q) (hx0 : IsNormalizedProxCenter Q d x0) (hα0 : α 0 Set.Ioc 0 1) (hxSeq0 : xSeq 0 = x0) (hySeq0 : ySeq 0 = T_Q Q f L x0) :
                R_k Q f d L σ α xSeq ySeq 0

                Proposition 1.3.4. Assume α_0 ∈ (0,1]. Let x0 be the prox-center of Q from Definition 3.4, and set y0 = T_Q(x0) (equation (y0_def)). Then the relation (R_0) in (Rk) holds.

                noncomputable def z_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] (Q : Set E) (f d : E) (L σ : ) (α : ) (xSeq : Q) (k : ) :
                Q

                Definition 1.3.5. For k ≥ 0, define z_k ∈ Q to be any minimizer achieving ψ_k in (psi_k_def), namely z_k ∈ argmin_{x ∈ Q} { (L/σ) d x + ∑_{i=0}^k α_i [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] } (equation (zk_def)).

                Equations
                • One or more equations did not get rendered due to their size.
                Instances For
                  theorem section03_convex_support_fderiv {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f : E} (hQ_convex : Convex Q) (hf_convex : ConvexOn Q f) {u v : E} (hu : u Q) (hv : v Q) (hv_diff : DifferentiableAt f v) :
                  f u f v + DualPairing (↑(fderiv f v)) (u - v)

                  Supporting hyperplane inequality for a differentiable convex function on a convex set.

                  theorem section03_tau_props (α : ) (k : ) (hα_pos : ∀ (k : ), 0 < α k) (hα_sq : ∀ (k : ), α (k + 1) ^ 2 A_k α (k + 1)) :
                  have A := A_k α (k + 1); have τ := α (k + 1) / A; 0 < A 0 τ τ 1 τ ^ 2 1 / A

                  Basic properties of τ_k = α_{k+1} / A_{k+1} under (3.6).

                  theorem section03_change_of_variables_y {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_convex : Convex Q) {f d : E} {L σ : } (α : ) (xSeq ySeq : Q) (k : ) {x : E} (hx : x Q) (hτ_nonneg : 0 α (k + 1) / A_k α (k + 1)) (hτ_le : α (k + 1) / A_k α (k + 1) 1) (hx_update : (xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) (z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) (ySeq k)) :
                  have τ := α (k + 1) / A_k α (k + 1); have y := τ x + (1 - τ) (ySeq k); y Q y - (xSeq (k + 1)) = τ (x - (z_k Q f d L σ α xSeq k))

                  Change-of-variables identity for y = τ_k x + (1-τ_k) y_k.

                  theorem section03_dualpairing_combine_update {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq ySeq : Q) (k : ) (hA_pos : 0 < A_k α (k + 1)) (hx_update : (xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) (z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) (ySeq k)) (g : Module.Dual E) (x : E) :
                  A_k α k * DualPairing g ((ySeq k) - (xSeq (k + 1))) + α (k + 1) * DualPairing g (x - (xSeq (k + 1))) = α (k + 1) * DualPairing g (x - (z_k Q f d L σ α xSeq k))

                  Combine the gradient pairing terms using the update for x_{k+1}.

                  theorem section03_quad_coeff_compare {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] {L A τ : } (hL : 0 L) (hA : 0 < A) ( : τ ^ 2 1 / A) (v : E) :
                  L / (2 * A) * v ^ 2 L / 2 * τ ^ 2 * v ^ 2

                  Compare quadratic coefficients using the bound τ^2 ≤ 1 / A.

                  theorem section03_change_of_variables_model {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} {f d : E} {L σ : } (α : ) (xSeq : Q) (k : ) {x y : E} {τ : } (hy_shift : y - (xSeq (k + 1)) = τ (x - (z_k Q f d L σ α xSeq k))) (g : Module.Dual E) :
                  L / 2 * τ ^ 2 * x - (z_k Q f d L σ α xSeq k) ^ 2 + τ * DualPairing g (x - (z_k Q f d L σ α xSeq k)) = L / 2 * y - (xSeq (k + 1)) ^ 2 + DualPairing g (y - (xSeq (k + 1)))

                  Quadratic-model identity after a change of variables.

                  theorem section03_model_at_TQ_ge_fdiff {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_convex : Convex Q) {f : E} (hf_diff : xQ, DifferentiableAt f x) {L : } (hL : 0 < L) (hgrad_lipschitz : xQ, yQ, DualNormDef ((fderiv f x) - (fderiv f y)) L * x - y) (x0 : Q) :
                  DualPairing (↑(fderiv f x0)) ((T_Q Q f L x0) - x0) + L / 2 * (T_Q Q f L x0) - x0 ^ 2 f (T_Q Q f L x0) - f x0

                  The quadratic model at T_Q dominates the function decrease.

                  theorem section03_convex_dualpairing_sub {E : Type u_1} [NormedAddCommGroup E] [NormedSpace E] [FiniteDimensional E] {Q : Set E} (hQ_convex : Convex Q) (s : Module.Dual E) (x0 : E) :
                  ConvexOn Q fun (x : E) => DualPairing s (x - x0)

                  Affine function x ↦ ⟪s, x - x0⟫ is convex on a convex set.