Papers.SmoothMinimization_Nesterov_2004.Sections.section03

source

theorem smooth_convex_upper_bound_gamma_mem {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) {x y : E} (hx : x ∈ Q) (hy : y ∈ Q) (t : ℝ) :

t ∈ Set.Icc 0 1 → x + t • (y - x) ∈ Q

Points on the segment between x and y stay in a convex set.

source

theorem smooth_convex_upper_bound_gamma_sub {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] (x y : E) (t : ℝ) :

x + t • (y - x) - x = t • (y - x)

Segment parametrization difference from the left endpoint.

source

theorem smooth_convex_upper_bound_gamma_sub_sub {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] (x y : E) (s t : ℝ) :

x + s • (y - x) - (x + t • (y - x)) = (s - t) • (y - x)

Difference of two points on the same segment.

source

theorem smooth_convex_upper_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) {f : E → ℝ} (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) {L : ℝ} (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef (↑(fderiv ℝ f x) - ↑(fderiv ℝ f y)) ≤ L * ‖x - y‖) (x : E) :

x ∈ Q → ∀ y ∈ Q, f y ≤ f x + DualPairing (↑(fderiv ℝ f x)) (y - x) + L / 2 * ‖y - x‖ ^ 2

Proposition 1.3.1. Let Q be a closed convex subset of a finite-dimensional real normed space and let f : Q → Real be convex and differentiable. If the gradient is Lipschitz with constant L > 0 in the dual norm (equation (grad_lipschitz)), then for any x, y in Q, f y <= f x + pairing (grad f x) (y - x) + (L / 2) (norm (y - x))^2 (inequality (3.1)).

source

noncomputable def T_Q {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f : E → ℝ) (L : ℝ) (x : ↑Q) :

↑Q

Definition 1.3.1. Let f satisfy the assumptions of Proposition 1.3.1. For x ∈ Q, define T_Q(x) ∈ Q to be any minimizer of y ↦ ⟪∇ f(x), y - x⟫ + (L / 2) ‖y - x‖^2 over y ∈ Q (equation (3.2)). If the norm is not strictly convex, the minimizer may be non-unique, and T_Q(x) denotes an arbitrary choice.

Equations

T_Q Q f L x = if h : ∃ (y : ↑Q), IsMinOn (fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) Q ↑y then Classical.choose h else x

Instances For

source

theorem isMinOn_image_isLeast {α : Type u_1} {β : Type u_2} [Preorder β] (f : α → β) (s : Set α) (a : α) (ha : a ∈ s) (hmin : IsMinOn f s a) :

IsLeast (f '' s) (f a)

A pointwise minimizer yields an IsLeast on the image.

source

theorem smooth_convex_TQ_exists_isMinOn {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E → ℝ} {L : ℝ} (hL : 0 < L) (x : ↑Q) :

∃ (y : ↑Q), IsMinOn (fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) Q ↑y

Existence of a minimizer for the quadratic model on a closed set.

source

theorem T_Q_isMinOn_of_exists {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f : E → ℝ} {L : ℝ} (x : ↑Q) (h : ∃ (y : ↑Q), IsMinOn (fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) Q ↑y) :

IsMinOn (fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) Q ↑(T_Q Q f L x)

If a minimizer exists, T_Q is a minimizer of the quadratic model.

source

theorem smooth_convex_TQ_value_eq_sInf {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E → ℝ} {L : ℝ} (hL : 0 < L) (x : ↑Q) :

sInf ((fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) '' Q) = (fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) ↑(T_Q Q f L x)

The quadratic model at T_Q equals the infimum over Q.

source

theorem smooth_convex_TQ_min_bound {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_closed : IsClosed Q) (hQ_convex : Convex ℝ Q) {f : E → ℝ} (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) {L : ℝ} (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef (↑(fderiv ℝ f x) - ↑(fderiv ℝ f y)) ≤ L * ‖x - y‖) (x : ↑Q) :

f ↑(T_Q Q f L x) ≤ f ↑x + sInf ((fun (y : E) => DualPairing (↑(fderiv ℝ f ↑x)) (y - ↑x) + L / 2 * ‖y - ↑x‖ ^ 2) '' Q)

Proposition 1.3.2. Under the assumptions of Proposition 1.3.1, for any x ∈ Q we have f(T_Q(x)) ≤ f(x) + min_{y ∈ Q} { ⟪∇ f(x), y - x⟫ + (L / 2) ‖y - x‖^2 } (equation (3.3)).

source

def ProxFunction {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] (Q : Set E) (d : E → ℝ) :

Prop

Definition 1.3.2. Let Q ⊆ E be closed and convex. A function d : E → ℝ is a prox-function for Q if it is continuous and σ-strongly convex on Q for some σ > 0.

Equations

ProxFunction Q d = IsProxFunction Q d

Instances For

source

def IsNormalizedProxCenter {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] (Q : Set E) (d : E → ℝ) (x0 : E) :

Prop

Definition 1.3.2. Define the prox-center x0 ∈ Q by x0 ∈ argmin_{x ∈ Q} d x (equation (prox_center_3)) and assume the normalization d x0 = 0.

Equations

IsNormalizedProxCenter Q d x0 = (IsProxCenter Q d x0 ∧ d x0 = 0)

Instances For

source

theorem prox_center_lower_bound_section03 {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {Q : Set E} {d : E → ℝ} {σ : ℝ} {x0 : E} (hconv : StrongConvexOn Q σ d) (hx0 : IsNormalizedProxCenter Q d x0) (x : E) :

x ∈ Q → d x ≥ 1 / 2 * σ * ‖x - x0‖ ^ 2

Proposition 1.3.3. Assume d is σ-strongly convex on Q and x0 ∈ argmin_{x ∈ Q} d x with d x0 = 0. Then for all x ∈ Q, d x ≥ (σ / 2) ‖x - x0‖^2 (equation (3.4)).

source

def SmoothConvexMinimizationProblem {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f : E → ℝ) (L : ℝ) :

Prop

Definition 1.3.3. Consider the smooth convex minimization problem min_{x ∈ Q} f x (equation (3.5)), where Q ⊆ E is closed and convex, and f is convex and differentiable on Q with L-Lipschitz continuous gradient as in (grad_lipschitz).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def A_k (α : ℕ → ℝ) (k : ℕ) :

ℝ

Definition 1.3.4. Let {α_i}_{i ≥ 0} be positive step-size parameters and define A_k = ∑_{i=0}^k α_i (equation (Ak_def)).

Equations

A_k α k = ∑ i ∈ Finset.range (k + 1), α i

Instances For

source

noncomputable def psi_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f d : E → ℝ) (L σ : ℝ) (α : ℕ → ℝ) (xSeq : ℕ → ↑Q) (k : ℕ) :

ℝ

Definition 1.3.4. Given points x_0, …, x_k ∈ Q, define ψ_k = min_{x ∈ Q} { (L/σ) d x + ∑_{i=0}^k α_i [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] } (equation (psi_k_def)).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

def R_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f d : E → ℝ) (L σ : ℝ) (α : ℕ → ℝ) (xSeq ySeq : ℕ → ↑Q) (k : ℕ) :

Prop

Definition 1.3.4. Given a sequence {y_k} ⊆ Q, define the relation (R_k): A_k f(y_k) ≤ ψ_k (equation (Rk)).

Equations

R_k Q f d L σ α xSeq ySeq k = (A_k α k * f ↑(ySeq k) ≤ psi_k Q f d L σ α xSeq k)

Instances For

source

theorem R0_TQ_model_le {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_closed : IsClosed Q) {f : E → ℝ} {L : ℝ} (hL : 0 < L) (x0 : ↑Q) (z : E) :

z ∈ Q → DualPairing (↑(fderiv ℝ f ↑x0)) (↑(T_Q Q f L x0) - ↑x0) + L / 2 * ‖↑(T_Q Q f L x0) - ↑x0‖ ^ 2 ≤ DualPairing (↑(fderiv ℝ f ↑x0)) (z - ↑x0) + L / 2 * ‖z - ↑x0‖ ^ 2

The quadratic model at T_Q is minimal on Q.

source

theorem R0_scaled_quadratic_le_prox {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {Q : Set E} {d : E → ℝ} {σ L : ℝ} (α : ℕ → ℝ) {x0 : E} (hconv : StrongConvexOn Q σ d) (hx0 : IsNormalizedProxCenter Q d x0) (hσ : 0 < σ) (hL : 0 < L) (hα0 : α 0 ∈ Set.Ioc 0 1) (z : E) :

z ∈ Q → α 0 * (L / 2 * ‖z - x0‖ ^ 2) ≤ L / σ * d z

Scale the prox-center lower bound to compare with the quadratic model.

source

theorem R0_pointwise_bound_for_csInf {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (hQ_closed : IsClosed Q) (hQ_convex : Convex ℝ Q) (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef (↑(fderiv ℝ f x) - ↑(fderiv ℝ f y)) ≤ L * ‖x - y‖) (hconv : StrongConvexOn Q σ d) (hσ : 0 < σ) (x0 : ↑Q) (hx0 : IsNormalizedProxCenter Q d ↑x0) (hα0 : α 0 ∈ Set.Ioc 0 1) (z : E) :

z ∈ Q → α 0 * f ↑(T_Q Q f L x0) ≤ L / σ * d z + α 0 * (f ↑x0 + DualPairing (↑(fderiv ℝ f ↑x0)) (z - ↑x0))

Pointwise lower bound for the psi_0 comparison.

source

theorem R0_relation {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (hQ_closed : IsClosed Q) (hQ_convex : Convex ℝ Q) (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef (↑(fderiv ℝ f x) - ↑(fderiv ℝ f y)) ≤ L * ‖x - y‖) (hconv : StrongConvexOn Q σ d) (hσ : 0 < σ) (xSeq ySeq : ℕ → ↑Q) (x0 : ↑Q) (hx0 : IsNormalizedProxCenter Q d ↑x0) (hα0 : α 0 ∈ Set.Ioc 0 1) (hxSeq0 : xSeq 0 = x0) (hySeq0 : ySeq 0 = T_Q Q f L x0) :

R_k Q f d L σ α xSeq ySeq 0

Proposition 1.3.4. Assume α_0 ∈ (0,1]. Let x0 be the prox-center of Q from Definition 3.4, and set y0 = T_Q(x0) (equation (y0_def)). Then the relation (R_0) in (Rk) holds.

source

noncomputable def z_k {E : Type u_1} [SeminormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] (Q : Set E) (f d : E → ℝ) (L σ : ℝ) (α : ℕ → ℝ) (xSeq : ℕ → ↑Q) (k : ℕ) :

↑Q

Definition 1.3.5. For k ≥ 0, define z_k ∈ Q to be any minimizer achieving ψ_k in (psi_k_def), namely z_k ∈ argmin_{x ∈ Q} { (L/σ) d x + ∑_{i=0}^k α_i [ f(x_i) + ⟪∇ f(x_i), x - x_i⟫ ] } (equation (zk_def)).

Equations

One or more equations did not get rendered due to their size.

Instances For

source

theorem section03_convex_support_fderiv {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f : E → ℝ} (hQ_convex : Convex ℝ Q) (hf_convex : ConvexOn ℝ Q f) {u v : E} (hu : u ∈ Q) (hv : v ∈ Q) (hv_diff : DifferentiableAt ℝ f v) :

f u ≥ f v + DualPairing (↑(fderiv ℝ f v)) (u - v)

Supporting hyperplane inequality for a differentiable convex function on a convex set.

source

theorem section03_tau_props (α : ℕ → ℝ) (k : ℕ) (hα_pos : ∀ (k : ℕ), 0 < α k) (hα_sq : ∀ (k : ℕ), α (k + 1) ^ 2 ≤ A_k α (k + 1)) :

have A := A_k α (k + 1); have τ := α (k + 1) / A; 0 < A ∧ 0 ≤ τ ∧ τ ≤ 1 ∧ τ ^ 2 ≤ 1 / A

Basic properties of τ_k = α_{k+1} / A_{k+1} under (3.6).

source

theorem section03_change_of_variables_y {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq ySeq : ℕ → ↑Q) (k : ℕ) {x : E} (hx : x ∈ Q) (hτ_nonneg : 0 ≤ α (k + 1) / A_k α (k + 1)) (hτ_le : α (k + 1) / A_k α (k + 1) ≤ 1) (hx_update : ↑(xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) • ↑(z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) • ↑(ySeq k)) :

have τ := α (k + 1) / A_k α (k + 1); have y := τ • x + (1 - τ) • ↑(ySeq k); y ∈ Q ∧ y - ↑(xSeq (k + 1)) = τ • (x - ↑(z_k Q f d L σ α xSeq k))

Change-of-variables identity for y = τ_k x + (1-τ_k) y_k.

source

theorem section03_dualpairing_combine_update {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq ySeq : ℕ → ↑Q) (k : ℕ) (hA_pos : 0 < A_k α (k + 1)) (hx_update : ↑(xSeq (k + 1)) = (α (k + 1) / A_k α (k + 1)) • ↑(z_k Q f d L σ α xSeq k) + (1 - α (k + 1) / A_k α (k + 1)) • ↑(ySeq k)) (g : Module.Dual ℝ E) (x : E) :

A_k α k * DualPairing g (↑(ySeq k) - ↑(xSeq (k + 1))) + α (k + 1) * DualPairing g (x - ↑(xSeq (k + 1))) = α (k + 1) * DualPairing g (x - ↑(z_k Q f d L σ α xSeq k))

Combine the gradient pairing terms using the update for x_{k+1}.

source

theorem section03_quad_coeff_compare {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] {L A τ : ℝ} (hL : 0 ≤ L) (hA : 0 < A) (hτ : τ ^ 2 ≤ 1 / A) (v : E) :

L / (2 * A) * ‖v‖ ^ 2 ≥ L / 2 * τ ^ 2 * ‖v‖ ^ 2

Compare quadratic coefficients using the bound τ^2 ≤ 1 / A.

source

theorem section03_change_of_variables_model {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} {f d : E → ℝ} {L σ : ℝ} (α : ℕ → ℝ) (xSeq : ℕ → ↑Q) (k : ℕ) {x y : E} {τ : ℝ} (hy_shift : y - ↑(xSeq (k + 1)) = τ • (x - ↑(z_k Q f d L σ α xSeq k))) (g : Module.Dual ℝ E) :

L / 2 * τ ^ 2 * ‖x - ↑(z_k Q f d L σ α xSeq k)‖ ^ 2 + τ * DualPairing g (x - ↑(z_k Q f d L σ α xSeq k)) = L / 2 * ‖y - ↑(xSeq (k + 1))‖ ^ 2 + DualPairing g (y - ↑(xSeq (k + 1)))

Quadratic-model identity after a change of variables.

source

theorem section03_model_at_TQ_ge_fdiff {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) {f : E → ℝ} (hf_diff : ∀ x ∈ Q, DifferentiableAt ℝ f x) {L : ℝ} (hL : 0 < L) (hgrad_lipschitz : ∀ x ∈ Q, ∀ y ∈ Q, DualNormDef (↑(fderiv ℝ f x) - ↑(fderiv ℝ f y)) ≤ L * ‖x - y‖) (x0 : ↑Q) :

DualPairing (↑(fderiv ℝ f ↑x0)) (↑(T_Q Q f L x0) - ↑x0) + L / 2 * ‖↑(T_Q Q f L x0) - ↑x0‖ ^ 2 ≥ f ↑(T_Q Q f L x0) - f ↑x0

The quadratic model at T_Q dominates the function decrease.

source

theorem section03_convex_dualpairing_sub {E : Type u_1} [NormedAddCommGroup E] [NormedSpace ℝ E] [FiniteDimensional ℝ E] {Q : Set E} (hQ_convex : Convex ℝ Q) (s : Module.Dual ℝ E) (x0 : E) :

ConvexOn ℝ Q fun (x : E) => DualPairing s (x - x0)

Affine function x ↦ ⟪s, x - x0⟫ is convex on a convex set.

Documentation

Papers.SmoothMinimization_Nesterov_2004.Sections.section03_part1