CGO 04-1: Quadratic Problem in $\mathbb{R}^2$

Download the Jupyter Notebook file from here.

CGO 05-1: Quadratic Problem in $\mathbb{R}^2$

Homework

Please do the following and upload the completed notebook to Moodle.

  1. Fill in the TODOs in the code
  2. Answer the questions at the end

A bit of Theory

The next few sections cover some optimization theory that was not given in class but may be useful for this homework.

Strong Convexity and Implications

  • We assume that the objective function is strongly convex on $S$, which means there exists an $m>0$ such that \(\nabla^2 f(x) \succeq m I,\quad \forall x\in S\)
    • Equivalent to saying that the smallest eigenvalue of $\nabla^2 f(x)$ is at least $m$ for all $x \in S$
  • For $x,y \in S$ we have
\[f(y) = f(x) + \nabla f(x)^\intercal (y-x) + \frac{1}{2}(y-x)^\intercal \nabla^2 f(z)(y-x)\]

for some $z$ on the line segment $[x,y]$

  • Incorporating strong convexity assumption
\[f(y) \geq f(x) + \nabla f(x)^\intercal (y-x) + \frac{m}{2}\|y-x\|_2^2, \; \forall x,y \in S\]
  • With $m=0$ this reduces to the basic first-order convexity inequality
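
To make the inequality concrete, here is a small numerical check (an illustration only, not part of the homework): for a quadratic $f(x)=\frac{1}{2}x^\intercal Q x$ the Hessian is $Q$, so the strong convexity constant $m$ is its smallest eigenvalue and the inequality should hold for any pair of points. The test matrix $Q=\operatorname{diag}(1,4)$ and the number of random samples are arbitrary choices.

# Illustrative check of the strong convexity inequality for a test quadratic.
import numpy as np

rng = np.random.default_rng(0)
Q = np.diag([1.0, 4.0])                      # Hessian of the test quadratic
m = np.linalg.eigvalsh(Q).min()              # strong convexity constant m

f_test = lambda x: 0.5 * x @ Q @ x
grad_test = lambda x: Q @ x

for _ in range(1000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    lower = f_test(x) + grad_test(x) @ (y - x) + 0.5 * m * np.sum((y - x)**2)
    assert f_test(y) >= lower - 1e-12        # inequality holds up to round-off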

Bounding $f(x)-p^\star$ with Strong Convexity

  • We want to bound $f(x)-p^\star$
  • Minimizing the right-hand side of the strong convexity inequality over $y$ (setting its gradient with respect to $y$ to zero) gives the minimizer $\tilde{y}=x-(1/m)\nabla f(x)$, and thus
\[\begin{align} f(y) &\geq f(x) + \nabla f(x)^\intercal(y-x)+\frac{m}{2}\|y-x\|_2^2 \\ &\geq f(x) + \nabla f(x)^\intercal(\tilde{y}-x)+\frac{m}{2}\|\tilde{y}-x\|_2^2 \\ &= f(x) - \frac{1}{2m}\| \nabla f(x) \|_2^2 \end{align}\]
  • Since this holds for any $y\in S$, we have
\[p^\star \geq f(x)-\frac{1}{2m}\|\nabla f(x)\|_2^2\]

Upper bound on $\nabla^2 f(x)$

\[f(y) \geq f(x) + \nabla f(x)^\intercal (y-x) + \frac{m}{2}\|y-x\|_2^2, \; \forall x,y \in S\]
  • The inequality above implies that the sublevel sets contained in $S$ are bounded; in particular, $S$ itself is bounded
  • The maximum eigenvalue of $\nabla^2 f(x)$, a continuous function of $x$, is therefore bounded above on $S$: there exists an $M$ such that
\[\nabla^2 f(x) \preceq M I, \quad \forall x \in S\]
  • This upper bound implies that for any $x, y \in S$
\[f(y) \leq f(x) + \nabla f(x)^\intercal (y-x) + \frac{M}{2}\|y-x\|_2^2\]
  • Minimizing each side over $y$ yields
\[p^\star \leq f(x) - \frac{1}{2M} \| \nabla f(x) \|_2^2\]
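
Both bounds can be checked numerically (again just an illustration, with arbitrarily chosen test points): for the quadratic example used later in this notebook, $m=\min\{1,\gamma\}$, $M=\max\{1,\gamma\}$ and $p^\star=0$, so the two inequalities sandwich $p^\star$.

# Illustrative check of  f(x) - (1/(2m))||grad f(x)||^2 <= p* <= f(x) - (1/(2M))||grad f(x)||^2
# for f(x) = 0.5*(x1^2 + gamma*x2^2), whose optimal value is p* = 0.
import numpy as np

gamma = 8.0                                   # same gamma as in the problem definition below
Q = np.diag([1.0, gamma])
m, M = min(1.0, gamma), max(1.0, gamma)
pstar = 0.0                                   # known optimal value

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(2)
    fx, gx = 0.5 * x @ Q @ x, Q @ x
    gnorm2 = gx @ gx
    assert fx - gnorm2 / (2 * m) <= pstar + 1e-12    # lower bound on p*
    assert pstar <= fx - gnorm2 / (2 * M) + 1e-12    # upper bound on p*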

Condition Number of Sublevel Sets

  • Combining the upper bound on $\nabla^2 f(x)$ and strong convexity
\[m I \preceq \nabla^2 f(x) \preceq MI, \quad \forall x \in S\]
  • The ratio $M/m$ is an upper bound on the condition number of $\nabla^2 f(x)$
  • The condition number of a matrix is the ratio of its largest eigenvalue to its smallest eigenvalue
  • Can be used to determine stopping condition
    • If we terminate an algorithm when $\|\nabla f(x^{(k)})\|_2 \leq \rho$, where $\rho>0$ is smaller than $(m \epsilon)^{1/2}$, then
\[f(x^{(k)}) - p^\star \leq \epsilon\]
    • This follows from the lower bound $p^\star \geq f(x)-\frac{1}{2m}\|\nabla f(x)\|_2^2$: if $\|\nabla f(x^{(k)})\|_2^2 \leq m\epsilon$, then $f(x^{(k)}) - p^\star \leq \epsilon/2 \leq \epsilon$

Quadratic Problem in $\mathbb{R}^2$

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
  • Consider the simple example \(f(x) = \frac{1}{2}(x_1^2 + \gamma x_2^2),\quad \gamma>0\)
    • Optimal point is $x^\star = 0$
    • Hessian is constant with eigenvalues $1$ and $\gamma$
  • The condition number of the sublevel sets of $f$ is \(\frac{\max\{1,\gamma\}}{\min\{1,\gamma\}} = \max\{\gamma, 1/\gamma\}\)
  • The tightest choices for the strong convexity constants are \(m = \min\{1,\gamma\},\quad M=\max\{1,\gamma\}\)
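
These constants can also be read off numerically from the (constant) Hessian; a quick illustrative check, assuming the same $\gamma=8$ as in the problem definition just below:

# Illustrative check: the Hessian of f is constant, H = diag(1, gamma),
# so m and M are simply its extreme eigenvalues.
H = np.diag([1.0, 8.0])                      # gamma = 8, matching the cell below
eigs = np.linalg.eigvalsh(H)
print("m =", eigs.min(), " M =", eigs.max(), " condition number =", eigs.max()/eigs.min())
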
# problem definition
gamma = 8
A = np.diag( [1, gamma] )

def f(x):
    "Function value at point x (or at each column of x; np.diag picks out one value per column)"
    return np.diag( 0.5 * x.T @ A @ x )
def g(x):
    "Gradient at point x"
    return A @ x
# backtracking parameters
alpha = 0.1
beta  = 0.7
# starting point
x0 = np.array( [-1,1], ndmin=2, dtype=np.float64 ).T
f0 = f(x0) # initial function value
# calculate the minimum
xmin = np.array([0,0], ndmin=2).T # we know the analytic minimum to the problem
fmin = f( xmin )
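
Before running any descent method it can be reassuring (though entirely optional) to verify that `g` really returns the gradient of `f`, for example with a central finite-difference check; `fd_grad` below is a helper written just for this illustration.

# Optional sanity check: compare g(x0) against a central finite-difference gradient.
def fd_grad(func, x, h=1e-6):
    "Central finite-difference approximation of the gradient at column vector x"
    grad = np.zeros_like(x)
    for j in range(x.shape[0]):
        e = np.zeros_like(x)
        e[j] = h
        grad[j] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

assert np.allclose( fd_grad(f, x0), g(x0), atol=1e-5 )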

Show five contour lines $f_0 + k \Delta,\quad k\in\{-3,-2,-1,0,1\},\quad \Delta = (f_0-f_\text{min})/4$

nangles = 1000
nopts   = 1000
thetas  = np.linspace( 0, 2*np.pi, nangles )
cont    = np.zeros((5,2,nangles))
Delta   = (f0-fmin)/4
for i in range(nangles):
    # points along a ray from the origin in direction thetas[i]
    ray  = np.vstack( [np.linspace(0,4,nopts)*np.cos(thetas[i]),
                       np.linspace(0,4,nopts)*np.sin(thetas[i])] )
    fray = f( ray )

    def get_ind( expression ):
        # last point on the ray where the level-set condition still holds,
        # averaged with its neighbour to approximate the contour crossing
        ind = np.nonzero( expression )[0].max()
        return (ray[:,ind-1] + ray[:,ind]) / 2

    cont[0,:,i] = get_ind( fray < f0 )
    cont[1,:,i] = get_ind( fray < f0 - 3*Delta )
    cont[2,:,i] = get_ind( fray < f0 - 2*Delta )
    cont[3,:,i] = get_ind( fray < f0 - Delta )
    cont[4,:,i] = get_ind( fray < f0 + Delta )

def plt_contour():
    fig, ax = plt.subplots()
    ax.plot( cont[0,0,:], cont[0,1,:], '--',
             cont[1,0,:], cont[1,1,:], '--',
             cont[2,0,:], cont[2,1,:], '--',
             cont[3,0,:], cont[3,1,:], '--',
             cont[4,0,:], cont[4,1,:], '--',
           )
    ax.axis('equal')
    ax.axis('off')
    return fig, ax
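
As an aside (not required for the homework), the same five level sets could also be drawn with matplotlib's built-in `contour` by evaluating $f$ on a grid; a minimal sketch, reusing `gamma`, `f0` and `Delta` from the cells above:

# Alternative (illustrative) contour plot on a grid instead of along rays.
xx, yy = np.meshgrid( np.linspace(-4, 4, 400), np.linspace(-4, 4, 400) )
zz = 0.5 * (xx**2 + gamma * yy**2)           # f evaluated on the grid
levels = np.sort( f0 + np.array([-3, -2, -1, 0, 1]) * Delta )
fig, ax = plt.subplots()
ax.contour( xx, yy, zz, levels=levels, linestyles='--' )
ax.axis('equal')
ax.axis('off')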

Run the Gradient Method

Algorithm 9.1 General Descent Method

  1. given a starting point $x\in\text{dom}{f}$.
  2. repeat
    1. Determine a descent direction $\Delta x$
    2. Line search. Choose a step size $t>0$
    3. Update. $x := x + t\Delta x$
  3. until stopping criterion is satisfied.

Algorithm 9.2 Backtracking Line Search

  1. given a descent direction $\Delta x$ for $f$ at $x\in\text{dom}{f}$, $\alpha \in (0,\,0.5)$, $\beta\in(0,\,1)$.
  2. $t := 1$
  3. while $f(x+t\Delta x) > f(x) + \alpha t \nabla f(x)^\intercal \Delta x$
    1. $t := \beta t$
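
For reference, the backtracking procedure can also be written as a small standalone helper. This is only a sketch of the pseudocode above (the descent loops below inline the same logic); the function name and the cap on the number of shrinkage steps are arbitrary choices.

def backtracking_line_search(f, grad_fx, x, v, alpha=0.1, beta=0.7, max_steps=50):
    "Backtracking line search (Algorithm 9.2) along a descent direction v"
    t = 1.0
    fx = f(x)
    slope = np.vdot(grad_fx, v)              # directional derivative; negative for a descent direction
    for _ in range(max_steps):               # cap the number of shrinkage steps
        if f(x + t * v) <= fx + alpha * t * slope:
            break
        t *= beta
    return t

For example, `backtracking_line_search(f, g(x0), x0, -g(x0))` would return a step size satisfying the sufficient-decrease condition at the starting point.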

Algorithm 9.3 Gradient Descent Method

  1. given a starting point $x \in\text{dom}{f}$.
  2. repeat
    1. $\Delta x := - \nabla f(x)$
    2. Line search. Choose step size $t$ via exact or backtracking line search.
    3. Update. $x := x + t \Delta x$
  3. until stopping criterion is satisfied.
maxiters = 100
xs = np.zeros( (2,maxiters) )
fs = np.zeros( (1,maxiters) )
x  = x0.copy()
for i in range(maxiters):
    fval    = f( x )
    xs[:,i:i+1] = x
    fs[:,i]   = fval
    if fval-fmin < 1e-10: # stopping criterion
        break
    gval = g(x) # gradient
    v = -gval   # descent direction: negative gradient
    # Backtracking Line Search
    s = 1
    for k in range(10): # at most 10 backtracking steps
        xnew   = x + s * v
        fxnew  = f( xnew )
        if fxnew < fval + s * alpha * gval.T @ v:
            break
        else:
            s = s * beta
    x = x + s * v

# Plot path
fig, ax = plt_contour()
ax.plot( xs[0,0:i+1], xs[1,0:i+1], 'o' )
ax.plot( xs[0,0:i+1], xs[1,0:i+1], '-' )
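
Optionally, it is also instructive to look at the recorded suboptimality $f(x^{(k)})-p^\star$ on a logarithmic scale, which for this problem decreases roughly geometrically with $k$; a quick sketch using the `fs` array filled in above:

# Optional: suboptimality per iteration on a log scale (roughly a straight line
# here, i.e. linear convergence of the gradient method).
fig2, ax2 = plt.subplots()
ax2.semilogy( np.arange(i + 1), fs[0, 0:i+1] - fmin, 'o-' )
ax2.set_xlabel('iteration $k$')
ax2.set_ylabel(r'$f(x^{(k)}) - p^\star$')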

Steepest Descent Method For Quadratic Norm

Algorithm 9.4 Steepest Descent Method

  1. given a starting point $x\in\text{dom}{f}$
  2. repeat
    1. Compute steepest descent direction $\Delta x_\text{sd}$
    2. Line search. Choose $t$ via backtracking or exact line search.
    3. Update. $x := x + t \Delta x_\text{sd}$
  3. until stopping criterion is satisfied.
maxiters = 100
xs = np.zeros( (2,maxiters) )
fs = np.zeros( (1,maxiters) )
x  = x0.copy()
for i in range(maxiters):
    fval = f(x)
    xs[:,i:i+1] = x
    fs[:,i] = fval
    if fval-fmin < 1e-10:
        break
    gval = g(x) # gradient
    v = ... # TODO Finish implementing Steepest Descent Method for Quadratic norm
    s = 1
    for k in range(10):
        xnew = x + s * v
        fxnew = f(xnew)
        if fxnew < fval + s * alpha * gval.T @ v:
            break
        else:
            s = beta * s
    x = x + s * v

# Plot path
fig, ax = plt_contour()
ax.plot( xs[0,0:i+1], xs[1,0:i+1], 'o' )
ax.plot( xs[0,0:i+1], xs[1,0:i+1], '-' )

Questions

  • Try with two quadratic norms defined by \(P_1 = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix} \quad P_2 = \begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}\)
  • Answer the following questions.
    1. What is going on?
    2. Why is this happening?
    3. What happens if you use a quadratic norm defined by the Hessian?
    4. What is this related to?