Edgar SIMO-SERRA
シモセラ エドガー
Line search methods minimize a function f(x) by creating a sequence of iterates \{x^k\}_{k=0}^{\infty} such that

x^{k+1} = x^k + \alpha^k p^k, \quad \alpha^k \in \mathbb{R},\; p^k \in \mathbb{R}^n

Each iteration first chooses a search direction p^k and then computes the step length \alpha^k by solving

\alpha^k = \argmin_{\alpha>0} f(x^k+\alpha p^k)

The direction p^k is required to be a descent direction, i.e., (p^k)^\intercal \nabla f(x^k) < 0, so that f can be reduced along the direction p^k. Most line search methods use directions of the form

p^k = -(B^k)^{-1} \nabla f(x^k)

where B^k\in\mathbb{R}^{n\times n} is a symmetric and non-singular matrix. With B^k being positive definite, we have

(p^k)^\intercal \nabla f(x^k) = -\nabla f(x^k)^\intercal (B^k)^{-1} \nabla f(x^k) < 0

Common choices of search direction (see the sketch after the table below):
- Steepest descent: p^k = -\nabla f(x^k)
- Newton's method: p^k = -\nabla^2 f(x^k)^{-1} \nabla f(x^k)
- Quasi-Newton methods: p^k = -(B^k)^{-1} \nabla f(x^k), where B^k approximates the Hessian \nabla^2 f(x^k)
- Conjugate gradient: p^k = - \nabla f(x^k) + \beta^k p^{k-1}
Approach | Convergence Rate |
---|---|
Steepest descent | Q-linear |
Newton’s method | Q-quadratic |
Quasi-Newton methods | Q-superlinear |
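To make the update concrete, here is a minimal numpy sketch on a toy quadratic; the matrix A, the fixed step length \alpha = 0.1, and the stopping tolerance are made-up choices for this illustration, not part of the notes (a proper step-length rule follows in the next section):

```python
import numpy as np

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, whose exact minimizer is A^{-1} b.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])          # symmetric positive definite
b = np.array([1.0, -2.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b           # nabla f(x)

x = np.zeros(2)
for k in range(1000):
    p = -grad(x)                     # steepest descent: B^k = I
    # p = -np.linalg.solve(A, grad(x))  # Newton direction (here the Hessian is A)
    if np.linalg.norm(p) < 1e-8:     # stop when the gradient (almost) vanishes
        break
    alpha = 0.1                      # fixed step length, for illustration only
    x = x + alpha * p                # x^{k+1} = x^k + alpha^k p^k

print(x, np.linalg.solve(A, b))      # final iterate vs. exact minimizer
```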
Choosing the step length \alpha (review): given a descent direction p^k, for each iteration k=1,\ldots we want to find the \alpha^k that solves

\alpha^k = \argmin_{\alpha>0} f(x^k+\alpha p^k)

Solving this exactly is generally too expensive; in practice it is enough to find an \alpha^k such that

f(x^k + \alpha^k p^k) < f(x^k)

while guaranteeing that f decreases sufficiently. The Wolfe conditions require at each iteration:

- Sufficient decrease (Armijo condition): f(x^k+\alpha p^k) \leq f(x^k) + c_1 \alpha \nabla f(x^k)^\intercal p^k, where c_1 \in (0,1) is a constant (in practice c_1 \approx 10^{-4})
- Curvature condition: \nabla f(x^k + \alpha^k p^k)^\intercal p^k \geq c_2 \nabla f(x^k)^\intercal p^k, where c_2 \in (c_1,1) is a constant (in practice c_2 \approx 0.9 for Newton and quasi-Newton methods)
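The sufficient-decrease condition alone already gives a practical step-length routine; below is a minimal backtracking sketch (it enforces only the Armijo condition, while a full Wolfe line search would also check the curvature condition; alpha0 and rho are conventional made-up defaults):

```python
import numpy as np

def backtracking(f, grad_fx, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    """Backtracking line search: shrink alpha until the sufficient-decrease
    (Armijo) condition f(x + alpha p) <= f(x) + c1 alpha grad f(x)^T p holds.
    Assumes p is a descent direction, i.e., grad_fx @ p < 0."""
    fx, slope = f(x), grad_fx @ p    # slope = (p^k)^T nabla f(x^k) < 0
    alpha = alpha0
    while f(x + alpha * p) > fx + c1 * alpha * slope:
        alpha *= rho                 # reduce the step until f decreases enough
    return alpha
```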
An optimization problem is called a linear program (線型計画) if the objective and constraint functions f_0,\ldots,f_m
are linear, i.e.,
f_i(\alpha x + \beta y) = \alpha f_i(x) + \beta f_i(y), \\ \forall x,y\in\mathbb{R}^n \quad\text{and}\quad \forall \alpha, \beta \in \mathbb{R}
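Since both the objective and the constraints are linear, such problems can be handed to an off-the-shelf solver; a minimal sketch with scipy.optimize.linprog on made-up illustration data:

```python
from scipy.optimize import linprog

# A made-up LP: minimize c^T x subject to A_ub x <= b_ub and x >= 0.
c = [-1.0, -2.0]                      # linear objective coefficients
A_ub = [[-1.0, 1.0],                  # -x1 +   x2 <= 1
        [ 1.0, 2.0]]                  #  x1 + 2 x2 <= 4
b_ub = [1.0, 4.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                 # an optimal point and the optimal value
```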
An optimization problem is called a convex optimization problem (凸最適化問題) if the objective and constraint functions f_0,\ldots,f_m
satisfy
f_i(\alpha x + \beta y) \leq \alpha f_i(x) + \beta f_i(y), \\ \forall x,y\in\mathbb{R}^n \quad\text{and} \\ \forall \alpha, \beta \in \mathbb{R} \;\;\text{s.t.}\;\; \alpha+\beta=1,\,\alpha\geq 0,\,\beta\geq 0
Suppose x_1 \neq x_2 are two points in \mathbb{R}^n. A line (直線) passing through x_1 and x_2 has the form

y = \theta x_1 + (1-\theta)x_2 \; ,

where \theta\in\mathbb{R}. Equivalently,

y = x_2 + \theta(x_1 - x_2) \; .
A set C \subseteq \mathbb{R}^n is affine (アフィン) if the line through any two distinct points in C lies in C:

\theta\,x_1 + (1-\theta)\,x_2 \in C, \quad \forall x_1, x_2 \in C \;\text{and}\; \theta \in \mathbb{R}
If C is an affine set and x_0 \in C, then the set

V = C - x_0 = \{x - x_0 \;|\; x\in C\}

is a subspace: for any v_1, v_2 \in V and \alpha, \beta \in \mathbb{R},

\alpha\,v_1 + \beta\,v_2 + x_0 = \alpha\,(v_1 + x_0) + \beta\,(v_2 + x_0) + (1-\alpha-\beta)\,x_0 \in C

since the coefficients sum to one and C is affine; hence \alpha\,v_1 + \beta\,v_2 \in V.
The dimension of C is the dimension of the subspace V=C-x_0, where x_0 is any element of C.

The set of all affine combinations of points in a set C\subseteq\mathbb{R}^n is called the affine hull (アフィン包) of C, and denoted \aff{C}:

\begin{align} \aff{C} = \{&\theta_1 x_1+ \cdots + \theta_k x_k \;| \\ &\; x_1, \ldots, x_k \in C,\;\theta_1+\cdots+\theta_k=1\}\; . \end{align}
The affine hull is the smallest affine set that contains C, i.e., if S is any affine set with C \subseteq S, then \aff{C} \subseteq S.

The affine dimension of C is the dimension of its affine hull. If the affine dimension of C \subseteq \mathbb{R}^n is less than n, then the affine set \aff{C}\neq \mathbb{R}^n.

We define the relative interior of C, denoted \relint{C}, as its interior relative to \aff{C}:
\relint{C} = \{\, x\in C \;|\; B(x,r) \cap \aff{C} \subseteq C \;\text{for some}\; r>0 \,\}
where
B(x,r) = \{\, y \;|\; \|x-y\| \leq r \,\}
A set C is convex (凸) if the line segment between any two points in C lies in C, i.e., if

\theta\,x_1 + (1-\theta)\,x_2 \in C,\quad \forall x_1,x_2 \in C \;\text{and}\; 0 \leq \theta \leq 1
The convex hull (凸包) of C, denoted \conv{C}, is the set of all convex combinations of points in C:

\begin{align} \conv{C} = \{&\theta_1x_1 + \cdots + \theta_k x_k \;|\;\\ & x_i \in C,\; \theta_i \geq 0,\; i=1,\ldots,k,\; \\ & \theta_1+\cdots+\theta_k=1\}\;. \end{align}

\conv{C} is the smallest convex set that contains C.
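For a finite point set, the convex hull can be computed explicitly; a small sketch with scipy.spatial.ConvexHull on made-up 2D points:

```python
import numpy as np
from scipy.spatial import ConvexHull

# conv C for a finite, made-up point set C in R^2, computed via scipy (Qhull).
points = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0],
                   [0.0, 2.0], [1.0, 1.0]])   # [1, 1] lies inside the square
hull = ConvexHull(points)
print(points[hull.vertices])          # extreme points; [1, 1] is not among them
```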
A set C is called a cone (錐) if
\theta\,x\in C, \forall x \in C \quad\text{and}\quad \theta \geq 0
A set C is called a convex cone (凸錐) if
\theta_1\,x_1 + \theta_2\,x_2 \in C, \forall x_1, x_2 \in C \quad\text{and}\quad \theta_1, \theta_2 \geq 0
The conic hull of C is the set of all conic combinations of points in C, i.e.,
\{ \theta_1\,x_1+\cdots+\theta_k\,x_k \;|\; x_i \in C,\; \theta_i\geq0,\; i=1,\ldots,k \}
The empty set \varnothing, a single point (i.e., singleton) \{x_0\}, and the whole space \mathbb{R}^n are affine (hence, convex) subsets of \mathbb{R}^n.

A ray \{ x_0+\theta\,v \;|\; \theta\geq0 \} (with v\neq0) is convex, but not affine. It is a convex cone if x_0=0.
A hyperplane is a set of the form

\{\;x \;|\; a^\intercal x = b\;\}

where a \in \mathbb{R}^n, a \neq 0, and b \in \mathbb{R}. It is the solution set of a nontrivial linear equation in x (and thus an affine set), with normal vector a and offset from the origin b. A hyperplane divides \mathbb{R}^n into two halfspaces (半空間), which have the form

\{\;x \;|\; a^\intercal x \leq b\;\}
A (Euclidean) ball in \mathbb{R}^n has the form

\begin{align} B(x_c, r) &= \{\; x \;|\; \|x-x_c\|_2 \leq r\;\} \\ &= \{\; x \;|\; (x-x_c)^\intercal (x-x_c) \leq r^2 \;\} \end{align}

where r>0, x_c is the center of the ball, and r is the radius.

A polyhedron is the solution set of a finite number of linear inequalities and equalities:

\begin{align} \mathcal{P} = \{\;x \;|\; & a_j^\intercal x - b_j \leq 0,\; j=1,\ldots,m, \\ & c_j^\intercal x-d_j=0,\; j=1,\ldots,p\;\} \end{align}
If S_1 and S_2 are convex, then S_1 \cap S_2 is convex.

An affine function has the form f(x) = Ax+b, where A\in\mathbb{R}^{m\times n} and b\in\mathbb{R}^m. Suppose S\subseteq \mathbb{R}^n is convex and f \colon \mathbb{R}^n \to \mathbb{R}^m is an affine function. Then the image of S under f,

f(S) = \{ f(x) \;|\; x\in S \}
is convex.

The perspective function P\colon \mathbb{R}^{n+1} \to \mathbb{R}^n, P(z, t) = z/t
, with domain \dom{P}=\mathbb{R}^n \times \mathbb{R}_{++}
(\mathbb{R}_{++}=\{x\in\mathbb{R} \;|\; x>0\}
). If C\subseteq \dom{P}
is convex, then its image
P(C)= \{P(x) \;|\; x\in C \}
is convex.

A function f\colon \mathbb{R}^n \rightarrow \mathbb{R} is convex (凸) if \dom{f} is a convex set and if for all x, y \in \dom{f}, and \theta with 0 \leq \theta \leq 1, we have

f(\theta\,x+(1-\theta)\,y) \leq \theta\,f(x) + (1-\theta)\,f(y)

f is strictly convex if strict inequality holds whenever x \neq y and 0 < \theta < 1. f is concave (凹) if -f is convex.
A convex function can be extended to all of \mathbb{R}^n by defining its value to be \infty outside of its domain. If f is convex, we define its extended-value extension \tilde{f}\colon \mathbb{R}^n \rightarrow \mathbb{R}\cup\{\infty\} by

\tilde{f}(x) = \left\{\begin{array}{lr} f(x) & x\in\dom{f} \\ \infty & x\notin\dom{f} \\ \end{array}\right.
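As a toy illustration (the particular f is my choice, not from the notes), the extension of f(x) = -\log x, which is convex with \dom{f} = \mathbb{R}_{++}:

```python
import numpy as np

# f(x) = -log(x) is convex with dom f = R_++ = {x | x > 0}.
def f(x):
    return -np.log(x)

# Extended-value extension: identical to f on dom f, +inf everywhere else.
def f_tilde(x):
    return f(x) if x > 0 else np.inf

print(f_tilde(2.0), f_tilde(-1.0))    # -log(2) and inf
```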
Suppose f is differentiable. Then f is convex if and only if \dom{f} is convex and

f(y) \geq f(x) + \nabla f(x)^\intercal (y-x)

holds for all x, y\in\dom{f}.

If f is twice differentiable, then f is convex if and only if \dom{f} is convex and its Hessian is positive semi-definite: for all x\in\dom{f},

\nabla^2f(x) \succeq 0

where X\succeq Y denotes the matrix inequality between symmetric matrices X and Y. For functions on \mathbb{R}, this reduces to f''(x)\geq 0 (and \dom{f} being convex).
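The second-order condition is easy to spot-check numerically; a sketch with a made-up twice-differentiable function whose Hessian has a closed form (sampling only suggests, rather than proves, convexity):

```python
import numpy as np

# Spot-check nabla^2 f(x) >= 0 for the made-up convex function
# f(x) = exp(x1 + x2) + x1^2, whose Hessian is [[e + 2, e], [e, e]], e = exp(x1 + x2).
def hessian(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e],
                     [e,       e]])

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.uniform(-3.0, 3.0, size=2)
    assert np.linalg.eigvalsh(hessian(x)).min() >= -1e-9  # PSD at this point
print("nabla^2 f(x) was positive semi-definite at all sampled points")
```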
Examples of convex and concave functions:

- e^{ax} is convex on \mathbb{R}, for any a\in\mathbb{R}
- x^a is convex on \mathbb{R}_{++} when a\geq 1 or a\leq 0, and concave for 0\leq a \leq 1
- |x|^p, for p\geq1, is convex on \mathbb{R}
- \log{x} is concave on \mathbb{R}_{++}
- x \log{x} is convex on \mathbb{R}_{++}
- Every norm on \mathbb{R}^n is convex
- f(x) = \max\{x_1,\ldots,x_n\} is convex on \mathbb{R}^n
Proof: f(x) = \max\{x_1,\ldots,x_n\} satisfies, for 0 \leq \theta \leq 1,

\begin{align} f(\theta\,x + (1-\theta)\,y) &= \max_i (\theta\,x_i + (1-\theta)\,y_i) \\ &\leq \theta\max_i x_i + (1-\theta)\max_i y_i \\ &= \theta f(x) + (1-\theta)f(y) \end{align}
A weighted sum f = w_1 f_1 + \cdots + w_m f_m is convex when w_i > 0 and the f_i are convex.

Suppose f \colon \mathbb{R}^n \rightarrow \mathbb{R}, A\in\mathbb{R}^{n\times m}, and b\in\mathbb{R}^n. Define g \colon \mathbb{R}^m \rightarrow \mathbb{R}
by
g(x) = f(Ax+b)
with \dom{g} = \{\,x \;|\; Ax+b\in\dom{f}\,\}. Then if f is convex, so is g.
If f_1 and f_2 are convex functions, then their pointwise maximum f, defined by

f(x) = \max\{\,f_1(x),\,f_2(x)\,\}

is also convex.

Example: the hinge loss

L(y) = \max( 0,\;1-t\, y )

where
- y is the output of a model (e.g., y = w^\intercal x + b)
- t is either +1 or -1, depending on the true class of the example

As a pointwise maximum of functions that are affine in the model parameters, it is convex, and so is the total loss over a dataset:

\sum_{i=1}^N \max(0,\;1-t_i\,w^\intercal x_i)
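A direct numpy transcription of this total loss (the data and weights below are made-up):

```python
import numpy as np

# Total hinge loss over a toy dataset (X: one example per row, t in {-1, +1}).
def total_hinge_loss(w, X, t):
    margins = 1.0 - t * (X @ w)            # 1 - t_i w^T x_i
    return np.maximum(0.0, margins).sum()  # sum_i max(0, 1 - t_i w^T x_i)

X = np.array([[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]])
t = np.array([1.0, -1.0, -1.0])
w = np.array([0.5, -0.5])
print(total_hinge_loss(w, X, t))
```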
\begin{align} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \leq 0, \quad i=1,\ldots,m \\ & h_i(x) = 0, \quad i=1,\ldots,p \end{align}
where the problem domain is

\mathcal{D} = \bigcap_{i=0}^m \dom{f_i} \cap \bigcap_{i=1}^p \dom{h_i}
The optimal value p^\star is defined as

\begin{align} p^\star = \inf \{\, f_0(x) \;|\; &f_i(x) \leq 0,\,i=1,\ldots,m, \\ &h_i(x)=0,\,i=1,\ldots,p\,\} \end{align}

If the problem is infeasible, p^\star takes the extended value +\infty; if p^\star=-\infty, we say the problem is unbounded below. A point x^\star is an optimal point or solution if it is feasible and f_0(x^\star)=p^\star. The set of all optimal points is

\begin{align} X_{opt} = \{\, x \;|\; &f_i(x) \leq 0, i=1,\ldots,m, \\ &h_i(x)=0, i=1,\ldots,p,\; f_0(x)=p^\star \,\} \end{align}

The feasibility problem asks only to find a point satisfying the constraints:
\begin{align} \text{find} \quad & x \\ \text{subject to} \quad & f_i(x) \leq 0, \quad i=1,\ldots,m \\ & h_i(x) = 0, \quad i=1,\ldots,p \end{align}
If such an x exists, we say that the problem is feasible; otherwise, it is infeasible.
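A common trick is to pose feasibility as minimization of the constant objective 0; a sketch with cvxpy (assumed installed) and made-up constraints:

```python
import cvxpy as cp

# Feasibility as optimization: minimize the constant 0 subject to the constraints.
x = cp.Variable(2)
constraints = [x[0] + x[1] == 1, x >= 0, x[0] - x[1] <= 0.5]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status, x.value)  # status "optimal" means a feasible point was found
```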
Example: find a signal y \in \mathbb{R}^n that approximates a noisy input signal x:

\begin{align} \min_y&\quad \frac{1}{n} \sum_{i=1}^n (x_i-y_i)^2 + \lambda \frac{1}{n-1} \sum_{i=1}^{n-1} |y_{i+1} - y_i| \end{align}
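This objective is convex (a convex quadratic plus a convex absolute-value penalty), so it can be sketched directly in cvxpy (assumed installed); the signal and the weight lambda_ are made-up for illustration:

```python
import cvxpy as cp
import numpy as np

# Denoising problem above: quadratic data term plus absolute-difference smoothness.
n = 200
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, n)) + 0.3 * rng.normal(size=n)  # noisy input

y = cp.Variable(n)
lambda_ = 5.0
data_term = cp.sum_squares(x - y) / n                 # (1/n) sum (x_i - y_i)^2
smooth_term = cp.sum(cp.abs(cp.diff(y))) / (n - 1)    # (1/(n-1)) sum |y_{i+1} - y_i|
cp.Problem(cp.Minimize(data_term + lambda_ * smooth_term)).solve()
print(y.value[:5])                                    # first few denoised samples
```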
Example: multi-region image segmentation (e.g., on the KITTI semantic segmentation benchmark). Partition the image domain \Omega \subset \mathbb{R}^2 into k+1 regions:

\begin{align} \min_{\{\Omega_i\}}&\quad \sum_{i=0}^k \int_{\Omega_i} f_i(x)\, dx + \frac{1}{2}\sum_{i=1}^k | \partial \Omega_i | \\ \text{subject to}&\quad \bigcup_{i=0}^k \Omega_i = \Omega, \quad \Omega_s \cap \Omega_t = \varnothing\quad \forall s \neq t \end{align}