The example from the previous lecture suggests that we need to differentiate two types of stability: the asymptotic behavior as $t\rightarrow\infty$ and the transient behavior at finite times.
Take a closer look at the response in terms of "energy" $$ E(t)=\norm{x(t)}^2 $$
The eigenvectors and eigenvalues of $A$ are $$ V=\begin{bmatrix} \cos(\pi/4-\delta) & \cos(\pi/4+\delta) \\ \sin(\pi/4-\delta) & \sin(\pi/4+\delta) \end{bmatrix},\quad \Lambda=\begin{bmatrix} -0.1 & 0 \\ 0 & -0.2 \end{bmatrix} $$
Both eigenvalues have negative real parts, so the linear system should be stable, at least as $t\rightarrow \infty$.
But as $\delta\rightarrow 0$, the two eigenvectors become nearly parallel and $V$ becomes ill-conditioned.
import numpy as np

X0 = [1, 0.1]
ds = [np.pi/4, 0.1, 0.05, 0.02]
pltEnergy(X0, ds)  # plot E(t) for each delta in ds (helper defined elsewhere in the notebook)
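The plotting helpers used in this lecture are defined elsewhere in the notebook. For concreteness, here is a minimal sketch of what `pltEnergy` might do, assuming the system matrix is assembled from the $V$ and $\Lambda$ given above; the helper `makeA` is hypothetical and is reused by the later sketches.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import expm

def makeA(delta):
    # assemble A = V @ Lambda @ inv(V) from the eigenpairs defined above
    V = np.array([[np.cos(np.pi/4 - delta), np.cos(np.pi/4 + delta)],
                  [np.sin(np.pi/4 - delta), np.sin(np.pi/4 + delta)]])
    return V @ np.diag([-0.1, -0.2]) @ np.linalg.inv(V)

def pltEnergy(X0, ds, T=60.0, n=300, **kwargs):
    # plot E(t) = ||x(t)||^2 = ||exp(A t) x0||^2 for each delta in ds;
    # extra styling options (e.g. awid) are ignored in this sketch
    x0, ts = np.asarray(X0, dtype=float), np.linspace(0.0, T, n)
    for d in ds:
        A = makeA(d)
        plt.plot(ts, [np.linalg.norm(expm(A*t) @ x0)**2 for t in ts],
                 label=f"delta={d:.3f}")
    plt.xlabel("t"); plt.ylabel("E(t)"); plt.legend(); plt.show()
```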
Consider a linear system, $$ \dot{x} = Ax + f,\quad x(0)=x_0 $$ When $f=0$, the solution is (by matrix exponential) $$ x(t) = \exp(At)x_0 $$ When $x_0=0$, the solution is (by convolution) $$ x(t) = \int_0^t \exp((t-\tau)A)f(\tau)d\tau $$
We will first focus on the zero-input case
Using the concept of energy, define stability in terms of the growth of perturbation energy over time, based on the matrix exponential: $$ G(t) = \max_{x_0}\frac{\norm{x(t)}^2}{\norm{x_0}^2} = \max_{x_0}\frac{\norm{\exp(At)x_0}^2}{\norm{x_0}^2} = \norm{\exp(At)}^2 $$ which represents the maximum possible amplification of perturbation energy over all initial conditions.
Recall that a matrix norm $\norm{A}$ can be "induced" by a vector norm (typically L2), $\norm{A}=\max_{x\neq 0} \frac{\norm{Ax}}{\norm{x}}$
If we diagonalize $A=V\Lambda V^{-1}$, then $$ G(t) = \norm{\exp(At)}^2 = \norm{V\exp(\Lambda t)V^{-1}}^2 $$
Again, if we follow linear stability analysis, then we would deduce the behavior of $G(t)$ from $\exp(\Lambda t)$, or simply $\Lambda$, only. But this is insufficient to characterize $G(t)$ as shown in the previous example.
Let's think in terms of the following bounds on $G(t)$, where $\lambda_\max$ denotes the largest real part among the eigenvalues of $A$, $$ \exp(2\lambda_\max t) \leq G(t) \leq \norm{V}^2\norm{V^{-1}}^2\exp(2\lambda_\max t) $$
The first inequality follows from choosing $x_0$ as the eigenvector associated with $\lambda_\max$; the second is the submultiplicative property of matrix norms, $\norm{ABC}\leq\norm{A}\norm{B}\norm{C}$
If the system is linearly stable, i.e., $Re(\lambda_i)<0$ for all eigenvalues, both bounds decay exponentially and $G(t)\rightarrow 0$ as $t\rightarrow\infty$.
Note the term $\norm{V}\norm{V^{-1}}$ - looking familiar? It is the condition number of $V$, $\kappa(V)$.
Now, if $V$ is "well-behaved", so that $\kappa(V)\approx 1$, then $$ G(t) \approx \exp(2\lambda_\max t) $$ i.e. the energy amplification is governed by the least stable eigenvalue, $\lambda_\max$.
But if $V$ is "badly-behaved", so that $\kappa(V)\gg 1$, then $$ \exp(2\lambda_\max t) \leq G(t) \leq \kappa(V)^2\exp(2\lambda_\max t) $$ i.e. the energy amplification is governed by $\lambda_\max$ only at large times; at short times $G(t)$ may exceed the eigenvalue prediction by up to a factor of $\kappa(V)^2$, permitting large transient growth even in a linearly stable system.
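A quick numerical check of this two-sided bound (a sketch, reusing the hypothetical `makeA` from above):

```python
import numpy as np
from scipy.linalg import expm

delta = 0.05
A = makeA(delta)
V = np.array([[np.cos(np.pi/4 - delta), np.cos(np.pi/4 + delta)],
              [np.sin(np.pi/4 - delta), np.sin(np.pi/4 + delta)]])
kappa, lam_max = np.linalg.cond(V, 2), -0.1   # kappa(V) and least stable eigenvalue
for t in [1.0, 10.0, 30.0]:
    G = np.linalg.norm(expm(A*t), 2)**2
    print(f"t={t:5.1f}: {np.exp(2*lam_max*t):.3e} <= G={G:.3e} "
          f"<= {kappa**2 * np.exp(2*lam_max*t):.3e}")
```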
Now we introduce the concept of normality of a matrix $A$.
Consider the rate of change of $E(t)$, using $\dot x=Ax$: $$ \frac{dE(t)}{dt} = \dot x^H x + x^H \dot x = x^HA^Hx + x^HAx = 2Re(x^HAx) $$
The last step is because $x^HA^Hx$ and $x^HAx$ are scalars and complex conjugates of each other, so their sum is twice the real part.
With this form of $dE(t)/dt$, some people found it compelling to define the numerical range $$ F(A) = \left\{ z\ |\ z=\frac{x^HAx}{x^Hx},\ x\neq 0 \right\} $$ i.e. the set of (scaled) relative growth rates of energy, since $\frac{1}{E}\frac{dE}{dt}=2Re(z)$ when the state is $x$.
If you have heard of the Rayleigh quotient, then $F(A)$ is essentially the set of all possible values of that quotient.
Things we need to know about the numerical range:
- $F(A)$ is a convex and compact subset of $\mathbb{C}$ (the Toeplitz-Hausdorff theorem);
- $F(A)$ contains the eigenvalues of $A$;
- when $A$ is normal, $F(A)$ is exactly the convex hull of the eigenvalues.
Also, we define the maximum real part over the numerical range as the numerical abscissa, $$ \alpha(A) = \max Re(F(A)) $$ which equals the largest eigenvalue of the Hermitian part $(A+A^H)/2$ and bounds the initial growth rate: $\frac{dE}{dt} \leq 2\alpha(A)E$.
ds = [np.pi/4, 0.175, 0.1, 0.05] # Note at \delta=0.175, \alpha(A)~=0
pltNumRange(ds)
X0 = [1, 0.1]
ds = [np.pi/4, 0.175, 0.1, 0.05] # As expected at \delta=0.175, energy growth~=0
pltEnergy(X0, ds, awid=0.1)
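The numerical abscissa itself is easy to compute: per the Hermitian-part characterization above, it is the largest eigenvalue of $(A+A^H)/2$. A sketch (again assuming the hypothetical `makeA`):

```python
import numpy as np

def num_abscissa(A):
    # alpha(A) = largest eigenvalue of the Hermitian part of A
    return np.max(np.linalg.eigvalsh((A + A.conj().T) / 2))

for d in [np.pi/4, 0.175, 0.1, 0.05]:
    print(f"delta={d:.3f}: alpha(A) = {num_abscissa(makeA(d)):+.4f}")
```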
Let's think about the meaning of $\alpha(A)$ and the initial condition $x_0^*$ that generates it: $x_0^*$ is the eigenvector of the Hermitian part $(A+A^H)/2$ associated with its largest eigenvalue, i.e. the direction of fastest initial energy growth.
Think again about matrix normality: when $A$ is normal, $\alpha(A)=\max_i Re(\lambda_i)$, so the initial growth rate matches the asymptotic rate; a gap between $\alpha(A)$ and $\max_i Re(\lambda_i)$ is a signature of non-normality.
Consider a non-normal system that initially undergoes fast growth and eventually decays to zero. It is of interest to estimate the maximum amplitude of the growth, or the maximum gain of the system $$ G_\max = \max_{t\geq 0}\norm{\exp(tA)} $$
We essentially want $$ \tilde{G}_\max(t^*) = \max_{x_0} \frac{\norm{\exp(t^*A)x_0}}{\norm{x_0}} = \norm{\exp(t^*A)} $$ Then the SVD of $\exp(t^*A)=U\Sigma V^H$ solves the problem: the maximum gain at $t^*$ is $\sigma_\max$, attained by the initial condition $x_0=v_1$ (the leading right singular vector), with the response aligned with $u_1$.
Also, the minimum singular value $\sigma_\min$ corresponds to the minimum gain of the system, i.e. we cannot get lower than that!
delta = 0.05 # This is a highly non-normal case, and we get a pretty large gain (and disparity in gains)
ti = 14
pltInitGain(delta, ti)
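A sketch of the computation presumably behind `pltInitGain`: the SVD of $\exp(t^*A)$ directly yields the extreme gains and the initial conditions that achieve them (reuses the hypothetical `makeA`):

```python
import numpy as np
from scipy.linalg import expm

delta, t_star = 0.05, 14.0
U, s, Vh = np.linalg.svd(expm(t_star * makeA(delta)))
print("max gain:", s[0],  "  optimal x0 =", Vh[0])   # response direction U[:, 0]
print("min gain:", s[-1], "  worst   x0 =", Vh[-1])  # nothing decays below s[-1]
```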
We would like the maximum gain of the system $$ G_\max = \max_{t\geq 0}\norm{\exp(tA)} = \max_{t\geq 0}\tilde{G}_\max(t) $$ Consider the Laplace transform of the solution $$ \tilde{x}(s) = \int_0^\infty \exp(-st)\exp(tA)x_0 dt = (sI-A)^{-1}x_0 $$ Here we can define the resolvent operator $$ H(s) = (sI-A)^{-1} \equiv \int_0^\infty \exp(-st)\exp(tA) dt $$ where the integral converges for $Re(s)>\lambda_\max$.
Using inequalities for integrals, we can find a lower bound for $G_\max$, $$ \norm{H(s)} \leq \int_0^\infty |\exp(-st)| \norm{\exp(tA)} dt \leq \max_{t\geq 0}\norm{\exp(tA)} \int_0^\infty e^{-Re(s)t} dt = \frac{G_\max}{Re(s)} $$ or $$ \sup_{Re(s)>0} Re(s)\norm{H(s)} \leq G_\max $$ The left-hand side, a lower bound for $G_\max$, is called the Kreiss constant, $K(A)$.
$K(A)\geq 1$, and $K(A)=1$ if $A$ is normal - think about why.
It turns out one can also find an upper bound using $K(A)$ (the Kreiss matrix theorem), so that $$ K(A)\leq G_\max \leq e\,n\,K(A) $$ where $e=\exp(1)$ and $n$ is the dimension of $A$.
So the Kreiss constant is very useful for characterizing the maximum gain.
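A crude way to estimate $K(A)$ numerically is a grid search over the right half-plane (a sketch, assuming the hypothetical `makeA`):

```python
import numpy as np

def kreiss_constant(A, res=np.logspace(-3, 1, 60), ims=np.linspace(-1, 1, 61)):
    # K(A) = sup over Re(s)>0 of Re(s) * ||(sI - A)^{-1}||, here on a finite grid
    I, best = np.eye(A.shape[0]), 0.0
    for a in res:
        for b in ims:
            best = max(best, a * np.linalg.norm(np.linalg.inv((a + 1j*b)*I - A), 2))
    return best

A = makeA(0.05)
K = kreiss_constant(A)
print(f"K(A) ~ {K:.2f}  =>  {K:.2f} <= G_max <= {np.e * 2 * K:.2f}")  # n = 2 here
```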
One may also find another upper bound for $G_\max$ from complex analysis, $$ \norm{\exp(tA)} = \norm{\frac{1}{2\pi i} \oint_\Lambda \exp(zt) (zI-A)^{-1} dz} \leq \frac{1}{2\pi} \oint_\Lambda |\exp(zt)|\,\norm{(zI-A)^{-1}} |dz| $$ If the contour $\Lambda$ encloses the spectrum while staying in $Re(z)\leq 0$, then $|\exp(zt)|\leq 1$ for $t\geq 0$ and $$ G_\max \leq \frac{1}{2\pi} \oint_\Lambda \norm{H(s)} |ds| $$
delta = 0.05
ts = range(1,80)
pltGainSweep(delta, ts) # So Kreiss constant does provide an estimate of the lower bound of gains
Similarly, we can think about the zero-initial-condition solution $$ x(t) = \int_0^t \exp((t-\tau)A)f(\tau)d\tau $$
Assume harmonic forcing $f(t)=\hat{f}\exp(i\omega t)$; the steady-state response is then $\hat{x}\exp(i\omega t)$ with $$ \hat{x} = (i\omega I-A)^{-1} \hat{f} $$ We then want the maximum gain at frequency $\omega$, $$ G_\max(\omega) = \max_{\hat{f}} \frac{\norm{(i\omega I-A)^{-1} \hat{f}}}{\norm{\hat{f}}} $$ Again the SVD gives the solution. Let $(i\omega I-A)^{-1}=U(\omega)\Sigma(\omega) V(\omega)^H$; then the columns of $V(\omega)$ are the forcing directions, the columns of $U(\omega)$ the corresponding response directions, and
the range of $\sigma(\omega)$'s gives the range of possible gains at the given frequency $\omega$
W = np.linspace(0.01, 0.75, 51)
ds = [np.pi/4, 0.05]
pltSVDsweep(W, ds)
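The computation behind such a sweep is just an SVD of the resolvent at each frequency; a sketch (assuming the hypothetical `makeA`):

```python
import numpy as np

def resolvent_gains(A, w):
    # singular values of (i w I - A)^{-1}: the possible gains at frequency w
    return np.linalg.svd(np.linalg.inv(1j*w*np.eye(A.shape[0]) - A),
                         compute_uv=False)

for d in [np.pi/4, 0.05]:
    smax, smin = resolvent_gains(makeA(d), 0.3)
    print(f"delta={d:.3f}: max gain {smax:.2f}, min gain {smin:.4f}")
```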
S = makeSys2D(0.785) # delta ~ pi/4: the normal system behaves "normally", no extremely large gains
ugv = frMIMO(S, [0.3], ifFull=True)
pltGainResp(S, 0.3, ugv, 0) # The first (larger-gain) forcing direction
# The second (smaller-gain) forcing direction
pltGainResp(S, 0.3, ugv, 1)
S = makeSys2D(0.05) # But things get crazy in non-normal systems
ugv = frMIMO(S, [0.3], ifFull=True)
pltGainResp(S, 0.3, ugv, 1) # The mild one (smaller gain)
# The wild one (larger gain)
pltGainResp(S, 0.3, ugv, 0)
Think about the example more, $$ \norm{(i\omega I-A)^{-1}} = \norm{V(i\omega I-\Lambda)^{-1}V^{-1}} \leq \kappa(V)\,\norm{(i\omega I-\Lambda)^{-1}} = \frac{\kappa(V)}{\min_i |i\omega-\lambda_i|} $$ Large gains can therefore arise either from resonance ($i\omega$ close to an eigenvalue) or from non-normality ($\kappa(V)\gg 1$).
Consider the eigenvalue problem of $A$ perturbed by another matrix $E$, where $\norm{E}=\epsilon$ is small, $$ (A+E)u=\lambda u,\quad \mbox{or}\quad (A-\lambda I)u = -Eu $$ Using the matrix norm inequalities, $$ \begin{align} \norm{(A-\lambda I)u} &= \norm{Eu} \leq \norm{E}\norm{u} = \epsilon\norm{u} \\ \epsilon^{-1} &\leq \frac{\norm{u}}{\norm{(A-\lambda I)u}} \leq \norm{(A-\lambda I)^{-1}} \end{align} $$ Or $$ \norm{H(\lambda)} \geq \epsilon^{-1} $$ In other words, every eigenvalue of every perturbation $A+E$ with $\norm{E}=\epsilon$ lies in the region where the resolvent norm is at least $\epsilon^{-1}$.
# Note how, as non-normality increases, the pseudospectrum protrudes into the right
# half-plane, indicating that the system is more easily destabilized by perturbations.
pltPseudoSpec([np.pi/4, 0.175, 0.1, 0.05], range(-5, 0))
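A sketch of how such a plot can be produced: since $\norm{(zI-A)^{-1}} = 1/s_\min(zI-A)$, the boundary of $\sigma_\epsilon(A)$ is the level set $s_\min=\epsilon$, which can be contoured on a grid (assumes the hypothetical `makeA`):

```python
import numpy as np
import matplotlib.pyplot as plt

A = makeA(0.05)
x, y = np.meshgrid(np.linspace(-0.5, 0.3, 200), np.linspace(-0.4, 0.4, 200))
smin = np.array([[np.linalg.svd((xi + 1j*yi)*np.eye(2) - A,
                                compute_uv=False)[-1]
                  for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)])
plt.contour(x, y, np.log10(smin), levels=range(-5, 0))  # eps = 1e-5 ... 1e-1
plt.axvline(0, color="k", lw=0.5)
plt.xlabel("Re z"); plt.ylabel("Im z"); plt.show()
```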
One can define the spectrum of $A$, which is equivalent to the set of eigenvalues $$ \sigma(A) = \left\{ z\in\mathbb{C}\ |\ (zI-A)^{-1} \mbox{ does not exist} \right\} $$
Here we are following the common notation in literature, but be careful to differentiate from singular values of $A$.
Then the pseudospectrum can be thought of as a generalization of the spectrum $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ \norm{(zI-A)^{-1}} > \epsilon^{-1} \right\} $$
So that we formally have $$ \lim_{\epsilon\rightarrow 0}\sigma_\epsilon(A)=\sigma(A) $$ since as $\epsilon\rightarrow 0$, membership requires $\norm{(zI-A)^{-1}}\rightarrow\infty$, meaning $(zI-A)^{-1}$ does not exist.
In fact, there is a theorem stating that the intersection of $\sigma_\epsilon(A)$ over all $\epsilon$ gives $\sigma(A)$, $$ \bigcap_{\epsilon>0}\sigma_\epsilon(A)=\sigma(A) $$
With the above connection, one can also have the following equivalent definitions $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ z\in\sigma(A+E) \mbox{ for some } E \mbox{ with } \norm{E}<\epsilon \right\} $$ $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ \exists v\neq 0,\ \norm{(zI-A)v}<\epsilon \norm{v} \right\} $$
And one can also define the abscissa of the pseudospectrum, similar to the numerical abscissa, as $$ \alpha_\epsilon(A) = \max Re(\sigma_\epsilon(A)) $$ Then this connects to the Kreiss constant again $$ K(A) = \sup_{\epsilon>0} \frac{\alpha_\epsilon(A)}{\epsilon} $$
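This identity can be checked numerically by reusing the $s_\min$ grid from the pseudospectrum sketch above: for each $\epsilon$, approximate $\alpha_\epsilon(A)$ by the largest $Re(z)$ among grid points with $s_\min(z)<\epsilon$, then maximize $\alpha_\epsilon/\epsilon$ (a rough sketch; accuracy is limited by the grid extent and resolution):

```python
import numpy as np

# reuse x, y, smin from the pseudospectrum sketch above
K = 0.0
for eps in np.logspace(-4, -0.5, 40):
    inside = smin < eps                    # grid points in sigma_eps(A)
    if inside.any():
        K = max(K, x[inside].max() / eps)  # alpha_eps(A)/eps on this grid
print("K(A) ~", K)
```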
To characterize a linear system, especially in terms of stability, we now know:
- the eigenvalues of $A$ (in particular $\lambda_\max$) govern the asymptotic behavior as $t\rightarrow\infty$;
- the numerical abscissa $\alpha(A)$ gives the maximum initial growth rate;
- the Kreiss constant $K(A)$, and equivalently the pseudospectra, bound the maximum transient gain $G_\max$;
- the resolvent norm characterizes the gain under harmonic forcing.
In the next lecture let's revisit Dynamic Mode Decomposition in terms of resolvent analysis
Finally, the analysis using the resolvent operator extends to more complex dynamical systems, such as the high-dimensional linearized systems arising in hydrodynamic stability.
See [Schmid2009] for more details.
$A$ is normal if it is unitarily diagonalizable: $A = QDQ^H$, where $Q$ is unitary and $D$ is diagonal
$A$ is normal iff it is diagonalizable and its different eigenspaces are orthogonal to each other
For a general diagonalizable matrix, $A = XDX^{-1}$, the condition number $\kappa(X)$ can be arbitrarily large, which may pose an issue in numerical computation.
Yet, one should also keep the non-diagonalizable matrices in mind
For example, in 2D, real normal matrices can only be of the following two forms (symmetric, or a scaled rotation), $$ \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \begin{bmatrix} a & b \\ -b & a \end{bmatrix} $$
And one example type of non-normal matrix is $$ \begin{bmatrix} a & b \\ 0 & a \end{bmatrix},\quad b\neq 0 $$ which is a Jordan block and is not even diagonalizable.
Alternatively, we can define a commutator, which is always Hermitian, $$ C = AA^H- A^HA $$ and $A$ is normal precisely when $C=0$.
When $A$ is non-normal, $C\neq 0$. Some general properties of $C$: it is Hermitian with $tr(C)=0$ (since $tr(AA^H)=tr(A^HA)$), so its eigenvalues are real and sum to zero.
There are many ways to quantify the degree of non-normality. In the previous sections, we have seen the numerical range/abscissa.
While $A$ is not necessarily unitarily diagonalizable, it always has a Schur decomposition, $$ A = QTQ^H $$ where $Q$ is unitary and $T=D+M$ is upper triangular; $D$ is still the eigenvalue matrix, and $M$ is a strictly upper triangular matrix.
When $A$ is normal, certainly $M=0$ and the Schur decomposition reduces to the eigendecomposition.
This suggests the departure from normality, $\nu(A)=\norm{M}_F$, as a measure of non-normality (note that $\norm{M}_F^2 = \norm{A}_F^2 - \sum_i|\lambda_i|^2$ does not depend on the particular Schur form). It is bounded in terms of the commutator as $$ \frac{\norm{C}_F}{4\norm{A}_2} \le \nu(A) \le \Bigl( \frac{n^3-n}{12} \Bigr)^{1/4} \norm{C}_F^{1/2}. $$
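A sketch that computes $\nu(A)$ via the Schur form and checks these bounds (assumes the hypothetical `makeA`):

```python
import numpy as np
from scipy.linalg import schur

A = makeA(0.05)
n = A.shape[0]
T, Q = schur(A, output="complex")
nu = np.linalg.norm(T - np.diag(np.diag(T)), "fro")   # ||M||_F
C = A @ A.conj().T - A.conj().T @ A
lo = np.linalg.norm(C, "fro") / (4 * np.linalg.norm(A, 2))
hi = ((n**3 - n) / 12) ** 0.25 * np.linalg.norm(C, "fro") ** 0.5
print(f"{lo:.4f} <= nu(A) = {nu:.4f} <= {hi:.4f}")
```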
Alternatively, one can define the distance to normality by "brute force" as the minimum "effort" $E$ required to make $A$ normal, $$ d(A) = \min \left\{ \norm{E}_F: A+E \in \mathbb{C}^{n\times n} \mbox{ is normal} \right\} $$ See [Higham1988] for algorithms for computing $d(A)$.
It is bounded in terms of $\nu(A)$ as $$ \frac{\nu(A)}{n^{1/2}} \le d(A) \le \nu(A) $$
Considering only the diagonalizable matrices, $A=X\Lambda X^{-1}$, one may simply use the condition number of $X$ to quantify the non-normality. Specifically, we consider the L2-norm condition number $$ \kappa_2(X) = \norm{X}\norm{X^{-1}} $$
But since the diagonalizing $X$ is not unique, we pick the one with the smallest $\kappa_2$, i.e., $$ \kappa_2(X) = \min\left\{ \kappa_2(Y): A = YDY^{-1}, D \mbox{ diagonal} \right\}. $$
Next we show a series of bounds relevant to $\kappa_2(X)$ and observe that when $\kappa_2(X)=1$, i.e. when $A$ is normal, all the bounds simplify to equalities. This makes $\kappa_2(X)$ a good indicator of non-normality.
Notations: $\lambda_i(A)$ and $\sigma_i(A)$ denote the $i$-th eigenvalue and singular value of $A$, both ordered by decreasing magnitude; $\rho(A)=\max_i|\lambda_i(A)|$ is the spectral radius; $\norm{\cdot}_2$ is the L2-induced (spectral) norm.
Utilizing the norm inequality on $A=X\Lambda X^{-1}$ $$ \frac{\|A\|_2}{\kappa_2(X)} \le \rho(A) \le \|A\|_2. $$
Relating singular values and eigenvalues $$ \frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i(A)| \le \kappa_2(X) \sigma_i(A) $$
For any real $p > 0$, $$ \frac{\rho(A)^p}{\kappa_2(X)} \le \norm{A^p}_2 \le \kappa_2(X) \rho(A)^p. $$
For any function $f$ defined on the spectrum of $A$, $$ \frac{\max_i|f(\lambda_i)|}{\kappa_2(X)} \le \norm{f(A)}_2 \le \kappa_2(X) \max_i|f(\lambda_i)|. $$
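A final sketch checking the $f(A)$ bound with $f=\exp$ (note that `np.linalg.eig` returns some eigenvector matrix $X$, not necessarily the one minimizing $\kappa_2$, so the computed bounds are valid but conservative; assumes the hypothetical `makeA`):

```python
import numpy as np
from scipy.linalg import expm

A = makeA(0.1)
lam, X = np.linalg.eig(A)
kappa = np.linalg.cond(X, 2)        # kappa_2 of this particular X
fmax = np.max(np.abs(np.exp(lam)))  # max_i |f(lambda_i)| with f = exp
nrm = np.linalg.norm(expm(A), 2)
print(f"{fmax/kappa:.4f} <= ||exp(A)||_2 = {nrm:.4f} <= {kappa*fmax:.4f}")
```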