Machine Learning for Engineering¶

Dynamical System: Nonnormality¶

Instructor: Daning Huang¶

$$ \newcommand\norm[1]{{\left\lVert #1 \right\rVert}} \newcommand{\bR}{\mathbb{R}} \newcommand{\vf}{\mathbf{f}} \newcommand{\vg}{\mathbf{g}} \newcommand{\vv}{\mathbf{v}} \newcommand{\vw}{\mathbf{w}} \newcommand{\vx}{\mathbf{x}} \newcommand{\vy}{\mathbf{y}} \newcommand{\vA}{\mathbf{A}} \newcommand{\vB}{\mathbf{B}} \newcommand{\vU}{\mathbf{U}} \newcommand{\vV}{\mathbf{V}} \newcommand{\vW}{\mathbf{W}} \newcommand{\vX}{\mathbf{X}} \newcommand{\vtf}{\boldsymbol{\phi}} \newcommand{\vty}{\boldsymbol{\psi}} \newcommand{\vtF}{\boldsymbol{\Phi}} \newcommand{\vtS}{\boldsymbol{\Sigma}} \newcommand{\vtL}{\boldsymbol{\Lambda}} \newcommand{\cF}{\mathcal{F}} \newcommand{\cK}{\mathcal{K}} $$

TODAY: Dynamical System - II¶

  • Finite-time stability analysis
    • Transient growth
    • Maximum gain
    • Resolvent analysis

Example problem and code:¶

  • Available here.

References:¶

  • [Schmid2009] P. Schmid, Nonmodal Stability Analysis, 2009; a more formal journal version is [Schmid2014]
  • [AIAAJ2017] Modal Analysis of Fluid Flows: An Overview, AIAA Journal, 2017
  • [AIAAJ2020] Modal Analysis of Fluid Flows: Applications and Outlook, AIAA Journal, 2020

Stability analysis (Nonnormal)¶

The example from previous lecture suggests that we need to differentiate two types of stability:

  • Linear stability: we are interested in the minimum critical parameter above which a specific initial condition of infinitesimal amplitude grows exponentially
  • Energy stability: we are interested in the maximum critical parameter below which a general initial condition of finite amplitude decays monotonically

Strategies for transient behaviors¶

Take a closer look at the response in terms of "energy" $$ E(t)=\norm{x(t)}^2 $$

Continuing with the 2D linear system from the previous example¶

The eigenvectors and eigenvalues of $A$ are $$ V=\begin{bmatrix} \cos(\pi/4-\delta) & \cos(\pi/4+\delta) \\ \sin(\pi/4-\delta) & \sin(\pi/4+\delta) \end{bmatrix},\quad \Lambda=\begin{bmatrix} -0.1 & 0 \\ 0 & -0.2 \end{bmatrix} $$

So the linear system should be stable, at least when $t\rightarrow \infty$

But when $\delta\rightarrow 0$, $V$ becomes ill-conditioned.

In [2]:
X0 = [1, 0.1]
ds = [np.pi/4, 0.1, 0.05, 0.02]
pltEnergy(X0, ds)
  • The energy certainly decays to zero over long times, since the system is linearly stable
  • Black arrows: The initial growth rate of the energy gives a good estimate of how high the energy may go
    • This leads to the Transient growth analysis
  • Red bars: The maximum growth in energy gives a more direct characterization of stability
    • This leads to the Maximum gain analysis
    • w.r.t. either initial conditions or inputs

Some preparations¶

Consider a linear system, $$ \dot{x} = Ax + f,\quad x(0)=x_0 $$ When $f=0$, the solution is (by matrix exponential) $$ x(t) = \exp(At)x_0 $$ When $x_0=0$, the solution is (by convolution) $$ x(t) = \int_0^t \exp((t-\tau)A)f(\tau)d\tau $$

We will first focus on the zero-input case
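As a quick numerical check, below is a minimal sketch of the zero-input solution via the matrix exponential, assuming the 2D example system from above. The helper `make_A` is introduced here only for the sketches in this lecture; it is not one of the lecture's plotting utilities.

import numpy as np
from scipy.linalg import expm

def make_A(delta):
    # 2D example system from above: A = V @ Lam @ inv(V)
    V = np.array([[np.cos(np.pi/4 - delta), np.cos(np.pi/4 + delta)],
                  [np.sin(np.pi/4 - delta), np.sin(np.pi/4 + delta)]])
    Lam = np.diag([-0.1, -0.2])
    return V @ Lam @ np.linalg.inv(V)

A = make_A(0.05)           # a strongly non-normal case
x0 = np.array([1.0, 0.1])
x5 = expm(5.0 * A) @ x0    # zero-input solution x(t) at t = 5
print(x5)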

Using the concept of energy, define the stability in the sense of the growth of perturbation energy over time, based on the matrix exponential. $$ G(t) = \max_{x_0}\frac{\norm{x(t)}^2}{\norm{x_0}^2} = \max_{x_0}\frac{\norm{\exp(At)x_0}^2}{\norm{x_0}^2} = \norm{\exp(At)}^2 $$ which represents the maximum possible amplification of perturbation energy over all initial conditions.

Recall that a matrix norm $\norm{A}$ can be "induced" by a vector norm (typically L2), $\norm{A}=\max_x \frac{\norm{Ax}}{\norm{x}}$

If we diagonalize $A=V\Lambda V^{-1}$, then $$ G(t) = \norm{\exp(At)}^2 = \norm{V\exp(\Lambda t)V^{-1}}^2 $$

Again, if we follow linear stability analysis, then we would deduce the behavior of $G(t)$ from $\exp(\Lambda t)$, or simply $\Lambda$, only. But this is insufficient to characterize $G(t)$ as shown in the previous example.

Let's think in terms of the following bounds of $G(t)$, where $\lambda_\max$ denotes the largest real part among the eigenvalues, $$ \exp(2\lambda_\max t) \leq G(t) \leq \norm{V}^2\norm{V^{-1}}^2\exp(2\lambda_\max t) $$

The second inequality is a property of matrix norm $\norm{ABC}\leq\norm{A}\norm{B}\norm{C}$

If the system is linearly stable, i.e., $Re(\lambda_i)<0$ for all eigenvalues $\lambda_i$

  • When $t\rightarrow\infty$, both bounds decay to zero, so $G(t)$ certainly decays to zero.
  • But before that, would $G(t)$ do something else?
    • Or, would $G(t)$ go to (close-to-)infinity in finite time?
  • Note the term $\norm{V}\norm{V^{-1}}$ - looking familiar? It is the condition number of $V$, $\kappa(V)$.

    • Recall we have argued before that when $\kappa(A)$ is large, it is difficult to numerically solve the linear system of equations $Ax=b$.
  • Now, if $V$ is "well-behaved", so that $\kappa(V)\approx 1$, then $$ G(t) \approx \exp(2\lambda_\max t) $$ i.e. the energy amplification is governed by the least stable eigenvalue, $\lambda_\max$.

  • But if $V$ is "badly-behaved", so that $\kappa(V)\gg 1$, then $$ \exp(2\lambda_\max t) \leq G(t) \leq \kappa(V)^2\exp(2\lambda_\max t) $$ i.e. the energy amplification is governed by $\lambda_\max$ only for large times.
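A small sketch of these bounds (reusing `make_A` from the earlier sketch): trace $G(t)=\norm{\exp(At)}^2$ against the two bounds, with `lam_max` the largest real part among the eigenvalues.

import numpy as np
from scipy.linalg import expm

A = make_A(0.05)
lam, V = np.linalg.eig(A)
lam_max = lam.real.max()
kappa = np.linalg.cond(V)    # condition number of the eigenvector matrix
for t in [0.0, 5.0, 10.0, 20.0]:
    G = np.linalg.norm(expm(t * A), 2)**2
    lo = np.exp(2 * lam_max * t)                # lower bound
    hi = kappa**2 * np.exp(2 * lam_max * t)     # upper bound
    print(f"t={t:5.1f}  {lo:9.4f} <= G(t)={G:9.4f} <= {hi:9.4f}")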

Now we introduce the concept of normality of a matrix $A$.

  • $A$ is normal iff it is diagonalizable and its different eigenspaces are orthogonal to each other
    • In other words, $A=V\Lambda V^H$, i.e. $V$ is unitary, $V^{-1}=V^H$
    • Then $\kappa(V)=1$
    • Eigenvalue analysis captures the entire dynamics
  • Otherwise $A$ is non-normal
    • Characterized by non-orthogonal eigenvectors
    • $\kappa(V)$ can be large
    • Eigenvalue analysis captures the asymptotic dynamics, but not the short-time behavior
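Numerically, normality is easy to probe (a sketch, again reusing `make_A`): check whether the commutator $AA^H-A^HA$ vanishes, and watch $\kappa(V)$ grow as $\delta\rightarrow 0$.

import numpy as np

for delta in [np.pi/4, 0.1, 0.05, 0.02]:
    A = make_A(delta)
    C = A @ A.conj().T - A.conj().T @ A    # zero iff A is normal
    _, V = np.linalg.eig(A)
    print(f"delta={delta:.3f}  ||C||={np.linalg.norm(C):.2e}  cond(V)={np.linalg.cond(V):.1f}")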

Transient short-term analysis¶

Consider the rate of change for $E(t)$, considering that $\dot x=Ax$ $$ \frac{dE(t)}{dt} = (\dot x^H) x + x^H (\dot x) = (x^HA^H)x + x^HAx = 2Re((Ax)^H x) $$

The last step is because $(x^HA^H)x$ and $x^HAx$ are scalars and complex conjugates of each other - so only the real part remains

With the form of $dE(t)/dt$, it is natural to define the numerical range $$ F(A) = \left\{ z\ \middle|\ z=\frac{(Ax)^H x}{x^Hx},\ x\neq 0 \right\} $$ i.e. the set of all possible (scaled) relative growth rates of energy, normalized by the current energy

If you have heard of the Rayleigh quotient, then $F(A)$ is essentially the set of all possible values of the quotient.

Things we need to know about numerical range:

  • It is convex;
  • It contains the spectrum of $A$, i.e. all of its eigenvalues;
  • It is the convex hull of the spectrum, if $A$ is normal.

Also, we define the maximum real part attained over the numerical range as the numerical abscissa, $\alpha(A)$

  • Alternatively $$\alpha(A)=\lambda_\max\left(\tfrac{1}{2}(A+A^H)\right)$$ i.e. the largest eigenvalue of the Hermitian part of $A$ (which is real, since that matrix is Hermitian). Think about why.
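Since the numerical abscissa is just the largest eigenvalue of the Hermitian part of $A$, it is one line of numerics (a sketch reusing `make_A`; compare with the $\alpha(A)\approx 0$ case at $\delta=0.175$ below).

import numpy as np

def num_abscissa(A):
    # largest eigenvalue of the Hermitian part (A + A^H)/2
    return np.linalg.eigvalsh((A + A.conj().T) / 2).max()

for delta in [np.pi/4, 0.175, 0.1, 0.05]:
    print(f"delta={delta:.3f}  alpha(A)={num_abscissa(make_A(delta)):+.4f}")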
In [3]:
ds = [np.pi/4, 0.175, 0.1, 0.05]  # Note at \delta=0.175, \alpha(A)~=0
pltNumRange(ds)
In [4]:
X0 = [1, 0.1]
ds = [np.pi/4, 0.175, 0.1, 0.05]  # As expected at \delta=0.175, energy growth~=0
pltEnergy(X0, ds, awid=0.1)

Let's think about the meaning of $\alpha(A)$ and the initial condition $x_0^*$ that attains $\alpha(A)$

  • If $\alpha(A)>0$, then starting from $x_0^*$ the system energy will grow
  • Over short times the growth can be large and the system may exhibit transient instability
  • This is true even when all the eigenvalues are in the left half plane (i.e. stable as $t\rightarrow\infty$)

Think again about matrix normality

  • If $A$ is normal, $\alpha(A)=Re(\lambda_\max(A))$, so long-term stability coincides with short-term stability
  • If $A$ is non-normal, it is possible that $\alpha(A) \gg Re(\lambda_\max(A))$, so the long- and short-term analyses differ

Maximum-gain analysis: Initial conditions¶

Consider a non-normal system that initially undergoes fast growth and eventually decays to zero. It is of interest to estimate the maximum amplitude of the growth, or the maximum gain of the system $$ G_\max = \max_{t\geq 0}\norm{\exp(tA)} $$

  • If the amplitude is low, we are probably still in a linearizable regime and the system is stable overall
  • If the amplitude is high, then the linearization itself might break down - i.e. the long-term behavior becomes meaningless

For a given time¶

  • Starting from the zero-input solution $$ x(t) = \exp(At)x_0 $$
  • What is the $x_0$ that maximizes the energy amplification at a given time $t^*$?

We essentially want $$ \tilde{G}_\max(t^*) = \max_{x_0} \frac{\norm{\exp(t^*A)x_0}}{\norm{x_0}} = \norm{\exp(t^*A)} $$ Then the SVD of $\exp(t^*A)=U\Sigma V^H$ solves the problem

  • The first right singular vector $v_1$ defines the (unit) $x_0$
  • The first left singular vector $u_1$ defines the maximum state $x$ at $t=t^*$
  • The first singular value $\sigma_1$ is the maximum gain

Also, the minimum $\sigma$ corresponds to the minimum gain of the system, i.e. we cannot get lower than that!
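In code, the optimal initial condition at a given time is one SVD away (a sketch reusing `make_A`; `t_star` matches the `ti = 14` used in the cell below).

import numpy as np
from scipy.linalg import expm, svd

A = make_A(0.05)
t_star = 14.0
U, s, Vh = svd(expm(t_star * A))
x0_opt = Vh[0].conj()      # optimal unit-norm initial condition v_1
x_tstar = s[0] * U[:, 0]   # resulting state at t = t_star
print("max gain:", s[0], "  min gain:", s[-1])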

In [5]:
delta = 0.05   # This is a highly non-normal case, and we get a pretty large gain (and disparity in gains)
ti = 14
pltInitGain(delta, ti)

For any time¶

We would like the maximum gain of the system $$ G_\max = \max_{t\geq 0}\norm{\exp(tA)} = \max_{t\geq 0}\tilde{G}_\max(t) $$ Consider the Laplace transform of the solution (for $Re(s)$ large enough that the integral converges) $$ \tilde{x}(s) = \int_0^\infty \exp(-st)\exp(tA)x_0 dt = (sI-A)^{-1}x_0 $$ Here we can define the resolvent operator $$ H(s) = (sI-A)^{-1} \equiv \int_0^\infty \exp(-st)\exp(tA) dt $$

Using inequalities for integrals, we can find a lower bound of $G_\max$ $$ \norm{H(s)} \leq \int_0^\infty |\exp(-st)| \norm{\exp(tA)} dt \leq \frac{1}{Re(s)} \max_{t\geq 0}\norm{\exp(tA)} $$ or $$ \sup_{Re(s)>0} Re(s)\norm{H(s)} \leq G_\max $$ The lower bound is also called the Kreiss constant, $K(A)$.

$K(A)\geq 1$; and $K(A)=1$ if $A$ is normal - think about why.

It turns out one can also find an upper bound using $K(A)$, so that $$ K(A)\leq G_\max \leq e n K(A) $$ where $e=\exp(1)$ and $n$ is the dimension of $A$.

So the Kreiss constant is very useful for characterizing the maximum gain.
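A rough numerical estimate of the Kreiss constant (a sketch reusing `make_A`): sweep $Re(s)$ along the positive real axis only, which already yields a useful lower bound on $K(A)$.

import numpy as np

A = make_A(0.05)
I = np.eye(2)
ss = np.linspace(1e-3, 1.0, 2000)    # values of Re(s) to sweep
K_est = max(s * np.linalg.norm(np.linalg.inv(s * I - A), 2) for s in ss)
print("Kreiss constant estimate: K(A) >=", K_est)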

Side note¶

One may also find another upper bound for $G_\max$ from complex analysis. For a contour $\Lambda$ enclosing the spectrum of $A$, $$ \norm{\exp(tA)} = \norm{\frac{1}{2\pi i} \oint_\Lambda \exp(zt) (zI-A)^{-1} dz} \leq \frac{1}{2\pi} \oint_\Lambda |\exp(zt)| \norm{(zI-A)^{-1}} |dz| $$ Then, choosing the contour such that $|\exp(zt)|\leq 1$, $$ G_\max \leq \frac{1}{2\pi} \oint_\Lambda \norm{H(s)} |ds| $$

In [6]:
delta = 0.05
ts = range(1,80)
pltGainSweep(delta, ts)  # So Kreiss constant does provide an estimate of the lower bound of gains

Maximum-gain analysis: Forced response¶

Similarly, we can think about the zero-initial-condition solution $$ x(t) = \int_0^t \exp((t-\tau)A)f(\tau)d\tau $$

  • What is the (harmonic) input $f$ that maximizes the energy amplification over time?

Assume $f(t)=\hat{f}\exp(i\omega t)$ and take the Fourier transform $$ \hat{x} = (i\omega I-A)^{-1} \hat{f} $$ Then we essentially want, at each frequency $\omega$, $$ G_\max(\omega) = \max_{\hat{f}} \frac{\norm{(i\omega I-A)^{-1} \hat{f}}}{\norm{\hat{f}}} $$ Again the SVD gives the solution. Let $(i\omega I-A)^{-1}=U(\omega)\Sigma(\omega) V(\omega)^H$, then

  • $V(\omega)$ gives the forcing and $U(\omega)$ gives the output
  • $\Sigma(\omega)$ gives the gain

The range of $\sigma(\omega)$'s gives the range of possible gains at the given frequency $\omega$
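A sketch of the frequency sweep behind the plots below (reusing `make_A`): at each $\omega$, the singular values of the resolvent give the range of possible gains.

import numpy as np

A = make_A(0.05)
I = np.eye(2)
for w in [0.1, 0.3, 0.5]:
    s = np.linalg.svd(np.linalg.inv(1j * w * I - A), compute_uv=False)
    print(f"omega={w:.2f}  sigma_max={s[0]:8.3f}  sigma_min={s[-1]:.4f}")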

In [7]:
W = np.linspace(0.01, 0.75, 51)
ds = [np.pi/4, 0.05]
pltSVDsweep(W, ds)
In [11]:
S = makeSys2D(0.785)    # The normal system behaves "normally", no extremely large gains
ugv = frMIMO(S, [0.3], ifFull=True)
pltGainResp(S, 0.3, ugv, 0)  # The first
In [12]:
# The second
pltGainResp(S, 0.3, ugv, 1)
In [13]:
S = makeSys2D(0.05)    # But things get crazy in non-normal systems
ugv = frMIMO(S, [0.3], ifFull=True)
pltGainResp(S, 0.3, ugv, 1)  # The mild one
In [14]:
# The wild one
pltGainResp(S, 0.3, ugv, 0)

Think about the example more, $$ \norm{(i\omega I-A)^{-1}} = \norm{V(i\omega I-\Lambda)^{-1}V^{-1}} \leq \kappa(V)\norm{(i\omega I-\Lambda)^{-1}} = \frac{\kappa(V)}{\min_j |i\omega-\lambda_j|} $$

  • For a normal system, $\kappa(V)=1$, and the norm becomes infinite (if there is no damping) when the forcing frequency coincides with the spectrum - then we have resonance, as expected.
  • For a non-normal system, it is possible to have pseudo-resonance, i.e. a large response to external forcing, even though the forcing frequency is far from the actual spectrum.

Alternative view of $\norm{H(s)}$¶

Consider the eigenvalue problem of $A$, perturbed by another matrix $E$, where $\norm{E}=\epsilon$ is small $$ (A+E)u=\lambda u,\quad \mbox{or}\quad (A-\lambda I)u = -Eu $$ Using the matrix norm inequality $$ \begin{align} \norm{(A-\lambda I)u} &= \norm{-Eu} \leq \norm{E}\norm{u} = \epsilon\norm{u} \\ \epsilon^{-1} &\leq \norm{u}/\norm{(A-\lambda I)u} \leq \norm{(A-\lambda I)^{-1}} \end{align} $$ Or, evaluating the resolvent at the perturbed eigenvalue $\lambda$, $$ \norm{H(\lambda)} \geq \epsilon^{-1} $$ In other words,

  • The resolvent contours represent the eigenvalue distribution of all possible perturbed $A$'s
    • Therefore the contours are also called pseudospectrum
  • If the eigenvalues change a lot with a small $\epsilon$, i.e. are sensitive to perturbations, then the system is likely to be non-normal
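The standard numerical recipe for the pseudospectrum (a sketch of what `pltPseudoSpec` presumably does, reusing `make_A`): on a grid of $z$, evaluate $\sigma_\min(zI-A)$, since $\norm{(zI-A)^{-1}} = 1/\sigma_\min(zI-A)$; contour levels of $\sigma_\min$ then trace out $\sigma_\epsilon(A)$. The grid limits here are arbitrary choices.

import numpy as np

A = make_A(0.05)
I = np.eye(2)
xs = np.linspace(-0.5, 0.3, 81)
ys = np.linspace(-0.4, 0.4, 81)
smin = np.array([[np.linalg.svd((x + 1j * y) * I - A, compute_uv=False)[-1]
                  for x in xs] for y in ys])
# e.g. plt.contour(xs, ys, np.log10(smin), levels=range(-5, 0))
# would reproduce contours at eps = 1e-5, ..., 1e-1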
In [10]:
# note how as non-normality increases, the pseudospectrum extrudes into positive side,
# which indicates the potential (easier) destabilization of the system.
pltPseudoSpec([np.pi/4, 0.175, 0.1, 0.05], range(-5, 0))

Spectrum and Pseudospectrum¶

One can define the spectrum of $A$, which is equivalent to eigenvalues $$ \sigma(A) = \left\{ z\in\mathbb{C}\ |\ (zI-A)^{-1} \mbox{ does not exist} \right\} $$

Here we are following the common notation in literature, but be careful to differentiate from singular values of $A$.

Then pseudospectrum can be thought of the generalization of eigenvalues $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ \norm{(zI-A)^{-1}} > \epsilon^{-1} \right\} $$

So that we formally have $$ \lim_{\epsilon\rightarrow 0}\sigma_\epsilon(A)=\sigma(A) $$ since as $\epsilon\rightarrow 0$, the condition $\norm{(zI-A)^{-1}} > \epsilon^{-1}$ forces the norm to be unbounded, meaning $(zI-A)^{-1}$ does not exist.

In fact, there is a theorem stating that the intersection of $\sigma_\epsilon(A)$ over all $\epsilon$ gives $\sigma(A)$ $$ \cap_{\epsilon>0}\sigma_\epsilon(A)=\sigma(A) $$

With the above connection, one can also have the following equivalent definitions $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ \exists \norm{E}<\epsilon,\ z\in\sigma(A+E) \right\} $$ $$ \sigma_\epsilon(A) = \left\{ z\in\mathbb{C}\ |\ \exists v\neq 0,\ \norm{(zI-A)v}<\epsilon \norm{v} \right\} $$

And one can also define the abscissa of the pseudospectrum, similar to the numerical abscissa, as $$ \alpha_\epsilon(A) = \max Re(\sigma_\epsilon(A)) $$ Then this connects to the Kreiss constant again $$ K(A) = \sup_{\epsilon>0} \frac{\alpha_\epsilon(A)}{\epsilon} $$
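Given the grid of $\sigma_\min(zI-A)$ from the previous sketch, crude grid-based estimates of $\alpha_\epsilon(A)$ and of the Kreiss constant follow directly (levels below the grid resolution may select an empty set, hence the guard).

import numpy as np

X, Y = np.meshgrid(xs, ys)           # Re(z) and Im(z) on the grid
for eps in [1e-3, 1e-2, 1e-1]:
    mask = smin <= eps               # grid points inside sigma_eps(A)
    if not mask.any():
        continue
    alpha_eps = X[mask].max()        # pseudospectral abscissa estimate
    print(f"eps={eps:.0e}  alpha_eps={alpha_eps:+.4f}  alpha_eps/eps={alpha_eps/eps:+.2f}")
# K(A) is approximately the max of alpha_eps/eps over eps (where alpha_eps > 0)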

Summary¶

To characterize a linear system, especially in terms of stability, we now know

  • Linear stability analysis: Real parts of eigenvalues
  • Transient growth analysis: Numerical range and abscissa
  • Maximum gain analysis: Resolvent operator and its SVD; Pseudospectrum and the Kreiss constant

In the next lecture let's revisit Dynamic Mode Decomposition in terms of resolvent analysis

Side note¶

Finally, the analysis using the resolvent operator can be extended to more complex dynamical systems, such as

  • stochastic forcing
  • parameter and mean-flow uncertainty
  • time-periodic and generally time-dependent flow
  • nonlinear perturbation dynamics
  • multiple inhomogeneous directions/complex geometry using Lyapunov analysis, Floquet theory, adjoint analysis, and so on.

See [Schmid2009] for more details.

Properties of Nonnormal matrices (Optional)¶

Summary based on this post by Nick Higham.

This discussion is from a pure linear algebra perspective, different from the previous discussion based on linear systems.

Definitions¶

$A$ is normal if it is unitarily diagonalizable: $A = QDQ^H$

  • $D$ diagonal, having the eigenvalues
  • $Q$ unitary, representing a complete set of orthonormal eigenvectors
  • The condition number $\kappa(Q)=\norm{Q} \norm{Q^{-1}}=\norm{Q} \norm{Q^H}=1$

$A$ is normal iff it is diagonalizable and its different eigenspaces are orthogonal to each other

For a general diagonalizable matrix, $A = XDX^{-1}$, the condition number $\kappa(X)$ can be arbitrarily large, which may pose an issue in numerical computation.

Yet, one should also keep the non-diagonalizable matrices in mind

For example, in 2D, real normal matrices can only be of the following two forms, $$ \begin{bmatrix} a & b \\ b & c \end{bmatrix}, \begin{bmatrix} a & b \\ -b & a \end{bmatrix} $$

And one example type of non-normal matrix is the Jordan-type block $$ \begin{bmatrix} a & b \\ 0 & a \end{bmatrix},\quad b\neq 0 $$

Alternatively, we can define the commutator, which is always Hermitian, $$ C = AA^H- A^HA $$ and by definition $A$ is normal precisely when $C=0$.

When $A$ is non-normal - general properties of $C$:

  • $C$ has zero trace, because $\mathrm{tr}(AA^H)=\mathrm{tr}(A^HA)$; hence its (real) eigenvalues sum to zero
  • But if $C$ is nonzero, it must have at least one positive and one negative eigenvalue (so that they cancel out and make the trace zero)
  • And therefore the rank of $C$ is at least 2.
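These properties are easy to verify numerically (a sketch on the Jordan-type example above, with the hypothetical values $a=1$, $b=2$):

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])    # non-normal Jordan-type example
C = A @ A.conj().T - A.conj().T @ A
print("trace:", np.trace(C))                     # ~0
print("eigenvalues:", np.linalg.eigvalsh(C))     # one positive, one negative
print("rank:", np.linalg.matrix_rank(C))         # at least 2 when C != 0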

Measurement of non-normality¶

There are many ways to do so. In the previous sections, we have seen numerical range/abscissa.

Using Triangularization¶

While $A$ is not necessarily unitarily diagonalizable, it always has a Schur decomposition, $$ A = QTQ^H $$ where $Q$ is unitary and $T=D+M$ is upper triangular. $D$ is still the eigenvalue matrix, and $M$ is a strict upper triangular matrix.

When $A$ is normal, certainly $M=0$ and the Schur decomposition reduces to an eigendecomposition.

  • Then $\norm{M}_F$ is a natural measure of how far $A$ is from being normal
  • Formally, one can use the following equivalent form, given by Henrici in the 1960s $$ \nu(A) = \norm{M}_F \equiv \biggl( \norm{A}_F^2 - \sum_{j=1}^n |\lambda_j|^2 \biggr)^{1/2}. $$

It is bounded as $$ \frac{\norm{C}_F}{4\norm{A}_2} \le \nu(A) \le \Bigl( \frac{n^3-n}{12} \Bigr)^{1/4} \norm{C}_F^{1/2}. $$
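A sketch computing Henrici's measure two ways, via the Schur factor and via the closed form (using `scipy.linalg.schur` with `output='complex'` so that $T$ is triangular even when the eigenvalues are complex; `make_A` is reused from the sketches above):

import numpy as np
from scipy.linalg import schur

A = make_A(0.05)
T, Q = schur(A, output='complex')    # A = Q T Q^H, T upper triangular
M = np.triu(T, k=1)                  # strict upper-triangular part
nu_schur = np.linalg.norm(M, 'fro')
lam = np.diag(T)                     # eigenvalues on the diagonal of T
nu_formula = np.sqrt(np.linalg.norm(A, 'fro')**2 - np.sum(np.abs(lam)**2))
print(nu_schur, nu_formula)          # the two values should agree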

Using Frobenius norm¶

Alternatively, one can define the distance to normality by "brute force" as the minimum "effort" $E$ required to make $A$ normal, $$ d(A) = \min \left\{ \norm{E}_F: A+E \in \mathbb{C}^{n\times n} \mbox{ is normal} \right\} $$ See [Higham1988] for algorithms for computing $d(A)$.

It is bounded as $$ \frac{\nu(A)}{n^{1/2}} \le d(A) \le \nu(A) $$

Using condition number¶

Considering only the diagonalizable matrices, $A=X\Lambda X^{-1}$, one may simply use the condition number of $X$ to quantify the non-normality. Specifically, we consider the L2-norm condition number $$ \kappa_2(X) = \norm{X}\norm{X^{-1}} $$

But since the diagonalizing $X$ is not unique, we pick the one with the smallest $\kappa_2$, i.e., $$ \kappa_2(X) = \min\left\{ \kappa_2(Y): A = YDY^{-1}, D \mbox{ diagonal} \right\}. $$

Next we show a series of bounds relevant to $\kappa_2(X)$ and observe that when $\kappa_2(X)=1$, i.e. when $A$ is normal, all the bounds simplify to equalities. This makes $\kappa_2(X)$ a good indicator of non-normality.

Notations

  • Order the eigenvalues of $A$ as $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$
  • $\rho(A)$ denotes the spectral radius of $A$, the largest absolute value of any eigenvalue of $A$, so $\rho(A)=|\lambda_1|$.
  • Assume $A$ has singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$
  • Utilizing the norm inequality on $A=X\Lambda X^{-1}$ $$ \frac{\|A\|_2}{\kappa_2(X)} \le \rho(A) \le \|A\|_2. $$

  • Relating singular values and eigenvalues $$ \frac{\sigma_i(A)}{\kappa_2(X)} \le |\lambda_i(A)| \le \kappa_2(X) \sigma_i(A) $$

  • For any real $p > 0$, $$ \frac{\rho(A)^p}{\kappa_2(X)} \le \norm{A^p}_2 \le \kappa_2(X) \rho(A)^p. $$

  • For any function $f$ defined on the spectrum of $A$, $$ \frac{\max_i|f(\lambda_i)|}{\kappa_2(X)} \le \norm{f(A)}_2 \le \kappa_2(X) \max_i|f(\lambda_i)|. $$
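As a numerical sanity check of these bounds (a sketch reusing `make_A`; note that `np.linalg.eig` returns some diagonalizing $X$, not necessarily the one minimizing $\kappa_2$, but the bounds hold for any such $X$):

import numpy as np
from scipy.linalg import expm

A = make_A(0.05)
lam, X = np.linalg.eig(A)
k2 = np.linalg.cond(X)
# power bound: rho^p / k2 <= ||A^p||_2 <= k2 * rho^p
p, rho = 3, np.abs(lam).max()
print(rho**p / k2, np.linalg.norm(np.linalg.matrix_power(A, p), 2), k2 * rho**p)
# f(A) bound with f = exp: max|f(lam)| / k2 <= ||exp(A)||_2 <= k2 * max|f(lam)|
fmax = np.abs(np.exp(lam)).max()
print(fmax / k2, np.linalg.norm(expm(A), 2), k2 * fmax)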