Standard linear (time-invariant) system $$ \begin{align*} \dot{x} &= Ax+Bu \\ y &= Cx+Du \end{align*} $$ with initial condition $x(0)=x_0$
For any full-rank matrix $T\in R^{N\times N}$, one can transform $$ x'=Tx $$ to obtain an equivalent system $$ \begin{align*} \dot{x}' &= A'x'+B'u \\ y &= C'x'+D'u \end{align*} $$ where $$ A'=TAT^{-1},\ B'=TB,\ C'=CT^{-1},\ D'=D $$
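The cells below use a helper for this change of coordinates; a minimal sketch of what sys_transform could look like (only the name and call signature are taken from the cells below, the body is an assumption):

import numpy as np

def sys_transform(sys, Tinv, T=None):
    # Change of coordinates x' = Tx: A' = T A T^{-1}, B' = T B, C' = C T^{-1}
    A, B, C = sys
    if T is None:
        T = np.linalg.inv(Tinv)
    return T @ A @ Tinv, T @ B, C @ Tinv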
For convenience, we ignore $D$ in the following.
A state $x^*$ is reachable if there exists an input $u(t)$ that drives the system from $x(0)=0$ to $x(t^*)=x^*$ in some finite time $t^*$.
Denote the set of reachable states as $\Wr$.
A state $x^*$ is controllable if there exists an input $u(t)$ that drives the system from $x(0)=x^*$ to $x(t^*)=0$ in some finite time $t^*$.
Denote the set of controllable states as $\Wc$.
For example, if a linear system is $$ \begin{align} \begin{bmatrix}\dot{x}^{(1)} \\ \dot{x}^{(2)}\end{bmatrix} &= \begin{bmatrix}A^{(1,1)} & A^{(1,2)} \\ \rO & A^{(2,2)}\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)}\end{bmatrix} + \begin{bmatrix}B^{(1)} \\ \rO\end{bmatrix}u \\ y &= \begin{bmatrix}C^{(1)} & C^{(2)}\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)}\end{bmatrix} \end{align} $$ Then $x^{(2)}$ is not controllable, because $\dot{x}^{(2)} = A^{(2,2)}x^{(2)}$ evolves autonomously: the input $u$ cannot influence $x^{(2)}$, neither directly nor through $x^{(1)}$.
A system is completely controllable if there is no transformation such that the system is equivalent to the previous form with a nonempty uncontrollable block $x^{(2)}$.
A state $x^*$ is unobservable if, starting from $x(0)=x^*$ with $u(t)\equiv 0$, the output satisfies $y(t)=0$ for all $t\geq 0$.
For example, if a linear system is $$ \begin{align} \begin{bmatrix}\dot{x}^{(1)} \\ \dot{x}^{(2)}\end{bmatrix} &= \begin{bmatrix}A^{(1,1)} & \rO \\ A^{(2,1)} & A^{(2,2)}\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)}\end{bmatrix} + \begin{bmatrix}B^{(1)} \\ B^{(2)}\end{bmatrix}u \\ y &= \begin{bmatrix}C^{(1)} & \rO\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)}\end{bmatrix} \end{align} $$ Then $x^{(2)}$ is not observable, because $x^{(2)}$ neither appears in $y$ nor affects $\dot{x}^{(1)}$: its effect never reaches the output.
A system is completely observable if there is no transformation such that the system is equivalent to the previous form with a nonempty unobservable block $x^{(2)}$.
Considering all combinations of (un)controllable $\times$ (un)observable, a linear system can be transformed to $$ \begin{align} \begin{bmatrix}\dot{x}^{(1)} \\ \dot{x}^{(2)} \\ \dot{x}^{(3)} \\ \dot{x}^{(4)}\end{bmatrix} &= \begin{bmatrix} A^{(1,1)} & A^{(1,2)} & A^{(1,3)} & A^{(1,4)} \\ \rO & A^{(2,2)} & \rO & A^{(2,4)} \\ \rO & \rO & A^{(3,3)} & A^{(3,4)} \\ \rO & \rO & \rO & A^{(4,4)}\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)} \\ x^{(3)} \\ x^{(4)}\end{bmatrix} + \begin{bmatrix}B^{(1)} \\ B^{(2)} \\ \rO \\ \rO\end{bmatrix}u \\ y &= \begin{bmatrix}\rO & C^{(2)} & \rO & C^{(4)}\end{bmatrix} \begin{bmatrix}x^{(1)} \\ x^{(2)} \\ x^{(3)} \\ x^{(4)}\end{bmatrix} \end{align} $$ Here $x^{(1)}$ is controllable but unobservable, $x^{(2)}$ is both controllable and observable, $x^{(3)}$ is neither, and $x^{(4)}$ is observable but uncontrollable (the Kalman decomposition).
In the following we will deal with a linear system of 4 states, 1 input, and 1 output, printed as one block matrix $$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} $$
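A minimal sketch of the sys_print helper used in the next cell (an assumption, taking $D=0$ when only $(A,B,C)$ is given):

def sys_print(sys):
    # Print the (A, B, C) system as the block matrix [[A, B], [C, D]] with D = 0
    A, B, C = sys
    D = np.zeros((C.shape[0], B.shape[1]))
    with np.printoptions(precision=4):
        print(np.block([[A, B], [C, D]]))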
sys_print((A,B,C)) # The LTI system
[[  0.8   0.  -16.   -7.2  -0.8]
 [  0.4  -2.   -1.5  -2.1  -1.4]
 [  0.    0.   -3.    0.    0. ]
 [  1.2   0.   -4.   -5.8  -0.2]
 [  1.    2.   -1.   -3.    0. ]]
sys_print((_A,_B,_C)) # The system is transformed from this one
# Note the canonical structure
[[-1.  0. -6. -3.  1.]
 [ 0. -2. -2. -2.  3.]
 [ 0.  0. -3.  0.  0.]
 [ 0.  0.  0. -4.  0.]
 [ 0. -1.  0.  1.  0.]]
For $\exp(At)$ in the linear system solution:
By Taylor series expansion $$ \exp(At) = I + tA + \frac{t^2}{2}A^2 + \cdots + \frac{t^N}{N!}A^N + \cdots $$
But by the Cayley–Hamilton theorem, for any $K\geq N$, $A^K = \sum_{i=0}^{N-1} c_i A^i$ for some coefficients $c_i$.
Then we have an exact and finite expression: $$ \exp(At) = \sum_{i=0}^{N-1} \alpha_i(t) A^i $$ for certain coefficients $\alpha_i(t)$.
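A quick numerical sanity check of the Cayley–Hamilton step (a sketch with a random matrix, not the system used later):

import numpy as np

# The characteristic polynomial annihilates A, so A^N is a
# linear combination of I, A, ..., A^{N-1}
rng = np.random.default_rng(0)
A_demo = rng.standard_normal((4, 4))   # a random A for illustration
c = np.poly(A_demo)                    # char. poly coefficients, highest power first
p_A = sum(ci * np.linalg.matrix_power(A_demo, len(c) - 1 - i)
          for i, ci in enumerate(c))
print(np.linalg.norm(p_A))             # ~0 up to round-off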
Revisit the linear system solution with $x_0=0$ and any $u(t)$ $$ \begin{align*} x(t) &= \int_0^t \exp(A(t-\tau))Bu(\tau) d\tau = \int_0^t \exp(As)Bu(t-s) ds \\ &= \int_0^t \left(\sum_{i=0}^{N-1} \alpha_i(s) A^i\right)Bu(t-s) ds \\ &= \sum_{i=0}^{N-1} \beta_i(t) A^iB \end{align*} $$ where $$ \beta_i(t) = \int_0^t \alpha_i(s)u(t-s) ds $$
This means a reachable state $x(t)$ must be a linear combination of the columns of the following matrix $$ \cR(A,B) = [B,AB,\cdots,A^{N-1}B] $$ Hence $\cR$ is called the reachability matrix and $$ \Wr = \Im \cR(A,B) $$
Also note that if we look at an equivalent system with transformation $T$, $$ \cR(A',B') = [\cdots,(A')^iB',\cdots] = [\cdots,(TAT^{-1})^iTB,\cdots] = [\cdots,TA^iB,\cdots] = T\cR(A,B) $$ so $$ \Im \cR(A',B') = T\,\Im \cR(A,B) $$ i.e., the reachable subspace transforms consistently with the coordinates; in particular its dimension, $\mathrm{rank}\,\cR$, is coordinate-independent.
Now we start from $x_0\neq 0$ and want to drive the states to 0 at time $t$.
$$ \begin{align*} x(t) &= \exp(At)x_0 + \int_0^t \exp(A(t-\tau))Bu(\tau) d\tau \\ x(t)-\exp(At)x_0 \equiv x_d &= \int_0^t \exp(A(t-\tau))Bu(\tau) d\tau \\ &= \sum_{i=0}^{N-1} \beta_i(t) A^iB \end{align*} $$ Setting the target $x(t)=0$, we need $-\exp(At)x_0 = \sum_{i=0}^{N-1}\beta_i(t)A^iB$, i.e., $\exp(At)x_0\in\Im\cR(A,B)$. Since $\Im\cR(A,B)$ is invariant under $A$ (again by Cayley–Hamilton) and $\exp(At)$ is invertible, this holds exactly when $x_0\in\Im\cR(A,B)$. So $$ \Wc=\Im\cR(A,B) = \Wr $$
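A minimal sketch of the comp_R helper used in the next cell, following the definition of $\cR(A,B)$ above (the body is an assumption):

def comp_R(A, B):
    # Reachability/controllability matrix [B, AB, ..., A^{N-1}B]
    N = A.shape[0]
    blocks = [B]
    for _ in range(N - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)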
R = comp_R(A, B) # The controllability matrix
u, s, vh = np.linalg.svd(R)
print(s) # This has two non-zero SVs
CC = u[:,:2].T # Controllable subspace
UC = u[:,2:].T # Uncontrollable
[1.3732e+01 9.6594e-01 3.4139e-16 5.7619e-32]
Tinv = np.vstack([CC, UC]).T
sys = sys_transform((A,B,C), Tinv)
sys_print(sys)
[[-2.0023e+00  2.2992e-02 -2.9439e+00 -3.1023e+00  1.4738e+00]
 [-9.8276e-02 -9.9775e-01  8.1516e+00  1.6267e+01 -6.8406e-01]
 [ 1.2364e-16  1.0854e-15 -4.0000e+00  6.2172e-15 -1.3878e-16]
 [ 2.7453e-32  2.4101e-31 -2.2204e-16 -3.0000e+00 -3.0815e-32]
 [-2.0141e+00  4.6205e-02  3.1530e+00  1.0000e+00  0.0000e+00]]
Beyond knowing whether a state is controllable, we can also quantify the degree of controllability.
Suppose we try to find the necessary control to reach $x_d$ $$ \begin{align*} x_d &= \int_0^t \exp(A(t-\tau))Bu(\tau) d\tau \\ &= \lim_{N_t\rightarrow\infty} \sum_{i=0}^{N_t} \exp(A(t-\tau_i))Bu(\tau_i)\Delta\tau \\ &= \lim_{N_t\rightarrow\infty} LU \end{align*} $$ where $$ L = [\cdots,\exp(A(t-\tau_i))B\Delta\tau,\cdots],\quad U^T=[\cdots,u(\tau_i)^T,\cdots] $$
For finite $N_t$, the least-norm solution is $$ U = V\Sigma^{-1} W^T x_d $$ where $L = W\Sigma V^T$ is the (reduced) SVD.
Then $$ \int_0^t \norm{u(\tau)}^2 d\tau \approx U^TU\Delta\tau = x_d^T \left(W \Sigma^{-2} W^T\right) x_d\,\Delta\tau = x_d^T (LL^T)^{-1} x_d\,\Delta\tau $$ since $LL^T = W\Sigma^2 W^T$
Note that $$ (LL^T)^{-1}\Delta\tau = \left(\sum_{i=0}^{N_t} \exp(A(t-\tau_i))BB^T\exp(A^T(t-\tau_i))\Delta\tau\right)^{-1} $$ where the sum inside is a Riemann sum.
Taking the limit of its inverse defines the controllability Gramian $$ \cP = \lim_{\Delta\tau\rightarrow 0} \frac{LL^T}{\Delta\tau} = \int_0^t \exp(A(t-\tau))BB^T\exp(A^T(t-\tau))d\tau $$
By change of variable, we can also write, $$ \cP(t) = \int_0^t \exp(A\tau)BB^T\exp(A^T\tau)d\tau $$ assuming real-valued $(A,B)$.
The minimum total input energy for controlling a certain state $x_d$ is $$ \int_0^t \norm{u(\tau)}^2 d\tau = x_d^T \cP^{-1} x_d $$
In other words, $\cP^{-1}$ characterizes the energy needed to control a state.
One critical property of $\cP$ is its positive semi-definiteness.
For any $x\neq 0$, $$ x^T\cP x = \int_0^t x^T\exp(A\tau)BB^T\exp(A^T\tau)xd\tau = \int_0^t \norm{B^T\exp(A^T\tau)x}^2d\tau \geq 0 $$
Among unit-magnitude states, the one that requires the least energy to control is the eigenvector of $\cP$ with the largest eigenvalue, since $$ \min_{\norm{x}=1} \norm{x}_{\cP^{-1}} = \lambda_\max^{-1} $$ where $\lambda_\max$ is the maximum eigenvalue of $\cP$.
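For a stable $A$ and $t\to\infty$, the Gramian integral converges and satisfies the Lyapunov equation $A\cP+\cP A^T+BB^T=0$; the comp_cP_inf helper used in the next cell could then be a thin wrapper (a sketch, assuming this implementation):

import scipy.linalg as spl

def comp_cP_inf(A, B):
    # Infinite-horizon controllability Gramian for stable A:
    # solves the Lyapunov equation A P + P A^T + B B^T = 0
    return spl.solve_continuous_lyapunov(A, -B @ B.T)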
cP = comp_cP_inf(A, B) # The controllability Gramian
w, v = spl.eigh(cP)
print(w) # This has two non-zero EVs; ascending order
tmp = v[:,::-1].T
CC = tmp[:2] # Controllable subspace
UC = tmp[2:] # Uncontrollable
[-5.3747e-18 0.0000e+00 2.7234e-02 7.8027e-01]
Tinv = np.vstack([CC, UC]).T
sys = sys_transform((A,B,C), Tinv)
sys_print(sys)
[[-1.6444e+00 -4.2188e-01 -1.1776e+01 -7.0455e+00  1.6019e+00]
 [-5.4315e-01 -1.3556e+00  1.1644e+01  5.0474e+00  2.7173e-01]
 [ 0.0000e+00  0.0000e+00 -3.0000e+00  0.0000e+00  0.0000e+00]
 [-4.1971e-16  3.3005e-15  1.4211e-14 -4.0000e+00  4.7184e-16]
 [-1.6856e+00 -1.1035e+00  1.0000e+00  3.1530e+00  0.0000e+00]]
Using a similar approach, we can characterize the Observability.
If we want $x_0$ to be observable, its zero-input response should not vanish identically: $$ y(t) = C\exp(At)x_0 \not\equiv 0 $$ Knowing $\exp(At)=\sum_{i=0}^{N-1}\alpha_i(t)A^i$, this means $x_0$ should not lie in the common null space of $CA^i$, $i=0,1,\cdots,N-1$
In other words, the unobservable states form the null space of the Observability matrix $$ Q=\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{N-1} \end{bmatrix} $$ and the observable subspace is its orthogonal complement, the row space of $Q$.
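A minimal sketch of the comp_Q helper used in a later cell, following this definition (the body is an assumption):

def comp_Q(A, C):
    # Observability matrix [C; CA; ...; CA^{N-1}], stacked by rows
    N = A.shape[0]
    blocks = [C]
    for _ in range(N - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)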
Now, suppose we have measurements $y(\tau_i)$ over interval $[0,t]$ with step size $\Delta \tau$, and want to infer $x_0$, $$ Y = Mx_0 $$ where $$ Y = \begin{bmatrix} \vdots \\ y(\tau_i) \\ \vdots \end{bmatrix},\quad M = \begin{bmatrix} \vdots \\ C\exp(A\tau_i) \\ \vdots \end{bmatrix} $$
Then the least-squares estimate is $$ x_0 = (M^TM)^{-1}M^TY $$ with $$ M^TM = \sum_{i=0}^{N_t} \exp(A^T\tau_i)C^TC\exp(A\tau_i) $$
Scaling by $\Delta\tau$ and taking the limit, we obtain the Observability Gramian
$$ \cQ = \int_0^t \exp(A^T(t-\tau))C^TC\exp(A(t-\tau)) d\tau = \int_0^t \exp(A^T\tau)C^TC\exp(A\tau) d\tau $$ which, by the same argument as for $\cP$, is positive semi-definite.
Physically, if we compute the $\cQ$-norm of an initial state $x_0$, $$ \norm{x_0}_{\cQ} = x_0^T\cQ x_0 = \int_0^t x_0^T\exp(A^T\tau)C^TC\exp(A\tau)x_0 d\tau = \int_0^t \norm{y(\tau)}^2 d\tau $$ which we can interpret as the output energy observed from $x_0$.
Intuitively, the more energy from $x_0$ can be observed, the easier it is to observe/measure $x_0$.
Given a unit initial magnitude, the most observable state is the eigenvector of $\cQ$ with the largest eigenvalue $\lambda_\max$, as $$ \max_{\norm{x}=1}\norm{x}_{\cQ} = \lambda_\max $$
Q = comp_Q(A, C) # The observability matrix
u, s, vh = np.linalg.svd(Q)
print(s) # This also has two non-zero SVs
OO = vh.conj()[:2] # Observable subspace
UO = vh.conj()[2:] # Unobservable
[1.1409e+02 1.0070e+01 4.4769e-15 3.5886e-16]
Tinv = np.vstack([OO, UO]).T
sys = sys_transform((A,B,C), Tinv)
sys_print(sys)
[[-3.4355e+00 -4.3549e-01  4.2811e-16 -3.4071e-17  2.3277e-01]
 [ 1.4353e+00 -1.5645e+00 -2.1000e-16  1.3753e-16 -7.6725e-01]
 [-3.1755e-01 -2.5134e+00 -3.4711e+00 -2.8803e-01  1.1711e+00]
 [-1.3627e+01 -1.1650e+01 -4.5375e+00 -1.5289e+00  7.9099e-01]
 [-1.2932e-01  3.8708e+00  1.4433e-15 -3.3307e-16  0.0000e+00]]
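By the same infinite-horizon argument as for $\cP$, the Gramian $\cQ$ solves the Lyapunov equation $A^T\cQ+\cQ A+C^TC=0$; a sketch of the comp_cQ_inf helper used in the next cell (assuming this implementation):

def comp_cQ_inf(A, C):
    # Infinite-horizon observability Gramian for stable A:
    # solves the Lyapunov equation A^T Q + Q A + C^T C = 0
    return spl.solve_continuous_lyapunov(A.T, -C.T @ C)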
cQ = comp_cQ_inf(A, C) # The observability Gramian
w, v = spl.eigh(cQ)
print(w) # This has two non-zero EVs; ascending order
Os = v[:,::-1].T
[-1.5346e-16 3.8311e-16 1.7595e-01 4.7740e+00]
s0 = sps.lti(A, B, C, [0])  # the full system, with D = 0
ts = np.linspace(0, 5, 101)
stys = ['b-', 'r--', 'k:', 'g-']
f = plt.figure(figsize=(6,4))
for _i in range(4):
    # Zero-input response from each eigenvector of cQ as initial condition
    t, y, x = sps.lsim(s0, 0, ts, X0=Os[_i])
    plt.plot(t, y, stys[_i], label=f"Eigenvector {_i}")
    print(np.sum(y**2)*t[1])  # observed output energy; compare with the eigenvalues of cQ
plt.legend(); plt.xlabel('t'); plt.ylabel('y');
5.124306321170764
0.21068233773855669
2.820288270249914e-31
7.651009482383496e-31
The primal
$$ \begin{align*} \dot{x} &= Ax+Bu \\ y &= Cx+Du \end{align*} $$The dual
$$ \begin{align*} \dot{z} &= A^Tz+C^Tv \\ w &= B^Tz+D^Tv \end{align*} $$The observability/controllability matrices and Gramians of the primal are exactly the controllability/observability matrices and Gramians of the dual; for example, $\cR(A^T,C^T) = Q(A,C)^T$.
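A quick numerical check of this duality, reusing the helper sketches above:

# Controllability quantities of the dual (A^T, C^T) match
# observability quantities of the primal (A, C)
print(np.allclose(comp_R(A.T, C.T), comp_Q(A, C).T))
print(np.allclose(comp_cP_inf(A.T, C.T), comp_cQ_inf(A, C)))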
First think about a simple case, $$ \begin{align*} \min_u &\ J(u) = \beta^T u \\ \mathrm{s.t.} &\ y=Au \end{align*} $$ To find the sensitivity $dJ/dy$, i.e., how the objective changes when the achievable output $y$ is perturbed: $$ \begin{align*} \cL(u,y,\lambda) &= J(u) + \lambda^T (y-Au) \\ \Rightarrow \frac{\partial\cL}{\partial u} &= \beta - A^T\lambda = 0 \end{align*} $$ Then $\lambda=A^{-T}\beta$, and $$ \frac{dJ}{dy} = \frac{\partial\cL}{\partial y} = \lambda $$ i.e., $\lambda$ is the sensitivity.
Now we extend the linear matrices and vectors to linear operators $$ \begin{align*} \min&\ J[u(t)] = <\beta(t), u(t)> \\ \mathrm{s.t.}&\ y(t) = \cA[u(t)] \end{align*} $$ where $$ <\beta, u> = \int_0^T \beta(t)^T u(t) dt,\quad \cA[u] = \int_0^t C\exp(A(t-\tau))Bu(\tau) d\tau + Du(t) $$
Work on the new Lagrangian $$ \begin{align*} \cL[u,y,\lambda] &= J[u] + <\lambda, y - \cA[u]> \\ \delta\cL &= <\beta,\delta u> - <\lambda, \cA[\delta u]> + <\lambda, \delta y> \\ &= <\beta,\delta u> - <\cA^T\lambda, \delta u> + <\lambda, \delta y> \\ &= <\beta - \cA^T\lambda, \delta u> + <\lambda, \delta y> \end{align*} $$ Then formally we get sensitivity $dJ/dy=\lambda$, with $\lambda=\cA^{-T}\beta$.
But what is the analogue of $A^T$ for a linear operator, i.e., the adjoint $\cA^T$?
For our particular case, we just manipulate the integral to find the adjoint $$ \begin{align*} <\lambda, \cA[\delta u]> &= \int_0^T \lambda(t)^T \left(\int_0^t C\exp(A(t-\tau))B\delta u(\tau) d\tau + D\delta u(t)\right) dt \\ &= \int_0^T \int_0^t \lambda(t)^T C\exp(A(t-\tau))B\delta u(\tau) d\tau dt + \int_0^T\lambda(t)^T D\delta u(t) dt \end{align*} $$ The second integral is easy to treat: $$ \int_0^T \left(D^T\lambda(t)\right)^T \delta u(t) dt $$ For the first one, note an identity: $$ \int_0^T\int_0^t f(t,\tau)d\tau dt = \int_0^T\int_\tau^T f(t,\tau)dtd\tau $$
Then the first integral becomes $$ \begin{align*} &\ \int_0^T \int_0^t \lambda(t)^T C\exp(A(t-\tau))B\delta u(\tau) d\tau dt \\ &= \int_0^T \int_\tau^T \lambda(t)^T C\exp(A(t-\tau))B\delta u(\tau) dt d\tau \\ &= \int_0^T \left(\int_\tau^T \lambda(t)^T C\exp(A(t-\tau))B dt\right) \delta u(\tau) d\tau \\ &= \int_0^T \left(\int_\tau^T B^T\exp(A^T(t-\tau))C^T \lambda(t) dt\right)^T \delta u(\tau) d\tau \\ &= \int_0^T \left(\int_t^T B^T\exp(A^T(\tau-t))C^T \lambda(\tau) d\tau\right)^T \delta u(t) dt \end{align*} $$ Note the swap of variable names in the end: $t\rightarrow \tau$, $\tau\rightarrow t$.
From the two integrals, we obtain $$ \cA^T\lambda = \int_t^T B^T\exp(A^T(\tau-t))C^T\lambda(\tau) d\tau + D^T\lambda(t) $$ This corresponds exactly to the dual system, run backward in time from $T$!
The concepts of Controllability and Observability are coordinate independent.
However, the Gramians are not.
Given a coordinate transformation $x'=Tx$, one can verify that the new Gramians are $$ \cP' = T\cP T^T,\quad \cQ' = T^{-T}\cQ T^{-1} $$
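A quick numerical check of these rules, reusing cP and cQ computed above (the random $T$ here is arbitrary):

# Verify P' = T P T^T and Q' = T^{-T} Q T^{-1} for a well-conditioned random T
T = np.eye(4) + 0.1 * np.random.default_rng(1).standard_normal((4, 4))
Tinv = np.linalg.inv(T)
print(np.allclose(comp_cP_inf(T @ A @ Tinv, T @ B), T @ cP @ T.T))
print(np.allclose(comp_cQ_inf(T @ A @ Tinv, C @ Tinv), Tinv.T @ cQ @ Tinv))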
It would be a good idea to balance the two via a certain coordinate transformation.
The hope is to have the same magnitude (if possible) for $\norm{\cP'}$ and $\norm{\cQ'}$ (for a certain norm). This can be posed as $$ \min_T\ \mathrm{trace}(\cP'+\cQ') $$
It turns out the trace is minimized when $\cP'=\cQ'=\Sigma$, where $\Sigma$ is a diagonal matrix of Hankel singular values.
The Hankel singular values are $\sigma_i = \sqrt{\lambda_i(\cP\cQ)}$, the square roots of the eigenvalues of $\cP\cQ$ (which are coordinate-independent, since $\cP'\cQ' = T\cP\cQ T^{-1}$).
Algorithmically, we can find the balancing transformation by the square-root method: compute factors $\cP=RR^T$ and $\cQ=LL^T$ (e.g., by Cholesky), take the SVD $L^TR = W\Sigma V^T$, and set $T=\Sigma^{-1/2}W^TL^T$, $T^{-1}=RV\Sigma^{-1/2}$; one can verify $\cP'=\cQ'=\Sigma$.
The balancing transformation naturally gives us a coordinate system that orders the states by how controllable and observable they are simultaneously, in decreasing order of the Hankel singular values.
As a result, we can truncate the trailing states, which are the least controllable and observable.
In practice, just replace "SVD" with "truncated SVD" in the balancing transformation!
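A sketch of the bal_trunc helper used later, following the square-root recipe above (names and defaults are assumptions; eigendecomposition-based factors are used so that numerically singular Gramians do not break a Cholesky step, though near-zero Hankel values still make the full, untruncated transformation ill-conditioned, as visible in its output below):

def sqrt_factor(M):
    # Symmetric PSD square-root factor M = F F^T (robust to singular M)
    w, v = spl.eigh(M)
    return v * np.sqrt(np.clip(w, 0, None))

def bal_trunc(sys, r=None):
    # Square-root balanced truncation: P = R R^T, Q = L L^T,
    # SVD L^T R = W S V^T, then T = S^{-1/2} W^T L^T, T^{-1} = R V S^{-1/2};
    # passing r keeps only the leading r Hankel singular values
    A, B, C = sys
    R = sqrt_factor(comp_cP_inf(A, B))
    L = sqrt_factor(comp_cQ_inf(A, C))
    W, s, Vh = np.linalg.svd(L.T @ R)
    if r is not None:  # "truncated SVD" => truncated balancing
        W, s, Vh = W[:, :r], s[:r], Vh[:r]
    sih = np.diag(s ** -0.5)
    return R @ Vh.T @ sih, sih @ W.T @ L.T   # (Tinv, T)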
Consider the transfer function for the linear system $Y(s)=G(s)U(s)$ $$ G(s) = C(sI-A)^{-1}B $$
Suppose we truncate the states, so that the system matrices become $(\tilde{A},\tilde{B},\tilde{C})$, then $$ \tilde{G}(s) = \tilde{C}(sI-\tilde{A})^{-1}\tilde{B} $$
A natural idea is to quantify the truncation error by $$ \norm{G(s)-\tilde{G}(s)} $$ But we need to choose the appropriate norm.
$G$ is a linear mapping from the input $U$ to the output $Y$, so a natural choice is the induced norm, i.e., the $H_\infty$ norm $$ \norm{G(s)}_{H_\infty} = \sup_\omega \sigma_\max(G(i\omega)) $$
The $\sup$ behaves like a $\max$; the supremum is just for mathematical rigor.
Effectively this is equivalent to $$ \norm{G(s)}_{H_\infty} = \sup_u \sqrt{\frac{ \int_0^\infty\norm{y(t)}^2dt }{ \int_0^\infty\norm{u(t)}^2dt }} $$
In other words, we can quantify $$ \norm{G(s)-\tilde{G}(s)} $$ by the output error $$ \norm{y(t)-\tilde{y}(t)} $$ given the same input $u(t)$
Then we can find an error bound that is very similar to truncated SVD: $$ \norm{G(s)-\tilde{G}(s)}_{H_\infty} \leq 2\sum_{i=r+1}^N \sigma_i $$ where $r$ is the number of retained states and the sum runs over the distinct truncated Hankel singular values $\sigma_i$.
Balanced truncation requires computing the Gramians, which costs $O(N^3)$ and is hence expensive for large systems.
An alternative is the Balanced Proper Orthogonal Decomposition (BPOD).
The key idea is to note the impulse response with zero initial condition, $$ x_i(t) = \exp(At)b_i $$ where $b_i$ is the $i$-th column of $B$. Then the controllability Gramian is approximated as $$ \cP = \int_0^\infty \sum_{i=1}^p x_i(t)x_i(t)^T dt \approx \sum_{j=1}^{N_t} \sum_{i=1}^p x_i(t_j)x_i(t_j)^T\Delta t \equiv XX^T $$ where the columns of $X$ are the scaled snapshots $\sqrt{\Delta t}\,x_i(t_j)$.
Similarly, if one has access to the dual system and collects its impulse responses into $Y$, the observability Gramian is approximated as well: $$ \cQ\approx YY^T $$
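Sketches of the data-driven helpers comp_cP_approx and comp_cQ_approx used in the next cell (the time horizon and step size are illustrative assumptions; the actual implementations may differ):

def comp_cP_approx(A, B, tmax=20.0, nt=2000):
    # Empirical controllability Gramian from impulse-response snapshots:
    # march x(t_{j+1}) = expm(A dt) x(t_j) from the columns of B, then P ~ X X^T dt
    dt = tmax / nt
    Phi = spl.expm(A * dt)
    x = B.astype(float)
    cols = [x]
    for _ in range(nt - 1):
        x = Phi @ x
        cols.append(x)
    X = np.hstack(cols)
    return X @ X.T * dt

def comp_cQ_approx(A, C, **kw):
    # By duality: observability Gramian = controllability Gramian of the dual
    return comp_cP_approx(A.T, C.T, **kw)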
# Contr Gramian, standard vs. data-driven
P0 = comp_cP_inf(A, B)
P1 = comp_cP_approx(A, B)
print(np.linalg.norm(P0-P1)/np.linalg.norm(P0))
# Obser Gramian, standard vs. data-driven
Q0 = comp_cQ_inf(A, C)
Q1 = comp_cQ_approx(A, C)
print(np.linalg.norm(Q0-Q1)/np.linalg.norm(Q0))
0.034218825534171514
0.0317342309031413
Then one can just take the SVD $$ Y^TX = W\Sigma V^T $$ and the rest follows as in the square-root balancing algorithm, with the snapshot matrices $X$ and $Y$ taking the place of the factors $R$ and $L$.
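A sketch of the bpod_trunc helper used below (the snapshot horizon and step are illustrative assumptions, with the $\sqrt{\Delta t}$ scaling folded into the snapshots):

def impulse_snapshots(A, B, tmax=20.0, nt=500):
    # Columns sqrt(dt) * exp(A t_j) b_i, so that X X^T approximates the Gramian
    dt = tmax / nt
    Phi = spl.expm(A * dt)
    x = B.astype(float) * np.sqrt(dt)
    cols = [x]
    for _ in range(nt - 1):
        x = Phi @ x
        cols.append(x)
    return np.hstack(cols)

def bpod_trunc(sys, r):
    # Balanced POD: square-root balancing with empirical factors X, Y
    A, B, C = sys
    X = impulse_snapshots(A, B)        # primal impulse responses
    Y = impulse_snapshots(A.T, C.T)    # dual impulse responses
    W, s, Vh = np.linalg.svd(Y.T @ X, full_matrices=False)
    W, s, Vh = W[:, :r], s[:r], Vh[:r]
    sih = np.diag(s ** -0.5)
    return X @ Vh.T @ sih, sih @ W.T @ Y.T   # (Tinv, T)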
A conventional approach for model order reduction is the Proper Orthogonal Decomposition (POD), i.e., without the "balancing".
One just takes the impulse responses $X$ and performs a (truncated) SVD $$ X = W\Sigma V^T $$ and assumes the states are approximated by $$ x(t) = Wz(t) $$ so $$ \dot{x}=Ax+Bu \Rightarrow W\dot{z} = AWz + Bu \Rightarrow \dot{z} = W^TAWz + W^TBu $$ This is effectively picking $T=W^T$ and $T^{-1}=W$ (a projection, since the columns of $W$ are orthonormal and $W^TW=I$).
But now we know simple POD is not the best choice!
Tinv, T = bal_trunc((A,B,C)) # Balanced truncation
Ab, Bb, Cb = sys_transform((A,B,C), Tinv, T=T)
sys_print((Ab, Bb, Cb))
Ar, Br, Cr = np.array([[Ab[0,0]]]), np.array([[Bb[0,0]]]), np.array([[Cb[0,0]]]) # keep only the leading balanced state
[[-2.0000e+00 -9.2839e-12  1.6585e-08 -2.2667e-04  1.7321e+00]
 [ 1.7273e-04 -1.0000e+00 -8.7400e-04 -1.3968e-08 -7.4795e-05]
 [ 4.8527e-09 -2.2365e-04 -4.8423e+00 -1.8382e-04  1.2525e-08]
 [-1.0632e-04  6.8819e-01  1.4101e+04  5.3529e-01  4.0600e-05]
 [-1.7321e+00 -8.1855e-12  1.4363e-08 -1.9631e-04  0.0000e+00]]
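For comparison, a sketch of the POD-based cP_trunc used in the next cell (the body is an assumption, consistent with the Gramian-eigenvector cell earlier):

def cP_trunc(sys, r):
    # POD: leading eigenvectors of the controllability Gramian as the basis W;
    # project with T = W^T, Tinv = W
    A, B, C = sys
    w, v = spl.eigh(comp_cP_inf(A, B))   # ascending eigenvalues
    W = v[:, ::-1][:, :r]                # descending order, keep r
    return W, W.T                        # (Tinv, T)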
Tinv, T = cP_trunc((A,B,C), 1) # POD
Ap, Bp, Cp = sys_transform((A,B,C), Tinv, T=T)
sys_print((Ap, Bp, Cp))
[[-1.6444  1.6019]
 [-1.6856  0.    ]]
Tinv, T = bpod_trunc((A,B,C), 1) # Balanced POD
Ad, Bd, Cd = sys_transform((A,B,C), Tinv, T=T)
sys_print((Ad, Bd, Cd))
[[-2.     -1.7321]
 [ 1.7321  0.    ]]
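The comparison below assumes the four systems and labels were collected first, e.g. in a hypothetical setup cell like:

# Collect the full system and the three 1-state reduced models
ss = [sps.lti(A, B, C, [0]),
      sps.lti(Ar, Br, Cr, [0]),
      sps.lti(Ap, Bp, Cp, [0]),
      sps.lti(Ad, Bd, Cd, [0])]
lbls = ['Full', 'Balanced trunc.', 'POD', 'BPOD']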
f = plt.figure(figsize=(6,4))
for _i in range(4):
    # t, y = sps.impulse(ss[_i], T=ts)
    t, y = sps.step(ss[_i], T=ts)
    plt.plot(t, y, stys[_i], label=lbls[_i])
plt.legend()
plt.xlabel('t')
plt.ylabel('y');