From undergrad math, one should already know how to numerically solve initial value problems (IVPs) of ODEs by, e.g.,
as well as IVPs and boundary value problems of PDEs by, e.g.,
We focus on a more general treatment of DEs from the point of view of the Galerkin method.
Let's start with a linear PDE, $$ \begin{align*} L[u] &= f,\ \text{in}\ \Omega \\ B[u] &= g,\ \text{on}\ \partial\Omega \end{align*} $$ where $L$ and $B$ are linear operators, $\Omega\subset\mathbb{R}^n$, and $\partial\Omega$ denotes its boundary.
Formally, in this lecture, we consider $2N$th-order PDEs (i.e., PDEs of even order).
The boundary conditions (BCs) of such PDEs involve derivatives up to order $2N-1$, resulting in $2N$ types of BCs.
Also,
A classical 2D example is the steady heat transfer on a rectangular domain $\Omega=[0,a]\times[0,b]$, $$ \begin{align*} &\Delta u \equiv u_{xx}+u_{yy} = f \\ &u(x,0) = g_1(x),\ u_y(x,b) = g_2(x),\ u(0,y) = 0,\ u_x(a,y) = 0 \end{align*} $$ where we deliberately introduced 4 different BCs:
Another example is the beam vibration on $\Omega=[0,a]\times[0,\infty)$, $$ m \ddot{u} + EI u'''' = f $$ where $\dot{u}=\partial u/\partial t$ and $u'=\partial u/\partial x$, with possibly
For ease of discussion, we now focus on a 1D problem, which technically we can solve by hand, $$ u''(x)=f(x),\ x\in[a,b] $$ with either essential or natural BCs at $x=a$ and $x=b$, to be specified and treated later.
We represent the unknown function by a linear combination of trial functions, $$ u_N(x)=\sum_{i=1}^N w_i\phi_i(x)\equiv w^T \phi(x) $$ with the hope that $u_N$ approximately satisfies the DE.
In fact we can define the residual, $$ r(x) = u_N''(x)-f(x) = w^T\phi''(x)-f(x) $$ and hope $r(x)$ is close to zero "everywhere" in $\Omega$.
To enforce that $r(x)$ is as close as possible to zero, we introduce the test functions, $$ \int_a^b \psi_i(x) r(x) dx=0,\ i=1,2,\cdots,N $$ which is effectively a weighted integral of the residual over $\Omega$.
If we define the inner product, $$ <\psi_i,r> = \int_a^b \psi_i(x) r(x) dx $$ then we are effectively projecting the residual onto the subspace spanned by the $\psi_i$'s, and enforcing the projected quantities to be zero.
Instead of "strongly" enforcing $$ r(x)=0 $$ everywhere in $\Omega$, we are "weakly" requiring $$ <\psi,r>=0 $$ which means $r(x)$ can be non-zero in $\Omega$. This inner-product form is referred to as the weak form.
In other words, the Galerkin method uses the weak form to solve a DE.
The projection results in a system of linear equations, $$ Aw=b $$ where $$ A_{ij} = <\psi_i,\phi_j''>,\quad b_i=<\psi_i,f> $$ and we can solve for $w$ to find $u_N(x)$.
A naive choice of $\psi$ is a series of Dirac delta functions, $$ \psi_i(x)=\delta(x-x_i) $$ at a set of "collocation points", $x_2,x_3,\cdots,x_{N-1}$, together with end points $x_1=a$ and $x_N=b$.
Using the sifting property, $$ \int_{-\infty}^\infty \delta(x-a)f(x) dx = f(a) $$ the inner product equations simplify to $$ <\psi_i,r> = 0\ \Rightarrow\ r(x_i)=0\ \Rightarrow\ A_{ij}=\phi_j''(x_i),\ b_i=f(x_i) $$ for $i=2,3,\cdots,N-1$.
Then, at the end points, we explicitly enforce the given BCs. For example, if we had $$ u(a)=u_0,\ u'(b)=n_1 $$ then we have two extra equations, $$ \phi(x_1)^Tw=u_0,\quad \phi'(x_N)^Tw=n_1 $$ which complete the linear system of equations, $Aw=b$, and one can solve for $w$.
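To make this concrete, here is a minimal collocation sketch. The monomial basis $\phi_j(x)=x^j$ and the manufactured $f$ (chosen so the exact solution is $u=\sin x$) are illustrative assumptions, not taken from the lecture.

# Minimal collocation sketch for u'' = f on [0,1] with u(0)=u0, u'(1)=n1.
# Assumed: monomial basis phi_j(x) = x^j; manufactured f so that u = sin(x).
import numpy as np

N = 8
xs = np.linspace(0, 1, N)                 # x_1 = 0 and x_N = 1 are end points
u0, n1 = 0.0, np.cos(1.0)                 # BC values of the exact solution
f = lambda x: -np.sin(x)                  # u'' = -sin(x)

A, b = np.zeros((N, N)), np.zeros(N)
for i, xi in enumerate(xs[1:-1], start=1):
    # interior rows: r(x_i) = 0, i.e., sum_j w_j phi_j''(x_i) = f(x_i)
    A[i] = [j*(j-1)*xi**(j-2) if j >= 2 else 0.0 for j in range(N)]
    b[i] = f(xi)
A[0], b[0] = [0.0**j for j in range(N)], u0             # u(0)  = u0
A[-1], b[-1] = [j * 1.0**(j-1) for j in range(N)], n1   # u'(1) = n1
w = np.linalg.solve(A, b)
print(abs(w @ 0.5**np.arange(N) - np.sin(0.5)))         # small error at x = 0.5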
But for many, if not most, applications, we hope to choose $\phi$ and $\psi$ to be orthogonal bases of the function space containing the solution $u$, so as to achieve good convergence rates and numerical accuracy.
In general, we have
In this lecture we focus on the former, so $\phi=\psi$.
Side note:
To develop the standard Galerkin methods, we need to manipulate $A_{ij} = <\psi_i,\phi_j''>$ further, because,
Hence we introduce the weak form to solve both problems above.
We apply integration by parts to the weighted residual $$ \begin{align*} <\psi_i,r> &= \int_a^b \psi_i(x) (u''(x)-f(x)) dx \\ &= \psi_i(b) u'(b) - \psi_i(a) u'(a) - \int_a^b \psi_i'(x) u'(x) dx - \int_a^b \psi_i(x) f(x) dx \end{align*} $$ Then,
Also note that in the BC terms
The specific choice of $\psi$ and $\phi$ and the BC treatment results in different numerical methods/schemes, which we discuss next.
We organize the weighted residual as $$ \int_a^b \psi_i'(x) u'(x) dx = \psi_i(b) u'(b) - \psi_i(a) u'(a) - \int_a^b \psi_i(x) f(x) dx $$ and knowing $u\approx u_N(x)=\phi(x)^Tw$, define, $$ \begin{align*} A_{ij} &= <\psi_i',\phi_j'> = \int_a^b \psi_i'(x) \phi_j'(x) dx \\ b_i &= -<\psi_i,f> = -\int_a^b \psi_i(x) f(x) dx \\ \bar{b}_i &= \psi_i(b) u'(b) - \psi_i(a) u'(a) \end{align*} $$ and then $$ Aw=b+\bar{b} $$
Since $\phi=\psi$, $A$ is symmetric, which is numerically favorable (faster to solve).
In this case, the essential BCs are zero, with possibly non-zero natural BCs.
We choose $\psi$ to be zero at essential BCs and free at the other BCs.
For example, for $u''=f$ on $[0,1]$ with $u(0)=u(1)=0$, the sine basis $\psi_i(x)=\phi_i(x)=\sin(i\pi x)$ vanishes at both ends, and $A_{ij}=<\phi_i',\phi_j'>=\frac{(i\pi)^2}{2}\delta_{ij}$ is diagonal.
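As a minimal numerical sketch of this case (assuming the sine basis above and a manufactured $f(x)=-\pi^2\sin(\pi x)$, so the exact solution is $\sin(\pi x)$; Gauss quadrature is used for the integrals):

# Galerkin sketch for u'' = f on [0,1], u(0) = u(1) = 0, with sine basis
import numpy as np

N, Q = 8, 64
xq, wq = np.polynomial.legendre.leggauss(Q)
xq, wq = (xq + 1) / 2, wq / 2                 # Gauss quadrature mapped to [0,1]
phi  = np.array([np.sin((i+1)*np.pi*xq) for i in range(N)])
dphi = np.array([(i+1)*np.pi*np.cos((i+1)*np.pi*xq) for i in range(N)])
fq = -np.pi**2 * np.sin(np.pi*xq)             # manufactured f, u = sin(pi x)

A = (dphi * wq) @ dphi.T                      # A_ij = <phi_i', phi_j'>, diagonal
b = -(phi * wq) @ fq                          # b_i  = -<phi_i, f>
w = np.linalg.solve(A, b)
print(np.round(w, 10))                        # ~[1, 0, ..., 0]: u_N = sin(pi x)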
In this case, the essential BCs are non-zero.
We still choose $\psi$ to be zero at the essential BCs and free at the others, and likewise for $\phi$.
But now we also need to choose a special trial function $\phi_0(x)$, that satisfies the non-zero essential BCs, so $$ u(x) \approx u_N(x) = \phi(x)^Tw + \phi_0(x) $$
Then the LHS of the weighted residual is $$ \begin{align*} <\psi_i', u'> &\approx <\psi_i', \phi'^Tw + \phi_0'> \\ &= <\psi_i', \phi'^T>w + <\psi_i', \phi_0'> \\ &= \sum_{j=1}^N A_{ij}w_j - \hat{b}_i \end{align*} $$ where we define $\hat{b}_i=-<\psi_i', \phi_0'>$.
The modified linear system is thus $$ Aw = b+\bar{b}+\hat{b} $$ where $\hat{b}$ effectively accounts for the non-zero essential BCs.
For example, with Dirichlet BCs $u(a)=u_0$ and $u(b)=u_1$, the linear function $\phi_0(x)=u_0+(u_1-u_0)\frac{x-a}{b-a}$ satisfies both essential BCs; since $\phi_0'$ is constant and each $\psi_i$ vanishes at both ends, $\hat{b}_i=-\frac{u_1-u_0}{b-a}(\psi_i(b)-\psi_i(a))=0$, and only the final reconstruction $u_N=\phi^Tw+\phi_0$ changes.
In the standard approach, we choose appropriate $\psi$ and $\phi$ to satisfy the BCs, so that when forming the linear system, all the BCs are automatically satisfied. However, treating non-homogeneous essential BCs requires devising a special trial function, which is often cumbersome.
An alternative is to employ a finite-element-like approach that works the other way around: the bases are not required to satisfy the essential BCs; instead, the BCs are imposed as explicit constraints on $w$.
For example, if $u(a)=u_0$ and $u(b)=u_1$, we require $$ Bw=c $$ where $$ B=[\phi(a),\ \phi(b)]^T,\quad c=[u_0,u_1]^T $$
Then we solve an augmented system: $$ Aw=b,\quad Bw=c $$ where $\bar{b}=0$ following the same argument as in the standard approach (i.e., homogeneous BCs for $\psi$).
To be precise, the previous formulation should be derived via variational calculus, which would lead us to the finite element method.
Alternatively, one could also use the concept of projection and stay within the Galerkin approach. See the 2D case for more details.
The linear system is solved via an equality-constrained optimization: $$ \begin{align*} \min_w &\ \frac{1}{2}w^TAw-w^Tb \\ \text{s.t.} &\ Bw=c \end{align*} $$ which translates to a linear system with Lagrange multipliers (i.e., the stationarity condition) $$ \begin{bmatrix} A & B^T \\ B & O \end{bmatrix} \begin{bmatrix} w \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ c \end{bmatrix} $$
We do not necessarily need to solve the entire system to find $w$. Instead, we attempt to cancel out $\lambda$ to obtain a reduced system.
For the equality constraint, we know from the least-norm problem that $$ w=w^\parallel + w^\perp $$ where $$ w^\parallel = B^+ c,\quad Bw^\perp=0 $$ and $B^+$ is the pseudoinverse of $B$.
Define a projection matrix $P=I-B^+B$, then $$ Pw = (I-B^+B)(B^+ c + w^\perp) = (I-B^+B)w^\perp = w^\perp $$ so $$ w=w^\parallel + Pw $$
Then the first equation of stationarity condition is $$ \begin{align*} Aw + B^T\lambda &= b \\ A(w^\parallel + Pw) + B^T\lambda &= b \\ PA(w^\parallel + Pw) + \underbrace{PB^T\lambda}_{=0} &= Pb \\ PAPw &= P(b - Aw^\parallel) \end{align*} $$ and if its solution is $w^*$, then the final solution is $$ w=w^\parallel + Pw^* $$
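In code, the reduced solve might look like the following sketch (assuming $A$, $b$, $B$, $c$ have already been assembled; the function name is illustrative):

# Null-space projection solve for: min (1/2) w^T A w - w^T b  s.t.  B w = c
import numpy as np

def solve_constrained(A, b, B, c):
    Bp = np.linalg.pinv(B)                  # pseudoinverse B^+
    w_par = Bp @ c                          # particular part, B w_par = c
    P = np.eye(A.shape[0]) - Bp @ B         # projector onto null(B)
    # PAP is singular by construction; lstsq returns the minimum-norm solution
    w_star = np.linalg.lstsq(P @ A @ P, P @ (b - A @ w_par), rcond=None)[0]
    return w_par + P @ w_star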
An alternative, approximate approach is to use a penalty method. We solve an unconstrained problem instead, $$ \min_w \ \frac{1}{2}w^TAw-w^Tb + \frac{\mu}{2} \lVert Bw-c \rVert^2 $$ with sufficiently large $\mu$.
It is easy to check that the minimum is achieved at $$ (A+\mu B^TB)w = b+\mu B^T c $$
Note that this formulation satisfies the constraint only approximately; it becomes exact only in the limit $\mu\rightarrow\infty$.
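A corresponding sketch (same assumed inputs as above). In practice, $\mu$ trades constraint violation against conditioning: very large $\mu$ makes the system ill-conditioned.

# Penalty solve: (A + mu B^T B) w = b + mu B^T c, exact only as mu -> infinity
import numpy as np

def solve_penalty(A, b, B, c, mu=1e8):
    return np.linalg.solve(A + mu * (B.T @ B), b + mu * (B.T @ c))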
Slightly more general treatment; skip in class.
We consider a steady advection-diffusion equation, where the velocity $c\in R^2$ is constant. We deliberately include all possible (simple) BCs:
$$ \begin{align*} c\cdot\nabla u - \Delta u = f(x,y),&\quad \text{in}\quad \Omega \\ u(x,y) = 0 &\quad \text{on}\quad \Gamma_d^h \\ u(x,y) = g_1(x,y) &\quad \text{on}\quad \Gamma_d^n \\ n\cdot\nabla u(x,y) = 0 &\quad \text{on}\quad \Gamma_n^h \\ n\cdot\nabla u(x,y) = g_2(x,y) &\quad \text{on}\quad \Gamma_n^n \end{align*} $$

Domain: $\Omega$
Boundary: $\Gamma=\partial\Omega\equiv \Gamma_d\cup\Gamma_n$
Dirichlet BC: $\Gamma_d = \Gamma_d^h \cup \Gamma_d^n$
Neumann BC: $\Gamma_n = \Gamma_n^h \cup \Gamma_n^n$
Select $\psi$ that satisfies $\psi=0$ on $\Gamma_d$.
$$ \begin{align*} <\psi, c\cdot\nabla u - \Delta u> &= <\psi, f> \\ <\psi, c\cdot\nabla u> - <\psi, \Delta u> &= <\psi, f> \end{align*} $$

In the following we will need the integration-by-parts formulas: $$ \begin{align*} \int_\Omega \psi\nabla\cdot u\, d\Omega &= \int_\Gamma \psi u\cdot n\, d\Gamma - \int_\Omega \nabla \psi \cdot u\, d\Omega \\ \int_\Omega \psi\Delta u\, d\Omega &= \int_\Gamma \psi (\nabla u\cdot n)\, d\Gamma - \int_\Omega \nabla \psi\cdot\nabla u\, d\Omega \end{align*} $$
For the diffusion term: $$ \begin{align*} <\psi, \Delta u> &= \int_\Omega \psi \Delta u d\Omega \\ &= \int_\Gamma \psi (\nabla u\cdot n) d\Gamma - \int_\Omega \nabla \psi\cdot\nabla u d\Omega \\ &= \int_\Gamma \psi (n\cdot\nabla u) d\Gamma - < \nabla \psi, \nabla u > \end{align*} $$
On $\Gamma_d$, $\psi=0$
On $\Gamma_n^h$, $n\cdot\nabla u=0$
On $\Gamma_n^n$, $n\cdot\nabla u=g_2$
So $$ <\psi, \Delta u> = \int_{\Gamma_n^n} \psi g_2 d\Gamma - < \nabla \psi, \nabla u > $$
For the advection term: $$ \begin{align*} <\psi, c\cdot\nabla u> &= \int_\Omega \psi (c\cdot\nabla u) d\Omega \\ &= \int_\Omega \psi\nabla\cdot (cu) d\Omega \\ &= \int_\Gamma \psi (cu)\cdot n d\Gamma - \int_\Omega \nabla \psi \cdot (cu) d\Omega \\ &= \int_\Gamma \psi u (c\cdot n) d\Gamma - < c\cdot\nabla \psi, u > \end{align*} $$
We only know that $\psi=0$ on $\Gamma_d$, so the boundary integral reduces to one over $\Gamma_n$.
So $$ <\psi, c\cdot\nabla u> = \int_{\Gamma_n} \psi u (c\cdot n) d\Gamma - < c\cdot\nabla \psi, u > $$
The weak form is now organized as,
$$ - < c\cdot\nabla \psi, u > + \int_{\Gamma_n} \psi u (c\cdot n) d\Gamma + < \nabla \psi, \nabla u > = <\psi, f> + \int_{\Gamma_n^n} \psi g_2 d\Gamma $$

The natural BCs are explicitly incorporated, and no further treatment is needed.
Define $u=\phi^Tw+\phi_0$, where:
$\phi_i=0$ on $\Gamma_d$
$\phi_0=0$ on $\Gamma_d^h$, $\phi_0=g_1$ on $\Gamma_d^n$
Now consider $\psi=\phi$ and an arbitrary basis $\varphi$ that need not satisfy the essential BCs.
For clarity, we consider the diffusion and advection parts separately.
Substituting $\varphi$ into the diffusion part, we get
$$ \begin{align*} P< \nabla \varphi, \nabla \varphi >P w &= P<\varphi, f> + P\int_{\Gamma_n^n} \varphi g_2 d\Gamma - P< \nabla \varphi, \nabla \varphi > \omega \\ \Rightarrow\quad PA_dP w &= P(b_1 + b_2 - A_d\omega) \end{align*} $$

This is what was derived earlier for the 1D case.
Similarly for the advection part, we get
$$ P(-A_a+N)Pw = P(A_a-N)\omega $$

where
$$ A_a = < c\cdot\nabla \varphi, \varphi >,\quad N = \int_{\Gamma_n} \varphi \varphi^T (c\cdot n) d\Gamma $$

Together we have $$ PAPw = P(b - A\omega) $$ where $A=-A_a+N+A_d$.
The solution is $$ u = w^T\phi+\phi_0 = \varphi^T(Pw+\omega) \equiv \varphi^T\tilde{w} $$
This recovers the results from the "finite-element-like" approach.
Lastly, we can "reverse-engineer" the optimization problem, $$ \begin{bmatrix} A & B^T \\ B & O \end{bmatrix} \begin{bmatrix} \tilde{w} \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ c \end{bmatrix} $$
We solve a 1D advection-diffusion equation $$ p u' - u'' = f(x),\quad u(0)=u_0,\ u'(1)=n_1 $$ where $p$ is a constant advection (wave) speed.
Based on the prior derivation, it should now be easy to obtain the following discretization, $$ (-pA+pN+D)w=b,\quad Bw=c $$ where $$ u=w^T\phi,\quad A=<\nabla\phi,\phi>,\quad D=<\nabla\phi,\nabla\phi>,\quad b=<\phi, f> + \phi(1)n_1 $$ and $$ N=\phi(1)\phi(1)^T-\phi(0)\phi(0)^T,\quad B=\phi(0)^T,\quad c=[u_0] $$
# We choose Legendre polynomials as our basis, scaled to [0,1].
# eval_leg0/eval_leg1 are assumed helpers (defined elsewhere in the notebook);
# a minimal version via the shifted argument 2x-1 is sketched here.
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial.legendre import Legendre

def eval_leg0(i, x):                      # i-th Legendre polynomial on [0,1]
    return Legendre.basis(i)(2*x - 1)

def eval_leg1(i, x):                      # its derivative (chain rule: factor 2)
    return 2 * Legendre.basis(i).deriv()(2*x - 1)

x = np.linspace(0, 1, 41)
f, ax = plt.subplots(ncols=2, figsize=(8, 4))
for _i in range(0, 5):
    ax[0].plot(x, eval_leg0(_i, x), label=f"Order={_i}")
    ax[1].plot(x, eval_leg1(_i, x))
ax[0].legend()
ax[0].set_title("Legendre polynomials")
ax[1].set_title("Derivatives");
We use Gauss-Legendre quadrature to evaluate the integrals, $$ \int_{-1}^1 f(x)g(x)\, dx \approx \sum_{i=1}^N w_i f(x_i) g(x_i) $$ where $x_i$ are the roots of the $N$th-order Legendre polynomial and $w_i$ are the quadrature weights.
A polynomial of order up to $2N-1$ is integrated exactly using $N$ points.
We are working on the interval $[0,1]$ instead of $[-1,1]$, so the formula is slightly modified, $$ \int_{0}^1 f(x)g(x)\, dx \approx \sum_{i=1}^N \tilde{w}_i f(\eta_i) g(\eta_i) $$ where $\eta_i = (x_i+1)/2$ and $\tilde{w}_i = w_i/2$ account for the coordinate transformation.
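For instance, with NumPy's `leggauss`, the mapping reads:

# Gauss-Legendre nodes/weights on [-1,1], mapped to [0,1] per the formula above
import numpy as np

xg, wg = np.polynomial.legendre.leggauss(10)
eta, wt = (xg + 1) / 2, wg / 2
print(np.sum(wt * eta**3), 0.25)   # integrates x^3 over [0,1] exactly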
Choosing $M$ polynomials, we can compute each term in the linear system by first defining the matrices $\Phi_0,\Phi_1\in R^{M\times N}$ and the diagonal $W\in R^{N\times N}$, $$ [\Phi_0]_{ij} = \phi_i(\eta_j),\quad [\Phi_1]_{ij} = \phi_i'(\eta_j),\quad [W]_{ii}=\tilde{w}_i $$ Then, $$ A=\Phi_1 W \Phi_0^T,\quad D=\Phi_1 W \Phi_1^T,\quad b=\Phi_0 W f $$ where $f_i=f(\eta_i)$.
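Putting the pieces together, here is a hedged sketch of the assembly and constrained solve for the 1D advection-diffusion problem above. It is an illustrative reconstruction, not the notebook's `galerkin` helper (whose definition is not shown); `f` is assumed to be a callable source term.

# Sketch: assemble and solve p u' - u'' = f on [0,1], u(0)=u0, u'(1)=n1,
# with M shifted Legendre polynomials and Nq-point Gauss quadrature.
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def assemble_and_solve(M, Nq, p, u0, n1, f):
    xg, wg = leggauss(Nq)
    eta, wt = (xg + 1) / 2, wg / 2                       # quadrature on [0,1]
    basis = [Legendre.basis(i) for i in range(M)]
    Phi0 = np.array([q(2*eta - 1) for q in basis])       # phi_i(eta_j)
    Phi1 = np.array([2 * q.deriv()(2*eta - 1) for q in basis])  # phi_i'(eta_j)
    A = (Phi1 * wt) @ Phi0.T                             # <phi', phi>
    D = (Phi1 * wt) @ Phi1.T                             # <phi', phi'>
    p0 = np.array([q(-1.0) for q in basis])              # phi(0)
    p1 = np.array([q(1.0) for q in basis])               # phi(1)
    Nmat = np.outer(p1, p1) - np.outer(p0, p0)
    b = (Phi0 * wt) @ f(eta) + p1 * n1
    B, c = p0[None, :], np.array([u0])                   # constraint u(0) = u0
    # augmented (KKT) system with one Lagrange multiplier for B w = c
    KKT = np.block([[-p*A + p*Nmat + D, B.T], [B, np.zeros((1, 1))]])
    return np.linalg.solve(KKT, np.concatenate([b, c]))[:M]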
# Numerical solution: galerkin, compare_sol, and the exact solution fu are
# helper functions defined elsewhere in the notebook
M, N, p = 6, 10, 10.0
ub, nb = [1.0, None], [None, 1.0]   # essential/natural BC values at the two ends (None = not imposed)
qs = [0.0, 0.0]
sol, _ = galerkin(M, N, p, ub, nb, qs)
compare_sol(sol, fu, 41);
# Convergence study: error vs. number of basis functions
# (eval_sol is another notebook helper, defined elsewhere)
Ms, Es = [4, 6, 8, 10, 12], []
for _M in Ms:
    sol, _ = galerkin(_M, _M + 4, p, ub, nb, qs)
    Es.append(np.linalg.norm(eval_sol(sol, x) - fu(x)))
f = plt.figure(figsize=(6,4))
plt.semilogy(Ms, Es)
plt.xlabel("# of basis")
plt.ylabel("Error");