From undergrad math, one should already know how to numerically solve initial value problems (IVPs) of ODEs by, e.g.,
as well as IVPs and boundary value problems of PDEs by, e.g.,
We focus on a more general treatment of DEs from the point of view of the Galerkin method.
Let's start with a linear PDE, $$ \begin{align*} L[u] &= f,\ \text{in}\ \Omega \\ B[u] &= g,\ \text{on}\ \partial\Omega \end{align*} $$ where $L$ and $B$ are linear operators, $\Omega\subset\mathbb{R}^n$, and $\partial\Omega$ denotes its boundary.
Formally, in this lecture, we consider $2N$th-order PDEs (i.e., PDEs of even order).
The boundary conditions (BCs) of such PDEs involve derivatives up to order $2N-1$, resulting in $2N$ types of BCs.
Also,
A classical 2D example is the steady heat transfer on a rectangular domain $\Omega=[0,a]\times[0,b]$, $$ \begin{align*} &\Delta u \equiv u_{xx}+u_{yy} = f \\ &u(x,0) = g_1(x),\ u_y(x,b) = g_2(x),\ u(0,y) = 0,\ u_x(a,y) = 0 \end{align*} $$ where we deliberately introduced 4 different BCs:
Another example is the beam vibration on $\Omega=[0,a]\times[0,\infty)$, $$ m \ddot{u} + EI u'''' = f $$ where $\dot{u}=\partial u/\partial t$ and $u'=\partial u/\partial x$, with possibly
For ease of discussion, we now focus on a 1D problem, which technically we can solve by hand, $$ u''(x)=f(x),\ x\in[a,b] $$ with either essential or natural BCs at $x=a$ and $x=b$, to be specified and treated later.
We represent the unknown function by a linear combination of trial functions, $$ u_N(x)=\sum_{i=1}^N w_i\phi_i(x)\equiv w^T \phi(x) $$ with the hope that $u_N$ approximately satisfies the DE.
In fact we can define the residual, $$ r(x) = u_N''(x)-f(x) = w^T\phi''(x)-f(x) $$ and hope $r(x)$ is close to zero "everywhere" in $\Omega$.
To enforce that $r(x)$ is as close as possible to zero, we introduce the test functions, $$ \int_a^b \psi_i(x) r(x) dx=0,\ i=1,2,\cdots,N $$ which is effectively a weighted integral of the residual over $\Omega$.
If we define the inner product, $$ <\psi_i,r> = \int_a^b \psi_i(x) r(x) dx $$ then we are effectively projecting the residual onto the subspace spanned by the $\psi_i$'s, and enforcing the projected quantities to be zero.
Instead of "strongly" enforcing $$ r(x)=0 $$ everywhere in $\Omega$, we are "weakly" requiring $$ <\psi,r>=0 $$ which means $r(x)$ can be non-zero in $\Omega$. This inner-product form is referred to as the weak form.
In other words, the Galerkin method uses the weak form to solve a DE.
The projection results in a system of linear equations, $$ Aw=b $$ where $$ A_{ij} = <\psi_i,\phi_j''>,\quad b_i=<\psi_i,f> $$ and we can solve for $w$ to find $u_N(x)$.
A naive choice of $\psi$ is a series of Dirac delta functions, $$ \psi_i(x)=\delta(x-x_i) $$ at a set of "collocation points", $x_2,x_3,\cdots,x_{N-1}$, together with end points $x_1=a$ and $x_N=b$.
Using the sifting property, $$ \int_{-\infty}^\infty \delta(x-a)f(x) dx = f(a) $$ the inner product equations simplify to $$ <\psi_i,r> = 0\ \Rightarrow\ r(x_i)=0\ \Rightarrow\ A_{ij}=\phi_j''(x_i),\ b_i=f(x_i) $$ for $i=2,3,\cdots,N-1$.
Then, at the end points, we explicitly enforce the given BCs. For example, if we had $$ u(a)=u_0,\ u'(b)=n_1 $$ then we have two extra equations, $$ \phi(x_1)^Tw=u_0,\quad \phi'(x_N)^Tw=n_1 $$ which complete the linear system of equations, $Aw=b$, and one can solve for $w$.
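To make this concrete, here is a minimal collocation sketch. The monomial basis $\phi_j(x)=x^j$ and the manufactured $f$ (chosen so the exact solution is $u=\sin x$) are illustrative assumptions, not taken from the lecture.

# Minimal collocation sketch for u'' = f on [0,1] with u(0)=u0, u'(1)=n1.
# Assumed: monomial basis phi_j(x) = x^j; manufactured f so that u = sin(x).
import numpy as np

N = 8
xs = np.linspace(0, 1, N)                 # x_1 = 0 and x_N = 1 are end points
u0, n1 = 0.0, np.cos(1.0)                 # BC values of the exact solution
f = lambda x: -np.sin(x)                  # u'' = -sin(x)

A, b = np.zeros((N, N)), np.zeros(N)
for i, xi in enumerate(xs[1:-1], start=1):
    # interior rows: r(x_i) = 0, i.e., sum_j w_j phi_j''(x_i) = f(x_i)
    A[i] = [j*(j-1)*xi**(j-2) if j >= 2 else 0.0 for j in range(N)]
    b[i] = f(xi)
A[0], b[0] = [0.0**j for j in range(N)], u0             # u(0)  = u0
A[-1], b[-1] = [j * 1.0**(j-1) for j in range(N)], n1   # u'(1) = n1
w = np.linalg.solve(A, b)
print(abs(w @ 0.5**np.arange(N) - np.sin(0.5)))         # small error at x = 0.5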
But for many, if not most, applications, we hope to choose $\phi$ and $\psi$ to be orthogonal bases of the function space containing the solution $u$, so as to achieve good convergence rates and numerical accuracy.
In general, we have
In this lecture we focus on the former, so $\phi=\psi$.
Side note:
To develop the standard Galerkin methods, we need to manipulate $A_{ij} = <\psi_i,\phi_j''>$ further, because,
Hence we introduce the weak form to solve both problems above.
We apply integration by parts to the weighted residual $$ \begin{align*} <\psi_i,r> &= \int_a^b \psi_i(x) (u''(x)-f(x)) dx \\ &= \psi_i(b) u'(b) - \psi_i(a) u'(a) - \int_a^b \psi_i'(x) u'(x) dx - \int_a^b \psi_i(x) f(x) dx \end{align*} $$ Then,
Also note that in the BC terms
The specific choice of $\psi$ and $\phi$ and the BC treatment results in different numerical methods/schemes, which we discuss next.
We organize the weighted residual as $$ \int_a^b \psi_i'(x) u'(x) dx = \psi_i(b) u'(b) - \psi_i(a) u'(a) - \int_a^b \psi_i(x) f(x) dx $$ and knowing $u\approx u_N(x)=\phi(x)^Tw$, define, $$ \begin{align*} A_{ij} &= <\psi_i',\phi_j'> = \int_a^b \psi_i'(x) \phi_j'(x) dx \\ b_i &= -<\psi_i,f> = -\int_a^b \psi_i(x) f(x) dx \\ \bar{b}_i &= \psi_i(b) u'(b) - \psi_i(a) u'(a) \end{align*} $$ and then $$ Aw=b+\bar{b} $$
Since $\phi=\psi$, $A$ is symmetric, which is numerically favorable (faster to solve).
In this case, the essential BCs are zero, with possibly non-zero natural BCs.
We choose $\psi$ to be zero at essential BCs and free at the other BCs.
For example, for $u''=f$ on $[0,1]$ with $u(0)=u(1)=0$, the sine basis $\psi_i(x)=\phi_i(x)=\sin(i\pi x)$ vanishes at both ends, and $A_{ij}=<\phi_i',\phi_j'>=\frac{(i\pi)^2}{2}\delta_{ij}$ is diagonal.
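As a minimal numerical sketch of this case (assuming the sine basis above and a manufactured $f(x)=-\pi^2\sin(\pi x)$, so the exact solution is $\sin(\pi x)$; Gauss quadrature is used for the integrals):

# Galerkin sketch for u'' = f on [0,1], u(0) = u(1) = 0, with sine basis
import numpy as np

N, Q = 8, 64
xq, wq = np.polynomial.legendre.leggauss(Q)
xq, wq = (xq + 1) / 2, wq / 2                 # Gauss quadrature mapped to [0,1]
phi  = np.array([np.sin((i+1)*np.pi*xq) for i in range(N)])
dphi = np.array([(i+1)*np.pi*np.cos((i+1)*np.pi*xq) for i in range(N)])
fq = -np.pi**2 * np.sin(np.pi*xq)             # manufactured f, u = sin(pi x)

A = (dphi * wq) @ dphi.T                      # A_ij = <phi_i', phi_j'>, diagonal
b = -(phi * wq) @ fq                          # b_i  = -<phi_i, f>
w = np.linalg.solve(A, b)
print(np.round(w, 10))                        # ~[1, 0, ..., 0]: u_N = sin(pi x)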
In this case, the essential BCs are non-zero.
We still choose $\psi$ to be zero at the essential BCs and free at the others, and likewise for $\phi$.
But now we also need to choose a special trial function $\phi_0(x)$, that satisfies the non-zero essential BCs, so $$ u(x) \approx u_N(x) = \phi(x)^Tw + \phi_0(x) $$
Then the LHS of the weighted residual is $$ \begin{align*} <\psi_i', u'> &\approx <\psi_i', \phi'^Tw + \phi_0'> \\ &= <\psi_i', \phi'^T>w + <\psi_i', \phi_0'> \\ &= \sum_{j=1}^N A_{ij}w_j - \hat{b}_i \end{align*} $$ where we define $\hat{b}_i=-<\psi_i', \phi_0'>$.
The modified linear system is thus $$ Aw = b+\bar{b}+\hat{b} $$ where $\hat{b}$ effectively accounts for the non-zero essential BCs.
For example, with Dirichlet BCs $u(a)=u_0$ and $u(b)=u_1$, the linear function $\phi_0(x)=u_0+(u_1-u_0)\frac{x-a}{b-a}$ satisfies both essential BCs; since $\phi_0'$ is constant and each $\psi_i$ vanishes at both ends, $\hat{b}_i=-\frac{u_1-u_0}{b-a}(\psi_i(b)-\psi_i(a))=0$, and only the final reconstruction $u_N=\phi^Tw+\phi_0$ changes.
In the standard approach, we choose appropriate $\psi$ and $\phi$ to satisfy the BCs, so that when forming the linear system, all the BCs are automatically satisfied. However, treating non-homogeneous essential BCs requires devising a special trial function, which is often cumbersome.
An alternative is to employ a finite-element-like approach that works the other way around: the bases are not required to satisfy the essential BCs; instead, the BCs are imposed as explicit constraints on $w$.
For example, if $u(a)=u_0$ and $u(b)=u_1$, we require $$ Bw=c $$ where $$ B=[\phi(a),\ \phi(b)]^T,\quad c=[u_0,u_1]^T $$
Then we solve an augmented system: $$ Aw=b,\quad Bw=c $$ where $\bar{b}=0$ following the same argument as in the standard approach (i.e., homogeneous BCs for $\psi$).
To be precise, the previous formulation should be derived via variational calculus, which would lead us to the finite element method.
Alternatively, one could also use the concept of projection and stay within the Galerkin approach. See the 2D case for more details.
The linear system is solved via an equality-constrained optimization: $$ \begin{align*} \min_w &\ \frac{1}{2}w^TAw-w^Tb \\ \text{s.t.} &\ Bw=c \end{align*} $$ which translates to a linear system with Lagrange multipliers (i.e., the stationarity condition) $$ \begin{bmatrix} A & B^T \\ B & O \end{bmatrix} \begin{bmatrix} w \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ c \end{bmatrix} $$
We do not necessarily need to solve the entire system to find $w$. Instead, we attempt to cancel out $\lambda$ to obtain a reduced system.
For the equality constraint, we know from the least-norm problem that $$ w=w^\parallel + w^\perp $$ where $$ w^\parallel = B^+ c,\quad Bw^\perp=0 $$ and $B^+$ is the pseudoinverse of $B$.
Define a projection matrix $P=I-B^+B$, then $$ Pw = (I-B^+B)(B^+ c + w^\perp) = (I-B^+B)w^\perp = w^\perp $$ so $$ w=w^\parallel + Pw $$
Then the first equation of stationarity condition is $$ \begin{align*} Aw + B^T\lambda &= b \\ A(w^\parallel + Pw) + B^T\lambda &= b \\ PA(w^\parallel + Pw) + \underbrace{PB^T\lambda}_{=0} &= Pb \\ PAPw &= P(b - Aw^\parallel) \end{align*} $$ and if its solution is $w^*$, then the final solution is $$ w=w^\parallel + Pw^* $$
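In code, the reduced solve might look like the following sketch (assuming $A$, $b$, $B$, $c$ have already been assembled; the function name is illustrative):

# Null-space projection solve for: min (1/2) w^T A w - w^T b  s.t.  B w = c
import numpy as np

def solve_constrained(A, b, B, c):
    Bp = np.linalg.pinv(B)                  # pseudoinverse B^+
    w_par = Bp @ c                          # particular part, B w_par = c
    P = np.eye(A.shape[0]) - Bp @ B         # projector onto null(B)
    # PAP is singular by construction; lstsq returns the minimum-norm solution
    w_star = np.linalg.lstsq(P @ A @ P, P @ (b - A @ w_par), rcond=None)[0]
    return w_par + P @ w_star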
An alternative, approximate approach is to use a penalty method. We solve an unconstrained problem instead, $$ \min_w \ \frac{1}{2}w^TAw-w^Tb + \frac{\mu}{2} \lVert Bw-c \rVert^2 $$ with sufficiently large $\mu$.
It is easy to check that the minimum is achieved at $$ (A+\mu B^TB)w = b+\mu B^T c $$
Note that this formulation satisfies the constraint only approximately; it becomes exact only in the limit $\mu\rightarrow\infty$.
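A corresponding sketch (same assumed inputs as above). In practice, $\mu$ trades constraint violation against conditioning: very large $\mu$ makes the system ill-conditioned.

# Penalty solve: (A + mu B^T B) w = b + mu B^T c, exact only as mu -> infinity
import numpy as np

def solve_penalty(A, b, B, c, mu=1e8):
    return np.linalg.solve(A + mu * (B.T @ B), b + mu * (B.T @ c))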
Slightly more general treatment; skip in class.
We consider a steady advection-diffusion equation, where the velocity $c\in R^2$ is constant. We deliberately include all possible (simple) BCs:
$$ \begin{align*} c\cdot\nabla u - \Delta u = f(x,y),&\quad \text{in}\quad \Omega \\ u(x,y) = 0 &\quad \text{on}\quad \Gamma_d^h \\ u(x,y) = g_1(x,y) &\quad \text{on}\quad \Gamma_d^n \\ n\cdot\nabla u(x,y) = 0 &\quad \text{on}\quad \Gamma_n^h \\ n\cdot\nabla u(x,y) = g_2(x,y) &\quad \text{on}\quad \Gamma_n^n \end{align*} $$

Domain: $\Omega$
Boundary: $\Gamma=\partial\Omega\equiv \Gamma_d\cup\Gamma_n$
Dirichlet BC: $\Gamma_d = \Gamma_d^h \cup \Gamma_d^n$
Neumann BC: $\Gamma_n = \Gamma_n^h \cup \Gamma_n^n$
Select $\psi$ that satisfies $\psi=0$ on $\Gamma_d$.
$$ \begin{align*} <\psi, c\cdot\nabla u - \Delta u> &= <\psi, f> \\ <\psi, c\cdot\nabla u> - <\psi, \Delta u> &= <\psi, f> \end{align*} $$

In the following we will need the integration-by-parts formulas: $$ \begin{align*} \int_\Omega \psi\nabla\cdot u\, d\Omega &= \int_\Gamma \psi u\cdot n\, d\Gamma - \int_\Omega \nabla \psi \cdot u\, d\Omega \\ \int_\Omega \psi\Delta u\, d\Omega &= \int_\Gamma \psi (\nabla u\cdot n)\, d\Gamma - \int_\Omega \nabla \psi\cdot\nabla u\, d\Omega \end{align*} $$
For the diffusion term: $$ \begin{align*} <\psi, \Delta u> &= \int_\Omega \psi \Delta u d\Omega \\ &= \int_\Gamma \psi (\nabla u\cdot n) d\Gamma - \int_\Omega \nabla \psi\cdot\nabla u d\Omega \\ &= \int_\Gamma \psi (n\cdot\nabla u) d\Gamma - < \nabla \psi, \nabla u > \end{align*} $$
On $\Gamma_d$, $\psi=0$
On $\Gamma_n^h$, $n\cdot\nabla u=0$
On $\Gamma_n^n$, $n\cdot\nabla u=g_2$
So $$ <\psi, \Delta u> = \int_{\Gamma_n^n} \psi g_2 d\Gamma - < \nabla \psi, \nabla u > $$
For the advection term: $$ \begin{align*} <\psi, c\cdot\nabla u> &= \int_\Omega \psi (c\cdot\nabla u) d\Omega \\ &= \int_\Omega \psi\nabla\cdot (cu) d\Omega \\ &= \int_\Gamma \psi (cu)\cdot n d\Gamma - \int_\Omega \nabla \psi \cdot (cu) d\Omega \\ &= \int_\Gamma \psi u (c\cdot n) d\Gamma - < c\cdot\nabla \psi, u > \end{align*} $$
We only know that $\psi=0$ on $\Gamma_d$, so the boundary integral reduces to one over $\Gamma_n$.
So $$ <\psi, c\cdot\nabla u> = \int_{\Gamma_n} \psi u (c\cdot n) d\Gamma - < c\cdot\nabla \psi, u > $$
The weak form is now organized as,
$$ - < c\cdot\nabla \psi, u > + \int_{\Gamma_n} \psi u (c\cdot n) d\Gamma + < \nabla \psi, \nabla u > = <\psi, f> + \int_{\Gamma_n^n} \psi g_2 d\Gamma $$

The natural BCs are explicitly incorporated, and no further treatment is needed.
Define $u=\phi^Tw+\phi_0$, where:
$\phi_i=0$ on $\Gamma_d$
$\phi_0=0$ on $\Gamma_d^h$, $\phi_0=g_1$ on $\Gamma_d^n$
Now consider $\psi=\phi$ and an arbitrary basis $\varphi$ that need not satisfy the essential BCs.
For clarity, we consider the diffusion and advection parts separately.
Substituting $\varphi$ into the diffusion part, we get
$$ \begin{align*} P< \nabla \varphi, \nabla \varphi >P w &= P<\varphi, f> + P\int_{\Gamma_n^n} \varphi g_2 d\Gamma - P< \nabla \varphi, \nabla \varphi > \omega \\ \Rightarrow\quad PA_dP w &= P(b_1 + b_2 - A_d\omega) \end{align*} $$

This is what was derived earlier for the 1D case.
Similarly for the advection part, we get
$$ P(-A_a+N)Pw = P(A_a-N)\omega $$

where
$$ A_a = < c\cdot\nabla \varphi, \varphi >,\quad N = \int_{\Gamma_n} \varphi \varphi^T (c\cdot n) d\Gamma $$

Together we have $$ PAPw = P(b - A\omega) $$ where $A=-A_a+N+A_d$.
The solution is $$ u = w^T\phi+\phi_0 = \varphi^T(Pw+\omega) \equiv \varphi^T\tilde{w} $$
This recovers the results from the "finite-element-like" approach.
Lastly, we can "reverse-engineer" the optimization problem, $$ \begin{bmatrix} A & B^T \\ B & O \end{bmatrix} \begin{bmatrix} \tilde{w} \\ \lambda \end{bmatrix} = \begin{bmatrix} b \\ c \end{bmatrix} $$
We solve a 1D advection-diffusion equation $$ p u' - u'' = f(x),\quad u(0)=u_0,\ u'(1)=n_1 $$ where $p$ is a constant advection (wave) speed.
Based on the prior derivation, it should now be easy to obtain the following discretization, $$ (-pA+pN+D)w=b,\quad Bw=c $$ where $$ u=w^T\phi,\quad A=<\nabla\phi,\phi>,\quad D=<\nabla\phi,\nabla\phi>,\quad b=<\phi, f> + \phi(1)n_1 $$ and $$ N=\phi(1)\phi(1)^T-\phi(0)\phi(0)^T,\quad B=\phi(0)^T,\quad c=[u_0] $$
# We choose Legendre polynomials as our basis, scaled to [0,1].
# eval_leg0/eval_leg1 are assumed helpers (defined elsewhere in the notebook);
# a minimal version via the shifted argument 2x-1 is sketched here.
import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial.legendre import Legendre

def eval_leg0(i, x):                      # i-th Legendre polynomial on [0,1]
    return Legendre.basis(i)(2*x - 1)

def eval_leg1(i, x):                      # its derivative (chain rule: factor 2)
    return 2 * Legendre.basis(i).deriv()(2*x - 1)

x = np.linspace(0, 1, 41)
f, ax = plt.subplots(ncols=2, figsize=(8, 4))
for _i in range(0, 5):
    ax[0].plot(x, eval_leg0(_i, x), label=f"Order={_i}")
    ax[1].plot(x, eval_leg1(_i, x))
ax[0].legend()
ax[0].set_title("Legendre polynomials")
ax[1].set_title("Derivatives");
We use Gauss-Legendre quadrature to evaluate the integrals, $$ \int_{-1}^1 f(x)g(x)\, dx \approx \sum_{i=1}^N w_i f(x_i) g(x_i) $$ where $x_i$ are the roots of the $N$th-order Legendre polynomial and $w_i$ are the quadrature weights.
A polynomial of order up to $2N-1$ is integrated exactly using $N$ points.
We are working on the interval $[0,1]$ instead of $[-1,1]$, so the formula is slightly modified, $$ \int_{0}^1 f(x)g(x)\, dx \approx \sum_{i=1}^N \tilde{w}_i f(\eta_i) g(\eta_i) $$ where $\eta_i = (x_i+1)/2$ and $\tilde{w}_i = w_i/2$ account for the coordinate transformation.
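For instance, with NumPy's `leggauss`, the mapping reads:

# Gauss-Legendre nodes/weights on [-1,1], mapped to [0,1] per the formula above
import numpy as np

xg, wg = np.polynomial.legendre.leggauss(10)
eta, wt = (xg + 1) / 2, wg / 2
print(np.sum(wt * eta**3), 0.25)   # integrates x^3 over [0,1] exactly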
Choosing $M$ polynomials, we can compute each term in the linear system by first defining the matrices $\Phi_0,\Phi_1\in R^{M\times N}$ and the diagonal $W\in R^{N\times N}$, $$ [\Phi_0]_{ij} = \phi_i(\eta_j),\quad [\Phi_1]_{ij} = \phi_i'(\eta_j),\quad [W]_{ii}=\tilde{w}_i $$ Then, $$ A=\Phi_1 W \Phi_0^T,\quad D=\Phi_1 W \Phi_1^T,\quad b=\Phi_0 W f $$ where $f_i=f(\eta_i)$.
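Putting the pieces together, here is a hedged sketch of the assembly and constrained solve for the 1D advection-diffusion problem above. It is an illustrative reconstruction, not the notebook's `galerkin` helper (whose definition is not shown); `f` is assumed to be a callable source term.

# Sketch: assemble and solve p u' - u'' = f on [0,1], u(0)=u0, u'(1)=n1,
# with M shifted Legendre polynomials and Nq-point Gauss quadrature.
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def assemble_and_solve(M, Nq, p, u0, n1, f):
    xg, wg = leggauss(Nq)
    eta, wt = (xg + 1) / 2, wg / 2                       # quadrature on [0,1]
    basis = [Legendre.basis(i) for i in range(M)]
    Phi0 = np.array([q(2*eta - 1) for q in basis])       # phi_i(eta_j)
    Phi1 = np.array([2 * q.deriv()(2*eta - 1) for q in basis])  # phi_i'(eta_j)
    A = (Phi1 * wt) @ Phi0.T                             # <phi', phi>
    D = (Phi1 * wt) @ Phi1.T                             # <phi', phi'>
    p0 = np.array([q(-1.0) for q in basis])              # phi(0)
    p1 = np.array([q(1.0) for q in basis])               # phi(1)
    Nmat = np.outer(p1, p1) - np.outer(p0, p0)
    b = (Phi0 * wt) @ f(eta) + p1 * n1
    B, c = p0[None, :], np.array([u0])                   # constraint u(0) = u0
    # augmented (KKT) system with one Lagrange multiplier for B w = c
    KKT = np.block([[-p*A + p*Nmat + D, B.T], [B, np.zeros((1, 1))]])
    return np.linalg.solve(KKT, np.concatenate([b, c]))[:M]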
# Numerical solution: galerkin, compare_sol, and the exact solution fu are
# helper functions defined elsewhere in the notebook
M, N, p = 6, 10, 10.0
ub, nb = [1.0, None], [None, 1.0]   # essential/natural BC values at the two ends (None = not imposed)
qs = [0.0, 0.0]
sol, _ = galerkin(M, N, p, ub, nb, qs)
compare_sol(sol, fu, 41);
# Convergence study: error vs. number of basis functions
# (eval_sol is another notebook helper, defined elsewhere)
Ms, Es = [4, 6, 8, 10, 12], []
for _M in Ms:
    sol, _ = galerkin(_M, _M + 4, p, ub, nb, qs)
    Es.append(np.linalg.norm(eval_sol(sol, x) - fu(x)))
f = plt.figure(figsize=(6,4))
plt.semilogy(Ms, Es)
plt.xlabel("# of basis")
plt.ylabel("Error");