Module 3.2
Embed differential equations directly into the loss function—the physics itself provides supervision, no labelled data required.
| Difficulty: | Advanced |
| Estimated Time: | 4–5 hours |
Traditional numerical PDE solvers (finite differences, finite elements) require meshes, careful discretization, and separate code for each equation. Physics-Informed Neural Networks (PINNs) offer a radically different approach: approximate the solution with a neural network and train it by minimizing the PDE residual at randomly sampled collocation points. The physics itself becomes the loss function.
This means PINNs can solve forward problems (given equation and conditions, find the solution), inverse problems (given partial observations, recover unknown parameters), and even problems on irregular domains without mesh generation. Applications span fluid dynamics, heat transfer, solid mechanics, medical imaging, and geophysics. PINNs are not a replacement for classical solvers but a powerful complement, especially when data and physics must be combined.
Training a PINN requires derivatives of the network output with respect to its inputs (e.g., $u_x$, $u_{xx}$); in PyTorch, torch.autograd.grad with create_graph=True computes these automatically.
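As a minimal sketch (assuming PyTorch is installed; the network architecture here is an arbitrary illustrative choice), nested calls to torch.autograd.grad yield first and second derivatives of the network output with respect to its input:

```python
import torch

# Toy network u(x); any differentiable module works the same way.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)

x = torch.linspace(0.0, 1.0, 50).reshape(-1, 1).requires_grad_(True)
u = net(x)

# First derivative du/dx; create_graph=True keeps the graph so the
# result can itself be differentiated.
(du_dx,) = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
                               create_graph=True)
# Second derivative d2u/dx2 by differentiating du/dx once more.
(d2u_dx2,) = torch.autograd.grad(du_dx, x, grad_outputs=torch.ones_like(du_dx),
                                 create_graph=True)
print(du_dx.shape, d2u_dx2.shape)  # both (50, 1)
```

Because each sample's output depends only on its own input, grad_outputs=ones gives the per-sample derivatives in one call.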
Solve $y'' + y = 0$, $y(0) = 0$, $y'(0) = 1$ on $[0, 2\pi]$. Exact solution: $y(x) = \sin(x)$.
Step 1: Network architecture. Let $u_\theta(x)$ be an MLP with 2 hidden layers of 20 neurons each and $\tanh$ activation. Input: $x \in \mathbb{R}$. Output: $u \in \mathbb{R}$.
Step 2: Compute derivatives. Using the chain rule through the network (automatic differentiation in practice), obtain $u_\theta'(x)$ and $u_\theta''(x)$ at each collocation point.
Step 3: Define losses. $\mathcal{L}_{\text{PDE}} = \frac{1}{N}\sum_i |u_\theta''(x_i) + u_\theta(x_i)|^2$ over collocation points in $[0, 2\pi]$, plus the initial-condition loss $\mathcal{L}_{\text{IC}} = |u_\theta(0)|^2 + |u_\theta'(0) - 1|^2$, combined as $\mathcal{L} = \mathcal{L}_{\text{PDE}} + \lambda\,\mathcal{L}_{\text{IC}}$.
Step 4: Train. Use Adam optimizer for 5000 iterations. The PDE loss drops below $10^{-5}$ and the learned solution matches $\sin(x)$ to 3–4 decimal places.
Solve $u_t = 0.01\,u_{xx}$ on $x \in [0, 1]$, $t \in [0, 1]$ with $u(x, 0) = \sin(\pi x)$ and $u(0,t) = u(1,t) = 0$.
Exact solution: $u(x,t) = e^{-0.01\pi^2 t}\sin(\pi x)$.
Collocation: Sample $N_r = 10{,}000$ random $(x,t)$ pairs in $(0,1)\times(0,1)$; $N_0 = 200$ points for IC at $t=0$; $N_b = 100$ points each at $x=0$ and $x=1$.
Residual: $r = u_t - 0.01\,u_{xx}$. Computed by two passes of automatic differentiation: first $\partial u/\partial t$ and $\partial u/\partial x$, then $\partial^2 u/\partial x^2$.
Training: Network with layers [2, 50, 50, 50, 1]. Adam for 10,000 iterations, learning rate $10^{-3}$ with step decay. Achieve relative $L^2$ error below 1% on the test grid.
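The collocation setup above can be sketched in NumPy (sample counts follow the text; the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Interior residual points: N_r = 10,000 random (x, t) pairs in (0,1) x (0,1).
N_r, N_0, N_b = 10_000, 200, 100
xt_res = rng.uniform(0.0, 1.0, size=(N_r, 2))

# Initial-condition points at t = 0, with targets u(x, 0) = sin(pi x).
x0 = rng.uniform(0.0, 1.0, size=(N_0, 1))
xt_ic = np.hstack([x0, np.zeros((N_0, 1))])
u_ic = np.sin(np.pi * x0)

# Boundary points at x = 0 and x = 1 (targets are zero).
t_b = rng.uniform(0.0, 1.0, size=(N_b, 1))
xt_left = np.hstack([np.zeros((N_b, 1)), t_b])
xt_right = np.hstack([np.ones((N_b, 1)), t_b])

print(xt_res.shape, xt_ic.shape, xt_left.shape, xt_right.shape)
```

No mesh is involved: each set is just a cloud of points where one loss term is evaluated.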
Given noisy measurements of a spring-mass system $y'' + ky = 0$ where $k$ is unknown, recover $k$ using a PINN.
Setup: Generate data from $y'' + 4y = 0$, $y(0) = 1$, $y'(0) = 0$ (true $k = 4$, solution $y = \cos(2x)$). Add 5% Gaussian noise to 50 observation points.
Network: Same architecture as Example 1, but add $k$ as a learnable parameter initialized to $k_{\text{init}} = 1.0$.
Loss: $\mathcal{L} = \mathcal{L}_{\text{PDE}} + 10\,\mathcal{L}_{\text{data}}$ where $\mathcal{L}_{\text{data}} = \frac{1}{50}\sum |u_\theta(x_i) - y_i^{\text{obs}}|^2$.
Result: After 10,000 training steps, $k$ converges from 1.0 to approximately 3.97 (true value: 4.0). The PINN simultaneously learns the solution shape and the unknown parameter.
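The key change for the inverse problem can be sketched in PyTorch (a sketch under the setup above; the seed and collocation count are illustrative, and the noise level and loss weights follow the text):

```python
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 20), torch.nn.Tanh(),
    torch.nn.Linear(20, 1),
)
# Unknown stiffness k, initialized at 1.0 and optimized with the weights.
k = torch.nn.Parameter(torch.tensor(1.0))
opt = torch.optim.Adam(list(net.parameters()) + [k], lr=1e-3)

# Noisy observations of the true system y'' + 4y = 0, y = cos(2x).
x_obs = torch.linspace(0, 2 * torch.pi, 50).reshape(-1, 1)
y_obs = torch.cos(2 * x_obs) + 0.05 * torch.randn_like(x_obs)

x_col = torch.linspace(0, 2 * torch.pi, 128).reshape(-1, 1)

for step in range(10_000):
    opt.zero_grad()
    xs = x_col.clone().requires_grad_(True)
    u = net(xs)
    du = torch.autograd.grad(u, xs, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, xs, torch.ones_like(du), create_graph=True)[0]
    loss_pde = ((d2u + k * u) ** 2).mean()          # residual of y'' + k y = 0
    loss_data = ((net(x_obs) - y_obs) ** 2).mean()  # fit to the observations
    (loss_pde + 10.0 * loss_data).backward()
    opt.step()

print(f"recovered k: {k.item():.2f}")
```

Gradients flow into $k$ only through the PDE residual, while the data term anchors the solution shape; together they identify both.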
We solve $y'' + y = 0$, $y(0) = 0$, $y'(0) = 1$ using a minimal PINN built entirely in NumPy. The exact solution is $\sin(x)$.
For production PINNs with automatic differentiation, see notebooks/pinn_full.ipynb.
import numpy as np
import matplotlib.pyplot as plt
# ── Tiny MLP in pure NumPy ───────────────────────────────────
class TinyMLP:
    """2-hidden-layer MLP with tanh activation for PINN demo."""

    def __init__(self, layer_sizes, seed=42):
        rng = np.random.RandomState(seed)
        self.weights = []
        self.biases = []
        for i in range(len(layer_sizes) - 1):
            fan_in, fan_out = layer_sizes[i], layer_sizes[i+1]
            W = rng.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))  # Xavier init
            b = np.zeros(fan_out)
            self.weights.append(W)
            self.biases.append(b)

    def forward(self, x):
        """Forward pass returning u and du/dx via the manual chain rule."""
        # x has shape (N, 1)
        # Layer 1: z1 = x W1 + b1, a1 = tanh(z1)
        z1 = x @ self.weights[0] + self.biases[0]
        a1 = np.tanh(z1)
        da1 = 1.0 - a1**2  # tanh'(z1)
        # Layer 2: z2 = a1 W2 + b2, a2 = tanh(z2)
        z2 = a1 @ self.weights[1] + self.biases[1]
        a2 = np.tanh(z2)
        da2 = 1.0 - a2**2  # tanh'(z2)
        # Output layer (linear): u = a2 W3 + b3
        u = a2 @ self.weights[2] + self.biases[2]  # (N, 1)
        # du/dx by the chain rule, propagating back through the layers.
        # For a scalar output: du/dz2 = tanh'(z2) * W3^T, then through W2,
        # tanh'(z1), and W1.
        du_dz2 = da2 * (self.weights[2].T)   # (N, hidden2)
        du_da1 = du_dz2 @ self.weights[1].T  # (N, hidden1)
        du_dz1 = du_da1 * da1                # (N, hidden1)
        dudx = du_dz1 @ self.weights[0].T    # (N, 1)
        # The manual second derivative is considerably messier, so this toy
        # demo obtains d2u/dx2 by finite differences on du/dx instead
        # (see forward_with_d2). Production code uses autograd.
        return u, dudx

    def forward_with_d2(self, x, eps=1e-5):
        """Forward pass with second derivative via central differences on du/dx."""
        u, dudx = self.forward(x)
        _, dudx_plus = self.forward(x + eps)
        _, dudx_minus = self.forward(x - eps)
        d2udx2 = (dudx_plus - dudx_minus) / (2 * eps)
        return u, dudx, d2udx2

    def get_params(self):
        return [w.copy() for w in self.weights] + [b.copy() for b in self.biases]

    def set_params(self, params):
        n = len(self.weights)
        for i in range(n):
            self.weights[i] = params[i].copy()
            self.biases[i] = params[i + n].copy()
# ── PINN Training ────────────────────────────────────────────
net = TinyMLP([1, 20, 20, 1])
# Collocation points
N_col = 100
x_col = np.linspace(0, 2*np.pi, N_col).reshape(-1, 1)
# Training parameters
lr = 0.001
lambda_ic = 50.0
n_epochs = 3000
losses = []
for epoch in range(n_epochs):
    # Forward pass at collocation points
    u, dudx, d2udx2 = net.forward_with_d2(x_col)
    # PDE residual: y'' + y = 0
    residual = d2udx2 + u
    loss_pde = np.mean(residual**2)
    # IC: y(0) = 0, y'(0) = 1
    u0, dudx0, _ = net.forward_with_d2(np.array([[0.0]]))
    loss_ic = (u0[0, 0] - 0.0)**2 + (dudx0[0, 0] - 1.0)**2
    loss = loss_pde + lambda_ic * loss_ic
    losses.append(loss)
    # Numerical gradient update (central differences on each parameter)
    eps_grad = 1e-5
    params = net.get_params()
    grads = []
    for i, p in enumerate(params):
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            # Loss with this parameter nudged up ...
            params_plus = [pp.copy() for pp in params]
            params_plus[i][idx] += eps_grad
            net.set_params(params_plus)
            u_p, _, d2_p = net.forward_with_d2(x_col)
            u0_p, du0_p, _ = net.forward_with_d2(np.array([[0.0]]))
            loss_plus = np.mean((d2_p + u_p)**2) \
                + lambda_ic * ((u0_p[0, 0])**2 + (du0_p[0, 0] - 1.0)**2)
            # ... and nudged down.
            params_minus = [pp.copy() for pp in params]
            params_minus[i][idx] -= eps_grad
            net.set_params(params_minus)
            u_m, _, d2_m = net.forward_with_d2(x_col)
            u0_m, du0_m, _ = net.forward_with_d2(np.array([[0.0]]))
            loss_minus = np.mean((d2_m + u_m)**2) \
                + lambda_ic * ((u0_m[0, 0])**2 + (du0_m[0, 0] - 1.0)**2)
            g[idx] = (loss_plus - loss_minus) / (2 * eps_grad)
        grads.append(g)
    # Gradient descent step
    for i in range(len(params)):
        params[i] -= lr * grads[i]
    net.set_params(params)
    if (epoch + 1) % 500 == 0:
        print(f"Epoch {epoch+1}: loss={loss:.6f}, PDE={loss_pde:.6f}, IC={loss_ic:.6f}")
# ── Evaluation ───────────────────────────────────────────────
x_test = np.linspace(0, 2*np.pi, 200).reshape(-1, 1)
u_pred, _, _ = net.forward_with_d2(x_test)
u_exact = np.sin(x_test)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(x_test, u_exact, 'b-', lw=2, label='Exact: sin(x)')
axes[0].plot(x_test, u_pred, 'r--', lw=2, label='PINN prediction')
axes[0].set_xlabel('x'); axes[0].set_ylabel('y')
axes[0].legend(); axes[0].set_title('PINN Solution for y\'\' + y = 0')
axes[0].grid(True, alpha=0.3)
axes[1].semilogy(losses)
axes[1].set_xlabel('Epoch'); axes[1].set_ylabel('Total Loss')
axes[1].set_title('Training History')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Max absolute error: {np.max(np.abs(u_pred - u_exact)):.4f}")
Note: This toy implementation uses finite-difference gradients for both the PDE derivatives and the parameter updates. Production PINNs use PyTorch or JAX automatic differentiation, which is exact and orders of magnitude faster. See notebooks/pinn_full.ipynb for a full PyTorch implementation solving the Burgers equation.
Exploration ideas:
| MDP Component | PINN Interpretation |
|---|---|
| State | Current network parameters $\theta$, current loss components $(\mathcal{L}_{\text{PDE}}, \mathcal{L}_{\text{BC}}, \mathcal{L}_{\text{IC}})$, training iteration $k$ |
| Action | Choose loss weights $(\lambda_{\text{PDE}}, \lambda_{\text{BC}}, \lambda_{\text{IC}})$; select new collocation points; adjust learning rate |
| Reward | Negative total weighted loss; bonus when PDE residual drops below threshold across the domain |
| Policy | Adaptive loss weighting (e.g., based on gradient magnitudes); residual-based adaptive refinement of collocation points |
Adaptive collocation as active learning. A key design choice in PINNs is where to place collocation points. Uniform random sampling wastes evaluations in smooth regions. A smarter agent samples more collocation points where the PDE residual is large—this is residual-based adaptive refinement (RAR). At each training stage, the agent evaluates the residual on a coarse grid, identifies high-residual regions, and adds new collocation points there. This is directly analogous to active learning: the agent chooses the most informative training examples.
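One RAR round can be sketched in NumPy. Here pde_residual is a hypothetical stand-in for the trained network's residual (a synthetic profile with a sharp feature near $x = 0.7$); in a real PINN it would be computed by the network:

```python
import numpy as np

def pde_residual(x):
    # Hypothetical placeholder for the network's PDE residual; the real
    # quantity would come from automatic differentiation of the model.
    return np.exp(-200 * (x - 0.7) ** 2) + 0.01

rng = np.random.default_rng(0)
x_col = rng.uniform(0.0, 1.0, size=200)  # current collocation set

# 1) Evaluate the residual on a fine candidate grid.
x_fine = np.linspace(0.0, 1.0, 1000)
res = np.abs(pde_residual(x_fine))

# 2) Pick the k candidates with the largest residual ...
k = 10
worst = x_fine[np.argsort(res)[-k:]]

# 3) ... and add them to the collocation set for the next training stage.
x_col = np.concatenate([x_col, worst])
print(worst.min(), worst.max())
```

The selected points cluster around the sharp feature, exactly where the model is currently worst: the active-learning loop in miniature.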
Loss balancing as reward shaping. The weights $\lambda_{\text{PDE}}, \lambda_{\text{BC}}, \lambda_{\text{IC}}$ shape the effective reward landscape. Wang et al. (2021) proposed using the Neural Tangent Kernel (NTK) to set weights that balance the learning rates of different loss components. This is analogous to reward shaping in RL: the raw reward (PDE residual) is supplemented with auxiliary rewards (BC, IC satisfaction) to guide learning.
Curriculum learning. Some problems benefit from a training curriculum: first train with low-frequency collocation (or on an easier sub-domain), then gradually increase difficulty. This parallels curriculum learning in RL, where agents first learn simple tasks before tackling hard ones.
Exercise 1. Write out the PINN loss function for the advection equation $u_t + cu_x = 0$ on $[0,1]\times[0,1]$ with $u(x,0) = \sin(2\pi x)$ and periodic boundary conditions.
$\mathcal{L} = \frac{1}{N_r}\sum_i|u_t(x_i, t_i) + c\,u_x(x_i, t_i)|^2 + \frac{\lambda_0}{N_0}\sum_j|u(x_j, 0) - \sin(2\pi x_j)|^2 + \frac{\lambda_b}{N_b}\sum_k|u(0, t_k) - u(1, t_k)|^2$.
Exercise 2. For a network $u_\theta(x) = w_2\tanh(w_1 x + b_1) + b_2$ with scalar parameters, compute $du_\theta/dx$ and $d^2u_\theta/dx^2$ explicitly in terms of $w_1, w_2, b_1$.
Let $z = w_1 x + b_1$. Then $u = w_2\tanh(z) + b_2$. $du/dx = w_2(1 - \tanh^2(z))w_1 = w_1 w_2 \text{sech}^2(z)$. $d^2u/dx^2 = w_1^2 w_2 \cdot (-2\tanh(z)\text{sech}^2(z)) = -2w_1^2 w_2 \tanh(z)\text{sech}^2(z)$.
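These formulas can be sanity-checked against finite differences (NumPy; the parameter values and evaluation point are arbitrary illustrative choices):

```python
import numpy as np

w1, b1, w2, b2 = 1.3, 0.2, -0.7, 0.5
u = lambda x: w2 * np.tanh(w1 * x + b1) + b2

x, h = 0.4, 1e-5
z = w1 * x + b1
sech2 = 1.0 / np.cosh(z) ** 2

# Analytic derivatives from the solution above.
du_analytic = w1 * w2 * sech2
d2u_analytic = -2.0 * w1**2 * w2 * np.tanh(z) * sech2

# Central-difference approximations.
du_fd = (u(x + h) - u(x - h)) / (2 * h)
d2u_fd = (u(x + h) - 2 * u(x) + u(x - h)) / h**2

print(abs(du_analytic - du_fd), abs(d2u_analytic - d2u_fd))
```

Both differences should be tiny (limited only by the $O(h^2)$ truncation error and floating-point roundoff).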
Exercise 3. Explain why the total PINN error has three components (approximation, optimization, generalization) and give a concrete example of when each dominates.
Approximation error dominates when the network is too small to represent the solution (e.g., 5 neurons trying to approximate a high-frequency solution). Optimization error dominates when the loss landscape has many local minima (common for multi-scale problems). Generalization error dominates when too few collocation points are used (e.g., 10 points for a 2D domain).
Exercise 4. Show that for the 1D Poisson equation $-u'' = f(x)$, $u(0) = u(1) = 0$, the PINN residual loss is $\mathcal{L}_{\text{PDE}} = \frac{1}{N}\sum |u''_\theta(x_i) + f(x_i)|^2$.
The PDE is $-u'' - f = 0$, so the residual is $r(x) = -u''_\theta(x) - f(x)$. $|r|^2 = |u''_\theta(x) + f(x)|^2$. Averaging over collocation points: $\mathcal{L}_{\text{PDE}} = \frac{1}{N}\sum |u''_\theta(x_i) + f(x_i)|^2$.
Exercise 5. Modify the toy PINN code to solve $y'' + 4y = 0$, $y(0) = 1$, $y'(0) = 0$ (solution: $\cos(2x)$). Report the maximum error on $[0, \pi]$.
Exercise 6. Implement a PINN for the exponential decay ODE $y' + 2y = 0$, $y(0) = 3$ (solution: $3e^{-2x}$). Since this is first-order, you only need $u$ and $u'$.
Exercise 7. Add a data-fitting term to the code lab: generate 20 noisy observations of $\sin(x)$ and include them in the loss. Compare the result with and without the data term.
Exercise 8. Implement residual-based adaptive refinement: every 500 epochs, evaluate the PDE residual on a fine grid, identify the 10 points with largest residual, and add them to the collocation set. Does this improve accuracy?
Exercise 9. Design an agent that adaptively adjusts $\lambda_{\text{IC}}$ during training. Start with $\lambda_{\text{IC}} = 100$ and reduce it as the IC loss decreases below a threshold. Compare the final accuracy with fixed $\lambda_{\text{IC}}$.
Exercise 10. Implement a “multi-task PINN agent” that trains one network to solve $y'' + ky = 0$ for $k = 1, 4, 9$ simultaneously (with $k$ as an additional input). How does the agent balance accuracy across the three tasks?
Q1. What are the three components of a typical PINN loss?
PDE residual loss, boundary condition loss, and initial condition loss.
Q2. What are collocation points?
Points in the domain where the PDE residual is evaluated during training. They do not need to lie on a structured mesh.
Q3. How does a PINN compute $\partial^2 u/\partial x^2$ through the network?
By applying automatic differentiation twice: first compute $\partial u/\partial x$, then differentiate that result with respect to $x$ again.
Q4. What distinguishes a forward PINN problem from an inverse PINN problem?
In a forward problem, the PDE and all conditions are known; the PINN finds the solution. In an inverse problem, some parameters are unknown and are learned from observation data alongside the solution.
Q5. Why might a PINN fail to converge even when the loss decreases?
Because one loss component (e.g., PDE residual) may decrease while another (e.g., boundary conditions) increases. The total loss can decrease while the actual solution quality degrades in certain regions.
Q6. What is spectral bias in the context of PINNs?
Neural networks tend to learn low-frequency components of the solution first and struggle with high-frequency features. This can cause slow convergence or poor accuracy for solutions with sharp gradients or rapid oscillations.
Q7. How does an inverse PINN learn an unknown parameter like viscosity?
The parameter is initialized as a trainable scalar alongside the network weights. The loss includes a data-fitting term that compares network predictions with observations. Gradients flow through the PDE residual to update both the network and the parameter.
Q8. Why is loss weighting ($\lambda_{\text{PDE}}, \lambda_{\text{BC}}, \lambda_{\text{IC}}$) important?
Different loss components may have different magnitudes and gradient scales. Without proper weighting, the optimizer may satisfy one constraint at the expense of others, leading to inaccurate solutions.
Q9. Name two advantages of PINNs over finite-difference methods.
1) PINNs are mesh-free, making them easier to apply on irregular domains. 2) PINNs can naturally incorporate sparse observation data and solve inverse problems.
Q10. In the agent lens, what does residual-based adaptive refinement correspond to?
Active learning: the agent identifies the regions where its model is most uncertain (highest PDE residual) and allocates more training data (collocation points) there.
Solve $u_t = 0.01\,u_{xx}$ on $[0,1] \times [0,1]$ with $u(x,0) = \sin(\pi x)$ and $u(0,t) = u(1,t) = 0$.
Deliverables:
| Criterion | Weight | Description |
|---|---|---|
| Correctness | 30% | Both PINN and FD solutions are accurate (relative error below 5%) |
| Comparison | 25% | Meaningful comparison of accuracy, runtime, and ease of implementation |
| Visualization | 20% | Clear heatmaps, error plots, and training curves |
| Analysis | 15% | Discussion of when PINNs are advantageous over classical methods |
| Code quality | 10% | Clean, well-structured, commented code |
For a complete PINN implementation with automatic differentiation solving the Burgers equation (forward and inverse): notebooks/pinn_full.ipynb