Module 3.3

Neural Operators and Function Space Learning

Learn the entire solution operator $\mathcal{G}: f \mapsto u$—once trained, get instant predictions for any new input function.

Difficulty: Advanced
Estimated Time: 4–5 hours

Prerequisites

  • Module 3.1: Neural Differential Equations (MLP basics, gradient descent)
  • Module 2.2: Numerical PDE Methods (finite differences, Poisson solvers)
  • Fourier analysis basics (DFT, frequency domain)

Why This Matters

A classical PDE solver computes one solution at a time: change the forcing function, initial condition, or geometry and you must re-solve from scratch. Neural operators learn the entire solution operator—a mapping from input functions to output functions. Once trained, they predict solutions for new inputs in milliseconds, enabling real-time simulation, rapid design optimization, and uncertainty quantification that would be infeasible with traditional solvers.

Two architectures dominate the field. DeepONet (Lu et al. 2021) decomposes the operator into a branch network (encodes the input function) and a trunk network (encodes the query location), combining them via a learned basis expansion. The Fourier Neural Operator (FNO, Li et al. 2021) performs convolution in the spectral domain, achieving resolution invariance and computational efficiency. Both architectures are backed by universal approximation theorems for nonlinear operators.

Learning Objectives

  1. Distinguish between a function $u(x)$ and an operator $\mathcal{G}[f](x) = u(x)$ that maps functions to functions.
  2. Describe the DeepONet architecture: branch network, trunk network, and their combination via inner product.
  3. Describe the FNO architecture: lifting, spectral convolution layers, and projection.
  4. State the universal approximation theorem for operators (Chen & Chen 1995).
  5. Generate training data for operator learning by solving PDEs with multiple input functions.
  6. Train a toy DeepONet in NumPy on the Poisson equation operator.
  7. Explain resolution invariance and why FNOs can generalize across discretizations.
  8. Compare operator learning with pointwise prediction (e.g., PINNs) in terms of amortized cost.
  9. Frame architecture and hyperparameter selection as an agent design problem.

Core Concepts

Definition 3.3.1 (Nonlinear Operator). A nonlinear operator $\mathcal{G}: \mathcal{U} \to \mathcal{V}$ maps functions in a Banach space $\mathcal{U}$ to functions in a Banach space $\mathcal{V}$. Example: the solution operator of $-u'' = f$ maps $f \in L^2([0,1])$ to $u \in H^2([0,1])$.
Definition 3.3.2 (Solution Operator). For a PDE parameterized by input function $a$ (forcing, initial condition, coefficient), the solution operator $\mathcal{G}$ maps $a \mapsto u$ where $u$ is the corresponding PDE solution. For the Poisson equation $-\Delta u = f$ with zero boundary conditions, $\mathcal{G}[f] = u = (-\Delta)^{-1}f$.
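This inverse-operator view is easy to check numerically. The sketch below (illustrative, separate from the course code in the Code Lab) discretizes $-u'' = f$ on $[0,1]$, applies the discrete solution operator $A^{-1}$ to $f(x) = \sin(\pi x)$, and compares against the exact solution $u(x) = \sin(\pi x)/\pi^2$:

```python
import numpy as np

# Discrete solution operator G = A^{-1} for -u'' = f, u(0) = u(1) = 0
# (an illustrative sketch, separate from the course code below).
n = 101
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# Second-order finite-difference matrix for -u'' on the interior points
main = 2.0 * np.ones(n - 2) / dx**2
off = -1.0 * np.ones(n - 3) / dx**2
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

# Apply G to f(x) = sin(pi x); the exact solution is u(x) = sin(pi x)/pi^2
f = np.sin(np.pi * x)
u = np.zeros(n)
u[1:-1] = np.linalg.solve(A, f[1:-1])

err = np.max(np.abs(u - np.sin(np.pi * x) / np.pi**2))
print(f"max error: {err:.1e}")      # O(dx^2) discretization error
```

A neural operator learns exactly this map $f \mapsto A^{-1}f$, but without forming or inverting $A$ at inference time.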
Definition 3.3.3 (DeepONet). A DeepONet approximates $\mathcal{G}[f](x)$ using: $$\mathcal{G}_\theta[f](x) \approx \sum_{k=1}^{p} b_k(f;\theta_b)\,\tau_k(x;\theta_t) + b_0$$ where $b_k$ are outputs of the branch network (an MLP taking $[f(s_1), \ldots, f(s_m)]$ as input, where $s_j$ are fixed sensor locations), $\tau_k$ are outputs of the trunk network (an MLP taking query location $x$ as input), and $p$ is the basis dimension.
Definition 3.3.4 (Fourier Neural Operator). The FNO maps input function $v_0$ to output function $v_L$ through $L$ layers, each performing: $$v_{l+1}(x) = \sigma\!\left(W_l\,v_l(x) + (\mathcal{K}_l v_l)(x)\right)$$ where $W_l$ is a pointwise linear transform and $\mathcal{K}_l$ is a spectral convolution: $$(\mathcal{K}_l v_l)(x) = \mathcal{F}^{-1}\!\left(R_l \cdot \mathcal{F}(v_l)\right)(x)$$ with learnable weight tensor $R_l$ in Fourier space, and $\mathcal{F}$ denoting the discrete Fourier transform.
Definition 3.3.5 (Spectral Convolution Layer). In a spectral convolution layer, the input $v(x)$ (discretized at $n$ points) is transformed to frequency domain via FFT, multiplied by a learnable complex weight matrix $R \in \mathbb{C}^{d_{\text{out}} \times d_{\text{in}} \times k_{\max}}$ for the lowest $k_{\max}$ Fourier modes, then transformed back via inverse FFT. Higher modes are truncated (set to zero). This enables efficient global interaction at $O(n \log n)$ cost.
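A single-channel version of this layer takes only a few lines of NumPy. This is a sketch (the function name and shapes are mine, not a library API); `np.fft.irfft` supplies the conjugate symmetry needed for a real-valued output:

```python
import numpy as np

def spectral_conv(v, R):
    """Single-channel spectral convolution (a sketch of Definition 3.3.5).

    v : real array of length n (the discretized function)
    R : complex array of length k_max (one learnable weight per kept mode)
    Modes above k_max are truncated to zero.
    """
    n, k_max = len(v), len(R)
    v_hat = np.fft.rfft(v)             # n//2 + 1 unique modes of a real signal
    w_hat = np.zeros_like(v_hat)
    w_hat[:k_max] = R * v_hat[:k_max]  # multiply kept modes, zero the rest
    return np.fft.irfft(w_hat, n)      # back to physical space, O(n log n)

rng = np.random.default_rng(0)
v = np.sin(2 * np.pi * np.arange(64) / 64)
R = rng.standard_normal(12) + 1j * rng.standard_normal(12)
w = spectral_conv(v, R)
print(w.shape)                         # (64,)
```

Note that `R` has no reference to the grid size `n`; the same weights can be applied to a discretization of any resolution.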
Definition 3.3.6 (Resolution Invariance). An operator learner is resolution invariant if a model trained on functions discretized at resolution $n_1$ can be evaluated at a different resolution $n_2$ without retraining. FNOs achieve this because the spectral convolution weights $R_l$ parameterize a finite set of Fourier modes independent of the spatial discretization. DeepONets are resolution invariant on the output side by design, since the trunk network takes continuous query coordinates; the branch input, however, remains tied to the fixed sensor locations.
Theorem 3.3.1 (Universal Approximation for Operators, Chen & Chen 1995). Let $\mathcal{G}: K \to C(\Omega)$ be a continuous nonlinear operator where $K$ is a compact subset of $C([0,1])$. For any $\varepsilon > 0$, there exist continuous functions $g_k, \tau_k$ and points $s_1, \ldots, s_m$ such that: $$\sup_{f \in K}\sup_{x \in \Omega}\left|\mathcal{G}[f](x) - \sum_{k=1}^{p} g_k(f(s_1),\ldots,f(s_m))\,\tau_k(x)\right| < \varepsilon$$ This is the theoretical foundation of DeepONet: the branch-trunk decomposition can approximate any continuous operator to arbitrary accuracy.
Theorem 3.3.2 (DeepONet Convergence Rates). Under Sobolev regularity assumptions on the input and output spaces, DeepONet achieves approximation rates that depend on the smoothness of the operator and the number of basis functions $p$. For operators of smoothness $s$, the approximation error scales as $O(p^{-s/d})$ where $d$ is the input dimension (Lanthaler et al. 2022).
Theorem 3.3.3 (FNO Approximation in Sobolev Spaces). The Fourier Neural Operator can approximate any continuous operator $\mathcal{G}: H^s(\Omega) \to H^r(\Omega)$ (Sobolev spaces) to arbitrary accuracy, provided sufficiently many Fourier modes $k_{\max}$ and layers $L$ are used (Kovachki et al. 2023). The truncation of high Fourier modes introduces an aliasing error bounded by the Sobolev decay of the solution.

Common Misconceptions

Misconception 1: “Neural operators are just big neural networks.” Neural operators have fundamentally different input/output: they take functions (discretized as vectors) and produce functions. A standard MLP maps fixed-size vectors. The architectures (branch-trunk decomposition, spectral convolution) are specifically designed for the function-to-function setting.
Misconception 2: “FNO only works for periodic domains.” While the FFT assumes periodicity, FNOs can handle non-periodic problems by zero-padding, using a larger computational domain, or switching to other spectral bases (Chebyshev, wavelet). In practice, FNOs work well on non-periodic problems with appropriate preprocessing.
Misconception 3: “More Fourier modes always means better accuracy.” Keeping too many modes increases model size and the risk of overfitting. For smooth solutions, a small number of modes (12–20) captures most of the energy. The optimal $k_{\max}$ depends on the solution regularity.
Misconception 4: “Neural operators don’t generalize to new resolutions.” Both DeepONet and FNO are resolution-invariant by design. The trunk network and spectral weights parameterize continuous operators, not discrete grids. Empirically, FNOs trained at 64 points can predict accurately at 256 points.
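The FNO side of this claim can be checked directly: because the spectral weights live in mode space, the same $k_{\max}$ weights applied on a 64-point grid and a 256-point grid produce the same output function. A small illustrative sketch (all names are mine):

```python
import numpy as np

def spectral_layer(v, R):
    """Apply k_max spectral weights R to v (grid-size agnostic)."""
    v_hat = np.fft.rfft(v)
    w_hat = np.zeros_like(v_hat)
    w_hat[:len(R)] = R * v_hat[:len(R)]
    return np.fft.irfft(w_hat, len(v))

rng = np.random.default_rng(1)
R = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # "trained" weights

def sample(n):
    """A smooth, band-limited test function on [0, 1)."""
    x = np.arange(n) / n
    return np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)

w64 = spectral_layer(sample(64), R)    # coarse grid
w256 = spectral_layer(sample(256), R)  # fine grid, same weights

# Every 4th point of the 256-grid coincides with the 64-grid
print(np.max(np.abs(w256[::4] - w64)))  # agrees to floating-point roundoff
```

Exercise 8 runs the analogous resolution-transfer experiment for the trained DeepONet.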
Misconception 5: “Training data requires expensive PDE simulations only.” Training data can come from analytical solutions, cheap low-fidelity simulations (multi-fidelity learning), experimental measurements, or combinations thereof. Physics-informed operator learning (PINO) adds PDE residual losses to reduce data requirements.

Worked Examples

Example 1: DeepONet for the Antiderivative Operator

Learn the operator $\mathcal{G}[f](x) = \int_0^x f(s)\,ds$.

Setup: Sample $f$ as random sine series: $f(s) = \sum_{k=1}^{3} a_k \sin(k\pi s)$ with $a_k \sim \mathcal{N}(0,1)$. Compute $F(x) = \int_0^x f(s)\,ds$ analytically.

Branch network: Input: $[f(s_1), \ldots, f(s_{50})]$ at 50 equally-spaced sensor points $s_j \in [0,1]$. Architecture: 50 → 40 → $p$.

Trunk network: Input: query location $x$. Architecture: 1 → 40 → $p$.

Forward pass for one input: Let $f(s) = 2\sin(\pi s) - \sin(2\pi s)$. Evaluate at sensors to get a 50-dimensional vector. Pass through branch to get $\mathbf{b} = [b_1, \ldots, b_p]$. For query $x = 0.5$, pass through trunk to get $\boldsymbol{\tau} = [\tau_1, \ldots, \tau_p]$. Output: $\hat{F}(0.5) = \sum_{k} b_k \tau_k + b_0$.

Training: Generate 1000 random $f$’s. MSE loss between predicted and true antiderivatives at random query points. After training, test on unseen $f$’s: relative $L^2$ error $\sim 10^{-3}$.
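The data generation for this example can be done fully analytically, with no numerical solver. A sketch with illustrative names (here the antiderivative is evaluated at the sensor grid, though any query points work):

```python
import numpy as np

def make_antiderivative_data(n_samples, n_sensors=50, seed=0):
    """Random sine-series inputs f and their exact antiderivatives F."""
    rng = np.random.default_rng(seed)
    s = np.linspace(0.0, 1.0, n_sensors)       # sensor locations s_j
    k = np.arange(1, 4)                        # modes k = 1..3
    a = rng.standard_normal((n_samples, 3))    # coefficients a_k ~ N(0, 1)
    # f(s) = sum_k a_k sin(k pi s)
    f = a @ np.sin(np.outer(k * np.pi, s))
    # F(x) = int_0^x f(s) ds = sum_k a_k (1 - cos(k pi x)) / (k pi)
    F = a @ ((1 - np.cos(np.outer(k * np.pi, s))) / (k * np.pi)[:, None])
    return s, f, F

s, f_data, F_data = make_antiderivative_data(1000)
print(f_data.shape, F_data.shape)   # (1000, 50) (1000, 50)
```

Each row of `f_data` is one branch input, and the matching row of `F_data` supplies training targets at the query points.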

Example 2: FNO Layer Computation (Concrete Numbers)

Illustrate one spectral convolution on a function discretized at $n = 8$ points.

Input: $v = [1, 2, 3, 4, 3, 2, 1, 0]$ (single channel).

Step 1: FFT. $\hat{v} = \text{FFT}(v) = [16,\; -4.83-4.83i,\; 0,\; 0.83-0.83i,\; 0,\; 0.83+0.83i,\; 0,\; -4.83+4.83i]$ (5 unique modes, since the spectrum of a real signal is conjugate-symmetric).

Step 2: Truncate. Keep $k_{\max} = 3$ modes: $\hat{v}_{\text{trunc}} = [16,\; -4.83-4.83i,\; 0,\; 0, 0, 0, 0, 0]$.

Step 3: Multiply. For weight $R = [1.5,\; 0.8+0.1i,\; -0.3+0.5i]$ (one per kept mode): $\hat{w}_0 = 1.5 \times 16 = 24$, $\hat{w}_1 = (0.8+0.1i)(-4.83-4.83i) = -3.38-4.35i$, $\hat{w}_2 = (-0.3+0.5i) \times 0 = 0$. (This particular input has no energy in mode 2, so $R_2$ has no effect here.)

Step 4: IFFT. Restore conjugate symmetry (mirror $\hat{w}_1$ into mode 7 as its conjugate, or equivalently use the inverse real FFT) and compute $w = \text{IFFT}(\hat{w})$ to get a real-valued output in physical space. The result captures the low-frequency structure of $v$ transformed by the learned weights.
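These steps can be reproduced in a few lines of NumPy (a sketch; `irfft` is used in the final step so the truncated spectrum stays conjugate-symmetric and the output is real, as in practical FNO implementations):

```python
import numpy as np

v = np.array([1, 2, 3, 4, 3, 2, 1, 0], dtype=float)
v_hat = np.fft.rfft(v)                  # the 5 unique modes for n = 8
print(np.round(v_hat, 2))               # modes: 16, -4.83-4.83j, 0, 0.83-0.83j, 0

k_max = 3
R = np.array([1.5, 0.8 + 0.1j, -0.3 + 0.5j])
w_hat = np.zeros_like(v_hat)
w_hat[:k_max] = R * v_hat[:k_max]       # truncate and multiply kept modes
w = np.fft.irfft(w_hat, len(v))         # real-valued output in physical space
print(np.round(w, 3))
```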

Example 3: Poisson Operator $-u'' = f$

Learn $\mathcal{G}: f \mapsto u$ where $-u''(x) = f(x)$, $u(0) = u(1) = 0$.

Data generation: For each of 1000 training samples, draw $f$ as a random Fourier series with 1–5 modes. Solve using a tridiagonal finite-difference solver (see Code Lab below).

Training: Branch input: $f$ at 64 sensor points. Trunk input: query $x$. Loss: $\frac{1}{N}\sum_{j}\|u_{\text{pred}}(x_j) - u_{\text{true}}(x_j)\|^2$ averaged over samples and query points.

After 5000 epochs: Mean relative test error $\approx 5 \times 10^{-3}$. The trained model predicts the solution for a new $f$ in a single forward pass ($\sim$1 ms) versus $\sim$10 ms for the tridiagonal solve—a 10x speedup that grows with problem dimension.

Interactive Code Lab

Toy DeepONet in NumPy

Build and train a minimal DeepONet to learn the Poisson operator $-u'' = f$, $u(0) = u(1) = 0$.

For full FNO and DeepONet implementations in PyTorch, see notebooks/neural_operator_full.ipynb.


import numpy as np
import matplotlib.pyplot as plt

# ── 1. Data generation: solve -u'' = f with u(0)=u(1)=0 ─────
def solve_poisson(f_vals, x_grid):
    """Solve -u''=f via tridiagonal finite differences."""
    n = len(x_grid)
    dx = x_grid[1] - x_grid[0]
    # Interior points only (exclude boundaries)
    A_diag = 2.0 * np.ones(n - 2)
    A_off = -1.0 * np.ones(n - 3)
    # Solve tridiagonal system: (1/dx^2) * A * u_int = f_int
    rhs = dx**2 * f_vals[1:-1]
    # Thomas algorithm
    n_int = n - 2
    c = np.zeros(n_int); d = np.zeros(n_int)
    c[0] = A_off[0] / A_diag[0] if n_int > 1 else 0
    d[0] = rhs[0] / A_diag[0]
    for i in range(1, n_int):
        m = A_diag[i] - A_off[i-1] * c[i-1]
        # Last row has no super-diagonal entry
        c[i] = A_off[i] / m if i < n_int - 1 else 0
        d[i] = (rhs[i] - A_off[i-1] * d[i-1]) / m
    u_int = np.zeros(n_int)
    u_int[-1] = d[-1]
    for i in range(n_int - 2, -1, -1):
        u_int[i] = d[i] - c[i] * u_int[i+1]
    u = np.zeros(n)
    u[1:-1] = u_int
    return u

n_points = 32
x_grid = np.linspace(0, 1, n_points)

def random_forcing(n_samples):
    """Random Fourier forcing functions."""
    F = np.zeros((n_samples, n_points))
    for i in range(n_samples):
        n_modes = np.random.randint(1, 5)
        for k in range(1, n_modes + 1):
            a = np.random.randn() * 2.0
            F[i] += a * np.sin(k * np.pi * x_grid)
    return F

N_train, N_test = 500, 100
f_train = random_forcing(N_train)
u_train = np.array([solve_poisson(f, x_grid) for f in f_train])
f_test = random_forcing(N_test)
u_test = np.array([solve_poisson(f, x_grid) for f in f_test])

# ── 2. Tiny DeepONet in NumPy ────────────────────────────────
class TinyDeepONet:
    def __init__(self, n_sensors, trunk_in, hidden, p, seed=42):
        rng = np.random.RandomState(seed)
        sc = lambda m, n: rng.randn(m, n) * np.sqrt(2.0/(m+n))
        # Branch: n_sensors -> hidden -> p
        self.Wb1 = sc(n_sensors, hidden)
        self.bb1 = np.zeros(hidden)
        self.Wb2 = sc(hidden, p)
        self.bb2 = np.zeros(p)
        # Trunk: trunk_in -> hidden -> p
        self.Wt1 = sc(trunk_in, hidden)
        self.bt1 = np.zeros(hidden)
        self.Wt2 = sc(hidden, p)
        self.bt2 = np.zeros(p)
        # Bias
        self.b0 = np.zeros(1)
        self.params = [self.Wb1, self.bb1, self.Wb2, self.bb2,
                       self.Wt1, self.bt1, self.Wt2, self.bt2, self.b0]

    def branch(self, f_sensors):
        """f_sensors: (batch, n_sensors) -> (batch, p)"""
        h = np.tanh(f_sensors @ self.Wb1 + self.bb1)
        return h @ self.Wb2 + self.bb2

    def trunk(self, x_query):
        """x_query: (n_query, 1) -> (n_query, p)"""
        h = np.tanh(x_query @ self.Wt1 + self.bt1)
        return h @ self.Wt2 + self.bt2

    def forward(self, f_sensors, x_query):
        """Output: (batch, n_query)"""
        b = self.branch(f_sensors)   # (batch, p)
        t = self.trunk(x_query)      # (n_query, p)
        return b @ t.T + self.b0     # (batch, n_query)

# ── 3. Training ──────────────────────────────────────────────
p = 20  # basis dimension
net = TinyDeepONet(n_points, 1, 32, p)
x_query = x_grid.reshape(-1, 1)  # (n_points, 1)

lr = 0.001
batch_size = 32
n_epochs = 2000
losses = []

for epoch in range(n_epochs):
    idx = np.random.choice(N_train, batch_size, replace=False)
    f_batch = f_train[idx]    # (batch, n_points)
    u_batch = u_train[idx]    # (batch, n_points)

    # Forward
    u_pred = net.forward(f_batch, x_query)  # (batch, n_points)
    loss = np.mean((u_pred - u_batch)**2)
    losses.append(loss)

    # Numerical gradient update (parameter perturbation)
    eps = 1e-5
    for pi in range(len(net.params)):
        grad = np.zeros_like(net.params[pi])
        flat = net.params[pi].ravel()
        grad_flat = grad.ravel()
        # Sample subset of parameters to update (for speed)
        n_param = len(flat)
        sample_idx = np.random.choice(n_param, min(50, n_param), replace=False)
        for j in sample_idx:
            old_val = flat[j]
            flat[j] = old_val + eps
            net.params[pi] = flat.reshape(net.params[pi].shape)
            u_p = net.forward(f_batch, x_query)
            loss_p = np.mean((u_p - u_batch)**2)
            flat[j] = old_val - eps
            net.params[pi] = flat.reshape(net.params[pi].shape)
            u_m = net.forward(f_batch, x_query)
            loss_m = np.mean((u_m - u_batch)**2)
            grad_flat[j] = (loss_p - loss_m) / (2 * eps)
            flat[j] = old_val
        net.params[pi] = flat.reshape(net.params[pi].shape)
        net.params[pi] -= lr * grad.reshape(net.params[pi].shape)

    # Keep references in sync
    (net.Wb1, net.bb1, net.Wb2, net.bb2,
     net.Wt1, net.bt1, net.Wt2, net.bt2, net.b0) = net.params

    if (epoch + 1) % 500 == 0:
        print(f"Epoch {epoch+1}: MSE = {loss:.6f}")

# ── 4. Evaluation ────────────────────────────────────────────
u_pred_test = net.forward(f_test, x_query)
errors = np.linalg.norm(u_pred_test - u_test, axis=1) / (np.linalg.norm(u_test, axis=1) + 1e-10)
print(f"\nMean relative L2 error: {errors.mean():.4f}")
print(f"Max relative L2 error:  {errors.max():.4f}")

fig, axes = plt.subplots(2, 3, figsize=(14, 8))
for col in range(3):
    i = np.random.randint(N_test)
    axes[0, col].plot(x_grid, f_test[i], 'k-', lw=1.5)
    axes[0, col].set_title(f'Forcing f(x) #{i}')
    axes[0, col].set_xlabel('x'); axes[0, col].grid(True, alpha=0.3)
    axes[1, col].plot(x_grid, u_test[i], 'b-', lw=2, label='True')
    axes[1, col].plot(x_grid, u_pred_test[i], 'r--', lw=2, label='DeepONet')
    axes[1, col].set_title(f'Solution, err={errors[i]:.3e}')
    axes[1, col].set_xlabel('x'); axes[1, col].legend()
    axes[1, col].grid(True, alpha=0.3)
plt.suptitle('DeepONet: Poisson Operator Learning', fontsize=14)
plt.tight_layout()
plt.show()

Note: This toy implementation uses numerical gradients and trains slowly. Production implementations use PyTorch autograd for exact, efficient gradients. See notebooks/neural_operator_full.ipynb for full DeepONet and FNO implementations.

Exploration ideas:

  • Increase the basis dimension $p$ from 20 to 50 and compare accuracy.
  • Train on forcing functions with 1–3 modes, then test on 5-mode functions. Does the operator generalize?
  • Change the PDE to $-u'' + u = f$ (Helmholtz) and retrain. How does the operator change?
  • Measure the inference time per sample and compare with the tridiagonal solve time.

Agent Lens: The Operator-Learning Agent

| MDP Component | Operator-Learning Interpretation |
| --- | --- |
| State | Current model parameters, training/validation loss, library of PDE families already learned |
| Action | Choose architecture (DeepONet vs FNO vs hybrid), select number of Fourier modes $k_{\max}$, choose sensor locations, decide training distribution of input functions |
| Reward | Low relative prediction error on unseen inputs + fast inference time + generalization to new resolutions |
| Policy | Meta-learning across PDE families; active data acquisition (run expensive solver only for most informative inputs) |

Active data acquisition. Generating training data requires running a classical PDE solver, which is expensive. A smart agent chooses which input functions to solve next based on where the current model is most uncertain. This reduces the total number of expensive simulations needed. Techniques from Bayesian optimization and active learning apply directly.
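A minimal version of this loop can be sketched with cheap stand-ins (illustrative only: a linear least-squares surrogate replaces the DeepONet, and an antiderivative replaces the expensive PDE solve):

```python
import numpy as np

# Active data acquisition by ensemble disagreement (illustrative sketch).
rng = np.random.default_rng(0)
n_pts = 32
pool = rng.standard_normal((200, n_pts))        # candidate forcing functions

def solve(f):
    """Stand-in for the expensive PDE solver."""
    return np.cumsum(f) / n_pts

labeled = list(range(5))                        # small initial training set
for step in range(10):
    F = pool[labeled]
    U = np.array([solve(f) for f in F])
    # Ensemble of 5 surrogates fit on bootstrap resamples of the labeled set
    preds = []
    for _ in range(5):
        idx = rng.integers(0, len(F), len(F))
        W, *_ = np.linalg.lstsq(F[idx], U[idx], rcond=None)
        preds.append(pool @ W)
    disagreement = np.var(np.stack(preds), axis=0).mean(axis=1)
    disagreement[labeled] = -np.inf             # never re-acquire a sample
    labeled.append(int(np.argmax(disagreement)))  # most uncertain candidate

print(len(labeled))                             # 15 labeled samples after 10 steps
```

Exercise 9 below asks for exactly this loop with the real DeepONet and a comparison against random acquisition.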

Architecture search as agent design. Choosing between DeepONet and FNO, selecting $k_{\max}$, choosing network width/depth, and setting the basis dimension $p$ are all design decisions. A meta-learning agent could try different configurations on small-scale problems and transfer the best architecture to the full-scale problem.

Multi-fidelity learning. An agent can combine cheap low-resolution data with expensive high-resolution data, learning to correct the low-fidelity predictions rather than learning from scratch. This is analogous to curriculum learning: start with easy (low-res) tasks, then refine with hard (high-res) tasks.

Exercises

Analytical

Exercise 1. Write out the DeepONet prediction formula for the antiderivative operator $\mathcal{G}[f](x) = \int_0^x f(s)\,ds$ with $p = 3$ basis functions, 4 sensor points, and query point $x = 0.7$.

Solution

$\hat{F}(0.7) = b_1(f(s_1), f(s_2), f(s_3), f(s_4))\tau_1(0.7) + b_2(\cdots)\tau_2(0.7) + b_3(\cdots)\tau_3(0.7) + b_0$, where $b_k$ are branch network outputs and $\tau_k$ are trunk network outputs.

Exercise 2. Explain why the FNO spectral convolution is more efficient than a pointwise convolution in physical space for long-range interactions.

Solution

In physical space, capturing interactions between points separated by distance $d$ requires a convolution kernel of width at least $d$, costing $O(d)$ per output point and $O(n \cdot d)$ in total. In frequency space, each retained Fourier mode is global in space, so a pointwise multiply of the kept modes couples all locations at once, at cost $O(k_{\max} \cdot d_{\text{in}} \cdot d_{\text{out}})$. The FFT costs $O(n \log n)$, making the total $O(n \log n + k_{\max} d_{\text{in}} d_{\text{out}})$, which is cheaper when $k_{\max} \ll n$.

Exercise 3. State the conditions under which Chen & Chen’s universal approximation theorem applies. Does it require the operator to be linear?

Solution

The theorem requires: (1) the operator $\mathcal{G}$ is continuous, (2) the input set $K$ is compact in $C([0,1])$, (3) the output is in $C(\Omega)$. Linearity is not required—the theorem applies to nonlinear operators.

Exercise 4. For a 1D FNO with $k_{\max} = 5$ Fourier modes on a grid of $n = 64$ points, how many learnable parameters does one spectral convolution layer have (for single input/output channel)?

Solution

Each of the $k_{\max} = 5$ modes has one complex weight, so $5 \times 2 = 10$ real parameters. The pointwise linear transform $W$ adds 1 weight. Total: 11 parameters per layer, independent of the grid size $n = 64$ (this is the source of resolution invariance). For $d_{\text{in}} = d_{\text{out}} = d$ channels: $5 \times d \times d \times 2$ spectral parameters, plus $d \times d$ weights (and $d$ biases, if used) for $W$.

Computational

Exercise 5. Modify the code lab to learn the antiderivative operator $\mathcal{G}[f](x) = \int_0^x f(s)\,ds$ instead of the Poisson operator. Generate data analytically and compare training convergence.

Exercise 6. Implement a 1D spectral convolution layer in NumPy: take a vector of length 32, apply FFT, multiply the first 8 modes by random weights, apply IFFT, and plot the result.

Exercise 7. Train the toy DeepONet on forcing functions with amplitudes scaled to $[-1, 1]$, then test on forcing functions with amplitudes in $[-3, 3]$. How well does the operator extrapolate?

Exercise 8. Implement a simple resolution-transfer experiment: train the DeepONet on $n = 32$ point grids, then evaluate at $n = 64$ by querying the trunk network at 64 locations. Report the error compared to the 64-point FD solution.

Agentic

Exercise 9. Design an active data acquisition agent: maintain a pool of candidate forcing functions, train the DeepONet on a small initial set, then iteratively select the candidate with highest predicted uncertainty (approximated as ensemble disagreement) and add its solution to the training set. Compare with random acquisition.

Exercise 10. Implement a model selection agent that trains both a DeepONet and a simple fully-connected network on the same data and automatically selects the architecture with lower validation error. Run on 3 different PDE families and report which architecture wins each time.

Assessment

Quiz

Q1. What is the difference between a function and an operator?

Answer

A function maps numbers (or vectors) to numbers. An operator maps functions to functions. The Poisson solution operator maps a forcing function $f$ to the solution function $u$.

Q2. What do the branch and trunk networks in DeepONet represent?

Answer

The branch network encodes the input function (evaluated at sensor points) into a coefficient vector. The trunk network encodes the query location into a basis vector. Their inner product gives the operator output at the query point.

Q3. How does the FNO achieve global spatial interaction efficiently?

Answer

By performing convolution in the Fourier domain. The FFT transforms the input to frequency space at $O(n \log n)$ cost, a pointwise multiply captures global interactions, and the inverse FFT returns to physical space.

Q4. What does resolution invariance mean for a neural operator?

Answer

A model trained on one grid resolution can be evaluated at a different resolution without retraining. FNOs achieve this because spectral weights are defined in mode space (independent of grid size), and DeepONets because the trunk network takes continuous coordinates.

Q5. What is the role of $k_{\max}$ in the FNO?

Answer

$k_{\max}$ is the number of Fourier modes retained in the spectral convolution. It controls the frequency resolution: higher $k_{\max}$ captures finer spatial features but increases model size and overfitting risk.

Q6. Why does the universal approximation theorem for operators (Chen & Chen 1995) not guarantee practical accuracy?

Answer

The theorem is existential: it guarantees approximation is possible with enough basis functions but says nothing about how many are needed, how to find the right parameters, or how much training data is required. In practice, optimization difficulties and finite data limit accuracy.

Q7. What is the main advantage of neural operators over PINNs?

Answer

Neural operators learn the entire solution map and amortize the cost: training is expensive but inference for any new input is nearly instant. PINNs must retrain (or re-optimize) for each new set of conditions.

Q8. How is training data generated for operator learning?

Answer

By sampling many input functions from a distribution and solving the PDE for each using a classical solver. The input-output pairs $(f_i, u_i)$ form the training set.

Q9. What is the computational cost of one FNO forward pass on a 1D grid of $n$ points?

Answer

$O(L \cdot (n \log n + k_{\max} d^2))$ where $L$ is the number of layers, $d$ is the channel width, and $k_{\max}$ is the number of Fourier modes. The $n \log n$ term comes from FFT/IFFT per layer.

Q10. In the agent lens, what does “active data acquisition” mean?

Answer

Choosing which input functions to simulate next based on where the current model is most uncertain, rather than sampling randomly. This reduces the total number of expensive simulations needed to achieve a target accuracy.

Mini-Project: Build an Antiderivative Operator

Train a DeepONet in NumPy to learn $\mathcal{G}[f](x) = \int_0^x f(s)\,ds$.

  1. Generate training data: 500 random polynomial inputs $f(s) = \sum_{k=0}^{4} a_k s^k$ with $a_k \sim \mathcal{N}(0,1)$
  2. Compute antiderivatives analytically
  3. Train a DeepONet with $p = 30$ basis functions
  4. Test on 100 unseen polynomial inputs and 50 unseen sinusoidal inputs (out-of-distribution)
  5. Report in-distribution and out-of-distribution errors
| Criterion | Weight | Description |
| --- | --- | --- |
| Correctness | 30% | In-distribution relative error below 5% |
| Generalization | 25% | Analysis of out-of-distribution performance with discussion |
| Visualization | 20% | Clear plots of predictions, errors, and training curves |
| Analysis | 15% | Discussion of basis dimension, sensor placement, and training distribution effects |
| Code quality | 10% | Modular, well-commented code |

References & Next Steps

References

  1. Lu, L., Jin, P., Pang, G., Zhang, Z., & Karniadakis, G. E. (2021). “Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators.” Nature Machine Intelligence, 3, 218–229.
  2. Li, Z., Kovachki, N., Azizzadenesheli, K., Liu, B., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2021). “Fourier neural operator for parametric partial differential equations.” ICLR 2021.
  3. Chen, T. & Chen, H. (1995). “Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems.” IEEE Transactions on Neural Networks, 6(4), 911–917.
  4. Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., & Anandkumar, A. (2023). “Neural operator: Learning maps between function spaces with applications to PDEs.” JMLR, 24(89), 1–97.
  5. Lanthaler, S., Mishra, S., & Karniadakis, G. E. (2022). “Error estimates for DeepONets: A deep learning framework in infinite dimensions.” Transactions of Mathematics and Its Applications, 6(1).

Full PyTorch Notebook

For complete DeepONet and FNO implementations: notebooks/neural_operator_full.ipynb

Next Steps