Summary

The deep BSDE method works as follows. Given a semilinear parabolic PDE (eq. 1) with terminal condition u(T,x) = g(x), reformulate it as a BSDE via the nonlinear Feynman-Kac formula: the solution u induces processes Y_t = u(t,X_t) and Z_t = sigma^T(t,X_t) grad u(t,X_t) that satisfy the BSDE. Discretize time into N steps (t_0 < t_1 < … < t_N = T). At each time step t_n, approximate sigma^T grad u by a feedforward neural network phi_n(X_{t_n}). Also treat u(0,X_0) and grad u(0,X_0) as learnable parameters. Stack all sub-networks into one deep network and train end-to-end by minimizing the loss l(theta) = E|g(X_{t_N}) - Y_{t_N}|^2, where Y_{t_N} is the terminal value propagated forward through the discretized BSDE dynamics (eq. 5).
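The forward propagation and terminal loss described above can be sketched as follows. This is a toy illustration, not the paper's implementation: the sub-networks phi_n are replaced by fixed linear maps W_n, the generator f is taken to be zero, the forward SDE is assumed driftless with sigma = 1, and g(x) = ||x||^2 is an arbitrary terminal condition.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, T, batch = 4, 10, 1.0, 64   # toy dimensions; the paper uses d = 100
dt = T / N
sigma = 1.0

def g(x):                          # assumed terminal condition g(x) = ||x||^2
    return np.sum(x**2, axis=1)

def f(t, x, y, z):                 # BSDE generator (zero in this toy case)
    return np.zeros(len(x))

# Learnable quantities in the real method; fixed placeholders here.
y0 = 0.5                           # u(0, X_0), a trainable scalar
z0 = np.zeros(d)                   # sigma^T grad u(0, X_0), trainable
W = [rng.normal(scale=0.1, size=(d, d)) for _ in range(N - 1)]  # stand-ins for phi_n

X = np.zeros((batch, d))           # X_0 = 0
Y = np.full(batch, y0)
Z = np.tile(z0, (batch, 1))
for n in range(N):
    dW = rng.normal(scale=np.sqrt(dt), size=(batch, d))
    # Discretized BSDE step (eq. 5): Y advances with the generator and Z^T dW.
    Y = Y - f(n * dt, X, Y, Z) * dt + np.sum(Z * dW, axis=1)
    X = X + sigma * dW             # Euler-Maruyama step for the forward SDE
    if n < N - 1:
        Z = X @ W[n]               # phi_{n+1}(X_{t_{n+1}}) in the real method

loss = np.mean((g(X) - Y) ** 2)    # l(theta) = E|g(X_{t_N}) - Y_{t_N}|^2
print(float(loss))
```

In the actual method, `y0`, `z0`, and the sub-network parameters would be optimized jointly by gradient descent on `loss`.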

Implementation details: Each sub-network has 4 layers: an input layer (d-dimensional), 2 hidden layers (d+10 neurons each), and an output layer (d-dimensional). Activation: ReLU. Batch normalization is applied after each linear transformation and before the activation. Parameters are initialized from a normal or uniform distribution. Optimizer: Adam with default hyperparameters. Batch size: 64. Counting depth across all sub-networks, the stacked network has (H+1)(N-1) layers with free parameters, where H is the number of hidden layers per sub-network.
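One sub-network with these dimensions can be sketched as below. This is an illustrative numpy forward pass only: the batch normalization is a simplified per-batch version without learned scale/shift or running statistics, biases are omitted, and the 0.1 initialization scale is an assumption, not the paper's value.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 100, 110                        # d = 100, hidden width d + 10

def batchnorm(a, eps=1e-6):
    # Simplified batch normalization: normalize each unit over the batch.
    return (a - a.mean(axis=0)) / (a.std(axis=0) + eps)

def relu(a):
    return np.maximum(a, 0.0)

# Parameters drawn from a normal distribution (scale is illustrative).
W1, W2, W3 = (rng.normal(scale=0.1, size=s) for s in [(d, h), (h, h), (h, d)])

def phi(x):
    a = relu(batchnorm(x @ W1))        # hidden layer 1: BN before activation
    a = relu(batchnorm(a @ W2))        # hidden layer 2
    return batchnorm(a @ W3)           # output: approximates sigma^T grad u

x = rng.normal(size=(64, d))           # batch size 64
print(phi(x).shape)                    # (64, 100)
```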

Results on 100-dimensional problems: Nonlinear Black-Scholes with default risk: 0.46% relative error in 1607 seconds (N=40 time steps, lr=0.008). HJB equation: 0.17% relative error in 330 seconds (N=20, lr=0.01). Allen-Cahn equation: 0.30% relative error in 647 seconds (N=20, lr=0.0005). All on a 2013 MacBook Pro. Accuracy improves with more hidden layers (Table 1: 2.29% with 0 hidden layers down to 0.53% with 4 hidden layers for a reaction-diffusion PDE).

Key Contributions

  • First practical deep learning algorithm for high-dimensional nonlinear PDEs
  • BSDE reformulation with gradient as “policy function” (RL analogy)
  • Sub-1% errors in 100 dimensions on financial and physics PDEs
  • Architecture: stacked sub-networks across time steps, trained end-to-end
  • Overcomes curse of dimensionality for parabolic PDEs

Methodology

The connection between PDEs and BSDEs is the nonlinear Feynman-Kac formula: for a semilinear parabolic PDE, the solution u satisfies a BSDE where Y_t = u(t,X_t) and Z_t = sigma^T grad u(t,X_t). The key insight is to approximate Z_t (the gradient, not u itself) by neural networks, since Z appears linearly in the BSDE dynamics. The loss function penalizes only the terminal mismatch, but the BSDE dynamics enforce the PDE structure at all intermediate times through the forward propagation.
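Concretely, under the sign conventions above (Y_t = u(t,X_t), generator f from the semilinear term), one Euler step of the discretized dynamics (eq. 5) reads as follows, where mu and sigma denote the drift and diffusion of the forward SDE (not named explicitly in the summary):

```latex
X_{t_{n+1}} \approx X_{t_n} + \mu(t_n, X_{t_n})\,\Delta t_n + \sigma(t_n, X_{t_n})\,\Delta W_n,
\qquad
Y_{t_{n+1}} \approx Y_{t_n} - f\bigl(t_n, X_{t_n}, Y_{t_n}, Z_{t_n}\bigr)\,\Delta t_n + Z_{t_n}^{\top}\,\Delta W_n,
```

with Delta t_n = t_{n+1} - t_n, Delta W_n = W_{t_{n+1}} - W_{t_n}, and Z_{t_n} = phi_n(X_{t_n}) supplied by the sub-network at step n.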

Key Findings

  • Batch normalization is critical for training stability
  • Learning rate scheduling (larger early, smaller late) improves convergence
  • More hidden layers per sub-network improve accuracy (diminishing returns after 2-3)
  • Residual/skip connections help for complex PDEs
  • Runtime scales roughly linearly in dimension (not exponentially)
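The "larger early, smaller late" learning-rate scheduling mentioned above can be realized as a simple step decay. The boundaries and rates below are illustrative placeholders, not the paper's exact values:

```python
# Hypothetical step-decay schedule: higher learning rate early in training,
# lower rates later. Boundaries/rates are illustrative, not from the paper.
def lr_schedule(step, boundaries=(1000, 2000), rates=(1e-2, 1e-3, 1e-4)):
    for b, r in zip(boundaries, rates):
        if step < b:
            return r
    return rates[-1]

print(lr_schedule(0), lr_schedule(1500), lr_schedule(5000))
# 0.01 0.001 0.0001
```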

Important References

  1. Deep Learning-Based Numerical Methods for High-Dimensional PDEs and BSDEs — E, Han, Jentzen (2017), companion paper with more details
  2. Convergence of the Deep BSDE Method for Coupled FBSDEs — Han, Long (2018), convergence analysis
  3. On Multilevel Picard Numerical Approximations — E, Hutzenthaler, Jentzen, Kruse (2016), alternative high-dim PDE solver
