Abstract
Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the “curse of dimensionality.” This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black-Scholes equation, the Hamilton-Jacobi-Bellman equation, and the Allen-Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost.
Summary
The deep BSDE method works as follows. Given a semilinear parabolic PDE (eq. 1) with terminal condition u(T,x) = g(x), reformulate it as a BSDE via the nonlinear Feynman-Kac formula: the solution satisfies Y_t = u(t,X_t) with Z_t = sigma^T(t,X_t) grad u(t,X_t). Discretize time into N steps (t_0 < t_1 < … < t_N = T). At each intermediate time step t_n (n = 1, …, N-1), approximate sigma^T grad u by a feedforward neural network phi_n(X_{t_n}). Treat u(0,X_0) and grad u(0,X_0) as learnable parameters as well. Stack all sub-networks into one deep network and train it end-to-end by minimizing the loss l(theta) = E|g(X_{t_N}) - Y_{t_N}|^2, where Y_{t_N} is the terminal value propagated forward through the discretized BSDE dynamics (eq. 5).
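The pipeline above can be sketched in a few dozen lines. This is a toy, untrained illustration (not the paper's implementation): we take mu = 0, sigma = I (so X is a Brownian motion), f = 0, and terminal condition g(x) = ||x||^2, and stand in random MLPs for the sub-networks phi_n; all function and variable names are ours.

```python
# Sketch of the deep BSDE forward pass and terminal loss, under the toy
# assumptions stated above (mu = 0, sigma = I, f = 0, g(x) = ||x||^2).
import numpy as np

rng = np.random.default_rng(0)
d, N, T, batch = 5, 10, 1.0, 64
dt = T / N

def make_subnet():
    """One sub-network: d -> d+10 -> d+10 -> d, ReLU on hidden layers."""
    sizes = [d, d + 10, d + 10, d]
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def subnet_forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:        # ReLU on hidden layers only
            x = np.maximum(x, 0.0)
    return x

# Learnable quantities at t_0 (here just fixed initial guesses).
y0 = 1.0                               # approximates u(0, X_0)
z0 = rng.standard_normal(d) * 0.1      # approximates sigma^T grad u(0, X_0)
subnets = [make_subnet() for _ in range(N - 1)]

def g(x):                              # terminal condition g(x) = ||x||^2
    return np.sum(x ** 2, axis=1)

def forward_loss():
    X = np.zeros((batch, d))
    Y = np.full(batch, y0)
    Z = np.tile(z0, (batch, 1))
    for n in range(N):
        dW = rng.standard_normal((batch, d)) * np.sqrt(dt)
        # BSDE Euler step with f = 0: Y_{n+1} = Y_n + <Z_n, dW_n>
        Y = Y + np.sum(Z * dW, axis=1)
        X = X + dW                     # X_{n+1} = X_n + dW_n (sigma = I)
        if n < N - 1:
            Z = subnet_forward(subnets[n], X)
    return np.mean((g(X) - Y) ** 2)    # l(theta) = E|g(X_T) - Y_T|^2

loss = forward_loss()
```

Training would then backpropagate this loss through all N-1 sub-networks plus the parameters y0 and z0 at once; the sketch stops at the loss evaluation.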
Implementation details: Each sub-network has 4 layers: 1 input layer (d-dimensional), 2 hidden layers (d+10 neurons each), and 1 output layer (d-dimensional). Activation: ReLU. Batch normalization is applied after each linear transformation and before each activation. Parameters are initialized from a normal or uniform distribution, without pre-training. Optimizer: Adam with default hyperparameters. Batch size: 64. The whole stacked network has (H+1)(N-1) layers with free parameters, where H is the number of hidden layers per sub-network.
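The stated sizes are easy to sanity-check. The helper below (illustrative, not from the paper) counts the linear weights and biases of one d -> (d+10) -> (d+10) -> d sub-network, ignoring batch-normalization parameters, and the stacked layer count (H+1)(N-1).

```python
# Illustrative size check for the architecture described above
# (linear-layer parameters only; batch-norm parameters are ignored).
def subnet_param_count(d, num_hidden=2):
    w = d + 10                                  # hidden width used in the paper
    sizes = [d] + [w] * num_hidden + [d]
    # weights (m * n) plus biases (n) for each linear transformation
    return sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))

def stacked_layer_count(N, num_hidden=2):
    return (num_hidden + 1) * (N - 1)

print(subnet_param_count(100))     # 34420 for d = 100
print(stacked_layer_count(40))     # 117 layers for N = 40, H = 2
```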
Results on 100-dimensional problems: Nonlinear Black-Scholes with default risk: 0.46% relative error in 1607 seconds (N=40 time steps, lr=0.008). HJB equation: 0.17% relative error in 330 seconds (N=20, lr=0.01). Allen-Cahn equation: 0.30% relative error in 647 seconds (N=20, lr=0.0005). All on a 2013 MacBook Pro. Accuracy improves with more hidden layers (Table 1: 2.29% with 0 hidden layers down to 0.53% with 4 hidden layers for a reaction-diffusion PDE).
Key Contributions
- First practical deep learning algorithm for high-dimensional nonlinear PDEs
- BSDE reformulation with gradient as “policy function” (RL analogy)
- Sub-1% errors in 100 dimensions on financial and physics PDEs
- Architecture: stacked sub-networks across time steps, trained end-to-end
- Overcomes curse of dimensionality for parabolic PDEs
Methodology
The connection between PDEs and BSDEs is the nonlinear Feynman-Kac formula: for a semilinear parabolic PDE, the solution u satisfies a BSDE where Y_t = u(t,X_t) and Z_t = sigma^T grad u(t,X_t). The key insight is to approximate Z_t (the gradient, not u itself) by neural networks, since Z appears linearly in the BSDE dynamics. The loss function only penalizes the terminal mismatch, but the BSDE dynamics enforce the PDE structure at all intermediate times through the forward propagation.
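The forward propagation described above reduces to a one-line Euler update per time step: Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) * dt + <Z_n, dW_n>, which is where Z (and hence each sub-network) enters linearly. A minimal sketch, with argument names of our choosing:

```python
# One Euler step of the discretized BSDE dynamics (eq. 5).  f is the PDE's
# nonlinearity; the inner product <Z, dW> couples the gradient network into
# the forward propagation of Y.
import numpy as np

def bsde_euler_step(t, X, Y, Z, f, dt, dW):
    """Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) * dt + <Z_n, dW_n>."""
    return Y - f(t, X, Y, Z) * dt + np.sum(Z * dW, axis=-1)

# Sanity check: with f = 0 and no noise, Y is unchanged.
Y = bsde_euler_step(0.0, np.zeros((2, 3)), np.ones(2), np.zeros((2, 3)),
                    lambda t, x, y, z: 0.0, 0.05, np.zeros((2, 3)))
```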
Key Findings
- Batch normalization is critical for training stability
- Learning rate scheduling (larger early, smaller late) improves convergence
- More hidden layers per sub-network improve accuracy (diminishing returns after 2-3)
- Residual/skip connections help for complex PDEs
- Runtime scales roughly linearly in dimension (not exponentially)
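The "larger early, smaller late" learning-rate scheduling mentioned above is typically implemented as a piecewise-constant schedule; a small sketch with illustrative boundaries and rates (not the paper's values):

```python
# Piecewise-constant learning-rate schedule: larger rate early in training,
# smaller later.  Boundaries and rates here are illustrative placeholders.
def lr_schedule(step, boundaries=(1000, 2000), rates=(1e-2, 1e-3, 1e-4)):
    for boundary, rate in zip(boundaries, rates):
        if step < boundary:
            return rate
    return rates[-1]           # final rate after the last boundary
```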
Important References
- Deep Learning-Based Numerical Methods for High-Dimensional PDEs and BSDEs — E, Han, Jentzen (2017), companion paper with more details
- Convergence of the Deep BSDE Method for Coupled FBSDEs — Han, Long (2018), convergence analysis
- On Multilevel Picard Numerical Approximations — E, Hutzenthaler, Jentzen, Kruse (2016), alternative high-dim PDE solver