The Deep BSDE Solver, introduced by Han, Jentzen, and E (2018), is a numerical method for solving high-dimensional backward stochastic differential equations (BSDEs) by parametrising the control process Z at each time step with a feedforward neural network. The method exploits the equivalence between solving a BSDE and solving a stochastic optimal control problem: given a forward SDE for X and a backward equation for (Y, Z) with terminal condition Y_T = g(X_T), the BSDE solution is characterised as the minimiser of E[|g(X_T) - Y_T|^2] over the initial value Y_0 and the control process Z.
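In symbols, with drift mu, diffusion sigma, and driver f (standard BSDE ingredients not named explicitly above, written out here for concreteness), the system and its control reformulation read:

```latex
\begin{aligned}
&\text{Forward SDE:}   && dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, && X_0 = x_0,\\
&\text{Backward SDE:}  && dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t^\top dW_t, && Y_T = g(X_T),\\
&\text{Control problem:} && \min_{Y_0,\,Z}\ \mathbb{E}\!\left[\,\lvert g(X_T) - Y_T \rvert^2\,\right], && Y_t = Y_0 - \int_0^t f(s, X_s, Y_s, Z_s)\,ds + \int_0^t Z_s^\top dW_s.
\end{aligned}
```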
The algorithm discretises the BSDE on a uniform time grid with N steps. At each time step n, a distinct neural network phi_n^rho maps the current state X_n to the control Z_n^rho. The forward process X is simulated via Euler-Maruyama, and the backward process Y is propagated using the discretised BSDE dynamics. All network parameters rho and the initial value xi (the network's estimate of Y_0) are jointly optimised by stochastic gradient descent to minimise the empirical mean squared terminal error over L sample paths. This formulation can be viewed as model-based reinforcement learning, where the “policy” is the parametrised Markovian control and the “reward” is the negative terminal loss.
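The simulation-and-loss step can be sketched in a few lines. Everything below (the coefficients mu and sigma, driver f, terminal function g, the linear stand-in for the per-step networks phi_n, and the toy problem itself) is an illustrative assumption, not the setup of the original paper:

```python
import numpy as np

def rollout_terminal_loss(x0, T, N, L, phi, y0, mu, sigma, f, g, rng):
    """One forward pass of the discretised dynamics: X advances by
    Euler-Maruyama, Y by the discretised backward equation
        Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) dt + Z_n . dW_n,
    returning the empirical mean squared terminal error over L paths."""
    d = len(x0)
    dt = T / N
    X = np.tile(x0, (L, 1))                 # L sample paths in R^d
    Y = np.full(L, y0)                      # candidate initial value Y_0
    for n in range(N):
        dW = rng.normal(scale=np.sqrt(dt), size=(L, d))
        Z = phi(n, X)                       # control network for step n
        Y = Y - f(n * dt, X, Y, Z) * dt + np.einsum("ld,ld->l", Z, dW)
        X = X + mu(n * dt, X) * dt + sigma(n * dt, X) * dW  # Euler-Maruyama
    return np.mean((g(X) - Y) ** 2)         # empirical terminal loss

# Toy check: dX = dW in d = 2, zero driver, g(x) = |x|^2, so the exact
# solution is Y_0 = d*T and Z_t = 2*X_t; the loss then reflects only
# the time-discretisation error.
rng = np.random.default_rng(0)
d, T = 2, 1.0
loss = rollout_terminal_loss(
    x0=np.zeros(d), T=T, N=100, L=4096,
    phi=lambda n, X: 2.0 * X,               # stand-in for phi_n^rho
    y0=d * T,
    mu=lambda t, X: np.zeros_like(X),
    sigma=lambda t, X: np.ones_like(X),     # unit diagonal diffusion
    f=lambda t, X, Y, Z: np.zeros_like(Y),
    g=lambda X: np.sum(X ** 2, axis=1),
    rng=rng,
)
print(round(loss, 4))
```

In the full method this loss would be differentiated with respect to rho and xi and minimised by SGD; the rollout above is only the inner simulation.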
A key advantage over traditional regression-based BSDE solvers (e.g., Least-Squares Monte Carlo) is that the Deep BSDE Solver directly produces the control process Z, which in financial applications corresponds to the hedging strategy. This avoids the need for separate differentiation to recover sensitivities. The use of neural networks also leverages the universal approximation property, and recent theoretical results (Reisinger and Zhang, 2020; Hutzenthaler et al., 2018) show that deep networks can overcome the curse of dimensionality for certain classes of PDEs arising from SDE control problems.
Key Details
- Each time step uses a separate neural network with shared architecture (typically 2 hidden layers, d+10 to d+20 nodes, ReLU activation)
- Batch normalisation (Ioffe and Szegedy, 2015) stabilises training
- The a posteriori error bound of Han and Long (2020): sup_{0 <= t <= T} E|Y_t - Y_t^approx|^2 + integral_0^T E|Z_t - Z_t^approx|^2 dt <= C (Delta_t + E|g(X_T) - Y_T^approx|^2), so the simulated terminal loss, up to the time-discretisation term Delta_t, controls the approximation error in both Y and Z
- The initial value xi converges more easily than the network parameters rho; errors in Y and Z are typically of similar magnitude
- Originally developed for nonlinear PDEs; adapted for xVA in “Deep xVA Solver - A Neural Network Based Counterparty Credit Risk Management Framework”
- Reference implementation: Han et al. (2018), “Solving high-dimensional partial differential equations using deep learning”
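The per-step architecture listed above (shared shape, separate weights per step, two ReLU hidden layers of width about d+10) can be sketched as a plain-numpy forward pass; the He-style initialisation is an illustrative choice, and batch normalisation is omitted for brevity:

```python
import numpy as np

def make_step_network(d, width=None, rng=None):
    """Initialise one per-step network phi_n: R^d -> R^d with two
    hidden layers of width d+10, matching the architecture above."""
    width = width or d + 10
    rng = rng or np.random.default_rng()
    sizes = [d, width, width, d]
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    for i, (W, b) in enumerate(params):
        X = X @ W + b
        if i < len(params) - 1:
            X = np.maximum(X, 0.0)          # ReLU activation
    return X

# One network per time step n = 0..N-1: same architecture, separate
# weights, as in the per-step parametrisation described above.
d, N, L = 10, 20, 64
rng = np.random.default_rng(1)
nets = [make_step_network(d, rng=rng) for _ in range(N)]
Z0 = forward(nets[0], rng.normal(size=(L, d)))
print(Z0.shape)   # one control vector in R^d per sample path
```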