The deep BSDE method has two principal variants that differ in the direction of time-stepping for the backward SDE component and, consequently, in their loss functions. Both variants integrate the forward SDE for X_t forward in time; they differ in how the BSDE for Y_t is handled.
The forward deep BSDE method, introduced by Han, Jentzen, and E (2018), treats the initial value Y_0 and the gradient approximations Z_{t_n} as unknowns, propagates Y_t forward using the discretised BSDE dynamics Y_{t_{n+1}} = Y_{t_n} - f(t_n, X_{t_n}, Y_{t_n}, Z_{t_n}) Delta t + Z_{t_n}^T Delta W_n, and minimises the terminal-condition mismatch loss E|g(X_T) - Y_T|^2. This recasts the BSDE as a stochastic control problem: choose Y_0 and controls {Z_{t_n}} that steer Y_t from an unknown initial value to the known terminal condition g(X_T). The architecture stacks a sub-network at each time step into one deep residual network (see the Han, Jentzen, E 2025 review).
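The forward rollout and terminal-mismatch loss can be sketched in a deliberately stripped-down setting. This is an illustrative assumption, not the method from the papers: X_t is one-dimensional Brownian motion, the generator f is identically zero, g(x) = x, and the per-step networks for Z_{t_n} are replaced by a single constant z, so the loss is exactly computable.

```python
import numpy as np

def forward_bsde_loss(y0, z, x0=1.0, T=1.0, n_steps=20, n_paths=10_000, seed=0):
    """Terminal-mismatch loss E|g(X_T) - Y_T|^2 for a toy linear BSDE.

    Toy assumptions (for illustration only): X_t is standard Brownian
    motion started at x0, the generator f == 0, g(x) = x, and the control
    Z_{t_n} is a single constant z rather than a per-step neural network.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    X_T = x0 + np.cumsum(dW, axis=1)[:, -1]   # forward SDE: dX = dW
    Y = np.full(n_paths, float(y0))
    for n in range(n_steps):                  # Y_{n+1} = Y_n - f dt + z dW, f == 0
        Y = Y + z * dW[:, n]
    return np.mean((X_T - Y) ** 2)            # E|g(X_T) - Y_T|^2 with g(x) = x
```

In this toy case the loss vanishes at the exact solution y0 = x0, z = 1, and a wrong choice of y0 shifts the loss by (x0 - y0)^2; training the real method searches over Y_0 and the {Z_{t_n}} networks by gradient descent on this same objective.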
The backward deep BSDE method, applied to nonlinear problems by Yu, Hientzsch, and Ganesan (2020), starts from the known terminal value Y_T = g(X_T) and time-steps backward: Y_{t_i} = ybackstep(t_i, Y_{t_{i+1}}, X_{t_i}, Pi_{t_i}, Delta W_i), where each backward step requires solving a (potentially nonlinear) equation for Y_{t_i}. The loss minimises the variance of Y_0 (for fixed X_0) or the distance E|Y_0 - Y_init(X_0)|^2 to an initial-value approximation Y_init (for random X_0). The backward method has the advantage that the terminal condition is satisfied exactly, at the cost of solving a local nonlinear problem at each time step.
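A single implicit backward step can be sketched as follows. The fixed-point iteration below is a simple stand-in, an assumption for illustration, for the exact case-splitting or Newton-type solves used in practice, and the driver f, volatility sigma, and hedge pi_i are scalars here.

```python
import numpy as np

def backward_step(y_next, x_i, pi_i, dW, f, dt, sigma=1.0, n_iter=50, tol=1e-12):
    """One implicit backward step of the backward deep BSDE method.

    Solves  Y_i - f(x_i, Y_i, pi_i) * dt = y_next - pi_i * sigma * dW
    for Y_i by fixed-point iteration.  Illustrative scalar sketch; the
    solver choice (iteration vs. case-splitting vs. Taylor) is an
    assumption, not prescribed by the source.
    """
    rhs = y_next - pi_i * sigma * dW        # known right-hand side
    y = rhs                                 # explicit value as start guess
    for _ in range(n_iter):
        y_new = rhs + f(x_i, y, pi_i) * dt  # contraction for small dt
        if np.max(np.abs(y_new - y)) < tol:
            return y_new
        y = y_new
    return y
```

For small Delta t the map is a contraction (Lipschitz constant of f times Delta t), so a handful of iterations suffices; for piecewise-linear drivers such as differential rates, an exact case-split solve replaces the iteration.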
Key Details
- Forward method loss: E|g(X_T) - Y_T|^2 (terminal mismatch); unknowns are Y_0 and {Z_{t_n}}
- Backward method loss: var(Y_0) for fixed X_0, or E|Y_0 - Y_init(X_0)|^2 for random X_0; unknowns are the hedging strategies {Pi_{t_n}}
- Nonlinear backward step: for nonlinear generators like differential rates, the backward step Y_{t_i} - f(t_i, X_{t_i}, Y_{t_i}, Pi_{t_i}) Delta t = Y_{t_{i+1}} - Pi^T sigma Delta W requires either exact case-splitting or Taylor expansion
- Taylor vs exact: Yu et al. (2020) showed differences of order 10^{-5} between exact and first-order Taylor backward steps for differential rates
- Convergence behaviour: treating Y_0 as a learned parameter converges more slowly at first but more smoothly than the batch-variance loss; averaging over 100 mini-batch means instead of 1 improves the batch-variance method's convergence
- Prior work: Wang et al. (2018) introduced backward deep BSDE for zero-drift generators; Liang et al. (2019) handled linear generators; Yu et al. (2020) were first for nonlinear generators
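The exact-versus-Taylor distinction for the nonlinear backward step can be made concrete with a toy differential-rates-style driver. The driver f(y) = -r(y) * y with r(y) = r_l for y >= 0 (lending) and r_b otherwise (borrowing) is an illustrative assumption; the actual driver in Yu et al. (2020) also depends on the hedge Pi. Because f is piecewise linear in y, the implicit equation splits into two linear cases, and the first-order Taylor step simply evaluates f at the explicit value.

```python
def exact_backward_step(rhs, r_l, r_b, dt):
    """Exact case-splitting solve of  y - f(y) * dt = rhs  for the toy
    driver f(y) = -r(y) * y, r(y) = r_l if y >= 0 else r_b.
    Each sign branch is linear; pick the branch whose solution is
    sign-consistent (assumes r_l, r_b, dt > 0 so denominators > 0)."""
    y_lend = rhs / (1.0 + r_l * dt)   # candidate for branch y >= 0
    if y_lend >= 0.0:
        return y_lend
    return rhs / (1.0 + r_b * dt)     # branch y < 0

def taylor_backward_step(rhs, r_l, r_b, dt):
    """First-order Taylor step: evaluate the driver at the explicit
    value rhs instead of solving the implicit equation."""
    r = r_l if rhs >= 0.0 else r_b
    return rhs - r * rhs * dt
```

The two steps agree to O(Delta t^2) per step, which is consistent in spirit with the small exact-versus-Taylor differences reported by Yu et al. (2020), though the magnitudes there refer to their full pricing problem, not this toy driver.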