The multi-layer deep BSDE solver is an extension of the deep BSDE method of Han, Jentzen, and E (2018) designed to solve systems of coupled BSDEs that arise in portfolio-wide xVA computation. Rather than solving a single monolithic BSDE for the adjusted portfolio value, the method decomposes the problem into four hierarchical layers: (1) clean derivative values, (2) initial margin via deep quantile regression, (3) ColVA, CVA, DVA, and MVA, and (4) FVA. Each layer’s output feeds forward as a known input to subsequent layers, ensuring that the full interdependence structure of the xVA system is respected without requiring deeply nested simulations.
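The feed-forward structure of the four layers can be sketched as a small dependency graph. This is a minimal sketch: the layer identifiers, `LAYER_INPUTS`, `solve_order` and `run_layers` are hypothetical names introduced for illustration. Each solver is a placeholder for training the corresponding deep BSDE or quantile-regression network. Following the text, every layer is assumed to receive the outputs of all preceding layers as fixed inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical layer identifiers. Each entry lists the layers whose outputs it
# consumes; per the text, every layer sees all preceding layers' outputs.
LAYER_INPUTS = {
    "clean_values": set(),                                              # Layer 1
    "initial_margin": {"clean_values"},                                 # Layer 2
    "colva_cva_dva_mva": {"clean_values", "initial_margin"},            # Layer 3
    "fva": {"clean_values", "initial_margin", "colva_cva_dva_mva"},     # Layer 4
}

def solve_order(layer_inputs):
    """Order in which layers can be trained: each one after all of its inputs."""
    return list(TopologicalSorter(layer_inputs).static_order())

def run_layers(solvers, layer_inputs):
    """Call each (hypothetical) layer solver once, feeding earlier outputs forward."""
    outputs = {}
    for name in solve_order(layer_inputs):
        known = {dep: outputs[dep] for dep in layer_inputs[name]}
        outputs[name] = solvers[name](known)  # upstream results enter as known inputs
    return outputs
```

Because the graph is a chain, the topological order is unique, so no layer is trained before all of its inputs exist.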
A key architectural innovation is the simultaneous solution of multiple BSDEs within a single layer using one shared neural network as the function approximator. In Layer 1, for instance, all P clean-value BSDEs share one network Z: [0,T] × R^d → R^{d_1 + ... + d_P}, whose output stacks the control processes of the P BSDEs (the j-th of dimension d_j). Time enters as an additional input feature, rather than a separate network being trained per time step as in the original deep BSDE method. This reduces memory usage and avoids a training cost that would otherwise grow linearly with the number of derivatives. In Layers 3 and 4 the method is combined with a Girsanov-based measure change: two forward SDE trajectories are simulated per BSDE, one under the original measure and one under a tilted measure with increased default probability, and the loss function sums the terminal-condition errors across both measures.
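To illustrate one network serving several BSDEs, the sketch below evaluates a fully connected ReLU network on the input (t, x) and splits its output of width d_1 + ... + d_P into per-BSDE blocks. It is an illustrative NumPy sketch, not the authors' implementation: the helper names `init_mlp` and `shared_z`, the He initialisation and the sizes d = 5, P = 3, d_j = 5 are assumptions. Only the three hidden layers of 100 ReLU units echo the Layer 1 setup described here, and training is omitted.

```python
import numpy as np

def init_mlp(sizes, rng):
    """He-initialised (weight, bias) pairs for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def shared_z(params, t, x, out_dims):
    """Evaluate one shared network on (t, x) and split its output per BSDE.

    t has shape (batch,), x has shape (batch, d), out_dims = [d_1, ..., d_P].
    Returns P arrays; the j-th has shape (batch, d_j).
    """
    h = np.concatenate([t[:, None], x], axis=1)  # time is an extra input feature
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)           # ReLU hidden layers
    w, b = params[-1]
    out = h @ w + b                              # linear output of width sum(out_dims)
    return np.split(out, np.cumsum(out_dims)[:-1], axis=1)

# Illustrative sizes: d = 5 risk factors, P = 3 BSDEs, 3 hidden layers of 100 units.
d, out_dims, width = 5, [5, 5, 5], 100
rng = np.random.default_rng(0)
params = init_mlp([1 + d, width, width, width, sum(out_dims)], rng)
x = rng.normal(size=(8, d))
z_early = shared_z(params, np.full(8, 0.1), x, out_dims)  # same parameters ...
z_late = shared_z(params, np.full(8, 0.9), x, out_dims)   # ... at a later time
```

Since t is just another input, the same parameters are reused at every time step, so the parameter count does not depend on the number of time steps.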
The solver is formulated as a variational problem. The forward SDE is discretised by Euler-Maruyama, and the backward SDE is rewritten as a forward SDE with an unknown initial condition y_0 and an unknown control process Z. The neural network parametrises Z, while y_0 is a learnable parameter; training minimises the mean-squared mismatch between the simulated terminal value and the terminal condition. All networks are fully connected with ReLU activations and three hidden layers (100 neurons per hidden layer in Layer 1, 16 in Layer 2, and 50 in Layers 3 and 4), trained with Adam on 2^20 samples with batch size 2^11.
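In discrete form, the rewritten BSDE is rolled forward as Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) Δt + Z_n · ΔW_n from Y_0 = y_0, and the loss is the mean of (Y_N - g(X_N))^2. The NumPy sketch below evaluates that loss for a fixed control function standing in for the trained network; in practice y_0 and the network weights would be optimised with Adam. It also assumes one common way to realise a Girsanov measure change: paths are drawn under a measure on which W has a constant drift θ, so that ΔW = ΔW^Q + θ Δt enters both recursions, and θ = 0 recovers the original measure. Whether the source tilts the default dynamics exactly this way is not stated here. The drift θ, driver f, terminal function g, the toy problem and the name `terminal_mismatch` are all hypothetical.

```python
import numpy as np

def terminal_mismatch(y0, z_fn, f, g, x0, sigma, T, n_steps, n_paths, theta, rng):
    """Mean-squared terminal mismatch of an Euler-discretised deep BSDE.

    Forward SDE (hypothetical toy): dX = sigma dW.  Backward recursion:
    Y_{n+1} = Y_n - f(t_n, X_n, Y_n, Z_n) dt + Z_n . dW_n,  Y_0 = y0.
    Paths are drawn under a measure on which W has drift theta, so the
    original-measure increment is dW = dW_Q + theta * dt (theta = 0: no tilt).
    """
    d = x0.size
    dt = T / n_steps
    x = np.tile(x0, (n_paths, 1))
    y = np.full(n_paths, float(y0))
    for n in range(n_steps):
        t = n * dt
        dw_q = rng.normal(0.0, np.sqrt(dt), size=(n_paths, d))
        dw = dw_q + theta * dt                     # original increment via Girsanov
        z = z_fn(t, x)                             # stand-in for the network Z(t, x)
        y = y - f(t, x, y, z) * dt + np.sum(z * dw, axis=1)
        x = x + sigma * dw                         # Euler-Maruyama step
    return np.mean((y - g(x)) ** 2)

# Toy problem with known solution Y_t = g(X_t), Z = sigma * (1, 1).
sigma, x0 = 0.3, np.array([1.0, 2.0])
g = lambda x: x.sum(axis=1)
f = lambda t, x, y, z: np.zeros_like(y)
z_exact = lambda t, x: sigma * np.ones_like(x)
rng = np.random.default_rng(1)
args = (z_exact, f, g, x0, sigma, 1.0, 50, 1000)
loss_p = terminal_mismatch(3.0, *args, 0.0, rng)                   # original measure
loss_q = terminal_mismatch(3.0, *args, np.array([0.0, 1.5]), rng)  # tilted measure
loss_total = loss_p + loss_q                                       # summed over both
loss_wrong_y0 = terminal_mismatch(3.5, *args, 0.0, rng)
```

With the exact y_0 and Z the summed loss vanishes up to rounding under both measures, while a y_0 that is off by 0.5 leaves a constant terminal error of 0.25. For scale: with 2^20 samples and batch size 2^11, one pass over the samples corresponds to 2^20 / 2^11 = 512 Adam steps.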
Key Details
- Layer 1 (Clean values): Solves the P clean-value BSDEs, one per derivative, simultaneously; uses only the d non-defaultable risk factors
- Layer 2 (Initial margin): Separate quantile regression networks per time step; estimates VaR of portfolio value changes over the margin period of risk
- Layer 3 (ColVA, CVA, DVA, MVA): Four xVA BSDEs solved jointly on the full (d+2)-dimensional state space including defaultable entities; uses measure change to boost default event frequency
- Layer 4 (FVA): Single BSDE depending on all preceding layers’ outputs; the FVA driver involves the total valuation adjustment from Layer 3
- Error propagation across layers is characterised by Theorem 7.3 of ANDERSSON2025: each layer’s error inherits a posteriori terms from all preceding layers
- Tested on a portfolio of 33 basket options on 5 assets plus 2 default processes (93 dimensions total, 201 time steps)
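Layer 2's deep quantile regression rests on the pinball loss, whose minimiser is a quantile of the target. A network trained with this loss on the portfolio value change over the margin period of risk therefore estimates a conditional VaR. The NumPy sketch below is illustrative only: the helper names `pinball_loss`, `mpor_change` and `q_net`, the sign convention of the value change, and the 97.5% level are assumptions. A constant replaces the per-time-step network so that the result can be checked by hand.

```python
import numpy as np

def pinball_loss(q, target, alpha):
    """Pinball (quantile) loss; minimised over q by an alpha-quantile of target."""
    r = target - q
    return np.mean(np.maximum(alpha * r, (alpha - 1.0) * r))

def mpor_change(v_paths, n, k):
    """Value change over a margin period of risk of k steps starting at t_n.

    v_paths has shape (n_paths, n_times). The sign convention is an assumption.
    """
    return v_paths[:, n + k] - v_paths[:, n]

# Deep quantile regression would minimise pinball_loss(q_net(X_n), change, alpha)
# over the weights of a hypothetical per-time-step network q_net; a constant
# stands in for q_net here.
targets = np.arange(100.0)
best_q = min(range(100), key=lambda q: pinball_loss(q, targets, 0.975))
```

For the targets 0, 1, ..., 99 at level 0.975 the minimiser is 97, the empirical 97.5% quantile. In the full method the constant is replaced by a separate network per time step, as listed for Layer 2.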