Abstract
We propose a deep neural network-based algorithm to identify the Markovian Nash equilibrium of general large N-player stochastic differential games. Following the idea of fictitious play, we recast the N-player game as N decoupled decision problems (one for each player) and solve them iteratively. Each individual decision problem is characterized by a semilinear Hamilton-Jacobi-Bellman equation, which we solve using the recently developed deep BSDE method. The resulting algorithm can solve large N-player games for which conventional numerical methods suffer from the curse of dimensionality. Multiple numerical examples involving identical or heterogeneous agents, with risk-neutral or risk-sensitive objectives, validate the accuracy of the proposed algorithm in large-group games. Even for a fifty-player game in the presence of common noise, the proposed algorithm finds the approximate Nash equilibrium accurately.
Summary
The core algorithm (Algorithm 1) works as follows. Given N players, initialize N neural networks, one representing each player’s value function V^i. At each fictitious play stage m = 1,…,M: for all players i in parallel, generate N_sample sample paths of the state process X using the opponents’ stage-(m-1) strategies, then update player i’s neural network via N_SGD_per_stage steps of Adam SGD on the deep BSDE loss (the terminal-condition mismatch). The key implementation insight is that the individual problems need not be solved accurately at early stages: roughly 100 SGD steps per stage suffice, because the opponents’ strategies are still far from equilibrium. Networks carry over between stages without re-initialization, so parameters are updated incrementally.
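The staged structure above can be sketched with a toy stand-in for the per-player problem. In this minimal sketch, a hypothetical quadratic loss (each player tracking the opponents’ average parameters) replaces the deep BSDE loss, and a plain vector `theta[i]` stands in for player i’s network parameters; only the fictitious-play control flow mirrors Algorithm 1, and all names and dynamics are illustrative.

```python
import numpy as np

def stage_grad(theta_i, frozen_opponents):
    # Gradient of the surrogate loss 0.5 * ||theta_i - mean(opponents)||^2,
    # a hypothetical stand-in for the deep BSDE loss gradient.
    return theta_i - frozen_opponents.mean(axis=0)

def fictitious_play(n_players=4, dim=3, n_stages=50, n_sgd_per_stage=100,
                    lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_players, dim))   # stage-0 parameters
    for _ in range(n_stages):                   # fictitious-play stages m = 1..M
        frozen = theta.copy()                   # opponents' stage-(m-1) strategies
        for i in range(n_players):              # decoupled, parallelizable problems
            opponents = np.delete(frozen, i, axis=0)
            # Parameters carry over between stages: theta[i] is refined in place.
            for _ in range(n_sgd_per_stage):    # moderate SGD budget per stage
                theta[i] -= lr * stage_grad(theta[i], opponents)
    return theta

theta = fictitious_play()
# All players end up on a common parameter vector (the surrogate equilibrium).
spread = np.ptp(theta, axis=0).max()
```

As in the paper’s observation about moderate per-stage budgets, each inner loop only needs to move `theta[i]` toward the current best response; the outer stages do the rest of the work.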
The network architecture for each player’s value function Net(t,x) is a fully connected feedforward network with 3 hidden layers. The hypothesis spaces N_0’ (for Y_0 = V^i(0,x)) and N_k (for Z_k = Sigma^T grad_x V^i at each time step) share the same parameters. The Feynman-Kac relations Y_t = V^i(t,X_t) and Z_t = Sigma(t,X_t)^T grad_x V^i(t,X_t) are exploited so that the gradient of the value function determines both the BSDE solution and the optimal control. Tested hyperparameters: N_sample = 256 paths, N_batch = 64, N_SGD_per_stage = 100, learning rate = 0.01 with Adam. Time discretization: N_T = 20 steps on [0,T].
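The terminal-mismatch loss itself can be illustrated on a toy problem. The sketch below assumes a hypothetical one-dimensional FBSDE with zero driver, dX = sigma dW, and terminal condition g(x) = x, so the exact solution is V(t,x) = x with Z = sigma; constant scalars stand in for the Net(t,x) outputs Y_0 and Z_k, and the Euler scheme over N_T steps matches the time discretization described above.

```python
import numpy as np

def bsde_loss(y0, z, x0=1.0, sigma=0.4, T=1.0, n_t=20, n_sample=256, seed=0):
    # Terminal-mismatch loss E|Y_T - g(X_T)|^2 for a toy FBSDE with
    # driver f = 0, dX = sigma dW, g(x) = x (exact solution V(t,x) = x).
    rng = np.random.default_rng(seed)
    dt = T / n_t
    x = np.full(n_sample, x0)
    y = np.full(n_sample, y0)          # Y_0 = V(0, x0), a trainable scalar
    for _ in range(n_t):               # Euler scheme for the forward-backward pair
        dw = rng.normal(scale=np.sqrt(dt), size=n_sample)
        y = y + z * dw                 # dY = -f dt + Z dW, with f = 0 here
        x = x + sigma * dw             # dX = sigma dW
    return np.mean((y - x) ** 2)       # mismatch against g(X_T) = X_T

# With the exact pair (Y_0, Z) = (x0, sigma) the mismatch vanishes path by path;
# a biased Y_0 shifts the loss by the squared bias.
loss_exact = bsde_loss(y0=1.0, z=0.4)
loss_wrong = bsde_loss(y0=1.2, z=0.4)
```

In the actual method these constants are network outputs and the loss is minimized over the network parameters with Adam at the stated batch size and learning rate.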
Four numerical examples validate the approach: (1) 10-player inter-bank borrowing/lending game with common noise (RSE of V: 4.6%, grad V: 0.2%), (2) risk-sensitive version with heterogeneous theta_i = 0.6 + 0.02i (RSE: 1.4%, 0.1%), (3) general 10-player LQ game with fully heterogeneous agents in 10 dimensions (RSE: 6.5%, 0.4%), (4) N=50 inter-bank game, including a variant with nonlinear (cubic) drift where no analytic solution exists.
Key Contributions
- Algorithm combining deep fictitious play with deep BSDE method for N-player SDGs
- Handles heterogeneous agents, risk-sensitive objectives, and common noise
- Parallelizable across players (each player’s problem is independent per stage)
- Scales to N=50 players in (N+1)-dimensional state space
- Moderate SGD budget per stage is sufficient and preferred
Methodology
At each fictitious play stage, the N-player game (eq. 1) is decoupled into N individual stochastic control problems (eq. 5). Each player i minimizes J_0^i(alpha^i; alpha^{-i,m}) while treating the opponents’ strategies as fixed. Under the Markovian structure, this reduces to solving a semilinear parabolic PDE (eq. 8) via the deep BSDE method. The PDE is connected to a BSDE (eqs. 9-10) through the nonlinear Feynman-Kac formula. The optimal control is recovered from the gradient of the value function: alpha^{i,m}(t,x) = argmin_{alpha^i} G^i(t,x,(alpha^i, alpha^{-i,m-1}(t,x)), grad_x V^{i,m}, V^{i,m}).
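In schematic form, the stage-m forward-backward system for player i combines the BSDE (eqs. 9-10) with the Feynman-Kac identification. This is a notation sketch only: F^i stands for the driver induced by G^i after minimizing over alpha^i, and b, Sigma, g^i for the (frozen-opponent) drift, diffusion, and terminal cost.

```latex
\begin{aligned}
  dX_t &= b\big(t, X_t, \alpha^{-i,m-1}(t, X_t)\big)\,dt + \Sigma(t, X_t)\,dW_t,\\
  dY_t &= -F^i(t, X_t, Y_t, Z_t)\,dt + Z_t^\top dW_t, \qquad Y_T = g^i(X_T),\\
  Y_t &= V^{i,m}(t, X_t), \qquad Z_t = \Sigma(t, X_t)^\top \nabla_x V^{i,m}(t, X_t).
\end{aligned}
```

The last line is exactly the relation exploited by the network parametrization: one function V^{i,m} supplies Y, Z, and (through its gradient) the optimal control.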
Key Findings
- RSEs of 1-7% for value functions and 0.1-0.4% for gradients across all examples
- Accuracy is insensitive to N_SGD_per_stage: values of 10, 50, 100, and 400 all give similar final accuracy under the same total SGD budget
- Handles N=50 with common noise: the distributions of the terminal state X_T and control alpha_T match the analytic solutions
- Nonlinear (cubic) drift variant shows the algorithm works where no analytic benchmark exists
Important References
- Solving High-Dimensional Partial Differential Equations Using Deep Learning — foundational deep BSDE method used as subroutine
- Deep Fictitious Play for Stochastic Differential Games — Hu (2019), open-loop version; not directly applicable to Markovian equilibrium
- Mean Field Games and Systemic Risk — Carmona, Fouque, Sun (2015), the inter-bank model used as benchmark