Abstract
We propose a deep neural network-based algorithm to identify the Markovian Nash equilibrium of general large N-player stochastic differential games. Following the idea of fictitious play, we recast the N-player game as N decoupled decision problems (one for each player) and solve them iteratively. Each individual decision problem is characterized by a semilinear Hamilton-Jacobi-Bellman equation, which we solve using the recently developed deep BSDE method. The resulting algorithm can solve large N-player games for which conventional numerical methods suffer from the curse of dimensionality. Multiple numerical examples involving identical or heterogeneous agents, with risk-neutral or risk-sensitive objectives, validate the accuracy of the proposed algorithm in large-group games. Even for a fifty-player game in the presence of common noise, the proposed algorithm finds the approximate Nash equilibrium accurately.
Summary
The core algorithm (Algorithm 1) works as follows. Given N players, initialize N neural networks, one representing each player’s value function V^i. At each fictitious play stage m = 1,…,M: for all players i in parallel, generate N_sample sample paths of the state process X using the opponents’ stage-(m-1) strategies, then update player i’s neural network via N_SGD_per_stage steps of Adam SGD on the deep BSDE loss (the terminal-condition mismatch). The key implementation insight is that the individual problems need not be solved accurately at early stages: roughly 100 SGD steps per stage suffice, because the opponents’ strategies are still far from equilibrium. Networks carry over between stages without re-initialization, so parameters are updated incrementally.
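The staged structure above can be sketched with a toy stand-in for the per-player problem. In this minimal sketch, a hypothetical quadratic loss (each player tracking the opponents’ average parameters) replaces the deep BSDE loss, and a plain vector `theta[i]` stands in for player i’s network parameters; only the fictitious-play control flow mirrors Algorithm 1, and all names and dynamics are illustrative.

```python
import numpy as np

def stage_grad(theta_i, frozen_opponents):
    # Gradient of the surrogate loss 0.5 * ||theta_i - mean(opponents)||^2,
    # a hypothetical stand-in for the deep BSDE loss gradient.
    return theta_i - frozen_opponents.mean(axis=0)

def fictitious_play(n_players=4, dim=3, n_stages=50, n_sgd_per_stage=100,
                    lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_players, dim))   # stage-0 parameters
    for _ in range(n_stages):                   # fictitious-play stages m = 1..M
        frozen = theta.copy()                   # opponents' stage-(m-1) strategies
        for i in range(n_players):              # decoupled, parallelizable problems
            opponents = np.delete(frozen, i, axis=0)
            # Parameters carry over between stages: theta[i] is refined in place.
            for _ in range(n_sgd_per_stage):    # moderate SGD budget per stage
                theta[i] -= lr * stage_grad(theta[i], opponents)
    return theta

theta = fictitious_play()
# All players end up on a common parameter vector (the surrogate equilibrium).
spread = np.ptp(theta, axis=0).max()
```

As in the paper’s observation about moderate per-stage budgets, each inner loop only needs to move `theta[i]` toward the current best response; the outer stages do the rest of the work.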
The network architecture for each player’s value function Net(t,x) is a fully connected feedforward network with 3 hidden layers. The hypothesis spaces N_0’ (for Y_0 = V^i(0,x)) and N_k (for Z_k = Sigma^T grad_x V^i at each time step) share the same parameters. The Feynman-Kac relations Y_t = V^i(t,X_t) and Z_t = Sigma(t,X_t)^T grad_x V^i(t,X_t) are exploited so that the gradient of the value function determines both the BSDE solution and the optimal control. Tested hyperparameters: N_sample = 256 paths, N_batch = 64, N_SGD_per_stage = 100, learning rate = 0.01 with Adam. Time discretization: N_T = 20 steps on [0,T].
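The terminal-mismatch loss itself can be illustrated on a toy problem. The sketch below assumes a hypothetical one-dimensional FBSDE with zero driver, dX = sigma dW, and terminal condition g(x) = x, so the exact solution is V(t,x) = x with Z = sigma; constant scalars stand in for the Net(t,x) outputs Y_0 and Z_k, and the Euler scheme over N_T steps matches the time discretization described above.

```python
import numpy as np

def bsde_loss(y0, z, x0=1.0, sigma=0.4, T=1.0, n_t=20, n_sample=256, seed=0):
    # Terminal-mismatch loss E|Y_T - g(X_T)|^2 for a toy FBSDE with
    # driver f = 0, dX = sigma dW, g(x) = x (exact solution V(t,x) = x).
    rng = np.random.default_rng(seed)
    dt = T / n_t
    x = np.full(n_sample, x0)
    y = np.full(n_sample, y0)          # Y_0 = V(0, x0), a trainable scalar
    for _ in range(n_t):               # Euler scheme for the forward-backward pair
        dw = rng.normal(scale=np.sqrt(dt), size=n_sample)
        y = y + z * dw                 # dY = -f dt + Z dW, with f = 0 here
        x = x + sigma * dw             # dX = sigma dW
    return np.mean((y - x) ** 2)       # mismatch against g(X_T) = X_T

# With the exact pair (Y_0, Z) = (x0, sigma) the mismatch vanishes path by path;
# a biased Y_0 shifts the loss by the squared bias.
loss_exact = bsde_loss(y0=1.0, z=0.4)
loss_wrong = bsde_loss(y0=1.2, z=0.4)
```

In the actual method these constants are network outputs and the loss is minimized over the network parameters with Adam at the stated batch size and learning rate.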
Four numerical examples validate the approach: (1) 10-player inter-bank borrowing/lending game with common noise (RSE of V: 4.6%, grad V: 0.2%), (2) risk-sensitive version with heterogeneous theta_i = 0.6 + 0.02i (RSE: 1.4%, 0.1%), (3) general 10-player LQ game with fully heterogeneous agents in 10 dimensions (RSE: 6.5%, 0.4%), (4) N=50 inter-bank game, including a variant with nonlinear (cubic) drift where no analytic solution exists.
Key Contributions
- Algorithm combining deep fictitious play with deep BSDE method for N-player SDGs
- Handles heterogeneous agents, risk-sensitive objectives, and common noise
- Parallelizable across players (each player’s problem is independent per stage)
- Scales to N=50 players in (N+1)-dimensional state space
- Moderate SGD budget per stage is sufficient and preferred
Methodology
At each fictitious play stage, the N-player game (eq. 1) is decoupled into N individual stochastic control problems (eq. 5). Each player i minimizes J_0^i(alpha^i; alpha^{-i,m}) while treating the opponents’ strategies as fixed. Under the Markovian structure, this reduces to solving a semilinear parabolic PDE (eq. 8) via the deep BSDE method. The PDE is connected to a BSDE (eqs. 9-10) through the nonlinear Feynman-Kac formula. The optimal control is recovered from the gradient of the value function: alpha^{i,m}(t,x) = argmin_{alpha^i} G^i(t,x,(alpha^i, alpha^{-i,m-1}(t,x)), grad_x V^{i,m}, V^{i,m}).
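In schematic form, the stage-m forward-backward system for player i combines the BSDE (eqs. 9-10) with the Feynman-Kac identification. This is a notation sketch only: F^i stands for the driver induced by G^i after minimizing over alpha^i, and b, Sigma, g^i for the (frozen-opponent) drift, diffusion, and terminal cost.

```latex
\begin{aligned}
  dX_t &= b\big(t, X_t, \alpha^{-i,m-1}(t, X_t)\big)\,dt + \Sigma(t, X_t)\,dW_t,\\
  dY_t &= -F^i(t, X_t, Y_t, Z_t)\,dt + Z_t^\top dW_t, \qquad Y_T = g^i(X_T),\\
  Y_t &= V^{i,m}(t, X_t), \qquad Z_t = \Sigma(t, X_t)^\top \nabla_x V^{i,m}(t, X_t).
\end{aligned}
```

The last line is exactly the relation exploited by the network parametrization: one function V^{i,m} supplies Y, Z, and (through its gradient) the optimal control.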
Key Findings
- RSEs of 1-7% for value functions and 0.1-0.4% for gradients across all examples
- Accuracy is insensitive to N_SGD_per_stage: values of 10, 50, 100, and 400 all give similar final accuracy under the same total SGD budget
- Handles N=50 with common noise: the distributions of the terminal state X_T and control alpha_T match the analytic solutions
- Nonlinear (cubic) drift variant shows the algorithm works where no analytic benchmark exists
Important References
- Solving High-Dimensional Partial Differential Equations Using Deep Learning — foundational deep BSDE method used as subroutine
- Deep Fictitious Play for Stochastic Differential Games — Hu (2019), open-loop version; not directly applicable to Markovian equilibrium
- Mean Field Games and Systemic Risk — Carmona, Fouque, Sun (2015), the inter-bank model used as benchmark