Deep fictitious play (DFP) is an algorithm for computing Markovian Nash equilibria of N-player non-zero-sum stochastic differential games, combining classical fictitious play with the deep BSDE method.

The algorithm exploits the fact that, under fictitious play, the N-player game decouples into N independent single-player stochastic control problems at each stage. Each player best-responds to opponents’ previous strategies, yielding a semilinear Hamilton-Jacobi-Bellman (HJB) equation that can be solved via the deep BSDE method. The N problems are solved in parallel on GPUs.

The algorithm iterates M fictitious play stages. At stage m, each player i solves min_{alpha^i} J^i(alpha^i; alpha^{-i,m-1}), with opponents’ strategies frozen at the previous stage. This best-response problem yields a semilinear HJB equation, which is reformulated as a BSDE and solved by the deep BSDE method, with neural networks approximating the value function and its gradient. Crucially, the networks are warm-started from the previous stage rather than re-initialized, so only a moderate number of SGD steps per stage (e.g., 100) is needed. The forward process is chosen to be driftless, so the same sample paths can be reused across all N players and all stages.
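
Path reuse is what makes the driftless choice pay off: one Monte Carlo batch serves every player at every stage. A minimal sketch with numpy (the dimensions, horizon, and sigma below are illustrative assumptions, not values from the paper):

```python
import numpy as np

# Illustrative constants (not from the paper): 256 paths, 20 time steps,
# a 10-dimensional state, and constant volatility sigma = 1.0.
N_sample, N_T, d = 256, 20, 10
T, sigma = 1.0, 1.0
dt = T / N_T

rng = np.random.default_rng(0)

# Driftless forward SDE, Euler-Maruyama: X_{k+1} = X_k + sigma * dW_k.
# Because the dynamics contain no control (no drift), these paths are
# valid for every player i and every fictitious play stage m.
dW = rng.normal(0.0, np.sqrt(dt), size=(N_sample, N_T, d))
X = np.zeros((N_sample, N_T + 1, d))
X[:, 1:, :] = sigma * np.cumsum(dW, axis=1)  # X_0 = 0

# The same (X, dW) arrays are then fed to all N single-player
# deep BSDE solvers; opponents' strategies only enter the loss.
```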

Algorithm

  1. Initialize N neural networks for V^i(t,x), i = 1,…,N
  2. For m = 1 to M (fictitious play stages):
    • For all i in I in parallel:
      • Generate (or reuse) N_sample driftless paths of X; opponents’ strategies alpha^{-i,m-1} enter through the BSDE driver, not the forward dynamics
      • Update player i’s network with N_SGD_per_stage SGD steps on the deep BSDE loss
      • Extract optimal policy alpha^{i,m}(t,x) = argmin_{alpha} H^i(t, x, alpha, V^i, grad V^i)
    • Collect alpha^m = (alpha^{1,m}, …, alpha^{N,m})
  3. Return alpha^M = (alpha^{1,M}, …, alpha^{N,M})
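
The control flow above (warm-started networks, decoupled per-player updates, staged policy collection) can be sketched as follows. This is a structural sketch only: the "network" is a bare parameter vector and bsde_loss_grad is a placeholder quadratic coupling I made up to keep the example runnable, not the actual deep BSDE loss.

```python
import numpy as np

# Toy sizes for illustration (the paper-scale values would be larger).
N_players, M_stages, N_SGD_per_stage, lr = 3, 5, 100, 0.01

rng = np.random.default_rng(1)
# Step 1: initialize one "network" per player (kept across stages).
params = [rng.normal(size=4) for _ in range(N_players)]

def bsde_loss_grad(theta, opponents):
    # PLACEHOLDER for the gradient of player i's deep BSDE loss given
    # opponents' stage-(m-1) strategies: pull theta toward the opponents'
    # average. A stand-in coupling, NOT the real loss.
    target = np.mean(opponents, axis=0)
    return theta - target

policies = []
for m in range(M_stages):                        # Step 2: fictitious play stages
    prev = [p.copy() for p in params]            # freeze stage-(m-1) strategies
    for i in range(N_players):                   # run in parallel in practice
        opponents = [prev[j] for j in range(N_players) if j != i]
        for _ in range(N_SGD_per_stage):         # moderate SGD budget per stage
            params[i] -= lr * bsde_loss_grad(params[i], opponents)
    policies.append([p.copy() for p in params])  # collect alpha^m

alpha = policies[-1]                             # Step 3: final strategy profile
```

Because the networks are never re-initialized, each stage starts from the previous best response, which is why a small per-stage SGD budget suffices.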

Key Properties

  • Handles heterogeneous agents (different cost functions, risk preferences)
  • Accommodates risk-sensitive objectives (exponential utility)
  • Scales to N = 50 players with common noise
  • Convergence rate: geometric, with stage-m error O((a_alpha + epsilon)^m) for some a_alpha < 1 and small epsilon > 0
  • Strategy forms an epsilon-Nash equilibrium (Corollary 1 of Han-Hu-Long 2021)
  • Typical hyperparameters: N_sample=256, N_batch=64, N_SGD_per_stage=100, lr=0.01, M=20-40 stages, N_T=20 time steps
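
The geometric rate is easy to see on a toy analogue. The game below (N players with scalar actions and quadratic costs J^i(a) = (a_i - c * mean_{j != i} a_j - b_i)^2; entirely my own illustration, not from the paper) has an analytic best response, so each fictitious play stage is an affine contraction with factor roughly c:

```python
import numpy as np

N, c = 5, 0.5                         # c < 1 plays the role of a_alpha
rng = np.random.default_rng(2)
b = rng.normal(size=N)
a = np.zeros(N)

# Exact Nash equilibrium: fixed point of a = B a + b with
# B = c * (ones - I) / (N - 1).
a_star = np.linalg.solve(
    np.eye(N) - c * (np.ones((N, N)) - np.eye(N)) / (N - 1), b)

errors = []
for m in range(25):
    prev = a.copy()
    # Decoupled best responses against the frozen stage-(m-1) profile.
    a = np.array([c * (prev.sum() - prev[i]) / (N - 1) + b[i]
                  for i in range(N)])
    errors.append(np.linalg.norm(a - a_star))

# Successive error ratios settle near a constant below 1: geometric decay.
ratios = [errors[k + 1] / errors[k] for k in range(5, 15)]
```

The contraction factor here is the spectral radius of B (namely c = 0.5), so the error shrinks by about that factor per stage, mirroring the (a_alpha + epsilon)^m bound.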
