Summary

This paper provides the theoretical foundation for deep fictitious play (DFP). The algorithm has a two-step structure: Step I decouples the game into N single-player sub-problems; Step II solves each sub-problem with the deep BSDE method. Two decoupling options are provided: (I) Fictitious Play — each player best-responds to the opponents' previous-stage strategies (eq. 12, a nonlinear PDE per player), and (II) Policy Update — evaluate the value function under all players' previous-stage strategies (eq. 13, a linear PDE per player), then extract the new policy as if that evaluated value function were the optimal one. Both options yield BSDE systems (eqs. 15, 17) solved via the deep BSDE variational formulation.
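The stage structure above can be sketched as a simple alternation between decoupling and per-player solves. This is an illustrative skeleton, not the paper's code; `solve_deep_bsde` is a hypothetical stand-in for the Step-II subroutine, and either decoupling option can sit behind it.

```python
# Hedged sketch of the two-step DFP stage loop; all names are illustrative.

def deep_fictitious_play(n_players, n_stages, solve_deep_bsde, init_strategies):
    """Alternate decoupling (Step I) and per-player BSDE solves (Step II)."""
    strategies = list(init_strategies)
    for stage in range(n_stages):
        new_strategies = []
        for i in range(n_players):
            # Step I: freeze opponents at their previous-stage strategies,
            # yielding a decoupled sub-problem for player i.
            frozen = [s for j, s in enumerate(strategies) if j != i]
            # Step II: solve player i's sub-problem (deep BSDE in the paper).
            new_strategies.append(solve_deep_bsde(i, frozen))
        strategies = new_strategies  # simultaneous update across players
    return strategies
```

The simultaneous update (all players refreshed from the same previous stage) is what makes the N sub-problems independent within a stage, so they can be solved in parallel.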

For implementation, the key difference from the original DFP paper is that the forward process X_t in the BSDE is chosen to be driftless (eq. 9), so the same forward paths can be reused across all N players and across stages — a crucial efficiency gain. The BSDE loss function (eq. 21) minimizes E|g(X_T) - Y_T|^2, where Y and Z are parameterized by neural networks (psi_0 for Y_0, phi_k for Z at each time step); at each stage there are N such losses, one per player. After each stage, a Lipschitz projection (eq. 36) is applied to keep the learned strategies regular — this is needed for the convergence proof and also helps numerically (implemented via weight clipping or spectral normalization).
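A minimal sketch of the driftless simulation and the terminal-matching loss, under assumed simplifications: Sigma is a constant matrix, psi_0 is a trainable scalar, and phi is a plain callable standing in for the per-step Z-networks. The function names are invented for illustration.

```python
import numpy as np

def simulate_forward(x0, Sigma, n_steps, n_paths, dt, rng):
    """Driftless Euler paths X_{k+1} = X_k + Sigma dW_k, reusable for all N
    players and across stages because no player's control enters the drift."""
    d = len(x0)
    dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps, d))
    X = np.empty((n_paths, n_steps + 1, d))
    X[:, 0] = x0
    for k in range(n_steps):
        X[:, k + 1] = X[:, k] + dW[:, k] @ Sigma.T
    return X, dW

def bsde_loss(X, dW, g, hamiltonian, psi0, phi, dt):
    """Roll Y forward from psi0 = Y_0 and penalize the terminal mismatch
    E|g(X_T) - Y_T|^2 (eq. 21 in the summary's numbering)."""
    n_paths, n_steps = dW.shape[:2]
    Y = np.full(n_paths, psi0)               # Y_0, a trainable parameter
    for k in range(n_steps):
        Z = phi(k, X[:, k])                  # (n_paths, d), network output
        # sign convention dY = -hat{H} dt + Z^T dW, so Y steps down by H dt
        Y = Y - hamiltonian(k, X[:, k], Z) * dt + np.einsum('pd,pd->p', Z, dW[:, k])
    return np.mean((g(X[:, -1]) - Y) ** 2)
```

In practice (Y_0, Z) would be autodiff parameters updated by SGD on this loss, one loss per player per stage, all sharing the same (X, dW) arrays.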

The convergence results: Theorem 2 shows the decoupling step converges at rate (a_alpha + epsilon)^m, where a_alpha in (0,1) is a contraction constant related to the Lipschitz constant of the best-response map and m is the number of stages. Theorem 3 combines the decoupling convergence with the deep BSDE approximation error. Corollary 1: for sufficiently many stages m and a sufficiently fine time mesh, the DFP strategy forms an epsilon-Nash equilibrium. Theorems 4 and 5: the game values and state paths produced by DFP are close to those of the true Nash equilibrium.
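Schematically, the Theorem 2 rate can be written as follows (notation follows the summary above; C denotes a generic constant, and the norm on strategies is left abstract — both are assumptions, not the paper's exact statement):

```latex
\left\| \alpha^{(m)} - \alpha^{*} \right\|
  \;\le\; C \,\bigl(a_\alpha + \epsilon\bigr)^{m},
\qquad a_\alpha \in (0,1),
```

where \(\alpha^{(m)}\) is the stage-\(m\) strategy profile and \(\alpha^{*}\) the true Nash equilibrium, so the error decays geometrically in m whenever \(a_\alpha + \epsilon < 1\).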

Key Contributions

  • First convergence proof for deep fictitious play to true Nash equilibrium
  • Two decoupling options: fictitious play (nonlinear sub-problems) and policy update (linear sub-problems)
  • Epsilon-Nash equilibrium guarantee (Corollary 1)
  • Convergence rate: geometric in the number of stages m, i.e. (a_alpha + epsilon)^m
  • Practical insight: driftless forward process enables path reuse across players

Methodology

The BSDE system (eq. 9) uses a driftless forward process X_t = x_0 + integral_0^t Sigma dW_s with backward component Y_t = g(X_T) + integral_t^T hat{H} ds - integral_t^T Z^T dW_s. The identification Y_t^i = V^i(t, X_t) and Z_t^i = Sigma^T grad_x V^i(t, X_t) recovers the value function and hence the optimal control. The deep BSDE method discretizes this system in time (eqs. 22-23) and parameterizes (Y_0, Z_k) with neural networks trained via SGD.
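A self-contained sanity check of the identification Y_t = V(t, X_t), Z_t = Sigma^T grad_x V, in an assumed toy case (not from the paper): d = 1, Sigma = 1, hat{H} = 0, g(x) = x^2, so V(t, x) = x^2 + (T - t) and Z_t = 2 X_t. Rolling Y forward from Y_0 = V(0, x_0) with this exact Z should reproduce g(X_T) up to the O(dt) Euler discretization error.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 200, 5000
dt = T / n_steps
dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))

# Driftless forward paths X_t = x0 + W_t (Sigma = 1).
x0 = 0.5
X = x0 + np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Backward rollout with the exact value function and gradient plugged in.
Y = np.full(n_paths, x0**2 + T)          # Y_0 = V(0, x0) = x0^2 + T
for k in range(n_steps):
    Z = 2.0 * X[:, k]                    # Z_k = Sigma^T grad_x V(t_k, X_k)
    Y = Y + Z * dW[:, k]                 # dY = -H dt + Z dW, with H = 0

loss = np.mean((X[:, -1] ** 2 - Y) ** 2)  # terminal mismatch, should be O(dt)
```

With inexact networks in place of (Y_0, Z), this terminal mismatch is exactly the quantity the deep BSDE loss drives to its discretization floor.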

Key Findings

  • Convergence requires small time duration T or weak coupling (Assumption 3)
  • Empirical convergence observed beyond the technical assumptions
  • Policy update (option II) solves linear PDEs — potentially faster per stage
  • Projection operator maintains Lipschitz regularity of learned strategies
  • Both decoupling options converge at the same rate
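The projection step in the findings above can be realized by weight clipping: for networks with 1-Lipschitz activations, the Lipschitz constant is bounded by the product of the layers' operator norms, so bounding each weight matrix bounds the whole map. This is a hedged sketch under assumed layer structure (a list of `(W, b)` pairs), not the paper's eq. 36 verbatim.

```python
import numpy as np

def clip_weights(layers, radius):
    """Project each weight matrix entrywise onto [-radius, radius]
    (weight clipping); biases do not affect the Lipschitz constant."""
    return [(np.clip(W, -radius, radius), b) for W, b in layers]

def lipschitz_bound(layers):
    """Crude upper bound on the network's Lipschitz constant for 1-Lipschitz
    activations: the product of the layers' spectral norms."""
    bound = 1.0
    for W, _ in layers:
        bound *= np.linalg.norm(W, 2)
    return bound
```

Spectral normalization achieves the same end by rescaling each W by its spectral norm instead of clipping entries; either way, the projection keeps the learned strategies inside the Lipschitz class the convergence proof requires.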

Important References

  1. Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games — the original algorithm this paper analyzes
  2. Solving High-Dimensional Partial Differential Equations Using Deep Learning — the deep BSDE subroutine
  3. Mean Field Games and Systemic Risk — Carmona, Fouque, Sun (2015), benchmark financial game
