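The Wasserstein distance of order $p \in [1, \infty)$ between probability measures $\mu, \nu$ on a Polish metric space $(\mathcal{X}, d)$ is defined as the $p$-th root of the optimal transport cost with power cost $d^p$, where $\Pi(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$:

$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^p \, d\pi(x, y) \right)^{1/p}$$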

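The Wasserstein space $P_p(\mathcal{X})$ consists of all probability measures with finite $p$-th moment, equipped with $W_p$. When $\mathcal{X}$ is Polish, $P_p(\mathcal{X})$ is itself Polish (complete, separable). The embedding $x \mapsto \delta_x$ is an isometry: $W_p(\delta_x, \delta_y) = d(x, y)$.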
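
The definition and the Dirac isometry can be checked numerically in the simplest setting. The sketch below is illustrative only and assumes something not stated in the note: on the real line, for two uniform empirical measures with the same number of atoms, the monotone (sorted) coupling is optimal for the cost $|x - y|^p$ with $p \ge 1$. The function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p between two uniform empirical measures on the real line.

    Assumes both measures have the same number of equally weighted atoms,
    so that matching sorted atoms gives an optimal coupling (p >= 1).
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("the order p must satisfy p >= 1")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Transport cost of the sorted coupling: average of |x - y|^p.
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)


# Dirac masses: W_p(delta_x, delta_y) should equal |x - y|.
dirac_distance = wasserstein_1d([0.5], [3.0], p=2)

# Two three-point measures, compared at orders 1 and 2.
mu = [0.0, 1.0, 4.0]
nu = [2.0, 2.0, 5.0]
w1 = wasserstein_1d(mu, nu, p=1)
w2 = wasserstein_1d(mu, nu, p=2)
```

Here the sorted atoms differ by 2, 1 and 1, so `w1` is 4/3 and `w2` is the square root of 2.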

$W_p$ metrises weak convergence in $P_p(\mathcal{X})$: $W_p(\mu_k, \mu) \to 0$ iff $\mu_k \to \mu$ weakly and the $p$-th moments converge. The two most important cases are $p = 1$ (Kantorovich-Rubinstein distance, dual to Lipschitz functions, most flexible) and $p = 2$ (reflects Riemannian geometry, scales well with dimension, natural for displacement interpolation).
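
The moment condition is not redundant. A standard illustration (not taken from the note) is $\mu_k = (1 - 1/k)\,\delta_0 + (1/k)\,\delta_k$ on the real line: it converges weakly to $\delta_0$, yet its first moment stays equal to 1. The sketch below uses the fact that the only coupling with a Dirac mass is the product measure, so $W_p(\mu, \delta_a)^p = \int |x - a|^p \, d\mu(x)$. The helper names are hypothetical.

```python
import math


def wp_to_dirac(atoms, weights, a, p):
    """W_p between sum_i weights[i] * delta_{atoms[i]} and delta_a on the real line."""
    moment = sum(w * abs(x - a) ** p for x, w in zip(atoms, weights))
    return moment ** (1.0 / p)


def escaping_mass(k):
    """Atoms and weights of mu_k = (1 - 1/k) delta_0 + (1/k) delta_k."""
    return [0.0, float(k)], [1.0 - 1.0 / k, 1.0 / k]


rows = []
for k in (10, 100, 1000):
    atoms, weights = escaping_mass(k)
    # Integral of the bounded continuous function atan against mu_k;
    # weak convergence to delta_0 forces this to tend to atan(0) = 0.
    bounded_integral = sum(w * math.atan(x) for x, w in zip(atoms, weights))
    # The W_1 distance to delta_0 is the first moment, which does not decay.
    w1 = wp_to_dirac(atoms, weights, 0.0, 1)
    rows.append((k, bounded_integral, w1))
```

In each row the bounded integral shrinks with $k$ while the $W_1$ distance stays at 1, so $\mu_k$ does not converge to $\delta_0$ in $W_1$.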

Key Details

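  • Triangle inequality: proved via the Gluing Lemma + Minkowski inequality in $L^p$
  • $W_p \le W_q$ for $p \le q$ (Hölder’s inequality)
  • $W_1$ dual: $W_1(\mu, \nu) = \sup_{\|\psi\|_{\mathrm{Lip}} \le 1} \left\{ \int \psi \, d\mu - \int \psi \, d\nu \right\}$
  • $P_p(\mathcal{X})$ compact iff $\mathcal{X}$ compact; but NOT locally compact if $\mathcal{X}$ is only locally compact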
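  • Controlled by weighted total variation: $W_p(\mu, \nu) \le 2^{1/p'} \left( \int d(x_0, x)^p \, d|\mu - \nu|(x) \right)^{1/p}$ for any fixed $x_0 \in \mathcal{X}$, where $1/p + 1/p' = 1$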
  • For $p = 2$: displacement interpolation is a constant-speed geodesic in $P_2(\mathcal{X})$
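
The constant-speed property of displacement interpolation can be sketched on the real line. This is a hedged illustration, not the general construction: it assumes uniform empirical measures with equally many atoms, for which the sorted matching is optimal and the interpolant $\mu_t$ moves each atom along a straight line, $x_i \mapsto (1 - t)\,x_i + t\,y_i$. The function names are hypothetical.

```python
def w2_sorted(xs, ys):
    """W_2 between uniform empirical measures on the real line (equal sizes assumed)."""
    if not xs or len(xs) != len(ys):
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return (sum((x - y) ** 2 for x, y in pairs) / len(xs)) ** 0.5


def displacement_interpolation(xs, ys, t):
    """Atoms of mu_t: each sorted atom of mu_0 moves linearly to its match in mu_1."""
    return [(1.0 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


mu0 = [0.0, 1.0, 4.0]
mu1 = [2.0, 2.0, 5.0]
total = w2_sorted(mu0, mu1)

s, t = 0.25, 0.75
mu_s = displacement_interpolation(mu0, mu1, s)
mu_t = displacement_interpolation(mu0, mu1, t)

# Constant speed: W_2(mu_s, mu_t) should equal |t - s| * W_2(mu_0, mu_1).
gap = w2_sorted(mu_s, mu_t)
expected_gap = abs(t - s) * total
```

In one dimension the interpolated atoms stay sorted, which is why the same sorted matching remains optimal between $\mu_s$ and $\mu_t$.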

Textbook References

Optimal Transport: Old and New (Villani, 2009)

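  • Definition 6.1 (p. 105): $W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \, d\pi(x, y) \right)^{1/p}$
  • Definition 6.4 (pp. 106-107): $P_p(\mathcal{X}) = \left\{ \mu \in P(\mathcal{X}) : \int d(x_0, x)^p \, \mu(dx) < \infty \right\}$, independent of the choice of base point $x_0$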
  • Remark 6.5 (p. 107): Kantorovich-Rubinstein duality for $W_1$
  • Theorem 6.9 (p. 108): $W_p$ metrises weak convergence in $P_p(\mathcal{X})$
  • Theorem 6.18 (pp. 116-117): $P_p(\mathcal{X})$ is Polish. Proof: separability via finite-support approximation; completeness via Prokhorov + tightness of Cauchy sequences (Lemma 6.14)
  • Five reasons $W_p$ is preferred (pp. 110-111): handles large distances, natural for OT, rich duality, easy upper bounds via any coupling, encodes geometry
