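The Wasserstein distance of order $p \in [1, \infty)$ between probability measures $\mu$ and $\nu$ on a Polish metric space $(X, d)$ is defined as the $p$-th root of the optimal transport cost with power cost $d^p$:

$$W_p(\mu, \nu) = \left( \inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times X} d(x, y)^p \, d\pi(x, y) \right)^{1/p}$$

Here $\Pi(\mu, \nu)$ denotes the set of couplings of $\mu$ and $\nu$, i.e. probability measures on $X \times X$ with marginals $\mu$ and $\nu$.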
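The Wasserstein space $P_p(X)$ consists of all probability measures $\mu$ with finite $p$-th moment, $\int_X d(x_0, x)^p \, d\mu(x) < \infty$ for some (equivalently, any) $x_0 \in X$, equipped with $W_p$. When $X$ is Polish, $P_p(X)$ is itself Polish (complete, separable). The embedding $x \mapsto \delta_x$ of $X$ into $P_p(X)$ is an isometry: $W_p(\delta_x, \delta_y) = d(x, y)$.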
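The definition can be made concrete on the real line. The following is a minimal sketch, not a general solver: it assumes two samples of equal size with uniform weights, in which case the monotone (sorted) coupling is optimal for the cost $|x - y|^p$ with $p \ge 1$. The function name `wasserstein_1d` is hypothetical and chosen only for illustration.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """W_p between the empirical measures of two equal-size samples on R.

    Assumption: both samples have the same length n and each point carries
    mass 1/n. In one dimension the sorted (monotone) coupling is then
    optimal for the cost |x - y|^p with p >= 1.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must be >= 1")
    n = len(xs)
    # Transport cost of the sorted coupling: average of |x_(i) - y_(i)|^p.
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


# Dirac masses: W_p(delta_x, delta_y) = d(x, y), whatever the order p.
print(wasserstein_1d([0.0], [3.0], p=2))            # 3.0
# Two-point measures: mass moves 0 -> 1 and 2 -> 5, average distance 2.
print(wasserstein_1d([0.0, 2.0], [5.0, 1.0], p=1))  # 2.0
```

The first call illustrates the isometric embedding of $X$ into the Wasserstein space through Dirac masses.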
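$W_p$ metrises weak convergence in $P_p(X)$: $W_p(\mu_k, \mu) \to 0$ iff $\mu_k \to \mu$ weakly and the $p$-th moments converge, $\int d(x_0, x)^p \, d\mu_k(x) \to \int d(x_0, x)^p \, d\mu(x)$. The two most important cases are $p = 1$ (Kantorovich-Rubinstein distance, dual to Lipschitz functions, most flexible) and $p = 2$ (reflects Riemannian geometry, scales well with dimension, natural for displacement interpolation).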
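A standard worked example, added here as an illustration and not taken from the references below, shows why convergence of moments cannot be dropped. Take $X = \mathbb{R}$ with the usual distance. The only coupling of any measure with a Dirac mass is the product measure, so the distance to $\delta_0$ is just a moment:

```latex
\mu_k = \Bigl(1 - \tfrac{1}{k}\Bigr)\,\delta_0 + \tfrac{1}{k}\,\delta_k ,
\qquad
\int \varphi \, d\mu_k = \Bigl(1 - \tfrac{1}{k}\Bigr)\varphi(0) + \tfrac{1}{k}\,\varphi(k)
\;\longrightarrow\; \varphi(0)
\quad \text{for bounded continuous } \varphi ,

W_p(\mu_k, \delta_0)^p = \int |x|^p \, d\mu_k(x) = \tfrac{1}{k}\, k^p = k^{p-1} \;\ge\; 1 .
```

So $\mu_k \to \delta_0$ weakly, yet the distance to $\delta_0$ does not tend to zero for any $p \ge 1$, because the $p$-th moment $k^{p-1}$ fails to converge to the $p$-th moment of $\delta_0$, which is $0$.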
Key Details
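- Triangle inequality: proved via the Gluing Lemma + Minkowski inequality in $L^p$
- $W_p \le W_q$ for $p \le q$ (Hölder’s inequality)
- $W_1$ dual: $W_1(\mu, \nu) = \sup_{\|\psi\|_{\mathrm{Lip}} \le 1} \left( \int_X \psi \, d\mu - \int_X \psi \, d\nu \right)$
- $P_p(X)$ compact iff $X$ compact; but $P_p(X)$ is NOT locally compact if $X$ is only locally compact
- Controlled by weighted total variation: $W_p(\mu, \nu) \le 2^{1/p'} \left( \int_X d(x_0, x)^p \, d|\mu - \nu|(x) \right)^{1/p}$, where $1/p + 1/p' = 1$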
- For $p = 2$: displacement interpolation is a constant-speed geodesic in $P_2(X)$
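Two of the properties above can be checked numerically in a simple setting. The sketch below assumes empirical measures on the real line with equal sample sizes and uniform weights, where the optimal map is the monotone rearrangement. Both helper names are hypothetical. It checks the constant-speed identity $W_2(\mu_s, \mu_t) = |t - s| \, W_2(\mu_0, \mu_1)$ along the displacement interpolation, and the ordering $W_1 \le W_2$.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """W_p between equal-size, uniformly weighted samples on R (sorted coupling)."""
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


def displacement_interpolation_1d(xs, ys, t):
    """Support points of mu_t = ((1 - t) Id + t T)_# mu_0.

    T is the monotone map sending the i-th smallest point of xs to the
    i-th smallest point of ys (an assumption valid in this 1D setting).
    """
    return [(1.0 - t) * a + t * b for a, b in zip(sorted(xs), sorted(ys))]


mu0 = [0.0, 1.0, 4.0]
mu1 = [2.0, 7.0, 3.0]
total = wasserstein_1d(mu0, mu1, p=2)

for s, t in [(0.0, 0.5), (0.25, 0.75), (0.1, 1.0)]:
    mu_s = displacement_interpolation_1d(mu0, mu1, s)
    mu_t = displacement_interpolation_1d(mu0, mu1, t)
    gap = wasserstein_1d(mu_s, mu_t, p=2)
    # Constant speed: W_2(mu_s, mu_t) = |t - s| * W_2(mu_0, mu_1).
    assert abs(gap - abs(t - s) * total) < 1e-9

# Monotonicity in the order: W_1 <= W_2.
assert wasserstein_1d(mu0, mu1, p=1) <= total + 1e-12
```

This is only a sanity check on one example, not a proof; in higher dimensions the optimal map is no longer given by sorting and a genuine optimal transport solver would be needed.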
Textbook References
Optimal Transport: Old and New (Villani, 2009)
- Definition 6.1 (p. 105): Wasserstein distance $W_p(\mu, \nu)$
- Definition 6.4 (pp. 106-107): Wasserstein space $P_p(X)$
- Remark 6.5 (p. 107): Kantorovich-Rubinstein duality for $W_1$
- Theorem 6.9 (p. 108): $W_p$ metrises weak convergence in $P_p(X)$
- Theorem 6.18 (pp. 116-117): $P_p(X)$ is Polish. Proof: separability via finite-support approximation; completeness via Prokhorov + Cauchy tightness (Lemma 6.14)
- Five reasons $W_p$ is preferred (pp. 110-111): handles large distances, natural for OT, rich duality, easy upper bounds via any coupling, encodes geometry