The Wasserstein distance is a measure of the distance between two probability distributions over a given metric space; its first-order form is also known as the Earth Mover’s distance (EMD). It originates from optimal transport, a branch of mathematics concerned with finding the most efficient way to move mass from one distribution to another. Because it takes the underlying geometry of the space into account, it can meaningfully compare distributions that place their mass in different locations, including distributions whose supports do not overlap.

Definition

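Given two probability distributions (P) and (Q) over a metric space (M), with a distance metric (d) defined on (M), the Wasserstein distance of order (p) (with (p \geq 1)) between (P) and (Q) is defined as:

\[ W_p(P, Q) = \left( \inf_{\gamma \in \Gamma(P, Q)} \int_{M \times M} d(x, y)^p \, \mathrm{d}\gamma(x, y) \right)^{1/p} \]

where: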

  • (\Gamma(P, Q)) is the set of all joint distributions (\gamma) on (M \times M) with marginals (P) and (Q).
  • (d(x, y)) is the distance between points (x) and (y) in (M).
  • The infimum (inf) is taken over all possible transport plans (\gamma) that describe how mass is moved from (P) to (Q).
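
For distributions supported on finitely many points, the infimum over transport plans becomes a finite linear program: the plan is a nonnegative matrix whose row sums and column sums are the two marginals. The following is a minimal sketch of that special case, not a reference implementation. The function name `wasserstein_lp` and the three-point example are assumptions made for illustration, and SciPy's general-purpose `linprog` solver stands in for a dedicated optimal transport solver.

```python
import numpy as np
from scipy.optimize import linprog


def wasserstein_lp(a, b, cost, p=1):
    """Return (W_p, plan) for discrete distributions a and b.

    cost[i, j] holds d(x_i, y_j); a and b are nonnegative and each sums to 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size

    # Objective: sum_ij plan[i, j] * d(x_i, y_j)^p, with the plan flattened row-major.
    c = (np.asarray(cost, dtype=float) ** p).ravel()

    # Marginal constraints: row sums of the plan equal a, column sums equal b.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))
    A_eq = np.vstack([row_sums, col_sums])
    b_eq = np.concatenate([a, b])

    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Guard against a tiny negative objective caused by round-off.
    return max(res.fun, 0.0) ** (1.0 / p), res.x.reshape(n, m)


# Illustrative example: both distributions live on the points 0, 1, 2.
points = np.array([0.0, 1.0, 2.0])
cost = np.abs(points[:, None] - points[None, :])  # d(x, y) = |x - y|
a = [0.6, 0.4, 0.0]
b = [0.0, 0.4, 0.6]

w1, plan1 = wasserstein_lp(a, b, cost, p=1)
w2, plan2 = wasserstein_lp(a, b, cost, p=2)
print(w1, w2)
```

The number of variables is the product of the two support sizes, so this direct formulation is only practical for small problems.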

The most commonly used Wasserstein distances are (W_1) (the first-order Wasserstein distance) and (W_2) (the second-order Wasserstein distance). (W_1) is often preferred in practice because it is comparatively cheap to compute in common cases (in one dimension it reduces to the area between the two cumulative distribution functions) and because it can be read directly as the minimum work required to transform (P) into (Q).
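
On the real line with (d(x, y) = |x - y|), an optimal plan matches mass in sorted order, so for two samples of equal size both (W_1) and (W_2) reduce to a comparison of sorted values. The sketch below assumes equally weighted samples of the same length; the function name `wasserstein_1d` is hypothetical and chosen only for this example.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """W_p between two equally weighted samples of the same size on the real line."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # Pair the i-th smallest value of x with the i-th smallest value of y.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))


shifted = wasserstein_1d([0, 1, 3], [5, 6, 8], p=1)  # every point moves by 5
w1 = wasserstein_1d([0, 0], [0, 2], p=1)
w2 = wasserstein_1d([0, 0], [0, 2], p=2)
print(shifted, w1, w2)
```

In the second pair only half of the mass moves, and (W_2) comes out larger than (W_1) because squaring penalizes the single long move more heavily. For weighted or unequal-size samples in one dimension, SciPy provides `scipy.stats.wasserstein_distance`, which computes (W_1).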

Interpretation

The Wasserstein distance can be interpreted as the minimum cost of transforming one probability distribution into another, where moving a unit of mass costs the distance it travels (raised to the power (p) for (W_p)). This makes it especially suited to applications where the spatial arrangement of the distributions matters.
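
As a small worked illustration (the two-point distributions are made up for this example), let (P) put mass 1/2 on each of the points 0 and 3, let (Q) put mass 1/2 on each of the points 1 and 4, and take (d(x, y) = |x - y|) with (p = 1). Two ways of moving the mass cost:

\[
\begin{aligned}
\text{plan A } (0 \to 1,\ 3 \to 4): &\quad \tfrac{1}{2}\,|0 - 1| + \tfrac{1}{2}\,|3 - 4| = 1, \\
\text{plan B } (0 \to 4,\ 3 \to 1): &\quad \tfrac{1}{2}\,|0 - 4| + \tfrac{1}{2}\,|3 - 1| = 3.
\end{aligned}
\]

Any plan that splits mass between these two options costs something between 1 and 3, so the minimum is attained by plan A and (W_1(P, Q) = 1).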

Applications

  • Image Processing and Computer Vision: Wasserstein distance is used in comparing images represented as distributions of features or pixel intensities.

  • Machine Learning: It is employed in generative adversarial networks (GANs), notably the Wasserstein GAN, to measure the distance between the distribution of generated data and that of real data, which can make training more stable and improve the quality of generated samples.

  • Data Science: Wasserstein distance finds applications in clustering, anomaly detection, and domain adaptation, where measuring how similar or different two distributions are is crucial.

  • Geosciences and Environmental Science: Used in analyzing spatial distributions of phenomena such as rainfall or pollution levels.
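
The GAN use listed above usually does not solve the transport problem directly. A common route, given here as background rather than as a description of any particular implementation, is the Kantorovich-Rubinstein dual form of (W_1), in which the supremum runs over all 1-Lipschitz functions (f) on (M):

\[ W_1(P, Q) = \sup_{\|f\|_{\mathrm{Lip}} \le 1} \left( \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \right) \]

In a Wasserstein GAN, a neural network (the critic) plays the role of (f), and the Lipschitz constraint is enforced only approximately, so the quantity being optimized is an estimate of (W_1) rather than its exact value.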

Challenges

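  • Computational Complexity: Computing the Wasserstein distance exactly between two discrete distributions means solving a linear program whose size grows with the product of the two support sizes, which becomes expensive for large or high-dimensional problems. Approximations such as entropic regularization (solved with the Sinkhorn algorithm) and sliced Wasserstein distances reduce this cost.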

  • Choice of Metric: The choice of the distance metric (d) in the underlying space (M) can significantly impact the Wasserstein distance, requiring careful consideration based on the application.
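
One widely used way to reduce the computational cost is entropic regularization, solved with Sinkhorn iterations. The sketch below is a plain NumPy version under stated assumptions: the function name `sinkhorn_cost`, the regularization strength `reg`, the fixed iteration count, and the two-point example are all illustrative choices. The result approximates the optimal transport cost and is not the exact Wasserstein distance.

```python
import numpy as np


def sinkhorn_cost(a, b, cost, reg=0.5, n_iter=500):
    """Approximate transport cost between discrete distributions a and b.

    Uses entropic regularization with strength `reg` and a fixed number of
    Sinkhorn iterations; smaller `reg` is closer to the exact cost but can be
    slower to converge and numerically less stable.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)

    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)  # rescale to match the column marginal b
        u = a / (K @ v)    # rescale to match the row marginal a
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan


# Illustrative example: mass 1/2 at 0 and 3 moved to mass 1/2 at 1 and 4.
x = np.array([0.0, 3.0])
y = np.array([1.0, 4.0])
cost = np.abs(x[:, None] - y[None, :])  # d(x, y) = |x - y|
approx, plan = sinkhorn_cost([0.5, 0.5], [0.5, 0.5], cost)
print(approx)
```

For this example the exact (W_1) is 1, and the regularized plan spreads a little mass onto the longer routes, so the reported cost is slightly above 1. Changing `cost` to a different metric (d) changes the result, which is the sensitivity described under Choice of Metric.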

The Wasserstein distance offers a nuanced way to compare distributions, capturing both the amount of mass that needs to be moved and the distance it needs to be moved over, making it a valuable tool in various analytical and machine learning applications.