---
tags:
  - colorclass/probability theory
---

The Evidence Lower Bound (ELBO) is a central concept in variational inference, a method used in Bayesian statistics and machine learning to approximate complex posterior distributions. The ELBO emerges as a key tool for making Bayesian inference computationally tractable in models where exact computation of the posterior is infeasible due to the high-dimensional integration involved.
## The Challenge of Computing the Evidence
In Bayesian inference, we often want to compute the posterior distribution of parameters $\theta$ given data $x$, which is given by Bayes' theorem:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$
Here, $p(x)$ is the evidence, also known as the marginal likelihood of the data, which integrates over all possible parameter values:

$$p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$$
Computing $p(x)$ directly can be extremely challenging, especially for complex models, because the integral runs over the full, often high-dimensional, parameter space.
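As a concrete illustration of why the evidence is the bottleneck, the sketch below (a hypothetical one-dimensional model with a Gaussian prior and likelihood, chosen only because its evidence also has a closed form) computes the integral by numerical quadrature. The same grid-based strategy needs exponentially many evaluations as the dimension of $\theta$ grows, which is exactly where exact inference breaks down:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

x_obs = 1.3  # a single observed data point (illustrative value)

def integrand(theta):
    # p(x | theta) * p(theta): likelihood N(theta, 1) times prior N(0, 1)
    return norm.pdf(x_obs, loc=theta, scale=1.0) * norm.pdf(theta, loc=0.0, scale=1.0)

# Evidence p(x) by one-dimensional quadrature
evidence, _ = integrate.quad(integrand, -np.inf, np.inf)

# For this conjugate model the evidence is available exactly: x ~ N(0, 2)
closed_form = norm.pdf(x_obs, loc=0.0, scale=np.sqrt(2.0))
print(evidence, closed_form)  # agree to numerical precision
```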
## Introduction to the ELBO
Variational inference offers a solution by introducing a simpler distribution $q(\theta; \phi)$, parameterized by variational parameters $\phi$, to approximate the true posterior $p(\theta \mid x)$. The goal is to choose $\phi$ such that $q(\theta; \phi)$ is as close as possible to $p(\theta \mid x)$. The closeness between these two distributions is often measured using the Kullback-Leibler (KL) divergence:

$$\mathrm{KL}\bigl(q(\theta; \phi) \,\|\, p(\theta \mid x)\bigr) = \mathbb{E}_{q(\theta; \phi)}\!\left[\log \frac{q(\theta; \phi)}{p(\theta \mid x)}\right]$$
Directly minimizing this KL divergence is difficult since it involves the intractable evidence $p(x)$. However, it can be shown that minimizing the KL divergence is equivalent to maximizing the Evidence Lower Bound (ELBO), which is defined as:

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(\theta; \phi)}\bigl[\log p(x \mid \theta)\bigr] - \mathrm{KL}\bigl(q(\theta; \phi) \,\|\, p(\theta)\bigr)$$
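To see why, expand $\log p(x)$ using only the definitions above; the derivation takes two lines:

$$\begin{aligned}
\log p(x) &= \mathbb{E}_{q(\theta;\phi)}\!\left[\log \frac{p(x, \theta)}{q(\theta;\phi)}\right] + \mathbb{E}_{q(\theta;\phi)}\!\left[\log \frac{q(\theta;\phi)}{p(\theta \mid x)}\right] \\
&= \mathrm{ELBO}(q) + \mathrm{KL}\bigl(q(\theta;\phi) \,\|\, p(\theta \mid x)\bigr) \;\geq\; \mathrm{ELBO}(q).
\end{aligned}$$

Since the KL term is nonnegative and $\log p(x)$ does not depend on $\phi$, maximizing the ELBO over $\phi$ minimizes the KL divergence to the true posterior, and the ELBO is indeed a lower bound on the log evidence.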
This formulation decomposes the objective into two parts: the expected log-likelihood of the data given the parameters, which encourages the model to explain the observed data well, and the KL divergence between the approximate posterior and the prior, which encourages the approximate posterior to stay close to the prior.
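The sketch below makes the decomposition concrete for the same toy model as above (a hypothetical one-dimensional conjugate Gaussian model, so every quantity can be checked against a closed form; the names `elbo` and `x_obs` are local choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.3  # single observed data point; prior N(0, 1), likelihood N(theta, 1)

def elbo(m, s, n_samples=100_000):
    """Monte Carlo ELBO for a Gaussian q(theta) = N(m, s^2)."""
    theta = rng.normal(m, s, size=n_samples)  # samples from q
    expected_loglik = norm.logpdf(x_obs, loc=theta, scale=1.0).mean()
    # KL(N(m, s^2) || N(0, 1)) has a closed form for Gaussians
    kl_to_prior = np.log(1.0 / s) + (s**2 + m**2) / 2.0 - 0.5
    return expected_loglik - kl_to_prior

# Log evidence: marginally x ~ N(0, 2) for this model
log_evidence = norm.logpdf(x_obs, loc=0.0, scale=np.sqrt(2.0))

# The exact posterior is N(x/2, 1/2), where the bound is tight;
# any other choice of (m, s) yields a strictly smaller ELBO.
print(elbo(x_obs / 2, np.sqrt(0.5)), log_evidence)  # approximately equal
print(elbo(0.0, 1.0), "<", log_evidence)
```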
## Implications and Uses of the ELBO
- Variational Inference: Maximizing the ELBO with respect to the variational parameters $\phi$ allows for efficient approximate Bayesian inference, enabling the application of Bayesian methods to complex models like deep neural networks.
- Model Comparison and Selection: The ELBO can also serve as a criterion for model comparison and selection, as it provides a lower bound on the log evidence $\log p(x)$. Models with higher ELBO values are considered to have better support from the data, assuming the same class of variational approximations.
- Connection to Autoencoders: In machine learning, particularly in the context of Variational Autoencoders (VAEs), the ELBO forms the objective function. VAEs use neural networks to parameterize both the approximate posterior $q_\phi(z \mid x)$ and the likelihood $p_\theta(x \mid z)$, and training maximizes the ELBO with respect to the parameters of both networks, as in the sketch after this list.
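A minimal sketch of a VAE training objective, assuming (as is common but not required) a diagonal-Gaussian encoder output `(mu, logvar)`, a Bernoulli decoder producing `x_logits`, and a standard-normal prior on $z$; the helper names `vae_loss` and `reparameterize` are local choices, not a library API:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps so gradients flow through mu and logvar."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x, x_logits, mu, logvar):
    """Negative ELBO: reconstruction term plus KL(q(z|x) || N(0, I))."""
    # Expected log-likelihood term, estimated with the single reparameterized
    # sample of z that the decoder consumed to produce x_logits.
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL between a diagonal Gaussian and the standard normal, in closed form.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this is maximizing the ELBO
```

Minimizing `vae_loss` with any stochastic optimizer trains the encoder and decoder jointly; the reparameterization trick is what makes the expectation term differentiable with respect to the encoder's parameters.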
## Conclusion
The Evidence Lower Bound is a fundamental concept at the intersection of Bayesian inference and machine learning, enabling practical inference in complex probabilistic models. By providing a computationally tractable objective for approximating posterior distributions, the ELBO facilitates a wide range of applications, from generative modeling to Bayesian neural networks, significantly broadening the scope of models that can be efficiently trained and analyzed.