---
tags:
  - colorclass/probability theory
---

The Evidence Lower Bound (ELBO) is a central concept in variational inference, a method used in Bayesian statistics and machine learning to approximate complex posterior distributions. The ELBO emerges as a key tool for making Bayesian inference computationally tractable in models where exact computation of the posterior is infeasible due to the high-dimensional integration involved.
## The Challenge of Computing the Evidence
In Bayesian inference, we often want to compute the posterior distribution of parameters $\theta$ given data $x$, which is given by Bayes' theorem:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$
Here, $p(x)$ is the evidence, also known as the marginal likelihood of the data, which integrates over all possible parameter values:

$$p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$$
Computing $p(x)$ directly can be extremely challenging, especially for complex models, because the integral runs over the full, often high-dimensional, parameter space.
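As a concrete illustration of why the evidence is the bottleneck, the sketch below (a hypothetical one-dimensional model with a Gaussian prior and likelihood, chosen only because its evidence also has a closed form) computes the integral by numerical quadrature. The same grid-based strategy needs exponentially many evaluations as the dimension of $\theta$ grows, which is exactly where exact inference breaks down:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

x_obs = 1.3  # a single observed data point (illustrative value)

def integrand(theta):
    # p(x | theta) * p(theta): likelihood N(theta, 1) times prior N(0, 1)
    return norm.pdf(x_obs, loc=theta, scale=1.0) * norm.pdf(theta, loc=0.0, scale=1.0)

# Evidence p(x) by one-dimensional quadrature
evidence, _ = integrate.quad(integrand, -np.inf, np.inf)

# For this conjugate model the evidence is available exactly: x ~ N(0, 2)
closed_form = norm.pdf(x_obs, loc=0.0, scale=np.sqrt(2.0))
print(evidence, closed_form)  # agree to numerical precision
```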
## Introduction to the ELBO
Variational inference offers a solution by introducing a simpler distribution $q(\theta; \phi)$, parameterized by variational parameters $\phi$, to approximate the true posterior $p(\theta \mid x)$. The goal is to choose $\phi$ such that $q(\theta; \phi)$ is as close as possible to $p(\theta \mid x)$. The closeness between these two distributions is often measured using the Kullback-Leibler (KL) divergence:

$$\mathrm{KL}\bigl(q(\theta; \phi) \,\|\, p(\theta \mid x)\bigr) = \mathbb{E}_{q(\theta; \phi)}\!\left[\log \frac{q(\theta; \phi)}{p(\theta \mid x)}\right]$$
Directly minimizing this KL divergence is difficult since it involves the intractable evidence $p(x)$. However, it can be shown that minimizing the KL divergence is equivalent to maximizing the Evidence Lower Bound (ELBO), which is defined as:

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(\theta; \phi)}\bigl[\log p(x \mid \theta)\bigr] - \mathrm{KL}\bigl(q(\theta; \phi) \,\|\, p(\theta)\bigr)$$
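To see why, expand $\log p(x)$ using only the definitions above; the derivation takes two lines:

$$\begin{aligned}
\log p(x) &= \mathbb{E}_{q(\theta;\phi)}\!\left[\log \frac{p(x, \theta)}{q(\theta;\phi)}\right] + \mathbb{E}_{q(\theta;\phi)}\!\left[\log \frac{q(\theta;\phi)}{p(\theta \mid x)}\right] \\
&= \mathrm{ELBO}(q) + \mathrm{KL}\bigl(q(\theta;\phi) \,\|\, p(\theta \mid x)\bigr) \;\geq\; \mathrm{ELBO}(q).
\end{aligned}$$

Since the KL term is nonnegative and $\log p(x)$ does not depend on $\phi$, maximizing the ELBO over $\phi$ minimizes the KL divergence to the true posterior, and the ELBO is indeed a lower bound on the log evidence.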
This formulation decomposes the objective into two parts: the expected log-likelihood of the data given the parameters, which encourages the model to explain the observed data well, and the KL divergence between the approximate posterior and the prior, which encourages the approximate posterior to stay close to the prior.
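The sketch below makes the decomposition concrete for the same toy model as above (a hypothetical one-dimensional conjugate Gaussian model, so every quantity can be checked against a closed form; the names `elbo` and `x_obs` are local choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs = 1.3  # single observed data point; prior N(0, 1), likelihood N(theta, 1)

def elbo(m, s, n_samples=100_000):
    """Monte Carlo ELBO for a Gaussian q(theta) = N(m, s^2)."""
    theta = rng.normal(m, s, size=n_samples)  # samples from q
    expected_loglik = norm.logpdf(x_obs, loc=theta, scale=1.0).mean()
    # KL(N(m, s^2) || N(0, 1)) has a closed form for Gaussians
    kl_to_prior = np.log(1.0 / s) + (s**2 + m**2) / 2.0 - 0.5
    return expected_loglik - kl_to_prior

# Log evidence: marginally x ~ N(0, 2) for this model
log_evidence = norm.logpdf(x_obs, loc=0.0, scale=np.sqrt(2.0))

# The exact posterior is N(x/2, 1/2), where the bound is tight;
# any other choice of (m, s) yields a strictly smaller ELBO.
print(elbo(x_obs / 2, np.sqrt(0.5)), log_evidence)  # approximately equal
print(elbo(0.0, 1.0), "<", log_evidence)
```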
## Implications and Uses of the ELBO
- Variational Inference: Maximizing the ELBO with respect to the variational parameters $\phi$ allows for efficient approximate Bayesian inference, enabling the application of Bayesian methods to complex models like deep neural networks.
- Model Comparison and Selection: The ELBO can also serve as a criterion for model comparison and selection, as it provides a lower bound on the log evidence $\log p(x)$. Models with higher ELBO values are considered to have better support from the data, assuming the same class of variational approximations.
- Connection to Autoencoders: In machine learning, particularly in the context of Variational Autoencoders (VAEs), the ELBO forms the objective function. VAEs use neural networks to parameterize both the approximate posterior $q_\phi(z \mid x)$ and the likelihood $p_\theta(x \mid z)$, and training maximizes the ELBO with respect to the parameters of both networks, as in the sketch after this list.
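A minimal sketch of a VAE training objective, assuming (as is common but not required) a diagonal-Gaussian encoder output `(mu, logvar)`, a Bernoulli decoder producing `x_logits`, and a standard-normal prior on $z$; the helper names `vae_loss` and `reparameterize` are local choices, not a library API:

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps so gradients flow through mu and logvar."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def vae_loss(x, x_logits, mu, logvar):
    """Negative ELBO: reconstruction term plus KL(q(z|x) || N(0, I))."""
    # Expected log-likelihood term, estimated with the single reparameterized
    # sample of z that the decoder consumed to produce x_logits.
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL between a diagonal Gaussian and the standard normal, in closed form.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this is maximizing the ELBO
```

Minimizing `vae_loss` with any stochastic optimizer trains the encoder and decoder jointly; the reparameterization trick is what makes the expectation term differentiable with respect to the encoder's parameters.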
## Conclusion
The Evidence Lower Bound is a fundamental concept at the intersection of Bayesian inference and machine learning, enabling practical inference in complex probabilistic models. By providing a computationally tractable objective for approximating posterior distributions, the ELBO facilitates a wide range of applications, from generative modeling to Bayesian neural networks, significantly broadening the scope of models that can be efficiently trained and analyzed.