---
tags:
  - colorclass/a thermodynamic theory of statistical learning
---
# Fluctuation-Dissipation Framework for Neural Network Training

## System Definition

Let $\theta \in \mathbb{R}^d$ represent the network parameters, evolving according to the stochastic differential equation:

$$
d\theta = -M(t)\,\nabla F(\theta)\,dt + \sqrt{2\,T\,M(t)}\,dW_t
$$

where:
- $M(t)$ is the mobility tensor modulated by cosine annealing
- $F(\theta)$ is the free energy landscape
- $T$ is the effective temperature
- $W_t$ is a Wiener process
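
A minimal sketch of how dynamics of this kind can be integrated numerically, assuming the overdamped Langevin form $d\theta = -M\,\nabla F\,dt + \sqrt{2TM}\,dW_t$ with a scalar mobility. The Euler-Maruyama scheme, the quadratic landscape `quadratic_grad`, and every parameter value are illustrative assumptions rather than part of the framework.

```python
import math
import random

def langevin_step(theta, grad_F, mobility, temperature, dt, rng):
    """One Euler-Maruyama step of d(theta) = -M grad F dt + sqrt(2 T M) dW."""
    grad = grad_F(theta)
    # Noise amplitude is tied to the mobility and the temperature.
    noise_scale = math.sqrt(2.0 * temperature * mobility * dt)
    return [th - mobility * g * dt + noise_scale * rng.gauss(0.0, 1.0)
            for th, g in zip(theta, grad)]

def quadratic_grad(theta):
    """Gradient of the hypothetical landscape F = 0.5 * |theta|^2."""
    return list(theta)

def simulate(theta0, steps, mobility, temperature, dt, seed=0):
    """Integrate the SDE for a fixed number of steps with a seeded generator."""
    rng = random.Random(seed)
    theta = list(theta0)
    for _ in range(steps):
        theta = langevin_step(theta, quadratic_grad, mobility, temperature, dt, rng)
    return theta

# T = 0 reduces to plain gradient flow; T > 0 keeps the parameters fluctuating.
cold = simulate([2.0, -1.0], steps=1000, mobility=0.1, temperature=0.0, dt=0.1)
warm = simulate([2.0, -1.0], steps=1000, mobility=0.1, temperature=0.5, dt=0.1)
```

At zero temperature the trajectory is deterministic and decays toward the minimum; at finite temperature it settles into fluctuations around it.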

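## Cosine-Annealed Mobility

The mobility tensor incorporates learning rate annealing:

$$
M(t) = \eta(t)\,I
$$

where $\eta(t)$ is the cosine-annealed learning rate:

$$
\eta(t) = \eta_{\min} + \tfrac{1}{2}\,(\eta_{\max} - \eta_{\min})\left(1 + \cos\frac{\pi t}{t_{\max}}\right)
$$

with $\eta_{\max}$ and $\eta_{\min}$ the initial and final learning rates and $t_{\max}$ the annealing horizon.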
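
The schedule can be sketched in a few lines, assuming the standard cosine form $\eta(t) = \eta_{\min} + \tfrac{1}{2}(\eta_{\max} - \eta_{\min})(1 + \cos(\pi t / t_{\max}))$ and an isotropic mobility $\eta(t)\,I$. The function names `cosine_lr` and `mobility_tensor` and their parameters are hypothetical.

```python
import math

def cosine_lr(t, t_max, eta_max, eta_min=0.0):
    """Cosine-annealed learning rate, clamped to the interval [0, t_max]."""
    t = min(max(t, 0.0), t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / t_max))

def mobility_tensor(t, t_max, eta_max, eta_min=0.0, dim=2):
    """Isotropic mobility eta(t) * I, returned as a nested list (an assumed form)."""
    eta = cosine_lr(t, t_max, eta_max, eta_min)
    return [[eta if i == j else 0.0 for j in range(dim)] for i in range(dim)]

# The rate starts at eta_max, passes the midpoint at t_max / 2, and ends at eta_min.
schedule = [cosine_lr(t, t_max=100, eta_max=0.1, eta_min=0.001) for t in (0, 50, 100)]
```

An anisotropic mobility would replace the identity with a fixed positive-definite matrix; the scalar schedule stays the same.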

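## Detailed Balance Condition

The system satisfies detailed balance with respect to the Gibbs distribution:

$$
p_{\mathrm{eq}}(\theta) = \frac{1}{Z}\exp\!\left(-\frac{F(\theta)}{T}\right), \qquad Z = \int \exp\!\left(-\frac{F(\theta)}{T}\right) d\theta
$$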
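
Detailed balance can be checked numerically in one dimension: for a Gibbs weight $p \propto e^{-F/T}$, the probability current $J = -M\,(p\,F' + T\,p')$ should vanish at every point, whatever the mobility. The sketch below assumes that form of the current; the double-well landscape and the parameter values are hypothetical choices.

```python
import math

def gibbs_weight(theta, F, temperature):
    """Unnormalized Gibbs weight exp(-F / T)."""
    return math.exp(-F(theta) / temperature)

def probability_current(theta, F, mobility, temperature, h=1e-5):
    """J = -M (p F' + T p'), with derivatives taken by central differences."""
    p = gibbs_weight(theta, F, temperature)
    dF = (F(theta + h) - F(theta - h)) / (2.0 * h)
    dp = (gibbs_weight(theta + h, F, temperature)
          - gibbs_weight(theta - h, F, temperature)) / (2.0 * h)
    return -mobility * (p * dF + temperature * dp)

def double_well(theta):
    """Hypothetical landscape with minima at theta = -1 and theta = +1."""
    return (theta ** 2 - 1.0) ** 2

# Drift and diffusion contributions cancel point by point under the Gibbs weight.
currents = [probability_current(x, double_well, mobility=0.1, temperature=0.5)
            for x in (-1.5, -0.5, 0.3, 1.2)]
```

The values in `currents` are zero up to finite-difference error. Changing the mobility rescales both terms equally, so the cancellation does not depend on the annealing schedule.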

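## Fluctuation-Dissipation Relation

The noise $\xi(t)$ driving the parameters has a correlation matrix fixed by the mobility $M(t)$ and the effective temperature $T$:

$$
\langle \xi(t)\,\xi(t')^{\top} \rangle = 2\,T\,M(t)\,\delta(t - t')
$$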
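
One consequence of tying the noise amplitude to the mobility is that, in a quadratic well $F = \tfrac{1}{2}k\theta^2$, the stationary variance is $T/k$ regardless of the mobility. The sketch below propagates the variance of an Euler-Maruyama discretization exactly instead of sampling it. The discretization, the helper `variance_after`, and the parameter values are assumptions for illustration.

```python
def variance_after(steps, mobility, temperature, stiffness, dt, var0=0.0):
    """Propagate Var[theta] under Euler-Maruyama for F = 0.5 * k * theta^2.

    Each step is theta <- (1 - M k dt) * theta + sqrt(2 T M dt) * xi, xi ~ N(0, 1),
    so the variance obeys var <- (1 - M k dt)^2 * var + 2 T M dt.
    """
    decay = (1.0 - mobility * stiffness * dt) ** 2   # dissipation
    injected = 2.0 * temperature * mobility * dt     # fluctuation
    var = var0
    for _ in range(steps):
        var = decay * var + injected
    return var

# Two mobilities that differ by a factor of ten reach nearly the same variance.
slow = variance_after(5000, mobility=0.1, temperature=0.5, stiffness=2.0, dt=0.01)
fast = variance_after(5000, mobility=1.0, temperature=0.5, stiffness=2.0, dt=0.01)
target = 0.5 / 2.0  # T / k
```

Both results sit close to `target`, with a small offset that comes from the finite time step and shrinks as `dt` decreases.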

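## Training Dynamics

The parameter distribution $p(\theta, t)$ evolves according to the Fokker-Planck equation:

$$
\frac{\partial p}{\partial t} = \nabla \cdot \Big[ M(t)\,\big( p\,\nabla F(\theta) + T\,\nabla p \big) \Big]
$$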
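
A one-dimensional finite-volume sketch of this kind of evolution, assuming the form $\partial_t p = \partial_\theta\,[\,M\,(p\,F' + T\,\partial_\theta p)\,]$ with constant mobility, reflecting walls, and a quadratic landscape $F = \tfrac{1}{2}\theta^2$. The explicit scheme, the grid, and all numerical values are illustrative choices.

```python
import math

def fokker_planck_step(p, face_grad, dx, mobility, temperature, dt):
    """One explicit step of dp/dt = d/dx [ M (p F' + T dp/dx) ] in flux form."""
    n = len(p)
    # flux[i] lives on the face between cells i-1 and i; the two end faces
    # keep zero flux, which models reflecting walls and conserves mass.
    flux = [0.0] * (n + 1)
    for i in range(1, n):
        p_face = 0.5 * (p[i - 1] + p[i])
        dp = (p[i] - p[i - 1]) / dx
        flux[i] = -mobility * (p_face * face_grad[i] + temperature * dp)
    return [p[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(n)]

n, x_min, x_max = 80, -4.0, 4.0
dx = (x_max - x_min) / n
centers = [x_min + (i + 0.5) * dx for i in range(n)]
face_grad = [x_min + i * dx for i in range(n + 1)]  # F'(x) = x on each face

# Start from a narrow bump away from the minimum, normalized to unit mass.
p = [math.exp(-((x - 2.0) ** 2) / (2.0 * 0.3 ** 2)) for x in centers]
norm = sum(p) * dx
p = [v / norm for v in p]

temperature = 0.5
for _ in range(5000):  # total time 10, about ten relaxation times
    p = fokker_planck_step(p, face_grad, dx, mobility=1.0,
                           temperature=temperature, dt=0.002)

mass = sum(p) * dx
mean = sum(x * v for x, v in zip(centers, p)) * dx
variance = sum((x - mean) ** 2 * v for x, v in zip(centers, p)) * dx
```

The density relaxes toward a Gaussian centered at the minimum with variance close to $T$, while `mass` stays at one because the update is written in flux form.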

## Information Capacity Saturation

The effective dimensionality of learned features relates to the Fisher Information Matrix $\mathcal{I}(\theta)$:

where $f_\theta$ is the network function.
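
One common way to turn a Fisher matrix into an effective dimension is the participation ratio $(\operatorname{tr}\mathcal{I})^2 / \operatorname{tr}(\mathcal{I}^2)$. That choice, the empirical Fisher built from gradients of the network output, and the toy model `tiny_network_grad` are all assumptions made for illustration and not necessarily the definition intended here.

```python
import math

def tiny_network_grad(theta, x):
    """Gradient of the hypothetical model f(x) = theta[0] * tanh(theta[1] * x)."""
    h = math.tanh(theta[1] * x)
    return [h, theta[0] * x * (1.0 - h * h)]

def empirical_fisher(gradients):
    """Average outer product of per-sample gradients."""
    n, d = len(gradients), len(gradients[0])
    fisher = [[0.0] * d for _ in range(d)]
    for g in gradients:
        for i in range(d):
            for j in range(d):
                fisher[i][j] += g[i] * g[j] / n
    return fisher

def participation_ratio(matrix):
    """(tr A)^2 / tr(A^2); for symmetric A, tr(A^2) is the sum of squared entries."""
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    trace_sq = sum(v * v for row in matrix for v in row)
    return trace * trace / trace_sq

theta = [1.0, 0.5]
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
fisher = empirical_fisher([tiny_network_grad(theta, x) for x in inputs])
d_eff = participation_ratio(fisher)
```

The ratio equals the parameter count when all Fisher eigenvalues are equal and drops toward one when a single direction dominates, which is one way to read saturation of capacity.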

## Convergence Analysis

The expected convergence rate under temperature annealing:

where $\lambda_{\min}$ is the minimum eigenvalue of the Hessian $\nabla^2 F(\theta^*)$ at the optimum $\theta^*$.
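
A sketch of how such a rate can be estimated under simplifying assumptions: a quadratic free energy near $\theta^*$, constant temperature, and mobility $\eta(t)\,I$ with a cosine schedule. The mean error along the slowest Hessian direction then contracts as $\exp\!\big(-\lambda_{\min}\int_0^t \eta(s)\,ds\big)$. The helper names and parameter values below are hypothetical.

```python
import math

def cosine_lr(t, t_max, eta_max, eta_min=0.0):
    """Cosine-annealed learning rate, clamped to the interval [0, t_max]."""
    t = min(max(t, 0.0), t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / t_max))

def integrated_lr(t_max, eta_max, eta_min=0.0, steps=10_000):
    """Trapezoidal estimate of tau = integral of eta(s) ds over [0, t_max]."""
    dt = t_max / steps
    total = 0.0
    for k in range(steps):
        left = cosine_lr(k * dt, t_max, eta_max, eta_min)
        right = cosine_lr((k + 1) * dt, t_max, eta_max, eta_min)
        total += 0.5 * (left + right) * dt
    return total

def mean_error_decay(lambda_min, t_max, eta_max, eta_min=0.0):
    """Contraction factor of E[theta - theta*] along the slowest Hessian direction."""
    return math.exp(-lambda_min * integrated_lr(t_max, eta_max, eta_min))

# Over a full cosine cycle the integral is t_max * (eta_max + eta_min) / 2.
tau = integrated_lr(t_max=100.0, eta_max=0.1)
decay = mean_error_decay(lambda_min=0.5, t_max=100.0, eta_max=0.1)
```

Because the contraction depends on the schedule only through the integrated learning rate, a cosine cycle contracts the mean error as much as a constant rate equal to the average of $\eta_{\max}$ and $\eta_{\min}$. The residual fluctuations are set by the temperature and are not captured by this mean-error estimate.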