---
tags:
  - colorclass/a thermodynamic theory of statistical learning
---
# Fluctuation-Dissipation Framework for Neural Network Training
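## System Definition
Let $\theta \in \mathbb{R}^d$ represent the network parameters, evolving according to the stochastic differential equation:

$$d\theta_t = -M(t)\,\nabla_\theta F(\theta_t)\,dt + \sqrt{2\,T_{\text{eff}}\,M(t)}\;dW_t$$

where:
- $M(t)$ is the mobility tensor modulated by cosine annealing
- $F(\theta)$ is the free energy landscape
- $T_{\text{eff}}$ is the effective temperature
- $W_t$ is a Wiener process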
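
The dynamics above can be simulated with an Euler-Maruyama step. This is a minimal sketch, assuming an isotropic mobility $M(t) = \eta(t) I$ and the overdamped Langevin form $d\theta = -\eta(t)\nabla F\,dt + \sqrt{2 T_{\text{eff}}\,\eta(t)}\,dW$. The function name `euler_maruyama`, the quadratic free energy, and all numerical values are illustrative assumptions, not part of the framework itself.

```python
import numpy as np

def euler_maruyama(grad_F, theta0, eta, T_eff, dt, n_steps, rng):
    """Integrate d(theta) = -eta(t) grad_F(theta) dt + sqrt(2 T_eff eta(t)) dW.

    Assumes an isotropic mobility M(t) = eta(t) * I (illustrative choice).
    Returns the full trajectory with shape (n_steps + 1, dim).
    """
    theta = np.array(theta0, dtype=float)
    path = np.empty((n_steps + 1, theta.size))
    path[0] = theta
    for k in range(n_steps):
        lr = eta(k * dt)                        # mobility at the current time
        xi = rng.standard_normal(theta.size)    # unit Gaussian increment
        drift = -lr * grad_F(theta) * dt
        diffusion = np.sqrt(2.0 * T_eff * lr * dt) * xi
        theta = theta + drift + diffusion
        path[k + 1] = theta
    return path

# Hypothetical example: quadratic free energy F = 0.5 * |theta|^2, constant mobility.
rng = np.random.default_rng(0)
path = euler_maruyama(
    grad_F=lambda th: th,
    theta0=[2.0, -1.0],
    eta=lambda t: 0.1,
    T_eff=0.01,
    dt=0.1,
    n_steps=500,
    rng=rng,
)
```

Passing `eta` as a callable keeps the integrator independent of the schedule, so a cosine-annealed learning rate can be substituted without changing the step itself.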
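## Cosine-Annealed Mobility
The mobility tensor incorporates learning rate annealing:

$$M(t) = \eta(t)\,I$$

where $\eta(t)$ is the cosine-annealed learning rate:

$$\eta(t) = \eta_{\min} + \tfrac{1}{2}\,(\eta_{\max} - \eta_{\min})\left(1 + \cos\frac{\pi t}{t_{\max}}\right)$$

with $\eta_{\min}$ and $\eta_{\max}$ the lower and upper learning-rate bounds and $t_{\max}$ the length of the schedule.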
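
A short sketch of the schedule, assuming the standard cosine form $\eta(t) = \eta_{\min} + \tfrac12(\eta_{\max}-\eta_{\min})(1+\cos(\pi t/t_{\max}))$ and an isotropic mobility $M(t) = \eta(t) I$. The names `cosine_lr` and `mobility` and their parameters are hypothetical.

```python
import math
import numpy as np

def cosine_lr(t, t_max, eta_max, eta_min=0.0):
    """Cosine-annealed learning rate, clamped to the schedule window [0, t_max]."""
    t = min(max(t, 0.0), t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * t / t_max))

def mobility(t, dim, t_max, eta_max, eta_min=0.0):
    """Isotropic mobility tensor M(t) = eta(t) * I (an assumed form)."""
    return cosine_lr(t, t_max, eta_max, eta_min) * np.eye(dim)

# The schedule starts at eta_max, passes through the midpoint, and ends at eta_min.
for t in (0.0, 50.0, 100.0):
    print(t, cosine_lr(t, t_max=100.0, eta_max=0.1, eta_min=0.001))
```

With `eta_min=0.0` the mobility vanishes at `t_max`, which freezes both drift and noise; a small positive floor keeps the dynamics from stalling completely.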
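## Detailed Balance Condition
The system satisfies detailed balance with respect to the Gibbs distribution:

$$p^*(\theta) = \frac{1}{Z}\exp\!\left(-\frac{F(\theta)}{T_{\text{eff}}}\right), \qquad Z = \int \exp\!\left(-\frac{F(\theta)}{T_{\text{eff}}}\right) d\theta$$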
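
A brief worked check of why the Gibbs distribution balances drift against diffusion. It assumes the probability current of the Langevin dynamics has the standard form $J[p] = -M(t)\left(\nabla_\theta F\,p + T_{\text{eff}}\,\nabla_\theta p\right)$ and that $p^* \propto e^{-F/T_{\text{eff}}}$; the symbol $J$ is introduced here for illustration.

$$
\begin{aligned}
\nabla_\theta p^* &= -\frac{1}{T_{\text{eff}}}\,\nabla_\theta F\;p^* \\
J[p^*] &= -M(t)\left(\nabla_\theta F\,p^* + T_{\text{eff}}\,\nabla_\theta p^*\right) \\
&= -M(t)\left(\nabla_\theta F\,p^* - \nabla_\theta F\,p^*\right) = 0
\end{aligned}
$$

The current vanishes pointwise for any value of $M(t)$, so under these assumptions annealing the mobility changes how fast the system relaxes but not the distribution it relaxes toward.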
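## Fluctuation-Dissipation Relation
The noise correlation matrix of the stochastic force $\xi(t)$ satisfies:

$$\langle \xi_i(t)\,\xi_j(t') \rangle = 2\,T_{\text{eff}}\,M_{ij}(t)\,\delta(t - t')$$

so the same mobility tensor $M(t)$ that sets the dissipation also sets the noise strength, scaled by the effective temperature $T_{\text{eff}}$.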
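
In discrete time, a relation of the form $\langle \xi_i \xi_j \rangle = 2\,T_{\text{eff}}\,M_{ij}\,\delta(t-t')$ means each step of length $\Delta t$ draws a Gaussian increment with covariance $2\,T_{\text{eff}}\,M\,\Delta t$. The sketch below samples such increments through a Cholesky factor. The function name and the example matrix are assumptions, and the approach requires $M$ to be symmetric positive definite.

```python
import numpy as np

def sample_noise_increments(M, T_eff, dt, rng, n_samples=1):
    """Draw Gaussian increments with covariance 2 * T_eff * M * dt.

    M must be symmetric positive definite; np.linalg.cholesky raises
    LinAlgError otherwise (for example when the mobility is annealed to zero).
    """
    cov = 2.0 * T_eff * dt * np.asarray(M, dtype=float)
    L = np.linalg.cholesky(cov)                 # cov = L @ L.T
    z = rng.standard_normal((n_samples, cov.shape[0]))
    return z @ L.T                              # each row has covariance cov

# Hypothetical anisotropic mobility for illustration.
M = np.array([[0.10, 0.02],
              [0.02, 0.05]])
rng = np.random.default_rng(0)
increments = sample_noise_increments(M, T_eff=0.5, dt=0.01, rng=rng, n_samples=200_000)
empirical_cov = np.cov(increments, rowvar=False)
target_cov = 2.0 * 0.5 * 0.01 * M
```

Comparing `empirical_cov` with `target_cov` is a quick sanity check that an implementation injects noise at the level the mobility and temperature prescribe.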
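## Training Dynamics
The Fokker-Planck equation describing the evolution of the parameter distribution $p(\theta, t)$:

$$\frac{\partial p}{\partial t} = \nabla_\theta \cdot \Big[ M(t) \big( \nabla_\theta F(\theta)\, p + T_{\text{eff}}\, \nabla_\theta p \big) \Big]$$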
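
To make the density evolution concrete, the following sketch integrates a one-dimensional Fokker-Planck equation of the form $\partial_t p = \partial_\theta\left[\eta(t)\left(F'(\theta)\,p + T_{\text{eff}}\,\partial_\theta p\right)\right]$ with an explicit finite-volume scheme and zero-flux boundaries. The scalar mobility $\eta(t)$, the quadratic free energy, the grid, and the step sizes are all illustrative assumptions; the explicit scheme is only stable for sufficiently small `dt`.

```python
import numpy as np

def fokker_planck_1d(p0, x, dF, eta, T_eff, dt, n_steps):
    """Explicit finite-volume integration of the 1D Fokker-Planck equation.

    p0: initial density on the uniform grid x; dF: derivative of the free energy;
    eta: callable giving the scalar mobility at time t.
    """
    dx = x[1] - x[0]
    x_mid = 0.5 * (x[:-1] + x[1:])              # cell interfaces
    p = p0.copy()
    for k in range(n_steps):
        lr = eta(k * dt)
        p_mid = 0.5 * (p[:-1] + p[1:])
        # Probability current at interior interfaces: drift plus diffusion.
        J = -lr * (dF(x_mid) * p_mid + T_eff * (p[1:] - p[:-1]) / dx)
        # Zero current through both ends of the domain conserves total mass.
        J = np.concatenate(([0.0], J, [0.0]))
        p = p - dt * (J[1:] - J[:-1]) / dx
    return p

# Hypothetical setup: F = 0.5 * theta^2, density initially centred away from the minimum.
x = np.linspace(-4.0, 4.0, 161)
dx = x[1] - x[0]
p0 = np.exp(-((x - 2.0) ** 2) / (2.0 * 0.3 ** 2))
p0 /= p0.sum() * dx
T_eff = 0.5
p = fokker_planck_1d(p0, x, dF=lambda th: th, eta=lambda t: 1.0,
                     T_eff=T_eff, dt=1e-3, n_steps=10_000)

# Gibbs density on the same grid, for comparison with the long-time solution.
gibbs = np.exp(-0.5 * x ** 2 / T_eff)
gibbs /= gibbs.sum() * dx
```

In this setup the final density `p` should sit close to `gibbs`, which illustrates the relaxation toward the Gibbs distribution for a fixed temperature.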
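## Information Capacity Saturation
The effective dimensionality of learned features relates to the Fisher Information Matrix $\mathcal{I}(\theta)$:

$$d_{\text{eff}} = \frac{\big(\operatorname{tr}\,\mathcal{I}(\theta)\big)^2}{\operatorname{tr}\big(\mathcal{I}(\theta)^2\big)}, \qquad \mathcal{I}(\theta) = \mathbb{E}_x\!\left[\nabla_\theta f_\theta(x)\,\nabla_\theta f_\theta(x)^\top\right]$$

where $f_\theta$ is the network function.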
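
One way to make the effective dimensionality concrete is the participation ratio of the Fisher spectrum, $d_{\text{eff}} = (\operatorname{tr}\mathcal{I})^2 / \operatorname{tr}(\mathcal{I}^2)$, with the Fisher matrix estimated from per-sample gradients of a scalar network output. Both the choice of participation ratio and the function names below are assumptions made for illustration, and the linear model stands in for a real network.

```python
import numpy as np

def empirical_fisher(jac):
    """Average outer product of per-sample gradients.

    jac has shape (n_samples, n_params); row i is grad_theta f_theta(x_i).
    """
    return jac.T @ jac / jac.shape[0]

def effective_dim(fisher):
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    eig = np.clip(np.linalg.eigvalsh(fisher), 0.0, None)  # guard against round-off
    return eig.sum() ** 2 / np.sum(eig ** 2)

# Hypothetical linear model f_theta(x) = theta @ x, whose gradient w.r.t. theta is x.
rng = np.random.default_rng(0)
scales = np.array([1.0, 1.0, 0.1, 0.01])        # two dominant input directions
X = rng.standard_normal((5000, 4)) * scales
fisher = empirical_fisher(X)
d_eff = effective_dim(fisher)
```

The participation ratio equals the parameter count when all eigenvalues are equal and falls toward 1 as a single direction dominates, so saturation would show up as `d_eff` levelling off during training.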
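## Convergence Analysis
The expected convergence rate under temperature annealing, in a local quadratic approximation of $F$ around the optimum:

$$\mathbb{E}\,\|\theta_t - \theta^*\|^2 \;\le\; e^{-2\lambda_{\min}\tau(t)}\,\|\theta_0 - \theta^*\|^2 + \frac{d\,T_{\text{eff}}}{\lambda_{\min}}, \qquad \tau(t) = \int_0^t \eta(s)\,ds$$

where $\lambda_{\min}$ is the minimum eigenvalue of the Hessian at the optimum $\theta^*$, $d$ is the number of parameters, and $\eta(t)$ is the cosine-annealed learning rate.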
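
The sketch below evaluates a convergence bound of this kind numerically. It assumes a quadratic free energy with Hessian eigenvalues $\lambda_i$, an isotropic mobility $\eta(t) I$ with a cosine schedule, and constant $T_{\text{eff}}$, for which the expected squared distance has the closed form $\sum_i \left[\delta_i^2 e^{-2\lambda_i \tau} + (T_{\text{eff}}/\lambda_i)(1 - e^{-2\lambda_i \tau})\right]$ with $\tau(t) = \int_0^t \eta(s)\,ds$. The function names and all numbers are hypothetical.

```python
import numpy as np

def cumulative_lr(t, t_max, eta_max, eta_min=0.0):
    """tau(t): closed-form integral of the cosine schedule from 0 to t."""
    return eta_min * t + 0.5 * (eta_max - eta_min) * (
        t + (t_max / np.pi) * np.sin(np.pi * t / t_max))

def expected_sq_dist(delta0, hess_eigs, tau, T_eff):
    """Exact E|theta_t - theta*|^2 for the linearised dynamics in the Hessian eigenbasis."""
    decay = np.exp(-2.0 * hess_eigs * tau)
    return np.sum(delta0 ** 2 * decay + (T_eff / hess_eigs) * (1.0 - decay))

def upper_bound(delta0, hess_eigs, tau, T_eff):
    """Bound using only the smallest Hessian eigenvalue."""
    lam_min = hess_eigs.min()
    return (np.exp(-2.0 * lam_min * tau) * np.sum(delta0 ** 2)
            + hess_eigs.size * T_eff / lam_min)

# Hypothetical Hessian spectrum and initial offset from the optimum.
hess_eigs = np.array([0.5, 1.0, 4.0])
delta0 = np.array([1.0, -2.0, 0.5])
for t in (0.0, 25.0, 50.0, 100.0):
    tau = cumulative_lr(t, t_max=100.0, eta_max=0.1)
    print(t, expected_sq_dist(delta0, hess_eigs, tau, 0.1),
          upper_bound(delta0, hess_eigs, tau, 0.1))
```

Because the decay depends on $\tau(t)$ rather than on $t$ directly, the cosine schedule slows progress near the end of training: the cumulative learning rate flattens as $\eta(t)$ approaches its minimum.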