Metropolis-Adjusted Langevin Algorithm

The Metropolis-Adjusted Langevin algorithm updates the position of the chain \(\boldsymbol{\theta}_t\) at time \(t\) based of the overdamped Langevin diffusion equation:

\begin{equation}
\mathrm{d} L_t = \mathrm{d} W_t + \frac{1}{2}\;\nabla \log \pi(L_t)\;\mathrm{d}t
\end{equation}

Where \(W_t\) is the standard Brownian motion. Equation (1) is a continuous time diffusion and we instead use a discrete approximation to this diffusion:

\begin{equation*}
\theta_{t+1} = \theta_t + \tau\;\nabla \log \pi(\theta_t) + \sqrt{2\, \tau\; \xi
\end{equation*}

where

\begin{equation*}
\xi \sim \operatorname{Normal}(0,\mathbb{I}_k)
\end{equation*}

\(k\) is the number of dimension of \(\boldsymbol{\theta}\), \(\tau\) the step size and \(\pi\) the probability density. The idea, like with Hamiltonian Monte Carlo, is to use information about the gradient of the probability density funtion to produce a "better" proposal.

The dynamics defined by the discretized version of the Langevin dynamics only maintain the invariance of \(\pi\) approximately and so we need to perform an accept/reject step accepting \(\boldsymbol{\theta}_t\) with probability:

\begin{equation*}
 \alpha = \min \left\{1, \frac{\pi(\theta_{t+1}) q(\theta_t | \theta_{t+1})}{\pi(\theta_{t}) q(\theta_{t+1} | \theta_{t})} \right\}
\end{equation*}

where

\begin{equation*}
  q\left(\theta_{t+1}|\theta_{t}\right) = \exp\left(-\frac{1}{4\, \tau} || \theta_{t+1} - \theta_t - \tau\;\nabla\log \pi(\theta_t) ||_2\right)
\end{equation*}

since

\begin{equation*}
  \theta_{{t+1}} \sim \operatorname{Normal}\left( \theta_{t} + \frac{\tau}{2}\;\nabla\log \pi(\theta_{t}) \right, \tau\,\mathbb{I}_k)
\end{equation*}

References

Links to this note

Stochastic gradient Langevin Dynamics (SgLD)