3.1. Theory#

3.1.1. Monte Carlo and the Importance of Importance Sampling#

In the first exercise session, you applied the simplest Monte Carlo technique - a random sampling technique - to find the value of \(\pi\). A generalisation of such a Monte Carlo scheme is, however, less than straightforward. Imagine you are interested in a canonical ensemble average of an observable:

(3.1)#\[ \begin{aligned} \left<O\right> = \frac{\int_\Gamma O(\Gamma)e^{-\beta E(\Gamma)} \mathrm{d}\Gamma}{\int_\Gamma e^{-\beta E(\Gamma)} \mathrm{d}\Gamma} . \end{aligned} \]

If such an integral were to be evaluated by numerical quadrature, with a suitably fine mesh of \(M\) points along each degree of freedom, the calculation would very quickly become completely intractable, since the procedure scales as \(O(M^{6N})\) (remember \(\mathrm{d}\Gamma = \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}z\, \mathrm{d}v_x\, \mathrm{d}v_y\, \mathrm{d}v_z\) for each particle, so in the three-dimensional case each particle carries \(6\) degrees of freedom). Even a modest system of \(N = 100\) particles with only \(M = 10\) mesh points per degree of freedom would already require \(10^{600}\) function evaluations. Furthermore, such a scheme would be associated with a large statistical error; numerical quadratures work reasonably well for functions that are smooth on the scale of the mesh - but the Boltzmann factor varies extremely rapidly. This rapid variation of \(e^{-\beta E(\Gamma)}\) furthermore means that many points of no importance would be sampled, since their Boltzmann factor is vanishingly small.

The idea behind importance sampling is to concentrate the sampling in regions where the Boltzmann factor is of considerable magnitude, with fewer sampling moves elsewhere. Very simple importance sampling schemes can be constructed, but they fail for multidimensional integrals, since they require an analytical expression for the partition function. If that were available, there would be little interest in performing computer simulations in the first place - as discussed in statistical mechanics, all thermodynamic quantities can be determined directly once an analytic expression for the partition function is known.
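To make the contrast concrete, the sketch below estimates a Boltzmann-weighted average for a single degree of freedom, where the normalised distribution happens to be known analytically, so points can be drawn from it directly. The potential, observable, and parameter values are illustrative choices only, not part of the exercise; in many dimensions the direct sampling used here is exactly what is no longer available.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = 1.0                       # inverse temperature (illustrative value)
V = lambda x: 0.5 * x**2         # harmonic potential (illustrative)
O = lambda x: x**2               # observable; exact <x^2> = 1/beta here
L = 100_000                      # number of sample points

# Brute-force sampling: uniform points, each weighted by its Boltzmann factor.
# Most points carry a vanishing weight and contribute essentially nothing.
x_uni = rng.uniform(-10.0, 10.0, L)
w = np.exp(-beta * V(x_uni))
brute_force = np.sum(w * O(x_uni)) / np.sum(w)

# Importance sampling: draw points directly from the Boltzmann distribution
# (possible here because the normalised 1D Gaussian is known analytically).
x_imp = rng.normal(0.0, np.sqrt(1.0 / beta), L)
importance = np.mean(O(x_imp))

print(f"brute force: {brute_force:.4f}, importance: {importance:.4f}, exact: {1/beta:.4f}")
```

Both estimates converge to the same value, but the importance-sampled one does so with far fewer wasted points.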

3.1.2. The Concept of Detailed Balance#

When calculating ensemble averages, one is usually not interested in the partition function itself, but only in the average:

(3.2)#\[ \begin{aligned} \left<O\right> = \frac{\int O(\mathbf{q})e^{-\beta V(\mathbf{q})} \mathrm{d}\mathbf{q}}{\int e^{-\beta V(\mathbf{q})} \mathrm{d}\mathbf{q}}, \end{aligned} \]

where, with respect to (3.1), we have restricted the expression to a configurational average, integrating over all momenta. One is thus left with the configurational partition function, which contains only the potential energy in the exponent rather than the sum of potential and kinetic terms. (You can easily convince yourself that this is possible whenever the potential and kinetic terms are not coupled.)

Exercise 1

Show that if the Hamiltonian of a system can be written as: \( H = T(\mathbf{p}) + V(\mathbf{q})\), i.e. if the kinetic and potential energy can be decoupled, then the partition function can be divided into a kinetic and potential part. Derive (3.2) for an ensemble average of an observable \(O(\mathbf{q})\), which depends only on \(\mathbf{q}\).

Rewriting (3.2) in terms of a probability density \(\mathcal{N}(\mathbf{q})\) (cf. the Boltzmann distribution), one finds:

(3.3)#\[\begin{split} \begin{aligned} \left<O\right> &= \int O(\mathbf{q}) \mathcal{N}(\mathbf{q}) \mathrm{d}\mathbf{q} \\ \mathcal{N}(\mathbf{q}) &= \frac{e^{-\beta V(\mathbf{q})}}{\int e^{-\beta V(\mathbf{q})}\mathrm{d}\mathbf{q}}. \end{aligned} \end{split}\]

Therefore, an ensemble average of an observable should be accessible by random sampling if this sampling can be carried out according to the probability distribution \(\mathcal{N}(\mathbf{q})\). In such a case, on average, the number of points generated in a volume element \(\mathrm{d}\mathbf{q}\) must be equal to \(L\mathcal{N}(\mathbf{q})\,\mathrm{d}\mathbf{q}\), where \(L\) denotes the total number of points generated. One may therefore rewrite the above expression in an approximate form:

(3.4)#\[ \begin{aligned} \left<O\right> \approx \frac{1}{L} \sum_{i=1}^L n_i O(\mathbf{q}_i). \end{aligned} \]

The average of an observable is thus given by the sum over the \(L\) points \(\mathbf{q}_i\) at which the observable is evaluated, each weighted by the number of occurrences \(n_i\) of state \(\mathbf{q}_i\) in the sampling. Still, one is left with the problem that the points have to be generated with a relative probability that corresponds to the Boltzmann distribution \(\mathcal{N}\).
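As a small illustration of (3.4), the sketch below uses made-up discrete states and an arbitrary observable: averaging over every generated point, or grouping identical states and weighting them by their number of occurrences \(n_i\), gives the same estimate.

```python
import numpy as np

# Suppose `samples` holds L configurations generated according to N(q)
# (here: a placeholder set of discrete states with arbitrary probabilities).
rng = np.random.default_rng(0)
samples = rng.choice([0.0, 1.0, 2.0], size=1000, p=[0.7, 0.2, 0.1])
O = lambda q: q**2                # illustrative observable
L = len(samples)

# Direct form: average O over all L generated points ...
avg_direct = np.mean(O(samples))

# ... or, equivalently, group identical states and weight each by its
# number of occurrences n_i, as in Eq. (3.4).
states, n_i = np.unique(samples, return_counts=True)
avg_weighted = np.sum(n_i * O(states)) / L

print(avg_direct, avg_weighted)   # both estimates coincide
```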

Consider a system that is in equilibrium and where all states are equally likely to occur. In order for the system to remain in its equilibrium state, every move from an old state \(o\) to a new state \(n\) must be compensated by an inverse move. If we denote the transition probability by \(\mathcal{P}\), this implies:

(3.5)#\[ \begin{aligned} \mathcal{P} (o \rightarrow n) = \mathcal{P} (n \rightarrow o). \end{aligned} \]

In a system where the probability distribution is not uniform but instead given by some distribution \(\mathcal{N}\) - such as the Boltzmann distribution - one equivalently finds:

(3.6)#\[ \begin{aligned} \mathcal{N}(o)\mathcal{P} (o \rightarrow n) = \mathcal{N}(n) \mathcal{P}(n \rightarrow o). \end{aligned} \]

That is, the transition matrix \(\mathcal{P}\) that governs the probability for a step to occur must satisfy the above relation. This detailed balance condition ensures that, at equilibrium, every process is balanced by its reverse.
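As a concrete illustration (with arbitrarily chosen numbers), consider a two-state system with \(\mathcal{N}(1) = 2/3\) and \(\mathcal{N}(2) = 1/3\). Any pair of transition probabilities with \(\mathcal{P}(1 \rightarrow 2)/\mathcal{P}(2 \rightarrow 1) = \mathcal{N}(2)/\mathcal{N}(1) = 1/2\), for instance \(\mathcal{P}(1 \rightarrow 2) = 0.2\) and \(\mathcal{P}(2 \rightarrow 1) = 0.4\), satisfies (3.6): the forward flux \(\tfrac{2}{3}\cdot 0.2\) equals the backward flux \(\tfrac{1}{3}\cdot 0.4\).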

3.1.3. The Metropolis Algorithm#

There are many possible choices for \(\mathcal{P}\); perhaps the most straightforward one was put forward by Metropolis et al. In the Metropolis Monte Carlo scheme, the total transition probability \(\mathcal{P}\) is expressed as the product of the probability of proposing a move from \(o\) to \(n\), \(P^\prime(o \rightarrow n)\), and the probability of accepting this trial move, \(P_\text{acc}(o \rightarrow n)\), such that:

(3.7)#\[ \begin{aligned} \mathcal{P} (o \rightarrow n) = P^\prime (o \rightarrow n)P_\text{acc}(o \rightarrow n). \end{aligned} \]

The transition matrix \(P^\prime\) is chosen to be symmetric, such that \(P^\prime(o \rightarrow n) = P^\prime(n \rightarrow o)\).

Exercise 2 - Bonus

Why is it important that \(P'\) is a symmetric matrix?

Hence, when detailed balance is imposed, the ratio of the acceptance probabilities must satisfy:

(3.8)#\[\begin{split} \begin{aligned} \frac{P_\text{acc}(o \rightarrow n)}{P_\text{acc}(n \rightarrow o)} &= \frac{\mathcal{N}(n)}{\mathcal{N}(o)} \\ &=\frac{e^{-\beta V(n)}}{e^{-\beta V(o)}} \\ &= e^{-\beta\left[V(n) - V(o)\right]}. \end{aligned} \end{split}\]

Many possibilities exist that satisfy this condition; the choice made in the Metropolis algorithm is:

(3.9)#\[\begin{split} \begin{aligned} P_\text{acc}(o \rightarrow n) = \begin{cases} \frac{\mathcal{N}(n)}{\mathcal{N}(o)} & \text{if}\ \mathcal{N}(n) < \mathcal{N}(o) \\ 1 & \text{if}\ \mathcal{N}(n) \ge \mathcal{N}(o), \end{cases} \end{aligned} \end{split}\]

where the value \(1\) in the second case reflects that \(P_\text{acc}(o \rightarrow n)\) is a probability and therefore cannot exceed \(1\). The total transition matrix \(\mathcal{P}\) is then:

(3.10)#\[\begin{split} \begin{aligned} \mathcal{P}(o \rightarrow n) &= \begin{cases} \frac{\mathcal{N}(n)}{\mathcal{N}(o)}P^\prime(o \rightarrow n) & \text{if}\ \mathcal{N}(n) < \mathcal{N}(o) \\ P^\prime(o \rightarrow n) & \text{if}\ \mathcal{N}(n) \ge \mathcal{N}(o) \end{cases} \end{aligned} \end{split}\]

and

(3.11)#\[ \begin{aligned} \mathcal{P} (o \rightarrow o) &= 1 -\sum_{n \ne o} \mathcal{P}(o \rightarrow n). \end{aligned} \]
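A quick numerical sanity check of (3.6) together with (3.9)-(3.11) can be made on a toy system with a handful of discrete states; the energies and the uniform trial matrix below are arbitrary illustrative choices.

```python
import numpy as np

beta = 1.0
V = np.array([0.0, 0.5, 1.5])             # arbitrary energies of three states
N = np.exp(-beta * V)
N /= N.sum()                              # Boltzmann distribution over the states

n_states = len(V)
P_prime = np.full((n_states, n_states), 1.0 / n_states)   # symmetric trial matrix

# Off-diagonal elements of the full transition matrix from Eq. (3.10),
# the diagonal (probability of staying put) from the normalisation in Eq. (3.11).
P = np.zeros((n_states, n_states))
for o in range(n_states):
    for n in range(n_states):
        if n != o:
            P[o, n] = P_prime[o, n] * min(1.0, N[n] / N[o])
    P[o, o] = 1.0 - P[o].sum()

flux = N[:, None] * P                     # flux[o, n] = N(o) * P(o -> n)
print(np.allclose(flux, flux.T))          # detailed balance, Eq. (3.6): True
print(np.allclose(P.sum(axis=1), 1.0))    # every row is properly normalised: True
```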

Equation (3.11) simply states that the system remains in state \(o\) whenever a trial move is rejected. The criterion for whether or not to accept a trial move is inferred from the above equations and the normalisation of the probability distribution: for a trial move with \(V(n) > V(o)\), the acceptance probability is

(3.12)#\[\begin{split} \begin{aligned} P_\text{acc}(o \rightarrow n) &= e^{-\beta\left[ V(n) - V(o) \right]} \\ &< 1. \end{aligned} \end{split}\]

If \(V(n) < V(o)\), the move is always accepted. However, for a trial move with \(V(n) > V(o)\), a random number \(X\) is drawn from a uniform distribution on the interval \([0,1]\), which spans the same range as the Boltzmann factor in this case. The probability that \(X\) is less than \(P_\text{acc}(o \rightarrow n)\) is equal to \(P_\text{acc}(o \rightarrow n)\) itself:

(3.13)#\[ \begin{aligned} P\left(X < P_\text{acc}(o \rightarrow n)\right) = P_\text{acc}(o \rightarrow n). \end{aligned} \]
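In code, this accept/reject step reduces to a few lines. A minimal sketch follows, in which the function name and arguments are illustrative rather than prescribed by the exercise:

```python
import numpy as np

rng = np.random.default_rng()

def accept_move(V_old, V_new, beta):
    """Metropolis criterion: downhill moves are always accepted, uphill
    moves with probability exp(-beta * [V(n) - V(o)])."""
    dV = V_new - V_old
    if dV <= 0.0:
        return True
    return rng.random() < np.exp(-beta * dV)   # compare X with P_acc(o -> n)
```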

The trial move is accepted only if \(X < P_\text{acc}(o \rightarrow n)\), and rejected otherwise. This scheme guarantees that the probability of accepting a trial move \(o \rightarrow n\) is indeed equal to \(P_\text{acc}(o \rightarrow n)\). Thus, the system moves towards the equilibrium distribution (\(P_\text{acc} = 1\) for new states lower in energy), and once equilibrium is reached, it is guaranteed to remain in the equilibrium distribution (\(P_\text{acc}\) given by the Boltzmann factor). The overall acceptance probability is:

(3.14)#\[\begin{split} \begin{aligned} P_\text{acc}(o \rightarrow n) &= \min \left(1,\frac{\mathcal{N}(n)}{\mathcal{N}(o)}\right) \\ &= \min \left(1, e^{-\beta\left[V(n) - V(o) \right]}\right), \end{aligned} \end{split}\]

i.e. the acceptance probability is 1 if the Boltzmann factor exceeds 1, and it is the Boltzmann factor itself otherwise. This guarantees that the sampling preserves the equilibrium distribution, i.e. it fulfills detailed balance.
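To tie the pieces together, below is a minimal sketch of a full Metropolis run for a single particle in a one-dimensional harmonic potential. The potential, maximum step size, and all parameter values are illustrative assumptions; the actual exercises may use a different system.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                          # inverse temperature (illustrative)
V = lambda x: 0.5 * x**2            # 1D harmonic potential (illustrative)
max_step = 1.0                      # maximum trial displacement
n_steps = 200_000

x = 0.0                             # initial configuration
samples = np.empty(n_steps)
n_accepted = 0

for i in range(n_steps):
    x_new = x + rng.uniform(-max_step, max_step)   # symmetric trial move (P')
    dV = V(x_new) - V(x)
    # Metropolis acceptance, Eq. (3.14): downhill moves are always accepted,
    # uphill moves with probability exp(-beta * dV).
    if dV <= 0.0 or rng.random() < np.exp(-beta * dV):
        x = x_new
        n_accepted += 1
    samples[i] = x                  # a rejected move counts the old state again

print("acceptance ratio:", n_accepted / n_steps)
print("<x^2> sampled:", np.mean(samples**2), "exact:", 1.0 / beta)
```

Note how a rejected move re-counts the current configuration; this is precisely the weighting by the number of occurrences \(n_i\) in (3.4).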