
Counting N-Queens Solutions


Based on a report for Markov Chains and Applications at EPFL in 2022 with Jean-Baptiste Michel and Gaiëtan Renault. Original research from Cheng Zhang & Jianpeng Ma (2009). [1]

Got comments? Send them to oiliossatos@gmail.com!

The 8-queens puzzle

Can you place eight queens on a chessboard such that no two queens attack each other? Click on the chessboard below to try the puzzle!

Click to add / remove a queen

Now that you've tried the puzzle, how many different solutions are possible? And more generally, how can we count \(Q(n)\), the number of solutions for \(n\) queens on an \(n\)-by-\(n\) board? That's the problem we'll explore in this post.

Contents

We'll look at three different approaches to counting solutions, each addressing the problems of the last:

  1. Depth-first backtracking
  2. Uniform Monte Carlo integration
  3. Annealed importance sampling

This post contains many interactive visualisations to aid in understanding, accompanied by an implementation in Zig. Please let me know if something breaks on your end.

1. Depth-first backtracking

The following board shows the progression of a depth-first backtracking solution-counting algorithm for the 4-queens problem. Use the buttons below to step through the algorithm (➡️, left) or reset it (🔁, right).

➡️
🔁
Solutions found: 0

Algorithm

The backtracking algorithm works by building the solution one column at a time. We place exactly one queen in each column so that we never have to worry about vertical attacks; only rows and diagonals need checking. When no safe square remains in a column, we backtrack to the previous column.

Here's what that would look like as a recursive function in Zig:


		
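A minimal sketch; the names countSolutions and isSafe are mine, and isSafe (the safety check) is shown in the next snippet.

const std = @import("std");

// Count complete placements by filling one column at a time.
// board[c] holds the row of the queen already placed in column c.
fn countSolutions(board: []usize, col: usize) u64 {
    const n = board.len;
    if (col == n) return 1; // a queen in every column: one full solution

    var count: u64 = 0;
    for (0..n) |row| {
        if (isSafe(board, col, row)) {
            board[col] = row; // place a queen ...
            count += countSolutions(board, col + 1); // ... and recurse
        }
        // nothing to undo: board[col] is simply overwritten on the next try
    }
    return count; // running out of rows is the backtracking step
}

pub fn main() void {
    var board: [8]usize = undefined;
    std.debug.print("Q(8) = {}\n", .{countSolutions(&board, 0)});
}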

Now, to determine whether a square is safe, recall that we don't need to worry about vertical attacks. Also, because we progress one column at a time, we only need to worry about attacks from previous columns. When board stores the row position \(r_i\) of each queen \(i\), checking for horizontal attacks is easy, while a diagonal attack between queens in columns \(c_i\lt c_j\) is characterised by the following equality condition: \[|r_j - r_i| = c_j-c_i.\]

Again, here's a Zig function that implements the check:


		
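A sketch of the check, assuming (as above) that board holds the rows of the queens already placed in columns 0 through col - 1.

// Can a queen be placed at (row, col), given the queens in earlier columns?
fn isSafe(board: []const usize, col: usize, row: usize) bool {
    for (0..col) |c| {
        const r = board[c];
        if (r == row) return false; // horizontal attack: same row
        const dr = if (row > r) row - r else r - row; // |r_j - r_i|
        if (dr == col - c) return false; // diagonal attack: |r_j - r_i| = c_j - c_i
    }
    return true;
}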

Results

Here is the performance I was able to achieve with the above Zig program, running on an Intel Core i5-5250U CPU at 1.6GHz.

\(n\) \(Q(n)\) time (s)
8 92 0.002
9 352 0.010
10 724 0.041
11 2'680 0.239
12 14'200 1.227
13 73'712 6.861
14 365'596 44.773
15 2'279'184 330.301

It is pretty clear that the wall clock time balloons as a function of \(n\). Solution counts for higher board sizes are collected in sequence A000170 of the OEIS, which tops out at \(n=27\).

Conclusion

Since exact results for very large \(n\) are evidently out of reach, the challenge for the rest of the post will be to get a good estimate for the solution count for larger \(n\) in reasonable time.

2. Uniform Monte Carlo integration

Suppose the queens are already on separate columns and rows, preventing horizontal and vertical threats. Picture all the ways of permuting the queens as a vast candidate space \(\mathcal{P}\). We wish to estimate the size of the solution set \(\mathcal{S}\) among the candidates.

Before we get into details, we can see how uniform Monte Carlo integration works with the following interactive simulation. Click the 🎲 buttons to generate random permutations of queens.

🎲
10🎲
100🎲

Rounding the estimate to the closest integer gives an estimate of 2 solutions after enough samples, which concurs with depth-first backtracking.

Algorithm

Uniform Monte Carlo integration estimates \(p(n):=Q(n)/n!\), the fractional size of the solution set \(\mathcal{S}\), by independently and uniformly sampling random permutations \(X_1, X_2, \dots, X_T\) from the candidates \(\mathcal{P}\) and computing the estimator \[\hat{p}(n):=\frac{1}{T}\sum_{t=1}^T \mathbf{1}_\mathcal{S}(X_t).\] This estimator is unbiased because the expected value of the indicator function \(\mathbf{1}_\mathcal{S}\) is \(p(n)\). In the following we also refer to \(p(n)\) as \(p\) for clarity.

In Zig the outline of the algorithm is like this.


    
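A rough sketch, assuming a recent Zig standard library (std.Random; older releases call it std.rand). The validity check isSolution is the O(n) verification described in the next section.

const std = @import("std");

// Uniform Monte Carlo: shuffle the board T times and count how often the
// random permutation happens to be a solution.
fn estimateQ(comptime n: usize, trials: usize, random: std.Random) f64 {
    var board: [n]usize = undefined;
    for (0..n) |c| board[c] = c; // queen in column c starts on row c

    var hits: usize = 0;
    for (0..trials) |_| {
        random.shuffle(usize, &board); // uniform random permutation
        if (isSolution(n, &board)) hits += 1;
    }

    // p-hat = hits / T, and the estimate of Q(n) is p-hat * n!
    var factorial: f64 = 1.0;
    for (1..n + 1) |k| factorial *= @as(f64, @floatFromInt(k));
    const p_hat = @as(f64, @floatFromInt(hits)) / @as(f64, @floatFromInt(trials));
    return factorial * p_hat;
}

pub fn main() void {
    var prng = std.Random.DefaultPrng.init(0);
    std.debug.print("Q(8) is roughly {d:.0}\n", .{estimateQ(8, 1_000_000, prng.random())});
}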

There is a big optimization hiding here thanks to the fact that the queens are already on separate rows and columns. Recall the condition \(|r_j-r_i|=c_j-c_i\) for queens \(i \lt j\). Naïvely checking all pairs for attacks requires \(O(n^2)\) work, but this grows costly as \(n\) increases.

Verifying a solution in O(n)

Instead of checking all pairs, scan over the queens and record which diagonals are occupied. A repeated diagonal means a conflict; no repeats means a valid solution. Using arrays of length \(2n-1\) to mark the major and minor diagonals, each lookup is \(O(1)\) and memory use is \(O(n)\). With this method, the time complexity of verification drops to \(O(n)\).

Here is what that would look like in Zig.


    
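A sketch using two marker arrays of length 2n - 1, one per diagonal direction (the name isSolution matches the sampling sketch above).

// A permutation is a solution iff no diagonal is occupied twice.
fn isSolution(comptime n: usize, board: *const [n]usize) bool {
    var major = [_]bool{false} ** (2 * n - 1); // "/" diagonals, indexed by row + col
    var minor = [_]bool{false} ** (2 * n - 1); // "\" diagonals, indexed by row - col + n - 1
    for (board, 0..) |row, col| {
        const a = row + col;
        const b = row + (n - 1) - col;
        if (major[a] or minor[b]) return false; // diagonal already occupied
        major[a] = true;
        minor[b] = true;
    }
    return true;
}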

Results

Here are the estimates for a fixed \(T=1'000'000\) candidates. The estimates are accompanied by a 95% Wilson confidence interval \((L, U)\) rounded to the closest integer.

\(n\) solutions \(\hat{p}n!\) L U time (s)
10 724 660 571 764 6.018
11 2'680 2'395 1'860 3'082 6.655
12 14'200 14'849 10'461 21'076 7.554
13 73'712 56'043 29'485 106'522 8.421
14 365'596 435'891 186'184 1'020'498 8.623
15 2'279'184 1'307'674 230'830 7'408'050 9.223
16 ? 41'845'579 11'475'330 152'592'188 10.400

While the estimates look reasonable and were computed very quickly, the confidence intervals for \(n\ge 15\) are so large that one is not really inclined to believe the estimate, especially as I was unable to compute the ground-truth number of solutions for \(n=16\) at all. To tighten the confidence intervals we need to increase \(T\), but how much?

Analysis

Say we were content with merely having high confidence in the estimated order of magnitude. That is, we would like \(T\) large enough so that \(\hat{p}\) is with at least 95% probability within a factor 10 of the true \(p\). We can deduce how many samples we need from the Chernoff bound \[\operatorname{Pr}\left[|\hat{p} - p| \geq \epsilon p\right] \leq 2\exp\left(-\frac{\epsilon^2 p T}{3}\right) \leq \delta,\] with \(\epsilon = 9\) and \(\delta=0.05\). This tells us that the probably-correct-order-of-magnitude estimate is achievable with \[T\geq \frac{3}{\epsilon^2 p}\log\left(\frac{2}{\delta}\right)\] samples. But the probability \(p\) appears in the denominator of the lower bound, and it is precisely the quantity we are trying to estimate in the first place; it has no closed form outside of an asymptotic result by Simkin (2021): \[p(n)=\frac{\left((1+o(1))\,ne^{-\alpha}\right)^n}{n!}\approx \frac{(ne^{-\alpha})^n}{n!}\approx\frac{e^{-(\alpha-1)n}}{\sqrt{2\pi n}},\] with \(\alpha\approx 1.944\), where the last approximation uses Stirling's formula. Unfortunately, it follows that we need an exponentially growing number of samples to estimate the correct order of magnitude with uniform sampling.
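To get a feel for the numbers, we can plug the asymptotic \(p(n)\) into the bound (rough, back-of-the-envelope values): \[T\gtrsim \frac{3\log(2/\delta)}{\epsilon^2\, p(n)}\approx\frac{11.1}{81\, p(n)},\] and with \(\alpha\approx 1.944\) this gives \(p(20)\approx 6\times 10^{-10}\), hence roughly \(2\times 10^{8}\) samples already for \(n=20\), and \(p(50)\approx 2\times 10^{-22}\), hence roughly \(10^{21}\) samples for \(n=50\).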

Conclusion

The fundamental problem of integration via uniform random sampling for \(n\)-queens is that the proportion of solutions is so small that for very large \(n\) it becomes unlikely to even see a single solution at all. This is something importance sampling will address.

3. Annealed Importance Sampling

Let's recap. We wanted to find the number of ways of safely placing \(n\) queens on an \(n\)-by-\(n\) chessboard. Exhaustive counting is too slow, and so is uniform Monte Carlo integration of the indicator function of the solution set.

Motivation

There are a few important facts that will let us get around the problem of vanishing \(p(n)\). Here is fact number one, stated abstractly.

  1. Given positive numbers \(A\lt B \lt C\), the fractions involved in the equation \[\frac{A}{C}=\frac{A}{B}\frac{B}{C}\] are larger on the right hand side than the left hand side.

Applying this fact to our problem, we can estimate \(Q(n)/n!\) by constructing a sequence of intermediate fractions whose sizes we can better estimate individually. To do that, we need a way to climb monotonically from \(Q(n)\) to \(n!\).

Interpolating between \(Q(n)\) and \(n!\)

Recall that \(Q(n)\) and \(n!\) may be understood as discrete integrals (sums over \(\mathcal{P}\)) of the indicator functions of \(\mathcal{S}\) and \(\mathcal{P}\), respectively. This hint leads us to fact number two.

  2. An interpolation from \(\mathbf{1}_\mathcal{P}\) to \(\mathbf{1}_\mathcal{S}\) is provided by the Boltzmann function \[X \mapsto \exp{(-\beta f(X))}\] when the inverse temperature \(\beta\) varies from \(0\) to \(\infty\), and \(f\) is a nonnegative energy function that is zero exactly on the solution set \(\mathcal{S}\).[2]

Play with the \(\beta\) slider to see how it tempers the original function, singling out the two solutions (\(n=4\)).


Our choice of energy function \(f\) simply returns the number of diagonal threats. This energy function is nice for a reason that will make sense later.

As a side note, if there are \(k\) queens in a diagonal, then that diagonal contributes \(k(k-1)/2\) threats to the energy function, and the total number of threats (energy) is the sum over all diagonals.

// Number of attacking pairs among k queens on one diagonal: k(k-1)/2.
inline fn pairs(comptime T: type, k: T) T {
    return @divExact(k * (k - 1), 2);
}

// Total number of threats: sum the pair counts over all diagonals,
// given the number of queens on each diagonal.
inline fn threats(diags: []const isize) isize {
    var sum: isize = 0;
    for (diags) |d|
        sum += pairs(isize, d);
    return sum;
}

Returning to the point, let's discuss how to use the intermediate functions to build a ladder from \(Q(n)\) to \(n!\). The discrete integral of the Boltzmann function is a normalisation constant \[Z(\beta)=\sum_{X\in \mathcal{P}}\exp{(-\beta f(X))}\] that yields the desired interpolation between \(Q(n)\) and \(n!\) since \(Q(n)=Z(\infty)\lt \dots \lt Z(0)= n!\). To apply this fact in our estimation, we create a temperature schedule \(0=\beta_1 \lt \beta_2\lt \dots \lt \beta_K\approx \infty\), and compute the number of solutions as follows: \[\frac{Q(n)}{n!}=\prod_{k=1}^{K-1} \frac{Z(\beta_{k+1})}{Z(\beta_k)}=: \prod_{k=1}^{K-1} w_k.\]
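As a tiny sanity check, take \(n=4\): \(Z(0)=4!=24\) and \(Z(\infty)=Q(4)=2\), so whatever schedule we pick, the factors must multiply to \[\prod_{k=1}^{K-1}w_k=\frac{2}{24}=\frac{1}{12};\] the schedule only decides how this overall ratio is split into pieces that are individually easy to estimate.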

Now, the schedule design matters. If \(\beta_k\) and \(\beta_{k+1}\) are close enough, then so are \(Z(\beta_{k})\) and \(Z(\beta_{k+1})\), so each factor \(w_k\) stays close to \(1\) and is easy to estimate; this is why we benefit from balancing the factors. Too close, however, and we waste our time on factors with no impact.

Designing a well-balanced temperature schedule depends on how we estimate the fractions, so we turn our attention to this problem next.

Importance sampling

Instead of uniform sampling, we switch to Boltzmann sampling. To make the switch, introduce the Boltzmann distribution \[\pi_{\beta}(X)=\frac{\exp{(-\beta f(X))}} {Z(\beta)}\] into the formula as follows: \[w_k=\sum_{X\in\mathcal{P}}\exp{\left(-(\beta_{k+1} - \beta_k)f(X)\right)}\pi_{\beta_k}(X).\] This lets us estimate the very same ratio by drawing \(M\) independent samples \(X_1,\dots, X_M\) according to \(\pi_{\beta_k}\) and computing the estimator \[\hat{w}_k=\frac{1}{M}\sum_{m=1}^M \exp{(-(\beta_{k+1}-\beta_k)f(X_m))}.\]

This switch of sampling distributions is an example of importance sampling. The new estimator is unbiased in the sense that \[\mathbb{E}_{\pi_{\beta_k}}\left[\hat{w}_{k}\right]=w_k,\] and it has lower variance than the uniform estimator since lower-energy elements are sampled more often under \(\pi_\beta\) than under the uniform distribution. Either way, our final estimator of \(Q(n)/n!\) is \[\hat{p}(n)=\prod_{k=1}^{K-1}\hat{w}_{k}.\]

But doesn't this mean we have to incur the cost of computing the distribution \(\pi_\beta\) with its \(n!\)-sized support? Actually, the answer is no.

Metropolis algorithm

To sample according to \(\pi_\beta\), we set up a Markov chain whose limiting distribution is theoretically \(\pi_\beta\). Press the 🔥 button to make a random Markov chain transition: a column swap.

🔥

By making enough of these swaps, the permutation of queens becomes more and more shuffled. As a result, the limiting distribution of permutations is uniform, corresponding to \(\pi_0\).

The Metropolis algorithm augments the chain so that the limiting distribution becomes whichever \(\pi_\beta\) we want. It consists of repeating the following two steps for as long as required. The result is a sequence of samples that tends to \(\pi_\beta\) in distribution.

  1. Propose a Markov chain transition.
  2. Accept the transition with probability \(\min\{1, \exp{(-\beta\Delta f)}\}.\)

The energy change \(\Delta f\) is the difference in energy if the transition were accepted. For our number-of-diagonal-threats energy function \(f\), the energy change is computable in \(O(1)\). Additionally, this \(f\) acts as a search heuristic that naturally leads the chain to solutions.
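As a sketch, here is what one such step could look like in Zig. The names metropolisStep and boardEnergy are mine, and this version recomputes the energy from scratch using the threats helper above; the O(1) computation of \(\Delta f\) is discussed further down.

// One Metropolis step at inverse temperature beta: propose a random column
// swap, then accept it with probability min(1, exp(-beta * df)).
fn metropolisStep(comptime n: usize, board: *[n]usize, beta: f64, random: std.Random) f64 {
    const before = boardEnergy(n, board);

    // 1. propose a transition: swap the rows of two random columns
    const i = random.uintLessThan(usize, n);
    const j = random.uintLessThan(usize, n);
    std.mem.swap(usize, &board[i], &board[j]);

    const after = boardEnergy(n, board);
    const df = after - before;

    // 2. accept with probability min(1, exp(-beta * df)); otherwise undo the swap
    if (df > 0.0 and random.float(f64) >= @exp(-beta * df)) {
        std.mem.swap(usize, &board[i], &board[j]);
        return before;
    }
    return after;
}

// Total diagonal-threat energy, recomputed from scratch in O(n) with the
// pairs/threats helpers above.
fn boardEnergy(comptime n: usize, board: *const [n]usize) f64 {
    var major = [_]isize{0} ** (2 * n - 1); // "/" diagonals, indexed by row + col
    var minor = [_]isize{0} ** (2 * n - 1); // "\" diagonals, indexed by row - col + n - 1
    for (board, 0..) |row, col| {
        major[row + col] += 1;
        minor[row + (n - 1) - col] += 1;
    }
    return @as(f64, @floatFromInt(threats(&major) + threats(&minor)));
}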

Press the 🏛️ button to run the Metropolis algorithm.

🏛️
1K🏛️

Notice how the hot distribution converges quickly, while the cold one converges slowly (try ~50k steps). Convergence is slower because bad swaps are discouraged at lower temperatures, and potentially many bad swaps are required to move between solutions.

Designing a temperature schedule

Recall that we wanted a temperature schedule that balances the fractions \(w_k\) as fairly as possible. It turns out that balancing the KL divergence of successive \(\pi_{\beta_k}\) is a simple proxy for this goal.

KL-constant temperature schedule

Suppose we aim to keep the KL divergence fixed at some \(\alpha \gt 0\). Then after running Metropolis it suffices to choose \[\Delta\beta = \sqrt{\frac{2\alpha}{\operatorname{Var}[f]}},\]

and update \(\beta \leftarrow \beta + \Delta\beta\). Here, \(\operatorname{Var}\left[f\right]\) is estimated from the Metropolis samples with Bessel correction. [3] We stop once \(\beta\) surpasses a maximum \(\beta_K := \beta^*\).

Maximum inverse temperature \(\beta^*\)

Theoretically, \(\beta^*\) is a stand-in for \(\infty\), but in practice there is a tradeoff to consider when choosing finite \(\beta^*\).

  1. Small \(\beta^*\) means \(\pi_{\beta^*}\) doesn't put enough weight on \(\mathcal{S}\). This leads to over-estimating \(p(n)\), equivalent to dropping factors \(w_{k\gg 1}\).
  2. Large \(\beta^*\) with finite \(M\): if the Metropolis chain restarts from a high-energy position at each temperature, the energy distribution is skewed upwards, and the extra factors \(\hat{w}_{k\gg 1}\lt1\) begin to drag the estimate down.

We can avoid situation (1) by increasing \(\beta\) until almost all (99.9%) of the Metropolis samples have energy zero. Additionally, we can mitigate situation (2), where the Metropolis warm-up period pushes \(\hat{w}_{k\gg1}\) below one, by re-using the final board state from the previous temperature.

Algorithm

This is the outline of annealed importance sampling.

  1. Decide on \(M\), \(\alpha\). Initialize the board and \(\beta_1=0\).
  2. For \(k=1,2,\dots\)
    1. With Metropolis, sample \(X_1,\dots,X_M\sim \pi_{\beta_k}\)
    2. Pick \(\Delta\beta\) and update \(\beta \leftarrow \beta + \Delta\beta\).
    3. Compute \(\hat{w}_k = \frac{1}{M}\sum_{m=1}^M \exp{(-\Delta\beta f(X_m))}\)
    4. If 99.9% of samples have \(f(X_m)=0\) then break.
  3. Return \(n!\prod_k \hat{w}_k\)

Here's what the outline might look like in Zig.


			
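A rough sketch, reusing the metropolisStep from earlier. The allocator plumbing and the small constant guarding against zero variance are my additions, and here each recorded sample is a single Metropolis step, which is cruder than spacing the samples out along the chain. Returning log Q(n) avoids overflowing n! at larger board sizes.

const std = @import("std");

// Annealed importance sampling: returns an estimate of log Q(n).
fn estimateLogQ(comptime n: usize, m: usize, alpha: f64, random: std.Random) !f64 {
    const allocator = std.heap.page_allocator;
    const energies = try allocator.alloc(f64, m);
    defer allocator.free(energies);

    var board: [n]usize = undefined;
    for (0..n) |c| board[c] = c;

    var beta: f64 = 0.0;
    var log_p: f64 = 0.0; // accumulates sum over k of log w_k

    while (true) {
        // 1. draw M (correlated) samples from roughly pi_beta
        var zeros: usize = 0;
        for (energies) |*e| {
            e.* = metropolisStep(n, &board, beta, random);
            if (e.* == 0.0) zeros += 1;
        }

        // 2. KL-constant step size: d_beta = sqrt(2 * alpha / Var[f])
        var mean: f64 = 0.0;
        for (energies) |e| mean += e;
        mean /= @as(f64, @floatFromInt(m));
        var variance: f64 = 0.0;
        for (energies) |e| variance += (e - mean) * (e - mean);
        variance /= @as(f64, @floatFromInt(m - 1)); // Bessel correction
        const d_beta = @sqrt(2.0 * alpha / (variance + 1e-9)); // guard against Var = 0

        // 3. w_k estimate: average of exp(-d_beta * f) over the samples
        var w: f64 = 0.0;
        for (energies) |e| w += @exp(-d_beta * e);
        log_p += @log(w / @as(f64, @floatFromInt(m)));

        beta += d_beta;

        // 4. stop once 99.9% of the samples are solutions
        if (@as(f64, @floatFromInt(zeros)) >= 0.999 * @as(f64, @floatFromInt(m))) break;
    }

    // log(n!) + sum over k of log w_k
    var log_factorial: f64 = 0.0;
    for (1..n + 1) |k| log_factorial += @log(@as(f64, @floatFromInt(k)));
    return log_factorial + log_p;
}

pub fn main() !void {
    var prng = std.Random.DefaultPrng.init(0);
    const log_q = try estimateLogQ(8, 25_000, 0.001, prng.random());
    std.debug.print("Q(8) is roughly {d:.0}\n", .{@exp(log_q)});
}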

Now let's dive into the implementation of the Metropolis algorithm. We need to compute the change of energy \(\Delta f\) due to a column swap. This only involves looking at eight diagonal indices, so it may be done in \(O(1)\).


		    
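A sketch (the names are mine): major and minor hold the number of queens currently on each diagonal, and the two helpers leave and enter, defined just below, return the change in threats on one diagonal while updating these counts in place.

// Change in the diagonal-threat energy caused by swapping the rows of the
// queens in columns i and j. The counts are updated in place, so a rejected
// move must be undone by swapping back and calling this again.
fn swapDelta(n: usize, board: []const usize, major: []isize, minor: []isize, i: usize, j: usize) isize {
    var delta: isize = 0;
    // both queens leave the diagonals of their current squares ...
    delta += leave(major, board[i] + i) + leave(minor, board[i] + (n - 1) - i);
    delta += leave(major, board[j] + j) + leave(minor, board[j] + (n - 1) - j);
    // ... and re-enter on the diagonals of the swapped squares
    delta += enter(major, board[j] + i) + enter(minor, board[j] + (n - 1) - i);
    delta += enter(major, board[i] + j) + enter(minor, board[i] + (n - 1) - j);
    return delta;
}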

But some of those indices may overlap, for instance when the two queens start on the same diagonal, so we have to count the change in threats from both queens leaving that diagonal simultaneously.


			
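One way to handle this is to let the helpers mutate the diagonal counts as they go, so that the second removal from a shared diagonal already sees the reduced count. A sketch:

// Removing one queen from a diagonal holding k queens changes the threat
// count by pairs(k - 1) - pairs(k) = -(k - 1); adding one changes it by +k.
fn leave(diags: []isize, d: usize) isize {
    diags[d] -= 1;
    return -diags[d]; // -(k - 1)
}

fn enter(diags: []isize, d: usize) isize {
    diags[d] += 1;
    return diags[d] - 1; // +k
}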

Missing here is the code computing \(\Delta\beta\), which is not very interesting.

Results

We deploy the algorithm as follows: fix \(\alpha=0.001\) and take \(M=25'000\) samples per temperature. The table also records \(K\) and \(\beta^*\) out of interest.

\(n\) solutions AIS (balanced KL) \(K\) \(\beta^*\) time (s)
4 2 2 73 15.404 5.577
5 10 10 65 5.563 5.016
6 4 5 83 6.102 6.346
7 40 41 92 8.074 7.272
8 92 97 105 10.242 8.059
9 352 359 112 \(\infty\) 8.770
10 724 850 118 10.319 9.049
11 2'680 2'737 126 10.633 9.552
12 14'200 14'563 133 \(\infty\) 10.463
13 73'712 78'710 141 \(\infty\) 11.038
14 365'596 378'424 149 11.767 11.357
15 2'279'184 2'301'984 153 \(\infty\) 11.610
16 ? 15'307'300 159 \(\infty\) 12.003

Not bad, right? To clear up any confusion: \(\beta^*=\infty\) occurs when all samples have energy zero, which would cause \(\Delta\beta\) to blow up. In my code I added a small constant in the denominator to prevent this, so it actually shows up as \(\beta^*\approx 52\).

So far, we pretended \(Q(16)\) was unknown, but in fact it is known to be 14'772'512. That makes ours a pretty good guess for a 12 second calculation! Let's push our luck and compare results for \(n=27\), the known maximum.

\(n\) solutions AIS (balanced KL) \(K\) time (s)
27 234'907'967'154'122'528 240'409'405'794'640'320 203 14.256

And now, into the unknown!

\(n\) solutions AIS (balanced KL) \(K\) time (s)
50 ? 2.982 \(\times 10^{44}\) 281 20.468
100 ? 3.079\(\times 10^{117}\) 391 27.809
200 ? 2.248 \(\times 10^{293}\) 530 37.658

Conclusion

Lots of open questions here! In particular,

  1. Is there a more efficient balance of \(M\) and \(\alpha\)?
  2. Would a finer analysis for the choice of \(\Delta\beta\) result in better estimates?
  3. How should we adjust the estimate of \(\operatorname{Var}[f]\) given the autocorrelation inherent to Metropolis sampling?

Overall, annealed importance sampling works really well for this problem, allowing us to get estimates that would be impossible under uniform Monte Carlo sampling.

Ending note

Of course, you could swap the \(n\)-queens problem for any high-dimensional estimation problem with an energy heuristic that measures solution quality. Can you think of other places you'd apply this method? Is there another puzzle you have in mind? I'm happy to chat over email at oiliossatos@gmail.com.

Footnotes


  1. Cheng Zhang & Jianpeng Ma. 2009. Counting solutions for the N-queens and Latin-square problems by Monte Carlo simulations.
  2. The appearance of physics vocabulary is explained by the connection of this method to the theory of thermodynamics. This great video is relevant: Simulating and understanding phase change | Vilas Winstein.

  3. Consider consecutive distributions \(\pi_\beta\) and \(\pi_{\beta + \Delta\beta}\). Their forward KL divergence is \[D_{KL}(\pi_\beta \| \pi_{\beta+\Delta\beta}) = \mathbb{E}_{\pi_\beta}\left[\log\frac{\pi_\beta}{\pi_{\beta+\Delta\beta}}\right].\] Plugging in definitions, we can show that \[\log\frac{\pi_\beta(X)}{\pi_{\beta + \Delta\beta}(X)}=\Delta\beta f(X) + \log Z(\beta+\Delta\beta) - \log Z(\beta).\] We are interested in simplifying this expression. Note that \(Z(\beta + \Delta\beta)\) factors as \[Z(\beta + \Delta\beta) = Z(\beta)\mathbb{E}_{\pi_\beta}\left[\exp{(-\Delta\beta f)}\right].\]

    Taking logs on both sides, we have \[\log{Z(\beta + \Delta\beta)} = \log{Z(\beta)} + \log\mathbb{E}_{\pi_\beta}\left[\exp{(-\Delta\beta f)}\right],\] where the second term is the cumulant generating function of \(-f\) under \(\pi_\beta\). Expanding it to second order in \(\Delta\beta\), in terms of familiar statistics we have \[\log{Z(\beta + \Delta\beta)} \approx \log{Z(\beta)} - \Delta\beta \mathbb{E}_{\pi_\beta}\left[f\right] + \frac{1}{2}(\Delta\beta)^2\operatorname{Var}_{\pi_\beta}\left[f\right].\]

    Substituting this into the previous formula yields the following approximation of the KL divergence: \[D_{KL}(\pi_\beta \| \pi_{\beta + \Delta\beta}) \approx \frac{1}{2}(\Delta\beta)^2\operatorname{Var}_{\pi_\beta}\left[f\right],\] which can be solved for \(\Delta \beta\).

Last edited 12/09/2025