Mixing Times

Kevin Xu

Suppose we have a Markov chain: a sequence of random states in which the next state depends only on the current state, not on any of the past steps. One example is a frog jumping around on the x-axis, moving one unit in either direction at every step. At any position the frog doesn't care how it got there; it simply moves left or right from its current spot. For another example, imagine that the axis is now bounded; this is still a Markov chain, because even at the endpoints the frog ignores its past — it just happens to have only one direction available.
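The unbounded frog can be simulated in a few lines of Python (a toy sketch; the function name is my own):

```python
import random

def frog_walk(steps, start=0):
    """Simulate the frog on the x-axis: each move is one unit left or right,
    chosen without any reference to how the frog reached its current spot."""
    position = start
    path = [position]
    for _ in range(steps):
        position += random.choice([-1, 1])  # depends only on the present state
        path.append(position)
    return path

print(frog_walk(10))
```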


We can represent a Markov chain using a matrix by letting the entry in the ith row and jth column be p(i,j), the probability of moving from state i to state j in one step. This matrix is called the transition matrix, and by repeatedly multiplying it by itself we get the probabilities p_n(i,j), where the (i,j) entry of T^n is the probability of reaching state j from i in exactly n steps.
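For instance, here is a sketch of the transition matrix for the frog on a bounded line with 5 positions (my own small example), together with its square:

```python
import numpy as np

# Frog on a bounded line with 5 positions (the second example above):
# at the ends it must move inward; elsewhere it moves left/right with prob 1/2.
n = 5
T = np.zeros((n, n))
T[0, 1] = 1.0          # left end: forced right
T[n - 1, n - 2] = 1.0  # right end: forced left
for i in range(1, n - 1):
    T[i, i - 1] = 0.5
    T[i, i + 1] = 0.5

# The (i, j) entry of T^2 is the probability of going from i to j in 2 steps.
T2 = np.linalg.matrix_power(T, 2)
print(T2[0])  # from position 0: back to 0 or on to 2, each with prob 1/2
```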

Exercise: Verify the above statement.

As n\to\infty, each row of T^n converges to the stationary distribution, which tells us the long-run fraction of time the chain spends in each state. For a random walk like our frog examples, this limiting distribution is uniform (specifically, it will be whenever the chain is ergodic and symmetric). We would like to measure how long this convergence takes. Taken literally the answer would just be \infty, so instead we ask how long it takes until the distance to stationarity is sufficiently small. Mathematicians usually take this to mean that the largest total variation distance between a row of T^n and the stationary distribution is no greater than \frac{1}{4}. The smallest exponent n of the transition matrix for which this occurs is called the mixing time.
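Here is a minimal sketch of computing the mixing time numerically (my own example: the frog on a cycle of 7 positions, previewing the next section; odd length keeps the chain aperiodic, since on an even cycle the walk alternates parities and never mixes):

```python
import numpy as np

def mixing_time(T, eps=0.25):
    """Smallest t such that every row of T^t is within total variation
    distance eps of the uniform stationary distribution."""
    n = T.shape[0]
    pi = np.full(n, 1.0 / n)
    Tt = np.eye(n)
    for t in range(1, 10_000):
        Tt = Tt @ T
        # total variation distance = half the L1 distance, worst case over rows
        if 0.5 * np.abs(Tt - pi).sum(axis=1).max() <= eps:
            return t
    return None

# Frog on a cycle of 7 positions: step +1 or -1 mod 7, each with prob 1/2.
n = 7
T = np.zeros((n, n))
for i in range(n):
    T[i, (i - 1) % n] += 0.5
    T[i, (i + 1) % n] += 0.5
print(mixing_time(T))
```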

Mixing Times on Cyclic Groups

In this section, we consider a reversible ergodic Markov chain with generators \{g_1,g_2,\dots,g_m\} on the cyclic group \mathbb Z_n. What this means is that, on the residues modulo n, a frog starts from 0 and at every step jumps forward or backward by g_i units for some i, each of the 2m possible moves being equally likely. For example, if 1 is a generator, the frog could move to 1 or n-1 on the first step. Our question is:

Which generating sets give us the highest and lowest mixing times?

It turns out that the mixing time is controlled by the second-largest eigenvalue \lambda_2 of the transition matrix. Specifically, the bound is

    \[\frac{\lambda_2}{1-\lambda_2}\log\frac{1}{2\epsilon}\le t_{mix}(\epsilon)\le \log\left(\frac{n}{\epsilon}\right)\frac{1}{1-\lambda_2}\]

where \epsilon is usually taken to be \frac{1}{4}. In order to find the best and worst mixing times, we just find the maximum and minimum values of \lambda_2 over generating sets for every n. This isn't exactly correct, but it works for the most part because the bound is quite tight.
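To see the bound in action, here is a hedged numerical sketch (the lazy walk on \mathbb Z_9 is my own example; the added 1/2 holding probability makes all eigenvalues nonnegative, so that \lambda_2 alone governs convergence, which is what the bound above assumes):

```python
import numpy as np

# Lazy frog on Z_9: stay put with probability 1/2, else step +/-1 mod 9.
n, eps = 9, 0.25
T = np.eye(n) / 2
for i in range(n):
    T[i, (i - 1) % n] += 0.25
    T[i, (i + 1) % n] += 0.25

lam2 = np.sort(np.linalg.eigvalsh(T))[-2]  # second-largest eigenvalue
lower = lam2 / (1 - lam2) * np.log(1 / (2 * eps))
upper = np.log(n / eps) / (1 - lam2)

# Direct computation of the mixing time for comparison.
pi = np.full(n, 1.0 / n)
Tt, t_mix = np.eye(n), None
for t in range(1, 10_000):
    Tt = Tt @ T
    if 0.5 * np.abs(Tt - pi).sum(axis=1).max() <= eps:
        t_mix = t
        break
print(lower, t_mix, upper)  # t_mix should land between the two bounds
```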

Now how do we find the eigenvalues of the transition matrix T? Well, if we have some eigenfunction f, then since Tf = \lambda f we get

    \[Tf(\zeta^j) = \lambda f(\zeta^j)\]

where \zeta is a primitive nth root of unity. But note that

    \[Tf(\zeta^j) = \frac{1}{2m}\sum_i \left(f(\zeta^{j+g_i})+f(\zeta^{j-g_i})\right),\]

so if we define f(\zeta^j) = \zeta^{jk}, then we have

    \[Tf(\zeta^j) = \frac{\zeta^{jk}}{m}\sum_i \left(\frac{\zeta^{g_ik}+\zeta^{-g_ik}}{2}\right) = \frac{f(\zeta^j)}{m}\sum_i \cos\frac{2\pi g_ik}{n},\]

which gives us our eigenvalues, one for each k=0,1,\dots,n-1. Noting that \cos x = 1-\frac{x^2}{2}+O(x^4), it suffices to find the maximum and minimum values of \sum_i g_i^2\pmod n for large enough n.
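We can sanity-check the cosine formula numerically. The sketch below (my own choice of n = 11 and generators \{1, 3\}) compares the predicted eigenvalues against those computed directly from the transition matrix:

```python
import numpy as np

n, gens = 11, [1, 3]  # Z_11 with generators {1, 3} (an example choice)
m = len(gens)

# Transition matrix: jump +/- g_i, each of the 2m moves with prob 1/(2m).
T = np.zeros((n, n))
for i in range(n):
    for g in gens:
        T[i, (i + g) % n] += 1 / (2 * m)
        T[i, (i - g) % n] += 1 / (2 * m)

# Eigenvalues predicted by the formula: (1/m) * sum_i cos(2*pi*g_i*k/n).
predicted = sorted(
    sum(np.cos(2 * np.pi * g * k / n) for g in gens) / m for k in range(n)
)
numerical = sorted(np.linalg.eigvalsh(T))  # T is symmetric, eigenvalues real
print(np.allclose(predicted, numerical))
```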

Let's try to make this sum congruent to 1 modulo n, and impose the relation

    \[g_i = ig_1\]

for 1\le i<m. This may be a little strange when n is composite, but for now we'll set that aside and allow one extra free term g_m = cg_1. Then we have

    \[\left(\frac{m(m-1)(2m-1)}{6}+c^2\right)g_1^2\equiv 1\pmod n,\]

so in order to have a solution we need the coefficient to be a quadratic residue modulo n. Deciding when this happens is hard in general, since quadratic residues are difficult to control. Therefore we turn to an easy alternative: if the coefficient is itself a perfect square, then it is obviously a quadratic residue!

Suppose that this square is d^2, i.e. \frac{m(m-1)(2m-1)}{6}+c^2 = d^2. Rearranging gives

    \[(d+c)(d-c) = \frac{m(m-1)(2m-1)}{6},\]

so, choosing the factorization with d-c=1, we immediately get the solutions

    \[(d,c) = \left(\frac{(m+1)(2m^2-5m+6)}{12},\frac{(m-2)(2m^2+m+3)}{12}\right).\]

And substituting this value of c back gives

    \[g_1 \equiv \pm \frac{12}{2m^3-3m^2+m+6}\pmod n,\]

where the fraction denotes multiplication by the inverse of 2m^3-3m^2+m+6 modulo n, giving us an explicit solution. Some finer details need to be hammered out, such as 2m^3-3m^2+m+6 possibly not being coprime to n, but notice that we can always choose other rational pairs (d,c) to get different g_1 values. This concludes the construction of a generating set that attains the minimum residue. Further extensions can be done similarly, and are listed below as exercises for the reader.
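As a sanity check, here is a small Python sketch of the construction (the choices m = 7 and n = 101 are my own; for this m the formulas for d and c happen to give integers, and d is invertible modulo 101):

```python
def generating_set(m, n):
    """Sketch of the construction above: g_i = i*g1 for 1 <= i < m, plus the
    extra term g_m = c*g1, with (d, c) given by the closed form. Taking g1
    to be the inverse of d mod n makes sum(g_i^2) = (d*g1)^2 = 1 (mod n).
    Assumes d and c are integers for this m and gcd(d, n) = 1."""
    d = (m + 1) * (2 * m * m - 5 * m + 6) // 12
    c = (m - 2) * (2 * m * m + m + 3) // 12
    g1 = pow(d % n, -1, n)  # modular inverse; ValueError if gcd(d, n) != 1
    return [(i * g1) % n for i in range(1, m)] + [(c * g1) % n]

gens = generating_set(7, 101)
print(gens, sum(g * g for g in gens) % 101)  # the sum of squares is 1 mod 101
```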

Exercise: Find a generating set that maximizes the sum of squares.

Exercise: Here we tackled the reversible case. What are the minimum and maximum mixing times in the general case, where the frog can jump forward by g_i but not backward?

My research focuses on the general case, and if you wish to talk more about this topic, feel free to use the email provided on the team page.

Thanks for reading!
Kevin