
Random Walks on Groups

 

By Kevin Xu

Introduction

This article covers the basics of random walks on groups and may skip or skim over prerequisite material. It is advisable to look up the articles on those topics before reading this.


Recall the definition of a group:

Definition 1.1: A group G is a set closed under a binary operation satisfying

  1. Identity: There exists e\in G such that e\cdot g = g for all g\in G
  2. Inverse: For every g\in G, there exists h\in G such that g\cdot h = h\cdot g = e
  3. Associativity: For every g,h,k\in G, we have (g\cdot h)\cdot k = g\cdot (h\cdot k)

Subsets of groups that are also groups are called subgroups. For example, the set containing just the identity is the trivial subgroup, and a group is a subgroup of itself. Because a subgroup H is closed under the group operation, we can split a group into cosets of H, where each coset is the set gH for some element g\in G. Different elements g can give the same coset; in fact, two cosets are always either identical or completely disjoint.

Exercise 1.2: For a,b\in G, prove that for any subgroup H the cosets aH and bH are either the same set or contain no overlap.

Hint: What happens when we have ah_1 = bh_2 for h_1,h_2\in H?

In fact, gH is actually called a left coset, and Hg would be a right coset. However, we usually talk about cosets of normal subgroups, for which gH and Hg are the same set. Note that this is equivalent to H being closed under conjugation (i.e. a subgroup H of G is normal if and only if ghg^{-1}\in H for all g\in G, h\in H). It turns out that the set of cosets of a normal subgroup forms another group called the quotient group. One familiar quotient group is the set of remainders upon division by n, or \Z/n\Z, which is a quotient group of the integers under addition.
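
To make the coset picture concrete, here is a minimal Python sketch (the choice of \Z/12\Z and of the subgroup generated by 3 are illustrative, not from the article) that lists the cosets of a subgroup and checks that they partition the group:

    # Cosets of the subgroup H = {0, 3, 6, 9} inside Z/12Z (written additively).
    n = 12
    H = {0, 3, 6, 9}                       # the subgroup generated by 3

    cosets = {frozenset((g + h) % n for h in H) for g in range(n)}
    print(cosets)                          # three cosets: H, 1 + H, 2 + H

    # Distinct cosets are disjoint, and together they cover all of Z/12Z.
    assert all(a == b or not (a & b) for a in cosets for b in cosets)
    assert set().union(*cosets) == set(range(n))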


Random Walks

Now that we’ve introduced groups, let’s talk about probability. When we usually think of probability, we think of events such as rolling a die or flipping a coin, and associate a value from 0 to 1 to that event. We can do the same but with groups, by using a function to associate a probability to each element of a group.

Definition 1.3: A probability distribution on a group G is a function P\colon G\to [0,1] such that

    \[\sum_{g\in G} P(g) = 1.\]

Essentially, the probability of picking element g is P(g). Below are several examples of probability distributions:

Flipping a fair coin can be represented as a distribution on \Z/2\Z, or the remainders modulo 2, where both probabilities are \frac{1}{2}. Rolling a fair die is a distribution on \Z/6\Z, again with equal probabilities for each element. When a finite group has equal probabilities for every element, or P(g)=\frac{1}{|G|} for all g\in G, we say the distribution is uniform. Here |G| is the order of G, that is, the number of elements in the group.

A game where you can move forward or backward at every step with equal probability is best represented by \Z, the integers under addition, with distribution P(1)=P(-1)=\frac{1}{2} and P(g)=0 otherwise. In these non-uniform distributions, the elements that have probability zero are not important since they can never occur, so in many cases we want to work with the support of P, the set of elements that have positive probability.
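
As a small concrete example, here is a Python sketch of the forward/backward distribution just described, together with its support (the dictionary representation and exact fractions are illustrative choices):

    from fractions import Fraction

    P = {1: Fraction(1, 2), -1: Fraction(1, 2)}   # P(g) = 0 for every other integer

    def support(P):
        """The set of elements with positive probability."""
        return {g for g, p in P.items() if p > 0}

    assert sum(P.values()) == 1                   # P is a probability distribution
    print(support(P))                             # {1, -1}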

Now, just as we can construct a sequence of coin flips or dice rolls, we can construct a sequence of group elements by starting from the identity and then multiplying by an element g with probability P(g) at every step.

Definition 1.4: A random walk on a group G with associated probability distribution P is a sequence of elements starting from e such that, at each step, the walk moves from g to gh with probability P(h), for g,h\in G.
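
As a quick illustration, here is a minimal Python sketch (names are illustrative) that simulates such a walk for the forward/backward distribution on \Z from above: it starts at the identity and, at each step, multiplies (here, adds) a random element drawn from P:

    import random

    P = {1: 0.5, -1: 0.5}                  # the forward/backward distribution on Z

    def random_walk(steps, P, identity=0, op=lambda g, h: g + h):
        """Simulate a walk: from g, move to op(g, h) where h is drawn from P."""
        elements, weights = zip(*P.items())
        g, path = identity, [identity]
        for _ in range(steps):
            h = random.choices(elements, weights)[0]   # pick h with probability P(h)
            g = op(g, h)                               # move from g to gh
            path.append(g)
        return path

    print(random_walk(10, P))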

A random walk is called irreducible if, for all g,h\in G, the probability of eventually reaching h from g is positive, meaning we can always get from any element to any other element. A random walk is called aperiodic if there is a fixed time T such that for every time t>T, the probability of being at any given element after t steps is positive, meaning we can reach any state at any sufficiently large time. When both conditions are satisfied, we say that the walk is ergodic. Below are a few examples of random walks on groups:

Example 1.5: A random walk on the plane moving one unit in any direction parallel to the axes can be represented as a random walk on \Z^2 with distribution P[(0,1)]=P[(0,-1)]=P[(1,0)]=P[(-1,0)]=\frac{1}{4}.

Example 1.6: Shuffling cards is a random walk on S_n, the symmetric group. The probability distribution depends on what method is used to shuffle the cards.

One easy albeit inefficient way to shuffle a deck of n cards is by random transpositions: at each step, pick two positions in the deck uniformly and independently (they may coincide) and switch the cards in those positions. This can be represented by the distribution

    \[P(\pi) = \begin{cases} \frac{1}{n} & \pi = e \\ \frac{2}{n^2} & \pi\text{ is a transposition} \\ 0 & \text{otherwise}. \end{cases}\]
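
Here is a minimal Python sketch of one step of this shuffle (the function name is illustrative): since the two positions are chosen uniformly and independently, the identity occurs with probability n/n^2 = 1/n and each particular transposition with probability 2/n^2, matching the distribution above.

    import random

    def transposition_step(deck):
        """Swap the cards at two independently and uniformly chosen positions."""
        i = random.randrange(len(deck))
        j = random.randrange(len(deck))    # j may equal i, giving the identity
        deck[i], deck[j] = deck[j], deck[i]
        return deck

    deck = list(range(52))                 # start from the identity permutation
    for _ in range(300):                   # shuffle for a few hundred steps
        transposition_step(deck)
    print(deck)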

Exercise 1.7: Are Examples 1.5 and 1.6 ergodic random walks?

In standard probability, as long as we repeat an experiment a large number of times, the results “converge” in some sense. For example, the proportion of heads in a long run of fair coin flips will approach 50-50 no matter what the first few flips are.

In the same way, as long as we take a large number of steps in any random walk, we also approach a “stationary distribution” consisting of the probabilities of landing on each element. But to find a stationary distribution, we need the probability of being at element g after a long time k, which is quite hard to compute directly. Thus we define the convolution recursively by

    \[P\star P(g) = \sum_h P(gh^{-1})P(h)\]

and

    \[P^k = P^{k-1}\star P.\]

Basically, P^2(g) sums over all ways of first rolling an element h and then rolling gh^{-1}, which results in being at element g after two steps. Inductively, P^k(g) is then the probability of being at element g after k steps.
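
To make the convolution concrete, here is a minimal Python sketch of computing P^k by repeated convolution (the group \Z/6\Z and the particular lazy step distribution are illustrative choices, not from the article); writing the group additively, gh^{-1} becomes g - h modulo 6:

    n = 6

    def convolve(Q, P):
        """(Q * P)(g) = sum over h of Q(g h^{-1}) P(h), written additively mod n."""
        return [sum(Q[(g - h) % n] * P[h] for h in range(n)) for g in range(n)]

    P = [0.5, 0.25, 0, 0, 0, 0.25]         # P(0) = 1/2, P(1) = P(5) = 1/4
    Pk = P
    for k in range(2, 31):
        Pk = convolve(Pk, P)               # P^k = P^{k-1} * P
    print([round(p, 4) for p in Pk])       # close to the uniform distribution 1/6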

The convolutions will converge to a unique stationary distribution when the random walk is ergodic (and in fact, for a finite group this stationary distribution is uniform). However, ergodicity can be somewhat hard to establish directly.


Theorem 1.8: For a group G with probability distribution P, the random walk on G is ergodic if and only if the support of P is not contained in a proper subgroup of G or in a coset of a proper normal subgroup of G.

Proof:
Here, a proper subgroup means a subgroup that is not equal to G. Let the support of P be \Sigma. First suppose that the random walk is irreducible, so starting from e we can arrive at any element g\in G. Because we can only multiply by elements of \Sigma, in order to reach every element the support must contain a generating set for G; thus \Sigma generates G. But now, if \Sigma were contained in a proper subgroup of G, then by closure the set generated by \Sigma, which is G, would be contained in that subgroup. This is a contradiction, so \Sigma cannot be contained in a proper subgroup.

Conversely, suppose \Sigma is not contained in any proper subgroup of G. The set generated by \Sigma is closed and contains inverses, so it is a subgroup; it cannot be a proper subgroup, so it must be G. Thus \Sigma generates G. Then any element can be written as a product of elements of \Sigma (in a finite group, inverses are positive powers, so no inverses are needed), so we can always get from an element g to an element h by multiplying by such a representation of g^{-1}h. Therefore the random walk is irreducible.

Now suppose that \Sigma is contained in some coset gH of a proper normal subgroup H. Then the first n steps can be expressed as multiplying by gh_1,gh_2,\dots,gh_n in order. Thus we arrive at their product

    \[(gh_1)(gh_2)\cdots(gh_n)\]

after n steps. Now note that

    \[(gh_i)(gh_j) = g(h_ig)h_j = g(gh_i^\prime)h_j = g^2(h_i^\prime h_j)\]

because of associativity and normality. By carrying out this process repeatedly, we see that the product becomes g^n(h_1^\prime \cdots h_{n-1}^\prime h_n). Closure tells us that h_1^\prime \cdots h_{n-1}^\prime h_n must lie in H, and therefore the product lies in the coset g^nH. Since H is a proper subgroup, the set G\setminus g^nH is nonempty, so we can take an element of it and observe that we cannot arrive at this element in exactly n steps. Because n was chosen arbitrarily, the random walk cannot be aperiodic.

Conversely, suppose the random walk is not aperiodic. Since the walk is irreducible, this means that we arrive at any given element only periodically, with some period m>1. We can express this formally as the greatest common divisor

    \[m = \gcd\{k\mid \xi_1\cdots\xi_k = e,\ \xi_i\in \Sigma\}.\]

This is because two sequences of steps can arrive at the same element after different numbers of steps only if one of them, in effect, includes an extra representation of the identity as a product of elements of \Sigma. Now define

    \[\Sigma_i = \{\xi_1\cdots\xi_k\mid \xi_j\in \Sigma,\ k\equiv i\pmod m\}.\]

We prove that these sets do not overlap with each other. First note that if x\in \Sigma_i and y\in \Sigma_j, then xy is a product of a number of elements of \Sigma congruent to i+j\pmod m, and thus lies in \Sigma_{i+j}. Similarly, x^{-1}\in \Sigma_{-i}: by irreducibility x^{-1} can be written as a product of elements of \Sigma, say of length l, and concatenating it with a representation of x gives a representation of e, so l+i\equiv 0\pmod m. Now suppose two of these sets overlap, i.e. \xi_1\cdots \xi_p = \xi_1^\prime\cdots \xi_q^\prime for words of lengths p and q. Rearranging, this is

    \[e = \xi_1^\prime\cdots\xi_q^\prime\,\xi_p^{-1}\cdots\xi_1^{-1}.\]

By the definition of m, any representation of e as a product of elements of \Sigma has length divisible by m. Writing each \xi_i^{-1} as such a product of length congruent to m-1\pmod m, the right-hand side becomes a representation of e of length congruent to p(m-1)+q, so p(m-1)+q\equiv 0\pmod m, which implies p\equiv q\pmod m. Therefore \Sigma_p and \Sigma_q have no common elements unless p\equiv q\pmod m. Lastly, note that \Sigma_0 is a normal subgroup of G: if g_0,h_0\in \Sigma_0, then g_0h_0\in \Sigma_{0+0} and g_0^{-1}\in \Sigma_{-0}, and for any g\in G, say g\in \Sigma_k (every element lies in some \Sigma_k by irreducibility), we have g^{-1}g_0g \in \Sigma_{-k+0+k} = \Sigma_0. Moreover \Sigma_0 is proper, since m>1 means \Sigma_1 is nonempty and disjoint from \Sigma_0. Therefore, since \Sigma is a subset of \Sigma_1, and \Sigma_1 = x\Sigma_0 for any x\in\Sigma_1, the support \Sigma is contained in a coset of the proper normal subgroup \Sigma_0, as desired.
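
To see the theorem in a concrete case, here is a small Python check on \Z/n\Z (an illustrative example, not from the article): every subgroup of \Z/n\Z has the form d\Z/n\Z for a divisor d, and every subgroup is normal, so the support fails the criterion exactly when all of its elements are congruent modulo some divisor d>1.

    def is_ergodic_on_Zn(support, n):
        """Check the Theorem 1.8 criterion for a walk on Z/nZ with the given support."""
        for d in range(2, n + 1):
            if n % d == 0 and len({x % d for x in support}) == 1:
                return False               # support lies in a single coset of dZ/nZ
        return True

    print(is_ergodic_on_Zn({1, 5}, 6))     # False: contained in the coset 1 + {0, 2, 4}
    print(is_ergodic_on_Zn({2, 4}, 6))     # False: contained in the subgroup {0, 2, 4}
    print(is_ergodic_on_Zn({1, 2}, 6))     # True: this walk is ergodic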


Besides wanting to know what a distribution converges to, we also want to find out how many steps we need to get a close approximation of the stationary distribution.

Definition 1.9: Total variation measures how much a distribution deviates from its stationary distribution u (here the uniform distribution) after a certain number of steps, and is calculated by

    \[\|P^k-u\|_{TV} = \frac{1}{2}\sum_g \left|P^k(g)-\frac{1}{|G|}\right|.\]

When the total variation is small, commonly taken to mean below \frac{1}{4} or some other small threshold, we say that the random walk is sufficiently mixed. The mixing time is the minimum time k for which a random walk is mixed.
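
These quantities are easy to compute for small groups. Below is a minimal Python sketch (reusing the additive convolution from earlier and the threshold \frac{1}{4}; the particular lazy walk on \Z/6\Z is an illustrative choice) that computes the total variation distance to uniform and the resulting mixing time:

    n = 6

    def convolve(Q, P):
        return [sum(Q[(g - h) % n] * P[h] for h in range(n)) for g in range(n)]

    def total_variation(P):
        """||P - u||_TV, where u is the uniform distribution on Z/nZ."""
        return 0.5 * sum(abs(p - 1 / n) for p in P)

    def mixing_time(P, eps=0.25):
        """The smallest k with ||P^k - u||_TV at most eps."""
        Pk, k = P, 1
        while total_variation(Pk) > eps:
            Pk = convolve(Pk, P)
            k += 1
        return k

    P = [0.5, 0.25, 0, 0, 0, 0.25]         # the lazy walk used above
    print(mixing_time(P))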

Conclusion

It is always nice to find the mixing time of a random walk, or better bounds on it, in order to design more efficient processes. For example, Persi Diaconis and Mehrdad Shahshahani proved in 1980 that it takes 270 transpositions on a 52-card deck to ensure the total variation is less than 0.01. In comparison, the popular riffle shuffle is commonly said to need only about seven shuffles to achieve a similar “randomness.” Many real-life events can be expressed as random walks on groups, and I encourage you to explore them yourself and find their stationary distributions and mixing times!

References

  • [Dia87] Persi Diaconis. Random Walks on Groups: Characters and Geometry. Stanford University, 1987.
  • [Dia12] Persi Diaconis. Group Representations in Probability and Statistics. Harvard University, 2012.
  • [DS80] Persi Diaconis and Mehrdad Shahshahani. Generating a random permutation with random transpositions. Stanford University, 1980.
  • [Out15] Nolan Outlaw. Markov Chains, Random Walks, and Card Shuffling. Illinois Institute of Technology, 2015.