RANDOM MATRICES AND FREE PROBABILITY
JACOB CAMPBELL
Notation
Let M(ℝ) be the set of probability measures on ℝ, and let M_c(ℝ) be the set of probability measures on ℝ with compact support. Let ℍ⁺ and ℍ⁻ be the upper and lower half-planes, respectively.
1. GUE and genus expansion
1.1. Moments of gaussians.
Recall the following distribution:
Definition 1.1. A random variable X has a standard gaussian distribution if it has density
\[ f(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2} \]
for t ∈ ℝ. This is the case of mean 0 and variance 1.
Actually, we care about all the moments:
Proposition 1.2. If X is standard gaussian, then
\[ \mathbb{E}(X^{2k}) = (2k-1)!! \quad\text{and}\quad \mathbb{E}(X^{2k-1}) = 0 \]
for k ≥ 1.
Definition 1.3 (Set partitions). A partition of the set [n] := {1,…,n} is a collection π = {V₁,…,V_p} of blocks, which are disjoint non-empty subsets of [n], with V₁ ∪ ⋯ ∪ V_p = [n]. The set of partitions of [n] will be denoted by P(n). A pair partition is a partition whose blocks all have size 2; the set of pair partitions of [n] will be denoted by P₂(n). Note that P₂(n) = ∅ when n is odd.
Let ϕ : P(n) → S_n be the map which turns blocks into cycles, with the usual order inherited from [n]. This map is obviously injective, and the image of P₂(n) under ϕ is the set of permutations in S_n with order 2 and no fixed points. We will refer interchangeably to a pair partition π and its image ϕ(π) in S_n; notice that in the case of pair partitions, the arbitrary choice of order in the definition of ϕ does not actually matter.
Example 1.4.
\[ \phi(\{\{1,4\},\{2,5\},\{3\}\}) = (1,4)(2,5)(3) \in S_5. \]
Proposition 1.5. We have |P₂(2k)| = (2k−1)!! for k ≥ 1.
Idea of proof. When you choose what gets paired with 1, you have 2k−1 choices. Next, you have 2k−3 choices. This goes on to give you (2k−1)(2k−3)⋯3·1 = (2k−1)!! choices in total. Exercise: formalize this idea. □
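Here is a minimal computational sketch of Proposition 1.5 (the helper names are mine): it enumerates pair partitions by always pairing the smallest remaining element, which is exactly the counting argument above, and compares the count with (2k−1)!!.

```python
# Enumerate pair partitions of {1,...,2k} and check |P_2(2k)| = (2k-1)!!.
def pair_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pairs in pair_partitions(remaining):
            yield [(first, partner)] + pairs

def double_factorial(n):
    return 1 if n <= 0 else n * double_factorial(n - 2)

for k in range(1, 6):
    count = sum(1 for _ in pair_partitions(list(range(1, 2 * k + 1))))
    assert count == double_factorial(2 * k - 1)
    print(k, count)
```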
Theorem 1.6 (Wick formula). Let X₁,…,X_n be gaussian with mean 0 and covariance matrix Σ. Then for 𝐢 : [k] → [n], we have
\[ \mathbb{E}(X_{\mathbf{i}(1)}\cdots X_{\mathbf{i}(k)}) = \sum_{\pi\in P_2(k)} \prod_{(r,s)\in\pi} \mathbb{E}(X_{\mathbf{i}(r)}X_{\mathbf{i}(s)}). \]
Definition 1.7 (Complex gaussian). A standard complex gaussian variable is obtained by letting X and Y be independent standard real gaussians and letting Z = (X + iY)/√2. This has mean 0 and variance 1:
\[ \mathbb{E}(Z) = 0 \quad\text{and}\quad \mathbb{E}(|Z|^2) = 1. \]
Proposition 1.8. Let Z be a standard complex gaussian variable. Then
\[ \mathbb{E}(Z^m \overline{Z}^n) = \delta_{m=n}\, m! \]
for m, n ≥ 0.
1.2. Self-adjoint gaussian matrix.
Definition 1.10 (Gaussian unitary ensemble). Let A = (a_{ij})_{i,j=1}^N be an N × N matrix where
- the diagonal entries a_{ii} are iid real gaussian with mean 0 and variance 1/N,
- the entries a_{ij} for 1 ≤ i < j ≤ N are iid complex gaussian with mean 0 and variance 1/N, independent from the diagonal,
- a_{ij} = \overline{a_{ji}} for 1 ≤ j < i ≤ N.
The point of this definition is that A is self-adjoint with independent gaussian entries, except as required by the self-adjointness. The choice of variance 1/N is for normalization and will play an important role when we send N → ∞.
Definition 1.11. Let (Ω, F, P) be a probability space and for 1 ≤ i, j ≤ N, let a_{ij} : Ω → ℂ be a random variable. Assume that A = (a_{ij})_{i,j} is self-adjoint. This random matrix produces a random probability measure
\[ \nu_A := \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i} \]
on ℝ, where λ₁,…,λ_N are the eigenvalues of A. This is called the empirical spectral distribution (or ESD) of A.
The average eigenvalue distribution of A, denoted by μ_A, is simply the mean of ν_A: define μ_A by
\[ \int_{\mathbb{R}} f(x)\,d\mu_A(x) = \mathbb{E}\left(\int_{\mathbb{R}} f(x)\,d\nu_A(x)\right) \]
for bounded measurable f. This gives us a nice way to access the moments:
\[ \int_{\mathbb{R}} x^m\,d\mu_A(x) = \mathbb{E}\left(\int_{\mathbb{R}} x^m\,d\nu_A(x)\right) = \mathbb{E}\left(\frac{1}{N}\sum_{i=1}^N \lambda_i^m\right) = \mathbb{E}\,\mathrm{tr}_N(A^m). \]
Theorem 1.12 (Wigner’s semicircle law). Let A be the random matrix from Definition 1.10. Then μ_A → μ weakly as N → ∞, where dμ = f dx with
\[ f(x) = \begin{cases} \frac{1}{2\pi}\sqrt{4-x^2} & \text{if } x \in [-2,2] \\ 0 & \text{otherwise.} \end{cases} \]
In this class and the next, we will prove Theorem 1.12 using moments. The moments of the average eigenvalue distribution turn out to be encoded by a combinatorially rich sequence of polynomials – related to the topological genus of certain surfaces – evaluated at 1/N. We will come back to the latter connection later on; for now, we will only be concerned with the leading order.
Notation 1.14. Let γn = (1,…,n).
We will use the notation #(⋅)
for the number of disjoint cycles in a permutation (including singletons/fixed points).
Theorem 1.15 (Genus expansion). We have
\[ \mathbb{E}(\mathrm{tr}_N(A^m)) = \sum_{\pi\in P_2(m)} \left(\frac{1}{N}\right)^{\frac{m}{2}+1-\#(\gamma_m\pi)} \]
for m ≥ 1. When m is odd, the formula above should be read as an empty sum, i.e. as 0.
The name comes from a topological interpretation of the exponent: writing m = 2k, one has k + 1 − #(γ_{2k}π) = 2g_π, where
- g_π is the genus of the surface obtained from a polygon with 2k sides by gluing the sides together in pairs according to π;
- equivalently, g_π is the smallest possible genus for which the following can be done: put the elements of [2k] on a circle clockwise, make that circle the boundary of a surface with genus g_π, and draw π on the surface without crossings.
This is where the phrase “genus expansion” comes from.
Proof of Theorem 1.15. We have
\begin{align*}
\mathbb{E}(\mathrm{tr}_N(A^m)) &= \frac{1}{N}\sum_{\mathbf{i}:[m]\to[N]} \mathbb{E}(a_{\mathbf{i}(1)\mathbf{i}(2)}\, a_{\mathbf{i}(2)\mathbf{i}(3)} \cdots a_{\mathbf{i}(m)\mathbf{i}(1)}) \\
&= \frac{1}{N}\sum_{\mathbf{i}:[m]\to[N]} \sum_{\pi\in P_2(m)} \prod_{(r,s)\in\pi} \mathbb{E}(a_{\mathbf{i}(r)\mathbf{i}(r+1)}\, a_{\mathbf{i}(s)\mathbf{i}(s+1)}) \\
&= \frac{1}{N}\sum_{\mathbf{i}:[m]\to[N]} \sum_{\pi\in P_2(m)} \prod_{(r,s)\in\pi} \delta_{\mathbf{i}(r)=\mathbf{i}(s+1)}\, \delta_{\mathbf{i}(r+1)=\mathbf{i}(s)}\, \frac{1}{N} \\
&= \frac{1}{N}\sum_{\pi\in P_2(m)} \sum_{\substack{\mathbf{i}:[m]\to[N] \\ \mathbf{i}=\mathbf{i}\circ\gamma_m\circ\pi}} \left(\frac{1}{N}\right)^{m/2} \\
&= \left(\frac{1}{N}\right)^{\frac{m}{2}+1} \sum_{\pi\in P_2(m)} |\{\mathbf{i}:[m]\to[N] : \mathbf{i}=\mathbf{i}\circ\gamma_m\circ\pi\}| \\
&= \left(\frac{1}{N}\right)^{\frac{m}{2}+1} \sum_{\pi\in P_2(m)} N^{\#(\gamma_m\pi)} \\
&= \sum_{\pi\in P_2(m)} \left(\frac{1}{N}\right)^{\frac{m}{2}+1-\#(\gamma_m\pi)}
\end{align*}
(indices read mod m), since a map 𝐢 : [m] → [N] with 𝐢 = 𝐢 ∘ γ_m ∘ π amounts to a choice of label from [N] for each cycle in γ_mπ. □
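As a sanity check, here is a minimal numerical sketch (all function names and the sampling normalization are mine): it evaluates the genus expansion by brute force over pair partitions and compares it with a Monte Carlo average of tr_N(A^m) over sampled GUE matrices.

```python
import numpy as np

# Brute-force genus expansion vs. Monte Carlo GUE moments (Definition 1.10).
def pair_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for pairs in pair_partitions(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + pairs

def cycle_count(perm):  # perm as a dict j -> perm(j)
    seen, count = set(), 0
    for j in perm:
        if j not in seen:
            count += 1
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def genus_expansion(m, N):
    if m % 2:
        return 0.0
    gamma = {j: j % m + 1 for j in range(1, m + 1)}      # gamma_m = (1,...,m)
    total = 0.0
    for rho in pair_partitions(list(range(1, m + 1))):
        pi = {r: s for (r, s) in rho}
        pi.update({s: r for (r, s) in rho})
        gp = {j: gamma[pi[j]] for j in pi}               # the permutation gamma_m pi
        total += (1 / N) ** (m / 2 + 1 - cycle_count(gp))
    return total

def sample_gue(N, rng):
    X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (X + X.conj().T) / (2 * np.sqrt(N))           # entry variance 1/N

rng = np.random.default_rng(0)
N, m = 6, 4
mc = np.mean([(np.trace(np.linalg.matrix_power(sample_gue(N, rng), m)) / N).real
              for _ in range(4000)])
print(genus_expansion(m, N), mc)  # should agree up to Monte Carlo error
```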
1.3. Exercises.
Exercise 1.17. In this exercise, you will show that the moments of a standard gaussian variable count pair partitions.
- Let X be a standard gaussian variable. Prove that
\[ \mathbb{E}(X^{2k}) = (2k-1)!! \quad\text{and}\quad \mathbb{E}(X^{2k-1}) = 0 \]
for all k ≥ 1. Use integration by parts to find a recursion.
- Prove that |P₂(2k)| = (2k−1)!! by putting P₂(2k) in bijection with a set of cardinality (2k−1)·|P₂(2k−2)|.
Exercise 1.18. In this exercise, you will prove the Wick formula: if X₁,…,X_n are gaussian with mean 0 and covariance matrix Σ, then
\[ \mathbb{E}(X_{\mathbf{i}(1)}\cdots X_{\mathbf{i}(k)}) = \sum_{\pi\in P_2(k)} \prod_{(r,s)\in\pi} \mathbb{E}(X_{\mathbf{i}(r)}X_{\mathbf{i}(s)}) \]
for all 𝐢 : [k] → [n].
- Assume Σ is diagonal and prove the claim.
- Prove the claim in general by diagonalizing Σ and using the multilinearity of the claimed formula.
Exercise 1.19 ([12, Exercise 1.6]). Let Z be a standard complex gaussian variable. Show that the moments are
\[ \mathbb{E}(Z^m \overline{Z}^n) = \begin{cases} m! & \text{if } m = n \\ 0 & \text{otherwise.} \end{cases} \]
Hint: start by showing that
\[ \mathbb{E}(Z^m \overline{Z}^n) = \frac{1}{\pi}\int_{\mathbb{R}^2} (t_1+it_2)^m (t_1-it_2)^n\, e^{-(t_1^2+t_2^2)}\,dt_1\,dt_2 \]
and then switch to polar coordinates to show that
\[ \mathbb{E}(Z^m \overline{Z}^n) = \frac{1}{\pi}\int_0^{2\pi}\!\!\int_0^\infty r^{m+n+1}\, e^{i\theta(m-n)}\, e^{-r^2}\,dr\,d\theta. \]
Use this to prove the claim.
2. Semicircle law and noncommutative probability spaces
Recall the theorem from last class:
Theorem 1.15 (Genus expansion). We have
\[ \mathbb{E}(\mathrm{tr}_N(A^m)) = \sum_{\pi\in P_2(m)} \left(\frac{1}{N}\right)^{\frac{m}{2}+1-\#(\gamma_m\pi)} \]
for m ≥ 1. When m is odd, the formula above should be read as an empty sum, i.e. as 0.
What happens when we send N → ∞? Clearly, for the RHS of Theorem 1.15 to converge, the exponents have to be non-negative; if this is indeed the case, the only summands that will survive are the ones with m/2 + 1 − #(γ_mπ) = 0.
Definition 2.1. A partition π ∈ P(n) is said to be noncrossing if the following situation never occurs: there are some blocks V, W ∈ π with V ≠ W, and some a, c ∈ V and b, d ∈ W with a < b < c < d. The set of noncrossing partitions of [n] is denoted by NC(n).
Theorem 2.2 ([3]). We have #(γ2kπ) ≤ k + 1
for all π ∈ P2(2k).
Moreover, we have equality if and only if π
is noncrossing.
We will come back to this in a moment. First, let’s use it to find the limiting moments
of μA.
Notation 2.3. Write NC2(m)
for the non-crossing pair partitions of [m].
When m
is odd, NC2(m)
is of course empty.
Proposition 2.4. We have
\[ |NC_2(2k)| = \frac{1}{k+1}\binom{2k}{k} \quad\text{and}\quad |NC_2(2k-1)| = 0 \]
for all k ≥ 1.
Definition 2.5. The k-th Catalan number is Cat(k) := \frac{1}{k+1}\binom{2k}{k}.
Corollary 2.6. We have
\[ \lim_{N\to\infty} \mathbb{E}(\mathrm{tr}_N(A^{2k})) = \mathrm{Cat}(k) \quad\text{and}\quad \lim_{N\to\infty} \mathbb{E}(\mathrm{tr}_N(A^{2k-1})) = 0 \]
for k ≥ 1.
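For small k, Proposition 2.4 can be checked by direct enumeration. A minimal sketch (the helper names are mine): a pairing has a crossing exactly when it contains pairs (a,c) and (b,d) with a < b < c < d.

```python
from math import comb

# Count noncrossing pairings of [2k] and check |NC_2(2k)| = Cat(k).
def pair_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        for pairs in pair_partitions(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + pairs

def is_noncrossing(pairs):  # each pair (a, c) already satisfies a < c
    return not any(a < b < c < d for (a, c) in pairs for (b, d) in pairs)

for k in range(1, 7):
    nc = sum(is_noncrossing(rho)
             for rho in pair_partitions(list(range(1, 2 * k + 1))))
    assert nc == comb(2 * k, k) // (k + 1)
    print(k, nc)
```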
2.1. Which pairings survive in the limit?
Recall the core combinatorial theorem:
Theorem 2.2 ([3]). We have #(γ2kπ) ≤ k + 1
for all π ∈ P2(2k).
Moreover, we have equality if and only if π
is noncrossing.
To see the inequality, we can make a simple observation about permutations:
Lemma 2.7. Let α ∈ S_n and let τ = (i, j) be a transposition. Then
\[ \#(\alpha\tau) = \begin{cases} \#(\alpha)+1 & \text{if } i \text{ and } j \text{ are in the same cycle of } \alpha \\ \#(\alpha)-1 & \text{if } i \text{ and } j \text{ are in different cycles of } \alpha. \end{cases} \]
For example, let k = 3 and π = (1,4)(2,3)(5,6). Then we start at (1,2,3,4,5,6) with 1 cycle. Next, we multiply by (1,4):
\[ (1,2,3,4,5,6)(1,4) = (1,5,6)(2,3,4) \]
which has two cycles. Next,
\[ (1,2,3,4,5,6)(1,4)(2,3) = (1,5,6)(2,4)(3) \]
which has three cycles. Finally,
\[ (1,2,3,4,5,6)(1,4)(2,3)(5,6) = (1,5)(2,4)(3)(6) \]
which has four cycles. This is the maximal situation: each pair splits a cycle in two.
On the other hand, now consider π = (1,5)(2,3)(4,6). Then we again start at (1,2,3,4,5,6) with 1 cycle. Then
\[ (1,2,3,4,5,6)(1,5) = (1,6)(2,3,4,5) \]
has two cycles, as before. Next,
\[ (1,2,3,4,5,6)(1,5)(2,3) = (1,6)(2,4,5)(3) \]
which is another splitting step (notice we haven’t arrived at the crossing yet) giving us three cycles. The last step is different:
\[ (1,2,3,4,5,6)(1,5)(2,3)(4,6) = (1,6,5,2,4)(3) \]
which is back down to two cycles. This is the generic situation: we can have both splits and merges.
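These step-by-step multiplications are easy to reproduce by machine. A minimal sketch (conventions are mine: permutations as dicts, and στ means “apply τ first”):

```python
# Multiply gamma_6 by the pairs of pi one transposition at a time and
# watch the cycle count split or merge (Lemma 2.7).
def compose(sigma, tau):
    return {j: sigma[tau[j]] for j in tau}

def cycles(perm):
    seen, out = set(), []
    for j in sorted(perm):
        if j not in seen:
            cyc = []
            while j not in seen:
                seen.add(j)
                cyc.append(j)
                j = perm[j]
            out.append(tuple(cyc))
    return out

def transposition(i, j, n):
    t = {x: x for x in range(1, n + 1)}
    t[i], t[j] = j, i
    return t

n = 6
gamma = {j: j % n + 1 for j in range(1, n + 1)}
for pairing in ([(1, 4), (2, 3), (5, 6)], [(1, 5), (2, 3), (4, 6)]):
    perm = gamma
    print('pi =', pairing)
    for (i, j) in pairing:
        perm = compose(perm, transposition(i, j, n))
        print('  after', (i, j), '->', cycles(perm))
```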
Analyzing the case of equality can probably be done directly, but it is really more insightful to
put all this in a proper algebraic framework. Let’s work with general partitions and permutations,
with size n;
the results we need for pairings will fall out as special cases.
Definition 2.9. For α ∈ Sn,
let |α|
be the length of α:
the minimal number of transpositions needed to factor α.
For α,β ∈ Sn,
write d(α,β) := |α−1β|.
Proposition 2.10. Notation as above. Then
- |α| = n − #(α)
for all α ∈ Sn;
- (Sn,d)
is a metric space.
Proof. For (1), use Lemma 2.7. (2) is a very direct and straightforward verification
of axioms. □
Proposition 2.11 ([3]). Let ϕ : P(n) → S_n be the injective map which turns blocks into cycles. Let
\[ S_{NC}(\gamma_n) := \{\alpha \in S_n : d(e,\alpha) + d(\alpha,\gamma_n) = d(e,\gamma_n)\}. \]
Then S_{NC}(γ_n) is the range of ϕ|_{NC(n)}.
Proof. Finicky induction but elementary. See [3, Theorem 1]. □
Proof of Theorem 2.2. In (S_{2k}, d), for π ∈ P₂(2k), the triangle inequality gives
\[ d(e,\gamma_{2k}) \le d(e,\pi) + d(\pi,\gamma_{2k}). \]
Unpacking notation, this translates to
\[ 2k - 1 \le (2k - k) + (2k - \#(\gamma_{2k}\pi)) \]
which simplifies to #(γ_{2k}π) ≤ k + 1. The case of equality, i.e.
\[ d(e,\gamma_{2k}) = d(e,\pi) + d(\pi,\gamma_{2k}), \]
corresponds to π being noncrossing due to Proposition 2.11. □
2.2. Moment problem.
Question 2.12. We just showed that the moments of the average eigenvalue
distribution of a GUE converge to the Catalan numbers. How do we know this is
a moment sequence, and how do we know this determines the weak limit of the
average eigenvalue distribution?
The Catalan numbers appear as the moments of a particular distribution:
Proposition 2.13. Let dμ = f(x) dx where
\[ f(x) = \begin{cases} \frac{1}{2\pi}\sqrt{4-x^2} & \text{if } x \in [-2,2] \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ \int_{\mathbb{R}} x^{2k}\,d\mu(x) = \frac{1}{k+1}\binom{2k}{k} \quad\text{and}\quad \int_{\mathbb{R}} x^{2k-1}\,d\mu(x) = 0 \]
for all k ≥ 1.
Okay, so we can guess that maybe the average eigenvalue distribution of a
GUE converges to the semicircle distribution. But we just found the moments
match – a priori this could just be a funny coincidence. So we can refocus
Question 2.12:
Question 2.12′. If (μ_n)_{n≥1} is a sequence in M(ℝ) and μ ∈ M(ℝ) has
\[ \lim_{n\to\infty} \int_{\mathbb{R}} x^m\,d\mu_n(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1, then does μ_n converge to μ in some meaningful sense at the level of measures?
The answer to Question 2.12′ is a qualified “yes”:
Theorem 2.14. Let μ be a probability measure on ℝ with compact support.
- If ν is another probability measure on ℝ with
\[ \int_{\mathbb{R}} x^m\,d\nu(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1, then μ = ν.
- If (μ_n)_{n≥1} is a sequence of probability measures on ℝ with
\[ \lim_{n\to\infty} \int_{\mathbb{R}} x^m\,d\mu_n(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1, then μ_n → μ weakly as n → ∞.
Proof. Big exercise. See [4, Theorems 30.1 & 30.2]. □
Proof of Theorem 1.12. Let μ_N be the average eigenvalue distribution of an N × N GUE, and let μ be the semicircle distribution with radius 2. We have already proved that
\[ \lim_{N\to\infty} \int_{\mathbb{R}} x^m\,d\mu_N(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1. Since μ has compact support, Theorem 2.14 shows that μ_N → μ weakly as N → ∞. □
2.3. Noncommutative probability spaces.
Definition 2.15. A ∗-algebra
is a unital associative algebra over ℂ
with an operation 𝒜 →𝒜 : a↦a∗
which is conjugate-linear and has (a∗)∗ = a
and (𝑎𝑏)∗ = b∗a∗
for all a,b ∈𝒜.
An element a ∈𝒜
is said to be positive if a = b∗b
for some b ∈𝒜.
Definition 2.16. A ∗-probability
space is a pair (𝒜,φ)
where 𝒜
is a ∗-algebra
and φ
is a linear functional on 𝒜
with φ(a∗a) ≥ 0
for all a ∈𝒜,
and φ(1) = 1.
Example 2.18 (Commutative case). Let
(Ω,F,P)
be a (classical) probability space, i.e. a measure space with total measure
1. Let
𝒜 := L∞(Ω,F,P)
be the algebra of essentially bounded functions, and let
φ be the linear functional defined by
\[ \varphi(f) := \int_\Omega f\,dP = \mathbb{E}(f) \]
for f ∈ 𝒜.
This is a ∗-probability
space, where the ∗-operation
is complex conjugation – in fact, it has a lot of analytic structure that we will put
aside for now.
Example 2.19 (Finite moments). Major objection to Example 2.18: many random
variables we care about are not bounded. We can set up a similar example that will
actually be more relevant for us: let 𝒜 = L∞−(Ω,F,P)
where L∞−(Ω,F,P) := ⋂
p≥1Lp(Ω,F,P),
i.e. the algebra of random variables with all moments finite. (Use Hölder’s inequality
to show 𝒜
is a ∗-algebra:
once to show Lp ⊇ Lq
for p ≤ q,
and once to show 𝑓𝑔 ∈ L1
for f,g ∈𝒜.)
The expectation φ
is defined in the same way, by integrating functions against P.
This is again a ∗-probability
space.
Example 2.20 (Scalar matrices). Let 𝒜 = M_N be the algebra of N × N matrices and let φ(A) = \frac{1}{N}\mathrm{Tr}(A) = \mathrm{tr}_N(A). This is a ∗-probability space which is not commutative.
Example 2.21 (Random matrices). Let 𝒜 = M_N(L^{∞−}(Ω, F, P)) be the ∗-algebra of matrices with entries that are RVs with finite moments, and let φ be the expected trace: φ(A) = 𝔼 tr_N(A). For example, our GUE random matrix lives here.
So far, we haven’t seen anything new. The real reason for making this abstraction is
to include genuinely noncommutative situations – where there is no hope of
identifying an underlying classical probability space – and view them as essentially
probabilistic in nature. The best example comes from groups and their group
rings.
Example 2.22 (Group algebra). Let G be a group and let 𝒜 := ℂ[G]. The expectation is defined by
\[ \varphi\Big(\sum_{g\in G} c_g g\Big) := c_e, \]
i.e. φ(g) = δ_{g=e} for g ∈ G. This is a ∗-probability space.
Next class, we will use group algebras and group-theoretical freeness to motivate
Voiculescu’s highly influential notion of free independence. Then, we will use it to
construct a concrete operator model for the asymptotics of multiple GUEs.
2.4. Exercises.
Exercise 2.23. In this exercise, you will enumerate the noncrossing pairings and
show that they are counted by the moments of the semicircle distribution. (1)
and (2) are classic textbook combinatorics. (3) is a somewhat involved calculus
problem.
- Find a recursion for |NC₂(2k)|. Think back to the enumeration of P₂(2k).
- Show that the Catalan numbers Cat(k) := \frac{1}{k+1}\binom{2k}{k} satisfy the same recursion that you found in (1). To do this, first show that with C(z) := \sum_{k=0}^\infty \mathrm{Cat}(k)\, z^k, we have
\[ C(z) = \frac{1-\sqrt{1-4z}}{2z}, \]
and derive the functional equation C(z) = 1 + zC(z)². Recover the recursion from this functional equation.
- Let dμ = f(x) dx where
\[ f(x) = \begin{cases} \frac{1}{2\pi}\sqrt{4-x^2} & \text{if } x \in [-2,2] \\ 0 & \text{otherwise.} \end{cases} \]
Show that
\[ \int_{\mathbb{R}} x^{2k}\,d\mu(x) = \mathrm{Cat}(k) \quad\text{and}\quad \int_{\mathbb{R}} x^{2k-1}\,d\mu(x) = 0 \]
for k ≥ 1. To do this, make the substitution x = 2cos θ. You can use the following identity without proof:
\[ \int_0^\pi \cos^{2m}\theta\,d\theta = \frac{(2m-1)!!}{(2m)!!}\,\pi. \]
Exercise 2.24. In this exercise, you will fill in the details of why convergence in
moments implies weak convergence in the case of the semicircular law. In both parts, let
μ be a probability
measure on ℝ
with compact support. This is part genuine exercise, part “book report”. Both parts
require some knowledge of measure theory.
- Let ν be a probability measure on ℝ with
\[ \int_{\mathbb{R}} x^m\,d\nu(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1. Prove that μ = ν. Note: there is no assumption that ν has compact support! Be careful and think about how to get around this issue.
- Let (μ_n)_{n≥1} be a sequence of probability measures on ℝ with
\[ \lim_{n\to\infty} \int_{\mathbb{R}} x^m\,d\mu_n(x) = \int_{\mathbb{R}} x^m\,d\mu(x) \]
for all m ≥ 1. Prove that μ_n → μ weakly. This is standard textbook material ([4, Theorems 30.1 and 30.2]). What I’m looking for is a concise summary of the argument that addresses its various subtleties: for example, how exactly is (1) being used, how do you establish tightness, etc. You don’t need to reproduce proofs of basic classical theorems, like the Markov inequality or the Helly selection theorem/Prokhorov’s theorem.
3. Free independence
3.1. Motivation and definition.
Definition 3.1 (Group-theoretical freeness). Let G
be a group and let {Gi : i ∈ I}
be a family of subgroups. The subgroups are free if for all k ≥ 1,
for all i1,…,ik ∈ I
with ij≠ij+1,
and for all gj ∈ Gij
with gj≠e,
we have g1⋯gk≠e.
How does this look in terms of the ∗-probability space (ℂ[G], φ)? Well, the condition that g_j ≠ e for all j implies g₁⋯g_k ≠ e translates to: φ(g_j) = 0 for all j implies φ(g₁⋯g_k) = 0.
Definition 3.2 (Freeness of subalgebras). Let (𝒜,φ)
be a ∗-probability
space, and let {𝒜i : i ∈ I}
be a family of unital ∗-subalgebras.
The family {𝒜i : i ∈ I}
is said to be freely independent (or free) if for all k ≥ 1,
for all i1,…,ik ∈ I
with ij≠ij+1,
and for all aj ∈𝒜ij
with φ(aj) = 0,
we have φ(a1⋯ak) = 0.
Proposition 3.3. Let G
be a group and let {Gi : i ∈ I}
be a family of subgroups. Then the subgroups are free in the group-theoretical sense if
and only if the subalgebras ℂ[Gi]
of ℂ[G]
are freely independent with respect to φ.
Proof. Exercise. I think this is an essential one. □
Definition 3.4 (Freeness of variables). Let (𝒜,φ)
be a ∗-probability
space and let {ai : i ∈ I}
be a family in 𝒜.
For i ∈ I,
let 𝒜i
be the ∗-subalgebra
of 𝒜
generated by ai.
The family {ai : i ∈ I}
is said to be freely independent (or free) if the family {𝒜i : i ∈ I}
is freely independent.
3.2. What does freeness actually do?
To understand what freeness means, let’s look at some examples. This section is
entirely based on [13, Lecture 5].
Example 3.5. Suppose that 𝒜
and B
are freely independent subalgebras of some ambient algebra. Let’s look at some
words with mixtures between 𝒜
and B.
Length two: for a ∈𝒜
and b ∈B,
freeness says
|
0 = φ((a − φ(a))(b − φ(b))) = φ(𝑎𝑏) − φ(a)φ(b)
|
so φ(𝑎𝑏) = φ(a)φ(b).
Length three: for a₁, a₂ ∈ 𝒜 and b ∈ B, freeness says
\[ \varphi\big((a_1-\varphi(a_1))(b-\varphi(b))(a_2-\varphi(a_2))\big) = 0. \]
Expanding and using the length two factorization, this says
\begin{align*}
0 &= \varphi(a_1 b a_2) - \varphi(a_2)\varphi(a_1 b) - \varphi(a_1)\varphi(b a_2) - \varphi(b)\varphi(a_1 a_2) + 2\varphi(a_1)\varphi(a_2)\varphi(b) \\
&= \varphi(a_1 b a_2) - \varphi(a_1)\varphi(a_2)\varphi(b) - \varphi(a_1)\varphi(a_2)\varphi(b) - \varphi(b)\varphi(a_1 a_2) + 2\varphi(a_1)\varphi(a_2)\varphi(b) \\
&= \varphi(a_1 b a_2) - \varphi(a_1 a_2)\varphi(b)
\end{align*}
so φ(a₁ba₂) = φ(a₁a₂)φ(b).
IMPORTANT: this factorization pattern does not continue. The same kind of calculation, with a₁, a₂ ∈ 𝒜 and b₁, b₂ ∈ B, shows that
\[ \varphi(a_1 b_1 a_2 b_2) = \varphi(a_1 a_2)\varphi(b_1)\varphi(b_2) + \varphi(a_1)\varphi(a_2)\varphi(b_1 b_2) - \varphi(a_1)\varphi(a_2)\varphi(b_1)\varphi(b_2). \]
The point of all this is that mixed moments only depend on mixed moments from the
same subalgebra, via some complicated pattern.
Proposition 3.6. Let (𝒜,φ)
be a ∗-probability
space and let {𝒜i : i ∈ I}
be freely independent ∗-subalgebras.
Let B
be the ∗-subalgebra
of 𝒜
generated by {𝒜i : i ∈ I}.
Then φ|B
is uniquely determined by {φ|𝒜i : i ∈ I}.
Proof. See [13, Lemma 5.13]. □
3.3. Free central limit theorem.
Similar to the last section, this one is entirely based on [13, Lecture 8]. From now on, fix a
∗-probability
space (𝒜,φ) and
a sequence (an)n≥1
in 𝒜
with an∗ = an,
φ(an) = 0, and
φ(an2) = 1 for all
n ≥ 1. Also assume
all the an
have the same moment sequence and they are freely independent.
Theorem 3.7 ([15]). We have
\[ \lim_{N\to\infty} \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \begin{cases} \mathrm{Cat}(k) & \text{if } m = 2k \\ 0 & \text{otherwise} \end{cases} \]
for m ≥ 1.
The first step is to simply unpack the LHS:
\[ \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \frac{1}{N^{m/2}} \sum_{\mathbf{i}:[m]\to[N]} \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(m)}). \]
Notation 3.8. For 𝐢 : [m] → ℕ,
let ker(𝐢)
be the partition of [m]
defined by letting p
and q
be in the same block if and only if 𝐢(p) = 𝐢(q).
Lemma 3.9. If 𝐢, 𝐣 : [m] → ℕ have ker(𝐢) = ker(𝐣), then
\[ \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(m)}) = \varphi(a_{\mathbf{j}(1)}\cdots a_{\mathbf{j}(m)}). \]
|
Idea of proof. This follows from the assumption that the
an
all have the same distribution, plus the fact that free independence means mixed
moments are determined by the individual distributions.
For example, suppose the common partition is {{1,3},{2,4}}, and 𝐢 = (1,2,1,2) and 𝐣 = (5,3,5,3). Then, using the computations from Example 3.5, we have
\[ \varphi(a_1 a_2 a_1 a_2) = \varphi(a_1^2)\varphi(a_2)^2 + \varphi(a_1)^2\varphi(a_2^2) - \varphi(a_1)^2\varphi(a_2)^2 \]
and
\[ \varphi(a_5 a_3 a_5 a_3) = \varphi(a_5^2)\varphi(a_3)^2 + \varphi(a_5)^2\varphi(a_3^2) - \varphi(a_5)^2\varphi(a_3)^2. \]
Since we assume the
an
all have the same moment sequence, these are the same. □
Notation 3.10. In light of Lemma 3.9, for π ∈ P(m), write φ(π) for the common value of φ(a_{𝐢(1)}⋯a_{𝐢(m)}) where 𝐢 : [m] → [N] has ker(𝐢) = π. Let A_π^N be the number of maps 𝐢 : [m] → [N] with ker(𝐢) = π. So we have
\[ \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \frac{1}{N^{m/2}} \sum_{\pi\in P(m)} A_\pi^N\, \varphi(\pi) \]
and the task is to see what happens to A_π^N as N → ∞. We can immediately dispense with any π that has a singleton block:
Lemma 3.11. If π ∈ P(m)
and there is some V ∈ π
with |V | = 1,
then φ(π) = 0.
Proof. Suppose that 𝐢(j) = r and {j} ∈ π. Then
\begin{align*}
\varphi(\pi) &= \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(j-1)}\, a_r\, a_{\mathbf{i}(j+1)}\cdots a_{\mathbf{i}(m)}) \\
&= \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(j-1)}\, a_{\mathbf{i}(j+1)}\cdots a_{\mathbf{i}(m)})\,\varphi(a_r)
\end{align*}
using freeness and one of the computations in Example 3.5. We have φ(a_r) = 0, so φ(π) = 0. □
Now, we need to understand the limit of A_π^N where π ∈ P(m) has |V| ≥ 2 for all V ∈ π. This is straightforward: a map 𝐢 : [m] → [N] with π = ker(𝐢) amounts to a choice of label from [N] for each block of π, where we do not allow duplicate labels. There are N(N−1)⋯(N−|π|+1) choices, so we have
\[ \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \sum_{\substack{\pi\in P(m) \\ |V|\ge 2\ \forall\, V\in\pi}} \frac{N(N-1)\cdots(N-|\pi|+1)}{N^{m/2}}\, \varphi(\pi). \]
The condition on π makes |π| ≤ m/2, and when N → ∞, the fraction involving N goes to 1 if |π| = m/2, and goes to 0 otherwise. The only way we can have |π| = m/2 is if π ∈ P₂(m). So we have
\[ \lim_{N\to\infty} \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \sum_{\pi\in P_2(m)} \varphi(\pi). \]
Of course, if m is odd, this is 0.
Now let us compute φ(π) for π ∈ P₂(2k). Pick 𝐢 : [m] → ℕ with π = ker(𝐢). If 𝐢(j) ≠ 𝐢(j+1) for all 1 ≤ j ≤ m−1, then φ(π) = 0 by freeness; otherwise, there is some 1 ≤ j ≤ m−1 with 𝐢(j) = 𝐢(j+1). Using the computations from Example 3.5, we have
\begin{align*}
\varphi(\pi) &= \varphi\big(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(j-1)}(a_{\mathbf{i}(j)}a_{\mathbf{i}(j+1)})a_{\mathbf{i}(j+2)}\cdots a_{\mathbf{i}(m)}\big) \\
&= \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(j-1)}a_{\mathbf{i}(j+2)}\cdots a_{\mathbf{i}(m)})\,\varphi(a_{\mathbf{i}(j)}a_{\mathbf{i}(j+1)}) \\
&= \varphi(a_{\mathbf{i}(1)}\cdots a_{\mathbf{i}(j-1)}a_{\mathbf{i}(j+2)}\cdots a_{\mathbf{i}(m)}),
\end{align*}
since φ(a_{𝐢(j)}a_{𝐢(j+1)}) = φ(a_{𝐢(j)}²) = 1. The pairings where we can keep doing this k times are exactly the ones that are noncrossing, so we have
\[ \varphi(\pi) = \begin{cases} 1 & \text{if } \pi \in NC_2(2k) \\ 0 & \text{if } \pi \in P_2(2k)\setminus NC_2(2k). \end{cases} \]
This shows that
\[ \lim_{N\to\infty} \varphi\left(\left(\frac{a_1+\cdots+a_N}{\sqrt{N}}\right)^m\right) = \begin{cases} \mathrm{Cat}(k) & \text{if } m = 2k \\ 0 & \text{otherwise} \end{cases} \]
which is the claim of Theorem 3.7.
3.4. Exercises.
Exercise 3.12. Let G be a group and let 𝒜 = ℂ[G]. Let φ be the linear functional on 𝒜 defined by
\[ \varphi\Big(\sum_{g\in G} c_g g\Big) := c_e. \]
I would say (3) is essential for understanding the concept of free independence. I strongly suggest you do it, even if you don’t hand it in.
- Prove that φ
is positive by directly using the definition above.
- Find a Hilbert space H,
a ∗-homomorphism
π : 𝒜 →B(H),
and a vector ξ ∈ H
such that φ(a) = ⟨π(a)ξ,ξ⟩
for all a ∈𝒜.
Use the fact that such a thing exists to prove that φ
is positive.
- Let {G_i : i ∈ I} be a family of subgroups of G, and for i ∈ I, let 𝒜_i := ℂ[G_i]. Prove that {G_i : i ∈ I} are free in the group-theoretical sense if and only if {𝒜_i : i ∈ I} are freely independent.
4. Asymptotic freeness of multiple GUEs
4.1. Free semicircular families.
The example of freeness in the last section will become relevant when we talk about
random unitary matrices, but for now, we are looking for some noncommutative
data that captures the asymptotics of multiple GUEs. The answer is that when
N →∞, they
become freely independent families of semicircular variables.
Definition 4.1 (Semicircular family). Let
(𝒜,φ) be a
∗-probability space. A free
semicircular family in (𝒜,φ)
consists of some elements s1,…,sr ∈𝒜
such that
- sj
is self-adjoint and semicircular, and
- {s1,…,sr}
is freely independent.
Theorem 4.2. For each r ≥ 1,
there is a ∗-probability
space (𝒜,φ) and
some elements s1,…,sr ∈𝒜
such that
- {s1,…,sr}
is a free semicircular family, and
- 𝒜
is generated by {s1,…,sr}.
In this section, we will construct a concrete instance of a free semicircular family using
creation and annihilation operators on the so-called full Fock space. This model will allow
for very concrete calculations involving certain lattice paths which are in obvious bijection
with NC2.
Definition 4.3 (Full Fock space). Fix a Hilbert space H. The full Fock space on H is the Hilbert space
\[ \mathcal{F}(H) := \mathbb{C}\Omega \oplus \bigoplus_{n\ge 1} H^{\otimes n}, \]
where Ω is a distinguished unit vector called the vacuum. Here, the inner product of H^{⊗n} is given on elementary tensors by
\[ \langle \xi_1\otimes\cdots\otimes\xi_n,\ \eta_1\otimes\cdots\otimes\eta_n \rangle = \langle \xi_1,\eta_1\rangle \cdots \langle \xi_n,\eta_n\rangle \]
and the direct sum is the set of sequences (ξ_n)_{n≥0} where ξ₀ ∈ ℂΩ and ξ_n ∈ H^{⊗n} for n ≥ 1, with ∑_{n=0}^∞ ∥ξ_n∥² < ∞. The inner product is given by
\[ \langle (\xi_n)_{n\ge 0}, (\eta_n)_{n\ge 0} \rangle = \sum_{n=0}^\infty \langle \xi_n, \eta_n \rangle. \]
The ∗-algebra B(F(H)) carries a canonical vacuum state:
\[ \varphi(T) := \langle T\Omega, \Omega\rangle. \]
Then (B(F(H)), φ) is a ∗-probability space.
Proposition 4.4. For ξ ∈ H, there is an operator c(ξ) ∈ B(F(H)) (the creation operator) defined by c(ξ)Ω = ξ and
\[ c(\xi)\,\xi_1\otimes\cdots\otimes\xi_n = \xi\otimes\xi_1\otimes\cdots\otimes\xi_n. \]
The adjoint (the annihilation operator) is given by
\[ c(\xi)^*\Omega = 0, \qquad c(\xi)^*\xi_1 = \langle\xi_1,\xi\rangle\,\Omega, \]
and
\[ c(\xi)^*\,\xi_1\otimes\cdots\otimes\xi_n = \langle\xi_1,\xi\rangle\,\xi_2\otimes\cdots\otimes\xi_n, \]
and it has the following useful property:
\[ c(\xi)^* c(\eta) = \langle \eta, \xi \rangle\, 1 \qquad (\xi, \eta \in H). \]
Specifically, c(ξ) is a scalar multiple of an isometry, and ∥c(ξ)∥ = ∥ξ∥.
Proposition 4.5 (Semicircular). Let H
be a Hilbert space and let ξ ∈ H.
Then c(ξ) + c(ξ)∗∈B(F(H))
is a semicircular element with radius 2∥ξ∥.
The combinatorics of noncrossing pairings are directly built into this model, but in a
slightly different form.
Definition 4.6. A Dyck path with m
steps is a path in ℤ≥0 × ℤ≥0
which
- starts at (0,0),
- takes steps of (1,1)
or (1,−1),
and
- ends at (m,0).
Note that this definition includes the requirement that the path never goes below the
x-axis. Let
D(m) be the set of
Dyck paths with m
steps; when m
is odd, D(m) = ∅.
Lemma 4.7. There is a canonical bijection NC2(m) ≃ D(m).
Proof of Proposition 4.5. To clean up notation, let a = c(ξ), and assume ∥ξ∥ = 1 (the general case follows by scaling). We will compute moments:
\[ \varphi((a+a^*)^m) = \sum_{\epsilon_1,\dots,\epsilon_m\in\{1,*\}} \varphi(a^{\epsilon_1}\cdots a^{\epsilon_m}). \]
The key observation is that φ(a^{ε₁}⋯a^{ε_m}) is just an indicator of a certain combinatorial property of the sequence (ε₁,…,ε_m): the sequence determines a lattice path with NE and SE steps, and the indicator detects whether or not it is a Dyck path.
To see this, for ε₁,…,ε_m ∈ {1, ∗}, let
\[ \lambda_j = \begin{cases} 1 & \text{if } \epsilon_{m-j+1} = 1 \\ -1 & \text{if } \epsilon_{m-j+1} = * \end{cases} \]
for 1 ≤ j ≤ m. Then, if we read the expression ξ^{⊗0} as Ω, we have
\[ a^{\epsilon_1}\cdots a^{\epsilon_m}\,\Omega = \begin{cases} \xi^{\otimes(\lambda_1+\cdots+\lambda_m)} & \text{if } \lambda_1+\cdots+\lambda_k \ge 0\ \forall\, 1\le k\le m \\ 0 & \text{otherwise} \end{cases} \]
and
\[ \varphi(a^{\epsilon_1}\cdots a^{\epsilon_m}) = \langle a^{\epsilon_1}\cdots a^{\epsilon_m}\Omega,\ \Omega\rangle = \begin{cases} \langle \xi^{\otimes(\lambda_1+\cdots+\lambda_m)}, \Omega\rangle & \text{if } \lambda_1+\cdots+\lambda_k \ge 0\ \forall\, 1\le k\le m \\ 0 & \text{otherwise.} \end{cases} \]
The inner product in the last line is 1 if λ₁ + ⋯ + λ_m = 0, and 0 otherwise. The condition λ₁ + ⋯ + λ_k ≥ 0 for all 1 ≤ k ≤ m means the path doesn’t go below the x-axis, and the condition that λ₁ + ⋯ + λ_m = 0 means it does return to the x-axis, i.e. it’s a Dyck path. So φ((a + a*)^m) is the number of Dyck paths with m steps. By Lemma 4.7, this shows that a + a* is semicircular. □
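Here is a minimal sketch of this count (the helper names are mine): ε-sequences are encoded as +1 (creation) / −1 (annihilation), the word a^{ε₁}⋯a^{ε_m} is read right to left as a lattice path, and the total is checked against Cat(m/2).

```python
from itertools import product
from math import comb

# phi((a + a*)^m) counts Dyck paths with m steps, i.e. Cat(m/2).
def is_dyck(steps):
    height = 0
    for s in steps:
        height += s
        if height < 0:
            return False
    return height == 0

for m in range(1, 11):
    count = sum(is_dyck(eps[::-1]) for eps in product((1, -1), repeat=m))
    expected = comb(m, m // 2) // (m // 2 + 1) if m % 2 == 0 else 0
    assert count == expected
    print(m, count)
```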
Proposition 4.8 (Freeness). Let H
be a Hilbert space and let H1,…,Hr
be mutually orthogonal subspaces of H.
For 1 ≤ i ≤ r,
let 𝒜i
be the ∗-subalgebra
of B(F(H))
generated by {c(ξ) : ξ ∈ Hi}.
Then 𝒜1,…,𝒜r
are freely independent.
Proof. Let 1 ≤ i₁,…,i_k ≤ r with i_j ≠ i_{j+1}, and pick T_j ∈ 𝒜_{i_j} with φ(T_j) = 0. Assume without loss of generality that
\[ T_j = c(\xi_1^j)\cdots c(\xi_{m_j}^j)\, c(\eta_1^j)^*\cdots c(\eta_{n_j}^j)^* \]
with m_j + n_j ≥ 1 and all vectors in H_{i_j}. The reason we can do this, and ignore contributions from 1, is that any non-zero contribution from 1 would make φ(T_j) non-zero.
If there is some 1 ≤ j ≤ k−1 with n_j ≠ 0 and m_{j+1} ≠ 0, then the product T_j T_{j+1} includes the factor
\[ c(\eta_{n_j}^j)^*\, c(\xi_1^{j+1}) = \langle \xi_1^{j+1}, \eta_{n_j}^j \rangle\, 1. \]
Since H_{i_j} ⊥ H_{i_{j+1}}, this is 0 and then φ(T₁⋯T_k) = 0.
Otherwise, for all 1 ≤ j ≤ k−1, we have either n_j = 0 or m_{j+1} = 0. Then
\[ T_1\cdots T_k = c(\xi_1)\cdots c(\xi_m)\, c(\eta_1)^*\cdots c(\eta_n)^* \]
for some ξ₁,…,ξ_m, η₁,…,η_n ∈ H with m + n ≥ 1, and
\begin{align*}
\varphi(T_1\cdots T_k) &= \langle c(\xi_1)\cdots c(\xi_m)\, c(\eta_1)^*\cdots c(\eta_n)^*\,\Omega,\ \Omega\rangle \\
&= \langle c(\eta_1)^*\cdots c(\eta_n)^*\,\Omega,\ c(\xi_m)^*\cdots c(\xi_1)^*\,\Omega\rangle.
\end{align*}
Since at least one of m or n is ≥ 1, one of the two arguments in the inner product is of the form c(ξ)*Ω = 0, so φ(T₁⋯T_k) = 0. □
Proof of Theorem 4.2. Let {e₁,…,e_r} be the standard orthonormal basis of ℂ^r, and let s_j = c(e_j) + c(e_j)* ∈ B(F(ℂ^r)) for 1 ≤ j ≤ r. Let 𝒜 be the ∗-algebra in B(F(ℂ^r)) generated by {s₁,…,s_r}, and let φ(T) = ⟨TΩ, Ω⟩ for T ∈ 𝒜. By Proposition 4.5, each s_j is semicircular, and by Proposition 4.8 (applied to the mutually orthogonal subspaces ℂe_j), the family {s₁,…,s_r} is freely independent. □
One can say more about this construction:
- Ω is a cyclic vector for 𝒜, meaning that {TΩ : T ∈ 𝒜} is dense in F(ℂ^r);
- φ is a faithful trace;
- for any C*-algebra B with a faithful state ψ, if B is generated by a free semicircular family of size r, then there is a state-preserving ∗-isomorphism 𝒜 ≅ B.
This is a very nice object and in light of (3), it is referred to as the semicircular C*-algebra with r generators. See [13, Lecture 7] for details.
4.2. Asymptotic freeness of GUEs.
Theorem 4.10 ([16]). Let {A₁,…,A_r} be an independent family of N × N GUEs, and let {s₁,…,s_r} be a free semicircular family. Then (A₁,…,A_r) converges in distribution to (s₁,…,s_r), in the following sense: for all m ≥ 1 and 𝐢 : [m] → [r], we have
\[ \lim_{N\to\infty} \mathbb{E}\,\mathrm{tr}_N(A_{\mathbf{i}(1)}\cdots A_{\mathbf{i}(m)}) = \varphi(s_{\mathbf{i}(1)}\cdots s_{\mathbf{i}(m)}). \]
This theorem follows from an easy multivariate generalization of the genus
expansion:
Theorem 1.15′. Let m ≥ 1 and 𝐢 : [m] → [r]. Then
\[ \mathbb{E}(\mathrm{tr}_N(A_{\mathbf{i}(1)}\cdots A_{\mathbf{i}(m)})) = \sum_{\substack{\pi\in P_2(m) \\ \pi\le\ker(\mathbf{i})}} \left(\frac{1}{N}\right)^{\frac{m}{2}+1-\#(\gamma_m\pi)}. \]
When m is odd, the formula should be read as an empty sum, i.e. as 0.
Proof of Theorem 4.10. When N → ∞ in the formula above, the only summands that survive are the ones with m/2 + 1 − #(γ_mπ) = 0. These are the ones with π noncrossing, so
\[ \mathbb{E}(\mathrm{tr}_N(A_{\mathbf{i}(1)}\cdots A_{\mathbf{i}(m)})) = |\{\pi\in NC_2(m) : \pi\le\ker(\mathbf{i})\}| + O(N^{-2}) \]
and we need to show that
\[ \varphi(s_{\mathbf{i}(1)}\cdots s_{\mathbf{i}(m)}) = |\{\pi\in NC_2(m) : \pi\le\ker(\mathbf{i})\}|. \]
This is similar to Proposition 4.5: in the proof there, the words a^{ε₁}⋯a^{ε_m} which contribute to the count |NC₂(m)| correspond to Dyck paths. There is a similar constraint here, but we also need pairs (r, s) of up and down steps to have 𝐢(r) = 𝐢(s). In terms of NC₂(m), this amounts to π ≤ ker(𝐢).
Here is an example of how this works: consider the path which makes two up steps and then two down steps, or in other words the noncrossing pairing {{1,4},{2,3}}. Writing a_j = c(e_j), this corresponds to
\begin{align*}
a_{\mathbf{i}(1)}^* a_{\mathbf{i}(2)}^* a_{\mathbf{i}(3)} a_{\mathbf{i}(4)}\,\Omega &= a_{\mathbf{i}(1)}^* a_{\mathbf{i}(2)}^*\,(e_{\mathbf{i}(3)}\otimes e_{\mathbf{i}(4)}) \\
&= a_{\mathbf{i}(1)}^*\,\langle e_{\mathbf{i}(3)}, e_{\mathbf{i}(2)}\rangle\, e_{\mathbf{i}(4)} \\
&= \langle e_{\mathbf{i}(3)}, e_{\mathbf{i}(2)}\rangle\,\langle e_{\mathbf{i}(4)}, e_{\mathbf{i}(1)}\rangle\,\Omega,
\end{align*}
so we have the additional requirement that 𝐢(2) = 𝐢(3) and 𝐢(1) = 𝐢(4), i.e. the pairing is compatible with 𝐢. □
4.3. Exercises.
Exercise 4.11. Let H be
a Hilbert space, let F(H) be its
full Fock space, and let F0(H)
be the dense subspace of finite linear combinations of elementary tensors. In this exercise,
you will fill in the details of the construction of creation and annihilation operators on
F(H).
-
For ξ ∈ H,
define c(ξ) : F0(H) →F(H)
by letting
|
c(ξ)(ξ1 ⊗⋯ ⊗ ξn) = ξ ⊗ ξ1 ⊗⋯ ⊗ ξn
|
and extending by linearity. Prove that c(ξ) is bounded and extends to an element c(ξ) ∈ B(F(H)) with ∥c(ξ)∥ = ∥ξ∥. Try to do it in a way that implies c(ξ) is a scalar multiple of an isometry.
-
Prove that
|
c(ξ)∗Ω = 0, c(ξ)∗ξ
1 = ⟨ξ1,ξ⟩Ω,
|
and
|
c(ξ)∗(ξ
1 ⊗⋯ ⊗ ξn) = ⟨ξ1,ξ⟩ξ2 ⊗⋯ ⊗ ξn
|
for n ≥ 2.
- Prove that c(ξ)∗c(η) = ⟨η,ξ⟩⋅ 1.
Exercise 4.12. A C∗-probability
space is a ∗-probability
space (𝒜,φ)
where 𝒜
is equipped with a norm, with respect to which it is complete, which has the properties
- ∥𝑎𝑏∥≤∥a∥∥b∥
for all a,b ∈𝒜,
and
- ∥a∗a∥ = ∥a∥2
for all a ∈𝒜.
This might be familiar to you as a C∗-algebra
equipped with a state. As usual in this course, we consider all algebras and related
structures to be unital. This exercise requires some knowledge of functional analysis
beyond the prerequisites.
- Let (𝒜, φ) be a C*-probability space and let a ∈ 𝒜 be normal (meaning that a*a = aa*). Prove that there is a unique compactly supported probability measure μ_a on ℂ such that
\[ \varphi(a^p (a^*)^q) = \int_{\mathbb{C}} z^p \bar{z}^q\, d\mu_a(z) \]
for all p, q ≥ 0. This is called the ∗-distribution, or simply the distribution, of a.
- Let (𝒜,φ)
be a C∗-probability
space and let {𝒜i : i ∈ I}
be a freely independent family of ∗-subalgebras
of 𝒜.
For i ∈ I,
let Bi
be the norm-closure of 𝒜i.
Prove that {Bi : i ∈ I}
are freely independent.
5. How to integrate over compact groups
In this class and the next, I want to show how to integrate functions on the unitary group
UN. Specifically,
C(UN) has a dense subalgebra
generated by the functions u𝑖𝑗 : UN → ℂ
– which take a matrix U ∈ UN
and pick out the (i,j)-th
entry – and their conjugates (this is an easy exercise with the Stone-Weierstrass
theorem). So here is the problem we will solve:
Problem 5.1. Let 𝐢, 𝐣 : [k] → [N] and 𝐢′, 𝐣′ : [l] → [N]. How to compute
\[ \int_{U_N} u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(l)\mathbf{j}'(l)}}\ dU\ ? \]
This problem was solved comprehensively in [5] using the representation theory of
Sk and
Schur-Weyl duality. The ideas of that paper were refined in [7], and in my opinion the
most attractive approach to the problem is the one outlined in the recent survey [6]. I
claim no originality in my exposition of these works.
5.1. Reminders about integration.
Let G be a compact group. Fundamental theorem: there is a unique left-invariant Borel probability measure on G, called the Haar measure. In terms of integration, left invariance means that
\[ \int_G f(yx)\,dx = \int_G f(x)\,dx \]
for all y ∈ G. Here are some important properties of this measure:
- it is also right-invariant, which in terms of integration means that
\[ \int_G f(xy)\,dx = \int_G f(x)\,dx \]
for all y ∈ G;
- it has the property of unimodularity:
\[ \int_G f(x^{-1})\,dx = \int_G f(x)\,dx; \]
- every non-empty open set has positive measure.
5.2. Projection onto fixed points.
The key starting point is the following observation: we can assemble the integration problem into an operator
\[ P := \left(\int_{U_N} u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(l)\mathbf{j}'(l)}}\ dU\right)_{\mathbf{i},\mathbf{i}',\mathbf{j},\mathbf{j}'} \]
on (ℂ^N)^{⊗k} ⊗ (ℂ^N)^{⊗l}. This matrix is actually an orthogonal projection, and its range is easy to describe. I think it will clarify things if we work in a more general setting.
Proposition 5.2. Let G be a compact group and let ρ : G → 𝒰(H) be a unitary representation of G on a Hilbert space H. Let
\[ P := \int_G \rho(x)\,dx, \]
where the integral is defined in the weak sense by
\[ \langle P\xi, \eta\rangle = \int_G \langle \rho(x)\xi, \eta\rangle\,dx \]
for all ξ, η ∈ H. Then P is the orthogonal projection onto
\[ \mathrm{Fix}_G(\rho) := \{\xi \in H : \rho(x)\xi = \xi \text{ for all } x \in G\}. \]
In terms of an orthonormal basis {e_i : i ∈ I} of H, the entries of P are
\[ P = \left(\int_G \langle\rho(x)e_j, e_i\rangle\,dx\right)_{i,j\in I}. \]
You will not lose anything if you imagine that H is finite-dimensional – this is the situation we care about.
Proof of Proposition 5.2. Invariance gives
\begin{align*}
\langle P^2\xi, \eta\rangle &= \int_G \langle \rho(x)P\xi, \eta\rangle\,dx = \int_G \langle P\xi, \rho(x)^*\eta\rangle\,dx \\
&= \int_G\left(\int_G \langle \rho(y)\xi, \rho(x)^*\eta\rangle\,dy\right)dx = \iint_G \langle \rho(xy)\xi, \eta\rangle\,dy\,dx \\
&= \int_G \langle \rho(x)\xi, \eta\rangle\,dx = \langle P\xi, \eta\rangle
\end{align*}
so P² = P, and unimodularity gives
\begin{align*}
\langle P^*\xi, \eta\rangle &= \langle \xi, P\eta\rangle = \overline{\langle P\eta, \xi\rangle} = \overline{\int_G \langle \rho(x)\eta, \xi\rangle\,dx} = \int_G \overline{\langle \rho(x)\eta, \xi\rangle}\,dx \\
&= \int_G \langle \xi, \rho(x)\eta\rangle\,dx = \int_G \langle \rho(x)^*\xi, \eta\rangle\,dx = \int_G \langle \rho(x^{-1})\xi, \eta\rangle\,dx \\
&= \int_G \langle \rho(x)\xi, \eta\rangle\,dx = \langle P\xi, \eta\rangle
\end{align*}
so P* = P. In other words, P is an orthogonal projection.
To compute the range, one inclusion is immediate: if ξ ∈ Fix_G(ρ), then
\[ \langle P\xi, \eta\rangle = \int_G \langle \rho(x)\xi, \eta\rangle\,dx = \int_G \langle \xi, \eta\rangle\,dx = \langle \xi, \eta\rangle \]
for all η ∈ H, so Pξ = ξ and ξ is in the range of P.
For the reverse inclusion, suppose that ξ ∉ Fix_G(ρ), so there is some x ∈ G such that ρ(x)ξ ≠ ξ. If we can show that ∥Pξ∥ < ∥ξ∥, then we can conclude that ξ is not in the range of P, because if it was, we would have Pξ = ξ.
Claim: for any ξ ∈ H, we have
\[ \|\xi\|^2 - \|P\xi\|^2 = \frac{1}{2}\int_G \|\xi - \rho(x)\xi\|^2\,dx. \]
To see this, expand
\begin{align*}
\|\xi - \rho(x)\xi\|^2 &= \langle \xi - \rho(x)\xi,\ \xi - \rho(x)\xi\rangle \\
&= \langle\xi,\xi\rangle - \langle\xi,\rho(x)\xi\rangle - \langle\rho(x)\xi,\xi\rangle + \langle\rho(x)\xi,\rho(x)\xi\rangle \\
&= 2\|\xi\|^2 - \left(\langle\rho(x)\xi,\xi\rangle + \overline{\langle\rho(x)\xi,\xi\rangle}\right) \qquad (\rho(x) \text{ is unitary})
\end{align*}
so
\begin{align*}
\int_G \|\xi - \rho(x)\xi\|^2\,dx &= 2\|\xi\|^2 - \left(\int_G \langle\rho(x)\xi,\xi\rangle\,dx + \overline{\int_G \langle\rho(x)\xi,\xi\rangle\,dx}\right) \\
&= 2\|\xi\|^2 - 2\,\mathrm{Re}\int_G \langle\rho(x)\xi,\xi\rangle\,dx = 2\|\xi\|^2 - 2\,\mathrm{Re}\,\langle P\xi,\xi\rangle \\
&= 2\|\xi\|^2 - 2\,\mathrm{Re}\,\langle P^2\xi,\xi\rangle = 2\|\xi\|^2 - 2\,\mathrm{Re}\,\langle P\xi,P\xi\rangle = 2\|\xi\|^2 - 2\|P\xi\|^2
\end{align*}
and dividing by 2 proves the claim.
The point of this claim is that since we have ξ with ρ(x)ξ ≠ ξ for some x ∈ G, i.e. ∥ξ − ρ(x)ξ∥ > 0 for some x ∈ G, the same is true on a set of positive measure, so ∫_G ∥ξ − ρ(x)ξ∥² dx > 0 and ∥Pξ∥² < ∥ξ∥², hence ∥Pξ∥ < ∥ξ∥ and ξ cannot be in the range of P. □
5.3. Gram matrices and pseudoinverses.
The basic data we have in hand is the following: a finite-dimensional Hilbert space H, a subspace K, and the orthogonal projection P of H onto K. Suppose we have a finite spanning set {ξ_α : α ∈ A} for K, and let G be its Gram matrix:
\[ G := (\langle \xi_\alpha, \xi_\beta\rangle)_{\alpha,\beta\in A}. \]
We want to describe the entries of P.
Proposition 5.4 (Moore-Penrose pseudoinverse). Let A be an n × n matrix. There is a unique n × n matrix B such that
- ABA = A,
- BAB = B, and
- AB and BA are self-adjoint.
This matrix B is called the pseudoinverse of A and is denoted by A⁺. If A is invertible, then A⁺ = A⁻¹.
Theorem 5.6. Let K be a finite-dimensional Hilbert space with a finite spanning set {ξ_α : α ∈ A} with Gram matrix G. Then for η, ζ ∈ K, we have
\[ \langle \eta, \zeta\rangle = \sum_{\alpha,\beta\in A} \langle \eta, \xi_\alpha\rangle\, \overline{\langle \zeta, \xi_\beta\rangle}\, G^+_{\alpha,\beta}. \]
Proof. The RHS can be written as
\[ \sum_{\beta\in A} \left(\sum_{\alpha\in A} \langle \eta, \xi_\alpha\rangle\, G^+_{\alpha,\beta}\right) \langle \xi_\beta, \zeta\rangle \]
so we want to show that, writing η = ∑_{β∈A} y_β ξ_β, we have η = ∑_{β∈A} z_β ξ_β, where z is the vector of values on the RHS above:
\[ z_\beta := \sum_{\alpha\in A} \langle \eta, \xi_\alpha\rangle\, G^+_{\alpha,\beta}, \quad\text{i.e.}\quad z := (\langle \eta, \xi_\alpha\rangle)_{\alpha\in A}\, G^+, \]
so zG = (⟨η, ξ_α⟩)_{α∈A} G⁺G. We have
\[ (\langle \eta, \xi_\alpha\rangle)_{\alpha\in A} = \Big(\big\langle \textstyle\sum_{\beta\in A} y_\beta \xi_\beta,\ \xi_\alpha \big\rangle\Big)_{\alpha\in A} = \Big(\textstyle\sum_{\beta\in A} y_\beta \langle \xi_\beta, \xi_\alpha\rangle\Big)_{\alpha\in A} = yG \]
so
\[ \Big(\big\langle \textstyle\sum_{\alpha\in A} z_\alpha \xi_\alpha,\ \xi_\beta \big\rangle\Big)_{\beta\in A} = \Big(\textstyle\sum_{\alpha\in A} z_\alpha \langle \xi_\alpha, \xi_\beta\rangle\Big)_{\beta\in A} = zG = yGG^+G = yG = (\langle \eta, \xi_\beta\rangle)_{\beta\in A}. \]
Now, to show η = ∑_{α∈A} z_α ξ_α, it is enough to show that for any v ∈ K, we have
\[ \langle \eta, v\rangle = \big\langle \textstyle\sum_{\alpha\in A} z_\alpha \xi_\alpha,\ v \big\rangle. \]
But we know that v = ∑_{β∈A} v_β ξ_β since {ξ_β : β ∈ A} spans K, so
\[ \langle \eta, v\rangle = \sum_{\beta\in A} \overline{v_\beta}\,\langle \eta, \xi_\beta\rangle = \sum_{\beta\in A} \overline{v_\beta}\, \big\langle \textstyle\sum_{\alpha\in A} z_\alpha \xi_\alpha,\ \xi_\beta \big\rangle = \big\langle \textstyle\sum_{\alpha\in A} z_\alpha \xi_\alpha,\ v \big\rangle \]
and we are done. □
Corollary 5.7. Let H be a finite-dimensional Hilbert space with orthonormal basis {e_i : i ∈ I}. Let K be a subspace of H with a spanning set {ξ_α : α ∈ A} and Gram matrix G, and let P be the orthogonal projection of H onto K. Then
\[ \langle Pe_i, e_j\rangle = \sum_{\alpha,\beta\in A} \langle e_i, \xi_\alpha\rangle\, \langle \xi_\beta, e_j\rangle\, G^+_{\alpha,\beta} \]
for i, j ∈ I.
Proof. The theorem says
\begin{align*}
\langle Pe_i, e_j\rangle &= \langle P^2 e_i, e_j\rangle = \langle Pe_i, Pe_j\rangle = \sum_{\alpha,\beta\in A} \langle Pe_i, \xi_\alpha\rangle\, \langle \xi_\beta, Pe_j\rangle\, G^+_{\alpha,\beta} \\
&= \sum_{\alpha,\beta\in A} \langle e_i, P^*\xi_\alpha\rangle\, \langle P^*\xi_\beta, e_j\rangle\, G^+_{\alpha,\beta} = \sum_{\alpha,\beta\in A} \langle e_i, \xi_\alpha\rangle\, \langle \xi_\beta, e_j\rangle\, G^+_{\alpha,\beta}
\end{align*}
as claimed. □
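A minimal numerical sketch of Corollary 5.7 (the setup and names are mine): build a deliberately dependent spanning set, form the Gram matrix, and compare the pseudoinverse formula with the projection obtained from an orthonormal basis.

```python
import numpy as np

# Entries of the projection onto K = span{xi_a} via pinv of the Gram matrix.
rng = np.random.default_rng(1)
H_dim = 5
xi = rng.standard_normal((3, H_dim)) + 1j * rng.standard_normal((3, H_dim))
xi = np.vstack([xi, xi[0] + xi[1]])   # deliberately dependent spanning set

G = xi @ xi.conj().T                  # G_{ab} = <xi_a, xi_b>
Gplus = np.linalg.pinv(G)

# <P e_i, e_j> = sum_{a,b} <e_i, xi_a> <xi_b, e_j> Gplus_{ab}
P = np.einsum('ai,ab,bj->ji', xi.conj(), Gplus, xi)

# Reference: projection from an orthonormal basis of K (first 3 vectors span K).
Q, _ = np.linalg.qr(xi[:3].T)
print(np.allclose(P, Q @ Q.conj().T))
```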
5.4. Exercises.
Exercise 5.8. Prove that if k≠l,
then for any 𝐢,𝐣 : [k] → [N]
and 𝐢′,𝐣′ : [l] → [N],
we have
\[ \int_{U_N} u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(l)\mathbf{j}'(l)}}\ dU = 0. \]
Hint: this can be done directly using facts that have appeared before this point.
Recall what we did last time: we let P be the operator
\[ \left(\int_{U_N} u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(k)\mathbf{j}'(k)}}\ dU\right)_{\mathbf{i},\mathbf{i}',\mathbf{j},\mathbf{j}'} \]
on (ℂ^N)^{⊗k} ⊗ (ℂ^N)^{⊗k}, and we showed that P is the orthogonal projection onto the subspace
\[ \mathrm{Fix}_{U_N}(U^{\otimes k}\otimes \overline{U}^{\otimes k}) := \{\xi \in (\mathbb{C}^N)^{\otimes k}\otimes(\mathbb{C}^N)^{\otimes k} : (U^{\otimes k}\otimes \overline{U}^{\otimes k})\,\xi = \xi\ \forall\, U \in U_N\}. \]
If {ξ_α : α ∈ A} is a spanning set of this range, then one can show (by elementary linear algebra) that the entries of P are
\[ \langle P\, e_{\mathbf{j}}\otimes e_{\mathbf{j}'},\ e_{\mathbf{i}}\otimes e_{\mathbf{i}'}\rangle = \sum_{\alpha,\beta\in A} \langle \xi_\alpha,\ e_{\mathbf{i}}\otimes e_{\mathbf{i}'}\rangle\, \langle \xi_\beta,\ e_{\mathbf{j}}\otimes e_{\mathbf{j}'}\rangle\, G^+_{\alpha,\beta} \]
where e_𝐢 := e_{𝐢(1)} ⊗ ⋯ ⊗ e_{𝐢(k)}, G := (⟨ξ_α, ξ_β⟩)_{α,β∈A} is the Gram matrix of the spanning set, and G⁺ is the pseudoinverse of G (or just the inverse if the spanning set is linearly independent).
So at least in principle, the solution to our problem of computing unitary matrix integrals comes down to finding a convenient spanning set for the space of fixed points.
6.1. The spanning set.
Notation 6.1. For α ∈ S_k, let
\[ E_\alpha := \sum_{\mathbf{i}:[k]\to[N]} e_{\mathbf{i}(\alpha(1))}\otimes\cdots\otimes e_{\mathbf{i}(\alpha(k))}\otimes e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)}. \]
Theorem 6.2 (Schur-Weyl duality). We have
\[ \mathrm{Fix}_{U_N}(U^{\otimes k}\otimes \overline{U}^{\otimes k}) = \mathrm{span}\{E_\alpha : \alpha \in S_k\}. \]
Remark 6.3. Consider the representations ρ of U_N and π of S_k on (ℂ^N)^{⊗k} given by
\[ \rho(U)\,\xi_1\otimes\cdots\otimes\xi_k = U\xi_1\otimes\cdots\otimes U\xi_k \quad\text{and}\quad \pi(\sigma)\,\xi_1\otimes\cdots\otimes\xi_k = \xi_{\sigma^{-1}(1)}\otimes\cdots\otimes\xi_{\sigma^{-1}(k)}. \]
You can immediately check that they commute. The basic principle of Schur-Weyl duality is that if 𝒜 is the subalgebra generated by the image of ρ and B is the subalgebra generated by the image of π, we actually have 𝒜′ = B and B′ = 𝒜. The theorem above is a consequence of this more fundamental theorem.
Proposition 6.4. For α ∈ S_k and 𝐢, 𝐢′ : [k] → [N], we have
\[ \langle E_\alpha,\ e_{\mathbf{i}}\otimes e_{\mathbf{i}'}\rangle = \begin{cases} 1 & \text{if } \mathbf{i} = \mathbf{i}'\circ\alpha \\ 0 & \text{otherwise.} \end{cases} \]
Proof. We have
\begin{align*}
\langle E_\alpha,\ e_{\mathbf{i}}\otimes e_{\mathbf{i}'}\rangle &= \sum_{\mathbf{j}:[k]\to[N]} \langle e_{\mathbf{j}(\alpha(1))}\otimes\cdots\otimes e_{\mathbf{j}(\alpha(k))}\otimes e_{\mathbf{j}(1)}\otimes\cdots\otimes e_{\mathbf{j}(k)},\ e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)}\otimes e_{\mathbf{i}'(1)}\otimes\cdots\otimes e_{\mathbf{i}'(k)}\rangle \\
&= \langle e_{\mathbf{i}'(\alpha(1))}\otimes\cdots\otimes e_{\mathbf{i}'(\alpha(k))},\ e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)}\rangle \qquad (\text{the last } k \text{ components force } \mathbf{j} = \mathbf{i}') \\
&= \begin{cases} 1 & \text{if } \mathbf{i} = \mathbf{i}'\circ\alpha \\ 0 & \text{otherwise} \end{cases}
\end{align*}
hence the claim. □
So with W := G⁺, the integration formula now reads as
\[ \int_{U_N} u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(k)\mathbf{j}'(k)}}\ dU = \sum_{\substack{\alpha,\beta\in S_k \\ \mathbf{i}=\mathbf{i}'\circ\alpha \\ \mathbf{j}=\mathbf{j}'\circ\beta}} W_{\alpha,\beta} \]
and what remains is to understand something about the matrix W.
6.2. Character expansion.
Notation 6.5. Define Gr_{N,k} : S_k → ℂ by
\[ \mathrm{Gr}_{N,k}(\sigma) := N^{\#(\sigma)} \]
for σ ∈ S_k. If no confusion will arise, we may refer to Gr_{N,k} as Gr.
Proposition 6.6. The function Gr_{N,k} appears in the following familiar ways:
- it is the character of the representation π of S_k on (ℂ^N)^{⊗k};
- we have G_{α,β} = Gr_{N,k}(α⁻¹β) for all α, β ∈ S_k.
Proof. For (1), compute
\begin{align*}
\mathrm{Tr}(\pi(\sigma)) &= \sum_{\mathbf{i}:[k]\to[N]} \langle \pi(\sigma)\, e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)},\ e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)}\rangle \\
&= \sum_{\mathbf{i}:[k]\to[N]} \langle e_{\mathbf{i}(\sigma^{-1}(1))}\otimes\cdots\otimes e_{\mathbf{i}(\sigma^{-1}(k))},\ e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)}\rangle \\
&= \sum_{\mathbf{i}:[k]\to[N]} \delta_{\mathbf{i}=\mathbf{i}\circ\sigma^{-1}} = |\{\mathbf{i}:[k]\to[N] : \mathbf{i}=\mathbf{i}\circ\sigma^{-1}\}| = N^{\#(\sigma)}.
\end{align*}
The last line is a straightforward count: 𝐢 = 𝐢 ∘ σ⁻¹ just means 𝐢 is constant on the cycles of σ⁻¹, so we are picking a label from [N] for each cycle, and there are N^{#(σ⁻¹)} = N^{#(σ)} choices.
For (2), for α, β ∈ S_k, we have
\begin{align*}
\langle E_\alpha, E_\beta\rangle &= \sum_{\mathbf{i},\mathbf{j}:[k]\to[N]} \langle e_{\mathbf{i}(\alpha(1))}\otimes\cdots\otimes e_{\mathbf{i}(\alpha(k))}\otimes e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)},\ e_{\mathbf{j}(\beta(1))}\otimes\cdots\otimes e_{\mathbf{j}(\beta(k))}\otimes e_{\mathbf{j}(1)}\otimes\cdots\otimes e_{\mathbf{j}(k)}\rangle \\
&= \sum_{\mathbf{i}:[k]\to[N]} \langle e_{\mathbf{i}(\alpha(1))}\otimes\cdots\otimes e_{\mathbf{i}(\alpha(k))},\ e_{\mathbf{i}(\beta(1))}\otimes\cdots\otimes e_{\mathbf{i}(\beta(k))}\rangle \qquad (\text{the last } k \text{ components force } \mathbf{i} = \mathbf{j}) \\
&= \sum_{\mathbf{i}:[k]\to[N]} \langle e_{\mathbf{i}(1)}\otimes\cdots\otimes e_{\mathbf{i}(k)},\ e_{\mathbf{i}(\alpha^{-1}\beta(1))}\otimes\cdots\otimes e_{\mathbf{i}(\alpha^{-1}\beta(k))}\rangle \qquad (\text{replace } \mathbf{i} \text{ with } \mathbf{i}\circ\alpha^{-1}) \\
&= \sum_{\mathbf{i}:[k]\to[N]} \delta_{\mathbf{i}=\mathbf{i}\circ(\alpha^{-1}\beta)} = N^{\#(\alpha^{-1}\beta)}
\end{align*}
so ⟨E_α, E_β⟩ = Gr_{N,k}(α⁻¹β). □
We have a function GrN,k
on Sk, and it is clearly a
class function – GrN,k(σ) only
depends on σ through
#(σ), which only depends on
the conjugacy class of σ.
Equivalently, this can be seen as a central element of
ℂ[Sk]:
Lemma 6.7. Let f : S_k → ℂ be a function. The element ∑_{σ∈S_k} f(σ)σ of ℂ[S_k] is central if and only if f is a class function.
Proof. First, suppose that f is a class function. Then for any τ ∈ S_k, we have
\[ \sum_{\sigma\in S_k} f(\sigma)\,\sigma\tau = \sum_{\sigma\in S_k} f(\tau\sigma\tau^{-1})\,\tau\sigma\tau^{-1}\tau = \sum_{\sigma\in S_k} f(\sigma)\,\tau\sigma = \tau \sum_{\sigma\in S_k} f(\sigma)\,\sigma \]
and passing to linear combinations of τs, we see that ∑_{σ∈S_k} f(σ)σ is central.
Conversely, if ∑_{σ∈S_k} f(σ)σ is central, then for any τ ∈ S_k, we have
\[ \sum_{\sigma\in S_k} f(\tau\sigma\tau^{-1})\,\sigma = \sum_{\sigma\in S_k} f(\sigma)\,\tau^{-1}\sigma\tau = \tau^{-1}\Big(\sum_{\sigma\in S_k} f(\sigma)\,\sigma\Big)\tau = \tau^{-1}\tau \sum_{\sigma\in S_k} f(\sigma)\,\sigma = \sum_{\sigma\in S_k} f(\sigma)\,\sigma \]
so for all σ ∈ S_k, we have f(τστ⁻¹) = f(σ). □
This means, in particular, that we have a central element of ℂ[S_k] corresponding to Gr_{N,k}:
\[ \widetilde{\mathrm{Gr}}_{N,k} := \sum_{\sigma\in S_k} \mathrm{Gr}_{N,k}(\sigma)\,\sigma. \]
To describe the inverse, we need some tools from algebra. From this point forward, we will assume N ≥ k for simplicity.
Notation 6.8. Let 𝕐k
be the set of Young diagrams with k
boxes: these are grids of boxes, justified to the left, where the length of rows is
non-increasing. (These are in bijection with partitions of k.)
The length of λ ∈ 𝕐k,
denoted by ℓ(λ),
is the number of rows in λ.
For λ ∈ 𝕐k,
a tableau is a labeling of the boxes by elements of {1,…,k},
and a tableau is said to be standard if the rows and columns are strictly increasing.
The number of standard tableaux of shape λ
will be denoted by dim(λ).
Theorem 6.9 (Young, Frobenius, etc?). There is a family
{χλ : λ ∈ 𝕐k} of functions
on Sk
with the following properties:
- χλ
is a class function, in the sense that it is constant on conjugacy classes;
- every class function on Sk
is a linear combination of {χλ : λ ∈ 𝕐k};
- ⟨χλ,χμ⟩ = δλ=μ.
Let π be the representation of S_k on (ℂ^N)^{⊗k} defined by
\[ \pi(\sigma)\,\xi_1\otimes\cdots\otimes\xi_k = \xi_{\sigma^{-1}(1)}\otimes\cdots\otimes\xi_{\sigma^{-1}(k)}. \]
Then
\[ \mathrm{Tr}(\pi(\sigma)) = \sum_{\lambda\vdash k} \mathsf{s}_\lambda(1^N)\,\chi_\lambda(\sigma) \]
for σ ∈ S_k, where
\[ \mathsf{s}_\lambda(1^N) = \frac{\dim(\lambda)}{k!} \prod_{(i,j)\in\lambda} (N + j - i). \]
Notation 6.10. Define Wg_{N,k} : S_k → ℂ by
\[ \mathrm{Wg}_{N,k}(\sigma) = \frac{1}{k!^2} \sum_{\lambda\vdash k} \frac{\dim(\lambda)^2}{\mathsf{s}_\lambda(1^N)}\, \chi_\lambda(\sigma) \]
for σ ∈ S_k. If no confusion will arise, we may refer to Wg_{N,k} by Wg. Note that for this definition to make sense, we are using the assumption N ≥ k – otherwise, 𝗌_λ(1^N) might be 0.
Theorem 6.11. Let \widetilde{\mathrm{Wg}} := \sum_{\sigma\in S_k} \mathrm{Wg}(\sigma)\,\sigma and assume N ≥ k. Then we have \widetilde{\mathrm{Wg}}\,\widetilde{\mathrm{Gr}} = \widetilde{\mathrm{Gr}}\,\widetilde{\mathrm{Wg}} = 1.
Proof. We have
\begin{align*}
\widetilde{\mathrm{Wg}}\,\widetilde{\mathrm{Gr}} &= \frac{1}{k!^2} \sum_{\lambda,\mu\in\mathbb{Y}_k} \frac{\dim(\lambda)^2\,\mathsf{s}_\mu(1^N)}{\mathsf{s}_\lambda(1^N)} \Big(\sum_{\alpha\in S_k}\chi_\lambda(\alpha)\,\alpha\Big)\Big(\sum_{\beta\in S_k}\chi_\mu(\beta)\,\beta\Big) \\
&= \sum_{\lambda,\mu\in\mathbb{Y}_k} \frac{\dim(\lambda)\,\mathsf{s}_\mu(1^N)}{\dim(\mu)\,\mathsf{s}_\lambda(1^N)}\, C_\lambda C_\mu
\end{align*}
where C_λ := \frac{\dim(\lambda)}{k!} \sum_{\alpha\in S_k} \chi_\lambda(\alpha)\,\alpha, and similarly for μ. It is an exercise in representation theory of finite groups to check that
\[ C_\lambda C_\mu = \begin{cases} C_\lambda & \text{if } \lambda = \mu \\ 0 & \text{otherwise} \end{cases} \qquad\text{and}\qquad \sum_{\lambda\in\mathbb{Y}_k} C_\lambda = 1, \]
so \widetilde{\mathrm{Wg}}\,\widetilde{\mathrm{Gr}} = 1. Similarly \widetilde{\mathrm{Gr}}\,\widetilde{\mathrm{Wg}} = 1. □
Remark 6.12. The assumption N ≥ k is not really necessary: using the decomposition of ℂ[S_k] into matrix blocks coming from Maschke’s theorem, we can define
\[ \mathbb{C}_N[S_k] := \bigoplus_{\substack{\lambda\in\mathbb{Y}_k \\ \ell(\lambda)\le N}} \mathrm{End}(V^\lambda) \]
and invert Gr_{N,k} in there; this is done in [7]. We only assumed N ≥ k for simplicity, and because for our purposes this will always be the case.
Corollary 6.13. Continue to assume N ≥ k. We have
\[ W_{\alpha,\beta} = \frac{1}{k!^2} \sum_{\lambda\in\mathbb{Y}_k} \frac{\dim(\lambda)^2}{\mathsf{s}_\lambda(1^N)}\, \chi_\lambda(\alpha^{-1}\beta) = \mathrm{Wg}_{N,k}(\alpha^{-1}\beta) \]
for α, β ∈ S_k.
Proof. Let \widetilde{W}_{\alpha,\beta} := \mathrm{Wg}_{N,k}(\alpha^{-1}\beta). We will show \widetilde{W} = W by checking that \widetilde{W}G = G\widetilde{W} = I. From the theorem, which says that
\[ 1 = \sum_{\alpha,\beta\in S_k} \mathrm{Wg}(\alpha)\,\mathrm{Gr}(\beta)\,\alpha\beta = \sum_{\sigma\in S_k} \sigma \sum_{\alpha\in S_k} \mathrm{Wg}(\alpha)\,\mathrm{Gr}(\alpha^{-1}\sigma), \]
we know that
\[ \sum_{\alpha\in S_k} \mathrm{Wg}(\alpha)\,\mathrm{Gr}(\alpha^{-1}\sigma) = \begin{cases} 1 & \text{if } \sigma = e \\ 0 & \text{otherwise.} \end{cases} \]
So
\begin{align*}
[\widetilde{W}G]_{\alpha,\beta} &= \sum_{\gamma\in S_k} \widetilde{W}_{\alpha,\gamma}\, G_{\gamma,\beta} = \sum_{\gamma\in S_k} \mathrm{Wg}(\alpha^{-1}\gamma)\,\mathrm{Gr}(\gamma^{-1}\beta) \\
&= \sum_{\gamma\in S_k} \mathrm{Wg}(\alpha^{-1}\alpha\gamma)\,\mathrm{Gr}((\alpha\gamma)^{-1}\beta) = \sum_{\gamma\in S_k} \mathrm{Wg}(\gamma)\,\mathrm{Gr}(\gamma^{-1}\alpha^{-1}\beta) \\
&= \begin{cases} 1 & \text{if } \alpha^{-1}\beta = e \\ 0 & \text{otherwise} \end{cases}
\end{align*}
and so [\widetilde{W}G]_{\alpha,\beta} = \delta_{\alpha=\beta}. Similarly we have [G\widetilde{W}]_{\alpha,\beta} = \delta_{\alpha=\beta}, so \widetilde{W} = W. □
6.3. Exercises.
Exercise 6.14. Let {Eα : α ∈ Sk}
be the spanning set described above.
- Prove that {Eα : α ∈ Sk}
is linearly independent if and only if N ≥ k.
- Let {ξ_α : α ∈ A} be a family of vectors in a Hilbert space H. Prove that if {ξ_α : α ∈ A} is linearly independent, then the Gram matrix G := (⟨ξ_α, ξ_β⟩)_{α,β∈A} is invertible.
Exercise 6.15 ([8, Problem 4.5.2]). Let
G be a finite group and
let {χi : i ∈ I} be the irreducible
characters of G.
For i ∈ I, let
ρi be the irreducible
representation giving rise to χi,
and let
\[ C_i := \frac{\dim(V_i)}{|G|} \sum_{g\in G} \chi_i(g)\, g^{-1} \in \mathbb{C}[G]. \]
Recall that ρ_i extends by linearity to an algebra homomorphism ℂ[G] → End(V_i).
- Prove that
\[ \rho_j(C_i) = \begin{cases} \mathrm{Id} & \text{if } j = i \\ 0 & \text{otherwise} \end{cases} \]
for i, j ∈ I.
- Prove that
\[ C_i C_j = \begin{cases} C_i & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \]
for i, j ∈ I.
- Prove that ∑_{i∈I} C_i = 1 in ℂ[G].
Hint: the important keywords are central, Schur’s lemma, Maschke’s theorem, and
orthogonality relations.
7. Randomly rotated matrices
In this section we will show that conjugating by a random unitary matrix produces
asymptotic freeness. This section is heavily based on [13, Lecture 23].
First, let us fix our terminology:
Definition 7.1. For N ≥ 1, let (𝒜_N, φ_N) be a ∗-probability space and let {a_i^{(N)} : i ∈ I} be a family of self-adjoint elements of 𝒜_N. We say that this family is asymptotically free if for all k ≥ 1, for all i₁,…,i_k ∈ I with i₁ ≠ i₂, i₂ ≠ i₃, …, i_{k−1} ≠ i_k, and for all polynomials P₁,…,P_k with φ_N(P_j(a_{i_j}^{(N)})) → 0 for 1 ≤ j ≤ k, we have
\[ \lim_{N\to\infty} \varphi_N\big(P_1(a_{i_1}^{(N)})\cdots P_k(a_{i_k}^{(N)})\big) = 0. \]
Theorem 7.2 (Voiculescu). For N ≥ 1, let A_N and B_N be self-adjoint N × N matrices, let U_N be a Haar-distributed random N × N unitary matrix, and suppose that for all m ≥ 1, the sequences tr_N(A_N^m) and tr_N(B_N^m) converge as N → ∞. Then U_N A_N U_N^* and B_N are asymptotically free.
Here is what we need to show: if P₁,…,P_k and Q₁,…,Q_k are polynomials with
\[ \mathbb{E}\,\mathrm{tr}_N(P_j(UAU^*)) \to 0 \quad\text{and}\quad \mathbb{E}\,\mathrm{tr}_N(Q_j(B)) \to 0 \]
then
\[ \mathbb{E}\,\mathrm{tr}_N\big(P_1(UAU^*)Q_1(B)\cdots P_k(UAU^*)Q_k(B)\big) \to 0. \]
Strictly speaking, there is another case to be handled where the Q_k(B) is omitted; this can be done in the same way as the case mentioned above.
We can simplify this by bringing out the unitaries from inside the polynomials: we want to show that if P₁,…,P_k and Q₁,…,Q_k are polynomials with
\[ \mathrm{tr}_N(P_j(A)) \to 0 \quad\text{and}\quad \mathrm{tr}_N(Q_j(B)) \to 0 \]
then
\[ \mathbb{E}\,\mathrm{tr}_N\big(UP_1(A)U^*Q_1(B)\cdots UP_k(A)U^*Q_k(B)\big) \to 0. \]
So what we really need to do here is compute things like
\[ \mathbb{E}\,\mathrm{tr}_N(UA_1U^*B_1\cdots UA_kU^*B_k), \]
where A₁,…,A_k and B₁,…,B_k are tuples of matrices.
7.1. Exact trace formula.
Notation 7.3. Let (𝒜, φ) be a tracial ∗-probability space. For a cycle c = (i₁,…,i_p), write
\[ \varphi_c(a_1,\dots,a_n) := \varphi(a_{i_1}\cdots a_{i_p}). \]
For α ∈ S_n with disjoint cycle decomposition α = c₁⋯c_l, write
\[ \varphi_\alpha(a_1,\dots,a_n) := \varphi_{c_1}(a_1,\dots,a_n)\cdots\varphi_{c_l}(a_1,\dots,a_n). \]
Lemma 7.4. For N × N matrices A₁,…,A_k with A_r = (a_{ij}^{(r)})_{1≤i,j≤N} for 1 ≤ r ≤ k, we have
\[ \mathrm{tr}_\alpha(A_1,\dots,A_k) = \frac{1}{N^{\#(\alpha)}} \sum_{\mathbf{i}:[k]\to[N]} a^{(1)}_{\mathbf{i}(1)\mathbf{i}(\alpha(1))}\cdots a^{(k)}_{\mathbf{i}(k)\mathbf{i}(\alpha(k))}. \]
Proof. Direct verification from the definition of the trace. □
Proposition 7.5. Let A₁,…,A_k and B₁,…,B_k be N × N matrices, and let γ = γ_k = (1,…,k). Then
\[ \mathbb{E}\,\mathrm{tr}_N(UA_1U^*B_1\cdots UA_kU^*B_k) = \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta)\, N^{\#(\alpha)+\#(\beta^{-1}\gamma)-1}\, \mathrm{tr}_\alpha(A_1,\dots,A_k)\, \mathrm{tr}_{\beta^{-1}\gamma}(B_1,\dots,B_k). \]
Proof. We have
\begin{align*}
&\mathbb{E}\,\mathrm{tr}_N(UA_1U^*B_1\cdots UA_kU^*B_k) \\
&\quad= \frac{1}{N} \sum_{\mathbf{i},\mathbf{i}',\mathbf{j},\mathbf{j}':[k]\to[N]} \mathbb{E}\Big( u_{\mathbf{i}(1)\mathbf{j}(1)}\, a^{(1)}_{\mathbf{j}(1)\mathbf{j}'(1)}\, \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\, b^{(1)}_{\mathbf{i}'(1)\mathbf{i}(2)} \cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\, a^{(k)}_{\mathbf{j}(k)\mathbf{j}'(k)}\, \overline{u_{\mathbf{i}'(k)\mathbf{j}'(k)}}\, b^{(k)}_{\mathbf{i}'(k)\mathbf{i}(1)} \Big) \\
&\quad= \frac{1}{N} \sum_{\mathbf{i},\mathbf{i}',\mathbf{j},\mathbf{j}'} a^{(1)}_{\mathbf{j}(1)\mathbf{j}'(1)}\cdots a^{(k)}_{\mathbf{j}(k)\mathbf{j}'(k)}\, b^{(1)}_{\mathbf{i}'(1)\mathbf{i}(2)}\cdots b^{(k)}_{\mathbf{i}'(k)\mathbf{i}(1)}\ \mathbb{E}\big( u_{\mathbf{i}(1)\mathbf{j}(1)}\cdots u_{\mathbf{i}(k)\mathbf{j}(k)}\ \overline{u_{\mathbf{i}'(1)\mathbf{j}'(1)}}\cdots \overline{u_{\mathbf{i}'(k)\mathbf{j}'(k)}} \big) \\
&\quad= \frac{1}{N} \sum_{\mathbf{i},\mathbf{i}',\mathbf{j},\mathbf{j}'} a^{(1)}_{\mathbf{j}(1)\mathbf{j}'(1)}\cdots b^{(k)}_{\mathbf{i}'(k)\mathbf{i}(1)} \sum_{\alpha,\beta\in S_k} \delta_{\mathbf{i}=\mathbf{i}'\circ\alpha}\, \delta_{\mathbf{j}=\mathbf{j}'\circ\beta}\, \mathrm{Wg}_{N,k}(\alpha^{-1}\beta) \\
&\quad= \frac{1}{N} \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta) \sum_{\mathbf{i}',\mathbf{j}':[k]\to[N]} a^{(1)}_{\mathbf{j}'(\beta(1))\mathbf{j}'(1)}\cdots a^{(k)}_{\mathbf{j}'(\beta(k))\mathbf{j}'(k)}\, b^{(1)}_{\mathbf{i}'(1)\mathbf{i}'(\alpha(2))}\cdots b^{(k)}_{\mathbf{i}'(k)\mathbf{i}'(\alpha(1))}.
\end{align*}
We can relabel things to make this formula easier to work with: swap α with β, 𝐣′ with 𝐢′, and drop the primes. Also, move the permutation in the a's to the column index using the inverse. Then the above becomes
\begin{align*}
&\frac{1}{N} \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta) \sum_{\mathbf{i},\mathbf{j}:[k]\to[N]} a^{(1)}_{\mathbf{i}(1)\mathbf{i}(\alpha(1))}\cdots a^{(k)}_{\mathbf{i}(k)\mathbf{i}(\alpha(k))}\, b^{(1)}_{\mathbf{j}(1)\mathbf{j}(\beta^{-1}\gamma(1))}\cdots b^{(k)}_{\mathbf{j}(k)\mathbf{j}(\beta^{-1}\gamma(k))} \\
&\quad= \frac{1}{N} \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta)\, \mathrm{Tr}_\alpha(A_1,\dots,A_k)\, \mathrm{Tr}_{\beta^{-1}\gamma}(B_1,\dots,B_k) \\
&\quad= \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta)\, N^{\#(\alpha)+\#(\beta^{-1}\gamma)-1}\, \mathrm{tr}_\alpha(A_1,\dots,A_k)\, \mathrm{tr}_{\beta^{-1}\gamma}(B_1,\dots,B_k)
\end{align*}
as claimed. □
In order to take the limit N → ∞, we need to understand the asymptotics of Wg_{N,k} as a function of N.
7.2. Asymptotics of Weingarten function.
Again, assume N ≥ k. Recall our formula for Wg_{N,k}:
\[ \mathrm{Wg}_{N,k}(\sigma) = \frac{1}{k!^2} \sum_{\lambda\in\mathbb{Y}_k} \frac{\dim(\lambda)^2}{\mathsf{s}_\lambda(1^N)}\, \chi_\lambda(\sigma). \]
The only part which involves N is
\[ \frac{1}{\mathsf{s}_\lambda(1^N)} = \frac{k!}{\dim(\lambda)} \prod_{(i,j)\in\lambda} \frac{1}{N+j-i} \]
and the highest power of N in the denominator is N^k. So we can expand as a Laurent series in N:
\[ \mathrm{Wg}_{N,k}(\sigma) = \sum_{n=0}^\infty a_n(\sigma)\, N^{-k-n}. \]
There is a nice formula for the coefficients.
Theorem 7.6 ([5, Theorem 2.2]). Let a_n(σ) be the Laurent coefficients above. Then
- with n = 0, we have
\[ a_0(\sigma) = \begin{cases} 1 & \text{if } \sigma = e \\ 0 & \text{otherwise;} \end{cases} \]
- for n > 0,
\[ a_n(\sigma) = \sum_{m=1}^n (-1)^m\, b_{m,n}(\sigma) \]
where b_{m,n}(σ) is the number of tuples (σ₁,…,σ_m) with
- σ_j ≠ e,
- σσ₁⋯σ_m = e, and
- ∑_{j=1}^m |σ_j| = n.
Corollary 7.7. For all σ ∈ S_k, we have Wg_{N,k}(σ) = O(N^{−k−|σ|}).
Proof. The claim here is that a_n(σ) = 0 for all n < |σ|. Suppose n < |σ|, let 1 ≤ m ≤ n, and take σ₁,…,σ_m ∈ S_k with ∑_{j=1}^m |σ_j| = n. By the triangle inequality, we have
\[ |\sigma| = |\sigma\sigma_1\cdots\sigma_m\,\sigma_m^{-1}\cdots\sigma_1^{-1}| \le |\sigma\sigma_1\cdots\sigma_m| + \sum_{j=1}^m |\sigma_j| \]
so |σσ₁⋯σ_m| ≥ |σ| − ∑_{j=1}^m |σ_j| > 0. This means σσ₁⋯σ_m ≠ e, so b_{m,n}(σ) = 0 for all m and a_n(σ) = 0. □
Notation 7.8. Let μ(σ) be the leading coefficient of Wg_{N,k}(σ), i.e. μ(σ) := a_{|σ|}(σ). One can show that
\[ \mu(\sigma) = \prod_{c\in\mathrm{Cyc}(\sigma)} (-1)^{|c|-1}\, \mathrm{Cat}(|c|-1) \]
and
\[ N^{k+|\sigma|}\, \mathrm{Wg}_{N,k}(\sigma) = \mu(\sigma) + O(N^{-2}). \]
This level of detail will not be necessary for our purposes.
7.3. Limit of trace formula.
Recall the trace formula:
\begin{align*}
\mathbb{E}\,\mathrm{tr}_N(UA_1U^*B_1\cdots UA_kU^*B_k) &= \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta)\, N^{\#(\alpha)+\#(\beta^{-1}\gamma)-1}\, \mathrm{tr}_\alpha(A_1,\dots,A_k)\, \mathrm{tr}_{\beta^{-1}\gamma}(B_1,\dots,B_k) \\
&= \sum_{\alpha,\beta\in S_k} \mathrm{Wg}_{N,k}(\alpha^{-1}\beta)\, N^{2k-1-|\alpha|-|\beta^{-1}\gamma|}\, \mathrm{tr}_\alpha(A_1,\dots,A_k)\, \mathrm{tr}_{\beta^{-1}\gamma}(B_1,\dots,B_k).
\end{align*}
We have
\[ \mathrm{Wg}_{N,k}(\alpha^{-1}\beta) = \mu(\alpha^{-1}\beta)\, N^{-k-|\alpha^{-1}\beta|} + O(N^{-k-|\alpha^{-1}\beta|-1}) \]
so the summand indexed by (α, β) has leading order
\[ N^{\,k-1-(|\alpha|+|\alpha^{-1}\beta|+|\beta^{-1}\gamma|)}. \]
Lemma 7.10. We have
\[ |\alpha| + |\alpha^{-1}\beta| + |\beta^{-1}\gamma| \ge k - 1. \]
If we have equality, then either α or β⁻¹γ has a fixed point.
Proof. The first claim is just the triangle inequality:
\[ k - 1 = |\gamma| = |\alpha\,\alpha^{-1}\beta\,\beta^{-1}\gamma| \le |\alpha| + |\alpha^{-1}\beta| + |\beta^{-1}\gamma|. \]
For the second claim, notice that at least one of
\[ |\alpha| \le \frac{k-1}{2} \quad\text{or}\quad |\beta^{-1}\gamma| \le \frac{k-1}{2} \]
is true. In the first case, α can be written as a product of ≤ (k−1)/2 transpositions, so it can only move up to k − 1 points, i.e. it has at least one fixed point. Similarly, in the second case, β⁻¹γ can only move up to k − 1 points, so it has at least one fixed point. □
Recall the original goal: we need to show that for all P₁,…,P_k and Q₁,…,Q_k with tr_N(P_j(A)) → 0 and tr_N(Q_j(B)) → 0, we have
\[ \mathbb{E}\,\mathrm{tr}_N\big(UP_1(A)U^*Q_1(B)\cdots UP_k(A)U^*Q_k(B)\big) \to 0. \]
Using the lemma and formula above, we have
\begin{align*}
&\lim_{N\to\infty} \mathbb{E}\,\mathrm{tr}_N\big(UP_1(A)U^*Q_1(B)\cdots UP_k(A)U^*Q_k(B)\big) \\
&\quad= \sum_{\substack{\alpha,\beta\in S_k \\ |\alpha|+|\alpha^{-1}\beta|+|\beta^{-1}\gamma|=k-1}} \mu(\alpha^{-1}\beta) \Big(\lim_{N\to\infty} \mathrm{tr}_\alpha(P_1(A),\dots,P_k(A))\Big) \Big(\lim_{N\to\infty} \mathrm{tr}_{\beta^{-1}\gamma}(Q_1(B),\dots,Q_k(B))\Big)
\end{align*}
and according to the lemma, for each summand, either α or β⁻¹γ has a fixed point. This means we have a factor of lim_{N→∞} tr_N(P_j(A)) or lim_{N→∞} tr_N(Q_j(B)). This makes each summand 0.
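A minimal numerical illustration of Theorem 7.2 (the matrix choices, sample counts, and Haar sampler are mine): take A and B deterministic, traceless, and self-adjoint; freeness predicts the alternating centered trace tr_N(UAU* B UAU* B) tends to 0 as N grows.

```python
import numpy as np

def haar_unitary(N, rng):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

rng = np.random.default_rng(1)
for N in (10, 40, 160):
    A = np.diag(np.concatenate([np.ones(N // 2), -np.ones(N // 2)]))
    B = np.diag(np.linspace(-1.0, 1.0, N))   # traceless by symmetry
    vals = []
    for _ in range(200):
        U = haar_unitary(N, rng)
        X = U @ A @ U.conj().T
        vals.append((np.trace(X @ B @ X @ B) / N).real)
    print(N, np.mean(vals))   # should shrink toward 0
```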
8. Free additive convolution
Here is a very concrete problem: let A be the 2N × 2N diagonal matrix with half its entries 1 and the other half −1. Rotate A randomly by conjugating with a Haar unitary, and keep another copy of A. These are two matrices with the same eigenvalues, but in generic/random position in relation to each other. What does A + UAU* look like?
The distribution is complicated in an exact sense, but in our context it is natural to ask this
question when N →∞.
This question can be answered using asymptotic freeness and free probability. This
section draws from [12, Chapter 3] and [13, Lecture 12], and is generally rather sketchy
about technical details. Rigorous proofs of everything in this section can be found in one
of the two aforementioned references.
Definition 8.1. Let μ, ν ∈ M_c(ℝ) and let (𝒜, φ) be a ∗-probability space with freely independent self-adjoint elements a, b ∈ 𝒜 such that
\[ \varphi(a^m) = \int_{\mathbb{R}} x^m\,d\mu(x) \quad\text{and}\quad \varphi(b^m) = \int_{\mathbb{R}} x^m\,d\nu(x) \]
for all m ≥ 1. Let μ ⊞ ν be the distribution of a + b.
This construction can always be made using free products (see [13]). It produces a well-defined operation on probability measures by a basic property of freeness: the distribution of a + b only depends on the individual distributions of a and b.
According to asymptotic freeness, the limit of A + UAU* should be described by μ ⊞ μ where μ = ½(δ₁ + δ₋₁). My goal for today is to compute μ ⊞ μ.
8.1. Free cumulants.
Notation 8.2. Let 𝒜 be an algebra and let (φ_n)_{n≥1} be a sequence of multilinear functionals φ_n : 𝒜ⁿ → ℂ. For a set V ⊆ [n], say V = {i₁,…,i_p} with i₁ < ⋯ < i_p, write
\[ \varphi_V(a_1,\dots,a_n) = \varphi_p(a_{i_1},\dots,a_{i_p}). \]
For π ∈ P(n), write
\[ \varphi_\pi(a_1,\dots,a_n) := \prod_{V\in\pi} \varphi_V(a_1,\dots,a_n). \]
For example, if π = {{1,3,5},{2,4},{6}}, then
\[ \varphi_\pi(a_1,a_2,a_3,a_4,a_5,a_6) = \varphi_3(a_1,a_3,a_5)\,\varphi_2(a_2,a_4)\,\varphi_1(a_6). \]
Definition 8.3. Let (𝒜, φ) be a ∗-probability space. The free cumulants are the multilinear functionals κ_n : 𝒜ⁿ → ℂ defined implicitly by
\[ \varphi(a_1\cdots a_n) = \sum_{\pi\in NC(n)} \kappa_\pi(a_1,\dots,a_n). \]
Example 8.4. With n = 1, the moment-cumulant formula says φ(a₁) = κ₁(a₁). With n = 2, there are two noncrossing partitions, so
\[ \varphi(a_1 a_2) = \kappa_2(a_1,a_2) + \kappa_1(a_1)\kappa_1(a_2) = \kappa_2(a_1,a_2) + \varphi(a_1)\varphi(a_2), \]
hence κ₂(a₁,a₂) = φ(a₁a₂) − φ(a₁)φ(a₂). With n = 3, there are 5 noncrossing partitions:
\begin{align*}
\varphi(a_1a_2a_3) &= \kappa_3(a_1,a_2,a_3) + \kappa_2(a_1,a_2)\kappa_1(a_3) + \kappa_2(a_1,a_3)\kappa_1(a_2) \\
&\qquad + \kappa_1(a_1)\kappa_2(a_2,a_3) + \kappa_1(a_1)\kappa_1(a_2)\kappa_1(a_3)
\end{align*}
and the only term we don’t already know is κ₃(a₁,a₂,a₃). This pattern continues: at each level, κ_n(a₁,…,a_n) appears exactly once (for the partition with a single block), alongside products of lower-order cumulants, so the formula
\[ \varphi(a_1\cdots a_n) = \sum_{\pi\in NC(n)} \kappa_\pi(a_1,\dots,a_n) \]
implicitly determines κ_n(a₁,…,a_n).
Notation 8.6. If we have a single variable a, we will abbreviate κ_n(a) := κ_n(a,…,a) and m_n := φ(aⁿ). This means that for a single variable, we have a new sequence (κ_n)_{n≥1} which is related to the moment sequence by m_n = ∑_{π∈NC(n)} κ_π.
Theorem 8.7. Let (𝒜,φ)
be a ∗-probability
space and let 𝒜1,…,𝒜r
be ∗-subalgebras
of 𝒜.
Then 𝒜1,…,𝒜r
are freely independent if and only if for any 1 ≤ i1,…,in ≤ r
with ij≠ik
for some 1 ≤ j,k ≤ r,
for any choice of aj ∈𝒜ij
for each 1 ≤ j ≤ n,
we have κn(a1,…,an) = 0.
Proof. See [13, Lecture 11]. □
Corollary 8.8. If a
and b
are free, then κn(a + b) = κn(a) + κn(b).
Proof. Suppose that a and b are free. By multilinearity,
\[ \kappa_n(a+b) = \kappa_n(a+b,\dots,a+b) = \sum_{c_1,\dots,c_n\in\{a,b\}} \kappa_n(c_1,\dots,c_n), \]
and by Theorem 8.7 the mixed terms vanish, so the above reads as
\[ \kappa_n(a+b) = \kappa_n(a) + \kappa_n(b) \]
as claimed. □
Example 8.9. If a is standard semicircular, the moment-cumulant formula says
\[ \sum_{\pi\in NC(n)} \kappa_\pi = \begin{cases} \mathrm{Cat}(m) & \text{if } n = 2m \\ 0 & \text{otherwise.} \end{cases} \]
This relation is satisfied by
\[ \kappa_n = \begin{cases} 1 & \text{if } n = 2 \\ 0 & \text{otherwise.} \end{cases} \]
Then, if a and b are free standard semicirculars, we have
\[ \kappa_n(a+b) = \kappa_n(a) + \kappa_n(b) = \begin{cases} 2 & \text{if } n = 2 \\ 0 & \text{otherwise} \end{cases} \]
so
\[ m_{2k}(a+b) = \sum_{\pi\in NC(2k)} \kappa_\pi(a+b) = \sum_{\pi\in NC_2(2k)} 2^k = 2^k\,\mathrm{Cat}(k), \]
which are the moments of the semicircle measure with variance 2. In general, the semicircle measure with variance t is given by the density
\[ d\mu_t(x) = \begin{cases} \frac{1}{2\pi t}\sqrt{4t-x^2}\,dx & \text{if } x \in [-2\sqrt{t}, 2\sqrt{t}] \\ 0 & \text{otherwise.} \end{cases} \]
The observation above can be generalized as follows: we have a semigroup (μ_t)_{t>0} with respect to free additive convolution, in the sense that μ_s ⊞ μ_t = μ_{s+t}. This is a fundamental property of semicircular variables. In particular, if A and B are independent GUEs, then the asymptotic distribution of A + B is still semicircular, just with a bigger radius.
8.2. Cauchy transform and Stieltjes inversion.
Definition 8.11. For μ ∈ M(ℝ), the Cauchy transform of μ is the function G_μ defined by
\[ G_\mu(z) := \int_{\mathbb{R}} \frac{1}{z-t}\,d\mu(t) \]
for z ∈ ℍ⁺.
Proposition 8.12. Let μ ∈ M_c(ℝ).
- G_μ is a holomorphic map ℍ⁺ → ℍ⁻.
- With r := sup_{t∈supp(μ)} |t|, we have
\[ G_\mu(z) = \sum_{m=0}^\infty \frac{\int_{\mathbb{R}} t^m\,d\mu(t)}{z^{m+1}} \]
for |z| > r.
- We have iy·G_μ(iy) → 1 as y → ∞.
Theorem 8.13 (Stieltjes inversion). For μ ∈ M(ℝ), we have
\[ \mu((a,b)) + \tfrac{1}{2}\mu(\{a,b\}) = -\lim_{\varepsilon\to 0^+} \frac{1}{\pi}\int_a^b \mathrm{Im}\,G_\mu(t+i\varepsilon)\,dt \]
for a < b. If μ, ν ∈ M(ℝ) have G_μ = G_ν, then μ = ν.
Example 8.14 (Semicircle distribution). Recall that the standard semicircle distribution has the density
\[ d\mu(t) = \begin{cases} \frac{1}{2\pi}\sqrt{4-t^2}\,dt & \text{if } t \in [-2,2] \\ 0 & \text{otherwise.} \end{cases} \]
The moments are
\[ \int_{\mathbb{R}} t^m\,d\mu(t) = \begin{cases} \mathrm{Cat}(k) & \text{if } m = 2k \\ 0 & \text{otherwise} \end{cases} \]
where Cat(k) := \frac{1}{k+1}\binom{2k}{k} are the Catalan numbers. The Catalan numbers satisfy the following recursion: Cat(0) = 1 and
\[ \mathrm{Cat}(k) = \sum_{r=1}^k \mathrm{Cat}(r-1)\,\mathrm{Cat}(k-r) \]
for k ≥ 1.
We can use this to compute the Cauchy transform:
\begin{align*}
G(z) &= \frac{1}{z} + \sum_{k=1}^\infty \mathrm{Cat}(k)\,\frac{1}{z^{2k+1}} = \frac{1}{z} + \sum_{k=1}^\infty \Big(\sum_{r=1}^k \mathrm{Cat}(r-1)\,\mathrm{Cat}(k-r)\Big)\frac{1}{z^{2k+1}} \\
&= \frac{1}{z} + \frac{1}{z}\sum_{k=1}^\infty\sum_{r=1}^k \frac{\mathrm{Cat}(r-1)}{z^{2r-1}}\,\frac{\mathrm{Cat}(k-r)}{z^{2(k-r)+1}} = \frac{1}{z} + \frac{1}{z}\sum_{r=1}^\infty \frac{\mathrm{Cat}(r-1)}{z^{2r-1}} \sum_{k=r}^\infty \frac{\mathrm{Cat}(k-r)}{z^{2(k-r)+1}} \\
&= \frac{1}{z} + \frac{1}{z}\sum_{r=1}^\infty \frac{\mathrm{Cat}(r-1)}{z^{2r-1}}\,G(z) = \frac{1}{z} + \frac{1}{z}\,G(z)^2
\end{align*}
so G(z) satisfies the quadratic equation G(z)² − zG(z) + 1 = 0. By the quadratic formula, we have
\[ G(z) = \frac{z \pm \sqrt{z^2-4}}{2}. \]
Since G(z) satisfies iyG(iy) → 1, we need to pick the minus:
\[ G(z) = \frac{z - \sqrt{z^2-4}}{2}. \]
What do we actually mean by √(z²−4)? For z ∈ ℍ⁺, let θ₁ be the angle between the x-axis and the line from 2 to z, and let θ₂ be the angle between the x-axis and the line from −2 to z. Then z − 2 = |z−2|e^{iθ₁} and z + 2 = |z+2|e^{iθ₂}, and we let
\[ \sqrt{z^2-4} := |z^2-4|^{1/2}\, e^{i\frac{\theta_1+\theta_2}{2}}. \]
Now let’s see what our inversion procedure does: we have
\[ \mathrm{Im}\sqrt{(t+i\varepsilon)^2-4} = |(t+i\varepsilon)^2-4|^{1/2}\, \sin\Big(\frac{\theta_1+\theta_2}{2}\Big) \]
and when ε → 0⁺, we have two cases: either |t| > 2, in which case θ₁ and θ₂ both go to either 0 or π, so the sin goes to sin(0) = sin(π) = 0; or |t| ≤ 2 and one of θ₁, θ₂ goes to 0 while the other goes to π, so the sin goes to sin(π/2) = 1.
So if |t| > 2, we have
\[ \lim_{\varepsilon\to 0^+} \Big(-\frac{1}{\pi}\,\mathrm{Im}\,G(t+i\varepsilon)\Big) = 0, \]
since the limit of G(t+iε) is real. If |t| ≤ 2, we have
\begin{align*}
\lim_{\varepsilon\to 0^+} \Big(-\frac{1}{\pi}\,\mathrm{Im}\,G(t+i\varepsilon)\Big) &= -\frac{1}{2\pi}\lim_{\varepsilon\to 0^+} \mathrm{Im}\Big(t+i\varepsilon - \sqrt{(t+i\varepsilon)^2-4}\Big) \\
&= -\frac{1}{2\pi}\lim_{\varepsilon\to 0^+} \Big(\varepsilon - |(t+i\varepsilon)^2-4|^{1/2}\,\sin\Big(\frac{\theta_1+\theta_2}{2}\Big)\Big) \\
&= \frac{1}{2\pi}\sqrt{4-t^2}.
\end{align*}
This recovers the familiar semicircular density.
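Here is a minimal numerical sketch of this inversion (the function names are mine): the branch is implemented as √(z−2)·√(z+2), which realizes the angle convention above on ℍ⁺.

```python
import numpy as np

def sqrt_branch(z):
    # sqrt(z-2)*sqrt(z+2): both principal angles lie in (0, pi) on H+
    return np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)

def G(z):
    return (z - sqrt_branch(z)) / 2

ts = np.linspace(-3, 3, 7)
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, np.round(-np.imag(G(ts + 1j * eps)) / np.pi, 4))
density = np.where(np.abs(ts) <= 2,
                   np.sqrt(np.maximum(4 - ts**2, 0)) / (2 * np.pi), 0.0)
print('density', np.round(density, 4))
```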
8.3. R-transform.
Theorem 8.15 (Voiculescu). Let μ ∈ M(ℝ) with supp(μ) ⊆ [−R, R], and let G be its Cauchy transform. Then there is an analytic function R(z) on |z| < \frac{1}{6R} with
- G(R(z) + 1/z) = z for 0 < |z| < \frac{1}{6R}, and
- R(z) = ∑_{n=0}^∞ κ_{n+1} z^n for |z| < \frac{1}{6R}.
Corollary 8.16. For μ,ν ∈Mc(ℝ),
we have Rμ⊞ν(z) = Rμ(z) + Rν(z).
Example 8.17. Let $\mu = \frac{1}{2}(\delta_1 + \delta_{-1})$. We will compute $\mu \boxplus \mu$ and solve the problem posed at the beginning of class; this is a classic textbook example of what free convolutions look like. The first step is to find the Cauchy transform, and then in turn the R-transform. The Cauchy transform is easy:
\[ G_\mu(z) = \int_{\mathbb{R}} \frac{1}{z-t}\,d\mu(t) = \frac{1}{2}\left( \frac{1}{z-1} + \frac{1}{z+1} \right) = \frac{z}{z^2-1}. \]
For the R-transform, we want to use the functional equation $G_\mu(R_\mu(z) + z^{-1}) = z$. Let $K_\mu(z) := R_\mu(z) + z^{-1}$, so
\[ z = G_\mu(K_\mu(z)) = \frac{K_\mu(z)}{K_\mu(z)^2 - 1} \implies K_\mu(z)^2 - z^{-1}K_\mu(z) - 1 = 0, \]
and the quadratic formula says
\[ K_\mu(z) = \frac{z^{-1} \pm \sqrt{z^{-2}+4}}{2} = \frac{1 \pm \sqrt{1+4z^2}}{2z}. \]
To pick which one, observe that we are supposed to have $R_\mu(0) = 0$: the minus sign would give something non-zero divided by zero, while the plus sign would give $0/0$, so the plus sign is the only possibility. This makes
\[ R_\mu(z) = K_\mu(z) - \frac{1}{z} = \frac{\sqrt{1+4z^2}-1}{2z}. \]
Now, we can use the linearization property of the R-transform:
\[ R_{\mu\boxplus\mu}(z) = R_\mu(z) + R_\mu(z) = \frac{\sqrt{1+4z^2}-1}{z} \implies K_{\mu\boxplus\mu}(z) = \frac{\sqrt{1+4z^2}}{z}. \]
One can show (see the proof of [13, Theorem 12.7]) that $z = K_{\mu\boxplus\mu}(G_{\mu\boxplus\mu}(z))$. Then
\[ z = K_{\mu\boxplus\mu}(G_{\mu\boxplus\mu}(z)) = \frac{\sqrt{1 + 4\,G_{\mu\boxplus\mu}(z)^2}}{G_{\mu\boxplus\mu}(z)} \implies (z^2-4)\,G_{\mu\boxplus\mu}(z)^2 - 1 = 0, \]
so $G_{\mu\boxplus\mu}(z) = \frac{1}{\sqrt{z^2-4}}$.
Finally, we can do Stieltjes inversion with this:
\[ \lim_{\varepsilon \to 0^+} \left( -\frac{1}{\pi}\,\mathrm{Im}\, G_{\mu\boxplus\mu}(t+i\varepsilon) \right) = -\frac{1}{\pi} \lim_{\varepsilon \to 0^+} \mathrm{Im}\, \frac{1}{\sqrt{(t+i\varepsilon)^2-4}} = -\frac{1}{\pi}\, \mathrm{Im}\, \frac{1}{\sqrt{t^2-4}}. \]
So if $|t| < 2$, the above is
\[ -\frac{1}{\pi}\, \mathrm{Im}\, \frac{1}{\sqrt{t^2-4}} = -\frac{1}{\pi}\, \mathrm{Im}\, \frac{1}{i\sqrt{4-t^2}} = \frac{1}{\pi}\, \mathrm{Im}\, \frac{i}{\sqrt{4-t^2}} = \frac{1}{\pi\sqrt{4-t^2}}, \]
and otherwise there is no imaginary part, so it's just $0$. In other words, $\mu \boxplus \mu$ is the arcsine distribution on $[-2,2]$.
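Since $\mu \boxplus \mu$ is the limiting ESD of $A + UBU^*$ for independent randomly rotated matrices whose spectra distribute like $\mu$, the arcsine answer can be tested with a quick experiment. Below is a sketch of ours (matrix size, seed, and the interval $[-1,1]$ are arbitrary choices); the mass the arcsine density puts on $[-1,1]$ is $\int_{-1}^{1} \frac{dt}{\pi\sqrt{4-t^2}} = \frac{1}{3}$.

import numpy as np

rng = np.random.default_rng(1)
N = 1000
A = np.diag(np.repeat([1.0, -1.0], N // 2))    # ESD = (delta_1 + delta_{-1})/2

# Haar-random unitary: QR of a complex Ginibre matrix, with the phases of
# R's diagonal divided out (the standard correction)
Z = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Q, R = np.linalg.qr(Z)
U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))

eigs = np.linalg.eigvalsh(A + U @ A @ U.conj().T)
print(np.mean(np.abs(eigs) <= 1))    # close to 1/3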
8.4. Exercises.
Exercise 8.18. Let μ = δ0.
Compute the limit in the Stieltjes inversion formula and explain why you couldn’t
have worked directly with the integrand. Note: the point of this exercise is to see
how Stieltjes inversion works when you have atoms.
9. Expected characteristic polynomials
In this section, we will use familiar combinatorial methods to compute some expected
characteristic polynomials. This is the starting point for a new theory called finite free
probability, which we will spend the next few classes surveying.
Notation 9.1. For $A \in M_N(\mathbb{C})$, say $A = (a_{ij})_{1 \leq i,j \leq N}$, and $S, T \subseteq [N]$, write $A(S,T)$ for the submatrix $(a_{ij})_{i \in S,\, j \in T}$. Let $c_x(A) := \det(xI - A)$ be the characteristic polynomial of $A$.
Proposition 9.2. For $A \in M_N(\mathbb{C})$, we have
\[ c_x(A) = \sum_{k=0}^{N} x^{N-k}(-1)^k\, \mathsf{e}_k(A) \quad \text{where} \quad \mathsf{e}_k(A) := \sum_{\substack{S \subseteq [N] \\ |S|=k}} \det(A(S,S)). \]
Proof. See [9, Section 1.2]. □
The first example of an expected characteristic polynomial is straightforward:
Example 9.3 (GUE and Hermite polynomials). Let $A$ be an $N \times N$ GUE. Denoting by $\mathrm{Sym}(S)$ the set of permutations of $S \subseteq [N]$, we have
\begin{align*}
\mathbb{E}\,\mathsf{e}_m(A) &= \sum_{|S|=m} \mathbb{E}\det(A(S,S)) \\
&= \sum_{|S|=m} \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i \in S} a_{i\sigma(i)} \right) \\
&= \sum_{|S|=m} \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \sum_{\pi \in P_2(S)} \prod_{(r,s) \in \pi} \mathbb{E}(a_{r\sigma(r)} a_{s\sigma(s)}).
\end{align*}
If $m$ is odd, then the sum over $P_2(S)$ is empty and the whole expression is just $0$. This means the expected characteristic polynomial is even, in the sense that it is of the form
\[ x^N + (\cdots)x^{N-2} + (\cdots)x^{N-4} + \cdots. \]
In other words, the roots come in plus-minus pairs.
Now suppose $m = 2k$. Then
\begin{align*}
\mathbb{E}\,\mathsf{e}_{2k}(A) &= \sum_{|S|=2k} \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \sum_{\pi \in P_2(S)} \prod_{(r,s) \in \pi} \mathbb{E}(a_{r\sigma(r)} a_{s\sigma(s)}) \\
&= \sum_{|S|=2k} \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \sum_{\pi \in P_2(S)} \frac{1}{N^k}\, \delta_{r=\sigma(s),\ s=\sigma(r)\ \forall (r,s)\in\pi} \\
&= \frac{1}{N^k} \sum_{|S|=2k} \sum_{\pi \in P_2(S)} \mathrm{sgn}(\pi) \\
&= \frac{1}{N^k} \binom{N}{2k} (2k-1)!!\,(-1)^k \\
&= \frac{1}{N^k} \frac{N!}{(2k)!\,(N-2k)!} \cdot \frac{(2k)!}{2^k k!}\,(-1)^k \\
&= \frac{(N)_{2k}}{N^k} \cdot \frac{(-1)^k}{2^k k!},
\end{align*}
where in the third line the $\delta$ forces $\sigma$ to be the involution with cycles $\pi$, whose sign is $\mathrm{sgn}(\pi) = (-1)^k$. Assembling the coefficients, the expected characteristic polynomial $\mathbb{E}\,c_x(A)$ is (up to rescaling) the very famous Hermite polynomial of degree $N$.
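As a Monte Carlo aside (ours, not from the lecture): for $N = 4$ the formula gives $\mathbb{E}\,c_x(A) = x^4 - \frac{3}{2}x^2 + \frac{3}{16}$, which can be compared with an empirical average of characteristic polynomials.

import numpy as np

rng = np.random.default_rng(2)

def gue(N):
    G = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return (G + G.conj().T) / (2 * np.sqrt(N))

N, samples = 4, 50000
# np.poly returns the coefficients of det(xI - A) from x^N down to x^0
avg = sum(np.real(np.poly(gue(N))) for _ in range(samples)) / samples
print(avg)    # approximately [1, 0, -1.5, 0, 0.1875]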
The central computation that gives rise to finite free probability involves the
randomly rotated matrices we discussed previously. Next class, we will turn the
following theorem into a definition of a “finite free convolution” operation on
polynomials.
Theorem 9.4 ([11]). Let $A$ and $B$ be normal $N \times N$ matrices and let $U$ be a random $N \times N$ unitary matrix. Then
\[ \mathbb{E}\,c_x(A + UBU^*) = \sum_{k=0}^{N} x^{N-k}(-1)^k \sum_{i+j=k} \frac{(N)_k}{(N)_i(N)_j}\, \mathsf{e}_i(A)\,\mathsf{e}_j(B) \tag{1} \]
and
\[ \mathbb{E}\,c_x(AUBU^*) = \sum_{k=0}^{N} x^{N-k}(-1)^k \frac{1}{\binom{N}{k}}\, \mathsf{e}_k(A)\,\mathsf{e}_k(B). \tag{2} \]
Proof. First of all, we can assume without loss of generality that $A$ and $B$ are diagonal, with
\[ A = \mathrm{diag}(a_1,\ldots,a_N) \quad \text{and} \quad B = \mathrm{diag}(b_1,\ldots,b_N), \]
as outlined in the exercises. Let $X = A + UBU^*$; we need to compute $\mathbb{E}\det(X(S,S))$ for $|S| = k$. We have
\begin{align*}
\det(X(S,S)) &= \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \prod_{i \in S} x_{i\sigma(i)} \\
&= \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \prod_{i \in S} \left( a_i \delta_{i=\sigma(i)} + \sum_{p=1}^{N} u_{ip}\, b_p\, \overline{u_{\sigma(i)p}} \right) \\
&= \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma) \sum_{R \subseteq S} \prod_{i \in R} a_i \delta_{i=\sigma(i)} \prod_{i \in S\setminus R} \sum_{p=1}^{N} u_{ip}\, b_p\, \overline{u_{\sigma(i)p}} \\
&= \sum_{R \subseteq S} \left( \prod_{i \in R} a_i \right) \sum_{\sigma \in \mathrm{Sym}(S\setminus R)} \mathrm{sgn}(\sigma) \sum_{\mathbf{p}: S\setminus R \to [N]} \prod_{i \in S\setminus R} u_{i\mathbf{p}(i)}\, b_{\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \\
&= \sum_{R \subseteq S} \left( \prod_{i \in R} a_i \right) \sum_{\mathbf{p}: S\setminus R \to [N]} \left( \prod_{i \in S\setminus R} b_{\mathbf{p}(i)} \right) \sum_{\sigma \in \mathrm{Sym}(S\setminus R)} \mathrm{sgn}(\sigma) \prod_{i \in S\setminus R} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}},
\end{align*}
so we need to compute
\[ \sum_{\sigma \in \mathrm{Sym}(S\setminus R)} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i \in S\setminus R} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right) \]
for each $|S| = k$, $R \subseteq S$, and $\mathbf{p}: S\setminus R \to [N]$. As outlined in the exercises, this can be reduced to the following problem: for $0 \leq n \leq N$ and $\mathbf{p}: [n] \to [N]$, compute
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i=1}^{n} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right). \]
By Weingarten calculus, this equals
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \sum_{\substack{\alpha,\beta \in S_n \\ i=\sigma(\alpha(i))\ \forall i \\ \mathbf{p}(i)=\mathbf{p}(\beta(i))\ \forall i}} \mathrm{Wg}_{N,n}(\alpha^{-1}\beta) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \sum_{\substack{\alpha \in S_n \\ \mathbf{p} = \mathbf{p}\circ\alpha}} \mathrm{Wg}_{N,n}(\sigma\alpha); \]
the reduction from a double sum to a single sum was made by observing that $i = \sigma(\alpha(i))$ for all $1 \leq i \leq n$ forces $\alpha = \sigma^{-1}$ (and then relabeling $\beta$ as $\alpha$). If $\mathbf{p}$ is not injective, say $\mathbf{p}(i) = \mathbf{p}(j)$ with $i \neq j$, then one can use the transposition $(i,j)$ to set up a sign-reversing pairing in the outer sum, as outlined in the exercises. So in this case, the expression above is $0$.
Now, assume $\mathbf{p}$ is injective, so the constraint $\mathbf{p} = \mathbf{p}\circ\alpha$ makes $\alpha = e$. Using the character expansion of the Weingarten function, the above is equal to
\[ \frac{1}{n!^2} \sum_{\lambda \vdash n} \frac{\dim(\lambda)^2}{\mathsf{s}_\lambda(1^N)} \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \chi_\lambda(\sigma), \]
and the inner sum can be immediately recognized as a scalar multiple of the inner product $\langle \chi_{1^n}, \chi_\lambda \rangle$. As such, the orthogonality relations give
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \chi_\lambda(\sigma) = \begin{cases} n! & \text{if } \lambda = 1^n, \\ 0 & \text{otherwise,} \end{cases} \]
so the above is
\[ \frac{1}{n!} \cdot \frac{\dim(1^n)^2}{\mathsf{s}_{1^n}(1^N)} = \frac{1}{n!} \cdot \frac{n!}{N(N-1)\cdots(N-n+1)} = \frac{(N-n)!}{N!}. \]
In total, what we have shown is that
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i=1}^{n} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right) = \begin{cases} \frac{(N-n)!}{N!} & \text{if } \mathbf{p} \text{ is injective,} \\ 0 & \text{otherwise.} \end{cases} \]
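This identity can also be checked by simulation for small $N$ and $n$; here is a Monte Carlo sketch of ours (the choices $N = 4$, $n = 2$, the 0-based injective multi-index p, and the sample size are all arbitrary):

import numpy as np
from itertools import permutations
from math import factorial

rng = np.random.default_rng(3)
N, n = 4, 2
p = (0, 1)    # an injective multi-index [n] -> [N], written 0-based

def haar(N):
    # Haar unitary via QR of a complex Ginibre matrix, phases corrected
    Q, R = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

def sgn(sigma):
    # sign of a permutation from its cycle lengths
    s, seen = 1, set()
    for i in range(len(sigma)):
        if i in seen:
            continue
        length, j = 0, i
        while j not in seen:
            seen.add(j)
            j = sigma[j]
            length += 1
        s *= (-1) ** (length - 1)
    return s

total, samples = 0.0, 50000
for _ in range(samples):
    U = haar(N)
    for s in permutations(range(n)):
        term = np.prod([U[i, p[i]] * np.conj(U[s[i], p[i]]) for i in range(n)])
        total += sgn(s) * term.real
print(total / samples, factorial(N - n) / factorial(N))    # both about 1/12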
Putting this back into the big formula, we have
\begin{align*}
\mathbb{E}\,\mathsf{e}_k(A + UBU^*) &= \sum_{|S|=k} \mathbb{E}\det(X(S,S)) \\
&= \sum_{|S|=k} \sum_{R \subseteq S} \left( \prod_{i \in R} a_i \right) \sum_{\substack{\mathbf{p}: S\setminus R \to [N] \\ \text{injective}}} \left( \prod_{i \in S\setminus R} b_{\mathbf{p}(i)} \right) \frac{(N - |S\setminus R|)!}{N!} \\
&= \sum_{|S|=k} \sum_{R \subseteq S} \det(A(R,R))\, |S\setminus R|!\, \mathsf{e}_{|S\setminus R|}(B)\, \frac{(N - |S\setminus R|)!}{N!} \\
&= \sum_{|S|=k} \sum_{r=0}^{k} \frac{(k-r)!\,(N-(k-r))!}{N!}\, \mathsf{e}_{k-r}(B) \sum_{\substack{R \subseteq S \\ |R|=r}} \det(A(R,R)) \\
&= \sum_{r=0}^{k} \frac{(k-r)!\,(N-(k-r))!}{N!}\, \mathsf{e}_{k-r}(B) \sum_{\substack{R \subseteq [N] \\ |R|=r}} \det(A(R,R)) \sum_{\substack{R \subseteq S \subseteq [N] \\ |S|=k}} 1 \\
&= \sum_{r=0}^{k} \binom{N-r}{k-r} \frac{(k-r)!\,(N-(k-r))!}{N!}\, \mathsf{e}_r(A)\, \mathsf{e}_{k-r}(B) \\
&= \sum_{r=0}^{k} \frac{(N-r)!\,(N-(k-r))!}{N!\,(N-k)!}\, \mathsf{e}_r(A)\, \mathsf{e}_{k-r}(B),
\end{align*}
which is the claim of (1) in the theorem. The claim of (2) can be proved by similar techniques; the character computation can be reused, and the only difference is in the final "reassembly" step. □
9.1. Exercises.
Exercise 9.5. Let $A$ and $B$ be normal $N \times N$ matrices with eigenvalues $(a_1,\ldots,a_N)$ and $(b_1,\ldots,b_N)$ respectively. Let $U$ be a random $N \times N$ unitary matrix.
- Let
\[ D_A = \mathrm{diag}(a_1,\ldots,a_N) \quad \text{and} \quad D_B = \mathrm{diag}(b_1,\ldots,b_N). \]
Prove that
\[ \mathbb{E}\,c_x(A + UBU^*) = \mathbb{E}\,c_x(D_A + UD_BU^*). \]
- When you need to compute
\[ \sum_{\sigma \in \mathrm{Sym}(S)} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i \in S} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right) \]
for $S \subseteq [N]$ with $|S| = n$ and $\mathbf{p}: S \to [N]$ in the proof of Theorem 9.4, why can you safely assume $S = [n]$?
Exercise 9.6. Let $0 \leq n \leq N$, and let $\mathbf{p}: [n] \to [N]$ be a multi-index which is not injective. Prove that
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i=1}^{n} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right) = 0. \]
10. Finite free probability
Notation 10.1. Let $p$ be a monic polynomial of degree $N$. We write the coefficients as follows:
\[ p(x) = \sum_{k=0}^{N} x^{N-k}(-1)^k\, \mathsf{e}_k(p). \]
One can immediately show that
\[ \mathsf{e}_k(p) = \sum_{1 \leq i_1 < \cdots < i_k \leq N} \lambda_{i_1} \cdots \lambda_{i_k}, \]
where $\lambda_1, \ldots, \lambda_N$ are the roots of $p$. (Write $p$ as a product of linear factors and expand.) This is the so-called elementary symmetric polynomial, perhaps familiar to you from algebraic combinatorics or undergrad algebra.
Definition 10.2. For monic polynomials $p$ and $q$ with degree $N$, define
\[ (p \boxplus_N q)(x) := \sum_{k=0}^{N} x^{N-k}(-1)^k \sum_{i+j=k} \frac{(N)_k}{(N)_i(N)_j}\, \mathsf{e}_i(p)\, \mathsf{e}_j(q) \]
and
\[ (p \boxtimes_N q)(x) := \sum_{k=0}^{N} x^{N-k}(-1)^k \frac{1}{\binom{N}{k}}\, \mathsf{e}_k(p)\, \mathsf{e}_k(q). \]
It must be emphasized that this definition is merely a repackaging of
Theorem 9.4.
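For concreteness, here is a minimal implementation of Definition 10.2 (a sketch of ours; a monic polynomial is represented by the vector [e_0(p), ..., e_N(p)] with e_0 = 1):

from math import comb

def falling(N, k):
    # falling factorial (N)_k = N(N-1)...(N-k+1)
    out = 1
    for i in range(k):
        out *= N - i
    return out

def boxplus(ep, eq, N):
    # e_k(p boxplus_N q) = sum_{i+j=k} (N)_k / ((N)_i (N)_j) e_i(p) e_j(q)
    return [sum(falling(N, k) / (falling(N, i) * falling(N, k - i)) * ep[i] * eq[k - i]
                for i in range(k + 1))
            for k in range(N + 1)]

def boxtimes(ep, eq, N):
    # e_k(p boxtimes_N q) = e_k(p) e_k(q) / binom(N, k)
    return [ep[k] * eq[k] / comb(N, k) for k in range(N + 1)]

def coeffs(e, N):
    # coefficients of x^N, x^{N-1}, ..., x^0 in sum_k x^{N-k} (-1)^k e_k
    return [(-1) ** k * e[k] for k in range(N + 1)]

# Example: p = q = x^2 - 1 has e-vector [1, 0, -1], and p boxplus_2 q = x^2 - 2.
print(coeffs(boxplus([1, 0, -1], [1, 0, -1], 2), 2))    # [1, 0.0, -2.0]

For $\mu = \frac{1}{2}(\delta_1 + \delta_{-1})$ as in Example 8.17, the $N = 2$ convolution already has its roots $\pm\sqrt{2}$ inside the support $[-2,2]$ of $\mu \boxplus \mu$.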
Theorem 10.3 (Quadrature for finite free convolution). Let $p$ and $q$ be monic polynomials of degree $N$. Let $A$ and $B$ be diagonal $N \times N$ matrices with $c_x(A) = p(x)$ and $c_x(B) = q(x)$. Then
\[ \mathbb{E}\,c_x(A + UBU^*) = (p \boxplus_N q)(x) \]
and
\[ \mathbb{E}\,c_x(AUBU^*) = (p \boxtimes_N q)(x), \]
where $U$ is a random matrix from any of the following groups:
- (1) $U_N$ – $N \times N$ unitary matrices;
- (2) $O_N$ – $N \times N$ orthogonal matrices;
- (3) $S_N$ – $N \times N$ permutation matrices;
- (4) $\mathbf{Z}_r \wr S_N$ – $N \times N$ permutation matrices signed by $r$-th roots of unity, for $1 \leq r \leq \infty$, where the case $r = \infty$ is shorthand for the circle group.
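For the permutation-matrix case the average over $U$ is a finite sum, so the theorem can be checked exactly for small $N$. A self-contained sketch of ours (the spectra are arbitrary choices):

import numpy as np
from itertools import permutations
from math import factorial

N = 3
A = np.diag([0.0, 1.0, 3.0])     # p(x) = x(x-1)(x-3)
B = np.diag([-1.0, 1.0, 2.0])    # q(x) = (x+1)(x-1)(x-2)

avg = np.zeros(N + 1)
for perm in permutations(range(N)):
    P = np.eye(N)[list(perm)]    # the permutation matrix of perm
    avg += np.poly(A + P @ B @ P.T)
avg /= factorial(N)
# Definition 10.2 with e(p) = [1, 4, 3, 0] and e(q) = [1, 2, -1, -2] gives
# p boxplus_3 q = x^3 - 6x^2 + (22/3)x + 4/3:
print(avg)    # [1, -6, 7.3333..., 1.3333...]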
Theorem 10.4 ([17, 14]). Let p
and q be monic
polynomials of degree N.
Suppose that p
and q
are real-rooted. Then
- p ⊞Nq
is real-rooted;
- if at least one of p
or q
is non-negative-rooted, then p ⊠Nq
is real-rooted;
- if both p
and q
are non-negative-rooted, then p ⊠Nq
is non-negative-rooted.
Notation 10.5. For a polynomial $p$ of degree $N$, write
\[ \rho(p) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\lambda_i}, \]
where $\lambda_1, \ldots, \lambda_N$ are the roots of $p$ with multiplicity.
Theorem 10.6 ([10, 2, 1]). For N ≥ 1,
let pN and
qN be monic real-rooted
polynomials with degree N, and
suppose that there are some μ,ν ∈Mc(ℝ)
with ρ(pN) → μ
and ρ(qN) → ν.
Then
- ρ(pN ⊞NqN) → μ ⊞ ν;
- if “real-rooted” is replaced by “non-negative-rooted” and Mc(ℝ)
is replaced by Mc(ℝ≥0),
then ρ(pN ⊠NqN) → μ ⊠ ν.
10.1. Finite free cumulants.
Recall that for a self-adjoint variable
a, the free cumulants
are a sequence (κn)n≥1,
related to the moment sequence by a sum over noncrossing partitions, in which
κ1 is the
mean and κ2
is the variance.
Definition 10.8 ([2]). The finite free cumulants are defined implicitly by
\[ \mathsf{e}_n(p) = \frac{\binom{N}{n}}{N^n} \sum_{\pi \in P(n)} N^{|\pi|}\, \mu(\pi)\, \kappa_\pi^{(N)}(p), \]
where $\mu(\pi) := (-1)^{n-|\pi|} \prod_{i \geq 2} (i!)^{m_{i+1}(\pi)}$ is the Möbius function of the partition lattice and $m_j(\pi)$ is the number of blocks in $\pi$ with size $j$.
Example 10.10. With $n = 1$, the relation above says that
\[ \mathsf{e}_1(p) = N\kappa_1^{(N)}(p) \implies \kappa_1^{(N)}(p) = \frac{1}{N} \sum_{i=1}^{N} \lambda_i, \]
where $\lambda_1, \ldots, \lambda_N$ are the roots of $p$. This is just the mean of $\rho(p)$. With $n = 2$, it gives
\[ \mathsf{e}_2(p) = \frac{\binom{N}{2}}{N^2}\left( N^2 \kappa_1^{(N)}(p)^2 - N\kappa_2^{(N)}(p) \right), \]
so $\kappa_2^{(N)}(p)$ is not the variance of $\rho(p)$. We will see later (Example 11.6) that in fact $\kappa_2^{(N)}(p) = \frac{N}{N-1}\left( m_2(p) - m_1(p)^2 \right)$.
Example 10.11. Recall the expected characteristic polynomial of the GUE:
\[ \mathbb{E}\,c_x(A) = \sum_{k=0}^{\lfloor N/2 \rfloor} x^{N-2k}(-1)^k \frac{(N)_{2k}}{2^k N^k k!}. \]
Claim: the finite free cumulants are
\[ \kappa_n^{(N)} = \begin{cases} 1 & \text{if } n = 2, \\ 0 & \text{otherwise.} \end{cases} \]
To check this, put these $\kappa$'s into the definition:
\[ \frac{\binom{N}{n}}{N^n} \sum_{\pi \in P(n)} N^{|\pi|}\mu(\pi)\, \kappa_\pi^{(N)} = \frac{\binom{N}{n}}{N^n} \sum_{\pi \in P_2(n)} N^{|\pi|}\mu(\pi), \]
which is $0$ if $n$ is odd. If $n = 2k$, then the above is
\begin{align*}
\frac{\binom{N}{2k}}{N^{2k}} \sum_{\pi \in P_2(2k)} N^{k}(-1)^{2k-k}
&= \frac{\binom{N}{2k}}{N^{2k}}\,(2k-1)!!\, N^{k}(-1)^{k} \\
&= \frac{1}{N^k} \frac{N!}{(2k)!\,(N-2k)!} \cdot \frac{(2k)!}{2^k k!}\,(-1)^k \\
&= (-1)^k \frac{(N)_{2k}}{2^k N^k k!},
\end{align*}
which matches.
Theorem 10.12 ([2]). $\kappa_n^{(N)}(p \boxplus_N q) = \kappa_n^{(N)}(p) + \kappa_n^{(N)}(q)$.
Proof. We have
\begin{align*}
\frac{1}{(N)_k}\, \mathsf{e}_k(p \boxplus_N q)
&= \sum_{i+j=k} \frac{1}{(N)_i}\, \mathsf{e}_i(p) \cdot \frac{1}{(N)_j}\, \mathsf{e}_j(q) && \text{(Definition 10.2)} \\
&= \sum_{i+j=k} \left( \frac{1}{N^i i!} \sum_{\pi_1 \in P(i)} N^{|\pi_1|}\mu(\pi_1)\, \kappa_{\pi_1}^{(N)}(p) \right) \left( \frac{1}{N^j j!} \sum_{\pi_2 \in P(j)} N^{|\pi_2|}\mu(\pi_2)\, \kappa_{\pi_2}^{(N)}(q) \right) && \text{(Definition 10.8)} \\
&= \frac{1}{N^k k!} \sum_{i+j=k} \binom{k}{i} \left( \sum_{\pi_1 \in P(i)} N^{|\pi_1|}\mu(\pi_1)\, \kappa_{\pi_1}^{(N)}(p) \right) \left( \sum_{\pi_2 \in P(j)} N^{|\pi_2|}\mu(\pi_2)\, \kappa_{\pi_2}^{(N)}(q) \right) \\
&= \frac{1}{N^k k!} \sum_{S \subseteq [k]} \left( \sum_{\pi_1 \in P(S)} N^{|\pi_1|}\mu(\pi_1)\, \kappa_{\pi_1}^{(N)}(p) \right) \left( \sum_{\pi_2 \in P([k]\setminus S)} N^{|\pi_2|}\mu(\pi_2)\, \kappa_{\pi_2}^{(N)}(q) \right) \\
&= \frac{1}{N^k k!} \sum_{\pi \in P(k)} \sum_{\pi = \pi_1 \sqcup \pi_2} N^{|\pi_1|+|\pi_2|}\mu(\pi_1)\mu(\pi_2)\, \kappa_{\pi_1}^{(N)}(p)\, \kappa_{\pi_2}^{(N)}(q) \\
&= \frac{1}{N^k k!} \sum_{\pi \in P(k)} N^{|\pi|}\mu(\pi) \sum_{\pi = \pi_1 \sqcup \pi_2} \kappa_{\pi_1}^{(N)}(p)\, \kappa_{\pi_2}^{(N)}(q) \\
&= \frac{1}{N^k k!} \sum_{\pi \in P(k)} N^{|\pi|}\mu(\pi) \left( \kappa^{(N)}(p) + \kappa^{(N)}(q) \right)_\pi,
\end{align*}
where the inner sum runs over the ways of splitting the blocks of $\pi$ into two sub-partitions $\pi_1, \pi_2$, and $(\kappa^{(N)}(p) + \kappa^{(N)}(q))_\pi := \prod_{V \in \pi} (\kappa_{|V|}^{(N)}(p) + \kappa_{|V|}^{(N)}(q))$. Comparing with Definition 10.8 applied to $p \boxplus_N q$, we get $\kappa_n^{(N)}(p \boxplus_N q) = \kappa_n^{(N)}(p) + \kappa_n^{(N)}(q)$. □
10.2. Exercises.
Exercise 10.13. Prove (3) in Theorem 10.3. Hint: the only difference is in computing
\[ \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, \mathbb{E}\left( \prod_{i=1}^{n} u_{i\mathbf{p}(i)}\, \overline{u_{\sigma(i)\mathbf{p}(i)}} \right) \]
where $U = (u_{ij})_{i,j}$ is a random permutation matrix. Show that the expression above is the same as it is for $U \in U_N$.
11. Asymptotics of finite free convolution
Reminder: the free moment-cumulant formula is
\[ m_n = \sum_{\pi \in NC(n)} \kappa_\pi. \]
For example,
- $m_1 = \kappa_1$
- $m_2 = \kappa_2 + \kappa_1^2$
- $m_3 = \kappa_3 + 3\kappa_2\kappa_1 + \kappa_1^3$
Last class, we introduced a sequence of finite free cumulants associated to a degree
N polynomial
p, denoted
by κn(N)(p). In
this class, we will develop a moment-cumulant relation and use it to prove that the finite
free cumulants characterize convergence in root distribution.
11.1. Combinatorics of maps.
Recall that the noncrossing partitions can be characterized as follows:
\[ NC(n) \simeq \{ \alpha \in S_n : \#(\alpha) + \#(\alpha^{-1}\gamma_n) = n + 1 \}, \]
where $\gamma_n := (1,\ldots,n)$ and $\#(\cdot)$ is the number of disjoint cycles. Furthermore, we have
\[ \#(\alpha) + \#(\alpha^{-1}\gamma_n) \leq n + 1 \]
for all $\alpha \in S_n$, and the LHS decreases in twos. We need to generalize this phenomenon:
Theorem 11.1. Let $\alpha, \gamma \in S_n$ and suppose that $\langle \alpha, \gamma \rangle$ is transitive. Then there is some $g \geq 0$ such that
\[ \#(\alpha) + \#(\alpha^{-1}\gamma) = (n + 2 - \#(\gamma)) - 2g. \]
Idea of proof. For each $\alpha$, there is a surface
- whose boundary consists of disjoint discs, each of which has a cycle of $\gamma$ drawn on it in order;
- which has the permutation $\alpha$ drawn on it, in such a way that the cycles do not cross and each cycle is homeomorphic to a circle oriented in such a way that the normal vector points outward.
Shrinking the discs to points produces a closed surface with a graph drawn on it. The Euler characteristic formula says $V - E + F = 2 - 2g$; the number of vertices is $\#(\gamma)$, the number of edges is $n$, and the number of faces is $\#(\alpha) + \#(\alpha^{-1}\gamma)$. This makes
\[ \#(\gamma) - n + \#(\alpha) + \#(\alpha^{-1}\gamma) = 2(1 - g). \]
□
Example 11.2. We have already seen an instance of this theorem, when $\gamma = (1,\ldots,n)$ and $\alpha$ is an involution with no fixed points. In this case, with $n = 2k$, the formula reads as
\[ k + \#(\alpha^{-1}\gamma) + 1 = 2k + 2 - 2g \iff \#(\alpha^{-1}\gamma) = k + 1 - 2g, \]
and we recover the exponents from the GUE genus expansion.
Example 11.3. More generally, we saw earlier in the course that there is a bijection
\[ NC(n) \simeq \{ \alpha \in S_n : \#(\alpha) + \#(\alpha^{-1}\gamma_n) = n + 1 \}, \]
where $\gamma_n := (1,\ldots,n)$. This is the case of Theorem 11.1 where $\gamma = (1,\ldots,n)$ and $g = 0$.
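This characterization is easy to explore by computer. The sketch below (ours, with permutations written 0-based as tuples of images) counts the $\alpha$ with $\#(\alpha) + \#(\alpha^{-1}\gamma_n) = n + 1$ and recovers $|NC(n)| = \mathrm{Cat}(n)$.

from itertools import permutations

def num_cycles(perm):
    # number of disjoint cycles of a permutation given as a tuple of images
    seen, count = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def compose(a, b):
    # (a . b)(i) = a(b(i))
    return tuple(a[b[i]] for i in range(len(a)))

def inverse(a):
    inv = [0] * len(a)
    for i, img in enumerate(a):
        inv[img] = i
    return tuple(inv)

for n in range(1, 7):
    gamma = tuple(list(range(1, n)) + [0])    # the long cycle (1, ..., n)
    count = sum(1 for a in permutations(range(n))
                if num_cycles(a) + num_cycles(compose(inverse(a), gamma)) == n + 1)
    print(n, count)    # 1, 2, 5, 14, 42, 132: the Catalan numbers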
Example 11.4 ([12]). Let $\gamma = (1,2,3)$, $\alpha = (1,2,3)$, and $\beta = (1,3,2)$. Then
\[ \alpha^{-1}\gamma = (1,3,2)(1,2,3) = (1)(2)(3) \quad \text{and} \quad \beta^{-1}\gamma = (1,2,3)(1,2,3) = (1,3,2), \]
and we can check the genus:
\[ \#(\alpha) + \#(\alpha^{-1}\gamma) = 1 + 3 = 4 \quad \text{and} \quad n + 2 - \#(\gamma) = 3 + 2 - 1 = 4, \]
so $g = 0$, while
\[ \#(\beta) + \#(\beta^{-1}\gamma) = 1 + 1 = 2 \quad \text{and} \quad n + 2 - \#(\gamma) = 3 + 2 - 1 = 4, \]
so $g = 1$. You should draw some pictures to verify that the results of these formal computations are sensible. The critical difference between $\alpha$ and $\beta$, when you do this, will be the orientation requirement.
11.2. Moment-cumulant formulas.
Proposition 11.5 ([2, 1]). The finite free cumulants satisfy (and are defined implicitly by) the following moment-cumulant relation:
\begin{align*}
m_n &= \frac{(-1)^{n-1}}{N^{n+1}\,(n-1)!} \sum_{\substack{\alpha,\beta \in S_n \\ \langle\alpha,\beta\rangle \leq S_n \text{ transitive}}} (-N)^{\#(\alpha)+\#(\beta)}\, \kappa_\alpha^{(N)} \\
&= \frac{(-1)^{n-1}}{N^{n+1}\,(n-1)!} \sum_{\gamma \in S_n} \sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma\rangle \leq S_n \text{ transitive}}} (-N)^{\#(\alpha)+\#(\alpha^{-1}\gamma)}\, \kappa_\alpha^{(N)},
\end{align*}
where $\kappa_\alpha^{(N)} := \kappa_\pi^{(N)}$ for $\pi$ the partition of $[n]$ into the cycles of $\alpha$.
Example 11.6. With $n = 1$, the above says
\[ m_1 = \frac{1}{N^2}(-N)^2\kappa_1^{(N)} = \kappa_1^{(N)}, \]
so there is no difference yet. With $n = 2$, it says
\begin{align*}
m_2 &= -\frac{1}{N^3}\left( (-N)^3\kappa_1^{(N)}\kappa_1^{(N)} + (-N)^3\kappa_2^{(N)} + (-N)^2\kappa_2^{(N)} \right) \\
&= \kappa_2^{(N)} + \kappa_1^{(N)}\kappa_1^{(N)} - \frac{1}{N}\kappa_2^{(N)},
\end{align*}
so
\[ m_2 - m_1^2 = \left( 1 - \frac{1}{N} \right)\kappa_2^{(N)} \implies \kappa_2^{(N)} = \frac{N}{N-1}\left( m_2 - m_1^2 \right). \]
When $N \to \infty$, the rational function in $N$ goes away and you recover the free case.
With n = 3,
the pairs α,β
which generate a transitive subgroup are the following:
- e,(1,2,3)
- e,(1,3,2)
- (1,2),(2,3)
- (1,2),(1,3)
- (1,3),(1,2)
- (1,3),(2,3)
- (2,3),(1,3)
- (2,3),(1,2)
- (1,2,3),e
- (1,3,2),e
- (1,2),(1,2,3)
- (1,2),(1,3,2)
- (1,3),(1,2,3)
- (1,3),(1,3,2)
- (2,3),(1,2,3)
- (2,3),(1,3,2)
- (1,2,3),(1,2)
- (1,2,3),(1,3)
- (1,2,3),(2,3)
- (1,2,3),(1,2,3)
- (1,2,3),(1,3,2)
- (1,3,2),(1,2)
- (1,3,2),(1,3)
- (1,3,2),(2,3)
- (1,3,2),(1,2,3)
- (1,3,2),(1,3,2)
So the moment-cumulant formula says
\begin{align*}
m_3 = \frac{1}{2N^4}\Big( 2(-N)^{3+1}\kappa_1\kappa_1\kappa_1 &+ 6(-N)^{2+2}\kappa_2\kappa_1 + 2(-N)^{3+1}\kappa_3 + 6(-N)^{2+1}\kappa_2\kappa_1 \\
&+ 3(-N)^{1+2}\kappa_3 + 2(-N)^{1+1}\kappa_3 + 3(-N)^{1+2}\kappa_3 + 2(-N)^{1+1}\kappa_3 \Big).
\end{align*}
Notice, again, that the leading order is
\[ m_3 = \kappa_1^3 + 3\kappa_1\kappa_2 + \kappa_3 + O(N^{-1}), \]
which matches the moment-cumulant formula for free cumulants.
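The list of 26 pairs above, and hence the coefficients in this formula, can be generated by brute force. A sketch of ours (0-based):

from itertools import permutations
from collections import Counter

n = 3

def num_cycles(perm):
    seen, count = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def transitive(a, b):
    # orbit of 0 under repeated application of a and b
    orbit, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for img in (a[i], b[i]):
            if img not in orbit:
                orbit.add(img)
                frontier.append(img)
    return len(orbit) == n

pairs = [(a, b) for a in permutations(range(n)) for b in permutations(range(n))
         if transitive(a, b)]
print(len(pairs))    # 26
print(Counter((num_cycles(a), num_cycles(b)) for a, b in pairs))
# multiplicities of (#(alpha), #(beta)):
# (3,1): 2, (1,3): 2, (2,2): 6, (2,1): 6, (1,2): 6, (1,1): 4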
Theorem 11.7 ([2]). Let μ
be a compactly supported measure, and for
N ≥ 1, let
pN be a monic real-rooted
polynomial with degree N.
Then the following are equivalent:
- ρ(pN) → μ
weakly;
- κn(N)(pN) → κn(μ)
for all n ≥ 1.
Proof. First, assume that $\kappa_n^{(N)}(p_N) \to \kappa_n(\mu)$ for all $n \geq 1$. Then by the moment-cumulant formula, we have
\begin{align*}
\lim_{N\to\infty} m_n(p_N)
&= \lim_{N\to\infty} \frac{(-1)^{n-1}}{(n-1)!} \sum_{\substack{\alpha,\beta \in S_n \\ \langle\alpha,\beta\rangle \text{ transitive}}} (-1)^{\#(\alpha)+\#(\beta)} \left( \frac{1}{N} \right)^{n+1-\#(\alpha)-\#(\beta)} \kappa_\alpha^{(N)}(p_N) \\
&= \lim_{N\to\infty} \frac{(-1)^{n-1}}{(n-1)!} \sum_{\gamma \in S_n} \sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive}}} (-1)^{\#(\alpha)+\#(\alpha^{-1}\gamma)} \left( \frac{1}{N} \right)^{n+1-\#(\alpha)-\#(\alpha^{-1}\gamma)} \kappa_\alpha^{(N)}(p_N).
\end{align*}
From Theorem 11.1, since $g \geq 0$ and $\#(\gamma) \geq 1$, we have
\[ \#(\alpha) + \#(\alpha^{-1}\gamma) \leq n + 2 - \#(\gamma) \leq n + 1. \]
This means the expression above does not blow up, and the summands that survive in the limit are exactly those for which this inequality is an equality. So we can continue the above as
\begin{align*}
\lim_{N\to\infty} m_n(p_N)
&= \frac{(-1)^{n-1}}{(n-1)!} \sum_{\gamma \in S_n} \sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive} \\ \#(\alpha)+\#(\alpha^{-1}\gamma)=n+1}} (-1)^{n+1}\, \kappa_\alpha \\
&= \frac{1}{(n-1)!} \sum_{\gamma \in S_n} \sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive} \\ \#(\alpha)+\#(\alpha^{-1}\gamma)=n+1}} \kappa_\alpha,
\end{align*}
where $\kappa_\alpha := \kappa_\alpha(\mu)$ denotes the limit of $\kappa_\alpha^{(N)}(p_N)$.
Now, if $\gamma \in S_n$ has $\#(\gamma) > 1$, then we have
\[ \#(\alpha) + \#(\alpha^{-1}\gamma) = n + 2 - \#(\gamma) - 2g < n + 2 - 1 - 2g = n + 1 - 2g \leq n + 1, \]
so the inner sum is empty. One can check that the inner sum also only depends on $\gamma$ through the cycle type: if $\gamma' = \sigma\gamma\sigma^{-1}$, then
\begin{align*}
\sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma'\rangle \text{ transitive} \\ \#(\alpha)+\#(\alpha^{-1}\gamma')=n+1}} \kappa_\alpha
&= \sum_{\substack{\alpha \in S_n \\ \langle\sigma\alpha\sigma^{-1},\,\sigma\gamma\sigma^{-1}\rangle \text{ transitive} \\ \#(\sigma\alpha\sigma^{-1})+\#((\sigma\alpha\sigma^{-1})^{-1}(\sigma\gamma\sigma^{-1}))=n+1}} \kappa_{\sigma\alpha\sigma^{-1}} \\
&= \sum_{\substack{\alpha \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive} \\ \#(\alpha)+\#(\alpha^{-1}\gamma)=n+1}} \kappa_\alpha.
\end{align*}
So only $\gamma$ with $\#(\gamma) = 1$ contribute, and there are $(n-1)!$ of them, each contributing the same inner sum as $\gamma_n$ (transitivity is automatic since $\gamma_n$ is an $n$-cycle); the prefactor $\frac{1}{(n-1)!}$ cancels, and the limiting expression reads as
\[ \sum_{\substack{\alpha \in S_n \\ \#(\alpha)+\#(\alpha^{-1}\gamma_n)=n+1}} \kappa_\alpha = \sum_{\pi \in NC(n)} \kappa_\pi = m_n(\mu), \]
which shows that $\lim_{N\to\infty} m_n(p_N) = m_n(\mu)$.
Conversely, assume that ρ(pN) → μ
weakly, and assume for the sake of simplicity that the measures
ρ(pN) have
uniformly bounded support (this isn’t a serious restriction, see [1, Appendix B]), so
mn(pN) → mn(μ) for all
n ≥ 1. We will prove
by induction on n
that κn(N)(pN) → κn(μ) for
all n ≥ 1.
For the base case $n = 1$, we have
\[ \lim_{N\to\infty} \kappa_1^{(N)}(p_N) = \lim_{N\to\infty} m_1(p_N) = m_1(\mu) = \kappa_1(\mu). \]
Now suppose the claim is true up to $n - 1$. Then the moment-cumulant formula can be rearranged: the $\alpha$ which produce $\kappa_n^{(N)}$ are exactly the ones with only one cycle, and for these, any $\beta$ will produce a transitive subgroup. So we have
\begin{align*}
&\frac{(-1)^{n-1}}{N^{n+1}\,(n-1)!} \sum_{\substack{\alpha,\beta \in S_n \\ \#(\alpha)=1}} (-N)^{1+\#(\beta)}\, \kappa_n^{(N)}(p_N) \\
&\qquad = m_n(p_N) - \frac{(-1)^{n-1}}{N^{n+1}\,(n-1)!} \sum_{\substack{\alpha,\beta \in S_n \\ \langle\alpha,\beta\rangle \text{ transitive} \\ \#(\alpha)>1}} (-N)^{\#(\alpha)+\#(\beta)}\, \kappa_\alpha^{(N)}(p_N).
\end{align*}
On the LHS, the sum over $\alpha$ contributes a factor of $(n-1)!$ (the number of full cycles), which cancels the $\frac{1}{(n-1)!}$ and leaves
\[ \kappa_n^{(N)}(p_N) \sum_{\beta \in S_n} \left( \frac{-1}{N} \right)^{n-\#(\beta)} = \kappa_n^{(N)}(p_N)\, \frac{(N)_n}{N^n}, \]
using the identity $\sum_{\beta \in S_n} x^{\#(\beta)} = x(x+1)\cdots(x+n-1)$ at $x = -N$.
By (1) the induction assumption and (2) the moment convergence assumption, on the RHS we have
\begin{align*}
&\lim_{N\to\infty} \left( m_n(p_N) - \frac{(-1)^{n-1}}{(n-1)!} \sum_{\substack{\alpha,\gamma \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive} \\ \#(\alpha)>1}} (-1)^{\#(\alpha)+\#(\alpha^{-1}\gamma)} \left( \frac{1}{N} \right)^{n+1-\#(\alpha)-\#(\alpha^{-1}\gamma)} \kappa_\alpha^{(N)}(p_N) \right) \\
&\qquad = \lim_{N\to\infty} m_n(p_N) - \frac{1}{(n-1)!} \sum_{\substack{\alpha,\gamma \in S_n \\ \langle\alpha,\gamma\rangle \text{ transitive} \\ \#(\alpha)>1 \\ \#(\alpha)+\#(\alpha^{-1}\gamma)=n+1}} \lim_{N\to\infty} \kappa_\alpha^{(N)}(p_N) \\
&\qquad = m_n(\mu) - \sum_{\substack{\pi \in NC(n) \\ |\pi|>1}} \kappa_\pi(\mu);
\end{align*}
the induction hypothesis applies here because every $\alpha$ with $\#(\alpha) > 1$ has all cycles of length at most $n-1$, so $\kappa_\alpha^{(N)}$ involves only cumulants of order at most $n-1$.
So the RHS has a limit, and the non-cumulant part of the LHS converges to $1$. This situation corresponds to the following elementary fact: if $a_k$ and $b_k$ are sequences with $a_k \to 1$, $a_k \neq 0$, and $a_k b_k \to L$ for some $L \in \mathbb{R}$, then
\[ b_k = \frac{a_k b_k}{a_k} \to \frac{L}{1} = L \]
by the quotient rule for limits of sequences. So we can conclude that
\[ \lim_{N\to\infty} \kappa_n^{(N)}(p_N) = m_n(\mu) - \sum_{\substack{\pi \in NC(n) \\ |\pi|>1}} \kappa_\pi(\mu) = \kappa_n(\mu), \]
which proves the theorem. □
Corollary 11.8. For N ≥ 1,
let pN
and qN
be monic real-rooted polynomials with degree N.
Let μ
and ν
be compactly supported measures and suppose that ρ(pN) → μ
and ρ(qN) → ν.
Then ρ(pN ⊞NqN) → μ ⊞ ν.
Proof. Again, assume for the sake of simplicity that the supports of ρ(pN)
and ρ(qN)
are uniformly bounded. Then mn(pN) → mn(μ)
and mn(qN) → mn(ν).
By Theorem 11.7, we have κn(N)(pN) → κn(μ)
and κn(N)(qN) → κn(ν)
for all n ≥ 1.
Now, we can use Theorem 10.4 to ensure that
pN ⊞NqN is
real-rooted, and apply Theorem 10.12 and Corollary 8.8 to obtain
\[ \kappa_n^{(N)}(p_N \boxplus_N q_N) = \kappa_n^{(N)}(p_N) + \kappa_n^{(N)}(q_N) \to \kappa_n(\mu) + \kappa_n(\nu) = \kappa_n(\mu \boxplus \nu) \]
for all $n \geq 1$.
Applying Theorem 11.7 again, we can conclude that ρ(pN ⊞NqN) → μ ⊞ ν.
□
References
1. Octavio Arizmendi, Jorge Garza-Vargas, and Daniel Perales, Finite free cumulants: multiplicative convolutions, genus expansion and infinitesimal distributions, Trans. Amer. Math. Soc. 376 (2023), no. 6, 4383–4420.
2. Octavio Arizmendi and Daniel Perales, Cumulants for finite free convolution, J. Combin. Theory Ser. A 155 (2018), 244–266.
3. Philippe Biane, Some properties of crossings and partitions, Discrete Math. 175 (1997), no. 1-3, 41–53.
4. Patrick Billingsley, Probability and measure, third ed., Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995.
5. Benoît Collins, Moments and cumulants of polynomial random variables on unitary groups, the Itzykson-Zuber integral, and free probability, Int. Math. Res. Not. (2003), no. 17, 953–982.
6. Benoît Collins, Moment methods on compact groups: Weingarten calculus and its applications, ICM—International Congress of Mathematicians. Vol. 4. Sections 5–8, EMS Press, Berlin, 2023, pp. 3142–3164.
7. Benoît Collins and Piotr Śniady, Integration with respect to the Haar measure on unitary, orthogonal and symplectic group, Comm. Math. Phys. 264 (2006), no. 3, 773–795.
8. Pavel Etingof, Oleg Golberg, Sebastian Hensel, Tiankai Liu, Alex Schwendner, Dmitry Vaintrob, and Elena Yudovina, Introduction to representation theory, Student Mathematical Library, vol. 59, American Mathematical Society, Providence, RI, 2011. With historical interludes by Slava Gerovitch.
9. Roger A. Horn and Charles R. Johnson, Matrix analysis, second ed., Cambridge University Press, Cambridge, 2013.
10. Adam W. Marcus, Polynomial convolutions and (finite) free probability, 2021, arXiv:2108.07054 [math.CO].
11. Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava, Finite free convolutions of polynomials, Probab. Theory Related Fields 182 (2022), no. 3-4, 807–848.
12. James A. Mingo and Roland Speicher, Free probability and random matrices, Fields Institute Monographs, vol. 35, Springer, New York; Fields Institute for Research in Mathematical Sciences, Toronto, ON, 2017.
13. Alexandru Nica and Roland Speicher, Lectures on the combinatorics of free probability, London Mathematical Society Lecture Note Series, vol. 335, Cambridge University Press, Cambridge, 2006.
14. G. Szegö, Bemerkungen zu einem Satz von J. H. Grace über die Wurzeln algebraischer Gleichungen, Math. Z. 13 (1922), no. 1, 28–55.
15. Dan Voiculescu, Symmetries of some reduced free product C*-algebras, Operator algebras and their connections with topology and ergodic theory (Buşteni, 1983), Lecture Notes in Math., vol. 1132, Springer, Berlin, 1985, pp. 556–588.
16. Dan Voiculescu, Limit laws for random matrices and free products, Invent. Math. 104 (1991), no. 1, 201–220.
17. J. L. Walsh, On the location of the roots of certain types of polynomials, Trans. Amer. Math. Soc. 24 (1922), no. 3, 163–180.