Description: What can be inferred about a graph from its second eigenvalue? Professor Zhao explores the role of spectra in pseudorandom graphs. Some topics discussed are Paley graphs and Gauss sums, quasirandom Cayley graphs, and the Alon–Boppana bound.
Instructor: Yufei Zhao
Lecture 12: Pseudorandom Graphs
YUFEI ZHAO: All right. Last time we started talking about pseudorandom graphs, and we considered this theorem of Chung, Graham, and Wilson, which, for dense graphs, gave several equivalent notions of quasi-randomness that, at least at face value, do not appear to be all that equivalent. But they actually are-- you can deduce one from the other. There was one condition at the very end which had to do with eigenvalues. Basically, it said that if your second largest eigenvalue in absolute value is small, then the graph is pseudorandom. So that's something that I want to explore further today, to better understand the relationship between the eigenvalues of a graph and its pseudorandomness properties.
For much of-- pretty much all of today, we're going to look at a special class of graphs known as n, d, lambda graphs. This just means we have n vertices, and we're only going to consider, mostly out of convenience, d regular graphs. So this will make our life somewhat simpler. And the lambda stands for the following: if you look at the adjacency matrix and write down its eigenvalues, then, well, what are these eigenvalues?
The top one, because the graph is d regular, is equal to d. And lambda corresponds to the statement that all the other eigenvalues are at most lambda in absolute value. So the top one is equal to d, and all the other ones in absolute value-- basically, the maximum of the second eigenvalue and minus the bottom one-- are bounded above by lambda.
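In symbols, writing the eigenvalues of the adjacency matrix in non-increasing order, the condition just described is:

```latex
% Eigenvalues of the adjacency matrix of a d regular graph on n vertices,
% sorted as \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n:
\lambda_1 = d, \qquad \max\bigl(|\lambda_2|,\, |\lambda_n|\bigr) \le \lambda .
```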
And at the end of last time, we showed the expander mixing lemma, which, in this language, says that if G is an n, d, lambda graph, then one has the following discrepancy-type pseudorandomness property: if you look at two vertex sets and compare how many actual edges run between them to what you would expect in a random graph of similar density, then these two numbers are very close, and the error is controlled by lambda. In particular, a smaller lambda gives you a more pseudorandom graph.
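Written out, the expander mixing lemma from last time is the following statement:

```latex
% Expander mixing lemma: if G is an (n, d, lambda)-graph, then for all X, Y \subseteq V(G),
\left| e(X, Y) - \frac{d}{n}\,|X|\,|Y| \right| \;\le\; \lambda \sqrt{|X|\,|Y|},
% where e(X, Y) counts edges with one endpoint in X and one in Y
% (pairs inside X \cap Y counted from both sides).
```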
So in the second part of today's class, I want to explore the question of how small this lambda can be-- what's the optimal amount of pseudorandomness? But, first, I want to show you some examples. So far, we've been talking about pseudorandom graphs, and the only example I've really given is that a random graph is pseudorandom. Which is true: a random graph is pseudorandom with high probability. But part of the spirit of pseudorandomness is to come up with non-random examples-- deterministic constructions that give you pseudorandom properties.
So I want to begin today with an example. A lot of examples, especially for pseudorandomness, come from this class of graphs called Cayley graphs, which are built from a group. So we're going to reserve the letter G for graphs, so I'm going to use gamma for a group. And I have a subset S of gamma, and S is symmetric, in that if you invert the elements of S, they remain in S.
Then we define the Cayley graph given by this group and the set S to be the following graph, where V, the set of vertices, is just the set of group elements, and the edges are obtained by taking a group element and multiplying it by an element of S to get its neighbor. So this is a Cayley graph. You start with any group and any symmetric subset of the group, and you get a Cayley graph. This is a very important construction of graphs, and Cayley graphs have lots of nice properties.
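In symbols, the definition is:

```latex
% Cayley graph of a group \Gamma with a symmetric subset S = S^{-1} \subseteq \Gamma:
\operatorname{Cay}(\Gamma, S): \qquad V = \Gamma, \qquad E = \bigl\{ \{g, gs\} : g \in \Gamma,\ s \in S \bigr\}.
% The symmetry condition S = S^{-1} makes the adjacency relation undirected.
```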
And, in particular, an example of a Cayley graph is the Paley graph. Despite the similar names, Cayley and Paley are not related. A Paley graph is a special case of a Cayley graph obtained by considering the cyclic group mod p, where p is a prime congruent to 1 mod 4, and taking S to be the set of nonzero quadratic residues mod p-- that is, the nonzero elements of Z mod p that are squares.
So we will show in a second that this Paley graph has nice pseudorandom properties, by showing that it is an n, d, lambda graph with lambda fairly small compared to the degree. Just a historical note about Raymond Paley, after whom the Paley graph is named: he was from the earlier part of the 20th century, from 1907 to 1933. So he died very young, at the age of 26, and he actually died in an avalanche while skiing by himself near Banff. Banff is a national park in Alberta, in Canada.
And when I was in Banff earlier this year for a math conference-- there's also a math conference center there-- I had a chance to visit Raymond Paley's tomb. There's a graveyard there where you can find it. And it's very sad, because in his short mathematical lifespan he managed to make a lot of amazing mathematical discoveries, and there are many important concepts named after him-- the Paley-Wiener theorem, Paley-Zygmund, Littlewood-Paley-- all these important ideas in analysis named after Paley. And the Paley graph is also one of his contributions.
So what we'll claim is that this Paley graph has the desired pseudorandom properties: if you look at its eigenvalues, then, except for the top eigenvalue, all the other eigenvalues are quite small. Keep in mind that the size of S is basically half of the group-- p minus 1 over 2. So, especially for larger values of p, these eigenvalues are quite small compared to the degree.
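As a quick numerical sanity check of this claim-- a minimal numpy sketch, not part of the lecture's argument-- one can build a small Paley graph and list its eigenvalues:

```python
import numpy as np

p = 13  # a prime that is 1 mod 4, so -1 is a square and S is symmetric
S = {(a * a) % p for a in range(1, p)}  # nonzero quadratic residues mod p

# Adjacency matrix of the Paley graph: g ~ h iff h - g is a nonzero quadratic residue
A = np.array([[1 if (h - g) % p in S else 0 for h in range(p)] for g in range(p)])

eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
print(eigs[0])                                # top eigenvalue: d = (p - 1) / 2 = 6
print(eigs[1:])                               # every other eigenvalue
print((-1 + p**0.5) / 2, (-1 - p**0.5) / 2)   # the two values predicted below
```

For p = 13, the top eigenvalue comes out to 6, and every other eigenvalue is approximately 1.303 or -2.303, matching the two values (-1 plus or minus root 13) over 2 derived in what follows.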
So the main way to show that Cayley graphs like this have small eigenvalues is to just compute what the eigenvalues are. And this is actually not so hard to do for Cayley graphs, so let me do it explicitly. I will tell you a very explicit set of eigenvectors. The first eigenvector is just the all 1's vector. The second eigenvector is the vector 1, omega, omega squared, up to omega to the p minus 1, where omega is a primitive p-th root of unity.
The next one is 1, omega squared, omega to the fourth, all the way to omega to the 2 times p minus 1. And so on. So I make this list, and I have p of them. These are my eigenvectors. Let me check that they are actually eigenvectors, and then we can also compute their eigenvalues.
So the top eigenvector corresponds to d-- the all 1's vector in a d regular graph is always an eigenvector with eigenvalue d. For the other ones, we'll just do the computation. Instead of getting confused with indices, let me compute, as an example, the j-th coordinate of the adjacency matrix times V2. The j-th coordinate comes out to the following sum: running over s in S, omega raised to j plus s. S is symmetric, so I don't have to worry so much about plus or minus; I just write j plus s.
So, if you think about how this Cayley graph is defined, when you hit this vector with that matrix, the j-th coordinate is that sum there. But I can rewrite the sum by taking out the common factor omega to the j, and you see that this is the j-th coordinate of V2. This is true for all j, so this number here is lambda 2.
And, more generally, lambda k is the following sum, for k from 1 through p. When you plug in k equals 1, you just get d, and the others are these exponential sums. Now, this is a pretty straightforward computation, and, in fact, we're not using anything about quadratic residues. This is a generic fact about Cayley graphs of Z mod p: it is true for all choices of S, not just quadratic residues. The basic reason is that this set of eigenvectors does not depend on S. You might know this concept from other places, such as circulant matrices and whatnot, but here it follows from this simple computation.
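Written out, the board computation is the following:

```latex
% For G = \operatorname{Cay}(\mathbb{Z}/p\mathbb{Z}, S) and \omega = e^{2\pi i/p},
% the vectors v_k = (1, \omega^k, \omega^{2k}, \dots, \omega^{(p-1)k}), k = 0, 1, \dots, p-1,
% are eigenvectors of the adjacency matrix A, since
(A v_k)_j \;=\; \sum_{s \in S} \omega^{k(j+s)} \;=\; \Bigl( \sum_{s \in S} \omega^{ks} \Bigr)\, \omega^{kj},
% so the eigenvalue attached to v_k is \sum_{s \in S} \omega^{ks},
% which equals d = |S| when k = 0 and is an exponential sum otherwise.
% Nothing here uses what S is, beyond S \subseteq \mathbb{Z}/p\mathbb{Z}.
```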
So now we have the values of lambda explicitly, and I can compute their sizes. I want to know how big these lambdas are. Well, the first one, k equals 1, is exactly d, the degree, which is p minus 1 over 2. But what about the other ones? For those, we can do the following computation. Note that I can rewrite lambda k by observing that twice it plus 1 gives the following sum, because here S is the set of quadratic residues: in this sum, every nonzero quadratic residue gets counted twice, and 0 gets counted once.
And now I would like to evaluate the size of this exponential sum. This is something known as a Gauss sum. Basically, a Gauss sum is what happens when you have an exponential sum with a quadratic dependence in the exponent. And the trick here is to consider the square of the sum-- the magnitude squared.
Now I expand the square-- squaring is a common feature of many of the things we do in this course; it really simplifies your life. You square, you expand the sum, and you can re-parameterize one of the summands like that. So I'm doing two steps at once: re-parameterizing and expanding. And if I expand the exponent, we find-- that's just algebra.
And now you notice that this inner sum, the sum over a, is equal to 0 when b is nonzero-- here I'm assuming that k is not 0-- because then I'm summing over a permutation of all the p-th roots of unity. And when b is 0, the inner sum equals p. So the whole thing equals p.
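The Gauss sum computation just described, written as one chain:

```latex
% For k \not\equiv 0 \pmod{p}:
\Bigl| \sum_{a=0}^{p-1} \omega^{k a^2} \Bigr|^2
  \;=\; \sum_{a, b} \omega^{k\bigl((a+b)^2 - a^2\bigr)}
  \;=\; \sum_{b} \omega^{k b^2} \sum_{a} \omega^{2kab}
  \;=\; p,
% since the inner sum over a vanishes unless b = 0 (p is odd, so 2kb \not\equiv 0 \bmod p),
% and equals p when b = 0.
```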
And, therefore, since the eigenvalues are real, twice lambda sub k plus 1 is equal to plus or minus root p. So lambda sub k is equal to minus 1 plus or minus root p, all over 2, for every k other than the index of the top eigenvalue.
So, really, except for the top eigenvalue, which is just the degree, all the other eigenvalues are one of these two values, and they're all quite small. So this is an explicit computation showing you that the Paley graph is indeed a pseudorandom graph-- it's an example of a quasi-random graph. Yes?
AUDIENCE: Do we know what the sign is?
YUFEI ZHAO: The question is, do we know what the sign is? Here I am not telling you what the sign is, but you can look it up-- people have computed exactly what the sign should be. This is something that you can find in a number theory textbook, like Ireland and Rosen. Any more questions?
There is a concept here I just want to bring out: you might recognize sums like this. That's a Fourier coefficient-- this is exactly what Fourier transforms look like. And it is indeed the case that, in general, if you have an Abelian group, then the eigenvalues, the spectral information, of the corresponding Cayley graph correspond to Fourier coefficients.
And this is a connection that we'll see again later in the course, when we consider additive combinatorics and give a Fourier-analytic proof of Roth's theorem; there, Fourier analysis will play a central role. But this analogy, as I've written it, is only for Abelian groups. If you try to do the same for non-Abelian groups, you will get something somewhat different.
For non-Abelian groups, you do not have this nice notion of Fourier analysis, at least not in a version that generalizes what's above in a straightforward way. Instead, you have something else, which many of you have seen before under a different name: representation theory. Which, in some sense, is Fourier analysis, except, instead of one-dimensional objects and complex numbers, we're looking at higher-dimensional representations. So I just want to point out this connection, and we'll see more of it later on. Any questions?
So let's talk more about Cayley graphs. Last time, we mentioned these notions of quasi-randomness, and I said at the end of the class that many of these equivalences fail for sparse graphs: if your edge density is not a constant but goes to zero, then the equivalences no longer hold.
But what about for Cayley graphs? In particular, I would like to consider two specific notions that we discussed last time and try to understand how they relate to each other for Cayley graphs. For dense Cayley graphs, this is a special case of what we did last time. So I'm really interested in sparser Cayley graphs-- even down to constant degree, which is much sparser than the regime we were looking at last time.
And the main result I want to tell you is that the DISC condition is, in a very strong sense, actually equivalent to the eigenvalue condition for all Cayley graphs, including non-Abelian ones. Before telling you the statement, I first want to give an example showing that this equivalence is definitely not true if you remove the Cayley graph assumption.
So here is an example showing this is false for non-Cayley graphs. Take a large random d regular graph-- d here can be a constant or growing with n; this is a pretty robust example. And then add to it an extra disjoint copy of K sub d plus 1, which is much smaller in terms of the number of vertices.
The big random graph, by virtue of being a random graph, has the discrepancy property. And because we're only adding in a very small number of vertices, that does not destroy the discrepancy property-- if you only add a small number of vertices, the discrepancy property doesn't change much. So this whole thing has DISC.
However, what about the eigenvalues? I claim that the top two eigenvalues are in fact both equal to d. That's because you have two eigenvectors: one which is the all 1's vector on this graph, and another which is the all 1's vector on that graph. These two disjoint components each give you an eigenvector with eigenvalue d, so you get d twice. In particular, the second eigenvalue is not small. So the implication from DISC to eigenvalue really fails for general, non-Cayley graphs.
The implication in the other direction is actually fine. In fact, that the eigenvalue condition implies DISC is the content of the expander mixing lemma. That's because, in the expander mixing lemma, the square root of the product of the two set sizes is at most n, so if lambda is quite small compared to the degree, then you still have the desired type of quasi-randomness. I'll make the statements more precise in a second.
So the question is, how can we show that DISC, which is the seemingly weaker property, implies the stronger eigenvalue property for Cayley graphs? What is special about Cayley graphs that allows us to do this, given that the statement is false in general? So let me first tell you the result. This is a result due to David Conlon and myself from two years ago.
Many of you may not have been to many seminar talks, but there's this convention in mathematics talks that you don't write out your own full name-- only your initial. It's some kind of false modesty. Of course, we all love talking about our own results, but somehow we don't like to write our own names for some reason.
So here's the theorem. I start with a finite group, gamma, and consider a subset S of gamma that is symmetric. Let G be the Cayley graph. Let me write n for the number of vertices and d for the size of S-- so this is a d regular graph. Let me define the following properties.
The first property, I'll call DISC with parameter epsilon. So I give you an explicit parameter: for all vertex sets X and Y, the number of edges between X and Y differs from the number of edges you would expect, as in the expander mixing lemma, by a small quantity relative to the total number of edges.
The second property, which we'll call the eigenvalue property, EIG, is that G is an n, d, lambda graph with lambda at most epsilon d. So lambda is quite small as a function of d. The conclusion of the theorem is that, up to a small change of parameters, these two properties are equivalent. In particular, EIG of epsilon implies DISC of epsilon. And DISC of epsilon-- this is the more interesting direction-- implies EIG, where you lose a little bit, but at most a constant factor: EIG of 8 epsilon. Any questions about the statement so far?
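To have the statement in one place, here it is written out:

```latex
% Theorem (Conlon-Zhao). Let G = \operatorname{Cay}(\Gamma, S), n = |\Gamma|, d = |S|.
% DISC(\epsilon): for all X, Y \subseteq \Gamma,
%     \bigl| e(X, Y) - (d/n)|X||Y| \bigr| \le \epsilon\, d n.
% EIG(\epsilon): G is an (n, d, \lambda)-graph with \lambda \le \epsilon d.
% Then:
\mathrm{EIG}(\epsilon) \implies \mathrm{DISC}(\epsilon)
\qquad \text{and} \qquad
\mathrm{DISC}(\epsilon) \implies \mathrm{EIG}(8\epsilon).
```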
As I mentioned, this is completely false if you consider non-Cayley graphs. And the first direction follows from the expander mixing lemma, using that implication up there. One of the main reasons I want to show you a proof of this theorem is that it uses a tool which I think is worth knowing: an important inequality known as Grothendieck's inequality.
Many of you probably know Grothendieck as the famous French mathematician who reinvented modern algebraic geometry and spent the rest of his life writing tomes and tomes of text that have yet to be translated to English. But he also did some important foundational work in functional analysis before he became an algebraic geometry nerd, and this is one of his important results in that area. Grothendieck's inequality tells us that there exists some absolute constant K such that, for every real-valued matrix A, we have the following. So here's the idea.
Let's consider the following quantity. This is a bilinear form-- you hit the matrix with a vector x and a vector y from the two sides. And I'm interested in the maximum value of this bilinear form if you are allowed to take the entries of x and y to be plus/minus 1.
So this is an important quantity attached to a matrix. It's basically asking: you assign a sign, plus or minus, to each row and each column, and you want to maximize this number here. This is an important quantity that we'll actually see much more of in the next chapter, on graph limits. But, for now, just take my word: this is a very important quantity.
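For intuition, here is a tiny brute-force sketch of this plus/minus 1 maximization (illustration code, not from the lecture; it is exponential in the matrix dimensions, which is exactly why one wants a relaxation):

```python
import itertools
import numpy as np

def pm1_max(A):
    """Max of x^T A y over all x, y with entries in {-1, +1} (brute force; tiny A only)."""
    m, n = A.shape
    best = -np.inf
    for x in itertools.product((-1, 1), repeat=m):
        for y in itertools.product((-1, 1), repeat=n):
            best = max(best, np.array(x) @ A @ np.array(y))
    return best

A = np.array([[1.0, -2.0], [3.0, 0.5]])
print(pm1_max(A))  # tries all 2^m * 2^n sign patterns
```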
And this is actually a quantity that is very difficult to compute. If I give you a very large matrix and ask you to compute this number, there is no good algorithm for it, and it's believed that there is no good algorithm for it. On the other hand, there is a relaxation of this problem, which is the following. It's still a sum, but now, instead of taking the xi's and yi's to be real numbers, we take them to be vectors.
So let's consider a similar-looking sum, except that the xi's and yi's now come from the unit ball B in some R to the m, with its inner product. The dimension m is actually not so relevant-- it's arbitrary. If you like, you can take m to be n or 2n, because you only have that many vectors.
This quantity, just by definition, is a relaxation of the quantity on the right-hand side, so it's at least that large: whatever plus/minus 1 assignment you have, you can always realize it as the same quantity with m equal to 1. But this quantity may be substantially larger-- the x's and y's have more room to arrange themselves to maximize the sum.
And Grothendieck's inequality tells us that the left-hand side actually cannot be too much larger than the right-hand side: it exceeds it by at most a constant factor. In other words, with the left-hand side, which is known as a semidefinite relaxation, you are not losing more than a constant factor compared to the original problem. This is important in computer science, because the left-hand side turns out to be a semidefinite program, an SDP, which does have efficient algorithms. So you can get a constant-factor approximation to this difficult-to-compute but important quantity by using the semidefinite relaxation, and Grothendieck's inequality promises us that it is a good relaxation.
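Stated as an inequality, Grothendieck's theorem says:

```latex
% Grothendieck's inequality: there is an absolute constant K such that,
% for every real matrix (a_{ij}),
\sup_{x_i,\, y_j \in B} \sum_{i,j} a_{ij} \langle x_i, y_j \rangle
  \;\le\; K \, \sup_{x_i,\, y_j \in \{-1, +1\}} \sum_{i,j} a_{ij}\, x_i y_j,
% where B is the unit ball of \mathbb{R}^m (any m) with the usual inner product.
% The reverse inequality, with constant 1, holds trivially (take m = 1).
```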
You might ask, what is the value of K? I said there exists some constant K. This is actually a mystery. Grothendieck himself proved the inequality with some constant, and the constant has been improved over time. Currently, the best known result is that K roughly 1.78 works. But the optimal value, which is known as Grothendieck's constant, is unknown.
So this is Grothendieck's constant. Actually, what I've written down is called the real Grothendieck constant, because you can also write a version for complex numbers and complex vectors, and that gives the complex Grothendieck constant. Yes?
AUDIENCE: Is there a lower bound that's known [INAUDIBLE] greater than 1?
YUFEI ZHAO: Is there a lower bound that is known? Yes. It's known that it's strictly bigger than 1.
AUDIENCE: Do we know [INAUDIBLE]?
YUFEI ZHAO: So there are some specific numbers, but I forget what they are. You can look it up. Any more questions? So we'll leave Grothendieck's inequality there and use it as a black box. If you wish to learn the proof, I encourage you to do so-- there are some quite nice proofs out there. And we'll use it to prove this theorem here about quasi-random Cayley graphs.
So let's suppose DISC holds. What would we like to show? We want to show that the eigenvalue condition holds, and we'll use a min-max characterization of eigenvalues. But, first, some preliminaries.
Suppose you have vectors x and y with plus/minus 1 coordinates. Let's split x and y according to where they're positive and where they're negative. So define x plus by: its coordinate at g is 1 if x sub g is plus 1, and 0 otherwise. And x minus has coordinate 1 at g if x sub g is minus 1, and 0 otherwise. So x splits as x plus minus x minus, and, likewise, y splits as y plus minus y minus.
Let's consider the matrix A whose g comma h entry is the following quantity: I look at whether g inverse h lies in S, take the indicator of that-- so it's 1 or 0-- and then subtract d over n, so that this entry has mean 0.
And now, if I consider the bilinear form-- hit A from the left and right with x and y-- then the bilinear form splits according to the pluses and minuses of the x's and y's. And I claim that each one of these terms is controlled because of DISC. For example, for the first term, you expand out what this is-- here's an indicator vector, and that's an indicator vector.
And if you look at the definition, this is precisely the number of edges between X plus and Y plus, minus d over n times the size of X plus times the size of Y plus, where X plus is the set of group elements g such that x sub g is 1, and so on. So the punchline is that this quantity is, by discrepancy, at most epsilon dn, and hence the whole sum, by the triangle inequality, is at most 4 epsilon dn.
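Putting this step into symbols:

```latex
% With A_{g,h} = \mathbf{1}[g^{-1}h \in S] - d/n, and x = x^+ - x^-, y = y^+ - y^-:
x^{\mathsf{T}} A y
  = (x^+)^{\mathsf{T}} A\, y^+ - (x^+)^{\mathsf{T}} A\, y^-
  - (x^-)^{\mathsf{T}} A\, y^+ + (x^-)^{\mathsf{T}} A\, y^-,
\qquad
(x^+)^{\mathsf{T}} A\, y^+ = e(X^+, Y^+) - \frac{d}{n}\,|X^+|\,|Y^+|,
% so DISC(\epsilon) bounds each of the four terms by \epsilon d n, and the triangle
% inequality gives |x^{\mathsf{T}} A y| \le 4 \epsilon d n for all \pm 1-valued x, y.
```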
All right. So far, we've reinterpreted the discrepancy property. What we really want to show is that this graph satisfies the eigenvalue condition. What does that actually mean? By the min-max characterization of eigenvalues, the maximum of these two eigenvalues in absolute value-- the quantity we would like to control-- is equal to the supremum of this bilinear form over unit-length vectors x and y.
And this is simply because of what A is. A is not the adjacency matrix; A is obtained by taking the adjacency matrix and subtracting that constant. Subtracting the constant gets rid of the top eigenvalue, and what remains is everything else. You want to show that what remains has small spectral radius. So we would like to show that this quantity here is quite small.
Well, let's do it. Give me a pair of vectors, x and y. I'll define the following twisted vectors, obtained by rotating the coordinates: set x superscript s, at coordinate g, to be x sub sg. So x is a vector indexed by the group elements, and I'm rotating this indexing by s-- that's what the superscript s means. And y superscript s is defined similarly.
So I claim that these twists, these rotations, do not change the norm of these vectors. And that should be pretty clear, because I'm simply relabeling the coordinates in a uniform way. And, likewise, same for y.
So I would like to show that the quantity up here is small. Let's take two unit vectors and consider this bilinear form. If I expand it out, it looks like that-- I'm just writing it out. But now let me throw in an extra variable of summation: look at the same sum, but with an extra s added in, placed over here. Convince yourself that this is the same sum-- it's simply a re-parameterization.
But now, if you look at the definition of A, there's a cancellation: the two s's cancel out. So let's rewrite the sum: 1 over n, then sum over g, h, s, all group elements. Now I bring the summation over s inside, and you see that what's inside is simply an inner product between two vectors.
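One way to write out this symmetrization trick (packaged slightly differently from the board, with the twisted vectors indexed by s):

```latex
% Since A_{sg, sh} = A_{g,h} for a Cayley graph, re-parameterizing and averaging over s:
x^{\mathsf{T}} A y
  = \frac{1}{n} \sum_{s, g, h \in \Gamma} A_{sg,\, sh}\, x_{sg}\, y_{sh}
  = \frac{1}{n} \sum_{g, h \in \Gamma} A_{g,h} \sum_{s \in \Gamma} x_{sg}\, y_{sh}
  = \sum_{g, h \in \Gamma} \frac{A_{g,h}}{n} \,\bigl\langle x^{(g)}, y^{(h)} \bigr\rangle,
% where x^{(g)} = (x_{sg})_{s \in \Gamma} and y^{(h)} = (y_{sh})_{s \in \Gamma} are unit
% vectors whenever x and y are. This is the vector-valued (left-hand) side of
% Grothendieck's inequality.
```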
So I may need to redefine things slightly. When you're working with non-Abelian groups, it's always a question of which side you should multiply on. Are you guys OK with this, or do I need to move this s over here? In any case, some version of this works. Yes, question.
AUDIENCE: yh [INAUDIBLE].
YUFEI ZHAO: yh. Thank you. Yes, I think-- OK. Question.
AUDIENCE: [INAUDIBLE]
YUFEI ZHAO: Great. So maybe I need to switch the definition here, but, in any case, some version of this should be OK-- I'll sort it out in the notes. But now we have this expression here, and if you look at this quantity, it is exactly the kind of quantity that comes up in Grothendieck's inequality. This is basically the left-hand side of Grothendieck's inequality.
What about the right-hand side of Grothendieck's inequality? Well, we already controlled that. The conclusion of the earlier board was that this bilinear form is bounded by at most 4 epsilon dn for all x and y with plus/minus 1 coordinates.
So, combining them by Grothendieck's inequality, we have an upper bound of the Grothendieck constant times 4 epsilon dn; remembering the factor of 1 over n in front of the sum, and using that the Grothendieck constant is less than 2, we get a bound of 8 epsilon d. And this shows that the variational problem, which characterizes the largest eigenvalue in absolute value, is at most 8 epsilon d, thereby implying the eigenvalue property.
So, two main takeaways from this proof. One is that Grothendieck's inequality is a nice thing to know: the semidefinite relaxation changes a problem that is initially somewhat intractable into a semidefinite problem, which is algorithmically tractable from a computer science point of view and also has nice mathematical properties. The other is the nice trick in this proof where I symmetrize the coordinates using the group symmetries. That is what lets us show that the eigenvalue condition and the discrepancy condition are equivalent for Cayley graphs. Let's take a quick break.
Any questions so far? So we've been talking about n, d, lambda graphs-- d regular graphs. And the next question I would like to address is: in an n, d, lambda graph, how small can lambda be? Smaller lambda corresponds to a more pseudorandom graph. So how small can it be?
And the right setting to think about is d fixed as a constant and n getting large. How small can lambda be? It turns out there is a limit to how small it can be, and it is given by the Alon-Boppana bound: fix d, and let G be an n-vertex d regular graph with adjacency matrix eigenvalues lambda 1 through lambda n, sorted in non-increasing order.
Then the second largest eigenvalue has to be at least, basically, 2 root d minus 1, minus a small error term, little o of 1, where the little o of 1 goes to 0 as n goes to infinity. So the Alon-Boppana bound tells you that lambda cannot be below this quantity. I want to explain the significance of this quantity-- you will see it in the proof-- and this quantity is best possible. There is also the question of what we know about the existence of graphs with lambda 2 close to this number. So this is the optimal number you can put here. Question?
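So the statement is:

```latex
% Alon-Boppana: for a d regular graph G on n vertices, with d fixed,
\lambda_2(G) \;\ge\; 2\sqrt{d-1} \;-\; o(1), \qquad n \to \infty.
```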
AUDIENCE: Does it say anything about how negative lambda n can be?
YUFEI ZHAO: The question is, does it say how negative lambda n can be? I'll address that in a second, but, essentially, if you have a bipartite graph, then lambda n equals minus lambda 1.
AUDIENCE: [INAUDIBLE]
YUFEI ZHAO: More questions? So I want to show you a proof and, time permitting, a couple of proofs of the Alon-Boppana bound. They're all quite simple to execute, and I think they're a good way to understand how these spectral techniques work. First, as with most of the proofs we've done concerning eigenvalues, we use the Courant-Fischer characterization of eigenvalues. It suffices to exhibit some nonzero vector z such that z is orthogonal to the all 1's vector and this quotient is at least the claimed bound.
By the Courant-Fischer characterization of the second eigenvalue, if you vary over all such z orthogonal to the all 1's vector, then the maximum value this quotient attains is exactly lambda 2. So, to show that lambda 2 is large, it suffices to exhibit such a z. Let me construct one for you.
Let r be a positive integer, and pick an arbitrary vertex v in the graph. Let V sub i denote the set of vertices at distance exactly i from v. So, in particular, V0 is just the vertex v itself. And I can draw you a picture: you have v, then its neighbors, and each of them has more neighbors, like that. So I'm calling this stuff big V0, then big V1, V sub 2, and so on.
So I'm going to define a vector, which I'll eventually turn into z, by telling you its value on each of these vertices. I'll set it very explicitly: x has value x sub u equal to w sub i, where w sub i is d minus 1 raised to the power minus i over 2, whenever u lies in the set V sub i-- that is, whenever u is at distance exactly i from v. Notice that these values decrease as you get further away from v.
And I do this for all distances less than r. So this is my x vector, and I set all the other coordinates to be 0-- that is, x sub u is 0 whenever the distance between u and v is at least r. So that gives you this vector. I would like to compute that quotient over there for this vector, and I claim that the quotient is at least the following quantity.
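In symbols, the test vector and the claimed bound are:

```latex
% Weights decaying with the distance i from the chosen vertex v:
x_u = w_i = (d-1)^{-i/2} \ \text{ for } u \in V_i,\ 0 \le i < r,
\qquad x_u = 0 \ \text{ otherwise},
% and the claim is
\frac{x^{\mathsf{T}} A x}{x^{\mathsf{T}} x} \;\ge\; 2\sqrt{d-1}\,\Bigl(1 - \frac{1}{2r}\Bigr).
```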
But this is a computation, so let's just do it. Why is this true? Well, if you compute the norm squared of x-- the sum of the squares of the coordinates-- that comes from adding up these values: for each element of the i-th neighborhood, I get a w sub i squared.
And the quantity up there-- A is the adjacency matrix-- I can write as a sum over all vertices u of x sub u times the sum, over all neighbors u prime of u, of x sub u prime. It's that sum there.
And this sum I have some control over. I claim it is at least the following quantity. Consider where u is: the term is nonzero only if u lies in V sub i for some i at most r minus 1. In the i-th neighborhood, I have V sub i choices for the vertex u, and, for such a choice, x sub u is w sub i.
But what about its neighbors? There's one neighbor going left, toward v, if you look at that picture, and all the other neighbors are-- maybe in the same set, maybe in the next set. In any case, all the remaining neighbors have x sub u prime at least w sub i plus 1, because these weights are decreasing. The worst case, so to speak, is when all the other neighbors point to the next set. So I have that inequality there.
There's one issue. If you go to the very last set and think about what happens when u is in that very last set, I'm counting neighbors that no longer carry any weight. So I need to take them out: I should subtract d minus 1 times the maximum possible overcount, since each vertex there has at most d minus 1 neighbors in the next set.
This should be pretty straightforward if you do the counting correctly. But now let's plug in what these weights are, and you'll find that this quantity simplifies very nicely. What ends up happening is that you get this factor of 2 root d minus 1, times the sum minus 1/2 of the size of V sub r minus 1 times w sub r minus 1 squared. It's a pretty straightforward computation using the specific weights that we have.
And one more thing: notice that the sizes of the neighborhoods cannot grow by more than a factor of d minus 1 at each step, because you only have d minus 1 outward edges going forward. As a result, I can bound this last term: it is at most each individual summand of the main sum. So the whole thing is at least 2 times root d minus 1, times 1 minus 1 over 2r.
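Assembling the estimates as stated on the board:

```latex
\|x\|^2 = \sum_{i=0}^{r-1} |V_i|\, w_i^2,
\qquad
x^{\mathsf{T}} A x \;\ge\; 2\sqrt{d-1}\, \Bigl( \sum_{i=0}^{r-1} |V_i|\, w_i^2
  \;-\; \tfrac{1}{2}\, |V_{r-1}|\, w_{r-1}^2 \Bigr),
% and since |V_{i+1}| \le (d-1)|V_i| while w_{i+1}^2 = w_i^2/(d-1), the quantities
% |V_i| w_i^2 are non-increasing in i, so |V_{r-1}| w_{r-1}^2 \le \frac{1}{r}\|x\|^2, giving
\frac{x^{\mathsf{T}} A x}{x^{\mathsf{T}} x} \;\ge\; 2\sqrt{d-1}\Bigl(1 - \frac{1}{2r}\Bigr).
```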
Putting these two together, you find the claim. So I've exhibited this vector x with that quotient property. But that's not quite enough, because we need a vector-- it's called z up there-- that is orthogonal to the all 1's vector. And that we can arrange, because, if the number of vertices is quite a bit larger compared to the degree, then I claim that there exist two vertices u and v at distance more than 2r from each other.
Why? This sum here is the size of a ball of radius 2r-- the size of this tree. If every vertex were within distance 2r of a given vertex, then the number of vertices would be at most the sum I've written here. So let x be the vector obtained above, which is-- I'm being somewhat informal here-- centered at v. And let y be the same construction, but now centered at u. Then I claim that x and y are supported on disjoint vertex sets with no edges even between them.
So, in particular, this bilinear form-- not inner product, but bilinear form-- is equal to 0, since there is no edge between the supports of x and y. So now I have two vectors that do not interact, and both have the nice property above. And now I can take a linear combination.
Let me choose a real constant c such that z equal to x minus c times y is orthogonal to the all 1's vector. I can do this because x and y both have non-negative entries and are both nonzero. And now I have this extra property that I want.
But what about the quotient? Well, these two vectors, x and y, do not interact at all, so the norms split just fine, and the bilinear form splits just fine. So you have this inequality here, as desired. And notice that I can take r to infinity as n goes to infinity, because d is fixed: if n goes to infinity, then r can grow roughly logarithmically in n.
And that proves the Alon-Boppana bound. Just to recap: by Courant-Fischer, we needed to exhibit some vector for which this quotient is large. We did this by constructing a vector explicitly around a vertex, finding two such vertices that are far away from each other, constructing the two corresponding vectors, taking the appropriate linear combination so that the final vector is orthogonal to the all 1's vector, and then showing that the corresponding quotient is large enough. Any questions?
I want to show you a different proof, which gives a slightly worse result but is conceptually nice. So here is a second proof of a slight weakening. The earlier proof showed that lambda 2 is quite large; next, we'll show that the max of lambda 2 and minus lambda n is large. So not that the second largest eigenvalue is large, but that the second largest eigenvalue in absolute value is large. It's slightly weaker, but, for all intents and purposes, it's the same spirit. So I'll show this one here.
And this is a nice illustration of what's called the trace method, sometimes also the moment method. Here's the idea. As we saw in the proof relating the quasi-randomness condition on C4's to eigenvalues, eigenvalues are related to counting closed walks in a graph, and that's what we'll use. Specifically, the 2k-th moment of the spectrum is equal to the trace of the 2k-th power of the adjacency matrix, which counts the number of closed walks of length exactly 2k.
To lower-bound the left-hand side, we lower-bound the right-hand side. So let's count closed walks of length exactly 2k starting at a fixed vertex v. Here we are in a d regular graph. I claim that, whatever this number is-- it may be different for each v-- it is at least the corresponding count in the infinite d regular tree. And what is the infinite d regular tree? You just start with a vertex and keep branching out, d regular everywhere.
Why is this true? This is, I think, pretty easy once you see things the right way. Think about how you walk. Whatever closed walk you can do on the infinite d regular tree-- if you label the first edge, the second edge, and so on, and do the corresponding labeling on your original graph-- you can do the corresponding walk on your original graph. The original graph may have some additional walks, namely ones that involve cycles, that are not available on the tree. But, certainly, every closed walk on the tree can be replicated on your graph.
You can make this more formal-- write down an injection-- but it should be fairly convincing that this inequality is true. And this is just a number: the number of closed walks of length 2k in the infinite d regular tree starting at a vertex. This number has been well studied, and we don't need to know the precise value, just a good lower bound. Here is one: it is at least the k-th Catalan number times d minus 1 to the k, where C sub k, the k-th Catalan number, is 2k choose k divided by k plus 1.
Let me remind you what this is. It's a wonderful number with many combinatorial interpretations, and it's a fun exercise to do bijections between them. In particular, C3 is equal to 5, which counts the number of up-and-down walks of length 6 that never dip below the horizontal line where you start. Here, up and down correspond to going away from the root versus coming back toward the root. So you have at least that many walk shapes, and, each time you move away from the root, you have at least d minus 1 choices of branch.
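In symbols, the walk-counting bound is:

```latex
% Closed walks of length 2k in the infinite d regular tree, starting at the root:
\#\{\text{closed walks of length } 2k\} \;\ge\; C_k\, (d-1)^k,
\qquad C_k = \frac{1}{k+1}\binom{2k}{k},
% since each of the C_k up-down patterns (never dipping below the start) yields
% at least (d-1)^k walks: at least d-1 branch choices at each of the k up-steps.
```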
Given that, the right-hand side is at least n, the number of vertices, times the quantity above involving Catalan numbers. On the other hand, the left-hand side is at most-- here we're using that 2k is an even number-- d to the 2k, plus the contribution of all the other eigenvalues, which are at most lambda in absolute value. So let me call this quantity lambda.
Rearranging this inequality, we find that lambda to the 2k is at least this number here-- I'm just changing n minus 1 to n. And now we let n go to infinity and k go to infinity slowly enough: k goes to infinity with n, but k is little o of log n. We find that this quantity here is essentially 2 to the 2k times d minus 1 to the k, and this other term is little o of 1. So lambda is at least 2 root d minus 1, minus little o of 1.
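The whole chain of inequalities in the trace method, written out:

```latex
d^{2k} + (n-1)\,\lambda^{2k}
  \;\ge\; \sum_{i=1}^{n} \lambda_i^{2k}
  \;=\; \operatorname{tr}\bigl(A^{2k}\bigr)
  \;\ge\; n\, C_k (d-1)^k,
\qquad \lambda := \max(\lambda_2, -\lambda_n),
% so \lambda^{2k} \ge C_k (d-1)^k - d^{2k}/n. Since C_k^{1/(2k)} \to 2 and, for
% k = o(\log n) with d fixed, d^{2k} = o(n), this gives \lambda \ge 2\sqrt{d-1} - o(1).
```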
That proves, essentially, the Alon-Boppana bound, with the small weakening that the big eigenvalue you find might actually be very negative instead of very positive. But that's OK-- for applications, this is not such a big deal. So now we have two different proofs, and it's worth thinking about: are they really the same proof? Are they different proofs? How are they related to each other? They look very different.
And one final remark. You already saw two different proofs, so you've seen where this number comes from; let me just offer one more remark on where it really comes from. It comes from the infinite d regular tree: it turns out that 2 root d minus 1 is exactly the spectral radius of the infinite d regular tree. And that is the reason, in some sense, that this is the correct number occurring in the Alon-Boppana bound.
If you've seen things like algebraic topology, the infinite d regular tree is the universal cover of d regular graphs. I won't talk more about it-- just some general remarks, and you already saw two different proofs. At the beginning of next time, I want to wrap this up and explain what we know about whether there are graphs for which this bound is tight. The answer is yes, and there are lots of major open problems related to what happens there. And then, after that, I would like to start talking about graph limits-- the next chapter of this course. OK, good.