1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation, or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:23,178 --> 00:00:23,970 GILBERT STRANG: OK. 9 00:00:23,970 --> 00:00:27,690 Now, clustering for graphs. 10 00:00:27,690 --> 00:00:30,360 So this is a topic-- 11 00:00:30,360 --> 00:00:32,040 this is one of the important things 12 00:00:32,040 --> 00:00:34,890 you can try to do with a graph. 13 00:00:34,890 --> 00:00:36,225 So you have a large graph. 14 00:00:44,230 --> 00:00:47,430 Let me kind of divide it into two clusters. 15 00:00:55,020 --> 00:00:56,760 So you've got a giant graph. 16 00:00:56,760 --> 00:01:00,030 And then the job is to make some sense out of it. 17 00:01:00,030 --> 00:01:04,560 And one possible step is to be able to subdivide it, 18 00:01:04,560 --> 00:01:13,560 if, as I see here, there's a cut between two reasonably 19 00:01:13,560 --> 00:01:16,080 equal parts of the graph-- 20 00:01:16,080 --> 00:01:19,320 reasonable-- reasonably same size. 21 00:01:19,320 --> 00:01:24,810 And therefore, that graph could be studied in two pieces. 22 00:01:24,810 --> 00:01:30,940 So the question is, how do you find such a cut by a algorithm? 23 00:01:30,940 --> 00:01:33,260 What's an algorithm that would find that cut? 24 00:01:33,260 --> 00:01:36,120 So that's a problem. 25 00:01:36,120 --> 00:01:41,775 Let's say we're looking for two clusters. 26 00:01:45,940 --> 00:01:48,460 We could look for more clusters, but let's say we 27 00:01:48,460 --> 00:01:50,470 want to look for two clusters. 28 00:01:50,470 --> 00:01:52,250 So what are we trying to do? 29 00:01:52,250 --> 00:01:55,580 We're trying to minimize. 30 00:01:55,580 --> 00:01:56,830 So this is the problem, then. 31 00:02:02,380 --> 00:02:04,720 So we look for-- 32 00:02:04,720 --> 00:02:10,509 find positions x and y, let's say. 33 00:02:10,509 --> 00:02:15,370 Two which will be the centers, so to speak, of the-- 34 00:02:15,370 --> 00:02:21,940 and really, it's just these points that-- 35 00:02:21,940 --> 00:02:25,350 so the data is the points and the edges, 36 00:02:25,350 --> 00:02:27,810 as always-- the nodes and the edges. 37 00:02:27,810 --> 00:02:31,090 So the problem is to find x and y so that-- 38 00:02:31,090 --> 00:02:32,200 to minimize. 39 00:02:36,860 --> 00:02:45,100 So it's a distance of points ai from x-- 40 00:02:45,100 --> 00:02:49,690 maybe should emphasize we're in high dimensions-- 41 00:02:49,690 --> 00:02:54,230 plus the distance of other points. 42 00:02:54,230 --> 00:02:58,060 So the ai will be these points-- 43 00:02:58,060 --> 00:03:00,220 these nodes. 44 00:03:00,220 --> 00:03:05,080 And the bi will be these nodes, plus the sum 45 00:03:05,080 --> 00:03:09,670 of bi minus y squared. 46 00:03:09,670 --> 00:03:13,750 And you understand the rule here-- 47 00:03:13,750 --> 00:03:23,250 that together the a's union the b's give me all nodes. 48 00:03:25,900 --> 00:03:36,765 And I guess to be complete, the a's intersect the b's is empty. 49 00:03:39,340 --> 00:03:41,130 Just what you expect. 50 00:03:41,130 --> 00:03:45,110 I'm dividing the a's and the b's into two groups. 51 00:03:45,110 --> 00:03:48,350 And I'm picking an x and a y sort 52 00:03:48,350 --> 00:03:55,860 of at the center of those groups, so that is a minimum. 53 00:03:55,860 --> 00:03:57,050 So I want to minimize. 54 00:03:59,690 --> 00:04:03,790 And also, I probably want to impose some condition that 55 00:04:03,790 --> 00:04:08,060 the number of a's is reasonably close to the number of b's. 56 00:04:08,060 --> 00:04:13,880 In other words, I don't want just that to be the a, and all 57 00:04:13,880 --> 00:04:15,740 the rest to be the b's. 58 00:04:15,740 --> 00:04:21,140 That would be not a satisfactory clustering. 59 00:04:21,140 --> 00:04:26,400 I'm looking for clusters that are good sized clusters. 60 00:04:26,400 --> 00:04:27,990 So minimize that. 61 00:04:27,990 --> 00:04:28,980 OK. 62 00:04:28,980 --> 00:04:31,980 So there are a lot of different algorithms to do it. 63 00:04:34,710 --> 00:04:41,010 Some are more directly attacking this problem. 64 00:04:41,010 --> 00:04:46,470 Others use matrices that we associate with the graph. 65 00:04:46,470 --> 00:04:49,390 So let me tell you about two or three of those algorithms. 66 00:04:49,390 --> 00:04:51,660 And if you've seen-- 67 00:04:51,660 --> 00:04:54,480 studied-- had a course in graph theory, this-- 68 00:04:54,480 --> 00:04:56,710 you may already have seen this problem. 69 00:04:56,710 --> 00:05:02,910 First question would be, suppose I decide these are 70 00:05:02,910 --> 00:05:06,210 the a's, and those are the b's-- or some other decision. 71 00:05:06,210 --> 00:05:08,060 Yeah, probably some other decision. 72 00:05:08,060 --> 00:05:11,940 I don't want to solve the problem before I even start. 73 00:05:11,940 --> 00:05:14,640 So some a's and some b's. 74 00:05:14,640 --> 00:05:17,970 What would be the best choice of the x once you've decided 75 00:05:17,970 --> 00:05:19,110 on the a's? 76 00:05:19,110 --> 00:05:21,750 And what would be the best choice of the y once 77 00:05:21,750 --> 00:05:23,820 you've decided on the b's? 78 00:05:23,820 --> 00:05:33,290 So we can answer that question if we knew the two groups. 79 00:05:33,290 --> 00:05:37,000 We could see where they should be centered, 80 00:05:37,000 --> 00:05:41,360 with the first group centered at x, the second group centered 81 00:05:41,360 --> 00:05:43,850 at y, and what does centering mean? 82 00:05:43,850 --> 00:05:44,930 So let's just say-- 83 00:05:48,325 --> 00:05:51,210 so I think what I'm saying here is-- 84 00:05:55,030 --> 00:05:57,800 let me bring that down a little. 85 00:05:57,800 --> 00:06:01,250 So given the a's-- 86 00:06:01,250 --> 00:06:07,310 the a's-- this is a1 up to, say, ak. 87 00:06:07,310 --> 00:06:19,190 What is the best x just to make that part right? 88 00:06:19,190 --> 00:06:22,160 And the answer is, to do you know, geometrically, 89 00:06:22,160 --> 00:06:24,230 what x should be here? 90 00:06:24,230 --> 00:06:28,430 X is the-- so if I have a bunch of points, 91 00:06:28,430 --> 00:06:32,660 and I'm looking for the middle of those points-- 92 00:06:32,660 --> 00:06:36,870 the point x-- a good point x to say, OK, that's the middle. 93 00:06:36,870 --> 00:06:41,190 It'll make the sum of the distances, I think, squared-- 94 00:06:41,190 --> 00:06:43,740 I hope I'm right about that-- 95 00:06:43,740 --> 00:06:45,530 a minimum. 96 00:06:45,530 --> 00:06:48,350 What is x? 97 00:06:48,350 --> 00:06:50,242 It is the-- 98 00:06:50,242 --> 00:06:51,878 AUDIENCE: [INAUDIBLE]. 99 00:06:51,878 --> 00:06:52,920 GILBERT STRANG: Centroid. 100 00:06:52,920 --> 00:06:54,150 Centroid is the word. 101 00:06:54,150 --> 00:07:02,700 X is the centroid of the a's. 102 00:07:02,700 --> 00:07:05,490 And what is the centroid? 103 00:07:05,490 --> 00:07:06,600 Let's see. 104 00:07:06,600 --> 00:07:09,120 Oh, maybe I don't know if x and y were a good choice, 105 00:07:09,120 --> 00:07:11,730 but let me see what-- 106 00:07:15,140 --> 00:07:17,210 I guess, it's the average a. 107 00:07:17,210 --> 00:07:21,098 It's the sum of the a's-- 108 00:07:21,098 --> 00:07:22,820 of these a's. 109 00:07:22,820 --> 00:07:28,220 Those are vectors, of course, divided by the number of a's. 110 00:07:32,710 --> 00:07:33,340 I think. 111 00:07:33,340 --> 00:07:36,670 Actually, I was just quickly reviewing this morning, 112 00:07:36,670 --> 00:07:42,350 so I'm not totally on top of this centroid. 113 00:07:42,350 --> 00:07:44,000 What I'm going to talk-- 114 00:07:44,000 --> 00:07:47,200 the algorithm that I'm going to talk about is the k-- 115 00:07:47,200 --> 00:07:51,970 well, the k-means, it's always called. 116 00:07:51,970 --> 00:07:55,450 And here it will be the-- k will be 2. 117 00:07:58,840 --> 00:08:01,120 I just have two-- 118 00:08:01,120 --> 00:08:05,080 partitioning into two sets, a's and b's, so I just-- k 119 00:08:05,080 --> 00:08:06,140 is just 2. 120 00:08:06,140 --> 00:08:06,640 OK. 121 00:08:06,640 --> 00:08:10,050 What's the algorithm? 122 00:08:10,050 --> 00:08:15,420 Well, if I've chosen a partition-- 123 00:08:15,420 --> 00:08:17,880 the a's and b's have separated them-- 124 00:08:17,880 --> 00:08:22,800 then that tells me what the x and the y should be. 125 00:08:22,800 --> 00:08:26,140 But, now what do I do next? 126 00:08:26,140 --> 00:08:29,640 So is this going to be a sort of an alternating partition? 127 00:08:29,640 --> 00:08:32,039 Now I take those two centroids. 128 00:08:32,039 --> 00:08:43,090 So step one is for given a's and b's, find 129 00:08:43,090 --> 00:08:49,770 the centroids x and y. 130 00:08:49,770 --> 00:08:52,440 And that's elementary. 131 00:08:52,440 --> 00:09:03,510 Then the second step is, given the centroids, x and y-- 132 00:09:03,510 --> 00:09:06,890 given those positions-- given x and y-- 133 00:09:06,890 --> 00:09:10,000 they won't be centroids when you see what happened. 134 00:09:10,000 --> 00:09:13,310 Given x and y, redo-- 135 00:09:18,750 --> 00:09:23,200 form the best partition-- 136 00:09:23,200 --> 00:09:23,805 best clusters. 137 00:09:29,840 --> 00:09:36,230 So step one, we had a guess at what the best clusters were. 138 00:09:36,230 --> 00:09:37,820 And we found they're centroids. 139 00:09:37,820 --> 00:09:40,460 Now, we start with the centroids, 140 00:09:40,460 --> 00:09:43,250 and we form new clusters again. 141 00:09:43,250 --> 00:09:46,370 And if these clusters are the same as the ones 142 00:09:46,370 --> 00:09:50,240 we started with, then the algorithm is converged. 143 00:09:50,240 --> 00:09:52,070 But probably they won't be-- 144 00:09:52,070 --> 00:09:53,070 these clusters. 145 00:09:53,070 --> 00:09:57,005 So you'll have to tell me what I mean by the best clusters. 146 00:09:57,005 --> 00:10:02,390 If I've got the two points, x and y, I want the points-- 147 00:10:02,390 --> 00:10:03,980 I want to separate all the points 148 00:10:03,980 --> 00:10:08,270 that cluster around x to the ones that cluster around y. 149 00:10:08,270 --> 00:10:11,030 And then, they're probably different 150 00:10:11,030 --> 00:10:13,160 from my original start. 151 00:10:13,160 --> 00:10:14,990 So now I've got new-- 152 00:10:14,990 --> 00:10:18,360 now I repeat step one. 153 00:10:18,360 --> 00:10:20,920 But let's just think, how do I form the best clusters? 154 00:10:23,550 --> 00:10:28,830 Well, I take a point and I have to decide, does it go with x, 155 00:10:28,830 --> 00:10:30,900 or does it go within the x cluster, 156 00:10:30,900 --> 00:10:33,390 or does it go in the cluster around y? 157 00:10:33,390 --> 00:10:36,870 So how do I decide that? 158 00:10:36,870 --> 00:10:39,540 Just whichever one it's closer to. 159 00:10:39,540 --> 00:10:48,540 So each point goes with each node. 160 00:10:48,540 --> 00:10:51,720 You should-- I could say, each node 161 00:10:51,720 --> 00:11:01,950 goes with the closer of x and y. 162 00:11:05,250 --> 00:11:08,961 So points that should have been-- 163 00:11:08,961 --> 00:11:11,340 that are closer to x-- now we're going to put them 164 00:11:11,340 --> 00:11:13,430 in the cluster around x. 165 00:11:13,430 --> 00:11:16,890 And does that solve the problem? 166 00:11:16,890 --> 00:11:20,040 No, because-- well, it might, but it might not. 167 00:11:24,430 --> 00:11:27,880 We'd have to come back to step one. 168 00:11:27,880 --> 00:11:29,800 We've now changed the clusters. 169 00:11:29,800 --> 00:11:32,620 They'll have different centroids. 170 00:11:32,620 --> 00:11:35,410 So we repeat step one-- find the centroids 171 00:11:35,410 --> 00:11:37,150 for the two new clusters. 172 00:11:37,150 --> 00:11:38,980 Then we come to step two. 173 00:11:38,980 --> 00:11:43,210 Find the ones that should go with the two centroids, 174 00:11:43,210 --> 00:11:44,680 and back and forth. 175 00:11:44,680 --> 00:11:46,120 I don't know. 176 00:11:46,120 --> 00:11:49,750 I don't think there's a nice theory of convergence, 177 00:11:49,750 --> 00:11:52,160 or rate of convergence-- all the questions 178 00:11:52,160 --> 00:11:55,150 that this course is always asking. 179 00:11:55,150 --> 00:12:01,960 But it's a very popular algorithm, k-means. 180 00:12:01,960 --> 00:12:04,924 k would be to have k clusters. 181 00:12:04,924 --> 00:12:06,760 OK. 182 00:12:06,760 --> 00:12:08,170 So that's a-- 183 00:12:08,170 --> 00:12:10,210 I'm not going to discuss the-- 184 00:12:10,210 --> 00:12:12,910 I'd rather discuss some other ways to do this, 185 00:12:12,910 --> 00:12:14,410 to solve this problem. 186 00:12:14,410 --> 00:12:20,830 But that's one sort of hack that works quite well. 187 00:12:20,830 --> 00:12:21,330 OK. 188 00:12:21,330 --> 00:12:27,000 So second approach is what is coming next. 189 00:12:27,000 --> 00:12:39,140 Second solution method-- and it's 190 00:12:39,140 --> 00:12:45,476 called the spectral clustering. 191 00:12:49,850 --> 00:12:52,400 That's the name of the method. 192 00:12:52,400 --> 00:12:57,050 And before I write down what you do, what does the word 193 00:12:57,050 --> 00:12:58,580 spectral mean? 194 00:12:58,580 --> 00:13:02,750 You see spectral graph theory, spectral clustering. 195 00:13:02,750 --> 00:13:06,320 And in other parts of mathematics, you see that-- 196 00:13:06,320 --> 00:13:08,900 you see spectral theorem. 197 00:13:08,900 --> 00:13:10,430 I gave you the most-- 198 00:13:10,430 --> 00:13:15,070 and I described it as the most important-- perhaps-- 199 00:13:15,070 --> 00:13:18,160 theorem in linear algebra-- at least one of the top three. 200 00:13:21,940 --> 00:13:25,030 So I'll write it over here, because it's not-- 201 00:13:25,030 --> 00:13:27,760 it doesn't-- this is-- 202 00:13:27,760 --> 00:13:29,125 it's the same word, spectral. 203 00:13:34,530 --> 00:13:36,870 Well, let me ask that question again? 204 00:13:36,870 --> 00:13:39,560 What's that word spectral about? 205 00:13:39,560 --> 00:13:40,620 What does that mean? 206 00:13:40,620 --> 00:13:42,560 That means that if I have a matrix, 207 00:13:42,560 --> 00:13:47,450 and I want to talk about its spectrum, what 208 00:13:47,450 --> 00:13:50,000 is the spectrum of the matrix? 209 00:13:50,000 --> 00:13:54,190 It is the eigenvalues. 210 00:13:54,190 --> 00:13:56,960 So spectral theory, spectral clustering 211 00:13:56,960 --> 00:14:01,760 is using the eigenvalues of some matrix. 212 00:14:01,760 --> 00:14:04,890 That's what that spectral is telling me. 213 00:14:04,890 --> 00:14:05,390 Yeah. 214 00:14:05,390 --> 00:14:07,550 So the spectral theorem, of course, 215 00:14:07,550 --> 00:14:16,580 is that for a symmetric matrix S, the eigenvalues are real, 216 00:14:16,580 --> 00:14:21,710 and the eigenvectors are orthogonal. 217 00:14:21,710 --> 00:14:30,210 And don't forget what the real, full statement is here, 218 00:14:30,210 --> 00:14:34,730 because there could be repeated real eigenvalues. 219 00:14:34,730 --> 00:14:37,400 And what does the spectral theorem tell me 220 00:14:37,400 --> 00:14:42,440 for symmetric matrices, if lambda equals 221 00:14:42,440 --> 00:14:45,770 5 is repeated four times-- 222 00:14:45,770 --> 00:14:53,190 if it's a four times solution of the equation that 223 00:14:53,190 --> 00:14:57,180 gives eigenvalues, then what's the conclusion? 224 00:14:57,180 --> 00:15:01,740 Then there are four independent, orthogonal eigenvectors 225 00:15:01,740 --> 00:15:04,210 to go with it. 226 00:15:04,210 --> 00:15:06,480 We can't say that about matrices-- 227 00:15:06,480 --> 00:15:07,920 about all matrices. 228 00:15:07,920 --> 00:15:12,150 But we can say it about all symmetric matrices. 229 00:15:12,150 --> 00:15:15,380 And in fact, those eigenvectors are orthogonal. 230 00:15:15,380 --> 00:15:17,340 So we're even saying more. 231 00:15:17,340 --> 00:15:20,340 We can find four orthogonal eigenvectors 232 00:15:20,340 --> 00:15:25,280 that go with a multiplicity for eigenvalues. 233 00:15:25,280 --> 00:15:25,780 OK. 234 00:15:25,780 --> 00:15:27,780 That's spectral theorem. 235 00:15:27,780 --> 00:15:42,720 Spectral clustering starts with the graph Laplacian matrix. 236 00:15:46,140 --> 00:15:49,700 May I remember what that matrix is? 237 00:15:49,700 --> 00:15:56,370 Because that's the key connection of linear algebra 238 00:15:56,370 --> 00:16:00,240 to graph theory, is the properties of this graph, 239 00:16:00,240 --> 00:16:02,210 Laplacian matrix. 240 00:16:02,210 --> 00:16:02,940 OK. 241 00:16:02,940 --> 00:16:05,420 So let me say L, for Laplacian. 242 00:16:08,270 --> 00:16:10,640 So that matrix-- one way to describe it 243 00:16:10,640 --> 00:16:18,680 is as A transpose A, where A is the incidence 244 00:16:18,680 --> 00:16:25,380 matrix of the graph. 245 00:16:25,380 --> 00:16:30,720 Or another way we'll see is the D-- 246 00:16:30,720 --> 00:16:32,080 the degree matrix. 247 00:16:35,690 --> 00:16:36,395 That's diagonal. 248 00:16:42,890 --> 00:16:45,230 And I'll do an example just to remind you. 249 00:16:45,230 --> 00:16:49,505 Minus the-- well, I don't know what I'd call this one. 250 00:16:52,390 --> 00:16:55,480 Shall I call it B for the moment. 251 00:16:55,480 --> 00:16:58,060 And what matrix is B? 252 00:16:58,060 --> 00:17:00,910 That's the adjacency matrix. 253 00:17:07,030 --> 00:17:13,900 Really, you should know these four matrices. 254 00:17:13,900 --> 00:17:18,180 They're the key four matrices associated with any graph. 255 00:17:18,180 --> 00:17:20,760 The incidence matrix, that's m by n-- 256 00:17:24,960 --> 00:17:31,330 edges and nodes-- edges and nodes. 257 00:17:31,330 --> 00:17:35,970 So it's rectangular, but I'm forming A transpose A here. 258 00:17:35,970 --> 00:17:41,280 So I'm forming a symmetric, positive, semi-definite matrix. 259 00:17:41,280 --> 00:17:50,350 So this Laplacian is symmetric, positive, semi-definite. 260 00:17:50,350 --> 00:17:50,850 Yeah. 261 00:17:50,850 --> 00:17:54,170 Let me let me just recall what all these matrices are 262 00:17:54,170 --> 00:17:57,210 for a simple graph. 263 00:18:02,090 --> 00:18:02,590 OK. 264 00:18:02,590 --> 00:18:03,930 So I'll just draw a graph. 265 00:18:07,870 --> 00:18:09,340 All right. 266 00:18:09,340 --> 00:18:09,840 OK. 267 00:18:09,840 --> 00:18:11,460 So the incidence matrix-- 268 00:18:14,570 --> 00:18:17,600 there are 1, 2, 3, 4, 5 edges-- 269 00:18:17,600 --> 00:18:18,605 so five rows. 270 00:18:22,030 --> 00:18:24,100 There are four nodes-- 271 00:18:24,100 --> 00:18:27,890 1, 2, 3, and 4. 272 00:18:27,890 --> 00:18:29,590 So four columns. 273 00:18:29,590 --> 00:18:36,460 And a typical row would be edge 1 going from node 1 to node 2, 274 00:18:36,460 --> 00:18:41,390 so it would have a minus 1 and a 1 there. 275 00:18:41,390 --> 00:18:46,400 And let me take edge 2, going from 1 to node 3, 276 00:18:46,400 --> 00:18:51,600 so it would have a minus 1 and a 1 there, and so on. 277 00:18:51,600 --> 00:18:57,390 So that's the incidence matrix A. OK. 278 00:18:57,390 --> 00:18:58,950 What's the degree matrix? 279 00:18:58,950 --> 00:19:00,720 That's simple. 280 00:19:00,720 --> 00:19:09,410 The degree matrix-- well, A transpose A. This is m by n. 281 00:19:09,410 --> 00:19:11,280 This is n by m. 282 00:19:11,280 --> 00:19:14,190 So it's n by n. 283 00:19:18,300 --> 00:19:19,430 OK. 284 00:19:19,430 --> 00:19:22,330 In this case, 4 by 4. 285 00:19:22,330 --> 00:19:26,200 So the degree matrix will be 4 by 4, n by n. 286 00:19:26,200 --> 00:19:29,230 And it will tell us the degree of that, which means-- 287 00:19:29,230 --> 00:19:30,880 which we just count the edges. 288 00:19:30,880 --> 00:19:38,110 So three edges going in, node 2, three edges going in, node 3 289 00:19:38,110 --> 00:19:40,180 has just two edges. 290 00:19:40,180 --> 00:19:43,030 And node 4 has just two edges. 291 00:19:43,030 --> 00:19:45,610 So that's the degree matrix. 292 00:19:45,610 --> 00:19:47,860 And then the adjacency matrix that I've 293 00:19:47,860 --> 00:19:52,600 called B is also 4 by 4. 294 00:19:52,600 --> 00:19:55,630 And what is it? 295 00:19:55,630 --> 00:19:59,990 It tells us which node is connected to which node. 296 00:19:59,990 --> 00:20:04,810 So I don't allow nodes-- 297 00:20:04,810 --> 00:20:10,180 edges that connect a node to itself, so 0's on the diagonal. 298 00:20:10,180 --> 00:20:14,680 How many-- so which nodes are connected to node 1? 299 00:20:14,680 --> 00:20:18,640 Well, all of 2 and 4 and 3 are connected to 1. 300 00:20:18,640 --> 00:20:20,140 So I have 1's there. 301 00:20:22,670 --> 00:20:26,780 Node 2-- all three nodes are connected to node 2. 302 00:20:26,780 --> 00:20:31,040 So I'll have-- the second column and row will have all three 303 00:20:31,040 --> 00:20:32,210 1's. 304 00:20:32,210 --> 00:20:34,350 How about node 3? 305 00:20:34,350 --> 00:20:34,870 OK. 306 00:20:34,870 --> 00:20:39,770 Only edges-- only two edges are connected. 307 00:20:39,770 --> 00:20:46,820 Only two nodes are connected to 3, 1 and 2, but not 4. 308 00:20:46,820 --> 00:20:49,970 So 1 and 2 I have, but not 4. 309 00:20:49,970 --> 00:20:51,950 OK. 310 00:20:51,950 --> 00:20:54,500 So that's the adjacency matrix. 311 00:20:54,500 --> 00:20:55,700 Is that right? 312 00:20:55,700 --> 00:20:58,350 Think so. 313 00:20:58,350 --> 00:21:00,270 This is the degree matrix. 314 00:21:00,270 --> 00:21:04,290 This is the incidence matrix. 315 00:21:04,290 --> 00:21:08,730 And that formula gives me the Laplacian. 316 00:21:08,730 --> 00:21:09,240 OK. 317 00:21:09,240 --> 00:21:11,356 Let's just write down the Laplacian. 318 00:21:15,460 --> 00:21:19,150 So if I use the degree minus B-- 319 00:21:19,150 --> 00:21:20,290 that's easy. 320 00:21:20,290 --> 00:21:23,888 The degrees were 3, 3, 2, and 2. 321 00:21:23,888 --> 00:21:24,930 Now I have these minuses. 322 00:21:30,610 --> 00:21:33,660 And those were 0. 323 00:21:33,660 --> 00:21:34,160 OK. 324 00:21:37,290 --> 00:21:39,990 So that's a positive, semi-definite matrix. 325 00:21:39,990 --> 00:21:44,060 Is it a positive definite matrix? 326 00:21:44,060 --> 00:21:49,890 So let me ask, is it singular or is it not singular? 327 00:21:49,890 --> 00:21:51,690 Is there a vector in its null space, 328 00:21:51,690 --> 00:21:54,580 or is there not a vector in its null space? 329 00:21:54,580 --> 00:21:59,345 Can you solve Dx equals all 0's? 330 00:22:03,410 --> 00:22:06,050 And of course, you can. 331 00:22:06,050 --> 00:22:11,180 Everybody sees that vector of all 1's will 332 00:22:11,180 --> 00:22:17,220 be a solution to L-- 333 00:22:17,220 --> 00:22:17,720 sorry. 334 00:22:17,720 --> 00:22:21,670 I should be saying L here. 335 00:22:21,670 --> 00:22:24,750 Lx equals 0. 336 00:22:24,750 --> 00:22:38,130 Lx equals 0 as for a whole line of one dimensional null space 337 00:22:38,130 --> 00:22:45,120 of L has dimension 1. 338 00:22:45,120 --> 00:22:49,140 It's got 1 basis vector, 1, 1, 1, 1. 339 00:22:49,140 --> 00:22:55,860 And that will always happen with the graph set up 340 00:22:55,860 --> 00:22:57,020 that I've created. 341 00:22:57,020 --> 00:22:57,520 OK. 342 00:22:57,520 --> 00:23:02,400 So that's a first fact, that this positive, semi-definite 343 00:23:02,400 --> 00:23:09,980 matrix, L, has lambda 1 equals 0. 344 00:23:09,980 --> 00:23:15,170 And the eigenvector is constant-- 345 00:23:15,170 --> 00:23:19,580 C, C, C, C-- the one dimensional eigenspace. 346 00:23:19,580 --> 00:23:22,830 Or 1, 1, 1, 1 is the typical eigenvector. 347 00:23:22,830 --> 00:23:23,330 OK. 348 00:23:26,580 --> 00:23:32,130 Now back to graph clustering. 349 00:23:32,130 --> 00:23:35,820 The idea of graph clustering is to look 350 00:23:35,820 --> 00:23:39,960 at the Fiedler eigenvector. 351 00:23:39,960 --> 00:23:41,225 This is called the x2-- 352 00:23:44,450 --> 00:23:55,850 is the next eigenvector-- is the eigenvector for the smallest 353 00:23:55,850 --> 00:24:01,790 positive eigenvalue for a lambda min excluding 0-- 354 00:24:01,790 --> 00:24:06,020 so the smallest eigenvalue of L-- 355 00:24:11,454 --> 00:24:15,420 the smallest eigenvalue and its eigenvector-- this 356 00:24:15,420 --> 00:24:17,670 is called the Fiedler vector, named 357 00:24:17,670 --> 00:24:20,940 after the Czech mathematician. 358 00:24:25,330 --> 00:24:31,020 A great man in linear algebra, and he studied this factor-- 359 00:24:31,020 --> 00:24:32,510 this situation. 360 00:24:32,510 --> 00:24:38,310 So everybody who knows about the graph Laplacian 361 00:24:38,310 --> 00:24:43,050 is aware that its first eigenvalue is 0, 362 00:24:43,050 --> 00:24:46,170 and that the next eigenvalue is important. 363 00:24:46,170 --> 00:24:46,801 Yeah. 364 00:24:46,801 --> 00:24:48,968 AUDIENCE: Is the graph Laplacian named the Laplacian 365 00:24:48,968 --> 00:24:52,850 because it has connections to-- 366 00:24:52,850 --> 00:24:54,705 GILBERT STRANG: To Laplace's equation, yes. 367 00:24:58,860 --> 00:25:00,250 Yeah, that's a good question. 368 00:25:00,250 --> 00:25:04,430 So why the word-- 369 00:25:04,430 --> 00:25:06,672 the name, Laplacian? 370 00:25:10,050 --> 00:25:15,400 So yeah, that's a good question. 371 00:25:15,400 --> 00:25:22,170 So the familiar thing-- so it connects to Laplace's finite 372 00:25:22,170 --> 00:25:24,990 difference equation, because we're talking about matrices 373 00:25:24,990 --> 00:25:27,030 here, and not derivatives-- 374 00:25:27,030 --> 00:25:28,980 not functions. 375 00:25:28,980 --> 00:25:30,600 So why the word Laplacian? 376 00:25:30,600 --> 00:25:32,880 Well, so if I have a regular-- 377 00:25:32,880 --> 00:25:37,950 if my graph is composed of-- 378 00:25:37,950 --> 00:25:46,630 so there is a graph with 25 nodes, and 4 times 5-- 379 00:25:46,630 --> 00:25:48,190 20, about 40. 380 00:25:48,190 --> 00:25:55,460 This probably has about 40 edges and 25 nodes. 381 00:25:58,550 --> 00:26:01,910 And of course, I can construct its-- 382 00:26:01,910 --> 00:26:05,446 graph all those four matrices for it-- 383 00:26:05,446 --> 00:26:11,050 its incidence matrix, its degree matrix. 384 00:26:11,050 --> 00:26:15,470 So the degree will be four at all these inside points. 385 00:26:15,470 --> 00:26:19,220 The degree will be three at these boundary points. 386 00:26:19,220 --> 00:26:22,120 The degree will be two at these corner points. 387 00:26:22,120 --> 00:26:28,670 But the-- what will the matrix L look like? 388 00:26:28,670 --> 00:26:30,800 So what is L? 389 00:26:34,690 --> 00:26:39,660 And that will tell you why it has this name Laplacian. 390 00:26:39,660 --> 00:26:42,690 So the matrix L will have-- 391 00:26:42,690 --> 00:26:45,550 the degree 4 right will be on the diagonal. 392 00:26:45,550 --> 00:26:49,990 That's coming from D. The other-- 393 00:26:49,990 --> 00:26:55,750 the minus 1's that come from B, the adjacency matrix, 394 00:26:55,750 --> 00:27:01,510 will be associated with those nodes, and otherwise, all 0's. 395 00:27:01,510 --> 00:27:05,860 So this is a typical row of L. This is typical row 396 00:27:05,860 --> 00:27:09,280 of L centered at that node. 397 00:27:09,280 --> 00:27:12,760 So maybe that's node number 5, 10, 13. 398 00:27:12,760 --> 00:27:17,100 That's 13 out of 25 that would show you this. 399 00:27:17,100 --> 00:27:18,310 And the-- sorry. 400 00:27:18,310 --> 00:27:20,990 Those are minus 1's. 401 00:27:20,990 --> 00:27:22,330 Minus 1's. 402 00:27:22,330 --> 00:27:27,370 So a 4 on the diagonal, and four minus 1's. 403 00:27:27,370 --> 00:27:31,810 That's the model problem for when the graph is a grid-- 404 00:27:31,810 --> 00:27:33,280 square grid. 405 00:27:33,280 --> 00:27:36,940 And do you associate that with Laplace's equation? 406 00:27:36,940 --> 00:27:41,760 So this is the reason that Laplace-- 407 00:27:41,760 --> 00:27:45,220 why Laplace gets in it. 408 00:27:45,220 --> 00:27:48,940 Because Laplace's equation-- the differential equation-- 409 00:27:48,940 --> 00:27:52,510 is second derivative with respect to x squared, 410 00:27:52,510 --> 00:27:57,460 and the second derivative with respect to y squared is 0. 411 00:27:57,460 --> 00:28:02,800 And what we have here is Lu equals 0. 412 00:28:02,800 --> 00:28:07,000 It's the discrete Laplacian, the vector Laplacian, 413 00:28:07,000 --> 00:28:08,710 the graph Laplacian-- 414 00:28:08,710 --> 00:28:15,340 where the second x derivative is replaced by -1, 2, -1. 415 00:28:15,340 --> 00:28:20,290 And the second y derivative is replaced by -1, 2, -1. 416 00:28:20,290 --> 00:28:24,490 Second differences in the x and the y directions. 417 00:28:24,490 --> 00:28:26,620 So that's-- yeah. 418 00:28:26,620 --> 00:28:30,610 So that's the explanation for Laplace. 419 00:28:30,610 --> 00:28:33,910 It's the discrete Laplace-- 420 00:28:33,910 --> 00:28:37,870 discrete, or the finite difference Laplace. 421 00:28:37,870 --> 00:28:39,380 OK. 422 00:28:39,380 --> 00:28:46,870 Now to just finish, I have to tell you what the-- 423 00:28:46,870 --> 00:28:50,890 what clusters-- how do you decide the clusters from L? 424 00:28:50,890 --> 00:28:57,920 How does L propose two clusters, the a's and b's? 425 00:28:57,920 --> 00:29:02,290 And here's the answer. 426 00:29:02,290 --> 00:29:06,930 They come from this eigenvector-- 427 00:29:06,930 --> 00:29:09,180 the Fiedler eigenvector. 428 00:29:09,180 --> 00:29:10,750 You look at that eigenvector. 429 00:29:13,280 --> 00:29:17,908 It's got some positive and some negative components. 430 00:29:17,908 --> 00:29:23,000 The components with positive numbers of this eigenvector-- 431 00:29:23,000 --> 00:29:31,790 so the positive components of x-- 432 00:29:34,890 --> 00:29:39,797 of-- this eigenvector. 433 00:29:42,480 --> 00:29:45,698 And there are negative components of this eigenvector. 434 00:29:45,698 --> 00:29:46,990 And those are the two clusters. 435 00:29:51,930 --> 00:29:56,730 So it's-- the cluster is-- the two clusters are decided 436 00:29:56,730 --> 00:29:58,620 by the eigenvector-- 437 00:29:58,620 --> 00:30:02,910 by the signs-- plus or minus signs of the components. 438 00:30:02,910 --> 00:30:06,810 The plus signs go in one and the minus signs go in another. 439 00:30:06,810 --> 00:30:12,450 And you have to experiment to see that that would succeed. 440 00:30:12,450 --> 00:30:16,680 I don't know what it would do on this, actually, because that's 441 00:30:16,680 --> 00:30:19,980 hardly split up into two. 442 00:30:19,980 --> 00:30:22,860 I suppose maybe the split is along a line 443 00:30:22,860 --> 00:30:25,080 like that or something, to get-- 444 00:30:25,080 --> 00:30:26,880 I don't know what clustering. 445 00:30:26,880 --> 00:30:31,770 This is not a graph that is naturally clustered, 446 00:30:31,770 --> 00:30:36,480 but you could still do k-means on it. 447 00:30:36,480 --> 00:30:40,740 You could still do spectral clustering. 448 00:30:40,740 --> 00:30:42,480 And you would find this eigenvector. 449 00:30:42,480 --> 00:30:46,080 Now what's the point about this eigenvector? 450 00:30:46,080 --> 00:30:48,220 I'll finish in one moment. 451 00:30:48,220 --> 00:30:51,360 What do we know about that eigenvector as 452 00:30:51,360 --> 00:30:53,530 compared to that one? 453 00:30:53,530 --> 00:30:55,710 So here was an eigenvector all 1's. 454 00:30:55,710 --> 00:31:00,150 Let me just make it all 1's, 1, 1, 1, 1. 455 00:31:00,150 --> 00:31:03,890 In that picture, it's 25 1's. 456 00:31:03,890 --> 00:31:07,320 Here's the next eigenvector up. 457 00:31:07,320 --> 00:31:12,320 And what's the relation between those two eigenvectors of L? 458 00:31:12,320 --> 00:31:13,964 They are-- 459 00:31:13,964 --> 00:31:14,845 AUDIENCE: Orthogonal. 460 00:31:14,845 --> 00:31:15,970 GILBERT STRANG: Orthogonal. 461 00:31:15,970 --> 00:31:19,460 These are eigenvectors of a symmetric matrix. 462 00:31:19,460 --> 00:31:20,920 So they're orthogonal. 463 00:31:20,920 --> 00:31:25,840 So that means-- to be orthogonal to this guy means 464 00:31:25,840 --> 00:31:28,195 that your components add to 0, right? 465 00:31:28,195 --> 00:31:29,020 A Vector. 466 00:31:29,020 --> 00:31:31,510 Is orthogonal to all 1's. 467 00:31:31,510 --> 00:31:34,750 That dot product is just, add up the components. 468 00:31:34,750 --> 00:31:36,760 So we have a bunch of positive components 469 00:31:36,760 --> 00:31:38,740 and a bunch of negative components. 470 00:31:38,740 --> 00:31:44,620 They have the same sum, because the dot product with that is 0. 471 00:31:44,620 --> 00:31:48,610 And those two components-- those two sets of components are 472 00:31:48,610 --> 00:31:49,540 your-- 473 00:31:49,540 --> 00:31:54,250 to tell you the two clusters in spectral clustering. 474 00:31:54,250 --> 00:31:58,810 So it's a pretty nifty algorithm. 475 00:31:58,810 --> 00:32:02,920 It does ask you to compute an eigenvector. 476 00:32:02,920 --> 00:32:06,350 And that, of course, takes time. 477 00:32:06,350 --> 00:32:10,400 And then there's a third, more direct algorithm 478 00:32:10,400 --> 00:32:14,140 to do this optimization problem. 479 00:32:14,140 --> 00:32:15,970 Well, actually, there are many. 480 00:32:15,970 --> 00:32:17,680 This is an important problem, so there 481 00:32:17,680 --> 00:32:19,510 are many proposed algorithms. 482 00:32:19,510 --> 00:32:20,080 Good. 483 00:32:20,080 --> 00:32:20,580 OK. 484 00:32:20,580 --> 00:32:22,360 I'm closing up. 485 00:32:22,360 --> 00:32:23,160 Final question. 486 00:32:23,160 --> 00:32:23,660 Yeah? 487 00:32:23,660 --> 00:32:26,110 AUDIENCE: Is it possible to do more than two clusters? 488 00:32:26,110 --> 00:32:28,480 GILBERT STRANG: Well, certainly for k-means. 489 00:32:28,480 --> 00:32:31,870 Now, if I had to do three clusters with Fiedler, 490 00:32:31,870 --> 00:32:34,740 I would look at the first three eigenvectors. 491 00:32:34,740 --> 00:32:37,780 And, well, the first one would be nothing. 492 00:32:37,780 --> 00:32:39,790 And I would look at the next two. 493 00:32:39,790 --> 00:32:42,790 And that would be pretty successful. 494 00:32:42,790 --> 00:32:45,130 If I wanted six clusters, it probably 495 00:32:45,130 --> 00:32:49,700 would fall off in the quality of the clustering. 496 00:32:49,700 --> 00:32:50,200 Yeah. 497 00:32:50,200 --> 00:32:53,380 But that certainly-- I would look at the lowest six 498 00:32:53,380 --> 00:32:56,920 eigenvectors, and get somewhere. 499 00:32:56,920 --> 00:32:57,880 Yeah. 500 00:32:57,880 --> 00:32:58,420 Right. 501 00:32:58,420 --> 00:33:00,040 So OK. 502 00:33:00,040 --> 00:33:02,050 So that's a topic-- 503 00:33:02,050 --> 00:33:08,320 an important topic-- a sort of standard topic in applied graph 504 00:33:08,320 --> 00:33:09,820 theory. 505 00:33:09,820 --> 00:33:10,360 OK. 506 00:33:10,360 --> 00:33:12,550 So see you Wednesday. 507 00:33:12,550 --> 00:33:17,710 I'm hoping, on Wednesday-- 508 00:33:17,710 --> 00:33:24,040 so Professor Edelman has told me a new and optimal way 509 00:33:24,040 --> 00:33:29,995 to look at the problem of backpropagation. 510 00:33:29,995 --> 00:33:32,680 Do you remember backpropagation? 511 00:33:32,680 --> 00:33:34,600 If you remember the lecture on it-- 512 00:33:34,600 --> 00:33:39,300 you don't want to remember the lecture on it. 513 00:33:39,300 --> 00:33:43,990 It's a tricky, messy thing to explain. 514 00:33:43,990 --> 00:33:50,110 But he says, if I explain it using Julia in linear algebra, 515 00:33:50,110 --> 00:33:52,060 it's clear. 516 00:33:52,060 --> 00:33:54,580 So we'll give him a chance on Wednesday 517 00:33:54,580 --> 00:34:02,110 to show that revolutionary approach to the explanation 518 00:34:02,110 --> 00:34:03,640 of backpropagation. 519 00:34:03,640 --> 00:34:06,340 And I hope for-- 520 00:34:06,340 --> 00:34:08,620 I told him he could have half an hour, 521 00:34:08,620 --> 00:34:11,800 and projects would take some time. 522 00:34:11,800 --> 00:34:15,730 I hope-- now we've had two with wild applause. 523 00:34:15,730 --> 00:34:21,920 So I hope we get a couple more in our last class. 524 00:34:21,920 --> 00:34:22,420 OK. 525 00:34:22,420 --> 00:34:23,440 See you Wednesday. 526 00:34:23,440 --> 00:34:25,360 And if you bring-- 527 00:34:25,360 --> 00:34:30,460 well, if you have projects ready, send them to me online, 528 00:34:30,460 --> 00:34:33,489 and maybe a print out as well. 529 00:34:33,489 --> 00:34:35,170 That would be terrific. 530 00:34:35,170 --> 00:34:41,080 If you don't have them ready by the hour, they can go-- 531 00:34:41,080 --> 00:34:43,699 the envelope outside my office would receive them. 532 00:34:43,699 --> 00:34:44,199 Good. 533 00:34:44,199 --> 00:34:48,090 So I'll see you Wednesday for the final class.