[MUSIC PLAYING]

GILBERT STRANG: So, I'm Gilbert Strang. And this is about my new course, 18.065, and the new textbook, Linear Algebra and Learning from Data, and what's in those subjects. So there are really two essential topics and two supplementary, but all very important, subjects. So if I tell you about those four parts of mathematics that are in the course, that will give you an idea of whether you're interested to follow through.

So the first big subject is linear algebra. That subject has just surged, exploded in importance in practice. What I want to focus on is some of the best matrices, say, symmetric matrices, orthogonal matrices, and their relation. Those are the stars of linear algebra. And the key step is to factor a matrix, maybe into a symmetric times an orthogonal matrix, maybe into an orthogonal times a diagonal times an orthogonal matrix. That's a very important factorization called the singular value decomposition. That doesn't get into a lot of linear algebra courses, but it's so critical.
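As a small illustration of that factorization, here is a minimal sketch, not from the lecture itself, assuming NumPy is available. It factors a matrix into orthogonal times diagonal times orthogonal and confirms the pieces:

```python
import numpy as np

# A small random matrix to factor.
A = np.random.rand(4, 3)

# Singular value decomposition: A = U * diag(s) * V^T,
# with orthonormal columns in U and V, and singular values in s.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from the three factors to confirm the decomposition.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))                 # True (up to rounding)

# The columns of U are orthonormal: U^T U is the identity.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # True
```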
So can I speak now about the second important topic, which is deep learning? So what is deep learning? The job is to create a function. For driverless cars, the input would be an image of, say, a telephone pole or a pedestrian, and the system has to learn to recognize which it is. Or the input could be handwriting on addresses, and the output would be a zip code. So the system has to learn how to recognize 0, 1, 2, 3, up to 9 from handwriting of all kinds. Or another one is speech, like what Siri has to do. So my speech has to come in as input, get interpreted, and come out as output by the process of deep learning.

So it involves creating a learning function. The function takes the input, the data, and produces the output, the meaning of that data. And so what's the function like? That's what mathematics is about, functions. So it involves matrix multiplication. Part of the function is multiplying vectors by matrices. So that's a bunch of steps. But if there were only that, if it were all linear algebra, the thing would fail, and it has failed. What makes it work now, so much that companies are investing enormously in the technology, is that there is now a nonlinear function, a very simple one, in the middle between every pair of matrices. And I can even tell you what that nonlinear function is. It's a function f: f of x is equal to x if x is positive, and f of x is 0 if x is negative. So you can imagine its graph. It's a flat line where x is negative, and then it's a 45-degree slope where x is positive. So putting that nonlinear function in between the matrix multiplications is the way to construct a successful learning function. But you have to find those matrices.

I mentioned two supporting subjects. The first is optimization, that would be the word. We have to find the entries in those matrices that go into the learning function. That's a crucial step. So this is a problem of minimizing the error, with all those matrix entries as variables. So this is multivariable calculus with, like, 100,000 or 500,000 variables. It's just unthinkable in a basic calculus course, but it's happening in any company that's working with deep learning. And so that's the giant calculation of deep learning. That's what keeps GPUs going for a week. But it gives amazing results that could never have been achieved in the past.

So then the other key subject is statistics. And the basic ideas of statistics play a role here, because when you're multiplying a whole sequence of matrices in deep learning, it's very possible for the numbers to grow out of sight exponentially, or to drop to zero. And both of those are bad news for the learning function. So you need to keep the mean and variance at the right spot to keep those numbers in the learning function, those matrices, in a good range. So this course won't be a statistics course, but it will use statistics as deep learning does.
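Putting those pieces together, here is a minimal sketch of the idea, a toy example of my own rather than code from the course, assuming NumPy: a learning function built from two matrices with the nonlinear function f in between, and a few steps of gradient descent that adjust the matrix entries to minimize the error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs X and target outputs Y (a made-up problem).
X = rng.standard_normal((100, 3))
Y = np.sin(X.sum(axis=1, keepdims=True))   # some nonlinear target

# Two matrices of weights -- their entries are the variables to find.
W1 = rng.standard_normal((3, 8)) * 0.5
W2 = rng.standard_normal((8, 1)) * 0.5

def relu(x):
    # f(x) = x for positive x, 0 for negative x.
    return np.maximum(x, 0.0)

step = 0.05
for it in range(500):
    # Learning function: matrix, then nonlinear f, then matrix.
    Z = X @ W1
    A = relu(Z)
    Y_hat = A @ W2

    # Error to minimize: mean squared difference from the targets.
    loss = np.mean((Y_hat - Y) ** 2)

    # Gradients of the error with respect to every matrix entry.
    d_Y_hat = 2.0 * (Y_hat - Y) / len(X)
    dW2 = A.T @ d_Y_hat
    dA = d_Y_hat @ W2.T
    dZ = dA * (Z > 0)          # derivative of f: 1 if positive, 0 if not
    dW1 = X.T @ dZ

    # Gradient descent: move each entry against its gradient.
    W1 -= step * dW1
    W2 -= step * dW2

print(f"final error: {loss:.4f}")
```

Real systems have many such layers and far more entries, but each gradient descent step has this same shape.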
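And one small illustration of the statistics point, again just a sketch of the idea: rescaling the numbers after each layer so their mean and variance stay in a good range, instead of exploding or collapsing across a long sequence of matrix multiplications.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100)

# Multiply by a long sequence of random matrices. Without control,
# the sizes of the numbers grow or shrink exponentially with depth.
for layer in range(50):
    M = rng.standard_normal((100, 100))
    x = M @ x
    # Normalize: subtract the mean, divide by the standard deviation,
    # keeping the numbers in a good range for the next layer.
    x = (x - x.mean()) / x.std()

print(x.mean(), x.std())  # mean near 0, variance near 1
```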
So those are the four subjects. Linear algebra and deep learning, the two big ones. Optimization and statistics, essential also. So I hope you'll enjoy the videos and enjoy the textbook. And go to the OpenCourseWare site, ocw.mit.edu, for the full picture. Beyond the videos, there are exercises, problems, discussion, lots more toward making a complete presentation, which I hope you like.