PROFESSOR: All right, everyone, so we are very happy to have Andy Beck as our invited speaker today. Andy has a unique background. He's trained both as a computer scientist and as a clinician. His specialty is in pathology. When he was a student at Stanford, his thesis was on how one could use machine learning algorithms to really understand pathology data sets, at the time using more traditional regression-style approaches, in what the field now calls computational pathology. His work was really at the forefront of the field. Since then, he's come to Boston, where he was an attending and faculty at Beth Israel Deaconess Medical Center. For the last couple of years, he's been running a company called PathAI, which is, in my opinion, one of the most exciting companies in AI in medicine. And he is my favorite invited speaker--

ANDY BECK: He says that to everyone.

PROFESSOR: --every time I get an opportunity to invite someone to speak. And I think you'll be really interested in what he has to say.

ANDY BECK: Great. Well, thank you so much. Thanks for having me. Yeah, I'm really excited to talk in this course. It is a super exciting time for machine learning in pathology. And if you have any questions throughout, please feel free to ask.

So for some background on what pathology is: say you're a patient. You go to the doctor, and AI could apply in any aspect of this whole trajectory, but I'll talk specifically about pathology. So you go to the doctor. They take a bunch of data from you. You talk to them. They get signs and symptoms. Typically, if they're at all concerned, and it could be something that's a structural alteration that's not accessible just through blood work, say a cancer, which is one of the biggest things, they'll send you to radiology, because radiology is the best way of acquiring data to look for big structural changes.
So you can't see single cells in radiology. But you can see inside the body and see some large things that are changing, to make evaluations like: you have a cough, are you looking at lung cancer, or are you looking at pneumonia? And radiology only takes you so far. People are super excited about applying AI to radiology, but I think one thing they often forget is that these images are not very data-rich compared to the core data types. I mean, this is my bias from pathology, but radiology gets you some part of the way, where you can sort of triage normal stuff. And the radiologist will have some impression of what they're looking at. Often, that's the bottom line in the radiology report: impression, concerning for cancer; or impression, likely benign but not sure; or impression, totally benign. And that will also guide subsequent decisions.

But if there's some concern that something serious is going on, the patient undergoes a pretty serious procedure, which is a tissue biopsy. So pathology, for what I'm going to talk about, which is surgical pathology, requires a tissue specimen. There are also blood-based things. But this is the diagnosis where you're trying to say: is this cancer? Is this not cancer? And that report by itself can really guide subsequent decisions, which could be no further treatment, or a big surgery, or a big decision about chemotherapy and radiotherapy. So this is one area where you really want to incorporate data in the most effective way to reduce errors, to increase standardization, and to really inform the best treatment decision for each patient based on the characteristics of their disease.

And one thing about pathology that's pretty interesting is it's super visual. This is just a random sampling of some of the types of imagery that pathologists are looking at every day.
I think this is one of the things that draws people to this specialty. As I was saying, in radiology you're sort of looking at an impression of what might be happening, based on sending different kinds of signals through the body, acquiring the data, and trying to estimate what's going on. Whereas here, you're actually staining pieces of tissue and looking by eye at actual individual cells. You can look within cells. You can look at how populations of cells are organized. And for many diseases, this still represents the core data type that defines what's going on: is this something with a serious prognosis that requires, say, surgery, or is this something that's totally benign? All of these are different aspects of benign processes; just the normal human body creates all these different patterns. And then there are a lot of patterns of disease. These are all different subtypes of disease with different morphologies. So there's an incredible wealth of visual imagery that the pathologist has to incorporate into their diagnosis. And on top of that, there are things like special stains that can stain for specific organisms, for infectious disease, or for specific patterns of protein expression, for subtyping disease based on expression of drug targets. And this increases the complexity of the work even more.

So there's really nothing new about trying to apply AI or machine learning or computation to this field. It's actually a very natural field for it, because it's laboratory-based and it's all about data processing: you take input, things like images, and produce output, a diagnosis. People have been trying this for 40 years or so now. This is one of the very first studies, which tried to see: could we train a computer to identify the size of cancer cells through a process they called morphometry, here on the bottom?
And then could we use measurements of the size of cancer cells in a very simple model to predict outcome? In this study, they have a learning set that they're learning from and then a test set. And they show that their system, as every paper that ever gets published shows, does better than the two competing approaches. Although even in this best case scenario, there's significant degradation from learning to test. So it's super simple, it's using very simple methods, and the data sets are tiny: 38 learning cases, 40 test cases. And this was published in The Lancet, which is a leading biomedical journal even today.

And then people got excited about AI, building off these simple approaches. Back in 1990, it was thought artificial neural nets would be super useful for quantitative pathology, for sort of obvious reasons. But at that time, there was really no way of digitizing slides at any sort of scale, and that problem has only recently been solved. Around 2000, people first started thinking that once the slides were digital, you could apply computational methods effectively. But nothing really changed, and to a large degree still hasn't changed for the vast majority of pathology practice, which I'll talk about.

But as was mentioned earlier, I was part of one of the first studies to take a more machine learning approach to this. And what we mean by machine learning versus prior approaches is the idea of using data-driven analysis to figure out the best features. You can now do that in an even more explicit way, but there was a progression: from measuring one or two things in a very tedious way on very small data sets, to this approach, where we used traditional regression-based machine learning to measure larger numbers of features, and then used the associations of those features with patient outcome to focus the analysis on the most important ones.
And the challenging machine learning task here, and really one of the core tasks in pathology, is image processing. How do we give computers the knowledge of what's being looked at that any pathologist would want to have? There are a few basic things you'd want to train the computer to do: for example, identify where the cancer is, where the stroma is, where the cancer cells are, where the fibroblasts are, et cetera. And once you train a machine learning based system to identify those things, you can then extract lots of quantitative phenotypes out of the images. This is all using human-engineered features to measure the different characteristics of what's going on in an image, with machine learning being used to create those features. And then we use other regression-based methods to associate these features with things like clinical outcome.

And in this work, we showed that by taking a data-driven approach, you begin to focus on things like what's happening in the tumor microenvironment, not just in the tumor itself. And it turned out, over the past decade, that understanding the way the tumor interacts with its microenvironment is one of the most important things in cancer, with fields like immuno-oncology being among the biggest advances in cancer therapy, where you're essentially regulating how tumor cells interact with the cells around them. And that sort of data is entirely inaccessible using traditional pathology approaches. It really required a machine learning approach to extract a bunch of features and let the data speak for itself in terms of which of those features is most important for survival.

And in this study, we showed that these things are associated with survival. I don't know if you do a lot of Kaplan-Meier plots in here.
PROFESSOR: They saw it once, but taking us through it slowly is never a bad idea.

ANDY BECK: Yeah. So if there's one type of plot to know for most of biomedical research, it's probably this one. And it's extremely simple. It's really just an empirical distribution of how patients are doing over time. So the x-axis is time. And here, the goal is to build a prognostic model. I wish I had a predictive one in here, but we can talk about what that would look like. A prognostic model, any sort of prognostic test in any disease in medicine, tries to create subgroups that show different survival outcomes. And then, by implication, those subgroups may benefit from different therapies. They may not; this doesn't answer that question. But if you want to make an estimate for how a patient is going to be doing in five years, and you can sub-classify them into two groups, this is a way to visualize it. You don't need two groups. You could do this with even one group, but it's frequently used to show differences between two groups.

So you'll see here, there's a black line and a red line. These are groups of patients, where a model trained not on these cases was trained to separate high-risk patients from low-risk patients. And the way we did that was logistic regression on a different data set, trying to classify patients alive at five years following diagnosis versus patients deceased at five years following diagnosis. We build the model. We fix the model. Then we apply it to this data set of about 250 cases. And then we just ask: did we actually create two different groups of patients whose survival distributions are significantly different? So what this p-value is telling you is the probability that these two curves come from the same underlying distribution, that there's no difference between these two curves across all of the time points.
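To make the Kaplan-Meier discussion concrete, here is a minimal sketch using the lifelines library and synthetic data: plot survival curves for model-assigned high- and low-risk groups, then run a log-rank test. The data, group sizes, and variable names are illustrative, not from the study being discussed.

```python
# Minimal sketch: Kaplan-Meier curves for two risk groups plus a
# log-rank test, as in the plots described above. Data are synthetic.
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
n = 250  # roughly the size of the validation set mentioned above

# Hypothetical inputs: a model-assigned risk group, follow-up time in
# years, and an event indicator (True = died, False = censored).
high_risk = rng.integers(0, 2, size=n).astype(bool)
time = rng.exponential(scale=np.where(high_risk, 8.0, 15.0))
event = rng.random(n) < 0.7  # roughly 30% of patients censored

ax = plt.subplot(111)
for label, mask in [("low risk", ~high_risk), ("high risk", high_risk)]:
    kmf = KaplanMeierFitter()
    kmf.fit(time[mask], event_observed=event[mask], label=label)
    kmf.plot_survival_function(ax=ax)  # empirical survival over time

# Log-rank test: probability the two curves come from one underlying
# survival distribution across all time points.
result = logrank_test(time[high_risk], time[~high_risk],
                      event_observed_A=event[high_risk],
                      event_observed_B=event[~high_risk])
print("log-rank p-value:", result.p_value)
```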
And what we see here is there does seem to be a difference between the black line and the red line, where at, say, 10 years, the probability of survival is about 80% in the low-risk group and more like 60% in the high-risk group. And overall, the p-value is very small for there being a difference between those two curves. So that's what a successful Kaplan-Meier plot would look like if you're trying to create a model that separates patients into groups with different survival distributions. And it's always important for these types of things to try them on multiple data sets. And here we show the same model applied to a different data set, where it showed pretty similar overall effectiveness at stratifying patients into two groups.

So why do you think doing this might be useful? I guess, yeah, anyone? Because I think this type of curve is often confused with one that actually is extremely useful, which I would say-- yeah?

PROFESSOR: Why don't you wait?

ANDY BECK: Sure.

PROFESSOR: Don't be shy. You can call on them.

ANDY BECK: All right.

AUDIENCE: You could probably use this to identify when a patient is high-risk, and at five years, if the patient is high-risk, do a follow-up.

ANDY BECK: Right, exactly. Yeah, yeah. So that would be a great use.

PROFESSOR: Can you repeat the question for the recording?

ANDY BECK: So the point was that you might know someone's at a high risk of having an event prior to five years. An event is when the curve goes down. So definitely, the red group is at something like 40%, almost double the risk of the black group. So if you have certain interventions you can do to help prevent these things, such as giving an additional treatment or giving more frequent monitoring for recurrence.
Like, if you can do a follow-up scan in a month versus six months, you could make that decision in a data-driven way by knowing whether the patient's on the red curve or the black curve. So yeah, exactly right. It helps you make therapeutic decisions when there are a bunch of things you can do: either give more aggressive treatment or do more aggressive monitoring, depending on whether it's aggressive disease or non-aggressive disease.

The other type of curve that often gets confused with these, and that's quite useful, is one that directly tests an intervention. So essentially, you could do a trial of the usefulness, the clinical utility, of this algorithm. On one arm, you make the prediction on everyone and don't do anything differently. And on the other arm, you make the prediction on the patients and actually use it to make a decision, like more frequent treatment or more frequent intervention. And then you could draw a curve where, among the high-risk patients, the ones where we actually acted on the prediction are black, and the ones where we didn't are red. And then, if you do the experiment in the right way, you can make the inference that you're actually preventing death, by 50% say, if the intervention is what's causing the difference between black and red. Here, we're not doing anything with causality. We're just observing how patients do differently over time. But frequently, you see these as the key figure for a randomized controlled trial, where the only thing different between the groups of patients is the intervention. And that really lets you make a powerful inference that changes what care should be. With this one, you're just like, OK, maybe we should do something differently, but you're not really sure; it makes intuitive sense. But if you actually have something from a randomized clinical trial, or something else that allows you to infer causality, this is the most important figure.
And you can actually infer how many lives are being saved by doing something. But this one's not about intervention. It's just about observing how patients do over time.

So that was some of the work from eight years ago, and none of this has really changed in practice. Everyone is still using glass slides and microscopes in the clinic. Research is a totally different story. But still, 99% of the clinic is using these old-fashioned technologies: microscopes, from technology breakthroughs in the mid-1800s, and staining, from breakthroughs in the late 1800s. The H&E stain is the key stain. So aspects of pathology haven't moved forward at all, and this has pretty significant consequences.

And here are a couple of figures that really let you see the primary data for what a problem interobserver variability is in clinical practice. This is another really nice, empirical way of viewing raw data. There's a ground truth consensus of experts, who decided what all of these 70 or so cases were (not that experts always know the right answer). And for all of these 70, they called them all the category of atypia, which here is indicated in yellow. Then they took all of these 70 cases that the experts said were atypia and sent them to hundreds of pathologists across the country, and for each one, plotted the distribution of different diagnoses they received. And quite strikingly, and this was published in JAMA, a great journal, about four years ago now, they show this incredible distribution of different diagnoses for each case. This is really why you might want a computational approach: each column should be one big color, or maybe have a few outliers, but for almost any case, there's a significant proportion of people calling it normal, which is tan, atypical, which is yellow, and actually cancer, which is orange or red.
PROFESSOR: What does atypical mean?

ANDY BECK: Yeah, so atypical is this border area between totally normal and cancer, and it's actually the most important diagnosis, because for totally normal you do nothing, and for cancer there are well-described protocols for what to do. Atypia, they often overtreat. And that's sort of the bias in medicine: always assume the worst when a certain diagnosis comes back. So atypia has nuclear features of cancer but doesn't fully meet the criteria. You know, maybe you get seven of the ten criteria, or three of the five criteria. It has to do with nuclei looking a little bigger and a little weirder than expected, but not enough that the pathologist feels comfortable calling it cancer. And that's part of the reason the chart shows almost a coin flip: of the cases the experts called atypia, only 48% were agreed with in the community.

The other interesting thing the study showed was that intraobserver variability is just as big an issue as interobserver variability. So a person disagrees with themselves, after an eight-month washout period, pretty much as often as they disagree with others. So that's another reason why computational approaches would be valuable, and why this really is a problem. And this was in breast biopsies.

The same research group showed quite similar results, this time in the British Medical Journal, in skin biopsies, which is another super important area. Again, they have the same type of visualization of the data. They have five classes of severity of skin lesions, ranging from a totally normal benign nevus, like I'm sure many of us have on our skin, to a melanoma, which is a serious, malignant cancer that needs to be treated as soon as possible. And here, the white color is totally benign, and the darker blue color is melanoma. And again, they show lots of discordance, pretty much as bad as in the breast biopsies. And here again, the intraobserver variability with an eight-month washout period was about 33%.
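As a side note on how agreement figures like these are computed, here is a small sketch with made-up diagnoses: raw percent agreement, like the 48% and 33% figures above, plus Cohen's kappa, which additionally corrects for agreement expected by chance. The sklearn call is real; the labels and reads are invented.

```python
# Sketch of quantifying observer agreement; the two "reads" could be
# two pathologists, or one pathologist before and after a washout
# period. All diagnoses here are made up for illustration.
from sklearn.metrics import cohen_kappa_score

CLASSES = ["benign", "atypia", "dcis", "invasive"]

read_1 = ["benign", "atypia", "atypia", "dcis",
          "benign", "atypia", "dcis", "invasive"]
read_2 = ["benign", "benign", "atypia", "atypia",
          "benign", "dcis", "dcis", "invasive"]

# Raw percent agreement, like the concordance rates cited above.
agree = sum(a == b for a, b in zip(read_1, read_2))
print("percent agreement:", agree / len(read_1))

# Cohen's kappa corrects for the agreement expected by chance alone.
print("kappa:", cohen_kappa_score(read_1, read_2, labels=CLASSES))
```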
So people disagree with themselves one out of three times.

And these aren't outlier cases or one research group. The College of American Pathologists did a big summary of 116 studies and showed, overall, an 18.3% median discrepancy rate across all the studies, and a 6% major discrepancy rate, where a major clinical decision, like surgery versus no surgery, would be the wrong one. And those are in the ballpark of the previously published findings.

So there are a lot of reasons to be pessimistic, but one reason to be very optimistic is that one area where AI is not total hype (not the only area, but maybe one of two or three) is vision. Vision really started working well, I don't know if you've covered this in class, with deep convolutional neural nets in 2012. And then all the groups just kept getting incrementally better year over year. This is an old graph from 2015, and there's been a huge development of methods even since then, to where now I think we really understand the strengths and the weaknesses of these approaches. And pathology has a lot of the strengths: super well-defined, very focused questions. I think there are lots of failures whenever you try to do anything more general, but for the types of tasks where you know exactly what you're looking for and you can generate the training data, these systems can work really well.

So that's a lot of what we're focused on at PathAI: how do we extract the most information out of pathology images, really doing two things. One is understanding what's inside the images, and the second is using deep learning to directly infer patient-level phenotypes and outcomes from the images. And we use traditional machine learning models for certain things, particularly making inference at the patient level, where n is often very small.
But anything that's directly operating on the image is almost always some variant of deep convolutional neural nets, which really are the state of the art for image processing. A lot of what we think about at PathAI, and what I think is really important in this area of ML for medicine, is generating the right data set, then using things like deep learning to optimize all of the features in a data-driven way, and then really thinking about how to use the outputs of these models intelligently and validate them in a robust way, because there are many ways to be fooled by artefacts and other things.

So, not to belabor the points, but here is why these approaches are really valuable in this application. First, they allow you to exhaustively analyze slides. The reason a pathologist is making so many errors is that they're just kind of overwhelmed. I mean, there are two candidate reasons. One is that humans aren't good at interpreting visual patterns. Actually, I think that's not the real reason, because humans are pretty darn good at that. There are difficult things where we can disagree, but when people focus on small images, frequently they agree. But these images are enormous, and humans just don't have enough time to study carefully every cell on every slide. Whereas the computer, in a real way, can be forced to exhaustively analyze every cell on every slide, and that's just a huge difference.

Second, it's quantitative. This is one thing the computer is definitely better at. It can compute huge numerators and huge denominators and exactly compute proportions, whereas a person looking at a slide is really just eyeballing some percentage based on a very small amount of data.

Third, it's super efficient. This whole process is massively parallelizable, so you can almost do a slide as fast as you want, based on how much you're willing to spend on it.
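For a sense of what "some variant of deep convolutional neural nets" looks like in code, here is a generic patch-level tumor-versus-normal classifier built by fine-tuning a pretrained torchvision ResNet. This is a common baseline setup under assumed patch sizes and labels, not PathAI's actual architecture.

```python
# Sketch of a patch-level tumor-vs-normal classifier: a pretrained
# ResNet with its final layer replaced by a 2-class head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # tumor vs. normal

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(patches, labels):
    """One gradient step on a batch of patches of shape (N, 3, 256, 256)."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(patches), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for sampled patches.
print(train_step(torch.randn(8, 3, 256, 256), torch.randint(0, 2, (8,))))
```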
And it allows you not only to do all of these automation tasks exhaustively, quantitatively, and efficiently, but also to discover a lot of new insights from the data, which I think we did in a very early way back eight years ago, when we had human-extracted features and correlated those with outcome. But now you can really supervise the whole process with machine learning, of how you go from the components of an image to patient outcomes, and learn new biology that you didn't know going in.

And everyone's always like, well, are you just going to replace pathologists? And I really don't think this is, in any way, the future. In almost every field where automation is becoming very common, the demand for people who are experts in that area is increasing. Airplane pilots are one example I was just learning about today. They do a completely different job than they did 20 years ago, and now it's all about mission control of this big system, and understanding all the flight management systems, and understanding all the data they're getting. The job has not necessarily gotten simpler, but they're much more effective, and they're doing much different types of work. And I do think the pathologist is going to move from staring into a microscope, with a literally very myopic focus on very small things, to being more of a consultant to physicians: integrating lots of different types of data, doing the things that AI is really bad at, like a lot of reasoning about specific instances, and then providing that guidance to physicians. So I think the job will look a lot different, but we'll need diagnosticians in the future no less than we did in the past.

So one example, and I think we sent out a reading about this, is the idea that breast cancer metastasis is a good use case for machine learning. And this is just a patient example. So a primary mass is discovered. One of the big determinants of the prognosis from a primary tumor is: has it spread to the lymph nodes?
Because that's one of the first areas that tumors metastasize to. And the way to diagnose whether tumors have metastasized to the lymph nodes is to take a biopsy and then evaluate it for the presence of cancer where it shouldn't be. And this is a task that's very quantitative and very tedious.

So the International Symposium on Biomedical Imaging organized a challenge called the CAMELYON16 challenge, where they put together almost 300 training slides and about 130 test slides. And they asked a bunch of teams to build machine learning based systems to automate the evaluation of the test slides, both to diagnose whether each slide contained cancer or not, and to actually identify where in the slide the cancer was located.

And the big machine learning challenge here, why you can't just throw this into an off-the-shelf or on-the-web image classification tool, is that the images are so large that it's just not feasible to feed the whole image into any kind of neural net. They can be between 20,000 and 200,000 pixels on a side, so they have many millions of pixels.

And for that, we do this process where we start with a labeled data set, where there are these very large regions labeled either as normal or tumor. And then we build procedures, which is actually a key component of getting machine learning to work well, for sampling patches of images and putting those patches into the model. And this sampling procedure is incredibly important for controlling the behavior of the system, because you could sample in all different ways. You're never going to sample exhaustively, because there are far too many possible patches. So thinking about the right examples to show the system has an enormous effect on both the performance and the generalizability of the systems you're building. And some of the insights we learned were about how best to do the sampling. But once you have these samples, it's all data driven-- sure.
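Here is a rough sketch of the patch-sampling step just described: drawing fixed-size patches at random from regions of a whole-slide image annotated as tumor or normal. It assumes the openslide library and a hypothetical downsampled boolean mask per annotated region; the function and mask convention are illustrative, not from the actual pipeline.

```python
# Sketch of random patch sampling from labeled whole-slide regions.
import random
import numpy as np
import openslide

PATCH = 256  # assumed patch size in pixels at full resolution

def sample_patches(slide_path, region_mask, label, n_patches):
    """region_mask: boolean array at some downsampled resolution marking
    the annotated region; label: 0 = normal, 1 = tumor."""
    slide = openslide.OpenSlide(slide_path)
    scale = slide.dimensions[0] / region_mask.shape[1]
    ys, xs = np.nonzero(region_mask)  # pixels inside the annotation
    patches = []
    for _ in range(n_patches):
        i = random.randrange(len(xs))
        # Map mask coordinates back to level-0 pixel coordinates.
        x, y = int(xs[i] * scale), int(ys[i] * scale)
        img = slide.read_region((x, y), 0, (PATCH, PATCH)).convert("RGB")
        patches.append((np.asarray(img), label))
    return patches
```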
AUDIENCE: Can you talk more about the sampling strategy schemes?

ANDY BECK: Yeah, so from a high level, you want to go from random sampling, which is a reasonable thing to do, to more intelligent sampling, based on knowing what the computer needs to learn more about. So the first step is sort of simple: you can randomly sample. But then the second part is a little harder: figuring out what examples you want to enrich your training set for, to make the system perform even better. And there are different things you can optimize for there. So this whole sampling step actually being part of the machine learning procedure is quite useful. And you're not just going to sample once. You can iterate on this and keep providing different types of samples. So, for example, if you learn that it's missing certain types of errors, or it hasn't seen enough of certain types of examples in your training set, you can over-sample for that. There are many ways of getting at it. Or if you have a confusion matrix and you see it's failing on certain types, you can try to figure out why it's failing on those and alter the sampling procedure to enrich for that. You could even provide outputs to humans, who can point you to the areas where it's making mistakes, because often you don't have exhaustive labels. In this case, we actually did have exhaustively labeled slides, so it was somewhat easier. But you can see there's even a lot of heterogeneity within the different classes. So you might do some clever tricks to figure out what are the types of the red class that it's getting wrong, and how you're going to fix that by providing it more examples. So I think that's one of the easier things to control.
Rather than trying to tune other parameters within these super complicated networks, in our experience, just playing with the sampling piece of the training works well. It should almost be thought of as another parameter to optimize when you're dealing with a problem where you have humongous slides and you can't use all the training data.

AUDIENCE: So decades ago, I met some pathologists who were looking at cervical cancer screening. And they thought that you could detect a gradient in the degree of atypia. And so not at training time but at testing time, what they were trying to do was to follow that gradient in order to find the most atypical part of the image. Is that still believed to be true?

ANDY BECK: Yeah. That it's a continuum? Yeah, definitely.

PROFESSOR: You mean within a sample, in the slides.

ANDY BECK: Yeah, you mean a continuum of aggressiveness. Yeah, I think it is a continuum. This is more of a binary task, but there are going to be continuums of grade within the cancer. That's another level to add on. If we wanted to correlate this with outcome, it would definitely be valuable to do that: to not just quantitate the bulk of tumor, but to estimate the malignancy of every individual nucleus, which we can do also. So you can actually classify not just the tumor region; you can classify individual cells, and you can classify them based on malignancy. And then you can get the gradient within a population. In this study, it was region-based, not cell-based, but you can definitely do that, and definitely it's a spectrum. It's kind of like the atypia idea. Everything in biology is pretty much on a spectrum, from normal to atypical to low-grade cancer, medium-grade cancer, high-grade cancer, and these sorts of methods do allow you to much more precisely estimate where you are on that continuum.
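Returning to the sampling discussion above: one simple way to treat sampling as a tunable parameter is to re-weight classes by their current error rates from a validation confusion matrix, so the next round of patch sampling is enriched for what the model gets wrong. The weighting rule below is one illustrative choice among many, not the method actually used.

```python
# Sketch of error-driven sampling: up-weight classes the model is
# currently failing on, as read off a confusion matrix.
import numpy as np

def class_sampling_weights(confusion, smoothing=0.05):
    """confusion[i, j]: count of class-i examples predicted as class j.
    Returns one sampling weight per class, proportional to error rate."""
    totals = confusion.sum(axis=1)
    correct = np.diag(confusion)
    error_rate = 1.0 - correct / np.maximum(totals, 1)
    weights = error_rate + smoothing  # keep easy classes in the mix
    return weights / weights.sum()

# Example: class 2 (the "red class") is mostly confused with class 0,
# so it ends up sampled most heavily in the next training round.
conf = np.array([[90,  5,  5],
                 [ 4, 94,  2],
                 [30, 10, 60]])
print(class_sampling_weights(conf))
```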
724 00:28:38,690 --> 00:28:41,350 And that's the basic approach. 725 00:28:41,350 --> 00:28:42,910 We get the big whole site images. 726 00:28:42,910 --> 00:28:44,667 We figure out how to sample patches 727 00:28:44,667 --> 00:28:46,750 from the different regions to optimize performance 728 00:28:46,750 --> 00:28:48,370 of the model during training time. 729 00:28:48,370 --> 00:28:50,153 And then during testing time, just we 730 00:28:50,153 --> 00:28:51,570 take a whole big whole site image. 731 00:28:51,570 --> 00:28:53,680 We break it into millions of little patches. 732 00:28:53,680 --> 00:28:55,662 Send each patch individually. 733 00:28:55,662 --> 00:28:57,370 We don't actually-- you could potentially 734 00:28:57,370 --> 00:28:59,740 use spatial information about how close they 735 00:28:59,740 --> 00:29:01,240 are to each other, which would make 736 00:29:01,240 --> 00:29:02,992 the process less efficient. 737 00:29:02,992 --> 00:29:03,700 We don't do that. 738 00:29:03,700 --> 00:29:05,320 We just send them in individually 739 00:29:05,320 --> 00:29:08,440 and then visualize the output as a heat map. 740 00:29:10,960 --> 00:29:13,030 And this, I think, isn't in the reference 741 00:29:13,030 --> 00:29:15,700 I sent so the one I sent showed how 742 00:29:15,700 --> 00:29:19,750 you were able to combine the estimates of the deep learning 743 00:29:19,750 --> 00:29:22,420 system with the human pathologist's estimate 744 00:29:22,420 --> 00:29:25,870 to make the human pathologist's error rate go down by 85% 745 00:29:25,870 --> 00:29:28,147 and get to less than 1%. 746 00:29:28,147 --> 00:29:30,730 And the interesting thing about how these systems keep getting 747 00:29:30,730 --> 00:29:32,647 better over time and potentially they over-fit 748 00:29:32,647 --> 00:29:34,640 to the competition data set-- 749 00:29:34,640 --> 00:29:36,910 because I think we submitted, maybe, three times, 750 00:29:36,910 --> 00:29:38,020 which isn't that many. 751 00:29:38,020 --> 00:29:41,890 But over the course of six months after the first closing 752 00:29:41,890 --> 00:29:44,650 of the competition, people kept competing and making systems 753 00:29:44,650 --> 00:29:45,220 better. 754 00:29:45,220 --> 00:29:46,887 And actually, the fully automated system 755 00:29:46,887 --> 00:29:49,840 on this data set achieved an error rate of less than 1% 756 00:29:49,840 --> 00:29:53,260 by the final submission date, which was significantly better 757 00:29:53,260 --> 00:29:55,810 than both the pathologists in the competition, which 758 00:29:55,810 --> 00:29:58,720 is the error rate, I believe, cited in the initial archive 759 00:29:58,720 --> 00:30:00,210 paper. 760 00:30:00,210 --> 00:30:01,960 And also, they took the same set of slides 761 00:30:01,960 --> 00:30:03,760 and sent them out to pathologists operating 762 00:30:03,760 --> 00:30:06,850 in clinical practice, where they had really significantly 763 00:30:06,850 --> 00:30:09,110 higher error rates, mainly due to the fact, 764 00:30:09,110 --> 00:30:11,650 they were more constrained by time limitations 765 00:30:11,650 --> 00:30:13,840 in clinical practice than in the competition. 766 00:30:13,840 --> 00:30:15,820 And most of the errors they are making are false negatives. 767 00:30:15,820 --> 00:30:17,528 Simply, they don't have the time to focus 768 00:30:17,528 --> 00:30:21,610 on small regions of metastasis amid these humongous giga 769 00:30:21,610 --> 00:30:24,432 pixel-size slides. 
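A minimal sketch of the test-time procedure just described: tile the whole-slide image into patches, score each patch independently with a trained model, and assemble the per-patch tumor probabilities into a heat map. Patches are scored one at a time here for clarity; a real system would batch and parallelize. The function name and tiling scheme are assumptions.

```python
# Sketch of whole-slide inference: per-patch scoring into a heat map.
import numpy as np
import openslide
import torch

PATCH = 256

def slide_heatmap(slide_path, model):
    slide = openslide.OpenSlide(slide_path)
    w, h = slide.dimensions
    heat = np.zeros((h // PATCH, w // PATCH), dtype=np.float32)
    model.eval()
    with torch.no_grad():
        for gy in range(heat.shape[0]):
            for gx in range(heat.shape[1]):
                img = slide.read_region((gx * PATCH, gy * PATCH), 0,
                                        (PATCH, PATCH)).convert("RGB")
                x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1)
                x = x.float().unsqueeze(0) / 255.0
                # Probability that this patch contains tumor.
                heat[gy, gx] = torch.softmax(model(x), dim=1)[0, 1].item()
    return heat  # render with e.g. matplotlib imshow and a red colormap
```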
770 00:30:24,432 --> 00:30:27,780 AUDIENCE: In the paper, you say you combined the machine 771 00:30:27,780 --> 00:30:29,870 learning outputs with the pathologists', 772 00:30:29,870 --> 00:30:31,410 but you don't really say how. 773 00:30:31,410 --> 00:30:33,790 Is it that they look at the heat maps, 774 00:30:33,790 --> 00:30:36,718 or is it just sort of combined? 775 00:30:36,718 --> 00:30:38,510 ANDY BECK: Yeah, no, it's a great question. 776 00:30:38,510 --> 00:30:41,510 So today, we do it that way. 777 00:30:41,510 --> 00:30:43,150 And that's the way in clinical practice 778 00:30:43,150 --> 00:30:45,700 we're building it, that the pathologist will look at both 779 00:30:45,700 --> 00:30:48,610 and then make a diagnosis incorporating both. 780 00:30:48,610 --> 00:30:51,040 For the competition, it was very simple, 781 00:30:51,040 --> 00:30:52,690 and the organizers actually did it. 782 00:30:52,690 --> 00:30:54,190 They interpreted them independently. 783 00:30:54,190 --> 00:30:56,273 So the pathologists just looked at all the slides. 784 00:30:56,273 --> 00:30:57,620 Our system made a prediction. 785 00:30:57,620 --> 00:31:00,040 The final score was literally the average of the two probabilities 786 00:31:00,040 --> 00:31:01,615 that the slide contained cancer. 787 00:31:01,615 --> 00:31:03,490 And then the AUC 788 00:31:03,490 --> 00:31:06,100 went to 99% from whatever it was-- 789 00:31:06,100 --> 00:31:08,840 92%-- by combining these two scores. 790 00:31:08,840 --> 00:31:10,840 AUDIENCE: I guess they make uncorrelated errors. 791 00:31:10,840 --> 00:31:11,632 ANDY BECK: Exactly. 792 00:31:11,632 --> 00:31:13,110 They're pretty much uncorrelated, 793 00:31:13,110 --> 00:31:14,860 particularly because the pathologists tend 794 00:31:14,860 --> 00:31:16,990 to make almost all false negatives, 795 00:31:16,990 --> 00:31:20,050 and the deep learning system tends 796 00:31:20,050 --> 00:31:22,090 to be fooled by a few things, like artefacts. 797 00:31:22,090 --> 00:31:24,190 So they do make uncorrelated errors, 798 00:31:24,190 --> 00:31:26,275 and that's why there's a huge bump in performance. 799 00:31:31,230 --> 00:31:33,180 So I kind of made a reference to this, 800 00:31:33,180 --> 00:31:35,280 but any of these competition data sets 801 00:31:35,280 --> 00:31:38,335 are relatively easy to get really good at. 802 00:31:38,335 --> 00:31:39,960 People have shown that you can actually 803 00:31:39,960 --> 00:31:42,757 build models that just predict which data set an image came from. 804 00:31:42,757 --> 00:31:44,340 Like, deep learning is almost too good 805 00:31:44,340 --> 00:31:48,217 at finding certain patterns and can find artefacts. 806 00:31:48,217 --> 00:31:49,800 So it's just a caveat to keep in mind. 807 00:31:49,800 --> 00:31:55,230 We're doing experiments on lots of real-world testing 808 00:31:55,230 --> 00:31:57,017 of methods like this across many labs 809 00:31:57,017 --> 00:31:59,100 with many different staining procedures and tissue 810 00:31:59,100 --> 00:32:00,990 preparation procedures, et cetera, 811 00:32:00,990 --> 00:32:02,460 to evaluate the robustness. 812 00:32:02,460 --> 00:32:05,250 But that's why competition results, even on ImageNet, always 813 00:32:05,250 --> 00:32:09,890 need to be taken with a grain of salt. 814 00:32:09,890 --> 00:32:12,073 But we sort of think the value add 815 00:32:12,073 --> 00:32:13,240 of this is going to be huge.
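A toy simulation, with entirely synthetic numbers that only loosely echo the figures quoted above, of why averaging two scorers with roughly uncorrelated error modes lifts the AUC: the simulated "pathologist" misses small metastases (false negatives), while the simulated "model" is occasionally fooled by artefacts on normal slides.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)                    # 1 = slide contains cancer

# Pathologist: accurate, except small metastases are simply missed
path_score = y.astype(float)
small_met = (y == 1) & (rng.random(n) < 0.25)     # a quarter of positives are tiny
path_score[small_met] = 0.0                       # false negatives
path_score += rng.normal(0, 0.05, n)

# Model: noisier overall, plus a few artefact-driven false positives
model_score = y + rng.normal(0, 0.3, n)
artefact = (y == 0) & (rng.random(n) < 0.05)
model_score[artefact] = 1.0

for name, s in [("pathologist", path_score),
                ("model", model_score),
                ("average", (path_score + model_score) / 2)]:
    print(f"{name:>12}: AUC {roc_auc_score(y, s):.3f}")
```

Because the two error modes rarely hit the same slide, the simple average outscores either input, which is the same mechanism behind the 92%-to-99% jump described in the talk.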
816 00:32:13,240 --> 00:32:15,290 I mean, it's hard to tell because it's such a big image, 817 00:32:15,290 --> 00:32:16,420 but this is what a pathologist today is 818 00:32:16,420 --> 00:32:18,112 looking at under a microscope, and it's 819 00:32:18,112 --> 00:32:19,195 very hard to see anything. 820 00:32:19,195 --> 00:32:22,540 But with a very simple visualization-- just 821 00:32:22,540 --> 00:32:25,450 the output of the AI system shown as red where cancer looks like it 822 00:32:25,450 --> 00:32:26,350 is-- 823 00:32:26,350 --> 00:32:28,720 you get a sort of great map of the areas 824 00:32:28,720 --> 00:32:30,580 they need to be sure to focus on. 825 00:32:30,580 --> 00:32:32,860 And this is real data from this example, where 826 00:32:32,860 --> 00:32:35,800 this bright red area, in fact, contains this tiny little rim 827 00:32:35,800 --> 00:32:37,960 of metastatic breast cancer cells 828 00:32:37,960 --> 00:32:41,020 that would be very easy to miss without that assistant sort 829 00:32:41,020 --> 00:32:43,210 of just pointing you to the right place to look, 830 00:32:43,210 --> 00:32:45,670 because it's a tiny set of 20 cells 831 00:32:45,670 --> 00:32:48,622 amid a big sea of all these normal lymphocytes. 832 00:32:48,622 --> 00:32:50,080 And here's another one that, again, 833 00:32:50,080 --> 00:32:51,790 you can now see from low power. 834 00:32:51,790 --> 00:32:53,498 It's like a satellite image or something, 835 00:32:53,498 --> 00:32:56,420 where you can focus immediately on this little red area, that, 836 00:32:56,420 --> 00:32:58,960 again, is a tiny pocket of 10 cancer cells 837 00:32:58,960 --> 00:33:01,780 amid hundreds of thousands of normal cells that are now 838 00:33:01,780 --> 00:33:05,750 visible from low power. 839 00:33:05,750 --> 00:33:10,010 So this is one application we're working on, 840 00:33:10,010 --> 00:33:13,490 where the clinical use case will be-- 841 00:33:13,490 --> 00:33:15,770 today, people are just sort of looking at images 842 00:33:15,770 --> 00:33:17,900 without the assistance of any machine learning. 843 00:33:17,900 --> 00:33:20,450 And they just have to kind of pick a number of patches 844 00:33:20,450 --> 00:33:22,265 to focus on with no guidance. 845 00:33:22,265 --> 00:33:24,140 So sometimes they focus on the right patches, 846 00:33:24,140 --> 00:33:26,510 sometimes they don't, but clearly they don't have time 847 00:33:26,510 --> 00:33:29,030 to look at all of this at high magnification, 848 00:33:29,030 --> 00:33:30,970 because that would take an entire day 849 00:33:30,970 --> 00:33:33,020 if you were trying to look at the whole image 850 00:33:33,020 --> 00:33:33,890 at 40X magnification. 851 00:33:33,890 --> 00:33:35,810 So they sort of use their intuition to focus. 852 00:33:35,810 --> 00:33:37,185 And for that reason, they end up, 853 00:33:37,185 --> 00:33:39,878 as we've seen, making a significant number of mistakes. 854 00:33:39,878 --> 00:33:41,420 It's not reproducible, because people 855 00:33:41,420 --> 00:33:43,180 focus on different aspects of the image, 856 00:33:43,180 --> 00:33:44,450 and it's pretty slow. 857 00:33:44,450 --> 00:33:46,240 And they're faced with this empty report. 858 00:33:46,240 --> 00:33:47,810 So they have to actually summarize everything 859 00:33:47,810 --> 00:33:49,070 they've looked at in a report. 860 00:33:49,070 --> 00:33:50,240 Like, what's the diagnosis? 861 00:33:50,240 --> 00:33:51,546 What's the size?
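A minimal sketch of that kind of red overlay, assuming you already have a low-power slide thumbnail and a patch-level probability heat map (both hypothetical inputs here, such as the output of the inference sketch above): low-probability regions are masked out so only the hotspots draw the eye.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_overlay(thumb, heat, threshold=0.5):
    """thumb: (H, W, 3) RGB thumbnail; heat: (h, w) probabilities in [0, 1]."""
    plt.imshow(thumb)
    # Hide everything below the threshold so only suspicious areas show red
    masked = np.ma.masked_less(heat, threshold)
    plt.imshow(masked, cmap="Reds", alpha=0.6, vmin=0, vmax=1,
               extent=(0, thumb.shape[1], thumb.shape[0], 0))
    plt.axis("off")
    plt.show()
```

The extent argument stretches the coarse heat map over the full thumbnail, which is what makes a 20-cell focus visible from "satellite" distance.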
862 00:33:51,546 --> 00:33:53,796 So let's say there's cancer here and cancer here-- they 863 00:33:53,796 --> 00:33:56,570 have to manually add the distances of the cancer 864 00:33:56,570 --> 00:33:57,900 in those two regions. 865 00:33:57,900 --> 00:34:01,580 And then they have to put this into a staging system that 866 00:34:01,580 --> 00:34:04,010 incorporates how many areas of metastasis there are 867 00:34:04,010 --> 00:34:05,050 and how big they are. 868 00:34:05,050 --> 00:34:07,217 And all of these things are pretty much automatable. 869 00:34:07,217 --> 00:34:08,675 And this is the kind of thing we're 870 00:34:08,675 --> 00:34:11,630 building, where the system will highlight where it sees cancer, 871 00:34:11,630 --> 00:34:13,489 tell the pathologist to focus there. 872 00:34:13,489 --> 00:34:15,679 And then based on the input of the AI system 873 00:34:15,679 --> 00:34:18,440 and the input of the pathologist, it can summarize all of that data-- 874 00:34:18,440 --> 00:34:21,620 quantitative as well as diagnostic 875 00:34:21,620 --> 00:34:23,389 as well as summary staging. 876 00:34:23,389 --> 00:34:25,190 The pathologist then takes this 877 00:34:25,190 --> 00:34:27,080 as their first version of the report; 878 00:34:27,080 --> 00:34:29,710 they can edit it, confirm it, sign it out. 879 00:34:29,710 --> 00:34:31,460 That data goes back into the system, which 880 00:34:31,460 --> 00:34:33,460 can be used for more training data in the future, 881 00:34:33,460 --> 00:34:34,850 and the case is signed out. 882 00:34:34,850 --> 00:34:38,239 So it's much faster, much more accurate, and standardized 883 00:34:38,239 --> 00:34:43,080 once this thing is fully developed, which it isn't yet. 884 00:34:43,080 --> 00:34:45,480 So this is a great application for AI, 885 00:34:45,480 --> 00:34:47,670 because you really do need-- 886 00:34:47,670 --> 00:34:49,245 you actually do have a ton of data, 887 00:34:49,245 --> 00:34:51,120 so you need to do an exhaustive analysis, and that 888 00:34:51,120 --> 00:34:54,025 has a lot of value. 889 00:34:54,025 --> 00:34:57,060 It's a task where the local image data in a patch, 890 00:34:57,060 --> 00:34:59,010 which is really what the current generation 891 00:34:59,010 --> 00:35:01,350 of deep CNNs is really good at, is enough. 892 00:35:01,350 --> 00:35:03,600 So we're looking at things at the cellular level. 893 00:35:03,600 --> 00:35:05,070 Radiology actually could be harder, 894 00:35:05,070 --> 00:35:07,320 because you often want to summarize over larger areas. 895 00:35:07,320 --> 00:35:10,800 Here, you really often have the salient information 896 00:35:10,800 --> 00:35:14,477 in local patches, at a scale that current ML systems handle well. 897 00:35:14,477 --> 00:35:16,560 And then we can interpret the output of the model. 898 00:35:16,560 --> 00:35:19,060 So even though the model itself is a black 899 00:35:19,060 --> 00:35:22,740 box, we can visualize the output on top of the image, 900 00:35:22,740 --> 00:35:24,690 which gives us an incredible advantage in terms 901 00:35:24,690 --> 00:35:27,210 of interpretability-- of what the models are doing well, 902 00:35:27,210 --> 00:35:29,010 what they're doing poorly on. 903 00:35:29,010 --> 00:35:31,320 And pathology is a specialty where 80% 904 00:35:31,320 --> 00:35:32,320 is not good enough. 905 00:35:32,320 --> 00:35:37,640 We want to get as close to 100% as possible. 906 00:35:37,640 --> 00:35:39,850 And that's one sort of diagnostic application.
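A minimal sketch of how those measurements could be automated from a thresholded heat map, with connected components standing in for metastatic foci. The microns-per-pixel value is a made-up placeholder for the scan resolution, and this is an illustration of the idea, not the staging logic any product actually ships.

```python
import numpy as np
from skimage.measure import label, regionprops

def metastasis_report(heat, threshold=0.5, microns_per_pixel=50.0):
    """Summarize foci from a (h, w) tumor-probability heat map."""
    # Connected components of the thresholded map ~ individual foci
    regions = regionprops(label(heat > threshold))
    # Greatest linear extent of each focus, converted to millimetres
    sizes_mm = sorted((r.major_axis_length * microns_per_pixel / 1000.0
                       for r in regions), reverse=True)
    return {"n_foci": len(sizes_mm),
            "largest_mm": sizes_mm[0] if sizes_mm else 0.0,
            "total_extent_mm": sum(sizes_mm)}
```

The counts and sizes returned here are exactly the quantities a staging system asks for, which is why this step is "pretty much automatable" once the heat map is trustworthy.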
907 00:35:39,850 --> 00:35:42,250 The last, or one of the last, examples I'm going to give 908 00:35:42,250 --> 00:35:44,542 has to do with precision immunotherapy, where we're not 909 00:35:44,542 --> 00:35:47,590 only trying to identify what the diagnosis is but 910 00:35:47,590 --> 00:35:51,207 actually trying to subtype patients to predict the right treatment. 911 00:35:51,207 --> 00:35:52,915 And as I mentioned earlier, immunotherapy 912 00:35:52,915 --> 00:35:56,273 is a really important and exciting, relatively new area 913 00:35:56,273 --> 00:35:57,940 of cancer therapy, which was another one 914 00:35:57,940 --> 00:35:59,770 of the big advances in 2012. 915 00:35:59,770 --> 00:36:02,230 Around the same time that deep learning came out, 916 00:36:02,230 --> 00:36:04,270 the first studies came out showing 917 00:36:04,270 --> 00:36:08,410 the effect of targeting a protein found mostly on tumor cells 918 00:36:08,410 --> 00:36:12,010 but also on immune cells-- the PD-1 or the PD-L1 protein-- 919 00:36:12,010 --> 00:36:13,720 whose job, when it's on, 920 00:36:13,720 --> 00:36:15,860 is to inhibit immune response. 921 00:36:15,860 --> 00:36:18,195 But in the setting of cancer, the inhibition 922 00:36:18,195 --> 00:36:20,320 of immune response is actually bad for the patient, 923 00:36:20,320 --> 00:36:22,540 because the immune system's job is to really try 924 00:36:22,540 --> 00:36:24,230 to fight off the cancer. 925 00:36:24,230 --> 00:36:26,610 So they realized a very simple therapeutic strategy-- 926 00:36:26,610 --> 00:36:30,310 just having an antibody that binds to this inhibitory signal-- 927 00:36:30,310 --> 00:36:32,650 can sort of unleash the patient's own immune system 928 00:36:32,650 --> 00:36:36,280 and really end up curing really serious advanced cancers. 929 00:36:36,280 --> 00:36:38,140 And that image on the top right sort of 930 00:36:38,140 --> 00:36:40,150 speaks to that, where this patient had 931 00:36:40,150 --> 00:36:43,030 a very large melanoma. 932 00:36:43,030 --> 00:36:45,310 And then they just got this antibody 933 00:36:45,310 --> 00:36:47,770 to sort of invigorate their immune system, 934 00:36:47,770 --> 00:36:50,200 and then the tumor really shrank. 935 00:36:50,200 --> 00:36:53,170 And one of the big biomarkers for assessing which patients 936 00:36:53,170 --> 00:36:55,210 will benefit from these therapies 937 00:36:55,210 --> 00:36:57,820 is the tumor cell or the immune cell expressing 938 00:36:57,820 --> 00:37:00,690 this drug target, PD-1 or PD-L1. 939 00:37:00,690 --> 00:37:02,440 And the one they test for is PD-L1, 940 00:37:02,440 --> 00:37:05,990 which is the ligand for the PD-1 receptor. 941 00:37:05,990 --> 00:37:07,900 So this is often the key piece of data 942 00:37:07,900 --> 00:37:09,650 used to decide who gets these therapies. 943 00:37:09,650 --> 00:37:12,460 And it turns out, pathologists are pretty bad at scoring this, 944 00:37:12,460 --> 00:37:14,377 not surprisingly, because it's very difficult, 945 00:37:14,377 --> 00:37:17,870 and there are millions of cells potentially per case. 946 00:37:17,870 --> 00:37:19,720 And studies show an interobserver agreement 947 00:37:19,720 --> 00:37:22,030 of 0.86 for scoring it on tumor cells, which 948 00:37:22,030 --> 00:37:25,360 isn't bad, but only 0.2 for scoring it on immune cells, which 949 00:37:25,360 --> 00:37:27,260 is super important. 950 00:37:27,260 --> 00:37:28,408 So this is a drug target.
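The talk doesn't name the statistic behind those agreement numbers; Cohen's kappa is one standard chance-corrected choice, so here is a synthetic two-reader simulation showing how per-reader noise maps onto kappas in roughly the ranges quoted. This is an assumption about the metric, purely for illustration.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 500)          # hypothetical PD-L1 pos/neg cases

def noisy_read(truth, flip_prob):
    """Simulate one reader who mislabels each case with some probability."""
    flips = rng.random(truth.size) < flip_prob
    return np.where(flips, 1 - truth, truth)

# Tumor-cell scoring: readers rarely disagree with the truth
print("tumor cells :", cohen_kappa_score(noisy_read(truth, 0.03),
                                         noisy_read(truth, 0.03)))
# Immune-cell scoring: much noisier reads, much lower agreement
print("immune cells:", cohen_kappa_score(noisy_read(truth, 0.30),
                                         noisy_read(truth, 0.30)))
```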
951 00:37:28,408 --> 00:37:30,700 We're trying to measure it to see which patients might get 952 00:37:30,700 --> 00:37:34,660 this life-saving therapy, but the diagnostic we have 953 00:37:34,660 --> 00:37:37,323 is super hard to interpret. 954 00:37:37,323 --> 00:37:38,740 And some studies, for this reason, 955 00:37:38,740 --> 00:37:41,500 have shown sort of mixed results about how valuable it is. 956 00:37:41,500 --> 00:37:43,810 In some cases, it appears valuable. 957 00:37:43,810 --> 00:37:46,300 In other cases, it appears it's not. 958 00:37:46,300 --> 00:37:48,670 So we wanted to see-- would this be a good example of where 959 00:37:48,670 --> 00:37:51,220 we can use machine learning? 960 00:37:51,220 --> 00:37:54,050 And this type of application 961 00:37:54,050 --> 00:37:55,750 is really hard, and we want to be 962 00:37:55,750 --> 00:37:58,090 able to apply it across not just one cancer but 20 963 00:37:58,090 --> 00:37:59,330 different cancers. 964 00:37:59,330 --> 00:38:02,080 So we built a system at PathAI for generating lots 965 00:38:02,080 --> 00:38:03,910 of training data at scale. 966 00:38:03,910 --> 00:38:06,400 And that's something that a competition just won't get you. 967 00:38:06,400 --> 00:38:09,550 That competition example had 300 slides. 968 00:38:09,550 --> 00:38:10,720 They do it once a year. 969 00:38:10,720 --> 00:38:13,118 But we want to be able to build these models every week 970 00:38:13,118 --> 00:38:13,660 or something. 971 00:38:13,660 --> 00:38:16,600 So now, we have something like 500 pathologists signed 972 00:38:16,600 --> 00:38:19,660 up in our system who can label lots of pathology data 973 00:38:19,660 --> 00:38:23,020 for us and really build these models quickly and at really 974 00:38:23,020 --> 00:38:23,590 high quality. 975 00:38:23,590 --> 00:38:26,380 So now we have something like over 2 and 1/2 million 976 00:38:26,380 --> 00:38:28,170 annotations in the system. 977 00:38:28,170 --> 00:38:30,670 And that allows us to build tissue region models. 978 00:38:30,670 --> 00:38:33,645 And this is immunohistochemistry in a cancer, where 979 00:38:33,645 --> 00:38:35,020 we've trained a model to identify 980 00:38:35,020 --> 00:38:37,270 all of the cancer epithelium in red and the cancer stroma 981 00:38:37,270 --> 00:38:38,420 in green. 982 00:38:38,420 --> 00:38:39,940 So now we know where the protein is 983 00:38:39,940 --> 00:38:43,690 being expressed-- in the epithelium or in the stroma. 984 00:38:43,690 --> 00:38:46,840 And then we've also trained cellular classification. 985 00:38:46,840 --> 00:38:49,700 So now, for every single cell, we classify it as a cell type. 986 00:38:49,700 --> 00:38:52,660 Is it a cancer cell or a fibroblast or a macrophage 987 00:38:52,660 --> 00:38:53,410 or a lymphocyte? 988 00:38:53,410 --> 00:38:55,120 And is it expressing the protein, 989 00:38:55,120 --> 00:38:56,613 based on how brown it is? 990 00:38:56,613 --> 00:38:58,780 So while pathologists will try to make some estimate 991 00:38:58,780 --> 00:39:01,363 across the whole slide, we can actually compute for every cell 992 00:39:01,363 --> 00:39:03,040 and then compute exact statistics 993 00:39:03,040 --> 00:39:05,080 about which cells are expressing this protein 994 00:39:05,080 --> 00:39:07,795 and which patients might be the best candidates for therapy.
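A minimal sketch of those exact per-cell statistics, assuming a per-cell table produced by the classification models; the field names here are illustrative, not an actual schema. Instead of eyeballing positivity across a slide, you aggregate over every classified cell.

```python
import pandas as pd

# Hypothetical per-cell output: one row per detected cell
cells = pd.DataFrame({
    "cell_type": ["tumor", "tumor", "lymphocyte", "macrophage", "tumor",
                  "lymphocyte", "fibroblast", "tumor"],
    "pdl1_positive": [1, 0, 1, 1, 1, 0, 0, 0],  # from the stain-intensity model
})

# Exact PD-L1 positivity rate per cell type, over the whole slide
summary = cells.groupby("cell_type")["pdl1_positive"].agg(["mean", "count"])
print(summary)
```

With millions of cells per case, the same groupby yields slide-level statistics a pathologist could never estimate reliably by eye, including the immune-cell scoring where human agreement is lowest.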
995 00:39:13,130 --> 00:39:19,370 And then the question is, can we identify additional things, 996 00:39:19,370 --> 00:39:22,160 beyond just PD-L1 protein expression, that are predictive 997 00:39:22,160 --> 00:39:23,780 of response to immunotherapy? 998 00:39:23,780 --> 00:39:26,420 And we've developed some machine learning approaches 999 00:39:26,420 --> 00:39:29,220 for doing that. 1000 00:39:29,220 --> 00:39:32,010 And part of it's doing things like quantitating 1001 00:39:32,010 --> 00:39:33,762 different cells and regions on H&E 1002 00:39:33,762 --> 00:39:35,220 images, which currently aren't used 1003 00:39:35,220 --> 00:39:36,720 at all in patient subtyping. 1004 00:39:36,720 --> 00:39:38,940 But we can do analyses to extract new features here 1005 00:39:38,940 --> 00:39:40,590 and to ask, even though nothing's 1006 00:39:40,590 --> 00:39:43,500 known about these images and immunotherapy response, 1007 00:39:43,500 --> 00:39:47,360 can we discover new features here? 1008 00:39:47,360 --> 00:39:48,990 And this is an example 1009 00:39:48,990 --> 00:39:51,360 of the types of features we can now routinely quantify 1010 00:39:51,360 --> 00:39:55,290 using deep learning on any case. 1011 00:39:55,290 --> 00:39:57,690 And this is pretty much every pathologic 1012 00:39:57,690 --> 00:39:59,610 characteristic you can imagine. 1013 00:39:59,610 --> 00:40:01,620 And then we correlate these with drug response 1014 00:40:01,620 --> 00:40:03,270 and can use this as a discovery tool 1015 00:40:03,270 --> 00:40:05,760 for identifying new aspects of pathology predictive 1016 00:40:05,760 --> 00:40:08,550 of which patients will respond best. 1017 00:40:08,550 --> 00:40:10,800 And then we can combine these features into models. 1018 00:40:10,800 --> 00:40:12,300 This is sort of a ridiculous example, 1019 00:40:12,300 --> 00:40:13,530 because the groups are so different, 1020 00:40:13,530 --> 00:40:16,320 and this is totally fake data, 1021 00:40:16,320 --> 00:40:19,300 but I think it gets to the point. 1022 00:40:19,300 --> 00:40:21,270 This would be one example of the output of the model. 1023 00:40:21,270 --> 00:40:23,730 Here, the color indicates the treatment, 1024 00:40:23,730 --> 00:40:25,800 where green would be the immunotherapy, 1025 00:40:25,800 --> 00:40:30,090 red would be the traditional therapy, 1026 00:40:30,090 --> 00:40:32,370 and the goal is to build a model to predict 1027 00:40:32,370 --> 00:40:34,412 which patients actually benefit from the therapy. 1028 00:40:34,412 --> 00:40:36,120 So this may be an easy question, but what 1029 00:40:36,120 --> 00:40:37,950 do you think, if the model's working, 1030 00:40:37,950 --> 00:40:39,867 would the title of the graph on the right 1031 00:40:39,867 --> 00:40:42,360 be versus the graph on the left, if these are 1032 00:40:42,360 --> 00:40:45,028 the ways of classifying patients with our model, 1033 00:40:45,028 --> 00:40:47,070 and the classifications are going to be responder 1034 00:40:47,070 --> 00:40:50,850 class or non-responder class? 1035 00:40:50,850 --> 00:40:52,740 And the color indicates the drug. 1036 00:40:56,450 --> 00:40:58,920 AUDIENCE: The drug works or it doesn't work. 1037 00:40:58,920 --> 00:41:02,442 ANDY BECK: That's right, but what's the output of the model? 1038 00:41:02,442 --> 00:41:03,150 But you're right. 1039 00:41:03,150 --> 00:41:05,100 The interpretation of these graphs is drug works, 1040 00:41:05,100 --> 00:41:05,850 drug doesn't work.
1041 00:41:05,850 --> 00:41:07,920 It's kind of a tricky question, right? 1042 00:41:07,920 --> 00:41:10,620 But what is our model trying to predict? 1043 00:41:10,620 --> 00:41:12,870 AUDIENCE: Whether the person is going to die or not? 1044 00:41:12,870 --> 00:41:14,940 It looks like likelihood of death 1045 00:41:14,940 --> 00:41:17,312 is just not as high on the right. 1046 00:41:17,312 --> 00:41:19,020 ANDY BECK: I think the overall likelihood 1047 00:41:19,020 --> 00:41:22,018 is the same on the two graphs, right versus left. 1048 00:41:22,018 --> 00:41:24,060 You don't know how many patients are in each arm. 1049 00:41:24,060 --> 00:41:25,060 But I think the one piece on it-- 1050 00:41:25,060 --> 00:41:26,610 so green is experimental treatment. 1051 00:41:26,610 --> 00:41:27,960 Red is conventional treatment. 1052 00:41:27,960 --> 00:41:29,070 Maybe I already said that. 1053 00:41:29,070 --> 00:41:31,957 So here-- and it's sort of a read-my-mind type question-- 1054 00:41:31,957 --> 00:41:33,540 the output of the model would 1055 00:41:33,540 --> 00:41:37,980 be that the right class of patients are responders to the drug, 1056 00:41:37,980 --> 00:41:39,480 and the left class of patients 1057 00:41:39,480 --> 00:41:41,478 are non-responders to the drug. 1058 00:41:41,478 --> 00:41:43,770 So you're not actually saying anything about prognosis, 1059 00:41:43,770 --> 00:41:46,590 but you're saying that I'm predicting 1060 00:41:46,590 --> 00:41:49,650 that if you're in the right population of patients, 1061 00:41:49,650 --> 00:41:52,020 you will benefit from the blue drug. 1062 00:41:52,020 --> 00:41:54,330 And then you actually see that in this right population 1063 00:41:54,330 --> 00:41:57,060 of patients, the blue drug does really well. 1064 00:41:57,060 --> 00:41:58,650 And then the red curve shows patients 1065 00:41:58,650 --> 00:42:01,067 who we predicted would benefit from the drug, 1066 00:42:01,067 --> 00:42:02,580 but because it's an experiment, we 1067 00:42:02,580 --> 00:42:03,913 didn't give them that drug. 1068 00:42:03,913 --> 00:42:05,690 And in fact, they did a whole lot worse. 1069 00:42:05,690 --> 00:42:07,440 Whereas the one on the left, we're saying 1070 00:42:07,440 --> 00:42:09,000 you don't benefit from the drug, and they truly 1071 00:42:09,000 --> 00:42:10,330 don't benefit from the drug. 1072 00:42:10,330 --> 00:42:12,330 So this is a way of using the output of a model 1073 00:42:12,330 --> 00:42:15,420 to predict drug response and then visualizing 1074 00:42:15,420 --> 00:42:16,620 whether it actually works. 1075 00:42:16,620 --> 00:42:17,995 And it's kind of like the example 1076 00:42:17,995 --> 00:42:21,720 I talked about before, but here's a real version of it. 1077 00:42:21,720 --> 00:42:24,095 And you can learn this directly using machine learning 1078 00:42:24,095 --> 00:42:26,220 to try to say, I want to find the patients who actually 1079 00:42:26,220 --> 00:42:27,480 benefit the most from a drug. 1080 00:42:33,660 --> 00:42:36,150 And then in terms of how we validate 1081 00:42:36,150 --> 00:42:37,192 that our models are correct-- 1082 00:42:37,192 --> 00:42:38,650 we have two different ways. 1083 00:42:38,650 --> 00:42:40,130 One is to do things like that. 1084 00:42:40,130 --> 00:42:42,630 So we build a model that says respond to the drug, 1085 00:42:42,630 --> 00:42:44,312 don't respond to the drug. 1086 00:42:44,312 --> 00:42:46,020 And then we plot the Kaplan-Meier curves.
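A minimal sketch, on synthetic data and using the `lifelines` package, of that validation plot: within each predicted class, compare Kaplan-Meier curves for the two randomized arms. If the model works, the arms separate only in the predicted-responder panel, which is exactly the pattern described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
n = 400
responder = rng.integers(0, 2, n).astype(bool)      # model's predicted class
experimental = rng.integers(0, 2, n).astype(bool)   # randomized treatment arm

# Simulate benefit only where the model predicts response
scale = np.where(responder & experimental, 40, 15)  # mean survival time
time = rng.exponential(scale)
event = np.ones(n, dtype=bool)                      # no censoring, for brevity

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, cls, title in [(axes[0], ~responder, "Predicted non-responder"),
                       (axes[1], responder, "Predicted responder")]:
    for arm, name in [(experimental, "experimental"),
                      (~experimental, "conventional")]:
        mask = cls & arm
        KaplanMeierFitter().fit(time[mask], event[mask], label=name).plot(ax=ax)
    ax.set_title(title)
plt.show()
```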
1087 00:42:46,020 --> 00:42:52,170 If it's image analysis stuff, we ask pathologists to hand-label 1088 00:42:52,170 --> 00:42:53,760 many cells, and we take the consensus 1089 00:42:53,760 --> 00:42:57,180 of pathologists as our ground truth and go from there. 1090 00:43:01,340 --> 00:43:03,215 AUDIENCE: The way you're presenting it, 1091 00:43:03,215 --> 00:43:05,390 it makes it sound like all the data comes 1092 00:43:05,390 --> 00:43:08,310 from the pathology images. 1093 00:43:08,310 --> 00:43:12,790 But in reality, people look at single nucleotide polymorphisms 1094 00:43:12,790 --> 00:43:19,410 or gene sequences or all kinds of clinical data as well. 1095 00:43:19,410 --> 00:43:21,855 So how do you get those? 1096 00:43:21,855 --> 00:43:24,230 ANDY BECK: Yeah, I mean, the beauty of the pathology data 1097 00:43:24,230 --> 00:43:25,887 is it's always available. 1098 00:43:25,887 --> 00:43:27,470 So that's why a lot of the stuff we do 1099 00:43:27,470 --> 00:43:31,550 is focused on that, because every clinical trial 1100 00:43:31,550 --> 00:43:34,955 patient has treatment data, outcome 1101 00:43:34,955 --> 00:43:36,080 data, and pathology images. 1102 00:43:36,080 --> 00:43:39,830 So we can really do this at scale pretty fast. 1103 00:43:39,830 --> 00:43:43,220 A lot of the other stuff, things like gene expression-- 1104 00:43:43,220 --> 00:43:45,840 many people are collecting it. 1105 00:43:45,840 --> 00:43:48,405 And it's important to compare these to baselines 1106 00:43:48,405 --> 00:43:49,280 or to integrate them. 1107 00:43:49,280 --> 00:43:52,160 I mean, two things-- one is to compare against it as a baseline. 1108 00:43:52,160 --> 00:43:55,220 What can we predict in terms of responder, non-responder using 1109 00:43:55,220 --> 00:43:58,700 just the pathology images versus using just gene expression 1110 00:43:58,700 --> 00:44:00,380 data versus combining them? 1111 00:44:00,380 --> 00:44:04,130 And that would just be increasing the input feature 1112 00:44:04,130 --> 00:44:04,630 space. 1113 00:44:04,630 --> 00:44:06,880 Part of the input feature space comes from the images. 1114 00:44:06,880 --> 00:44:08,780 Part of it comes from gene expression data. 1115 00:44:08,780 --> 00:44:10,363 Then you use machine learning to focus 1116 00:44:10,363 --> 00:44:12,170 on the most important characteristics 1117 00:44:12,170 --> 00:44:14,120 and predict outcome. 1118 00:44:14,120 --> 00:44:16,970 And the other is if you want to sort of prioritize: 1119 00:44:16,970 --> 00:44:18,895 use pathology as a baseline, because it's 1120 00:44:18,895 --> 00:44:20,100 available on everyone. 1121 00:44:20,100 --> 00:44:23,480 But then an adjunct test that costs another $1,000 1122 00:44:23,480 --> 00:44:25,640 and might take another two weeks-- how much does 1123 00:44:25,640 --> 00:44:28,140 that add to the prediction? 1124 00:44:28,140 --> 00:44:29,390 And that would be another way. 1125 00:44:29,390 --> 00:44:31,427 So I think it is important, but a lot 1126 00:44:31,427 --> 00:44:33,260 of our technology and platform development 1127 00:44:33,260 --> 00:44:35,540 is focused around how we most effectively use 1128 00:44:35,540 --> 00:44:38,220 pathology, and we can certainly add in gene expression data. 1129 00:44:38,220 --> 00:44:40,220 I'm actually going to talk about that next-- one 1130 00:44:40,220 --> 00:44:40,968 way of doing it. 1131 00:44:40,968 --> 00:44:43,010 Because it's a very natural synergy, because they 1132 00:44:43,010 --> 00:44:44,302 tell you very different things.
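A minimal sketch of that baseline comparison on synthetic data: fit the same simple classifier on image features alone, expression features alone, and the concatenated feature space, and compare cross-validated AUCs. The feature counts and signal structure here are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
img = rng.normal(size=(n, 50))      # quantitative pathology features
expr = rng.normal(size=(n, 100))    # gene expression features
# Response depends on one signal from each modality
y = (img[:, 0] + expr[:, 0] + rng.normal(0, 1, n) > 0).astype(int)

for name, X in [("images", img), ("expression", expr),
                ("combined", np.hstack([img, expr]))]:
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"{name:>10}: AUC {auc:.2f}")
```

The gap between the "combined" row and the single-modality rows is one way to quantify how much an add-on test contributes beyond the always-available pathology baseline.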
1133 00:44:47,250 --> 00:44:49,357 So here's one example of integrating, just kind 1134 00:44:49,357 --> 00:44:51,440 of relative to that question, gene expression data 1135 00:44:51,440 --> 00:44:54,320 with image data, using The Cancer Genome Atlas, 1136 00:44:54,320 --> 00:44:55,280 and this is all public. 1137 00:44:55,280 --> 00:44:58,787 So they have pathology images, RNA data, clinical outcomes. 1138 00:44:58,787 --> 00:45:00,620 They don't have the greatest treatment data, 1139 00:45:00,620 --> 00:45:02,495 but it's a great place for method development 1140 00:45:02,495 --> 00:45:06,110 for sort of ML in cancer, including 1141 00:45:06,110 --> 00:45:07,850 pathology-type analyses. 1142 00:45:07,850 --> 00:45:09,470 So this is a case of melanoma. 1143 00:45:09,470 --> 00:45:12,140 We've trained a model to identify cancer and stroma 1144 00:45:12,140 --> 00:45:13,950 and all the different cells. 1145 00:45:13,950 --> 00:45:16,980 And then we extract, as you saw, hundreds of features. 1146 00:45:16,980 --> 00:45:19,820 And then we can rank the features here 1147 00:45:19,820 --> 00:45:21,960 by their correlation with survival. 1148 00:45:21,960 --> 00:45:24,110 So now we're mapping from pathology images 1149 00:45:24,110 --> 00:45:27,890 to outcome data, and we find, just in a totally data-driven way, 1150 00:45:27,890 --> 00:45:31,228 that there's some small set of 15 or so features highly 1151 00:45:31,228 --> 00:45:32,270 associated with survival. 1152 00:45:32,270 --> 00:45:33,510 The rest aren't. 1153 00:45:33,510 --> 00:45:36,920 And the top-ranking one is an immune cell feature-- 1154 00:45:36,920 --> 00:45:38,630 increased area of stromal plasma cells 1155 00:45:38,630 --> 00:45:40,500 is associated with increased survival. 1156 00:45:40,500 --> 00:45:42,708 And this was an analysis that was really just linking 1157 00:45:42,708 --> 00:45:43,970 the images with outcome. 1158 00:45:43,970 --> 00:45:47,060 And then we can ask, well, what are the genes underlying 1159 00:45:47,060 --> 00:45:48,050 this pathology? 1160 00:45:48,050 --> 00:45:50,990 So pathology is telling you about cells and tissues. 1161 00:45:50,990 --> 00:45:53,570 RNA is telling you about the actual transcriptional 1162 00:45:53,570 --> 00:45:57,180 landscape of what's going on underneath. 1163 00:45:57,180 --> 00:45:59,283 And then we can rank all the genes in the genome 1164 00:45:59,283 --> 00:46:01,700 just by their correlation with this quantitative phenotype 1165 00:46:01,700 --> 00:46:02,990 we're measuring on the pathology images. 1166 00:46:02,990 --> 00:46:05,420 And here are all the genes, ranked from 0 to 20,000. 1167 00:46:05,420 --> 00:46:08,400 And again, we see a small set-- we're thresholding 1168 00:46:08,400 --> 00:46:11,450 at a correlation of 0.4-- strongly 1169 00:46:11,450 --> 00:46:14,720 associated with the pathologic phenotype we're measuring. 1170 00:46:14,720 --> 00:46:17,360 And then we sort of discover these sets 1171 00:46:17,360 --> 00:46:20,480 of genes that are known to be highly enriched in immune cell 1172 00:46:20,480 --> 00:46:21,230 genes.
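A minimal sketch of that ranking step on synthetic data: correlate each of 20,000 genes with a quantitative image-derived phenotype and keep the genes past the 0.4 threshold. The shared latent variable simulates a set of co-regulated "immune" genes that also drive the phenotype; everything is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20000
expr = rng.normal(size=(n_samples, n_genes))

# Simulate 15 co-regulated genes tracking a shared latent signal,
# which also drives the image phenotype (e.g., stromal plasma-cell area)
z = rng.normal(size=n_samples)
expr[:, :15] += z[:, None]
phenotype = z + rng.normal(0, 0.3, n_samples)

# Vectorized Pearson correlation of every gene against the phenotype
ex = expr - expr.mean(axis=0)
ph = phenotype - phenotype.mean()
r = ex.T @ ph / (np.sqrt((ex**2).sum(axis=0)) * np.sqrt((ph**2).sum()))

hits = np.flatnonzero(np.abs(r) > 0.4)
print(len(hits), "genes past |r| > 0.4:", hits[:20])
```

With 200 samples, a null gene almost never clears |r| > 0.4 by chance, so the genes that survive the threshold are essentially the ones genuinely coupled to the phenotype, which is the logic of the slide's 0.4 cutoff.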
1173 00:46:21,230 --> 00:46:23,635 Which is some form of validation 1174 00:46:23,635 --> 00:46:25,760 that we're measuring what we think we're measuring, 1175 00:46:25,760 --> 00:46:29,690 but also these sets of genes are potentially new drug targets, 1176 00:46:29,690 --> 00:46:32,388 new diagnostics, et cetera, that were uncovered 1177 00:46:32,388 --> 00:46:34,430 by going from clinical outcomes to pathology data 1178 00:46:34,430 --> 00:46:36,368 to the underlying RNA signature. 1179 00:46:39,440 --> 00:46:41,990 And then kind of the beauty of the approach we're working on 1180 00:46:41,990 --> 00:46:44,900 is it's super scalable, and in theory, you 1181 00:46:44,900 --> 00:46:47,000 could apply it to all of TCGA or other data sets 1182 00:46:47,000 --> 00:46:51,950 and apply it across cancer types and do things like 1183 00:46:51,950 --> 00:46:57,230 automatically find artefacts in all of the slides 1184 00:46:57,230 --> 00:46:59,720 and kind of do this in a broad way. 1185 00:46:59,720 --> 00:47:02,700 And then sort of the most interesting part, potentially, 1186 00:47:02,700 --> 00:47:04,453 is analyzing the outputs of the models 1187 00:47:04,453 --> 00:47:05,870 and how they correlate with things 1188 00:47:05,870 --> 00:47:09,930 like drug response or underlying molecular profiles. 1189 00:47:09,930 --> 00:47:11,930 And this is really the process we're working on-- 1190 00:47:11,930 --> 00:47:15,290 how do we go from images to new ways of measuring disease 1191 00:47:15,290 --> 00:47:16,792 pathology? 1192 00:47:16,792 --> 00:47:19,250 And kind of in summary, a lot of the technology development 1193 00:47:19,250 --> 00:47:21,080 that I think is most important today 1194 00:47:21,080 --> 00:47:22,700 for getting ML to work really well 1195 00:47:22,700 --> 00:47:25,490 in the real world for applications in medicine 1196 00:47:25,490 --> 00:47:28,370 is about being super thoughtful about building 1197 00:47:28,370 --> 00:47:29,900 the right training data set. 1198 00:47:29,900 --> 00:47:32,040 And how do you do that in a scalable way, and even 1199 00:47:32,040 --> 00:47:33,770 in a way that incorporates machine learning? 1200 00:47:33,770 --> 00:47:34,670 Which is kind of what I was talking about 1201 00:47:34,670 --> 00:47:36,410 before-- intelligently picking patches. 1202 00:47:36,410 --> 00:47:39,180 But that sort of concept applies everywhere. 1203 00:47:39,180 --> 00:47:41,360 So I think there's almost more room for innovation 1204 00:47:41,360 --> 00:47:44,210 on defining the training data set 1205 00:47:44,210 --> 00:47:46,675 than on the predictive modeling side, 1206 00:47:46,675 --> 00:47:48,050 and then putting the two together 1207 00:47:48,050 --> 00:47:50,378 is incredibly important. 1208 00:47:50,378 --> 00:47:51,920 And for the kind of work we're doing, 1209 00:47:51,920 --> 00:47:54,620 there are already such great advances in image processing. 1210 00:47:54,620 --> 00:47:57,230 A lot of it's about engineering and scalability, 1211 00:47:57,230 --> 00:47:59,220 as well as rigorous validation. 1212 00:47:59,220 --> 00:48:01,830 And then, how do we connect it with underlying molecular data 1213 00:48:01,830 --> 00:48:03,720 as well as clinical outcome data-- 1214 00:48:03,720 --> 00:48:08,445 versus trying to solve a lot of the core vision tasks, where 1215 00:48:08,445 --> 00:48:10,320 there's already just been incredible progress 1216 00:48:10,320 --> 00:48:11,900 over the past couple of years.
1217 00:48:11,900 --> 00:48:13,920 And in terms of our world, things 1218 00:48:13,920 --> 00:48:15,960 we think a lot about are not just the technology 1219 00:48:15,960 --> 00:48:17,490 and putting together our data sets but also, 1220 00:48:17,490 --> 00:48:18,850 how do we work with regulators? 1221 00:48:18,850 --> 00:48:20,547 How do we make strong business cases 1222 00:48:20,547 --> 00:48:22,380 for the partners we're working with to actually change 1223 00:48:22,380 --> 00:48:24,048 what they're doing to incorporate some 1224 00:48:24,048 --> 00:48:26,340 of these new approaches that will really bring benefits 1225 00:48:26,340 --> 00:48:31,340 to patients around quality and accuracy in their diagnosis? 1226 00:48:31,340 --> 00:48:32,280 So in summary-- 1227 00:48:32,280 --> 00:48:34,540 I know you have to go in four minutes-- 1228 00:48:34,540 --> 00:48:36,580 this has been a longstanding problem. 1229 00:48:36,580 --> 00:48:39,070 There's nothing new about trying to apply AI 1230 00:48:39,070 --> 00:48:41,500 to diagnostics or to vision tasks, 1231 00:48:41,500 --> 00:48:44,620 but there are some really big differences in the past five 1232 00:48:44,620 --> 00:48:46,600 years-- even in my short career, 1233 00:48:46,600 --> 00:48:49,480 I've seen a sea change in this field. 1234 00:48:49,480 --> 00:48:51,250 One is availability of digital data-- 1235 00:48:51,250 --> 00:48:53,830 it's now much cheaper to generate lots of images 1236 00:48:53,830 --> 00:48:55,450 at scale. 1237 00:48:55,450 --> 00:48:56,860 But even more important, I think, 1238 00:48:56,860 --> 00:48:59,620 are the last two. Access to large-scale computing 1239 00:48:59,620 --> 00:49:03,790 resources is a game-changer for anyone with access 1240 00:49:03,790 --> 00:49:06,790 to cloud computing or large computing resources. 1241 00:49:06,790 --> 00:49:09,220 We all have access to sort of arbitrary 1242 00:49:09,220 --> 00:49:11,800 amounts of compute today, and 10 years ago, that 1243 00:49:11,800 --> 00:49:13,755 was a huge limitation in this field. 1244 00:49:13,755 --> 00:49:15,880 As well as these really major algorithmic advances, 1245 00:49:15,880 --> 00:49:19,090 particularly deep CNNs for vision. 1246 00:49:19,090 --> 00:49:21,700 And, in general, AI works extremely well 1247 00:49:21,700 --> 00:49:25,180 when problems can be defined so you can get the right type of training 1248 00:49:25,180 --> 00:49:28,092 data, access large-scale computing, 1249 00:49:28,092 --> 00:49:30,550 and implement things like deep CNNs that work really 1250 00:49:30,550 --> 00:49:31,278 well. 1251 00:49:31,278 --> 00:49:33,070 And it sort of fails everywhere else, which 1252 00:49:33,070 --> 00:49:34,770 is probably 98% of things. 1253 00:49:34,770 --> 00:49:37,660 But if you can create a problem where the algorithms actually 1254 00:49:37,660 --> 00:49:40,960 work and you have lots of data to train on, 1255 00:49:40,960 --> 00:49:43,330 they can succeed really well. 1256 00:49:43,330 --> 00:49:46,420 And this sort of vision-based, AI-powered pathology 1257 00:49:46,420 --> 00:49:49,060 is broadly applicable across, really, all image-based tasks 1258 00:49:49,060 --> 00:49:49,660 in pathology. 1259 00:49:49,660 --> 00:49:51,243 It does enable integration with things 1260 00:49:51,243 --> 00:49:54,010 like omics data-- genomics, transcriptomics, 1261 00:49:54,010 --> 00:49:57,010 SNP data, et cetera.
1262 00:49:57,010 --> 00:49:59,500 And in the near future, we think this will be incorporated 1263 00:49:59,500 --> 00:50:00,520 into clinical practice. 1264 00:50:00,520 --> 00:50:02,470 And even today, it's really central to a lot 1265 00:50:02,470 --> 00:50:04,935 of research efforts. 1266 00:50:04,935 --> 00:50:06,310 And I just want to end on a quote 1267 00:50:06,310 --> 00:50:08,620 from 1987-- that, in the future, AI 1268 00:50:08,620 --> 00:50:12,070 can be expected to become a staple of pathology practice. 1269 00:50:12,070 --> 00:50:17,305 And I think we're much, much closer than 30 years ago. 1270 00:50:17,305 --> 00:50:18,930 And I want to thank everyone at PathAI, 1271 00:50:18,930 --> 00:50:20,500 as well as Hunter, who really helped put together 1272 00:50:20,500 --> 00:50:21,417 a lot of these slides. 1273 00:50:21,417 --> 00:50:23,140 And we do have lots of opportunities 1274 00:50:23,140 --> 00:50:25,970 for machine learning engineers, software engineers, 1275 00:50:25,970 --> 00:50:26,940 et cetera, at PathAI. 1276 00:50:26,940 --> 00:50:30,520 So certainly reach out if you're interested in learning more. 1277 00:50:30,520 --> 00:50:32,990 And I'm happy to take any questions, if we have time. 1278 00:50:32,990 --> 00:50:35,035 So thank you. 1279 00:50:35,035 --> 00:50:36,430 [APPLAUSE] 1280 00:50:40,150 --> 00:50:42,760 AUDIENCE: [INAUDIBLE] 1281 00:50:42,760 --> 00:50:46,640 I was wondering, how close is this to clinical practice? 1282 00:50:46,640 --> 00:50:48,180 Is there FDA approval, or-- 1283 00:50:48,180 --> 00:50:52,590 ANDY BECK: Yeah, so I mean, actual clinical practice, 1284 00:50:52,590 --> 00:50:57,890 probably 2020, like early, mid-2020. 1285 00:50:57,890 --> 00:51:01,450 But I mean, today, it's very active in clinical research, 1286 00:51:01,450 --> 00:51:03,900 so like clinical trials, et cetera, that do 1287 00:51:03,900 --> 00:51:07,740 involve patients, but it's in a much more well-defined setting. 1288 00:51:07,740 --> 00:51:09,770 But the first clinical use cases, 1289 00:51:09,770 --> 00:51:12,000 at least of the types of stuff we're building, 1290 00:51:12,000 --> 00:51:13,873 will be, I think, about a year from now. 1291 00:51:13,873 --> 00:51:15,540 And I think it will start small and then 1292 00:51:15,540 --> 00:51:16,680 get progressively bigger. 1293 00:51:16,680 --> 00:51:18,513 So I don't think everything is going to transform 1294 00:51:18,513 --> 00:51:20,345 in the clinic all at once, 1295 00:51:20,345 --> 00:51:21,720 but I do think we'll start seeing 1296 00:51:21,720 --> 00:51:23,497 the first applications out. 1297 00:51:23,497 --> 00:51:25,830 And some of them will go through the FDA, 1298 00:51:25,830 --> 00:51:27,830 and there'll be some laboratory-developed tests. 1299 00:51:27,830 --> 00:51:30,450 Ours will go through the FDA, but labs 1300 00:51:30,450 --> 00:51:33,870 can actually validate tools themselves. 1301 00:51:33,870 --> 00:51:35,472 And that's another path. 1302 00:51:35,472 --> 00:51:36,180 AUDIENCE: Thanks. 1303 00:51:36,180 --> 00:51:36,847 ANDY BECK: Sure. 1304 00:51:46,698 --> 00:51:51,880 PROFESSOR: So have you been using observational data sets?
1305 00:51:51,880 --> 00:51:56,200 You gave one example where you tried to use data 1306 00:51:56,200 --> 00:51:58,540 from a randomized controlled trial-- or in both examples, 1307 00:51:58,540 --> 00:52:00,373 you used different randomized controlled trials 1308 00:52:00,373 --> 00:52:03,820 evaluating the efficacy of each intervention. 1309 00:52:03,820 --> 00:52:05,560 The next major segment of this course, 1310 00:52:05,560 --> 00:52:08,335 starting in about two weeks, will be about causal inference 1311 00:52:08,335 --> 00:52:10,060 from observational data. 1312 00:52:10,060 --> 00:52:12,120 I'm wondering if that is something 1313 00:52:12,120 --> 00:52:14,300 PathAI has gotten into yet? 1314 00:52:14,300 --> 00:52:17,410 And if so, what have your findings been so far? 1315 00:52:17,410 --> 00:52:20,320 ANDY BECK: So we have focused a lot on randomized controlled 1316 00:52:20,320 --> 00:52:24,160 trial data and have developed methods 1317 00:52:24,160 --> 00:52:26,650 around that, which sort of simplifies the problem 1318 00:52:26,650 --> 00:52:30,640 and allows us to do, I think, pretty clever things around how 1319 00:52:30,640 --> 00:52:33,330 to generate those types of graphs I was showing, 1320 00:52:33,330 --> 00:52:38,620 where you truly can infer the treatment is having an effect. 1321 00:52:38,620 --> 00:52:39,910 And we've done far less there. 1322 00:52:39,910 --> 00:52:41,230 I'm super interested in that. 1323 00:52:41,230 --> 00:52:42,940 I'd say the advantages of RCTs are 1324 00:52:42,940 --> 00:52:45,580 people are already investing hugely in building these very 1325 00:52:45,580 --> 00:52:49,270 well-curated data sets that include images, 1326 00:52:49,270 --> 00:52:52,170 molecular data, when available, treatment, and outcome. 1327 00:52:52,170 --> 00:52:53,980 And that's just there, because they've 1328 00:52:53,980 --> 00:52:55,300 invested in the clinical trial. 1329 00:52:55,300 --> 00:52:57,130 They've invested in generating that data set. 1330 00:52:57,130 --> 00:52:59,110 To me, the big challenge in observational stuff-- 1331 00:52:59,110 --> 00:53:01,443 there are a few, but I'd be interested in what you guys are 1332 00:53:01,443 --> 00:53:04,120 doing and to learn about it-- is that getting 1333 00:53:04,120 --> 00:53:06,310 the data is not easy, right? 1334 00:53:06,310 --> 00:53:09,400 The outcome data is not-- 1335 00:53:09,400 --> 00:53:11,565 linking the pathology images with the outcome data 1336 00:53:11,565 --> 00:53:12,940 is, actually, in my opinion, 1337 00:53:12,940 --> 00:53:14,865 harder in an observational setting than in an RCT. 1338 00:53:14,865 --> 00:53:16,990 Because they're actually doing it and paying for it 1339 00:53:16,990 --> 00:53:18,790 and collecting it in RCTs. 1340 00:53:18,790 --> 00:53:21,270 No one's really done a very good job of-- 1341 00:53:21,270 --> 00:53:23,920 TCGA would be a good place to play around with, because that 1342 00:53:23,920 --> 00:53:26,320 is observational data. 1343 00:53:26,320 --> 00:53:27,700 And we also generally 1344 00:53:27,700 --> 00:53:29,860 want to focus on actionable decisions. 1345 00:53:29,860 --> 00:53:32,050 And an RCT is sort of perfectly set up for that. 1346 00:53:32,050 --> 00:53:35,378 Do I give drug X or not? 1347 00:53:35,378 --> 00:53:37,420 So I think if you put together the right data set 1348 00:53:37,420 --> 00:53:40,220 and somehow make the results actionable, 1349 00:53:40,220 --> 00:53:41,770 it could be really, really useful, 1350 00:53:41,770 --> 00:53:43,062 because there is a lot of data.
1351 00:53:43,062 --> 00:53:45,093 But I think just collecting the outcomes 1352 00:53:45,093 --> 00:53:47,260 and linking them with images is actually quite hard. 1353 00:53:47,260 --> 00:53:49,690 And ironically, I think it's harder for observational data 1354 00:53:49,690 --> 00:53:52,600 than for randomized controlled trials, where they're already 1355 00:53:52,600 --> 00:53:53,290 collecting it. 1356 00:53:53,290 --> 00:53:55,248 I guess one example would be the Nurses' Health 1357 00:53:55,248 --> 00:53:58,600 Study or these big epidemiology cohorts, potentially. 1358 00:53:58,600 --> 00:54:00,765 They are collecting that data and organizing it. 1359 00:54:00,765 --> 00:54:02,140 But what were you thinking about? 1360 00:54:02,140 --> 00:54:03,220 Do you have anything with pathology 1361 00:54:03,220 --> 00:54:05,448 in mind for causal inference from observational data? 1362 00:54:05,448 --> 00:54:06,990 PROFESSOR: Well, I think the examples 1363 00:54:06,990 --> 00:54:11,000 you gave, like the Nurses' Health Study or the Framingham study, 1364 00:54:11,000 --> 00:54:13,510 where you're tracking patients across time. 1365 00:54:13,510 --> 00:54:16,700 They're getting different interventions across time. 1366 00:54:16,700 --> 00:54:19,370 And because of the way the study was designed, in fact, 1367 00:54:19,370 --> 00:54:22,040 there are even good outcome data for patients across time. 1368 00:54:22,040 --> 00:54:23,415 So the collection problem you mention 1369 00:54:23,415 --> 00:54:24,960 doesn't happen there. 1370 00:54:24,960 --> 00:54:27,910 But then suppose you were to take it from a biobank 1371 00:54:27,910 --> 00:54:28,860 and do pathology? 1372 00:54:28,860 --> 00:54:31,450 You're now getting the samples. 1373 00:54:31,450 --> 00:54:33,442 Then, you can ask about, well, what 1374 00:54:33,442 --> 00:54:35,650 is the effect of different interventions or treatment 1375 00:54:35,650 --> 00:54:37,740 plans on outcomes? 1376 00:54:37,740 --> 00:54:39,610 The challenge, of course, in drawing inferences 1377 00:54:39,610 --> 00:54:41,180 there is that there was bias in terms 1378 00:54:41,180 --> 00:54:43,153 of who got which treatments. 1379 00:54:43,153 --> 00:54:45,445 That's where the techniques that we talk about in class 1380 00:54:45,445 --> 00:54:48,612 would become very important. 1381 00:54:48,612 --> 00:54:51,070 I'll just say, I appreciate the challenges that you mentioned. 1382 00:54:51,070 --> 00:54:52,903 ANDY BECK: I think it's incredibly powerful. 1383 00:54:52,903 --> 00:54:55,510 I think the other issue I just think about is that treatments 1384 00:54:55,510 --> 00:54:57,040 change so quickly over time. 1385 00:54:57,040 --> 00:54:59,248 So you don't want to be overfitting to the past. 1386 00:55:01,145 --> 00:55:02,770 But I think there are certain cases where 1387 00:55:02,770 --> 00:55:04,930 the therapeutic decisions today are similar to what 1388 00:55:04,930 --> 00:55:05,847 they were in the past. 1389 00:55:05,847 --> 00:55:08,620 There are other areas, like immuno-oncology, where there's 1390 00:55:08,620 --> 00:55:10,570 just no history to learn from. 1391 00:55:10,570 --> 00:55:12,350 So I think it depends on the-- 1392 00:55:12,350 --> 00:55:14,850 PROFESSOR: All right, then with that, let's thank Andy Beck. 1393 00:55:14,850 --> 00:55:15,350 [APPLAUSE] 1394 00:55:15,350 --> 00:55:16,700 ANDY BECK: Thank you.