PETER SZOLOVITS: OK, so a little over a year ago, I got a call from this committee. NASEM is the National Academies of Sciences, Engineering, and Medicine. So this is an august body of old people with lots of gray hair who have done something important enough to get elected to these academies. And their research arm is called the National Research Council and has a bunch of different committees. One of them is this Committee on Science, Technology, and the Law. It's a very interesting committee. It's chaired by David Baltimore, who used to be an MIT professor until he went and became president of Caltech. He also happens to have a Nobel Prize in his pocket, so he's a pretty famous guy. And Judge David Tatel is a member of the US Court of Appeals for the District of Columbia Circuit, which is probably the most important circuit court. It's one level below the Supreme Court. And he happens to sit in the seat that Ruth Bader Ginsburg occupied before she was elevated to the Supreme Court from that Court of Appeals, so this is a pretty big deal. These are heavy hitters.

And they convened a meeting to talk about the set of topics that I've listed here: blockchain and distributed trust; artificial intelligence and decision making, which is obviously the part that I got invited to talk about; privacy and informed consent in an era of big data; science curricula for law schools; emerging issues in science, technology, and the law; the issue of using litigation to target scientists who have opinions that you don't like; and the more general issue of how you communicate advances in the life sciences to a skeptical public. So this is dealing with the sort of anti-science tenor of the times.

So the group of us that talked about AI and decision making--I was a little bit surprised by the focus, because Hank Greely is really a law school professor at Stanford who's done a lot of work on fairness and prejudice in health care.
Cherise Fanno Burdeen is at something called the Pretrial Justice Institute, and her issue is a legal one, which is that there are now a lot of companies that have software that predicts, if you get bail while you're awaiting trial, whether you are likely to skip bail or not. And so this is influential in the decision that judges make about how much bail to impose and whether to let you out on bail at all or to keep you in prison awaiting your trial. Matt Lungren is a radiology professor at Stanford and has done some of the really cool work on building convolutional neural network models to detect pulmonary emboli and various other things in imaging data. You know the next guy, and Suresh Venkatasubramanian is a professor, originally a theorist, at the University of Utah, but he has also gotten into thinking a lot about privacy and fairness.

And so that was our panel, and we each gave a brief talk and then had a very interesting discussion. One of the things that I was very surprised by is that somebody raised the question of, shouldn't Tatel, as a judge on the Circuit Court of Appeals, hire people like you guys to be clerks in his court? So people like you guys who also happen to have gone to law school, and there are now a number of people who are trained in computational methods and machine learning but who also have the legal background. And he said something very interesting to me. He said no, he wouldn't want people like that, which kind of shocked me. And so we quizzed him a little bit on why, and he said, well, because he views the role of the judge not as being an expert but as being a judge, a balancer of arguments on both sides of an issue. And he was afraid that if he had a clerk who had a strong technical background, that person would have strong technical opinions, which would bias his decision one way or another.

So this reminded me--my wife was a lawyer, and I remember, when she was in law school, she would tell me about the classes that she was taking.
And it became obvious that studying law was learning how to win, not learning how to find the truth. And there's this philosophical notion in the law that says that the truth will come out from spirited argument on the two sides of a question, but your duty as a lawyer is to argue as hard as you can for your side of the argument. And in fact, in law school, they teach you, as in debate, that you should be able to take either side of any case and be able to make a cogent argument for it. And so Tatel sort of reinforced that notion in what he said, which I thought was interesting.

Well, just to talk a little bit about the justice area, because this is the one that has gotten the most public attention: governments use decision automation for determining eligibility for various kinds of services, evaluating where to deploy health inspectors and law enforcement personnel, and defining the boundaries of voting districts. So all of the gerrymandering discussion that you hear about is about using computers, and actually machine learning techniques, where the objective function is to get Republicans or Democrats elected, depending on who's in charge of the redistricting. And then you tailor these gerrymandered districts in order to maximize the probability that you're going to have the majority in whatever congressional races or state legislative races.

So in the law, people are in favor of these ideas to the extent that they inject clarity and precision into bail, parole, and sentencing decisions. Algorithmic technologies may minimize harms that are the products of human judgment. We know that people are in fact prejudiced, and so there are prejudices by judges and by juries that play into the decisions made in the legal system. So by formalizing it, you might gain something. However, conversely, the use of technology to determine who is deprived of liberty and on what terms raises significant concerns about transparency and interpretability.
So next week, we're going to talk some about transparency and interpretability, but today's topic is really fairness.

So here is an article from October of last year--no, September of last year--saying that as of October of this year, if you get arrested in California, the decision of whether you get bail or not is going to be made by a computer algorithm, not by a human being. Now, it's not 100%. There is some discretion on the part of a county official, who will make a recommendation, and the judge ultimately decides, but I suspect that until there are some egregious outcomes from doing this, it will probably be quite commonly used.

Now, the critique of these bail algorithms is based on a number of different factors. One is that the algorithms reflect a severe racial bias. So for example, if you take two otherwise identical people, but one of them happens to be white and one happens to be black, the chances of getting bail are much lower if you're black than if you're white. Now, you say, well, how could that be, given that we're learning this algorithmically? Well, it's a complicated feedback loop, because the algorithm is learning from historical data, and if historically judges have been less likely to grant bail to an African-American than to a Caucasian-American, then the algorithm will learn that that's the right thing to do and will nicely incorporate exactly that prejudice.

And then the second problem, which I consider to be really horrendous, is that in this particular field, the algorithms are developed by private companies which will not tell you what their algorithm is. You can pay them and they will tell you the answer, but they won't tell you how they compute it. They won't tell you what data they used to train the algorithm. And so it's really a black box, and you have no idea what's going on in that box other than by looking at its decisions.
And so the data collection system is flawed in the same way as the judicial system itself.

So not only are there algorithms that decide whether you get bail or not--which is, after all, a relatively temporary question until your trial comes up, although that may be a long time--but there are also algorithms that advise on things like sentencing. So they ask, how likely is this person to be a recidivist--somebody who, when they get out of jail, is going to offend again--and therefore to deserve a longer jail sentence, because you want to keep them off the streets.

Well, this is a particular story about a particular person in Wisconsin, and shockingly, the state Supreme Court ruled against this guy, saying that knowledge of the algorithm's output was a sufficient level of transparency to not violate his rights, which I think many people consider to be kind of an outrageous decision. I'm sure it'll be appealed and maybe overturned.

Conversely--I keep doing on the one hand and on the other--algorithms could help keep people out of jail. So there's a Wired article from not long ago that says we can use algorithms to analyze people's cases and say, oh, this person looks like they're really in need of psychiatric help rather than in need of jail time, and so perhaps we can divert them from the penal system into psychiatric care, keep them out of prison, and get them help, and so on. So that's the positive side of being able to use these kinds of algorithms.

Now, it's not only in criminal justice. There is also a long discussion--you can find this all over the web--of, for example, whether an algorithm can hire better than a human being. So if you're a big company and you have a lot of people that you're trying to hire for various jobs, it's very tempting to say, hey, I've made lots and lots of hiring decisions and we have some outcome data.
I know which people have turned out to be good employees and which people have turned out to be bad employees, and therefore we can base a first-cut screening method on learning such an algorithm and using it on people who apply for jobs, and say, OK, these are the ones that we're going to interview and maybe hire, because they look like they're a better bet.

Now, I have to tell you a personal story. When I was an undergraduate at Caltech, the Caltech faculty decided that they wanted to include student members on all the faculty committees. And so I was lucky enough that I served for three years as a member of the Undergraduate Admissions Committee at Caltech. In those days, Caltech only took about 220, 230 students a year. It's a very small school. And we would actually fly around the country and interview about the top half of all the applicants in the applicant pool. So we would talk not only to the students but also to their teachers and their counselors and see what the environment was like, and I think we got a very good sense of how good a student was likely to be based on that.

So one day, after the admissions decisions had been made, one of the professors, kind of as a thought experiment, said, here's what we ought to do. We ought to take the 230 people that we've just offered admission to, reject them all, and take the next 230 people, and then see whether the faculty notices. Because it seemed like a fairly flat distribution. Now, of course, I and others argued that this would be unfair and unethical and would be a waste of all the time that we had put into selecting these people, so we didn't do that. But then this guy went out and looked at the data we had on people's rank in class, SAT scores, grade point average, and the checkmarks on their recommendation letters about whether they were truly exceptional or merely outstanding.
And he built a linear regression model that predicted the person's sophomore-level grade point average, which seemed like a reasonable thing to try to predict. And he got a reasonably good fit, but what was disturbing about it is that in the Caltech population of students, it turned out that the beta for your SAT English performance was negative. So if you did particularly well in English on the SAT, you were likely to do worse as a sophomore at Caltech than if you didn't do as well. And so we thought about that a lot, and of course we decided that it would be really unfair to penalize somebody for being good at something, especially when the school had this philosophical orientation that said we ought to look for people with broad educations. So that's just an example. Also, Science Friday had a nice show that you can listen to about this issue.

So let me ask you, what do you mean by fairness? If we're going to define the concept, what is fair? What characteristics would you like an algorithm to have that judges you for some particular purpose? Yeah?

AUDIENCE: It's impossible to pin down one specific definition, at least in my opinion, but for the pretrial example, for instance, I think having the error rates be similar across populations, across the covariates you might care about, is a good start.

PETER SZOLOVITS: OK, so similar error rates is definitely one of the criteria that people use in talking about fairness. And you'll see later--Irene, where's Irene? Right there. Irene is a master of that notion of fairness. Yeah?

AUDIENCE: When the model makes some sort of observation that causally shouldn't be true, given what I want society to look like.

PETER SZOLOVITS: So I'm not sure how to capture that in a short phrase. Societal goals. But that's tricky, right?
I mean, suppose that I would like it to be the case that the fraction of people of different ethnicities who are criminals should be the same. That seems like a good goal for fairness. How do I achieve that? I mean, I could pretend that it's the same, but it isn't the same today objectively, and the data wouldn't support that. So that's an issue. Yeah?

AUDIENCE: People who are similar should be treated similarly, sort of independent of the [INAUDIBLE] attributes or independent of your covariates.

PETER SZOLOVITS: Similar people should lead to similar treatment. Yeah, I like that.

AUDIENCE: I didn't make it up.

PETER SZOLOVITS: I know. It's another of the classic sort of notions of fairness. That puts a lot of weight on the distance function, right? In what way are two people similar? And what characteristics--you obviously don't want to use the sensitive characteristics, the forbidden characteristics, in order to decide similarity, because then people will be dissimilar in ways that you don't want, but defining that function is a challenge.

All right, well, let me show you a more technical approach to thinking about this. So we all know about biases like selection bias, sampling bias, reporting bias, et cetera. These are biases in the conventional sense of the term. But I'll show you an example that I got involved in. Raj Manrai was an MIT-Harvard HST student, and he started looking at the question of the genetics that was used to determine whether somebody is at risk for hypertrophic cardiomyopathy. That's a big word. It means that your heart gets too big and it becomes sort of flabby and stops pumping well, and eventually you die of this disease at a relatively young age, if in fact you have it.

So what happened is that there was a study, done mostly with European populations, where they discovered that a lot of people who had this disease had a certain genetic variant.
And they said, well, that must be the cause of this disease, and so it became accepted wisdom that if you had that genetic variant, people would counsel you not to plan on living a long life. And this has all kinds of consequences. Imagine if you're thinking about having a kid when you're in your early 40s and your life expectancy is 55. Would you want to die when you have a teenager that you leave to your spouse? So this was a consequential set of decisions that people had to make.

Now, what happened is that in the US, there were tests of this sort done, but the problem was that a lot of African and African-American populations turned out to have this genetic variant frequently without developing this terrible disease, but they were all told that they were going to die, basically. And it was only after years, when people noticed that these people who were supposed to die genetically weren't dying, that they said, maybe we misunderstood something. And what they misunderstood was that the population that was used to develop the model was a European-ancestry population and not an African-ancestry population.

So you go, well, we must have learned that lesson. This paper was published in 2016, and it was one of the first in this area. Here's a paper that was published three weeks ago in Nature Scientific Reports that says genetic risk factors identified in populations of European descent do not improve the prediction of osteoporotic fracture and bone mineral density in Chinese populations. So it's the same story. It's exactly the same story. Different disease, and the consequence is probably less dire, because being told that you're going to break your bones when you're old is not as bad as being told that your heart's going to stop working when you're in your 50s, but there we have it.

OK, so technically, where does bias come from? Well, I mentioned the standard sources, but here is an interesting analysis.
This comes from Constantin Aliferis, from a number of years ago, 2006, and he says, well, look, in a perfect world, if I give you a data set, there's an uncountably infinite number of models that might possibly explain the relationships in that data. I cannot enumerate an uncountable number of models, and so what I'm going to do is choose some family of models to try to fit, and then I'm going to use some fitting technique, like stochastic gradient descent or something, that will find maybe a global optimum, but maybe not. Maybe it'll find a local optimum. And then there is noise.

And so his observation is that if you call O the optimal possible model over all possible model families, L the best model that's learnable by a particular learning mechanism, and A the actual model that's learned, then the bias is essentially O minus L; it's the limitation of the learning method relative to the target model. The variance is like L minus A; it's the error that's due to the particular way in which you learned things, like sampling and so on. And you can estimate the significance of differences between different models by just permuting the data--randomizing, essentially, the relationships in the data. Then you get a curve of the performance of those models, and if yours lies outside the 95% confidence interval, then you have a p = 0.05 result that this model is not random. So that's the typical way of going about this.

Now, you might say, but isn't discrimination the very reason we do machine learning? Not discrimination in the legal sense, but discrimination in the sense of separating different populations. And so you could say, well, yes, but some bases for differentiation are justified and some bases for differentiation are not justified. They're either practically irrelevant, or we decide for societal goals that we want them to be irrelevant and we're not going to take them into account.
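Before moving on, a brief aside on the permutation test Peter describes above: the sketch below is one minimal way to carry it out, and is not from the lecture. The synthetic data, the logistic regression model, the AUC metric, and the number of permutations are all illustrative assumptions.

```python
# Permutation test for model significance: fit on the real labels, then refit
# on label-permuted copies to build a null distribution of scores, and see
# where the real model falls relative to that distribution.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

def fit_and_score(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

observed = fit_and_score(X, y)

# Null distribution: permuting y destroys any real relationship between X and y.
null_scores = np.array([fit_and_score(X, rng.permutation(y)) for _ in range(200)])

# One-sided empirical p-value: fraction of permuted runs that do at least as well.
p_value = (np.sum(null_scores >= observed) + 1) / (len(null_scores) + 1)
print(f"observed AUC = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

If the observed score lies outside, say, the top 5% of the null distribution, that corresponds to the p = 0.05 statement in the lecture.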
So one lesson from people who have studied this for a while is that discrimination is domain-specific. You can't define a universal notion of what it means to discriminate, because it's very much tied to these questions of what is practically and morally irrelevant in the decisions that you're making. And so it's going to be different in criminal law than it is in medicine, than it is in hiring, than it is in various other fields--college admissions, for example. And it's feature-specific as well, so you have to take the individual features into account.

Well, historically, the government has tried to regulate these domains. Credit is regulated by the Equal Credit Opportunity Act, education by the Civil Rights Act and various amendments, employment by the Civil Rights Act, housing by the Fair Housing Act, and public accommodation by the Civil Rights Act. More recently, marriage was regulated, originally by the Defense of Marriage Act, which, as you might tell from its title, was directed against marriages outside the traditional form it wanted to defend, but it was struck down by the Supreme Court about six years ago as being discriminatory.

It's interesting--if you look back to probably before you guys were born, until 1967 it was illegal for an African-American and a white person to marry each other in Virginia. It was literally illegal. If you went to get a marriage license, you were denied, and if you got married out of state and came back, you could be arrested. And this kind of thing happened much later elsewhere. Trevor Noah, if you know him from The Daily Show, wrote a book called Born a Crime, I think. His father is a white Swiss guy and his mother is a black South African, and so it was literally illegal for him to exist under the apartheid laws that they had. He had to pretend that his mother was his caretaker rather than his mother in order to be able to go out in public, because otherwise they would get arrested.
So this has recently, of course, also disappeared, but these are some of the regulatory issues.

So here are some of the legally recognized protected classes: race, color, sex, religion, national origin, citizenship, age, pregnancy, familial status, disability, veteran status, and, more recently, sexual orientation in certain jurisdictions, but not everywhere around the country.

OK, so given those examples, there are two legal doctrines about discrimination. One of them talks about disparate treatment, which is sort of related to this one. And the other talks about disparate impact, and says that no matter what the mechanism is, if the outcome is very different for different racial groups, typically, or gender groups, then there is prima facie evidence that there is something not right, that there is some sort of discrimination.

Now, the problem is, how do you defend yourself against, for example, a disparate impact argument? Well, in order for disparate impact to be illegal, it has to be unjustified or avoidable. So for example, suppose I'm trying to hire people to climb 50-story buildings that are under construction, and you apply, but it turns out you have a medical condition which is that you get dizzy at times. I might say, you know what, I don't want to hire you, because I don't want you plopping off the 50th floor of a building that's under construction, and that's probably a reasonable defense. If you brought suit against me and said, hey, you're discriminating against me on the basis of this medical disability, a perfectly good defense is, yeah, it's true, but it's relevant to the job. So that's one way of dealing with it.

Now, how do you demonstrate disparate impact? Well, the courts have decided that you need to be able to show about a 20% difference in order to call something disparate impact. So the question, of course, is whether we can change our hiring policies, or whatever policies we're using, in order to achieve the same goals but with less of a disparity in the impact.
So that's the challenge.

Now, what's interesting is that disparate treatment and disparate impact are really in conflict with each other. And you'll find that this is true of almost everything in this domain. Disparate impact is about distributive justice and minimizing inequality of outcome. Disparate treatment is about procedural fairness and equality of opportunity, and those don't always mesh. In other words, it may well be that equality of opportunity still leads to differences in outcome, and you can't square that circle easily.

Well, there's a lot of discrimination that keeps persisting. There's plenty of evidence in the literature. And one of the problems is that, for example, if you take an issue like the disparity between different races or different ethnicities, it turns out that we don't have a nicely balanced data set where the number of people of European descent is equal to the number of people of African-American, Hispanic, Asian, or whatever descent you choose, and therefore we tend to know a lot more about the majority class than we know about these minority classes. And just that additional data and that additional knowledge might mean that we're able to reduce the error rate for the majority simply because we have a larger sample size.

OK, so if you want to formalize this--this is Moritz Hardt's part of the tutorial that I'm stealing from in this talk. It was given at KDD about a year and a half ago, I think. Moritz is a professor at Berkeley who actually teaches an entire semester-long course on fairness in machine learning, so there's a lot of material here. And he formalizes the problem this way.
He says, look, a decision problem--a model, in our terms--is that we have some X, which is the set of features we know about an individual; we have some set A, which is the set of protected features, like your race, or your gender, or your age, or whatever it is we're trying to avoid discriminating on; then we have either a classifier or some score or predictive function R, which in either case is a function of X and A; and then we have some Y, which is the outcome that we're interested in predicting.

So now you can begin to tease apart some different notions of fairness by looking at the relationships between these elements. There are three criteria that appear in the literature. One of them is the notion of independence of the scoring function from the sensitive attributes. So this says that R is independent of A. Remember, on the previous slide, I said that R is a function of X and A, so obviously that criterion says that its dependence on A has to be null.

Another notion is separation of the score and the sensitive attribute given the outcome. This is the one that says the different groups are going to be treated similarly. In other words, if I tell you the outcome group--the people who did well at the job and the people who did poorly at the job--then within each group the scoring function is independent of the protected attribute. So that allows a little more wiggle room, because it says that the protected attribute can still predict something about the outcome; it's just that you can't use it in the scoring function given which outcome category that individual belongs to.

And then sufficiency is the inverse of that. It says that given the scoring function, the outcome is independent of the protected attribute. So that asks, can we build a fair scoring function that separates the outcome from the protected attribute? So here's some detail on those.
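Before the details, here is a small sketch, not from the lecture, of how these three criteria can be checked empirically for a binary score R, outcome Y, and protected attribute A; the pandas interface, column names, and toy data are illustrative assumptions.

```python
# Empirical checks of independence, separation, and sufficiency for binary
# R (score/decision), Y (outcome), and A (protected attribute).
import pandas as pd

def independence_gap(df):
    # Independence: P(R=1 | A) should not depend on A (demographic parity).
    rates = df.groupby("A")["R"].mean()
    return rates.max() - rates.min()

def separation_gaps(df):
    # Separation: P(R=1 | Y, A) should not depend on A within each outcome Y,
    # i.e., equal true-positive and false-positive rates across groups.
    rates = df.groupby(["Y", "A"])["R"].mean().unstack("A")
    return (rates.max(axis=1) - rates.min(axis=1)).to_dict()

def sufficiency_gaps(df):
    # Sufficiency: P(Y=1 | R, A) should not depend on A within each score value,
    # i.e., equal positive and negative predictive values across groups.
    rates = df.groupby(["R", "A"])["Y"].mean().unstack("A")
    return (rates.max(axis=1) - rates.min(axis=1)).to_dict()

# Toy data purely for illustration.
df = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "R": [1, 1, 0, 0, 1, 0, 0, 0],
    "Y": [1, 0, 0, 0, 1, 1, 0, 0],
})
print("independence gap:", independence_gap(df))
print("separation gaps by Y:", separation_gaps(df))
print("sufficiency gaps by R:", sufficiency_gaps(df))
```

The ratio form of the independence test that comes up below (the 4/5 rule) would compare rates.min() / rates.max() against 0.8 instead of looking at the absolute gap.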
If you look at independence--this is also called by various other names--basically, what it says is that the probability of a particular result, R = 1, is the same whether you're in class a or class b of the protected attribute. So what does that tell you? It tells you that the scoring function has to be universal over the entire data set and must not distinguish between people in class a versus class b. That's a pretty strong requirement. And then you can operationalize the notion of unfairness either by looking at the absolute difference between those probabilities--if it's greater than some epsilon, then you have evidence that this is not a fair scoring function--or by a ratio test that says we look at the ratio, and if it differs from 1 significantly, then you have evidence that this is an unfair scoring function. And by the way, this relates to the 4/5 rule, because if you make epsilon 20%, then that's the same as the 4/5 rule.

Now, there are problems with this notion of independence. It only requires equal rates of decisions--for hiring, or giving somebody a liver for transplant, or whatever topic you're interested in. So what if hiring is based on a good score in group a, but is random in group b? For example, what if we know a lot more information about group a than we do about group b, so we have a better way of scoring group a than we do of scoring group b? You might wind up with a situation where you hire the same ratio of people in both groups, but in one group, you've done a good job of selecting out the good candidates, and in the other group, you've essentially done it at random. Well, the outcomes are likely to be better for group a than for group b, which means that you're developing more data for the future that says we really ought to be hiring people in group a because they have better outcomes. So there's this feedback loop.
Or alternatively--well, of course, it could be caused by malice also. I could just decide as a hiring manager that I'm not hiring enough African-Americans, so I'm going to take some random sample of African-Americans and hire them, and then maybe they'll do badly, and then I'll have more data to demonstrate that this was a bad idea. So that would be malicious.

There's also a technical problem, which is that it's possible that the group is a perfect predictor of the outcome, in which case, of course, they can't be independent of each other.

Now, how do you achieve independence? Well, there are a number of different techniques. One of them--there's this article by Zemel about learning fair representations, and what it says is that you create a new representation, Z, which is some combination of X and A, and you do this by maximizing the mutual information between X and Z and minimizing the mutual information between A and Z. This is an idea that I've seen used in machine learning for robustness rather than for fairness, where people say the problem is that, given a particular data set, you can overfit to that data set, and so one of the ideas is to do a GAN-like method where you say, I want to train my classifier not only to work well on getting the right answer, but also to work as poorly as possible on identifying which data set my example came from. So this is the same sort of idea. It's a representation learning idea. And then you build your predictor, R, based on this representation, which is perhaps not perfectly independent of the protected attribute, but is as independent as possible. And usually there are knobs in these learning algorithms, and depending on how you turn the knob, you can affect whether you're going to get a better classifier that's more discriminatory or a worse classifier that's less discriminatory. So you can do that in pre-processing.
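The sketch below illustrates the adversarial, GAN-like flavor of this idea; it is not Zemel et al.'s actual mutual-information formulation, and the toy data, network sizes, and the lam knob are all assumptions made for illustration (PyTorch assumed available).

```python
# Adversarial-style fair representation learning: an encoder and an outcome
# predictor are trained jointly, while an adversary that tries to recover the
# protected attribute A from the representation Z is made to do as badly as
# possible.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 10
A = torch.randint(0, 2, (n, 1)).float()             # protected attribute
X = torch.randn(n, d) + A                           # features correlated with A
Y = ((X[:, :1] + torch.randn(n, 1)) > 0.5).float()  # outcome

encoder = nn.Sequential(nn.Linear(d, 8), nn.ReLU())
predictor = nn.Linear(8, 1)                         # predicts Y from Z
adversary = nn.Linear(8, 1)                         # tries to predict A from Z
bce = nn.BCEWithLogitsLoss()
opt_main = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
lam = 1.0                                           # the "knob": fairness vs. accuracy

for step in range(200):
    # Train the adversary to predict A from the current (frozen) representation.
    z = encoder(X).detach()
    opt_adv.zero_grad()
    adv_loss = bce(adversary(z), A)
    adv_loss.backward()
    opt_adv.step()

    # Train encoder + predictor: predict Y well while making the adversary fail.
    z = encoder(X)
    opt_main.zero_grad()
    main_loss = bce(predictor(z), Y) - lam * bce(adversary(z), A)
    main_loss.backward()
    opt_main.step()
```

Raising lam pushes the learned representation toward independence from A, typically at some cost in accuracy on Y, which is exactly the trade-off knob just described.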
You can also do a kind of in-processing, incorporating into the loss function a dependence notion--or an independence notion--and say we're going to train on a particular data set while imposing this desired independence between A and R as part of our desiderata. And so you, again, are making trade-offs against other characteristics. Or you can do post-processing. Suppose I've built an optimal R, not worrying about discrimination. Then I can set up another learning problem that says I'm now going to build a new F, which takes R and the protected attribute into account, and which is going to minimize the cost of misclassifications. And again, there's a knob where you can say how much I want to emphasize misclassifications based on the protected attribute.

So this was all still talking about independence. The next notion is separation, which says that given the outcome, I want to separate A and R. So that graphical model shows that the protected attribute is related to the scoring function only through the outcome. There's nothing else that you can learn from one to the other except through the outcome. So this recognizes that the protected attribute may, in fact, be correlated with the target variable.

An example might be different success rates in a drug trial for different ethnic populations. There are now some cardiac drugs where the manufacturer has determined that the drug works much better in certain subpopulations than it does in other populations, and the FDA has actually approved the marketing of that drug to those subpopulations. So you're not supposed to market it to the people for whom it doesn't work as well, but you're allowed to market it specifically to the people for whom it does work well. And if you think about the personalized medicine idea, which we've talked about earlier, the populations that we're interested in become smaller and smaller until it may be just you.
746 00:46:12,770 --> 00:46:16,730 And so there might be a drug that works for you and not 747 00:46:16,730 --> 00:46:19,910 for anybody else in the class, but it's exactly 748 00:46:19,910 --> 00:46:23,810 the right drug for you, and we may get to the point 749 00:46:23,810 --> 00:46:26,870 where that will happen and where we can build such drugs 750 00:46:26,870 --> 00:46:32,960 and where we can approve their use in human populations. 751 00:46:32,960 --> 00:46:37,510 Now, the idea here is that if I have 752 00:46:37,510 --> 00:46:44,290 two populations, blue and green, and I draw ROC curves for both 753 00:46:44,290 --> 00:46:46,870 of these populations, they're not 754 00:46:46,870 --> 00:46:50,500 going to be the same, because the drug will work differently 755 00:46:50,500 --> 00:46:52,750 for those two populations. 756 00:46:52,750 --> 00:46:56,140 But on the other hand, I can draw them on the same axes, 757 00:46:56,140 --> 00:47:01,460 and I can say, look any place within this colored region 758 00:47:01,460 --> 00:47:04,670 can be a fair region in that I'm going 759 00:47:04,670 --> 00:47:08,130 to get the same outcome for both populations. 760 00:47:08,130 --> 00:47:12,220 So I can't achieve this outcome for the blue population 761 00:47:12,220 --> 00:47:15,110 or this outcome for the green population, 762 00:47:15,110 --> 00:47:19,160 but I can achieve any of these outcomes for both populations 763 00:47:19,160 --> 00:47:20,880 simultaneously. 764 00:47:20,880 --> 00:47:25,370 And so that's one way of going about satisfying 765 00:47:25,370 --> 00:47:30,200 this requirement when it is not easily satisfied. 766 00:47:30,200 --> 00:47:33,350 So the advantage of separation over independence 767 00:47:33,350 --> 00:47:35,720 is that it allows correlation between R 768 00:47:35,720 --> 00:47:39,290 and Y, even a perfect predictor, so R 769 00:47:39,290 --> 00:47:42,320 could be a perfect predictor for Y. 770 00:47:42,320 --> 00:47:45,290 And it gives you incentives to learn to reduce 771 00:47:45,290 --> 00:47:47,270 the errors in all groups. 772 00:47:47,270 --> 00:47:50,450 So that issue about randomly choosing 773 00:47:50,450 --> 00:47:53,660 members of the minority group doesn't work here 774 00:47:53,660 --> 00:47:57,890 because that would suppress the ROC curve to the point 775 00:47:57,890 --> 00:48:01,980 where there would be no feasible region that you would like. 776 00:48:01,980 --> 00:48:04,430 So for example, if it's a coin flip, 777 00:48:04,430 --> 00:48:06,770 then you'd have the diagonal line 778 00:48:06,770 --> 00:48:10,760 and the only feasible region would be below that diagonal, 779 00:48:10,760 --> 00:48:14,490 no matter how good the predictor was for the other class. 780 00:48:14,490 --> 00:48:17,770 So that's a nice characteristic. 781 00:48:17,770 --> 00:48:21,440 And then the final criterion is sufficiency, 782 00:48:21,440 --> 00:48:26,000 which flips R and Y. So it says that the regressor 783 00:48:26,000 --> 00:48:30,730 or the predictive variable can depend on the protected class, 784 00:48:30,730 --> 00:48:34,910 but the protected class is separated from the outcome. 
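Before unpacking sufficiency with an example, here is a rough sketch of the ROC picture just described for separation: draw one ROC curve per group and look for operating points both groups can reach, possibly at different thresholds per group. The synthetic scores and the target false positive rate are illustrative assumptions.

```python
# Per-group ROC curves and the shared operating region: for a chosen false
# positive rate, each group has its own threshold and achievable true positive
# rate; points both groups can reach form the feasible region on the slide.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)

def make_group(n, strength):
    y = rng.integers(0, 2, n)
    scores = rng.normal(loc=strength * y, scale=1.0)   # higher strength, better scores
    return y, scores

y_blue,  r_blue  = make_group(2000, 2.0)   # group where the score works well
y_green, r_green = make_group(2000, 1.0)   # group where it works less well

fpr_b, tpr_b, thr_b = roc_curve(y_blue, r_blue)
fpr_g, tpr_g, thr_g = roc_curve(y_green, r_green)

target_fpr = 0.2
i_b = np.searchsorted(fpr_b, target_fpr)
i_g = np.searchsorted(fpr_g, target_fpr)
print(f"blue : threshold {thr_b[i_b]:.2f} gives TPR {tpr_b[i_b]:.2f}")
print(f"green: threshold {thr_g[i_g]:.2f} gives TPR {tpr_g[i_g]:.2f}")
# Any TPR at or below min(tpr_b[i_b], tpr_g[i_g]) at this FPR is achievable
# for both groups simultaneously (using randomization to move below a curve).
```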
785 00:48:34,910 --> 00:48:40,150 So for example, the probability in a binary case 786 00:48:40,150 --> 00:48:45,070 of a true outcome of Y given that R 787 00:48:45,070 --> 00:48:50,470 is some particular value, R and A is a particular class, 788 00:48:50,470 --> 00:48:54,340 is the same as the probability of that same outcome given 789 00:48:54,340 --> 00:49:00,740 the same R value, but a different class. 790 00:49:00,740 --> 00:49:05,420 So that's related to the sort of similar people, 791 00:49:05,420 --> 00:49:11,270 similar treatment notion, qualitative notion, again. 792 00:49:13,790 --> 00:49:17,750 So it requires parity of both the positive 793 00:49:17,750 --> 00:49:23,230 and the negative predictive values across different groups. 794 00:49:23,230 --> 00:49:28,310 So that's another popular way of looking at this. 795 00:49:28,310 --> 00:49:32,050 So for example, if the scoring function is a probability, 796 00:49:32,050 --> 00:49:36,220 where the set of all instances assigned the score R has an R 797 00:49:36,220 --> 00:49:39,560 fraction of positive instances among them, 798 00:49:39,560 --> 00:49:42,550 then the scoring function is said to be well-calibrated. 799 00:49:42,550 --> 00:49:46,000 So we've talked about that before in the class. 800 00:49:46,000 --> 00:49:50,150 If it turns out that R is not well-calibrated, 801 00:49:50,150 --> 00:49:54,640 you can hack it and you can make it well-calibrated by putting 802 00:49:54,640 --> 00:49:58,570 it through a logistic function that will then approximate 803 00:49:58,570 --> 00:50:01,990 the appropriately calibrated score, 804 00:50:01,990 --> 00:50:07,030 and then you hope that that calibration will give-- 805 00:50:07,030 --> 00:50:10,180 or the degree of calibration will give you 806 00:50:10,180 --> 00:50:14,320 a good approximation to this notion of sufficiency. 807 00:50:14,320 --> 00:50:18,400 These guys in the tutorial also point out 808 00:50:18,400 --> 00:50:24,370 that some data sets actually lead to good calibration 809 00:50:24,370 --> 00:50:26,720 without even trying very hard. 810 00:50:26,720 --> 00:50:31,060 So for example, this is the UCI census data set, 811 00:50:31,060 --> 00:50:33,910 and it's a binary prediction of whether somebody makes 812 00:50:33,910 --> 00:50:38,500 more than $50,000 a year if you have any income at all 813 00:50:38,500 --> 00:50:40,930 and if you're over 16 years old. 814 00:50:40,930 --> 00:50:45,370 And the features-- there are 14 features-- age, type of work, 815 00:50:45,370 --> 00:50:47,890 weight of sample is some statistical hack 816 00:50:47,890 --> 00:50:51,550 from the Census Bureau, your education level, 817 00:50:51,550 --> 00:50:54,370 marital status, et cetera, and what 818 00:50:54,370 --> 00:50:59,230 you see is that the calibration for males and females 819 00:50:59,230 --> 00:51:00,580 is pretty decent. 820 00:51:00,580 --> 00:51:04,540 It's almost exactly along the 45 degree line 821 00:51:04,540 --> 00:51:07,690 without having done anything particularly dramatic 822 00:51:07,690 --> 00:51:10,000 in order to achieve that. 823 00:51:10,000 --> 00:51:13,420 On the other hand, if you look at the calibration curve 824 00:51:13,420 --> 00:51:16,690 by race for whites versus blacks, 825 00:51:16,690 --> 00:51:20,800 the whites, not surprisingly, are reasonably well-calibrated, 826 00:51:20,800 --> 00:51:23,410 and the blacks are not as well-calibrated.
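A minimal sketch of checking calibration by group, and of the logistic-function "hack" just described (Platt-style recalibration), might look like the following; the group labels, scores, and outcomes are synthetic placeholders rather than the UCI census data.

```python
# Check calibration separately by group, then apply a Platt-style fix: fit a
# one-feature logistic regression from the raw score to the outcome within
# each group and use its predicted probability as the recalibrated score.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)              # 0/1 protected attribute
p_true = rng.uniform(0, 1, n)              # true probability of a positive outcome
y = rng.binomial(1, p_true)
# a raw score that is well calibrated for group 0 but distorted for group 1
raw = np.clip(p_true + 0.3 * group * (0.5 - p_true), 0, 1)

for g in (0, 1):
    frac_pos, mean_pred = calibration_curve(y[group == g], raw[group == g], n_bins=10)
    print(f"group {g}: mean |calibration gap| = {np.abs(frac_pos - mean_pred).mean():.3f}")

recal = np.empty(n)
for g in (0, 1):
    m = group == g
    lr = LogisticRegression().fit(raw[m].reshape(-1, 1), y[m])
    recal[m] = lr.predict_proba(raw[m].reshape(-1, 1))[:, 1]
```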
827 00:51:23,410 --> 00:51:26,680 So you could imagine building some kind of a transformation 828 00:51:26,680 --> 00:51:29,710 function to improve that calibration, 829 00:51:29,710 --> 00:51:32,680 and that would get you sufficiency. 830 00:51:32,680 --> 00:51:35,570 Now, there's a terrible piece of news, 831 00:51:35,570 --> 00:51:40,450 which is that you can prove, as they do in this tutorial, 832 00:51:40,450 --> 00:51:43,510 that it's not possible to jointly achieve 833 00:51:43,510 --> 00:51:46,010 any pair of these conditions. 834 00:51:46,010 --> 00:51:49,660 So you have three reasonable technical notions 835 00:51:49,660 --> 00:51:52,060 of what fairness means, and they're 836 00:51:52,060 --> 00:51:57,570 incompatible with each other except in some trivial cases. 837 00:51:57,570 --> 00:51:58,710 This is not good. 838 00:52:02,210 --> 00:52:04,600 And I'm not going to have time to go into it, 839 00:52:04,600 --> 00:52:07,920 but there's a very nice thing from Google 840 00:52:07,920 --> 00:52:12,960 where they illustrate the results of adopting 841 00:52:12,960 --> 00:52:16,380 one or another of these notions of fairness 842 00:52:16,380 --> 00:52:20,610 on a synthesized population of people, 843 00:52:20,610 --> 00:52:23,460 and you can see how the trade-offs vary 844 00:52:23,460 --> 00:52:25,650 and what the results are of choosing 845 00:52:25,650 --> 00:52:27,700 different notions of fairness. 846 00:52:27,700 --> 00:52:29,940 So it's a kind of nice graphical hack. 847 00:52:29,940 --> 00:52:31,980 Again, it'll be on the slides, and I 848 00:52:31,980 --> 00:52:33,900 urge you to check that out, but I'm not 849 00:52:33,900 --> 00:52:36,340 going to have time to go into it. 850 00:52:36,340 --> 00:52:39,370 There is one other problem that they point out 851 00:52:39,370 --> 00:52:42,410 which is interesting. 852 00:52:42,410 --> 00:52:45,520 So this was a scenario where you're 853 00:52:45,520 --> 00:52:49,090 trying to hire computer programmers, 854 00:52:49,090 --> 00:52:52,240 and you don't want to take gender into account because we 855 00:52:52,240 --> 00:52:55,780 know that women are underrepresented among computer 856 00:52:55,780 --> 00:52:58,420 people, and so we would like that not 857 00:52:58,420 --> 00:53:00,490 to be an allowed attribute in order 858 00:53:00,490 --> 00:53:03,110 to decide to hire someone. 859 00:53:03,110 --> 00:53:06,910 So they say, well, there are two scenarios. 860 00:53:06,910 --> 00:53:12,100 One of them is that gender, A, influences whether you're 861 00:53:12,100 --> 00:53:13,675 a programmer or not. 862 00:53:13,675 --> 00:53:15,850 And this is empirically true. 863 00:53:15,850 --> 00:53:19,630 There are fewer women who are programmers. 864 00:53:19,630 --> 00:53:24,310 It turns out that visiting Pinterest is slightly more 865 00:53:24,310 --> 00:53:27,330 common among women than men. 866 00:53:27,330 --> 00:53:30,330 Who knew? 867 00:53:30,330 --> 00:53:35,430 And then visiting GitHub is much more common among programmers 868 00:53:35,430 --> 00:53:39,360 than among non-programmers. 869 00:53:39,360 --> 00:53:41,590 That one's pretty obvious.
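Before continuing with the hiring example, here is a small numeric illustration of why those criteria collide: if two groups have different base rates, then identical true and false positive rates (separation) force unequal positive predictive values, which is what sufficiency and calibration care about. The rates below are made up.

```python
# Hold the error rates fixed and equal across two groups but give them
# different base rates; the positive predictive value then differs, so
# separation and sufficiency cannot both hold.
tpr, fpr = 0.8, 0.1      # identical true and false positive rates in both groups

def ppv(base_rate, tpr, fpr):
    # P(Y = 1 | predicted positive), by Bayes' rule
    return (tpr * base_rate) / (tpr * base_rate + fpr * (1 - base_rate))

for name, base in [("group 1", 0.5), ("group 2", 0.2)]:
    print(f"{name}: base rate {base:.2f} -> PPV {ppv(base, tpr, fpr):.3f}")
# group 1: base rate 0.50 -> PPV 0.889
# group 2: base rate 0.20 -> PPV 0.667
```

So the only escapes are trivial ones, such as equal base rates or a perfect predictor, which is the content of the impossibility result mentioned above.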
870 00:53:41,590 --> 00:53:47,880 So what they say is, if you want an optimal predictor 871 00:53:47,880 --> 00:53:51,000 of whether somebody's going to get hired, 872 00:53:51,000 --> 00:53:54,630 it should actually take both Pinterest visits and GitHub 873 00:53:54,630 --> 00:54:02,790 visits into account, but because those go back 874 00:54:02,790 --> 00:54:09,860 to gender, which is an unusable attribute, 875 00:54:09,860 --> 00:54:11,550 they don't like this model. 876 00:54:11,550 --> 00:54:18,140 And so they say, well, we could use an optimal separated score, 877 00:54:18,140 --> 00:54:23,000 because now, being a programmer separates your gender 878 00:54:23,000 --> 00:54:25,440 from the scoring function. 879 00:54:25,440 --> 00:54:28,010 And so we can create a different score 880 00:54:28,010 --> 00:54:31,490 which is not the same as the optimal score, 881 00:54:31,490 --> 00:54:39,380 but is permitted because it's no longer dependent on your sex, 882 00:54:39,380 --> 00:54:41,580 on your gender. 883 00:54:41,580 --> 00:54:46,140 Here's another scenario that, again, starts with gender 884 00:54:46,140 --> 00:54:51,510 and says, look, we know that there are more men than women 885 00:54:51,510 --> 00:54:55,750 who obtain college degrees in computer science, 886 00:54:55,750 --> 00:54:58,200 and so there's an influence there, 887 00:54:58,200 --> 00:55:00,390 and computer scientists are much more 888 00:55:00,390 --> 00:55:03,885 likely to be programmers than non-computer science majors. 889 00:55:07,870 --> 00:55:10,240 If you were a woman-- 890 00:55:10,240 --> 00:55:13,840 has anybody visited the Grace Murray Hopper Conference? 891 00:55:13,840 --> 00:55:15,900 A couple, a few of you. 892 00:55:15,900 --> 00:55:17,640 So this is a really cool conference. 893 00:55:17,640 --> 00:55:22,350 Grace Murray Hopper invented the notion of a bug, or the term bug, 894 00:55:22,350 --> 00:55:25,650 and was a really famous computer scientist starting back 895 00:55:25,650 --> 00:55:29,170 in the 1940s when there were very few of them, 896 00:55:29,170 --> 00:55:31,980 and there is a yearly conference for women computer 897 00:55:31,980 --> 00:55:34,500 scientists in her honor. 898 00:55:34,500 --> 00:55:39,210 So clearly, the probability that you visited the Grace Hopper 899 00:55:39,210 --> 00:55:42,360 Conference is dependent on your gender. 900 00:55:42,360 --> 00:55:45,588 It's also dependent on whether you're a computer scientist, 901 00:55:45,588 --> 00:55:47,130 because if you're a historian, you're 902 00:55:47,130 --> 00:55:51,870 not likely to be interested in going to that conference. 903 00:55:51,870 --> 00:55:56,490 And so in this story, the optimal score 904 00:55:56,490 --> 00:55:59,850 is going to depend basically on whether you have a computer 905 00:55:59,850 --> 00:56:05,960 science degree or not, but the separated score 906 00:56:05,960 --> 00:56:09,020 will depend only on your gender, which 907 00:56:09,020 --> 00:56:12,590 is kind of funny, because that's the protected attribute. 908 00:56:12,590 --> 00:56:17,510 And what these guys point out is that despite the fact that you 909 00:56:17,510 --> 00:56:21,500 have these two scenarios, it could well turn out 910 00:56:21,500 --> 00:56:25,610 that the numerical data, the statistics from which you 911 00:56:25,610 --> 00:56:29,850 estimate these models are absolutely identical.
912 00:56:29,850 --> 00:56:32,330 In other words, the same fraction of people 913 00:56:32,330 --> 00:56:37,220 are men and women, the same fraction of people 914 00:56:37,220 --> 00:56:40,250 are programmers, they have the same relationship 915 00:56:40,250 --> 00:56:44,780 to those other factors, and so from a purely observational 916 00:56:44,780 --> 00:56:51,250 viewpoint, you can't tell which of these styles of model 917 00:56:51,250 --> 00:57:00,640 is correct or which version of fairness your data can support. 918 00:57:00,640 --> 00:57:03,160 So that's a problem because we know 919 00:57:03,160 --> 00:57:06,760 that these different notions of fairness 920 00:57:06,760 --> 00:57:09,170 are in conflict with each other. 921 00:57:09,170 --> 00:57:13,190 So I wanted to finish by showing you a couple of examples. 922 00:57:13,190 --> 00:57:17,890 So this was a paper based on Irene's work. 923 00:57:17,890 --> 00:57:25,720 So Irene, shout if I'm butchering the discussion. 924 00:57:25,720 --> 00:57:33,790 I got an invitation last year from the American Medical 925 00:57:33,790 --> 00:57:39,040 Association's Journal of Ethics, which I didn't know existed, 926 00:57:39,040 --> 00:57:42,940 to write a think piece for them about fairness in machine 927 00:57:42,940 --> 00:57:48,340 learning, and I decided that rather than just bloviate, 928 00:57:48,340 --> 00:57:50,380 I wanted to present some real work, 929 00:57:50,380 --> 00:57:53,920 and Irene had been doing some real work. 930 00:57:53,920 --> 00:57:56,530 And so Marzyeh, who was one of my students, 931 00:57:56,530 --> 00:58:00,890 and I convinced her to get into this, 932 00:58:00,890 --> 00:58:05,720 and we started looking at the question of how 933 00:58:05,720 --> 00:58:09,800 these machine learning models can identify and perhaps reduce 934 00:58:09,800 --> 00:58:15,110 disparities in general medical and mental health. 935 00:58:15,110 --> 00:58:16,730 Now, why those two areas? 936 00:58:16,730 --> 00:58:19,740 Because we had access to data in those areas. 937 00:58:19,740 --> 00:58:23,000 So the general medical was actually not that general. 938 00:58:23,000 --> 00:58:25,490 It's intensive care data from MIMIC, 939 00:58:25,490 --> 00:58:27,470 and mental health care is some data 940 00:58:27,470 --> 00:58:32,120 that we had access to from Mass General and McLean Hospital 941 00:58:32,120 --> 00:58:35,195 here in Boston, which both have big psychiatric clinics. 942 00:58:37,830 --> 00:58:42,580 So yeah, this is what I just said. 943 00:58:42,580 --> 00:58:44,910 So the question we were asking is, is there 944 00:58:44,910 --> 00:58:48,570 bias based on race, gender, and insurance type? 945 00:58:48,570 --> 00:58:51,700 So we were really interested in socioeconomic status, 946 00:58:51,700 --> 00:58:54,090 but we didn't have that in the database, 947 00:58:54,090 --> 00:58:57,900 but the type of insurance you have correlates pretty well 948 00:58:57,900 --> 00:59:00,180 with whether you're rich or poor. 949 00:59:00,180 --> 00:59:03,780 If you have Medicaid insurance, for example, you're poor, 950 00:59:03,780 --> 00:59:05,590 and if you have private insurance, 951 00:59:05,590 --> 00:59:08,490 to a first approximation, you're rich. 952 00:59:08,490 --> 00:59:12,120 So we did that, and then we looked at the notes.
953 00:59:12,120 --> 00:59:15,330 So we wanted to see not the coded data, 954 00:59:15,330 --> 00:59:19,050 but whether the things that nurses and doctors said 955 00:59:19,050 --> 00:59:22,650 about you as you were in the hospital 956 00:59:22,650 --> 00:59:27,210 were predictive of readmission, of 30-day readmission, 957 00:59:27,210 --> 00:59:30,850 of whether you were likely to come back to the hospital. 958 00:59:30,850 --> 00:59:32,900 So these are some of the topics. 959 00:59:32,900 --> 00:59:38,220 We used LDA, standard topic modeling framework. 960 00:59:38,220 --> 00:59:41,790 And the topics, as usual, include some garbage, 961 00:59:41,790 --> 00:59:46,750 but also include a lot of recognizably useful topics. 962 00:59:46,750 --> 00:59:49,950 So for example, mass, cancer, metastatic, 963 00:59:49,950 --> 00:59:54,090 clearly associated with cancer, Afib, atrial, Coumadin, 964 00:59:54,090 --> 00:59:59,850 fibrillation, associated with heart function, et cetera, 965 00:59:59,850 --> 01:00:01,620 in the ICU domain. 966 01:00:01,620 --> 01:00:04,230 In the psychiatric domain, you have 967 01:00:04,230 --> 01:00:08,280 things like bipolar, lithium, manic episode, clearly 968 01:00:08,280 --> 01:00:13,620 associated with bipolar disease, pain, chronic, milligrams, 969 01:00:13,620 --> 01:00:18,390 the drug quantity, associated with chronic pain, et cetera. 970 01:00:18,390 --> 01:00:22,970 So these were the topics that we used. 971 01:00:22,970 --> 01:00:26,360 And so we said, what happens when 972 01:00:26,360 --> 01:00:32,180 you look at the different topics, 973 01:00:32,180 --> 01:00:34,400 how often the different topics arise 974 01:00:34,400 --> 01:00:36,860 in different subpopulations? 975 01:00:36,860 --> 01:00:41,330 And so what we found is that, for example, white patients 976 01:00:41,330 --> 01:00:44,780 have more topics that are enriched for anxiety 977 01:00:44,780 --> 01:00:49,760 and chronic pain, whereas black, Hispanic, and Asian patients 978 01:00:49,760 --> 01:00:52,430 had higher topic enrichment for psychosis. 979 01:00:55,460 --> 01:00:57,500 It's interesting. 980 01:00:57,500 --> 01:01:00,890 Male patients had more substance abuse problems. 981 01:01:00,890 --> 01:01:04,010 Female patients had more general depression 982 01:01:04,010 --> 01:01:06,650 and treatment-resistant depression. 983 01:01:06,650 --> 01:01:13,040 So if you want to create a stereotype, men are druggies 984 01:01:13,040 --> 01:01:18,530 and women are depressed, according to this data. 985 01:01:18,530 --> 01:01:19,860 What about insurance type? 986 01:01:19,860 --> 01:01:24,590 Well, private insurance patients had higher levels 987 01:01:24,590 --> 01:01:28,310 of anxiety and depression, and poorer patients 988 01:01:28,310 --> 01:01:31,310 or public insurance patients had more problems 989 01:01:31,310 --> 01:01:33,440 with substance abuse. 990 01:01:33,440 --> 01:01:37,070 Again, another stereotype that you could form. 991 01:01:37,070 --> 01:01:41,900 And then you could look at-- 992 01:01:41,900 --> 01:01:44,420 that was in the psychiatric population. 993 01:01:44,420 --> 01:01:51,770 In the ICU population, men still have substance abuse problems. 994 01:01:51,770 --> 01:01:55,880 Women have more pulmonary disease. 995 01:01:55,880 --> 01:01:57,830 And we were speculating on how this 996 01:01:57,830 --> 01:02:01,760 relates to sort of known data about underdiagnosis 997 01:02:01,760 --> 01:02:04,400 of COPD in women. 
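Before the race and insurance breakdowns, here is a rough sketch of that topic-enrichment pipeline: fit LDA topics over the notes and compare average topic weight across patient groups. The tiny corpus and group labels below are placeholders, not the MIMIC or psychiatric data.

```python
# Fit LDA topics over notes and compare mean topic weight across two groups.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

notes = [
    "metastatic mass cancer oncology",
    "afib atrial coumadin fibrillation",
    "bipolar lithium manic episode",
    "chronic pain milligrams oxycodone",
] * 25                                     # pretend corpus of 100 notes
group = np.array([0, 1] * 50)              # e.g. 0 = male, 1 = female (toy labels)

counts = CountVectorizer().fit_transform(notes)
lda = LatentDirichletAllocation(n_components=4, random_state=0)
doc_topics = lda.fit_transform(counts)     # per-note topic proportions

# "Enrichment": mean topic proportion per group and the gap between the groups
for k in range(doc_topics.shape[1]):
    m0 = doc_topics[group == 0, k].mean()
    m1 = doc_topics[group == 1, k].mean()
    print(f"topic {k}: group0 {m0:.3f}  group1 {m1:.3f}  diff {m1 - m0:+.3f}")
```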
998 01:02:04,400 --> 01:02:09,920 By race, Asian patients have a lot of discussion of cancer, 999 01:02:09,920 --> 01:02:11,930 black patients have a lot of discussion 1000 01:02:11,930 --> 01:02:15,800 of kidney problems, Hispanics of liver problems, 1001 01:02:15,800 --> 01:02:18,630 and whites have atrial fibrillation. 1002 01:02:18,630 --> 01:02:22,310 So again, stereotypes of what's most common 1003 01:02:22,310 --> 01:02:23,560 in these different groups. 1004 01:02:29,780 --> 01:02:33,350 And by insurance type, those with public insurance 1005 01:02:33,350 --> 01:02:36,530 often have multiple chronic conditions. 1006 01:02:36,530 --> 01:02:41,060 And so public insurance patients have atrial fibrillation, 1007 01:02:41,060 --> 01:02:43,130 pacemakers, dialysis. 1008 01:02:43,130 --> 01:02:48,950 These are indications of chronic heart disease 1009 01:02:48,950 --> 01:02:51,080 and chronic kidney disease. 1010 01:02:51,080 --> 01:02:54,560 And private insurance patients have higher topic enrichment 1011 01:02:54,560 --> 01:02:56,810 values for fractures. 1012 01:02:56,810 --> 01:02:59,420 So maybe they're richer, they play more sports 1013 01:02:59,420 --> 01:03:02,300 and break their arms or something. 1014 01:03:02,300 --> 01:03:04,300 Lymphoma and aneurysms. 1015 01:03:06,930 --> 01:03:08,610 Just reporting the data. 1016 01:03:08,610 --> 01:03:10,210 Just the facts. 1017 01:03:10,210 --> 01:03:13,470 So these results are actually consistent with lots 1018 01:03:13,470 --> 01:03:17,880 of analysis that have been done of this kind of data. 1019 01:03:17,880 --> 01:03:20,160 Now, what I really wanted to look at 1020 01:03:20,160 --> 01:03:23,640 was this question of, can we get similar error rates, 1021 01:03:23,640 --> 01:03:26,910 or how similar are the error rates that we get, 1022 01:03:26,910 --> 01:03:29,610 and the answer is, not so much. 1023 01:03:29,610 --> 01:03:34,710 So for example, if you look at the ICU data, 1024 01:03:34,710 --> 01:03:39,830 we find that the error rates on a zero-one loss metric 1025 01:03:39,830 --> 01:03:45,100 are much lower for men than they are for women, statistically 1026 01:03:45,100 --> 01:03:47,240 significantly lower. 1027 01:03:47,240 --> 01:03:52,900 So we're able to more accurately model male response or male 1028 01:03:52,900 --> 01:03:57,100 prediction of 30-day readmission than we are-- 1029 01:03:57,100 --> 01:04:02,350 sorry, of ICU mortality for the ICU than we are for women. 1030 01:04:02,350 --> 01:04:08,680 Similarly, we have much tighter ability 1031 01:04:08,680 --> 01:04:13,090 to predict outcomes for private insurance patients 1032 01:04:13,090 --> 01:04:15,910 than for public insurance patients 1033 01:04:15,910 --> 01:04:19,210 with a huge gap in the confidence 1034 01:04:19,210 --> 01:04:21,290 intervals between them. 1035 01:04:21,290 --> 01:04:24,250 So this indicates that there is, in fact, 1036 01:04:24,250 --> 01:04:27,640 a racial bias in the data that we have 1037 01:04:27,640 --> 01:04:30,160 and in the models that we're building. 1038 01:04:30,160 --> 01:04:33,580 These are particularly simple models. 1039 01:04:33,580 --> 01:04:39,250 In psychiatry, when you look at the comparison 1040 01:04:39,250 --> 01:04:41,680 for different ethnic populations, 1041 01:04:41,680 --> 01:04:44,560 you see a fair amount of overlap. 
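Roughly how such per-group error rates and confidence intervals can be computed is sketched below, using a bootstrap over a synthetic prediction set; the group labels, predictions, and error levels are stand-ins, not the MIMIC or psychiatric results.

```python
# Zero-one error per group with bootstrap confidence intervals, to see whether
# the gap between groups is bigger than resampling noise.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)                        # e.g. 0 = male, 1 = female
y_true = rng.integers(0, 2, n)
# pretend model that is more accurate for group 0 than for group 1
flip = rng.uniform(0, 1, n) < np.where(group == 0, 0.15, 0.30)
y_pred = np.where(flip, 1 - y_true, y_true)

def error_with_ci(y_t, y_p, n_boot=2000):
    idx = np.arange(len(y_t))
    errs = np.empty(n_boot)
    for b in range(n_boot):
        s = rng.choice(idx, size=len(idx), replace=True)
        errs[b] = np.mean(y_t[s] != y_p[s])
    return np.mean(y_t != y_p), np.percentile(errs, [2.5, 97.5])

for g in (0, 1):
    m = group == g
    err, (lo, hi) = error_with_ci(y_true[m], y_pred[m])
    print(f"group {g}: error {err:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```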
1042 01:04:44,560 --> 01:04:47,170 One reason we speculate is that we 1043 01:04:47,170 --> 01:04:50,770 have a lot less data about psychiatric patients 1044 01:04:50,770 --> 01:04:53,270 than we do about ICU patients. 1045 01:04:53,270 --> 01:04:54,910 So the models are not going to give us 1046 01:04:54,910 --> 01:04:56,860 as accurate predictions. 1047 01:04:56,860 --> 01:05:02,320 But you still see, for example, a statistically significant 1048 01:05:02,320 --> 01:05:10,530 difference between blacks and whites and other races, 1049 01:05:10,530 --> 01:05:13,970 although there's a lot of overlap here. 1050 01:05:13,970 --> 01:05:16,400 Again, between males and females, 1051 01:05:16,400 --> 01:05:20,820 we get fewer errors in making predictions for males, 1052 01:05:20,820 --> 01:05:26,120 but there is not a 95% confidence separation 1053 01:05:26,120 --> 01:05:27,420 between them. 1054 01:05:27,420 --> 01:05:30,350 And for private versus public insurance, 1055 01:05:30,350 --> 01:05:33,810 we do see that separation where for some reason, 1056 01:05:33,810 --> 01:05:36,470 in fact, we're able to make better predictions 1057 01:05:36,470 --> 01:05:38,240 for the people on Medicare than we 1058 01:05:38,240 --> 01:05:41,270 are-- or Medicaid than we are for patients 1059 01:05:41,270 --> 01:05:43,830 in private insurance. 1060 01:05:43,830 --> 01:05:49,040 So just to wrap that up, this is not a solution to the problem, 1061 01:05:49,040 --> 01:05:53,120 but it's an examination of the problem. 1062 01:05:53,120 --> 01:05:55,940 And this Journal of Ethics considered 1063 01:05:55,940 --> 01:06:01,340 it interesting enough to publish just a couple of months ago. 1064 01:06:01,340 --> 01:06:03,560 The last thing I want to talk about 1065 01:06:03,560 --> 01:06:07,100 is some work of Willie's, so I'm taking 1066 01:06:07,100 --> 01:06:09,950 the risk of speaking before the people who 1067 01:06:09,950 --> 01:06:14,960 actually did the work here and embarrassing myself. 1068 01:06:14,960 --> 01:06:18,470 So this is modeling mistrust in end-of-life care, 1069 01:06:18,470 --> 01:06:22,190 and it's based on Willie's master's thesis 1070 01:06:22,190 --> 01:06:25,280 and on some papers that came as a result of that. 1071 01:06:27,800 --> 01:06:32,960 So here's the interesting data. 1072 01:06:32,960 --> 01:06:35,990 If you look at African-American patients, 1073 01:06:35,990 --> 01:06:42,260 and these are patients in the MIMIC data set, what you find 1074 01:06:42,260 --> 01:06:47,720 is that for mechanical ventilation, 1075 01:06:47,720 --> 01:06:50,180 blacks are on mechanical ventilation 1076 01:06:50,180 --> 01:06:54,110 a lot longer than whites on average, 1077 01:06:54,110 --> 01:06:56,660 and there's a pretty decent separation 1078 01:06:56,660 --> 01:07:00,230 at the P equal 0.05 level, so 1/2% 1079 01:07:00,230 --> 01:07:03,330 level between those two populations. 1080 01:07:03,330 --> 01:07:07,220 So there's something going on where black patients are 1081 01:07:07,220 --> 01:07:11,480 kept on mechanical ventilation longer than white patients. 1082 01:07:11,480 --> 01:07:14,270 Now, of course, we don't know exactly why. 1083 01:07:14,270 --> 01:07:16,100 We don't know whether it's because there 1084 01:07:16,100 --> 01:07:19,250 is a physiological difference, or because it 1085 01:07:19,250 --> 01:07:21,320 has something to do with their insurance, 1086 01:07:21,320 --> 01:07:23,120 or because God knows. 
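For reference, the kind of two-group comparison behind these p-values can be sketched as below, here with a nonparametric test since durations are skewed; the duration arrays are synthetic stand-ins, not values from MIMIC or eICU.

```python
# Compare ventilation duration between two groups with a Mann-Whitney U test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
vent_black = rng.lognormal(mean=1.3, sigma=0.9, size=300)    # hours on the ventilator
vent_white = rng.lognormal(mean=1.1, sigma=0.9, size=1500)

u, p = stats.mannwhitneyu(vent_black, vent_white, alternative="two-sided")
print(f"median black {np.median(vent_black):.1f} h, "
      f"median white {np.median(vent_white):.1f} h, p = {p:.3g}")
```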
1087 01:07:23,120 --> 01:07:26,100 It could be any of a lot of different factors, 1088 01:07:26,100 --> 01:07:27,770 but that's the case. 1089 01:07:27,770 --> 01:07:30,390 The eICU data set we've mentioned, 1090 01:07:30,390 --> 01:07:34,430 it's a larger, but less detailed data set, also of 1091 01:07:34,430 --> 01:07:39,380 intensive care patients, that was donated to Roger Mark's lab 1092 01:07:39,380 --> 01:07:41,960 by Philips Corporation. 1093 01:07:41,960 --> 01:07:45,410 And there, we see, again, a separation 1094 01:07:45,410 --> 01:07:49,430 of mechanical ventilation duration roughly 1095 01:07:49,430 --> 01:07:52,250 comparable to what we saw in the MIMIC data set. 1096 01:07:52,250 --> 01:07:55,380 So these are consistent with each other. 1097 01:07:55,380 --> 01:07:59,450 On the other hand, if you look at the use of vasopressors, 1098 01:07:59,450 --> 01:08:04,250 blacks versus whites, at the P equal 0.12 level, 1099 01:08:04,250 --> 01:08:05,930 you say, well, there's a little bit 1100 01:08:05,930 --> 01:08:07,730 of evidence, but not strong enough 1101 01:08:07,730 --> 01:08:09,650 to reach any conclusions. 1102 01:08:09,650 --> 01:08:14,480 Or in the eICU data, P equal 0.42 1103 01:08:14,480 --> 01:08:17,899 is clearly quite insignificant, so we're not 1104 01:08:17,899 --> 01:08:20,029 making any claims there. 1105 01:08:20,029 --> 01:08:22,729 So the question that Willie was asking, 1106 01:08:22,729 --> 01:08:27,300 which I think is a really good question, is, 1107 01:08:27,300 --> 01:08:33,410 could this difference be due not to physiological differences 1108 01:08:33,410 --> 01:08:37,069 or even these sort of socioeconomic or social 1109 01:08:37,069 --> 01:08:41,300 differences, but to a difference in the degree of trust 1110 01:08:41,300 --> 01:08:45,500 between the patient and their doctors? 1111 01:08:45,500 --> 01:08:48,010 It's an interesting idea. 1112 01:08:48,010 --> 01:08:51,640 And of course, I wouldn't be telling you about this 1113 01:08:51,640 --> 01:08:54,580 if the answer were no. 1114 01:08:54,580 --> 01:08:58,300 And so the approach that he took was 1115 01:08:58,300 --> 01:09:02,260 to look for cases where there's clearly mistrust. 1116 01:09:02,260 --> 01:09:06,580 So there are red flags if you read the notes. 1117 01:09:06,580 --> 01:09:09,670 For example, if a patient leaves the hospital 1118 01:09:09,670 --> 01:09:14,170 against medical advice, that is a pretty good indication 1119 01:09:14,170 --> 01:09:17,979 that they don't trust the medical system. 1120 01:09:17,979 --> 01:09:22,390 If the family-- if the person dies and the family 1121 01:09:22,390 --> 01:09:26,240 refuses to allow them to do an autopsy, 1122 01:09:26,240 --> 01:09:28,689 this is another indication that maybe they 1123 01:09:28,689 --> 01:09:30,800 don't trust the medical system. 1124 01:09:30,800 --> 01:09:36,010 So there are these sort of red letter indicators of mistrust. 1125 01:09:36,010 --> 01:09:40,420 For example, patient refused to sign ICU consent 1126 01:09:40,420 --> 01:09:42,790 and expressed wishes to be do not 1127 01:09:42,790 --> 01:09:45,880 resuscitate, do not intubate, seemingly very 1128 01:09:45,880 --> 01:09:49,479 frustrated and mistrusting of the health care system, 1129 01:09:49,479 --> 01:09:52,870 also with a history of poor medication compliance 1130 01:09:52,870 --> 01:09:54,090 and follow-up. 1131 01:09:54,090 --> 01:09:56,080 So that's a pretty clear indication.
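One could imagine a very simple pattern-based labeler for those clear-cut red-flag cases, along these lines; the phrase list is an illustrative guess, not the actual patterns used in the thesis.

```python
# Flag a note if it contains an explicit indicator of mistrust.
import re

RED_FLAGS = [
    r"against medical advice",
    r"\bAMA\b",                                   # common note abbreviation
    r"refus\w* (an )?autopsy",
    r"refus\w* to sign .*consent",
    r"mistrust\w* of the (health care|medical) system",
]
pattern = re.compile("|".join(RED_FLAGS), flags=re.IGNORECASE)

def obvious_mistrust(note_text: str) -> bool:
    """True if the note contains an explicit red-flag indicator of mistrust."""
    return pattern.search(note_text) is not None

print(obvious_mistrust("Patient left against medical advice, poor compliance."))  # True
print(obvious_mistrust("Family at bedside, comfortable, hair washed."))            # False
```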
1132 01:09:56,080 --> 01:10:00,910 And you can build a relatively simple extraction 1133 01:10:00,910 --> 01:10:06,590 or interpretation model that identifies those clear cases. 1134 01:10:06,590 --> 01:10:10,190 This is what I was saying about autopsies. 1135 01:10:10,190 --> 01:10:12,890 So the problem, of course, is that not every patient 1136 01:10:12,890 --> 01:10:14,990 has such an obvious label. 1137 01:10:14,990 --> 01:10:17,130 In fact, most of them don't. 1138 01:10:17,130 --> 01:10:21,500 And so Willie's idea was, can we learn a model 1139 01:10:21,500 --> 01:10:24,530 from these obvious examples and then 1140 01:10:24,530 --> 01:10:27,680 apply them to the less obvious examples 1141 01:10:27,680 --> 01:10:31,430 in order to get a kind of a bronze standard 1142 01:10:31,430 --> 01:10:37,250 or remote supervision notion of a larger population that 1143 01:10:37,250 --> 01:10:42,020 has a tendency to be mistrustful according to our model 1144 01:10:42,020 --> 01:10:46,850 without having as explicit a clear case of mistrust, 1145 01:10:46,850 --> 01:10:49,810 as in those examples. 1146 01:10:49,810 --> 01:10:55,300 And so if you look at chart events in MIMIC, for example, 1147 01:10:55,300 --> 01:10:59,380 you discover that associated with those cases 1148 01:10:59,380 --> 01:11:04,480 of obvious mistrust are features like the person was 1149 01:11:04,480 --> 01:11:06,370 in restraints. 1150 01:11:06,370 --> 01:11:09,040 They were literally locked down to their bed 1151 01:11:09,040 --> 01:11:11,890 because the nurses were afraid they would 1152 01:11:11,890 --> 01:11:15,280 get up and do something bad. 1153 01:11:15,280 --> 01:11:17,710 Not necessarily like attack a nurse, 1154 01:11:17,710 --> 01:11:22,810 but more like fall out of bed or go wandering off the floor 1155 01:11:22,810 --> 01:11:25,180 or something like that. 1156 01:11:25,180 --> 01:11:29,920 If a person is in pain, that correlated with these mistrust 1157 01:11:29,920 --> 01:11:31,480 measures as well. 1158 01:11:31,480 --> 01:11:36,280 And conversely, if you saw that somebody had their hair washed 1159 01:11:36,280 --> 01:11:40,130 or that there was a discussion of their status and comfort, 1160 01:11:40,130 --> 01:11:43,420 then they were probably less likely to be 1161 01:11:43,420 --> 01:11:46,460 mistrustful of the system. 1162 01:11:46,460 --> 01:11:49,120 And so the approach that Willie took 1163 01:11:49,120 --> 01:11:54,100 was to say, well, let's code these 620 binary indicators 1164 01:11:54,100 --> 01:11:57,370 of trust and build a logistic regression 1165 01:11:57,370 --> 01:12:00,940 model to the labeled examples and then 1166 01:12:00,940 --> 01:12:04,420 apply it to the unlabeled examples of people for whom 1167 01:12:04,420 --> 01:12:07,540 we don't have such a clear indication, 1168 01:12:07,540 --> 01:12:11,380 and this gives us another population of people who 1169 01:12:11,380 --> 01:12:14,410 are likely to be mistrustful and therefore, 1170 01:12:14,410 --> 01:12:19,260 enough people that we can do further analysis on it. 1171 01:12:19,260 --> 01:12:22,730 So if you look at the mistrust metrics, 1172 01:12:22,730 --> 01:12:27,050 you have things like if the patient is agitated 1173 01:12:27,050 --> 01:12:32,690 on some agitation scale, they're more likely to be mistrustful. 1174 01:12:32,690 --> 01:12:35,180 If, conversely, they're alert, they're 1175 01:12:35,180 --> 01:12:37,600 less likely to be mistrustful. 
1176 01:12:37,600 --> 01:12:40,340 So that means they're in some better mental shape. 1177 01:12:40,340 --> 01:12:42,290 If they're not in pain, they're less 1178 01:12:42,290 --> 01:12:46,050 likely to be mistrustful, et cetera. 1179 01:12:46,050 --> 01:12:51,860 And if the patient was restrained-- 1180 01:12:51,860 --> 01:12:56,780 trustful patients have no pain, 1181 01:12:56,780 --> 01:13:01,130 or they have a spokesperson who is their health care proxy, 1182 01:13:01,130 --> 01:13:03,860 or there is a lot of family communication, 1183 01:13:03,860 --> 01:13:09,770 but conversely, if restraints had to be reapplied, 1184 01:13:09,770 --> 01:13:15,710 or if there are various other factors, then 1185 01:13:15,710 --> 01:13:19,390 they're more likely to be mistrustful. 1186 01:13:19,390 --> 01:13:27,010 So if you look at that prediction, what you find 1187 01:13:27,010 --> 01:13:30,670 is that for both predicting the use of mechanical ventilation 1188 01:13:30,670 --> 01:13:36,040 and vasopressors, the disparity between a population 1189 01:13:36,040 --> 01:13:39,730 of black and white patients is actually 1190 01:13:39,730 --> 01:13:43,480 less significant than the disparity 1191 01:13:43,480 --> 01:13:48,920 between a population of high trust and low trust patients. 1192 01:13:48,920 --> 01:13:53,260 So what this suggests is that the fundamental feature here 1193 01:13:53,260 --> 01:13:55,480 that may be leading to that difference 1194 01:13:55,480 --> 01:13:58,780 is, in fact, not race, but is something 1195 01:13:58,780 --> 01:14:01,840 that correlates with race because blacks 1196 01:14:01,840 --> 01:14:03,850 are more likely to be distrustful 1197 01:14:03,850 --> 01:14:06,070 of the medical system than whites. 1198 01:14:06,070 --> 01:14:07,808 Now, why might that be? 1199 01:14:07,808 --> 01:14:09,100 What do you know about history? 1200 01:14:12,760 --> 01:14:15,580 I mean, you took the CITI training course 1201 01:14:15,580 --> 01:14:19,660 that had you read the Belmont Report talking about things 1202 01:14:19,660 --> 01:14:22,570 like the Tuskegee experiment. 1203 01:14:22,570 --> 01:14:26,410 I'm sure that leaves a significant impression 1204 01:14:26,410 --> 01:14:30,100 in people's minds about how the health care system is going 1205 01:14:30,100 --> 01:14:33,660 to treat people of their race. 1206 01:14:33,660 --> 01:14:34,670 I'm Jewish. 1207 01:14:34,670 --> 01:14:39,070 My mother barely lived through Auschwitz, 1208 01:14:39,070 --> 01:14:42,940 and so I understand some of the strong family 1209 01:14:42,940 --> 01:14:45,760 feelings that happened as a result of some 1210 01:14:45,760 --> 01:14:47,740 of these historical events. 1211 01:14:47,740 --> 01:14:52,600 And there were medical people doing experiments on prisoners 1212 01:14:52,600 --> 01:14:55,370 in the concentration camps as well, 1213 01:14:55,370 --> 01:14:58,390 so I would expect that people in my status 1214 01:14:58,390 --> 01:15:03,860 might also have similar issues of mistrust. 1215 01:15:03,860 --> 01:15:06,420 Now, it turns out, you might ask, well, 1216 01:15:06,420 --> 01:15:10,220 is mistrust, in fact, just a proxy for severity? 1217 01:15:10,220 --> 01:15:13,490 Are sicker people simply more mistrustful, 1218 01:15:13,490 --> 01:15:17,390 and is what we're seeing just a reflection of the fact 1219 01:15:17,390 --> 01:15:18,740 that they're sicker? 1220 01:15:18,740 --> 01:15:21,030 And the answer seems to be, not so much.
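A minimal sketch of that labeling-then-scoring pipeline, together with the severity sanity check that comes next, might look like the following; the binary indicator matrix, labels, and severity scores are synthetic placeholders (the real features were roughly 620 chart-event indicators).

```python
# Fit a logistic regression on the patients with obvious red-flag labels, score
# everyone else to get a "bronze standard" mistrust score, then check whether
# that score merely tracks severity.
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 3000, 620
X = rng.integers(0, 2, size=(n, d))              # binary chart-event indicators
labeled = rng.uniform(0, 1, n) < 0.2             # patients with an obvious label
y_obvious = rng.integers(0, 2, labeled.sum())    # 1 = clear mistrust, 0 = clear trust

clf = LogisticRegression(max_iter=1000)
clf.fit(X[labeled], y_obvious)

# bronze standard: model-derived mistrust score for the unlabeled patients
mistrust_score = clf.predict_proba(X[~labeled])[:, 1]

# Is mistrust just a proxy for severity? Correlate against a severity score.
severity = rng.normal(size=(~labeled).sum())     # stand-in for OASIS or SAPS-II
rho, p = stats.spearmanr(mistrust_score, severity)
print(f"Spearman rho between mistrust score and severity: {rho:.2f} (p = {p:.2g})")
```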
1221 01:15:21,030 --> 01:15:27,620 So if you look at these severity scores like OASIS and SAPS 1222 01:15:27,620 --> 01:15:32,240 and look at their correlation with noncompliance in autopsy, 1223 01:15:32,240 --> 01:15:34,370 those are pretty low correlation values, 1224 01:15:34,370 --> 01:15:39,360 so they're not explanatory of this phenomenon. 1225 01:15:39,360 --> 01:15:43,440 And then in the population, you see that, again, there 1226 01:15:43,440 --> 01:15:48,230 is a significant difference in sentiment 1227 01:15:48,230 --> 01:15:56,330 expressed in the notes between black and white patients. 1228 01:15:56,330 --> 01:16:00,560 The autopsy derived mistrust metrics don't 1229 01:16:00,560 --> 01:16:05,070 show a strong relationship, a strong difference between them, 1230 01:16:05,070 --> 01:16:11,340 but the noncompliance derived mistrust metrics do. 1231 01:16:11,340 --> 01:16:14,140 So I'm out of time. 1232 01:16:14,140 --> 01:16:17,390 I'll just leave you with a final word. 1233 01:16:17,390 --> 01:16:21,690 There is a lot more work that needs to be done in this area, 1234 01:16:21,690 --> 01:16:27,180 and it's a very rich area both for technical work 1235 01:16:27,180 --> 01:16:30,930 and for trying to understand what the desiderata are 1236 01:16:30,930 --> 01:16:34,500 and how to match them to the technical capabilities. 1237 01:16:34,500 --> 01:16:37,860 There are these various conferences. 1238 01:16:37,860 --> 01:16:41,100 One of the people active in this area, one 1239 01:16:41,100 --> 01:16:45,960 of the pairs of people, Mike Kearns and Aaron Roth at Penn 1240 01:16:45,960 --> 01:16:49,180 are coming out with a book called The Ethical Algorithm, 1241 01:16:49,180 --> 01:16:51,510 which is coming out this fall. 1242 01:16:51,510 --> 01:16:52,920 It's a popular pressbook. 1243 01:16:52,920 --> 01:16:55,650 I've not read it, but it looks like it 1244 01:16:55,650 --> 01:16:57,750 should be quite interesting. 1245 01:16:57,750 --> 01:17:00,720 And then we're starting to see whole classes 1246 01:17:00,720 --> 01:17:04,260 in fairness popping up at different universities. 1247 01:17:04,260 --> 01:17:08,460 University of Pennsylvania has the science of Data ethics, 1248 01:17:08,460 --> 01:17:11,430 and I've mentioned already this fairness in machine learning 1249 01:17:11,430 --> 01:17:13,510 class at Berkeley. 1250 01:17:13,510 --> 01:17:16,930 This is, in fact, one of the topics we've talked about. 1251 01:17:16,930 --> 01:17:19,260 I'm on a committee that is planning 1252 01:17:19,260 --> 01:17:21,120 the activities of the new Schwarzman 1253 01:17:21,120 --> 01:17:24,510 College of Computing, and this notion 1254 01:17:24,510 --> 01:17:27,990 of infusing ideas about fairness and ethics 1255 01:17:27,990 --> 01:17:31,230 into the technical curriculum is one of the things 1256 01:17:31,230 --> 01:17:32,920 that we've been discussing. 1257 01:17:32,920 --> 01:17:35,020 The college obviously hasn't started yet, 1258 01:17:35,020 --> 01:17:38,700 so we don't have anything other than this lecture 1259 01:17:38,700 --> 01:17:41,280 and a few other things like that in the works, 1260 01:17:41,280 --> 01:17:45,950 but the plan is there to expand more in this area.