AMIT GANDHI: Hi, my name is Amit Gandhi, and I'm a graduate researcher at MIT. Welcome to the series on exploring fairness and machine learning for international development. In this module, we will walk through an example of how an organization would go about implementing machine learning and some of the ethical challenges that may arise.

This module primarily focuses on decisions that are made at the organizational level, but it is important for both organizational decision makers and machine learning implementers to consider these interactions.

In this case study, we will be taking the role of the chief technology officer of a social enterprise that provides solar lighting products in East Africa. The mission of the company is to provide affordable lighting solutions to people living in poverty. The company started off by providing high-quality, inexpensive solar lights as a replacement for kerosene lanterns. Over time, the company has grown and expanded its product offering to include larger solar home systems, and along the way it has implemented pay-as-you-go models so that households can afford to purchase these larger systems. The way pay-as-you-go models work is that you provide the solar lighting infrastructure as a loaned asset to individuals, and they pay you back over time through mobile money payments until the full value of the asset is recovered.

The company has been meticulous about keeping records of transactions from its user base, and as a result, you have access to both demographic information and payment history for all of your clients. The information you have from your users includes age, gender, occupation, location, and household income. As you look at expanding the social impact of your enterprise, you realize that this data can be analyzed to determine a creditworthiness metric for your customers. Additionally, you could provide this information to banks or microfinance institutions so that they can give loans to your client base.
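To make this concrete, here is a minimal sketch of how payment history might be turned into inputs for a creditworthiness metric. The schema and numbers are hypothetical placeholders, not fields from the company's actual records, and the features shown are only one of many reasonable choices.

```python
# Minimal sketch (hypothetical schema and data): derive simple repayment
# features per customer from pay-as-you-go mobile money records.
import pandas as pd

# Hypothetical payment ledger: one row per scheduled installment.
payments = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "due_date":  pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01",
                                 "2023-01-01", "2023-02-01"]),
    "paid_date": pd.to_datetime(["2023-01-01", "2023-02-10", "2023-03-02",
                                 "2023-01-15", None]),
    "amount_due":  [10.0, 10.0, 10.0, 15.0, 15.0],
    "amount_paid": [10.0, 10.0, 10.0, 15.0, 0.0],
})

# How late each installment was paid (NaN if never paid).
payments["days_late"] = (payments["paid_date"] - payments["due_date"]).dt.days
payments["on_time"] = payments["days_late"] <= 0
payments["missed"] = payments["paid_date"].isna()

# One feature row per customer, ready to feed into a scoring model.
features = payments.groupby("customer_id").agg(
    on_time_rate=("on_time", "mean"),
    avg_days_late=("days_late", "mean"),
    missed_payments=("missed", "sum"),
    repaid_fraction=("amount_paid", "sum"),
)
features["repaid_fraction"] /= payments.groupby("customer_id")["amount_due"].sum()
print(features)
```

Demographic fields such as age, occupation, or household income could be joined onto this table later, but as the rest of the module discusses, each added field raises its own fairness questions.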
Machine learning is a powerful tool that you can use to implement this credit scoring metric. However, you do not have data scientists or machine learning experts within your team who can implement this solution. You also do not know how accurate or powerful an algorithm you developed could be, so you do not want to spend the resources to build a full team [INAUDIBLE] on a small pilot with some of your users in Uganda.

As a resourceful company with engineering staff, you could either have some of your engineers implement a machine learning solution using off-the-shelf products or work with a third-party company to implement the solution for credit scoring.

Let's pause this case study for a second and examine the pros and cons of the decisions that need to be made. It is important to consider perspectives from both the machine learning implementer and the organization to understand the thoughts and complexities that go into developing a solution. Doing it in-house without a trained data scientist will likely involve implementation of a black box solution. While someone with no background in machine learning could get a solution up and running fairly quickly, there are several nuances in the design that may get overlooked. Allowing a third-party consultant to implement your solution would solve many of these issues, though you may lack the in-house capabilities both to understand how your model is being implemented and to maintain it moving forward.

Let's assume that one way or another, the credit scoring algorithm gets built. Without paying attention to fairness in this setup, several issues may arise. First, you may find, as you analyze your historical data, that certain groups of people have different default rates than others. For example, women may have a lower default rate than men, and you may decide that as an organization, you want to be fair and gender blind in your loan determination. The slide shows an example of what different loan rates look like for men and women.
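A disparity like the one on the slide can be surfaced with a very simple tabulation of historical outcomes by group. This is a minimal sketch with made-up data; the numbers and column names are illustrative only, not figures from the case study.

```python
# Minimal sketch (hypothetical data): compare historical default rates by gender.
import pandas as pd

loans = pd.DataFrame({
    "gender":    ["F", "F", "F", "F", "M", "M", "M", "M"],
    "defaulted": [0,   0,   1,   0,   1,   0,   1,   0],
})

rates = (loans.groupby("gender")["defaulted"]
              .agg(["mean", "count"])
              .rename(columns={"mean": "default_rate", "count": "n_loans"}))
print(rates)
# If default_rate differs notably across groups, the organization has to decide
# explicitly how, or whether, its loan determinations should account for that.
```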
To implement fairness, a naive implementer may first try to use fairness through unawareness, which means that you simply hide gender information while building your models. Depending on correlations within your data and how relevant gender is to default rates, your models could still effectively predict gender and use that in the model.

Second, since your data shows a difference in default rates, you have to actively decide how to correct for that. In the case of loans, different approaches to implementing fairness may involve a trade-off with the accuracy of your algorithms.

Third, the type of algorithm the implementer uses could have trade-offs as well. Some algorithms may be faster at the cost of accuracy. Others may be more accurate at the cost of explainability or understandability.

I won't go more in-depth on these topics, because we will discuss them more in future modules. However, I do want to highlight a couple of important concepts. First, implementing a machine learning algorithm is not an objective process. In your implementation, you are both designing a technology and making decisions, both of which introduce your biases into the system. To think that outcomes from a computer are objective is just a fantasy. Second, open communication between you and the implementer about your values as an organization and the decisions they are making is critical. Third, you need a way to audit your data and your algorithms if you want to have a fair system.

Let's move on, assuming you were able to work with a consultant to build a satisfactory version of your algorithm, and you're able to demonstrate significant success with your pilot in western Uganda. You now want to scale your model to other parts of Uganda and East Africa. At this point, it is important to pay attention to the representativeness of your data. Are there large differences between the types of users you have in western Uganda and eastern Uganda? How about the users in Uganda and Tanzania?
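One way to start answering those questions is to compare feature distributions between the pilot region and a region you plan to expand into. The sketch below uses hypothetical household income samples and a two-sample Kolmogorov-Smirnov test; in practice you would run checks like this across many features and repeat them as the user base changes.

```python
# Minimal sketch (hypothetical samples): check whether a key feature is
# distributed similarly in the pilot region and a new region before scaling.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-ins for household income (local currency) from two regions.
income_western_uganda = rng.lognormal(mean=12.0, sigma=0.4, size=500)
income_eastern_uganda = rng.lognormal(mean=12.3, sigma=0.6, size=500)

stat, p_value = ks_2samp(income_western_uganda, income_eastern_uganda)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.4f}")
# A large statistic with a tiny p-value suggests the two populations differ,
# so a model trained only on pilot data may not transfer without new data.
```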
You need to make sure that you are collecting representative data as you scale your solution, which involves significant testing and auditing. Additionally, you want to make sure that changes within your population do not suddenly affect your results. For example, if a kerosene tax were imposed by the government, would your model no longer be accurate? How could you build support within your organization to make sure you can react to such changes?

Thank you for taking the time to take this course. We hope that you'll continue to watch the rest of the modules in the series.