1
00:00:04,810 --> 00:00:08,177
[MUSIC PLAYING]

2
00:00:11,063 --> 00:00:13,030
AMIT GANDHI: Hi, my
name is Amit Gandhi.

3
00:00:13,030 --> 00:00:15,093
And I'm a graduate
researcher at MIT.

4
00:00:15,093 --> 00:00:16,510
Welcome to the
series on exploring

5
00:00:16,510 --> 00:00:19,240
fairness and machine learning
for international development.

6
00:00:19,240 --> 00:00:21,130
In this module, we
will walk through an example

7
00:00:21,130 --> 00:00:23,692
of how an organization would
go about implementing machine

8
00:00:23,692 --> 00:00:25,900
learning and some of
the ethical challenges

9
00:00:25,900 --> 00:00:28,580
that may arise.

10
00:00:28,580 --> 00:00:30,830
This module primarily
focuses on decisions

11
00:00:30,830 --> 00:00:32,820
that are made at the
organizational level.

12
00:00:32,820 --> 00:00:35,930
But it is important for both
organizational decision makers

13
00:00:35,930 --> 00:00:37,370
and machine learning
implementers

14
00:00:37,370 --> 00:00:39,960
to consider these interactions.

15
00:00:39,960 --> 00:00:41,520
In this case study,
we will be taking

16
00:00:41,520 --> 00:00:44,670
the role of a chief technology
officer of a social enterprise

17
00:00:44,670 --> 00:00:47,450
that provides solar lighting
products in East Africa.

18
00:00:47,450 --> 00:00:49,950
The mission of the company is
to provide affordable lighting

19
00:00:49,950 --> 00:00:51,793
solutions to people
living in poverty.

20
00:00:51,793 --> 00:00:53,460
And the company started
off by providing

21
00:00:53,460 --> 00:00:56,250
high-quality, inexpensive
solar lights as a replacement

22
00:00:56,250 --> 00:00:59,020
for kerosene lanterns.

23
00:00:59,020 --> 00:01:02,130
Over time, the company has
grown and increased its product

24
00:01:02,130 --> 00:01:05,760
offering to include large solar
home systems and along the way

25
00:01:05,760 --> 00:01:08,250
has implemented pay-as-you-go
models so that households can

26
00:01:08,250 --> 00:01:10,830
afford to purchase
these larger systems.

27
00:01:10,830 --> 00:01:13,530
The way pay-as-you-go models
work is that you provide

28
00:01:13,530 --> 00:01:15,440
the solar lighting
infrastructure as a loaned

29
00:01:15,440 --> 00:01:16,497
asset to individuals.

30
00:01:16,497 --> 00:01:19,080
And they pay you back over time
through mobile money payments,

31
00:01:19,080 --> 00:01:23,350
until the full value of
the asset is recovered.
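
A minimal sketch of how that repayment flow might be tracked in code; the class name, field names, and dollar amounts below are hypothetical, not the company's actual system:

```python
from dataclasses import dataclass, field

@dataclass
class PayGoLoan:
    """A solar home system provided as a loaned asset (illustrative only)."""
    asset_value: float                 # full value of the loaned system
    payments: list = field(default_factory=list)

    def record_payment(self, amount: float) -> None:
        """Record one mobile-money payment toward the asset."""
        self.payments.append(amount)

    @property
    def balance(self) -> float:
        """Amount still owed; the loan closes when this reaches zero."""
        return max(self.asset_value - sum(self.payments), 0.0)

# Example: a household pays down a hypothetical $120 system over time.
loan = PayGoLoan(asset_value=120.0)
for amount in [10.0, 10.0, 15.0]:
    loan.record_payment(amount)
print(loan.balance)  # 85.0 of the asset's value still to be recovered
```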

32
00:01:23,350 --> 00:01:25,730
The company has been meticulous
about keeping records

33
00:01:25,730 --> 00:01:27,650
of transactions
from their user base.

34
00:01:27,650 --> 00:01:30,580
And as a result, you have access
to both demographic information

35
00:01:30,580 --> 00:01:33,263
and payment history for
all of your clients.

36
00:01:33,263 --> 00:01:34,930
The information you
have from your users

37
00:01:34,930 --> 00:01:37,600
includes age, gender,
occupation, location,

38
00:01:37,600 --> 00:01:39,270
and household income.

39
00:01:39,270 --> 00:01:41,020
As you look at expanding
the social impact

40
00:01:41,020 --> 00:01:43,120
of your enterprise, you
realize that this data

41
00:01:43,120 --> 00:01:45,400
can be analyzed to
determine a creditworthiness

42
00:01:45,400 --> 00:01:47,300
metric for your customers.
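
As a rough sketch of what that analysis could look like, one might train a simple classifier on the demographic and repayment features described above; the file name, column names, and choice of logistic regression are assumptions for illustration, not the company's actual pipeline:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical schema mirroring the data described in the case study:
# demographics plus repayment history, with a default label per client.
df = pd.read_csv("clients.csv")  # assumed file; one row per client
features = ["age", "gender", "occupation", "location", "household_income",
            "on_time_payment_rate", "avg_payment_amount"]
X = pd.get_dummies(df[features])   # one-hot encode categorical columns
y = df["defaulted"]                # 1 if the client defaulted on the loan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# The predicted default probability can serve as the creditworthiness metric.
credit_scores = model.predict_proba(X_test)[:, 1]
```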

43
00:01:47,300 --> 00:01:49,300
Additionally, you could
provide this information

44
00:01:49,300 --> 00:01:51,580
to banks or microfinance
institutions

45
00:01:51,580 --> 00:01:55,500
so that they can give
loans to your client base.

46
00:01:55,500 --> 00:01:57,210
Machine learning
is a powerful tool

47
00:01:57,210 --> 00:01:59,610
that you can use to implement
this credit scoring metric.

48
00:01:59,610 --> 00:02:01,747
However, you do not have
data scientists or machine

49
00:02:01,747 --> 00:02:03,330
learning experts
within your team that

50
00:02:03,330 --> 00:02:05,340
can implement this solution.

51
00:02:05,340 --> 00:02:07,290
You also do not know
how accurate or powerful

52
00:02:07,290 --> 00:02:08,836
an algorithm you
develop could be.

53
00:02:08,836 --> 00:02:10,919
So you do not want to spend
the resources to build

54
00:02:10,919 --> 00:02:13,290
a full team, and instead decide
on a small pilot with some

55
00:02:13,290 --> 00:02:16,010
of your users in Uganda.

56
00:02:16,010 --> 00:02:18,227
As a resourceful company
with engineering staff,

57
00:02:18,227 --> 00:02:20,060
you could either have
some of your engineers

58
00:02:20,060 --> 00:02:21,770
implement a machine
learning solution

59
00:02:21,770 --> 00:02:25,670
using off-the-shelf products or
work with a third-party company

60
00:02:25,670 --> 00:02:29,670
to implement the solution
for credit scoring.

61
00:02:29,670 --> 00:02:31,530
Let's pause this case
study for a second

62
00:02:31,530 --> 00:02:33,660
and examine the pros and
cons of the decisions that

63
00:02:33,660 --> 00:02:34,830
need to be made.

64
00:02:34,830 --> 00:02:36,600
It is important to
consider perspectives

65
00:02:36,600 --> 00:02:38,970
from both the machine
learning implementer

66
00:02:38,970 --> 00:02:40,950
and the organization to
understand the thoughts

67
00:02:40,950 --> 00:02:44,010
and complexities that go
into developing a solution.

68
00:02:44,010 --> 00:02:46,770
Doing it in-house without
a trained data scientist

69
00:02:46,770 --> 00:02:48,870
will likely involve
implementation of a black-box

70
00:02:48,870 --> 00:02:49,853
solution.

71
00:02:49,853 --> 00:02:52,020
While someone with no
background in machine learning

72
00:02:52,020 --> 00:02:54,480
could get a solution up
and running fairly quickly,

73
00:02:54,480 --> 00:02:58,640
there are several nuances in the
design that may get overlooked.

74
00:02:58,640 --> 00:03:00,260
Allowing a third-party
consultant

75
00:03:00,260 --> 00:03:03,203
to implement your solution would
solve many of these issues,

76
00:03:03,203 --> 00:03:05,120
though you may lack the
in-house capability both

77
00:03:05,120 --> 00:03:07,520
to understand how your
model is being implemented

78
00:03:07,520 --> 00:03:10,960
and to maintain it moving forward.

79
00:03:10,960 --> 00:03:13,600
Let's assume that one way or
another, the credit scoring

80
00:03:13,600 --> 00:03:15,280
algorithm gets built.

81
00:03:15,280 --> 00:03:17,500
Without paying attention
to fairness in this setup,

82
00:03:17,500 --> 00:03:19,270
several issues may arise.

83
00:03:19,270 --> 00:03:22,653
First, you may find, as you
analyze your historical data,

84
00:03:22,653 --> 00:03:24,820
that certain groups of
people have different default

85
00:03:24,820 --> 00:03:26,820
rates than others.

86
00:03:26,820 --> 00:03:30,010
For example, women may have a
lower default rate than men.

87
00:03:30,010 --> 00:03:32,098
And you may decide that
as an organization,

88
00:03:32,098 --> 00:03:34,140
you want to be fair and
gender-blind in your loan

89
00:03:34,140 --> 00:03:35,960
determination.

90
00:03:35,960 --> 00:03:38,500
The slide shows an example of
what different loan rates look

91
00:03:38,500 --> 00:03:42,130
like for men and women.

92
00:03:42,130 --> 00:03:44,320
To implement fairness,
a naive implementer

93
00:03:44,320 --> 00:03:47,020
may first try to use fairness
through unawareness, which

94
00:03:47,020 --> 00:03:48,940
means that you simply
hide gender information

95
00:03:48,940 --> 00:03:51,150
while building your models.

96
00:03:51,150 --> 00:03:53,280
Depending on correlations
within your data

97
00:03:53,280 --> 00:03:55,537
and how relevant gender
is to default rates,

98
00:03:55,537 --> 00:03:57,120
your model could
still predict gender

99
00:03:57,120 --> 00:03:59,590
and use that in the model.
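
One way to check for that kind of leakage is to test how well the supposedly gender-blind features can reconstruct gender; a sketch, reusing the hypothetical clients.csv schema from the earlier example:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("clients.csv")  # hypothetical schema, as before

# Fairness through unawareness: gender is excluded from the features here.
X_blind = pd.get_dummies(df[["age", "occupation", "location",
                             "household_income", "on_time_payment_rate",
                             "avg_payment_amount"]])

# Audit: can gender be recovered from the remaining features? Accuracy
# well above the majority-class baseline means occupation, location, etc.
# act as proxies, so the model is not really gender-blind.
leakage = cross_val_score(LogisticRegression(max_iter=1000),
                          X_blind, df["gender"], cv=5).mean()
print(f"Gender recoverable from 'blind' features: {leakage:.2f} accuracy")
```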

100
00:03:59,590 --> 00:04:02,530
Second, since your data shows
a difference in default rates,

101
00:04:02,530 --> 00:04:05,650
you have to actively decide
how to correct for that.

102
00:04:05,650 --> 00:04:07,450
In the case of loans,
different approaches

103
00:04:07,450 --> 00:04:09,950
to implement fairness may have
a trade-off with the accuracy

104
00:04:09,950 --> 00:04:12,220
of your algorithms.
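
One simple correction, sketched below with synthetic numbers, is to apply group-specific score thresholds so that approval rates match; the thresholds and simulated score distributions are illustrative only:

```python
import numpy as np

def approval_rates(scores, groups, thresholds):
    """Approve when predicted default risk falls below the group's threshold."""
    return {g: float((scores[groups == g] < t).mean())
            for g, t in thresholds.items()}

# Synthetic risk scores in which women default less often, mirroring the
# pattern described above; real scores would come from the trained model.
rng = np.random.default_rng(0)
groups = rng.choice(["men", "women"], size=1000)
scores = np.where(groups == "women",
                  rng.beta(2, 6, size=1000),   # lower simulated risk
                  rng.beta(3, 5, size=1000))   # higher simulated risk

# One shared threshold approves the two groups at different rates...
print(approval_rates(scores, groups, {"men": 0.4, "women": 0.4}))
# ...while per-group thresholds can equalize approval rates, at the cost
# of approving some higher-risk and rejecting some lower-risk applicants.
print(approval_rates(scores, groups, {"men": 0.47, "women": 0.33}))
```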

105
00:04:12,220 --> 00:04:15,130
Third, the type of algorithm
the implementer uses

106
00:04:15,130 --> 00:04:16,870
could have trade-offs as well.

107
00:04:16,870 --> 00:04:19,720
Some algorithms may be faster
at the cost of accuracy.

108
00:04:19,720 --> 00:04:22,330
Others may be more accurate
at the cost of explainability

109
00:04:22,330 --> 00:04:26,170
or understandability.

110
00:04:26,170 --> 00:04:27,910
I won't go more in-depth
on these topics,

111
00:04:27,910 --> 00:04:30,610
because we will discuss
them more in future modules.

112
00:04:30,610 --> 00:04:34,220
However, I do want to highlight
a couple of important concepts.

113
00:04:34,220 --> 00:04:36,580
First, implementing a
machine learning algorithm

114
00:04:36,580 --> 00:04:38,760
is not an objective process.

115
00:04:38,760 --> 00:04:41,680
In your implementation, you
are both designing a technology

116
00:04:41,680 --> 00:04:43,690
and making decisions,
both of which

117
00:04:43,690 --> 00:04:46,770
introduce your biases
into the system.

118
00:04:46,770 --> 00:04:49,350
To think that outcomes from
a computer are objective

119
00:04:49,350 --> 00:04:51,500
is just a fantasy.

120
00:04:51,500 --> 00:04:53,690
Second, open
communication between you

121
00:04:53,690 --> 00:04:56,440
and the implementer about your
values as an organization

122
00:04:56,440 --> 00:04:59,900
and the decisions they
are making is critical.

123
00:04:59,900 --> 00:05:03,080
Third, you need a way to audit
your data and your algorithms

124
00:05:03,080 --> 00:05:06,240
if you want to
have a fair system.
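
A minimal audit might compare decisions and outcomes across groups on held-out data; the column names below are hypothetical:

```python
import pandas as pd

def audit_by_group(results: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group summary of loan decisions versus observed outcomes."""
    return results.groupby(group_col).agg(
        n=("approved", "size"),
        approval_rate=("approved", "mean"),
        default_rate=("defaulted", "mean"),
    )

# Hypothetical held-out results: one row per applicant.
results = pd.DataFrame({
    "gender":    ["f", "f", "m", "m", "m", "f"],
    "approved":  [1, 1, 0, 1, 0, 1],
    "defaulted": [0, 0, 0, 1, 0, 0],
})
print(audit_by_group(results, "gender"))  # flag large gaps between groups
```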

125
00:05:06,240 --> 00:05:08,870
Let's move on, assuming you were
able to work with a consultant

126
00:05:08,870 --> 00:05:11,407
to build a satisfactory
version of your algorithm.

127
00:05:11,407 --> 00:05:13,490
And you're able to demonstrate
significant success

128
00:05:13,490 --> 00:05:15,670
with your pilot
in western Uganda.

129
00:05:15,670 --> 00:05:18,150
You now want to scale your
model to other parts of Uganda

130
00:05:18,150 --> 00:05:19,730
and East Africa.

131
00:05:19,730 --> 00:05:21,890
At this point, it is
important to pay attention

132
00:05:21,890 --> 00:05:24,500
to the representativeness
of your data.

133
00:05:24,500 --> 00:05:26,750
Are there large differences
between the types of users

134
00:05:26,750 --> 00:05:29,450
you have in western
Uganda and eastern Uganda?

135
00:05:29,450 --> 00:05:31,967
How about the users in
Uganda and Tanzania?

136
00:05:31,967 --> 00:05:34,550
You need to make sure that you
are collecting representative

137
00:05:34,550 --> 00:05:36,740
data as you scale
your solution, which

138
00:05:36,740 --> 00:05:39,502
involves significant
testing and auditing.
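
One concrete representativeness test is to compare feature distributions between the pilot region and each new region before trusting the model there; a sketch using a two-sample Kolmogorov-Smirnov test on simulated incomes (the numbers are made up for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Simulated household incomes for the pilot region and a candidate region.
income_western_uganda = rng.lognormal(mean=6.0, sigma=0.5, size=500)
income_tanzania       = rng.lognormal(mean=6.4, sigma=0.7, size=500)

# A small p-value flags a distribution shift: the model was trained on
# data that may not represent the new population, so it should be
# re-validated (and possibly retrained) before deployment there.
stat, p_value = ks_2samp(income_western_uganda, income_tanzania)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.4g}")
```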

139
00:05:39,502 --> 00:05:40,960
Additionally, you
want to make sure

140
00:05:40,960 --> 00:05:42,670
that changes within
your population

141
00:05:42,670 --> 00:05:44,450
do not suddenly
affect your results.

142
00:05:44,450 --> 00:05:47,590
For example, if a kerosene tax
were imposed by the government,

143
00:05:47,590 --> 00:05:49,750
would your model no
longer be accurate?

144
00:05:49,750 --> 00:05:52,277
How could you build in support
within your organization

145
00:05:52,277 --> 00:05:53,860
to make sure you can
react to changes?

146
00:05:57,300 --> 00:05:59,510
Thank you for taking the
time to take this course.

147
00:05:59,510 --> 00:06:01,427
We hope that you'll
continue to watch the rest

148
00:06:01,427 --> 00:06:03,650
of the modules in the series.

149
00:06:03,650 --> 00:06:07,000
[MUSIC PLAYING]