1 00:00:04,500 --> 00:00:07,510 To build an analytics model, let us discuss the variables 2 00:00:07,510 --> 00:00:09,850 we used. 3 00:00:09,850 --> 00:00:13,280 First, we used the 13,000 diagnoses. 4 00:00:13,280 --> 00:00:18,580 These are the diagnosis codes that claims data utilize. 5 00:00:18,580 --> 00:00:21,970 There were also 22,000 different codes for procedures 6 00:00:21,970 --> 00:00:24,910 and 45,000 codes for prescription drugs. 7 00:00:24,910 --> 00:00:29,380 To work with this massive number of variables, 8 00:00:29,380 --> 00:00:32,150 we aggregated the variables as follows. 9 00:00:32,150 --> 00:00:39,500 Out of the 13,000 diagnoses, we defined 217 diagnosis groups. 10 00:00:39,500 --> 00:00:43,730 Out of the 22,000 procedures, we aggregated the data 11 00:00:43,730 --> 00:00:46,410 to develop 213 procedure groups. 12 00:00:46,410 --> 00:00:49,330 And, finally, from 45,000 prescription drugs, 13 00:00:49,330 --> 00:00:54,530 we developed 189 therapeutic groups. 14 00:00:54,530 --> 00:00:58,620 To illustrate an example of how we infer further information 15 00:00:58,620 --> 00:01:03,340 from the data, the graph here shows 16 00:01:03,340 --> 00:01:08,190 on the horizontal axis, time, and on the vertical axis, 17 00:01:08,190 --> 00:01:13,280 costs in thousands of dollars. 18 00:01:13,280 --> 00:01:23,190 So patient one is a patient who, on a monthly basis, 19 00:01:23,190 --> 00:01:29,289 has costs on the order of $10,000 to $15,000, a fairly 20 00:01:29,289 --> 00:01:32,570 significant cost but fairly constant in time. 21 00:01:32,570 --> 00:01:37,340 Patient two also has an annual cost 22 00:01:37,340 --> 00:01:39,590 of a similar size to patient one. 23 00:01:39,590 --> 00:01:45,620 But in all but the third month, the costs are almost $0. 24 00:01:45,620 --> 00:01:51,250 Whereas in the third month, the cost is about $70,000. 
25 00:01:51,250 --> 00:01:53,020 In fact, this is additional data we 26 00:01:53,020 --> 00:01:59,140 defined indicating whether the patient has 27 00:01:59,140 --> 00:02:01,560 a chronic or an acute condition. 28 00:02:01,560 --> 00:02:06,360 In addition to the initial variables, the 217 diagnosis 29 00:02:06,360 --> 00:02:10,150 groups, 189 therapeutic groups, and so forth, we also 30 00:02:10,150 --> 00:02:13,240 defined, in collaboration with medical doctors, 31 00:02:13,240 --> 00:02:17,450 269 medically-defined rules. 32 00:02:17,450 --> 00:02:20,320 For example, the first type of rule 33 00:02:20,320 --> 00:02:23,960 indicates the interaction between various illnesses. 34 00:02:23,960 --> 00:02:26,620 For example, obesity and depression. 35 00:02:36,460 --> 00:02:39,440 Then new variables regarding interaction 36 00:02:39,440 --> 00:02:41,110 between diagnosis and age. 37 00:02:41,110 --> 00:02:44,950 For example, more than 65 years old and coronary 38 00:02:44,950 --> 00:02:45,610 artery disease. 39 00:02:50,480 --> 00:02:51,690 Noncompliance with treatment. 40 00:02:51,690 --> 00:02:55,930 For example, non-fulfillment of a particular drug order. 41 00:02:55,930 --> 00:02:58,790 And, finally, illness severity. 42 00:02:58,790 --> 00:03:01,000 For example, severe depression as 43 00:03:01,000 --> 00:03:02,620 opposed to regular depression. 44 00:03:05,520 --> 00:03:09,520 And the last set of variables involves demographic information 45 00:03:09,520 --> 00:03:11,080 like gender and age. 46 00:03:15,300 --> 00:03:18,600 An important subset of the variables 47 00:03:18,600 --> 00:03:22,380 is the one related to cost. 48 00:03:22,380 --> 00:03:24,590 So rather than using costs directly, 49 00:03:24,590 --> 00:03:31,079 we bucketed costs and treated everyone in the same bucket equally. 50 00:03:31,079 --> 00:03:34,460 So we defined five buckets. 
51 00:03:34,460 --> 00:03:37,579 So the buckets were partitioned in such a way 52 00:03:37,579 --> 00:03:45,570 that 20% of all costs is in bucket five, 53 00:03:45,570 --> 00:03:49,700 20% is in bucket four, and so forth. 54 00:03:52,520 --> 00:03:58,920 So the partitions were from $0 to $3,000, from $3,000 to $8,000, 55 00:03:58,920 --> 00:04:04,000 from $8,000 to $19,000, from $19,000 to $55,000, 56 00:04:04,000 --> 00:04:06,580 and above $55,000. 57 00:04:06,580 --> 00:04:13,360 In terms of the number of patients, 58 00:04:13,360 --> 00:04:22,180 78% of the patients had costs below $3,000. 59 00:04:22,180 --> 00:04:26,010 Just to remind you, we created the buckets 60 00:04:26,010 --> 00:04:30,980 so that the total cost in each bucket was 20% of the total. 61 00:04:30,980 --> 00:04:33,840 But the number of patients in bucket one, for example, 62 00:04:33,840 --> 00:04:34,670 is very high (78%). 63 00:04:37,170 --> 00:04:41,250 Let us interpret the buckets medically. 64 00:04:41,250 --> 00:04:44,540 So this shows the various levels of risk. 65 00:04:44,540 --> 00:04:50,170 Bucket one consists of patients that have rather low risk. 66 00:04:50,170 --> 00:04:54,400 Bucket two has what is called emerging risk. 67 00:04:54,400 --> 00:04:57,460 In bucket three, moderate level of risk. 68 00:04:57,460 --> 00:04:59,230 Bucket four, high risk. 69 00:04:59,230 --> 00:05:01,880 And bucket five, very high risk. 70 00:05:01,880 --> 00:05:04,930 So from a medical perspective, buckets two and three, 71 00:05:04,930 --> 00:05:07,820 the emerging and the moderate risk patients, 72 00:05:07,820 --> 00:05:11,620 are candidates for wellness programs. 73 00:05:11,620 --> 00:05:13,920 Whereas bucket four, the high risk patients, 74 00:05:13,920 --> 00:05:16,740 are candidates for disease management programs. 75 00:05:16,740 --> 00:05:20,210 And finally bucket five, the very high risk patients, 76 00:05:20,210 --> 00:05:22,590 are candidates for case management.
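The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the names `BUCKET_EDGES`, `cost_bucket`, `risk_label`, and `program_for` are hypothetical, and the choice to place a cost that falls exactly on a boundary (for example, exactly $3,000) in the higher bucket is an assumption, since the transcript does not say how ties are handled.

```python
from bisect import bisect_right

# Upper limits of buckets 1-4 in dollars, taken from the transcript.
# Anything at or above the last edge falls in bucket 5.
BUCKET_EDGES = [3_000, 8_000, 19_000, 55_000]

# Risk levels per bucket, as described in the transcript.
RISK_LABELS = {1: "low", 2: "emerging", 3: "moderate", 4: "high", 5: "very high"}

# Candidate programs per bucket; bucket 1 has none mentioned in the transcript.
PROGRAMS = {
    2: "wellness program",
    3: "wellness program",
    4: "disease management",
    5: "case management",
}


def cost_bucket(cost):
    """Return the cost bucket (1..5) for a patient's cost in dollars.

    Assumption: a cost exactly on a boundary goes to the higher bucket.
    """
    return bisect_right(BUCKET_EDGES, cost) + 1


def risk_label(cost):
    """Return the risk level name for a patient's cost."""
    return RISK_LABELS[cost_bucket(cost)]


def program_for(cost):
    """Return the candidate program for a patient's cost, or None for bucket 1."""
    return PROGRAMS.get(cost_bucket(cost))


if __name__ == "__main__":
    for cost in (1_200, 5_000, 10_000, 30_000, 70_000):
        print(cost, cost_bucket(cost), risk_label(cost), program_for(cost))
```

Using a sorted list of edges with `bisect_right` keeps the thresholds in one place, so changing the partition only means editing `BUCKET_EDGES`.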