1 00:00:17,060 --> 00:00:19,580 ADAM MARTIN: And so I just want to say a couple sentences 2 00:00:19,580 --> 00:00:24,610 about DNA sequencing, just to finish that up. 3 00:00:24,610 --> 00:00:29,870 And so you'll remember this slide from last lecture. 4 00:00:29,870 --> 00:00:35,630 And remember, the way this Sanger technique works 5 00:00:35,630 --> 00:00:38,450 is to set up four different reactions where 6 00:00:38,450 --> 00:00:41,120 each reaction has a different one 7 00:00:41,120 --> 00:00:44,970 of these dideoxynucleotides. 8 00:00:44,970 --> 00:00:51,120 OK, so there's four reactions, each 9 00:00:51,120 --> 00:00:59,760 with different dideoxy NTP. 10 00:00:59,760 --> 00:01:04,830 And I brought along a gel that I ran a while ago, 11 00:01:04,830 --> 00:01:09,460 which is basically-- it's from sequencing gel, and you can-- 12 00:01:09,460 --> 00:01:12,550 I'll pass this around so you can take a look at it. 13 00:01:12,550 --> 00:01:16,650 So the four different lanes for each sample 14 00:01:16,650 --> 00:01:21,300 are the different dideoxynucleotide reactions. 15 00:01:21,300 --> 00:01:24,570 And what I want you to notice as that's passing around 16 00:01:24,570 --> 00:01:27,780 and you're looking at it is that the different reactions 17 00:01:27,780 --> 00:01:31,740 with the different dideoxynucleotides 18 00:01:31,740 --> 00:01:36,500 give different patterns of DNA fragment lengths. 19 00:01:36,500 --> 00:01:40,140 So there are different patterns of fragment lengths. 20 00:01:47,580 --> 00:01:52,950 And the different patterns are based on the fact-- 21 00:01:55,690 --> 00:01:59,000 this is based on the sequence, the sequence of the template, 22 00:01:59,000 --> 00:01:59,500 OK? 
23 00:02:06,180 --> 00:02:11,190 And so if we look at the example up here, what you'll see 24 00:02:11,190 --> 00:02:15,900 is that in this banding pattern for dideoxy TTP, 25 00:02:15,900 --> 00:02:18,900 you see that there's a really short fragment at the bottom 26 00:02:18,900 --> 00:02:22,830 there, and so that fragment indicates that there must be 27 00:02:22,830 --> 00:02:25,620 an A in the template sequence. 28 00:02:25,620 --> 00:02:30,240 The next fragment up would be this one in the dideoxy GTP 29 00:02:30,240 --> 00:02:35,610 lane, and that indicates that one nucleotide beyond this A 30 00:02:35,610 --> 00:02:39,120 is a C position, and so on and so forth, 31 00:02:39,120 --> 00:02:42,270 such that you can sort of order the fragments 32 00:02:42,270 --> 00:02:45,630 and see which reaction has a fragment 33 00:02:45,630 --> 00:02:49,110 and then read off a DNA sequence. 34 00:02:49,110 --> 00:02:50,880 OK, so conceptually, that's how you 35 00:02:50,880 --> 00:02:57,150 would read off the sequence of a given strand of DNA, OK? 36 00:02:57,150 --> 00:03:00,720 So you might be wondering, if now, we just read off 37 00:03:00,720 --> 00:03:05,160 sequence as a series of colors, why am I even introducing 38 00:03:05,160 --> 00:03:06,510 this technique? 39 00:03:06,510 --> 00:03:08,160 And the reason is because I think 40 00:03:08,160 --> 00:03:14,730 it's important for you as potentially future scientists 41 00:03:14,730 --> 00:03:18,870 to know that when you're faced with a problem, 42 00:03:18,870 --> 00:03:21,480 how you might discover something new. 
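The gel-reading logic described above (shortest fragment first, lane label gives the base) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's own example: the template string is made up, fragment lengths are counted from the end of the primer, and every possible termination is assumed to show up as a band.

```python
# Base-pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def termination_lengths(template, dideoxy_base):
    """Lengths of new-strand fragments that end in the given dideoxy base."""
    new_strand = [COMPLEMENT[b] for b in template]
    return [i + 1 for i, b in enumerate(new_strand) if b == dideoxy_base]

def read_gel(lanes):
    """Order bands from shortest to longest and read off the lane labels."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

# Hypothetical template, written in the order the polymerase reads it.
template = "ACGTTAGC"

# One reaction (lane) per dideoxynucleotide.
lanes = {base: termination_lengths(template, base) for base in "ACGT"}

new_strand = read_gel(lanes)
inferred_template = "".join(COMPLEMENT[b] for b in new_strand)

print(lanes)              # band positions per lane
print(new_strand)         # sequence of the synthesized strand
print(inferred_template)  # template sequence inferred from the gel
```

In this toy case the shortest band sits in the dideoxy-T lane, so the first template base is an A, and the next band sits in the dideoxy-G lane, so the following template base is a C, which mirrors the reasoning on the slide.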
43 00:03:21,480 --> 00:03:24,540 And I see the Sanger method of DNA 44 00:03:24,540 --> 00:03:29,250 sequencing as a really clever and elegant way 45 00:03:29,250 --> 00:03:33,510 in which Fred Sanger solved the problem of DNA sequencing, 46 00:03:33,510 --> 00:03:37,710 and while we don't necessarily do it that way today, it still 47 00:03:37,710 --> 00:03:40,500 illustrates a concept that's important, 48 00:03:40,500 --> 00:03:43,020 the concept of chain termination, 49 00:03:43,020 --> 00:03:46,260 and I think there is something to be learned from this older 50 00:03:46,260 --> 00:03:48,900 technique, even if it's not exactly how we 51 00:03:48,900 --> 00:03:51,750 sequence DNA today. 52 00:03:51,750 --> 00:03:54,020 So for today's lecture, we're going 53 00:03:54,020 --> 00:04:00,300 to continue on our quest to basically clone a gene that's 54 00:04:00,300 --> 00:04:01,935 responsible for a disease. 55 00:04:04,920 --> 00:04:09,960 And so we started this in the last lecture. 56 00:04:09,960 --> 00:04:12,930 And I guess one thing we would want 57 00:04:12,930 --> 00:04:15,120 to start with is a disease, so I'm 58 00:04:15,120 --> 00:04:19,860 going to introduce to you now a disease called aniridia. 59 00:04:19,860 --> 00:04:23,720 And in order to clone the gene for a disease, 60 00:04:23,720 --> 00:04:25,320 it has to be a heritable disease, 61 00:04:25,320 --> 00:04:28,590 in this case, because we're going to use linkage analysis 62 00:04:28,590 --> 00:04:30,420 to identify it. 63 00:04:30,420 --> 00:04:35,250 So aniridia is a disease that's an eye disease in humans. 64 00:04:35,250 --> 00:04:37,800 It's a rare eye disease. 65 00:04:37,800 --> 00:04:43,020 So I want to show you a bit of an example of this eye disease. 66 00:04:43,020 --> 00:04:45,180 The way this disease manifests itself 67 00:04:45,180 --> 00:04:48,930 is it's basically the affected individual 68 00:04:48,930 --> 00:04:52,380 has an eye that is lacking an iris. 
69 00:04:52,380 --> 00:04:55,230 So I'm going to show you what this looks like. 70 00:04:55,230 --> 00:04:58,950 If you're squeamish or don't like weird eyes 71 00:04:58,950 --> 00:05:02,320 and you don't want to look, you can look away. 72 00:05:02,320 --> 00:05:09,150 But I will show you the affected phenotype in 3, 2, 1, OK, 73 00:05:09,150 --> 00:05:11,910 everyone looking who wants to see weird eyes. 74 00:05:11,910 --> 00:05:13,860 OK, good. 75 00:05:13,860 --> 00:05:17,970 So that is an individual that has aniridia, and also this one. 76 00:05:17,970 --> 00:05:21,750 So you see there's no clear iris in these eyes. 77 00:05:21,750 --> 00:05:25,470 And this disease is associated with other abnormalities 78 00:05:25,470 --> 00:05:29,550 of the eye that severely impair vision. 79 00:05:29,550 --> 00:05:33,940 And this is an inherited disease, 80 00:05:33,940 --> 00:05:38,580 and this is a pedigree from a family 81 00:05:38,580 --> 00:05:42,600 or series of families where the disease is propagating. 82 00:05:42,600 --> 00:05:46,720 And so anyone have a suggestion as to what mode of inheritance 83 00:05:46,720 --> 00:05:47,220 this is? 84 00:05:54,310 --> 00:05:56,450 Anyone want to rule a mode out? 85 00:05:56,450 --> 00:05:58,010 Rachel, you have an idea? 86 00:05:58,010 --> 00:06:00,416 AUDIENCE: I was going to say X-linked dominant, 87 00:06:00,416 --> 00:06:07,890 but [INAUDIBLE] 88 00:06:07,890 --> 00:06:11,940 ADAM MARTIN: OK, so let's take X-linked dominant. 89 00:06:11,940 --> 00:06:16,950 So if it was X-linked dominant, then this male 90 00:06:16,950 --> 00:06:19,950 would have an X chromosome with the dominant allele 91 00:06:19,950 --> 00:06:24,840 of the disease and should only pass it to his females. 92 00:06:24,840 --> 00:06:26,880 So I don't think that it would necessarily 93 00:06:26,880 --> 00:06:28,350 be X-linked dominant. 94 00:06:28,350 --> 00:06:30,967 Anyone else have an idea? 95 00:06:30,967 --> 00:06:31,550 Yeah, Georgia?
96 00:06:31,550 --> 00:06:32,690 AUDIENCE: Autosomal. 97 00:06:32,690 --> 00:06:34,130 ADAM MARTIN: Autosomal dominant. 98 00:06:34,130 --> 00:06:37,200 I like autosomal dominant. 99 00:06:37,200 --> 00:06:39,860 So in this case, you see you have 100 00:06:39,860 --> 00:06:42,440 an individual with the disease and they 101 00:06:42,440 --> 00:06:46,262 marry into a family with no history of the disease. 102 00:06:46,262 --> 00:06:48,470 One thing I'll point out, for many of these diseases, 103 00:06:48,470 --> 00:06:52,970 they're extremely rare, so if you see sort of a family tree 104 00:06:52,970 --> 00:06:55,820 where there's no instance of the disease, 105 00:06:55,820 --> 00:06:57,920 if it's a rare disease, it's likely 106 00:06:57,920 --> 00:07:01,660 that these individuals are not carriers. 107 00:07:01,660 --> 00:07:05,270 And so in this case, if you assume that this person doesn't 108 00:07:05,270 --> 00:07:07,750 have any form of the-- 109 00:07:07,750 --> 00:07:10,550 isn't a carrier for the disease, then this 110 00:07:10,550 --> 00:07:12,980 cross here resulting in about half 111 00:07:12,980 --> 00:07:15,710 of the individuals affected with the disease, that 112 00:07:15,710 --> 00:07:19,790 would be a characteristic of an autosomal dominant disease. 113 00:07:19,790 --> 00:07:21,470 So everyone understand my logic? 114 00:07:21,470 --> 00:07:22,190 Yes, Carlos? 115 00:07:22,190 --> 00:07:24,897 AUDIENCE: What are-- why is two and that 116 00:07:24,897 --> 00:07:27,230 looks like three on the slide, why are they crossed out? 117 00:07:27,230 --> 00:07:29,300 ADAM MARTIN: I think they're deceased. 118 00:07:29,300 --> 00:07:29,800 Yes. 119 00:07:32,880 --> 00:07:36,120 OK, so let's say you have a pedigree. 
120 00:07:36,120 --> 00:07:42,930 You have pedigrees, you're able to try to link this marker to-- 121 00:07:42,930 --> 00:07:46,480 or the disease phenotype with various molecular markers, 122 00:07:46,480 --> 00:07:49,560 which we discussed in last week's lectures, 123 00:07:49,560 --> 00:07:55,260 then you're on the way to performing 124 00:07:55,260 --> 00:07:58,740 a process which is known as positional gene cloning. 125 00:08:01,380 --> 00:08:04,470 And what positional gene cloning is 126 00:08:04,470 --> 00:08:09,000 is it's basically cloning a gene and an allele that's 127 00:08:09,000 --> 00:08:13,710 responsible for a disease based on its position in the genome, 128 00:08:13,710 --> 00:08:18,000 its position in a particular chromosomal region. 129 00:08:18,000 --> 00:08:25,740 So it's basically cloning a gene based 130 00:08:25,740 --> 00:08:31,920 on its chromosomal position or its chromosome position. 131 00:08:38,610 --> 00:08:42,030 And the first step of positional gene cloning 132 00:08:42,030 --> 00:08:45,900 would be to establish maybe what chromosome it's on. 133 00:08:45,900 --> 00:08:50,970 And a straightforward way to do this, as we've basically 134 00:08:50,970 --> 00:08:54,420 been discussing almost from when I started lecturing, 135 00:08:54,420 --> 00:08:56,700 is to create some sort of linkage 136 00:08:56,700 --> 00:09:02,220 map or do linkage mapping to identify, 137 00:09:02,220 --> 00:09:06,160 in the case of humans, molecular markers that this disease 138 00:09:06,160 --> 00:09:08,160 allele is linked to. 139 00:09:12,780 --> 00:09:15,930 And remember, in last week's lecture, 140 00:09:15,930 --> 00:09:19,530 we talked about a number of different polymorphisms 141 00:09:19,530 --> 00:09:22,410 that are present in the human genome 142 00:09:22,410 --> 00:09:28,050 that we can use to establish linkage with a given phenotype. 143 00:09:28,050 --> 00:09:30,920 In this case, it's a human disease.
144 00:09:30,920 --> 00:09:35,280 And we talked about this example for a microsatellite marker. 145 00:09:35,280 --> 00:09:39,630 And in this case, we talked through this example 146 00:09:39,630 --> 00:09:43,290 of how this dominant allele, P, is linked 147 00:09:43,290 --> 00:09:47,010 to this microsatellite allele m double prime, 148 00:09:47,010 --> 00:09:50,070 because if you look at the pedigree here, 149 00:09:50,070 --> 00:09:54,000 all of the affected individuals here 150 00:09:54,000 --> 00:09:57,090 contain this m double prime sized fragment 151 00:09:57,090 --> 00:10:00,030 for this microsatellite. 152 00:10:00,030 --> 00:10:02,220 Another thing to notice here is you 153 00:10:02,220 --> 00:10:06,510 can see that this couple has been faithful to each other, 154 00:10:06,510 --> 00:10:09,240 because basically, each of the children 155 00:10:09,240 --> 00:10:13,050 have an allele from the father and an allele from the mother. 156 00:10:13,050 --> 00:10:16,500 So you can see that type of-- 157 00:10:16,500 --> 00:10:22,590 you can see that using this type of molecular marker as well. 158 00:10:22,590 --> 00:10:24,880 OK, so you establish linkage. 159 00:10:24,880 --> 00:10:30,820 So linkage mapping establishes the chromosome position 160 00:10:30,820 --> 00:10:32,890 of a given allele and the gene. 161 00:10:37,450 --> 00:10:39,310 And this chromosome position sort of 162 00:10:39,310 --> 00:10:42,340 gets maybe in the right country, but you still 163 00:10:42,340 --> 00:10:47,370 have a long way before you get to the specific street address. 164 00:10:47,370 --> 00:10:49,720 And so you have to then sort of narrow it in 165 00:10:49,720 --> 00:10:54,580 to identify a smaller region of the chromosome that could 166 00:10:54,580 --> 00:10:57,730 possibly contain this gene. 
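The pedigree argument above, where every affected individual carries the m double prime fragment, amounts to asking which marker allele tracks with the phenotype. A minimal sketch of that check follows. The family, the allele names, and the genotypes are all hypothetical, and the function only detects perfect co-segregation; a real linkage analysis would also have to account for recombinants and family size.

```python
def cosegregating_alleles(family):
    """Marker alleles carried by every affected person and by no unaffected person."""
    affected = [p["alleles"] for p in family if p["affected"]]
    unaffected = [p["alleles"] for p in family if not p["affected"]]
    # Start from the alleles that all affected people share...
    shared = set.intersection(*affected) if affected else set()
    # ...then drop any allele that also appears in an unaffected person.
    for alleles in unaffected:
        shared -= alleles
    return shared

# Hypothetical family: two microsatellite alleles per person (by fragment size).
family = [
    {"name": "father",  "alleles": {"m''", "m'"},   "affected": True},
    {"name": "mother",  "alleles": {"m", "m'''"},   "affected": False},
    {"name": "child 1", "alleles": {"m''", "m"},    "affected": True},
    {"name": "child 2", "alleles": {"m'", "m'''"},  "affected": False},
    {"name": "child 3", "alleles": {"m''", "m'''"}, "affected": True},
    {"name": "child 4", "alleles": {"m'", "m"},     "affected": False},
]

linked = cosegregating_alleles(family)
print(linked)
```

Each child here also carries exactly one paternal and one maternal allele, which is the same consistency check mentioned for the real pedigree.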
167 00:10:57,730 --> 00:11:02,140 And so what you would do is go from this linkage map, where 168 00:11:02,140 --> 00:11:06,520 you maybe identify the position of this gene within a couple 169 00:11:06,520 --> 00:11:10,840 map units, to this next resolution of map 170 00:11:10,840 --> 00:11:14,390 called a physical map, OK? 171 00:11:14,390 --> 00:11:17,210 So we go from the linkage position 172 00:11:17,210 --> 00:11:21,430 to the physical map of the chromosome. 173 00:11:24,930 --> 00:11:27,810 And the physical map, as the name implies, 174 00:11:27,810 --> 00:11:31,040 is when you have physical pieces of DNA 175 00:11:31,040 --> 00:11:36,260 that are present in this region of the chromosome. 176 00:11:36,260 --> 00:11:38,510 So the physical map means you have 177 00:11:38,510 --> 00:11:42,530 cloned, so recombinant pieces of DNA, 178 00:11:42,530 --> 00:11:49,220 cloned pieces of DNA which encompass a given chromosome 179 00:11:49,220 --> 00:11:50,520 region. 180 00:11:50,520 --> 00:11:58,150 So these are encompassing a chromosome region. 181 00:12:05,900 --> 00:12:10,910 OK, so how would you get a piece of DNA 182 00:12:10,910 --> 00:12:14,560 that sort of is in this region? 183 00:12:14,560 --> 00:12:15,790 How would you start? 184 00:12:15,790 --> 00:12:17,870 How would you start fishing for that DNA? 185 00:12:21,490 --> 00:12:24,910 So you've gone through the process of linkage, 186 00:12:24,910 --> 00:12:29,200 you've identified sort of a polymorphism that is 187 00:12:29,200 --> 00:12:31,330 linked to the disease allele. 188 00:12:31,330 --> 00:12:35,650 How would you go from there to getting a physical piece of DNA 189 00:12:35,650 --> 00:12:41,690 that is present in that region of the chromosome? 190 00:12:41,690 --> 00:12:43,720 So let's think back to-- 191 00:12:43,720 --> 00:12:47,760 Jeremy, did you have an idea? 192 00:12:47,760 --> 00:12:52,388 AUDIENCE: Start by using PCR to just amplify that chunk. 
193 00:12:52,388 --> 00:12:55,800 [INAUDIBLE] 194 00:12:55,800 --> 00:12:57,597 ADAM MARTIN: And what primers, I guess, 195 00:12:57,597 --> 00:12:58,680 would you use for the PCR? 196 00:13:01,902 --> 00:13:05,600 AUDIENCE: Depending on which chunk you're trying to get, 197 00:13:05,600 --> 00:13:09,740 you'd use [INAUDIBLE] 198 00:13:09,740 --> 00:13:11,240 ADAM MARTIN: OK, so Jeremy is saying 199 00:13:11,240 --> 00:13:14,150 if you knew the sequence, and I guess 200 00:13:14,150 --> 00:13:18,800 if you're doing this microsatellite analysis, 201 00:13:18,800 --> 00:13:22,310 you had primers that recognize a sequence at a given 202 00:13:22,310 --> 00:13:25,100 genomic position, so you actually know something 203 00:13:25,100 --> 00:13:29,060 about the sequence because of this polymorphism, 204 00:13:29,060 --> 00:13:34,730 so you can use that knowledge to then look for this sequence. 205 00:13:34,730 --> 00:13:38,040 And you could even look for the microsatellite in a DNA 206 00:13:38,040 --> 00:13:38,540 library. 207 00:13:41,360 --> 00:13:44,780 OK, so you have cloned pieces of DNA, 208 00:13:44,780 --> 00:13:48,650 and you're going to start with-- 209 00:13:48,650 --> 00:13:49,880 I'm going to swap this. 210 00:13:54,200 --> 00:13:56,470 Your starting position could be one 211 00:13:56,470 --> 00:14:00,200 of these polymorphisms in the sequence around it, 212 00:14:00,200 --> 00:14:02,500 which you already know. 213 00:14:02,500 --> 00:14:05,140 So let's say you had this microsatellite marker. 214 00:14:05,140 --> 00:14:07,810 You could then-- what I'm drawing here 215 00:14:07,810 --> 00:14:10,610 is a piece of genomic DNA. 216 00:14:10,610 --> 00:14:11,805 So this is genomic DNA. 217 00:14:18,040 --> 00:14:19,480 I'm just drawing the insert. 218 00:14:19,480 --> 00:14:21,430 This would be recombinant DNA. 219 00:14:21,430 --> 00:14:25,150 It would be present in some vector or plasmid. 
220 00:14:25,150 --> 00:14:28,090 But if you can identify the sequence that 221 00:14:28,090 --> 00:14:32,770 contains this microsatellite marker, 222 00:14:32,770 --> 00:14:36,610 then you would have the microsatellite, but also 223 00:14:36,610 --> 00:14:39,190 the surrounding DNA, OK? 224 00:14:39,190 --> 00:14:43,000 So that sort of anchors you at a given position. 225 00:14:43,000 --> 00:14:46,240 Now, you don't know if your gene is in that piece of DNA, 226 00:14:46,240 --> 00:14:48,190 but you know that it's linked, and so it 227 00:14:48,190 --> 00:14:52,270 should be around that piece of DNA somewhere. 228 00:14:52,270 --> 00:14:55,640 And so it's unlikely your gene is 229 00:14:55,640 --> 00:14:59,460 going to be on this small piece of DNA that's cloned. 230 00:14:59,460 --> 00:15:02,350 This is probably just a few kb, and you could still 231 00:15:02,350 --> 00:15:04,750 be very far away from this, but that 232 00:15:04,750 --> 00:15:08,230 serves as a starting point from which you can go from 233 00:15:08,230 --> 00:15:10,780 to get more and more pieces of DNA 234 00:15:10,780 --> 00:15:13,330 such that eventually, you have a bunch of pieces of DNA 235 00:15:13,330 --> 00:15:16,650 that are going to span the entire region. 236 00:15:16,650 --> 00:15:21,130 So the way you identify other pieces of DNA is you 237 00:15:21,130 --> 00:15:26,170 could start with a piece of DNA maybe at the end of this insert 238 00:15:26,170 --> 00:15:28,510 and look for other inserts that are not 239 00:15:28,510 --> 00:15:33,800 identical to this piece that also contain this piece here. 240 00:15:33,800 --> 00:15:35,680 So that might get you a piece that's 241 00:15:35,680 --> 00:15:41,050 overlapping, but extends farther than your initial piece. 242 00:15:41,050 --> 00:15:43,600 So now you've moved slightly farther away 243 00:15:43,600 --> 00:15:49,650 from your starting point, which is this starting polymorphism. 
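This kind of clone-by-clone extension can be sketched as a loop: take a short probe from the end of the current insert, look for a different insert that contains it, and repeat. The sketch below is only an illustration under stated assumptions. The "genome" and the clone boundaries are made up, exact substring matching stands in for whatever method is actually used to detect overlap, and the walk goes in one direction only.

```python
def walk(library, start_clone, probe_len=4, max_steps=10):
    """Walk rightward from start_clone, one overlapping clone at a time."""
    path = [start_clone]
    current = start_clone
    for _ in range(max_steps):
        probe = current[-probe_len:]  # sequence at the far end of the insert
        candidates = [
            c for c in library
            if c not in path and probe in c and not c.endswith(probe)
        ]
        if not candidates:
            break
        # Prefer the clone that extends farthest beyond the probe.
        current = max(candidates, key=lambda c: len(c) - c.index(probe))
        path.append(current)
    return path

def assemble(path, probe_len=4):
    """Merge the ordered, overlapping clones into one contiguous sequence."""
    contig = path[0]
    for clone in path[1:]:
        probe = contig[-probe_len:]
        contig += clone[clone.index(probe) + probe_len:]
    return contig

# Hypothetical region and four overlapping inserts cut from it.
genome = "AAGCTTGCATGCCTGCAGGTCGACTCTAGAGGATCC"
start = genome[0:12]  # the clone that carries the linked marker
library = [genome[16:30], start, genome[26:36], genome[8:20]]

path = walk(library, start)
contig = assemble(path)
print(len(path), contig)
```

The ordered clones together cover a stretch longer than any single insert, which is the point of collecting overlapping pieces.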
244 00:15:49,650 --> 00:15:52,950 Then you could choose maybe another DNA sequence here 245 00:15:52,950 --> 00:15:56,280 and look for a piece of DNA that, again, 246 00:15:56,280 --> 00:15:58,740 is extending a bit farther out. 247 00:15:58,740 --> 00:16:01,350 And so you can see how iteratively, 248 00:16:01,350 --> 00:16:04,710 you can get farther and farther away from this starting point 249 00:16:04,710 --> 00:16:07,540 that you know your gene is linked to. 250 00:16:07,540 --> 00:16:11,280 And this process of going sort of piece by piece and clone 251 00:16:11,280 --> 00:16:14,700 by clone away from a starting position 252 00:16:14,700 --> 00:16:16,530 is known as a chromosome walk. 253 00:16:22,930 --> 00:16:25,570 And you can do this bidirectionally. 254 00:16:25,570 --> 00:16:28,360 So you could also start with a sequence of DNA 255 00:16:28,360 --> 00:16:32,230 here and look for a clone that goes the other way. 256 00:16:35,230 --> 00:16:38,350 And you can see on my slide up there, 257 00:16:38,350 --> 00:16:40,960 you can see that in this case, they've 258 00:16:40,960 --> 00:16:44,290 taken a one map unit region of the chromosome 259 00:16:44,290 --> 00:16:48,610 and they're illustrating physical pieces of DNA that 260 00:16:48,610 --> 00:16:52,750 are overlapping that encompass this entire region. 261 00:16:52,750 --> 00:16:56,440 So this could be much bigger than the amount of DNA that 262 00:16:56,440 --> 00:16:59,380 would fit in one of these clones in the bacteria, 263 00:16:59,380 --> 00:17:02,740 but by sort of identifying overlapping clones, 264 00:17:02,740 --> 00:17:04,839 you get the entire region. 265 00:17:04,839 --> 00:17:09,880 And what this is called here, because these pieces of DNA 266 00:17:09,880 --> 00:17:13,289 are contiguous with each other, this is known as a contig. 267 00:17:21,400 --> 00:17:22,900 Yeah, Jeremy?
268 00:17:22,900 --> 00:17:24,790 AUDIENCE: So would how you get the-- 269 00:17:24,790 --> 00:17:27,590 once you find one of those pieces, 270 00:17:27,590 --> 00:17:29,864 how do you get the primer for the end of it to start? 271 00:17:29,864 --> 00:17:32,170 Do you actually sequence each of these pieces of DNA? 272 00:17:32,170 --> 00:17:33,340 ADAM MARTIN: You could sequence it, 273 00:17:33,340 --> 00:17:34,960 or you could use a technique that I'm 274 00:17:34,960 --> 00:17:37,300 going to talk about at the end of my lecture, which 275 00:17:37,300 --> 00:17:38,980 I'll come back to. 276 00:17:38,980 --> 00:17:42,190 So nowadays, you'd probably just sequence it and then maybe look 277 00:17:42,190 --> 00:17:45,520 for that in another clone. 278 00:17:45,520 --> 00:17:50,260 But even before we could sequence DNA in entire genomes, 279 00:17:50,260 --> 00:17:52,840 you could do that type of experiment 280 00:17:52,840 --> 00:17:55,040 by using a technique called hybridization, 281 00:17:55,040 --> 00:17:58,150 which I'll come back to. 282 00:17:58,150 --> 00:18:02,480 OK, so the question in this chromosome walk then becomes, 283 00:18:02,480 --> 00:18:05,260 how do you know when to stop? 284 00:18:05,260 --> 00:18:08,290 Because you could do this for a very long time, 285 00:18:08,290 --> 00:18:09,950 but it might not be useful. 286 00:18:09,950 --> 00:18:12,400 So you have to know when to stop, 287 00:18:12,400 --> 00:18:15,760 and you need to know when you arrive at the gene 288 00:18:15,760 --> 00:18:18,340 that you're interested in, which would be the gene that 289 00:18:18,340 --> 00:18:20,155 is responsible for the disease. 290 00:18:22,870 --> 00:18:25,300 So another way to phrase this question is, 291 00:18:25,300 --> 00:18:28,960 how do you know when you have an interesting gene on one 292 00:18:28,960 --> 00:18:29,980 of these fragments? 293 00:18:29,980 --> 00:18:34,660 So let's say this is an interesting gene here. 
294 00:18:34,660 --> 00:18:36,760 How do you identify interesting genes? 295 00:18:41,800 --> 00:18:44,695 So now, let's talk about identifying interesting genes. 296 00:18:52,990 --> 00:18:55,770 Anyone have an idea for how they would-- 297 00:18:55,770 --> 00:19:00,040 what criteria they would use to define a gene as being 298 00:19:00,040 --> 00:19:00,910 interesting here? 299 00:19:08,810 --> 00:19:11,780 I mean, one could say that all genes are interesting. 300 00:19:11,780 --> 00:19:13,450 If it's a gene, it's interesting, right? 301 00:19:17,230 --> 00:19:20,770 How might we define whether or not there's a gene even there? 302 00:19:20,770 --> 00:19:23,680 It could be-- there could be a gene-- 303 00:19:23,680 --> 00:19:25,490 how would you define a gene? 304 00:19:25,490 --> 00:19:27,130 Can someone define for me a gene? 305 00:19:30,660 --> 00:19:32,630 Yeah, Miles? 306 00:19:32,630 --> 00:19:33,940 Is it Miles? 307 00:19:33,940 --> 00:19:35,590 No? 308 00:19:35,590 --> 00:19:37,590 Malik, OK. 309 00:19:37,590 --> 00:19:43,652 AUDIENCE: [INAUDIBLE] that would create a starting and stopping 310 00:19:43,652 --> 00:19:44,576 point. 311 00:19:44,576 --> 00:19:47,617 So like [INAUDIBLE] 312 00:19:47,617 --> 00:19:49,700 ADAM MARTIN: So you'd look for a piece of DNA that 313 00:19:49,700 --> 00:19:53,120 has a start and a stop codon? 314 00:19:53,120 --> 00:19:55,970 So you'd look for an open reading frame, basically. 315 00:19:55,970 --> 00:19:56,840 Yeah. 316 00:19:56,840 --> 00:19:58,970 You could look for an open reading frame. 317 00:20:01,860 --> 00:20:05,660 And so I totally agree with Malik there. 318 00:20:05,660 --> 00:20:08,300 And another criteria you could use 319 00:20:08,300 --> 00:20:11,450 is if it's encoding a protein, at some point, 320 00:20:11,450 --> 00:20:15,500 it also must have been transcribed as an mRNA. 
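The open reading frame criterion can be made concrete with a small scanner that looks, in each of the three frames, for an ATG followed in-frame by a stop codon. This is a simplified sketch with assumptions worth stating: it scans one strand only, the `min_codons` cutoff and the test sequence are made up, and it ignores introns, so on real human genomic DNA it would miss or split most genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop run on the given strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon seen in this frame
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs

# Hypothetical stretch of sequence from a clone.
sequence = "CCATGGCTGAATAAGG"
orfs = find_orfs(sequence)
print(orfs)
```

Here the only reading frame with both a start and an in-frame stop is the one beginning at index 2, giving ATG GCT GAA TAA.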
321 00:20:15,500 --> 00:20:20,150 And there are some genes that are transcribed as RNA 322 00:20:20,150 --> 00:20:21,980 but don't make a protein, and they're often 323 00:20:21,980 --> 00:20:27,560 involved in coding or in regulation of gene expression. 324 00:20:27,560 --> 00:20:28,730 So I'm going to-- 325 00:20:28,730 --> 00:20:32,460 I'm going to say, is it transcribed? 326 00:20:32,460 --> 00:20:37,420 So is there some transcript that's made? 327 00:20:37,420 --> 00:20:40,700 And specifically, is it transcribed in the tissue 328 00:20:40,700 --> 00:20:41,870 that we're interested in? 329 00:20:46,880 --> 00:20:49,790 So if we're talking aniridia, we might 330 00:20:49,790 --> 00:20:53,750 be looking for genes that are being expressed or transcribed 331 00:20:53,750 --> 00:20:57,325 specifically in eyes. 332 00:20:57,325 --> 00:20:58,700 You're looking for something that 333 00:20:58,700 --> 00:21:00,770 might be expressed in the eye. 334 00:21:00,770 --> 00:21:02,990 If it's not expressed in the eye, 335 00:21:02,990 --> 00:21:05,390 that gene's going to be much less interesting to you 336 00:21:05,390 --> 00:21:08,900 because the phenotype of aniridia is clearly in the eye. 337 00:21:11,690 --> 00:21:15,680 OK, what might be some other criteria here? 338 00:21:15,680 --> 00:21:20,120 Well, one criteria might be, is there a conserved gene that 339 00:21:20,120 --> 00:21:22,070 has an interesting function that's 340 00:21:22,070 --> 00:21:25,750 maybe similar to the disease related phenotype? 341 00:21:28,340 --> 00:21:39,535 So is there a conserved gene with an interesting function? 
342 00:21:48,450 --> 00:21:51,600 And to take this example of aniridia, 343 00:21:51,600 --> 00:21:54,570 let's say you're doing this chromosome walk, 344 00:21:54,570 --> 00:21:56,580 and you identify a gene, maybe you 345 00:21:56,580 --> 00:22:00,630 sequence part of this clone, you get a string of sequence, 346 00:22:00,630 --> 00:22:03,450 and you realize that the sequence that you get 347 00:22:03,450 --> 00:22:08,280 is related to a gene from a model organism, 348 00:22:08,280 --> 00:22:12,930 and maybe that gene is called eyeless. 349 00:22:12,930 --> 00:22:18,000 If you've identified a region of sequence in a human, 350 00:22:18,000 --> 00:22:22,560 in the human genome that's mapping to an eye disease gene, 351 00:22:22,560 --> 00:22:24,540 and you find out that in that region, 352 00:22:24,540 --> 00:22:28,080 there is a conserved gene called eyeless, 353 00:22:28,080 --> 00:22:30,840 that might be a very interesting gene for you. 354 00:22:30,840 --> 00:22:32,670 So eyeless is a gene. 355 00:22:32,670 --> 00:22:34,500 So here's a normal fly. 356 00:22:34,500 --> 00:22:37,670 You see it has that bright red eye. 357 00:22:37,670 --> 00:22:41,430 The eyeless gene, when mutated, results 358 00:22:41,430 --> 00:22:44,460 in a fly that now doesn't just have a white eye, 359 00:22:44,460 --> 00:22:48,120 but has no eye altogether. 360 00:22:48,120 --> 00:22:51,810 So it turns out that the aniridia gene is the homolog 361 00:22:51,810 --> 00:22:54,520 of the eyeless gene in flies. 362 00:22:54,520 --> 00:22:57,240 That's not how it was identified initially, 363 00:22:57,240 --> 00:23:01,810 but nowadays, there's a lot of information in model organisms.
364 00:23:01,810 --> 00:23:05,580 And so if you're sort of trying to identify a gene, 365 00:23:05,580 --> 00:23:08,010 and you see that there's a gene in the neighborhood 366 00:23:08,010 --> 00:23:11,610 you're looking at with a function that's 367 00:23:11,610 --> 00:23:15,630 related to a gene like eyeless, which 368 00:23:15,630 --> 00:23:18,960 has a clear sort of analogy in terms of phenotypes, 369 00:23:18,960 --> 00:23:21,540 then that's going to increase your interest in that gene. 370 00:23:24,870 --> 00:23:27,840 So I'm going to come back to this point 371 00:23:27,840 --> 00:23:31,710 here, which is how do we determine 372 00:23:31,710 --> 00:23:35,640 whether a piece of DNA that's on one of these inserts 373 00:23:35,640 --> 00:23:38,340 that we're getting as we walk across the chromosome, 374 00:23:38,340 --> 00:23:42,990 how do we know whether it is transcribed or not? 375 00:23:42,990 --> 00:23:44,940 And to get at this, I'm going to introduce you 376 00:23:44,940 --> 00:23:47,940 to a concept which is important in 377 00:23:47,940 --> 00:23:54,230 and of itself, which is the idea of cDNA. 378 00:23:54,230 --> 00:23:55,850 So cDNA. 379 00:23:55,850 --> 00:23:57,650 And specifically, I'm going to show you 380 00:23:57,650 --> 00:24:02,240 how one would make a cDNA library, which is basically 381 00:24:02,240 --> 00:24:05,180 a library of different cDNAs. 382 00:24:05,180 --> 00:24:09,640 And so what cDNA is, as shown up there on my slide, 383 00:24:09,640 --> 00:24:14,165 a cDNA is complementary DNA. 384 00:24:22,920 --> 00:24:27,730 It's complementary DNA, meaning that is the complement 385 00:24:27,730 --> 00:24:31,210 of an mRNA transcript. 386 00:24:31,210 --> 00:24:40,990 This DNA is the complement of an RNA or mRNA transcript. 387 00:24:47,750 --> 00:24:54,170 One thing to watch out for is it's not complimentary DNA. 388 00:24:54,170 --> 00:24:56,180 So this is MIT. 
389 00:24:56,180 --> 00:24:59,780 This is a no compliment zone, so I don't want 390 00:24:59,780 --> 00:25:01,745 to see any complimentary DNA. 391 00:25:06,010 --> 00:25:10,030 All right, so let's think about complementary DNA. 392 00:25:10,030 --> 00:25:14,860 So remember, we've talked about the central dogma 393 00:25:14,860 --> 00:25:20,545 and how DNA encodes for RNA, which encodes for protein. 394 00:25:25,060 --> 00:25:27,600 And so the information flows from DNA 395 00:25:27,600 --> 00:25:30,150 through RNA to protein. 396 00:25:30,150 --> 00:25:33,990 But there are some specialized cases in biology 397 00:25:33,990 --> 00:25:37,950 where this information flow is reversed. 398 00:25:37,950 --> 00:25:42,210 So there can be a reverse of information flow 399 00:25:42,210 --> 00:25:45,405 where information flows from RNA to DNA. 400 00:25:48,340 --> 00:25:50,220 OK, so that's pretty cool. 401 00:25:50,220 --> 00:25:52,090 Where does that happen? 402 00:25:52,090 --> 00:25:56,700 Well, there are viruses, such as retroviruses, 403 00:25:56,700 --> 00:26:03,990 one example of a retrovirus is HIV, and the virus life-- 404 00:26:03,990 --> 00:26:08,490 the virus genome is a single-stranded RNA molecule, 405 00:26:08,490 --> 00:26:14,040 and the life cycle of the virus is that it inserts into the host-- 406 00:26:14,040 --> 00:26:17,100 the host genome, which is double-stranded DNA. 407 00:26:17,100 --> 00:26:21,420 For a retrovirus to do that, it needs to take its RNA genome 408 00:26:21,420 --> 00:26:25,590 and make double-stranded DNA in order for it to insert. 409 00:26:25,590 --> 00:26:29,220 So this is an example in biology, which is basically 410 00:26:29,220 --> 00:26:32,100 breaking the rules that we talked to you about earlier 411 00:26:32,100 --> 00:26:33,990 in the semester.
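The definition of cDNA as the complement of an mRNA transcript, with information flowing from RNA back to DNA, can be written out as a base-pairing rule. The following is a minimal sketch, assuming a made-up mRNA string and modeling only the base pairing of the first DNA strand, not the enzyme chemistry or the second strand.

```python
# RNA base -> the DNA base that pairs with it.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """Complementary DNA for an mRNA; both strings are written 5' to 3'."""
    # The strands are antiparallel, so read the mRNA backwards while pairing.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

# Hypothetical transcript: start codon, one codon, stop codon, short poly-A tail.
mrna = "AUGGCUUAAAAAAA"
cdna = first_strand_cdna(mrna)
print(cdna)
```

Because the mRNA ends in a run of A's, the complementary strand begins with a run of T's, and the complement of the AUG start codon ends up at the far end.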
412 00:26:33,990 --> 00:26:37,860 Also, there are retrotransposons which 413 00:26:37,860 --> 00:26:42,630 do a similar process, going from an RNA molecule 414 00:26:42,630 --> 00:26:44,610 to double-stranded DNA. 415 00:26:44,610 --> 00:26:48,010 So this is a specialized case, and it's interesting, 416 00:26:48,010 --> 00:26:54,570 and we can take advantage of it to basically clone and identify 417 00:26:54,570 --> 00:26:56,010 mRNA transcripts. 418 00:26:59,870 --> 00:27:04,110 OK, so I'm going to tell you how to make complementary DNA, 419 00:27:04,110 --> 00:27:07,060 and I'll go through a series of steps. 420 00:27:07,060 --> 00:27:13,320 The first step is we want to make complementary DNA of mRNA, 421 00:27:13,320 --> 00:27:15,405 so we need a way to purify the mRNA. 422 00:27:18,040 --> 00:27:22,750 So anyone have any idea how to purify mRNA? 423 00:27:22,750 --> 00:27:26,170 First, we could maybe draw an RNA molecule here. 424 00:27:26,170 --> 00:27:31,020 What are some salient features of mature mRNA? 425 00:27:31,020 --> 00:27:31,565 Yeah, Carlos? 426 00:27:31,565 --> 00:27:33,273 AUDIENCE: It'll have the five-prime cap 427 00:27:33,273 --> 00:27:34,190 [INAUDIBLE] phosphate. 428 00:27:34,190 --> 00:27:36,230 ADAM MARTIN: Yeah, it'll have a five-prime cap. 429 00:27:36,230 --> 00:27:37,290 Anything else? 430 00:27:37,290 --> 00:27:37,790 Jeremy? 431 00:27:37,790 --> 00:27:38,580 AUDIENCE: Poly-A tail. 432 00:27:38,580 --> 00:27:40,247 ADAM MARTIN: It'll have a five-prime cap 433 00:27:40,247 --> 00:27:41,480 and a poly-A tail. 434 00:27:41,480 --> 00:27:44,600 I'm going to take advantage mostly of the poly-A tail here. 435 00:27:47,480 --> 00:27:50,580 So here, we have a poly-A tail. 436 00:27:50,580 --> 00:27:55,660 OK, how might we use that poly-A tail to purify mRNA? 437 00:27:55,660 --> 00:27:56,160 Natalie? 
438 00:27:56,160 --> 00:27:58,663 AUDIENCE: Well, you can add a [INAUDIBLE] 439 00:27:58,663 --> 00:28:00,507 because you know they're [INAUDIBLE] 440 00:28:00,507 --> 00:28:01,340 ADAM MARTIN: Mm-hmm. 441 00:28:01,340 --> 00:28:02,975 What sequence would you use? 442 00:28:02,975 --> 00:28:03,850 AUDIENCE: [INAUDIBLE] 443 00:28:03,850 --> 00:28:04,558 ADAM MARTIN: Yes. 444 00:28:04,558 --> 00:28:09,550 So Natalie has suggested using poly T, which 445 00:28:09,550 --> 00:28:13,480 she said would stick to this poly A tail because 446 00:28:13,480 --> 00:28:17,050 of base pair hybridization, OK? 447 00:28:17,050 --> 00:28:22,690 So let's say we have a bead or some type of resin with dTs 448 00:28:22,690 --> 00:28:25,450 hanging off of it. 449 00:28:25,450 --> 00:28:27,190 So I'll draw a few of them, but you'd 450 00:28:27,190 --> 00:28:34,060 have maybe a lot of them sticking off, OK? 451 00:28:34,060 --> 00:28:37,360 So you have a bead with pieces of DNA, all of which 452 00:28:37,360 --> 00:28:39,940 are poly dT hanging off of it. 453 00:28:39,940 --> 00:28:46,120 And then these poly dTs, if you add cytoplasm from cells, 454 00:28:46,120 --> 00:28:50,710 the mRNA in that cytoplasm is going to stick to this poly dT 455 00:28:50,710 --> 00:28:55,540 bead, and it will stick with a higher 456 00:28:55,540 --> 00:28:58,660 affinity than other things that are non specifically sticking 457 00:28:58,660 --> 00:29:02,770 to the beads, and you can wash these beads with buffer 458 00:29:02,770 --> 00:29:05,350 and salt to get rid of everything that's 459 00:29:05,350 --> 00:29:07,660 non-specifically sticking to the bead, 460 00:29:07,660 --> 00:29:10,870 and then you're left with just a bead that's 461 00:29:10,870 --> 00:29:13,750 enriched with mRNA, which is what was specifically 462 00:29:13,750 --> 00:29:15,760 sticking to this, OK? 
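The poly dT purification just described can be sketched in a few lines of Python. This is only a toy model, not a lab protocol: the function names, the example sequences, and the `min_tail` cutoff are all made up for illustration, and real binding to an oligo dT bead depends on salt, temperature, and tail length rather than on a simple threshold.

```python
def has_poly_a_tail(rna, min_tail=10):
    """Return True if the 3' end of the RNA is a run of at least min_tail A's."""
    tail_length = len(rna) - len(rna.rstrip("A"))
    return tail_length >= min_tail


def oligo_dt_select(rnas, min_tail=10):
    """Keep only the RNAs that would 'stick' to the poly dT bead in this toy model."""
    return [rna for rna in rnas if has_poly_a_tail(rna, min_tail)]


# Hypothetical cytoplasmic extract: one mature mRNA and two RNAs without a real tail.
extract = [
    "AUGGCUUAA" + "A" * 20,  # mature mRNA with a poly-A tail
    "GGCUAGCUAGC",           # no tail at all
    "AUGCCC" + "A" * 3,      # too few A's to count as a tail here
]

bound = oligo_dt_select(extract)
print(len(bound))  # 1
```

Washing the beads corresponds to discarding everything the filter rejects; what stays bound is the sample enriched for mRNA.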
463 00:29:15,760 --> 00:29:16,705 So you could purify-- 464 00:29:20,260 --> 00:29:36,750 you're purifying the mRNA based on its affinity for a poly dT, 465 00:29:36,750 --> 00:29:37,250 OK? 466 00:29:37,250 --> 00:29:42,170 So then you're going to have enrichment 467 00:29:42,170 --> 00:29:45,940 of mRNA in your sample. 468 00:29:45,940 --> 00:29:51,270 And so then once you have your RNA, 469 00:29:51,270 --> 00:29:55,150 you're going to want to somehow go from RNA to DNA, OK? 470 00:29:58,290 --> 00:30:04,710 So the next step will involve somehow going from RNA to DNA. 471 00:30:04,710 --> 00:30:07,020 So let's draw our piece of RNA here. 472 00:30:07,020 --> 00:30:08,150 Here's our RNA. 473 00:30:08,150 --> 00:30:10,320 It has a poly A tail so it's mRNA. 474 00:30:14,140 --> 00:30:15,095 There is 5 prime. 475 00:30:18,290 --> 00:30:22,840 OK, so now we need to take advantage of a trick. 476 00:30:22,840 --> 00:30:26,050 We can still take advantage of dT 477 00:30:26,050 --> 00:30:29,080 because we can use this as a primer 478 00:30:29,080 --> 00:30:32,310 because polymerase usually requires 479 00:30:32,310 --> 00:30:38,410 some primer and a three prime hydroxyl in order to extend. 480 00:30:38,410 --> 00:30:43,450 Now, can we use DNA polymerase to extend this primer? 481 00:30:43,450 --> 00:30:44,880 Jeremy is shaking his head no. 482 00:30:44,880 --> 00:30:45,380 Why? 483 00:30:45,380 --> 00:30:51,040 AUDIENCE: Because DNA [INAUDIBLE] 484 00:30:51,040 --> 00:30:52,460 ADAM MARTIN: Exactly. 485 00:30:52,460 --> 00:30:55,400 So what Jeremy is saying is DNA polymerase 486 00:30:55,400 --> 00:30:59,120 is a DNA dependent DNA polymerase, OK? 487 00:30:59,120 --> 00:31:03,680 DNA polymerase can only use this if this is DNA here, OK? 
488 00:31:03,680 --> 00:31:07,460 So we need a different type of enzyme, essentially, 489 00:31:07,460 --> 00:31:13,640 in order to make DNA from RNA, and luckily, 490 00:31:13,640 --> 00:31:15,500 molecular biologists-- 491 00:31:15,500 --> 00:31:17,960 actually one of whom was here at MIT-- 492 00:31:17,960 --> 00:31:20,540 discovered this type of enzyme, and it's 493 00:31:20,540 --> 00:31:22,320 called reverse transcriptase. 494 00:31:25,670 --> 00:31:27,170 Reverse transcriptase. 495 00:31:27,170 --> 00:31:32,420 This is an enzyme that's encoded by retroviruses in order 496 00:31:32,420 --> 00:31:36,440 to make double stranded DNA from RNA, 497 00:31:36,440 --> 00:31:40,130 and that allows the retrovirus to insert into the host genome, 498 00:31:40,130 --> 00:31:42,240 OK? 499 00:31:42,240 --> 00:31:47,150 And what reverse transcriptase is is it's an RNA dependent DNA 500 00:31:47,150 --> 00:31:48,140 polymerase, OK? 501 00:31:48,140 --> 00:31:51,060 So it takes RNA as its substrate, 502 00:31:51,060 --> 00:31:57,020 and then it synthesizes DNA on the opposite strand, OK? 503 00:31:57,020 --> 00:32:05,222 So this is an RNA dependent DNA polymerase. 504 00:32:09,380 --> 00:32:12,950 OK, so if you add reverse transcriptase 505 00:32:12,950 --> 00:32:18,800 to mRNAs that have these dT primers, then what you get 506 00:32:18,800 --> 00:32:22,310 is a new strand, which is DNA here. 507 00:32:22,310 --> 00:32:23,570 This is the strand of DNA. 508 00:32:27,330 --> 00:32:33,800 And then you have a strand of RNA opposite it, OK? 509 00:32:33,800 --> 00:32:39,290 So at this step, you have a DNA RNA hybrid. 510 00:32:39,290 --> 00:32:42,860 So this is a DNA RNA hybrid. 511 00:32:47,550 --> 00:32:49,590 Let's see. 512 00:32:49,590 --> 00:32:51,140 Reveal some more of this. 513 00:32:51,140 --> 00:32:53,030 This is the process which I'm basically 514 00:32:53,030 --> 00:32:54,230 outlining on the board. 
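As a rough illustration of the first-strand step, here is a small Python sketch of what reverse transcriptase produces from an mRNA template. The function name and the example sequence are hypothetical, and the model ignores everything about the real enzyme except base pairing: the new DNA strand is the reverse complement of the RNA, so it begins with the run of T's that paired with the poly-A tail.

```python
# Base-pairing rules for copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (written 5'->3') for an mRNA written 5'->3'.

    Synthesis starts at the dT primer on the poly-A tail and runs toward the
    5' end of the mRNA, so the template is read in reverse.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


# Hypothetical short mRNA with a four-base poly-A tail.
mrna = "AUGGCC" + "AAAA"
cdna = first_strand_cdna(mrna)
print(cdna)  # TTTTGGCCAT
```

At this point the product is still paired with its RNA template, which is the DNA RNA hybrid drawn on the board.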
515 00:32:57,070 --> 00:32:59,860 So then you want double stranded DNA, 516 00:32:59,860 --> 00:33:03,070 so you don't want this strand of RNA that's down here, 517 00:33:03,070 --> 00:33:05,950 so you have to get rid of it. 518 00:33:05,950 --> 00:33:13,270 So you would degrade the RNA, and this 519 00:33:13,270 --> 00:33:16,540 is done using another enzymatic activity, which 520 00:33:16,540 --> 00:33:19,570 is derived from reverse transcriptase, which 521 00:33:19,570 --> 00:33:22,510 is termed RNase H activity. 522 00:33:22,510 --> 00:33:28,990 So you can add an enzyme, RNase H, and 523 00:33:28,990 --> 00:33:32,530 RNase H takes these DNA-RNA hybrids 524 00:33:32,530 --> 00:33:35,800 and degrades the RNA part of them, OK? 525 00:33:35,800 --> 00:33:39,910 So this is going to degrade the RNA strand. 526 00:33:39,910 --> 00:33:42,170 And if you degrade the RNA strand, 527 00:33:42,170 --> 00:33:44,440 then you're left with a single strand of DNA. 528 00:33:48,740 --> 00:33:53,820 So you have a single strand of DNA here, 529 00:33:53,820 --> 00:33:56,540 and now what you need to do is to synthesize 530 00:33:56,540 --> 00:33:59,690 the second strand of DNA. 531 00:33:59,690 --> 00:34:01,940 So you need a second strand synthesis. 532 00:34:07,260 --> 00:34:09,960 And so you need, again, a primer in order 533 00:34:09,960 --> 00:34:13,260 to prime the synthesis here. 534 00:34:13,260 --> 00:34:15,850 So there are a variety of ways to do this. 535 00:34:15,850 --> 00:34:18,360 You can add some type of hairpin, 536 00:34:18,360 --> 00:34:23,340 which is five prime here and three prime here, 537 00:34:23,340 --> 00:34:26,550 and then you can use either DNA polymerase 538 00:34:26,550 --> 00:34:31,590 or reverse transcriptase, which also can be a DNA dependent DNA 539 00:34:31,590 --> 00:34:35,389 polymerase, to synthesize this strand here, OK?
540 00:34:38,190 --> 00:34:42,270 So again, you add polymerase, and now you've gone 541 00:34:42,270 --> 00:34:46,980 and you've generated double stranded DNA, OK? 542 00:34:46,980 --> 00:34:51,750 So everyone see how we've gone from an mRNA transcript, 543 00:34:51,750 --> 00:34:55,889 and we've done the reverse of everything we just told you 544 00:34:55,889 --> 00:35:00,780 in the first half of the course because we've gone from RNA 545 00:35:00,780 --> 00:35:02,940 and we've made DNA, OK? 546 00:35:02,940 --> 00:35:05,100 But this will be really useful because now we 547 00:35:05,100 --> 00:35:09,990 have a stable piece of DNA that we can clone into a plasmid 548 00:35:09,990 --> 00:35:12,690 and we have a record of this transcript being 549 00:35:12,690 --> 00:35:17,730 present in our sample, and we can propagate that on and on, 550 00:35:17,730 --> 00:35:20,310 so we've cloned it, OK? 551 00:35:20,310 --> 00:35:23,760 All right, what's going to be special about this piece of DNA 552 00:35:23,760 --> 00:35:27,050 versus a piece of genomic DNA? 553 00:35:27,050 --> 00:35:27,550 Natalie? 554 00:35:27,550 --> 00:35:29,340 AUDIENCE: [INAUDIBLE] 555 00:35:29,340 --> 00:35:31,220 ADAM MARTIN: Yes, so Natalie's suggesting 556 00:35:31,220 --> 00:35:34,950 that it doesn't have introns, and that's totally right. 557 00:35:34,950 --> 00:35:41,880 So this is not like genomic DNA, and what 558 00:35:41,880 --> 00:35:47,540 Natalie said is because mRNA is processed, 559 00:35:47,540 --> 00:35:51,810 the introns are spliced out, such that the mature mRNA only 560 00:35:51,810 --> 00:35:57,150 has the exons, and so this piece of complementary DNA, or cDNA, 561 00:35:57,150 --> 00:36:00,220 is going to have no introns. 562 00:36:02,850 --> 00:36:04,610 How else is it different? 563 00:36:10,260 --> 00:36:10,878 Yeah, Jeremy? 564 00:36:10,878 --> 00:36:12,670 AUDIENCE: It's not going to have promoters. 565 00:36:12,670 --> 00:36:14,090 ADAM MARTIN: It's not going to have a promoter.
566 00:36:14,090 --> 00:36:14,770 Yes, Carmen? 567 00:36:14,770 --> 00:36:19,820 AUDIENCE: It doesn't have [INAUDIBLE] 568 00:36:19,820 --> 00:36:23,000 ADAM MARTIN: You might see a poly A and T 569 00:36:23,000 --> 00:36:24,860 sequence in the cDNA. 570 00:36:24,860 --> 00:36:26,710 Yes, that's true. 571 00:36:26,710 --> 00:36:29,600 OK, so you might have poly A, poly T. I'm 572 00:36:29,600 --> 00:36:31,830 going to focus on the other part from-- 573 00:36:34,340 --> 00:36:41,180 there's going to be no promoter, enhancer, regulatory sequences. 574 00:36:41,180 --> 00:36:45,950 Basically, it's got no sequence that's not transcribed, right? 575 00:36:45,950 --> 00:36:49,160 The DNA is only going to have the part of the gene that 576 00:36:49,160 --> 00:36:54,180 was physically transcribed by the RNA polymerase originally. 577 00:36:54,180 --> 00:36:59,115 OK, so no non-transcribed regions. 578 00:37:02,290 --> 00:37:10,550 No non-transcribed regions, and Carmen's absolutely right. 579 00:37:10,550 --> 00:37:14,110 You will also have possibly a poly A or poly T sequence. 580 00:37:21,510 --> 00:37:27,360 OK, so when you get these cDNAs, you might have-- 581 00:37:27,360 --> 00:37:29,910 you have more than one mRNA in a sample 582 00:37:29,910 --> 00:37:34,350 like a cytoplasmic extract, so you're going to prime-- 583 00:37:34,350 --> 00:37:36,510 you're going to make multiple cDNA 584 00:37:36,510 --> 00:37:40,500 and different cDNAs will reflect different transcripts that 585 00:37:40,500 --> 00:37:43,080 are present in your sample, OK? 586 00:37:43,080 --> 00:37:45,630 So you could have one clone that's 587 00:37:45,630 --> 00:37:49,350 one gene, another clone that's a different gene, 588 00:37:49,350 --> 00:37:51,450 and another clone that's another gene, 589 00:37:51,450 --> 00:37:55,440 and you could have thousands of clones of these different DNAs. 
590 00:37:55,440 --> 00:38:00,630 What's going to be special about what types of genes 591 00:38:00,630 --> 00:38:03,810 are you going to get for I guess different tissues. 592 00:38:03,810 --> 00:38:08,070 Are they going to be the same or not? 593 00:38:08,070 --> 00:38:08,730 Yeah, Carlos? 594 00:38:08,730 --> 00:38:11,815 AUDIENCE: [INAUDIBLE] 595 00:38:11,815 --> 00:38:12,690 ADAM MARTIN: Exactly. 596 00:38:12,690 --> 00:38:15,740 You're not going to see-- if you've prepared a tissue 597 00:38:15,740 --> 00:38:18,440 and there is no gene being-- 598 00:38:18,440 --> 00:38:22,260 if one gene was not expressed or transcribed in that tissue, 599 00:38:22,260 --> 00:38:25,350 you will not get a cDNA for that particular gene 600 00:38:25,350 --> 00:38:27,630 in your library, OK? 601 00:38:27,630 --> 00:38:30,090 So the representation of genes-- 602 00:38:33,630 --> 00:38:42,210 the representation of genes in a cDNA library 603 00:38:42,210 --> 00:38:47,550 is totally dependent on what genes are being expressed, OK? 604 00:38:47,550 --> 00:38:49,470 So this representation is going to be 605 00:38:49,470 --> 00:38:56,130 proportional to the expression level, and the more genes-- 606 00:38:56,130 --> 00:38:59,040 the more a gene is expressed in a given tissue, 607 00:38:59,040 --> 00:39:02,430 the more copies of cDNA for that gene 608 00:39:02,430 --> 00:39:04,200 you would see in the library, OK? 609 00:39:04,200 --> 00:39:06,510 So there's really a proportionality 610 00:39:06,510 --> 00:39:09,480 between the number of clones in a library 611 00:39:09,480 --> 00:39:12,120 and the expression level of a gene, 612 00:39:12,120 --> 00:39:15,210 where in the most extreme case, if this gene is not 613 00:39:15,210 --> 00:39:16,800 expressed at all, you're not going 614 00:39:16,800 --> 00:39:21,890 to see it represented at all in the cDNA library, OK? 
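The proportionality between expression level and clone number can be written out as simple arithmetic. The Python sketch below is an idealized model under stated assumptions: the gene names and transcript counts are invented, and it assumes every mRNA is copied and cloned with equal efficiency, which a real library only approximates.

```python
def expected_clones(expression, library_size):
    """Expected number of clones per gene if clones are drawn in proportion
    to each gene's share of the mRNA in the sample."""
    total = sum(expression.values())
    if total == 0:
        return {gene: 0.0 for gene in expression}
    return {gene: library_size * count / total
            for gene, count in expression.items()}


# Hypothetical transcript counts in one tissue.
tissue = {"gene_high": 900, "gene_low": 100, "gene_silent": 0}

clones = expected_clones(tissue, library_size=1000)
print(clones["gene_high"], clones["gene_low"], clones["gene_silent"])  # 900.0 100.0 0.0
```

The gene with no transcripts gets no clones, which is the extreme case mentioned above. Running the same function on counts from a different tissue would give a different library.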
615 00:39:21,890 --> 00:39:24,330 And then a corollary to this statement 616 00:39:24,330 --> 00:39:30,540 is that if you make cDNA libraries from different cell 617 00:39:30,540 --> 00:39:34,680 types or different tissue types, the cDNA libraries 618 00:39:34,680 --> 00:39:37,020 are going to be different between those different types 619 00:39:37,020 --> 00:39:39,780 of sources of mRNA, OK? 620 00:39:39,780 --> 00:39:46,860 So in other words, different tissues 621 00:39:46,860 --> 00:39:47,960 give you different cDNA. 622 00:39:59,550 --> 00:40:01,900 OK, so there is the process. 623 00:40:01,900 --> 00:40:03,580 So I went through most of the slide. 624 00:40:03,580 --> 00:40:04,540 Yes, Miles? 625 00:40:04,540 --> 00:40:08,415 AUDIENCE: Is this a way you can determine what gene sequences 626 00:40:08,415 --> 00:40:10,720 are expressed in all cells? 627 00:40:10,720 --> 00:40:16,588 Because in certain mRNA strands across all tissue samples, 628 00:40:16,588 --> 00:40:22,870 those are basic cell functions and expressed in a [INAUDIBLE] 629 00:40:22,870 --> 00:40:24,283 organism? 630 00:40:24,283 --> 00:40:26,200 ADAM MARTIN: So you're asking, if you grind up 631 00:40:26,200 --> 00:40:28,630 like an entire organism and if you 632 00:40:28,630 --> 00:40:31,150 get a cDNA from that library, could you 633 00:40:31,150 --> 00:40:36,370 tell if it's expressed in all different cell types? 634 00:40:36,370 --> 00:40:39,970 Even if you have one cell type that expresses a gene, 635 00:40:39,970 --> 00:40:42,530 if you grind up the entire organism, 636 00:40:42,530 --> 00:40:45,850 then you're going to have some mRNA that represents that gene.
637 00:40:45,850 --> 00:40:48,820 So I don't think it would be an effective measure 638 00:40:48,820 --> 00:40:52,600 to determine the ubiquity of expression of a given gene, 639 00:40:52,600 --> 00:40:54,430 but in just a minute, I'm going to give you 640 00:40:54,430 --> 00:40:57,580 a tool that would allow you to answer the exact question 641 00:40:57,580 --> 00:41:00,220 that you're asking, OK? 642 00:41:00,220 --> 00:41:04,590 Any other questions about the cDNA library? 643 00:41:04,590 --> 00:41:06,920 OK. 644 00:41:06,920 --> 00:41:10,850 So I just wanted to come back to this example I 645 00:41:10,850 --> 00:41:15,000 gave on the identification of the human CDK gene. 646 00:41:15,000 --> 00:41:20,150 So remember, we started with yeast that were mutant. 647 00:41:20,150 --> 00:41:22,430 They were temperature-sensitive mutants, 648 00:41:22,430 --> 00:41:25,040 and we transformed these mutants with a library, 649 00:41:25,040 --> 00:41:27,140 but I didn't really tell you what the library was. 650 00:41:27,140 --> 00:41:30,260 It was in fact the cDNA library from humans that 651 00:41:30,260 --> 00:41:32,740 was transformed into yeast, OK? 652 00:41:32,740 --> 00:41:34,460 And that's because yeast genes-- 653 00:41:34,460 --> 00:41:38,150 for the most part, they don't have a lot of introns, and so 654 00:41:38,150 --> 00:41:39,245 the yeast-- 655 00:41:39,245 --> 00:41:41,840 the machinery is not able to splice out 656 00:41:41,840 --> 00:41:45,290 the human introns in human genes, OK? 657 00:41:45,290 --> 00:41:47,810 And so this was done with a human cDNA 658 00:41:47,810 --> 00:41:50,300 library, which then encoded-- 659 00:41:50,300 --> 00:41:54,080 one of which encoded the human CDK gene, and that 660 00:41:54,080 --> 00:41:57,920 allowed Paul Nurse to discover the piece of DNA that 661 00:41:57,920 --> 00:42:01,190 encoded for the human CDK, OK?
662 00:42:01,190 --> 00:42:03,440 So I just wanted to kind of retroactively 663 00:42:03,440 --> 00:42:06,440 go back and sort of tell you how that experiment was done. 664 00:42:09,870 --> 00:42:13,660 OK, so now I'm going to get to my final point 665 00:42:13,660 --> 00:42:17,980 for this lecture, which is this final technique, which 666 00:42:17,980 --> 00:42:21,850 will allow us to determine whether or not a transcript is 667 00:42:21,850 --> 00:42:24,880 expressed in a single cell type or ubiquitously 668 00:42:24,880 --> 00:42:29,260 through an organism, and this involves a technique, which 669 00:42:29,260 --> 00:42:30,780 is known as hybridization. 670 00:42:35,870 --> 00:42:38,920 And what hybridization is is if you're 671 00:42:38,920 --> 00:42:41,830 starting with a piece of DNA, you 672 00:42:41,830 --> 00:42:44,170 don't need to know its sequence in order 673 00:42:44,170 --> 00:42:46,690 to determine whether there are sequences that 674 00:42:46,690 --> 00:42:51,220 are similar or identical to it, because hybridisation 675 00:42:51,220 --> 00:42:57,160 is basically if you have some sequence 676 00:42:57,160 --> 00:43:00,310 and it's single stranded such that you have a DNA 677 00:43:00,310 --> 00:43:03,010 backbone but you have base pairs that 678 00:43:03,010 --> 00:43:06,790 are able to pair with their complementary bases 679 00:43:06,790 --> 00:43:10,840 and you can use a piece of single stranded DNA like this 680 00:43:10,840 --> 00:43:15,880 and you can label it such that if the labeled piece sticks 681 00:43:15,880 --> 00:43:19,960 to another piece that has identical or similar sequence, 682 00:43:19,960 --> 00:43:23,830 you'll be able to visualize it in some way, OK? 683 00:43:23,830 --> 00:43:25,360 So this is called-- 684 00:43:25,360 --> 00:43:32,770 you're looking for things that anneal or hybridize 685 00:43:32,770 --> 00:43:37,360 to a particular specific sequence. 
686 00:43:40,270 --> 00:43:43,420 So you don't need to know the sequence a priori, OK? 687 00:43:43,420 --> 00:43:47,020 You just need to have this physical piece of DNA, 688 00:43:47,020 --> 00:43:50,350 and you can use this single stranded piece of DNA 689 00:43:50,350 --> 00:43:55,510 to then fish for similar sequences, OK? 690 00:43:55,510 --> 00:43:59,710 So we could take a piece of DNA here maybe that's in a gene, 691 00:43:59,710 --> 00:44:02,200 and we could fish through a DNA library 692 00:44:02,200 --> 00:44:07,600 to try to identify a cDNA clone that has sequence identity 693 00:44:07,600 --> 00:44:10,480 to that piece of DNA, OK? 694 00:44:10,480 --> 00:44:13,960 And the way this is done is to take a cDNA library. 695 00:44:13,960 --> 00:44:18,910 So each of these colonies here would express or have 696 00:44:18,910 --> 00:44:21,910 a different clone of DNA. 697 00:44:21,910 --> 00:44:24,910 You can then take a nitrocellulose filter, put it 698 00:44:24,910 --> 00:44:28,450 on this plate, which would stick the bacteria in place 699 00:44:28,450 --> 00:44:32,470 to that filter, and you could then lyse the bacteria 700 00:44:32,470 --> 00:44:36,100 and denature the DNA, and then the DNA is stuck to the filter, 701 00:44:36,100 --> 00:44:39,610 but now it's single stranded. 702 00:44:39,610 --> 00:44:42,400 You can then add your probe, which is labeled, 703 00:44:42,400 --> 00:44:46,000 and look for the colonies that this probe sticks to, 704 00:44:46,000 --> 00:44:49,180 and that would then identify a particular cDNA, which 705 00:44:49,180 --> 00:44:52,300 would identify whether or not a piece of DNA 706 00:44:52,300 --> 00:44:55,630 is expressed in a given tissue type, OK? 707 00:44:55,630 --> 00:44:59,390 So everyone see how that would work?
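The colony screen can be mimicked with string matching, as in the Python sketch below. Everything named here is hypothetical (the colony labels, the insert sequences, the probe), and the model is much stricter than real hybridization: it only reports exact matches, whereas a real probe also anneals to similar, non-identical sequences. Because the DNA on the filter is denatured, both strands of each insert are available, so the probe is checked against each insert in both orientations.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def probe_hits(probe, colonies):
    """Return the colonies whose denatured insert could pair with the probe.

    colonies maps a colony name to one strand of its cloned insert; the other
    strand is covered by also searching for the probe's reverse complement.
    """
    targets = (probe, reverse_complement(probe))
    return [name for name, insert in colonies.items()
            if any(target in insert for target in targets)]


# Hypothetical cDNA library plated as three colonies.
library = {
    "colony_1": "ATGGCCATTGTAATGGGCC",      # contains the probe sequence
    "colony_2": "TTTTGGCCCATTACAATGGCCAT",  # same insert in the other orientation
    "colony_3": "ATGAAACCCGGGTTTTAA",       # unrelated insert
}

hits = probe_hits("CCATTGTAATGG", library)
print(hits)  # ['colony_1', 'colony_2']
```

The colonies in the returned list play the role of the spots that light up on the filter.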
708 00:44:59,390 --> 00:45:02,680 So in addition to doing this on a nitrocellulose filter, 709 00:45:02,680 --> 00:45:06,310 you can also do this in a tissue, 710 00:45:06,310 --> 00:45:08,860 and that's known as in situ hybridization. 711 00:45:12,670 --> 00:45:15,820 And in this case, in situ hybridization, 712 00:45:15,820 --> 00:45:19,150 you're searching for mRNA in a section of fixed tissue. 713 00:45:30,310 --> 00:45:33,550 OK, and I have an example from this paper 714 00:45:33,550 --> 00:45:37,000 here, which is the paper where this was cloned. 715 00:45:37,000 --> 00:45:40,420 In this paper was the cloning of the aniridia gene, 716 00:45:40,420 --> 00:45:44,800 and they identified a gene of interest, which is called Pax6 717 00:45:44,800 --> 00:45:48,400 now, and they basically used a piece of DNA 718 00:45:48,400 --> 00:45:50,620 that they thought was interesting, 719 00:45:50,620 --> 00:45:55,180 and they did in situ hybridization in an organism, 720 00:45:55,180 --> 00:45:56,620 in this case, you see an eye. 721 00:45:56,620 --> 00:45:59,980 This is an eye here, and Pax6 722 00:45:59,980 --> 00:46:01,810 is labeled in yellow, and you can 723 00:46:01,810 --> 00:46:04,390 see how this transcript is present throughout 724 00:46:04,390 --> 00:46:06,430 the entire eye, right? 725 00:46:06,430 --> 00:46:08,840 And the way you would see if it's tissue specific is you 726 00:46:08,840 --> 00:46:12,820 look in other tissues and you wouldn't see this yellow label. 727 00:46:12,820 --> 00:46:15,010 So that's how you would determine 728 00:46:15,010 --> 00:46:18,580 if it's expressed in a specific tissue or ubiquitously 729 00:46:18,580 --> 00:46:19,540 throughout an organism. 730 00:46:22,400 --> 00:46:24,760 OK, so this Pax6 gene. 731 00:46:24,760 --> 00:46:25,323 Oop.
732 00:46:25,323 --> 00:46:26,990 So I was going to ask, what do you think 733 00:46:26,990 --> 00:46:32,130 would happen if you hyperactivate Pax6 in humans, 734 00:46:32,130 --> 00:46:39,380 and this is one idea, but actually, I just made that up, 735 00:46:39,380 --> 00:46:44,750 or Stan Lee made that up, but actually, Stan Lee never 736 00:46:44,750 --> 00:46:48,920 in fact mentioned whether or not Cyclops is a Pax6 mutant, 737 00:46:48,920 --> 00:46:53,810 but we can do a different type of experiment, which might be 738 00:46:53,810 --> 00:46:58,100 more ethical, which is we know there's a fly gene that's 739 00:46:58,100 --> 00:47:00,020 homologous to Pax6. 740 00:47:00,020 --> 00:47:04,310 And what we can do in flies is we can ectopically 741 00:47:04,310 --> 00:47:07,910 express this eyeless gene in non-eye tissues 742 00:47:07,910 --> 00:47:09,890 and see what happens. 743 00:47:09,890 --> 00:47:11,510 OK, so this is pretty wild. 744 00:47:11,510 --> 00:47:15,990 This is my Halloween image of the class. 745 00:47:15,990 --> 00:47:18,410 So this is a fly where eyeless has been 746 00:47:18,410 --> 00:47:21,190 expressed all over its body. 747 00:47:21,190 --> 00:47:22,940 OK, so here you see there's an eye-- 748 00:47:22,940 --> 00:47:25,010 its normal eye-- here. 749 00:47:25,010 --> 00:47:26,660 You can see there's now another eye 750 00:47:26,660 --> 00:47:29,040 growing in the front of its head. 751 00:47:29,040 --> 00:47:32,150 You can see here's an eye growing on this fly's back, 752 00:47:32,150 --> 00:47:34,520 and you can see the legs. 753 00:47:34,520 --> 00:47:38,330 There's eye tissue all over the legs of this fly, OK? 754 00:47:38,330 --> 00:47:43,130 So this Pax6 gene, which is conserved from flies to humans, 755 00:47:43,130 --> 00:47:46,430 is the master regulator of eye development, OK?
756 00:47:46,430 --> 00:47:48,640 And at least in flies, if you ectopically 757 00:47:48,640 --> 00:47:53,160 express this in other parts of the body, you get an eye. 758 00:47:53,160 --> 00:47:55,360 I should say these are not functional eyes. 759 00:47:55,360 --> 00:47:57,650 They don't hook up to the brain the same way 760 00:47:57,650 --> 00:47:59,510 the normal eye does. 761 00:47:59,510 --> 00:48:01,760 So it's not like this fly can see out 762 00:48:01,760 --> 00:48:04,460 of the back of its head. 763 00:48:04,460 --> 00:48:06,290 OK, that's it. 764 00:48:06,290 --> 00:48:09,590 I'm done, and good luck on your exam on Wednesday. 765 00:48:09,590 --> 00:48:11,770 We will see you here.