1 00:00:16,710 --> 00:00:19,980 PROFESSOR: And today I'm going to talk about DNA sequencing. 2 00:00:19,980 --> 00:00:24,510 And I want to start by just sort of illustrating 3 00:00:24,510 --> 00:00:29,830 an example of how knowing the DNA sequence can be helpful. 4 00:00:29,830 --> 00:00:33,310 So you remember in the last lecture, 5 00:00:33,310 --> 00:00:37,050 we talked about how one might identify 6 00:00:37,050 --> 00:00:41,730 a gene through functional complementation. 7 00:00:41,730 --> 00:00:46,950 And this process involved making a DNA library that 8 00:00:46,950 --> 00:00:52,750 had different fragments of DNA cloned into different plasmids 9 00:00:52,750 --> 00:00:56,770 and then involved finding the needle in the haystack where 10 00:00:56,770 --> 00:01:01,200 you find the gene that can rescue a defect in a mutant 11 00:01:01,200 --> 00:01:01,920 that you have. 12 00:01:05,620 --> 00:01:15,500 So if this line that I'm drawing here is genomic DNA, 13 00:01:15,500 --> 00:01:19,820 and it could be genomic DNA from, 14 00:01:19,820 --> 00:01:25,700 let's say, a prototroph for LEU2, the leucine gene. 15 00:01:25,700 --> 00:01:27,380 So this is from a prototroph. 16 00:01:31,650 --> 00:01:38,140 Then you could cut up the DNA with EcoRI. 17 00:01:38,140 --> 00:01:41,460 And if there is not a restriction site in this LEU2 18 00:01:41,460 --> 00:01:46,710 gene, you get a fragment that contains the LEU2 gene. 19 00:01:46,710 --> 00:01:53,100 And then you could clone this into some type of plasmid 20 00:01:53,100 --> 00:01:56,750 that replicates in the organism that you're introducing 21 00:01:56,750 --> 00:01:59,910 it and propagating it in. 22 00:01:59,910 --> 00:02:04,590 And so that would allow you to then test whether or not 23 00:02:04,590 --> 00:02:10,750 this piece of DNA that you have complements a LEU2 auxotroph, 24 00:02:10,750 --> 00:02:11,250 OK? 
25 00:02:12,240 --> 00:02:14,310 Now one thing I want to point out 26 00:02:14,310 --> 00:02:23,190 is that because these EcoRI sites, these sticky ends, 27 00:02:23,190 --> 00:02:28,230 would recognize this EcoRI end or this EcoRI end, 28 00:02:28,230 --> 00:02:30,780 you can imagine that this gene-- 29 00:02:30,780 --> 00:02:33,330 if the gene reads this way to this way-- 30 00:02:33,330 --> 00:02:36,420 it could insert this way into the plasmid. 31 00:02:36,420 --> 00:02:39,340 Or it could insert in the opposite direction. 32 00:02:39,340 --> 00:02:42,120 So it could be inverted. 33 00:02:42,120 --> 00:02:45,270 So this would have some sort of origin of replication 34 00:02:45,270 --> 00:02:48,960 and some type of selectable marker. 35 00:02:48,960 --> 00:02:51,540 But if you have the same restriction site 36 00:02:51,540 --> 00:02:54,210 it can insert one way or the opposite way. 37 00:02:54,210 --> 00:02:58,680 That's just one thing I wanted to point out. 38 00:02:58,680 --> 00:03:02,290 Now let's say rather than leucine, 39 00:03:02,290 --> 00:03:05,160 you're interested in cyclin-dependent kinase, 40 00:03:05,160 --> 00:03:09,000 and you had a mutant in CDK and you had the sequence 41 00:03:09,000 --> 00:03:11,880 of your yeast CDK gene. 42 00:03:11,880 --> 00:03:18,240 Well, rather than having to dig through a whole library 43 00:03:18,240 --> 00:03:22,530 of pieces of DNA for the CDK gene, 44 00:03:22,530 --> 00:03:24,930 basically you're sort of fishing for that needle 45 00:03:24,930 --> 00:03:26,190 in the haystack. 46 00:03:26,190 --> 00:03:29,010 If you knew the sequence of the human genome, 47 00:03:29,010 --> 00:03:33,850 you'd be able to identify similar genes by sequence 48 00:03:33,850 --> 00:03:34,980 homology. 
49 00:03:34,980 --> 00:03:37,860 And you could then take a more direct approach, 50 00:03:37,860 --> 00:03:39,600 where you take-- 51 00:03:39,600 --> 00:03:42,360 let's say you have a piece of human DNA 52 00:03:42,360 --> 00:03:48,930 now, double stranded DNA, and it has the CDK gene. 53 00:03:48,930 --> 00:03:53,160 You could take human DNA with this CDK gene. 54 00:03:53,160 --> 00:03:56,520 And you have unique sequence around the CDK 55 00:03:56,520 --> 00:04:00,885 gene, which would allow you to denature this DNA. 56 00:04:04,180 --> 00:04:11,010 And if you denature the DNA, you'd 57 00:04:11,010 --> 00:04:12,915 get two single strands of DNA. 58 00:04:16,260 --> 00:04:18,240 And you could then design primers 59 00:04:18,240 --> 00:04:23,350 that recognize unique sequences flanking the CDK gene. 60 00:04:23,350 --> 00:04:26,700 So you could imagine you'd have a primer here 61 00:04:26,700 --> 00:04:28,590 and a primer here. 62 00:04:28,590 --> 00:04:32,670 And then you could use PCR to amplify specifically 63 00:04:32,670 --> 00:04:38,730 CDK gene from, it could be the genome or from some library. 64 00:04:38,730 --> 00:04:45,960 And then you get this fragment here, which includes CDK. 65 00:04:45,960 --> 00:04:49,230 So knowing the sequence of the genome 66 00:04:49,230 --> 00:04:54,390 would allow you to more rapidly go from maybe a gene 67 00:04:54,390 --> 00:04:57,720 that you've identified as being important in one organism, 68 00:04:57,720 --> 00:05:01,650 and find the human equivalent that might be doing something 69 00:05:01,650 --> 00:05:05,280 similar in humans. 70 00:05:05,280 --> 00:05:07,230 So this step here is basically PCR. 71 00:05:10,530 --> 00:05:14,820 And let's say the CDK gene had restriction sites. 72 00:05:14,820 --> 00:05:22,350 Let's see, we'll say restriction site K and A here. 
73 00:05:22,350 --> 00:05:24,420 Then if you have these restriction sites 74 00:05:24,420 --> 00:05:27,360 in your fragment of DNA, you can then 75 00:05:27,360 --> 00:05:31,200 digest or cut that piece of DNA with these restriction 76 00:05:31,200 --> 00:05:33,570 endonucleases. 77 00:05:33,570 --> 00:05:37,740 And then you'd get a fragment of CDK 78 00:05:37,740 --> 00:05:41,820 that has K and A sticky ends. 79 00:05:41,820 --> 00:05:45,420 We'll pretend that both of these have sticky ends. 80 00:05:45,420 --> 00:05:49,935 And now you have unique sticky ends between K and A. 81 00:05:49,935 --> 00:05:58,740 And you might have a vector that also has these two sites. 82 00:05:58,740 --> 00:06:02,510 And you could digest this vector with these two enzymes. 83 00:06:02,510 --> 00:06:06,890 And that would allow you to insert the specific gene 84 00:06:06,890 --> 00:06:09,920 in this plasmid. 85 00:06:09,920 --> 00:06:12,350 And if you have two unique sites, 86 00:06:12,350 --> 00:06:18,830 because K only recognizes K here and A only recognizes A, 87 00:06:18,830 --> 00:06:20,690 then it will ligate in. 88 00:06:20,690 --> 00:06:23,540 But you can do it with a specific orientation 89 00:06:23,540 --> 00:06:26,810 because you have two different restriction sites. 90 00:06:26,810 --> 00:06:32,360 So I hope you all see how it's different with one restriction 91 00:06:32,360 --> 00:06:33,590 site versus two. 92 00:06:39,180 --> 00:06:39,680 All right. 93 00:06:39,680 --> 00:06:42,230 Now let's say you want to do something more complicated 94 00:06:42,230 --> 00:06:44,270 than this. 95 00:06:44,270 --> 00:06:49,310 Let's say rather than just identifying the gene that's 96 00:06:49,310 --> 00:06:54,500 involved in cell division, you want to engineer a new gene, 97 00:06:54,500 --> 00:06:59,210 in order to determine where this particular protein, CDK, 98 00:06:59,210 --> 00:07:03,140 localizes in the cell. 
99 00:07:03,140 --> 00:07:09,470 So we have CDK, which could be from yeast or human, 100 00:07:09,470 --> 00:07:11,290 it doesn't matter. 101 00:07:11,290 --> 00:07:17,090 And you want to engineer a new protein, basically, 102 00:07:17,090 --> 00:07:19,790 that you can see. 103 00:07:19,790 --> 00:07:22,670 So remember Professor Imperiali introduced 104 00:07:22,670 --> 00:07:26,360 green fluorescent protein earlier in the year. 105 00:07:26,360 --> 00:07:28,700 And this green fluorescent protein 106 00:07:28,700 --> 00:07:34,050 is from a gene from jellyfish. 107 00:07:34,050 --> 00:07:37,700 So now we could, using what I've told you, 108 00:07:37,700 --> 00:07:40,580 reconstruct or engineer a gene that 109 00:07:40,580 --> 00:07:43,130 has DNA from three different organisms, 110 00:07:43,130 --> 00:07:46,910 in order to make a CDK variant that we 111 00:07:46,910 --> 00:07:49,650 are able to see in the cell. 112 00:07:49,650 --> 00:07:53,750 So remember, a green fluorescent protein is like a beacon, 113 00:07:53,750 --> 00:07:55,700 if it's attached to a protein. 114 00:07:55,700 --> 00:07:58,460 If you shine blue light on it, it emits green light. 115 00:07:58,460 --> 00:08:04,220 And so you can use a fluorescent microscope in order to see it. 116 00:08:04,220 --> 00:08:07,240 In this case, let's say there's also another restriction site 117 00:08:07,240 --> 00:08:13,730 here, R. And let's say you have a fragment of GFP that 118 00:08:13,730 --> 00:08:17,630 has two restriction sites, A and R. You could then 119 00:08:17,630 --> 00:08:20,990 cut this fragment and this fragment 120 00:08:20,990 --> 00:08:24,050 with these restriction enzymes A and R. 121 00:08:24,050 --> 00:08:30,500 And you could insert GFP at the C terminus of the CDK gene. 122 00:08:30,500 --> 00:08:38,720 So you could go and have a gene that has CDK GFP inserted 123 00:08:38,720 --> 00:08:42,169 inside a bacterial vector. 
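The earlier point about one restriction site versus two can be sketched as a toy model. This is only an illustration, assuming the simplification that a sticky end joins only an end cut by the same enzyme; the function name and the end labels are made up for this sketch and are not from the lecture.

```python
def ligation_orientations(vector_ends, insert_ends):
    """Return the orientations in which an insert can ligate into a cut vector.

    vector_ends: (left, right) sticky-end labels on the opened vector.
    insert_ends: (left, right) sticky-end labels on the insert as drawn.
    Toy rule (an assumption): an end joins only an end made by the same enzyme.
    """
    orientations = []
    if vector_ends == insert_ends:
        orientations.append("forward")
    if vector_ends == (insert_ends[1], insert_ends[0]):
        orientations.append("inverted")
    return orientations


# One enzyme on both sides: the fragment can go in either way.
print(ligation_orientations(("EcoRI", "EcoRI"), ("EcoRI", "EcoRI")))
# Two different enzymes (the K and A sites): only one orientation fits.
print(ligation_orientations(("K", "A"), ("K", "A")))
```

With identical ends the model reports both orientations, and with two different ends it reports only the forward one, which is the reason for cutting with two enzymes when orientation matters.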
124 00:08:42,169 --> 00:08:45,680 Now which one of these junction sites 125 00:08:45,680 --> 00:08:47,840 do you think would be most sensitive in doing 126 00:08:47,840 --> 00:08:49,130 this type of experiment? 127 00:08:51,780 --> 00:08:54,060 So there are three junction sites. 128 00:08:54,060 --> 00:08:56,447 There's this one, this one, and this one. 129 00:08:56,447 --> 00:08:58,030 Which is the one you're probably going 130 00:08:58,030 --> 00:08:59,870 to put the most thought into when 131 00:08:59,870 --> 00:09:02,000 you're doing this experiment? 132 00:09:02,000 --> 00:09:04,676 Yes, Miles. 133 00:09:04,676 --> 00:09:06,450 AUDIENCE: The A? 134 00:09:06,450 --> 00:09:08,000 ADAM MARTIN: The A site. 135 00:09:08,000 --> 00:09:10,160 Miles is exactly right. 136 00:09:10,160 --> 00:09:11,750 This one is going to be important. 137 00:09:11,750 --> 00:09:15,744 And why did you choose that site? 138 00:09:15,744 --> 00:09:18,100 AUDIENCE: Of the three sites, two 139 00:09:18,100 --> 00:09:24,874 are half insert, half originals [INAUDIBLE]. 140 00:09:24,874 --> 00:09:29,360 But at A, both sides of it are inserts. 141 00:09:29,360 --> 00:09:31,825 So [INAUDIBLE] carefully. 142 00:09:31,825 --> 00:09:33,200 ADAM MARTIN: And if you're trying 143 00:09:33,200 --> 00:09:35,540 to make a fusion protein, what's going 144 00:09:35,540 --> 00:09:39,900 to be an important quality of this? 145 00:09:39,900 --> 00:09:41,920 Malik, did you have a point? 146 00:09:41,920 --> 00:09:49,435 AUDIENCE: Well, they try to [INAUDIBLE] 147 00:09:49,435 --> 00:09:52,369 we'd have to make sure that the [INAUDIBLE]. 148 00:09:56,935 --> 00:09:58,060 ADAM MARTIN: Excellent job. 149 00:09:58,060 --> 00:10:01,450 So Malik just pointed out two really important things. 150 00:10:01,450 --> 00:10:04,330 To make this a fusion protein, you 151 00:10:04,330 --> 00:10:06,940 have two different open reading frames. 
152 00:10:06,940 --> 00:10:08,920 These two open reading frames have 153 00:10:08,920 --> 00:10:11,410 to be in frame with each other. 154 00:10:11,410 --> 00:10:15,850 So this junction here has to be in frame where 155 00:10:15,850 --> 00:10:19,060 GFP is in frame with CDK, meaning 156 00:10:19,060 --> 00:10:26,080 that you're reading the same triplet codons for GFP, 157 00:10:26,080 --> 00:10:28,450 they're in the same frame as CDK. 158 00:10:28,450 --> 00:10:31,090 Also, you want to make sure there's no stop codon here. 159 00:10:34,200 --> 00:10:36,310 Because if you had a stop codon here, 160 00:10:36,310 --> 00:10:38,650 you're just going to make a CDK protein. 161 00:10:38,650 --> 00:10:40,510 And then it's going to stop and then 162 00:10:40,510 --> 00:10:42,260 you won't have it fused to GFP. 163 00:10:46,440 --> 00:10:50,080 And you guys will work through more of these in the homework. 164 00:10:50,080 --> 00:10:53,200 So you'll be able to get a sense of it. 165 00:10:53,200 --> 00:10:55,870 So now for the remainder of this lecture 166 00:10:55,870 --> 00:10:58,750 and also for Monday's lecture, I want 167 00:10:58,750 --> 00:11:01,840 to go through a problem with you. 168 00:11:01,840 --> 00:11:06,580 Basically, if you have a given disease that's heritable, 169 00:11:06,580 --> 00:11:11,380 how might you go from knowing that disease 170 00:11:11,380 --> 00:11:15,640 is heritable to finding out what gene 171 00:11:15,640 --> 00:11:18,940 is responsible for that given disease? 172 00:11:18,940 --> 00:11:20,740 And this is going to involve thinking 173 00:11:20,740 --> 00:11:25,820 about different levels of resolution, in terms of maps. 174 00:11:25,820 --> 00:11:28,480 So the highest resolution map you can have 175 00:11:28,480 --> 00:11:31,780 for a genome is the sequence. 176 00:11:31,780 --> 00:11:37,120 You can have the full nucleotide sequence of a genome. 
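Going back to the CDK-GFP junction, the two conditions named above (GFP in the same frame as CDK, and no stop codon before GFP) can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the sequences and the six-base linker are invented placeholders, not real CDK or GFP sequence, and the upstream piece is assumed to start at its ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into complete triplets, reading from position 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def check_fusion(upstream_orf, junction):
    """Check the junction of a C-terminal fusion.

    upstream_orf: coding sequence of the first protein, starting at ATG.
    junction: bases between the two coding sequences (e.g. a restriction site).
    Returns (in_frame, stop_before_second_orf).
    """
    before_second_orf = upstream_orf + junction
    in_frame = len(before_second_orf) % 3 == 0
    has_stop = any(c in STOP_CODONS for c in codons(before_second_orf))
    return in_frame, has_stop


# Hypothetical toy sequences, for illustration only.
print(check_fusion("ATGGCTAAA", "GGATCC"))     # in frame, no stop
print(check_fusion("ATGGCTAAATAA", "GGATCC"))  # in frame, but a stop remains
print(check_fusion("ATGGCTAAA", "GGATC"))      # junction shifts the frame
```

A usable fusion in this toy model is one where the first value is True and the second is False.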
177 00:11:37,120 --> 00:11:39,220 And that's the highest possible resolution 178 00:11:39,220 --> 00:11:40,960 because you have single nucleotide 179 00:11:40,960 --> 00:11:46,540 resolution as to what every single base pair is. 180 00:11:46,540 --> 00:11:50,320 But that's like knowing like your apartment 181 00:11:50,320 --> 00:11:54,010 number and your street number and basically knowing 182 00:11:54,010 --> 00:11:55,300 everything. 183 00:11:55,300 --> 00:11:59,080 But starting out, you might want to know what continent it's on, 184 00:11:59,080 --> 00:12:02,050 or what country is it in. 185 00:12:02,050 --> 00:12:07,300 And so you first have to narrow down the possible locations 186 00:12:07,300 --> 00:12:09,520 for a given disease gene. 187 00:12:09,520 --> 00:12:14,080 And that will, at first, involve establishing what chromosome 188 00:12:14,080 --> 00:12:17,970 and what region of a chromosome a given disease 189 00:12:17,970 --> 00:12:20,710 allele is linked to. 190 00:12:20,710 --> 00:12:29,860 And that involves making essentially a linkage map, 191 00:12:29,860 --> 00:12:35,290 where you establish where a disease gene is located based 192 00:12:35,290 --> 00:12:40,960 on its linkage to known markers that are present in the genome. 193 00:12:40,960 --> 00:12:46,930 Now this is going to require that you remember back 194 00:12:46,930 --> 00:12:49,600 two weeks ago, to when we talked about linkage 195 00:12:49,600 --> 00:12:51,280 and recombination. 196 00:12:51,280 --> 00:12:57,970 And you'll recall that we were looking 197 00:12:57,970 --> 00:13:05,110 at the linkage between genes in flies and genes in yeast. 198 00:13:05,110 --> 00:13:09,640 One difference between that type of linkage mapping 199 00:13:09,640 --> 00:13:12,100 and human linkage mapping is we don't 200 00:13:12,100 --> 00:13:17,950 have really clear traits that are defined by single genes. 
201 00:13:17,950 --> 00:13:22,660 You can't just take hair color and map the hair color gene 202 00:13:22,660 --> 00:13:26,180 to link it to a disease gene. 203 00:13:26,180 --> 00:13:30,740 Because hair color is determined by many, many different genes. 204 00:13:30,740 --> 00:13:32,740 So in fruit flies, you can take white eyes 205 00:13:32,740 --> 00:13:35,620 and see if it's connected with yellow body color 206 00:13:35,620 --> 00:13:39,790 because both of those are determined by single genes. 207 00:13:39,790 --> 00:13:44,590 So we need something other than just having phenotypic traits 208 00:13:44,590 --> 00:13:46,150 that we can track. 209 00:13:46,150 --> 00:13:49,330 We need what are known as molecular markers 210 00:13:49,330 --> 00:13:51,760 to be able to perform linkage mapping. 211 00:13:56,770 --> 00:14:00,070 And so what we need in these molecular markers-- 212 00:14:00,070 --> 00:14:02,530 well, if we just think about if we 213 00:14:02,530 --> 00:14:09,490 wanted to determine the linkage between the A and B genes. 214 00:14:09,490 --> 00:14:11,290 And if you did this cross, would you 215 00:14:11,290 --> 00:14:12,655 be able to determine linkage? 216 00:14:19,480 --> 00:14:23,050 Georgia, you made a motion that was correct. 217 00:14:23,050 --> 00:14:25,030 Tell me. 218 00:14:25,030 --> 00:14:27,030 Why did you shake your head no? 219 00:14:27,030 --> 00:14:30,110 AUDIENCE: They'd all be heterozygous. 220 00:14:30,110 --> 00:14:33,100 ADAM MARTIN: Yeah they'd all be heterozygous. 221 00:14:33,100 --> 00:14:37,420 Because this individual has the same allele 222 00:14:37,420 --> 00:14:39,460 on both chromosomes, you're not going 223 00:14:39,460 --> 00:14:43,690 to be able to differentiate one chromosome from the other. 224 00:14:43,690 --> 00:14:45,610 And so the point I want to make is 225 00:14:45,610 --> 00:14:49,450 that in order to see linkage, what you need is variation. 226 00:14:54,360 --> 00:14:56,850 So we need to have variation. 
227 00:14:56,850 --> 00:15:01,650 And another term for genetic variation is polymorphism. 228 00:15:01,650 --> 00:15:11,310 So we need polymorphism, or genetic variation, 229 00:15:11,310 --> 00:15:13,320 between these molecular markers. 230 00:15:16,530 --> 00:15:21,150 We also need genetic variation in the disease. 231 00:15:21,150 --> 00:15:22,380 But we have that. 232 00:15:22,380 --> 00:15:25,560 We have individuals that are affected 233 00:15:25,560 --> 00:15:27,180 by a disease and individuals that 234 00:15:27,180 --> 00:15:28,590 are not affected by a disease. 235 00:15:28,590 --> 00:15:31,740 So we have variation in alleles there. 236 00:15:31,740 --> 00:15:36,480 But in order to map it with a molecular marker, 237 00:15:36,480 --> 00:15:38,850 to map linkage to a molecular marker, 238 00:15:38,850 --> 00:15:41,520 you also need variation here. 239 00:15:41,520 --> 00:15:43,890 So the problem with this cross is here 240 00:15:43,890 --> 00:15:47,370 you need to have a heterozygote. 241 00:15:47,370 --> 00:15:53,160 There needs to be variation in this individual, where 242 00:15:53,160 --> 00:15:57,930 both of these genes are heterozygous. 243 00:15:57,930 --> 00:16:01,680 So now I want to talk about some of these molecular markers 244 00:16:01,680 --> 00:16:08,700 that we can use, and how they vary between individuals 245 00:16:08,700 --> 00:16:10,050 and between chromosomes. 246 00:16:12,630 --> 00:16:15,450 Now this is going to be maybe the lowest resolution map. 247 00:16:15,450 --> 00:16:18,990 But I'm talking about this linkage map here. 248 00:16:18,990 --> 00:16:21,780 And you can see highlighted at the bottom here 249 00:16:21,780 --> 00:16:24,450 are various types of polymorphisms 250 00:16:24,450 --> 00:16:28,380 that we can use to link a given disease 251 00:16:28,380 --> 00:16:32,340 allele to a specific chromosome and a specific place 252 00:16:32,340 --> 00:16:33,280 on the chromosome. 
253 00:16:36,120 --> 00:16:45,310 So I'll start with the first one, which is a simple sequence 254 00:16:45,310 --> 00:16:46,510 repeat. 255 00:16:46,510 --> 00:16:47,860 It goes by many names. 256 00:16:47,860 --> 00:16:50,035 But I will stick with what's on the slide. 257 00:16:54,020 --> 00:17:00,280 So a simple sequence repeat is also known as a microsatellite. 258 00:17:00,280 --> 00:17:04,690 So you might see that term floating around, 259 00:17:04,690 --> 00:17:06,880 if you're reading about this. 260 00:17:06,880 --> 00:17:12,290 And what a simple sequence repeat is, as the name implies, 261 00:17:12,290 --> 00:17:14,230 it's a simple sequence. 262 00:17:14,230 --> 00:17:17,560 It could be a dinucleotide, like CA. 263 00:17:17,560 --> 00:17:19,390 And it's just a dinucleotide that's 264 00:17:19,390 --> 00:17:23,050 repeated over and over again. 265 00:17:23,050 --> 00:17:26,560 So on a chromosome, you might have a unique sequence, 266 00:17:26,560 --> 00:17:28,900 which I'll just draw as a line. 267 00:17:28,900 --> 00:17:33,160 And then you could have a CA dinucleotide that's repeated 268 00:17:33,160 --> 00:17:37,570 some number of times, N. And then that's followed by another 269 00:17:37,570 --> 00:17:40,480 unique sequence. 270 00:17:40,480 --> 00:17:41,900 And that's what's present in it. 271 00:17:41,900 --> 00:17:43,870 So that would be one strand. 272 00:17:43,870 --> 00:17:45,850 And then in the opposite strand, you'd 273 00:17:45,850 --> 00:17:49,360 have a unique sequence, the complement of CA, 274 00:17:49,360 --> 00:17:52,140 which is GT, and then, again, unique sequence. 275 00:17:54,790 --> 00:18:00,040 And so there's variation in the number of repeats of the CA. 276 00:18:00,040 --> 00:18:02,530 And so there's polymorphism. 277 00:18:02,530 --> 00:18:05,980 So we can use this to establish linkage 278 00:18:05,980 --> 00:18:08,890 between this marker and a phenotype, 279 00:18:08,890 --> 00:18:12,250 like a disease phenotype. 
280 00:18:12,250 --> 00:18:15,820 So how might you detect the number of repeats 281 00:18:15,820 --> 00:18:17,440 that are present here? 282 00:18:17,440 --> 00:18:20,140 Anyone have an idea of a tool that we've discussed 283 00:18:20,140 --> 00:18:21,280 that could be used here? 284 00:18:24,060 --> 00:18:28,340 So one hint that I gave you is that the sequence here 285 00:18:28,340 --> 00:18:30,930 is unique and the sequence here is unique. 286 00:18:30,930 --> 00:18:34,160 So is there a way we can leverage that unique sequence 287 00:18:34,160 --> 00:18:36,058 to determine whether there's a difference 288 00:18:36,058 --> 00:18:37,100 in the number of repeats? 289 00:18:42,500 --> 00:18:46,860 What's a technique we discussed that involves 290 00:18:46,860 --> 00:18:51,530 some component of the technique recognizing a unique sequence? 291 00:18:51,530 --> 00:18:52,140 Yeah, Natalie? 292 00:18:52,140 --> 00:18:53,790 AUDIENCE: CRISPR Cas9. 293 00:18:53,790 --> 00:18:57,720 ADAM MARTIN: Well, CRISPR Cas9 is a possibility. 294 00:18:57,720 --> 00:18:59,860 Jeremy, did you have an idea? 295 00:18:59,860 --> 00:19:00,740 AUDIENCE: PCR? 296 00:19:00,740 --> 00:19:03,510 ADAM MARTIN: PCR-- so it's true. 297 00:19:03,510 --> 00:19:05,770 You could get it to recognize that. 298 00:19:05,770 --> 00:19:08,520 But then you have to detect it, somehow. 299 00:19:08,520 --> 00:19:12,510 So what's more commonly used is PCR. 300 00:19:12,510 --> 00:19:15,090 Those are both good ideas. 301 00:19:15,090 --> 00:19:18,780 But using PCR, you could design a primer here 302 00:19:18,780 --> 00:19:20,460 and a primer here. 303 00:19:20,460 --> 00:19:24,000 And you could amplify this repeat sequence. 304 00:19:24,000 --> 00:19:26,430 And the number of repeats would determine 305 00:19:26,430 --> 00:19:30,000 the size of your PCR fragment. 
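The idea that primers in the unique flanking sequence give a PCR product whose size tracks the repeat number can be sketched in code. This is a minimal illustration, not a real primer-design tool: the flanking sequences, primer choice, and repeat counts are hypothetical, and the "PCR" is just an exact string match on one strand.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, fwd_primer, rev_primer):
    """Length of the product between two primers on the given top strand.

    fwd_primer matches the top strand; rev_primer is written 5' to 3' and
    matches the bottom strand. Returns None if either primer has no site.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev_primer), start + len(fwd_primer))
    if rev_site == -1:
        return None
    return rev_site + len(rev_primer) - start


# Hypothetical unique flanks around a (CA)n microsatellite.
LEFT_FLANK = "GATTACAGGC"
RIGHT_FLANK = "TTGACCGTAA"
FWD = LEFT_FLANK
REV = reverse_complement(RIGHT_FLANK)


def make_allele(n_repeats):
    return LEFT_FLANK + "CA" * n_repeats + RIGHT_FLANK


for n in (12, 18):
    print(n, "repeats ->", amplicon_length(make_allele(n), FWD, REV), "bp")
```

In this toy setup each extra CA repeat adds two base pairs to the product, so two alleles with different repeat numbers give two products of different length.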
306 00:19:30,000 --> 00:19:35,970 So if you did PCR, then you'd get a PCR fragment that 307 00:19:35,970 --> 00:19:41,130 has the primers on each end, but then has this certain size 308 00:19:41,130 --> 00:19:44,130 based on the number of repeats. 309 00:19:44,130 --> 00:19:46,680 So in that case, we need some sort of tool 310 00:19:46,680 --> 00:19:52,380 that enables us to determine the size of a particular DNA 311 00:19:52,380 --> 00:19:54,090 fragment. 312 00:19:54,090 --> 00:19:57,270 And so I'm going to just introduce to you one such tool, 313 00:19:57,270 --> 00:20:00,870 which is gel electrophoresis. 314 00:20:00,870 --> 00:20:04,110 And gel electrophoresis involves taking DNA 315 00:20:04,110 --> 00:20:08,400 that you've generated, by either PCR or by cutting up 316 00:20:08,400 --> 00:20:13,320 DNA with a restriction enzyme, and loading it 317 00:20:13,320 --> 00:20:16,680 in a gel that has agarose. 318 00:20:16,680 --> 00:20:18,150 Maybe it's composed of agarose. 319 00:20:18,150 --> 00:20:20,970 It could be composed of polyacrylamide. 320 00:20:20,970 --> 00:20:24,360 And then because DNA is negatively charged, 321 00:20:24,360 --> 00:20:28,380 the backbone, if you run a current through it, 322 00:20:28,380 --> 00:20:31,290 such that the positive electrode is at the bottom, 323 00:20:31,290 --> 00:20:36,540 then the DNA is going to snake through this gel. 324 00:20:36,540 --> 00:20:38,460 Now we'll do a quick demonstration, 325 00:20:38,460 --> 00:20:39,510 if you two could come up. 326 00:20:39,510 --> 00:20:41,520 I need one volunteer. 327 00:20:41,520 --> 00:20:44,580 Ori, find 10 of your friends and bring them down. 328 00:20:56,230 --> 00:20:56,730 All right. 329 00:20:56,730 --> 00:20:58,670 That's probably good. 330 00:20:58,670 --> 00:21:01,250 Yeah. 331 00:21:01,250 --> 00:21:03,680 All right, Hannah, why don't you-- 332 00:21:03,680 --> 00:21:05,260 you guys have to link up, OK? 333 00:21:08,300 --> 00:21:08,950 Stay over here. 
334 00:21:08,950 --> 00:21:09,950 We'll start at this end. 335 00:21:09,950 --> 00:21:12,800 This is the negative electrode over here. 336 00:21:12,800 --> 00:21:15,950 The positive electrode is going to be down there. 337 00:21:15,950 --> 00:21:19,550 And Jackie is going to be our single nucleotide. 338 00:21:19,550 --> 00:21:22,220 You guys link like-- 339 00:21:22,220 --> 00:21:23,580 yeah. 340 00:21:23,580 --> 00:21:27,080 You don't have to do-si-do, or anything like that. 341 00:21:27,080 --> 00:21:28,010 All right. 342 00:21:28,010 --> 00:21:29,750 Now what I want you guys to do is 343 00:21:29,750 --> 00:21:32,570 I want you to slalom through these cones 344 00:21:32,570 --> 00:21:34,250 like it's an agarose gel. 345 00:21:34,250 --> 00:21:37,430 So that you're going towards the other side. 346 00:21:37,430 --> 00:21:39,680 And I'm going to turn on the current now. 347 00:21:39,680 --> 00:21:40,220 So go. 348 00:21:47,330 --> 00:21:49,590 All right, stop. 349 00:21:49,590 --> 00:21:50,090 All right. 350 00:21:50,090 --> 00:21:53,090 See how the shorter DNA fragment is 351 00:21:53,090 --> 00:21:55,790 able to more easily navigate through the cones 352 00:21:55,790 --> 00:21:58,370 and get farther. 353 00:21:58,370 --> 00:22:01,770 So it was somewhat rigged. 354 00:22:01,770 --> 00:22:02,270 I know. 355 00:22:02,270 --> 00:22:05,900 But I just needed some way to make sure you always 356 00:22:05,900 --> 00:22:09,050 remember that the shorter nucleotide, or the shorter 357 00:22:09,050 --> 00:22:11,480 fragment, is going to migrate faster. 358 00:22:11,480 --> 00:22:12,830 You guys can go back up. 359 00:22:12,830 --> 00:22:15,120 Thank you for your participation. 360 00:22:15,120 --> 00:22:16,610 Let's give them a round of applause. 361 00:22:16,610 --> 00:22:20,068 [APPLAUSE] 362 00:22:22,040 --> 00:22:22,540 All right. 
363 00:22:22,540 --> 00:22:27,660 So what you just saw is that the longer DNA fragments, 364 00:22:27,660 --> 00:22:31,540 they're going to be more inhibited by moving 365 00:22:31,540 --> 00:22:32,710 through the gel. 366 00:22:32,710 --> 00:22:34,960 And so they're going to move slower and thus, 367 00:22:34,960 --> 00:22:36,827 not move as far in the gel. 368 00:22:36,827 --> 00:22:38,410 Whereas, the small fragments are going 369 00:22:38,410 --> 00:22:40,600 to move much faster because they're 370 00:22:40,600 --> 00:22:43,870 able to maneuver their way through this gel 371 00:22:43,870 --> 00:22:45,710 much more quickly. 372 00:22:45,710 --> 00:22:49,060 So there's going to be an inverse proportionality 373 00:22:49,060 --> 00:22:55,840 between the size of the DNA chain and its rate of movement. 374 00:22:55,840 --> 00:22:58,240 You're always going to see the shorter DNA 375 00:22:58,240 --> 00:23:00,190 fragment moving faster. 376 00:23:00,190 --> 00:23:01,990 So what one of these gels actually 377 00:23:01,990 --> 00:23:05,120 looks like is shown here. 378 00:23:05,120 --> 00:23:08,260 So this is a DNA gel that's agarose. 379 00:23:08,260 --> 00:23:11,890 And DNA has been run in these different samples. 380 00:23:11,890 --> 00:23:14,140 And what you're seeing is this gel 381 00:23:14,140 --> 00:23:16,480 is subsequently stained with a dye, 382 00:23:16,480 --> 00:23:20,080 like ethidium bromide, which allows you to visualize 383 00:23:20,080 --> 00:23:22,690 the individual DNA fragments. 384 00:23:22,690 --> 00:23:26,080 And so a band on this gel indicates a whole bunch 385 00:23:26,080 --> 00:23:30,100 of DNA fragments that are all roughly the same length. 386 00:23:30,100 --> 00:23:32,740 So essentially, you can measure DNA length 387 00:23:32,740 --> 00:23:34,270 using this technique. 
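Measuring length from a gel usually means comparing how far a band ran with how far fragments of known size ran. A minimal sketch of that comparison follows, assuming that migration distance is roughly linear in the logarithm of fragment size between neighboring reference bands. That is a common approximation, not an exact law, and the reference sizes and distances below are invented for illustration.

```python
import math


def estimate_size(distance, known_bands):
    """Estimate fragment size (bp) from migration distance.

    known_bands: list of (size_bp, distance_mm) for fragments of known length.
    Interpolates log10(size) linearly between the two flanking known bands.
    """
    points = sorted(known_bands, key=lambda p: p[1])  # order by distance run
    for (size1, d1), (size2, d2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(size1) + frac * (math.log10(size2) - math.log10(size1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the known bands")


# Hypothetical reference fragments: (size in bp, distance in mm).
KNOWN = [(1000, 10.0), (500, 20.0), (250, 30.0), (100, 43.0)]

for d in (15.0, 25.0, 35.0):
    print(f"{d} mm -> about {estimate_size(d, KNOWN):.0f} bp")
```

Bands that ran farther come out as shorter fragments, matching the demonstration: the shorter fragment always gets farther through the gel.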
388 00:23:34,270 --> 00:23:36,530 What's over here at the end of the gel, 389 00:23:36,530 --> 00:23:39,010 this is probably some sort of DNA ladder, 390 00:23:39,010 --> 00:23:41,470 where you have DNA fragments of known length 391 00:23:41,470 --> 00:23:44,230 that you can use to calibrate the length 392 00:23:44,230 --> 00:23:47,980 of these bands over here. 393 00:23:47,980 --> 00:23:50,520 So this is how you measure DNA length. 394 00:23:50,520 --> 00:23:52,540 And we're going to use it over and over again, 395 00:23:52,540 --> 00:23:56,440 as we talk about DNA and sequencing. 396 00:23:56,440 --> 00:23:58,360 So now, let's think about how this 397 00:23:58,360 --> 00:24:01,480 is going to help us establish linkage 398 00:24:01,480 --> 00:24:03,880 between a particular marker in the genome 399 00:24:03,880 --> 00:24:06,920 and a genetic disease. 400 00:24:06,920 --> 00:24:10,330 So if we think about these microsatellite repeats, 401 00:24:10,330 --> 00:24:11,890 I told you they're polymorphic. 402 00:24:11,890 --> 00:24:15,400 They exhibit a lot of variation in size. 403 00:24:15,400 --> 00:24:17,590 And so here's an example showing you 404 00:24:17,590 --> 00:24:21,940 a female who has two intermediate sized 405 00:24:21,940 --> 00:24:23,500 microsatellites. 406 00:24:23,500 --> 00:24:25,540 And if you look at this-- 407 00:24:25,540 --> 00:24:28,660 if you did PCR and measured the size of these, 408 00:24:28,660 --> 00:24:30,660 you get two different bands because there 409 00:24:30,660 --> 00:24:34,690 are two different alleles of different length here. 410 00:24:34,690 --> 00:24:40,120 So you can see this individual has two intermediate length 411 00:24:40,120 --> 00:24:41,420 repeats. 412 00:24:41,420 --> 00:24:45,100 And this person has had children with an individual 413 00:24:45,100 --> 00:24:48,490 that has a short and a long microsatellite. 414 00:24:48,490 --> 00:24:51,640 And you can see that on the gel, here. 
415 00:24:51,640 --> 00:24:55,750 Now this female is affected by some disease. 416 00:24:55,750 --> 00:24:59,690 And these two individuals have children. 417 00:24:59,690 --> 00:25:02,500 And you can see that a number of those children 418 00:25:02,500 --> 00:25:04,460 are affected by the disease. 419 00:25:04,460 --> 00:25:06,820 So what mode of inheritance does this look like? 420 00:25:10,850 --> 00:25:14,000 If you had your choice between autosomal recessive, 421 00:25:14,000 --> 00:25:16,690 autosomal dominant, sex linked dominant, 422 00:25:16,690 --> 00:25:22,460 and sex linked recessive, what mode of inheritance 423 00:25:22,460 --> 00:25:23,480 is this looking like? 424 00:25:26,570 --> 00:25:27,080 Oh, Carmen. 425 00:25:27,080 --> 00:25:29,200 AUDIENCE: Autosomal recessive. 426 00:25:29,200 --> 00:25:31,550 ADAM MARTIN: Autosomal recessive? 427 00:25:31,550 --> 00:25:35,900 Why do you go with recessive? 428 00:25:35,900 --> 00:25:37,010 Yeah, go ahead. 429 00:25:37,010 --> 00:25:40,490 AUDIENCE: Because there is a male that's affected. 430 00:25:44,770 --> 00:25:47,895 But not both of the parents are affected. 431 00:25:47,895 --> 00:25:55,220 So it seems like the father is heterozygous 432 00:25:55,220 --> 00:25:57,232 and the mother is homozygous recessive. 433 00:26:00,390 --> 00:26:02,730 ADAM MARTIN: That's possible. 434 00:26:02,730 --> 00:26:06,180 That's exactly the logic I want to see. 435 00:26:06,180 --> 00:26:08,488 Is there another possibility? 436 00:26:08,488 --> 00:26:09,030 Yeah, Jeremy. 437 00:26:09,030 --> 00:26:10,238 AUDIENCE: Autosomal dominant. 438 00:26:10,238 --> 00:26:13,380 ADAM MARTIN: It could also be autosomal dominant. 439 00:26:13,380 --> 00:26:14,488 So you're right. 440 00:26:14,488 --> 00:26:15,030 You're right. 
441 00:26:15,030 --> 00:26:19,890 If this was not a rare disease, then that male 442 00:26:19,890 --> 00:26:23,790 could be a carrier and could be passing it 443 00:26:23,790 --> 00:26:26,520 on to half the children. 444 00:26:26,520 --> 00:26:27,720 So that's good. 445 00:26:27,720 --> 00:26:29,550 You'd essentially need more information 446 00:26:29,550 --> 00:26:32,760 to differentiate between autosomal recessive 447 00:26:32,760 --> 00:26:34,620 and autosomal dominant. 448 00:26:34,620 --> 00:26:36,210 For the purposes of this, we're going 449 00:26:36,210 --> 00:26:40,500 to go with autosomal dominant. 450 00:26:40,500 --> 00:26:43,890 And what you see is that you want 451 00:26:43,890 --> 00:26:46,260 to look at the affected individuals 452 00:26:46,260 --> 00:26:51,000 and see if the disease phenotype is linked, or connected, 453 00:26:51,000 --> 00:26:54,480 with one of these microsatellite alleles. 454 00:26:54,480 --> 00:26:56,530 So if we look at-- 455 00:26:56,530 --> 00:27:00,750 we basically PCR DNA from all these individuals. 456 00:27:00,750 --> 00:27:02,700 And if you look at who is affected, 457 00:27:02,700 --> 00:27:07,500 each one of the individuals has this M double prime band. 458 00:27:07,500 --> 00:27:12,570 And none of the unaffected individuals has it. 459 00:27:12,570 --> 00:27:17,970 So obviously, it would be better to have more pedigrees and more 460 00:27:17,970 --> 00:27:20,370 data to really establish the significance 461 00:27:20,370 --> 00:27:21,870 of this linkage. 462 00:27:21,870 --> 00:27:24,390 But this is just a simple example, 463 00:27:24,390 --> 00:27:27,090 showing what you could possibly see 464 00:27:27,090 --> 00:27:30,120 if you have one of these molecular markers linked 465 00:27:30,120 --> 00:27:31,860 to a particular disease allele. 466 00:27:34,770 --> 00:27:38,910 So that kind of establishes the principle.
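The co-segregation check described above can be sketched in a few lines of Python. This is only an illustration: the family data, the allele names (`M'`, `M''`, `S`, `L`), and the function name `cosegregating_alleles` are assumptions made up for the example, not the pedigree from the slide.

```python
# Hypothetical pedigree: each person carries two microsatellite alleles.
# The mother has two intermediate-length alleles (M' and M''), the father
# a short (S) and a long (L) one. All names and data here are assumptions.
family = [
    {"id": "mother", "alleles": {"M'", "M''"}, "affected": True},
    {"id": "father", "alleles": {"S", "L"}, "affected": False},
    {"id": "child1", "alleles": {"M''", "S"}, "affected": True},
    {"id": "child2", "alleles": {"M'", "L"}, "affected": False},
    {"id": "child3", "alleles": {"M''", "L"}, "affected": True},
    {"id": "child4", "alleles": {"M'", "S"}, "affected": False},
]


def cosegregating_alleles(people):
    """Return alleles carried by every affected person and by no unaffected one."""
    affected = [p["alleles"] for p in people if p["affected"]]
    unaffected = [p["alleles"] for p in people if not p["affected"]]
    if not affected:
        return set()
    # Start with the alleles shared by all affected individuals...
    shared = set.intersection(*affected)
    # ...then drop any allele that also shows up in an unaffected individual.
    for alleles in unaffected:
        shared -= alleles
    return shared


print(cosegregating_alleles(family))
```

With this made-up family, the only allele that tracks with the disease is `M''`. A single small family like this cannot establish significance; it only shows the pattern you would be looking for.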
467 00:27:38,910 --> 00:27:42,540 Now let's think about what are some other molecular markers 468 00:27:42,540 --> 00:27:43,290 that are possible? 469 00:27:47,130 --> 00:27:49,850 So another type of marker, and this 470 00:27:49,850 --> 00:27:54,210 is one that's the most common one, if I go here. 471 00:27:54,210 --> 00:27:57,110 So here, you see here's a linkage map, here. 472 00:27:57,110 --> 00:28:00,350 And you see most of these bands are green. 473 00:28:00,350 --> 00:28:03,080 And the green markers, here, are what 474 00:28:03,080 --> 00:28:07,610 are known as Single Nucleotide Polymorphisms, or SNPs. 475 00:28:10,420 --> 00:28:14,780 So single nucleotide polymorphisms-- 476 00:28:24,010 --> 00:28:27,630 and this is abbreviated SNP. 477 00:28:27,630 --> 00:28:31,730 And what a single nucleotide polymorphism is, is it's 478 00:28:31,730 --> 00:28:37,680 a variation of a nucleotide at a single position in the genome. 479 00:28:37,680 --> 00:28:41,880 So it's just a one base pair difference at a position. 480 00:28:41,880 --> 00:28:54,300 So there's variation of a single nucleotide at a given position, 481 00:28:54,300 --> 00:28:58,380 at a position in the genome. 482 00:29:02,860 --> 00:29:05,970 And because that's a pretty general definition, 483 00:29:05,970 --> 00:29:09,780 there are tons of these in the genome. 484 00:29:09,780 --> 00:29:11,640 Now one thing to think about is you 485 00:29:11,640 --> 00:29:16,110 could have a mutation in an individual that creates a SNP. 486 00:29:16,110 --> 00:29:19,710 So you could have a de novo formation of a SNP. 487 00:29:19,710 --> 00:29:23,580 But if you have a SNP and it gets 488 00:29:23,580 --> 00:29:26,580 incorporated into the gametes of an individual, 489 00:29:26,580 --> 00:29:28,710 then that variant is going to be passed 490 00:29:28,710 --> 00:29:31,260 on to the next generation. 491 00:29:31,260 --> 00:29:34,110 So this is something that could occur de novo.
492 00:29:34,110 --> 00:29:36,540 But it is also heritable. 493 00:29:36,540 --> 00:29:39,810 And if it's heritable, then you can follow it 494 00:29:39,810 --> 00:29:42,630 and use it to determine if a given 495 00:29:42,630 --> 00:29:46,155 variant is linked to a given phenotype, like a disease. 496 00:29:48,730 --> 00:29:53,980 So to identify a single nucleotide polymorphism, 497 00:29:53,980 --> 00:29:57,190 it's helpful to be able to sequence the DNA. 498 00:29:57,190 --> 00:30:00,830 And I'll talk about how we could do that in just a minute. 499 00:30:00,830 --> 00:30:04,060 But before I go on, I just want to point out 500 00:30:04,060 --> 00:30:10,000 a subclass of SNPs that can be visualized without sequencing. 501 00:30:10,000 --> 00:30:15,490 And these are called restriction fragment length polymorphisms. 502 00:30:15,490 --> 00:30:18,640 So restriction fragment-- so it's 503 00:30:18,640 --> 00:30:20,980 going to involve some type of restriction 504 00:30:20,980 --> 00:30:26,890 enzyme digest length polymorphism. 505 00:30:26,890 --> 00:30:28,630 It's a long word. 506 00:30:28,630 --> 00:30:30,100 But it's abbreviated RFLP. 507 00:30:32,800 --> 00:30:36,160 And what this is, is it's a variation 508 00:30:36,160 --> 00:30:37,960 of a single nucleotide. 509 00:30:37,960 --> 00:30:40,720 But this is a subclass of SNP. 510 00:30:40,720 --> 00:30:43,810 Because this is when the variation occurs 511 00:30:43,810 --> 00:30:47,080 in a restriction site for a restriction enzyme. 512 00:30:47,080 --> 00:30:51,250 So if you remember your good friend EcoR1, 513 00:30:51,250 --> 00:30:55,270 EcoR1 recognizes the nucleotide sequence GAATTC. 514 00:30:58,150 --> 00:31:04,270 And EcoR1 only cleaves DNA sequence that has GAATTC. 515 00:31:07,090 --> 00:31:13,000 So if there was a single nucleotide variation 516 00:31:13,000 --> 00:31:15,370 in the sequence, such that it's now 517 00:31:15,370 --> 00:31:24,190 GATTTC, or something like that, that destroys the EcoR1 site. 
518 00:31:24,190 --> 00:31:28,300 And so EcoR1 will no longer be able to recognize this site 519 00:31:28,300 --> 00:31:31,340 in the genome and cut it. 520 00:31:31,340 --> 00:31:36,010 So you could imagine that if you had 521 00:31:36,010 --> 00:31:41,980 one individual having three EcoR1 sites in this region of the genome, 522 00:31:41,980 --> 00:31:45,700 if you digest this region, you'd get two fragments. 523 00:31:45,700 --> 00:31:50,720 But if you destroyed the one in the middle, 524 00:31:50,720 --> 00:31:55,250 then if you digested this piece of DNA, 525 00:31:55,250 --> 00:31:57,825 then you'd only get one fragment. 526 00:31:57,825 --> 00:31:58,700 And that's something. 527 00:31:58,700 --> 00:32:02,000 Because it results in different sizes of fragments, 528 00:32:02,000 --> 00:32:03,560 that's something you can see just 529 00:32:03,560 --> 00:32:06,500 by doing DNA electrophoresis. 530 00:32:06,500 --> 00:32:08,660 And maybe you would use some method 531 00:32:08,660 --> 00:32:11,600 to detect this specific region, so that you're not 532 00:32:11,600 --> 00:32:13,820 looking at all the DNA in the genome, 533 00:32:13,820 --> 00:32:18,080 but you're establishing linkage to this specific area. 534 00:32:18,080 --> 00:32:20,090 You could use PCR. 535 00:32:20,090 --> 00:32:23,330 You can have PCR primers here and here. 536 00:32:23,330 --> 00:32:25,160 And you could then cut with EcoR1. 537 00:32:25,160 --> 00:32:27,560 In one case, you'd get two fragments. 538 00:32:27,560 --> 00:32:29,690 In this case, you'd get two fragments. 539 00:32:29,690 --> 00:32:32,960 In this case, if you amplified this region of the genome 540 00:32:32,960 --> 00:32:36,140 and cut with EcoR1, you'd only get one fragment. 541 00:32:36,140 --> 00:32:37,580 So you'd be able to differentiate 542 00:32:37,580 --> 00:32:39,530 between those possibilities. 543 00:32:39,530 --> 00:32:40,700 Yes, Malik. 544 00:32:40,700 --> 00:32:43,490 AUDIENCE: When you use PCR, are there [INAUDIBLE]?
545 00:32:45,658 --> 00:32:46,700 ADAM MARTIN: What's that? 546 00:32:46,700 --> 00:32:48,373 AUDIENCE: Are there [INAUDIBLE]? 547 00:32:48,373 --> 00:32:49,040 ADAM MARTIN: Oh. 548 00:32:49,040 --> 00:32:52,500 You're saying what causes it to stop? 549 00:32:52,500 --> 00:32:53,820 That's a great question, Malik. 550 00:32:53,820 --> 00:32:54,320 Yeah. 551 00:32:54,320 --> 00:32:57,620 So initially, it's not going to stop. 552 00:32:57,620 --> 00:32:59,360 That's absolutely right. 553 00:32:59,360 --> 00:33:02,540 But because every step, each time you replicate, 554 00:33:02,540 --> 00:33:06,350 it's then primed with another primer. 555 00:33:06,350 --> 00:33:10,940 So you'd replicate something like this that's too long. 556 00:33:10,940 --> 00:33:13,700 But then the reverse primer would replicate like this. 557 00:33:13,700 --> 00:33:15,650 And it would stop. 558 00:33:15,650 --> 00:33:19,130 So if you go back to my slide from last lecture, 559 00:33:19,130 --> 00:33:23,270 look through that and see if it makes sense how it's ending. 560 00:33:23,270 --> 00:33:25,730 Because if you do this 30 times, you really 561 00:33:25,730 --> 00:33:30,590 will enrich for a fragment that stops and ends at the two 562 00:33:30,590 --> 00:33:35,900 primers, or begins and ends at the two primers, I should say. 563 00:33:35,900 --> 00:33:36,800 Good question. 564 00:33:36,800 --> 00:33:39,120 Thank you. 565 00:33:39,120 --> 00:33:39,620 All right. 566 00:33:39,620 --> 00:33:42,160 Now, let's talk about DNA sequencing. 567 00:33:42,160 --> 00:33:43,880 Because as I showed you, obviously, 568 00:33:43,880 --> 00:33:46,280 these SNPs, because there are so many of them, 569 00:33:46,280 --> 00:33:51,080 are probably the most useful of these markers to narrow in 570 00:33:51,080 --> 00:33:53,690 on where your disease gene is. 571 00:33:53,690 --> 00:33:58,160 And to detect a SNP, we need to be able to sequence DNA. 
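Going back to the RFLP example with EcoR1, here is a minimal Python sketch of digesting a PCR product that either has or lacks the middle site. The amplicon sequences and the helper name `digest` are hypothetical. The cut position (after the G in GAATTC) is the standard EcoRI cut site, and only one strand is modeled for simplicity.

```python
SITE = "GAATTC"  # EcoRI recognition sequence


def digest(seq, site=SITE, cut_offset=1):
    """Cut seq at every occurrence of site and return the fragment lengths.

    cut_offset=1 models EcoRI cutting between G and AATTC on this strand.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical PCR products spanning the polymorphic site.
allele_with_site = "ACGTACGTAC" + "GAATTC" + "TTGACCA"
allele_without_site = "ACGTACGTAC" + "GATTTC" + "TTGACCA"  # single-base change

print(digest(allele_with_site))     # two fragments
print(digest(allele_without_site))  # one uncut fragment
```

On a gel, the first allele would give two shorter bands and the second a single longer band, which is the size difference you'd score.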
572 00:34:04,220 --> 00:34:05,810 So I'm going to start with an older 573 00:34:05,810 --> 00:34:09,530 method for DNA sequencing, which conceptually, 574 00:34:09,530 --> 00:34:13,250 is very similar to how we do DNA sequencing today. 575 00:34:13,250 --> 00:34:15,139 And so it will illustrate my point. 576 00:34:15,139 --> 00:34:19,550 And then at the end, I'll talk about more modern techniques 577 00:34:19,550 --> 00:34:21,600 to sequencing. 578 00:34:21,600 --> 00:34:23,570 So the technique I'm going to introduce to you 579 00:34:23,570 --> 00:34:25,639 is called Sanger sequencing. 580 00:34:28,820 --> 00:34:33,230 And that's because it was identified by an individual 581 00:34:33,230 --> 00:34:34,130 named Fred Sanger. 582 00:34:37,850 --> 00:34:43,429 And I'm going to just take a very simple DNA sequence, 583 00:34:43,429 --> 00:34:46,600 in order to illustrate how Sanger sequencing works. 584 00:34:50,370 --> 00:34:53,130 So let's take a sequence that's really simple. 585 00:35:02,470 --> 00:35:06,810 This is very, very simple, and then more sequence here. 586 00:35:09,370 --> 00:35:13,800 So let's say we want to determine the nucleotide that's 587 00:35:13,800 --> 00:35:18,120 at every position of this DNA fragment. 588 00:35:18,120 --> 00:35:20,790 So one way we could maybe conceptually 589 00:35:20,790 --> 00:35:22,950 think about doing this, is to try 590 00:35:22,950 --> 00:35:28,440 to let DNA polymerase tell us where given nucleotides are. 591 00:35:28,440 --> 00:35:30,690 And if we're going to use DNA polymerase, what 592 00:35:30,690 --> 00:35:34,780 are we going to need, in order to facilitate this process? 593 00:35:37,340 --> 00:35:37,970 Yes, Rachel. 594 00:35:37,970 --> 00:35:39,170 AUDIENCE: [INAUDIBLE]. 595 00:35:39,170 --> 00:35:42,560 ADAM MARTIN: You're going to need nucleotides, definitely. 596 00:35:42,560 --> 00:35:44,660 So we're going to need nucleotides. 597 00:35:44,660 --> 00:35:46,130 What else? 
598 00:35:46,130 --> 00:35:49,080 To start, what are you going to need? 599 00:35:49,080 --> 00:35:49,916 Miles? 600 00:35:49,916 --> 00:35:50,770 AUDIENCE: Primer. 601 00:35:50,770 --> 00:35:52,937 ADAM MARTIN: You're going to need a primer, exactly. 602 00:35:52,937 --> 00:35:53,920 Good job. 603 00:35:53,920 --> 00:35:56,680 So you need a primer. 604 00:35:56,680 --> 00:35:58,660 So here's a primer. 605 00:35:58,660 --> 00:36:03,130 And now, we're going to try to get DNA polymerase to tell us 606 00:36:03,130 --> 00:36:06,560 whenever there is a given nucleotide in this DNA 607 00:36:06,560 --> 00:36:07,060 sequence. 608 00:36:10,330 --> 00:36:13,540 And so think with me. 609 00:36:13,540 --> 00:36:19,860 Let's say we were able to get DNA polymerase to stop whenever 610 00:36:19,860 --> 00:36:22,770 there was a certain nucleotide. 611 00:36:22,770 --> 00:36:25,980 So if we go through just a couple nucleotides, 612 00:36:25,980 --> 00:36:32,310 let's say, at first, we want DNA polymerase to stop whenever 613 00:36:32,310 --> 00:36:37,350 there's an A. So let's say there was 614 00:36:37,350 --> 00:36:40,400 a possibility it would stop at this A. If it 615 00:36:40,400 --> 00:36:42,360 stopped at this A, you'd generate 616 00:36:42,360 --> 00:36:45,270 a fragment of this length. 617 00:36:45,270 --> 00:36:47,860 But if it read on through that A, 618 00:36:47,860 --> 00:36:53,160 there's another possibility that it would stop at this A. 619 00:36:53,160 --> 00:36:56,730 So we're kind of looking at when these are stopping. 620 00:36:56,730 --> 00:37:04,200 And the final possibility is it goes on and stops at this A. 621 00:37:04,200 --> 00:37:07,740 So if this DNA polymerase stopped only at As, 622 00:37:07,740 --> 00:37:11,580 you'd get fragments that are these three discrete lengths. 623 00:37:14,440 --> 00:37:17,050 Now let's consider another possibility. 624 00:37:17,050 --> 00:37:24,160 So pink here is stop at A.
And in blue, I'm 625 00:37:24,160 --> 00:37:27,670 going to draw what would happen if it stopped at T. 626 00:37:27,670 --> 00:37:30,040 So they all start from the same place. 627 00:37:30,040 --> 00:37:32,260 If it stopped at T, it would just 628 00:37:32,260 --> 00:37:36,895 stop one nucleotide beyond this A in this simple sequence. 629 00:37:39,700 --> 00:37:44,920 So in blue here, this is stop at T. 630 00:37:44,920 --> 00:37:49,420 But it's just a possibility that it stops. 631 00:37:49,420 --> 00:37:53,170 And some of the polymerases could go beyond this T 632 00:37:53,170 --> 00:37:57,730 and go to the next T and stop here. 633 00:37:57,730 --> 00:38:01,210 And again, this would be one nucleotide length longer 634 00:38:01,210 --> 00:38:04,450 than this pink one, here. 635 00:38:04,450 --> 00:38:06,420 And the final one would-- 636 00:38:06,420 --> 00:38:07,920 I'll just draw it down here-- 637 00:38:07,920 --> 00:38:11,380 would get out to this last T, here. 638 00:38:11,380 --> 00:38:16,130 So what you see is if we could get DNA polymerase to stop 639 00:38:16,130 --> 00:38:19,480 at these discrete positions, we'd 640 00:38:19,480 --> 00:38:23,230 get different sized fragments, depending on whether it 641 00:38:23,230 --> 00:38:27,730 was stopping at one nucleotide versus the other nucleotide. 642 00:38:27,730 --> 00:38:30,490 You all see how this is resulting in different fragment 643 00:38:30,490 --> 00:38:31,780 lengths. 644 00:38:31,780 --> 00:38:32,820 Yes, Andrew. 645 00:38:32,820 --> 00:38:35,870 AUDIENCE: How would you create a pattern [INAUDIBLE]? 646 00:38:38,710 --> 00:38:40,990 ADAM MARTIN: There are companies now. 647 00:38:40,990 --> 00:38:42,670 You can basically take nucleotides 648 00:38:42,670 --> 00:38:46,420 and synthesize these primers chemically, 649 00:38:46,420 --> 00:38:48,210 not using DNA polymerase.
650 00:38:48,210 --> 00:38:50,844 AUDIENCE: I'm saying how would you know what primer to use, 651 00:38:50,844 --> 00:38:52,330 if you don't know the sequence? 652 00:38:52,330 --> 00:38:53,830 ADAM MARTIN: Oh, in this case, you'd 653 00:38:53,830 --> 00:38:58,030 have to start with some sequence that you know. 654 00:38:58,030 --> 00:39:01,150 So in most sequencing technologies, 655 00:39:01,150 --> 00:39:04,720 you kind of make a DNA library, where you know 656 00:39:04,720 --> 00:39:06,070 the sequence of the vector. 657 00:39:06,070 --> 00:39:08,590 And then you'd use the vector sequence as a primer 658 00:39:08,590 --> 00:39:11,770 to sequence into the unknown sequence. 659 00:39:11,770 --> 00:39:12,670 Great question. 660 00:39:12,670 --> 00:39:14,740 Good job. 661 00:39:14,740 --> 00:39:17,110 All right. 662 00:39:17,110 --> 00:39:21,190 So what we need now then is some sort of tool or ability 663 00:39:21,190 --> 00:39:26,080 to stop DNA polymerase when there's a certain nucleotide 664 00:39:26,080 --> 00:39:28,540 base. 665 00:39:28,540 --> 00:39:32,200 And to do that, we can use this type of molecule, 666 00:39:32,200 --> 00:39:36,670 here, which is known as a dideoxynucleotide. 667 00:39:36,670 --> 00:39:42,070 Remember, for DNA polymerase to elongate a chain, 668 00:39:42,070 --> 00:39:47,500 it requires that the last base have a three prime hydroxyl. 669 00:39:47,500 --> 00:39:53,110 And so what this dideoxynucleoside triphosphate 670 00:39:53,110 --> 00:39:57,040 is, is it's a nucleoside triphosphate that 671 00:39:57,040 --> 00:39:59,840 lacks a three prime hydroxyl. 672 00:39:59,840 --> 00:40:02,680 Here, I'll highlight that. 673 00:40:02,680 --> 00:40:03,820 So you see this guy? 674 00:40:03,820 --> 00:40:06,150 You see the bold, highlighted H? 675 00:40:06,150 --> 00:40:09,220 There's a hydrogen there on the three prime carbon, 676 00:40:09,220 --> 00:40:13,930 rather than the normal hydroxyl group.
677 00:40:13,930 --> 00:40:20,620 So if this base gets incorporated into an elongating 678 00:40:20,620 --> 00:40:26,080 chain, DNA polymerase is not going to be able to move on. 679 00:40:28,690 --> 00:40:34,570 So this method where you can add a certain dideoxynucleoside 680 00:40:34,570 --> 00:40:38,770 triphosphate to stop chain elongation 681 00:40:38,770 --> 00:40:42,430 is known as a chain termination method. 682 00:40:42,430 --> 00:40:46,030 So you're getting chain termination. 683 00:40:46,030 --> 00:40:48,430 And you're getting this chain termination 684 00:40:48,430 --> 00:40:52,120 with a specific dideoxynucleoside triphosphate. 685 00:40:52,120 --> 00:40:56,990 So these dideoxynucleotide triphosphates, 686 00:40:56,990 --> 00:40:59,770 if they get incorporated into the DNA, 687 00:40:59,770 --> 00:41:03,590 are going to halt the synthesis of that DNA strand. 688 00:41:03,590 --> 00:41:08,860 So if we take our example, here, this 689 00:41:08,860 --> 00:41:14,350 might be a reaction that has dideoxythymidine triphosphate. 690 00:41:14,350 --> 00:41:21,520 So if we had dideoxythymidine triphosphate in this sample 691 00:41:21,520 --> 00:41:24,850 and it's elongating, then when the polymerase reaches 692 00:41:24,850 --> 00:41:26,890 this point, there's a possibility 693 00:41:26,890 --> 00:41:33,490 that it will incorporate the dideoxynucleoside triphosphate. 694 00:41:33,490 --> 00:41:37,870 And if this is a dideoxynucleoside triphosphate, 695 00:41:37,870 --> 00:41:42,070 then there won't be a three prime hydroxyl. 696 00:41:42,070 --> 00:41:48,340 And DNA polymerase will just be like, oh, I can't go on! 697 00:41:48,340 --> 00:41:51,420 Because it's not going to have a three prime hydroxyl. 698 00:41:51,420 --> 00:41:54,010 So it's not going to be able to continue 699 00:41:54,010 --> 00:41:56,830 with the next nucleotide. 700 00:41:56,830 --> 00:42:00,040 So this is known as chain termination.
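To make the chain termination idea concrete, here is a small Python sketch that lists the fragment lengths a given dideoxynucleotide could produce. The sequence `ATCATGAT` and the function name `termination_lengths` are assumptions for illustration; the actual sequence drawn on the board is not in the transcript. The string stands for the newly synthesized strand downstream of the primer.

```python
def termination_lengths(new_strand, dd_base, primer_len=0):
    """Lengths of the fragments that can end in the given dideoxy base.

    new_strand: bases added after the primer, in the order they are synthesized.
    dd_base:    the base supplied in dideoxy form (e.g. "T" for ddTTP).
    primer_len: length of the primer, added to every fragment.
    """
    return [
        primer_len + position
        for position, base in enumerate(new_strand, start=1)
        if base == dd_base
    ]


new_strand = "ATCATGAT"  # hypothetical example sequence

print(termination_lengths(new_strand, "A"))  # possible stops in the ddATP reaction
print(termination_lengths(new_strand, "T"))  # possible stops in the ddTTP reaction
```

Each stop is only a possibility: in a real reaction a given polymerase either incorporates the dideoxy form at that position or reads on, so the tube ends up with a mixture of all these lengths.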
701 00:42:00,040 --> 00:42:03,415 So let me take you through an example, here. 702 00:42:06,020 --> 00:42:06,520 All right. 703 00:42:06,520 --> 00:42:10,330 So here's an example that you have a slide of. 704 00:42:10,330 --> 00:42:13,520 And again, there's a template strand, 705 00:42:13,520 --> 00:42:15,520 which is the top strand. 706 00:42:15,520 --> 00:42:19,330 And this method requires that you have a primer. 707 00:42:19,330 --> 00:42:21,535 And what's often done is you label the primer. 708 00:42:24,790 --> 00:42:28,690 So the first step is you have to denature your DNA. 709 00:42:28,690 --> 00:42:30,820 So you have to go from double stranded DNA 710 00:42:30,820 --> 00:42:32,050 to a single stranded DNA. 711 00:42:34,730 --> 00:42:41,000 And then you mix the denatured DNA 712 00:42:41,000 --> 00:42:45,110 with first, this labeled primer, such 713 00:42:45,110 --> 00:42:49,940 that the primer can then anneal to the single stranded DNA. 714 00:42:49,940 --> 00:42:53,846 You need DNA polymerase, as I've mentioned. 715 00:42:57,740 --> 00:42:59,990 And as, I believe, Rachel mentioned 716 00:42:59,990 --> 00:43:03,650 before, you need the building blocks of DNA. 717 00:43:03,650 --> 00:43:06,680 So you need the four deoxynucleoside 718 00:43:06,680 --> 00:43:08,450 triphosphates. 719 00:43:08,450 --> 00:43:11,330 So you always have the four deoxynucleotide 720 00:43:11,330 --> 00:43:14,030 triphosphates. 721 00:43:14,030 --> 00:43:16,100 But what's special here is you're 722 00:43:16,100 --> 00:43:20,480 going to spike several reactions with one 723 00:43:20,480 --> 00:43:23,630 of the dideoxynucleoside triphosphates. 724 00:43:23,630 --> 00:43:28,400 So you spike the reaction with a tiny amount 725 00:43:28,400 --> 00:43:34,190 of one of your dideoxynucleoside triphosphates. 726 00:43:36,860 --> 00:43:39,770 So let's say you have a reaction, here. 727 00:43:39,770 --> 00:43:45,970 And this one here has dideoxyadenosine triphosphate.
728 00:43:45,970 --> 00:43:48,800 Then polymerase will elongate this strand 729 00:43:48,800 --> 00:43:52,040 until there's a thymidine on the template. 730 00:43:52,040 --> 00:43:53,870 And then there's a possibility that it will 731 00:43:53,870 --> 00:43:57,170 incorporate this dideoxy NTP. 732 00:43:57,170 --> 00:43:59,480 And if it does, then you get chain termination. 733 00:43:59,480 --> 00:44:01,760 And you get a fragment of this length. 734 00:44:01,760 --> 00:44:04,460 But the other possibility, because there is still 735 00:44:04,460 --> 00:44:08,420 the deoxy form of the NTP present, 736 00:44:08,420 --> 00:44:10,550 it's possible that it incorporates 737 00:44:10,550 --> 00:44:13,760 a deoxyadenosine triphosphate there. 738 00:44:13,760 --> 00:44:18,050 And keeps going, and then incorporates a dideoxy ATP 739 00:44:18,050 --> 00:44:21,890 later on, where you have another T. 740 00:44:21,890 --> 00:44:24,800 And so the polymerase will essentially randomly 741 00:44:24,800 --> 00:44:30,410 stop at these different thymidine residues, 742 00:44:30,410 --> 00:44:34,070 depending on whether or not a dideoxynucleoside triphosphate 743 00:44:34,070 --> 00:44:35,690 is incorporated. 744 00:44:35,690 --> 00:44:38,480 And that means for a given reaction, one in which 745 00:44:38,480 --> 00:44:42,230 you have dideoxy ATP, you get a certain pattern 746 00:44:42,230 --> 00:44:46,640 of bands that represent the length of fragments, 747 00:44:46,640 --> 00:44:52,130 where you have, in this case, a thymidine base. 748 00:44:52,130 --> 00:44:55,790 And then you do this for all four bases, where 749 00:44:55,790 --> 00:44:59,410 you have four reactions, each with a different base that's 750 00:44:59,410 --> 00:44:59,910 dideoxy.
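The four-reaction scheme can be sketched in Python as follows. This is a simplified model under stated assumptions: the example sequence `ATCGTTA` and all function names are hypothetical, every possible termination is assumed to show up as a band, and lengths are counted from the end of the primer.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def run_four_reactions(new_strand):
    """Map each dideoxy base to the fragment lengths its reaction can produce."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest to recover the synthesized strand."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def template_of(new_strand):
    """Reverse-complement the synthesized strand to get the template, 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(new_strand))


lanes = run_four_reactions("ATCGTTA")  # hypothetical synthesized strand
print(lanes)
print(read_gel(lanes))
print(template_of(read_gel(lanes)))
```

Note that the band in the ddATP lane marks a T on the template, as in the example above: what you read off the gel is the new strand, and the template is its complement.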
751 00:45:02,510 --> 00:45:04,700 So when you're adding these, you're 752 00:45:04,700 --> 00:45:10,820 going to do four reactions, one with dideoxy ATP spiked in, 753 00:45:10,820 --> 00:45:15,860 one with dideoxy TTP, one with dideoxy CTP, 754 00:45:15,860 --> 00:45:19,280 and the last with dideoxy GTP. 755 00:45:19,280 --> 00:45:22,280 And because these nucleotides are 756 00:45:22,280 --> 00:45:26,360 present in different positions along the sequence, 757 00:45:26,360 --> 00:45:30,080 you're going to get a distinct banding pattern for each 758 00:45:30,080 --> 00:45:31,700 of these reactions. 759 00:45:31,700 --> 00:45:33,500 But using that banding pattern, you 760 00:45:33,500 --> 00:45:36,200 can then read off the sequence of DNA that's 761 00:45:36,200 --> 00:45:37,850 present on the template strand. 762 00:45:41,180 --> 00:45:45,590 So this is how sequencing was done for many, many years. 763 00:45:45,590 --> 00:45:50,030 These days, it's been made cheaper and faster. 764 00:45:50,030 --> 00:45:54,750 And now what's often used is next generation sequencing. 765 00:45:54,750 --> 00:45:59,090 And one pain in the ass about sequencing before 766 00:45:59,090 --> 00:46:02,000 was that you'd use a lot of radioactivity. 767 00:46:02,000 --> 00:46:05,020 Your primer would be radioactive, 768 00:46:05,020 --> 00:46:07,280 so that you could detect these bands. 769 00:46:07,280 --> 00:46:09,680 Right now, everything's done using fluorescence, which 770 00:46:09,680 --> 00:46:11,810 makes it much nicer, I think. 771 00:46:11,810 --> 00:46:13,940 And so in next generation sequencing, 772 00:46:13,940 --> 00:46:18,140 your template DNA is attached to a solid substrate, such 773 00:46:18,140 --> 00:46:22,880 that it's immobilized on some type of substrate. 774 00:46:22,880 --> 00:46:29,840 And then you add each of the four nucleoside triphosphates.
775 00:46:29,840 --> 00:46:33,290 In this case, they're labeled with a dye, such 776 00:46:33,290 --> 00:46:35,660 that each one is a different color. 777 00:46:35,660 --> 00:46:39,530 But the dye also functions to prevent elongation, 778 00:46:39,530 --> 00:46:42,500 such that, again, it's this chain termination. 779 00:46:42,500 --> 00:46:44,330 When you incorporate one of these, 780 00:46:44,330 --> 00:46:47,150 the polymerase just can't run along the DNA. 781 00:46:47,150 --> 00:46:50,300 It incorporates one and then stops. 782 00:46:50,300 --> 00:46:53,720 So if you get your first nucleotide incorporated, 783 00:46:53,720 --> 00:46:56,060 it will incorporate one of these four. 784 00:46:56,060 --> 00:46:59,330 And it will be fluorescent at a certain wavelength, which 785 00:46:59,330 --> 00:47:04,880 you can see using a device or microscope. 786 00:47:04,880 --> 00:47:08,900 And then what you then do is chemically modify this base, 787 00:47:08,900 --> 00:47:11,300 such that you remove the dye and allow it 788 00:47:11,300 --> 00:47:13,910 to extend one more base pair. 789 00:47:13,910 --> 00:47:17,490 And so you go one nucleotide at a time. 790 00:47:17,490 --> 00:47:20,480 And you read out the pattern of fluorescence that appears. 791 00:47:20,480 --> 00:47:23,450 And that gives you the sequence of DNA 792 00:47:23,450 --> 00:47:27,230 on this molecule that's stuck to your substrate. 793 00:47:27,230 --> 00:47:28,580 And you can do this in parallel. 794 00:47:28,580 --> 00:47:32,120 You can have tons, many different strands of DNA. 795 00:47:32,120 --> 00:47:34,430 And you can be reading out the sequence of each one 796 00:47:34,430 --> 00:47:36,200 of these strands in parallel. 797 00:47:39,060 --> 00:47:39,560 Great. 798 00:47:39,560 --> 00:47:41,480 Any questions about DNA sequencing? 799 00:47:45,200 --> 00:47:45,700 OK. 800 00:47:45,700 --> 00:47:46,420 Very good. 801 00:47:46,420 --> 00:47:47,630 I will see you on Monday. 
802 00:47:47,630 --> 00:47:49,620 Have a great weekend.
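As a recap of the next generation approach described above, here is a toy Python model of one-base-at-a-time sequencing with dye-labeled terminators. Everything specific in it is an assumption for illustration: the dye colors assigned to each base, the spot names, the template sequences, and the function names. Real instruments involve imaging and chemistry that this sketch ignores.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Arbitrary color assignment, made up for this example.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}


def sequence_by_synthesis(template, cycles):
    """Return the color observed at each cycle for one immobilized template.

    template is written in the order the polymerase reads it. Each cycle,
    one labeled nucleotide is incorporated, its color is recorded, and the
    dye is then removed so the next cycle can extend by one more base.
    """
    colors = []
    for base in template[:cycles]:
        incorporated = COMPLEMENT[base]
        colors.append(DYE[incorporated])
    return colors


def call_bases(colors):
    """Translate the recorded colors back into the synthesized sequence."""
    base_for_color = {color: base for base, color in DYE.items()}
    return "".join(base_for_color[color] for color in colors)


# Many templates are read in parallel, one per spot on the substrate.
spots = {"spot1": "TAGC", "spot2": "GGCA"}  # hypothetical templates
reads = {
    name: call_bases(sequence_by_synthesis(seq, cycles=4))
    for name, seq in spots.items()
}
print(reads)
```

The dictionary of spots stands in for the parallelism: every immobilized strand goes through the same cycles at once, and each gives its own readout.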