1 00:00:16,379 --> 00:00:18,480 BARBARA IMPERIALI: So we are moving along. 2 00:00:18,480 --> 00:00:22,300 Lecture 6 is the last of the biochemistry lectures. 3 00:00:22,300 --> 00:00:26,880 We're going to be talking about nucleotides and nucleic acids. 4 00:00:26,880 --> 00:00:30,320 And you'll understand these terms in a moment. 5 00:00:30,320 --> 00:00:32,350 I'll clarify them for you. 6 00:00:32,350 --> 00:00:34,830 But this is a tremendous stepping stone 7 00:00:34,830 --> 00:00:36,405 to the next portion of the class. 8 00:00:36,405 --> 00:00:38,710 So I show you a few images here. 9 00:00:38,710 --> 00:00:42,070 I'm going to reshow you some of these in a moment 10 00:00:42,070 --> 00:00:45,750 when we talk about addressing understanding 11 00:00:45,750 --> 00:00:48,720 the noncovalent structure of DNA, which 12 00:00:48,720 --> 00:00:51,720 is so critical to understanding information 13 00:00:51,720 --> 00:00:54,600 storage and information transfer. 14 00:00:54,600 --> 00:00:58,590 But for now, let's just have a quick peek forward. 15 00:00:58,590 --> 00:01:03,700 After this section, I'm going to be covering molecular biology, 16 00:01:03,700 --> 00:01:06,780 so how to go from DNA to RNA to protein. 17 00:01:06,780 --> 00:01:08,690 And then Professor Martin will take 18 00:01:08,690 --> 00:01:13,080 over with the basic structures and functions of cells 19 00:01:13,080 --> 00:01:15,360 and then genetics. 20 00:01:15,360 --> 00:01:18,090 But for all of this, we're going to need nucleic acids. 21 00:01:18,090 --> 00:01:19,980 And I'll explain to you why here. 22 00:01:19,980 --> 00:01:23,910 So nucleic acids form fundamental units 23 00:01:23,910 --> 00:01:33,840 for information storage, storage. 24 00:01:33,840 --> 00:01:37,810 And that is the DNA that is in our nucleus 25 00:01:37,810 --> 00:01:45,205 and in our mitochondria, and then information transfer. 
26 00:01:48,370 --> 00:01:50,440 And if I get a little bit of time at the end, 27 00:01:50,440 --> 00:01:52,282 I have three or four quick slides 28 00:01:52,282 --> 00:01:53,740 that you don't have on your handout 29 00:01:53,740 --> 00:01:57,190 because it's sort of a floating topic on the use of DNA 30 00:01:57,190 --> 00:01:59,500 and DNA-based computing, because it's 31 00:01:59,500 --> 00:02:02,230 a nanoscale structure that one can 32 00:02:02,230 --> 00:02:04,250 program to do different things. 33 00:02:04,250 --> 00:02:06,010 And I think you might enjoy that. 34 00:02:06,010 --> 00:02:09,340 So in this picture of the components 35 00:02:09,340 --> 00:02:12,310 and what's known as the central dogma, 36 00:02:12,310 --> 00:02:16,600 that is how DNA is converted into messenger RNA, 37 00:02:16,600 --> 00:02:20,530 which, through the help of transfer RNA and ribosomal RNA, 38 00:02:20,530 --> 00:02:22,240 we get proteins. 39 00:02:22,240 --> 00:02:26,950 The key elements on this slide are DNA, messenger RNA, 40 00:02:26,950 --> 00:02:29,980 ribosomal RNA, and transfer RNA. 41 00:02:29,980 --> 00:02:33,400 And those are all made up of nucleotides 42 00:02:33,400 --> 00:02:35,410 being brought together into polymers 43 00:02:35,410 --> 00:02:37,250 that are nucleic acids. 44 00:02:37,250 --> 00:02:40,990 So obviously, we really need to crack the structures of these 45 00:02:40,990 --> 00:02:45,550 and understand how the structure informs function. 46 00:02:45,550 --> 00:02:47,650 Remember, we did that for proteins. 47 00:02:47,650 --> 00:02:49,780 We've done that for phospholipids. 48 00:02:49,780 --> 00:02:52,750 We thought about it very briefly for carbohydrates. 
49 00:02:52,750 --> 00:02:54,700 But the thing that I really want to stress 50 00:02:54,700 --> 00:02:57,850 to you with the fourth of these macromolecules 51 00:02:57,850 --> 00:03:02,500 is looking at how the last component of the biomolecule's 52 00:03:02,500 --> 00:03:04,570 structure really informs function. 53 00:03:04,570 --> 00:03:06,970 And it's really cool to think about how it's done. 54 00:03:17,660 --> 00:03:22,130 So how is that chemical molecular structure something 55 00:03:22,130 --> 00:03:26,450 that we can understand from the perspective of function? 56 00:03:26,450 --> 00:03:29,990 So what we need to do, first of all, 57 00:03:29,990 --> 00:03:33,740 is think about what nucleotides are and understand 58 00:03:33,740 --> 00:03:36,380 their structure so that we can move forward 59 00:03:36,380 --> 00:03:38,780 to understand how they come together 60 00:03:38,780 --> 00:03:40,540 to build these macromolecules. 61 00:03:40,540 --> 00:03:43,100 They're so pivotal and essential in life 62 00:03:43,100 --> 00:03:47,330 for programming the biosynthesis of our proteins. 63 00:03:47,330 --> 00:03:50,810 And now we're understanding more and more about not 64 00:03:50,810 --> 00:03:56,300 only that, but also how RNA, not DNA, is 65 00:03:56,300 --> 00:03:59,810 involved in a large number of regulatory processes. 66 00:03:59,810 --> 00:04:03,050 So it's not just DNA, double-stranded DNA 67 00:04:03,050 --> 00:04:05,330 goes to a messenger, and so on. 68 00:04:05,330 --> 00:04:08,120 Also, a lot of regulation occurs because 69 00:04:08,120 --> 00:04:12,025 of a lot of the other nucleic acids that are within the cell. 70 00:04:12,025 --> 00:04:13,400 So I'm going to go here because I 71 00:04:13,400 --> 00:04:19,829 want to describe the composite components of nucleotides 72 00:04:19,829 --> 00:04:24,030 so we understand their structure and their properties. 73 00:04:24,030 --> 00:04:27,140 So what are nucleotides? 
74 00:04:30,198 --> 00:04:32,240 And you look at these structures up on the board. 75 00:04:32,240 --> 00:04:33,900 They look kind of complicated. 76 00:04:33,900 --> 00:04:36,550 So let me deconstruct them for you. 77 00:04:36,550 --> 00:04:38,460 It'll make life a lot easier. 78 00:04:38,460 --> 00:04:53,370 So there are two familiar building blocks and one new one. 79 00:04:53,370 --> 00:04:56,510 So the familiar building blocks are, first of all, 80 00:04:56,510 --> 00:04:58,890 carbohydrates. 81 00:04:58,890 --> 00:05:03,320 So the key carbohydrate in nucleic acid 82 00:05:03,320 --> 00:05:24,640 is a five-carbon pentose sugar, which looks like this. 83 00:05:24,640 --> 00:05:31,080 You can count the carbons, 1, 2, 3, 4, and 5. 84 00:05:31,080 --> 00:05:33,930 And you can reassure yourselves everything is there 85 00:05:33,930 --> 00:05:37,080 with respect to the carbons by translating 86 00:05:37,080 --> 00:05:39,660 this line-angle drawing into a drawing where 87 00:05:39,660 --> 00:05:43,470 you put all the hydrogens on and you know where everything is. 88 00:05:43,470 --> 00:05:46,920 There are two types of five-carbon pentoses 89 00:05:46,920 --> 00:05:49,510 that are used in the nucleic acid. 90 00:05:49,510 --> 00:05:55,290 They are ribose, which is shown here 91 00:05:55,290 --> 00:05:58,390 with all OHs on all of those carbons, 92 00:05:58,390 --> 00:06:11,310 and 2-deoxyribose, which is a building block of DNA, 93 00:06:11,310 --> 00:06:14,850 whereas ribose is a building block of RNA. 94 00:06:14,850 --> 00:06:16,680 What else do I need to tell you? 95 00:06:16,680 --> 00:06:18,780 You'll see this later on. 96 00:06:18,780 --> 00:06:23,310 That ribose sugar ends up being connected to what 97 00:06:23,310 --> 00:06:24,825 are known as nucleobases. 98 00:06:31,500 --> 00:06:33,960 You do not necessarily need to draw those, 99 00:06:33,960 --> 00:06:37,320 because you've got them on your handout to put sketches on.
100 00:06:37,320 --> 00:06:39,322 So I put them on the board so I 101 00:06:39,322 --> 00:06:41,280 don't have to stand here and draw them for you. 102 00:06:41,280 --> 00:06:43,470 And I want to explain certain things. 103 00:06:43,470 --> 00:06:46,650 So the nucleobases in the numbering system-- 104 00:06:46,650 --> 00:06:50,250 and I'm going to keep on reiterating this so you'll 105 00:06:50,250 --> 00:06:52,080 get familiar with it-- 106 00:06:52,080 --> 00:06:59,370 number the carbons 1 through whatever it is, or rather, 107 00:06:59,370 --> 00:07:03,400 the atom numbers when you're walking around the ring. 108 00:07:03,400 --> 00:07:06,810 So when we talk about the ribose component, 109 00:07:06,810 --> 00:07:09,600 they have what's known as a prime numbering system 110 00:07:09,600 --> 00:07:12,240 to differentiate it from the numbering 111 00:07:12,240 --> 00:07:14,130 system in the nucleobases. 112 00:07:14,130 --> 00:07:18,310 So this would be 1 prime, 2 prime, 3 prime, 4 prime, 113 00:07:18,310 --> 00:07:19,350 and 5 prime. 114 00:07:19,350 --> 00:07:20,820 Why is that? 115 00:07:20,820 --> 00:07:22,840 This becomes incredibly important 116 00:07:22,840 --> 00:07:26,190 when we talk about putting together polymers of DNA 117 00:07:26,190 --> 00:07:30,300 and the direction in which DNA is assembled in life, and also, 118 00:07:30,300 --> 00:07:34,432 even when we describe 2-deoxyribose, 119 00:07:34,432 --> 00:07:40,230 or a ribose, because this would be called 2 prime deoxyribose 120 00:07:40,230 --> 00:07:41,940 in the nucleic acid. 121 00:07:41,940 --> 00:07:44,250 So I'm going to bore you with that numbering system 122 00:07:44,250 --> 00:07:46,950 because I'll start to use it very commonly. 123 00:07:46,950 --> 00:07:50,820 And it will make a lot of sense as we start to assemble the DNA 124 00:07:50,820 --> 00:07:53,610 macromolecule when we talk about the way it's 125 00:07:53,610 --> 00:07:55,410 built and drawn and written.
126 00:07:55,410 --> 00:07:57,930 The numbering system will be important 127 00:07:57,930 --> 00:08:03,030 because we'll constantly refer to 5 prime and 3 prime. 128 00:08:03,030 --> 00:08:05,790 That's just a little preview for later. 129 00:08:05,790 --> 00:08:09,820 The next component of the nucleic acid is a phosphate. 130 00:08:12,550 --> 00:08:13,865 Phosphate looks like this. 131 00:08:17,360 --> 00:08:21,170 But in the nucleotides, 132 00:08:21,170 --> 00:08:29,540 these are joined to other units as phosphoesters. 133 00:08:34,240 --> 00:08:36,490 But you want to remember that in phosphorus, you 134 00:08:36,490 --> 00:08:39,789 have 1, 2, 3, 4, 5 bonds to phosphorus, 135 00:08:39,789 --> 00:08:44,650 and you commonly have a negative charge on one of those oxygens. 136 00:08:44,650 --> 00:08:47,590 And in the structure of DNA, you actually 137 00:08:47,590 --> 00:08:51,385 have phosphates occurring as phosphodiesters. 138 00:08:58,520 --> 00:09:00,570 And you, once again, you will see that when 139 00:09:00,570 --> 00:09:03,150 we see the intact structure of DNA. 140 00:09:03,150 --> 00:09:04,530 So what are nucleotides? 141 00:09:04,530 --> 00:09:13,680 Nucleotides are a combination of a carbohydrate 142 00:09:13,680 --> 00:09:17,040 or a sugar, a phosphate and a nucleobase. 143 00:09:17,040 --> 00:09:18,870 That's the third component, the one 144 00:09:18,870 --> 00:09:20,520 we're going to learn about now. 145 00:09:20,520 --> 00:09:23,650 So the nucleobases look like this. 146 00:09:23,650 --> 00:09:27,450 There are two families, two flavors of nucleobase. 147 00:09:27,450 --> 00:09:29,880 There is one flavor-- 148 00:09:29,880 --> 00:09:32,460 let's get this cleaned up a little bit here-- 149 00:09:32,460 --> 00:09:34,710 that has two rings. 150 00:09:34,710 --> 00:09:38,520 And it has the shorter name, purine.
151 00:09:38,520 --> 00:09:40,980 And there's a different family or flavor 152 00:09:40,980 --> 00:09:43,920 of nucleobases that has one ring, 153 00:09:43,920 --> 00:09:45,660 and it has the bigger name. 154 00:09:45,660 --> 00:09:47,700 And that, to this day, is the way 155 00:09:47,700 --> 00:09:50,730 I remember purines and pyrimidines. 156 00:09:50,730 --> 00:09:54,870 Small name, big structure; big name, small structure. 157 00:09:54,870 --> 00:09:56,780 If that's helpful to you, go for it. 158 00:09:56,780 --> 00:09:57,280 Use it. 159 00:09:57,280 --> 00:10:00,000 I haven't patented it or anything. 160 00:10:00,000 --> 00:10:04,860 So in nucleic acids, there are two different purines. 161 00:10:04,860 --> 00:10:07,950 They are known as adenine and guanine. 162 00:10:07,950 --> 00:10:10,260 You do not need to know these structures. 163 00:10:10,260 --> 00:10:13,920 I actually only know my favorite three of the five 164 00:10:13,920 --> 00:10:14,817 to draw easily. 165 00:10:14,817 --> 00:10:17,150 And the other two, I'm always stumbling around the ring. 166 00:10:17,150 --> 00:10:18,700 So don't worry about that. 167 00:10:18,700 --> 00:10:21,150 We all get to know the ones we work with every day. 168 00:10:21,150 --> 00:10:24,030 For me, it's uracil, it's adenine, 169 00:10:24,030 --> 00:10:26,970 and it's cytosine, but not the others. 170 00:10:26,970 --> 00:10:28,920 But what you do need to understand 171 00:10:28,920 --> 00:10:31,080 is a little bit about their structures. 172 00:10:31,080 --> 00:10:34,920 Because when we start to talk about the noncovalent structure 173 00:10:34,920 --> 00:10:40,230 of nucleic acids, principally, the double-stranded helix 174 00:10:40,230 --> 00:10:44,790 of DNA, we need to know where the hydrogen bond donors 175 00:10:44,790 --> 00:10:47,890 and acceptors are in these structures. 176 00:10:47,890 --> 00:10:50,370 So if you want to indulge me, you 177 00:10:50,370 --> 00:10:52,530 can take a look at these structures. 
178 00:10:52,530 --> 00:10:56,700 This hydrogen would be a donor. 179 00:10:56,700 --> 00:10:59,740 You can see that it's a hydrogen on a nitrogen. 180 00:10:59,740 --> 00:11:01,590 This nitrogen is interesting. 181 00:11:01,590 --> 00:11:04,530 It has 1, 2, 3 bonds to nitrogen, 182 00:11:04,530 --> 00:11:06,750 which means there are a lone pair of electrons 183 00:11:06,750 --> 00:11:08,760 also on that ring system. 184 00:11:11,730 --> 00:11:14,340 So that would be a hydrogen bond acceptor. 185 00:11:14,340 --> 00:11:19,020 And the adenine nucleobase can accept and give 186 00:11:19,020 --> 00:11:20,850 a pair of hydrogen bonds. 187 00:11:20,850 --> 00:11:24,460 And you can work that out for all of the others. 188 00:11:24,460 --> 00:11:34,576 So in guanine, there is an acceptor, another acceptor, 189 00:11:34,576 --> 00:11:36,880 and a donor, and so on. 190 00:11:36,880 --> 00:11:39,860 So those rings in the nucleobases 191 00:11:39,860 --> 00:11:42,630 are very important because they have places 192 00:11:42,630 --> 00:11:44,730 that you can hydrogen bond to. 193 00:11:44,730 --> 00:11:46,998 Now, is everyone feeling comfortable about this? 194 00:11:46,998 --> 00:11:48,540 Does anyone want to ask me a question 195 00:11:48,540 --> 00:11:51,052 that might help clarify, because it's quite-- 196 00:11:51,052 --> 00:11:52,260 yeah, do you have a question? 197 00:11:52,260 --> 00:11:56,680 AUDIENCE: [INAUDIBLE] What does uracil [INAUDIBLE]?? 198 00:11:56,680 --> 00:11:57,930 BARBARA IMPERIALI: What does-- 199 00:11:57,930 --> 00:11:59,422 sorry? 200 00:11:59,422 --> 00:12:00,130 AUDIENCE: Uracil. 201 00:12:00,130 --> 00:12:02,100 BARBARA IMPERIALI: Uracil. 202 00:12:02,100 --> 00:12:04,290 These are all-- sorry. 203 00:12:04,290 --> 00:12:07,650 All these nucleobases have fancy names. 204 00:12:07,650 --> 00:12:10,920 So, so far, I've shown you the structure of adenine, guanine, 205 00:12:10,920 --> 00:12:12,700 cytosine, and thymine. 
206 00:12:12,700 --> 00:12:17,410 Uracil, which is not drawn on the board, 207 00:12:17,410 --> 00:12:22,870 is very similar to thymine, except this methyl group 208 00:12:22,870 --> 00:12:27,070 is a hydrogen. 209 00:12:27,070 --> 00:12:29,680 Knowing the names is also complicated. 210 00:12:29,680 --> 00:12:31,670 I really care that you understand 211 00:12:31,670 --> 00:12:34,480 the hydrogen-bonding patterns; not to draw 212 00:12:34,480 --> 00:12:36,490 the whole structures, but to identify 213 00:12:36,490 --> 00:12:40,600 hydrogen-bonding patterns; not to remember fancy names, 214 00:12:40,600 --> 00:12:43,600 because there's no logic to those names; 215 00:12:43,600 --> 00:12:47,890 but really, to remember ribose, deoxyribose, phosphate 216 00:12:47,890 --> 00:12:51,820 and phosphodiesters, purines and pyrimidines, 217 00:12:51,820 --> 00:12:54,980 just the sizes of them to pick them out. 218 00:12:54,980 --> 00:12:57,100 Does that make sense, what I want you to know, 219 00:12:57,100 --> 00:13:00,520 and what you can remember if you think it's interesting? 220 00:13:03,230 --> 00:13:10,280 Now, in nature, we use the nucleotide building blocks 221 00:13:10,280 --> 00:13:13,320 or the nucleotides in many different ways. 222 00:13:13,320 --> 00:13:16,850 It's not just in DNA and RNA. 223 00:13:16,850 --> 00:13:19,820 And so here, I'm showing you some really important 224 00:13:19,820 --> 00:13:21,965 nucleotides that are found in nature. 225 00:13:21,965 --> 00:13:23,840 And I'll give you a little bit of information 226 00:13:23,840 --> 00:13:25,370 about their signaling. 227 00:13:25,370 --> 00:13:30,560 So here are the components that you can pick out. 228 00:13:30,560 --> 00:13:34,220 There is, in this case, a ribose sugar. 229 00:13:34,220 --> 00:13:38,690 In this case, it's phosphate, but it's a triphosphate. 230 00:13:38,690 --> 00:13:41,030 So it's got three phosphates in a row.
231 00:13:41,030 --> 00:13:44,330 And here's a nucleobase, which is a purine. 232 00:13:44,330 --> 00:13:46,970 And this is adenosine triphosphate. 233 00:13:46,970 --> 00:13:50,420 So it's one of the bases, one of the nucleotides used 234 00:13:50,420 --> 00:13:53,040 in energy, energy transfer. 235 00:13:53,040 --> 00:13:55,550 In a lot of metabolic processes, we 236 00:13:55,550 --> 00:14:01,320 use ATP as a molecule that has energy that can be unlocked 237 00:14:01,320 --> 00:14:03,440 for chemical processes. 238 00:14:03,440 --> 00:14:05,660 There's another one of these, which 239 00:14:05,660 --> 00:14:11,190 is guanosine triphosphate, where the nucleobase is different. 240 00:14:11,190 --> 00:14:14,720 They're both purines, but they have different structures. 241 00:14:14,720 --> 00:14:16,220 You can see them there. 242 00:14:16,220 --> 00:14:18,830 And then finally, the last one I show 243 00:14:18,830 --> 00:14:23,750 you here is a nucleotide that has a cyclic phosphate. 244 00:14:23,750 --> 00:14:29,270 But it still has a nucleobase, a ribose, and a phosphate. 245 00:14:29,270 --> 00:14:31,700 And this is cyclic AMP. 246 00:14:31,700 --> 00:14:35,660 And when we come back after Professor Martin has talked, 247 00:14:35,660 --> 00:14:38,300 we'll talk about the role of cyclic AMP 248 00:14:38,300 --> 00:14:40,110 as a second messenger. 249 00:14:40,110 --> 00:14:42,590 So these two molecules, in addition 250 00:14:42,590 --> 00:14:45,770 to being building blocks for DNA and RNA, 251 00:14:45,770 --> 00:14:48,680 also are forms of energy where you 252 00:14:48,680 --> 00:14:52,190 can use ATP or GTP as a form of energy 253 00:14:52,190 --> 00:14:54,770 in a lot of metabolic processes. 254 00:14:54,770 --> 00:14:59,060 And in fact, though, when we start constructing proteins 255 00:14:59,060 --> 00:15:01,520 using the ribosomal system, you'll 256 00:15:01,520 --> 00:15:05,300 notice we use GTP as a form of energy, not ATP.
257 00:15:05,300 --> 00:15:07,580 It's interesting how nature chooses to do that. 258 00:15:07,580 --> 00:15:08,720 Any questions about this? 259 00:15:13,040 --> 00:15:15,410 One tiny wrinkle left to deal with, 260 00:15:15,410 --> 00:15:17,870 and that's a little bit more about those building 261 00:15:17,870 --> 00:15:20,780 blocks for the nucleic acid, and one more 262 00:15:20,780 --> 00:15:23,840 item that it's useful to understand the name of. 263 00:15:23,840 --> 00:15:29,840 So here are the five nucleobases, two purines, 264 00:15:29,840 --> 00:15:32,630 and three pyrimidines. 265 00:15:32,630 --> 00:15:40,310 In DNA, we have A, T, G, and C. 266 00:15:40,310 --> 00:15:43,590 So we have different building blocks. 267 00:15:43,590 --> 00:15:45,860 Three are common to both polymers. 268 00:15:45,860 --> 00:15:48,480 One is different. 269 00:15:48,480 --> 00:15:56,270 Uracil and thymine are exchanged when you go from DNA to RNA. 270 00:15:56,270 --> 00:16:00,760 The pyrimidines are cytosine, uracil, and thymine. 271 00:16:00,760 --> 00:16:04,590 And in RNA, you have A, U, G, and C. 272 00:16:04,590 --> 00:16:07,850 So there are reasons for these differences, 273 00:16:07,850 --> 00:16:11,540 and I'll nudge into some of those chemical differences 274 00:16:11,540 --> 00:16:12,630 in a moment. 275 00:16:12,630 --> 00:16:16,130 So the information up there is the same information 276 00:16:16,130 --> 00:16:17,600 that I have on this board. 277 00:16:17,600 --> 00:16:19,100 The next thing I need to talk to you about 278 00:16:19,100 --> 00:16:35,230 is that we very commonly use two terms, nucleoside 279 00:16:35,230 --> 00:16:36,260 and nucleotide. 280 00:16:40,140 --> 00:16:42,240 How irritating is that? 281 00:16:42,240 --> 00:16:56,900 The nucleoside is just the ribose plus the nucleobase, 282 00:16:56,900 --> 00:16:59,840 but no phosphates.
283 00:16:59,840 --> 00:17:07,200 As soon as you put on phosphates, 284 00:17:07,200 --> 00:17:09,109 they become nucleotides. 285 00:17:09,109 --> 00:17:25,319 So for example, nucleobase, ribose, and in this case, 286 00:17:25,319 --> 00:17:26,640 a phosphate on it. 287 00:17:26,640 --> 00:17:28,500 And that becomes a nucleotide. 288 00:17:28,500 --> 00:17:31,140 No matter how many phosphates there are, 289 00:17:31,140 --> 00:17:33,000 it's called a nucleotide. 290 00:17:33,000 --> 00:17:36,360 I'm less concerned that you will remember that nomenclature, 291 00:17:36,360 --> 00:17:38,100 more that you know what it's all about, 292 00:17:38,100 --> 00:17:41,080 because otherwise, it might become a little bit confusing. 293 00:17:41,080 --> 00:17:43,350 So just remember, if you can remember that. 294 00:17:43,350 --> 00:17:45,630 But I think I've tried to define the things I 295 00:17:45,630 --> 00:17:47,310 would like you to remember-- 296 00:17:47,310 --> 00:17:50,610 the building blocks, the numbering system, 297 00:17:50,610 --> 00:17:54,360 the phosphodiester linkages, and the nucleobases, 298 00:17:54,360 --> 00:17:58,050 as far as understanding where donors and acceptors are 299 00:17:58,050 --> 00:17:59,190 for hydrogen bonding. 300 00:18:02,710 --> 00:18:04,550 And there's one thing. 301 00:18:04,550 --> 00:18:07,890 So we call that a nucleoside, whereas we call it a 302 00:18:07,890 --> 00:18:10,780 nucleotide when it includes the phosphates. 303 00:18:10,780 --> 00:18:13,060 And there's one thing that you want to notice, 304 00:18:13,060 --> 00:18:18,610 is that the bond from the nucleobase to the ribose 305 00:18:18,610 --> 00:18:23,050 is a glycosidic bond. 306 00:18:23,050 --> 00:18:25,360 It's a bond to a carbohydrate. 307 00:18:25,360 --> 00:18:27,850 So that's why it's called a glycosidic bond. 308 00:18:27,850 --> 00:18:31,810 There are glycosylases that cleave the bond from the base 309 00:18:31,810 --> 00:18:32,830 to the sugar.
310 00:18:32,830 --> 00:18:36,730 Those are very important when we have mutations in our DNA, 311 00:18:36,730 --> 00:18:40,450 and we want to cut out the base to fix it 312 00:18:40,450 --> 00:18:45,160 so it doesn't get misread in the biosynthesis of DNA, 313 00:18:45,160 --> 00:18:49,370 in the biosynthesis of messenger RNA. 314 00:18:49,370 --> 00:18:51,040 So that bond is important. 315 00:18:51,040 --> 00:18:53,350 We may often talk about it, but only 316 00:18:53,350 --> 00:18:58,510 when we get to learning about how DNA sequences are corrected 317 00:18:58,510 --> 00:19:00,850 if there are mistakes in those sequences. 318 00:19:00,850 --> 00:19:03,400 And that will be later on. 319 00:19:03,400 --> 00:19:06,100 So let's start to now look at the polymers. 320 00:19:06,100 --> 00:19:10,300 Now, I want to tell you that by the early 1900s, 321 00:19:10,300 --> 00:19:12,970 people pretty much knew the structure, 322 00:19:12,970 --> 00:19:15,760 the covalent structure of DNA. 323 00:19:15,760 --> 00:19:17,560 And I'll describe it to you now. 324 00:19:17,560 --> 00:19:20,920 DNA is made up of nucleotides. 325 00:19:20,920 --> 00:19:25,630 And this is its basic structure, where you have a phosphodiester 326 00:19:25,630 --> 00:19:31,290 backbone linking riboses, and each of those riboses 327 00:19:31,290 --> 00:19:35,490 is modified with a purine or a pyrimidine. 328 00:19:35,490 --> 00:19:39,520 And that is the basic structure of a nucleic acid polymer, 329 00:19:39,520 --> 00:19:42,240 only it's very, very, very, very long. 330 00:19:42,240 --> 00:19:44,790 So let's take a look at the components here. 331 00:19:44,790 --> 00:19:45,960 Look at the bonds. 332 00:19:45,960 --> 00:19:48,450 And maybe on your notes, just highlight the bonds 333 00:19:48,450 --> 00:19:50,380 and some of the things I'll talk about. 334 00:19:50,380 --> 00:19:53,520 So first of all, the numbering system here, 335 00:19:53,520 --> 00:19:56,400 we always talk about a nucleic acid.
336 00:19:56,400 --> 00:20:01,740 And we describe the sequence of the nucleic acid 337 00:20:01,740 --> 00:20:04,560 going from 5 prime to 3 prime. 338 00:20:04,560 --> 00:20:09,520 Because the phosphodiester bonds join the 5 prime-- 339 00:20:09,520 --> 00:20:11,950 there should be a number there-- and the 3 prime sites. 340 00:20:11,950 --> 00:20:16,690 So the linkage would be here, would be 5 prime and 3 341 00:20:16,690 --> 00:20:21,260 prime joining two ribose molecules. 342 00:20:21,260 --> 00:20:26,260 So the architecture of that nucleic acid is a polymer that 343 00:20:26,260 --> 00:20:30,880 includes a phosphodiester backbone linked by phosphate 344 00:20:30,880 --> 00:20:34,510 esters-- that's 1 phosphate ester; that's the other one-- 345 00:20:34,510 --> 00:20:38,260 on two of the OHs of the ribose sugar. 346 00:20:38,260 --> 00:20:42,760 When this is DNA, there's no OH group on that carbon site. 347 00:20:42,760 --> 00:20:44,380 That would be the 2-prime site. 348 00:20:44,380 --> 00:20:48,640 You can see-- you can pick straight out that this is DNA. 349 00:20:48,640 --> 00:20:53,250 The sequence is then defined by what the identity of the base 350 00:20:53,250 --> 00:20:54,100 is here. 351 00:20:54,100 --> 00:20:58,150 So this would be guanine, adenine, thymine 352 00:20:58,150 --> 00:20:59,590 on that sequence. 353 00:20:59,590 --> 00:21:16,910 Now, by convention, if we write out this sequence, 354 00:21:16,910 --> 00:21:22,985 the way the sequences are written 355 00:21:22,985 --> 00:21:27,850 is in the 5 prime to 3 prime direction. 356 00:21:27,850 --> 00:21:36,810 So if I look at that, I would be able to name it as an A, G, T 357 00:21:36,810 --> 00:21:40,110 sequence, because we always write the sequences 5 358 00:21:40,110 --> 00:21:41,670 prime to 3 prime. 359 00:21:41,670 --> 00:21:44,910 We can remember that later on because we actually also build 360 00:21:44,910 --> 00:21:47,790 sequences 5 prime to 3 prime.
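[Editor's sketch] The 5 prime to 3 prime writing convention described above can be illustrated with a few lines of Python. This is only a minimal sketch, not part of the lecture: the function name `write_five_to_three` and the `listed_from` parameter are hypothetical, and the example assumes the bases on the board were read off starting from the 3 prime end.

```python
def write_five_to_three(bases, listed_from="5'"):
    """Return a sequence string written in the conventional 5'->3' direction.

    bases       -- list of one-letter base codes in the order they were read off
    listed_from -- which end the reading started from: "5'" or "3'" (hypothetical flag)
    """
    if listed_from not in ("5'", "3'"):
        raise ValueError("listed_from must be \"5'\" or \"3'\"")
    # If the bases were read starting at the 3' end, flip them so that the
    # written sequence starts at the 5' end, as the convention requires.
    ordered = list(bases) if listed_from == "5'" else list(bases)[::-1]
    return "5'-" + "".join(ordered) + "-3'"


def phosphodiester_bond_count(bases):
    """A linear strand of n nucleotides has n - 1 internal phosphodiester links."""
    return max(len(bases) - 1, 0)


# Reading T, G, A starting from the 3' end gives the written sequence 5'-AGT-3'.
print(write_five_to_three(["T", "G", "A"], listed_from="3'"))
```

The point of the sketch is only that the same three bases give a different written sequence depending on which end you start from, which is why the convention matters.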
361 00:21:54,930 --> 00:21:59,040 So there are some conventions in biology and biochemistry. 362 00:21:59,040 --> 00:22:08,730 You want to remember that by convention, we write peptides N 363 00:22:08,730 --> 00:22:12,720 terminal to C terminal. 364 00:22:12,720 --> 00:22:20,540 But we also build them N to C. So that's 365 00:22:20,540 --> 00:22:23,513 why the convention is strong, and it's good to remember, 366 00:22:23,513 --> 00:22:25,430 because it can get you out of a lot of trouble 367 00:22:25,430 --> 00:22:27,350 if you remember those things. 368 00:22:27,350 --> 00:22:33,650 Now, when we are building a DNA polymer, we grow that sequence. 369 00:22:33,650 --> 00:22:36,830 You'll see the biochemistry for all of that polymerization 370 00:22:36,830 --> 00:22:37,910 in the next class. 371 00:22:37,910 --> 00:22:41,570 It's amazingly cool how the entire contents 372 00:22:41,570 --> 00:22:45,590 of a cell, the DNA, can be replicated in amazing time 373 00:22:45,590 --> 00:22:49,430 frames, but all through growing those chains from 5 prime 374 00:22:49,430 --> 00:22:50,780 to 3 prime. 375 00:22:50,780 --> 00:22:53,960 So when we add another building block on, 376 00:22:53,960 --> 00:22:56,730 we remove a molecule of water. 377 00:22:56,730 --> 00:23:00,230 So that's a condensation reaction. 378 00:23:00,230 --> 00:23:03,690 And we form a new phosphodiester bond. 379 00:23:03,690 --> 00:23:06,920 So in the biosynthesis of DNA, you 380 00:23:06,920 --> 00:23:12,200 keep on adding new nucleotides to the 3 prime end. 381 00:23:12,200 --> 00:23:15,620 There's a chemical reason for that. 382 00:23:15,620 --> 00:23:20,060 When we build DNA, we don't just cram the two groups together. 383 00:23:20,060 --> 00:23:24,260 We, rather, come in with a triphosphate 384 00:23:24,260 --> 00:23:26,750 and use that activated triphosphate 385 00:23:26,750 --> 00:23:27,920 as the new building block. 386 00:23:27,920 --> 00:23:29,930 And you kick out pyrophosphate.
387 00:23:29,930 --> 00:23:33,080 And you'll see that when we talk about DNA synthesis. 388 00:23:33,080 --> 00:23:35,030 But what I want you to remember here 389 00:23:35,030 --> 00:23:38,260 is this is another condensation reaction. 390 00:23:38,260 --> 00:23:41,270 We talked about them when making peptides. 391 00:23:41,270 --> 00:23:45,050 We talked about them when making carbohydrate polymers. 392 00:23:45,050 --> 00:23:49,100 And now we're seeing, once again, a condensation reaction 393 00:23:49,100 --> 00:23:51,980 to make a nucleic acid polymer. 394 00:23:51,980 --> 00:23:58,480 Now, the last term that's kind of worth mentioning 395 00:23:58,480 --> 00:24:04,140 is the word nucleic acid. 396 00:24:04,140 --> 00:24:06,330 What's that about? 397 00:24:06,330 --> 00:24:08,850 I don't see any carboxylic acids. 398 00:24:08,850 --> 00:24:11,550 It turns out the polymers of DNA are 399 00:24:11,550 --> 00:24:18,970 very acidic because the OH group on those phosphodiester 400 00:24:18,970 --> 00:24:20,620 backbones is very acidic. 401 00:24:20,620 --> 00:24:24,370 So you give up H plus. 402 00:24:24,370 --> 00:24:27,830 And this is in its most stable form as O minus. 403 00:24:27,830 --> 00:24:28,330 So when 404 00:24:28,330 --> 00:24:35,290 DNA was first isolated, it was isolated from white blood cells 405 00:24:35,290 --> 00:24:37,030 by isolating the nucleus. 406 00:24:37,030 --> 00:24:41,170 And it was found that it was a very acidic material packed 407 00:24:41,170 --> 00:24:42,460 into the nucleus. 408 00:24:42,460 --> 00:24:46,990 That's why it was called nucleic acid, acids in the nucleus. 409 00:24:46,990 --> 00:24:50,710 Before people even understood anything about the composition, 410 00:24:50,710 --> 00:24:53,950 it garnered that name, nucleic acid. 411 00:24:53,950 --> 00:24:57,640 So we talk about polymers of nucleotides, 412 00:24:57,640 --> 00:24:59,065 we call them nucleic acids. 
413 00:25:03,650 --> 00:25:07,610 Then with respect to writing our sequences, 414 00:25:07,610 --> 00:25:10,230 we could write them in this way. 415 00:25:10,230 --> 00:25:10,900 So pdGATC. 416 00:25:13,790 --> 00:25:15,320 That would be that structure. 417 00:25:15,320 --> 00:25:18,530 What do all the little extra Ps and Ds stand for? 418 00:25:18,530 --> 00:25:22,790 The P stands for whether there's a phosphate at this end. 419 00:25:22,790 --> 00:25:26,750 The D stands for whether it's a deoxy sugar as a building 420 00:25:26,750 --> 00:25:27,380 block. 421 00:25:27,380 --> 00:25:29,100 Going all the way to the other end, 422 00:25:29,100 --> 00:25:31,140 there's no little p at the other end. 423 00:25:31,140 --> 00:25:32,930 So it means that OH is free. 424 00:25:32,930 --> 00:25:38,000 Does everyone understand that shorthand writing? 425 00:25:38,000 --> 00:25:40,280 There's another way I could know this was DNA 426 00:25:40,280 --> 00:25:44,870 without needing to put deoxy on each of the building blocks. 427 00:25:44,870 --> 00:25:47,450 Does anyone know how I know immediately 428 00:25:47,450 --> 00:25:49,570 it's a stretch of DNA? 429 00:25:49,570 --> 00:25:50,077 Yeah? 430 00:25:50,077 --> 00:25:50,910 AUDIENCE: No uracil? 431 00:25:50,910 --> 00:25:51,880 BARBARA IMPERIALI: Yeah, there's no uracil, 432 00:25:51,880 --> 00:25:53,300 and there's thymine instead. 433 00:25:53,300 --> 00:25:57,050 So in principle, as long as there's a T in there, 434 00:25:57,050 --> 00:25:58,160 you know it's DNA. 435 00:25:58,160 --> 00:26:00,235 As long as there's a U in there, you know it's RNA.
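[Editor's sketch] The shorthand just described (a leading p for a 5 prime phosphate, a d for deoxy sugars, no trailing p meaning a free 3 prime OH) and the "T means DNA, U means RNA" rule of thumb can be sketched in Python. The function names and the returned dictionary keys are hypothetical, and the parser is an assumption that covers only the simple form shown on the board, such as pdGATC.

```python
VALID_BASES = set("ACGTU")


def parse_shorthand(text):
    """Split a shorthand such as 'pdGATC' into its parts (simple form only)."""
    rest = text
    five_prime_phosphate = rest.startswith("p")
    if five_prime_phosphate:
        rest = rest[1:]
    deoxy = rest.startswith("d")
    if deoxy:
        rest = rest[1:]
    # A trailing lowercase p would mark a phosphate on the 3' end;
    # without it, the 3' OH is free.
    three_prime_phosphate = rest.endswith("p")
    if three_prime_phosphate:
        rest = rest[:-1]
    if not rest or any(base not in VALID_BASES for base in rest):
        raise ValueError(f"unexpected characters in sequence: {rest!r}")
    return {
        "five_prime_phosphate": five_prime_phosphate,
        "deoxy": deoxy,
        "bases": rest,
        "three_prime_oh_free": not three_prime_phosphate,
    }


def guess_polymer(bases):
    """Rule of thumb from the lecture: a T implies DNA, a U implies RNA."""
    has_t, has_u = "T" in bases, "U" in bases
    if has_t and not has_u:
        return "DNA"
    if has_u and not has_t:
        return "RNA"
    return "undetermined"  # neither base present, or both (not a natural strand)


parsed = parse_shorthand("pdGATC")
print(parsed["bases"], guess_polymer(parsed["bases"]))  # GATC DNA
```

Note that a sequence with neither T nor U, such as GAC, cannot be classified by this rule alone, which is why the sketch returns "undetermined" in that case.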
436 00:26:05,200 --> 00:26:08,200 Now, let's talk about the noncovalent structure, 437 00:26:08,200 --> 00:26:11,320 because I really feel that that's 438 00:26:11,320 --> 00:26:22,340 the most exciting part of this entire endeavor 439 00:26:22,340 --> 00:26:24,440 because the covalent structure really 440 00:26:24,440 --> 00:26:29,810 doesn't allow us to understand how DNA stores information 441 00:26:29,810 --> 00:26:31,550 for building proteins. 442 00:26:31,550 --> 00:26:33,830 It doesn't tell us that much about it. 443 00:26:33,830 --> 00:26:36,470 It looks like a cool polymer, but we can't really 444 00:26:36,470 --> 00:26:39,795 understand the details without looking 445 00:26:39,795 --> 00:26:42,350 at the noncovalent structure. 446 00:26:42,350 --> 00:26:44,900 So there was one key piece of information, 447 00:26:44,900 --> 00:26:46,440 and it's called Chargaff's data. 448 00:26:53,580 --> 00:26:56,200 And this piece of scientific information 449 00:26:56,200 --> 00:26:59,560 ran around the scientific community in the early '50s 450 00:26:59,560 --> 00:27:02,230 because it seemed incredibly important. 451 00:27:02,230 --> 00:27:05,620 And what Chargaff's data was, he collected all kinds 452 00:27:05,620 --> 00:27:09,370 of organisms, and then their nuclei, and then measured-- 453 00:27:09,370 --> 00:27:10,660 or their DNA-- 454 00:27:10,660 --> 00:27:13,870 and then measured the ratio between the purines 455 00:27:13,870 --> 00:27:15,670 and the pyrimidines. 456 00:27:15,670 --> 00:27:18,880 He measured the ratio of the large ones 457 00:27:18,880 --> 00:27:21,490 and the small ones of the nucleobases. 458 00:27:21,490 --> 00:27:25,630 So how many of these relative to how many of those? 459 00:27:25,630 --> 00:27:29,560 And what he found by looking all across organisms 460 00:27:29,560 --> 00:27:32,740 from all domains of life is that there was a one to one 461 00:27:32,740 --> 00:27:35,660 ratio of purine to pyrimidine.
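One way to see why a one to one ratio points toward pairing is a small counting sketch. It assumes, as the lecture goes on to argue, that every purine on one strand sits opposite a pyrimidine on the other (A with T, G with C). The function names and the example sequence are hypothetical and chosen only for illustration.

```python
PURINES = frozenset("AG")       # the large, two-ring bases
PYRIMIDINES = frozenset("CT")   # the small, one-ring bases
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}  # assumed pairing rule


def count_purines_pyrimidines(seq: str) -> tuple:
    """Return (number of purines, number of pyrimidines) in a sequence."""
    purines = sum(1 for base in seq if base in PURINES)
    pyrimidines = sum(1 for base in seq if base in PYRIMIDINES)
    return purines, pyrimidines


def duplex_counts(top: str) -> tuple:
    """Count purines and pyrimidines over both strands of a paired duplex."""
    bottom = "".join(PARTNER[base] for base in top)
    return count_purines_pyrimidines(top + bottom)


single = "GGGGAA"                         # a strand made only of purines
print(count_purines_pyrimidines(single))  # heavily skewed on its own
print(duplex_counts(single))              # equal once the partner strand is included
```

A single strand can have any composition, but once each base is matched with its partner the two counts are forced to be equal, whatever the GC or AT content.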
462 00:27:48,170 --> 00:27:50,870 So that became very interesting, because what 463 00:27:50,870 --> 00:27:54,020 it suggested was that in some way, 464 00:27:54,020 --> 00:27:57,380 the noncovalent structure of nucleic acids 465 00:27:57,380 --> 00:28:00,830 had some correlation between the number of the purines 466 00:28:00,830 --> 00:28:02,690 and the number of the pyrimidines. 467 00:28:02,690 --> 00:28:06,050 And what you can imagine is it sounds like we're always 468 00:28:06,050 --> 00:28:10,220 pairing a small one with a large one by looking at that number. 469 00:28:10,220 --> 00:28:12,200 So this is really, really important 470 00:28:12,200 --> 00:28:14,030 because it's like the light bulb that 471 00:28:14,030 --> 00:28:17,440 went on with respect to understanding the structure 472 00:28:17,440 --> 00:28:18,860 of double-stranded DNA. 473 00:28:21,380 --> 00:28:24,140 So despite all kinds of variations, 474 00:28:24,140 --> 00:28:26,900 some organisms have a lot more GCs. 475 00:28:26,900 --> 00:28:28,820 Some have more ATs. 476 00:28:28,820 --> 00:28:32,420 But no matter what, the ratio is always one to one. 477 00:28:32,420 --> 00:28:36,080 And this ultimately led to understanding 478 00:28:36,080 --> 00:28:39,950 the noncovalent structure of double-stranded DNA 479 00:28:39,950 --> 00:28:42,980 because it provided clues to how there could 480 00:28:42,980 --> 00:28:45,920 be some way that information was coded, 481 00:28:45,920 --> 00:28:48,290 but then could be replicated. 482 00:28:48,290 --> 00:28:50,690 Now, the next thing that became the clue 483 00:28:50,690 --> 00:28:53,240 to the structure of double-stranded DNA 484 00:28:53,240 --> 00:28:57,510 came from a very talented researcher, Rosalind Franklin, 485 00:28:57,510 --> 00:29:02,390 who sadly died way before her time of ovarian cancer, 486 00:29:02,390 --> 00:29:04,880 really, in large part because she spent 487 00:29:04,880 --> 00:29:07,370 a lot of time near X-ray beams. 
488 00:29:07,370 --> 00:29:11,360 So that would have caused mutations to her DNA. 489 00:29:11,360 --> 00:29:15,950 And she developed a way to make fibrils of DNA that 490 00:29:15,950 --> 00:29:20,870 were ordered enough to collect X-ray diffraction data. 491 00:29:20,870 --> 00:29:23,300 And that diffraction data actually 492 00:29:23,300 --> 00:29:26,330 gave a clue to some of the dimensions 493 00:29:26,330 --> 00:29:28,820 of the double-stranded DNA structure. 494 00:29:28,820 --> 00:29:30,890 And it actually was the clue that 495 00:29:30,890 --> 00:29:34,460 told the spacing between the strands of DNA. 496 00:29:34,460 --> 00:29:36,890 So it really was a piece of information 497 00:29:36,890 --> 00:29:39,110 that you simply couldn't do without. 498 00:29:39,110 --> 00:29:41,300 With Chargaff's data and with this, 499 00:29:41,300 --> 00:29:45,870 what was called Photograph 51, it really gave you the clue. 500 00:29:45,870 --> 00:29:47,860 And it was really during those years 501 00:29:47,860 --> 00:29:49,910 that Watson and Crick were desperately 502 00:29:49,910 --> 00:29:52,520 model building to try to understand 503 00:29:52,520 --> 00:29:55,010 the noncovalent structure of DNA. 504 00:29:55,010 --> 00:29:58,610 And once they had those two pieces of information, 505 00:29:58,610 --> 00:30:02,840 they could actually put together hand-built models. 506 00:30:02,840 --> 00:30:05,030 This looks kind of clunky, but I know 507 00:30:05,030 --> 00:30:08,720 the room they took this photo in from my years at Caltech. 508 00:30:08,720 --> 00:30:11,000 In fact, I can recognize the room. 509 00:30:11,000 --> 00:30:14,480 They built not just little tiny molecular models, 510 00:30:14,480 --> 00:30:18,920 but big molecular models so they could make measurements to say, 511 00:30:18,920 --> 00:30:21,740 the diffraction data told me this was so many nanometers 512 00:30:21,740 --> 00:30:22,520 apart.
513 00:30:22,520 --> 00:30:24,770 And they were able to piece together 514 00:30:24,770 --> 00:30:26,930 the structure of double-stranded DNA. 515 00:30:26,930 --> 00:30:29,270 But I still haven't shown you how those two 516 00:30:29,270 --> 00:30:31,460 strands come together. 517 00:30:31,460 --> 00:30:35,480 It's really intriguing, because at that very same time, Linus 518 00:30:35,480 --> 00:30:37,010 Pauling, who had 519 00:30:37,010 --> 00:30:38,990 done very well with the structure 520 00:30:38,990 --> 00:30:42,080 of the alpha helix in proteins, also 521 00:30:42,080 --> 00:30:46,730 was trying to figure out the structure of DNA. 522 00:30:46,730 --> 00:30:49,730 But he came up with a sort of a crazy structure 523 00:30:49,730 --> 00:30:55,690 where he thought that it was a triple-stranded structure where 524 00:30:55,690 --> 00:31:02,020 the bases actually stuck out, and somehow, 525 00:31:02,020 --> 00:31:06,637 this triple-stranded structure coded for replication of DNA. 526 00:31:06,637 --> 00:31:08,470 Now, there's a ton of things that are really 527 00:31:08,470 --> 00:31:09,880 awful about this structure. 528 00:31:09,880 --> 00:31:11,770 First of all, it's triple-stranded. 529 00:31:11,770 --> 00:31:13,960 But the other terrible thing is there's 530 00:31:13,960 --> 00:31:17,350 so many phosphates in the backbone 531 00:31:17,350 --> 00:31:20,500 that there would have been massive electrostatic repulsion. 532 00:31:20,500 --> 00:31:22,480 Those sequences would want to blow 533 00:31:22,480 --> 00:31:26,710 themselves apart because you can't cram that much 534 00:31:26,710 --> 00:31:28,820 negative all in one place. 535 00:31:28,820 --> 00:31:32,995 But it was really an intriguing sort of sociological phenomenon 536 00:31:32,995 --> 00:31:34,180 of the time. 537 00:31:34,180 --> 00:31:38,020 Pauling was a major pacifist, and he was really, really 538 00:31:38,020 --> 00:31:40,150 active in nuclear disarmament.
539 00:31:40,150 --> 00:31:43,540 And they said that his mind just wasn't on some of this stuff 540 00:31:43,540 --> 00:31:45,970 and that this model came out of him really 541 00:31:45,970 --> 00:31:47,620 worrying about other things and not 542 00:31:47,620 --> 00:31:49,850 focusing on the DNA structure. 543 00:31:49,850 --> 00:31:55,000 So let's try to explain Chargaff's data 544 00:31:55,000 --> 00:31:59,490 by looking at the nucleobases and thinking about how 545 00:31:59,490 --> 00:32:01,090 they might come together. 546 00:32:01,090 --> 00:32:03,300 So here I show you the structures 547 00:32:03,300 --> 00:32:05,920 of the four nucleobases in DNA. 548 00:32:05,920 --> 00:32:09,930 Wherever I have an R, you can assume 549 00:32:09,930 --> 00:32:13,530 that's a deoxyribose that is part of the phosphodiester backbone. 550 00:32:13,530 --> 00:32:17,100 What we want to understand is, how do the nucleobases 551 00:32:17,100 --> 00:32:20,520 come together to form some kind of pair that 552 00:32:20,520 --> 00:32:24,690 could be useful to programming their resynthesis? 553 00:32:24,690 --> 00:32:27,330 So I've drawn them all here, but it's not quite intuitive. 554 00:32:27,330 --> 00:32:30,510 I need to do a little bit of flipping around to line things 555 00:32:30,510 --> 00:32:31,820 up better. 556 00:32:31,820 --> 00:32:34,400 And the other thing I need to do is get things 557 00:32:34,400 --> 00:32:36,270 at the right angles so you can start 558 00:32:36,270 --> 00:32:39,780 seeing how those bases might come together, 559 00:32:39,780 --> 00:32:42,990 because Chargaff's data dictates that you 560 00:32:42,990 --> 00:32:48,170 have a purine and a pyrimidine, purine pyrimidine. 561 00:32:48,170 --> 00:32:51,980 You have pairing between the nucleobases 562 00:32:51,980 --> 00:32:55,850 in your double-stranded DNA in a structure 563 00:32:55,850 --> 00:32:57,125 that looks more like this.
564 00:33:10,120 --> 00:33:14,350 And in each case, you've paired a purine and a pyrimidine. 565 00:33:14,350 --> 00:33:16,570 So what I want you to do is take a look. 566 00:33:16,570 --> 00:33:19,690 I've shown you now where donors and acceptors are. 567 00:33:19,690 --> 00:33:22,000 You can go back and do this for all the nucleobases. 568 00:33:22,000 --> 00:33:24,400 But I'm going to do this for you right now, 569 00:33:24,400 --> 00:33:27,340 by showing you the donors and acceptors of hydrogen 570 00:33:27,340 --> 00:33:30,250 bonds within those structures, what I've done 571 00:33:30,250 --> 00:33:32,320 is I've lined them up beautifully 572 00:33:32,320 --> 00:33:34,660 so they look straight at each other, 573 00:33:34,660 --> 00:33:37,750 so you can tell that there is a complementarity 574 00:33:37,750 --> 00:33:40,180 between a purine and a pyrimidine 575 00:33:40,180 --> 00:33:43,030 that makes very nice hydrogen bonding, which 576 00:33:43,030 --> 00:33:46,120 is the noncovalent force that's very important. 577 00:33:46,120 --> 00:33:50,740 Between G and C, I can set up three hydrogen bonds. 578 00:33:50,740 --> 00:33:55,070 Between A and T, I can only set up two hydrogen bonds. 579 00:33:55,070 --> 00:33:58,390 So the one purine is complementary to one 580 00:33:58,390 --> 00:33:59,350 of the pyrimidines. 581 00:33:59,350 --> 00:34:04,150 One purine is complementary to one of the other pyrimidines. 582 00:34:04,150 --> 00:34:08,590 And then we can draw those hydrogen bonds in place. 583 00:34:08,590 --> 00:34:11,110 That totally explains the measurement 584 00:34:11,110 --> 00:34:15,010 from the Franklin data of the distance, the width 585 00:34:15,010 --> 00:34:17,050 of the double-stranded helix, because it's 586 00:34:17,050 --> 00:34:21,330 identical for both of those base pair options. 
587 00:34:21,330 --> 00:34:23,830 And that gives you the structure that 588 00:34:23,830 --> 00:34:27,280 forms the noncovalent structure of DNA, which 589 00:34:27,280 --> 00:34:31,510 is a series of interactions where the solid line is 590 00:34:31,510 --> 00:34:34,780 the phosphodiester backbone, but sticking out 591 00:34:34,780 --> 00:34:37,960 like steps on a spiral staircase are the bases, 592 00:34:37,960 --> 00:34:43,090 where each base is complementary to a specific additional base. 593 00:34:43,090 --> 00:34:46,150 So it predicts the Chargaff ratio, 594 00:34:46,150 --> 00:34:48,469 and it also predicts the distances. 595 00:34:48,469 --> 00:34:52,360 Now, within all the model building, 596 00:34:52,360 --> 00:34:55,480 it became quite clear that the structure, 597 00:34:55,480 --> 00:34:58,240 the noncovalent structure of DNA, 598 00:34:58,240 --> 00:35:02,890 was afforded by antiparallel strands, where one strand went 599 00:35:02,890 --> 00:35:07,060 in one direction, 5 prime to 3 prime, 600 00:35:07,060 --> 00:35:11,080 and the other strand went in the opposite direction, 5 prime 601 00:35:11,080 --> 00:35:12,520 to 3 prime. 602 00:35:12,520 --> 00:35:15,130 When we start replicating DNA, we're 603 00:35:15,130 --> 00:35:17,390 going to see that that's pretty convenient. 604 00:35:17,390 --> 00:35:21,920 But thermodynamically, it is also the favored orientation. 605 00:35:21,920 --> 00:35:23,950 So let's just look at the orientation. 606 00:35:23,950 --> 00:35:31,090 Where you would draw one strand of DNA, 5 prime to 3 prime, 607 00:35:31,090 --> 00:35:33,620 now I've taken this all down to cartoon level. 608 00:35:33,620 --> 00:35:38,020 These are the phosphate diesters, the riboses, the 3 609 00:35:38,020 --> 00:35:41,530 prime end, and the 5 prime end and the bases that 610 00:35:41,530 --> 00:35:43,900 come off at the 1 prime carbon. 
611 00:35:43,900 --> 00:35:46,480 And then when you pair it with another strand, 612 00:35:46,480 --> 00:35:49,680 one strand goes in one direction. 613 00:35:49,680 --> 00:35:53,000 5 prime-- whoa, I don't know why this is misbehaving, 5 prime, 614 00:35:53,000 --> 00:35:55,710 whoops-- 615 00:35:55,710 --> 00:35:57,700 5 prime to 3 prime. 616 00:35:57,700 --> 00:36:00,960 The other strand goes in the other direction, 5 prime 617 00:36:00,960 --> 00:36:02,200 to 3 prime. 618 00:36:02,200 --> 00:36:04,710 And when asked this question a few years ago, 619 00:36:04,710 --> 00:36:06,990 I couldn't really explain it very well. 620 00:36:06,990 --> 00:36:09,870 I just said it had to be because it always has been. 621 00:36:09,870 --> 00:36:12,270 But what's really cool is people have 622 00:36:12,270 --> 00:36:18,420 been able to solve the crystal structure of a parallel pair 623 00:36:18,420 --> 00:36:20,340 of DNA strands. 624 00:36:20,340 --> 00:36:24,180 So this is canonical DNA, the beautiful antiparallel 625 00:36:24,180 --> 00:36:24,960 structure. 626 00:36:24,960 --> 00:36:27,960 And it's very regular, very, very even. 627 00:36:27,960 --> 00:36:29,970 It turns out, though, when you try 628 00:36:29,970 --> 00:36:33,780 to pair the two strands in a parallel orientation, 629 00:36:33,780 --> 00:36:37,480 they're very uncomfortable, and it's much less stable. 630 00:36:37,480 --> 00:36:39,750 So the antiparallel orientation is 631 00:36:39,750 --> 00:36:43,350 very important for the thermodynamic stability 632 00:36:43,350 --> 00:36:47,430 and the optimum hydrogen bonding interaction of all those bases 633 00:36:47,430 --> 00:36:48,450 that are pairing. 634 00:36:48,450 --> 00:36:50,700 So it's actually what nature favors 635 00:36:50,700 --> 00:36:52,410 because it is more stable. 636 00:36:52,410 --> 00:36:53,240 Any questions? 637 00:36:55,900 --> 00:36:58,310 And this, it's on your slides. 
638 00:36:58,310 --> 00:37:05,090 But you can see how regular DNA looks so organized, 639 00:37:05,090 --> 00:37:08,690 whereas the parallel one 640 00:37:08,690 --> 00:37:12,080 really does not afford you good hydrogen bonding 641 00:37:12,080 --> 00:37:14,990 interactions at all. 642 00:37:14,990 --> 00:37:18,080 So let us now-- 643 00:37:18,080 --> 00:37:20,060 so what we've done now is we understand 644 00:37:20,060 --> 00:37:23,720 the structure of DNA, the noncovalent and covalent 645 00:37:23,720 --> 00:37:25,100 structure of DNA. 646 00:37:25,100 --> 00:37:27,680 We understand it's antiparallel. 647 00:37:27,680 --> 00:37:29,450 What we'll do in the next class is 648 00:37:29,450 --> 00:37:34,280 show how you can peel apart those antiparallel structures 649 00:37:34,280 --> 00:37:36,410 to make unpaired structures. 650 00:37:36,410 --> 00:37:40,460 And you can use each of them as the template for the synthesis 651 00:37:40,460 --> 00:37:42,350 of a new strand of DNA. 652 00:37:42,350 --> 00:37:45,020 So you can get two daughter double strands 653 00:37:45,020 --> 00:37:47,120 from a single parent double strand. 654 00:37:47,120 --> 00:37:49,880 And that all comes from understanding the structure. 655 00:37:49,880 --> 00:37:53,540 Now, what I want to do is move you just very briefly 656 00:37:53,540 --> 00:37:57,980 to the structure of RNA and comparing the DNA and RNA 657 00:37:57,980 --> 00:38:01,140 structures, because there are some differences. 658 00:38:01,140 --> 00:38:04,460 So let's just work through what the differences are. 659 00:38:04,460 --> 00:38:05,660 I have this written down. 660 00:38:08,648 --> 00:38:10,910 And the differences are very important 661 00:38:10,910 --> 00:38:14,570 for the functional properties. 662 00:38:18,080 --> 00:38:23,410 So DNA, RNA. 663 00:38:23,410 --> 00:38:29,760 First of all, obviously, deoxyribose, ribose.
664 00:38:32,762 --> 00:38:37,320 And you may go, why, why, why is nature so complicated? 665 00:38:37,320 --> 00:38:40,280 Why do I have this extra factoid to remember 666 00:38:40,280 --> 00:38:42,920 about RNA versus DNA? 667 00:38:42,920 --> 00:38:45,620 And it's really amazing that the difference 668 00:38:45,620 --> 00:38:50,090 between having that hydroxyl on the 2 prime position versus not 669 00:38:50,090 --> 00:38:54,350 having it makes enormous differences 670 00:38:54,350 --> 00:38:56,810 to the stability of the polymer. 671 00:38:56,810 --> 00:39:00,050 RNAs break down very, very readily. 672 00:39:00,050 --> 00:39:03,730 DNAs are stable for the lifetime of a cell, 673 00:39:03,730 --> 00:39:06,680 all perfect in the nucleus or in mitochondria. 674 00:39:06,680 --> 00:39:08,490 They stay intact. 675 00:39:08,490 --> 00:39:13,160 So there's a stability difference 676 00:39:13,160 --> 00:39:14,930 between the two sugars. 677 00:39:14,930 --> 00:39:17,390 Because DNA has to be the place where 678 00:39:17,390 --> 00:39:21,680 you store your genetic material, it's got to stay good, 679 00:39:21,680 --> 00:39:26,690 whereas RNA is the message that you make transiently 680 00:39:26,690 --> 00:39:29,000 to program a protein being made, and then you 681 00:39:29,000 --> 00:39:30,810 want to get rid of it. 682 00:39:30,810 --> 00:39:33,260 So we need the differences in stability 683 00:39:33,260 --> 00:39:37,610 that originate from that small feature. 684 00:39:37,610 --> 00:39:41,912 ATGC-- there's the difference-- 685 00:39:41,912 --> 00:39:47,720 AUGC in the bases. 686 00:39:47,720 --> 00:39:54,830 The most common DNA is double-stranded DNA, 687 00:39:54,830 --> 00:40:02,310 whereas RNA forms various structures, so 688 00:40:02,310 --> 00:40:06,420 much more irregular structures than the DNA, probably in part 689 00:40:06,420 --> 00:40:09,390 because the ribose is substituted differently.
690 00:40:09,390 --> 00:40:13,710 So that continuous strand of double-stranded material 691 00:40:13,710 --> 00:40:16,230 is not quite so stable in RNA. 692 00:40:16,230 --> 00:40:20,420 We find DNA principally as double-stranded DNA. 693 00:40:20,420 --> 00:40:28,260 But the RNA we find as transfer RNA, messenger RNA, 694 00:40:28,260 --> 00:40:32,640 ribosomal RNA-- it does go on forever-- 695 00:40:32,640 --> 00:40:34,410 short interfering RNA. 696 00:40:37,740 --> 00:40:41,610 So various RNA is used for a lot of purposes, 697 00:40:41,610 --> 00:40:45,660 whereas DNA principally stays as the double-stranded DNA. 698 00:40:45,660 --> 00:40:48,060 There's a little double-stranded RNA, 699 00:40:48,060 --> 00:40:52,710 but it is a precursor to some of these other forms of RNA. 700 00:40:52,710 --> 00:40:55,950 So this slide just summarizes some of that for you, 701 00:40:55,950 --> 00:41:00,240 the differences comparing DNA and RNA. 702 00:41:00,240 --> 00:41:04,440 And so what we'll see later is how RNA lends itself 703 00:41:04,440 --> 00:41:07,710 to these interesting structures where you still have some base 704 00:41:07,710 --> 00:41:10,410 pairing, but you have a lot of loops and turns 705 00:41:10,410 --> 00:41:12,300 and diversity of structure. 706 00:41:12,300 --> 00:41:15,960 And that's really kind of the origin of this RNA world, 707 00:41:15,960 --> 00:41:18,390 where RNA structures were not-- 708 00:41:18,390 --> 00:41:22,950 could have variety of form that might contribute 709 00:41:22,950 --> 00:41:26,280 to different functions beyond just as a message, 710 00:41:26,280 --> 00:41:29,040 as a place to store a DNA message. 711 00:41:29,040 --> 00:41:31,350 So there are a lot of things that one 712 00:41:31,350 --> 00:41:35,400 can understand about DNA by knowing its hydrogen bonding 713 00:41:35,400 --> 00:41:36,330 patterns. 
714 00:41:36,330 --> 00:41:39,930 So can you guys guess which of these strands 715 00:41:39,930 --> 00:41:41,940 would have a complementary strand 716 00:41:41,940 --> 00:41:45,270 and be the most stable double-stranded DNA? 717 00:41:45,270 --> 00:41:47,250 So this would be one strand. 718 00:41:47,250 --> 00:41:50,640 You could draw for each of them its complementary strand. 719 00:41:50,640 --> 00:41:53,130 Can you guess the clues to figuring out 720 00:41:53,130 --> 00:41:56,100 which would have a most stable organization 721 00:41:56,100 --> 00:41:59,140 of the antiparallel double-stranded DNA? 722 00:41:59,140 --> 00:42:01,330 What would I be looking for? 723 00:42:01,330 --> 00:42:01,830 Yeah? 724 00:42:01,830 --> 00:42:05,880 AUDIENCE: More Gs and Cs [INAUDIBLE] 725 00:42:05,880 --> 00:42:07,440 BARBARA IMPERIALI: So number one, 726 00:42:07,440 --> 00:42:12,300 higher GC content because Gs and Cs form three hydrogen bonds. 727 00:42:12,300 --> 00:42:14,160 As and Ts only form two. 728 00:42:14,160 --> 00:42:16,620 And what's the other determinant, just looking 729 00:42:16,620 --> 00:42:18,432 at those structures? 730 00:42:18,432 --> 00:42:19,374 Yeah? 731 00:42:19,374 --> 00:42:21,730 AUDIENCE: [INAUDIBLE] 732 00:42:21,730 --> 00:42:23,650 BARBARA IMPERIALI: Yeah, you are doing-- 733 00:42:23,650 --> 00:42:24,350 no. 734 00:42:24,350 --> 00:42:27,280 It's actually even more silly. 735 00:42:27,280 --> 00:42:30,022 It's more simple than that. 736 00:42:30,022 --> 00:42:30,730 AUDIENCE: Length? 737 00:42:30,730 --> 00:42:31,813 BARBARA IMPERIALI: Length. 738 00:42:31,813 --> 00:42:33,830 So all you do is you go along and say, 739 00:42:33,830 --> 00:42:38,500 I can make three hydrogen bonds, two, three, two, two, three, 740 00:42:38,500 --> 00:42:39,730 two, two, two.
741 00:42:39,730 --> 00:42:44,810 So you truly just count hydrogen bonds in its partner sequence, 742 00:42:44,810 --> 00:42:47,410 and you can guess which is going to be 743 00:42:47,410 --> 00:42:51,940 the more stable because it has the most hydrogen bonds. 744 00:42:51,940 --> 00:42:53,260 So we might ask you that. 745 00:42:53,260 --> 00:42:54,760 Which one will come apart? 746 00:42:54,760 --> 00:42:58,780 Now, the intriguing thing about DNA is you can peel it. 747 00:42:58,780 --> 00:43:01,930 You can heat it, and it'll come apart. 748 00:43:01,930 --> 00:43:04,270 But it doesn't denature the way proteins do. 749 00:43:04,270 --> 00:43:07,630 If you just cool it down, it comes back together. 750 00:43:07,630 --> 00:43:18,280 So another feature of DNA is that you can heat, denature, 751 00:43:18,280 --> 00:43:25,260 and then reanneal exactly how it was in the first place. 752 00:43:25,260 --> 00:43:28,290 It doesn't denature to something that's not very useful. 753 00:43:32,020 --> 00:43:35,450 And now the question, can you draw the complementary strand? 754 00:43:35,450 --> 00:43:39,200 I always find, of this top strand 755 00:43:39,200 --> 00:43:41,780 here, which of these is the complementary strand? 756 00:43:41,780 --> 00:43:43,490 Frankly, the best way to do it is 757 00:43:43,490 --> 00:43:45,980 sketch out the complementary strand. 758 00:43:45,980 --> 00:43:48,110 You can see it kind of upside down 759 00:43:48,110 --> 00:43:51,680 because it's really hard to draw things 5 prime to 3 prime 760 00:43:51,680 --> 00:43:54,300 when you're also trying to figure out base pairing. 761 00:43:54,300 --> 00:43:55,970 So draw it upside down. 762 00:43:55,970 --> 00:43:59,090 Make sure you know the 5 prime and the 3 prime end. 763 00:43:59,090 --> 00:44:00,980 And then you can guess the right answer 764 00:44:00,980 --> 00:44:04,920 for these types of questions about complementary strands.
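The two exercises above, counting hydrogen bonds and writing the complementary strand, can be sketched in Python. This is a simplified illustration, not a stability model: it uses only the counts given in the lecture (three bonds for a G-C pair, two for an A-T pair), and the function names and candidate sequences are made up for the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # bonds each base makes with its partner


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also written 5' to 3'.

    The partner runs antiparallel, so the input is read backwards
    while each base is swapped for its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds the strand forms with its full complement."""
    return sum(H_BONDS[base] for base in strand)


top = "GATTC"                        # written 5' to 3'
print(reverse_complement(top))       # partner strand, 5' to 3'
print(hydrogen_bonds(top))           # 3 + 2 + 2 + 2 + 3

candidates = ["ATAT", "GGCC", "GATC"]   # hypothetical strands of equal length
print(max(candidates, key=hydrogen_bonds))
```

Reading the input backwards is the code equivalent of drawing the partner strand upside down: both strands end up written 5 prime to 3 prime. Because the total is a sum over every position, a longer strand or one richer in G and C scores higher, which matches the two clues from the class discussion.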
765 00:44:04,920 --> 00:44:08,870 Now, one last question, the stability 766 00:44:08,870 --> 00:44:10,290 of double-stranded DNA. 767 00:44:10,290 --> 00:44:14,150 I've made a whole big deal about hydrogen bonding. 768 00:44:14,150 --> 00:44:15,830 That's what holds it together. 769 00:44:15,830 --> 00:44:18,470 What other forces could be at play 770 00:44:18,470 --> 00:44:20,300 in double-stranded DNA that might 771 00:44:20,300 --> 00:44:23,920 contribute to its stability? 772 00:44:23,920 --> 00:44:25,110 Any thoughts? 773 00:44:25,110 --> 00:44:27,360 What else? 774 00:44:27,360 --> 00:44:29,430 Well, it certainly doesn't look like it's 775 00:44:29,430 --> 00:44:32,400 charged, because the predominant charge is negative. 776 00:44:32,400 --> 00:44:36,360 There's not an-- you've probably got metal ions there, kind 777 00:44:36,360 --> 00:44:38,490 of neutralizing that charge. 778 00:44:38,490 --> 00:44:44,320 What would be the other force, and how would I describe it? 779 00:44:44,320 --> 00:44:45,300 It's a tricky one. 780 00:44:45,300 --> 00:44:49,340 So we've got these bases, and they're pretty hydrophobic. 781 00:44:49,340 --> 00:44:50,120 They're planes. 782 00:44:50,120 --> 00:44:53,300 They have electron density on both sides. 783 00:44:53,300 --> 00:44:56,540 So it turns out there is some stability 784 00:44:56,540 --> 00:45:00,440 gained between the packing of the steps of DNA 785 00:45:00,440 --> 00:45:03,180 between each base pair with the next, with the next. 786 00:45:03,180 --> 00:45:05,690 So there are hydrophobic forces. 
787 00:45:05,690 --> 00:45:09,830 And researchers at Scripps have actually proved this paradigm 788 00:45:09,830 --> 00:45:14,030 by making extra DNA bases that don't have hydrogen bonding 789 00:45:14,030 --> 00:45:17,330 partnerships, but just provide the stuff that's 790 00:45:17,330 --> 00:45:21,470 the flat hydrophobic entity with the right size that 791 00:45:21,470 --> 00:45:25,840 can slip into DNA sequences and make stable [INAUDIBLE], 792 00:45:25,840 --> 00:45:29,720 make stable not really base pairs anymore, but just 793 00:45:29,720 --> 00:45:32,260 be stable in that polymeric structure. 794 00:45:32,260 --> 00:45:34,250 Are people understanding and following that? 795 00:45:40,460 --> 00:45:43,400 So finally, when we look at the structure of DNA, 796 00:45:43,400 --> 00:45:46,970 there are some trenches where things can bind to, 797 00:45:46,970 --> 00:45:50,090 proteins can bind, and we talk about the major groove 798 00:45:50,090 --> 00:45:51,170 and the minor groove. 799 00:45:51,170 --> 00:45:52,850 But I will talk about those later 800 00:45:52,850 --> 00:45:56,340 on when we talk about transcription factors. 801 00:45:56,340 --> 00:45:59,630 Now, I just want to, in really triple-fast time, 802 00:45:59,630 --> 00:46:02,900 and I'll put this on the website, 803 00:46:02,900 --> 00:46:05,480 there's tremendous interest in using 804 00:46:05,480 --> 00:46:08,900 the building blocks of DNA for information 805 00:46:08,900 --> 00:46:10,700 storage in computing. 806 00:46:10,700 --> 00:46:14,430 So if you look up DNA-based computing on Wikipedia, 807 00:46:14,430 --> 00:46:16,520 you'll learn a whole lot about it. 808 00:46:16,520 --> 00:46:19,400 Because what's so exciting about it is it's 809 00:46:19,400 --> 00:46:23,000 an organized nanoscale material that 810 00:46:23,000 --> 00:46:27,540 can be programmed to base pair and form certain structures.
811 00:46:27,540 --> 00:46:30,470 So in the sort of range of different sizes, 812 00:46:30,470 --> 00:46:33,050 there's been a lot of interest in DNA 813 00:46:33,050 --> 00:46:35,210 as a material for information storage, 814 00:46:35,210 --> 00:46:38,510 not for your genetic material, but for plain old information 815 00:46:38,510 --> 00:46:39,380 storage. 816 00:46:39,380 --> 00:46:42,980 So people have learned how to build structures of DNA 817 00:46:42,980 --> 00:46:46,610 where they can construct these sort of cruciform structures 818 00:46:46,610 --> 00:46:48,170 by base pairing. 819 00:46:48,170 --> 00:46:51,080 They can make the arms of these structures a little bit 820 00:46:51,080 --> 00:46:52,010 extended. 821 00:46:52,010 --> 00:46:54,860 So you could start joining those things together 822 00:46:54,860 --> 00:46:59,510 to make very defined three-dimensional entities. 823 00:46:59,510 --> 00:47:01,790 They went kind of nuts doing this sort of stuff 824 00:47:01,790 --> 00:47:04,460 because you can build sort of tetrahedra 825 00:47:04,460 --> 00:47:07,170 and other sort of shapes and sizes, 826 00:47:07,170 --> 00:47:10,220 all by strands that base pair, that 827 00:47:10,220 --> 00:47:13,370 are about 10 base pairs long, that are stable, 828 00:47:13,370 --> 00:47:16,170 and only complement certain other base pairs. 829 00:47:16,170 --> 00:47:18,230 So you could literally build up-- they often 830 00:47:18,230 --> 00:47:21,830 called it DNA origami because you can build up 831 00:47:21,830 --> 00:47:27,440 macroscopic structures just by the assembly of strands of DNA 832 00:47:27,440 --> 00:47:31,250 that will ultimately fold to form the best complementary DNA 833 00:47:31,250 --> 00:47:32,810 to form the structures. 834 00:47:32,810 --> 00:47:35,000 And it's also been found-- 835 00:47:35,000 --> 00:47:38,750 as I said, they went completely nuts-- smiley faces and stars 836 00:47:38,750 --> 00:47:40,280 and stripes and so on. 
837 00:47:40,280 --> 00:47:43,700 But the most valuable thing you can-- as I said, 838 00:47:43,700 --> 00:47:45,650 you can read more about this-- 839 00:47:45,650 --> 00:47:49,990 is to use DNA as logic gates to define AND, 840 00:47:49,990 --> 00:47:53,780 OR, or NOT, so the sort of three options, 841 00:47:53,780 --> 00:47:57,980 and actually use them to program certain puzzles where 842 00:47:57,980 --> 00:48:01,340 the DNA will spit out the answer to a particular puzzle 843 00:48:01,340 --> 00:48:03,620 through a logic diagram. 844 00:48:03,620 --> 00:48:06,650 So those of you who are interested in computing 845 00:48:06,650 --> 00:48:10,130 and these kinds of logic puzzles may 846 00:48:10,130 --> 00:48:12,200 want to read a little bit more, because DNA 847 00:48:12,200 --> 00:48:15,800 is such a reliable noncovalent structure, 848 00:48:15,800 --> 00:48:19,100 where those base pairs are incredibly reliable, that you 849 00:48:19,100 --> 00:48:22,580 can start envisioning not just building double-stranded DNA, 850 00:48:22,580 --> 00:48:26,240 but building all kinds of architectures or programming 851 00:48:26,240 --> 00:48:28,730 things with the sequence of DNA. 852 00:48:28,730 --> 00:48:30,500 And that's it for today. 853 00:48:30,500 --> 00:48:33,460 And that's the end of the biochemistry section.