1 00:00:15,947 --> 00:00:18,030 BARBARA IMPERIALI: So what we're going to do today 2 00:00:18,030 --> 00:00:21,030 is we're going to finish up a bit of transcription, 3 00:00:21,030 --> 00:00:22,650 and then I'm going to talk largely 4 00:00:22,650 --> 00:00:24,240 about transcription control. 5 00:00:24,240 --> 00:00:27,750 Because it's tremendously important that we actually 6 00:00:27,750 --> 00:00:30,270 not just know how to transcribe, but we 7 00:00:30,270 --> 00:00:33,900 know how it's controlled, and how ultimately 8 00:00:33,900 --> 00:00:39,060 the appropriate messenger RNA is sent out for translation 9 00:00:39,060 --> 00:00:39,910 into proteins. 10 00:00:39,910 --> 00:00:43,620 So transcription control is a very important component. 11 00:00:43,620 --> 00:00:47,230 So let me just start with this first. 12 00:00:47,230 --> 00:00:49,890 So these are typical questions. 13 00:00:49,890 --> 00:00:52,560 And they look like, gosh, I don't have enough information 14 00:00:52,560 --> 00:00:54,210 to answer this question. 15 00:00:54,210 --> 00:00:58,170 And it's basically about what's known as the transcription 16 00:00:58,170 --> 00:01:02,940 bubble, which is the portion of the double-stranded DNA where 17 00:01:02,940 --> 00:01:05,080 transcription is occurring. 18 00:01:05,080 --> 00:01:08,190 And so, transiently, the double-stranded DNA 19 00:01:08,190 --> 00:01:12,540 is opened up for the RNA polymerase, which 20 00:01:12,540 --> 00:01:15,930 remember has inbuilt helicase activity, to start 21 00:01:15,930 --> 00:01:17,920 transcribing the gene. 22 00:01:17,920 --> 00:01:20,880 So let's just say the information we might give 23 00:01:20,880 --> 00:01:24,150 you is that transcription starts here 24 00:01:24,150 --> 00:01:27,450 at this point on the double-stranded DNA. 
25 00:01:27,450 --> 00:01:31,050 And what we want you to know is that you 26 00:01:31,050 --> 00:01:34,110 have all the information you need to know which of the two 27 00:01:34,110 --> 00:01:35,930 strands is copied. 28 00:01:35,930 --> 00:01:37,680 Have you had a chance to think about that? 29 00:01:37,680 --> 00:01:40,920 Does anyone want to give me a good answer to that? 30 00:01:40,920 --> 00:01:42,030 Yes, here. 31 00:01:42,030 --> 00:01:43,530 AUDIENCE: The bottom [INAUDIBLE].. 32 00:01:43,530 --> 00:01:44,988 BARBARA IMPERIALI: And why is that? 33 00:01:44,988 --> 00:01:46,688 AUDIENCE: Because it's [INAUDIBLE].. 34 00:01:46,688 --> 00:01:47,730 BARBARA IMPERIALI: Right. 35 00:01:47,730 --> 00:01:51,000 So that reading things 3 prime to 5 prime, 36 00:01:51,000 --> 00:01:54,420 and making things 5 prime to 3 prime, 37 00:01:54,420 --> 00:01:58,000 is all you need to know to answer this question. 38 00:01:58,000 --> 00:02:00,090 So you would straightaway know, OK, 39 00:02:00,090 --> 00:02:02,730 we're going to start somewhere here, 40 00:02:02,730 --> 00:02:04,770 but we're actually going to start 41 00:02:04,770 --> 00:02:06,900 on the lower strand, because that's the one 3 42 00:02:06,900 --> 00:02:08,250 prime to 5 prime. 43 00:02:08,250 --> 00:02:12,870 And we're going to make the new transcript 44 00:02:12,870 --> 00:02:16,050 in the appropriate direction, and know what it's going to be. 45 00:02:16,050 --> 00:02:20,040 So you can only read the bottom strand in this case. 46 00:02:20,040 --> 00:02:22,200 But watch out, because you might be 47 00:02:22,200 --> 00:02:27,160 reading the top strand in the appropriate direction as well. 48 00:02:27,160 --> 00:02:30,570 But what we've told you already is where the start site is. 49 00:02:30,570 --> 00:02:32,640 So you're going to know that information. 
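A minimal sketch of the rule just described (read the template 3 prime to 5 prime, make the new strand 5 prime to 3 prime). The duplex sequence and the function name `transcribe` are made up for illustration and are not taken from the slide.

```python
# Illustrative sketch only: the sequences and the function name are
# hypothetical and do not come from the lecture slide.

# Base-pairing rules for copying a DNA template into RNA (U replaces T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA, written 5'->3', for a template written 3'->5'."""
    # The polymerase reads the template 3'->5' and adds the complementary
    # ribonucleotide at each position, so the new strand grows 5'->3'.
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


# Hypothetical duplex, written the way it would be drawn on a slide:
top_5_to_3 = "ATGGCCTTA"     # 5'-ATGGCCTTA-3'
bottom_3_to_5 = "TACCGGAAT"  # 3'-TACCGGAAT-5'

# With the start site at the left, the bottom strand is the one that runs
# 3'->5' in the direction of travel, so it serves as the template.
mrna = transcribe(bottom_3_to_5)
print("5'-" + mrna + "-3'")
```

In this sketch the transcript matches the top (non-template) strand with U in place of T, which is a quick way to check an answer.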
50 00:02:32,640 --> 00:02:36,910 If we'd put, for example, that the start site was over here, 51 00:02:36,910 --> 00:02:39,910 you might have different answers to this question. 52 00:02:39,910 --> 00:02:43,230 So the single piece of information you need 53 00:02:43,230 --> 00:02:46,770 is that you read 3 to 5, and you make 5 to 3. 54 00:02:46,770 --> 00:02:49,980 Then you can also fill in the bases 55 00:02:49,980 --> 00:02:53,760 to know what the transcribed sequence looks like. 56 00:02:53,760 --> 00:02:56,370 And I always recommend when you're filling in bases, 57 00:02:56,370 --> 00:02:58,770 you just write 5 prime to 3 prime. 58 00:02:58,770 --> 00:03:02,580 So you really are following properly which strand is being read, 59 00:03:02,580 --> 00:03:04,620 and the direction it's being read. 60 00:03:04,620 --> 00:03:08,910 And so if you were asked what the new messenger RNA sequence 61 00:03:08,910 --> 00:03:10,700 would be, you'd get it right. 62 00:03:10,700 --> 00:03:12,870 And here I just have a few questions 63 00:03:12,870 --> 00:03:16,020 that will help highlight the differences 64 00:03:16,020 --> 00:03:19,980 between the transcription process and the replication 65 00:03:19,980 --> 00:03:20,880 process. 66 00:03:20,880 --> 00:03:23,310 So can you guys just read each of these, 67 00:03:23,310 --> 00:03:25,410 and see whether they look like rules 68 00:03:25,410 --> 00:03:27,780 that apply to transcription, or they're 69 00:03:27,780 --> 00:03:30,640 things that are not true about transcription? 70 00:03:30,640 --> 00:03:34,980 So I'll just give you one second to think about that. 71 00:03:34,980 --> 00:03:40,100 OK, who would like to give me an answer? 72 00:03:40,100 --> 00:03:41,950 Someone over in this part of the room, 73 00:03:41,950 --> 00:03:43,981 I haven't heard as much-- yes? 74 00:03:43,981 --> 00:03:46,462 AUDIENCE: Makes a complete copy of one strand of-- 75 00:03:46,462 --> 00:03:47,420 BARBARA IMPERIALI: DNA. 
76 00:03:47,420 --> 00:03:51,620 OK, so the correct answer is that we only transcribe 77 00:03:51,620 --> 00:03:55,670 about 1.5% of the genomic DNA. 78 00:03:55,670 --> 00:03:58,040 We're not transcribing the whole thing. 79 00:03:58,040 --> 00:04:02,220 We're only transcribing the bits that we need to make proteins. 80 00:04:02,220 --> 00:04:06,260 So the correct answer is C, because we 81 00:04:06,260 --> 00:04:09,650 don't make a copy of the whole of genomic DNA. 82 00:04:09,650 --> 00:04:10,650 But these are all right. 83 00:04:10,650 --> 00:04:12,560 We have a different set of nucleotides, 84 00:04:12,560 --> 00:04:15,950 or rather a difference in one of the nucleotide triphosphate 85 00:04:15,950 --> 00:04:18,140 building blocks. 86 00:04:18,140 --> 00:04:21,890 Remember that RNA polymerase does not require a primer. 87 00:04:21,890 --> 00:04:24,860 That was a complication when we looked at replication, 88 00:04:24,860 --> 00:04:27,740 because we had to paste in primers, made often 89 00:04:27,740 --> 00:04:31,390 by the primase, RNA primase. 90 00:04:31,390 --> 00:04:36,410 RNA polymerase is much cleverer than DNA polymerase, 91 00:04:36,410 --> 00:04:38,900 because it has the polymerase activity. 92 00:04:38,900 --> 00:04:41,240 It has the helicase activity. 93 00:04:41,240 --> 00:04:43,880 A topoisomerase isn't needed, 94 00:04:43,880 --> 00:04:45,980 because we're only opening a little bit 95 00:04:45,980 --> 00:04:47,690 of the double-stranded DNA to copy. 96 00:04:47,690 --> 00:04:50,720 I'll show you a movie in a second. 97 00:04:50,720 --> 00:04:54,380 And it also has a 3 prime exonuclease, 98 00:04:54,380 --> 00:05:01,460 which means that RNA polymerase is able to do its own proofreading, just 99 00:05:01,460 --> 00:05:03,990 like the DNA polymerase. 100 00:05:03,990 --> 00:05:08,180 So the error rate is similar to the error 101 00:05:08,180 --> 00:05:10,430 rate with the exonuclease activity for 
102 00:05:10,430 --> 00:05:17,683 DNA, so that's about 1 in 10 to the 5 to 1 in 10 to the 6. 103 00:05:17,683 --> 00:05:19,100 Now I'm going to talk in a minute. 104 00:05:19,100 --> 00:05:23,360 There's not the same cadre of repair enzymes 105 00:05:23,360 --> 00:05:25,250 that we have for DNA. 106 00:05:25,250 --> 00:05:27,560 And that's a curious thing, until you start thinking 107 00:05:27,560 --> 00:05:29,730 about what the reality is. 108 00:05:29,730 --> 00:05:32,400 So let me try to lead you to my thinking. 109 00:05:32,400 --> 00:05:35,060 So with genomic DNA, that's the copy 110 00:05:35,060 --> 00:05:38,630 of the DNA that's in the nucleus that has to stay good. 111 00:05:38,630 --> 00:05:41,630 So if there are any mistakes, we need them cleaned up. 112 00:05:41,630 --> 00:05:44,840 Because otherwise when we replicate 113 00:05:44,840 --> 00:05:47,180 a set of all of the double-stranded DNA, 114 00:05:47,180 --> 00:05:52,070 there will be an error in the progeny, the daughter cells. 115 00:05:52,070 --> 00:05:56,030 For transcribing RNA, an error rate of about 1 116 00:05:56,030 --> 00:06:01,790 in 10 to the 5 to 1 in 10 to the 6 is just fine. 117 00:06:01,790 --> 00:06:06,740 That piece of RNA, after a few of the processing 118 00:06:06,740 --> 00:06:10,430 steps that I'll describe to you, ends up leaving the nucleus 119 00:06:10,430 --> 00:06:12,710 to be made into proteins. 120 00:06:12,710 --> 00:06:15,890 And it's not such a bad thing if you have a little bit 121 00:06:15,890 --> 00:06:17,990 more error in that RNA. 122 00:06:17,990 --> 00:06:20,960 Why is that? 123 00:06:20,960 --> 00:06:22,940 Over here, or up there actually. 124 00:06:22,940 --> 00:06:23,580 Yeah? 125 00:06:23,580 --> 00:06:26,822 AUDIENCE: It has a shorter life, so it's not 126 00:06:26,822 --> 00:06:30,652 going to mess up everything for the rest of its life. 
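To put rough numbers on why that error rate is tolerable for a short-lived transcript, here is a back-of-the-envelope sketch. The error rates are the per-nucleotide figures quoted above; the transcript length of 1,500 nucleotides is an assumed value chosen only for illustration, and errors are assumed to be independent at each position.

```python
# Rough arithmetic only. The transcript length is a hypothetical value;
# the error rates are the per-nucleotide figures quoted in the lecture.


def expected_errors(length_nt: int, error_rate: float) -> float:
    """Average number of wrong bases in one transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float) -> float:
    """Chance that a transcript has no errors, assuming independent errors."""
    return (1.0 - error_rate) ** length_nt


transcript_length = 1_500  # hypothetical transcript length in nucleotides

for rate in (1e-5, 1e-6):
    print(
        f"rate {rate:g}: "
        f"{expected_errors(transcript_length, rate):.4f} errors per transcript, "
        f"{fraction_error_free(transcript_length, rate):.4f} error-free"
    )
```

Under these assumptions the large majority of transcripts carry no error at all, and any faulty one is degraded along with the rest.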
127 00:06:30,652 --> 00:06:33,110 BARBARA IMPERIALI: Right, so if you have a bit of an error, 128 00:06:33,110 --> 00:06:36,980 maybe you make a new protein, but it's not full length, 129 00:06:36,980 --> 00:06:40,790 or the protein you make isn't perfect; it's OK. 130 00:06:40,790 --> 00:06:43,820 Because there'll be other transcripts that are correct, 131 00:06:43,820 --> 00:06:47,160 and then after you've done making the protein, 132 00:06:47,160 --> 00:06:49,430 then you're just going to destroy 133 00:06:49,430 --> 00:06:52,910 that RNA, because it has a transient lifetime because 134 00:06:52,910 --> 00:06:55,850 of the structure of the RNA, and the nucleases that 135 00:06:55,850 --> 00:06:57,270 chew up the RNA. 136 00:06:57,270 --> 00:07:01,760 So this is an acceptable error rate for RNA polymerase. 137 00:07:01,760 --> 00:07:04,280 Remember, it's not an acceptable error rate 138 00:07:04,280 --> 00:07:05,660 to have in your genome. 139 00:07:05,660 --> 00:07:07,280 OK, it's just too big. 140 00:07:07,280 --> 00:07:08,330 Any questions about that? 141 00:07:08,330 --> 00:07:11,060 Does that all make sense? 142 00:07:11,060 --> 00:07:11,560 OK. 143 00:07:14,230 --> 00:07:15,620 All right, so now we want to talk 144 00:07:15,620 --> 00:07:18,030 about transcription control. 145 00:07:18,030 --> 00:07:20,870 But before I do that, I do want to very quickly show you 146 00:07:20,870 --> 00:07:25,160 something, because I feel like it really caps off 147 00:07:25,160 --> 00:07:28,140 the transcription part. 148 00:07:28,140 --> 00:07:29,820 And I'm going to show you only about 149 00:07:29,820 --> 00:07:33,513 a minute of this, because once again, I 150 00:07:33,513 --> 00:07:34,680 love the sound effects here. 151 00:07:37,970 --> 00:07:39,540 And I can't turn down the volume. 152 00:07:39,540 --> 00:07:43,340 This is one of those animations showing transcription. 
153 00:07:43,340 --> 00:07:45,960 And it basically shows on the double-stranded DNA, 154 00:07:45,960 --> 00:07:50,010 lot of things accumulating to make a decision with respect 155 00:07:50,010 --> 00:07:51,330 to starting. 156 00:07:51,330 --> 00:07:55,560 But now, the whole complex, the RNA polymerase 157 00:07:55,560 --> 00:08:00,750 is just screeching through the DNA, making that messenger RNA. 158 00:08:00,750 --> 00:08:05,220 And you're only unraveling just a little bit of DNA 159 00:08:05,220 --> 00:08:06,970 around that transcription bubble. 160 00:08:06,970 --> 00:08:10,560 So you wouldn't need topoisomerase in this case. 161 00:08:10,560 --> 00:08:13,700 You're only copying one of the two strands. 162 00:08:13,700 --> 00:08:16,170 We saw how to identify which. 163 00:08:16,170 --> 00:08:19,500 And thus, the new RNA strand is basically 164 00:08:19,500 --> 00:08:21,210 falling out of the complex. 165 00:08:21,210 --> 00:08:24,084 You can see [INAUDIBLE] it's moving a lot. 166 00:08:24,084 --> 00:08:26,479 So that gives you a feel for things. 167 00:08:30,293 --> 00:08:31,960 When I hear that music, I'm just sort of 168 00:08:31,960 --> 00:08:33,940 waiting for the whole thing to crash somewhere. 169 00:08:33,940 --> 00:08:38,039 But who would know, right? 170 00:08:38,039 --> 00:08:38,590 All right. 171 00:08:38,590 --> 00:08:41,890 So at the beginning of that animation 172 00:08:41,890 --> 00:08:43,929 a few things were highlighted, where 173 00:08:43,929 --> 00:08:47,620 there's the double-stranded DNA, and a collection of entities 174 00:08:47,620 --> 00:08:50,440 starts clustering around where the reading is 175 00:08:50,440 --> 00:08:51,980 going to take place. 176 00:08:51,980 --> 00:08:56,080 So this thin, black strand is the double-stranded DNA. 177 00:08:56,080 --> 00:08:59,800 So some of the entities accumulate quite close 178 00:08:59,800 --> 00:09:01,270 to the start site. 
179 00:09:01,270 --> 00:09:04,180 There is a section of DNA known as a TATA box 180 00:09:04,180 --> 00:09:08,020 that we spoke about, at the very end of the last lecture. 181 00:09:08,020 --> 00:09:11,080 But there are a number of transcription factors 182 00:09:11,080 --> 00:09:12,790 that are quite important. 183 00:09:12,790 --> 00:09:15,010 But then you might also, if you go back and look 184 00:09:15,010 --> 00:09:18,580 at that animation, there are sections of DNA 185 00:09:18,580 --> 00:09:24,010 that are at quite a distance that also help really regulate 186 00:09:24,010 --> 00:09:26,060 and promote transcription. 187 00:09:26,060 --> 00:09:29,320 So many factors regulate where the transcription occurs. 188 00:09:29,320 --> 00:09:32,080 First of all, is there a promoter site right 189 00:09:32,080 --> 00:09:33,340 near where we need to start? 190 00:09:33,340 --> 00:09:36,790 That causes a fair amount of collecting 191 00:09:36,790 --> 00:09:39,040 of complex components. 192 00:09:39,040 --> 00:09:41,800 But then there may be other things at a distance. 193 00:09:41,800 --> 00:09:44,860 So the promoter region might be located quite 194 00:09:44,860 --> 00:09:47,380 near the transcription start site, 195 00:09:47,380 --> 00:09:49,690 but then there are also enhancers 196 00:09:49,690 --> 00:09:52,360 that can be located at quite a big distance 197 00:09:52,360 --> 00:09:56,500 away from the start site that also play a role. 198 00:09:56,500 --> 00:09:58,780 Why do we need this much control? 199 00:09:58,780 --> 00:10:00,940 Because we don't need to be making 200 00:10:00,940 --> 00:10:04,960 the RNA for every protein all at the same time. 201 00:10:04,960 --> 00:10:07,360 Certain times in cell cycle you'll see, 202 00:10:07,360 --> 00:10:10,720 I only need to make one protein or a different protein. 
203 00:10:10,720 --> 00:10:15,430 So we need all of this control to decide when transcription 204 00:10:15,430 --> 00:10:18,850 occurs, when do we need to make the messenger 205 00:10:18,850 --> 00:10:23,220 RNA to make our favorite new protein that needs to be made. 206 00:10:23,220 --> 00:10:26,740 And you'll learn a lot in signal transduction and cell 207 00:10:26,740 --> 00:10:29,410 cycle, where we really show you how 208 00:10:29,410 --> 00:10:31,870 a lot of the housekeeping genes are all fine. 209 00:10:31,870 --> 00:10:33,035 The proteins are all there. 210 00:10:33,035 --> 00:10:34,660 But at a certain stage, we need to make 211 00:10:34,660 --> 00:10:36,670 more of a particular protein. 212 00:10:36,670 --> 00:10:39,250 And that's when the transcription control 213 00:10:39,250 --> 00:10:41,020 comes into play. 214 00:10:41,020 --> 00:10:42,850 And you'll commonly hear about things 215 00:10:42,850 --> 00:10:44,125 called transcription factors. 216 00:10:52,590 --> 00:10:56,440 And those may be the proteins, for example, 217 00:10:56,440 --> 00:10:59,410 that regulate that transcription should start. 218 00:10:59,410 --> 00:11:01,930 In other cases, there may be times when 219 00:11:01,930 --> 00:11:04,750 transcription is turned down. 220 00:11:04,750 --> 00:11:06,790 So we have things that activate. 221 00:11:10,020 --> 00:11:13,440 So we start transcription, we make that happen, 222 00:11:13,440 --> 00:11:14,640 or others that repress. 223 00:11:17,750 --> 00:11:19,950 So they turn down transcription. 224 00:11:19,950 --> 00:11:23,750 So it's all about when you need to start, 225 00:11:23,750 --> 00:11:26,480 which is controlled by external factors 226 00:11:26,480 --> 00:11:28,460 acting on the double-stranded DNA 227 00:11:28,460 --> 00:11:32,360 to send the transcription complex, making the new entity. 228 00:11:32,360 --> 00:11:33,830 All right, does that make sense? 
229 00:11:33,830 --> 00:11:37,520 We can't have a dysregulated system, or if we do, 230 00:11:37,520 --> 00:11:40,520 then we have problems with, for example, 231 00:11:40,520 --> 00:11:43,190 proliferation of cells. 232 00:11:43,190 --> 00:11:45,170 All right, so what I want to talk about 233 00:11:45,170 --> 00:11:48,590 are the key things that regulate transcription, 234 00:11:48,590 --> 00:11:52,790 and the key things that we do to the messenger that's been made. 235 00:11:52,790 --> 00:11:54,770 So what you're seeing here on this slide 236 00:11:54,770 --> 00:11:57,350 is just where we are in the process. 237 00:11:57,350 --> 00:11:58,730 We've seen replication. 238 00:11:58,730 --> 00:12:00,650 We're now at the transcription step. 239 00:12:00,650 --> 00:12:05,210 But there are many steps to go in eukaryotes 240 00:12:05,210 --> 00:12:08,150 before that transcript can leave the nucleus. 241 00:12:08,150 --> 00:12:09,280 All right? 242 00:12:09,280 --> 00:12:14,130 And I want you to remember the difference 243 00:12:14,130 --> 00:12:17,400 between eukaryotic and prokaryotic cells. 244 00:12:17,400 --> 00:12:20,190 This was just a picture we saw very early on. 245 00:12:20,190 --> 00:12:22,920 Prokaryotic cells, like bacteria, 246 00:12:22,920 --> 00:12:24,610 do not have a nucleus. 247 00:12:24,610 --> 00:12:27,000 They have an area called a nucleoid, 248 00:12:27,000 --> 00:12:32,040 but they don't have a discrete membrane encased organelle, 249 00:12:32,040 --> 00:12:36,030 where the processes of replication and transcription 250 00:12:36,030 --> 00:12:36,930 occur. 251 00:12:36,930 --> 00:12:39,450 In contrast, in eukaryotes, there 252 00:12:39,450 --> 00:12:42,780 is a discrete area of the cell, the nucleus, 253 00:12:42,780 --> 00:12:46,140 that includes all the machinery for replication 254 00:12:46,140 --> 00:12:47,850 and transcription. 
255 00:12:47,850 --> 00:12:50,100 And it also includes the machinery 256 00:12:50,100 --> 00:13:03,620 that takes a pre-messenger RNA into a mature messenger 257 00:13:03,620 --> 00:13:05,840 RNA that can leave the nucleus. 258 00:13:10,860 --> 00:13:12,820 If you send that pre-messenger out there, 259 00:13:12,820 --> 00:13:14,820 it's going to have a lot of stuff wrong with it. 260 00:13:14,820 --> 00:13:16,700 It's not going to be ready to face 261 00:13:16,700 --> 00:13:18,350 the outside of the nucleus. 262 00:13:18,350 --> 00:13:20,310 It's going to be readily degraded. 263 00:13:20,310 --> 00:13:22,260 It's not going to have the full information. 264 00:13:22,260 --> 00:13:25,190 So I want to talk to you about the processes that 265 00:13:25,190 --> 00:13:27,470 are put in place for this conversion 266 00:13:27,470 --> 00:13:31,850 from the pre-messenger RNA to the messenger RNA, which 267 00:13:31,850 --> 00:13:34,730 we don't have to think about in prokaryotes, 268 00:13:34,730 --> 00:13:41,750 the small organisms without organelles. 269 00:13:41,750 --> 00:13:45,770 All right, so, and this is the foundation 270 00:13:45,770 --> 00:13:48,215 of transcriptional control. 271 00:13:51,160 --> 00:13:51,956 All right. 272 00:13:57,550 --> 00:14:05,380 So, we've talked about promoters and enhancers. 273 00:14:08,110 --> 00:14:09,520 Those happen early. 274 00:14:09,520 --> 00:14:14,230 That's all about making the pre-messenger RNA. 275 00:14:14,230 --> 00:14:18,130 But now we have to discuss some aspects that are also 276 00:14:18,130 --> 00:14:24,100 critical for making the initial pre-messenger RNA, and that's 277 00:14:24,100 --> 00:14:34,770 chromatin remodelers, or chromatin remodeling. 278 00:14:34,770 --> 00:14:38,520 Because in order to transcribe anything, 279 00:14:38,520 --> 00:14:43,020 there's a lot of ground to cover with respect to unwrapping 280 00:14:43,020 --> 00:14:45,060 the chromatin, the chromosomes. 
281 00:14:45,060 --> 00:14:47,200 Because they're all packed up in such a way 282 00:14:47,200 --> 00:14:50,490 that you can't possibly do any transcription there, 283 00:14:50,490 --> 00:14:52,890 because they're too tightly packed 284 00:14:52,890 --> 00:14:55,890 to be accessible for the transcription machinery 285 00:14:55,890 --> 00:14:57,030 to tackle. 286 00:14:57,030 --> 00:14:59,700 So I show you parts of that machinery up here. 287 00:14:59,700 --> 00:15:03,300 These are the nucleosomes that make up chromatin, 288 00:15:03,300 --> 00:15:05,340 which makes up the chromosomes. 289 00:15:05,340 --> 00:15:09,210 And in order for you to even be able to start transcribing, 290 00:15:09,210 --> 00:15:14,970 you have to unravel that part, those complex structures that 291 00:15:14,970 --> 00:15:16,290 are tightly wrapped up. 292 00:15:16,290 --> 00:15:20,010 You've got to make the double-stranded DNA accessible, 293 00:15:20,010 --> 00:15:21,930 otherwise you can't break into it. 294 00:15:21,930 --> 00:15:25,770 So there are two things that also contribute 295 00:15:25,770 --> 00:15:32,730 to allowing transcription, and those occur both at the DNA 296 00:15:32,730 --> 00:15:36,400 level, and the histone level. 297 00:15:36,400 --> 00:15:40,020 And I'm going to talk about the histone level changes first. 298 00:15:40,020 --> 00:15:45,300 I want you to recall that histones are proteins that 299 00:15:45,300 --> 00:15:50,310 have a lot of positive charge, by virtue of the fact that they 300 00:15:50,310 --> 00:15:56,580 include two of the positively charged amino acids, arginine 301 00:15:56,580 --> 00:15:57,690 and lysine. 302 00:15:57,690 --> 00:15:59,640 And if you're curious about those structures, 303 00:15:59,640 --> 00:16:02,910 you can go look back at the table of the amino acids 304 00:16:02,910 --> 00:16:06,150 and see that those guys are always positively charged. 
305 00:16:06,150 --> 00:16:11,040 The reason we use histones as the core of the nucleosome 306 00:16:11,040 --> 00:16:13,790 structure is they're very positively charged, 307 00:16:13,790 --> 00:16:16,740 and they neutralize the dense negative charge 308 00:16:16,740 --> 00:16:17,910 of the nucleic acid. 309 00:16:17,910 --> 00:16:21,120 Otherwise we couldn't pack it up as tightly as we do. 310 00:16:21,120 --> 00:16:25,330 So the change that occurs at the histone level, 311 00:16:25,330 --> 00:16:27,660 in remodeling the chromatin 312 00:16:27,660 --> 00:16:30,300 in order to promote transcription, 313 00:16:30,300 --> 00:16:39,850 is modification, oftentimes to neutralize those charges. 314 00:16:39,850 --> 00:16:42,970 So the most obvious one I'll show you, and then there 315 00:16:42,970 --> 00:16:46,660 are others where you add methyl groups that 316 00:16:46,660 --> 00:16:47,830 dampen down the charge. 317 00:16:47,830 --> 00:16:50,740 But I'm just going to show you the very obvious one. 318 00:16:50,740 --> 00:16:59,140 So let's just look at lysine in a protein. 319 00:17:02,740 --> 00:17:03,730 That looks like this. 320 00:17:03,730 --> 00:17:06,069 That is a terrible drawing. 321 00:17:06,069 --> 00:17:08,319 It has a positive charge, with four bonds 322 00:17:08,319 --> 00:17:12,109 to nitrogen. In order to neutralize that charge, 323 00:17:12,109 --> 00:17:15,310 there are enzymes that acylate, or transfer 324 00:17:15,310 --> 00:17:19,359 an acyl group to turn this positively charged amine 325 00:17:19,359 --> 00:17:21,640 into a neutral amide. 326 00:17:21,640 --> 00:17:23,500 Let me draw that side chain, because I think 327 00:17:23,500 --> 00:17:26,619 it makes much more sense to understand 328 00:17:26,619 --> 00:17:29,560 this particular thing from a chemical perspective. 329 00:17:29,560 --> 00:17:35,230 So we still have the N, but it is now part of an amide. 
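A toy sketch of the charge bookkeeping just described. The peptide below is a made-up sequence, not a real histone tail, and the model is deliberately crude: it counts each lysine (K) and arginine (R) as one positive charge and treats an acylated lysine as neutral, ignoring every other residue.

```python
# Toy model only: the peptide is hypothetical, and only lysine (K) and
# arginine (R) side chains are counted as carrying a +1 charge.


def positive_charge(peptide: str, acylated_positions=frozenset()) -> int:
    """Count positive side-chain charges in a one-letter peptide sequence.

    acylated_positions holds 0-based indices of lysines whose amine has
    been converted to a neutral amide. Indices that do not point at a
    lysine have no effect.
    """
    charge = 0
    for index, residue in enumerate(peptide):
        if residue == "R":
            charge += 1
        elif residue == "K" and index not in acylated_positions:
            charge += 1
    return charge


tail = "AKTRKSGGKAR"  # hypothetical lysine- and arginine-rich segment

print(positive_charge(tail))                                # unmodified
print(positive_charge(tail, acylated_positions={1, 4, 8}))  # all lysines acylated
```

In this made-up example the count drops from five positive charges to two once the three lysines are acylated, which is the direction of change that loosens the grip on the negatively charged DNA.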
330 00:17:39,790 --> 00:17:43,990 So the charge has been neutralized on the nitrogen. 331 00:17:43,990 --> 00:17:48,520 If the charge is neutralized on the positively charged residues 332 00:17:48,520 --> 00:17:51,790 in the histones, the DNA will be encouraged 333 00:17:51,790 --> 00:17:54,070 to unravel from those. 334 00:17:54,070 --> 00:17:57,010 It makes good chemical sense that that's a good way 335 00:17:57,010 --> 00:17:59,920 to start unpackaging DNA. 336 00:17:59,920 --> 00:18:03,220 The other modification occurs at the DNA level 337 00:18:03,220 --> 00:18:09,560 and it's methylation of cytosine. 338 00:18:12,880 --> 00:18:15,910 A little harder to explain from a chemical sense, 339 00:18:15,910 --> 00:18:18,580 but up there in the corner of this slide 340 00:18:18,580 --> 00:18:20,830 I've shown you the pictures. 341 00:18:20,830 --> 00:18:24,700 So methylation on cytosine would look like this. 342 00:18:24,700 --> 00:18:28,850 This encourages stabilization of the chromatin. 343 00:18:36,550 --> 00:18:38,620 So what would that do to transcription if I've 344 00:18:38,620 --> 00:18:41,960 stabilized the chromatin? 345 00:18:41,960 --> 00:18:42,910 It represses it. 346 00:18:42,910 --> 00:18:44,530 It turns it down. 347 00:18:44,530 --> 00:18:48,900 So these are pairs of changes that act in opposite directions. 348 00:18:48,900 --> 00:18:57,410 So we'd down-regulate transcription. 349 00:18:57,410 --> 00:19:01,840 OK, so DNA methylation causes the chromatin 350 00:19:01,840 --> 00:19:04,750 to be a bit more compact, more stable. 351 00:19:04,750 --> 00:19:07,750 It's much harder to unravel the DNA. 352 00:19:07,750 --> 00:19:11,410 It's obviously then harder to start transcription. 
353 00:19:11,410 --> 00:19:15,970 In contrast, modification of the histone proteins 354 00:19:15,970 --> 00:19:24,490 to neutralize their charges destabilizes the chromatin and up-regulates 355 00:19:24,490 --> 00:19:27,040 transcription, because it's allowing 356 00:19:27,040 --> 00:19:30,250 us to open up the nucleosomes in order 357 00:19:30,250 --> 00:19:32,448 to make the DNA available. 358 00:19:32,448 --> 00:19:33,490 Does that all make sense? 359 00:19:33,490 --> 00:19:39,060 So we have two counter-balances that play in each direction. 360 00:19:39,060 --> 00:19:39,560 OK? 361 00:19:43,210 --> 00:19:45,816 All right, so that's chromatin remodeling. 362 00:19:53,440 --> 00:19:56,720 The next two transformations I'll show you 363 00:19:56,720 --> 00:19:59,550 may certainly look fairly complicated. 364 00:19:59,550 --> 00:20:01,730 But I'm going to describe them to you, 365 00:20:01,730 --> 00:20:03,920 and show you what the changes are, 366 00:20:03,920 --> 00:20:08,180 and how they would contribute to stabilizing the transcript 367 00:20:08,180 --> 00:20:10,610 and finishing the transcript up, to make 368 00:20:10,610 --> 00:20:13,580 it ready to leave the nucleus, to go out 369 00:20:13,580 --> 00:20:17,160 to the cytoplasm, where the machinery to translate proteins 370 00:20:17,160 --> 00:20:17,660 is. 371 00:20:17,660 --> 00:20:19,410 Because you want to remember the ribosomes 372 00:20:19,410 --> 00:20:22,060 that we're going to use on Friday aren't in the nucleus. 373 00:20:22,060 --> 00:20:23,790 They're in the cytoplasm. 374 00:20:23,790 --> 00:20:26,540 So there is a variety of events. 375 00:20:26,540 --> 00:20:28,655 There is what's known as 5 prime capping. 376 00:20:35,858 --> 00:20:39,500 So that is going to be a change. 377 00:20:39,500 --> 00:20:46,510 Let's just say this goes down. 378 00:20:46,510 --> 00:20:48,000 This is the base. 379 00:20:48,000 --> 00:20:54,600 Remember 1, 2, 3 prime, 4 prime, 5 prime. 
380 00:20:54,600 --> 00:20:59,340 It's something that's happening to this end of the messenger 381 00:20:59,340 --> 00:21:01,600 RNA that stabilizes it. 382 00:21:01,600 --> 00:21:06,810 So 5 prime capping is important for one end. 383 00:21:06,810 --> 00:21:15,830 And then at the other end there is polyadenylation 384 00:21:15,830 --> 00:21:17,900 of the 3 prime end. 385 00:21:17,900 --> 00:21:20,810 OK, so let's just look at these one at a time. 386 00:21:20,810 --> 00:21:22,770 And let me convince you that they 387 00:21:22,770 --> 00:21:26,700 are important changes in the transcript that will preserve 388 00:21:26,700 --> 00:21:29,250 its identity, and in fact, give it a little bit more 389 00:21:29,250 --> 00:21:30,300 information. 390 00:21:30,300 --> 00:21:33,810 Because indeed, the 5 prime capping modification actually 391 00:21:33,810 --> 00:21:38,730 is a signal later on, when the transcript leaves the nucleus 392 00:21:38,730 --> 00:21:40,470 for protein translation. 393 00:21:40,470 --> 00:21:42,900 But in general, both of these changes 394 00:21:42,900 --> 00:21:48,060 mechanically protect the ends of the part of the gene 395 00:21:48,060 --> 00:21:50,460 that you're going to want to translate. 396 00:21:50,460 --> 00:21:52,560 They basically leave that piece of gene 397 00:21:52,560 --> 00:21:55,170 in the middle, where it's not going to be nibbled up, 398 00:21:55,170 --> 00:21:56,880 it's not going to be degraded. 399 00:21:56,880 --> 00:21:59,520 Because the biggest threat to the messenger 400 00:21:59,520 --> 00:22:03,638 RNA are things known as exonucleases. 401 00:22:07,770 --> 00:22:12,840 Exonucleases chew down nucleic acids from the two ends. 402 00:22:12,840 --> 00:22:13,680 All right. 403 00:22:13,680 --> 00:22:15,690 So let's look first, and it's kind 404 00:22:15,690 --> 00:22:20,025 of wild and crazy chemistry, at the 5 prime capping. 405 00:22:23,710 --> 00:22:25,730 And then we'll discuss the other two. 
406 00:22:25,730 --> 00:22:28,300 And then I will finish these types of changes 407 00:22:28,300 --> 00:22:31,780 with splicing, which is really cool and extremely important. 408 00:22:31,780 --> 00:22:34,690 But the transcript, the pre-messenger RNA, 409 00:22:34,690 --> 00:22:36,970 has to go through all of these steps 410 00:22:36,970 --> 00:22:39,670 before it's ready for nuclear export. 411 00:22:39,670 --> 00:22:42,490 So in 5 prime capping, you have this strand 412 00:22:42,490 --> 00:22:44,350 of pre-messenger RNA. 413 00:22:44,350 --> 00:22:45,700 Everything looks pretty happy. 414 00:22:45,700 --> 00:22:47,240 It's all in one piece. 415 00:22:47,240 --> 00:22:50,560 But the first thing that happens is three phosphates 416 00:22:50,560 --> 00:22:53,710 are added to that 5 prime OH group. 417 00:22:53,710 --> 00:22:56,080 So that's the start of this process. 418 00:22:56,080 --> 00:22:59,290 And this process actually happens 419 00:22:59,290 --> 00:23:03,160 while you're still transcribing the double-stranded DNA. 420 00:23:03,160 --> 00:23:04,510 It's actually already going on. 421 00:23:04,510 --> 00:23:08,230 As soon as that component of the newly transcribed 422 00:23:08,230 --> 00:23:10,600 pre-messenger RNA is made, things 423 00:23:10,600 --> 00:23:15,250 start happening at the 5 prime end to protect it. 424 00:23:15,250 --> 00:23:18,100 At that stage, once those three phosphates 425 00:23:18,100 --> 00:23:23,020 are put on there, a nucleobase is added backwards. 426 00:23:23,020 --> 00:23:24,700 So there's a couple of functions. 427 00:23:24,700 --> 00:23:26,770 This all looks pretty strange. 428 00:23:26,770 --> 00:23:28,840 The rest of this still looks quite good. 429 00:23:28,840 --> 00:23:30,880 It looks fairly intact. 
430 00:23:30,880 --> 00:23:34,170 But then the next thing that happens 431 00:23:34,170 --> 00:23:38,580 is that the guanine that's here is-- 432 00:23:38,580 --> 00:23:41,520 you do not have to remember this stuff, I couldn't remember it. 433 00:23:41,520 --> 00:23:44,700 I just want to show you how weird and different 434 00:23:44,700 --> 00:23:46,080 the 5 prime end looks. 435 00:23:46,080 --> 00:23:48,640 It doesn't look like a strand of RNA. 436 00:23:48,640 --> 00:23:51,210 So the guanine is methylated. 437 00:23:51,210 --> 00:23:54,270 And then a couple of the riboses, 438 00:23:54,270 --> 00:23:57,270 those sugars that have an OH usually at 2 prime, 439 00:23:57,270 --> 00:23:58,560 get methylated. 440 00:23:58,560 --> 00:24:02,610 So we've created this entire thing at the 5 prime end, 441 00:24:02,610 --> 00:24:06,060 known as the 5 prime cap that looks nothing 442 00:24:06,060 --> 00:24:08,650 like regular messenger RNA. 443 00:24:08,650 --> 00:24:11,760 And that protects that end of the messenger, 444 00:24:11,760 --> 00:24:16,040 and makes it safe from a variety of insults. 445 00:24:16,040 --> 00:24:16,920 So let's take a look. 446 00:24:16,920 --> 00:24:18,370 Why does it happen? 447 00:24:18,370 --> 00:24:21,150 So the first thing is it stops nuclease activity. 448 00:24:21,150 --> 00:24:23,500 Because the nuclease could look at it, and go, 449 00:24:23,500 --> 00:24:25,620 I don't recognize any of this happy mess. 450 00:24:25,620 --> 00:24:28,320 I'm not going to chop down this component. 451 00:24:28,320 --> 00:24:29,940 It's too foreign to me. 452 00:24:29,940 --> 00:24:31,980 So that's the primary thing. 453 00:24:31,980 --> 00:24:35,310 But then it's actually quite important for regulation 454 00:24:35,310 --> 00:24:38,170 of when that messenger needs to leave the nucleus. 455 00:24:38,170 --> 00:24:41,130 There are proteins involved in helping export 456 00:24:41,130 --> 00:24:44,860 the messenger RNA, when it's ready from the nucleus. 
457 00:24:44,860 --> 00:24:48,990 So it's an important signal or recognition element. 458 00:24:48,990 --> 00:24:52,680 It marks this thing as a messenger RNA 459 00:24:52,680 --> 00:24:55,170 that's going to be important in translation. 460 00:24:55,170 --> 00:24:57,180 It gives it an identity. 461 00:24:57,180 --> 00:25:00,180 And then finally, it can actually in the next step 462 00:25:00,180 --> 00:25:01,860 promote translation. 463 00:25:01,860 --> 00:25:03,780 So all of these things are useful. 464 00:25:03,780 --> 00:25:07,230 So we do a number of these unusual transformations, 465 00:25:07,230 --> 00:25:09,420 but they all have a reason, and they're all 466 00:25:09,420 --> 00:25:12,960 carried out in the nucleus on purpose 467 00:25:12,960 --> 00:25:16,830 to protect the 5 prime end of the messenger, 468 00:25:16,830 --> 00:25:19,860 and actually make it ready for its next tasks. 469 00:25:19,860 --> 00:25:21,030 All right? 470 00:25:21,030 --> 00:25:24,150 The next thing that happens is what's 471 00:25:24,150 --> 00:25:26,260 known as polyadenylation. 472 00:25:26,260 --> 00:25:27,120 So here's it. 473 00:25:27,120 --> 00:25:29,220 This would be the 5 prime cap. 474 00:25:29,220 --> 00:25:31,410 In the middle would be your gene that you're 475 00:25:31,410 --> 00:25:32,880 going to transcribe. 476 00:25:32,880 --> 00:25:36,450 But at the other end there is an enzyme that puts on a lot 477 00:25:36,450 --> 00:25:41,190 of adenine nucleotides, and basically adds to the other end 478 00:25:41,190 --> 00:25:41,985 a lot of A's. 479 00:25:45,420 --> 00:25:48,780 And when I say a lot, it can be hundreds. 480 00:25:48,780 --> 00:25:51,270 And it just promiscuously keeps on adenylating. 481 00:25:51,270 --> 00:25:53,700 And now what's this bit for? 482 00:25:53,700 --> 00:25:56,730 Once again, it protects from exonuclease activity 483 00:25:56,730 --> 00:25:59,040 from the other end of the strand.
484 00:25:59,040 --> 00:26:01,830 Because if you have exonuclease activity, 485 00:26:01,830 --> 00:26:04,620 you might start munching away at these As. 486 00:26:04,620 --> 00:26:07,680 But these aren't the important parts of the messenger, right? 487 00:26:07,680 --> 00:26:09,240 These are just add-ons. 488 00:26:09,240 --> 00:26:10,230 It's kind of a buffer. 489 00:26:10,230 --> 00:26:12,335 It's like something to do while you're waiting. 490 00:26:12,335 --> 00:26:13,710 Oh, I'm going to chew up some As. 491 00:26:13,710 --> 00:26:16,470 But you're not messing up the messenger RNA 492 00:26:16,470 --> 00:26:17,850 that you need at the end. 493 00:26:17,850 --> 00:26:20,790 So it's adding some dummy sequence 494 00:26:20,790 --> 00:26:24,360 that will be handy there. 495 00:26:24,360 --> 00:26:27,120 It contributes to stability. 496 00:26:27,120 --> 00:26:30,040 The tail is shortened over time, but it's non-coding, 497 00:26:30,040 --> 00:26:31,250 so that's OK. 498 00:26:31,250 --> 00:26:34,500 And actually though, when the tail is short enough, 499 00:26:34,500 --> 00:26:38,220 the polyadenine tail kind of acts as a bit of a timer. 500 00:26:38,220 --> 00:26:39,760 Because once the tail gets short, 501 00:26:39,760 --> 00:26:42,570 you start chewing into the transcript, 502 00:26:42,570 --> 00:26:45,930 and you basically end up with a degradation of the transcript. 503 00:26:45,930 --> 00:26:49,950 But it gives you time in the cytoplasm for the gene 504 00:26:49,950 --> 00:26:51,968 to be translated into protein. 505 00:26:51,968 --> 00:26:53,010 Does that all make sense? 506 00:26:53,010 --> 00:26:56,610 So it's basically like an egg timer, just watching, watching; 507 00:26:56,610 --> 00:26:57,660 back, back, back. 508 00:26:57,660 --> 00:27:00,960 OK, this transcript has been out here long enough. 509 00:27:00,960 --> 00:27:02,950 We've made all the protein we need. 510 00:27:02,950 --> 00:27:05,650 Now we're going to chew up the coding part. 
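The egg-timer idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the sequence is made up, the helper names `degrade_3prime` and `coding_intact` are hypothetical, and the exonuclease is assumed to remove one nucleotide per step from the 3 prime end only. Real mRNA decay involves more machinery than this.

```python
def degrade_3prime(transcript: str, steps: int) -> str:
    """Remove `steps` nucleotides from the 3' end (toy exonuclease)."""
    return transcript[: max(0, len(transcript) - steps)]


def coding_intact(coding: str, transcript: str) -> bool:
    """True while the whole coding region is still present at the 5' side."""
    return transcript.startswith(coding)


coding = "AUGGCUUAA"          # made-up coding region, for illustration only
mrna = coding + "A" * 20      # poly-A tail of 20 (real tails can be hundreds)

after_15 = degrade_3prime(mrna, 15)   # only tail has been chewed
after_25 = degrade_3prime(mrna, 25)   # tail is gone, coding part is hit

print(coding_intact(coding, after_15))  # True
print(coding_intact(coding, after_25))  # False
```

A longer tail simply buys more steps before the coding region is touched, which is the timer behavior described above.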
511 00:27:05,650 --> 00:27:09,390 And then finally it's actually a good marker 512 00:27:09,390 --> 00:27:11,490 again for leaving the nucleus. 513 00:27:11,490 --> 00:27:15,330 And then finally it's kind of a cool tool for technology, 514 00:27:15,330 --> 00:27:26,420 because there's lots of interest in characterizing what's 515 00:27:26,420 --> 00:27:27,990 known as the transcriptome. 516 00:27:35,410 --> 00:27:39,010 OK, it's all well and good to characterize a genome. 517 00:27:39,010 --> 00:27:43,060 But it's a lot of work, 3.2 billion base pairs. 518 00:27:43,060 --> 00:27:46,210 And only a part of it, a tiny part of it, 519 00:27:46,210 --> 00:27:49,820 are parts of the genes that encode the proteins. 520 00:27:49,820 --> 00:27:51,610 So what you'd really want to know, 521 00:27:51,610 --> 00:27:54,070 if you want to analyze the genes that 522 00:27:54,070 --> 00:27:57,450 are going to become the proteins and maybe look for defects, 523 00:27:57,450 --> 00:28:00,770 is look at the things that are going to be transcribed. 524 00:28:00,770 --> 00:28:03,340 So you can use the fact that there 525 00:28:03,340 --> 00:28:07,630 is this poly-A tail on the transcripts that 526 00:28:07,630 --> 00:28:11,770 are going to leave the nucleus with something that 527 00:28:11,770 --> 00:28:13,810 will pull them out of the mix. 528 00:28:13,810 --> 00:28:14,860 What would I use? 529 00:28:14,860 --> 00:28:21,370 Let's say I've got a resin bead, polystyrene 530 00:28:21,370 --> 00:28:23,920 or some favorable polymer. 531 00:28:23,920 --> 00:28:29,350 And I can attach covalently nucleic acids to that bead. 532 00:28:29,350 --> 00:28:33,580 What would I add there to fish this lot out? 533 00:28:33,580 --> 00:28:34,910 Yeah, up here? 534 00:28:34,910 --> 00:28:35,422 Yeah. 535 00:28:35,422 --> 00:28:36,538 AUDIENCE: [INAUDIBLE] 536 00:28:36,538 --> 00:28:38,080 BARBARA IMPERIALI: Yeah, a lot of Ts. 537 00:28:38,080 --> 00:28:39,180 So I'm going to make this. 
538 00:28:39,180 --> 00:28:44,630 I'm going to put on a bunch of Ts, and I can do this a lot. 539 00:28:44,630 --> 00:28:46,720 This is done very, very commonly. 540 00:28:46,720 --> 00:28:49,090 And then I'm just going to fish out 541 00:28:49,090 --> 00:28:54,460 everything that is part of the transcriptome, not the genome. 542 00:28:54,460 --> 00:28:58,240 So I've got a much smaller job then to find out errors 543 00:28:58,240 --> 00:29:00,700 in the genes that encode the proteins, 544 00:29:00,700 --> 00:29:03,950 than if I was going to start with the entire genome. 545 00:29:03,950 --> 00:29:06,250 So pretty cool tool, and later on we're 546 00:29:06,250 --> 00:29:08,330 going to see how this can be used. 547 00:29:08,330 --> 00:29:11,290 It's actually used in concert with an interesting enzyme 548 00:29:11,290 --> 00:29:14,800 that comes from viruses called reverse transcriptase. 549 00:29:14,800 --> 00:29:16,480 But that's a story for a later day. 550 00:29:16,480 --> 00:29:18,370 But it is a cool story. 551 00:29:18,370 --> 00:29:24,430 Now the last thing that we do in the nucleus 552 00:29:24,430 --> 00:29:26,740 is arguably the most important. 553 00:29:26,740 --> 00:29:27,715 And that is splicing. 554 00:29:32,440 --> 00:29:37,490 OK, all right. 555 00:29:37,490 --> 00:29:41,250 Once again, this is a picture with a lot of moving parts. 556 00:29:41,250 --> 00:29:44,360 But I want to convey to you the point of splicing, 557 00:29:44,360 --> 00:29:47,750 as opposed to you knowing all the little details. 558 00:29:47,750 --> 00:29:53,420 It was noticed for a long time that the messenger didn't 559 00:29:53,420 --> 00:29:55,970 always correspond to the original transcript. 560 00:29:55,970 --> 00:29:59,020 And this is the case in eukaryotes, not in bacteria.
561 00:29:59,020 --> 00:30:01,340 But that there was quite a lot of processing 562 00:30:01,340 --> 00:30:06,560 done, not just to cap the ends, but by cutting out 563 00:30:06,560 --> 00:30:18,820 chunks of the transcript, so removal of segments 564 00:30:18,820 --> 00:30:19,780 of the transcript. 565 00:30:23,840 --> 00:30:26,060 And a lot of the seminal work was 566 00:30:26,060 --> 00:30:29,120 done by Phillip Sharp, who is a member of our faculty 567 00:30:29,120 --> 00:30:30,380 in the Biology Department. 568 00:30:30,380 --> 00:30:32,030 So we're very proud of this. 569 00:30:32,030 --> 00:30:37,680 This was actually the topic of a Nobel Prize in the '90s. 570 00:30:37,680 --> 00:30:43,010 And so what was noticed is that if you had a gene, and let's 571 00:30:43,010 --> 00:30:53,220 just put it here, 5 prime to 3 prime. 572 00:30:53,220 --> 00:30:54,720 And I'm going to name these. 573 00:30:54,720 --> 00:30:56,910 And then I'll explain to you what they are. 574 00:30:56,910 --> 00:31:00,210 These would be called exons. 575 00:31:00,210 --> 00:31:01,546 These would be called introns. 576 00:31:04,940 --> 00:31:07,730 And this is another exon. 577 00:31:07,730 --> 00:31:10,730 Outside, inside; that's the way to remember them. 578 00:31:10,730 --> 00:31:13,330 And what happens in splicing is that the introns 579 00:31:13,330 --> 00:31:14,710 are chopped out. 580 00:31:14,710 --> 00:31:17,300 OK, so your gene then ends up being, 581 00:31:17,300 --> 00:31:20,240 if this is exon 1 and exon 2, not 582 00:31:20,240 --> 00:31:22,490 represented by this entire thing, 583 00:31:22,490 --> 00:31:24,890 but eliminating that middle component. 584 00:31:38,185 --> 00:31:42,550 And remember all the time we have the cap at the 5 prime end, 585 00:31:42,550 --> 00:31:49,470 we have the poly-A tail, ad infinitum, at the 3 prime end. 586 00:31:49,470 --> 00:31:51,850 So we've just changed the middle.
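To make the exon 1, intron, exon 2 picture concrete, here is a minimal Python sketch. It assumes the exon positions are already known and handed in as (start, end) index pairs; the function name `splice` and the sequences are made up for illustration, and nothing here models how the cell actually finds the splice sites.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments in order, dropping everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


exon1 = "AUGGCC"            # hypothetical exon 1
intron = "GUAAGUUUUCAG"     # hypothetical intron, chopped out by splicing
exon2 = "GAAUAA"            # hypothetical exon 2

pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]  # assumed-known exon coordinates (0-based, end-exclusive)

mature = splice(pre_mrna, exons)
print(mature)               # AUGGCCGAAUAA
```

The cap and the poly-A tail are left out of the sketch because, as noted above, only the middle changes.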
587 00:31:51,850 --> 00:31:55,780 And it was recognized through bioinformatics analysis that 588 00:31:55,780 --> 00:31:59,830 basically noticed the pieces that were chopped out 589 00:31:59,830 --> 00:32:02,430 and the pieces that ended up being joined. 590 00:32:02,430 --> 00:32:05,440 And RNA splicing occurs as a sequence 591 00:32:05,440 --> 00:32:09,940 of events that ends up with the transcript being spliced 592 00:32:09,940 --> 00:32:13,000 together through a series of rearrangements 593 00:32:13,000 --> 00:32:18,320 that occur on the structure of the messenger RNA. 594 00:32:18,320 --> 00:32:20,950 So there is an internal rearrangement. 595 00:32:20,950 --> 00:32:22,870 A new phosphodiester bond is made. 596 00:32:22,870 --> 00:32:25,030 And then there's another rearrangement, 597 00:32:25,030 --> 00:32:27,790 where there's a new bond made between one 598 00:32:27,790 --> 00:32:34,590 end of the pre-messenger RNA, and the other end 599 00:32:34,590 --> 00:32:38,800 of the pre-messenger RNA, to give you 600 00:32:38,800 --> 00:32:43,340 the exons rejoined, ready to be read in translation. 601 00:32:43,340 --> 00:32:45,020 Now why is this so important? 602 00:32:45,020 --> 00:32:47,350 So I want to go here to a few numbers 603 00:32:47,350 --> 00:32:49,240 that I think are very pertinent, which 604 00:32:49,240 --> 00:32:50,830 I had to grill my colleagues for, 605 00:32:50,830 --> 00:32:52,430 because I didn't know all the numbers. 606 00:32:52,430 --> 00:32:55,450 So when the genome was sequenced, 607 00:32:55,450 --> 00:32:58,270 we were a bit stunned, because the number 608 00:32:58,270 --> 00:33:00,760 of coding things that are genes that 609 00:33:00,760 --> 00:33:02,740 are going to be turned into proteins 610 00:33:02,740 --> 00:33:04,990 was much smaller than we anticipated. 611 00:33:04,990 --> 00:33:08,830 For man, it's currently at about 20,000 genes. 612 00:33:08,830 --> 00:33:14,620 That's 20,000 mature transcripts that could make proteins.
613 00:33:14,620 --> 00:33:15,220 Fly? 614 00:33:15,220 --> 00:33:19,400 Not much smaller, arguably they're pretty smart, 16,000. 615 00:33:19,400 --> 00:33:22,300 Yeast 6,000, a bacterium at 4,000. 616 00:33:22,300 --> 00:33:25,600 There doesn't seem to be enough of a difference. 617 00:33:25,600 --> 00:33:28,900 But a huge amount of diversity is 618 00:33:28,900 --> 00:33:33,490 introduced into the transcripts by different splicing events. 619 00:33:33,490 --> 00:33:35,170 Because if you make one transcript, 620 00:33:35,170 --> 00:33:37,970 that would count as one gene. 621 00:33:37,970 --> 00:33:41,050 But if there are a lot of opportunities for different 622 00:33:41,050 --> 00:33:44,530 splicing activity, you can piece together 623 00:33:44,530 --> 00:33:48,860 new transcripts that will encode proteins differently. 624 00:33:48,860 --> 00:33:52,720 And I think this next slide will show you 625 00:33:52,720 --> 00:33:57,270 an example of a transcript that has several different introns 626 00:33:57,270 --> 00:33:58,630 and exons. 627 00:33:58,630 --> 00:34:00,940 So you can see here across, there's 628 00:34:00,940 --> 00:34:06,100 a blue, green, red, another blue, and an orange exon. 629 00:34:06,100 --> 00:34:09,280 But depending on which pieces are spliced out, 630 00:34:09,280 --> 00:34:11,500 we'll have different proteins end up. 631 00:34:11,500 --> 00:34:14,409 So it could be a way, for example, 632 00:34:14,409 --> 00:34:17,800 to think of a practical use, to have a protein that is either 633 00:34:17,800 --> 00:34:21,100 secreted as a soluble protein, or left 634 00:34:21,100 --> 00:34:24,620 in the membrane with a membrane association domain, 635 00:34:24,620 --> 00:34:28,360 or left in the cytoplasm because it has no way to be secreted.
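As a rough illustration of how one gene can give several transcripts, the Python sketch below counts the combinations you could get from the five colored exons on the slide. It rests on a loud simplification that is an assumption, not something from the lecture: the first and last exon are always kept and each internal exon is independently kept or skipped. Real cells regulate this, and not every combination is actually made. The function name `isoforms` and the labels `blue1`/`blue2` are hypothetical.

```python
from itertools import combinations


def isoforms(exons: list[str]) -> list[list[str]]:
    """All exon orderings that keep the first and last exon and any
    subset of the internal exons, preserving their original order."""
    first, *internal, last = exons
    result = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            result.append([first, *kept, last])
    return result


exons = ["blue1", "green", "red", "blue2", "orange"]
all_isoforms = isoforms(exons)

print(len(all_isoforms))   # 8, i.e. 2**3 choices for the three internal exons
for iso in all_isoforms:
    print("-".join(iso))
```

Even in this toy version, one gene with three skippable exons gives eight possible messages, which is the sense in which the 20,000 can become a much bigger number.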
636 00:34:28,360 --> 00:34:30,580 So it could be a way to make three proteins that 637 00:34:30,580 --> 00:34:33,280 are in completely different places of a cell, 638 00:34:33,280 --> 00:34:35,260 or have different functions. 639 00:34:35,260 --> 00:34:38,630 It's also very important in tissues, 640 00:34:38,630 --> 00:34:41,949 because we splice transcripts differently in muscle, 641 00:34:41,949 --> 00:34:45,190 or liver, or heart to achieve different outcomes 642 00:34:45,190 --> 00:34:46,989 in the proteins that we make. 643 00:34:46,989 --> 00:34:50,620 So basically, it's a source of huge diversity, 644 00:34:50,620 --> 00:34:53,020 and it gives us the genetic diversity 645 00:34:53,020 --> 00:34:56,500 that means this 20,000 can be a much bigger number. 646 00:34:56,500 --> 00:34:58,000 And then with the post-translational 647 00:34:58,000 --> 00:35:02,560 modifications I'll talk about when we come back later on, 648 00:35:02,560 --> 00:35:07,590 you'll see we've got lots of ways to diversify that 20,000. 649 00:35:07,590 --> 00:35:10,490 I mean, these aren't literal numbers. 650 00:35:10,490 --> 00:35:11,470 They're quite varied. 651 00:35:11,470 --> 00:35:15,560 But what you want to remember is E. coli doesn't do splicing. 652 00:35:15,560 --> 00:35:17,320 There's no opportunity there. 653 00:35:17,320 --> 00:35:19,540 It's much more limited in yeast, so is 654 00:35:19,540 --> 00:35:21,470 the post-translational modification. 655 00:35:21,470 --> 00:35:24,010 So these numbers settle out quite differently 656 00:35:24,010 --> 00:35:26,200 from what they look like by looking directly 657 00:35:26,200 --> 00:35:28,150 at those numbers. 658 00:35:28,150 --> 00:35:29,990 And I wanted to give you an example. 659 00:35:29,990 --> 00:35:31,530 You don't have this in your slides, 660 00:35:31,530 --> 00:35:33,280 but I was thinking about it the other day.
661 00:35:33,280 --> 00:35:36,970 A colleague of mine, Professor Pentelute in chemistry, 662 00:35:36,970 --> 00:35:39,970 works on trying to reprogram genes 663 00:35:39,970 --> 00:35:44,590 to overcome a disease known as Duchenne muscular dystrophy. 664 00:35:44,590 --> 00:35:47,020 It's another genetic disorder. 665 00:35:47,020 --> 00:35:48,550 It's X-linked. 666 00:35:48,550 --> 00:35:52,180 So it's much more serious in males than in females. 667 00:35:52,180 --> 00:35:54,550 Because if you only have one bad copy of the gene, 668 00:35:54,550 --> 00:35:58,300 you can do OK with this disease. 669 00:35:58,300 --> 00:36:01,180 Whereas in the male, you'd only have 670 00:36:01,180 --> 00:36:04,880 the component of the gene that's on the X chromosome. 671 00:36:04,880 --> 00:36:10,130 And it's a defect in RNA splicing, so directly there. 672 00:36:10,130 --> 00:36:14,230 So this is a biopsy of muscle where 673 00:36:14,230 --> 00:36:17,950 red would be good muscle cells, whereas the white would 674 00:36:17,950 --> 00:36:22,570 be fat cells. 675 00:36:22,570 --> 00:36:25,563 And they actually weaken the integrity of the muscle, 676 00:36:25,563 --> 00:36:26,980 because the muscle doesn't develop 677 00:36:26,980 --> 00:36:30,890 to have all the well-defined muscle cells that 678 00:36:30,890 --> 00:36:33,620 are important for muscle tensile strength, 679 00:36:33,620 --> 00:36:35,900 and contractility, and everything else that muscles 680 00:36:35,900 --> 00:36:36,500 do. 681 00:36:36,500 --> 00:36:40,250 And it's all related to a protein known as dystrophin, 682 00:36:40,250 --> 00:36:41,840 which is a huge protein. 683 00:36:41,840 --> 00:36:44,000 And it's a critical structural protein 684 00:36:44,000 --> 00:36:47,570 that's important to maintain the cellular membranes 685 00:36:47,570 --> 00:36:48,810 within the muscle cells. 
686 00:36:48,810 --> 00:36:53,000 So if dystrophin is no good, the cell membrane integrity 687 00:36:53,000 --> 00:36:53,900 is no good. 688 00:36:53,900 --> 00:36:55,790 And you just end up with losing the muscle 689 00:36:55,790 --> 00:37:00,320 cells, at the cost of replacing them with fat cells in the muscle. 690 00:37:00,320 --> 00:37:03,080 And here's a really amazing number. 691 00:37:03,080 --> 00:37:07,730 The gene that encodes dystrophin has 79 exons. 692 00:37:07,730 --> 00:37:10,520 So you can picture the opportunities for things 693 00:37:10,520 --> 00:37:14,120 to go wrong, and it's actually a defect in the place 694 00:37:14,120 --> 00:37:15,710 where the splicing happens. 695 00:37:15,710 --> 00:37:18,330 So a splicing event cannot happen. 696 00:37:18,330 --> 00:37:22,250 And so the protein at the end of the day is not good. 697 00:37:22,250 --> 00:37:25,130 So there's a lot of efforts going on, 698 00:37:25,130 --> 00:37:30,680 antisense efforts, and even other gene therapy efforts, 699 00:37:30,680 --> 00:37:33,980 and also nowadays there's obviously a large focus 700 00:37:33,980 --> 00:37:35,900 on CRISPR-based gene editing. 701 00:37:35,900 --> 00:37:39,170 But remember this is a serious debilitating 702 00:37:39,170 --> 00:37:41,480 disease of all muscle tissue. 703 00:37:41,480 --> 00:37:44,528 It starts to be noticed when the babies are toddlers, 704 00:37:44,528 --> 00:37:46,070 because that's really when they start 705 00:37:46,070 --> 00:37:48,210 to engage muscle strength. 706 00:37:48,210 --> 00:37:51,530 So around the age of four it's actually noticed. 707 00:37:51,530 --> 00:37:55,010 And then the life expectancy is in the 30 708 00:37:55,010 --> 00:37:57,500 to 40-years-old range. 709 00:37:57,500 --> 00:37:59,870 But it's also a terrible lifestyle, 710 00:37:59,870 --> 00:38:02,510 because the lungs don't work. 711 00:38:02,510 --> 00:38:05,030 So many things rely on muscle strength. 712 00:38:05,030 --> 00:38:06,380 OK, good.
713 00:38:06,380 --> 00:38:09,260 All right, so I believe that's the end of that. 714 00:38:09,260 --> 00:38:12,770 But I want to give you an introduction to translation, 715 00:38:12,770 --> 00:38:15,860 so you'll see what we have in store for Friday. 716 00:38:15,860 --> 00:38:18,230 And just get you back to this picture, 717 00:38:18,230 --> 00:38:19,710 we've done everything we need. 718 00:38:19,710 --> 00:38:21,800 We've done all the processing. 719 00:38:21,800 --> 00:38:25,940 We're here, and then the mature messenger RNA 720 00:38:25,940 --> 00:38:30,080 can finally leave the nucleus to be ready for translation. 721 00:38:30,080 --> 00:38:32,300 And the cap-- there is a complex that 722 00:38:32,300 --> 00:38:38,150 binds the 5 prime cap that's tasked 723 00:38:38,150 --> 00:38:45,020 with helping export that transcript 724 00:38:45,020 --> 00:38:48,500 through the nuclear pore, which is a fairly large opening, 725 00:38:48,500 --> 00:38:51,070 outside into the cytoplasm. 726 00:38:51,070 --> 00:38:52,820 And I want to show you one other thing 727 00:38:52,820 --> 00:38:56,900 and I hope I can get this reliably, 728 00:38:56,900 --> 00:38:59,600 because with this I'm going to actually show you 729 00:38:59,600 --> 00:39:01,370 some of the moving parts. 730 00:39:01,370 --> 00:39:03,090 There's a little thing on the sidebar 731 00:39:03,090 --> 00:39:07,300 now, called a short translation. 732 00:39:07,300 --> 00:39:11,580 And so what you're seeing here, the black line 733 00:39:11,580 --> 00:39:13,270 is the messenger RNA. 734 00:39:13,270 --> 00:39:15,300 OK, so that's the thing that's finally out 735 00:39:15,300 --> 00:39:17,580 there in the cytoplasm ready. 736 00:39:17,580 --> 00:39:20,460 The pale green and yellow are the two components 737 00:39:20,460 --> 00:39:23,700 of the ribosome, which is a soluble organelle that 738 00:39:23,700 --> 00:39:28,290 assembles on the messenger RNA based on a few cues.
739 00:39:28,290 --> 00:39:33,160 The dark blue components are transfer RNAs 740 00:39:33,160 --> 00:39:35,760 that bring in new amino acids. 741 00:39:35,760 --> 00:39:38,220 And what you see threading out here 742 00:39:38,220 --> 00:39:42,540 is a new protein strand that is being translated. 743 00:39:42,540 --> 00:39:46,320 This particular translation is to make a protein that becomes 744 00:39:46,320 --> 00:39:47,910 membrane bound and secreted. 745 00:39:47,910 --> 00:39:50,440 So I'm not going to go further than that. 746 00:39:50,440 --> 00:39:53,010 But there's a lot of those light blue proteins 747 00:39:53,010 --> 00:39:56,070 that are actually helping escort the transfer 748 00:39:56,070 --> 00:40:00,420 RNAs to where they're needed to continue building the proteins. 749 00:40:00,420 --> 00:40:02,700 So we've seen the messenger. 750 00:40:02,700 --> 00:40:04,890 We'll learn more about the ribosome, 751 00:40:04,890 --> 00:40:08,220 and will learn about the transfer RNAs. 752 00:40:08,220 --> 00:40:12,720 And you can also watch this to your heart's content. 753 00:40:12,720 --> 00:40:15,600 I just find it just a useful simple cartoon 754 00:40:15,600 --> 00:40:18,840 that tells us a lot. 755 00:40:18,840 --> 00:40:21,720 So what I really want to encourage you to do 756 00:40:21,720 --> 00:40:26,040 is read this section 14.5, because the next parts will 757 00:40:26,040 --> 00:40:28,240 make a great deal more sense. 758 00:40:28,240 --> 00:40:33,640 So what I want to do though now is first of all, 759 00:40:33,640 --> 00:40:37,900 start with describing the players in translation, 760 00:40:37,900 --> 00:40:40,380 and just the way we did for replication, 761 00:40:40,380 --> 00:40:43,630 we're going to systematically ascribe the importance 762 00:40:43,630 --> 00:40:45,430 to each of these players. 763 00:40:45,430 --> 00:40:48,220 So what you see, there are molecular players 764 00:40:48,220 --> 00:40:49,750 and there are key steps. 
765 00:40:49,750 --> 00:40:52,060 So the molecular players are shown here. 766 00:40:52,060 --> 00:40:56,010 So far, we've focused very strongly on the messenger RNA. 767 00:40:56,010 --> 00:40:57,010 You know all about that. 768 00:40:57,010 --> 00:40:58,150 You know where it's from. 769 00:40:58,150 --> 00:41:00,760 You know it's stable in the cytoplasm for a while. 770 00:41:00,760 --> 00:41:03,910 You know it's got signals that tell the complexes where 771 00:41:03,910 --> 00:41:05,800 to go down. 772 00:41:05,800 --> 00:41:09,190 Then we need transfer RNAs and amino acids, 773 00:41:09,190 --> 00:41:11,020 and we need the ribosomes. 774 00:41:11,020 --> 00:41:14,890 And ribosomes are made of a composite of nucleic acid 775 00:41:14,890 --> 00:41:17,350 and protein wrapped together. 776 00:41:17,350 --> 00:41:19,660 There's quite a big difference between prokaryotic 777 00:41:19,660 --> 00:41:21,700 and eukaryotic ribosomes, and I'll 778 00:41:21,700 --> 00:41:23,560 mention that when we get to it. 779 00:41:23,560 --> 00:41:26,590 But I want to focus for a few moments, first of all, 780 00:41:26,590 --> 00:41:32,050 on the landmarks that got us to finally in 2009 the structure 781 00:41:32,050 --> 00:41:32,920 of the ribosome. 782 00:41:32,920 --> 00:41:35,500 It's a cool development for everyone. 783 00:41:35,500 --> 00:41:38,860 But also where we came from in the '60s, when 784 00:41:38,860 --> 00:41:42,430 Crick and Brenner finally cracked the genetic code, 785 00:41:42,430 --> 00:41:45,610 and found out that three bases on that messenger 786 00:41:45,610 --> 00:41:49,240 RNA will encode each new amino acid 787 00:41:49,240 --> 00:41:51,790 that gets put into your protein sequence. 788 00:41:51,790 --> 00:41:57,400 So I want to introduce you to the transfer RNAs that 789 00:41:57,400 --> 00:42:00,850 are really the most important element 790 00:42:00,850 --> 00:42:02,200 to start thinking about.
791 00:42:08,110 --> 00:42:14,350 And you can call these guys the decoders, 792 00:42:14,350 --> 00:42:17,410 because what the transfer RNA does 793 00:42:17,410 --> 00:42:22,630 is it can carry an amino acid on one end of its nucleic acid. 794 00:42:22,630 --> 00:42:25,360 But on one of the other loops within the transfer 795 00:42:25,360 --> 00:42:28,750 RNA is what's known as an anticodon that 796 00:42:28,750 --> 00:42:32,610 recognizes a codon in the messenger RNA, 797 00:42:32,610 --> 00:42:36,890 and basically prescribes what amino acids get put in. 798 00:42:36,890 --> 00:42:40,330 So it was always a wonder how you went from the nucleic acid 799 00:42:40,330 --> 00:42:42,770 world to the protein world. 800 00:42:42,770 --> 00:42:45,960 The answer is you use a nucleic acid 801 00:42:45,960 --> 00:42:48,310 that you can load with an amino acid, 802 00:42:48,310 --> 00:42:52,390 but it can also recognize the messenger RNA that codes 803 00:42:52,390 --> 00:42:54,460 for the protein sequence. 804 00:42:54,460 --> 00:42:57,550 And that's why I like to think of them as decoders. 805 00:42:57,550 --> 00:43:04,870 But what this picture of the RNA shows you, 806 00:43:04,870 --> 00:43:07,470 is the beautiful structure of RNA, 807 00:43:07,470 --> 00:43:11,050 where at one end is where an amino acid would be attached. 808 00:43:11,050 --> 00:43:14,080 There's this cool kind of cloverleaf structure. 809 00:43:14,080 --> 00:43:17,200 And then at the other end is the anticodon loop 810 00:43:17,200 --> 00:43:20,960 that will actually recognize your messenger RNA. 811 00:43:20,960 --> 00:43:22,330 So I'm going to stop now. 812 00:43:22,330 --> 00:43:25,840 But I really want to encourage you, just skim 813 00:43:25,840 --> 00:43:28,300 through that small section. 814 00:43:28,300 --> 00:43:31,120 It'll make a lot more sense on Friday. 
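The decoder idea, an anticodon on the transfer RNA recognizing a three-base codon on the messenger, can be sketched in Python. This is a toy under stated assumptions: only three transfer RNAs are modeled, pairing is strict Watson-Crick with no wobble, and the names `anticodon_for`, `TRNAS`, and `translate` are hypothetical. The codon assignments used (AUG for Met, UUU for Phe, GGC for Gly, UAA as a stop) are from the standard genetic code.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Anticodon (written 5'->3') that base-pairs with a 5'->3' codon.
    The two strands pair antiparallel, hence the reversal."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


# Toy set of "loaded" tRNAs: anticodon -> amino acid carried.
TRNAS = {
    anticodon_for("AUG"): "Met",
    anticodon_for("UUU"): "Phe",
    anticodon_for("GGC"): "Gly",
}


def translate(mrna: str) -> list[str]:
    """Read the message three bases at a time; stop when no tRNA in the
    toy set recognizes the codon (for example at a stop codon)."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = TRNAS.get(anticodon_for(mrna[i:i + 3]))
        if amino_acid is None:
            break
        peptide.append(amino_acid)
    return peptide


print(anticodon_for("AUG"))        # CAU
print(translate("AUGUUUGGCUAA"))   # ['Met', 'Phe', 'Gly']
```

The lookup table is the point of the sketch: the nucleic acid sequence never touches the amino acid directly, the transfer RNA sits in between and does the decoding.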
815 00:43:31,120 --> 00:43:34,470 I've been waiting to talk to you about protein translation 816 00:43:34,470 --> 00:43:35,890 since we started this class. 817 00:43:35,890 --> 00:43:38,780 But it'll make a lot more sense. 818 00:43:38,780 --> 00:43:42,400 And there's some super cool initiatives in chemical biology 819 00:43:42,400 --> 00:43:46,270 now, where people have been able to completely hijack 820 00:43:46,270 --> 00:43:48,820 protein translation, and not put in just 821 00:43:48,820 --> 00:43:52,000 20 regular boring amino acids, but actually 822 00:43:52,000 --> 00:43:54,770 put in all kinds of other amino acids. 823 00:43:54,770 --> 00:43:59,140 So we understand the system well enough to manipulate it. 824 00:43:59,140 --> 00:44:02,380 And this really hearkens back to the Feynman quote, 825 00:44:02,380 --> 00:44:04,780 "If you can build it, you can understand it." 826 00:44:04,780 --> 00:44:07,600 That's the level to which we understand translation 827 00:44:07,600 --> 00:44:08,500 nowadays. 828 00:44:08,500 --> 00:44:10,500 That's it for today.