1 00:00:00,500 --> 00:00:03,270 The following content is provided under a Creative 2 00:00:03,270 --> 00:00:04,630 Commons license. 3 00:00:04,630 --> 00:00:07,140 Your support will help MIT OpenCourseWare 4 00:00:07,140 --> 00:00:11,470 continue to offer high quality educational resources for free. 5 00:00:11,470 --> 00:00:14,100 To make a donation or view additional materials 6 00:00:14,100 --> 00:00:18,050 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,050 --> 00:00:19,000 at ocw.mit.edu. 8 00:00:24,317 --> 00:00:27,770 SHIVA MANDALA: And so just as an overview, today 9 00:00:27,770 --> 00:00:30,340 we're going to be talking about different techniques that 10 00:00:30,340 --> 00:00:32,949 are used to determine protein structure. 11 00:00:32,949 --> 00:00:35,240 We'll talk a little bit about the Protein Data Bank, 12 00:00:35,240 --> 00:00:36,650 or the PDB. 13 00:00:36,650 --> 00:00:40,700 And then the latter half of the recitation will be on-- 14 00:00:40,700 --> 00:00:43,220 we'll be doing a worksheet to look 15 00:00:43,220 --> 00:00:46,740 at the structure of ubiquitin and di-ubiquitin using PyMOL. 16 00:00:46,740 --> 00:00:50,670 So that's just to get you familiar with using PyMOL. 17 00:00:50,670 --> 00:00:53,840 And so for the first question I'd like to pose to you 18 00:00:53,840 --> 00:00:56,690 is why should we determine protein structure, what 19 00:00:56,690 --> 00:00:59,420 can we learn from determining protein structure? 20 00:00:59,420 --> 00:01:03,120 And so I look to you for answers, a lot of answers. 21 00:01:03,120 --> 00:01:04,760 But does anybody have any ideas? 22 00:01:08,232 --> 00:01:09,720 Yeah? 23 00:01:09,720 --> 00:01:10,712 AUDIENCE: [INAUDIBLE] 24 00:01:10,712 --> 00:01:13,324 SHIVA MANDALA: Yeah, absolutely. 25 00:01:13,324 --> 00:01:14,990 Knowing protein structures does help you 26 00:01:14,990 --> 00:01:16,700 determine enzyme mechanisms. 27 00:01:16,700 --> 00:01:18,947 Anything else? 
28 00:01:18,947 --> 00:01:20,863 AUDIENCE: Structure can indicate function. 29 00:01:20,863 --> 00:01:21,821 SHIVA MANDALA: Structure what? 30 00:01:21,821 --> 00:01:22,385 AUDIENCE: Can indicate function. 31 00:01:22,385 --> 00:01:24,051 SHIVA MANDALA: Yes, absolutely structure 32 00:01:24,051 --> 00:01:25,220 does indicate function. 33 00:01:25,220 --> 00:01:28,878 Can you be a bit more specific with respect to that? 34 00:01:28,878 --> 00:01:33,189 AUDIENCE: [INAUDIBLE] 35 00:01:33,189 --> 00:01:34,600 SHIVA MANDALA: Yeah, absolutely. 36 00:01:34,600 --> 00:01:38,060 Yeah you can determine active site of enzymes. 37 00:01:38,060 --> 00:01:40,000 Any other ideas? 38 00:01:40,000 --> 00:01:41,300 Still a lot more to go. 39 00:01:44,300 --> 00:01:46,990 What about can you learn interactions 40 00:01:46,990 --> 00:01:48,280 with other macro molecules? 41 00:01:50,810 --> 00:01:51,670 OK. 42 00:01:51,670 --> 00:01:53,380 Well, let's just go through some of them. 43 00:01:53,380 --> 00:01:55,529 So yes, structure does, indeed, determine function. 44 00:01:55,529 --> 00:01:57,820 And the idea is if you know the structure of a protein, 45 00:01:57,820 --> 00:02:00,980 you can learn a lot about its function in vivo. 46 00:02:00,980 --> 00:02:03,650 And so some of the things that you can study 47 00:02:03,650 --> 00:02:05,570 is you can study enzyme mechanisms. 48 00:02:05,570 --> 00:02:07,840 It's relatively hard to do in x-ray crystallography, 49 00:02:07,840 --> 00:02:10,449 because every time you solve a crystal structure, 50 00:02:10,449 --> 00:02:13,640 it's only one snapshot of the enzyme. 51 00:02:13,640 --> 00:02:15,490 But you can definitely do it. 52 00:02:15,490 --> 00:02:18,372 You can design drugs or substrate that bind 53 00:02:18,372 --> 00:02:20,830 to the protein if you know what the active site looks like. 
54 00:02:20,830 --> 00:02:23,650 You can design a high affinity inhibitor, 55 00:02:23,650 --> 00:02:27,160 and this is used in the pharmaceutical industry a lot. 56 00:02:27,160 --> 00:02:29,560 You can study translation and transcription, which 57 00:02:29,560 --> 00:02:32,260 is what we'll be learning about in class this week, 58 00:02:32,260 --> 00:02:34,550 and we'll be learning next week, as well. 59 00:02:34,550 --> 00:02:39,100 You can also make co-crystals of proteins with other proteins 60 00:02:39,100 --> 00:02:42,490 and nucleic acids and study interactions between macro 61 00:02:42,490 --> 00:02:43,580 molecules. 62 00:02:43,580 --> 00:02:46,170 And this is something that has been emerging. 63 00:02:46,170 --> 00:02:48,340 It makes it harder to solve a crystal structure, 64 00:02:48,340 --> 00:02:54,139 but x-ray crystallography is a very powerful tool for this. 65 00:02:54,139 --> 00:02:55,930 You can also study immune system functions. 66 00:02:55,930 --> 00:02:57,971 So this is more on the biological side of things. 67 00:02:57,971 --> 00:03:00,250 You can study host-pathogen receptors 68 00:03:00,250 --> 00:03:01,420 and their interactions. 69 00:03:01,420 --> 00:03:02,320 And many, many more. 70 00:03:02,320 --> 00:03:04,750 Really there's no reason why you shouldn't 71 00:03:04,750 --> 00:03:07,770 have a structure for whatever protein you're studying. 72 00:03:07,770 --> 00:03:12,310 And so the key idea is that if you know the protein structure, 73 00:03:12,310 --> 00:03:14,950 that allows you to carry out biochemical studies. 74 00:03:14,950 --> 00:03:17,590 But then also if you determine the structure, 75 00:03:17,590 --> 00:03:19,157 that can help rationalize results 76 00:03:19,157 --> 00:03:20,740 that you get from biochemical studies. 77 00:03:20,740 --> 00:03:22,810 So it goes both ways. 
78 00:03:22,810 --> 00:03:24,760 And the final point that is perhaps not 79 00:03:24,760 --> 00:03:26,343 emphasized that much is that structure 80 00:03:26,343 --> 00:03:27,710 is a result of sequence. 81 00:03:27,710 --> 00:03:29,590 And ideally, what we would like to know 82 00:03:29,590 --> 00:03:33,640 is if we know the primary amino acid sequence of a polypeptide, 83 00:03:33,640 --> 00:03:36,250 we'd like to be able to predict its complete three 84 00:03:36,250 --> 00:03:38,420 dimensional fold. 85 00:03:38,420 --> 00:03:40,825 And that's sort of the idea behind protein folding. 86 00:03:40,825 --> 00:03:43,927 But we're not quite at that stage yet computationally. 87 00:03:43,927 --> 00:03:45,760 And so that's why we need experimental data. 88 00:03:45,760 --> 00:03:48,040 But that's something we're moving forward, 89 00:03:48,040 --> 00:03:51,270 moving towards as a science. 90 00:03:51,270 --> 00:03:52,840 And so just to go over, I'm going 91 00:03:52,840 --> 00:03:55,180 to cover the three main techniques in protein structure 92 00:03:55,180 --> 00:03:56,510 determination. 93 00:03:56,510 --> 00:03:59,420 And so the most common one is X-ray diffraction. 94 00:03:59,420 --> 00:04:02,710 And I'll go over some of the details of X-ray later. 95 00:04:02,710 --> 00:04:05,975 I'll be focusing on X-ray diffraction in this talk. 96 00:04:05,975 --> 00:04:09,490 Some of the selling points of X-ray diffraction are that you 97 00:04:09,490 --> 00:04:11,500 can study a protein of any size-- 98 00:04:11,500 --> 00:04:12,940 small proteins, large proteins. 99 00:04:12,940 --> 00:04:15,520 You can study complexes of proteins. 100 00:04:15,520 --> 00:04:18,010 There is a need to crystallize your sample, which 101 00:04:18,010 --> 00:04:18,950 makes it challenging. 102 00:04:18,950 --> 00:04:20,740 It's very hard to crystallize proteins. 103 00:04:20,740 --> 00:04:23,717 In your body, no protein is crystallized. 
104 00:04:23,717 --> 00:04:26,050 So you're trying to make proteins do something that they 105 00:04:26,050 --> 00:04:27,840 don't really like doing. 106 00:04:27,840 --> 00:04:31,990 You can obtain a high resolution structure. 107 00:04:31,990 --> 00:04:34,910 So a resolution of two Angstroms or even less 108 00:04:34,910 --> 00:04:38,530 is pretty common for good protein X-ray structures. 109 00:04:38,530 --> 00:04:41,340 But then also it's difficult to observe dynamics. 110 00:04:41,340 --> 00:04:43,510 So an X-ray structure is just a snapshot 111 00:04:43,510 --> 00:04:46,220 of the protein at a specific time. 112 00:04:46,220 --> 00:04:48,700 And so you need often a series of X-ray structures 113 00:04:48,700 --> 00:04:50,830 to really learn something about the mechanism 114 00:04:50,830 --> 00:04:52,690 or the dynamics of these proteins. 115 00:04:55,125 --> 00:04:56,500 The second most popular technique 116 00:04:56,500 --> 00:04:59,700 is nuclear magnetic resonance, or NMR. 117 00:04:59,700 --> 00:05:02,630 NMR is typically used to study small proteins. 118 00:05:02,630 --> 00:05:05,380 And the reason for this is that if you look at-- 119 00:05:05,380 --> 00:05:07,720 so this is a 2D NMR spectra. 120 00:05:07,720 --> 00:05:09,610 If you look at the 2D NMR spectra, 121 00:05:09,610 --> 00:05:11,230 there's a lot of peaks, right? 122 00:05:11,230 --> 00:05:14,260 And the more residues you have, the more amino acids you have, 123 00:05:14,260 --> 00:05:16,770 the more peaks you have and it gets very crowded. 124 00:05:16,770 --> 00:05:19,210 And so that's the limiting factor 125 00:05:19,210 --> 00:05:22,290 with using NMR to study large proteins 126 00:05:22,290 --> 00:05:26,225 is that you can't resolve all the chemical shifts. 127 00:05:26,225 --> 00:05:27,850 The plus point is that there is no need 128 00:05:27,850 --> 00:05:29,260 to crystallize your protein. 129 00:05:29,260 --> 00:05:31,940 You can study it in solution state or solid state. 
130 00:05:31,940 --> 00:05:36,730 Solid state NMR is typically used for membrane proteins. 131 00:05:36,730 --> 00:05:40,330 You can use solution NMR to study any soluble proteins. 132 00:05:40,330 --> 00:05:42,460 You do need isotopically labeled samples. 133 00:05:42,460 --> 00:05:46,160 So these are 13C or 15N enriched samples, 134 00:05:46,160 --> 00:05:47,980 which is very hard and expensive to do. 135 00:05:47,980 --> 00:05:49,990 So that's one of the drawbacks with NMR 136 00:05:49,990 --> 00:05:52,660 is that it's a relatively expensive technique while X-ray 137 00:05:52,660 --> 00:05:55,280 is more accessible. 138 00:05:55,280 --> 00:05:58,780 You can obtain a high resolution picture with NMR, as well, 139 00:05:58,780 --> 00:06:02,180 but it often requires more work than X-ray crystallography 140 00:06:02,180 --> 00:06:05,800 in that you need to do about five NMR experiments. 141 00:06:05,800 --> 00:06:09,190 That can sometimes take months to determine high resolution 142 00:06:09,190 --> 00:06:10,000 structure. 143 00:06:10,000 --> 00:06:12,070 X-ray is more accessible. 144 00:06:12,070 --> 00:06:14,590 But really, the big upshot of NMR 145 00:06:14,590 --> 00:06:17,030 is that you can observe dynamics within proteins. 146 00:06:17,030 --> 00:06:18,960 So you can really see-- 147 00:06:18,960 --> 00:06:20,930 proteins are living, breathing machines. 148 00:06:20,930 --> 00:06:23,410 And you can see that with NMR better than you 149 00:06:23,410 --> 00:06:26,220 can with any other technique. 150 00:06:26,220 --> 00:06:29,170 Another quick point which I don't have written up here 151 00:06:29,170 --> 00:06:31,900 is that NMR is sensitive to protons. 152 00:06:31,900 --> 00:06:34,120 And you can study hydrogens with NMR. 153 00:06:34,120 --> 00:06:37,270 You cannot study hydrogens with X-ray diffraction, 154 00:06:37,270 --> 00:06:40,825 for reasons that I'll come back to later in the talk. 
155 00:06:40,825 --> 00:06:46,160 AUDIENCE: Can you explain how exactly NMR observes dynamics? 156 00:06:46,160 --> 00:06:49,720 SHIVA MANDALA: Yeah, so there is a series of different-- 157 00:06:49,720 --> 00:06:52,480 I mean, I don't know how much detail you guys know about NMR. 158 00:06:52,480 --> 00:06:55,140 But basically, you can study the relaxation of nuclei. 159 00:06:55,140 --> 00:06:56,790 That's often one that's used. 160 00:06:56,790 --> 00:06:59,830 So you study T1 and T2 relaxation of nuclei. 161 00:06:59,830 --> 00:07:04,440 And more mobile residues and more mobile atoms relax faster. 162 00:07:04,440 --> 00:07:07,320 But really, the idea is you can use-- 163 00:07:07,320 --> 00:07:10,110 there are a whole bunch of different experiments in NMR. 164 00:07:10,110 --> 00:07:13,890 And you can access timescales from the nanosecond 165 00:07:13,890 --> 00:07:15,080 up to the millisecond. 166 00:07:15,080 --> 00:07:17,100 So 10 to the negative 9 to about 10 167 00:07:17,100 --> 00:07:19,860 to the negative 3 seconds of motion. 168 00:07:19,860 --> 00:07:22,620 So it's quite-- and they use different experiments 169 00:07:22,620 --> 00:07:25,340 for different parts of that timescale. 170 00:07:25,340 --> 00:07:29,360 And we can talk more about that in detail. 171 00:07:29,360 --> 00:07:32,250 The third technique is electron microscopy. 172 00:07:32,250 --> 00:07:36,300 So this is restricted so far to large proteins. 173 00:07:36,300 --> 00:07:39,575 The reason for this is that resolution is not so good. 174 00:07:39,575 --> 00:07:41,700 Again, you don't need to crystallize your proteins. 175 00:07:41,700 --> 00:07:43,930 So that's an option. 176 00:07:43,930 --> 00:07:46,050 You don't need to use labeled samples, either. 177 00:07:46,050 --> 00:07:48,180 So the sample preparation is probably the easiest 178 00:07:48,180 --> 00:07:50,780 for electron microscopy. 
179 00:07:50,780 --> 00:07:56,370 The picture that you get is sometimes lower resolution, 180 00:07:56,370 --> 00:07:59,010 but the technology is moving forward to the point 181 00:07:59,010 --> 00:08:03,580 where we can get a resolution as good as X-ray structures. 182 00:08:03,580 --> 00:08:06,990 And I know that there's a 2.2 Angstrom resolution 183 00:08:06,990 --> 00:08:09,060 structure out there definitely, and there are 184 00:08:09,060 --> 00:08:10,790 others that are 3.2 Angstroms. 185 00:08:10,790 --> 00:08:13,040 But maybe there's something better than that out there 186 00:08:13,040 --> 00:08:13,960 in the literature. 187 00:08:13,960 --> 00:08:15,710 Again, it's difficult to observe dynamics. 188 00:08:15,710 --> 00:08:19,824 So similar to X-ray, it's just a snapshot of your enzyme. 189 00:08:19,824 --> 00:08:21,045 AUDIENCE: Is that picture-- 190 00:08:25,100 --> 00:08:30,245 is the concept similar to a normal microscope? 191 00:08:30,245 --> 00:08:32,370 SHIVA MANDALA: Yeah, absolutely it is very similar. 192 00:08:32,370 --> 00:08:34,940 And the only thing is you're looking 193 00:08:34,940 --> 00:08:38,960 at how electrons interact with your sample as 194 00:08:38,960 --> 00:08:40,130 compared to light, right? 195 00:08:40,130 --> 00:08:41,150 Visible light, I guess. 196 00:08:45,000 --> 00:08:49,190 So each of these particles here is your protein, 197 00:08:49,190 --> 00:08:51,030 is a protein molecule. 198 00:08:51,030 --> 00:08:54,270 And then these are three dimensional reconstructions. 199 00:08:54,270 --> 00:08:56,250 So there's computer software that does this. 200 00:08:56,250 --> 00:08:57,770 So to go from this, you basically 201 00:08:57,770 --> 00:09:00,590 signal average over all of these different molecules. 
202 00:09:00,590 --> 00:09:02,630 And then you signal average over all 203 00:09:02,630 --> 00:09:04,700 of your different orientations of the protein 204 00:09:04,700 --> 00:09:08,540 that are trapped in your static electron microscope image. 205 00:09:08,540 --> 00:09:10,350 And then using some image processing, 206 00:09:10,350 --> 00:09:13,310 you generate a three dimensional image of your protein 207 00:09:13,310 --> 00:09:15,620 that has a higher resolution than what you can see just 208 00:09:15,620 --> 00:09:17,750 with one single photo, I guess. 209 00:09:17,750 --> 00:09:19,549 So there's lot of computer processing 210 00:09:19,549 --> 00:09:20,840 that happens behind the scenes. 211 00:09:23,440 --> 00:09:25,530 AUDIENCE: So is the electron-- 212 00:09:25,530 --> 00:09:27,910 the interactions with electrons, is that similar 213 00:09:27,910 --> 00:09:30,106 to fluorescence microscopy? 214 00:09:30,106 --> 00:09:33,071 Because that's where you're seeing where your proteins are 215 00:09:33,071 --> 00:09:33,960 located, right? 216 00:09:33,960 --> 00:09:36,310 SHIVA MANDALA: So the difference with fluorescence-- 217 00:09:36,310 --> 00:09:39,670 I mean, here electrons can interact with any atoms, right? 218 00:09:39,670 --> 00:09:40,470 Any material. 219 00:09:40,470 --> 00:09:42,760 AUDIENCE: Oh, so you can distinguish 220 00:09:42,760 --> 00:09:45,809 what different atoms the electrons are interacting with? 221 00:09:45,809 --> 00:09:47,600 SHIVA MANDALA: Yes, because different atoms 222 00:09:47,600 --> 00:09:50,370 interact with-- different nuclei interact, 223 00:09:50,370 --> 00:09:53,330 and electron densities interact with electrons differently. 224 00:09:53,330 --> 00:09:54,860 But with fluorescence microscopy, 225 00:09:54,860 --> 00:09:57,220 you're usually looking at just a single molecule 226 00:09:57,220 --> 00:10:00,350 a fluorophore that's reporting on where your protein is. 
227 00:10:00,350 --> 00:10:03,020 But electron microscopy is a much higher resolution picture. 228 00:10:03,020 --> 00:10:04,940 It's actually an atomic level-- 229 00:10:04,940 --> 00:10:07,190 well, maybe a few atoms level-- 230 00:10:07,190 --> 00:10:07,874 picture. 231 00:10:07,874 --> 00:10:09,290 Fluorescence microscopy is usually 232 00:10:09,290 --> 00:10:12,290 just used to study where your protein is 233 00:10:12,290 --> 00:10:13,590 if proteins are interacting. 234 00:10:13,590 --> 00:10:18,500 So that's more macromolecular interactions. 235 00:10:18,500 --> 00:10:20,950 But you can get single molecule resolution 236 00:10:20,950 --> 00:10:26,000 with fluorescence microscopy if you use the correct techniques. 237 00:10:26,000 --> 00:10:29,560 And so just as an introduction to the protein data bank, so 238 00:10:29,560 --> 00:10:33,370 the first graph tells you the number of structures in the PDB 239 00:10:33,370 --> 00:10:37,510 as a function of a year going from 1975 all the way to 2015. 240 00:10:37,510 --> 00:10:41,900 So you'll see today there are about 110,000 structures, 241 00:10:41,900 --> 00:10:49,150 of which 100,000 were determined using X-ray crystallography, 242 00:10:49,150 --> 00:10:52,630 and about 10,000 using NMR, and about 1,000 243 00:10:52,630 --> 00:10:54,700 using electron microscopy. 244 00:10:54,700 --> 00:10:56,890 So really quite a nice ratio there. 245 00:10:56,890 --> 00:10:58,570 And if you see the yearly increase 246 00:10:58,570 --> 00:11:00,670 in the number of PDB structures, you'll 247 00:11:00,670 --> 00:11:04,180 see that X-ray is, of course, really big. 248 00:11:04,180 --> 00:11:06,257 NMR has been fairly consistent over time. 249 00:11:06,257 --> 00:11:08,590 I think that has to do with the fact that it's expensive 250 00:11:08,590 --> 00:11:10,660 and it takes time to prepare your samples. 251 00:11:10,660 --> 00:11:14,140 But you also see a huge spike in electron microscopy of late. 
252 00:11:14,140 --> 00:11:19,060 And so with the advent of cryo EM, a lot more 253 00:11:19,060 --> 00:11:21,890 people started using cryo EM to determine protein structure. 254 00:11:21,890 --> 00:11:24,496 AUDIENCE: Doesn't [INAUDIBLE] produce [INAUDIBLE] or you 255 00:11:24,496 --> 00:11:26,950 just put it in [INAUDIBLE] 256 00:11:26,950 --> 00:11:29,910 SHIVA MANDALA: Yeah, it can be. 257 00:11:29,910 --> 00:11:34,500 But the problem with that is when 258 00:11:34,500 --> 00:11:36,300 you put your sample in the [INAUDIBLE], 259 00:11:36,300 --> 00:11:38,067 you can get chemical shift information. 260 00:11:38,067 --> 00:11:39,900 But chemical shift doesn't tell you anything 261 00:11:39,900 --> 00:11:41,220 about protein structure. 262 00:11:41,220 --> 00:11:42,810 I mean, it tells you a little bit. 263 00:11:42,810 --> 00:11:44,310 It tells you about what the electron 264 00:11:44,310 --> 00:11:46,320 density is at the atoms. 265 00:11:46,320 --> 00:11:48,810 But what you really need to get from NMR experiments 266 00:11:48,810 --> 00:11:50,160 are distance restraints. 267 00:11:50,160 --> 00:11:52,330 And so this is through-space experiments. 268 00:11:52,330 --> 00:11:55,140 So you can say that, oh, this one carbon 269 00:11:55,140 --> 00:11:57,090 nucleus is at a distance of 6 Angstroms 270 00:11:57,090 --> 00:11:59,780 away from this other carbon nucleus. 271 00:11:59,780 --> 00:12:01,350 And you typically want to accumulate 272 00:12:01,350 --> 00:12:04,200 about five restraints per atom. 273 00:12:04,200 --> 00:12:06,150 And so to collect five times however many 274 00:12:06,150 --> 00:12:08,380 atoms you have in your sample, it can take time. 275 00:12:08,380 --> 00:12:09,140 It's hard to do. 276 00:12:09,140 --> 00:12:10,980 So chemical shift by itself doesn't tell you 277 00:12:10,980 --> 00:12:15,180 much about protein structure. 278 00:12:15,180 --> 00:12:17,810 Any other questions so far? 279 00:12:17,810 --> 00:12:21,220 All right. 
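To put the "about five per atom" rule of thumb from the NMR answer above into numbers, here is a minimal sketch. The function name and the 600-atom protein are hypothetical illustrations, not figures from the recitation.

```python
def restraints_needed(n_atoms: int, per_atom: int = 5) -> int:
    """Rough count of distance measurements to collect for a structure.

    Uses the rule of thumb quoted in the talk (about five per atom);
    real projects vary, so treat the result as an order-of-magnitude guide.
    """
    return n_atoms * per_atom


# Hypothetical small protein with about 600 atoms of interest.
print(restraints_needed(600))
```

Even for a small protein the count runs into the thousands, which is one way to see why the speaker says collecting them can take a long time.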
280 00:12:21,220 --> 00:12:23,230 So now we will focus the rest of the talk 281 00:12:23,230 --> 00:12:26,770 on-- well, another part of the talk on X-ray crystallography. 282 00:12:26,770 --> 00:12:28,630 And so crystallography is the science 283 00:12:28,630 --> 00:12:33,520 of determining the three dimensional position of atoms in a crystal. 284 00:12:33,520 --> 00:12:35,410 And so what a crystal is, a crystal 285 00:12:35,410 --> 00:12:37,270 is a solid material whose constituents 286 00:12:37,270 --> 00:12:40,840 are arranged in an ordered pattern expanding and extending 287 00:12:40,840 --> 00:12:42,640 in all three spatial dimensions. 288 00:12:42,640 --> 00:12:46,090 And so the key idea is that there's translational symmetry-- 289 00:12:46,090 --> 00:12:48,540 so if you go in any of the three directions 290 00:12:48,540 --> 00:12:50,970 for a certain amount of time, a certain amount of length, 291 00:12:50,970 --> 00:12:52,469 you'll come back to the same pattern 292 00:12:52,469 --> 00:12:53,770 that you start off with. 293 00:12:53,770 --> 00:12:57,796 And so this is a crystal of your protein of interest. 294 00:12:57,796 --> 00:12:59,170 What you want to know is you want 295 00:12:59,170 --> 00:13:01,330 to know how the proteins are packed or arranged 296 00:13:01,330 --> 00:13:02,830 within this crystal structure. 297 00:13:02,830 --> 00:13:04,420 And also as a result, how the atoms 298 00:13:04,420 --> 00:13:06,520 are arranged within the crystal structure. 299 00:13:06,520 --> 00:13:09,580 And the way this works is by diffracting X-rays 300 00:13:09,580 --> 00:13:11,590 through your sample of interest. 301 00:13:11,590 --> 00:13:13,240 And with this slide, I just wanted 302 00:13:13,240 --> 00:13:15,790 to point out that it's not restricted to proteins. 303 00:13:15,790 --> 00:13:17,470 You can study salts, you can study 304 00:13:17,470 --> 00:13:19,060 your favorite small organic molecule. 305 00:13:19,060 --> 00:13:21,790 Whatever you want, really. 
306 00:13:21,790 --> 00:13:24,580 And so the general workflow is that you have a source 307 00:13:24,580 --> 00:13:28,210 of X-rays that can be a synchrotron or local source-- 308 00:13:28,210 --> 00:13:30,760 synchrotrons are much brighter than local sources-- 309 00:13:30,760 --> 00:13:32,170 that you shine on your crystal. 310 00:13:32,170 --> 00:13:35,250 And you obtain what is known as a diffraction pattern. 311 00:13:35,250 --> 00:13:37,900 And so this tells you how the X-rays are interact-- 312 00:13:37,900 --> 00:13:40,450 this tells you something about how the X-rays are interacting 313 00:13:40,450 --> 00:13:43,400 with the atoms in the crystal. 314 00:13:43,400 --> 00:13:46,300 And so this used to be collected on a photographic plate. 315 00:13:46,300 --> 00:13:49,510 This particular image is on a photographic plate, 316 00:13:49,510 --> 00:13:51,430 but now people use CCD sensors. 317 00:13:51,430 --> 00:13:52,870 It's a lot easier. 318 00:13:52,870 --> 00:13:57,100 And knowing the-- sorry, one more thing. 319 00:13:57,100 --> 00:13:59,330 Each of these dots, light and dark, 320 00:13:59,330 --> 00:14:01,650 on the diffraction pattern is called a reflection. 321 00:14:01,650 --> 00:14:03,250 And that contains some information 322 00:14:03,250 --> 00:14:06,250 about the electron density in the crystal structure. 323 00:14:06,250 --> 00:14:07,960 And from your diffraction pattern, 324 00:14:07,960 --> 00:14:11,110 you can then back calculate the electron density 325 00:14:11,110 --> 00:14:12,970 in your crystal structure that gave rise 326 00:14:12,970 --> 00:14:14,680 to this diffraction pattern. 327 00:14:14,680 --> 00:14:16,360 And the way you do that is by looking 328 00:14:16,360 --> 00:14:18,890 at the intensity of these reflections. 329 00:14:18,890 --> 00:14:21,390 And you also need-- there's also something called phase, 330 00:14:21,390 --> 00:14:22,870 so you need to determine phase. 
331 00:14:22,870 --> 00:14:26,110 And sometimes you'll see in the literature 332 00:14:26,110 --> 00:14:27,870 you'll see heavy atoms being introduced, 333 00:14:27,870 --> 00:14:29,414 or mercury being introduced. 334 00:14:29,414 --> 00:14:31,330 And that's often to determine the phase, which 335 00:14:31,330 --> 00:14:34,900 is essential for calculating the electron density. 336 00:14:34,900 --> 00:14:36,920 Once you determine the electron density, 337 00:14:36,920 --> 00:14:39,070 you know what protein you started off with. 338 00:14:39,070 --> 00:14:41,350 And so you know what your protein 339 00:14:41,350 --> 00:14:43,600 looks-- you know the sequence of your protein. 340 00:14:43,600 --> 00:14:46,180 And so then you just take your electron density 341 00:14:46,180 --> 00:14:49,320 and fit whatever polypeptide chain you have to that. 342 00:14:49,320 --> 00:14:52,300 And then usually this is all automated nowadays. 343 00:14:52,300 --> 00:14:56,050 So you press a few buttons and it goes through, 344 00:14:56,050 --> 00:14:57,850 software does everything for you. 345 00:14:57,850 --> 00:15:01,090 But it was much more challenging early on. 346 00:15:01,090 --> 00:15:04,850 And even now, the computers will get you up to a certain point. 347 00:15:04,850 --> 00:15:07,780 And then in the last, last stages of refinement, 348 00:15:07,780 --> 00:15:08,950 you always want to-- 349 00:15:08,950 --> 00:15:12,480 usually people do that by hand. 350 00:15:12,480 --> 00:15:16,356 Any questions about X-ray crystallography? 351 00:15:16,356 --> 00:15:20,670 AUDIENCE: [INAUDIBLE] how strongly or complex, 352 00:15:20,670 --> 00:15:23,390 but how do you get that electron density from diffraction 353 00:15:23,390 --> 00:15:24,760 pattern. 
354 00:15:24,760 --> 00:15:30,500 Like in organic chemistry, in basic [INAUDIBLE] I thought 355 00:15:30,500 --> 00:15:31,750 that you can-- 356 00:15:31,750 --> 00:15:34,350 from diffraction pattern, you can 357 00:15:34,350 --> 00:15:41,970 learn the distance between atoms in the lattice points. 358 00:15:41,970 --> 00:15:47,640 But here with proteins, every point is itself a protein, 359 00:15:47,640 --> 00:15:49,120 right? 360 00:15:49,120 --> 00:15:50,130 In the lattice? 361 00:15:50,130 --> 00:15:50,720 No? 362 00:15:50,720 --> 00:15:51,970 SHIVA MANDALA: No, no, no, no. 363 00:15:51,970 --> 00:15:54,386 Because you're still looking at every point in the lattice 364 00:15:54,386 --> 00:15:56,740 is still an atom if you're doing proteins. 365 00:15:56,740 --> 00:15:58,540 It's just that there are a lot more atoms, 366 00:15:58,540 --> 00:16:00,510 and the lattice is a lot bigger, which 367 00:16:00,510 --> 00:16:02,260 is what makes protein crystallography hard 368 00:16:02,260 --> 00:16:04,670 compared to small molecule crystallography. 369 00:16:04,670 --> 00:16:06,820 So it's harder to solve a protein crystal structure 370 00:16:06,820 --> 00:16:08,694 than it is a small molecule crystal structure 371 00:16:08,694 --> 00:16:11,420 just because there's so many more atoms in your lattice. 372 00:16:11,420 --> 00:16:14,020 But the idea is exactly the same as a small molecule. 373 00:16:14,020 --> 00:16:15,890 It's just a lot harder to do. 374 00:16:15,890 --> 00:16:17,530 And for more information, I actually 375 00:16:17,530 --> 00:16:19,930 have a resource at the end that goes in-depth 376 00:16:19,930 --> 00:16:22,570 into the math of the process. 377 00:16:22,570 --> 00:16:24,880 But just briefly, this diffraction pattern 378 00:16:24,880 --> 00:16:27,610 is collected into what's called reciprocal space. 
379 00:16:27,610 --> 00:16:29,914 And to go from this to electron density, 380 00:16:29,914 --> 00:16:31,330 you need to do a Fourier transform 381 00:16:31,330 --> 00:16:33,890 into real space, which is what electron density is expressed 382 00:16:33,890 --> 00:16:34,390 in. 383 00:16:34,390 --> 00:16:40,670 But I will provide a reference for more information on that. 384 00:16:40,670 --> 00:16:43,610 And so the next part of the discussion part 385 00:16:43,610 --> 00:16:46,300 of this recitation will be thinking about some 386 00:16:46,300 --> 00:16:48,400 of the limitations of X-ray crystallography. 387 00:16:48,400 --> 00:16:52,330 So there's a lot of them, but I'll turn to all of you 388 00:16:52,330 --> 00:16:54,490 for your inputs. 389 00:16:54,490 --> 00:16:56,614 AUDIENCE: Is it difficult to develop crystals 390 00:16:56,614 --> 00:16:58,012 of certain types of proteins? 391 00:16:58,012 --> 00:16:59,470 SHIVA MANDALA: Yes, absolutely yes. 392 00:16:59,470 --> 00:17:02,910 First point, it's really hard to purify and crystallize 393 00:17:02,910 --> 00:17:03,729 proteins. 394 00:17:03,729 --> 00:17:05,020 It's really not a trivial task. 395 00:17:05,020 --> 00:17:07,359 It can take months or even years to do so. 396 00:17:07,359 --> 00:17:10,979 And nowadays you have robots that can set up reactions 397 00:17:10,979 --> 00:17:13,270 under hundreds of different crystallization conditions. 398 00:17:13,270 --> 00:17:16,677 It's sort of a black magic sort of hard. 399 00:17:16,677 --> 00:17:19,010 It's hard to predict what crystallization conditions are 400 00:17:19,010 --> 00:17:22,369 going to give you a high quality crystal. 401 00:17:22,369 --> 00:17:23,190 Anything else? 402 00:17:23,190 --> 00:17:24,047 Yes? 403 00:17:24,047 --> 00:17:26,130 AUDIENCE: Like the crystals you get may or may not 404 00:17:26,130 --> 00:17:28,369 be physiologically relevant? 405 00:17:28,369 --> 00:17:29,821 SHIVA MANDALA: Yes, absolutely. 
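The step from diffraction data back to electron density, and the reason the phases matter, can be illustrated with a toy one-dimensional example. This is only a sketch under simplifying assumptions: a single 8-point "unit cell" with two point-like atoms, a plain discrete Fourier sum, and function names chosen here for illustration. It is not how crystallographic software is actually organized.

```python
import cmath
import math


def structure_factors(density):
    """Forward transform: real-space density samples -> complex F(h)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_from_factors(factors):
    """Inverse transform: complex F(h) -> real-space density samples."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]


# Toy 1D unit cell: two atoms with different electron counts on an 8-point grid.
density = [0.0, 6.0, 0.0, 0.0, 0.0, 8.0, 0.0, 0.0]

factors = structure_factors(density)
recovered = density_from_factors(factors)

# A detector records intensities |F|^2 only, so the phases are not measured.
amplitudes_only = [abs(f) for f in factors]
without_phases = density_from_factors(amplitudes_only)

print([round(v, 3) for v in recovered])
print([round(v, 3) for v in without_phases])
```

With amplitudes and phases together, the inverse transform returns the starting density. With amplitudes alone, the map no longer matches the input, which is the toy version of why phases have to be determined separately, for example with the heavy atoms mentioned in the talk.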
406 00:17:29,821 --> 00:17:32,180 Just on point with questions. 407 00:17:32,180 --> 00:17:34,910 But yeah, it's hard to tell whether the crystal 408 00:17:34,910 --> 00:17:37,970 structure that you get is depicting what's happening 409 00:17:37,970 --> 00:17:39,230 with the protein in vivo. 410 00:17:39,230 --> 00:17:42,742 And I mean, this is a problem that's inherent to X-ray 411 00:17:42,742 --> 00:17:43,700 crystallography, right? 412 00:17:43,700 --> 00:17:45,408 Every time you solve a crystal structure, 413 00:17:45,408 --> 00:17:47,210 you don't know whether it's relevant. 414 00:17:47,210 --> 00:17:50,810 But usually I think it turns out that it's 415 00:17:50,810 --> 00:17:54,500 pretty close, if not completely accurate in solution. 416 00:17:54,500 --> 00:17:56,919 But sometimes you do have to be careful about this. 417 00:17:56,919 --> 00:17:58,460 It's especially challenging for stuff 418 00:17:58,460 --> 00:18:01,730 like membrane proteins, where you don't really know. 419 00:18:01,730 --> 00:18:02,570 Any other ideas? 420 00:18:05,720 --> 00:18:08,690 So when proteins are translated, do they usually-- 421 00:18:08,690 --> 00:18:10,595 are they usually used just like that, 422 00:18:10,595 --> 00:18:13,178 or does something else happen to the proteins in most cells? 423 00:18:13,178 --> 00:18:14,402 AUDIENCE: [INAUDIBLE] 424 00:18:14,402 --> 00:18:15,770 SHIVA MANDALA: Yes, absolutely. 425 00:18:15,770 --> 00:18:17,150 Post-translational modifications. 426 00:18:17,150 --> 00:18:20,180 So a lot of proteins are post-translationally modified. 427 00:18:20,180 --> 00:18:25,119 And so when you're growing a crystal of your protein, 428 00:18:25,119 --> 00:18:27,410 you usually just use a purified version of your protein 429 00:18:27,410 --> 00:18:28,850 so you can't really calculate. 430 00:18:28,850 --> 00:18:32,106 And sometimes these PTMs are essential for the function 431 00:18:32,106 --> 00:18:32,730 of the protein. 
432 00:18:32,730 --> 00:18:35,456 So you're missing some part of the picture. 433 00:18:35,456 --> 00:18:38,480 Anything else? 434 00:18:38,480 --> 00:18:39,900 What about movement? 435 00:18:39,900 --> 00:18:44,710 Can you tell what proteins are flexible, what parts of the-- 436 00:18:44,710 --> 00:18:45,630 sorry. 437 00:18:45,630 --> 00:18:49,126 AUDIENCE: Well it's like if part of the protein is mobile, 438 00:18:49,126 --> 00:18:53,960 then you won't have the density for it. 439 00:18:53,960 --> 00:18:55,640 SHIVA MANDALA: Yes, that is true. 440 00:18:55,640 --> 00:18:58,120 Can you discern anything about dynamics and flexibility? 441 00:18:58,120 --> 00:19:01,321 And the answer is you can tell something. 442 00:19:01,321 --> 00:19:03,320 You can tell something about the relative motion 443 00:19:03,320 --> 00:19:05,236 of different parts of the protein with respect 444 00:19:05,236 --> 00:19:07,280 to each other, but it's hard to tell something 445 00:19:07,280 --> 00:19:09,600 about the absolute motion of these proteins. 446 00:19:09,600 --> 00:19:12,146 So you can't see, say, larger scale motions, right? 447 00:19:12,146 --> 00:19:14,020 Most proteins are living, breathing machines, 448 00:19:14,020 --> 00:19:18,650 and it's hard to capture that in an X-ray structure. 449 00:19:18,650 --> 00:19:21,000 And one more thing. 450 00:19:21,000 --> 00:19:24,650 So is there any element that you cannot detect in X-ray 451 00:19:24,650 --> 00:19:26,120 crystallography very well? 452 00:19:26,120 --> 00:19:28,500 This has to do with the way-- 453 00:19:28,500 --> 00:19:30,020 so in X-ray crystallography, you're 454 00:19:30,020 --> 00:19:33,770 studying interactions of X-rays with electrons, right? 455 00:19:33,770 --> 00:19:35,520 So does anybody know? 456 00:19:38,150 --> 00:19:39,493 Yeah, I heard it somewhere. 457 00:19:39,493 --> 00:19:39,856 AUDIENCE: Protons. 458 00:19:39,856 --> 00:19:40,460 SHIVA MANDALA: Protons, yeah.
459 00:19:40,460 --> 00:19:42,160 So protons have one electron. 460 00:19:42,160 --> 00:19:46,640 And so their X-ray signal, so-called, is really weak. 461 00:19:46,640 --> 00:19:47,480 Really, really weak. 462 00:19:47,480 --> 00:19:49,430 And so you can't really see protons 463 00:19:49,430 --> 00:19:50,900 with X-ray crystallography. 464 00:19:50,900 --> 00:19:53,980 And so you can't really study hydrogens or hydrogen bonds. 465 00:19:53,980 --> 00:19:55,850 And if you look at the structure of proteins 466 00:19:55,850 --> 00:19:58,130 that has hydrogens in it, those hydrogens 467 00:19:58,130 --> 00:20:01,160 were put there as a result of an average bond length, 468 00:20:01,160 --> 00:20:02,820 the typical bond calculation. 469 00:20:02,820 --> 00:20:04,940 So it's not actually experimentally determined. 470 00:20:04,940 --> 00:20:07,680 You can use neutron diffraction to get around this, 471 00:20:07,680 --> 00:20:09,260 but neutron diffraction is hard to do 472 00:20:09,260 --> 00:20:12,540 because you need to grow very large crystals to study. 473 00:20:12,540 --> 00:20:15,350 And I think there are about 80, I think, neutron-- 474 00:20:15,350 --> 00:20:18,490 around 100 neutron structures in the PDB so far. 475 00:20:18,490 --> 00:20:21,470 But for small molecules, neutron is much more accessible. 476 00:20:21,470 --> 00:20:23,096 Neutron diffraction? 477 00:20:23,096 --> 00:20:26,270 And so the idea is the same as X-ray crystallography, 478 00:20:26,270 --> 00:20:27,990 except for you're using neutrons. 479 00:20:31,730 --> 00:20:34,014 And the final point is that one structure only 480 00:20:34,014 --> 00:20:35,180 tells you part of the story. 481 00:20:35,180 --> 00:20:37,580 Again, this is emphasizing the fact 482 00:20:37,580 --> 00:20:41,120 that one structure is just a snapshot of the protein 483 00:20:41,120 --> 00:20:42,320 at a certain time. 
484 00:20:42,320 --> 00:20:44,600 And if you want to correctly interpret your data 485 00:20:44,600 --> 00:20:46,880 and learn something more about the protein, 486 00:20:46,880 --> 00:20:50,420 you often have to use complementary biochemical 487 00:20:50,420 --> 00:20:52,820 techniques. 488 00:20:52,820 --> 00:20:55,410 Are there any questions at this point? 489 00:20:55,410 --> 00:20:58,130 So the last part of the talk is on how 490 00:20:58,130 --> 00:21:00,960 to assess the quality of structures in the PDB. 491 00:21:00,960 --> 00:21:02,530 They're large structures, and you 492 00:21:02,530 --> 00:21:05,120 want to be able to know whether the model that's 493 00:21:05,120 --> 00:21:07,564 presented to you is actually accurate, 494 00:21:07,564 --> 00:21:10,500 actually reflects the data that was collected. 495 00:21:10,500 --> 00:21:13,640 And so the first point is what is the resolution 496 00:21:13,640 --> 00:21:14,830 of the structure? 497 00:21:14,830 --> 00:21:18,650 And so the take home message is that a lower number 498 00:21:18,650 --> 00:21:20,150 means a greater resolution. 499 00:21:20,150 --> 00:21:21,800 And the resolution actually here is 500 00:21:21,800 --> 00:21:24,806 referring to the distances between the atoms in the plane. 501 00:21:24,806 --> 00:21:26,430 And so that's where that's coming from. 502 00:21:26,430 --> 00:21:28,190 So that's why if you have a lower number, 503 00:21:28,190 --> 00:21:29,614 that means you can resolve atoms. 504 00:21:29,614 --> 00:21:31,280 A one Angstrom resolution means that you 505 00:21:31,280 --> 00:21:33,970 can resolve atoms that are one Angstrom apart 506 00:21:33,970 --> 00:21:36,200 on parallel planes.
507 00:21:36,200 --> 00:21:38,930 But the take home message is that at a one Angstrom resolution, 508 00:21:38,930 --> 00:21:41,090 you can see individual atoms and you 509 00:21:41,090 --> 00:21:42,890 can discern the identities of those atoms 510 00:21:42,890 --> 00:21:45,180 by looking at their electron density. 511 00:21:45,180 --> 00:21:48,131 But if you come down to a four Angstrom structure, 512 00:21:48,131 --> 00:21:49,880 you'll see the benzene ring doesn't really 513 00:21:49,880 --> 00:21:51,650 have a clearly defined electron density, 514 00:21:51,650 --> 00:21:53,560 and there's no hole in the center. 515 00:21:53,560 --> 00:21:55,476 But if you look at the one Angstrom structure, 516 00:21:55,476 --> 00:21:58,227 you can see that there's even a hole in the benzene ring 517 00:21:58,227 --> 00:21:59,810 to confirm the electron density there. 518 00:22:02,410 --> 00:22:05,610 And then if you look at-- so these are the data statistics. 519 00:22:05,610 --> 00:22:09,330 So this is just pulled from the PDB for 2JF5. 520 00:22:09,330 --> 00:22:12,129 So this is the PDB ID for di-ubiquitin, 521 00:22:12,129 --> 00:22:13,920 and we'll be looking at the structure later 522 00:22:13,920 --> 00:22:15,470 in the worksheet. 523 00:22:15,470 --> 00:22:17,850 The resolution tells you, of course, 524 00:22:17,850 --> 00:22:20,560 about the resolution of the crystal structure. 525 00:22:20,560 --> 00:22:23,580 And so in this case, it's at 1.95 Angstroms 526 00:22:23,580 --> 00:22:26,340 or two Angstroms, and so that's pretty high resolution. 527 00:22:29,659 --> 00:22:31,950 Reflections are each of those points in the diffraction 528 00:22:31,950 --> 00:22:34,380 pattern that you collect, and a unique reflection 529 00:22:34,380 --> 00:22:37,290 refers to the fact that you've only collected it once.
530 00:22:37,290 --> 00:22:41,370 So usually when you collect diffraction patterns, 531 00:22:41,370 --> 00:22:44,730 you put your protein in a certain orientation with respect 532 00:22:44,730 --> 00:22:47,070 to the X-ray beam, and then you collect the diffraction 533 00:22:47,070 --> 00:22:49,500 pattern, and then you rotate your crystal a whole bunch 534 00:22:49,500 --> 00:22:50,700 of times, and you collect a whole bunch 535 00:22:50,700 --> 00:22:52,080 of different diffraction patterns. 536 00:22:52,080 --> 00:22:53,640 And then you superimpose all of those 537 00:22:53,640 --> 00:22:57,540 together to get the master diffraction pattern. 538 00:22:57,540 --> 00:23:01,530 Redundancy refers to how often each reflection was 539 00:23:01,530 --> 00:23:02,250 observed. 540 00:23:02,250 --> 00:23:03,750 And so this is a signal averaging. 541 00:23:03,750 --> 00:23:08,040 The more redundancy you have, the greater number of times 542 00:23:08,040 --> 00:23:10,710 you observed that particular reflection. 543 00:23:10,710 --> 00:23:13,080 Completeness refers to how many of the data 544 00:23:13,080 --> 00:23:15,480 points were actually measured. 545 00:23:15,480 --> 00:23:18,330 And so this is when you created your model, 546 00:23:18,330 --> 00:23:21,160 you can back calculate your diffraction pattern. 547 00:23:21,160 --> 00:23:24,150 And then you see how many of those 548 00:23:24,150 --> 00:23:28,660 reflections were experimentally observed in our data set. 549 00:23:28,660 --> 00:23:31,190 And so for this usually you want as close to 100%, 550 00:23:31,190 --> 00:23:35,298 and anything above 95% is considered fairly good. 551 00:23:35,298 --> 00:23:41,350 R merge is an indicator of how consistent measurements are. 552 00:23:41,350 --> 00:23:45,220 So this is a measure of what the difference 553 00:23:45,220 --> 00:23:50,050 between different measurements for the same reflection are.
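The three data-collection statistics just described (redundancy, completeness, and R merge) can be written down in a few lines. What follows is only a minimal sketch and not the procedure used for the 2JF5 table: the function name `merging_stats`, the input layout, and the toy intensities are all made up for illustration, and the number of theoretically possible reflections is assumed to be known in advance. It uses the common textbook definition of R merge, the summed absolute deviation of each measurement from the mean intensity of its reflection, divided by the summed intensity.

```python
from collections import defaultdict


def merging_stats(observations, n_possible):
    """Compute simple merging statistics.

    observations: list of (hkl, intensity) pairs, one per measurement.
    n_possible:   number of unique reflections that could be measured
                  to this resolution (assumed to be known).
    """
    # Group repeated measurements of the same reflection together.
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    n_unique = len(groups)
    redundancy = len(observations) / n_unique   # average measurements per reflection
    completeness = n_unique / n_possible        # fraction of possible reflections seen

    # R merge: sum |I_i - <I>| over all measurements, divided by sum of I_i.
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)

    return {
        "unique": n_unique,
        "redundancy": redundancy,
        "completeness": completeness,
        "r_merge": numerator / denominator,
    }


# Hypothetical toy data: three reflections, each measured twice.
toy = [
    ((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0), ((0, 1, 0), 50.0),
    ((0, 0, 1), 20.0), ((0, 0, 1), 30.0),
]
print(merging_stats(toy, n_possible=4))
```

Real processing programs also apply scaling and outlier rejection before computing these numbers, which this sketch leaves out.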
554 00:23:50,050 --> 00:23:54,400 So if you look at the intensity for the same reflection 555 00:23:54,400 --> 00:23:55,870 a different number of times, you're 556 00:23:55,870 --> 00:23:58,300 seeing the standard deviation of that. 557 00:23:58,300 --> 00:24:00,280 So you want as low a number as possible. 558 00:24:00,280 --> 00:24:05,140 So usually you want it to be about 1/10 559 00:24:05,140 --> 00:24:07,120 of the resolution of your crystal structure, 560 00:24:07,120 --> 00:24:10,570 which is 0.2 Angstroms in this case. 561 00:24:10,570 --> 00:24:13,030 I forgot to mention this, but the values in parentheses 562 00:24:13,030 --> 00:24:14,830 are for a high resolution bin. 563 00:24:14,830 --> 00:24:18,760 So this is just a certain subset of this data set 564 00:24:18,760 --> 00:24:21,322 that's considered to be higher quality than the original data 565 00:24:21,322 --> 00:24:21,750 set. 566 00:24:21,750 --> 00:24:23,830 And that's actually coming from this value, which 567 00:24:23,830 --> 00:24:28,840 is signal intensity over sigma of signal intensity, 568 00:24:28,840 --> 00:24:31,310 and that's a measure of the signal to noise ratio. 569 00:24:31,310 --> 00:24:34,010 So a higher signal to noise ratio is better. 570 00:24:34,010 --> 00:24:36,310 And this is fairly good. 571 00:24:36,310 --> 00:24:37,880 And the higher the better. 572 00:24:37,880 --> 00:24:40,850 And cutoff is at 2 for the high resolution 573 00:24:40,850 --> 00:24:44,890 bin of your data points. 574 00:24:44,890 --> 00:24:47,040 All right, so this is just the raw data. 575 00:24:47,040 --> 00:24:49,410 And then the refinement statistics 576 00:24:49,410 --> 00:24:51,900 tell you something about your refinement process that 577 00:24:51,900 --> 00:24:54,240 gave you the crystal structure that you calculated. 578 00:24:54,240 --> 00:24:57,300 So R crystallographic, which is R cryst, which is also 579 00:24:57,300 --> 00:25:00,900 called R work, and also called the R factor.
580 00:25:00,900 --> 00:25:05,070 That tells you how well your model and your data match. 581 00:25:05,070 --> 00:25:06,990 And so this is where you calculate 582 00:25:06,990 --> 00:25:09,030 the difference in the diffraction 583 00:25:09,030 --> 00:25:13,710 patterns between what you experimentally observed 584 00:25:13,710 --> 00:25:19,035 and what you calculated using the model of electron density. 585 00:25:19,035 --> 00:25:23,220 R free tells you how well your model and data match 586 00:25:23,220 --> 00:25:24,930 when corrected for overfitting. 587 00:25:24,930 --> 00:25:29,430 So the idea behind this is that when you collect your data set, 588 00:25:29,430 --> 00:25:32,200 you put aside about 5% of the reflections 589 00:25:32,200 --> 00:25:37,080 that you observe in the data set to prevent overfitting. 590 00:25:37,080 --> 00:25:38,940 And then when you've created your model, 591 00:25:38,940 --> 00:25:41,820 you go back and see how well those 5% of data points 592 00:25:41,820 --> 00:25:44,530 fit the model that you've come up with. 593 00:25:44,530 --> 00:25:47,130 And if they fit well, that means you've accurately 594 00:25:47,130 --> 00:25:49,985 predicted your crystal structure using the data that you have. 595 00:25:49,985 --> 00:25:51,360 If they don't fit well, that means 596 00:25:51,360 --> 00:25:54,774 you've just fit whatever data points you have to some model. 597 00:25:54,774 --> 00:25:56,190 You've used a bunch of data points 598 00:25:56,190 --> 00:25:59,670 and you fit it to some model, but that's not actually 599 00:25:59,670 --> 00:26:02,670 accurate to the protein structure that you determined. 600 00:26:02,670 --> 00:26:06,780 And so for this you want it to be about 1/10 of the resolution. 601 00:26:06,780 --> 00:26:08,640 That's also true for R cryst.
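The R work and R free idea described here can be sketched in code. This is an illustration under stated assumptions rather than what any refinement program actually does: the helper names `r_factor` and `split_free_set` are hypothetical, the inputs are plain lists of structure-factor amplitudes, and the scale factor that is normally applied between observed and calculated amplitudes is left out. The R factor is taken in its usual form, the summed absolute difference between observed and calculated amplitudes divided by the summed observed amplitudes.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|); no scaling is applied."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly set aside about `free_fraction` of reflection indices as the free set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


def r_work_and_r_free(f_obs, f_calc, work, free):
    """R over the reflections used in refinement, and over the ones held out."""
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

In use, the free set is chosen once, before refinement starts, and the model is only ever fit against the working set. A large gap between the two R values at the end is the sign of overfitting that the talk is describing.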
602 00:26:08,640 --> 00:26:10,590 But then the other key point is that it 603 00:26:10,590 --> 00:26:12,895 should be very close to the R cryst, 604 00:26:12,895 --> 00:26:16,950 because it's telling you that it's random error, 605 00:26:16,950 --> 00:26:18,450 or it's the quality of your data set 606 00:26:18,450 --> 00:26:21,900 that's causing this and not some over-refinement 607 00:26:21,900 --> 00:26:25,340 that you've done during the refinement process. 608 00:26:25,340 --> 00:26:28,145 B factor tells you about how mobile atoms 609 00:26:28,145 --> 00:26:29,720 are in the crystal lattice. 610 00:26:29,720 --> 00:26:31,470 And this is something that's 611 00:26:31,470 --> 00:26:35,430 not particularly useful if you look at the bulk statistic. 612 00:26:35,430 --> 00:26:38,910 But if you need to, it's important to evaluate this 613 00:26:38,910 --> 00:26:39,940 by residue. 614 00:26:39,940 --> 00:26:42,870 And so if you look at amino acids in the loops of proteins, 615 00:26:42,870 --> 00:26:45,300 you'll find that they have a higher B factor usually, 616 00:26:45,300 --> 00:26:48,120 meaning that they're more mobile. 617 00:26:48,120 --> 00:26:50,140 And so you can often tell what residues 618 00:26:50,140 --> 00:26:54,830 are important for function by looking at the B factor. 619 00:26:54,830 --> 00:26:58,477 B factor is kind of-- there's inherent vibration motions 620 00:26:58,477 --> 00:26:59,310 in all atoms, right? 621 00:26:59,310 --> 00:27:03,270 So there will be a B factor at any temperature greater than 0. 622 00:27:03,270 --> 00:27:06,000 But it also does tell you a little bit about the disorder 623 00:27:06,000 --> 00:27:08,070 in protein structures.
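Evaluating the B factor by residue, as suggested here, amounts to averaging the per-atom values within each residue and then looking for residues that stand out. The sketch below assumes the atoms have already been read into (chain, residue number, B factor) tuples; the function names, the one-standard-deviation cutoff, and the toy numbers are illustrative choices, not a standard.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_b_by_residue(atoms):
    """atoms: iterable of (chain, residue_number, b_factor), one entry per atom."""
    per_residue = defaultdict(list)
    for chain, residue_number, b_factor in atoms:
        per_residue[(chain, residue_number)].append(b_factor)
    return {key: mean(values) for key, values in per_residue.items()}


def flag_mobile(residue_b, n_sigma=1.0):
    """Return residues whose mean B is more than n_sigma SDs above the average."""
    values = list(residue_b.values())
    cutoff = mean(values) + n_sigma * pstdev(values)
    return sorted(key for key, b in residue_b.items() if b > cutoff)


# Hypothetical toy data: residue 3 plays the role of a flexible loop residue.
toy_atoms = [
    ("A", 1, 20.0), ("A", 1, 22.0),
    ("A", 2, 18.0), ("A", 2, 20.0),
    ("A", 3, 55.0), ("A", 3, 65.0),
    ("A", 4, 19.0), ("A", 4, 21.0),
]
residue_b = mean_b_by_residue(toy_atoms)
print(flag_mobile(residue_b))
```

In a real PDB file the per-atom B factor is stored in the temperature-factor column of each ATOM record, so the tuples would come from parsing those records.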
624 00:27:08,070 --> 00:27:09,750 And the B factor of water is often 625 00:27:09,750 --> 00:27:12,710 included just so that people who are looking at the crystal 626 00:27:12,710 --> 00:27:15,210 structure afterwards can decide whether that water is really 627 00:27:15,210 --> 00:27:18,786 there, or really was part of the electron density, 628 00:27:18,786 --> 00:27:20,910 or whether it's just an artifact of over-refinement, 629 00:27:20,910 --> 00:27:23,230 or something like that. 630 00:27:23,230 --> 00:27:25,500 And then the final statistic that you can look at 631 00:27:25,500 --> 00:27:27,510 is the RMSD from ideal geometry. 632 00:27:27,510 --> 00:27:30,450 So the geometries of these bonds and angles 633 00:27:30,450 --> 00:27:31,690 are usually well known. 634 00:27:31,690 --> 00:27:36,060 And so if you compare the results from your structure 635 00:27:36,060 --> 00:27:39,030 to the known stereochemistry, you'll 636 00:27:39,030 --> 00:27:42,124 find that this is actually just at the threshold of the cutoff 637 00:27:42,124 --> 00:27:43,290 for what is considered good. 638 00:27:43,290 --> 00:27:46,890 So 0.015 Angstroms, this is standard deviation 639 00:27:46,890 --> 00:27:51,070 of bond lengths of your model versus what is already known. 640 00:27:51,070 --> 00:27:55,230 And so 0.015 Angstroms is just about the cutoff 641 00:27:55,230 --> 00:27:58,080 that's considered good for X-ray structures. 642 00:27:58,080 --> 00:27:59,490 And same for bond angles. 643 00:27:59,490 --> 00:28:02,700 1.5 degrees is considered the threshold of what 644 00:28:02,700 --> 00:28:04,640 is considered acceptable. 645 00:28:04,640 --> 00:28:07,510 And you can also look at Ramachandran statistics-- 646 00:28:07,510 --> 00:28:10,760 so this is looking at phi-psi backbone angles-- 647 00:28:10,760 --> 00:28:16,290 to tell you if there are any steric clashes, say, 648 00:28:16,290 --> 00:28:19,290 for side chains that really shouldn't be there.
649 00:28:19,290 --> 00:28:21,690 And if you do have a steric clash nowadays, 650 00:28:21,690 --> 00:28:23,700 you have to report it to PDB. 651 00:28:23,700 --> 00:28:25,410 The only exception is for glycine, which 652 00:28:25,410 --> 00:28:26,493 doesn't have a side chain. 653 00:28:26,493 --> 00:28:30,840 And so it can adopt a strange phi psi angle that's 654 00:28:30,840 --> 00:28:33,240 outside the Ramachandran plot. 655 00:28:33,240 --> 00:28:36,520 Any questions on this? 656 00:28:36,520 --> 00:28:38,372 Anything else about X-ray crystallography? 657 00:28:38,372 --> 00:28:40,080 So this brings us to the end of the talk, 658 00:28:40,080 --> 00:28:43,830 and this is a resource that I found 659 00:28:43,830 --> 00:28:46,860 that has more information about X-ray crystallography, 660 00:28:46,860 --> 00:28:48,610 and the math behind X-ray crystallography, 661 00:28:48,610 --> 00:28:49,985 and more of the theory behind it. 662 00:28:49,985 --> 00:28:52,130 But again, I want to emphasize that today it's 663 00:28:52,130 --> 00:28:54,720 a very automated process in that you click a few buttons 664 00:28:54,720 --> 00:28:57,020 and anybody can do it.
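For reference, the phi and psi angles behind the Ramachandran statistics mentioned above are ordinary dihedral angles over four consecutive backbone atoms: phi is defined by C of the previous residue, N, C-alpha, and C, and psi by N, C-alpha, C, and N of the next residue. The sketch below shows one standard way to compute a signed dihedral from four 3D points. The function names are made up for illustration, and coordinates are assumed to be plain (x, y, z) tuples in Angstroms.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)

    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone angles for one residue from its own and neighboring atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)
```

A Ramachandran plot is then just a scatter of the (phi, psi) pairs for every residue, checked against the regions that avoid steric clashes.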