1 00:00:01,110 --> 00:00:04,830 The following content is provided under a Creative Commons license. 2 00:00:04,830 --> 00:00:09,300 Your support will help MIT OpenCourseWare continue to offer high quality educational 3 00:00:09,300 --> 00:00:11,070 resources for free. 4 00:00:11,070 --> 00:00:19,400 To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare 5 00:00:19,400 --> 00:00:21,430 at ocw.mit.edu. 6 00:00:21,430 --> 00:00:27,540 TADGE DRYJA: So today will be some sort of future technologies, future developments that 7 00:00:27,540 --> 00:00:31,090 might not be around yet, but are interesting things to look for. 8 00:00:31,090 --> 00:00:34,600 And if you're interested in researching these kinds of things, you want to make an MEng 9 00:00:34,600 --> 00:00:37,620 out of this or want to make a-- who knows? 10 00:00:37,620 --> 00:00:42,170 PhD probably is a little overkill for a lot of these things. 11 00:00:42,170 --> 00:00:47,530 But if you're interested in this kind of research, here's some things we'll talk about today-- 12 00:00:47,530 --> 00:00:53,890 block filters or committed Bloom filters, sharding, accumulators, and UTXO 13 00:00:53,890 --> 00:00:54,890 commitments. 14 00:00:54,890 --> 00:01:02,510 OK, so, first one I'll talk about is block filters. 15 00:01:02,510 --> 00:01:06,630 So I don't think I ever really talked about what a Bloom filter is. 16 00:01:06,630 --> 00:01:11,530 I'm still not really going to explain how they work, but the basic idea, the sort of 17 00:01:11,530 --> 00:01:14,170 high level-- here's the prototype. 18 00:01:14,170 --> 00:01:15,250 Here's the function prototype. 19 00:01:15,250 --> 00:01:17,890 Here's the interface that Bloom filters have. 20 00:01:17,890 --> 00:01:21,990 So you make a filter from a bunch of objects. 21 00:01:21,990 --> 00:01:24,408 And in this case, objects are just bytes, right? 22 00:01:24,408 --> 00:01:27,420 Or some string of bytes. 
23 00:01:27,420 --> 00:01:29,960 And Bloom filters usually use hash functions under the hood. 24 00:01:29,960 --> 00:01:32,408 A lot of times, they don't use cryptographic hash functions. 25 00:01:32,408 --> 00:01:36,310 They can use sort of faster functions, where there may be collisions. 26 00:01:36,310 --> 00:01:39,700 But in this case, it's not like a security problem. 27 00:01:39,700 --> 00:01:41,439 So you've got a bunch of objects. 28 00:01:41,439 --> 00:01:47,060 They might be addresses, which are 20 byte pubkey hashes, or your UTXOs, which you can 29 00:01:47,060 --> 00:01:53,689 represent as a 36 byte TXID and outpoint, or TXID and index. 30 00:01:53,689 --> 00:01:55,250 So they're small, right? 31 00:01:55,250 --> 00:01:59,229 So either 20 bytes, 36 bytes, sometimes 32 bytes. 32 00:01:59,229 --> 00:02:04,420 You put in this list of data objects and make a filter, and you get a filter out. 33 00:02:04,420 --> 00:02:09,840 And the filters are usually a kilobyte sometimes. 34 00:02:09,840 --> 00:02:13,010 But the idea is it's sort of a mix of hash functions. 35 00:02:13,010 --> 00:02:14,560 And then you say, OK, I want to match filters. 36 00:02:14,560 --> 00:02:18,980 Usually, a different person does this and says, OK, I've got a filter that someone generated. 37 00:02:18,980 --> 00:02:23,329 And I compare it against this object and see if there's a hit, all right? 38 00:02:23,329 --> 00:02:25,310 So I see, OK, that matched. 39 00:02:25,310 --> 00:02:29,660 This object matched the filter, or this object did not match the filter. 40 00:02:29,660 --> 00:02:33,260 And this returns a true or false. 41 00:02:33,260 --> 00:02:35,730 And so what's interesting about these is you can have false positives. 42 00:02:35,730 --> 00:02:39,989 So it may be that this object was not in here. 43 00:02:39,989 --> 00:02:45,569 It was not used when creating the filter, but it still returns true. 
44 00:02:45,569 --> 00:02:49,470 So it's sort of matching against multiple different hash functions, seeing, hey, do 45 00:02:49,470 --> 00:02:50,480 any of these bits match? 46 00:02:50,480 --> 00:02:55,110 And it says, oh, yeah, this object may have been in this filter. 47 00:02:55,110 --> 00:02:57,410 However, there's no false negatives. 48 00:02:57,410 --> 00:03:04,629 So if you did put, say, address A into this filter and then matched this filter against 49 00:03:04,629 --> 00:03:07,569 address A, it would always return true. 50 00:03:07,569 --> 00:03:14,140 There's no way you can put something into the filter, and then it doesn't show up when 51 00:03:14,140 --> 00:03:16,510 you try to match against it. 52 00:03:16,510 --> 00:03:19,349 So this is useful for lots of things. 53 00:03:19,349 --> 00:03:24,599 Bloom filters are definitely not restricted to cryptocurrency, Bitcoin, or anything like 54 00:03:24,599 --> 00:03:25,599 that. 55 00:03:25,599 --> 00:03:30,140 They're used all the time in various databases, all the time in things like that. 56 00:03:30,140 --> 00:03:36,260 The current way they're used in Bitcoin is for SPV filtering. 57 00:03:36,260 --> 00:03:38,620 So we defined SPV, like, months ago. 58 00:03:38,620 --> 00:03:41,250 The basic idea is I'm a client. 59 00:03:41,250 --> 00:03:43,900 I don't want to download and verify the whole blockchain. 60 00:03:43,900 --> 00:03:46,590 I want someone else to do that. 61 00:03:46,590 --> 00:03:48,420 I assume the miners are doing the right thing. 62 00:03:48,420 --> 00:03:51,410 I assume the rest of the network is doing the right thing. 63 00:03:51,410 --> 00:03:53,950 And I just want to know about my data. 64 00:03:53,950 --> 00:03:55,630 So I'm not going to download the whole block. 65 00:03:55,630 --> 00:03:59,980 I'm not going to verify all the signatures or keep a UTXO set on my own. 66 00:03:59,980 --> 00:04:04,310 I'm just concerned with my UTXOs in my wallet. 
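The make-filter and match-filter interface described above can be sketched in a few lines of Python. This is a toy illustration, not the filter Bitcoin actually uses: the class name `BloomFilter`, the helper names `make_filter` and `match_filter`, the default sizes, and the use of SHA-256 with a counter prefix are all assumptions chosen so the example needs only the standard library.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit field plus k hash-derived bit positions."""

    def __init__(self, size_bits: int, num_hashes: int):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, obj: bytes):
        # Derive k bit positions by hashing the object with k different prefixes.
        # (SHA-256 is an assumption here; real filters often use faster hashes.)
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + obj).digest()
            yield int.from_bytes(digest[:8], "little") % self.size_bits

    def add(self, obj: bytes) -> None:
        for pos in self._positions(obj):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def match(self, obj: bytes) -> bool:
        # True only if every derived bit is set. Bits set by other objects can
        # cause a false positive, but an added object can never be missed.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(obj))


def make_filter(objects, size_bits: int = 8192, num_hashes: int = 5) -> BloomFilter:
    """Build a filter from an iterable of byte strings."""
    bloom = BloomFilter(size_bits, num_hashes)
    for obj in objects:
        bloom.add(obj)
    return bloom


def match_filter(bloom: BloomFilter, obj: bytes) -> bool:
    """Return True if obj may have been used to build the filter."""
    return bloom.match(obj)


addresses = [bytes([i]) * 20 for i in range(5)]    # five made-up 20-byte hashes
f = make_filter(addresses)
print(all(match_filter(f, a) for a in addresses))  # always True: no false negatives
print(match_filter(f, b"\xff" * 20))               # usually False, but may be a false positive
```

The 8192-bit default corresponds to the "about a kilobyte" filter size mentioned above.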
67 00:04:04,310 --> 00:04:07,120 So what I do-- and this exists today. 68 00:04:07,120 --> 00:04:09,110 You can do this. 69 00:04:09,110 --> 00:04:12,420 You make a Bloom filter of all your UTXOs and addresses. 70 00:04:12,420 --> 00:04:17,079 So you say, OK, here's all my addresses that I'm hoping to receive money on. 71 00:04:17,079 --> 00:04:22,210 I've got 20 of them, 30 of them, 100 of them-- however many-- although Bloom filters don't 72 00:04:22,210 --> 00:04:23,839 really work once you have too many. 73 00:04:23,839 --> 00:04:28,210 But so the idea is if you have five addresses, let's say, you start a wallet. 74 00:04:28,210 --> 00:04:29,539 You don't have any money. 75 00:04:29,539 --> 00:04:33,379 You're pretty sure you don't know of any money yet existing. 76 00:04:33,379 --> 00:04:36,199 But you say, OK, I made five addresses. 77 00:04:36,199 --> 00:04:37,490 I told these addresses to people. 78 00:04:37,490 --> 00:04:38,569 They might have sent me money. 79 00:04:38,569 --> 00:04:39,669 That'd be nice. 80 00:04:39,669 --> 00:04:42,499 So you make a Bloom filter of all these addresses, right? 81 00:04:42,499 --> 00:04:51,058 So let's say you've got, OK, address A-- dot dot dot-- A through E. And you make a filter. 82 00:04:51,058 --> 00:04:58,729 OK, so you say I've got my filter F. I then send that filter F to a remote server. 83 00:04:58,729 --> 00:04:59,879 So there's the cloud. 84 00:04:59,879 --> 00:05:04,300 There's some Bitcoin full node out here. 85 00:05:04,300 --> 00:05:10,659 It receives filter F. OK, so this got filter F. And it knows that this filter F is specific 86 00:05:10,659 --> 00:05:11,759 to me, right? 87 00:05:11,759 --> 00:05:17,589 So it sees that, hey, I'm a client. 88 00:05:17,589 --> 00:05:18,899 I'm an SPV client. 89 00:05:18,899 --> 00:05:23,069 I connect to a full node, and I say, hey, I sent a message that's called load filter. 90 00:05:23,069 --> 00:05:28,639 And I say, hey, load filter, here's my filter F. 
Only send stuff to me that matches this 91 00:05:28,639 --> 00:05:30,129 filter. 92 00:05:30,129 --> 00:05:35,058 99% of all the whole thing-- everything going on in Bitcoin, I don't care about. 93 00:05:35,058 --> 00:05:39,300 Here's a filter, and only send me messages that match this filter. 94 00:05:39,300 --> 00:05:45,800 And so when it sends a transact-- well, they send INV messages, right? 95 00:05:45,800 --> 00:05:49,740 When it sends an INV message with inventory, saying, hey, I found this thing that you might 96 00:05:49,740 --> 00:05:55,210 be interested in, normally, a full node to another full node-- so let's say there's communication 97 00:05:55,210 --> 00:06:00,779 between two full nodes, Full2. 98 00:06:00,779 --> 00:06:04,210 These guys will talk to each other about every transaction they see, right? 99 00:06:04,210 --> 00:06:07,089 So if they see a transaction, it's valid. 100 00:06:07,089 --> 00:06:08,539 There's nothing wrong with it. 101 00:06:08,539 --> 00:06:12,069 They'll just send it to each other to propagate transaction throughout the network, so they 102 00:06:12,069 --> 00:06:13,339 can get mined later. 103 00:06:13,339 --> 00:06:19,219 However, if a filter has been loaded, it says, oh, OK, I will only send you INV messages 104 00:06:19,219 --> 00:06:20,749 that match this filter. 105 00:06:20,749 --> 00:06:25,300 So if there's a transaction that doesn't have any of these five addresses as an output, 106 00:06:25,300 --> 00:06:28,800 they're just not going to send it to you. 107 00:06:28,800 --> 00:06:33,289 Similarly, when a block comes out-- this is the big one. 108 00:06:33,289 --> 00:06:35,620 Normally, blocks get propagated the same. 109 00:06:35,620 --> 00:06:37,949 They just send you the block. 110 00:06:37,949 --> 00:06:41,649 Here, we do something called a Merkle block. 111 00:06:41,649 --> 00:06:43,889 I don't send you a regular block. 
112 00:06:43,889 --> 00:06:51,129 I filter everything within the block and send you only the things that match to that filter. 113 00:06:51,129 --> 00:06:54,810 So generally, it gets very small. 114 00:06:54,810 --> 00:06:57,779 So the Merkle block might just have one transaction in it. 115 00:06:57,779 --> 00:07:01,419 And it has the sort of Merkle proof up to the root. 116 00:07:01,419 --> 00:07:06,289 So the server sends only the matching transactions in the block, which can drop from a megabyte 117 00:07:06,289 --> 00:07:08,189 to less than a kilobyte. 118 00:07:08,189 --> 00:07:10,509 And then the client says, oh, cool. 119 00:07:10,509 --> 00:07:12,779 There's a transaction where I received money. 120 00:07:12,779 --> 00:07:13,779 Great. 121 00:07:13,779 --> 00:07:20,089 Also, let me update my filter to include this new transaction that I received. 122 00:07:20,089 --> 00:07:23,900 So if that gets spent later, I want to know about it, even if it doesn't send to one of 123 00:07:23,900 --> 00:07:25,150 these five addresses, right? 124 00:07:25,150 --> 00:07:29,360 So if you're only matching on addresses, you can only sort of get money. 125 00:07:29,360 --> 00:07:34,539 But if you're matching on these UTXOs, you can lose the money as well. 126 00:07:34,539 --> 00:07:38,479 And you don't want to lose money, but you sort of want to know when everyone else thinks 127 00:07:38,479 --> 00:07:41,319 you lost money. 128 00:07:41,319 --> 00:07:42,659 So this works today. 129 00:07:42,659 --> 00:07:44,539 This was implemented 2012-ish. 130 00:07:44,539 --> 00:07:51,169 The history behind it was the first Android Bitcoin wallet. 131 00:07:51,169 --> 00:07:54,930 Andreas Schildbach wrote it, and it didn't do this, right? 132 00:07:54,930 --> 00:08:00,800 It just downloaded the whole block and then threw away most of the data and only kept-- 133 00:08:00,800 --> 00:08:04,229 it wasn't a full node, and then it didn't keep a UTXO set. 
134 00:08:04,229 --> 00:08:07,789 But it did download everything. 135 00:08:07,789 --> 00:08:09,739 And then they were saying, OK, this is really slow. 136 00:08:09,739 --> 00:08:14,839 We want a decentralized way to do this kind of thing, where instead of just connecting 137 00:08:14,839 --> 00:08:21,889 to a server-- so the other model I explained, again, weeks ago was, you just have some server. 138 00:08:21,889 --> 00:08:24,119 And you tell it the address and say, hey, here's my address. 139 00:08:24,119 --> 00:08:25,469 How much money do I have? 140 00:08:25,469 --> 00:08:29,759 And it sends you transactions, and you maintain your wallet that way. 141 00:08:29,759 --> 00:08:32,200 This is nicer because it's decentralized, right? 142 00:08:32,200 --> 00:08:33,909 Every full node can do this. 143 00:08:33,909 --> 00:08:39,209 And by default, if you download Bitcoin 0.16 or whatever recent versions-- not even recent-- 144 00:08:39,209 --> 00:08:40,269 since like 0.7? 145 00:08:40,269 --> 00:08:42,149 I don't know. 146 00:08:42,149 --> 00:08:49,399 Most of the versions will have this capability, where if a client says, hey, here's a Bloom 147 00:08:49,399 --> 00:08:50,399 filter. 148 00:08:50,399 --> 00:08:51,399 Load it. 149 00:08:51,399 --> 00:08:52,820 Your full node will load that filter. 150 00:08:52,820 --> 00:08:57,509 And then every block that comes in or every block that's requested, they will match against 151 00:08:57,509 --> 00:08:58,509 the filter. 152 00:08:58,509 --> 00:09:00,899 The filter match function call's not too heavy. 153 00:09:00,899 --> 00:09:02,279 It involves a bunch of hash functions. 154 00:09:02,279 --> 00:09:05,690 It's not too slow, but it's slower than doing nothing, right? 155 00:09:05,690 --> 00:09:07,550 It's slower than just sending it directly. 156 00:09:07,550 --> 00:09:09,870 OK, so this is nice, right? 157 00:09:09,870 --> 00:09:17,490 You can sync the entire chain in way less data and having SPV security. 
158 00:09:17,490 --> 00:09:20,660 Problems-- it's really bad for privacy. 159 00:09:20,660 --> 00:09:22,220 You're sending a Bloom filter, right? 160 00:09:22,220 --> 00:09:26,810 So it's this thing that's created from your list of addresses. 161 00:09:26,810 --> 00:09:33,310 But in practice, it's got about the same security as just telling them all your addresses, right? 162 00:09:33,310 --> 00:09:36,650 So it's sort of like, oh, I'm sending a hash of my address, instead of my address. 163 00:09:36,650 --> 00:09:41,100 Well, yeah, but I know all the addresses in existence on the Bitcoin network. 164 00:09:41,100 --> 00:09:45,640 I can just try to match it, try to hash and stuff like that. 165 00:09:45,640 --> 00:09:49,440 When you do Bloom filters, there's this sort of false positive rate that's sort of a knob 166 00:09:49,440 --> 00:09:50,440 you can twist. 167 00:09:50,440 --> 00:09:56,000 And you can say, oh, I'm going to make a Bloom filter where 10% of the time, when you perform 168 00:09:56,000 --> 00:10:02,680 this match filter for any given object, I'm going to create a filter where 10% of the 169 00:10:02,680 --> 00:10:03,920 time, it'll just return true. 170 00:10:03,920 --> 00:10:05,610 So I can dial in a false positive rate. 171 00:10:05,610 --> 00:10:08,810 So I can say, OK, I'll make it 1%. 172 00:10:08,810 --> 00:10:17,029 And then when I get these matching transactions from the full node, yeah, I'll get an extra 173 00:10:17,029 --> 00:10:19,449 few transactions that don't match my filter. 174 00:10:19,449 --> 00:10:24,160 Or they match my filter, but they don't actually match anything I'm looking at. 175 00:10:24,160 --> 00:10:26,090 And that will improve my privacy, right? 176 00:10:26,090 --> 00:10:32,100 Because then the full node doesn't see what's-- he doesn't know what's truly mine. 177 00:10:32,100 --> 00:10:35,329 In fact, you don't even know what the false positive rate is. 
178 00:10:35,329 --> 00:10:39,660 When you receive a filter, you don't know what the false positive rate is. 179 00:10:39,660 --> 00:10:43,079 You just see, OK, these things match, and you send them. 180 00:10:43,079 --> 00:10:48,459 Another strategy is OK, I'm a SPV client. 181 00:10:48,459 --> 00:10:55,540 I connect to a bunch of full nodes, and I can give different filters to each one. 182 00:10:55,540 --> 00:10:58,120 I think the initial software did this. 183 00:10:58,120 --> 00:11:03,089 And because you can sort of put some randomness into your different filters to hope-- why 184 00:11:03,089 --> 00:11:04,790 did they do this? 185 00:11:04,790 --> 00:11:06,660 It actually makes it worse. 186 00:11:06,660 --> 00:11:12,339 It actually makes it worse because if these full nodes collude-- collaborate-- whatever. 187 00:11:12,339 --> 00:11:17,339 If these full nodes share the information of the filters, it makes it easier to determine, 188 00:11:17,339 --> 00:11:19,069 to sort of filter out the false positives. 189 00:11:19,069 --> 00:11:22,700 Because they'll have different false positives because they had a different filter. 190 00:11:22,700 --> 00:11:28,570 And so if they collaborate, if they work together, they can say, oh, well, I got some false positive 191 00:11:28,570 --> 00:11:29,570 transactions. 192 00:11:29,570 --> 00:11:30,570 I did, too. 193 00:11:30,570 --> 00:11:34,990 We can filter out the ones that one of us had as a false positive and not the other, 194 00:11:34,990 --> 00:11:35,990 right? 195 00:11:35,990 --> 00:11:38,540 So we can detect the false positives we're sending to the client. 196 00:11:38,540 --> 00:11:41,029 So privacy is really bad. 197 00:11:41,029 --> 00:11:46,190 There's a paper written I think 2013 or 2014, where they basically broke the whole privacy 198 00:11:46,190 --> 00:11:49,779 argument for this Bloom filter based SPV. 
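The collusion argument above can be illustrated with a small simulation. It is only a sketch under invented assumptions: the filter is deliberately tiny so that false positives are common, the `tweak` parameter stands in for whatever randomness the client mixes into each filter, and the "chain" is 2,000 made-up byte strings rather than real addresses.

```python
import hashlib

SIZE_BITS = 64    # deliberately tiny so false positives are common
NUM_HASHES = 2


def positions(obj: bytes, tweak: int) -> set:
    """Bit positions for obj under a given tweak (different tweak, different bits)."""
    return {
        int.from_bytes(hashlib.sha256(bytes([tweak, i]) + obj).digest()[:8], "little") % SIZE_BITS
        for i in range(NUM_HASHES)
    }


def make_filter(objects, tweak: int) -> set:
    """Represent the filter as the set of bit positions that are switched on."""
    bits = set()
    for obj in objects:
        bits |= positions(obj, tweak)
    return bits


def matches(filter_bits: set, obj: bytes, tweak: int) -> bool:
    return positions(obj, tweak) <= filter_bits


wallet = [b"mine-%d" % i for i in range(5)]
others = [b"other-%d" % i for i in range(2000)]
chain = wallet + others

# The client hands a differently tweaked filter to each of two full nodes.
filter_a = make_filter(wallet, tweak=1)
filter_b = make_filter(wallet, tweak=2)

hits_a = {o for o in chain if matches(filter_a, o, 1)}
hits_b = {o for o in chain if matches(filter_b, o, 2)}

# If the two nodes compare notes, only objects matching BOTH filters survive.
colluding = hits_a & hits_b
print(len(hits_a), len(hits_b), len(colluding))
```

Every wallet entry is in both hit sets, while a false positive has to get lucky twice to stay in the intersection, so the intersection is never larger than either node's own list.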
199 00:11:49,779 --> 00:11:53,759 And they said in practice, you can get like 90 something percent of the addresses and 200 00:11:53,759 --> 00:11:57,439 UTXOs that people are sending for the software. 201 00:11:57,439 --> 00:12:02,540 And their recommendations were like, yeah, I don't see how you make this work privately. 202 00:12:02,540 --> 00:12:05,490 There's just no privacy here. 203 00:12:05,490 --> 00:12:10,470 It's slow for the servers, so when you're running-- I don't know if you can see. 204 00:12:10,470 --> 00:12:15,300 When you're running a full node-- OK, I should close that. 205 00:12:15,300 --> 00:12:17,579 Don't even know what that is. 206 00:12:17,579 --> 00:12:29,269 When you're running a full node, you can see here's all the nodes that are connected to 207 00:12:29,269 --> 00:12:32,790 my full node and running downstairs. 208 00:12:32,790 --> 00:12:35,240 Most of them are other full nodes. 209 00:12:35,240 --> 00:12:36,449 Oh, OK, these are not. 210 00:12:36,449 --> 00:12:37,730 I don't know what those are. 211 00:12:37,730 --> 00:12:39,310 Those are not true. 212 00:12:39,310 --> 00:12:42,139 All these 0.9, 0.99s, they're not actually nodes at all. 213 00:12:42,139 --> 00:12:44,670 They don't seem to ask for anything. 214 00:12:44,670 --> 00:12:46,749 And then some people put their version message. 215 00:12:46,749 --> 00:12:47,990 They put an address. 216 00:12:47,990 --> 00:12:49,800 Hopefully, someone will send them lots of money. 217 00:12:49,800 --> 00:12:52,649 It's not going to happen. 218 00:12:52,649 --> 00:12:53,800 That's weird. 219 00:12:53,800 --> 00:12:57,620 There's like no SPV-- well, OK, bitcoinj. 220 00:12:57,620 --> 00:13:03,060 So that's a Java implementation, which does filter load. 221 00:13:03,060 --> 00:13:05,819 And so we can look-- hm. 222 00:13:05,819 --> 00:13:14,829 A fee filter, but no filter load. 223 00:13:14,829 --> 00:13:17,069 Well, that's weird. 
224 00:13:17,069 --> 00:13:19,829 Someone's sending me lots of different fee filter messages. 225 00:13:19,829 --> 00:13:21,699 There's lots of weird stuff going on in the Bitcoin network. 226 00:13:21,699 --> 00:13:28,529 But I weirdly don't have any SPV filter load things going on right now, which is unusual. 227 00:13:28,529 --> 00:13:31,560 I don't know. 228 00:13:31,560 --> 00:13:40,320 It changes a lot, too, based on-- so also, you can sort of track your hourly data usage. 229 00:13:40,320 --> 00:13:45,829 And this server basically is only for-- all this traffic is the clients connected. 230 00:13:45,829 --> 00:13:48,019 You know, Bitcoin. 231 00:13:48,019 --> 00:13:54,389 And so I'm doing, like, a gigabyte every hour, which is a lot for a home connection, but 232 00:13:54,389 --> 00:13:55,680 not too bad for here. 233 00:13:55,680 --> 00:13:58,480 But in December, everyone was interested in Bitcoin. 234 00:13:58,480 --> 00:14:01,329 And so you had lots of people downloading it, running it. 235 00:14:01,329 --> 00:14:05,759 And it would be something like 10 times this, where just tons of people were installing 236 00:14:05,759 --> 00:14:10,819 it, downloading it, getting a whole blockchain, and then probably after losing interest, deleting 237 00:14:10,819 --> 00:14:12,449 it, but whatever. 238 00:14:12,449 --> 00:14:17,550 And a lot of SPV clients doing filter loads can slow down your server. 239 00:14:17,550 --> 00:14:20,749 It can take CPU time. 240 00:14:20,749 --> 00:14:24,499 Right now, yeah, OK, so 6%, 2%. 241 00:14:24,499 --> 00:14:31,100 It's pretty low CPU usage generally for Bitcoin, even with that many-- however many that was-- 242 00:14:31,100 --> 00:14:34,920 30 or 40 different other nodes connecting and downloading stuff. 243 00:14:34,920 --> 00:14:39,580 Basically, this is like receiving transactions, verifying the signatures, and sending them 244 00:14:39,580 --> 00:14:40,580 out. 
245 00:14:40,580 --> 00:14:42,529 And granted-- but this is per core, right? 246 00:14:42,529 --> 00:14:45,790 So if I have 2%, that's 2% of a single core. 247 00:14:45,790 --> 00:14:47,089 It's really not much. 248 00:14:47,089 --> 00:14:52,610 And then I guess Cryptokernel's only using 1%, so even less. 249 00:14:52,610 --> 00:15:00,279 So yeah, it's not much, but when you have a lot of SPV clients, it can start using a 250 00:15:00,279 --> 00:15:02,009 lot of CPU. 251 00:15:02,009 --> 00:15:05,569 OK, so how do we improve this? 252 00:15:05,569 --> 00:15:06,860 This is a new-ish idea. 253 00:15:06,860 --> 00:15:08,769 It's actually about two years old. 254 00:15:08,769 --> 00:15:10,490 And it was kind of interesting. 255 00:15:10,490 --> 00:15:16,709 It was just a random anonymous internet person posted on the mailing list. 256 00:15:16,709 --> 00:15:21,300 I think his email address was some inappropriate swear word or something. 257 00:15:21,300 --> 00:15:22,570 Anyway. 258 00:15:22,570 --> 00:15:28,410 But whoever this person was just said, hey, why don't we do it the other way? 259 00:15:28,410 --> 00:15:30,170 Why don't we do it backwards? 260 00:15:30,170 --> 00:15:36,019 And instead of having the client create a Bloom filter and send it to the full node, 261 00:15:36,019 --> 00:15:42,139 have the full nodes make Bloom filters from all the transactions within a block. 262 00:15:42,139 --> 00:15:45,399 And then the client will just ask for that filter. 263 00:15:45,399 --> 00:15:51,540 The client can then perform the filter match function on their own UTXO set. 264 00:15:51,540 --> 00:15:55,209 And then if they do find a match, they request the entire block. 265 00:15:55,209 --> 00:15:56,319 Right? 266 00:15:56,319 --> 00:15:58,370 So this is a different model. 267 00:15:58,370 --> 00:15:59,570 I don't know. 268 00:15:59,570 --> 00:16:04,730 Does it get the idea where, OK, so you're a client. 269 00:16:04,730 --> 00:16:06,279 All you do is request filters. 
270 00:16:06,279 --> 00:16:08,619 So you say, filter please. 271 00:16:08,619 --> 00:16:13,089 So you have some kind of filter request. 272 00:16:13,089 --> 00:16:18,170 And then the full node just says, OK, for every block in the blockchain, I'm going to 273 00:16:18,170 --> 00:16:21,240 create a filter, right? 274 00:16:21,240 --> 00:16:26,750 I take all the objects in the block, which are basically all the addresses used in every 275 00:16:26,750 --> 00:16:29,990 transaction, all the UTXOs spent in every transaction. 276 00:16:29,990 --> 00:16:35,190 I concatenate that, so there's going to be 5,000, 10,000-- a lot of these objects. 277 00:16:35,190 --> 00:16:39,820 Put it into a really big Bloom filter, generally bigger than the ones used in this method. 278 00:16:39,820 --> 00:16:45,819 Because usually, a wallet won't have thousands of addresses or thousands of UTXOs. 279 00:16:45,819 --> 00:16:50,910 It's possible, but in this model, usually, you've got 20, 30, maybe 100. 280 00:16:50,910 --> 00:16:53,319 But in this case, you're going to have thousands. 281 00:16:53,319 --> 00:16:58,459 Make a larger filter, create the filter, and store it for each block. 282 00:16:58,459 --> 00:17:00,750 So maybe it's 20 kilobytes or something. 283 00:17:00,750 --> 00:17:03,369 And maybe in this case, they're only like 1 kilobyte. 284 00:17:03,369 --> 00:17:04,500 So you have a filter. 285 00:17:04,500 --> 00:17:07,939 And then the node will request these filters for every block. 286 00:17:07,939 --> 00:17:08,939 So it's OK, this block, get the filter. 287 00:17:08,939 --> 00:17:10,359 Get the filter. 288 00:17:10,359 --> 00:17:13,720 And then perform the matching on their own, right? 289 00:17:13,720 --> 00:17:15,520 So they've got the filter. 290 00:17:15,520 --> 00:17:18,619 They see, hey, does this filter match any of my addresses? 
291 00:17:18,619 --> 00:17:23,689 So is there anything in this block that may have paid me or anything in this block where 292 00:17:23,689 --> 00:17:26,940 my transactions may have been spent? 293 00:17:26,940 --> 00:17:29,560 And if they get a true, they just request the whole block. 294 00:17:29,560 --> 00:17:33,309 They just download the whole 1 megabyte block or whatever it is. 295 00:17:33,309 --> 00:17:35,070 And there may be false positives, right? 296 00:17:35,070 --> 00:17:38,260 So they might be downloading the block for no reason. 297 00:17:38,260 --> 00:17:40,890 They download the whole block, see there's nothing of their address. 298 00:17:40,890 --> 00:17:41,890 Yeah. 299 00:17:41,890 --> 00:17:48,492 AUDIENCE: How does that work if we increase the block size to a ridiculous amount? 300 00:17:48,492 --> 00:17:49,492 [INAUDIBLE] 301 00:17:49,492 --> 00:17:50,492 TADGE DRYJA: Wait, ridiculous amount block size? 302 00:17:50,492 --> 00:17:51,492 Yeah, I guess. 303 00:17:51,492 --> 00:17:52,492 AUDIENCE: [INAUDIBLE] 304 00:17:52,492 --> 00:17:58,640 TADGE DRYJA: Well, I mean, they're not going to actually-- I don't think any of these have 305 00:17:58,640 --> 00:18:02,640 actually 32 megabyte blocks and that no one's using them. 306 00:18:02,640 --> 00:18:08,309 Even Bitcoin now, right, where the actual block usage has gone down substantially. 307 00:18:08,309 --> 00:18:12,649 You've got like-- I don't know-- 500k or something average now? 308 00:18:12,649 --> 00:18:13,649 It's low. 309 00:18:13,649 --> 00:18:18,930 Occasionally, you have lots of little transactions saturating the mempool and then full blocks 310 00:18:18,930 --> 00:18:20,630 for a few hours. 311 00:18:20,630 --> 00:18:22,760 But it's gone down, and it's not even at full usage. 
312 00:18:22,760 --> 00:18:27,220 And Bitcoin Cash, yeah, you've got 8 megabyte max size, but-- 313 00:18:27,220 --> 00:18:28,270 AUDIENCE: [INAUDIBLE] 314 00:18:28,270 --> 00:18:30,380 TADGE DRYJA: Right, right. 315 00:18:30,380 --> 00:18:32,500 And they're going to do 32 megs full sized. 316 00:18:32,500 --> 00:18:37,610 But still, the actual blocks are like 10k, 20k, whatever. 317 00:18:37,610 --> 00:18:45,480 I mean, yeah, but if you did have actually 32 megabyte blocks, a false positive would 318 00:18:45,480 --> 00:18:47,950 be a big problem for a light node. 319 00:18:47,950 --> 00:18:51,950 Because now you have to download this 32 megabyte block, look through the whole thing. 320 00:18:51,950 --> 00:18:54,150 Actually, there was nothing of interest. 321 00:18:54,150 --> 00:18:56,130 It was a false positive. 322 00:18:56,130 --> 00:18:59,380 OK, try again. 323 00:18:59,380 --> 00:19:01,440 But yeah, you can download 32 megs. 324 00:19:01,440 --> 00:19:03,059 It's not the end of the world. 325 00:19:03,059 --> 00:19:08,310 And if you're only downloading one out of every 100 maybe, if you have a 1% false positive 326 00:19:08,310 --> 00:19:10,070 rate, it's not too bad. 327 00:19:10,070 --> 00:19:13,770 That's like 32 megs a day or so. 328 00:19:13,770 --> 00:19:20,059 But anyway this model is-- and not only that, you can request all the filters, match them, 329 00:19:20,059 --> 00:19:23,840 and then download the full block from someone else, right? 330 00:19:23,840 --> 00:19:27,260 You can request a block, download it from someone else. 331 00:19:27,260 --> 00:19:30,310 So this full node, they know you requested all the filters. 332 00:19:30,310 --> 00:19:33,399 They don't see anything else other than that. 333 00:19:33,399 --> 00:19:38,150 And then another full node just sees you requesting blocks and thinks nothing of it. 334 00:19:38,150 --> 00:19:39,600 Because that's totally normal. 
335 00:19:39,600 --> 00:19:44,490 This is a lot nicer model for privacy because the full nodes don't learn anything other. 336 00:19:44,490 --> 00:19:48,600 At most, they learn you downloaded this block and not this other block. 337 00:19:48,600 --> 00:19:52,240 So maybe you have transactions in this block, but not this one. 338 00:19:52,240 --> 00:19:57,580 But that's a much bigger sort of needle in a haystack problem, where OK, here are the 339 00:19:57,580 --> 00:19:58,630 blocks they used. 340 00:19:58,630 --> 00:20:03,260 What are the commonalities between these sets of blocks they were downloading? 341 00:20:03,260 --> 00:20:05,179 Possible to maybe weed things out, right? 342 00:20:05,179 --> 00:20:09,390 If the blocks are small, and there's only a few transactions, and they download the 343 00:20:09,390 --> 00:20:13,799 entire blockchain from a single node, and that node can track, OK, which blocks are 344 00:20:13,799 --> 00:20:14,799 being downloaded? 345 00:20:14,799 --> 00:20:17,510 What are the common transactions or addresses in these? 346 00:20:17,510 --> 00:20:18,510 It's possible. 347 00:20:18,510 --> 00:20:24,899 But it's a lot better for privacy than the current model and a lot better for CPU, right? 348 00:20:24,899 --> 00:20:26,690 So privacy-- great. 349 00:20:26,690 --> 00:20:28,490 Great improvement. 350 00:20:28,490 --> 00:20:34,960 The server has much lower CPU, because it can pre-compute all the filters for every 351 00:20:34,960 --> 00:20:36,060 block, right? 352 00:20:36,060 --> 00:20:39,770 So as soon as it downloads a block-- or a few seconds later-- there's no rush. 353 00:20:39,770 --> 00:20:44,010 Compute a Bloom filter for it, store it on disk-- because it doesn't change-- and then 354 00:20:44,010 --> 00:20:47,590 when anyone requests it, you've already got it on disk. 355 00:20:47,590 --> 00:20:50,580 So you just read it off the disk, send it over the network, you're done. 
356 00:20:50,580 --> 00:20:55,240 The current model where you don't write these to disk because they're sort of client specific, 357 00:20:55,240 --> 00:21:00,190 so client connects in, sends you a Bloom filter, you have to keep that in RAM, and then match 358 00:21:00,190 --> 00:21:03,130 all the things against this specific filter for this specific user. 359 00:21:03,130 --> 00:21:09,510 Whereas in this model, you make a filter for the block, save it, you're good. 360 00:21:09,510 --> 00:21:10,510 Yes. 361 00:21:10,510 --> 00:21:20,060 AUDIENCE: Is the filter for the block [INAUDIBLE] given you're just sending the addresses and 362 00:21:20,060 --> 00:21:21,060 not [INAUDIBLE]? 363 00:21:21,060 --> 00:21:22,660 Or is it just less data? 364 00:21:22,660 --> 00:21:25,340 It's just not the full block, right? 365 00:21:25,340 --> 00:21:26,340 It's just the-- 366 00:21:26,340 --> 00:21:27,340 TADGE DRYJA: Yeah. 367 00:21:27,340 --> 00:21:29,529 So if it were the addresses and the UTXOs, then it would be really big. 368 00:21:29,529 --> 00:21:34,730 It would be-- oh, maybe like 40% of the whole block size. 369 00:21:34,730 --> 00:21:38,799 So the idea of the Bloom filter is you squish it down. 370 00:21:38,799 --> 00:21:43,750 So the Bloom filter itself might only be like 20k. 371 00:21:43,750 --> 00:21:51,809 And so yeah, the basic way a Bloom filter will work is you take sort of a bunch of hashes 372 00:21:51,809 --> 00:21:55,370 and populate a bit field with them. 373 00:21:55,370 --> 00:22:00,210 So the thing is, if you keep adding objects to the filter, the filter will eventually 374 00:22:00,210 --> 00:22:02,510 just be like FFFF. 375 00:22:02,510 --> 00:22:03,840 And everything will match it. 376 00:22:03,840 --> 00:22:10,179 So you need to sort of decide how big the filter should be when you start creating and 377 00:22:10,179 --> 00:22:11,440 adding objects to it. 
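The sizing decision mentioned above can be made concrete with the standard Bloom filter approximations: for n objects and a target false positive rate p, the usual choice is m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, and the expected false positive rate after n insertions is (1 - e^(-kn/m))^k. The numbers below (10,000 objects per block, a 0.1% target) are assumptions picked for illustration, not figures from the lecture.

```python
import math


def optimal_size_bits(n: int, p: float) -> int:
    """Bits needed to hold n objects at target false positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))


def optimal_num_hashes(m: int, n: int) -> int:
    """Number of hash functions that minimizes false positives for m bits, n objects."""
    return max(1, round(m / n * math.log(2)))


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Expected false positive rate after inserting n objects."""
    return (1 - math.exp(-k * n / m)) ** k


n, p = 10_000, 0.001           # assumed: 10,000 objects per block, 0.1% target
m = optimal_size_bits(n, p)
k = optimal_num_hashes(m, n)
print(m // 8, "bytes,", k, "hash functions")

# Keep inserting past the design point and the filter saturates:
# the bit field approaches all ones and nearly everything matches.
for load in (n, 2 * n, 5 * n, 20 * n):
    print(load, "objects ->", round(false_positive_rate(m, k, load), 4))
```

Under these assumptions the filter comes out at roughly 18 kilobytes, the same ballpark as the 20k figure mentioned above.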
378 00:22:11,440 --> 00:22:17,730 So with this, you can get it down to about 20k, and then it'll have a pretty low false 379 00:22:17,730 --> 00:22:24,450 positive rate, but not tell you exactly what the addresses and UTXOs were. 380 00:22:24,450 --> 00:22:27,520 So it's a nice trade-off to have. 381 00:22:27,520 --> 00:22:33,490 Yeah, so I mean, the perfect sort of easiest Bloom filter would be, here's a list of all 382 00:22:33,490 --> 00:22:39,059 the addresses and all the UTXOs-- basically a block minus the signatures, which is sort 383 00:22:39,059 --> 00:22:41,570 of what you can get with segwit, where you say, hey. 384 00:22:41,570 --> 00:22:46,539 In segwit, you can say, hey, give me the block without all the segwit data. 385 00:22:46,539 --> 00:22:48,370 Because I'm just looking for things. 386 00:22:48,370 --> 00:22:51,940 I don't want to actually validate the signatures. 387 00:22:51,940 --> 00:22:57,010 So if you did that with regular-- yeah, it drops it by about 50%. 388 00:22:57,010 --> 00:23:02,080 But the Bloom filter drops it substantially more. 389 00:23:02,080 --> 00:23:07,299 So if you did this, it'd be better-- lower CPU for the server. 390 00:23:07,299 --> 00:23:09,610 It's also harder to lie and omit things. 391 00:23:09,610 --> 00:23:15,470 So in the current SPV model, if someone is running a client-- it says, hey, here's my 392 00:23:15,470 --> 00:23:16,690 Bloom filter. 393 00:23:16,690 --> 00:23:18,260 And the full node responds. 394 00:23:18,260 --> 00:23:22,710 The full node can easily just omit things. 395 00:23:22,710 --> 00:23:26,529 So there was a transaction that did hit the Bloom filter and did match. 396 00:23:26,529 --> 00:23:33,740 And address A was present in a transaction, and the full node just doesn't send it. 397 00:23:33,740 --> 00:23:39,940 There's really nothing the client can do to detect that kind of thing, which in general, 398 00:23:39,940 --> 00:23:42,429 it's not the end of the world. 
399 00:23:42,429 --> 00:23:47,440 In normal Bitcoin usage, if you don't hear about a transaction, maybe you'll hear about 400 00:23:47,440 --> 00:23:49,580 it eventually. 401 00:23:49,580 --> 00:23:53,019 The worst they can do is sort of lie about you-- they say you didn't get paid, but you 402 00:23:53,019 --> 00:23:54,059 actually did. 403 00:23:54,059 --> 00:23:58,630 Not the worst thing in the world, although in Lightning Network, that can change a little 404 00:23:58,630 --> 00:24:03,840 bit in that you want to know about the transactions that potentially close the channel and immediately 405 00:24:03,840 --> 00:24:04,880 respond to them. 406 00:24:04,880 --> 00:24:08,920 So that's more of a security problem with Lightning. 407 00:24:08,920 --> 00:24:15,850 So with this, it could be harder to omit things, especially if you commit the Bloom filter 408 00:24:15,850 --> 00:24:17,970 into the Coinbase transaction. 409 00:24:17,970 --> 00:24:23,141 So if this committed filter becomes like a consensus rule, and you say, OK, everyone 410 00:24:23,141 --> 00:24:25,519 makes a 20 kilobyte Bloom filter. 411 00:24:25,519 --> 00:24:30,620 Everyone takes the hash of that and puts it into an op return in the Coinbase transaction 412 00:24:30,620 --> 00:24:33,000 the way they do with segwit. 413 00:24:33,000 --> 00:24:34,000 Then it's a consensus rule. 414 00:24:34,000 --> 00:24:40,289 Then it becomes essentially impossible for the full nodes to lie or omit anything. 415 00:24:40,289 --> 00:24:44,150 Because you can say, hey, I've got the headers. 416 00:24:44,150 --> 00:24:47,760 Give me the Coinbase transaction and a Merkle proof for it. 417 00:24:47,760 --> 00:24:49,090 And now I've got the Coinbase transaction. 418 00:24:49,090 --> 00:24:56,840 And now, hey, give me that filter that matches this committed hash in the Coinbase transaction. 419 00:24:56,840 --> 00:25:01,429 So they would have to do valid proof of work to lie or omit.
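The check a light client would run in that committed-filter scheme can be sketched as follows. This is a hedged illustration rather than any deployed protocol: the function names are made up, the "coinbase transaction" is just raw bytes, and the commitment is located by searching for the filter's hash inside those bytes instead of parsing a real op return output. Only the double SHA-256 Merkle hashing follows what Bitcoin actually does.

```python
import hashlib
from typing import Sequence


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_from_proof(leaf: bytes, index: int, siblings: Sequence[bytes]) -> bytes:
    """Recompute the Merkle root from a leaf hash and its sibling hashes."""
    node = leaf
    for sibling in siblings:
        if index & 1:                      # node is a right child
            node = dsha256(sibling + node)
        else:                              # node is a left child
            node = dsha256(node + sibling)
        index >>= 1
    return node


def filter_is_committed(header_merkle_root: bytes, coinbase_tx: bytes,
                        coinbase_siblings: Sequence[bytes], filter_bytes: bytes) -> bool:
    """Check that a served filter is the one the block's miner committed to.

    Assumption for this sketch: the commitment is dsha256(filter) embedded
    somewhere in the raw coinbase bytes.
    """
    txid = dsha256(coinbase_tx)
    # Step 1: the coinbase (always the first transaction) must be in the block.
    if merkle_root_from_proof(txid, 0, coinbase_siblings) != header_merkle_root:
        return False
    # Step 2: the filter we were handed must hash to the committed value.
    return dsha256(filter_bytes) in coinbase_tx


# Build a toy four-transaction block to exercise the check.
served_filter = b"pretend this is a 20 kB block filter"
coinbase = b"toy-coinbase|" + dsha256(served_filter)
txids = [dsha256(coinbase), dsha256(b"tx1"), dsha256(b"tx2"), dsha256(b"tx3")]
left, right = dsha256(txids[0] + txids[1]), dsha256(txids[2] + txids[3])
root = dsha256(left + right)               # this value would sit in the block header
proof = [txids[1], right]

print(filter_is_committed(root, coinbase, proof, served_filter))       # True
print(filter_is_committed(root, coinbase, proof, b"tampered filter"))  # False
```

A full node that wants to serve a different filter would need a header whose Merkle root covers a different coinbase, which is why lying requires valid proof of work.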
420 00:25:01,429 --> 00:25:04,840 Whereas now, they can just easily omit anything they want. 421 00:25:04,840 --> 00:25:07,990 Oh, lying also. 422 00:25:07,990 --> 00:25:11,549 So it becomes harder to lie because it's in a block, right? 423 00:25:11,549 --> 00:25:13,419 So this is operating on the block level. 424 00:25:13,419 --> 00:25:16,150 The current SPV does not operate on the block level. 425 00:25:16,150 --> 00:25:21,360 So you can get unconfirmed transactions over the wire that match your filter. 426 00:25:21,360 --> 00:25:23,890 I think this is a really bad idea. 427 00:25:23,890 --> 00:25:26,679 I'm not 100% sure why they put it in. 428 00:25:26,679 --> 00:25:29,309 But there's still people who like it. 429 00:25:29,309 --> 00:25:34,450 But the whole idea of SPV is that you're verifying the proof of work, right? 430 00:25:34,450 --> 00:25:37,480 You're verifying that the miners validated this. 431 00:25:37,480 --> 00:25:41,049 And you think, well, the incentives are such that miners don't want to mine invalid things. 432 00:25:41,049 --> 00:25:42,230 They won't get paid. 433 00:25:42,230 --> 00:25:47,139 So if it's in a block, I'll accept it as OK. 434 00:25:47,139 --> 00:25:51,880 For mempool transactions, if it's just an inv message for a transaction that's not in 435 00:25:51,880 --> 00:25:54,710 a block yet, there's no SPV security at all. 436 00:25:54,710 --> 00:26:01,230 And it's trivial to send an invalid transaction to an SPV client currently. 437 00:26:01,230 --> 00:26:05,419 So if you say, hey, here's-- and it's not only is it trivial, but you can also try to 438 00:26:05,419 --> 00:26:10,020 figure out what their addresses are, and lie to them, and say that, hey, you just got thousands 439 00:26:10,020 --> 00:26:12,049 of coins. 440 00:26:12,049 --> 00:26:16,919 So that's the current problem with SPV usage.
441 00:26:16,919 --> 00:26:20,799 You're basically telling the full node your addresses, and you're accepting transactions 442 00:26:20,799 --> 00:26:22,150 without proof of work. 443 00:26:22,150 --> 00:26:28,020 So the full node can say, hey, here's a transaction that sends you 5,000 coins to an address that 444 00:26:28,020 --> 00:26:29,020 I think you have. 445 00:26:29,020 --> 00:26:32,120 Because I tried to figure out your address from your filter. 446 00:26:32,120 --> 00:26:34,889 And it's got an input that doesn't actually exist, right? 447 00:26:34,889 --> 00:26:37,840 So I'm just saying I'm spending 5,000 coins from here. 448 00:26:37,840 --> 00:26:40,890 And the from part isn't actually a thing. 449 00:26:40,890 --> 00:26:43,460 But since you're an SPV node, you don't know that. 450 00:26:43,460 --> 00:26:45,510 So there's like a-- let's see. 451 00:26:45,510 --> 00:26:54,410 Lie to SPV is like a branch on Bitcoin [INAUDIBLE]. 452 00:26:54,410 --> 00:26:59,510 OK, yeah. 453 00:26:59,510 --> 00:27:04,760 So there's a branch that Peter Todd made called Lie to SPV. 454 00:27:04,760 --> 00:27:10,590 I don't know. 455 00:27:10,590 --> 00:27:13,210 Quick and dirty hack to lie to SPV wallets. 456 00:27:13,210 --> 00:27:18,280 They can't verify amounts, so yeah. 457 00:27:18,280 --> 00:27:21,490 So you can just sort of detect-- well, if you want to look at it. 458 00:27:21,490 --> 00:27:24,460 You can detect their addresses. 459 00:27:24,460 --> 00:27:29,780 And then I think that one just sort of opportunistically, if it finds a match, just multiplies the amount 460 00:27:29,780 --> 00:27:32,610 that they're receiving by, like, 100. 461 00:27:32,610 --> 00:27:35,980 And there's no way that they can validate that since it's in mempool. 462 00:27:35,980 --> 00:27:40,350 So this also makes it on the block level, which is what SPV really should be. 463 00:27:40,350 --> 00:27:42,700 So that's nice.
464 00:27:42,700 --> 00:27:46,759 Downsides-- mainly that it's going to be higher network traffic for the client, right? 465 00:27:46,759 --> 00:27:53,240 So even at low false positive rates, the entire rest of the block is essentially a false positive, 466 00:27:53,240 --> 00:27:54,240 right? 467 00:27:54,240 --> 00:27:56,399 So they're like, hey, there's one transaction in this block I want to get. 468 00:27:56,399 --> 00:27:59,049 I have to download the whole block to get it. 469 00:27:59,049 --> 00:28:03,750 So yeah, more network traffic for the client, which is a downside, but it helps with privacy. 470 00:28:03,750 --> 00:28:06,190 So there is current development. 471 00:28:06,190 --> 00:28:10,120 Lightning Labs-- basically [INAUDIBLE] working on-- it's called Neutrino. 472 00:28:10,120 --> 00:28:12,350 It's a variant of this. 473 00:28:12,350 --> 00:28:18,889 So, and then hopefully, something like this will eventually get into Bitcoin Core itself 474 00:28:18,889 --> 00:28:23,890 and replace the current server side Bloom filter code. 475 00:28:23,890 --> 00:28:27,740 Hopefully, but something to work on. 476 00:28:27,740 --> 00:28:31,380 Any questions about the Bloom filter stuff? 477 00:28:31,380 --> 00:28:34,240 Cool, OK. 478 00:28:34,240 --> 00:28:38,049 OK, other issue, sharding. 479 00:28:38,049 --> 00:28:41,480 So this is mainly being worked on in the context of Ethereum. 480 00:28:41,480 --> 00:28:45,960 And it's sort of their sort of holy grail of scalability. 481 00:28:45,960 --> 00:28:51,610 It's common in the database world, where you've got d data objects, n servers. 482 00:28:51,610 --> 00:28:56,140 So in the case of Bitcoin or these blockchains, you just store d times n, right? 483 00:28:56,140 --> 00:28:58,360 Every node stores all data. 484 00:28:58,360 --> 00:29:03,190 Instead, store something closer to d itself and shard the data over all the servers, so 485 00:29:03,190 --> 00:29:06,460 that each server holds like d divided by n, right?
486 00:29:06,460 --> 00:29:11,130 So if you have 10 servers and a gigabyte, have them each store 100 megs. 487 00:29:11,130 --> 00:29:12,870 And then you still got all the data. 488 00:29:12,870 --> 00:29:18,080 Of course, if they actually store exactly d over n-- and you need to coordinate it so 489 00:29:18,080 --> 00:29:24,440 they all store their own little shard-- and then if any single node goes down, well, you're 490 00:29:24,440 --> 00:29:25,440 stuck. 491 00:29:25,440 --> 00:29:27,799 So this is sort of the limit. 492 00:29:27,799 --> 00:29:33,299 And there's no redundancy there. 493 00:29:33,299 --> 00:29:35,240 But you can have different redundancy ratings. 494 00:29:35,240 --> 00:29:38,740 So you could say, OK, well, any-- you could have [INAUDIBLE] erasure coding. 495 00:29:38,740 --> 00:29:43,059 So if any five nodes or any 20% of the nodes disappear, we're still OK. 496 00:29:43,059 --> 00:29:48,409 So in the database world, this is a well studied problem. 497 00:29:48,409 --> 00:29:53,700 But in the context of Bitcoin, Ethereum, and blockchains, it's more difficult. 498 00:29:53,700 --> 00:29:57,780 It's difficult here because you're in this adversarial environment, where people are 499 00:29:57,780 --> 00:30:01,590 trying to break your system at all times. 500 00:30:01,590 --> 00:30:06,470 People want to create transactions that are invalid. 501 00:30:06,470 --> 00:30:08,649 Because that can be worth a lot of money. 502 00:30:08,649 --> 00:30:15,880 I can say, hey, you don't know about this shard, but I'm telling you that on this shard, 503 00:30:15,880 --> 00:30:17,020 I have a lot of money. 504 00:30:17,020 --> 00:30:21,169 And I'm sending it to you, so give me your house or whatever. 505 00:30:21,169 --> 00:30:28,679 So the idea is to split a single UTXO set into multiple smaller sets, and that part 506 00:30:28,679 --> 00:30:29,679 is OK. 507 00:30:29,679 --> 00:30:31,889 But you need communication between the shards, right?
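The storage arithmetic from the database picture (d objects over n servers, with or without redundancy) can be made concrete with a small sketch. The function names `place` and `survives` are invented for this example, and it uses plain replication on neighboring servers rather than the coding scheme mentioned above, which would give the same kind of fault tolerance with less overhead. It only models storage and availability, not the adversarial problems that make blockchain sharding hard.

```python
def place(num_objects: int, num_servers: int, replicas: int) -> list:
    """Store object i on servers i, i+1, ..., i+replicas-1 (mod num_servers)."""
    servers = [set() for _ in range(num_servers)]
    for obj in range(num_objects):
        for offset in range(replicas):
            servers[(obj + offset) % num_servers].add(obj)
    return servers


def survives(servers: list, failed: set, num_objects: int) -> bool:
    """True if every object is still held by at least one server that is up."""
    remaining = set()
    for index, held in enumerate(servers):
        if index not in failed:
            remaining |= held
    return len(remaining) == num_objects


# d = 1000 objects, n = 10 servers, no redundancy: each server holds d / n.
bare = place(1000, 10, replicas=1)
print([len(s) for s in bare])            # 100 objects on every server
print(survives(bare, {3}, 1000))         # False: one failure loses data

# Two copies of everything: each server holds 2 * d / n.
doubled = place(1000, 10, replicas=2)
print([len(s) for s in doubled])         # 200 objects on every server
print(survives(doubled, {3}, 1000))      # True: any single failure is fine
print(survives(doubled, {3, 4}, 1000))   # False: both copies of some objects are gone
```

For comparison, the current blockchain model corresponds to `replicas` equal to the number of servers: every node stores everything, so total storage is d times n.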
508 00:30:31,889 --> 00:30:38,100 So you could say, OK, well, making sort of a bunch of different UTXO sets, each node 509 00:30:38,100 --> 00:30:41,150 can choose their own UTXO set that they're keeping track of. 510 00:30:41,150 --> 00:30:45,759 And then you have some kind of merge mining between the UTXO sets, but you need swaps 511 00:30:45,759 --> 00:30:46,820 between the shards. 512 00:30:46,820 --> 00:30:51,650 It's kind of an interesting thing to think of. 513 00:30:51,650 --> 00:30:54,690 We already have multiple UTXO sets, right? 514 00:30:54,690 --> 00:31:02,360 So as of this morning, coinmarketcap.com tracks 1,614 different currencies, 10,000 markets. 515 00:31:02,360 --> 00:31:05,960 There's supposedly $434 billion going around. 516 00:31:05,960 --> 00:31:10,200 Is this sharding, right? 517 00:31:10,200 --> 00:31:17,790 And it's sort of a joke, but in a real sense, it has taken scalability pressure off of Bitcoin 518 00:31:17,790 --> 00:31:23,009 and off of any individual currency because there's so many of them. 519 00:31:23,009 --> 00:31:26,340 So if you didn't have Dogecoin, maybe there will be more Bitcoin transactions. 520 00:31:26,340 --> 00:31:30,880 AUDIENCE: And how does this play into Bitcoins that are just running on Ether? 521 00:31:30,880 --> 00:31:37,570 TADGE DRYJA: Right, so in the case of ERC-20 or ERC-721, it actually is worse. 522 00:31:37,570 --> 00:31:46,950 Because now you've got sort of multiple UTXO sets all being managed by a single UTXO set. 523 00:31:46,950 --> 00:31:51,210 So if you want to keep track of how many-- I don't know. 524 00:31:51,210 --> 00:31:53,159 What's an ERC-20 token? 525 00:31:53,159 --> 00:31:58,379 If you want to keep track of how many Pied Piper coins there are, you need to download 526 00:31:58,379 --> 00:32:00,879 the entire Ethereum blockchain.
527 00:32:00,879 --> 00:32:05,490 So that's sort of the opposite of sharding in that now any single UTXO set you want to 528 00:32:05,490 --> 00:32:08,040 keep track of, you need to keep track of all of them. 529 00:32:08,040 --> 00:32:12,649 However, in this case, if I want to keep track of my Bitcoin UTXO set, I don't need to download 530 00:32:12,649 --> 00:32:13,649 Dogecoin. 531 00:32:13,649 --> 00:32:14,649 So that's great. 532 00:32:14,649 --> 00:32:19,500 And if people want to swap between Dogecoin and Bitcoin, they can do so-- well, Dogecoin 533 00:32:19,500 --> 00:32:21,009 doesn't have segwit support, right? 534 00:32:21,009 --> 00:32:22,870 So it's a little harder. 535 00:32:22,870 --> 00:32:28,620 But Vertcoin, Litecoin-- a lot of different coins have fairly easy swaps. 536 00:32:28,620 --> 00:32:30,370 But it's more than just swaps. 537 00:32:30,370 --> 00:32:35,559 We need actual fungibility between the shards. 538 00:32:35,559 --> 00:32:42,049 Because if I can say, oh, I'm going to use Litecoin, and I can just swap to Bitcoin whenever 539 00:32:42,049 --> 00:32:44,740 I need to pay someone who accepts Bitcoin, right, maybe. 540 00:32:44,740 --> 00:32:50,250 But maybe Litecoin drops in value 20% with respect to Bitcoin. 541 00:32:50,250 --> 00:32:53,000 And then I tried to pay. 542 00:32:53,000 --> 00:32:57,289 And this is sort of getting ahead of the real use cases of these things. 543 00:32:57,289 --> 00:33:04,880 Because well, Bitcoin also drops 20% randomly against whatever asset you're trying to buy. 544 00:33:04,880 --> 00:33:09,720 And so but Bitcoin does tend to be a bit more stable than most of the smaller market cap 545 00:33:09,720 --> 00:33:10,720 coins. 546 00:33:10,720 --> 00:33:15,789 I mean, you can sort of think that generally, if you have a bigger market capitalization, 547 00:33:15,789 --> 00:33:19,799 you're going to tend to be less volatile. 
548 00:33:19,799 --> 00:33:24,799 And we've seen that in Bitcoin, where it still seems ridiculously volatile, right? 549 00:33:24,799 --> 00:33:27,269 It's gone down 50% since January. 550 00:33:27,269 --> 00:33:32,419 And most currencies in the developed world don't do that. 551 00:33:32,419 --> 00:33:37,200 But if you actually compare it to in 2011, it dropped, like, 95% in a month or two. 552 00:33:37,200 --> 00:33:38,200 Yeah. 553 00:33:38,200 --> 00:33:43,039 AUDIENCE: It's more likely to be liquid supply [INAUDIBLE] 554 00:33:43,039 --> 00:33:50,279 TADGE DRYJA: Yeah, so sometimes, there is a currency that has these really inflated 555 00:33:50,279 --> 00:33:52,379 market caps. 556 00:33:52,379 --> 00:33:57,990 So you'll see one where someone makes a coin and says, OK, I'm making a million coins. 557 00:33:57,990 --> 00:34:00,830 And I'll sell you one for $100. 558 00:34:00,830 --> 00:34:05,580 And really, I still have all of the coins, and one other person has one of them. 559 00:34:05,580 --> 00:34:09,320 But we can sort of do the math and get a market cap of $100 million that way. 560 00:34:09,320 --> 00:34:12,360 So a lot of the coins do that to sort of inflate. 561 00:34:12,360 --> 00:34:16,670 Because CoinMarketCap is literally a ranking. 562 00:34:16,670 --> 00:34:20,370 And it even says rank. 563 00:34:20,370 --> 00:34:24,489 So if you click EOS, it'll say rank. 564 00:34:24,489 --> 00:34:25,989 Where does it say there? 565 00:34:25,989 --> 00:34:26,989 Yeah, rank five. 566 00:34:26,989 --> 00:34:30,409 So it's the fifth best, presumably. 567 00:34:30,409 --> 00:34:35,380 And yeah, to what extent-- how many actual people hold these things? 568 00:34:35,380 --> 00:34:41,389 They just sort of made them up. 569 00:34:41,389 --> 00:34:43,492 But there's all sorts of problems with this.
570 00:34:43,492 --> 00:34:47,840 But the idea is, if you really want sharding, you want the swaps between the shards to not 571 00:34:47,840 --> 00:34:52,429 really have counterparties and to maintain the same value. 572 00:34:52,429 --> 00:34:56,730 You want fungibility between the shards, so that you can quickly and easily say, OK, well, 573 00:34:56,730 --> 00:35:01,620 I've got something on shard A. I'm going to pay someone who's using shard B. And I don't 574 00:35:01,620 --> 00:35:02,620 want any friction. 575 00:35:02,620 --> 00:35:06,830 I don't want any exchange between there. 576 00:35:06,830 --> 00:35:08,810 So this is hard. 577 00:35:08,810 --> 00:35:12,040 There's a lot of cool research going on here. 578 00:35:12,040 --> 00:35:14,790 And if it works, it's a real scalability improvement. 579 00:35:14,790 --> 00:35:17,110 This is sort of the holy grail. 580 00:35:17,110 --> 00:35:21,330 I don't really see much in-- in Bitcoin, they're sort of like side chains. 581 00:35:21,330 --> 00:35:24,890 But those weren't really talked about as a scalability improvement. 582 00:35:24,890 --> 00:35:28,950 And it's mostly the Ethereum crowd that are like, this is our real sort of holy grail 583 00:35:28,950 --> 00:35:30,740 for scalability. 584 00:35:30,740 --> 00:35:31,760 So it's interesting to look at. 585 00:35:31,760 --> 00:35:32,760 I haven't kept up. 586 00:35:32,760 --> 00:35:36,500 I've read a little bit about their most recent sharding ideas. 587 00:35:36,500 --> 00:35:38,540 So it's cool if it works. 588 00:35:38,540 --> 00:35:41,600 But there's a lot of sort of different assumptions that go in. 589 00:35:41,600 --> 00:35:42,600 Yeah. 590 00:35:42,600 --> 00:35:47,360 AUDIENCE: What are some of the current plans to do sharding? 591 00:35:47,360 --> 00:35:49,680 TADGE DRYJA: They sort of like-- a lot of them hinge on fraud proofs. 592 00:35:49,680 --> 00:35:54,100 So the idea is, you've got, say, five different chains going along.
593 00:35:54,100 --> 00:35:56,550 And you don't validate the other four. 594 00:35:56,550 --> 00:35:58,560 You say I'm going to only validate this one. 595 00:35:58,560 --> 00:36:00,380 There's four others going on. 596 00:36:00,380 --> 00:36:05,030 And if, in my chain, something bad happens, like a transaction with an invalid signature 597 00:36:05,030 --> 00:36:11,370 gets confirmed, or transaction has more coins coming out than going in-- something like 598 00:36:11,370 --> 00:36:14,440 that-- you provide a small fraud proof. 599 00:36:14,440 --> 00:36:18,420 You provide a proof that says, OK, you don't need to know everything that's going on in 600 00:36:18,420 --> 00:36:19,420 this chain. 601 00:36:19,420 --> 00:36:25,630 But I'll provide enough data to convince you that this specific transaction is broken. 602 00:36:25,630 --> 00:36:30,000 And then I try to broadcast to the other people in those other four subchains. 603 00:36:30,000 --> 00:36:33,010 And then they know, OK, something's going on wrong here. 604 00:36:33,010 --> 00:36:37,680 Let's freeze-- let's not accept any cross shard swaps from that one. 605 00:36:37,680 --> 00:36:42,170 And then the other-- this chain will have to reorg. 606 00:36:42,170 --> 00:36:45,010 So the idea is as long as you have-- and it sort of makes sense. 607 00:36:45,010 --> 00:36:49,660 As long as you have some number of people checking it that can then broadcast it between 608 00:36:49,660 --> 00:36:52,010 the different shards, you can assume that they're doing OK. 609 00:36:52,010 --> 00:36:53,010 Yeah. 610 00:36:53,010 --> 00:36:54,010 AUDIENCE: Problems temporarily arise from [INAUDIBLE]. 611 00:36:54,010 --> 00:37:01,430 So you'll have a situation where [INAUDIBLE]. 612 00:37:01,430 --> 00:37:19,481 And then you wait two days, and then the transaction [INAUDIBLE] which relies on data from two 613 00:37:19,481 --> 00:37:20,481 years ago for that shard. 
614 00:37:20,481 --> 00:37:21,481 All of those shards have disappeared. 615 00:37:21,481 --> 00:37:22,481 So you have a choice. 616 00:37:22,481 --> 00:37:23,481 Do you do a two-year reorg or just accept that that money is gone? 617 00:37:23,481 --> 00:37:24,481 TADGE DRYJA: Yeah. 618 00:37:24,481 --> 00:37:25,481 Or just accept that, oh, well, it's two years ago. 619 00:37:25,481 --> 00:37:26,481 So I'll assume it's OK. 620 00:37:26,481 --> 00:37:27,481 AUDIENCE: Yeah. 621 00:37:27,481 --> 00:37:28,481 TADGE DRYJA: Yeah. 622 00:37:28,481 --> 00:37:29,481 Which is probably more dangerous. 623 00:37:29,481 --> 00:37:30,481 Yeah, so sort of availability and liveness are other issues here, where if in Bitcoin 624 00:37:30,481 --> 00:37:33,410 or Ethereum-- well, it's more of an issue in Ethereum. 625 00:37:33,410 --> 00:37:36,400 In Bitcoin, there's a lot of copies of the full set. 626 00:37:36,400 --> 00:37:41,270 So if you're not sure about something, you can pretty easily get the entire blockchain, 627 00:37:41,270 --> 00:37:44,100 even though it's like 180 gigs, and go through it. 628 00:37:44,100 --> 00:37:46,010 There's so many copies of it out there. 629 00:37:46,010 --> 00:37:48,270 In Ethereum, a little bit less so. 630 00:37:48,270 --> 00:37:49,950 Well, there's more full nodes. 631 00:37:49,950 --> 00:37:51,820 But full is sort of redefined. 632 00:37:51,820 --> 00:37:53,870 And so it may be harder to get the full thing. 633 00:37:53,870 --> 00:37:59,780 And in the case of sharding, if you divide it too finely, it may be that you lose data, 634 00:37:59,780 --> 00:38:01,770 and no one has it. 635 00:38:01,770 --> 00:38:03,690 And then you're in real trouble. 636 00:38:03,690 --> 00:38:07,020 OK, so that's sharding. 637 00:38:07,020 --> 00:38:10,630 OK, accumulators-- this is something I'm actually looking at. 638 00:38:10,630 --> 00:38:17,520 I'm not going to go into the details of what I'm working on, but accumulators in general.
639 00:38:17,520 --> 00:38:22,950 This is a cool-- so accumulators, as I will describe, are nothing new. 640 00:38:22,950 --> 00:38:27,150 But if you read the papers-- so one of the first papers was called One Way Accumulators 641 00:38:27,150 --> 00:38:28,400 in, like, '93. 642 00:38:28,400 --> 00:38:31,560 And if you read the paper, a lot of the words jump out. 643 00:38:31,560 --> 00:38:33,530 It's like, hey, this might be useful for Bitcoin. 644 00:38:33,530 --> 00:38:38,500 It's like, set membership, and timestamping, and signature aggregate. 645 00:38:38,500 --> 00:38:41,390 It's got a lot of stuff in there that's like, hey, this could be useful. 646 00:38:41,390 --> 00:38:46,900 So an accumulator is basically a cryptographic set. 647 00:38:46,900 --> 00:38:51,870 And there's some set operations that you can do and then provide proofs. 648 00:38:51,870 --> 00:38:54,930 So, the simplest-- well, not even the simplest. 649 00:38:54,930 --> 00:38:57,320 The simplest would just be add and prove. 650 00:38:57,320 --> 00:38:58,920 But sometimes you can add. 651 00:38:58,920 --> 00:38:59,940 Sometimes you can remove. 652 00:38:59,940 --> 00:39:01,420 Sometimes you can prove something's in there. 653 00:39:01,420 --> 00:39:04,510 Sometimes you can prove something's not in there. 654 00:39:04,510 --> 00:39:06,490 And if you can do all four, that's even better. 655 00:39:06,490 --> 00:39:09,460 So the idea is, you take an accumulator. 656 00:39:09,460 --> 00:39:11,150 And you add an object to it. 657 00:39:11,150 --> 00:39:15,500 And in general, objects are going to be strings of bytes or just numbers, right? 658 00:39:15,500 --> 00:39:16,990 And then it spits out a new accumulator. 659 00:39:16,990 --> 00:39:20,690 Essentially, it modifies the accumulator in place. 660 00:39:20,690 --> 00:39:23,210 So if you delete-- you say OK, I've got an accumulator. 661 00:39:23,210 --> 00:39:24,990 And I want to delete this object. 
662 00:39:24,990 --> 00:39:29,170 Well, it'll modify in place and return a different accumulator. 663 00:39:29,170 --> 00:39:33,320 And then maybe I want to prove that this object is in this accumulator. 664 00:39:33,320 --> 00:39:34,790 And it will return a Boolean. 665 00:39:34,790 --> 00:39:36,950 Like, yep, that worked, or no, it didn't. 666 00:39:36,950 --> 00:39:43,340 So the simplest example, which I think is kind of fun, composite numbers. 667 00:39:43,340 --> 00:39:47,110 So accumulate prime numbers. 668 00:39:47,110 --> 00:39:48,790 So to add, multiply. 669 00:39:48,790 --> 00:39:50,450 To delete, divide. 670 00:39:50,450 --> 00:39:55,510 So really, if you're going to do this, you start with 1. 671 00:39:55,510 --> 00:40:00,300 1 is not a prime, I guess, but whatever. 672 00:40:00,300 --> 00:40:02,440 So let's say you've got-- so yeah. 673 00:40:02,440 --> 00:40:03,750 1 is not a prime, so that works. 674 00:40:03,750 --> 00:40:07,650 So you cannot prove any prime exists within that accumulator. 675 00:40:07,650 --> 00:40:11,120 Anyway, but let's say you start with 1, and then you added 3 to the accumulator. 676 00:40:11,120 --> 00:40:12,900 So you multiplied by 3, and you get 3. 677 00:40:12,900 --> 00:40:15,210 Now I want to add the number 5 to the accumulator. 678 00:40:15,210 --> 00:40:16,620 So I multiply by 5. 679 00:40:16,620 --> 00:40:17,940 I get 15. 680 00:40:17,940 --> 00:40:21,740 And now I want to add 7 to this accumulator. 681 00:40:21,740 --> 00:40:23,860 So I've got my accumulator, which is 15. 682 00:40:23,860 --> 00:40:27,250 I add the number 7 into it, and I get 105, right? 683 00:40:27,250 --> 00:40:28,910 I just multiply by 7. 684 00:40:28,910 --> 00:40:31,070 I get 105. 685 00:40:31,070 --> 00:40:33,940 Wait, is it called fundamental theorem of arithmetic? 686 00:40:33,940 --> 00:40:36,700 I think that's what it's called, where everything's a product of primes. 687 00:40:36,700 --> 00:40:40,010 It's got some cool name. 
688 00:40:40,010 --> 00:40:43,310 So everything's a product of primes. 689 00:40:43,310 --> 00:40:45,060 And everything has a unique factorization. 690 00:40:45,060 --> 00:40:48,980 So 105 is 3 times 5 times 7, right? 691 00:40:48,980 --> 00:40:54,360 There's no other ways around it. 692 00:40:54,360 --> 00:40:59,280 And if you want to delete, you can say, OK, well, I'm going to delete the number 5 from 693 00:40:59,280 --> 00:41:00,280 this accumulator. 694 00:41:00,280 --> 00:41:01,280 Just divide. 695 00:41:01,280 --> 00:41:02,280 Now I get 21. 696 00:41:02,280 --> 00:41:04,820 And then I want to prove 7 is in there. 697 00:41:04,820 --> 00:41:07,250 So I can say, hey, 7 is in this accumulator. 698 00:41:07,250 --> 00:41:09,590 It was added, but not deleted. 699 00:41:09,590 --> 00:41:10,640 And I can do that, right? 700 00:41:10,640 --> 00:41:13,300 I tried to divide 21 by 7, and it worked. 701 00:41:13,300 --> 00:41:15,810 I want to divide, and I want to make sure there's no remainder. 702 00:41:15,810 --> 00:41:17,380 I want to see that it divides evenly. 703 00:41:17,380 --> 00:41:19,410 So I get a true. 704 00:41:19,410 --> 00:41:22,550 Yep, if I divide 21 by 7, I get 3. 705 00:41:22,550 --> 00:41:24,550 3 is a natural number. 706 00:41:24,550 --> 00:41:25,550 It works. 707 00:41:25,550 --> 00:41:29,600 So this is kind of cool. 708 00:41:29,600 --> 00:41:32,480 It is not really [INAUDIBLE]. 709 00:41:32,480 --> 00:41:35,290 Oh, but this works even if you do modulo. 710 00:41:35,290 --> 00:41:40,970 So you could do modulo some big prime and have just formality based accumulators. 711 00:41:40,970 --> 00:41:44,260 So anyway, you get the idea, right? 712 00:41:44,260 --> 00:41:48,660 I think in this case, you can also prove things are not in there. 713 00:41:48,660 --> 00:41:49,960 But this is limited to prime numbers. 
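The composite-number accumulator walked through above fits in three small functions. This is only a sketch of the toy scheme from the lecture, with the function names chosen here for illustration; the accumulator is a plain integer that grows with every element, so it has none of the size or security properties of the real constructions.

```python
def add(acc: int, prime: int) -> int:
    """Add an element by multiplying it into the accumulator."""
    return acc * prime


def delete(acc: int, prime: int) -> int:
    """Remove an element by dividing it out; it must actually be in there."""
    if acc % prime != 0:
        raise ValueError(f"{prime} is not in the accumulator")
    return acc // prime


def prove(acc: int, prime: int) -> bool:
    """Membership check: the element divides the accumulator with no remainder."""
    return acc % prime == 0


acc = 1                 # the empty accumulator
acc = add(acc, 3)       # 3
acc = add(acc, 5)       # 15
acc = add(acc, 7)       # 105 = 3 * 5 * 7
acc = delete(acc, 5)    # 21

print(acc)              # 21
print(prove(acc, 7))    # True: 21 / 7 = 3 with no remainder
print(prove(acc, 5))    # False: 5 was deleted
```

Unique factorization is what makes this work: because 21 can only be written as 3 times 7, a remainder of zero for a prime means exactly that the prime was added and not deleted. The same test answering False also serves as the proof that something is not in there.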
714 00:41:49,960 --> 00:41:53,150 Anyway, but the idea is, keep adding things to it, removing things from it, and proving 715 00:41:53,150 --> 00:41:55,690 that things are in it. 716 00:41:55,690 --> 00:41:59,790 So there's RSA accumulators, which are some of the most well-known. 717 00:41:59,790 --> 00:42:05,260 And that's you've got some RSA number, which is basically a product of two large primes. 718 00:42:05,260 --> 00:42:09,210 And the accumulator itself is of constant size. 719 00:42:09,210 --> 00:42:11,440 And the proofs are also of constant size. 720 00:42:11,440 --> 00:42:13,290 So we call this a proof. 721 00:42:13,290 --> 00:42:17,770 You're just giving the number itself in this case. 722 00:42:17,770 --> 00:42:24,170 So it's efficient, but the RSA accumulators use trusted setup. 723 00:42:24,170 --> 00:42:29,610 So the idea is you need to find some composite number n, which is p times q, where p and 724 00:42:29,610 --> 00:42:33,250 q are prime, where nobody knows p and q. 725 00:42:33,250 --> 00:42:37,220 Or nobody knows or you trust that the person who does know p and q not to screw around 726 00:42:37,220 --> 00:42:38,990 with the accumulator. 727 00:42:38,990 --> 00:42:43,990 Because the person who-- if knowledge of p-- I think it's actually knowledge of p or q-- 728 00:42:43,990 --> 00:42:50,050 will let you create proofs that, hey, this object is in the accumulator when really, 729 00:42:50,050 --> 00:42:52,319 it isn't. 730 00:42:52,319 --> 00:42:55,500 So that's not so much fun for Bitcoin. 731 00:42:55,500 --> 00:43:00,080 If you need trusted setup, people don't really like that. 732 00:43:00,080 --> 00:43:01,870 There's all these other accumulator-esque ideas. 733 00:43:01,870 --> 00:43:03,780 Some are one way where you can't delete. 734 00:43:03,780 --> 00:43:08,380 You can add things to the accumulator, but there's no way to remove it.
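To make the trusted setup problem concrete, here is a toy RSA accumulator in Python. Everything about the parameters is an assumption for illustration: the modulus is built from the tiny primes 61 and 53 so the numbers stay readable, the base is 2, and elements are assumed to already be primes (real schemes map arbitrary data to primes first and use moduli of thousands of bits). The accumulator is the base raised to the product of all elements, and a proof for one element is the base raised to the product of all the others.

```python
P, Q = 61, 53        # toy secret primes; whoever knows them holds the trapdoor
N = P * Q            # public modulus (3233)
G = 2                # public base


def accumulate(elements) -> int:
    """Raise the base to each element in turn, all modulo N."""
    acc = G
    for x in elements:
        acc = pow(acc, x, N)
    return acc


def witness(elements, member: int) -> int:
    """Proof for `member`: the accumulator of every other element."""
    return accumulate([x for x in elements if x != member])


def verify(wit: int, element: int, acc: int) -> bool:
    """A proof checks out if raising it to the element gives the accumulator."""
    return pow(wit, element, N) == acc


def forge_witness(acc: int, non_member: int) -> int:
    """What the trapdoor allows: a 'proof' for something never added.

    Needs the factorization of N and requires Python 3.8+ for pow(x, -1, m).
    """
    phi = (P - 1) * (Q - 1)
    inverse = pow(non_member, -1, phi)   # an "element-th root" exponent
    return pow(acc, inverse, N)


members = [3, 5, 7]
acc = accumulate(members)

honest = witness(members, 7)
print(verify(honest, 7, acc))     # True: 7 really is in the set
print(verify(honest, 11, acc))    # False: 11 is not

forged = forge_witness(acc, 11)
print(verify(forged, 11, acc))    # True, even though 11 was never added
```

Both the accumulator and each proof are a single number below N regardless of how many elements go in, which is the constant size property. The last line is the reason the setup has to be trusted: anyone who knows p and q can take roots modulo N and produce a passing proof for an element that is not in the set.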
735 00:43:08,380 --> 00:43:12,710 Sometimes you can batch things, where, OK, if I want-- so you can see in the composite 736 00:43:12,710 --> 00:43:14,830 number accumulator, you could batch things. 737 00:43:14,830 --> 00:43:20,490 Where if I add 105 to the accumulator, I'm performing one operation that essentially 738 00:43:20,490 --> 00:43:22,810 adds three objects, right? 739 00:43:22,810 --> 00:43:25,180 3, 5, and 7. 740 00:43:25,180 --> 00:43:27,420 Some can be batched like that, some cannot. 741 00:43:27,420 --> 00:43:29,530 Some have trusted setup, some don't. 742 00:43:29,530 --> 00:43:33,570 So there's different tradeoffs for all these different use cases. 743 00:43:33,570 --> 00:43:38,740 And in the case of Bitcoin, the idea of an accumulator would be put the UTXO set in it, 744 00:43:38,740 --> 00:43:40,590 or put the STXO. 745 00:43:40,590 --> 00:43:44,280 Like, put the spent transaction outputs into it. 746 00:43:44,280 --> 00:43:49,650 And then prove, in the case of UTXOs, prove that it's in the accumulator. 747 00:43:49,650 --> 00:43:53,310 Or in the case of STXOs, prove that it's no longer in the accumulator. 748 00:43:53,310 --> 00:43:54,650 Right? 749 00:43:54,650 --> 00:43:58,750 Provide a proof of non-inclusion, that, hey, it's not in this STXO set. 750 00:43:58,750 --> 00:44:03,230 So, well, with the STXO inclusion-- so let's say you did it that way. 751 00:44:03,230 --> 00:44:10,740 You'd have headers, and then you've got this STXO accumulator. 752 00:44:10,740 --> 00:44:17,950 And what you can do there is say I've got a transaction. 753 00:44:17,950 --> 00:44:19,470 It's got some inputs. 754 00:44:19,470 --> 00:44:23,560 I prove that this input exists in the headers, right? 755 00:44:23,560 --> 00:44:25,650 I provide you an SPV proof. 756 00:44:25,650 --> 00:44:29,280 So I say, OK, at some point, this was created, right? 757 00:44:29,280 --> 00:44:30,570 This input.
758 00:44:30,570 --> 00:44:33,620 Maybe a year ago, I showed you, OK, there is this header. 759 00:44:33,620 --> 00:44:35,830 Here's a Merkle proof that this was in a block. 760 00:44:35,830 --> 00:44:37,860 So this exists. 761 00:44:37,860 --> 00:44:41,490 Then I also prove somehow that it doesn't exist in here, right? 762 00:44:41,490 --> 00:44:43,930 It exists, but it was never spent. 763 00:44:43,930 --> 00:44:48,070 So now you can accept that, oh, OK, he gave me an SPV proof. 764 00:44:48,070 --> 00:44:51,050 He gave me a non-inclusion to the STXO set proof. 765 00:44:51,050 --> 00:44:58,190 So I know this transaction, this input still exists and can be spent. 766 00:44:58,190 --> 00:45:03,600 So what this would do, if you get it working, you don't need to store the UTXOs anymore. 767 00:45:03,600 --> 00:45:08,080 You just store the accumulator, and then everyone provides proofs that, hey, I've got these 768 00:45:08,080 --> 00:45:09,470 coins. 769 00:45:09,470 --> 00:45:14,180 But so right now, if you store the UTXO set, it's a couple of gigabytes-- 3 or 4 gigs. 770 00:45:14,180 --> 00:45:22,000 And when someone sends a transaction, you just look in your UTXO set, right? 771 00:45:22,000 --> 00:45:23,190 So you see this input. 772 00:45:23,190 --> 00:45:25,850 You say, hey, does that exist in my UTXO set? 773 00:45:25,850 --> 00:45:27,080 OK, it does. 774 00:45:27,080 --> 00:45:28,080 Cool. 775 00:45:28,080 --> 00:45:31,230 We'll now verify the signature, verify everything else about the transaction. 776 00:45:31,230 --> 00:45:34,390 If it doesn't exist in my UTXO set, I'm like, hey, you're trying to spend something that 777 00:45:34,390 --> 00:45:37,220 isn't there. 778 00:45:37,220 --> 00:45:42,360 So what would be cool is if you got rid of the UTXO set and only used an accumulator. 779 00:45:42,360 --> 00:45:45,200 Because the accumulators are constant size. 
780 00:45:45,200 --> 00:45:51,850 So even if the UTXO set is 20 gigabytes, well, I've got this 10 kilobyte accumulator thing 781 00:45:51,850 --> 00:45:53,230 on my hard drive. 782 00:45:53,230 --> 00:45:55,350 And I just modify that in place. 783 00:45:55,350 --> 00:45:59,090 And now I don't need to store a bazillion gigabytes. 784 00:45:59,090 --> 00:46:04,120 And I basically can have the same security as a full node. 785 00:46:04,120 --> 00:46:07,200 I'm still a full node, but I just don't store the UTXO set. 786 00:46:07,200 --> 00:46:13,950 I just require either SPV proofs or STXOs-- different proofs from the people trying to 787 00:46:13,950 --> 00:46:15,790 spend the transactions. 788 00:46:15,790 --> 00:46:17,220 So this is really cool. 789 00:46:17,220 --> 00:46:19,430 Constant size, the proofs are small. 790 00:46:19,430 --> 00:46:21,460 Sometimes the proofs are really small. 791 00:46:21,460 --> 00:46:23,400 And then the wallets track the proofs. 792 00:46:23,400 --> 00:46:25,360 So some questions. 793 00:46:25,360 --> 00:46:30,050 Proofs can be like constant size, or sometimes they're log n. 794 00:46:30,050 --> 00:46:35,470 If the proofs are O(n), then it's not useful, right? 795 00:46:35,470 --> 00:46:37,590 Because you might as well store the UTXO set at that point. 796 00:46:37,590 --> 00:46:38,940 Also, what is n? 797 00:46:38,940 --> 00:46:41,870 Is it the number of transactions, blocks? 798 00:46:41,870 --> 00:46:43,530 Can you aggregate these operations? 799 00:46:43,530 --> 00:46:46,930 So there's a bunch of questions there. 800 00:46:46,930 --> 00:46:52,170 Another really big problem-- so I was talking to Pieter Wuille, who's sort of one of the premier 801 00:46:52,170 --> 00:46:53,910 researchers on Bitcoin. 802 00:46:53,910 --> 00:46:58,390 He had been talking to people at Stanford about a lattice-based accumulator that did 803 00:46:58,390 --> 00:46:59,960 not need trusted setup.
804 00:46:59,960 --> 00:47:03,570 And he was very excited about it because it was like, hey, there's no trusted setup. 805 00:47:03,570 --> 00:47:05,480 You have constant size proofs. 806 00:47:05,480 --> 00:47:07,360 The accumulator itself is pretty small. 807 00:47:07,360 --> 00:47:09,900 CPU wise, it seems doable. 808 00:47:09,900 --> 00:47:17,300 The thing that sort of killed it was you couldn't batch operations that accumulated. 809 00:47:17,300 --> 00:47:20,470 And the other thing is we need some kind of bridge node. 810 00:47:20,470 --> 00:47:24,040 So the idea is, normal transactions, right? 811 00:47:24,040 --> 00:47:26,260 They just say, here's my input. 812 00:47:26,260 --> 00:47:27,700 I've got a couple inputs. 813 00:47:27,700 --> 00:47:30,370 I've got a couple outputs. 814 00:47:30,370 --> 00:47:32,180 I don't provide any proofs or anything. 815 00:47:32,180 --> 00:47:34,670 I just point to what I'm spending. 816 00:47:34,670 --> 00:47:38,170 And you look in your UTXO set and see if it's there. 817 00:47:38,170 --> 00:47:42,040 With this new idea with accumulators, you're going to have to stick proofs on. 818 00:47:42,040 --> 00:47:47,150 So really, it's an extra data structure, probably per input. 819 00:47:47,150 --> 00:47:50,940 So you say, hey, I'm a node. 820 00:47:50,940 --> 00:47:52,130 I run an accumulator. 821 00:47:52,130 --> 00:47:53,640 I don't keep the whole set. 822 00:47:53,640 --> 00:47:58,650 So for all your inputs, please provide proofs. 823 00:47:58,650 --> 00:48:05,610 And wallets can maintain these proofs and attach them to their transactions. 824 00:48:05,610 --> 00:48:07,280 And then the nodes will verify them. 825 00:48:07,280 --> 00:48:11,690 However, right now, most wallets don't have these proofs-- have no idea that this is a 826 00:48:11,690 --> 00:48:12,690 thing, right? 827 00:48:12,690 --> 00:48:17,330 So if you're the first node to do this and say, hey, I'm getting rid of my UTXO set. 
828 00:48:17,330 --> 00:48:22,570 I'm only going to verify proofs that these UTXOs exist. 829 00:48:22,570 --> 00:48:24,910 Most of the wallets won't provide those proofs. 830 00:48:24,910 --> 00:48:28,760 And so as a new node, say, hey, I got rid of my UTXO set, but no one's giving you proofs. 831 00:48:28,760 --> 00:48:29,870 OK, I'm just stuck, right? 832 00:48:29,870 --> 00:48:31,250 I see a block come out. 833 00:48:31,250 --> 00:48:32,250 I can't validate it. 834 00:48:32,250 --> 00:48:35,710 So you're going to need some kind of transition, right? 835 00:48:35,710 --> 00:48:40,530 If you started from scratch and say, OK, the responsibility of every wallet is not just 836 00:48:40,530 --> 00:48:45,890 keep your private keys and keep track of what your UTXO is, it's also to keep track of a 837 00:48:45,890 --> 00:48:46,890 proof. 838 00:48:46,890 --> 00:48:49,560 So that when you want to spend it, you give it to someone else. 839 00:48:49,560 --> 00:48:52,380 If you started that way, great. 840 00:48:52,380 --> 00:48:56,350 But if you want to transition, what you're kind of going to need is a bridge node. 841 00:48:56,350 --> 00:49:05,460 And the idea of a bridge node is you've got-- so here's an accumulator node, where it requires 842 00:49:05,460 --> 00:49:06,660 proofs. 843 00:49:06,660 --> 00:49:10,760 Here's an old node that just has regular UTXOs. 844 00:49:10,760 --> 00:49:17,460 So when the old node sends a transaction, this gives TX proof, right? 845 00:49:17,460 --> 00:49:23,070 So this basically has proofs for everything, like all proofs. 846 00:49:23,070 --> 00:49:28,880 So the bridge node can provide proofs. 847 00:49:28,880 --> 00:49:30,620 You'll need one bridge node, right? 848 00:49:30,620 --> 00:49:36,560 So if these accumulator nodes talk to each other, they can send the proofs along to each 849 00:49:36,560 --> 00:49:37,980 other and stuff like that.
850 00:49:37,980 --> 00:49:42,330 The old nodes cannot send proofs because they're not aware of it in their software. 851 00:49:42,330 --> 00:49:45,890 But if you have the new nodes, you can say, OK, well, when I receive a transaction, I 852 00:49:45,890 --> 00:49:46,970 verify the proofs. 853 00:49:46,970 --> 00:49:52,230 I also keep the proofs and give them to other peers who want to verify them. 854 00:49:52,230 --> 00:49:55,090 But you need some kind of bridge between these two networks. 855 00:49:55,090 --> 00:49:56,090 Yeah. 856 00:49:56,090 --> 00:50:00,070 AUDIENCE: You would just keep that running until everyone switches? 857 00:50:00,070 --> 00:50:03,300 TADGE DRYJA: Probably forever. 858 00:50:03,300 --> 00:50:04,300 So yeah. 859 00:50:04,300 --> 00:50:08,270 So the issue with the bridge node is like, well, if you're an old wallet, you don't 860 00:50:08,270 --> 00:50:10,490 keep track of these proofs. 861 00:50:10,490 --> 00:50:14,100 So, how much of a problem is this? 862 00:50:14,100 --> 00:50:21,510 And the lattice-based accumulator, that was sort of what killed it, was that you couldn't 863 00:50:21,510 --> 00:50:22,700 batch operations. 864 00:50:22,700 --> 00:50:27,600 Sorry, and the proofs changed. 865 00:50:27,600 --> 00:50:34,830 That's not the case with the sort of silly prime number accumulator that I showed. 866 00:50:34,830 --> 00:50:39,720 But in the lattice-based one, the proofs changed every time an accumulator operation happened. 867 00:50:39,720 --> 00:50:43,370 So every time an add happened, your proof had to change. 868 00:50:43,370 --> 00:50:47,310 So you didn't just need to add something and modify the accumulator in place. 869 00:50:47,310 --> 00:50:50,730 You had to modify your proofs in place as well. 870 00:50:50,730 --> 00:50:56,830 And so if you're a wallet, and you've got three UTXOs, that meant that every block, 871 00:50:56,830 --> 00:50:59,750 you'd have to do a couple thousand operations, right?
872 00:50:59,750 --> 00:51:04,690 Because every block, you're going to have a few thousand adds and a few thousand deletes 873 00:51:04,690 --> 00:51:06,160 to this accumulator. 874 00:51:06,160 --> 00:51:09,740 Every time one of those happened, you'd have to modify your proof for each of your three 875 00:51:09,740 --> 00:51:10,980 UTXOs. 876 00:51:10,980 --> 00:51:15,010 So, a little ugly but doable, because you've got three UTXOs. 877 00:51:15,010 --> 00:51:19,930 You basically 3x the number of operations, set operations in the block. 878 00:51:19,930 --> 00:51:26,960 The bridge node, on the other hand, has to keep track of something like 70 million UTXOs. 879 00:51:26,960 --> 00:51:27,960 Right? 880 00:51:27,960 --> 00:51:29,380 That's about how many there are right now. 881 00:51:29,380 --> 00:51:36,020 And so since there's no way to batch it, that means you're going to have maybe 10k times 882 00:51:36,020 --> 00:51:38,110 70 million every block. 883 00:51:38,110 --> 00:51:43,730 It's just not going to happen. 884 00:51:43,730 --> 00:51:46,610 And then it's like, well, maybe you can sort of try to shard the bridge nodes. 885 00:51:46,610 --> 00:51:50,300 And this bridge node only keeps track of this portion of the set. 886 00:51:50,300 --> 00:51:52,860 But it just looked really daunting. 887 00:51:52,860 --> 00:51:56,580 And that was why he was sort of like, yeah, I don't think this lattice-- I think it's 888 00:51:56,580 --> 00:52:02,410 really key that either there's some kind of batching operation, where we can consolidate 889 00:52:02,410 --> 00:52:07,840 all the operations within a single block to one set operation for the accumulator, or 890 00:52:07,840 --> 00:52:13,840 try to make it so that the proofs don't have to be updated or something like that. 891 00:52:13,840 --> 00:52:19,510 So this is still an unsolved problem, although I'm working on a fun way that I think might 892 00:52:19,510 --> 00:52:21,950 work to do this. 
893 00:52:21,950 --> 00:52:23,560 And I don't want to-- because this is videotaped. 894 00:52:23,560 --> 00:52:24,970 It's going on the internet and stuff. 895 00:52:24,970 --> 00:52:26,260 I don't want to talk about it. 896 00:52:26,260 --> 00:52:32,270 But if you have questions, we can talk about it at office hours or something. 897 00:52:32,270 --> 00:52:37,290 So if it works, it'll be cool, although in some cases, it might-- so in the case of that 898 00:52:37,290 --> 00:52:40,860 bridge node, it was the bridge node that really killed it. 899 00:52:40,860 --> 00:52:45,240 Because it was just like, oh, you're going to have to do billions of operations per block. 900 00:52:45,240 --> 00:52:50,260 But in other cases, it's like, well, you can have accumulators that seem good, but aren't 901 00:52:50,260 --> 00:52:52,060 actually faster. 902 00:52:52,060 --> 00:52:54,460 Because verifying the proofs takes a long time or something like that. 903 00:52:54,460 --> 00:53:01,640 So it's sort of like with the range proofs or something like Mimblewimble, where the O 904 00:53:01,640 --> 00:53:03,090 of n is great. 905 00:53:03,090 --> 00:53:07,770 But you've got these constant factors that, in practice, end up meaning you're not actually 906 00:53:07,770 --> 00:53:08,770 faster, right? 907 00:53:08,770 --> 00:53:11,790 Because the Bitcoin UTXO set, well, it's 4 gigs. 908 00:53:11,790 --> 00:53:12,790 Not even, right? 909 00:53:12,790 --> 00:53:15,550 It's 3 and 1/2 gigs now. 910 00:53:15,550 --> 00:53:20,050 So in practice, there's probably a lot of cool cryptographic technologies. 911 00:53:20,050 --> 00:53:22,280 And you can write a whole paper and have all these cool things. 912 00:53:22,280 --> 00:53:26,470 And then if you actually implement it, it's like, well, it actually goes twice as slow 913 00:53:26,470 --> 00:53:27,910 as just using the regular. 914 00:53:27,910 --> 00:53:30,560 Also, this is super optimized.
915 00:53:30,560 --> 00:53:36,160 One of the biggest sort of code engineering things that has happened in Bitcoin Core is 916 00:53:36,160 --> 00:53:38,540 OK, how can we make database updates to this? 917 00:53:38,540 --> 00:53:44,670 How can we make different caching, different flushes to disk, and editing LevelDB itself. 918 00:53:44,670 --> 00:53:49,890 So making this UTXO set database operation faster is a big thing in Bitcoin. 919 00:53:49,890 --> 00:53:51,970 So it's pretty optimized. 920 00:53:51,970 --> 00:53:56,300 And even if you've got something that seems like in computer science terms, hey, this 921 00:53:56,300 --> 00:54:03,200 is log n instead of n, like, yeah, but what about the constant factors? 922 00:54:03,200 --> 00:54:06,200 So that's another issue for these. 923 00:54:06,200 --> 00:54:07,330 OK. 924 00:54:07,330 --> 00:54:11,090 Other things-- I was going to talk-- well, yeah. 925 00:54:11,090 --> 00:54:16,920 I'll do last one, and then we can maybe end a little early and talk about projects if 926 00:54:16,920 --> 00:54:17,920 you want. 927 00:54:17,920 --> 00:54:23,160 UTXO commitments, which is somewhat in the same region as accumulators-- a little bit 928 00:54:23,160 --> 00:54:24,670 different idea. 929 00:54:24,670 --> 00:54:27,040 And this exists in some coins. 930 00:54:27,040 --> 00:54:33,480 Like in Ethereum, Joe Bonneau was saying that you basically have a tree of all the different 931 00:54:33,480 --> 00:54:37,580 contracts that exist, all the different addresses in Ethereum. 932 00:54:37,580 --> 00:54:40,310 And the root of that tree appears in every block header. 933 00:54:40,310 --> 00:54:44,821 So you can make these proofs that coins exist based on a block header. 934 00:54:44,821 --> 00:54:46,310 So it exists in Ethereum. 935 00:54:46,310 --> 00:54:50,490 It doesn't exist in Bitcoin, but people have been talking about it for years.
936 00:54:50,490 --> 00:54:56,640 The simplest would be to take a hash of the UTXO set and put it in every coinbase transaction. 937 00:54:56,640 --> 00:55:02,260 And make that a consensus rule, so that when you mine a block, you have to take the hash 938 00:55:02,260 --> 00:55:04,430 of the UTXO set that you're aware of. 939 00:55:04,430 --> 00:55:06,810 Put it in the block so that everyone can see it. 940 00:55:06,810 --> 00:55:10,840 And if everyone disagrees, they're going to invalidate that block. 941 00:55:10,840 --> 00:55:11,980 This is really simple. 942 00:55:11,980 --> 00:55:16,480 You probably want to do a little bit fancier than this. 943 00:55:16,480 --> 00:55:22,331 A bit more useful-- instead of just taking a linear hash of the entire UTXO set, make 944 00:55:22,331 --> 00:55:25,170 it a Merkle tree, right? 945 00:55:25,170 --> 00:55:28,760 And that way, yeah, sure, it's very little extra hashing. 946 00:55:28,760 --> 00:55:31,830 Well, no, wait, twice as much hashing, right? 947 00:55:31,830 --> 00:55:32,830 But anyway. 948 00:55:32,830 --> 00:55:38,430 2x the hashing, but now you've got the ability to make small proofs. 949 00:55:38,430 --> 00:55:45,000 And then you can prove an output exists at a given block height pretty easily at SPV 950 00:55:45,000 --> 00:55:46,010 level security. 951 00:55:46,010 --> 00:55:50,950 So currently, you can prove that a transaction exists at a block height. 952 00:55:50,950 --> 00:55:55,700 And you can also prove that a transaction was consumed at a certain block height, right? 953 00:55:55,700 --> 00:56:04,440 So if you have all these different blocks, 2, 3, 4, 5, you can say, hey, here's a Merkle 954 00:56:04,440 --> 00:56:09,650 proof that transaction 1 was included in block 2. 955 00:56:09,650 --> 00:56:16,760 And then you can prove here, OK, here's transaction 2 that consumes one of the outputs from transaction 956 00:56:16,760 --> 00:56:18,820 1.
957 00:56:18,820 --> 00:56:25,740 But you can't prove just given-- but also, you can omit this proof, right? 958 00:56:25,740 --> 00:56:33,030 So if you want to prove that-- so at block 6, how do I prove that the outputs from transaction 959 00:56:33,030 --> 00:56:36,360 1 still exist at block 6? 960 00:56:36,360 --> 00:56:37,360 You're stuck, right? 961 00:56:37,360 --> 00:56:44,130 You basically have to go through blocks 3 through 6 and look at the whole thing. 962 00:56:44,130 --> 00:56:47,850 So with current SPV proofs, we can prove inclusion, right? 963 00:56:47,850 --> 00:56:51,440 You sort of think of this as an accumulator, where you've got-- a Merkle tree is kind of 964 00:56:51,440 --> 00:56:53,250 an accumulator. 965 00:56:53,250 --> 00:56:56,480 I can prove inclusion, and then I can prove transaction 1 is in block 2. 966 00:56:56,480 --> 00:57:02,680 I can prove that transaction 2 consumes transaction 1 in block 5. 967 00:57:02,680 --> 00:57:06,810 But you have to rely on me to give you that proof honestly, right? 968 00:57:06,810 --> 00:57:12,930 If I'm trying to cheat you and say, oh yeah, I've totally still got money, I can't prove 969 00:57:12,930 --> 00:57:14,660 that it hasn't been deleted. 970 00:57:14,660 --> 00:57:21,590 Whereas with a UTXO set commitment, if every block, there was a total commitment to every 971 00:57:21,590 --> 00:57:27,170 UTXO, I could just say, hey, at block 6, my output from transaction 1, it's still in there. 972 00:57:27,170 --> 00:57:28,310 Well, it wouldn't be in this case. 973 00:57:28,310 --> 00:57:30,090 But I could prove it at block 4, right? 974 00:57:30,090 --> 00:57:33,650 And say, hey, look, transaction 1, it's still in there at block 4. 975 00:57:33,650 --> 00:57:35,990 No longer there at block 6. 976 00:57:35,990 --> 00:57:43,460 Depending on how you construct the UTXO commitment, you may be able to provide non inclusion proofs, 977 00:57:43,460 --> 00:57:45,410 which is also really cool. 
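A rough sketch of the Merkle tree style of UTXO commitment discussed above, showing why it allows small inclusion proofs. Everything specific here is an assumption for illustration: the helper names, the byte strings standing in for outputs, SHA-256 as the hash, and the rule of duplicating the last node on odd levels. Nothing like this is a Bitcoin consensus rule today.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    """Return every level of the tree: leaf hashes first, root level last."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # odd level: pair last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hash at each level, from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):               # no sibling: node was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 1))
        index //= 2
    return proof

def verify(root, leaf, proof):
    """Recompute the path to the root; the proof is about log2(n) hashes long."""
    node = h(leaf)
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root

# Hypothetical outputs written as "txid:index"; sorted so the order is canonical.
utxos = sorted([b"tx1:0", b"tx1:1", b"tx4:0", b"tx5:0", b"tx6:1"])
levels = merkle_levels(utxos)
root = levels[-1][0]                            # the value a block would commit to

i = utxos.index(b"tx4:0")
print(verify(root, b"tx4:0", prove(levels, i)))
```

A verifier holding only the committed root can check the proof without storing the set itself.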
978 00:57:45,410 --> 00:57:53,260 So for example, if you had it be a Merkle tree, if it was just in order insertion-- 979 00:57:53,260 --> 00:58:01,580 so this is TX1, this is TX2, this is TX3, and you make a Merkle tree that way, then 980 00:58:01,580 --> 00:58:05,900 you cannot prove that something's not in it, right? 981 00:58:05,900 --> 00:58:14,020 So when TX3 gets deleted and now TX4 goes here, and TX5 comes here, you can't prove 982 00:58:14,020 --> 00:58:16,640 that TX3's not there anymore in any real way. 983 00:58:16,640 --> 00:58:23,180 However, if you sort them-- so now you sort them canonically, so like 984 00:58:23,180 --> 00:58:28,740 by hash, just like greater than, less than, and then the order gets all weird, right? 985 00:58:28,740 --> 00:58:37,050 So now TX5 is here and TX6 is here because TX5's hash is closer to 1 and stuff like that. 986 00:58:37,050 --> 00:58:38,740 Then you can prove non-inclusion. 987 00:58:38,740 --> 00:58:46,310 So then you can say, hey, TX3 is not here because TX3 would be between 5 and 4, right? 988 00:58:46,310 --> 00:58:48,390 Just based on what the hash looks like. 989 00:58:48,390 --> 00:58:49,390 And I can show you 5. 990 00:58:49,390 --> 00:58:50,540 I've got a proof for it. 991 00:58:50,540 --> 00:58:52,290 I can show you 4 and a proof for it. 992 00:58:52,290 --> 00:58:56,320 I can show that these two things are adjacent, right, in the tree. 993 00:58:56,320 --> 00:58:58,750 And there's no 3 there, and it would be there. 994 00:58:58,750 --> 00:59:01,281 If it existed, it would be right here. 995 00:59:01,281 --> 00:59:02,800 And I can show that it's not there. 996 00:59:02,800 --> 00:59:06,850 So then I have non-inclusion proofs in the UTXO set, which could be used for some kind 997 00:59:06,850 --> 00:59:07,900 of fraud proof. 998 00:59:07,900 --> 00:59:13,810 So if someone makes a transaction spending something, you can say, no, look, here is 999 00:59:13,810 --> 00:59:14,810 the last block.
1000 00:59:14,810 --> 00:59:16,460 Here's the UTXO set hash. 1001 00:59:16,460 --> 00:59:20,160 And here's a proof that this input doesn't exist. 1002 00:59:20,160 --> 00:59:22,170 So this must be an invalid transaction. 1003 00:59:22,170 --> 00:59:23,400 That would be really cool, too. 1004 00:59:23,400 --> 00:59:29,650 Because then you could propagate that on the network and sort of prove fraud. 1005 00:59:29,650 --> 00:59:32,570 So this is an idea that's been around for a long time. 1006 00:59:32,570 --> 00:59:39,940 I think maybe the reason it hasn't yet been implemented is, no one can really agree on 1007 00:59:39,940 --> 00:59:41,450 exactly how to do it. 1008 00:59:41,450 --> 00:59:43,420 A lot of people are like, yeah, we should do that. 1009 00:59:43,420 --> 00:59:44,700 That would be cool. 1010 00:59:44,700 --> 00:59:48,800 Yeah, you can prove [INAUDIBLE]. 1011 00:59:48,800 --> 00:59:52,810 So, one thing you could do would be to skip years of initial block download. 1012 00:59:52,810 --> 00:59:56,550 So I don't really care what happened from 2009 to 2015. 1013 00:59:56,550 --> 00:59:59,390 I assume it was fine, right? 1014 00:59:59,390 --> 01:00:04,520 I'm going to start my synchronization in 2016 and just synchronize the last two years. 1015 01:00:04,520 --> 01:00:09,640 So I'll go get this UTXO commitment from end of 2015. 1016 01:00:09,640 --> 01:00:15,260 I will then not just get the commitment, but download a UTXO snapshot from that time. 1017 01:00:15,260 --> 01:00:20,070 And then I'll check that it matches the committed UTXO set hash. 1018 01:00:20,070 --> 01:00:26,120 And then I'll just start from there and then become a full node, where I didn't check the 1019 01:00:26,120 --> 01:00:29,170 first six years or something. 1020 01:00:29,170 --> 01:00:35,680 And it's an interesting mix of SPV security and regular full node security, and I think personally 1021 01:00:35,680 --> 01:00:38,570 that's probably OK, right?
1022 01:00:38,570 --> 01:00:43,440 If you only verify the last six months of signatures, well, if everyone's been wrong 1023 01:00:43,440 --> 01:00:45,980 for six months, we have bigger problems, right? 1024 01:00:45,980 --> 01:00:53,790 If there's some erroneous transaction in 2015 and the entire network has been just extending 1025 01:00:53,790 --> 01:00:58,720 for three years without reorging, like, wait, what? 1026 01:00:58,720 --> 01:01:03,880 So on the one hand, you don't want to do this for today's blocks, right? 1027 01:01:03,880 --> 01:01:06,550 Because then the miners have full control. 1028 01:01:06,550 --> 01:01:08,510 And the miners can just make up transactions. 1029 01:01:08,510 --> 01:01:12,751 And if you really just say, oh, well, the miners are doing the right thing, then they 1030 01:01:12,751 --> 01:01:14,330 get a lot more power. 1031 01:01:14,330 --> 01:01:18,310 But if you say, well, other people are validating it in real time. 1032 01:01:18,310 --> 01:01:22,850 And I assume that that's the case, and I'll validate six months' worth of transactions. 1033 01:01:22,850 --> 01:01:26,410 So I'll pick up any errors in the last six months and be able to report them. 1034 01:01:26,410 --> 01:01:30,580 But I will assume that after a certain amount of time, I'm pretty sure it's been OK. 1035 01:01:30,580 --> 01:01:32,550 Everyone else has been looking for this. 1036 01:01:32,550 --> 01:01:36,010 So that's a slightly different model than full nodes. 1037 01:01:36,010 --> 01:01:37,780 But in practice, not really. 1038 01:01:37,780 --> 01:01:44,280 Because if you look at the code for Bitcoin today, there's assume valid, which doesn't 1039 01:01:44,280 --> 01:01:45,280 check signatures. 1040 01:01:45,280 --> 01:01:49,790 It sort of has a block hash and says, OK, anything before this, don't check signatures. 
1041 01:01:49,790 --> 01:01:54,210 The developers themselves have said, yeah, every signature before here is OK, so you 1042 01:01:54,210 --> 01:01:58,380 don't have to check the signatures, which is a little nicer. 1043 01:01:58,380 --> 01:02:02,531 Before they had checkpoints, where it would just not validate anything before a certain 1044 01:02:02,531 --> 01:02:03,531 block hash. 1045 01:02:03,531 --> 01:02:08,540 So you sort of already have things like that to speed up the initial synchronization process. 1046 01:02:08,540 --> 01:02:12,560 But a UTXO commitment would make it a lot more sort of decentralized. 1047 01:02:12,560 --> 01:02:17,590 Because right now, it's just the programmers are like, well, this is last year. 1048 01:02:17,590 --> 01:02:18,590 Everyone's validated. 1049 01:02:18,590 --> 01:02:23,610 Let's just hard code this into the code, into the client. 1050 01:02:23,610 --> 01:02:25,580 Anyway. 1051 01:02:25,580 --> 01:02:27,800 So this is an idea. 1052 01:02:27,800 --> 01:02:31,500 The issues-- timing is probably one of the biggest issues. 1053 01:02:31,500 --> 01:02:37,680 If it's consensus critical, then the miners need to put this into their blocks. 1054 01:02:37,680 --> 01:02:41,520 And miners also need to verify it when they get a block. 1055 01:02:41,520 --> 01:02:48,060 And adding even a second to creating and verifying a block can centralize mining a bit. 1056 01:02:48,060 --> 01:02:52,210 Because the idea is, I want to be able to immediately start mining as soon as I see 1057 01:02:52,210 --> 01:02:54,650 a block. 1058 01:02:54,650 --> 01:03:00,100 Because a larger miner doesn't need to verify the block that just came out, right? 1059 01:03:00,100 --> 01:03:01,100 Because they created it. 1060 01:03:01,100 --> 01:03:04,260 If they created a block themselves, they know it's fine. 1061 01:03:04,260 --> 01:03:09,400 They build on top of it immediately with zero propagation delays, zero verification delay.
1062 01:03:09,400 --> 01:03:13,800 Smaller miners or miners receiving that block have to make sure it's correct before mining 1063 01:03:13,800 --> 01:03:15,810 on top of it. 1064 01:03:15,810 --> 01:03:21,140 So adding even a one second creation verification delay is something that a lot of the programmers 1065 01:03:21,140 --> 01:03:25,060 are like, mm, if you can get it down to half a second, maybe we're OK. 1066 01:03:25,060 --> 01:03:29,920 We want this functionality, but not at the cost of increasing that. 1067 01:03:29,920 --> 01:03:31,520 That's the worst time to add something. 1068 01:03:31,520 --> 01:03:37,430 If you can defer it, if you can say, oh, well, you need to verify this, but you can do it 1069 01:03:37,430 --> 01:03:39,570 after the fact or something, then it's fine. 1070 01:03:39,570 --> 01:03:43,450 So I ran up against this because I had this fun idea, where you could half aggregate 1071 01:03:43,450 --> 01:03:46,660 Schnorr signatures within a block and do it non-interactively. 1072 01:03:46,660 --> 01:03:50,740 And it was really cool, and I'm pretty sure it works. 1073 01:03:50,740 --> 01:03:58,040 But it added, like, three seconds to block verification, so that killed that idea. 1074 01:03:58,040 --> 01:04:01,030 So one of the issues, yeah, timing. 1075 01:04:01,030 --> 01:04:06,740 Another issue is it does encourage more SPV level verification, where you can sort of 1076 01:04:06,740 --> 01:04:11,590 see that a lot of people will now use this method to not run a full node. 1077 01:04:11,590 --> 01:04:16,630 Because they're like, well, I can just get these compact proofs for all the UTXOs, whether 1078 01:04:16,630 --> 01:04:18,630 inclusion or non-inclusion. 1079 01:04:18,630 --> 01:04:19,630 So why run a full node? 1080 01:04:19,630 --> 01:04:23,750 I don't need to verify it because other people are. 1081 01:04:23,750 --> 01:04:26,740 You don't really want to encourage that because there's so much of that already.
1082 01:04:26,740 --> 01:04:30,620 Also, the biggest is probably there's got to be a better way to do this. 1083 01:04:30,620 --> 01:04:32,600 So there's all these different ideas on how to do it. 1084 01:04:32,600 --> 01:04:35,560 And they never seem to settle or converge on a single way. 1085 01:04:35,560 --> 01:04:43,750 So the three main ideas are sort of hash based UTXO sets, elliptic curve based commitments, 1086 01:04:43,750 --> 01:04:46,420 and RSA based commitments. 1087 01:04:46,420 --> 01:04:51,640 And there's a lot of overlap with accumulators, in that you're sort of-- the UTXO commitment 1088 01:04:51,640 --> 01:04:55,710 could be the sort of root or the hash of the accumulator itself. 1089 01:04:55,710 --> 01:05:00,480 That you want to prove that something's in there or prove that something's not in there. 1090 01:05:00,480 --> 01:05:03,990 And the EC one is the current one that Sipa was looking at, but it never 1091 01:05:03,990 --> 01:05:04,990 made sense to me. 1092 01:05:04,990 --> 01:05:09,960 Because it's trivially-- you can provide invalid proofs really easily. 1093 01:05:09,960 --> 01:05:11,529 So it's all trusted. 1094 01:05:11,529 --> 01:05:17,170 But his idea was, well, let's say you have a node that you synced up, and you want to 1095 01:05:17,170 --> 01:05:21,490 sort of port that to somewhere else. 1096 01:05:21,490 --> 01:05:27,000 You can do that very compactly with an elliptic curve based UTXO commitment, which is cool, 1097 01:05:27,000 --> 01:05:30,779 but you can't trust the inclusion or non-inclusion proofs. 1098 01:05:30,779 --> 01:05:36,340 And there's RSA ones that also-- yeah, so there's some overlap there. 1099 01:05:36,340 --> 01:05:38,700 Anyway, it's a pretty cool idea. 1100 01:05:38,700 --> 01:05:43,720 It does exist in Ethereum, which is just hash based that Joe Bonneau sort of explained the 1101 01:05:43,720 --> 01:05:46,270 tree thing they used.
1102 01:05:46,270 --> 01:05:49,470 But that doesn't provide for non-inclusion proofs, which would be fun. 1103 01:05:49,470 --> 01:05:52,480 So yeah, so this is another research area. 1104 01:05:52,480 --> 01:05:57,710 Any questions about UTXO commitments? 1105 01:05:57,710 --> 01:06:02,380 There's a bunch of-- Bram Cohen's Bitfield thing. 1106 01:06:02,380 --> 01:06:03,720 Have you seen that? 1107 01:06:03,720 --> 01:06:04,720 Yeah. 1108 01:06:04,720 --> 01:06:09,810 So that was a really interesting case of-- in computer science terms, it's O(n). 1109 01:06:09,810 --> 01:06:15,670 But he got it so that it's actually one bit per transaction output. 1110 01:06:15,670 --> 01:06:22,810 So the idea would be maintain a TXO set, right? 1111 01:06:22,810 --> 01:06:28,660 So for every transaction output ever created in order, you have just bits, right? 1112 01:06:28,660 --> 01:06:32,270 So if eight transaction outputs are created, that's a byte. 1113 01:06:32,270 --> 01:06:34,830 And you leave it all as zeros when they're unspent. 1114 01:06:34,830 --> 01:06:37,930 And when they have been spent, you just set it to 1. 1115 01:06:37,930 --> 01:06:43,980 And the thing is, yeah, there's 70 million, but if it's that-- well, there's more. 1116 01:06:43,980 --> 01:06:45,350 There's hundreds of millions. 1117 01:06:45,350 --> 01:06:48,460 But if it's bits instead of bytes, that's pretty small, right? 1118 01:06:48,460 --> 01:06:54,100 It's maybe 100 megabytes if you have 800 million outputs ever. 1119 01:06:54,100 --> 01:06:58,310 And then you can quickly see on your hard drive, OK, this output has been spent. 1120 01:06:58,310 --> 01:07:00,060 And you can provide proofs and stuff. 1121 01:07:00,060 --> 01:07:02,480 So the Bitfield thing was a kind of an interesting one. 1122 01:07:02,480 --> 01:07:08,500 OK, it's O(n), but 1 bit per object. 1123 01:07:08,500 --> 01:07:10,910 And you can sort of quickly sort between them.
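The bitfield idea above can be sketched in a few lines. This is only an illustration of the "one bit per output, in creation order" layout as described in the lecture; the class and method names are invented here, and it leaves out how positions would be committed to or proven.

```python
class SpentBitfield:
    """One bit per transaction output ever created: 0 = unspent, 1 = spent."""

    def __init__(self):
        self.bits = bytearray()
        self.count = 0                      # number of outputs created so far

    def add_output(self) -> int:
        """Register a new (unspent) output and return its position."""
        if self.count % 8 == 0:             # every 8 outputs needs one more byte
            self.bits.append(0)
        self.count += 1
        return self.count - 1

    def is_spent(self, pos: int) -> bool:
        if not 0 <= pos < self.count:
            raise IndexError("no such output")
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))

    def spend(self, pos: int) -> None:
        if self.is_spent(pos):
            raise ValueError("output already spent")
        self.bits[pos // 8] |= 1 << (pos % 8)

field = SpentBitfield()
positions = [field.add_output() for _ in range(10)]
field.spend(3)
print(field.is_spent(3), field.is_spent(4), len(field.bits))

# Size estimate from the lecture: 800 million outputs at one bit each.
print(800_000_000 // 8, "bytes")            # 100,000,000 bytes, about 100 MB
```

Storage still grows linearly with the number of outputs ever created, but the constant is one bit, which is why the linear growth is tolerable in practice.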
1124 01:07:10,910 --> 01:07:15,180 So these are some of the things that-- I think that's it for today, right? 1125 01:07:15,180 --> 01:07:22,180 I was going to talk about covenants, but that's another story. 1126 01:07:22,180 --> 01:07:30,180 So these are some of the current research issues in Bitcoin, blockchain, cryptocurrency. 1127 01:07:30,180 --> 01:07:31,260 And there's lots of other topics. 1128 01:07:31,260 --> 01:07:34,720 But these are some of the big ones that I'm aware of and I know people working on. 1129 01:07:34,720 --> 01:07:41,020 But yeah, there's lots of ways to improve privacy, scalability, functionality. 1130 01:07:41,020 --> 01:07:45,690 It's certainly not like-- and sometimes people say, oh, well, Bitcoin was made in 2009. 1131 01:07:45,690 --> 01:07:53,370 And it's like blockchain version 1.0, and we need to make blockchain 4.0 or whatever. 1132 01:07:53,370 --> 01:07:56,510 And to some extent, yeah, Bitcoin is annoying to change, right? 1133 01:07:56,510 --> 01:08:01,540 It's difficult to change, and you've got the idea of a bridge node, and can't we just start 1134 01:08:01,540 --> 01:08:02,540 over? 1135 01:08:02,540 --> 01:08:03,540 It'd be so much easier. 1136 01:08:03,540 --> 01:08:06,950 In some cases, yes, it would be nicer to start over. 1137 01:08:06,950 --> 01:08:08,010 But that's sort of the challenge. 1138 01:08:08,010 --> 01:08:16,040 It's like, wait, can we make this system better, but maintain backwards compatibility? 1139 01:08:16,040 --> 01:08:23,799 And so, in a lot of cases, like the accumulators or the client side Bloom filter checking, 1140 01:08:23,799 --> 01:08:27,270 it's not a fork at all, right? 1141 01:08:27,270 --> 01:08:28,899 The miners don't have to change anything. 1142 01:08:28,899 --> 01:08:33,788 If you don't know about this new change, you don't have to change anything. 1143 01:08:33,788 --> 01:08:38,149 And so there are totally optional ways to improve some of these things. 
1144 01:08:38,149 --> 01:08:39,960 And that's a fun challenge, right? 1145 01:08:39,960 --> 01:08:40,960 You're given this system. 1146 01:08:40,960 --> 01:08:46,259 It's sort of like trying to improve an airplane while it's in flight, where you're like, OK, 1147 01:08:46,259 --> 01:08:49,769 I'm going to make it faster, make it better while it's flying. 1148 01:08:49,769 --> 01:08:52,179 We can't even land. 1149 01:08:52,179 --> 01:08:55,568 So yeah, that definitely makes it harder. 1150 01:08:55,568 --> 01:09:01,698 But other things are-- but what's also interesting is that there's not that many things in it 1151 01:09:01,698 --> 01:09:05,979 that really require starting over from scratch. 1152 01:09:05,979 --> 01:09:12,880 So I know at the New York City Bitcoin Core Developer Meetup in March, Bram Cohen, who 1153 01:09:12,880 --> 01:09:16,660 made BitTorrent-- he's making his own coin, I guess, now. 1154 01:09:16,660 --> 01:09:21,520 And it's kind of interesting because usually, the Bitcoin people don't like altcoins or 1155 01:09:21,520 --> 01:09:26,630 are sort of wary of them, and don't invite altcoin people to their events and stuff. 1156 01:09:26,630 --> 01:09:31,799 But Bram Cohen invented BitTorrent, so everyone sort of owes him in some spiritual way a couple 1157 01:09:31,799 --> 01:09:37,120 thousand bucks for all the music and movies they've downloaded, right? 1158 01:09:37,120 --> 01:09:40,210 So it's like, oh, Bram, he helped us out. 1159 01:09:40,210 --> 01:09:41,818 So he comes to these things. 1160 01:09:41,818 --> 01:09:46,759 And he was asking at the thing in March, OK-- asking a lot of people-- if you were going 1161 01:09:46,759 --> 01:09:51,029 to start Bitcoin over from scratch, what would you change? 1162 01:09:51,029 --> 01:09:55,699 He asked all the people who were programming this for years and dealing with all the annoying problems 1163 01:09:55,699 --> 01:10:00,619 in it, if you started from scratch, what code do you change?
1164 01:10:00,619 --> 01:10:04,820 Or what sort of basic infrastructure things do you change in the system? 1165 01:10:04,820 --> 01:10:06,400 And there weren't that many, right? 1166 01:10:06,400 --> 01:10:08,869 A lot of people were like, oh, there's the little things. 1167 01:10:08,869 --> 01:10:12,300 Like, oh, get rid of the little silly OP_0 in multisig. 1168 01:10:12,300 --> 01:10:14,890 Get rid of this, or change a couple of these things. 1169 01:10:14,890 --> 01:10:21,180 But not a lot of really fundamental changes people would want to have made. 1170 01:10:21,180 --> 01:10:25,969 Some people were saying, hey, maybe instead of pointing based on TXID, be able to point 1171 01:10:25,969 --> 01:10:30,860 based on height and index, so you can have it a lot smaller there. 1172 01:10:30,860 --> 01:10:36,800 Some different opcode things, but what was interesting is that-- and also, it's a super 1173 01:10:36,800 --> 01:10:37,800 biased sample. 1174 01:10:37,800 --> 01:10:42,099 And it's like, if you're asking all the people who work on Bitcoin, well, yeah, they sort 1175 01:10:42,099 --> 01:10:45,989 of like how Bitcoin works, or at least, they've grown accustomed to it. 1176 01:10:45,989 --> 01:10:49,239 So you might want to ask other people if you want to sort of think out of the box. 1177 01:10:49,239 --> 01:10:52,940 But there weren't a ton of major changes that people wanted to have made. 1178 01:10:52,940 --> 01:11:00,760 So it does seem that it's a fairly well thought out base of a system. 1179 01:11:00,760 --> 01:11:04,800 And then I'd say the Ethereum design is the other design that's very different, right? 1180 01:11:04,800 --> 01:11:07,619 The account based versus UTXO based. 1181 01:11:07,619 --> 01:11:11,800 And a lot of the people who work on Bitcoin think that account based is worse. 1182 01:11:11,800 --> 01:11:15,519 A lot of people maybe in Ethereum's case think UTXO based is worse. 1183 01:11:15,520 --> 01:11:17,700 They do have tradeoffs.