The 1.x Files: The Updated Stateless Tech Tree

The Updated Stateless Ethereum Tech Tree

Apologies for the delay in releasing this post; there have been some unavoidable distractions in my life recently, as I'm sure there have been in yours. I hope that you are making the best of your circumstances, whatever they may be, and implore you to turn your empathy up to eleven for the next few months, and to help your community's at-risk people in whatever capacity you can :pray:.

With that said, let's talk about Stateless Ethereum, and the changes to the Tech Tree!

Graphically, the tree has been completely re-worked, but if you were to compare it to the original, you'd notice that a lot of the content is the same. For the sake of completeness and avoidance of confusion, we'll still go through everything in this post, though, so feel free to close that tab you just opened in the background. Without further ado, I present to you the updated Stateless Tech Tree:

Each major milestone in pink represents a roughly defined category that must be "solved" before more advanced ones. These are intentionally a little vague, and don't represent anything like specific EIPs or unified features, although some of them could eventually be defined as such.

Smaller elements of the tree in purple are more specific dependencies that will lead to the major milestones being "unlocked". The purple ones are required in the sense that they need to be fully understood before the milestone can be considered finished, but they don't necessarily need to be implemented or accepted. For example, it is possible that after more research, we find that code merkleization doesn't reduce witness sizes sufficiently to justify the time and effort it would take to implement it; we would then consider it 'finished', because it no longer needs to be investigated.

As you might have guessed already, items in green are the "side quests" that would theoretically be useful in Stateless Ethereum, but which might not be the best use of researchers' limited time and effort. There are likely more of these to be discovered along the way; I'll add them as needed.

Additionally, we have elements in yellow that fall into the category of tools. These are yet-uncreated software tools that will help to validate assumptions, test implementations, and more generally make the work go faster. Ideally these tools will be of high enough quality, and well enough maintained, to be valuable to the larger developer ecosystem even outside of the Stateless Ethereum context.

Alternative Sync Protocol

One important takeaway from the summit in Paris was that sync is the first major milestone in Stateless Ethereum. Specifically, we must find a way for new nodes to fetch the current state trie without relying on the network primitive GetNodeData. Until we have a reliable alternative to this network primitive (beam sync and fast sync are both based on it), efforts to build Stateless Ethereum will be impeded, and potentially even counterproductive. It's worth digging in here a bit to explain why this is such a problem. If you're not familiar with the fundamentals of the Ethereum state, I recommend checking out my previous post in this series on the subject.

Let's do some jargon-busting first. There isn't really a special technical definition for the term "network primitive" in this context; it's just a hip way of saying "the basic grammar of Ethereum network communication". One client asks "hey, what's the data for the node with hash 0xfoo?" and a peer can respond "oh, it's 0xbeef". In most cases, the response will contain additional hashes of child nodes in the trie, which can then be asked for in the same manner. This game of marco-polo continues until the requester is satisfied, usually after having asked for each of the ~400 million nodes in the current state trie individually.
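To make the marco-polo game concrete, here is a minimal sketch of a hash-by-hash trie walk. Everything in it is a toy assumption: the ToyPeer class, the node_hash helper, and the node layout are invented for illustration, and SHA-256 stands in for the Keccak-256 hashing of RLP-encoded nodes that real clients use. It is not the actual wire protocol, just the shape of the conversation.

```python
import hashlib

def node_hash(children, value):
    """Hash a toy node (assumption: real nodes are RLP-encoded and Keccak-hashed)."""
    payload = b"|".join(children) + b"#" + value
    return hashlib.sha256(payload).digest()

class ToyPeer:
    """Hypothetical full node holding every trie node, keyed by its hash."""
    def __init__(self):
        self.db = {}

    def add_node(self, children, value=b""):
        h = node_hash(children, value)
        self.db[h] = (list(children), value)
        return h

    def get_node_data(self, h):
        # Stand-in for answering a GetNodeData request for a single hash.
        return self.db[h]

def sync(peer, root):
    """Fetch the whole trie one node at a time, starting from the root hash."""
    local, pending, requests = {}, [root], 0
    while pending:
        h = pending.pop()
        if h in local:
            continue
        children, value = peer.get_node_data(h)
        requests += 1
        # The requester checks the response against the hash it asked for.
        assert node_hash(children, value) == h
        local[h] = (children, value)
        pending.extend(children)  # newly learned child hashes get asked for next
    return local, requests

peer = ToyPeer()
leaves = [peer.add_node([], v) for v in (b"a", b"b", b"c", b"d")]
left = peer.add_node(leaves[:2])
right = peer.add_node(leaves[2:])
root = peer.add_node([left, right])

local, requests = sync(peer, root)
print(requests)  # 7: one request per node in this toy trie
```

The thing to notice is that the number of requests equals the number of nodes, and the requester alone decides what gets asked for and when.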

Syncing this way can still be fast, because a client can of course multi-task, and ask many other full nodes for different pieces of the state at the same time. But there is a more fundamental problem here in the way the primitive works: the 'leechers' requesting state get to do it on their own terms, and they can only get what they need from the 'seeders', i.e. full nodes with the complete state. This asymmetric relationship is just the way things work right now, and it works well enough because of two related facts about the network: First, there are a sufficient number of full nodes actively serving state by request. Second, anyone requesting state will eventually turn into a full node, so the demand for state is self-limiting.

Now we can see why this is a problem for Stateless Ethereum: in a stateless paradigm, nodes that aren't keeping the state data they request will need to just keep requesting data indefinitely. If running a stateless node is easier than running a full node (it is), we'd expect the number of stateless nodes to grow faster than the number of full nodes, until eventually the state is unable to propagate fast enough throughout the network. Uh oh.

We don't have time to go into further detail here, so I'll refer you to Piper's write-up on the problem, and then we can move on to the emerging solutions, which are all different approaches to improving the state sync protocol, to either make the problem less pronounced, or solve it entirely. Here are the 3 most promising alternative sync protocols:

Ethereum Snapshot Protocol (SNAP). We've talked about this previously, but I referred to it as "state tiling". Recently, it was more verbosely described by Peter in the devp2p repo. Snap breaks the state into a handful of large chunks and proofs (on the order of 10,000 trie nodes) that can be re-assembled into the full state. A syncing node would request a sub-section of the state from multiple nodes, and in a short amount of time have an almost valid picture of the state stitched together from ~100 different similar state roots. To finish, the client ‘patches up’ the chunks by switching back to GetNodeData until it has a valid state.

Fire Queen's Sync. Not much has changed since this was written about in the original tech tree article, except for the name, which is a combination of "firehose" and "Red Queen's" sync. These are very similar proposals to replace GetNodeData with an alternative set of primitives for various aspects of state.

Merry-go-round. This is a new idea for sync explained at a high level on ethresear.ch and more concretely described in notes. In merry-go-round sync, the whole state is passed around in a predetermined order, so that all participants gossip the same pieces of the state trie at the same time. To sync the whole state, one must complete a full “revolution” on the merry-go-round, covering all parts of the state. This design has some useful properties. First, it allows new nodes joining to contribute immediately to state propagation, rather than only becoming useful to the network after a completed sync. Second, it inverts the current model of ‘leecher-driven sync’ whereby those with no data may request pieces of state from full nodes at will. Rather, new syncing nodes in merry-go-round sync know what parts of state are being offered at a given time, and adjust accordingly.
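The merry-go-round idea in particular can be sketched in a few lines. This is only a toy under stated assumptions: the number of segments, the notion of a global "tick", and the function names are all hypothetical, and the real proposal has many open details that this ignores.

```python
NUM_SEGMENTS = 8  # hypothetical: the state key space is cut into 8 slices

def segment_at(tick):
    """The segment every participant gossips at a given tick (predetermined order)."""
    return tick % NUM_SEGMENTS

def sync_from(join_tick):
    """Collect segments as they come around, starting whenever we happen to join."""
    have = []
    tick = join_tick
    while len(have) < NUM_SEGMENTS:
        # A joining node can re-gossip this segment right away,
        # even though it does not have the rest of the state yet.
        have.append(segment_at(tick))
        tick += 1
    return have, tick - join_tick

order, ticks = sync_from(join_tick=5)
print(order)  # [5, 6, 7, 0, 1, 2, 3, 4]
print(ticks)  # 8: exactly one full revolution
```

No matter when a node hops on, it needs one full revolution to see every segment, and it never gets to choose which segment comes next.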

The last sync method worth mentioning is beam sync, which is now supported by not one, but two alternative clients. Beam sync still relies on GetNodeData, but it offers an ideal entry point for experimentation and data collection for these alternative sync methods. It's important to note that there are many unknowns about sync still, and having these separate, independently developed approaches to solving sync is important. The next few months could be thought of as a sync hackathon of sorts, where ideas are prototyped and tested out. Ideally, the best aspects of each of these alternative sync protocols can be molded into one new standard for Stateless Ethereum.

Witness Spec Prototype

There is a draft specification in the Stateless Ethereum specs repo that describes at a high level the structure of a block witness, and the semantics of building and modifying one from the state trie. The purpose of this document is to define witnesses without ambiguity, so that implementers, regardless of client or programming language, may write their own implementation and have reasonable certainty that it is the same thing as another, different implementation.

As mentioned in the latest call digest, there doesn't seem to be a downside to writing out a reference implementation for block witnesses and getting that into existing clients for testing. A witness prototype feature on a client would be something like an optional flag to enable, and having a handful of testers on the network producing and relaying witnesses could provide valuable insight for researchers to incorporate into subsequent improvements.

Two things need to be "solved" before witnesses are resilient enough to be considered ready for widespread use.

Witness Indexing. This one is relatively straightforward: we need a reliable way of determining which witness corresponds to which block and associated state. This could be as simple as putting a witnessHash field into the block header, or something else that serves the same purpose but in a different way.

Stateless Tx Validation. This is an interesting early problem thoroughly summarized on the ethresearch forums. In summary, clients need to quickly check if incoming transactions (waiting to be mined into a future block) are at least eligible to be included in a future block. This prevents attackers from spamming the network with bogus transactions. The current check, however, requires accessing data which is a part of the state, i.e. the sender's nonce and account balance. If a client is stateless, it won't be able to perform this check.
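A rough sketch of the stateless transaction validation problem, assuming a simplified check: the Account and Tx classes and the is_eligible function are hypothetical names made up for this example, and real clients apply more rules than these two. The point is only to show where the state lookup sits.

```python
from dataclasses import dataclass

@dataclass
class Account:
    nonce: int
    balance: int

@dataclass
class Tx:
    sender: str
    nonce: int
    value: int
    gas_limit: int
    gas_price: int

def is_eligible(tx, state):
    """Cheap pre-check; returns None when the sender's account is not available locally."""
    account = state.get(tx.sender)
    if account is None:
        return None  # a stateless node has no local answer here
    max_cost = tx.value + tx.gas_limit * tx.gas_price
    return tx.nonce >= account.nonce and account.balance >= max_cost

full_state = {"0xalice": Account(nonce=3, balance=10**18)}
stateless = {}  # a stateless client keeps no accounts at all

tx = Tx("0xalice", nonce=3, value=10**17, gas_limit=21000, gas_price=10**9)
stale = Tx("0xalice", nonce=2, value=0, gas_limit=21000, gas_price=10**9)

print(is_eligible(tx, full_state))     # True
print(is_eligible(stale, full_state))  # False
print(is_eligible(tx, stateless))      # None
```

The None case is the whole problem: without the sender's nonce and balance, the node cannot tell a valid transaction from spam.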

There is certainly more work than these two specific problems that needs to be done before we have a working prototype of witnesses, but these two things are what absolutely need to be 'solved' as part of bringing a viable prototype to a beam-syncing node near you.
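As one illustration of the witness indexing idea from this section, here is a minimal sketch of the "put a witnessHash field into the block header" option. The Header class and its witness_hash field are hypothetical and do not reflect the current header format, and SHA-256 is used only as a stand-in for Keccak-256.

```python
import hashlib
from dataclasses import dataclass

def digest(data: bytes) -> bytes:
    # Stand-in hash for illustration (assumption: the real thing would use Keccak-256).
    return hashlib.sha256(data).digest()

@dataclass(frozen=True)
class Header:
    number: int
    state_root: bytes
    witness_hash: bytes  # hypothetical field, not part of today's block header

def witness_matches(header: Header, witness: bytes) -> bool:
    """Check that a received witness is the one this block committed to."""
    return digest(witness) == header.witness_hash

witness = b"serialized witness for block 100"
header = Header(number=100, state_root=b"\x00" * 32, witness_hash=digest(witness))

print(witness_matches(header, witness))           # True
print(witness_matches(header, b"other witness"))  # False
```

Any other scheme that serves the same purpose would need to give the receiver an equally cheap way to answer "is this the witness for that block?".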

EVM

As in the original version of the tech tree, some changes will need to happen inside the EVM abstraction. Specifically, witnesses need to be generated and propagated across the network, and that activity needs to be accounted for in EVM operations. The topics tied to this milestone have to do with what those costs and incentives are, how they are estimated, and how they will be implemented with minimal impact on higher layers.

Witness gas accounting. This remains unchanged from previous articles. Every transaction will be responsible for a small part of the full block’s witness. Generating a block’s witness involves some computation that will be performed by the block’s miner, and therefore will need to have an associated gas cost, paid for by the transaction’s sender.

Code Merkleization. One major component of a witness is accompanying code. Without this feature, a transaction that contained a contract call would require the full bytecode of that contract in order to verify its codeHash. That could be a lot of data, depending on the contract. Code ‘merkleization’ is a method of splitting up contract bytecode so that only the portion of the code called is required to generate and verify a witness for the transaction. This is one technique for dramatically reducing the average size of witnesses, but it has not been fully investigated yet.
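Here is a minimal sketch of what code merkleization could look like, assuming fixed-size chunks and a plain binary Merkle tree. The chunk size, the padding rule for odd levels, and the use of SHA-256 in place of Keccak-256 are all assumptions chosen for illustration; the actual chunking scheme is exactly the part that still needs research.

```python
import hashlib

CHUNK_SIZE = 32  # hypothetical chunk size in bytes

def h(data):
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256

def chunk(code):
    return [code[i:i + CHUNK_SIZE] for i in range(0, len(code), CHUNK_SIZE)]

def build_levels(leaves):
    """Return every level of the tree, from leaf hashes up to the root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def prove(levels, index):
    """Sibling hashes needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

code = bytes(range(256)) * 4   # 1024 bytes of placeholder "bytecode"
chunks = chunk(code)           # 32 chunks of 32 bytes each
levels = build_levels(chunks)
root = levels[-1][0]
proof = prove(levels, 5)

print(verify(root, chunks[5], 5, proof))  # True
print(len(chunks[5]) + 32 * len(proof))   # 192 bytes instead of 1024
```

In this toy, proving a single executed chunk takes the chunk plus five sibling hashes, rather than the whole contract. Whether real contracts see savings like that depends on how much of their code a typical call touches.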

The UNGAS / Versionless Ethereum changes have been removed from the 'critical path' of Stateless Ethereum. These are still potentially beneficial features for Ethereum, but it became clear during the summit that their merits and particularities can and should be discussed independently of the Stateless goals.

The Transition to Binary Trie

Switching Ethereum’s state to a Binary Trie structure is key to getting witness sizes small enough to be gossiped around the network without running into bandwidth/latency issues. Theoretically the reduction should be over 3-fold, but in practice that number is a little less dramatic (because of the size of contract code in witnesses, which is why code merkleization is potentially important).
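For a feel of where "over 3-fold" comes from, here is a back-of-the-envelope calculation. It assumes a perfectly balanced trie, a proof for a single leaf, and 32-byte sibling hashes, and it ignores node encoding overhead, shared siblings between keys, and contract code. The leaf count is a hypothetical round number; the ratio does not depend on it.

```python
import math

HASH_SIZE = 32  # bytes per sibling hash

def proof_bytes(num_leaves, arity):
    """Sibling-hash bytes needed to prove one leaf in a balanced trie of the given arity."""
    depth = math.log2(num_leaves) / math.log2(arity)
    siblings_per_level = arity - 1
    return depth * siblings_per_level * HASH_SIZE

n = 2 ** 28  # hypothetical number of leaves
hexary = proof_bytes(n, 16)  # 7 levels, 15 siblings each
binary = proof_bytes(n, 2)   # 28 levels, 1 sibling each

print(hexary, binary)             # 3360.0 896.0
print(round(hexary / binary, 2))  # 3.75
```

A hexary trie is four times shallower, but each level drags along fifteen siblings instead of one, so the idealized ratio works out to 15/4.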

The transition to a completely different data representation is a rather significant change, and enacting that transition through hard-fork will be a delicate process. Two strategies outlined in the previous article remain unchanged:

Progressive. The current hexary state trie would be transformed piece-by-piece over a long period of time. Any transaction or EVM execution touching parts of state would by this strategy automatically encode changes to state into the new binary form. This implies the adoption of a ‘hybrid’ trie structure that will leave dormant parts of state in their current hexary representation. The process would effectively never complete, and would be complex for client developers to implement, but would for the most part insulate users and higher-layer developers from the changes happening under the hood in layer 0.

Clean-cut. This strategy would compute a fresh binary trie representation of the state at a predetermined time, then carry on in binary form once the new state has been computed. Although more straightforward from an implementation perspective, a clean-cut requires coordination from all node operators, and would almost certainly entail some (limited) disruption to the network, affecting developer and user experience during the transition.

There is, however, a new proposal for the transition, which offers a middle ground between the progressive and clean-cut strategies. It is outlined in full on the ethresearch forums.

Overlay. New values from transactions after a certain time are stored directly in a binary trie sitting “on top” of the hexary one, while the “historical” hexary trie is converted in the background. When the base layer has been fully converted, the two can be merged.
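The overlay strategy can be sketched with plain dictionaries standing in for the tries. The OverlayState class and its method names are hypothetical, deletions are not handled, and nothing here reflects the details of the actual proposal; it only shows the read, write, convert, and merge steps described above.

```python
class OverlayState:
    def __init__(self, hexary):
        self.hexary = dict(hexary)  # frozen "historical" values (old hexary trie)
        self.converted = {}         # binary copy of the base, built in the background
        self.overlay = {}           # values written after the switch (new binary trie)

    def get(self, key):
        # Reads check the overlay first, then fall back to the historical base.
        if key in self.overlay:
            return self.overlay[key]
        return self.hexary.get(key)

    def put(self, key, value):
        self.overlay[key] = value  # new writes never touch the hexary base

    def convert_step(self, batch=1):
        """Convert a few more base entries; returns True once the base is done."""
        todo = [k for k in self.hexary if k not in self.converted][:batch]
        for k in todo:
            self.converted[k] = self.hexary[k]
        return len(self.converted) == len(self.hexary)

    def merge(self):
        assert len(self.converted) == len(self.hexary), "base not fully converted"
        return {**self.converted, **self.overlay}  # overlay values win

state = OverlayState({"a": 1, "b": 2, "c": 3})
state.put("b", 20)
state.put("d", 4)
print(state.get("a"), state.get("b"))  # 1 20

while not state.convert_step(batch=2):
    pass
merged = state.merge()
print(merged)  # {'a': 1, 'b': 20, 'c': 3, 'd': 4}
```

Because the base is frozen at the switch, the background conversion never races against new writes, which is what makes this a middle ground between the other two strategies.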

One additional consideration for the transition to a binary trie is the database layouts of clients. Currently, all clients use the 'naive' approach to the state trie, storing each node in the trie as a [key, value] pair where the hash of the node is the key. It is possible that the transition strategy could be an opportunity for clients to switch to an alternative database structure, following the example of turbo-geth.

True Stateless Ethereum

The final pieces of the tree come together after the witness prototype has been tested and improved, the necessary changes to the EVM have been enacted, and the state trie has become binary. These are the more distant quests and side quests which we know must be completed eventually, but it's likely best not to think too deeply about until more pressing matters have been attended to.

Compulsory Witnesses. Witnesses need to be generated by miners, and right now it's not clear whether spending those extra few milliseconds to generate a witness is something miners will seek to avoid. Part of this can be offset by tweaking the fees that miners get to keep from the partial witnesses included with transactions, but a sure-fire way is to just make witnesses part of the core Ethereum protocol. This is a change that can only happen after we're sure everything is working the way it's supposed to, so it's one of the final changes in the tree.

Witness Chunking. Another more distant feature to be considered is the ability for a stateless network to pass around smaller chunks of witnesses, rather than entire blocks. This would be especially valuable for partial-state nodes, which might choose to 'watch over' the parts of state they're interested in, and then rely on complementary witness chunks for other transactions.

Historical Accumulators. Originally conceived as some sort of magic moon math zero-knowledge scheme, a historical accumulator would make verifying a historical witness much easier. This would allow a stateless node to perform checks and queries on, for example, the historical balances of an account it was interested in, without actually needing to fetch a specific piece of archived state.

DHT Chain Data. Although the idea of an Ethereum data delivery network for state has been more or less abandoned, it would still be quite useful and far easier to implement one for historical chain data such as transaction receipts. This might be another approach to enabling stateless clients to have on-demand access to historical data that might ordinarily be gotten from an archive node.

Stay Safe, and Stay Tuned

Thanks for reading, and thank you for the many warm positive comments I've gotten recently about these updates. I have something more... magical planned for subsequent posts about the Stateless Ethereum research, which I'll be posting intermittently on the Fellowship of Ethereum Magicians forum, and on this blog when appropriate. Until next time, keep your social distance, and wash your hands often!

As always, if you have feedback, questions, or requests for topics, please reach out to @gichiba or @JHancock on Twitter.