The 1.x Files: A Primer for the Witness Specification

Since a lot of us have a bit more time on our hands, I thought now might be a good opportunity to proceed with something perhaps a little bit boring and tedious, but nevertheless quite fundamental to the Stateless Ethereum effort: understanding the formal Witness Specification.

Like the captain of the Battleship in StarCraft, we're going to take it slow. The witness spec is not a particularly complicated concept, but it is very deep. That depth is a little daunting, but is well worth exploring, because it'll provide insights that, perhaps to your nerdy delight, extend well beyond the world of blockchains, or even software!

By the end of this primer, you should have at least minimum-viable-confidence in your ability to understand what the formal Stateless Ethereum Witness Specification is all about. I'll try to make it a little more fun, too.

Recap: What you need to know about State

Stateless Ethereum is, of course, a bit of a misnomer, because the state is really what this whole effort is about. Specifically, finding a way to make keeping a copy of the whole Ethereum state an optional thing. If you haven't been following this series, it might be worth having a look at my earlier primer on the state of stateless Ethereum. I'll give a short TL;DR here though. Feel free to skim if you feel like you've already got a good handle on this topic.

The complete ‘state’ of Ethereum describes the current status of all accounts and balances, as well as the collective memories of all smart contracts deployed and running in the EVM. Every finalized block in the chain has one and only one state, which is agreed upon by all participants in the network. That state is changed and updated with each new block that is added to the chain.

The Ethereum State is represented in silico as a Merkle-Patricia Trie: a hashed data structure that organizes each individual piece of information (e.g. an account balance) into one massive connected unit that can be verified for uniqueness. The complete state trie is too massive to visualize, but here's a 'toy version' that will be helpful when we get to witnesses:

toy state trie

Like magical cryptographic caterpillars, the accounts and code of smart contracts live in the leaves and branches of this tree, which through successive hashing eventually leads to a single root hash. If you want to know that two copies of a state trie are the same, you can simply compare the root hashes. Maintaining relatively secure and indisputable consensus over one 'canonical' state is the essence of what a blockchain is designed to do.

In order to submit a transaction to be included in the next block, or to validate that a particular change is consistent with the last included block, Ethereum nodes must keep a complete copy of the state, and re-compute the root hash (over and over again). Stateless Ethereum is a set of changes that will remove this requirement, by adding what's known as a 'witness'.

A Witness Sketch

Before we dive into the witness specification, it'll be helpful to have an intuitive sense of what a witness is. Again, there is a more thorough explanation in the post on the Ethereum state linked above.

A witness is a bit like a cheat sheet for an oblivious (stateless) student (client). It's just the minimum amount of information need to pass the exam (submit a valid change of state for inclusion in the next block). Instead of reading the whole textbook (keeping a copy of the current state), the oblivious student (stateless client) asks a friend (full node) for a crib sheet to submit their answers.

In very abstract terms, a witness provides all of the needed hashes in a state trie, combined with some ‘structural’ information about where in the trie those hashes belong. This allows an ‘oblivious’ node to include new transaction in its state, and to compute a new root hash locally – without requiring them to download an entire copy of the state trie.

Let's move away from the cartoonish idea and towards a more concrete representation. Here is a "real" visualization of a witness:

witness-hex

I recommend opening this image in a new tab so that you can zoom in and really appreciate it. This witness was selected because it's relatively small and easy to pick out features. Each little square in this image represents a single 'nibble', or half of a byte, and you can verify that yourself by counting the number of squares that you have to 'pass through', starting at the root and ending at an Ether balance (you should count 64). While we're looking at this image, notice the huge chunk of code within one of the transactions that must be included for a contract call -- code makes up a relatively large part of the witness, and could be reduced by code merkleization (which we'll explore another day).

Some Formalities

One of the fundamental distinguishing features of Ethereum as a protocol is its independence from a particular implementation. This is why, rather than just one official client as we see in Bitcoin, Ethereum has several completely different versions of client. These clients, written in various programming languages, must adhere to The Ethereum Yellow Paper, which explains in much more formal terms how any client should behave in order to participate in the Ethereum protocol. That way, a developer writing a client for Ethereum doesn't have to deal with any ambiguity in the system.

The Witness Specification has this exact goal: to provide an unambiguous description of what a witness is, which will make implementing it straightforward in any language, for all clients. If and when Stateless Ethereum becomes 'a thing', the witness specification can be inserted into the Yellow Paper as an appendix.

When we say unambiguous in this context, it means something stronger than what you might mean in ordinary speech. It's not that the formal specification is just a really, really, really, detailed description of what a witness is and how it behaves. It means that, ideally, there is literally one and only one way describe a particular witness. That is to say, if you adhere to the formal specification, it'd be impossible for you to write an implementation for Stateless Ethereum that generates witnesses different than any other implementation also following the rules. This is key, because the witness is going to (hopefully) become a new cornerstone of the Ethereum protocol; It needs to be correct by construction.

A Matter of Semantics (and Syntax)

Although 'blockchain development' usually implies something new and exciting, it must be said that a lot of it is grounded in much older and wiser traditions of computer programming, cryptography, and formal logic. This really comes out in the Witness Specification! In order to understand how it works, we need to feel comfortable with some of the technical terms, and to do that we're going to have to take a little detour into linguistics and formal language theory.

Read aloud the following two sentences, and pay particular attention to your intonation and cadence:

furiously sleep ideas green colorless
colorless green ideas sleep furiously

I bet the first sentence came out a bit robotic, with a flat emphasis and pause after each word. By contrast, the second sentence probably felt natural, if a bit silly. Even though it didn't really mean anything, the second sentence made sense in a way that the first one didn't. This is a little intuition pump to draw attention to the distinction between Syntax and Semantics. If you're an English speaker you have an understanding of what the words represent (their semantic content), but that was largely irrelevant here; what you noticed was a difference between valid and invalid grammar (their syntax).

This example sentence is from a 1956 paper by one Noam Chomsky, which is a name you might recognize. Although he is now known as an influential political and social thinker, Chomsky's first contributions as an academic were in the field of logic and linguistics, and in this paper, he created one of the most useful classification systems for formal languages.

Chomsky was concerned with the mathematical description of grammar, how one can categorize languages based on their grammar rules, and what properties those categories have. One such property that is relevant to us is syntactic ambiguity.

Ambiguous Buffalo

Consider the grammatically correct sentence "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo." -- this is a classic example that illustrates just how ambiguous English syntax rules can be. If you understand that, depending on the context, the word 'buffalo' can be used as a verb (to intimidate), an adjective (being from Buffalo, NY), or a noun (a bison), you can parse the sentence based on where each word belongs.

We could also use entirely different words, and multiple sentences: "You know those NY bison that other NY bison intimidate? Well, they intimidate, too. They intimidate NY bison, to be exact."

But what if we want to remove the ambiguity, but still restrict our words to use only 'buffalo', and keep it all as a single sentence? It's possible, but we need to modify the rules of English a bit. Our new "language" is going to be a little more exact. One way to do that would be to mark each word to indicate its part of speech, like so:

Buffalo{pn} buffalo{n} Buffalo{pn} buffalo{n} buffalo{v} buffalo{v} Buffalo{pn} buffalo{n}

Perhaps that's still not super clear for a reader. To make it even more exact, let's try using a bit of substitution to help us herd some of these "buffalo" into groups. Any bison from Buffalo, NY is really just one special version of what we would call a "noun phrase", or <NP>. We can substitute <NP> into the sentence whenever we encounter the string Buffalo{pn} buffalo{n}. Since we're getting a bit more formal, we might decide to use a shorthand notation for this and other future substitution rules, by writing:

<NP> ::= Buffalo{pn} buffalo{n}

where ::= means "What's on the left side can be replaced by what's on the right side". Importantly, we don't want this relationship to go the other way; imagine how mad the Boulder buffalo would get!

Applying our substitution rule to the full sentence, it would change to:

<NP> <NP> buffalo{v} buffalo{v} <NP>

Now, this is still a bit confusing, because in this sentence there is a sneaky relative clause, which can be seen a lot more clearly by inserting the word 'that' into the first part our sentence, i.e. <NP> *that* <NP> buffalo{v}....

So let's make a substitution rule that groups the relative clause into <RC>, and say:

<RC> ::= <NP> buffalo{v}

Additionally, since a relative clause really just makes a clarification about a noun phrase, the two taken together are equivalent to just another noun phrase:

With these rules defined and applied, we can write the sentence as:

<NP> buffalo{v} <NP>

That seems pretty good, and really gets at the core relationship this silly sentence expresses: One particular group of bison intimidating another group of bison.

We've taken it this far, so why not go all the way? Whenever 'buffalo' as a verb precedes a noun, we could call that a verb phrase, or <VP>, and define a rule:

<VP> ::= buffalo{v}<NP>

And with that, we have our single complete valid sentence, which we could call S:

S ::= <NP><VP>

What we've done here might be better represented visually:

That structure looks curiously familiar, doesn't it?

The buffalo example is a bit silly and not very rigorous, but it's close enough to demonstrate what's going on with the weird mathematical language of the Witness Specification, which I have very sneakily introduced in my rant about buffalo. It's called Backus-Naur form notation, and it's often used in formal specifications like this, in a variety of real-world scenarios.

The 'substitution rules' we defined for our restricted English language helped to make sure that, given a herd of "buffalo", we could construct a 'valid' sentence without needing to know anything about what the word buffalo means in the real world. In the classification first elucidated by Chomsky, a language that has exact enough rules of grammar that allow you to do this is called a context-free language.

More importantly, the rules ensure that for every possible sentence comprised of the word(s) buffalo{np|n|v}, there is one and only one way to construct the data structure illustrated in the tree diagram above. Un-ambiguity FTW!

Go Forth and Read the Spec

Witnesses are at their core just a single large object, encoded into a byte array. From the (anthropomorphic) perspective of a stateless client, that array of bytes might look a bit like a long sentence comprised of very similar looking words. So long as all clients follow the same set of rules, the array of bytes should convert into one and only one hashed data structure, regardless of how the implementation chooses to represent it in memory or on disk.

The production rules, written out in section 3.2, are a bit more complex and far less intuitive than the ones we used for our toy example, but the spirit is very much the same: To be unambiguous guidelines for a stateless client (or a developer writing a client) to follow and be certain they're getting it right.

I've glossed over quite a lot in this exposition, and the rabbit hole of formal languages goes far deeper, to be sure. My aim here was to just provide enough of an introduction and foundation to overcome that first hurdle of understanding. Now that you have cleared that hurdle, it's time pop open wikipedia and tackle the rest yourself!

As always, if you have feedback, questions, or requests for topics, please @gichiba or @JHancock on twitter.