The Burden of Proof(s): Code Merkleization

A note about the Stateless Ethereum initiative: Research activity has (understandably) slowed in the second half of 2020 as all contributors have adjusted to life on the weird timeline. But as the ecosystem moves incrementally closer to Serenity and the Eth1/Eth2 merge, Stateless Ethereum work will become increasingly relevant and impactful. Expect a more substantial year-end Stateless Ethereum retrospective in the coming weeks.

Let's roll through the re-cap one more time: The ultimate goal of Stateless Ethereum is to remove the requirement of an Ethereum node to keep a full copy of the updated state trie at all times, and to instead allow for changes of state to rely on a (much smaller) piece of data that proves a particular transaction is making a valid change. Doing this solves a major problem for Ethereum; a problem that has so far only been pushed further out by improved client software: State growth.

The Merkle proof needed for Stateless Ethereum is called a 'witness', and it attests to a state change by providing all of the unchanged intermediate hashes required to arrive at a new valid state root. Witnesses are theoretically a lot smaller than the full Ethereum state (which takes 6 hours at best to sync), but they are still a lot larger than a block (which needs to propagate to the whole network in just a few seconds). Leaning out the size of witnesses is therefore paramount to getting Stateless Ethereum to minimum-viable-utility.

Just like the Ethereum state itself, a lot of the extra (digital) weight in witnesses comes from smart contract code. If a transaction makes a call to a particular contract, the witness will by default need to include the contract bytecode in its entirety with the witness. Code Merkelization is a general technique to reduce burden of smart contract code in witnesses, so that contract calls only need to include the bits of code that they 'touch' in order to prove their validity. With this technique alone we might see a substantial reduction in witness, but there are a lot of details to consider when breaking up smart contract code into byte-sized chunks.

What is Bytecode?

There are some trade-offs to consider when splitting up contract bytecode. The question we will eventually need to ask is "how big will the code chunks be?" – but for now, let's look at some real bytecode in a very simple smart contract, just to understand what it is:

pragma solidity >=0.4.22 <0.7.0;

contract Storage {

    uint256 number;

    function store(uint256 num) public {
        number = num;
    }

    function retrieve() public view returns (uint256){
        return number;
    }
}

When this simple storage contract is compiled, it turns into the machine code meant to run 'inside' the EVM. Here, you can see the same simple storage contract shown above, but complied into individual EVM instructions (opcodes):

PUSH1 0x80 PUSH1 0x40 MSTORE CALLVALUE DUP1 ISZERO PUSH1 0xF JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST POP PUSH1 0x4 CALLDATASIZE LT PUSH1 0x32 JUMPI PUSH1 0x0 CALLDATALOAD PUSH1 0xE0 SHR DUP1 PUSH4 0x2E64CEC1 EQ PUSH1 0x37 JUMPI DUP1 PUSH4 0x6057361D EQ PUSH1 0x53 JUMPI JUMPDEST PUSH1 0x0 DUP1 REVERT JUMPDEST PUSH1 0x3D PUSH1 0x7E JUMP JUMPDEST PUSH1 0x40 MLOAD DUP1 DUP3 DUP2 MSTORE PUSH1 0x20 ADD SWAP2 POP POP PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN JUMPDEST PUSH1 0x7C PUSH1 0x4 DUP1 CALLDATASIZE SUB PUSH1 0x20 DUP2 LT ISZERO PUSH1 0x67 JUMPI PUSH1 0x0 DUP1 REVERT JUMPDEST DUP2 ADD SWAP1 DUP1 DUP1 CALLDATALOAD SWAP1 PUSH1 0x20 ADD SWAP1 SWAP3 SWAP2 SWAP1 POP POP POP PUSH1 0x87 JUMP JUMPDEST STOP JUMPDEST PUSH1 0x0 DUP1 SLOAD SWAP1 POP SWAP1 JUMP JUMPDEST DUP1 PUSH1 0x0 DUP2 SWAP1 SSTORE POP POP JUMP INVALID LOG2 PUSH5 0x6970667358 0x22 SLT KECCAK256 DUP13 PUSH7 0x1368BFFE1FF61A 0x29 0x4C CALLER 0x1F 0x5C DUP8 PUSH18 0xA3F10C9539C716CF2DF6E04FC192E3906473 PUSH16 0x6C634300060600330000000000000000

As explained in a previous post, these opcode instructions are the basic operations of the EVM's stack architecture. They define the simple storage contract, and all of the functions it contains. You can find this contract as one of the example solidity contracts in the Remix IDE (Note that the machine code above is an example of the storage.sol after it's already been deployed, and not the output of the Solidity compiler, which will have some extra 'bootstrapping' opcodes). If you un-focus your eyes and imagine a physical stack machine chugging along with step-by-step computation on opcode cards, in the blur of the moving stack you can almost see the outlines of functions laid out in the Solidity contract.

Whenever the contract receives a message call, this code runs inside every Ethereum node validating new blocks on the network. In order to submit a valid transaction on Ethereum today, one needs a full copy of the contract's bytecode, because running that code from beginning to end is the only way to obtain the (deterministic) output state and associated hash.

Stateless Ethereum, remember, aims to change this requirement. Let's say that all you want to do is call the function retrieve() and nothing more. The logic describing that function is only a subset of the whole contract, and in this case the EVM only really needs two of the basic blocks of opcode instructions in order to return the desired value:

PUSH1 0x0 DUP1 SLOAD SWAP1 POP SWAP1 JUMP,

JUMPDEST PUSH1 0x40 MLOAD DUP1 DUP3 DUP2 MSTORE PUSH1 0x20 ADD SWAP2 POP POP PUSH1 0x40 MLOAD DUP1 SWAP2 SUB SWAP1 RETURN

In the Stateless paradigm, just as a witness provides the missing hashes of un-touched state, a witness should also provide the missing hashes for un-executed pieces of machine code, so that a stateless client only requires the portion of the contract it's executing.

The Code's Witness

Smart contracts in Ethereum live in the same place that externally-owned accounts do: as leaf nodes in the enormous single-rooted state trie. Contracts are in many ways no different than the externally-owned accounts humans use. They have an address, can submit transactions, and hold a balance of Ether and any other token. But contract accounts are special because they must contain their own program logic (code), or a hash thereof. Another associated Merkle-Patricia Trie, called the storageTrie keeps any variables or persistent state that an active contract uses to go about its business during execution.

witness

This witness visualization provides a good sense of how important code merklization could be in reducing the size of witnesses. See that giant chunk of colored squares and how much bigger it is than all the other elements in the trie? That's a single full serving of smart contract bytecode.

Next to it and slightly below are the pieces of persistent state in the storageTrie, such as ERC20 balance mappings or ERC721 digital item ownership manifests. Since this is example is of a witness and not a full state snapshot, those too are made mostly of intermediate hashes, and only include the changes a stateless client would require to prove the next block.

Code merkleization aims to split up that giant chunk of code, and to replace the field codeHash in an Ethereum account with the root of another Merkle Trie, aptly named the codeTrie.

Worth its Weight in Hashes

Let's look at an example from this Ethereum Engineering Group video, which analyzes some methods of code chunking using an ERC20 token contract. Since many of the tokens you've heard of are made to the ERC-20 standard, this is a good real-world context to understand code merkleization.

Because bytecode is long and unruly, let's use a simple shorthand of replacing four bytes of code (8 hexidecimal characters) with either an . or X character, with the latter representing bytecode required for the execution of a specific function (in the example, the ERC20.transfer() function is used throughout).

In the ERC20 example, calling the transfer() function uses a little less than half of the whole smart contract:

XXX.XXXXXXXXXXXXXXXXXX..........................................
.....................XXXXXX.....................................
............XXXXXXXXXXXX........................................
........................XXX.................................XX..
......................................................XXXXXXXXXX
XXXXXXXXXXXXXXXXXX...............XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX..................................
.......................................................XXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXX..................................X
XXXXXXXX........................................................
....

If we wanted to split up that code into chunks of 64 bytes, only 19 out of the 41 chunks would be required to execute a stateless transfer() transaction, with the rest of the required data coming from a witness.

|XXX.XXXXXXXXXXXX|XXXXXX..........|................|................
|................|.....XXXXXX.....|................|................
|............XXXX|XXXXXXXX........|................|................
|................|........XXX.....|................|............XX..
|................|................|................|......XXXXXXXXXX
|XXXXXXXXXXXXXXXX|XX..............|.XXXXXXXXXXXXXXX|XXXXXXXXXXXXXXXX
|XXXXXXXXXXXXXXXX|XXXXXXXXXXXXXX..|................|................
|................|................|................|.......XXXXXXXXX
|XXXXXXXXXXXXXXXX|XXXXXXXXXXXXX...|................|...............X
|XXXXXXXX........|................|................|................
|....

Compare that to 31 out of 81 chunks in a 32 byte chunking scheme:

|XXX.XXXX|XXXXXXXX|XXXXXX..|........|........|........|........|........
|........|........|.....XXX|XXX.....|........|........|........|........
|........|....XXXX|XXXXXXXX|........|........|........|........|........
|........|........|........|XXX.....|........|........|........|....XX..
|........|........|........|........|........|........|......XX|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XX......|........|.XXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXX..|........|........|........|........
|........|........|........|........|........|........|.......X|XXXXXXXX
|XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXX...|........|........|........|.......X
|XXXXXXXX|........|........|........|........|........|........|........
|....

On the surface it seems like smaller chunks are more efficient than larger ones, because the mostly-empty chunks are less frequent. But here we need to remember that the unused code has a cost as well: each un-executed code chunk is replaced by a hash of fixed size. Smaller code chunks mean a greater number of hashes for the unused code, and those hashes could be as large as 32 bytes each (or as small as 8 bytes). You might at this point exclaim "Hol' up! If the hash of code chunks is a standard size of 32 bytes, how would it help to replace 32 bytes of code with 32 bytes of hash!?".

Recall that the contract code is merkleized, meaning that all hashes are linked together in the codeTrie -- the root hash of which we need to validate a block. In that structure, any sequential un-executed chunks only require one hash, no matter how many there are. That is to say, one hash can stand in for a potentially large limb full of sequential chunk hashes on the merkleized code trie, so long as none of them are required for coded execution.

We Must Collect Additional Data

The conclusion we've been building to is a bit of an anticlimax: There is no theoretically 'optimal' scheme for code merkleization. Design choices like fixing the size of code chunks and hashes depend on data collected about the 'real world'. Every smart contract will merkleize differently, so the burden is on researchers to choose the format that provides the largest efficiency gains to observed mainnet activity. What does that mean, exactly?

overhead

One thing that could indicate how efficient a code merkleization scheme is Merkleization overhead, which answers the question "how much extra information beyond executed code is getting included in this witness?"

Already we have some promising results, collected using a purpose-built tool developed by Horacio Mijail from Consensys' TeamX research team, which shows overheads as small as 25% -- not bad at all!

In short, the data shows that by-and-large smaller chunk sizes are more efficient than larger ones, especially if smaller hashes (8-bytes) are used. But these early numbers are by no means comprehensive, as they only represent about 100 recent blocks. If you're reading this and interested in contributing to the Stateless Ethereum initiative by collecting more substantial code merkleization data, come introduce yourself on the ethresear.ch forums, or the #code-merkleization channel on the Eth1x/2 research discord!

And as always, if you have questions, feedback, or requests related to "The 1.X Files" and Stateless Ethereum, DM or @gichiba on twitter.

The Burden of Proof(s): Code Merkleization

Posted by Griffin Ichiba Hotchkiss on November 30, 2020

The Code's Witness

Worth its Weight in Hashes

We Must Collect Additional Data