
Special thanks to Sacha Yves Saint-Leger & Joseph Schweitzer for review.

Sharding is one of the many improvements that eth2 has over eth1. The term was borrowed from database research where a shard means a piece of a larger whole. In the context of databases and eth2, sharding means breaking up the storage and computation of the whole system into shards, processing the shards separately, and combining the results as needed. Specifically, eth2 implements many shard chains, where each shard has similar capabilities to the eth1 chain. This results in massive scaling improvements.

However, there’s a less well-known type of sharding in eth2, one which is arguably more exciting from a protocol design point of view. Enter sharded consensus.

Sharding Consensus

In much the same way that the processing power of the slowest node limits the throughput of the network, the computing resources of a single validator limit the total number of validators that can participate in consensus. Since each additional validator introduces extra work for every other validator in the system, there’ll come a point where the validator with the least resources can no longer participate (because it can no longer keep track of the votes of all of the other validators). Eth2’s solution to this problem is sharding consensus.

Breaking it down

Eth2 breaks time down into two durations: slots and epochs.

A slot is the 12-second window in which a new block is expected to be added to the chain. Blocks are the mechanism by which validators’ votes are included on the chain, alongside the transactions that actually make the chain useful.

An epoch consists of 32 slots (6.4 minutes), during which the beacon chain performs all of the calculations associated with the upkeep of the chain, including justifying and finalising new blocks, and issuing rewards and penalties to validators.
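The slot and epoch arithmetic above can be sketched in a few lines. This is a minimal sketch: the constant names echo those used in the eth2 spec, but the helpers `slot_at_time` and `epoch_at_slot` and the `genesis_time` parameter are hypothetical illustrations, not the spec’s code.

```python
SECONDS_PER_SLOT = 12  # one slot every 12 seconds, as described above
SLOTS_PER_EPOCH = 32   # 32 slots make up one epoch


def slot_at_time(unix_time: int, genesis_time: int) -> int:
    """Hypothetical helper: the slot containing `unix_time`."""
    if unix_time < genesis_time:
        raise ValueError("time is before genesis")
    return (unix_time - genesis_time) // SECONDS_PER_SLOT


def epoch_at_slot(slot: int) -> int:
    """Hypothetical helper: every SLOTS_PER_EPOCH consecutive slots form one epoch."""
    return slot // SLOTS_PER_EPOCH


epoch_seconds = SECONDS_PER_SLOT * SLOTS_PER_EPOCH
print(epoch_seconds, epoch_seconds / 60)  # 384 seconds, i.e. 6.4 minutes
```

For example, 400 seconds after genesis falls in slot 33 (400 // 12), which belongs to epoch 1 (33 // 32).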

As we touched upon in the first post of this series, validators are organised into committees to do their work. At any one time, each validator is a member of exactly one beacon chain committee and one shard chain committee, and is called on to make an attestation exactly once per epoch, where an attestation is a vote for a beacon chain block that has been proposed for a slot.

The security model of eth2’s sharded consensus rests upon the idea that committees are more or less an accurate statistical representation of the overall validator set.

For example, suppose 33% of validators in the overall set are malicious. There is a chance that they could end up concentrated in the same committee, giving them control of it. This would be a disaster for our security model.

So we need a way to ensure that this can’t happen. In other words, if 33% of all validators are malicious, we need only about 33% of the validators in any given committee to be malicious.

It turns out we can achieve this by doing two things:

  1. Ensuring committee assignments are random
  2. Requiring a minimum number of validators in each committee
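The two requirements above can be illustrated with a toy assignment routine: shuffle the validator indices with a seed, then split them evenly so that no committee falls below a minimum size. This is a sketch under stated assumptions. Python’s `random.Random` is a non-cryptographic stand-in; eth2 derives its seed from on-chain randomness and uses its own shuffling algorithm. `assign_committees` and `MIN_COMMITTEE_SIZE` are hypothetical names, and 128 is taken from the example that follows.

```python
import random

MIN_COMMITTEE_SIZE = 128  # hypothetical constant; 128 matches the example in the text


def assign_committees(validator_indices, seed):
    """Randomly shuffle validators, then split them into near-equal committees.

    When there are at least MIN_COMMITTEE_SIZE validators, every committee
    gets at least MIN_COMMITTEE_SIZE members.
    """
    shuffled = list(validator_indices)
    random.Random(seed).shuffle(shuffled)  # toy stand-in for eth2's shuffle
    n = len(shuffled)
    count = max(1, n // MIN_COMMITTEE_SIZE)  # ensures n / count >= MIN_COMMITTEE_SIZE
    # Committee i takes the slice [n*i//count, n*(i+1)//count): sizes differ by
    # at most one and every validator lands in exactly one committee.
    return [shuffled[n * i // count: n * (i + 1) // count] for i in range(count)]


committees = assign_committees(range(1000), seed=42)
print(len(committees), min(len(c) for c in committees))  # 7 142
```

Splitting by integer division rather than handing out fixed-size chunks avoids leaving a small leftover committee, which would undermine the minimum-size requirement.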

For example, with 128 randomly sampled validators per committee, the chance of an attacker with 1/3 of all validators gaining control of more than 2/3 of a committee is vanishingly small (probability less than 2^-40).
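The 2^-40 bound can be checked with exact arithmetic. This sketch assumes each committee seat is an independent draw (a binomial model). That approximates sampling from a very large validator set and, if anything, slightly overstates the risk compared with drawing without replacement. The function name `capture_probability` is a hypothetical illustration.

```python
from fractions import Fraction
from math import comb


def capture_probability(committee_size: int, attacker_share: Fraction) -> Fraction:
    """Exact probability that the attacker wins strictly more than 2/3 of the seats.

    Each seat is modelled as an independent draw that goes to the attacker
    with probability `attacker_share` (binomial model).
    """
    p, q = attacker_share, 1 - attacker_share
    total = Fraction(0)
    for k in range(committee_size + 1):
        if 3 * k > 2 * committee_size:  # k attacker seats is "more than 2/3"
            total += comb(committee_size, k) * p**k * q**(committee_size - k)
    return total


prob = capture_probability(128, Fraction(1, 3))
print(float(prob), prob < Fraction(1, 2**40))  # second value: True
```

Re-running with a smaller committee, such as 16, gives a much larger probability. This is why a minimum committee size matters as much as random assignment.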

Building it up

Votes cast by validators are called attestations. An attestation consists of several elements, specifically:

  • a vote for the current beacon chain head
  • a vote on which beacon block should be justified/finalised
  • a vote on the current state of the shard chain
  • the signatures of all of the validators who agree with that vote

Combining as many components as possible into a single attestation increases the overall efficiency of the system. Instead of checking votes and signatures for beacon blocks and shard blocks separately, nodes need only do the math on attestations to learn the state of the beacon chain and of every shard chain.

If every validator produced their own attestation and every attestation needed to be verified by all other nodes, then being an eth2 node would be prohibitively expensive. Enter aggregation.

Attestations are designed to be easily combined: if two or more validators have attestations with the same votes, those attestations can be merged into one by adding their signature fields together. This is what we mean by aggregation.
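Aggregation can be sketched as a merge that is only allowed when the votes match exactly. In this minimal sketch, signatures are plain integers in a toy additive group rather than BLS signatures, and signers are tracked as a set (the eth2 spec uses a bitfield). The class and field names (`AttestationData`, `head_root`, and so on) are hypothetical, and the root values are placeholders.

```python
from dataclasses import dataclass

TOY_ORDER = 2**61 - 1  # modulus of a toy additive group standing in for BLS


@dataclass(frozen=True)
class AttestationData:
    """The votes an attestation carries (hypothetical field names)."""
    head_root: str         # vote for the current beacon chain head
    target_root: str       # vote on which block to justify/finalise
    shard_state_root: str  # vote on the current shard state


@dataclass(frozen=True)
class Attestation:
    data: AttestationData
    signers: frozenset  # indices of the validators who signed
    signature: int      # toy signature, an element of Z_TOY_ORDER


def aggregate(a: Attestation, b: Attestation) -> Attestation:
    """Merge two attestations with identical votes by adding their signatures."""
    if a.data != b.data:
        raise ValueError("only attestations with identical votes can be aggregated")
    if a.signers & b.signers:
        # A set union would drop the duplicate signer, but the signature sum
        # would count them twice, so overlapping signer sets are rejected.
        raise ValueError("signer sets overlap")
    return Attestation(a.data, a.signers | b.signers, (a.signature + b.signature) % TOY_ORDER)


vote = AttestationData("0xhead", "0xtarget", "0xshard")  # placeholder roots
alice = Attestation(vote, frozenset({1}), 111)
bob = Attestation(vote, frozenset({2}), 222)
combined = aggregate(alice, bob)
print(sorted(combined.signers), combined.signature)  # [1, 2] 333
```

Because committee members should all cast the same votes, a whole committee’s attestations can be folded into one with repeated calls to `aggregate`.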

Committees, by their construction, produce votes that are easy to aggregate: members of a committee are assigned to the same shard, and should therefore cast the same votes for both the shard state and the beacon chain. This is the mechanism by which eth2 scales the number of validators. By breaking the validators up into committees, each validator need only care about its fellow committee members, and only has to check a small number of aggregated attestations from each of the other committees.

Signature aggregation

Eth2 uses BLS signatures, a signature scheme that can be defined over several elliptic curves and is friendly to aggregation. On the specific curve chosen (BLS12-381), signatures are 96 bytes each.

If 10% of all ETH ends up staked, there will be ~350,000 validators on eth2. This means that an epoch’s worth of signatures would be 33.6 megabytes, which comes to ~7.6 gigabytes per day. In that case, all of the false claims from 2018 about the eth1 state size reaching 1TB would come true for eth2 in fewer than 133 days (based on signatures alone).
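The unaggregated storage figures follow from simple arithmetic, reproduced here as a check. The sketch assumes one 96-byte signature per validator per epoch and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes).

```python
SIGNATURE_BYTES = 96          # BLS signature size on the chosen curve
VALIDATORS = 350_000          # ~10% of all ETH staked, as in the text
SECONDS_PER_EPOCH = 12 * 32   # 32 slots of 12 seconds each

epoch_bytes = VALIDATORS * SIGNATURE_BYTES             # one signature per validator per epoch
epochs_per_day = 24 * 60 * 60 // SECONDS_PER_EPOCH     # 86,400 / 384 = 225
day_bytes = epoch_bytes * epochs_per_day
days_to_1tb = 10**12 / day_bytes

print(epoch_bytes / 1e6, day_bytes / 1e9, days_to_1tb)  # ~33.6 MB, ~7.56 GB, ~132.3 days
```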

The trick here is that BLS signatures can be aggregated: if Alice produces signature A, and Bob produces signature B on the same data, then both signatures can be stored and checked together by storing only C = A + B. With signature aggregation, only 1 signature needs to be stored and checked for the entire committee. This reduces the storage requirements to less than 2 megabytes per day.
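The reason C = A + B can be checked at all is that BLS verification relies on a bilinear pairing. The toy below replaces the elliptic-curve pairing with multiplication modulo a prime. That operation is bilinear but offers no security, since anyone can recover a secret key by division. It shows only the algebra: on the same message, the sum of signatures verifies against the sum of public keys. All names (`toy_pairing`, `hash_to_group`, `G`, `Q`) are hypothetical. Real deployments must also defend against rogue-key attacks; eth2 uses the proof-of-possession variant of the BLS standard for this.

```python
import hashlib

Q = 2**127 - 1  # prime modulus of the toy group (a Mersenne prime)
G = 5           # toy "generator"


def hash_to_group(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def toy_pairing(x: int, y: int) -> int:
    # Bilinear: toy_pairing(a * x, y) == a * toy_pairing(x, y) mod Q. Insecure.
    return x * y % Q


def public_key(secret: int) -> int:
    return secret * G % Q


def sign(secret: int, message: bytes) -> int:
    return secret * hash_to_group(message) % Q


def verify(pk: int, message: bytes, signature: int) -> bool:
    # BLS-style check: e(G, signature) == e(pk, H(message))
    return toy_pairing(G, signature) == toy_pairing(pk, hash_to_group(message))


vote = b"beacon block at slot 42"
alice_sk, bob_sk = 1234567, 7654321
A, B = sign(alice_sk, vote), sign(bob_sk, vote)
C = (A + B) % Q                                           # one aggregate signature
aggregate_pk = (public_key(alice_sk) + public_key(bob_sk)) % Q
print(verify(aggregate_pk, vote, C))  # True
```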

In summary

By separating validators out into committees, the effort required to verify eth2 is reduced by orders of magnitude.

For a node to validate the beacon chain and all of the shard chains, it only needs to look at the aggregated attestations from each of the committees. In this way, it can learn the state of every shard, and every validator’s opinion on which blocks are and aren’t part of the chain.

The committee mechanism therefore helps eth2 achieve two of the design goals established in the first article: namely that participating in the eth2 network must be possible on a consumer-grade laptop, and that it must strive to be maximally decentralised by supporting as many validators as possible.

To put numbers to it: while most Byzantine Fault Tolerant Proof of Stake protocols scale to tens (and in extreme cases, hundreds) of validators, eth2 is capable of having hundreds of thousands of validators all contributing to security without compromising on latency or throughput.