Special thanks to Sacha Yves Saint-Leger & Danny Ryan for review.
In this installment, we'll discuss the consensus mechanisms behind eth2. Eth2 has a novel approach to deciding which block is the head of the chain, along with which blocks are and are not a part of the chain.
By using a hybrid between the two mechanisms, eth2 aims to have a consensus which, in addition to being rapid and safe when the network is behaving normally, remains safe even when it’s being attacked.
FLP impossibility is a core result in the field of distributed computation which states that in a distributed system it is not possible to simultaneously have safety, liveness, and full asynchrony unless some unreasonable assumptions can be made about your system.
Safety is the idea that decisions cannot be unmade whereas liveness captures the notion that new things can be decided. A protocol is asynchronus if there is no bound on how long a message may take to get delivered.
If nodes could communicate reliably, always follow the protocol honestly and never crash, then consensus would be easy, but that is not how the world works. When these assumption don't hold, FLP Impossibility is the proof that at least one of: safety, liveness, or full asynchrony must be compromised.
GHOSTs and their opinions on forks
Eth2 uses Greedy Heaviest Observed Subtree (GHOST) as its fork-choice rule. GHOST selects the head of the chain by choosing the fork which has the most votes (it does this by considering all of the votes for each fork block and their respective child blocks).
Put another way, each time there is a fork, GHOST chooses the side where more of the latest messages support that block’s subtree (i.e. more of the latest messages support either that block or one of its descendants). The algorithm does this until it reaches a block with no children.
GHOST has the benefit of reducing the efficacy of attacks during times of high network latency as well as minimizing the depth of chain reorgs when compared to the longest-chain rule. This is because while an attacker can keep building blocks efficiently on their own chain thereby making it the longest, GHOST would choose the other fork as there are more votes for it in total.
In particular, eth2 uses a variation of GHOST which has been adapted to a PoS context called Latest Message Driven GHOST (LMD-GHOST). The idea behind LMD-GHOST is that when calculating the head of the chain, one only considers the latest vote made by each validator, and not any of the votes made in the past. This dramatically decreases the computation required when running GHOST, since the number of forks that need to be considered to execute the fork choice cannot be greater than the number of validators ( in Big O notation).
Under the rules of GHOST, validators/miners can always try to add a new block to the blockchain (liveness), and they can do this at any point in the chain’s history (asynchronous). Since it is live and fully asynchronous, thanks to our friend FLP, we know it can’t be safe.
The lack of safety presents itself in the form of reorgs where a chain can suddenly switch between forks of arbitrary depth. Obviously this is undesirable and eth1 deals with this by having users make assumptions about how long miners' blocks will take to be communicated with the rest of the network, this takes the form of waiting for confirmations. Eth2, by contrast, makes no such assumptions.
The friendly finality gadget
A blockchain without any notion of safety is useless because no decisions could be reached and users could not agree on the state of the chain. Enter Casper the Friendly Finality Gadget (Casper FFG). Casper FFG is a mechanism which favours safety over liveness when making decisions. This means that while the decisions it makes are final, under poor network conditions, it may not be able to decide on anything.
FFG is a crypto-economic adaption of the classic Practical Byzantine Fault Tolerent (PBFT) which has phases where nodes first indicate that they'd like to agree on something (justification) and then agree that they've seen each other agreeing (finalisation).
Eth2 does not try to justify and finalise every slot (the time when a block is expected to be produced), but instead only every 32 slots. Collectively, 32 slots is called an epoch. First, validators sign that they agree with all 32 blocks in an epoch. Then, if do so, the block is justified. In a later epoch, validators get another chance to vote to indicate that they have seen the earlier justified epoch and if do this, the epoch is finalised and is forever a part of the eth2 chain.
FFG employs a clever trick. Votes actually consist of two sub-votes, one for the epoch that is attempting to be justified and another for an earlier epoch that is to become finalised. This saves a lot of extra communication between nodes and helps to achieve the goal of scaling to millions of validators.
Two ghosts in a trench coat
Consensus within eth2 relies on both LMD-GHOST – which adds new blocks and decides what the head of the chain is – and Casper FFG which makes the final decision on which blocks are and are not a part of the chain. GHOST’s favourable liveness properties allow new blocks to quickly and efficiently be added to the chain, while FFG follows behind to provide safety by finalising epochs.
The two protocols are merged by running GHOST from the last finalised block as decided upon by FFG. By construction, the last finalised block is always a part of the chain which means GHOST doesn't need to consider earlier blocks.
In the normal case when blocks are being produced and validators are voting on them, these blocks are added to the head of the chain by GHOST, and not long after justified and finalised by FFG (which considers the last few epochs).
If there is an attack on the network and/or a large proportion of validators go offline, then GHOST continues adding new blocks. However, since GHOST is live, but not safe, it may change its mind about the head of the chain – this is because new blocks are continually added to the chain, which means nodes keep learning new information. FFG on the other hand, favours safety over liveness meaning that it stops finalising blocks until the network is stable enough for validators to vote consistently again.