From Smart Contracts to Courts with not so Smart Judges

Ethereum is often described as a platform for self-enforcing smart contracts. While this is certainly true, this article argues that, especially when more complex systems are involved, it is rather a court with smart lawyers and a judge that is not so smart, or more formally, a judge with restricted computational resources. We will see later how this view can be leveraged to write very efficient smart contract systems, to the extent that cross-chain token transfers or computations like checking proof of work can be implemented at almost no cost.

The Court Analogy

First of all, you probably know that a smart contract on Ethereum cannot in itself retrieve information from the outside world. It can only ask outside actors to deliver information on its behalf. And even then, it either has to trust the outside actors or verify the integrity of the information itself. In court, the judge usually asks experts about their opinion (who they usually trust) or witnesses for a testimony that is often verified by cross-checking.

I guess it is obvious that the computational resources of the judge in Ethereum are restricted due to the gas limit, which is rather low when compared to the computational powers of the lawyers coming from the outside world. Yet, a judge restricted in such a way can still decide on very complicated legal cases: Her powers come from the fact that she can play off the defender against the prosecutor.

Complexity Theory

This exact analogy was formalised in an article by Feige, Shamir and Tennenholtz, The Noisy Oracle Problem. A very simplified version of their main result is the following: Assume we have a contract (judge) who can use N steps to perform a computation (potentially spread over multiple transactions). There are several outside actors (lawyers) who can help the judge and at least one of them is honest (i.e. at least one actor follows a given protocol, the others may be malicious and send arbitrary messages), but the judge does not know who the honest actor is. Such a contract can perform any computation that can be carried out using N memory cells and an arbitrary number of steps without outside help. (The formal version states that a polynomial-time verifier can accept all of PSPACE in this model)

This might sound a bit clunky, but their proof is actually quite instructive and uses the analogy of PSPACE being the class of problems that can be solved by "games". As an example, let me show you how an Ethereum contract can play chess with almost no gas costs (experts may forgive me to use chess which is NEXPTIME complete, but we will use the classic 8x8 variant here, so it actually is in PSPACE...): Playing chess in this context means that some outside actor proposes a chess position and the contract has to determine whether the position is a winning position for white, i.e. white always wins, assuming white and black are infinitely clever. This assumes that the honest off-chain actor has enough computing power to play chess perfectly, but well... So the task is not to play chess against the outside actors, but to determine whether the given position is a winning position for white and asking the outside actors (all except one of which might be misleading by giving wrong answers) for help. I hope you agree that doing this without outside help is extremely complicated. For simplicity, we only look at the case where we have two outside actors A and B. Here is what the contract would do:

Ask A and B whether this is a winning position for white. If both agree, this is the answer (at least one is honest).
If they disagree, ask the one who answered "yes" (we will call that actor W from now on, and the other one B) for a winning move for white.
If the move is invalid (for example because no move is possible), black wins
Otherwise, apply the move to the board and ask B for a winning move for black (because B claimed that black can win)
If the move is invalid (for example because no move is possible), white wins
Otherwise, apply the move to the board, ask A for a winning move for white and continue with 3.

The contract does not really need to have a clue about chess strategies. It just has to be able to verify whether a single move was valid or not. So the costs for the contract are roughly

N*(V+U)

, where N is the number of moves (ply, actually), V is the cost for verifying a move and U is the cost for updating the board.

This result can actually be improved to something like N*U + V, because we do not have to verify every single move. We can just update the board (assuming moves are given by coordinates) and while we ask for the next move, we also ask whether the previous move was invalid. If that is answered as "yes", we check the move. Depending on whether the move was valid or not, one of the players cheated and we know who wins.

Homework: Improve the contract so that we only have to store the sequence of moves and update the board only for a tiny fraction of the moves and perform a move verification only for a single move, i.e. bring the costs to something like N*M + tiny(N)*U + V, where M is the cost for storing a move and tiny is an appropriate function which returns a "tiny fraction" of N.

On a side note, Babai, Fortnow and Lund showed that a model where the lawyers are cooperating but cannot communicate with each other and the judge is allowed to roll dice (both changes are important) captures an allegedly much larger class called NEXPTIME, nondeterministic exponential time.

Adding Cryptoeconomics to the Game

One thing to remember from the previous section is that, assuming transactions do not get censored, the contract will always find out who the honest and who the dis-honest actor was. This leads to the interesting observation that we now have a rather cheap interactive protocol to solve hard problems, but we can add a cryptoeconomic mechanism that ensures that this protocol almost never has to be carried out: The mechanism allows anyone to submit the result of a computation together with a security deposit. Anyone can challenge the result, but also has to provide a deposit. If there is at least one challenger, the interactive protocol (or its multi-prover variant) is carried out. Assuming there is at least one honest actor among the set of proposers and challengers, the dishonest actors will be revealed and the honest actor will receive the deposits (minus a percentage, which will disincentivise a dishonest proposer from challenging themselves) as a reward. So the end result is that as long as at least one honest person is watching who does not get censored, there is no way for a malicious actor to succeed, and even trying will be costly for the malicious actor.

Applications that want to use the computation result can take the deposits as an indicator for the trustworthiness of the computation: If there is a large deposit from the solution proposer and no challenge for a certain amount of time, the result is probably correct. As soon as there are challenges, applications should wait for the protocol to be resolved. We could even create a computation result insurance that promises to check computations off-chain and refunds users in case an invalid result was not challenged early enough.

The Power of Binary Search

In the next two sections, I will give two specific examples. One is about interactively verifying the presence of data in a foreign blockchain, the second is about verifying general (deterministic) computation. In both of them, we will often have the situation where the proposer has a very long list of values (which is not directly available to the contract because of its length) that starts with the correct value but ends with an incorrect value (because the proposer wants to cheat). The contract can easily compute the (i+1)st value from the ith, but checking the full list would be too expensive. The challenger knows the correct list and can ask the proposer to provide several values from this list. Since the first value is correct and the last is incorrect, there must be at least one point i in this list where the ith value is correct and the (i+1)st value is incorrect, and it is the challenger's task to find this position (let us call this point the "transition point"), because then the contract can check it.

Let us assume the list has a length of 1.000.000, so we have a search range from 1 to 1.000.000. The challenger asks for the value at position 500.000. If it is correct, there is at least one transition point between 500.000 and 1.000.000. If it is incorrect, there is a transition point between 1 and 500.000. In both cases, the length of the search range was reduced by one half. We now repeat this process until we reach a search range of size 2, which must be the transition point. The logarithm to the basis two can be used to compute the number of steps such an "iterated bisection" takes. In the case of 1.000.000, these are log 1.000.000 ≈ 20 steps.

Cheap Cross-Chain Transfers

As a first real-world example, I would like to show how to design an extremely cheap cross-chain state or payment verification. Due to the fact that blockchains are not deterministic but can fork, this is a bit more complicated, but the general idea is the same.

The proposer submits the data she wants to be available in the target contract (e.g. a bitcoin or dogecoin transaction, a state value in another Ethereum chain, or anything in a Merkle-DAG whose root hash is included in the block header of a blockchain and is publicly known (this is very important)) together with the block number, the hash of that block header and a deposit.

Note that we only submit a single block number and hash. In the first version of BTCRelay, currently all bitcoin block headers need to be submitted and the proof of work is verified for all of them. This protocol will only need that information in case of an attack.

If everything is fine, i.e. external verifiers check that the hash of the block number matches the canonical chain (and optionally has some confirmations) and see the transaction / data included in that block, the proposer can request a return of the deposit and the cross-chain transfer is finished. That's all there is in the non-attack case. This should cost about 200000 gas per transfer.

If something is wrong, i.e. we either have a malicious proposer / submitter or a malicious challenger, the challenger now has two possibilities:

declare the block hash invalid (because it does not exist or is part of an abandoned fork) or
declare the Merkle-hashed data invalid (but the block hash and number valid)

Note that a blockchain is a Merkle-DAG consisting of two "arms": One that forms the chain of block headers and one that forms the Merkle-DAG of state or transactions. Once we accept the root (the current block header hash) to be valid, verifications in both arms are simple Merkle-DAG-proofs.

(2) So let us consider the second case first, because it is simpler: As we want to be as efficient as possible, we do not request a full Merkle-DAG proof from the proposer. Instead we just request a path through the DAG from the root to the data (i.e. a sequence of child indices).

If the path is too long or has invalid indices, the challenger asks the proposer for the parent and child values at the point that goes out of range and the proposer cannot supply valid data that hashes to the parent. Otherwise, we have the situation that the root hash is correct but the hash at some point is different. Using binary search we find a point in the path where we have a correct hash directly above an incorrect one. The proposer will be unable to provide child values that hash to the correct hash and thus the fraud is detectable by the contract.

(1) Let us now consider the situation where the proposer used an invalid block or a block that was part of an abandoned fork. Let us assume that we have a mechanism to correlate the block numbers of the other blockchain to the time on the Ethereum blockchain, so the contract has a way to tell a block number invalid because it must lie in the future. The proposer now has to provide all block headers (only 80 bytes for bitcoin, if they are too large, start with hashes only) up to a certain checkpoint the contract already knows (or the challenger requests them in chunks). The challenger has to do the same and will hopefully supply a block with a higher block number / total difficulty. Both can now cross-check their blocks. If someone finds an error, they can submit the block number to the contract which can check it or let it be verified by another interactive stage.

Specific Interactive Proofs for General Computations

Assume we have a computing model that respects locality, i.e. it can only make local modifications to the memory in a single step. Turing machines respect locality, but random-access-machines (usual computers) are also fine if they only modify a constant number of points in memory in each step. Furthermore, assume that we have a secure hash function with H bits of output. If a computation on such a machine needs t steps and uses at most s bytes of memory / state, then we can perform interactive verification (in the proposer/challenger model) of this computation in Ethereum in about log(t) + 2 * log(log(s)) + 2 rounds, where messages in each round are not longer than max(log(t), H + k + log(s)), where k is the size of the "program counter", registers, tape head position or similar internal state. Apart from storing messages in storage, the contract needs to perform at most one step of the machine or one evaluation of the hash function.

Proof:

The idea is to compute (at least on request) a Merkle-tree of all the memory that is used by the computation at each single step. The effects of a single step on memory is easy to verify by the contract and since only a constant number of points in memory will be accessed, the consistency of memory can be verified using Merkle-proofs.

Without loss of generality, we assume that only a single point in memory is accessed at each step. The protocol starts by the proposer submitting input and output. The challenger can now request, for various time steps i, the Merkle-tree root of the memory, the internal state / program counter and the positions where memory is accessed. The challenger uses that to perform a binary search that leads to a step i where the returned information is correct but it is incorrect in step i + 1. This needs at most log(t) rounds and messages of size log(t) resp. H + k + log(s).

The challenger now requests the value in memory that is accessed (before and after the step) together with all siblings along the path to the root (i.e. a Merkle proof). Note that the siblings are identical before and after the step, only the data itself changed. Using this information, the contract can check whether the step is executed correctly and the root hash is updated correctly. If the contract verified the Merkle proof as valid, the input memory data must be correct (because the hash function is secure and both proposer and challenger have the same pre-root hash). If also the step execution was verified correct, their output memory data is equal. As the Merkle tree siblings are the same, the only way to find a different post-root hash is for the computation or the Merkle proof to have an error.

Note that the step described in the previous paragraph took one round and a message size of (H+1) log(s). So we have log(t) + 1 rounds and message sizes of max(log(t), k + (H+2) log(s)) in total. Furthermore, the contract needed to compute the hash function 2*log(s) times. If s is large or the hash function is complicated, we can decrease the size of the messages a little and reach only a single application of the hash function at the cost of more interactions. The idea is to perform a binary search on the Merkle proof as follows:

We do not ask the proposer to send the full Merkle proof, but only the pre- and post values in memory. The contract can check the execution of the stop, so let us assume that the transition is correct (including the internal post state and the memory access index in step i + 1). The cases that are left are:

the proposer provided the wrong pre-data
pre- and post-data are correct but the Merkle root of the post memory is wrong

In the first case, the challenger performs an interactive binary search on the path from the Merkle tree leaf containing the memory data to the root and finds a position with correct parent but wrong child. This takes at most log(log(s)) rounds and messages of size log(log(s)) resp. H bits. Finally, since the hash function is secure, the proposer cannot supply a sibling for the wrong child that hashes to the parent. This can be checked by the contract with a single evaluation of the hash function.

In the second case, we are in an inverted situation: The root is wrong but the leaf is correct. The challenger again performs an interactive binary search in at most log(log(s(n))) rounds with message sizes of log(log(s)) resp. H bits and finds a position in the tree where the parent P is wrong but the child C is correct. The challenger asks the proposer for the sibling S such that (C, S) hash to P, which the contract can check. Since we know that only the given position in memory could have changed with the execution of the step, S must also be present at the same position in the Merkle-tree of the memory before the step. Furthermore, the value the proposer provided for S cannot be correct, since then, (C, S) would not hash to P (we know that P is wrong but C and S are correct). So we reduced this to the situation where the proposer supplied an incorrect node in the pre-Merkle-tree but a correct root hash. As seen in the first case, this takes at most log(log(s)) rounds and messages of size log(log(s)) resp. H bits to verify.

Overall, we had at most log(t) + 1 + 2 * log(log(s)) + 1 rounds with message sizes at most max(log(t), H + k + log(s)).

Homework: Convert this proof to a working contract that can be used for EVM or TinyRAM (and thus C) programs and integrate it into Piper Merriam's Ethereum computation market.

Thanks to Vitalik for suggesting to Merkle-hash the memory to allow arbitrary intra-step memory sizes! This is by the way most likely not a new result.

In Practice

These logarithms are nice, but what does that mean in practice? Let us assume we have a computation that takes 5 seconds on a 4 GHz computer using 5 GB of RAM. Simplifying the relation between real-world clock rate and steps on an artificial architecture, we roughly have t = 20000000000 ≈ 2⁴³ and s = 5000000000 ≈ 2³². Interactively verifying such a computation should take 43 + 2 + 2 * 5 = 55 rounds, i.e. 2 * 55 = 110 blocks and use messages of around 128 bytes (mostly depending on k, i.e. the architecture). If we do not verify the Merkle proof interactively, we get 44 rounds (88 blocks) and messages of size 1200 bytes (only the last message is that large).

If you say that 110 blocks (roughly 30 minutes on Ethereum, 3 confirmations on bitcoin) sounds like a lot, don't forget what we are talking about here: 5 seconds on a 4 GHz machine actually using full 5 GB of RAM. If you usually run programs that take so much power, they search for specific input values that satisfy a certain condition (optimizing routines, password cracker, proof of work solver, ...). Since we only want to verify a computation, searching for the values does not need to be performed in that way, we can supply the solution right from the beginning and only check the condition.

Ok, right, it should be quite expensive to compute and update the Merkle tree for each computation step, but this example should only show how well this protocol scales on chain. Furthermore, most computations, especially in functional languages, can be subdivided into levels where we call an expensive function that use a lot of memory but outputs a small number. We could treat this function as a single step in the main protocol and start a new interactive protocol if an error is detected in that function. Finally, as already said: In most cases, we simply verify the output and never challenge it (only then do we need to compute the Merkle tree), as the proposer will almost certainly lose their deposit.

Open Problems

In several places in this article, we assumed that we only have two external actors and at least one of them is honest. We can get close to this assumption by requiring a deposit from both the proposer and the challenger. One problem is that one of them might just refuse to continue with the protocol, so we need to have timeouts. If we add timeouts, on the other hand, a malicious actor could saturate the blockchain with unrelated transactions in the hope that the answer does not make it into a block in time. Is there a possibility for the contract to detect this situation and prolong the timeout? Furthermore, the honest proposer could be blocked out from the network. Because of that (and because it is better to have more honest than malicious actors), we might allow the possibility for anyone to step in (on both sides) after having made a deposit. Again, if we allow this, malicious actors could step in for the "honest" side and just pretend to be honest. This all sounds a bit complicated, but I am pretty confident it will work out in the end.