Check out the different parts:
- Part I: protocol for a cryptographic desk clock
- Part II: time reference
- Part II: agreement protocols (this page)
- … sometime in the future … client implementation
Say you have 3 imperfect clock measurements. They do not tell the exact same time. Maybe they are drifting away, or to put things worse maybe one is totally broken. Here we look at this basic problem: given several clock measurements, how do you establish time? We’ll see three fundamental works in fault-tolerant agreement protocols.
Establishing a notion of agreement in discrete variables in easy. For example, we can vote to decide where a group should go have dinner (= quorum-based technique). Or we can say we trust certain message when at least a certain threshold of parties trust it (and express this via cryptographic signatures). This works nicely. We can build very robust systems with this approach, a la triple-modular redundancy. Commercial planes land autonomously hundreds of people safely thanks to this.
Establishing agreement in continuous variables is a bit harder. After all, what are outliers? Five-sigma deviation from the mean? Why not \(42 \sigma\)? And how do you take into account the measurement confidence? Or the sensor accuracy?
Never go to sea with two chronometers; take one or three
Marzullo 1983 #
Keith Marzullo proposed a synchronization algorithm based on interval intersection. Marzullo algorithm provides accurate time synchronization, but does not really deal with faulty clocks. This is the basic idea:
NTP uses Marzullo’s algorithm as inspiration.
Lamport 1987 #
Leslie Lamport wrote in 1987 the absolutely delightful Synchronizing Time Servers.
The main advantage is that Lamport’s technique can simultaneously deal with faulty clocks and provide similar time to nodes that are relatively close:
Lamport noticed there was something off with Marzullo’s approach. Lamport expresses with astonishing clarity that two nodes that receive a slightly different view of the same clocks may end up with very different end result. This is represented in this picture:
In the context of distributed systems, this is important. You’d expect “close” nodes to have a “similar” view of the time. Technically, this is because the agreement function is not ``Lipschitz continuous’’
The solution Lamport provides is essentially averaging midpoints when the outliers are thrown away. In precise terms:
Drawback: Lamport could not give an “optimal” solution for this. (It is telling the honesty of Lamport, hardly seen in other authors.)
Schmid and Schossmaier 1999 #
Fast forward, 20 something years later, Schmid and Schossmaier solved Lamport’s concern in their paper How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition.
They define a very simple agreement function that is Lipschitz continuous, and that takes into account all available information:
This is a comparison of Schmid and Schossmaier vs Lamport function showing Lipschitz continuity in action: