Check out the different parts:

Say you have 3 imperfect clock measurements. They do not tell the exact same time. Maybe they are drifting away, or to put things worse maybe one is totally broken. Here we look at this basic problem: given several clock measurements, how do you establish time? We’ll see three fundamental works in fault-tolerant agreement protocols.

Agreement #

Establishing a notion of agreement in discrete variables in easy. For example, we can vote to decide where a group should go have dinner (= quorum-based technique). Or we can say we trust certain message when at least a certain threshold of parties trust it (and express this via cryptographic signatures). This works nicely. We can build very robust systems with this approach, a la triple-modular redundancy. Commercial planes land autonomously hundreds of people safely thanks to this.

Establishing agreement in continuous variables is a bit harder. After all, what are outliers? Five-sigma deviation from the mean? Why not \(42 \sigma\)? And how do you take into account the measurement confidence? Or the sensor accuracy?

Never go to sea with two chronometers; take one or three

Marzullo 1983 #

Keith Marzullo proposed a synchronization algorithm based on interval intersection. Marzullo algorithm provides accurate time synchronization, but does not really deal with faulty clocks. This is the basic idea:


NTP uses Marzullo’s algorithm as inspiration.

Lamport 1987 #

Leslie Lamport wrote in 1987 the absolutely delightful Synchronizing Time Servers.

The main advantage is that Lamport’s technique can simultaneously deal with faulty clocks and provide similar time to nodes that are relatively close: Synchronizing Time Servers

Lamport noticed there was something off with Marzullo’s approach. Lamport expresses with astonishing clarity that two nodes that receive a slightly different view of the same clocks may end up with very different end result. This is represented in this picture:

Problem with Marzullo

In the context of distributed systems, this is important. You’d expect “close” nodes to have a “similar” view of the time. Technically, this is because the agreement function is not ``Lipschitz continuous’’

The solution Lamport provides is essentially averaging midpoints when the outliers are thrown away. In precise terms:

Lamport's average

Drawback: Lamport could not give an “optimal” solution for this. (It is telling the honesty of Lamport, hardly seen in other authors.)

Problem with Lamport

Schmid and Schossmaier 1999 #

Fast forward, 20 something years later, Schmid and Schossmaier solved Lamport’s concern in their paper How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition.

They define a very simple agreement function that is Lipschitz continuous, and that takes into account all available information:

Definition: How to Reconcile Fault-Tolerant Interval Intersection with the Lipschitz Condition

This is a comparison of Schmid and Schossmaier vs Lamport function showing Lipschitz continuity in action:

Example Lipschitz in action

Example what can happen to Marzullo