Paxos Derived

Lamport's fault-intolerant state machine replication algorithm

In 1978, Lamport published his classic "Time, Clocks, and the Ordering of Events in a Distributed System". As an application of logical clocks, he presented a distributed replicated state machine algorithm (and then he instantiated that algorithm to solve mutual exclusion as an example). Lamport complains that no one seemed to be aware of the distributed replicated state machine algorithm introduced in the paper:

"This is my most often cited paper. Many computer scientists claim to have read it. But I have rarely encountered anyone who was aware that the paper said anything about state machines. People seem to think that it is about either the causality relation on events in a distributed system, or the distributed mutual exclusion problem. People have insisted that there is nothing about state machines in the paper. I've even had to go back and reread it to convince myself that I really did remember what I had written."

I had talked about this distributed replicated state machine algorithm earlier. This algorithm is decentralized to a fault. It is not even tolerant to a single node failure. It assumes failure-free nodes.

The idea of the algorithm is as follows: In order to ensure that processes do not have different views of the order of updates, logical clocks are used to impose a total ordering on the updates. Each process keeps as part of its state the following: a copy of the state, a logical clock, a queue of "modify requests" (with their logical timestamps), and a list of "known-times", one for every other process. Each process executes an update request on its copy of the state in increasing order of timestamps. For safety, all "known-times" from other processes should be later than the time of the request.

The algorithm works as follows:

Push your request in your own queue (timestamped with your logical clock).
Broadcast your request to every node, timestamp included.
Wait for replies from all other nodes.
If your request is now at the head of your queue and the known-times for the other processes are ahead of its request timestamp (known-times are updated as processes send replies to the update request), enter the critical section (where the update to the state is done).
Upon exiting the critical section, remove your request from the queue and send a release message to every process.
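
The steps above can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions, not Lamport's own formulation: the class name Process, the run helper, the message tuple layout, and the shared FIFO "network" are all hypothetical, and the network is assumed to be reliable, ordered, and failure-free (f=0).

```python
import heapq
from collections import deque


class Process:
    """One replica in a failure-free system (hypothetical sketch)."""

    def __init__(self, pid, pids, network):
        self.pid = pid
        self.peers = [p for p in pids if p != pid]
        self.network = network                  # shared FIFO of (dst, message)
        self.clock = 0                          # logical clock
        self.queue = []                         # heap of (timestamp, pid, update)
        self.known_times = {p: 0 for p in self.peers}
        self.state = []                         # copy of the state: applied updates

    def request(self, update):
        # Steps 1 and 2: enqueue locally, then broadcast with the timestamp.
        self.clock += 1
        entry = (self.clock, self.pid, update)
        heapq.heappush(self.queue, entry)
        for p in self.peers:
            self.network.append((p, ("request", self.clock, self.pid, entry)))

    def receive(self, msg):
        kind, ts, sender, entry = msg
        self.clock = max(self.clock, ts) + 1
        self.known_times[sender] = max(self.known_times[sender], ts)
        if kind == "request":
            heapq.heappush(self.queue, entry)
            self.clock += 1
            self.network.append((sender, ("reply", self.clock, self.pid, None)))
        elif kind == "release":
            # The releasing process has applied this update; apply it here too.
            self.queue.remove(entry)
            heapq.heapify(self.queue)
            self.state.append(entry[2])
        self.try_execute()

    def try_execute(self):
        # Steps 3 to 5: run own request once it heads the queue and every
        # other process is known to be past its timestamp.
        while self.queue and self.queue[0][1] == self.pid:
            head = self.queue[0]
            if any(t <= head[0] for t in self.known_times.values()):
                return
            heapq.heappop(self.queue)
            self.state.append(head[2])          # the "critical section"
            self.clock += 1
            for p in self.peers:
                self.network.append((p, ("release", self.clock, self.pid, head)))


def run(network, procs):
    # Deliver messages in FIFO order until the system is quiet.
    while network:
        dst, msg = network.popleft()
        procs[dst].receive(msg)


network = deque()
pids = [0, 1, 2]
procs = {p: Process(p, pids, network) for p in pids}
procs[0].request("x=1")
procs[2].request("x=2")     # concurrent with the request from process 0
run(network, procs)
states = [procs[p].state for p in pids]
print(states)
```

In this sketch the two concurrent requests carry the same timestamp, so the tie is broken by process id, and every replica ends up applying the updates in the same order. Note that a single silent process would stall everyone forever, which is exactly the f=0 assumption.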

A fault-intolerant version of Paxos

I recently realized that the algorithm above (from the 1978 paper) constitutes a fault-intolerant instance of Paxos!

This occurred to me after thinking about it in the context of the flexible quorums result. The flexible quorums idea (2016) states that we can weaken Paxos' "all quorums should intersect" assertion to instead "only quorums from different phases should intersect". That is, majority quorums are not necessary for Paxos, provided that phase-1 quorums (Q1) intersect with phase-2 quorums (Q2).

This result allows trading off Q1 and Q2 sizes to improve performance (to the detriment of fault-tolerance). Assuming failures and the resulting leader changes are rare, phase-2 (where the leader tells the acceptors to decide values) is run more often than phase-1 (where a new leader is elected). Thus it is possible to improve the performance of Paxos by reducing the size of Q2 at the expense of making the infrequently used Q1 larger. For example, in a system of 10 acceptors, we can safely allow any set of only 3 acceptors to participate in phase-2, provided that we require 8 acceptors to participate in phase-1. Note that the majority quorums (Q1=Q2=6) would be able to mask up to 4 node failures (f=4), since 6 of the 10 nodes must remain available, whereas the Q1=8 configuration can only withstand up to 2 node failures (f=2), as it needs 8 nodes to be able to perform phase-1 if needed.
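
The trade-off can be checked with a little arithmetic. The sketch below assumes the simple counting form of flexible quorums, where any Q1 nodes form a phase-1 quorum and any Q2 nodes form a phase-2 quorum; the function names are hypothetical and chosen only for this illustration.

```python
def quorums_intersect(n, q1, q2):
    """True if every phase-1 quorum of size q1 overlaps every phase-2
    quorum of size q2 among n acceptors (counting-based quorums only)."""
    return q1 + q2 > n


def tolerated_failures(n, q1, q2):
    """Crashes that can be masked while both phases can still gather
    a quorum; the larger quorum is the bottleneck."""
    return n - max(q1, q2)


n = 10
print(quorums_intersect(n, 8, 3), tolerated_failures(n, 8, 3))  # True 2
print(quorums_intersect(n, 7, 3))                               # False
```

With Q1=8 and Q2=3 the quorums always share at least one acceptor, but shrinking Q1 to 7 allows a phase-1 quorum and a phase-2 quorum that are disjoint, which is unsafe.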

So, if you take Q1=N and Q2=1, the Paxos algorithm simplifies to Lamport's distributed state machine replication algorithm above. Note that Q1=N implies the algorithm cannot tolerate any node failures, i.e., f=0. On the other hand, with this setup, you can combine phase 2 and phase 3 because you are writing to only one node, yourself. So phase 3 is non-existent in that algorithm.

The road from f=0 to Paxos

Ok, let's approach our claim from the other side as well. How do we take that f=0 protocol and strengthen it so that it doesn't block (lose progress) with one node failure?

This is how phase 3 comes into play as we add fault-tolerance. In order to tolerate one node crash (in a fault-masking manner), you need Q2 to be 2. Then things suddenly get complicated, because you are not just writing to yourself; you also need to write to another node in a guaranteed manner to persist the state. But another leader may steal your turn before you can write your phase-2 decision to your other Q2 node, so it is not safe to commit the update request! Therefore, phase-2 clearing, which is phase 3, is needed to make this check, and it helps you replicate your state so it is preserved in the face of one node failure.

This is a point of objection, though. In Lamport's f=0 algorithm, logical clocks (LC) are used for reservation; every node respects LC, and puts requests into its queue ordered by LC. If one node needs to get its update done, it eventually will, because the system is making progress. On the other hand, in Paxos, using the ballot numbers (for whose implementation LC could be used), a leader steals the previous leader's turn instead of patiently waiting for the previous round to complete. So what gives?

Well... In Lamport's f=0 algorithm, you could afford to be nice and patiently wait for each node to finish its turn, because f=0, and you are guaranteed to get what you wait for. But when f>0 and a node can fail, you can't afford to wait for it to finish its turn (otherwise you would have to wait for an eternity in an asynchronous system model), and that is why Paxos is happy to change leaderships, and dueling leaders can arise (even to the point of violating progress).

In sum, something "fundamental" changes when you want to go fault-tolerant and tolerate node failure in an asynchronous system. When you combine faults and full asynchrony, you get the FLP impossibility result. That means you lose progress! That is why Paxos does not guarantee making progress under a fully asynchronous model with a crash failure. However, it preserves safety thanks to its balloting and anchoring system, and will provide progress as soon as partial synchrony kicks in and weak-complete & eventually-weak-accurate failure detectors are implementable (i.e., when we are out of the realm of the FLP result). So, yes, there is a phase transition going from no faults to faults in an asynchronous system.

I thank my PhD students, Ailidani Ailijiang and Aleksey Charapko, for discussion on this idea.

MAD questions

Was this actually how Leslie Lamport came up with the Paxos protocol? Does the 1978 fault-intolerant distributed state machine replication form a basis to evolve a fault-tolerant version?

I am not aware of any paper that makes this connection. Was this connection noticed and mentioned before?