My Crazy Proposal For Achieving Lightweight Distributed Consensus
Distributed consensus is a hard problem. See some of my previous posts for discussion of the impossibility results on distributed consensus:
Paxos taught
Perspectives on the CAP theorem
Attacking Generals problem
The reason distributed consensus is hard is that the parties involved don't have access to the same knowledge (the same point of view) of the system state. In ZooKeeper, non-leader replicas can also serve reads locally, but the updates still need to get serialized by the leader. Some Paxos variants, such as Mencius and ePaxos, try to relieve this problem, but the bottom line is that they are still subject to the same fundamental performance limitations due to serialization at a single leader.
I have this crazy idea for circumventing the impossibility results of distributed consensus as well as improving the performance of distributed consensus. The idea is to connect the nodes involved in consensus (let's call these nodes coordinators) with a single-collision-domain Ethernet bus in order to solve the asymmetric knowledge problem.
No, this setup does not assume reliable communication. There can be collisions on the Ethernet bus. But the collisions will be total collisions, since this is a shared Ethernet bus. When there is a collision, none of the coordinators would deliver the message, including the sender. Since Ethernet on a shared bus is CSMA/CD, the transmitter also detects the collision and does not accept its own message into the consensus log if that is the case.
So, in effect, the shared Ethernet bus performs the serialization of the coordinators' proposals. As a result, a coordinator proposing an operation does not need to collect explicit acknowledgement from any other coordinator, let alone from a majority quorum of coordinators. This makes this consensus algorithm very lightweight and fast.
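To make the serialization argument concrete, here is a minimal toy simulation in Python. It is only a sketch under stated assumptions: the bus is modeled as discrete slots, CSMA/CD backoff is replaced by each coordinator transmitting with a fixed probability, and the names `Coordinator`, `run_bus`, and `p_send` are hypothetical, made up for this illustration. It does not touch a real Ethernet driver.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical coordinator: pending proposals plus a local consensus log."""
    name: str
    pending: list = field(default_factory=list)
    log: list = field(default_factory=list)


def run_bus(coordinators, p_send=0.5, seed=0, max_slots=10_000):
    """Simulate a slotted shared bus until every proposal is logged.

    Assumption: in each slot, every coordinator with pending work transmits
    with probability p_send (a crude stand-in for CSMA/CD backoff).
    Exactly one transmitter: the frame reaches everyone, sender included.
    Two or more transmitters: total collision, nobody delivers anything.
    """
    rng = random.Random(seed)
    for _ in range(max_slots):
        if not any(c.pending for c in coordinators):
            break
        senders = [c for c in coordinators if c.pending and rng.random() < p_send]
        if len(senders) != 1:
            continue  # idle slot or total collision: no log changes anywhere
        winner = senders[0]
        entry = (winner.name, winner.pending.pop(0))
        for c in coordinators:
            c.log.append(entry)  # same entry at the same position on every node
    if any(c.pending for c in coordinators):
        raise RuntimeError("bus did not drain within max_slots")


nodes = [
    Coordinator("A", ["a1", "a2"]),
    Coordinator("B", ["b1"]),
    Coordinator("C", ["c1", "c2"]),
]
run_bus(nodes)
# No acknowledgements were exchanged, yet every log holds the same sequence.
assert all(c.log == nodes[0].log for c in nodes)
```

The point of the sketch is that the only place ordering is decided is the bus itself: a slot either delivers one frame to all coordinators or delivers nothing, so the logs cannot diverge. Which proposal wins a given slot depends on the random seed, but agreement across the logs does not.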
This system is masking fault-tolerant to coordinator crashes as long as at least one coordinator remains. (If we are to allow reintegrating coordinators recovering from a crash, things get complicated, of course. Then we would need to assume reasonable churn to allow time for recovering coordinators to catch up before they can be integrated. This would also require fast one-broadcast consensus on the reconfigurations.)
That is it. That simple. Now comes the algorithmician's apology.
On the theory side, I know I am not suggesting anything novel. The impossibility results stand; I only changed the system conditions and stepped outside the territory of the impossibility results (that is why I used the term "circumvent"). In fact, I first noticed this idea in the context of wireless broadcast and wireless sensor networks, when I was a post-doc in Nancy Lynch's group at MIT. We published papers exploring the concept for wireless broadcast with total and partial collisions.
On the practical side, I know this proposal has downsides. It is not readily applicable, as it requires the Ethernet driver to expose collision information to the application layer. It requires setting up an auxiliary Ethernet LAN across the coordinators. And, yes, I know this doesn't scale outside a LAN. (The coordinators should be connected by the single-domain Ethernet bus, but the clients may communicate with the coordinators over TCP/IP and need not be in the same LAN. The coordinators can be located across different racks to guard against rack-wide crashes.)
But every system design is an exercise in determining which tradeoffs you make. The important question is: Is this tradeoff worth exploring? Would the performance improvement and simplicity stemming from this setup make this a reasonable/feasible choice for solving the distributed consensus problem at the datacenter level?