The Ben-Or Decentralized Consensus Algorithm
In PODC 1983, Michael Ben-Or published a randomized distributed asynchronous consensus algorithm in a newspaper titled "Another wages of gratis pick (Extended Abstract): Completely asynchronous understanding protocols".
After xv years, a newspaper inwards 1998 provided a correctness proof of Ben-Or's algorithm. Above is the pseudocode for Ben-Or from that paper. From the pseudocode it looks similar Ben-Or is a real uncomplicated algorithm, but inwards reality its demeanour is non tardily to sympathize as well as appreciate. But fret not, TLA+ modeling as well as model checking helps a lot for understanding the demeanour of the Ben-Or algorithm. It is fulfilling when you lot finally sympathize how the algorithm works, as well as how security is ever preserved as well as progress is achieved eventually probabilistically.
I had assigned modeling of Ben-Or equally the TLA+ projection for my distributed systems class. Here I portion my PlusCal modeling of Ben-Or, as well as speak over how this model helps us to sympathize the algorithm better. Here is the link to the total model on Github, if you lot similar to play amongst it.
This is the top degree model of each process.
Ben-Or is a decentralized consensus algorithm, it is non a leader-based algorithm. In the absence of a leader for tie-breaking, Ben-Or solves entirely binary input consensus, rather than arbitrary input value consensus. Even then, it employs randomization for achieving progress. (I had discussed the pros as well as cons of leader-based versus leaderless consensus algorithms earlier.)
Ben-Or has rounds but, different Paxos, the rounds are non client-restricted as well as practise non preempt each other. The rounds are mutual to all nodes, as well as they are traversed inwards a lock-step fashion past times N-F nodes inwards an asynchronous manner, where due north is the total issue of nodes, as well as F the upper leap on the issue of faulty nodes. Recently nosotros accept seen that threshold clock synchronization tries to brand such a circular concept to a greater extent than systematic.
Each circular has 2 phases. In the get-go stage of a round, each node tries to position a value supported past times a bulk inwards guild to advise that for the 2nd phase. In the 2nd stage (the decision/ratification phase), a node finalizes its determination if it finds the same value proposed past times at to the lowest degree F+1 nodes. Upon failure to larn a determination inwards the electrical current round, Ben-Or makes about nodes to alter their votes before the adjacent round. This helps jolt/tilt the arrangement toward a decision, so that the arrangement (eventually probabilistically) converges to a consensus value inwards ane of the upcoming rounds.
The model uses the next procedure local variables. p1v denotes the value the node broadcasts inwards stage 1 of a round, as well as p2v denotes the value it broadcasts inwards stage 2 of a round. The decided variable at each node to shop the final/terminal determination a node makes for consensus. Initially decided=-1, as well as entirely when the procedure decides, decided is gear upward to 0 or 1. These variables are initialized equally p1v=INPUT[self], p2v=-1, decided=-1. In the real get-go round, the p1v values are come upward from the INPUT value for each node, inwards the consequent rounds, the p1v is assigned equally a role of the p2v values inwards EvalP2 function.
We utilization MAXROUND to leap the issue of rounds a node tin seek to continue model checking feasible; e.g., MAXROUND=3 or four suffices to larn the properties tested. We utilization INPUT to give initial values to nodes. This is binary consensus, so nosotros utilization 0 or 1 equally possible initial values. When nosotros run our model, inwards the model overview pane, nosotros tin give values to these constants equally follows: F <- 1, due north <- 4, INPUT <- «0,1,1,1», MAXROUND <-3.
We model message passing amidst nodes past times using a shared message board. This is merely a gear upward defined equally a global variable. We utilization p1Msg equally the phase1 message board, as well as p2Msg equally the phase2 message board. Each node broadcasts its p1v or p2v value past times adding it to the corresponding message board via a uncomplicated gear upward union. For instance p1Msg:=p1Msg \union {<<self, r, p1v>>} adds p1v to the corresponding phase1 message board. This amounts to broadcasting the p1v value of the node, as well as whatever other node tin on its ain schedule read from p1Msg gear upward what messages accept been sent for a given round, r. This reading is done via gear upward filtering. SentP1Msgs(r)=={m \in p1Msg: m[2]=r} returns all messages inwards broadcast for circular r to the p1Msg board, as well as SentP1MsgsV(r,a)=={m \in p1Msg: m[2]=r /\ m[3]=a} returns all messages broadcast amongst circular issue r as well as value a.
In phase1, each node broadcasts its p1v to p1Msg, as well as and then waits until at that spot are at to the lowest degree N-F phase1 values shared inwards the channel for circular r. This ensures that the nodes traverse the rounds inwards a lock-step fashion. Among the p1v values retrieved from the channel for circular r, if the node sees a bulk value $a \in {0,1}$, it adopts that value for its p2v value to advise inwards phase2 of this round. If the node does non run into a bulk value (which does non imply that at that spot is no bulk value present, because each node sees a different perspective of values), it adopts the default $p2v=-1$ value for proposing inwards phase2, important that it did non run into either 0 or 1 equally supported from stage 1 communication exchange.
Note that, since both 0 or 1 cannot endure inwards the majority, the p2v value gear upward for a circular tin endure {-1}, {-1,0}, {-1,1}, {0} or {1}. But it is non possible accept {-1,0,1} or {0,1} equally the p2v value gear upward because if either 0 or 1 is seen inwards the majority, it is non possible for the other value to endure likewise inwards the majority.
In phase2, each node broadcasts its p2v to p2Msg, as well as and then waits until at that spot are at to the lowest degree N-F phase2 values shared inwards the channel for this circular r. Among the p2v values retrieved from the channel for circular r, if the node sees a value $a \in {0,1}$ sent past times at to the lowest degree F nodes, it adopts that value for its determination value. Else if the node sees a 0 or 1 value sent past times about other node, it adopts this value for its p1v value for the upcoming round. This is an optimistic effort for accumulating traction for a value that at to the lowest degree ane node witnessed equally bulk amidst p1v values inwards this round. Otherwise, equally the in conclusion resort, the node adopts a random 0 or 1 value equally its p1v value for the adjacent round. This serves equally the random jounce to larn the arrangement contention towards a value that tin endure witnessed equally bulk past times about nodes so the case1 or case2 inwards a higher house applies for the round.
Why does the algorithm postulate seeing at to the lowest degree F nodes amongst the same p2v value before it is chosen for decision? If less than F nodes accept seen a bulk p1v value, state 1, as well as they prematurely determine on this value, it is possible that they may all crash afterwards this decision. The other nodes would endure next case2 as well as case3 of phase2, as well as it is quite possible that inwards the adjacent circular 0 is seen equally bulk value past times about node, as well as the arrangement as well as then converges to determine on 0, violating the understanding property. While progress is probabilistic inwards Ben-Or, security should as well as is ever satisfied inwards Ben-Or.
Once a node sees at to the lowest degree F+1 nodes amongst the same p2v value, state 1, it decides. And it is guaranteed that inwards this stage 2, other available nodes likewise run into at to the lowest degree ane node amongst p2v=1, fifty-fifty if the F nodes that has proposed p2v=1 crashed. (They await for $N-F$ p2v values, remember.) This agency that, inwards the worst case, inwards the adjacent circular all available processes volition run into bulk p1v=1 amidst the N-F phase1 messages they receive, as well as determine inwards the phase2 of that round.
Okay, this brings us to the properties nosotros banking concern check on this model.
Agreement says that for whatever 2 nodes j as well as k that decided, their determination values should endure the same. Agreement should ever endure satisfied, fifty-fifty when F is to a greater extent than than N/2. Of course, if $F>N/2$, nodes volition never endure able to decide, because inwards phase1 a bulk is required for proposing a 0 or 1 p2v value.
For testing progress for F<N/2, nosotros utilization the Progress belongings that says eventually all processes determine something. If nosotros start amongst the same p1v values at all nodes, progress is guaranteed for N>2F. In that instance all nodes volition run into that value equally the bulk value as well as advise it equally the p2v value inwards circular 1, as well as they all determine inwards phase2 of that round.
But what almost if the initial p1v values accept a mixture of 0s as well as 1s? They may non determine because it may non endure possible for whatever available node to detect a bulk 0 or 1 from their sample of N-F values, as well as the default/skip value $p2v=-1$ volition endure proposed for the stage 2. When such an ambiguity is possible, the determination tin as well as then endure postponed forever past times choosing the worst possible "random" assignments to the nodes for the adjacent round. This is of course of report non whatever to a greater extent than a uniformly random assignment of 0s or 1s to p1v, but a pathological/malicious value assignment past times the model checker to demonstrate how the Progress tin endure violated.
For instance for N=4 as well as F=1, INPUT=<<0,1,1,1>> volition violate the Progress property. The model checker volition shroud the "p1v=1" from node, as well as since the other nodes would non run into the value "1" inwards the bulk (which is three for N=4), they volition all advise p2v=-1. The model checker likewise volition expose the "random" assignments that volition extend this indecision past times non giving bulk to whatever value. (On the other hand, for F=0, inwards this setup, the bulk value of "1" volition endure seen past times all nodes, as well as the arrangement volition determine inwards ane round.)
The pathological runs volition intermission the Progress property, but how practise nosotros banking concern check that nether a mixed input set, at that spot are about runs that terminate? To this end, nosotros write a safety/invariant belongings called BaitProgress which claims that it is non possible for all processes to decide, as well as sentry the model checker come upward up amongst a counterexample for how this belongings is violated, which implies that yest consensus/progress is solved for about runs.
If N=4 as well as INPUT=«0,1,1,1», is it possible to accept 0 decided for a run? To seek out this nosotros write a security belongings called MinorityReport which claims that it is non possible for all the nodes to determine "0" equally the consensus value. The model checker volition seek to show us incorrect past times producing a counterexample. If you lot banking concern check this for F=1, nosotros volition expose that yes, MinorityReport volition endure violated important that at that spot exists an execution where all nodes volition determine on 0 equally the consensus value, fifty-fifty though it is clearly inwards the minority inwards the initial input set.
Texel is likewise a leaderless binary asynchronous consensus protocol. But Texel does non utilization rounds, as well as instead utilization consistent cuts to evaluate whether at that spot is a consensus value rubber to determine on. Texel assumes N>3F+1, so that a value supported past times F+1 nodes (which is a bulk of 2F+1) is visible to whatever procedure fifty-fifty afterwards F nodes crash (or larn unavailable otherwise). The work is at that spot may endure 2 different supporting values, as well as this violates progress. Moreover, at that spot is a run a hazard that progress is withal non guaranteed when all values are the same. When the consistent cutting read of nodes (this is called experimentation phase, which is somewhat similar to a phase1 inwards Ben-Or) conflict amongst each other, the read is aborted as well as restarted from scratch. So it is possible that the determination is postponed forever past times conflicting experiment phases, fifty-fifty when all input values are the same.
As I discussed earlier, a binary consensus is withal useful for blockchain transaction. A well-formed customer should avoid double spending, as well as would not advise 2 different transactions amongst the same UTXO, as well as therefore they are guaranteed for liveness. However liveness is non guaranteed (but security withal holds) for double-spending transactions submitted past times Byzantine clients, which conflict amongst ane another. In this instance the liveness violation is a characteristic non a bug.
Avalanche extended Texel's leaderless consensus approach inwards damage of scalability as well as quick finality (by using sampling as well as metastability), as well as applied the resultant decentralized algorithm inwards the blockchain domain. I assume a similar transformation is possible for extending Ben-Or to a sampling based solution. Even though Ben-Or is circular based different Texel, I think via sampling as well as momentum nosotros tin avoid the requirement of the rounds, similar to how Avalanche avoided the requirement of consistent reads inwards Texel.
After xv years, a newspaper inwards 1998 provided a correctness proof of Ben-Or's algorithm. Above is the pseudocode for Ben-Or from that paper. From the pseudocode it looks similar Ben-Or is a real uncomplicated algorithm, but inwards reality its demeanour is non tardily to sympathize as well as appreciate. But fret not, TLA+ modeling as well as model checking helps a lot for understanding the demeanour of the Ben-Or algorithm. It is fulfilling when you lot finally sympathize how the algorithm works, as well as how security is ever preserved as well as progress is achieved eventually probabilistically.
I had assigned modeling of Ben-Or equally the TLA+ projection for my distributed systems class. Here I portion my PlusCal modeling of Ben-Or, as well as speak over how this model helps us to sympathize the algorithm better. Here is the link to the total model on Github, if you lot similar to play amongst it.
Outline of the algorithm
This is the top degree model of each process.
Ben-Or is a decentralized consensus algorithm, it is non a leader-based algorithm. In the absence of a leader for tie-breaking, Ben-Or solves entirely binary input consensus, rather than arbitrary input value consensus. Even then, it employs randomization for achieving progress. (I had discussed the pros as well as cons of leader-based versus leaderless consensus algorithms earlier.)
Ben-Or has rounds but, different Paxos, the rounds are non client-restricted as well as practise non preempt each other. The rounds are mutual to all nodes, as well as they are traversed inwards a lock-step fashion past times N-F nodes inwards an asynchronous manner, where due north is the total issue of nodes, as well as F the upper leap on the issue of faulty nodes. Recently nosotros accept seen that threshold clock synchronization tries to brand such a circular concept to a greater extent than systematic.
Each circular has 2 phases. In the get-go stage of a round, each node tries to position a value supported past times a bulk inwards guild to advise that for the 2nd phase. In the 2nd stage (the decision/ratification phase), a node finalizes its determination if it finds the same value proposed past times at to the lowest degree F+1 nodes. Upon failure to larn a determination inwards the electrical current round, Ben-Or makes about nodes to alter their votes before the adjacent round. This helps jolt/tilt the arrangement toward a decision, so that the arrangement (eventually probabilistically) converges to a consensus value inwards ane of the upcoming rounds.
The model uses the next procedure local variables. p1v denotes the value the node broadcasts inwards stage 1 of a round, as well as p2v denotes the value it broadcasts inwards stage 2 of a round. The decided variable at each node to shop the final/terminal determination a node makes for consensus. Initially decided=-1, as well as entirely when the procedure decides, decided is gear upward to 0 or 1. These variables are initialized equally p1v=INPUT[self], p2v=-1, decided=-1. In the real get-go round, the p1v values are come upward from the INPUT value for each node, inwards the consequent rounds, the p1v is assigned equally a role of the p2v values inwards EvalP2 function.
We utilization MAXROUND to leap the issue of rounds a node tin seek to continue model checking feasible; e.g., MAXROUND=3 or four suffices to larn the properties tested. We utilization INPUT to give initial values to nodes. This is binary consensus, so nosotros utilization 0 or 1 equally possible initial values. When nosotros run our model, inwards the model overview pane, nosotros tin give values to these constants equally follows: F <- 1, due north <- 4, INPUT <- «0,1,1,1», MAXROUND <-3.
Phase1 as well as Phase2 actions
We model message passing amidst nodes past times using a shared message board. This is merely a gear upward defined equally a global variable. We utilization p1Msg equally the phase1 message board, as well as p2Msg equally the phase2 message board. Each node broadcasts its p1v or p2v value past times adding it to the corresponding message board via a uncomplicated gear upward union. For instance p1Msg:=p1Msg \union {<<self, r, p1v>>} adds p1v to the corresponding phase1 message board. This amounts to broadcasting the p1v value of the node, as well as whatever other node tin on its ain schedule read from p1Msg gear upward what messages accept been sent for a given round, r. This reading is done via gear upward filtering. SentP1Msgs(r)=={m \in p1Msg: m[2]=r} returns all messages inwards broadcast for circular r to the p1Msg board, as well as SentP1MsgsV(r,a)=={m \in p1Msg: m[2]=r /\ m[3]=a} returns all messages broadcast amongst circular issue r as well as value a.
In phase1, each node broadcasts its p1v to p1Msg, as well as and then waits until at that spot are at to the lowest degree N-F phase1 values shared inwards the channel for circular r. This ensures that the nodes traverse the rounds inwards a lock-step fashion. Among the p1v values retrieved from the channel for circular r, if the node sees a bulk value $a \in {0,1}$, it adopts that value for its p2v value to advise inwards phase2 of this round. If the node does non run into a bulk value (which does non imply that at that spot is no bulk value present, because each node sees a different perspective of values), it adopts the default $p2v=-1$ value for proposing inwards phase2, important that it did non run into either 0 or 1 equally supported from stage 1 communication exchange.
Note that, since both 0 or 1 cannot endure inwards the majority, the p2v value gear upward for a circular tin endure {-1}, {-1,0}, {-1,1}, {0} or {1}. But it is non possible accept {-1,0,1} or {0,1} equally the p2v value gear upward because if either 0 or 1 is seen inwards the majority, it is non possible for the other value to endure likewise inwards the majority.
In phase2, each node broadcasts its p2v to p2Msg, as well as and then waits until at that spot are at to the lowest degree N-F phase2 values shared inwards the channel for this circular r. Among the p2v values retrieved from the channel for circular r, if the node sees a value $a \in {0,1}$ sent past times at to the lowest degree F nodes, it adopts that value for its determination value. Else if the node sees a 0 or 1 value sent past times about other node, it adopts this value for its p1v value for the upcoming round. This is an optimistic effort for accumulating traction for a value that at to the lowest degree ane node witnessed equally bulk amidst p1v values inwards this round. Otherwise, equally the in conclusion resort, the node adopts a random 0 or 1 value equally its p1v value for the adjacent round. This serves equally the random jounce to larn the arrangement contention towards a value that tin endure witnessed equally bulk past times about nodes so the case1 or case2 inwards a higher house applies for the round.
Why does the algorithm postulate seeing at to the lowest degree F nodes amongst the same p2v value before it is chosen for decision? If less than F nodes accept seen a bulk p1v value, state 1, as well as they prematurely determine on this value, it is possible that they may all crash afterwards this decision. The other nodes would endure next case2 as well as case3 of phase2, as well as it is quite possible that inwards the adjacent circular 0 is seen equally bulk value past times about node, as well as the arrangement as well as then converges to determine on 0, violating the understanding property. While progress is probabilistic inwards Ben-Or, security should as well as is ever satisfied inwards Ben-Or.
Once a node sees at to the lowest degree F+1 nodes amongst the same p2v value, state 1, it decides. And it is guaranteed that inwards this stage 2, other available nodes likewise run into at to the lowest degree ane node amongst p2v=1, fifty-fifty if the F nodes that has proposed p2v=1 crashed. (They await for $N-F$ p2v values, remember.) This agency that, inwards the worst case, inwards the adjacent circular all available processes volition run into bulk p1v=1 amidst the N-F phase1 messages they receive, as well as determine inwards the phase2 of that round.
Okay, this brings us to the properties nosotros banking concern check on this model.
Safety as well as Progress properties
Agreement says that for whatever 2 nodes j as well as k that decided, their determination values should endure the same. Agreement should ever endure satisfied, fifty-fifty when F is to a greater extent than than N/2. Of course, if $F>N/2$, nodes volition never endure able to decide, because inwards phase1 a bulk is required for proposing a 0 or 1 p2v value.
For testing progress for F<N/2, nosotros utilization the Progress belongings that says eventually all processes determine something. If nosotros start amongst the same p1v values at all nodes, progress is guaranteed for N>2F. In that instance all nodes volition run into that value equally the bulk value as well as advise it equally the p2v value inwards circular 1, as well as they all determine inwards phase2 of that round.
But what almost if the initial p1v values accept a mixture of 0s as well as 1s? They may non determine because it may non endure possible for whatever available node to detect a bulk 0 or 1 from their sample of N-F values, as well as the default/skip value $p2v=-1$ volition endure proposed for the stage 2. When such an ambiguity is possible, the determination tin as well as then endure postponed forever past times choosing the worst possible "random" assignments to the nodes for the adjacent round. This is of course of report non whatever to a greater extent than a uniformly random assignment of 0s or 1s to p1v, but a pathological/malicious value assignment past times the model checker to demonstrate how the Progress tin endure violated.
For instance for N=4 as well as F=1, INPUT=<<0,1,1,1>> volition violate the Progress property. The model checker volition shroud the "p1v=1" from node, as well as since the other nodes would non run into the value "1" inwards the bulk (which is three for N=4), they volition all advise p2v=-1. The model checker likewise volition expose the "random" assignments that volition extend this indecision past times non giving bulk to whatever value. (On the other hand, for F=0, inwards this setup, the bulk value of "1" volition endure seen past times all nodes, as well as the arrangement volition determine inwards ane round.)
The pathological runs volition intermission the Progress property, but how practise nosotros banking concern check that nether a mixed input set, at that spot are about runs that terminate? To this end, nosotros write a safety/invariant belongings called BaitProgress which claims that it is non possible for all processes to decide, as well as sentry the model checker come upward up amongst a counterexample for how this belongings is violated, which implies that yest consensus/progress is solved for about runs.
If N=4 as well as INPUT=«0,1,1,1», is it possible to accept 0 decided for a run? To seek out this nosotros write a security belongings called MinorityReport which claims that it is non possible for all the nodes to determine "0" equally the consensus value. The model checker volition seek to show us incorrect past times producing a counterexample. If you lot banking concern check this for F=1, nosotros volition expose that yes, MinorityReport volition endure violated important that at that spot exists an execution where all nodes volition determine on 0 equally the consensus value, fifty-fifty though it is clearly inwards the minority inwards the initial input set.
MAD questions
How does this compare amongst Texel, about other decentralized consensus protocol?Texel is likewise a leaderless binary asynchronous consensus protocol. But Texel does non utilization rounds, as well as instead utilization consistent cuts to evaluate whether at that spot is a consensus value rubber to determine on. Texel assumes N>3F+1, so that a value supported past times F+1 nodes (which is a bulk of 2F+1) is visible to whatever procedure fifty-fifty afterwards F nodes crash (or larn unavailable otherwise). The work is at that spot may endure 2 different supporting values, as well as this violates progress. Moreover, at that spot is a run a hazard that progress is withal non guaranteed when all values are the same. When the consistent cutting read of nodes (this is called experimentation phase, which is somewhat similar to a phase1 inwards Ben-Or) conflict amongst each other, the read is aborted as well as restarted from scratch. So it is possible that the determination is postponed forever past times conflicting experiment phases, fifty-fifty when all input values are the same.
As I discussed earlier, a binary consensus is withal useful for blockchain transaction. A well-formed customer should avoid double spending, as well as would not advise 2 different transactions amongst the same UTXO, as well as therefore they are guaranteed for liveness. However liveness is non guaranteed (but security withal holds) for double-spending transactions submitted past times Byzantine clients, which conflict amongst ane another. In this instance the liveness violation is a characteristic non a bug.
Avalanche extended Texel's leaderless consensus approach inwards damage of scalability as well as quick finality (by using sampling as well as metastability), as well as applied the resultant decentralized algorithm inwards the blockchain domain. I assume a similar transformation is possible for extending Ben-Or to a sampling based solution. Even though Ben-Or is circular based different Texel, I think via sampling as well as momentum nosotros tin avoid the requirement of the rounds, similar to how Avalanche avoided the requirement of consistent reads inwards Texel.
0 Response to "The Ben-Or Decentralized Consensus Algorithm"
Post a Comment