Tla+/Pluscal Solution For Modeling Of 2-Phase Commit Transactions

I had posted almost the minute TLA+ project I assigned for the distributed systems class the other day. Here is the solution to the project. The solution is pretty uncomplicated together with straightforward. (This was built past times modifying the /P2TCommit.tla/ at Lamport's site to add together a TM together with a BTM agent).

Initial definitions


Lines 12-13 laid the initial RMstates for each replica together with the initial TMstate.

Lines 15-18 define the canCommit together with canAbort conditions. If all RMs are "prepared" (they are ready for a commit decision), the TM uses this to laid tmState="commit". CanCommit is too truthful if at that spot is an RM that already has "committed" state. This is possible if the TM already made the "commit" determination together with an RM went ahead amongst it together with transitioned to "committed" from "prepared". Then if the TM failed, together with BTM is to brand a decision, the BTM sets tmState="commit" inwards companionship to boot the bucket along Consistency.

If at that spot exists an RM amongst "abort" dry reason or an RM amongst a "failed" state, canAbort becomes truthified, provided that at that spot does non be an RM amongst "committed" state. This Line xviii is at that spot to forestall inconsistency amongst the Line sixteen condition. If at that spot is a failed node, it is possible to Abort, but non if a previous Commit was signaled together with made an RM="committed".

RM modeling


Lines 37-38 demonstrate the actions RM take. Since "working" together with "prepared" states are nonterminal states, the RM keeps trying an activeness until a end dry reason is reached. The RM may telephone phone 1 of the macros Prepare, Decide, together with Fail.

Prepare get-go checks that the RM is inwards "working" state, together with if together with thus transitions it to the "prepared" state. (In the 2-phase commit implementations, mostly this happens upon a prepare-request sent past times the TM, but this modeling keeps things uncomplicated together with abstracts that away.)

Decide changes the RM dry reason to "committed" if  tmState="commit", together with changes it to "aborted" if tmState="abort". Note that, when Decide is called RM is either inwards "working" or "prepared" state. As the canCommit status shows to a higher house tmState goes to "commit" solely if all RMs are inwards "prepared" state, together with thus the allowed transitions for RM is prepared->committed, prepared->aborted, together with working->aborted.

Fail changes the RM dry reason to "failed".
Don't worry almost all RMs boot the bucket to failed, the model checker is going to cheque all possible scenarios including all up, 1 downwards (at whatever of the dozens of possible points inwards computation), 2 downwards (at whatever of the dozens of possible points inwards computation).

TM modeling


The TM model is simple. The TM checks if it canCommit or canAbort together with updates tmState accordingly. TM tin too neglect which makes the tmState="hidden". To boot the bucket along things uncomplicated all the same interesting, I solely made the TM neglect subsequently it makes a decision. But if y'all notice, I pose a label earlier the neglect action. That makes these 2 updates nonatomic: That is, the tmState volition live available for RMs to read for a duration, earlier the TM fails.

BTM modeling


The BTM model is really like to the TM model. The BTM checks for tmState="hidden" earlier it interjects, but otherwise it makes the same decision. Of class at that spot volition live only about complications, mayhap an RM changed dry reason seeing TMs decision, or mayhap an RM failed, since the TMs decision.

For simplicity nosotros assume the BTM does non fail.

Properties


Termination checks if all processes halt executing eventually. TypeOK is only a sanity check.

Consistency checks that at that spot are no 2 RMs such that 1 says "committed" together with other says "aborted". That was what the 2-phase commit was trying to forestall inwards the get-go place.

Model checking

Model checking amongst TM failure (with the BTM code commented out) led to a Termination violation every bit expected. That is if TM fails earlier the tmState is read past times all RMs, the prepared RMs volition block together with won't live able to terminate.

When nosotros uncomment the BTM code, the BTM is able to accept over, together with the Termination is too satisfied. Of class nosotros didn't larn to this flawless model all at once. The model checker constitute only about counterexamples to my flawed models, together with I fixed them. I approximate this post would live to a greater extent than instructive if I had documented only about of the mistakes I made instead of  showing solely the lastly dry reason of the code. Well, adjacent time.

But that was easy

How were nosotros able to solve this problem? Isn't this supposed to live a complicated occupation inwards the database literature? There were a lot of attempts to brand 2-phase commit fault-tolerant, create it amongst a backup TM. Lamport together with Gray newspaper on Transaction Commit says that all of those attempts had bugs inwards corner cases:
A non-blocking commit protocol is 1 inwards which the failure of a unmarried procedure does non forestall the other processes from deciding if the transaction is committed or aborted. They are oft called Three-Phase Commit protocols. Several have got been proposed, together with a few have got been implemented [3, 4, 19]. They have got commonly attempted to “fix” the Two-Phase Commit protocol past times choosing only about other TM if the get-go TM fails. However, nosotros know of none that provides a consummate algorithm proven to satisfy a clearly stated correctness condition. For example, the give-and-take of non-blocking commit inwards the classic text of Bernstein, Hadzilacos, together with Goodman [3] fails to explicate what a procedure should exercise if it receives messages from 2 unlike processes, both claiming to live the electrical flow TM. Guaranteeing that this province of affairs cannot arise is a occupation that is every bit hard every bit implementing a transaction commit protocol.

The argue nosotros succeeded easily is because nosotros cheated: nosotros assumed clean failures and perfect failure detectors. If nosotros didn't have got perfect failure detectors, this would live a mess: A node may human activeness every bit if only about other is dead, when the other node is live together with human activeness every bit if this 1 is dead. This asymmetry of information is the rootage of all evil inwards distributed systems.

In a hereafter post, I innovation to brand the failure detectors to a greater extent than fuzzy inwards this 2PC amongst backup TM code together with demonstrate what type of problems tin arise.

Before I destination this post, permit me respond this hanging question: How exercise nosotros bargain amongst partial failures together with imperfect failure detectors? We purpose Paxos to cutting the Gordian Knot together with human activeness every bit a definitive jurist to larn the processes grip on the electrical flow state/configuration of the system. This is precisely what Lamport together with Gray demonstrate inwards their Consensus on Transaction Commit to brand 2-phase commit fault-tolerant. I volition verbalise over that operate too inwards a hereafter spider web log post.

0 Response to "Tla+/Pluscal Solution For Modeling Of 2-Phase Commit Transactions"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel