TLA+ Project2 Solution (2016)

In a previous post, I had given the description of project2. Here is the TLA+ project2 solution made available on github. Below I will briefly explain the solution.

If you are not familiar with the chain-replication protocol, read this brief summary.

The setup and variables


The first part is about the setup. We declare the process IDs for the storage nodes (denoted as Nodes), the clients, and the configurator process.


The nodes each maintain a "db", where the item is updated and queried. For simplicity, our distributed key-value store maintains only a single item, initialized to ver=-1, val=-1, cli=-1.

The nodes also have an auxiliary variable "up", which is used for modeling the status of the node. A node can go "down" at any time provided that fewer than FAILNUM nodes are currently down.

Each node and client has a message box; initially all message boxes are empty.

Finally, there is a global variable chain. Initially it is an empty sequence. As the configurator populates the chain, the chain will list in order the storage nodes: the head, the intermediate nodes, and the tail storage node. The configurator maintains the chain by removing the IDs of crashed nodes from the chain, and adding healthy nodes to the tail of the chain.
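A minimal sketch of these declarations in PlusCal might look as follows. The constant names N, C, STOP, the message-box variable name msg, and the exact record shape are my assumptions based on the description; the actual solution on github may differ.

EXTENDS Integers, Sequences, FiniteSets
CONSTANTS N, C, STOP, FAILNUM      \* #storage nodes, #clients, #requests per client, max down nodes
Nodes   == 1..N                    \* storage node process IDs
Clients == N+1 .. N+C              \* client process IDs

(* --algorithm chainkv
variables
  db    = [n \in Nodes |-> [ver |-> -1, val |-> -1, cli |-> -1]],  \* the single item at each node
  up    = [n \in Nodes |-> TRUE],                                  \* auxiliary up/down status
  msg   = [p \in Nodes \cup Clients |-> <<>>],                     \* one-slot message boxes, initially empty
  chain = <<>>;                                                    \* the chain, populated by the configurator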

(In practice this can be implemented in several ways. chain could be the API of the configurator process and could be obtained by an RPC call to the configurator. Or chain could be cached at the client-side library, and if/when the configurator changes the chain, it can update the chain variable at the library.)

The define block


Here come the macros. We write these to simplify our modeling. As I wrote earlier, once we decide on the macros, we are halfway done with our modeling: "I find that once you get the "define" block right, the rest of the PlusCal algorithm practically writes itself. But you may need to make a couple of unsuccessful takes before you figure out the most suitable /define/ block that leads to a succinct and elegant model."

IsUp is a function that takes the ID of a storage node and returns whether that storage node is up or not. Its implementation is very simple: since we model a node being up or down using the up variable for that node, the macro simply returns the up variable for the corresponding node. Why do we have a macro for this? We will use this boolean function as a filter for the SelectSeq operator when we discuss the configurator.

UpNodes returns the set of storage nodes that are up. We could also have written it as {n \in Nodes: IsUp(n)}. The InChain(s) predicate returns whether node ID "s" is included in the chain or not. I implemented this using the Seq operator: if chain is one of the sequences over Nodes \ {s}, then s is not in the chain; otherwise s is in the chain.

ChainNodes returns the set of all nodes s for which InChain(s) is satisfied. FreeUpNode returns an "up" node that is not in the chain. GetIndex(s) returns the index of node s in the chain. GetNext(s) returns the successor of node s in the chain.
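Assuming the declarations above, the define block might look roughly like this; the exact operator bodies in the actual solution may differ.

define
  IsUp(s)     == up[s]
  UpNodes     == {n \in Nodes : up[n]}              \* equivalently, {n \in Nodes : IsUp(n)}
  InChain(s)  == ~ (chain \in Seq(Nodes \ {s}))     \* s is in chain iff chain is not a sequence over Nodes \ {s}
  ChainNodes  == {n \in Nodes : InChain(n)}
  FreeUpNode  == CHOOSE n \in UpNodes : ~InChain(n) \* assumes such a node exists when used
  GetIndex(s) == CHOOSE i \in 1..Len(chain) : chain[i] = s
  GetNext(s)  == chain[GetIndex(s) + 1]             \* assumes s is not the tail
end define;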

The client


The client sends write requests to the storage nodes. The client gets to work only after a chain is formed by the configurator. To keep our model checking short/finite, we make the client send STOP number of requests.

Before sending an update request, the client first reads the current version of the item from the key-value store. This is done in label CLR inside a while loop. The read request is sent to the node at the tail of the chain, which is given by chain[Len(chain)]. The client sends the request by writing to the message box of the tail node. Why is this inside a while loop? Because we leave the fault-tolerance to the client. The client's read request may be dropped. This can happen either via a message loss, or when the tail node crashes after receiving the read request. If the client does not get a response back to its read request, it re-sends the read request to the current tail of the chain, until it receives a message back in its message box. Then the client sets hver to be the latest version of the item + 1, and removes the message from its message box by resetting the message box to empty.
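Here is a sketch of the client's read part along those lines. The outer label names, the await, and the record fields are my additions to make the fragment self-contained; only CLR is named in the description.

process client \in Clients
variables hver = -1, cntr = 0;
begin
CL:  await chain # <<>>;                 \* start only after the configurator has formed a chain
CLM: while cntr < STOP do
CLR:   while msg[self] = <<>> do
         \* (re)send a read request (val = -1 marks a read) to the current tail
         msg[chain[Len(chain)]] := [ver |-> -1, val |-> -1, cli |-> self];
       end while;
       hver := msg[self].ver + 1;        \* next version to write
       msg[self] := <<>>;                \* consume the read response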

In the CLW label, the client writes the update to the item to the head of the chain, i.e., chain[1], as per the chain replication protocol. This is also done in a while loop, because an update can be lost (if the head or an intermediate node in the chain crashes), and it is the client's responsibility to retry the update until it receives back an acknowledgment for that item update.

After the write is acknowledged, the client removes the acknowledgment message from its message box and increments the local variable cntr, which maintains the number of successful updates the client has made.
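And a sketch of the write part, continuing the same client process fragment as above (again, the retry loop shape is my reconstruction of the described behavior):

CLW:   while msg[self] = <<>> do
         \* (re)send the write to the head of the chain; retry until an ack comes back
         msg[chain[1]] := [ver |-> hver, val |-> cntr + 1, cli |-> self];
       end while;
       msg[self] := <<>>;                \* consume the acknowledgment
       cntr := cntr + 1;                 \* one more successful update
     end while;
end process;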

The storage nodes


The storage node actions are in two parts. The second part (starting with label NDF) is the modeling of a crash or recovery of a node. If fewer than FAILNUM nodes are currently down, any node can crash by setting its up variable to FALSE, and any crashed node can recover by setting its up variable back to TRUE.

The first part's actions show how a node reacts to a request in its message box. The preconditions for serving a request are that the node has a request in its message box, and that the node is up and part of the chain. Then we check the type of the message: val=-1 means this is a read message, otherwise it is an update message and db is updated with the message. The message is then propagated to the message box of the next node in the chain. (This is done regardless of whether it is a read message or an update message.) If this is the tail, then the message is written to the message box of the client indicated on the message (as that client sent this request to the system). Finally, the node sets its message box to empty when it finishes processing this message.
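Putting these rules together, the storage node process might look roughly like this. The either structure, the label name, the crash/recovery guard, and replying to the client with db[self] at the tail are my reconstruction, not necessarily the code on github (Cardinality comes from FiniteSets).

process node \in Nodes
begin
N1: while TRUE do
      either
        \* first part: serve a pending request, if up and in the chain
        await msg[self] # <<>> /\ up[self] /\ InChain(self);
        if msg[self].val # -1 then
          db[self] := msg[self];                                  \* update message: apply it
        end if;
        if self = chain[Len(chain)] then
          msg[msg[self].cli] := db[self] || msg[self] := <<>>;    \* tail: reply to the client
        else
          msg[GetNext(self)] := msg[self] || msg[self] := <<>>;   \* forward to the successor
        end if;
      or
        \* second part (NDF): crash while fewer than FAILNUM nodes are down, or recover
        if up[self] then
          await Cardinality({n \in Nodes : ~up[n]}) < FAILNUM;
          up[self] := FALSE;
        else
          up[self] := TRUE;
        end if;
      end either;
    end while;
end process;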

The configurator


The configurator process is assumed to be infallible. (In practice it would be implemented as a Paxos box.) Its job is to monitor and configure the chain. (If you are not familiar with the chain-replication protocol, read this brief summary.)

It first checks whether the chain is shorter than length 3; if there is an available up node to append, it adds it to the chain to bring the chain back to length 3. When a new node is appended as the tail, its db is also initialized by copying the db of the current tail of the chain.

After repopulating the chain, the configurator checks the health of the nodes in the chain and removes the crashed nodes from the chain. For this I used the SelectSeq operation, which filters the chain based on the output of the IsUp predicate on each member of the chain.
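A rough sketch of the configurator loop follows; the process ID, label names, and the exact growth logic (one node appended per iteration, empty chain handled separately) are my guesses at the structure.

process configurator = N+C+1
begin
CFG: while TRUE do
       \* grow the chain toward length 3 with a free up node, copying the current tail's db
       if Len(chain) < 3 /\ UpNodes \ ChainNodes # {} then
         if Len(chain) > 0 then
           db[FreeUpNode] := db[chain[Len(chain)]];   \* initialize the new tail from the current tail
         end if;
         chain := Append(chain, FreeUpNode);
       end if;
       \* then prune crashed nodes from the chain
PRN:   chain := SelectSeq(chain, IsUp);
     end while;
end process;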

Model checking

The system is to satisfy single-copy consistency. Single-copy consistency means the storage nodes appear to the outside as a single virtual infallible node, even though up to FAILNUM physical storage nodes can fail. More specifically, the result returned by a read from the tail (the item with the highest version number) should match the item stored by the most recently acknowledged write operation on the system.


If there is only one client of the system (you can configure this by setting C=1 in the model checking parameters), then there is a shortcut to checking single-copy consistency. Since the single client uses cntr to keep track of the writes completed and updates the item val with cntr+1, the version number hver and the val of the item should always match. The Consistent invariant checks that this is true at all storage nodes. "Consistent" can be violated if the client reads an old ver while cntr is incremented as usual, so that ver becomes lower than cntr = val. However, if we implemented our chained storage to provide single-copy consistency correctly, Consistent should hold as an invariant in the presence of a single client.
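A plausible rendering of this single-client invariant, checking at every storage node that the stored version and value track each other (the exact offset between them depends on how cntr and hver are initialized, so this is a sketch rather than the formula in the solution):

Consistent == \A n \in Nodes : db[n].ver = db[n].val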

Of course that is a very restrictive invariant (it assumes a single client, and assumes val=cntr). Let's look for more general invariants. CCon denotes that an earlier node in the chain will have more recent data, "hver", than a later node in the chain. CPro is an end-to-end invariant at the client. It says that for any client, the hver maintained at the client is nondecreasing. That is, a client never reads an older/smaller hver from the system after reading a recent/larger hver. CPro is actually written as a temporal formula, so when model checking, CPro should be added to the temporal formulas section in the model. It has a particular form: hver' means the next-state value of hver. The formula reads as: it is always the case that for any client, the next-state value of hver is greater than or equal to the hver value at the current state.
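Sketches of these two properties under the naming in this model: CCon compares the ver field of the item along the chain, and CPro refers to the client-local hver, which the PlusCal translator turns into a function indexed by process ID (vars is the translator-generated tuple of all variables).

CCon == \A i, j \in 1..Len(chain) :
          i < j => db[chain[i]].ver >= db[chain[j]].ver

CPro == [][\A c \in Clients : hver'[c] >= hver[c]]_vars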

When I model check with C=1, a single client, I see that Consistent, CCon, and CPro hold for this model. (Of course it took me many iterations to get there. There were subtle bugs that led to referencing nonexistent nodes in the chain, etc.) When I check with C=2, the model checker complains that Consistent is violated. Of course this is expected, so I uncheck Consistent from the invariants, so that I can check that CCon and CPro hold.

Surprise! CCon is violated with two clients. (The same violation also applies for CPro, of course.) The model checker spits out a 35-step counterexample. Here is the gist of the problem.

Initially both client1 and client2 read the version as -1, and they both start an update with version 0. Client1 writes with version 0; client2's write is dropped in the message box at the head because Client1 overwrites it. Client1 then reads version 0, starts another write with version=1, and succeeds. Client2 hasn't acted yet. Finally, Client2 retries its write with version 0 and manages to write with version 0 to the chain. So the chain went from version 1 down to version 0. CCon is violated. (CPro would also be violated, because Client1 would read version=0 after it has read version=1.)

What went wrong? Modeling the message box with length=1 played a role in the counterexample. Only one message can be in the message box at a time, and if another message arrives before this message is read/consumed by the node, that first message gets lost. If the implementation uses TCP, this overwriting problem would be taken care of. However, it is still possible for Client2's write to get lost and for CCon and CPro to be violated: Client2 might write to the head, which then fails; Client1 writes to the new head and succeeds with version=0 and then version=1; finally Client2 retries and overwrites the chain db with version=0.

One way to fix the problem would be to change the blind rewriting at label CLW. We may consider giving the client another action (which could be added to the process with an either clause) to check whether the write succeeded and, if not, to retry the write starting from CLR so that the client reads the new version (if any). But this is still prone to a race condition, as we make assumptions about the relative speeds of Client1 and Client2. After Client2 reads a version, but before it starts the CLW part of the write, Client1 may have completed its write, incrementing the version and causing the same counterexample. It seems like the proper way to handle this problem is at the storage node level. The storage node should reject an update that downgrades the version number of the item, and the client should get a rejection for its update request. I haven't put that update in my model, so the model still shows the counterexample for the two-client case.
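For illustration only, here is one hypothetical way the node's update branch could be changed; this is not in my model, and replying with db[self] as the "rejection" is simply one convenient choice.

\* hypothetical: apply an update only if it does not downgrade the stored version,
\* otherwise reply to the client instead of forwarding the update down the chain
if msg[self].ver > db[self].ver then
  db[self] := msg[self];                                 \* fresh update: apply, then forward as before
else
  \* stale update: reply with the current item so the client can observe the newer version
  msg[msg[self].cli] := db[self] || msg[self] := <<>>;
end if;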

Some other subtle points in the modeling

I chose to model crash detection of nodes using an auxiliary variable "up", and I made it a global variable so that the configurator can readily access this information. I didn't put emphasis on a realistic failure detection part, and instead focused on the correctness modeling of the chain-based key-value store. A more realistic, closer-to-implementation way of doing this would be to not use an auxiliary variable at all. For example, the configurator could rely on periodic heartbeat messages from the nodes to implement failure detection of the nodes.

I also model "db" as a global variable. I tried to avoid this at first, but then I needed a mechanism for copying the db of the tail when I append a new node to be the new tail of the chain. This could have been done by message passing and inventing a special request message, but that would make the model longer. I went for a shorter model and exposed the db of the storage node. The only place I use this is for copying db when appending a new tail, and this would be something to refine further as we move towards implementing our protocol.

One way to refine this could be as follows. The new tail can send a "read" message to the current tail, as if it were a client. This would propagate the db of the current tail to the candidate tail, where it is copied. But then we need a signaling mechanism between the candidate tail and the configurator so that the candidate tail can be declared as the new tail. This task is race-condition prone, so it should be modeled in PlusCal and checked before it makes its way to the implementation. For this proposed refinement, the race condition can occur as follows: the current tail acknowledges a write to the client, and then the new tail is added which did not get this update; the client may query the new tail for a read and get an old version of the item. Single-copy serializability is violated.

So we unearthed a concurrency race condition for two clients, and another race for adding a candidate tail. What other types of concurrency problems could there be with a sloppy protocol? Many different types. It is fun to go through the modeling process with TLA+, as you can discover how things may go wrong in ways you haven't anticipated. The TaxDC paper gives a good study of concurrency bugs in production-use, well-debugged open source systems. Here are the common misconceptions about distributed systems that lead to these bugs:
+ One hop is faster than two hops.
+ No hop is faster than one hop.
+ Interactions between multiple protocols seem to be safe.
+ What you thought was an atomic block of execution wasn't, because another process was scheduled to execute something concurrently and that other process changed the system state in a way you didn't anticipate.
