Tla+ Specification Of The Consistent Prefix Guarantee
We had covered consistency properties inward the recent posts. Now to give-up the ghost along it to a greater extent than concrete as well as to laissez passer on yous a gustation of TLA+, I acquaint a model of a distributed information shop that provides the consistent prefix property.
In my side past times side post, I volition extend this model slightly to acquaint bounded as well as rigid consistency properties. And, inward the postal service after that, I volition add together a customer model to exhibit how this information shop tin last extended to render session consistency. The TLA+ (well, PlusCal to last to a greater extent than accurate) code for this model is available here.
WriteRegion == 1
ReadRegions == 2..NumRegions
chan = [n \in 1..NumRegions |-> <<>>];
The write part has 3 actions to perform:
These 3 actions tin last performed inward whatsoever arbitrary lodge within the procedure loop, as well as the model checker volition ruthlessly seek all possible combinations of these actions interleaved with the read part actions to banking company check if at that spot is a violation of the invariant properties entered.
The outset activity choice is to commit a write inward the write region. (We don't acquire into how that is implemented inward the region; maybe commit is done past times replicating the update to a write quorum inward the region.) As a effect the CommittedLSN is incremented. CommittedLSN keeps runway of the write operation. We are non modeling the content of the write, nosotros volition model the write only with its CommittedLSN as well as seek to acquire the writes replicated inward the read regions inward the same order.
The 2d activity choice is to multicast a committed write to the read regions. To ship the writes inward lodge through the FIFO channels, the write part maintains a SentLSN variable to announce the in conclusion CommittedLSN write that is forwarded to the read regions. If at that spot are around CommittedLSN writes that are yet to last forwarded, these are multicasted inward lodge of the LSN numbers without whatsoever gap. However, authorities annotation that, this is an asynchronous replication. The messages may last sent whenever this activity is chosen, as well as this ship activity is non blocked on waiting whatsoever acknowledgements from the read regions.
The concluding activity choice is to have an Ack message from a read region. This model does non occupation the Ack messages for updating anything, but they tin last used for tracking the progress of the read regions. This volition give-up the ghost of import for the bounded consistency as well as rigid consistency properties.
The loop is bounded past times a constant MAXN thence that the model checking infinite is limited. MAXN=5 is an acceptable number. If yous have got worked with a model checker, yous know that checking five repeat of the functioning without the invariant broken is a really skilful indication it works. Yes, this is non induction proof, but since the model checker tries everything that could peradventure give-up the ghost wrong, it would have got found a counterexample (e.g., race-condition that violates an invariant) fifty-fifty with 3 repeats or so.
The read regions only react to the replicated messages from the write region. So they have got only ii actions.
The outset activity choice is to have a message pending inward the channel from the write region. CommittedLSN variable is thence updated to jibe the LSN inward the message. Here nosotros don't banking company check for the gaps, but since the channels are FIFO as well as the messages are sent past times the write part inward the increasing LSN lodge without gaps, the consistent prefix belongings holds ---as nosotros afterwards confirm when nosotros model-check with the invariant.
The 2d activity choice is to ship dorsum an Ack message for whatsoever replication message that is non acknowledged yet. Since this activity is also asynchronously chosen, the acknowledgement tin last cumulative, it tin skip over LSNs as well as admit the highest seen, as well as that's OK.
CP == [][\A i \in ReadRegions:
CommittedLSN'[i] = CommittedLSN[i]
\/ CommittedLSN'[i] = CommittedLSN[i] + 1]_vars
An option would last to insert a PrefixConsistency boolean variable inward the specification, which gets laid to FALSE at the read part if the replica receives an out-of-order commit request. But that makes the model ugly; it is non squeamish to entangle the model as well as belongings to last checked.
Another option would last to write a client, as well as to banking company check for operation consistency with the customer reads. But that is cumbersome, because yous involve to innovate a customer as well as a read functioning at the read regions. Furthermore, that volition also drive the model checker to bring longer fourth dimension to complete.
Let's write around conditions, "fake invariants", to essay that this is the case.
SyncStep == \A i \in ReadRegions :
CommittedLSN[i]> CommittedLSN_[1] -3
SyncStep2 == \A i,j \in ReadRegions:
CommittedLSN[i]# CommittedLSN[j] -3
The model checker is quick to honor counterexamples for these conditions. For the SyncStep the error line provides a brusque counterexample where the write part raced ahead to commit 3 updates without broadcasting whatsoever updates to the read regions. (Since I used CommittedLSN variable for both read as well as write regions, the TLA translator assigned CommittedLSN_ "with underscore" to the write region's version to distinguish it from that of the read regions.)
For the SyncStep2, the error line is longer because it needs around setup. Here the write part performed 3 writes as well as broadcasted these to the read regions, but only the read part 2 performed the updates as well as acquire to CommittedLSN=3, spell the read part 3 has non performed whatsoever have activity from its channel.
My outset model was large. My improvements involved trimming as well as simplifying that model.
What is CompletedLSN? In my before version of the model, the write part used CompletedLSN to give-up the ghost along runway of the progress of the read regions, but I realized this variable was unnecessary for checking the consistent prefix, thence I took that out entirely.
I also performed other simplifications to cut down the model checking soil space. Instead of sending messages to the regions 1 past times one, I modified the write part to queue the messages at in 1 lawsuit via the multicast operation.
For eventual consistency, nosotros don't involve to maintain CommittedLSN as well as a sequencing of writes. The write part tin maintain a laid for writes/updates, as well as multicast whatsoever write/update from that laid at random. The read part but receives an update as well as performs it. There is also no involve for acknowledgement from the read region. The repeated multicasts volition eventually constitute that every write/update is replicated.
2. How would yous extend this to the stronger consistency guarantees?
In my side past times side post, I volition extend this model to specify bounded staleness as well as rigid consistency properties. As purpose of this extension, the write part would involve to give-up the ghost along runway of the updates performed at the read regions as well as slow downwardly to brand certain it is non going likewise far ahead of the read regions.
Since session consistency is specific to a client, nosotros volition write a customer model as well as allow the customer share/copy LSN numbers as well as acquire served using that LSN number. The "LSN" volition last our session state.
3. Was this likewise uncomplicated an illustration to last worth modeling?
As I wrote above, I had several blunders before I could simplify this model to what I presented. And I learned a lot from developing this uncomplicated model. So, this was definitely worth modeling as well as working on.
I had written virtually the benefits of modeling before here.
In my side past times side post, I volition extend this model slightly to acquaint bounded as well as rigid consistency properties. And, inward the postal service after that, I volition add together a customer model to exhibit how this information shop tin last extended to render session consistency. The TLA+ (well, PlusCal to last to a greater extent than accurate) code for this model is available here.
The organization model
We assume at that spot is a write part (with ID=1) as well as read regions (with IDs 2 through NumRegions). The write part performs the write as well as copies it to the read regions. There are FIFO communication channels betwixt the write as well as read regions.WriteRegion == 1
ReadRegions == 2..NumRegions
chan = [n \in 1..NumRegions |-> <<>>];
The write part actions
The write part has 3 actions to perform:
- Commit the write inward the write region
- Multicast the write to the read regions
- Receive/process an ack message from a read region
These 3 actions tin last performed inward whatsoever arbitrary lodge within the procedure loop, as well as the model checker volition ruthlessly seek all possible combinations of these actions interleaved with the read part actions to banking company check if at that spot is a violation of the invariant properties entered.
The outset activity choice is to commit a write inward the write region. (We don't acquire into how that is implemented inward the region; maybe commit is done past times replicating the update to a write quorum inward the region.) As a effect the CommittedLSN is incremented. CommittedLSN keeps runway of the write operation. We are non modeling the content of the write, nosotros volition model the write only with its CommittedLSN as well as seek to acquire the writes replicated inward the read regions inward the same order.
The 2d activity choice is to multicast a committed write to the read regions. To ship the writes inward lodge through the FIFO channels, the write part maintains a SentLSN variable to announce the in conclusion CommittedLSN write that is forwarded to the read regions. If at that spot are around CommittedLSN writes that are yet to last forwarded, these are multicasted inward lodge of the LSN numbers without whatsoever gap. However, authorities annotation that, this is an asynchronous replication. The messages may last sent whenever this activity is chosen, as well as this ship activity is non blocked on waiting whatsoever acknowledgements from the read regions.
The concluding activity choice is to have an Ack message from a read region. This model does non occupation the Ack messages for updating anything, but they tin last used for tracking the progress of the read regions. This volition give-up the ghost of import for the bounded consistency as well as rigid consistency properties.
The loop is bounded past times a constant MAXN thence that the model checking infinite is limited. MAXN=5 is an acceptable number. If yous have got worked with a model checker, yous know that checking five repeat of the functioning without the invariant broken is a really skilful indication it works. Yes, this is non induction proof, but since the model checker tries everything that could peradventure give-up the ghost wrong, it would have got found a counterexample (e.g., race-condition that violates an invariant) fifty-fifty with 3 repeats or so.
The read part actions
The read regions only react to the replicated messages from the write region. So they have got only ii actions.
The outset activity choice is to have a message pending inward the channel from the write region. CommittedLSN variable is thence updated to jibe the LSN inward the message. Here nosotros don't banking company check for the gaps, but since the channels are FIFO as well as the messages are sent past times the write part inward the increasing LSN lodge without gaps, the consistent prefix belongings holds ---as nosotros afterwards confirm when nosotros model-check with the invariant.
The 2d activity choice is to ship dorsum an Ack message for whatsoever replication message that is non acknowledged yet. Since this activity is also asynchronously chosen, the acknowledgement tin last cumulative, it tin skip over LSNs as well as admit the highest seen, as well as that's OK.
Invariant checking as well as testing
Our invariant is brusque as well as sweet. It checks that at whatsoever read region, the CommittedLSN --if updated-- is ever incremented past times 1 over its previous value. That is, at that spot are no gaps inward the commit sequence.CP == [][\A i \in ReadRegions:
CommittedLSN'[i] = CommittedLSN[i]
\/ CommittedLSN'[i] = CommittedLSN[i] + 1]_vars
An option would last to insert a PrefixConsistency boolean variable inward the specification, which gets laid to FALSE at the read part if the replica receives an out-of-order commit request. But that makes the model ugly; it is non squeamish to entangle the model as well as belongings to last checked.
Another option would last to write a client, as well as to banking company check for operation consistency with the customer reads. But that is cumbersome, because yous involve to innovate a customer as well as a read functioning at the read regions. Furthermore, that volition also drive the model checker to bring longer fourth dimension to complete.
What are another properties nosotros tin essay here?
In the consistent prefix guarantee the write part tin commit asynchronously as well as freely without waiting for the read regions to catchup. The read regions tin catchup on their ain pace.Let's write around conditions, "fake invariants", to essay that this is the case.
SyncStep == \A i \in ReadRegions :
CommittedLSN[i]> CommittedLSN_[1] -3
SyncStep2 == \A i,j \in ReadRegions:
CommittedLSN[i]# CommittedLSN[j] -3
The model checker is quick to honor counterexamples for these conditions. For the SyncStep the error line provides a brusque counterexample where the write part raced ahead to commit 3 updates without broadcasting whatsoever updates to the read regions. (Since I used CommittedLSN variable for both read as well as write regions, the TLA translator assigned CommittedLSN_ "with underscore" to the write region's version to distinguish it from that of the read regions.)
For the SyncStep2, the error line is longer because it needs around setup. Here the write part performed 3 writes as well as broadcasted these to the read regions, but only the read part 2 performed the updates as well as acquire to CommittedLSN=3, spell the read part 3 has non performed whatsoever have activity from its channel.
Things I tried to speed upwards the TLA model checking
I believe it is of import to part the mistakes committed equally good equally the terminate result, thence others tin larn better.My outset model was large. My improvements involved trimming as well as simplifying that model.
Perfection is achieved, non when at that spot is null to a greater extent than to add, but when at that spot is null left to bring away. --Antoine de Saint-Exupery.To give-up the ghost along model checking viable I used MAXN. But I found that the model checker would withal last crunching scenarios after 10 minutes. I knew that this is a uncomplicated protocol as well as it shouldn't bring thence long to model check. Then I noticed that the job is with bounding the run on the incorrect thing: I was bounding the CompletedLSN variable with MAXN, but the CommittedLSN was withal able to race unbounded higher upwards the MAXN. After I noticed my error the model checker took only a yoke seconds for MAXN=5.
What is CompletedLSN? In my before version of the model, the write part used CompletedLSN to give-up the ghost along runway of the progress of the read regions, but I realized this variable was unnecessary for checking the consistent prefix, thence I took that out entirely.
I also performed other simplifications to cut down the model checking soil space. Instead of sending messages to the regions 1 past times one, I modified the write part to queue the messages at in 1 lawsuit via the multicast operation.
MAD questions
1. How practise yous relax this for eventual consistency?For eventual consistency, nosotros don't involve to maintain CommittedLSN as well as a sequencing of writes. The write part tin maintain a laid for writes/updates, as well as multicast whatsoever write/update from that laid at random. The read part but receives an update as well as performs it. There is also no involve for acknowledgement from the read region. The repeated multicasts volition eventually constitute that every write/update is replicated.
2. How would yous extend this to the stronger consistency guarantees?
In my side past times side post, I volition extend this model to specify bounded staleness as well as rigid consistency properties. As purpose of this extension, the write part would involve to give-up the ghost along runway of the updates performed at the read regions as well as slow downwardly to brand certain it is non going likewise far ahead of the read regions.
Since session consistency is specific to a client, nosotros volition write a customer model as well as allow the customer share/copy LSN numbers as well as acquire served using that LSN number. The "LSN" volition last our session state.
3. Was this likewise uncomplicated an illustration to last worth modeling?
As I wrote above, I had several blunders before I could simplify this model to what I presented. And I learned a lot from developing this uncomplicated model. So, this was definitely worth modeling as well as working on.
I had written virtually the benefits of modeling before here.
0 Response to "Tla+ Specification Of The Consistent Prefix Guarantee"
Post a Comment