Paper Review: Edifice Consistent Transactions Amongst Inconsistent Replication (Sosp'15)

This paper investigates an interesting too promising inquiry problem: how tin nosotros implement distributed transactions to a greater extent than efficiently yesteryear performing a cross-layer pattern of the transaction protocal/layer amongst the underlying distributed storage protocol/layer? This thought is motivated yesteryear the observation that at that spot is wasted/duplicated piece of work done both at the transaction layer too at the storage layer. The transaction layer already provides ordering (i.e., linearizability), and, if you lot human face at the distributed storage layer, it likewise provides strongly consistent replication which orders/linearizes updates ofttimes amongst Paxos. The newspaper argues that this is overlapping functionality, too leads to higher latency too lower throughput.

Ok, at this point, nosotros receive got to receive got a pace dorsum too realize that nosotros cannot avoid all coordination betwixt the ii layers. The distributed storage layer needs to live on consistent amongst the gild of updates at the transaction layer, because it serves reads too you lot similar it to serve the most recent version, non an one-time version. OK, to cook this, what if nosotros used a multiversion shop too phase information for replication without coordination too consistency, too on transcend of this nosotros commit transactions amongst 2PC? But who decides the version numbers? If it is the transaction layer, where does it acquire the one-time version number: from the storage layer. And, this cyclic dependency volition practise a work when the storage/replication layer is inconsistent too equally a outcome inconsistent version divulge perhaps learned too proposed back. A means to cook this is to utilization timestamps for version numbers. But that would require precise clock synchronization thus that the timestamps likewise check the transaction ordering. I approximate this would live on possible to practise amongst atomic clocks, or perhaps amongst PTP.

Actually, it turned out the newspaper likewise resorted to a multiversion store, but did non demand really precise clock synchronization, too went amongst NTP synchronization. In fact, they could receive got gone without whatever clock synchronization. Clock synchronization is but used for improving the performance, non for satisfying safety. The newspaper  lays out a principled cross-layer codesign of a linearizable transaction protocol on transcend of the  unordered/inconsistent replication. And the pattern is complicated equally nosotros volition meet below. This shows that if nosotros had really precise too dependable clock synchronization to rely on, nosotros tin simplify many problems inwards distributed systems, including this one.

Of course of report a co-design, or a cross-layer design, has an associated drawback: immediately these ii layers are tangled, too you lot cannot swap storage layer protocols too transaction layer protocols too await whatever combination to piece of work without problems. When using a consistent replication layer (such equally Paxos replication), you lot don't receive got to worry virtually swapping transaction layer protocols; utilization OCC, or 2PC, etc.  But, when using an inconsistent replication (IR) layer, a especial purpose transaction protocol that industrial plant on transcend of IR is needed. Any OCC or 2PC protocol volition non piece of work because edifice over IR imposes restrictions on the transaction protocol. The newspaper proposes TAPIR (Transactional Application Protocol for Inconsistent Replication) equally  the transaction layer protocol to accompany IR.

The results are dainty equally they demo pregnant improvement. They demonstrate TAPIR inwards a transactional key-value store, TAPIR-KV, which supports linearizable transactions over a partitioned laid of keys. The experiments flora that TAPIR-KV had: (1) 50% lower commit latency too (2) to a greater extent than than 3x amend throughput compared to systems using conventional transaction protocols, including an implementation of Spanner’s transaction protocol, too (3) comparable performance to MongoDB too Redis, widely-used eventual consistency systems.

What is non dainty is the complexity this consistency relaxation at the replication layer imposes on the overall organisation design. IR is complex, too TAPIR involves fifty-fifty to a greater extent than complex ideas/techniques. It looks similar this is inevitable complexity to compensate for the inconsistent/unordered replication at the storage layer.

 Inconsistent Replication (IR) protocol for relaxing the consistency of replication

In the IR protocol, instead of ordering of operations, replicas grip on the functioning results. What does this mean? The unordered functioning laid provides the next iii guarantees.
P1. [Fault tolerance] At whatever time, every functioning inwards the functioning laid is inwards the tape of at to the lowest degree i replica inwards whatever quorum of f+1 non-failed replicas.
P2. [Visibility] For whatever ii operations inwards the functioning set, at to the lowest degree i is visible to the other.
P3. [Consensus results] At whatever time, the outcome returned yesteryear a successful consensus functioning is inwards the tape of at to the lowest degree i replica inwards whatever quorum. The but exception is if the consensus outcome has been explicitly modified yesteryear the application (i.e., transaction) layer protocol through Merge, afterward which the effect of Merge volition live on recorded instead.

What is this Merge business, you lot ask.
"Some replicas may missy operations or demand to reconcile their solid set down if the consensus outcome chosen yesteryear the application (i.e., transaction) protocol does non check their result. To ensure that IR replicas eventually converge, they periodically synchronize. Similar to eventual consistency, IR relies on the application (i.e., transaction) protocol to reconcile inconsistent replicas. On synchronization, a unmarried IR node get-go upcalls into the application protocol amongst Merge, which takes records from inconsistent replicas too merges them into a primary tape of successful operations too consensus results. Then, IR upcalls into the application (i.e., transaction) protocol amongst Sync at each replica. Sync takes the primary tape too reconciles application (i.e., transaction) protocol solid set down to brand the replica consistent amongst the chosen consensus results."

Requirements from IR clients

IR imposes about difficult requirements on the transaction layer. These bound the transaction layer too complicate its design.

1. Invariant checks must live on performed pairwise.
This is the most restrictive requirement. "IR's limitation is that it tin but back upward organisation guarantees that depend on conflicts betwixt pairs of operations. For example, IR tin live on used to replicate a lock server but non a sales app that but sells matter inwards stock. The lock server's mutual exclusion guarantee is a pair-wise invariant, but calculating the store's stock requires a total history of operations that but rigid consistency tin achieve."

2. Application  (i.e., transaction) protocols must live on able to alter consensus functioning results.

3. Application  (i.e., transaction) protocols should non await operations to execute inwards the same order.

I don't render explanation for the terminal two, because they are complicated. I am non certain I sympathise the explanation provided inwards the paper. This boils downwards to the following: the transaction protocol needs to kind out the mess of inconsistent ordering at the replication/storage layer, yesteryear essentially resorting to retry (with reordering) or abort (due to conflict) transactions too getting it right inwards the side yesteryear side round. This retry amongst reordering resembles the Fast Paxos approach. Due to inconsistent ordering at the replicas, this is the cost to pay.

TAPIR

TAPIR is designed to piece of work on transcend of IR's weak guarantees to render linearizable (strict-serializability) distributed transactions: (1) TAPIR does non receive got whatever leaders or centralized coordination, (2) TAPIR Reads ever acquire to the closest replica, too (3) TAPIR Commit takes a unmarried round-trip to the participants inwards the mutual case.

TAPIR uses clients equally 2PC coordinators. A customer Begins a transaction, too then executes Reads too Writes during the transaction's execution period. During this period, the customer tin Abort the transaction. Once it finishes execution, the customer Commits the transaction, afterward which it tin no longer abort the transaction. The 2PC protocol volition run to completion, committing or aborting the transaction based alone on the conclusion of the participants.

TAPIR uses optimistic transaction ordering via NTP timestamps too OCC inwards gild to concentrate all ordering decisions into a unmarried laid of validation checks at the proposed transaction timestamp. Under load, its abort charge per unit of measurement tin live on equally bad equally OCC. The evaluation department shows a graph comparing TAPIR too OCC abort rates.

TAPIR protocol is complicated. To the credit of the authors, they render a TLA+ model too model checking of TAPIR to back upward its correctness.


Finally, let's view this question: what happens when the TAPIR customer fails? Since customer is the coordinator of the 2PC transaction, this is similar to coordinator failure inwards 2PC, too leads to blocking too a complicated recovery physical care for which necessitates going to 3PC. In other words, TAPIR doesn't solve the 2PC blocking work of transactions. A recovery physical care for at the customer grade is needed if the client/coordinator blocks or dies. (RIFL, equally nosotros verbalise over below, considers almost the dual work of IR/TAPIR too results inwards a cleaner recovery for the customer level.)

RIFL comparison

RIFL likewise appeared on SOSP xv equally the TAPIR paper. RIFL takes a unlike approach to the problem. RIFL advocates edifice transactions over a linearizability layer at the replication, the wages of which is cleaner recovery at the transaction layer. RIFL eliminated the complex recovery procedures of Sinfonia transactions, when the initiator (transaction manager) dies.

Evaluation

The newspaper uses Retwis Twitter clone too YCSB benchmark for evaluation. Evaluation is done on Google App Engine. Note that peak throughput is lower than both Cassandra too Redis, but those are eventual consistency systems, spell TAPIR-KV uses a distributed transaction protocol that ensures strict serializability. Unfortunately the consistency violations inwards the eventual consistency KVs are non provided. That would live on interesting to meet too compare. They may plough out to live on non thus bad, equally PBS newspaper and Facebook consistency papers point out.


Related links

Link to the paper

SOSP conference presentation where Irene does a bully labor of presenting overview of the ideas/techniques involved

The code for TAPIR is available here 

Irene's weblog post service on lessons learned from TAPIR

0 Response to "Paper Review: Edifice Consistent Transactions Amongst Inconsistent Replication (Sosp'15)"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel