WormSpace: A Modular Foundation for Simple, Verifiable Distributed Systems
This paper is by Ji-Yong Shin, Jieung Kim, Wolf Honore, Hernán Vanzetto, Srihari Radhakrishnan, Mahesh Balakrishnan, Zhong Shao, and it appeared at SoCC'19.
The paper introduces the Write-Once Register (WOR) abstraction and argues that the WOR should be a first-class system-building abstraction. By providing single-shot consensus via a simple data-centric API, the WOR acts as a building block for providing distributed systems with durability, concurrency control, and failure atomicity.
Each WOR is implemented via a Paxos instance. Leveraging this, WormSpace (Write-Once-Read-Many Address Space) organizes the address space into contiguous write-once segments (WOSes) and provides developers with a shared address space of durable, highly available, and strongly consistent WORs to build on. For example, a sequence of WORs can be used to impose a total order, and a set of WORs can persist decisions taken by participants in distributed transaction protocols such as 2PC.
To demonstrate its utility, the paper implements three applications over WormSpace. These applications are built entirely over the WOR API, yet provide efficiency comparable to or better than handcrafted implementations:
- WormPaxos, a Multi-Paxos implementation
- WormLog, a distributed shared log (omitted in my summary for brevity)
- WormTX, a distributed, fault-tolerant transaction coordinator
If you would like to see how extensions of some of these ideas can be put into action, watch this video in which Mahesh Balakrishnan (one of the authors of WormSpace) and Jason Flinn (U Michigan) describe the Delos storage system for Facebook's control plane.
The paper is well written, and I really enjoyed reading it. I think it should receive more attention. The paper has another track, which I will not discuss here: formal verification of WormSpace and reuse of this proof for verifying systems built on top, via the Certified Concurrent Abstraction Layer (CCAL) approach. Sometimes, when a paper includes many ideas, it may dilute the focus and hurt its reception/recognition.
In my summary below, I use many sentences from the paper verbatim. I am saving my MAD questions and in-depth review for the Zoom paper discussion on Wednesday 15:30 EST. Our Zoom Distributed Systems Reading Group meeting is open to anyone who is passionate about distributed systems. Here is the link to our WormSpace paper discussion.
The WormSpace system
The WOR abstraction hides the logic for distributed coordination under a data-centric API: a client can capture a WOR, write to a captured WOR, and read the WOR.
The address space is divided into write-once segments (WOSes) of fixed size. Segments are explicitly allocated via an alloc call that takes in a segment ID and succeeds if the segment is as yet unallocated. Once a client has allocated a WOS, any client in the system can operate on WORs within the segment. Specifically, it can capture a WOR, write to it, and read from it.
Clients must capture an address before writing to it, so that the replicated servers are coordinated to make the write atomic and immutable. The capture call is similar to a preemptible lock (e.g., phase 1, the prepare of Paxos): the lock must be acquired to write, but it can be stolen by others. A successful capture call returns a unique, non-zero captureID; a subsequent write by the same thread is automatically parameterized with this ID, and succeeds if the WOR has not been captured by another client in the meantime.
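To make the capture/write/read semantics concrete, here is a minimal single-process sketch. It is a toy model only: there is no replication, the class and method names are my own, and the captureID is passed to write explicitly, whereas the paper's client library attaches it automatically.

```python
class ToyWOR:
    """Toy, single-process model of one write-once register (no replication)."""

    def __init__(self):
        self._next_id = 1    # capture IDs are unique and non-zero
        self._holder = 0     # latest captureID; 0 means never captured
        self._value = None   # None means unwritten

    def capture(self):
        """Take the preemptible lock, stealing it from any earlier holder."""
        self._holder = self._next_id
        self._next_id += 1
        return self._holder

    def write(self, capture_id, value):
        """Succeed only if the WOR is unwritten and capture_id is the latest capture."""
        if self._value is not None or capture_id != self._holder:
            return False
        self._value = value
        return True

    def read(self):
        """Return the written value, or None if the WOR is unwritten."""
        return self._value
```

If client A captures and client B then captures the same WOR, A's write fails and B's succeeds; after that, every later write fails because the register is immutable.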
The WormSpace design is similar to a write-once distributed key-value store: WORs are associated with 64-bit IDs (consisting of segment IDs concatenated with offsets within the segment) and mapped to partitions, which in turn consist of replica sets of wormservers. Partitioning occurs at WOS granularity; to perform an operation on a WOR within a WOS, the client determines the partition storing the segment (via a modulo function) and issues the operation to the replica set.
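The addressing scheme can be sketched as follows. The 32/32 split between segment ID and offset is an assumption of mine (the summary only says the ID is 64 bits), and the function names are hypothetical.

```python
OFFSET_BITS = 32  # assumed split; the source only states that IDs are 64 bits
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_wor_id(segment_id, offset):
    """Concatenate a segment ID and an in-segment offset into one 64-bit ID."""
    assert 0 <= offset <= OFFSET_MASK
    assert 0 <= segment_id < (1 << (64 - OFFSET_BITS))
    return (segment_id << OFFSET_BITS) | offset


def split_wor_id(wor_id):
    """Recover (segment_id, offset) from a WOR ID."""
    return wor_id >> OFFSET_BITS, wor_id & OFFSET_MASK


def partition_for(wor_id, num_partitions):
    """Partitioning is per segment, so only the segment ID enters the modulo."""
    segment_id, _ = split_wor_id(wor_id)
    return segment_id % num_partitions
```

Because the offset never enters the modulo, every WOR of a segment lands on the same replica set.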
Each WOR is implemented via a single-shot Paxos consensus protocol, with the wormservers within a partition acting as a set of acceptors. In the context of a single WOR, the wormservers act identically to Paxos acceptors; a capture call translates to a phase 1a prepare message, whereas a write call is a phase 2a accept message. The read protocol mirrors a phase 1a message, but if it encounters a half-written quorum, it completes the write. Each wormserver maintains a map from WOR IDs to the acceptor state for that single-shot Paxos instance. If a map entry is not found, the WOR is treated as unwritten.
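A rough sketch of the per-WOR acceptor map follows. It uses textbook single-decree Paxos acceptor rules; the class and field names are hypothetical, and quorum collection, networking, and durability are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AcceptorState:
    promised: int = 0                     # highest ballot promised so far
    accepted_ballot: int = 0              # ballot of the accepted value, 0 if none
    accepted_value: Optional[Any] = None  # None means nothing accepted yet


class ToyWormServer:
    """One acceptor holding a map from WOR ID to single-shot Paxos state."""

    def __init__(self):
        self.states: Dict[int, AcceptorState] = {}

    def prepare(self, wor_id, ballot):
        """Phase 1a (capture): promise if the ballot is higher than any seen."""
        st = self.states.setdefault(wor_id, AcceptorState())
        if ballot <= st.promised:
            return False, st.accepted_ballot, st.accepted_value
        st.promised = ballot
        return True, st.accepted_ballot, st.accepted_value

    def accept(self, wor_id, ballot, value):
        """Phase 2a (write): accept unless a higher ballot has been promised."""
        st = self.states.setdefault(wor_id, AcceptorState())
        if ballot < st.promised:
            return False
        st.promised = ballot
        st.accepted_ballot = ballot
        st.accepted_value = value
        return True

    def local_value(self, wor_id):
        """A missing map entry is treated the same as an unwritten WOR."""
        st = self.states.get(wor_id)
        return None if st is None else st.accepted_value
```

Note that a value accepted by one acceptor is not yet chosen; a client still needs matching replies from a quorum of wormservers in the partition.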
The client-side library layers the logic for enforcing write-once segments. Each WOS is implemented via a set of data WORs, a single metadata WOR, and a single trim WOR. Allocating the WOS requires writing to the metadata WOR. If two clients race to allocate a WOS, the first one to capture and write the metadata WOR wins.
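The segment layout and the allocation race can be pictured with a small sketch. Everything here is illustrative: each WOR is reduced to a plain field, capture is abstracted away, and the metadata contents (an "owner" entry) are my own placeholder.

```python
class ToySegment:
    """Toy layout of one WOS: a metadata WOR, a trim WOR, and data WORs."""

    def __init__(self, size):
        self.metadata = None        # metadata WOR; None means unallocated
        self.trim = None            # trim WOR
        self.data = [None] * size   # fixed number of data WORs


def alloc(segment, client_id):
    """Allocate by writing the metadata WOR; the first writer wins."""
    if segment.metadata is not None:
        return False
    segment.metadata = {"owner": client_id}  # hypothetical metadata contents
    return True
```

When two clients race, the second alloc call returns False and can read the metadata to see who won.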
WormPaxos
WormPaxos is an implementation of Multi-Paxos over WormSpace, exposing a conventional state machine replication (SMR) API to applications. Implementing Multi-Paxos over WormSpace is simple: the sequence of commands is stored on the WormSpace address space. In WormPaxos, servers that wish to replicate state act as WormSpace clients, and are called WP-servers. They can propose new commands by preparing and writing to the next free address, and learn commands by reading the address space in sequential order.
In WormPaxos, a WP-server becomes a sticky leader simply by using a batch capture on a WOS; accordingly, leadership strategies such as sticky leader, rotating leader, etc. can be implemented simply as policies on who should call the batch capture and when. The leader's identity can be stored within the metadata for each segment, obviating the need for WormSpace to know about the notion of a leader or the leadership strategies involved. If the leader crashes, a new leader that allocates the next WOS can batch capture the WOS of the previous leader, complete partially finished operations, and fill in junk values to unwritten WORs to prevent holes in the SMR/Multi-Paxos log.
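The hole-filling step after a leader crash can be sketched like this. The log is a toy write-once dictionary with capture and replication abstracted away, and JUNK, ToyLog, fill_holes, and learn are names I made up for illustration.

```python
JUNK = object()  # hypothetical sentinel standing in for a junk (no-op) entry


class ToyLog:
    """Toy write-once address space; capture and replication are abstracted away."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, value):
        """Write-once: fail if the address already holds a value."""
        if addr in self.cells:
            return False
        self.cells[addr] = value
        return True

    def read(self, addr):
        return self.cells.get(addr)


def fill_holes(log, start, end):
    """New leader: write JUNK into every unwritten address in [start, end)."""
    return [addr for addr in range(start, end) if log.write(addr, JUNK)]


def learn(log, start):
    """Read sequentially, skipping junk, and stop at the first unwritten address."""
    commands = []
    addr = start
    while log.read(addr) is not None:
        if log.read(addr) is not JUNK:
            commands.append(log.read(addr))
        addr += 1
    return commands
```

Before the fill, a learner stalls at the first hole; afterwards it can read past it, and a late write from the old leader to a filled address fails.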
WormTX
2PC is known to be a blocking protocol in the presence of crash failures. WormTX shows how it can be made non-blocking by leveraging WormSpace. A number of variant protocols are presented to show how the efficiency can be gradually improved.
- [Variant A8: 8 message delays] An obvious solution is to simply store the votes in a set of per-RM WORs. In the WOR-based 2PC protocol, an RM initiates the protocol by contacting the TM (message delay #1); the TM contacts the RMs (#2); they capture the WOR (#3 and #4), and then write to it (#5 and #6); they send back their decision to the TM (#7), which sends back a commit message to all the RMs (#8).
- [Variant B6: 6 message delays] Each RM can allocate a dedicated WOS for its decisions and batch capture the WOS in advance.
- [Variant C5: 5 message delays] The TM can directly observe the decision by listening for write notifications on the WOS.
- [Variant D4: 4 message delays] Individual RMs can directly listen to each other's WOSes; this brings us down to four message delays.
- [Variant E3: 3 message delays] We do not need a TM, since the final decision is a deterministic function of the WORs, and any RM can time out on the commit protocol and write a no vote to a blocking RM's WOR to abort the transaction. The initiating RM can simply contact the other RMs on its own to begin the protocol (combining #1 and #2 of variant A8), bringing down the number of delays to 3. This variant is not described in Gray and Lamport's Consensus on Transaction Commit paper.
- [Variant F2: 2 message delays] This works only if RMs can spontaneously begin and vote.
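The observation behind variant E3, that the outcome is a deterministic function of the per-RM WORs, can be sketched as follows. This is a toy model: votes live in a plain dictionary with one write-once slot per RM, and the names decide and cast are hypothetical.

```python
def decide(votes):
    """Outcome as a pure function of the vote WORs.

    votes maps an RM name to "yes", "no", or None (WOR still unwritten).
    """
    values = list(votes.values())
    if any(v == "no" for v in values):
        return "abort"
    if all(v == "yes" for v in values):
        return "commit"
    return None  # still undecided


def cast(votes, rm, vote):
    """Write-once vote: only the first write to an RM's WOR sticks."""
    if votes[rm] is None:
        votes[rm] = vote
        return True
    return False
```

An RM that times out waiting on a peer calls cast(votes, peer, "no"). If the peer's own yes landed first, the write fails and the timed-out RM simply reads the peer's vote; otherwise the transaction aborts, and the peer's late yes is rejected. Either way every RM computes the same outcome, which is why no TM is needed.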