Efficient Replication Of Large Information Objects
LDR extends the ABD replicated atomic storage algorithm to handle replication of large objects. When the object being replicated is much larger than the metadata (such as tags or pointers), it is efficient to trade off performing cheaper operations on the metadata in order to avoid expensive operations on the data itself.
Client Protocol
When client i does a read, it goes through four phases in order: rdr, rdw, rrr, and rok. The phase names describe what happens during each phase: read-directories-read, read-directories-write, read-replicas-read, and read-ok. During rdr, i reads (utd, tag) from a quorum of directories to discover the most up-to-date replicas. i sets its own tag and utd to the (tag, utd) pair it read with the highest tag, i.e., timestamp. During rdw, i writes (utd, tag) to a write quorum of directories, so that later reads will read i's tag or higher. During rrr, i reads the value of x from a replica in utd. Since each replica may store several values of x, i tells the replica it wants to read the value of x associated with tag. During rok, i returns the x-value it read in rrr.

When i writes a value v, it also goes through four phases in order: wdr, wrw, wdw, and wok. These phase names stand for write-directories-read, write-replicas-write, write-directories-write, and write-ok, respectively. During wdr, i reads (utd, tag) from a quorum of directories, and then sets its tag to be higher than the largest tag it read. During wrw, i writes (v, tag) to a set acc of replicas, where |acc| ≥ f + 1. Note that the set acc is arbitrary; it does not have to be a quorum. During wdw, i writes (acc, tag) to a quorum of directories, to indicate that acc is the set of most up-to-date replicas and that tag is the highest tag for x. Then i sends each replica a message to tell it that the write is finished, so that the replicas can garbage-collect older values of x. Then i finishes in phase wok.
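The phase structure above can be sketched as a toy, single-process simulation. Everything here is an illustrative assumption rather than the paper's interface: the names `Directory`, `Replica`, and `acc`, the `(sequence, writer_id)` tuple tags, and the fact that a "quorum" is just the first majority of a list (a real client would contact nodes over the network and wait for a quorum of acks):

```python
from dataclasses import dataclass, field

@dataclass
class Directory:
    """Stores metadata only: the highest tag and the up-to-date replica ids."""
    tag: tuple = (0, 0)            # (sequence, writer_id), a Lamport-style timestamp
    utd: frozenset = frozenset()   # ids of replicas holding the value for `tag`

@dataclass
class Replica:
    """Stores (tag -> value) pairs; several values may coexist mid-write."""
    store: dict = field(default_factory=dict)

def write(client_id, value, directories, replicas, f):
    # wdr: read (utd, tag) from a quorum of directories, pick a higher tag.
    quorum = directories[: len(directories) // 2 + 1]   # simplification: first majority
    max_tag = max(d.tag for d in quorum)
    tag = (max_tag[0] + 1, client_id)
    # wrw: write (value, tag) to any set acc of >= f+1 replicas (not a quorum).
    acc = frozenset(range(f + 1))
    for rid in acc:
        replicas[rid].store[tag] = value
    # wdw: write (acc, tag) to a quorum of directories.
    for d in quorum:
        if tag > d.tag:
            d.tag, d.utd = tag, acc
    # (The "write finished" message that lets replicas garbage-collect
    #  older values is omitted here.)
    return tag  # wok

def read(directories, replicas):
    # rdr: read (utd, tag) from a quorum; adopt the pair with the highest tag.
    quorum = directories[: len(directories) // 2 + 1]
    tag, utd = max(((d.tag, d.utd) for d in quorum), key=lambda p: p[0])
    # rdw: write (utd, tag) back to a quorum so later reads see a tag >= ours.
    for d in quorum:
        if tag > d.tag:
            d.tag, d.utd = tag, utd
    # rrr: read the value associated with `tag` from one replica in utd.
    value = replicas[next(iter(utd))].store[tag]
    return value  # rok
```

Note how the expensive value transfer happens exactly once per write (wrw, to only f+1 replicas) and once per read (rrr, from a single replica); all the quorum traffic carries only small (tag, utd) metadata.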
If you have difficulty understanding the need for the 2-round directory reads/writes in this protocol, reviewing how the ABD protocol works will help.
Replica and Directory node protocol
The replicas respond to client requests to read and write values of the data object x. Replicas also garbage-collect out-of-date values of x, and gossip among themselves the latest value of x. The latter is an optimization to help spread the latest value of x, so that clients can read from a nearby replica.

The directories' only job is to respond to client requests to read and write utd and tag.
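A minimal sketch of what the replica-side and directory-side state might look like, under assumed interfaces of my own choosing (the method names, the tuple tags, and the one-neighbor gossip scheme are illustrative, not the paper's exact protocol):

```python
class Replica:
    def __init__(self):
        self.store = {}   # tag -> value; several tags may coexist mid-write

    def write(self, tag, value):   # a client's wrw request
        self.store[tag] = value

    def read(self, tag):           # a client's rrr request: value for a specific tag
        return self.store.get(tag)

    def ack_done(self, tag):       # client says its write with `tag` is finished
        # Garbage-collect every value older than the finished tag.
        for t in [t for t in self.store if t < tag]:
            del self.store[t]

    def gossip_to(self, other):    # optimization: spread the latest value
        if self.store:
            latest = max(self.store)
            other.store.setdefault(latest, self.store[latest])

class DirectoryNode:
    """Directories only answer reads/writes of small (tag, utd) metadata."""
    def __init__(self):
        self.tag, self.utd = (0, 0), frozenset()

    def read_meta(self):
        return self.tag, self.utd

    def write_meta(self, tag, utd):
        if tag > self.tag:         # ignore stale metadata writes
            self.tag, self.utd = tag, utd
```

The point of the split is visible in the code: a `DirectoryNode` holds a few bytes of metadata, while a `Replica` holds the (possibly huge) values themselves.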
Questions and discussion
Google File System (SOSP 2003) addressed efficient replication of large data objects for datacenter computing in practice. GFS also provides a metadata service layer and a data object replication layer. For the metadata directory service, GFS uses Chubby, a Paxos service which ZooKeeper cloned as open source. Today, if you wanted to build a consistent large-object replication storage system from scratch, your architecture would most likely use ZooKeeper as the metadata directory coordination service, as GFS prescribed. ZooKeeper already provides atomic consistency, so it eliminates the 2 rounds needed for directory-reads and directory-writes in LDR.

LDR does not use a separate metadata service; instead it can scavenge raw dumb storage nodes for the directory service and achieve the same effect by using ABD replication to make the metadata directory atomic/fault-tolerant. In other words, LDR takes a fully decentralized approach, and can support loosely-connected heterogeneous wimpy devices (maybe even smartphones?). I guess that means more freedom. On the other hand, LDR is bad for performance. It requires 2 rounds of directory-writes for each write operation and 2 rounds of directory-reads for each read operation. This is a major drawback for LDR. Considering that reads typically make up 90% of the workload, supporting 1-round directory-reads would have alleviated the performance problem somewhat. Probably in normal cases (in the absence of failures), the first directory read (the rdr phase) will show that the up-to-date replica metadata is present in a quorum of directory nodes, and the second round of directory access (the rdw phase) can be skipped.
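That skip-rdw fast path could look something like the sketch below. This is my own illustration of the suggestion above, under assumed data structures: `SimpleNamespace` stands in for a directory node holding `(tag, utd)` metadata, tags are `(sequence, writer)` pairs, and replicas are plain `tag -> value` dicts:

```python
from types import SimpleNamespace

def read_fast(directories, replicas):
    # rdr: read (tag, utd) metadata from a majority quorum of directories.
    quorum = directories[: len(directories) // 2 + 1]
    metas = [(d.tag, d.utd) for d in quorum]
    tag, utd = max(metas, key=lambda m: m[0])
    # Fast path: if every quorum member already reports the highest tag,
    # any later read's quorum intersects ours and so must also see `tag`
    # (or higher) -- the rdw write-back round can be skipped.
    if any(m[0] != tag for m in metas):
        # Slow path: some directory lags; write (tag, utd) back (rdw).
        for d in quorum:
            if tag > d.tag:
                d.tag, d.utd = tag, utd
    # rrr: fetch the value for `tag` from one up-to-date replica.
    return replicas[next(iter(utd))][tag]
```

In the failure-free steady state the quorum agrees and a read costs one metadata round plus one replica fetch; only a read that races a concurrent or failed write pays for the second directory round.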
Using ZooKeeper for the metadata directory helps a lot, but a downside can be that ZooKeeper lives in a single centralized location, which means that some clients will always incur a high WAN communication penalty to access ZooKeeper. Using ZooKeeper's observers reduces this cost for read operations. And as I will blog about soon, our work on WAN-Keeper reduces this cost for write operations as well. The LDR paper suggests that LDR is suitable for the WAN, but LDR still incurs WAN latencies while accessing a quorum of directory nodes (twice!) across the WAN.
Another way to efficiently replicate large data objects is of course key-value stores. In key-value stores, you don't have a metadata directory, as "hashing" takes care of that. On the other hand, most key-value stores sacrifice strong consistency in favor of eventual consistency. Is it true that you can't just get away with using hashes, and need some kind of metadata service if you want to achieve consistency? The consistent key-value stores I can think of (and I can't think of too many) use either a Paxos commit on metadata or at least a chain replication approach, such as in HyperDex and Replex. The chain replication approach uses a Paxos box only for the directory node replication configuration information; does that still count as a minimal and 1-level-indirect metadata service?