Paper Summary: High-Availability Distributed Logging with BookKeeper

This paper is brought to you by the Yahoo Research group that developed ZooKeeper, and it appeared in LADIS'12.

BookKeeper targets the logging problem. More specifically, it targets the distributed logging problem where high availability is important and where many distributed clients are interested in reading the logs.

Most current applications log to the local disk, but this constitutes a single point of failure (SPOF) and betrays high availability. A hasty remedy is to write to an NFS partition to store log files remotely. But now the NFS server becomes the SPOF. (We can of course replicate the NFS server, but the performance would suffer.) Another solution is to use NetApp filers that implement RAID. This costs money, and still does not completely solve the SPOF problem.

BookKeeper provides an efficient data store with no SPOF for serving a large number of concurrent single-writer, multiple-reader logs. It stripes log entries across servers, leading to higher throughput. BookKeeper is open source and is used in production systems.

The paper presents two case studies: Hedwig and the HDFS Namenode. Hedwig is a scalable topic-based publish-subscribe system. To guarantee the delivery of messages despite partitions and server failures, Hedwig uses logging to persist published messages, and this logging is implemented with BookKeeper. Hedwig is in production use and serves push notifications for Yahoo! properties (e.g., notifications for mobile devices).

The other use case concerns replicating the HDFS Namenode, the component of HDFS (Hadoop Distributed File System) that manages the file system metadata. On each update, the Namenode writes synchronously to a journal to guarantee that the update is durable. But unfortunately the Namenode is a SPOF. To enable efficient journaling and strong durability through replication, BookKeeper is used for implementing a journal manager for HDFS. The implementation is currently part of the HDFS codebase.

BookKeeper design and architecture

BookKeeper has three main components:

  • A bookie is a BookKeeper storage server, and each bookie stores ledger fragments. A ledger is written across f+1 bookies for fault-tolerance and striping.
  • The BookKeeper client is used for interacting with bookies.
  • A ledger abstracts a log file. It is a sequence of entries identified by a sequence number (id).

BookKeeper assumes that there is only a single client writing to a ledger (clients can employ ZooKeeper coordination for this), and in return it guarantees that, once a ledger is closed, all other clients that read from it read the same sequence of entries.

Here is the happy path for BookKeeper. An application using BookKeeper initially designates a ledger writer. This ledger writer creates a ledger and appends data to the ledger; only the ledger writer is able to append entries to the ledger. Eventually, after appending an arbitrary number of entries to the ledger, the ledger writer closes it. Once the ledger is closed, its content is immutable. Clients can open closed ledgers for reading, and any individual ledger can have multiple readers over time, and even concurrent readers.
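
To make the happy path concrete, here is a minimal in-memory sketch of the ledger lifecycle. It is a toy model and not the BookKeeper client API: `ToyLedger`, `LedgerClosedError`, and the method names are hypothetical, and replication and ZooKeeper are left out entirely.

```python
class LedgerClosedError(Exception):
    """Raised when a write is attempted on a closed ledger."""


class ToyLedger:
    """In-memory stand-in for a single-writer ledger (hypothetical, not the real API)."""

    def __init__(self):
        self._entries = []
        self._closed = False

    def add_entry(self, data: bytes) -> int:
        """Append an entry and return its entry id (its position in the sequence)."""
        if self._closed:
            raise LedgerClosedError("ledger is closed; its content is immutable")
        self._entries.append(data)
        return len(self._entries) - 1

    def close(self) -> int:
        """Seal the ledger and return the id of the last entry (-1 if empty)."""
        self._closed = True
        return len(self._entries) - 1

    def read_entries(self, first: int, last: int) -> list:
        """Return entries first..last inclusive; this toy only serves closed ledgers."""
        if not self._closed:
            raise RuntimeError("this toy only serves reads from closed ledgers")
        return self._entries[first:last + 1]


# The single writer appends, then closes.
writer = ToyLedger()
for payload in (b"a", b"b", b"c"):
    writer.add_entry(payload)
last_id = writer.close()

# Any number of readers now see the same immutable sequence.
reader_1 = writer.read_entries(0, last_id)
reader_2 = writer.read_entries(0, last_id)
```

The only property the sketch tries to capture is the contract stated above: one writer, and identical reads for everybody once the ledger is closed.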

The main calls in the API enable applications to:

  • Create a ledger;
  • Add entries to a ledger;
  • Open a ledger for reading;
  • Read entries from a ledger;
  • Close a ledger to prevent further writes;
  • Delete a ledger.

All these calls have both a synchronous and an asynchronous version.

Creating and using a ledger.
When a client creates a ledger, it selects a set of bookies to form an ensemble for the ledger and stores the ensemble information as part of the ledger metadata on ZooKeeper. For each entry the ledger writer adds to the ledger, it replicates the entry across f+1 bookies. A request to add an entry e completes successfully if e has been successfully replicated across f+1 bookies. If a bookie crashes, then the client replaces that bookie. BookKeeper uses ZooKeeper to keep track of configuration changes for a ledger.
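
The write path can be sketched as follows. This is a toy under stated assumptions: the summary does not say how entries are assigned to bookies, so the round-robin placement in `write_set` is my assumption, `write_quorum` plays the role of f+1, and all names (`write_set`, `add_entry`, `stores`, `spares`) are hypothetical. Spare bookies are assumed to be alive.

```python
def write_set(entry_id: int, ensemble: list, write_quorum: int) -> list:
    """Bookies that should store entry_id, assuming round-robin striping."""
    n = len(ensemble)
    return [ensemble[(entry_id + k) % n] for k in range(write_quorum)]


def add_entry(entry_id, data, ensemble, write_quorum, stores, spares) -> bool:
    """Replicate one entry to its write set, swapping out crashed bookies.

    `stores` maps bookie name -> dict of entries, or None if the bookie crashed.
    Returns True once all write_quorum copies are stored.
    """
    n = len(ensemble)
    for k in range(write_quorum):
        slot = (entry_id + k) % n
        if stores[ensemble[slot]] is None:
            if not spares:
                return False  # cannot reach f+1 copies
            # Ensemble change; the real system records this in ZooKeeper.
            ensemble[slot] = spares.pop(0)
        stores[ensemble[slot]][entry_id] = data
    return True


ensemble = ["b1", "b2", "b3"]
stores = {"b1": {}, "b2": {}, "b3": {}, "b4": {}}
spares = ["b4"]

add_entry(0, b"x", ensemble, 2, stores, spares)  # stored on b1 and b2
stores["b2"] = None                              # b2 crashes
ok = add_entry(1, b"y", ensemble, 2, stores, spares)  # b4 takes over b2's slot
```

After the second call the ensemble is `["b1", "b4", "b3"]`, and entry 1 lives on b4 and b3. The sketch does not re-replicate the entries that were lost with b2.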

Closing a ledger.
When closing a ledger, the ledger writer writes to ZooKeeper the last entry that has been written successfully, as part of the ledger metadata. If a ledger writer crashes prematurely, before it closes its open ledger, a ledger reader would need to do ledger recovery.

Ledger recovery.
When a ledger reader opens a ledger for reading, it first obtains the ledger metadata. If it finds, by checking the state of the ledger, that the ledger has not been closed, the ledger reader triggers a recovery procedure. The first step of recovery for a given ledger consists of having the reader client ask each bookie in the ensemble for the last add confirmed (LAC) field in the last entry that the bookie has processed for the ledger. Since reads are based on entry id, the recovery process can start reading from the highest LAC it receives, and thus it is not necessary to read the entire ledger.
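
Here is a rough sketch of that recovery scan. Each toy bookie is a dict mapping entry id to `(data, lac)`, where `lac` is the LAC value carried by that entry. The first step follows the description above; the forward probe and its stopping rule (stop at the first id that no bookie holds) are my assumptions, and the function names are hypothetical.

```python
def last_add_confirmed(bookie: dict) -> int:
    """LAC field of the last entry this bookie processed (-1 if it holds nothing)."""
    if not bookie:
        return -1
    _data, lac = bookie[max(bookie)]
    return lac


def recover_last_entry(bookies: list) -> int:
    """Return the id of the last entry that can be recovered for an unclosed ledger."""
    # Step 1: ask every bookie for its LAC and start from the highest one.
    last = max(last_add_confirmed(b) for b in bookies)
    # Step 2 (assumed): probe forward; stop at the first id no bookie holds.
    while any((last + 1) in b for b in bookies):
        last += 1
    return last


# Three bookies, two copies per entry; the writer crashed while adding
# entry 4, which reached b1 but never reached b2.
b0 = {0: (b"e0", -1), 2: (b"e2", 1), 3: (b"e3", 2)}
b1 = {0: (b"e0", -1), 1: (b"e1", 0), 3: (b"e3", 2), 4: (b"e4", 3)}
b2 = {1: (b"e1", 0), 2: (b"e2", 1)}

last_entry = recover_last_entry([b0, b1, b2])
```

The highest LAC reported is 3 (from b1), so the scan starts there rather than at entry 0. The toy accepts entry 4 even though only one copy exists; whether such an entry is kept, and how it is brought back to full replication before the ledger is closed, is something this sketch glosses over.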

Reading from an open ledger.
BookKeeper also enables clients to read from open ledgers. When clients need to read from an open ledger, they invoke a call to open the ledger that does not attempt to recover it if it is not closed. To avoid reading partially replicated entries from the ledger, which may not be in the ledger once it is closed, the client asks bookies for their LAC values. Reading entry i ≤ LAC is safe, since the ledger writer has marked it as successfully replicated.
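
The i ≤ LAC rule amounts to a guard in front of the read. A minimal sketch, with hypothetical names, where each bookie is a dict from entry id to data and `lac_values` holds the LAC each bookie reported:

```python
def safe_read(bookies: list, lac_values: list, entry_id: int):
    """Return the entry's data if entry_id <= highest reported LAC, else None."""
    lac = max(lac_values)
    if entry_id > lac:
        return None  # possibly only partially replicated; do not expose it yet
    for store in bookies:
        if entry_id in store:
            return store[entry_id]
    raise LookupError(f"entry {entry_id} is confirmed but no bookie returned it")


bookies = [{0: b"a", 1: b"b"}, {1: b"b", 2: b"c"}]
lac_values = [0, 1]

confirmed = safe_read(bookies, lac_values, 1)    # 1 <= LAC, safe to read
unconfirmed = safe_read(bookies, lac_values, 2)  # 2 > LAC, withheld
```

Entry 2 is physically present on one bookie, but the reader withholds it because the writer has not yet confirmed it.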

Dealing with multiple ledgers

To enable recovery, upon each request to append an entry to a ledger, a bookie appends this entry to the journal and flushes the write to the local disk device. A bookie only acknowledges to the client once it receives a confirmation that the flush operation has completed successfully. Note that the journal is shared across all active ledgers the bookie is currently storing. A bookie also writes entries to the ledger device to serve read requests. Because reads are served from a separate device, read traffic does not affect the performance of writes to the journal device.

The ledger device stores ledger entries along with an index for each ledger. A bookie has a single file, called the entry log, and interleaves entries of different ledgers by appending entries of all ledgers to it. For each ledger, the bookie also keeps in memory an index mapping the entry identifier to its position in the entry log.

This design targets workloads dominated by writes, while not neglecting the performance of reads. Requests to add an entry to a ledger return as soon as the entry is flushed to the journal of a bookie, and writes to the ledger device are asynchronous and mostly sequential to enable the writes to this device to keep up with the writes to the journal device. To serve a read request, it is necessary to obtain the position of the entry in the entry log. If the index page is cached, then the read requires one disk seek.
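
The bookie's storage layout can be sketched with plain Python containers. This is a toy under assumptions: `ToyBookie` and its methods are hypothetical, a list stands in for the journal device, a `bytearray` stands in for the entry log file, and no real disk flush happens.

```python
class ToyBookie:
    """Toy model of a bookie's journal, entry log, and per-ledger index."""

    def __init__(self):
        self.journal = []             # shared by all ledgers: (ledger_id, entry_id, data)
        self.entry_log = bytearray()  # single file interleaving entries of all ledgers
        self.index = {}               # ledger_id -> {entry_id: (offset, length)}
        self._pending = []            # journaled but not yet in the entry log

    def add_entry(self, ledger_id: int, entry_id: int, data: bytes) -> bool:
        """Journal the entry, then acknowledge; the entry-log write is deferred."""
        record = (ledger_id, entry_id, data)
        self.journal.append(record)   # a real bookie would flush to disk here
        self._pending.append(record)
        return True                   # ack is sent after the journal write only

    def flush_to_entry_log(self) -> None:
        """Asynchronous step: append pending entries sequentially and index them."""
        for ledger_id, entry_id, data in self._pending:
            offset = len(self.entry_log)
            self.entry_log += data
            self.index.setdefault(ledger_id, {})[entry_id] = (offset, len(data))
        self._pending.clear()

    def read_entry(self, ledger_id: int, entry_id: int) -> bytes:
        """Look up the position in the index, then do one read of the entry log."""
        offset, length = self.index[ledger_id][entry_id]
        return bytes(self.entry_log[offset:offset + length])


bookie = ToyBookie()
bookie.add_entry(1, 0, b"L1-e0")
bookie.add_entry(2, 0, b"L2-e0")
bookie.add_entry(1, 1, b"L1-e1")
bookie.flush_to_entry_log()
entry = bookie.read_entry(1, 1)
```

Entries of ledgers 1 and 2 end up interleaved in one entry log, and the index is what lets a read jump straight to the right offset. In this toy, an entry that has been journaled but not yet flushed to the entry log cannot be read.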

BookKeeper stores metadata on ZooKeeper.
The ledger metadata includes the ensemble composition of ledgers, the write quorum size, the ledger status, and the last entry successfully written to a closed ledger. For the metadata store, BookKeeper uses ZooKeeper. "A different, more scalable data store becomes necessary when the number of active ledgers is of the order of tens to hundreds of millions." For tracking the availability of bookies, BookKeeper relies upon ZooKeeper because it provides ephemeral znodes and watches.

Evaluation

Experiments are conducted using a cluster of identical machines: 2 Quad Core Intel Xeon 2.5GHz, 16GB of RAM, one 1 Gbit/s network interface, and 4 SATA drives of 1TB with a rotational speed of 7200 RPM. Each machine in the cluster mounts an enterprise-grade filer via NFS (NetApp FAS3050). This hardware gives a raw performance of 1.2 milliseconds for the latency of add operations and 22.5k adds/sec for 1 kbyte entries when writing to a single bookie. nE-qQ denotes a ledger configuration with ensemble size n and write quorum q.

Using a 3E-2Q configuration, the figure in the paper shows throughput and latency for a single client as the maximum number of outstanding operations is increased. Allowing more outstanding operations leads to a higher throughput, in particular for 128-byte entries. No batching tricks are employed to improve throughput; the processing of an operation is triggered by the call.

Here 12 clients write simultaneously to a set of bookies, and the aggregate throughput is measured. Compared to the results for a single client writer, the aggregate throughput is substantially higher for shorter entries. For longer entries, throughput is limited by the speed with which bookies are able to write to disk, so adding more bookies to the pool (configurations with 6E) results in increased throughput.
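
A back-of-the-envelope sketch of why a larger ensemble helps when disks are the bottleneck. It assumes entries are spread evenly over the ensemble, so each bookie stores a fraction q/n of them, and it reuses the single-bookie figure of 22.5k adds/sec quoted above. The resulting ceilings are arithmetic on those assumptions and not results from the paper; `6E-2Q` is picked only as an illustrative 6E configuration, and the function names are hypothetical.

```python
import re


def parse_config(name: str) -> tuple:
    """Parse 'nE-qQ' into (ensemble size n, write quorum q)."""
    match = re.fullmatch(r"(\d+)E-(\d+)Q", name)
    if match is None:
        raise ValueError(f"not an nE-qQ configuration: {name!r}")
    return int(match.group(1)), int(match.group(2))


def disk_bound_ceiling(name: str, single_bookie_rate: float) -> float:
    """Upper bound on adds/sec if each bookie only stores q/n of the entries."""
    n, q = parse_config(name)
    return single_bookie_rate * n / q


ceiling_3e = disk_bound_ceiling("3E-2Q", 22500)  # 33750.0 adds/sec
ceiling_6e = disk_bound_ceiling("6E-2Q", 22500)  # 67500.0 adds/sec
```

Doubling the ensemble at a fixed write quorum halves each bookie's share of the entries, which is consistent with the observation that 6E configurations sustain more throughput for long entries.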

Discussion

BookKeeper resembles chain replication a little. The chain replication approach is to export consensus to Paxos, and to only store data, providing high throughput. BookKeeper also does that, but chain replication is not referred to in the paper at all. Of course, chain replication lacks striping, and does not by default provide disjoint read replicas (in addition to write replicas) to improve read throughput.

The Tango paper mentions BookKeeper and states that it has an implementation of BookKeeper in 300 lines. How would you implement BookKeeper in Tango? What can you speculate about the performance of BookKeeper versus TangoBookKeeper? Could you implement Tango using BookKeeper? How about transactions?

After reading the paper, I was left with this question. What happens if the bookie writes the entry to its journal and acknowledges it, but dies before asynchronously writing this entry to its ledger? Does this cause any problems?

Final remarks
The paper does not talk about consistency of logging, because every consistency concern is exported to ZooKeeper. I guess we can chalk this up as success points for ZooKeeper. BookKeeper's bottleneck for WAN deployment is ZooKeeper. If ZooKeeper is consulted infrequently, things are OK. But if ZooKeeper is consulted often for LAC information in order to read from open ledgers, performance suffers.

Related links:
Flavio's blog post on BookKeeper
Flavio's presentation on BookKeeper
