Paper Summary: Calvin, Distributed Transactions For Database Systems

Calvin is a transaction scheduling and replication management layer for distributed storage systems. By first writing transaction requests to a durable, replicated log, and then using a concurrency control mechanism that emulates a deterministic serial execution of the log's transaction requests, Calvin supports strongly consistent replication and fully ACID distributed transactions, and manages to incur lower inter-partition transaction coordination costs than traditional distributed database systems.

Calvin emphasizes modularity. The holy trinity in Calvin is: log, scheduler, executor. When a client submits a transaction request to Calvin, the request is immediately appended to a durable log, before any actual execution begins. Calvin's scheduling mechanism then processes this request log, deciding when each transaction should be executed in a way that maintains an invariant slightly stronger than serializable isolation: transaction execution may be parallelized but must be equivalent to a deterministic serial execution in log-specified order. As a result, the log of transaction requests itself serves as an ultimate "source of truth" about the state of the database, which makes recovery very easy.
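To see why a deterministic log makes recovery easy, here is a minimal sketch in Python. It is not Calvin's code: the request format `(op, key, amount)` and the function `apply_log` are made up for illustration. The only point is that replaying the same log in the same order always yields the same state.

```python
def apply_log(log):
    """Replay a request log, in order, against an empty key-value state."""
    state = {}
    for op, key, amount in log:
        if op == "set":
            state[key] = amount
        elif op == "add":
            state[key] = state.get(key, 0) + amount
    return state

# Hypothetical request log: each entry is (operation, key, amount).
log = [("set", "x", 10), ("add", "x", 5), ("set", "y", 1)]

live_state = apply_log(log)        # state reached during normal operation
recovered_state = apply_log(log)   # a recovering node replays the same log
```

Because every request is deterministic and the order is fixed by the log, `recovered_state` equals `live_state` without any extra coordination.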

I think Calvin resembles the Tango approach a lot. (I had discussed Tango here recently.) It is almost as if Calvin is Tango's cousin in the database domain. As such, Calvin has strengths and weaknesses similar to Tango's. For the advantages: Calvin provides good throughput, but will not win any stars for low latency. Calvin provides scalable and strongly consistent replication. (Once you have one authoritative log, this is not hard to provide anyway.)

The centralized log is also the source of all the disadvantages in Calvin. Transactions always need to go through the centralized log, so there are no truly local transactions. Thus Calvin will perform worse for workloads dominated by local, non-coordinating transactions. So the TPC-C workload Calvin uses for evaluation is really the best-case workload for showing Calvin's performance relative to other systems.

The Log component

Calvin uses Paxos to achieve availability of the log by replicating it consistently. A group of front-end servers collects client requests into batches. Each batch is assigned a globally unique ID (GUID) and written to an independent, asynchronously replicated block storage service such as Voldemort or Cassandra. Once a batch of transaction requests is durable on enough replicas, its GUID is appended to a Paxos "MetaLog". Readers can then reassemble the global transaction request log by concatenating batches in the order that their GUIDs appear in the Paxos MetaLog.
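The batch-and-MetaLog scheme can be sketched roughly as follows. This is only an illustration under strong assumptions: a Python dict stands in for the block storage service, a plain list stands in for the Paxos MetaLog, and the names `write_batch` and `reassemble_log` are hypothetical, not part of Calvin.

```python
import uuid

# Hypothetical in-memory stand-ins: a dict plays the replicated block store,
# and a list plays the Paxos-replicated MetaLog of batch GUIDs.
block_store = {}
meta_log = []

def write_batch(requests):
    """Store a batch under a fresh GUID, then append the GUID to the MetaLog."""
    guid = str(uuid.uuid4())
    block_store[guid] = list(requests)  # assumed durable once stored
    meta_log.append(guid)               # the global order is decided here
    return guid

def reassemble_log():
    """Concatenate batches in the order their GUIDs appear in the MetaLog."""
    log = []
    for guid in meta_log:
        log.extend(block_store[guid])
    return log

write_batch(["t1", "t2"])
write_batch(["t3"])
global_log = reassemble_log()
```

Note that only the small GUIDs go through the consensus step in this sketch; the bulky batch contents are written to the block store separately.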

Batching trades low latency for throughput: you cannot have transaction latency lower than the batching duration (epoch). So an epoch is a guaranteed latency overhead for every transaction.

The scheduler component

The scheduler component (which is a centralized component) examines a transaction before it begins executing and decides when it is safe to execute the whole transaction, then hands the transaction request off to the storage backend for execution with no additional oversight. The storage backend thus does not need to have any knowledge of the concurrency control mechanism or implementation. Once a transaction begins running, the storage backend can be sure that it can process it to completion without worrying about concurrent transactions accessing the same data. However, each storage backend must provide enough information prior to transactional execution for the scheduler to make a well-informed decision about when each transaction can safely (from a concurrency control perspective) execute.

For transaction execution, the scheduler still uses locks. Deterministic locking ensures that concurrent execution is equivalent to the serial transaction order in the log, and also makes deadlock impossible (and with it the associated nondeterministic transaction aborts).
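Here is a rough sketch of the deterministic locking idea, under simplifying assumptions of my own: every transaction declares its full set of keys up front, all locks are exclusive, and execution proceeds in rounds. The function `schedule` and the sample log are hypothetical, not Calvin's implementation.

```python
from collections import deque

def schedule(log):
    """log: list of (txn_id, set_of_keys) in log order, with unique txn ids.
    Returns a list of rounds; transactions in one round touch disjoint keys."""
    queues = {}
    for txn_id, keys in log:  # lock requests are issued strictly in log order
        for key in keys:
            queues.setdefault(key, deque()).append(txn_id)
    keys_of = dict(log)
    pending = [txn_id for txn_id, _ in log]
    rounds = []
    while pending:
        # A transaction may run once it heads the queue of every key it needs.
        ready = [t for t in pending
                 if all(queues[k][0] == t for k in keys_of[t])]
        for t in ready:  # "execute" the round, then release its locks
            for k in keys_of[t]:
                queues[k].popleft()
        pending = [t for t in pending if t not in ready]
        rounds.append(ready)
    return rounds

log = [("t1", {"a", "b"}), ("t2", {"b", "c"}), ("t3", {"d"}), ("t4", {"a"})]
rounds = schedule(log)
```

In this sketch the earliest pending transaction in log order always heads all of its queues, so every round makes progress and a deadlock cannot form. Conflicting transactions (`t1` before `t2` on `b`, `t1` before `t4` on `a`) run in log order, while non-conflicting ones share a round.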

Questions

I don't know why Calvin doesn't adopt Tango-style log maintenance: using chain replication to improve the throughput of the centralized log. This might really help Calvin.

Similarly, Calvin should also adopt Tango's feature of selective/custom stream replication per replica. That would improve the flexibility/generality of Calvin.

Related links

https://christmasloveday.blogspot.com//search?q=paper-summary-tango-distributed-data
