Paper Summary. Proteus: Agile ML Elasticity Through Tiered Reliability in Dynamic Resource Markets

This paper proposes an elastic ML system, Proteus, that can add/remove transient workers on the fly to exploit the transient availability of cheap but revocable resources, reducing both the cost and latency of computation. The paper appeared in EuroSys'17 and is authored by Aaron Harlap, Alexey Tumanov, Andrew Chung, Gregory R. Ganger, and Phillip B. Gibbons.

Proteus has two components: AgileML and BidBrain. AgileML extends the parameter-server ML architecture to run on a dynamic mix of stable and transient machines, taking advantage of the opportunistic availability of cheap but preemptible AWS Spot instances. BidBrain is the resource allocation component that decides when to acquire and drop transient resources by monitoring current market prices and bidding on new resources when their addition would increase work-per-dollar.

Before delving into AgileML and BidBrain, let's first review the AWS Spot model.

See Spot run

AWS provides always-available compute instances, called "on-demand" instances: you get them whenever you like and keep them as long as you like, provided that you pay their fixed hourly rate.

AWS also offers transient compute instances via the AWS Spot market. You specify a bid price, and if the current market price is under your bid, you get the instance. You pay only the market price, not your bid price. In other words, your bid price is an upper bound on how much you are comfortable paying hourly. If the AWS Spot market price for the instance rises above that bound, AWS pulls the instance from you with only a 2-minute advance warning. Even in this case, the silver lining is that the final incomplete hour of computing is not charged to you, so you get some free computing.
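A minimal sketch of these Spot billing rules, with made-up hourly market prices for illustration:

```python
def spot_cost(market_prices, bid):
    """Simulate Spot billing: pay the hourly market price (not the bid)
    while the market stays at or below the bid; on eviction, the final
    incomplete hour is not charged.
    market_prices: list of hourly market prices (illustrative values)."""
    total = 0.0
    hours = 0
    for price in market_prices:
        if price > bid:      # market exceeds bid -> instance is evicted
            break            # the in-progress hour is free
        total += price       # charged the market price, not the bid
        hours += 1
    return hours, total

# Bid $0.30/hr; evicted in hour 4 when the market spikes to $0.50,
# so only 3 hours are billed, at the market price of each hour.
hours, cost = spot_cost([0.10, 0.12, 0.15, 0.50], bid=0.30)
# hours == 3, cost ~= 0.37
```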

As seen in Figure 3, you can save a lot of money if your computing job can exploit AWS Spot instances. (It is peculiar that the peak prices are sometimes up to 10 times higher than the fixed on-demand price. This is speculated to be a measure to prevent a high bid from securing long-running Spot instances at a rate lower than the EC2 on-demand rate.)

Jobs that are especially suitable for AWS Spot's transient/preemptible computing model are embarrassingly parallel data-processing tasks, where the pieces are unrelated and there is no need to maintain long-lived state. For example, for "shallow computing" such as thumbnail generation, an instance eviction does no harm, as there is no need for continuity across the computation. The question the paper investigates is how to make the AWS Spot model work for "deeper computing", such as ML jobs.

While the paper considers this question for the AWS Spot market, the motivation also applies to enterprise computing in the datacenter. As disclosed in the Google Borg paper, Google distinguishes and prioritizes its production services over analytic/batch services. If the production services need more resources, they will be given resources, to the extent of preempting them from analytic/batch jobs if need be. On the other hand, when there is an excess of resources, analytic/batch jobs can enjoy them opportunistically.

Stages of AgileML


AgileML has three modes/stages as shown in Figure 4. To provide a shorter and more cost-effective computation, AgileML dynamically changes modes based on the availability of cheap transient instances. As the transient-to-on-demand ratio increases from 1:1 to beyond 15:1, AgileML shifts up from stage 1 to stage 3. As the ratio decreases, AgileML shifts back down from stage 3 to stage 1.
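The switching policy can be sketched as a simple threshold rule. The 1:1 and 15:1 thresholds come from the paper's description; the function name and the exact policy details are assumptions for illustration:

```python
def agileml_stage(transient, reliable):
    """Pick an AgileML stage from the transient-to-reliable machine ratio.
    Thresholds (1:1 and 15:1) follow the paper's description; Proteus's
    actual controller also considers factors like network bandwidth."""
    if reliable == 0:
        raise ValueError("at least one reliable machine is required")
    ratio = transient / reliable
    if ratio <= 1:
        return 1   # stage 1: parameter servers only on reliable machines
    elif ratio <= 15:
        return 2   # stage 2: ActivePSs on transient, BackupPSs on reliable
    else:
        return 3   # stage 3: additionally, no workers on reliable machines

# e.g. agileml_stage(4, 1) -> 2; agileml_stage(32, 2) -> 3
```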

  • Stage 1: Parameter Servers Only on Reliable Machines. Stage 1 spreads the parameter server across reliable machines only, using transient nodes only for stateless workers. This works for most ML applications, including K-means, DNN, Logistic Regression, and Sparse Coding, as the workers are stateless while the parameter servers contain the current solution state.
  • Stage 2: ActivePSs on Transient Machines and BackupPSs on Reliable Machines. For a transient-to-reliable node ratio greater than 1:1, AgileML switches to stage 2. Stage 2 uses a primary-backup model for parameter servers, using transient nodes for the active servers (ActivePSs) and reliable nodes for the hot standbys (BackupPSs). This relieves the heavy network load at the few reliable resources by spreading it across the many transient resources. The model parameters are sharded across the set of ActivePS instances. Workers send all updates and reads to the ActivePSs, which push updates in bulk to the BackupPSs. The solution state affected by transient node failures or evictions is recovered from the BackupPSs. (For backing up ActivePS to BackupPS, it may be possible to explore a threshold-based update mechanism as outlined in the Gaia paper.)
  • Stage 3: No Workers on Reliable Machines. Workers colocated with BackupPSs on reliable machines were found to cause straggler effects at transient-to-reliable ratios beyond 15:1. Stage 3 removes these workers, and acts as a sub-case of stage 2.
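The ActivePS/BackupPS interaction in stage 2 can be sketched as follows. The class names, the counter-based flush trigger, and the dict-of-floats parameter representation are all assumptions, not Proteus's actual code:

```python
# Sketch of stage 2: workers read/update the active shard on a transient
# node; the active server periodically pushes accumulated updates in bulk
# to its hot standby on a reliable node.

class BackupPS:
    """Hot standby on a reliable machine; holds the last pushed state."""
    def __init__(self):
        self.params = {}

    def apply_bulk(self, updates):
        for key, delta in updates.items():
            self.params[key] = self.params.get(key, 0.0) + delta

class ActivePS:
    """Authoritative parameter shard on a transient machine."""
    def __init__(self, backup, flush_every=100):
        self.params = {}       # current solution state
        self.pending = {}      # updates not yet pushed to the backup
        self.backup = backup
        self.flush_every = flush_every
        self.n_updates = 0

    def update(self, key, delta):          # called by workers
        self.params[key] = self.params.get(key, 0.0) + delta
        self.pending[key] = self.pending.get(key, 0.0) + delta
        self.n_updates += 1
        if self.n_updates % self.flush_every == 0:
            self.backup.apply_bulk(self.pending)   # bulk push
            self.pending = {}

    def read(self, key):                   # called by workers
        return self.params.get(key, 0.0)

# On eviction of the ActivePS, a replacement is seeded from backup.params,
# losing at most the updates accumulated since the last bulk push.
```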


Handling elasticity

The elasticity controller component is responsible for changing modes based on the transient-to-reliable ratio and the network bandwidth. It tracks which workers are participating in the computation, assigns a subset of the input data to each worker, and starts new ActivePSs.

For stage 2 and stage 3, half of the transient instances are recruited as ActivePSs, as that performed best in the evaluations. This half ratio is likely specific to using transient instances; with reliable instances, the more PSs the merrier.

During start-up, AgileML divides the parameter state into N partitions, where N is the maximum number of ActivePSs that can exist at any one point. By using partitions in this way, AgileML avoids the need to re-shard the parameter state when adding or removing servers, instead re-assigning partitions as needed.

As ActivePS instances come and go, the elasticity controller re-assigns the parameter-server shards across the ActivePS instances appropriately. If all the ActivePSs are evicted, AgileML falls back to stage 1. It seems like using a level of indirection was sufficient to get this working.
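That level of indirection can be sketched as follows: keys map to a fixed set of N partitions, and a separate, mutable table maps partitions to whichever servers currently exist. The round-robin assignment and the names here are assumptions for illustration:

```python
N_PARTITIONS = 8   # fixed at start-up: max number of concurrent ActivePSs

def partition_of(key):
    """key -> partition mapping; fixed for the lifetime of the job."""
    return hash(key) % N_PARTITIONS

def assign_partitions(servers):
    """Round-robin the fixed partitions over the current server list.
    Adding or removing a server only rewrites this indirection table;
    the key->partition mapping (and thus the data layout) is untouched."""
    return {p: servers[p % len(servers)] for p in range(N_PARTITIONS)}

table = assign_partitions(["ps0", "ps1", "ps2"])
table = assign_partitions(["ps0", "ps1"])   # after one eviction
```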

BidBrain

BidBrain keeps track of historical market prices for transient instances and makes allocation decisions to minimize cost-per-work. An allocation is defined as a set of instances of the same type acquired at the same time and price. Before the end of an allocation's billing hour, BidBrain compares cost-per-work ratios to decide whether the allocation is renewed or terminated.
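The renew-or-terminate decision reduces to comparing expected cost-per-work with and without the allocation. How those expectations are estimated from price history is the interesting part of BidBrain and is not shown here; the function and its inputs are assumptions:

```python
def should_renew(cost_with, work_with, cost_without, work_without):
    """Near the end of an allocation's billing hour, renew it iff the
    expected $/work of the cluster is lower with the allocation than
    without it. Inputs are estimates for the next billing hour."""
    return (cost_with / work_with) < (cost_without / work_without)

# e.g. renewing adds $0.15/hr of cost but triples the work units:
should_renew(cost_with=0.45, work_with=300,
             cost_without=0.30, work_without=100)   # -> True
```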

Evaluation

The experiments were performed with three ML applications.

  • Matrix Factorization (MF) is a technique (a.k.a. collaborative filtering) commonly used in recommendation systems, such as recommending movies to users on Netflix. The goal is to discover latent interactions between the two entities (e.g., users and movies). Given a partially filled matrix X (e.g., a matrix where entry (i, j) is user i's rating of movie j), MF factorizes X into factor matrices L and R such that their product approximates X.
  • Multinomial Logistic Regression (MLR) is a popular model for multi-way classification, often used in the last layer of deep learning models for image classification or text classification. The MLR experiments use the ImageNet dataset with LLC features, containing 64k observations with a feature dimension of 21,504 and 1,000 classes.
  • Latent Dirichlet Allocation (LDA) is an unsupervised method for discovering hidden semantic structures (topics) in an unstructured collection of documents, each consisting of a bag (multi-set) of words. The evaluated LDA solver implements collapsed Gibbs sampling.
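To make the MF formulation concrete, here is a toy single-machine sketch: factor a partially observed ratings matrix X into L (users x k) and R (k x movies) by SGD so that L·R approximates X on the observed entries. This is purely illustrative; the paper's solver is a distributed parameter-server implementation:

```python
import random

def mf_sgd(observed, n_users, n_movies, k=2, lr=0.05, epochs=1500, seed=0):
    """Fit L (n_users x k) and R (k x n_movies) to the observed entries
    of X by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    L = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    R = [[rng.uniform(-0.1, 0.1) for _ in range(n_movies)] for _ in range(k)]
    for _ in range(epochs):
        for (i, j), x in observed.items():
            pred = sum(L[i][f] * R[f][j] for f in range(k))
            err = x - pred
            for f in range(k):      # simultaneous update of both factors
                L[i][f], R[f][j] = (L[i][f] + lr * err * R[f][j],
                                    R[f][j] + lr * err * L[i][f])
    return L, R

# Entry (i, j) is user i's rating of movie j; missing entries are absent.
ratings = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 4.0, (2, 1): 2.0}
L, R = mf_sgd(ratings, n_users=3, n_movies=2)
```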

The baseline runs all instances on Spot market machines and uses checkpointing to recover progress after evictions. The experiments show about 17% overhead for MF due to checkpointing. Figure 1 illustrates the cost and time benefits of Proteus for the MLR application. Compared to all on-demand, the baseline improves on cost significantly, as expected, but increases the runtime by 25%. Proteus improves on cost and also manages to achieve reduced runtime.

On average, 32% of Proteus's computing is free computing. But aggressively chasing free computing by bidding very close to the market price results in high overhead: a 4x increase in runtime and higher costs due to frequent evictions.
