Naiad: A Timely Dataflow System

What is inwards a name?

Naiad is from Microsoft Research. Dryad, a full general purpose runtime for execution of information parallel applications, was too from Microsoft Research. An application written for Dryad is modeled every bit a directed acyclic graph (DAG) in addition to Dryad is the "tree nymph" inwards Greek mythology. Naiad is a current processing platform in addition to Naiad is the "stream nymph" inwards Greek mythology.

Naiad is an opensource projection that has been receiving a lot of attending recently. I await nosotros volition listen to a greater extent than nearly Naiad, because it is real useful for low-latency real-time querying in addition to high-throughput incremental-processing of streaming large data. What is non to like?

Naiad is useful specially inwards incremental processing of graphs. As has been observed before, MapReduce is inappropriate for graph processing because of the large number of iterations needed inwards graph applications. MapReduce is a functional language, in addition to thence using MapReduce requires passing the entire nation of the graph from 1 phase to the next, which is inefficient. And for real-time applications batch processing delays of MapReduce becomes unacceptable.

Dataflow graph

The developer supplies a a logical graph to Naiad to clitoris the dataflow of computation. (Don't confuse this alongside the large scale input graph that Naiad computes on). The edges inwards this graph present dataflow. The vertices are stateful computation stages.

Figure 1 shows the overall architecture, alongside 2 primary carve upward components: (1) incremental processing of incoming updates in addition to (2) low-latency realtime querying of the application state. The interrogation path is cleanly separated from the update path. Queries are done separately on a slightly stale version of the electrical current application state. This way, the interrogation path does non acquire blocked or incur delays due to the update processing. This too avoids complexity: If queries shared the same path alongside updates, the queries could live accessing partially-processed/incomplete/inconsistent states, in addition to nosotros would have got to worry nearly how to preclude this.

With this architecture, Naiad is able to furnish <100ms interactive interrogation processing, <1s batch updates, in addition to <1ms loop iterations.

(The carve upward interrogation path is non a novel idea. In large information processing, in that location is a batch layer that does occasional/periodic batch processing. This batch processing would output indexed nation (new graph) in addition to queries were performed over this output nation inwards the serving layer inwards a read-only in addition to quick manner.)

Loops inwards dataflow graph

The logical dataflow graph tin have got loops in addition to fifty-fifty nested loops. (Note that, inwards contrast, a MapReduce computation dataflow does non have got whatsoever loops, it is a chain of stages; at each phase yous progress forwards using output of previous phase in addition to producing input for the side yesteryear side stage.)

The loop concept inwards the dataflow graph is real useful every bit it enables novel applications that may non live possible to compute alongside MapReduce similar frameworks (at to the lowest degree inwards a reasonably efficient manner). Loops are natural means of dealing alongside iterative graph processing every bit inwards PageRank in addition to automobile learning applications.

Naiad fifty-fifty allows nested loops. But, every bit useful every bit they are, loops complicate the task of a current processing organization significantly. How practise yous conk on runway of when information is purged out, in addition to that information doesn't conk on looping inwards the system? How practise yous conk on differentiate betwixt older information looping inwards the organization versus novel information that is but entering the loop? To bargain alongside these issues the newspaper introduces an epoch based timestamp to label information that is beingness processed. These timestamps brand the model suitable for tracking global progress inwards iterative algorithms inwards a local manner. The progress tracking looks similar a deep topic, the Naiad newspaper refers the readers to a carve upward 2013 newspaper for the total explanation of the progress tracking algorithm.

The newspaper calls the resulting model, the timely dataflow model. In sum, the timely dataflow model supports:
+ structured loops allowing feedback inwards the dataflow,
+ stateful information flow vertices capable of consuming in addition to producing records without global coordination, and
+ notifications for vertices 1 time they have got received all records for a given circular of input or loop iteration.

Naiad runtime


The logical dataflow graph, which denotes the stages of computation in addition to the dataflow betwixt these stages, is mapped on the physical worker machines inwards many-to-1 fashion. Each worker may host several stages of the dataflow graph. The workers are information nodes. They conk on a portion of the input information (usually a large-scale input graph, such every bit Twitter follower graph) inwards memory. So it makes feel to movement computation (dataflow graph stages) to the information (the partitioned input graph).

Writing programs alongside Naiad

A vertex (of the logical dataflow graph) may invoke 2 system-provided methods:
this.SendBy(e : Edge, thousand : Message, t : Timestamp)
this.NotifyAt(t : Timestamp).

Each telephone hollo upward to u.SendBy(e,m,t) results inwards a corresponding invocation of v.OnRecv(e,m,t), where e is an border from u to v, in addition to each telephone hollo upward to  v.NotifyAt(t) results inwards a corresponding invocation of v.OnNotify(t).

The OnRecv method may mail elements on the start output every bit shortly every bit they are start observed, allowing for depression latency, but to ensure correctness the vertex must utilisation OnNotify to delay sending a lastly synopsis until all inputs have got been observed. In other words, SendBy in addition to OnRecv are to a greater extent than suitable for streaming, in addition to NotifyAt in addition to OnNotify are to a greater extent than suitable for batching.

As such, Naiad provides tunable consistency. The developer tin utilisation loose-consistency functioning similar OnReceive or a rigid consistency functioning (that requires waiting) similar OnNotify.

A prototypical Naiad programme is given inwards the newspaper every bit follows.


Evaluation results

The newspaper has extensive evaluation results. Naiad was deployed on upward to 64 computers in addition to scalability results are shown for throughput, global barrier latency, progress tracking in addition to speedup. PageRank (on Twitter follower graph), logistic regression (as an instance of batch iterative automobile learning) in addition to k-Exposure algorithms (for Twitter topics) are used every bit examples.

Discussion

A feedback first: It would have got been real useful if the newspaper used dissimilar words for edges/vertices inwards logical dataflow graph versus those inwards the input graph that workers compute in addition to modify. This gets real confusing at places. (See it fifty-fifty became confusing every bit I wrote the above.)

The newspaper is xvi pages long, in addition to packed alongside information. But several things rest unclear to me later reading.

How does Naiad practise charge per unit of measurement control? Within a loop at each epoch a larger neighborhood of a vertex may acquire affected/triggered (e.g., intend of a PageRank similar spreading application). How does this non drive an input avalanche? How does Naiad practise charge per unit of measurement command to send/initiate exclusively every bit much every bit it tin eat on the worker nodes?

It is non clear if nosotros tin implement tightly coordinated applications inwards Naiad. By tightly coordinated applications I hateful applications that require multihop transactions on input graph, such every bit graph coloring in addition to graph subcoloring.

0 Response to "Naiad: A Timely Dataflow System"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel