Paper Summary: One Trillion Edges, Graph Processing at Facebook-Scale
This paper was recently presented at VLDB'15.
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, S. Muthukrishnan, "One Trillion Edges: Graph Processing at Facebook-Scale." Proceedings of the VLDB Endowment 8.12 (2015).
This paper is about graph processing. Graphs provide a general, flexible abstraction for modeling relations between entities, and see a lot of use in the field of big data analysis (e.g., social networks, web-page linking, coauthorship relations, etc.).
You may think the graphs in Table 1 are big, but Frank's laptop begs to differ. These graphs also fail to impress Facebook. At Facebook, they work with graphs of a trillion edges, three orders of magnitude larger than these. How would Frank's laptop fare on those? @franks_laptop may step up to answer that question soon. This paper presents how Facebook deals with these huge graphs of one trillion edges.
Apache Giraph and Facebook
In order to analyze social network data more efficiently, Facebook considered several graph processing platforms, including Hive, GraphLab, and Giraph, in the summer of 2012. They ended up choosing Apache Giraph for several reasons: it is open source, it directly interfaces with Facebook's internal version of HDFS and Hive, it is written in Java, and its BSP model is simple and easy to reason about. (The BSP model and Pregel, which Apache Giraph derived from, were covered in an earlier post of mine. You can read that first if you are unfamiliar with these concepts. I have also summarized some of the Facebook data storage and processing systems before, if you would like to read about them.)
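To give a flavor of the BSP "think like a vertex" model, here is a minimal PageRank-style computation sketched against my understanding of Giraph's public API (adapted from the standard Giraph examples; treat the class name and constants as illustrative, not the paper's code):

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

// One superstep of unweighted PageRank: sum incoming messages,
// update the vertex value, and send the new rank to all neighbors.
public class SimplePageRank extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;  // illustrative cutoff

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
                      Iterable<DoubleWritable> messages) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable msg : messages) {
        sum += msg.get();
      }
      vertex.setValue(new DoubleWritable(0.15 / getTotalNumVertices() + 0.85 * sum));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      // Distribute this vertex's rank evenly over its out-edges.
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}

Each call to compute() sees only the messages sent to that vertex in the previous superstep, which is exactly what makes the model easy to reason about.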
However, choosing Apache Giraph was not the end of the story. Facebook was not happy with the state of Apache Giraph, and extended, polished, and optimized it for their production use. (And of course, Facebook contributed these back to the Apache Giraph project as open source.) This paper explains those extensions.
Significant technical extensions to Giraph
Several of Facebook's extensions were done in order to generalize the platform. Facebook extended the original input model in Giraph, which required a rather rigid and limited layout (all data relative to a vertex, including outgoing edges, had to be read from the same record and were assumed to exist in the same data source), to enable flexible vertex/edge based input. Facebook added parallelization support that enabled adding more workers per machine, and introduced worker-local multithreading to take advantage of additional CPU cores. Finally, Facebook added memory optimizations to serialize the edges of every vertex into a byte array rather than instantiating them as native Java objects. I was more interested in their extensions to the compute model, which I summarize below.
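Before moving on to the compute model, here is a toy sketch (not Giraph's actual OutEdges implementation) of that last memory optimization: packing each vertex's edges into one flat byte array avoids the per-object overhead of millions of small Java edge objects.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy illustration of the edge-serialization idea: instead of keeping a
// list of per-edge Java objects (object header + boxed fields each),
// pack (targetId, weight) pairs into one flat byte array per vertex.
public class PackedEdges {
  private byte[] edgeBytes = new byte[0];
  private int edgeCount = 0;

  public void setAll(long[] targetIds, float[] weights) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    for (int i = 0; i < targetIds.length; i++) {
      out.writeLong(targetIds[i]);   // 8 bytes per target id
      out.writeFloat(weights[i]);    // 4 bytes per edge value
    }
    edgeBytes = buffer.toByteArray();
    edgeCount = targetIds.length;
  }

  public int size() {
    return edgeCount;
  }

  // Edges would be deserialized on the fly during iteration, so the
  // steady-state footprint is ~12 bytes per edge rather than a full
  // Java object per edge.
}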
Sharded aggregators
The aggregator framework in Giraph was implemented over ZooKeeper rather inefficiently. Workers would write partial aggregated values to znodes (ZooKeeper data storage). The master would aggregate all of them and write the final result back to its znode for workers to access. This wasn't scalable due to znode size constraints (maximum 1 megabyte) and ZooKeeper write limitations, and it became a problem for Facebook, which needed to support very large aggregators (e.g., gigabytes). (That was in fact a bad use of the ZooKeeper framework; as outlined in this post, there are better ways to do it. Incidentally, my student Ailidani and I are looking at Paxos use in production environments and we collect anecdotes like this. Email us if you have some examples to share.)
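To make the criticism concrete, a rough sketch of that znode-based flow might look like the following. The paths and value encoding are made up for illustration, but the structure (every worker writes a partial value into ZooKeeper, the master reads them all back and combines) is what caps aggregators at the 1 MB znode limit and funnels all aggregation through the master.

import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Sketch of the original znode-based aggregator flow (illustrative paths).
public class ZnodeAggregatorSketch {
  // Worker side: publish a partial value as this worker's znode.
  public static void writePartial(ZooKeeper zk, int workerId, byte[] partialValue)
      throws Exception {
    zk.create("/giraph/aggregators/worker-" + workerId, partialValue,
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  // Master side: read every worker's partial value and combine (toy sum).
  public static long sumPartials(ZooKeeper zk) throws Exception {
    long total = 0;
    List<String> children = zk.getChildren("/giraph/aggregators", false);
    for (String child : children) {
      byte[] data = zk.getData("/giraph/aggregators/" + child, false, null);
      total += Long.parseLong(new String(data, StandardCharsets.UTF_8));
    }
    return total;  // the master would then write this back to its own znode
  }
}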
In the sharded aggregator architecture implemented by Facebook (Figure 3), each aggregator is randomly assigned to one of the workers. The assigned worker is in charge of gathering the values of its aggregators from all workers and distributing the final values to the master and the other workers. This balances aggregation across all workers rather than bottlenecking the master, and aggregators are limited only by the total memory available on each worker. Note that this is not fault-tolerant; they lost the crash-tolerance of ZooKeeper.
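A minimal sketch of the sharded-aggregator idea (my own illustration, not Giraph's classes): aggregator ownership is spread randomly across workers, and the owning worker does the combine-and-broadcast step that previously fell on ZooKeeper and the master.

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the sharded-aggregator flow: each aggregator is owned by a
// randomly chosen worker; at the end of a superstep every worker sends its
// partial value to the owner, which combines them and distributes the final
// value to the master and the other workers for the next superstep.
public class ShardedAggregators {
  private final int numWorkers;
  private final Map<String, Integer> owners = new HashMap<>();
  private final Random random = new Random();

  public ShardedAggregators(int numWorkers) {
    this.numWorkers = numWorkers;
  }

  // Random assignment spreads aggregator memory and traffic across workers.
  public int ownerOf(String aggregatorName) {
    return owners.computeIfAbsent(aggregatorName, k -> random.nextInt(numWorkers));
  }

  // Owner-side reduction: combine partials from every worker (here, a sum).
  public double combine(double[] partialsFromWorkers) {
    double total = 0;
    for (double partial : partialsFromWorkers) {
      total += partial;
    }
    return total;  // then broadcast to the master and all workers
  }
}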
Worker and Master Phase Extensions
For the worker side, the methods preSuperstep(), postSuperstep(), preApplication(), and postApplication() were added. As an example, the preSuperstep() method is executed on every worker prior to every superstep, and can be used in a k-means clustering implementation to let every worker compute the final centroid locations just before the input vectors are processed. Similarly, Facebook added master computation to do centralized computation prior to every superstep that can communicate with the workers via aggregators. This is generally a lightweight task (easily computable without requiring much data analysis) that has a global scope (applies as input to all workers in the next superstep).
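Here is a hypothetical sketch of the k-means use case, assuming per-cluster sum/count aggregators registered elsewhere (e.g., by the master computation) and simplifying the input vectors to one dimension; the point is that centroid recomputation happens once per worker in preSuperstep() rather than once per vertex in compute().

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Hypothetical k-means sketch: each worker rebuilds the centroids once in
// preSuperstep() from the sums/counts aggregated in the previous superstep.
public class KMeansComputation extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, NullWritable> {

  private static final int K = 10;      // illustrative cluster count
  private double[] centroids;           // worker-local copy, rebuilt each superstep

  @Override
  public void preSuperstep() {
    // Aggregator names ("sum/0", "count/0", ...) are assumptions for this sketch.
    centroids = new double[K];
    for (int c = 0; c < K; c++) {
      DoubleWritable sum = getAggregatedValue("sum/" + c);
      LongWritable count = getAggregatedValue("count/" + c);
      centroids[c] = count.get() > 0 ? sum.get() / count.get() : 0.0;
    }
  }

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
                      Iterable<NullWritable> messages) {
    // Assign this input vector to its nearest centroid, then contribute to
    // that cluster's sum/count aggregators for the next superstep.
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int c = 0; c < centroids.length; c++) {
      double d = Math.abs(vertex.getValue().get() - centroids[c]);
      if (d < bestDist) { bestDist = d; best = c; }
    }
    aggregate("sum/" + best, vertex.getValue());
    aggregate("count/" + best, new LongWritable(1));
  }
}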
Superstep Splitting
When operating on very large scale graphs, a superstep may generate so much data to share with other workers (e.g., in the friends-of-friends score calculation) that the output does not fit in memory. Giraph can use disk, but this slows things significantly. The superstep splitting technique is for doing the same computation all in-memory for such applications. The idea is that in such a message-heavy superstep, a worker can send a fragment of the messages to their destinations and do a partial computation that updates the state of the vertex value.
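A conceptual sketch of superstep splitting (my illustration, not the actual Giraph API): the same superstep is repeated over several fragments, with only a subset of senders active in each pass and receivers folding the partial messages into their vertex state as they arrive.

// Conceptual sketch of superstep splitting: a message-heavy superstep is run
// in several passes; in each pass only vertices whose hash falls in the
// current fragment send their messages, and destination vertices apply a
// partial update (e.g., add to a running sum), so the in-flight message
// volume per pass stays within memory.
public class SuperstepSplitting {
  public static boolean sendsInThisFragment(long sourceVertexId,
                                            int fragment, int numFragments) {
    // Deterministic assignment of senders to fragments.
    return Math.floorMod(Long.hashCode(sourceVertexId), numFragments) == fragment;
  }
  // The driver repeats the same superstep logic once per fragment; after the
  // last fragment, every vertex has folded in all of its incoming messages.
}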
Operational experience
Facebook uses Apache Giraph for production applications, for a variety of tasks including label propagation, variants of PageRank, and k-means clustering. The paper reports that most of Facebook's production applications run in less than an hour and use less than 200 machines. Due to the short execution duration and small number of machines, the chance of failure is relatively low and, when a failure occurs, it is handled by restarts. The production application workflow is as follows. The developer first develops and unit tests the application locally, then tests the application on a small number of servers on a test dataset (e.g., the Facebook graph for one country), and then the application is run at scale on 200 workers. After these tests, the application is ready for production use.
Evaluations
This is where Facebook shows off. They ran an iteration of unweighted PageRank on the 1.39B Facebook user dataset with over 1 trillion social connections. They were able to execute PageRank in less than 3 minutes per iteration with only 200 machines.
Conclusions
The paper gives the following as final remarks:
First, our internal experiments show that graph partitioning can have a significant effect on network-bound applications such as PageRank.
Second, we have started to look at making our computations more asynchronous as a possible way to improve convergence speed.
Finally, we are leveraging Giraph as a parallel machine-learning platform.
These are of course related to what we mentioned recently about the trends in distributed systems research in cloud computing. (Part 1, Part 2)