Title: Pregel: A System for Large-Scale Graph Processing
1. Pregel: A System for Large-Scale Graph Processing
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski - Google, Inc.
- SIGMOD '10
- 15 Mar 2013
- Dong Chang
 
2. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
3. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
4. Introduction (1/2)

5. Introduction (2/2)
- Many practical computing problems concern large graphs
- MapReduce is ill-suited for graph processing
  - Many iterations are needed for parallel graph processing
  - Materialization of intermediate results at every MapReduce iteration harms performance
- Large graph data: the Web graph, transportation routes, citation relationships, social networks
- Graph algorithms: PageRank, shortest paths, connected components, clustering techniques
6. MapReduce Execution
- Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits
- The input splits can be processed in parallel by different machines
- Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a hash function: hash(key) mod R (see the sketch after this list)
- R and the partitioning function are specified by the programmer
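A minimal sketch of that partitioning rule, assuming string keys and std::hash as the hash function (the real partitioning function is chosen by the programmer):

```cpp
#include <functional>
#include <string>

// Route an intermediate key to one of R reduce partitions:
// reducer index = hash(key) mod R.
int ReducerFor(const std::string& key, int R) {
  return static_cast<int>(std::hash<std::string>{}(key) % R);
}
```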
7. MapReduce Execution
8. Data Flow
- Input and final output are stored on a distributed file system
  - The scheduler tries to schedule map tasks close to the physical storage location of the input data
- Intermediate results are stored on the local file system of the map and reduce workers
- Output can be the input to another MapReduce task
 
9. MapReduce Execution

10. MapReduce Parallel Execution
11. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
12. Computation Model (1/3)
13. Computation Model (2/3)
- "Think like a vertex"
- Inspired by Valiant's Bulk Synchronous Parallel model (1990)
- Source: http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
14. Computation Model (3/3)
- Superstep: the vertices compute in parallel
- Each vertex:
  - Receives messages sent in the previous superstep
  - Executes the same user-defined function
  - Modifies its value or that of its outgoing edges
  - Sends messages to other vertices (to be received in the next superstep)
  - Mutates the topology of the graph
  - Votes to halt if it has no further work to do (see the sketch after this list)
- Termination condition:
  - All vertices are simultaneously inactive
  - There are no messages in transit
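A minimal sketch of the halt/reactivation rule implied by these bullets; VertexState and RunsThisSuperstep are illustrative names, not Pregel's actual implementation:

```cpp
#include <vector>

// Illustrative only: per-vertex state a worker could keep between supersteps.
struct Message { int value; };

struct VertexState {
  bool active = true;           // becomes false when the vertex votes to halt
  std::vector<Message> inbox;   // messages sent to it in the previous superstep
};

// A vertex runs Compute() in a superstep if it is still active, or if it
// received a message, which reactivates a halted vertex.
bool RunsThisSuperstep(VertexState& v) {
  if (!v.inbox.empty()) v.active = true;
  return v.active;
}
```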
 
15. An Example
16. Example: SSSP (Parallel BFS) in Pregel
17. Example: SSSP (Parallel BFS) in Pregel
[Figure: initial superstep - every vertex distance is still unknown ("?"); edge weights such as 10 and 5 are shown on the graph]
18. Example: SSSP (Parallel BFS) in Pregel
19. Example: SSSP (Parallel BFS) in Pregel
[Figure: a later superstep - tentative distances 11, 14, 8, 12, and 7 have been assigned]
20. Example: SSSP (Parallel BFS) in Pregel
21. Example: SSSP (Parallel BFS) in Pregel
[Figure: a later superstep - tentative distances 9, 13, 14, and 15 have been assigned]
22. Example: SSSP (Parallel BFS) in Pregel
23. Example: SSSP (Parallel BFS) in Pregel
[Figure: a later superstep - one distance is improved to 13]
24. Example: SSSP (Parallel BFS) in Pregel
25. Differences from MapReduce
- Graph algorithms can be written as a series of chained MapReduce invocations
- Pregel
  - Keeps vertices & edges on the machine that performs computation
  - Uses network transfers only for messages
- MapReduce
  - Passes the entire state of the graph from one stage to the next
  - Needs to coordinate the steps of a chained MapReduce
26. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
27. C++ API
- Writing a Pregel program
  - Subclass the predefined Vertex class
  - Override the virtual Compute() method ("Override this!")
  - Incoming messages ("in msgs") are read from the message iterator passed to Compute()
  - Outgoing messages ("out msg") are sent with SendMessageTo()
- A sketch of the Vertex class follows this list
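An abridged sketch of that Vertex base class, following the API figure in the paper; the using-declarations and forward declarations are added here only so the fragment is self-contained, and exact signatures may differ from Google's code:

```cpp
#include <cstdint>
#include <string>
using std::string;
using int64 = std::int64_t;

class MessageIterator;   // framework-provided iterator over incoming messages
class OutEdgeIterator;   // framework-provided iterator over outgoing edges

template <typename VertexValue, typename EdgeValue, typename MessageValue>
class Vertex {
 public:
  virtual ~Vertex() = default;

  // User code overrides this; called once per active vertex per superstep.
  virtual void Compute(MessageIterator* msgs) = 0;

  const string& vertex_id() const;       // this vertex's identifier
  int64 superstep() const;               // current superstep number

  const VertexValue& GetValue();         // read the vertex value
  VertexValue* MutableValue();           // modify the vertex value
  OutEdgeIterator GetOutEdgeIterator();  // walk the outgoing edges

  // Deliver a message to dest_vertex at the start of the next superstep.
  void SendMessageTo(const string& dest_vertex, const MessageValue& message);

  void VoteToHalt();                     // go inactive until a message arrives
};
```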
28. Example Vertex Class for SSSP
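The code on this slide is essentially the paper's single-source shortest paths vertex; the version below assumes the Vertex API sketched above, plus user-defined helpers IsSource() and INF:

```cpp
#include <algorithm>

// Shortest-paths vertex: value, edge weight, and message type are all int.
// IsSource() and INF are assumed to be defined by the user program.
class ShortestPathVertex : public Vertex<int, int, int> {
 public:
  void Compute(MessageIterator* msgs) override {
    // Start from 0 at the source, infinity elsewhere, then take the
    // minimum over all tentative distances received this superstep.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = std::min(mindist, msgs->Value());

    if (mindist < GetValue()) {
      *MutableValue() = mindist;
      // Relax every outgoing edge: send (my distance + edge weight).
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // sleep until a shorter distance arrives
  }
};
```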
29. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
30. MapReduce Coordination
- Master data structures
  - Task status: idle, in-progress, or completed (see the sketch after this list)
- Idle tasks get scheduled as workers become available
- When a map task completes, it sends the master the locations and sizes of its R intermediate files, one for each reducer
- Master pushes this info to the reducers
- Master pings workers periodically to detect failures
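A minimal sketch of the per-task bookkeeping described above; the names are illustrative, not the actual MapReduce implementation:

```cpp
#include <string>
#include <vector>

// Illustrative only: state the master keeps for each map task.
enum class TaskStatus { kIdle, kInProgress, kCompleted };

struct IntermediateFile {
  std::string location;   // where the map worker wrote the file
  long size_bytes;
};

struct MapTask {
  TaskStatus status = TaskStatus::kIdle;
  std::string worker;                               // machine running the task
  std::vector<IntermediateFile> files_for_reducer;  // R entries on completion,
                                                    // pushed to the reducers
};
```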
31. MapReduce Failures
- Map worker failure
  - Map tasks completed or in progress at the worker are reset to idle
  - Reduce workers are notified when a task is rescheduled on another worker
- Reduce worker failure
  - Only in-progress tasks are reset to idle
- Master failure
  - The MapReduce task is aborted and the client is notified
 
32. System Architecture
- The Pregel system also uses the master/worker model
- Master
  - Maintains the workers
  - Recovers from worker faults
  - Provides a Web-UI monitoring tool for job progress
- Worker
  - Processes its task
  - Communicates with the other workers
- Persistent data is stored as files on a distributed storage system (such as GFS or Bigtable)
- Temporary data is stored on local disk
 
33. Execution of a Pregel Program
- Many copies of the program begin executing on a cluster of machines
- The master assigns a partition of the input to each worker
- Each worker loads its vertices and marks them as active
- The master instructs each worker to perform a superstep
  - Each worker loops through its active vertices and computes for each vertex
  - Messages are sent asynchronously, but are delivered before the end of the superstep
- This step is repeated as long as any vertices are active or any messages are in transit (see the control-flow sketch after this list)
- After the computation halts, the master may instruct each worker to save its portion of the graph
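A control-flow sketch of this sequence, seen from the master; Master and its methods are invented names for this illustration, not the actual Pregel interfaces:

```cpp
// Illustrative only: the overall job flow described on this slide.
class Master {
 public:
  virtual ~Master() = default;
  virtual void AssignPartitionsToWorkers() = 0;  // one input partition per worker
  virtual void LoadGraph() = 0;                  // workers load vertices, mark them active
  virtual void RunSuperstep() = 0;               // workers call Compute() on active vertices
  virtual long ActiveVertexCount() = 0;
  virtual long MessagesInTransit() = 0;
  virtual void SaveOutput() = 0;                 // each worker saves its portion of the graph
};

void RunPregelJob(Master& master) {
  master.AssignPartitionsToWorkers();
  master.LoadGraph();
  // Repeat supersteps while any vertex is active or any message is in transit.
  while (master.ActiveVertexCount() > 0 || master.MessagesInTransit() > 0) {
    master.RunSuperstep();
  }
  master.SaveOutput();
}
```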
34. Fault Tolerance
- Checkpointing (see the sketch after this list)
  - The master periodically instructs the workers to save the state of their partitions to persistent storage
  - e.g., vertex values, edge values, incoming messages
- Failure detection
  - Using regular "ping" messages
- Recovery
  - The master reassigns graph partitions to the currently available workers
  - The workers all reload their partition state from the most recent available checkpoint
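A minimal sketch of the checkpointing decision, assuming a fixed checkpoint interval chosen by the master; Worker and SavePartitionState() are hypothetical names standing in for "write vertex values, edge values, and incoming messages to persistent storage":

```cpp
// Illustrative only: checkpoint partition state every N supersteps.
class Worker {
 public:
  virtual ~Worker() = default;
  virtual void SavePartitionState(int superstep) = 0;  // hypothetical helper
};

void MaybeCheckpoint(Worker& worker, int superstep, int checkpoint_interval) {
  if (checkpoint_interval > 0 && superstep % checkpoint_interval == 0) {
    worker.SavePartitionState(superstep);
  }
}
```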
35. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
36. Experiments
- Environment
  - H/W: a cluster of 300 multicore commodity PCs
  - Data: binary trees and log-normal random graphs (general graphs)
- Naïve SSSP implementation
  - The weight of all edges = 1
  - No checkpointing
 
37. Experiments
- SSSP: 1-billion-vertex binary tree, varying the number of worker tasks
38. Experiments
- SSSP: binary trees of varying sizes on 800 worker tasks
39. Experiments
- SSSP: random graphs of varying sizes on 800 worker tasks
40. Outline
- Introduction
- Computation Model
- Writing a Pregel Program
- System Implementation
- Experiments
- Conclusion & Future Work
 
41. Conclusion & Future Work
- Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms
- Future work
  - Relaxing the synchronicity of the model
    - Not waiting for slower workers at inter-superstep barriers
  - Assigning vertices to machines to minimize inter-machine communication
  - Handling dense graphs in which most vertices send messages to most other vertices
42. Thank You!