Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining

1
Ex-MATE: Data-Intensive Computing with Large
Reduction Objects and Its Application to Graph
Mining
  • Wei Jiang and Gagan Agrawal

2
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

4
Background (I)
  • Map-Reduce
  • Simple API: map and reduce
  • Easy to write parallel programs
  • Fault-tolerant for large-scale data centers
  • Performance?
  • Always a concern for the HPC community
  • Generalized Reduction
  • First proposed in FREERIDE, which was developed at
    Ohio State in 2001-2003
  • Shares a similar processing structure with Map-Reduce
  • The key difference lies in a programmer-managed
    reduction object
  • Better performance?

5
Map-Reduce Execution
6
Comparing Processing Structures
  • Reduction Object represents the intermediate
    state of the execution
  • The reduction function is commutative and associative
  • Sorting, grouping, and similar overheads are eliminated
    by the reduction function and reduction object
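
The contrast between the two processing structures can be sketched on a simple counting task. This is an illustrative Python sketch, not code from Ex-MATE or FREERIDE; the function names `map_reduce_count` and `generalized_reduction_count` are hypothetical. The first version materializes, sorts, and groups intermediate pairs; the second updates a reduction object in place and then combines the per-worker objects.

```python
from collections import defaultdict


def map_reduce_count(chunks):
    """Count items the Map-Reduce way: emit pairs, sort/group, then reduce."""
    # Map phase: emit one (key, 1) pair per element.
    pairs = [(item, 1) for chunk in chunks for item in chunk]
    # Shuffle phase: sort and group the intermediate pairs by key.
    pairs.sort(key=lambda kv: kv[0])
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Reduce phase: fold each group into a single value.
    return {key: sum(values) for key, values in grouped.items()}


def generalized_reduction_count(chunks):
    """Count items with a reduction object: no intermediate pairs are stored."""
    local_objects = []
    for chunk in chunks:  # each chunk stands in for one worker's input
        robj = defaultdict(int)  # this worker's reduction object
        for item in chunk:
            robj[item] += 1  # commutative, associative in-place update
        local_objects.append(robj)
    # Combination phase: merge the per-worker reduction objects.
    result = defaultdict(int)
    for robj in local_objects:
        for key, value in robj.items():
            result[key] += value
    return dict(result)


chunks = [["a", "b", "a"], ["b", "c"], ["a"]]
counts_mr = map_reduce_count(chunks)
counts_gr = generalized_reduction_count(chunks)
```

Both functions return the same counts; the difference is that the second never builds or sorts the list of intermediate pairs.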

7
Our Previous Work
  • A comparative study between FREERIDE and Hadoop
  • FREERIDE outperformed Hadoop by factors of 5 to
    10
  • Possible reasons
  • Java vs. C? HDFS overheads? Inefficiency of
    Hadoop?
  • API differences?
  • Developed MATE (Map-Reduce system with an
    AlternaTE API) on top of Phoenix from Stanford
  • Adopted Generalized Reduction
  • Focused on API differences
  • MATE improved on Phoenix by an average of 50%
  • Avoids a large set of intermediate pairs between
    Map and Reduce
  • Reduces memory requirements

8
Extending MATE
  • Main issues of the original MATE
  • Only works on a single multi-core machine
  • Datasets should reside in memory
  • Assumes the reduction object MUST fit in memory
  • This paper extends MATE to address these
    limitations
  • Focus on graph mining: an emerging class of applications
  • These require large reduction objects as well as
    large-scale datasets
  • E.g., PageRank could have an 8GB reduction object!
  • Support for managing arbitrary-sized reduction
    objects
  • Also supports reading disk-resident input data
  • Evaluated Ex-MATE against PEGASUS
  • PEGASUS: a Hadoop-based graph mining system

9
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

10
System Design and Implementation
  • System design of Ex-MATE
  • Execution overview
  • Support of distributed environments
  • System APIs in Ex-MATE
  • One set provided by the runtime: operations on
    reduction objects
  • Another set defined or customized by the users:
    reduction, combination, etc.
  • Runtime in Ex-MATE
  • Data partitioning
  • Task scheduling
  • Other low-level details
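
The split between runtime-provided and user-defined functions can be illustrated with a small sketch. Everything here is hypothetical: `ReductionObject`, `accumulate`, `run`, and the histogram callbacks are names invented for illustration and are not the actual Ex-MATE API. The scheduling is also sequential, whereas the real runtime distributes work across nodes and cores.

```python
class ReductionObject:
    """Runtime-provided container for intermediate state (hypothetical API)."""

    def __init__(self, size, initial=0):
        self.values = [initial] * size

    def accumulate(self, index, value):
        # Runtime-provided operation: add a value into one slot.
        self.values[index] += value


def run(data, num_splits, robj_size, reduction, combination):
    """Toy runtime: partition the data, run the user reduction, then combine."""
    # Data partitioning: round-robin splits (an assumption of this sketch).
    splits = [data[i::num_splits] for i in range(num_splits)]
    local_objects = []
    for split in splits:  # task scheduling: one task per split, run in order
        robj = ReductionObject(robj_size)
        for element in split:
            reduction(robj, element)  # user-defined reduction
        local_objects.append(robj)
    final = local_objects[0]
    for other in local_objects[1:]:
        combination(final, other)  # user-defined combination
    return final


# User-defined functions for a 4-bucket histogram of integers.
def histogram_reduction(robj, element):
    robj.accumulate(element % 4, 1)


def histogram_combination(dst, src):
    for index, value in enumerate(src.values):
        dst.accumulate(index, value)


histogram = run(list(range(10)), 3, 4, histogram_reduction, histogram_combination)
```

The user writes only the two callbacks; partitioning, scheduling, and reduction-object storage stay inside the runtime.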

11
Ex-MATE Runtime Overview
  • Basic one-stage execution

12
Implementation Considerations
  • Support for processing very large datasets
  • Partitioning function
  • Partition and distribute to a number of nodes
  • Splitting function
  • Use the multi-core CPU on each node
  • Management of a large reduction-object (R.O.)
  • Reduce disk I/O!
  • Outputs (R.O.) are updated in a demand-driven way
  • Partition the reduction object into splits
  • Inputs are re-organized based on data access
    patterns
  • Reuse an R.O. split in memory as much as possible
  • Example: Matrix-Vector Multiplication

13
An MV-Multiplication Example
[Figure: the input matrix partitioned into blocks (1, 1), (1, 2), (2, 1), ..., together with the input vector and the output vector]

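
The idea of reusing a reduction-object split can be sketched for matrix-vector multiplication. This is a simplified in-memory illustration under stated assumptions, not the Ex-MATE implementation: the function name `blocked_matvec`, the block layout, and the `loads` counter are hypothetical, and a Python list stands in for the disk-resident output vector. Matrix blocks are grouped by the output split they update, so each split is loaded and written back once.

```python
def blocked_matvec(blocks, vector, n, split_size):
    """Multiply a block-partitioned sparse matrix by a vector.

    blocks maps (row_block, col_block), numbered from 1 as in the slide,
    to a list of (i, j, value) entries with 0-based global indices.
    Returns the output vector and the number of split loads performed.
    """
    output = [0.0] * n  # stands in for the disk-resident reduction object
    loads = 0
    # Re-organize the input: visit blocks grouped by row block, so all blocks
    # that update the same output split are processed back to back.
    for row_block in sorted({rb for rb, _ in blocks}):
        lo = (row_block - 1) * split_size
        split = output[lo:lo + split_size]  # "load" one split into memory
        loads += 1
        for (rb, _cb), entries in sorted(blocks.items()):
            if rb != row_block:
                continue
            for i, j, value in entries:
                split[i - lo] += value * vector[j]
        output[lo:lo + split_size] = split  # "write back" the finished split
    return output, loads


blocks = {
    (1, 1): [(0, 0, 1.0), (1, 1, 2.0)],
    (1, 2): [(0, 2, 3.0)],
    (2, 1): [(2, 0, 4.0), (3, 1, 5.0)],
}
result, loads = blocked_matvec(blocks, [1.0, 2.0, 3.0, 4.0], n=4, split_size=2)
```

Here three matrix blocks are processed with only two split loads, because blocks (1, 1) and (1, 2) share the first output split.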
14
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

15
GIM-V for Graph Mining (I)
  • Generalized Iterative Matrix-Vector
    Multiplication (GIM-V)
  • First proposed at CMU
  • Similar to ordinary matrix-vector multiplication,
    which uses a multiplication, a sum, and an assignment
  • Three operations in GIM-V
  • combine2: combine m(i, j) and v(j)
  • Does not have to be a multiplication
  • combineAll: combine the n partial results for element i
  • Does not have to be a sum
  • assign: assign v(new) to v(i)
  • The previous value of v(i) is replaced by the new
    value

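
One GIM-V iteration can be written as a generic function parameterized by the three operations. This is a minimal sequential sketch; the function name `gim_v`, the dictionary-based sparse matrix, and passing the row index to `combine_all` are choices made for this illustration.

```python
def gim_v(matrix, vector, combine2, combine_all, assign):
    """Run one GIM-V iteration.

    matrix maps (i, j) to m(i, j) for the non-zero entries only.
    """
    n = len(vector)
    partials = {i: [] for i in range(n)}
    for (i, j), m in matrix.items():
        # combine2: combine m(i, j) with v(j) into a partial result for row i.
        partials[i].append(combine2(m, vector[j]))
    # combineAll merges the partial results of row i; assign updates v(i).
    return [assign(vector[i], combine_all(i, partials[i])) for i in range(n)]


# Ordinary matrix-vector multiplication as the simplest instance:
# combine2 is a multiplication, combineAll is a sum, assign keeps the new value.
matrix = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
product = gim_v(
    matrix,
    [10, 20],
    combine2=lambda m, vj: m * vj,
    combine_all=lambda i, xs: sum(xs),
    assign=lambda old, new: new,
)
```

Swapping in different functions for the three operations yields the graph mining algorithms discussed on the following slides.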
16
GIM-V for Graph Mining (II)
  • A set of graph mining applications fit into
    the GIM-V framework
  • PageRank, Diameter Estimation, Finding
    Connected Components, Random Walk with Restart,
    etc.
  • Parallelization of GIM-V
  • Use Map-Reduce in PEGASUS
  • A two-stage algorithm: two consecutive
    map-reduce jobs
  • Use Generalized Reduction in Ex-MATE
  • A one-stage algorithm: simpler code

17
GIM-V Example: PageRank
  • PageRank is used by Google to calculate the
    relative importance of web pages
  • Direct implementation of GIM-V: v(j) is the
    ranking value
  • The three customized operations are a
    multiplication (combine2), a sum (combineAll),
    and an assignment (assign)

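
The slide's formulas were not captured in the transcript. The sketch below follows the GIM-V formulation of PageRank as described for PEGASUS: combine2 multiplies by the damping factor c, combineAll adds (1 - c) / n to the sum, and assign keeps the new value. The helper names, the damping factor of 0.85, and the fixed iteration count are assumptions of this illustration; it also assumes no duplicate edges and no nodes without outgoing edges.

```python
def gim_v(matrix, vector, combine2, combine_all, assign):
    """Run one GIM-V iteration over a sparse matrix {(i, j): m_ij}."""
    n = len(vector)
    partials = {i: [] for i in range(n)}
    for (i, j), m in matrix.items():
        partials[i].append(combine2(m, vector[j]))
    return [assign(vector[i], combine_all(i, partials[i])) for i in range(n)]


def pagerank(edges, n, c=0.85, iterations=50):
    """Compute PageRank with GIM-V; edges are (source, destination) pairs."""
    out_degree = [0] * n
    for src, _dst in edges:
        out_degree[src] += 1
    # Column-normalized, transposed adjacency matrix: m(dst, src) = 1 / outdeg(src).
    matrix = {(dst, src): 1.0 / out_degree[src] for src, dst in edges}
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        ranks = gim_v(
            matrix,
            ranks,
            combine2=lambda m, vj: c * m * vj,               # multiplication
            combine_all=lambda i, xs: (1 - c) / n + sum(xs),  # sum plus teleport term
            assign=lambda old, new: new,                      # assignment
        )
    return ranks


ranks = pagerank([(0, 1), (0, 2), (1, 2), (2, 0)], n=3)
```

In this small graph, node 2 receives links from both other nodes and ends up ranked above node 1, while the ranks continue to sum to one.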
18
GIM-V: Other Algorithms
  • Diameter Estimation: HADI is an algorithm to
    estimate the diameter of a given graph
  • The three customized operations are a
    multiplication (combine2), a bitwise-or
    (combineAll), and a bitwise-or (assign)
  • Finding Connected Components: HCC is a new
    algorithm to find the connected components of
    large graphs
  • The three customized operations are a
    multiplication (combine2), a minimum
    (combineAll), and a minimum (assign)

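
The HCC operations (multiplication, minimum, minimum) can be sketched as follows. This is an illustrative sequential version; the helper names are hypothetical, and it assumes an undirected graph stored as a symmetric 0/1 matrix with each node starting from its own id as its component label.

```python
def gim_v(matrix, vector, combine2, combine_all, assign):
    """Run one GIM-V iteration over a sparse matrix {(i, j): m_ij}."""
    n = len(vector)
    partials = {i: [] for i in range(n)}
    for (i, j), m in matrix.items():
        partials[i].append(combine2(m, vector[j]))
    return [assign(vector[i], combine_all(i, partials[i])) for i in range(n)]


def connected_components(edges, n):
    """Label every node with the smallest node id in its component."""
    matrix = {}
    for a, b in edges:  # undirected graph: store both directions
        matrix[(a, b)] = 1
        matrix[(b, a)] = 1
    labels = list(range(n))
    while True:
        new_labels = gim_v(
            matrix,
            labels,
            combine2=lambda m, vj: m * vj,  # multiplication by a 0/1 entry
            # Minimum over the neighbors' labels; isolated nodes have none.
            combine_all=lambda i, xs: min(xs, default=float("inf")),
            assign=lambda old, new: min(old, new),  # keep the smaller label
        )
        if new_labels == labels:  # converged: no label changed
            return labels
        labels = new_labels


components = connected_components([(0, 1), (1, 2), (3, 4)], n=6)
```

Each iteration propagates the smallest label one hop further, so the loop ends once every node holds the minimum id of its component.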
19
Parallelization of GIM-V (I)
  • Using Map-Reduce: Stage I
  • Map: send M(i,j) and V(j) to reducer j

20
Parallelization of GIM-V (II)
  • Using Map-Reduce: Stage I (cont.)
  • Reduce: map combine2(M(i,j), V(j)) to reducer i

21
Parallelization of GIM-V (III)
  • Using Map-Reduce: Stage II
  • Map

22
Parallelization of GIM-V (IV)
  • Using Map-Reduce: Stage II (cont.)
  • Reduce

23
Parallelization of GIM-V (V)
  • Using Generalized Reduction in Ex-MATE
  • Reduction

24
Parallelization of GIM-V (VI)
  • Using Generalized Reduction in Ex-MATE
  • Finalize
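
The one-stage alternative can be sketched with a reduction phase and a finalize phase. This is an illustrative Python sketch, not Ex-MATE code; the function name and parameters are hypothetical. It assumes combineAll can be expressed as a commutative, associative binary operation (`accumulate`) with an identity element, which holds for sum, minimum, and bitwise-or.

```python
def gim_v_one_stage(matrix_splits, vector, combine2, accumulate, identity, assign):
    """Run one GIM-V iteration with a reduction object instead of shuffling.

    matrix_splits is a list of splits, each a list of (i, j, m_ij) entries.
    """
    n = len(vector)
    local_objects = []
    # Reduction: each split updates its own copy of the reduction object.
    for split in matrix_splits:
        robj = [identity] * n
        for i, j, m in split:
            robj[i] = accumulate(robj[i], combine2(m, vector[j]))
        local_objects.append(robj)
    # Combination: merge the per-split reduction objects element by element.
    merged = [identity] * n
    for robj in local_objects:
        merged = [accumulate(a, b) for a, b in zip(merged, robj)]
    # Finalize: assign the combined values back to the vector.
    return [assign(old, new) for old, new in zip(vector, merged)]


splits = [[(0, 0, 1)], [(0, 1, 2), (1, 1, 3)]]
one_stage_result = gim_v_one_stage(
    splits,
    [10, 20],
    combine2=lambda m, vj: m * vj,
    accumulate=lambda a, b: a + b,
    identity=0,
    assign=lambda old, new: new,
)
```

No per-entry intermediate pairs are produced; the only intermediate state is the reduction object, whose size matches the output vector.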

25
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

26
Experiments Design
  • Applications
  • Three graph mining algorithms
  • PageRank, Diameter Estimation, and Finding
    Connected Components
  • Evaluation
  • Performance comparison with PEGASUS
  • PEGASUS provides a naïve version and an optimized
    version
  • Speedups with an increasing number of nodes
  • Scalability: speedups with increasing dataset
    sizes
  • Experimental platform
  • A cluster of multi-core CPU machines
  • Used up to 128 cores (16 nodes)

July 7, 2019
26
27
Results: Graph Mining (I)
  • PageRank, 16GB dataset: a graph of 256 million
    nodes and 1 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedup: 10.0]

28
Results: Graph Mining (II)
  • HADI, 16GB dataset: a graph of 256 million nodes
    and 1 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedup: 11.0]

29
Results: Graph Mining (III)
  • HCC, 16GB dataset: a graph of 256 million nodes
    and 1 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedup: 9.0]

30
Scalability: Graph Mining (IV)
  • HCC, 8GB dataset: a graph of 256 million nodes
    and 0.5 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedups: 1.7 and 1.9]

31
Scalability: Graph Mining (V)
  • HCC, 32GB dataset: a graph of 256 million nodes
    and 2 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedups: 1.9 and 2.7]

32
Scalability: Graph Mining (VI)
  • HCC, 64GB dataset: a graph of 256 million nodes
    and 4 billion edges

[Figure: average time per iteration (min) vs. number of nodes; annotated speedups: 1.9 and 2.8]

33
Observations
  • Performance trends are similar for all three
    applications
  • Consistent with the fact that all three
    applications are implemented using the GIM-V
    method
  • Ex-MATE outperforms PEGASUS significantly for all
    three graph mining algorithms
  • Reasonable speedups for different datasets
  • Better scalability for larger datasets with an
    increasing number of nodes

34
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

35
Related Work: Academia
  • Evaluation of Map-Reduce-like models in various
    parallel programming environments
  • Phoenix-rebirth for large-scale multi-core
    machines
  • Mars for a single GPU
  • MITHRA for GPGPUs in heterogeneous platforms
  • Recent IDAV for GPU clusters
  • Improvement of Map-Reduce API
  • Integrating pre-fetch and pre-shuffling into
    Hadoop
  • Supporting online queries
  • Enforcing a less restrictive synchronization
    semantics between Map and Reduce

36
Related Work: Industry
  • Google's Pregel system
  • Map-Reduce may not be so suitable for graph
    operations
  • Proposed to target graph processing
  • Open-source version: the HAMA project in Apache
  • Variants of Map-Reduce
  • Dryad/DryadLINQ from Microsoft
  • Sawzall from Google
  • Pig/Map-Reduce-Merge from Yahoo!
  • Hive from Facebook

37
Outline
  • Background
  • System Design of Ex-MATE
  • Parallel Graph Mining with Ex-MATE
  • Experiments
  • Related Work
  • Conclusion

38
Conclusion
  • Ex-MATE supports the management of reduction
    objects of arbitrary sizes
  • Deals with disk-resident reduction objects
  • Outperforms both the naïve and optimized
    implementations of PEGASUS for all three graph
    mining applications
  • Has simpler code
  • Offers a promising alternative for developing
    efficient data-intensive applications
  • Uses GIM-V for parallelizing graph mining

39
Thank You, and Acknowledgments
  • Questions and comments
  • Wei Jiang - jiangwei_at_cse.ohio-state.edu
  • Gagan Agrawal - agrawal_at_cse.ohio-state.edu
  • This project was supported by