pR: Automatic, Transparent Runtime Parallelization of the R Scripting Language

Jiangtian Li. Department of Computer Science. North Carolina State University. 8/20/09

1
pR: Automatic, Transparent Runtime Parallelization of the R Scripting Language
  • Jiangtian Li
  • Department of Computer Science
  • North Carolina State University

2
Acknowledgement
  • This project originated from, and is carried out in collaboration with,
    Dr. Samatova's group at Oak Ridge National Lab
    • Dr. Nagiza Samatova
    • Guru Kora
    • Srikanth Yoginath
  • Advisors
    • Dr. Xiaosong Ma
    • Dr. Nagiza Samatova
  • Supported by grants from
    • NSF
    • DOE

3
Outline
  • Motivation
  • Background
  • Architecture
  • Design
  • Performance
  • Conclusion and Future Work

4
Motivation
  • Increasing demand for massive scientific data processing
    • Statistical analysis of gene/protein data (61 billion sequence
      records in GenBank)
    • Time series analysis of climate data (300 GB for 10 years)
  • Widely used computing tools such as R and MATLAB are interpreted
    languages by nature
    • This facilitates runtime parallelization
  • Workloads involve both computation-intensive and data-intensive tasks
    • Can exploit both task and data parallelism

5
What is R?
  • Portable and extensible software as well as an
    interpreted language
  • Lisp-like: read-eval-print loop
  • Performs diverse statistical analyses
  • Many extension packages are being developed
  • Can be used in either interactive mode or batch
    mode

6
Example R script: example.R
    # Assign an integer
    a <- 1
    # Construct a vector of 9 real numbers
    # conforming to normal distribution
    c <- rnorm(9)
    # Initialize a two-dimensional array
    d <- array(0:0, dim=c(9,9))
    # Loop, read data from file
    for (i in 1:length(c))
      d[i,] <- matrix(scan(paste("test.data", i, sep="")))

7
Example: batch mode execution
  • From R prompt
      > source("example.R")
      > a
      [1] 1
      > c
      [1]  1.16808  0.15877  1.40785  1.73696 -1.19267  0.41321
      [7] -0.39817 -0.13059 -0.67247
      > d
           [,1] [,2] [,3] [,4] [,5]
      [1,]    0    0    0    0    0
      [2,]    0    0    0    0    0
  • From shell
      R CMD BATCH example.R

8
Research Goal
  • Propose a runtime framework for parallelizing R
  • Provide an automatic and transparent approach to parallel R
    programming
  • Achieve speedup and scalability for R applications, to the benefit of
    R community users

10
Related Work
  • Embarrassingly parallel
    • snow package - Rossini et al.
  • Message passing
    • MultiMATLAB - Trefethen et al.
    • pyMPI - Miller
  • Back-end support
    • RScaLAPACK - Yoginath et al.
    • Star-P - Choy et al.
  • Compilers
    • Otter - Quinn et al.
  • Shared memory
    • MATmarks - Almasi et al.

11
Related Work
  • Parallelizing compilers
    • SUIF - Hall et al.
    • Polaris - Blume et al.
  • Runtime parallelization
    • Jrpm - Chen et al.
  • Dynamic compilation
    • DyC - Grant et al.

13
Design Rationale
  • Most R code consists of high-level pre-built functions, e.g., svd for
    singular value decomposition and eigen for eigenvalue and eigenvector
    computation
  • Loops usually have fewer inter-iteration dependencies and a higher
    per-iteration execution cost, e.g., in R applications from Bioconductor
  • No pointers, hence no aliasing problem

14
Approach
  • Selective parallelization scheme that focuses on function calls and
    loops
  • Dynamic and incremental dependence analysis that pauses for runtime
    evaluation where a dependence cannot be determined statically, such as
    at a dynamic loop bound or a conditional branch
  • Master-worker paradigm to reduce scheduling and data communication
    overhead
  • Outsource expensive tasks, i.e., function calls and loops, to workers
  • Data are distributed across workers

15
Framework Architecture
  • Inter-node communication: MPI
  • Inter-process communication: domain sockets

17
Analyzer
  • Input: R script
  • Output: Task Precedence Graph
  • Task: finest unit in scheduling
  • Identifies precedence relationships among tasks
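
A minimal sketch of how statement-level read/write sets could yield precedence edges, assuming every statement is a simple assignment. It is written in base R for illustration only and is not pR's actual analyzer; the names `script`, `rw_sets`, and `edges` are hypothetical.

```r
# Hypothetical input script (never evaluated here, only parsed).
script <- "
a <- 1
b <- 2
c <- rnorm(9)
d <- array(0, dim = c(9, 9))
e <- sum(c) + a
"
stmts <- as.list(parse(text = script))

# For each assignment: the variable written (base name on the left-hand side)
# and the variables read (everything else; function names are excluded).
rw_sets <- lapply(stmts, function(s) {
  lhs <- all.vars(s[[2]])
  list(write = lhs[1], read = union(lhs[-1], all.vars(s[[3]])))
})

# Add an edge i -> j (i earlier than j) on a flow, anti, or output dependence.
edges <- list()
for (j in seq_along(stmts)) {
  for (i in seq_len(j - 1)) {
    flow   <- rw_sets[[i]]$write %in% rw_sets[[j]]$read
    anti   <- rw_sets[[j]]$write %in% rw_sets[[i]]$read
    output <- rw_sets[[i]]$write == rw_sets[[j]]$write
    if (flow || anti || output) edges[[length(edges) + 1]] <- c(i, j)
  }
}
```

In this sketch only the last statement depends on earlier ones (it reads `a` and `c`), so the first four statements could be scheduled independently.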

18
Parsing
  • Identify the basic execution unit: the R statement
  • Retrieve expressions such as variable names and array subscripts
  • Output: parse tree

19
An example of a parse tree
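
As a rough illustration of what such a tree contains, base R's own parser can be used to inspect a statement. This is a sketch using `parse()` and `all.vars()` from base R; the helper `print_tree` and the variables `written_vars` and `read_vars` are hypothetical names, and pR's internal representation may differ.

```r
# Parse one statement into an unevaluated call object.
stmt <- parse(text = "c[i] <- c[i - 1] + a")[[1]]

# Top-level node: the assignment operator with its two operands.
op  <- as.character(stmt[[1]])   # the operator name
lhs <- stmt[[2]]                 # c[i]
rhs <- stmt[[3]]                 # c[i - 1] + a

# Variable written by the statement, and variables it reads
# (including those used in the left-hand subscript).
written_vars <- all.vars(lhs)[1]
read_vars    <- union(all.vars(rhs), all.vars(lhs)[-1])

# Print the tree: each call shows its operator, then its operands indented.
print_tree <- function(node, depth = 0) {
  indent <- strrep("  ", depth)
  if (is.call(node)) {
    cat(indent, deparse(node[[1]]), "\n", sep = "")
    for (arg in as.list(node)[-1]) print_tree(arg, depth + 1)
  } else {
    cat(indent, deparse(node), "\n", sep = "")
  }
}

print_tree(stmt)
```
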
20
Dependence analysis
  • Identify tasks: the finest unit in scheduling
    • Statement dependence analysis
    • Loop dependence analysis: GCD test
  • Incremental analysis
    • Pause at points where runtime information is needed for dependence
      analysis or a branch decision
    • Obtain runtime evaluation results and proceed
  • Output: Task Precedence Graph
    • Vertex: task
    • Edge: dependence
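
The GCD test named above can be sketched as follows, assuming affine subscripts: a loop writes `x[a*i + b]` and reads `x[c*j + d]`, and a dependence is possible only if gcd(a, c) divides d - b. The function names `gcd` and `gcd_test` are hypothetical, and the slides do not say how pR implements the test. The test ignores loop bounds, so a TRUE result means "may depend", not "does depend".

```r
# Greatest common divisor by Euclid's algorithm.
gcd <- function(x, y) {
  x <- abs(x)
  y <- abs(y)
  while (y != 0) {
    r <- x %% y
    x <- y
    y <- r
  }
  x
}

# TRUE if a write to x[a*i + b] and a read of x[c*j + d] may touch the same
# element for some integers i and j; FALSE if they provably never do.
gcd_test <- function(a, b, c, d) {
  g <- gcd(a, c)
  if (g == 0) return(b == d)   # both subscripts are constants
  (d - b) %% g == 0
}

# c[i] <- c[i - 1] + a : write subscript i, read subscript i - 1.
may_depend_shift <- gcd_test(1, 0, 1, -1)

# x[2*i] <- x[2*i + 1] : even elements written, odd elements read.
may_depend_parity <- gcd_test(2, 0, 2, 1)
```

The first case cannot be ruled out by the test, so the loop stays sequential; the second is proven independent and is a candidate for parallel execution.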

21
Loop parallelization
  • Parallelize a loop if no dependence is discovered
  • Executed in an embarrassingly parallel manner
  • Adjust the Task Precedence Graph accordingly
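
A minimal sketch of splitting an independent loop into contiguous blocks of iterations, assuming the loop body depends only on its own index. `split_iterations` and `row_for` are hypothetical names, and `lapply` stands in for dispatch to workers; the slides do not describe pR's actual partitioning policy.

```r
# Split the iteration range lo:hi into at most n_chunks contiguous blocks.
split_iterations <- function(lo, hi, n_chunks) {
  iters <- lo:hi
  n_chunks <- min(n_chunks, length(iters))
  block <- ceiling(seq_along(iters) * n_chunks / length(iters))
  unname(split(iters, block))
}

# Loop body with no inter-iteration dependence: row i depends only on i.
row_for <- function(i) rep(i, 9)

chunks <- split_iterations(1, 9, 2)

# Each chunk could run on a different worker; here they run one after another.
partial <- lapply(chunks, function(idx) t(sapply(idx, row_for)))

# Combine the partial results in iteration order.
d <- do.call(rbind, partial)
```

Because the blocks are contiguous and combined in order, the result matches what the sequential loop would produce.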

22
A running example
23
[Figure: Task Precedence Graph for the running example, showing tasks 1 to 6 and a pause point. Statements recoverable from the figure:]
    a <- 1
    b <- 2
    c <- rnorm(9)
    d <- array(0:0, dim=c(9,9))
    for (i in b:length(c)) c[i] <- c[i-1] + a
    for (i in 1:length(c)) d[i,] <- matrix(scan(paste("test.data", i, sep="")))
    if (c[length(c)] > 10) e <- eigen(d) else e <- sum(c)
[After runtime evaluation and loop splitting, the figure also shows:]
    for (i in 2:9) c[i] <- c[i-1] + a
    for (i in 1:5) d[i,] <- matrix(scan(paste("test.data", i, sep="")))

24
Parallel Execution Engine
  • Dispatch ready tasks
  • Outsource expensive tasks (loops or function
    calls) to workers
  • Coordinate peer-to-peer data communication and
    monitor execution status
  • Update analyzer with runtime results
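
A minimal sketch of dispatching ready tasks from a precedence graph, assuming a task is ready once all of its predecessors have finished. The graph in `deps` is hypothetical: it numbers the seven statements of the running example in script order and does not reproduce the task numbering on the slide. Real dispatch in pR would also involve workers and runtime feedback, which this sketch omits.

```r
# deps[[k]] lists the tasks that must finish before task k can start.
deps <- list(
  integer(0),      # 1: a <- 1
  integer(0),      # 2: b <- 2
  integer(0),      # 3: c <- rnorm(9)
  integer(0),      # 4: d <- array(...)
  c(1L, 2L, 3L),   # 5: loop updating c (reads a, b, c)
  c(3L, 4L),       # 6: loop filling d (reads length(c), writes d)
  c(5L, 6L)        # 7: if/else on c and d
)

done  <- integer(0)
waves <- list()   # tasks that become ready together
while (length(done) < length(deps)) {
  ready <- Filter(
    function(task) !(task %in% done) && all(deps[[task]] %in% done),
    seq_along(deps)
  )
  stopifnot(length(ready) > 0)   # a cycle would leave nothing ready
  waves[[length(waves) + 1]] <- ready
  done <- c(done, ready)
}
```

Tasks within one wave have no edges between them, so they are the ones that could be handed to different workers at the same time.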

26
Ease of use demonstration
  • Comparison of pR and snow (an R add-on package)
    • pR: no user modification of source code
    • snow: user inserts API calls

27
Performance
  • Testbed
    • Opt cluster: 16 nodes, 2 cores, dual Opteron 265, 1 Gbps Ethernet
    • Fedora Core 5 Linux x86_64 (Linux kernel 2.6.16)
  • Benchmarks
    • Boost: a statistics application
    • Bootstrap
    • SVD

28
Boost
  • Analysis overhead is very small
  • From 16 to 32 processors, computation speedup
    drops to 1.5

29
Bootstrap
30
SVD
  • Analysis overhead is very small
  • Serialization of large data sets in R is the major overhead (1.9 MB/s)
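
The serialization cost mentioned above can be observed with base R's `serialize()` and `unserialize()`. This is only a sketch: the matrix size is an arbitrary assumption, and timings on current hardware and R versions will not match the figure reported on the slide.

```r
# A moderately sized numeric matrix (size chosen arbitrarily for illustration).
x <- matrix(rnorm(1e5), nrow = 1000)

# Serialize to a raw vector in memory and time the call.
elapsed <- system.time(bytes <- serialize(x, connection = NULL))[["elapsed"]]

# Size of the serialized form in megabytes.
size_mb <- length(bytes) / 2^20

# Round trip: the receiving side would rebuild the object like this.
y <- unserialize(bytes)
```

Dividing `size_mb` by `elapsed` gives an approximate throughput whenever the elapsed time is large enough to be measured.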

31
Task Parallelism Test
  • Statistical functions
    • prcomp: principal component analysis
    • svd: singular value decomposition
    • lm.fit: linear model fitting
    • cor: correlation computation
    • fft: Fast Fourier Transform
    • qr: QR decomposition
  • Execution time of each task ranges from 3 to 27 seconds

33
Future work
  • Apply loop transformation techniques
  • Intelligent scheduling to exploit data locality
  • Explore finer-granularity, interprocedural parallelization
  • Load balancing
  • Optimize high-level R functions such as serialization

34
Conclusion
  • Presented the pR framework, a first step toward parallelizing R
    automatically and transparently
  • Optimization is needed to improve efficiency