Title: pR: Automatic, Transparent Runtime Parallelization of the R Scripting Language
1. pR: Automatic, Transparent Runtime Parallelization of the R Scripting Language
- Jiangtian Li
- Department of Computer Science
- North Carolina State University
2. Acknowledgements
- This project originated from, and is conducted in collaboration with, Dr. Samatova's group at Oak Ridge National Laboratory
  - Dr. Nagiza Samatova
  - Guru Kora
  - Srikanth Yoginath
- Advisors
  - Dr. Xiaosong Ma
  - Dr. Nagiza Samatova
- Supported by grants from
  - NSF
  - DOE
3. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
4. Motivation
- Increasing demand for massive scientific data processing
  - Statistical analysis of gene/protein data (61 billion sequence records in GenBank)
  - Time series analysis of climate data (300 GB for 10 years)
- Widely used computing tools such as R and Matlab are interpreted languages in nature
  - Facilitates runtime parallelization
- Workloads involve both computation-intensive and data-intensive tasks
  - Can exploit both task and data parallelism
5. What is R?
- A portable and extensible software environment as well as an interpreted language
  - Lisp-like read-eval-print loop
- Performs diverse statistical analyses
- Many extension packages are being developed
- Can be used in either interactive or batch mode
6. Example R script: example.R
- Assign an integer
  a <- 1
- Construct a vector of 9 real numbers conforming to the normal distribution
  c <- rnorm(9)
- Initialize a two-dimensional array
  d <- array(0.0, dim=c(9,9))
- Loop: read data from file
  for (i in 1:length(c))
    d[i,] <- matrix(scan(paste("test.data", i, sep="")))
7. Example: batch mode execution
- From the R prompt:
  > source("example.R")
  > a
  [1] 1
  > c
  [1]  1.16808  0.15877  1.40785  1.73696 -1.19267  0.41321
  [7] -0.39817 -0.13059 -0.67247
  > d
       [,1] [,2] [,3] [,4] [,5] ...
  [1,]    0    0    0    0    0
  [2,]    0    0    0    0    0
  ...
- From the shell:
  R CMD BATCH example.R
8. Research Goals
- Propose a runtime framework for parallelizing R
- Provide an automatic and transparent manner of parallel R programming
- Achieve speedup and scalability for R applications, benefiting the R user community
9. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
10. Related Work
- Embarrassingly parallel
  - snow package - Rossini et al.
- Message passing
  - MultiMATLAB - Trefethen et al.
  - pyMPI - Miller
- Back-end support
  - RScaLAPACK - Yoginath et al.
  - Star-P - Choy et al.
- Compilers
  - Otter - Quinn et al.
- Shared memory
  - MATmarks - Almasi et al.
11. Related Work (cont.)
- Parallelizing compilers
  - SUIF - Hall et al.
  - Polaris - Blume et al.
- Runtime parallelization
  - Jrpm - Chen et al.
- Dynamic compilation
  - DyC - Grant et al.
12. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
13. Design Rationale
- Most R codes consist of high-level pre-built functions, e.g., svd for singular value decomposition, eigen for eigenvalue and eigenvector computation
- Loops usually have few inter-iteration dependences and a high per-iteration execution cost, e.g., in R applications from Bioconductor
- No pointers, hence no aliasing problem
14. Approach
- Selective parallelization scheme that focuses on function calls and loops
- Dynamic and incremental dependency analysis, with runtime evaluation pauses where dependency cannot be determined statically (e.g., dynamic loop bounds, conditional branches)
- Master-worker paradigm to reduce scheduling and data communication overhead
  - Outsource expensive tasks, i.e., function calls and loops, to workers
  - Data are distributed among the workers
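As an illustration of this selective scheme (a sketch of mine, not pR's internals), consider the two loop shapes from the example script: one with a loop-carried dependence that must stay sequential, and one whose iterations are independent and can be outsourced to workers.

```r
a <- 1
c <- rnorm(9)
d <- array(0.0, dim = c(9, 9))

# Loop-carried dependence: iteration i reads c[i-1], which iteration i-1
# writes, so this loop must run sequentially on one process.
for (i in 2:length(c)) c[i] <- c[i - 1] + a

# No inter-iteration dependence: each iteration writes only row i of d,
# so the iterations can be distributed across workers.
for (i in 1:length(c)) d[i, ] <- rnorm(9)
```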
15. Framework Architecture
- Inter-node communication: MPI
- Inter-process communication: domain sockets
16. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
17. Analyzer
- Input: R script
- Output: Task Precedence Graph
- Task: the finest unit of scheduling
- Identifies precedence relationships among tasks
18. Parsing
- Identify the basic execution unit: the R statement
- Retrieve expressions such as variable names and array subscripts
- Output: parse tree
19. An example of a parse tree
20. Dependence Analysis
- Identify tasks, the finest units of scheduling
- Statement dependence analysis
- Loop dependence analysis: GCD test
- Incremental analysis
  - Pause at points where runtime information is needed for dependence analysis or a branch decision
  - Obtain runtime evaluation results and proceed
- Output: Task Precedence Graph
  - Vertex: task
  - Edge: dependence
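A sketch of the GCD test named above (my own illustration): for two accesses a[c1*i + d1] and a[c2*i + d2] inside a loop, a cross-iteration dependence is possible only if gcd(c1, c2) divides d2 - d1.

```r
gcd <- function(x, y) if (y == 0) abs(x) else gcd(y, x %% y)

# TRUE  -> a dependence may exist (keep the loop sequential)
# FALSE -> the accesses can never touch the same element (parallelizable)
gcd_test <- function(c1, d1, c2, d2) (d2 - d1) %% gcd(c1, c2) == 0

gcd_test(2, 0, 2, 1)  # a[2*i] vs a[2*i+1]: FALSE, iterations independent
gcd_test(1, 0, 1, -1) # c[i]   vs c[i-1]:   TRUE, possible dependence
```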
21. Loop Parallelization
- Parallelize a loop if no dependence is discovered
- Execute it in an embarrassingly parallel manner
- Adjust the Task Precedence Graph
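One way the graph adjustment can be pictured (an assumed sketch, not pR's scheduler): the iteration space of a dependence-free loop is split into sub-ranges, each of which becomes its own task node.

```r
# Split the iteration space 1:n into one contiguous chunk per worker;
# each chunk would replace the original loop node as a separate task.
n <- 9
workers <- 3
chunks <- split(1:n, cut(1:n, workers, labels = FALSE))
# chunks holds 1:3, 4:6, and 7:9, one sub-range per worker
```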
22. A running example

23. [Figure: Task Precedence Graph for the running example. Each statement below is a task (tasks 1-6 in the figure); edges are dependences, and pause points mark where evaluation stops until runtime values are known.]
- a <- 1
- b <- 2
- c <- rnorm(9)
- d <- array(0.0, dim=c(9,9))
- for (i in 1:length(c)) d[i,] <- matrix(scan(paste("test.data", i, sep="")))
- for (i in b:length(c)) c[i] <- c[i-1] + a
  - pause point: the bound b is known only at runtime; once resolved, the loop becomes for (i in 2:9) c[i] <- c[i-1] + a
- if (c[length(c)] > 10) e <- eigen(d) else e <- sum(c)
  - pause point: the branch outcome depends on runtime values
24. Parallel Execution Engine
- Dispatches ready tasks
- Outsources expensive tasks (loops or function calls) to workers
- Coordinates peer-to-peer data communication and monitors execution status
- Updates the analyzer with runtime results
25. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
26. Ease-of-Use Demonstration
- Comparison of pR and snow (an R add-on package)
  - pR: no user modification of the source code
  - snow: the user plugs in API calls
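To make the contrast concrete, here is a hedged sketch (mine, not from the slides) of how snow requires explicit cluster management in the script, whereas pR runs the plain sequential code unchanged:

```r
# With snow, the user rewrites the code around the cluster API:
library(snow)
cl <- makeCluster(4, type = "SOCK")   # user creates the cluster
res <- clusterApply(cl, 1:100, function(i) mean(rnorm(1000)))
stopCluster(cl)                       # user tears it down

# Under pR, the equivalent sequential loop is left untouched and is
# parallelized automatically at runtime:
# res <- list(); for (i in 1:100) res[[i]] <- mean(rnorm(1000))
```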
27. Performance
- Testbed
  - Opt cluster: 16 nodes, each with dual Opteron 265 (dual-core) processors, 1 Gbps Ethernet
  - Fedora Core 5 Linux x86_64 (Linux kernel 2.6.16)
- Benchmarks
  - Boost, a statistics application
  - Bootstrap
  - SVD
28. Boost
- Analysis overhead is very small
- From 16 to 32 processors, the incremental computation speedup drops to 1.5
29. Bootstrap
30. SVD
- Analysis overhead is very small
- Serialization of large data sets in R is the major overhead (1.9 MB/s)
31. Task Parallelism Test
- Statistical functions
  - prcomp: principal component analysis
  - svd: singular value decomposition
  - lm.fit: linear model fitting
  - cor: correlation computation
  - fft: Fast Fourier Transform
  - qr: QR decomposition
- Execution time of each task ranges from 3 to 27 seconds
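A sketch of the shape of this test (my own reconstruction, with made-up data sizes): several mutually independent statements, each invoking one of the listed functions, so a runtime dependence analysis can dispatch them as concurrent tasks.

```r
x <- matrix(rnorm(200 * 50), nrow = 200)
y <- rnorm(200)

# Independent statements: no result feeds another statement, so each
# can be outsourced to a different worker.
p <- prcomp(x)    # principal component analysis
s <- svd(x)       # singular value decomposition
l <- lm.fit(x, y) # linear model fitting
r <- cor(x)       # correlation matrix
f <- fft(y)       # fast Fourier transform
q <- qr(x)        # QR decomposition
```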
32. Outline
- Motivation
- Background
- Architecture
- Design
- Performance
- Conclusion and Future Work
33. Future Work
- Apply loop transformation techniques
- Intelligent scheduling to exploit data locality
- Explore finer-grained interprocedural parallelization
- Load balancing
- Optimize high-level R functions such as serialization
34. Conclusion
- Presented the pR framework, a first step toward parallelizing R automatically and transparently
- Optimization is needed to improve efficiency