pArray as an Efficient Static Parallel Container in STAPL (Standard Template Adaptive Parallel Library)

1
pArray as an Efficient Static Parallel
Container in STAPL (Standard Template Adaptive
Parallel Library)

Presenter: Olga Tkachyshyn
Grad Student Advisors: Ping An, Gabriel Tanase
Faculty Advisor: Nancy Amato

2
Presentation Plan

Motivation
STAPL Overview
pContainer Design
pArray
Prefix-sum using pArray and pVector
Performance results
Conclusion and Future Work

3
Motivation

The time it takes to complete a task is limited
by the speed of the worker
Alternative: share the task among several workers

Similarly, the time it takes to solve a problem
on a computer is limited by the speed of the
processor
Alternative: parallel processing, the
concurrent use of multiple processors to process
data

4
Parallel/Distributed Architecture

Multiple processors are connected together
A processor can have its own memory or share
memory with another processor

[Figure: Processors 0, 1, and 2, each with a cache (Cache 0, 1, and 2), connected to Memory 0 and Memory 1]
5
Motivation

Powerful parallel computers can solve
hard-to-compute problems, for example:
Computational physics
Protein folding
Parallel programming is challenging because of
communication and synchronization issues
Parallel libraries reduce the complexity of
parallel programming

6
STAPL Introduction

The Parasol Lab in the Computer Science
Department at TAMU is developing a Standard
Template Adaptive Parallel Library (STAPL)
STAPL is designed as a platform-independent
parallel library
STAPL provides a collection of parallel
containers (generic distributed data structures)
that are efficient and easy to use

7
STAPL Overview

STAPL is a C++ parallel library designed as a
superset of the Standard Template Library (STL).

STAPL simplifies parallel programming by letting
the user ignore distributed machine details
such as
data partitioning
distribution
communication

8
STAPL Main Components

pContainer
Generic distributed data structure
STAPL requires an efficient array data structure
for numerically intensive applications
pRange
Presents an abstract view of a scoped data space,
which allows random access to a partition or
subrange of the data in a pContainer
pAlgorithms
Parallel algorithms that provide basic
functionality and are bound to the pContainer
through the pRange

9
pContainer Basic Design

pContainers are data structures that allow users
to store and use distributed data as if it were
stored in a single memory
All pContainers have similar functionality
pContainers have three basic components
Base pContainer
Base Distribution Manager
Base Sequential Container Interface

10
Base Distribution Manager

Base Distribution Manager is responsible for
locating elements (finding the memory containing
the element)
Each pContainer element has a unique global
identifier (GID)

Local element: Processor 0 needs the element with
GID 2
Remote element: Processor 0 needs the element
with GID 4

      Processor 0   Processor 1
GID   0 1 2         3 4 5
Data  7 3 6         4 1 2
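
The local/remote distinction can be sketched in a few lines of C++. This is only an illustration, assuming a block distribution in which every processor stores the same number of consecutive GIDs (3 per processor in the example). The names owner_of and is_local are hypothetical and are not part of STAPL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not STAPL's API: under a block distribution where
// every processor stores `block` consecutive GIDs, the processor holding
// a GID is gid / block (integer division).
std::size_t owner_of(std::size_t gid, std::size_t block) {
    assert(block > 0);  // a processor must hold at least one element
    return gid / block;
}

// An element is local when the requesting processor also stores it.
bool is_local(std::size_t gid, std::size_t block, std::size_t my_pid) {
    return owner_of(gid, block) == my_pid;
}
```

With 3 elements per processor, is_local(2, 3, 0) is true (GID 2 lives on Processor 0), while is_local(4, 3, 0) is false because GID 4 lives on Processor 1.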
11
Base Sequential Container

A pContainer is composed of sequential containers
Base Sequential Container Interface/Part provides
a uniform interface to easily build pContainers
from different sequential containers

12
Base pContainer

Generic methods to construct a pContainer
Methods to add, access, modify elements
Methods to efficiently locate elements

14
pArray Introduction

An array
a data structure with a fixed (unchangeable) size

Index   0  1  2  3  4  5  6  7  8  9
Value  13 98 56 45  0 45 77 38 23 52

Elements can be accessed randomly using their
index
array[5] = 45
Arrays are useful for numerically intensive
applications
In the C++ STL there is no fixed-size array container
The C++ vector allows insertion and deletion of
elements in the middle and thus is hard to
optimize
We have designed a pArray for STAPL for this
purpose

15
pArray Basic Design

pArray is derived from the base classes of the
pContainer
Three Major Components
Array Part
Array Distribution
pArray

16
Array Distribution

Responsible for locating local and remote
elements
Two ways this can be done:
Duplicated Distribution Information
Each processor has information about where all
the elements are
Decentralized Distribution Information
Each processor is responsible for keeping track
of the location of an equal share of the
elements

17
Duplicated Distribution Information

Array distribution information is stored in a
vector of pair< pair<Start_Index, Size>, Processor ID >
Each processor has a copy of the distribution
vector
Lookup process: look in the distribution vector and
check whether the GID falls in the range of an entry

Processor 0: Data with GID 0 1 6 7
Processor 1: Data with GID 2 3 4 5
Distribution Vector, (Start_Index, Size) -> PID, identical on both processors:
(0, 2) -> 0   (2, 4) -> 1   (6, 2) -> 0
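
The lookup over a duplicated distribution vector can be sketched as a linear scan. The entry layout, ((Start_Index, Size), Processor ID), follows the slide. The function names find_owner and slide_distribution are hypothetical, as is the choice to return -1 for a GID that no entry covers.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Entry layout taken from the slide: ((Start_Index, Size), Processor ID).
using Entry = std::pair<std::pair<std::size_t, std::size_t>, int>;

// Hypothetical lookup: scan the replicated distribution vector and return
// the processor whose range [start, start + size) contains the GID,
// or -1 if no entry covers it.
int find_owner(const std::vector<Entry>& distribution, std::size_t gid) {
    for (const Entry& e : distribution) {
        assert(e.second >= 0);  // -1 is reserved for "not found"
        const std::size_t start = e.first.first;
        const std::size_t size = e.first.second;
        if (gid >= start && gid < start + size) {
            return e.second;
        }
    }
    return -1;
}

// The distribution vector from the slide: (0, 2)->0, (2, 4)->1, (6, 2)->0.
std::vector<Entry> slide_distribution() {
    return {{{0, 2}, 0}, {{2, 4}, 1}, {{6, 2}, 0}};
}
```

The scan is linear in the number of entries, which is the cost the comparison slide refers to when the distribution information is large.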
18
Decentralized Distribution Information

Evenly divide the array into segments
Each processor is responsible for knowing the
location of one segment

Example: Processor 0 needs the element with GID 5
The algorithm
1. Lookup GID
2. Is Local? If yes, access the data through the local Distribution
3. If no: Is in Cache? If yes, use the processor stored in the Location Cache
4. If no: get the location information from the Map Owner and cache it locally
   MapOwner = GID * nprocs / n, here 5 * 2 / 8 = 1

Processor 0: Data with GID 0 1 6 7
Processor 1: Data with GID 2 3 4 5

Location Cache (GID -> Proc), example entry: GID 3 -> Proc 1

Location Map on Processor 0
GID   0 1 2 3
Proc  0 0 1 1
Location Map on Processor 1
GID   4 5 6 7
Proc  1 1 0 0
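
The lookup flow can be sketched as a single-process simulation. It reads the slide's map owner rule as MapOwner = GID * nprocs / n with integer division. Everything else is an assumption for illustration: the Locator type, the owner_by_gid vector that stands in for all location maps (a real system would send a message to the map owner), and the remote_queries counter.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Map owner rule as read from the slide (integer division),
// e.g. 5 * 2 / 8 = 1 for GID 5, 2 processors, 8 elements.
std::size_t map_owner(std::size_t gid, std::size_t nprocs, std::size_t n) {
    assert(n > 0 && gid < n);
    return gid * nprocs / n;
}

// Hypothetical simulation of one processor's lookup state.
struct Locator {
    std::size_t my_pid;
    std::size_t nprocs;
    std::vector<std::size_t> owner_by_gid;  // stands in for all location maps
    std::unordered_map<std::size_t, std::size_t> cache;  // location cache
    std::size_t remote_queries;  // lookups answered by another processor's map

    Locator(std::size_t pid, std::size_t procs, std::vector<std::size_t> owners)
        : my_pid(pid), nprocs(procs), owner_by_gid(std::move(owners)),
          remote_queries(0) {}

    std::size_t locate(std::size_t gid) {
        const std::size_t n = owner_by_gid.size();
        assert(gid < n);
        // Is local? (Simulation shortcut: real code would check local storage.)
        if (owner_by_gid[gid] == my_pid) return my_pid;
        // Is in cache?
        const auto hit = cache.find(gid);
        if (hit != cache.end()) return hit->second;
        // Ask the map owner, then cache the answer locally.
        if (map_owner(gid, nprocs, n) != my_pid) ++remote_queries;
        const std::size_t proc = owner_by_gid[gid];
        cache[gid] = proc;
        return proc;
    }
};
```

With the slide's layout (owners 0 0 1 1 1 1 0 0 for GIDs 0 to 7), the first locate(5) on Processor 0 consults the map owner, Processor 1, and the second call is served from the location cache.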
19
Duplicated Distribution Information vs.
Decentralized Distribution Information

Duplicated Distribution Information
PROs: Each processor has information about the location of each element, so there is no need to request information remotely
CONs: Distribution information is duplicated, which is potentially space consuming; if the distribution info is large, the search is slow

Decentralized Distribution Information
PROs: Location information is distributed and not duplicated, which saves space
CONs: May need to look up the location information of an element remotely, which is slower
20
Array Part

Implemented as a wrapper over the sequential STL
container valarray
Has all of the functionality of valarray
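
A wrapper of this kind might look like the following sketch. It is not the STAPL ArrayPart. The class name ArrayPartSketch, its members, and the choice to address elements by GID relative to a starting GID are all assumptions made for illustration. Only std::valarray itself is a real library type.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical sketch of a "part": one processor's block of a distributed
// array, stored in a std::valarray and addressed by global identifier (GID).
template <typename Data>
class ArrayPartSketch {
public:
    // std::valarray<Data>(size) value-initializes `size` elements.
    ArrayPartSketch(std::size_t start_gid, std::size_t size)
        : start_(start_gid), data_(size) {}

    bool contains(std::size_t gid) const {
        return gid >= start_ && gid < start_ + data_.size();
    }
    Data get(std::size_t gid) const {
        assert(contains(gid));
        return data_[gid - start_];
    }
    void set(std::size_t gid, const Data& value) {
        assert(contains(gid));
        data_[gid - start_] = value;
    }
    // valarray functionality can be forwarded, e.g. the sum of the block.
    Data local_sum() const {
        assert(data_.size() > 0);  // valarray::sum needs a non-empty array
        return data_.sum();
    }

private:
    std::size_t start_;         // GID of the first local element
    std::valarray<Data> data_;  // local storage
};
```

For example, a part created as ArrayPartSketch<int>(3, 3) could hold the elements with GIDs 3, 4, and 5 from the earlier Processor 1 example.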

21
pArray Class

BasePContainer instantiates ArrayPart and
ArrayDistribution
pArray class is derived from the Base pContainer
to implement the functionality specific to the
pArray

22
pArray Class

class pArray {
public:
  //constructors
  pArray();                                  //default constructor
  pArray(int size);                          //constructor with the default distribution
  pArray(int size, ArrayDistribution distr); //constructor with a specified distribution

  //element access methods
  Data GetElement(GID);       //returns the element with the specified GID
  void SetElement(GID, Data); //sets the specified location to the given value

  //operators and array specific methods
  Data operator[](GID);           //index array access operator
  pArray operator+(Data scalar);  //adds a scalar to the pArray
  pArray operator+(pArray array); //adds two pArrays of the same size term by term (undefined otherwise)
                                  //returns an array with the same distribution as the calling array
  pArray operator*(Data scalar);  //multiplies the pArray by a scalar
  pArray operator*(pArray array); //multiplies two pArrays of the same size term by term (undefined otherwise)
                                  //returns an array with the same distribution as the calling array
  Data accumulate();              //sums up all the values stored in the pArray
  Data dotproduct(pArray array);  //dot product of two pArrays of the same size (undefined otherwise)
  long double euclideannorm();    //Euclidean norm of a pArray
};
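
To make the meaning of the array-specific methods concrete, here is a sequential sketch of what accumulate, dotproduct, and euclideannorm compute, written against std::valarray. These free functions are hypothetical stand-ins and not the pArray implementation, which would combine per-processor partial results.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Sum of all values (sequential stand-in for accumulate()).
double accumulate_all(const std::valarray<double>& a) {
    assert(a.size() > 0);  // valarray::sum needs a non-empty array
    return a.sum();
}

// Sum of term-by-term products (stand-in for dotproduct()).
double dot_product(const std::valarray<double>& a,
                   const std::valarray<double>& b) {
    assert(a.size() == b.size() && a.size() > 0);
    const std::valarray<double> products = a * b;  // element-wise product
    return products.sum();
}

// Square root of the dot product of the array with itself
// (stand-in for euclideannorm()).
long double euclidean_norm(const std::valarray<double>& a) {
    return std::sqrt(static_cast<long double>(dot_product(a, a)));
}
```

For the array (3, 4), the sum is 7 and the Euclidean norm is 5; its dot product with (1, 2) is 11.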

24
Prefix Sums

One of the most basic parallel algorithms
Used in other parallel algorithms such as sorts

The prefix sums of a sequence $S = (x_1, x_2, \dots, x_n)$ of n
elements are the n partial sums defined by
$P_i = x_1 + x_2 + \dots + x_i, \quad 1 \le i \le n$

Sequential Algorithm
S[n] //original array
P[n] //prefix sums
P[0] = S[0]
for (int i = 1; i < n; i++)
  P[i] = P[i-1] + S[i]

Index           0  1  2  3  4
Original Array  2  4  3  5  1
Prefix Sums     2  6  9 14 15
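
A runnable version of the sequential algorithm is sketched below, next to the equivalent call to the standard library's std::partial_sum. The function names are hypothetical. The loop assumes a non-empty input because, like the slide's version, it starts from the first element.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// The slide's loop, written out: P[0] = S[0], then P[i] = P[i-1] + S[i].
std::vector<int> prefix_sums_loop(const std::vector<int>& s) {
    assert(!s.empty());  // the loop reads S[0], so the input must be non-empty
    std::vector<int> p(s.size());
    p[0] = s[0];
    for (std::size_t i = 1; i < s.size(); ++i) {
        p[i] = p[i - 1] + s[i];
    }
    return p;
}

// The same inclusive scan using the standard library.
std::vector<int> prefix_sums_std(const std::vector<int>& s) {
    std::vector<int> p(s.size());
    std::partial_sum(s.begin(), s.end(), p.begin());
    return p;
}
```

Both functions map the example array 2 4 3 5 1 to 2 6 9 14 15.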
25
Parallel Prefix Sums
Step 1: Each processor sums up its part

              Processor 0   Processor 1
Data          1 2 2         1 0 3
Part Sum      5             4

Step 2: Processor 0 receives all part sums,
calculates the starting sum for each processor, and
sends each processor its starting sum

              Processor 0   Processor 1
Data          1 2 2         1 0 3
Starting Sum  0             5

Step 3: Each processor calculates its prefix sums

              Processor 0   Processor 1
Data          1 2 2         1 0 3
Prefix Sum    1 3 5         6 6 9
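
The three steps can be sketched as a sequential simulation in which each inner vector plays the role of one processor's part. This only illustrates the data flow. It uses no threads or message passing, and the function name block_prefix_sums is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// parts[p] holds the data owned by "processor" p (simulation only).
std::vector<std::vector<int>> block_prefix_sums(
        const std::vector<std::vector<int>>& parts) {
    assert(!parts.empty());  // processor 0 must exist to coordinate step 2
    const std::size_t nprocs = parts.size();

    // Step 1: each processor sums up its part.
    std::vector<int> part_sum(nprocs, 0);
    for (std::size_t p = 0; p < nprocs; ++p) {
        for (int x : parts[p]) part_sum[p] += x;
    }

    // Step 2: processor 0 turns the part sums into starting sums:
    // start[p] = part_sum[0] + ... + part_sum[p-1], with start[0] = 0.
    std::vector<int> start(nprocs, 0);
    for (std::size_t p = 1; p < nprocs; ++p) {
        start[p] = start[p - 1] + part_sum[p - 1];
    }

    // Step 3: each processor computes its prefix sums from its starting sum.
    std::vector<std::vector<int>> result(nprocs);
    for (std::size_t p = 0; p < nprocs; ++p) {
        int running = start[p];
        for (int x : parts[p]) {
            running += x;
            result[p].push_back(running);
        }
    }
    return result;
}
```

For the parts (1 2 2) and (1 0 3), the starting sums are 0 and 5 and the results are (1 3 5) and (6 6 9), matching the slide. Steps 1 and 3 touch only local data, so in a real parallel run only step 2 needs communication.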
27
Performance Results

Scalability is the ability of a program to
exhibit good speed-up as the number of processors
used is increased
Scalability = Time running on 1 processor / Parallel running time

Running Prefix Sums for a pArray of 1,000,000
elements on 1 to 6 processors
28
Performance Results

pVector is a data structure similar to pArray but
with a dynamic size (elements can be added
and deleted at run time)

Running Prefix Sums on 1,000,000 elements using
pArray and pVector
pArray is faster due to less overhead

29
Conclusions

pArray is a useful pContainer
pArray shows good scalability
pArray is faster than pVector in parallel Prefix
Sum
Parallel Prefix Sums is an efficient pAlgorithm

30
Future Work

Array re-distribution
Optimize Prefix Sums
More pAlgorithms

32
Thank you