pArray as an Efficient Static Parallel Container in STAPL (Standard Template Adaptive Parallel Library)

1
pArray as an Efficient Static Parallel
Container in STAPL (Standard Template Adaptive
Parallel Library)
  • Presenter: Olga Tkachyshyn
  • Grad Student Advisors: Ping An, Gabriel Tanase
  • Faculty Advisor: Nancy Amato

2
Presentation Plan
  • Motivation
  • STAPL Overview
  • pContainer Design
  • pArray
  • Prefix-sum using pArray and pVector
  • Performance results
  • Conclusion and Future Work

3
Motivation
  • The time it takes to complete a task is limited
    by the speed of the worker
  • Alternative: several workers share the task
  • Similarly, the time it takes to solve a problem
    on a computer is limited by the speed of the
    processor
  • Alternative: parallel processing, the
    concurrent use of multiple processors to process
    data

4
Parallel/Distributed Architecture
  • Multiple processors are connected together
  • A processor can have its own memory or share the
    memory with another processor

[Figure: Processors 0, 1, 2; Caches 0, 1, 2; Memories 0, 1]

5
Motivation
  • Powerful parallel computers can solve
    hard-to-compute problems
      • Computational physics
      • Protein folding
  • Parallel programming is challenging due to
    communication and synchronization issues
  • Parallel libraries reduce the complexity of
    parallel programming

6
STAPL Introduction
  • The Parasol Lab in the Computer Science
    Department at TAMU is developing a Standard
    Template Adaptive Parallel Library (STAPL)
  • STAPL is designed as a platform-independent
    parallel library
  • STAPL provides a collection of parallel
    containers (generic distributed data structures)
    that are efficient and easy to use

7
STAPL Overview
  • STAPL is a C++ parallel library designed as a
    superset of the Standard Template Library (STL)
  • STAPL simplifies parallel programming by letting
    the user ignore distributed machine details
    such as
      • data partitioning
      • distribution
      • communication

8
STAPL Main Components
  • pContainer
      • Generic distributed data structure
      • STAPL requires an efficient array data structure
        for numerically intensive applications
  • pRange
      • Presents an abstract view of a scoped data space,
        which allows random access to a partition or
        subrange of the data in a pContainer
  • pAlgorithms
      • Parallel algorithms that provide basic
        functionality, bound to the pContainer by pRange

9
pContainer Basic Design
  • pContainers are data structures that allow users
    to store and use distributed data as if it were
    stored in a single memory
  • All pContainers have similar functionality
  • pContainers have three basic components
      • Base pContainer
      • Base Distribution Manager
      • Base Sequential Container Interface

10
Base Distribution Manager
  • Base Distribution Manager is responsible for
    locating elements (finding the memory containing
    the element)
  • Each pContainer element has a unique global
    identifier (GID)
  • Local element: Processor 0 needs the element with
    GID 2
  • Remote element: Processor 0 needs the element
    with GID 4

        Processor 0    Processor 1
GID     0 1 2          3 4 5
Data    7 3 6          4 1 2

11
Base Sequential Container
  • A pContainer is composed of sequential containers
  • Base Sequential Container Interface/Part provides
    a uniform interface to easily build pContainers
    from different sequential containers

12
Base pContainer
  • Generic methods to construct a pContainer
  • Methods to add, access, modify elements
  • Methods to efficiently locate elements

13
Presentation Plan
  • Motivation
  • STAPL Overview
  • pContainer Design
  • pArray
  • Prefix-sum using pArray and pVector
  • Performance results
  • Conclusion and Future Work

14
pArray Introduction
  • An array is a data structure with a fixed
    (unchangeable) size

Index   0  1  2  3  4  5  6  7  8  9
Value  13 98 56 45  0 45 77 38 23 52

  • Elements can be accessed randomly using their
    index
      • array[5] = 45
  • Arrays are useful for numerically intensive
    applications
  • In C++ there is no fixed-size array
  • The C++ vector allows insertion and deletion of
    elements in the middle and is thus hard to
    optimize
  • We have designed a pArray for STAPL for this
    purpose

15
pArray Basic Design
  • pArray is derived from the base classes of the
    pContainer
  • Three major components
      • Array Part
      • Array Distribution
      • pArray

16
Array Distribution
  • Responsible for locating local and remote
    elements
  • Two ways this can be done
      • Duplicated Distribution Information
          • Each processor has information about where all
            the elements are
      • Decentralized Distribution Information
          • Each processor is responsible for keeping track
            of the location of an equal share of the
            elements

17
Duplicated Distribution Information
  • Array Distribution information is stored in a
    vector of pair< pair<Start_Index, Size>, Processor ID >
  • Each processor has a copy of the Distribution
    vector
  • Lookup process: look in the Distribution vector
    and check whether the GID is in the range of an entry

Processor 0 stores GIDs 0 1 6 7; Processor 1 stores GIDs 2 3 4 5

Distribution Vector, (Start_Index, Size): PID, identical on both processors:
(0, 2): 0   (2, 4): 1   (6, 2): 0
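
The lookup over a duplicated distribution vector can be sketched in a few lines of C++. This is only an illustration under assumptions, not STAPL code: the names `Segment`, `DistributionVector`, `find_owner`, and `slide_distribution` are hypothetical, and the sketch assumes every valid GID is covered by exactly one entry.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One entry: ((start_index, size), processor_id), as on the slide.
// These type and function names are illustrative, not STAPL's.
using Segment = std::pair<std::pair<std::size_t, std::size_t>, int>;
using DistributionVector = std::vector<Segment>;

// Returns the processor that owns `gid`.
// Linear scan: the cost grows with the number of entries.
int find_owner(const DistributionVector& dist, std::size_t gid) {
    for (const Segment& seg : dist) {
        const std::size_t start = seg.first.first;
        const std::size_t size = seg.first.second;
        if (gid >= start && gid < start + size) {
            return seg.second;
        }
    }
    assert(false && "GID is not covered by the distribution");
    return -1;
}

// Distribution from the slide: (0, 2): 0, (2, 4): 1, (6, 2): 0.
DistributionVector slide_distribution() {
    return {{{0, 2}, 0}, {{2, 4}, 1}, {{6, 2}, 0}};
}

// Checks the slide's layout: GIDs 0 1 6 7 on processor 0, 2 3 4 5 on 1.
void check_slide_example() {
    const DistributionVector dist = slide_distribution();
    assert(find_owner(dist, 1) == 0);
    assert(find_owner(dist, 5) == 1);
    assert(find_owner(dist, 6) == 0);
}
```

Because every processor holds the same vector, the lookup never leaves the local memory, but the scan gets slower as the number of entries grows.
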
18
Decentralized Distribution Information
  • Evenly divide the array into segments
  • Each processor is responsible for knowing the
    location of one segment

Example: Processor 0 needs the element with GID 5

The algorithm:
  • Look up the GID
  • Is the element local? If yes, access the data
  • If no, is its location in the Location Cache? If yes,
    use the cached location
  • If no, get the location information from the Map Owner
    and cache it locally
  • MapOwner = GID * nprocs / n (integer division);
    here 5 * 2 / 8 = 1

                 Processor 0      Processor 1
Data (GIDs)      0 1 6 7          2 3 4 5
Location Map     GID  0 1 2 3     GID  4 5 6 7
                 Proc 0 0 1 1     Proc 1 1 0 0

Location Cache (Processor 0): GID 3 → Proc 1; GID 5 → Proc 1 is added after the lookup
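
The decentralized lookup can be simulated in a single process as follows. This is a sketch under assumptions rather than the STAPL implementation: `Processor`, `locate`, `map_owner`, and `slide_example` are hypothetical names, and the "remote request" to the map owner is modeled as a plain read of another processor's map.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Processor whose location map records where `gid` is stored.
// Slide example: gid = 5, nprocs = 2, n = 8 gives 5 * 2 / 8 = 1.
int map_owner(std::size_t gid, std::size_t nprocs, std::size_t n) {
    assert(nprocs > 0 && gid < n);
    return static_cast<int>(gid * nprocs / n);
}

// Hypothetical per-processor state; all names are illustrative.
struct Processor {
    std::unordered_set<std::size_t> local_gids;           // elements stored here
    std::unordered_map<std::size_t, int> location_map;    // this processor's map segment
    std::unordered_map<std::size_t, int> location_cache;  // remembered remote locations
};

// Returns the processor that stores `gid`, as seen from processor `self`.
// `map_requests` counts how many times a map owner had to be consulted.
int locate(std::vector<Processor>& procs, int self, std::size_t gid,
           std::size_t n, int& map_requests) {
    Processor& me = procs[static_cast<std::size_t>(self)];
    if (me.local_gids.count(gid) != 0) {
        return self;  // "Is local?" -> yes
    }
    const auto cached = me.location_cache.find(gid);
    if (cached != me.location_cache.end()) {
        return cached->second;  // "Is in cache?" -> yes
    }
    // Ask the map owner. A real library would send a message here;
    // this sketch simply reads the other processor's map.
    const int owner = map_owner(gid, procs.size(), n);
    ++map_requests;
    const int location =
        procs[static_cast<std::size_t>(owner)].location_map.at(gid);
    me.location_cache[gid] = location;  // "Cache locally"
    return location;
}

// State from the slide: 8 elements on 2 processors.
std::vector<Processor> slide_example() {
    std::vector<Processor> procs(2);
    procs[0].local_gids = {0, 1, 6, 7};
    procs[1].local_gids = {2, 3, 4, 5};
    procs[0].location_map = {{0, 0}, {1, 0}, {2, 1}, {3, 1}};
    procs[1].location_map = {{4, 1}, {5, 1}, {6, 0}, {7, 0}};
    procs[0].location_cache = {{3, 1}};
    return procs;
}
```

With this state, a request from processor 0 for GID 5 misses both the local data and the cache, consults map owner 1, and caches the answer, so a second request for GID 5 needs no further map request.
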
19
Duplicated Distribution Information vs.
Decentralized Distribution Information
  • Duplicated Distribution Information
      • PROs: Each processor has information about the
        location of each element, so there is no need to
        request information remotely
      • CONs: Distribution information is duplicated,
        which is potentially space consuming; if the
        distribution info is large, the search is slow
  • Decentralized Distribution Information
      • PROs: Location information is distributed and
        not duplicated, which saves space
      • CONs: May need to look up the location
        information of an element remotely, which is slower

20
Array Part
  • A wrapper over the sequential STL container
    valarray
  • Has all the functionality of valarray

21
pArray Class
  • BasePContainer instantiates ArrayPart and
    ArrayDistribution
  • The pArray class is derived from the Base pContainer
    to implement the functionality specific to the
    pArray

22
pArray Class
class pArray
  // constructors
  pArray()                                   // default constructor
  pArray(int size)                           // constructor with default distribution
  pArray(int size, ArrayDistribution distr)  // constructor with specified distribution

  // element access methods
  Data GetElement(GID)        // returns the element with the specified GID
  void SetElement(GID, Data)  // sets the specified location to the given value

  // operators and array-specific methods
  Data operator[]                  // index (array access) operator
  pArray operator+(Data scalar)    // adds a scalar to the pArray
  pArray operator+(pArray array)   // adds two pArrays of the same size term by term
                                   // (undefined otherwise); returns an array with the
                                   // same distribution as the calling array
  pArray operator*(Data scalar)    // multiplies the pArray by a scalar
  pArray operator*(pArray array)   // multiplies two pArrays of the same size term by term
                                   // (undefined otherwise); returns an array with the
                                   // same distribution as the calling array
  Data accumulate()                // sums up all the values stored in the pArray
  Data dotproduct(pArray array)    // dot product of two pArrays of the same size
                                   // (undefined otherwise)
  long double euclideannorm()      // Euclidean norm of a pArray
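
To make the array-specific methods concrete, here is a purely sequential sketch of how they could be layered over `std::valarray`, in the spirit of the Array Part. It is an assumption-laden illustration, not the STAPL class: `ArrayPartSketch` is a hypothetical name, there is no distribution or GID handling, and size mismatches are caught with `assert` instead of being left undefined.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <valarray>

// Hypothetical sequential stand-in for the Array Part; not STAPL code.
template <typename Data>
class ArrayPartSketch {
public:
    explicit ArrayPartSketch(std::valarray<Data> values)
        : values_(std::move(values)) {}

    std::size_t size() const { return values_.size(); }
    Data& operator[](std::size_t i) { return values_[i]; }
    const Data& operator[](std::size_t i) const { return values_[i]; }

    // Adds a scalar to every element.
    ArrayPartSketch operator+(Data scalar) const {
        std::valarray<Data> result = values_ + scalar;
        return ArrayPartSketch(std::move(result));
    }
    // Term-by-term sum; both operands must have the same size.
    ArrayPartSketch operator+(const ArrayPartSketch& other) const {
        assert(size() == other.size());
        std::valarray<Data> result = values_ + other.values_;
        return ArrayPartSketch(std::move(result));
    }
    // Multiplies every element by a scalar.
    ArrayPartSketch operator*(Data scalar) const {
        std::valarray<Data> result = values_ * scalar;
        return ArrayPartSketch(std::move(result));
    }
    // Term-by-term product; both operands must have the same size.
    ArrayPartSketch operator*(const ArrayPartSketch& other) const {
        assert(size() == other.size());
        std::valarray<Data> result = values_ * other.values_;
        return ArrayPartSketch(std::move(result));
    }

    // Sum of all elements (valarray::sum is undefined for empty arrays).
    Data accumulate() const {
        assert(size() > 0);
        return values_.sum();
    }
    // Dot product of two arrays of the same size.
    Data dotproduct(const ArrayPartSketch& other) const {
        return (*this * other).accumulate();
    }
    // Euclidean norm: square root of the dot product with itself.
    long double euclideannorm() const {
        return std::sqrt(static_cast<long double>(dotproduct(*this)));
    }

private:
    std::valarray<Data> values_;
};
```

For the values 1, 2, 2, for example, `accumulate()` gives 5 and `euclideannorm()` gives 3. A distributed version would additionally have to combine the per-processor partial results.
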

23
Presentation Plan
  • Motivation
  • STAPL Overview
  • pContainer Design
  • pArray
  • Prefix-sum using pArray and pVector
  • Performance results
  • Conclusion and Future Work

24
Prefix Sums
  • One of the most basic parallel algorithms
  • Used in other parallel algorithms like sorts
  • Prefix Sums of a sequence S = x_1, x_2, ..., x_n of n
    elements are the n partial sums defined by
      • P_i = x_1 + x_2 + ... + x_i, 1 ≤ i ≤ n
  • Sequential Algorithm
      S[n]  // original array
      P[n]  // prefix sums
      P[0] = S[0]
      for (int i = 1; i < n; i++)
        P[i] = P[i-1] + S[i]

Index           0  1  2  3  4
Original Array  2  4  3  5  1
Prefix Sums     2  6  9 14 15
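
A runnable version of the sequential algorithm, checked against the table above, might look like this. The function names are illustrative; `std::vector` stands in for the arrays S and P.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential prefix sums: p[i] = s[0] + s[1] + ... + s[i].
std::vector<int> prefix_sums(const std::vector<int>& s) {
    std::vector<int> p(s.size());
    if (s.empty()) {
        return p;
    }
    p[0] = s[0];
    for (std::size_t i = 1; i < s.size(); ++i) {
        p[i] = p[i - 1] + s[i];
    }
    return p;
}

// Checks the example from the slide: 2 4 3 5 1 -> 2 6 9 14 15.
void check_slide_example() {
    assert((prefix_sums({2, 4, 3, 5, 1}) ==
            std::vector<int>{2, 6, 9, 14, 15}));
}
```

The standard library offers the same operation as `std::partial_sum` in `<numeric>`; the explicit loop is kept here because it mirrors the slide.
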
25
Parallel Prefix Sums
Step 1: Each processor sums up its part

              Processor 0    Processor 1
Data          1 2 2          1 0 3
Part Sum      5              4

Step 2: Processor 0 receives all part sums,
calculates starting sums for each processor, and
sends the corresponding starting sum to each
processor

              Processor 0    Processor 1
Starting Sum  0              5

Step 3: Each processor calculates its prefix sums

              Processor 0    Processor 1
Data          1 2 2          1 0 3
Prefix Sum    1 3 5          6 6 9
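
The three steps can be simulated in one process, with an ordinary loop standing in for each processor. This is a sketch of the scheme described on the slide, not the STAPL pAlgorithm: `Part` and `blocked_prefix_sums` are hypothetical names, and no real communication takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Part = std::vector<int>;  // the elements held by one processor

// Simulates the three-step parallel prefix sums; parts[p] is processor p's data.
std::vector<Part> blocked_prefix_sums(const std::vector<Part>& parts) {
    const std::size_t nprocs = parts.size();

    // Step 1: each processor sums up its own part.
    std::vector<int> part_sum(nprocs, 0);
    for (std::size_t p = 0; p < nprocs; ++p) {
        part_sum[p] = std::accumulate(parts[p].begin(), parts[p].end(), 0);
    }

    // Step 2: processor 0 turns the part sums into starting sums,
    // i.e. the total of all parts that come before each processor.
    std::vector<int> starting_sum(nprocs, 0);
    for (std::size_t p = 1; p < nprocs; ++p) {
        starting_sum[p] = starting_sum[p - 1] + part_sum[p - 1];
    }

    // Step 3: each processor computes its prefix sums from its starting sum.
    std::vector<Part> result(nprocs);
    for (std::size_t p = 0; p < nprocs; ++p) {
        int running = starting_sum[p];
        for (int x : parts[p]) {
            running += x;
            result[p].push_back(running);
        }
    }
    return result;
}

// Checks the slide's example: {1 2 2}, {1 0 3} -> {1 3 5}, {6 6 9}.
void check_slide_example() {
    const std::vector<Part> parts = {{1, 2, 2}, {1, 0, 3}};
    const std::vector<Part> expected = {{1, 3, 5}, {6, 6, 9}};
    assert(blocked_prefix_sums(parts) == expected);
}
```

Steps 1 and 3 touch only local data, so in a real parallel run they proceed independently on every processor; only step 2 requires communication.
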
26
Presentation Plan
  • Motivation
  • STAPL Overview
  • pContainer Design
  • pArray
  • Prefix-sum using pArray and pVector
  • Performance results
  • Conclusion and Future Work

27
Performance Results
  • Scalability is the ability of a program to
    exhibit good speed-up as the number of processors
    used is increased
  • Scalability = (running time on 1 processor) /
    (parallel running time)

[Figure: Running Prefix Sums for a pArray of 1,000,000
elements on 1 to 6 processors]

28
Performance Results
  • pVector is a data structure similar to pArray,
    but with a dynamic size (elements can be added
    and deleted at runtime)
  • Running Prefix Sums on 1,000,000 elements using
    pArray and pVector
  • pArray is faster due to lower overhead

29
Conclusions
  • pArray is a useful pContainer
  • pArray shows good scalability
  • pArray is faster than pVector in parallel Prefix
    Sum
  • Parallel Prefix Sums is an efficient pAlgorithm

30
Future Work
  • Array re-distribution
  • Optimize Prefix Sums
  • More pAlgorithms

32
Thank you
  • To my mentors
      • Nancy Amato
      • Ping An
      • Gabriel Tanase