Dynamic Load Balancing Techniques for Nonlinear Structural Dynamics


1
Dynamic Load Balancing Techniques for
Nonlinear Structural Dynamics
  • SEI 2006 Structures Congress Presentation

Elisa Sotelino, Professor, CEE & CS Departments, Virginia Tech
Ammar T. Al-Sayegh, Ph.D. Candidate, School of Civil Engineering, Purdue University
2
Outline
  • Research Objectives
  • Types of Parallel FEA Algorithms
  • Node-wise Algorithms
  • Element-wise Algorithms
  • Domain-wise Algorithms
  • Proposed Row-wise Algorithm
  • DLB Technique to Overcome Imbalance
  • Numerical Example
  • ParaStruc Intro
  • Problem Definitions
  • ParaStruc Results
  • Questions & Comments?

3
Objectives of this Development Effort
  • To offer a new Parallel Finite Element Analysis
    (PFEA) algorithm with the following qualities:
    • Robust
    • Efficient
    • Expandable
    • Easy to implement
  • To devise a Dynamic Load Balancing (DLB) scheme that
    works effectively with this new algorithm.

4
What is a PFEA Algorithm?
  • FEA can be broken down into:
  • 1. Element state determination and force
    calculation
    • Store Ke, Fe, Element Properties
    • Get Df
    • Compute Ke, Fe
  • 2. Assembly of global stiffness matrix and load
    vector
    • Store K, R
    • Get Ke, Fe
    • Compute K, R
  • 3. Applying boundary conditions
    • Store DOFf
    • Get K, R
    • Compute Kf, Rf
  • 4. Solving for the nodal displacements
    • Store Kf, Rf
    • Get Kf, Rf
    • Compute Df
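
The four steps above can be sketched for a chain of linear axial springs. This is a minimal serial illustration, not the presenters' implementation: the spring model, the function names, and the small Gaussian elimination are assumptions made for the example, and the element force vector Fe and any nonlinearity are left out.

```python
def element_stiffness(k):
    """Step 1: compute Ke for a 2-node axial spring (assumed linear)."""
    return [[k, -k], [-k, k]]


def assemble(n_dofs, elements):
    """Step 2: add every Ke into the global stiffness matrix K."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    for i, j, k in elements:
        Ke = element_stiffness(k)
        for a, row in enumerate((i, j)):
            for b, col in enumerate((i, j)):
                K[row][col] += Ke[a][b]
    return K


def apply_bcs(K, R, free_dofs):
    """Step 3: keep only the free rows/columns, giving Kf and Rf."""
    Kf = [[K[r][c] for c in free_dofs] for r in free_dofs]
    Rf = [R[r] for r in free_dofs]
    return Kf, Rf


def solve(Kf, Rf):
    """Step 4: solve Kf * Df = Rf by Gaussian elimination (no pivoting)."""
    n = len(Rf)
    A = [row[:] + [rhs] for row, rhs in zip(Kf, Rf)]
    for i in range(n):
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            for c in range(i, n + 1):
                A[r][c] -= factor * A[i][c]
    Df = [0.0] * n
    for i in reversed(range(n)):
        known = sum(A[i][c] * Df[c] for c in range(i + 1, n))
        Df[i] = (A[i][n] - known) / A[i][i]
    return Df


# Hypothetical example: two springs (k = 2) in series, DOF 0 supported,
# unit load applied at DOF 2.
elements = [(0, 1, 2.0), (1, 2, 2.0)]
K = assemble(3, elements)
Kf, Rf = apply_bcs(K, [0.0, 0.0, 1.0], free_dofs=[1, 2])
Df = solve(Kf, Rf)
```

Each function stores and computes the quantities named in the matching step, so the places where a parallel algorithm would have to move data (Ke into K, K into Kf, Kf into the solver) show up as the function boundaries.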

5
What is a PFEA Algorithm?
  • FEA can be broken down into
  • A procedure that specifies

6
What is a PFEA Algorithm?
  • A procedure that specifies which step is parallelized,
    how it is distributed, and how the data is communicated.

[Diagram: FEA steps, including 2. Assembly of K & R (store K, R; compute K, R) and 4. Solve for Df (store Kf, Rf; compute Df), linked by communication of K, R. Dependency labels: Intra-Step Dependencies; Inter-Step Dependencies; Compute-Store (Mem, Comm); Compute-Store (Comm); Compute-Compute (Conc).]
7
Node-Wise Algorithms
  • Assembly is partitioned according to nodes
  • Elements distributed to pertinent assembly
    partitions
  • Pros
    • Lower communication.
    • Robust.
  • Cons
    • Higher storage.
    • Redundant computation.
    • Solution concurrency/efficiency tradeoff.
    • Handling nonlinearity is not trivial.

[Diagram: 1. Element St. Det. (store Ke, Fe, Prop; compute Ke, Fe); 2. Assembly of K & R (store K, R; compute K, R); 3. Apply BCs (store DOFf; compute Kf, Rf); 4. Solve for Df (store Kf, Rf; compute Df). Communicated quantities: K, R and Kf, Rf.]
8
Element-Wise Algorithms
  • K & R are not explicitly assembled
  • Solve for Df iteratively
  • Element State Det. & Iterative Solution
    Parallelized
  • Pros
    • Lower storage.
    • Higher concurrency.
    • Better handling of NL.
  • Cons
    • Longer convergence (Itv).
    • May not converge (Itv).
    • High communication.
    • Fine grained.

[Diagram: 1. Element St. Det. (store Ke, Fe, Prop; compute Ke, Fe); 2. Assembly of K & R (store K, R; compute K, R); 3. Apply BCs (store DOFf; compute Kf, Rf), shown alongside "3. Apply BCs on Ke"; 4. Solve for Df (store Kf, Rf; compute Df), shown alongside "4. Solve for Df iteratively using Ke, R". Communicated quantities: Ke, Fe; Df; K, R; Kf, Rf.]
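
Solving for Df without ever assembling K can be illustrated with an element-by-element matrix-vector product inside an iterative solver. The sketch below is an assumption-laden example, not the method from the slides: conjugate gradient is chosen only because it is a common iterative solver, the elements are linear axial springs, and `ebe_matvec`, `dof_of_node`, and `conjugate_gradient` are hypothetical names.

```python
def ebe_matvec(elements, dof_of_node, x):
    """Return Kf @ x from element contributions only; Kf is never assembled."""
    y = [0.0] * len(x)
    for i, j, k in elements:
        di, dj = dof_of_node[i], dof_of_node[j]
        # Boundary conditions are applied on Ke: a supported node has no DOF.
        xi = x[di] if di is not None else 0.0
        xj = x[dj] if dj is not None else 0.0
        force = k * (xi - xj)
        if di is not None:
            y[di] += force
        if dj is not None:
            y[dj] -= force
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-20, max_iter=100):
    """Plain conjugate gradient for a symmetric positive definite system."""
    x = [0.0] * len(b)
    r = list(b)          # residual for the zero initial guess
    p = list(r)          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Hypothetical example: two springs (k = 2) in series, node 0 supported,
# unit load at node 2.
elements = [(0, 1, 2.0), (1, 2, 2.0)]
dof_of_node = {0: None, 1: 0, 2: 1}
Df = conjugate_gradient(lambda v: ebe_matvec(elements, dof_of_node, v), [0.0, 1.0])
```

The loop over elements inside `ebe_matvec` is the part that could be split across processors, at the price of one exchange of the product vector per iteration, which is consistent with the "high communication" and "fine grained" entries in the list of cons.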
9
Domain-Wise Algorithms
  • Split to domains, solve, then join back
  • Pros
    • Lower communication.
    • Higher concurrency.
  • Cons
    • More computation.
    • Higher storage.
    • Handling nonlinearity is not trivial.

[Diagram: 1. Element St. Det. (store Ke, Fe, Prop; compute Ke, Fe); 2. Assembly of K & R (store K, R; compute K, R); 4. Solve for Df (store Kf, Rf; compute Df), subdivided into 4-a. Split Kf to Kd1-Kdn and Rf to Rd1-Rdn; 4-b. Solve for Dd1-Ddn Internal; 4-c. Solve for Dd1-Ddn Interface. Communicated quantities: Ke, Fe; Df; K, R.]
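
One common way to realize "split, solve, join" is static condensation (substructuring). The sketch below uses that technique as an assumption; the slides do not say which domain decomposition method is meant. Each hypothetical domain has one internal DOF and shares a single interface DOF, so all blocks are scalars, and the dictionary keys `k_ii`, `k_ib`, `k_bb`, `r_i`, `r_b` are names invented for the example.

```python
def condense(domain):
    """Eliminate the internal DOF of one domain (independent per domain)."""
    s = domain["k_bb"] - domain["k_ib"] ** 2 / domain["k_ii"]
    g = domain["r_b"] - domain["k_ib"] * domain["r_i"] / domain["k_ii"]
    return s, g


def solve_by_domains(domains):
    # Split: every domain condenses its own contribution; no data is shared.
    parts = [condense(d) for d in domains]
    # Join: the interface problem couples all domains.
    d_interface = sum(g for _, g in parts) / sum(s for s, _ in parts)
    # Recover the internal displacements, again independently per domain.
    d_internal = [
        (d["r_i"] - d["k_ib"] * d_interface) / d["k_ii"] for d in domains
    ]
    return d_interface, d_internal


# Hypothetical example: three springs (k = 2) in series, node 0 supported,
# unit load at node 3. Node 2 is the interface between the two domains.
domains = [
    {"k_ii": 4.0, "k_ib": -2.0, "k_bb": 2.0, "r_i": 0.0, "r_b": 0.0},  # nodes 0-1-2
    {"k_ii": 2.0, "k_ib": -2.0, "k_bb": 2.0, "r_i": 1.0, "r_b": 0.0},  # nodes 2-3
]
d_interface, d_internal = solve_by_domains(domains)
```

The condensation and recovery loops need no communication, while the interface solve is the only coupled step; the extra work of forming the condensed terms reflects the "more computation" entry in the list of cons.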
10
Proposed Row-Wise Algorithm
  • Consider a structure analyzed with n = 3
    processors
  • 1. Create a vector of elements and subdivide the
    vector into n rows
  • 2. Partition K & R into n rows distributed to n
    processors
  • 3. Mark supported rows/cols, and redistribute
    free rows
  • 4. Partition the disp. vector into rows, and solve
    for Df

[Diagram: 2. Assembly of K & R (store K, R; compute K, R) and 4. Solve for Df (store Kf, Rf; compute Df), linked by communication of K, R.]
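
The row partitioning in steps 1 to 4 can be sketched as plain index bookkeeping. The example below assumes contiguous, nearly equal blocks of rows, which the slides do not specify; `split_rows` and `redistribute_free_rows` are hypothetical helper names.

```python
def split_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into n_procs contiguous blocks."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # first blocks absorb the remainder
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def redistribute_free_rows(n_dofs, supported, n_procs):
    """Drop supported rows, then spread the remaining free rows evenly."""
    free = [r for r in range(n_dofs) if r not in supported]
    return [[free[i] for i in block] for block in split_rows(len(free), n_procs)]


# Hypothetical example with n = 3 processors.
element_rows = split_rows(8, 3)                       # step 1: element vector
system_rows = split_rows(8, 3)                        # step 2: rows of K and R
free_rows = redistribute_free_rows(8, {0, 1}, 3)      # steps 3 and 4
```

Redistributing after the supported rows are marked keeps the solve step balanced even when the supports are concentrated in the rows of one processor, as they are in this example.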
11
Proposed Row-Wise Algorithm
  • Now, consider the communication

[Diagram: 1. Element St. Det. (store Ke, Fe, Prop; compute Ke, Fe), with Ke, Fe; Df; and K, R shown as the communicated quantities.]
12
Proposed Row-Wise Algorithm
  • Perfect case: no inter-processor communication

Most Often Not True
True
13
Proposed Row-Wise Algorithm
5. Load Balance
  • Consider the following structure

[Diagram: example structure with node labels N7, N5, N8, N2, N6, N4, N3, shown for 1. Element State Determination, 2. Assembly, 3. Apply BCs, and 4. Solve for Df.]
  • Row 5 in System has Multiplicity 3 in P0
  • Row 8 in System has Multiplicity 1 in P0
  • Row 4 in Element has Multiplicity 2 in P2
  • Row 7 in Element has Multiplicity 1 in P2
14
Source of Nonlinearity Imbalance
  • Element stiffness matrix
  • Nonlinearity
  • Generated at the section level.
  • Propagates to the element level through
    integration.
  • Propagates to the structural level through
    assembly.
  • Therefore, imbalance must be dealt with at:
    • Element state determination step: caused by more
      iterations on some processors than others.
    • Structure displacement calculation step: caused by
      the introduction of new nonzero or new zero elements
      in the stiffness matrix, residual force vector,
      or displacement increment vector.

15
DLB Technique
  • Element State Determination Step
    • Update the multiplicity of each element row in
      the processor.
    • Start the state determination with the highest
      multiplicity row of this processor, and end with
      the lowest multiplicity row of this processor.
    • When the last row is reached, broadcast a signal
      requesting this processor's maximum multiplicity
      values of undetermined rows on other processors.
    • Import the highest multiplicity row, and perform
      state determination on it.
  • Structure Displacement Solution Step
    • Update the multiplicity of each system row in the
      processor.
    • Request the maximum multiplicity values for this
      processor in other processors.
    • Compare the multiplicity of the local row that has
      the maximum multiplicity in this processor with
      the multiplicity of the remote row in this
      processor. If the remote multiplicity is higher,
      exchange the two rows.
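
The element state determination part of the technique can be mimicked with a small serial simulation. Everything here is a sketch under stated assumptions: multiplicity is treated as one given number per row, the per-row `cost` values are made up to create an imbalance, and the "broadcast and import" exchange is replaced by direct access to a shared dictionary rather than real message passing.

```python
def balance_state_determination(local_rows, cost):
    """Simulate processors working through rows, highest multiplicity first.

    local_rows: {processor: [(row_id, multiplicity), ...]}
    cost:       {row_id: work units needed for state determination}
    Returns ({processor: [row_ids done]}, {processor: finish time}).
    """
    pending = {
        p: sorted(rows, key=lambda row: -row[1]) for p, rows in local_rows.items()
    }
    clock = {p: 0.0 for p in local_rows}
    done = {p: [] for p in local_rows}
    while any(pending.values()):
        p = min(clock, key=clock.get)  # the processor that becomes free first
        if pending[p]:
            # Own rows: start with the highest multiplicity row.
            row, _ = pending[p].pop(0)
        else:
            # Last local row reached: import the undetermined row with the
            # highest multiplicity from the other processors.
            donor = max(
                (q for q in pending if pending[q]),
                key=lambda q: pending[q][0][1],
            )
            row, _ = pending[donor].pop(0)
        clock[p] += cost[row]
        done[p].append(row)
    return done, clock


# Hypothetical example: rows "a" and "b" are expensive (more iterations).
local_rows = {0: [("b", 1), ("a", 3)], 1: [("c", 2)]}
cost = {"a": 4, "b": 4, "c": 1}
done, clock = balance_state_determination(local_rows, cost)
```

Without the import step, processor 0 would need 8 work units while processor 1 sat idle after 1; in this made-up case the import moves row "b" to processor 1 and the slowest processor finishes after 5 units.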

16
ParaStruc
  • New, fully parallelized, structural finite
    element system.
  • Built on Trilinos, a set of parallel numerical
    libraries.
  • Lightweight. Contains preprocessor 3 Classes
    only

17
Numerical Example: NL Cantilever
  • Partition to 1,000 subelements
  • Mesh to 1,000 fibers
  • Integrate at 8 sections
  • Apply End Point Load
  • Analyzed on VT System X Supercomputer
  • 1100 node (2200 processors) Apple Xserve G5
    cluster
  • Dual 2.3 GHz PowerPC 970FX processors / node
  • 4 GB ECC DDR400 (PC3200) RAM / node
  • 80 GB S-ATA hard disk drive / node
  • Mellanox Cougar InfiniBand 4x HCA Interconnect

18
Numerical Example Results
  • 2 factors controlling efficiency:
    • Storage cost (cache availability): positive
      impact for Np < 8
    • Communication cost (latency): negative impact
      for Np > 8

19
Conclusions
  • A new row-wise partitioning algorithm and a
    dynamic load balancing technique have been
    developed that together:
    • Minimize computation in each processor.
    • Minimize the required storage in each processor.
    • Minimize inter-communication between processors.
    • Balance the computation load among the
      processors.

20
Acknowledgements
  • Special Thanks to
  • Kuwait University for funding this research.
  • Virginia Tech TeraScale group for making the
    System X Supercomputer available for this
    research and for their help and support in using
    it.

21
Questions & Comments?
System X Supercomputer at Virginia Tech