Transcript and Presenter's Notes

Title: HPC Middleware (HPC-MW): Infrastructure for Scientific Applications on HPC Environments. Overview and Recent Progress


1
HPC Middleware (HPC-MW): Infrastructure for Scientific Applications on HPC Environments. Overview and Recent Progress
  • Kengo Nakajima, RIST.
  • 3rd ACES WG Meeting, June 5th-6th, 2003.
  • Brisbane, QLD, Australia.

2
My Talk
  • HPC-MW (35 min.)
  • Hashimoto/Matsuura Code on ES (5 min.)

3
Table of Contents
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

4
Frontier Simulation Software for Industrial Science (FSIS)
http://www.fsis.iis.u-tokyo.ac.jp
  • Part of the "IT Project" of MEXT (Ministry of
    Education, Culture, Sports, Science and Technology)
  • HQ at Institute of Industrial Science, Univ.
    Tokyo
  • 5 years, 12M USD/yr. (decreasing ...)
  • 7 internal projects, > 100 people involved.
  • Focused on
  • Industry/Public Use
  • Commercialization

5
Frontier Simulation Software for Industrial Science (FSIS) (cont.)
http://www.fsis.iis.u-tokyo.ac.jp
  • Quantum Chemistry
  • Quantum Molecular Interaction System
  • Nano-Scale Device Simulation
  • Fluid Dynamics Simulation
  • Structural Analysis
  • Problem Solving Environment (PSE)
  • High-Performance Computing Middleware (HPC-MW)

6
Background
  • Various Types of HPC Platforms
  • Parallel Computers
  • PC Clusters
  • MPP with Distributed Memory
  • SMP Cluster (8-way, 16-way, 256-way): ASCI, Earth
    Simulator
  • Power, HP-RISC, Alpha/Itanium, Pentium, Vector PE
  • GRID Environment -> Various Resources
  • Parallel/Single PE Optimization is important !!
  • Portability under GRID environment
  • Machine-dependent optimization/tuning.
  • Everyone knows that ... but it's a big task
    especially for application experts, scientists.

8
Reordering for SMP Cluster with Vector PEs: ILU Factorization
9
3D Elastic Simulation: Problem Size vs. GFLOPS, Earth Simulator/SMP node (8 PEs)
[Chart legend: PDJDS/CM-RCM, PDCRS/CM-RCM, Natural Ordering]
10
3D Elastic Simulation: Problem Size vs. GFLOPS, Intel Xeon 2.8 GHz, 8 PEs
[Chart legend: PDJDS/CM-RCM, PDCRS/CM-RCM, Natural Ordering]
11
Parallel Volume Rendering: MHD Simulation of Outer Core
12
Volume Rendering Module using Voxels
On PC Cluster: Hierarchical Background Voxels, Linked-List
On Earth Simulator: Globally Fine Voxels, Static-Array
13
Background (cont.)
  • Simulation methods such as FEM, FDM etc. have
    several typical processes for computation.

14
"Parallel" FEM Procedure
[Flow diagram] Pre-Processing: Initial Grid Data, Partitioning. Main: Data Input/Output, Matrix Assemble, Linear Solvers, Domain-Specific Algorithms/Models. Post-Processing: Visualization, Post Proc.
15
Background (cont.)
  • Simulation methods such as FEM, FDM etc. have
    several typical processes for computation.
  • How about "hiding" these processes from users by
    Middleware between applications and compilers?
  • Development: efficient, reliable, portable,
    easy-to-maintain
  • accelerates advancement of the applications
    (physics)
  • HPC-MW: Middleware close to the "Application Layer"

16
Example of HPC Middleware: Simulation Methods Include Some Typical Processes
FEM
17
Example of HPC Middleware: Individual Processes Can Be Optimized for Various Types of MPP Architectures
FEM
18
Example of HPC Middleware: Library-Type HPC-MW for Existing HW
FEM code developed on PC
20
Example of HPC Middleware: Library-Type HPC-MW for Existing HW
21
Example of HPC Middleware: Parallel FEM Code Optimized for ES
22
Example of HPC Middleware: Parallel FEM Code Optimized for Intel Xeon
23
Example of HPC Middleware: Parallel FEM Code Optimized for SR8000
24
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

25
HPC Middleware (HPC-MW)?
  • Based on the idea of "Plug-in" in GeoFEM.

26
System Config. of GeoFEM
http://geofem.tokyo.rist.or.jp/
27
Local Data Structure: Node-based Partitioning
(internal nodes, elements, external nodes)
28
What can we do by HPC-MW ?
  • We can develop optimized/parallel code easily on
    HPC-MW from user's code developed on PC.
  • Library-Type
  • most fundamental approach
  • optimized library for individual architecture
  • Compiler-Type
  • Next Generation Architecture
  • Irregular Data
  • Network-Type
  • GRID Environment (heterogeneous)
  • Large-Scale Computing (Virtual Petaflops),
    Coupling.

29
HPC-MW Procedure
LINK
30
Library-Type HPC-MW: Parallel FEM Code Optimized for ES
31
What can we do by HPC-MW ?
  • We can develop optimized/parallel code easily on
    HPC-MW from user's code developed on PC.
  • Library-Type
  • most fundamental approach
  • optimized library for individual architecture
  • Compiler-Type
  • Next Generation Architecture
  • Irregular Data
  • Network-Type
  • GRID Environment,
  • Large-Scale Computing (Virtual Petaflops),
    Coupling.

32
Compiler-Type HPC-MW
Optimized code is generated by a special language/compiler based on analysis data (cache blocking etc.) and H/W information.
F E M
Special Compiler
33
What can we do by HPC-MW ?
  • We can develop optimized/parallel code easily on
    HPC-MW from user's code developed on PC.
  • Library-Type
  • most fundamental approach
  • optimized library for individual architecture
  • Compiler-Type
  • Next Generation Architecture
  • Irregular Data
  • Network-Type
  • GRID Environment (heterogeneous).
  • Large-Scale Computing (Virtual Petaflops),
    Coupling.

34
Network-Type HPC-MW: Heterogeneous Environment, "Virtual" Supercomputer
F E M
analysis model space
35
What is new, What is nice ?
  • Application Oriented (limited to FEM at this
    stage)
  • Various types of capabilities for parallel FEM
    are supported.
  • NOT just a library
  • Optimized for Individual Hardware
  • Single Performance
  • Parallel Performance
  • Similar Projects
  • Cactus
  • Grid Lab (on Cactus)

36
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

37
Schedule
[Timeline chart, FY.2002 to FY.2006: Basic Design, Prototype; Library-type HPC-MW (Scalar, Vector); Compiler-type and Network-type HPC-MW (Compiler, Network); FEM Codes on HPC-MW; Public Release]
38
System for Development
[Diagram] HW Vendors provide HW Info. to HPC-MW (Library-Type, Compiler-Type, Network-Type), whose infrastructure components include I/O, Vis., Solvers, Coupler, AMR, DLB and Mat.Ass. FEM Codes on HPC-MW (Solid, Fluid, Thermal) feed back application info.; HPC-MW is publicly released, and Public Users feed back comments.
39
FY. 2003
  • Library-Type HPC-MW
  • FORTRAN90, C, PC-Cluster Version
  • Public Release
  • Sept. 2003: Prototype Release
  • March 2004: Full version for PC Cluster
  • Demonstration of Network-Type HPC-MW at SC2003,
    Phoenix, AZ, Nov. 2003.
  • Evaluation by FEM Code

40
Library-Type HPC-MW (Mesh Generation is not considered)
  • Parallel I/O I/F for commercial codes (NASTRAN
    etc.)
  • Adaptive Mesh Refinement (AMR)
  • Dynamic Load-Balancing using pMETIS (DLB)
  • Parallel Visualization
  • Linear Solvers (GeoFEM + AMG, SAI)
  • FEM Operations (Connectivity, Matrix Assembling)
  • Coupling I/F
  • Utility for Mesh Partitioning
  • On-line Tutorial

41
AMR & DLB
42
Parallel Visualization
  • Scalar Field: Surface rendering, Interval volume-fitting, Volume rendering, Topological map
  • Vector Field: Streamlines, Particle tracking, LIC, Volume rendering
  • Tensor Field: Hyperstreamlines
Extension of functions, Extension of dimensions, Extension of Data Types
43
PMR(Parallel Mesh Relocator)
  • Data Size is Potentially Very Large in Parallel
    Computing.
  • Handling the entire mesh is impossible.
  • Parallel Mesh Generation and Visualization are
    difficult due to the Requirement for Global Info.
  • Adaptive Mesh Refinement (AMR) and Grid
    Hierarchy.

44
Parallel Mesh Generationusing AMR
  • Prepare Initial Mesh
  • with a size as large as a single PE can handle.

Initial Mesh
45
Parallel Mesh Generationusing AMR
  • Partition the Initial Mesh into Local Data.
  • potentially very coarse

[Diagram: Initial Mesh -> Partition -> Local Data on each PE]
46
Parallel Mesh Generationusing AMR
  • Parallel Mesh Relocation (PMR) by Local
    Refinement on Each PE.

[Diagram: Initial Mesh -> Partition -> Local Data -> PMR]
47
Parallel Mesh Generationusing AMR
  • Parallel Mesh Relocation (PMR) by Local
    Refinement on Each PE.

[Diagram: Initial Mesh -> Partition -> Local Data -> PMR -> Refine]
48
Parallel Mesh Generationusing AMR
  • Parallel Mesh Relocation (PMR) by Local
    Refinement on Each PE.

[Diagram: Initial Mesh -> Partition -> Local Data -> PMR -> repeated Refine steps]
49
Parallel Mesh Generationusing AMR
  • Hierarchical Refinement History
  • can be utilized for visualization and multigrid

[Diagram: Initial Mesh -> Partition -> Local Data -> PMR -> repeated Refine steps]
50
Parallel Mesh Generationusing AMR
  • Hierarchical Refinement History
  • can be utilized for visualization and multigrid

Results are mapped back from the local fine mesh to the initial coarse mesh.
Initial Mesh
51
Parallel Mesh Generationusing AMR
  • Visualization is possible on the Initial Coarse Mesh
    using a Single PE.
  • various existing software can be used.

Results are mapped back from the local fine mesh to the initial coarse mesh.
Initial Mesh
52
Parallel Mesh Generation and Visualization using AMR: Summary
Initial Coarse Mesh (Single)
53
Parallel Mesh Generation and Visualization using AMR: Summary
Partition PMR
Initial Coarse Mesh (Single)
Local Fine Mesh (Distributed)
54
Parallel Mesh Generation and Visualization using AMR: Summary
Partition PMR
Reverse Mapping
Second to the Coarsest
Initial Coarse Mesh (Single)
Local Fine Mesh (Distributed)
Initial Coarse Mesh
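For illustration only: because refinement keeps the nodes of the initial coarse mesh, the reverse mapping summarized above reduces to a gather through the refinement history (the per-PE arrays would then be collected onto the single PE that holds the initial mesh). A minimal sketch; the names coarse2fine, val_fine, val_coarse and the toy 1-D example are assumptions, not HPC-MW identifiers.

program reverse_map_demo
  ! Map results from a (local) fine mesh back to the initial coarse mesh.
  ! coarse2fine(i) gives the fine-mesh node that coincides with coarse
  ! node i, which is exactly the information kept in the refinement history.
  implicit none
  integer, parameter :: n_coarse = 3, n_fine = 5
  integer :: coarse2fine(n_coarse)
  real(8) :: val_fine(n_fine), val_coarse(n_coarse)
  integer :: i

  ! toy 1-D example: coarse nodes 1,2,3 coincide with fine nodes 1,3,5
  coarse2fine = (/ 1, 3, 5 /)
  val_fine    = (/ 0.d0, 0.5d0, 1.d0, 1.5d0, 2.d0 /)

  do i= 1, n_coarse
    val_coarse(i)= val_fine(coarse2fine(i))
  enddo

  write (*,*) 'values on the initial coarse mesh:', val_coarse
end program reverse_map_demo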
55
Example (1/3): Initial Entire Mesh
nodes: 1,308, elements: 795
56
Example (2/3): Initial Partitioning

PE   nodes   elements
0    224     100
1    184     100
2    188     100
3    207      99
4    203      99
5    222      99
6    202      99
7    194      99
57
Example (3/3): Refinement
58
Parallel Linear Solvers
On the Earth Simulator: 176 nodes, 3.8 TFLOPS
59
Coupler
[Diagram: Fluid, Coupler, Structure]
Fluid MAIN: Structure is called from Fluid as a subroutine through the Coupler.
Coupler MAIN (MpCCI)
60
Coupler
[Diagram: Fluid, Coupler, Structure]

module hpcmw_mesh
  type hpcmw_local_mesh
    ...
  end type hpcmw_local_mesh
end module hpcmw_mesh

FLUID
program fluid
  use hpcmw_mesh
  type (hpcmw_local_mesh) :: local_mesh_b
  ...
  call hpcmw_couple_PtoS_put
  call structure_main
  call hpcmw_couple_StoP_get
  ...
end program fluid

STRUCTURE
subroutine structure_main
  use hpcmw_mesh
  type (hpcmw_local_mesh) :: local_mesh_b
  call hpcmw_couple_PtoS_get
  ...
  call hpcmw_couple_StoP_put
end subroutine structure_main

Fluid MAIN: Structure is called from Fluid as a subroutine through the Coupler.
62
Coupler

(Same coupling code as above; the hpcmw_couple_PtoS/StoP calls between Fluid, Coupler and Structure are where the Communication !! takes place.)

Fluid MAIN: Structure is called from Fluid as a subroutine through the Coupler.
63
FEM Codes on HPC-MW
  • Primary Target: Evaluation of HPC-MW itself!
  • Solid Mechanics
  • Elastic, Inelastic
  • Static, Dynamic
  • Various types of elements, boundary conditions.
  • Eigenvalue Analysis
  • Compressible/Incompressible CFD
  • Heat Transfer with Radiation and Phase Change

64
Release in Late September 2003
  • Library-Type HPC-MW
  • Parallel I/O
  • Original Data Structure, GeoFEM, ABAQUS
  • Parallel Visualization
  • PVR, PSR
  • Parallel Linear Solvers
  • Preconditioned Iterative Solvers (ILU, SAI)
  • Utility for Mesh Partitioning
  • Serial Partitioner, Viewer
  • On-line Tutorial
  • FEM Code for Linear-Elastic Simulation (prototype)

65
Technical Issues
  • Common Data Structure
  • Flexibility vs. Efficiency
  • Our data structure is efficient ...
  • How to keep user's original data structure
  • Interface to Other Toolkits
  • PETSc (ANL), Aztec/Trilinos (Sandia)
  • ACTS Toolkit (LBNL/DOE)
  • DRAMA (NEC Europe), Zoltan (Sandia)

66
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

67
Public Use/Commercialization
  • Very important issues in this project.
  • Industry
  • Education
  • Research

68
Strategy for Public Use
  • General Purpose Parallel FEM Code
  • Environment for Development (1): for Legacy Code
  • "Parallelization"
  • F77 -> F90, COMMON -> Module
  • Parallel Data Structure, Linear Solvers,
    Visualization
  • Environment for Development (2): from Scratch
  • Education
  • Various types of collaboration

69
On-going Collaboration
  • Parallelization of Legacy Codes
  • CFD Grp. in FSIS project
  • Mitsubishi Materials: Groundwater Flow
  • JNC (Japan Nuclear Cycle Development Inst.): HLW
  • others: research, education.
  • Part of HPC-MW
  • Coupling Interface for Pump Simulation
  • Parallel Visualization: Takashi Furumura
    (ERI/U.Tokyo)

70
On-going Collaboration
  • Environment for Development
  • ACcESS (Australian Computational Earth Systems
    Simulator) Group
  • Research Collaboration
  • ITBL/JAERI
  • DOE ACTS Toolkit (Lawrence Berkeley National
    Laboratory)
  • NEC Europe (Dynamic Load Balancing)
  • ACES/iSERVO GRID

71
(Fluid/Vibration) Simulation for Boiler Pump
  • Hitachi
  • Suppression of Noise
  • Collaboration in FSIS project
  • Fluid
  • Structure
  • PSE
  • HPC-MW Coupling Interface
  • Experiment, Measurement
  • accelerometers on surface

72
(Fluid/Vibration) Simulation for Boiler Pump
  • Hitachi
  • Suppression of Noise
  • Collaboration in FSIS project
  • Fluid
  • Structure
  • PSE
  • HPC-MW Coupling Interface
  • Experiment, Measurement
  • accelerometers on surface

Fluid
Coupler
Structure
73
Parallelization of Legacy Codes
  • Many cases!!
  • Under Investigation through real collaboration
  • Optimum Procedure: Document, I/F, Work Assignment
  • FEM is suitable for this type of procedure
  • Works
  • to introduce the new matrix storage manner for
    parallel iterative solvers in HPC-MW.
  • to add subroutine calls for using parallel
    visualization functions in HPC-MW.
  • changing the data structure is a big issue!!
  • flexibility vs. efficiency
  • problem-specific vs. general

74
Element Connectivity
  • In HPC-MW

do icel= 1, ICELTOT
  iS= elem_index(icel-1)
  in1= elem_ptr(iS+1)
  in2= elem_ptr(iS+2)
  in3= elem_ptr(iS+3)
enddo
  • Sometimes...

do icel= 1, ICELTOT
  in1= elem_node(1,icel)
  in2= elem_node(2,icel)
  in3= elem_node(3,icel)
enddo
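For reference, converting the common two-dimensional elem_node array into the compressed elem_index/elem_ptr pair used by HPC-MW is a one-pass loop. A minimal sketch for meshes with a fixed number of nodes per element; NNODEL and the subroutine name are assumptions introduced only for this illustration.

subroutine build_elem_ptr (ICELTOT, NNODEL, elem_node, elem_index, elem_ptr)
  ! Convert elem_node(1:NNODEL,1:ICELTOT) into the compressed
  ! connectivity (elem_index/elem_ptr) addressed as on the slide:
  !   iS= elem_index(icel-1);  in1= elem_ptr(iS+1); ...
  implicit none
  integer, intent(in)  :: ICELTOT, NNODEL
  integer, intent(in)  :: elem_node(NNODEL,ICELTOT)
  integer, intent(out) :: elem_index(0:ICELTOT)
  integer, intent(out) :: elem_ptr(NNODEL*ICELTOT)
  integer :: icel, k

  elem_index(0)= 0
  do icel= 1, ICELTOT
    elem_index(icel)= elem_index(icel-1) + NNODEL
    do k= 1, NNODEL
      elem_ptr(elem_index(icel-1)+k)= elem_node(k,icel)
    enddo
  enddo
end subroutine build_elem_ptr

With mixed element types, NNODEL would become a per-element count and elem_index would simply accumulate those counts.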
75
Parallelization of Legacy Codes
[Diagram: Original Code vs. HPC-MW-optimized code. Both consist of Input, Mat. Conn., Mat. Assem., Linear Solver, Visualization and Output stages; in the HPC-MW version these stages are backed by the optimized HPC-MW components plus Comm.]
76
Works with CFD grp. in FSIS
  • Original CFD Code
  • 3D Finite-Volume, Serial, Fortran90
  • Strategy
  • use the Poisson solver in HPC-MW
  • keep the ORIGINAL data structure: we (HPC-MW)
    developed a new partitioner.
  • CFD people do the matrix assembling using HPC-MW's
    format; the matrix assembling part is intrinsically
    parallel
  • Schedule
  • April 3, 2003: 1st meeting, overview.
  • April 24, 2003: 2nd meeting, decision of
    strategy
  • May 2, 2003: show I/F for Poisson solver
  • May 12, 2003: completed new partitioner
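To make the division of labour concrete: the application assembles its (Poisson) matrix into compressed-row arrays and hands them to the solver. Below is a self-contained toy sketch of such an assembly for a 1-D Poisson problem; the names indexCRS/itemCRS/AMAT are generic CRS names chosen for illustration, not the actual HPC-MW I/F shown to the CFD group.

program assemble_crs_demo
  ! Toy illustration: assemble the 1-D Poisson (tri-diagonal) operator
  ! into generic CRS arrays (indexCRS/itemCRS/AMAT) plus a right-hand side.
  ! This is only meant to show the kind of matrix-assembling work the
  ! application does itself before calling a parallel iterative solver.
  implicit none
  integer, parameter :: N = 5
  integer :: indexCRS(0:N), itemCRS(3*N)
  real(8) :: AMAT(3*N), rhs(N)
  integer :: i, k

  k= 0
  indexCRS(0)= 0
  do i= 1, N
    if (i.gt.1) then
      k= k + 1;  itemCRS(k)= i-1;  AMAT(k)= -1.d0
    endif
    k= k + 1;    itemCRS(k)= i  ;  AMAT(k)=  2.d0
    if (i.lt.N) then
      k= k + 1;  itemCRS(k)= i+1;  AMAT(k)= -1.d0
    endif
    indexCRS(i)= k
    rhs(i)     = 1.d0
  enddo

  write (*,*) 'indexCRS:', indexCRS
  write (*,*) 'itemCRS :', itemCRS(1:k)
  write (*,*) 'AMAT    :', AMAT(1:k)
end program assemble_crs_demo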

77
CFD Code, 2 Types of Comm. (1): Inter-Domain Communication
79
CFD Code, 2 Types of Comm. (2): Wall-Law Communication
81
(Common) Data Structure
  • Users like to keep their original data structure.
  • But they want to parallelize the code.
  • Compromise at this stage
  • keep the original data structure
  • We (I, more precisely) develop partitioners for
    individual users (a 1-day work after I understand
    the data structure).

82
Domain Partitioning Utility for User's Original Data Structure: very important for parallelization of legacy codes
  • Functions
  • Input: Initial Entire Mesh in ORIGINAL Format
  • Output: Distributed Local Meshes in ORIGINAL
    Format
  • Communication Information (Separate File)
  • Merit
  • Original I/O routines can be utilized.
  • Operations for communication are hidden.
  • Technical Issues
  • Basically, individual support (developing) ...
    BIG WORK.
  • Various data structures for individual user codes.
  • Problem-specific information.

83
Mesh Partitioning for Original Data Structure
  • Original I/O for local distributed data.
  • I/F for Comm. Info. provided by HPC-MW

84
Distributed Mesh + Comm. Info.
This part is hidden except "CALL HPCMW_COMM_INIT".
85
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

86
XML-based I/O Func. Generator (K. Sakane, RIST; 8th JSCES Conf., 2003)
  • User's original data structure can be described
    by certain XML-based definition information.
  • Generate I/O subroutines (C, F90) for
    partitioning utilities according to XML-based
    definition information.
  • Substitute existing I/O subroutines for
    partitioning utilities in HPC-MW with generated
    codes.

87
XML-based I/O Func. Generator (K. Sakane, RIST; 8th JSCES Conf., 2003)
  • The partitioning utility reads the initial entire mesh
    in HPC-MW format and writes distributed local mesh
    files in HPC-MW format with communication
    information.

Utility for Partitioning
I/O for Mesh Data
This is ideal for us... but not all users are necessarily happy with that...
88
XML-based I/O Func. Generator (K. Sakane, RIST; 8th JSCES Conf., 2003)
  • Generate I/O subroutines (C, F90) for
    partitioning utilities according to XML-based
    definition information.

Utility for Partitioning
XML-based System I/O Func. Generator for Orig.
Data Structure
I/O for Mesh Data
89
XML-based I/O Func. Generator (K. Sakane, RIST; 8th JSCES Conf., 2003)
  • Substitute existing I/O subroutines for
    partitioning utilities in HPC-MW with generated
    codes.

Utility for Partitioning
XML-based System I/O Func. Generator for Orig.
Data Structure
I/O for Mesh Data
90
XML-based I/O Func. Generator (K. Sakane, RIST; 8th JSCES Conf., 2003)
  • Substitute existing I/O subroutines for
    partitioning utilities in HPC-MW with generated
    codes.

Utility for Partitioning
I/O for Mesh Data User-Def. Data Structure
91
TAGS in Definition Info. File
  • usermesh: starting point
  • parameter: parameter definition
  • define: sub-structure definition
  • token: smallest definition unit (number, label,
    etc.)
  • mesh: entire structure definition
  • ref: reference to a sub-structure

92
Example
NODE 1 0. 0. 0.
NODE 2 1. 0. 0.
NODE 3 0. 1. 0.
NODE 4 0. 0. 1.
ELEMENT 1 TETRA 1 2 3 4

Tetrahedron Connectivity
93
Example (ABAQUS, NASTRAN)
ABAQUS format:
*NODE
1, 0., 0., 0.
2, 1., 0., 0.
3, 0., 1., 0.
4, 0., 0., 1.
*ELEMENT, TYPE=C3D4
1 1 2 3 4

NASTRAN format:
GRID     1        0        0.       0.       0.       0
GRID     2        0        1.       0.       0.       0
GRID     3        0        0.       1.       0.       0
GRID     4        0        0.       0.       1.       0
CTETRA   1        1        1        2        3        4
94
Ex. Definition Info. File (1/2)
<?xml version="1.0" encoding="EUC-JP"?>
<usermesh>
  <parameter name="TETRA">4</parameter>
  <parameter name="PENTA">5</parameter>
  <parameter name="HEXA">8</parameter>
  <define name="mynode">
    <token means="node.start">NODE</token>
    <token means="node.id"/>
    <token means="node.x"/>
    <token means="node.y"/>
    <token means="node.z"/>
    <token means="node.end"/>
  </define>
95
Ex. Definition Info. File (2/2)
  <define name="myelement">
    <token means="element.start">ELEMENT</token>
    <token means="element.id"/>
    <token means="element.type"/>
    <token means="element.node" times="element.type"/>
    <token means="element.end"/>
  </define>
  <mesh>
    <ref name="mynode"/>
    <ref name="myelement"/>
  </mesh>
</usermesh>

Corresponding to the parameter given in "element.type"
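For illustration, the kind of Fortran90 reader such a definition could generate for the NODE/ELEMENT example above might look as follows. This is a hand-written sketch, not actual generator output; the records are the ones from the example, while the variable names and fixed array sizes are assumptions.

program read_usermesh_demo
  ! Sketch of a generated reader for the NODE/ELEMENT format: each
  ! record starts with a keyword (token means="*.start"), followed by
  ! the id, coordinates or connectivity defined in the XML file.
  implicit none
  character(len=40) :: rec(5)
  character(len=16) :: key, etype
  real(8) :: xyz(3,4)
  integer :: nodes(4), i, id

  rec(1)= 'NODE 1 0. 0. 0.'
  rec(2)= 'NODE 2 1. 0. 0.'
  rec(3)= 'NODE 3 0. 1. 0.'
  rec(4)= 'NODE 4 0. 0. 1.'
  rec(5)= 'ELEMENT 1 TETRA 1 2 3 4'

  do i= 1, 5
    read (rec(i),*) key                        ! token means="*.start"
    if (key.eq.'NODE') then
      read (rec(i),*) key, id, xyz(1:3,id)     ! node.id, node.x/y/z
    else if (key.eq.'ELEMENT') then
      read (rec(i),*) key, id, etype, nodes    ! element.id, .type, .node
    endif
  enddo

  write (*,*) 'node 4    :', xyz(:,4)
  write (*,*) 'element 1 :', trim(etype), nodes
end program read_usermesh_demo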
96
  • Background
  • Basic Strategy
  • Development
  • On-Going Works/Collaborations
  • XML-based System for User-Defined Data Structure
  • Future Works
  • Examples

97
Further Study/Works
  • Develop HPC-MW
  • Collaboration
  • Simplified I/F for Non-Experts
  • Interaction is important !! Procedure for
    Collaboration

98
System for Development
[Diagram] HW Vendors provide HW Info. to HPC-MW (Library-Type, Compiler-Type, Network-Type), whose infrastructure components include I/O, Vis., Solvers, Coupler, AMR, DLB and Mat.Ass. FEM Codes on HPC-MW (Solid, Fluid, Thermal) feed back application info.; HPC-MW is publicly released, and Public Users feed back comments.
99
Further Study/Works
  • Develop HPC-MW
  • Collaboration
  • Simplified I/F for Non-Experts
  • Interaction is important !! Procedure for
    Collaboration
  • Extension to DEM etc.
  • Promotion for Public Use
  • Parallel FEM Applications
  • Parallelization of Legacy Codes
  • Environment for Development

100
Current Remarks
  • If you want to parallelize your legacy code on a PC
    cluster ...
  • Keep your own data structure.
  • a customized partitioner will be provided (or
    automatically generated by the XML system)
  • Rewrite your matrix assembling part.
  • Introduce the linear solvers and visualization in
    HPC-MW

101
Current Remarks (cont.)
  • If you want to optimize your legacy code on the
    Earth Simulator
  • Use HPC-MW's data structure
  • Anyway, you have to rewrite your code for the ES.
  • Utilize all components of HPC-MW
  • hpcmw-workers@tokyo.rist.or.jp

102
Some Examples
103
Simple Interface: Communication

call SOLVER_SEND_RECV_3                                            &
 &  ( NP, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,             &
 &    STACK_EXPORT, NOD_EXPORT, WS, WR, WW(1,ZP),                  &
 &    SOLVER_COMM, my_rank)

GeoFEM's Original I/F

module solver_SR_3
contains
  subroutine SOLVER_SEND_RECV_3                                    &
 &    ( N, NEIBPETOT, NEIBPE, STACK_IMPORT, NOD_IMPORT,            &
 &      STACK_EXPORT, NOD_EXPORT, WS, WR, X, SOLVER_COMM, my_rank)

    implicit REAL*8 (A-H,O-Z)
    include 'mpif.h'
    include 'precision.inc'

    integer(kind=kint), intent(in) :: N
    integer(kind=kint), intent(in) :: NEIBPETOT
    integer(kind=kint), pointer    :: NEIBPE      (:)
    integer(kind=kint), pointer    :: STACK_IMPORT(:)
    integer(kind=kint), pointer    :: NOD_IMPORT  (:)
    integer(kind=kint), pointer    :: STACK_EXPORT(:)
    integer(kind=kint), pointer    :: NOD_EXPORT  (:)
    real   (kind=kreal), dimension(3*N), intent(inout) :: WS
    real   (kind=kreal), dimension(3*N), intent(inout) :: WR
    real   (kind=kreal), dimension(3*N), intent(inout) :: X
    integer, intent(in) :: SOLVER_COMM
    integer, intent(in) :: my_rank
    ...
105
Simple Interface: Communication

(GeoFEM's original call and the SOLVER_SEND_RECV_3 interface are as on the previous slide.)

Setting the original arguments from the HPC-MW local mesh:

use hpcmw_util
use dynamic_grid
use dynamic_cntl

type (hpcmw_local_mesh) :: local_mesh

N        = local_mesh%n_internal
NP       = local_mesh%n_node
ICELTOT  = local_mesh%n_elem
NEIBPETOT= local_mesh%n_neighbor_pe

NEIBPE       => local_mesh%neighbor_pe
STACK_IMPORT => local_mesh%import_index
NOD_IMPORT   => local_mesh%import_node
STACK_EXPORT => local_mesh%export_index
NOD_EXPORT   => local_mesh%export_node
106
Simple Interface: Communication

use hpcmw_util
type (hpcmw_local_mesh) :: local_mesh
...
call SOLVER_SEND_RECV_3 (local_mesh, WW(1,ZP))
...

use hpcmw_util
use dynamic_grid
use dynamic_cntl

type (hpcmw_local_mesh) :: local_mesh

N        = local_mesh%n_internal
NP       = local_mesh%n_node
ICELTOT  = local_mesh%n_elem
NEIBPETOT= local_mesh%n_neighbor_pe

NEIBPE       => local_mesh%neighbor_pe
STACK_IMPORT => local_mesh%import_index
NOD_IMPORT   => local_mesh%import_node
STACK_EXPORT => local_mesh%export_index
NOD_EXPORT   => local_mesh%export_node

module solver_SR_3
contains
  subroutine SOLVER_SEND_RECV_3 (local_mesh, X)
    use hpcmw_util
    type (hpcmw_local_mesh) :: local_mesh
    real(kind=kreal), dimension(:), allocatable, save :: WS, WR
    ...
107
Preliminary Study in FY.2002
  • FEM Procedure for 3D Elastic Problem
  • Parallel I/O
  • Iterative Linear Solvers (ICCG, ILU-BiCGSTAB)
  • FEM Procedures (Matrix Connectivity/Assembling)
  • Linear Solvers
  • SMP Cluster/Distributed Memory
  • Vector/Scalar Processors
  • CM-RCM/MC Reordering
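To make the reordering idea concrete: in multicoloring (MC), unknowns of the same color have no mutual dependency, so forward/backward sweeps of IC/ILU factorizations can proceed color by color in long loops that vectorize or run in parallel on an SMP node; CM-RCM applies the same idea to Cuthill-McKee level sets. A toy, self-contained sketch of MC ordering on a 1-D chain follows; the sizes and all names except NEWtoOLD/OLDtoNEW are illustrative assumptions.

program multicolor_demo
  ! Red/black (2-color) ordering of a 1-D chain graph: neighboring
  ! unknowns always get different colors, so all unknowns of one color
  ! can be updated independently.
  implicit none
  integer, parameter :: N = 8, NCOLOR = 2
  integer :: color(N), NEWtoOLD(N), OLDtoNEW(N)
  integer :: i, ic, knew

  do i= 1, N
    color(i)= mod(i,NCOLOR) + 1
  enddo

  ! new numbering: all color-1 unknowns first, then color-2, ...
  knew= 0
  do ic= 1, NCOLOR
    do i= 1, N
      if (color(i).eq.ic) then
        knew= knew + 1
        NEWtoOLD(knew)= i
        OLDtoNEW(i)   = knew
      endif
    enddo
  enddo

  write (*,*) 'NEWtoOLD:', NEWtoOLD
  write (*,*) 'OLDtoNEW:', OLDtoNEW
end program multicolor_demo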

108
Method of Matrix Storage
  • Scalar/Distributed Memory
  • CRS with Natural Ordering
  • Scalar/SMP Cluster
  • PDCRS/CM-RCM
  • PDCRS/MC
  • Vector/Distributed SMP Cluster
  • PDJDS/CM-RCM
  • PDJDS/MC
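As a reference point for the storage formats listed above, the baseline case (CRS with natural ordering, for scalar processors with distributed memory) performs its matrix-vector product as a simple row loop. A self-contained sketch with generic names (indexCRS, itemCRS, AMAT are illustrative, not the HPC-MW identifiers):

program crs_matvec_demo
  ! Sparse matrix-vector product in CRS with natural ordering.
  ! The 3x3 example matrix is arbitrary.
  implicit none
  integer, parameter :: N = 3, NNZ = 7
  integer :: indexCRS(0:N), itemCRS(NNZ)
  real(8) :: AMAT(NNZ), X(N), Y(N)
  integer :: i, k

  ! [ 2 -1  0 ]
  ! [-1  2 -1 ]
  ! [ 0 -1  2 ]
  indexCRS= (/ 0, 2, 5, 7 /)
  itemCRS = (/ 1,2,  1,2,3,  2,3 /)
  AMAT    = (/ 2.d0,-1.d0,  -1.d0,2.d0,-1.d0,  -1.d0,2.d0 /)
  X       = (/ 1.d0, 1.d0, 1.d0 /)

  do i= 1, N
    Y(i)= 0.d0
    do k= indexCRS(i-1)+1, indexCRS(i)
      Y(i)= Y(i) + AMAT(k)*X(itemCRS(k))
    enddo
  enddo

  write (*,*) 'Y =', Y    ! expected: 1, 0, 1
end program crs_matvec_demo

The PDJDS/PDCRS variants with CM-RCM or MC reordering restructure these arrays so that the innermost loops become long and mutually independent, which is what the Earth Simulator and SMP results shown later rely on.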

109
Main Program
  • FEM program developed by users
  • call same subroutines
  • interfaces are same
  • NO MPI !!
  • Procedure of each subroutine is different in the
    individual library

program SOLVER33_TEST
  use solver33
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)

  call HPCMW_INIT
  call INPUT_CNTL
  call INPUT_GRID
  call MAT_CON0
  call MAT_CON1
  call MAT_ASS_MAIN (valA,valB,valX)
  call MAT_ASS_BC
  call SOLVE33 (hpcmwIarray, hpcmwRarray)
  call HPCMW_FINALIZE
end program SOLVER33_TEST
110
HPCMW_INIT

for MPI:
subroutine HPCMW_INIT
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)
  call MPI_INIT (ierr)
  call MPI_COMM_SIZE (MPI_COMM_WORLD, PETOT, ierr)
  call MPI_COMM_RANK (MPI_COMM_WORLD, my_rank, ierr)
end subroutine HPCMW_INIT

for NO-MPI (SMP):
subroutine HPCMW_INIT
  use hpcmw_all
  implicit REAL*8(A-H,O-Z)
  ierr= 0
end subroutine HPCMW_INIT
111
Solve33 for SMP Cluster/Scalar
module SOLVER33
contains
  subroutine SOLVE33 (hpcmwIarray, hpcmwRarray)
    use hpcmw_solver_matrix
    use hpcmw_solver_cntl
    use hpcmw_fem_mesh
    use solver_CG_3_SMP_novec
    use solver_BiCGSTAB_3_SMP_novec

    implicit REAL*8 (A-H,O-Z)

    real(kind=kreal), dimension(3,3) :: ALU
    real(kind=kreal), dimension(3)   :: PW
    integer :: ERROR, ICFLAG
    character(len=char_length) :: BUF
    data ICFLAG/0/

    integer(kind=kint), dimension(:)  :: hpcmwIarray
    real   (kind=kreal), dimension(:) :: hpcmwRarray

!C
!C-- PARAMETERs
    ITER      = hpcmwIarray(1)
    METHOD    = hpcmwIarray(2)
    PRECOND   = hpcmwIarray(3)
    NSET      = hpcmwIarray(4)
    iterPREmax= hpcmwIarray(5)

    RESID     = hpcmwRarray(1)
    SIGMA_DIAG= hpcmwRarray(2)

    if (iterPREmax.lt.1) iterPREmax= 1
    if (iterPREmax.gt.4) iterPREmax= 4

!C
!C-- BLOCK LUs
!C   (skipped)

!C
!C-- ITERATIVE solver
    if (METHOD.eq.1) then
      call CG_3_SMP_novec                                          &
     &   ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,        &
     &     NEWtoOLD, OLDtoNEW, D, AL, indexL, itemL,               &
     &     AU, indexU, itemU, B, X, ALUG,                          &
     &     RESID, ITER, ERROR, my_rank,                            &
     &     NEIBPETOT, NEIBPE,                                      &
     &     NOD_STACK_IMPORT, NOD_IMPORT,                           &
     &     NOD_STACK_EXPORT, NOD_EXPORT,                           &
     &     SOLVER_COMM, PRECOND, iterPREmax)
    endif

    if (METHOD.eq.2) then
      call BiCGSTAB_3_SMP_novec                                    &
     &   ( N, NP, NPL, NPU, PEsmpTOT, NHYP, IVECT, STACKmc,        &
     &     NEWtoOLD, OLDtoNEW, D, AL, indexL, itemL,               &
     &     AU, indexU, itemU, B, X, ALUG,                          &
     &     RESID, ITER, ERROR, my_rank,                            &
     &     NEIBPETOT, NEIBPE,                                      &
     &     NOD_STACK_IMPORT, NOD_IMPORT,                           &
     &     NOD_STACK_EXPORT, NOD_EXPORT,                           &
     &     SOLVER_COMM, PRECOND, iterPREmax)
    endif

    ITERactual= ITER

  end subroutine SOLVE33
end module SOLVER33
112
Solve33 for SMP Cluster/Vector
module SOLVER33
contains
  subroutine SOLVE33 (hpcmwIarray, hpcmwRarray)
    use hpcmw_solver_matrix
    use hpcmw_solver_cntl
    use hpcmw_fem_mesh
    use solver_VCG33_DJDS_SMP
    use solver_VBiCGSTAB33_DJDS_SMP

    implicit REAL*8 (A-H,O-Z)

!C  (declarations and PARAMETER setup identical to the scalar version
!C   on the previous slide; BLOCK LUs skipped)

!C
!C-- ITERATIVE solver
    if (METHOD.eq.1) then
      call VCG33_DJDS_SMP                                          &
     &   ( N, NP, NLmax, NUmax, NPL, NPU, NHYP, PEsmpTOT,          &
     &     STACKmcG, STACKmc, NLmaxHYP, NUmaxHYP, IVECT,           &
     &     NEWtoOLD, OLDtoNEW_L, OLDtoNEW_U, NEWtoOLD_U, LtoU,     &
     &     D, PAL, indexL, itemL, PAU, indexU, itemU,              &
     &     B, X, ALUG_L, ALUG_U, RESID, ITER, ERROR, my_rank,      &
     &     NEIBPETOT, NEIBPE, NOD_STACK_IMPORT, NOD_IMPORT,        &
     &     NOD_STACK_EXPORT, NOD_EXPORT,                           &
     &     SOLVER_COMM, PRECOND, iterPREmax)
    endif

    if (METHOD.eq.2) then
      call VBiCGSTAB33_DJDS_SMP                                    &
     &   ( N, NP, NLmax, NUmax, NPL, NPU, NHYP, PEsmpTOT,          &
     &     STACKmcG, STACKmc, NLmaxHYP, NUmaxHYP, IVECT,           &
     &     NEWtoOLD, OLDtoNEW_L, OLDtoNEW_U, NEWtoOLD_U, LtoU,     &
     &     D, PAL, indexL, itemL, PAU, indexU, itemU,              &
     &     B, X, ALUG_L, ALUG_U, RESID, ITER, ERROR, my_rank,      &
     &     NEIBPETOT, NEIBPE, NOD_STACK_IMPORT, NOD_IMPORT,        &
     &     NOD_STACK_EXPORT, NOD_EXPORT,                           &
     &     SOLVER_COMM, PRECOND, iterPREmax)
    endif

    ITERactual= ITER

  end subroutine SOLVE33
end module SOLVER33
113
Mat.Ass. for SMP Cluster/Scalar
do ie= 1, 8
  ip= nodLOCAL(ie)
  do je= 1, 8
    jp= nodLOCAL(je)

    kk= 0
    if (jp.gt.ip) then
      iiS= indexU(ip-1) + 1
      iiE= indexU(ip  )
      do k= iiS, iiE
        if ( itemU(k).eq.jp ) then
          kk= k
          exit
        endif
      enddo
    endif

    if (jp.lt.ip) then
      iiS= indexL(ip-1) + 1
      iiE= indexL(ip  )
      do k= iiS, iiE
        if ( itemL(k).eq.jp ) then
          kk= k
          exit
        endif
      enddo
    endif

    PNXi= 0.d0
    PNYi= 0.d0
    PNZi= 0.d0
    PNXj= 0.d0
    PNYj= 0.d0
    PNZj= 0.d0
    VOL = 0.d0

    do kpn= 1, 2
    do jpn= 1, 2
    do ipn= 1, 2
      coef= dabs(DETJ(ipn,jpn,kpn))*WEI(ipn)*WEI(jpn)*WEI(kpn)
      VOL = VOL + coef

      PNXi= PNX(ipn,jpn,kpn,ie)
      PNYi= PNY(ipn,jpn,kpn,ie)
      PNZi= PNZ(ipn,jpn,kpn,ie)
      PNXj= PNX(ipn,jpn,kpn,je)
      PNYj= PNY(ipn,jpn,kpn,je)
      PNZj= PNZ(ipn,jpn,kpn,je)

      a11= valX*(PNXi*PNXj + valB*(PNYi*PNYj + PNZi*PNZj))*coef
      a22= valX*(PNYi*PNYj + valB*(PNZi*PNZj + PNXi*PNXj))*coef
      a33= valX*(PNZi*PNZj + valB*(PNXi*PNXj + PNYi*PNYj))*coef
      a12= (valA*PNXi*PNYj + valB*PNXj*PNYi)*coef
      a13= (valA*PNXi*PNZj + valB*PNXj*PNZi)*coef
      ...

      if (jp.gt.ip) then
        PAU(9*kk-8)= PAU(9*kk-8) + a11
        ...
        PAU(9*kk  )= PAU(9*kk  ) + a33
      endif

      if (jp.lt.ip) then
        PAL(9*kk-8)= PAL(9*kk-8) + a11
        ...
        PAL(9*kk  )= PAL(9*kk  ) + a33
      endif

      if (jp.eq.ip) then
        D(9*ip-8)= D(9*ip-8) + a11
        ...
        D(9*ip  )= D(9*ip  ) + a33
      endif
    enddo
    enddo
    enddo

  enddo
enddo
114
Mat.Ass. for SMP Cluster/Vector
!C  (numerical integration and accumulation into PAU/PAL/D are the same
!C   as in the scalar version on the previous slide)

do ie= 1, 8
  ip= nodLOCAL(ie)
  if (ip.le.N) then
    do je= 1, 8
      jp= nodLOCAL(je)
      kk= 0

      if (jp.gt.ip) then
        ipU= OLDtoNEW_U(ip)
        jpU= OLDtoNEW_U(jp)
        kp = PEon(ipU)
        iv = COLORon(ipU)
        nn = ipU - STACKmc((iv-1)*PEsmpTOT+kp-1)
        do k= 1, NUmaxHYP(iv)
          iS= indexU(npUX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
          if ( itemU(iS).eq.jpU ) then
            kk= iS
            exit
          endif
        enddo
      endif

      if (jp.lt.ip) then
        ipL= OLDtoNEW_L(ip)
        jpL= OLDtoNEW_L(jp)
        kp = PEon(ipL)
        iv = COLORon(ipL)
        nn = ipL - STACKmc((iv-1)*PEsmpTOT+kp-1)
        do k= 1, NLmaxHYP(iv)
          iS= indexL(npLX1*(iv-1)+PEsmpTOT*(k-1)+kp-1) + nn
          if ( itemL(iS).eq.jpL ) then
            kk= iS
            exit
          endif
        enddo
      endif

      PNXi= 0.d0
      PNYi= 0.d0
      PNZi= 0.d0
      PNXj= 0.d0
      PNYj= 0.d0
      PNZj= 0.d0
      VOL = 0.d0

      ...
115
Hardware
  • Earth Simulator
  • SMP Cluster, 8 PE/node
  • Vector Processor
  • Hitachi SR8000/128
  • SMP Cluster, 8 PE/node
  • Pseudo-Vector
  • Xeon 2.8 GHz Cluster
  • 2 PE/node, Myrinet
  • Flat MPI only
  • Hitachi SR2201
  • Pseudo-Vector
  • Flat MPI

116
Simple 3D Cubic Model
117
Earth Simulator, DJDS, 64x64x64/SMP node, up to 125,829,120 DOF
[Charts: GFLOPS rate and Parallel Work Ratio; Hybrid vs. Flat MPI]
118
Earth Simulator, DJDS, 100x100x100/SMP node, up to 480,000,000 DOF
[Charts: GFLOPS rate and Parallel Work Ratio; Hybrid vs. Flat MPI]
119
Earth Simulator, DJDS, 256x128x128/SMP node, up to 2,214,592,512 DOF
3.8 TFLOPS for 2.2G DOF on 176 nodes (33.8% of peak)
[Charts: GFLOPS rate and Parallel Work Ratio; Hybrid vs. Flat MPI]
120
Hitachi SR8000/128, 8 PEs/1 SMP node
[Charts: SMP vs. Flat-MPI; PDJDS, PDCRS, CRS-Natural]
121
Hitachi SR8000/128, 8 PEs/1 SMP node, PDJDS
[Chart: SMP vs. Flat-MPI]
122
Xeon & SR2201, 8 PEs
[Charts: PDJDS, PDCRS, CRS-Natural]
123
Xeon Speed-Up, 1-24 PEs
16³ nodes/PE, 32³ nodes/PE
[Charts: PDJDS, PDCRS, CRS-Natural]
124
Xeon Speed-Up, 1-24 PEs
16³ nodes/PE, 32³ nodes/PE
[Charts: PDJDS, PDCRS, CRS-Natural]
125
Hitachi SR2201 Speed-Up, 1-64 PEs
16³ nodes/PE, 32³ nodes/PE
[Charts: PDJDS, PDCRS, CRS-Natural]