Title: High Performance Computational Engineering
1 High Performance Computational Engineering
- Putting the E back in CSE
- Dimitri Mavriplis
- University of Wyoming
- USA
2 Computational Science and Engineering (CSE)
- CSE in 2009 is very similar to 1989
- On the verge of a paradigm shift
- 1989: Vector machines being replaced by parallel COTS technology
- 2009: Impact of multicore coming fast
- CSE in 2009 is very different from 1989
- 1989: Big drivers were
- Science: Earth, NWP, MD
- Engineering: CFD (Aerodynamics), Crash simulation
- 2009: Big drivers are
- Science: Earth, NWP, Climate, MD
- Engineering: relegated mostly to commodity clusters
- Is this a solved problem (don't need huge HPC resources)?
4 DOE SciDAC Program
- Scientific Discovery through Advanced Computing
- Enable new scientific discoveries
- Initial 5 year program
- Renewed for 5 years
6 Science vs. Engineering
- HPC advocacy has increasingly been taken up by the science community
- Numerical simulation is now the third pillar of scientific discovery, on an equal footing alongside theory and experiment
- Increased investment in HPC will enable new scientific discoveries
- SciDAC, ScaLES, Geosciences, NSF Office of Cyberinfrastructure (OCI)
- Engineering is not discovery driven
7 Recent NSF Report
- Engineering-based simulation needs more attention
- Science has been successful recently as an advocate
- Mainly structures, crash dynamics, materials
- No mention of aeronautics activities
8 Engineering Community
- Engineering in general, and Aero in particular:
- "Our problems are not complex enough to warrant such large-scale simulations and hardware costs"
- Prefer to reduce the cost of the current simulation (i.e. move to a cluster) instead of increasing the simulation capability at fixed cost (on the best available hardware)
- That is intractable!
- Excuses for lack of investment/vision
- Computer power is not the limiting factor (can't use it)
- Algorithms, software, data
- Engineering arguably more important for competitiveness
- Engineering lies at the interface of technology and economics
- Available computational power should be seen as a resource
- If we do not know how to harness it, we are at a competitive disadvantage
9 Computational Engineering
- Engineering is not discovery driven
- Design driven
- Specific objectives
- Different techniques, different algorithms
- Different issues for high performance computing
- Why ParCFD?
- CFD can be either science based or engineering based
- Science: Turbulence, climate
- Engineering: Aerodynamics
- Aerodynamics has traditionally been at the forefront of HPC developments
11
- The most powerful computer in the world from 1976 to 1980
- NASA Ames Research Center
- Principal Applications: CFD
- Leading National HPC Driver
- A principal element of applied mathematics research
12 Traditional Aerospace HPC Leadership
- 1980s: Numerical Aerodynamic Simulation (NAS) program
- one of the first Cray-2, Cray Y-MP, Cray C90 systems
- DoD HPCMP continued investment
- Aerospace companies often housed HPC on par with national labs in the 1980s-1990s
- 1990s: High Performance Computing and Communications Program (HPCCP)
- Transition from small numbers of vector processors to the upcoming class of massively parallel microprocessors (O(100) cpus)
- Computational Aerosciences (CAS)
13 Computational Engineering Today
- Survey of current HPC engineering
- Industrial runs at O(100) cpus/cores
- No longer owners of leading-edge HPC hardware
- Many ISV codes even worse
- Government jobs: O(100) to O(1000) cpus/cores
- NASA Columbia, DoD MSRC
- Machines available with 1M cores (not to mention GPUs)
- Today: need to go from O(100) cpus to O(10^5-10^6) cpus. WHY?
- Advance the state of the art through leading-edge simulations
- Future mid-size machines will have O(10^4) cores or more
- 2048-core clusters available today
- Cheap 16-core nodes available today
14 Barriers and Challenges
- Multicore expansion more disruptive than the 1990s multiprocessor/message-passing paradigm shift
- No HPCCP/CAS program to address the current shift (in engineering)
- CFD/Aero is the logical discipline to lead this effort in engineering
- History of traditional leadership in HPC engineering
- High technology focus
- National defense
- Good interaction between government/university/industry
- Expert users (compared to the auto industry)
- Not completely commoditized/ISV-ed
- Government support (but diminishing)
AIAA 2007-4084
15 Science Runs on Red Storm
- SEAM (Spectral Element global Atmospheric Model): simulation of the breakdown of the polar vortex, used to study the enhanced transport of polar trapped air to mid-latitudes.
- Record-setting 20-day simulation: 7200 cpus for 36 hours, 1B grid points (3000x1500x300), 300K timesteps, 1 TB of output.
- Spectral elements replace spherical harmonics in the horizontal directions
- High-order (p = 8) finite element method with efficient Gauss-Lobatto quadrature used to invert the mass matrix.
- Two-dimensional domain decomposition leads to excellent parallel performance.
c/o Mark Taylor, Sandia National Laboratories
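The mass-matrix point above can be made concrete: with Gauss-Lobatto-Legendre (GLL) quadrature collocated at the nodes of a Lagrange basis, the element mass matrix becomes diagonal, so "inverting" it is a pointwise division. A minimal sketch (my own illustration, not the SEAM code; the node and weight formulas are the standard GLL ones):

```python
import numpy as np
from numpy.polynomial import legendre

def gll_nodes_weights(p):
    """Gauss-Lobatto-Legendre nodes and weights on [-1, 1] for order p."""
    Pp = legendre.Legendre.basis(p)
    interior = np.sort(Pp.deriv().roots())     # interior nodes: roots of P_p'(x)
    x = np.concatenate(([-1.0], interior, [1.0]))
    w = 2.0 / (p * (p + 1) * Pp(x) ** 2)       # standard GLL weight formula
    return x, w

x, w = gll_nodes_weights(8)                    # p = 8, as on the slide
print(len(x), abs(w.sum() - 2.0) < 1e-12)     # 9 nodes; weights integrate 1 exactly

# With a Lagrange basis l_i collocated at the GLL points, l_i(x_k) = delta_ik,
# so the quadrature mass matrix M_ij = sum_k w_k l_i(x_k) l_j(x_k) = w_i delta_ij
# is diagonal -- no linear solve is needed to invert it.
```

This collocation trick is what makes "efficient" mass-matrix inversion possible at high order, and it costs nothing in parallel since it is purely local.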
16 SEAM on Red Storm and BG/L
(Figure: performance of 4 fixed problem sizes on up to 6K CPUs, peaking at 5 TF and 4 TF on the two machines. The annotation gives the mean grid spacing at the equator (in km) and the number of vertical levels used for each problem.)
17 Tractable Aero Engineering on Current Leadership-Class Machines
- Digital Flight
- Static (and dynamic) aerodynamic database generation using high-fidelity simulations
- Time-dependent servo-aero-elastic maneuvering aircraft simulations
- Transient full turbofan simulation
- New frontiers in multidisciplinary optimization
- Time-dependent MDO
- MDO under uncertainty
- Not simply ever-increasing resolution (e.g. climate simulation)
- Requires extracting parallelism from other dimensions
- Physics, space-time, design-space, ensembles
18 CH-47 Forward Flight
- Multidisciplinary rotorcraft analysis, including trim, flight controls, and aeroelastics
- Rotor performance prediction (pitch link loads) and complex rotor-rotor and rotor-fuselage interaction in forward flight
(Figure labels: flight controls, trim, aeroelastics)
OVERFLOW/RCAS; Dimanlig/Bhagwat; AFDD, Boeing, ART
19 DoD CREATE Program
- HI-ARMS (2006-present)
- Take the proven CST-05 approach and improve its accuracy and speed (by 1000x)
- Develop an efficient and maintainable software integration framework for solving large-scale multi-disciplinary DoD aeromechanics problems
- Dual-mesh paradigm
- Unstructured near-body
- Adaptive high-order Cartesian off-body
- Overset domain connectivity
20 HI-ARMS Components
21 HI-ARMS Components
22 DoD CREATE Program
- Perhaps the highest-performance real computational engineering being attempted (in the US)
- Based mostly on existing core technology
- Addresses software complexity
- Scales to O(1000) processors
- Next step will require
- O(1M) cores
- Identify and pursue novel simulation capabilities
- Opportunities for additional parallelism
- Novel algorithms, solvers
- Programming (?)
23 Particular Issues for Computational Engineering
- Needs lots of mesh resolution (DPW)
- Good for parallelism
- Hindered by meshing and complex geometry
- By definition for engineering
- No "CFD in a box" or "on a sphere"
- Little parallel grid generation (ISV)
- In-situ refinement more appealing
- Parallel geometry definition and access
- Reliable error estimation for AMR relevant to the engineering objective
- Does not require excessive resolution
- "Good enough" engineering answer
- Bad for parallelism
- Other dimensions must be sought for parallelism
- Time and space
- Design optimization
- Database fill, uncertainty quantification
24 DPW III Series Cases
- Designed fairing to suppress flow separation (Vassberg et al., AIAA 2005-4730)
25 DPW III Results (2006)
- Idealized drag vs. grid index factor (N^(-2/3))
- Wing-body and wing-body-with-fairing
26 VGRID Wing-Body (40M pts)
27 VGRID Wing Alone (30M pts)
28 Grid Resolution
- Always need more
- DPW I (circa 2001): 3M pts
- DPW III (circa 2006): 40M pts
- Interim/follow-on studies/DPW4: > 100M pts
- Grid convergence studies point to a need for > 10^9 pts
- Wide range of scales present in aerodynamics
- Highly variable
- Far field: 100 MAC
- Trailing edge: 0.01 MAC
- Anisotropic boundary layer: y1 ≈ 10^-6 MAC
- And this is just a simple steady-state disciplinary problem
- Not currently feasible to generate 10^10-pt grids on complex geometries
- 2 possible remedies
- Adaptive mesh refinement (with adjoint error estimation)
- Higher-order methods
29 Adjoint-Based Spatial Error Estimation and AMR
- Adjoint solution: Green's function for the objective (lift)
- Change in lift for point sources of mass/momentum
- Error in objective ≈ adjoint · residual (of approximate solution)
- Predicts the objective value for a new solution (on a finer mesh)
- Cell-wise indicator of error in the objective (only)
30 h-refinement for target functional of lift
- Fixed discretization order of p = 1
Final h-adapted mesh (8387 elements)
Close-up view of the final h-adapted mesh
31 h-refinement for target functional of lift
- Comparison between h-refinement and uniform mesh refinement
Error convergence history vs. degrees of freedom
Functional Values and Corrected Values
32 Complex Geometry: Vehicle Stage Separation (CART3D/inviscid)
(Figure panels: top view, side view, initial mesh)
- Initial mesh contains only 13k cells
- Final meshes contain between 8M and 20M cells
33 Pressure Contours
M∞ = 4.5, α = 0
34
- Minimal refinement of the inter-stage region
- Gap is highly refined
- Overall, excellent convergence of the functional and error estimate
Cutaway view of inter-stage
35 Unsteady Problems
Total error in solution = spatial error (discretization/resolution) + temporal error (discretization/resolution) + algebraic error, with flow, mesh, and other contributions to each.
- Solution of the time-dependent adjoint: backwards integration in time
- Disciplinary adjoint inner product with disciplinary residual
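The backwards-in-time integration can be illustrated on a scalar model problem (my own sketch, not the slides' flow/mesh system): march the state forward with explicit Euler, then march the adjoint in reverse to get the sensitivity of a final-time functional to the initial state.

```python
import numpy as np

a, dt, N = -0.5, 0.01, 100        # model ODE du/dt = a*u, explicit Euler
u = np.empty(N + 1)
u[0] = 1.0
for n in range(N):                # forward sweep (in the nonlinear case the
    u[n + 1] = (1.0 + dt * a) * u[n]   # states must be stored or checkpointed)

# Backward sweep: adjoint of u_{n+1} = (1 + dt*a) u_n for the functional J = u_N
lam = np.empty(N + 1)
lam[N] = 1.0                      # seed with dJ/du_N
for n in reversed(range(N)):
    lam[n] = (1.0 + dt * a) * lam[n + 1]

# lam[0] = dJ/du_0; the discrete scheme gives exactly (1 + dt*a)^N
print(np.isclose(lam[0], (1.0 + dt * a) ** N))
```

The need to store (or checkpoint and recompute) the entire forward trajectory before the reverse sweep is what makes unsteady adjoints so much more expensive than steady ones.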
36 Time-Integrated Functional
- Interaction of an isentropic vortex with a slowly pitching NACA 0012
- Mach number: 0.4225
- Reduced frequency: 0.001
- Center of pitch is the quarter chord
- Functional is
8,600 elements
37 Unsteady Adjoint Error Estimation
Density contours of initial condition
38 Temporal Error Adaptation
Comparison of adapted temporal domain
39 Algebraic Error Adaptation
Adapted Flow/Mesh convergence tolerances
40 Adjoint-Based Refinement Results
Combined space, time, and algebraic error control
Expensive in 3D? Yes, but I have 1M cores, and this will improve my simulation outcome.
- Error in lift versus CPU time
- Uniform cost is only the finest solution's cost
- Adaptive cost is all solutions (+ adjoint cost)
- Corrected value provides further improvement
41 High Order Methods
- Higher-order methods such as Discontinuous Galerkin are best suited to meet high accuracy requirements
- Asymptotic properties
- HOMs reduce grid generation requirements
- HOMs reduce grid handling infrastructure
- Dynamic load balancing
- Compact data representation (data compression)
- Smaller number of modal coefficients versus a large number of point-wise values
- HOMs scale very well on massively parallel architectures
42 4-Element Airfoil (Euler Solution)
43 Parallel Performance: Speedup (1 MG cycle)
2.5M-cell mesh
- p = 0 does not scale
- p = 1 scales up to 1000 proc.
- p > 1: ideal scalability
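One heuristic for why scalability improves with order (mine, not the slide's analysis): per 3-D element the volume work grows like (p+1)^3 while the face data exchanged between partitions grows like (p+1)^2, so the compute-to-communication ratio grows roughly like p+1, leaving p = 0 communication-bound at high processor counts.

```python
# Rough compute-to-communication model per 3-D element:
# volume modes ~ (p+1)^3, face-trace data ~ (p+1)^2 per face.
for p in range(4):
    work = (p + 1) ** 3
    comm = (p + 1) ** 2
    print(p, work / comm)   # ratio grows as p + 1
```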
44 High Performance Computational Engineering
- Other computational engineering problems
- Parametric Analysis/Data Base Generation
- Unsteady optimization
- Optimization under uncertainty
- Characteristics
- Intractable at cluster level
- Seek additional (non-spatial) parallelism
45 Flight-Envelope Database Generation
- Based on current NASA experience
- OVERFLOW: 8 million points, 211 simulations, 1 week
- Assuming:
- 100 million grid points (RANS)
- 5 parameters, 10 value instances each -> O(10^5) cases
- Database generation in 1 week using 100,000 cpus
- Available today (LLNL)
- Or wait 15 years for Moore's Law
- Sooner using model-reduction techniques
- Parallelism is evident
- Issues: automation, robustness, data collection/regeneration
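The sweep itself is embarrassingly parallel: each of the 10^5 cases is independent. A minimal driver sketch (all parameter names and ranges are hypothetical placeholders, and run_case stands in for launching one RANS job):

```python
from itertools import product
from multiprocessing import Pool

# Hypothetical flight-envelope axes: 5 parameters x 10 values each = 10**5 cases
axes = [
    [0.50 + 0.04 * i for i in range(10)],  # Mach number (placeholder range)
    [-2.0 + 1.0 * i for i in range(10)],   # angle of attack, deg (placeholder)
    [1000.0 * i for i in range(10)],       # altitude, m (placeholder)
    [-10.0 + 2.0 * i for i in range(10)],  # elevator deflection, deg (placeholder)
    [0.1 * i for i in range(10)],          # thrust setting (placeholder)
]
cases = list(product(*axes))
assert len(cases) == 10**5

def run_case(params):
    # Stand-in for one RANS simulation; a real driver would submit a job,
    # collect forces/moments, and write them into the aero database.
    return sum(params)

if __name__ == "__main__":
    with Pool() as pool:                    # node-local parallelism; a real sweep
        results = pool.map(run_case, cases, # would fan out across a whole machine
                           chunksize=1000)  # through a batch scheduler
    print(len(results))
```

The automation and data-collection issues on the slide live in run_case and in gathering results, not in the parallel structure, which is trivial.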
46 Time-Dependent Design Optimization
K. Mani and D. J. Mavriplis. An unsteady discrete adjoint formulation for two-dimensional flow problems with deforming meshes. AIAA Paper 2007-0060.
47 Time-Dependent Load Convergence/Comparison
48 Unsteady Optimization
49 Computational Requirements
- One analysis cycle of a rotorcraft
- 100 million grid points, three revolutions (0.25 degree/time step)
- 30 hours on 1000 cpus
- One design cycle (twice the cost of analysis)
- Forward time-dependent simulation
- Backwards time-dependent adjoint solution
- 50 to 100 design cycles
- 30 to 60 hours on 100,000 cpus
- Possibilities for additional parallelism
- Space-time parallelism
- Reduce sequential design cycles (parallel Hessian calculation)
- Now do multipoint unsteady optimization
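The figures above are internally consistent, which a back-of-envelope check makes visible (using only numbers stated on the slide):

```python
# One analysis cycle: 30 hours on 1000 cpus
analysis_cpu_hours = 30 * 1000

# One design cycle (forward simulation + backward adjoint) ~ twice an analysis
design_cycle_cpu_hours = 2 * analysis_cpu_hours

# 50 to 100 sequential design cycles, spread over 100,000 cpus
for n_cycles in (50, 100):
    wall_hours = n_cycles * design_cycle_cpu_hours / 100_000
    print(n_cycles, wall_hours)   # -> 50 30.0 and 100 60.0
```

Note that the 100,000 cpus only absorb the within-cycle cost; the 50-100 design cycles remain sequential, which is exactly why parallel Hessian calculation is listed as a way to reduce them.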
50 Design under Uncertainty
- Most analysis and design assumes problems are deterministic
- Aerospace vehicles are subjected to variability, e.g.
- Manufacturing
- Wear
- Operating environmental conditions
- Few applications of non-deterministic methods involve high-fidelity aerodynamics
- Cost of high-fidelity simulations
- Lack of robustness of high-fidelity models
51 Impact of Manufacturing Variability on Compressor Aerodynamic Performance
- Garzon and Darmofal studied the impact of geometric variability due to manufacturing on compressor aero performance (2002)
- 2-D coupled Euler/boundary-layer model (about 5 seconds per run) used for a 2000-run Monte Carlo (about 3 CPU hours)
(Figure: probabilistic aerodynamic analysis of compressor efficiency under geometric variability, marking the nominal and mean efficiencies)
- Mean efficiency 1.5 points lower than nominal
- ~0 probability of the compressor achieving design efficiency
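The qualitative mechanism is easy to reproduce with a toy Monte Carlo (my own stand-in response, not Garzon and Darmofal's model): if efficiency responds concavely to a symmetric geometric perturbation, the mean falls below the nominal value, and the nominal (design) efficiency is essentially never reached.

```python
import numpy as np

rng = np.random.default_rng(1)

def efficiency(delta):
    # Toy concave response to a geometric perturbation delta (hypothetical
    # coefficients, chosen only to mimic the qualitative behaviour).
    return 0.90 - 40.0 * delta ** 2

nominal = efficiency(0.0)                   # deterministic (nominal) value
deltas = rng.normal(0.0, 0.02, size=2000)   # 2000-run Monte Carlo, as in the study
samples = efficiency(deltas)

print(nominal - samples.mean() > 0.01)      # mean sits well below nominal
print(float((samples >= nominal).mean()))   # fraction reaching design efficiency
```

The cost arithmetic on the slide also checks out: 2000 runs at about 5 seconds each is roughly 2.8 CPU hours.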
52 Probabilistic Analysis
- Other examples abound
- AFRL/NASA Langley sponsored short course on UQ (May 2009)
- Authored by the Sandia (DAKOTA) group
- Monte Carlo simulations / database fill-in require many high-fidelity simulations: O(10,000)
- Reduce this to O(100) or fewer with more sophisticated techniques
- Seek parallelism in the construction of reduced-order models
- Method of moments (gradients, Hessians in parallel)
- Design optimization under uncertainty
- Unsteady design optimization under uncertainty
53 Conclusions
- The HPC engineering community has fallen behind the science community in advocating the need for HPC
- The engineering community is guilty of a lack of vision
- Engineering requirements for HPC are essentially insatiable for the foreseeable future
- Often unable to make use of available HPC power
- This is why high-end demonstration calculations are important
- Engineering requirements for HPC are different from science requirements
- Grid generation/geometry is a huge bottleneck
- Parallelism often must come from sources other than spatial resolution (e.g. compared to climate simulation)
- Engineering difficulties with HPC are shared with science
- Programming for 1M cores, software costs and maintainability
- New enabling algorithms required