Title: Preconditioning%20Implicit%20Methods%20for%20Coupled%20Physics%20Problems%20%20David%20Keyes%20Center%20for%20Computational%20Science%20Old%20Dominion%20University%20
1Preconditioning Implicit Methods for Coupled
Physics ProblemsDavid KeyesCenter for
Computational ScienceOld Dominion
UniversityInstitute for Scientific Computing
ResearchLawrence Livermore National Laboratory
2Plan of presentation
- Background/progress report
- Fusion applications in the TOPS SciDAC portfolio
- Center for Extended Magnetohydrodynamic Modeling
(CEMM) - Center for Magnetic Reconnection Studies (CMRS)
- Parallel nonlinearly implicit solvers in TOPS
- PETScs domain-decomposed JFNK framework
- Hypres multilevel preconditioners
- Whats wrong with this picture?
- Curse of the fast waves
- Curse of the splitting errors
- Curse of the brute force algebraic solvers
- Current and future needs/plans
- Physics-based preconditioning
- Coupling multi-physics applications
3Acknowledgments
- For mentoring, application codes, and some
slides, our primary SciDAC fusion collaborators
to date - Steve Jardin and Jin Chen (CEMM)
- Amitava Bhattacharjee and Kai Germaschewski
(CMRS) - For solver codes, new development, and some
slides, our TOPS partners - Barry Smith and the PETSc team (ANL)
- Rob Falgout and the Hypre team (LLNL)
- For putting it all together
- Florin Dobrian, TOPS project post-doc (ODU)
- For future algorithmic philosophy
- Xiao-Chuan Cai (CU Boulder) and Dana Knoll (LANL)
- For support and computational resources
- DOE SciDAC program and earlier computational math
investments - NERSC and ORNL computing centers
4Two fusion energy applications
- Center for Magnetic Reconnection Studies (CMRS,
based at Iowa) - Idealized 2D Cartesian geom., periodic BCs,
simple FD discretization - Mix of primitive variables and streamfunctions
- Sequential nonlinearly coupled explicit
evolution, w/2 Poisson inversions on each
timestep - Want from TOPS
- Now scalable Jacobian-free Newton-Krylov
nonlinearly implicit solver for higher resolution
in 3D (and for AMR) - Later physics-based preconditioning for
nonlinearly implicit solver
- Center for Extended MHD Modeling (CEMM, based at
PPPL) - Realistic toroidal geom., unstructured mesh,
hybrid FE/FD discretization - Fields expanded in scalar potentials, and
streamfunctions - Operator-split, linearized, w/11 potential solves
in each poloidal cross-plane/step (90 exe. time) - Parallelized w/PETSc (Tang et al., SIAM PP01,
Chen et al., SIAM AN02, Jardin et al., SIAM
CSE03) - Want from TOPS
- Now scalable linear implicit solver for much
higher resolution (and for AMR) - Later fully nonlinearly implicit solvers and
coupling to other codes
5CEMM profile from SciDAC website
Center for Extended Magnetohydrodynamic
Modeling This research project will develop
computer codes that will enable a realistic
assessment of the mechanisms leading to
disruptive and other stability limits in the
present and next generation of fusion devices.
With an improvement in the efficiency of codes
and with the extension of the leading 3D
nonlinear magneto-fluid models of hot, magnetized
fusion plasmas, this research will pioneer new
plasma simulations of unprecedented realism and
resolution. These simulations will provide new
insights into low frequency, long-wavelength
non-linear dynamics in hot magnetized plasmas,
some of the most critical and complex phenomena
in plasma and fusion science. The underlying
models will be validated through cross-code and
experimental comparisons. PI Steve
JardinPrinceton Plasma Physics Lab
6CMRS profile from SciDAC website
Magnetic Reconnection Applications to Sawtooth
Oscillations, Error Field Induced Islands and the
Dynamo Effect The research goals of this project
include producing a unique high performance code
and using this code to study magnetic
reconnection in astrophysical plasmas, in smaller
scale laboratory experiments, and in fusion
devices. The modular code that will be developed
will be a fully three-dimensional, compressible
Hall MHD code with options to run in slab,
cylindrical and toroidal geometry and flexible
enough to allow change in algorithms as needed.
The code will use adaptive grid refinement, will
run on massively parallel computers, and will be
portable and scalable. The research goals include
studies that will provide increased understanding
of sawtooth oscillations in tokamaks, magnetotail
substorms, error-fields in tokamaks, reverse
field pinch dynamos, astrophysical dynamos, and
laboratory reconnection experiments. PI Amitava
BhattacharjeeUniversity of Iowa
7Summary of progress on CEMM
- TOPS running full-up M3D code on PPPLs preferred
NERSC Seaborg platform - Using up to 75 processors per poloidal
cross-plane in weak scaling fixed memory per
processor (parallel concurrency can be 10-100
times this in toroidal direction), PETSc handles
all aspects of the distributed data structures - Have integrated Hypre as preconditioner to PETSc
and created new Hypre coarsening rules to
accommodate a peculiarity of M3Ds Dirichlet
boundary conditions - 1-level additive Schwarz preconditioner has
excellent per-iteration implementation
efficiency, but nonscalable convergence - Algebraic Multigrid preconditioner has perfect
convergence scalability, but (presently) high
set-up time and poor set-up scalability, and
mildly degrading per-iteration implementation
efficiency - PLAN separately tune each of the 11 different
Poisson solves per timestep - PLAN work on improving Hypre solves and
amortizing Hypre set-ups - PLAN (with APDEC team) incorporate AMR
8 Hypres AMG in CEMM app
- M3Ds custom discretization and parallel data
structures dictated by geometry and physics
periodic toroidally, unstructured poloidally, fit
to flux surfaces, partitioned into 3n2 chunks
poloidally times arbitrary toroidal
decomposition, some mesh anisotropy - Mesh mapped and reordered using PETScs index
sets, solved by GMRES, preconditioned with
Hypres parallel algebraic MG solver of
Ruge-Stueben type
figures c/o S. Jardin, CEMM
9 Hypres AMG in CEMM app
- Iteration count results below are averaged over
19 different PETSc SLESSolve calls in
initialization and first timestep loop for this
operator-split unsteady code, abcissa is number
of procs in scaled problem problem size ranges
from 12K to 303K unknowns (approx 4K per
processor)
iterations
size or procs
10 Hypres AMG in CEMM app, cont.
- Scaled speedup timing results below are summed
over 19 different PETSc SLESSolve calls in
initialization and first timestep loop for this
operator-split unsteady code - Much of AMG cost is coarse-grid formation
(preprocessing) in production, these coarse
hierarchies will be saved for reuse (same linear
systems are called in each timestep loop), making
AMG less expensive
time
size or procs
11 Hypres AMG in CEMM app, cont.
12 Hypres AMG in CEMM app, cont.
13Summary of progress on CMRS
- CMRS team has provided TOPS with discretization
of model 2D multicomponent MHD evolution code in
PETScs FormFunctionLocal format using DMMG and
automatic differentiation for Jacobian objects - TOPS has implemented fully nonlinearly implicit
GMRES-MG-ILU parallel solver with custom
deflation of nullspace in CMRSs doubly periodic
formulation - CMRS and TOPS reproduce the same dynamics on the
same grids with the same time-stepping, up to a
finite-time singularity due to collapse of
current sheet (that falls below presently uniform
mesh resolution) - TOPS code, being implicit, can choose timesteps
an order of magnitude larger, with potential for
higher ratio in more physically realistic
parameter regimes, but is still slower in
wall-clock time - PLAN tune PETSc solver by profiling, blocking,
reuse, etc. - PLAN go to higher-order in time
- PLAN identify the numerical complexity benefits
from implicitness (in suppressing fast
timescales) and quantify (explicit versus
implicit) - PLAN (with APDEC team) incorporate AMR
142D Hall MHD sawtooth instability
(Porcelli et al., 1993, 1999)
Model equations
Equilibrium
figures c/o A. Bhattacharjee, CMRS
15 PETScs DMMG in CMRS app
- Mesh and time refinement studies of CMRS Hall
magnetic reconnection model problem (4 mesh
sizes, dt0.1 (nondimensional, near CFL limit for
fastest wave) on left, dt0.8 on right) - Measure of functional inverse to thickness of
current sheet versus time, for 0lttlt200
(nondimensional), where singularity occurs around
t215
16 PETScs DMMG in CMRS app, cont.
- Implicit timestep increase studies of CMRS Hall
magnetic reconnection model problem, on finest
(192?192) mesh of previous slide, in absolute
magnitude, rather than semi-log
17Scope for TOPS
- Design and implementation of solvers
- Time integrators
- Nonlinear solvers
- Constrained optimizers
- Linear solvers
- Eigensolvers
- Software integration
- Performance optimization
(w/ sens. anal.)
(w/ sens. anal.)
18Jacobian-Free Newton-Krylov Method
- In the Jacobian-Free Newton-Krylov (JFNK) method
for F(u)0 , a Krylov method solves the linear
Newton correction equation, requiring
Jacobian-vector products - These are approximated by the Fréchet derivatives
-
- so that the actual Jacobian elements are
never explicitly needed, where ? is chosen
with a fine balance between approximation and
floating point rounding error (or by automatic
differentiation)
- Build preconditioner using convenient
approximations without losing Newton convergence
in the outer loop
19Background of PETSc Library
- Developed at ANL to support research,
prototyping, and production parallel solutions of
operator equations in message-passing
environments - Distributed data structures as fundamental
objects - index sets, vectors/gridfunctions, and
matrices/arrays - Iterative linear and nonlinear solvers,
combinable modularly and recursively, and
extensibly - Portable, and callable from C, C, Fortran
- Uniform high-level API, with multi-layered entry
- Aggressively optimized copies minimized,
communication aggregated and overlapped, caches
and registers reused, memory chunks preallocated,
inspector-executor model for repetitive tasks
(e.g., gather/scatter)
See http//www.mcs.anl.gov/petsc
20User Code/PETSc Interactions
Main Routine
Timestepping Solvers (TS)
Nonlinear Solvers (SNES)
Linear Solvers (SLES)
PETSc
PC
KSP
Application Initialization
Function Evaluation
Jacobian Evaluation
Post- Processing
User code
PETSc code
21User Code/PETSc Interactions
Main Routine
Timestepping Solvers (TS)
Nonlinear Solvers (SNES)
Linear Solvers (SLES)
PETSc
PC
KSP
Application Initialization
Function Evaluation
Jacobian Evaluation
Post- Processing
User code
PETSc code
AD code
22Background of Hypre Library
- Developed at LLNL to support research,
prototyping, and production parallel solutions of
linear equations in message-passing environments - Object-oriented design similar to PETSc
- Concentrates on linear problems only
- Richer in preconditioners than PETSc, with focus
on algebraic multigrid, including many research
prototypes not yet in the public release - Includes other preconditioners, including sparse
approximate inverse (Parasails) and parallel ILU
(Euclid)
See http//www.llnl.gov/CASC/hypre/
23Algebraic multigrid background
- AMG assumes only information about the underlying
matrix structure - AMG automatically forms coarse operators, based
on problem data, using heuristics - There are many AMG algorithms, and the area is
under active theoretical and experimental
development
algebraically smooth error
Recur, as in geometric MG
24Geometric multigrid a special case
Finest Grid
figure c/o hypre team, LLNL
25Sample of Hypres scaled efficiency
26Whats wrong with this picture?
- Curse of the fast waves
- MHD phenomena include waves at many different
timescales, some of which (e.g., Whistler,
Alfvén) are not dynamically relevant to some
simulations but suppress the timestep of explicit
methods due to instability - Curse of the splitting errors
- Traditional operator-split approaches to
advancing evolutionary PDEs incur
first-order-in-time splitting errors that
suppress the timestep of semi-implicit methods
due to inaccuracy - Curse of the brute force algebraic solvers
- Fully implicit methods to overcome timestep
limitations demand powerful solvers, whose cost
per iteration for large timesteps may take back
any savings they provide by allowing larger
timesteps for a given accuracy
27What do we need?
- To overcome the algebraic complexity of implicit
solvers - Knowledge of the physics already present in
traditional operator-split approaches - To begin to couple existing large-scale nonlinear
physics codes - An algebraic framework like Schwarz methods for
linear problems, which shows how to use
conveniently obtained solutions to subproblems of
the new fully coupled problem (e.g., the solver
procedures in the original component codes) to
arrive at the solution of the full problem,
faster and more reliably than by nonlinear Picard
iteration (successive substitutions) alone - TOPS is addressing these near-downstream issues
- Physics-based preconditioning (work w/ D. Knolls
LANL group) - Nonlinear Schwarz (work w/ X.-C. Cais Boulder
group (part of TOPS)) - Nonlinear solver interface (work w/ SciDAC CCA
project)
28Philosophy of Jacobian-free NK
- To evaluate the linear residual, we use the true
F(u) , giving a true Newton step and asymptotic
quadratic Newton convergence - To precondition the linear residual, we do
anything convenient that uses understanding of
the dominant physics/mathematics in the system
and respects the limitations of the parallel
computer architecture and the cost of various
operations - combinations of operator-split Jacobians (for
reasons of physics or reasons of numerics) - Jacobian of related discretization (for fast
solves) - Jacobian of lower-order discretization (for more
stability, less storage) - Jacobian with lagged values for expensive terms
(for less computation per degree of freedom) - Jacobian stored in lower precision (for less
memory traffic per preconditioning step) - Jacobian blocks decomposed for parallelism
29Philosophy of Jacobian-free NK, cont.
- These motivations are not new most large-scale
application codes also take short cuts on the
approximate Jacobian operator to be inverted
showing physical intuition - The problem with many codes is that they do not
anywhere have an accurate global Jacobian
operator they use only the weak Jacobian - This leads to a weakly nonlinearly converging
defect correction method - Defect correction
- in contrast to preconditioned Newton
30Physics-based preconditioningexample of 1D
Shallow Water Equations
- Continuity
- Momentum
- Gravity wave speed
- Typically , but stability
restrictions require timesteps based on the CFL
criterion for the fastest wave, for an explicit
method - One can solve fully implicitly, or one can filter
out the gravity wave by solving semi-implicitly
31Physics-based preconditioning, cont.
- Continuity ()
- Momentum ()
- Solving () for and
substituting into (),
- where
32Physics-based preconditioning, cont.
- After the parabolic equation is spatially
discretized and solved for , then
can be found from
- One scalar parabolic solve and one scalar
explicit update replace an implicit hyperbolic
system - Similar tricks are employed in aerodynamics
(sound waves), MHD (Alfvén waves), etc. - Temporal truncation error remains in (), due to
lagged advection, but in delta form it makes a
great preconditioner - Mousseau et al. for gravity waves (generalization
of this example) - Pernice et al. for acoustic waves (SIMPLE scheme)
- Chacon et al. for Whistler waves in MHD (very
intricate see JCP)
33 Physics-based Preconditioning
- When the shallow-water wave splitting is used as
a solver, it leaves a first-order in time
splitting error - In the Jacobian-free Newton-Krylov framework,
this solver, which maps a residual into a
correction, can be regarded as a preconditioner - The true Jacobian is never formed yet the
time-implicit nonlinear residual at each time
step can be made as small as needed for nonlinear
consistency in long time integrations
34Physics-based preconditioning
- In Newton iteration, one seeks to obtain a
correction (delta) to solution, by inverting
the Jacobian matrix on (the negative of) the
nonlinear residual - A typical operator-split code also derives a
delta to the solution, by some implicitly
defined means, through a series of implicit and
explicit substeps - This implicitly defined mapping from residual to
delta is a natural preconditioner - TOPS Software must (and will) accommodate this!
351D Shallow water preconditioning
- Define continuity residual for each timestep
- Define momentum residual for each timestep
- Continuity delta-form ()
- Momentum delta form ()
361D Shallow water preconditioning, cont.
- Solving () for and
substituting into (),
- After this parabolic equation is solved for ?? ,
we have - This completes the application of the
preconditioner to one Newton-Krylov iteration at
one timestep - Of course, the parabolic solve need not be done
exactly one sweep of multigrid can be used
37Operator-split preconditioning
- Subcomponents of a PDE operator often have
special structure that can be exploited if they
are treated separately - Algebraically, this is just a generalization of
Schwarz, by term instead of by subdomain - Suppose and a
preconditioner is to be constructed, where
and are each easy to
invert - Form a preconditioned vector from as follows
- Equivalent to replacing with
- First-order splitting error, yet often used as a
solver!
38Operator-split preconditioning, cont.
- Suppose S is convection-diffusion and R is
reaction, among a collection of fields stored as
gridfunctions - On a small regular 2D grid with a five-point
stencil - R is trivially invertible in block diagonal form
- S is invertible with one multilevel solve per
field
J
S
R
39Operator-split preconditioning, cont.
- Preconditioners assembled from just the strong
elements of the Jacobian, alternating the source
term and the diffusion term operators, are
competitive in convergence rates with block-ILU
on the Jacobian - particularly, since the decoupled scalar
diffusion systems are amenable to simple
multigrid treatment not as trivial for the
coupled system - The decoupled preconditioners store many fewer
elements and significantly reduce memory
bandwidth requirements and are expected to be
much faster per iteration when carefully
implemented - See alternative block factorization by Bank et
al. incorporated into SciDAC TSI solver by
DAzevedo
40 Nonlinear Schwarz preconditioning
- Nonlinear Schwarz has Newton both inside and
outside and is fundamentally Jacobian-free - It replaces with a new nonlinear
system possessing the same root, - Define a correction to the
partition (e.g., subdomain) of the solution
vector by solving the following local nonlinear
system - where is nonzero only in
the components of the partition - Then sum the corrections
41 Nonlinear Schwarz, cont.
- It is simple to prove that if the Jacobian of
F(u) is nonsingular in a neighborhood of the
desired root then and
have the same unique root - To lead to a Jacobian-free Newton-Krylov
algorithm we need to be able to evaluate for any
- the residual
- the Jacobian-vector product
- Remarkably, (Cai-Keyes, 2000) it can be shown
that - where and
- All required actions are available in terms of
!
42Example of nonlinear Schwarz
43 Coupling using Nonlinear Schwarz
- Consider systems F1(u1 ,u2)0 and F2(u1 ,u2)0
- Couple together as F(u)0
- Apply nonlinear Schwarz with nonoverlapping
partitions corresponding to original systems - Then
- where
- (gives insight into coupling
direction/scaling)
44Abstract Gantt Chart for TOPS
Each color module represents an algorithmic
research idea on its way to becoming part of a
supported community software tool. At any moment
(vertical time slice), TOPS has work underway at
multiple levels. While some codes are in
applications already, they are being improved in
functionality and performance as part of the TOPS
research agenda.
Dissemination
Applications Integration
Hardened Codes
e.g.,PETSc
Research Implementations
e.g.,TOPSLib
Algorithmic Development
e.g., ASPIN
time
45 Summary
- Important codes in fusion MHD are bottlenecked
without powerful parallel implicit solvers - These same codes often contain valuable
approximate solvers being employed as the only
solver, thereby limiting timestep effectiveness - Coupling such codes together will only impose
further limits on the timestep it is likely to
be smaller than the smallest timestep permitted
by any component code, to accommodate higher
levels of coupling - Nonlinearly implicit solvers for each component
can leverage existing solvers as physics-based
preconditioners, if software allows,
strengthening the components even prior to their
integration - Nonlinear versions of Schwarz contain promise for
preconditioning coupled codes - Doing physics is more fun than doing driven
cavities ?
46For more information ...
Come to poster session tonight, see Knoll Keyes
(2002) JCP preprint at http//www.math.odu.edu/k
eyes and see
http//www.tops-scidac.org