1
Algorithms and Software for Terascale Computation of PDEs and PDE-constrained Optimization
David E. Keyes
Center for Computational Science, Old Dominion University
Institute for Scientific Computing Research (ISCR), Lawrence Livermore National Laboratory
2
Plan of presentation
  • Imperative of optimal algorithms for terascale
    computing
  • Basic domain decomposition and multilevel
    algorithmic concepts
  • Illustration of solver performance on ASCI
    platforms
  • Terascale Optimal PDE Simulations (TOPS) software
    project of the U.S. DOE
  • Conclusions and outlook

3
Related URLs
  • Personal homepage: papers, talks, etc.
  • http://www.math.odu.edu/keyes
  • SciDAC initiative
  • http://www.science.doe.gov/scidac
  • TOPS software project
  • http://www.math.odu.edu/keyes/scidac
  • PETSc software project
  • http://www.mcs.anl.gov/petsc
  • Hypre software project
  • http://www.llnl.gov/CASC/hypre

4
Bibliography
  • Jacobian-Free Newton-Krylov Methods: Approaches and Applications, Knoll & Keyes, 2002, submitted to J. Comp. Phys.
  • Nonlinearly Preconditioned Inexact Newton Algorithms, Cai & Keyes, 2002, SIAM J. Sci. Comp. 24:183-200
  • High Performance Parallel Implicit CFD, Gropp, Kaushik, Keyes & Smith, 2001, Parallel Computing 27:337-362
  • Four Horizons for Enhancing the Performance of Parallel Simulations based on Partial Differential Equations, Keyes, 2000, Lect. Notes Comp. Sci., Springer, 1900:1-17
  • Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel CFD, Gropp, Keyes, McInnes & Tidriri, 2000, Int. J. High Performance Computing Applications 14:102-136
  • Achieving High Sustained Performance in an Unstructured Mesh CFD Application, Anderson, Gropp, Kaushik, Keyes & Smith, 1999, Proceedings of SC'99
  • Prospects for CFD on Petaflops Systems, Keyes, Kaushik & Smith, 1999, in Parallel Solution of Partial Differential Equations, Springer, pp. 247-278
  • How Scalable is Domain Decomposition in Practice?, Keyes, 1998, in Proceedings of the 11th Intl. Conf. on Domain Decomposition Methods, Domain Decomposition Press, pp. 286-297

5-11
Terascale simulation has been sold
Applied Physics: radiation transport, supernovae
Environment: global climate, contaminant transport
(Diagram built up over these slides: application areas surround Scientific Simulation, each annotated with why experiments alone do not suffice, being prohibited or impossible, dangerous, difficult to instrument, controversial, or expensive.)
In these, and many other areas, simulation is an important complement to experiment. However, simulation is far from proven! To meet expectations, we need to handle problems of multiple physical scales.
12
Large platforms have been provided
(Chart: ASCI platform capability versus calendar year, 1997-2006, showing the machines Red, Blue, and White at Sandia, Los Alamos, and Livermore, ramping from 1 Tflop / 0.5 TB through 3, 10, 30, and 50 Tflop milestones toward 100 Tflop / 30 TB; 7.2 Tflop/s LINPACK achieved.)
The ASCI program of the U.S. DOE has a roadmap to reach 100 Tflop/s by 2006: www.llnl.gov/asci/platforms
13
NSF's 13.6 TF TeraGrid coming on line
TeraGrid: NCSA, SDSC, Caltech, Argonne (www.teragrid.org)
(Diagram: site resources and HPSS/UniTree archives at the four sites, linked to external networks over a 40 Gb/s backbone.)
c/o I. Foster
14
Earth Simulator
35.6 Tflop/s LINPACK
(Bird's-eye view of the Earth Simulator system: processor node (PN) and interconnection network (IN) cabinets, disks, cartridge tape library, air conditioning and power supply systems, in a roughly 65 m x 50 m hall with a double floor for IN cables.)
15
Building platforms is the easy part
  • Algorithms must be
  • highly concurrent and straightforward to load
    balance
  • latency tolerant
  • cache friendly (temporal and spatial locality of
    reference)
  • highly scalable (in the sense of convergence)
  • Goal for algorithmic scalability: fill up the memory
    of arbitrarily large machines while preserving
    constant (or logarithmically growing) running times with
    respect to a proportionally smaller problem on one processor
  • Domain-decomposed multilevel methods are natural
    for all of these
  • Domain decomposition is also natural for software
    engineering
16
Algorithmic requirements from architecture
  • Must run on physically distributed memory units
    connected by message-passing network, each
    serving one or more processors with multiple
    levels of cache

(Horizontal aspects: message passing, shared-memory threads. Vertical aspects: register blocking, cache blocking, prefetching. Example machine: T3E.)
17
Keyword: Optimal
  • Convergence rate nearly independent of
    discretization parameters
  • Multilevel schemes for rapid linear convergence
    of linear problems
  • Newton-like schemes for quadratic convergence of
    nonlinear problems
  • Convergence rate as independent as possible of
    physical parameters
  • Continuation schemes
  • Physics-based preconditioning

18
Why Optimal Algorithms?
  • The more powerful the computer, the greater the
    importance of optimality
  • Example
  • Suppose Alg1 solves a problem in time CN2, where
    N is the input size
  • Suppose Alg2 solves the same problem in time CN
  • Suppose that the machine on which Alg1 and Alg2
    have been parallelized to run has 10,000
    processors
  • In constant time (compared to serial), Alg1 can
    run a problem 100X larger, whereas Alg2 can run a
    problem 10,000X larger
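A quick check of this claim (assuming ideal speedup, with C a machine- and problem-independent constant): with P = 10,000 processors, the work affordable in the same wall-clock time as the serial baseline is 10,000 times larger, so

\[
\text{Alg1:}\;\; C N'^2 = 10{,}000\, C N^2 \;\Rightarrow\; N' = 100\, N,
\qquad
\text{Alg2:}\;\; C N' = 10{,}000\, C N \;\Rightarrow\; N' = 10{,}000\, N .
\]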

19
Why Optimal?, cont.
  • Alternatively, filling the machine's memory, Alg1
    requires 100X the time, whereas Alg2 runs in constant
    time
  • Is 10,000 processors a reasonable expectation?
  • Yes, we have it today (ASCI White)!
  • Could computational scientists really use 10,000X
    scaling?
  • Of course; we are approximating the continuum
  • A grid for weather prediction allows points every
    1 km versus every 100 km on the earth's surface
  • In 2D, 10,000X disappears fast; in 3D, even faster
  • However, these machines are expensive (the Earth
    Simulator cost about $0.5B, plus ongoing operating
    costs), and optimal algorithms are the only
    algorithms that we can afford to run on them

20
Decomposition strategies for $Lu = f$ in $\Omega$
  • Operator decomposition
  • Function space decomposition
  • Domain decomposition

21
Operator decomposition
  • Consider ADI
  • Iteration matrix consists of four sequential
    (multiplicative) substeps per timestep
  • two sparse matrix-vector multiplies
  • two sets of unidirectional bandsolves
  • Parallelism within each substep
  • But global data exchanges between bandsolve
    substeps
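As a concrete reminder (a standard Peaceman-Rachford form; the slide's original notation is not preserved in this transcript), one ADI timestep for $u_t = (A_x + A_y)u$ is

\[
\Bigl(I - \tfrac{\Delta t}{2} A_x\Bigr) u^{n+1/2} = \Bigl(I + \tfrac{\Delta t}{2} A_y\Bigr) u^{n},
\qquad
\Bigl(I - \tfrac{\Delta t}{2} A_y\Bigr) u^{n+1} = \Bigl(I + \tfrac{\Delta t}{2} A_x\Bigr) u^{n+1/2},
\]

i.e., two sparse matrix-vector multiplies on the right and two unidirectional bandsolves on the left.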

22
Function space decomposition
  • Consider a spectral Galerkin method
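(The slide's equations are not preserved; a minimal sketch of the idea: expand $u(x) = \sum_{j=1}^{N} a_j \phi_j(x)$ in global basis functions and impose the Galerkin conditions $(\phi_i, Lu - f) = 0$, giving the system $\sum_j (\phi_i, L\phi_j)\, a_j = (\phi_i, f)$ for $i = 1,\dots,N$; the decomposition assigns subsets of the modes $a_j$ to different processors.)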

23
Domain decomposition
  • Consider restriction and extension operators for
    the subdomains, and for a possible coarse grid
  • Replace the discretized operator with its
    Schwarz-preconditioned form (sketched under slide 29 below)
  • Solve by a Krylov method, e.g., CG
  • Matrix-vector multiplies with the preconditioned
    operator involve
  • parallelism on each subdomain
  • nearest-neighbor exchanges, global reductions
  • a possible small global system (not needed for the
    parabolic case)

24
Comparison
  • Operator decomposition (ADI)
  • natural row-based assignment requires all-to-all,
    bulk data exchanges in each step (for transpose)
  • Function space decomposition (Fourier)
  • natural mode-based assignment requires
    all-to-all, bulk data exchanges in each step (for
    transform)
  • Domain decomposition (Schwarz)
  • natural domain-based assignment requires local
    (nearest neighbor) data exchanges, global
    reductions, and optional small global problem

25
Theoretical scaling of domain decomposition (for
three common network topologies)
  • With logarithmic-time (hypercube- or tree-based)
    global reductions and scalable nearest neighbor
    interconnects
  • optimal number of processors scales linearly with
    problem size (scalable, assumes one subdomain
    per processor)
  • With power-law-time (3D torus-based) global
    reductions and scalable nearest neighbor
    interconnects
  • optimal number of processors scales as
    three-fourths power of problem size (almost
    scalable)
  • With linear-time (common bus) network
  • optimal number of processors scales as one-fourth
    power of problem size (not scalable)
  • bad news for conventional Beowulf clusters, but
    see the 2000 and 2001 Bell Prize price-performance
    awards using multiple commodity NICs per Beowulf
    node!

26
Three Basic Concepts
  • Iterative correction
  • Schwarz preconditioning
  • Schur preconditioning

27
Iterative correction
  • The most basic idea in iterative methods:
    evaluate the residual accurately, but solve only
    approximately, using an approximate inverse to A
  • A sequence of complementary approximate solves can
    be used within one correction step
  • Optimal polynomials of the preconditioned operator
    lead to various preconditioned Krylov methods
  • Recurring the correction over a hierarchy of scales
    leads to multilevel methods
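(The slide's formulas are not preserved; a minimal sketch, writing $B^{-1}$ for the approximate inverse:)

\[
u \leftarrow u + B^{-1}(f - Au).
\]

Two complementary solves $B_1^{-1}$ and $B_2^{-1}$ applied in sequence multiply the error by $(I - B_2^{-1}A)(I - B_1^{-1}A)$; polynomial combinations of such corrections give preconditioned Krylov methods, and recursing with coarser-scale operators gives multilevel methods.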

28
Multilevel Preconditioning
(Diagram: hierarchy of grids from the finest grid down to coarser levels.)
29
Schwarz Preconditioning
  • Given Ax = b, partition x into subvectors
    corresponding to subdomains of the domain of the
    PDE: nonempty, possibly overlapping, and whose
    union is all of the elements of x
  • Let a Boolean rectangular matrix extract each such
    subset
  • Let the preconditioner be the sum of the extended
    subdomain inverses (sketched below)
The Boolean matrices are gather/scatter
operators, mapping between a global vector and
its subdomain support
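(A minimal sketch of the standard one-level additive Schwarz operator, using generic restriction matrices $R_i$; the slide's own symbols are not preserved:)

\[
B^{-1} = \sum_i R_i^T \bigl(R_i A R_i^T\bigr)^{-1} R_i ,
\]

so applying $B^{-1}$ means gathering each subdomain's residual, solving the local problem with $R_i A R_i^T$, and scattering the corrections back into the global vector.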
30
Iteration count estimates from the Schwarz theory
  • Krylov-Schwarz iterative methods typically
    converge in a number of iterations that scales as
    the square-root of the condition number of the
    Schwarz-preconditioned system
  • In terms of N and P, where for d-dimensional
    isotropic problems $N \sim h^{-d}$ and $P \sim H^{-d}$, for mesh
    parameter h and subdomain diameter H, iteration
    counts may be estimated as in the table below
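(The slide's table is not preserved in this transcript; the standard estimates from the Schwarz theory, as given e.g. in Smith, Bjørstad & Gropp, Domain Decomposition, 1996, are approximately:)

    Preconditioning                 Iterations (2D)    Iterations (3D)
    Point Jacobi                    O(N^{1/2})         O(N^{1/3})
    Domain Jacobi (no overlap)      O((NP)^{1/4})      O((NP)^{1/6})
    1-level Additive Schwarz        O(P^{1/2})         O(P^{1/3})
    2-level Additive Schwarz        O(1)               O(1)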

31
Comments on the Schwarz theory
  • Basic Schwarz estimates are for
  • self-adjoint operators with smooth coefficients
  • positive definite operators
  • exact subdomain solves
  • two-way overlapping restriction/extension
  • generous overlap, $\delta = O(H)$ (otherwise the 2-level
    result degrades to $O(1 + H/\delta)$)
  • Extensible to
  • nonself-adjointness (e.g., convection) and jumping
    coefficients
  • indefiniteness (e.g., wave Helmholtz)
  • inexact subdomain solves
  • one-way overlap communication (restricted
    additive Schwarz)
  • small overlap

32
Schur Preconditioning
  • Given a partition of the unknowns into interior and
    interface sets
  • Condense the system onto the interface unknowns
    (the Schur complement S)
  • Let M be a good preconditioner for S
  • Then the block factorization built around M is a
    preconditioner for A (sketched below)
  • Moreover, solves with the interior blocks may be done
    approximately if all degrees of freedom are
    retained
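(A minimal sketch in standard notation, with interior unknowns $x_I$ and interface unknowns $x_\Gamma$; the slide's own symbols are not preserved:)

\[
A = \begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix},
\qquad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma},
\]

and with $M \approx S$ a corresponding preconditioner for the full matrix is

\[
B = \begin{pmatrix} I & 0 \\ A_{\Gamma I} A_{II}^{-1} & I \end{pmatrix}
    \begin{pmatrix} A_{II} & A_{I\Gamma} \\ 0 & M \end{pmatrix} .
\]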

33
Schwarz polynomials
  • Polynomials of Schwarz projections that are
    combinations of additive and multiplicative may
    be appropriate for certain implementations
  • We may solve the fine subdomains concurrently and
    follow with a coarse grid (redundantly/cooperatively)

34
Schwarz-on-Schur
  • Preconditioning the Schur complement is complex
    in and of itself; Schwarz can be used on the
    reduced problem
  • Neumann-Neumann algorithm
  • Multigrid on the Schur complement

35
Schwarz-inside-Schur
  • Consider Newton's method for solving the
    nonlinear rootfinding problem derived from the
    necessary conditions for constrained
    optimization
  • Constraint
  • Objective
  • Lagrangian
  • Form the gradient of the Lagrangian with respect
    to each of x, u, and λ (see the sketch after this list)
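(The slide's formulas are not preserved; in standard notation, with states $x$, design variables $u$, constraint $c(x,u) = 0$, and objective $\phi(x,u)$:)

\[
\mathcal{L}(x,u,\lambda) = \phi(x,u) + \lambda^T c(x,u),
\qquad
\nabla_x \mathcal{L} = 0,\quad \nabla_u \mathcal{L} = 0,\quad \nabla_\lambda \mathcal{L} = c(x,u) = 0 .
\]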

36
Schwarz-inside-Schur
  • Equality constrained optimization leads to the
    KKT system for states x, designs u, and
    multipliers λ
  • Then Newton's method applied to these conditions
    gives, at each iteration, a KKT linear system (sketched below)
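(A minimal sketch of the KKT saddle-point Newton system in common notation, with Hessian blocks $W_{**}$ of the Lagrangian and constraint Jacobians $J_x = \partial c/\partial x$, $J_u = \partial c/\partial u$; the slide's own symbols are not preserved:)

\[
\begin{pmatrix}
W_{xx} & W_{xu} & J_x^T \\
W_{ux} & W_{uu} & J_u^T \\
J_x    & J_u    & 0
\end{pmatrix}
\begin{pmatrix} \delta x \\ \delta u \\ \delta \lambda \end{pmatrix}
= -
\begin{pmatrix} \nabla_x \mathcal{L} \\ \nabla_u \mathcal{L} \\ c \end{pmatrix} .
\]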

37
Schwarz-inside-Schur, cont.
  • Problems
  • The constraint Jacobian is the Jacobian of a PDE, hence huge!
  • The remaining blocks involve Hessians of the objective and
    constraints, hence second derivatives, and are also huge
  • H is unreasonable to form, store, or invert
  • Solutions
  • Use Schur preconditioning on the full system
  • Form the forward action of the Hessians by automatic
    differentiation (vector-to-vector map)
  • Form the approximate inverse action of the state Jacobian
    and its transpose by Schwarz

38
Example of PDE-constrained Optimization
Lagrange-Newton-Krylov-Schur implemented in
Veltisto/PETSc
  • Optimal control of laminar viscous flow
  • optimization variables are surface
    suction/injection
  • objective is minimum drag
  • 700,000 states; 4,000 controls
  • 128 Cray T3E processors
  • 5 hrs for optimal solution (1 hr for analysis)

c/o G. Biros and O. Ghattas
www.cs.nyu.edu/biros/veltisto/
39
Nonlinear Schwarz preconditioning
  • Nonlinear Schwarz has Newton both inside and
    outside and is fundamentally Jacobian-free
  • It replaces F(u) = 0 with a new nonlinear
    system possessing the same root
  • Define a correction for each partition (e.g.,
    subdomain) of the solution vector by solving a
    local nonlinear system
  • where the correction is nonzero only in the
    components of that partition
  • Then sum the corrections (a sketch follows)
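(A minimal sketch of the construction, following Cai & Keyes; the notation here is generic, not the slide's. For each partition $i$ with restriction $R_i$, define the correction $T_i(u)$, supported only on partition $i$, by the local nonlinear system)

\[
R_i\, F\bigl(u - T_i(u)\bigr) = 0 ,
\]

(and replace $F(u) = 0$ by the nonlinearly preconditioned system)

\[
\mathcal{F}(u) \equiv \sum_i T_i(u) = 0 .
\]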

40
Nonlinear Schwarz, cont.
  • It is simple to prove that if the Jacobian of
    F(u) is nonsingular in a neighborhood of the
    desired root, then the original and preconditioned
    systems have the same unique root
  • To lead to a Jacobian-free Newton-Krylov
    algorithm we need to be able to evaluate, for any u and v:
  • The residual of the preconditioned system
  • The Jacobian-vector product of the preconditioned system
  • Remarkably (Cai-Keyes, 2000), it can be shown
    that this Jacobian-vector product can be built from
    subdomain Jacobian solves and actions of the true Jacobian
  • All required actions are available in terms of
    F(u)!

41
Example of nonlinear Schwarz
42
Unreasonable effectiveness of Schwarz
  • When does the sum of partial inverses equal the
    inverse of the sums? When the decomposition is
    right!

Let the rows of the restriction operators form a complete set of orthonormal
eigenvectors of A; then the sum of the subdomain inverses reproduces the global
inverse exactly (see the sketch below)
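(A minimal sketch of the statement in generic notation: if $r_i$ are orthonormal eigenvectors of $A$ with eigenvalues $a_i$, take $R_i = r_i^T$; then)

\[
A = \sum_i r_i\, a_i\, r_i^T
\quad\text{and}\quad
\sum_i R_i^T \bigl(R_i A R_i^T\bigr)^{-1} R_i
= \sum_i r_i\, a_i^{-1}\, r_i^T = A^{-1},
\]

(so the Schwarz sum of "partial inverses" is exact for this ideal, but impractically expensive, decomposition.)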
  • Good decompositions are a compromise between
    conditioning and parallel complexity, in practice

43
Newton-Krylov-Schwarz: a parallel PDE
workhorse
Popularized in parallel Jacobian-free form under
this name by Cai, Gropp, Keyes & Tidriri (1994);
in PETSc since Balay's MS project at ODU (1995)
Newton: nonlinear solver, asymptotically quadratic
Krylov: accelerator, spectrally adaptive
Schwarz: preconditioner, parallelizable
44
Jacobian-Free Newton-Krylov Method
  • In the Jacobian-Free Newton-Krylov (JFNK) method,
    a Krylov method solves the linear Newton
    correction equation, requiring Jacobian-vector
    products
  • These are approximated by Fréchet derivatives
    (finite differences of F; see the sketch below)
  • so that the actual Jacobian elements are
    never explicitly needed, where ε is chosen
    with a fine balance between approximation and
    floating point rounding error
  • Schwarz preconditions, using approximate
    elements
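A minimal, illustrative sketch of the Jacobian-free matrix-vector product and its use inside a preconditioned Krylov solve (Python/NumPy/SciPy; the function names and the simple nonlinear test problem are assumptions for illustration, not the talk's code):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def F(u):
    """Hypothetical nonlinear residual: 1D Laplacian plus a cubic reaction term."""
    r = -2.0 * u + u**3
    r[1:]  += u[:-1]
    r[:-1] += u[1:]
    return r

def jfnk_step(F, u, M=None, eps_base=1e-7):
    """One Jacobian-free Newton step: solve J(u) du = -F(u) with GMRES,
    approximating J(u) v by a forward finite difference of F."""
    n = u.size
    Fu = F(u)

    def jv(v):
        # Perturbation size balances truncation against rounding error.
        eps = eps_base * (1.0 + np.linalg.norm(u)) / max(np.linalg.norm(v), 1e-30)
        return (F(u + eps * v) - Fu) / eps

    J = LinearOperator((n, n), matvec=jv)      # Jacobian action, never assembled
    du, info = gmres(J, -Fu, M=M)              # M: any convenient preconditioner
    return u + du, info

# Usage: a few Newton iterations from a small random start (no preconditioner here).
u = 0.1 * np.random.default_rng(0).standard_normal(50)
for _ in range(5):
    u, _ = jfnk_step(F, u)
print("||F(u)|| =", np.linalg.norm(F(u)))
```

The preconditioner M passed to GMRES is where the "approximate elements" (e.g., a Schwarz or operator-split preconditioner) would enter, without affecting the true Jacobian action.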

45
Philosophy of Jacobian-free NK
  • To evaluate the linear residual, we use the true
    F(u) , giving a true Newton step and asymptotic
    quadratic Newton convergence
  • To precondition the linear residual, we do
    anything convenient that uses understanding of
    the dominant physics/mathematics in the system
    and respects the limitations of the parallel
    computer architecture and the cost of various
    operations
  • combinations of operator-split Jacobians (for
    reasons of physics or reasons of numerics)
  • Jacobian of related discretization (for fast
    solves)
  • Jacobian of lower-order discretization (for more
    stability, less storage)
  • Jacobian with lagged values for expensive terms
    (for less computation per degree of freedom)
  • Jacobian stored in lower precision (for less
    memory traffic per preconditioning step)
  • Jacobian blocks decomposed for parallelism

46
Philosophy of Jacobian-free NK, cont.
  • These motivations are not new; most large-scale
    application codes also take shortcuts on the
    approximate Jacobian operator to be inverted,
    showing physical intuition
  • The problem with many codes is that they do not
    anywhere have an accurate global Jacobian
    operator; they use only the weak Jacobian
  • This leads to a weakly nonlinearly converging
    defect correction method
  • Defect correction, in contrast to preconditioned
    Newton (see the sketch below)
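(A minimal sketch of the contrast, writing $\tilde{J}$ for the approximate Jacobian actually assembled by the code; notation assumed, not the slide's:)

\[
\text{defect correction:}\quad u \leftarrow u - \tilde{J}^{-1} F(u),
\qquad
\text{preconditioned Newton:}\quad u \leftarrow u - \bigl(\tilde{J}^{-1} J\bigr)^{-1} \tilde{J}^{-1} F(u),
\]

where in the Newton case $\tilde{J}^{-1}$ serves only as a preconditioner for the Krylov solve with the true action of $J$, so the nonlinear convergence rate is not limited by the quality of $\tilde{J}$.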

47
Physics-based Preconditioning
  • Recall the example of the shallow-water wave
    splitting, treated earlier as a solver, leaving a
    first-order in time splitting error
  • In the Jacobian-free Newton-Krylov framework,
    this solver, which maps a residual into a
    correction, can be regarded as a preconditioner
  • The true Jacobian is never formed, yet the
    time-implicit nonlinear residual at each time
    step can be made as small as needed for nonlinear
    consistency in long time integrations

48
Physics-based preconditioning
  • In Newton iteration, one seeks to obtain a
    correction ("delta") to the solution by inverting
    the Jacobian matrix on (the negative of) the
    nonlinear residual
  • A typical operator-split code also derives a
    delta to the solution, by some implicitly
    defined means, through a series of implicit and
    explicit substeps
  • This implicitly defined mapping from residual to
    delta is a natural preconditioner
  • Software must accommodate this!

49
1D Shallow water preconditioning
  • Define the continuity residual for each timestep
  • Define the momentum residual for each timestep
  • Write the continuity equation in delta form
  • Write the momentum equation in delta form
    (a hedged sketch of both follows)
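(The slide's equations are not preserved; a heavily hedged sketch of the standard construction, for 1D shallow water with height $h$, discharge $q = uh$, gravity $g$, and an implicit step $\Delta t$. The delta forms linearize one implicit timestep about the current Newton iterate:)

\[
\delta h + \Delta t\, \partial_x(\delta q) = -r_h ,
\qquad
\delta q + \Delta t\, g\, h\, \partial_x(\delta h) \approx -r_q ,
\]

(where $r_h$ and $r_q$ are the continuity and momentum residuals and the momentum delta form keeps only the stiff gravity-wave term. Eliminating $\delta q$ yields the parabolic problem used on the next slide:)

\[
\delta h - \Delta t^2\, \partial_x\!\bigl(g\, h\, \partial_x(\delta h)\bigr)
= -r_h + \Delta t\, \partial_x(r_q) .
\]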

50
1D Shallow water preconditioning, cont.
  • Solving the momentum delta form for the momentum
    correction and substituting into the continuity
    delta form yields a parabolic equation for the
    height correction
  • After this parabolic equation is solved, the
    momentum correction follows by back-substitution
  • This completes the application of the
    preconditioner to one Newton-Krylov iteration at
    one timestep
  • Of course, the parabolic solve need not be done
    exactly; one sweep of multigrid can be used
  • See the paper by Mousseau et al. (2002), cited in
    Ref. 1, for impressive results for long-time
    weather integration

51
Operator-split preconditioning
  • Subcomponents of a PDE operator often have
    special structure that can be exploited if they
    are treated separately
  • Algebraically, this is just a generalization of
    Schwarz, by term instead of by subdomain
  • Suppose the Jacobian splits as a sum of two terms, S and R,
    and a preconditioner is to be constructed, where
    S and R are each easy to invert
  • Form a preconditioned vector by applying the two
    inverses in sequence (see the sketch below)
  • Equivalent to replacing the sum with a product of
    the two split factors
  • First-order splitting error, yet often used as a
    solver!
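(A hedged sketch of one common concrete form; the slide's exact notation is not preserved. Assume the Jacobian of a backward-Euler step looks like $J = I - \Delta t\,(S + R)$ with $S$ and $R$ each easy to invert. The split preconditioner applies the factors in sequence,)

\[
w = (I - \Delta t\, R)^{-1} v, \qquad z = (I - \Delta t\, S)^{-1} w,
\]

(which is equivalent to replacing $J$ by)

\[
(I - \Delta t\, S)(I - \Delta t\, R)
= I - \Delta t\,(S + R) + \Delta t^2\, S R ,
\]

(i.e., correct up to a first-order-in-time splitting error, which is why the same factorization is so often used as a solver by itself.)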

52
Operator-split preconditioning, cont.
  • Suppose S is convection-diffusion and R is
    reaction, among a collection of fields stored as
    gridfunctions
  • On a small regular 2D grid with a five-point
    stencil
  • R is trivially invertible in block diagonal form
  • S is invertible with one multilevel solve per
    field

(Figure: sparsity patterns of J, S, and R on the small grid.)
53
Operator-split preconditioning, cont.
  • Preconditioners assembled from just the strong
    elements of the Jacobian, alternating the source
    term and the diffusion term operators, are
    competitive in convergence rates with block-ILU
    on the Jacobian
  • particularly since the decoupled scalar
    diffusion systems are amenable to simple
    multigrid treatment, which is not as trivial for the
    coupled system
  • The decoupled preconditioners store many fewer
    elements, significantly reduce memory
    bandwidth requirements, and are expected to be
    much faster per iteration when carefully
    implemented
  • See the alternative block factorization by Bank et
    al., cited in Ref. 1, incorporated into the SciDAC TSI
    solver by D'Azevedo

54
Using Jacobian of related discretization
  • To precondition a variable coefficient operator,
    such as $\nabla\cdot(a(x)\nabla)$, use the analogous
    constant-coefficient operator, based on an average
    of the coefficient
  • Brown & Saad (1980) showed that, because of the
    availability of fast solvers, it may even be
    acceptable to use such a simplified operator to
    precondition a more general one

55
Using Jacobian of lower order discretization
  • Orszag popularized the use of linear finite
    element discretizations as preconditioners for
    high-order spectral element discretizations in
    the 1970s; both approach the same continuous
    operator
  • It is common in CFD to employ first-order
    upwinded convective operators as approximate
    inversions for higher-order operators
  • better factorization stability
  • smaller matrix bandwidth and complexity
  • With Jacobian-free NK, we can have the best of
    both worlds: a stable factorization/cheap solve
    and a true Jacobian step

56
Using Jacobian with lagged terms
  • Newton-chord methods (e.g., papers by Smooke et
    al.) freeze the Jacobian matrices
  • saves Jacobian evaluation and factorization,
    which can be up to 90% of the running time of the
    code in some apps
  • however, nonlinear convergence degrades to a linear
    rate
  • In Jacobian-free NK, we can freeze some or all
    of the terms in the Jacobian preconditioner,
    while always accessing the action of the true
    Jacobian for the Krylov matrix-vector multiply
  • still saves Jacobian work
  • maintains asymptotically quadratic rate for
    nonlinear convergence
  • See Ref. 1 for an example with coupled edge plasma
    and Navier-Stokes, showing a five-fold improvement
    of JFNK with the preconditioner refreshed once
    every ten timesteps over full Newton with a
    constantly refreshed Jacobian on the LHS

57
Using Jacobian with lower precision elements
  • Memory bandwidth is the critical architectural
    parameter for sparse linear algebra computations
  • Storing the preconditioner elements in single
    precision effectively doubles memory bandwidth
    (and potentially halves runtime) for this
    critical phase
  • We still form the Jacobian-vector product with
    full precision and zero-pad the preconditioner
    elements back to full length in the arithmetic
    unit, so the numerical quality of the Krylov
    subspace does not degrade
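A small illustrative sketch of the storage idea (Python/NumPy/SciPy; an assumed toy example, not the talk's implementation): keep the preconditioner's stored elements in float32 to halve their memory traffic, but promote to float64 when applying them, so the Krylov vectors and inner products stay in full precision.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, cg

# Toy SPD system: 1D Laplacian in double precision.
n = 500
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Preconditioner data stored in single precision (half the memory traffic);
# here simply the inverse diagonal, but an ILU factor or multigrid hierarchy
# could be stored the same way.
inv_diag32 = (1.0 / A.diagonal()).astype(np.float32)

def apply_M(r):
    # Promote the stored float32 elements back to float64 in the arithmetic,
    # so the quality of the Krylov subspace is not degraded.
    return inv_diag32.astype(np.float64) * r

M = LinearOperator((n, n), matvec=apply_M)
x, info = cg(A, b, M=M)
print("CG info:", info, " residual:", np.linalg.norm(b - A @ x))
```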

58
Memory BW bottleneck revealed via precision
reduction
Execution times for an unstructured NKS Euler
simulation on the Origin 2000: double-precision
matrices versus a single-precision preconditioner.
Note that times are nearly halved, along with
precision, for the BW-limited linear solve phase,
indicating that the BW can be at least doubled
before hitting the next bottleneck!
59
Computational Aerodynamics
Transonic "lambda" shock: Mach contours on
surfaces
mesh c/o D. Mavriplis, ICASE
Implemented in PETSc: www.mcs.anl.gov/petsc
60
Fixed-size Parallel Scaling Results
This scaling study, featuring our widest range of
processor number, was done for the incompressible
case.
c/o K. Anderson, W. Gropp, D. Kaushik, D. Keyes
and B. Smith
61
  • Lab-university collaborations to develop
    Integrated Software Infrastructure Centers
    (ISICs) and partner with application groups
  • For FY2002, 51 new projects at $57M/year total
  • Approximately one-third for ISICs
  • A third for grid infrastructure and
    collaboratories
  • A third for applications groups
  • 5 Tflop/s IBM SP platforms Seaborg at NERSC (#5
    in the June 2002 Top 500 list) and Cheetah at ORNL
    (recently installed) available for SciDAC

62
Introducing Terascale Optimal PDE Simulations
(TOPS) ISIC
Nine institutions, $17M, five years, 24 co-PIs
63
Scope for TOPS
  • Design and implementation of solvers
  • Time integrators, with sens. analysis
  • Nonlinear solvers, with sens. analysis
  • Optimizers
  • Linear solvers
  • Eigensolvers

(Diagram: dependences among the solver components Optimizer, Sens. Analyzer, Time integrator, Nonlinear solver, Eigensolver, and Linear solver; arrows indicate dependence.)
  • Software integration
  • Performance optimization
64
TOPS Philosophy on PDEs
  • Solution of a system of PDEs is rarely a goal in
    itself
  • PDEs are solved to derive various functionals
    from specified inputs
  • Actual goal is characterization of a response
    surface or a design or control strategy
  • Together with analysis, sensitivities and
    stability are often desired
  • Tools for PDE solution should also support such
    related desires

65
Conclusions
  • Domain decomposition and multilevel iteration are the
    dominant paradigm in contemporary terascale PDE
    simulation
  • Several freely available software toolkits exist,
    and successfully scale to thousands of tightly
    coupled processors for problems on quasi-static
    meshes
  • Concerted efforts are underway to make elements of
    these toolkits interoperate, and to allow
    expression of the best methods, which tend to be
    modular, hierarchical, recursive, and above all
    adaptive!
  • Tunability of NKS algorithmics allows solver
    adaptation to application/architecture combinations
  • Next-generation software should incorporate best
    practices in applications as preconditioners

66
Acknowledgments
  • Collaborators or Contributors
  • Xiao-Chuan Cai (Univ. Colorado, Boulder)
  • Omar Ghattas (Carnegie-Mellon)
  • Dinesh Kaushik (ODU)
  • Dana Knoll (LANL)
  • Dimitri Mavriplis (ICASE)
  • PETSc team at Argonne National Laboratory
  • Satish Balay, Bill Gropp, Lois
    McInnes, Barry Smith
  • Sponsors: DOE, NASA, NSF
  • Computer Resources: LLNL, LANL, SNL, NERSC

67
Related URLs
  • Personal homepage: papers, talks, etc.
  • http://www.math.odu.edu/keyes
  • SciDAC initiative
  • http://www.science.doe.gov/scidac
  • TOPS project
  • http://www.math.odu.edu/keyes/scidac
  • PETSc project
  • http://www.mcs.anl.gov/petsc
  • Hypre project
  • http://www.llnl.gov/CASC/hypre
  • ASCI platforms
  • http://www.llnl.gov/asci/platforms

68
Bibliography
  • Jacobian-Free Newton-Krylov Methods: Approaches and Applications, Knoll & Keyes, 2002, to be submitted to J. Comp. Phys.
  • Nonlinearly Preconditioned Inexact Newton Algorithms, Cai & Keyes, 2002, to appear in SIAM J. Sci. Comp.
  • High Performance Parallel Implicit CFD, Gropp, Kaushik, Keyes & Smith, 2001, Parallel Computing 27:337-362
  • Four Horizons for Enhancing the Performance of Parallel Simulations based on Partial Differential Equations, Keyes, 2000, Lect. Notes Comp. Sci., Springer, 1900:1-17
  • Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel CFD, Gropp, Keyes, McInnes & Tidriri, 2000, Int. J. High Performance Computing Applications 14:102-136
  • Achieving High Sustained Performance in an Unstructured Mesh CFD Application, Anderson, Gropp, Kaushik, Keyes & Smith, 1999, Proceedings of SC'99
  • Prospects for CFD on Petaflops Systems, Keyes, Kaushik & Smith, 1999, in Parallel Solution of Partial Differential Equations, Springer, pp. 247-278
  • How Scalable is Domain Decomposition in Practice?, Keyes, 1998, in Proceedings of the 11th Intl. Conf. on Domain Decomposition Methods, Domain Decomposition Press, pp. 286-297

69
EOF