Massively Parallel Cosmological Simulations with ChaNGa

Slides: 24
Provided by: pritish
Learn more at: http://charm.cs.uiuc.edu

1
Massively Parallel Cosmological Simulations with
ChaNGa
  • Pritish Jetley, Filippo Gioachin, Celso Mendes,
    Laxmikant V. Kale and Thomas Quinn

2
Simulations and Scientific Discovery
  • Help reconcile observation and theory
  • Calculate final states of theories of structure
    formation
  • Direct observational programs
  • What should we look for in space?
  • Help determine underlying structures and masses

3
Computational Challenges
  • N ~ 10^12
  • Direct summation of forces would take 10^10
    Teraflop-years
  • Need efficient, scalable algorithms
  • Large dynamic ranges
  • Need multiple timestepping
  • Irregular domains
  • Balance load across processors
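
To illustrate why direct summation does not scale, the sketch below compares the number of pairwise interactions in a direct sum with an N log N estimate for a tree code. It only counts interactions. The function names are hypothetical, and the tree estimate ignores the constant factor (which depends on the accuracy settings), so it is an order-of-magnitude comparison and not a reproduction of the Teraflop-year figure on the slide.

```python
import math

def direct_pairs(n):
    """Number of distinct particle pairs in a direct O(N^2) force sum."""
    return n * (n - 1) // 2

def tree_interactions(n):
    """Order-of-magnitude interaction count for an O(N log N) tree code.

    The constant factor is deliberately omitted (assumption).
    """
    return n * math.log2(n)

if __name__ == "__main__":
    n = 10**12
    ratio = direct_pairs(n) / tree_interactions(n)
    print(f"direct pairs:      {direct_pairs(n):.3e}")
    print(f"tree interactions: {tree_interactions(n):.3e}")
    print(f"ratio:             {ratio:.3e}")
```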

4
ChaNGa
  • Uses Barnes-Hut algorithm
  • Based on Charm++
  • Processor virtualization
  • Asynchronous message-driven model
  • Computation and communication overlap
  • Intelligent, adaptive runtime system
  • Load balancing
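
As a rough illustration of the Barnes-Hut idea, here is a minimal serial 2D sketch. It is not ChaNGa's implementation: the class and function names, the quadtree layout, the opening criterion (cell size / distance < theta), the softening eps, and units with G = 1 are all assumptions chosen for brevity. Distant cells are replaced by their center of mass; nearby cells are opened and traversed.

```python
class Cell:
    """Square cell of a 2D Barnes-Hut tree: center (cx, cy), half-width half."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0          # number of bodies inside this cell
        self.mass = 0.0         # total mass inside this cell
        self.mx = 0.0           # mass-weighted sum of x positions
        self.my = 0.0           # mass-weighted sum of y positions
        self.body = None        # (x, y, m) while the cell is a single-body leaf
        self.children = None    # four sub-cells once the cell is subdivided

    def _child(self, x, y):
        # Bit 0 selects the +x half, bit 1 selects the +y half.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        # Assumes all body positions are distinct and inside the root cell.
        if self.count > 0:
            if self.children is None:
                h = self.half / 2
                self.children = [Cell(self.cx + sx * h, self.cy + sy * h, h)
                                 for sy in (-1, 1) for sx in (-1, 1)]
                bx, by, bm = self.body
                self.body = None
                self._child(bx, by).insert(bx, by, bm)
            self._child(x, y).insert(x, y, m)
        else:
            self.body = (x, y, m)
        self.count += 1
        self.mass += m
        self.mx += m * x
        self.my += m * y


def build_tree(bodies, cx=0.5, cy=0.5, half=0.5):
    """Build a tree over bodies given as (x, y, m) tuples in the unit square."""
    root = Cell(cx, cy, half)
    for x, y, m in bodies:
        root.insert(x, y, m)
    return root


def tree_accel(cell, x, y, theta=0.5, eps=1e-3):
    """Softened acceleration at (x, y) from all bodies under cell (G = 1)."""
    if cell.count == 0:
        return 0.0, 0.0
    leaf = cell.children is None
    if leaf:
        px, py, m = cell.body
    else:
        px, py, m = cell.mx / cell.mass, cell.my / cell.mass, cell.mass
    dx, dy = px - x, py - y
    r2 = dx * dx + dy * dy
    # Accept a leaf, or a cell that looks small from the evaluation point.
    if leaf or (2 * cell.half) ** 2 < theta * theta * r2:
        f = m / (r2 + eps * eps) ** 1.5
        return f * dx, f * dy   # a body's own term has dx = dy = 0
    ax = ay = 0.0
    for child in cell.children:
        cax, cay = tree_accel(child, x, y, theta, eps)
        ax += cax
        ay += cay
    return ax, ay


def direct_accel(bodies, x, y, eps=1e-3):
    """Reference O(N) sum over every body, with the same softening."""
    ax = ay = 0.0
    for bx, by, m in bodies:
        dx, dy = bx - x, by - y
        f = m / (dx * dx + dy * dy + eps * eps) ** 1.5
        ax += f * dx
        ay += f * dy
    return ax, ay
```

With theta = 0 no internal cell is ever accepted, so the traversal visits every leaf and matches the direct sum; larger theta values trade accuracy for fewer interactions.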

8
Major Optimizations
  • Pipelined computation
  • Prefetch tree chunk before starting traversal
  • Tree-in-Cache
  • Aggregate trees from all chares on processor
  • Tunable computation granularity
  • Response time for data requests vs. scheduling
    overhead

9
Experimental Setup
  • dwarf: 5 and 50 million particles
  • lambs: 3 million particles
  • drgas: 700 million particles
  • hrwh_LCDMs: 16 million particles
10
Experimental Setup (contd.)
  • Platforms

11
Parallel Performance
A comparison of parallel performance with
PKDGRAV ('Dwarf' dataset on Tungsten).
12
Scaling Tests
IBM BG/L
Cray XT3
Poor scaling
13
Towards Greater Scalability
  • Load Imbalance causes poor scaling
  • Static balancing not good enough
  • Even number of particles does not imply even work
    distribution
  • Must balance both computation and communication
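
The point that equal particle counts do not give equal work can be sketched with a generic greedy heuristic: assign the heaviest pieces first, each to the currently least-loaded processor. This is an illustrative stand-in, not the load balancer used in the talk; the function names and the per-piece loads below are hypothetical.

```python
import heapq

def block_assign(num_pieces, num_procs):
    """Static assignment: contiguous blocks with equal piece counts."""
    return {i: i * num_procs // num_pieces for i in range(num_pieces)}

def greedy_assign(loads, num_procs):
    """Greedy assignment using measured loads, heaviest piece first."""
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {}
    for piece, load in sorted(enumerate(loads), key=lambda t: -t[1]):
        total, proc = heapq.heappop(heap)      # least-loaded processor
        assignment[piece] = proc
        heapq.heappush(heap, (total + load, proc))
    return assignment

def imbalance(loads, assignment, num_procs):
    """Maximum processor load divided by the average load (1.0 is ideal)."""
    totals = [0.0] * num_procs
    for piece, proc in assignment.items():
        totals[proc] += loads[piece]
    return max(totals) / (sum(totals) / num_procs)

if __name__ == "__main__":
    # Hypothetical measured work for 8 pieces holding equal particle counts.
    loads = [8, 7, 6, 5, 1, 1, 1, 1]
    print(imbalance(loads, block_assign(len(loads), 2), 2))
    print(imbalance(loads, greedy_assign(loads, 2), 2))
```

In this made-up example the static block assignment places all the heavy pieces on one processor, while the load-aware assignment evens out the totals. The sketch balances computation only; communication cost is not modeled.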

16
Results with OrbRefineLB
  • Different datasets
  • OrbRefineLB

19
Balancing Load in MS Runs
  • Different strategies for different phases
  • Multiphase instrumentation
  • Model-based load estimation (first few small
    steps)
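
To see why multistepped (MS) runs have phases with very different loads, consider a power-of-two timestep hierarchy, a common scheme that is assumed here and not taken from the slides. Particles on rung r take steps of dt_max / 2^r, so only some rungs are active at each substep. The function names and rung populations are hypothetical.

```python
def active_rungs(substep, max_rung):
    """Rungs whose particles need a force evaluation at this substep.

    Rung r is active every 2**(max_rung - r) substeps; rung 0 is the slowest.
    """
    return [r for r in range(max_rung + 1)
            if substep % (1 << (max_rung - r)) == 0]

def work_per_substep(rung_counts):
    """Active-particle count at each substep of one large step."""
    max_rung = len(rung_counts) - 1
    return [sum(rung_counts[r] for r in active_rungs(s, max_rung))
            for s in range(1 << max_rung)]

if __name__ == "__main__":
    rung_counts = [1000, 100, 10]      # hypothetical particles per rung
    work = work_per_substep(rung_counts)
    print(work)                        # active particles per substep
    print(sum(work))                   # multistepped total
    print(len(work) * sum(rung_counts))  # single-stepped at the smallest step
```

Most substeps touch only a handful of fast particles, and a few substeps touch everything. A distribution of work that is balanced for one kind of substep can be badly unbalanced for another, which is the motivation for per-phase strategies.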

20
Preliminary Results
  • Dwarf dataset
  • 32 BG/L processors
  • Different timestepping schemes
  • Singlestepped: 613 s
  • Multistepped: 429 s
  • Multistepped with load balancing: 228 s
21
Preliminary Results
  • 50% reduction in execution time
  • Lambb dataset
  • 512 and 1024 BG/L processors
  • Singlestepped vs load-balanced multistepped
  • Multistepping and overdecomposition
  • Lambb dataset
  • 1024 BG/L processors
  • Varying num. TreePieces

More TreePieces => greater load balance
22
Future Work
  • SPH
  • Alternative decomposition schemes
  • Runtime optimizations to reduce communication
    cost
  • More sophisticated load balancing algorithms
  • Account for
  • Complete simulation space topology
  • Processor topology (reduce hop-bytes)
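
Hop-bytes is commonly defined as the sum, over all messages, of message size times the number of network hops travelled. The sketch below evaluates it for a 3D torus; the function names, the placements, and the message list are hypothetical and serve only to show how a topology-aware placement lowers the metric.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus."""
    return sum(min(abs(x - y), d - abs(x - y))
               for x, y, d in zip(a, b, dims))

def hop_bytes(messages, placement, dims):
    """Sum of bytes * hops over (src, dst, nbytes) messages."""
    return sum(nbytes * torus_hops(placement[src], placement[dst], dims)
               for src, dst, nbytes in messages)

if __name__ == "__main__":
    dims = (4, 4, 4)
    messages = [("A", "B", 1000), ("B", "C", 10)]
    far = {"A": (0, 0, 0), "B": (2, 2, 2), "C": (0, 0, 1)}
    near = {"A": (0, 0, 0), "B": (1, 0, 0), "C": (3, 0, 0)}
    print(hop_bytes(messages, far, dims))
    print(hop_bytes(messages, near, dims))
```

Placing the heavily communicating pair A and B on neighboring nodes reduces the total even though the lightly communicating pair ends up farther apart.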

23
Conclusions
  • Introduced ChaNGa
  • Optimizations to reduce simulation time
  • Load imbalance issues tackled
  • Multiple timestepping beneficial
  • Balancing load in multistepped simulations