Title: Large scale simulations of astrophysical turbulence
1. Large scale simulations of astrophysical turbulence
- Axel Brandenburg (Nordita, Copenhagen)
- Wolfgang Dobler (Univ. Calgary)
- Anders Johansen (MPIA, Heidelberg)
- Antony Mee (Univ. Newcastle)
- Nils Haugen (NTNU, Trondheim)
- etc.
(...just google for Pencil Code)
2. Pencil Code
- Started in Sept. 2001 with Wolfgang Dobler
- High order (6th order in space, 3rd order in time)
- Cache memory efficient
- MPI; can also run with PACX-MPI (across countries!)
- Maintained/developed by many people (CVS!)
- Automatic validation (over night or any time)
- Max resolution so far $1024^3$, 256 procs
3. Pencil formulation
- In CRAY days: worked with full chunks f(nx,ny,nz,nvar)
- Now, on SGI: nearly 100% cache misses
- Instead work with f(nx,nvar), i.e. one nx-pencil
- No cache misses, negligible work space, just 2N
- Can keep all components of derivative tensors
- Communication before sub-timestep
- Then evaluate all derivatives, e.g. call curl(f,iA,B)
- Vector potential A = f(:,:,:,iAx:iAz), B = B(nx,3)
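The pencil idea above can be sketched in a few lines of Python. This is only an illustration, not the Pencil Code's Fortran: the function names are hypothetical, the grid is assumed periodic, and a 2nd-order Laplacian stands in for the 6th-order derivatives to keep the example short. The point is that the work array has length nx, not nx*ny*nz.

```python
import math

def make_field(nx, ny, nz):
    """Test field sin(x) + sin(y) + sin(z) on a periodic [0, 2*pi)^3 grid, stored as f[iz][iy][ix]."""
    dx, dy, dz = (2 * math.pi / n for n in (nx, ny, nz))
    return [[[math.sin(ix * dx) + math.sin(iy * dy) + math.sin(iz * dz)
              for ix in range(nx)] for iy in range(ny)] for iz in range(nz)]

def max_abs_laplacian_by_pencils(f, dx, dy, dz):
    """Visit the field one x-pencil at a time and return max |Laplacian(f)|."""
    nz, ny, nx = len(f), len(f[0]), len(f[0][0])
    result = 0.0
    for iz in range(nz):
        for iy in range(ny):
            pencil = f[iz][iy]
            # Neighbouring pencils in y and z (periodic wrap-around).
            y_lo, y_hi = f[iz][(iy - 1) % ny], f[iz][(iy + 1) % ny]
            z_lo, z_hi = f[(iz - 1) % nz][iy], f[(iz + 1) % nz][iy]
            lap = [0.0] * nx  # the only work space: one pencil of length nx
            for ix in range(nx):
                c = pencil[ix]
                lap[ix] = ((pencil[(ix - 1) % nx] - 2 * c + pencil[(ix + 1) % nx]) / dx ** 2
                           + (y_lo[ix] - 2 * c + y_hi[ix]) / dy ** 2
                           + (z_lo[ix] - 2 * c + z_hi[ix]) / dz ** 2)
            result = max(result, max(abs(v) for v in lap))
    return result
```

For the test field the exact answer is 3; on a 16^3 grid the 2nd-order stencil used here gets within a few percent of it.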
4. Switch modules
- magnetic or nomagnetic (e.g. just hydro)
- hydro or nohydro (e.g. kinematic dynamo)
- density or nodensity (burgulence)
- entropy or noentropy (e.g. isothermal)
- radiation or noradiation (solar convection, discs)
- dustvelocity or nodustvelocity (planetesimals)
- Coagulation, reaction equations
- Homochirality (reaction-diffusion-advection equations)
Other physics modules: MHD, radiation, partial ionization, chemical reactions, self-gravity
5. Pencil Code check-ins
6. High-order schemes
- Alternative to spectral or compact schemes
- Efficiently parallelized, no transpose necessary
- No restriction on boundary conditions
- Curvilinear coordinates possible (except for singularities)
- 6th order central differences in space
- Non-conservative scheme
- Allows use of logarithmic density and entropy
- Copes well with strong stratification and
temperature contrasts
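A minimal sketch of a 6th-order central first derivative, assuming a uniform periodic grid. The stencil weights are the standard ones for this order; the function names are hypothetical and this is not the Pencil Code's own routine.

```python
import math

# Standard 6th-order central-difference weights for d/dx,
# for offsets -3..+3 (to be divided by dx).
COEFFS = (-1 / 60, 3 / 20, -3 / 4, 0.0, 3 / 4, -3 / 20, 1 / 60)

def deriv6_periodic(f, dx):
    """First derivative of the periodic sequence f with grid spacing dx."""
    n = len(f)
    return [sum(c * f[(i + k - 3) % n] for k, c in enumerate(COEFFS)) / dx
            for i in range(n)]

def max_error(n):
    """Max error of the stencil applied to sin(x) on n points over [0, 2*pi)."""
    dx = 2 * math.pi / n
    f = [math.sin(i * dx) for i in range(n)]
    df = deriv6_periodic(f, dx)
    return max(abs(df[i] - math.cos(i * dx)) for i in range(n))
```

Halving the grid spacing should reduce `max_error` by roughly 2^6 = 64, which is a quick way to confirm the order of the stencil.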
7. (i) High-order spatial schemes
Main advantage: low phase errors
8. Wavenumber characteristics
9. Higher order: less viscosity
10. Less viscosity also in shocks
11. (ii) High-order temporal schemes
Main advantage: low amplitude errors
2N-RK3 scheme (Williamson 1980)
Variants shown: 2nd order, 3rd order, 1st order
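A sketch of a 3rd-order 2N-storage Runge-Kutta step. The coefficients below are the ones commonly quoted for Williamson's scheme and are an assumption here, since the slide's formulas did not survive. The right-hand side is taken to be autonomous for brevity, and the function names are hypothetical.

```python
import math

# Commonly quoted low-storage coefficients (assumed, not taken from the slide).
ALPHA = (0.0, -5.0 / 9.0, -153.0 / 128.0)
BETA = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)

def rk3_2n_step(u, rhs, dt):
    """Advance u by dt using only u and one scratch register w (hence '2N')."""
    w = 0.0
    for a, b in zip(ALPHA, BETA):
        w = a * w + dt * rhs(u)  # overwrite the scratch register
        u = u + b * w            # update the solution in place
    return u

def integrate(u0, rhs, dt, nsteps):
    """Take nsteps steps of size dt starting from u0."""
    u = u0
    for _ in range(nsteps):
        u = rk3_2n_step(u, rhs, dt)
    return u

def decay_error(nsteps):
    """Error at t = 1 for du/dt = -u, u(0) = 1."""
    return abs(integrate(1.0, lambda u: -u, 1.0 / nsteps, nsteps) - math.exp(-1.0))
```

Halving the time step should cut `decay_error` by about a factor of 8, as expected for a 3rd-order scheme.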
12. Shock tube test
13. Hyperviscous, Smagorinsky, normal
- Height of bottleneck increased
- Onset of bottleneck at same position
- Inertial range unaffected by artificial diffusion
Haugen & Brandenburg (PRE, astro-ph/0402301)
14. 256 processor run at $1024^3$
15. MHD equations
Magnetic vector potential
Induction equation
Momentum and continuity equations
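16. Vector potential
- $\mathbf{B} = \nabla\times\mathbf{A}$; advantage: $\nabla\cdot\mathbf{B} = 0$
- $\mathbf{J} = \nabla\times\mathbf{B} = \nabla\times(\nabla\times\mathbf{A}) \equiv \mathrm{curl}^2\,\mathbf{A}$
- Not a disadvantage: consider Alfvén waves
B-formulation
A-formulation
2nd derivative once is better than 1st derivative twice!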
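One way to see why a second derivative taken once beats a first derivative taken twice is to compare effective wavenumbers. The sketch below assumes the standard 6th-order central stencils for the first and second derivative on a uniform grid; the function names are hypothetical. With theta = k*dx, the first-derivative stencil gives zero at the Nyquist wavenumber (theta = pi), so applying it twice leaves the shortest waves untouched, while the second-derivative stencil still acts on them.

```python
import math

def keff_first(theta):
    """Effective k*dx of the 6th-order central first derivative at theta = k*dx."""
    return 1.5 * math.sin(theta) - 0.3 * math.sin(2 * theta) + math.sin(3 * theta) / 30

def keff2_second(theta):
    """Effective (k*dx)^2 of the 6th-order central second derivative."""
    return 49 / 18 - 3 * math.cos(theta) + 0.3 * math.cos(2 * theta) - math.cos(3 * theta) / 45

def relative_k2(theta):
    """Return (first derivative twice, second derivative once), each divided by the exact theta^2."""
    return keff_first(theta) ** 2 / theta ** 2, keff2_second(theta) / theta ** 2
```

At small theta both entries are close to 1. At theta = pi the first is essentially zero while the second stays near 0.6, so the A-formulation retains a restoring term for grid-scale Alfvén waves.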
17. Comparison of A and B methods
18. Wallclock time versus number of processors
- Nearly linear scaling
- 100 Mb/s shows limitations
- 1 - 10 Gb/s: no limitation
19. Sensitivity to layout on Linux clusters
Gigabit uplink
100 Mbit link only
- Layout: yproc x zproc
- 4 x 32 → 1 (speed)
- 8 x 16 → 3 times slower
- 16 x 8 → 17 times slower
24 procs per hub
20. Why this sensitivity to layout?
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
6 7 8 9 0 1 2 3 4
All processors need to communicate with processors outside the group of 24
21. Use exactly 4 columns
Only 2 x 4 = 8 processors need to communicate outside the group of 24 → optimal use of speed ratio between 100 Mb ethernet switch and 1 Gb uplink
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
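The counting argument for the layouts can be checked with a short script. It makes several assumptions: ranks are numbered with the y index running fastest, consecutive ranks sit on the same 24-port hub, the domain is periodic, and each processor talks only to its four nearest neighbours in y and z. The function name is hypothetical.

```python
def procs_talking_outside(nprocy, nprocz, hub_size=24, hub=0):
    """Count processors on one hub that have a y- or z-neighbour on another hub."""
    count = 0
    first = hub * hub_size
    last = min((hub + 1) * hub_size, nprocy * nprocz)
    for p in range(first, last):
        iy, iz = p % nprocy, p // nprocy
        neighbours = [
            ((iy - 1) % nprocy) + nprocy * iz,    # left neighbour in y
            ((iy + 1) % nprocy) + nprocy * iz,    # right neighbour in y
            iy + nprocy * ((iz - 1) % nprocz),    # lower neighbour in z
            iy + nprocy * ((iz + 1) % nprocz),    # upper neighbour in z
        ]
        if any(q // hub_size != hub for q in neighbours):
            count += 1
    return count
```

Under these assumptions the 4 x 32 layout has 8 of 24 processors crossing the uplink, 8 x 16 has 16, and 16 x 8 has all 24, which matches the ordering of the measured slowdowns.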
22. Fragmentation over many switches
23. Pre-processed data for animations
24. Ma = 10 supersonic turbulence
25. Animation of B vectors
26. Animation of energy spectra
Very long run at $512^3$ resolution
27. MRI turbulence
MRI = magnetorotational instability
- $256^3$ w/o hypervisc.: t = 600 = 20 orbits
- $512^3$ w/o hypervisc.: Δt = 60 = 2 orbits
28. Fully convective star
29. Geodynamo simulation
30. Homochirality: competition of left/right
Reaction-diffusion equation
31. Conclusions
- Subgrid scale modeling can be unsafe (some problems)
- Shallower spectra, longer time scales, different saturation amplitudes (in helical dynamos)
- High order schemes
- Low phase and amplitude errors
- Need less viscosity
- 100 Mb link close to bandwidth limit
- Comparable to and now faster than Origin
- 2x faster with Gb switch
- 100 Mb switches with Gb uplink: +/- optimal
32. Transfer equation & parallelization
Processors
Analytic solution
33. The Transfer Equation & Parallelization
Processors
34. The Transfer Equation & Parallelization
Processors
35. Current implementation
- Plasma composed of H and He
- Only hydrogen ionization
- Only H- opacity, calculated analytically
- No need for look-up tables
- Ray directions determined by grid geometry
- No interpolation is needed
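To make the ray picture concrete, here is a textbook formal solution of the transfer equation dI/dtau = S - I along a single grid-aligned ray. It assumes the source function S is constant within each cell, which is a simplification chosen for this sketch and not necessarily the scheme used in the code described on these slides. The function name and arguments are hypothetical.

```python
import math

def integrate_ray(source, dtau, i_start=0.0):
    """Intensity after each cell along one ray, for per-cell source values and optical depths."""
    intensity = i_start
    out = []
    for s, dt in zip(source, dtau):
        att = math.exp(-dt)  # attenuation across this cell
        # Incoming intensity is attenuated; the cell adds its own emission.
        intensity = intensity * att + s * (1.0 - att)
        out.append(intensity)
    return out
```

Because the ray runs from grid point to grid point, the source values can be read directly from the mesh. In the optically thick limit the intensity relaxes to the local source function, and a cell with zero optical depth leaves it unchanged.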
36. Convection with radiation