Cosmic Microwave Background Data Analysis : From Time-Ordered Data To Power Spectra - PowerPoint PPT Presentation
1
Cosmic Microwave Background Data Analysis From
Time-Ordered Data To Power Spectra
  • Julian Borrill
  • Computational Research Division, Berkeley Lab
  • Space Sciences Laboratory, UC Berkeley

2
CMB Science - I
  • The CMB is a snapshot of the Universe when it
    first became neutral 400,000 years after the Big
    Bang.

Cosmic - filling all of space. Microwave -
redshifted by the expansion of the Universe from
3000K to 3K. Background - primordial photons
coming from behind all astrophysical sources.
3
CMB Science - II
  • The CMB is a unique probe of the very early
    Universe.
  • Its tiny (1 part in 10^5 - 10^8) fluctuations
    carry information about
  • - the fundamental parameters of cosmology
  • - ultra-high energy physics beyond the Standard
    Model

4
CMB Science - III
  • The new frontier in CMB research is polarization
  • consistency check of temperature results
  • re-ionization history of the Universe
  • gravity wave production during inflation
  • But polarization fluctuations are up to 3 orders
    of magnitude fainter than temperature (we think),
    requiring
  • many more detectors
  • much longer observation times
  • very careful analysis of very large datasets

5
The Planck Satellite
  • A joint ESA/NASA mission due to launch in fall
    2007.
  • An 18 month all-sky survey at 9 microwave
    frequencies from 30 to 857 GHz.
  • O(10^12) observations, O(10^8) sky pixels,
    O(10^4) spectral multipoles.

6
Overview (I)
CMB analysis moves from the time domain
(observations), to the pixel domain (maps), to the
multipole domain (power spectra), calculating the
compressed data and their reduced error bars at
each step.
7
Overview (II)
  • CMB data analysis typically proceeds in 4 steps
  • - Pre-processing (deglitching, pointing,
    calibrating).
  • - Estimating the time-domain noise statistics.
  • - Estimating the map (and its errors).
  • - Estimating the power spectra (and their
    errors).
  • iterating & looping as we learn more about the
    data.
  • Then we can ask about the likelihoods of the
  • parameters of any particular class of cosmologies.

8
Goals
  • To cover the
  • basic mathematical formalism
  • algorithms & their scaling behaviours
  • example implementation issues
  • for map-making and power spectrum estimation.
  • To consider how to extract the maximum amount of
    information from the data, subject to practical
    computational constraints.
  • To illustrate some of the computational issues
    faced when analyzing very large datasets (eg.
    Planck).

9
Data Compression
  • CMB data analysis is an exercise in data
    compression
  • 1. Time-ordered data:
  •    samples ≈ detectors x sampling rate x duration
  •    ≈ 70 x 200 Hz x 18 months for Planck
  • 2. (HEALPixelized) Maps:
  •    pixels ≈ components x sky fraction x 12 nside^2
  •    ≈ (3 - 6) x 1 x 12 x 4096^2 for Planck
  • 3. Power Spectra:
  •    bins ≈ spectra x multipoles / bin resolution
  •    ≈ 6 x (3 x 10^3) / 1 for Planck

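The compression arithmetic above can be checked in a few lines of Python; the detector count, sampling rate, and 30-day month used here are the slide's round numbers, not mission specifications:

```python
# Rough data-compression arithmetic for the Planck-scale numbers above.
# All figures are order-of-magnitude estimates, not mission specs.

def n_samples(n_detectors, rate_hz, months):
    """Time-ordered data: detectors x sampling rate x duration."""
    seconds = months * 30 * 24 * 3600  # approximate month length
    return n_detectors * rate_hz * seconds

def n_pixels(n_components, sky_fraction, nside):
    """HEALPix maps: components x sky fraction x 12 nside^2."""
    return int(n_components * sky_fraction * 12 * nside**2)

samples = n_samples(70, 200, 18)   # ~6.5e11, i.e. O(10^12) samples
pixels = n_pixels(3, 1.0, 4096)    # ~6e8 pixels for 3 Stokes components
print(f"{samples:.1e} samples -> {pixels:.1e} pixels")
```

So the time-to-pixel step alone compresses the data by roughly three orders of magnitude.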
10
Data Parameters
Description                Planck
Number of samples          5 x 10^11
Noise bandwidth            O(10^4)
Number of pixels           6 x 10^8
Number of spectra          6
Maximum multipole          3 x 10^3
Number of spectral bins    2 x 10^4
Number of iterations       -
Number of realizations     -
11
Computational Constraints
  • A 1 GHz processor running at 100% efficiency for 1
    day performs O(10^14) operations.
  • 1 Gbyte of memory can hold an O(10^8)-element
    vector, or an O(10^4 x 10^4) matrix, in 64-bit
    precision.
  • Parallel (multiprocessor) computing increases the
    operation count and memory limits.
  • Challenges to computational efficiency & scaling:
  • - load balancing (work & memory)
  • - data-delivery, including communication & I/O

13
Map-Making - Formalism (I)
  • Consider data consisting of noise and sky-signal:
  •    dt = nt + Atp sp
  • where the pointing matrix A encodes the weight of
    each pixel p in each sample t - for a total power
    temperature observation, Atp = 1 if sample t falls
    in pixel p, and 0 otherwise -
  • and the sky-signal sp is both beam- & pixel-
    smoothed.

14
Map-Making - Formalism (II)
  • Assume Gaussian noise with probability
    distribution
  •    P(n) ∝ exp( - nt (N^-1)tt' nt' / 2 )
  • and a time-time noise correlation matrix
  •    Ntt' = < nt nt' >
  • whose inverse is (piecewise) stationary &
    band-limited:
  •    (N^-1)tt' = f(|t - t'|), with f(Δt) = 0 for
       Δt > λ

15
Aside Noise Estimation
  • To make the map we need the inverse time-time
    noise correlations.
  • Approximate the noise correlations from the noise
    power spectrum,
  • which requires the pure noise timestream:
  • i) Assume nt ≈ dt
  • ii) Solve for the map, dp ≈ sp
  • iii) Subtract the map from the data: nt = dt -
    Atp dp
  • iv) Iterate.

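As a toy illustration of the iteration above (not the production pipeline), the following numpy sketch runs the loop for white noise, in which case the minimum-variance map reduces to a per-pixel average of samples; all sizes and the 0.1 noise level are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_p = 10000, 50
pix = rng.integers(0, n_p, size=n_t)        # pointing: pixel hit per sample
sky = rng.normal(size=n_p)                  # toy sky signal sp
d = sky[pix] + 0.1 * rng.normal(size=n_t)   # dt = Atp sp + nt

noise = d.copy()                            # i) assume nt ~ dt
for _ in range(3):                          # iv) iterate
    # ii) solve for the map; with white noise the minimum-variance map
    # reduces to a per-pixel average of the samples hitting each pixel
    hits = np.bincount(pix, minlength=n_p)
    m = np.bincount(pix, weights=d, minlength=n_p) / hits
    # iii) subtract the map from the data for a noise-only timestream
    noise = d - m[pix]

print(np.std(noise))   # close to the injected noise rms of 0.1
```

With a real correlated-noise model, each pass would re-estimate the noise power spectrum from `noise` and feed it back into the map solve.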
16
Map-Making - Formalism (III)
  • Writing the noise in terms of the data and signal
    & maximizing its likelihood over all possible
    signals gives the minimum variance map
  •    dp = (A^T N^-1 A)^-1 A^T N^-1 d
  • with pixel-pixel noise correlations
  •    Npp' = (A^T N^-1 A)^-1
  • Taken together, these are a complete description
    of the data.

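A direct numpy transcription of these formulas on a deliberately tiny problem (toy sizes and an assumed AR(1)-style stationary noise correlation) forms the minimum-variance map and its pixel-pixel noise correlations explicitly:

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_p = 500, 8                          # toy sizes, far below Planck scale
A = np.zeros((n_t, n_p))                   # pointing matrix Atp
A[np.arange(n_t), rng.integers(0, n_p, size=n_t)] = 1.0

s = rng.normal(size=n_p)                   # true sky map sp
# stationary correlated toy noise with Ntt' = 0.01 x 0.9^|t-t'|
lags = np.abs(np.subtract.outer(np.arange(n_t), np.arange(n_t)))
N = 0.01 * 0.9**lags
n = rng.multivariate_normal(np.zeros(n_t), N)
d = A @ s + n                              # dt = Atp sp + nt

Ninv = np.linalg.inv(N)
Npp = np.linalg.inv(A.T @ Ninv @ A)        # pixel-pixel noise correlations
m = Npp @ (A.T @ Ninv @ d)                 # minimum-variance map dp
```

The explicit inverses here are exactly the naive operation counts the next slides set out to avoid.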
17
Map-Making - Algorithms (I)
  • We want to solve the system
  •    (A^T N^-1 A) dp = A^T N^-1 d
  • Naively, with dense matrices, building it costs
    O(Nt^2 Np) operations.
  • Eg. (5 x 10^11)^2 x (6 x 10^8) ≈ 2 x 10^32 for
    Planck !
18
Map-Making - Algorithms (II)
  • a) Exploit the structure of the matrices:
  • Pointing matrix is sparse
  • Inverse noise correlation matrix is band-Toeplitz
  • Associated matrix-matrix & -vector multiplication
  • scalings reduced from O(Nt^2) to O(Nt λ).
  • Eg. (5 x 10^11) x 10^4 ≈ 5 x 10^15 for Planck.

19
Map-Making - Algorithms (III)
  • b) Replace explicit matrix inversion with an
    iterative solver (eg. preconditioned conjugate
    gradient) using repeated matrix-vector
    multiplications,
  • reducing the scaling from O(Np^3) to O(Ni Np^2).
  • Ni depends on the required solution accuracy
    and the quality of the preconditioner (white
    noise works well).
  • Eg. 30 x (6 x 10^8)^2 ≈ 10^19 for Planck.

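A minimal preconditioned conjugate gradient solver, sketched for a generic symmetric positive-definite system with a diagonal (white-noise-like) preconditioner; the matrix here is random and illustrative, not a CMB system:

```python
import numpy as np

def pcg(apply_A, b, precond, tol=1e-10, max_iter=100):
    """Preconditioned CG for A x = b, A symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for i in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, i + 1
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# toy SPD system, solved with a diagonal preconditioner
rng = np.random.default_rng(2)
M = rng.normal(size=(60, 60))
A = M @ M.T + 60 * np.eye(60)
b = rng.normal(size=60)
x, n_iter = pcg(lambda v: A @ v, b, lambda r: r / np.diag(A))
print(n_iter, np.linalg.norm(A @ x - b))
```

Only matrix-vector products of `apply_A` are ever needed, which is what makes the implicit-operator variants on the next slides possible.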
20
Map-Making - Algorithms (IV)
  • c) Leave the inverse pixel-pixel noise matrix
    (A^T N^-1 A) in implicit form.
  • Now each multiplication takes O(Np^2) operations
    in pixel space, or O(Nt log2 λ) operations
    in Fourier space.
  • Eg. 30 x (5 x 10^11) x log2 10^4 ≈ 2 x 10^14 for
    Planck.
  • But this gives no information about the
    pixel-pixel noise correlations (ie. errors).

21
Map-Making - Extensions (I)
  • For polarization data the signal can be written
    in terms of the i, q & u Stokes parameters and
    the angle ψ of the polarizer relative to some
    chosen coordinate frame:
  •    st = ip + qp cos 2ψt + up sin 2ψt
  • where Atp now has 3 non-zero entries per row.
  • We need at least 3 observation-angles to separate
    i, q & u.

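A toy numpy sketch of this system for a single pixel, solving for (i, q, u) by least squares from 12 hypothetical observation angles (the Stokes values and noise level are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
i_p, q_p, u_p = 1.0, 0.3, -0.2             # Stokes parameters of one pixel
psi = rng.uniform(0, np.pi, size=12)       # observation angles (>= 3 needed)
d = i_p + q_p * np.cos(2 * psi) + u_p * np.sin(2 * psi)
d = d + 0.001 * rng.normal(size=psi.size)  # small white noise

# each row of the pointing matrix carries the 3 non-zero entries
# (1, cos 2psi_t, sin 2psi_t) multiplying (i, q, u) for the hit pixel
A = np.stack([np.ones_like(psi), np.cos(2 * psi), np.sin(2 * psi)], axis=1)
iqu, *_ = np.linalg.lstsq(A, d, rcond=None)
print(iqu)   # close to (1.0, 0.3, -0.2)
```

With fewer than 3 distinct angles the 3-column system becomes rank-deficient, which is the degeneracy the white-noise preconditioner test on a later slide checks for.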
22
Map-Making - Extensions (II)
  • If the data includes a sky-asynchronous
    contaminating signal (eg. MAXIMA's
    chopper-synchronous signal), augment the pointing
    matrix with a template for the contaminant and
    solve for its amplitude along with the map.
  • This can be extended to any number of
    contaminants, including relative offsets between
    sub-maps.

23
Map-Making - Extensions (III)
  • If the data includes a sky-synchronous
    contaminating signal then more information is
    needed.
  • Given multi-frequency data with a foreground, the
    pointing matrix can be extended to map both the
    CMB and the foreground component.
  • With detectors at d different frequencies this
    can be extended to (d-1) different foregrounds.

24
Map-Making - Implementation (I)
  • We want to be able to calculate the inverse
    pixel-pixel noise correlation matrix
  •    Npp'^-1 = A^T N^-1 A
  • because it
  • - encodes the error information
  • - is sparse, so can be saved even for huge data
  • - provides a good (block) diagonal
    preconditioner & an i/q/u pixel degeneracy test
    with white noise.

25
Map-Making - Implementation (II)
  • Both the time- & pixel-domain data must be
    distributed.
  • We want to balance simultaneously both the work-
    & the memory-load per processor.
  • A balanced distribution in one basis is unbalanced
    in the other:
  • uniform work/observation but varied work/pixel.
  • unequal numbers of pixels observed/interval.
  • No perfect solution - we have to accept the least
    limiting imbalance (pixel memory).

26
Map-Making - Implementation (III)
  • Pseudo-code:
  • Initialize Npp'^-1 = 0
  • For each time t
  •   For each pixel p observed at time t with weight
      w
  •     For each time t' within λ of t
  •       For each pixel p' observed at time t' with
        weight w'
  •         Accumulate Npp'^-1 += w f(|t - t'|) w'
  • Doesn't work well - pixel-memory access is too
    random.

27
Map-Making - Implementation (IV)
  • Alternative pseudo-code:
  • For each pixel p, find the times at which it is
    observed.
  • For each pixel p
  •   Initialize row p of Npp'^-1
  •   For each time t at which p is observed with
      weight w
  •     For each time t' within λ of t
  •       For each pixel p' observed at time t'
        with weight w'
  •         Accumulate Npp'^-1 += w f(|t - t'|) w'

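The pixel-ordered accumulation can be sketched in numpy for a toy band-limited inverse noise correlation (all sizes invented, unit weights); the result is checked against the dense A^T N^-1 A:

```python
import numpy as np

rng = np.random.default_rng(4)
n_t, n_p, lam = 200, 6, 5                 # samples, pixels, noise bandwidth
pix = rng.integers(0, n_p, size=n_t)      # pointing: pixel hit at each time

# band-limited inverse time-time correlation, (N^-1)tt' = f(|t - t'|)
f = np.zeros(n_t)
f[0] = 1.0
f[1:lam] = -0.1

# for each pixel, the times at which it is observed
times_of = [np.flatnonzero(pix == p) for p in range(n_p)]

Ninv_pp = np.zeros((n_p, n_p))
for p in range(n_p):
    for t in times_of[p]:
        lo, hi = max(0, t - lam + 1), min(n_t, t + lam)
        for t2 in range(lo, hi):          # only times within the band
            Ninv_pp[p, pix[t2]] += f[abs(t - t2)]

# check against the dense computation A^T N^-1 A
A = np.zeros((n_t, n_p))
A[np.arange(n_t), pix] = 1.0
lags = np.abs(np.subtract.outer(np.arange(n_t), np.arange(n_t)))
Ninv_dense = np.where(lags < lam, f[lags], 0.0)
assert np.allclose(Ninv_pp, A.T @ Ninv_dense @ A)
```

The outer loop touches one row of Npp'^-1 at a time, which is what makes the memory access pattern well-behaved compared with the time-ordered version.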
28
Map-Making - Implementation (V)
  • Distribute the time-ordered data equally over the
    processors.
  • Each processor accumulates part of Npp'^-1 in
    sparse form
  • - only over pixels observed in its time
    interval
  • - only over some of the observations of these
    pixels
  • Then all of these partial matrices, each stored
    in its own sparse form
  •    (p, z, (p1, n1), ..., (pz, nz)), ...,
       (p', z', (p1, n1), ..., (pz', nz'))
  • have to be combined.

29
Map-Making - Conclusions
  • The maximum-likelihood map alone can be calculated
    in O(Ni Nt log2 λ) operations - O(10^14) for
    Planck.
  • The sparse inverse pixel-pixel correlation matrix
    can be calculated in O(Nt λ) operations -
    O(10^15) for Planck.
  • A single column of the pixel-pixel correlation
    matrix can be calculated (just like a map) in
    O(Ni Nt log2 λ) operations - O(10^14) for
    Planck.
  • Computational efficiency is only ~5% for
    serial/small data, ~0.5% for parallel/large data.

31
Power Spectrum Estimation
  • From the map-making we have
  • (i) the pixelized data-vector dp containing the
    CMB signal(s) plus any other separated components.
  • (ii) some representation of the pixel-pixel noise
    correlation matrix Npp' - explicit, explicit
    inverse, or implicit inverse.
  • Truncation of Npp' (only) is equivalent to
    marginalization.
32
Power Spectrum - Formalism (I)
  • Given a map of noise & Gaussian CMB signal
  •    dp = sp + np
  • with correlations
  •    Dpp' = < dp dp' > = Spp' + Npp'
  • and - assuming a uniform prior - probability
    distribution
  •    P(d|C) ∝ exp( - d^T D^-1 d / 2 ) / √(det D)

33
Power Spectrum - Formalism (II)
  • Assuming azimuthal symmetry of the CMB, its
    pixel-pixel correlations depend only on the angle
    between the pair:
  •    Spp' = Σl (2l+1)/4π Cl Pl(cos θpp')
  • 3 components → 6 spectra (3 auto + 3 cross).
  • Theory predicts CTB = CEB = 0, but this
    should still be calculated/confirmed (a test of
    systematics).
34
Power Spectrum - Formalism (III)
  • Eg. temperature signal correlations
  •    Spp' = Σl (2l+1)/4π Bl^2 Cl Pl(cos θpp')
  • with beam & pixel window function Bl.
  • For binned multipoles, the spectrum is
    parameterized by bin powers Cb, constant across
    each bin.

35
Power Spectrum - Formalism (IV)
  • No analytic likelihood maximization - use an
    iterative search.
  • Newton-Raphson requires two derivatives wrt the
    bin powers:
  •    dL/dCb = ( d^T D^-1 (dS/dCb) D^-1 d
               - Tr[D^-1 dS/dCb] ) / 2
  •    Fbb' = Tr[ D^-1 (dS/dCb) D^-1 (dS/dCb') ] / 2

36
Power Spectrum - Algorithms (I)
  • For each NR iteration we want to solve the
    system
  •    δCb = Σb' (F^-1)bb' dL/dCb'
  • Naively this scales as O(Nb Np^3) operations per
    iteration and O(Np^2) memory per matrix:
  • Eg. (2 x 10^4) x (6 x 10^8)^3 ≈ 4 x 10^30
    operations/iteration & 8 x (6 x 10^8)^2 ≈ 3 x
    10^18 bytes/matrix for Planck !
37
Power Spectrum - Algorithms (II)
  • a) Exploiting the signal derivative matrices'
    structure
  • - reduces the operation count by up to a factor
    of 10.
  • - increases the memory needed from 2 to 3
    matrices.

38
Power Spectrum - Algorithms (III)
  • b) Using PCG for D^-1 & a trace approximant
    for the dL/dC & d2L/dC2 terms,
  • scaling, eg., as 5 x 10^4 x (2 x 10^4) x 10^3 x
    (5 x 10^11) x 10
  • ≈ 5 x 10^24 for Planck.

39
Power Spectrum - Algorithms (IV)
  • c) Abandoning maximum likelihood altogether !
  • Eg. take the SHT of the noisy, cut-sky map and
    build a pseudo-spectrum.
  • Assume spectral independence &
  • an invertible linear transfer function:
  •    < C̃l > = Σl' Tll' Cl'^s + Cl^n

40
Power Spectrum - Algorithms (V)
  • Now use Monte-Carlo simulations of the
    experiment's observations of signal+noise & noise
    only to determine the transfer matrix Tll', the
    noise spectrum Cl^n, and hence the signal spectrum
    Cl^s.
  • Scaling, eg., as 10^4 x 30 x 5 x 10^11 x 10
    ≈ 10^18 for Planck.
  • And provided we simulate the time-ordered data,
    any other abuses (filtering etc.) can be included
    in the Monte Carlo.

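A 1-dimensional toy version of this Monte-Carlo calibration (pure numpy, not an SHT pipeline; the diagonal transfer function, noise level, and mode count are all invented) recovers a signal spectrum from simulated pseudo-spectra:

```python
import numpy as np

rng = np.random.default_rng(5)
n_l, n_sims = 32, 2000
ell = np.arange(1, n_l + 1)
C_true = 1.0 / ell                         # toy signal spectrum
F = np.exp(-ell / 10.0)                    # toy diagonal transfer function
sigma_n = 0.1                              # toy noise level per mode

def pseudo(C, with_signal=True, with_noise=True):
    """One simulated pseudo-spectrum: filtered signal plus noise power."""
    a = rng.normal(size=n_l) * np.sqrt(C * F) if with_signal else 0.0
    n = rng.normal(size=n_l) * sigma_n if with_noise else 0.0
    return (a + n) ** 2

# the "data" pseudo-spectrum (averaged here to tame scatter in the toy)
C_obs = np.mean([pseudo(C_true) for _ in range(n_sims)], axis=0)

# Monte-Carlo calibration: signal-only sims with a unit fiducial spectrum
# give the transfer function; noise-only sims give the noise spectrum
T = np.mean([pseudo(np.ones(n_l), with_noise=False)
             for _ in range(n_sims)], axis=0)
C_n = np.mean([pseudo(None, with_signal=False)
               for _ in range(n_sims)], axis=0)

C_s = (C_obs - C_n) / T                    # recovered signal spectrum
```

In a real analysis the sims would be full time-ordered data runs through the map-maker and SHT, and T would couple neighbouring multipoles through the sky cut.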
41
Power Spectrum - Implementation (I)
  • Even with supercomputers, full maximum likelihood
    analysis is infeasible for more than O(10^5)
    pixels.
  • However, it is still useful for
  • low-resolution all-sky maps (eg. for low
    multipoles, where Monte Carlo methods are less
    reliable).
  • high-resolution cut-sky maps (eg. for consistency
    & systematics cross-checks).
  • hierarchical resolution/sky-fraction analyses.

42
Power Spectrum - Implementation (II)
  • Build the dS/dCb matrices.
  • Make an initial guess Cb.
  • Calculate D = N + Σb Cb dS/dCb & invert.
  • Calculate z = D^-1 d.
  • For each b, calculate Wb = D^-1 dS/dCb.
  • For each b, calculate dL/dCb = d^T Wb z - Tr Wb.
  • For each b, b', calculate Fbb' = Tr[Wb Wb'].
  • Calculate the new Cb += Σb' (F^-1)bb' dL/dCb'.
  • Iterate to convergence.
  • Using as many processors as possible, as
    efficiently as possible.

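The loop above can be transcribed almost directly into numpy on a toy two-bin problem (block-identity dS/dCb matrices, a white-noise N, and all sizes invented for the example; the conventional 1/2 factors cancel in the update):

```python
import numpy as np

rng = np.random.default_rng(6)
n_p = 200
N = 0.1 * np.eye(n_p)                      # pixel-pixel noise correlations
# dS/dCb: one "shape" matrix per bin (block identities in this toy)
P = np.zeros((2, n_p, n_p))
P[0, :100, :100] = np.eye(100)
P[1, 100:, 100:] = np.eye(100)

C_true = np.array([1.0, 0.5])
S = C_true[0] * P[0] + C_true[1] * P[1]
d = rng.multivariate_normal(np.zeros(n_p), S + N)   # simulated map

C = np.array([0.3, 0.3])                   # initial guess for bin powers
for _ in range(5):                         # Newton-Raphson iterations
    D = N + C[0] * P[0] + C[1] * P[1]
    Dinv = np.linalg.inv(D)
    z = Dinv @ d
    W = [Dinv @ Pb for Pb in P]            # Wb = D^-1 dS/dCb
    dL = np.array([0.5 * (d @ Wb @ z - np.trace(Wb)) for Wb in W])
    F = np.array([[0.5 * np.trace(Wb @ Wc) for Wc in W] for Wb in W])
    C = C + np.linalg.solve(F, dL)
```

The converged C scatters around C_true with errors set by F^-1, which is exactly the error information the ML approach provides for free.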
43
Power Spectrum - Implementation (III)
  • The limiting calculation is the dense
    matrix-matrix product Wb = D^-1 dS/dCb,
  • so it sets the constraints for the overall
    implementation.
  • Maximum efficiency requires minimum communication:
  • keep the data as dense as possible on the
    processors
  • 3 matrices fill the available memory
  • Check the other steps - none requires more than 3
    matrices.
  • Out-of-core implementation, so I/O efficiency is
    critical.

44
Power Spectrum - Implementation (IV)
  • The dense data requirement sets the processor
    count:
  •    PE ≥ 3 x 8 x Np^2 / (bytes per processor)
  • Eg. 3 x 8 x (10^5)^2 / 10^9 ≈ 240,
  • but we want to scale to very large systems.
  • Each bin multiplication is independent - divide
    the processors into gangs & do many Wb
    simultaneously.
  • The derivative terms are bin-parallel too:
  • dL/dCb is trivial
  • d2L/dCb dCb' requires some load-balancing

45
Aside Fisher Matrix (I)
  • Use Tr[AB] = Σij Aij Bji = Σij (A^T)ji Bji
  • For all my gang's b
  •   Read Wb
  •   Transpose → WTb
  •   WTb . Wb → Fbb
  •   For all b' > b
  •     Read Wb' over Wb
  •     WTb . Wb' → Fbb'
46
Aside Fisher Matrix (II)
For all my gang's b
  Read Wb
  Transpose → WTb
  WTb . Wb → Fbb
  Read Wb+1 over Wb
  WTb . Wb+1 → Fb,b+1
  Transpose → WTb+1
  For all b' > b+1
    Read Wb' over Wb+1
    WTb . Wb' → Fbb'
    WTb+1 . Wb' → Fb+1,b'
3 matrices in memory & Nb(Nb+2)/4 reads
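The trace identity behind both schemes can be demonstrated in numpy on random stand-in matrices (the 1/2 factor and the out-of-core reads are omitted; only the elementwise-product trick is shown):

```python
import numpy as np

rng = np.random.default_rng(7)
n_p, n_b = 100, 4
W = rng.normal(size=(n_b, n_p, n_p))       # stand-ins for Wb = D^-1 dS/dCb

# Tr[A B] = sum_ij A_ij B_ji = sum_ij (A^T)_ji B_ji: an elementwise
# product against the transpose, so no full matrix-multiply is needed
F = np.zeros((n_b, n_b))
for b in range(n_b):
    Wt = W[b].T                            # transpose once, then re-use
    for b2 in range(b, n_b):               # Tr is symmetric in (b, b2)
        F[b, b2] = F[b2, b] = np.sum(Wt * W[b2])
```

Replacing each O(Np^3) trace-of-product with an O(Np^2) elementwise sum is what makes the Fisher matrix affordable once the Wb exist.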
47
Power Spectrum - Implementation (V)
  • All gangs need dS/dCb & D^-1 first, but these
    terms do not bin-parallelize.
  • Either - calculate them using all processors &
    re-map,
  • or - calculate them on 1 gang & leave the
    others idle.
  • Balance the re-mapping/idling cost against the
    multi-gang benefit: the optimal WT2C depends on
    the architecture & data parameters.
  • For large data with many bins & an efficient
    communication network, sustain 50% of peak on
    thousands of processors.

48
Aside Parameterizing Architectures
  • Portability is crucial, but architectures
    differ.
  • Identify the critical components & parameterize
    their limits.
  • Tune codes to each architecture.
  • Eg. ScaLAPACK - the block-cyclic matrix
    distribution is the most efficient data layout
    (mapped to the re-use pattern).
  • The hardware limit is the cache size - set the
    blocksize so that the requisite number of blocks
    fit & fill the cache.
  • Eg. Input/Output - how many processors can
    efficiently read/write simultaneously (contention
    vs. idling)?

49
Conclusions
  • Maximum likelihood spectral estimation cannot
    handle high-resolution all-sky observations, but
    Monte Carlo methods can (albeit with questions at
    low l).
  • For sufficiently cut-sky or low-resolution maps,
    ML methods are feasible, and allow complete
    error-tracking.
  • Dense matrix-matrix kernels achieve 50% of peak
    performance (cf. 5% for FFT & SHT).
  • Inefficient implementation of data-delivery can
    reduce these by an order of magnitude or more -
    re-use each data unit as much as possible before
    moving to the next.
  • Things are only going to get harder !