Title: Cosmic Microwave Background Data Analysis: From Time-Ordered Data To Power Spectra
1. Cosmic Microwave Background Data Analysis: From Time-Ordered Data To Power Spectra
- Julian Borrill
- Computational Research Division, Berkeley Lab
- Space Sciences Laboratory, UC Berkeley
2. CMB Science - I
- The CMB is a snapshot of the Universe when it first became neutral, 400,000 years after the Big Bang.
- Cosmic - filling all of space.
- Microwave - redshifted by the expansion of the Universe from 3000K to 3K.
- Background - primordial photons coming from behind all astrophysical sources.
3. CMB Science - II
- The CMB is a unique probe of the very early Universe.
- Its tiny (ΔT/T ~ 10^-5) fluctuations carry information about
  - the fundamental parameters of cosmology
  - ultra-high energy physics beyond the Standard Model
4. CMB Science - III
- The new frontier in CMB research is polarization
  - consistency check of temperature results
  - re-ionization history of the Universe
  - gravity wave production during inflation
- But polarization fluctuations are up to 3 orders of magnitude fainter than temperature (we think), requiring
  - many more detectors
  - much longer observation times
  - very careful analysis of very large datasets
5. The Planck Satellite
- A joint ESA/NASA mission due to launch in fall 2007.
- An 18-month all-sky survey at 9 microwave frequencies from 30 to 857 GHz.
- O(10^12) observations, O(10^8) sky pixels, O(10^4) spectral multipoles.
6. Overview (I)
- CMB analysis moves from the time domain (observations) to the pixel domain (maps) to the multipole domain (power spectra), calculating the compressed data and their reduced error bars at each step.
7. Overview (II)
- CMB data analysis typically proceeds in 4 steps:
  - Pre-processing (deglitching, pointing, calibrating).
  - Estimating the time-domain noise statistics.
  - Estimating the map (and its errors).
  - Estimating the power spectra (and their errors).
- We iterate this loop as we learn more about the data.
- Then we can ask about the likelihoods of the parameters of any particular class of cosmologies.
8. Goals
- To cover the
  - basic mathematical formalism
  - algorithms & their scaling behaviours
  - example implementation issues
  for map-making and power spectrum estimation.
- To consider how to extract the maximum amount of information from the data, subject to practical computational constraints.
- To illustrate some of the computational issues faced when analyzing very large datasets (eg. Planck).
9. Data Compression
- CMB data analysis is an exercise in data compression:
  1. Time-ordered data
     samples = detectors x sampling rate x duration
     = 70 x 200Hz x 18 months for Planck
  2. (HEALPixelized) Maps
     pixels = components x sky fraction x 12 nside^2
     = (3 - 6) x 1 x 12 x 4096^2 for Planck
  3. Power Spectra
     bins = spectra x multipoles / bin resolution
     = 6 x (3 x 10^3) / 1 for Planck
10. Data Parameters

Symbol | Description              | Planck
N_t    | Number of samples        | 5 x 10^11
λ      | Noise bandwidth          | O(10^4)
N_p    | Number of pixels         | 6 x 10^8
N_s    | Number of spectra        | 6
l_max  | Maximum multipole        | 3 x 10^3
N_b    | Number of spectral bins  | 2 x 10^4
N_i    | Number of iterations     | -
N_r    | Number of realizations   | -
11. Computational Constraints
- A 1 GHz processor running at 100% efficiency for 1 day performs O(10^14) operations.
- 1 Gbyte of memory can hold an O(10^8) element vector, or an O(10^4 x 10^4) matrix, in 64-bit precision.
- Parallel (multiprocessor) computing increases the operation count and memory limits.
- Challenges to computational efficiency & scaling:
  - load balancing (work & memory)
  - data-delivery, including communication & I/O
13. Map-Making - Formalism (I)
- Consider data consisting of noise and sky-signal
    d_t = n_t + Σ_p A_tp s_p
- where the pointing matrix A encodes the weight of each pixel p in each sample t; for a total power temperature observation, A_tp = 1 if sample t observes pixel p and 0 otherwise.
- The sky-signal s_p is both beam- and pixel-smoothed.
14. Map-Making - Formalism (II)
- Assume Gaussian noise with probability distribution
    P(n) ∝ exp(-(1/2) n^T N^-1 n) / √(det N)
- and a time-time noise correlation matrix
    N_tt' = ⟨n_t n_t'⟩
- whose inverse is (piecewise) stationary & band-limited:
    (N^-1)_tt' = N^-1(|t - t'|), zero for |t - t'| > λ
15. Aside: Noise Estimation
- To make the map we need the inverse time-time noise correlations.
- Approximate them from the FFT power spectrum of the noise, which requires the pure noise timestream:
  - i) Assume n_t = d_t
  - ii) Solve for the map d_p ≈ s_p
  - iii) Subtract the map from the data: n_t = d_t - Σ_p A_tp d_p
  - iv) Iterate.
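As a toy illustration of the approximation (the AR(1) noise model, chunk length and filter coefficients are invented, and scipy is assumed for the filtering), the noise power spectrum can be estimated by averaging periodograms of short chunks - piecewise stationarity in miniature - and its inverse is the Fourier-space filter needed above:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(7)
nt, alpha = 2**16, 0.7
# Toy "pure noise" timestream: an AR(1) (low-pass) process standing in for
# the low-frequency noise of a real detector.
noise = lfilter([1.0], [1.0, -alpha], rng.standard_normal(nt))

# Welch-style estimate: average the periodograms of short chunks, which is
# where the (piecewise) stationarity assumption enters.
nchunk = 64
lchunk = nt // nchunk
segs = noise[: nchunk * lchunk].reshape(nchunk, lchunk)
psd = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) / lchunk

# The analytic AR(1) power spectrum, for comparison; the inverse time-time
# noise filter used in map-making is (up to normalization) 1/psd.
omega = 2 * np.pi * np.arange(psd.size) / lchunk
analytic = 1.0 / (1 + alpha**2 - 2 * alpha * np.cos(omega))
```

In a real pipeline the map is subtracted and the estimate iterated, as in steps (i)-(iv) above.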
16. Map-Making - Formalism (III)
- Writing the noise in terms of the data and signal, n = d - As, and maximizing its likelihood over all possible signals gives the minimum variance map
    m = (A^T N^-1 A)^-1 A^T N^-1 d
- with pixel-pixel noise correlations
    N_pp' = (A^T N^-1 A)^-1
- Taken together, these are a complete description of the data.
17. Map-Making - Algorithms (I)
- We want to solve the system
    (A^T N^-1 A) m = A^T N^-1 d
- Naive operation count: O(N_t^2 N_p)
- Eg. (5 x 10^11)^2 x (6 x 10^8) ≈ 2 x 10^32 for Planck!
18. Map-Making - Algorithms (II)
- a) Exploit the structure of the matrices:
  - Pointing matrix is sparse
  - Inverse noise correlation matrix is band-Toeplitz
- Associated matrix-matrix & matrix-vector multiplication scalings reduced from O(N_t^2) to O(N_t λ).
- Eg. (5 x 10^11) x 10^4 = 5 x 10^15 for Planck.
19. Map-Making - Algorithms (III)
- b) Replace explicit matrix inversion with an iterative solver (eg. preconditioned conjugate gradient) using repeated matrix-vector multiplications, reducing the scaling from O(N_p^3) to O(N_i N_p^2).
- N_i depends on the required solution accuracy and the quality of the preconditioner (white noise works well).
- Eg. 30 x (6 x 10^8)^2 ≈ 10^19 for Planck.
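A minimal PCG sketch of this step, assuming scipy is available. The band-limited noise correlation, sizes and scan are toy choices, and the diagonal of A^T N^-1 A plays the role of the white-noise preconditioner described above:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
nt, npix = 1500, 60
hits = rng.integers(0, npix, size=nt)
A = np.zeros((nt, npix))
A[np.arange(nt), hits] = 1.0

# Toy stationary correlated noise: exponentially decaying Toeplitz covariance.
lags = np.arange(nt)
N = toeplitz(0.1**2 * np.exp(-lags / 5.0))
Ninv = np.linalg.inv(N)

s_true = rng.standard_normal(npix)
d = A @ s_true + np.linalg.cholesky(N) @ rng.standard_normal(nt)

M_dense = A.T @ Ninv @ A                   # the system matrix A^T N^-1 A
b = A.T @ Ninv @ d

# White-noise preconditioner: just the diagonal of the system matrix.
diag = np.diag(M_dense)
prec = LinearOperator((npix, npix), matvec=lambda x: x / diag)
op = LinearOperator((npix, npix), matvec=lambda x: M_dense @ x)

m, info = cg(op, b, M=prec)                # preconditioned conjugate gradient
print(info)                                # 0 on convergence
```

At Planck scale `M_dense` is of course never formed; the matvec is applied implicitly, as the next slide describes.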
20. Map-Making - Algorithms (IV)
- c) Leave the inverse pixel-pixel noise matrix in implicit form, applying A^T N^-1 A matrix-free inside the iterative solver.
- Now each multiplication takes O(N_t λ) operations in pixel space, or O(N_t log λ) operations in Fourier space.
- Eg. 30 x (5 x 10^11) x log2(10^4) ≈ 2 x 10^14 for Planck.
- But this gives no information about the pixel-pixel noise correlations (ie. errors).
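The Fourier-space application of the band-Toeplitz N^-1 rests on circulant embedding. A sketch (the bandwidth and sizes are invented) that checks the O(n log n) product against a dense one:

```python
import numpy as np

def toeplitz_matvec(c, v):
    """Multiply a symmetric band-Toeplitz matrix (first column c, zero
    beyond len(c)) by v in O(n log n) via circulant embedding + FFT."""
    n = len(v)
    col = np.zeros(n)
    col[: len(c)] = c
    # First column of a 2n circulant: [col, 0, reversed tail of col].
    circ = np.concatenate([col, [0.0], col[1:][::-1]])
    mlen = len(circ)
    # Circular convolution = product of DFTs; keep the first n entries.
    w = np.fft.irfft(np.fft.rfft(circ) * np.fft.rfft(v, mlen), mlen)
    return w[:n]

rng = np.random.default_rng(2)
n = 256
c = 0.5 ** np.arange(8)        # band-limited correlation, bandwidth λ = 8
v = rng.standard_normal(n)

# Dense cross-check, T_ij = c[|i-j|].
col = np.zeros(n)
col[: len(c)] = c
dense = np.array([[col[abs(i - j)] for j in range(n)] for i in range(n)])
err = np.max(np.abs(toeplitz_matvec(c, v) - dense @ v))
print(err)
```

In the map-making solver this matvec is applied timestream-by-timestream, never forming N^-1 explicitly.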
21. Map-Making - Extensions (I)
- For polarization data the signal can be written in terms of the I, Q & U Stokes parameters and the angle ψ of the polarizer relative to some chosen coordinate frame:
    s_t = I_p + Q_p cos 2ψ_t + U_p sin 2ψ_t
- where A_tp now has 3 non-zero entries per row.
- We need at least 3 observation-angles to separate I, Q & U.
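A toy illustration of the 3-angle requirement: with the three pointing weights (1, cos 2ψ, sin 2ψ) per sample, the I, Q & U of one pixel follow from a small least-squares solve. All numbers here are made up:

```python
import numpy as np

I0, Q0, U0 = 2.0, 0.3, -0.1                        # toy Stokes parameters
psi = np.array([0.0, np.pi / 4, np.pi / 2, 2.0])   # >= 3 distinct angles

# Each row of the polarized pointing matrix: (1, cos 2psi_t, sin 2psi_t).
A = np.column_stack([np.ones_like(psi), np.cos(2 * psi), np.sin(2 * psi)])
s = A @ np.array([I0, Q0, U0])                     # noiseless samples

iqu, *_ = np.linalg.lstsq(A, s, rcond=None)
print(iqu)
```

With fewer than 3 distinct angles the 3-column system is rank-deficient and I, Q & U cannot be separated - exactly the degeneracy the white-noise preconditioner test on a later slide looks for.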
22. Map-Making - Extensions (II)
- If the data includes a sky-asynchronous contaminating signal (eg. MAXIMA's chopper-synchronous signal), it can be solved for simultaneously with the map by appending template columns to the pointing matrix.
- This can be extended to any number of contaminants, including relative offsets between sub-maps.
23. Map-Making - Extensions (III)
- If the data includes a sky-synchronous contaminating signal then more information is needed.
- Given multi-frequency data with known foreground frequency scalings, the foreground components can be solved for alongside the CMB.
- With detectors at d different frequencies this can be extended to (d-1) different foregrounds.
24. Map-Making - Implementation (I)
- We want to be able to calculate the inverse pixel-pixel noise correlation matrix
    (N_pp')^-1 = A^T N^-1 A
- because it
  - encodes the error information
  - is sparse, so can be saved even for huge data
  - provides a good (block) diagonal preconditioner & an I/Q/U pixel-degeneracy test with white noise.
25. Map-Making - Implementation (II)
- Both the time- & pixel-domain data must be distributed.
- We want to balance simultaneously both the work- & the memory-load per processor.
- A balanced distribution in one basis is unbalanced in the other:
  - uniform work/observation but varied work/pixel.
  - unequal numbers of pixels observed/interval.
- No perfect solution - have to accept the least limiting imbalance (pixel memory).
26. Map-Making - Implementation (III)
- Pseudo-code:
    Initialize
    For each time t
      For each pixel p observed at time t with weight w
        For each time t' within λ of t
          For each pixel p' observed at time t' with weight w'
            Accumulate (N^-1)_pp' += w x N^-1(|t - t'|) x w'
- Doesn't work well - pixel-memory access too random.
27. Map-Making - Implementation (IV)
- Alternative pseudo-code:
    For each pixel p, find the times at which it is observed.
    For each pixel p
      Initialize
      For each time t at which p is observed with weight w
        For each time t' within λ of t
          For each pixel p' observed at time t' with weight w'
            Accumulate (N^-1)_pp' += w x N^-1(|t - t'|) x w'
28. Map-Making - Implementation (V)
- Distribute the time-ordered data equally over processors.
- Each processor accumulates part of N_pp'^-1 in sparse form
  - only over pixels observed in its time interval
  - only over some of the observations of these pixels
- Then all of these partial matrices, each stored in its own sparse form - row by row as the pixel index, the number of non-zeros, and the (pixel, value) pairs
    (p, z, (p1, n1), ..., (pz, nz)), ..., (p', z', (p'1, n'1), ..., (p'z', n'z'))
  - have to be combined.
29. Map-Making - Conclusions
- The maximum-likelihood map can be calculated in only O(N_i N_t log λ) operations - O(10^14) for Planck.
- The sparse inverse pixel-pixel correlation matrix can be calculated in O(N_t λ) operations - O(10^15) for Planck.
- A single column of the pixel-pixel correlation matrix can be calculated (just like a map) in O(N_i N_t log λ) operations - O(10^14) for Planck.
- Computational efficiency is only ~5% for serial/small data, ~0.5% for parallel/large data.
31. Power Spectrum Estimation
- From the map-making we have
  (i) the pixelized data-vector d_p containing the CMB signal(s) plus any other separated components.
  (ii) some representation of the pixel-pixel noise correlation matrix N_pp' - explicit, explicit inverse, or implicit inverse.
- Truncation of N_pp' (only) is equivalent to marginalization.
32. Power Spectrum - Formalism (I)
- Given a map of noise + Gaussian CMB signal
    d = s + n
- with correlations
    D = ⟨d d^T⟩ = S + N
- and - assuming a uniform prior - probability distribution
    P(d | C_l) ∝ exp(-(1/2) d^T D^-1 d) / √(det D)
33. Power Spectrum - Formalism (II)
- Assuming azimuthal symmetry of the CMB, its pixel-pixel correlations depend only on the angle between the pair.
- 3 components (T, E, B) → 6 spectra (3 auto + 3 cross).
- Theory predicts C_l^TB = C_l^EB = 0, but this should still be calculated/confirmed (test of systematics).
34. Power Spectrum - Formalism (III)
- Eg. temperature signal correlations
    S_pp' = Σ_l ((2l+1)/4π) B_l^2 C_l P_l(cos θ_pp')
- with beam & pixel window B_l.
- For binned multipoles: C_l = Σ_b χ_b(l) C_b, with χ_b(l) = 1 for l in bin b and 0 otherwise.
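A sketch of evaluating such a signal correlation from a toy spectrum and Gaussian beam window via Legendre polynomials, assuming scipy is available (the spectrum, beam width and lmax are invented):

```python
import numpy as np
from scipy.special import eval_legendre

lmax = 64
ell = np.arange(2, lmax + 1)
C_l = 1.0 / (ell * (ell + 1.0))            # toy spectrum, flat in l(l+1)C_l
sigma_b = np.radians(1.0)                  # toy ~1 degree Gaussian beam
B_l = np.exp(-0.5 * ell * (ell + 1) * sigma_b**2)

def S_of_theta(theta):
    """S(theta) = sum_l (2l+1)/(4 pi) B_l^2 C_l P_l(cos theta)."""
    return np.sum((2 * ell + 1) / (4 * np.pi) * B_l**2 * C_l
                  * eval_legendre(ell, np.cos(theta)))

s0, s90 = S_of_theta(0.0), S_of_theta(np.pi / 2)
print(s0, s90)
```

Filling an N_p x N_p matrix this way is exactly the O(N_p^2) (per bin) cost that dominates the dS/dC_b construction.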
35. Power Spectrum - Formalism (IV)
- No analytic likelihood maximization - use an iterative search.
- Newton-Raphson requires two derivatives wrt the bin powers:
    dL/dC_b = (1/2) [d^T D^-1 (dS/dC_b) D^-1 d - Tr(D^-1 dS/dC_b)]
    F_bb' = ⟨-d²L/dC_b dC_b'⟩ = (1/2) Tr(D^-1 (dS/dC_b) D^-1 (dS/dC_b'))
36. Power Spectrum - Algorithms (I)
- For each NR iteration we want to solve the system
    δC_b = Σ_b' (F^-1)_bb' dL/dC_b'
- Naive operation count: O(N_b N_p^3)
- Eg. (2 x 10^4) x (6 x 10^8)^3 ≈ 4 x 10^30 operations/iteration & 8 x (6 x 10^8)^2 ≈ 3 x 10^18 bytes/matrix for Planck!
37. Power Spectrum - Algorithms (II)
- a) Exploiting the signal derivative matrices' structures
  - reduces the operation count by up to a factor of 10.
  - increases the memory needed from 2 to 3 matrices.
38. Power Spectrum - Algorithms (III)
- b) Use PCG & a D^-1 trace approximant for the dL/dC_b & d²L/dC_b dC_b' terms,
- scaling as N_r x N_b x N_i x N_t x log λ.
- Eg. 5 x 10^4 x (2 x 10^4) x 10^3 x (5 x 10^11) x 10 ≈ 5 x 10^24 for Planck.
39. Power Spectrum - Algorithms (IV)
- c) Abandoning maximum likelihood altogether!
- Eg. Take the SHT of the noisy, cut-sky map.
- Build a pseudo-spectrum
    C̃_l = (1/(2l+1)) Σ_m |ã_lm|^2
- Assume spectral independence & an invertible linear transfer function
    ⟨C̃_l⟩ = Σ_l' T_ll' C_l' + ⟨Ñ_l⟩
40. Power Spectrum - Algorithms (V)
- Now use Monte-Carlo simulations of the experiment's observations of signal+noise & noise-only to determine the transfer matrix T_ll', the noise spectrum C_l^n, and hence the signal spectrum C_l^s.
- Scales as N_r x N_i x N_t x log λ.
- Eg. 10^4 x 30 x (5 x 10^11) x 10 ≈ 10^18 for Planck.
- And provided we simulate time-ordered data, any other abuses (filtering etc.) can be included in the Monte Carlo.
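The Monte-Carlo idea can be demonstrated in a 1D toy analogue, with the FFT standing in for the SHT and a pixel mask as the cut sky. Noise-only simulations are omitted for brevity, and the transfer function is taken to be diagonal, as the spectral-independence assumption above allows; everything here (sizes, mask, flat spectrum) is invented:

```python
import numpy as np

rng = np.random.default_rng(8)
n, nsim = 512, 200
w = np.ones(n)
w[: n // 4] = 0.0                          # "cut sky": quarter of pixels masked

def pseudo_spectrum(mapvec):
    """Pseudo-spectrum of a masked 1D map (FFT in place of the SHT)."""
    a = np.fft.rfft(w * mapvec)
    return np.abs(a) ** 2 / n

# Monte Carlo: signal-only sims with a known flat unit spectrum give the
# (diagonal) transfer function T_k = <pseudo C_k> / C_k.
T = np.mean([pseudo_spectrum(rng.standard_normal(n))
             for _ in range(nsim)], axis=0)

# "Data": independent skies with the same true spectrum; average their
# pseudo-spectra and divide out the transfer function.
data = np.mean([pseudo_spectrum(rng.standard_normal(n))
                for _ in range(nsim)], axis=0)
C_est = data / T                           # recovers the flat input spectrum
```

In the real MASTER-style analysis the sims start from time-ordered data, so any filtering or other abuse of the data is folded into T automatically.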
41. Power Spectrum - Implementation (I)
- Even with supercomputers, full maximum likelihood analysis is infeasible for more than O(10^5) pixels.
- However, it is still useful for
  - low-resolution all-sky maps (eg. for low multipoles, where Monte Carlo methods are less reliable).
  - high-resolution cut-sky maps (eg. for consistency & systematics cross-checks).
  - hierarchical resolution/sky-fraction analyses.
42. Power Spectrum - Implementation (II)
- Build the dS/dC_b matrices.
- Make an initial guess C_b.
- Calculate D = N + Σ_b C_b dS/dC_b & invert.
- For each b, calculate W_b = D^-1 dS/dC_b.
- For each b, calculate dL/dC_b = (1/2) (d^T W_b z - Tr W_b), with z = D^-1 d.
- For each b, b', calculate F_bb' = (1/2) Tr(W_b W_b').
- Update C_b += Σ_b' (F^-1)_bb' dL/dC_b'.
- Iterate to convergence.
- Using as many processors as possible, as efficiently as possible.
43. Power Spectrum - Implementation (III)
- The limiting calculation is the matrix-matrix product W_b = D^-1 dS/dC_b, so it sets the constraints for the overall implementation.
- Maximum efficiency requires minimum communication:
  - keep data as dense as possible on the processors
  - 3 matrices fill the available memory
- Check the other steps - none requires more than 3 matrices.
- Out-of-core implementation, so I/O efficiency is critical.
44. Power Spectrum - Implementation (IV)
- The dense data requirement sets the processor count:
    N_PE ≥ 3 x 8 x N_p^2 / (bytes per processor)
- Eg. 3 x 8 x (10^5)^2 / 10^9 = 240, but we want to scale to very large systems.
- Each bin multiplication is independent - divide the processors into gangs & do many W_b simultaneously.
- The derivative terms are bin-parallel too:
  - dL/dC_b: trivial
  - d²L/dC_b dC_b': requires some load-balancing
45. Aside: Fisher Matrix (I)
- Use Tr[A B] = Σ_ij A_ij B_ji = Σ_ij (A^T)_ji B_ji
- For all my gang's b:
    Read W_b
    Transpose → W^T_b
    W^T_b · W_b → F_bb
    For all b' > b:
      Read W_b' over W_b
      W^T_b · W_b' → F_bb'
- 2 matrices in memory & N_b(N_b+1)/2 reads.
46. Aside: Fisher Matrix (II)
- For all my gang's b:
    Read W_b
    Transpose → W^T_b
    W^T_b · W_b → F_bb
    Read W_b+1 over W_b
    W^T_b · W_b+1 → F_b b+1
    Transpose → W^T_b+1
    For all b' > b+1:
      Read W_b' over W_b+1
      W^T_b · W_b' → F_bb'
      W^T_b+1 · W_b' → F_b+1 b'
- 3 matrices in memory & N_b(N_b+2)/4 reads.
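The trace identity above is what makes the gang scheme cheap: once W^T_b is in memory, each F_bb' entry is an elementwise product-and-sum rather than a matrix-matrix multiply. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Tr(AB) = sum_ij A_ij B_ji: an O(n^2) elementwise product with B^T,
# instead of the O(n^3) full product followed by a trace.
fast = np.sum(A * B.T)
slow = np.trace(A @ B)
print(fast, slow)
```

For each Fisher entry this turns an O(N_p^3) product into an O(N_p^2) reduction, leaving the W_b builds themselves as the only cubic cost.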
47. Power Spectrum - Implementation (V)
- All gangs need dS/dC_b & D^-1 first, but these terms do not bin-parallelize.
- Either calculate them using all processors & re-map, or calculate them on 1 gang & leave the others idle.
- Balance the re-mapping/idling cost against the multi-gang benefit - the optimum depends on the architecture & data parameters.
- For large data with many bins & an efficient communication network, sustain 50% of peak on thousands of processors.
48. Aside: Parameterizing Architectures
- Portability is crucial, but architectures differ.
- Identify the critical components & parameterize their limits; tune codes to each architecture.
- Eg. ScaLAPACK - block-cyclic matrix distribution is the most efficient data layout (mapped to the re-use pattern). The hardware limit is cache size - set the blocksize so that the requisite number of blocks fit in & fill the cache.
- Eg. Input/Output - how many processors can efficiently read/write simultaneously (contention vs idling)?
49. Conclusions
- Maximum likelihood spectral estimation cannot handle high-resolution all-sky observations, but Monte Carlo methods can (albeit with questions at low l).
- For sufficiently cut-sky or low-resolution maps, ML methods are feasible and allow complete error-tracking.
- Dense matrix-matrix kernels achieve 50% of peak performance (cf. 5% for FFT & SHT).
- Inefficient implementation of data-delivery can reduce these by an order of magnitude or more - re-use each data unit as much as possible before moving to the next.
- Things are only going to get harder!