Title: Cosmic Microwave Background Data Analysis: From Time-Ordered Data To Power Spectra
1. Cosmic Microwave Background Data Analysis: From Time-Ordered Data To Power Spectra
- Julian Borrill
- Computational Research Division, Berkeley Lab
- Space Sciences Laboratory, UC Berkeley
2. CMB Science - I
- The CMB is a snapshot of the Universe when it first became neutral, 400,000 years after the Big Bang.
- Cosmic - filling all of space.
- Microwave - redshifted by the expansion of the Universe from 3000K to 3K.
- Background - primordial photons coming from behind all astrophysical sources.
3. CMB Science - II
- The CMB is a unique probe of the very early Universe.
- Its tiny (ΔT/T ~ 10^-5) fluctuations carry information about
  - the fundamental parameters of cosmology
  - ultra-high energy physics beyond the Standard Model
4. CMB Science - III
- The new frontier in CMB research is polarization
  - consistency check of temperature results
  - re-ionization history of the Universe
  - gravity wave production during inflation
- But polarization fluctuations are up to 3 orders of magnitude fainter than temperature (we think), requiring
  - many more detectors
  - much longer observation times
  - very careful analysis of very large datasets
5. The Planck Satellite
- A joint ESA/NASA mission due to launch in fall 2007.
- An 18-month all-sky survey at 9 microwave frequencies from 30 to 857 GHz.
- O(10^12) observations, O(10^8) sky pixels, O(10^4) spectral multipoles.
6. Overview (I)
- CMB analysis moves from the time domain (observations) to the pixel domain (maps) to the multipole domain (power spectra), calculating the compressed data and their reduced error bars at each step.
7. Overview (II)
- CMB data analysis typically proceeds in 4 steps:
  - Pre-processing (deglitching, pointing, calibrating).
  - Estimating the time-domain noise statistics.
  - Estimating the map (and its errors).
  - Estimating the power spectra (and their errors).
- We iterate this loop as we learn more about the data.
- Then we can ask about the likelihoods of the parameters of any particular class of cosmologies.
8. Goals
- To cover the
  - basic mathematical formalism
  - algorithms & their scaling behaviours
  - example implementation issues
  for map-making and power spectrum estimation.
- To consider how to extract the maximum amount of information from the data, subject to practical computational constraints.
- To illustrate some of the computational issues faced when analyzing very large datasets (eg. Planck).
9. Data Compression
- CMB data analysis is an exercise in data compression:
  1. Time-ordered data
     samples = detectors x sampling rate x duration
     = 70 x 200Hz x 18 months for Planck
  2. (HEALPixelized) Maps
     pixels = components x sky fraction x 12 nside^2
     = (3 - 6) x 1 x 12 x 4096^2 for Planck
  3. Power Spectra
     bins = spectra x multipoles / bin resolution
     = 6 x (3 x 10^3) / 1 for Planck
10. Data Parameters

Symbol | Description              | Planck
N_t    | Number of samples        | 5 x 10^11
λ      | Noise bandwidth          | O(10^4)
N_p    | Number of pixels         | 6 x 10^8
N_s    | Number of spectra        | 6
l_max  | Maximum multipole        | 3 x 10^3
N_b    | Number of spectral bins  | 2 x 10^4
N_i    | Number of iterations     | -
N_r    | Number of realizations   | -
11. Computational Constraints
- A 1 GHz processor running at 100% efficiency for 1 day performs O(10^14) operations.
- 1 Gbyte of memory can hold an O(10^8) element vector, or an O(10^4 x 10^4) matrix, in 64-bit precision.
- Parallel (multiprocessor) computing increases the operation count and memory limits.
- Challenges to computational efficiency & scaling:
  - load balancing (work & memory)
  - data-delivery, including communication & I/O
13. Map-Making - Formalism (I)
- Consider data consisting of noise and sky-signal
    d_t = n_t + Σ_p A_tp s_p
- where the pointing matrix A encodes the weight of each pixel p in each sample t; for a total power temperature observation, A_tp = 1 if sample t observes pixel p and 0 otherwise.
- The sky-signal s_p is both beam- and pixel-smoothed.
14. Map-Making - Formalism (II)
- Assume Gaussian noise with probability distribution
    P(n) ∝ exp(-(1/2) n^T N^-1 n) / √(det N)
- and a time-time noise correlation matrix
    N_tt' = ⟨n_t n_t'⟩
- whose inverse is (piecewise) stationary & band-limited:
    (N^-1)_tt' = N^-1(|t - t'|), zero for |t - t'| > λ
15. Aside: Noise Estimation
- To make the map we need the inverse time-time noise correlations.
- Approximate them from the FFT power spectrum of the noise, which requires the pure noise timestream:
  - i) Assume n_t = d_t
  - ii) Solve for the map d_p ≈ s_p
  - iii) Subtract the map from the data: n_t = d_t - Σ_p A_tp d_p
  - iv) Iterate.
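As a toy illustration of the approximation (the AR(1) noise model, chunk length and filter coefficients are invented, and scipy is assumed for the filtering), the noise power spectrum can be estimated by averaging periodograms of short chunks - piecewise stationarity in miniature - and its inverse is the Fourier-space filter needed above:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(7)
nt, alpha = 2**16, 0.7
# Toy "pure noise" timestream: an AR(1) (low-pass) process standing in for
# the low-frequency noise of a real detector.
noise = lfilter([1.0], [1.0, -alpha], rng.standard_normal(nt))

# Welch-style estimate: average the periodograms of short chunks, which is
# where the (piecewise) stationarity assumption enters.
nchunk = 64
lchunk = nt // nchunk
segs = noise[: nchunk * lchunk].reshape(nchunk, lchunk)
psd = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) / lchunk

# The analytic AR(1) power spectrum, for comparison; the inverse time-time
# noise filter used in map-making is (up to normalization) 1/psd.
omega = 2 * np.pi * np.arange(psd.size) / lchunk
analytic = 1.0 / (1 + alpha**2 - 2 * alpha * np.cos(omega))
```

In a real pipeline the map is subtracted and the estimate iterated, as in steps (i)-(iv) above.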
16. Map-Making - Formalism (III)
- Writing the noise in terms of the data and signal, n = d - As, and maximizing its likelihood over all possible signals gives the minimum variance map
    m = (A^T N^-1 A)^-1 A^T N^-1 d
- with pixel-pixel noise correlations
    N_pp' = (A^T N^-1 A)^-1
- Taken together, these are a complete description of the data.
17. Map-Making - Algorithms (I)
- We want to solve the system
    (A^T N^-1 A) m = A^T N^-1 d
- Naive operation count: O(N_t^2 N_p)
- Eg. (5 x 10^11)^2 x (6 x 10^8) ≈ 2 x 10^32 for Planck!
18. Map-Making - Algorithms (II)
- a) Exploit the structure of the matrices:
  - Pointing matrix is sparse
  - Inverse noise correlation matrix is band-Toeplitz
- Associated matrix-matrix & matrix-vector multiplication scalings reduced from O(N_t^2) to O(N_t λ).
- Eg. (5 x 10^11) x 10^4 = 5 x 10^15 for Planck.
19. Map-Making - Algorithms (III)
- b) Replace explicit matrix inversion with an iterative solver (eg. preconditioned conjugate gradient) using repeated matrix-vector multiplications, reducing the scaling from O(N_p^3) to O(N_i N_p^2).
- N_i depends on the required solution accuracy and the quality of the preconditioner (white noise works well).
- Eg. 30 x (6 x 10^8)^2 ≈ 10^19 for Planck.
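A minimal PCG sketch of this step, assuming scipy is available. The band-limited noise correlation, sizes and scan are toy choices, and the diagonal of A^T N^-1 A plays the role of the white-noise preconditioner described above:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(1)
nt, npix = 1500, 60
hits = rng.integers(0, npix, size=nt)
A = np.zeros((nt, npix))
A[np.arange(nt), hits] = 1.0

# Toy stationary correlated noise: exponentially decaying Toeplitz covariance.
lags = np.arange(nt)
N = toeplitz(0.1**2 * np.exp(-lags / 5.0))
Ninv = np.linalg.inv(N)

s_true = rng.standard_normal(npix)
d = A @ s_true + np.linalg.cholesky(N) @ rng.standard_normal(nt)

M_dense = A.T @ Ninv @ A                   # the system matrix A^T N^-1 A
b = A.T @ Ninv @ d

# White-noise preconditioner: just the diagonal of the system matrix.
diag = np.diag(M_dense)
prec = LinearOperator((npix, npix), matvec=lambda x: x / diag)
op = LinearOperator((npix, npix), matvec=lambda x: M_dense @ x)

m, info = cg(op, b, M=prec)                # preconditioned conjugate gradient
print(info)                                # 0 on convergence
```

At Planck scale `M_dense` is of course never formed; the matvec is applied implicitly, as the next slide describes.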
20. Map-Making - Algorithms (IV)
- c) Leave the inverse pixel-pixel noise matrix in implicit form, applying A^T N^-1 A matrix-free inside the iterative solver.
- Now each multiplication takes O(N_t λ) operations in pixel space, or O(N_t log λ) operations in Fourier space.
- Eg. 30 x (5 x 10^11) x log2(10^4) ≈ 2 x 10^14 for Planck.
- But this gives no information about the pixel-pixel noise correlations (ie. errors).
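The Fourier-space application of the band-Toeplitz N^-1 rests on circulant embedding. A sketch (the bandwidth and sizes are invented) that checks the O(n log n) product against a dense one:

```python
import numpy as np

def toeplitz_matvec(c, v):
    """Multiply a symmetric band-Toeplitz matrix (first column c, zero
    beyond len(c)) by v in O(n log n) via circulant embedding + FFT."""
    n = len(v)
    col = np.zeros(n)
    col[: len(c)] = c
    # First column of a 2n circulant: [col, 0, reversed tail of col].
    circ = np.concatenate([col, [0.0], col[1:][::-1]])
    mlen = len(circ)
    # Circular convolution = product of DFTs; keep the first n entries.
    w = np.fft.irfft(np.fft.rfft(circ) * np.fft.rfft(v, mlen), mlen)
    return w[:n]

rng = np.random.default_rng(2)
n = 256
c = 0.5 ** np.arange(8)        # band-limited correlation, bandwidth λ = 8
v = rng.standard_normal(n)

# Dense cross-check, T_ij = c[|i-j|].
col = np.zeros(n)
col[: len(c)] = c
dense = np.array([[col[abs(i - j)] for j in range(n)] for i in range(n)])
err = np.max(np.abs(toeplitz_matvec(c, v) - dense @ v))
print(err)
```

In the map-making solver this matvec is applied timestream-by-timestream, never forming N^-1 explicitly.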
21. Map-Making - Extensions (I)
- For polarization data the signal can be written in terms of the I, Q & U Stokes parameters and the angle ψ of the polarizer relative to some chosen coordinate frame:
    s_t = I_p + Q_p cos 2ψ_t + U_p sin 2ψ_t
- where A_tp now has 3 non-zero entries per row.
- We need at least 3 observation-angles to separate I, Q & U.
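A toy illustration of the 3-angle requirement: with the three pointing weights (1, cos 2ψ, sin 2ψ) per sample, the I, Q & U of one pixel follow from a small least-squares solve. All numbers here are made up:

```python
import numpy as np

I0, Q0, U0 = 2.0, 0.3, -0.1                        # toy Stokes parameters
psi = np.array([0.0, np.pi / 4, np.pi / 2, 2.0])   # >= 3 distinct angles

# Each row of the polarized pointing matrix: (1, cos 2psi_t, sin 2psi_t).
A = np.column_stack([np.ones_like(psi), np.cos(2 * psi), np.sin(2 * psi)])
s = A @ np.array([I0, Q0, U0])                     # noiseless samples

iqu, *_ = np.linalg.lstsq(A, s, rcond=None)
print(iqu)
```

With fewer than 3 distinct angles the 3-column system is rank-deficient and I, Q & U cannot be separated - exactly the degeneracy the white-noise preconditioner test on a later slide looks for.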
22. Map-Making - Extensions (II)
- If the data includes a sky-asynchronous contaminating signal (eg. MAXIMA's chopper-synchronous signal), it can be solved for simultaneously with the map by appending template columns to the pointing matrix.
- This can be extended to any number of contaminants, including relative offsets between sub-maps.
23. Map-Making - Extensions (III)
- If the data includes a sky-synchronous contaminating signal then more information is needed.
- Given multi-frequency data with known foreground frequency scalings, the foreground components can be solved for alongside the CMB.
- With detectors at d different frequencies this can be extended to (d-1) different foregrounds.
24. Map-Making - Implementation (I)
- We want to be able to calculate the inverse pixel-pixel noise correlation matrix
    (N_pp')^-1 = A^T N^-1 A
- because it
  - encodes the error information
  - is sparse, so can be saved even for huge data
  - provides a good (block) diagonal preconditioner & an I/Q/U pixel-degeneracy test with white noise.
25. Map-Making - Implementation (II)
- Both the time- & pixel-domain data must be distributed.
- We want to balance simultaneously both the work- & the memory-load per processor.
- A balanced distribution in one basis is unbalanced in the other:
  - uniform work/observation but varied work/pixel.
  - unequal numbers of pixels observed/interval.
- No perfect solution - have to accept the least limiting imbalance (pixel memory).
26. Map-Making - Implementation (III)
- Pseudo-code:
    Initialize
    For each time t
      For each pixel p observed at time t with weight w
        For each time t' within λ of t
          For each pixel p' observed at time t' with weight w'
            Accumulate (N^-1)_pp' += w x N^-1(|t - t'|) x w'
- Doesn't work well - pixel-memory access too random.
27. Map-Making - Implementation (IV)
- Alternative pseudo-code:
    For each pixel p, find the times at which it is observed.
    For each pixel p
      Initialize
      For each time t at which p is observed with weight w
        For each time t' within λ of t
          For each pixel p' observed at time t' with weight w'
            Accumulate (N^-1)_pp' += w x N^-1(|t - t'|) x w'
28. Map-Making - Implementation (V)
- Distribute the time-ordered data equally over processors.
- Each processor accumulates part of N_pp'^-1 in sparse form
  - only over pixels observed in its time interval
  - only over some of the observations of these pixels
- Then all of these partial matrices, each stored in its own sparse form - row by row as the pixel index, the number of non-zeros, and the (pixel, value) pairs
    (p, z, (p1, n1), ..., (pz, nz)), ..., (p', z', (p'1, n'1), ..., (p'z', n'z'))
  - have to be combined.
29. Map-Making - Conclusions
- The maximum-likelihood map can be calculated in only O(N_i N_t log λ) operations - O(10^14) for Planck.
- The sparse inverse pixel-pixel correlation matrix can be calculated in O(N_t λ) operations - O(10^15) for Planck.
- A single column of the pixel-pixel correlation matrix can be calculated (just like a map) in O(N_i N_t log λ) operations - O(10^14) for Planck.
- Computational efficiency is only ~5% for serial/small data, ~0.5% for parallel/large data.
31. Power Spectrum Estimation
- From the map-making we have
  (i) the pixelized data-vector d_p containing the CMB signal(s) plus any other separated components.
  (ii) some representation of the pixel-pixel noise correlation matrix N_pp' - explicit, explicit inverse, or implicit inverse.
- Truncation of N_pp' (only) is equivalent to marginalization.
32. Power Spectrum - Formalism (I)
- Given a map of noise + Gaussian CMB signal
    d = s + n
- with correlations
    D = ⟨d d^T⟩ = S + N
- and - assuming a uniform prior - probability distribution
    P(d | C_l) ∝ exp(-(1/2) d^T D^-1 d) / √(det D)
33. Power Spectrum - Formalism (II)
- Assuming azimuthal symmetry of the CMB, its pixel-pixel correlations depend only on the angle between the pair.
- 3 components (T, E, B) → 6 spectra (3 auto + 3 cross).
- Theory predicts C_l^TB = C_l^EB = 0, but this should still be calculated/confirmed (test of systematics).
34. Power Spectrum - Formalism (III)
- Eg. temperature signal correlations
    S_pp' = Σ_l ((2l+1)/4π) B_l^2 C_l P_l(cos θ_pp')
- with beam & pixel window B_l.
- For binned multipoles: C_l = Σ_b χ_b(l) C_b, with χ_b(l) = 1 for l in bin b and 0 otherwise.
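A sketch of evaluating such a signal correlation from a toy spectrum and Gaussian beam window via Legendre polynomials, assuming scipy is available (the spectrum, beam width and lmax are invented):

```python
import numpy as np
from scipy.special import eval_legendre

lmax = 64
ell = np.arange(2, lmax + 1)
C_l = 1.0 / (ell * (ell + 1.0))            # toy spectrum, flat in l(l+1)C_l
sigma_b = np.radians(1.0)                  # toy ~1 degree Gaussian beam
B_l = np.exp(-0.5 * ell * (ell + 1) * sigma_b**2)

def S_of_theta(theta):
    """S(theta) = sum_l (2l+1)/(4 pi) B_l^2 C_l P_l(cos theta)."""
    return np.sum((2 * ell + 1) / (4 * np.pi) * B_l**2 * C_l
                  * eval_legendre(ell, np.cos(theta)))

s0, s90 = S_of_theta(0.0), S_of_theta(np.pi / 2)
print(s0, s90)
```

Filling an N_p x N_p matrix this way is exactly the O(N_p^2) (per bin) cost that dominates the dS/dC_b construction.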
35. Power Spectrum - Formalism (IV)
- No analytic likelihood maximization - use an iterative search.
- Newton-Raphson requires two derivatives wrt the bin powers:
    dL/dC_b = (1/2) [d^T D^-1 (dS/dC_b) D^-1 d - Tr(D^-1 dS/dC_b)]
    F_bb' = ⟨-d²L/dC_b dC_b'⟩ = (1/2) Tr(D^-1 (dS/dC_b) D^-1 (dS/dC_b'))
36. Power Spectrum - Algorithms (I)
- For each NR iteration we want to solve the system
    δC_b = Σ_b' (F^-1)_bb' dL/dC_b'
- Naive operation count: O(N_b N_p^3)
- Eg. (2 x 10^4) x (6 x 10^8)^3 ≈ 4 x 10^30 operations/iteration & 8 x (6 x 10^8)^2 ≈ 3 x 10^18 bytes/matrix for Planck!
37. Power Spectrum - Algorithms (II)
- a) Exploiting the signal derivative matrices' structures
  - reduces the operation count by up to a factor of 10.
  - increases the memory needed from 2 to 3 matrices.
38. Power Spectrum - Algorithms (III)
- b) Use PCG & a D^-1 trace approximant for the dL/dC_b & d²L/dC_b dC_b' terms,
- scaling as N_r x N_b x N_i x N_t x log λ.
- Eg. 5 x 10^4 x (2 x 10^4) x 10^3 x (5 x 10^11) x 10 ≈ 5 x 10^24 for Planck.
39. Power Spectrum - Algorithms (IV)
- c) Abandoning maximum likelihood altogether!
- Eg. Take the SHT of the noisy, cut-sky map.
- Build a pseudo-spectrum
    C̃_l = (1/(2l+1)) Σ_m |ã_lm|^2
- Assume spectral independence & an invertible linear transfer function
    ⟨C̃_l⟩ = Σ_l' T_ll' C_l' + ⟨Ñ_l⟩
40. Power Spectrum - Algorithms (V)
- Now use Monte-Carlo simulations of the experiment's observations of signal+noise & noise-only to determine the transfer matrix T_ll', the noise spectrum C_l^n, and hence the signal spectrum C_l^s.
- Scales as N_r x N_i x N_t x log λ.
- Eg. 10^4 x 30 x (5 x 10^11) x 10 ≈ 10^18 for Planck.
- And provided we simulate time-ordered data, any other abuses (filtering etc.) can be included in the Monte Carlo.
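The Monte-Carlo idea can be demonstrated in a 1D toy analogue, with the FFT standing in for the SHT and a pixel mask as the cut sky. Noise-only simulations are omitted for brevity, and the transfer function is taken to be diagonal, as the spectral-independence assumption above allows; everything here (sizes, mask, flat spectrum) is invented:

```python
import numpy as np

rng = np.random.default_rng(8)
n, nsim = 512, 200
w = np.ones(n)
w[: n // 4] = 0.0                          # "cut sky": quarter of pixels masked

def pseudo_spectrum(mapvec):
    """Pseudo-spectrum of a masked 1D map (FFT in place of the SHT)."""
    a = np.fft.rfft(w * mapvec)
    return np.abs(a) ** 2 / n

# Monte Carlo: signal-only sims with a known flat unit spectrum give the
# (diagonal) transfer function T_k = <pseudo C_k> / C_k.
T = np.mean([pseudo_spectrum(rng.standard_normal(n))
             for _ in range(nsim)], axis=0)

# "Data": independent skies with the same true spectrum; average their
# pseudo-spectra and divide out the transfer function.
data = np.mean([pseudo_spectrum(rng.standard_normal(n))
                for _ in range(nsim)], axis=0)
C_est = data / T                           # recovers the flat input spectrum
```

In the real MASTER-style analysis the sims start from time-ordered data, so any filtering or other abuse of the data is folded into T automatically.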
41. Power Spectrum - Implementation (I)
- Even with supercomputers, full maximum likelihood analysis is infeasible for more than O(10^5) pixels.
- However, it is still useful for
  - low-resolution all-sky maps (eg. for low multipoles, where Monte Carlo methods are less reliable).
  - high-resolution cut-sky maps (eg. for consistency & systematics cross-checks).
  - hierarchical resolution/sky-fraction analyses.
42. Power Spectrum - Implementation (II)
- Build the dS/dC_b matrices.
- Make an initial guess C_b.
- Calculate D = N + Σ_b C_b dS/dC_b & invert.
- For each b, calculate W_b = D^-1 dS/dC_b.
- For each b, calculate dL/dC_b = (1/2) (d^T W_b z - Tr W_b), with z = D^-1 d.
- For each b, b', calculate F_bb' = (1/2) Tr(W_b W_b').
- Update C_b += Σ_b' (F^-1)_bb' dL/dC_b'.
- Iterate to convergence.
- Using as many processors as possible, as efficiently as possible.
43. Power Spectrum - Implementation (III)
- The limiting calculation is the matrix-matrix product W_b = D^-1 dS/dC_b, so it sets the constraints for the overall implementation.
- Maximum efficiency requires minimum communication:
  - keep data as dense as possible on the processors
  - 3 matrices fill the available memory
- Check the other steps - none requires more than 3 matrices.
- Out-of-core implementation, so I/O efficiency is critical.
44. Power Spectrum - Implementation (IV)
- The dense data requirement sets the processor count:
    N_PE ≥ 3 x 8 x N_p^2 / (bytes per processor)
- Eg. 3 x 8 x (10^5)^2 / 10^9 = 240, but we want to scale to very large systems.
- Each bin multiplication is independent - divide the processors into gangs & do many W_b simultaneously.
- The derivative terms are bin-parallel too:
  - dL/dC_b: trivial
  - d²L/dC_b dC_b': requires some load-balancing
45. Aside: Fisher Matrix (I)
- Use Tr[A B] = Σ_ij A_ij B_ji = Σ_ij (A^T)_ji B_ji
- For all my gang's b:
    Read W_b
    Transpose → W^T_b
    W^T_b · W_b → F_bb
    For all b' > b:
      Read W_b' over W_b
      W^T_b · W_b' → F_bb'
- 2 matrices in memory & N_b(N_b+1)/2 reads.
46. Aside: Fisher Matrix (II)
- For all my gang's b:
    Read W_b
    Transpose → W^T_b
    W^T_b · W_b → F_bb
    Read W_b+1 over W_b
    W^T_b · W_b+1 → F_b b+1
    Transpose → W^T_b+1
    For all b' > b+1:
      Read W_b' over W_b+1
      W^T_b · W_b' → F_bb'
      W^T_b+1 · W_b' → F_b+1 b'
- 3 matrices in memory & N_b(N_b+2)/4 reads.
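The trace identity above is what makes the gang scheme cheap: once W^T_b is in memory, each F_bb' entry is an elementwise product-and-sum rather than a matrix-matrix multiply. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Tr(AB) = sum_ij A_ij B_ji: an O(n^2) elementwise product with B^T,
# instead of the O(n^3) full product followed by a trace.
fast = np.sum(A * B.T)
slow = np.trace(A @ B)
print(fast, slow)
```

For each Fisher entry this turns an O(N_p^3) product into an O(N_p^2) reduction, leaving the W_b builds themselves as the only cubic cost.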
47. Power Spectrum - Implementation (V)
- All gangs need dS/dC_b & D^-1 first, but these terms do not bin-parallelize.
- Either calculate them using all processors & re-map, or calculate them on 1 gang & leave the others idle.
- Balance the re-mapping/idling cost against the multi-gang benefit - the optimum depends on the architecture & data parameters.
- For large data with many bins & an efficient communication network, sustain 50% of peak on thousands of processors.
48. Aside: Parameterizing Architectures
- Portability is crucial, but architectures differ.
- Identify the critical components & parameterize their limits; tune codes to each architecture.
- Eg. ScaLAPACK - block-cyclic matrix distribution is the most efficient data layout (mapped to the re-use pattern). The hardware limit is cache size - set the blocksize so that the requisite number of blocks fit in & fill the cache.
- Eg. Input/Output - how many processors can efficiently read/write simultaneously (contention vs idling)?
49. Conclusions
- Maximum likelihood spectral estimation cannot handle high-resolution all-sky observations, but Monte Carlo methods can (albeit with questions at low l).
- For sufficiently cut-sky or low-resolution maps, ML methods are feasible and allow complete error-tracking.
- Dense matrix-matrix kernels achieve 50% of peak performance (cf. 5% for FFT & SHT).
- Inefficient implementation of data-delivery can reduce these by an order of magnitude or more - re-use each data unit as much as possible before moving to the next.
- Things are only going to get harder!