Alex Szalay - PowerPoint PPT Presentation


1

The Sloan Digital Sky Survey
  • Alex Szalay
  • Department of Physics and Astronomy
  • The Johns Hopkins University

2
The Sloan Digital Sky Survey
A project run by the Astrophysical Research Consortium (ARC):
The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study, Max Planck Institute, Heidelberg
Funding: Sloan Foundation, NSF, DOE, NASA
Goal: to create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M
Data size: 40 TB raw, 2 TB processed
3
Scientific Motivation
  • Create the ultimate map of the Universe → the Cosmic Genome Project!
  • Study the distribution of galaxies
    • What is the origin of fluctuations?
    • What is the topology of the distribution?
  • Measure the global properties of the Universe
    • How much dark matter is there?
  • Local census of the galaxy population
    • How did galaxies form?
  • Find the most distant objects in the Universe
    • What are the highest quasar redshifts?
4
Cosmology Primer
The Universe is expanding: the galaxies move away from us, so their spectral lines are redshifted
$v = H_0 r$ (Hubble's law)
The fate of the Universe depends on the balance between gravity and the expansion velocity
$\Omega = \rho/\rho_{\rm crit}$: if $\Omega < 1$, the Universe will expand forever
Most of the mass in the Universe is dark matter, and it may be cold (CDM)
$\Omega_{\rm dark} > \Omega_{\rm luminous}$
The spatial distribution of galaxies is correlated, due to small ripples in the early Universe
$P(k)$: the power spectrum
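Hubble's law is simple enough to check numerically. The Python sketch below assumes $H_0 = 65$ km/s/Mpc, an illustrative value picked from the 55-75 km/s/Mpc range quoted on the Naught Problem slide, and uses the first-order approximation $z \approx v/c$, which only holds for $v \ll c$. The function names are hypothetical.

```python
# Speed of light in km/s.
C_KM_S = 299_792.458


def recession_velocity(distance_mpc: float, h0: float = 65.0) -> float:
    """Hubble's law v = H0 * r; h0 in km/s/Mpc (65 is an assumed illustrative value)."""
    return h0 * distance_mpc


def hubble_distance(velocity_km_s: float, h0: float = 65.0) -> float:
    """Invert Hubble's law: r = v / H0, in Mpc."""
    return velocity_km_s / h0


def low_z_redshift(velocity_km_s: float) -> float:
    """First-order Doppler approximation z ~ v / c, valid only when v << c."""
    return velocity_km_s / C_KM_S


v = recession_velocity(100.0)            # a galaxy 100 Mpc away
print(v, round(low_z_redshift(v), 4))    # prints 6500.0 0.0217
```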
5
The Naught Problem
What are the global parameters of the Universe?
  • $H_0$, the Hubble constant: 55-75 km/s/Mpc
  • $\Omega_0$, the density parameter: 0.25-1
  • $\Lambda_0$, the cosmological constant: 0-0.7
Their values are still quite uncertain today...
Goal: measure these parameters with an accuracy of a few percent
High Precision Cosmology!
6
The Cosmic Genome Project
The SDSS will create the ultimate map of the Universe, with much more detail than any other measurement before
7
Area and Size of Redshift Surveys
8
Clustering of Galaxies
We will measure the spectrum of the density fluctuations to high precision, even on very large scales
The error in the amplitude of the fluctuation spectrum:
  • 1970: ×100
  • 1990: ×2
  • 1995: 0.4
  • 1998: 0.2
  • 1999: 0.1
  • 2002: 0.05
9
Relevant Scales
Distances are measured in Mpc (megaparsecs)
  • $1\,{\rm Mpc} \approx 3 \times 10^{24}$ cm
  • 5 Mpc: typical distance between galaxies
  • 3000 Mpc: the scale of the Universe
If $\lambda > 200$ Mpc, fluctuations have a PRIMORDIAL shape
If $\lambda < 100$ Mpc, gravity creates sharp features, like walls, filaments and voids
Biasing: the conversion of mass into light is nonlinear, so light is much more clumpy than the mass
10
The Topology of the Local Universe
Measure the topology of the Universe:
does it consist of walls and voids,
or is it randomly distributed?
11
Finding the Most Distant Objects
Intermediate and high redshift QSOs
Multicolor selection function.
Luminosity functions and spatial clustering.
High redshift QSOs ($z > 5$).
12
Features of the SDSS
Special 2.5 m telescope, located at Apache Point, NM
  • 3 degree field of view
  • Zero-distortion focal plane
Two surveys in one
  • Photometric survey in 5 bands
  • Spectroscopic redshift survey
Huge CCD mosaic
  • 30 CCDs, 2K x 2K (imaging)
  • 22 CCDs, 2K x 400 (astrometry)
Two high-resolution spectrographs
  • 2 x 320 fibers, with 3 arcsec diameter
  • R = 2000 resolution with 4096 pixels
  • Spectral coverage from 3900 Å to 9200 Å
Automated data reduction
  • Over 100 man-years of development effort (Fermilab and collaboration scientists)
Very high data volume
  • Expect over 40 TB of raw data
  • About 2 TB of processed products
  • Data made available to the public
13
Apache Point Observatory
Located in New Mexico, near White Sands National
Monument
14
The Telescope
Special 2.5 m telescope
  • 3 degree field of view
  • Zero-distortion focal plane
  • Wind screen moved separately
15
The Photometric Survey
Northern Galactic Cap
  • 5 broad-band filters (u', g', r', i', z')
  • limiting magnitudes (22.3, 23.3, 23.1, 22.3, 20.8)
  • drift scan of 10,000 square degrees
  • 55 sec exposure time
  • 40 TB raw imaging data → pipeline → 100,000,000 galaxies, 50,000,000 stars
  • calibration to 2% at r' = 19.8
  • only done in the best seeing (20 nights/yr)
  • pixel size is 0.4 arcsec, astrometric precision is 60 milliarcsec
Southern Galactic Cap
  • multiple scans (> 30 times) of the same stripe
Continuous data rate of 8 Mbytes/sec
16
Survey Strategy
Overlapping 2.5 degree wide stripes
Avoiding the Galactic Plane (dust)
Multiple exposures on the three Southern stripes
17
The Spectroscopic Survey
Measure redshifts of objects → distance
SDSS Redshift Survey
  • 1 million galaxies
  • 100,000 quasars
  • 100,000 stars
Two high-throughput spectrographs
  • spectral range 3900-9200 Å
  • 640 spectra simultaneously
  • R = 2000 resolution
Automated reduction of spectra
Very high sampling density and completeness
Objects in other catalogs also targeted
18
Optimal Tiling
Fields have a 3 degree diameter
Centers determined by an optimization procedure
A total of 2200 pointings
640 fibers assigned simultaneously
19
The Mosaic Camera
20
Photometric Calibrations
The SDSS will create a new photometric system: u' g' r' i' z'
Primary standards observed with the USNO 40-inch telescope in Flagstaff
Secondary standards observed with the SDSS 20-inch telescope at Apache Point, calibrating the SDSS imaging data
21
The Spectrographs
Two double spectrographs
  • very high throughput
  • two 2048x2048 CCD detectors
  • mounted on the telescope
  • light fed through slithead
22
The Fiber Feed System
Galaxy images are captured by optical fibers lined up on the spectrograph slit
Manually plugged during the day into aluminum plugboards
640 fibers in each bundle
The largest fiber system today
23
First Light Images
Telescope first light: May 9th, 1998
Equatorial scans
24
The First Stripes
Camera: 5-color imaging of > 100 square degrees
Multiple scans across the same fields
Photometric limits as expected
25
NGC 2068
26
UGC 3214
27
NGC 6070
28
The First Quasars
The four highest-redshift quasars have been found in the first SDSS test data!
29
Methane/T Dwarf
  • Discovery of several new objects by SDSS and 2MASS

30
Detection of Gravitational Lensing
28,000 foreground galaxies and 2,045,000 background galaxies in test data (McKay et al. 1999)
31
SDSS Data Flow
32
Distributed Collaboration
Sites: Fermilab, U. Chicago, U. Washington, I. Advanced Study, Japan, Princeton U., JHU, Apache Point Observatory, USNO, NMSU
Networks: ESNET, VBNS
33
Data Processing Pipelines
34
Concept of the SDSS Archive
Science Archive (products accessible to users)
Operational Archive (raw + processed data)
35
SDSS Data Products
  • Object catalog: 400 GB, parameters of $>10^8$ objects
  • Redshift catalog: 1 GB, parameters of $10^6$ objects
  • Atlas images: 1.5 TB, 5-color cutouts of $>10^8$ objects
  • Spectra: 60 GB, in a one-dimensional form
  • Derived catalogs: 20 GB (clusters, QSO absorption lines)
  • 4x4 pixel all-sky map: 60 GB, heavily compressed
All raw data saved in a tape vault at Fermilab
36
Who will be using the archive?
Power users
  • sophisticated, with lots of resources
  • research is centered around the archive data
  • moderate number of very intensive queries
  • mostly statistical, large output sizes
General astronomy public
  • frequent, but casual lookup of objects/regions
  • the archives help their research, but are not central to it
  • large number of small queries
  • a lot of cross-identification requests
Wide public
  • browsing a "Virtual Telescope"
  • can have large public appeal
  • needs special packaging
  • could be a very large number of requests
37
How will the data be analyzed?
The data are inherently multidimensional → positions, colors, size, redshift
Improved classifications result in complex N-dimensional volumes → complex constraints, not ranges
Spatial relations will be investigated → nearest neighbors, other objects within a radius
Data mining: finding the needle in the haystack → separate typical from rare, recognize patterns in the data
Output size can be prohibitively large for intermediate files → import output directly into analysis tools
38
Geometric Approach
  • The main problem
    • fast, indexed, complex searches of Terabytes in k-dimensional space
    • searches are not necessarily parallel to the axes → traditional indexing (B-tree) does not work
  • Geometric approach
    • use the geometric nature of the k-dimensional data
    • quantize data into containers of "friends": objects of similar colors, close on the sky, stored together → efficient cache performance
    • containers represent a coarse-grained density map of the data → multidimensional index tree (k-d tree, r-tree)

39
Organization of Searches
Queries are inherently geometric
  • the primitive constraint is a half-space formed by a linear combination → k-dimensional hyperplane
  • Boolean combinations are allowed → the constraints form k-dimensional polyhedra
Queries are run on the coarse-grained map
  • determine intersections of the index tree and the query polyhedron
  • a list of containers is prepared for the query
  • projections of the full query time and output volume are created
The list of containers and the query are sent to the Search Engine
  • actual searches are quantized by containers
  • searches can be optimized and executed in parallel
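The container step can be sketched with axis-aligned bounding boxes. This is a minimal Python/NumPy sketch, assuming each half-space is written as $a \cdot x \ge b$ and each container is summarized by its bounding box `lo`, `hi`; the function names and the inside/outside/partial labels are illustrative, not the SDSS Search Engine's interface. A box inside every half-space can be returned wholesale, a box outside any one can be skipped, and only partial boxes need object-by-object tests.

```python
import numpy as np

INSIDE, OUTSIDE, PARTIAL = "inside", "outside", "partial"


def point_in_polyhedron(halfspaces, x):
    """Boolean AND of linear constraints a . x >= b."""
    x = np.asarray(x, dtype=float)
    return all(np.dot(a, x) >= b for a, b in halfspaces)


def box_vs_halfspace(a, b, lo, hi):
    """Classify the box [lo, hi] against the half-space a . x >= b."""
    a = np.asarray(a, dtype=float)
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # The extremes of a linear function over a box sit at corners chosen
    # coordinate by coordinate from the sign of a.
    smallest = np.dot(np.where(a >= 0, lo, hi), a)
    largest = np.dot(np.where(a >= 0, hi, lo), a)
    if smallest >= b:
        return INSIDE
    if largest < b:
        return OUTSIDE
    return PARTIAL


def box_vs_polyhedron(halfspaces, lo, hi):
    """Conservative container test against an intersection of half-spaces."""
    states = [box_vs_halfspace(a, b, lo, hi) for a, b in halfspaces]
    if OUTSIDE in states:
        return OUTSIDE
    if all(s == INSIDE for s in states):
        return INSIDE
    return PARTIAL


# Example query: the triangle x >= 0, y >= 0, x + y <= 1 (written as -x - y >= -1).
query = [((1, 0), 0.0), ((0, 1), 0.0), ((-1, -1), -1.0)]
print(box_vs_polyhedron(query, (0.1, 0.1), (0.2, 0.2)))   # inside
print(box_vs_polyhedron(query, (0.4, 0.4), (0.8, 0.8)))   # partial
print(box_vs_polyhedron(query, (2.0, 2.0), (3.0, 3.0)))   # outside
```

The test is conservative: a box that straddles every individual plane but misses their intersection is still labelled partial, which costs extra work but never drops an object.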
40
Geometric Indexing
Divide and Conquer
Partitioning
3 ? N ? M
Hierarchical Triangular Mesh
Split as k-d tree, stored as r-tree of bounding boxes
Using regular indexing techniques
41
Sky coordinates
Stored as Cartesian coordinates projected onto a unit sphere
Longitude and latitude lines: intersections of planes and the sphere
Boolean combinations: query polyhedron
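A minimal Python sketch of this representation, assuming right ascension and declination in degrees; the function names are illustrative. A circle of angular radius $r$ around a unit vector $\mathbf{c}$ is the half-space $\mathbf{p}\cdot\mathbf{c} \ge \cos r$, a line of constant declination $\delta$ is the plane $z = \sin\delta$, and a meridian is a plane through the origin, so every constraint is linear in the Cartesian coordinates and they combine with Boolean AND/OR.

```python
import math


def radec_to_xyz(ra_deg, dec_deg):
    """Unit vector on the sphere for right ascension / declination in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_circle(p, center, radius_deg):
    """Small circle on the sky as the half-space p . c >= cos(r)."""
    return dot(p, center) >= math.cos(math.radians(radius_deg))


def north_of(p, dec_deg):
    """Latitude line as the plane z = sin(dec); keep the side with larger z."""
    return p[2] >= math.sin(math.radians(dec_deg))


def east_of(p, ra_deg):
    """Meridian plane through the origin with normal (-sin ra, cos ra, 0):
    keeps the half-sphere from ra_deg to ra_deg + 180 degrees."""
    ra = math.radians(ra_deg)
    return -math.sin(ra) * p[0] + math.cos(ra) * p[1] >= 0


center = radec_to_xyz(10.0, 20.0)


def my_query(p):
    # Boolean AND of two linear constraints: inside a 1-degree circle and north of dec = 20.
    return in_circle(p, center, 1.0) and north_of(p, 20.0)


print(my_query(radec_to_xyz(10.2, 20.3)), my_query(radec_to_xyz(10.2, 19.7)))  # True False
```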
42
Sky Partitioning
Hierarchical Triangular Mesh, based on an octahedron
43
Hierarchical Subdivision
Hierarchical subdivision of spherical triangles, represented as a quadtree
In SDSS the tree is 5 levels deep: $8 \times 4^5 = 8192$ triangles
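The subdivision can be sketched in a few lines of Python/NumPy. This is an illustrative reimplementation, not the SDSS HTM library: it starts from the eight faces of an octahedron (labelled N0-N3 and S0-S3 here, modelled on the published HTM naming), splits each spherical triangle into four by joining the normalized edge midpoints, and descends toward a point by testing which child contains it. At depth $d$ there are $8 \times 4^d$ triangles, so 5 levels give 8192.

```python
import numpy as np


def normalize(v):
    return v / np.linalg.norm(v)


# Octahedron vertices on the unit sphere.
V = [np.array(p, dtype=float) for p in
     [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]]

# Eight base triangles, vertices ordered counter-clockwise seen from outside.
BASE = {
    "S0": (V[1], V[5], V[2]), "S1": (V[2], V[5], V[3]),
    "S2": (V[3], V[5], V[4]), "S3": (V[4], V[5], V[1]),
    "N0": (V[1], V[0], V[4]), "N1": (V[4], V[0], V[3]),
    "N2": (V[3], V[0], V[2]), "N3": (V[2], V[0], V[1]),
}


def contains(tri, p, eps=1e-12):
    """p is inside a spherical triangle if it is on the inner side of all three edge planes."""
    a, b, c = tri
    return (np.dot(np.cross(a, b), p) >= -eps and
            np.dot(np.cross(b, c), p) >= -eps and
            np.dot(np.cross(c, a), p) >= -eps)


def children(tri):
    """Split a triangle into four using normalized edge midpoints (orientation preserved)."""
    v0, v1, v2 = tri
    w0, w1, w2 = normalize(v1 + v2), normalize(v0 + v2), normalize(v0 + v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]


def trixel_name(p, depth):
    """Name of the depth-`depth` triangle containing direction p, e.g. 'N3' plus digits."""
    p = normalize(np.asarray(p, dtype=float))
    for name, tri in BASE.items():
        if contains(tri, p):
            break
    else:
        raise ValueError("point not found in any base triangle")
    for _ in range(depth):
        for i, child in enumerate(children(tri)):
            if contains(child, p):
                name, tri = name + str(i), child
                break
        else:
            raise ValueError("point not found in any child triangle")
    return name


def n_trixels(depth):
    return len(BASE) * 4 ** depth


print(n_trixels(5), trixel_name((0.3, -0.5, 0.8), 5))
```

Because the name of a triangle is a prefix of the names of all its descendants, nearby objects share long prefixes, which is what makes the mesh usable as a regular index key.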
44
Result of the Query
45
Magnitudes and Multicolor Searches
  • Galaxy fluxes
    • large dynamic range
    • magnitude errors divergent as $x \to 0$!

For multicolor magnitudes the error contours can be very anisotropic and skewed: extremely poor localization!
But this is an artifact of the logarithm at zero flux; in flux space the object is well localized
46
Novel Magnitude Scale
$b$: softness; $c$: set to match normal magnitudes
  • Advantages
    • monotonic
    • degrades gracefully
    • objects have small error ellipse
    • unified handling of detections and upper limits!
  • Disadvantages
    • unusual
(Lupton, Gunn and Szalay, AJ 99)
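The slide's own formula did not survive extraction. The Python sketch below assumes the form published in the cited paper, $m = -\frac{2.5}{\ln 10}\left[\operatorname{asinh}\!\left(\frac{f/f_0}{2b}\right) + \ln b\right]$, with zero-point flux $f_0$ and softening $b$; for $f \gg b f_0$ it reduces to the usual $-2.5\log_{10}(f/f_0)$. The value $b = 10^{-10}$ is illustrative only.

```python
import math

# Pogson's ratio in natural-log units: 2.5 / ln 10, about 1.0857.
A = 2.5 / math.log(10)


def pogson_mag(flux, f0=1.0):
    """Classical magnitude; undefined for flux <= 0."""
    return -2.5 * math.log10(flux / f0)


def asinh_mag(flux, b, f0=1.0):
    """Asinh magnitude with softening b (assumed form); finite and monotonic for any real flux."""
    x = flux / f0
    return -A * (math.asinh(x / (2.0 * b)) + math.log(b))


b = 1e-10
for flux in (1e-3, 1e-9, 0.0, -1e-10):
    # Bright fluxes match pogson_mag; zero and negative fluxes stay finite.
    print(flux, round(asinh_mag(flux, b), 3))
```

This is why detections and upper limits can be handled together: a zero or slightly negative measured flux still maps to a finite magnitude near $-2.5\log_{10} b$.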

47
Flux Indexing
Split along alternating flux directions
Create balanced partitions
Store bounding boxes at each step
Build a 10-12 level tree in each triangle
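The splitting rule reads naturally as a k-d tree with median splits. A minimal Python/NumPy sketch, assuming the objects of one sky triangle are given as an array of fluxes with one column per band; the `Node` layout is hypothetical, and a production index would keep object lists only at the leaves. Cycling the split axis with depth gives the alternating flux directions, splitting at the median keeps the partitions balanced, and every node records the bounding box of its own objects.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np


@dataclass
class Node:
    lo: np.ndarray                  # bounding-box minimum in flux space
    hi: np.ndarray                  # bounding-box maximum in flux space
    idx: np.ndarray                 # indices of the objects under this node
    axis: Optional[int] = None      # flux band used for the split
    split: Optional[float] = None   # median flux value at the split
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(fluxes, idx=None, depth=0, max_depth=10):
    if idx is None:
        idx = np.arange(len(fluxes))
    sub = fluxes[idx]
    node = Node(lo=sub.min(axis=0), hi=sub.max(axis=0), idx=idx)
    if depth == max_depth or len(idx) < 2:
        return node
    axis = depth % fluxes.shape[1]                       # alternate flux directions
    order = idx[np.argsort(sub[:, axis], kind="stable")]
    half = len(order) // 2                               # median split -> balanced halves
    node.axis, node.split = axis, float(fluxes[order[half], axis])
    node.left = build(fluxes, order[:half], depth + 1, max_depth)
    node.right = build(fluxes, order[half:], depth + 1, max_depth)
    return node


def leaves(node) -> List[Node]:
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


rng = np.random.default_rng(0)
fluxes = rng.lognormal(size=(4096, 5))     # synthetic 5-band fluxes
root = build(fluxes, max_depth=10)
print(len(leaves(root)))                   # 1024 leaves of 4 objects each
```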
48
How to build compact cells?
The SDSS will measure fluxes in 5 bands → asinh magnitudes
Axis-parallel splits in median flux, in 8 separate zones in Galactic latitude → 5-dimensional bounding boxes
The fluxes are strongly correlated
  • roughly 2-dimensional distribution of typical objects
  • widely scattered rare objects
  • large density contrasts
Therefore, first create a local density and split on its value (Csabai et al. 96): typical (98%), rare (2%)
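A minimal Python/NumPy sketch of a density-based split. It assumes a brute-force $k$-nearest-neighbour density estimate, $\rho \propto 1/r_k^{d}$, as a stand-in for the estimator of Csabai et al.; $k = 5$ and the 2% fraction are parameters, and the full distance matrix only suits small samples. The objects with the lowest local density form the rare set and the rest are typical.

```python
import numpy as np


def knn_density(X, k=5):
    """Unnormalised k-nearest-neighbour density, proportional to 1 / r_k**d."""
    d = X.shape[1]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero distance to the point itself.
    r_k = np.sort(dist, axis=1)[:, k]
    return 1.0 / r_k ** d


def split_typical_rare(X, rare_fraction=0.02, k=5):
    """Boolean mask: True for the rare_fraction of objects with the lowest density."""
    dens = knn_density(X, k)
    n_rare = int(round(rare_fraction * len(X)))
    rare = np.zeros(len(X), dtype=bool)
    rare[np.argsort(dens)[:n_rare]] = True
    return rare


# Synthetic example: a tight cluster of typical objects plus a few scattered outliers.
rng = np.random.default_rng(0)
typical = rng.normal(0.0, 0.1, size=(490, 5))
outliers = rng.uniform(5.0, 15.0, size=(10, 5))
X = np.vstack([typical, outliers])
rare = split_typical_rare(X)
print(rare.sum(), rare[490:].all())
```

Splitting on density first keeps the dense core in compact cells, so the few rare objects no longer inflate the bounding boxes of the typical ones.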
49
Coarse Grained Design

Archive
50
Distributed Implementation
Diagram: User Interface → Analysis Engine → Master SX Engine (Objectivity Federation) → four Slave nodes, each running Objectivity on RAID storage
51
JHU Contributions
  • Fiber spectrographs
    • P. Feldman, A. Uomoto, S. Friedman, S. Smee
  • Science Archive
    • A. Szalay, A. Thakar, P. Kunszt, I. Csabai, Gy. Szokoly, A. Connolly, A. Chaudhaury
    • A lot of help from Jim Gray, Microsoft
  • Management
    • T. Heckman, T. Poehler, A. Davidsen, A. Uomoto, A. Szalay
52
Processing Platforms
  • At Fermilab
    • 2 AlphaServer 8200: data processing
    • 1 SGI Origin 2000: databases
  • Archive at JHU
    • 1 AlphaServer 1000A (development)
    • 10 Intel-based servers with LVD RAID
  • Software verified on
    • Digital Unix, IRIX, Solaris, Linux

53
Exploring new methods
New spectral classification techniques: galaxy spectra can be expressed as a superposition of a few (< 5) principal components → objective classification of 1 million spectra!
Photometric redshifts: galaxy colors systematically change with redshift, so the SDSS photometry works like a 5-pixel spectrograph → $\Delta z \approx 0.05$, but with 100 million objects!
Measuring cosmological parameters: before, data analysis was limited by small-number statistics; after, the dominant errors are systematic (extinction) → new analysis methods are required!
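Principal-component classification can be sketched with a singular value decomposition. This Python/NumPy example uses synthetic spectra built from three made-up Gaussian templates, not SDSS data. It shows that when spectra are mixtures of a few components, the leading principal components capture nearly all of the variance, and the per-object coefficients on them give a compact, objective classification.

```python
import numpy as np


def principal_components(spectra, n_components):
    """Mean spectrum, leading eigenspectra, per-object coefficients, variance fractions."""
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Rows of vt are orthonormal eigenspectra, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    components = vt[:n_components]
    coeffs = centered @ components.T
    return mean, components, coeffs, explained


rng = np.random.default_rng(1)
wave = np.linspace(0.0, 1.0, 500)
templates = np.vstack([np.exp(-((wave - c) / 0.05) ** 2) for c in (0.3, 0.5, 0.7)])
weights = rng.uniform(0.0, 1.0, size=(200, 3))
spectra = weights @ templates + rng.normal(0.0, 1e-3, size=(200, 500))

mean, components, coeffs, explained = principal_components(spectra, 3)
print(explained[:4].round(4))
```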
54
Photometric redshifts
Multicolor photometry maps physical parameters: luminosity $L$, redshift $z$, spectral type $T$
Inversion: observed fluxes $(u, g, r, i, z) \to (z, L, T)$
Redshifts are statistical, with large errors: $\Delta z \approx 0.05$
The data set is huge: more than 100 million galaxies
Easy to subdivide into coarse $z$ bins, and by type → study evolution → enormous volume, $1\,{\rm Gpc}^3$
55
Measuring P(k)
Karhunen-Loeve transform
  • signal-to-noise eigenmodes of the redshift survey
  • optimal extraction of the clustering signal
  • maximal rejection of systematic errors (Vogeley and Szalay 96; Matsubara, Szalay and Landy 99)
Pilot project using the Las Campanas Redshift Survey with 22,000 galaxies
We simultaneously measure the values of the redshift-distortion parameter ($\beta \approx \Omega^{0.6}/b$), the normalization ($\sigma_8$) and the CDM shape parameter ($\Gamma = \Omega h$).
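The signal-to-noise eigenmodes solve the generalized eigenproblem $S\,\mathbf{b} = \lambda N\,\mathbf{b}$, where $S$ and $N$ are the signal and noise covariance matrices of the survey's data vector. A minimal Python/NumPy sketch, assuming both matrices are already computed and $N$ is positive definite: whiten with the Cholesky factor of $N$, diagonalize, and sort modes by decreasing $\lambda$, the signal-to-noise ratio of each mode. The random matrices below are placeholders, not a survey model.

```python
import numpy as np


def kl_modes(signal_cov, noise_cov):
    """Solve S b = lam N b; return eigenvalues (descending) and N-orthonormal modes as columns."""
    L = np.linalg.cholesky(noise_cov)          # N = L L^T
    L_inv = np.linalg.inv(L)
    whitened = L_inv @ signal_cov @ L_inv.T    # noise becomes the identity
    lam, u = np.linalg.eigh(whitened)          # ascending order
    order = np.argsort(lam)[::-1]
    lam, u = lam[order], u[:, order]
    modes = L_inv.T @ u                        # back to the data basis
    return lam, modes


rng = np.random.default_rng(2)
m = rng.normal(size=(6, 6))
S = m @ m.T                                    # placeholder signal covariance
N = np.diag(rng.uniform(0.5, 2.0, size=6))     # placeholder noise covariance
lam, B = kl_modes(S, N)
coeffs = B.T @ rng.normal(size=6)              # KL coefficients of a data vector
```

In this basis the noise covariance of the coefficients is the identity and the signal covariance is diagonal, so truncating to the leading modes keeps the most signal per unit noise.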
56
Trends
  • Future dominated by detector improvements
    • Moore's Law growth in CCD capabilities
    • Gigapixel arrays on the horizon
  • Improvements in computing and storage will track growth in data volume
  • Investment in software is critical, and growing

Total area of 3m telescopes in the world in m², total number of CCD pixels in megapixels, as a function of time. Growth over 25 years is a factor of 30 in glass, 3000 in pixels.
57
The Age of Mega-Surveys
The next generation of astronomical archives with Terabyte catalogs will dramatically change astronomy
  • top-down design
  • large sky coverage
  • built on sound statistical plans
  • uniform, homogeneous, well calibrated
  • well controlled and documented systematics
The technology to acquire, store and index the data is here: we are riding Moore's Law
Data mining in such vast archives will be a challenge, but the possibilities are quite unimaginable
Integrating these archives into a single entity is a project for the whole community → National Virtual Observatory
58
New Astronomy: Different!
  • Systematic data exploration
    • will have a central role in the New Astronomy
  • Digital archives of the sky
    • will be the main access to data
  • Data avalanche
    • the flood of Terabytes of data is already happening, whether we like it or not!
  • Transition to the new
    • may be organized or chaotic

59
NVO: The Challenges
  • Size of the archived data
    • 40,000 square degrees is 2 trillion pixels
    • One band: 4 Terabytes
    • Multi-wavelength: 10-100 Terabytes
    • Time dimension: a few Petabytes
  • The development of
    • new archival methods
    • new analysis tools
    • new standards (metadata, interchange formats)
  • Hardware/networking requirements
  • Training the next generation!
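The storage figures follow from simple arithmetic. The slide does not state the pixel scale or pixel depth; the Python sketch below assumes 0.5 arcsec pixels and 2 bytes per pixel, values that reproduce its numbers. At the 0.4 arcsec SDSS pixel size the count would be about 3.2 trillion instead.

```python
ARCSEC_PER_DEG = 3600


def sky_pixels(area_sq_deg, pixel_arcsec):
    """Number of pixels needed to tile an area (square degrees) at a given pixel scale."""
    return area_sq_deg * (ARCSEC_PER_DEG / pixel_arcsec) ** 2


def storage_bytes(n_pixels, bytes_per_pixel=2):
    """Raw storage for one band, assuming 16-bit (2-byte) pixels."""
    return n_pixels * bytes_per_pixel


n = sky_pixels(40_000, 0.5)                 # assumed 0.5 arcsec pixels
print(f"{n:.3g} pixels, {storage_bytes(n) / 1e12:.2f} TB per band")  # 2.07e+12 pixels, 4.15 TB per band
```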

60
Summary
The SDSS project combines astronomy, physics, and
computer science
It promises to fundamentally change our view of
the universe
It will determine how the largest structures in
the universe were formed
It will serve as the standard astronomy
reference for several decades
Its virtual universe can be explored by both
scientists and the public
Through its archive it will create a new paradigm
in astronomy
61
www.sdss.org www.sdss.jhu.edu