Introduction to Sky Survey Problems - PowerPoint PPT Presentation

1 / 20
About This Presentation
Title:

Introduction to Sky Survey Problems

Description:

Title: Sky Survey Database Design eSI Apr 03 - Presentation Author: Bob Mann Last modified by: andrewsm Created Date: 1/1/1601 12:00:00 AM Document presentation format – PowerPoint PPT presentation

Number of Views:179
Avg rating:3.0/5.0
Slides: 21
Provided by: BobM228
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Sky Survey Problems


1
Introduction to Sky Survey Problems
  • Bob Mann

2
Introduction to sky survey database problems
  • Astronomical data
  • Astronomical databases
  • The Virtual Observatory concept status
  • Large sky survey databases
  • Spatial indexing in astronomical databases
  • Case Study SDSS SkyServer

3
Observational Astronomy
  • Electromagnetic spectrum

4
Astronomical data in original form
  • Optical
  • Image array of pixel values
  • X-ray
  • Event list positions, arrival times, energies of
    all detected photons
  • Radio
  • Interferometric visibilities sparse Fourier
    transform of a region of the sky
  • Very different types of data

5
Astronomical data in final form
  • Most research done using catalogue data
  • i.e. tables of attributes of detected sources
    mainly discrete sources (stars, galaxies, etc)
  • Data compression
  • Catalogue - few of image data volume
  • Amenable to representation in relational DB
  • Natural indexing by location in sky

6
Astronomical Databases
  • Sky survey archives
  • Homogeneous data, standard reduction pipeline
  • Science Archive do science on DB
  • Telescope archives
  • Semi-indexed collections of raw data files from
    all observations taken heterogeneous
  • Download data for reduction and analysis
  • Specialist data centres collections of
    catalogues
  • Bibliographic databases scans of major journals

7
The Virtual Observatory
  • Concept
  • Interoperable federation of all the worlds
    significant astronomical databases
  • Facilitate multi-wavelength astronomy
  • Status
  • Several projects underway AstroGrid in UK
  • 5 years work to create a fully working VO
  • The VO sets the context for the design of new sky
    survey databases

8
AstroGrid www.astrogrid.org
  • Consortium
  • Edinburgh, Leicester, Cambridge, RAL, MSSL,
    Jodrell Bank, Queens Belfast
  • 3 year (4M) project
  • 1 yr Phase A Study finished end of 2002
  • 2 yr Phase B Implementation to end 2004
  • Web (later Grid) service framework in Java
  • Currently building web services, portals, etc -
    researching OGSA and OGSA-DAI

9
Large sky survey databases
  • Major science driver for AstroGrid and VO
  • New science mining multi-wavelength data
  • Largest are optical/near-infrared sky surveys
  • Largest of these hosted in Edinburgh
  • current - SuperCOSMOS, SDSS (mirror)
  • future - WFCAM, VISTA
  • Each yield 1-10TB of catalogue data in RDBMS

10
Spatial queries in astronomy
  • Two important types
  • Select entries (with predicate) in area of sky
  • Match entries (esp. between two tables)
  • Second is special case of first
  • i.e. both boil down to point-within-distance-of-p
    oint
  • but distances in two cases can be very different
  • Advantage in using a hierarchical spatial
    indexing scheme
  • Perform spatial query at appropriate granularity

11
Spatial Indexingin Astronomy
  • The Celestial Sphere
  • Many coordinate systems
  • Most common is the
  • equatorial system, with
  • Right Ascension and
  • Declination as analogues
  • of Longitude Latitude

12
Spatial indexing in astronomical databases
  • Basic DBMS indexes are 1-D e.g. B-trees
  • Some DBMSs support general 2-D indexing
  • Usually using R-trees (or variants) rectangles
    astronomical experiments not too successful
    Clive
  • Some DBMSs have native spatial indexing
  • Little knowledge of this in astronomy - want to
    know more
  • But
  • The Celestial Sphere is a sphere(!)
  • Many geographical spatial DBs use planar
    projections
  • So, astronomers have felt the need to develop
    spatial indexing prescriptions of their own

13
Hierarchical Triangular Mesh - HTM
  • Developed by Sloan survey archive team at JHU
  • Start with projection of octahedron on sphere and
    subdivide triangles at their midpoints
  • Generate unique pixel ID code based on position
    in the sky and level in hierarchy
    can index that with B-tree

14
Hierarchical Equal Area Iso-Latitude Pixelisation
(HEALPix)
  • Developed by Kris Gorski (now JPL/Caltech)
  • Start with division of sphere into twelve equal
    area curvilinear quadrilaterals,
  • then divide each into four
  • Like HTM, produces a
  • pixel code on which a
  • B-tree index can be made
  • (Ian HEALPix in Oracle?)

15
Sky survey DB case studySkyServer for SDSS
  • Sloan Digital Sky Survey (SDSS)
  • first of new generation of sky surveys
  • US-led team, dedicated telescope camera
  • Image half of northern sky in 5 optical bands
  • Then obtain optical
  • spectra for 1,000,000
  • galaxies
  • Estimated 1TB of
  • catalogue data

16
SDSS Archive
  • First of new generation of sky survey archives
  • Represents the state-of-the-art in sky survey
    databases
  • Developed by Alex Szalays team at Johns Hopkins
  • Project started in earnest in about 1996
  • OODBMSs seen as the coming thing
  • SDSS chose Objectivity/DB for their archive
    15 staff-years of effort later, theyd rewritten
    much of the DBMS themselvesand then jumped ship
    and started using MS SQL Server! - SkyServer
    (in collaboration with Jim Gray, MS Research)

17
SkyServer design considerations
  • Power flexibility to pose arbitrary queries
  • Simple astronomers ignorant of SQL!
  • Hide messy spherical trigonometry
  • Distance on sphere between (a1,d1) and (a2,d2) is
    given in SQL by
  • 2.0asin(sqrt(square(sin(0.5(radians(d1-d2)))
    ) cos(radians(d1))cos(radians(d2))
    square(sin(0.5(radians(a1-a2)))))
  • Dont want users typing this
  • Dont really want DBMS to evaluate expressions
    like this often

18
SkyServer spatial queries
  • Simple table-valued functions exposed to user
  • E.g. select count()
  • from fGetNearbyObjEq(a,d,radius)
  • (a,d)(Right Ascension, Declination)
  • Functions call SQL Server Extended Stored
    Procedure
  • HTM index manipulation routines, implemented in a
    Dynamically Linked Library (DLL)
  • DLL generated from HTM package in C

19
Lessons from HTM implementation in SkyServer
  • SQL is not great for spherical trigonometry
  • Messy to write, slow to compute
  • Have to define stored procedures/functions
  • Expose a clean interface to users
  • Let them pose queries the way they want to
  • Replace trig operations by integer arithmetic
  • Library of HTM index operations underneath
  • Precompute tables of neighbouring objects
  • Far fewer spatial match operations at query time

20
Problems with this approach
  • How easy to develop stored procedures, etc?
  • Needs detailed knowledge of DBMS
  • Extended Stored Procedure calls slow
  • How well will query optimiser use HTM?
  • less well than built-in spatial index?
  • but that might be poorly suited to astronomical
    applications
  • How easy to implement all this in DBMSs other
    than SQL Server?
  • But this works reasonably well in practice!
Write a Comment
User Comments (0)
About PowerShow.com