High Performance Presentation: 5 slides/Minute? (65 slides / 15 minutes) IO and DB - PowerPoint PPT Presentation

About This Presentation
Slides: 66
Provided by: jimg178

Transcript and Presenter's Notes



1
High Performance Presentation: 5 slides/Minute? (65 slides / 15 minutes) IO and DB stuff for LSST
  • A new world record?
  • Jim Gray
  • Microsoft Research

2
TerraServer Lessons Learned
  • Hardware is 5 9s (with clustering)
  • Software is 5 9s (with clustering)
  • Admin is 4 9s (offline maintenance)
  • Network is 3 9s (mistakes, environment)
  • Simple designs are best
  • 10 TB DB is the management limit; 1 PB = 100 x 10 TB DBs. This is 100x better than 5 years ago. (Yahoo! and HotMail are 300 TB, Google! is 2 PB)
  • Minimize use of tape
  • Backup to disk (snapshots)
  • Portable disk TBs
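The availability figures translate directly into downtime. A minimal sketch, assuming a 365-day year and reading "N 9s" as availability 1 - 10^-N:

```python
# Minutes of downtime per year implied by "N nines" of availability.
# Assumptions (not from the slide): a 365-day year, availability = 1 - 10**-N.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(nines: int) -> float:
    """Expected minutes of downtime per year at `nines` nines of availability."""
    return MINUTES_PER_YEAR * 10.0 ** -nines


for component, nines in [("Hardware", 5), ("Software", 5), ("Admin", 4), ("Network", 3)]:
    print(f"{component:<8} {nines} 9s: {downtime_minutes_per_year(nines):8.1f} min/year")
```

At these rates the network, at three nines, accounts for about 525 minutes (roughly 8.8 hours) a year, against about 5 minutes for five-nines hardware.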

3
Serving BIG images
  • Break into tiles (compressed)
  • 10KB for modems
  • 1MB for LANs
  • Mosaic the tiles for pan, crop
  • Store image pyramid for zoom
  • 2x zoom only adds 33% overhead: 1 + ¼ + 1/16 + …
  • Use a spatial index to cluster & find objects
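The 33% overhead is the geometric series 1/4 + 1/16 + … = 1/3, since each 2x zoom-out keeps a quarter of the pixels. The sketch below checks that sum and counts tiles per pyramid level; the 256-pixel square tile is an assumed example size, not the 10 KB / 1 MB compressed tile sizes on the slide.

```python
from fractions import Fraction


def pyramid_overhead(levels: int) -> Fraction:
    """Extra storage for `levels` coarser zoom levels, as a fraction of the base image.

    Each 2x zoom-out halves width and height, so a level holds 1/4 the pixels
    of the level below it: 1/4 + 1/16 + ... approaches 1/3 (about 33%).
    """
    return sum((Fraction(1, 4 ** k) for k in range(1, levels + 1)), Fraction(0))


def tiles_per_level(width_px: int, height_px: int, tile_px: int, levels: int) -> list[int]:
    """Number of square tiles at each level, finest level first (ceil division)."""
    counts = []
    w, h = width_px, height_px
    for _ in range(levels + 1):
        counts.append(-(-w // tile_px) * -(-h // tile_px))
        w, h = max(1, -(-w // 2)), max(1, -(-h // 2))  # next, coarser level
    return counts


print(float(pyramid_overhead(10)))          # close to 1/3
print(tiles_per_level(1024, 1024, 256, 2))  # [16, 4, 1]
```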

4
Economics
  • People are more than 50% of costs
  • Disks are more than 50% of capital
  • Networking is the other 50%
  • People
  • Phone bill
  • Routers
  • Cpus are free (they come with the disks)

5
SkyServer/ SkyQuery Lessons
  • DB is easy
  • Search
  • It is BEST to index
  • You can put objects and attributes in a row
    (SQL puts big blobs off-page)
  • If you can't index, you can extract attributes and quickly compare
  • SQL can scan at 5M records/cpu/second
  • Sequential scans are embarrassingly parallel
  • Web services are easy
  • XML Data Sets
  • a universal way to represent answers
  • minimize round trips: 1 request/response
  • Diffgrams allow disconnected update
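A rough sketch of why sequential scans parallelize so easily: each partition is scanned independently and the partial results are simply added. The 5M records/cpu/second rate is the slide's; the partition count and the threshold predicate are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def estimated_scan_seconds(n_records: int, cpus: int,
                           records_per_cpu_second: float = 5_000_000) -> float:
    """Back-of-envelope scan time at the slide's ~5M records/cpu/second."""
    return n_records / (cpus * records_per_cpu_second)


def count_matches(rows, threshold):
    """Sequential scan of one partition: no shared state with other partitions."""
    return sum(1 for value in rows if value >= threshold)


def parallel_count(rows, threshold, partitions=4):
    """Split the scan into independent partitions and add up the partial counts.

    Threads only illustrate the partition/merge structure here; in CPython a
    CPU-bound scan would need processes (or a DBMS) for a real speedup.
    """
    chunk = max(1, -(-len(rows) // partitions))  # ceil(len / partitions)
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return sum(pool.map(count_matches, parts, [threshold] * len(parts)))


print(estimated_scan_seconds(1_000_000_000, cpus=4))  # 50.0
print(parallel_count(list(range(100)), threshold=90))  # 10
```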

6
How Will We Find Stuff? Put everything in the DB (and index it)
  • Need DBMS features: Consistency, Indexing, Pivoting, Queries, Speed/scalability, Backup, replication. If you don't use one, you're creating one!
  • Simple logical structure
  • Blob and link is all that is inherent
  • Additional properties (facets = extra tables) and methods on those tables (encapsulation)
  • More than a file system
  • Unifies data and meta-data
  • Simpler to manage
  • Easier to subset and reorganize
  • Set-oriented access
  • Allows online updates
  • Automatic indexing, replication
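One possible reading of "blob and link, plus facets as extra tables" as a relational schema, sketched with Python's built-in sqlite3 module. All table and column names are hypothetical and the rows are made up.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Inherent structure: every object is a blob, plus links between objects.
    CREATE TABLE object (id INTEGER PRIMARY KEY, body BLOB);
    CREATE TABLE link (src INTEGER REFERENCES object(id),
                       dst INTEGER REFERENCES object(id));
    -- A facet: extra properties for some objects, kept in a table of its own.
    CREATE TABLE photo_facet (id INTEGER PRIMARY KEY REFERENCES object(id),
                              ra_deg REAL, dec_deg REAL, mag REAL);
    -- Index the facet so selective queries need not scan every row.
    CREATE INDEX photo_facet_mag ON photo_facet(mag);
""")

# Made-up rows for illustration.
conn.executemany("INSERT INTO object VALUES (?, ?)", [(1, b"\x00\x01"), (2, b"\x02\x03\x04")])
conn.execute("INSERT INTO link VALUES (1, 2)")
conn.executemany("INSERT INTO photo_facet VALUES (?, ?, ?, ?)",
                 [(1, 184.03, -1.13, 17.2), (2, 184.03, -1.22, 21.5)])

# Set-oriented query: select through the indexed facet, join back to the blob.
bright = conn.execute("""
    SELECT o.id, length(o.body) FROM object AS o
    JOIN photo_facet AS f ON f.id = o.id
    WHERE f.mag < 20
""").fetchall()

# Follow links from object 1.
linked = conn.execute("SELECT dst FROM link WHERE src = ?", (1,)).fetchall()
print(bright, linked)
```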

7
How Do We Represent Data To The Outside World?
<?xml version="1.0" encoding="utf-8" ?>
<DataSet xmlns="http://WWT.sdss.org/">
  <xs:schema id="radec" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="radec" msdata:IsDataSet="true">
      <xs:element name="Table">
        <xs:element name="ra" type="xs:double" minOccurs="0" />
        <xs:element name="dec" type="xs:double" minOccurs="0" />
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <ra>184.028935351008</ra>
        <dec>-1.12590950121524</dec>
      </Table>
      <Table diffgr:id="Table10" msdata:rowOrder="9">
        <ra>184.025719033547</ra>
        <dec>-1.21795827920186</dec>
      </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>
  • File metaphor too primitive: just a blob
  • Table metaphor too primitive: just records
  • Need Metadata describing data context
  • Format
  • Provenance (author/publisher/citations)
  • Rights
  • History
  • Related documents
  • In a standard format
  • XML and XML schema
  • DataSet is a great example of this
  • World is now defining standard schemas

Callouts on the XML example: schema; data or diffgram
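A client can recover the rows from such a DataSet diffgram with a standard XML parser. This sketch uses the two rows from the example, omits the schema section for brevity, and takes the namespace URIs as they appear there.

```python
import xml.etree.ElementTree as ET

DIFFGR = "urn:schemas-microsoft-com:xml-diffgram-v1"

# The two rows from the example; the xs:schema section is omitted for brevity.
SAMPLE = """<DataSet xmlns="http://WWT.sdss.org/">
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
                   xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0">
        <ra>184.028935351008</ra>
        <dec>-1.12590950121524</dec>
      </Table>
      <Table diffgr:id="Table10" msdata:rowOrder="9">
        <ra>184.025719033547</ra>
        <dec>-1.21795827920186</dec>
      </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>"""


def read_rows(xml_text: str) -> list:
    """Return (diffgram row id, ra, dec) for each <Table> row in the diffgram."""
    root = ET.fromstring(xml_text)
    diffgram = root.find(f"{{{DIFFGR}}}diffgram")
    return [
        (table.get(f"{{{DIFFGR}}}id"),
         float(table.findtext("ra")),
         float(table.findtext("dec")))
        for table in diffgram.iter("Table")  # <Table> is in no namespace (xmlns="")
    ]


for row in read_rows(SAMPLE):
    print(row)
```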
8
Emerging Concepts
  • Standardizing distributed data
  • Web Services, supported on all platforms
  • Custom configure remote data dynamically
  • XML: Extensible Markup Language
  • SOAP: Simple Object Access Protocol
  • WSDL: Web Services Description Language
  • DataSets: Standard representation of an answer
  • Standardizing distributed computing
  • Grid Services
  • Custom configure remote computing dynamically
  • Build your own remote computer, and discard
  • Virtual Data: new data sets on demand

9
Szalay's Law: The utility of N comparable datasets is N²
  • Metcalfe's law applies to telephones, fax, Internet.
  • Szalay argues as follows: each new dataset gives new information; 2-way combinations give new information.
  • Example: combine these 3 datasets
  • (ID, zip code)
  • (ID, birth day)
  • (ID, height)
  • Other example: quark star: Chandra X-ray, Hubble optical, 600-year-old records. Drake, J. J. et al., "Is RX J185635-375 a Quark Star?", preprint (2002).
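The N² argument rests on counting pairwise combinations, and each combination is a join on the shared ID. A minimal sketch with made-up records in the shape of the three example datasets:

```python
from itertools import combinations


def pairwise_combinations(n: int) -> int:
    """Number of 2-way combinations of n datasets: n(n-1)/2, which grows like n**2."""
    return n * (n - 1) // 2


def join_on_id(*datasets: dict) -> dict:
    """Inner-join dictionaries keyed by the same ID; values become tuples."""
    common = set(datasets[0]).intersection(*datasets[1:])
    return {key: tuple(d[key] for d in datasets) for key in sorted(common)}


# Made-up records shaped like (ID, zip code), (ID, birth day), (ID, height).
zip_codes = {1: "11111", 2: "22222", 3: "33333"}
birthdays = {1: "1950-01-01", 2: "1970-06-15"}
heights = {1: 180, 3: 165}

datasets = {"zip": zip_codes, "birthday": birthdays, "height": heights}
for a, b in combinations(datasets, 2):  # 3 datasets -> 3 pairwise joins
    print(a, b, join_on_id(datasets[a], datasets[b]))
print(join_on_id(zip_codes, birthdays, heights))  # {1: ('11111', '1950-01-01', 180)}
```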

10
Science is hitting a wall: FTP and GREP are not adequate
  • You can GREP 1 MB in a second
  • You can GREP 1 GB in a minute
  • You can GREP 1 TB in 2 days
  • You can GREP 1 PB in 3 years.
  • Oh!, and 1 PB ~ 10,000 disks
  • At some point you need: indices to limit search; parallel data search and analysis; search and analysis tools
  • This is where databases can help
  • You can FTP 1 MB in 1 sec
  • You can FTP 1 GB / min (= $1/GB)
  • 2 days and $1K
  • 3 years and $1M
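The scaling in these lists is linear: 1000x more data takes 1000x longer to scan. The sketch assumes a sustained rate of about 10 MB/s (an assumption, not a figure from the slide) and contrasts a full scan with a lookup through a sorted index, which touches only about log2(N) entries.

```python
import bisect

SIZES = {"1 MB": 10**6, "1 GB": 10**9, "1 TB": 10**12, "1 PB": 10**15}
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def scan_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time for a full sequential scan (GREP or FTP) at a fixed rate."""
    return size_bytes / bytes_per_second


def indexed_lookup(sorted_keys: list, key) -> bool:
    """With a sorted index, a lookup touches ~log2(N) entries instead of all N."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key


RATE = 10 * 10**6  # assumed ~10 MB/s sustained scan rate
for label, size in SIZES.items():
    s = scan_seconds(size, RATE)
    print(f"{label}: {s:,.1f} s = {s / SECONDS_PER_DAY:,.2f} days = {s / SECONDS_PER_YEAR:.2f} years")
```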

11
Networking: Great hardware & Software
  • WANs @ 5 GBps (1 λ = 40 Gbps)
  • Gbps Ethernet common (~100 MBps)
  • Offload gives ~2 Hz/byte
  • Will improve with RDMA zero-copy
  • 10 Gbps mainstream by 2004
  • Faster I/O
  • 1 GB/s today (measured)
  • 10 GB/s under development
  • SATA (serial ATA) 150MBps/device

12
Bandwidth: 3x bandwidth/year for 25 more years
  • Today:
  • 40 Gbps per channel (λ)
  • 12 channels per fiber (WDM): 500 Gbps
  • 32 fibers/bundle: 16 Tbps/bundle
  • In lab: 3 Tbps/fiber (400 x WDM)
  • In theory: 25 Tbps per fiber
  • 1 Tbps = USA 1996 WAN bisection bandwidth
  • Aggregate bandwidth doubles every 8 months!
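Two of these numbers are consistent with each other: doubling every 8 months is a yearly factor of 2^(12/8) ≈ 2.83, i.e. roughly 3x/year. The per-fiber and per-bundle totals are products of the per-channel figures. A quick check:

```python
def annual_growth(doubling_months: float) -> float:
    """Yearly growth factor implied by doubling every `doubling_months` months."""
    return 2 ** (12 / doubling_months)


def fiber_capacity_gbps(gbps_per_channel: float, channels: int, fibers: int = 1) -> float:
    """Aggregate capacity of `fibers` fibers, each carrying `channels` WDM channels."""
    return gbps_per_channel * channels * fibers


print(annual_growth(8))                 # about 2.83, i.e. roughly "3x/year"
print(fiber_capacity_gbps(40, 12))      # 480 Gbps, the slide's "500 Gbps"
print(fiber_capacity_gbps(500, 1, 32))  # 16000 Gbps = 16 Tbps per bundle
print(3 ** 25)                          # cumulative growth after 25 years at 3x/year
```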

13
Hero/Guru Networking
Map of the route: Redmond/Seattle, WA; San Francisco, CA; New York; Arlington, VA (5626 km, 10 hops)
Organizations shown: Information Sciences Institute, Microsoft, Qwest, University of Washington, Pacific Northwest Gigapop, HSCC (high speed connectivity consortium), DARPA
14
Real Networking
  • Bandwidth for 1 Gbps stunt cost $400k/month
  • ~$200/Mbps/month (at each end, plus hardware and admin)
  • Price not improving very fast
  • Doesn't include operations / local hardware costs
  • Admin costs more: ~$1/GB to $10/GB
  • Challenge: go home and FTP from a fast server
  • The Guru Gap: FermiLab <-> JHU
  • Both well connected
  • vBNS, NGI, Internet2, Abilene, ...
  • Actual desktop-to-desktop ~100 KBps
  • 12 days/TB (but it crashes first).
  • The reality: to move 10 GB, mail it! TeraScale Sneakernet

15
How Do You Move A Terabyte?
Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
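The table behind this question is not reproduced in the transcript. As a sketch of the underlying comparison: network transfer time is size divided by link rate, while shipping disks has an effective bandwidth of size divided by door-to-door time. The 1 Gbps link and the 24-hour courier below are assumptions.

```python
def transfer_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Network transfer time, assuming the link runs at its full nominal rate."""
    return size_bytes * 8 / bits_per_second


def effective_bytes_per_second(size_bytes: float, shipping_seconds: float) -> float:
    """Bandwidth of shipping disks: bytes delivered divided by door-to-door time."""
    return size_bytes / shipping_seconds


TB = 10**12
print(transfer_seconds(TB, 1e9) / 3600)              # 1 Gbps link: about 2.2 hours
print(effective_bytes_per_second(TB, 86_400) / 1e6)  # overnight courier: about 11.6 MB/s
```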
16
There Is A Problem
Niklaus Wirth: Algorithms + Data Structures = Programs
  • GREAT!!!!
  • XML documents are portable objects
  • XML documents are complex objects
  • WSDL defines the methods on objects (the class)
  • But will all the implementations match?
  • Think of UNIX or SQL or C or
  • This is a work in progress.

17
Changes To DBMSs
  • Integration of Programs and Data
  • Put programs inside the database (allows OODB)
  • Gives you parallel execution
  • Integration of Relational, Text, XML, Time
  • Scaleout (even more)
  • AutoAdmin (no knobs)
  • Manage Petascale databases (utilities, geoplex,
    online, incremental)

18
Publishing Data
Roles | Authors | Publishers | Curators | Archives | Consumers
Traditional | Scientists | Journals | Libraries | Archives | Scientists
Emerging | Collaborations | Project web site | Data+Doc Archives | Digital Archives | Scientists
19
The Core Problem: No Economic Model
  • The archive user has not yet been born. How can he pay you to curate the data?
  • The Scientist gathered data for his own purpose. Why should he pay (invest time) for your needs?
  • Answer to both: that's the scientific method
  • Curating data (documenting the design, the acquisition and the processing) is very hard and there is no reward for doing it. The results are rewarded, not the process of getting them.
  • Storage/archive is NOT the problem (it's almost free)
  • Curating/Publishing is expensive.

20
SDSS Data Inflation: Data Pyramid
  • Level 2: Derived data products ~10x smaller, but there are many catalogs.
  • Publish new edition each year
  • Fixes bugs in data.
  • Must preserve old editions
  • Creates data pyramid
  • Store each edition
  • 1, 2, 3, 4, … N ⇒ ~N² bytes
  • Net Data Inflation L2 L1
  • Level 1A: grows 5 TB pixels/year, growing to 25 TB; ~2 TB/y compressed, growing to 13 TB; ~4 TB today (level 1A in NASA terms)

21
What's needed? (not drawn to scale)
22
CS Challenges For Astronomers
  • Objectify your field
  • Precisely define what you are talking about.
  • Objects and Methods / Attributes
  • This is REALLY difficult.
  • UCDs are a great start, but there is a long way to go
  • Software is like entropy: it always increases. -- Norman Augustine, Augustine's Laws
  • Beware of legacy software: cost can eat you alive
  • Share software where possible.
  • Use standard software where possible.
  • Expect it will cost you 25% to 40% of the project.
  • Explain what you want to do with the VO
  • 20 queries or something like that.

23
Challenge to Data Miners: Linear and Sub-Linear Algorithms & Techniques
  • Today most correlation / clustering algorithms are polynomial: N² or N³ or …
  • N² is VERY big when N is big (10^18 is big)
  • Need sub-linear algorithms
  • Current approaches are near optimal given
    current assumptions.
  • So, need new assumptions: probably heuristic and approximate
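To make the N² point concrete: counting close pairs by comparing every pair is quadratic, while bucketing points into a grid whose cell size equals the search radius only compares neighbouring cells, which is roughly linear for evenly spread data. This grid technique is a standard illustration chosen here, not a method from the talk, and it is linear rather than sub-linear; the point set and radius are made up.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def brute_force_pairs(points, r):
    """O(N^2): test every pair."""
    r2 = r * r
    return sum(1 for (x1, y1), (x2, y2) in combinations(points, 2)
               if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= r2)


def grid_pairs(points, r):
    """Bucket points into r x r cells; a pair within distance r must lie in the
    same or an adjacent cell, so only those cells are compared."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / r), math.floor(y / r))].append(idx)
    r2 = r * r
    count = 0
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    xj, yj = points[j]
                    for i in members:
                        if i < j:  # count each unordered pair once
                            xi, yi = points[i]
                            if (xi - xj) ** 2 + (yi - yj) ** 2 <= r2:
                                count += 1
    return count


random.seed(1)
pts = [(random.random(), random.random()) for _ in range(1000)]
print(grid_pairs(pts, 0.01), brute_force_pairs(pts, 0.01))  # same count, far fewer comparisons
```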

24
Challenge to Data Miners Rediscover Astronomy
  • Astronomy needs deep understanding of physics.
  • But, some was discovered as variable
    correlations then explained with physics.
  • Famous example: Hertzsprung-Russell Diagram: star luminosity vs. color (temperature)
  • Challenge 1 (the student test): How much of astronomy can data mining discover?
  • Challenge 2 (the Turing test): Can data mining discover NEW correlations?

25
Plumbers: Organize and Search Petabytes
  • Automate
  • instrument-to-archive pipelines. It is a messy business, very labor intensive. Most current designs do not scale (too many manual steps). BaBar (1 TB/day) and ESO pipeline seem promising. A job-scheduling or workflow system
  • Physical Database design & access
  • Data access patterns are difficult to anticipate
  • Aggressively and automatically use indexing,
    sub-setting.
  • Search in parallel
  • Goals
  • Answer easy queries in 10 seconds.
  • Answer hard queries (correlations) in 10 minutes.

26
Scaleable Systems
  • Scale UP: grow by adding components to a single system.
  • Scale Out: grow by adding more systems.

27
What's New: Scale Up
  • 64 bit & TB size main memory
  • SMP on chip: everything's SMP
  • 32…256 SMP: locality/affinity matters
  • TB size disks
  • High-speed LANs

28
Who needs 64-bit addressing? You! Need 64-bit addressing!
  • "640K ought to be enough for anybody." Bill Gates, 1981
  • But that was 21 years ago = 2 × 21/3 = 14 bits ago.
  • 20 bits + 14 bits = 34 bits, so... "16 GB ought to be enough for anybody" Jim Gray, 2002
  • 34 bits > 31 bits, so 34 bits ⇒ 64 bits
  • YOU need 64 bit addressing!
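The arithmetic on this slide as a small sketch: 4x more RAM every 3 years is 2 address bits per 3 years, so 21 years adds 14 bits to the 20 bits of a 640 KB (about 2^20 byte) address space.

```python
def address_bits_needed(base_bits: int, years: float, bits_per_3_years: int = 2) -> float:
    """Address bits after `years` if RAM grows 4x (2 bits) every 3 years."""
    return base_bits + bits_per_3_years * years / 3


bits = address_bits_needed(20, 21)          # 640 KB ~ 2**20 bytes in 1981, 21 years on
print(bits, 2 ** int(bits) // 2**30, "GB")  # 34.0 16 GB
print(bits > 31)  # exceeds the 31 bits (2 GB) of user address space on 32-bit Windows by default
```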

29
64 bit Why bother?
  • 1966 Moore's law: 4x more RAM every 3 years ⇒ 1 bit of addressing every 18 months
  • 36 years later: 2 × 36/3 = 24 more bits. Not exactly right, but: 32 bits not enough for servers; 32 bits gives no headroom for clients
  • So, time is running out ( has run out )
  • Good news: Itanium and Hammer are maturing, and so is the base software (OS, drivers, DB, Web, ...). Windows & SQL @ 256 GB today!

30
64 bit why bother?
  • Memory intensive calculations
  • You can trade memory for IO and processing
  • Example: Data Analysis Clustering @ JHU
  • in memory CPU time is N log N, N ~ 100M
  • Disk: M chunks ⇒ time ~ M²
  • must run many times
  • Now running on HP Itanium, Windows .NET Server 2003, SQL Server

Graph courtesy of Alex Szalay & Adrian Pope of Johns Hopkins University
31
Amdahl's Balanced System Laws
  • 1 mips needs 4 MB RAM and needs 20 IO/s
  • At 1 billion instructions per second: need 4 GB/cpu, need 50 disks/cpu!
  • 64 cpus ⇒ 3,000 disks

Balanced node (figure): 1 bips cpu, 4 GB RAM, 50 disks, 10,000 IOps, 7.5 TB
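The figure's numbers follow from the slide's ratios. A minimal sketch; the 150 GB disk size is an assumption (it matches the "150 GB now" disk quoted later in the talk), used to reproduce the 7.5 TB total:

```python
def balanced_cpu(mips: float, mb_ram_per_mips: float = 4,
                 disks_per_cpu: int = 50, gb_per_disk: float = 150) -> dict:
    """Resources for one CPU using the slide's ratios (4 MB RAM per mips, 50 disks/cpu).

    gb_per_disk = 150 is an assumed disk size, not part of Amdahl's ratios.
    """
    return {
        "ram_gb": mips * mb_ram_per_mips / 1000,
        "disks": disks_per_cpu,
        "storage_tb": disks_per_cpu * gb_per_disk / 1000,
    }


def cluster_disks(cpus: int, disks_per_cpu: int = 50) -> int:
    """Total disks for a balanced cluster of `cpus` processors."""
    return cpus * disks_per_cpu


print(balanced_cpu(1000))  # {'ram_gb': 4.0, 'disks': 50, 'storage_tb': 7.5}
print(cluster_disks(64))   # 3200, the slide's "3,000 disks"
```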
32
The 5 Minute Rule: Trade RAM for Disk Arms
  • If data is re-referenced every 5 minutes, it is cheaper to cache it in RAM than to get it from disk. A disk access/second ~ $50, or 50 MB for 1 second, or 50 KB for 1,000 seconds.
  • Each app has a memory knee: up to the knee, more memory helps a lot.
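The rule compares the cost of a disk access per second with the cost of the RAM needed to hold the page. In the form commonly used for the five-minute rule, the break-even re-reference interval is (pages per MB of RAM / accesses per second per disk) × (price per disk / price per MB of RAM). The sketch evaluates that formula; all inputs are illustrative assumptions, not figures from the slide.

```python
def break_even_interval_seconds(pages_per_mb_ram: float,
                                accesses_per_second_per_disk: float,
                                price_per_disk: float,
                                price_per_mb_ram: float) -> float:
    """Five-minute-rule break-even re-reference interval, in seconds.

    A page re-referenced more often than this is cheaper to keep in RAM;
    less often, it is cheaper to re-read it from disk.
    """
    return ((pages_per_mb_ram / accesses_per_second_per_disk)
            * (price_per_disk / price_per_mb_ram))


# Illustrative inputs (assumptions): 8 KB pages (128 per MB), 64 random
# accesses/s per disk, $2,000 per disk, $15 per MB of RAM.
interval = break_even_interval_seconds(128, 64, 2000, 15)
print(f"{interval:.0f} s = {interval / 60:.1f} minutes")
```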

33
64 bit: Reduces IO, saves disks
  • Large memory reduces IO
  • 64-bit simplifies code
  • Processors can be faster (wider word)
  • RAM is cheap (4 GB = $1k to $20k)
  • Can trade ram for disk IO
  • Better response time.
  • Example
  • TPC-C
  • 4x1 GHz Itanium2 vs
  • 4x1.6 GHz IA32
  • 40 extra GB ⇒ 60% extra throughput

Chart (TPC-C): 4x1.6 GHz IA32, 8 GB; 4x1 GHz IA64, 48 GB; 4x1.6 GHz IA32, 32 GB
34
AMD Hammer Coming Soon
  • AMD Hammer is 64bit capable
  • 2003: millions of Hammer CPUs will ship
  • 2004: most AMD CPUs will be 64bit
  • 4 GB RAM is less than $1,000 today, less than $500 in 2004
  • Desktops (Hammer) and servers (Opteron).
  • You do the math: who will demand 64-bit capable software?

35
A 1TB Main Memory
  • Amdahl's law: 1 mips/MB, now 1:5, so 20 x 10 GHz cpus need 1 TB RAM
  • 1 TB RAM = $250k…$2M today, $25k…$200k in 5 years
  • 128 million pages
  • Takes a LONG time to fill
  • Takes a LONG time to refill
  • Needs new algorithms
  • Needs parallel processing
  • Which leads us to
  • The memory hierarchy
  • smp
  • numa
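The sizing on this slide as arithmetic. Assumptions, labelled in the code: about 1,000 mips per GHz (one instruction per clock), "1:5" read as 5 MB of RAM per mips, 8 KB pages, and the 1 GB/s I/O rate quoted earlier in the talk as the fill rate.

```python
def ram_needed_bytes(cpus: int, ghz: float, mips_per_ghz: float = 1000,
                     mb_per_mips: float = 5) -> float:
    """RAM for a balanced system: cpus * clock * mips-per-GHz (assumed) * MB-per-mips (assumed)."""
    return cpus * ghz * mips_per_ghz * mb_per_mips * 10**6


def pages(size_bytes: int, page_bytes: int = 8192) -> int:
    """Number of pages of `page_bytes` (8 KB assumed) in `size_bytes`."""
    return size_bytes // page_bytes


def fill_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time to load memory from disk at a given aggregate I/O rate."""
    return size_bytes / bytes_per_second


print(ram_needed_bytes(20, 10) / 10**12, "TB")  # 1.0 TB
print(pages(2**40))                             # 134217728 = 128 Mi pages of 8 KB
print(fill_seconds(10**12, 10**9) / 60, "minutes to fill at 1 GB/s")
```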

36
Hyper-Threading: SMP on chip
  • If cpu is always waiting for memory: predict memory requests and prefetch
  • done
  • If cpu is still always waiting for memory: multi-program it (multiple hardware threads per cpu)
  • Hyper-Threading: everything is SMP
  • 2 now, more later
  • Also multiple cpus/chip
  • If your program is single threaded
  • You waste ½ the cpu and memory bandwidth
  • Eventually waste 80%
  • App builders need to plan for threads.

37
The Memory Hierarchy
  • Locality REALLY matters
  • CPU at 2 GHz, RAM at 5 MHz: RAM is no longer random access.
  • Organizing the code gives 3x (or more)
  • Organizing the data gives 3x (or more)
Level | Latency (clocks) | Size
Registers | 1 | 1 KB
L1 | 2 | 32 KB
L2 | 10 | 256 KB
L3 | 30 | 4 MB
Near RAM | 100 | 16 GB
Far RAM | 300 | 64 GB
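One way to see why locality matters is the average cost of a memory access, where each access pays the latency of the level that finally serves it. The latencies come from the table; the hit rates are assumptions chosen only to contrast good and poor locality.

```python
# Latencies (clocks) from the table; near RAM is the last level and always "hits".
LEVELS = [("L1", 2), ("L2", 10), ("L3", 30), ("Near RAM", 100)]


def average_access_clocks(hit_rates) -> float:
    """Average clocks per memory access.

    hit_rates[i] is the (assumed) fraction of accesses reaching cache level i that
    hit there; each access costs the latency of the level that finally serves it.
    """
    remaining, total = 1.0, 0.0
    for (_, latency), hit in zip(LEVELS, list(hit_rates) + [1.0]):
        served = remaining * hit
        total += served * latency
        remaining -= served
    return total


print(average_access_clocks([0.9, 0.8, 0.5]))  # good locality
print(average_access_clocks([0.5, 0.5, 0.5]))  # poor locality
```

With these assumed hit rates the averages are about 3.9 and 19.75 clocks, a factor of roughly 5.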

38
(No Transcript)
39
Scaleup Systems: Non-Uniform Memory Architecture (NUMA). Coherent but remote memory is even slower.
  • All cells see a common memory
  • Slow local main memory
  • Slower remote main memory
  • Scaleup by adding cells
  • Planning for 64 cpu, 1 TB RAM
  • Interconnect, Service Processor, Partition management are vendor specific
  • Several vendors doing this: Itanium and Hammer
Diagram labels: Partition manager, Service Processor, Config DB, System interconnect (Crossbar/Switch)
40
Changed Ratios Matter
  • If everything changes by 2x, then nothing changes.
  • So, it is the different rates that matter.

Slowly changing: Speed of light, People costs, Memory bandwidth, WAN prices
Improving FAST: CPU speed, Memory & disk size, Network Bandwidth
41
Disks are becoming tapes
  • Capacity
  • 150 GB now, 300 GB this year, 1 TB by
    2007
  • Bandwidth
  • 40 MBps now, 150 MBps by 2007
  • Read time
  • 2 hours sequential, 2 days random now; 4 hours sequential, 12 days random by 2007

Figure: now 150 GB, 150 IO/s, 40 MBps; by 2007 1 TB, 200 IO/s, 150 MBps
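The read times are capacity divided by the relevant rate: bandwidth for a sequential pass, I/O operations per second times request size for random access. A rough sketch; the 8 KB request size is an assumption, and because drives rarely sustain peak rates the results are optimistic lower bounds.

```python
def sequential_read_hours(capacity_bytes: float, bytes_per_second: float) -> float:
    """Time to read the whole disk front to back at the given transfer rate."""
    return capacity_bytes / bytes_per_second / 3600


def random_read_days(capacity_bytes: float, ios_per_second: float, io_bytes: int = 8192) -> float:
    """Time to read the whole disk in random `io_bytes` requests (8 KB assumed)."""
    return capacity_bytes / (ios_per_second * io_bytes) / 86_400


now = random_read_days(150e9, 150)   # 150 GB, 150 IO/s
later = random_read_days(1e12, 200)  # 1 TB, 200 IO/s
print(f"random full read grows by {later / now:.1f}x")  # capacity rises faster than IO/s
```

Capacity grows about 6.7x while IO/s grows about 1.3x, so a full random read takes about 5x longer: the sense in which disks are becoming tapes.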
42
Disks are becoming tapes: Consequences
  • Use most disk capacity for archiving: Copy on Write (COW) file system in Windows and other OSs.
  • RAID10 saves arms, costs space (OK!).
  • Backup to disk: pretend it is a 100 GB disk + 1 TB disk
  • Keep hot 10% of data on fastest part of disk.
  • Keep cold 90% on colder part of disk
  • Organize computations to read/write disks
    sequentially in large blocks.

43
Wiring is going serial and getting FAST!
  • Gbps Ethernet and SATA built into chips
  • Raid Controllers inexpensive and fast.
  • 1U storage bricks @ 2-10 TB
  • SAN or NAS (iSCSI or CIFS/DAFS)

44
NAS SAN Horse Race
  • Storage Hardware: $1k/TB/y; Storage Management: $10k...$300k/TB/y
  • So, as with Server Consolidation: Storage Consolidation
  • Two styles: NAS (Network Attached Storage): File Server; SAN (System Area Network): Disk Server
  • I believe NAS is more manageable.

45
SAN/NAS Evolution
Monolithic
Modular
Sealed
46
IO Throughput: K Access Per Second vs. RPM
(Chart: Kaps vs. RPM)
47
Comparison Of Disk Costs for similar
performance
Seagate Disk Prices
Source: Seagate online store, quantity-one prices
48
Comparison Of Disk Costs /MB for different
systems
Source: Dell
49
Why Serial ATA Matters
  • Modern interconnect
  • Point-to-point drive connection
  • 150 MBps → 300 MBps
  • Facilitates ATA disk arrays
  • Enables inexpensive, cool storage

50
Performance (on Y2k SDSS data)
  • Run times on $15k HP Server (2 cpu, 1 GB, 8 disk)
  • Some take 10 minutes
  • Some take 1 minute
  • Median 22 sec.
  • Ghz processors are fast!
  • (10 mips/IO, 200 ins/byte)
  • 2.5 m rec/s/cpu

Chart annotations: 1,000 IO/cpu sec; 64 MB IO/cpu sec
51
NVO: How Will It Work?
  • Define commonly used atomic services
  • Build higher level toolboxes/portals on top
  • We do not build everything for everybody
  • Use the 90-10 rule
  • Define the standards and interfaces
  • Build the framework
  • Build the 10% of services that are used by 90%
  • Let the users build the rest from the components

52
Data Federations of Web Services
  • Massive datasets live near their owners
  • Near the instrument's software pipeline
  • Near the applications
  • Near data knowledge and curation
  • Super Computer centers become Super Data Centers
  • Each Archive publishes a web service
  • Schema documents the data
  • Methods on objects (queries)
  • Scientists get personalized extracts
  • Uniform access to multiple Archives
  • A common global schema

53
Grid and Web Services Synergy
  • I believe the Grid will be many web services that share data (computrons are free)
  • IETF standards provide:
  • Naming
  • Authorization / Security / Privacy
  • Distributed Objects
  • Discovery, Definition, Invocation, Object Model
  • Higher level services workflow, transactions,
    DB,..
  • Synergy: commercial Internet & Grid tools

54
Web Services: The Key?
  • Web SERVER
  • Given a URL + parameters
  • Returns a web page (often dynamic)
  • Web SERVICE
  • Given an XML document (SOAP msg)
  • Returns an XML document
  • Tools make this look like an RPC.
  • F(x,y,z) returns (u, v, w)
  • Distributed objects for the web.
  • naming, discovery, security,..
  • Internet-scale distributed computing

Diagram (web server): Your program, http, Web Server, Web page
Diagram (web service): Your program, soap, Web Service, object in XML, data in your address space
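Under the covers, the RPC-style call "F(x,y,z) returns (u, v, w)" is one XML envelope sent to the service and another sent back. A minimal sketch using the SOAP 1.1 envelope namespace; the service namespace, the method name F and its parameters are hypothetical, and the HTTP transport is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/hypothetical-service"  # hypothetical service namespace


def build_request(method: str, **params) -> str:
    """Serialize a call like F(x=1, y=2, z=3) as a SOAP 1.1 request envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text: str, method: str) -> tuple:
    """Turn the <methodResponse> children back into a tuple such as (u, v, w)."""
    root = ET.fromstring(xml_text)
    result = root.find(f"{{{SOAP_ENV}}}Body/{{{SERVICE_NS}}}{method}Response")
    return tuple(child.text for child in result)


request = build_request("F", x=1, y=2, z=3)

# A canned reply standing in for what the (hypothetical) service would send back.
reply = (
    f'<soap:Envelope xmlns:soap="{SOAP_ENV}"><soap:Body>'
    f'<FResponse xmlns="{SERVICE_NS}"><u>4</u><v>5</v><w>6</w></FResponse>'
    "</soap:Body></soap:Envelope>"
)
u, v, w = parse_response(reply, "F")
print(request)
print(u, v, w)
```

Client tools generate this code from the service's WSDL, which is why the XML stays invisible to the caller.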
55
Grid?
  • Harvesting spare cpu cycles is not important
  • They are free ($1/cpu day)
  • They need applications and data (which are not free) ($1/GB shipped)
  • Accessing distributed data IS important
  • Send the programs to the data
  • Send the questions to the databases.
  • Super Computer Centers become Super Data Centers & Super Application Centers

56
The Grid: Foster & Kesselman (Argonne National Laboratory)
Internet computing and GRID technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. Transform scientific disciplines ranging from high energy physics to the life sciences
57
Grid/Globus
  • Leader of the pack for GRID middleware
  • Layered software toolkit
  • 1. Grid Fabric: OS, TCP
  • 2. Grid Services: Globus Resource Allocation Manager, Globus Information Service (meta-computing directory service), Grid Security Infrastructure, GridFTP
  • 3. Application Toolkits: Job submission, MPICH-G2 message passing interface
  • 4. Specific Applications: OVERFLOW Navier-Stokes flow solver

58
Globus in gory detail
  • SHELL SCRIPTS

    globus-mds-search '(&(hn=denali.mcs.anl.gov)(objectclass=GlobusSystemDynamicInformation))' cpuload1 | \
      sed -n -e '/hn/p' -e '/cpuload1/p' | \
      sed -e 's/,.*//' -e 's/=/ /g' | \
      awk '/hn/{printf "%s", $2} /cpuload/{printf " %s\n", $2}'

    if [ $# -eq 0 ]; then
      echo "provide argument <number of processes to start>" 1>&2
      exit 1
    fi
    if [ -z "$GRAMCONTACT" ]; then
      GRAMCONTACT="`globus-hostname2contacts -type fork pitcairn.mcs.anl.gov`"
    fi
    pwd=`/bin/pwd`
    rsl="&(executable=$pwd/myjobtest)(count=$1)"
    arch=`$GLOBUS_INSTALL_PATH/sbin/config.guess`
    $GLOBUS_INSTALL_PATH/tools/$arch/bin/globusrun -o -r "$GRAMCONTACT" "$rsl"

  • LIBRARIES

    char hn[256];    /* host name */
    char buf[30];    /* scratch buffer for the time string */
    char *timestr;
    time_t mytime;
    int pid, rc;

    /* get process id and hostname */
    pid = getpid();
    rc = globus_libc_gethostname(hn, 256);
    globus_assert(rc == GLOBUS_SUCCESS);

    /* get current time and convert to string format. ctime output is
       24 characters followed by '\n', so setting [24] to zero strips the
       newline character. */
    mytime = time(GLOBUS_NULL);
    timestr = globus_libc_ctime_r(&mytime, buf, 30);
    timestr[24] = '\0';
    globus_libc_printf("%s process %d on %s came to life\n", timestr, pid, hn);

    /* THE BARRIER!!! */
    globus_duroc_runtime_barrier();

    /* Passed the barrier: get current time again and print it out. */
    mytime = time(GLOBUS_NULL);
    timestr = globus_libc_ctime_r(&mytime, buf, 30);
    timestr[24] = '\0';
    globus_libc_printf("%s process %d on %s passed the barrier\n", timestr, pid, hn);

    /* TODO 1: get the layout of the DUROC job using first
       globus_duroc_runtime_intra_subjob_rank() and then
       globus_duroc_runtime_inter_subjob_structure(). */

    /* We are done. */
    rc = globus_module_deactivate_all();

59
Shielding Users
  • Users do not want to deal with XML; they want their data
  • Users do not want to deal with configuring grid
    computing, they want results
  • SOAP data appears in user memory, XML is
    invisible
  • SOAP call is just a remote procedure call

60
Atomic Services
  • Metadata: information about resources
  • Waveband
  • Sky coverage
  • Translation of names to universal dictionary
    (UCD)
  • Simple search patterns on the resources
  • Cone Search
  • Image mosaic
  • Unit conversions
  • Simple filtering, counting, histogramming
  • On-the-fly recalibrations
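Cone search, the simplest of these patterns, returns catalog objects within an angular radius of a sky position. A brute-force sketch using the haversine formula for angular separation; the tiny catalog is made up, and a real service would use a spatial index rather than scanning every object.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation of two sky positions, in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra: float, dec: float, radius_deg: float) -> list:
    """Return the (id, ra, dec) entries within radius_deg of (ra, dec)."""
    return [obj for obj in catalog
            if angular_separation_deg(ra, dec, obj[1], obj[2]) <= radius_deg]


# Made-up catalog entries: (id, ra_deg, dec_deg).
catalog = [("a", 184.0289, -1.1259), ("b", 184.0257, -1.2180), ("c", 185.0, 0.0)]
print([obj[0] for obj in cone_search(catalog, 184.03, -1.13, 0.05)])
```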

61
Higher Level Services
  • Built on Atomic Services
  • Perform more complex tasks
  • Examples
  • Automated resource discovery
  • Cross-identifications
  • Photometric redshifts
  • Outlier detections
  • Visualization facilities
  • Expectation
  • Build custom portals in a matter of days from existing building blocks (like today in IRAF or IDL)

62
SkyQuery
  • Distributed Query tool using a set of services
  • Feasibility study, built in 6 weeks from scratch
  • Tanu Malik (JHU CS grad student)
  • Tamas Budavari (JHU astro postdoc)
  • Implemented in C# and .NET
  • Won 2nd prize of Microsoft XML Contest
  • Allows queries like

SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t) < 3.5 AND AREA(181.3,-0.76,6.5)
AND o.type = 3 and (o.I - t.m_j) > 2
63
Architecture
Web Page
Image cutout
SkyQuery
SkyNode SDSS
SkyNode 2Mass
SkyNode First
64
Cross-id Steps
SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o, TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t) < 3.5 AND AREA(181.3,-0.76,6.5)
AND (o.i - t.m_j) > 2 AND o.type = 3
  • Parse query
  • Get counts
  • Sort by counts
  • Make plan
  • Cross-match
  • Recursively, from small to large
  • Select necessary attributes only
  • Return output
  • Insert cutout image
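The planning steps can be sketched as follows: order the archives by candidate count, start from the smallest, and extend partial matches archive by archive. This is a simplified, single-process sketch, not SkyQuery's implementation; it uses a flat-sky small-angle approximation, ignores RA wrap-around, and the archive contents are made up.

```python
import math

ARCSEC_PER_DEG = 3600


def within(a, b, tol_arcsec: float) -> bool:
    """Small-angle separation test between two (id, ra_deg, dec_deg) objects.

    Flat-sky approximation; ignores RA wrap-around at 0/360 degrees.
    """
    (_, ra1, dec1), (_, ra2, dec2) = a, b
    dx = (ra1 - ra2) * math.cos(math.radians((dec1 + dec2) / 2))
    dy = dec1 - dec2
    return math.hypot(dx, dy) * ARCSEC_PER_DEG <= tol_arcsec


def cross_match(archives: dict, tol_arcsec: float):
    """Plan by counts (smallest archive first), then extend matches archive by archive."""
    order = sorted(archives, key=lambda name: len(archives[name]))  # get counts, sort
    matches = [(obj,) for obj in archives[order[0]]]                # start from the smallest
    for name in order[1:]:                                          # small to large
        matches = [m + (obj,) for m in matches
                   for obj in archives[name] if within(m[0], obj, tol_arcsec)]
    return order, matches


# Made-up candidates near the query's AREA(181.3, -0.76, ...) centre.
archives = {
    "SDSS": [(1, 181.30, -0.76), (2, 181.40, -0.70), (3, 181.50, -0.80)],
    "TWOMASS": [(10, 181.3005, -0.7601), (11, 181.60, -0.60)],
}
order, matches = cross_match(archives, tol_arcsec=3.5)
print(order, [[obj[0] for obj in m] for m in matches])
```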

65
Show Cutout Web Service