1
Using the Grid for Astronomical Data
Roy Williams, Caltech
2
Palomar-Quest Survey (Caltech, NCSA, Yale)
Transient pipeline: computing reservation at sunrise for immediate followup of transients
Synoptic survey: massive resampling (Atlasmaker) for ultrafaint detection
[Diagram: the P48 Telescope delivers 50 Gbyte/night and ALERTs; data flows to Caltech, Yale, and NCSA (TG?); a 5 Tbyte store. NCSA, Caltech, and Yale run different pipelines on the same data.]
3
Wide-area Mosaicking (Hyperatlas)
An NVO-Teragrid project (Caltech)
DPOSS 15º
High quality: flux-preserving, spatially accurate
Stackable Hyperatlas: edge-free, pyramid weight
Mining AND Outreach
Griffith Observatory "Big Picture"
4
Synoptic Image Stack
5
PQ Pipeline
[Diagram: an observation night (28 columns x 4 filters, up to 70 Gbyte) feeds real-time and next-day computing; products include coadds, cleaned frames, hyperatlas pages, and quasars at z~4; events flow to VOEventNet.]
6
Mosaicking service
[Diagram: a portal takes http requests and, after security checks, runs them in a computing sandbox; the NVO Registry maps logical SIAP services to physical SIAP services.]
7
Transient from PQ
from catalog pipeline
8
VOEventNet a Rapid-Response Telescope Grid
[Diagram: GRB satellites and the Palomar-Quest Event Factory (with PQ next-day pipelines and a baseline sky) feed an Event Synthesis Engine on VOEventNet; follow-up telescopes include Raptor, Palomar 60, and Pairitel; a catalog and remote archives (SDSS, 2MASS, known variables, known asteroids) support event synthesis.]
9
ISW Effect
Correlation of the mass distribution (SDSS) with the CMB
(ISW effect); statistical significance through an
ensemble of simulated universes.
Connolly and Scranton, U Pittsburgh
10
Amanda analysis
Analysis of data from AMANDA, the Antarctic Muon
and Neutrino Detector Array.
Barwick and Silvestri, UC Irvine
11
Quasar Science
An NVO-Teragrid project (PennState, CMU, Caltech)
  • 60,000 quasar spectra from the Sloan Digital Sky Survey
  • Each is 1 CPU-hour: submit to grid queue
  • Fits a complex model (173 parameters)
  • Derive black hole mass from line widths

[Diagram: a manager uses globusrun to drive jobs on clusters, fed by NVO data services.]
12
N-point galaxy correlation
An NVO-Teragrid project (Pitt, CMU)
Finding triple correlations in the 3D SDSS galaxy
catalog (RA/Dec/z). Lots of large parallel jobs;
kd-tree algorithms
13
TeraGrid
14
TeraGrid Wide Area Network
15
TeraGrid Components
  • Compute hardware
  • Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster, ...
  • Large-scale storage systems
  • hundreds of terabytes for secondary storage
  • Very high-speed network backbone
  • bandwidth for rich interaction and tight
    coupling
  • Grid middleware
  • Globus, data management, ...
  • Next-generation applications

16
Overview of Distributed TeraGrid Resources
[Diagram: four TeraGrid sites (NCSA/PACI 10.3 TF, 240 TB; SDSC 4.1 TF, 225 TB; Caltech; Argonne), each with site resources, archival storage (HPSS, UniTree), and external networks.]
17
Cluster Supercomputer
[Diagram: a user logs in to a login node; jobs are submitted through queueing systems (Condor, PBS, ...) to 100s of nodes, which do parallel I/O to a parallel file system with a metadata node; purged /scratch and backed-up /home storage.]
18
TeraGrid Allocations Policies
  • Any US researcher can request an allocation
  • Policies/procedures posted at
  • http://www.paci.org/Allocations.html
  • Online proposal submission
  • https://pops-submit.paci.org/
  • NVO has an account on Teragrid
  • (just ask RW)

19
Data storage
20
Logical and Physical names
  • Logical name
  • application-context
  • eg frame_20050828.012.fits
  • Physical name
  • storage-context
  • eg /home/roy/data/frame_20050828.012.fits
  • eg file:///envoy4/raid3/frames/20050825/012.fits
  • eg http://nvo.caltech.edu/vostore/6ab7c828fe73.fits.gz
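
  • A minimal sketch of logical-to-physical resolution, assuming a simple in-memory table (the physical_names dictionary and resolve() are hypothetical, for illustration; a real system might use a database or replica catalog):

# map each logical name to its known physical replicas (hypothetical table)
physical_names = {
    "frame_20050828.012.fits": [
        "/home/roy/data/frame_20050828.012.fits",
        "file:///envoy4/raid3/frames/20050825/012.fits",
    ],
}

def resolve(lname):
    # return the first registered physical replica for a logical name
    replicas = physical_names.get(lname)
    if not replicas:
        raise KeyError("no physical replica registered for " + lname)
    return replicas[0]

print resolve("frame_20050828.012.fits")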

21
Logical and Physical Names
  • Allows
  • replication of data
  • movement/optimization of storage
  • transition to database (lname → key)
  • heterogeneous/extensible storage hardware
  • /envoy2/raid2, /pvfs/nvo/, etc.

22
Physical Name
  • Suggest URI form
  • protocol://identifier
  • if you know the protocol, you can interpret the
    identifier
  • Examples
  • file://
  • ftp://
  • srb://
  • uberftp://
  • Transition to services
  • http://server/MadeToOrder?frame=012a2b3
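
  • A sketch of the dispatch this URI form enables, using urlparse from the Python standard library (the fetch() function is hypothetical; srb:// and uberftp:// would need their own clients):

from urlparse import urlparse
import urllib

def fetch(physical_name):
    # the protocol (scheme) tells us how to interpret the identifier
    scheme = urlparse(physical_name)[0]
    if scheme == "file":
        return open(urlparse(physical_name)[2])
    elif scheme in ("http", "ftp"):
        return urllib.urlopen(physical_name)
    else:
        raise ValueError("no handler for scheme " + scheme)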

23
Typical types of HPC storage needs
24
Disk Farms (datawulf)
  • Homogeneous disk farm
  • (= parallel file system)

[Diagram: parallel I/O to a parallel file system with a metadata node. Large files are striped over disks; a management node handles file creation, access, ls, etc.]
25
Parallel File System
  • Large files are striped
  • very fast parallel access
  • Medium files are distributed
  • Stripes do not all start in the same place
  • Small files choke the PFS manager
  • Either containerize
  • or use blobs in a database
  • not a file system anymore: a pool of 10^8 blobs with lnames

26
Containerizing
  • Shared metadata
  • Easier for bulk movement

[Diagram: files packed in a container.]
27
Extraction from Container
  • tar container
  • slow extraction (reads whole container)
  • zip container
  • indexed for fast partial extraction
  • 2 Gbyte limit on container size
  • used for the fast-access 2MASS image service at Caltech (see the sketch below)
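
  • A sketch of indexed partial extraction with the Python zipfile module (the container and member names here are made up):

import zipfile

# the zip central directory is an index: one member can be read
# without scanning the whole container (unlike tar)
container = zipfile.ZipFile("frames_20050825.zip", "r")
print container.namelist()[:5]
data = container.read("frames/20050825/012.fits")
open("012.fits", "wb").write(data)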

28
Storage Resource Broker (SRB)
  • Single logical namespace while accessing
    distributed archival storage resources
  • Effectively infinite storage (first to 1TB wins a
    t-shirt)
  • Data replication
  • Parallel Transfers
  • Interfaces: command-line, API, web/portal.

29
Storage Resource Broker (SRB): Virtual Resources, Replication
[Diagram: an SRB client (command line or API) addresses one logical namespace; data is replicated across NCSA and SDSC.]

30
Running jobs
31
3 Ways to Submit a Job
  • 1. Directly to PBS Batch Scheduler
  • Simple, scripts are portable among PBS TeraGrid
    clusters
  • 2. Globus common batch script syntax
  • Scripts are portable among other grids using
    Globus
  • 3. Condor-G
  • Nice interface atop Globus, monitoring of all
    jobs submitted via Condor-G
  • Higher-level tools like DAGMan

32
PBS Batch Submission
  • Single executables to be run on a single remote machine
  • login to a head node, submit to queue
  • Direct, interactive execution
  • mpirun -np 16 ./a.out
  • Through a batch job manager
  • qsub my_script
  • where my_script describes executable location,
    runtime duration, redirection of stdout/err,
    mpirun specification
  • ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
  • qsub flatten.sh -v "FILE=f544"
  • qstat or showq
  • ls *.dat
  • pbs.out, pbs.err files

33
Remote submission
  • Through globus
  • globusrun -r some-teragrid-head-node.teragrid.org/jobmanager -f my_rsl_script
  • where my_rsl_script describes the same details as
    in the qsub my_script!
  • Through Condor-G
  • condor_submit my_condor_script
  • where my_condor_script describes the same details
    as the globus my_rsl_script!

34
globus-job-submit
  • For running of batch/offline jobs
  • globus-job-submit: submit a job
  • same interface as globus-job-run
  • returns immediately
  • globus-job-status: check job status
  • globus-job-cancel: cancel a job
  • globus-job-get-output: get job stdout/err
  • globus-job-clean: clean up after a job

35
Condor-G
  • A Grid-enabled version of Condor that provides
    robust job management for Globus clients.
  • Robust replacement for globusrun
  • Provides extensive fault-tolerance
  • Can provide scheduling across multiple Globus
    sites
  • Brings Condor's job management features to Globus jobs

36
Condor DAGMan
  • Manages workflow interdependencies
  • Each task is a Condor description file
  • A DAG file controls the order in which the Condor
    files are run

37
  • Data intensive computing
  • with NVO services

38
Two Key Ideas for Fault-Tolerance
  • Transactions
  • No partial completion -- either all or nothing
  • eg copy to a tmp filename, then mv to the correct file name
  • Idempotent
  • Acting as if done only once, even if run multiple times
  • Can run the script repeatedly until finished (see the sketch below)
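
  • A minimal sketch combining both ideas for one output file; process() is a hypothetical worker, and os.rename is atomic when tmp and dst are on the same filesystem:

import os

def make_target(src, dst):
    if os.path.exists(dst):
        return                      # idempotent: already done, safe to re-run
    tmp = dst + ".tmp"
    out = open(tmp, "wb")
    out.write(process(open(src, "rb").read()))   # process() is hypothetical
    out.close()
    os.rename(tmp, dst)             # transaction: target appears all at once, or not at all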

39
DPOSS flattening
[Diagram: source frames flow to target frames.]
2650 x 1.1 Gbyte files. Crop borders; quadratic fit and subtract. Virtual data.
40
Driving the Queues
import os

for f in os.listdir(inputDirectory):
    # if the target file exists, with the right size and age, then we keep it
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            # filetime() is a helper (defined elsewhere) returning modification time
            time_tgt = filetime(ofile)
            time_src = filetime(f)
            if time_tgt < time_src:
                print " -- target too old or nonexistent, making"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job ", cmd
    os.system(cmd)
  • Here is the driver that makes and submits jobs

41
PBS script
  • A PBS script. Can do: qsub script.sh -v "FILE=f345"

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000
42
Hyperatlas
Standard naming for atlases and pages: TM-5-SIN-20, page 1589
Standard scales: scale s means 2^(20-s) arcseconds per pixel
Standard layouts: TM-5 layout, HV-4 layout
Standard projections: SIN projection, TAN projection
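
The scale rule as code, assuming the 2^(20-s) reading above; s=20 gives 1 arcsec/pixel, matching the 2.77777778E-4 degrees in the service output on the next slides:

def arcsec_per_pixel(s):
    # scale s means 2**(20-s) arcseconds per pixel
    return 2.0 ** (20 - s)

print arcsec_per_pixel(20)   # 1.0 arcsec = 2.77777778E-4 degrees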
43
Hyperatlas is a Service
  • All Pages: /getChart?atlas=TM-5-SIN-20 (and no other arguments)
  • 0 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -90.0
  • 1 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -85.0
  • 2 2.77777778E-4 'RA---SIN' 'DEC--SIN' 36.0 -85.0
  • ...
  • 1731 2.77777778E-4 'RA---SIN' 'DEC--SIN' 288.0 85.0
  • 1732 2.77777778E-4 'RA---SIN' 'DEC--SIN' 324.0 85.0
  • 1733 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 90.0
  • Sky to Page: ?page=1603&RA=182&Dec=62 -- returns page, scale, ctype, RA, Dec, x, y
  • 1603 2.777777777777778E-4 'RA---TAN' 'DEC--TAN' 175.3 60.0 -11180.1 7773.7
  • Best Page: ?RA=182&Dec=62 -- returns page, scale, ctype, RA, Dec, x, y
  • 1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0 4422.4 7292.1
  • Page WCS: ?page=1604 -- returns page, scale, ctype, RA, Dec
  • 1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0
  • Replicated Implementations

44
Hyperatlas Service
  • Page to Sky: ?page=1603&x=200&y=500 -- returns RA, Dec, nx, ny, nz
  • 184.5 60.1 -0.496 -0.039 0.867
  • Relevant pages from sky region: ?tilesize=4096&ramin=200.0&ramax=202.0&decmin=11.0&decmax=12.0
  • -- returns page and tile indices
  • 1015 -1 1
  • 1015 -1 2
  • 1015 -2 1
  • 1015 -2 2
  • 1015 0 1
  • 1015 0 2
  • Implementation
  • baseURL: http://nvo.caltech.edu:8080/hyperatlas (try the services)

page 1015, reference point RA=200, Dec=10
45
GET services from Python
  • This code uses a service to find the best
    hyperatlas page for a given sky location

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center[1]) + "&Dec=" + str(center[2])
stream = urllib.urlopen(hyperatlasURL)
# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas
self.scale = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])
46
VOTable parser in Python
  • From a SIAP URL, we get the XML, and extract the
    columns that have the image references, image
    format, and image RA/Dec

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)
# Make a dictionary for the columns, keyed by UCD
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter += 1
urlColumn = col_ucd_dict["VOXImage_AccessReference"]
formatColumn = col_ucd_dict["VOXImage_Format"]
raColumn = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn = col_ucd_dict["POS_EQ_DEC_MAIN"]
47
VOTable parser in Python
  • Table is a list of rows, and each row is a list
    of table cells

import xml.dom.minidom

table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data += child.data
                    row.append(data)
                table.append(row)
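
  • A usage sketch: combine the column indices found from the UCDs on the previous slide with this table to pull out each image reference and position:

for row in table:
    url = row[urlColumn]        # image access reference
    ra = float(row[raColumn])
    dec = float(row[deColumn])
    print "image at", ra, dec, "->", url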
48
Science Gateways
49
Grid Impediments
Write proposal → Wait 3 months for account → Get logged in → Get certificate → Port code to Itanium → Learn PBS → Learn MPI → Learn Globus → and now do some science....
50
A better way: Graduated Security for Science Gateways
Web form - anonymous → some science....
Register - logging and reporting → more science....
Authenticate X.509 - browser or cmd line → more science....
Write proposal - own account (power user) → big-iron computing....
51
2MASS Mosaicking portal
An NVO-Teragrid project (Caltech IPAC)
52
Three Types of Science Gateways
  • Web-based Portals
  • User interacts with community-deployed web
    interface.
  • Runs community-deployed codes
  • Service requests forwarded to grid resources
  • Scripted service call
  • User writes code to submit and monitor jobs
  • Grid-enabled applications
  • Application programs on users' machines (eg IRAF)
  • Also runs programs on grid resources

53
Secure Web services for Teragrid Access