Using the Grid for Astronomy - Roy Williams, Caltech (presentation transcript)

1
Using the Grid for Astronomy
Roy Williams, Caltech
2
Enzo Case Study
  • Simulated dark matter density in early universe
  • N-body gravitational dynamics (particle-mesh
    method)
  • Hydrodynamics with PPM and ZEUS
    finite-difference
  • Up to 9 species of H and He
  • Radiative cooling
  • Uniform UV background (Haardt & Madau)
  • Star formation and feedback
  • Metallicity fields

3
Adaptive Mesh Refinement (AMR)
  • multilevel grid hierarchy
  • automatic, adaptive, recursive
  • no limits on depth or complexity of grids
  • C/F77
  • Bryan & Norman (1998)

Source: J. Shalf
4
Distributed Computing Zoo
  • Grid Computing
  • Also called High-Performance Computing
  • Big clusters, Big data, Big pipes, Big centers
  • Globus backbone, which now includes Services and
    Gateways
  • Decentralized control
  • Cluster Computing
  • local interconnect between identical CPUs
  • Peer-to-Peer (Napster, Kazaa)
  • Systems for sharing data without a central server
  • Internet Computing
  • Screensaver cycle scavenging
  • e.g. SETI@home, Einstein@home, ClimatePrediction.net, etc.
  • Access Grid
  • A videoconferencing system
  • Globus
  • A popular software package to federate resources
    into a grid
  • TeraGrid
  • A $150M award from NSF to the supercomputer
    centers (NCSA, SDSC, PSC, etc.)

5
What is the Grid?
  • The World Wide Web provides seamless access to
    information that is stored in many millions of
    different geographical locations
  • In contrast, the Grid is an emerging
    infrastructure that provides seamless access to
    computing power and data storage capacity
    distributed over the globe.

6
What is the Grid?
  • The term Grid was coined by Ian Foster and Carl
    Kesselman in "The Grid: Blueprint for a New
    Computing Infrastructure".
  • Analogy with the electric power grid: plug in to
    computing power without worrying where it comes
    from, like a toaster.
  • The idea has been around under other names for a
    while (distributed computing, metacomputing, ...).
  • Technology is in place to realise the dream on a
    global scale.

7
What is Middleware?
  • The GRID middleware
  • Finds convenient places for the scientist's job
    (computing task) to be run
  • Optimises use of the widely dispersed resources
  • Organises efficient access to scientific data
  • Deals with authentication to the different sites
  • Interfaces to local site authorisation /
    resource allocation
  • Runs the jobs
  • Monitors progress
  • Recovers from problems
  • and ...
  • Tells you when the work is complete and transfers
    the result back!

8
Grid as Federation
  • Grid as a federation
  • independent centers -> flexibility
  • unified interface -> power and strength
  • Large/small state compromise

9
Three Big Ideas of Grid
  • Federation and Uniformity
  • independent management, uniform face, open standards
  • Trust and Security
  • access policy, uniform authentication/authorization
  • Distance doesn't matter
  • 20 Mbyte/sec, global file system

10
Grid projects in the world
  • DOE Science Grid
  • NSF National Virtual Observatory
  • NSF GriPhyN/iVDGL
  • DOE Particle Physics Data Grid
  • NSF TeraGrid
  • DOE Earth Systems Grid
  • NEESGrid
  • DOH BIRN
  • UK e-Science Grid
  • EUROGRID
  • DataGrid (CERN, ...)
  • EuroGrid (Unicore)
  • DataTag (CERN, ...)
  • GridLab (Cactus Toolkit)
  • CrossGrid (Infrastructure Components)

11
TeraGrid Wide Area Network
12
TeraGrid Components
  • Compute hardware
  • Intel/Linux clusters, Alpha SMP clusters, POWER4
    cluster, ...
  • Large-scale storage systems
  • hundreds of terabytes for secondary storage
  • Very high-speed network backbone
  • bandwidth for rich interaction and tight
    coupling
  • Grid middleware
  • Globus, data management, ...
  • Next-generation applications

13
TeraGrid Resources
14
The TeraGrid Vision: Distributing the resources is
better than putting them at one site
  • Build new, extensible, grid-based infrastructure
  • New hardware, new networks, new software, new
    practices, new policies
  • Leverage homogeneity
  • Run single job across entire TeraGrid
  • Move executables between sites
  • Catch-phrase: Open, Deep and Wide
  • Open: to the US science community
  • Deep: heroic computing possible by programming at
    the Unix level
  • Wide: easy to use through science gateways

15
TeraGrid Allocations Policies
  • Any US researcher can request an allocation
  • http://www.teragrid.org

16
Wide Variety of Usage Scenarios
  • Tightly coupled simulation jobs storing vast
    amounts of data, performing visualization
    remotely as well as making data available through
    online collections (ENZO)
  • Thousands of independent jobs using data from a
    distributed data collection (NVO)
  • Science Gateways "not a Unix prompt"!
  • from web browser with security
  • SOAP client for scripting
  • from an application, e.g. IRAF or IDL

17
Running jobs
18
Account Security
  • Username/Password
  • weak security, too many holes
  • deprecated in many places
  • SSH keys
  • put public key on remote machine
  • serves as single sign-on
  • X.509 Certificates
  • Proves identity
  • Flexible

19
Ways to Submit a Job
  • 1. Directly to PBS Batch Scheduler
  • Simple, scripts are portable among PBS TeraGrid
    clusters
  • 2. Globus common batch script syntax
  • Scripts are portable among other grids using
    Globus
  • 3. Condor-G
  • Condor + Globus
  • 4. Use a science gateway, e.g. Nesssi
  • specific tasks, easy to use

20
PBS Batch Submission
  • Single executable to be run on a single remote
    machine
  • login to a head node, submit to queue
  • Direct, interactive execution
  • mpirun -np 16 ./a.out
  • Through a batch job manager
  • qsub my_script
  • where my_script describes executable location,
    runtime duration, redirection of stdout/err,
    mpirun specification
  • ssh tg-login.sdsc.teragrid.org
  • qsub flatten.sh -v "FILE=f544"
  • qstat or showq
  • ls *.dat
  • pbs.out, pbs.err files

21
Remote submission
  • Through globus
  • globusrun -r some-teragrid-head-node.teragrid.org/jobmanager -f my_rsl_script
  • where my_rsl_script describes the same details as
    in the qsub my_script!
  • Through Condor-G
  • condor_submit my_condor_script
  • where my_condor_script describes the same details
    as the globus my_rsl_script!
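A minimal RSL script of the kind passed to globusrun might look like the sketch below. The executable and arguments are borrowed from the DPOSS flattening example later in the talk; the count, wall-time, and output file names are placeholder assumptions.

&(executable=/home/roy/dposs-flat/flat/flat)
 (directory=/home/roy/dposs-flat/flat)
 (arguments="-infile" "/pvfs/mydata/source/f544.fits")
 (count=1)
 (maxWallTime=60)
 (stdout=flat.out)
 (stderr=flat.err)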

22
Condor-G
  • A Grid-enabled version of Condor that provides
    robust job management for Globus clients.
  • Robust replacement for globusrun
  • Provides extensive fault-tolerance
  • Can provide scheduling across multiple Globus
    sites
  • Brings Condor's job management features to Globus
    jobs
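As an illustration, a Condor-G description file for the same kind of remote job might look roughly like this; the host name and paths are placeholders, and the exact keywords can differ between Condor versions.

universe        = globus
globusscheduler = some-teragrid-head-node.teragrid.org/jobmanager-pbs
executable      = /home/roy/dposs-flat/flat/flat
arguments       = -infile /pvfs/mydata/source/f544.fits
output          = flat.out
error           = flat.err
log             = flat.log
queue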

23
Condor DAGMan
  • Manages workflow interdependencies
  • Each task is a Condor description file
  • A DAG file controls the order in which the Condor
    files are run
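For example, a DAG file for a small diamond-shaped workflow (the task names and submit files are made up) could look like the sketch below; DAGMan runs B and C only after A succeeds, and D only after both B and C finish.

# flatten.dag
JOB  A  fetch.sub
JOB  B  flatten1.sub
JOB  C  flatten2.sub
JOB  D  mosaic.sub
PARENT A CHILD B C
PARENT B C CHILD D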

24
Cluster Supercomputer
(Diagram: the user logs in to a login node and submits work through a job submission and queueing system (Condor, PBS, ...); jobs run on 100s of compute nodes; the nodes do parallel I/O to a purged /scratch parallel file system coordinated by a metadata node, and also mount a global, backed-up /home file system.)
25
MPI parallel programming
  • Each node runs same program
  • first finds its number (rank)
  • and the number of coordinating nodes (size)
  • Laplace solver example

Algorithm: each value becomes the average of its neighbor values.
Serial: for each point, compute the average; remember the boundary conditions.
Parallel: split the grid across nodes (node 0, node 1, ...); run the same algorithm with ghost points, and use messages to exchange the ghost points.
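The talk's Laplace example is written in C/F77 with MPI; the sketch below shows the same ghost-point exchange in Python with mpi4py, assuming a one-dimensional split of the grid into horizontal strips (the sizes and iteration count are placeholders).

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()              # this node's number
size = comm.Get_size()              # number of coordinating nodes

nrows, ncols = 100, 100             # rows owned by this node (placeholder sizes)
u = np.zeros((nrows + 2, ncols))    # local strip plus one ghost row above and below

up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for it in range(1000):
    # exchange ghost rows with the neighbouring nodes
    comm.Sendrecv(u[1],     dest=up,   recvbuf=u[nrows + 1], source=down)
    comm.Sendrecv(u[nrows], dest=down, recvbuf=u[0],         source=up)
    # each interior value becomes the average of its four neighbours
    u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    # (boundary conditions on the outer edges are omitted in this sketch)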
26
Globus
  • Security
  • Single-sign-on, certificate handling, CAS,
    MyProxy
  • Execution Management
  • Remote jobs GRAM and Condor-G
  • Data Management
  • GridFTP, reliable file transfer, third-party file transfer
  • Information Services
  • aggregating information from federated grid
    resources
  • Common Runtime Components
  • web services through GT4
  • The following is a personal opinion,
  • it is NOT the position of the NVO
  • Globus is a complex and difficult installation
  • Globus needs frequent maintenance and updates
  • Globus is monolithic (all or nothing)
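A rough sketch of how the capabilities listed above appear at the command line (the host names and paths are placeholders, not from the talk):

grid-proxy-init
    # single sign-on: create a short-lived proxy credential from your X.509 certificate
globus-job-run tg-login.sdsc.teragrid.org/jobmanager-pbs /bin/hostname
    # GRAM: run a command remotely through the site's PBS jobmanager
globus-url-copy file:///home/roy/data.fits \
    gsiftp://tg-login.sdsc.teragrid.org/pvfs/mydata/data.fits
    # GridFTP: move the data; third-party transfers use gsiftp:// URLs at both ends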

27
Data storage
28
Typical types of HPC storage needs
29
Disk Farms (datawulf)
  • Homogeneous disk farm (= parallel file system)
  • Large files are striped over the disks; compute
    nodes do parallel I/O straight to them
  • A metadata (management) node handles file
    creation, access, ls, etc.
30
Parallel File System
  • Large files are striped
  • very fast parallel access
  • Medium files are distributed
  • Stripes do not all start at the same place
  • Small files choke the PFS manager
  • Either containerize
  • or use blobs in a database
  • not a file system anymore: a pool of 10^8 blobs
    with names
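As a toy illustration of containerizing (not from the talk), many small files can be bundled into a single archive before being placed on the parallel file system; the file names below are made up.

import tarfile, glob

# pack many small FITS cutouts into one container file
tar = tarfile.open("cutouts.tar", "w")
for name in glob.glob("cutouts/*.fits"):
    tar.add(name)
tar.close()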

31
Storage Resource Broker (SRB)
  • Single logical namespace while accessing
    distributed archival storage resources
  • Effectively infinite storage
  • Data replication
  • Parallel Transfers
  • Interfaces: command-line, API, SOAP, web/portal
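The command-line interface is a set of "Scommands"; a short sketch of a session follows (the collection and file names are made up, and the connection settings are assumed to be configured in ~/.srb):

Sinit                      # start an SRB session with your stored connection settings
Sput f544.fits dposs/      # store a local file into a logical collection
Sls dposs                  # list the collection, wherever the replicas actually live
Sget dposs/f544.fits       # retrieve the file, possibly from a replica
Sexit                      # end the session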

32
Storage Resource Broker (SRB): Virtual Resources, Replication
(Diagram: an SRB client (command line or API) accesses data that is replicated across sites such as NCSA and SDSC.)

33
Storage Resource Broker (SRB): Virtual Resources, Replication
  • Similar to the VOSpace concept
  • Clients (browser, SOAP client, command line, ...)
    authenticate with a certificate
  • The physical resources behind the logical namespace
    may be casjobs at JHU, tape at SDSC, myDisk, ...
  • A file may be replicated, and comes with metadata
    that may be customized
34
Containerizing
  • Shared metadata
  • Easier for bulk movement

35
  • Data intensive computing
  • with NVO services

36
Two Key Ideas for Fault-Tolerance
  • Transactions
  • No partial completion -- either all or nothing
  • e.g. copy to a tmp filename, then mv to the
    correct file name
  • Idempotent
  • Acting as if done only once, even if used
    multiple times
  • Can run the script repeatedly until finished
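A minimal Python sketch (not from the talk) of how the two ideas combine in a file-copying step:

import os, shutil

def safe_copy(src, dst):
    # idempotent: if the target already exists with the right size, do nothing
    if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
        return
    # transactional: write to a temporary name, then atomically rename into place
    tmp = dst + ".part"
    shutil.copyfile(src, tmp)
    os.rename(tmp, dst)

Because a failed run leaves at worst a .part file, the script can simply be rerun until everything is copied.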

37
DPOSS flattening
Source image -> Target image: 2650 x 1.1 GByte files; crop the borders; quadratic fit and subtract; virtual data.
38
Driving the Queues
  • Here is the driver that makes and submits jobs

import os
# inputDirectory, outputDirectory, and the filetime() helper
# are assumed to be defined earlier in the script

for f in os.listdir(inputDirectory):
    # if the target file exists, with the right size and age, then we keep it
    ofile = outputDirectory + "/" + f
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:
            print " -- wrong target size, remaking", osize
        else:
            time_tgt = filetime(ofile)
            time_src = filetime(inputDirectory + "/" + f)
            if time_tgt < time_src:
                print " -- target too old or nonexistent, remaking"
            else:
                print " -- already have target file"
                continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print " -- submitting batch job ", cmd
    os.system(cmd)
39
PBS script
  • A PBS script. Can be submitted with: qsub script.sh -v "FILE=f345"

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/$FILE.fits \
  -outfile /pvfs/mydata/target/$FILE.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000
40
GET services from Python
  • This code uses a service to find the best
    hyperatlas page for a given sky location

import urllib

hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
    + "&RA=" + str(center1) + "&Dec=" + str(center2)
stream = urllib.urlopen(hyperatlasURL)

# result is a tab-separated line, so use split() to tokenize
tokens = stream.readline().split('\t')
print "Using page ", tokens[0], " of atlas ", atlas
self.scale  = float(tokens[1])
self.CTYPE1 = tokens[2]
self.CTYPE2 = tokens[3]
rval1 = float(tokens[4])
rval2 = float(tokens[5])
41
VOTable parser in Python
  • From a SIAP URL, we get the XML, and extract the
    columns that have the image references, image
    format, and image RA/Dec

import urllib
import xml.dom.minidom

stream = urllib.urlopen(SIAP_URL)
doc = xml.dom.minidom.parse(stream)

# Make a dictionary mapping column UCD to column number
col_ucd_dict = {}
col_counter = 0
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_FIELD in XML_TABLE.getElementsByTagName("FIELD"):
        col_ucd = XML_FIELD.getAttribute("ucd")
        col_ucd_dict[col_ucd] = col_counter
        col_counter = col_counter + 1

urlColumn    = col_ucd_dict["VOXImage_AccessReference"]
formatColumn = col_ucd_dict["VOXImage_Format"]
raColumn     = col_ucd_dict["POS_EQ_RA_MAIN"]
deColumn     = col_ucd_dict["POS_EQ_DEC_MAIN"]
42
VOTable parser in Python
  • Table is a list of rows, and each row is a list
    of table cells

import xml.dom.minidom

# doc is the parsed VOTable from the previous slide
table = []
for XML_TABLE in doc.getElementsByTagName("TABLE"):
    for XML_DATA in XML_TABLE.getElementsByTagName("DATA"):
        for XML_TABLEDATA in XML_DATA.getElementsByTagName("TABLEDATA"):
            for XML_TR in XML_TABLEDATA.getElementsByTagName("TR"):
                row = []
                for XML_TD in XML_TR.getElementsByTagName("TD"):
                    data = ""
                    for child in XML_TD.childNodes:
                        data = data + child.data
                    row.append(data)
                table.append(row)
43
Science Gateways
44
Grid Impediments
(The slide shows a ladder of hurdles to climb before any science gets done:)
Write proposal
Wait 3 months for account
Get logged in
Get certificate
Port code to Itanium
Learn PBS
Learn MPI
Learn Globus
... and now do some science
45
A better way: Graduated Security for Science Gateways
  • Web form - anonymous
  • Register - logging and reporting: some science
  • Authenticate X.509 (browser or cmd line): more science
  • Write proposal - own account (power user): big-iron computing
46
2MASS Mosaicking portal: An NVO-TeraGrid project (Caltech IPAC)
47
Three Types of Science Gateways
  • Web-based Portals
  • User interacts with community-deployed web
    interface.
  • Runs community-deployed codes
  • Service requests forwarded to grid resources
  • Scripted service call
  • User writes code to submit and monitor jobs
  • Grid-enabled applications
  • Application programs on users' machines (e.g. IRAF)
  • Also runs program on grid resource

48
Nesssi: Secure Web services for astronomy
(Architecture diagram: a client (a web form in a browser, or a SOAP-over-http program) talks to the nesssi web portal; the portal selects a user account and fetches a proxy from a certificate repository governed by certificate policies, submits the work to a queue of compute nodes, and the results are served back over open http from sandbox storage.)
49
Mosaic service
nesssiServer.dpossMosaic.mosaic("-ra 49.1 -dec 60.1 -rawidth 0.5 -decwidth 0.5 -filt f -bgcorr 0")
50
Coadd service
nesssiServer.hyperatlas.run("-bandpass z1 -ra 170.08 -dec 13.275 -rawidth 1.0 -decwidth 1.0")
51
Cutout Service
nesssiServer.cutout.run(sessionID, "-surveys PQgr,PQgi,PQz1,PQz2,SDSSr,SDSSi,SDSSz,2MASSk,2MASSh -size 64")
52
Cutouts from Palomar-Quest, SDSS, and 2MASS of sources from the Veron quasar catalog
53
Amazon Grid (who will pay?)
54
Amazon Grid
  • Simple Storage Service
  • Write, read, and delete.
  • Each object has a unique, developer-assigned key.
  • Authentication mechanisms. Objects can be private
    or public. Rights can be granted to specific
    users.
  • REST and SOAP interfaces
  • Default download protocol is HTTP.
    BitTorrent(TM) also available.
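Since the default download protocol is plain HTTP, a public object can be fetched through the REST interface with nothing more than an HTTP library; the bucket and key below are made-up examples, and private objects additionally require a signed Authorization header built from your AWS keys.

import urllib

# anonymous GET of a public object via the S3 REST interface
url = "http://s3.amazonaws.com/my-public-bucket/dposs/f544.fits"
data = urllib.urlopen(url).read()
open("f544.fits", "wb").write(data)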

55
Amazon Grid
  • Elastic Compute Cloud
  • Create an Amazon Machine Image (AMI) containing
    your applications, libraries, data and associated
    configuration settings.
  • Upload the AMI into Amazon Simple Storage
    Service.
  • Configure security and network access.
  • Start, terminate, and monitor as many instances
    of your AMI as needed.
  • Pay for the instance hours and bandwidth that you
    actually consume.
  • $0.10 per instance-hour consumed
  • $0.20 per GB of data transferred outside of
    Amazon
  • $0.15 per GB-month of Amazon S3 storage

56
Amazon Grid
  • Simple Queue Service
  • Move data between distributed application
    components performing different tasks, without
    losing messages or requiring each component to be
    always available.
  • Unlimited number of queues, unlimited number of
    messages.
  • New messages can be added at any time.
  • A computer can check a queue at any time for
    messages waiting to be read.
  • REST, SOAP and query interfaces.
  • The queue creator determines which other users
    can write to or read from the queue.
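One way to drive the queue service from Python is the third-party boto library (not mentioned in the talk); a rough sketch follows, with made-up queue and message contents.

from boto.sqs.connection import SQSConnection
from boto.sqs.message import Message

conn = SQSConnection("ACCESS_KEY", "SECRET_KEY")   # your AWS credentials
queue = conn.create_queue("mosaic-jobs")           # creates or opens the queue

m = Message()
m.set_body("flatten f544")                         # a work item for another component
queue.write(m)

msgs = queue.get_messages()                        # another machine polls for work
if msgs:
    print msgs[0].get_body()
    queue.delete_message(msgs[0])                  # remove the message once processed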