Title: Montage, Pegasus and ROME
1THE US NATIONAL VIRTUAL OBSERVATORY
Montage, Pegasus and ROME
G. B. Berriman, J. C. Good, M. Kong, A. Laity (IPAC/Caltech)
J. Jacob, A. Bergou, D. S. Katz (JPL)
R. Williams (CACR)
E. Deelman, G. Singh, M.-H. Su, C. Kesselman (ISI)
2Montage
- Version 1.7 approved for public release
- Download page will be at montage.ipac.caltech.edu
- Complete users guide including caveats
- Tested and validated on 2MASS 2IDR images on
single-processor Linux platforms
- Tested on 10 WCS projections with mosaics smaller
than 2 x 2 degrees and coordinate transformations
Equ J2000 to Galactic and Ecliptic
- First release emphasizes accuracy in photometry
and astrometry
- 20 modules, 7560 lines of code, 2595 test cases
executed
- 119 defects reported and 116 corrected
3Montage Test Results Summary
4Montage The Grid Years
- Re-projection is slow (2 min for a 2MASS image on a
single-processor 1.4 GHz Linux box) -> parallel
processing
- Grid is an abstraction - array of processors,
grid of clusters
- Montage has loosely coupled code - run on any
environment
- Prototype version of a methodology for running on
any grid environment
- Many parts of the process can be parallelized
- Build a Directed Acyclic Graph (DAG)
- DAG is a script that enables parallelization
- Describes what is to be run and when, so flow of
processing is specified
- DAG is submitted to standard tools for execution
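As a rough illustration of how a DAG exposes parallelism, the sketch below groups jobs into waves that could run concurrently. It is not Montage code: the job names are hypothetical placeholders, and the standard-library `graphlib` module (Python 3.9+) stands in for whatever tool actually executes the DAG.

```python
from graphlib import TopologicalSorter


def parallel_levels(dag):
    """Group jobs into waves; all jobs in one wave can run at the same time."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


# Hypothetical mosaic workflow: each job maps to the set of jobs it waits for.
mosaic_dag = {
    "reproject_a": set(),
    "reproject_b": set(),
    "reproject_c": set(),
    "coadd": {"reproject_a", "reproject_b", "reproject_c"},
}

levels = parallel_levels(mosaic_dag)
for wave, jobs in enumerate(levels):
    print(wave, jobs)
```

In this toy graph the three re-projection jobs have no dependencies on each other, so they land in the same wave, while the co-addition step waits for all of them.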
5War and Peace Nebula
6Montage and Pegasus
Pegasus takes the abstract workflow
description, locates the compute resources and
data and produces a concrete DAG which can be run
on the Grid
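The mapping from an abstract workflow to a concrete one can be pictured with the toy planner below. This is only a sketch of the idea and does not use the Pegasus API: the function `plan`, the dictionary layouts, the catalog contents, and all file, site, and executable names are assumptions made for illustration.

```python
def plan(abstract_jobs, replica_catalog, site_catalog):
    """Bind each abstract job to a site and its known inputs to physical URLs."""
    concrete = []
    for job in abstract_jobs:
        # Pick the first site that offers the executable (StopIteration if none).
        site = next(name for name, exes in site_catalog.items()
                    if job["exe"] in exes)
        # Files missing from the catalog are intermediate products; keep
        # their logical names.
        inputs = [replica_catalog.get(lfn, lfn) for lfn in job["inputs"]]
        concrete.append({**job, "site": site, "inputs": inputs})
    return concrete


# Hypothetical abstract workflow: logical file names, no sites chosen yet.
abstract_jobs = [
    {"id": "reproject_a", "exe": "reproject",
     "inputs": ["image_a.fits"], "after": []},
    {"id": "coadd", "exe": "coadd",
     "inputs": ["reprojected_a.fits"], "after": ["reproject_a"]},
]
replica_catalog = {"image_a.fits": "http://data.example.org/image_a.fits"}
site_catalog = {"cluster_1": {"reproject"}, "cluster_2": {"reproject", "coadd"}}

concrete = plan(abstract_jobs, replica_catalog, site_catalog)
```

The dependency information (`after`) is carried over unchanged; only the "where" (site) and the "which copy" (physical location of the data) are filled in.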
7Why ROME and Why Not Apache?
- Apache accepts HTTP requests over a TCP/IP
network and returns HTML documents
- Accepts requests anonymously, parses requests,
checks if executable is in path, runs it
- Works very well when response is fast
- BUT it has no memory of the request and so
cannot manage information and respond to messages
- Apache's limitations are exposed when data
processing or requests take an indeterminate
time (hours, days, even weeks):
  - complex database queries,
  - large-scale image processing, or
  - large-scale statistical analysis
- Needed: a simple, portable request management
environment that works in conjunction with
existing browsers, HTTP services and custom
client environments to provide reliable execution
of long-lived jobs and to communicate status
information to clients in more detail
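To make the contrast with a stateless server concrete, here is a minimal sketch of a request manager that remembers each request and its messages. The class `RequestManager`, its method names, and the status strings are all hypothetical; an in-memory dictionary stands in for persistent storage.

```python
import itertools


class RequestManager:
    """Toy request manager that keeps a record of every submitted job."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, user, command):
        """Record the request and return at once; the work happens later."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"user": user, "command": command,
                              "status": "queued", "messages": []}
        return job_id

    def report(self, job_id, status, text):
        """Store a status update coming back from a long-running job."""
        job = self._jobs[job_id]
        job["status"] = status
        job["messages"].append(text)

    def status(self, job_id):
        """Let a client ask about a job hours or days after submitting it."""
        job = self._jobs[job_id]
        return job["status"], list(job["messages"])


manager = RequestManager()
job_id = manager.submit("user@example.org", "build_mosaic")
manager.report(job_id, "running", "re-projection started")
print(manager.status(job_id))
```

Because the request outlives the HTTP exchange that created it, the client can disconnect and later ask for the status of the same job.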
8ROME Demonstration - Registration
9User preferences
10ROME Demonstration - Job Submission
- Custom order for mosaics of ISSA images submitted
to a Linux processor
11Job Information Filters
12ROME Interactive Request Monitor
13Rho Oph Orion
14ROME Architectural Diagram
- Clients include browsers, NVO portals, and
user-built custom code
- The heart of ROME is an EJB container tightly
coupled with a DBMS
- Container where special hooks exist to simplify
synchronization of user and service interaction
- Container and DBMS immaterial - during initial
development used WebLogic and Informix
- ROME does not start processing - a special
processor does this:
  - Contacts ROME (via Servlet URL) to get job
  parameters
  - Starts CGI program for user
  - Processes messages from the CGI program through
  stdout
  - Processes kill or abort requests
- Processor is currently a very simple Java VM
- Can be run anywhere on the net.
- Can in principle be implemented in other
languages.
- Applications can be as simple as reusing existing
CGI programs, but should support more complex
processing.
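The processor's core task, starting a program and treating each line it prints to stdout as a message, can be sketched as follows. This is an illustration in Python rather than the Java processor described above; the function `run_application`, the one-message-per-line format, and the stand-in application are assumptions.

```python
import subprocess
import sys


def run_application(argv, forward):
    """Start the application and forward each non-empty stdout line."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        line = line.strip()
        if line:
            forward(line)  # in a real processor this would go back to ROME
    return proc.wait()     # exit code of the application


# Stand-in for a CGI program: a tiny script that prints three messages.
demo_app = [sys.executable, "-c",
            "print('started'); print('50 percent'); print('done')"]

received = []
exit_code = run_application(demo_app, received.append)
print(received, exit_code)
```

Kill or abort handling is left out of this sketch; with `subprocess`, a processor could call `proc.terminate()` on the running application when such a request arrives.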
15ROME Processing Scenario
- User registers with ROME. This is necessary for
messaging (including completion notification).
The user identity is simply their email
address.
- User submits job to ROME through the User
Registration servlet. The user is added to the
DBMS.
- A processor (there can be many) asks ROME for a
job to process (through the Get Next Request
servlet).
- The processor starts (or potentially continues
talking to) an application (e.g. a CGI program)
which does the real work.
- The application at a minimum emits messages (text
printed to stdout) when the job starts and when it
completes. In addition it can optionally emit
progress report messages at any time.
- On completion, all data products of the
application will have been saved to a temporary
workspace in the application file system. This
workspace is HTTP accessible and the completion
message from the application contains a pointer
to this data.
- All messages are forwarded from the processor to
the ROME core, where they are stored in the DBMS
and forwarded to the user either directly (if
they are using a client which can register a
message socket with ROME), later (if the user
reconnects with such a client), or eventually by
email (email is usually only for completion
status messages).
- The user (manually through a browser or with
degrees of automation through custom GUI clients)
retrieves the data.
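The message-forwarding step of this scenario (store everything, deliver directly to a connected client, hold messages for a later reconnection, fall back to email for completion) can be sketched as below. The class `MessageRouter` and its fields are hypothetical, plain lists stand in for the DBMS and the mail system, and the rule "email a completion message only if the user is not connected" is a simplifying assumption.

```python
class MessageRouter:
    """Toy model of message routing between processors and users."""

    def __init__(self):
        self.stored = []     # every message is kept (the DBMS role)
        self.listeners = {}  # user -> callback of a connected client
        self.pending = {}    # user -> messages waiting for a reconnect
        self.outbox = []     # (user, text) pairs that would be emailed

    def connect(self, user, callback):
        """Register a client and flush anything that arrived while away."""
        self.listeners[user] = callback
        for text in self.pending.pop(user, []):
            callback(text)

    def disconnect(self, user):
        self.listeners.pop(user, None)

    def deliver(self, user, text, final=False):
        self.stored.append((user, text))
        if user in self.listeners:
            self.listeners[user](text)          # direct delivery
        else:
            self.pending.setdefault(user, []).append(text)
            if final:
                self.outbox.append((user, text))  # completion goes by email


router = MessageRouter()
seen = []
router.deliver("user@example.org", "job started")
router.connect("user@example.org", seen.append)  # held message arrives now
router.deliver("user@example.org", "50% complete")
router.disconnect("user@example.org")
router.deliver("user@example.org", "job complete", final=True)
```

After this run the connected client has seen the first two messages, the completion message sits in the email outbox, and all three messages remain in storage.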
16Whither Next?
- Submit requests through ROME for processing jobs
running on grids
- Montage, others, . . .
- Support executives requesting complex jobs and
pipelines
- Support registries
- Pass security certificates
- Open Source EJB server