Montage, Pegasus and ROME

Slides: 17

Provided by: Krista63

Learn more at: http://www.us-vo.org

Transcript and Presenter's Notes

1
THE US NATIONAL VIRTUAL OBSERVATORY
Montage, Pegasus and ROME
G. B. Berriman, J. C. Good, M. Kong, A. Laity (IPAC/Caltech)
J. Jacob, A. Bergou, D. S. Katz (JPL)
R. Williams (CACR)
E. Deelman, G. Singh, M.-H. Su, C. Kesselman (ISI)
2
Montage

Version 1.7 approved for public release
Download page will be at montage.ipac.caltech.edu
Complete user's guide, including caveats
Tested and validated on 2MASS 2IDR images on single-processor Linux platforms
Tested on 10 WCS projections with mosaics smaller than 2 x 2 degrees, and on coordinate transformations from Equatorial J2000 to Galactic and Ecliptic
First release emphasizes accuracy in photometry and astrometry
20 modules, 7560 lines of code, 2595 test cases executed
119 defects reported and 116 corrected

3
Montage Test Results Summary
4
Montage: The Grid Years

Re-projection is slow (2 min for a 2MASS image on a single-processor 1.4 GHz Linux box) → parallel processing
Grid is an abstraction: an array of processors, a grid of clusters
Montage has loosely coupled code, so it can run in any environment
Prototype version of a methodology for running on any grid environment
Many parts of the process can be parallelized
Build a Directed Acyclic Graph (DAG)
DAG is a script that enables parallelization
Describes what is to be run and when, so the flow of processing is specified
DAG is submitted to standard tools for execution
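
A minimal sketch of the DAG idea, not taken from the slides: the job names and dependencies below are hypothetical, and Python's standard graphlib module stands in for the "standard tools" that would execute a real DAG. It groups jobs into stages, where every job in a stage can run in parallel because all of its prerequisites have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical mosaic workflow: each key is a job, each value is the set of
# jobs it must wait for. The job names are illustrative only.
workflow = {
    "project_1": set(),
    "project_2": set(),
    "project_3": set(),
    "diff_1_2": {"project_1", "project_2"},
    "diff_2_3": {"project_2", "project_3"},
    "background_model": {"diff_1_2", "diff_2_3"},
    "coadd": {"background_model"},
}


def parallel_stages(dag):
    """Group jobs into stages; jobs in the same stage do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished
    return stages


stages = parallel_stages(workflow)
for number, jobs in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(jobs)}")
```

In this sketch the three re-projection jobs land in the first stage, which is where the slow step gains most from extra processors.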

5
War and Peace Nebula
6
Montage and Pegasus
Pegasus takes the abstract workflow description, locates the compute resources and data, and produces a concrete DAG that can be run on the Grid
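
The abstract-to-concrete step can be pictured with a toy planner. This is a sketch under stated assumptions and does not use the Pegasus API: the catalog contents, site name, file names, URL and the plan function are all invented for illustration. Abstract jobs name logical programs and files; the planner binds each job to a site that has the program and adds a stage-in step for every input that already exists somewhere.

```python
from dataclasses import dataclass


@dataclass
class AbstractJob:
    name: str
    transformation: str  # logical program name, not a path
    inputs: list         # logical file names
    outputs: list        # logical file names


# Hypothetical catalogs; every entry is invented for illustration.
REPLICAS = {"image_a.fits": "gsiftp://storage.example.org/data/image_a.fits"}
SITES = {"cluster_x": {"reproject": "/opt/example/bin/reproject",
                       "coadd": "/opt/example/bin/coadd"}}


def plan(jobs, replicas, sites):
    """Turn abstract jobs (assumed to be in dependency order) into concrete steps."""
    produced = {name for job in jobs for name in job.outputs}
    concrete = []
    for job in jobs:
        # Pick the first site that has the required program installed.
        site = next((s for s, programs in sites.items()
                     if job.transformation in programs), None)
        if site is None:
            raise LookupError(f"no site provides {job.transformation}")
        for logical_name in job.inputs:
            if logical_name in produced:
                continue  # created by another job in this workflow
            if logical_name not in replicas:
                raise LookupError(f"no replica known for {logical_name}")
            concrete.append({"type": "stage_in", "site": site,
                             "source": replicas[logical_name]})
        concrete.append({"type": "compute", "job": job.name, "site": site,
                         "executable": sites[site][job.transformation]})
    return concrete


abstract_jobs = [
    AbstractJob("project_a", "reproject", ["image_a.fits"], ["proj_a.fits"]),
    AbstractJob("mosaic", "coadd", ["proj_a.fits"], ["mosaic.fits"]),
]
steps = plan(abstract_jobs, REPLICAS, SITES)
for step in steps:
    print(step)
```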
7
Why ROME and Why Not Apache?

Apache accepts HTTP requests over a TCP/IP network and returns HTML documents
Accepts requests anonymously, parses requests, checks if the executable is in the path, and runs it
Works very well when the response is fast
BUT it has no memory of the request, and so cannot manage information and respond to messages
Apache's limitations are exposed when data processing or requests take an indeterminate time (hours, days, even weeks): complex database queries, large-scale image processing, or large-scale statistical analysis
Hence the need for a simple, portable request management environment that can work in conjunction with existing browsers, HTTP services and custom client environments to provide reliable execution of long-lived jobs and to communicate status information to clients in more detailed ways
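
To make "memory of the request" concrete, here is a minimal sketch of a persistent request store. It is not ROME's schema or code: the RequestStore class, the table layout and the status strings are assumptions, and SQLite stands in for whatever DBMS a real deployment would use. The point is only that the job and its messages outlive the HTTP exchange that created them, so a client can come back hours later and ask what happened.

```python
import sqlite3
import uuid


class RequestStore:
    """Hypothetical job store: remembers each request and its messages."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS jobs "
                        "(id TEXT PRIMARY KEY, owner TEXT, status TEXT)")
        self.db.execute("CREATE TABLE IF NOT EXISTS messages "
                        "(job_id TEXT, seq INTEGER, body TEXT)")

    def submit(self, owner):
        """Record a new request and return its identifier immediately."""
        job_id = uuid.uuid4().hex
        self.db.execute("INSERT INTO jobs VALUES (?, ?, 'queued')", (job_id, owner))
        self.db.commit()
        return job_id

    def record(self, job_id, status, body):
        """Store a message from the running job and update its status."""
        seq = self.db.execute("SELECT COUNT(*) FROM messages WHERE job_id = ?",
                              (job_id,)).fetchone()[0]
        self.db.execute("INSERT INTO messages VALUES (?, ?, ?)", (job_id, seq, body))
        self.db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        self.db.commit()

    def status(self, job_id):
        row = self.db.execute("SELECT status FROM jobs WHERE id = ?",
                              (job_id,)).fetchone()
        return row[0] if row else None

    def messages_since(self, job_id, seq):
        """Return messages numbered seq or later, e.g. after a client reconnects."""
        rows = self.db.execute("SELECT body FROM messages WHERE job_id = ? "
                               "AND seq >= ? ORDER BY seq", (job_id, seq))
        return [row[0] for row in rows]


store = RequestStore()
job_id = store.submit("user@example.org")
store.record(job_id, "running", "job started")
store.record(job_id, "running", "50 percent done")
print(store.status(job_id), store.messages_since(job_id, 1))
```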

8
ROME Demonstration - Registration
9
User preferences
10
ROME Demonstration - Job Submission

Custom order for mosaics of ISSA images submitted
to a Linux processor

11
Job Information Filters
12
ROME Interactive Request Monitor
13
Rho Oph Orion
14

ROME Architectural Diagram
Clients include browsers, NVO portals, and user-built custom code
The heart of ROME is an EJB container tightly coupled with a DBMS
The container is where special hooks exist to simplify synchronization of user and service interaction
The choice of container and DBMS is immaterial; WebLogic and Informix were used during initial development
ROME does not start processing; a special processor does this:
  Contacts ROME (via Servlet URL) to get job parameters
  Starts the CGI program for the user
  Processes messages from the CGI program through stdout
  Processes kill or abort requests
The processor is currently a very simple Java VM
  Can be run anywhere on the net
  Can in principle be implemented in other languages
Applications can be as simple as reusing existing CGI programs, but should support more complex processing
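
The processor's duties listed above can be sketched in a few lines. This is an illustration in Python rather than the Java processor the slide describes, and it leaves out the servlet call that fetches job parameters. The run_job function, its status labels and the message texts are assumptions. It starts an application, forwards every line the application prints to stdout, and stops the application if an abort has been requested.

```python
import subprocess
import sys


def run_job(command, forward, should_abort=lambda: False):
    """Run one application and forward each stdout line as a message.

    should_abort is polled after every message, so in this simple sketch an
    abort request only takes effect when the application next prints a line.
    """
    child = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    aborted = False
    for line in child.stdout:
        forward(line.rstrip("\n"))
        if should_abort():
            child.terminate()  # kill or abort request from the user
            aborted = True
            break
    child.stdout.close()
    child.wait()
    if aborted:
        return "aborted"
    return "completed" if child.returncode == 0 else "failed"


# A stand-in for a CGI program: it prints a start, progress and completion message.
demo_command = [sys.executable, "-c",
                "print('STARTED'); print('PROGRESS 50'); print('DONE')"]
received = []
status = run_job(demo_command, received.append)
print(status, received)
```

Because the application only has to print lines of text, an existing CGI program can be reused unchanged, which matches the slide's point about simple applications.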

15
ROME Processing Scenario

User registers with ROME. This is necessary for messaging (including completion notification). The user's identity is simply their email address.
User submits job to ROME through the User Registration servlet. The user is added to the DBMS.
A processor (there can be many) asks ROME for a job to process (through the Get Next Request servlet).
The processor starts (or potentially continues talking to) an application (e.g. a CGI program) which does the real work.
The application at a minimum emits messages (text printed to stdout) when the job starts and when it completes. In addition, it can optionally emit progress report messages at any time.
On completion, all data products of the application will have been saved to a temporary workspace in the application file system. This workspace is HTTP accessible, and the completion message from the application contains a pointer to this data.
All messages are forwarded from the processor to the ROME core, where they are stored in the DBMS and forwarded to the user either directly (if they are using a client which can register a message socket with ROME), later (if the user reconnects with such a client), or eventually by email (email is usually only for completion status messages).
The user (manually through a browser or with degrees of automation through custom GUI clients) retrieves the data.
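
The three delivery paths in the scenario (directly, later on reconnect, or by email) can be sketched as a small routing function. Everything here is hypothetical: the UserChannel class, the deliver and reconnect functions and the returned labels are not ROME names, lists stand in for the socket, the DBMS and the mail system, and email is sent at once for simplicity where the slide says "eventually".

```python
from dataclasses import dataclass, field


@dataclass
class UserChannel:
    email: str
    socket_registered: bool = False
    delivered: list = field(default_factory=list)  # sent over the message socket
    pending: list = field(default_factory=list)    # stored, awaiting a reconnect
    emailed: list = field(default_factory=list)    # sent to the email address


def deliver(user, message, is_completion=False):
    """Route one message and report which path it took."""
    if user.socket_registered:
        user.delivered.append(message)
        return "direct"
    user.pending.append(message)  # kept so a later reconnect can replay it
    if is_completion:
        user.emailed.append(message)  # email is reserved for completion status
        return "email"
    return "queued"


def reconnect(user):
    """The user returns with a socket-capable client; replay stored messages."""
    user.socket_registered = True
    user.delivered.extend(user.pending)
    user.pending.clear()


user = UserChannel("user@example.org")
routes = [
    deliver(user, "job started"),
    deliver(user, "50 percent done"),
    deliver(user, "job complete: results in workspace", is_completion=True),
]
reconnect(user)
print(routes, user.delivered, user.emailed)
```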