1
A JSDL Applications Repository and Data Staging
Portal: Some New Parameter Sweep Developments
and Data Transfer Requirements
  • David Meredith
  • STFC e-Science Centre
  • Daresbury Laboratory, UK
  • david.meredith_at_stfc.ac.uk
  • Geoff Williams
  • Oxford University Computing Lab, UK
  • geoff.williams_at_comlab.ox.ac.uk

2
  • What it does
  • Aim: provide an easy way to access computing
    resources, execute installed applications, and
    stage/move data between different remote file
    systems (e.g. onto and off Grid resources).
  • Browse application templates in the repository
    according to categories of interest (e.g.
    bioinformatics, tutorials/examples, physics).
  • Templates fully describe all of the requirements
    of an application for execution (ready-to-run
    applications that provide a starting point for
    new users).
  • Users benefit from immediate access to the
    expertise, artefacts and configuration captured
    in application description templates (e.g.
    published and shared by domain-experts).
  • Select, load, modify / tweak, save as personal
    template.
  • Browse and perform file operations on different
    file systems (currently SRB, GridFTP, FTP, SFTP).
    List, upload, download, delete, rename.
  • Recursive data copy between different file
    systems.
  • Execute application and stage data in one action.

3
Example Use Case (NGS Applications Repository)
Applications are described using the
middleware-agnostic Job Submission Description
Language (JSDL GUI editor). This encourages
community formation around a best-practice
approach (OGF) and aids interoperability.
Middleware-specific dependencies are added at run
time: the JSDL is converted into a
middleware-specific scheme (e.g. RSL).
4
JSDL
Ali Anjomshoaa, Fred Brisard, Michel Drescher,
Donal K. Fellows, William Lee, An Ly, Steve
McGough, Darren Pulsipher, Andreas Savva, Chris
Smith
  • JSDL 1.0 is an OGF standard
  • JSDL 1.0 is published as GFD-R-P.56
    http://www.ggf.org/gf/docs/?final
  • An XML Schema language for describing the
    requirements of computational jobs for submission
    to Grids.
  • Is agnostic of middleware - no dependencies on
    Globus, WSRF, gLite (means portal can be generic
    and not tied to any particular set of Grid
    technologies).
  • JSDL documents can be validated against the JSDL
    and JSDL POSIX XSD Schemas to ensure their
    correctness.

<jsdl:Application>
  <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
  <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
    <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
    <jsdl-posix:Input>input.dat</jsdl-posix:Input>
    <jsdl-posix:Output>output1.png</jsdl-posix:Output>
  </jsdl-posix:POSIXApplication>
</jsdl:Application>
<jsdl:Resources> ...
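As a rough illustration of how a JSDL producer might assemble the fragment on this slide programmatically, the following Python sketch builds the same Application element with the standard library. The namespace URIs are those of the JSDL 1.0 and JSDL-POSIX schemas; the helper name build_application is hypothetical and is not taken from the portal.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the JSDL 1.0 and JSDL-POSIX schemas.
JSDL = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"
ET.register_namespace("jsdl", JSDL)
ET.register_namespace("jsdl-posix", POSIX)


def build_application(name, executable, arguments, stdin=None, stdout=None):
    """Hypothetical helper: build a jsdl:Application element."""
    app = ET.Element(f"{{{JSDL}}}Application")
    ET.SubElement(app, f"{{{JSDL}}}ApplicationName").text = name
    posix = ET.SubElement(app, f"{{{POSIX}}}POSIXApplication")
    ET.SubElement(posix, f"{{{POSIX}}}Executable").text = executable
    for arg in arguments:
        ET.SubElement(posix, f"{{{POSIX}}}Argument").text = arg
    if stdin is not None:
        ET.SubElement(posix, f"{{{POSIX}}}Input").text = stdin
    if stdout is not None:
        ET.SubElement(posix, f"{{{POSIX}}}Output").text = stdout
    return app


app = build_application("gnuplot", "/usr/local/bin/gnuplot",
                        ["control.txt"], stdin="input.dat",
                        stdout="output1.png")
print(ET.tostring(app, encoding="unicode"))
```

Because the element is built against the schema namespaces rather than a middleware API, the same document could later be translated into whatever the target middleware expects.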
5
Pre-configured Job Detail
  • Input fields are pre-configured / filled out.
  • Fields are taken from the JSDL and JSDL-POSIX
    extension schemas.
  • POSIXApplication is a JSDL extension. It defines
    standard POSIX elements.
  • stdin, stdout, stderr
  • Working directory
  • Command line arguments
  • Environment variables

<POSIXApplication>
  <Executable ... />
  <Input ... />?
  <Output ... />?
  <Error ... />?
  <WorkingDirectory ... />?
</POSIXApplication>
6
Pre-configured Environment Variables
<jsdl1:Environment name="TMP">/tmp</jsdl1:Environment>
<jsdl1:Environment name="NGSMODULES">envVarValue1</jsdl1:Environment>
..
7
Pre-configured Command Line Arguments
Paste and parse command line arguments (space
and/or line separated values)
<jsdl1:Argument>fasta34</jsdl1:Argument>
<jsdl1:Argument>-H</jsdl1:Argument>
<jsdl1:Argument>humanDNA2.input</jsdl1:Argument>
<jsdl1:Argument>/var/data/bioinformatics/..</jsdl1:Argument>
<jsdl1:Argument>S</jsdl1:Argument>
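The paste-and-parse step can be sketched in a few lines of Python. This is only an assumption about how such parsing might work, not the portal's code: it splits on any run of spaces or newlines and wraps each token in a JSDL-POSIX Argument element. The function names are hypothetical, and quoted arguments containing spaces are not handled here.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the JSDL-POSIX schema.
POSIX = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def parse_arguments(pasted):
    """Split pasted text on any run of spaces, tabs or newlines."""
    return pasted.split()


def to_argument_elements(args):
    """Wrap each argument in a JSDL-POSIX Argument element."""
    elements = []
    for arg in args:
        el = ET.Element(f"{{{POSIX}}}Argument")
        el.text = arg
        elements.append(el)
    return elements


pasted = "fasta34 -H\nhumanDNA2.input"
args = parse_arguments(pasted)
elements = to_argument_elements(args)
print(args)
```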
8
Pre-configured Named File Systems
Named file systems are used to declare mount
points on the consuming system. File system names
are referenced throughout the portal (and JSDL
doc) for substituting mount points. Changes to a
FS mount point are updated automatically
throughout the portal/JSDL. Used when specifying
path info, e.g. locations of files/dirs, stage
data locations etc.
<jsdl:FileSystem name="WORKINGDIR">
  <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
</jsdl:FileSystem>
<jsdl:FileSystem name="DataDir">
  <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
</jsdl:FileSystem>
<jsdl-posix:Output filesystemName="WORKINGDIR">fasta.out</jsdl-posix:Output>
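The substitution of mount points by name can be pictured with a small Python sketch. It assumes a plain dictionary from file system name to mount point; the resolve function and the replacement mount point /scratch/ngs0024 are hypothetical and only illustrate why a single change propagates everywhere.

```python
import posixpath

# Named file systems and their mount points, as in the slide's example.
file_systems = {
    "WORKINGDIR": "/home/ngs0024/myScratchDir",
    "DataDir": "/home/ngs0024/myDataDir",
}


def resolve(filesystem_name, relative_path, mounts):
    """Return the absolute path of a file given relative to a named file system."""
    return posixpath.join(mounts[filesystem_name], relative_path)


before = resolve("WORKINGDIR", "fasta.out", file_systems)

# Changing the mount point once changes every path that refers to it by name.
file_systems["WORKINGDIR"] = "/scratch/ngs0024"  # hypothetical new mount point
after = resolve("WORKINGDIR", "fasta.out", file_systems)

print(before)
print(after)
```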
9
JSDL Parameter Sweep Extensions
http://forge.gridforum.org/sf/projects/jsdl-wg
  • A common requirement is to select a job and
    submit it 10, 50, 300 times, each time making
    some modifications to the original/master JSDL
    (e.g. args, parameters, output dir, input file).
  • The JSDL PS extensions allow you to group the
    master JSDL with the required modifications
    (which JSDL fields require sweeping).
  • Saves writing multiple separate JSDL docs.
  • A swept value can be any value within the JSDL
    document itself.
  • It can also be any value within a named file that
    is referenced by the JSDL (e.g. an input file).
  • Actually yields multiple separate jobs (rather
    than solely parameter sweeps).

10
Recently submitted for public comment at OGF24,
Sept.
11
JSDL Sweep Overview
  • Nest <Sweep> elements within a JSDL doc.
  • The <Assignment> identifies which set of
    <Parameters> should be swept / iterated using the
    given sweep <Function>.
  • <Parameter> and <Function> are abstract (different
    implementations can be defined as required).
  • Spec v1 Parameters:
    <DocumentNode>, <TemplateFile>
  • Spec v1 Functions:
    <Values>, <LoopInteger>, <LoopDouble>

[Diagram labels: JSDL; Select / give new values for <App> element; JSDL PS; PS ext]
12
Basic Example - Modify the command line using
Values and a Loop
Two arguments are swept, yielding 3 separate
jobs (1, 4, again), (2, 5, again), (3, 7,
again)
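The expansion described on this slide can be sketched in Python. The sketch assumes that the two swept arguments advance together (in lockstep), which is what the three resulting jobs suggest; the function names and the placeholder values in the master argument list are hypothetical, and a real JSDL consumer would operate on the XML document rather than on a list.

```python
def loop_integer(start, end, step=1):
    """Values produced by an inclusive integer loop."""
    return list(range(start, end + 1, step))


def expand_sweep(master_args, assignments):
    """Yield one argument list per job.

    assignments maps an argument index to its list of swept values.
    All lists advance together, so they must have equal length.
    """
    lengths = {len(values) for values in assignments.values()}
    if len(lengths) != 1:
        raise ValueError("swept value lists must have the same length")
    for i in range(lengths.pop()):
        args = list(master_args)
        for index, values in assignments.items():
            args[index] = str(values[i])
        yield args


master = ["0", "0", "again"]      # hypothetical placeholders in the master JSDL
sweep = {0: loop_integer(1, 3),   # loop: 1, 2, 3
         1: [4, 5, 7]}            # explicit values
jobs = list(expand_sweep(master, sweep))
for job in jobs:
    print(job)
```

The third argument is not named in the sweep, so it is carried unchanged into every job.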
13
Portal Implementation
Select which values require sweeping
14
Portal Implementation
Build sweep: identify which parameter, define
function values.
Note: more interface work required (e.g. upload
.csv file for values).
15
JSDL, JSDL POSIX,
PS Extensions
16
Part 2: Data Staging / File Transfer (Portal is
VFS client)
Portal is a single interface to different remote
file systems (FTP, SRB, GridFTP, SFTP). Browse
and perform file operations (upload, download,
delete, list, rename).
17
Manual Copy Data Between Different File Systems
Compile data (spread over different file systems)
Copy data to target URI (e.g. SRB or wherever)
Select files/dirs and copy to opposite host
Copy data between these different file systems.
18
Specify Data Staging Requirements of an
Application
List of data from across the Grid that should be
copied to the consuming system. Before job:
source URI. After job: target URI. JSDL does
not mandate the protocol / URI format. Data is
staged relative to named file systems.
<jsdl:DataStaging>
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://ngs.rl.ac.uk:2811/apps/Siesta_mpi/</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
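To show how a consumer might read the staging element on this slide, here is a minimal Python sketch. It assumes the standard JSDL 1.0 namespace and a dictionary of named file systems; the function plan_stage_in is hypothetical. It only works out what would be copied where and performs no transfer.

```python
import posixpath
import xml.etree.ElementTree as ET

NS = {"jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl"}

DOC = """
<jsdl:DataStaging xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl">
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://ngs.rl.ac.uk:2811/apps/Siesta_mpi/</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
"""

# Mount point of the named file system, taken from the earlier slide.
mounts = {"WORKINGDIR": "/home/ngs0024/myScratchDir"}


def plan_stage_in(staging, mounts):
    """Return (source URI, local path) for a stage-in element, or None."""
    source = staging.find("jsdl:Source/jsdl:URI", NS)
    if source is None:
        return None  # no Source element: nothing to copy before the job
    fs_name = staging.findtext("jsdl:FilesystemName", namespaces=NS)
    file_name = staging.findtext("jsdl:FileName", namespaces=NS)
    return source.text.strip(), posixpath.join(mounts[fs_name], file_name)


plan = plan_stage_in(ET.fromstring(DOC), mounts)
print(plan)
```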
19
Current Data Staging / File Transfer
Implementation (VFS)
File operations (list, upload, download, delete,
rename)
Single interface to different remote file systems
(SRB, GsiFTP, FTP, SFTP).
Bit pipe (byte IO stream)
Authentication tokens (un/pw, x509)
Portal VFS client
Auth tokens only in memory on one server. Self
contained. Piping bytes via portal server is not
ideal (bottleneck, single point of failure,
concurrency issues).
SRB/ FTP
SFTP/ GSIFTP
VFS: Apache Commons VFS
20
Required / Suggested Architecture for Data
Staging / File Transfer Service
File operations (list, upload, download, delete,
rename)
Portal VFS client
Bit pipe (byte IO stream)
Authentication tokens (un/pw, x509)
JMS QUEUE
Move file transfers to a different server (farm)
to increase bandwidth and concurrency (large
transfers). Requires passing auth tokens around
in messages (strong security required) and
development effort / testing.
VFS clients
SFTP/ GSIFTP
SRB/ FTP
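The suggested split between portal and transfer workers can be pictured with a small Python sketch. It is only an analogy under stated assumptions: an in-process queue stands in for the JMS queue, threads stand in for the server farm, in-memory byte streams stand in for remote files, and authentication tokens are left out entirely.

```python
import io
import queue
import shutil
import threading

transfer_queue = queue.Queue()  # stands in for the JMS queue
results = {}


def worker():
    """Take transfer requests off the queue and pipe bytes source -> target."""
    while True:
        request = transfer_queue.get()
        if request is None:  # sentinel: shut this worker down
            transfer_queue.task_done()
            break
        name, source, target = request
        shutil.copyfileobj(source, target, 64 * 1024)  # chunked byte pipe
        results[name] = target.getvalue()
        transfer_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

# The portal only enqueues requests; it never touches the byte streams.
transfer_queue.put(("a.dat", io.BytesIO(b"alpha"), io.BytesIO()))
transfer_queue.put(("b.dat", io.BytesIO(b"beta"), io.BytesIO()))
for _ in workers:
    transfer_queue.put(None)

transfer_queue.join()
for w in workers:
    w.join()
print(sorted(results))
```

In this arrangement, adding workers raises concurrency without changing the code that submits requests, which is the property the slide is after.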
21
Summary
  • Parameter Sweeping
  • The portal's role is JSDL Producer (author/persist
    JSDL with sweep extensions for applications), not
    JSDL Consumer.
  • The JSDL Consumer's role is to enact the JSDL PS,
    e.g. create, submit and stage data for 1000 jobs.
    This is the responsibility of OGSA BES,
    middleware and SAGA.
  • But middleware / SAGA support for PS extensions
    is not yet available; may have to devise a hack
    in the meantime?
  • File Transfer / Data Staging Service
  • Need to support large data transfers by moving
    byte streaming to dedicated servers/services.
  • Will have to pass security tokens from portal to
    staging service (looking at WSS Username and
    Certificate profiles).
  • Here to explore this requirement (also for
    facilities work?) and investigate solutions.