An Open Standards-based Scalable Heavy Lifting Data Transfer Service for e-Research

Provided by: SteveC163

1
An Open Standards-based Scalable Heavy Lifting
Data Transfer Service for e-Research
  • David Meredith, Peter Turner, Alex Arana, Gerson
    Galang, David Wallom, Phil Kershaw, Weijing Fang,
    Ally Hume, Mario Antonioletti, Steve Crouch

2
Problem
  • Moving data is a growing problem
  • Data increasing in size: difficult to move about
  • Storage
  • Network
  • Initiating data transfers across different
    protocols (data onto/off grids) from a range of
    clients
  • Remote user - desktop, portal
  • Grid Web
  • e.g. copy from beam-line data resource to my home
    storage lab
  • Can't do transfer through clients: not scalable
  • Need something lightweight for users

3
Users/Use Cases
  • For users from e.g.
  • Diamond Synchrotron, STFC
  • Australian Synchrotron Facility
  • Use Cases
  • Hermes (e.g. Oxford Anatomy Institute of Biology
    not wanting to deploy a whole other machine to
    move 100 GBs of data; they want a desktop client
    to do this)
  • NGS Portal
  • Any Commons VFS-style Client
  • SAGA client?

4
High-level Requirements
  • Properties
  • Scalable
  • Durable/Reliable
  • Asynchronous
  • Support protocols: ftp/sftp/http/https/gsiftp/SRB/iRODS/SRM
  • Core requirement: third-party transfer needs to
    be cross-platform (e.g. SRB -> gsiftp)
  • Construct XML that specifies requirements, send
    to third-party service for asynchronous transfer
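The last requirement can be illustrated with a minimal sketch of building such an XML request. The element names (TransferRequest, Source, Target, Overwrite), the example URIs and the function name are hypothetical; they are not taken from OGSA-DMI, JSDL or any other schema mentioned in these slides.

```python
import xml.etree.ElementTree as ET


def build_transfer_request(source_uri, target_uri, overwrite=False):
    """Build a transfer request as an XML string.

    The element and attribute names are illustrative assumptions only.
    """
    root = ET.Element("TransferRequest")
    ET.SubElement(root, "Source").set("uri", source_uri)
    ET.SubElement(root, "Target").set("uri", target_uri)
    ET.SubElement(root, "Overwrite").text = "true" if overwrite else "false"
    return ET.tostring(root, encoding="unicode")


# Example: a cross-protocol request (SRB source, GridFTP target).
request_xml = build_transfer_request(
    "srb://srb.example.org/home/user/scan.dat",
    "gsiftp://storage.example.org/data/scan.dat",
)
```

The client would send this document to the third-party service and return immediately; the bytes themselves never pass through the client.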

5
Realisations
  • Need to discuss at a high level: separate into
    particular layers
  • Top-level service, scheduling/movement
  • Interfaces to individual data protocols (i.e. through VFS)
  • Could go to data service providers and ask them
    to support third-party transfer
  • But process could take too long
  • The tech is already out there
  • Would this go into UMD (Unified Middleware
    Distribution)? They want all projects using
    EU-funded e-Infrastructure

6
Current Cross-Protocol File Transfer: data is
buffered through the client; this does not scale
and is synchronous!
  • VFS/SAGA client, e.g. Portal/Hermes
  • Client provides a single interface to different
    remote file systems (SRB, GsiFTP, FTP, SFTP)
  • File operations (list, upload, download, delete,
    rename)
  • Bit pipe (byte IO stream)
  • Authentication tokens (un/pw, x509?)
  • Auth tokens only in memory on one server; self-contained
  • Piping bytes via the client is a bottleneck and a
    single point of failure, with concurrency issues
  • Remote file systems: SRB/FTP, SFTP/GSIFTP

7
Required / Suggested Architecture: asynchronous,
no concurrency issues, no data buffered via the
client!
  • VFS/SAGA client
  • File operations (list, upload, download, delete,
    rename)
  • Bit pipe (byte IO stream)
  • JMS queue behind WS-I interface
  • Authentication tokens (un/pw, x509?)
  • VFS workers
  • Remote file systems: SFTP/GSIFTP, SRB/FTP
  • Move file transfers to a different server (farm);
    increase bandwidth, concurrency
  • Passing auth tokens around in messages (strong
    security required)
  • Development / testing
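
The queue-and-workers pattern in this architecture can be sketched as follows. This is an illustration only: Python's in-process queue.Queue stands in for the JMS queue, threads stand in for the worker farm, and copy_stub is a hypothetical placeholder for a real cross-protocol copy (which the slides suggest doing through VFS).

```python
import queue
import threading


def copy_stub(source_uri, target_uri):
    """Hypothetical placeholder for a real cross-protocol copy."""
    return f"copied {source_uri} -> {target_uri}"


def worker(jobs, results):
    """Take transfer messages off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        job_id, source_uri, target_uri = job
        results[job_id] = copy_stub(source_uri, target_uri)
        jobs.task_done()


def run_transfers(transfers, n_workers=3):
    """Submit all transfers to the queue and let the workers drain it."""
    jobs = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for job_id, (source_uri, target_uri) in enumerate(transfers):
        jobs.put((job_id, source_uri, target_uri))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
    return results


results = run_transfers([
    ("srb://srb.example.org/x.dat", "gsiftp://grid.example.org/x.dat"),
    ("ftp://ftp.example.org/y.dat", "sftp://lab.example.org/y.dat"),
])
```

The submitting side only enqueues messages; adding workers increases concurrency without changing the client.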
8
Work to date
  • Data transfer currently done via e.g. Hermes
    Client
  • Commons VFS provides ftp/sftp/HTTP/HTTPS/webdav/gsiftp
  • Will always need clients via interface e.g.
    Portal, Hermes, VFS client but have transfer via
    scalable third party service
  • Asynchronous, poll for progress
  • Architecture: underlying VFS code exists,
    deployed in a service-oriented, scalable manner
  • Standards-driven?
  • OGSA-DMI
  • JSDL
  • GridSAM: compute-focused
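
The "asynchronous, poll for progress" interaction can be sketched as below. InMemoryTransferService is a hypothetical stand-in for the remote service that advances one state per poll; the state names (QUEUED, TRANSFERRING, DONE, FAILED) and method names are assumptions, not part of any interface described in the slides.

```python
import itertools
import time


class InMemoryTransferService:
    """Hypothetical stand-in for the remote service; advances one state per poll."""

    def __init__(self, steps=("QUEUED", "TRANSFERRING", "DONE")):
        self._steps = steps
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, source_uri, target_uri):
        """Register a transfer and return its id immediately."""
        job_id = next(self._ids)
        self._jobs[job_id] = 0
        return job_id

    def status(self, job_id):
        """Return the current state, then move the fake job one step on."""
        index = self._jobs[job_id]
        self._jobs[job_id] = min(index + 1, len(self._steps) - 1)
        return self._steps[index]


def wait_for_completion(service, job_id, interval=0.01, max_polls=100):
    """Poll until the job reaches a terminal state; return the states seen."""
    seen = []
    for _ in range(max_polls):
        state = service.status(job_id)
        seen.append(state)
        if state in ("DONE", "FAILED"):
            return seen
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


service = InMemoryTransferService()
job_id = service.submit("srb://srb.example.org/x.dat", "gsiftp://grid.example.org/x.dat")
states = wait_for_completion(service, job_id)
```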

9
DataMINX DTS: Heavy Lifting Data Transfer
Service
  • This is just one possible implementation of this,
    GridSAM another?
  • Under discussion over the last 4 days
  • JMS-based scalability for moving data
    asynchronously/in parallel
  • DTS web service submits to JMS queue
  • DTS worker nodes (VFS clients) pick up JMS
    transfer msgs
  • Can specify in the JMS queue particular machines
    to perform the transfer
  • Within J2EE environment
  • Abstractions with target URIs
  • Through shared connection pool per machine
  • One connection to target URI
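
One way to read "shared connection pool per machine" with "one connection to target URI" is a pool that opens at most one connection per remote endpoint and hands it to every transfer on that worker. The sketch below assumes the pool is keyed by scheme and host, and that a caller-supplied connect function opens the connection; both are assumptions, not details from the slides.

```python
import threading
from urllib.parse import urlsplit


class ConnectionPool:
    """Share one connection per (scheme, host) among transfers on a machine."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory: key -> connection
        self._connections = {}
        self._lock = threading.Lock()

    def get(self, uri):
        """Return the shared connection for the URI's endpoint, opening it once."""
        parts = urlsplit(uri)
        key = (parts.scheme, parts.netloc)
        with self._lock:
            if key not in self._connections:
                self._connections[key] = self._connect(key)
            return self._connections[key]


opened = []


def fake_connect(key):
    """Record each connection that is opened and return a dummy handle."""
    opened.append(key)
    return f"connection-{len(opened)}"


pool = ConnectionPool(fake_connect)
first = pool.get("gsiftp://storage.example.org/data/a.dat")
second = pool.get("gsiftp://storage.example.org/data/b.dat")
other = pool.get("sftp://lab.example.org/home/user/c.dat")
```

Here the two gsiftp transfers reuse a single connection, while the sftp target gets its own.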

10
Other Possible Solution Paths
  • GridSAM does some but not all
  • gLite File Transfer Service does this on a
    large scale
  • Stork
  • Supports ftp/http/gsiftp/nest/srb/srm/csrm/unitree
  • But not web service suitable?
  • Alan W: VBrowser, Hermes-esque?
  • DW: Cloud-based (e.g. Amazon solution?)
  • AH: Parallelisation in OGSA-DAI for compute, here
    is parallelisation for data
  • GridSAM's data transfer is not parallelised
  • Could have job that just moves data but cannot
    guarantee network availability on worker nodes,
    and not architecturally ok
  • If one web service supports a single protocol,
    just extend it

11
Issues
  • It's a big problem with a big suggested solution:
    lots of developer work
  • Need to think about failure use cases
  • Worker node fails: JMS gives you isolation from
    service failure through tested, transaction-based
    durability
  • Need to discuss and uncover other failure cases
  • Specs: do they cover all the use cases?
  • JSDL/HPC File Staging Profile, OGSA-DMI?
  • Interfaces limited?
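
The worker-failure case can be sketched as transactional consumption: a message only counts as consumed once its handler succeeds, and a failure puts it back for redelivery. This is a simplified in-memory illustration, not JMS itself; the attempt limit and the "dead" list for messages that keep failing are assumptions added for the example.

```python
from collections import deque


def process_with_redelivery(messages, handler, max_attempts=3):
    """Consume messages; on handler failure, requeue up to max_attempts."""
    pending = deque((message, 1) for message in messages)
    completed, dead = [], []
    while pending:
        message, attempt = pending.popleft()
        try:
            handler(message)
        except Exception:
            if attempt < max_attempts:
                # "Roll back": the message returns to the queue.
                pending.append((message, attempt + 1))
            else:
                dead.append(message)  # give up after max_attempts
        else:
            completed.append(message)  # "commit": message is consumed
    return completed, dead


calls = {}


def flaky_handler(message):
    """Fail once for 'b' (e.g. a worker crash) and always for 'c'."""
    calls[message] = calls.get(message, 0) + 1
    if message == "b" and calls[message] == 1:
        raise RuntimeError("worker died mid-transfer")
    if message == "c":
        raise RuntimeError("target unreachable")


completed, dead = process_with_redelivery(["a", "b", "c"], flaky_handler)
```

A transfer interrupted by a worker crash is retried by another delivery, while a transfer that can never succeed is set aside rather than retried forever.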

12
Next Steps (Within CW)
  • Recommend further session (Mario, Steve C, Ally,
    David M, Peter T, Alex A, Gerson G, David W,
    Weijian F)
  • Have others critique the design work over the
    last 4 days
  • Possible subdivision for detailed issues
  • High-level requirements discussion
  • Implementation/specification
  • Go over issues with schema specs, possible ways
    forward
  • Possible architectures that can assist the
    problem now: Stork!

13
Next Steps (Out of CW)
  • Spec issues
  • Schedule discussion within OGSA-DMI WG (Mario to
    organise)
  • HPC File Staging Profile/JSDL WGs (David M/Steve
    C to organise)
  • DW to attend the OGF PGI sessions (they will be
    observing); championing necessary changes to
    JSDL/HPC Profile (Steve C)