HTTP/HTTPS as Grid data transport, 6 March 2003

Transcript and Presenter's Notes
1
HTTP/HTTPS as Grid data transport 6 March 2003
  • Andrew McNab, University of Manchester
  • mcnab_at_hep.man.ac.uk

2
Overview
  • EDG Motivations
  • Why use HTTP(S) for data transport
  • What needs promoting/agreeing?
  • Example: multistream HTTP
  • Extensions to HTTP(S)
  • Example: delegation over HTTPS
  • HTTP(S) vs alternatives

3
Background: EDG Motivations
  • EU DataGrid is interested in large High Energy
    Physics, Earth Observation and Bio/Medical
    datasets.
  • Currently using GridFTP and HEP-specific RFIO
    protocol for bulk data transfer
  • EDG has modular Storage Element fileservers
    which can support additional transfer protocol
    front-ends.
  • Looking at adding support for HTTP(S) to Storage
    Elements
  • Widespread availability and quality of HTTP
    clients discussed later.
  • Interest in remote filesystems using GSI
    credentials
  • (cf Kerberos and AFS)
  • need protocol with low overhead, reuse of
    connections etc.
  • Also interest in delegation extensions for some
    aspects of information services

4
Why use HTTP(S) for data transport? (1)
  • HTTP(S) are interesting and important protocols
    for several reasons
  • HTTPS is by far the most widely deployed secure
    protocol
  • HTTP(S) has a large amount of high quality
    software that we can leverage
  • has excellent interaction with Firewalls, Network
    Address Translation and Application Proxies
  • HTTP is the basis for most Web and Grid Services
    work
  • HTTPS consists of HTTP/1.1 over an SSL connection
  • security done by SSL layer, using X.509
    certificates (including GSI)
  • HTTP/1.1 (rfc2616) and extensions like WebDAV
    (rfc2518) have a rich set of methods (GET, PUT,
    DELETE, COPY, LOCK etc), headers (Expires
    etc) and Errors (413 Request Entity Too Large)
  • so a standard way exists already for many data
    transfer operations

5
Why use HTTP(S) for data transport? (2)
  • HTTP includes mechanisms for redirection and for
    offering multiple versions and letting the client
    choose.
  • HTTP's Range header allows partial GET and PUT
    operations
  • this makes it possible to implement multi-stream
    HTTP, with multiple TCP streams coming from one
    server, or striped across multiple servers.
  • In practice, HTTP can be as fast as other
    TCP-based protocols
  • eg multistream copying of 300MB files across
    Europe by HTTP or GridFTP
  • A very large amount of effort goes into producing
    HTTP(S) servers and clients with particular
    robustness or efficiency properties
  • eg kernel-based zero-copy HTTP servers like TUX
    are very efficient

6
What needs promoting/agreeing to use this?
  • Informational
  • eg What can be achieved using HTTP(S)
  • eg performance of HTTP(S) vs other protocols for
    given context
  • Best Practice
  • eg How existing standards should be used to
    achieve particular performance / functionality.
  • Standards
  • eg Which part of existing standards should go
    from MAY to SHOULD or MUST in a Grid data
    transport context.
  • eg What extensions or new standards do we need to
    achieve particular functionality or performance.

7
Best practice example: multistream HTTP
  • HTTP can support application-level multiple
    streams and striping by using the standard Range
    header from RFC 2616 (HTTP/1.1) to set up many
    partial fetches.
  • This mechanism is supported by almost all modern
    web servers
  • eg Apache and Red Hat's TUX kernel httpd
  • Multiple streams implemented by client splitting
    into threads
  • Each thread requests a block of the file from the
    server
  • As each request completes, thread finds next
    unfetched block and requests it
  • For this, it is essential that servers support
    the Range header, and yet this is a relatively
    obscure feature in Web contexts, which many
    developers are not aware of.
  • So a best practice statement would be "support
    the Range header"
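The client-side scheme described on this slide can be sketched roughly as follows. This is an illustrative sketch, not the implementation used in the talk: the function names, default block size and stream count are assumptions, and the file size is assumed to be known in advance (for example from a HEAD request). Each thread repeatedly claims the next unfetched block and requests it with a Range header.

```python
import threading
import urllib.request


def http_range_fetcher(url):
    """Return a function fetching bytes start..end (inclusive) of `url`."""
    def fetch(start, end):
        request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(request) as response:
            # 206 Partial Content means the server honoured the Range header;
            # a 200 would mean it ignored it and sent the whole file.
            if response.status != 206:
                raise RuntimeError("server did not honour the Range header")
            return response.read()
    return fetch


def multistream_fetch(fetch, size, block_size=1 << 20, streams=4):
    """Fetch `size` bytes using `streams` threads, one block per request."""
    starts = iter(range(0, size, block_size))  # offsets of unfetched blocks
    lock = threading.Lock()
    blocks = {}
    errors = []

    def worker():
        while True:
            with lock:  # claim the next unfetched block, if any
                if errors:
                    return
                start = next(starts, None)
            if start is None:
                return
            end = min(start + block_size, size) - 1
            try:
                data = fetch(start, end)
            except Exception as exc:  # record the failure and stop this thread
                with lock:
                    errors.append(exc)
                return
            with lock:
                blocks[start] = data

    threads = [threading.Thread(target=worker) for _ in range(streams)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
    # Reassemble the blocks in file order.
    return b"".join(blocks[start] for start in sorted(blocks))
```

A call might look like `multistream_fetch(http_range_fetcher(url), size)`, where `url` and `size` are supplied by the caller. Keeping the fetch function separate means the same loop could be pointed at several servers for striping, or at an in-memory stand-in for testing.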

8
Extensions to HTTP(S)
  • HTTPS/HTTP already have most of the functionality
    we need for Grid information/control/data
    transport
  • some of these come from several sources (eg the
    WebDAV RFC2518 not just HTTP/1.1 itself) and can
    be done in different ways
  • frequently MAY -> SHOULD / MUST
  • so want to specify a sufficient subset for
    interoperability
  • However, can identify some extensions that are
    also valuable
  • delegation over HTTPS
  • some way of returning access control information
    along with data
  • may want to specify TCP parameters for bulk data
    transfer
  • so want to specify new HTTP headers and methods
    for the above
  • (My feeling is that we should retain backwards
    and pass-through compatibility with existing
    HTTP(S) implementations.)

9
Example: delegation over HTTPS
  • Client issues GET-PROXY-REQ request.
  • server generates a key and a certificate request,
    returns this in the response message body.
  • Client signs the cert request, and returns it in
    body of PUT-PROXY-CERT request.
  • Need a Delegation-ID header in the above
    exchanges so can keep track of the delegation
    session
  • may want to maintain delegation sessions for the
    same user at one server, but with different
    amounts of delegation
  • subsequent GET, PUT etc actions carry on using
    the Delegation-ID
  • Most clients and servers can pass through unknown
    methods/headers
  • Delegation-unaware server responds with 501
    Method not implemented
  • (Demonstration implementation of this in GridSite)

10
(Extended) HTTP(S) vs alternatives
  • Could use existing protocols: GridFTP etc
  • HTTP(S) motivated for reasons given at start
  • Some environments (eg NAT) better suited to
    HTTP(S)
  • Could use ad-hoc conventions for some things
  • eg always use POST /cgi-bin/delegation.cgi
    for delegation
  • messy to implement, difficult to agree /
    standardize
  • difficult to implement transparently (eg for
    Trusted Caches)
  • Could do it all in SOAP, Web Services etc
  • worried about efficiency of encoding, set-up time
    of transfers etc: what if we want to grab a large
    number of small files, say?
  • only works for SOAP- or WS- applications.
  • So HTTP(S) appears to address Grid data transport
    in some contexts better than other protocols.

11
Summary
  • HTTP(S) interesting because
  • widespread adoption, widespread support by
    multiple languages/platforms
  • coexists well with NAT etc
  • HTTPS naturally interoperates with GSI-based
    security
  • protocol has many features (eg Range header)
    which are very useful for data transport
  • Scope for doing informational, best practice and
    standardisation activities
  • how much (other) interest is there in doing this
    in GGF?