Protocols and Services for Distributed Data-Intensive Science

1
Protocols and Services for Distributed
Data-Intensive Science
  • Bill Allcock, ANL
  • ACAT Conference 19 Oct 2000
  • Fermi National Accelerator Laboratory
  • Contributors: Ian Foster, Carl Kesselman, Steve
    Tuecke, Ann Chervenak

2
The Globus Data Grid
  • Two major components
  • 1. Data Transport and Access
  • Common protocol
  • Secure, efficient, flexible, extensible data
    movement
  • Family of tools supporting this protocol
  • 2. Replica Management Architecture
  • Simple scheme for managing
  • multiple copies of files
  • collections of files
  • APIs, white papers: http://www.globus.org

3
GridFTP
  • Data Transport and Access

4
GridFTP Basic Approach
  • FTP is defined by several IETF RFCs
  • Start with most commonly used subset
  • Standard FTP: get/put etc., 3rd-party transfer
  • Implement standard but often unused features
  • GSS binding, extended directory listing, simple
    restart
  • Extend in various ways, while preserving
    interoperability with existing servers
  • Parameter set/negotiate, parallel transfers
    (multiple TCP streams), striped transfers
    (multiple hosts), partial file transfers,
    automatic/manual TCP buffer setting, progress
    monitoring, extended restart
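
As a rough illustration of how parallel transfers and partial file transfers fit together, the sketch below splits a file into contiguous byte ranges and moves each range on its own worker. It is not GridFTP code: `split_ranges` and `parallel_copy` are hypothetical names, and in-memory bytes with threads stand in for files and TCP streams.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Split [0, size) into at most `streams` contiguous (offset, length) ranges."""
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams must be >= 1")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
            offset += length
    return ranges


def parallel_copy(source, streams=4):
    """Copy `source` (bytes) by moving every range on a separate worker.

    Assumption: each worker plays the role of one TCP stream, and writing a
    block at its offset plays the role of a partial file transfer.
    """
    dest = bytearray(len(source))

    def move(byte_range):
        offset, length = byte_range
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(move, split_ranges(len(source), streams)))
    return bytes(dest)
```

Because every block carries its own offset, blocks may arrive in any order, which is also what makes restarting from a set of completed ranges possible.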

5
The GridFTP Family of Tools
  • Patches to existing FTP code
  • GSI-enabled versions of existing FTP client and
    server, for high-quality production code
  • Custom-developed libraries
  • Implement full GridFTP protocol, targeting custom
    use, high-performance
  • Custom-developed tools
  • Servers and clients with specialized
    functionality and performance

6
Family of Tools: Patches to Existing Code
  • Patches to standard FTP clients and servers
  • gsi-ncftp: widely used client
  • gsi-wuftpd: widely used server
  • GSI-modified HPSS pftpd
  • GSI-modified Unitree ftpd
  • Provides high-quality, production ready, FTP
    clients and servers
  • Integration with common mass storage systems
  • Do not support the full GridFTP protocol

7
Family of Tools: Custom Developed Libraries
  • Custom developed libraries
  • globus_ftp_control: low-level FTP driver
  • Client/server protocol and connection
    management
  • globus_ftp_client: simple, reliable FTP client
  • globus_gass_copy: simple URL-to-URL copy library,
    supporting (Grid-)ftp, http(s), file URLs
  • Implement full GridFTP protocol
  • Various levels of libraries, allowing
    implementation of custom clients and servers
  • Tuned for high performance on WAN
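
To make the idea of a URL-to-URL copy library concrete, here is a minimal sketch that picks a reader and a writer by URL scheme. It does not use or reproduce the globus_gass_copy API; `url_copy`, `READERS` and `WRITERS` are hypothetical names, and only `file` URLs are handled so the example stays self-contained.

```python
import shutil
from urllib.parse import urlparse
from urllib.request import url2pathname


def _open_file_url(url, mode):
    """Open the local file named by a file:// URL."""
    return open(url2pathname(urlparse(url).path), mode)


# One handler per scheme. A fuller library would register ftp, http(s),
# etc. here; this sketch assumes only "file".
READERS = {"file": lambda url: _open_file_url(url, "rb")}
WRITERS = {"file": lambda url: _open_file_url(url, "wb")}


def url_copy(src_url, dst_url):
    """Copy the bytes behind src_url to dst_url, dispatching on URL scheme."""
    src_scheme = urlparse(src_url).scheme
    dst_scheme = urlparse(dst_url).scheme
    if src_scheme not in READERS:
        raise ValueError(f"unsupported source scheme: {src_scheme!r}")
    if dst_scheme not in WRITERS:
        raise ValueError(f"unsupported destination scheme: {dst_scheme!r}")
    with READERS[src_scheme](src_url) as src, WRITERS[dst_scheme](dst_url) as dst:
        shutil.copyfileobj(src, dst)
```

Keeping the scheme handlers in tables means a caller sees one copy function regardless of which protocols sit at either end.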

8
Family of Tools: Custom Developed Programs
  • Simple production client
  • globus-url-copy: simple URL-to-URL copy
  • Experimental FTP servers
  • Modified WUFTPD with parallel channels
  • Striped FTP server (a la DPSS)
  • Firewall FTP proxy: securely and efficiently
    allow transfers through firewalls

9
Replica Management Architecture

10
Replica Management
  • Maintain a mapping between logical names for
    files and collections and one or more physical
    locations
  • We define a replica to be a managed copy of a
    file.
  • The replica management system controls where and
    when copies are created, and provides information
    about where copies are located. However, the
    system does not make any statements about file
    consistency. In other words, it is possible for
    copies to get out of date with respect to one
    another, if a user chooses to modify a copy.
  • Based on the LDAP Protocol
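
The logical-to-physical mapping described on this slide can be sketched with a small in-memory class. This is only an illustration: the slide says the real catalog is based on LDAP, whereas the hypothetical `ReplicaCatalog` below uses a plain dictionary, and the names and URLs in the usage note are made up.

```python
class ReplicaCatalog:
    """Maps a logical file name to the physical locations of its replicas.

    Assumption: a dict stands in for the LDAP-based catalog. As on the slide,
    the catalog records where copies are, not whether they are identical.
    """

    def __init__(self):
        self._locations = {}

    def add(self, logical_name, physical_url):
        """Register a physical location for a logical name (idempotent)."""
        urls = self._locations.setdefault(logical_name, [])
        if physical_url not in urls:
            urls.append(physical_url)

    def delete(self, logical_name, physical_url):
        """Remove one registered location; raises KeyError if it is unknown."""
        urls = self._locations.get(logical_name, [])
        if physical_url not in urls:
            raise KeyError((logical_name, physical_url))
        urls.remove(physical_url)
        if not urls:
            del self._locations[logical_name]

    def locations(self, logical_name):
        """Return a copy of the known locations (empty list if none)."""
        return list(self._locations.get(logical_name, []))
```

For example, after `catalog.add("run42.dat", "ftp://site-a.example/run42.dat")` and a second `add` for another site, `catalog.locations("run42.dat")` lists both URLs; nothing in the catalog detects that a user has since modified one of the copies.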

11
A Model Architecture for Data Grids
  • [Diagram not reproduced. Surviving labels:
    Application; Attribute Specification; Metadata
    Catalog; Logical Collection and Logical File
    Name; Replica Catalog; Multiple Locations;
    Replica Selection; Performance Information and
    Predictions; MDS; NWS; Selected Replica; Disk
    Cache; Replica Location 1, 2, 3]

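
The labels on this slide suggest a replica selection step that weighs several candidate locations against performance predictions. A minimal sketch of one such policy follows, assuming that a table of predicted transfer rates is already available; `select_replica`, the rate table, and the URLs in the usage note are hypothetical and are not taken from the presentation.

```python
def select_replica(locations, predicted_mb_per_s):
    """Return the location with the highest predicted transfer rate.

    Assumption: `predicted_mb_per_s` maps a location to a predicted rate,
    standing in for whatever a performance information service reports.
    Locations without a prediction are skipped.
    """
    candidates = [loc for loc in locations if loc in predicted_mb_per_s]
    if not candidates:
        raise LookupError("no replica location has a performance prediction")
    return max(candidates, key=lambda loc: predicted_mb_per_s[loc])
```

Given locations `["ftp://a.example/f", "ftp://b.example/f"]` and predictions of 3.5 and 12.0 MB/s respectively, the function returns the second location. Other policies (lowest latency, local disk cache first) would slot into the same place.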
12
Replica Manager Components
  • Replica catalog definition
  • LDAP object classes for representing
    logical-to-physical mappings in an LDAP catalog
  • Low-level replica catalog API
  • globus_replica_catalog library
  • Manipulates replica catalog: add, delete, etc.
  • URL: http://www.globus.org
  • High-level reliable replication API
  • globus_replica_manager library
  • Combines calls to file transfer operations and
    calls to low-level API functions: create,
    destroy, etc.
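
The way a high-level API might combine transfer calls with catalog calls can be sketched as follows. This is not the globus_replica_manager library (a later slide describes that API as still under design); `ReplicaManager` is a hypothetical class, local file copies stand in for file transfers, and a dictionary stands in for the catalog.

```python
import os
import shutil


class ReplicaManager:
    """Pairs each file operation with the matching catalog update.

    Assumptions: replicas are local paths, shutil.copyfile stands in for a
    file transfer, and a dict stands in for the replica catalog.
    """

    def __init__(self):
        self.catalog = {}  # logical name -> list of replica paths

    def register(self, logical_name, path):
        """Record an existing file as a replica of logical_name."""
        self.catalog.setdefault(logical_name, []).append(path)

    def create(self, logical_name, new_path):
        """Copy an existing replica to new_path, then register it."""
        sources = self.catalog.get(logical_name)
        if not sources:
            raise LookupError(f"no replica registered for {logical_name!r}")
        partial = new_path + ".part"
        try:
            shutil.copyfile(sources[0], partial)
            os.replace(partial, new_path)  # publish only a complete copy
        except OSError:
            if os.path.exists(partial):
                os.remove(partial)
            raise
        # The catalog is updated only after the transfer has succeeded.
        sources.append(new_path)

    def destroy(self, logical_name, path):
        """Unregister a replica first, then delete the file."""
        self.catalog[logical_name].remove(path)
        os.remove(path)
```

The ordering is the design point: registering after the copy and unregistering before the delete means the catalog never advertises a replica that does not exist.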

13
Replica Catalog Structure: A Climate Modeling
Example

14
Outstanding Issues
  • What write consistency should we support?
  • Methodology for handling updates
  • Access Control
  • Intermediate feedback required (callbacks)
  • Timing
  • Replicating the replica catalog
  • Replication of partial files
  • Alternate catalog views: files belong to more
    than one logical collection

15
Status
  • GridFTP and Replica Catalog API and tools in
    alpha test
  • Applications with climate data, intended for
    production use.
  • Replica Management API under design
  • Grid based access control strategy under design

16
Globus Data-Intensive Services Architecture
  • [Diagram not reproduced. Surviving label:
    Replica Programs]

17
The End