Grid Data Management. Gabor Hermann, based on a lecture by Simone Campana, LCG Experiment Integration and Support, CERN IT

1
www.eu-egee.org
EGEE is a project funded by the European Union
under contract IST-2003-508833
2
Overview
  • Introduction to Data Management (DM)
    • General concepts
    • Some details on transport protocols
    • Data management operations
    • File and replica name conventions
  • File catalogs
    • Cataloging requirements and catalogs in egee/LCG
    • RLS file catalog
    • LCG file catalog
  • DM tools overview
  • Data Management CLI
    • lcg_utils
  • Data Management API
    • lcg_utils
    • GFAL
  • Advanced concepts
    • Advanced utilities: CLIs/APIs
    • OutputData JDL attribute
  • Conclusions

4
Data Management general concepts
  • What does Data Management mean?
    • Users and applications produce and require data
    • Data may be stored in Grid files
      • Granularity is at the file level (no data structures)
    • Users and applications need to handle files on the Grid
  • Files are stored in appropriate permanent resources called Storage Elements (SE)
    • Present at almost every site, together with computing resources
    • We will treat a Storage Element as a black box where we can store data
      • Appropriate data management utilities/services hide the internal structure of the SE
      • Appropriate data management utilities/services hide the details of transfer protocols

5
Data Management general concepts
  • A Grid file is READ-ONLY (at least in egee/LCG)
    • It cannot be modified
    • It can be deleted (so it can be replaced)
  • Files are heterogeneous (ASCII, binary, ...)
  • High-level Data Management tools (lcg_utils, see later) hide
    • transport layer details (protocols, ...)
    • the storage location
  • To use lower-level tools (edg-gridftp, see later) you need
    • some knowledge of the transport layer
    • some knowledge of the Storage Element implementation

(EDG: historic name, European DataGrid)
6
Some details on protocols
  • Data channel protocol: mostly GridFTP (gsiftp)
    • secure and efficient data movement
    • extends the standard FTP protocol
      • public-key-based Grid Security Infrastructure (GSI) support
      • third-party control of data transfer
      • parallel data transfer
  • Other protocols are available, especially for file I/O
    • rfio protocol
      • for CASTOR SE (and classic SE)
      • not yet GSI enabled
    • gsidcap protocol
      • for secure access to dCache SE
    • file protocol
      • for local file access
  • Other control channel protocols (SRM, discussed elsewhere)

7
Data Management operations
  • Upload a file to the Grid
    • Users need to store data in an SE (from a UI)
    • Applications need to store data in an SE (from a CE)
    • Users need to store the application (to be retrieved and run on a CE)
    • For small files the InputSandbox can be used (see WMS lecture)

[Figure: Several Grid Components (SE, CE)]
8
Data Management operations
  • Download files from the Grid
    • Users need to retrieve (onto the UI) data stored in an SE
      • For small files produced on the WN, the OutputSandbox can be used (see WMS lecture)
    • Applications need to copy data locally (onto the CE) and use them
    • The application itself must be downloaded onto the CE and run

[Figure: Several Grid Components (SE, CE)]
9
Data Management operations
  • Replicate a file across different SEs
    • Load balancing of computing resources
      • Often a job needs to run at a site where a copy of the input data is present (see the InputData JDL attribute in the WMS lecture)
    • Performance improvement in data access
      • Several applications might need to access the same file concurrently
    • Important for redundancy of key files (backup)

[Figure: Several Grid Components (SE, CE)]
10
  • One of the basic ideas of LCG
    • Let us bring the little programs close to the big files
  • Asymmetry in the JDL
    • In the current situation, it is the task of the user to copy the Grid files listed in InputData to the CE
    • The JDL supports the creation of Grid files from local files via OutputData

11
Data management operations
  • Data Management means movement and replication of files across/on Grid elements
  • Grid DM tools/applications/services can be used for all kinds of files
  • HOWEVER
    • Data Management focuses on large files
      • large means greater than 20 MB
      • typically on the order of a few hundred MB
    • Tools/applications/services are optimized to deal with large files
    • In many cases, small files can be treated more efficiently using different procedures
  • Examples
    • Users can ship data to be used by the application on the WN (and possibly the application itself) using the InputSandbox (see WMS lecture)
    • Users can retrieve (on the UI) data generated by a job (on the WN) using the OutputSandbox (see WMS lecture)
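The size rule of thumb above can be expressed as a tiny decision helper. This is only an illustrative sketch: it assumes the 20 MB threshold from this slide (taken as 20*1024*1024 bytes), and the names TransferRoute, chooseRoute and describe are hypothetical, not part of lcg_utils or the WMS.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: decide how a file should travel to or from a job.
// Assumption: "large" means strictly greater than 20 MB, with MB = 1024*1024 bytes.
enum class TransferRoute { Sandbox, DataManagement };

constexpr std::uint64_t kLargeFileThreshold = 20ULL * 1024 * 1024;

TransferRoute chooseRoute(std::uint64_t sizeInBytes) {
    // Small files: ship them in the InputSandbox/OutputSandbox of the job.
    // Large files: store them on an SE and register them with the DM tools.
    return sizeInBytes > kLargeFileThreshold ? TransferRoute::DataManagement
                                             : TransferRoute::Sandbox;
}

std::string describe(TransferRoute route) {
    return route == TransferRoute::Sandbox ? "sandbox" : "data management";
}
```

For instance, a 300 MB ntuple would go through the DM tools (lcg-cr), while a small steering file can travel in the InputSandbox.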

12
File and replica name conventions
  • Logical File Name (LFN)
    • An alias created by a user to refer to some item of data, e.g. lfn:cms/20030203/run2/track1
  • Globally Unique Identifier (GUID)
    • A non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
  • Site URL (SURL) (or Physical File Name (PFN) or Site FN)
    • The location of an actual piece of data on a storage system, e.g.
      • srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
      • sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
  • Transport URL (TURL)
    • Temporary locator of a replica, including an access protocol understood by the SE, e.g.
      • rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
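A client often has to tell which kind of name it was given before choosing a tool. The following is a minimal sketch, assuming that names carry the prefixes used in the examples above (lfn:, guid:, srm:// or sfn:// for SURLs, and an access-protocol scheme such as rfio:// for TURLs); classify and hostOf are hypothetical helpers, not GFAL or lcg_utils calls.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical helpers (not part of GFAL or lcg_utils).
// Assumption: names use the prefixes seen in the examples of this slide.
enum class NameKind { LFN, GUID, SURL, TURL, Unknown };

static bool startsWith(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

NameKind classify(const std::string& name) {
    if (startsWith(name, "lfn:")) return NameKind::LFN;
    if (startsWith(name, "guid:")) return NameKind::GUID;
    if (startsWith(name, "srm://") || startsWith(name, "sfn://")) return NameKind::SURL;
    // Access protocols mentioned in this lecture; the list is illustrative only.
    for (const char* scheme : {"rfio://", "gsiftp://", "gsidcap://"}) {
        if (startsWith(name, scheme)) return NameKind::TURL;
    }
    return NameKind::Unknown;
}

// Extract the host part of a SURL or TURL, i.e. the SE holding the replica.
std::string hostOf(const std::string& url) {
    const std::string::size_type sep = url.find("://");
    if (sep == std::string::npos) return "";
    const std::string::size_type start = sep + 3;
    const std::string::size_type end = url.find('/', start);
    return url.substr(start, end == std::string::npos ? std::string::npos : end - start);
}
```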

14
File Catalogs
  • At this point you should ask:
    • How do I keep track of all my files on the Grid?
    • Even if I remember all the LFNs of my files, what about someone else's files?
    • Anyway, how does the Grid keep track of the LFN/GUID/SURL associations?
  • Well, we need a FILE CATALOG

15
Cataloging Requirements
  • Need to keep track of the location of copies (replicas) of Grid files
  • Replicas might be described by attributes
    • Support for METADATA
    • Could be system metadata or user metadata
  • Potentially, millions of files need to be registered and located
    • Requirement for performance
  • Distributed architecture might be desirable
    • scalability
    • no single point of failure
    • Site managers need to be able to change file locations autonomously

16
File Catalogs in egee/LCG
  • Who has access to the file catalog?
    • The command line tools, APIs and the WMS interact with the catalog
      • They hide catalog implementation details
    • Lower-level tools also allow direct catalog access
  • EDG's Replica Location Service (RLS)
    • Catalogs in use in LCG-2
    • Replica Metadata Catalog (RMC) + Local Replica Catalog (LRC)
    • Some performance problems detected during LCG Data Challenges
  • New LCG File Catalog (LFC)
    • Already being certified; deployment in January 2005
    • Coexistence with RLS, and migration tools provided
    • Better performance and scalability
    • Provides new features: security, hierarchical namespace, transactions...

17
Overview of File catalogues
18
File Catalogs: the RLS
  • RMC
    • Stores LFN-GUID mappings
    • Accessible via the edg-rmc CLI and API
  • LRC
    • Stores GUID-SURL mappings
    • Accessible via the edg-lrc CLI and API

[Figure: RLS components (DM tools, RMC, LRC)]
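The two-step lookup through the RLS (LFN to GUID in the RMC, then GUID to SURLs in the LRC) can be pictured with two in-memory maps. This is a toy model for illustration only; ToyRls and replicasOf are hypothetical names and do not reflect the edg-rmc/edg-lrc APIs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy in-memory model of the two RLS catalogs (not the edg-rmc/edg-lrc API):
//   RMC: LFN  -> GUID   (several LFNs may point to the same GUID)
//   LRC: GUID -> SURLs  (one GUID may have several replicas)
struct ToyRls {
    std::map<std::string, std::string> rmc;
    std::map<std::string, std::vector<std::string>> lrc;

    // Resolve an LFN to all replica SURLs, following the LFN -> GUID -> SURL chain.
    std::vector<std::string> replicasOf(const std::string& lfn) const {
        auto g = rmc.find(lfn);
        if (g == rmc.end()) return {};   // unknown LFN
        auto r = lrc.find(g->second);
        if (r == lrc.end()) return {};   // GUID without registered replicas
        return r->second;
    }
};
```

Keeping the two mappings separate is what allows several LFNs to share one GUID while the replica list is maintained per site.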
19
File Catalogs: the LFC
  • One single catalog
  • The LFN acts as the main key in the database. It has:
    • Symbolic links to it (additional LFNs)
    • Unique Identifier (GUID)
    • System metadata
    • Information on replicas
    • One field of user metadata
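In the LFC, a single table keyed by LFN holds the GUID, the replica list and the metadata, and additional LFNs are symbolic links. Below is a toy sketch of that lookup, assuming an arbitrary limit of 16 link hops to stop cycles; ToyLfc and LfcEntry are hypothetical and do not reproduce the LFC schema or its C API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of one LFC entry as listed above (not the real LFC schema or API).
struct LfcEntry {
    std::string guid;                   // unique identifier
    std::vector<std::string> replicas;  // SURLs of the replicas
    std::string comment;                // the single field of user metadata
    long long size = 0;                 // an example of system metadata
};

struct ToyLfc {
    std::map<std::string, LfcEntry> entries;      // keyed by LFN (main key)
    std::map<std::string, std::string> symlinks;  // additional LFN -> target LFN

    // Follow symbolic links (at most 16 hops, an assumed limit) to the entry.
    const LfcEntry* lookup(std::string lfn) const {
        for (int hops = 0; hops < 16; ++hops) {
            auto e = entries.find(lfn);
            if (e != entries.end()) return &e->second;
            auto l = symlinks.find(lfn);
            if (l == symlinks.end()) return nullptr;  // no such LFN
            lfn = l->second;
        }
        return nullptr;  // too many hops: probably a link cycle
    }
};
```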

20
File Catalogs: the LFC (II)
  • Fixes performance and scalability problems seen in the EDG catalogs
    • Cursors for large queries
    • Timeouts and retries from the client
  • Provides more features than the EDG catalogs
    • User-exposed transaction API (auto rollback on failure of a mutating method call)
    • Hierarchical namespace and namespace operations (for LFNs): /grid/<VO>/...
    • Integrated GSI authentication and authorization
    • Access Control Lists (Unix permissions and POSIX ACLs)
    • Checksums
  • Interaction with other components
    • Supports Oracle and MySQL database backends
    • Integration with GFAL and lcg_util APIs complete
    • New specific API provided
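A user-exposed transaction means that a failed mutating call undoes the changes made earlier in the same transaction. The sketch below shows that all-or-nothing idea with an undo log over a toy LFN-to-GUID table. It is an illustration under stated assumptions, not the real LFC transaction calls (lfc_starttrans, lfc_endtrans, lfc_aborttrans); ToyTransaction and addAlias are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch of "auto rollback on failure of a mutating call" over a toy
// LFN -> GUID table. NOT the LFC transaction API; it only shows the idea.
class ToyTransaction {
public:
    explicit ToyTransaction(std::map<std::string, std::string>& table) : table_(table) {}

    // One mutating call: on failure, undo everything done so far and rethrow.
    void addAlias(const std::string& lfn, const std::string& guid) {
        try {
            if (table_.count(lfn) != 0)
                throw std::runtime_error("LFN already exists: " + lfn);
            undo_.push_back([this, lfn] { table_.erase(lfn); });
            table_[lfn] = guid;
        } catch (...) {
            rollback();
            throw;
        }
    }

    void commit() { undo_.clear(); }   // keep the changes

    ~ToyTransaction() { rollback(); }  // uncommitted changes are discarded

private:
    void rollback() {
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) (*it)();
        undo_.clear();
    }

    std::map<std::string, std::string>& table_;
    std::vector<std::function<void()>> undo_;
};
```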

21
LFC commands
Summary of the LFC Catalog commands
lfc-chmod Change access mode of the LFC file/directory
lfc-chown Change owner and group of the LFC file/directory
lfc-delcomment Delete the comment associated with the file/directory
lfc-getacl Get file/directory access control lists
lfc-ln Make a symbolic link to a file/directory
lfc-ls List file/directory entries in a directory
lfc-mkdir Create a directory
lfc-rename Rename a file/directory
lfc-rm Remove a file/directory
lfc-setacl Set file/directory access control lists
lfc-setcomment Add/replace a comment
22
LFC C API
Low-level methods (many POSIX-like):
lfc_aborttrans, lfc_access, lfc_addreplica, lfc_apiinit, lfc_chclass,
lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat,
lfc_delcomment, lfc_delete, lfc_deleteclass, lfc_delreplica, lfc_endtrans,
lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd,
lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks, lfc_listreplica,
lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass,
lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir,
lfc_selectsrvr, lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf,
lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask,
lfc_undelete, lfc_unlink, lfc_utime, send2lfc
23
  • Important environment variables
    • export LCG_GFAL_INFOSYS=grid152.kfki.hu:2170 (must be set for each catalog type)
    • export LCG_CATALOG_TYPE=lfc (must be set only for LFC)
    • export LFC_HOST=grid155.kfki.hu (must be set only for LFC)
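Because the DM tools depend on these variables, a job or wrapper can check them before calling any lcg_utils command. A minimal sketch, assuming the rules stated on this slide (LCG_GFAL_INFOSYS always, LFC_HOST whenever LCG_CATALOG_TYPE is lfc); missingDmVariables is a hypothetical helper, not part of the middleware.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical pre-flight check for the variables listed above.
// Assumed rules: LCG_GFAL_INFOSYS is always required; when LCG_CATALOG_TYPE
// is "lfc", LFC_HOST must be set as well.
std::vector<std::string> missingDmVariables() {
    std::vector<std::string> missing;
    auto isSet = [](const char* name) {
        const char* value = std::getenv(name);
        return value != nullptr && *value != '\0';
    };
    if (!isSet("LCG_GFAL_INFOSYS")) missing.push_back("LCG_GFAL_INFOSYS");
    const char* type = std::getenv("LCG_CATALOG_TYPE");
    if (type != nullptr && std::string(type) == "lfc" && !isSet("LFC_HOST"))
        missing.push_back("LFC_HOST");
    return missing;
}
```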

25
DM CLIs & APIs overview
[Figure: layered view of the Data Management CLIs and APIs]
  • User tools
  • Data Management (replication, indexing, querying): lcg_utils CLI and C API; edg-rm CLI and API
  • Data transfer, cataloging, storage and file I/O: GFAL C API
  • Underlying services and their own interfaces:
    • EDG catalogs (RLS): edg-rmc, edg-lrc CLI and API
    • LFC: CLI and API
    • GridFTP: edg-gridftp, Globus API
    • bbFTP: bbFTP API
    • SRM: SRM API
    • RFIO: rfio API
    • DCAP: dcap API
    • Classic SE
26
SRM Storage Management
27
Data management tools
  • High-level tools: replica manager (lcg-* commands, lcg_* API)
    • Provide (all) the functionality needed by the egee/LCG user
    • Combine file transfer and cataloging as an atomic transaction
    • Ensure consistent operations on catalogs and storage systems
    • Offer a high-level layer over technology-specific implementations
    • Based on the Grid File Access Library (GFAL) API
  • Low-level tools: edg-gridftp CLI tools
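Combining transfer and cataloging "as an atomic transaction" means that a copy without a successful registration should not be left behind. Here is a toy sketch of that behaviour, assuming in-memory stand-ins for the SE and the catalog (ToyGrid and copyAndRegister are hypothetical names). It mirrors the intent of lcg-cr, not its implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model of an all-or-nothing "copy and register" step. The SE and the
// catalog are in-memory stand-ins, not real storage or catalog clients.
struct ToyGrid {
    std::set<std::string> storedSurls;           // what the SE holds
    std::map<std::string, std::string> catalog;  // LFN -> SURL (simplified)
    bool catalogAvailable = true;                // lets us simulate a catalog outage

    bool copyAndRegister(const std::string& lfn, const std::string& surl) {
        if (!storedSurls.insert(surl).second) return false;  // copy step failed
        if (!catalogAvailable || catalog.count(lfn) != 0) {  // register step failed
            storedSurls.erase(surl);  // undo the copy: no unregistered file is left
            return false;
        }
        catalog[lfn] = surl;
        return true;
    }
};
```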

28
DM CLIs & APIs: old EDG tools
  • Old versions of EDG CLIs and APIs are still available
    • File replica management
      • edg-rm
      • Implemented (mostly) in Java
    • Catalog interaction (only for EDG catalogs)
      • edg-lrc
      • edg-rmc
      • Java and C APIs
  • Use is discouraged
    • Worse performance (slower)
    • New features are added only to lcg_utils
    • Less general than GFAL and lcg_utils

30
Gathering information: lcg-infosites
  • Not really a Data Management tool
    • Wrapper around the Information System client
  • Very useful to discover resources
    • Storage Elements
    • Catalog end points
    • (...)
  • Usage: lcg-infosites --vo <voname> <option> --is <BDII> --help
    • Possible options: se, ce, closeSE, lrc, rmc, all
    • The --vo field is mandatory
    • --is allows you to specify the BDII to query
      • If the flag is not used, the BDII defined in the LCG_GFAL_INFOSYS environment variable is used
    • Try the --help flag for a list of possible options

31
lcg-utils commands
  • Replica Management

lcg-cp Copies a grid file to a local destination
lcg-cr Copies a file to a SE and registers the file in the catalog
lcg-del Delete one file
lcg-rep Replication between SEs and registration of the replica
lcg-gt Gets the TURL for a given SURL and transfer protocol
lcg-sd Sets file status to Done for a given SURL in an SRM request
File Catalog Interaction
lcg-aa Add an alias in LFC for a given GUID
lcg-ra Remove an alias in LFC for a given GUID
lcg-rf Registers in LFC a file placed in a SE
lcg-uf Unregisters in LFC a file placed in a SE
lcg-la Lists the aliases for a given SURL, GUID or LFN
lcg-lg Gets the GUID for a given LFN or SURL
lcg-lr Lists the replicas for a given GUID, SURL or LFN
32
Gathering information: lcg-infosites
[scampana@grid019]$ lcg-infosites --vo gilda se
These are the related data for gilda (in terms of SE):

Avail Space(Kb)   Used Space(Kb)   SEs
----------------------------------------------------------
1570665704        576686868        grid3.na.astro.it
225661244         1906716          grid009.ct.infn.it
523094840         457000           grid003.cecalc.ula.ve
1570665704        576686868        testbed005.cnaf.infn.it
15853516          1879992          gilda-se01.pd.infn.it
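The table printed by lcg-infosites is easy to post-process, for example to pick a destination SE for lcg-cr -d. A minimal sketch, assuming the three-column layout shown above (available KB, used KB, SE host); seWithMostFreeSpace is a hypothetical helper, not an option of lcg-infosites.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: pick the SE with the most available space from the
// "lcg-infosites --vo <vo> se" output. Assumption: data rows look like
// "<avail_kb> <used_kb> <se_host>"; header and separator lines are skipped
// because they do not start with two numbers.
std::string seWithMostFreeSpace(const std::string& output) {
    std::istringstream lines(output);
    std::string line, best;
    unsigned long long bestAvail = 0;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        unsigned long long avail = 0, used = 0;
        std::string host;
        if (!(fields >> avail >> used >> host)) continue;  // not a data row
        if (best.empty() || avail > bestAvail) {           // first row wins ties
            best = host;
            bestAvail = avail;
        }
    }
    return best;
}
```

On the sample output above it returns grid3.na.astro.it: that SE ties with testbed005.cnaf.infn.it on available space, and the first row wins.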

33
lcg_utils CLI usage example

# We have a local file in our UI in Catania
[scampana@grid019]$ ls -l important-file.txt
-rw-r--r--  1 scampana users 19 Oct 31 17:09 important-file.txt

# Upload the file to Naples (Italy)
[scampana@grid019]$ lcg-cr --vo gilda -l lfn:simone-important \
    -d grid3.na.astro.it file://`pwd`/important-file.txt
guid:08d02e56-bdf6-4833-a4da-e0247c188242

# The file is effectively there
[scampana@grid019]$ lcg-lr --vo gilda lfn:simone-important
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/file4c7c2ad6-4d93-4cd2-be24-bf4239f58208

# Let's replicate it to Merida now
[scampana@grid019]$ lcg-rep --vo gilda \
    -d grid003.cecalc.ula.ve lfn:simone-important
[scampana@grid019]$ lcg-lr --vo gilda lfn:simone-important
sfn://grid003.cecalc.ula.ve/flatfiles/SE00/gilda/generated/2004-10-31/file39568d15-e873-4f17-9371-b8862ae77c36
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/file4c7c2ad6-4d93-4cd2-be24-bf4239f58208

# Delete all the replicas in the storage elements
[scampana@grid019]$ lcg-del --vo gilda -a lfn:simone-important
[scampana@grid019]$ lcg-lr --vo gilda lfn:simone-important
lcg_lr: No such file or directory

IMPORTANT: The lcg_utils (both CLI and API, described later) need to access the Information System (BDII). The name of the BDII host used by lcg_utils is specified in the environment variable LCG_GFAL_INFOSYS. REMEMBER THAT, ESPECIALLY WHEN PERFORMING DATA MANAGEMENT OPERATIONS FROM THE WN.
35
lcg_utils API
  • lcg_utils API
    • High-level data management C API
    • Same functionality as the lcg_util command line tools
  • Single shared library
    • liblcg_util.so
  • Single header file
    • lcg_util.h
  • (linking against libglobus_gass_copy_gcc32.so is also required)

36
lcg_utils: replica management
  • int lcg_cp (char *src_file, char *dest_file, char *vo, int nbstreams,
    char *conf_file, int insecure, int verbose)
  • int lcg_cr (char *src_file, char *dest_file, char *guid, char *lfn,
    char *vo, char *relative_path, int nbstreams, char *conf_file,
    int insecure, int verbose, char *actual_guid)
  • int lcg_del (char *file, int aflag, char *se, char *vo, char *conf_file,
    int insecure, int verbose)
  • int lcg_rep (char *src_file, char *dest_file, char *vo,
    char *relative_path, int nbstreams, char *conf_file, int insecure,
    int verbose)
  • int lcg_sd (char *surl, int reqid, int fileid, char *token, int oflag)

37
lcg_utils: catalog interaction
  • int lcg_aa (char *lfn, char *guid, char *vo, char *conf_file,
    int insecure, int verbose)
  • int lcg_gt (char *surl, char *protocol, char **turl, int *reqid,
    int *fileid, char **token)
  • int lcg_la (char *file, char *vo, char *conf_file, int insecure,
    char ***lfns)
  • int lcg_lg (char *lfn_or_surl, char *vo, char *conf_file, int insecure,
    char *guid)
  • int lcg_lr (char *file, char *vo, char *conf_file, int insecure,
    char ***pfns)
  • int lcg_ra (char *lfn, char *guid, char *vo, char *conf_file,
    int insecure)
  • int lcg_rf (char *surl, char *guid, char *lfn, char *vo,
    char *conf_file, int insecure, int verbose, char *actual_guid)
  • int lcg_uf (char *surl, char *guid, char *vo, char *conf_file,
    int insecure)

38
Available APIs
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <stdio.h>
#include <errno.h>

// lcg_util is a C library. Since we write C++ code here, we need
// to use extern "C"
extern "C" {
#include <lcg_util.h>
}

using namespace std;

/*
 * The following example code shows how you can use the lcg_util API
 * for replica management. We expect that you modify parts of this code
 * in order to make it work in your environment. This is particularly
 * indicated by ACTION, i.e. your action is required.
 */
int main()
{
    cout << "Data Management API Example" << endl;
    char vo[] = "cms"; // ACTION: fill in your correct VO here (e.g. gilda)
    cout << "---------------------------------------------------" << endl;
C APIs
39
Available APIs
    // Copy a local file to the Storage Element and register it in RLS
    char localFile[] = "file:/tmp/test-file"; // ACTION: create a test file
    char destSE[] = "lxb0707.cern.ch";        // ACTION: fill in a specific SE
    char *actualGuid = (char *) malloc(50);
    int verbose = 2;   // we use verbosity level 2
    int nbstreams = 8; // we use 8 parallel streams to transfer a file

    // Check the return code (non-zero on failure); errno then tells why
    if (lcg_cr(localFile, destSE, NULL, NULL, vo, NULL, nbstreams,
               NULL, 0, verbose, actualGuid) != 0) {
        perror("Error in copyAndRegister");
        return -1;
    } else {
        cout << "We registered the file with GUID " << actualGuid << endl;
    }
    cout << "---------------------------------------------------" << endl;
Copy and Register
40
Available APIs
    // Call the listReplicas (lcg_lr) method and print the returned URL.
    // The actualGuid does not contain the prefix "guid:". We add it here and
    // then use the new guid as a parameter to list replicas.
    std::string guid = "guid:";
    guid.insert(5, actualGuid);
    char **pfns = NULL; // filled in by lcg_lr (char ***pfns out-parameter)
    if (lcg_lr((char *) guid.c_str(), vo, NULL, 0, &pfns) != 0) {
        perror("Error in listReplicas");
        return -1;
    } else {
        cout << "PFN: " << pfns[0] << endl;
        free(pfns);
    }
    cout << "---------------------------------------------------" << endl;
List Replicas
41
Available APIs
    // Delete the replica again (aflag = 1: all replicas)
    int rc = lcg_del((char *) guid.c_str(), 1, destSE, vo, NULL, 0, verbose);
    if (rc != 0) {
        perror("Error in delete");
        return -1;
    } else {
        cout << "Delete OK" << endl;
    }
    free(actualGuid);
    return 0;
}

Delete Replica
42
Available APIs
CC = g++
GLOBUS_FLAVOR = gcc32

all: data-management

data-management: data-management.o
	$(CC) -o data-management \
	-L$(GLOBUS_LOCATION)/lib -lglobus_gass_copy_$(GLOBUS_FLAVOR) \
	-L$(LCG_LOCATION)/lib -llcg_util -lgfal \
	data-management.o

data-management.o: data-management.cpp
	$(CC) -I$(LCG_LOCATION)/include -c data-management.cpp

clean:
	rm -rf data-management data-management.o
Makefile used
44
Grid File Access Library
  • GFAL is a library that provides access to Grid files
    • File I/O, catalog interaction, storage interaction
  • Abstraction from specific implementations
    • Transparent interaction with the information service and the file catalogs
  • Single shared library in threaded and unthreaded versions
    • libgfal.so, libgfal_pthr.so
  • Single header file
    • gfal_api.h

45
GFAL Catalog API
  • int create_alias (const char *guid, const char *lfn, long long size)
  • int guid_exists (const char *guid)
  • char *guidforpfn (const char *surl)
  • char *guidfromlfn (const char *lfn)
  • char **lfnsforguid (const char *guid)
  • int register_alias (const char *guid, const char *lfn)
  • int register_pfn (const char *guid, const char *surl)
  • int setfilesize (const char *surl, long long size)
  • char *surlfromguid (const char *guid)
  • char **surlsfromguid (const char *guid)
  • int unregister_alias (const char *guid, const char *lfn)
  • int unregister_pfn (const char *guid, const char *surl)

46
GFAL Storage API
  • int deletesurl (const char *surl)
  • int getfilemd (const char *surl, struct stat64 *statbuf)
  • int set_xfer_done (const char *surl, int reqid, int fileid, char *token,
    int oflag)
  • int set_xfer_running (const char *surl, int reqid, int fileid,
    char *token)
  • char *turlfromsurl (const char *surl, char **protocols, int oflag,
    int *reqid, int *fileid, char **token)
  • int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols,
    int *reqid, char **token, struct srm_filestatus **filestatuses)
  • int srm_getstatus (int nbfiles, char **surls, int reqid, char *token,
    struct srm_filestatus **filestatuses)

47
GFAL File I/O API (I)
  • int gfal_access (const char *path, int amode)
  • int gfal_chmod (const char *path, mode_t mode)
  • int gfal_close (int fd)
  • int gfal_creat (const char *filename, mode_t mode)
  • off_t gfal_lseek (int fd, off_t offset, int whence)
  • int gfal_open (const char *filename, int flags, mode_t mode)
  • ssize_t gfal_read (int fd, void *buf, size_t size)
  • int gfal_rename (const char *old_name, const char *new_name)
  • ssize_t gfal_setfilchg (int, const void *, size_t)
  • int gfal_stat (const char *filename, struct stat *statbuf)
  • int gfal_unlink (const char *filename)
  • ssize_t gfal_write (int fd, const void *buf, size_t size)
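Because these GFAL calls take the same arguments as their POSIX counterparts, Grid file I/O can be written in the familiar style. The sketch below uses plain POSIX open/write/read/close so that it runs without GFAL installed; replacing them with gfal_open, gfal_write, gfal_read and gfal_close (and passing a Grid file name instead of a local path) is the assumed GFAL equivalent and has not been tested here. writeAll and readAll are hypothetical helper names.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// POSIX version of a write-then-read round trip. Assumed GFAL mapping with
// identical arguments: open -> gfal_open, write -> gfal_write,
// read -> gfal_read, close -> gfal_close.
bool writeAll(const char* path, const std::string& data) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    size_t done = 0;
    while (done < data.size()) {  // write() may write fewer bytes than asked
        ssize_t n = write(fd, data.data() + done, data.size() - done);
        if (n < 0) { close(fd); return false; }
        done += static_cast<size_t>(n);
    }
    return close(fd) == 0;
}

std::string readAll(const char* path) {
    std::string result;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return result;  // missing file: empty result
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) result.append(buf, static_cast<size_t>(n));
    close(fd);
    return result;
}
```

Remember that Grid files are read-only once registered, so in practice writing applies to new files only.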

48
GFAL protocol of File Open
49
GFAL File I/O API (II)
  • int gfal_closedir (DIR *dirp)
  • int gfal_mkdir (const char *dirname, mode_t mode)
  • DIR *gfal_opendir (const char *dirname)
  • struct dirent *gfal_readdir (DIR *dirp)
  • int gfal_rmdir (const char *dirname)
51
Advanced utilities: edg-gridftp
Used for low-level management of files/directories in SEs
  • edg-gridftp-exists TURL: checks if a file/dir exists on an SE
  • edg-gridftp-ls TURL: lists a directory on an SE
  • globus-url-copy srcTURL dstTURL: copies files between SEs
  • edg-gridftp-mkdir TURL: creates a directory on an SE
  • edg-gridftp-rename srcTURL dstTURL: renames a file on an SE
  • edg-gridftp-rm TURL: removes a file from an SE
  • edg-gridftp-rmdir TURL: removes a directory on an SE

52
edg-gridftp example
Create and delete a directory in a GILDA Storage
Element
53
Other advanced CLIs/APIs
  • globus-url-copy srcTURL destTURL
    • low-level file transfer
  • Interaction with RLS components
    • edg-lrc command (actions on LRC)
    • edg-rmc command (actions on RMC)
    • C and Java APIs for all catalog operations
      • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf
      • http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf
  • Using low-level CLIs and APIs is STRONGLY discouraged
    • Risk: losing consistency between SEs and catalogs
    • REMEMBER: a file is in the Grid only if it is BOTH
      • stored in a Storage Element
      • registered in the file catalog
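The rule that a file is in the Grid only if it is both stored and registered suggests a simple audit after low-level tools have been used. A sketch assuming in-memory snapshots of an SE's contents and of the catalog (LFN to SURL); audit and AuditReport are hypothetical names, not an existing LCG tool.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical consistency audit: a file is in the Grid only if it is both
// stored on an SE and registered in the catalog.
struct AuditReport {
    std::vector<std::string> unregistered;  // on the SE, unknown to the catalog (SURLs)
    std::vector<std::string> missing;       // in the catalog, absent from the SE (LFNs)
};

AuditReport audit(const std::set<std::string>& seContents,
                  const std::map<std::string, std::string>& catalog /* LFN -> SURL */) {
    AuditReport report;
    std::set<std::string> registered;
    for (const auto& kv : catalog) {
        registered.insert(kv.second);
        if (seContents.count(kv.second) == 0) report.missing.push_back(kv.first);
    }
    for (const auto& surl : seContents)
        if (registered.count(surl) == 0) report.unregistered.push_back(surl);
    return report;
}
```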

54
OutputData JDL attribute
  • Works like the lcg-cr command
  • The OutputData JDL attribute specifies files to be copied and registered into the Grid
    • The file name (OutputFile) is compulsory
    • If no LFN is specified (LogicalFileName), none is set!
    • If no SE is specified (StorageElement), the default SE is chosen (VO_<VO>_DEFAULT_SE)
  • At the end of the job the files are moved from the WN and registered
  • Example:
    OutputData = {
      [
        OutputFile = "toto.out";
        StorageElement = "adc0021.cern.ch";
        LogicalFileName = "lfn:theBestTotoEver";
      ],
      [
        OutputFile = "toto2.out";
        StorageElement = "adc0021.cern.ch";
        LogicalFileName = "lfn:theBestTotoEver2";
      ]
    };
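When a job produces many output files, the OutputData attribute can be generated instead of written by hand. A minimal sketch, assuming the attribute layout of the example on this slide; outputDataJdl and OutputDataEntry are hypothetical names, and optional fields left empty are simply omitted so that the defaults described above apply (default SE, no LFN).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generator for the OutputData JDL attribute.
struct OutputDataEntry {
    std::string outputFile;       // compulsory
    std::string storageElement;   // optional: empty -> default SE
    std::string logicalFileName;  // optional: empty -> no LFN is set
};

std::string outputDataJdl(const std::vector<OutputDataEntry>& entries) {
    std::ostringstream jdl;
    jdl << "OutputData = {\n";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        const OutputDataEntry& e = entries[i];
        jdl << "  [\n    OutputFile = \"" << e.outputFile << "\";\n";
        if (!e.storageElement.empty())
            jdl << "    StorageElement = \"" << e.storageElement << "\";\n";
        if (!e.logicalFileName.empty())
            jdl << "    LogicalFileName = \"" << e.logicalFileName << "\";\n";
        jdl << "  ]" << (i + 1 < entries.size() ? ",\n" : "\n");  // comma between records
    }
    jdl << "};\n";
    return jdl.str();
}
```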

56
Summary
  • We provided a description of the egee/LCG Data Management middleware components and tools
  • We described how to use the available CLIs
    • Use-case scenarios of data movement on the Grid
  • We presented the available APIs
    • An example of lcg_util library usage was shown
