Title: Grid Data Management. Gabor Hermann, based on a lecture by Simone Campana (LCG Experiment Integration and Support, CERN IT)
www.eu-egee.org
EGEE is a project funded by the European Union
under contract IST-2003-508833
2 Overview
- Introduction on Data Management (DM)
  - General concepts
  - Some details on transport protocols
  - Data management operations
  - File replicas: name conventions
- File catalogs
  - Cataloging requirements and catalogs in EGEE/LCG
  - RLS file catalog
  - LCG file catalog
- DM tools overview
- Data Management CLI
  - lcg_utils
- Data Management API
  - lcg_utils
  - GFAL
- Advanced concepts
  - Advanced utilities: CLI/APIs
  - OutputData JDL attribute
- Conclusions
4 Data Management general concepts
- What does Data Management mean?
  - Users and applications produce and require data
  - Data may be stored in Grid files
  - Granularity is at the file level (no data structures)
  - Users and applications need to handle files on the Grid
- Files are stored in appropriate permanent resources called Storage Elements (SE)
  - Present at almost every site, together with computing resources
  - We will treat a Storage Element as a black box where we can store data
    - Appropriate data management utilities/services hide the internal structure of the SE
    - Appropriate data management utilities/services hide the details of the transfer protocols
5 Data Management general concepts
- A Grid file is READ-ONLY (at least in EGEE/LCG)
  - It cannot be modified
  - It can be deleted (so it can be replaced)
- Files are heterogeneous (ASCII, binary, ...)
- High-level Data Management tools (lcg_utils, see later) hide
  - transport layer details (protocols, ...)
  - storage location
- To use lower-level tools (edg-gridftp, see later) you need
  - some knowledge of the transport layer
  - some knowledge of the Storage Element implementation
Historic name (edg): European Data Grid
6 Some details on protocols
- Data channel protocol: mostly GridFTP (gsiftp)
  - secure and efficient data movement
  - extends the standard FTP protocol
    - Public-key-based Grid Security Infrastructure (GSI) support
    - Third-party control of data transfer
    - Parallel data transfer
- Other protocols are available, especially for file I/O
  - rfio protocol
    - for CASTOR SE (and classic SE)
    - not yet GSI-enabled
  - gsidcap protocol
    - for secure access to dCache SE
  - file protocol
    - for local file access
- Other control channel protocols (SRM, discussed elsewhere, ...)
7 Data Management operations
- Upload a file to the Grid
  - A user needs to store data in an SE (from a UI)
  - An application needs to store data in an SE (from a CE)
  - A user needs to store the application (to be retrieved and run on a CE)
  - For small files the InputSandbox can be used (see WMS lecture)
[Figure: several Grid components (SEs and CEs)]
8 Data Management operations
- Download files from the Grid
  - A user needs to retrieve (onto the UI) data stored in an SE
    - For small files produced on a WN the OutputSandbox can be used (see WMS lecture)
  - Applications need to copy data locally (onto the CE) and use them
  - The application itself must be downloaded onto the CE and run
[Figure: several Grid components (SEs and CEs)]
9 Data Management operations
- Replicate a file across different SEs
  - Load sharing/balancing of computing resources
    - Often a job needs to run at a site where a copy of the input data is present (see the InputData JDL attribute in the WMS lecture)
  - Performance improvement in data access
    - Several applications might need to access the same file concurrently
  - Important for redundancy of key files (backup)
[Figure: several Grid components (SEs and CEs)]
10
- One of the basic ideas of LCG
  - Bring the small programs close to the big files
- Asymmetry in JDL
  - In some situations it is the user's task to copy the Grid files listed in InputData to the CE
  - JDL supports creating Grid files from local files via OutputData
11 Data management operations
- Data Management means movement and replication of files across/on Grid elements
- Grid DM tools/applications/services can be used for all kinds of files
- HOWEVER
  - Data Management focuses on large files
    - "large" means greater than 20 MB
    - typically on the order of a few hundred MB
  - Tools/applications/services are optimized to deal with large files
- In many cases, small files can be efficiently treated using different procedures
  - Examples:
    - The user can ship data to be used by the application on the WN (and possibly the application itself) using the InputSandbox (see WMS lecture)
    - The user can retrieve (on the UI) data generated by a job (on the WN) using the OutputSandbox (see WMS lecture)
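The size rule of thumb above can be written down as a tiny decision helper. This is only a sketch: the 20 MB threshold is the one quoted on the slide, while the function and type names are hypothetical and the choice of 1 MB = 1024*1024 bytes is an assumption.

```cpp
#include <cstdint>
#include <string>

// Threshold quoted on the slide: "large" means greater than 20 MB.
// Counting 1 MB as 1024*1024 bytes is an assumption of this sketch.
constexpr std::uint64_t kLargeFileBytes = 20ULL * 1024 * 1024;

// Hypothetical names, used for illustration only.
enum class Transport { Sandbox, DataManagement };

// Suggest how a file of the given size should travel on the Grid.
Transport suggest_transport(std::uint64_t size_bytes) {
    return size_bytes > kLargeFileBytes ? Transport::DataManagement
                                        : Transport::Sandbox;
}

// Human-readable hint for the suggestion.
std::string describe(Transport t) {
    return t == Transport::DataManagement
               ? "store it on a Storage Element with the Grid DM tools"
               : "ship it in the InputSandbox/OutputSandbox";
}
```

For example, describe(suggest_transport(300ULL * 1024 * 1024)) points to the DM tools, while a file of a few kilobytes is better shipped in a sandbox.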
12 File replicas: Name Conventions
- Logical File Name (LFN)
  - An alias created by a user to refer to some item of data, e.g. lfn:cms/20030203/run2/track1
- Globally Unique Identifier (GUID)
  - A non-human-readable unique identifier for an item of data, e.g. guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
- Site URL (SURL) (or Physical File Name (PFN) or Site FN)
  - The location of an actual piece of data on a storage system, e.g.
    - srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
    - sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
- Transport URL (TURL)
  - Temporary locator of a replica plus access protocol, understood by an SE, e.g. rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
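The four kinds of names can be told apart by their scheme prefix. The sketch below assumes the usual prefixes (lfn:, guid:, srm:// and sfn:// for SURLs, and the transport schemes gsiftp, rfio, gsidcap and file for TURLs); the enum and function names are hypothetical and not part of any Grid library.

```cpp
#include <string>

// Hypothetical classification of Grid file names by scheme prefix.
enum class NameKind { LFN, GUID, SURL, TURL, Unknown };

// True if `s` begins with `prefix`.
static bool starts_with(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

NameKind classify(const std::string& name) {
    if (starts_with(name, "lfn:"))  return NameKind::LFN;
    if (starts_with(name, "guid:")) return NameKind::GUID;
    // SURLs: SRM-managed storage or a Classic SE
    if (starts_with(name, "srm://") || starts_with(name, "sfn://"))
        return NameKind::SURL;
    // TURLs: a replica location together with an access protocol
    const char* transport[] = {"gsiftp://", "rfio://", "gsidcap://", "file://"};
    for (const char* scheme : transport)
        if (starts_with(name, scheme)) return NameKind::TURL;
    return NameKind::Unknown;
}
```

With this helper, classify("sfn://lxshare0209.cern.ch/data/alice/ntuples.dat") yields NameKind::SURL, and the rfio form of the same location yields NameKind::TURL.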
14 File Catalogs
- At this point you should ask:
  - How do I keep track of all my files on the Grid?
  - Even if I remember all the LFNs of my files, what about someone else's files?
  - Anyway, how does the Grid keep track of the LFN/GUID/SURL associations?
- Well... we need a FILE CATALOGUE
15 Cataloging Requirements
- Need to keep track of the location of copies (replicas) of Grid files
- Replicas might be described by attributes
  - Support for METADATA
  - Could be system metadata or user metadata
- Potentially, millions of files need to be registered and located
  - Requirement for performance
- A distributed architecture might be desirable
  - scalability
  - prevents a single point of failure
  - Site managers need to change file locations autonomously
16 File Catalogs in EGEE/LCG
- Who has access to the file catalog?
  - The command line tools, APIs and the WMS interact with the catalog
    - They hide catalogue implementation details
  - Even lower-level tools allow direct catalogue access
- EDG's Replica Location Service (RLS)
  - Catalogs in use in LCG-2
  - Replica Metadata Catalog (RMC) + Local Replica Catalog (LRC)
  - Some performance problems detected during LCG Data Challenges
- New LCG File Catalog (LFC)
  - Already being certified; deployment in January 2005
  - Coexistence with RLS and migration tools provided
  - Better performance and scalability
  - Provides new features: security, hierarchical namespace, transactions...
17 Overview of File catalogues
18 File Catalogs: The RLS
- RMC
  - Stores LFN-GUID mappings
  - Accessible by edg-rmc CLI + API
- LRC
  - Stores GUID-SURL mappings
  - Accessible by edg-lrc CLI + API
[Figure: diagram with boxes labelled DM, RMC and LRC]
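Resolving an LFN to its replicas in the RLS therefore takes two lookups: LFN to GUID in the RMC, then GUID to SURLs in the LRC. The toy in-memory model below illustrates only that two-step structure; it is not the edg-rmc/edg-lrc API, and all type and method names are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of the two RLS catalogs (illustration only).
struct ToyRls {
    std::map<std::string, std::string> rmc;       // RMC: LFN  -> GUID
    std::multimap<std::string, std::string> lrc;  // LRC: GUID -> SURL(s)

    // Register a new file: one alias and its first replica.
    void register_file(const std::string& lfn, const std::string& guid,
                       const std::string& surl) {
        rmc[lfn] = guid;
        lrc.insert({guid, surl});
    }

    // Record an additional replica of an already known GUID.
    void add_replica(const std::string& guid, const std::string& surl) {
        lrc.insert({guid, surl});
    }

    // Two-step lookup: LFN -> GUID (RMC), then GUID -> SURLs (LRC).
    std::vector<std::string> replicas_for_lfn(const std::string& lfn) const {
        std::vector<std::string> surls;
        auto alias = rmc.find(lfn);
        if (alias == rmc.end()) return surls;  // unknown LFN: no replicas
        auto range = lrc.equal_range(alias->second);
        for (auto it = range.first; it != range.second; ++it)
            surls.push_back(it->second);
        return surls;
    }
};
```

The high-level tools perform this chain of lookups on the user's behalf, which is why listing replicas by LFN works without the user ever handling the GUID.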
19 File Catalogs: The LFC
- One single catalog
- The LFN acts as the main key in the database. It has:
  - Symbolic links to it (additional LFNs)
  - Unique Identifier (GUID)
  - System metadata
  - Information on replicas
  - One field of user metadata
20 File Catalogs: The LFC (II)
- Fixes performance and scalability problems seen in EDG Catalogs
  - Cursors for large queries
  - Timeouts and retries from the client
- Provides more features than the EDG Catalogs
  - User-exposed transaction API (+ auto rollback on failure of a mutating method call)
  - Hierarchical namespace and namespace operations (for LFNs): /grid/<VO>/...
  - Integrated GSI authentication + authorization
  - Access Control Lists (Unix permissions and POSIX ACLs)
  - Checksums
- Interaction with other components
  - Supports Oracle and MySQL database backends
  - Integration with GFAL and lcg_util APIs complete
  - New specific API provided
21 LFC commands
Summary of the LFC Catalog commands
lfc-chmod Change access mode of the LFC file/directory
lfc-chown Change owner and group of the LFC file/directory
lfc-delcomment Delete the comment associated with the file/directory
lfc-getacl Get file/directory access control lists
lfc-ln Make a symbolic link to a file/directory
lfc-ls List file/directory entries in a directory
lfc-mkdir Create a directory
lfc-rename Rename a file/directory
lfc-rm Remove a file/directory
lfc-setacl Set file/directory access control lists
lfc-setcomment Add/replace a comment
22 LFC C API
Low-level methods (many POSIX-like):
lfc_access, lfc_aborttrans, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat, lfc_delcomment, lfc_delete,
lfc_deleteclass, lfc_delreplica, lfc_endtrans, lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd, lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks,
lfc_listreplica, lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass, lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr,
lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf, lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc
23
- Important environment variables
  - export LCG_GFAL_INFOSYS=grid152.kfki.hu:2170 (must be set for each catalogue type)
  - export LCG_CATALOG_TYPE=lfc (must be set only for LFC)
  - export LFC_HOST=grid155.kfki.hu (must be set only for LFC)
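A program that relies on the data management tools can verify these settings before doing any work. The following sketch uses only std::getenv from the C++ standard library; the helper name is hypothetical, and the rule that LFC_HOST is required only when LCG_CATALOG_TYPE equals lfc is this sketch's reading of the slide.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Return the names of required variables that are unset or empty.
// Hypothetical helper, not part of lcg_util or GFAL.
std::vector<std::string> missing_dm_variables() {
    auto is_set = [](const char* name) {
        const char* value = std::getenv(name);
        return value != nullptr && *value != '\0';
    };

    std::vector<std::string> missing;
    // Needed for every catalogue type: the BDII endpoint (host:port).
    if (!is_set("LCG_GFAL_INFOSYS")) missing.push_back("LCG_GFAL_INFOSYS");

    // Needed only when the LFC is the selected catalogue.
    const char* type = std::getenv("LCG_CATALOG_TYPE");
    if (type != nullptr && std::string(type) == "lfc" && !is_set("LFC_HOST"))
        missing.push_back("LFC_HOST");
    return missing;
}
```

Calling this at start-up and printing the returned names gives a clearer message than a catalogue error in the middle of a transfer, which matters most for jobs running on a WN.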
25 DM CLIs & APIs overview
[Figure: layered overview of the Data Management CLIs and APIs]
- User Tools
- Data Management (Replication, Indexing, Querying): lcg_utils (CLI + C API), edg-rm (CLI + API)
- Cataloging: GFAL C API; EDG catalogs (edg-rmc, edg-lrc: CLI + API); LFC (CLI + API)
- Storage: GFAL C API; SRM (SRM API); Classic SE
- Data transfer: (GFAL C API); GridFTP (edg-gridftp, Globus API); bbFTP (bbFTP API)
- File I/O: GFAL C API; RFIO (rfio API); DCAP (dcap API)
26 SRM Storage Management
27 Data management tools
- High-level tools: Replica manager (lcg-* commands + lcg_* API)
  - Provide (all) the functionality needed by the EGEE/LCG user
  - Combine file transfer and cataloging as an atomic transaction
  - Ensure consistent operations on catalogues and storage systems
  - Offer a high-level layer over technology-specific implementations
  - Based on the Grid File Access Library (GFAL) API
- Low-level tools: edg-gridftp tools (CLI)
28 DM CLIs & APIs: Old EDG tools
- Old versions of EDG CLIs and APIs are still available
- File and replica management
  - edg-rm
  - Implemented (mostly) in Java
- Catalog interaction (only for EDG catalogs)
  - edg-lrc
  - edg-rmc
  - Java and C++ APIs
- Use discouraged
  - Worse performance (slower)
  - New features added only to lcg_utils
  - Less general than GFAL and lcg_utils
30 Gathering information: lcg-infosites
- Not really a Data Management tool
  - Wrapper around the Information System client
- Very useful to discover resources
  - Storage Elements
  - Catalog end points
  - (...)
- Usage: lcg-infosites --vo <voname> <option> --is <BDII> --help
  - Possible options: se, ce, closeSE, lrc, rmc, all
  - The --vo field is mandatory
  - --is allows you to specify the BDII to query
    - If the flag is not used, the BDII defined in the LCG_GFAL_INFOSYS environment variable is used
  - Try the --help flag for a list of possible options
31 lcg-utils commands
lcg-cp Copies a grid file to a local destination
lcg-cr Copies a file to a SE and registers the file in the catalog
lcg-del Delete one file
lcg-rep Replication between SEs and registration of the replica
lcg-gt Gets the TURL for a given SURL and transfer protocol
lcg-sd Sets file status to Done for a given SURL in a SRM request
File Catalog Interaction
lcg-aa Add an alias in LFC for a given GUID
lcg-ra Remove an alias in LFC for a given GUID
lcg-rf Registers in LFC a file placed in a SE
lcg-uf Unregisters in LFC a file placed in a SE
lcg-la Lists the alias for a given SURL, GUID or LFN
lcg-lg Get the GUID for a given LFN or SURL
lcg-lr Lists the replicas for a given GUID, SURL or LFN
32 Gathering information: lcg-infosites
scampana@grid019$ lcg-infosites --vo gilda se
These are the related data for gilda (in terms of SE):

Avail Space(Kb)   Used Space(Kb)   SEs
----------------------------------------------------------
1570665704        576686868        grid3.na.astro.it
225661244         1906716          grid009.ct.infn.it
523094840         457000           grid003.cecalc.ula.ve
1570665704        576686868        testbed005.cnaf.infn.it
15853516          1879992          gilda-se01.pd.infn.it
33 lcg_utils CLI usage example
IMPORTANT: The lcg_utils (both CLI and API, described later) need to access the Information System (BDII). The name of the BDII host used by lcg_utils is specified in the environment variable LCG_GFAL_INFOSYS. REMEMBER THAT, ESPECIALLY WHEN PERFORMING DATA MANAGEMENT OPERATIONS FROM THE WN.

We have a local file on our UI in Catania:
scampana@grid019$ ls -l important-file.txt
-rw-r--r-- 1 scampana users 19 Oct 31 17:09 important-file.txt

Upload the file to Naples (Italy):
scampana@grid019$ lcg-cr --vo gilda -l lfn:simone-important \
  -d grid3.na.astro.it file://`pwd`/important-file.txt
guid:08d02e56-bdf6-4833-a4da-e0247c188242

The file is effectively there:
scampana@grid019$ lcg-lr --vo gilda lfn:simone-important
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/file4c7c2ad6-4d93-4cd2-be24-bf4239f58208

Let's replicate it to Merida now:
scampana@grid019$ lcg-rep --vo gilda -d grid003.cecalc.ula.ve lfn:simone-important
scampana@grid019$ lcg-lr --vo gilda lfn:simone-important
sfn://grid003.cecalc.ula.ve/flatfiles/SE00/gilda/generated/2004-10-31/file39568d15-e873-4f17-9371-b8862ae77c36
sfn://grid3.na.astro.it/flatfiles/SE00/gilda/generated/2004-10-31/file4c7c2ad6-4d93-4cd2-be24-bf4239f58208

Delete all the replicas in the storage elements:
scampana@grid019$ lcg-del --vo gilda -a lfn:simone-important
scampana@grid019$ lcg-lr --vo gilda lfn:simone-important
lcg_lr: No such file or directory
35 lcg_utils API
- lcg_utils API
  - High-level data management C API
  - Same functionality as the lcg_util command line tools
- Single shared library
  - liblcg_util.so
- Single header file
  - lcg_util.h
- (+ linking against libglobus_gass_copy_gcc32.so)
36 lcg_utils: Replica management
- int lcg_cp (char *src_file, char *dest_file, char *vo, int nbstreams, char *conf_file, int insecure, int verbose)
- int lcg_cr (char *src_file, char *dest_file, char *guid, char *lfn, char *vo, char *relative_path, int nbstreams, char *conf_file, int insecure, int verbose, char *actual_guid)
- int lcg_del (char *file, int aflag, char *se, char *vo, char *conf_file, int insecure, int verbose)
- int lcg_rep (char *src_file, char *dest_file, char *vo, char *relative_path, int nbstreams, char *conf_file, int insecure, int verbose)
- int lcg_sd (char *surl, int reqid, int fileid, char *token, int oflag)
37 lcg_utils: Catalog interaction
- int lcg_aa (char *lfn, char *guid, char *vo, char *insecure, int verbose)
- int lcg_gt (char *surl, char *protocol, char **turl, int *reqid, int *fileid, char **token)
- int lcg_la (char *file, char *vo, char *conf_file, int insecure, char ***lfns)
- int lcg_lg (char *lfn_or_surl, char *vo, char *conf_file, int insecure, char *guid)
- int lcg_lr (char *file, char *vo, char *conf_file, int insecure, char ***pfns)
- int lcg_ra (char *lfn, char *guid, char *vo, char *conf_file, int insecure)
- int lcg_rf (char *surl, char *guid, char *lfn, char *vo, char *conf_file, int insecure, int verbose, char *actual_guid)
- int lcg_uf (char *surl, char *guid, char *vo, char *conf_file, int insecure)
38 Available APIs: C++ APIs
#include <iostream>
#include <stdlib.h>
#include <string.h>
#include <string>
#include <stdio.h>
#include <errno.h>

// lcg_util is a C library. Since we write C++ code here, we need to
// use extern "C"
extern "C" {
#include <lcg_util.h>
}

using namespace std;

/*
 * The following example code shows you how you can use the lcg_util API for
 * replica management. We expect that you modify parts of this code in order
 * to make it work in your environment. This is particularly indicated
 * by ACTION, i.e. your action is required.
 */
int main ()
{
  cout << "Data Management API Example " << endl;
  char vo[] = "cms"; // ACTION: fill in your correct VO here: gilda !
  cout << "---------------------------------------------------" << endl;
39Available APIs
// Copy a local file to the Storage Element and
register it in RLS // char localFile
"file/tmp/test-file" // ACTION create a
testfile char destSE "lxb0707.cern.ch" //
ACTION fill in a specific SE char actualGuid
(char) malloc(50) int verbose 2 // we use
verbosity level 2 int nbstreams 8 // we use 8
parallel streams to transfer a file
lcg_cr(localFile, destSE, NULL,
NULL, vo, NULL, nbstreams, NULL, 0,
verbose, actualGuid) if (errno)
perror("Error in copyAndRegister") return -1
else cout ltlt "We registered the file
with GUID " ltlt actualGuid ltlt endl cout ltlt
"-------------------------------------------------
--" ltlt endl
Copy and Register
40 Available APIs: List Replicas
  // Call the listReplicas (lcg_lr) method and print the returned URLs
  //
  // The actualGuid does not contain the prefix "guid:". We add it here and
  // then use the new guid as a parameter to list replicas
  //
  std::string guid = "guid:";
  guid.insert(5, actualGuid);
  char **pfns = NULL; // filled in by lcg_lr
  int lrStatus = lcg_lr((char *) guid.c_str(), vo, NULL, 0, &pfns);
  if (lrStatus != 0) {
    perror("Error in listReplicas");
    return -1;
  } else {
    cout << "PFN = " << pfns[0] << endl;
  }
  free(pfns);
  cout << "---------------------------------------------------" << endl;
41 Available APIs: Delete Replica
  // Delete the replica again
  //
  int rc = lcg_del((char *) guid.c_str(), 1, destSE, vo, NULL, 0, verbose);
  if (rc != 0) {
    perror("Error in delete");
    return -1;
  } else {
    cout << "Delete OK" << endl;
  }
  return 0;
}
42 Available APIs: Makefile used
CC            = g++
GLOBUS_FLAVOR = gcc32

all: data-management

data-management: data-management.o
	$(CC) -o data-management \
	-L${GLOBUS_LOCATION}/lib -lglobus_gass_copy_${GLOBUS_FLAVOR} \
	-L${LCG_LOCATION}/lib -llcg_util -lgfal \
	data-management.o

data-management.o: data-management.cpp
	$(CC) -I ${LCG_LOCATION}/include -c data-management.cpp

clean:
	rm -rf data-management data-management.o
44 Grid File Access Library
- GFAL is a library that provides access to Grid files
  - File I/O, Catalog Interaction, Storage Interaction
- Abstraction from specific implementations
  - Transparent interaction with the information service and the file catalogs
- Single shared library in threaded and unthreaded versions
  - libgfal.so, libgfal_pthr.so
- Single header file
  - gfal_api.h
45 GFAL: Catalog API
- int create_alias (const char *guid, const char *lfn, long long size)
- int guid_exists (const char *guid)
- char *guidforpfn (const char *surl)
- char *guidfromlfn (const char *lfn)
- char **lfnsforguid (const char *guid)
- int register_alias (const char *guid, const char *lfn)
- int register_pfn (const char *guid, const char *surl)
- int setfilesize (const char *surl, long long size)
- char *surlfromguid (const char *guid)
- char **surlsfromguid (const char *guid)
- int unregister_alias (const char *guid, const char *lfn)
- int unregister_pfn (const char *guid, const char *surl)
46 GFAL: Storage API
- int deletesurl (const char *surl)
- int getfilemd (const char *surl, struct stat64 *statbuf)
- int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag)
- int set_xfer_running (const char *surl, int reqid, int fileid, char *token)
- char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token)
- int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token, struct srm_filestatus **filestatuses)
- int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses)
47 GFAL: File I/O API (I)
- int gfal_access (const char *path, int amode)
- int gfal_chmod (const char *path, mode_t mode)
- int gfal_close (int fd)
- int gfal_creat (const char *filename, mode_t mode)
- off_t gfal_lseek (int fd, off_t offset, int whence)
- int gfal_open (const char *filename, int flags, mode_t mode)
- ssize_t gfal_read (int fd, void *buf, size_t size)
- int gfal_rename (const char *old_name, const char *new_name)
- ssize_t gfal_setfilchg (int, const void *, size_t)
- int gfal_stat (const char *filename, struct stat *statbuf)
- int gfal_unlink (const char *filename)
- ssize_t gfal_write (int fd, const void *buf, size_t size)
48 GFAL: protocol of File Open
49 GFAL: File I/O API (II)
- int gfal_closedir (DIR *dirp)
- int gfal_mkdir (const char *dirname, mode_t mode)
- DIR *gfal_opendir (const char *dirname)
- struct dirent *gfal_readdir (DIR *dirp)
- int gfal_rmdir (const char *dirname)
51 Advanced utilities: edg-gridftp
Used for low-level management of files/directories in SEs
- edg-gridftp-exists TURL: checks if a file/dir exists on an SE
- edg-gridftp-ls TURL: lists a directory on an SE
- globus-url-copy srcTURL dstTURL: copies files between SEs
- edg-gridftp-mkdir TURL: creates a directory on an SE
- edg-gridftp-rename srcTURL dstTURL: renames a file on an SE
- edg-gridftp-rm TURL: removes a file from an SE
- edg-gridftp-rmdir TURL: removes a directory on an SE
52 edg-gridftp example
Create and delete a directory in a GILDA Storage Element
53 Other Advanced CLI/API
- globus-url-copy srcTURL destTURL
  - low-level file transfer
- Interaction with RLS components
  - edg-lrc command (actions on LRC)
  - edg-rmc command (actions on RMC)
  - C++ and Java API for all catalog operations
    - http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-lrc-devguide.pdf
    - http://edg-wp2.web.cern.ch/edg-wp2/replication/docu/r2.1/edg-rmc-devguide.pdf
- Using the low-level CLI and API is STRONGLY discouraged
  - Risk: loss of consistency between SEs and catalogues
  - REMEMBER: a file is in the Grid only if it is BOTH
    - stored in a Storage Element
    - registered in the file catalog
54 OutputData JDL attribute
- Same as the lcg-cr command
- The OutputData JDL attribute specifies files to be copied and registered into the Grid
  - The filename (OutputFile) is compulsory
  - If no LFN is specified (LogicalFileName), none is set!
  - If no SE is specified (StorageElement), the default SE is chosen (VO_<VO>_DEFAULT_SE)
- At the end of the job the files are moved from the WN and registered
OutputData = {
  [
    OutputFile = "toto.out";
    StorageElement = "adc0021.cern.ch";
    LogicalFileName = "lfn:theBestTotoEver";
  ],
  [
    OutputFile = "toto2.out";
    StorageElement = "adc0021.cern.ch";
    LogicalFileName = "lfn:theBestTotoEver2";
  ]
};
56 Summary
- We described the EGEE/LCG Data Management middleware components and tools
- We described how to use the available CLIs
  - Use-case scenarios of data movement on the Grid
- We presented the available APIs
  - An example use of the lcg_util library was shown
57 Bibliography
- General EGEE/LCG information
  - EGEE Homepage
    - http://public.eu-egee.org/
  - EGEE's NA3: User Training and Induction
    - http://www.egee.nesc.ac.uk/
  - LCG Homepage
    - http://lcg.web.cern.ch/LCG/
  - LCG-2 User Guide
    - https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
  - GILDA
    - http://gilda.ct.infn.it/
  - GENIUS (GILDA web portal)
    - http://grid-tutor.ct.infn.it/
58 Bibliography
- Information on Data Management middleware
  - LCG-2 User Guide (chapters 3 and 6)
    - https://edms.cern.ch/file/454439//LCG-2-UserGuide.html
  - Evolution of LCG-2 Data Management. J-P Baud, James Casey.
    - http://indico.cern.ch/contributionDisplay.py?contribId=278&sessionId=7&confId=0
  - Globus 2.4
    - http://www.globus.org/gt2.4/
  - GridFTP
    - http://www.globus.org/datagrid/gridftp.html
  - GFAL
    - http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html
59 Bibliography
- Information on EGEE/LCG tools and APIs
  - Manpages (on the UI)
    - lcg_utils: lcg-* (commands), lcg_* (C functions)
  - Header files (in $LCG_LOCATION/include)
    - lcg_util.h
  - CVS development (sources for commands)
    - http://isscvs.cern.ch:8180/cgi-bin/cvsweb.cgi/?hidenonreadable=1&f=u&logsort=date&sortby=file&hideattic=1&cvsroot=lcgware&path=
- Information on other tools and APIs
  - EDG CLIs and APIs
    - http://edg-wp2.web.cern.ch/edg-wp2/replication/documentation.html
  - Globus
    - http://www-unix.globus.org/api/c/, .../globus_ftp_client/html, .../globus_ftp_control/html