GridFS: Targeting Data Sharing in Grid Environments

1
GridFS: Targeting Data Sharing in Grid Environments
  • Marcelo Nery dos Santos / Renato Cerqueira
  • PUC-Rio, Brazil
  • Presented by Francisco Silva

2
Motivation
  • User-level file system infrastructure
  • Providing access to remote file systems
  • Having a simple configuration
  • No need for super-user privileges
  • Lessening problems faced by CSBase, a framework
    for developing grid environments
  • Reducing NFS dependency
  • Facilitating deployment
  • Enabling useful file transfer metrics

3
Related Work
  • Distributed File Systems (e.g. NFS/AFS)
      • Configuration overhead for system administrators
      • Local access to large files is not available
  • Avaki Data Grid
      • Proprietary solution, no file transfer metrics
  • Globus
      • GridFTP / Reliable File Transfer Service
      • Useful, but hard for novices to install
      • Oversized solution for simpler cases

4
GridFS - Characteristics
  • Scalability, allowing a large number of files to
    be shared
  • Performance
  • Interoperability through the use of CORBA for
    remote access
  • Federative approach.

5
GridFS - Characteristics
  • Historical data about data transfers, which can be
    used by scheduling algorithms to choose an
    execution host for a task based on the estimated
    time and effort for data transfer
  • Metadata support that can store (field, value)
    tuples
  • Object Oriented Interface.
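
The use of historical transfer data for scheduling can be illustrated with a minimal sketch. It assumes each past transfer is recorded as a (bytes, seconds) pair per host and that the estimate is simply file size divided by average observed throughput. The class and method names (TransferHistory, record, estimateSeconds, cheapestHost) are hypothetical and are not part of the GridFS interface shown in these slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: choose an execution host from past transfer records. */
final class TransferHistory {
    // Per-host observations stored as {bytes, seconds} pairs (illustrative layout).
    private final Map<String, List<double[]>> records = new LinkedHashMap<>();

    void record(String host, long bytes, double seconds) {
        records.computeIfAbsent(host, h -> new ArrayList<>())
               .add(new double[] {bytes, seconds});
    }

    /** Average observed throughput to a host in bytes/second; 0 if no history. */
    double throughput(String host) {
        double bytes = 0;
        double seconds = 0;
        List<double[]> empty = Collections.emptyList();
        for (double[] r : records.getOrDefault(host, empty)) {
            bytes += r[0];
            seconds += r[1];
        }
        return seconds > 0 ? bytes / seconds : 0;
    }

    /** Estimated seconds to move fileSize bytes; infinite when no history exists. */
    double estimateSeconds(String host, long fileSize) {
        double t = throughput(host);
        return t > 0 ? fileSize / t : Double.POSITIVE_INFINITY;
    }

    /** Returns the candidate with the lowest estimate, or null for an empty list. */
    String cheapestHost(List<String> candidates, long fileSize) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (String host : candidates) {
            double cost = estimateSeconds(host, fileSize);
            if (best == null || cost < bestCost) {
                best = host;
                bestCost = cost;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TransferHistory h = new TransferHistory();
        h.record("nodeA", 100_000_000L, 10.0); // 10 MB/s observed
        h.record("nodeB", 100_000_000L, 2.0);  // 50 MB/s observed
        System.out.println(h.cheapestHost(Arrays.asList("nodeA", "nodeB"), 500_000_000L));
    }
}
```

A real scheduler would likely weight recent transfers more heavily and combine this estimate with CPU load; the plain average is only the simplest choice.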

6
GridFS - Features
  • Remote File System Access
      • List / Create / Delete files and directories
      • Read / Write operations over files
      • Retrieve file system free space
  • General Operations
      • Metadata get / set operations
      • Copy files directly between servers
      • Add / Remove mount points, in order to allow
        a GridFS federation
7
CORBA IDL RemoteFile
  interface RemoteFile {
    RemoteFile createDirectory(in Path name);
    RemoteFile createFile(in Path name);
    RemoteFile getChild(in Path name);
    FileSequence getChildren();
    boolean remove();
    ReadChannel getReadChannel();
    WriteChannel getWriteChannel();
    RandomAccessChannel getRandomAccessChannel();
    boolean copyTo(in RemoteFile dst, in string method);
    boolean addMount(in Path name, in RemoteFile target);
    RemoteFile removeMountPoint(in Path name);
    FileServer getFileServer();
    // continues...
  };

8
GridFS Data Accessibility
  • Remote Access
  • Through CORBA remote invocations
  • Allows read/write access
  • By mounting a GridFS on the local file systems
    using FUSE
  • Allows use of legacy applications
  • File Transfer Operations
  • Several implementation methods
  • Java NIO / CORBA / FTP
  • New methods/protocols can be easily added
  • Performance evaluation
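
The idea of selecting a transfer method by name, with new methods added easily, can be sketched as a small registry. The example assumes local files and implements only an NIO-based method with FileChannel.transferTo; the CopyMethods class and the "NIO" key are hypothetical, and the actual GridFS copy between remote servers is not shown in the slides.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry of copy methods selected by name. */
final class CopyMethods {
    interface CopyMethod {
        /** Copies src to dst and returns the number of bytes transferred. */
        long copy(Path src, Path dst) throws IOException;
    }

    private final Map<String, CopyMethod> methods = new HashMap<>();

    CopyMethods() {
        register("NIO", CopyMethods::nioCopy);
    }

    /** New methods or protocols are added by registering another implementation. */
    void register(String name, CopyMethod method) {
        methods.put(name, method);
    }

    long copy(String method, Path src, Path dst) throws IOException {
        CopyMethod m = methods.get(method);
        if (m == null) {
            throw new IllegalArgumentException("Unknown method: " + method);
        }
        return m.copy(src, dst);
    }

    static long nioCopy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long done = 0;
            while (done < size) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(done, size - done, out);
                if (n <= 0) {
                    break; // source shrank or no progress; stop instead of spinning
                }
                done += n;
            }
            return done;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("gridfs-src", ".bin");
        Path dst = Files.createTempFile("gridfs-dst", ".bin");
        Files.write(src, new byte[64 * 1024]);
        System.out.println(new CopyMethods().copy("NIO", src, dst) + " bytes copied");
        Files.delete(src);
        Files.delete(dst);
    }
}
```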

9
Implementation Aspects
  • CORBA
  • Interoperability
  • Scalability
  • POA Policies (RootPOA) (DefaultServant)
  • Java
  • Portability
  • Performance issues
  • Use of NIO allows performance similar to FTP
  • (Transfer Rate) (CPU) (Load)

10
GridFS - Limitations
  • Remove operations only over leaves
      • Files or empty directories
  • No lock mechanism
      • Several writers to the same file (unix-like)
  • Single user
      • No users, groups or permissions
  • Caching
      • No caching policies implemented
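
The leaf-only removal rule can be made concrete with a small in-memory tree. This is a sketch under the assumption that remove() simply returns false for a non-empty directory; LeafOnlyNode is a hypothetical stand-in that borrows a few operation names from the RemoteFile interface, not the GridFS implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory node illustrating leaf-only removal. */
final class LeafOnlyNode {
    private final String name;
    private final boolean directory;
    private final LeafOnlyNode parent;
    private final Map<String, LeafOnlyNode> children = new LinkedHashMap<>();

    private LeafOnlyNode(String name, boolean directory, LeafOnlyNode parent) {
        this.name = name;
        this.directory = directory;
        this.parent = parent;
    }

    static LeafOnlyNode root() {
        return new LeafOnlyNode("", true, null);
    }

    LeafOnlyNode createDirectory(String childName) {
        return add(childName, true);
    }

    LeafOnlyNode createFile(String childName) {
        return add(childName, false);
    }

    /** Returns the named child, or null if it does not exist. */
    LeafOnlyNode getChild(String childName) {
        return children.get(childName);
    }

    /** Removes this node only if it is a leaf: a file or an empty directory. */
    boolean remove() {
        if (parent == null || !children.isEmpty()) {
            return false; // the root and non-empty directories are refused
        }
        parent.children.remove(name);
        return true;
    }

    private LeafOnlyNode add(String childName, boolean isDirectory) {
        if (!directory) {
            throw new IllegalStateException("Not a directory: " + name);
        }
        if (children.containsKey(childName)) {
            throw new IllegalArgumentException("Already exists: " + childName);
        }
        LeafOnlyNode child = new LeafOnlyNode(childName, isDirectory, this);
        children.put(childName, child);
        return child;
    }

    public static void main(String[] args) {
        LeafOnlyNode root = LeafOnlyNode.root();
        LeafOnlyNode dir = root.createDirectory("data");
        LeafOnlyNode file = dir.createFile("a.txt");
        System.out.println(dir.remove());  // directory still has a child
        System.out.println(file.remove()); // file is a leaf
        System.out.println(dir.remove());  // directory is now empty
    }
}
```

Under this rule, a client that wants to delete a whole subtree has to walk it bottom-up and remove the leaves first.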

11
Limits Tested
  • Simultaneous file transfer operations
      • NIO (96 - 192, independently of the method used)
      • FTP (50, PureFTPd server limit)
      • CORBA (80 - 480, 80 threads dealing with 480 ops)
  • Performance
      • NIO and FTP limited by IDE disk speed (Gigabit
        network)
      • CORBA limited by disk speed and Round Trip Time
      • FUSE: 1.5 MB/s (naive implementation)
  • Remote Access Channels
      • 1000 (operating system file descriptors limit)
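
The CORBA figure above (80 threads dealing with 480 operations) describes a bounded pool of workers serving more requests than there are threads. A minimal sketch of that pattern, assuming a plain java.util.concurrent fixed thread pool and a short sleep as a stand-in for each transfer, is shown below. BoundedTransfers is a hypothetical name, and the slides do not say how the ORB's threads were actually configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a fixed pool of workers serving a larger number of operations. */
final class BoundedTransfers {
    /** Runs ops tasks on a pool of the given size and returns the peak concurrency seen. */
    static int run(int threads, int ops) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < ops; i++) {
                pending.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for one transfer operation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every queued operation to finish
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Same ratio as the slide: 80 worker threads, 480 queued operations.
        System.out.println("peak concurrent operations: " + run(80, 480));
    }
}
```

The peak never exceeds the pool size: operations beyond the 80 in flight wait in the executor's queue rather than being rejected.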

12
CSBase
  • Infrastructure for remote algorithm execution
  • GridFS used to implement CSFS Daemon
  • Files are copied to execution host or accessed
    remotely by NFS
  • CSBase server controls file transfer operations
    from Data Repository to Execution Hosts
  • CSFS Daemons allow local file system accessibility

13
CSBase Algorithm Execution
  1. User requests an algorithm execution
  2. CSBase server creates an object to handle the
    request
  3. This object checks whether the selected execution
    host has access to binaries and data files (uses
    CSFS to copy files, if necessary)
  4. CSBase server starts a command on that host using
    the Node Daemon
  5. When the command finishes, the modified files
    are synchronized back to the repository
  6. A clean-up procedure is invoked
  7. Client is notified of command completion
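
The steps above can be sketched as a single handler object that stages missing files, runs the command, synchronizes results, cleans up, and notifies the client. Everything here is simulated with an in-memory log: ExecutionRequestSketch and its method names are hypothetical, no real CSBase, CSFS, or Node Daemon API is called, and the assumption that clean-up removes only the files staged for this run is ours, since the slides do not say what clean-up covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical simulation of the request-handling object created in step 2. */
final class ExecutionRequestSketch {
    private final Set<String> hostFiles; // files already present on the execution host
    private final List<String> log = new ArrayList<>();

    ExecutionRequestSketch(Set<String> filesAlreadyOnHost) {
        this.hostFiles = new HashSet<>(filesAlreadyOnHost);
    }

    /** Runs steps 3 to 7 and returns the ordered list of actions taken. */
    List<String> run(String command, List<String> requiredFiles, List<String> modifiedFiles) {
        // Step 3: copy only the binaries and data files the host lacks.
        List<String> staged = new ArrayList<>();
        for (String f : requiredFiles) {
            if (hostFiles.add(f)) {
                staged.add(f);
                log.add("copy " + f);
            }
        }
        // Step 4: start the command on the host.
        log.add("run " + command);
        // Step 5: synchronize modified files back to the repository.
        for (String f : modifiedFiles) {
            log.add("sync " + f);
        }
        // Step 6: clean up (assumed here to remove the files staged for this run).
        for (String f : staged) {
            hostFiles.remove(f);
            log.add("cleanup " + f);
        }
        // Step 7: notify the client.
        log.add("notify client");
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        Set<String> onHost = new HashSet<>(Arrays.asList("bin/solver"));
        ExecutionRequestSketch request = new ExecutionRequestSketch(onHost);
        List<String> actions = request.run("solver in.dat",
                Arrays.asList("bin/solver", "in.dat"), Arrays.asList("out.dat"));
        actions.forEach(System.out::println);
    }
}
```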

14
Main Contributions
  • A file server that
  • Is scalable, portable, interoperable
  • Has reasonable performance
  • Combines the benefits of different approaches
  • Remote File Access
  • File Staging
  • Offers special functionalities for Grid Computing
    (estimated transfer cost and file copy to local
    system)

15
Future Work
  • Notification Mechanism
  • Enabling the implementation of caching policies
    and online remote-tree visualization for GUIs
  • Users and security issues
  • In order to guarantee data integrity and
    confidentiality
  • Index and search capabilities
  • Over the stored metadata

16
Questions?
19
(xy): x = number of client machines / y = number of threads