1. Building Hierarchical Grid Storage Using the GFarm Global File System and the JuxMem Grid Data-Sharing Service
- Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb
- INRIA/IRISA, Rennes, France
- Osamu Tatebe
- University of Tsukuba, Japan
2. Context: Grid Computing
- Target architecture: cluster federations (e.g. Grid'5000)
- Focus: large-scale data sharing
- Example application domains: solid mechanics, optics, dynamics, satellite design, thermodynamics
3. Approaches for Data Management on Grids
- Use of data catalogs: Globus
  - GridFTP, Replica Location Service, etc.
- Logistical networking of data: IBP
  - Buffers available across the Internet
- Unified access to data: SRB
  - From file systems to tapes and databases
- Limitations
  - No transparency -> increased complexity at large scale
  - No consistency guarantees for replicated data
4. Towards Transparent Access to Data
- Desirable features
  - Uniform access to distributed data via global identifiers
  - Transparent data localization and transfer
  - Consistency models and protocols for replicated data
- Examples of systems taking this approach
  - On clusters
    - Memory level: DSM systems (Ivy, TreadMarks, etc.)
    - File level: NFS-like systems
  - On grids
    - Memory level: data-sharing services, e.g. JuxMem (INRIA Rennes, France)
    - File level: global file systems, e.g. Gfarm (AIST/University of Tsukuba, Japan)
5. Idea: a Collaborative Study of Memory- and File-Level Data Sharing
- Study possible interactions between
  - the JuxMem grid data-sharing service
  - the Gfarm global file system
- Goals
  - Enhance global data-sharing functionality
  - Improve performance and reliability
  - Build a memory hierarchy for global data sharing by combining the memory level and the file-system level
- Approach
  - Enhance JuxMem with persistent storage using Gfarm
- Support
  - The DISCUSS Sakura bilateral collaboration (2006-2007)
  - NEGST (2006-2008)
6. JuxMem: a Grid Data-Sharing Service
- Generic grid data-sharing service
  - Grid scale: 10^3-10^4 nodes
  - Transparent data localization
  - Data consistency
  - Fault tolerance
  - JuxMem = DSM + P2P
- Implementation
  - Multiple replication strategies
  - Configurable consistency protocols
  - Based on JXTA 2.0 (http://www.jxta.org/)
- Integrated into 2 grid programming models
  - GridRPC (DIET, ENS Lyon)
  - Component models (CCM, CCA)
[Figure: the JuxMem overlay — a global JuxMem group containing cluster groups A, B and C, plus a data group D spanning clusters]
http://juxmem.gforge.inria.fr
7. JuxMem's Data Group: a Fault-Tolerant, Self-Organizing Group
- Data availability despite failures is ensured through replication and fault-tolerant building blocks
- Hierarchical self-organizing groups
  - Cluster level: Local Data Group (LDG)
  - Grid level: Global Data Group (GDG)
[Figure: a Global Data Group (GDG) federating per-cluster Local Data Groups (LDGs)]
8. JuxMem: Memory Model and API
- Memory model (currently): entry consistency
  - Explicit association of data to locks
  - Multiple Readers, Single Writer (MRSW)
  - juxmem_acquire, juxmem_acquire_read, juxmem_release
  - Explicit lock acquire/release before/after access
- API
  - Allocate memory for JuxMem data
    - ptr = juxmem_malloc(size, clusters, replicas per cluster, ID)
  - Map existing JuxMem data to local memory
    - ptr = juxmem_mmap(ID), juxmem_unmap(ptr)
  - Synchronization before/after data access
    - juxmem_acquire(ptr), juxmem_acquire_read(ptr), juxmem_release(ptr)
  - Read and write data: direct access through pointers! (e.g. int n = *ptr)
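As an illustration, here is a minimal C sketch of a writer and reader using the calls above; the header name, the ID buffer type, and the exact prototypes are assumptions based on this slide rather than the actual JuxMem headers.

```c
/* Minimal sketch of a JuxMem writer/reader. The header name, ID type
 * and exact prototypes are assumptions; only the call names and the
 * acquire/access/release pattern come from the slide above. */
#include <juxmem.h>   /* assumed header name */

int main(void)
{
    char id[128];     /* assumed: receives the global ID of the data */

    /* Allocate a shared integer, replicated on 2 clusters,
       with 3 replicas per cluster. */
    int *ptr = juxmem_malloc(sizeof(int), 2, 3, id);

    juxmem_acquire(ptr);       /* exclusive (single-writer) lock */
    *ptr = 42;                 /* direct write through the pointer */
    juxmem_release(ptr);       /* release; may trigger a flush to Gfarm */

    juxmem_acquire_read(ptr);  /* shared (multiple-reader) lock */
    int n = *ptr;              /* direct read through the pointer */
    juxmem_release(ptr);

    return n == 42 ? 0 : 1;
}
```

A second process would attach to the same data with ptr = juxmem_mmap(id) and detach with juxmem_unmap(ptr).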
9. Gfarm: a Global File System (CCGrid 2002)
- Commodity-based distributed file system that federates the storage of each site
- Can be mounted from all cluster nodes and clients
- Provides scalable I/O performance with respect to the number of parallel processes and users
- Avoids access concentration through automatic replica selection
[Figure: the Gfarm file system maps a global namespace onto distributed storage and creates file replicas]
10. Gfarm: a Global File System (2)
- Files can be shared among all nodes and clients
- Physically, a file may be replicated and stored on any file system node
- Applications can access a file regardless of its location
- File system nodes can be distributed, e.g. across Japan and the US
[Figure: client PCs and notebooks mount the Gfarm file system under /gfarm; files A, B and C are each replicated on several file system nodes in Japan and the US, with locations tracked by the metadata server]
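Since the Gfarm file system can be mounted (slide 9), a client can read such a file with ordinary POSIX I/O, as in the minimal C sketch below; the /gfarm/demo/fileA path is hypothetical, and replica selection stays transparent to the program.

```c
/* Sketch: reading a Gfarm file through a /gfarm mount point with
 * plain POSIX I/O. The path is hypothetical; Gfarm transparently
 * selects which replica is actually read. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    int fd = open("/gfarm/demo/fileA", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}
```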
11. Our Goal: Build a Memory Hierarchy for Global Data Sharing
- Approach
  - Applications use JuxMem's API (memory-level sharing)
  - Applications DO NOT use Gfarm directly
  - JuxMem uses Gfarm to enhance data persistence
    - Without Gfarm, JuxMem already tolerates some crashes of memory providers thanks to the self-organizing groups
    - With Gfarm, persistence is further enhanced thanks to secondary storage
- How does it work?
  - Basic principle: on each lock release, the data can be flushed to Gfarm
  - The flush frequency can be tuned to trade efficiency against fault tolerance (see the sketch below)
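A minimal sketch of this principle in C, assuming a provider-side hook invoked after each lock release; the counter, the period and flush_to_gfarm are hypothetical names, as only the flush-on-release idea and the tunable frequency come from this slide.

```c
/* Hypothetical provider-side hook illustrating flush-on-release.
 * Only the principle (flush every k-th lock release, with k tunable)
 * comes from the slide; all names here are assumptions. */
#include <stdio.h>

/* Stub: in the real system this would write the current version of
 * the data to a Gfarm file via a file system daemon (GFSD). */
static void flush_to_gfarm(const char *data_id)
{
    printf("flushing %s to Gfarm\n", data_id);
}

static unsigned release_count = 0;
static unsigned flush_period  = 4;  /* 1 = flush on every release */

/* Called by the provider after each juxmem_release on this data. */
void provider_on_release(const char *data_id)
{
    if (++release_count % flush_period == 0)
        flush_to_gfarm(data_id);
}
```

Setting the period to 1 maximizes fault tolerance (every critical section reaches secondary storage), while larger periods reduce the I/O overhead.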
12. Step 1: A Single Flush by One Provider
- One particular JuxMem provider (the GDG leader) flushes the data to Gfarm
- Then, additional Gfarm copies can be created using Gfarm's gfrep command
- (The three flush strategies of Steps 1-3 are summarized in the code sketch after Step 3)
[Figure: among the JuxMem providers of the Global Data Group (GDG) spanning clusters 1 and 2, only the GDG leader flushes the data to a Gfarm file system daemon (GFSD)]
13. Step 2: Parallel Flush by LDG Leaders
- One particular JuxMem provider in each cluster (the LDG leader) flushes the data to Gfarm (parallel copy creation, one copy per cluster)
- The copies are registered as the same Gfarm file
- Then, extra Gfarm copies can be created using Gfarm's gfrep command
[Figure: the LDG 1 and LDG 2 leaders flush the data in parallel to the GFSDs of clusters 1 and 2]
14. Step 3: Parallel Flush by All Providers
- All JuxMem providers in each cluster flush the data to Gfarm
- All copies are registered as the same Gfarm file
- Useful to create multiple copies of the Gfarm file per cluster
- No further replication with gfrep is needed
[Figure: every JuxMem provider of clusters 1 and 2 flushes the data to its GFSD]
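The three steps differ only in which providers are allowed to flush. A minimal C sketch of this per-provider decision follows; the enum, struct and function names are hypothetical, and only the three strategies themselves come from Steps 1-3.

```c
/* Illustrative policy check summarizing Steps 1-3: which JuxMem
 * providers flush to Gfarm on lock release. All names are assumed. */
#include <stdbool.h>

enum flush_policy {
    FLUSH_GDG_LEADER,      /* Step 1: a single flush, by the GDG leader   */
    FLUSH_LDG_LEADERS,     /* Step 2: one flush per cluster (LDG leaders) */
    FLUSH_ALL_PROVIDERS    /* Step 3: every provider flushes              */
};

struct provider {
    bool is_gdg_leader;
    bool is_ldg_leader;
};

/* Decide whether this provider must flush under the given policy. */
bool must_flush(const struct provider *p, enum flush_policy policy)
{
    switch (policy) {
    case FLUSH_GDG_LEADER:    return p->is_gdg_leader;
    case FLUSH_LDG_LEADERS:   return p->is_ldg_leader;
    case FLUSH_ALL_PROVIDERS: return true;
    }
    return false;
}
```

With the first two policies, the resulting Gfarm file can afterwards be replicated further with gfrep; with the third, the flush itself already creates the desired number of copies per cluster.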
15. Deployment Issues
- Application deployment on large-scale infrastructures
  - Reserve resources
  - Configure the nodes
  - Manage dependencies between processes
  - Start processes
  - Monitor and clean up the nodes
- Mixed deployment of Gfarm and JuxMem
  - Manage dependencies between the processes of both applications
  - Make the JuxMem provider able to act as a Gfarm client
- Approach: use a generic deployment tool, ADAGE (INRIA, Rennes, France)
  - Design specific plugins for Gfarm and JuxMem
16. ADAGE: Automatic Deployment of Applications in a Grid Environment
- Developed by the PARIS research group (IRISA/INRIA, Rennes)
- Deploys the same application on different kinds of resources
  - From clusters to grids
- Supports multi-middleware applications
  - MPI, CORBA, JXTA, Gfarm, ...
- Network topology description
  - Latency and bandwidth hierarchy
  - NAT, non-IP networks
  - Firewalls, asymmetric links
- Planner as a plugin
  - Round robin, random
- Preliminary support for dynamic applications
- Some successes
  - 29,000 JXTA peers on 400 nodes
  - 4,003 components on 974 processors across 7 sites
[Figure: ADAGE pipeline — Gfarm- and JuxMem-specific application descriptions are translated into a generic application description, which, together with a resource description and control parameters, drives deployment planning, deployment plan execution and application configuration]
17. Roadmap Overview (1)
- Design of the common architecture (2006)
  - Discussions on possible interactions between JuxMem and Gfarm
    - May 2006, Singapore (CCGrid 2006)
    - June 2006, Paris (HPDC 2006 and the NEGST workshop)
  - October 2006: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    - First deployment tests of Gfarm on Grid'5000
    - Overall Gfarm/JuxMem design
  - December 2006: Osamu Tatebe visited the JuxMem team
    - Refinement of the Gfarm/JuxMem design
- Implementation of JuxMem on top of Gfarm (2007)
  - April 2007: Gabriel Antoniu and Loïc Cudennec visited the Gfarm team
    - One JuxMem provider (the GDG leader) flushes data to Gfarm after each critical section (step 1 done)
    - Master's internship: Majd Ghareeb
  - December 2007: Osamu Tatebe visited the JuxMem team
    - Common paper at Euro-Par 2008
18. Read Performance
[Plot: read throughput — worst case 39 MB/s, Gfarm 69 MB/s, usual case 100 MB/s]
19. Write Performance
[Plot: write throughput — worst case 28.5 MB/s, Gfarm 42 MB/s, usual case 89 MB/s]
20. Roadmap (2)
- Design of the Gfarm plugin for ADAGE (April 2007)
  - Propose a specific application description language for Gfarm
  - Translate the specific description into a generic description
  - Start processes while respecting their dependencies
  - Transfer the Gfarm configuration files
    - from the metadata server to the agents
    - from the agents to their GFSDs and clients
- Deployment of JuxMem on top of Gfarm (May 2007): first prototype running on Grid'5000
  - ADAGE deploys Gfarm, then JuxMem (separate deployments)
  - Limitation: the user still needs to indicate the Gfarm client hostname and the location of the Gfarm configuration file
- Design of a meta-plugin for ADAGE that automatically deploys a mixed description of a Gfarm+JuxMem configuration (December 2007)
  - Supports Gfarm v1 and v2
- Work in progress (2008)
  - Fault-tolerant, distributed metadata server: Gfarm on top of JuxMem
  - Master's internship: André Lage