Dynamic Data Grid Replication Strategy based on Internet Hierarchy

1
Dynamic Data Grid Replication Strategy based on
Internet Hierarchy
  • Sang Min Park, Jai-Hoon Kim,
  • and Young-Bae Ko
  • Ajou University
  • South Korea

2
Contents
  • Introduction to Data Grid
  • Optimizations in Data Grid
  • Novel Replication Strategy based on Internet
    Hierarchy
  • Simulation
  • Simulation Results
  • Conclusions

3
Introduction to Data Grid
  • Data Grid Motivations
      • Petabyte-scale data production
      • Distributed data storage, with each site storing
        part of the data
      • Distributed computing resources that process
        the data
  • Two Most Important Approaches for Data Grid
      • Secure, reliable, and efficient data transport
        protocol (e.g., GridFTP)
      • Replication (e.g., replica catalog)
  • Replication
      • Large files are partially replicated among
        sites
      • Reduces data access time
      • Application scheduling and dynamic replication
        issues are emerging

4
Introduction to Data Grid
  • Typical Job Execution Scenario

5
Optimizations in Data Grid
  • Reducing the Overall Job Execution Time
  • Scheduling Optimization
      • Deciding where to allocate the job
      • Considers the location of replicas and the
        computational capabilities of sites
  • Short-term Optimization
      • Deciding from where to fetch replicas
      • Considers the available network bandwidth
        between sites
  • Long-term Optimization (Dynamic Replication
    Strategy)
      • Applies when a site runs short of storage
      • Deciding which files should remain as replicas
      • Better to keep replicas of popular files, since
        they are likely to be used again
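
The short-term optimization above can be illustrated with a minimal sketch. It assumes the optimizer knows which sites hold a replica and the currently available bandwidth between sites; the function name, the site names, and the bandwidth figures are hypothetical and are not taken from the slides or from OptorSim.

```python
def pick_replica_source(replica_sites, bandwidth_mbps, local_site):
    """Return the site from which the local site should fetch a replica.

    replica_sites: sites that currently hold a replica of the file.
    bandwidth_mbps: dict mapping (source, destination) to the available
        bandwidth in Mbit/s (assumed to be known to the optimizer).
    """
    if local_site in replica_sites:
        return local_site  # the file is already local, no transfer needed
    # Otherwise fetch from the holder with the most available bandwidth.
    return max(replica_sites, key=lambda s: bandwidth_mbps[(s, local_site)])


# Hypothetical example: sites A and B hold the file, site C needs it.
bandwidth = {("A", "C"): 10.0, ("B", "C"): 100.0}
source = pick_replica_source(["A", "B"], bandwidth, "C")
print(source)
```

The long-term optimization is a separate decision: it concerns which files a site keeps once its storage is full, not where a single request is served from.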

6
Existing Dynamic Replication Strategies
  • Replica Optimization based on Site-level Locality
      • Replicates the files that are predicted to be
        used in the future from the perspective of a
        single site
      • Tries to reduce the number of fetches
      • Delete Oldest and Delete LRU methods
  • Economic Strategy from European Data Grid
      • Developed the OptorSim Data Grid optimization
        simulator
      • Uses an auction protocol to trigger long-term
        optimization
      • Site-level locality based on file access
        patterns
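
As an illustration of the two site-level deletion methods named above, the sketch below picks the replica to delete when a site's storage is full. It assumes that "Delete Oldest" removes the replica stored earliest and that "Delete LRU" removes the least recently used one; the `Replica` class, its fields, and the logical timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    created_at: int   # logical time at which the replica was stored
    last_access: int  # logical time of the most recent request


def victim_delete_oldest(replicas):
    """Delete Oldest (as assumed here): evict the earliest-stored replica."""
    return min(replicas, key=lambda r: r.created_at)


def victim_delete_lru(replicas):
    """Delete LRU: evict the replica that was requested least recently."""
    return min(replicas, key=lambda r: r.last_access)


store = [Replica("f1", 1, 9), Replica("f2", 2, 3), Replica("f3", 5, 6)]
oldest = victim_delete_oldest(store).name
lru = victim_delete_lru(store).name
print(oldest, lru)
```

Both methods look only at what a single site has seen, which is the site-level view of locality discussed in this part of the talk.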

7
Existing Dynamic Replication Strategies
  • Limitations of site-level optimization
      • A site's storage is limited, so the degree of
        request locality it can exploit is also limited
      • These strategies rely on predictable file access
        patterns, but we do not know whether such
        patterns will exist

8
Replication Strategy based on Bandwidth Hierarchy
(BHR)
  • Network-level Locality
      • A site is not the only possible source of
        locality
      • Another source of locality: network-level
        locality
      • If a replica is located at a nearby site,
        fetching it takes little time
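
A rough calculation shows why a nearby replica matters. The sketch below estimates transfer time as file size divided by bandwidth, ignoring latency and protocol overhead; the 1000 MB file size and the 1000 Mbit/s and 10 Mbit/s link speeds are assumed values chosen for illustration, not figures from the presentation.

```python
def transfer_seconds(file_size_mb, bandwidth_mbps):
    """Estimate transfer time: size in megabytes, bandwidth in Mbit/s."""
    return file_size_mb * 8 / bandwidth_mbps


# Assumed values: a 1000 MB file, a fast link inside a region and a
# slow link between regions.
intra_region = transfer_seconds(1000, 1000)
inter_region = transfer_seconds(1000, 10)
print(intra_region, inter_region)
```

With these assumed numbers the fetch from inside the region is 100 times faster, which is the gap that network-level locality tries to exploit.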

[Figure labels: Network Region (e.g., a country); Fast Replica Transmission; Slow Replica Transmission]

9
Replication Strategy based on Bandwidth Hierarchy
(BHR)
  • Bandwidth Hierarchy

10
Replication Strategy based on Bandwidth Hierarchy
(BHR)
  • Maximizing network-level locality
      1. Avoiding replica duplication within a region
      2. Considering the popularity of file requests at
         the region level
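
The two principles above can be combined into a replacement decision. The following is only a sketch of one way to do so, not the authors' exact algorithm: it assumes all files have the same size, that storage is counted in files, and that each site can see which files are held elsewhere in its region and how often each file has been requested across the region. All names are hypothetical.

```python
def bhr_make_room(new_file, local_files, region_files, region_hits, free_slots):
    """Decide whether to store new_file locally and which file to evict.

    local_files: files stored at this site.
    region_files: files stored at other sites in the same region.
    region_hits: dict of file -> number of requests seen in the region.
    free_slots: free storage at this site, counted in files.
    Returns a pair (store, evicted_files).
    """
    if free_slots > 0:
        return True, []
    # Principle 1: the file is already held within the region, so do not
    # create a second copy; it can be fetched over the fast regional links.
    if new_file in region_files or not local_files:
        return False, []
    # Prefer evicting a local file that is duplicated elsewhere in the region.
    duplicated = [f for f in local_files if f in region_files]
    if duplicated:
        victim = min(duplicated, key=lambda f: region_hits.get(f, 0))
        return True, [victim]
    # Principle 2: otherwise evict the least popular local file, but only
    # if the new file is more popular at the region level.
    victim = min(local_files, key=lambda f: region_hits.get(f, 0))
    if region_hits.get(victim, 0) < region_hits.get(new_file, 0):
        return True, [victim]
    return False, []


hits = {"a": 5, "b": 1, "x": 2}
decision = bhr_make_room("x", ["a", "b"], set(), hits, 0)
print(decision)
```

In this sketch a site never evicts a file that is more popular in the region than the incoming one, so the region as a whole keeps its most requested files.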

[Figure labels: A Region containing two sites; "Receiving New Replica" (X); "No space here! We should remove some file"; "Replica X is duplicated here!"]

11
Simulation
  • OptorSim
      • Data Grid dynamic replication simulation tool
      • Developed as part of the European Data Grid
        project
      • Implemented in Java
  • We implemented our own region-based optimizer in
    OptorSim

12
Simulation
  • Simulation Environment

13
Simulations
[Tables: General configuration of parameters; Bandwidth and storage size]

14
Simulation Results
[Figure: Total job times of three strategies]

15
Simulation Results

[Figure: Total job time with varying bandwidth and storage size]

16
Conclusions
  • Existing dynamic replication strategies are based
    only on the site-level locality of file requests
  • The BHR strategy is based on network-level
    locality
  • BHR performs well when the bandwidth hierarchy is
    pronounced and the storage at each site is small
  • We extend current site-level replica optimization
    work in a more scalable direction

17
References
  • William H. Bell, David G. Cameron, Luigi Capozza,
    A. Paul Millar, Kurt Stockinger, and Floriano
    Zini. Simulation of Dynamic Grid Replication
    Strategies in OptorSim. In Proc. of the 3rd
    Int'l. IEEE Workshop on Grid Computing
    (Grid'2002), Baltimore, USA, November 2002.
    Springer Verlag, Lecture Notes in Computer
    Science.
  • William H. Bell, David G. Cameron, Ruben
    Carvajal-Schiaffino, A. Paul Millar, Kurt
    Stockinger, and Floriano Zini. Evaluation of an
    Economy-Based File Replication Strategy for a
    Data Grid. In International Workshop on Agent
    based Cluster and Grid Computing at CCGrid 2003,
    Tokyo, Japan, May 2003. IEEE Computer Society
    Press.
  • Mark Carman, Floriano Zini, Luciano Serafini, and
    Kurt Stockinger. Towards an Economy-Based
    Optimisation of File Access and Replication on a
    Data Grid. In International Workshop on Agent
    based Cluster and Grid Computing at International
    Symposium on Cluster Computing and the Grid
    (CCGrid'2002), Berlin, Germany, May 2002. IEEE
    Computer Society Press.
  • Ann Chervenak, Ian Foster, Carl Kesselman,
    Charles Salisbury, and Steven Tuecke. The Data
    Grid: Towards an Architecture for the Distributed
    Management and Analysis of Large Scientific
    Datasets. Journal of Network and Computer
    Applications, 23:187-200, 2001.
  • EU Data Grid Project. http://www.eu-datagrid.org

18
References
  • I. Foster, C. Kesselman, and S. Tuecke. The
    Anatomy of the Grid: Enabling Scalable Virtual
    Organizations. International J. Supercomputer
    Applications, 15(3), 2001.
  • Wolfgang Hoschek, Javier Jaen-Martinez, Asad
    Samar, Heinz Stockinger and Kurt Stockinger.
    Data Management in an International Data Grid
    Project. 1st IEEE/ACM International Workshop on
    Grid Computing (Grid'2000), Bangalore, India, Dec
    2000.
  • OptorSim: A Replica Optimizer Simulation.
    http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html
  • Sang-Min Park and Jai-Hoon Kim. Chameleon: A
    Resource Scheduler in a Data Grid Environment.
    2003 IEEE/ACM International Symposium on Cluster
    Computing and the Grid (CCGRID'2003), Tokyo,
    Japan, May 2003. IEEE Computer Society Press.
  • Kavitha Ranganathan and Ian Foster. Design and
    Evaluation of Dynamic Replication Strategies for
    a High Performance Data Grid. International
    Conference on Computing in High Energy and
    Nuclear Physics, Beijing, September 2001.
  • Kavitha Ranganathan and Ian Foster. Identifying
    Dynamic Replication Strategies for a High
    Performance Data Grid. International Workshop on
    Grid Computing, Denver, November 2001.