Reliable and Efficient Grid Data Placement using Stork and DiskRouter - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Reliable and Efficient Grid Data Placement
using Stork and DiskRouter
Tevfik Kosar
University of Wisconsin-Madison
kosart@cs.wisc.edu
April 15th, 2004
2
A Single Project..
  • LHC (Large Hadron Collider)
  • Comes online in 2006
  • Will produce 1 Exabyte of data by 2012
  • Accessed by 2,000 physicists at 150 institutions
    in 30 countries

3
And Many Others..
  • Genomic information processing applications
  • Biomedical Informatics Research Network (BIRN)
    applications
  • Cosmology applications (MADCAP)
  • Methods for modeling large molecular systems
  • Coupled climate modeling applications
  • Real-time observatories, applications, and
    data-management (ROADNet)

4
The Same Big Problem..
  • Need for data placement
  • Locate the data
  • Send data to processing sites
  • Share the results with other sites
  • Allocate and de-allocate storage
  • Clean-up everything
  • Do these reliably and efficiently

5
Outline
  • Introduction
  • Stork
  • DiskRouter
  • Case Studies
  • Conclusions

6
Stork
  • A scheduler for data placement activities in the
    Grid
  • What Condor is for computational jobs, Stork is
    for data placement
  • Stork introduces a new concept:
  • Make data placement a first-class citizen in the
    Grid.

7
The Concept
8
The Concept
9
The Concept
(Diagram: DAGMan reads a DAG specification that mixes data placement (DaP) and computational jobs, dispatching DaP jobs to the Stork job queue and computational jobs to the Condor job queue.)
DAG specification:
  DaP A A.submit
  DaP B B.submit
  Job C C.submit
  ..
  Parent A child B
  Parent B child C
  Parent C child D, E
  ..
10
Why Stork?
  • Stork understands the characteristics and
    semantics of data placement jobs.
  • Can make smart scheduling decisions for reliable
    and efficient data placement.

11
Failure Recovery and Efficient Resource
Utilization
  • Fault tolerance
  • Just submit a bunch of data placement jobs, and
    then go away..
  • Control number of concurrent transfers from/to
    any storage system
  • Prevents overloading
  • Space allocation and de-allocation
  • Make sure space is available

12
Support for Heterogeneity
Protocol translation using Stork memory buffer.
13
Support for Heterogeneity
Protocol translation using Stork Disk Cache.
14
Flexible Job Representation and Multilevel Policy
Support
  • Type Transfer
  • Src_Url srb://ghidorac.sdsc.edu/kosart.condor/x.dat
  • Dest_Url nest://turkey.cs.wisc.edu/kosart/x.dat
  • Max_Retry 10
  • Restart_in 2 hours
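Taken together, the attributes above form one Stork submit description. The sketch below is a plausible rendering rather than a verbatim file from the talk: the bracket-and-semicolon ClassAd framing and the quoting are assumptions based on Stork's Condor heritage, while the attribute values come from this slide and dap_type follows the spelling used on the next slide:

    [
      dap_type   = "transfer";
      src_url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
      dest_url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
      max_retry  = 10;
      restart_in = "2 hours";
    ]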

15
Run-time Adaptation
  • Dynamic protocol selection
  • dap_type transfer
  • src_url drouter://slic04.sdsc.edu/tmp/test.dat
  • dest_url drouter://quest2.ncsa.uiuc.edu/tmp/test.dat
  • alt_protocols nest-nest, gsiftp-gsiftp
  • dap_type transfer
  • src_url any://slic04.sdsc.edu/tmp/test.dat
  • dest_url any://quest2.ncsa.uiuc.edu/tmp/test.dat
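Written out the same way, the first of these jobs might look like the hedged sketch below (again assuming the ClassAd-style framing). If the DiskRouter transfer fails, Stork can fall back to the protocol pairs listed in alt_protocols; the second form, with any:// in place of a concrete protocol, leaves the choice entirely to Stork:

    [
      dap_type      = "transfer";
      src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
      dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
      alt_protocols = "nest-nest, gsiftp-gsiftp";
    ]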

16
Run-time Adaptation
  • Run-time Protocol Auto-tuning
  • link slic04.sdsc.edu quest2.ncsa.uiuc.edu
  • protocol gsiftp
  • bs 1024 KB      // block size
  • tcp_bs 1024 KB  // TCP buffer size
  • p 4             // parallelism
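These tuned parameters map directly onto a GridFTP client invocation. Assuming the gsiftp transfers are driven by globus-url-copy (an assumption; the talk does not name the client), and reusing the endpoints and path from the previous slide purely for illustration, the tuned transfer would look roughly like:

    # -p: parallel streams, -bs: block size in bytes, -tcp-bs: TCP buffer size in bytes
    globus-url-copy -p 4 -bs 1048576 -tcp-bs 1048576 \
        gsiftp://slic04.sdsc.edu/tmp/test.dat \
        gsiftp://quest2.ncsa.uiuc.edu/tmp/test.dat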

17
Outline
  • Introduction
  • Stork
  • DiskRouter
  • Case Studies
  • Conclusions

18
DiskRouter
  • A mechanism for high-performance, large-scale
    data transfers
  • Uses hierarchical buffering to aid large-scale
    data transfers
  • Enables an application-level overlay network for
    maximizing bandwidth
  • Supports application-level multicast

19
Store and Forward
(Diagram: data is transferred from A to C through an intermediate node B, shown with and without a DiskRouter at B buffering and forwarding the data.)
Using a DiskRouter improves performance when the bandwidth fluctuation between A and B is independent of the bandwidth fluctuation between B and C.
20
DiskRouter Overlay Network
(Diagram: a direct path from A to B with 90 Mb/s of bandwidth.)
21
DiskRouter Overlay Network
(Diagram: the same 90 Mb/s path from A to B, plus an alternative route through a DiskRouter node C with 400 Mb/s links on both hops.)
A DiskRouter node C, which is not necessarily on the direct path from A to B, can be added to enforce the use of an alternative, higher-bandwidth path.
22
Data Mover/Distributed Cache
(Diagram: a source and a destination connected through a DiskRouter cloud.)
  • The source writes to its closest DiskRouter, and
    the destination picks the data up from its
    closest DiskRouter

23
Outline
  • Introduction
  • Stork
  • DiskRouter
  • Case Studies
  • Conclusions

24
Case Study I: SRB-UniTree Data Pipeline
  • Transfer 3 TB of DPOSS data from SRB at SDSC to
    UniTree at NCSA
  • A data pipeline created with Stork and DiskRouter
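One hypothetical way to express such a pipeline to DAGMan, reusing the DaP DAG syntax from the concept slide; the stage names and submit-file names below are illustrative, not the actual files used in the case study:

    # hypothetical two-stage pipeline: SRB at SDSC -> DiskRouter cache -> UniTree at NCSA
    DaP srb_to_cache      srb_to_cache.submit
    DaP cache_to_unitree  cache_to_unitree.submit
    Parent srb_to_cache child cache_to_unitree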

25
Failure Recovery
(Graph: transfer progress over time, annotated with the failures encountered and recovered from: DiskRouter reconfigured and restarted, UniTree not responding, SDSC cache reboot, UW CS network outage, software problem.)
26
Case Study II
27
Dynamic Protocol Selection
28
Runtime Adaptation
  • Before Tuning
  • parallelism 1
  • block_size 1 MB
  • tcp_bs 64 KB
  • After Tuning
  • parallelism 4
  • block_size 1 MB
  • tcp_bs 256 KB
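In globus-url-copy terms (the same assumption as before; SRC and DST below are placeholders, not the actual case-study endpoints), the change amounts to:

    # before tuning: 1 stream, 1 MB blocks, 64 KB TCP buffer
    globus-url-copy -p 1 -bs 1048576 -tcp-bs 65536  gsiftp://SRC/path gsiftp://DST/path
    # after tuning:  4 streams, 1 MB blocks, 256 KB TCP buffer
    globus-url-copy -p 4 -bs 1048576 -tcp-bs 262144 gsiftp://SRC/path gsiftp://DST/path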

29
Conclusions
  • Regard data placement as a first-class citizen.
  • Introduce a specialized scheduler for data
    placement.
  • Introduce a high performance data transfer tool.
  • End-to-end automation, fault tolerance, run-time
    adaptation, multilevel policy support, reliable
    and efficient transfers.

30
Future work
  • Enhanced interaction between Stork, DiskRouter
    and higher level planners
  • co-scheduling of CPU and I/O
  • Enhanced authentication mechanisms
  • More run-time adaptation

31
You don't have to FedEx your data anymore.. We
deliver it for you!
  • For more information
  • Stork
  • Tevfik Kosar
  • Email: kosart@cs.wisc.edu
  • http://www.cs.wisc.edu/condor/stork
  • DiskRouter
  • George Kola
  • Email: kola@cs.wisc.edu
  • http://www.cs.wisc.edu/condor/diskrouter