Parrot: Transparent User-Level Middleware for Data-Intensive Computing

1
Parrot: Transparent User-Level Middleware for
Data-Intensive Computing
  • Douglas Thain
  • Condor Project, University of Wisconsin
  • Workshop on Adaptive Grid Middleware
  • 28 September 2003

2
The Reality of the Grid
afwuhweiuhsdvxmndf (and then a miracle
happens) P=NP
I think you have a problem here...
Look at my new proof!
3
Condor
PBS
NQE
LSF
Load Leveler
run this batch job
Local Operating System
Process Interface (main, exit, abort, kill, sleep)
User's App
Parrot
4
Applications of Parrot
  • Interactive Browsing
  • tcsh, tar, gzip, make, acroread, gv, xv...
  • Improved Reliability
  • Transparent retry/reassignment/reallocation
  • Files, sockets, even repair broken apps.
  • Private Namespaces
  • Make /home/thain appear the same everywhere.
  • Make /usr/data/calibration different everywhere.
  • Dynamic/Distributed Program Construction
  • Remote link, remote exec, remote eval...
  • Profiling and Debugging
  • Users may not know low-level I/O patterns.

5
Challenges
  • Technical Methods of Interposition
  • Semantic Differences
  • Error Management
  • CPU I/O Integration
  • Performance
  • The butterfly effect
  • Subtle underlying differences can have large
    effects in performance and usability.

6
Internal Techniques
[Diagram: three panels titled Binary Rewriting, Polymorphic Extension, and Static or Dynamic Re-Linking. Labels: App Code, Standard Library, Library, M1, M2, NEW, New Code, New Library.]
7
External Techniques
[Diagram: three panels titled Debugger Trap, Remote Filesystem, and Kernel Callout. Labels: App, Agent, Kernel, NFS, LFS, FFS, USR.]
8
Techniques Compared
technique        burden    speed       hole detection
polymorphic      rewrite   fast        easy
static link      relink    fast        hard
dynamic link     dynlink   medium      hard
binary rewrite   dynlink   fast        hard
remote fs        root      varies      easy
callout          root      slow        easy
debugger         none      very slow   easy
9
Hole Detection Matters
  • Dynamic Linking
  • Bypass Toolkit, ca. 2000
  • Works with some standard tools.
  • Many still crash in strange ways.
  • Doesn't apply to static executables, which is always a surprise.
  • Debugger Trap
  • Parrot: coding began in May of 2003.
  • Works reliably with almost everything in
    /usr/bin.
  • Caveat 1: Twice as much code
  • Caveat 2: Higher latency

10
Debugger Trap
  • For the rest of this talk, we select the debugger
    trap for completeness and reliability. Much of
    the discussion still applies to the other
    techniques too.
  • Some technical details in the paper
  • Only on Linux.
  • Must manage process ancestry.
  • Must fudge some broken ptrace behavior.
  • Cannot write directly to process, must take
    roundabout path through temp file.

11
User Process
SYS_write, SYS_read, SYS_open
(debugger trap)
parrot_read, parrot_open, parrot_write
File Descr.: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
File Pointers: pos 100, pos 0, pos 0, pos 1 MB, pos 42
File Objects: outfile, infile, config, data
name resolver: mount list driver, chirp lookup driver
Device Drivers: Local Driver, Chirp Driver, FTP Driver, NeST Driver, RFIO Driver, DCAP Driver
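The layering on this slide (descriptor, then file pointer, then file object, then device driver) can be illustrated with a small sketch. This is not Parrot's code: the class names `DescriptorTable`, `FilePointer`, `FileObject` and the `read_at` driver method are hypothetical, chosen only to mirror the boxes in the diagram, and only a local driver is shown.

```python
from dataclasses import dataclass


class LocalDriver:
    """Hypothetical stand-in for the slide's Local Driver."""

    def read_at(self, path, offset, length):
        # Serve the request from the local filesystem.
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)


@dataclass
class FileObject:
    name: str       # name as understood by the driver
    driver: object  # driver that serves this file


@dataclass
class FilePointer:
    file: FileObject
    pos: int = 0    # current seek position


class DescriptorTable:
    """Maps small integer descriptors to file pointers."""

    def __init__(self):
        self.table = {}

    def open(self, name, driver):
        fd = 0
        while fd in self.table:  # pick the lowest free descriptor
            fd += 1
        self.table[fd] = FilePointer(FileObject(name, driver))
        return fd

    def read(self, fd, length):
        fp = self.table[fd]
        data = fp.file.driver.read_at(fp.file.name, fp.pos, length)
        fp.pos += len(data)      # advance the pointer by what was read
        return data

    def close(self, fd):
        del self.table[fd]
```

A remote driver would plug in by providing the same `read_at` method; the descriptor table itself would not need to change.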
12
Adaptation
On same host
/mydata -> /usr/data
App
open(/mydata/foo)
Parrot
Local
FTP
Chirp
/usr/data
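A minimal sketch of the name mapping shown here, assuming a mount list of (prefix, host, target) entries and assuming that the local driver is chosen when the data lives on the same host. The function name `resolve`, the tuple layout, and the default remote driver are illustrative assumptions, not Parrot's actual interface.

```python
def resolve(path, mounts, local_host, remote_driver="chirp"):
    """Return (driver_name, rewritten_path) for a logical path.

    mounts is a list of (prefix, host, target) tuples (hypothetical layout).
    The longest matching prefix wins; unmatched paths stay local.
    """
    best = None
    for prefix, host, target in mounts:
        p = prefix.rstrip("/")
        # Match whole path components only, so /mydatax does not match /mydata.
        if path == p or path.startswith(p + "/"):
            if best is None or len(p) > len(best[0]):
                best = (p, host, target.rstrip("/"))
    if best is None:
        return ("local", path)
    p, host, target = best
    # Adapt: use the local driver when the data is on this host.
    driver = "local" if host == local_host else remote_driver
    return (driver, target + path[len(p):])
```

With `mounts = [("/mydata", "node1", "/usr/data")]`, a call on `node1` for `/mydata/foo` yields the local driver and `/usr/data/foo`; the same call on another host yields the remote driver with the same rewritten path.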
13
What Protocol?
  • File Transfer Protocol
  • Internet standard, many implementations.
  • High bandwidth sequential access.
  • NeST
  • General purpose storage appliance from UW.
  • Virtual users, namespace, and allocation.
  • RFIO
  • Remote I/O protocol used with CERN CASTOR.
  • UNIX-like; most operations require a new TCP connection.
  • DCAP
  • Remote I/O protocol used with Fermi D-Cache
  • UNIX-like, WORM semantics, no directories,
    caching/
  • Chirp
  • Protocol developed at UW for Parrot.
  • Corresponds very closely to UNIX, including errnos.

14
Small Details Matter
  • Standard tools need to know subtle details;
    otherwise, they break.
  • ls -lR performs getdents(foo):
  • on success, descend
  • on ENOTDIR, display and continue
  • on ENOENT, display error and stop.
  • FTP does not provide this detail:
  • Failed LIST -> error 550
  • Failed GET -> error 550
  • Failed CWD -> error 550
  • Simple assignment doesn't work:
  • Making 550 = ENOENT breaks many tools.

15
Example Solution
[Flowchart: a sequence of FTP probes, LIST foo, then CWD foo, then SIZE foo. Each probe branches on the reply (200, 550, or other). Outcomes: Success, Access denied, Not a dir., No such entry, Transient Error.]
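One plausible reading of this flowchart can be sketched as below. The exact edges are an assumption: the sketch supposes that a failed LIST is followed by CWD (success meaning the directory exists but cannot be listed), then by SIZE (success meaning the name is a plain file). The `send` callable is hypothetical, the reply codes follow the slide's simplified 200/550 shorthand rather than the full set of real FTP replies, and the use of `EAGAIN` for a transient error is an illustrative choice.

```python
import errno


def classify_failed_list(send, name):
    """Map FTP replies onto a UNIX errno for a directory listing.

    send(command) is a hypothetical callable returning an integer reply code.
    Returns 0 on success, otherwise an errno value.
    """
    code = send("LIST " + name)
    if code == 200:
        return 0                  # Success
    if code != 550:
        return errno.EAGAIN       # Transient Error (assumed mapping)

    code = send("CWD " + name)
    if code == 200:
        return errno.EACCES       # directory exists, listing refused
    if code != 550:
        return errno.EAGAIN

    code = send("SIZE " + name)
    if code == 200:
        return errno.ENOTDIR      # name exists but is not a directory
    if code == 550:
        return errno.ENOENT       # nothing by that name
    return errno.EAGAIN
```

The point of the extra probes is that a bare 550 is split into three distinct errnos, which is the detail a tool such as `ls -lR` relies on.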
16
CPU-IO Integration
  • Errors that cannot be expressed in the client's
    interface must be passed to a higher level (the
    batch system).
  • Simple options
  • kill -9 application (retry app elsewhere)
  • exit(1) application (don't retry app)
  • Complex options (Condor only)
  • restart with (Subnet != 128.101.175)
  • restart with (CurrentTime > 5pm)
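The simple options above can be sketched as a small decision function. The set of errnos treated as expressible through open() is an illustrative subset, and the function name, the `retry_elsewhere` flag, and the returned action tuples are hypothetical; the sketch only shows the shape of the decision, not how Parrot implements it.

```python
import errno

# Illustrative subset of errors that open() may legitimately report
# to an application (an assumption made for this sketch).
OPEN_ERRNOS = {errno.ENOENT, errno.EACCES, errno.ENOTDIR, errno.EISDIR}


def handle_open_error(err, retry_elsewhere):
    """Decide what to do with an error raised while serving open().

    Returns ("return_errno", err) when the application can be told directly,
    ("kill", 9) when the batch system should retry the job elsewhere,
    or ("exit", 1) when the job should not be retried.
    """
    if err in OPEN_ERRNOS:
        return ("return_errno", err)
    if retry_elsewhere:
        return ("kill", 9)
    return ("exit", 1)
```

For example, a timed-out connection (`errno.ETIMEDOUT`) is not something open() is expected to report, so it escapes to the batch system instead of being returned to the application.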

17
Bandwidth by Protocol
18
Latency by Protocol (ms)
         stat     open/close  read 1B  read 8KB  write 1B  write 8KB
chirp    0.50     0.84        0.61     2.80      0.38      2.23
ftp      0.87     2.82        -        -         -         -
nest     2.51     2.53        2.96     4.48      5.53      7.41
rfio     13.41    23.11       0.50     3.32      39.8      2.85
dcap     152.53   159.09      40.05    3.01      40.14     3.14
19
Andrew-Like Benchmark
  • Original Andrew benchmark is no longer
    appropriate, so replace it with the Parrot source:
    296 files, 955 KB.
  • Copy the source to a remote device, then
    manipulate in five stages:
  • copy: cp -rp
  • list: ls -lR
  • scan: grep searchstring -r
  • make: make
  • delete: rm -rf
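The five stages can be driven by a short timing harness along these lines. This is a sketch, not the script used for the talk: the function names, the `src`, `dst`, and `pattern` parameters, and the use of `make -C` to build inside the copied tree are assumptions.

```python
import subprocess
import time


def stage_commands(src, dst, pattern):
    """Return the five benchmark stages as (name, argv) pairs, in order."""
    return [
        ("copy", ["cp", "-rp", src, dst]),
        ("list", ["ls", "-lR", dst]),
        ("scan", ["grep", "-r", pattern, dst]),
        ("make", ["make", "-C", dst]),
        ("delete", ["rm", "-rf", dst]),
    ]


def run_stages(src, dst, pattern):
    """Run each stage and return {stage name: elapsed seconds}."""
    timings = {}
    for name, argv in stage_commands(src, dst, pattern):
        start = time.perf_counter()
        # Output is discarded; grep exits non-zero when nothing matches,
        # so exit codes are deliberately not checked here.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings[name] = time.perf_counter() - start
    return timings
```

To measure a given protocol, the whole harness would be run under the interposition agent with `dst` naming a path on the remote device.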

20
Overheads Compared
21
Overheads Compared
22
Protocols Compared
23
Protocols Compared
24
Moral of the story
  • The butterfly effect: small underlying
    differences can have big effects on performance
    and reliability.
  • Examples in interposition
  • Dynamic linking: fast but poor hole detection.
  • Debugger trap: slow but good hole detection.
  • Examples in protocols
  • Chirp: UNIX semantics restrict bandwidth.
  • FTP: Need for multiple ops increases latency.
  • NeST: Powerful virtualization increases latency.
  • RFIO: Connection per op doesn't scale.

25
For more info...
  • Douglas Thain
  • thain@cs.wisc.edu
  • Miron Livny
  • miron@cs.wisc.edu
  • Software, manuals, more info:
  • http://www.cs.wisc.edu/condor/parrot
  • The Condor Project:
  • http://www.cs.wisc.edu/condor