Title: Efficient Access to Many Small Files in a Grid Filesystem
1Efficient Access to Many Small Files in a Grid Filesystem
- Douglas Thain and Christopher Moretti
- University of Notre Dame
2Efficient Access to Many Small (and Big) Files in a Grid Filesystem
3Abstract
- Many grid data tools focus on transferring, storing, and managing large (GB-TB) files.
- But many users need to manage, transfer, and process lots (1000s) of small (KB-MB) files.
- We describe protocols and interfaces for manipulating many small files over wide area networks. (Doesn't hurt large files, either.)
- Implemented in the Chirp file system.
- Performance:
- Best case: order of magnitude improvement.
- Worst case: no slower than before.
4The Small File Problem
5Who has lots of small files?
- Anyone using a batch system.
- One file for submit, input, output, error, log...
- Anyone using a large software package.
- Executables, libraries, config files...
- Anyone using a filesystem like a database.
- Genomics, astronomy, physics...
- Anyone who likes to write shell scripts.
- foreach host in list: ssh host > host.output
6Why is this a problem?
- Users do the sensible thing:
- foreach file in (list) do transfer done
- The sensible thing performs miserably:
- New TCP Connection
- SSL Authentication
- Configuration Operations
- Slow Start Again
- Result is KB/s on a GB/s link.
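A rough model makes the cost concrete. The sketch below assumes every file pays a fixed number of round trips before any data flows. The file size, round-trip time, round-trip count, and link speed are illustrative assumptions, not measurements from this work.

```python
def effective_throughput(file_bytes, rtt_s, setup_round_trips, link_bytes_per_s):
    """Bytes per second achieved when every file pays a fixed setup cost.

    Ignores TCP slow start, so the real figure would be lower still.
    """
    setup_time = setup_round_trips * rtt_s          # connection, auth, config
    transfer_time = file_bytes / link_bytes_per_s   # time on the wire
    return file_bytes / (setup_time + transfer_time)


# Illustrative numbers only: 10 KB files, 100 ms RTT, 10 round trips of
# per-file setup, and a 1 GB/s link.
rate = effective_throughput(10_000, 0.100, 10, 1e9)
print(f"{rate / 1e3:.1f} KB/s")  # prints "10.0 KB/s"
```

Under these assumptions the setup time dominates completely: the link speed barely affects the result, which is why a faster network does not help.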
7Why not just use tar?
- If you can, you should!
- Sometimes you cannot
- The system semantics demand multiple files.
- Packing and unpacking can be very slow.
- Not enough disk space to unpack.
- Different apps select different data subsets.
- Using an existing script or program.
- Users don't know or care that it's a distributed system, so why should they change?
8The Challenge: How to design interfaces so that users get the expected performance and behavior?
9Chirp and Parrot: A Grid Filesystem
10Requirements for a Grid Filesystem
- Transparent access to files in the same manner as a local Unix filesystem.
- Non-privileged deployment at both client and server. (Root is not possible on the grid.)
- User control over policies for naming, caching, consistency, and fault tolerance.
- Flexible access controls for sharing.
- Good performance on both small and large files.
11Chirp/Parrot: A Grid Filesystem
[Diagram: Ordinary Unix Program -> unix system calls (ptrace trap) -> Parrot -> single TCP stream -> Chirp -> Ordinary Unix Filesystem]
- No privs needed! (shown for both Parrot and Chirp)
- Automatic recovery
- Authentication: Kerberos / Globus / Hostname / Unix
- Protocol: open / pread / pwrite / close / stat / mkdir / rmdir / unlink / getfile / putfile / movefile
- Authorization:
- kerberos:joe@nd.edu RWLDA
- globus:/O=ND/CN=Joe RWLDA
- hostname:*.nd.edu RL
- group:server.nd.edu/team RWL
12Ordinary Unix Commands
> ls /chirp
alpha.nd.edu
beta.nd.edu
...
> cd /chirp/alpha.nd.edu/mydir
> cp /tmp/bigdata .
> emacs mydata.txt
13Parrot Specific Commands
> parrot_whoami
globus:/O=ND/CN=Joe
> parrot_getacl /chirp/alpha.nd.edu/
kerberos:joe@nd.edu RWLDA
globus:/O=ND/CN=Joe RWL
hostname:*.nd.edu RL
14Chirp as Remote Filesystem
[Diagram: Grid Site A and Grid Site B, secured by GSI. Components: grid middleware, a Chirp server, and a Unix filesystem.]
15Chirp as Cluster Filesystem
[Diagram: Grid Site A and Grid Site B. Components: four Chirp servers and four Unix filesystems.]
16http://www.cse.nd.edu/ccl/viz
17Sample Applications
- Image Processing for Biometrics
- Moretti et al, PCGRID 2007
- Bioinformatics on EGEE
- Blanchet et al, Grid 2006
- High Energy Physics on LCG
- Sfiligoi et al, CHEP 2005
- Molecular Dynamics Repository
- Wozniak et al, HPDC 2005
- Remote DB Access on EDG
- Klous et al, CCPE 2005
18Protocols for Small Files
19What About FTP?
- FTP is a great data transfer system, but it was never designed to be a file system:
- New TCP stream per data transfer.
- New TCP stream for each directory list.
- Lots of connections can overwhelm net devices.
- Coarse errors: 550 for all file system errors.
- Semantic problems, e.g. empty directory.
- Unix access controls. (But see SecPAL.)
- Wildly varying implementations and support.
20FTP Protocol Reminder
[Diagram: FTP Client <-> FTP Server]
- Control connection: AUTH GSSAPI MIC MIC PORT RETR
- Data connection: AUTH GSSAPI MIC MIC, then data transfer
- Minimum of four round trips (plus auth overhead) to fetch a file, plus loss of TCP window.
- Common practice is a new control connection for every data transfer!
21What About NFS?
- NFS was designed for a local area network among (relatively) trusted hosts:
- Fine-grained file access is very slow on a WAN.
- Kernel support and root assistance needed to start server, mount client, change target.
- Unix UID for ownership, access control.
- Need to bind to privileged port, often filtered.
- Use of file handles to refer to files makes it very difficult to build a user-level server.
- Lots of lookup operations over the WAN.
22NFS Protocol Reminder
[Diagram: NFS Client <-> NFS Server]
- Name resolution: lookup(00,a) lookup(10,b) lookup(20,c) ...
- Data: read 4KB, read 4KB, read 4KB, ...
- On a WAN, throughput is limited to 4KB/latency:
- 10 ms: 400 KB/s
- 100 ms: 40 KB/s
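The two figures on this slide follow directly from dividing the block size by the round-trip latency. A minimal check, assuming 4KB means 4000 bytes (which is what the slide's round numbers imply) and that only one read is outstanding at a time:

```python
BLOCK_BYTES = 4 * 1000  # the slide's 4KB, taken as 4000 bytes to match its figures


def rpc_throughput(block_bytes, latency_s):
    """Throughput when each block waits for one full round trip."""
    return block_bytes / latency_s


for latency_ms in (10, 100):
    rate = rpc_throughput(BLOCK_BYTES, latency_ms / 1000)
    print(f"{latency_ms} ms -> {rate / 1000:.0f} KB/s")
```

This reproduces 400 KB/s at 10 ms and 40 KB/s at 100 ms. Link bandwidth does not appear in the formula at all.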
23Chirp Hybrid Protocol Overview
[Diagram: Chirp Client <-> Chirp Server]
- auth globus (8 RTT)
- open, read, write, close, ...
- getfile(mydata): the server replies with size and data
- putfile(otherdata,size): the client follows with data
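The whole-file exchanges can be illustrated with a toy protocol over one long-lived connection. This is a sketch only: the text-line wire format, the `FILES` store, and every function name here are hypothetical and are not the actual Chirp protocol. It shows the shape of the idea, where a getfile request returns the size followed by the data, and a putfile request carries the size and is followed by the data.

```python
import socket
import threading

FILES = {"mydata": b"hello grid"}  # hypothetical server-side store


def recv_line(sock):
    """Read bytes up to a newline, returning the line without it."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the stream")
        buf += chunk
    return buf[:-1]


def recv_exact(sock, n):
    """Read exactly n bytes from the stream."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the stream")
        buf += chunk
    return buf


def serve(sock, requests):
    """Answer a fixed number of requests on one long-lived connection."""
    for _ in range(requests):
        words = recv_line(sock).decode().split()
        if words[0] == "getfile":
            data = FILES[words[1]]
            sock.sendall(f"{len(data)}\n".encode() + data)  # size, then data
        elif words[0] == "putfile":
            FILES[words[1]] = recv_exact(sock, int(words[2]))
            sock.sendall(b"0\n")                            # status reply


def getfile(sock, name):
    """Fetch a whole file with one request and one streamed reply."""
    sock.sendall(f"getfile {name}\n".encode())
    size = int(recv_line(sock))
    return recv_exact(sock, size)


def putfile(sock, name, data):
    """Send the request line and payload together; one reply comes back."""
    sock.sendall(f"putfile {name} {len(data)}\n".encode() + data)
    return int(recv_line(sock))


client, server = socket.socketpair()
worker = threading.Thread(target=serve, args=(server, 2))
worker.start()
fetched = getfile(client, "mydata")
status = putfile(client, "otherdata", b"abc")
worker.join()
client.close()
server.close()
```

Both transfers reuse the same connection, so neither pays for a new TCP handshake or a fresh slow start, and each costs a single request/reply exchange regardless of file size.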
24Protocol Comparison
- FTP: Stream per File
- Latency: 4 RTT for each file
- Throughput: TCP limit after slow start
- NFS: Remote Procedure Call
- Latency: 1 RTT for each file
- Throughput: block size / latency
- Chirp: Hybrid
- Latency: 1 RTT for each file
- Throughput: TCP limit in steady state
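The comparison can be turned into a back-of-the-envelope estimate. The sketch below is a simplified cost model built from the latency and throughput figures above, not a measurement: it ignores slow start (which flatters FTP), treats the TCP limit as a fixed bandwidth, and assumes a 4096-byte NFS block. The function name and all workload numbers are illustrative.

```python
import math


def fetch_time(protocol, n_files, file_bytes, rtt_s, bandwidth_bytes_per_s,
               block_bytes=4096):
    """Estimated seconds to fetch n_files under a simplified cost model."""
    stream_time = file_bytes / bandwidth_bytes_per_s
    if protocol == "ftp":      # 4 RTT per file, then a streamed transfer
        per_file = 4 * rtt_s + stream_time
    elif protocol == "nfs":    # 1 RTT lookup, then 1 RTT per block read
        blocks = math.ceil(file_bytes / block_bytes)
        per_file = rtt_s + blocks * rtt_s
    elif protocol == "chirp":  # 1 RTT per file, then a streamed transfer
        per_file = rtt_s + stream_time
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return n_files * per_file


# Illustrative workload: 1000 files of 64 KB, 100 ms RTT, 10 MB/s stream rate.
for name in ("ftp", "nfs", "chirp"):
    seconds = fetch_time(name, 1000, 64 * 1024, 0.100, 10e6)
    print(f"{name:5s} {seconds:8.1f} s")
```

With these inputs the hybrid comes out ahead on both counts: it avoids FTP's per-file round trips and NFS's per-block round trips.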
25Local Area Performance
26Wide Area Performance
27Real WAN Performance
28Interfaces for Small Files
29Standard Unix Copy
cp /tmp/source /chirp/B/target
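[Diagram: cp -> Parrot -> Local driver -> Local Disk, and Parrot -> Chirp driver -> Chirp Server]
- cp: open(source), open(target), loop read/write
- Parrot: open(source), read, open, write
- Local -> Local Disk: open(source), read
- Chirp -> Chirp Server: open, write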
30Problem: The system does not know the context of the operation! Solution: Introduce a higher-level operation, copyfile, that exploits the context.
31Improved Copy with Copyfile
cp /tmp/source /chirp/B/target
[Diagram: new cp -> Parrot -> Local driver -> Local Disk, and Parrot -> Chirp driver -> Chirp Server]
32Is it reasonable to modify cp?
- Installation:
- Cannot modify /bin/cp.
- Install new parrot_cp.
- Alias cp, or link named cp in PATH.
- Backwards compatibility:
- parrot_cp without Parrot falls back to normal.
- Ordinary cp on Parrot behaves as before.
- parrot_cp on a different filesystem falls back.
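The fallback behavior can be sketched as "try the whole-file operation, and if it is unavailable, copy the ordinary way". The code below is an illustration of that pattern only. The `copyfile` hook, the use of ENOSYS to signal "unsupported", and all names are assumptions made for the example, not how parrot_cp is actually implemented.

```python
import errno
import os
import tempfile


def copy_with_fallback(source, target, copyfile=None, chunk_bytes=65536):
    """Copy source to target, preferring a whole-file operation if offered.

    copyfile is a hypothetical hook standing in for a filesystem-level
    whole-file copy. It should raise OSError(ENOSYS) when unsupported.
    Returns the name of the path taken.
    """
    if copyfile is not None:
        try:
            copyfile(source, target)
            return "copyfile"
        except OSError as err:
            if err.errno != errno.ENOSYS:
                raise  # a real failure, not "unsupported"
    # Ordinary cp behavior: open both files and loop over read/write.
    with open(source, "rb") as src, open(target, "wb") as dst:
        while True:
            chunk = src.read(chunk_bytes)
            if not chunk:
                break
            dst.write(chunk)
    return "read/write"


def unsupported(source, target):
    """Stand-in for running without the fast path available."""
    raise OSError(errno.ENOSYS, "copyfile not supported")


with tempfile.TemporaryDirectory() as workdir:
    src_path = os.path.join(workdir, "source")
    dst_path = os.path.join(workdir, "target")
    with open(src_path, "wb") as f:
        f.write(b"x" * 100_000)
    path_taken = copy_with_fallback(src_path, dst_path, copyfile=unsupported)
    with open(dst_path, "rb") as f:
        copied = f.read()
```

Because the slow path is always present, the tool stays correct everywhere and is merely faster where the higher-level operation exists.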
33Improved Copy with Copyfile
cp /chirp/A/source /chirp/B/target
[Diagram: new cp -> Parrot -> Chirp driver -> Chirp Server A and Chirp Server B]
34Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp -> Parrot -> Chirp Server A and Chirp Server B; the directory mydir holds an ACL and files X, Y, Z]
35Improved Directory Copy
cp -r /chirp/A/mydir /chirp/B/mydir
[Diagram: cp -> Parrot -> Chirp Server A and Chirp Server B; the directory mydir holds an ACL and files X, Y, Z]
- Operations: mkdir, putfile x 3, setacl
36Third Party Performance
37You get the idea...
- ls -la D
- Original: getdir D + N * stat
- Improved: getlongdir D
- rm -rf D
- Original: getdir D + N * unlink (recursive)
- Improved: rmall D
- md5sum F
- Original: open F + N * read + close
- Improved: md5 F
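Counting requests shows why these replacements matter on a wide area network. The sketch below tallies the operations listed above and assumes each request costs one round trip. The function name is hypothetical, and the rm count assumes a flat directory with no subdirectories.

```python
def round_trips(command, n, improved):
    """Count server requests for one command under the slide's cost model.

    n is the number of directory entries (ls, rm) or read calls (md5sum).
    """
    if improved:
        return 1  # getlongdir, rmall, or md5: one request each
    original = {
        "ls -la": 1 + n,      # getdir + N stat
        "rm -rf": 1 + n,      # getdir + N unlink
        "md5sum": 1 + n + 1,  # open + N read + close
    }
    return original[command]


rtt_s = 0.100  # illustrative wide-area round-trip time
for command in ("ls -la", "rm -rf", "md5sum"):
    before = round_trips(command, 1000, improved=False)
    after = round_trips(command, 1000, improved=True)
    print(f"{command}: {before * rtt_s:.1f} s -> {after * rtt_s:.1f} s")
```

For md5sum the improved form also avoids moving the file contents across the network at all, which this request count does not capture.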
38Final Example
- ls -la /chirp/alpha/data
- md5sum /chirp/alpha/data/*
- cp -r /chirp/alpha/data /chirp/beta/data
- md5sum /chirp/beta/data/*
- rm -rf /chirp/alpha/data
39Original Implementation
[Diagram: app -> parrot -> chirp server A and chirp server B]
40Improved Implementation
[Diagram: app -> parrot -> chirp server A and chirp server B]
41Performance on Script
42The Challenge: How to design interfaces so that users get the expected performance and behavior?
43Summary
- Good small file performance requires attention to low level network protocols.
- getfile, putfile, thirdput, rmall, checksum
- Exploiting protocols requires minor changes to the Unix I/O interface.
- copyfile, rmall, checksum, others?
- Easy to apply those changes in a user transparent way.
- cp, rm, md5sum all operate as normal
- Usable performance in a wide-area FS.
44For more information...
- Douglas Thain
- dthain_at_nd.edu
- Chris Moretti
- cmoretti_at_nd.edu
- Parrot and Chirp
- http://www.cctools.org