Software Systems: File Systems and Storage

Emery Berger and Mark Corner, University of Massachusetts Amherst

1
Software Systems: File Systems and Storage
  • Emery Berger and Mark Corner
  • University of Massachusetts Amherst

2
Files
  • Associate names with data
  • Usually stored on persistent media (disks)

3
File Names
  • Hierarchical directory structure
  • Absolute, or relative to current directory
  • Windows: names include location (drive) and directory

4
File Systems
  • Organized set of data types
  • organize data
  • point to where data is stored
  • searchable database of files
  • LOTS of file systems
  • AFS, BFS, CFS, DFS, EFS, FFS, GFS, HFS, etc.
  • Distributed, local, encrypted, different OSs

5
Directories
  • Directory just special file
  • Contains metadata, filenames
  • pointers to inodes
  • Typically hierarchical tree
  • odd exposure of data structure to user

6
Blocks
  • Storage organized as a sequence of blocks
  • Unit of reading and writing
  • Read, modify, write sequence
  • File system tracks free and full blocks
  • typically stored in a bitmap
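
The free-block bitmap can be sketched in a few lines. This is a minimal illustration, assuming one bit per block (1 = in use); the class name `BlockBitmap` and its methods are hypothetical, and a real file system would keep the bitmap on disk and add grouping and locking.

```python
class BlockBitmap:
    """Tracks free/used blocks with one bit per block (1 = used)."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start free

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Mark the lowest-numbered free block as used and return its number."""
        for block in range(self.num_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block: int) -> None:
        """Mark a block as free again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


bitmap = BlockBitmap(16)
first = bitmap.allocate()    # lowest free block
second = bitmap.allocate()   # next free block
bitmap.free(first)
reused = bitmap.allocate()   # the freed block is handed out again
```

A linear scan keeps the sketch short; it also tends to hand out low-numbered blocks first, which is one simple way to keep allocations close together.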

7
Inodes
  • On disk data structure
  • Describes where all the bits of a file (dir) are

8
Storage
  • Lots of forms of permanent storage
  • Disk drives, flash storage, Tape, CDs, DVDs

9
Storage
  • Disks
  • Seek latency + rotational latency
  • High bandwidth
  • One of two moving parts in a PC

10
Storage
  • Flash memory
  • Predictable low latency (including random)
  • Lower bandwidth
  • Larger erase blocks, wears out, energy
  • Prediction: all PC storage Flash-based in 10
    years

11
Locality
  • File systems use directory structure to improve
    locality
  • More important for disks than Flash
  • E.g., ext2: all files in same directory
    clustered in same region of disk
  • Try to make all blocks of same file sequential
  • Move directories apart for expansion

12
Caching
  • Disk blocks, inodes, directories all cached
  • 1/3 to 1/2 of memory is disk cache
  • Disk drive has a cache too!

13
Poor Man's Database
  • Because files & directories are easy to use, they
    get used as de facto databases
  • e.g., Internet Explorer web cache
  • 1000 files in each hash subdirectory

C:\Documents and Settings\Emery\Local Settings\Temporary Internet Files\Content.IE5> ls -ltra
total 1873
-rwx------ 1 Emery None      67 Jan 10 17:31 desktop.ini
drwx------ 2 Emery None       0 Jan 17 22:42 0NDWKTYT
drwx------ 7 Emery None       0 Feb 19 19:53 .
drwx------ 7 Emery None       0 Apr 20 14:45 ..
drwx------ 2 Emery None       0 May  1 21:41 8HZD6WS6
drwx------ 2 Emery None       0 May  1 21:54 I4F15DOK
drwx------ 2 Emery None       0 May  1 22:03 XM0N4Q4W
-rwx------ 1 Emery None 1916928 May  3 12:21 index.dat
drwx------ 2 Emery None       0 May  3 12:21 S0RKZRFZ
C:\Documents and Settings\Emery\Local Settings\Temporary Internet Files\Content.IE5>
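
The "hash subdirectory" layout can be sketched as follows. This is only an illustration of the general pattern: the slides do not say how Internet Explorer picks a subdirectory, so the SHA-1 hash, the bucket count, and the names `cache_path`, `store`, and `load` are all assumptions.

```python
import hashlib
import os
import tempfile


def cache_path(root: str, key: str, buckets: int = 4) -> str:
    """Map a cache key (e.g., a URL) to root/<bucket>/<digest>."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return os.path.join(root, f"{bucket:02d}", digest)


def store(root: str, key: str, data: bytes) -> str:
    """Write data under the key's bucket directory, creating it if needed."""
    path = cache_path(root, key)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


def load(root: str, key: str) -> bytes:
    """Read back the data stored for a key."""
    with open(cache_path(root, key), "rb") as f:
        return f.read()


root = tempfile.mkdtemp()
store(root, "http://example.com/index.html", b"<html>hello</html>")
page = load(root, "http://example.com/index.html")
```

Spreading entries over several subdirectories keeps any single directory from growing too large, which matters when directory lookups slow down with size.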
14
File Systems Abstraction
  • File system manages files
  • Traditionally file system maps files to disk
  • But files are a convenient abstraction: use same,
    easy interface (read, write)
  • Block devices (/dev/scsi0)
  • Disk drives transfer in blocks
  • Character devices (/dev/tty)
  • Console, printer
  • Proc filesystem (/proc/mem)
  • FIFO (named pipes)

15
Device files
  • Unix: devices live in /dev, act like ordinary files

elnux14> echo "foo" > /dev/tty
foo
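
The same idea from a program's point of view, as a small sketch. It assumes a Unix-like system and uses /dev/null and /dev/zero rather than /dev/tty, since those work even without a terminal attached.

```python
import os
import stat

# /dev/null accepts ordinary writes and discards them.
with open("/dev/null", "wb") as sink:
    written = sink.write(b"foo\n")

# /dev/zero answers ordinary reads with zero bytes.
with open("/dev/zero", "rb") as zeros:
    chunk = zeros.read(8)

# stat reports the file type: a character device, not a regular file.
is_char_device = stat.S_ISCHR(os.stat("/dev/null").st_mode)
```

Nothing here is device-specific: the calls are the same `open`, `read`, and `write` used for regular files.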
16
/proc filesystem
  • Normal file access to kernel internals

elnux14> ls -l /proc/30917/
total 0
dr-xr-xr-x 2 emery fac 0 May  3 13:18 attr
-r-------- 1 emery fac 0 May  3 13:18 auxv
-r--r--r-- 1 emery fac 0 May  3 13:01 cmdline
lrwxrwxrwx 1 emery fac 0 May  3 13:18 cwd -> /nfs/elsrv4/users5/fac/emery
-r-------- 1 emery fac 0 May  3 13:18 environ
lrwxrwxrwx 1 emery fac 0 May  3 13:18 exe -> /bin/tcsh
dr-x------ 2 emery fac 0 May  3 12:06 fd
-rw-r--r-- 1 emery fac 0 May  3 13:18 loginuid
-r-------- 1 emery fac 0 May  3 13:18 maps
-rw------- 1 emery fac 0 May  3 13:18 mem
-r--r--r-- 1 emery fac 0 May  3 13:18 mounts
lrwxrwxrwx 1 emery fac 0 May  3 13:18 root -> /
-r--r--r-- 1 emery fac 0 May  3 13:01 stat
-r--r--r-- 1 emery fac 0 May  3 13:18 statm
-r--r--r-- 1 emery fac 0 May  3 13:01 status
dr-xr-xr-x 3 emery fac 0 May  3 13:18 task
-r--r--r-- 1 emery fac 0 May  3 13:10 wchan
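
A program can read these entries with plain file calls. The sketch below assumes a Linux system with /proc mounted and reads the calling process's own entries through /proc/self instead of a fixed PID.

```python
import os

# /proc/self/status is text: one "Key:<tab>value" pair per line.
fields = {}
with open("/proc/self/status") as f:
    for line in f:
        key, _, value = line.partition(":")
        fields[key] = value.strip()

pid_from_proc = int(fields["Pid"])

# cwd is a symbolic link to the process's working directory.
cwd_from_proc = os.readlink("/proc/self/cwd")
```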
17
File Metadata
  • Files have a lot of associated metadata, e.g.,
    Unix (from stat)
  • Date of last status change (ctime), last modified, last accessed
  • Size (bytes)
  • User & group ID of file's owner
  • File type (not content type)
  • Directory
  • Regular file
  • Block / character device (disk drive, screen)
  • FIFO
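
The metadata listed above is available to programs through the stat call. A minimal sketch using Python's `os.stat` on a temporary file created just for the example:

```python
import os
import stat
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

info = os.stat(path)
size = info.st_size                                   # size in bytes
kind = "regular" if stat.S_ISREG(info.st_mode) else "other"
owner = (info.st_uid, info.st_gid)                    # user and group ID
times = (info.st_mtime, info.st_atime)                # last modified, last accessed

# The same call on the containing directory reports a different file type.
dir_is_dir = stat.S_ISDIR(os.stat(os.path.dirname(path)).st_mode)
os.remove(path)
```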

18
Untyped Files
  • Unix, Windows: file contents untyped
  • Stream of bytes
  • Type implied by convention (extensions)
  • .ppt, .pdf, ...
  • Mac: file types stored in metadata

19
Access Control
  • Unix: each file has associated bits that control
    access (+ other stuff)
  • Read
  • Write
  • Execute
  • Can specify for three classes of users
  • User (file owner)
  • Group (set of users)
  • Other (everyone else)

20
Access Control - chmod
  • Can read bits via ls, set bits via chmod

elnux14> ls -l ack.scm
-rw-r----- 1 emery fac 197 Feb 25 15:19 ack.scm
elnux14> chmod -r ack.scm
elnux14> ls -l ack.scm
--w------- 1 emery fac 197 Feb 25 15:19 ack.scm
elnux14> cat ack.scm
cat: ack.scm: Permission denied
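
The same permission change can be made from a program. This sketch uses a temporary file in place of ack.scm and clears the read bit for user, group, and other, which is the effect the chmod session shows.

```python
import os
import stat
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name

os.chmod(path, 0o640)                                   # rw-r-----
before = stat.filemode(os.stat(path).st_mode)

# Clear the read bit for user, group, and other.
read_bits = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
os.chmod(path, stat.S_IMODE(os.stat(path).st_mode) & ~read_bits)
after = stat.filemode(os.stat(path).st_mode)

os.remove(path)
```

`stat.filemode` renders the mode bits in the same ten-character form that `ls -l` prints.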
21
Access Control Lists (ACLs)
  • ACLs are more expressive
  • Specify different rights per user or group
  • Opinion: one of the biggest UNIX problems

22
What's Wrong with One Disk?
23
Distributed File Systems
  • Numerous drawbacks of local file systems
  • Inconvenient
  • Administrative overhead
  • Single point-of-failure
  • Solution: distributed file systems
  • FS appears local, but data remote
  • Two major implementations
  • Windows (CIFS, SAMBA)
  • NFS (Sun's Network File System)
  • Lots of manual DFSs (rsync, svn, USB keys)

24
Complications
  • Complexity and design tradeoffs
  • Naming: absolute vs. relative (to server)
  • Remote access vs. caching
  • Stateless or stateful server
  • Single image or replication

25
Naming Transparency
  • Issues
  • How are files named?
  • Do filenames reveal location?
  • Do filenames change if file moves?
  • Do filenames change if user moves?

26
Location naming
  • Location transparency
  • Use indirection!
  • filename does not reveal storage location
  • Normal in Unix
  • Compare to Windows - C:\foo\bar
  • Name may still change
  • if storage location changes
  • transparent not independent!

27
What parts are transparent?
  • Windows
  • Remote: //computer/share/.../directory/file
  • Remote files are explicit!
  • Local: .../directory/file
  • UNIX
  • Remote: /.../mountpoint/directory/file
  • Remote files look like any other file
  • Local: /.../directory/file
  • Neither reveals all of storage location
  • Windows reveals machine, UNIX does not

28
NFS Example
29
URLs Viewed as File System
  • Uniform Resource Locator names increasingly
    standard way to access data:
    protocol://machine/path/to/file
  • Good? Bad?
  • Looks like Windows: same?

30
File Caching
  • Cache information from file server locally
  • Local disk
  • Reduces access time (compared to remote)
  • Safe if node fails
  • Requires client to have disk
  • Local memory
  • Quick
  • Works without disks
  • Smaller cache size
  • Not fault-tolerant

31
Remote File Access Caching
  • Caching issues
  • Performance
  • Where & when to cache file blocks?
  • Correctness
  • When to propagate updates back to remote file?
  • What happens with multiple clients sharing?

32
Sharing with Others
33
When do changes get written?
  • User A opens a file, changes a file
  • When does it write it to file server?
  • If another user opens file does it see the
    changes?
  • Unix/one-copy semantics
  • Immediate
  • keep in mind UI issues
  • Session semantics
  • After close
  • Transaction semantics
  • Defined by program
  • Uncommon in FS

34
How is client informed?
  • Client-initiated consistency
  • client contacts server and checks consistency
  • every access
  • at given intervals
  • only upon opening a file
  • Server-initiated consistency
  • server detects potential conflicts, invalidates
    caches
  • Server needs to know
  • which clients have cached which parts of which
    files, plus
  • which clients are readers & which are writers

35
Conflicts
  • Simplest kind: Read-Write Conflicts
  • Two people read same thing
  • "The cat is red"
  • Both write
  • "The cat is brown", "The cat is purple"
  • Which is right?
  • Can this happen locally?
  • Yes! Try it with an editor
  • Worse with DFS, not obvious to user why

36
RAID, NAS, SAN Storage
  • Redundant Array of Inexpensive Disks
  • Multiple disks attached to controller
  • Disks each carry part of data
  • Redundancy, error detection, parallel transfer
  • Network Attached Storage
  • Box w/network port and storage (e.g., XRAID)
  • Storage Area Network
  • Specialized network of NAS (e.g., XSAN)
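
One way the redundancy in a RAID array can work is XOR parity, as used in RAID levels 4 and 5. The slide does not name a level, so this is an illustrative choice; the block contents and the `parity` helper are made up for the example.

```python
from functools import reduce


def parity(blocks: list) -> bytes:
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_disks = [b"AAAA", b"BBBB", b"CCCC"]   # one block per data disk
parity_disk = parity(data_disks)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = parity(survivors)
```

Because XOR is its own inverse, the same function computes the parity block and reconstructs any single missing block.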

37
The Near-Future
  • Parallel File Systems (pNFS, GFS)
  • Separate meta-data and data
  • Store data chunks on different machines

38
The End
39
Atomic Updates
  • Shadowing
  • Logs
  • Explain!
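
The slide leaves the explanation open. One common reading of shadowing is: write a complete new copy next to the original, then switch the name over in a single step. The sketch below follows that reading; the helper name `atomic_write` is hypothetical, and the atomicity of the final rename is a POSIX guarantee that holds when both names are on the same file system.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path so readers see the old or the new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, shadow = tempfile.mkstemp(dir=directory)   # shadow copy on the same file system
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                   # force the data to storage first
        os.replace(shadow, path)                   # switch names in one step
    except BaseException:
        os.unlink(shadow)                          # failed: discard the shadow copy
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "config.txt")
atomic_write(target, b"version 1")
atomic_write(target, b"version 2")
with open(target, "rb") as f:
    contents = f.read()
```

A log takes the other route: record the intended change first, then apply it in place, so an interrupted update can be replayed or discarded after a crash.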

43
Named Pipes (FIFO)
  • Special file acts like unnamed pipe
  • E.g., cat file | wc -l
  • Useful when cannot do redirection
  • Especially for compression

elnux14> mkfifo thePipe
elnux14> ls -ld thePipe
prw-r----- 1 emery fac 0 May  3 14:00 thePipe
elnux14> cat simplesocket.h > thePipe &
[1] 32242
elnux14> wc -l < thePipe
155
[1]    Done    cat simplesocket.h > thePipe
elnux14>
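
The same session can be reproduced inside one program, as a sketch for Unix-like systems. A thread plays the role of the background writer; the three lines of text stand in for simplesocket.h.

```python
import os
import tempfile
import threading

fifo = os.path.join(tempfile.mkdtemp(), "thePipe")
os.mkfifo(fifo)                      # same effect as `mkfifo thePipe`


def writer() -> None:
    # Opening a FIFO for writing blocks until some reader opens it too.
    with open(fifo, "w") as pipe:
        pipe.write("line one\nline two\nline three\n")


thread = threading.Thread(target=writer)
thread.start()                       # plays the role of the background cat

with open(fifo) as pipe:             # plays the role of `wc -l < thePipe`
    line_count = sum(1 for _ in pipe)

thread.join()
os.remove(fifo)
```

The data never lands on disk: the FIFO has a name in the file system, but its contents pass straight from writer to reader.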
45
Named Pipes (FIFO)
  • Exercise
  • Program named joe outputs file joe.out
  • Huge (3 GB)
  • Compress it automagically using gzip -c and a
    named FIFO to joe.out.gz

elnux14> mkfifo joe.out
elnux14> gzip -c < joe.out > joe.out.gz &
[1]
elnux14> joe
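
The exercise can also be sketched in one program, assuming a Unix-like system. A thread stands in for the background gzip, and a short repetitive payload stands in for the output of joe; both are placeholders for illustration.

```python
import gzip
import os
import shutil
import tempfile
import threading

workdir = tempfile.mkdtemp()
fifo = os.path.join(workdir, "joe.out")
archive = os.path.join(workdir, "joe.out.gz")
os.mkfifo(fifo)


def compressor() -> None:
    # Plays the role of the background gzip: read the FIFO, write the archive.
    with open(fifo, "rb") as src, gzip.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst)


thread = threading.Thread(target=compressor)
thread.start()

# Stand-in for joe: it writes to what it believes is an ordinary file.
payload = b"some repetitive output\n" * 1000
with open(fifo, "wb") as out:
    out.write(payload)

thread.join()
with gzip.open(archive, "rb") as f:
    restored = f.read()
compressed_size = os.path.getsize(archive)
```

The uncompressed output never occupies disk space, which is the point of the exercise when the file is huge.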