Other File Systems: LFS, NFS, and AFS
1
Other File Systems: LFS, NFS, and AFS
2
Goals for Today
  • Discuss specific file systems
  • both local and remote
  • Log-structured file system (LFS)
  • Distributed file systems (DFS)
  • Network file system (NFS)
  • Andrew file system (AFS)

3
Log-Structured File Systems
  • The trend CPUs are faster, RAM caches are
    bigger
  • So, a lot of reads do not require disk access
  • Most disk accesses are writes → pre-fetching not
    very useful
  • Worse, most writes are small → 10 ms overhead for
    a 50 µs write
  • Example: to create a new file
  • i-node of directory needs to be written
  • Directory block needs to be written
  • i-node for the file has to be written
  • Need to write the file
  • Delaying these writes could hamper consistency
  • Solution: LFS, to utilize the full disk bandwidth

4
LFS Basic Idea
  • Structure the disk as a log
  • Periodically, all pending writes buffered in
    memory are collected in a single segment
  • The entire segment is written contiguously at end
    of the log
  • Segment may contain i-nodes, directory entries,
    data
  • Start of each segment has a summary
  • If segments are around 1 MB, the full disk
    bandwidth can be utilized
  • Note: i-nodes are now scattered across the disk
  • Maintain an i-node map (entry i points to i-node i
    on disk); both structures are sketched below
  • Part of it is cached, reducing the delay in
    accessing i-node
  • This description works great for disks of
    infinite size
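
A minimal sketch in C of the two structures this implies, with illustrative field names rather than the real Sprite LFS layout: each segment begins with a summary identifying the blocks that follow, and the i-node map records where the newest copy of each i-node lives.

    /* Illustrative LFS on-disk structures (not the actual Sprite LFS layout). */
    #include <stdint.h>

    #define SEG_SIZE (1 << 20)        /* ~1 MB segments keep the disk streaming   */

    struct seg_block_info {           /* one entry per block in the segment       */
        uint32_t inode_num;           /* which file the block belongs to          */
        uint32_t block_offset;        /* logical block number within that file    */
    };

    struct segment_summary {          /* written at the start of every segment    */
        uint32_t nblocks;             /* how many blocks follow                   */
        struct seg_block_info info[]; /* identity of each block (used by cleaner) */
    };

    struct imap_entry {               /* i-node map: entry i describes i-node i   */
        uint64_t disk_addr;           /* disk address of the newest copy          */
    };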

5
LFS vs. UFS
(Figure: blocks written to create two 1-block files,
dir1/file1 and dir2/file2, in the Unix File System and
in the Log-Structured File System. In UFS the i-nodes,
directory blocks, and data blocks are scattered across
the disk; in LFS all of them, plus the i-node map, are
written contiguously in the log.)
6
LFS Cleaning
  • Finite disk space implies that the disk is
    eventually full
  • Fortunately, some segments have stale information
  • A file overwrite causes i-node to point to new
    blocks
  • Old ones still occupy space
  • Solution: the LFS cleaner thread compacts the log
  • Read segment summary, and see if contents are
    current
  • File blocks, i-nodes, etc.
  • If not, the segment is marked free, and cleaner
    moves forward
  • Else, cleaner writes content into new segment at
    end of the log
  • The segment is marked as free!
  • The disk acts as a circular buffer: the writer adds
    content at the front and the cleaner reclaims
    segments from the back (sketched below)
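
A compact sketch of one cleaner pass, reusing the segment_summary sketch above; oldest_segment, log_head, next_segment, read_summary, block_is_live, copy_block, append_to_log, and mark_segment_free are assumed helpers, not functions from the slides.

    /* Sketch: walk old segments, re-append live blocks, then free each segment. */
    void cleaner_pass(void)
    {
        for (int seg = oldest_segment(); seg != log_head(); seg = next_segment(seg)) {
            struct segment_summary *sum = read_summary(seg);

            for (unsigned i = 0; i < sum->nblocks; i++)
                if (block_is_live(seg, &sum->info[i]))  /* i-node map still points here?        */
                    append_to_log(copy_block(seg, i));  /* rewrite live content at the log head */

            mark_segment_free(seg);                     /* segment can now be reused by writer  */
        }
    }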

7
Distributed File Systems
  • Goal: view a distributed system as a file system
  • Storage is distributed
  • Web tries to make world a collection of
    hyperlinked documents
  • Issues not common to usual file systems
  • Naming transparency
  • Load balancing
  • Scalability
  • Location and network transparency
  • Fault tolerance
  • We will look at some of these today

8
Transfer Model
  • Upload/download Model
  • Client downloads file, works on it, and writes it
    back on server
  • Simple and good performance
  • Remote Access Model
  • File resides only on the server; the client sends
    commands to get work done

9
Naming transparency
  • Naming is a mapping from logical to physical
    objects
  • Ideally client interface should be transparent
  • Not distinguish between remote and local files
  • /machine/path or mounting remote FS in local
    hierarchy are not transparent
  • A transparent DFS hides the location of files in
    system
  • 2 forms of transparency
  • Location transparency: the path gives no hint of
    the file's location
  • /server1/dir1/dir2/x tells us x is on server1, but
    not where server1 is
  • Location independence: files can be moved without
    changing their names
  • Separates the naming hierarchy from the
    storage-device hierarchy

10
File Sharing Semantics
  • Sequential consistency: reads see previous writes
  • Ordering on all system calls seen by all
    processors
  • Maintained in single processor systems
  • Can be achieved in DFS with one file server and
    no caching

11
Caching
  • Keep repeatedly accessed blocks in cache
  • Improves performance of further accesses
  • How it works
  • If the needed block is not in the cache, it is
    fetched and cached (see the sketch after this list)
  • Accesses are performed on the local copy
  • One master file copy on server, other copies
    distributed in DFS
  • Cache consistency problem: how to keep the cached
    copy consistent with the master file copy
  • Where to cache?
  • Disk pros: more reliable, and data is still present
    locally on recovery
  • Memory pros: supports diskless workstations and
    gives quicker data access
  • Servers maintain cache in memory
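
A rough sketch of the lookup path described above; every name here (client_read_block, cache_lookup, fetch_from_server, cache_insert, and the struct types) is hypothetical rather than from any particular DFS.

    /* Sketch: serve a block from the local cache, fetching the master copy on a miss. */
    #include <stdint.h>

    struct block *client_read_block(struct file *f, uint64_t blkno)
    {
        struct block *b = cache_lookup(f, blkno);   /* repeatedly accessed block already here? */
        if (b == NULL) {
            b = fetch_from_server(f, blkno);        /* miss: fetch from the master copy        */
            cache_insert(f, blkno, b);              /* keep it for further accesses            */
        }
        return b;                                   /* reads/writes now hit the local copy     */
    }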

12
File Sharing Semantics
  • Other approaches
  • Write through caches
  • Immediately propagate changes to cached files to
    the server
  • Reliable but poor performance
  • Delayed write
  • Writes are not propagated immediately; typically
    they are flushed on file close
  • Session semantics (AFS): write the file back on
    close
  • Alternative (NFS): scan the cache periodically and
    flush modified blocks
  • Better performance but poor reliability
  • File Locking
  • The upload/download model locks a downloaded file
  • Other processes wait for file lock to be released

13
Network File System (NFS)
  • Developed by Sun Microsystems in 1984
  • Used to join FSes on multiple computers as one
    logical whole
  • Used commonly today with UNIX systems
  • Assumptions
  • Allows arbitrary collection of users to share a
    file system
  • Clients and servers might be on different LANs
  • Machines can be clients and servers at the same
    time
  • Architecture
  • A server exports one or more of its directories
    to remote clients
  • Clients access exported directories by mounting
    them
  • The contents are then accessed as if they were
    local

14
Example
15
NFS Mount Protocol
  • Client sends path name to server with request to
    mount
  • Not required to specify where to mount
  • If path is legal and exported, server returns
    file handle
  • Contains FS type, disk, i-node number of the
    directory, and security info (sketched below)
  • Subsequent accesses from client use file handle
  • Mount can be either at boot or automount
  • Using automount, directories are not mounted
    during boot
  • OS sends a message to servers on first remote
    file access
  • Automount is helpful since remote dir might not
    be used at all
  • Mount only affects the client view!
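
One way to picture what a returned handle might carry (a sketch only; the fields and the nfs_mount stub are illustrative, and a real NFS handle is opaque to the client):

    #include <stdint.h>

    /* Illustrative contents of an NFS file handle as built by the server. */
    struct nfs_fhandle {
        uint32_t fs_type;      /* file-system type on the server               */
        uint32_t disk_id;      /* which disk/partition holds the export        */
        uint32_t inode_num;    /* i-node number of the exported directory      */
        uint32_t generation;   /* security info guarding against stale handles */
    };

    /* Hypothetical mount stub: client sends a path, gets a handle back (or an error). */
    int nfs_mount(const char *server, const char *path, struct nfs_fhandle *out);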

16
NFS Protocol
  • Supports directory and file access via remote
    procedure calls (RPCs)
  • All UNIX system calls supported other than open
    and close
  • Open and close are intentionally not supported
  • For a read, client sends lookup message to server
  • Server looks up file and returns handle
  • Unlike open, lookup does not copy info in
    internal system tables
  • Subsequently, each read carries the file handle,
    offset, and number of bytes
  • Each message is self-contained (see the sketch
    after this list)
  • Pro: the server is stateless, i.e. keeps no state
    about open files
  • Cons: locking is difficult, and there is no
    concurrency control
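
A sketch of what a stateless read sequence looks like from the client side, reusing the nfs_fhandle sketch above; nfs_lookup and nfs_read are illustrative stubs (and report.txt a made-up name), not the real NFS RPC interface.

    /* No open/close RPCs: every request carries the file handle, offset, and count itself. */
    void read_example(struct nfs_fhandle dir_h)
    {
        struct nfs_fhandle file_h;
        char buf[8192];

        nfs_lookup(&dir_h, "report.txt", &file_h);  /* server returns a handle, records nothing */
        nfs_read(&file_h, 0, sizeof buf, buf);      /* handle + offset + count, self-contained  */
        nfs_read(&file_h, 8192, sizeof buf, buf);   /* server keeps no state between requests   */
    }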

17
NFS Implementation
  • Three main layers
  • System call layer
  • Handles calls like open, read and close
  • Virtual File System Layer
  • Maintains table with one entry (v-node) for each
    open file
  • v-nodes indicate if file is local or remote
  • If remote, it has enough info to access it
  • For local files, FS and i-node are recorded
  • NFS Service Layer
  • This lowest layer implements the NFS protocol

18
NFS Layer Structure
19
How NFS Works
  • Mount
  • The sysadmin calls the mount program with a remote
    dir and a local dir
  • The mount program parses the path for the name of
    the NFS server
  • Contacts server asking for file handle for remote
    dir
  • If directory exists for remote mounting, server
    returns handle
  • Client kernel constructs v-node for remote dir
  • Asks NFS client code to construct r-node for file
    handle
  • Open
  • Kernel realizes that file is on remotely mounted
    directory
  • Finds r-node in v-node for the directory
  • NFS client code then opens file, enters r-node
    for file in VFS, and returns file descriptor for
    remote node

20
Cache coherency
  • Clients cache file attributes and data
  • If two clients cache the same data and one modifies
    it, the caches can become inconsistent
  • Solutions
  • Each cache block has a timer (3 sec for data, 30
    sec for dir)
  • Entry is discarded when timer expires
  • On open of cached file, its last modify time on
    server is checked
  • If cached copy is old, it is discarded
  • Every 30 sec, a flush timer expires
  • All dirty blocks are then written back to the
    server (see the sketch below)
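
The timer-plus-modify-time check can be sketched like this; the constants come from the slide, while struct cache_entry and server_mtime (a GETATTR-style query) are assumptions.

    #include <time.h>

    #define DATA_TTL  3     /* seconds a cached data block is trusted      */
    #define DIR_TTL  30     /* seconds a cached directory entry is trusted */

    struct cache_entry {    /* illustrative, not the real NFS client structure */
        int    is_dir;
        time_t cached_at;   /* when this entry was cached                      */
        time_t mtime;       /* file's last-modify time when it was cached      */
    };

    time_t server_mtime(struct cache_entry *e);  /* assumed helper: ask the server (GETATTR) */

    int cache_entry_valid(struct cache_entry *e, time_t now)
    {
        time_t ttl = e->is_dir ? DIR_TTL : DATA_TTL;
        if (now - e->cached_at > ttl)
            return 0;                 /* timer expired: discard the entry        */
        if (server_mtime(e) > e->mtime)
            return 0;                 /* server copy is newer: discard the entry */
        return 1;                     /* cached copy can still be used           */
    }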

21
Andrew File System (AFS)
  • Named after Andrew Carnegie and Andrew Mellon
  • Transarc Corp. and then IBM took over development
    of AFS
  • In 2000 IBM made OpenAFS available as open source
  • Features
  • Uniform name space
  • Location independent file sharing
  • Client side caching with cache consistency
  • Secure authentication via Kerberos
  • Server-side caching in form of replicas
  • High availability through automatic switchover of
    replicas
  • Scalability to span 5000 workstations

22
AFS Overview
  • Based on the upload/download model
  • Clients download and cache files
  • Server keeps track of clients that cache the file
  • Clients upload files at end of session
  • Whole file caching is central idea behind AFS
  • Later amended to block operations
  • Simple, effective
  • AFS servers are stateful
  • Keep track of clients that have cached files
  • Recall files that have been modified

23
AFS Details
  • Has dedicated server machines
  • Clients have partitioned name space
  • Local name space and shared name space
  • A cluster of dedicated servers (Vice) presents the
    shared name space
  • Clients run the Virtue protocol to communicate with
    Vice
  • Clients and servers are grouped into clusters
  • Clusters connected through the WAN
  • Other issues
  • Scalability, client mobility, security,
    protection, heterogeneity

24
AFS Shared Name Space
  • AFS's storage is arranged in volumes
  • Usually associated with files of a particular
    client
  • An AFS dir entry maps Vice files/dirs to a 96-bit
    fid (sketched below)
  • Volume number
  • Vnode number: index into the i-node array of a
    volume
  • Uniquifier: allows reuse of vnode numbers
  • Fids are location transparent
  • File movements do not invalidate fids
  • Location information kept in volume-location
    database
  • Volumes migrated to balance available disk space,
    utilization
  • Volume movement is atomic; the operation is aborted
    on a server crash
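
The 96-bit fid splits naturally into three 32-bit fields; a sketch with illustrative field names:

    #include <stdint.h>

    /* AFS file identifier: 3 x 32 bits = 96 bits, and location transparent. */
    struct afs_fid {
        uint32_t volume;       /* volume number: which volume holds the file           */
        uint32_t vnode;        /* vnode number: index into that volume's i-node array  */
        uint32_t uniquifier;   /* lets vnode slots be reused without reviving old fids */
    };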

25
AFS Operations and Consistency
  • AFS caches entire files from servers
  • Client interacts with servers only during open
    and close
  • The OS on the client intercepts calls and passes
    them to Venus
  • Venus is a client process that caches files from
    servers
  • Venus contacts Vice only on open and close
  • Does not contact the server if the file is already
    in the cache and has not been invalidated
  • Reads and writes bypass Venus
  • Works due to callbacks (sketched after this list)
  • Server updates state to record caching
  • Server notifies client before allowing another
    client to modify
  • Clients lose their callback when someone writes
    the file
  • Venus caches dirs and symbolic links for path
    translation
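
A sketch of the open path and the callback break, reusing the afs_fid sketch above; every function and field name here (venus_cache_lookup, fetch_whole_file, local_fd, callback_valid) is an assumption, not the OpenAFS code.

    struct cached_file { int callback_valid; /* plus the cached file data, omitted */ };

    int venus_open(struct afs_fid fid)
    {
        struct cached_file *cf = venus_cache_lookup(fid);

        if (cf != NULL && cf->callback_valid)   /* cached and never invalidated:      */
            return local_fd(cf);                /* no contact with Vice at all        */

        cf = fetch_whole_file(fid);             /* download the entire file from Vice */
        cf->callback_valid = 1;                 /* server records that we cache it    */
        return local_fd(cf);                    /* reads and writes now bypass Venus  */
    }

    /* Invoked when Vice breaks the callback because another client modified the file. */
    void callback_break(struct afs_fid fid)
    {
        struct cached_file *cf = venus_cache_lookup(fid);
        if (cf != NULL)
            cf->callback_valid = 0;             /* cached copy must be refetched on next open */
    }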

26
AFS Implementation
  • Client cache is a local directory on UNIX FS
  • Venus and server processes access file directly
    by UNIX i-node
  • Venus has 2 caches, one for status and one for data
  • Uses LRU to keep them bounded in size

27
Summary
  • LFS
  • Local file system
  • Optimize writes
  • NFS
  • Simple distributed file system protocol. No
    open/close
  • Stateless server
  • Has problems with cache consistency, locking
    protocol
  • AFS
  • More complicated distributed file system protocol
  • Stateful server
  • Session semantics: consistency on close

28
Enjoy Spring Break!!!
29
Storage Area Networks (SANs)
  • New generation of architectures for managing
    storage in massive data centers
  • For example, Google is said to have
    50,000-200,000 computers in various centers
  • Amazon is reaching a similar scale
  • A SAN system is a collection of file systems with
    tools to help humans administer the system

30
Examples of SAN issues
  • Where should a file be stored?
  • Many of these systems have an indirection
    mechanism so that a file can move from volume to
    volume
  • Allows files to migrate, e.g. from a slow server
    to a fast one or from long term storage onto an
    active disk system
  • Eco-computing: systems that seek to minimize energy
    use in big data centers

31
Examples of SAN issues
  • Disk-to-disk backup
  • Might want to do very fast automated backups
  • Ideally, can support this while the disk is
    actively in use
  • Easiest if two disks are next to each other
  • Challenge: back up an entire data center in New
    York at a site in Kentucky
  • US Dept of Treasury e-Cavern

32
File System Reliability
  • 2 considerations: backups and consistency
  • Why backup?
  • Recover from disaster
  • Recover from stupidity
  • Where to backup? Tertiary storage
  • Tape holds 10s or 100s of GBs and costs pennies/GB
  • Sequential access → high random access time
  • Backup takes time and space

33
Backup Issues
  • Should the entire FS be backed up?
  • Binaries, special I/O files usually not backed up
  • Do not backup unmodified files since last backup
  • Incremental dumps: a complete dump per month,
    modified files daily
  • Compress data before writing to tape
  • How to backup an active FS?
  • Not acceptable to take system offline during
    backup hours
  • Security of backup media

34
Backup Strategies
  • Physical Dump
  • Start from block 0 of disk, write all blocks in
    order, stop after last
  • Pros: simple to implement, fast
  • Cons: cannot skip directories, do incremental
    dumps, or restore individual files
  • No point dumping unused blocks, but avoiding them
    adds overhead
  • How to dump bad blocks?
  • Logical Dump
  • Start at a directory
  • dump all directories and files changed since base
    date
  • Base date could be of last incremental dump, last
    full dump, etc.
  • Also dump all dirs (even unmodified) in path to a
    modified file

35
Logical Dumps
  • Why dump unmodified directories?
  • Restore files on a fresh FS
  • To incrementally recover a single file

36
A Dumping Algorithm
  • Algorithm
  • Mark all dirs and modified files
  • Unmark dirs with no modified files beneath them
  • Dump the marked dirs
  • Dump the marked modified files (a sketch follows)
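
A recursive sketch that folds the mark and unmark phases together: a directory stays marked only if it lies on a path to a file modified since the base date. struct node and its fields are illustrative, not from the slides.

    #include <time.h>

    struct node {                      /* one file or directory in the tree       */
        int is_dir, marked;
        time_t mtime;                  /* last modification time                  */
        struct node *children, *next;  /* first child and next sibling            */
    };

    int mark(struct node *n, time_t base_date)
    {
        int keep = 0;
        if (n->is_dir) {
            for (struct node *c = n->children; c != NULL; c = c->next)
                keep |= mark(c, base_date);     /* keep a dir if any child is kept */
        } else {
            keep = (n->mtime > base_date);      /* file modified since base date?  */
        }
        n->marked = keep;                       /* later passes dump marked dirs,  */
        return keep;                            /* then marked files               */
    }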

37
Logical Dumping Issues
  • Reconstruct the free block list on restore
  • Maintaining consistency across symbolic links
  • UNIX files with holes
  • Should never dump special files, e.g. named pipes