Other File Systems: LFS, NFS, and AFS
1
Other File Systems: LFS, NFS, and AFS
2
Goals for Today
  • Discuss specific file systems
  • both local and remote
  • Log-structured file system (LFS)
  • Distributed file systems (DFS)
  • Network file system (NFS)
  • Andrew file system (AFS)

3
Log-Structured File Systems
  • The trend CPUs are faster, RAM caches are
    bigger
  • So, a lot of reads do not require disk access
  • Most disk accesses are writes → pre-fetching not
    very useful
  • Worse, most writes are small → 10 ms overhead for
    a 50 µs write
  • Example: to create a new file
  • i-node of directory needs to be written
  • Directory block needs to be written
  • i-node for the file has to be written
  • Need to write the file
  • Delaying these writes could hamper consistency
  • Solution: LFS, to utilize the full disk bandwidth

4
LFS Basic Idea
  • Structure the disk as a log
  • Periodically, all pending writes buffered in
    memory are collected in a single segment
  • The entire segment is written contiguously at end
    of the log
  • Segment may contain i-nodes, directory entries,
    data
  • Start of each segment has a summary
  • If segments are around 1 MB, the full disk
    bandwidth can be utilized
  • Note: i-nodes are now scattered across the disk
  • Maintain an i-node map (entry i points to i-node i
    on disk); both structures are sketched below
  • Part of it is cached, reducing the delay in
    accessing i-node
  • This description works great for disks of
    infinite size
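
A minimal sketch in C of the two structures this implies, with illustrative field names rather than the real Sprite LFS layout: each segment begins with a summary identifying the blocks that follow, and the i-node map records where the newest copy of each i-node lives.

    /* Illustrative LFS on-disk structures (not the actual Sprite LFS layout). */
    #include <stdint.h>

    #define SEG_SIZE (1 << 20)        /* ~1 MB segments keep the disk streaming   */

    struct seg_block_info {           /* one entry per block in the segment       */
        uint32_t inode_num;           /* which file the block belongs to          */
        uint32_t block_offset;        /* logical block number within that file    */
    };

    struct segment_summary {          /* written at the start of every segment    */
        uint32_t nblocks;             /* how many blocks follow                   */
        struct seg_block_info info[]; /* identity of each block (used by cleaner) */
    };

    struct imap_entry {               /* i-node map: entry i describes i-node i   */
        uint64_t disk_addr;           /* disk address of the newest copy          */
    };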

5
LFS vs. UFS
(Figure: blocks written to create two 1-block files,
dir1/file1 and dir2/file2, in the Unix File System and
in the Log-Structured File System. In UFS the i-nodes,
directory blocks, and data blocks are scattered across
the disk; in LFS all of them, plus the i-node map, are
written contiguously in the log.)
6
LFS Cleaning
  • Finite disk space implies that the disk is
    eventually full
  • Fortunately, some segments have stale information
  • A file overwrite causes i-node to point to new
    blocks
  • Old ones still occupy space
  • Solution: the LFS cleaner thread compacts the log
  • Read segment summary, and see if contents are
    current
  • File blocks, i-nodes, etc.
  • If not, the segment is marked free, and cleaner
    moves forward
  • Else, cleaner writes content into new segment at
    end of the log
  • The segment is marked as free!
  • The disk acts as a circular buffer: the writer adds
    content at the front and the cleaner reclaims
    segments from the back (sketched below)
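
A compact sketch of one cleaner pass, reusing the segment_summary sketch above; oldest_segment, log_head, next_segment, read_summary, block_is_live, copy_block, append_to_log, and mark_segment_free are assumed helpers, not functions from the slides.

    /* Sketch: walk old segments, re-append live blocks, then free each segment. */
    void cleaner_pass(void)
    {
        for (int seg = oldest_segment(); seg != log_head(); seg = next_segment(seg)) {
            struct segment_summary *sum = read_summary(seg);

            for (unsigned i = 0; i < sum->nblocks; i++)
                if (block_is_live(seg, &sum->info[i]))  /* i-node map still points here?        */
                    append_to_log(copy_block(seg, i));  /* rewrite live content at the log head */

            mark_segment_free(seg);                     /* segment can now be reused by writer  */
        }
    }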

7
Distributed File Systems
  • Goal: view a distributed system as a file system
  • Storage is distributed
  • Web tries to make world a collection of
    hyperlinked documents
  • Issues not common to usual file systems
  • Naming transparency
  • Load balancing
  • Scalability
  • Location and network transparency
  • Fault tolerance
  • We will look at some of these today

8
Transfer Model
  • Upload/download Model
  • Client downloads file, works on it, and writes it
    back on server
  • Simple and good performance
  • Remote Access Model
  • File resides only on the server; the client sends
    commands to get work done

9
Naming transparency
  • Naming is a mapping from logical to physical
    objects
  • Ideally client interface should be transparent
  • Not distinguish between remote and local files
  • /machine/path or mounting remote FS in local
    hierarchy are not transparent
  • A transparent DFS hides the location of files in
    system
  • 2 forms of transparency
  • Location transparency: the path gives no hint of
    the file's location
  • /server1/dir1/dir2/x tells us x is on server1, but
    not where server1 is
  • Location independence: files can be moved without
    changing their names
  • Separates the naming hierarchy from the
    storage-device hierarchy

10
File Sharing Semantics
  • Sequential consistency: reads see previous writes
  • Ordering on all system calls seen by all
    processors
  • Maintained in single processor systems
  • Can be achieved in DFS with one file server and
    no caching

11
Caching
  • Keep repeatedly accessed blocks in cache
  • Improves performance of further accesses
  • How it works
  • If the needed block is not in the cache, it is
    fetched and cached (see the sketch after this list)
  • Accesses are performed on the local copy
  • One master file copy on server, other copies
    distributed in DFS
  • Cache consistency problem: how to keep the cached
    copy consistent with the master file copy
  • Where to cache?
  • Disk pros: more reliable, and data is still present
    locally on recovery
  • Memory pros: supports diskless workstations and
    gives quicker data access
  • Servers maintain cache in memory
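
A rough sketch of the lookup path described above; every name here (client_read_block, cache_lookup, fetch_from_server, cache_insert, and the struct types) is hypothetical rather than from any particular DFS.

    /* Sketch: serve a block from the local cache, fetching the master copy on a miss. */
    #include <stdint.h>

    struct block *client_read_block(struct file *f, uint64_t blkno)
    {
        struct block *b = cache_lookup(f, blkno);   /* repeatedly accessed block already here? */
        if (b == NULL) {
            b = fetch_from_server(f, blkno);        /* miss: fetch from the master copy        */
            cache_insert(f, blkno, b);              /* keep it for further accesses            */
        }
        return b;                                   /* reads/writes now hit the local copy     */
    }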

12
File Sharing Semantics
  • Other approaches
  • Write through caches
  • Immediately propagate changes to cached files to
    the server
  • Reliable but poor performance
  • Delayed write
  • Writes are not propagated immediately; typically
    they are flushed on file close
  • Session semantics (AFS): write the file back on
    close
  • Alternative (NFS): scan the cache periodically and
    flush modified blocks
  • Better performance but poor reliability
  • File Locking
  • The upload/download model locks a downloaded file
  • Other processes wait for file lock to be released

13
Network File System (NFS)
  • Developed by Sun Microsystems in 1984
  • Used to join FSes on multiple computers as one
    logical whole
  • Used commonly today with UNIX systems
  • Assumptions
  • Allows arbitrary collection of users to share a
    file system
  • Clients and servers might be on different LANs
  • Machines can be clients and servers at the same
    time
  • Architecture
  • A server exports one or more of its directories
    to remote clients
  • Clients access exported directories by mounting
    them
  • The contents are then accessed as if they were
    local

14
Example
15
NFS Mount Protocol
  • Client sends path name to server with request to
    mount
  • Not required to specify where to mount
  • If path is legal and exported, server returns
    file handle
  • Contains FS type, disk, i-node number of the
    directory, and security info (sketched below)
  • Subsequent accesses from client use file handle
  • Mount can be either at boot or automount
  • Using automount, directories are not mounted
    during boot
  • OS sends a message to servers on first remote
    file access
  • Automount is helpful since remote dir might not
    be used at all
  • Mount only affects the client view!
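
One way to picture what a returned handle might carry (a sketch only; the fields and the nfs_mount stub are illustrative, and a real NFS handle is opaque to the client):

    #include <stdint.h>

    /* Illustrative contents of an NFS file handle as built by the server. */
    struct nfs_fhandle {
        uint32_t fs_type;      /* file-system type on the server               */
        uint32_t disk_id;      /* which disk/partition holds the export        */
        uint32_t inode_num;    /* i-node number of the exported directory      */
        uint32_t generation;   /* security info guarding against stale handles */
    };

    /* Hypothetical mount stub: client sends a path, gets a handle back (or an error). */
    int nfs_mount(const char *server, const char *path, struct nfs_fhandle *out);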

16
NFS Protocol
  • Supports directory and file access via remote
    procedure calls (RPCs)
  • All UNIX system calls supported other than open
    and close
  • Open and close are intentionally not supported
  • For a read, client sends lookup message to server
  • Server looks up file and returns handle
  • Unlike open, lookup does not copy info in
    internal system tables
  • Subsequently, each read carries the file handle,
    offset, and number of bytes
  • Each message is self-contained (see the sketch
    after this list)
  • Pro: the server is stateless, i.e. keeps no state
    about open files
  • Cons: locking is difficult, and there is no
    concurrency control
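
A sketch of what a stateless read sequence looks like from the client side, reusing the nfs_fhandle sketch above; nfs_lookup and nfs_read are illustrative stubs (and report.txt a made-up name), not the real NFS RPC interface.

    /* No open/close RPCs: every request carries the file handle, offset, and count itself. */
    void read_example(struct nfs_fhandle dir_h)
    {
        struct nfs_fhandle file_h;
        char buf[8192];

        nfs_lookup(&dir_h, "report.txt", &file_h);  /* server returns a handle, records nothing */
        nfs_read(&file_h, 0, sizeof buf, buf);      /* handle + offset + count, self-contained  */
        nfs_read(&file_h, 8192, sizeof buf, buf);   /* server keeps no state between requests   */
    }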

17
NFS Implementation
  • Three main layers
  • System call layer
  • Handles calls like open, read and close
  • Virtual File System Layer
  • Maintains table with one entry (v-node) for each
    open file
  • v-nodes indicate if file is local or remote
  • If remote, it has enough info to access it
  • For local files, FS and i-node are recorded
  • NFS Service Layer
  • This lowest layer implements the NFS protocol

18
NFS Layer Structure
19
How NFS Works
  • Mount
  • The sysadmin calls the mount program with a remote
    dir and a local dir
  • The mount program parses the path for the name of
    the NFS server
  • Contacts server asking for file handle for remote
    dir
  • If directory exists for remote mounting, server
    returns handle
  • Client kernel constructs v-node for remote dir
  • Asks NFS client code to construct r-node for file
    handle
  • Open
  • Kernel realizes that file is on remotely mounted
    directory
  • Finds r-node in v-node for the directory
  • NFS client code then opens file, enters r-node
    for file in VFS, and returns file descriptor for
    remote node

20
Cache coherency
  • Clients cache file attributes and data
  • If two clients cache the same data and one modifies
    it, the caches can become inconsistent
  • Solutions
  • Each cache block has a timer (3 sec for data, 30
    sec for dir)
  • Entry is discarded when timer expires
  • On open of cached file, its last modify time on
    server is checked
  • If cached copy is old, it is discarded
  • Every 30 sec, a flush timer expires
  • All dirty blocks are then written back to the
    server (see the sketch below)
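
The timer-plus-modify-time check can be sketched like this; the constants come from the slide, while struct cache_entry and server_mtime (a GETATTR-style query) are assumptions.

    #include <time.h>

    #define DATA_TTL  3     /* seconds a cached data block is trusted      */
    #define DIR_TTL  30     /* seconds a cached directory entry is trusted */

    struct cache_entry {    /* illustrative, not the real NFS client structure */
        int    is_dir;
        time_t cached_at;   /* when this entry was cached                      */
        time_t mtime;       /* file's last-modify time when it was cached      */
    };

    time_t server_mtime(struct cache_entry *e);  /* assumed helper: ask the server (GETATTR) */

    int cache_entry_valid(struct cache_entry *e, time_t now)
    {
        time_t ttl = e->is_dir ? DIR_TTL : DATA_TTL;
        if (now - e->cached_at > ttl)
            return 0;                 /* timer expired: discard the entry        */
        if (server_mtime(e) > e->mtime)
            return 0;                 /* server copy is newer: discard the entry */
        return 1;                     /* cached copy can still be used           */
    }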

21
Andrew File System (AFS)
  • Named after Andrew Carnegie and Andrew Mellon
  • Transarc Corp. and then IBM took over development
    of AFS
  • In 2000 IBM made OpenAFS available as open source
  • Features
  • Uniform name space
  • Location independent file sharing
  • Client side caching with cache consistency
  • Secure authentication via Kerberos
  • Server-side caching in form of replicas
  • High availability through automatic switchover of
    replicas
  • Scalability to span 5000 workstations

22
AFS Overview
  • Based on the upload/download model
  • Clients download and cache files
  • Server keeps track of clients that cache the file
  • Clients upload files at end of session
  • Whole file caching is central idea behind AFS
  • Later amended to block operations
  • Simple, effective
  • AFS servers are stateful
  • Keep track of clients that have cached files
  • Recall files that have been modified

23
AFS Details
  • Has dedicated server machines
  • Clients have partitioned name space
  • Local name space and shared name space
  • A cluster of dedicated servers (Vice) presents the
    shared name space
  • Clients run the Virtue protocol to communicate with
    Vice
  • Clients and servers are grouped into clusters
  • Clusters connected through the WAN
  • Other issues
  • Scalability, client mobility, security,
    protection, heterogeneity

24
AFS Shared Name Space
  • AFS's storage is arranged in volumes
  • Usually associated with files of a particular
    client
  • An AFS dir entry maps Vice files/dirs to a 96-bit
    fid (sketched below)
  • Volume number
  • Vnode number: index into the i-node array of a
    volume
  • Uniquifier: allows reuse of vnode numbers
  • Fids are location transparent
  • File movements do not invalidate fids
  • Location information kept in volume-location
    database
  • Volumes migrated to balance available disk space,
    utilization
  • Volume movement is atomic; the operation is aborted
    on a server crash
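
The 96-bit fid splits naturally into three 32-bit fields; a sketch with illustrative field names:

    #include <stdint.h>

    /* AFS file identifier: 3 x 32 bits = 96 bits, and location transparent. */
    struct afs_fid {
        uint32_t volume;       /* volume number: which volume holds the file           */
        uint32_t vnode;        /* vnode number: index into that volume's i-node array  */
        uint32_t uniquifier;   /* lets vnode slots be reused without reviving old fids */
    };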

25
AFS Operations and Consistency
  • AFS caches entire files from servers
  • Client interacts with servers only during open
    and close
  • The OS on the client intercepts calls and passes
    them to Venus
  • Venus is a client process that caches files from
    servers
  • Venus contacts Vice only on open and close
  • Does not contact the server if the file is already
    in the cache and has not been invalidated
  • Reads and writes bypass Venus
  • Works due to callbacks (sketched after this list)
  • Server updates state to record caching
  • Server notifies client before allowing another
    client to modify
  • Clients lose their callback when someone writes
    the file
  • Venus caches dirs and symbolic links for path
    translation
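
A sketch of the open path and the callback break, reusing the afs_fid sketch above; every function and field name here (venus_cache_lookup, fetch_whole_file, local_fd, callback_valid) is an assumption, not the OpenAFS code.

    struct cached_file { int callback_valid; /* plus the cached file data, omitted */ };

    int venus_open(struct afs_fid fid)
    {
        struct cached_file *cf = venus_cache_lookup(fid);

        if (cf != NULL && cf->callback_valid)   /* cached and never invalidated:      */
            return local_fd(cf);                /* no contact with Vice at all        */

        cf = fetch_whole_file(fid);             /* download the entire file from Vice */
        cf->callback_valid = 1;                 /* server records that we cache it    */
        return local_fd(cf);                    /* reads and writes now bypass Venus  */
    }

    /* Invoked when Vice breaks the callback because another client modified the file. */
    void callback_break(struct afs_fid fid)
    {
        struct cached_file *cf = venus_cache_lookup(fid);
        if (cf != NULL)
            cf->callback_valid = 0;             /* cached copy must be refetched on next open */
    }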

26
AFS Implementation
  • Client cache is a local directory on UNIX FS
  • Venus and server processes access file directly
    by UNIX i-node
  • Venus has 2 caches, one for status and one for data
  • Uses LRU to keep them bounded in size

27
Summary
  • LFS
  • Local file system
  • Optimize writes
  • NFS
  • Simple distributed file system protocol. No
    open/close
  • Stateless server
  • Has problems with cache consistency, locking
    protocol
  • AFS
  • More complicated distributed file system protocol
  • Stateful server
  • Session semantics: consistency on close

28
Enjoy Spring Break!!!
29
Storage Area Networks (SANs)
  • New generation of architectures for managing
    storage in massive data centers
  • For example, Google is said to have
    50,000-200,000 computers in various centers
  • Amazon is reaching a similar scale
  • A SAN system is a collection of file systems with
    tools to help humans administer the system

30
Examples of SAN issues
  • Where should a file be stored?
  • Many of these systems have an indirection
    mechanism so that a file can move from volume to
    volume
  • Allows files to migrate, e.g. from a slow server
    to a fast one or from long term storage onto an
    active disk system
  • Eco-computing: systems that seek to minimize energy
    use in big data centers

31
Examples of SAN issues
  • Disk-to-disk backup
  • Might want to do very fast automated backups
  • Ideally, can support this while the disk is
    actively in use
  • Easiest if two disks are next to each other
  • Challenge: back up an entire data center in New
    York at a site in Kentucky
  • US Dept of Treasury e-Cavern

32
File System Reliability
  • 2 considerations: backups and consistency
  • Why backup?
  • Recover from disaster
  • Recover from stupidity
  • Where to backup? Tertiary storage
  • Tape holds 10s or 100s of GBs and costs pennies/GB
  • Sequential access → high random access time
  • Backup takes time and space

33
Backup Issues
  • Should the entire FS be backed up?
  • Binaries, special I/O files usually not backed up
  • Do not backup unmodified files since last backup
  • Incremental dumps: a complete dump per month,
    modified files daily
  • Compress data before writing to tape
  • How to backup an active FS?
  • Not acceptable to take system offline during
    backup hours
  • Security of backup media

34
Backup Strategies
  • Physical Dump
  • Start from block 0 of disk, write all blocks in
    order, stop after last
  • Pros: simple to implement, fast
  • Cons: cannot skip directories, do incremental
    dumps, or restore individual files
  • No point dumping unused blocks, but avoiding them
    adds overhead
  • How to dump bad blocks?
  • Logical Dump
  • Start at a directory
  • dump all directories and files changed since base
    date
  • Base date could be of last incremental dump, last
    full dump, etc.
  • Also dump all dirs (even unmodified) in path to a
    modified file

35
Logical Dumps
  • Why dump unmodified directories?
  • Restore files on a fresh FS
  • To incrementally recover a single file

36
A Dumping Algorithm
  • Algorithm
  • Mark all dirs and modified files
  • Unmark dirs with no modified files beneath them
  • Dump the marked dirs
  • Dump the marked modified files (a sketch follows)
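
A recursive sketch that folds the mark and unmark phases together: a directory stays marked only if it lies on a path to a file modified since the base date. struct node and its fields are illustrative, not from the slides.

    #include <time.h>

    struct node {                      /* one file or directory in the tree       */
        int is_dir, marked;
        time_t mtime;                  /* last modification time                  */
        struct node *children, *next;  /* first child and next sibling            */
    };

    int mark(struct node *n, time_t base_date)
    {
        int keep = 0;
        if (n->is_dir) {
            for (struct node *c = n->children; c != NULL; c = c->next)
                keep |= mark(c, base_date);     /* keep a dir if any child is kept */
        } else {
            keep = (n->mtime > base_date);      /* file modified since base date?  */
        }
        n->marked = keep;                       /* later passes dump marked dirs,  */
        return keep;                            /* then marked files               */
    }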

37
Logical Dumping Issues
  • Reconstruct the free block list on restore
  • Maintaining consistency across symbolic links
  • UNIX files with holes
  • Should never dump special files, e.g. named pipes