ECE 6160: Advanced Computer Networks - Networked Storage

Transcript and Presenter's Notes

1
ECE 6160 Advanced Computer Networks: Networked Storage
  • Instructor: Dr. Xubin (Ben) He
  • Email: Hexb@tntech.edu
  • Tel: 931-372-3462

2
Previously
  • Storage devices
  • RAIDs

3
Background
  • Data storage plays an essential role in today's
    fast-growing data-intensive network services.
  • Online data storage doubles every 9 months.
  • How much data is there?
  • Read (text)
  • 100 KB/hr, 25 GB/lifetime
  • Hear (speech @ 10 KB/s)
  • 40 MB/hr, 10 TB/lifetime
  • See (TV @ 0.5 MB/s)
  • 2 GB/hr, 500 TB/lifetime

Note: 1K = 2^10, 1M = 2^20, 1G = 2^30, 1T = 2^40; one 400-page book ≈ 1 MB.
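As a rough sanity check on the reading figure (assuming roughly 10 reading hours per day over a 70-year lifetime): 100 KB/hr x 10 hr/day x 365 days/yr x 70 yr ≈ 26 GB, the same order as the 25 GB/lifetime estimate above.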
4
Background
Storage cost as a proportion of total IT spending, compared to server cost (Source: Brocade)
5
A Server-to-Storage Bottleneck
(Source: Brocade)
6
How is a server connected to storage?
  • Traditional SCSI bus architecture (DAS)
  • Both a physical transport and a protocol
  • Device drivers under the OS talk with SCSI
    devices
  • Parallel bus, 8, 16, or 32 bits wide
  • Problems:
  • Termination of unused ports
  • Number of devices connected to a SCSI bus: 15
  • Cable length limitation: roughly 1.5 to 25 meters,
    depending on the SCSI variant

7
Problem
  • Each new application server requires its own
    dedicated storage solution.

8
Solution
  • Simplify storage management by separating the
    data from the application server.

9
Benefits of storage networking
  • Consolidation
  • Centralized Data Management
  • Scalability
  • Fault Resiliency

(Figure: trade-offs among network delays, network cost, and network bandwidth versus storage capacity/volume and administrative cost.)
10
Storage as a service
  • SSP: Storage Service Provider
  • ASP: Application Service Provider
  • Outsourcing storage and/or applications as a
    service. For ASPs (e.g., Web services), storage
    is just a component.
11
Current State-of-the-art
  • Network Attached Storage (NAS)
  • Storage accessed over TCP/IP, using industry-
    standard file-sharing protocols like NFS, HTTP,
    and Windows networking
  • Provides file system functionality
  • Takes LAN bandwidth away from servers
  • Storage Area Network (SAN)
  • Storage accessed over a Fibre Channel switching
    fabric, using encapsulated SCSI
  • Block-level storage system
  • Fibre Channel SAN
  • IP SAN
  • Implements a SAN over well-known TCP/IP
  • iSCSI: cost-effective, combines SCSI and TCP/IP

12
Storage Abstractions
  • relational database (IBM and Oracle)
  • tables, transactions, query language
  • file system
  • hierarchical name space with ACLs
  • block storage/volumes
  • SAN, Petal, RAID-in-a-box (e.g., EMC)
  • object storage
  • objects ("files" with a flat name space): NASD, DDS
  • persistent objects
  • pointer structures, requires transactions: OODB,
    ObjectStore

13
NAS vs SAN?
14
Typical NAS
15
High BW NAS
  • Accelerate applications
  • Data sharing for NT, UNIX, and Web
  • Offload file sharing function

16
Typical SAN
  • Backup solutions (tape sharing)
  • Disaster tolerance solutions (distance to remote
    location)
  • Reliable, maintainable, scalable infrastructure

17
NAS vs. SAN
  • In the commercial sector there is a raging debate
    today about NAS vs. SAN.
  • Network-Attached Storage has been the dominant
    approach to shared storage since NFS.
  • NAS: NFS or CIFS, named files over
    Ethernet/Internet.
  • E.g., Network Appliance filers
  • Proponents of FibreChannel SANs market them as a
    fundamentally faster way to access shared
    storage.
  • no indirection through a file server (SAD)
  • lower overhead on clients
  • network is better/faster (if not cheaper) and
    dedicated/trusted
  • Brocade, HP, Emulex are some big players.

18
Network File System (NFS)
(Figure: NFS layering. Client: user programs -> syscall layer -> VFS -> local FS or NFS client; the NFS client talks to the NFS server via RPC over UDP or TCP. Server: NFS server -> VFS -> local FS.)
19
NFS Protocol
  • NFS is a network protocol layered above IP.
  • Original implementations (and most today) use UDP
    datagram transport for low overhead.
  • Maximum IP datagram size was increased to match
    FS block size, to allow send/receive of entire
    file blocks.
  • Some implementations use TCP as a transport.
  • The NFS protocol is a set of message formats and
    types.
  • Client issues a request message for a service
    operation.
  • Server performs requested operation and returns a
    reply message with status and (perhaps) requested
    data.

20
File Handles
  • Question: how does the client tell the server
    which file or directory the operation applies to?
  • Similarly, how does the server return the result
    of a lookup?
  • More generally, how to pass a pointer or an
    object reference as an argument/result of an RPC
    call?
  • In NFS, the reference is a file handle or
    fhandle, a token/ticket whose value is determined
    by the server.
  • Includes all information needed to identify the
    file/object on the server, and find it quickly.

fhandle fields: volume ID, inode number, generation number
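A minimal sketch in C of what such a file handle conceptually carries (field names are illustrative; a real NFS fhandle is an opaque, server-defined byte array):

    #include <stdint.h>

    /* Conceptual sketch only: the server chooses the layout, and the client
     * treats the handle as an opaque token. The fields mirror the slide:
     * enough to locate the object quickly and to detect stale references. */
    struct fhandle {
        uint32_t volume_id;   /* which exported volume / file system */
        uint32_t inode;       /* inode number within that volume */
        uint32_t generation;  /* bumped when the inode is reused, so stale
                                 handles do not alias a newer file */
    };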
21
Consistency for File Systems
  • How is the consistency problem different for
    network file systems, relative to DSM/SVM?
  • Note: Linux (and any Unix) includes a lot of detail
    about the kernel implementation issues for these
    file systems. These are interesting and useful,
    but in this course we focus on the distribution
    aspects.

22
NFS as a Stateless Service
  • A classical NFS server maintains no in-memory
    hard state.
  • The only hard state is the stable file system
    image on disk.
  • no record of clients or open files
  • no implicit arguments to requests
  • E.g., no server-maintained file offsets: read and
    write requests must explicitly transmit the byte
    offset for each operation.
  • no write-back caching on the server
  • no record of recently processed requests
  • etc., etc....
  • Statelessness makes failure recovery simple and
    efficient.
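Because the server keeps no per-client state, each request must carry everything needed to execute it. A hedged sketch (field names are illustrative, not the actual NFS XDR wire format):

    #include <stdint.h>

    /* Illustrative request/reply shapes for a stateless read: the client
     * names the file by an opaque handle and supplies the byte offset and
     * count on every call, since the server keeps no per-client file
     * position. */
    struct read_request {
        uint8_t  fh[32];     /* opaque file handle issued by the server */
        uint64_t offset;     /* explicit byte offset for this operation */
        uint32_t count;      /* number of bytes requested */
    };

    struct read_reply {
        uint32_t status;     /* OK or an error code */
        uint32_t count;      /* bytes actually returned, data follows */
    };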

23
Recovery in Stateless NFS
  • If the server fails and restarts, there is no
    need to rebuild in-memory state on the server.
  • Client reestablishes contact (e.g., TCP
    connection).
  • Client retransmits pending requests.
  • Classical NFS uses a connectionless transport
    (UDP).
  • Server failure is transparent to the client: no
    connection to break or reestablish.
  • A crashed server is indistinguishable from a slow
    server.
  • Sun/ONC RPC masks network errors by
    retransmitting a request after an adaptive
    timeout.
  • A dropped packet is indistinguishable from a
    crashed server.
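A minimal sketch of this client-side retry behavior in C (a generic retransmit loop with a growing timeout; the transport hooks are assumed for illustration, and this is not Sun/ONC RPC's actual code):

    #include <stdbool.h>

    /* Hypothetical transport hooks, assumed for illustration. */
    bool send_request(const void *req, int req_len);
    bool wait_for_reply(void *reply, int rep_len, int timeout_ms);

    /* A dropped packet, a slow server, and a crashed-and-restarted server
     * all look the same to the client, and all are handled the same way:
     * retransmit the request after an adaptive timeout. */
    int call_with_retry(const void *req, int req_len, void *reply, int rep_len)
    {
        int timeout_ms = 100;                 /* initial estimate */
        for (int attempt = 0; attempt < 8; attempt++) {
            if (send_request(req, req_len) &&
                wait_for_reply(reply, rep_len, timeout_ms))
                return 0;                     /* reply received */
            timeout_ms *= 2;                  /* back off and try again */
        }
        return -1;                            /* give up, report an error */
    }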

24
Drawbacks of a Stateless Service
  • The stateless nature of classical NFS has
    compelling design advantages (simplicity), but
    also some key drawbacks:
  • Recovery-by-retransmission constrains the server
    interface.
  • ONC RPC/UDP has execute-mostly-once semantics
    ("send and pray"), which compromises performance
    and correctness.
  • Update operations are disk-limited.
  • Updates must commit synchronously at the server.
  • NFS cannot (quite) preserve local single-copy
    semantics.
  • Files may be removed while they are open on the
    client.
  • Server cannot help in client cache consistency.

25
AFS (1985)
  • AFS is an alternative to NFS developed at CMU.
  • Designed for wide area file sharing
  • Internet is large and growing exponentially.
  • Global name hierarchy with local naming contexts
    and location info embedded in fully qualified
    names.
  • Much like DNS
  • Security features, with per-domain authentication
    / access control.
  • Whole file caching or 64KB chunk caching
  • Amortize request/transfer cost
  • Client uses a disk cache
  • Cache is preserved across client failure.
  • Again, it looks a lot like the Web.

26
CFS: Cluster File Systems
(Figure: multiple storage clients sharing a block storage service such as FC/SAN, Petal, or NASD.)
Examples: xFS [Dahlin95], Petal/Frangipani [Lee/Thekkath], GFS, Veritas, EMC Celerra.
Issues: trust, compatibility with NAS protocols, sharing, coordination, recovery.
27
Sharing and Coordination
Issues: block allocation and layout; locking/leases and their granularity; shared access through a separate lock service; logging and recovery; network partitions; reconfiguration.
(Figure: clients reach the storage service over NAS and SAN paths, coordinated by a lock manager.)
28
Storage Architectures
29
Storage Area Networks
30
SAN connection
  • FC: FC-SAN
  • LAN (Ethernet): IP-SAN, iSCSI
  • Other networks: Petal (ATM)

31
Typical SAN
  • Backup solutions (tape sharing)
  • Disaster tolerance solutions (distance to remote
    location)
  • Reliable, maintainable, scalable infrastructure

32
A real SAN.
33
NAS and SAN shortcomings
  • SAN shortcomings: data to the desktop; sharing
    between NT and UNIX; lack of standards for file
    access and locking
  • NAS shortcomings: shared tape resources; number
    of drives; distance to tapes/disks
  • NAS focuses on applications, users, and the
    files and data that they share
  • SAN focuses on disks, tapes, and a scalable,
    reliable infrastructure to connect them
  • NAS plus SAN: the complete solution, from
    desktop to data center to storage device

34
NAS plus SAN.
  • NAS plus SAN: the complete solution, from
    desktop to data center to storage device

35
Petal/Frangipani
(Figure: layering comparison. Frangipani fills the NAS/NFS file-service role; Petal fills the SAN block-service role beneath it.)
36
Petal/Frangipani
(Figure annotations: untrusted, OS-agnostic clients; FS semantics and sharing/coordination at the Frangipani layer; disk aggregation ("bricks"), filesystem-agnostic operation, recovery and reconfiguration, load balancing, chained declustering, and snapshots at the Petal layer, which does not itself control sharing.)
Each cloud may resize or reconfigure independently. What indirection is required to make this happen, and where is it?
37
Remaining Slides
  • The following slides have been borrowed from the
    Petal and Frangipani presentations, which were
    available on the Web until Compaq SRC dissolved.
    This material is owned by Ed Lee, Chandu
    Thekkath, and the other authors of the work. The
    Frangipani material is still available through
    Chandu Thekkath's site at www.thekkath.org.
  • For ECE 6160, several issues are important:
  • Understand the role of each layer in the
    previous slides, and the strengths and
    limitations of each layer as a basis for
    innovating behind its interface (NAS/SAN).
  • Understand the concepts of virtual disks and a
    cluster file system embodied in Petal and
    Frangipani.
  • Understand how the features of Petal simplify the
    design of a scalable cluster file system
    (Frangipani) above it.

38
Petal: Distributed Virtual Disks
  • Systems Research Center
  • Digital Equipment Corporation
  • Edward K. Lee
  • Chandramohan A. Thekkath

39
Logical System View
(Figure: client file systems such as AdvFS, NT FS, PC FS, and UFS all see Petal virtual disks across a scalable network.)
40
Physical System View
(Figure: a parallel database or cluster file system on several hosts shares a Petal virtual disk, e.g. /dev/shared1, over a scalable network of Petal storage servers.)
41
Virtual Disks
  • Each virtual disk provides a 2^64-byte address space.
  • Created and destroyed on demand.
  • Allocates disk storage on demand.
  • Snapshots via copy-on-write.
  • Online incremental reconfiguration.
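A minimal sketch in C of the copy-on-write idea behind these snapshots (a toy in-memory model with assumed names, not Petal's implementation): a snapshot shares all blocks with the live virtual disk until the live disk overwrites one, and physical storage is only allocated when a block is first written.

    #include <stdlib.h>
    #include <string.h>

    #define NBLOCKS 1024
    #define BLKSZ   512

    /* Toy model: a virtual disk is a table of pointers to physical blocks. */
    struct vdisk { char *block[NBLOCKS]; };

    /* Taking a snapshot copies only the table; the blocks stay shared. */
    struct vdisk *snapshot(const struct vdisk *live)
    {
        struct vdisk *snap = malloc(sizeof *snap);
        memcpy(snap->block, live->block, sizeof snap->block);
        return snap;
    }

    /* Writes allocate on demand; a block still shared with the snapshot is
     * given a fresh physical block (copy-on-write) before being updated. */
    void vdisk_write(struct vdisk *live, const struct vdisk *snap,
                     int blk, const char *data)
    {
        if (live->block[blk] == NULL || live->block[blk] == snap->block[blk])
            live->block[blk] = malloc(BLKSZ);
        memcpy(live->block[blk], data, BLKSZ);
    }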

42
Virtual to Physical Translation
(Figure: the virtual disk directory maps a (vdiskID, offset) pair through the global map (GMap) to one of Server 0..3; that server's physical map (PMap0..PMap3) then yields the final (disk, diskOffset).)
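A hedged sketch in C of that two-level lookup (function names and signatures are assumptions for illustration, not Petal's code):

    #include <stdint.h>

    struct paddr { int disk; uint64_t disk_offset; };

    /* Replicated global state: the virtual disk directory and GMap decide
     * which server is responsible for this part of this virtual disk. */
    int gmap_lookup_server(int vdisk_id, uint64_t offset);

    /* Server-local state: that server's physical map (PMap) translates the
     * virtual offset into a physical disk and offset it owns. */
    struct paddr pmap_lookup(int server, int vdisk_id, uint64_t offset);

    /* (vdiskID, offset) -> GMap -> server -> PMap -> (disk, diskOffset) */
    struct paddr translate(int vdisk_id, uint64_t offset, int *server_out)
    {
        int server = gmap_lookup_server(vdisk_id, offset);
        *server_out = server;
        return pmap_lookup(server, vdisk_id, offset);
    }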
43
Global State Management
  • Based on Leslie Lamport's Paxos algorithm.
  • Global state is replicated across all servers.
  • Consistent in the face of server and network
    failures.
  • A majority is needed to update global state.
  • Any server can be added/removed in the presence
    of failed servers.
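A tiny sketch of the majority rule stated above (just the quorum arithmetic; the actual agreement protocol, Paxos, also handles competing proposers, retries, and recovery):

    #include <stdbool.h>

    /* An update to the replicated global state counts as committed only
     * once a strict majority of the configured servers has accepted it. */
    bool quorum_reached(int acks, int total_servers)
    {
        return acks > total_servers / 2;
    }

    /* Example: with 5 servers, 3 acknowledgements commit an update, so the
     * system keeps making progress with up to 2 failed servers. */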

44
Fault-Tolerant Global Operations
  • Create/Delete virtual disks.
  • Snapshot virtual disks.
  • Add/Remove servers.
  • Reconfigure virtual disks.

45
Data Placement Redundancy
  • Supports non-redundant and chained-declustered
    virtual disks.
  • Parity can be supported if desired.
  • Chained-declustering tolerates any single
    component failure.
  • Tolerates many common multiple failures.
  • Throughput scales linearly with additional
    servers.
  • Throughput degrades gracefully with failures.

46
Chained Declustering
(Figure: blocks D0-D7 placed across four servers; each server stores its own primary blocks plus a copy of its left neighbor's.)

             Server0  Server1  Server2  Server3
  primary      D0       D1       D2       D3
  copy         D3       D0       D1       D2
  primary      D4       D5       D6       D7
  copy         D7       D4       D5       D6
47
Chained Declustering
(Figure: a second placement of the same blocks D0-D7 across the four servers, as shown on the slide.)

             Server0  Server1  Server2  Server3
               D0       D2       D3       D1
               D3       D1       D2       D0
               D4       D6       D7       D5
               D7       D5       D6       D4
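A minimal sketch of the basic chained-declustering placement rule (the modular arithmetic only, with assumed names): block i's primary copy lives on server i mod n and its secondary copy on the next server in the chain, so any single server failure leaves every block with one surviving copy and spreads the extra load over the failed server's neighbors.

    #include <stdio.h>

    /* The secondary copy of each block sits one server to the "right" of
     * its primary, wrapping around the chain of n servers. */
    static int primary_server(int block, int n)   { return block % n; }
    static int secondary_server(int block, int n) { return (block + 1) % n; }

    int main(void)
    {
        const int n = 4;                 /* Server0..Server3, as on the slide */
        for (int b = 0; b < 8; b++)      /* blocks D0..D7 */
            printf("D%d: primary on Server%d, copy on Server%d\n",
                   b, primary_server(b, n), secondary_server(b, n));
        return 0;
    }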
48
The Prototype
  • Digital ATM network.
  • 155 Mbit/s per link.
  • 8 AlphaStation Model 600.
  • 333 MHz Alpha running Digital Unix.
  • 72 RZ29 disks.
  • 4.3 GB, 3.5-inch, fast SCSI (10 MB/s).
  • 9 ms avg. seek, 6 MB/s sustained transfer rate.
  • Unix kernel device driver.
  • User-level Petal servers.

49
The Prototype

(Figure: client machines src-ss1 through src-ss8 each access the same shared virtual disk /dev/vdisk1, served by Petal servers petal1 through petal8.)
50
Throughput Scaling
51
Virtual Disk Reconfiguration
(Figure: reconfiguration of a virtual disk with 1 GB of allocated storage, comparing 6-server and 8-server configurations under 8 KB reads and writes.)
52
Frangipani: A Scalable Distributed File System
  • C. A. Thekkath, T. Mann, and E. K. Lee
  • Systems Research Center
  • Digital Equipment Corporation

53
Why Not An Old File System on Petal?
  • Traditional file systems (e.g., UFS, AdvFS)
    cannot share a block device
  • The machine that runs the file system can become
    a bottleneck

54
Frangipani
  • Behaves like a local file system
  • multiple machines cooperatively manage a Petal
    disk
  • users on any machine see a consistent view of
    data
  • Exhibits good performance, scaling, and load
    balancing
  • Easy to administer

55
Ease of Administration
  • Frangipani machines are modular
  • can be added and deleted transparently
  • Common free space pool
  • users don't have to be moved
  • Automatically recovers from crashes
  • Consistent backup without halting the system

56
Components of Frangipani
  • File system core
  • implements the Digital Unix vnode interface
  • uses the Digital Unix Unified Buffer Cache
  • exploits Petal's large virtual space
  • Locks with leases
  • Write-ahead redo log

57
Locks
  • Multiple reader/single writer
  • Locks are moderately coarse-grained
  • protects entire file or directory
  • Dirty data is written to disk before lock is
    given to another machine
  • Each machine aggressively caches locks
  • uses lease timeouts for lock recovery
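A hedged sketch in C of the lease idea (names and the lease length are assumptions for illustration; Frangipani's actual lock service is a separate distributed component): a cached lock is usable only while its lease is unexpired, and dirty data must be flushed before a lock moves to another machine.

    #include <stdbool.h>
    #include <time.h>

    #define LEASE_SECONDS 30       /* assumed lease length, for illustration */

    struct lease_lock {
        bool   held;
        bool   exclusive;          /* multiple readers or a single writer */
        time_t expires;            /* must renew before this time */
    };

    /* Granting or renewing a lease extends the expiry by LEASE_SECONDS. */
    void renew_lease(struct lease_lock *l)
    {
        l->expires = time(NULL) + LEASE_SECONDS;
    }

    /* The cached lock is usable only while its lease is live; if the holder
     * crashes, the lease simply runs out and recovery can proceed safely. */
    bool lock_usable(const struct lease_lock *l)
    {
        return l->held && time(NULL) < l->expires;
    }

    void flush_dirty_data(void);   /* assumed hook: write dirty data to Petal */

    /* Before a write lock migrates, dirty data goes to disk so the next
     * holder sees the latest contents. */
    void release_to_peer(struct lease_lock *l)
    {
        flush_dirty_data();
        l->held = false;
    }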

58
Logging
  • Frangipani uses a write-ahead redo log for
    metadata
  • log records are kept on Petal
  • Data is written to Petal
  • on sync, fsync, or every 30 seconds
  • on lock revocation or when the log wraps
  • Each machine has a separate log
  • reduces contention
  • independent recovery
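A minimal sketch of the write-ahead discipline (the helper functions are assumed for illustration; both the per-machine log and the metadata live on Petal):

    #include <stdint.h>
    #include <stddef.h>

    /* Assumed helpers: append to this machine's sequential log on Petal,
     * force it to stable storage, and write a block in place on Petal. */
    void log_append(const void *record, size_t len);
    void log_force(void);
    void write_in_place(uint64_t addr, const void *data, size_t len);

    /* Write-ahead rule: the redo record describing a metadata update must be
     * durable in the log before the metadata itself is overwritten, so a
     * crash mid-update can be repaired by replaying the log from any
     * machine, since the log is stored in Petal. */
    void update_metadata(uint64_t addr, const void *newval, size_t len)
    {
        struct { uint64_t addr; size_t len; } hdr = { addr, len };
        log_append(&hdr, sizeof hdr);
        log_append(newval, len);
        log_force();                        /* 1. log first */
        write_in_place(addr, newval, len);  /* 2. then apply */
    }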

59
Recovery
  • Recovery is initiated by the lock service
  • Recovery can be carried out on any machine
  • log is distributed and available via Petal

60
References
  • Storage Networking Industry Association,
    www.snia.org
  • Xubin He, Ming Zhang, and Qing Yang, "STICS:
    SCSI-To-IP Cache for Storage Area Networks,"
    Journal of Parallel and Distributed Computing,
    vol. 64, no. 9, pp. 1069-1085, September 2004.
  • G. Gibson, D. Nagle, W. Courtright II, N. Lanza,
    P. Mazaitis, M. Unangst, and J. Zelenka, "NASD
    Scalable Storage Systems," USENIX 1999.
  • T. Anderson, M. Dahlin, J. Neefe, D. Patterson,
    D. Roselli, and R. Wang, "Serverless Network File
    Systems," ACM Transactions on Computer Systems.
  • R. Hernandez, C. Kion, and G. Cole, "IP Storage
    Networking: IBM NAS and iSCSI Solutions,"
    IBM Redbooks.
  • E. Miller, D. Long, W. Freeman, and B. Reed,
    "Strong Security for Network-Attached Storage,"
    Proc. of the Conference on File and Storage
    Technologies (FAST 2002).
  • D. Nagle, G. Ganger, J. Butler, G. Goodson, and
    C. Sabol, "Network Support for Network-Attached
    Storage," Hot Interconnects 1999.
  • E. Lee and C. Thekkath, "Petal: Distributed
    Virtual Disks," Proceedings of the International
    Conference on Architectural Support for
    Programming Languages and Operating Systems
    (ASPLOS 1996).
  • P. Sarkar, S. Uttamchandani, and K. Voruganti,
    "Storage over IP: When Does Hardware Support
    Help?" Proc. of the 2nd USENIX Conference on File
    and Storage Technologies (FAST 2003).
  • C. Thekkath, T. Mann, and E. Lee, "Frangipani: A
    Scalable Distributed File System," Proceedings of
    the 16th ACM Symposium on Operating Systems
    Principles (SOSP), pp. 224-237, October 1997.