Principles of Reliable Distributed Systems
Recitation 11: Fault-Tolerant Storage Systems (Petal and Frangipani)
1
Principles of Reliable Distributed Systems
Recitation 11: Fault-Tolerant Storage
Systems (Petal and Frangipani)
  • Spring 2007
  • Alex Shraer

2
Motivation
  • Large-scale distributed file systems are hard to
    administer
  • Administration is a problem because of
  • size of installation
  • number of components

3
Our Solution
  • Frangipani
  • a scalable, distributed file system
  • Two-layered design
  • simple file system core
  • Petal storage server

4
Petal Distributed Virtual Disks
  • C. A. Thekkath and E. K. Lee
  • Systems Research Center
  • Digital Equipment Corporation

5
Client's View
6
Petal Overview
  • Petal provides virtual disks
  • large (2^64 bytes), sparse virtual space
  • disk storage allocated on demand
  • accessible to all file servers over a network
  • Virtual disks implemented by
  • cooperating CPUs executing Petal software
  • ordinary disks attached to the CPUs
  • a scalable interconnection network
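The sparse address space with on-demand allocation can be illustrated with a toy sketch (the class and its fields are hypothetical, not Petal's actual data structures): physical blocks exist only for virtual blocks that have been written, and unallocated regions read as zeros.

```python
class SparseVirtualDisk:
    """Toy model of a sparse virtual disk: physical storage is
    allocated only on first write to a virtual block."""

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self.mapping = {}          # virtual block number -> block data
        self.allocated_blocks = 0

    def write(self, vblock, data):
        if vblock not in self.mapping:
            self.allocated_blocks += 1   # allocate on demand
        self.mapping[vblock] = data

    def read(self, vblock):
        # unallocated regions of the sparse space read as zeros
        return self.mapping.get(vblock, b"\x00" * self.block_size)

disk = SparseVirtualDisk()
disk.write(10**12, b"hello")        # write far into the huge virtual space
assert disk.allocated_blocks == 1   # only one block actually allocated
```

The point of the sketch is that the virtual space can be enormous while physical usage tracks only the blocks actually written.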

7
Petal Prototype
8
Global State Management
  • Based on Lamport's Paxos algorithm
  • Global state is replicated across all servers
  • Metadata (disk allocation) only!
  • Consistent in the face of server and network
    failures.
  • A majority is needed to update global state
  • Any server can be added/removed in the presence
    of failed servers.
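The majority rule behind these updates can be sketched in a few lines (a minimal illustration of the quorum condition, not Petal's actual Paxos implementation): an update to global state commits only when more than half the servers acknowledge it, and because any two majorities intersect, committed metadata survives failures of a minority.

```python
def majority_ack(acks, n_servers):
    """An update to replicated global state commits only if a
    strict majority of the servers acknowledge it."""
    return acks > n_servers // 2

# with 5 servers, 3 acks form a quorum; 2 do not
assert majority_ack(3, 5)
assert not majority_ack(2, 5)
```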

9
Virtual to Physical Mapping
10
Key Petal Features
  • Storage is incrementally expandable
  • Data is optionally mirrored over multiple servers
  • Recall that metadata is replicated across all servers
  • Transparent addition and deletion of servers
  • Read-only snapshots of virtual disks
  • Copy-on-Write techniques
  • Client API looks like a block-level disk device
  • Throughput
  • Scales linearly with additional servers.
  • Degrades gracefully with failures.
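The copy-on-write snapshot feature can be sketched as follows (a toy model under assumed names, not Petal's on-disk format): taking a snapshot records the current block mapping without copying any data, and later writes to the live disk leave the snapshot's view untouched.

```python
class SnapshotDisk:
    """Toy copy-on-write snapshot: a snapshot freezes the current
    block mapping; no block data is copied at snapshot time."""

    def __init__(self):
        self.blocks = {}      # live mapping: block number -> data
        self.snapshots = []   # each snapshot is a frozen mapping

    def snapshot(self):
        # cheap: copy only the mapping (metadata), not the data
        self.snapshots.append(dict(self.blocks))

    def write(self, bno, data):
        # the snapshot keeps its own reference to the old data,
        # so overwriting the live mapping never disturbs it
        self.blocks[bno] = data

d = SnapshotDisk()
d.write(0, b"v1")
d.snapshot()
d.write(0, b"v2")
assert d.blocks[0] == b"v2"
assert d.snapshots[0][0] == b"v1"   # snapshot still sees the old data
```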

11
Frangipani: A Scalable Distributed File System
  • C. A. Thekkath, T. Mann, and E. K. Lee
  • Systems Research Center
  • Digital Equipment Corporation

12
Why Frangipani?
  • Why not use traditional file systems?
  • Cannot share a block device
  • The machine that runs the file system can become
    a bottleneck

13
Frangipani Overview
  • Behaves like a local file system
  • multiple machines cooperatively manage a Petal
    disk
  • users on any machine see a consistent view of
    data
  • Exhibits good performance, scaling, and load
    balancing
  • Easy to administer

14
Ease of Administration
  • Frangipani machines are modular
  • can be added and deleted transparently
  • Common free space pool
  • users don't have to be moved
  • Automatically recovers from crashes
  • Consistent backup without halting the system

15
Frangipani Layering
16
Standard Organization
17
Components of Frangipani
  • File system core
  • implements the filesystem interface
  • uses the FS mechanisms (buffer cache etc.)
  • exploits Petal's large virtual space
  • Locks with leases
  • Granted for finite time, must be refreshed
  • Write-ahead redo log
  • Performance optimization and failure recovery

18
Locks
  • Multiple reader/single writer
  • Locks are moderately coarse-grained
  • Each lock protects an entire file or directory
  • Dirty data is written to disk before lock is
    given to another machine
  • Alternative?
  • Each machine aggressively caches locks
  • Soft state: no need to explicitly release locks
  • Uses lease timeouts for lock recovery
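A lease-granted lock can be sketched as below (a single-holder toy under hypothetical names, not the Frangipani lock server's protocol): a grant is valid only for the lease period and must be refreshed, so a lock held by a crashed machine simply expires and becomes recoverable.

```python
import time

class LeasedLock:
    """Toy lease-based lock: a grant is valid for `lease` seconds
    and must be refreshed; an expired grant can be re-acquired."""

    def __init__(self, lease=30.0):
        self.lease = lease
        self.holder = None
        self.expires = 0.0

    def acquire(self, who, now=None):
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires:
            self.holder, self.expires = who, now + self.lease
            return True
        return False

    def refresh(self, who, now=None):
        now = time.monotonic() if now is None else now
        if self.holder == who and now < self.expires:
            self.expires = now + self.lease   # extend the lease
            return True
        return False

lock = LeasedLock(lease=30.0)
assert lock.acquire("A", now=0.0)
assert not lock.acquire("B", now=10.0)   # A's lease still valid
assert lock.acquire("B", now=31.0)       # A's lease expired: recoverable
```

The soft-state property falls out of this design: nobody ever needs to explicitly release a lock, since silence past the lease timeout has the same effect.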

19
Logging
  • Frangipani uses a write-ahead redo log for
    metadata
  • log records are kept on Petal
  • Data is written to Petal
  • on sync, fsync, or every 30 seconds
  • on lock revocation or when the log wraps
  • Each machine has a separate log
  • reduces contention
  • independent recovery
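The write-ahead discipline can be sketched in miniature (record format and names are hypothetical): the redo record is appended to the log, which stands in for the log kept on Petal, before the metadata update is applied in place.

```python
class RedoLog:
    """Toy per-machine write-ahead redo log for metadata: a record
    is logged before the corresponding update is applied."""

    def __init__(self):
        self.log = []        # stands in for log records kept on Petal
        self.metadata = {}   # the state the log protects

    def update(self, key, value):
        self.log.append((key, value))   # 1. write the redo record first
        self.metadata[key] = value      # 2. only then apply the change

log = RedoLog()
log.update("inode/7", "size=4096")
assert log.log == [("inode/7", "size=4096")]
assert log.metadata["inode/7"] == "size=4096"
```

Because each machine appends to its own log, machines never contend on a shared log tail, and each log can be replayed on its own.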

20
Disk Layout
21
Recovery
  • Recovery is initiated by the lock service
  • Recovery can be carried out on any machine
  • log is distributed and available via Petal
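Since the failed machine's log lives on Petal, any surviving machine can read and replay it. A minimal replay sketch (record format hypothetical, matching no particular on-disk layout):

```python
def recover(log_records):
    """Rebuild metadata by replaying redo records in order; redo
    records are idempotent, so re-applying old ones is harmless."""
    metadata = {}
    for key, value in log_records:
        metadata[key] = value
    return metadata

# a crashed machine's log, read from Petal by the recovering machine
crashed_log = [("inode/7", "size=0"), ("inode/7", "size=4096")]
assert recover(crashed_log) == {"inode/7": "size=4096"}
```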

22
Scaling (Throughput)
  • (Figure: read throughput and write throughput, in MB/s, as a
    function of the number of Frangipani machines)
23
Conclusions
  • Simple two-layer structure has served us well
  • all shared state is on a Petal disk, so it is
    easy to add, delete, and recover servers
  • Frangipani servers do not communicate with each
    other, making the system simple to design,
    implement, debug, and test
  • Frangipani performance scales well on Unix
    workloads
  • the effects of lock contention and virtualization
    of storage appear tolerable for this workload