Principles of Reliable Distributed Systems
Recitation 11: Fault-Tolerant Storage Systems (Petal and Frangipani)
1
Principles of Reliable Distributed Systems
Recitation 11: Fault-Tolerant Storage
Systems (Petal and Frangipani)
  • Spring 2007
  • Alex Shraer

2
Motivation
  • Large-scale distributed file systems are hard to
    administer
  • Administration is a problem because of
  • size of installation
  • number of components

3
Our Solution
  • Frangipani
  • a scalable, distributed file system
  • Two-layered design
  • simple file system core
  • Petal storage server

4
Petal Distributed Virtual Disks
  • C. A. Thekkath and E. K. Lee
  • Systems Research Center
  • Digital Equipment Corporation

5
Client's View
6
Petal Overview
  • Petal provides virtual disks
  • large (2^64 bytes), sparse virtual space
  • disk storage allocated on demand
  • accessible to all file servers over a network
  • Virtual disks implemented by
  • cooperating CPUs executing Petal software
  • ordinary disks attached to the CPUs
  • a scalable interconnection network
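The sparse address space with on-demand allocation can be illustrated with a toy sketch (the class and its fields are hypothetical, not Petal's actual data structures): physical blocks exist only for virtual blocks that have been written, and unallocated regions read as zeros.

```python
class SparseVirtualDisk:
    """Toy model of a sparse virtual disk: physical storage is
    allocated only on first write to a virtual block."""

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self.mapping = {}          # virtual block number -> block data
        self.allocated_blocks = 0

    def write(self, vblock, data):
        if vblock not in self.mapping:
            self.allocated_blocks += 1   # allocate on demand
        self.mapping[vblock] = data

    def read(self, vblock):
        # unallocated regions of the sparse space read as zeros
        return self.mapping.get(vblock, b"\x00" * self.block_size)

disk = SparseVirtualDisk()
disk.write(10**12, b"hello")        # write far into the huge virtual space
assert disk.allocated_blocks == 1   # only one block actually allocated
```

The point of the sketch is that the virtual space can be enormous while physical usage tracks only the blocks actually written.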

7
Petal Prototype
8
Global State Management
  • Based on Lamport's Paxos algorithm
  • Global state is replicated across all servers
  • Metadata (disk allocation) only!
  • Consistent in the face of server and network
    failures.
  • A majority is needed to update global state
  • Any server can be added/removed in the presence
    of failed servers.
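The majority rule behind these updates can be sketched in a few lines (a minimal illustration of the quorum condition, not Petal's actual Paxos implementation): an update to global state commits only when more than half the servers acknowledge it, and because any two majorities intersect, committed metadata survives failures of a minority.

```python
def majority_ack(acks, n_servers):
    """An update to replicated global state commits only if a
    strict majority of the servers acknowledge it."""
    return acks > n_servers // 2

# with 5 servers, 3 acks form a quorum; 2 do not
assert majority_ack(3, 5)
assert not majority_ack(2, 5)
```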

9
Virtual to Physical Mapping
10
Key Petal Features
  • Storage is incrementally expandable
  • Data is optionally mirrored over multiple servers
  • Recall that metadata is replicated across all servers
  • Transparent addition and deletion of servers
  • Read-only snapshots of virtual disks
  • Copy-on-Write techniques
  • Client API looks like a block-level disk device
  • Throughput
  • Scales linearly with additional servers.
  • Degrades gracefully with failures.
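The copy-on-write snapshot feature can be sketched as follows (a toy model under assumed names, not Petal's on-disk format): taking a snapshot records the current block mapping without copying any data, and later writes to the live disk leave the snapshot's view untouched.

```python
class SnapshotDisk:
    """Toy copy-on-write snapshot: a snapshot freezes the current
    block mapping; no block data is copied at snapshot time."""

    def __init__(self):
        self.blocks = {}      # live mapping: block number -> data
        self.snapshots = []   # each snapshot is a frozen mapping

    def snapshot(self):
        # cheap: copy only the mapping (metadata), not the data
        self.snapshots.append(dict(self.blocks))

    def write(self, bno, data):
        # the snapshot keeps its own reference to the old data,
        # so overwriting the live mapping never disturbs it
        self.blocks[bno] = data

d = SnapshotDisk()
d.write(0, b"v1")
d.snapshot()
d.write(0, b"v2")
assert d.blocks[0] == b"v2"
assert d.snapshots[0][0] == b"v1"   # snapshot still sees the old data
```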

11
Frangipani: A Scalable Distributed File System
  • C. A. Thekkath, T. Mann, and E. K. Lee
  • Systems Research Center
  • Digital Equipment Corporation

12
Why Frangipani?
  • Why not use traditional file systems?
  • Cannot share a block device
  • The machine that runs the file system can become
    a bottleneck

13
Frangipani Overview
  • Behaves like a local file system
  • multiple machines cooperatively manage a Petal
    disk
  • users on any machine see a consistent view of
    data
  • Exhibits good performance, scaling, and load
    balancing
  • Easy to administer

14
Ease of Administration
  • Frangipani machines are modular
  • can be added and deleted transparently
  • Common free space pool
  • users don't have to be moved
  • Automatically recovers from crashes
  • Consistent backup without halting the system

15
Frangipani Layering
16
Standard Organization
17
Components of Frangipani
  • File system core
  • implements the filesystem interface
  • uses the FS mechanisms (buffer cache etc.)
  • exploits Petal's large virtual space
  • Locks with leases
  • Granted for finite time, must be refreshed
  • Write-ahead redo log
  • Performance optimization and failure recovery

18
Locks
  • Multiple reader/single writer
  • Locks are moderately coarse-grained
  • Each lock protects an entire file or directory
  • Dirty data is written to disk before lock is
    given to another machine
  • Alternative?
  • Each machine aggressively caches locks
  • Soft state: no need to explicitly release locks
  • Uses lease timeouts for lock recovery
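A lease-granted lock can be sketched as below (a single-holder toy under hypothetical names, not the Frangipani lock server's protocol): a grant is valid only for the lease period and must be refreshed, so a lock held by a crashed machine simply expires and becomes recoverable.

```python
import time

class LeasedLock:
    """Toy lease-based lock: a grant is valid for `lease` seconds
    and must be refreshed; an expired grant can be re-acquired."""

    def __init__(self, lease=30.0):
        self.lease = lease
        self.holder = None
        self.expires = 0.0

    def acquire(self, who, now=None):
        now = time.monotonic() if now is None else now
        if self.holder is None or now >= self.expires:
            self.holder, self.expires = who, now + self.lease
            return True
        return False

    def refresh(self, who, now=None):
        now = time.monotonic() if now is None else now
        if self.holder == who and now < self.expires:
            self.expires = now + self.lease   # extend the lease
            return True
        return False

lock = LeasedLock(lease=30.0)
assert lock.acquire("A", now=0.0)
assert not lock.acquire("B", now=10.0)   # A's lease still valid
assert lock.acquire("B", now=31.0)       # A's lease expired: recoverable
```

The soft-state property falls out of this design: nobody ever needs to explicitly release a lock, since silence past the lease timeout has the same effect.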

19
Logging
  • Frangipani uses a write-ahead redo log for
    metadata
  • log records are kept on Petal
  • Data is written to Petal
  • on sync, fsync, or every 30 seconds
  • on lock revocation or when the log wraps
  • Each machine has a separate log
  • reduces contention
  • independent recovery
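The write-ahead discipline can be sketched in miniature (record format and names are hypothetical): the redo record is appended to the log, which stands in for the log kept on Petal, before the metadata update is applied in place.

```python
class RedoLog:
    """Toy per-machine write-ahead redo log for metadata: a record
    is logged before the corresponding update is applied."""

    def __init__(self):
        self.log = []        # stands in for log records kept on Petal
        self.metadata = {}   # the state the log protects

    def update(self, key, value):
        self.log.append((key, value))   # 1. write the redo record first
        self.metadata[key] = value      # 2. only then apply the change

log = RedoLog()
log.update("inode/7", "size=4096")
assert log.log == [("inode/7", "size=4096")]
assert log.metadata["inode/7"] == "size=4096"
```

Because each machine appends to its own log, machines never contend on a shared log tail, and each log can be replayed on its own.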

20
Disk Layout
21
Recovery
  • Recovery is initiated by the lock service
  • Recovery can be carried out on any machine
  • log is distributed and available via Petal
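Since the failed machine's log lives on Petal, any surviving machine can read and replay it. A minimal replay sketch (record format hypothetical, matching no particular on-disk layout):

```python
def recover(log_records):
    """Rebuild metadata by replaying redo records in order; redo
    records are idempotent, so re-applying old ones is harmless."""
    metadata = {}
    for key, value in log_records:
        metadata[key] = value
    return metadata

# a crashed machine's log, read from Petal by the recovering machine
crashed_log = [("inode/7", "size=0"), ("inode/7", "size=4096")]
assert recover(crashed_log) == {"inode/7": "size=4096"}
```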

22
Scaling (Throughput)
  • (Figure: read throughput and write throughput, in MB/s, as a
    function of the number of Frangipani machines)
23
Conclusions
  • Simple two-layer structure has served us well
  • all shared state is on a Petal disk, so it is
    easy to add, delete, and recover servers
  • Frangipani servers do not communicate with each
    other, making the system simple to design,
    implement, debug, and test
  • Frangipani performance scales well on Unix
    workloads
  • the effects of lock contention and virtualization
    of storage appear tolerable for this workload