Title: PersonalRAID: Mobile Storage for Distributed and Disconnected Computers
1PersonalRAID: Mobile Storage for Distributed and Disconnected Computers
2Outline
- Introduction
- Basic Ideas and Issues
- Solutions
- Experimental Results
- Conclusion and Discussion
3Storage Technology Trends
- Capacity and speed are growing
- 80GB single disk, 30MB/s
- Cost and physical size are shrinking
- 0.2 cents/MB, Pocket Disk, MicroDrive
- Mobile storage devices are emerging
- MicroDrive, SmartMedia, CompactFlash
4Sharing data: a trivial problem?
5Problems
- Moving data manually is inconvenient
- Example: ftp
- Poor performance in terms of both latency and throughput
- Example: floppy disk, email
- Network connection is not always available
- Example: ftp, email, NFS
6Key idea in PersonalRAID
- Use a mobile storage device as a floppy disk (Virtual A, or VA) to synchronize contents among disconnected computers
7Issues - Performance
8Performance goals
- Recording should not impose excessive overhead
- During disconnection, the user should not be forced to wait long before the VA is removed
- During connection, the user should not be forced to wait long before performing I/O
- Replaying should not impose excessive overhead
- Replaying should proceed quickly
9Issues (cont.)
- Reliability
- Users should have some degree of confidence in data reliability
- Transparency
- Users should not have to update and propagate data manually
- Users should see a single global name space regardless of which device they use
10Attack problems (preview)
- Performance
- Distributed log-structured design
- Memory buffer
- V2A
- Reliability
- Physical redundancy
- Replaying, checkpoints
- Transparency
- Single coherent storage name space
- No human involvement
11Solutions
12Log-Structured File System
- Store data in a single, contiguous log
- Never seek between writes (because you're always writing at the end of contiguous space)
- Reads are served through an index
- Large memory is assumed to buffer writes and also satisfy read requests
- Segments and the segment cleaner
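The append-and-index idea on this slide can be illustrated with a minimal Python sketch. It is a toy model, not the file system's actual layout: the class name `AppendOnlyLog`, the in-memory list standing in for the disk log, and the `live_fraction` helper are all assumptions made for illustration.

```python
class AppendOnlyLog:
    """Toy log-structured store: writes append, an index locates live data."""

    def __init__(self):
        self.log = []    # the single contiguous log of (block_id, data) records
        self.index = {}  # block_id -> position of the newest record in the log

    def write(self, block_id, data):
        # Always append at the tail, so there is never a seek between writes.
        self.log.append((block_id, data))
        self.index[block_id] = len(self.log) - 1

    def read(self, block_id):
        # Reads go through the index to find the live copy of the block.
        return self.log[self.index[block_id]][1]

    def live_fraction(self):
        # Overwritten records stay in the log as garbage until they are cleaned.
        return len(self.index) / len(self.log) if self.log else 1.0


log = AppendOnlyLog()
log.write(7, b"v1")
log.write(9, b"other")
log.write(7, b"v2")  # an overwrite appends a new record; the old one is now dead
```

The dead record left behind by the overwrite is what a segment cleaner would eventually reclaim.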
13Log-structured file system (cont.)
LFS
Only 1 write
14Log-structured file system (cont.)
Traditional FS
4 writes, 4 disk head moves
15Log-structured file system (cont.)
- Advantages
- Improves write performance by using a single sequential write
- Fast recovery in case of failure
- Disadvantages
- Read performance is not good
- Segment cleaning (disk garbage collection) cost is high
16Why LFS in PersonalRAID?
- Incremental update to diminish performance gap
- Fast recovery in case of failure
- Typical office workload
- Frequent reads and writes of small files
- Example: software development
17The Logical Disk
- Separation of concerns
- File Management
- Example: file cache, directory, file, inode
- Disk Management
- Example: disk layout, blocks, cylinders, tracks
18The Logical Disk (cont.)
Read(bid, buf, cnt)
Write(bid, buf, cnt)
NewBlock()
DeleteBlock()
Small block size: optimized for office workloads
Bigger block size: optimized for scientific computing workloads
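The four-call interface above can be sketched in Python as follows. This is a hedged illustration only: the class name `LogicalDisk`, the 4096-byte block size, the dictionary used as backing store, and the `bid` argument to `delete_block` are assumptions, since the slide lists only the call names.

```python
class LogicalDisk:
    """Toy logical disk: the file system sees block ids, never disk layout."""

    BLOCK_SIZE = 4096  # assumed block size, not taken from the slides

    def __init__(self):
        self._blocks = {}   # bid -> block contents; physical layout is hidden here
        self._next_bid = 0

    def new_block(self):
        # Allocate a zero-filled block and hand back its logical id.
        bid = self._next_bid
        self._next_bid += 1
        self._blocks[bid] = bytes(self.BLOCK_SIZE)
        return bid

    def delete_block(self, bid):
        del self._blocks[bid]

    def write(self, bid, buf, cnt):
        # Copy the first cnt bytes of buf into the start of block bid.
        if cnt > self.BLOCK_SIZE or cnt > len(buf):
            raise ValueError("cnt exceeds block or buffer size")
        old = self._blocks[bid]
        self._blocks[bid] = bytes(buf[:cnt]) + old[cnt:]

    def read(self, bid, buf, cnt):
        # Copy the first cnt bytes of block bid into the caller's bytearray.
        buf[:cnt] = self._blocks[bid][:cnt]


disk = LogicalDisk()
bid = disk.new_block()
disk.write(bid, b"hello", 5)
out = bytearray(5)
disk.read(bid, out, 5)
```

Because callers only ever name blocks by `bid`, the disk-management side is free to move blocks around without the file-management side noticing.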
19Advantages of Logical Disk
- Makes file systems easier to develop, maintain, and modify
- Makes file systems more flexible
- Allows efficient use of I/O bandwidth
- Example: reorganize the layout of blocks on the disk on the fly
20Log-structured Logical Disk
User Space
write(fd, buf, len)
write(logical block number, buf, len)
write(segment number, buf)
Kernel Space
21Data Structures in LLD
In-memory mapping table is checkpointed to disk
Contains logical address and the time stamp of each data block
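Checkpointing an in-memory map can be sketched in Python as below. The slides do not give the on-disk format, so everything here is an assumption for illustration: the class name `BlockMap`, the `segment`/`offset`/`ts` fields, the JSON encoding, and the write-then-rename step used to keep the previous checkpoint intact if a crash interrupts the write.

```python
import json
import os
import tempfile


class BlockMap:
    """Toy in-memory map: logical block -> location and time stamp."""

    def __init__(self):
        self.entries = {}

    def update(self, logical_block, segment, offset, ts):
        self.entries[logical_block] = {"segment": segment, "offset": offset, "ts": ts}

    def checkpoint(self, path):
        # Write to a temporary file first, then rename over the old checkpoint,
        # so a crash leaves either the old or the new checkpoint, never a mix.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.entries, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path):
        restored = cls()
        with open(path) as f:
            # JSON object keys are strings, so convert them back to integers.
            restored.entries = {int(k): v for k, v in json.load(f).items()}
        return restored


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "checkpoint.json")
    table = BlockMap()
    table.update(3, segment=0, offset=12, ts=100)
    table.update(8, segment=1, offset=4, ts=105)
    table.checkpoint(path)
    restored = BlockMap.restore(path)
```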
22Implementations
23Inside PRS
- PRS maintains several main-memory segments for both the local disk and the VA device
- Segment cleaning begins when the number of clean segments falls below a threshold
- PRS works sequentially
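Threshold-triggered segment cleaning can be sketched in Python as follows. This is a simplified model under stated assumptions, not the PRS implementation: the class name `SegmentStore`, the fixed segment capacity, the choice of the segment with the fewest live blocks as the victim, and cleaning at most one segment per write are all hypothetical.

```python
class SegmentStore:
    """Toy segmented log that cleans when clean segments run low."""

    def __init__(self, num_segments=4, capacity=2, clean_threshold=2):
        self.capacity = capacity
        self.clean_threshold = clean_threshold
        self.segments = [[] for _ in range(num_segments)]  # slots of (block_id, data)
        self.location = {}   # block_id -> (segment, slot) of the live copy
        self.current = 0     # segment currently being filled
        self.cleanings = 0

    def clean_count(self):
        # A segment is clean when it is empty and not the one being filled.
        return sum(1 for i, s in enumerate(self.segments) if not s and i != self.current)

    def live_blocks(self, seg):
        # A slot is live only if the location map still points at it.
        return [(b, d) for slot, (b, d) in enumerate(self.segments[seg])
                if self.location.get(b) == (seg, slot)]

    def _append(self, block_id, data):
        if len(self.segments[self.current]) == self.capacity:
            nxt = next((i for i, s in enumerate(self.segments) if not s), None)
            if nxt is None:
                raise RuntimeError("no clean segment available")
            self.current = nxt
        seg = self.segments[self.current]
        seg.append((block_id, data))
        self.location[block_id] = (self.current, len(seg) - 1)

    def write(self, block_id, data):
        self._append(block_id, data)
        if self.clean_count() < self.clean_threshold:
            self._clean()

    def _clean(self):
        candidates = [i for i, s in enumerate(self.segments) if s and i != self.current]
        if not candidates:
            return
        victim = min(candidates, key=lambda i: len(self.live_blocks(i)))
        live = self.live_blocks(victim)
        if len(live) == self.capacity:
            return  # every slot is live, so cleaning would reclaim nothing
        self.segments[victim] = []
        self.cleanings += 1
        for b, d in live:
            self._append(b, d)  # copy live data forward into the log

    def read(self, block_id):
        seg, slot = self.location[block_id]
        return self.segments[seg][slot][1]


store = SegmentStore()
for i in range(10):
    store.write(i % 2, f"v{i}")  # repeatedly overwrite two blocks
```

Overwriting the same two blocks keeps producing dead slots, so the cleaner is invoked whenever fewer than two clean segments remain.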
24Data Structures
25Solutions
26Recovering from Failures
- Crash problem
- Tolerating a single device failure
- Host disk loss
- VA device loss
- PersonalRAID's backup ability
- Data is recorded on host disk and VA
- Metadata is checkpointed
27Crash problem
- During recording
- Writes are buffered in memory
- Segment summary is updated
- In-memory map is also updated
- During disconnection
- Flush data to disk and VA sequentially
- Flush in-memory map to host disk and VA's checkpoint region
28Crash Problem (cont.)
- Crash during recording
- Simply restore to last checkpoint
- Crash during disconnection
- Data is inconsistent between VA and host disk
- Segment summary and checkpoint region are also inconsistent
- Goal is to make VA and host disk mutually consistent
29Crash Recovery
- read all segment summaries from host disk
- read VA's checkpoint region
- for (each block)
- compare the time stamp in segment summary and VA's checkpoint
- if (one time stamp is less than the other)
- synchronize this data block
- update the time stamp on VA's checkpoint and segment summary
- update propagation bits on VA's checkpoint
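The recovery steps above can be sketched in Python as follows. This is one possible reading, not the authors' code: the function name `recover`, the dictionaries standing in for the host disk, segment summaries, VA, and VA checkpoint, copying in both directions to reach mutual consistency, and modeling propagation bits as a set of host names that already hold the current version are all assumptions.

```python
def recover(host_blocks, host_summary, va_blocks, va_checkpoint, host_name):
    """Make host disk and VA mutually consistent by comparing time stamps.

    host_summary:  block_id -> time stamp (from the host's segment summaries)
    va_checkpoint: block_id -> {"ts": time stamp, "propagated": set of hosts}
    Returns the list of block ids that had to be synchronized.
    """
    synced = []
    for block_id in sorted(set(host_summary) | set(va_checkpoint)):
        host_ts = host_summary.get(block_id, -1)  # -1 means "block absent"
        va_entry = va_checkpoint.setdefault(block_id, {"ts": -1, "propagated": set()})
        if va_entry["ts"] < host_ts:
            # VA copy is older: push the host's block to the VA.
            va_blocks[block_id] = host_blocks[block_id]
            va_entry["ts"] = host_ts
            va_entry["propagated"] = {host_name}  # only this host has the new version
            synced.append(block_id)
        elif host_ts < va_entry["ts"]:
            # Host copy is older: pull the VA's block onto the host disk.
            host_blocks[block_id] = va_blocks[block_id]
            host_summary[block_id] = va_entry["ts"]
            va_entry["propagated"].add(host_name)
            synced.append(block_id)
    return synced


host_blocks = {1: b"new", 2: b"same"}
host_summary = {1: 20, 2: 5}
va_blocks = {1: b"old", 2: b"same", 3: b"va-only"}
va_checkpoint = {
    1: {"ts": 10, "propagated": set()},
    2: {"ts": 5, "propagated": {"A"}},
    3: {"ts": 7, "propagated": set()},
}
synced = recover(host_blocks, host_summary, va_blocks, va_checkpoint, "A")
```

Blocks whose time stamps already match are skipped, so recovery cost scales with the number of blocks that diverged rather than with the whole disk.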
30Host Disk Loss
- Synchronize A's disk with VA
- Make a disk mirror C of A's disk
- Replace B's disk with C
31VA Device Loss
Metadata is constructed
First tour
Data blocks are restored
Second tour
32VA Device Loss (cont)
Metadata is constructed
A
Data blocks are restored
Tour
33Solutions
34Transparency
- Single global name space
- Every participating host has a separate partition
- User has to work on this partition
- All operations are done automatically
- Except crash recovery and disk failure
35Performance Evaluation
- Benchmarks
- Andrew Benchmark
- Mixed big files and small files, recursive directory operations
- Evaluates almost all performance aspects of a file system
- Mozilla source tree
- Small file creation, read, and write
- Software development workload
36Andrew Results
37Recording Performance
38Disconnection, connection, and replaying performance
39Conflicts Solution
- Targets personal data usage
- Conflicts are system-level or app-level events and must be addressed at higher levels
- PersonalRAID is a storage-level solution
40Use Virtual VAs
- Physical mobile storage is not necessary in the design
- Example: a file, a local disk partition
- Improves recording efficiency
- Make a copy of the VA for the purpose of reconstructing a lost VA
41Critiques
- VA device has to be present when the file system is accessed
- Implementation decisions
- Achieving global name space is awkward
- Security issues