Data Staging on Untrusted Surrogates - PowerPoint PPT Presentation


Title: Data Staging on Untrusted Surrogates


1
Data Staging on Untrusted Surrogates
  • Jason Flinn
  • Shafeeq Sinnamohideen
  • Niraj Tolia
  • Mahadev Satyanarayanan
  • Intel Research Pittsburgh, University of Michigan, Carnegie Mellon University

2
Mobile Data Access: Expectation vs. Reality
  • Mobile computers increasingly connected
  • expectation of ubiquitous data access
  • distributed file systems can help
  • Does reality match expectations?
  • Size, weight, energy constraints
  • Less storage, processing power, etc.
  • How to match reality and expectations?
  • Use untrusted, unmanaged infrastructure!

3
Problem: Limited Storage
  • Latency often the real performance-killer
  • File systems: many sequential RPCs
  • Network latency not improving (much)!
  • What if one can't cache all files of interest?
  • Borrow storage from nearby surrogate
  • Use as an L2 file cache

[Figure: client, surrogate, and file server]
4
Problem: Limited Battery Energy
  • File system consumes a lot of energy
  • Network communication
  • Storage (disk spin-ups, reads, writes)
  • Surrogate helps preserve client battery
  • Use surrogate cache to avoid disk spin-ups
  • Prefetch updates to surrogate, not client

5
Problem: Limited Bandwidth
  • How to fetch large updates in a short window?
  • Example: passing through an airport gate
  • 11 Mbps (or more) local wireless bandwidth
  • Wide-area Internet bandwidth often less
  • InfoStation (Wu, Badrinath, et al.)
  • Cache updates before mobile user arrives
  • Blast data as user passes through cell
  • Surrogate: mechanism for caching file data

6
Location, Location, Location
  • Requirement: surrogate located near the client!
  • Must be opportunistic (use what's there)
  • Vision: surrogates ubiquitously deployed
  • Computers getting ever cheaper
  • Already 802.11b wireless networks in cafes
  • Can't trust or assume good behavior!

7
Outline
  • Motivation
  • Architecture and design
  • Implementation
  • Evaluation
  • Related work and conclusions

8
Data Staging Architecture
[Figure: architecture diagram showing file system traffic]
9
Trust (or Lack Thereof)
  • Trusted: client, file server, desktop, file system
  • Untrusted: surrogate, network
  • How to deal with untrusted surrogate?
  • End-to-end encryption (privacy)
  • Cryptographic hashes (authenticity)
  • Read-only data (can't lose updates)
  • Monitor performance (mitigate DoS)

10
Ease of Management
  • Can't require a system administrator!
  • Build on commodity software
  • Apache with Perl scripts (643 LoC)
  • No long-term state
  • OK to trip over power cord!
  • Allow file system diversity
  • Minimalist API
  • Currently support Coda and NFS

11
Surrogate API
  • Register(): get lease and quota for surrogate
  • Renew(): renew a lease
  • Deregister(): explicitly stop using surrogate
  • Stage(): put data on the surrogate
  • Unstage(): remove data from surrogate
  • Get(): retrieve data from surrogate
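
The six calls above can be sketched as a small in-memory class. This is only an illustration under stated assumptions: the slides give the call names but not their signatures, so the parameters (`client`, `quota_bytes`, `duration_s`, `file_id`, `blob`), the `QuotaExceeded` error, and the rule that an expired lease drops its data are all hypothetical. The surrogate described in the talk is Apache with Perl scripts, not Python.

```python
import time
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when staging would exceed the leased quota (hypothetical)."""


@dataclass
class Lease:
    quota_bytes: int                            # storage the client may use
    expires_at: float                           # absolute expiry time, seconds
    files: dict = field(default_factory=dict)   # file id -> opaque bytes


class Surrogate:
    """In-memory stand-in for a surrogate; it keeps no long-term state."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._leases = {}                       # client id -> Lease

    def _live(self, client):
        lease = self._leases.get(client)
        if lease is None or lease.expires_at <= self._clock():
            self._leases.pop(client, None)      # expired lease drops its data
            raise KeyError(f"no live lease for {client!r}")
        return lease

    def register(self, client, quota_bytes, duration_s):
        self._leases[client] = Lease(quota_bytes, self._clock() + duration_s)

    def renew(self, client, duration_s):
        self._live(client).expires_at = self._clock() + duration_s

    def deregister(self, client):
        self._leases.pop(client, None)

    def stage(self, client, file_id, blob):
        lease = self._live(client)
        # Bytes already held, not counting a copy this call would overwrite.
        used = sum(len(b) for fid, b in lease.files.items() if fid != file_id)
        if used + len(blob) > lease.quota_bytes:
            raise QuotaExceeded(file_id)
        lease.files[file_id] = blob

    def unstage(self, client, file_id):
        self._live(client).files.pop(file_id, None)

    def get(self, client, file_id):
        return self._live(client).files[file_id]
```

Because the surrogate only ever stores opaque bytes, it needs no knowledge of the file system behind them, which fits the "minimalist API" goal on the previous slide.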

12
Which Files to Stage?
  • Must predict the files most likely to be accessed
  • Prediction orthogonal to data staging
  • Client proxy has hooks for prediction code
  • Hoarding: user manually specifies files, dirs
  • Clustering: per-activity LRU caching

[Figure: spectrum from less transparent to more transparent: Manual Copy, Coda Hoarding, User-Driven Clustering, SEER]
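
One way to read "per-activity LRU caching" is sketched below. The slides do not describe the clustering code, so everything here is an assumption: the `ActivityClusters` class, its method names, the per-activity file limit, and the byte budget used when picking files to stage are hypothetical.

```python
from collections import OrderedDict, defaultdict


class ActivityClusters:
    """Keeps one LRU list of files per user activity (hypothetical design)."""

    def __init__(self, max_files_per_activity):
        self._max = max_files_per_activity
        self._lru = defaultdict(OrderedDict)    # activity -> {path: size}

    def record_access(self, activity, path, size):
        files = self._lru[activity]
        files[path] = size
        files.move_to_end(path)                 # most recently used goes last
        while len(files) > self._max:
            files.popitem(last=False)           # evict least recently used

    def files_to_stage(self, activity, budget_bytes):
        """Pick files for an activity, most recent first, within a budget."""
        chosen, used = [], 0
        for path, size in reversed(self._lru[activity].items()):
            if used + size <= budget_bytes:
                chosen.append(path)
                used += size
        return chosen
```

A client proxy hook could call `record_access` on each file open and pass the result of `files_to_stage` to the staging step when the user resumes an activity.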
13
Client Proxy Data Structures
  • Client proxy: final arbiter of validity
  • For each staged file, maintains:
  • Valid bit
  • Data length
  • Encryption key and secure hash

File id   Valid?   Length   Key      Hash
0x3fdc    Yes      32,558   0xeabc   0xea67
0x3fe6    No       23,458   0xabc3   0x7345
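
The per-file table above could be held in a structure like the following. This is a sketch, not the authors' code: the class and method names are hypothetical, SHA-256 stands in for the unspecified secure hash, and the hash and length are assumed to cover the bytes as stored on the surrogate.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class StagedEntry:
    valid: bool      # cleared when the staged copy must no longer be used
    length: int      # expected size of the staged bytes
    key: bytes       # per-file encryption key, never sent to the surrogate
    digest: bytes    # secure hash of the staged bytes (SHA-256 assumed)


class ClientProxyTable:
    """The client proxy decides whether surrogate data may be trusted."""

    def __init__(self):
        self._entries = {}                      # file id -> StagedEntry

    def record(self, file_id, length, key, digest):
        self._entries[file_id] = StagedEntry(True, length, key, digest)

    def invalidate(self, file_id):
        if file_id in self._entries:
            self._entries[file_id].valid = False

    def check(self, file_id, blob):
        """True only if the entry is valid and blob matches length and hash."""
        entry = self._entries.get(file_id)
        if entry is None or not entry.valid:
            return False
        if len(blob) != entry.length:
            return False
        actual = hashlib.sha256(blob).digest()
        return hmac.compare_digest(actual, entry.digest)
```

Since the key and hash live only on the client, a surrogate that returns altered or truncated data fails `check` and the client can ignore it.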
14
Staging Data
  • Client proxy sends list of files to data pump
  • For each file, data pump
  • Reads file and attributes from file system
  • Encrypts file, generates hash over data
  • Sends encrypted data to surrogate
  • Sends key, hash, length to client
  • Staging asynchronous with client file accesses
  • If file staged, client gets it from surrogate
  • Otherwise, gets it from file server
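
The staging and read paths above can be sketched end to end with plain dictionaries standing in for the file server, the surrogate, and the client's table. All names are hypothetical. The cipher is a toy XOR keystream used only so the example runs with the standard library; it is not the cipher from the talk and is not suitable for real use. SHA-256 over the encrypted bytes is likewise an assumption.

```python
import hashlib
import os


def _keystream_xor(key, data):
    """Toy XOR cipher (SHA-256 in counter mode). Placeholder, not real crypto."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def pump_stage(file_server, surrogate, client_table, file_ids):
    """Data pump: read, encrypt, hash, push to surrogate, report to client."""
    for fid in file_ids:
        plain = file_server[fid]
        key = os.urandom(16)                    # fresh key per file
        blob = _keystream_xor(key, plain)
        digest = hashlib.sha256(blob).digest()
        surrogate[fid] = blob                   # surrogate sees ciphertext only
        client_table[fid] = (key, digest, len(blob))


def client_read(fid, file_server, surrogate, client_table):
    """Client: use the staged copy if present and intact, else the server."""
    meta = client_table.get(fid)
    blob = surrogate.get(fid) if meta else None
    if meta and blob is not None:
        key, digest, length = meta
        if len(blob) == length and hashlib.sha256(blob).digest() == digest:
            return _keystream_xor(key, blob), "surrogate"
    return file_server[fid], "file server"
```

In this sketch a missing, truncated, or tampered staged copy only costs a fetch from the file server; it never yields wrong data.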

15
Outline
  • Motivation
  • Architecture and design
  • Implementation
  • Evaluation
  • Related work and conclusions

16
Experimental Setup
[Figure: testbed diagram. Labels: client (iPAQ 3850, 64 MB Coda cache), 802.11b wireless access point, Ethernet, 30 ms delay, Coda file server, surrogate]
  • Cold cache: no data on client or surrogate
  • Warm cache: data initially on client and surrogate
17
Benchmark: Image Trace
  • Record accesses to digital photo library in Coda
  • Take the first 10,148 accesses
  • 150 MB unique data, 401 MB total data read
  • Replay trace as fast as possible (DFSTrace)
  • Variables
  • Wastage ratio: extra data prefetched
  • Miss ratio: amount of data never prefetched
  • Assume wastage ratio 33%, miss ratio 0%
  • Then do sensitivity analysis

18
Baseline Image Results
Staging reduces execution time by 45-48%!
19
Sensitivity Analysis
Higher miss ratio has relatively greater effect
20
Longer-Duration File Traces
  • Used Mummert's Coda file system traces
  • Traces of client activity (open, mkdir, etc.)
  • Duration: 16-55 hours
  • Working set size: 57-254 MB
  • Methodology
  • Keep inter-request delays when prefetching
  • Eliminate delays afterwards

21
File Trace Results
Up to 48% reduction in cumulative file access delay
22
Request Latency Breakdown
23
Related Work
  • Web Caching (Akamai, Squid)
  • Different data access patterns, consistency
  • Fluid Replication (Kim02)
  • Assume more trust and management
  • OceanStore (Kubiatowicz02)
  • Staging minimalist, file-system agnostic
  • Builds on work in file prefetching, InfoStations

24
Conclusion
  • Possible to significantly improve distributed
    file system performance with untrusted, unmanaged
    infrastructure!
  • Future work
  • Grow set of supported file systems
  • Surrogate discovery and migration
  • Support for energy-awareness
  • http://info.pittsburgh.intel-research.net