A Semantic-based Cache Replacement Algorithm for Mobile File Access

1
A Semantic-based Cache Replacement Algorithm for
Mobile File Access
  • Sharun Santhosh and Weisong Shi
  • Department of Computer Science
  • Wayne State University
  • weisong@wayne.edu
  • http://mist.cs.wayne.edu

2
Motivation
  • The Future
  • Staying connected anywhere, anytime will become a
    reality
  • How?
  • Cable modem or DSL connection at home
  • High speed Ethernet network at work or school
  • Satellite network in the car
  • WiFi network at the airport or the neighborhood
    coffee shop
  • Challenges
  • Effectiveness - adapt to the varying underlying
    connectivity
  • Convenience - adaptation should be transparent to
    the user
  • Security - secure access on resource-constrained
    devices

3
Heterogeneous Environment
[Figure: network diagram covering a Personal Area Network (PAN,
Bluetooth), a wireless Local Area Network (wLAN, 802.11a/b/g), a
wired LAN (workgroup switches, wireless bridge), a Wide Area Network
(WAN, GSM/CDMA), and GPS. Bandwidth labels: <1 Mb/s, <100 Mb/s,
9.6 Kbit/s, <2 Mb/s. Range label: 10 meters. Usage labels: access,
synchronization; access, hot spots, LAN equivalent; voice, SMS,
e-mail, web browsing, mCommerce, Internet access, document transfer,
low/high quality video.]

4
Our Solution
CEGOR: ClosE and Go, Open and Resume
Connection View based Secure and Transparent
Reconnection
5
Roadmap
  • Motivation
  • Caching
  • Semantic-based Caching
  • Simulation Results
  • Conclusion

6
Caching
  • Three basic steps involved in accessing data
    anywhere and anytime.
  • Retrieve the files from the server
  • Work on them locally
  • Write the changes back to the server
  • A cache optimizes this process
  • Reduce the frequency of disk operations performed
  • Reduce the frequency of requests to the file servers
  • Reduce network load
  • Problem being addressed
  • Minimization of Communication

7
Why Study Caching?
  • It has been studied extensively, yet LRU is the
    most commonly used algorithm
  • Used in NFS, AFS, Sprite, CODA and most operating
    systems' buffer caches
  • Why?
  • It's simple to implement.
  • Cache misses are acceptable in existing systems.
  • The number of files replaced does not matter
  • High hit ratio vs. number of replacements
  • But in a heterogeneous environment
  • Each miss implies additional communication
  • Storage of work (when in a weakly connected or
    disconnected state)
  • Cannot assume a reliable link exists with the
    server
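The trade-off between hit ratio and number of replacements can be made concrete with a small sketch. The class below is a minimal whole-file LRU cache, not the authors' simulator; the name LRUCache, the counters, and the capacity measured in number of files are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Whole-file LRU cache that counts hits, misses, and replacements."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of cached files (assumed unit)
        self.files = OrderedDict()        # least recently used entry comes first
        self.hits = self.misses = self.replacements = 0

    def access(self, name):
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                  # each miss implies a request to the server
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
            self.replacements += 1
        self.files[name] = True
        return False


cache = LRUCache(capacity=2)
for name in ["a", "b", "a", "c", "b", "a"]:
    cache.access(name)
print(cache.hits, cache.misses, cache.replacements)  # prints: 1 5 3
```

On this short sequence LRU replaces a file on three of its five misses, which is the cost that matters once every replacement means extra communication over a weak link.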

8
Usage Scenario
Imagine a field engineer accessing layout
diagrams for a faulty electricity sub-station when,
halfway through, communications go down. A cache
miss may cause a delay of several minutes, perhaps
longer, e.g.,
"Which was the 10,000 volt cable?"
9
Is simple caching (LRU) enough?
10
Goals
  • Caches for distributed file systems that operate
    across heterogeneous networks must
  • Provide the hit rates of conventional caches that
    operate over homogeneous networks
  • Minimize communication overhead, i.e., minimize
    replacements, which means increased file
    availability and

11
Our Approach to Caching
  • File access patterns aren't random
  • A semantic relationship exists between two files
    in a file access sequence
  • User behavior
  • Program execution
  • We define and investigate two kinds of such
    relations
  • Inter-file relations
  • Intra-file relations
  • We introduce the notion of eviction index for
    each cached item

12
Outline
  • Motivation
  • Caching
  • Semantic-based Caching
  • Simulation Results
  • Conclusion

13
Inter-file relations
Analysis of DFS traces
14
Inter-file relations
An inter-file relationship exists between two
files i and j if i is the next file opened
after j is closed. File j is called file
i's precursor.
  • Xi - the number of times file i is accessed
  • Ti - the time since the last access to file i
  • Yj - the number of times file j precedes file i
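As a rough illustration of how these three quantities could be gathered from a trace, here is a sketch. It assumes a trace of (time, op, name) records, treats every open as an access, and takes the most recently closed file as the precursor; the function name and record layout are hypothetical, not taken from the slides.

```python
from collections import Counter, defaultdict


def inter_file_stats(events, now):
    """events: (time, op, name) tuples in time order; op is "open" or "close"."""
    access_count = Counter()                # Xi: number of accesses to file i
    last_access = {}                        # time of the last access, used for Ti
    precursor_count = defaultdict(Counter)  # precursor_count[i][j]: Yj for file i
    last_closed = None
    for time, op, name in events:
        if op == "open":
            access_count[name] += 1
            last_access[name] = time
            # Assumption: a file is not counted as its own precursor.
            if last_closed is not None and last_closed != name:
                precursor_count[name][last_closed] += 1
            last_closed = None              # only the next open after a close counts
        else:
            last_closed = name
    since_last = {name: now - t for name, t in last_access.items()}  # Ti
    return access_count, since_last, precursor_count


events = [
    (0, "open", "a"), (1, "close", "a"),
    (2, "open", "b"), (3, "close", "b"),
    (4, "open", "a"), (5, "close", "a"),
    (6, "open", "b"), (7, "close", "b"),
]
x, t, y = inter_file_stats(events, now=10)
print(x["b"], t["b"], y["b"]["a"])  # prints: 2 4 2
```

Here file a precedes file b twice, so a is a strong precursor of b.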
15
Intra-file relations
  • An intra-file relationship is said to exist
    between two files i and j if both are open at
    the same time, i.e., each is opened before the
    other is closed.
  • Intra-file relations are based on the shared time
    Si,j
  • O(i) and C(i) are the times at which file i
    was opened and closed, respectively

[Figure: timeline showing the open (O) and close (C) times of files
i and j and the shared time Sij.]
16
Intra-file relations
  • Ti - the time since the last access to file i
  • Tj - the time since the last access to file j,
    where j is open before i is closed
  • Si,j - the shared time of file i with respect to
    file j, where i is closed before j
  • Stotal - the total shared time with all files
    that are open before i is closed
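The formula for the shared time did not survive in this transcript, so the sketch below rests on an assumption: Si,j is taken to be the length of the interval during which both files are open, computed from O and C. The function names and the intervals mapping are hypothetical.

```python
def shared_time(open_i, close_i, open_j, close_j):
    """Length of the interval during which files i and j are both open.

    Assumption: shared time is the overlap of the two open intervals.
    """
    overlap = min(close_i, close_j) - max(open_i, open_j)
    return max(0.0, overlap)


def total_shared_time(i, intervals):
    """Stotal for file i: its shared time summed over every other file.

    intervals maps a file name to its (open_time, close_time) pair.
    """
    open_i, close_i = intervals[i]
    return sum(
        shared_time(open_i, close_i, open_j, close_j)
        for j, (open_j, close_j) in intervals.items()
        if j != i
    )


intervals = {"a": (0.0, 5.0), "b": (3.0, 8.0), "c": (6.0, 7.0)}
print(shared_time(*intervals["a"], *intervals["b"]))  # prints: 2.0
print(total_shared_time("b", intervals))              # prints: 3.0
```

Files that never overlap, such as a and c here, contribute zero, so they have no intra-file relation under this reading.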
17
Inter and Intra
18
Workload
DFS Traces from CMU were utilized during the
simulation
19
Implementation
  • Seven replacement algorithms
  • RR: Round Robin
  • LRU: Least Recently Used
  • LFU: Least Frequently Used
  • GDS: Greedy Dual Size
  • INTER: based only on inter-file relations
  • INTRA: based on intra-file relations
  • BOTH: based on both intra- and inter-file
    relations
  • Varying cache sizes
  • 10KB, 25KB, 50KB, 100KB, 500KB
  • Seven traces
  • The simulator maintains a cache (hash list), an open
    list (currently open files), and a close list
    (files that have been closed).
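A trace-driven simulator with the three structures listed above might look roughly like this. It is a sketch under stated assumptions, not the authors' code: the trace format (op, name, size), the choose_victim callback, and the rule that open files are evicted last are all invented for illustration, and only an LRU policy is shown.

```python
from collections import OrderedDict


def simulate(trace, cache_size, choose_victim):
    """Replay (op, name, size) records; return (hits, misses, replaced).

    choose_victim(cache, open_files, closed_files) names the file to evict.
    """
    cache = OrderedDict()   # name -> size, least recently used first
    open_files = set()      # the open list
    closed_files = []       # the close list, most recently closed last
    used = hits = misses = replaced = 0
    for op, name, size in trace:
        if op == "close":
            open_files.discard(name)
            closed_files.append(name)
            continue
        open_files.add(name)
        if name in cache:
            hits += 1
            cache.move_to_end(name)
            continue
        misses += 1
        if size > cache_size:
            continue        # the file can never fit, so it is not cached
        while used + size > cache_size:
            victim = choose_victim(cache, open_files, closed_files)
            used -= cache.pop(victim)
            replaced += 1
        cache[name] = size
        used += size
    return hits, misses, replaced


def lru_victim(cache, open_files, closed_files):
    # Assumption: prefer the least recently used file that is not open.
    for name in cache:
        if name not in open_files:
            return name
    return next(iter(cache))


trace = []
for name in ["a", "b", "a", "c", "b"]:
    trace += [("open", name, 4), ("close", name, 4)]   # sizes in KB

for size_kb in (8, 12):
    print(size_kb, simulate(trace, size_kb, lru_victim))
# prints: 8 (1, 4, 2)
# prints: 12 (2, 3, 0)
```

Passing the policy as a function keeps the replay loop identical across algorithms, so differences in hits and replacements come only from the eviction choice.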

20
Structure of simulator
21
Simulator pseudocode
22
Outline
  • Motivation
  • Caching
  • Semantic-based Caching
  • Simulation Results
  • Conclusion

23
Hit rates of all algorithms
24
Replace attempts of all algorithms
25
Performance: DFS Traces
26
Performance: DFS Traces
27
Performance
28
The Need For File System Tracing
  • Traces haven't been collected often enough
    to reflect present-day usage activity
  • Publicly available traces, such as traces
    collected at the disk driver level or web proxy
    traces, do not give us relevant information on
    file system workload.

29
System Call Interception
[Figure: system call interception. A call such as
int fopen(char *name, char *mode) passes through the standard
library in user space and crosses into kernel space.]
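The slide's interception sits at the boundary between the standard library and the kernel. As an application-level analogue only, the sketch below wraps Python's built-in open so that each open and close is recorded with a timestamp. The names traced_open and trace_log are hypothetical, and this is not the mechanism used to collect the MIST traces.

```python
import os
import tempfile
import time
from contextlib import contextmanager

trace_log = []  # (timestamp, event, path) records


@contextmanager
def traced_open(path, mode="r"):
    """Open a file like open(), recording when it is opened and closed."""
    handle = open(path, mode)
    trace_log.append((time.time(), "open", path))
    try:
        yield handle
    finally:
        handle.close()
        trace_log.append((time.time(), "close", path))


# Demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "layout.txt")
    with traced_open(path, "w") as handle:
        handle.write("hello")

print([event for _, event, _ in trace_log])  # prints: ['open', 'close']
```

Records of this shape (time, event, file) are enough to derive open durations and predecessor relations of the kind summarized on the next slide.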
30
Analysis Summary
  • Most files were opened for less than a hundredth
    of a second
  • Majority of files are accessed only a few times.
    There is a small percentage of very popular files
  • Majority of files are less than 100KB in size.
    Large files can be very large (heavy tail)
  • Almost half the accesses repeat within a short
    period of first occurring
  • File throughput has greatly increased due to
    presence of large files
  • Majority of files accessed have a unique
    predecessor

31
MIST Traces: Hit Rates
32
MIST Traces: Files Replaced
33
MIST Traces: Byte Hit Rate
34
Summary
  • We have presented a semantic-based caching
    algorithm and shown that it performs better than
    conventional caching approaches in terms of hit
    ratio and byte hit ratio
  • We have also shown that it does this while performing
    far fewer replacements
  • Compared to prevalent replacement strategies that
    ignore file relations and communication overhead,
    this approach would seem to better suit
    distributed file systems that operate across
    heterogeneous environments
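For reference, the two metrics named in the summary can be computed as follows. This sketch uses the usual definitions (hits over accesses, and bytes served from the cache over bytes requested); the function name and input format are assumptions, and the sample data is made up, not taken from the traces.

```python
def hit_ratios(accesses):
    """accesses: iterable of (was_hit, size_in_bytes) pairs."""
    accesses = list(accesses)
    total = len(accesses)
    total_bytes = sum(size for _, size in accesses)
    hits = sum(1 for was_hit, _ in accesses if was_hit)
    hit_bytes = sum(size for was_hit, size in accesses if was_hit)
    hit_ratio = hits / total if total else 0.0
    byte_hit_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_ratio, byte_hit_ratio


# Made-up sample: three small hits and one large miss.
sample = [(True, 100), (False, 900), (True, 100), (True, 100)]
print(hit_ratios(sample))  # prints: (0.75, 0.25)
```

The sample shows why both numbers are reported: a cache can hit on most accesses and still fetch most of the bytes from the server when the misses are large files.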

35
Future Work
  • Collecting more state-of-the-art distributed file
    systems traces
  • Applying the cache replacement algorithm to a
    real wireless file system in a computer-assisted
    surgery application
  • Extending the idea to more general
    applications, such as mobile database access

36
Questions? Comments?
weisong@wayne.edu http://mist.cs.wayne.edu