Title: A Semantic-based Cache Replacement Algorithm for Mobile File Access
1A Semantic-based Cache Replacement Algorithm for
Mobile File Access
- Sharun Santhosh and Weisong Shi
- Department of Computer Science
- Wayne State University
- weisong_at_wayne.edu
- http://mist.cs.wayne.edu
2Motivation
- The Future
- Staying connected anywhere, anytime will become a reality
- How?
- Cable modem or DSL connection at home
- High-speed Ethernet network at work or school
- Satellite network in the car
- WiFi network at the airport or the neighborhood coffee shop
- Challenges
- Effectiveness: adapt to the various underlying connectivity
- Convenience: adaptation should be transparent to the user
- Security: secure access in resource-constrained devices
3Heterogeneous Environment
[Figure: network types in a heterogeneous environment]
- Personal Area Network (PAN): Bluetooth, <1 Mb/s, 10 meters; access, synchronization
- Local Area Network (LAN, wLAN): 802.11a,b,g, wireless bridge, workgroup switches, <100 Mb/s; access, hot spots, LAN equivalent
- Wide Area Network (WAN): GSM/CDMA, 9.6 Kbit/s to <2 Mb/s; voice, SMS, e-mail, Web browsing, mCommerce, Internet access, document transfer, low/high quality video
- Other label in the figure: GPS
4Our Solution
CEGOR: ClosE and Go, Open and Resume
Connection View based Secure and Transparent Reconnection
5Roadmap
- Motivation
- Caching
- Semantic-based Caching
- Simulation Results
- Conclusion
6Caching
- Three basic steps are involved in accessing data anywhere and anytime
- Retrieve the files from the server
- Work on them locally
- Write the changes back to the server
- A cache optimizes this process
- Reduces the frequency of disk operations performed
- Reduces the frequency of requests to the file servers
- Reduces network load
- Problem being addressed
- Minimization of communication
7Why Study Caching?
- It has been studied extensively, yet LRU is the most commonly used algorithm
- Used in NFS, AFS, Sprite, CODA and most operating systems' buffer caches
- Why?
- It's simple to implement.
- Cache misses are acceptable in existing systems.
- The number of files replaced does not matter
- High hit ratio vs. number of replacements
- But in a heterogeneous environment
- Each miss implies additional communication
- Storage of work (when in a weakly connected or disconnected state)
- Cannot assume a reliable link exists with the server
8Usage Scenario
Imagine a field engineer accessing layout diagrams for a faulty electricity substation when, halfway through, communications go down. A cache miss may cause a delay of several minutes, perhaps longer, e.g., "Which was the 10,000-volt cable?"
9Is simple caching (LRU) enough??
10Goals
- Caches for distributed file systems that operate across heterogeneous networks must
- Provide the hit rates of conventional caches that operate over homogeneous networks
- Minimize communication overhead, i.e., minimize replacements, which means increased file availability and
11Our Approach to Caching
- File access patterns aren't random
- A semantic relationship exists between two files in a file access sequence
- User behavior
- Program execution
- We define and investigate two kinds of such relations
- Inter-file relations
- Intra-file relations
- We introduce the notion of an eviction index for each cached item
12Outline
- Motivation
- Caching
- Semantic-based Caching
- Simulation Results
- Conclusion
13Inter-file relations
Analysis of DFS traces
14Inter-file relations
An inter-file relationship exists between two files i and j if i is the next file opened after j is closed. File j is called file i's precursor.
- Xi: the number of times file i is accessed.
- Ti: the time since the last access to file i.
- Yj: the number of times file j precedes file i.
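The access counts and precursor counts can be gathered in one pass over a trace. The following is a minimal sketch, assuming a simplified trace of (operation, file) pairs. The event format and the name `relation_counts` are hypothetical and are not taken from the DFS traces or from the authors' simulator. The time since last access (Ti) is omitted for brevity.

```python
from collections import Counter, defaultdict


def relation_counts(events):
    """Count accesses per file and precursor occurrences.

    `events` is an iterable of (op, name) pairs, op in {"open", "close"}.
    This trace format is an assumption made for illustration.
    """
    x = Counter()             # x[i]: number of times file i is opened
    y = defaultdict(Counter)  # y[i][j]: times j was closed just before i opened
    last_closed = None        # file closed most recently, not yet followed by an open
    for op, name in events:
        if op == "open":
            x[name] += 1
            if last_closed is not None:
                y[name][last_closed] += 1
                last_closed = None  # only the next open after a close counts
        elif op == "close":
            last_closed = name
    return x, y


trace = [("open", "a"), ("close", "a"), ("open", "b"), ("close", "b"),
         ("open", "a"), ("close", "a"), ("open", "b"), ("close", "b")]
x, y = relation_counts(trace)
print(x["b"], y["b"]["a"])  # prints: 2 2
```

In this toy trace, file a is the precursor of file b on both of b's accesses, which is the kind of regularity an inter-file policy could exploit.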
15Intra-file relations
- An intra-file relationship is said to exist between two files i and j if both are opened before either is closed.
- Intra-file relations are based on the shared time Si,j defined below
- Where O(i) and C(i) are the times at which file i was opened and closed, respectively
[Definition of Si,j in terms of O(i), C(i), O(j) and C(j); original figure not recoverable]
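One plausible reading of shared time is the length of the overlap between the open intervals of the two files. The sketch below computes that overlap. The formula and the names `shared_time` and `total_shared_time` are assumptions for illustration and may differ from the authors' exact definition of Si,j.

```python
def shared_time(o_i, c_i, o_j, c_j):
    """Length of the overlap between [o_i, c_i] and [o_j, c_j].

    Assumed definition: returns 0.0 when the two files were never
    open at the same time.
    """
    return max(0.0, min(c_i, c_j) - max(o_i, o_j))


def total_shared_time(interval, others):
    """Sum of the overlaps of one (open, close) interval with several others."""
    o_i, c_i = interval
    return sum(shared_time(o_i, c_i, o_j, c_j) for o_j, c_j in others)


# File i open during [0, 10], file j open during [4, 15]: overlap is [4, 10].
print(shared_time(0.0, 10.0, 4.0, 15.0))  # prints: 6.0
```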
16Intra-file relations
- Ti: the time since the last access to file i.
- Tj: the time since the last access to file j, where j is open before i is closed.
- Si,j: the shared time of file i with respect to file j, where i is closed before j.
- Stotal: the total shared time with all files that are open before i is closed.
17Inter and Intra
18Workload
DFS Traces from CMU were utilized during the
simulation
19Implementation
- Seven replacement algorithms
- RR: Round Robin
- LRU: Least recently used
- LFU: Least frequently used
- GDS: Greedy dual size
- INTER: based only on inter-file relations
- INTRA: based on intra-file relations
- Both: based on both intra- and inter-file relations
- Varying cache sizes
- 10KB, 25KB, 50KB, 100KB, 500KB
- Seven traces
- Simulator maintains a cache (hashlist), open list (list of currently open files), close list (list of files that are closed).
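To make the measured quantities concrete, here is a minimal sketch of a trace-driven simulation for the LRU baseline only. It is not the authors' simulator: the (name, size) access format and the function name `simulate_lru` are hypothetical, file sizes are assumed not to change, and the open and close lists are left out.

```python
from collections import OrderedDict


def simulate_lru(accesses, capacity):
    """Replay (name, size) accesses through an LRU cache of `capacity` bytes.

    Returns (hits, misses, replaced), where `replaced` counts evicted files.
    """
    cache = OrderedDict()  # name -> size, least recently used entry first
    used = 0               # bytes currently held in the cache
    hits = misses = replaced = 0
    for name, size in accesses:
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
            continue
        misses += 1
        if size > capacity:
            continue  # the file can never fit, so it is not cached
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
            replaced += 1
        cache[name] = size
        used += size
    return hits, misses, replaced


accesses = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40)]
result = simulate_lru(accesses, capacity=100)
print(result)  # prints: (1, 4, 2)
```

Reporting the number of replaced files next to the hit count matters here because, over a weak link, every replacement may later turn into additional communication with the server.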
20Structure of simulator
21Simulator pseudocode
22Outline
- Motivation
- Caching
- Semantic-based Caching
- Simulation Results
- Conclusion
23Hit rates of all algorithms
24Replace attempts of all algorithms
25Performance DFS Traces
26Performance DFS Traces
27Performance
28The Need For File System Tracing
- Traces haven't been collected periodically enough to reflect present-day usage activity
- Publicly available traces, such as traces collected at the disk driver level or web proxy traces, do not give us relevant information on file system workload.
29System Call Interception
USER SPACE
KERNEL SPACE
int fopen(char name,char mode)
Standard Library
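As a rough user-space analogue of interception, the sketch below temporarily wraps Python's built-in `open` and logs each call. This is an illustration only: the tracing described on this slide works at the system call level, and the name `trace_opens` and the log record layout are hypothetical.

```python
import builtins
import os
import time
from contextlib import contextmanager


@contextmanager
def trace_opens(log):
    """Wrap builtins.open so each call appends (timestamp, path, mode) to `log`."""
    original = builtins.open

    def logged_open(file, mode="r", *args, **kwargs):
        log.append((time.time(), str(file), mode))
        return original(file, mode, *args, **kwargs)

    builtins.open = logged_open
    try:
        yield log
    finally:
        builtins.open = original  # always restore the real open


log = []
with trace_opens(log):
    with open(os.devnull) as f:
        f.read()
print(len(log), log[0][2])  # prints: 1 r
```

A wrapper like this only sees calls made through `open` inside the same Python process, which is one reason kernel-level interception is preferable for collecting a full file system workload.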
30Analysis Summary
- Most files were opened for less than a hundredth of a second
- The majority of files are accessed only a few times. There is a small percentage of very popular files
- The majority of files are less than 100KB in size. Large files can be very large (heavy tail)
- Almost half the accesses repeat within a short period of initially occurring
- File throughput has greatly increased due to the presence of large files
- The majority of files accessed have a unique predecessor
31MIST traces Hit Rates
32MIST Traces Files Replaced
33MIST Traces Byte Hit Rate
34Summary
- We have presented a semantic-based caching algorithm and shown that it performs better than conventional caching approaches in terms of hit ratio and byte hit ratio
- We have also shown that it does this while performing far fewer replacements
- Compared to prevalent replacement strategies that ignore file relations and communication overhead, this approach would seem to better suit distributed file systems that operate across heterogeneous environments
35Future Work
- Collecting more state-of-the-art distributed file system traces
- Applying the cache replacement algorithm to a real wireless file system in a computer-assisted surgery application
- Investigating the idea in more general applications, such as mobile database access, etc.
36Questions Comments?
weisong_at_wayne.edu http://mist.cs.wayne.edu