A Semantic-based Cache Replacement Algorithm for Mobile File Access

1
A Semantic-based Cache Replacement Algorithm for
Mobile File Access

Sharun Santhosh and Weisong Shi
Department of Computer Science
Wayne State University
weisong@wayne.edu
http://mist.cs.wayne.edu

2
Motivation

The Future
Staying connected anywhere, anytime will become a
reality
How?
Cable modem or DSL connection at home
High-speed Ethernet network at work or school
Satellite network in the car
WiFi network at the airport or the neighborhood
coffee shop
Challenges
Effectiveness - Adapt to the varying underlying
connectivity
Convenience - Adaptation should be transparent to
the user
Security - Secure access on resource-constrained
devices

3
Heterogeneous Environment
[Diagram: heterogeneous network environment]
Network types: Personal Area Network (PAN, Bluetooth), wireless Local Area Network (wLAN, 802.11a/b/g), LAN, Wide Area Network (WAN, GSM/CDMA), wireless bridge, GPS
Diagram labels: 10 meters; <1 Mb/s; <100 Mb/s; 9.6 Kbit/s; <2 Mb/s; access; synchronization; hot spots; LAN equivalent; workgroup switches
Services: voice, SMS, e-mail, web browsing, mCommerce, Internet access, document transfer, low/high quality video

4
Our Solution
CEGOR: ClosE and Go, Open and Resume
Connection View-based Secure and Transparent
Reconnection
5
Roadmap

Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion

6
Caching

Three basic steps are involved in accessing data
anywhere and anytime:
Retrieve the files from the server
Work on them locally
Write the changes back to the server
A cache optimizes this process
Reduce the frequency of disk operations performed
Reduce the frequency of requests to the file servers
Reduce network load
Problem being addressed
Minimization of communication

7
Why Study Caching?

It has been studied extensively, yet LRU is still
the most commonly used algorithm
Used in NFS, AFS, Sprite, Coda, and most operating
systems' buffer caches
Why?
It's simple to implement
Cache misses are acceptable in existing systems
The number of files replaced does not matter
(high hit ratio vs. number of replacements)
But in a heterogeneous environment
Each miss implies additional communication
Storage of work (when in a weakly connected or
disconnected state)
Cannot assume a reliable link exists with the
server
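
The trade-off named on this slide (hit ratio versus number of replacements) can be made concrete with a small sketch. The following is a minimal LRU cache, assuming a capacity counted in files rather than bytes; the class name `LRUCache` and its counters are illustrative and not taken from the presentation.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that counts hits, misses, and replacements."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed: capacity in number of files
        self.items = OrderedDict()    # least recently used entry comes first
        self.hits = 0
        self.misses = 0
        self.replacements = 0

    def access(self, name):
        if name in self.items:
            self.items.move_to_end(name)    # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                    # a miss means a fetch from the server
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # evict the least recently used file
            self.replacements += 1
        self.items[name] = True
        return False


cache = LRUCache(capacity=2)
for name in ["a", "b", "a", "c", "b"]:
    cache.access(name)
print(cache.hits, cache.misses, cache.replacements)
```

On a reliable link only the hit count matters; on a weak link each miss and each replacement also stands for communication that may not be possible.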

8
Usage Scenario
Imagine a field engineer accessing layout
diagrams for a faulty electricity substation when,
halfway through, communications go down. A cache
miss may cause a delay of several minutes, perhaps
longer, just when the engineer needs to know:
Which was the 10,000 volt cable?
9
Is simple caching (LRU) enough?
10
Goals

Caches for distributed file systems that operate
across heterogeneous networks must:
Provide the hit rates of conventional caches that
operate over homogeneous networks
Minimize communication overhead, i.e., minimize
replacements, which means increased file
availability and

11
Our Approach to Caching

File access patterns aren't random
A semantic relationship exists between two files
in a file access sequence
User behavior
Program execution
We define and investigate two kinds of such
relations
Inter-file relations
Intra-file relations
We introduce the notion of an eviction index for
each cached item

12
Outline

Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion

13
Inter-file relations
Analysis of DFS traces
14
Inter-file relations
An inter-file relationship exists between two
files i and j if i is the next file opened
after j is closed. File j is called file
i's precursor.
Xi - the number of times file i is accessed
Ti - the time since the last access to file i
Yj - the number of times file j precedes file i
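
To make these definitions concrete, here is a minimal sketch that derives Xi and Yj from a trace. It assumes the trace is a chronological list of (event, file) pairs and treats the most recently closed file as the precursor of the next file opened. The function name and the trace format are hypothetical, and the sketch does not show how these counts feed the eviction decision.

```python
from collections import Counter, defaultdict


def inter_file_counts(trace):
    """Count accesses (X) and precursor occurrences (Y) from a trace.

    trace: chronological list of (event, name) pairs, where event is
    "open" or "close". This format is an assumption for illustration.
    """
    access_count = Counter()                 # X[i]: times file i is opened
    precursor_count = defaultdict(Counter)   # Y[i][j]: times j precedes i
    last_closed = None
    for event, name in trace:
        if event == "open":
            access_count[name] += 1
            if last_closed is not None:
                precursor_count[name][last_closed] += 1
                last_closed = None  # only the next open after a close counts
        elif event == "close":
            last_closed = name
    return access_count, precursor_count


trace = [
    ("open", "a"), ("close", "a"),
    ("open", "b"), ("close", "b"),
    ("open", "a"), ("close", "a"),
    ("open", "b"), ("close", "b"),
]
X, Y = inter_file_counts(trace)
print(X["a"], X["b"], Y["b"]["a"], Y["a"]["b"])
```

Ti is not computed here; it would need a timestamp on each event, which this toy trace omits.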
15
Intra-file relations

An intra-file relationship is said to exist
between two files i and j if both are opened
before either is closed.
Intra-file relations are based on the shared time
Si,j defined below,
where O(i) and C(i) are the times at which file i
was opened and closed, respectively

[Figure: timeline of the open (O) and close (C) times of files i and j, with the shared time Si,j marked]
16
Intra-file relations
Ti - the time since the last access to file i
Tj - the time since the last access to file j, where j is open before i is closed
Si,j - the shared time of file i with respect to file j, where i is closed before j
Stotal - the total shared time with all files that are open before i is closed
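
The presentation defines Si,j graphically. As a hedged sketch, one plausible reading is that the shared time is the length of the overlap between the two files' open intervals. The functions below assume that reading; their names are hypothetical and they are not the authors' definition.

```python
def shared_time(open_i, close_i, open_j, close_j):
    """Overlap of the intervals [open_i, close_i] and [open_j, close_j].

    Assumption: shared time is the duration during which both files are open.
    """
    overlap = min(close_i, close_j) - max(open_i, open_j)
    return max(0.0, overlap)


def total_shared_time(interval_i, others):
    """Sum of shared time between file i and every interval in `others`."""
    open_i, close_i = interval_i
    return sum(shared_time(open_i, close_i, o, c) for o, c in others)


# File i is open from t=0 to t=10; two of the three other files overlap it.
s_total = total_shared_time((0.0, 10.0), [(4.0, 12.0), (8.0, 9.0), (11.0, 15.0)])
print(s_total)
```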
17
Inter and Intra
18
Workload
DFS Traces from CMU were utilized during the
simulation
19
Implementation

Seven replacement algorithms
RR: round robin
LRU: least recently used
LFU: least frequently used
GDS: greedy dual size
INTER: based only on inter-file relations
INTRA: based on intra-file relations
Both: based on both inter-file and intra-file
relations
Varying cache sizes
10 KB, 25 KB, 50 KB, 100 KB, 500 KB
Seven traces
Simulator maintains a cache (hashlist), an open list
(list of currently open files), and a close list (list
of files that have been closed).
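
The simulator's own structure and pseudocode appear only as figures. The following is a rough sketch of how a trace-driven simulator with a cache, an open list, and a close list might be organized, with the replacement policy passed in as a function. The class, the trace format, and the two policies shown are assumptions for illustration, not the authors' simulator.

```python
class CacheSimulator:
    """Trace-driven cache simulator sketch with a pluggable victim policy."""

    def __init__(self, capacity_bytes, choose_victim):
        self.capacity = capacity_bytes
        self.choose_victim = choose_victim  # function: cache dict -> file name
        self.cache = {}       # name -> {"size", "count", "last"}
        self.used = 0
        self.open_list = []   # files currently open
        self.close_list = []  # files that have been closed
        self.hits = 0
        self.replacements = 0
        self.clock = 0        # logical time, advanced on every open

    def open(self, name, size):
        self.clock += 1
        self.open_list.append(name)
        if name in self.cache:
            self.hits += 1
            self.cache[name]["count"] += 1
            self.cache[name]["last"] = self.clock
            return
        if size > self.capacity:
            return  # too large to cache at all
        while self.used + size > self.capacity:
            victim = self.choose_victim(self.cache)
            self.used -= self.cache.pop(victim)["size"]
            self.replacements += 1
        self.cache[name] = {"size": size, "count": 1, "last": self.clock}
        self.used += size

    def close(self, name):
        if name in self.open_list:
            self.open_list.remove(name)
        self.close_list.append(name)


def lru(cache):
    """Pick the file with the oldest last access."""
    return min(cache, key=lambda n: cache[n]["last"])


def lfu(cache):
    """Pick the file with the fewest accesses."""
    return min(cache, key=lambda n: cache[n]["count"])


def run(policy):
    sim = CacheSimulator(capacity_bytes=100, choose_victim=policy)
    for name in ["a", "b", "a", "a", "b", "c", "a"]:
        sim.open(name, 40)   # every file is assumed to be 40 bytes
        sim.close(name)
    return sim.hits, sim.replacements


print(run(lru), run(lfu))
```

On this toy trace the two policies differ in both hits and replacements, which is the pair of metrics the result slides compare.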

20
Structure of simulator
21
Simulator pseudocode
22
Outline

Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion

23
Hit rates of all algorithms
24
Replace attempts of all algorithms
25
Performance: DFS Traces
26
Performance: DFS Traces
27
Performance
28
The Need For File System Tracing

Traces haven't been collected often enough
to reflect present-day usage activity
Publicly available traces, such as traces
collected at the disk driver level or web proxy
traces, do not give us relevant information on
file system workload.

29
System Call Interception
USER SPACE
KERNEL SPACE
FILE *fopen(const char *name, const char *mode)
Standard Library
30
Analysis Summary

Most files were opened for less than a hundredth
of a second
The majority of files are accessed only a few times.
There is a small percentage of very popular files
The majority of files are less than 100 KB in size.
Large files can be very large (heavy tail)
Almost half of the accesses repeat within a short
period of their first occurrence
File throughput has greatly increased due to
the presence of large files
The majority of files accessed have a unique
predecessor

31
MIST Traces: Hit Rates
32
MIST Traces: Files Replaced
33
MIST Traces: Byte Hit Rate
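
Hit rate and byte hit rate can diverge when file sizes are heavy-tailed. The sketch below uses the usual definitions (hits over accesses, and bytes served from the cache over bytes requested); the function name and input format are assumptions, and the numbers are made up for illustration rather than taken from the MIST traces.

```python
def hit_rates(accesses):
    """Return (hit rate, byte hit rate) for a list of (size_bytes, was_hit)."""
    total = len(accesses)
    total_bytes = sum(size for size, _ in accesses)
    hits = sum(1 for _, was_hit in accesses if was_hit)
    hit_bytes = sum(size for size, was_hit in accesses if was_hit)
    return hits / total, hit_bytes / total_bytes


# Three small hits and one large miss: most requests hit, most bytes do not.
rate, byte_rate = hit_rates([(10, True), (10, True), (10, True), (970, False)])
print(rate, byte_rate)
```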
34
Summary

We have presented a semantic-based caching
algorithm and shown that it performs better than
conventional caching approaches in terms of hit
ratio and byte hit ratio
We have also shown that it does this while performing
far fewer replacements
Compared to prevalent replacement strategies that
ignore file relations and communication overhead,
this approach appears better suited to
distributed file systems that operate across
heterogeneous environments

35
Future Work

Collecting more state-of-the-art distributed file
system traces
Applying the cache replacement algorithm to a
real wireless file system in a computer-assisted
surgery application
Extending the idea to more general
applications, such as mobile database access,
etc.

36
Questions? Comments?
weisong@wayne.edu http://mist.cs.wayne.edu
