Using Database Technology to Improve Performance of Web Proxy Servers

1
Using Database Technology to Improve Performance
of Web Proxy Servers
  • K. Cheng¹, Y. Kambayashi¹, M. Mohania²
  • ¹Kyoto University, Japan
  • ²Western Michigan University, USA

2
Caching on web proxy servers
(Figure: clients accessing web servers through a caching proxy)
  • Improve throughput of proxy servers
  • Improve response times for end users
  • Bridge the bandwidth gap between WAN and LAN
  • Offload work from web servers

3
Characteristics of proxy caching
                      Traditional caching    Proxy caching
Storage               Memory-based           Disk-based
Cache size            Small                  Huge
Object survival time  Short                  Long
Algorithm             Simple                 Can be complex
Users                 Programmed processes   People with specific interests

4
Limitations of current caching schemes: case 1
  1. Tom found a very good page P1 about car models.
  2. John was also looking for that kind of page, but
    he only found P2.
  3. Both P1 and P2 were cached, but Tom didn't
    know about P2 and John didn't know about P1.
  4. After several days, however, both were replaced
    because neither was visited again.
  5. As a result, Tom missed P2, John missed P1,
    and the cache missed two hits.

State-of-the-art caching schemes cannot handle this
case!

5
Limitations of current caching schemes: case 2
  1. Suppose the users of a proxy server are mostly
    interested in XML and rarely in Fuzzy.
  2. Suppose some clients retrieved pages P1 and
    P2.
  3. After checking the contents of P1 and P2, we
    know P1 is about XML and P2 is about Fuzzy.

Should we prefer to cache P1 or P2?
6
Why can't current schemes handle these cases?
  • Physical-object-based cache management
    • Content transparency → low utilization rate
      (Case 1)
      • Approximately 60% of cached data is never used
      • Approximately 90% of cached data is rarely used
    • Usage-based object replacement → needlessly long
      stay time for irrelevant content (Case 2)

7
Our solution
  • We propose a hierarchical data model for
    managing web data (physical pages, logical
    pages and topics).
  • Object replacement based on
    • Link structure (logical pages)
    • Semantic similarity with other objects (topics)
  • Facilitate active access to cache contents

8
A hierarchical model for web data
(Figure: three layers linked by mappings. Topics T1, T2 (topic manager; accessed by navigation) map to logical pages L1, L2, L3 (logical page manager; accessed by search), which map to physical pages p1 to p6 (physical page manager; accessed by browsing).)
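The three-layer model can be pictured as two mappings: topics to logical pages, and logical pages to physical pages. The following is a minimal sketch, assuming physical pages are identified by URL strings and each layer is a plain dictionary; the class and method names (`HierarchicalCache`, `pages_of_topic`, `sharing_logical_pages`) and the topic names are hypothetical, not taken from the authors' system.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalPage:
    lid: str
    physical: set = field(default_factory=set)  # URLs of member physical pages


@dataclass
class Topic:
    tid: str
    name: str
    logical: set = field(default_factory=set)  # ids of member logical pages


class HierarchicalCache:
    """Hypothetical container for the topic / logical page / physical page layers."""

    def __init__(self):
        self.physical = {}  # URL -> size in bytes (physical page layer)
        self.logical = {}   # lid -> LogicalPage
        self.topics = {}    # tid -> Topic

    def add_physical(self, url, size, lid):
        """Cache a physical page and map it into logical page `lid`."""
        self.physical[url] = size
        self.logical.setdefault(lid, LogicalPage(lid)).physical.add(url)

    def assign(self, lid, tid, name):
        """Map logical page `lid` into topic `tid`."""
        self.topics.setdefault(tid, Topic(tid, name)).logical.add(lid)

    def pages_of_topic(self, tid):
        """Navigate down: topic -> logical pages -> physical page URLs."""
        urls = set()
        for lid in self.topics[tid].logical:
            urls |= self.logical[lid].physical
        return urls

    def sharing_logical_pages(self, url):
        """Navigate up: all logical pages that contain a given physical page."""
        return {lid for lid, lp in self.logical.items() if url in lp.physical}


# Example mirroring the "Replace an object" slide: P12 is shared by L1 and L2.
cache = HierarchicalCache()
for url, lid in [("P10", "L1"), ("P11", "L1"), ("P12", "L1"), ("P12", "L2"), ("P20", "L2")]:
    cache.add_physical(url, 1000, lid)
cache.assign("L1", "T1", "car models")  # hypothetical topic names
cache.assign("L2", "T2", "XML")
print(sorted(cache.pages_of_topic("T1")))          # ['P10', 'P11', 'P12']
print(sorted(cache.sharing_logical_pages("P12")))  # ['L1', 'L2']
```

The upward lookup matters for replacement: a physical page that several logical pages share should not be evicted on behalf of just one of them.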
9
Physical pages
(Figure: example physical pages A and B from http://www.difa.unibas.it/webdb2001, showing the embedded image ../icons/webdblogo.gif and the link /instructionsPage/index.html.)
10
Logical page
(Figure: physical pages A and B grouped into one logical page.)
11
Managing physical pages
  • Physical pages
    • HTML/plain text files (.html, .txt)
    • Embedded media files (.gif, .png, .wav, .mp3)
    • Application-generated files (.pdf, .ps, .doc)
  • Physical pages are managed based on
    • URL (protocol, IP, port, path)
    • Physical properties (e.g. size, cost)
    • Usage (frequency, recency)
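The three kinds of management information above (URL components, physical properties, usage) fit naturally into one record per cached file. This is a sketch under stated assumptions: the class name `PhysicalPageEntry` is hypothetical, `cost` is assumed to be a measured fetch time, and the host name is kept as-is rather than resolved to an IP address.

```python
import time
from dataclasses import dataclass, field
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


@dataclass
class PhysicalPageEntry:
    url: str
    size: int             # physical property: size in bytes
    cost: float           # physical property: retrieval cost (assumed: fetch time in seconds)
    frequency: int = 0    # usage: number of accesses so far
    last_access: float = field(default_factory=time.time)  # usage: recency

    def key(self):
        """Decompose the URL into (protocol, host, port, path).

        The host name is used directly; resolving it to an IP address is omitted.
        """
        parts = urlsplit(self.url)
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
        return (parts.scheme, parts.hostname, port, parts.path or "/")

    def touch(self, now=None):
        """Record one access, updating frequency and recency."""
        self.frequency += 1
        self.last_access = time.time() if now is None else now


page = PhysicalPageEntry("http://www.difa.unibas.it/webdb2001", size=2048, cost=0.8)
page.touch(now=100.0)
print(page.key())      # ('http', 'www.difa.unibas.it', 80, '/webdb2001')
print(page.frequency)  # 1
```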

12
Constructing logical pages
  • Basic logical pages
    • Single multimedia document
    • HTML file (1) + embedded media files (1..)
  • Extended logical pages
    • Several closely related, directly linked pages
    • E.g. an HTML paper whose sections are separate
      multimedia documents
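A basic logical page groups one HTML file with the media files it embeds. A minimal sketch of that grouping using Python's standard `html.parser`, assuming that "embedded" means the `src` attribute of `img`, `embed`, `audio`, `video` and `source` tags; hyperlinks (`<a href>`) are deliberately not followed, since linked pages belong to extended logical pages, which this sketch does not build. The function name `basic_logical_page` is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: which tags count as embedding a media file.
EMBED_TAGS = {"img", "embed", "audio", "video", "source"}


class _EmbedCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.embedded = []

    def handle_starttag(self, tag, attrs):
        if tag in EMBED_TAGS:
            src = dict(attrs).get("src")
            if src:
                # Resolve relative references against the HTML file's URL.
                self.embedded.append(urljoin(self.base_url, src))


def basic_logical_page(html_url, html_text):
    """Return the member URLs of a basic logical page: the HTML file first,
    then each embedded media file once."""
    collector = _EmbedCollector(html_url)
    collector.feed(html_text)
    collector.close()
    members = [html_url]
    for url in collector.embedded:
        if url not in members:
            members.append(url)
    return members


members = basic_logical_page(
    "http://www.difa.unibas.it/webdb2001",
    '<p><img src="../icons/webdblogo.gif"> '
    '<a href="/instructionsPage/index.html">Instructions</a></p>',
)
print(members)
# ['http://www.difa.unibas.it/webdb2001', 'http://www.difa.unibas.it/icons/webdblogo.gif']
```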

13
Managing topics
  • Defining a topic
    • Topic = <tid, name, criteria, popularity, date>
    • Popularity = f(F, R, P, U)
      • F: access frequency of the topic
      • R: time interval between the last access time
        and the current time
      • P: number of logical pages belonging to
        the topic
      • U: number of users accessing the topic
  • Deciding whether a logical page belongs to a topic
    • IR approaches (k-NN, …)
    • ML approaches (e.g. Support Vector Machine, SVM)
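As an illustration of the IR approach to topic membership, the sketch below assigns a logical page to the majority topic among its k most similar already-labeled pages, using term-frequency vectors and cosine similarity. The tokenization, the choice of k, and the function names are assumptions for illustration; the slides do not specify the features or the value of k.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-frequency vector over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_topic(page_text, labeled_pages, k=3):
    """Majority topic among the k labeled pages most similar to page_text.

    labeled_pages: list of (text, topic) pairs already assigned to topics.
    """
    query = vectorize(page_text)
    scored = sorted(
        ((cosine(query, vectorize(text)), topic) for text, topic in labeled_pages),
        reverse=True,
    )
    votes = Counter(topic for _, topic in scored[:k])
    return votes.most_common(1)[0][0]


labeled = [
    ("xml schema xml parser", "XML"),
    ("xml xslt transform", "XML"),
    ("fuzzy logic membership function", "Fuzzy"),
    ("fuzzy sets inference", "Fuzzy"),
]
print(knn_topic("an xml parser for xml documents", labeled))  # XML
```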

14
Definitions
  • We use the term Priority for object replacement.
    It is a function of several parameters, e.g.
    access frequency (F), time interval (R), size of
    object (S), retrieval cost (C) and significance (G).
  • Significance: importance of the topic

15
Caching policy: LRU-SP
  • Topic management
    • Priority = f(F, R, G)
  • Logical page management
    • Basic logical pages only
    • Priority = g(F, R)
  • Physical page management
    • LRU-SP: size-adjusted, popularity-aware LRU
      (K. Cheng et al., COMPSAC'00)
    • Priority = h(F, R, S)
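The slides name the arguments of f, g and h but not their formulas. The sketch below uses assumed forms only: frequency divided by time since last access, scaled by significance G for topics and divided by size S for physical pages, in the spirit of a size-adjusted, popularity-aware LRU. The exact LRU-SP formula is given in the cited COMPSAC'00 paper and may differ. The page data are made up.

```python
# Assumed forms only: the slides give the arguments of f, g and h, not the formulas.
def topic_priority(F, R, G):
    """f(F, R, G): frequent, recently accessed, significant topics rank higher."""
    return G * F / R


def logical_priority(F, R):
    """g(F, R): access frequency per unit of time since the last access."""
    return F / R


def physical_priority(F, R, S):
    """h(F, R, S): like g, but divided by size so large objects lose priority."""
    return F / (S * R)


def time_since(now, last_access, floor=1e-6):
    """R: seconds since the last access, floored so that R is never zero."""
    return max(now - last_access, floor)


now = 1000.0
# Hypothetical physical pages: name -> (F accesses, last access time, S bytes)
pages = {"P10": (3, 980.0, 8000), "P11": (1, 950.0, 2000), "P12": (6, 990.0, 500)}
scores = {name: physical_priority(F, time_since(now, t), S) for name, (F, t, S) in pages.items()}
print(min(scores, key=scores.get))  # P11, the lowest-priority eviction candidate
```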

16
Evaluate and add new objects
(Figure: topics T1, T2; logical pages L1, L2, L3; physical pages P10, P11, P12, P20, P21, P22, P30, P31, P40, P41, P42; each layer ordered by priority from higher to lower. A new object D arrives; D is of higher priority.)

17
Replace an object
  1. Choose a candidate topic (T1).
  2. T1 has one logical page, L1; choose L1.
  3. L1 has three physical pages, P10, P11 and P12,
    where P12 is shared with L2.
  4. Choose a victim P from P10 and P11 (P12 is
    excluded because L2 still uses it).
  5. Replace P with the new page.
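The replacement steps can be sketched as a top-down walk of the hierarchy, lowest priority first, skipping physical pages that another logical page still uses. This is a sketch under assumptions: priorities are given as precomputed numbers, and falling through to the next logical page or topic when every member is shared is our assumption, since the slide covers only the case where an unshared page exists. The function name and all priority values are hypothetical.

```python
def choose_victim(topic_prio, topic_logical, logical_prio, logical_physical, physical_prio):
    """Return the physical page to evict, or None if every cached page is shared.

    topic_prio:       topic -> priority
    topic_logical:    topic -> list of its logical pages
    logical_prio:     logical page -> priority
    logical_physical: logical page -> set of its physical pages
    physical_prio:    physical page -> priority
    """
    for topic in sorted(topic_prio, key=topic_prio.get):                    # step 1
        for lp in sorted(topic_logical.get(topic, []), key=logical_prio.get):  # step 2
            others = [m for other, m in logical_physical.items() if other != lp]
            # step 3: drop physical pages shared with another logical page
            exclusive = [p for p in logical_physical[lp] if not any(p in m for m in others)]
            if exclusive:
                return min(exclusive, key=physical_prio.get)                # step 4
    return None


# The slide's example with made-up priority values (lower = evict first).
topic_prio = {"T1": 1.0, "T2": 3.0}
topic_logical = {"T1": ["L1"], "T2": ["L2", "L3"]}
logical_prio = {"L1": 1.0, "L2": 2.0, "L3": 4.0}
logical_physical = {
    "L1": {"P10", "P11", "P12"},
    "L2": {"P12", "P20", "P21", "P22"},
    "L3": {"P30", "P31"},
}
physical_prio = {"P10": 0.5, "P11": 0.2, "P12": 0.1, "P20": 0.3,
                 "P21": 0.4, "P22": 0.6, "P30": 0.7, "P31": 0.8}
print(choose_victim(topic_prio, topic_logical, logical_prio, logical_physical, physical_prio))
# P11: P12 has the lowest priority but is shared with L2
```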

18
Preliminary experiments
  • Replayed the access logs of our proxy server (Squid)
    • 30 clients, 30 days
    • 873,824 requests, 21.30 GB of data
  • 7 topics, priority ∈ 1..5
    • Significance factor (0, 2)
    • Measures the significance of each topic
  • Hit Rate (HR)
    • Percentage of requests satisfied by the cache
  • Profit Rate (PR), where … is the significance of
    the topic
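Both metrics can be computed from a replayed log once each request is marked as a hit or a miss. HR follows the slide's definition. The PR formula is not given in the slides, so the sketch assumes PR is a significance-weighted hit rate, with each request weighted by the significance of its topic. The `Request` record and the example log are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    topic: str
    hit: bool  # whether the replayed request was served from the cache


def hit_rate(requests):
    """HR: fraction of requests satisfied by the cache."""
    return sum(r.hit for r in requests) / len(requests) if requests else 0.0


def profit_rate(requests, significance):
    """PR (assumed form): significance-weighted hit rate.

    significance maps topic -> weight; topics missing from the map get weight 1.0.
    """
    total = sum(significance.get(r.topic, 1.0) for r in requests)
    gained = sum(significance.get(r.topic, 1.0) for r in requests if r.hit)
    return gained / total if total else 0.0


log = [Request("a", "XML", True), Request("b", "XML", True),
       Request("c", "Fuzzy", False), Request("d", "Fuzzy", False)]
significance = {"XML": 1.5, "Fuzzy": 0.5}  # hypothetical factors within (0, 2)
print(hit_rate(log), profit_rate(log, significance))  # 0.5 0.75
```

Under this assumed definition, caching pages from significant topics raises PR even when HR is unchanged, which is the trade-off raised by case 2.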

19
Baseline algorithm: LRV (Rizzo et al., 1998)
  • A physical-page-based algorithm
  • Uses size (S) to predict future accesses to
    incoming objects
  • Parameters considered
    • Access frequency (F)
    • Time interval (R)
    • Size of objects (S)

20
Results: hit rates (20% up)
(Figure: hit rate vs. cache space, as % of total unique data)
21
Results: profit rates (30% up)
(Figure: profit rate vs. cache space, as % of total unique data)
22
Conclusion and future work
  • Performance of caching proxies can be remarkably
    improved if cache contents are well organized
    and managed
  • We proposed a hierarchical model and a cache
    management scheme based on that model
  • Future work
    • Tuning various parameters to achieve better
      performance (logical page clustering, priority
      balancing significance and popularity, etc.)
    • More experiments