Using User Access Patterns for Semantic Query Caching - PowerPoint PPT Presentation

1 / 17

About This Presentation

Title:

Using User Access Patterns for Semantic Query Caching

Description:

help to anticipate and pre-fetch incoming queries. help for semantic query caching. ... pre-fetching queries based on user access graphs. integrating query pre ... – PowerPoint PPT presentation

Number of Views:62

Avg rating:3.0/5.0

Slides: 18

Provided by: cseY

Category:

more less

Transcript and Presenter's Notes

Title: Using User Access Patterns for Semantic Query Caching

1
Using User Access Patterns for Semantic Query
Caching

Qingsong Yao, and Aijun An
qingsong,ann_at_cs.yorku.ca
Department of Computer Science
York University
Toronto, Canada
2003.08

2
Topics of Discussion

User Access Patterns
Semantic Query Caching Problem and Solutions
Algorithms
Implementation and Experiments
Conclusion and Future Work

3
Background

All SQL queries submitted by a client or a user
have specific meaning, and the query execution
orders follow certain business logics or rules.
The applications are written by using certain
programming tools. The embedded business logics
ensure that the submitted queries have certain
formats and follow certain rules.
In a database-driven web site, each dynamic web
page corresponds to a set of queries. The web
visitors show certain navigation patterns, thus
the queries show certain orders.
User access patterns describe how a group of
users or a client application accesses the data
of a database, include
a collection of user access events which
represents the format of queries.
a collection of frequent user access graphs
which describes the query execution orders.
User access patterns can be mined from database
workload or business logics, and they can
help to rewrite certain SQL to gain fast
execution time.
help to tune the database system,
help to anticipate and pre-fetch incoming
queries.
help for semantic query caching.

4
User Access Patterns

User access patterns describe how a group of
users or a client application accesses the data
of a database, include
a collection of user access events associated
with the occurrence frequency and the parameter
distribution which describe the format of
queries.
a collection of frequent user access graphs
which describes the query execution orders.
User access patterns can be mined from database
workload or business logics, and they can
help to rewrite certain SQL to gain fast
execution time.
help to tune the database system, and generate
aalternative query execution path by adding
index, materialized view , statistics.
help to anticipate and pre-fetch incoming
queries.
help for semantic query caching.
help to find possible data distributions or
relations.

5
User Access Event and User Access Graph

A user access event represents a set of similar
queries. It contains a SQL template and a set of
parameters.
SQL Template each value of the SQL queries is
replaced by a wildcard character ()
Parameters the corresponding values of the
queries, and can be constants or variables
For example, event (select name from customer
where id,101) will retrieve customer 101 s
name, and event (select name from customer where
idd, cid) represent a set of queries that
retrieve the name of a given customer
A user access graph is a directed dependency
graph which represents the query execution order
each node is a user access event or user access
graph
each edge associates with a confidence value
contains a set of global variables shared by the
nodes
some nodes associate with actions which change
the value of global variables
Semantic relationship exists between the events
of a graph

6
An Example of User Access Graph
V31,V32,V34 are determined event.V34 is
parameter-determined by V33. V31 and V32 have the
same query predicate.
7
Problem and Solutions

Semantic Query Caching
Previous query results are cached at clients or
mediates.
Each cache associates with a formula to describe
the content.
By finding the semantic relationship between the
cache formula and the query predicate, cached
query result can help to evaluate incoming
queries.
In order to answer a query, a probe query is
performed on the cached data, as well as a
reminder query is performed on the server to
retrieve the data which is not in the caches.
Problem
no good cache selection and replacement policies.
cache matching time can not be ignored.
Solution
anticipate future queries according to request
sequence and user access graphs.
rewrite current query to answer future queries.

8
Problem and Solutions (2)

Given two consecutive parameterized queries u and
v, three kinds of solutions are proposed

PR
SEQ

Costs
Response Time Network Traffic Server
Processing Time and Costs.
Factors
The semantic relationship between u and v.
The result size of u and v.
The possibility that v follows u .

9
Semantic Relationship
rows
u v
columns
6. u and v are irrelevant.
10
Rewriting Algorithm

Assumption the query predicates of u and v are
both the conjunction of basic predicate units
var1 op var2 cons., where op in ,lt,gt,gt,lt
solutions
if v is undetermined, solutions
else if v is result-determined by u, solutions
SEQ
else if u and v are irrelevant, solutions
SEQ
else Gu weighted_directed_graph(u)
Gv weighted_directed_graph(v)
comm_diff(Gu,Gv,comm,diff)
rel relation(comm,diff)
choose solutions according to rel
generate rewriting queries
select a solution based on the overall costs.

11
Example

u1 select a1 from r1 where a21 and a3lt3
u2 select a1 from r1 where a21 and a3gt1,

u1gt select a1 from r1 where a2lt1 and 0lta2-1
and a3lt3
comm (a2,0,1),(0,a2,-1) difflt(a3,0,3),null
gt, ltnull,(0,a3,-1)gt, lt(a3,a2,2),nullgt,ltn
ull,(a2,a3,0)gt
Relationship partial-match
Rewriting queries u1 ? u2 select a1,a3 from r1
where a21 u1 select a1,a3 from r1 where a21
and a3 lt 3 u2 select a1 from r1 where a21
and a3gt3
12
Implementation - Architecture

SQL-Relay is an event-driven, rule-based database
gateway
Each connected user correspond to a state machine
which contains a set of states, variables and
user request sequence.
Each incoming query is one kind of user access
event.
Each event associates with a set of pre-defined
execution rules.
SQL-Relay contains a set of standard routine to
process a given execution rules.
Previous query results are cached for answering
incoming queries.
Two different kinds of caches global cache and
local cache.

13
Implementation-Cache Management
Two kinds of caches

A global cache aims to answer multiple request
from the clients.
always available to answer queries.
various cache update policies can be used
(immediate update, periodic update, or
deferred update) .
can not be replaced by other caches, and is
always available unless it is temporarily
disabled due to updating.
A local cache aims to answer the request from a
specific client.
has shorter life-time.
may be replaced by other local caches, and an
LRU-like replacement policy is used for cache
replacement.
is discarded when the underlying data is update.

14
Experiment Result (1)

Mining result for a client/server application
from one day's database queries log
12 instances of the application, and 9,344 SQL
queries.
190 user access events.
718 user request sequences belong to 21 frequent
user access graphs (support gt10).

q2,q3,q4 has horizontal-match relationship, can
be rewrite as select count() as num,
card_id, contract_last,
contract_first from customer where
cust_num1074
15
Experiment Result (2)

Client program is implemented by using java
language, simulates user request sequences based
on the user access graphs and the following
parameters
nClients the number of concurrent connected
clients.
nPaths the number of user request
sequences.
nRuns the number of runs.
errorFreq the frequency of a random event
occurs.
abortFreq the frequency of a sequence abort
to execution.
randomFreq the frequency of a random sequence
occurs.
sleepTimePerEvent average sleep time between
two queries, default 0.1s.
sleepTimePerPath average sleep time between
two sequences,default 0.5s.
sleepTimePerRun average sleep time between
two runs, default 3min.
Database server is MySQL version 4 which
implements a server-side query cache function,
and MySQL is configured with a 128Kbytes server
cache.
SQL-Relay is implemented by using java language,
and has 128Kbytes global cache, 4Kbytes local
cache per connected client.

16
Experiment Result (3)

Client program simulates user request sequences
based on the user access graphs. Database server
is MySQL version 4 which is configured with a
128Kbytes server cache. SQL-Relay is implemented
by using java language, and has 128Kbytes global
cache, 4Kbytes local cache per connected client
Comparison of cache performance under the
following conditions
executing queries without cache
executing queries with 128K server cache
pre-fetching queries based on user access graphs
integrating query pre-fetching and query
rewriting rules together

Our solution has better performance than
others.
cache hit frequency is higher that server-side
cache.
save network traffic by retrieving less data from
the server.
the rewritten queries have less server I/Os.

17
Experiment Result (4) TPC-W Benchmark

Pseudo code for order display web interaction
q1 select c_id from customer where
c_uname_at_c_uname and c_passwd_at_c_passwd
q2 select max(o_id) from orders where
o_c_id_at_c_id
q3 select customer., orders., address.
country. from customer,address,country,orders
where o_id_at_o_id and c_id_at_c_id and
o_bill_addraddr_id and addr_co_idco_id
q4 select address., country. from address,
country
where addr_id_at_a_ship_id and addr_co_idco_id
q5 select from order_line,item where
ol_i_idi_id and ol_o_id_at_o_id

split join query q3 into two queries
q3_1 select from customer where cid_at_c_id
q3_2 select orders., address. country. from
customer,address,country,orders
where o_id_at_o_id and o_bill_addraddr_id and
addr_co_idco_id
rewrite q1 to include the answer of q1 and q3_1
q1 select from customer where
c_uname_at_c_uname and c_passwd_at_c_passwd
reason c_uname and c_passwd is the key.
benefit access table customer only once, and
avoid join customer with other tables.
disadvantage retrieve more data when the
customer has no orders.
question can we rewrite q3_2 to include the
answer of q3_2 and q4?
select orders., a1., a2. c1., c2. from
address a1, address a2, country c1, country c2,
orders
where o_id_at_o_id and o_bill_addra1.addr_id and
a1.addr_co_idc1.co_id
and o_ship_addra2.addr_id and
a2.addr_co_idc2.co_id
reason o_ship_addr , addr_co_id are foreign
key.
benefit make use of foreign key constraints.
disadvantage introduce new joins.

18
Experiment Result (5) TPC-C Benchmark

Pseudo code for order status transaction
q1 EXEC SQL select count(c_id) INTO namecnt
from customer where
c_lastc_last AND c_d_idd_id AND c_w_idw_id
q2EXEC SQL DECLARE c_name CURSOR FOR
select c_balance, c_first, c_middle, c_id
from customer where c_lastc_last AND
c_d_idd_id AND c_w_idw_id
order by c_first
EXEC SQL OPEN c_name
q3 if (namecnt2) namecnt // Locate midpoint
customer
for (n0 nltnamecnt/2 n)
q4 EXEC SQL FETCH c_name INTO c_balance,
c_first, c_middle, c_id
q5 EXEC SQL CLOSE c_name

Solution
Execute q2 instead of q1 when q1 is submitted.
q1 can be answered by retrieving the number of
rows returned by the cursor.
Reason
q2 always follows q1 (i.e., the confidence is
1.0).
q1 and q2 have similar query predicate (except
the order by clause).
q2 contains q1 (i.e., the answer of q1 can be
answered by that of q2).
Benefit
Only submit one query to the server.
Only access the base relation once.
Improve performance when no customer meets the
condition since the original solution search the
database twice.
Disadvantage
Not a general solution, every DB has own function
to retrieve the number of rows of a cursor.
More
DB server can take advantage of such
relationship.

19
Conclusion and Future Work

Our solution has the several advantages
our caching algorithms are based on the query
execution orders and the semantic relationship
between queries, which are better than the
selection policies based on the global query
reference statistics.
It separates global cache with local cache, and
which will result in a better cache hit ratio.
Our SQL-Relay application is flexible and
extendable where various caching and rewriting
rules can be added and tested.
Future work
Finding user access pattern from database
workload.
Exploring more ways to use user access patterns.