The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS

Provided by: prad158

Learn more at: https://pages.cs.wisc.edu

1
The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS

Vuk Ercegovac
David J. DeWitt
Raghu Ramakrishnan

2
Applications Combining Text and Relational Data
Query:
SELECT SCORE, P.id FROM Products P
WHERE P.type = 'PDA' AND CONTAINS(P.complaint, 'short battery life', SCORE)
ORDER BY SCORE DESC

Score   P.id
0.9     123
0.87    987
0.82    246

ProductComplaints
How should such an application be expected to perform?

3
Possibilities for Benchmarking
Workload \ Measure   Quality              Response Time / Throughput
Relational           N/A                  TPC [3], AS3AP [10], Set Query [8]
Text                 TREC [2], VLC2 [1]   FTDR [4], VLC2 [1]
Relational + Text    ??                   TEXTURE

1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov
3. http://www.tpc.org
4. S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufmann, 2nd edition, 1993.
8. P. O'Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
10. C. Turbyfill, C. Orji, and D. Bitton. AS3AP - a Comparative Relational Database Benchmark. IEEE Compcon, 1989.

4
Contributions of TEXTURE

Design a micro-benchmark to compare response time using a mixed relational and text query workload
Develop TextGen to synthetically grow a text collection given a real text collection
Evaluate TEXTURE on 3 commercial systems

5
Why a Micro-benchmark Design?

A fine level of control for experiments is needed to differentiate effects due to:
  How text data is stored
  How documents are assigned a score
  Optimizer decisions

6
Why use Synthetic Text?

Allows for systematic scale-up
Users' current data set may be too small
Users may be more willing to share synthetic data

We show empirically that measurements on synthetic data are close to the same measurements on real data

7
A Note on Quality

Measuring quality is important!
Easy to quickly return poor results
We assume that the three commercial systems
strive for high quality results
Some participated at TREC
Large overlap between result sets

8
Outline

TEXTURE Components
Evaluation
Synthetic Text Generation

9
TEXTURE Components
[Figure: System A, System B]

Attribute   Type         Use
num_id      Relational   pkey
num_u       Relational   un-clustered index
num_05      Relational   un-clustered index
num_5       Relational   un-clustered index
num_50      Relational   un-clustered index
txt_short   Text         display
txt_long    Text         body

10
Overview of Data

Schema based on the Wisconsin Benchmark [5]
  Used to control relational predicate selectivity
Relational attributes populated by DBGen [6]
Text attributes populated by TextGen (new)
  Input: D (document collection), m (scale-up factor)
  Output: D' (document collection with |D| x m documents)
  Goal: same response times for workloads on D' and the corresponding real collection

5. D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.
6. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-record Synthetic Databases. ACM SIGMOD, 1994.

11
Overview of Queries

Query workloads are derived from query templates with the following parameters:
Text expressions
  Vary number of keywords, keyword selectivity, and type of expression (e.g., phrase, Boolean)
  Keywords chosen from text collection
Relational expression
  Vary predicate selectivity, join condition selectivity
Sort order
  Choose between relational attribute or score
Retrieve ALL or TOP-K results
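
A template of this kind could be instantiated roughly as follows. This is a minimal Python sketch, not the benchmark's actual generator: the helper names render_query and make_workload, the fixed predicate on num_5, the fallback sort on num_id, and the use of the standard-SQL FETCH FIRST clause for top-k are all assumptions made for illustration. The CONTAINS(..., SCORE) form follows the query notation used in these slides, and real systems each have their own text-search syntax.

```python
import random

def render_query(keywords, expr_type, rel_value, sort_by="score", top_k=None):
    """Render one single-relation query from template parameters (hypothetical helper)."""
    if expr_type == "phrase":
        text_expr = " ".join(keywords)
    elif expr_type in ("AND", "OR"):
        text_expr = f" {expr_type} ".join(keywords)
    else:
        raise ValueError(f"unknown expression type: {expr_type}")
    # Sort either by relevance score or by a relational attribute (num_id is an assumption).
    order_by = "SCORE DESC" if sort_by == "score" else "num_id"
    sql = (
        "SELECT SCORE, num_id, txt_short FROM R "
        f"WHERE num_5 = {rel_value} "
        f"AND CONTAINS(R.txt_long, '{text_expr}', SCORE) "
        f"ORDER BY {order_by}"
    )
    if top_k is not None:
        # Assumed top-k syntax; vendors differ here.
        sql += f" FETCH FIRST {top_k} ROWS ONLY"
    return sql

def make_workload(vocabulary, n_queries=100, n_keywords=2, expr_type="AND",
                  rel_value=3, sort_by="score", top_k=None, seed=0):
    """Build a workload by drawing keywords from the collection's vocabulary."""
    rng = random.Random(seed)
    return [
        render_query(rng.sample(vocabulary, n_keywords), expr_type,
                     rel_value, sort_by, top_k)
        for _ in range(n_queries)
    ]
```

Controlling keyword selectivity would additionally require choosing keywords by document frequency, which this sketch leaves out.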

12
Example Queries

Example of a single-relation, mixed relational and text query that sorts according to a relevance score:

SELECT SCORE, num_id, txt_short FROM R
WHERE num_5 = 3 AND CONTAINS(R.txt_long, 'foo bar', SCORE)
ORDER BY SCORE DESC

Example of a join query, sorting according to a relevance score on S.txt_long:

SELECT S.SCORE, S.num_id, S.txt_short FROM R, S
WHERE R.num_id = S.num_id AND S.num_05 = 2
AND CONTAINS(S.txt_long, 'foo bar', S.SCORE)
ORDER BY S.SCORE DESC

13
Outline

TEXTURE Components
Evaluation
Synthetic Text Generation

14
Overview of Experiments

How is response time affected as the database
grows in size?
How is response time affected by sort order and
top-k optimizations?
How do the results change when input collection
to TextGen differs?

15
Data and Query Workloads

TextGen input is TREC AP Vol. 1 [2] and VLC2 [1]
Output: relations with 1, 2.5, 5, 7.5, and 10 x 84,678 tuples
  Corresponds to 250 MB to 2.5 GB of text data
Text-only queries
  Low (< 0.03%) vs. high selectivity (< 3%)
  Phrases, OR, AND
Mixed, single-relation queries
  Low (< 0.01%) vs. high selectivity (5%)
  Pair with all text-only queries
Mixed, multi-relation queries
  2, 3 relations; vary text attribute used; vary selectivity
Each query workload consists of 100 queries

1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov

16
Methodology for Evaluation

Set up database and query workloads
Run workload per system multiple times to obtain warm numbers
  Discard first run, report average of remaining
Repeat for all systems (A, B, C)
Platform: Microsoft Windows 2003 Server, dual-processor 1.8 GHz AMD, 2 GB of memory, 8 x 120 GB IDE drives
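
The warm-run measurement described above can be sketched as a small timing harness. This is an illustrative Python sketch under stated assumptions: run_query stands in for whatever client call submits a query to the system under test, and the default of four runs is a placeholder, since the slides do not say how many runs were used.

```python
import time
from statistics import mean

def time_workload(run_query, queries, runs=4):
    """Run the whole workload `runs` times and return the mean warm time in seconds.

    The first (cold) run is discarded; the remaining runs are averaged.
    `run_query` is a hypothetical callable that executes one query.
    """
    if runs < 2:
        raise ValueError("need at least two runs to discard the first one")
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        for query in queries:
            run_query(query)
        elapsed.append(time.perf_counter() - start)
    return mean(elapsed[1:])
```

The same function would be called once per system (A, B, C) with that system's own run_query.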

17
Scaling Text-Only Workloads

How does response time vary per system as the data set scales up?
Query workload: low text selectivity (0.03%)
Text data: synthetic, based on TREC AP Vol. 1

18
Mixed Text/Relational Workloads

Drill down on scale factor 5 (450K tuples)
  Query workload Low: text selectivity (0.03%)
  Query workload High: text selectivity (3%)
Do the systems take advantage of the relational predicate for mixed workload queries?
  Query workload Mix: High text, low relational selectivity (0.01%)

System   Low   High   Mix
A        2.8   71     69 (97)
B        30    140    97 (69)
C        2.6   28     21 (75)

Seconds per system and workload (synthetic TREC)

19
Top-k vs. All Results

Compare retrieving all vs. top-k results
Query workload is Mix from before
  High selectivity text expression (3%)
  Low selectivity relational predicate (0.01%)

System   All   Top-k
A        69    2.6
B        97    96
C        28    2.2

Seconds per system and workload (450K tuples, synthetic TREC)

20
Varying Sort Order

Compare sorting by score vs. sorting by relational attribute
When retrieving all, results similar to previous
Results for retrieving top-k shown below

System   Score   Relational
A        2.6     2.7
B        96      715
C        2.2     2.2

Seconds per system and workload (450K tuples, synthetic TREC)

21
Varying the Input Collection

What is the effect of different input text collections on response time?
Query workload: low text selectivity (0.03%)
All results retrieved
Text data: synthetic TREC and VLC2

System   Synthetic TREC   Synthetic VLC2
A        2.9              1.2
B        30               3.6
C        2.5              1.6

Seconds per system and collection (450K tuples)

22
Outline

Benchmark Components
Evaluation
Synthetic Text Generation

23
Synthetic Text Generation

TextGen
  Input: document collection D, scale-up factor m
  Output: document collection D' with |D| x m documents
Problem: given documents D, how do we add documents to obtain D'?
Goal: same response times for workloads on D' and the corresponding real collection C, |C| = |D'|
Approach: extract features from D and draw |D'| samples according to features

24
Document Collection Features

Features considered
  W(w, c): word distribution
  G(n, v): vocabulary growth
  U, L: number of unique, total words per document
  C(w1, w2, ..., wn, c): co-occurrence of word groups
Each feature is estimated by a model
  Ex.: Zipf [11] or empirical distribution for W
  Ex.: Heaps' Law for G [7]
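
For reference, the two named models have standard textbook forms: under Zipf's law the frequency of the word of rank r is proportional to 1 / r^s, and under Heaps' law the vocabulary size after n words is approximately K * n^beta. The Python sketch below shows both. The function names and the default parameter values (s, k, beta) are placeholders chosen for illustration, not values reported in the slides; in practice they would be fitted to the input collection.

```python
import random

def zipf_weights(vocab_size, s=1.0):
    """Normalized Zipf probabilities for word ranks 1..vocab_size."""
    raw = [1.0 / (rank ** s) for rank in range(1, vocab_size + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def heaps_vocabulary_size(n_words, k=10.0, beta=0.5):
    """Expected number of distinct words after n_words tokens (Heaps' law)."""
    return k * n_words ** beta

def sample_words(vocabulary, weights, count, rng):
    """Draw `count` words from the vocabulary according to the word distribution W."""
    return rng.choices(vocabulary, weights=weights, k=count)
```

An empirical distribution for W would simply replace zipf_weights with observed word counts from the input collection.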

7. H. S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
11. G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.

25
Process to Generate D'

Pre-process: estimate features
  Depends on model used for feature
Generate |D'| documents
  Generate each document by sampling W according to U and L
  Grow vocabulary according to G
Post-process: swap words between documents in order to satisfy co-occurrence of word groups C
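
The first two steps can be sketched in Python as follows. This is a simplified illustration and not the TextGen implementation: it uses an empirical word distribution for W, the average document length for L, and a Heaps-style growth rule for G with placeholder constants k and beta. It ignores U (unique words per document) and omits the co-occurrence post-processing step entirely. All function names are hypothetical.

```python
import random
from collections import Counter

def estimate_features(docs):
    """Pre-process: empirical word counts (W) and average document length (L)."""
    counts = Counter(word for doc in docs for word in doc)
    words = list(counts)
    weights = [counts[w] for w in words]
    avg_len = round(sum(len(doc) for doc in docs) / len(docs))
    return words, weights, avg_len

def random_word(rng, length=6):
    """A new vocabulary entry made of random lowercase letters (assumed format)."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))

def generate_collection(docs, m, k=10.0, beta=0.5, seed=0):
    """Generate len(docs) * m synthetic documents from the features of `docs`."""
    rng = random.Random(seed)
    words, weights, avg_len = estimate_features(docs)
    known = set(words)
    synthetic, total_words = [], 0
    for _ in range(int(len(docs) * m)):
        total_words += avg_len
        # Grow the vocabulary until it reaches the Heaps-style target size.
        target_vocab = int(k * total_words ** beta)
        while len(words) < target_vocab:
            candidate = random_word(rng)
            if candidate not in known:
                known.add(candidate)
                words.append(candidate)
                weights.append(1)  # new words start out rare
        # Sample one document of average length from the word distribution.
        synthetic.append(rng.choices(words, weights=weights, k=avg_len))
    return synthetic
```

Each input document is a list of tokens, so tokenization is assumed to have happened beforehand.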

26
Feature-Model Combinations

Considered 3 instances of TextGen, each a combination of features/models

TextGen      W (word distr.)   G (vocab.)   L (length)   U (unique)   C (co-occur.)
Synthetic1   Zipf              Heaps        Average      N/A          N/A
Synthetic2   Empirical         Heaps        Average      Average      N/A
Synthetic3   Empirical         Heaps        Average      Average      Empirical

27
Which TextGen is a Good Generator?

Goal: response time measured on synthetic (S) and real (D) collections should be similar across systems
Does the use of randomized words in D affect response time accuracy?
How does the choice of features and models affect response time accuracy as the data set scales?

28
Use of Random Words

Words are strings composed of a random permutation of letters
Random words are useful for
  Vocabulary growth
  Sharing text collections
Do randomized words affect measured response times?
  What is the effect on stemming, compression, and other text processing components?

29
Effect of Randomized Words

Experiment: create two TEXTURE databases and compare across systems
  Database AP: based on TREC AP Vol. 1
  Database R-AP: randomize each word in AP
  Query workload: low and high selectivity keywords
Result: response times differ on average by < 1%, not exceeding 4.4%
Conclusion: using random words is reasonable for measuring response time
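
One way to build a randomized collection such as R-AP is to replace every distinct word with a shuffled version of its own letters, using the same replacement everywhere the word occurs so that word frequencies are preserved. The Python sketch below assumes this interpretation of "randomize each word"; the slides do not specify the exact procedure, and the function name is hypothetical.

```python
import random

def randomize_collection(docs, seed=0):
    """Replace each distinct word by a random permutation of its letters.

    Returns the randomized documents and the word mapping that was used.
    """
    rng = random.Random(seed)
    mapping = {}

    def scramble(word):
        # Reuse the same scrambled form for every occurrence of a word.
        if word not in mapping:
            letters = list(word)
            rng.shuffle(letters)
            mapping[word] = "".join(letters)
        return mapping[word]

    randomized = [[scramble(word) for word in doc] for doc in docs]
    return randomized, mapping
```

Note that two distinct words that are anagrams of each other can collide under this scheme, so a careful version would check for and resolve such collisions.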

30
Effect of Features and Models

Experiment: compare response times over same-sized synthetic (S) and real (D) collections
  Sample s documents of D
  Use TextGen to produce S at several scale factors
  |S| = 10%, 25%, 50%, 75%, and 100% of |D|
  Compare response time across systems
Must repeat for each type of text-only query workload
Used as framework for picking features/models
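
A comparison like this reduces to computing how far the synthetic response times are from the real ones at each scale factor. The Python sketch below reports the mean and maximum relative difference in percent. The function names are hypothetical, and the choice of relative difference against the real measurement is an assumption about how the comparison might be scored.

```python
def relative_differences(real_times, synthetic_times):
    """Percent difference of each synthetic time relative to the real time."""
    if len(real_times) != len(synthetic_times):
        raise ValueError("need one synthetic measurement per real measurement")
    return [abs(s - r) / r * 100.0 for r, s in zip(real_times, synthetic_times)]

def summarize(real_times, synthetic_times):
    """Return (mean, max) percent difference across all scale factors."""
    diffs = relative_differences(real_times, synthetic_times)
    return sum(diffs) / len(diffs), max(diffs)
```

The inputs would be paired response times, one pair per scale factor, measured on the same system with the same query workload.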

31
TextGen Evaluation Results

How does response time measured on real data compare to the synthetic TextGen collections?
Query workload: low selectivity text-only queries (0.03%)
Graph is for System A
Similar results obtained for other systems

32
Future Work

How should quality measurements be incorporated?
Extend the workload to include updates
Allow correlations between attributes when
generating database

33
Conclusion

We propose TEXTURE to fill the gap seen by
applications that use mixed relational and text
queries
We can scale-up a text collection through
synthetic text generation in such a way that
response time is accurately reflected
Results of evaluation illustrate significant
differences between current commercial relational
systems

34
References

1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov
3. http://www.tpc.org
4. S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufmann, 2nd edition, 1993.
5. D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.
6. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-record Synthetic Databases. ACM SIGMOD, 1994.
7. H. S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
8. P. O'Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
9. K. A. Shoens, A. Tomasic, H. Garcia-Molina. Synthetic Workload Performance Analysis of Incremental Updates. In Research and Development in Information Retrieval, 1994.
10. C. Turbyfill, C. Orji, and D. Bitton. AS3AP - a Comparative Relational Database Benchmark. IEEE Compcon, 1989.
11. G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.