Estimating Costs of Path Expression Evaluation in Distributed Object Databases


1
Estimating Costs of Path Expression Evaluation in
Distributed Object Databases
  • Gabriela Ruberg, Fernanda Baião, Marta Mattoso
  • {gruberg, baiao, marta}@cos.ufrj.br

2
Outline
  • Motivation
  • Path Expression Evaluation
  • An Analytical Cost Model for Query Processing in
    Distributed Object Databases
  • Selectivity Factor of Path Expressions
  • Cost Model Analysis
  • Related Work
  • Conclusions and Ongoing Work

3
Motivation
  • Modeling query processing performance is a hard
    task in object DBMSs
      • partial participation of collections in
        relationships
      • NxM relationships by the use of set attributes
      • object sharing
      • object clustering on disk
      • pointer-based (navigation) algorithms
      • binary and n-ary query operators ...

4
Strategies for Path Expression Evaluation
  • Basic dimensions
      • Evaluation direction: forward or reverse
      • Query operator (algorithm): binary (join) or
        n-ary (naïve)
  • Performance may vary significantly with
      • Selectivity factors of selection predicates
      • Participation of collections in PE relationships
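
The two evaluation directions can be illustrated with a minimal Python sketch. It is not taken from the presentation: the `Obj` class, the function names, and the toy data are all hypothetical, and the reverse strategy is approximated by one set-based (semi-join style) step per relationship over in-memory lists. The path `part_of.document` mirrors the one used in the validation query later in the deck.

```python
from dataclasses import dataclass, field

@dataclass
class Obj:
    # Hypothetical in-memory object: an id, attributes, and named references.
    oid: int
    attrs: dict = field(default_factory=dict)
    refs: dict = field(default_factory=dict)  # relationship name -> list of Obj

def forward_naive(roots, path, predicate):
    """Forward, n-ary (naive) evaluation: chase pointers from every root."""
    result = []
    for root in roots:
        frontier = [root]
        for rel in path:
            frontier = [t for o in frontier for t in o.refs.get(rel, [])]
        if any(predicate(t) for t in frontier):
            result.append(root.oid)
    return result

def reverse_binary(levels, path, predicate):
    """Reverse evaluation: filter the last collection first, then apply one
    binary step per relationship, moving back toward the root collection.
    levels[0] is the root collection; levels[i + 1] is reached via path[i]."""
    qualifying = {o.oid for o in levels[-1] if predicate(o)}
    for depth in range(len(path) - 1, -1, -1):
        qualifying = {
            s.oid for s in levels[depth]
            if any(t.oid in qualifying for t in s.refs.get(path[depth], []))
        }
    return sorted(qualifying)

# Toy data (assumed): a3 shares c1 with a1, and a4 takes no part in part_of.
d1, d2 = Obj(1, attrs={"year": 1985}), Obj(2, attrs={"year": 1995})
c1, c2 = Obj(1, refs={"document": [d1]}), Obj(2, refs={"document": [d2]})
a1, a2 = Obj(1, refs={"part_of": [c1]}), Obj(2, refs={"part_of": [c2]})
a3, a4 = Obj(3, refs={"part_of": [c1]}), Obj(4)

atomic, composite, documents = [a1, a2, a3, a4], [c1, c2], [d1, d2]
path = ["part_of", "document"]
is_old = lambda d: d.attrs["year"] < 1990

forward_result = forward_naive(atomic, path, is_old)
reverse_result = reverse_binary([atomic, composite, documents], path, is_old)
```

Both strategies return the same answer on this data. What differs is how many objects each one touches, which is what a cost model has to capture: the forward traversal visits every root, while the reverse one starts from the objects that satisfy the predicate.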

5
An Analytical Cost Model
  • Four model pillars
      • Object Data Model
      • Data Storage Model
      • Query Execution Model
      • Distributed Database Design

6
An Analytical Cost Model
  • Object Data Model
      • Set attributes
      • Object sharing
      • Path expression selectivity factors
  • Data Storage Model
      • Physical clustering of objects on disk
      • Collection extension
      • Relationship references

7
An Analytical Cost Model
  • Query Execution Model
      • Typical strategies and algorithms
      • Small memory hypothesis
      • IO reload overhead due to object sharing
  • Database Distribution Design
      • Several techniques for data fragmentation
      • Object allocation
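
The interaction between a small buffer and object sharing can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the cost model's formula: the LRU replacement policy, the function name `count_page_reads`, and the page reference strings are all hypothetical. It counts how many page reads a traversal causes when shared target pages are revisited.

```python
from collections import OrderedDict

def count_page_reads(page_refs, buffer_pages):
    """Count disk page reads for a sequence of page references,
    assuming an LRU buffer that holds `buffer_pages` pages."""
    buffer = OrderedDict()
    reads = 0
    for page in page_refs:
        if page in buffer:
            buffer.move_to_end(page)        # hit: mark as most recently used
        else:
            reads += 1                      # miss: page must be (re)loaded
            buffer[page] = True
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)  # evict least recently used page
    return reads

# Four target pages, each referenced by three source objects (sharing degree 3).
interleaved = [0, 1, 2, 3] * 3                     # sources visited in arbitrary order
clustered = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]   # sources grouped by target page

small_interleaved = count_page_reads(interleaved, buffer_pages=2)
small_clustered = count_page_reads(clustered, buffer_pages=2)
large_interleaved = count_page_reads(interleaved, buffer_pages=4)
```

With a two-page buffer, the interleaved order reloads every shared page on each visit. A larger buffer or a clustered access order reads each page only once.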

8
Cost Model Overview
[Diagram: components labeled DB STATS, SELECTIVITY ESTIMATES, DATA-VOLUME ESTIMATES, and IO, CPU, and COMMUNICATION COSTS ESTIMATES]

9
Selectivity of Path Expressions
  • Estimating selectivity factors is an essential
    task in cost-based optimization techniques
  • Combination of
      • Evaluation direction
      • Participation of object collections in PE
        relationships
      • Selectivity of nested predicates
  • Traditional prediction methods present high
    computational costs and may lead to errors
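
How participation and nested-predicate selectivity might combine can be sketched with a textbook-style estimate in Python. This is not the method proposed in the presentation. It assumes uniformly distributed, independent references, and the function name and the per-step parameters (`participation`, `fan_out`) are hypothetical.

```python
def path_selectivity(steps, predicate_selectivity):
    """Estimate the fraction of root objects that reach at least one object
    satisfying the nested predicate at the end of the path.

    steps: one (participation, fan_out) pair per relationship, first to last.
      participation: fraction of source objects with at least one reference
      fan_out: average number of references per participating object
    Assumption: references are uniform and independent.
    """
    q = predicate_selectivity  # fraction of qualifying objects at the path end
    for participation, fan_out in reversed(steps):
        # A participating object qualifies if at least one of its
        # fan_out references points to a qualifying object.
        q = participation * (1.0 - (1.0 - q) ** fan_out)
    return q

# Two single-valued relationships; only half of the roots participate in the first.
estimate = path_selectivity([(0.5, 1), (1.0, 1)], predicate_selectivity=0.1)

# One set-valued relationship with two references per object.
estimate_set = path_selectivity([(1.0, 2)], predicate_selectivity=0.5)
```

Even under these simplifying assumptions, ignoring partial participation would double the first estimate. That is the kind of error the slide attributes to estimates that leave participation out.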

10
Selectivity of Path Expressions
  • An example
  • Traditional estimation methods
  • Our method
[Figure: example for the path expression labeled "PEa.to-t (t)"; the only surviving labels are "Measured", "300000", and "37"]

11
Selectivity of Path Expressions
[Figure: diagram; only the label "START" survives]

12
Fragmentation Effects
  • Cost models for centralized query processing
    cannot be directly applied in a distributed context
  • Some data fragments may be disregarded beforehand
    during query execution
  • Different OO fragmentation techniques
      • Horizontal Fragmentation: primary or derived
      • Vertical Fragmentation
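
The three fragmentation techniques can be illustrated with a minimal Python sketch over plain dictionaries. The helper names, the attribute names (`year`, `title`, `document`), and the sample objects are all assumptions made for illustration and are not taken from the presentation.

```python
def primary_horizontal(objects, predicates):
    """Primary horizontal fragmentation: split a collection by predicates
    on its own attributes."""
    return {name: [o for o in objects if pred(o)]
            for name, pred in predicates.items()}

def derived_horizontal(members, owner_fragments, ref):
    """Derived horizontal fragmentation: place each member in the fragment
    of the owner object it references through attribute `ref`."""
    owner_fragment = {o["oid"]: name
                      for name, frag in owner_fragments.items() for o in frag}
    fragments = {name: [] for name in owner_fragments}
    for m in members:
        fragments[owner_fragment[m[ref]]].append(m)
    return fragments

def vertical(objects, attribute_groups):
    """Vertical fragmentation: project objects onto attribute groups,
    repeating the oid in every fragment so objects can be reassembled."""
    return {name: [{"oid": o["oid"], **{a: o[a] for a in attrs}} for o in objects]
            for name, attrs in attribute_groups.items()}

# Hypothetical data: documents, and composite parts that reference them by oid.
docs = [{"oid": 1, "year": 1985, "title": "A"},
        {"oid": 2, "year": 1995, "title": "B"}]
parts = [{"oid": 10, "document": 1},
         {"oid": 11, "document": 2},
         {"oid": 12, "document": 1}]

doc_frags = primary_horizontal(docs, {"old": lambda d: d["year"] < 1990,
                                      "new": lambda d: d["year"] >= 1990})
part_frags = derived_horizontal(parts, doc_frags, "document")
doc_columns = vertical(docs, {"dates": ["year"], "titles": ["title"]})
```

A query restricted to `year < 1990` only needs the `old` fragments here. This is the sense in which some fragments can be disregarded before execution, and it is why a centralized cost estimate would overcount.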

13
Fragmentation Effects
14
Cost Model Analysis
  • Experimental validation
      • OO7 benchmark
      • GOA ODBMS prototype at COPPE/UFRJ
  • Cost model estimates are very close to
    experimental results in all scenarios
  • Simulated scenarios have been obtained by varying
    cost parameters

15
Experimental Validation
  • IO analysis in OO7 centralized database

16
Experimental Validation
  • Distributed database (horizontal fragmentation)
  • Q1-Fa.part_of.document(date < 10/01/1990)

17
Simulated Scenario
  • Object-sharing degree versus IO costs

18
Related Work
  • Physical object clustering
      • Gardarin, Gruser, and Tang, VLDB 1995
  • Different PE evaluation strategies
      • Gardarin, Gruser, and Tang, VLDB 1996
      • Ozkan, Dogac, and Altinel, Journal of Database
        Management 1996
  • Selectivity factors of path expressions
      • Cho, Park, Whang, and Son, Information Systems
        1996
      • Bertino and Foscoli, IEEE TKDE 1997
      • Cho, Han, Hong, and Whang, CIKM 2000
  • Cost metrics to support distributed object
    database design
      • Fung, Karlapalem, and Li, DASFAA 1997
      • Bellatreche, Karlapalem, and Basak, DEXA 1998
      • Bellatreche, Karlapalem, and Li, ER 1998
      • Ezeife and Zheng, IDEAS 1999

19
Conclusions
  • A realistic cost model is not obtained with a
    simple combination of relevant issues into a
    single model
  • These issues are strongly related to each other,
    therefore they have to be remodeled.

20
Conclusions
  • Traditional methods for the estimation of PE
    selectivity are not suited for frequent queries
      • Low selectivity factors
  • It is important to consider several alternative
    strategies (algorithms, evaluation direction) for
    path expression evaluation
      • Otherwise the cost model can be biased toward bad
        choices

21
Contributions
  • A realistic cost model that encompasses several
    issues in object query processing
      • PE selectivity
      • Participation of object collections in PE
        relationships
      • NxM relationships
      • Set attributes and object sharing
      • Typical PE evaluation strategies
      • IO overhead in small memory hypothesis
      • Distributed database design
      • Communication costs

yet fairly simple!
22
Ongoing Work
  • Cost model extensions
      • Semistructured and XML data
      • Indexes
      • Data replication
      • New algorithms, etc.
  • Path expression evaluation in distributed XML
    databases with NxM relationships
      • XLink and linkbases

23
Estimating Costs of Path Expression Evaluation in
Distributed Object Databases
  • Gabriela Ruberg, Fernanda Baião, Marta Mattoso
  • {gruberg, baiao, marta}@cos.ufrj.br
  • Detailed version available at
      • http://www.cos.ufrj.br/gruberg/ruberg2001_english.pdf