1
Overview of Query Evaluation
  • Chapter 12, Ramakrishnan & Gehrke
  • (Sections 12.1-12.3)

2
What will you learn from this lecture?
  • Query Evaluation basics
  • System catalog (revisit)
  • Generic techniques for operator evaluation
  • Access paths
  • Implementation choices for select/project.

3
Some basics to be borne in mind.
  • Recall: there are 3 alternatives for representing data entries.
  • An index may be clustered or unclustered.
  • An index may be dense or sparse.
  • E.g., students(ID, name, addr, dept) file sorted
    on ID and indexed on ID (alternative 2) and on
    dept. (alt. 3). ID index sparse dept. index
    dense.
  • Index on ID is a primary index while that on
    dept. is a secondary index.
  • A file may have multiple indexes.

4
Schema for Examples
Songs (sid integer, sname string, genre string, year date)
Ratings (uid integer, sid integer, time date, rating integer)
  • Assume the following sizes
  • Songs
  • Each tuple is 50 bytes long, 80 tuples per page, 500 pages. (500 bucket pages if it had a hash index with 80 tuples per bucket page and data entries of alternative 1. What are we assuming here?)
  • Ratings
  • Each tuple is 40 bytes long, 100 tuples per page, 1000 pages. (1000 leaf pages if it had a B+ tree index, with data entries of alternative 1 and pages filled to capacity.) A quick check of the implied tuple counts follows.
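As a quick check of these numbers, here is a minimal arithmetic sketch (Python, not part of the original slides; the variable names are made up) deriving the tuple counts implied by the stated sizes:

    # Sizes as stated on the slide.
    songs_pages, songs_tuples_per_page = 500, 80
    ratings_pages, ratings_tuples_per_page = 1000, 100

    # Implied cardinalities, used in the cost estimates later on.
    songs_tuples = songs_pages * songs_tuples_per_page        # 40,000
    ratings_tuples = ratings_pages * ratings_tuples_per_page  # 100,000
    print(songs_tuples, ratings_tuples)                       # 40000 100000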

5
System catalog
  • maintains important metadata about all data in
    the db.
  • what is data?
  • application data in tables
  • also all indexes for the tables
  • system catalog = catalog tables = data dictionary = catalog (all names for the same thing).

6
System catalog
  • what info. is maintained in the catalog?
  • static info.
  • Per table: table name, file name, file structure (e.g., sorted file vs. heap file), attribute name(s) and type(s), index name(s), primary and foreign key constraints.
  • Per index: index name and structure (e.g., hash vs. B+ tree, clustered or not), search key attribute(s).
  • auxiliary info. on views.

7
System catalog
  • statistical info., per relation:
  • cardinality = # tuples
  • size = # pages
  • index cardinality = # distinct keys
  • index size = # index (leaf) pages
  • index height (only for tree indexes)
  • index range = min/max key values.
  • most of the static info. stored as a table
    (already discussed).
  • may maintain additional info., e.g., histograms (a per-relation stats record is sketched below).
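A minimal sketch of how such per-relation statistics could be represented in memory; the field names and example values below are hypothetical (loosely echoing the textbook's NTuples/NPages terminology), not the actual catalog schema:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RelationStats:
        ntuples: int                   # cardinality: number of tuples
        npages: int                    # size: number of pages
        nkeys: int                     # index cardinality: number of distinct keys
        index_pages: int               # index size: number of (leaf) index pages
        index_height: Optional[int]    # index height; None for non-tree indexes
        low: Optional[object] = None   # index range: min key value
        high: Optional[object] = None  # index range: max key value

    # Illustrative entry for Songs from the running example (values assumed).
    songs_stats = RelationStats(ntuples=40_000, npages=500,
                                nkeys=40_000, index_pages=500, index_height=3)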

8
Generic Techniques for Operator Evaluation
  • indexing: use an index to examine only the tuples satisfying a given selection/join condition (also called an index probe).
  • iteration: scan a table (i.e., its data pages) to retrieve the tuples satisfying a condition (table scan), OR scan the index pages and examine the data entries therein (index scan).
  • when is the latter feasible? why is it a good idea?
  • the actual index structure is NOT used (i.e., index scan ≠ index probe).
  • partitioning: sorting and hashing are used for this purpose (commonly needed for group-bys, duplicate elimination, etc.).

9
Generic Techniques reviewed
  • Table scan: read the whole table (usually, from disk).
  • Index scan: read all the data entries in the index file (makes sense for alternatives 2 and 3).
  • We have to read much less than for a table scan.
  • Can we use an index scan for alternative 1?
  • What if we had alternative 2 but a sparse index?
  • Index probe: use the index to home in on the tuples satisfying the selection condition(s).
  • Typically, the best option for an equality selection (see the sketch below).
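To make the three techniques concrete, here is a hedged Python sketch (not from the slides); the heap-file/index interface it uses (pages(), data_entries(), probe()) is an assumption for illustration only:

    def table_scan(heap_file, pred):
        # Iteration over the data pages: test every tuple in the table.
        for page in heap_file.pages():
            for tup in page:
                if pred(tup):
                    yield tup

    def index_scan(index, pred):
        # Iteration over the index pages: examine all data entries
        # (sensible for alternatives 2 and 3, where entries are small).
        for entry in index.data_entries():
            if pred(entry):
                yield entry

    def index_probe(index, key):
        # Use the index structure itself to go straight to matching entries.
        return index.probe(key)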

10
Some important terms
  • access path: a method for accessing the tuples of a table
  • table/index scan, OR
  • index plus matching selection condition (we only consider conjunctions C of terms of the form attr op value) → consider probing the index.
  • when does an index match a selection condition C?
  • hash index: for each search key attribute A, C has a conjunct of the form A = v.
  • B+ tree index: for some prefix of the search key, for every attribute A in the prefix, C has a conjunct of the form A op v.
  • why treat matching differently? what does a match do for us? (The matching rules are sketched below.)
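A minimal sketch of these matching rules, assuming a selection condition is given as a list of (attribute, operator, value) conjuncts; this is illustrative only:

    def hash_index_matches(search_key, conjuncts):
        # Hash index: every search-key attribute needs an equality conjunct A = v.
        eq_attrs = {a for (a, op, _v) in conjuncts if op == "="}
        return all(a in eq_attrs for a in search_key)

    def btree_matching_prefix(search_key, conjuncts):
        # B+ tree index: longest prefix of the search key such that every
        # attribute in the prefix has some conjunct A op v; matches iff >= 1.
        attrs = {a for (a, _op, _v) in conjuncts}
        n = 0
        for a in search_key:
            if a not in attrs:
                break
            n += 1
        return n

    # Example from the next slide: hash index on (uid, sid, time).
    cond = [("uid", "=", 1), ("sid", "=", 2), ("time", "=", 3)]
    print(hash_index_matches(("uid", "sid", "time"), cond))   # True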

11
Access Paths
  • suppose we have a hash index on (uid, sid, time). Does it match uid = 1 ∧ sid = 2 ∧ time = 3? What about uid = 1 ∧ sid = 2? And uid = 1 ∧ time = 3? uid = 1 ∧ sid = 2 ∧ time > 2?
  • when a hash index matches a selection condition, we can fetch just those tuples that satisfy it.
  • when the match (between the index and the selection condition) covers some (but not all) of its conjuncts, we can still fetch the tuples that satisfy all the matched conjuncts and check them for satisfaction of the remaining conditions. Clustered vs. unclustered index makes a big difference!

12
Access Paths
  • suppose we have a B+ tree index on (uid, sid, time). Does it match uid = 1 ∧ sid = 2 ∧ time > 3? What about uid = 1 ∧ sid = 2? And uid = 1 ∧ time > 3?
  • when there is a match, we can exploit the B+ tree index to fetch the tuples satisfying the matched conjuncts.
  • if we had separate indexes on (uid, sid) and on time, what are our options for time > 3 ∧ uid = 1 ∧ sid = 2?
  • use one of the indexes and verify the unmatched conditions.
  • use both indexes and intersect the rid sets (sketched below).
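A minimal sketch of the rid-set intersection option, assuming each index can return the rids of entries satisfying its matched conjunct(s); the interface names are made up:

    def fetch_via_rid_intersection(indexes_and_keys, heap_file, full_pred):
        # Probe each usable index for the rids matching its conjunct(s),
        # intersect the rid sets, then fetch only those records and verify
        # any remaining (unmatched) conditions on the actual tuples.
        rid_sets = [set(idx.rids_matching(key)) for idx, key in indexes_and_keys]
        candidates = set.intersection(*rid_sets)
        for rid in sorted(candidates):     # sorted rids reduce random page I/O
            tup = heap_file.fetch(rid)
            if full_pred(tup):
                yield tup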

13
Reduction Factor
  • when there is a total match, we can fetch exactly the tuples that satisfy the given condition.
  • if not, we can fetch the tuples satisfying the matched conjuncts.
  • reduction factor (RF): the fraction of tuples in the table that satisfy a conjunct.
  • the smaller the RF, the more selective the access path using the index.
  • the RF of a conjunction is often estimated using an independence assumption (see the sketch below).
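A minimal sketch of the independence assumption: the RF of a conjunction is estimated as the product of the per-conjunct RFs (an estimate that can be far off when attributes are correlated):

    from math import prod

    def estimate_rf(conjunct_rfs):
        # Independence assumption: multiply the individual reduction factors.
        return prod(conjunct_rfs)

    # E.g., two conjuncts with estimated RFs of 10% and 50%:
    print(estimate_rf([0.10, 0.50]))   # 0.05, i.e. about 5% of the tuples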

14
Operator Algorithms, Take 1 (Selection)
  • σ_{A op val}(R).
  • No index? → table scan (plan 1).
  • If there are indexes, use one to fetch the tuples satisfying the matched conjuncts and verify the rest (plan 2).
  • E.g., sname < 'S' ∧ year = 2007 on Songs.
  • Approx. 70% RF for sname < 'S' (uniformity assumption; extremely approximate for this example).
  • Clustered B+ tree index on sname: 70% × 500 = 350 pages.
  • Unclustered B+ tree index: 70% × 40,000 = 28,000 tuples, which could be 28,000 pages (I/Os) in the worst case!
  • Better off scanning the table when RF > 1.25% in this example!
  • Can you construct a similar situation for
    Ratings?

15
Some explanations
  • Where did RF ≈ 70% come from? Assuming snames starting with each letter are equally likely, for sname < 'S' there are 18 possible first letters in range out of a total of 26 (the letters of the alphabet). This gives 18/26, which is approx. 70%.
  • Why are we better off scanning the table when RF > 1.25%? Because 40,000 × 1.25% = 500 tuples, which might in the worst case cost us 500 I/Os with an unclustered index. Even if we did a table scan, it would cost us the same (500 pages)! (The cost arithmetic is sketched below.)
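A minimal sketch of the worst-case I/O arithmetic behind these numbers (ignoring the cost of traversing the index itself); the Songs figures are the ones from the running example:

    def selection_costs(rf, npages, ntuples):
        # Worst-case page I/Os for the three plans discussed above.
        table_scan  = npages        # read every data page
        clustered   = rf * npages   # qualifying tuples are stored together
        unclustered = rf * ntuples  # up to one page I/O per qualifying tuple
        return table_scan, clustered, unclustered

    # Songs: 500 pages, 40,000 tuples, RF of about 70% for sname < 'S'.
    print(selection_costs(0.70, 500, 40_000))   # (500, 350.0, 28000.0)

    # Break-even RF for the unclustered index vs. a table scan:
    print(500 / 40_000)                         # 0.0125, i.e. 1.25%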

16
Projection -- π_{uid,sid}(Ratings)
  • Straightforward operation; the key challenge is that duplicate elimination (DE) can be expensive.
  • Avoid DE if you can (e.g., when there is no DISTINCT in the SELECT clause).
  • Simple table scan, or index scan when the index keys include the projected fields.
  • DE Plan 1: use partitioning (see the sketch below).
  • Scan Ratings and write out (uid, sid) pairs: handshake with phase 1 of external sorting.
  • Handshake the last pass of phase 2 with DE.
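A minimal sketch of this sort-based plan, using an in-memory sort as a stand-in for external sorting; in a real system the projection is folded into phase 1 (run formation) and the duplicate check into the last merge pass, as described above:

    def project_uid_sid_distinct(ratings):
        # Phase 1 handshake: project out only the (uid, sid) pairs while scanning.
        pairs = [(r["uid"], r["sid"]) for r in ratings]
        # Sort the (much smaller) projected pairs.
        pairs.sort()
        # Phase 2 handshake: duplicates are now adjacent, drop them on the fly.
        result, prev = [], None
        for p in pairs:
            if p != prev:
                result.append(p)
                prev = p
        return result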

17
Projection (contd.)
  • Plan 2: if the index data entries include uid, sid, then sort the data entries as opposed to the records.
  • Plan 3: suppose there is a clustered index on (uid, sid, <whatever>); simply retrieve the data entries and do DE on the fly (since they are already sorted; sketched below).
  • Note: both plans above scan the index file rather than the data file! (But you look at the data entries in the index file.)
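A minimal sketch of Plan 3: when the data entries of a clustered index on (uid, sid, ...) are read in key order, duplicate (uid, sid) pairs are adjacent, so no separate sort is needed; the entry format here is an assumption:

    def project_from_sorted_entries(index_entries):
        # index_entries: data entries (uid, sid, ...) scanned in sorted key order.
        prev = None
        for entry in index_entries:
            pair = entry[:2]        # keep only (uid, sid)
            if pair != prev:        # sorted input => duplicates are adjacent
                yield pair
                prev = pair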